Commit graph

5255 commits

Author SHA1 Message Date
Alexandre Flament
f03ad0a3c0
Merge pull request #2555 from dalf/fix-github-data-update
[fix] fix github action data-update.yml
2021-02-09 10:48:55 +01:00
Alexandre Flament
966a7a1f25 [fix] fix github action data-update.yml 2021-02-09 09:58:59 +01:00
Alexandre Flament
e4cc7f13a3
Merge pull request #2542 from kvch/fix-naver-engine
Fix XPATHs in Naver engine
2021-02-09 08:52:38 +01:00
Alexandre Flament
bec9e30fe7
Merge pull request #2554 from MarcAbonce/zh-variants-in-wikipedia
Add support for Chinese variants in Wikipedia
2021-02-09 08:49:59 +01:00
Alexandre Flament
6c513095e4
Merge pull request #2553 from danielhones/improve-results-highlighting-updated
Ignore double-quotes when highlighting query parts
2021-02-09 08:39:07 +01:00
Daniel Hones
138f32471c Updated webutils.highlight_content to ignore double-quotes when highlighting query parts 2021-02-08 23:58:54 -05:00
Marc Abonce Seguin
64e81794fe add support for Chinese variants in Wikipedia 2021-02-08 21:56:45 -07:00
Noémi Ványi
ac309f5b8d Fix naver engine
Closes #2540
2021-02-07 18:58:13 +01:00
Noémi Ványi
ab8739809c
Merge pull request #2538 from return42/drop-metager
[drop] metager - xpath engine won't work anymore
2021-02-07 15:21:40 +01:00
Markus Heiser
41c03cf011 [drop] metager - xpath engine won't work anymore
The new version of MetaGer needs to reload the results (into an iframe) with a
unique tag (see HTML response below).

Implementing a dedicated metager-engine for searx makes no sense to me. The
great days of MetaGer seem to be over.  I remember the good old days when this
project started in the 1990s.  But in the last few years it has become more
and more crap.  As the name suggests, MetaGer was made for Germans in the
first place.  They have added English and Spanish translations, but the i18n is
very poor compared to what searx offers.

It's a pity, let's drop MetaGer.

This is the first response, the id (b82679980656899ba5a17ffd02a56846) is unique
for each query:

    $ curl "https://metager.org/meta/meta.ger3?eingabe=foo&submit-query=&focus=web"
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <link rel="stylesheet" href="/index.css?id=b82679980656899ba5a17ffd02a56846">
        <script src="/index.js?id=b82679980656899ba5a17ffd02a56846"></script>
    <title>foo - MetaGer</title>
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
    </head>
    <body>
        <iframe id="mg-framed" src="https://metager.org/meta/meta.ger3?eingabe=foo&amp;submit-query=&amp;focus=web&amp;mgv=b82679980656899ba5a17ffd02a56846" autofocus="true" onload="this.contentWindow.focus();"></iframe>
     </body>

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-07 14:55:21 +01:00
Noémi Ványi
1f09d7d561
Merge pull request #2539 from OliveiraHermogenes/recoll/paged_json
[feat] recoll: add support for paging
2021-02-07 14:28:57 +01:00
Hermógenes Oliveira
514faa9162 [feat] recoll: paged json support 2021-02-07 10:05:35 -03:00
Alexandre Flament
1e35c3ccce
Merge pull request #2531 from MarcAbonce/fix-browser-locale
[fix] Get correct locale with country from browser
2021-02-05 10:55:37 +01:00
Marc Abonce Seguin
c937a9e85f [fix] get correct locale with country from browser
Some of our interface locales include uppercase country codes,
which are separated by `_` instead of the more common `-`.
Also, a browser's `Accept-Language` header could be in lowercase.

This commit attempts to normalize those cases so a browser's
language+country codes can better match with our locales.

This solution assumes that our UI locales have nothing more than
a language and, optionally, a country. If we ever add a script-specific
locale like `zh-Hant-TW`, this would have to change to accommodate
that, but the idea would be pretty much the same as this fix.
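
A minimal sketch of the normalization described in this message, not the actual searx code. The helper names `normalize_locale` and `match_ui_locale` and the example locale sets are hypothetical; only the idea (lowercase language, uppercase country, `_` separator) comes from the commit message.

```python
def normalize_locale(code):
    """Turn a browser code such as 'pt-br' into 'pt_BR'.

    Assumes the code has at most a language and a country part.
    """
    parts = code.strip().replace('_', '-').split('-')
    language = parts[0].lower()
    if len(parts) > 1 and parts[1]:
        return language + '_' + parts[1].upper()
    return language


def match_ui_locale(accept_code, ui_locales):
    """Return the matching UI locale, falling back to the bare language."""
    normalized = normalize_locale(accept_code)
    if normalized in ui_locales:
        return normalized
    language = normalized.split('_')[0]
    return language if language in ui_locales else None
```

With a hypothetical locale set `{'zh_TW', 'en'}`, `match_ui_locale('zh-tw', ...)` resolves to `'zh_TW'`. A script-specific code such as `zh-Hant-TW` would need extra handling, as the message notes.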
2021-02-04 19:53:59 -07:00
Alexandre Flament
321788f14a
Merge pull request #2528 from dalf/mod-ci-gh-pages
[mod] CI: minor changes
2021-02-04 23:12:27 +01:00
Noémi Ványi
ffaf785f82
Merge pull request #2533 from mrwormo/ccengine
[Engine] Add Creative Commons search engine
2021-02-04 22:35:08 +01:00
mrwormo
c4c1636b18 Add Creative Commons search engine 2021-02-04 11:31:35 +01:00
Noémi Ványi
006f206dc9
Merge pull request #2530 from return42/fix-user-hb
[fix] make books/user.pdf
2021-02-02 20:50:35 +01:00
Markus Heiser
89554e42a9 [fix] make books/user.pdf
Error:

  Configuration error:
  There is a programmable error in your configuration file:
  ...
  NameError: name 'DOCS_URL' is not defined
  make: *** [utils/makefile.sphinx:156: books/user.latex] Fehler 2

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-02 20:14:07 +01:00
Alexandre Flament
90b9d0d6a8 [mod] CI: minor changes
* utils/makefile.python: travis-gh-pages renamed ci-gh-pages
2021-02-02 08:53:57 +01:00
Alexandre Flament
34de715e62
Merge pull request #2500 from dalf/github-action-data
[enh] every Sunday, call utils/fetch_*.py scripts and create a PR automatically
2021-02-01 17:16:58 +01:00
Alexandre Flament
1742355eb8
Merge pull request #2499 from dalf/remove-language-support-variable
[mod] dynamically set language_support variable
2021-02-01 17:16:18 +01:00
Alexandre Flament
ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except for the documentation and the /config URL, this variable is not used.

This commit removes the variable definition from the engines and
sets the value according to the length of supported_languages: False when the
length is 0, True otherwise.
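
The rule above can be sketched as follows. `DummyEngine` is a hypothetical stand-in for an engine module and `set_language_support` is an illustrative helper, not the function used in searx.

```python
class DummyEngine:
    """Hypothetical stand-in for an engine module."""

    def __init__(self, supported_languages):
        self.supported_languages = supported_languages


def set_language_support(engine):
    # False when no languages are listed, True otherwise.
    engine.language_support = len(engine.supported_languages) > 0
    return engine.language_support
```

Because only the length is inspected, the sketch works the same whether `supported_languages` is a list or a dict.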

Close #2485
2021-02-01 17:10:37 +01:00
Alexandre Flament
99244440e4
Merge pull request #2514 from return42/fix-gh-pages
[fix] Makefile target gh-pages & flatten history of branch gh.pages
2021-02-01 17:07:08 +01:00
Alexandre Flament
0a8799b834
Merge pull request #2517 from dalf/debug-ci
Update pyenv pyenvinstall Make targets
2021-02-01 17:01:34 +01:00
Markus Heiser
8c45f1149d [hardening] github workflows - corrupted cache
aka: ensure that 'make test' works as expected

The cache contains a copy of './local' which is, under some circumstances,
corrupted.  It is not possible to clear the cache [1] (see the top of the page).

Ensure that 'make test' works as expected [2] even if

- the python interpreter is missing
- the virtualenv exists but pyyaml is missing

To harden the workflow against cache failures, this patch adds the new target
'travis.test' to the workflow.  This target tries to import the Python module
'yaml'.  If this fails, the virtualenv is rebuilt from scratch.
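
The import probe can be illustrated in Python. This is only a sketch of the idea: the real check lives in a Makefile target, and `module_importable` is a hypothetical helper name.

```python
import importlib


def module_importable(name):
    """Return True if the module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True
```

A caller could rebuild the virtualenv whenever `module_importable('yaml')` returns False.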

[1] https://github.com/actions/cache/issues/2#issuecomment-673493515
[2] https://github.com/searx/searx/pull/2517#discussion_r567240235

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-01 16:58:04 +01:00
Markus Heiser
38b39ef0ae [fix] re-add 'pip-exe' target - partial revert 9b48ae47
Target pip-exe is a prerequisite of the targets:

  - pyinstall
  - pyuninstall

and was accidentally deleted in commit 9b48ae47.

HINT:
  do not confuse pyinstall with pyenvinstall

pyinstall & pyuninstall
    Installing into user's HOME using pip from OS,
    therefore the message is needed.

pyenvinstall & pyenvuninstall
    Installing into virtualenv (./local) using pip which is provided by
    prerequisite 'pyenv' in the virtualenv.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-01 16:58:04 +01:00
Alexandre Flament
d70c5a621a [mod] more robust make pyenv / make pyenvinstall
"make pyenv" ensures that ./local/py3/bin/python is an executable
2021-02-01 16:58:04 +01:00
Alexandre Flament
806af50738
Merge pull request #2494 from return42/rm-fabfile
[fix] remove Fabric file
2021-02-01 15:09:35 +01:00
Markus Heiser
40d2a116e1 [fix] Makefile target gh-pages & flatten history of branch gh.pages
1. This patch fixes error:

    rm -rf gh-pages/
    make V=1 gh-pages
    make[1]: Leaving directory '/800GBPCIex4/share/searx'
    [ -d "gh-pages/.git" ] || git clone  gh-pages
    fatal: repository 'gh-pages' does not exist

2. The gh-pages build has been moved to ./build/gh-pages; this also affects
   'travis-gh-pages'

3. The gh-pages commit messages now include a ref to the repository and commit

4. Since keeping a gh-pages history only has the drawback that the repository
   grows fast, this patch also flattens the history:

    cd build/gh-pages/; git log --oneline
    bash: cd: build/gh-pages/: Datei oder Verzeichnis nicht gefunden
    026126be (HEAD -> gh-pages, origin/gh-pages) make gh-pages: from https://github.com/return42/searx.git@71d66979c2935312e0aed7fc7c3cf6199fbe88a2

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-29 11:41:48 +01:00
Alexandre Flament
71d66979c2
Merge pull request #2482 from return42/fix-google-video
[fix] revise of the google-Video engine
2021-01-28 11:11:07 +01:00
Markus Heiser
7f505bdc6f [fix] google: avoid unnecessary SearxEngineXPathException errors
Avoid SearxEngineXPathException errors when parsing invalid results::

    .//div[@class="yuRUbf"]//a/@href index 0 not found
    Traceback (most recent call last):
      File "./searx/engines/google.py", line 274, in response
        url = eval_xpath_getindex(result, href_xpath, 0)
      File "./searx/searx/utils.py", line 608, in eval_xpath_getindex
        raise SearxEngineXPathException(xpath_spec, 'index ' + str(index) + ' not found')
    searx.exceptions.SearxEngineXPathException: .//div[@class="yuRUbf"]//a/@href index 0 not found
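
One common way to avoid such an exception is to ask for a default value and skip the result when nothing is found. The sketch below illustrates that pattern with plain lists; `get_index`, `XPathIndexError` and `collect_urls` are hypothetical names and do not reproduce the searx helpers or the actual fix.

```python
_NOTSET = object()


class XPathIndexError(Exception):
    """Hypothetical stand-in for an 'index not found' error."""


def get_index(items, index, default=_NOTSET):
    """Return items[index]; raise unless a default was supplied."""
    if -len(items) <= index < len(items):
        return items[index]
    if default is _NOTSET:
        raise XPathIndexError('index ' + str(index) + ' not found')
    return default


def collect_urls(results):
    """Keep the first href of each result and skip results without one."""
    urls = []
    for hrefs in results:
        url = get_index(hrefs, 0, default=None)
        if url is None:
            continue  # invalid result: skip it instead of raising
        urls.append(url)
    return urls
```

Here each element of `results` stands for the list of matches an XPath query would return for one result block.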

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
e436287385 [mod] checker: add some additional tests
BTW: fix indentation by 2 spaces

The additional tests have been commented out in the google engines to avoid
triggering CAPTCHA issues.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:50 +01:00
Markus Heiser
b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Alexandre Flament
0f18e885bf
Merge pull request #2479 from Tobi823/master
Document workaround for using 2 languages simultaneously #1508
2021-01-27 21:29:42 +01:00
Alexandre Flament
b661c3f5d4
Merge pull request #2509 from return42/fix-morty-key
[doc] improve admin-docs about result proxy (morty) configuration
2021-01-27 15:31:29 +01:00
Markus Heiser
a69a8a3ed5 [doc] improve admin-docs about result proxy (morty) configuration
[1] https://github.com/searx/searx/pull/1872#issuecomment-768107138

Suggested-by @dalf [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-27 09:58:06 +01:00
Markus Heiser
923b490022 [mod] add Makefile targets for search.checker.<engine_name>
To check all engines:

    make search.checker

To check an engine such as 'google news', replace the space with an underscore:

    make search.checker.google_news

To see HTTP requests and more use SEARX_DEBUG:

    make SEARX_DEBUG=1 search.checker.google_news

To filter out HTTP redirects:

    make SEARX_DEBUG=1 search.checker.google_news | grep -A1 "HTTP/1.1\" 3[0-9][0-9]"
    ...
    Engine google news                   Checking
    https://news.google.com:443 "GET /search?q=life&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=life&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --
    https://news.google.com:443 "GET /search?q=computer&hl=en&lr=lang_en&ie=utf8&oe=utf8&ceid=US%3Aen&gl=US HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&lr=lang_en&ie=utf8&oe=utf8&ceid=US:en&gl=US HTTP/1.1" 200 None
    --
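
The target naming and the redirect filter can be mirrored in Python. This is an illustrative sketch with hypothetical helper names. The regular expression is the one passed to grep above, but unlike `grep -A1` the filter returns only the matching lines, not the line that follows each match.

```python
import re

# Same pattern as in the grep call: an HTTP/1.1 request answered with a 3xx status.
REDIRECT = re.compile(r'HTTP/1\.1" 3[0-9][0-9]')


def checker_target(engine_name):
    """Build the make target for an engine, e.g. 'google news'."""
    return 'search.checker.' + engine_name.replace(' ', '_')


def redirect_lines(log_lines):
    """Return only the log lines that record an HTTP redirect."""
    return [line for line in log_lines if REDIRECT.search(line)]
```

For example, `checker_target('google news')` gives `search.checker.google_news`, the target used in the commands above.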

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-26 11:46:36 +01:00
Alexandre Flament
6047087aac [mod] utils/fetch_languages.py: write files at the right location 2021-01-24 14:25:27 +01:00
Alexandre Flament
3330cf4a46 [enh] every monday, call utils/fetch_*.py scripts and create a PR automatically 2021-01-24 13:32:39 +01:00
Markus Heiser
ff6804e545 [data] make engines.languages
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:52:32 +01:00
Markus Heiser
8cdad5d85d [fix] google-videos: parse values for 'length' & 'author'
The 'video.html' template from the 'oscar' design supports replacements
for *author* and *length*.  Google-videos does not have an author; the
publisher info is used for *author* instead.

Hint: these replacements are not supported by the 'simple' design.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:51:24 +01:00
Markus Heiser
89b3050b5c [fix] revise of the google-Video engine
This revision is based on the methods developed in the revision of the google
engine (see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 09:39:30 +01:00
Alexandre Flament
f4a17acb7a
Merge pull request #2498 from dalf/minor-fix-google-news
[fix] google_news: avoid one HTTP redirect except for the English results
2021-01-24 09:13:48 +01:00
Alexandre Flament
96c2996857
Merge pull request #2497 from return42/fix-test.sh
[fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
2021-01-24 09:06:11 +01:00
Alexandre Flament
8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
Markus Heiser
ea5c992d4f [fix] lxc.sh - SC2034: ubu2010_boilerplate appears unused.
$ make test.sh
  In utils/lxc.sh line 42:
  ubu2010_boilerplate="$ubu1904_boilerplate"
  ^-----------------^ SC2034: ubu2010_boilerplate appears unused. Verify use (or export if used externally).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-24 08:29:13 +01:00
Alexandre Flament
7d24850d49
Merge pull request #2483 from return42/fix-google-news
[fix] revise of the google-News engine
2021-01-23 20:21:09 +01:00
Markus Heiser
5f92dfcdbe [fix] google-news: query uses locale without country tag
Without a country-region tag, Google will redirect to correct the country tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
Markus Heiser
baec54c492 [fix] revise of the google-news engine
This revision is based on the methods developed in the revision of the google
engine (see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00