4236 commits

Author SHA1 Message Date
Markus Heiser
e36b023508 [mod] core.ac.uk: add category 'scientific publications'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 16:16:22 +02:00
Markus Heiser
b424ee255e [mod] paper.html: simplify template by using result_link macro
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 16:13:36 +02:00
Alexandre Flament
bfd6f61849
Merge pull request #1804 from return42/fix-core.ac.uk
core.ac.uk: use paper.html template
2022-09-24 15:12:05 +02:00
Alexandre Flament
16443d4f4a [mod] core.ac.uk: try multiple ways to get url
If the url is not found, using:
* the DOI
* the downloadUrl
* the ARK id
2022-09-24 15:02:39 +02:00
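The fallback order listed in commit 16443d4f4a can be sketched roughly as follows. This is only an illustration, not the engine's actual code: the function name `pick_url`, the field names `urls`, `doi`, `downloadUrl` and `arkId`, and the two resolver base URLs are assumptions made for this example.

```python
def pick_url(source):
    """Return the first usable URL for a result item, or None (hypothetical helper)."""
    urls = source.get('urls') or []
    if urls:
        # Prefer the URL reported by the API, upgraded to https.
        return urls[0].replace('http://', 'https://', 1)
    if source.get('doi'):
        # Assumed: build a link through a DOI resolver.
        return 'https://doi.org/' + source['doi']
    if source.get('downloadUrl'):
        return source['downloadUrl']
    if source.get('arkId'):
        # Assumed: build a link through an ARK resolver.
        return 'https://n2t.net/' + source['arkId']
    return None
```

An item without any of these fields yields `None`, so the caller can skip it instead of failing with an index error.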
Markus Heiser
3198c906af [mod] paper.html: add links to doi resolver
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 14:19:51 +02:00
Markus Heiser
c76830d8a8 [mod] core.ac.uk: use paper.html template
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 13:19:33 +02:00
Markus Heiser
3ff2ad939d [fix] ERROR searx.engines.core.ac.uk: list index out of range
Some result items from core.ac.uk do not have a URL::

  Traceback (most recent call last):
  File "searx/search/processors/online.py", line 154, in search
    search_results = self._search_basic(query, params)
  File "searx/search/processors/online.py", line 142, in _search_basic
    return self.engine.response(response)
  File "SearXNG/searx/engines/core.py", line 73, in response
    'url': source['urls'][0].replace('http://', 'https://', 1),

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 13:19:33 +02:00
Markus Heiser
caebafdd06 [fix] typo in crossref settings: disable --> disabled
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-24 08:12:36 +02:00
Alexandre Flament
d6446be38f [mod] science category: various updates related to PR 1705 2022-09-23 20:52:55 +02:00
Alexandre FLAMENT
fe43b6e821 [build] /static 2022-09-23 20:45:58 +02:00
Alexandre FLAMENT
e36f85b836 Science category: update the engines
* use the paper.html template
* fetch more data from the engines
* add crossref.py
2022-09-23 20:45:58 +02:00
Alexandre FLAMENT
593026ad9c oa_doi_rewrite: add the doi to the result when it is found.
Currently, when oa_doi_rewrite finds a DOI in the result URL, it replaces the URL.
In this commit, the plugin adds the key "doi" to the result,
so the paper.html can show it.
2022-09-23 20:45:58 +02:00
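The idea behind commit 593026ad9c, exposing a detected DOI as its own result key, might look like the sketch below. The pattern `DOI_RE` is a deliberately simplified assumption, and `add_doi` is a hypothetical name; the real plugin also rewrites the URL, which is left out here.

```python
import re

# Simplified DOI pattern, assumed for this example only.
DOI_RE = re.compile(r'(10\.\d{4,9}/[^\s?#]+)')

def add_doi(result):
    """Add a 'doi' key to the result when its URL contains a DOI."""
    match = DOI_RE.search(result.get('url', ''))
    if match and 'doi' not in result:
        result['doi'] = match.group(1)
    return result
```

A template can then check for the `doi` key and display it next to the result.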
Alexandre FLAMENT
5ba831d6a8 Add paper.html result template 2022-09-23 20:45:58 +02:00
Alexandre FLAMENT
a96f503d7b Add searx.webutils.searxng_format_date
* Move the datetime to str code from searx.webapp.search to searx.webutils.searxng_format_date
* When the month, day, hour, minute and second are zero, the function returns only the year.
2022-09-23 20:45:58 +02:00
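A minimal sketch of the "year only" rule from commit a96f503d7b follows. Python's `datetime` cannot hold a month or day of zero, so this example assumes that 1 January at 00:00:00 stands for "only the year is known". The name `format_date_sketch` and the `%Y-%m-%d` fallback format are also assumptions, not the behaviour of `searxng_format_date`.

```python
from datetime import datetime

def format_date_sketch(dt):
    """Return only the year when no finer-grained date information is present."""
    # Assumption: 1 January, 00:00:00 means that only the year was known.
    if (dt.month, dt.day, dt.hour, dt.minute, dt.second) == (1, 1, 0, 0, 0):
        return str(dt.year)
    # Hypothetical fallback format for all other dates.
    return dt.strftime('%Y-%m-%d')
```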
Alexandre Flament
bef3984d03
Merge pull request #1728 from liimee/eng-ddw
add duckduckgo weather engine
2022-09-23 18:14:09 +02:00
Alexandre Flament
d3fec1388c
Merge pull request #1624 from liimee/eng-wttr
Add wttr.in engine
2022-09-23 18:13:37 +02:00
searxng-bot
ab6e1542ff [translations] update from Weblate
55133802 - 2022-09-21 - Linerly <linerly@protonmail.com>
b9309bdf - 2022-09-22 - Xosé M <correo@xmgz.eu>
6da8db13 - 2022-09-21 - Constantine Giannopoulos <K.Giannopoulos@acg.edu>
c1edbd89 - 2022-09-21 - Markus Heiser <markus.heiser@darmarit.de>
9795e5fe - 2022-09-22 - alexfs2015 <alex04fs@gmail.com>
2022-09-23 07:38:23 +00:00
Alexandre Flament
1a7b6872b5
Merge pull request #1792 from unixfox/google-images-internal-api
use the internal API for google images
2022-09-21 19:50:38 +02:00
Markus Heiser
cf7ee67f71 [mod] google-images: slight improvements of the engine
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-21 18:59:55 +02:00
Markus Heiser
8b40e68c56 [fix] wording: SearXNG is 'open' and not 'hackable'
The word "hackable" may arouse interest in programmers to participate in the
development, but it scares the ordinary user.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-21 17:10:35 +02:00
Emilien Devos
df5f8d0e8e use the internal API for google images 2022-09-20 22:52:38 +02:00
Markus Heiser
dcf1d408a5 [fix] google-news: origin result does not have a content area
Google News is being reworked; the content area of a news item has been
removed.

Closes: https://github.com/searxng/searxng/issues/1790
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-20 20:18:43 +02:00
Alexandre FLAMENT
33b43763b9 Brave engine: fix BrotliDecoderDecompressStream error 2022-09-18 22:08:38 +00:00
Markus Heiser
fbf07237ff [fix] and improve docs generated from source code.
Fix::

    searx/locales.py:docstring of searx.locales.get_engine_locale:17: \
      WARNING: Definition list ends without a blank line; unexpected unindent.

Improvement: don't show default values in the generated documentation when they
are more of a mess than useful information (`:meta hide-value:`).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-18 12:44:12 +02:00
Alexandre Flament
b3708e4137
Merge pull request #1757 from LencoDigitexer/master
add yandex autocomplete
2022-09-17 13:17:00 +02:00
LencoDigitexer
bc28091557 remove the print statement 2022-09-17 11:25:14 +03:00
searxng-bot
2ee8e5eff2 [translations] update from Weblate
570c4f7d - 2022-09-15 - Fero Novák <itzwowsmile@gmail.com>
0ef09ea1 - 2022-09-15 - dogyx <aaronloit@tuta.io>
03f97e22 - 2022-09-09 - beriain <soila@disroot.org>
caddaedc - 2022-09-10 - Markus Heiser <markus.heiser@darmarit.de>
addfb0c2 - 2022-09-09 - NxOne14 <kiril2315@gmail.com>
2872e3a6 - 2022-09-11 - Markus Heiser <markus.heiser@darmarit.de>
d2835b09 - 2022-09-11 - Sadith Nadungoda <sadithnadungoda@gmail.com>
2022-09-16 07:30:31 +00:00
LencoDigitexer
3f72a79088 add yandex to autocomplete backends settings 2022-09-09 23:50:58 +03:00
LencoDigitexer
7b8d6015e3 add yandex autocompleter 2022-09-09 23:42:44 +03:00
Alexandre Flament
eb3d185e66
Merge pull request #1755 from searxng/dependabot/npm_and_yarn/searx/static/themes/simple/master/sharp-0.31.0
Bump sharp from 0.30.7 to 0.31.0 in /searx/static/themes/simple
2022-09-09 10:40:38 +02:00
searxng-bot
bf8ea2020f [translations] update from Weblate 2022-09-09 07:26:09 +00:00
dependabot[bot]
cbf65e8292
Bump sharp from 0.30.7 to 0.31.0 in /searx/static/themes/simple
Bumps [sharp](https://github.com/lovell/sharp) from 0.30.7 to 0.31.0.
- [Release notes](https://github.com/lovell/sharp/releases)
- [Changelog](https://github.com/lovell/sharp/blob/main/docs/changelog.md)
- [Commits](https://github.com/lovell/sharp/compare/v0.30.7...v0.31.0)

---
updated-dependencies:
- dependency-name: sharp
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-09 07:11:11 +00:00
Alexandre Flament
691c0ed6b9
Merge pull request #1743 from dalf/update_about_metrics
Update about the metrics
2022-09-04 11:29:28 +02:00
Markus Heiser
ad8ffd222c [mod] option 'ui: cache_url:' to configure internet cache or archive
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-04 09:42:20 +02:00
Alexandre FLAMENT
dd0887be18 xpath engine: change raise_for_httperror to no_result_for_http_status
no_result_for_http_status contains a list of HTTP statuses.
These HTTP statuses are treated as an empty result list.
In other cases an exception is thrown as usual.

Previously, raise_for_httperror ignored all HTTP errors,
which made defective engines invisible in the stats.
2022-09-04 09:07:28 +02:00
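The behaviour described in commit dd0887be18 can be illustrated with a small sketch. The function name `is_empty_result` and the use of `RuntimeError` are assumptions for this example; the real engine relies on the project's own HTTP error handling.

```python
def is_empty_result(status_code, no_result_for_http_status):
    """Return True when the status means "no results", raise on other HTTP errors."""
    if status_code in no_result_for_http_status:
        # Listed statuses are treated as an empty result list.
        return True
    if status_code >= 400:
        # Any other HTTP error stays visible as an exception.
        raise RuntimeError(f'HTTP error {status_code}')
    return False
```

With `no_result_for_http_status = [404]`, a 404 counts as "nothing found", while a 500 still raises and therefore shows up in the engine statistics.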
Markus Heiser
a15dfa5ee1 [fix] engine woxikon.de - don't raise exception on empty result list
Woxikon expects a word in German, so with query "foo" the site finds nothing and
responds with a 404:

    httpx.HTTPStatusError: Client error '404 Not Found' \
      for url 'https://synonyme.woxikon.de/synonyme/foo.php'

[1] https://github.com/searxng/searxng/issues/1543#issuecomment-1193317054

Closes: https://github.com/searxng/searxng/issues/1543
Suggested-by: @allendema [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-04 09:07:28 +02:00
Markus Heiser
8e9fb0b435
Merge pull request #1647 from return42/deepl-engine
[mod] add deepl translation engine
2022-09-02 14:09:22 +02:00
Alexandre FLAMENT
66f72a006f template: when enable_metrics is disabled, hide the empty stats
when metrics are disabled:
* hide the link to /stats at the bottom of the pages
* in /preferences, hide the columns "Response time" and "Reliability"
2022-09-02 08:52:23 +00:00
Alexandre FLAMENT
94a28ebe53 Stats: display only the score per result, no longer the score 2022-09-02 08:30:38 +00:00
searxng-bot
9e84cf31a4 [translations] update from Weblate
9b3635db - 2022-08-31 - Andrij Mizyk <andmizyk@gmail.com>
875cbf38 - 2022-09-01 - Markus Heiser <markus.heiser@darmarit.de>
7b3b12a0 - 2022-08-31 - SecularSteve <fairfull.playing@gmail.com>
e60b4544 - 2022-08-28 - Markus Heiser <markus.heiser@darmarit.de>
dc9c17a4 - 2022-08-31 - SecularSteve <fairfull.playing@gmail.com>
7307cf31 - 2022-08-31 - SecularSteve <fairfull.playing@gmail.com>
ae642e6f - 2022-09-01 - Xosé M <correo@xmgz.eu>
8db6b9c9 - 2022-08-28 - Markus Heiser <markus.heiser@darmarit.de>
d74ef692 - 2022-08-31 - SecularSteve <fairfull.playing@gmail.com>
3ddf8997 - 2022-08-30 - Markus Heiser <markus.heiser@darmarit.de>
303d0890 - 2022-08-28 - Markus Heiser <markus.heiser@darmarit.de>
2022-09-02 07:25:26 +00:00
Émilien Devos
fcccf39030
Disable brave by default
Brave is too unstable and will often not work by default. As seen in many issues: https://github.com/searxng/searxng/issues?q=is%3Aissue++sort%3Aupdated-desc+brave+label%3Abug+
2022-08-31 15:47:56 +02:00
ta
85b5293e40 simplify infobox result 2022-08-31 18:29:50 +07:00
ta
12f7d4a46b add duckduckgo weather engine 2022-08-31 17:29:32 +07:00
Alexandre Flament
242db53118
Merge pull request #1708 from dalf/result_proxy_default_settings
settings.yml: set default values for result_proxy
2022-08-29 19:42:04 +02:00
Alexandre Flament
a7bd2b47c2
Merge pull request #1712 from dalf/remove_searx_env_var
Remove usage of SEARX environment variables
2022-08-29 19:41:39 +02:00
Markus Heiser
13ef9cc125
Merge pull request #1720 from searxng/update_data_update_ahmia_blacklist.py
Update searx.data - update_ahmia_blacklist.py
2022-08-29 07:09:04 +02:00
Markus Heiser
e9b564d066
Merge pull request #1722 from searxng/update_data_update_languages.py
Update searx.data - update_languages.py
2022-08-29 07:08:26 +02:00
Markus Heiser
2b65502388
Merge pull request #1723 from searxng/update_data_update_firefox_version.py
Update searx.data - update_firefox_version.py
2022-08-29 07:07:54 +02:00
Markus Heiser
55d04a089d
Merge pull request #1724 from searxng/update_data_update_engine_descriptions.py
Update searx.data - update_engine_descriptions.py
2022-08-29 07:07:14 +02:00
Markus Heiser
4a96480bd5
Merge pull request #1721 from searxng/update_data_update_wikidata_units.py
Update searx.data - update_wikidata_units.py
2022-08-29 07:06:39 +02:00
dalf
c2400a8677 Update searx.data - update_engine_descriptions.py 2022-08-29 02:17:55 +00:00
dalf
e8bf907eef Update searx.data - update_firefox_version.py 2022-08-29 02:09:34 +00:00
dalf
915c0a2bc6 Update searx.data - update_languages.py 2022-08-29 02:09:27 +00:00
dalf
b1ccecbeb3 Update searx.data - update_wikidata_units.py 2022-08-29 02:09:17 +00:00
dalf
2e6d41fa24 Update searx.data - update_ahmia_blacklist.py 2022-08-29 02:09:09 +00:00
dalf
83fbc16908 Update searx.data - update_currencies.py 2022-08-29 02:09:09 +00:00
Alexandre FLAMENT
4adc9920e9 Remove usage of SEARX environment variables 2022-08-28 17:12:57 +00:00
Alexandre FLAMENT
341ad46303 settings.yml: set default values for result_proxy
* initialize result_proxy with searx/settings_defaults.py
* allow result_proxy.key to be a string

this commit supersedes #1522
2022-08-28 09:27:53 +00:00
Markus Heiser
8bdc6986a1
Merge pull request #1706 from dalf/fix-autocomplete-post
Fix: autocomplete with the POST method: url encode the user query
2022-08-28 09:14:47 +02:00
Markus Heiser
3be847149e
Merge pull request #1707 from dalf/fix-external-bang
External bang: bug fix: URL encode the query so "!!g 1+1" works as intended
2022-08-28 09:07:24 +02:00
Alexandre FLAMENT
2af1a6f547 External bang: bug fix: URL encode the query so "!!g 1+1" works as intended 2022-08-27 07:10:26 +00:00
Alexandre FLAMENT
268fa7e036 [build] /static 2022-08-27 06:52:20 +00:00
Alexandre FLAMENT
4a72a6b9fc Theme: fix autocompletion with the POST method
With the POST method, autocomplete.js does not URL encode the values.
For example "1+1" is sent as "1+1" which is read as "1 1" since spaces are URL encoded as a plus.

There is no clean way to fix the bug since autocomplete.js seems abandoned.

The commit monkey patches the ajax function of autocomplete.js

Related to #1695
2022-08-27 06:48:30 +00:00
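The encoding problem described in commit 4a72a6b9fc can be reproduced with Python's standard library. This only demonstrates form decoding; the parameter name `q` is an assumption and the snippet is unrelated to the actual JavaScript patch.

```python
from urllib.parse import parse_qs, urlencode

# Body sent without URL encoding: the plus sign is decoded as a space.
raw_body = 'q=1+1'
decoded_raw = parse_qs(raw_body)['q'][0]

# Body sent with proper URL encoding: the plus sign survives the round trip.
encoded_body = urlencode({'q': '1+1'})
decoded_encoded = parse_qs(encoded_body)['q'][0]

print(repr(decoded_raw))      # '1 1'
print(encoded_body)           # q=1%2B1
print(repr(decoded_encoded))  # '1+1'
```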
Alexandre Flament
56000d5162
Merge pull request #1699 from liimee/eng-app-store
add apple app store engine
2022-08-27 07:43:23 +02:00
Alexandre Flament
44bc94c36e
Merge pull request #1700 from liimee/eng-ddm
add apple maps engine
2022-08-27 07:36:16 +02:00
ta
5057007270 remove thumbnail from results 2022-08-27 06:23:30 +07:00
ta
525946d7dd add poi's website and phone number, doesn't crash when there is no displayMapRegion, query the token on the first request 2022-08-27 06:17:58 +07:00
Alexandre Flament
5284de9137
Merge pull request #1702 from tiekoetter/limiter-accept-encoding-handling
[mod] limiter plugin: Accept-Encoding handling
2022-08-26 11:54:12 +02:00
searxng-bot
e5a25e51bf [translations] update from Weblate
3e034294 - 2022-08-26 - Markus Heiser <markus.heiser@darmarit.de>
46a4dfd3 - 2022-08-24 - Markus Heiser <markus.heiser@darmarit.de>
d41463fd - 2022-08-24 - Markus Heiser <markus.heiser@darmarit.de>
338b6716 - 2022-08-22 - Markus Heiser <markus.heiser@darmarit.de>
0c9d7756 - 2022-08-22 - Markus Heiser <markus.heiser@darmarit.de>
b422a480 - 2022-08-19 - Markus Heiser <markus.heiser@darmarit.de>
44c9caa0 - 2022-08-22 - Ricardo Simões <xmcorporation@gmail.com>
a774721f - 2022-08-20 - Markus Heiser <markus.heiser@darmarit.de>
d8a322d6 - 2022-08-22 - Markus Heiser <markus.heiser@darmarit.de>
2022-08-26 07:24:01 +00:00
Léon Tiekötter
221740f76e
[mod] limiter plugin: Accept-Encoding handling
Only raise "suspicious Accept-Encoding" when both "gzip" and "deflate" are missing from Accept-Encoding.
Prevent Browsers which only implement one compression solution from being blocked by the limiter plugin.
Example Browser which is currently blocked: Lynx Browser (https://lynx.invisible-island.net)
2022-08-25 23:21:30 +02:00
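The rule from commit 221740f76e, flagging a request only when both "gzip" and "deflate" are missing, could be sketched like this. The function name `is_suspicious_accept_encoding` and the header parsing are assumptions for illustration, not the limiter plugin's code.

```python
def is_suspicious_accept_encoding(header_value):
    """Return True only when neither gzip nor deflate is offered."""
    # Split the header into tokens and drop quality values such as ";q=0.5".
    encodings = [
        token.split(';')[0].strip().lower()
        for token in header_value.split(',')
    ]
    return 'gzip' not in encodings and 'deflate' not in encodings
```

A client that sends only `gzip` passes this check, while a request offering neither encoding is still flagged.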
ta
5dce299b22 add apple maps engine 2022-08-25 17:05:40 +07:00
Alexandre Flament
5a241e545e
Merge pull request #1688 from liimee/eng-9gag
Add 9gag engine
2022-08-25 09:32:52 +02:00
ta
cef7bbab22 get the uncropped version of the thumbnail when the image height is not too large 2022-08-24 18:33:11 +07:00
ta
78bff4618c add safesearch support 2022-08-24 18:31:04 +07:00
ta
bcae7ae4e3 add developer info as author 2022-08-24 17:50:38 +07:00
ta
e5c1b64b1d add the apple app store engine
The Apple App Store is the digital app distribution platform for iOS & iPadOS.
2022-08-24 17:27:36 +07:00
ta
040e24f9ad support playing videos directly 2022-08-24 16:48:31 +07:00
Markus Heiser
c2db7b2a66 [fix] Internal server error after changing UI language to BG
A placeholder has been translated to BG; the issue was introduced 8 months ago,
when the BG translation was added [1]

    msgid "Compute {functions} of the arguments"
    msgstr "Изчислете {функции} на аргументите"

The incorrect translation has been corrected here in the message files and on
weblate.

[1] https://weblate.bubu1.eu/translate/searxng/searxng/bg/?&offset=49#history
Closes: https://github.com/searxng/searxng/issues/1692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-23 08:11:16 +02:00
ta
79d06509c1 add tags as suggestions 2022-08-23 05:18:35 +07:00
ta
d22f469010 use invalid-name instead of C0103 for pylint 2022-08-22 18:27:35 +07:00
ta
dd9127492f add 9gag engine
9GAG is a social media website where users upload and share user-generated images and videos
2022-08-22 17:35:07 +07:00
ta
e64cca8c3f don't raise error when nothing was found 2022-08-22 17:04:29 +07:00
M Asenov
faa32d5773 fixed xpath selector for appropriate results 2022-08-21 20:08:00 +01:00
Alexandre Flament
5ed40af3ba
Merge pull request #1661 from liimee/eng-tw
Add twitter engine
2022-08-21 15:21:18 +02:00
Markus Heiser
ee3c5e7752
Merge pull request #1666 from return42/harden-get_engine_locales
[fix] typo in get_engine_locale
2022-08-21 08:22:29 +02:00
Markus Heiser
77a0f33819 [fix] engine duden - don't raise exception on empty result list
Duden expects a word in German, so with query "amazing" the site finds nothing
and responds with a 404:

    httpx.HTTPStatusError: Client error '404 Not Found' for url\
      'https://www.duden.de/suchen/dudenonline/amazing'

[1] https://github.com/searxng/searxng/issues/1543#issuecomment-1193317054

Suggested-by: @allendema [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-20 08:41:03 +02:00
Markus Heiser
6f28a69f12
Merge pull request #1677 from searxng/dependabot/pip/master/pygments-2.13.0
Bump pygments from 2.12.0 to 2.13.0
2022-08-19 10:21:22 +02:00
Markus Heiser
299635fb8b [build] /static 2022-08-19 10:01:25 +02:00
Markus Heiser
b08a779f2e make pygments.less
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-19 10:00:50 +02:00
searxng-bot
3478c0bc8b [translations] update from Weblate
2b94abf3 - 2022-08-13 - Markus Heiser <markus.heiser@darmarit.de>
249c92f8 - 2022-08-13 - gkkulik <gregorykkulik@gmail.com>
a331870c - 2022-08-12 - Markus Heiser <markus.heiser@darmarit.de>
5aca8ddc - 2022-08-17 - Markus Heiser <markus.heiser@darmarit.de>
6e7d76a0 - 2022-08-18 - Markus Heiser <markus.heiser@darmarit.de>
2a49e5f0 - 2022-08-15 - Markus Heiser <markus.heiser@darmarit.de>
2d2cafa6 - 2022-08-18 - Content Card <weblate-bubu1@gabg.email>
adcf97ed - 2022-08-15 - Markus Heiser <markus.heiser@darmarit.de>
2022-08-19 07:18:58 +00:00
ta
05851978cf add explanation of token 2022-08-17 19:45:42 +07:00
ta
c8acd4a3b6 add profile image to user results 2022-08-17 14:30:59 +07:00
ta
b6fd7cd571 add thumbnail to results if available 2022-08-17 14:25:22 +07:00
Markus Heiser
de1e7d12f7 [fix] get_engine_locale: better approximation of 'en' is 'en-US'
Compared to `en-EN` the better approximation of 'en' is 'en-US'.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 15:45:07 +02:00
Markus Heiser
ac7776663b [fix] typo in get_engine_locale
Due to a typo in get_engine_locale, a language selection like `!qw :de siemens`
did not work.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 14:35:09 +02:00
Markus Heiser
ef81d14ccf [fix] harden get_engine_locale: handle UnknownLocaleError exceptions
When a user selects an unknown or invalid locale by using the search syntax:

    !qw siemens :de-TW

Before this patch, an UnknownLocaleError exception is raised:

```
Traceback (most recent call last):
  File "SearXNG/searx/search/processors/online.py", line 154, in search
    search_results = self._search_basic(query, params)
  File "SearXNG/searx/search/processors/online.py", line 128, in _search_basic
    self.engine.request(query, params)
  File "SearXNG/searx/engines/qwant.py", line 98, in request
    q_locale = get_engine_locale(params['language'], supported_languages, default='en_US')
  File "SearXNG/searx/locales.py", line 216, in get_engine_locale
    locale = babel.Locale.parse(searxng_locale, sep='-')
  File "SearXNG/local/py3/lib/python3.8/site-packages/babel/core.py", line 330, in parse
    raise UnknownLocaleError(input_id)
```

This patch implements simple exception handling: since e.g. `de-TW` does not
exist, `de` will be used to get the engine's locale.  On invalid terms like `xy-XY`
the default will be returned.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 13:55:42 +02:00
Markus Heiser
27385e7898 [mod] qwant - add safesearch option
Closes: https://github.com/searxng/searxng/issues/1640
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 10:36:14 +02:00
Markus Heiser
6579d6d558 [fix] qwant - API error::locale must be one ..
The request function should not request a language (aka locale) that is not
supported by qwant. Selecting a locale like zh-TW ends in qwant's API error:

  ERROR searx.engines.qwant news: exception : \
  API error::locale must be one of the following values: \
    en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, \
    fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, \
    es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, \
    nl_be, nl_nl

The existing searx.utils.match_language function is unsuitable for this purpose,
it is replaced by function searx.locales.get_engine_locale that is based on the
methods from the babel package.

qwant's _fetch_supported_languages function has been revised to filter out
languages (aka locales) not supported by qwant.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 10:36:14 +02:00
Markus Heiser
9ae409a05a [mod] add locale.get_engine_locale to get predictable results
The match_language function sometimes returns incorrect results which is why a
new function get_engine_locale is required.

Fixing match_language is not easily possible, because there is almost
no documentation for it and even its call parameters are undefined.  E.g. the
function processes values like the ones from yahoo::

    "yahoo": [
        "ar",
        ...
        "zh_chs",
        "zh_cht"
     ]

The get_engine_locale has been documented in detail, there is a clear
description of the assumptions as well as the requirements and approximation
rules (read doc-string for more details)::

    Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to
    corresponding *engine locales*:

      <engine>: {
          # SearXNG string : engine-string
          'ca-ES'          : 'ca_ES',
          'fr-BE'          : 'fr_BE',
          'fr-CA'          : 'fr_CA',
          'fr-CH'          : 'fr_CH',
          'fr'             : 'fr_FR',
          ...
          'pl-PL'          : 'pl_PL',
          'pt-PT'          : 'pt_PT'
      }

    .. hint::

       The *SearXNG locale* string has to be known by babel!

In the following you will find a comparison:

>>> import babel.languages
>>> from searx.utils import match_language
>>> from searx.locales import get_engine_locale

Assume we have an engine that supports the following locales:

>>> lang_list = {
...     "zh-CN": "zh_CN",
...     "zh-HK": "zh_HK",
...     "nl-BE": "nl_BE",
...     "fr-CA": "fr_CA",
... }

Assumption:

  A. When a user selects a language the results should be optimized according to
     the selected language.

  B. When a user selects a language and a territory the results should be
     optimized with first priority on territory and second on language.

----

Example: (Assumption A.)

  A user selects region 'zh-TW' which should end in zh_HK

hint:
  CN is 'Hans' and HK ('Hant') fits better to TW ('Hant')

>>> get_engine_locale('zh-TW', lang_list)
'zh_HK'
>>> lang_list[match_language('zh-TW', lang_list)]
'zh_CN'

----

Example: (Assumption A.)

  A user selects only the language 'zh' which should end in CN

>>> get_engine_locale('zh', lang_list)
'zh_CN'
>>> lang_list[match_language('zh', lang_list)]
'zh_CN'

----

Example: (Assumption B.)

  A user selects region 'fr-BE' which should end in nl-BE

hint:
  priority should be on the territory the user selected.  If the user
  prefers 'fr' he will select 'fr' without a region tag.

>>> get_engine_locale('fr-BE', lang_list, default='unknown')
'nl_BE'
>>> match_language('fr-BE', lang_list, fallback='unknown')
'fr-CA'

----

Example: (Assumption A.)

  A user selects only the language 'fr' which should end in fr_CA

>>> get_engine_locale('fr', lang_list)
'fr_CA'
>>> lang_list[match_language('fr', lang_list)]
'fr_CA'

----

The difference in priority on the territory is best shown with an engine that
supports the following locales:

>>> lang_list = {
...     "fr-FR": "fr_FR",
...     "fr-CA": "fr_CA",
...     "en-GB": "en_GB",
...     "nl-BE": "nl_BE",
... }

----

Example: (Assumption A.)

   A user selects only a language

>>> get_engine_locale('en', lang_list)
'en_GB'
>>> match_language('en', lang_list)
'en-GB'

hint: the engine supports fr_FR and fr_CA; since no territory is given, fr_FR
takes priority.

>>> get_engine_locale('fr', lang_list)
'fr_FR'
>>> lang_list[match_language('fr', lang_list)]
'fr_FR'

----

Example: (Assumption B.)

  A user selects region 'fr-BE' which should end in nl-BE

>>> get_engine_locale('fr-BE', lang_list)
'nl_BE'
>>> lang_list[match_language('fr-BE', lang_list)]
'fr_FR'

----

If the user selects a language and there are two locales like the following:

>>> lang_list = {
...      "fr-BE": "fr_BE",
...      "fr-CH": "fr_CH",
...  }
>>>

>>> get_engine_locale('fr', lang_list)
'fr_BE'
>>> lang_list[match_language('fr', lang_list)]
'fr_BE'

Looks like both functions return the same value, but match_language depends on the
order of the dictionary (which is not predictable):

>>> lang_list = {
...      "fr-CH": "fr_CH",
...      "fr-BE": "fr_BE",
...  }
>>> get_engine_locale('fr', lang_list)
'fr_BE'
>>> lang_list[match_language('fr', lang_list)]
'fr_CH'
>>>

get_engine_locale selects the locale by looking at the "population percent",
and this percentage is higher in BE (68%) than in CH (21%).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 10:35:55 +02:00
Markus Heiser
75bb8c45d0 [mod] decouple qwant's categories from SearXNG's categories
By using new property `qwant_categ:` the category of qwant is no longer bound to
the category of SearXNG.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14 10:26:54 +02:00