Commit graph

41 commits

Author SHA1 Message Date
Markus Heiser
0ef6aa5126 [docs] add documentation from the sources of the google engines
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-21 18:25:52 +02:00
Markus Heiser
9328c66e93 [fix] google news - send CONSENT Cookie to not be redirected
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking.  To get the consent
from the user, google-news requests are redirected to confirm and get a CONSENT
Cookie from https://consent.google.de/s?continue=...

This patch adds a CONSENT Cookie to the google-news request to avoid
redirection.

The behavior of the CONTENTS cookies over all google engines seems similar but
the pattern is not yet fully clear to me, here are some random samples from my
analysis ..

Using common google search from different domains::

    google.com:        CONSENT=YES+cb.{{date}}-14-p0.de+FX+816
    google.de:         CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    google.fr:         CONSENT=YES+srp.gws-{{date}}-0-RC2.fr+FX+826

When searching about videos (google-videos)::

    google.es:         CONSENT=YES+srp.gws-{{date}}-0-RC2.es+FX+076
    google.de:         CONSENT=YES+srp.gws-{{date}}-0-RC2.de+FX+171

Google news has only one domain for all languages::

    news.google.com:   CONSENT=YES+cb.{{date}}-14-p0.de+FX+816

Using google-scholar search from different domains::

    scholar.google.de: CONSENT=YES+cb.{{date}}-14-p0.de+FX+333
    scholar.google.fr: does not use such a cookie / did not ask the user
    scholar.google.es: does not use such a cookie / did not ask the user

Interim summary:

  Pattern is unclear and I won't apply the CONSENT cookie to all google engines.
  More experience is need before we generalize the CONSENT cookies over all
  google engines.

Related:

- e9a6ab401 [fix] youtube - send CONSENT Cookie to not be redirected
- https://github.com/benbusby/whoogle-search/issues/311
- https://github.com/benbusby/whoogle-search/issues/243

[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-18 13:21:20 +02:00
Markus Heiser
dd7b53d369 [fix] google-news engine - KeyError: 'hl in request
Since we added

- 1c67b6aec [enh] google engine: supports "default language"

there is a KeyError: 'hl in request,error pattern::

    ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
    Traceback (most recent call last):
      File "searx/search/processors/online.py", line 144, in search
        search_results = self._search_basic(query, params)
      File "searx/search/processors/online.py", line 118, in _search_basic
        self.engine.request(query, params)
      File "searx/engines/google_news.py", line 97, in request
        if lang_info['hl'] == 'en':
      KeyError: 'hl'

Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-18 11:34:11 +02:00
Markus Heiser
2ac3e5b20b [fix] log messages from: google- images, news, scholar, videos
- HTTP header Accept-Language --> lang_info['headers']['Accept-Language']
- remove obsolete query_url log messages which is already logged by
  httpx._client:HTTP request

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-06-11 16:31:50 +02:00
Alexandre Flament
1c67b6aece [enh] google engine: supports "default language"
Same behaviour behaviour than Whoogle [1].  Only the google engine with the
"Default language" choice "(all)"" is changed by this patch.

When searching for a locate place, the result are in the expect language,
without missing results [2]:

  > When a language is not specified, the language interpretation is left up to
  > Google to decide how the search results should be delivered.

The query parameters are copied from Whoogle.  With the ``all`` language:

- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add a ``Accept-Language`` HTTP header.

The new signature of function ``get_lang_info()`` is:

    lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)

Argument ``supported_any_language`` is True for google.py and False for the other
google engines.  With this patch the function now returns:

- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
  - ``lang_info['subdomain']``
  - ``lang_info['country']``
  - ``lang_info['language']``

[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
2021-06-10 10:22:01 +02:00
Markus Heiser
dc29f1d826 [pylint] tag PYLINT_FILES by comment # lint: pylint
These py files are linted by `test.pylint`, all other files are linted by
`test.pep8`.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-26 20:18:20 +02:00
Alexandre Flament
ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Markus Heiser
b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Alexandre Flament
8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
Markus Heiser
5f92dfcdbe [fix] google-news: query uses locale without country tag
Wthout country-region tag google will redirect to correct the contry tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
Markus Heiser
baec54c492 [fix] revise of the google-news engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Alexandre Flament
a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
Alexandre Flament
b00d108673 [mod] pylint: numerous minor code fixes 2020-12-01 15:21:19 +01:00
Alexandre Flament
3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Dalf
1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Markus Heiser
c89c05bceb bugfix: google-news and bing-news has changed the language parameter
closes: https://github.com/asciimoo/searx/issues/1838

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-25 18:44:28 +01:00
Noémi Ványi
b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Marc Abonce Seguin
5568f24d6c [fix] check language aliases when setting search language 2019-01-06 20:31:57 -06:00
Marc Abonce Seguin
835d1edd58 [fix] google news xpath 2018-04-08 20:56:05 -05:00
Marc Abonce Seguin
772c048d01 refactor engine's search language handling
Add match_language function in utils to match any user given
language code with a list of engine's supported languages.

Also add language_aliases dict on each engine to translate
standard language codes into the custom codes used by the engine.
2018-03-27 00:08:03 -06:00
marc
4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
misnyo
3182ba7069 [fix] google news dom xpath fix 2017-08-31 17:48:07 +02:00
Adam Tauber
52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
Adam Tauber
108392f8da [fix] skip non-complete google news results 2017-01-10 11:03:05 +01:00
Adam Tauber
8bff42f049 Merge branch 'master' into languages 2016-12-28 20:00:53 +01:00
Adam Tauber
0171db5c3f [fix] handle missing images in google news 2016-12-23 12:59:52 +01:00
marc
af35eee10b tests for _fetch_supported_languages in engines
and refactor method to make it testable without making requests
2016-12-15 00:40:21 -06:00
marc
f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc
149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
Noémi Ványi
c59c76e6ee add year to time range to engines which support "Last year"
Engines:
 * Bing images
 * Flickr (noapi)
 * Google
 * Google Images
 * Google News
2016-12-11 16:58:31 +01:00
Adam Tauber
a2c94895c1 [fix] google news engine change follow-up 2016-12-11 01:03:52 +01:00
Alexandre Flament
4689fe341c update versions.cfg to use the current up-to-date packages 2015-05-02 15:45:17 +02:00
Cqoicebordel
b7dc1fb9d5 Google news' unit test 2015-01-31 16:38:03 +01:00
dalf
7c13d630e4 [fix] pep8 : engines (errors E121, E127, E128 and E501 still exist) 2014-12-07 16:37:56 +01:00
Thomas Pointhuber
144f89bf78 add comments to google-engines 2014-09-01 15:10:05 +02:00
Adam Tauber
ce08abe223 [fix] remove unused imports ++ pep8 2014-03-18 19:24:01 +01:00
Thomas Pointhuber
337bd6d907 simplify datetime extraction 2014-03-18 13:19:50 +01:00
Adam Tauber
b735093564 [fix] pep8 2014-03-15 20:20:41 +01:00
Thomas Pointhuber
b88146d669 showing publishedDate for news 2014-03-14 09:55:11 +01:00
Adam Tauber
98b6313d5d [fix] pep8 2014-03-04 14:20:29 +01:00
Thomas Pointhuber
0549f8c7c8 Create google_news.py 2014-03-04 13:11:04 +01:00