forked from Ponysearch/Ponysearch
313 lines
24 KiB
ReStructuredText
313 lines
24 KiB
ReStructuredText
Engine overview
|
|
===============
|
|
|
|
|
|
searx is a `metasearch-engine <https://en.wikipedia.org/wiki/Metasearch_engine>`__,
|
|
so it uses different search engines to provide better results.
|
|
|
|
Because there is no general search API which could be used for every
|
|
search engine, an adapter has to be built between searx and the
|
|
external search engines. Adapters are stored under the folder
|
|
`searx/engines
|
|
<https://github.com/asciimoo/searx/tree/master/searx/engines>`__.
|
|
|
|
|
|
.. contents::
|
|
:depth: 3
|
|
|
|
general engine configuration
|
|
----------------------------
|
|
|
|
It is required to tell searx the type of results the engine provides. The
|
|
arguments can be set in the engine file or in the settings file
|
|
(normally ``settings.yml``). The arguments in the settings file override
|
|
the ones in the engine file.
|
|
|
|
It does not matter if an option is stored in the engine file or in the
|
|
settings. However, the standard way is the following:
|
|
|
|
|
|
engine file
|
|
~~~~~~~~~~~
|
|
|
|
+----------------------+-----------+-----------------------------------------+
|
|
| argument | type | information |
|
|
+======================+===========+=========================================+
|
|
| categories | list | pages, in which the engine is working |
|
|
+----------------------+-----------+-----------------------------------------+
|
|
| paging | boolean | support multible pages |
|
|
+----------------------+-----------+-----------------------------------------+
|
|
| language\_support | boolean | support language choosing |
|
|
+----------------------+-----------+-----------------------------------------+
|
|
| time\_range\_support | boolean | support search time range |
|
|
+----------------------+-----------+-----------------------------------------+
|
|
|
|
settings.yml
|
|
~~~~~~~~~~~~
|
|
|
|
+------------+----------+-----------------------------------------------+
|
|
| argument | type | information |
|
|
+============+==========+===============================================+
|
|
| name | string | name of search-engine |
|
|
+------------+----------+-----------------------------------------------+
|
|
| engine | string | name of searx-engine (filename without .py) |
|
|
+------------+----------+-----------------------------------------------+
|
|
| shortcut | string | shortcut of search-engine |
|
|
+------------+----------+-----------------------------------------------+
|
|
| timeout | string | specific timeout for search-engine |
|
|
+------------+----------+-----------------------------------------------+
|
|
|
|
overrides
|
|
~~~~~~~~~
|
|
|
|
A few of the options have default values in the engine, but are
|
|
often overwritten by the settings. If ``None`` is assigned to an option
|
|
in the engine file, it has to be redefined in the settings,
|
|
otherwise searx will not start with that engine.
|
|
|
|
The naming of overrides is arbitrary. But the recommended
|
|
overrides are the following:
|
|
|
|
+-----------------------+----------+----------------------------------------------------------------+
|
|
| argument | type | information |
|
|
+=======================+==========+================================================================+
|
|
| base\_url | string | base-url, can be overwritten to use same engine on other URL |
|
|
+-----------------------+----------+----------------------------------------------------------------+
|
|
| number\_of\_results | int | maximum number of results per request |
|
|
+-----------------------+----------+----------------------------------------------------------------+
|
|
| language | string | ISO code of language and country like en\_US |
|
|
+-----------------------+----------+----------------------------------------------------------------+
|
|
| api\_key | string | api-key if required by engine |
|
|
+-----------------------+----------+----------------------------------------------------------------+
|
|
|
|
example code
|
|
~~~~~~~~~~~~
|
|
|
|
.. code:: python
|
|
|
|
# engine dependent config
|
|
categories = ['general']
|
|
paging = True
|
|
language_support = True
|
|
|
|
making a request
|
|
----------------
|
|
|
|
To perform a search an URL have to be specified. In addition to
|
|
specifying an URL, arguments can be passed to the query.
|
|
|
|
passed arguments
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
These arguments can be used to construct the search query. Furthermore,
|
|
parameters with default value can be redefined for special purposes.
|
|
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| argument | type | default-value, information |
|
|
+======================+============+========================================================================+
|
|
| url | string | ``''`` |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| method | string | ``'GET'`` |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| headers | set | ``{}`` |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| data | set | ``{}`` |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| cookies | set | ``{}`` |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| verify | boolean | ``True`` |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| headers.User-Agent | string | a random User-Agent |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| category | string | current category, like ``'general'`` |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| started | datetime | current date-time |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| pageno | int | current pagenumber |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
| language | string | specific language code like ``'en_US'``, or ``'all'`` if unspecified |
|
|
+----------------------+------------+------------------------------------------------------------------------+
|
|
|
|
parsed arguments
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
The function ``def request(query, params):`` always returns the
|
|
``params`` variable. Inside searx, the following paramters can be
|
|
used to specify a search request:
|
|
|
|
+------------+-----------+---------------------------------------------------------+
|
|
| argument | type | information |
|
|
+============+===========+=========================================================+
|
|
| url | string | requested url |
|
|
+------------+-----------+---------------------------------------------------------+
|
|
| method | string | HTTP request method |
|
|
+------------+-----------+---------------------------------------------------------+
|
|
| headers | set | HTTP header information |
|
|
+------------+-----------+---------------------------------------------------------+
|
|
| data | set | HTTP data information (parsed if ``method != 'GET'``) |
|
|
+------------+-----------+---------------------------------------------------------+
|
|
| cookies | set | HTTP cookies |
|
|
+------------+-----------+---------------------------------------------------------+
|
|
| verify | boolean | Performing SSL-Validity check |
|
|
+------------+-----------+---------------------------------------------------------+
|
|
|
|
example code
|
|
~~~~~~~~~~~~
|
|
|
|
.. code:: python
|
|
|
|
# search-url
|
|
base_url = 'https://example.com/'
|
|
search_string = 'search?{query}&page={page}'
|
|
|
|
# do search-request
|
|
def request(query, params):
|
|
search_path = search_string.format(
|
|
query=urlencode({'q': query}),
|
|
page=params['pageno'])
|
|
|
|
params['url'] = base_url + search_path
|
|
|
|
return params
|
|
|
|
returned results
|
|
----------------
|
|
|
|
Searx is able to return results of different media-types.
|
|
Currently the following media-types are supported:
|
|
|
|
- default
|
|
- images
|
|
- videos
|
|
- torrent
|
|
- map
|
|
|
|
To set another media-type as default, the parameter
|
|
``template`` must be set to the desired type.
|
|
|
|
default
|
|
~~~~~~~
|
|
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------+
|
|
| result-parameter | information |
|
|
+====================+===============================================================================================================+
|
|
| url | string, url of the result |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------+
|
|
| title | string, title of the result |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------+
|
|
| content | string, general result-text |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------+
|
|
| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------+
|
|
|
|
images
|
|
~~~~~~
|
|
|
|
to use this template, the parameter
|
|
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| result-parameter | information |
|
|
+====================+=======================================================================================================================================+
|
|
| template | is set to ``images.html`` |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| url | string, url to the result site |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| title | string, title of the result *(partly implemented)* |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| content | *(partly implemented)* |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish *(partly implemented)* |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| img\_src | string, url to the result image |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| thumbnail\_src | string, url to a small-preview image |
|
|
+--------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
|
|
videos
|
|
~~~~~~
|
|
|
|
+--------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| result-parameter | information |
|
|
+====================+==============================================================================================================+
|
|
| template | is set to ``videos.html`` |
|
|
+--------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| url | string, url of the result |
|
|
+--------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| title | string, title of the result |
|
|
+--------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| content | *(not implemented yet)* |
|
|
+--------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish |
|
|
+--------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| thumbnail | string, url to a small-preview image |
|
|
+--------------------+--------------------------------------------------------------------------------------------------------------+
|
|
|
|
torrent
|
|
~~~~~~~
|
|
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| result-parameter | information |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| template | is set to ``torrent.html`` |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| url | string, url of the result |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| title | string, title of the result |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| content | string, general result-text |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish *(not implemented yet)* |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| seed | int, number of seeder |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| leech | int, number of leecher |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| filesize | int, size of file in bytes |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| files | int, number of files |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| magnetlink | string, `magnetlink <https://en.wikipedia.org/wiki/Magnet_URI_scheme>`__ of the result |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
| torrentfile | string, torrentfile of the result |
|
|
+------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
|
|
|
|
|
map
|
|
~~~
|
|
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| result-parameter | information |
|
|
+=========================+==============================================================================================================+
|
|
| url | string, url of the result |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| title | string, title of the result |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| content | string, general result-text |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| publishedDate | `datetime.datetime <https://docs.python.org/2/library/datetime.html#datetime-objects>`__, time of publish |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| latitude | latitude of result (in decimal format) |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| longitude | longitude of result (in decimal format) |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| boundingbox | boundingbox of result (array of 4. values ``[lat-min, lat-max, lon-min, lon-max]``) |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| geojson | geojson of result (http://geojson.org) |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| osm.type | type of osm-object (if OSM-Result) |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| osm.id | id of osm-object (if OSM-Result) |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| address.name | name of object |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| address.road | street name of object |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| address.house\_number | house number of object |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| address.locality | city, place of object |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| address.postcode | postcode of object |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
| address.country | country of object |
|
|
+-------------------------+--------------------------------------------------------------------------------------------------------------+
|
|
|