Merge pull request #109 from return42/sql-engines

Merge MySQL and PostgreSQL from searx
This commit is contained in:
Alexandre Flament 2021-05-29 08:34:19 +02:00 committed by GitHub
commit e239369f1f
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 307 additions and 0 deletions

View file

@ -13,3 +13,4 @@ Blog
private-engines private-engines
command-line-engines command-line-engines
search-indexer-engines search-indexer-engines
sql-engines

160
docs/blog/sql-engines.rst Normal file
View file

@ -0,0 +1,160 @@
===========
SQL engines
===========
.. sidebar:: further read
- `SQLite <https://www.sqlite.org/index.html>`_
- `PostgreSQL <https://www.postgresql.org>`_
- `MySQL <https://www.mysql.com>`_
With the *SQL engines* you can bind SQL databases into SearxNG. The following
Relational Database Management System (RDBMS) are supported:
- :ref:`engine sqlite`
- :ref:`engine postgresql`
- :ref:`engine mysql_server`
All of the engines above are just commented out in the :origin:`settings.yml
<searx/settings.yml>`, as you have to set the required attributes for the
engines, e.g. ``database:`` ...
.. code:: yaml
- name: ...
engine: {sqlite|postgresql|mysql_server}
database: ...
result_template: {template_name}
query_str: ...
By default, the engines use the ``key-value`` template for displaying results /
see :origin:`oscar <searx/templates/oscar/result_templates/key-value.html>` &
:origin:`simple <searx/templates/simple/result_templates/key-value.html>`
themes. If you are not satisfied with the original result layout, you can use
your own template, set ``result_template`` attribute to ``{template_name}`` and
place the templates at::
searx/templates/{theme_name}/result_templates/{template_name}
As mentioned in previous blog posts, if you do not wish to expose these engines
on a public instance, you can still add them and limit the access by setting
``tokens`` as described in section :ref:`private engines`.
Configure the engines
=====================
The configuration of the new database engines are similar. You must put a valid
SQL-SELECT query in ``query_str``. At the moment you can only bind at most one
parameter in your query. By setting the attribute ``limit`` you can define how
many results you want from the SQL server. Basically, it is the same as the
``LIMIT`` keyword in SQL.
Please, do not include ``LIMIT`` or ``OFFSET`` in your SQL query as the engines
rely on these keywords during paging. If you want to configure the number of
returned results use the option ``limit``.
.. _engine sqlite:
SQLite
------
.. _MediathekView: https://mediathekview.de/
SQLite is a small, fast and reliable SQL database engine. It does not require
any extra dependency. To demonstrate the power of database engines, here is a
more complex example which reads from a MediathekView_ (DE) movie database. For
this example of the SQlite engine download the database:
- https://liste.mediathekview.de/filmliste-v2.db.bz2
and unpack into ``searx/data/filmliste-v2.db``. To search the database use e.g
Query to test: ``!mediathekview concert``
.. code:: yaml
- name: mediathekview
engine: sqlite
disabled: False
categories: general
result_template: default.html
database: searx/data/filmliste-v2.db
query_str: >-
SELECT title || ' (' || time(duration, 'unixepoch') || ')' AS title,
COALESCE( NULLIF(url_video_hd,''), NULLIF(url_video_sd,''), url_video) AS url,
description AS content
FROM film
WHERE title LIKE :wildcard OR description LIKE :wildcard
ORDER BY duration DESC
Extra Dependencies
------------------
For using :ref:`engine postgresql` or :ref:`engine mysql_server` you need to
install additional packages in Python's Virtual Environment of your SearxNG
instance. To switch into the environment (:ref:`searx-src`) you can use
:ref:`searx.sh`::
$ sudo utils/searx.sh shell
(searx-pyenv)$ pip install ...
.. _engine postgresql:
PostgreSQL
----------
.. _psycopg2: https://www.psycopg.org/install
.. sidebar:: requirements
``pip install`` psycopg2_
PostgreSQL is a powerful and robust open source database. Before configuring
the PostgreSQL engine, you must install the dependency ``psychopg2``. You can
find an example configuration below:
.. code:: yaml
- name: my_database
engine: postgresql
database: my_database
username: searx
password: password
query_str: 'SELECT * from my_table WHERE my_column = %(query)s'
.. _engine mysql_server:
MySQL
-----
.. _mysql-connector-python: https://pypi.org/project/mysql-connector-python
.. sidebar:: requirements
``pip install`` mysql-connector-python_
MySQL is said to be the most popular open source database. Before enabling MySQL
engine, you must install the package ``mysql-connector-python``.
The authentication plugin is configurable by setting ``auth_plugin`` in the
attributes. By default it is set to ``caching_sha2_password``. This is an
example configuration for quering a MySQL server:
.. code:: yaml
- name: my_database
engine: mysql_server
database: my_database
username: searx
password: password
limit: 5
query_str: 'SELECT * from my_table WHERE my_column=%(query)s'
Acknowledgement
===============
This development was sponsored by `Search and Discovery Fund
<https://nlnet.nl/discovery>`_ of `NLnet Foundation <https://nlnet.nl/>`_ .

View file

@ -0,0 +1,57 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# lint: pylint
# pylint: disable=missing-function-docstring
"""MySQL database (offline)
"""
# import error is ignored because the admin has to install mysql manually to use
# the engine
import mysql.connector # pylint: disable=import-error
engine_type = 'offline'
auth_plugin = 'caching_sha2_password'
host = "127.0.0.1"
database = ""
username = ""
password = ""
query_str = ""
limit = 10
paging = True
result_template = 'key-value.html'
_connection = None
def init(engine_settings):
global _connection # pylint: disable=global-statement
if 'query_str' not in engine_settings:
raise ValueError('query_str cannot be empty')
if not engine_settings['query_str'].lower().startswith('select '):
raise ValueError('only SELECT query is supported')
_connection = mysql.connector.connect(
database = database,
user = username,
password = password,
host = host,
auth_plugin=auth_plugin,
)
def search(query, params):
query_params = {'query': query}
query_to_run = query_str + ' LIMIT {0} OFFSET {1}'.format(limit, (params['pageno'] - 1) * limit)
with _connection.cursor() as cur:
cur.execute(query_to_run, query_params)
return _fetch_results(cur)
def _fetch_results(cur):
results = []
for res in cur:
result = dict(zip(cur.column_names, map(str, res)))
result['template'] = result_template
results.append(result)
return results

View file

@ -0,0 +1,69 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
# lint: pylint
# pylint: disable=missing-function-docstring
"""PostgreSQL database (offline)
"""
# import error is ignored because the admin has to install mysql manually to use
# the engine
import psycopg2 # pylint: disable=import-error
engine_type = 'offline'
host = "127.0.0.1"
port = "5432"
database = ""
username = ""
password = ""
query_str = ""
limit = 10
paging = True
result_template = 'key-value.html'
_connection = None
def init(engine_settings):
global _connection # pylint: disable=global-statement
if 'query_str' not in engine_settings:
raise ValueError('query_str cannot be empty')
if not engine_settings['query_str'].lower().startswith('select '):
raise ValueError('only SELECT query is supported')
_connection = psycopg2.connect(
database = database,
user = username,
password = password,
host = host,
port = port,
)
def search(query, params):
query_params = {'query': query}
query_to_run = (
query_str
+ ' LIMIT {0} OFFSET {1}'.format(limit, (params['pageno'] - 1) * limit)
)
with _connection:
with _connection.cursor() as cur:
cur.execute(query_to_run, query_params)
return _fetch_results(cur)
def _fetch_results(cur):
results = []
titles = []
try:
titles = [column_desc.name for column_desc in cur.description]
for res in cur:
result = dict(zip(titles, map(str, res)))
result['template'] = result_template
results.append(result)
# no results to fetch
except psycopg2.ProgrammingError:
pass
return results

View file

@ -871,6 +871,16 @@ engines:
url: https://thepiratebay.org/ url: https://thepiratebay.org/
timeout : 3.0 timeout : 3.0
# Required dependency: psychopg2
# - name : postgresql
# engine : postgresql
# database : postgres
# username : postgres
# password : postgres
# limit : 10
# query_str : 'SELECT * from my_table WHERE my_column = %(query)s'
# shortcut : psql
- name : pubmed - name : pubmed
engine : pubmed engine : pubmed
shortcut : pub shortcut : pub
@ -1252,6 +1262,16 @@ engines:
# See : http://mymemory.translated.net/doc/usagelimits.php # See : http://mymemory.translated.net/doc/usagelimits.php
# api_key : '' # api_key : ''
# Required dependency: mysql-connector-python
# - name : mysql
# engine : mysql_server
# database : mydatabase
# username : user
# password : pass
# limit : 10
# query_str : 'SELECT * from mytable WHERE fieldname=%(query)s'
# shortcut : mysql
- name : 1337x - name : 1337x
engine : 1337x engine : 1337x
shortcut : 1337x shortcut : 1337x