Ponysearch/searx/engines/wolframalpha_noapi.py

# WolframAlpha (Maths)
#
# @website     http://www.wolframalpha.com/
# @provide-api yes (http://api.wolframalpha.com/v2/)
#
# @using-api   no
# @results     HTML
# @stable      no
# @parse       answer

from re import search, sub
from json import loads
from urllib import urlencode
from lxml import html
import HTMLParser

# search-url
url = 'http://www.wolframalpha.com/'
search_url = url+'input/?{query}'

# xpath variables
scripts_xpath = '//script'
title_xpath = '//title'
failure_xpath = '//p[attribute::class="pfail"]'


# do search-request
def request(query, params):
    params['url'] = search_url.format(query=urlencode({'i': query}))

    return params


# get response from search-request
def response(resp):
    results = []
    line = None

    dom = html.fromstring(resp.text)
    scripts = dom.xpath(scripts_xpath)

    # the answer is inside a js function
    # answer can be located in different 'pods', although by default it should be in pod_0200
    # raw strings keep the regex escapes (\.) from being read as string escapes
    possible_locations = [r'pod_0200\.push(.*)\n',
                          r'pod_0100\.push(.*)\n']

    # failed result
    if dom.xpath(failure_xpath):
        return results

    # get line that matches the pattern
    for pattern in possible_locations:
        for script in scripts:
            try:
                line = search(pattern, script.text_content()).group(1)
                break
            except AttributeError:
                continue
        if line:
            break

    if line:
        # extract answer from json
        answer = line[line.find('{'):line.rfind('}')+1]
        answer = loads(answer.encode('unicode-escape'))
        answer = answer['stringified']

        # clean plaintext answer
        h = HTMLParser.HTMLParser()
        answer = h.unescape(answer.decode('unicode-escape'))
        answer = sub(r'\\', '', answer)

        results.append({'answer': answer})

    # user input is in first part of title;
    # the slice below drops the trailing 16-character ' - Wolfram|Alpha' suffix
    title = dom.xpath(title_xpath)[0].text.encode('utf-8')
    result_url = request(title[:-16], {})['url']

    # append result
    results.append({'url': result_url,
                    'title': title.decode('utf-8')})

    return results
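
The pod-extraction step in `response()` can be tried in isolation. The following is a minimal sketch, not the engine's code: it is written for Python 3, the names `SAMPLE_SCRIPT`, `POD_PATTERNS` and `extract_answer` are hypothetical, and the sample script text is an assumed shape of the inline JavaScript rather than real Wolfram|Alpha markup. It also skips the `unicode-escape` round trip and the backslash stripping that the engine performs.

```python
import json
import re
from html import unescape

# Assumed shape of the inline script; the real page markup may differ.
SAMPLE_SCRIPT = (
    'context.jsonArray.popups.pod_0200.push( '
    '{"stringified": "2 &gt; 1", "mInput": "", "mOutput": ""});\n'
)

# Same pod order as the engine: pod_0200 first, then pod_0100.
POD_PATTERNS = [r'pod_0200\.push(.*)\n', r'pod_0100\.push(.*)\n']


def extract_answer(script_text):
    """Return the plaintext answer from the first matching pod, or None."""
    for pattern in POD_PATTERNS:
        match = re.search(pattern, script_text)
        if match is None:
            continue
        line = match.group(1)
        # The JSON object sits between the first '{' and the last '}'.
        payload = line[line.find('{'):line.rfind('}') + 1]
        # Decode the JSON, then turn HTML entities such as &gt; into text.
        return unescape(json.loads(payload)['stringified'])
    return None


print(extract_answer(SAMPLE_SCRIPT))
```

Checking `match` against `None` replaces the `try`/`except AttributeError` used in the engine; both skip scripts that do not contain the pod.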