Name: CrawlerManager
Owner: Transparency Toolkit
Description: API for calling crawlers
Created: 2015-11-20 21:24:22.0
Updated: 2017-10-23 01:18:27.0
Pushed: 2017-05-22 14:58:42.0
Homepage: null
Size: 85
Language: Ruby
API for running and managing crawlers and parsing results
CrawlerManager can be used in combination with the Harvester web interface to run queries and load results.
Install the required system dependencies:
sudo apt-get install sqlite3 libsqlite3-dev
Then clone the repository, install the gems, and set up the database:
git clone https://github.com/TransparencyToolkit/CrawlerManager
cd CrawlerManager
bundle install
rake db:create:all
rake db:reset
WARNING
Currently, for Harvester to save data, the directories /home/user/Data/KG/
and /home/user/Data/KG/All_Pics/
must both exist. This is kludgy and will be configurable soon!
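One way to create the two directories named in the warning is a short Ruby script. This is only a sketch: the helper name ensure_data_dirs is hypothetical and is not part of CrawlerManager, and the default base path is simply the one given above.

```ruby
require "fileutils"

# Hypothetical helper (not part of CrawlerManager): creates the data
# directory and its All_Pics subdirectory if they do not already exist.
# The default base path is the one named in the warning above.
def ensure_data_dirs(base = "/home/user/Data/KG")
  dirs = [base, File.join(base, "All_Pics")]
  # mkdir_p also creates missing parent directories and is a no-op
  # for directories that already exist.
  dirs.each { |dir| FileUtils.mkdir_p(dir) }
  dirs
end
```

Calling ensure_data_dirs with no argument creates the paths from the warning. Pass a different base path to try it out somewhere else first.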
Start the server on port 9506:
rails server -p 9506
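To confirm that the server is listening after startup, a plain TCP check on port 9506 is enough. The sketch below assumes the server runs on the local machine. The helper name port_open? is hypothetical, and the check only tests that the port accepts connections. It says nothing about CrawlerManager's routes.

```ruby
require "socket"

# Hypothetical helper: returns true if something accepts TCP connections
# on host:port within the timeout, false otherwise.
def port_open?(host, port, timeout = 2)
  # Socket.tcp closes the socket when the block returns.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, or host could not be resolved.
  false
end
```

For example, port_open?("localhost", 9506) should return true once rails server -p 9506 has finished booting.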
To use proxies, set the environment variable PROXYLIST to the path of the proxy list you want to use.
To solve CAPTCHAs, set the environment variable SOLVERDETAILS to your 2Captcha key.
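Before starting the server, it can help to check that both variables are set as intended. The sketch below only inspects the environment. The helper name crawler_env_settings is hypothetical, and how CrawlerManager itself reads these variables (or what format the proxy list file uses) is not shown here.

```ruby
# Hypothetical helper: reports whether the optional settings described
# above are present. Accepts any Hash-like object so it can be tried
# without touching the real environment.
def crawler_env_settings(env = ENV)
  proxy_path = env["PROXYLIST"]
  solver_key = env["SOLVERDETAILS"]
  {
    proxylist: proxy_path,
    # True only if PROXYLIST is set and points at an existing regular file.
    proxylist_readable: !proxy_path.nil? && File.file?(proxy_path),
    # True only if SOLVERDETAILS is set to a non-empty value.
    captcha_key_set: !(solver_key.nil? || solver_key.empty?)
  }
end
```

Running puts crawler_env_settings.inspect in the same shell used to launch the server shows which of the two settings that shell will pass along.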