datamade/scrapers_ca_app

Name: scrapers_ca_app

Owner: datamade

Description: Canadian legislative scrapers Django app

Created: 2015-09-17 13:56:21.0

Updated: 2016-05-04 20:47:24.0

Pushed: 2015-09-15 19:14:57.0

Homepage: https://scrapers.herokuapp.com/

Size: 4977

Language: Python

README

Canadian Legislative Scrapers

This Django project runs the Canadian legislative scrapers, displays the status of each scraper, and returns the scraped data as JSON.

Development

Follow the instructions in the Python Quick Start Guide to install Homebrew, Git, PostgreSQL, Python and virtualenv.

mkvirtualenv scrapers_ca_app
git clone git@github.com:opennorth/scrapers_ca_app.git
cd scrapers_ca_app

Set up the submodule and switch it to master:

git submodule init
git submodule update
cd scrapers
git checkout master
cd ..

Install the requirements:

pip install -r requirements.txt

Create a database (run dropdb pupa first only if the database already exists):

dropdb pupa
createdb pupa
python manage.py syncdb --noinput

Run all the scrapers:

python manage.py update

Or run specific scrapers:

python manage.py update ca_ab_edmonton ca_ab_grande_prairie_county_no_1

Install the foreman gem:

gem install foreman

Start the web app:

foreman start

Deployment

Add configuration variables (replace REPLACE):

heroku config:set PRODUCTION=1
heroku config:set AWS_ACCESS_KEY_ID=REPLACE
heroku config:set AWS_SECRET_ACCESS_KEY=REPLACE
heroku config:set DJANGO_SECRET_KEY=REPLACE
heroku config:set DATABASE_URL=`heroku config:get REPLACE`
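As a rough illustration of how these variables might be consumed on the Python side, the sketch below reads them from the environment and splits DATABASE_URL into its parts. The function name read_config and the returned dictionary layout are hypothetical and are not taken from this project's settings module; treating any non-empty PRODUCTION value as "on" is also an assumption.

```python
import os
from urllib.parse import urlparse

# Variables set with `heroku config:set` above (PRODUCTION is handled separately).
REQUIRED = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "DJANGO_SECRET_KEY",
    "DATABASE_URL",
)


def read_config(environ=os.environ):
    """Hypothetical helper: collect deployment settings from the environment."""
    # Assumption: any non-empty PRODUCTION value switches production mode on.
    production = bool(environ.get("PRODUCTION"))

    # Fail early, naming every variable that is missing or empty.
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing configuration variables: " + ", ".join(missing))

    # Split a URL such as postgres://user:password@host:5432/name into parts.
    url = urlparse(environ["DATABASE_URL"])
    database = {
        "NAME": url.path.lstrip("/"),
        "USER": url.username,
        "PASSWORD": url.password,
        "HOST": url.hostname,
        "PORT": url.port,
    }

    return {
        "production": production,
        "secret_key": environ["DJANGO_SECRET_KEY"],
        "aws_access_key_id": environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": environ["AWS_SECRET_ACCESS_KEY"],
        "database": database,
    }
```

Raising on a missing variable, rather than falling back to a default, makes a misconfigured deployment fail at startup instead of at the first database or AWS call.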

You can generate a secret key in Python:

from django.utils.crypto import get_random_string
get_random_string(50, 'abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)')
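If Django is not importable where you generate the key, the same kind of value can be produced with the standard library alone. This is a minimal sketch, not the project's own tooling: the function name generate_secret_key is hypothetical, and the alphabet and length simply mirror the call above.

```python
import secrets

# Same character set and length as the Django call above.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def generate_secret_key(length=50, alphabet=ALPHABET):
    """Hypothetical helper: build a random key from a cryptographically secure source."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_secret_key())
```

The key can contain characters such as !, $, & and ( that the shell interprets, so wrap it in single quotes when passing it to heroku config:set.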

You'll need a production tier PostgreSQL database to use PostGIS (replace DATABASE):

heroku addons:add heroku-postgresql:standard-0
heroku pg:wait
heroku pg:promote DATABASE
heroku addons:remove heroku-postgresql:dev
heroku pg:psql

In the PostgreSQL shell, run:

CREATE EXTENSION postgis;

You'll need the geo buildpack for GeoDjango:

heroku config:add BUILDPACK_URL=https://github.com/ddollar/heroku-buildpack-multi.git

Set up the database (replace DATABASE):

heroku pg:reset DATABASE
heroku run pupa dbinit ca
heroku run python manage.py migrate --noinput

Add python manage.py update to the Heroku Scheduler.

Checking consistency

python manage.py check

Eliminating duplicates

If a scraper creates duplicates, you may need to:

python manage.py flush MODULE_NAME

Troubleshooting

Bugs? Questions?

This repository is on GitHub: https://github.com/opennorth/scrapers_ca_app, where your contributions, forks, bug reports, feature requests, and feedback are greatly welcomed.

Copyright (c) 2013 Open North Inc., released under the MIT license
