datamade/election-transcriber

Name: election-transcriber

Owner: datamade

Description: :pencil2: Election Transcription Interface built in collaboration with National Democratic Institute

Created: 2015-02-06 16:28:29.0

Updated: 2017-12-08 20:18:17.0

Pushed: 2018-02-23 20:07:26.0

Homepage:

Size: 5398

Language: Java

GitHub Committers

User | Most Recent Commit | # Commits

Other Committers

User | Email | Most Recent Commit | # Commits

README

Election Transcriber

A tool for digitizing election results data in the form of handwritten digits.

Setup

The instructions below should get you set up with a development environment. To get going in production, follow the instructions in DEPLOYMENT.md.

  1. Install OS level dependencies:

     - Python 3.4+

  2. Clone this repo & install app requirements

     We recommend using virtualenv and virtualenvwrapper for working in a virtualized development environment. Read how to set up virtualenv.

     Once you have virtualenvwrapper set up:

     mkvirtualenv et
     git clone git@github.com:datamade/election-transcriber.git
     cd election-transcriber
     pip install -r requirements.txt

  3. Create a PostgreSQL database for election transcriber. If you aren't already running PostgreSQL, we recommend installing version 9.6 or later.

     createdb election_transcriber

  4. Create your own app_config.py file

     cp transcriber/app_config.py.example transcriber/app_config.py

     You will need to change, at minimum:

     - DB_USER and DB_PW to reflect your PostgreSQL username and password (by default, the username is your computer name and the password is '')
     - S3_BUCKET to tell the application where to look for your cache of images to transcribe
     - AWS_CREDENTIALS_PATH to tell the application where to find the CSV file with your AWS credentials. By default, the application looks for a file called credentials.csv in the root folder of the project.

     You can also change the username, email, and password for the initial user roles, defined by ADMIN_USER, MANAGER_USER, and CLERK_USER.

  5. Create your own alembic.ini file

     cp alembic.ini.example alembic.ini

     You will need to change, at minimum, user & pass (to reflect your PostgreSQL username/password) on line 6.

  6. Initialize the database

     alembic upgrade head

  7. Import images

     python update_images.py

  8. Run the app

     python runserver.py

  9. In another terminal, run the worker

     python run_queue.py
    

Once the server is running, navigate to http://localhost:5000/
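For reference, the app_config.py values described in the setup steps might look like the sketch below. The variable names DB_USER, DB_PW, S3_BUCKET, and AWS_CREDENTIALS_PATH come from the instructions above; every value is a placeholder to replace with your own settings, and the DB_CONN line is only an illustration of how such a config is typically consumed, not the app's actual code.

```python
# Hypothetical sketch of transcriber/app_config.py. Variable names come
# from the README; all values below are placeholders.

DB_USER = 'postgres'                      # your PostgreSQL username
DB_PW = ''                                # the default password is empty
S3_BUCKET = 'my-election-images'          # bucket holding images to transcribe
AWS_CREDENTIALS_PATH = 'credentials.csv'  # CSV with your AWS key ID/secret

# Connection string assembled the way a Flask/SQLAlchemy app typically
# does it -- an illustration, not necessarily this app's exact code.
DB_CONN = 'postgresql://{}:{}@localhost:5432/election_transcriber'.format(
    DB_USER, DB_PW)
```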

Syncing images between Google Drive and AWS

There is a script in the root folder of the project called syncDriveFolder.py. As the name suggests, it is responsible for syncing files from a Google Drive folder to an AWS S3 bucket.

Setup Google Service Account


The service account credentials file you download from Google looks something like this:

{
    "type": "service_account",
    "project_id": "[name of the project]",
    "private_key_id": "[long hash]",
    "private_key": "[very very long hash]",
    "client_email": "some-user@project-name.iam.gserviceaccount.com",
    "client_id": "[long number]",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://accounts.google.com/o/oauth2/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "[long URL]"
}

As Google notes when you download it, the contents of this file should be kept secret.
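Before running the sync, it can help to sanity-check the downloaded file. The sketch below is an optional helper using only the standard library; the required keys mirror the example above, and the function name check_service_account is my own, not part of the project.

```python
import json

# Minimal sanity check for a downloaded service-account credentials file.
# The key set mirrors the example JSON above; the default filename
# credentials.json matches the sync script's default.

REQUIRED_KEYS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "token_uri",
}

def check_service_account(path):
    """Return the parsed credentials, raising ValueError if a key is missing."""
    with open(path) as f:
        creds = json.load(f)
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError("credentials file is missing: {}".format(sorted(missing)))
    if creds.get("type") != "service_account":
        raise ValueError("expected a service_account credentials file")
    return creds
```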

Setup AWS User


The IAM user you create for the sync script should have a policy that grants access to your S3 bucket, along these lines:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1508430268000",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::[bucket_name]/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::[bucket_name]"
            ]
        }
    ]
}

To run the syncDriveFolder.py script, put the credentials file from Google and the credentials file from AWS in the root folder of the project, then run the script like so:

python syncDriveFolder.py -f [name_of_drive_folder] -n [name_of_election]

A full list of options for that script can be seen by running python syncDriveFolder.py --help.
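The AWS credentials CSV can be read with the standard library alone. In the sketch below, the column headers "Access key ID" and "Secret access key" are an assumption based on the CSV the AWS console typically generates when you create an IAM user; adjust them if your file differs. The helper read_aws_creds is illustrative, not part of the project.

```python
import csv

# Read the access key pair from an AWS-console-style credentials CSV,
# i.e. a header row followed by one data row.

def read_aws_creds(path):
    """Return (access_key_id, secret_access_key) from an AWS-style CSV."""
    with open(path, newline='') as f:
        row = next(csv.DictReader(f))
    return row['Access key ID'], row['Secret access key']
```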

usage: syncDriveFolder.py [-h] [--aws-creds AWS_CREDS]
                          [--google-creds GOOGLE_CREDS] -n ELECTION_NAME -f
                          DRIVE_FOLDER [--capture-hierarchy]

Sync and convert images from a Google Drive Folder to an S3 Bucket

optional arguments:
  -h, --help            show this help message and exit
  --aws-creds AWS_CREDS
                        Path to AWS credentials. (default:
                        /home/eric/code/election-transcriber/credentials.csv)
  --google-creds GOOGLE_CREDS
                        Path to Google credentials. (default:
                        /home/eric/code/election-transcriber/credentials.json)
  -n ELECTION_NAME, --election-name ELECTION_NAME
                        Short name to be used under the hood for the election
                        (default: None)
  -f DRIVE_FOLDER, --drive-folder DRIVE_FOLDER
                        Name of the Google Drive folder to sync (default:
                        None)
  --capture-hierarchy   Capture a geographical hierarchy from the name of the
                        file. (default: False)
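To illustrate what --capture-hierarchy implies, here is a purely hypothetical sketch of deriving a geographical hierarchy from a file name. The underscore-delimited naming scheme (region, then district, then polling station, then a serial number) is my assumption for the example, not the script's actual format.

```python
# Hypothetical: split a file name like 'region_district_station_0001.png'
# into its hierarchy levels, dropping the extension and trailing serial.

def hierarchy_from_filename(filename):
    """Return the list of hierarchy levels encoded in the file name."""
    stem = filename.rsplit('.', 1)[0]
    *levels, _serial = stem.split('_')
    return levels
```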

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.