motherjones/judicial-elections

Name: judicial-elections

Owner: Mother Jones

Description: null

Created: 2014-04-04 20:51:04.0

Updated: 2014-10-30 19:41:24.0

Pushed: 2014-10-28 05:58:13.0

Homepage: null

Size: 104537

Language: JavaScript

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

Data Diary for Mother Jones Judicial Campaign Finance Investigation

This is a data diary that walks through the steps of acquiring campaign contribution data for state supreme court and lower court candidates, preparing it for SQL analysis and producing specific charts for an online interactive at MotherJones.com

The steps

Design inspired by NYT graphic on motorcycle fatalities and You vs. John Paul

Quick reference
Data

Google spreadsheet publish URLs for each section:

Source Tools Produced data

NEED TO FILL THIS PART IN

About the data

The National Institute on Money in State Politics is a nonpartisan, nonprofit organization that systematically tracks down campaign conributon data in all 50 states. The organization says its data represents information on:

100,000+ lobbyists and clients who register annually, and a 50-state database of contributions documenting $28 billion. Recent expansions include selected local-level data, and collection of reports submitted by independent spending entities in up to 32 states, and political action committees in up to 39 states.

Not sure if we're using this part*

I'm leaving his stuff in this section so I can duplicate it with mine if need be

The BTS web form

The data can be downloaded as flat CSV files from this web form.

To download the November 2013 dataset, set the form options accordingly:

  1. Filter geography - All
  2. Filter year - 2013
  3. Filter period - November
  4. Check the Select all fields checkbox
  5. Then click the Download button.

Your browser will download a ZIP file weighing roughly 10.4 MB and it will be named something like: 932989999_T_T100D_SEGMENT_US_CARRIER_ONLY.zip

Unzipping this file will produce a 94.1 MB plaintext CSV file.

This is what the first 200 rows of that CSV looks like.

Importing into SQL

We will now import the T100 data file into the SQL database of your choice: Sequel Pro is a great GUI for Mac and MySQL, and Firefox's SQLite Manager Plugin is cross-platform.

For the purposes of this exercise, I use Sequel Pro

  1. Data preparation - For some reason, the T100 data file has a trailing column at the end of each row, which effectively creates an empty column.

    Trailing commas in the T100 data

    This is problematic in the first row (i.e. the headers) during the import phase. A quick fix is to just type in some easy-to-ignore value. In the prepared file, I've simply named it EMPTYFIELD

  2. Create a database - In your database manager, create a database named bts_data

  3. Create a table - By default, most import managers will treat each field as a VARCHAR of length 255. Some of the fields in the data are clearly numbers (e.g. DISTANCE, DEPARTURES_SCHEDULED), and others have values much shorter than 255 (e.g. ORIGIN_STATE_ABR).

    We also want to index some of the columns, such as ORIGIN_AIRPORT_ID, to speed up our queries.

    Here's the resulting SQL to create a table named t100_domestic_carriers:

    TE TABLE `t100_domestic_carriers` (
    ARTURES_SCHEDULED` int(12) DEFAULT NULL,
    ARTURES_PERFORMED` int(12) DEFAULT NULL,
    LOAD` int(11) DEFAULT NULL,
    TS` int(11) DEFAULT NULL,
    SENGERS` int(11) DEFAULT NULL,
    IGHT` int(11) DEFAULT NULL,
    L` int(11) DEFAULT NULL,
    TANCE` int(11) DEFAULT NULL,
    P_TO_RAMP` int(11) DEFAULT NULL,
    _TIME` int(11) DEFAULT NULL,
    QUE_CARRIER` char(6) DEFAULT NULL,
    LINE_ID` int(11) DEFAULT NULL,
    QUE_CARRIER_NAME` varchar(255) DEFAULT NULL,
    QUE_CARRIER_ENTITY` char(10) DEFAULT NULL,
    ION` char(2) DEFAULT NULL,
    RIER` char(5) DEFAULT NULL,
    RIER_NAME` varchar(255) DEFAULT NULL,
    RIER_GROUP` char(2) DEFAULT NULL,
    RIER_GROUP_NEW` char(2) DEFAULT NULL,
    GIN_AIRPORT_ID` char(12) DEFAULT NULL,
    GIN_AIRPORT_SEQ_ID` char(7) DEFAULT NULL,
    GIN_CITY_MARKET_ID` char(6) DEFAULT NULL,
    GIN` varchar(255) DEFAULT NULL,
    GIN_CITY_NAME` varchar(255) DEFAULT NULL,
    GIN_STATE_ABR` char(3) DEFAULT NULL,
    GIN_STATE_FIPS` varchar(255) DEFAULT NULL,
    GIN_STATE_NM` varchar(255) DEFAULT NULL,
    GIN_WAC` int(11) DEFAULT NULL,
    T_AIRPORT_ID` char(12) DEFAULT NULL,
    T_AIRPORT_SEQ_ID` char(7) DEFAULT NULL,
    T_CITY_MARKET_ID` char(6) DEFAULT NULL,
    T` varchar(255) DEFAULT NULL,
    T_CITY_NAME` varchar(255) DEFAULT NULL,
    T_STATE_ABR` char(3) DEFAULT NULL,
    T_STATE_FIPS` varchar(255) DEFAULT NULL,
    T_STATE_NM` varchar(255) DEFAULT NULL,
    T_WAC` char(2) DEFAULT NULL,
    CRAFT_GROUP` char(3) DEFAULT NULL,
    CRAFT_TYPE` char(5) DEFAULT NULL,
    CRAFT_CONFIG` char(2) DEFAULT NULL,
    R` int(11) DEFAULT NULL,
    RTER` int(11) DEFAULT NULL,
    TH` int(11) DEFAULT NULL,
    TANCE_GROUP` int(11) DEFAULT NULL,
    SS` char(2) DEFAULT NULL,
    TYFIELD` char(1) DEFAULT NULL,
    `UNIQUE_CARRIER` (`UNIQUE_CARRIER`),
    `ORIGIN_AIRPORT_ID` (`ORIGIN_AIRPORT_ID`),
    `ORIGIN_CITY_MARKET_ID` (`ORIGIN_CITY_MARKET_ID`),
    `DEST_AIRPORT_ID` (`DEST_AIRPORT_ID`),
    `DEST_CITY_MARKET_ID` (`DEST_CITY_MARKET_ID`),
    `ORIGIN_AIRPORT_ID_2` (`ORIGIN_AIRPORT_ID`,`DEST_AIRPORT_ID`),
    `ORIGIN_STATE_ABR` (`ORIGIN_STATE_ABR`),
    `DEST_STATE_ABR` (`DEST_STATE_ABR`),
    `YEAR_AND_MONTH` (`YEAR`,`MONTH`)
    GINE=InnoDB DEFAULT CHARSET=utf8;
    
  4. Import the T100 CSV - Now simply import the CSV file into our newly created table. Sequel Pro will match up the columns in the CSV file to those in our t100_domestic_carriers

    Importing into Sequel Pro

Data exploration
Normalizing the database
Download lookup tables

The BTS provides several lookup tables that will be useful in keeping our Air Carrier data normalized. I've included them in this diary's data/ directory for your convenience. You can also download them directly from the BTS webform, though they annoyingly have the nonsensical file extension of .csv-

Table | Records | Description ——|——–:|———— L_AIRCRAFT_CONFIG.csv | 6 | e.g. "1","Passenger Configuration" L_AIRCRAFT_GROUP.csv | 10 | e.g. "7","Jet, 3-Engine" L_AIRPORT_ID.csv | 6,260 | a unique identifier for airports that persists regardless of changes to codes, e.g. "10030","Port Vita, AK: Port Vita Airport" L_AIRCRAFT_TYPE.csv | 385 | e.g. "889","B787-900 Dreamliner" L_CITY_MARKET_ID.csv | 5,656 | - a reference to a metro area that may be served by several airports., e.g. "31703","New York City, NY (Metropolitan Area)" L_REGION.csv | 6 | e.g. "D", "Domestic" L_SERVICE_CLASS.csv | 14 | e.g."K","Scheduled Service K (F+G)" L_UNIQUE_CARRIERS.csv | 1,565 | e.g. "DL","Delta Air Lines Inc." L_WORLD_AREA_CODES.csv | 333 | e.g. "759","North Vietnam"

(to be continued)

Next steps

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.