Name: judicial-elections
Owner: Mother Jones
Description: null
Created: 2014-04-04 20:51:04.0
Updated: 2014-10-30 19:41:24.0
Pushed: 2014-10-28 05:58:13.0
Homepage: null
Size: 104537
Language: JavaScript
GitHub Committers
User | Most Recent Commit | # Commits |
---|
Other Committers
User | Most Recent Commit | # Commits |
---|
This is a data diary that walks through the steps of acquiring campaign contribution data for state supreme court and lower court candidates, preparing it for SQL analysis and producing specific charts for an online interactive at MotherJones.com
Design inspired by NYT graphic on motorcycle fatalities and You vs. John Paul
Google spreadsheet publish URLs for each section:
NEED TO FILL THIS PART IN
The National Institute on Money in State Politics is a nonpartisan, nonprofit organization that systematically tracks down campaign conributon data in all 50 states. The organization says its data represents information on:
100,000+ lobbyists and clients who register annually, and a 50-state database of contributions documenting $28 billion. Recent expansions include selected local-level data, and collection of reports submitted by independent spending entities in up to 32 states, and political action committees in up to 39 states.
I'm leaving his stuff in this section so I can duplicate it with mine if need be
The data can be downloaded as flat CSV files from this web form.
To download the November 2013 dataset, set the form options accordingly:
All
2013
November
Your browser will download a ZIP file weighing roughly 10.4 MB and it will be named something like: 932989999_T_T100D_SEGMENT_US_CARRIER_ONLY.zip
Unzipping this file will produce a 94.1 MB plaintext CSV file.
This is what the first 200 rows of that CSV looks like.
We will now import the T100 data file into the SQL database of your choice: Sequel Pro is a great GUI for Mac and MySQL, and Firefox's SQLite Manager Plugin is cross-platform.
For the purposes of this exercise, I use Sequel Pro
Data preparation - For some reason, the T100 data file has a trailing column at the end of each row, which effectively creates an empty column.
This is problematic in the first row (i.e. the headers) during the import phase. A quick fix is to just type in some easy-to-ignore value. In the prepared file, I've simply named it EMPTYFIELD
Create a database - In your database manager, create a database named bts_data
Create a table - By default, most import managers will treat each field as a VARCHAR
of length 255. Some of the fields in the data are clearly numbers (e.g. DISTANCE
, DEPARTURES_SCHEDULED
), and others have values much shorter than 255 (e.g. ORIGIN_STATE_ABR
).
We also want to index some of the columns, such as ORIGIN_AIRPORT_ID
, to speed up our queries.
Here's the resulting SQL to create a table named t100_domestic_carriers
:
TE TABLE `t100_domestic_carriers` (
ARTURES_SCHEDULED` int(12) DEFAULT NULL,
ARTURES_PERFORMED` int(12) DEFAULT NULL,
LOAD` int(11) DEFAULT NULL,
TS` int(11) DEFAULT NULL,
SENGERS` int(11) DEFAULT NULL,
IGHT` int(11) DEFAULT NULL,
L` int(11) DEFAULT NULL,
TANCE` int(11) DEFAULT NULL,
P_TO_RAMP` int(11) DEFAULT NULL,
_TIME` int(11) DEFAULT NULL,
QUE_CARRIER` char(6) DEFAULT NULL,
LINE_ID` int(11) DEFAULT NULL,
QUE_CARRIER_NAME` varchar(255) DEFAULT NULL,
QUE_CARRIER_ENTITY` char(10) DEFAULT NULL,
ION` char(2) DEFAULT NULL,
RIER` char(5) DEFAULT NULL,
RIER_NAME` varchar(255) DEFAULT NULL,
RIER_GROUP` char(2) DEFAULT NULL,
RIER_GROUP_NEW` char(2) DEFAULT NULL,
GIN_AIRPORT_ID` char(12) DEFAULT NULL,
GIN_AIRPORT_SEQ_ID` char(7) DEFAULT NULL,
GIN_CITY_MARKET_ID` char(6) DEFAULT NULL,
GIN` varchar(255) DEFAULT NULL,
GIN_CITY_NAME` varchar(255) DEFAULT NULL,
GIN_STATE_ABR` char(3) DEFAULT NULL,
GIN_STATE_FIPS` varchar(255) DEFAULT NULL,
GIN_STATE_NM` varchar(255) DEFAULT NULL,
GIN_WAC` int(11) DEFAULT NULL,
T_AIRPORT_ID` char(12) DEFAULT NULL,
T_AIRPORT_SEQ_ID` char(7) DEFAULT NULL,
T_CITY_MARKET_ID` char(6) DEFAULT NULL,
T` varchar(255) DEFAULT NULL,
T_CITY_NAME` varchar(255) DEFAULT NULL,
T_STATE_ABR` char(3) DEFAULT NULL,
T_STATE_FIPS` varchar(255) DEFAULT NULL,
T_STATE_NM` varchar(255) DEFAULT NULL,
T_WAC` char(2) DEFAULT NULL,
CRAFT_GROUP` char(3) DEFAULT NULL,
CRAFT_TYPE` char(5) DEFAULT NULL,
CRAFT_CONFIG` char(2) DEFAULT NULL,
R` int(11) DEFAULT NULL,
RTER` int(11) DEFAULT NULL,
TH` int(11) DEFAULT NULL,
TANCE_GROUP` int(11) DEFAULT NULL,
SS` char(2) DEFAULT NULL,
TYFIELD` char(1) DEFAULT NULL,
`UNIQUE_CARRIER` (`UNIQUE_CARRIER`),
`ORIGIN_AIRPORT_ID` (`ORIGIN_AIRPORT_ID`),
`ORIGIN_CITY_MARKET_ID` (`ORIGIN_CITY_MARKET_ID`),
`DEST_AIRPORT_ID` (`DEST_AIRPORT_ID`),
`DEST_CITY_MARKET_ID` (`DEST_CITY_MARKET_ID`),
`ORIGIN_AIRPORT_ID_2` (`ORIGIN_AIRPORT_ID`,`DEST_AIRPORT_ID`),
`ORIGIN_STATE_ABR` (`ORIGIN_STATE_ABR`),
`DEST_STATE_ABR` (`DEST_STATE_ABR`),
`YEAR_AND_MONTH` (`YEAR`,`MONTH`)
GINE=InnoDB DEFAULT CHARSET=utf8;
Import the T100 CSV - Now simply import the CSV file into our newly created table. Sequel Pro will match up the columns in the CSV file to those in our t100_domestic_carriers
Create a list of carriers in this dataset, sorted in descending order by number of performed departures
CT UNIQUE_CARRIER, UNIQUE_CARRIER_NAME, SUM(DEPARTURES_PERFORMED) as total_departures
bts_data.t100_domestic_carriers
P BY UNIQUE_CARRIER
R BY total_departures DESC
Results:
UNIQUE_CARRIER | UNIQUE_CARRIER_NAME | total_departures —————-|————–|—————-: WN | Southwest Airlines Co. | 1030147 DL | Delta Air Lines Inc. | 698188 EV | ExpressJet Airlines Inc. | 678610 OO | SkyWest Airlines Inc. | 568164 AA | American Airlines Inc. | 487525 UA | United Air Lines Inc. | 463063
List the city-to-city routes flown by Delta Air Lines, sorted in descending order of the sum of passengers flown.
CT ORIGIN_CITY_NAME, DEST_CITY_NAME, SUM(PASSENGERS) AS total_passengers
t100_domestic_carriers
E UNIQUE_CARRIER='DL'
P BY ORIGIN_AIRPORT_ID, DEST_AIRPORT_ID
R BY total_passengers DESC
Results:
|ORIGIN_CITY_NAME|DEST_CITY_NAME|total_passengers |—————-|————–|—————-: |Orlando, FL|Atlanta, GA|888016 |Atlanta, GA|Orlando, FL|882425 |Atlanta, GA|New York, NY| 704765 |New York, NY|Atlanta, GA|699186 |Atlanta, GA|Los Angeles, CA|673863 |Atlanta, GA|Fort Lauderdale, FL|659698
The BTS provides several lookup tables that will be useful in keeping our Air Carrier data normalized. I've included them in this diary's data/
directory for your convenience. You can also download them directly from the BTS webform, though they annoyingly have the nonsensical file extension of .csv-
Table | Records | Description
——|——–:|————
L_AIRCRAFT_CONFIG.csv | 6 | e.g. "1","Passenger Configuration"
L_AIRCRAFT_GROUP.csv | 10 | e.g. "7","Jet, 3-Engine"
L_AIRPORT_ID.csv | 6,260 | a unique identifier for airports that persists regardless of changes to codes, e.g. "10030","Port Vita, AK: Port Vita Airport"
L_AIRCRAFT_TYPE.csv | 385 | e.g. "889","B787-900 Dreamliner"
L_CITY_MARKET_ID.csv | 5,656 | - a reference to a metro area that may be served by several airports., e.g. "31703","New York City, NY (Metropolitan Area)"
L_REGION.csv | 6 | e.g. "D", "Domestic"
L_SERVICE_CLASS.csv | 14 | e.g."K","Scheduled Service K (F+G)"
L_UNIQUE_CARRIERS.csv | 1,565 | e.g. "DL","Delta Air Lines Inc."
L_WORLD_AREA_CODES.csv | 333 | e.g. "759","North Vietnam"
(to be continued)
t100_domestic_carriers
table