Name: IGNORE_THIS-congress-legislators
Owner: GovTrack.us
Description: Members of the United States Congress, 1789-Present, in YAML, as well as committees, presidents, and vice presidents.
Created: 2015-10-02 16:04:04.0
Updated: 2016-05-09 05:48:37.0
Pushed: 2016-01-09 15:02:38.0
Size: 26533
Language: Python
Members of the United States Congress (1789-Present) and congressional committees (1973-Present) in YAML.
This repository contains data about legislators…:
legislators-current.yaml: Currently serving Members of Congress (as of last update).
legislators-historical.yaml: Historical Members of Congress (i.e. all Members of Congress except those in the current file).
legislators-social-media.yaml: Current social media accounts for Members of Congress. Official accounts only (no campaign or personal accounts).
…and about committees:
committees-current.yaml: Current committees of the Congress, with subcommittees.
committee-membership-current.yaml: Current committee/subcommittee assignments as of the date of last update.
committees-historical.yaml: Current and historical committees of the Congress, with subcommittees, from the 93rd Congress (1973) and on.
This repository also contains a database of presidents and vice presidents in executive.yaml. Recall that vice presidents are also president of the Senate and cast tie-breaking votes.
The files are in YAML format. YAML is a serialization format similar in structure to JSON but typically written with one field per line. Like JSON, it allows for nested structure. Each level of nesting is indicated by indentation or a dash.
This database has been collected from a variety of sources:
The data is currently maintained both by hand and by some scripts in the scripts directory.
You can just use the data directly without running any scripts. If you want to develop on and help maintain the data, our scripts are tested and developed on Python 3.3.
Every script in scripts/ should be safely importable without executing any code beyond the imports themselves. We typically do this with a def run(): declaration after the imports, and by putting this at the bottom of the script:
if __name__ == '__main__':
  run()
Every pull request will pass submitted scripts through an import, to catch exceptions, and through pyflakes, to catch unused imports or local vars.
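A local approximation of the import check can be useful before opening a pull request. The sketch below is hypothetical (check_importable is not a function in the repository, and this is not the project's CI configuration): it loads every .py file in a directory under a module name other than '__main__', so the run() guard stays closed, and collects any exception raised at import time. pyflakes can still be run separately on the same files.

```python
import importlib.util
import pathlib
import sys


def check_importable(scripts_dir):
    """Import each .py file in scripts_dir without running it as __main__.

    Returns {filename: repr(exception)} for scripts that raise on import;
    an empty dict means every script imported cleanly.
    """
    scripts_dir = pathlib.Path(scripts_dir)
    failures = {}
    # Let scripts import sibling helper modules, as they would when run from scripts/.
    sys.path.insert(0, str(scripts_dir))
    try:
        for path in sorted(scripts_dir.glob("*.py")):
            # A module name other than '__main__' keeps the `if __name__ == '__main__'` block from running.
            spec = importlib.util.spec_from_file_location("importcheck_" + path.stem, str(path))
            module = importlib.util.module_from_spec(spec)
            try:
                spec.loader.exec_module(module)
            except Exception as exc:  # record the failure and keep checking the rest
                failures[path.name] = repr(exc)
    finally:
        sys.path.remove(str(scripts_dir))
    return failures
```

A script that follows the def run() convention imports cleanly here even if run() itself would fail, which is exactly the property the convention is meant to guarantee.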
legislators-current.yaml and legislators-historical.yaml contain biographical information on all Members of Congress who have ever served, that is, since 1789, as well as cross-walks into other databases.
Each legislator record is grouped into four guaranteed parts: IDs that relate the record to other databases, name information (first, last, etc.), biographical information (birthday, gender), and terms served in Congress. A typical record looks something like this:
- id:
    bioguide: R000570
    thomas: '01560'
    govtrack: 400351
    opensecrets: N00004357
    votesmart: 26344
    fec:
      - H8WI01024
    cspan: 57970
    wikipedia: Paul Ryan
    ballotpedia: Paul Ryan
    washington_post: gIQAUWiV9O
    maplight: 445
    house_history: 20785
    icpsr: 29939
  name:
    first: Paul
    middle: D.
    last: Ryan
  bio:
    birthday: '1970-01-29'
    gender: M
  terms:
  ...
  - type: rep
    start: '2011-01-03'
    end: '2013-01-03'
    ...
  - type: rep
    start: '2013-01-03'
    end: '2015-01-03'
    state: WI
    party: Republican
    district: 1
    url: http://paulryan.house.gov
    address: 1233 Longworth HOB; Washington DC 20515-4901
    phone: 202-225-3031
    fax: 202-225-3393
    contact_form: http://www.house.gov/ryan/email.htm
    office: 1233 Longworth House Office Building
Terms correspond to elections and are listed in chronological order. If a legislator is currently serving, the current term information will always be the last one. To check if a legislator is currently serving, check that the end date on the last term is in the future.
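The currency check just described fits in a small Python function. This is a sketch that assumes the record has already been parsed (for example with PyYAML's yaml.safe_load) into a dict and that dates are the quoted 'YYYY-MM-DD' strings shown in the sample record; is_currently_serving is a hypothetical name.

```python
import datetime


def is_currently_serving(legislator, today=None):
    """Return True if the end date of the legislator's last term is after `today`.

    `legislator` is one parsed record (a dict) whose terms are in chronological
    order with 'YYYY-MM-DD' date strings, like the sample record above.
    """
    if today is None:
        today = datetime.date.today()
    terms = legislator.get("terms") or []
    if not terms:
        return False  # no term data: treat as not serving
    last_end = datetime.datetime.strptime(terms[-1]["end"], "%Y-%m-%d").date()
    return last_end > today
```

Only the last term is inspected, relying on the chronological ordering guaranteed above.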
The split between legislators-current.yaml and legislators-historical.yaml is somewhat arbitrary because these files may not be updated immediately when a legislator leaves office. If it matters to you, just load both files.
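Loading both files usually means combining them into one index. A minimal sketch, assuming each file has already been parsed into a list of record dicts; index_by_bioguide is a hypothetical helper, and giving the later list precedence on duplicate IDs is a design choice, not something the data files specify.

```python
def index_by_bioguide(*record_lists):
    """Merge lists of legislator records into one dict keyed by bioguide ID.

    Later lists win on duplicate IDs, so pass the current file last in case a
    legislator briefly appears in both files.
    """
    index = {}
    for records in record_lists:
        for rec in records:
            bioguide = (rec.get("id") or {}).get("bioguide")
            if bioguide:  # any field may be unknown, so skip records without a bioguide ID
                index[bioguide] = rec
    return index
```

With PyYAML installed, each file can be parsed with yaml.safe_load on an open file handle and the results passed in as index_by_bioguide(historical, current).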
A separate file legislators-social-media.yaml stores social media account information. Its structure is similar but includes different fields.
The following fields are available in legislators-current.yaml and legislators-historical.yaml:
id
http://www.washingtonpost.com/politics/[washington_post]_topic.html)
name
other_names, when present, lists other names the legislator has gone by officially. This is helpful in cases where a legislator's legal name has changed. These listings will only include the name attributes which differ from the current name, and a start or end date where applicable. Where multiple names exist, other names are listed chronologically by end date. An excerpted example:
bio
terms (one entry for each election)
party_affiliations will be set. Values are typically “Democrat”, “Independent”, or “Republican”. The value typically matches the political party of the legislator on the ballot in his or her last election, although for state affiliate parties such as “Democratic Farmer Labor” we will use the national party name (“Democrat”) instead to keep the values of this field normalized.
party field. Omitted if the legislator caucuses with the party indicated in the party field. When in doubt about the difference between the party and caucus fields, the party field is what displays after the legislator's name (i.e. “(D)”) but the caucus field is what normally determines committee seniority. This field was added starting with terms for the 113th Congress.
start and end dates, each of which has a party field and a caucus field if applicable, with the same meanings as the main party and caucus fields. The time periods cover the entire term, so the first start will match the term start, the last end will match the term end, and the last party (and caucus if present) will match the term party (and caucus).
Leadership roles:
leadership_roles:
  - title: Minority Leader
    chamber: senate
    start: '2007-01-04'
    end: '2009-01-06'
For members with top formal positions of leadership in each party in each chamber, a leadership_roles field will include an array of start/end dates and titles documenting when they held this role.
Leadership terms are not identical to legislative terms, and so start and end dates will be different than legislative term dates. However, leaders do need to be re-elected each legislative term, so their leadership terms should all be subsets of their legislative terms.
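The subset rule suggests a simple consistency check. The sketch below is simplified: it flags any role whose start or end date lies outside every legislative term, checking only the endpoints, so a role spanning a gap between two terms is not caught. It also assumes, without confirmation from the data, that an ongoing role may lack an end date. The function names and sample data are hypothetical.

```python
def within_some_term(date, terms):
    """True if the 'YYYY-MM-DD' date falls inside any term (inclusive)."""
    # ISO date strings compare chronologically, so plain string comparison works.
    return any(t["start"] <= date <= t["end"] for t in terms)


def roles_outside_terms(legislator):
    """Return leadership roles with an endpoint outside every legislative term.

    The two endpoints may fall in different terms, since leaders are
    re-elected each term and a role can continue across a re-election.
    """
    terms = legislator.get("terms", [])
    flagged = []
    for role in legislator.get("leadership_roles", []):
        endpoints = [role["start"]]
        if "end" in role:  # assumption: an ongoing role might omit its end date
            endpoints.append(role["end"])
        if not all(within_some_term(d, terms) for d in endpoints):
            flagged.append(role)
    return flagged
```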
Except where noted, fields are omitted when their value is empty or unknown. Any field may be unknown.
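To make the party, caucus, and party_affiliations rules concrete, here is a minimal sketch that reports the party and effective caucus on a given date within one term. It assumes a term is a parsed dict with 'YYYY-MM-DD' string dates and treats an omitted caucus as meaning the same as the party, following the description of the caucus field. The function name party_on and the sample terms are hypothetical.

```python
def party_on(term, date):
    """Return (party, caucus) in effect on `date` ('YYYY-MM-DD') within `term`.

    Uses party_affiliations when present; otherwise falls back to the term's
    own party and caucus fields. An omitted caucus means the legislator
    caucuses with their party, so the party is returned in its place.
    """
    for period in term.get("party_affiliations", []):
        # ISO date strings compare chronologically.
        if period["start"] <= date <= period["end"]:
            return period["party"], period.get("caucus", period["party"])
    party = term.get("party")
    return party, term.get("caucus", party)
```

If two affiliation periods share a boundary date, the earlier one wins, which is an arbitrary but deterministic choice.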
Notes: In most cases, a legislator has a single term on any given date. In some cases a legislator resigned from one chamber and was sworn in to the other chamber on the same day. Terms for senators list each six-year term, so each term spans three Congresses. For representatives and delegates, each two-year term is listed, each corresponding to a single Congress. But Puerto Rico's Resident Commissioner serves four-year terms, and so the Resident Commissioner will have a single term covering two Congresses (this has not been updated in historical data).
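Since a legislator can hold more than one term on a given date, a lookup of terms by date should return a list rather than a single term. A minimal sketch (terms_on is a hypothetical helper) that compares 'YYYY-MM-DD' strings, which sort chronologically. Consecutive terms in the sample record share a boundary date (one ends '2013-01-03' and the next starts '2013-01-03'), so with inclusive comparisons, as used here, such dates also return two terms.

```python
def terms_on(legislator, date):
    """Return every term whose [start, end] range includes `date` ('YYYY-MM-DD').

    Usually zero or one term, but two on a same-day switch between chambers
    or on a date where one term ends and the next begins.
    """
    return [t for t in legislator.get("terms", []) if t["start"] <= date <= t["end"]]
```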
Historically, some states sending at-large representatives actually sent multiple at-large representatives. Thus, state and district may not be a unique key.
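Because (state, district) is not a unique key, any per-seat index should map each key to a list. A minimal sketch over parsed records, assuming House terms have type 'rep' as in the sample record and 'YYYY-MM-DD' string dates. It treats each term as the half-open interval [start, end), so a member whose term ends on the day the next one begins is counted once. The name members_by_seat and the test records are hypothetical.

```python
from collections import defaultdict


def members_by_seat(legislators, date):
    """Map (state, district) -> list of last names of House members serving on `date`.

    Values are lists because historically some states sent several at-large
    representatives, so one (state, district) key can cover several people.
    """
    seats = defaultdict(list)
    for leg in legislators:
        for term in leg.get("terms", []):
            if term.get("type") == "rep" and term["start"] <= date < term["end"]:
                seats[(term["state"], term.get("district"))].append(leg["name"]["last"])
    return dict(seats)
```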
The social media file legislators-social-media.yaml stores current social media account information.
Each record has two sections: id and social. The id section identifies the legislator using bioguide, thomas, and govtrack IDs (where available). The social section has social media account identifiers:
Several legislators do not have an assigned YouTube username. In these cases, only the youtube_id field is populated.
All values can be turned into URLs by preceding them with the domain name of the service in question (and in the case of YouTube channels, the path /channel):
https://twitter.com/[twitter]
https://youtube.com/user/[youtube]
https://youtube.com/channel/[youtube_id]
https://instagram.com/[instagram]
https://facebook.com/[facebook or facebook_id]
Legislators are only present when they have one or more social media accounts known. Fields are omitted when the account is unknown.
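The URL patterns above map directly onto a lookup table. A minimal sketch, assuming the social section of a record has been parsed into a dict; the Instagram domain is instagram.com. URL_TEMPLATES and social_urls are hypothetical names, and fields not listed in the table are simply skipped.

```python
# URL templates from the list above, keyed by social-section field name.
URL_TEMPLATES = {
    "twitter": "https://twitter.com/{}",
    "youtube": "https://youtube.com/user/{}",
    "youtube_id": "https://youtube.com/channel/{}",
    "instagram": "https://instagram.com/{}",
    "facebook": "https://facebook.com/{}",
    "facebook_id": "https://facebook.com/{}",
}


def social_urls(social):
    """Build a URL for each known account field present (unknown accounts are omitted)."""
    # str.format also handles numeric IDs such as facebook_id.
    return {field: URL_TEMPLATES[field].format(value)
            for field, value in social.items() if field in URL_TEMPLATES}
```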
Available tasks with scripts/social_media.py:
--sweep: Given a --service, looks through current members for those missing an account on that service, and checks that member's official website's source code for mentions of that service. Uses a CSV at data/social_media_blacklist.csv to exclude known non-individual account names. A CSV of “leads” is produced for manual review.
--update: Given a --service, reads the CSV produced by --sweep back in and updates the YAML accordingly. Note: for small updates to people already in the YAML, it's easiest to just update by hand.
--clean: Given a --service, removes legislators from the social media file who are no longer current.
--resolvefb: Uses Facebook usernames to look up graph IDs, and updates the YAML accordingly.
--resolveyt: Uses YouTube usernames to look up any channel IDs, and updates the YAML accordingly.
--resolveig: Uses Instagram user IDs to look up any usernames, and updates the YAML accordingly.
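The --sweep task is described only in prose. The following is a hypothetical, Twitter-only sketch of its core idea: scan official-site HTML for account links, drop blacklisted names, and collect leads for manual review. It does not fetch pages or read data/social_media_blacklist.csv, and the regular expression and function name are assumptions, not what scripts/social_media.py actually uses.

```python
import re

# Hypothetical pattern: twitter.com/<handle> links; handles are 1 to 15 word characters.
TWITTER_LINK = re.compile(r"twitter\.com/(?:#!/)?@?([A-Za-z0-9_]{1,15})", re.IGNORECASE)


def sweep_leads(pages, blacklist):
    """Return (bioguide, handle) leads found in official-site HTML.

    `pages` maps bioguide ID -> HTML source; `blacklist` is a set of
    lower-cased non-individual account names to skip (standing in for the
    blacklist CSV). Each handle is reported once per member.
    """
    leads = []
    for bioguide, html in pages.items():
        seen = set()
        for handle in TWITTER_LINK.findall(html):
            key = handle.lower()
            if key in blacklist or key in seen:
                continue
            seen.add(key)
            leads.append((bioguide, handle))
    return leads
```

To mirror the CSV of leads, the returned tuples could be written out with csv.writer for review before any --update step.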
Options used with the above tasks:
--service: Can be “twitter”, “youtube”, or “facebook”.
--bioguide: Limit activity to a single member, by bioguide ID.
--email: In conjunction with --sweep, send an email if there are any new leads, using settings in scripts/email/config.yml (if it was created and filled out).
The committees-current.yaml file lists all current House, Senate, and Joint committees of the United States Congress. It includes metadata and cross-walks into other databases of committee information. It is based on data scraped from House.gov and Senate.gov.
The committees-historical.yaml file is a possibly partial list of current and historical committees and subcommittees referred to in the unitedstates/congress project bill data, as scraped from THOMAS.gov. Only committees/subcommittees that have had bills referred to them are included.
The basic structure of a committee entry looks like the following:
- type: house
  name: House Committee on Agriculture
  url: http://agriculture.house.gov/
  thomas_id: HSAG
  house_committee_id: AG
  jurisdiction: The U.S. House Committee on Agriculture, or Agriculture Committee,
    is a standing committee of the ...
  jurisdiction_source: http://en.wikipedia.org/wiki/House_Committee_on_Agriculture
  subcommittees:
    (... subcommittee list ...)
Both files are structured as a list of committees, each entry being an associative array of key/value pairs of committee metadata.
The fields available in both files are as follows:
Additional fields are present on current committee entries (that is, in committees-current.yaml):
Two additional fields are present on committees and subcommittees in the committees-historical.yaml file:
The committee-membership-current.yaml file contains current committee assignments, as of the date of the last update of this file. The file is structured as a mapping from committee IDs to a list of committee members. The basic structure looks like this:
HSAG:
  - name: Frank D. Lucas
    party: majority
    rank: 1
    title: Chair
    bioguide: L000491
    thomas: '00711'
  - name: Bob Goodlatte
    party: majority
    rank: 2
  (...snip...)
HSAG03:
  - name: Jean Schmidt
    party: majority
    rank: 1
    title: Chair
The committee IDs in this file are the thomas_id values from the committees-current.yaml file, or, for subcommittees, the concatenation of the thomas_id of the parent committee and the thomas_id of the subcommittee.
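Because subcommittee IDs are formed by concatenation, looking up a member's committee names means rebuilding those IDs from committees-current.yaml. A minimal sketch, assuming both files have already been parsed into Python objects (a list of committee dicts and a dict mapping committee ID to a member list); the function names and the subcommittee name in the test are hypothetical.

```python
def committee_names(committees):
    """Map membership-file IDs to committee names.

    Subcommittee IDs are the parent's thomas_id followed by the
    subcommittee's thomas_id (e.g. HSAG + 03 -> HSAG03).
    """
    names = {}
    for c in committees:
        names[c["thomas_id"]] = c["name"]
        for sub in c.get("subcommittees", []):
            names[c["thomas_id"] + sub["thomas_id"]] = sub["name"]
    return names


def assignments_for(bioguide, membership, committees):
    """List (committee_id, name, title) for every committee a member sits on, sorted by ID."""
    names = committee_names(committees)
    result = []
    for cid, members in membership.items():
        for m in members:
            if m.get("bioguide") == bioguide:
                # title is optional (only chairs, ranking members, etc. have one)
                result.append((cid, names.get(cid, "?"), m.get("title")))
    return sorted(result, key=lambda row: row[0])
```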
Each committee/subcommittee entry is a list containing the members of the committee. Each member has the following fields:
house or senate.
Because of their role in the legislative process, we also include a file executive.yaml which contains terms served by U.S. presidents (who signed legislation) and U.S. vice presidents (who are nominally the president of the Senate and occasionally cast tie-breaking votes there).
This file has a structure similar to that of the legislator files. The file contains a list, where each entry is a person. Each entry is a dict with id, name, bio, and terms fields.
The id, bio, and name fields are the same as those listed above. Except:
Each term has the following fields:
Presidents and vice presidents who previously served in Congress will also be listed in one of the legislator files, but their Congressional terms will only appear in the legislator files and their executive-branch terms will only appear in executive.yaml.
Although the USPS abbreviations for the 50 states are easy to find, this database also includes non-voting delegates from territories, including historical territories that no longer exist. Here is a complete list of abbreviations:
The 50 States:
AK Alaska
AL Alabama
AR Arkansas
AZ Arizona
CA California
CO Colorado
CT Connecticut
DE Delaware
FL Florida
GA Georgia
HI Hawaii
IA Iowa
ID Idaho
IL Illinois
IN Indiana
KS Kansas
KY Kentucky
LA Louisiana
MA Massachusetts
MD Maryland
ME Maine
MI Michigan
MN Minnesota
MO Missouri
MS Mississippi
MT Montana
NC North Carolina
ND North Dakota
NE Nebraska
NH New Hampshire
NJ New Jersey
NM New Mexico
NV Nevada
NY New York
OH Ohio
OK Oklahoma
OR Oregon
PA Pennsylvania
RI Rhode Island
SC South Carolina
SD South Dakota
TN Tennessee
TX Texas
UT Utah
VA Virginia
VT Vermont
WA Washington
WI Wisconsin
WV West Virginia
WY Wyoming
Current Territories:
Legislators serving in the House from these territories are called delegates, except for the so-called “Resident Commissioner” from Puerto Rico.
AS American Samoa
DC District of Columbia
GU Guam
MP Northern Mariana Islands
PR Puerto Rico
VI Virgin Islands
Historical Territories:
These territories no longer exist.
DK Dakota Territory
OL Territory of Orleans
PI Philippines Territory/Commonwealth
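The lists above can be turned into lookup sets, for example to tell which House terms belong to territorial delegates. A small sketch; the set names and classify_state_code are hypothetical helpers, and the codes are copied from the lists above.

```python
# Codes copied from the lists above.
STATES = set("""
AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS
MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY
""".split())
CURRENT_TERRITORIES = {"AS", "DC", "GU", "MP", "PR", "VI"}
HISTORICAL_TERRITORIES = {"DK", "OL", "PI"}


def classify_state_code(code):
    """Return 'state', 'territory', or 'historical territory' for a listed code."""
    if code in STATES:
        return "state"
    if code in CURRENT_TERRITORIES:
        return "territory"
    if code in HISTORICAL_TERRITORIES:
        return "historical territory"
    raise ValueError("unknown state code: %r" % code)
```

Raising on unknown codes, rather than defaulting to 'state', makes any code missing from these lists surface immediately.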
(Recommended) First, create a virtualenv in the scripts directory:
cd scripts
virtualenv virt
source virt/bin/activate
Install the requirements:
pip install -r requirements.txt
Try updating the House members' contact information (mailing address, etc.):
python house_contacts.py
Check whether and how the data has changed:
git diff ../*.yaml
We run the following scripts periodically to scrape for new information and keep the data files up to date. The scripts do not take any command-line arguments.
house_contacts.py: Updates House members' contact information (address, office, and phone fields on their current term, and their official_full name field).
house_websites.py: Updates House members' current website URLs.
senate_contacts.py: Updates senator information (party, class, state_rank, address, office, phone, and contact_form fields on their current term, and their official_full name, bioguide ID, and lis ID fields).
committee_membership.py: Updates committees-current.yaml (name, address, and phone fields for House committees; name and url fields for Senate committees; creates new subcommittees when found with name and thomas_id fields) and writes out a whole new committee-membership-current.yaml file by scraping the House and Senate websites.
historical_committees.py: Updates committees-historical.yaml based on the committees listed on THOMAS.gov, which are committees to which bills have been referred since the 93rd Congress (1973).
social_media.py: Generates leads for Twitter, YouTube, and Facebook accounts for members of Congress by scraping their official websites. Uses a blacklist CSV and a whitelist CSV to manage false positives and negatives.
influence_ids.py: Grabs updated FEC and OpenSecrets IDs from the Influence Explorer API. Will only work for members with a Bioguide ID.
The following script takes one required command-line argument:
icpsr_ids.py: Updates ICPSR IDs for all members of the House and Senate in a given Congress, based on roll call vote data files stored by Voteview.com. The script takes one command-line argument:
--congress=congress_number
where congress_number is the number of the Congress to be updated. As of July 2013, the permanent URL for future roll call data is unclear, and as such, the script may need to be modified when it is run for the 114th Congress.
The following script may be run to create alternately formatted data files. It takes no command-line arguments.
The ballotpedia and washington_post fields have been created using code from James Michael DuPont, using the code in git@github.com:h4ck3rm1k3/rootstrikers-wikipedia.git in the branch ballotpedia.
This repository was public domain until Oct 2, 2015. Changes since then are owned by GovTrack and licensed under CC-BY.