Name: car-scraper
Owner: datamade
Description: Make spreadsheets out of Chicago Association of REALTORS® reports
Created: 2017-04-17 20:21:31.0
Updated: 2018-05-18 16:22:37.0
Pushed: 2018-05-18 16:16:21.0
Size: 69
Language: Python
Grab Chicagoland real estate reports from the CAR website and convert them all to spreadsheets.
Make sure you have OS-level requirements installed:
brew install poppler
Then, make a virtualenv and install the Python requirements:
mkvirtualenv car-scraper
pip install -U -r requirements.txt
Finally, build tabula-java 0.9.1 from source:
make tabula-java
You'll need to decrypt the CAR login credentials before you can scrape the PDFs. If you're on the keyring for this repo, you can decrypt the secrets file:
blackbox_cat configs/secrets.py.gpg > scripts/secrets.py
Otherwise, copy over the example secrets file:
cp configs/secrets.example.py scripts/secrets.py
Then, adjust the variables to reflect your CAR username and password:
USER = '<your_username>'
PASS = '<your_password>'
Set the desired month and year for the reports in config.mk, following this format:
year = 2016
month = 02
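If you script runs for several periods, a small helper can write config.mk in the format shown (year = 2016, month = 02). This is a hypothetical sketch and is not part of the repo. It assumes config.mk holds only these two settings and that the month must be zero-padded to two digits, as in the example. The names render_config and write_config are illustrative.

```python
from pathlib import Path


def render_config(year: int, month: int) -> str:
    """Return config.mk contents in the year/month format shown above.

    Hypothetical helper: assumes the month is written as two digits (02, 12).
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return f"year = {year}\nmonth = {month:02d}\n"


def write_config(year: int, month: int, path: str = "config.mk") -> None:
    # Overwrites the whole file; assumes config.mk holds nothing else.
    Path(path).write_text(render_config(year, month))
```

For example, write_config(2017, 12) would set up a year-end (December) run before you call make all.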
Use the DataMade Make standard operating procedure to get your files: make all produces the final output for the year and month you selected, and make clean removes all generated files from your repo.
Output files land in the final/ directory. Files with "monthly" in the name catalogue month-over-month statistics, while files with "yearly" in the name catalogue year-to-date totals.
If you're interested in year-end statistics, run the scraper for December of a given year (set month = 12 in config.mk) and grab the yearly files. These are the files we use in Where to Buy.
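To pick out the two kinds of output programmatically, you can sort the files in final/ by whether their names contain "monthly" or "yearly". This minimal sketch relies only on that naming convention. It does not assume any particular file extension or anything about the files' contents, and the function names are hypothetical.

```python
from pathlib import Path


def classify_outputs(names):
    """Split file names into monthly and yearly groups by substring.

    Relies only on the naming convention: "monthly" files hold
    month-over-month statistics, "yearly" files hold year-to-date totals.
    Names containing neither word are ignored.
    """
    monthly = sorted(n for n in names if "monthly" in n)
    yearly = sorted(n for n in names if "yearly" in n)
    return monthly, yearly


def list_outputs(final_dir="final"):
    # Only regular files directly inside final/ are considered.
    names = [p.name for p in Path(final_dir).iterdir() if p.is_file()]
    return classify_outputs(names)
```

After a December run, the yearly group returned by list_outputs() is the one that holds the year-end totals.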
While cleaning the CSVs, the scraper checks that table values look plausible. It prints any implausible values to the console while building the cleaned_csvs target, and it also records them in the output file conversion_errors.csv if you want to inspect them further. Error messages look something like this:
Percentage error in raw/csvs/suburbs/clean/DuPage_County_4.csv
Community: Carol Stream
Column: months_supply_change
Row value: -35.8
Calculated delta: -34.5
(Note: calculated deltas should be within +-1 of the row value.)
CAR often slightly miscalculates changes in values between years, as you can see above. This is the most frequent error I've encountered, and you can safely ignore it as long as the delta is within a reasonable range.
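The plausibility check behind the message above can be approximated in a few steps. Recompute the percentage change from the two raw values, then flag it when it differs from CAR's reported change by more than 1 point. This is a sketch of the idea, not the scraper's actual code. Several details are illustrative assumptions: the names DeltaError, percent_change and check_delta, the rounding to one decimal place, and computing the change relative to the earlier value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaError:
    column: str
    row_value: float         # change as printed in the CAR report
    calculated_delta: float  # change recomputed from the raw values


def percent_change(previous: float, current: float) -> float:
    """Percentage change from previous to current (assumed convention)."""
    return (current - previous) / previous * 100


def check_delta(column: str, reported: float, previous: float,
                current: float, tolerance: float = 1.0) -> Optional[DeltaError]:
    """Return a DeltaError if the reported change is off by more than tolerance."""
    if previous == 0:
        return None  # percentage change is undefined; skip the check
    calculated = round(percent_change(previous, current), 1)
    if abs(calculated - reported) > tolerance:
        return DeltaError(column, reported, calculated)
    return None
```

Take the values from the example message. A reported change of -35.8 against a recalculated -34.5 differs by 1.3 points, which exceeds the +-1 tolerance, so the check would flag it.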