motherjones/west-texas-data-diary

Name: west-texas-data-diary

Owner: Mother Jones

Description: A data diary for our project on why we can't predict the next West, TX explosion

Created: 2014-05-30 17:47:35.0

Updated: 2016-09-21 22:34:25.0

Pushed: 2014-05-30 18:43:47.0

Homepage: null

Size: 1564

Language: JavaScript

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

Data Diary for “Is There a Risky Chemical Plant Near You?”

A data diary for our project on why we can't predict the next West, TX explosion

The goal

In the wake of the April 17, 2013 explosion at a fertilizer plant in the town of West, Texas, we wanted to find where else plants posed a chemical risk to their surrounding communities. Read the published story here, which talks about the project methodology.

This data diary (inspired by Dan Nguyen) walks through the steps of preparing the EPA's Risk Management Plan data for mapping.

This guide will cover how to read a dataset about US chemical facilities that report to the EPA every five years, add geographical data, and run some basic analyses on it that will help you think about visualization.

Before you get started

Here's what you'll need for this exercise:

The data source
Tools
The data you'll end up with (for cheaters)
What you should know about the data

The Environmental Protection Agency monitors about 12,000 chemical facilities under its Risk Management Plan. RMP facilities file a report with the EPA every 5 years. While you can look up individual RMP facilities, the entire dataset is only available through a public records request. The Center for Effective Government's Right-to-Know Network files a FOIA request each year in May and stores the data in its own database. The data here is up to date as of May 2013. Some facilities in this dataset may have been de-registered since the data was obtained.

Here's some data about the data:

The EPA's description of the RMP data:

Risk Management Program

Requirements There are three levels of regulation under the Risk Management Program. Program 1 is for firms that have relatively safe processes and low risk. Programs 2 and 3 are more highly regulated than Program 1. These Programs are meant for firms that have a higher risk of affecting the public in case of a chemical spill. All facilities that are listed under the Risk Management Program must complete a hazard assessment (consisting of an off-site consequence analysis and five-year accident history), implement an emergency response program, and submit a Risk Management Plan (RMP) to EPA. Most firms must also create a detailed accident prevention program that will help to prevent the accidental release of hazardous chemicals.

Four Sections of the Risk Management Program

Offsite Consequence Analysis In this section of the program, firms outline their worse-case and alternative accident release scenarios. The worse-case scenario is an unlikely scenario that describes the potential consequences of the release of the largest single vessel containing a regulated substance that produces the greatest offsite endpoint distance. The alternative scenario describes a more likely scenario for a release that could affect the public.

Five-Year Accident History A facility?s five year accident history describes all those accidents that have caused deaths, injuries, evacuations, sheltering-in-place, significant on-site or off-site property damage, or environmental damage significant on-site or off-site damage. As these criteria generally are associated with only the most serious accidents, many firms have accidental releases that are not reportable under the Risk Management Program. Only the most serious accidents are reported.

Prevention Program The RMP prevention program requirements are similar to the requirements of the Occupational Safety and Health Administration Process Safety Management (PSM) standard program. Most RMP facilities must conduct operator training, implement written operating procedures, maintain equipment, and take other accident prevention measures.

Emergency Response Program In addition to the requirements listed above, all facilities must work with their Local Emergency Planning Committees and other local responders to ensure that local responders are prepared to respond to emergencies at the facility. Facilities must also have mechanisms in place to notify local responders when a release occurs. Facilities that choose to use their own employees to respond to accidental releases must implement additional emergency response program measures, including having an emergency response plan, emergency equipment procedures, documentation of first aid and emergency medical treatment needed to treat chemical exposures, and trained emergency responders.proper measures are taken when accidents occur.

Step 1: Download the RMP facilities data

We obtained the RMP data directly from the Center for Effective Government, in the form of an Excel file that was emailed to us.

We have included that data in this repo, which you can download above. Your browser will download a xlsx file weighing roughly 5.3 MB and it will be named: RmpProcessChemsNew.xlsx.

Opening this file will boot up Microsoft Excel. I've tried importing the file into Google Spreadsheets, but the file size slowed down the program to where I switched to Excel instead.

Step 2: Run a quick data quality check

A couple of tips before you dive into the data, to save time and headache later:

Step 3: Reshape the data.

Now that you're looking at the RMP dataset, you'll notice that each row represents a unique chemical reported at a given facility, not a unique facility. For example, there are 2 rows referring facility 100000000045 and facility 100000000054, for each unique chemical:

FacilityID | FacilityName | NameOfChemical —————-|————–|—————-: 100000000045 | Yellow Breaches Water Treatment Plant | Chlorine 100000000045 | Yellow Breaches Water Treatment Plant | Public OCA Chemical 100000000054 | Sooner Cooperative, Inc | Ammonia (anhydrous) 100000000054 | Sooner Cooperative, Inc | Public OCA Chemical

What we want to map, though, is each unique facility, and we want to display the list of all chemicals reported by that facility in a tooltip. In order to do that, we need to reshape the data so that each row represents a unique facility. I couldn't find a good way to do this through SQL or R or Pivot Tables, so with help from a few generous friends (thanks, Noah!), I wrote a script that loops through the data set and creates one row per facility. For any repeated facilities, the program adds the new chemical into the chemicals column, separating each unique chemical with a semicolon within.

The result looks something like this:

FacilityID | FacilityName | NameOfChemical —————-|————–|—————-: 100000000045 | Yellow Breaches Water Treatment Plant | Chlorine; Public OCA Chemical 100000000054 | Sooner Cooperative, Inc | Ammonia (anhydrous); Public OCA Chemical

Pretty magical. Shall we give it a whirl?

I should disclose that the need to reshape the data didn't occur to me until I was already pretty far (many months) into this project. But the smart way would have been to reshape first. So that's how I'm recommending you do it here. The hindsight would have saved me a lot of time and hassle.

Step 4: Geocode your data

Facility_Name | Process | Street | City | State | Zip | Latitude | Longitude | Accidents | Total_No_Chemicals | Chemical_Names —————-|————–|—————-|—————-|————–|—————-|—————-|————–|—————-|—————-|————–: Foster Farms C St. (10/09 RMP Rev.) " " “” | 3 | 520 C Street | Turlock | CA | 95380 | 37.491 | -120.849 | 0 | 2 | Public OCA Chemical; Ammonia (anhydrous) INITIATIVE FOODS, LLC | 2 | 1117 K Street " " “” | Sanger | CA | 93657 | 36.700701 | -119.551613 | 0 | 2 | Public OCA Chemical; Ammonia (anhydrous) REC Silicon | 3 | 3322 Road N N.E. " " “” | Moses Lake | WA | 98837 | 47.135278 | -119.193333 | 0 | 3 | Public OCA Chemical; Silane; Flammable Mixture Moses Lake Plant # 80546 | 3 | 3245 N N. E. " " “” | Moses Lake | WA | 98837 | 47.133273 | -119.188444 | 0 | 2 | Public OCA Chemical; Ammonia (anhydrous) Butterfield Water Treatment Plant | 3 | 1306 W. B Street " " “” | Pasco | WA | 99301 | 46.220278 | -119.1025 | 0 | 2 | Public OCA Chemical; Chlorine Big L Packers " " “” | 2 | 12901 Packing House Road | Edison | CA | 93307 | 35.348056 | -118.867222 | 0 | 3 | Public OCA Chemical; Ammonia (anhydrous); Chlorine Endicott, WA 384 | 2 | 215 E Street " " “” | Endicott | WA | 99125 | 46.927778 | -117.686389 | 0 | 3 | Public OCA Chemical; Ammonia (anhydrous); Ammonia (conc 20% or greater) Big N Fertilizer " " “” | 2 | 1201 SW 2nd Street | Tulia | TX | 79088 | 34.535556 | -101.778611 | 0 | 2 | Public OCA Chemical; Ammonia (anhydrous) HOBART NH3 | 2 | 13160 8 ROAD " " “” | PLAINS | KS | 67869 | 37.291111 | -100.5225 | 0 | 2 | Public OCA Chemical; Ammonia (anhydrous)

Step 5: Filter out the facilities you want to map

For this project, we initially wanted to map the facilities falling under Process 2 and Process 3. Read about why, but you could map all facilities and skip this step and go straight to mapping. I'll show you how I filtered the data anyway.

Choose your geocoded CSV file, in this case facilitiesRemappedGeocoded.csv.

Here is where you tell the database manager how to treat each field. Labels, text, descriptions, addresses, are usually treated as VARCHAR. Keep in mind that VARCHAR has a 255-character limit. Anything you want the database manager to treat as a number, such as pounds, dollars, and any other values you might add, subtract, divide, multiply, or perform math on, should be treated as INT.

Don't forget to Name the table in the upper right-hand corner. I'll call it rmpAll, since I'm importing the whole database. Click Import.

CREATE TABLE rmp2and3registered SELECT * FROM rmpAll WHERE (Process = "2" OR Process = "3") AND Is_Facility_Deregistered = "n"

If you only want to display a few columns in the results, replace the * in the SELECT line with the column names, separated by commas.

Run the query.

Step 6: Export your filtered results as a new CSV

Your new table with the filtered results should show up in the Tables panel on the left-hand side. Right click on the new table then select Export… > As CSV file. (Don't mind all the table names in the screenshot below. Like I said, this tutorial benefits from hindsight; the actual process took months of trial and error.)

Step 7: Time to map (on your own)!

Congratulations! Now you have a CSV, of the registered EPA facilities under Programs 2 and 3, with geodata of your choice. Now upload this to your mapping software of choice, and go forth! Visualize away.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.