newsdev/nyt_inmates

Name: nyt_inmates

Owner: NYT Newsroom Developers

Description: Methodology notes and data from the series on discipline and parole in New York State

Created: 2016-12-20 17:34:56.0

Updated: 2018-02-20 04:37:46.0

Pushed: 2016-12-22 18:55:38.0

Homepage: null

Size: 3718

Language: null

README

RACIAL DISPARITIES IN PRISON DISCIPLINE AND PAROLE

A New York Times investigation found that minority inmates in New York State prisons are punished for violating prison rules at a higher rate than white inmates, a gap that is especially pronounced at certain prisons. The investigation also found a racial disparity in parole decisions made by the New York Board of Parole, particularly among burglars, thieves and other nonviolent offenders.

For those interested in exploring the data used in the investigation, here is a detailed explanation of the methodology, along with relevant data files.

It's important to put the data analysis in the proper context. The Times investigation relied primarily on interviews with inmates and expert sources by journalists who have covered the New York justice system for years. The data analysis helped confirm and quantify the findings of this traditional form of reporting, but the issues raised and conclusions in the stories were not based on numbers alone.

It's also important to note, as the stories do, the limits of the data analysis. The underlying data, obtained from the state Department of Corrections via the Freedom of Information Law (FOIL), did not contain some important variables a researcher might want to know about the inmates. As a result, the analysis can show the extent of the disparities in discipline and parole but cannot fully explain the reasons for them. The missing variables also made it difficult to apply more advanced statistical techniques, such as regression analysis.

DISCIPLINE

Our approach was akin to calculating crime rates. We used disciplinary incident data for a single year as the numerator, and the prison population for the year as the denominator.

We did not rely on one single metric. We looked at how many disciplinary tickets were issued, how many specific violations were listed on each of those tickets, how many of the tickets resulted in solitary confinement, and how much solitary time was imposed. We looked at all metrics by race and analyzed the data separately using other variables: gender, age, facility and the type of offender.

The disparity in punishment persists in many different ways. No single subgroup drives the difference. In certain prisons, and among certain groups, the gap grows larger or smaller, but for the most part it doesn't go away.

In response to a FOIL request, the state provided a log of 59,394 adjudicated disciplinary incidents from calendar year 2015. The table included only Tier II (moderate) and Tier III (severe) incidents for which the inmate was found guilty and punished.

Each incident included a personal identifier for the inmate, which we have converted here to our own identifier to avoid posting personally identifiable information, as well as the place and time of the incident, a list of specific violations the inmate was found guilty of for that incident, and the resulting punishment, including the number of days, if any, the inmate was sentenced to solitary confinement.

The disciplinary data contained no demographic characteristics for inmates. But using the personal identifier, The Times was able to attach a number of key variables by joining the disciplinary data to other databases we had obtained from the state. A key source was a set of point-in-time prison “snapshots” that list all inmates currently incarcerated, along with demographic information about each inmate and the crime of imprisonment.

We used two of these snapshots, one from the middle of 2015 and one from the end, to create our population file, which we used as the denominator in our calculations. While the demographic characteristics of the inmate population do not change much over the course of a single year, we used two different snapshots and averaged them to smooth out any possible anomalies.

In our analysis, we collapsed the race and ethnicity of inmates into four categories: non-Hispanic white, non-Hispanic black, Hispanic and other. (Less than five percent of the inmate population is Asian, Native American or unclassified.)

We also reshaped the incident file into a relational dataset. The state provided specific violations involved in each incident as columns in the primary table (violation 1, violation 2, etc.), which made it difficult to look at the specific offenses for which inmates were being punished. So we converted this into a separate table of violations, where each of the 139,759 specific infractions became its own row.
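The reshaping described above can be sketched in Python. This is only an illustration: the wide column names (violation_1, violation_2, ...), the placeholder rule codes RULE_B and RULE_C, and the sample rows are hypothetical, since the layout of the state's original file is not published here. Only the idea of turning each listed violation into its own row, numbered in listing order, comes from the description.

```python
# Hypothetical wide-format tickets: one row per incident, violations as columns.
wide_rows = [
    {"incident_id": 45341, "violation_1": "102.1", "violation_2": "RULE_B", "violation_3": ""},
    {"incident_id": 45342, "violation_1": "RULE_C", "violation_2": "", "violation_3": ""},
]


def reshape_violations(rows, max_violations=3):
    """Turn each non-empty violation column into its own row."""
    long_rows = []
    for row in rows:
        rule_id = 0
        for n in range(1, max_violations + 1):
            rule = row.get(f"violation_{n}", "")
            if not rule:
                continue  # skip empty violation slots
            rule_id += 1  # order in which the violation was listed
            long_rows.append(
                {"incident_id": row["incident_id"], "rule_id": rule_id, "rule": rule}
            )
    return long_rows


violations = reshape_violations(wide_rows)
for violation in violations:
    print(violation)
```

Each output row keeps the incident_id, so the long table can still be joined back to the ticket it came from.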

In tandem, these two tables (the disciplinary tickets and the specific violations) provided us with raw counts of incidents that could be grouped by race, age, gender, offender type, etc.

We paid particular attention to the type of offender, to see whether, for example, people in prison for violent offenses were more likely to be punished in prison. We used three different measures. The first was the severity of the crime considered the “primary offense” leading to the inmate's imprisonment. The second was overall severity, measured by the total number of crimes committed and the combined severity of those crimes. The third placed offenders into three categories used in state reports, based on their primary crime: violent offenders, property criminals and those considered to have committed “other coercive” offenses, which are mostly lower-level versions of violent offenses.

The snapshot data, provided in a file called “inmates,” has all of the same demographic fields about inmates, so it's easy to write queries that produce a denominator matching the numerator. Users should remember, though, that because the inmates file consists of two snapshots taken at different points of the year, they should always divide the results by 2 to get a proper population estimate.

To best illustrate how our analysis worked, here are some query examples written in the SQL query language. Users should first import the .csv files provided into a database manager or other statistical software before proceeding.
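One possible way to do the import is with Python's built-in csv and sqlite3 modules. SQLite is used here only as a convenient stand-in; this document does not say which database manager The Times used. The load_csv helper and the four sample rows are made up for illustration, and the column names are assumed to match the CSV header.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fileobj):
    """Create `table` from a CSV file object, using the header row as column names."""
    reader = csv.reader(fileobj)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()


# Tiny made-up stand-in for inmates.csv. With the real file, pass
# open("inmates.csv", newline="") instead of the StringIO object.
sample = io.StringIO("id,race,age_group\n1,B,30-39\n2,W,40+\n3,B,30-39\n4,W,40+\n")
conn = sqlite3.connect(":memory:")
load_csv(conn, "inmates", sample)

# SQLite truncates integer division, so divide by 2.0 to average the two snapshots.
rows = conn.execute(
    "SELECT race, COUNT(id) / 2.0 AS inmates FROM inmates GROUP BY race ORDER BY race"
).fetchall()
print(rows)
```

Because this loader stores every value as text, numeric filters in SQLite need an explicit cast, for example CAST(current_net_shu AS INTEGER) > 0.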

One thing we observed is that the disparity in rates held for all age groups. To generate the numbers that show this, we can first query the overall inmate population, group by race and age group, and divide the count by 2 to average the two snapshots that comprise our inmate file:

SELECT race,age_group,COUNT(id)/2 as inmates FROM inmates GROUP BY 1,2 ORDER BY 1,2

| race | age_group | inmates |
| --- | --- | --- |
| B | 25-29 | 4,384 |
| B | 30-39 | 7,162 |
| B | 40+ | 9,963 |
| B | <25 | 3,736 |
| H | 25-29 | 1,903 |
| H | 30-39 | 3,614 |
| H | 40+ | 4,766 |
| H | <25 | 1,483 |
| O | 25-29 | 278 |
| O | 30-39 | 426 |
| O | 40+ | 623 |
| O | <25 | 256 |
| W | 25-29 | 2,089 |
| W | 30-39 | 3,648 |
| W | 40+ | 5,791 |
| W | <25 | 1,212 |

We then can run the same query on the discipline table to count tickets:

SELECT race, age_group, COUNT(id) as tickets FROM disciplines GROUP BY 1,2 ORDER BY 1,2

| race | age_group | tickets |
| --- | --- | --- |
| B | 25-29 | 7,358 |
| B | 30-39 | 8,004 |
| B | 40+ | 6,577 |
| B | <25 | 9,483 |
| H | 25-29 | 2,945 |
| H | 30-39 | 4,104 |
| H | 40+ | 3,259 |
| H | <25 | 3,739 |
| O | \N | 1 |
| O | 25-29 | 388 |
| O | 30-39 | 356 |
| O | 40+ | 338 |
| O | <25 | 568 |
| W | 25-29 | 2,951 |
| W | 30-39 | 3,677 |
| W | 40+ | 3,231 |
| W | <25 | 2,375 |

We can then combine the results of these two queries to generate rates:

| Age | White inmates | White tickets | White rate | Black inmates | Black tickets | Black rate | Diff |
| --- | --- | --- | --- | --- | --- | --- | --- |
| <25 | 1,212 | 2,375 | 1.96 | 3,736 | 9,483 | 2.54 | 30% |
| 25-29 | 2,089 | 2,951 | 1.41 | 4,384 | 7,358 | 1.68 | 19% |
| 30-39 | 3,648 | 3,677 | 1.01 | 7,162 | 8,004 | 1.12 | 11% |
| 40+ | 5,791 | 3,231 | 0.56 | 9,963 | 6,577 | 0.66 | 18% |
| TOTAL | 12,739 | 12,234 | 0.96 | 25,244 | 31,422 | 1.24 | 30% |
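The arithmetic behind the rate table can be reproduced with a few lines of Python. The counts are copied from the two statewide query results above. The sketch assumes that the rate is tickets divided by inmates and that the Diff column is the black rate divided by the white rate, minus one, which appears consistent with the published figures.

```python
# Counts copied from the two statewide query results (W = white, B = black).
inmates = {
    ("W", "<25"): 1212, ("W", "25-29"): 2089, ("W", "30-39"): 3648, ("W", "40+"): 5791,
    ("B", "<25"): 3736, ("B", "25-29"): 4384, ("B", "30-39"): 7162, ("B", "40+"): 9963,
}
tickets = {
    ("W", "<25"): 2375, ("W", "25-29"): 2951, ("W", "30-39"): 3677, ("W", "40+"): 3231,
    ("B", "<25"): 9483, ("B", "25-29"): 7358, ("B", "30-39"): 8004, ("B", "40+"): 6577,
}

# Tickets per inmate for each race and age group.
rates = {key: tickets[key] / inmates[key] for key in inmates}

# Relative gap: how much higher the black rate is than the white rate.
age_groups = ("<25", "25-29", "30-39", "40+")
gaps = {age: rates[("B", age)] / rates[("W", age)] - 1 for age in age_groups}

for age, gap in gaps.items():
    white, black = rates[("W", age)], rates[("B", age)]
    print(f"{age:>5}  white {white:.2f}  black {black:.2f}  diff {gap:.0%}")
```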

This shows us that the discipline rate is higher for young people whether they are white or black, but that the gap in rates persists for all age groups. It also shows us that the young inmate population skews black (3,736/1,212 under 25 vs. 9,963/5,791 for 40+), which contributes to the overall disparity but does not explain it entirely.

We can then run the same queries but limit the results to a specific facility, such as Clinton Correctional Facility, by adding “WHERE nyt_facility='Clinton'” before the GROUP BY clause of each query:

| Age | White inmates | White tickets | White rate | Black inmates | Black tickets | Black rate | Diff |
| --- | --- | --- | --- | --- | --- | --- | --- |
| <25 | 44 | 71 | 1.63 | 147 | 347 | 2.36 | 45% |
| 25-29 | 66 | 66 | 1.00 | 227 | 391 | 1.73 | 73% |
| 30-39 | 161 | 119 | 0.74 | 410 | 528 | 1.29 | 74% |
| 40+ | 360 | 168 | 0.47 | 512 | 374 | 0.73 | 56% |
| TOTAL | 630 | 424 | 0.67 | 1,295 | 1,640 | 1.27 | 88% |

This shows a much higher level of ticketing disparity than the state as a whole, mostly because whites were punished at lower rates than in other prisons (0.67 rate at Clinton, 0.96 statewide). It also shows, once again, that younger inmates are punished at higher rates than older inmates, and that younger inmates are disproportionately black (when compared to the white inmate population).

We found that the disparity in 2015 punishment seemed to grow with the severity of what we were measuring. Because minority inmates averaged more individual violations than whites each time a ticket was issued, the disparity for violations was greater than the disparity for tickets alone. To see this, let's count violations instead of tickets by joining the discipline and violation tables:

SELECT race, COUNT(id)/2 AS inmates FROM inmates GROUP BY 1

SELECT race, COUNT(violations.id) as violations FROM violations INNER JOIN disciplines ON disciplines.incident_id=violations.incident_id GROUP BY 1

| Race | Inmates | Violations | Rate | Rate vs White |
| --- | --- | --- | --- | --- |
| B | 25,244 | 76,598 | 3.03 | 47% |
| H | 11,765 | 33,113 | 2.81 | 37% |
| O | 1,582 | 3,831 | 2.42 | |
| W | 12,739 | 26,217 | 2.06 | |

The disparity was even greater for tickets that resulted in solitary confinement:

SELECT race, COUNT(id) AS shu_tickets FROM disciplines WHERE current_net_shu>0 GROUP BY 1 ORDER BY 1

| race | Inmates | shu_tickets | Rate | Rate vs White |
| --- | --- | --- | --- | --- |
| B | 25,244 | 6,641 | 0.26 | 65% |
| H | 11,765 | 3,074 | 0.26 | 64% |
| O | 1,582 | 302 | 0.19 | |
| W | 12,739 | 2,029 | 0.16 | |
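The “Rate vs White” column in the violations and solitary tables can be checked with a small helper. The counts are copied from the tables above. The function name rate_vs_white is made up for this sketch, and it assumes the column is each group's per-inmate rate divided by the white rate, minus one. It also computes a value for the “Other” group, which the tables leave blank.

```python
# Average 2015 population by race, from the inmates query.
population = {"B": 25244, "H": 11765, "O": 1582, "W": 12739}
# Numerators from the violations query and the solitary (SHU) query.
violations = {"B": 76598, "H": 33113, "O": 3831, "W": 26217}
shu_tickets = {"B": 6641, "H": 3074, "O": 302, "W": 2029}


def rate_vs_white(counts, population):
    """Return each group's per-inmate rate relative to the white rate, minus one."""
    white_rate = counts["W"] / population["W"]
    return {
        race: (counts[race] / population[race]) / white_rate - 1
        for race in counts
        if race != "W"
    }


violation_gap = rate_vs_white(violations, population)
shu_gap = rate_vs_white(shu_tickets, population)
print({race: f"{gap:.0%}" for race, gap in violation_gap.items()})
print({race: f"{gap:.0%}" for race, gap in shu_gap.items()})
```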

These queries are just examples of how we approached the data. Here is documentation for the data files we are posting so readers can write their own queries.

FILES:

disciplines.csv

This is a table of 59,394 disciplinary tickets for which a New York State prison inmate was punished in 2015. Each row is a distinct incident, and the incident_id field joins to the violations table, which includes one or more specific infractions for which the inmate was punished. The file layout, a sample record and field descriptions are below:

| Field | Sample | Description |
| --- | --- | --- |
| Id | 1 | The primary key for this table |
| inmate_id | 16144 | A unique identifier for individual inmates |
| incident_id | 45341 | A unique identifier for disciplinary tickets. Join this field to the violations table to see the specific infractions involved with individual incidents |
| Age | 33 | The inmate's age at the time of the incident |
| age_group | 30-39 | A convenience field for quick age grouping |
| Race | B | W=non-Hispanic White; B=non-Hispanic Black; H=Hispanic; O=Other |
| Sex | M | M=Male, F=Female |
| primary_crime | MURDER 2ND | While inmates can be convicted of more than one crime, this is the primary crime of conviction. |
| crime_class | A | New York felonies range from Class A, the most severe, to Class E, the least severe. Most crime types are usually placed into the same classification, but some can be bumped up to higher felony levels under certain circumstances. |
| Severity | A1 | This summarizes the severity by incorporating multiple counts and crimes into a single string. For example, A1B1 means the inmate is serving time for one count of a class A felony and one count of a class B crime; C2D4+ means two counts of a class C felony, 4 or more counts of a class D felony |
| official_crime_type | VFO | This indicates whether the primary crime is, under state standards, a violent offense (VFO), a property offense (PDO), or an “other coercive” offense (CVO), which are mostly lower-severity (3rd degree or lower) forms of violent crimes. |
| incident_date | 1/20/15 | Reported date of incident in prison |
| incident_time | 1520 | Reported time of incident |
| incident_facility | UPSTATE SHU | Facility name listed in state records. This includes the name of the prison and whether the inmate was in a specific program within the prison. For example, some prisons have an annex, or a drug treatment program. |
| facility_id | 46 | This is a New York Times ID for facilities |
| nyt_facility | Upstate | This is a New York Times standardized name for facilities |
| program_id | 121 | This ID is unique for specific programs within prisons |
| tier | 3 | This indicates whether the disciplinary incident was moderate (Tier II) or severe (Tier III) |
| current_net_shu | 30 | This shows how many days of solitary confinement, if any, the inmate was sentenced to as a result of this incident. To count tickets where the inmate was sent to solitary, filter on current_net_shu>0 |

Violations.csv

This table lists one or more violations associated with every disciplinary ticket. It joins to the disciplines table via the incident_id field. The original source data listed each violation horizontally across the disciplinary records. Reshaping the data makes it easier to focus on specific violations. In reshaping, the first violation listed was given a rule_id of 1, the second, 2, and so on. The Times analysis found that white inmates generally racked up fewer violations, and far fewer in certain categories.

| Field | Sample | Description |
| --- | --- | --- |
| id | 1 | Primary key for the table |
| incident_id | 45341 | A unique identifier for disciplinary tickets, joins to the disciplines table |
| incident_date | 1/20/15 | The date of the incident |
| rule_id | 1 | This shows the order in which the violation was listed on the disciplinary ticket. In queries that simultaneously count tickets and violations, sum all records for violations but for tickets count only those where rule_id=1. |
| rule | 102.1 | This is the administrative code for the rule that was violated. See http://www.legal-aid.org/media/121933/standards-of-inmate-behavior%20(2).pdf |
| rule_label | THREATS | This is the description of the violation. See the link above for a more robust description. |
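The rule_id note above suggests a single query that counts tickets and violations together. Here is a minimal sketch using Python's built-in sqlite3 module and a handful of made-up rows; the reduced table layouts are an assumption that keeps only the fields the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE disciplines (id INTEGER PRIMARY KEY, incident_id INTEGER, race TEXT);
CREATE TABLE violations (id INTEGER PRIMARY KEY, incident_id INTEGER, rule_id INTEGER);
-- Made-up rows for illustration only.
INSERT INTO disciplines VALUES (1, 100, 'B'), (2, 101, 'W'), (3, 102, 'B');
INSERT INTO violations VALUES
    (1, 100, 1), (2, 100, 2),
    (3, 101, 1),
    (4, 102, 1), (5, 102, 2), (6, 102, 3);
""")

# Every joined row is one violation; only rows with rule_id = 1 mark a new ticket.
query = """
SELECT d.race,
       SUM(CASE WHEN v.rule_id = 1 THEN 1 ELSE 0 END) AS tickets,
       COUNT(v.id) AS violations
FROM violations v
INNER JOIN disciplines d ON d.incident_id = v.incident_id
GROUP BY d.race
ORDER BY d.race
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Counting rule_id = 1 avoids double-counting a ticket once the join has expanded it into several violation rows.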

Inmates.csv

This table provides the denominator for rate calculations. It combines the rosters of all inmates in New York prisons on May 31, 2015, and January 1, 2016. Averaged together, the two rosters represent the 2015 population for the system and for individual institutions. Because this file combines two different inmate rosters, users must divide any counts by 2 to get a proper count of any demographic group being queried.

| Field | Sample | Description |
| --- | --- | --- |
| Id | 1 | Primary key for the table |
| nyt_facility | Upstate | This is a New York Times standardized name for facilities |
| facility_id | 46 | This is a New York Times ID for facilities |
| facility_name | UPSTATE SHU | Facility name listed in state records. This includes the name of the prison and whether the inmate was in a specific program within the prison. |
| program_id | 121 | This ID is unique for specific programs within prisons |
| Sex | M | M=Male, F=Female |
| Age | 34 | The inmate's age at the time of the snapshot (either the middle or end of 2015) |
| age_group | 30-39 | A convenience field for quick age grouping |
| Race | B | W=non-Hispanic White; B=non-Hispanic Black; H=Hispanic; O=Other |
| primary_crime | MURDER 2ND | While inmates can be convicted of more than one crime, this is the first listed in state records |
| crime_class | A | New York felonies range from Class A, the most severe, to Class E, the least severe. Most crime types are usually the same classification but some can be bumped up under certain circumstances. |
| Severity | A1 | This summarizes the severity by incorporating multiple counts and crimes into a single string. For example, A1B1 means the inmate is serving time for one count of a class A felony and one count of a class B crime; C2D4+ means two counts of a class C felony, 4 or more counts of a class D felony |
| official_crime_type | VFO | This indicates whether the primary crime is, under state standards, a violent offense (VFO), a property offense (PDO), or an “other coercive” offense (CVO), which are mostly lower-severity (3rd degree or lower) forms of violent crimes. |

PAROLE

The analysis of decisions by the New York Board of Parole differed from the discipline analysis in two main ways. First, the source data was downloaded from the agency's website rather than obtained via FOIL. Second, the initial analysis showed that the racial disparity in decisions did not persist for every group, but was concentrated in cases involving lower-level offenders.

The board does not provide an easy-to-use download link for anyone who wants to study cases, so The Times instead programmatically downloaded decisions from the board's calendar. The analysis considered all cases posted from May 2013 through May 2016.

That data includes decisions for various hearing types, but The Times focused on the initial hearings that parole-eligible inmates have upon completing their minimum sentence. For the bulk of inmates, this hearing represents the earliest possible release date, and success or failure here has a major impact on the amount of time the inmate ultimately serves.

We further whittled down the dataset by eliminating cases where no decision was rendered, which are listed in the data as “or other” decisions. Often, such a case is postponed because of missing paperwork or some other administrative snafu.

This left us with cases where the interview type was listed as “initial” and the decision was actually rendered, listed as either “denied,” “open date” (parole granted) or “paroled.” We also decided to focus on male inmates. While there was also a racial disparity for women, the overall release rate was much higher, suggesting women have a totally different experience with the board than men.
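The filtering steps described above might look like the following in Python. The field names (interview_type, decision, sex) and the sample records are hypothetical, since the layout of the board's calendar data is not documented here. Only the decision labels and the three filters come from the text.

```python
# Decision labels named in the text; "open date" means parole was granted.
RELEASED = {"open date", "paroled"}
RENDERED = RELEASED | {"denied"}


def core_cases(records):
    """Keep initial hearings of male inmates where a decision was actually rendered."""
    kept = []
    for rec in records:
        decision = rec["decision"].strip().lower()
        if rec["interview_type"].strip().lower() != "initial":
            continue  # other hearing types are out of scope
        if decision not in RENDERED:
            continue  # postponed or otherwise undecided cases
        if rec["sex"] != "M":
            continue  # the analysis focused on male inmates
        kept.append({**rec, "released": decision in RELEASED})
    return kept


# Made-up records for illustration only.
sample = [
    {"interview_type": "INITIAL", "decision": "DENIED", "sex": "M"},
    {"interview_type": "INITIAL", "decision": "OPEN DATE", "sex": "M"},
    {"interview_type": "INITIAL", "decision": "OR OTHER", "sex": "M"},
    {"interview_type": "REAPPEAR", "decision": "PAROLED", "sex": "M"},
    {"interview_type": "INITIAL", "decision": "PAROLED", "sex": "F"},
]
cases = core_cases(sample)
print(len(cases), [case["released"] for case in cases])
```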

Within this core dataset, we at first observed that the disparity was particularly large for offenders who had been convicted of felonies that in New York State are classified as “C”, “D” or “E”. These are less serious than “A” or “B” felonies, which consist mostly of violent crimes.

When we later attached the state's official crime code table (see page 34, http://www.doccs.ny.gov/Research/Reports/2016/Statistical_Overview_2015_Discharges.pdf), we saw that the board rarely paroled violent offenders of any race, but that there was a racial disparity among inmates who had committed property crimes.

In the article, we use third-degree burglary as an example of this, but it also applies to inmates doing time for larceny and lower-level robbery charges. These inmates, by the way, comprise a significant share of the board's work. Because of changes in sentencing laws, a growing share of violent offenders are no longer parole eligible, and if they are, they have to serve more years behind bars before reaching their initial hearing date.

The Times had also requested from the state variables from the COMPAS system, an algorithm that synthesizes numerous characteristics about each inmate into a set of “risk” scores. These variables may have provided additional clarity, since they include such important factors as the inmate's full criminal history and complete disciplinary record while behind bars. The state, citing inmate confidentiality, refused to release these variables.

Here are some examples of how we queried the data:

To get the overall release rate by race:

SELECT race, COUNT(id) AS hearings, SUM(IF(decision='yes',1,0)) AS releases, SUM(IF(decision='yes',1,0))/ COUNT(id) AS release_rate FROM paroles GROUP BY 1 ORDER BY 1

| race | hearings | releases | release_rate |
| --- | --- | --- | --- |
| B | 5,874 | 870 | 14.8% |
| H | 3,004 | 475 | 15.8% |
| O | 454 | 104 | 22.9% |
| W | 4,544 | 1,124 | 24.7% |

One way to break this down is by offender type, which shows that most of the disparity occurs among lower-level property offenders:

SELECT race, official_crime_type, COUNT(id) AS hearings, SUM(IF(decision='yes',1,0)) AS releases, SUM(IF(decision='yes',1,0))/ COUNT(id) AS release_rate FROM paroles GROUP BY 1,2 ORDER BY 2,1

| race | official_crime_type | hearings | releases | release_rate |
| --- | --- | --- | --- | --- |
| B | CVO | 1,135 | 167 | 14.7% |
| H | CVO | 559 | 85 | 15.2% |
| O | CVO | 95 | 20 | 21.1% |
| W | CVO | 811 | 147 | 18.1% |
| | | | | |
| B | PDO | 2,663 | 482 | 18.1% |
| H | PDO | 1,487 | 276 | 18.6% |
| O | PDO | 263 | 73 | 27.8% |
| W | PDO | 2,964 | 895 | 30.2% |
| | | | | |
| B | VFO | 2,076 | 221 | 10.7% |
| H | VFO | 958 | 114 | 11.9% |
| O | VFO | 96 | 11 | 11.5% |
| W | VFO | 768 | 82 | 10.7% |
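The IF() function in the parole queries is not available in every database. A more portable variant uses CASE WHEN, sketched here with Python's built-in sqlite3 module and a few made-up rows. Two details are assumptions specific to this sketch: UPPER() is applied because SQLite compares text case-sensitively and the sample record in the file layout shows the decision in upper case, and the sum is multiplied by 1.0 because SQLite truncates integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paroles (
    id INTEGER PRIMARY KEY, race TEXT, official_crime_type TEXT, decision TEXT
);
-- Made-up rows for illustration only.
INSERT INTO paroles VALUES
    (1, 'B', 'PDO', 'NO'), (2, 'B', 'PDO', 'YES'), (3, 'W', 'PDO', 'YES'),
    (4, 'W', 'PDO', 'YES'), (5, 'W', 'VFO', 'NO'), (6, 'B', 'PDO', 'NO'),
    (7, 'B', 'PDO', 'NO');
""")

query = """
SELECT race,
       official_crime_type,
       COUNT(id) AS hearings,
       SUM(CASE WHEN UPPER(decision) = 'YES' THEN 1 ELSE 0 END) AS releases,
       SUM(CASE WHEN UPPER(decision) = 'YES' THEN 1 ELSE 0 END) * 1.0 / COUNT(id)
           AS release_rate
FROM paroles
GROUP BY race, official_crime_type
ORDER BY official_crime_type, race
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```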

FILE: paroles.csv

| Field | Sample | Description |
| --- | --- | --- |
| id | 1 | Primary key for the table |
| age | 40 | Age at time of hearing |
| age_group | 40+ | A convenience field for quick age grouping |
| race | H | W=white non-Hispanic, B=black non-Hispanic, H=Hispanic, O=Other |
| dins | 2 | Number of prison terms for this inmate for distinct crimes (does not include prison stints for parole violations) |
| primary_crime | ATT MURDER-2 | Primary crime listed on the parole calendar |
| crime_class | B | New York felonies range from Class A, the most severe, to Class E, the least severe. Most crime types are usually the same classification but some can be bumped up under certain circumstances. |
| min_years | 20 | Minimum sentence in years |
| min_months | 0 | Additional months of minimum sentence |
| max_years | 40 | Maximum sentence in years (99=Life) |
| max_months | 0 | Additional months of maximum sentence |
| severity | B1B1D2 | This summarizes the severity by incorporating multiple counts and crimes into a single string. For example, A1B1 means the inmate is serving time for one count of a class A felony and one count of a class B crime; C2D4+ means two counts of a class C felony, 4 or more counts of a class D felony |
| official_crime_type | VFO | This indicates whether the primary crime is, under state standards, a violent offense (VFO), a property offense (PDO), or an “other coercive” offense (CVO), which are mostly lower-severity (3rd degree or lower) forms of violent crimes. |
| decision | NO | Whether the Parole Board voted to release the inmate |
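The severity strings and split sentence fields are easier to analyze once decoded. The helpers below are a sketch, not part of the published data: the function names are made up, and the parser assumes every entry is a class letter from A to E followed by a count and an optional plus sign, as in the examples in the table.

```python
import re

# One entry = felony class letter, count, optional "+" meaning "or more".
_ENTRY = re.compile(r"([A-E])(\d+)(\+?)")


def parse_severity(severity):
    """Split a string like 'C2D4+' into (felony_class, counts, or_more) tuples."""
    return [(cls, int(n), plus == "+") for cls, n, plus in _ENTRY.findall(severity)]


def total_months(years, months, life_code=99):
    """Combine the year and month fields; return None for a life sentence (99)."""
    if years == life_code:
        return None
    return years * 12 + months


print(parse_severity("B1B1D2"))
print(parse_severity("C2D4+"))
print(total_months(20, 0), total_months(40, 0), total_months(99, 0))
```

The life-sentence code applies only to max_years according to the table, so total_months(99, 0) returning None is meaningful for the maximum sentence fields.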

For any questions or comments, contact me at rgebeloff@nytimes.com or @gebeloffnyt on Twitter.

