TransparentHealth/provider-data-tools

Name: provider-data-tools

Owner: Transparent Health

Description: Tools for working with CMS Health Provider Data

Forked from: HHSIDEAlab/provider-data-tools

Created: 2017-02-20 14:27:06.0

Updated: 2017-02-20 14:27:07.0

Pushed: 2016-12-19 18:13:03.0

Homepage: null

Size: 7297

Language: Python

README

pdt - Provider Data Tools

Version: 0.8.2.3

This repository contains a number of command-line utilities and related code libraries for parsing, creating, and validating US-based health provider data. These tools are:

Parsing Scripts

Indexing Scripts

Note: These scripts are only meant to be run after the data/files to be indexed have already been loaded into MongoDB.

Pulling/Loading Scripts

Utility Scripts

Please note the utilities csv2json, json2mongo, and jsondir2mongo have been moved from pdt and placed in their own package called jdt. These tools are generic and have utility outside health provider data.

Requirements

These scripts require Python >= 2.7 or Python >= 3.3

To use all of the scripts that Provider Data Tools provides, you will need to have MongoDB installed and running. See the MongoDB docs for installation instructions.

Installation

You can install the tool using pip:

~$ sudo pip install pdt

Note: If you use sudo, the scripts will be installed at the system level and used by all users. Add --upgrade to the above install instructions to ensure you fetch the newest version.

chop_nppes_public.py

To make use of this script, you first need to fetch the “NPPES Data Dissemination” file.

To obtain the “NPPES Data Dissemination”, go to http://download.cms.gov/nppes/NPI_Files.html. Get the “Full Replacement Monthly” zip file. Unzip the file with the unzip tool of your choice.

To run the utility simply call it on a command line and provide one command line argument, the csv file to parse:

~$ chop_nppes_public.py npidata_20050523-20140413.csv

The file name npidata_20050523-20140413.csv will vary depending on the date.
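
Because the file name changes with each release, it can help to read the dates out of the name instead of hard-coding it. The following is a minimal sketch, not part of pdt: the function name `parse_npidata_name` is hypothetical, and the pattern is inferred only from the example name above, so other releases may be named differently.

```python
import re
from datetime import date, datetime

# Pattern inferred from the example name npidata_20050523-20140413.csv.
# This is an assumption; other releases may follow a different scheme.
NPIDATA_NAME = re.compile(r"^npidata_(\d{8})-(\d{8})\.csv$")


def parse_npidata_name(filename):
    """Return (start_date, end_date) parsed from the file name, or None."""
    match = NPIDATA_NAME.match(filename)
    if match is None:
        return None
    start, end = [datetime.strptime(part, "%Y%m%d").date()
                  for part in match.groups()]
    return start, end
```

A caller could list the current directory, keep the names for which this function returns a value, and pass the most recent one to chop_nppes_public.py.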

The script may take a few minutes to complete. When it completes, you will have more files in your current directory. Everything is still indexed by NPI. These files are described below.

csv2pjson_public.py

Convert the NPPES Public Data Dissemination CSV file format to a directory of files in ProviderJSON format.

Usage:

csv2pjson_public.py [CSV_FILE] [OUTPUT_DIR]

Example:

csv2pjson_public.py public_csvfile.csv output

Output:

One file is created per line in the CSV file, inside the directory output. Files are fanned out into a directory structure so as not to create millions of files in one directory.
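
The fan-out idea can be illustrated with a short sketch. The layout below (one sub-directory per leading digit of the NPI, three levels deep) is an assumption chosen for illustration; the README does not state the layout csv2pjson_public.py actually uses, and `fanout_path` is a hypothetical helper.

```python
import os


def fanout_path(output_dir, npi, depth=3):
    """Build a nested path such as output/1/0/0/1003819723.json.

    Hypothetical layout: each of the first `depth` digits of the NPI
    becomes one directory level, so no single directory holds every file.
    """
    npi = str(npi)
    parts = list(npi[:depth]) + [npi + ".json"]
    return os.path.join(output_dir, *parts)
```

With ten possible digits per level, three levels spread the files over up to 1,000 leaf directories.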

csv2fhir_public.py

Convert the NPPES Public Data Dissemination CSV file format to a directory of files in FHIR format.

Usage:

csv2fhir_public.py [CSV_FILE] [OUTPUT_DIR]

Example:

csv2fhir_public.py public_csvfile.csv output

Output:

One file is created per line in the CSV file, inside the directory output. Files are fanned out into a directory structure so as not to create millions of files in one directory.

validate_pjson

Validate the PJSON for compliance with a create/update request. It returns errors and warnings in JSON to stdout.

Usage:

validate_pjson [ProviderJSON] [update|create]

Example:

validate_pjson 1003819723.json update

Example Output:

{
"errors": [
    "authorized_official_telephone_number must be in XXX-XXX-XXXX format.",
    "EIN is required for a type-2 organization provider."
],
"warnings": [
    "Enumeration date is generated by CMS. The provided value will be ignored.",
    "Last updated date is generated by CMS. The provided value will be ignored.",
    "status is determined by CMS. The provided value will be ignored."
]
}
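
Since the report is plain JSON on stdout, a calling program can decide whether a record passed by inspecting the "errors" list. This is a minimal sketch that assumes only the two keys shown in the example output above; `summarize_report` is a hypothetical helper, not part of pdt.

```python
import json


def summarize_report(report_text):
    """Summarize a validate_pjson report given as a JSON string."""
    report = json.loads(report_text)
    # Treat a missing key as an empty list (an assumption; the example
    # output above always includes both keys).
    errors = report.get("errors", [])
    warnings = report.get("warnings", [])
    return {
        "ok": not errors,  # warnings alone do not fail the record here
        "error_count": len(errors),
        "warning_count": len(warnings),
    }
```

Applied to the example output above, this would report two errors and three warnings, with "ok" set to False.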

pull_pecos.py

The script will download all or individual Public Provider Enrollment Files from https://data.cms.gov/public-provider-enrollment in CSV format. Note that wget is a requirement for this script.

Usage:

  pull_pecos.py [DOWNLOAD ALL Y/N] [DOWNLOAD BASE Y/N] [DOWNLOAD REASSIGNMENT Y/N]  [DOWNLOAD ADDRESS Y/N]

Example:

  pull_pecos.py y n n n

Example Output:

      Downloading Address CSV file
    --2016-07-12 10:39:00--  https://data.cms.gov/api/views/je57-c47h/rows.csv?accessType=DOWNLOAD
    Resolving data.cms.gov (data.cms.gov)... 216.227.229.148
    Connecting to data.cms.gov (data.cms.gov)|216.227.229.148|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/csv]
    Saving to: ‘pecos_address.csv’

    pecos_address.csv
      [                  <=>              ]   3.14M   914KB/s   
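
The four y/n arguments can be read as a selection over three files. The sketch below is an assumption about the semantics, based only on the usage line and on the example above, where `y n n n` still downloads the address file. The function `select_downloads` and the labels it returns are hypothetical, not taken from pull_pecos.py.

```python
def select_downloads(argv):
    """Map the four y/n arguments to the list of file kinds to download."""
    if len(argv) != 4:
        raise ValueError("expected four y/n arguments")
    download_all, base, reassignment, address = [
        arg.strip().lower() == "y" for arg in argv
    ]
    kinds = ["base", "reassignment", "address"]
    if download_all:
        # Assumed: the first flag overrides the three individual flags.
        return kinds
    flags = [base, reassignment, address]
    return [kind for kind, wanted in zip(kinds, flags) if wanted]
```

Under this reading, `n n n y` would fetch only the address file.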

loadnppes.py

The script loadnppes.py combines the functionality of several pdt utilities for automatic setup. It will download the public data, parse it to ProviderJSON, and load it into MongoDB in one step. Note that this script requires unzip and wget to be installed.

Usage:

loadnppes.py [PROCESS_FULL Y/N] [DOWNLOAD_FROM_PUBLIC_FILE Y/N] [DELETE FILES AFTER UPLOADED TO MONGO?]

Example:

loadnppes.py y y

Example Output:

Downloading http://nppes.viva-it.com/NPPES_Data_Dissemination_March_2015.zip
--2015-04-13 14:14:57--  http://nppes.viva-it.com/NPPES_Data_Dissemination_March_2015.zip
Resolving nppes.viva-it.com (nppes.viva-it.com)... 68.142.118.4, 68.142.118.254
Connecting to nppes.viva-it.com (nppes.viva-it.com)|68.142.118.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 514406694 (491M) [application/zip]
Saving to: `NPPES_Data_Dissemination_March_2015.zip'

0% [                                       ] 2,691,064   58.1K/s  eta 3h 38m
.
.
.

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.