CD2H gitForager

neo4j-examples/health-graph

Name: health-graph

Owner: Neo4j Examples

Description: Graph of health and pharm data.

Created: 2016-06-15 21:35:21.0

Updated: 2018-03-06 12:02:59.0

Pushed: 2016-10-15 06:11:54.0

Homepage: null

Size: 1924

Language: Python

GitHub Committers

User	Most Recent Commit	# Commits

Other Committers

User	Email	Most Recent Commit	# Commits

README

health-graph

Health-Graph schema Click to check the detailed documentation of the schema. I blog post about this project where you can find topic about Modeling data in Neo4j, ETL XML data using APOC.

follow me at:

Twitter

email: shiyaqi29@hotmail.com

Data Source

All the data for this project are avalible online and you can download or get the detailed data documentation by clicking on the link.

xlsx_2_csv.py

converts xlsx to csv

meter: {xls} the path of xls/xlsx file. {target} the path of CSV file. 

function converts xlsx file to csv format. 

tring_conerter.py
fines functions which are used to process string values in a list of dictionary. It is part of the proccedure in creating relationship for nodes when using fuzzy string matching**

emove_non_alphaNumerics(lst, key)```

ort_strings(lst, key)```

Load_disclosure.py

ETL disclosure xml files

get_property functions:

Disclosure_property (file)


LobbyFirm_proferty (file)


Client_property (file)


Issue_property(file)


Lobbyist_property(file)

Parameter: {file}: the path to the Lobbying Disclosures XML file.

The function calls APOC procedure ?apoc.load.xml? to extract child elements in the Disclosure files, then store the values in dictionary which will be passed to create_node functions as node properties.

create_node functions:

te_Disclousure_node (properties, file)

te_LobbyFirm_node(properties)

te_Client_node(properties)

te_Issue_node(properties)

te_lobbyist_node(properties, issueID)

Parameter: {file}: the path to the Lobbying Disclosures XML file.

{properties}: a dictionary of properties.

{issueID}: the internal Id of Issue node

The function calls Cypher CREATE or MERGE to create nodes with properties. Index are created. The function returns the internal id of the node being created.

Load_contribution.py

ETL contribution xml files

r_type(file)

The function takes a contribution xml file to check the contribution filer type. It returns a string: either ?L? or ?O?. (?L?- lobbyist, ?O?- Lobby Firm)

contribution(file)

The function takes a contribution xml file to check if the contribution is empty. It returns a boolean.

get_property_cb functions:

LobbyFirm_property_cb (file)

Lobbyist_property_cb (file)

contribution_property_cb(file)

committee_property_cb(file)

legislator_property_cb(file)

Parameter: {file}: the path to the Lobbying Contribution XML files.

The function calls APOC procedure ?apoc.load.xml? to extract child elements in the Contribution files, then store the values in dictionary which will be passed to create_node functions as node properties.

create_NODE_node_cb functions:

te_LobbyFirm_node_cb(properties, file)

te_Lobbyist_node_cb(properties)

te_contribution_node_cb(property_lst)

te_committee_node(property_lst, contributionID)

te_legislator_node(property_lst, committeeID)

te_contributor_node(property, contribution_id )

Parameter: {file}: the path to the Lobbying Contribution XML files.

{properties}: a dictionary of properties.

{contributionID}: the internal Id of contribution node

The function calls Cypher CREATE or MERGE to create nodes with properties. Index are created. The function returns the internal id of the node being created.

ributerType(file)

The function takes a contribution file to extract contributor type for a contribution. The function returns a dictionary to store contributor type and contribution number. If contributor type is ?Self?, create [:FILED{self:1}] between a filer and a contribution, a filer is either a lobbyist or lobbyFirm(refer to filer_type(file) function). If contributor type is not ?Self?, create a Contributor node to store this information, set [:FILED{self:0}] between a filer and a contribution.

Load_drug.py

ETL drug txt file

te_Drug_node(file)

Take the drug txt file to extract properties and create the drug node.

Load_drugfirm.py

ETL drug manufacture txt file

te_DrugFirm_node(file, g)

Take the drug manufacture txt file to extract properties and create the drug node.

Load_prescription.py

ETL prescription csv file

te_prescription_node(file, g)

Take the prescription csv file to extract properties and create the prescription node.

Load_provider.py

ETL provider csv file

te_provider_node(file, g)

Take the provider csv file to extract properties and create the provider node.

drug_df_rel.py

Create drugfirm-[:BRANDS]->(drug) by doing fuzzy match.

Stored drug.labelerNames, drug internal IDs in a list of dictionaries:

[{laberName1: name1, id: id1}..{laberNameX: nameX, id: idX}]

Stored drugfirm.firmName. drugfirm internal ID in a list of dictionaries:

[{firmName1: name1, id: id1}..{lfirmNameX: nameX, id: idX}]

Processed name values in these 2 lists:

lowering case
remove non alphanumeric marks
chop the end with ?s?
ordered words in the names

Returned unique values in the list with node id aggregated for the same value. Example:

[{name1: id1, id2, id3.. idX}, {name2 : id5}]

Calling fuzzywussy package to do string matching between labeler name and firmname. Result is stored in array such:

[[[id(drug1)], [id(matched drugfirm1), ?,id(matched drugfirmX)]],

[[id(drug2)], [id(matched drugfirm1), ?,id(matched drugfirmX)]],

…

[[id(drugX)], [id(matched drugfirm1), ?,id(matched drugfirmX)]]]

Create relations between drug and drugFirm by using the node id.

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.