Name: health-graph
Owner: Neo4j Examples
Description: Graph of health and pharm data.
Created: 2016-06-15 21:35:21.0
Updated: 2018-03-06 12:02:59.0
Pushed: 2016-10-15 06:11:54.0
Homepage: null
Size: 1924
Language: Python
The schema is described in the detailed schema documentation. I also wrote a blog post about this project, which covers modeling data in Neo4j and ETL of XML data using APOC.
Follow me at:
email: shiyaqi29@hotmail.com
All the data for this project are available online. You can download the data or read the detailed data documentation by following the links.
'FDA Drug Codes' txt
converts xlsx to csv
Parameter: {xls}: the path of the xls/xlsx file. {target}: the path of the CSV file.
The function converts an xlsx file to csv format.
string_conerter.py
**Defines functions which are used to process string values in a list of dictionaries. It is part of the procedure for creating relationships between nodes when using fuzzy string matching.**
```remove_non_alphaNumerics(lst, key)```
```sort_strings(lst, key)```
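The source does not show the bodies of these helpers, so the following is only a minimal sketch of how such string cleaning might look. The names `strip_non_alphanumerics` and `sort_words` are hypothetical stand-ins, and the behavior (replacing punctuation with spaces, then lowercasing and sorting the words so that word order does not affect matching) is an assumption, not the project's confirmed logic.

```python
import re


def strip_non_alphanumerics(lst, key):
    """Return copies of the dicts in lst with punctuation removed from item[key]."""
    cleaned = []
    for item in lst:
        new_item = dict(item)  # copy so the caller's data is not mutated
        # Replace every run of non-alphanumeric characters with one space.
        text = re.sub(r"[^A-Za-z0-9]+", " ", str(item[key]))
        new_item[key] = " ".join(text.split())
        cleaned.append(new_item)
    return cleaned


def sort_words(lst, key):
    """Return copies of the dicts in lst with the words of item[key] lowercased and sorted."""
    result = []
    for item in lst:
        new_item = dict(item)
        new_item[key] = " ".join(sorted(str(item[key]).lower().split()))
        result.append(new_item)
    return result


# Hypothetical sample row; the key names are illustrative only.
rows = [{"name": "Pfizer, Inc.", "id": 1}]
normalized = sort_words(strip_non_alphanumerics(rows, "name"), "name")
```

Normalizing both name lists the same way before matching means that differences in punctuation or word order are less likely to hide a genuine match.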
ETL disclosure xml files
get_property functions:
Disclosure_property (file)
LobbyFirm_proferty (file)
Client_property (file)
Issue_property(file)
Lobbyist_property(file)
Parameter: {file}: the path to the Lobbying Disclosures XML file.
Each function calls the APOC procedure "apoc.load.xml" to extract child elements from the Disclosure files, then stores the values in a dictionary, which is passed to the create_node functions as node properties.
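As a rough illustration of the extraction step, the sketch below pairs a Cypher call to `apoc.load.xml` with a helper that turns the returned map into a flat property dictionary. It assumes the map uses APOC's `_type`, `_text`, and `_children` keys; the helper name `leaf_properties` and the element names in the sample are hypothetical, and the query is only defined here, not executed.

```python
# Cypher that asks APOC to parse the XML file given by the $file parameter.
LOAD_XML_QUERY = "CALL apoc.load.xml($file) YIELD value RETURN value"


def leaf_properties(value):
    """Collect {element name: text} for the direct children that have no children."""
    props = {}
    for child in value.get("_children", []):
        if "_children" in child:
            continue  # nested elements would be handled by their own function
        props[child["_type"]] = child.get("_text")
    return props


# Hypothetical shape of one returned map, for illustration only.
sample_value = {
    "_type": "filing",
    "_children": [
        {"_type": "organizationName", "_text": "Acme Lobbying"},
        {"_type": "clientName", "_text": "Example Client"},
        {"_type": "lobbyists", "_children": [{"_type": "lobbyist"}]},
    ],
}
properties = leaf_properties(sample_value)
```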
create_node functions:
create_Disclousure_node (properties, file)
create_LobbyFirm_node(properties)
create_Client_node(properties)
create_Issue_node(properties)
create_lobbyist_node(properties, issueID)
Parameter: {file}: the path to the Lobbying Disclosures XML file.
{properties}: a dictionary of properties.
{issueID}: the internal Id of Issue node
Each function calls Cypher CREATE or MERGE to create nodes with properties. Indexes are created. The function returns the internal id of the node that was created.
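To make the MERGE step concrete, here is a small sketch that only builds a parameterized Cypher statement and its parameter map; it does not talk to a database. The function name `build_merge_query`, the label `Client`, and the key `clientID` are hypothetical, and the actual project may construct its queries differently.

```python
def build_merge_query(label, key, properties):
    """Build a MERGE statement keyed on one property; returns (query, parameters)."""
    # Labels and property keys cannot be passed as Cypher parameters,
    # so they are validated before being interpolated into the text.
    if not (label.isidentifier() and key.isidentifier()):
        raise ValueError("label and key must be plain identifiers")
    if key not in properties:
        raise KeyError(key)
    query = (
        f"MERGE (n:{label} {{{key}: $props.{key}}}) "
        "SET n += $props "
        "RETURN id(n) AS node_id"
    )
    return query, {"props": properties}


# Hypothetical label, key, and values, for illustration only.
query, params = build_merge_query(
    "Client", "clientID", {"clientID": "42", "name": "Example Client"}
)
```

The returned pair can then be handed to whichever Neo4j client the project uses. MERGE on a single identifying property avoids duplicate nodes when the same entity appears in several files, while `SET n += $props` fills in the remaining properties.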
ETL contribution xml files
filer_type(file)
The function takes a contribution xml file and checks the contribution filer type. It returns a string: either "L" or "O" ("L": lobbyist, "O": lobby firm).
contribution(file)
The function takes a contribution xml file to check if the contribution is empty. It returns a boolean.
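The two checks above could be sketched with the standard library as follows. This is not the project's code: it parses the XML locally with `xml.etree.ElementTree` instead of APOC, and the element names `filerType` and `noContributions`, the root element, and the helper name `is_empty_contribution` are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def filer_type(file):
    """Return 'L' (lobbyist) or 'O' (lobby firm) for a contribution XML file."""
    root = ET.parse(file).getroot()
    # 'filerType' is an assumed element name.
    value = (root.findtext(".//filerType") or "").strip().upper()
    if value not in ("L", "O"):
        raise ValueError(f"unexpected filer type: {value!r}")
    return value


def is_empty_contribution(file):
    """Return True when the file reports that there are no contributions."""
    root = ET.parse(file).getroot()
    # 'noContributions' is an assumed element name holding 'true' or 'false'.
    flag = (root.findtext(".//noContributions") or "").strip().lower()
    return flag == "true"


# Hypothetical sample document; ET.parse accepts a path or a file object.
SAMPLE_XML = (
    "<CONTRIBUTIONDISCLOSURE>"
    "<filerType>O</filerType>"
    "<noContributions>true</noContributions>"
    "</CONTRIBUTIONDISCLOSURE>"
)
sample_type = filer_type(io.StringIO(SAMPLE_XML))
sample_empty = is_empty_contribution(io.StringIO(SAMPLE_XML))
```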
get_property_cb functions:
LobbyFirm_property_cb (file)
Lobbyist_property_cb (file)
contribution_property_cb(file)
committee_property_cb(file)
legislator_property_cb(file)
Parameter: {file}: the path to the Lobbying Contribution XML files.
Each function calls the APOC procedure "apoc.load.xml" to extract child elements from the Contribution files, then stores the values in a dictionary, which is passed to the create_node functions as node properties.
create_NODE_node_cb functions:
create_LobbyFirm_node_cb(properties, file)
create_Lobbyist_node_cb(properties)
create_contribution_node_cb(property_lst)
create_committee_node(property_lst, contributionID)
create_legislator_node(property_lst, committeeID)
create_contributor_node(property, contribution_id )
Parameter: {file}: the path to the Lobbying Contribution XML files.
{properties}: a dictionary of properties.
{contributionID}: the internal Id of contribution node
Each function calls Cypher CREATE or MERGE to create nodes with properties. Indexes are created. The function returns the internal id of the node that was created.
contributerType(file)
The function takes a contribution file and extracts the contributor type for each contribution. It returns a dictionary that stores the contributor type and the contribution number. If the contributor type is "Self", create [:FILED{self:1}] between a filer and a contribution; a filer is either a lobbyist or a lobby firm (see the filer_type(file) function). If the contributor type is not "Self", create a Contributor node to store this information and set [:FILED{self:0}] between the filer and the contribution.
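The branching rule described above can be expressed as a small pure function. This is a sketch only: `plan_filed_relationship` and the `name` property on the Contributor node are hypothetical, and the real function writes to Neo4j rather than returning a plan.

```python
def plan_filed_relationship(contributor_type, contributor_name):
    """Decide the FILED relationship properties and whether a Contributor node is needed."""
    is_self = contributor_type.strip().lower() == "self"
    plan = {
        # Properties for the [:FILED] relationship between filer and contribution.
        "filed_props": {"self": 1 if is_self else 0},
        # A separate Contributor node is only needed when the filer is not the contributor.
        "contributor_node": None,
    }
    if not is_self:
        plan["contributor_node"] = {"name": contributor_name}
    return plan


self_plan = plan_filed_relationship("Self", "Jane Doe")
other_plan = plan_filed_relationship("PAC", "Example PAC")
```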
ETL drug txt file
create_Drug_node(file)
Takes the drug txt file, extracts properties, and creates the drug nodes.
ETL drug manufacture txt file
create_DrugFirm_node(file, g)
Takes the drug manufacture txt file, extracts properties, and creates the drug firm nodes.
ETL prescription csv file
create_prescription_node(file, g)
Takes the prescription csv file, extracts properties, and creates the prescription nodes.
ETL provider csv file
create_provider_node(file, g)
Take the provider csv file to extract properties and create the provider node.
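For the delimited-file loaders, the extraction half might look like the sketch below, which reads each row into a property dictionary with `csv.DictReader`. The helper name `read_property_rows` and the column names `npi` and `specialty` are hypothetical; node creation is left out, and a tab delimiter is assumed to suit the txt files.

```python
import csv
import io


def read_property_rows(file, delimiter=","):
    """Read a delimited file object into a list of property dicts, dropping empty values."""
    rows = []
    for row in csv.DictReader(file, delimiter=delimiter):
        props = {
            k.strip(): v.strip()
            for k, v in row.items()
            # Skip overflow columns (key None), short rows (value None), and blanks.
            if k is not None and v is not None and v.strip()
        }
        rows.append(props)
    return rows


# Hypothetical sample data; pass delimiter="\t" for tab-separated txt files.
sample_csv = io.StringIO("npi,specialty\n123, Cardiology \n456,\n")
provider_rows = read_property_rows(sample_csv)
```

Dropping empty values keeps blank strings from being stored as node properties.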
Create (drugfirm)-[:BRANDS]->(drug) by fuzzy matching.
Store drug.labelerName values and drug internal IDs in a list of dictionaries:
[{labelerName1: name1, id: id1}..{labelerNameX: nameX, id: idX}]
Store drugfirm.firmName values and drugfirm internal IDs in a list of dictionaries:
[{firmName1: name1, id: id1}..{firmNameX: nameX, id: idX}]
Process the name values in these 2 lists:
Return the unique values in each list, with the node IDs aggregated for the same value. Example:
[{name1: [id1, id2, id3, …, idX]}, {name2: [id5]}]
Call the fuzzywuzzy package to do string matching between labeler names and firm names. The result is stored in an array such as:
[[[id(drug1)], [id(matched drugfirm1), …, id(matched drugfirmX)]],
[[id(drug2)], [id(matched drugfirm1), …, id(matched drugfirmX)]],
…
[[id(drugX)], [id(matched drugfirm1), …, id(matched drugfirmX)]]]
Create relationships between drug and drugFirm nodes by using the node IDs.
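The matching steps above can be sketched end to end in plain Python. Note the substitution: this example uses `difflib.SequenceMatcher` from the standard library as a stand-in for fuzzywuzzy so that it runs without extra packages. The helper names, the 0.9 threshold, and the sample names and IDs are all assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def aggregate_ids(rows, name_key):
    """Group node IDs by name: {name: [id, id, ...]}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[name_key]].append(row["id"])
    return dict(grouped)


def match_names(drugs_by_name, firms_by_name, threshold=0.9):
    """Return [[drug_ids, matched_firm_ids], ...] for names scoring at or above threshold."""
    matches = []
    for drug_name, drug_ids in drugs_by_name.items():
        firm_ids = []
        for firm_name, ids in firms_by_name.items():
            # ratio() is a similarity score in [0, 1]; 1.0 means identical strings.
            score = SequenceMatcher(None, drug_name.lower(), firm_name.lower()).ratio()
            if score >= threshold:
                firm_ids.extend(ids)
        if firm_ids:
            matches.append([drug_ids, firm_ids])
    return matches


# Hypothetical sample data, already normalized.
drugs = [
    {"labelerName": "pfizer inc", "id": 1},
    {"labelerName": "pfizer inc", "id": 2},
    {"labelerName": "bayer ag", "id": 3},
]
firms = [{"firmName": "pfizer inc", "id": 10}, {"firmName": "novartis", "id": 11}]

drugs_by_name = aggregate_ids(drugs, "labelerName")
firms_by_name = aggregate_ids(firms, "firmName")
matched = match_names(drugs_by_name, firms_by_name)
```

Aggregating IDs by unique name first means each distinct name pair is compared once, rather than once per node, which matters when many drugs share a labeler.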