NCATS-Tangerine/kgx

Name: kgx

Owner: NCATS Data Translator Project - Tangerine Team

Description: knowledge graph exchange tools

Created: 2018-04-24 02:09:07.0

Updated: 2018-05-09 23:41:22.0

Pushed: 2018-05-09 23:41:23.0

Homepage:

Size: 82

Language: Python

README

knowledge graph interchange

A utility library and set of command line tools for exchanging data in knowledge graphs.

The tooling here is partly generic but intended primarily for building the translator-knowledge-graph.

For additional background see the Translator Knowledge Graph Drive

Installation
pip install -r requirements.txt
python setup.py install
Command Line Usage

Use the --help flag to get help. Right now there is a single command:

Usage: kgx dump [OPTIONS] [INPUT]... OUTPUT

  Transforms a knowledge graph from one representation to another
  INPUT  : any number of files or endpoints
  OUTPUT : the output file

Options:
  --input-type TEXT   Extention type of input files: ttl, json, csv, rq, tsv,
                      graphml
  --output-type TEXT  Extention type of output files: ttl, json, csv, rq, tsv,
                      graphml
  --help              Show this message and exit.

CSV and TSV representations require two files: one for the vertex set and one for the edge set. JSON, TTL, and GraphML files represent a whole graph in a single file. For this reason, CSV/TSV output is bundled into a single .tar archive.
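
The bundling step described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not kgx's actual implementation: the function name `bundle_csv` and the member names `nodes.csv` and `edges.csv` are hypothetical.

```python
import csv
import io
import tarfile


def bundle_csv(nodes, edges, archive_path):
    """Write node and edge tables as two CSV members of one .tar archive.

    `nodes` and `edges` are lists of dicts. The member names used below
    are illustrative assumptions, not the names kgx writes.
    """
    with tarfile.open(archive_path, "w") as tar:
        for member_name, rows in (("nodes.csv", nodes), ("edges.csv", edges)):
            # Header is the union of all keys, in first-seen order.
            fieldnames = list(dict.fromkeys(k for row in rows for k in row))
            text = io.StringIO()
            writer = csv.DictWriter(text, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
            data = text.getvalue().encode("utf-8")
            # Add the in-memory CSV to the archive without touching disk.
            info = tarfile.TarInfo(name=member_name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return archive_path
```

Keeping both tables in one archive means a CSV/TSV export is still a single output file, like the JSON, TTL, and GraphML cases.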

The format is inferred from the file extension. When that is not possible, use the --input-type and --output-type flags to specify the formats explicitly. Not all conversions are currently supported.
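
The inference rule above might look roughly like the following. This is a minimal sketch, not the kgx source: `resolve_format` and `KNOWN_FORMATS` are hypothetical names, and the set of formats is simply copied from the help text.

```python
import os

# Formats listed in the `kgx dump` help text.
KNOWN_FORMATS = {"ttl", "json", "csv", "rq", "tsv", "graphml"}


def resolve_format(path, explicit_type=None):
    """Return the format for `path`, preferring an explicit override.

    `explicit_type` plays the role of --input-type / --output-type.
    """
    if explicit_type is not None:
        fmt = explicit_type.lower()
    else:
        # os.path.splitext("a/b.graphml") returns ("a/b", ".graphml")
        fmt = os.path.splitext(path)[1].lstrip(".").lower()
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"cannot determine format for {path!r}")
    return fmt
```

With this sketch, `resolve_format("target/x1n.graphml")` yields `"graphml"`, while an extensionless path such as `target/x1out` needs the explicit type, which matches the first example below where --output-type=csv is passed.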

Here are some examples that mirror the tests:

$ kgx dump --output-type=csv tests/resources/x1n.csv tests/resources/x1e.csv target/x1out
File created at: target/x1out.tar
$ kgx dump tests/resources/x1n.csv tests/resources/x1e.csv target/x1n.graphml
File created at: target/x1n.graphml
$ kgx dump tests/resources/monarch/biogrid_test.ttl target/bgcopy.csv
File created at: target/bgcopy.csv.tar
$ kgx dump tests/resources/monarch/biogrid_test.ttl target/x1n.graphml
File created at: target/x1n.graphml
$ kgx dump tests/resources/monarch/biogrid_test.ttl target/x1n.json
File created at: target/x1n.json
Internal Representation

The internal representation is a networkx MultiDiGraph, which is a property graph.
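
To make the property-graph idea concrete, here is a small networkx sketch. The node identifiers and the property names (`name`, `category`, `predicate`) are placeholders chosen for illustration; they are not taken from the tr-kg standard.

```python
import networkx as nx

graph = nx.MultiDiGraph()

# Nodes carry arbitrary key/value properties.
graph.add_node("ex:n1", name="node one", category="example")
graph.add_node("ex:n2", name="node two", category="example")

# A MultiDiGraph allows several directed edges between the same pair of
# nodes; each edge has its own key and its own properties.
graph.add_edge("ex:n1", "ex:n2", key="e1", predicate="interacts_with")
graph.add_edge("ex:n1", "ex:n2", key="e2", predicate="related_to")

for subject, obj, key, props in graph.edges(keys=True, data=True):
    print(subject, props["predicate"], obj)
```

Parallel edges matter here because two entities in a knowledge graph are often linked by more than one relationship, which a plain DiGraph cannot hold.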

The structure of this graph is expected to conform to the tr-kg standard, briefly summarized here:

Serialization/Deserialization

Intended to support

RDF
Neo4j

Neo4j implements property graphs out of the box. However, some implementations use reification nodes. The transform should allow for de-reification.
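
As a rough illustration of de-reification, the sketch below collapses statement nodes into direct edges. It assumes each reification node is available as a dict with `subject`, `predicate`, and `object` fields; those field names and the `dereify` function are hypothetical, and the transform kgx ends up implementing may differ.

```python
def dereify(statements):
    """Collapse reified statement nodes into direct property-graph edges.

    Each input dict stands for one reification node. Its `subject` and
    `object` become the edge endpoints; every other field (including
    `predicate`) becomes a property of the resulting edge.
    """
    edges = []
    for stmt in statements:
        props = {k: v for k, v in stmt.items() if k not in ("subject", "object")}
        edges.append((stmt["subject"], stmt["object"], props))
    return edges
```

Each returned `(subject, object, properties)` tuple has the shape that networkx's `MultiDiGraph.add_edges_from` accepts, so the result could be loaded into the internal representation directly.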


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.