ihmpdcc/portal-api

Name: portal-api

Owner: Human Microbiome Project

Description: API for the HMP data portal (found at https://portal.hmpdacc.org)

Created: 2017-02-15 15:56:18.0

Updated: 2017-10-16 14:29:54.0

Pushed: 2017-10-09 19:21:28.0

Homepage: https://portal.hmpdacc.org

Size: 232

Language: Python

README

Human Microbiome Project (HMP) Portal API

Overview
Video Tutorials
Setup
  1. First, make sure all dependencies are installed and start Neo4j
  2. Request an account to access OSDF
  3. Use the loader to move the data from CouchDB to Neo4j
    • Make sure your Neo4j server is started and available
    • python couchdb2neo4j_with_tags.py --db <OSDF_URL> --neo4j_password <NEO4J_PASS>
  4. Start your Flask app (must create a conf.py before doing this):
    • conf.py requires:
      • app_root - the path to the location of this repository
      • access_origin - a list of origins to accept requests from
      • be_port - the port to run this API on
      • be_loc - the IP:PORT that the API is accessible on
      • secret_key - a complex and private string for Flask to sign sessions/cookies with
      • neo4j_ip - IP of where Neo4j is running
      • neo4j_bolt - port for Neo4j bolt connection
      • neo4j_http - port for Neo4j http connection
      • neo4j_un - username to access Neo4j database
      • neo4j_pw - password to access Neo4j database
    • An additional MySQL conf is needed under the ./lib/ directory if logins are to be supported, fill with dummy values if logging in is not required
      • mysql_h - host of the MySQL authentication database
      • mysql_db - name of the DB that houses the username+password rows
      • mysql_un - username to access this MySQL database
      • mysql_pw - password to access this MySQL database
    • An additional set of the same parameters but for a database which will store session/query history information
      • mysql_h_2 - host of the MySQL session/query database
      • mysql_db_2 - name of the DB that houses the session/query information
      • mysql_un_2 - username to access this MySQL database
      • mysql_pw_2 - password to access this MySQL database
    • Once conf.py is made, use the command python app.py
  5. Can now interact with the GQL at any of the following endpoints or setup your own portal UI
    • /sum_schema
    • /ac_schema
    • /files_schema
    • /table_schema
    • /indiv_files_schema
    • /indiv_sample_schema
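
As a rough illustration of the conf.py settings listed above, a minimal file might look like the sketch below. The variable names come from the list; every value is a placeholder assumption (7687 and 7474 are Neo4j's default Bolt and HTTP ports), as are the value types (for example, whether the ports should be integers or strings). The REQUIRED tuple and the missing_settings helper are hypothetical additions for a quick sanity check and are not part of the repository.

```python
# conf.py -- illustrative sketch only; all values are placeholder assumptions.
app_root = "/path/to/portal-api"                  # location of this repository
access_origin = ["http://localhost:8000"]         # origins allowed to send requests
be_port = 5000                                    # port the API listens on
be_loc = "127.0.0.1:5000"                         # IP:PORT where the API is reachable
secret_key = "replace-with-a-long-random-string"  # signs Flask sessions/cookies
neo4j_ip = "127.0.0.1"                            # host where Neo4j is running
neo4j_bolt = 7687                                 # Neo4j Bolt port (Neo4j default)
neo4j_http = 7474                                 # Neo4j HTTP port (Neo4j default)
neo4j_un = "neo4j"                                # Neo4j username
neo4j_pw = "replace-me"                           # Neo4j password

# Hypothetical helper (not in the repository): names the README says are required.
REQUIRED = (
    "app_root", "access_origin", "be_port", "be_loc", "secret_key",
    "neo4j_ip", "neo4j_bolt", "neo4j_http", "neo4j_un", "neo4j_pw",
)


def missing_settings(namespace):
    """Return the required setting names that are absent or empty in `namespace`."""
    return [name for name in REQUIRED if namespace.get(name) in (None, "")]


if __name__ == "__main__":
    # Running the file directly lists any setting that still needs a value.
    print(missing_settings(globals()))
```

The MySQL settings (mysql_h, mysql_db, and so on) would go in the separate conf under ./lib/ in the same name = value style.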
Dependencies
Searching

The HMP portal offers two methods for searching the data: facet search and advanced search.

Facet Search

Facet Search lets you search the data entirely by clicking. Clicking a slice of a pie chart or a checkbox in the left-hand panel subsets the data by the selected property+value combination. To subset by additional properties, use the “Add a filter” option towards the top-right of that panel; selecting a new property there adds a new set of values to the panel that you can use to filter the data. Facet search works by including particular property+value combinations. To efficiently perform an exclusive search (e.g. looking for all data not associated with a particular property+value combination), use Advanced Search instead.
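
To make the inclusion-only behaviour concrete, here is a minimal sketch that filters a few made-up records the way checked facets might. It is not the portal's implementation: the property names (body_site, study) and the sample records are hypothetical, and treating several checked values of one property as alternatives is an assumption.

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def facet_filter(records: List[Record], selected: Dict[str, Set[str]]) -> List[Record]:
    """Keep records whose value is among the checked values of every chosen property."""
    return [
        record
        for record in records
        if all(record.get(prop) in values for prop, values in selected.items())
    ]


def exclude(records: List[Record], prop: str, value: str) -> List[Record]:
    """An exclusive search: everything NOT associated with prop+value."""
    return [record for record in records if record.get(prop) != value]


# Made-up records for illustration only.
samples = [
    {"id": "s1", "body_site": "feces", "study": "WGS-PP1"},
    {"id": "s2", "body_site": "buccal mucosa", "study": "WGS-PP1"},
    {"id": "s3", "body_site": "feces", "study": "16S-PP1"},
]

if __name__ == "__main__":
    checked = {"body_site": {"feces"}, "study": {"WGS-PP1"}}
    print([r["id"] for r in facet_filter(samples, checked)])
    print([r["id"] for r in exclude(samples, "body_site", "feces")])
```

Expressing the second call with facets alone would mean checking every value except the unwanted one, which is why a negated comparison in Advanced Search is the better fit.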

Advanced Search

Advanced Search is meant to be similar to how one would query a database directly. Each query requires the following general format:

(property) (comparison operator) (value)

The property is what you want to search on. The comparison operator is how you want to relate your value to your property. Your value is what you want to subset your property by.

For example:

project.name = "Human Microbiome Project (HMP)"

The results of this query will be only those samples and files that are associated with the project name “Human Microbiome Project (HMP)”.

Try typing this query in the interface and observe how auto-complete helps along the way. Use auto-complete for every query: it pulls directly from the database and ensures you are searching by a valid property, comparison operator, and value. If a query built with auto-complete returns no results, you know the particular combination of property, comparison operator, and value you entered does not exist in the data. It is also helpful to browse the suggested values, as they comprise every value that currently exists in the database for that property.
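
The three-part query format can be sketched as a small parser. This is only an illustration of the (property) (comparison operator) (value) shape, not the portal's actual parser: the operator set below is an assumption (this README does not list the supported operators), and the names Clause and parse_clause are hypothetical.

```python
import re
from typing import NamedTuple


class Clause(NamedTuple):
    prop: str
    operator: str
    value: str


# Assumed operator set; the portal's auto-complete is the authority on what is valid.
# Two-character operators come first so "<=" is not read as "<".
_OPERATORS = ("!=", "<=", ">=", "=", "<", ">")

_CLAUSE_RE = re.compile(
    r"^\s*(?P<prop>[\w.]+)\s*"
    r"(?P<op>" + "|".join(re.escape(op) for op in _OPERATORS) + r")\s*"
    r'(?P<value>"[^"]*"|\S+)\s*$'
)


def parse_clause(text: str) -> Clause:
    """Split one query into its property, comparison operator, and value."""
    match = _CLAUSE_RE.match(text)
    if match is None:
        raise ValueError(f"not a (property) (operator) (value) query: {text!r}")
    value = match.group("value")
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]  # drop the surrounding double quotes
    return Clause(match.group("prop"), match.group("op"), value)


if __name__ == "__main__":
    print(parse_clause('project.name = "Human Microbiome Project (HMP)"'))
```

A parser like this checks only the shape of a query; whether the property and value actually exist is something only the database (and therefore auto-complete) can confirm.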

Available Properties

The list of properties available to search on is actively growing. Below you can find the name+description for those which will eventually be searchable.

Controlled Vocabulary

The HMP portal converts the OSDF document data store of the HMP data into a graph representation. During this process certain data values are harmonized to facilitate searching. Thus, multiple OSDF values may map to a single HMP representation (e.g. both body sites 'FMA:64183' and 'stool' in OSDF become solely 'feces' in the HMP portal). Below is a table which maps the HMP portal representation of a data point to the data point(s) it originates from in OSDF.

| HMP representation | OSDF representation |
| --- | --- |
| study name | |
| 16S-GM-AO | The Thrifty Microbiome: The Role of the Gut Microbiota in Obesity in the Amish. |
| 16S-GM-CD | Effect of Crohn's Disease Risk Alleles on Enteric Microbiota. |
| 16S-GM-CD2 | Diet, Genetic Factors, and the Gut Microbiome in Crohn's Disease. |
| 16S-GM-CGD | The Human Microbiome in Pediatric Abdominal Pain and Intestinal Inflammation. |
| 16S-GM-EA | Foregut Microbiome in Development of Esophageal Adenocarcinoma. |
| 16S-GM-NE | The Neonatal Microbiome and Necrotizing Enterocolitis. |
| 16S-GM-UC | The Role of the Gut Microbiota in Ulcerative Colitis, Targeted Gene Survey. |
| 16S-PP1 | Human microbiome project 16S production phase I. |
| 16S-PP2 | Human microbiome project 16S production phase II. |
| 16S-SM-ADI | Skin Microbiome in Disease States: Atopic Dermatitis and Immunodeficiency. |
| 16S-SM-P | Evaluation of the Cutaneous Microbiome in Psoriasis. |
| 16S-UM-AD | Urethral Microbiome of Adolescent Males. |
| 16S-VM-BV | The Microbial Ecology of Bacterial Vaginosis: A Fine Scale Resolution Metagenomic Study. |
| 16S-VM-DGE | The Vaginal Microbiome: Disease, Genetics and the Environment, 16S Gene Survey. |
| IBDMDB | ibdmdb,Inflammatory Bowel Disease Multi-omics Database (IBDMDB) |
| MOMS-PI | momspi |
| T2D | prediabetes |
| WGS-GM-CD | Metagenomic Analysis of the Structure and Function of the Human Gut Microbiota in Crohn's Disease. |
| WGS-GM-UC | The Role of the Gut Microbiota in Ulcerative Colitis, Whole Metagenome Sequencing Project. |
| WGS-PP1 | Human microbiome project WGS production phase I. |
| WGS-PP2 | Human microbiome project WGS production phase II. |
| WGS-VIR-FE | The Human Virome in Children And Its Relationship to Febrile Illness. |
| | |
| body site | |
| abdomen | abdomen |
| angle of seventh rib | FMA:7842 |
| anterior part of leg | shin |
| ascending colon | ascending_colon |
| back | back |
| blood cell | blood |
| buccal mucosa | Buccal mucosa [FMA:59785],buccal_mucosa |
| cerebrospinal fluid | cerebrospinal_fluid |
| cervix of uterus | cervix |
| cubital fossa | antecubital_fossa |
| descending colon | descending_colon |
| dorsum of tongue | Dorsum of tongue [FMA:54651],tongue_dorsum |
| elbow | elbow |
| external naris | anterior_nares,External naris [FMA:59645],nare |
| feces | stool,FMA:64183 |
| foot | foot |
| forearm | volar_forearm,forearm |
| gall bladder | gall_bladder |
| gastric antrum | gastric_antrum |
| gastrointestinal tract | gut,Gastrointestinal tract [FMA:71132] |
| gingiva | gingival_crevices,subgingival_plaque,supragingival_plaque,attached_keratinized_gingiva,gingiva [FMA:59762],Gingiva [FMA:59762] |
| hand | hand |
| hard palate | Hard palate [FMA:55023],hard_palate |
| head | head |
| ileal-anal pouch | ileal-anal_pouch |
| ileum | ileal_pouch,ileum |
| knee | knee |
| left arm | left_arm |
| left cubital fossa | left_antecubital_fossa |
| left retroauricular crease | left_retroauricular_crease,Skin of left auriculotemporal part of head [FMA:70332] |
| leg | leg |
| lung aspirate | lung_aspirate |
| lymph node | lymph_node |
| nasal cavity | nasal |
| nasopharynx | Nasopharynx [FMA:54878],nasopharynx |
| oral cavity | Oral cavity [FMA:20292],oral_cavity |
| orifice of vagina | Orifice of vagina [FMA:19984],vaginal_introitus |
| palatine tonsil | Palantine tonsil [FMA:9610],palatine_tonsils,Palatine tonsil [FMA:9610] |
| perianal space | perianal_region |
| peripheral blood mononuclear cell | FMA:86713 |
| plasma | Plasma [FMA:62970] |
| popliteal fossa | popliteal_fossa |
| portion of saliva | saliva |
| posterior fornix of vagina | posterior_fornix,Posterior fornix of vagina [FMA:19987] |
| rectum | rectal |
| respiratory tract | respiratory_tract |
| right cubital fossa | right_antecubital_fossa,right cubital fossa [FMA:39849] |
| right nasal cavity | FMA:276108 |
| right retroauricular crease | Skin of right auriculotemporal part of head [FMA:70331],right_retroauricular_crease |
| scalp | scalp |
| shoulder | shoulder |
| sigmoid colon | sigmoid_colon |
| spinal cord | spinal_cord |
| synovial fluid | synovial_fluid |
| terminal ileum | terminal_ileum |
| thigh | thigh |
| throat | Throat [FMA:228738],throat |
| transverse colon | transverse_colon |
| unknown | unknown |
| upper respiratory tract | upper_respiratory_tract |
| urethra | urethra |
| urinary tract | FMA:326482,urinary_tract |
| vagina | mid_vagina,Vagina [FMA:19949],vaginal |
| wall of vagina | wall_of_vagina |
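
The many-to-one harmonization in this table can be sketched as a reverse lookup. The snippet below copies a handful of body-site rows from the table. The names HMP_TO_OSDF_BODY_SITE, OSDF_TO_HMP, and harmonize are hypothetical, and returning unmapped values unchanged is an assumption rather than documented loader behaviour.

```python
# A few body-site rows copied from the table above (HMP term -> OSDF terms).
HMP_TO_OSDF_BODY_SITE = {
    "feces": ["stool", "FMA:64183"],
    "forearm": ["volar_forearm", "forearm"],
    "ileum": ["ileal_pouch", "ileum"],
    "rectum": ["rectal"],
}

# Invert the table so each OSDF term points at its single HMP representation.
OSDF_TO_HMP = {
    osdf_term: hmp_term
    for hmp_term, osdf_terms in HMP_TO_OSDF_BODY_SITE.items()
    for osdf_term in osdf_terms
}


def harmonize(osdf_value: str) -> str:
    """Map an OSDF body-site value to its HMP portal term.

    Values with no mapping are returned unchanged (an assumption for this sketch).
    """
    return OSDF_TO_HMP.get(osdf_value, osdf_value)


if __name__ == "__main__":
    for raw in ("stool", "FMA:64183", "rectal"):
        print(raw, "->", harmonize(raw))
```

Because searches run against the harmonized value, a query on 'feces' covers records that OSDF stores as either 'stool' or 'FMA:64183'.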

Cart Metadata

On the cart page of the UI, one can download both a manifest of their samples+files of interest as well as metadata for these same entities. The manifest is to be used in conjunction with the HMP client to efficiently download all the files. The metadata serves as an additional source of input for analysis. The metadata, which is tab-separated, will always consist of a minimum set of (in this order):

Additional columns may be present in the metadata file if at least one of the samples present has a non-null value for the metadata. All potential metadata which will be present, if it was collected for the sample of interest, is defined in the OSDF schemas found below:


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.