CD2H gitForager

biostream/bioschemas

Name: bioschemas

Owner: biostream

Description: ga4gh, gdc and bmeg in one place

Created: 2016-11-11 21:37:22.0

Updated: 2017-09-08 20:58:29.0

Pushed: 2017-07-14 22:03:15.0

Homepage:

Size: 129

Language: Protocol Buffer

GitHub Committers

User	Most Recent Commit	# Commits
Brian	2017-04-19 16:58:44.0	12
Kyle Ellrott	2017-03-28 19:02:07.0	2
Brian King	2017-01-06 21:53:31.0	12

Other Committers

User	Email	Most Recent Commit	# Commits

README

bioschemas

Common data structures and APIs.

This repo contains

git submodules from ga4gh, gdc and bmeg
A utility that reads the schemas and produces alternative output formats (jsonschema and cerberus)

packaging

The schemas are packaged into a Python module, bioschemas. The justification for the packaging is threefold:

Packaging moves the complexity of git submodule management from the end user to the package release process.
Each of the referenced submodules contains many components other than the schemas themselves. Packaging allows us to trim everything except the schema source.
The generated snapshot is checked into git. The rationale is that this allows us to tag the package explicitly and allows clients to install the package without submodule complexity.

pip install git+https://github.com/ohsu-computational-biology/bioschemas

package release

in
ckage-all.sh
 generates schema snapshot ...
 runs setup tests ...
------------------------------------------------------------------
Ran 4 tests in 0.100s

usage

bioschemas-snapshot --help
usage: bioschemas-snapshot [-h] [-o OUTPUT] [-v]

Extract bioschemas schema directory [ga4gh,bmeg,gdc]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Extract to this directory name. Must not already
                        exist; it will be created as well as missing parent
                        directories.
  -v, --version         Print git hashes

The snapshot can be used by any language context and has the following structure:


cerberus
├── bmeg
├── ga4gh
│   ├── ga4gh
│   └── google
│       ├── api
│       └── protobuf
└── gdc
jsonschema
├── bmeg
├── ga4gh
│   ├── ga4gh
│   └── google
│       ├── api
│       └── protobuf
└── gdc
proto
├── bmeg
└── ga4gh
    ├── ga4gh
    └── google
        └── api
python usage

import bioschemas

bioschemas.schema_path()
'/home/someuser/bioschemas/bioschemas/snapshot'

bioschemas.json_schema('Resource')
{u'properties': {u'checksum': {u'type': u'string'}, u'class': {u'type': u'string'}, u'created': {u'type': u'string'}, u'datasetID': {u'type': u'string'}, u'description': {u'type': u'string'}, u'format': {u'type': u'string'}, u'gid': {u'type': u'string'}, u'id': {u'type': u'string'}, u'info': {u'type': u'object'}, u'location': {u'type': u'string'}, u'mimeType': {u'type': u'string'}, u'name': {u'type': u'string'}, u'size': {u'type': u'integer'}, u'type': {u'type': u'string'}}, u'type': u'object'}

bioschemas.cerberus_schema('Resource')
{u'checksum': {u'type': u'string'}, u'class': {u'type': u'string'}, u'created': {u'type': u'string'}, u'datasetID': {u'type': u'string'}, u'description': {u'type': u'string'}, u'format': {u'type': u'string'}, u'gid': {u'type': u'string'}, u'id': {u'type': u'string'}, u'info': {u'type': {u'type': u'dict'}}, u'location': {u'type': u'string'}, u'mimeType': {u'type': u'string'}, u'name': {u'type': u'string'}, u'size': {u'type': u'integer'}, u'type': {u'type': u'string'}}

bioschemas.git_hashes()
{u'bioschemas': u'f40f653', u'bmeg': u'537f94a', u'created_at': u'2016-11-18T17:47:56.858397Z', u'gdc': u'288f042'}

bioschemas.gdc_submission_template('file')

{u'aliquots': {u'submitter_id': None}, u'analytes': {u'submitter_id': None}, u'archives': {u'submitter_id': None}, u'cases': {u'submitter_id': None}, u'centers': {u'code': None}, u'data_formats': {u'name': None}, u'data_subtypes': {u'name': None}, u'derived_files': {u'submitter_id': None}, u'described_cases': {u'submitter_id': None}, u'experimental_strategies': {u'name': None}, u'file_name': None, u'file_size': None, u'md5sum': None, u'platforms': {u'name': None}, u'portions': {u'submitter_id': None}, u'project_id': None, u'related_files': {u'submitter_id': None}, u'samples': {u'submitter_id': None}, u'slides': {u'submitter_id': None}, u'state_comment': None, u'submitter_id': None, u'tags': {u'name': None}, u'type': u'file'}
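The dictionaries returned by json_schema are plain data, so a client can check a record against them without extra tooling. The sketch below is an assumption-laden illustration, not a full JSON Schema validator and not part of bioschemas: it copies the 'Resource' schema from the output above, handles only the flat `properties`/`type` subset that appears there, and the helper name `type_errors` is hypothetical. It also reports unknown fields, which is a design choice of this sketch rather than default JSON Schema behaviour.

```python
# Schema copied from the json_schema('Resource') output shown above.
RESOURCE_SCHEMA = {
    "type": "object",
    "properties": {
        "checksum": {"type": "string"},
        "class": {"type": "string"},
        "created": {"type": "string"},
        "datasetID": {"type": "string"},
        "description": {"type": "string"},
        "format": {"type": "string"},
        "gid": {"type": "string"},
        "id": {"type": "string"},
        "info": {"type": "object"},
        "location": {"type": "string"},
        "mimeType": {"type": "string"},
        "name": {"type": "string"},
        "size": {"type": "integer"},
        "type": {"type": "string"},
    },
}

# Only the JSON types that occur in the schema above are mapped.
JSON_TYPES = {"string": str, "integer": int, "object": dict}


def type_errors(document, schema):
    """Return a list of messages for fields whose values have the wrong type."""
    errors = []
    for field, value in document.items():
        spec = schema["properties"].get(field)
        if spec is None:
            # Sketch-specific choice: flag fields the schema does not declare.
            errors.append("%s: unknown field" % field)
            continue
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, and this schema has no boolean
        # fields, so booleans are always rejected here.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append("%s: expected %s" % (field, spec["type"]))
    return errors


if __name__ == "__main__":
    record = {"id": "resource-1", "name": "example", "size": "129"}
    print(type_errors(record, RESOURCE_SCHEMA))
```

For anything beyond this flat subset (required fields, nested objects, formats), a dedicated validator library is the safer choice.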

utility

The ga4gh and bmeg canonical schemas are maintained in protobuf. The bin/custom-plugin.py script processes the schemas for alternate uses (jsonschema, cerberus). The bioschemas/snapshot directory contains output from protoc. Please do not edit it by hand; instead, change custom-plugin.py or json-to-cerberus.py.
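To illustrate the kind of transformation a JSON-Schema-to-Cerberus step performs, here is a minimal sketch. It is not the repository's json-to-cerberus.py, whose contents are not shown here. The function name `to_cerberus` and the type mapping are assumptions, based on the type names Cerberus accepts (`dict`, `list`, `float` and so on) and its `schema` rule for nested dictionaries.

```python
# Assumed mapping from JSON Schema type names to Cerberus type names.
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "number": "float",
    "boolean": "boolean",
    "object": "dict",
    "array": "list",
}


def to_cerberus(json_schema):
    """Convert the `properties` of a JSON Schema object into Cerberus rules."""
    rules = {}
    for field, spec in json_schema.get("properties", {}).items():
        rule = {"type": TYPE_MAP[spec["type"]]}
        # Nested objects with declared properties become a nested Cerberus schema.
        if spec["type"] == "object" and "properties" in spec:
            rule["schema"] = to_cerberus(spec)
        rules[field] = rule
    return rules


if __name__ == "__main__":
    example = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "size": {"type": "integer"},
            "info": {"type": "object"},
        },
    }
    print(to_cerberus(example))
```

The sketch ignores everything except `type` and nested `properties`; a real converter would also need to decide how to carry over constraints such as required fields or enumerations.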

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.