cBioPortal/clinical-data-normalization

Name: clinical-data-normalization

Owner: cBioPortal

Description: Tools and data relating to clinical data normalization efforts.

Created: 2017-08-16 15:00:52.0

Updated: 2017-09-04 01:45:45.0

Pushed: 2017-09-04 01:45:44.0

Homepage: null

Size: 2555

Language: Python


README

cBioPortal-new-study-assistant
Introduction

This tool is designed to aid in curating clinical data from new studies into cBioPortal. It reads the new study's clinical data and compares it to existing studies in the cBioPortal database to find attributes that match those already in use. The ultimate goal is to help the curator normalize the new study's data relative to existing cBioPortal studies.
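To make the matching idea concrete, here is a minimal sketch of one way to compare attribute names from a new study against attributes already in the portal. It assumes plain string similarity via Python's standard `difflib`; the helper names `normalize` and `match_attributes`, the 0.8 threshold, and the example attribute names are illustrative assumptions, not the tool's actual algorithm.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case an attribute name and unify separators for comparison."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def match_attributes(new_attributes, existing_attributes, threshold=0.8):
    """Hypothetical helper: map each new attribute to similar existing ones.

    Returns {new_attr: [(existing_attr, score), ...]}, where score is the
    difflib similarity ratio (0.0 to 1.0) of the normalized names, keeping
    only scores >= threshold and sorting best first.
    """
    matches = {}
    for new_attr in new_attributes:
        scored = []
        for existing_attr in existing_attributes:
            score = SequenceMatcher(
                None, normalize(new_attr), normalize(existing_attr)
            ).ratio()
            if score >= threshold:
                scored.append((existing_attr, round(score, 3)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        matches[new_attr] = scored
    return matches


if __name__ == "__main__":
    # Illustrative names only; real portal attributes would be downloaded.
    new_study = ["Patient Age", "Overall Survival Months", "Tumor Grade"]
    portal = ["AGE", "OS_MONTHS", "OVERALL_SURVIVAL_MONTHS", "GRADE", "SEX"]
    for attr, hits in match_attributes(new_study, portal, threshold=0.6).items():
        print(attr, "->", hits)
```

Name similarity alone misses synonyms such as `OS_MONTHS`; a fuller approach would presumably also compare attribute values, which this sketch does not attempt.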

Required Python packages
Running the script

In its default mode, the script selects a random study from the portal and searches the other studies on the portal for matching attributes. The current version of the script typically takes around 10 minutes to run and requires internet access to download data from cBioPortal. Alternatively, one can clone the cBioPortal datahub repository and run the tool on that local copy of the data.

Default example:

python new_study_assistant.py

A more practical mode of the tool is to test raw data from a new study against the existing cBioPortal data. Currently, the tool assumes that the raw study data contains only a single header line. Example raw data from the acyc_mda_2015 study is provided in this repository for reference.
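A minimal sketch of loading such a file into per-attribute columns, assuming it is tab-delimited with exactly one header row followed by one data row per sample or patient. The tab delimiter is an assumption based on the `.txt` example file, not something this README states, and the function name `read_raw_clinical` is hypothetical.

```python
import csv
import io


def read_raw_clinical(handle):
    """Read a tab-delimited table with a single header line into columns.

    Returns {column_name: [value, ...]} so each attribute's values can later
    be compared against existing studies. Missing cells become "".
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # assumption: exactly one header line
    columns = {name: [] for name in header}
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # Pad short rows so every column keeps the same length.
        row = row + [""] * (len(header) - len(row))
        for name, value in zip(header, row):
            columns[name].append(value.strip())
    return columns


if __name__ == "__main__":
    sample = "PATIENT_ID\tAGE\tSEX\nP-01\t54\tFemale\nP-02\t61\tMale\n"
    print(read_raw_clinical(io.StringIO(sample)))
```

For the bundled example, one would open the file with `open("./acyc_mda_2015/raw_data_clinical.txt", newline="")` and pass the handle to `read_raw_clinical`. A file with extra header lines would need those lines skipped first, which this sketch does not do.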

Example using acyc_mda_2015 raw data (this data is provided in the acyc_mda_2015 folder in this repository):

python new_study_assistant.py --study_to_drop='acyc_mda_2015' --new_study_path='./acyc_mda_2015/raw_data_clinical.txt' > similarity_output.txt
Options available
--new_study_path PATH
--study_to_drop STUDY_ID
--specific_study STUDY_ID
--output_pdf
--datahub_path PATH
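The options above are not documented further in this README. The sketch below is a hypothetical `argparse` parser that accepts flags with these names, useful for seeing how the example command's arguments would be parsed. Only `--new_study_path` and `--study_to_drop` appear in the example command; the other spellings are reconstructed from the list, and every help string is an inference from the flag name (for example, that `--output_pdf` switches the report from HTML to PDF), not the script's actual behavior.

```python
import argparse


def build_parser():
    """Hypothetical parser accepting the option names listed in this README."""
    parser = argparse.ArgumentParser(
        description="Compare a new study's clinical attributes to existing cBioPortal studies."
    )
    # Help strings below are inferred from the flag names; treat them as assumptions.
    parser.add_argument("--new_study_path", metavar="PATH",
                        help="raw clinical data file for the new study (one header line)")
    parser.add_argument("--study_to_drop", metavar="STUDY_ID",
                        help="study to exclude from the comparison set")
    parser.add_argument("--specific_study", metavar="STUDY_ID",
                        help="compare against this study only")
    parser.add_argument("--output_pdf", action="store_true",
                        help="write the report as PDF instead of HTML")
    parser.add_argument("--datahub_path", metavar="PATH",
                        help="local clone of the datahub repository")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Dropping `acyc_mda_2015` in the example is presumably what keeps that study, which already exists on the portal, from matching its own attributes.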
Output

Similar attributes detected in the test study are written to a report, which can be output as either an HTML (default) or a PDF file.
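As an illustration of what an HTML report of matches might contain, here is a minimal standard-library sketch that renders detected matches as a table. The input shape (a dict mapping each new attribute to `(existing_attribute, score)` pairs) and the function name `render_html_report` are assumptions; this README does not describe the real report's layout.

```python
import html


def render_html_report(matches, title="Similar attributes"):
    """Render {new_attr: [(existing_attr, score), ...]} as a simple HTML page.

    All attribute names are HTML-escaped; scores are shown to 3 decimals.
    """
    rows = []
    for new_attr, hits in matches.items():
        for existing_attr, score in hits:
            rows.append(
                "<tr><td>{}</td><td>{}</td><td>{:.3f}</td></tr>".format(
                    html.escape(new_attr), html.escape(existing_attr), score
                )
            )
    return (
        "<html><head><title>{0}</title></head><body>"
        "<h1>{0}</h1><table>"
        "<tr><th>New attribute</th><th>Existing attribute</th><th>Score</th></tr>"
        "{1}</table></body></html>"
    ).format(html.escape(title), "".join(rows))


if __name__ == "__main__":
    print(render_html_report({"Overall Survival Months": [("OVERALL_SURVIVAL_MONTHS", 1.0)]}))
```

Escaping matters here because raw clinical headers can contain characters such as `<` or `&` that would otherwise break the page.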

Full list of the files generated by the script:


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.