Name: clinical-data-normalization
Owner: cBioPortal
Description: Tools and data relating to clinical data normalization efforts.
Created: 2017-08-16 15:00:52.0
Updated: 2017-09-04 01:45:45.0
Pushed: 2017-09-04 01:45:44.0
Homepage: null
Size: 2555
Language: Python
This tool is designed to aid in curating new study clinical data into cBioPortal. The tool reads clinical data from the new study and compares the data to existing studies in the cBioPortal database to find attributes which match those in existing studies. The ultimate goal of the tool is to aid the curator in normalizing the new study data relative to cBioPortal studies.
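The README does not say how "matching" attributes are detected. The sketch below is only one possible approach: fuzzy comparison of attribute names using the standard library's `difflib.SequenceMatcher`. The function names (`normalize`, `match_attributes`), the 0.8 threshold and the sample attribute lists are all hypothetical, and the real tool may also compare attribute values rather than names alone.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case an attribute name and treat spaces and hyphens as underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def match_attributes(new_attrs, existing_attrs, threshold=0.8):
    """Return (new_attr, existing_attr, score) for every pair scoring >= threshold.

    The score is difflib's ratio() on the normalized names, so 1.0 means the
    names are identical once case, spaces and hyphens are ignored.
    """
    matches = []
    for new in new_attrs:
        for old in existing_attrs:
            score = SequenceMatcher(None, normalize(new), normalize(old)).ratio()
            if score >= threshold:
                matches.append((new, old, round(score, 3)))
    # Strongest matches first
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches


# Illustrative attribute names only
new_study = ["Age at Diagnosis", "SEX", "Tumor Stage"]
portal_attributes = ["AGE_AT_DIAGNOSIS", "SEX", "TUMOR_STAGE", "OS_STATUS"]
for new, old, score in match_attributes(new_study, portal_attributes):
    print(f"{new!r} -> {old!r} ({score})")
```

Normalizing before scoring lets a curator-supplied column such as "Tumor Stage" line up with an upper-case, underscore-separated portal attribute; lowering the threshold trades more candidate matches for more false positives.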
The default mode of the script selects a random study from the portal and searches the other studies on the portal for matching attributes. The current version of the script typically takes around 10 minutes to run and requires internet access to download data from cBioPortal. Alternatively, one can clone the datahub repository and run the tool on that local copy of the data.
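The default mode amounts to a leave-one-out check: one study plays the role of the "new" study and is compared against all the others. A minimal sketch of that idea follows, assuming each study has already been reduced to a list of attribute names; the `leave_one_out` function, its input shape and the exact, case-insensitive name comparison are hypothetical simplifications, not the script's actual logic.

```python
import random


def leave_one_out(studies, rng=None):
    """Pick one study at random and find which other studies share each of its attributes.

    studies: dict mapping study ID -> iterable of attribute names.
    Returns (chosen_study_id, {attribute: sorted list of other study IDs with that attribute}).
    Names are compared exactly, ignoring case and surrounding whitespace.
    """
    rng = rng or random.Random()
    chosen = rng.choice(sorted(studies))
    # Index every other study's attributes: normalized name -> set of study IDs
    index = {}
    for study_id, attrs in studies.items():
        if study_id == chosen:
            continue  # the held-out study must not match itself
        for attr in attrs:
            index.setdefault(attr.strip().upper(), set()).add(study_id)
    report = {
        attr: sorted(index.get(attr.strip().upper(), set()))
        for attr in studies[chosen]
    }
    return chosen, report


# Hypothetical study IDs and attributes
studies = {
    "study_a": ["SEX", "AGE"],
    "study_b": ["sex", "OS_STATUS"],
    "study_c": ["AGE", "SEX"],
}
chosen, report = leave_one_out(studies, random.Random(0))
print(chosen, report)
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when comparing two versions of the matching logic on the same held-out study.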
Default example:
python new_study_assistant.py
A more practical mode of the tool is to test raw data from a new study against the existing cBioPortal data. Currently the tool assumes that the raw study data contains only a single header line. Example raw data from the acyc_mda_2015 study is provided in this repository for reference.
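To make the single-header-line assumption concrete, the sketch below reads a raw clinical file whose first line holds the column names and every later line holds one record. It assumes the file is tab-delimited, which the README does not state; `read_raw_clinical` is a hypothetical helper, and a file with additional metadata header lines would need those lines skipped before calling it.

```python
import csv
import io


def read_raw_clinical(handle):
    """Read tab-delimited clinical data whose first line is the only header line.

    Returns a dict mapping each column name to the list of its non-empty values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    columns = {name: [] for name in header}
    for row in reader:
        # zip() stops at the shorter sequence, so short rows simply lack trailing values
        for name, value in zip(header, row):
            if value.strip():
                columns[name].append(value.strip())
    return columns


sample = io.StringIO("PATIENT_ID\tSEX\tAGE\nP1\tMale\t54\nP2\tFemale\t\n")
print(read_raw_clinical(sample))
```

For a file on disk, open it with `newline=""` as the `csv` module recommends, for example `with open("./acyc_mda_2015/raw_data_clinical.txt", newline="") as fh: columns = read_raw_clinical(fh)`.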
Example using acyc_mda_2015 raw data (this data is provided in the acyc_mda_2015 folder in this repository):
python new_study_assistant.py --study_to_drop='acyc_mda_2015' --new_study_path='./acyc_mda_2015/raw_data_clinical.txt' > similarity_output.txt
Options:
--new_study_path PATH
--study_to_drop STUDY_ID
--specific_study STUDY_ID
--output_pdf
--datahub_path PATH
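For readers extending the script, the options above could be declared with the standard `argparse` module roughly as follows. This is a hypothetical reconstruction: only the option names and metavars come from the usage text; treating `--output_pdf` as an on/off flag and all help strings are assumptions drawn from the surrounding README, and the purpose of `--specific_study` is not documented here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (help text is assumed)."""
    parser = argparse.ArgumentParser(prog="new_study_assistant.py")
    parser.add_argument("--new_study_path", metavar="PATH",
                        help="raw clinical data file of the new study")
    parser.add_argument("--study_to_drop", metavar="STUDY_ID",
                        help="study to leave out of the comparison set")
    # Purpose of --specific_study is not documented in this README
    parser.add_argument("--specific_study", metavar="STUDY_ID")
    parser.add_argument("--output_pdf", action="store_true",
                        help="write the report as PDF instead of the default HTML")
    parser.add_argument("--datahub_path", metavar="PATH",
                        help="path to a local clone of the datahub repository")
    return parser


# Parse the acyc_mda_2015 example arguments from an explicit list
args = build_parser().parse_args([
    "--study_to_drop=acyc_mda_2015",
    "--new_study_path=./acyc_mda_2015/raw_data_clinical.txt",
])
print(args.study_to_drop, args.new_study_path, args.output_pdf)
```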
Similar attributes that are detected in the test study are printed to a report. The report can be output as either an HTML (default) or a PDF file.
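As an illustration of the HTML output, the sketch below renders a list of `(new_attribute, existing_attribute, score)` tuples into a self-contained HTML page using only the standard library. The tuple layout, column headings and `render_html_report` name are hypothetical; the script's actual report layout is not described here, and PDF output (which would need a third-party converter) is not shown.

```python
import html


def render_html_report(matches, title="Similar attributes"):
    """Render (new_attr, existing_attr, score) tuples as a minimal HTML table."""
    # Escape every attribute name so values like "Age <yrs>" cannot break the markup
    rows = "\n".join(
        f"<tr><td>{html.escape(new)}</td><td>{html.escape(old)}</td><td>{score:.2f}</td></tr>"
        for new, old, score in matches
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        "<table>\n<tr><th>New study attribute</th><th>Existing attribute</th><th>Score</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )


print(render_html_report([("Age at Diagnosis", "AGE_AT_DIAGNOSIS", 1.0)]))
```

The returned string can be written with `pathlib.Path("similarity_report.html").write_text(report, encoding="utf-8")`, where the file name is only an example.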
Full list of the files generated by the script: