Sage-Bionetworks/Genie

Name: Genie

Owner: Sage Bionetworks

Description: For analyzing data in www.synapse.org/genie

Created: 2016-10-20 20:22:55.0

Updated: 2017-09-01 04:43:09.0

Pushed: 2018-01-17 23:52:35.0

Homepage: null

Size: 5152

Language: Jupyter Notebook

README

AACR Project GENIE

Introduction

This repository documents the code used to gather, QC, standardize, and analyze data uploaded by institutions participating in AACR's Project GENIE (Genomics, Evidence, Neoplasia, Information, Exchange). Follow the instructions below to download the core GENIE dataset into the data folder. Then review the contents of the analyses folder, which contains one sub-folder for each analysis that can be performed on these datasets.

Dependencies

These are the tools and packages you will need to reproduce these results:

Fetch Data

Download the core dataset into data/cbioportal. This data feeds the web-based UI at cbioportal.org/genie:

synapse get -r --downloadLocation data/cbioportal syn5521835

File Validator
python sage_processing/validateGENIE.py -h
usage: validateGENIE.py [-h]
                        {maf,clinical,fusions,cnv,vcf,seg,bed} file [file ...]
                        {MSK,GRCC,DFCI,NKI,JHU,MDA,VICC,UHN}

Validate GENIE files

positional arguments:
  {maf,clinical,fusions,cnv,vcf,seg,bed}
                        File type that you are validating: maf, clinical,
                        fusions, cnv, vcf, seg, bed
  file                  File(s) that you are validating. If you validation
                        your clinical files and you have both sample and
                        patient files, you must provide both
  {MSK,GRCC,DFCI,NKI,JHU,MDA,VICC,UHN}
                        Contributing Center

optional arguments:
  -h, --help            show this help message and exit

Examples:

python validateGENIE.py clinical data_clinical_supp_NKI.txt NKI
python validateGENIE.py clinical data_clinical_supp_patient_VICC.txt data_clinical_supp_sample_VICC.txt VICC
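The help output above implies a fixed argument shape: one file type, one or more files, then a contributing center. As a hedged illustration of how that interface parses, here is a minimal argparse sketch that reproduces it. This is not the actual validateGENIE.py source; the fileType destination name and the check_clinical_pair helper (which assumes clinical file names contain "patient" or "sample", as in the examples) are hypothetical.

```python
import argparse

# File types and contributing centers copied from the validator's -h output.
FILE_TYPES = ["maf", "clinical", "fusions", "cnv", "vcf", "seg", "bed"]
CENTERS = ["MSK", "GRCC", "DFCI", "NKI", "JHU", "MDA", "VICC", "UHN"]


def build_parser():
    """Hypothetical parser with the same argument shape as validateGENIE.py."""
    parser = argparse.ArgumentParser(prog="validateGENIE.py",
                                     description="Validate GENIE files")
    parser.add_argument("fileType", choices=FILE_TYPES,
                        help="File type that you are validating")
    # nargs="+" takes one or more files; argparse still leaves the final
    # positional token for the center argument.
    parser.add_argument("file", nargs="+",
                        help="File(s) that you are validating")
    parser.add_argument("center", choices=CENTERS, help="Contributing Center")
    return parser


def check_clinical_pair(files):
    """Hypothetical check: a patient file and a sample file must come together.

    Assumes clinical file names contain 'patient' or 'sample', as in the
    README examples; files with neither word are accepted unchanged.
    """
    has_patient = any("patient" in name for name in files)
    has_sample = any("sample" in name for name in files)
    if has_patient != has_sample:
        raise ValueError("Provide both the patient and the sample clinical file")
```

For example, parsing ["clinical", "data_clinical_supp_patient_VICC.txt", "data_clinical_supp_sample_VICC.txt", "VICC"] yields fileType "clinical", both file names in file, and center "VICC"; an unknown center such as "FOO" makes argparse exit with a usage error.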

Processing instructions
  1. Log in to the GENIE EC2 instance.

  2. Run the shell script ./sage_processing/processGENIE.sh.

    processGENIE.sh currently runs with default parameters and uploads to the staging site. Before running it, modify the lines below in the shell script as needed. (We are working on letting you pass arguments to the shell script.)

    python database_to_staging.py Jul-2017 ~/cbioportal/ 1.1.0 --consortiumReleaseCutOff 183

    python consortium_to_public.py Jul-2017 ~/cbioportal/ 1.1.0 --publicReleaseCutOff 365

    python ../sage_dashboard/dashboardTableUpdater.py 1.1.0
    Rscript ../sage_dashboard/public_dashboard.R 1.1.0
    Rscript ../sage_dashboard/clinicalImages.R
  3. Check the dashboard page to make sure the numbers are correct.
  4. If the numbers don't match, check the log files for any errors that came up during processing.
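Because processGENIE.sh hard-codes the release month, cBioPortal path, version, and cutoffs, one way to picture the steps above is a small Python driver that takes those values as parameters and then scans the logs. This is a hypothetical sketch: the commands and flags are copied from this README, but build_commands, run, and scan_logs are illustrative names, and scan_logs assumes the logs are *.log files that contain "ERROR" or "Traceback" on failure, which may not match the real log format.

```python
import os
import shlex
import subprocess
from pathlib import Path


def build_commands(release, cbio_path, version,
                   consortium_cutoff=183, public_cutoff=365):
    """Return the processing steps as argument lists (hypothetical helper).

    Commands and flags mirror the README; turning the hard-coded values into
    parameters is an assumption, not something processGENIE.sh supports today.
    """
    # subprocess does not go through a shell, so expand "~" here.
    cbio_path = os.path.expanduser(cbio_path)
    return [
        ["python", "database_to_staging.py", release, cbio_path, version,
         "--consortiumReleaseCutOff", str(consortium_cutoff)],
        ["python", "consortium_to_public.py", release, cbio_path, version,
         "--publicReleaseCutOff", str(public_cutoff)],
        ["python", "../sage_dashboard/dashboardTableUpdater.py", version],
        ["Rscript", "../sage_dashboard/public_dashboard.R", version],
        ["Rscript", "../sage_dashboard/clinicalImages.R"],
    ]


def run(commands, dry_run=True):
    """Print each command; when dry_run is False, also execute it."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(cmd, check=True)


def scan_logs(log_dir, markers=("ERROR", "Traceback")):
    """Return (file name, line) pairs from *.log files that contain a marker.

    The *.log naming and the markers are assumptions about the log format.
    """
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(errors="replace") as fh:
            for line in fh:
                if any(marker in line for marker in markers):
                    hits.append((path.name, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Dry run with the README's default values: prints the commands only.
    run(build_commands("Jul-2017", "~/cbioportal/", "1.1.0"))
```

Keeping dry_run=True as the default means the driver only prints what it would do, which mirrors the advice to review the script's parameters before uploading to staging.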

Releases
  1. Release 1.1.0 - 2.0.0
    python database_to_staging.py Jan-2017 ~/cbioportal/ 1.1.0 --skipMutationsInCis
    python consortium_to_public.py Jul-2017 ~/cbioportal/ 2.0.0

  2. Release 2.1.0 - 3.0.0
    python database_to_staging.py Jul-2017 ~/cbioportal/ 2.1.0
    python consortium_to_public.py Jan-2018 ~/cbioportal/ 3.0.0

  3. Release 3.1.0 - 4.0.0
    python database_to_staging.py Jan-2018 ~/cbioportal/ 3.1.0
    python consortium_to_public.py Jul-2018 ~/cbioportal/ 4.0.0
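The three releases listed follow a regular pattern: each consortium_to_public.py run comes six months after the matching database_to_staging.py run and bumps the major version, and the next database_to_staging.py run starts in that same month with minor version .1.0. The sketch below only encodes that observed pattern. It is an assumption drawn from these three entries, not a documented release policy, and it ignores per-release flags such as --skipMutationsInCis.

```python
# Month abbreviations as they appear in the release arguments (e.g. "Jan-2017").
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def add_months(label, n):
    """Shift a 'Mon-YYYY' label such as 'Jan-2017' forward by n months."""
    mon, year = label.split("-")
    index = MONTHS.index(mon) + n  # 0-based month index, may exceed 11
    return f"{MONTHS[index % 12]}-{int(year) + index // 12}"


def public_release_for(month, version, lag_months=6):
    """Assumed rule: public release lags by 6 months and bumps the major version."""
    major = int(version.split(".")[0])
    return add_months(month, lag_months), f"{major + 1}.0.0"


def release_schedule(month, version, cycles):
    """Return (staging month, staging version, public month, public version) rows."""
    rows = []
    for _ in range(cycles):
        public_month, public_version = public_release_for(month, version)
        rows.append((month, version, public_month, public_version))
        # Next staging release: same month as this public release, minor .1.0.
        major = public_version.split(".")[0]
        month, version = public_month, f"{major}.1.0"
    return rows


if __name__ == "__main__":
    for row in release_schedule("Jan-2017", "1.1.0", 3):
        print(row)
```

Starting from ("Jan-2017", "1.1.0"), three cycles reproduce the three list entries; whether later releases keep this cadence is not stated in the README.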
    
Instructions for batch
  1. Build an AMI that can run batch jobs. Start from this page, follow the instructions, and specify your Docker image. It is important to time the AMI build correctly at this stage; otherwise the AMI will not be able to start batch jobs. Afterwards, start an instance from the AMI and run these two commands:
    sudo stop ecs
    sudo rm -rf /var/lib/ecs/data/ecs_agent_data.json
  2. Rebuild the AMI from that instance, specifying the image size and adding to the instance anything you want to bind-mount into your containers.
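The two commands in step 1 stop the ECS agent and delete its saved state file before the instance is imaged again. Below is a minimal Python sketch of that preparation step, assuming an Amazon Linux instance where the agent is controlled by the same "stop ecs" service command shown above. prepare_instance_for_ami and ready_for_ami are hypothetical helper names, and ready_for_ami only checks that the state file is gone, not that the agent has actually stopped.

```python
import os
import subprocess

# State file and commands copied from step 1 above.
AGENT_STATE = "/var/lib/ecs/data/ecs_agent_data.json"
PREP_COMMANDS = [
    ["sudo", "stop", "ecs"],
    ["sudo", "rm", "-rf", AGENT_STATE],
]


def prepare_instance_for_ami(dry_run=True):
    """Print the preparation commands; run them only when dry_run is False."""
    for cmd in PREP_COMMANDS:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)


def ready_for_ami(state_file=AGENT_STATE):
    """Return True once the ECS agent state file no longer exists."""
    return not os.path.exists(state_file)
```

Running prepare_instance_for_ami(dry_run=False) on the instance and then confirming ready_for_ami() before rebuilding the AMI is one way to script step 1; the dry-run default avoids running sudo by accident.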

Reporting bugs

Please double-check your bug first. If you are certain it is a bug on our side, click here to report it, along with a clear explanation of how we can find or reproduce it.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.