hammerlab/mhcflurry-icml-compbio-2016

Name: mhcflurry-icml-compbio-2016

Owner: Hammer Lab

Description: Data and analysis notebooks for Predicting Peptide-MHC Binding Affinities With Imputed Training Data

Created: 2016-04-29 13:29:09.0

Updated: 2016-06-07 19:12:14.0

Pushed: 2016-07-12 16:11:58.0

Homepage: http://biorxiv.org/content/early/2016/05/22/054775

Size: 58480

Language: Jupyter Notebook

README

Predicting Peptide-MHC Binding Affinities With Imputed Training Data

This repository contains the data, analysis notebooks, and Authorea-generated LaTeX files for the paper "Predicting Peptide-MHC Binding Affinities With Imputed Training Data", submitted to the ICML 2016 Workshop on Computational Biology.

Data and notebooks

The predictions on the blind test data generated by the MHCflurry predictors, netMHC, netMHCpan, and SMM are available in data/validation_predictions_full.csv. This file has predictions from 64 MHCflurry models, 32 with imputation and 32 without. Descriptions of the models are in data/validation_models.csv.
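As a minimal sketch of a first look at these two files, the snippet below reads a CSV with Python's standard `csv` module and reports its column names and row count. It makes no assumption about the column layout, which this README does not describe. The helper name `summarize_csv` is hypothetical.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row holds the column names
        row_count = sum(1 for _ in reader)  # every remaining row is data
    return header, row_count


if __name__ == "__main__":
    # Paths as named in this README, relative to the repository root.
    for path in ("data/validation_predictions_full.csv",
                 "data/validation_models.csv"):
        try:
            columns, rows = summarize_csv(path)
        except FileNotFoundError:
            print(f"{path}: not found (run from the repository root)")
        else:
            print(f"{path}: {rows} rows, columns = {columns}")
```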

The notebook that trains the predictors and generates these results is notebooks/validation.ipynb. It took about 20 hours to run on a single TITAN X GPU.

The analysis of these results, including generating ensemble predictions from the individual predictors and calculating AUC, F1, and tau scores, is in notebooks/validation results analysis.ipynb.
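The kind of analysis described here can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the notebook's code: the ensemble is taken as a plain arithmetic mean (the notebook may combine predictions differently, for example on a log scale), binders are assumed to be peptides with IC50 at or below 500 nM, tau is computed as Kendall tau-a, and all numbers are made-up toy values.

```python
from itertools import combinations
from statistics import mean

BINDER_THRESHOLD_NM = 500.0  # assumption: IC50 <= 500 nM counts as a binder


def ensemble_mean(predictions_per_model):
    """Average per-peptide predictions across models (one inner list per model)."""
    return [mean(values) for values in zip(*predictions_per_model)]


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5).

    Assumes both classes are present in `labels`.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def f1(labels, predicted_labels):
    """Harmonic mean of precision and recall for boolean labels."""
    pairs = list(zip(labels, predicted_labels))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        concordant += product > 0
        discordant += product < 0
    return (concordant - discordant) / (len(x) * (len(x) - 1) // 2)


# Toy data (made up): measured IC50 values and two models' predictions, in nM.
measured = [50.0, 300.0, 800.0, 5000.0, 20000.0]
model_a = [80.0, 900.0, 400.0, 3000.0, 15000.0]
model_b = [40.0, 300.0, 1200.0, 9000.0, 30000.0]

ensemble = ensemble_mean([model_a, model_b])
labels = [value <= BINDER_THRESHOLD_NM for value in measured]
predicted = [value <= BINDER_THRESHOLD_NM for value in ensemble]

# Lower IC50 means stronger binding, so negate predictions to use them as scores.
print("AUC:", auc(labels, [-value for value in ensemble]))
print("F1:", f1(labels, predicted))
print("tau:", kendall_tau(measured, ensemble))
```

For real data, `sklearn.metrics.roc_auc_score`, `sklearn.metrics.f1_score`, and `scipy.stats.kendalltau` are the usual equivalents. Note that `scipy.stats.kendalltau` computes tau-b by default, which differs from tau-a when there are ties.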

The command to generate the data for Figure 1 was:

mhcflurry-dataset-size-sensitivity.py \
--allele HLA-A0201 \
--training-csv data/bdata.2009.mhci.public.1.txt \
--imputation-method mice \
--number-dataset-sizes 15 \
--random-negative-samples 0 \
--min-observations-per-peptide 3 \
--training-epochs 250 \
--repeat 3 \
--max-training-samples 500 \
--min-training-samples 10 \
--dropout 0.5 \
--hidden-layer-size 64 \
--embedding-size 32
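To re-run this command with small variations (a different allele, for instance), it can help to keep the options in a dictionary and assemble the argument list from it. The sketch below uses only the flags and values listed above. The helper name `build_command` is hypothetical, and `SCRIPT` is a placeholder to be replaced with the script named in the command.

```python
import shlex

# Placeholder: substitute the script named in the command above.
SCRIPT = "path/to/dataset-size-sensitivity-script.py"

# Flags and values copied from the command above, in the same order.
OPTIONS = {
    "--allele": "HLA-A0201",
    "--training-csv": "data/bdata.2009.mhci.public.1.txt",
    "--imputation-method": "mice",
    "--number-dataset-sizes": 15,
    "--random-negative-samples": 0,
    "--min-observations-per-peptide": 3,
    "--training-epochs": 250,
    "--repeat": 3,
    "--max-training-samples": 500,
    "--min-training-samples": 10,
    "--dropout": 0.5,
    "--hidden-layer-size": 64,
    "--embedding-size": 32,
}


def build_command(script, options):
    """Return an argv list: the script followed by each flag and its value."""
    argv = [script]
    for flag, value in options.items():
        argv += [flag, str(value)]
    return argv


# Print a copy-pasteable shell command; pass the list to subprocess.run to execute it.
print(shlex.join(build_command(SCRIPT, OPTIONS)))
```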

Versions

We used MHCflurry revision 52a88ace.

Other libraries:

appdirs==1.4.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
biopython==1.66
bottle==0.12.9
certifi==2016.2.28
CherryPy==5.1.0
climate==0.4.6
configparser==3.3.0.post2
CVXcanon==0.0.23.4
cvxopt==1.1.8
cvxpy==0.4.0
cycler==0.10.0
datacache==0.4.17
decorator==4.0.9
dill==0.2.5
downhill==0.3.2
ecos==2.0.4
entrypoints==0.2.1
-e git+git@github.com:hammerlab/fancyimpute.git@c4510c5a77fcf27af65149610f260f18826129a4#egg=fancyimpute
functools32==3.2.3.post2
h5py==2.6.0
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.1.2
Jinja2==2.8
jsonschema==2.5.1
jupyter-client==4.2.2
jupyter-core==4.1.0
Keras==1.0.2
lxml==3.6.0
MarkupSafe==0.23
matplotlib==1.5.1
mistune==0.7.2
multiprocess==0.70.4
nbconvert==4.2.0
nbformat==4.0.1
notebook==4.2.0
numpy==1.10.4
pandas==0.18.0
pathlib2==2.1.0
-e git+git@github.com:hammerlab/pepdata.git@a76e9606a24ff0d1b4c817182cdd06d5c75ba169#egg=pepdata
pexpect==4.0.1
pickleshare==0.7.2
plac==0.9.1
progressbar33==2.4
ptyprocess==0.5.1
pycairo==1.10.0
Pygments==2.1.3
pyparsing==2.1.1
python-dateutil==2.5.2
pytz==2016.3
PyYAML==3.11
pyzmq==15.2.0
requests==2.10.0
scikit-learn==0.17.1
scipy==0.17.0
scs==1.2.6
seaborn==0.7.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
terminado==0.6
Theano==0.9.0.dev0
toolz==0.7.4
tornado==4.3
traitlets==4.2.1
typechecks==0.0.2
widgetsnbextension==1.2.1
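
To compare a local environment against a pinned list like the one above, one option is to parse the `name==version` lines and look each package up with the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch: the helper names `parse_pins` and `check_pins` are hypothetical, editable `-e git+...` entries are skipped rather than resolved, and the sample input uses a made-up package name.

```python
from importlib import metadata


def parse_pins(text):
    """Map package name -> pinned version for every `name==version` line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, editable installs, and lines without a pin.
        if not line or line.startswith(("#", "-e ")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        if name.strip():
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed)} for every pin that does not match."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Hypothetical input, for illustration only.
example = """\
# made-up pins
not-a-real-package==1.0.0
-e git+https://example.invalid/repo.git@abc123#egg=repo
"""

pins = parse_pins(example)
print(check_pins(pins))
```

An empty result from `check_pins` means every pinned package is installed at exactly the pinned version.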

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.