Name: MiHAIP
Owner: NMDP/Be The Match Bioinformatics Research
Description: null
Created: 2017-04-30 23:49:09.0
Updated: 2017-10-04 18:46:48.0
Pushed: 2018-01-11 18:52:17.0
Homepage: null
Size: 17452
Language: Java
GitHub Committers
User | Most Recent Commit | # Commits |
---|
Other Committers
User | Most Recent Commit | # Commits |
---|
Allogeneic hematopoietic stem cell transplantation (alloHSCT) can be a curative therapy for hematological malignancies. This approach cures a significant percentage of patients via graft-versus-leukemia (GVL) effect[1-3]. Unfortunately, the desired GVL effect is often accompanied with undesired graft-versus-host disease (GVHD)[4].
After HLA-matched alloHSCT, GVL and GVHD are both mediated by donor-derived T-cells recognizing polymorphic peptides presented by HLA molecules on recipient antigen present cells (APCs).These polymorphic peptides can be tumor-specific antigens (TSAs) or neoantigens, produced by tumor-mediated mutations, mostly contribute GVL effect. It can also be minor histocompatibility antigens (MiHAs), which are produced by genetic differences between donor and recipient and cause GVHD. The expression level and tissue distribution ofMiHA are proved factors in transplant effects, GVL and GVHD.
MiHAs are considered as well as targets for immunotherapy. At least 100 MiHAs with HLA-restriction have been validated across the genome including male-specific, Y chromosome-encoded antigens, which affect sex-mismatched HSCT-related outcomes[5].
Therefore, comprehensive identification of dominant repertoire of MiHA and TSA is a reliable strategy for immunotherapy to augment GVL and prevent GVHD.
The MiHA identification pipeline integrates diverse biological data sources include whole genome sequencing data, HLA genotyping, clinical outcomes, tissue-specific expression data, known MiHAs data and minor allele frequencies, as well as integrates multiple 3rd party software include RTGtools[6], snpEff[7], netchop3.1[8] and netMHCpan3.0[9] to predict the potential MiHAs by comparing the variants between transplanted donor and recipient.
(Note: all the list tools would require individual users to decide their license situation and take proper action)
Meanwhile, the pipeline simulates the antigen processing and presenting on recipients? cell surface via restricted HLA molecules. For details of pipeline please check the flowchart:
A workflow to identify MiHAs. Note: Male recipients with female donors were treated separately to analyze variants on the Y chromosome (HY).
Two genome comparison: RTGtools were applied to compare the whole genome sequencing data between donor and recipient. The variants? effect annotation: snpEff was used to annotate the variants? effects. MiHA prediction algorithms: The proteasomal processing prediction by netChop3.1 and MHC class I binding prediction was performed by netMHCpan3
a. Install RTGtools
from REAL TIME GENOMICS:
https://github.com/RealTimeGenomics/rtg-tools/releases/download/3.7.1/rtg-tools-3.7.1-linux-x64.zip
p rtg-tools-3.7.1-linux-x64.zip
ols/rtg-tools-3.6.2$ java -jar RTG.jar
e: rtg COMMAND [OPTION]...
RTG_MEM=16G COMMAND [OPTION]... (e.g. to set maximum memory use to 16 GB)
'rtg help COMMAND' for help on a specific command.
b. Install SnpEff
from http://snpeff.sourceforge.net/index.html SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).
http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip
p snpEff_latest_core.zip
-jar snpEff.jar
java -jar snpEff.jar build GRCh38.82
c. Install netChop3.1
from Center For Biological Sequence Analysis, Technical University of Denmark.
The NetChop 3.1 may be downloaded only by special agreement. For academic users there is a download site at:http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?netchop. Other users are requested to contact software@cbs.dtu.dk.
netchop-3.1.<unix>.tar.Z | uncompress | tar xvf -
rt NETCHOP=/full/path/to/netchop-3.1
rt TMPDIR=/full/path/to/netchop-3.1/tmp
In the 'netchop-3.1' directory test the software: bin/netchop test/test.fsa > test.out For research purpose, the net Chop3.1 is pre-built in MiHA identification pipeline. Make sure two environment variables, NETCHOPandTMDIR are set properly.Alsomakesurethat TMPDIR has the sticky bit set (the long listing should render 'drwxrwxrwt…'). If not, set it:
d 1777 $TMPDIR
d. Install netMHCpan-3.0
Like netChop, the netMHCpan-3.0 may only be downloaded only by special agreement as well. In addition, it requests to download the data file (data.tar.gz) separately.
netMHCpan-3.0.<unix>.tar.gz | uncompress | tar xvf -
/netMHCpan-3.0
http://www.cbs.dtu.dk/services/NetMHCpan-3.0/data.tar.gz
-xvf data.tar.gz
rt TMPDIR=/Path/to/Tools/netMHCpan-3.0/tmp
rt NMHOME=/Path/to/Tools/netMHCpan-3.0/bin
rt NETMHCpan=/Path/to/Tools/netMHCpan-3.0
etMHCpan -p test.pep > test.pep.myout
etMHCpan test.fsa > test.fsa.myout
etMHCpan -hlaseq B0702.fsa test.fsa > test.fsa_userMHC.myout
cds.db is pre-built hg38 transcriptome and minor allele frequence (version:snp147common) database.
Build you own minor allele frequency database by using lastest versions. Go to http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ download the SNP dataset. For example, snp147common
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp147Common.txt.gz
ip snp147Common.txt.gz
-jar ./MiHAIP1.4.5.jar/Tools/filterFreq.jar snp147Common.txt
Generate an output file named filteredFile.txt
Run the freq2db which import the minor allele frequence data from filteredFile.txt into the cds.db
-jar ./MiHAIP1.4.5.jar/Tools/freq2db.jar filteredFile.txt
a. Copy MiHAIP_1.4.5.tar.gz and cds.db from /Volumes/bioxover/users/wwang/MiHAIP_released
-xvzf MiHAIP_1.4.5.tar.gz
b. Read documentation
-jar MiHAIP1.4.5.jar -help
For testing the MiHA identification pipeline, you could use a pair of sample data from: /Volumes/bioxover/users/wwang/MiHAIP_released Copy all the executable programs (netChop-3.1, netMHCpan-3.0, RTG.jar and snpEff.jar) into the directory ./MiHAIP1.4.5/Tools. Copy the input files into ./path/to/input. Caution: the input files should include the .vcf.gz and index file .vcf.gz.tbi
Assign a unique pair ID for each run, e.g. -pid 1;
The sex needs to be defined donor's sex follow recipient's sex, e.g. -sex MM, which means male donor to male recipient. It helps to find Y chromosome encoded MiHAs (H-Y).
The HLA types should include all 6 alleles and separated by comma (there is no space in between alleles).
-t argument points the path to tools box.
Set the directory of output files by using -o
cp /Volumes/bioxover/users/wwang/MiHAIP_released/vcf /path/to/your/input/directory
-jar MiHAIP0.3.3.jar -d /path/to/input/Donor.vcf.gz -r /path/to/input/Recipient.vcf.gz -pid 1 -sex MM -hla A02:01,A02:01,B07:02,B14:02,C07:02,C08:02 -t /path/to/Tools/ -o /path/to/output
The program takes ~5min run through sample data ( average 22min for a whole genome), then check the predicted results from ./path/to/output/predicted
[1] Copelan, E.A., Hematopoietic stem-cell transplantation. N Engl J Med, 2006. 354(17): p. 1813-26.
[2] Horowitz, M.M., et al., Graft-versus-leukemia reactions after bone marrow transplantation. Blood, 1990. 75(3): p. 555-62.
[3] Miller, J.S., et al., NCI First International Workshop on The Biology, Prevention, and Treatment of Relapse After Allogeneic Hematopoietic Stem Cell Transplantation: Report from the Committee on the Biology Underlying Recurrence of Malignant Disease following Allogeneic HSCT: Graft-versus-Tumor/Leukemia Reaction. Biol Blood Marrow Transplant, 2010. 16(5): p. 565-86.
[4] Ferrara, J.L., et al., Graft-versus-host disease. Lancet, 2009. 373(9674): p. 1550-61.
[5] Gooley, T.A., et al., Reduced mortality after allogeneic hematopoietic-cell transplantation. N Engl J Med, 2010. 363(22): p. 2091-101.
[6] Cleary, J.G., et al., Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data. J Comput Biol, 2014. 21(6): p. 405-19.
[7] Cingolani, P., et al., A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin), 2012. 6(2): p. 80-92.
[8] Kesmir, C., et al., Prediction of proteasome cleavage motifs by neural networks. Protein Eng, 2002. 15(4): p. 287-96.
[9] Nielsen, M. and M. Andreatta, NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med, 2016. 8(1): p. 33.