marcottelab/MS_grouped_lookup

Name: MS_grouped_lookup

Owner: The Marcotte Lab

Description: null

Created: 2016-12-07 18:50:00.0

Updated: 2017-08-24 16:03:44.0

Pushed: 2017-08-24 19:38:08.0

Homepage: null

Size: 14108

Language: Python

README

MS_grouped_lookup

This repository looks up peptides from fractionation-MS experiments, using custom groupings of proteins to define peptide uniqueness.

It is one part of a larger workflow:

  1. Convert .RAW thermo files in /MS/submit to mzXML in /MS/processed using local Windows MSConvert
  2. Run MSblender on TACC (ls5) $scratch using the https://github.com/marcottelab/run_msblender setup (using branch https://github.com/marcottelab/MSblender/tree/msblender_restructure)
    • Download output to /MS/processed
  3. Format proteome for peptide lookup
  4. Lookup peptides

Analysis steps

Define grouping of peptides

1. Run eggnog-mapper to assign proteins to groups and format output

Running hmmer directly on a full proteome takes several days:

$ nohup python /project/eggnog-mapper-0.99.2/emapper.py -i /project/cmcwhite/orthology_proteomics/proteomes/human/uniprot-reviewed%3Ayes+AND+proteome%3Aup000005640.fasta --output human_hmmer_euNOG -d euNOG --override --scratch_dir /project/cmcwhite/orthology_proteomics/proteomes/human/ -m hmmer --output_dir /project/cmcwhite/orthology_proteomics/eggnog_mapper  &> /project/cmcwhite/orthology_proteomics/logs/nohup_human_euNOG.txt &


Instead, break up the proteome into chunks and process in parallel

Make a file with the species ID of one or more species

spec_list.txt
    human
    mouse

This script currently assumes a proteomes directory structure of 
   /proteomes/[speciesID]/working_proteome/[single fasta]

Break up a proteome and create a list of commands for each chunk
$ bash scripts/create_emapper_commands.sh spec_list.txt /project/eggnog-mapper-0.99.2/emapper.py euNOG hmmer        
Run each command in parallel
$ cat human_euNOG_hmmer_COMMANDS.txt | parallel -j10

Combine the outputs
$ cat proteomes/human/output/*emapper.annotations > eggnog_mapper/human_hmmer_euNOG.emapper.annotations
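The chunking idea above can be illustrated in Python. This is a minimal sketch, not the logic of create_emapper_commands.sh: the function names read_fasta and chunk_records, the chunk size, and the toy sequences are all assumptions made for illustration.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Group records into lists of at most chunk_size entries."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Toy proteome (hypothetical sequences) split into chunks of two proteins.
fasta_lines = [">P1", "MKT", "AAA", ">P2", "GGR", ">P3", "LLK"]
chunks = list(chunk_records(read_fasta(fasta_lines), chunk_size=2))
for number, chunk in enumerate(chunks):
    print(number, [header for header, _ in chunk])
```

Each chunk would then be written to its own FASTA file and given its own emapper command, so that the commands can be run independently.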


The output from eggnog-mapper needs to be formatted
$ format_emapper_output.R -f human_hmmer_euNOG.emapper.annotations -o human_hmmer_euNOG.mapping -s hmmer -l euNOG

This creates a file with the format:
ProteinID   ID

2. Do an artificial trypsin digest on a proteome

$ python scripts/trypsin.py --input proteomes/human/uniprot-proteome%3AUP000005640.fasta --output proteomes/human/uniprot-proteome%3AUP000005640_peptides.csv --miss 2
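For orientation, an in silico trypsin digest can be sketched as follows. This is not the code in scripts/trypsin.py. It assumes the common rule of cleaving after K or R unless the next residue is P, and the names cleavage_fragments and digest are hypothetical.

```python
def cleavage_fragments(sequence):
    """Split a protein after every K or R that is not followed by P (assumed rule)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest(sequence, missed_cleavages=2):
    """Return peptides spanning up to missed_cleavages + 1 consecutive fragments."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for start in range(len(fragments)):
        stop = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start + 1, stop + 1):
            peptides.append("".join(fragments[start:end]))
    return peptides


print(digest("AKRPGRM", missed_cleavages=1))
# ['AK', 'AKRPGR', 'RPGR', 'RPGRM', 'M']
```

Allowing missed cleavages enlarges the peptide list, because each additional missed site joins one more neighbouring fragment.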

3. Get protein-unique peptides

Identify proteins from peptides that are unique to single proteins

$ python scripts/define_grouping.py --spec human --grouping_type protein  --peptides proteomes/human/working_proteome/uniprot-proteome_human_reviewed_peptides.csv --output_dir proteomes/human/working_proteome/
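The notion of a protein-unique peptide can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation in scripts/define_grouping.py: the function name protein_unique_peptides and the (protein, peptide) row layout are hypothetical.

```python
from collections import defaultdict


def protein_unique_peptides(peptide_rows):
    """Map each peptide to its protein, keeping only peptides seen in one protein.

    peptide_rows is an iterable of (protein_id, peptide) pairs (assumed layout).
    """
    proteins_by_peptide = defaultdict(set)
    for protein_id, peptide in peptide_rows:
        proteins_by_peptide[peptide].add(protein_id)
    return {
        peptide: next(iter(proteins))
        for peptide, proteins in proteins_by_peptide.items()
        if len(proteins) == 1
    }


rows = [
    ("P1", "ACDER"),
    ("P2", "ACDER"),   # shared between P1 and P2, so not unique
    ("P1", "GFEAR"),
    ("P1", "GFEAR"),   # repeated within one protein, still unique
    ("P2", "AYTQWER"),
]
print(protein_unique_peptides(rows))
# {'GFEAR': 'P1', 'AYTQWER': 'P2'}
```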

4. Get group-unique peptides

Identify groups of proteins from peptides that are unique to the proteins in a group

$ python scripts/define_grouping.py --spec human --grouping_type euNOG --grouping eggnog_mapper/human_hmmer_euk.mapping  --peptides proteomes/human/working_proteome/uniprot-proteome_human_reviewed_peptides.csv --output_dir proteomes/human/working_proteome/
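Group-level uniqueness relaxes the protein-level rule: a peptide shared by several proteins still counts as unique if all of those proteins belong to the same group. The sketch below illustrates this. It is not the code in scripts/define_grouping.py, the name group_unique_peptides is hypothetical, and treating a protein that is missing from the mapping as its own single-member group is an assumption.

```python
from collections import defaultdict


def group_unique_peptides(peptide_rows, protein_to_group):
    """Map each peptide to its group, keeping only peptides seen in one group."""
    groups_by_peptide = defaultdict(set)
    for protein_id, peptide in peptide_rows:
        # Assumption: an unmapped protein acts as its own single-member group.
        group = protein_to_group.get(protein_id, protein_id)
        groups_by_peptide[peptide].add(group)
    return {
        peptide: next(iter(groups))
        for peptide, groups in groups_by_peptide.items()
        if len(groups) == 1
    }


mapping = {"P1": "G1", "P2": "G1", "P3": "G2"}  # ProteinID -> ID, as in the .mapping file
rows = [
    ("P1", "ACDER"),
    ("P2", "ACDER"),  # shared by two proteins, but both are in G1
    ("P3", "GFEAR"),
    ("P1", "LLK"),
    ("P3", "LLK"),    # spans G1 and G2, so it is dropped
    ("P4", "MMR"),    # P4 is not in the mapping
]
print(group_unique_peptides(rows, mapping))
# {'ACDER': 'G1', 'GFEAR': 'G2', 'MMR': 'P4'}
```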

Identify proteins in an experiment

1. Consolidate identified peptides from multiple experiments into a single file

$ bash scripts/consolidate_MSblender_output.sh /MS/processed/Fusion_data/ExperimentA/output ExperimentA elutions/

/MS/processed/Fusion_data/ExperimentA/output

  fraction1_pep_count_FDR001
     ACDER 1
     ETIAJR 2

  fraction2_pep_count_FDR001
     GFEAR 1
     AYTQWER 3

-->

ExperimentA_elution.csv

    ExperimentA,fraction1,ACDER,1
    ExperimentA,fraction1,ETIAJR,2
    ExperimentA,fraction2,GFEAR,1
    ExperimentA,fraction2,AYTQWER,3

These formatted files are stored in the elutions/ folder
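The consolidation shown above can be sketched in Python. This is only an illustration of the input and output formats, not a port of consolidate_MSblender_output.sh: the names consolidate and to_csv are hypothetical, and the fraction names are passed in directly rather than derived from file names.

```python
import csv
import io


def consolidate(experiment, fractions):
    """Flatten per-fraction 'PEPTIDE COUNT' lines into tidy rows.

    fractions maps a fraction name to an iterable of text lines (assumed layout).
    """
    rows = []
    for fraction, lines in fractions.items():
        for line in lines:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            peptide, count = parts
            rows.append((experiment, fraction, peptide, int(count)))
    return rows


def to_csv(rows):
    """Serialize rows as comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


fractions = {
    "fraction1": ["ACDER 1", "ETIAJR 2"],
    "fraction2": ["GFEAR 1", "AYTQWER 3"],
}
tidy_rows = consolidate("ExperimentA", fractions)
print(to_csv(tidy_rows), end="")
```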

2. Lookup peptides by protein

Do the lookup:
$ python scripts/get_elution_profiles.py human protein ExperimentA elutions/ExperimentA_elution.csv proteomes/human/working_proteome/unique_peptides_human_protein.csv proteomes/contam/contam_benzo_peptides.csv
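Conceptually, the lookup joins the identified peptides to the table of unique peptides and totals the counts per ID and fraction. The sketch below is a guess at that behaviour, not the code in get_elution_profiles.py: the name elution_profile is hypothetical, and both summing the counts and discarding peptides found in the contaminant list are assumptions.

```python
from collections import defaultdict


def elution_profile(elution_rows, peptide_to_id, contaminants=frozenset()):
    """Sum peptide counts per (experiment, fraction, ID).

    Peptides that are contaminants or are not unique to any ID are ignored.
    """
    totals = defaultdict(int)
    for experiment, fraction, peptide, count in elution_rows:
        if peptide in contaminants:
            continue  # assumption: contaminant peptides are dropped
        identifier = peptide_to_id.get(peptide)
        if identifier is None:
            continue  # peptide is not unique to a single ID
        totals[(experiment, fraction, identifier)] += count
    return [key + (total,) for key, total in totals.items()]


elution_rows = [
    ("ExperimentA", "fraction1", "ACDER", 1),
    ("ExperimentA", "fraction1", "ETIAJR", 2),
    ("ExperimentA", "fraction2", "GFEAR", 1),
    ("ExperimentA", "fraction2", "AYTQWER", 3),
    ("ExperimentA", "fraction2", "LLCONK", 7),  # hypothetical contaminant peptide
]
peptide_to_id = {
    "ACDER": "protein1",
    "ETIAJR": "protein1",
    "GFEAR": "protein1",
    "AYTQWER": "protein2",
    "LLCONK": "protein2",
}
profile = elution_profile(elution_rows, peptide_to_id, contaminants={"LLCONK"})
for row in profile:
    print(",".join(str(field) for field in row))
```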

$ python scripts/get_wideform_prot.py identified_elutions/human/ExperimentA_elution_human_protein.csv

This transforms the tidy table into a wide table with one row per ID and one column per fraction, e.g.:

   "tidy elution format"
   ExperimentA,fraction1,protein1,10
   ExperimentA,fraction2,protein1,30
   ExperimentA,fraction1,protein2,3
   ExperimentA,fraction2,protein2,2

   --> 
   "wide elution format"

   ID,fraction1,fraction2
   protein1,10,30
   protein2,3,2
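The tidy-to-wide transformation can be sketched without external libraries. This is an illustration rather than the code in get_wideform_prot.py: the name to_wide is hypothetical, and filling a missing ID/fraction combination with 0 is an assumption.

```python
def to_wide(tidy_rows):
    """Pivot (experiment, fraction, ID, count) rows into one row per ID."""
    fractions = []  # column order follows first appearance
    counts_by_id = {}
    for _experiment, fraction, identifier, count in tidy_rows:
        if fraction not in fractions:
            fractions.append(fraction)
        counts_by_id.setdefault(identifier, {})[fraction] = count
    header = ["ID"] + fractions
    body = [
        [identifier] + [counts.get(fraction, 0) for fraction in fractions]
        for identifier, counts in counts_by_id.items()
    ]
    return [header] + body


tidy_rows = [
    ("ExperimentA", "fraction1", "protein1", 10),
    ("ExperimentA", "fraction2", "protein1", 30),
    ("ExperimentA", "fraction1", "protein2", 3),
    ("ExperimentA", "fraction2", "protein2", 2),
]
wide = to_wide(tidy_rows)
for row in wide:
    print(",".join(str(field) for field in row))
```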

3. Lookup peptides according to a grouping of proteins

$ python scripts/get_elution_profiles.py human euNOG ExperimentA elutions/ExperimentA_elution.csv proteomes/human/working_proteome/unique_peptides_human_euNOG.csv proteomes/contam/contam_benzo_peptides.csv

$ python scripts/get_wideform_group.py identified_elutions/human/ExperimentA_elution_human_euNOG.csv eggnog_mapper/human_hmmer_euNOG.mapping annotation_files/all_annotations.csv

Similar to get_wideform_prot.py, but also creates an alternate format that shows the proteins in each group. This extra format is not very useful and is going to be removed.

ID,proteinIDs,fraction1,fraction2


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.