Name: pcpipe
Owner: Hurwitz Lab
Description: null
Created: 2015-09-25 22:14:10.0
Updated: 2016-07-14 23:20:00.0
Pushed: 2016-07-14 23:23:53.0
Homepage: null
Size: 22
Language: Perl
To build Docker image:
To run:
$ docker run --rm pcpipe ...
The input is a set of ORFs (e.g., peptides from the Chesapeake Bay, CB) plus a FASTA file of already-clustered ORFs (such as the POV+TOV clusters). Here are the steps:
1. Use cd-hit-2d to compare the input CB peptides against a FASTA file of already-clustered proteins (TOV + POV).
2. This produces a file of the combined clusters (CB + POV + TOV). Take the remaining unclustered CB peptides and self-cluster them with cd-hit.
3. Take a representative sequence from each new cluster (from CB) and use the BLAST pipeline to compare the representative ORFs to SIMAP.
4. Provide the user with the new cluster files (POV+TOV_CB, and CB self-clustered) and the annotation for the new clusters (based on the representative sequences).
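Step 3 needs one representative per new cluster. cd-hit marks each cluster's representative in its `.clstr` output with a trailing `*`. The Python sketch below is not the pipeline's own Perl code. It parses a `.clstr` file and builds FASTA text with one representative per cluster, which could then be fed to the BLAST step. It assumes cd-hit was run with `-d 0`, so the IDs in the `.clstr` file are the full first word of each FASTA header rather than the default truncated description. `parse_clstr`, `read_fasta` and `representative_fasta` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

# Member line, e.g. "0\t120aa, >CB_orf_1... *" or "1\t118aa, >CB_orf_7... at 96.61%"
MEMBER_RE = re.compile(r"^\d+\s+\d+(?:aa|nt), >(.+?)\.\.\.\s+(\*|at .*)$")


@dataclass
class Cluster:
    name: str
    members: List[str] = field(default_factory=list)
    representative: Optional[str] = None


def parse_clstr(lines: Iterable[str]) -> List[Cluster]:
    """Parse cd-hit .clstr lines into clusters; '*' marks the representative."""
    clusters: List[Cluster] = []
    current: Optional[Cluster] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">Cluster"):
            current = Cluster(name=line[1:].strip())
            clusters.append(current)
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            raise ValueError(f"unrecognized .clstr line: {line!r}")
        seq_id, flag = match.group(1), match.group(2)
        current.members.append(seq_id)
        if flag == "*":
            current.representative = seq_id
    return clusters


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map the first word of each FASTA header to its (joined) sequence."""
    records: Dict[str, List[str]] = {}
    seq_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def representative_fasta(clusters: List[Cluster], sequences: Dict[str, str]) -> str:
    """Return FASTA text with one representative sequence per cluster."""
    out = []
    for cluster in clusters:
        rep = cluster.representative
        if rep is None:
            raise ValueError(f"{cluster.name} has no representative")
        out.append(f">{rep} {cluster.name}\n{sequences[rep]}\n")
    return "".join(out)
```

Keeping the cluster name in each representative's header makes it easy to map the BLAST annotation back to its cluster in step 4.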
Note that the input should be a directory, which may contain multiple peptide files. For a test, you can use the Peptides and Read_pep from this dataset.
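cd-hit-2d and cd-hit each read a single FASTA file, so a directory of peptide files has to be combined first. The Python sketch below concatenates the FASTA files in the input directory and builds argument lists for steps 1 and 2. With `-i` (db1) set to the POV+TOV FASTA and `-i2` (db2) set to the CB peptides, cd-hit-2d writes the db2 sequences that are not similar to db1 to the `-o` file, and cd-hit then self-clusters those. The file extensions, the output naming, and the use of `-aS` for "coverage" are assumptions. The actual identity and coverage values should come from the existing scripts.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical set of FASTA extensions; adjust to match the peptide files.
FASTA_SUFFIXES = {".fa", ".faa", ".fasta"}


def combine_peptide_files(input_dir, combined_path) -> List[Path]:
    """Concatenate every FASTA file in input_dir into combined_path.

    combined_path should live outside input_dir so it is not picked up on a rerun.
    Returns the combined files in sorted order.
    """
    files = sorted(p for p in Path(input_dir).iterdir()
                   if p.is_file() and p.suffix in FASTA_SUFFIXES)
    with open(combined_path, "w") as out:
        for path in files:
            text = path.read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")  # keep the next header on its own line
    return files


def cdhit_commands(clustered_fasta, peptides_fasta, out_prefix, identity,
                   word_size, coverage: Optional[float] = None) -> Tuple[List[str], List[str]]:
    """Build argument lists for step 1 (cd-hit-2d) and step 2 (cd-hit)."""
    novel = f"{out_prefix}.novel.fa"  # CB peptides not similar to POV+TOV
    shared = ["-c", str(identity), "-n", str(word_size), "-d", "0"]
    if coverage is not None:
        shared += ["-aS", str(coverage)]  # assumed coverage flag
    step1 = ["cd-hit-2d", "-i", str(clustered_fasta), "-i2", str(peptides_fasta),
             "-o", novel] + shared
    step2 = ["cd-hit", "-i", novel, "-o", f"{out_prefix}.self"] + shared
    return step1, step2
```

Each list can be passed to `subprocess.run(step1, check=True)` so that a failing cd-hit run stops the pipeline.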
A couple of gotchas: clusters should contain at least two ORFs, and you should use the same percent identity and coverage as in the scripts. Also, some of the POV ORFs in the *fa files may not appear in the TOV+POV clusters, because they were not in clusters with at least 20 ORFs. I think Simon included all of them when we ran the clustering with TOV, but it is something to watch for.
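Both gotchas can be checked mechanically. The Python sketch below drops clusters with fewer than two ORFs and lists FASTA IDs that do not appear in any cluster, which would flag POV ORFs missing from the TOV+POV clusters. It assumes clusters are already available as a mapping from cluster name to a list of ORF IDs, and that those IDs match the first word of each FASTA header. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

MIN_ORFS = 2  # minimum cluster size stated in the README


def filter_small_clusters(clusters: Dict[str, List[str]],
                          min_orfs: int = MIN_ORFS) -> Dict[str, List[str]]:
    """Keep only clusters with at least min_orfs member ORFs."""
    return {name: ids for name, ids in clusters.items() if len(ids) >= min_orfs}


def fasta_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first whitespace-delimited word of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def unclustered_ids(fasta_lines: Iterable[str],
                    clusters: Dict[str, List[str]]) -> List[str]:
    """Return FASTA IDs that are not a member of any cluster, sorted."""
    clustered = {orf for ids in clusters.values() for orf in ids}
    return sorted(set(fasta_ids(fasta_lines)) - clustered)
```

Running `unclustered_ids` against the POV FASTA before step 1 shows how many ORFs would be treated as new even though they came from an existing dataset.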