cbcrg/L-GERT

Name: L-GERT

Owner: Notredame Lab

Description: Leishmania - GEnome Reporting Tool

Forked from: giovannibussotti/L-GERT

Created: 2018-05-03 12:21:16.0

Updated: 2018-05-10 09:01:43.0

Pushed: 2018-05-10 09:01:42.0

Homepage: null

Size: 494

Language: Shell


README

L-GERT (Leishmania GEnome Reporting Tool), a.k.a. LSD (Leishmania Sequencing Data analysis) or GRT (Genome Reporting Tool), is a pipeline that maps paired-end short reads from whole-genome sequencing against annotated reference genomes.

Dependencies: repeatmasker, bwa, spades, freebayes, trimmomatic, samtools, bedtools, ucsc-bedgraphtobigwig, deeptools, gat, vcftools, tabix, delly, blast, snpeff, MUMmer, gatk, picard, ucsc-gtftogenepred, ucsc-genepredtobed, perl-pod-usage, perl-getopt-long, perl-list-util, r-argparse, r-data.table, r-gtools, r-ggplot2, redundans (https://github.com/lpryszcz/redundans.git), recycler
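Before a first run it can help to confirm that the command-line tools are actually on the PATH. The sketch below is not part of L-GERT; `check_tools` is a hypothetical helper, and the executable names passed to it are assumptions, since package names in the dependency list do not always match the binaries they install (for example, the blast package provides `blastn`).

```shell
# Hypothetical helper (not part of L-GERT): report executables missing from PATH.
check_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  # 0 if every tool was found, 1 otherwise
  return "$status"
}

# Assumed executable names for a subset of the dependencies
check_tools bwa samtools bedtools freebayes vcftools tabix delly blastn \
  || echo "install the missing tools before running runLSD"
```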

The input is a .tsv file like test.tsv (derived from /pasteur/projets/policy01/BioIT/Giovanni/leish/p2p5/info/bigTable/bigTable.tsv).

The first step is to select from bigTable.tsv the samples you want to map. You can use the utility script LSD/utility/selectFromTable.R, or do it yourself with grep on the bigTable (as long as the selected table keeps the bigTable header).

Then run runLSD, specifying:
-f the directory with the fastq.gz files
-b the output directory where the BAM files will be created
-t the input table
-c the configuration file
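As a minimal sketch of the grep route, the header can be copied first and the pattern applied only to the data rows. The toy table, its column names, and the `select_samples` helper are all hypothetical; the real bigTable.tsv columns are not shown here.

```shell
# Toy stand-in for bigTable.tsv; the column names and values are hypothetical.
workdir=$(mktemp -d)
printf 'SAMPLE\tSTRAIN\nS1\tstrainA\nS2\tstrainB\nS3\tstrainA\n' > "$workdir/bigTable.tsv"

# Hypothetical helper: keep the header line, then the data rows matching a fixed string.
select_samples() {
  # $1: input table, $2: fixed string to look for in the data rows
  head -n 1 "$1"
  tail -n +2 "$1" | grep -F -- "$2"
}

select_samples "$workdir/bigTable.tsv" "strainA" > "$workdir/selected.tsv"
cat "$workdir/selected.tsv"
```

Running grep on the data rows only means the header is kept exactly once, even when the pattern would not match it.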

If you use the default configuration file, you can run it like this:

D=/pasteur/entites/HubBioIT/gio/apps/my_scripts/WholeGenomeSequencingPipeline/LSD/version_XXX
bash ${D}/runLSD -f ../fq/ -b outDir -t test.tsv -c ${D}/configLSD
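A small dry-run check can catch a wrong path before the pipeline starts. The sketch below only prints the command; `lsd_command` is a hypothetical helper, and it assumes, as in the example above, that runLSD and configLSD sit together in the pipeline directory.

```shell
# Hypothetical helper (not part of L-GERT): check the inputs, then print the
# runLSD command line without executing it.
lsd_command() {
  # $1: pipeline directory, $2: fastq directory, $3: output directory, $4: sample table
  for path in "$1/runLSD" "$1/configLSD" "$4"; do
    [ -f "$path" ] || { echo "not found: $path" >&2; return 1; }
  done
  [ -d "$2" ] || { echo "not a directory: $2" >&2; return 1; }
  printf 'bash %s/runLSD -f %s -b %s -t %s -c %s/configLSD\n' "$1" "$2" "$3" "$4" "$1"
}

# Example call, with D set as in the command above:
#   lsd_command "$D" ../fq/ outDir test.tsv
```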

The LSD pipeline is based on these scripts/steps:

A-collapseMultirunSamples.R Collapse multi-run samples to single rows, adding the MULTIRUN_RX_FQIDS fields, and copy the resulting table to the outDir. This is the table actually used by the rest of the pipeline
B-mapping.sh Map with bwa using the mapSample.sh wrapper
C-covPerChr.sh Compute the median chromosome coverage
D-covPerNt.sh Generate covPerNt
E-covPerBin.sh Generate covPerBin
F-mappingStats.sh Generate .stats files (picard mapping stats) using the script mappingStats.R
G-covPerGe.sh Generate covPerGe
H-freebayes.sh Generate .vcf files
I-spades.sh Generate the contigs and scaffold files by running trimmomatic+spades with the wrapper runSpades.sh
L-snpEff.sh Edit the .vcf files, annotating the variants
O-dellySVref.sh Use Delly to define structural variations of the sample with respect to the reference
P-redundans.sh Reduce redundant contigs, do scaffolding, and fill gaps. This greatly reduces the fragmentation of the spades contigs
R-bigWigGenomeCov.sh Generate a genome coverage bigWig file using chromosome-specific RPKM normalizations to account for possible aneuploidies
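To illustrate what a median chromosome coverage is, the sketch below computes it from a three-column table (chromosome, position, depth), the layout printed by `samtools depth`. This is an assumption-laden illustration and not the code in C-covPerChr.sh; the toy data and the `median_cov_per_chr` helper are hypothetical.

```shell
# Toy per-position depths (chromosome, position, depth); values are made up.
depth_file=$(mktemp)
printf 'chr1\t1\t10\nchr1\t2\t12\nchr1\t3\t50\nchr2\t1\t4\nchr2\t2\t6\n' > "$depth_file"

# Hypothetical helper: print one "chromosome<TAB>median depth" line per chromosome.
median_cov_per_chr() {
  # Sort by chromosome, then numerically by depth, so that the depths of each
  # chromosome reach awk already ordered.
  sort -k1,1 -k3,3n "$1" | awk -F'\t' '
    function emit() {
      if (n == 0) return
      if (n % 2) m = d[(n + 1) / 2]              # odd count: middle value
      else m = (d[n / 2] + d[n / 2 + 1]) / 2     # even count: mean of the two middle values
      printf "%s\t%s\n", chr, m
    }
    $1 != chr { emit(); chr = $1; n = 0 }        # new chromosome: report the previous one
    { d[++n] = $3 }
    END { emit() }
  '
}

median_cov_per_chr "$depth_file"
```

The median is less sensitive than the mean to a few very deep positions, such as the depth of 50 on chr1 in the toy data.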

gencov2intervals.pl generates averaged bins and is used by covPerBin.sh.

utility/ stores other useful scripts, such as selectFromTable.R, some examples (example-* scripts) of how to use it to select specific samples, and covPerChrSummary.R, a script that plots all median chromosome coverages together.

The cosmid-seq pipeline shares several steps with the LSD pipeline, but some are missing. Additionally, it has these extra/different steps:

M-cosmidSeqMappingStats.sh Estimate the median coverage and stats of just what was sequenced
N-blastnContigsToRef.sh Map the spades contigs back to the reference (to define the cosmid regions)

The test.tsv and testDeNovo.tsv files are meant for pipeline debugging and development. test.tsv is a general-purpose test you can run with just one small sample. testDeNovo.tsv includes two samples (one normal, the other multi-run) but has very few reads, to speed up the spades de novo assembly step and the downstream steps. The drawback is that the median genome coverage is zero, so the pipeline fails at covPerNt and cannot produce any of the downstream files.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.