hackseq/2016_project_6

Name: 2016_project_6

Owner: hackseq

Description: Inferring sex chromosome and autosomal ploidy in NGS data

Created: 2016-08-31 22:56:14.0

Updated: 2016-10-17 23:46:44.0

Pushed: 2016-10-17 22:50:38.0

Homepage: null

Size: 11785

Language: Python

README

2016_project_6

Inferring sex chromosome and autosomal ploidy in NGS data

Slide show here: https://docs.google.com/presentation/d/1OB2d_mu5zC742N_NKfzHjVpUm4BFtm5lUzniLLI–OQ/edit?usp=sharing

Publication Links
List of Goals: Assess X/Y ploidy and correct for misalignment
  1. Extract the input chromosomes from the BAM file. We recommend chrX, chrY, and chr19, but any autosome can be used.

  2. Infer sex chromosome ploidy from WGS data relative to autosomal ploidy:

    - XX
    - XY
    - XXY
    - X0
    - All other combinations

    Use A. quality, B. read depth, C. allele balance, and D. an ampliconic/palindromic/CNV filter.

    Typical expectations for heterozygous calls under different sex chromosome complements:

    Genotype | X_call | Y_call
    --- | --- | ---
    XX | het | none
    XY | hap | hap
    X0 | hap | none or partial_hap
    XXY | het or hap | hap
    XYY | hap | hap
    XXX | het | none

    Note: only about half of 47,XXY cases are paternal in origin; in the rest, both X chromosomes come from the mother, so heterozygous X sites are not always expected: http://humupd.oxfordjournals.org/content/9/4/309.full.pdf

    Expectations for depth under different sex chromosome complements (in units of single-copy depth, so a diploid autosome is 2x):

    Genotype | X_depth | Y_depth
    --- | --- | ---
    XX | 2x | 0x
    XY | 1x | 1x
    X0 | 1x | 0x (or partial)
    XXY | 2x | 1x
    XYY | 1x | 2x
    XXX | 3x | 0x

  3. If we infer that there is no Y chromosome in the sample, re-map reads to increase confidence in X-linked alleles:

    - Extract the reads aligned to X and Y.
    - Remap all X and Y reads to the X chromosome only.
    - Remove X and Y from the input BAM file.
    - Merge the empty Y and the remapped X chromosome into the BAM.

  4. Assess the pipeline on 1000 Genomes high-coverage data:

    - Compare SNV and CNV variant calling in the 1000 Genomes high-coverage data before and after running this pipeline.
    - Test how different alignment algorithms, parameters, and reference sequences affect variant calling in different regions of X and Y.
    - Compare variant calling with the “Gold Standard” reference individual.
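
The ploidy-inference goal and the depth and heterozygosity expectations above can be sketched in Python. This is a minimal illustration, not the project's implementation: it assumes mean read depths for chrX, chrY, and one autosome have already been computed, that depth scales linearly with copy number, and that the autosome is diploid. `infer_complement` picks the genotype whose expected (X, Y) copy numbers from the depth table are closest to the observed ones. `het_fraction` is a hypothetical allele-balance helper whose 0.3 to 0.7 "heterozygous" window is an illustrative threshold, not a project parameter.

```python
from typing import Dict, List, Tuple

# Expected (X, Y) copy numbers, taken from the depth table in goal 2.
EXPECTED_COPIES: Dict[str, Tuple[int, int]] = {
    "XX": (2, 0),
    "XY": (1, 1),
    "X0": (1, 0),
    "XXY": (2, 1),
    "XYY": (1, 2),
    "XXX": (3, 0),
}


def copy_number(chrom_depth: float, autosome_depth: float) -> float:
    """Estimate copy number relative to a diploid autosome (assumed 2 copies)."""
    if autosome_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    return 2.0 * chrom_depth / autosome_depth


def infer_complement(x_depth: float, y_depth: float, autosome_depth: float) -> str:
    """Return the genotype whose expected (X, Y) copies are nearest the observed ones."""
    x_cn = copy_number(x_depth, autosome_depth)
    y_cn = copy_number(y_depth, autosome_depth)

    def distance(genotype: str) -> float:
        # Squared Euclidean distance in (X copies, Y copies) space.
        ex, ey = EXPECTED_COPIES[genotype]
        return (x_cn - ex) ** 2 + (y_cn - ey) ** 2

    return min(EXPECTED_COPIES, key=distance)


def het_fraction(alt_fractions: List[float], low: float = 0.3, high: float = 0.7) -> float:
    """Fraction of variant sites whose alt-allele fraction looks heterozygous.

    The low/high window is a hypothetical threshold chosen for illustration.
    """
    if not alt_fractions:
        return 0.0
    het = sum(1 for f in alt_fractions if low <= f <= high)
    return het / len(alt_fractions)


if __name__ == "__main__":
    # chrX at autosomal depth, chrY at half autosomal depth: 2 X copies, 1 Y copy.
    print(infer_complement(x_depth=30.0, y_depth=15.0, autosome_depth=30.0))
```

With these inputs the observed copy numbers are exactly (2, 1), so the sketch reports `XXY`. Real data will not hit these ratios exactly: the pseudoautosomal regions are shared by X and Y, and much of Y is hard to map uniquely, so such regions would likely need to be masked before depths are compared. A high `het_fraction` on chrX gives independent evidence for more than one X, although, per the note above, its absence does not rule out XXY.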

Other goals: To get a really good handle on goal #2, we likely have to deal with the ampliconic regions, which are extremely copy-number variable on X and Y. The easiest approach is probably to mask them out when inferring ploidy (#2); an extended goal is then to characterize variation within these regions.
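
Masking ampliconic regions before estimating depth can be sketched as averaging per-position depth only over positions outside a set of masked intervals. This is a minimal illustration under stated assumptions: the README does not say where the ampliconic coordinates come from, so the interval list here is a hypothetical input (0-based, half-open), and the per-position depths are assumed to have been produced beforehand, for example by a tool such as `samtools depth`.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # hypothetical masked region: 0-based, half-open [start, end)


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_masked(pos: int, merged: List[Interval], starts: List[int]) -> bool:
    """True if pos falls inside one of the merged, sorted intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < merged[i][1]


def masked_mean_depth(depths: Sequence[int], mask: Sequence[Interval]) -> float:
    """Mean depth over positions 0..len(depths)-1 that lie outside the mask."""
    merged = merge_intervals(mask)
    starts = [s for s, _ in merged]
    kept = [d for pos, d in enumerate(depths) if not is_masked(pos, merged, starts)]
    if not kept:
        raise ValueError("every position is masked")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Toy chromosome: positions 10-14 stand in for an ampliconic region with inflated depth.
    depths = [10] * 10 + [100] * 5 + [10] * 5
    print(sum(depths) / len(depths))               # unmasked mean, pulled up by the amplicon
    print(masked_mean_depth(depths, [(10, 15)]))   # mean with the amplicon masked out
```

In the toy example the unmasked mean is 32.5 while the masked mean is 10.0, which shows how a few high-copy regions can inflate a chromosome-wide depth estimate and distort the copy-number ratios used for ploidy inference.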

Known problems and complications
Group Members

Name | email | github ID
--- | --- | ---
Madeline Couse | mhcouse@gmail.com | @Madelinehazel
Bruno Grande | bgrande@sfu.ca | @brunogrande
Eric Karlins | karlinser@mail.nih.gov | @ekarlins
Tanya Phung | tnphung@ucla.edu | @tnphung
Phillip Richmond | phillip.a.richmond@gmail.com | @Phillip-a-Richmond
Tim Webster | timothy.h.webster@asu.edu | @thw17
Whitney Whitford | whitney.whitford@auckland.ac.nz | @whitneywhitford
Melissa A. Wilson Sayres | melissa.wilsonsayres@asu.edu | @mwilsonsayres


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.