hammerlab/prohlatype

Name: prohlatype

Owner: Hammer Lab

Description: Probabilistic HLA typing

Created: 2016-04-29 06:22:05.0

Updated: 2018-01-13 00:11:16.0

Pushed: 2018-01-15 06:35:40.0

Size: 22246

Language: OCaml

README

Probabilistic HLA Typing

Paper: Prohlatype: A Probabilistic Framework for HLA Typing

This project provides a set of tools to calculate the full posterior distribution of HLA types given read data.

Instead of a single best-scoring call per locus:

A1      A2      B1      B2      C1      C2      Reads   Objective
A*31:01 A*02:01 B*45:01 B*15:03 C*16:01 C*02:10 538.0   513.79

one can calculate the probability of every allele pair at each locus:

| Allele 1 | Allele 2 | Log P | P |
|---|---|---|---|
| A*02:05:01:01 | A*30:114 | -23046.81 | 0.5000 |
| A*02:05:01:01 | A*30:01:01 | -23046.81 | 0.5000 |
| A*02:05:01:01 | A*30:106 | -23103.15 | 0.0000 |
| A*02:05:01:02 | A*30:114 | -23146.35 | 0.0000 |
| … | | | |
| B*07:36 | B*57:03:01:02 | -13717.33 | 0.5000 |
| B*07:36 | B*57:03:01:01 | -13717.33 | 0.5000 |
| B*07:36 | B*57:03:03 | -13804.74 | 0.0000 |
| B*27:157 | B*57:03:01:02 | -13816.17 | 0.0000 |
| … | | | |
| C*06:103 | C*18:10 | -11936.35 | 0.3338 |
| C*06:103 | C*18:02 | -11936.36 | 0.3331 |
| C*06:103 | C*18:01 | -11936.36 | 0.3331 |
| C*15:102 | C*18:02 | -11951.72 | 0.0000 |
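The P column is consistent with normalizing Log P over the allele pairs at a locus. Tied pairs split the probability equally, and a pair about 56 log units behind the best one gets effectively 0. Exponentiating values near -23000 directly underflows to 0.0 in double precision, so this kind of normalization is usually done with the log-sum-exp trick. The OCaml sketch that follows assumes Log P is an unnormalized natural-log score and that normalization runs over all pairs at a locus. `normalize` is a hypothetical helper, not part of prohlatype's API.

```ocaml
(* Hypothetical helper (not prohlatype's API): turn unnormalized
   natural-log scores into probabilities that sum to 1. Uses the
   log-sum-exp trick so values like -23046.81 do not underflow to 0. *)
let normalize (log_ps : float array) : float array =
  (* Shift by the maximum so the largest term becomes exp 0 = 1. *)
  let m = Array.fold_left max neg_infinity log_ps in
  let total =
    Array.fold_left (fun acc lp -> acc +. exp (lp -. m)) 0.0 log_ps
  in
  (* log of the normalizing constant: log (sum_i exp lp_i) *)
  let log_z = m +. log total in
  Array.map (fun lp -> exp (lp -. log_z)) log_ps

let () =
  (* Log P values of the four HLA-A rows shown in the table. *)
  let log_ps = [| -23046.81; -23046.81; -23103.15; -23146.35 |] in
  Array.iter (fun p -> Printf.printf "%.4f\n" p) (normalize log_ps)
```

On the four HLA-A rows it prints 0.5000, 0.5000, 0.0000 and 0.0000, which matches the P column. The real output normalizes over every pair at the locus, not only the rows displayed.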

How:
There are three options to obtain the software:
  1. If you are running on Linux, standalone binaries are available with each release.

  2. Use the linked Docker image.

  3. Build the software from source:

    a. Install opam.

    b. Make sure that the opam packages are up to date:

      $ opam update
    

    c. Switch to the required OCaml compiler (4.05.0):

      $ opam switch 4.05.0
      $ eval `opam config env`
    

    d. Get source:

      $ git clone https://github.com/hammerlab/prohlatype.git prohlatype
      $ cd prohlatype
    

    e. Install the dependent packages:

      $ make setup
    

    f. Build the programs (afterwards they'll be in _build/default/src/apps):

      $ make
    
Make sure that you have IMGT/HLA available:

$ git clone https://github.com/ANHIG/IMGTHLA.git imgthla

“Prohla”-typing:
  1. Create an imputed HLA reference with align2fasta. This step ensures that every allele has sequence information spanning the entire locus, so that reads originating from a region for which we normally lack sequence information will still align, albeit poorly, in the filtering step that follows:

      $ align2fasta path-to-imgthla/alignments -o imputed_hla_class_I.fasta
    

    This step needs to be performed only once per IMGT/HLA release. Run align2fasta --help for further information.

  2. Filter your data against the reference by first aligning the reads to it, for example:

      $ bwa index imputed_hla_class_I.fasta   # index the reference once
      $ bwa mem imputed_hla_class_I.fasta ${SAMPLE}.fastq | \
          samtools view -F 4 -bT imputed_hla_class_I.fasta -o ${SAMPLE}.bam
    

    Although the algorithms here are fundamentally alignment based, they are too slow to run on all sequences, and reads that do not originate from the HLA region would only act as background noise.

  3. Then convert the aligned reads back to FASTQ:

      $ samtools fastq ${SAMPLE}.bam > ${SAMPLE}_filtered.fastq
    
  4. Infer types (run multi_par --help for further details):

      $ multi_par path-to-imgthla/alignments ${SAMPLE}_filtered.fastq -o ${SAMPLE}_output.tsv
    
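Steps 2 and 3 keep only the reads that aligned to the imputed reference and write them back out as FASTQ. The OCaml sketch that follows illustrates that logic on uncompressed SAM text lines. It is not a replacement for bwa or samtools. It drops records whose FLAG has bit 0x4 (unmapped) set, as `-F 4` does. Mirroring the defaults of `samtools fastq`, it also skips secondary (0x100) and supplementary (0x800) records and reverse-complements reads aligned to the reverse strand (0x10), so the FASTQ holds reads in their sequenced orientation. `sam_line_to_fastq` is a hypothetical helper. The sketch assumes single-end reads and simply skips records whose SEQ or QUAL is `*`.

```ocaml
(* Hypothetical helper illustrating steps 2-3 on one SAM text line.
   Assumes single-end reads and uncompressed SAM (not BAM) input. *)
let flag_unmapped = 0x4
let flag_reverse = 0x10
let flag_secondary = 0x100
let flag_supplementary = 0x800

(* Complement one nucleotide; other symbols (e.g. N) pass through. *)
let complement = function
  | 'A' -> 'T' | 'C' -> 'G' | 'G' -> 'C' | 'T' -> 'A'
  | 'a' -> 't' | 'c' -> 'g' | 'g' -> 'c' | 't' -> 'a'
  | c -> c

let reverse s =
  let n = String.length s in
  String.init n (fun i -> s.[n - 1 - i])

let reverse_complement s = String.map complement (reverse s)

(* [Some fastq_record] for a mapped primary alignment; [None] for header
   lines, unmapped/secondary/supplementary records, or missing SEQ/QUAL. *)
let sam_line_to_fastq (line : string) : string option =
  if String.length line = 0 || line.[0] = '@' then None
  else
    match String.split_on_char '\t' line with
    | qname :: flag :: _rname :: _pos :: _mapq :: _cigar :: _rnext :: _pnext
      :: _tlen :: seq :: qual :: _tags ->
      let flag = int_of_string flag in
      let skip = flag_unmapped lor flag_secondary lor flag_supplementary in
      if flag land skip <> 0 || seq = "*" || qual = "*" then None
      else
        (* SAM stores reverse-strand reads reverse-complemented; undo it. *)
        let seq, qual =
          if flag land flag_reverse <> 0 then
            (reverse_complement seq, reverse qual)
          else (seq, qual)
        in
        Some (Printf.sprintf "@%s\n%s\n+\n%s\n" qname seq qual)
    | _ -> None

let () =
  (* A made-up single-end record aligned to the reverse strand (FLAG 16). *)
  let line = "read1\t16\tref\t100\t60\t4M\t*\t0\t0\tAACG\tABCD" in
  match sam_line_to_fastq line with
  | Some record -> print_string record
  | None -> print_endline "skipped"
```

For the made-up record it prints the FASTQ entry `@read1`, `CGTT`, `+`, `DCBA`. Applying the helper to every line of plain-SAM `samtools view` output (without `-b`) and concatenating the non-`None` results would give a filtered FASTQ.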

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.