soedinglab/b-lore

Name: b-lore

Owner: Söding Lab

Description: Bayesian multiple logistic regression for GWAS meta-analysis

Created: 2017-07-06 14:16:22.0

Updated: 2017-11-01 09:00:52.0

Pushed: 2017-10-17 12:59:32.0

Homepage: null

Size: 21519

Language: Python

README

B-LORE

Bayesian LOgistic REgression

A tool for meta-analysis in GWAS using Bayesian multiple logistic regression

Description

B-LORE is a command-line tool that creates summary statistics from multiple logistic regression on GWAS data, and combines the summary statistics from multiple studies in a meta-analysis. It can also incorporate functional information about the SNPs from external sources. Several genetic regions, or loci, are preselected for analysis with B-LORE.

Key features
  1. Association probability: B-LORE outputs probabilities of the input genetic loci being statistically associated with the phenotype.
  2. Finemapping: B-LORE also outputs the probability of each SNP being statistically associated with the phenotype.
  3. Leverage functional genomic data as a prior probability to improve prioritization.
  4. Models data with logistic regression, and is suited for case/control studies.
  5. Combines information over all SNPs in a locus with multiple regression.
Installation

B-LORE is written in Python and C++. To run B-LORE, you will need

To use B-LORE, you have to download the repository and compile the C++ shared libraries:

git clone https://github.com/soedinglab/b-lore.git
cd b-lore
make

The Makefile uses g++ by default, which you can change depending on the compiler available on your system.

Input files

For calculating summary statistics, it uses the following file formats as input:

  1. Genotype files in Oxford format, for all loci of interest.
  2. Sample file in Oxford format
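As a quick check before running B-LORE, the header of a sample file can be inspected to confirm that the phenotype column exists. The sketch below assumes the usual Oxford sample layout: a first line of column names and a second line of type codes, where `B` marks a binary phenotype. The function name and the example rows are hypothetical and are not part of B-LORE.

```python
import io


def read_sample_header(handle):
    """Return {column name: type code} from the two header lines of a sample file."""
    names = handle.readline().split()
    types = handle.readline().split()
    if len(names) != len(types):
        raise ValueError("the two header lines have different numbers of columns")
    return dict(zip(names, types))


# Hypothetical sample file with three individuals (illustrative values only).
example = io.StringIO(
    "ID_1 ID_2 missing sex age pheno\n"
    "0 0 0 D C B\n"
    "s1 s1 0 1 54 1\n"
    "s2 s2 0 2 61 0\n"
    "s3 s3 0 1 47 1\n"
)

columns = read_sample_header(example)
# Columns typed "B" are binary phenotypes, i.e. case/control labels.
binary_phenotypes = [name for name, code in columns.items() if code == "B"]
print(binary_phenotypes)
```

A column name found this way is the string one would pass as the phenotype name when creating summary statistics.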

For meta-analysis, it uses the following input:

  1. Output from B-LORE summary statistics. Note that it cannot use standard SNPTEST summary statistics.
  2. (Optional) Functional genomics data, separately for each locus. Each feature file contains 2 parts: (a) a header line detailing the names of the columns in the file, and (b) a line for each SNP detailing the information for that SNP. The columns are tab-separated. The annotation tracks are present from column 4 onwards. The first 3 columns are:
    • RSID: must have the same SNP identifier as in the genotype files
    • CHR: chromosome number
    • POS: base-pair position of the SNP.
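The feature file layout described above can be read with a few lines of Python. This is a minimal sketch, not B-LORE's own parser: the function name, the track names `dhs` and `enhancer`, the SNP identifiers, and the assumption that annotation values are numeric are all illustrative.

```python
import csv
import io


def read_feature_file(handle):
    """Parse a tab-separated feature file.

    Returns (track_names, snps), where snps maps RSID -> (chr, pos, values).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 3:
        raise ValueError("expected at least the RSID, CHR and POS columns")
    track_names = header[3:]  # annotation tracks start at column 4
    snps = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        rsid, chrom, pos = row[0], row[1], int(row[2])
        # Assumption: annotation values are numeric.
        snps[rsid] = (chrom, pos, [float(value) for value in row[3:]])
    return track_names, snps


# Hypothetical feature file for one locus with two annotation tracks.
example = io.StringIO(
    "RSID\tCHR\tPOS\tdhs\tenhancer\n"
    "rs0000001\t1\t1001000\t1\t0\n"
    "rs0000002\t1\t1002500\t0\t1\n"
)

tracks, snps = read_feature_file(example)
print(tracks)
print(snps["rs0000002"])
```

Comparing the keys of `snps` against the SNP identifiers in the genotype file of the same locus is a cheap way to catch mismatched RSIDs before running the meta-analysis.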
Usage
Quick start
Command line arguments

An executable file to run B-LORE is provided as bin/blore. This can be used as follows:

blore [--help] [COMMAND] [OPTIONS]

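There are 2 commands for B-LORE:

  1. --summary: create summary statistics of individual studies.
  2. --meta: perform meta-analysis from summary statistics of multiple studies.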

Each of these 2 commands takes different options, as described below.

blore --summary [OPTIONS]

Create summary statistics of individual studies. Valid options are:

Option | Description | Priority | Default value
:--- | :--- | :--- | :---
--gen filename(s) | Input genotype file(s); each locus should have its own genotype file, and all of them are specified here (wildcards allowed) | Required | -
--sample filename | Input sample file | Required | -
--pheno string | Name of the phenotype as it appears in the header of the sample file | Optional | pheno
--regoptiom | If specified, the variance of the regularizer will be optimized; otherwise the regularizer is N(0, σ²), where σ is specified by --reg | Optional | -
--reg float | Value of the standard deviation (σ) of the regularizer | Optional | 0.01
--pca int | Number of principal components of the genotype to be included as covariates | Optional | 0
--cov string(s) | Name of covariate(s) as they appear in the header of the sample file; multiple covariates can be specified as space-separated strings | Optional | None
--out directory | Name of the output directory where summary statistics will be created | Optional | directory of the genotype files
--prefix string | Prefix for the summary statistics files | Optional | _summary
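When summary statistics are created for several studies from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that only builds and prints a command line and does not run B-LORE. The helper name, the file paths, and the choice of two principal components are hypothetical; the option names are the ones listed above.

```python
import shlex


def build_summary_command(gen_files, sample_file, pheno=None, reg=None,
                          pca=None, cov=(), out=None, prefix=None):
    """Assemble an argument list for 'blore --summary' (hypothetical helper)."""
    if not gen_files:
        raise ValueError("at least one genotype file is required")
    cmd = ["bin/blore", "--summary", "--gen", *gen_files, "--sample", sample_file]
    if pheno is not None:
        cmd += ["--pheno", pheno]
    if reg is not None:
        cmd += ["--reg", str(reg)]
    if pca is not None:
        cmd += ["--pca", str(pca)]
    if cov:
        cmd += ["--cov", *cov]  # several covariates are space-separated
    if out is not None:
        cmd += ["--out", out]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


# Hypothetical study with two loci; paths are placeholders.
cmd = build_summary_command(
    gen_files=["study1/locus1.gen", "study1/locus2.gen"],
    sample_file="study1/study1.sample",
    pheno="pheno",
    pca=2,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Options left as `None` are omitted, so B-LORE falls back to its documented defaults.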

blore --meta [OPTIONS]

Perform meta-analysis from summary statistics of multiple studies. Valid options are:

Option | Description | Priority | Default value
:--- | :--- | :--- | :---
--statinfo filename(s) | Input file prefix(es) of summary statistics; the full path is required | Required | -
--feature filename(s) | Input file(s) for genomic feature tracks | Optional | -
--params floats | Initial values of the hyperparameters; requires 4 space-separated floats corresponding to β_π μ σ σ_bg | Optional | 0.01 0.0 0.01 0.01
--muvar | If specified, μ will be optimized; otherwise it will be fixed to the initial value | Optional | -
--zmax int | Maximum number of causal SNPs allowed | Optional | 2
--out directory | Name of the output directory where result files will be created | Optional | current directory
--prefix string | Prefix for the meta-analysis output files | Optional | _meta
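A meta-analysis call can be assembled in the same way. The sketch below is illustrative only: the helper name and the paths are hypothetical, and the check that exactly four initial hyperparameter values are given mirrors the requirement stated for --params. It builds the argument list without running B-LORE.

```python
import shlex


def build_meta_command(statinfo, features=(), params=None, muvar=False,
                       zmax=None, out=None, prefix=None):
    """Assemble an argument list for 'blore --meta' (hypothetical helper)."""
    if not statinfo:
        raise ValueError("at least one summary statistics prefix is required")
    cmd = ["bin/blore", "--meta", "--statinfo", *statinfo]
    if features:
        cmd += ["--feature", *features]
    if params is not None:
        if len(params) != 4:
            raise ValueError("--params requires exactly 4 floats")
        cmd += ["--params", *[str(float(p)) for p in params]]
    if muvar:
        cmd.append("--muvar")  # flag without a value
    if zmax is not None:
        cmd += ["--zmax", str(int(zmax))]
    if out is not None:
        cmd += ["--out", out]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


# Hypothetical prefixes from three studies; full paths are required.
meta_cmd = build_meta_command(
    statinfo=[
        "/data/study1/study1_summary",
        "/data/study2/study2_summary",
        "/data/study3/study3_summary",
    ],
    params=[0.01, 0.0, 0.01, 0.01],
    zmax=2,
    out="/data/meta",
)
print(shlex.join(meta_cmd))
```

Validating the number of hyperparameter values up front gives a clearer error than a failed run, which is the only reason the check is included here.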

Example

View commands.sh in your favorite editor to see the commands, and execute ./commands.sh to run B-LORE on the 3 populations to generate summary statistics, followed by a meta-analysis.

Citation
License

B-LORE is released under the GNU General Public License version 3. See LICENSE for more details. Copyright Johannes Soeding and Saikat Banerjee.

Contact

Saikat Banerjee


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.