Name: variant_warehouse
Owner: Paradigm4 Labs
Description: Examples for analyzing Genomic Variant data in SciDB
Created: 2014-11-04 19:48:19.0
Updated: 2017-08-04 12:33:18.0
Pushed: 2017-03-27 16:44:21.0
Size: 13549
Language: HTML
GitHub Committers
User | Most Recent Commit | # Commits |
---|---|---|
rvernica | 2017-03-24 05:16:08.0 | 1 |
Timothy Danford | 2014-12-05 06:34:08.0 | 1 |
Chris Beaumont | 2014-11-23 19:25:35.0 | 1 |
Alex Poliakov | 2018-03-12 22:12:52.0 | 127 |
Paradigm4Labs | 2015-05-20 03:34:34.0 | 1 |
Jonathan Rivers | 2015-05-11 21:27:28.0 | 31 |
mingshengzhangp4 | 2016-01-18 21:36:02.0 | 56 |
Kriti Sen Sharma | 2018-02-22 17:17:24.0 | 7 |
Other Committers
User | Email | Most Recent Commit | # Commits |
---|---|---|---|
apoliakov | apoliakov@kali.local | 2015-03-24 18:21:50.0 | 2 |
mingsheng zhang | mingshengzhangp4@paradigm4.com | 2015-10-07 15:38:53.0 | 1 |
scidb | scidb@ip-10-95-163-155.ec2.internal | 2015-09-10 00:23:14.0 | 2 |
SciDB user | scidb@salty1.local.paradigm4.com | 2015-10-06 00:07:49.0 | 2 |
This repository organizes functions for loading and processing variant datasets, along with other utilities that facilitate exploration of publicly available variant datasets in general. A few of the scripts may still be prototypes. They can be adapted quickly to a variety of purposes and to your particular use case.
The base directory (variant_warehouse) contains examples of loading and processing genomic variant datasets in SciDB, currently built around the 1000 Genomes dataset (http://www.1000genomes.org).
Part of the original prototype was adapted from scidb-genotypes by Douglas Slotta (NCBI) (http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/). See: https://github.com/slottad/scidb-genotypes
These scripts were created for SciDB 14.12 or newer. Because they are designed for scalability, they run faster on larger clusters. The load_tools plugin is required for the vast majority of the examples. See: www.github.com/paradigm4/load_tools
Below are demonstration examples for variant processing use cases.
A set of example queries on 1000 Genomes and ESP data, written in R. Includes sample lookups, allele counts, a PCA plot, and range joins.
A set of example queries on 1000 Genomes and ESP data, written in R Markdown.
A set of example queries on 1000 Genomes data in a Jupyter notebook.
Sample queries in AFL, including a grouped allele count and a join of ESP and 1000 Genomes data.
A variant browser app that computes allele counts grouped by major population and displays them in an interactive plot.
An app that filters and plots TCGA alteration frequencies against dbNSFP scores and clinical keywords. It requires TCGA data to be loaded; you can use the AMI, for example.
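Several of the examples above compute a grouped allele count: for each variant, how many alternate alleles appear in each major population, out of how many alleles observed. The arithmetic can be sketched in plain Python. This is not the repository's AFL or R code; the record layout (variant_id, population, alternate-allele count), the function names, and the sample data are hypothetical, and the sketch assumes diploid calls (e.g. autosomes), so each sample contributes two alleles.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (variant_id, population, alt_allele_count), where
# alt_allele_count is 0, 1 or 2 for a diploid genotype (0/0, 0/1, 1/1).
Call = Tuple[str, str, int]
Key = Tuple[str, str]  # (variant_id, population)


def allele_counts_by_population(calls: Iterable[Call]) -> Dict[Key, Tuple[int, int]]:
    """Return {(variant_id, population): (alt_alleles, total_alleles)}."""
    alt: Dict[Key, int] = defaultdict(int)
    total: Dict[Key, int] = defaultdict(int)
    for variant_id, population, n_alt in calls:
        if n_alt not in (0, 1, 2):
            raise ValueError(f"expected 0, 1 or 2 alternate alleles, got {n_alt}")
        key = (variant_id, population)
        alt[key] += n_alt
        total[key] += 2  # assumption: diploid, so two alleles per sample
    return {key: (alt[key], total[key]) for key in total}


def allele_frequencies(counts: Dict[Key, Tuple[int, int]]) -> Dict[Key, float]:
    """Alternate-allele frequency per (variant, population) group."""
    return {key: a / t for key, (a, t) in counts.items()}


if __name__ == "__main__":
    calls = [
        ("rs1", "EUR", 1),
        ("rs1", "EUR", 0),
        ("rs1", "AFR", 2),
        ("rs2", "EUR", 1),
    ]
    counts = allele_counts_by_population(calls)
    for key, (a, t) in sorted(counts.items()):
        print(key, a, t)
```

In SciDB the same computation would be a grouped aggregate over the genotype array, evaluated across the cluster; the dictionary version here only illustrates what is being counted.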
Some examples are shown in the Bioinformatics AMI (last updated June 2015). Instructions are here: http://www.paradigm4.com/try_scidb/
The benchmark comprises common genomic processing queries that highlight the differences between SciDB and Spark-Adam. The code for the Spark benchmark is located in variant_warehouse/spark_benchmark.