Frank Austin Nothaft

Login: fnothaft

Company: Databricks

Location: Oakland, CA

Blog: http://www.fnothaft.net

Member of

  1. AMPLab at UC Berkeley
  2. Big Data Genomics

Repositories

adam
A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
adam-fs-benchmarks
null
adam-queries
Small queries to run on ADAM for GDA.
adam-regression
Regression testing setup for ADAM.
adam-snpeff
SnpEff on Spark via ADAM pipe APIs
ananas
null
avocado
A Variant Caller, Distributed
bdg-formats
Open source formats for scalable genomic processing systems using Avro. Apache 2 licensed.
bdg-recipes
Recipes using BDG projects. Apache 2 licensed.
bdg-services
Utility classes for wrapping services or other interfaces around a Spark/ADAM cluster.
bdg-utils
General (non-omics) code used across BDG products. Apache 2 licensed.
BigData_2015
null
bigdatagenomics.github.io
Web Site for the Big Data Genomics Group
bits
Firebox Benchmarks
blog
null
cannoli
A little, Apache 2 licensed pipe.
cannoli-1
Big Data Genomics ADAM Pipe API wrappers for bioinformatics tools. Apache 2 licensed.
cgcloud
Image and VM management for Jenkins, Spark and Mesos clusters in EC2
cgl-docker-lib
null
cloud-scale-bwamem
null
conductor
Efficient, distributed downloads of large files from S3 to HDFS using Spark.
copier
Apache Spark-based tool for downloading a list of URLs. Apache 2 licensed.
corretto
Read error correction utilities.
deca
Distributed exome CNV analyzer. Apache 2 licensed.
docker-images
Miscellaneous Docker images.
docs
My writings.
eggo
Ready-to-go Parquet-formatted public 'omics datasets
fig
A tool for Finding Interesting variants that modify regulatory Grammar.
fnothaft.github.io
Personal website.
gatk
Official code repository for GATK versions 4 and up
gatk-whole-genome-pipeline
End to end pipeline for calling variants against an entire genome
gnocchi
null
gnocchi-1
null
gt
null
Hadoop-BAM
Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, and command line tools similar to SAMtools.
hadoop-interfaces
null
homebrew-science
Scientific formulae for the Homebrew package manager
htsjdk
A Java API for high-throughput sequencing data (HTS) formats.
IEEE_TBD_2016
null
incubator-parquet-mr
Mirror of Apache Parquet
jsr203-hadoop
A Java NIO file system provider for HDFS
jsr203-s3a
null
mango
Visualization tools for genomic data. Apache 2 licensed.
mmtf-spark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
mmtf-workshop-2017
Structural Bioinformatics Training Workshop & Hackathon 2017
PacMin
Assembler for PacBio reads. Apache 2 licensed.
parquet-mr
Java readers/writers for Parquet columnar file formats to use with Map-Reduce
picard
A set of tools (in Java) for working with next generation sequencing data in the BAM (http://samtools.sourceforge.net) format.
qc-metrics
Read and variant metrics, useable for pipeline quality control purposes. Apache 2 licensed.
ReadTaskTeam
null
RefVariationTaskTeam
Work on data models and APIs for graph structures that represent the human reference genome plus common human genetic variants. Includes mapping to a reference genome, and standardization of the mechanism for referring to variants.
RNAdam
An RNA fusion transcript pipeline built on top of ADAM. Apache 2 licensed.
sequence-graphs
Implementation of the Sequence Graphs data model
server
A reference implementation of the APIs defined in the schemas repository.
siren-release
Public version of the SiRen project
snap
Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
snappea
Parallel alignment using SNAP on ADAM. Apache 2 licensed.
snark
A nested vector library for Scala.
spark-bam
Load genomic BAM files using Apache Spark
SparkMontage
null
toil
Python based pipeline management software for clusters that makes running recursive and dynamically scheduled computations straightforward. So far works with gridEngine, lsf, parasol and on multi-core machines.
toil-lib
A common library for functions and tools used in toil-based pipelines
toil-scripts
null
toil-wdl-api
Exemplar API that mediates Toil with a WDL front-end and workflow tracking.
training
Training materials for Strata, AMP Camp, etc
VariantDB_Challenge
Finding a scalable alternative to the VCF File for genomics analysis
vcfimp
Strict VCF Parser
workflows
Toil workflows for bigdatagenomics tools. Apache 2 licensed.
xASSEMBLEx
An ADAM/GraphX based assembler. Apache 2 licensed.

Commits To

Repository                                   Most Recent Commit       # Commits
bigdatagenomics/bigdatagenomics.github.io    2018-01-09 05:50:44.0    12
bigdatagenomics/bdg-formats                  2017-09-12 20:07:34.0    87
bigdatagenomics/cannoli                      2017-07-05 20:54:17.0    10


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.