Duke-GCB/iMADS

Name: iMADS

Owner: Duke Center for Genomic and Computational Biology

Description: Web portal and supporting services that allow searching/creating TF predictions and preferences.

Created: 2016-02-12 15:07:50.0

Updated: 2016-10-24 14:06:31.0

Pushed: 2017-12-18 16:05:45.0

Size: 3031

Language: JavaScript

README

iMADS

Website for searching and creating transcription factor (TF) binding predictions and preferences. Prediction and preference data can be searched by gene list or by custom genomic range, and predictions/preferences can be created for user-uploaded DNA sequences.
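Searching by gene list means turning each gene into a genomic region first. A minimal sketch, assuming the region is a strand-aware window of some number of bases upstream and downstream of each gene's transcription start site (TSS), using 0-based half-open BED-style coordinates. `Gene`, `promoter_window` and `in_window` are hypothetical names, not iMADS code.

```python
from dataclasses import dataclass
from typing import Tuple

Window = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


@dataclass
class Gene:
    name: str
    chrom: str
    tx_start: int  # 0-based transcript start
    tx_end: int    # transcript end (exclusive)
    strand: str    # "+" or "-"


def promoter_window(gene: Gene, upstream: int, downstream: int) -> Window:
    """Return a strand-aware window around the gene's TSS, clamped at 0."""
    if gene.strand == "+":
        tss = gene.tx_start
        start, end = tss - upstream, tss + downstream
    else:
        # Simplification: treat tx_end as the TSS of a minus-strand gene;
        # "upstream" then points toward higher coordinates.
        tss = gene.tx_end
        start, end = tss - downstream, tss + upstream
    return gene.chrom, max(0, start), end


def in_window(window: Window, chrom: str, site_start: int, site_end: int) -> bool:
    """True if the half-open site [site_start, site_end) overlaps the window."""
    w_chrom, w_start, w_end = window
    return chrom == w_chrom and site_start < w_end and site_end > w_start
```

For a minus-strand gene the window is mirrored, so `upstream` extends toward higher coordinates.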

Major Components

Predictions Config File

imadsconf.yaml - this config file determines which files are downloaded and how the prediction/preference database is configured

Predictions Database

A Postgres database containing indexed gene lists, custom user data, and prediction/preference data for use by webserver.py
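A custom-range search boils down to an interval-overlap query. A rough sketch using Python's built-in sqlite3 as a stand-in for Postgres so it runs on its own; the `prediction` table, its columns, `predictions_in_range` and the demo rows are hypothetical, not the actual 'pred' schema or data.

```python
import sqlite3

# Hypothetical schema: one row per predicted site.
SCHEMA = """
CREATE TABLE prediction (
    chrom TEXT NOT NULL,
    start_range INTEGER NOT NULL,  -- 0-based start
    end_range INTEGER NOT NULL,    -- exclusive end
    value REAL NOT NULL            -- prediction score
);
CREATE INDEX prediction_pos ON prediction (chrom, start_range, end_range);
"""


def predictions_in_range(conn, chrom, start, end, min_score=0.0):
    """Return sites overlapping [start, end) with score >= min_score, by position."""
    sql = (
        "SELECT chrom, start_range, end_range, value FROM prediction "
        "WHERE chrom = ? AND start_range < ? AND end_range > ? AND value >= ? "
        "ORDER BY start_range"
    )
    return conn.execute(sql, (chrom, end, start, min_score)).fetchall()


def make_demo_db():
    """Build a tiny in-memory database with three example sites."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO prediction VALUES (?, ?, ?, ?)",
        [("chr1", 100, 120, 0.45), ("chr1", 500, 520, 0.62), ("chr2", 100, 120, 0.9)],
    )
    return conn
```

With half-open intervals, a site overlaps the query when it starts before the query ends and ends after the query starts; the composite index lets the database narrow the scan to one chromosome.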

Database Loading Script

load.py - downloads files and loads the database based on imadsconf.yaml

Webserver

webserver.py serves the web portal and the API for accessing the 'pred' database

Database Vacuum Script

vacuum.py deletes old user data from the 'pred' database
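A sketch of the kind of cleanup vacuum.py performs, assuming each upload is stored with a creation timestamp and anything older than a fixed age is removed. The `custom_sequence` table, the Unix-timestamp `created` column and `delete_old_user_data` are hypothetical, and sqlite3 again stands in for Postgres.

```python
import sqlite3
import time

# Hypothetical table of user uploads; created is a Unix timestamp in seconds.
SCHEMA = "CREATE TABLE custom_sequence (id TEXT PRIMARY KEY, created REAL NOT NULL)"


def delete_old_user_data(conn, max_age_hours, now=None):
    """Delete uploads older than max_age_hours; return the number of rows removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM custom_sequence WHERE created < ?", (cutoff,))
    return cur.rowcount
```

Storing numeric timestamps keeps the age comparison a simple numeric `<`, with no string or time-zone formatting pitfalls.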

Web Portal

The portal/ directory contains the React project that builds static/js/bundle.js for webserver.py to serve.

Custom Prediction/Preference Worker

Calculates predictions and preferences for user-uploaded sequences: https://github.com/Duke-GCB/iMADS-worker

Running

Deployment

We deploy with the imads.yml playbook from https://github.com/Duke-GCB/gcb-ansible.

Run via docker-compose

Download docker-compose.yml and .env_sample, and rename .env_sample to .env. Change DB_PASS_ENV and POSTGRES_PASSWORD to the password you want to use. Then start the database and webserver:

docker-compose up -d
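Because the webserver presumably connects with DB_PASS_ENV to the Postgres instance initialized with POSTGRES_PASSWORD, the two values should match. A small hypothetical helper (`parse_env`, `check_passwords`) that checks a simple KEY=VALUE .env file for that; it ignores quoting and other features that real .env parsers support.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_passwords(env):
    """Return a list of problems with the two password settings (empty if none)."""
    problems = []
    for key in ("DB_PASS_ENV", "POSTGRES_PASSWORD"):
        if not env.get(key):
            problems.append(f"{key} is missing or empty")
    if not problems and env["DB_PASS_ENV"] != env["POSTGRES_PASSWORD"]:
        problems.append("DB_PASS_ENV and POSTGRES_PASSWORD differ")
    return problems
```

For example, `check_passwords(parse_env(open(".env").read()))` returns an empty list when both passwords are set and equal.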

Populate the database (this can take quite a while, depending on imadsconf.yaml):

docker-compose run --no-deps --rm web python load.py

JavaScript unit tests

Requires mocha and chai. Setup:

cd portal
npm install -g mocha
npm install --dev

To run:

cd portal
npm run test

Python unit tests

From the root directory run this:

nosetests

Integration tests are skipped by default (they are run by CircleCI). See skip_postgres_tests in tests/test_integration.py for instructions on running them manually.

Config file updates

The util directory contains a Python script for updating the config file. Run it like so:

cd util
python create_conf.py

This looks up the latest predictions based on DATA_SOURCE_URL in create_conf.yaml. To add a new gene list, update GENOME_SPECIFIC_DATA in create_conf.yaml.

Data provenance

This database consists of datasets generated using the following programs:

Binding Predictions

Binding predictions were generated for each transcription factor on both the FASTA-formatted hg19 and hg38 genome assemblies using predict_tf_binding.py in https://github.com/Duke-GCB/Predict-TF-Binding. The work was divided so that the program was run once for each combination of:

Configuration arguments for the model/core combinations are decoded from tracks-predictions.yaml. Each invocation of predict_tf_binding.py produced a BED format file containing genomic coordinates and the probability (score) that the considered TF will bind at that site.
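A minimal reader for these BED files, assuming tab-separated lines whose first three columns are chromosome, 0-based start and exclusive end, with the score in a later column. Score column index 3 is an assumption; check the actual predict_tf_binding.py output. `Site`, `read_bed_scores` and `strong_sites` are hypothetical helpers.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Site(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive (BED convention)
    end: int      # exclusive
    score: float  # predicted binding probability


def read_bed_scores(lines: Iterable[str], score_col: int = 3) -> Iterator[Site]:
    """Yield a Site per data line; score_col (0-based) is an assumption."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and track/browser header lines
        fields = line.rstrip("\n").split("\t")
        yield Site(fields[0], int(fields[1]), int(fields[2]), float(fields[score_col]))


def strong_sites(sites: Iterable[Site], threshold: float) -> List[Site]:
    """Keep sites whose score is at least threshold."""
    return [s for s in sites if s.score >= threshold]
```

Because BED intervals are half-open, a site's length is simply `end - start`.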

These per-chromosome and per-model/core files were combined to produce a single bigBed format file for each transcription factor on each assembly (hg19 E2f1, hg38 E2f1, hg19 E2f4, hg38 E2f4), using a CWL workflow: bigbed-workflow-no-resize.cwl in https://github.com/Duke-GCB/TrackHubGenerator/.
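bigBed files are typically built with UCSC's bedToBigBed, which requires input sorted by chromosome and then by start position (the equivalent of `sort -k1,1 -k2,2n`). A sketch of combining per-chromosome and per-model/core record lists into one sorted list under that rule; it only interleaves records and does not reproduce whatever overlap handling bigbed-workflow-no-resize.cwl applies. `combine_sorted` and the tuple layout are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, int, int, float]  # (chrom, start, end, score)


def bigbed_sort_key(rec: Record) -> Tuple[str, int]:
    # Chromosome in byte order, then numeric start, like `LC_ALL=C sort -k1,1 -k2,2n`.
    return rec[0], rec[1]


def combine_sorted(parts: Iterable[List[Record]]) -> List[Record]:
    """Merge per-chromosome / per-model lists into one sorted list.

    Each part is sorted first, so unsorted inputs are handled too;
    heapq.merge then interleaves the sorted parts by the same key.
    """
    sorted_parts = [sorted(p, key=bigbed_sort_key) for p in parts]
    return list(heapq.merge(*sorted_parts, key=bigbed_sort_key))
```

Python compares strings by code point, matching the C locale, so chr10 sorts before chr2 while starts compare numerically.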

The browser tracks are published at http://trackhub.genome.duke.edu/gordanlab/tf-dna-binding-predictions/. Scores from these tracks are ingested using load.py.

Binding Preferences

Binding preferences were generated for pairs of transcription factors in the same family, as enumerated in predict-TF-preference.R. The preference data are derived from the prediction data, starting with the BED format files generated by predict_tf_binding.py.

The collections of per-assembly-chromosome and per-model/core files were fed into predict-TF-preference.R in https://github.com/Duke-GCB/Predict-TF-Preference.

The preference scores were generated for each pair using a CWL workflow: preference-bigbed-workflow.cwl in https://github.com/Duke-GCB/TrackHubGenerator/. This workflow considers the binding prediction at each site, determines preference using predict-TF-preference.R, and filters out insignificant preferences. The result is a bigBed format track containing the preference score at each site.
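The statistics used by predict-TF-preference.R are not described here. Purely as a toy illustration of the shape of this step (compare two factors' scores at each site, then drop insignificant differences), the sketch below uses a plain score difference and a fixed cutoff; both choices are assumptions and not the actual method. `preference_sites` is hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int, int]  # (chrom, start, end)


def preference_sites(
    scores_a: Dict[Key, float],
    scores_b: Dict[Key, float],
    min_abs_diff: float,
) -> List[Tuple[str, int, int, float]]:
    """Toy preference: score_a - score_b at sites scored for both factors.

    Positive values favor factor A, negative values favor factor B; sites whose
    absolute difference is below min_abs_diff are dropped as "insignificant".
    """
    out = []
    for key in sorted(scores_a.keys() & scores_b.keys()):
        diff = scores_a[key] - scores_b[key]
        if abs(diff) >= min_abs_diff:
            out.append((*key, diff))
    return out
```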

The browser tracks are published at http://trackhub.genome.duke.edu/gordanlab/tf-dna-preferences/. Scores from these tracks are ingested using load.py.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.