hurwitzlab/ML_Feature_Extraction_TRAINING

Name: ML_Feature_Extraction_TRAINING

Owner: Hurwitz Lab

Description: Pipeline for feature extraction (k-mer frequency counts) to train a linear classifier

Forked from: aponsero/ML_Feature_Extraction_TRAINING

Created: 2017-11-02 21:29:58.0

Updated: 2017-11-02 21:30:00.0

Pushed: 2017-08-07 19:07:38.0

Homepage: null

Size: 11

Language: Shell


README

ML_Feature_Extraction_TRAINING

Pipeline for feature extraction (k-mer frequency counts) to train a linear classifier on an HPC cluster.

Quick start

Edit scripts/config.sh file

please modify the

You can also modify

Filter and create subsets

Run

split.sh

This command removes contigs shorter than MIN_SIZE from the dataset and creates NUM_FILE files, each containing SPLIT_SIZE sequences randomly selected from DATASET. The split files are stored in RESULT_DIR/.
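To make the filtering and splitting step concrete, here is a minimal sketch of the idea in bash and awk. It is not the repository's split.sh: the function names filter_fasta and split_random, the output name split_N.fasta, and the choice to draw each file as an independent random sample are assumptions made for illustration. It also assumes FASTA input, GNU shuf, and headers that contain no tab characters.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function and file names are hypothetical,
# not taken from the repository's split.sh.

# Print each FASTA record as "header<TAB>sequence" on one line,
# keeping only sequences of at least min_size bases.
filter_fasta() {
    local fasta=$1 min_size=$2
    awk -v min="$min_size" '
        function emit() {
            if (header != "" && length(seq) >= min) print header "\t" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                               # join wrapped sequence lines
        END { emit() }
    ' "$fasta"
}

# Write num_file FASTA files, each holding up to split_size records drawn
# at random (without replacement within a file) from the filtered dataset.
# Assumption: files are sampled independently, so they may share records.
split_random() {
    local fasta=$1 min_size=$2 num_file=$3 split_size=$4 result_dir=$5
    local tmp i
    tmp=$(mktemp)
    mkdir -p "$result_dir"
    filter_fasta "$fasta" "$min_size" > "$tmp"
    for i in $(seq 1 "$num_file"); do
        # shuf picks random lines; tr restores the two-line FASTA layout
        shuf -n "$split_size" "$tmp" | tr '\t' '\n' > "$result_dir/split_$i.fasta"
    done
    rm -f "$tmp"
}
```

With the config.sh variables loaded, a call might look like: split_random "$DATASET" "$MIN_SIZE" "$NUM_FILE" "$SPLIT_SIZE" "$RESULT_DIR". If fewer than SPLIT_SIZE sequences pass the filter, each file simply contains all of them.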

Once the job is completed successfully, the analysis can be run.

Calculate kmer frequencies

Run

submit.sh

This command queues an array job for the analysis. The final output is written to SAMPLE_DIR/kmers.
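For readers unfamiliar with k-mer frequency counting, the sketch below shows the core computation on a single FASTA file. It is an assumption-laden illustration, not the pipeline's actual code: the function name kmer_freq, the k argument, the tab-separated output layout, and the choice to pool counts over all sequences in the file are all hypothetical. In an array job, each task would presumably run a step like this on one split file.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; kmer_freq and its output format are hypothetical.

# Slide a window of length k over every sequence in a FASTA file and print
# "kmer<TAB>count<TAB>frequency", where frequency = count / total k-mers.
# Assumption: counts are pooled over all sequences in the file.
kmer_freq() {
    local fasta=$1 k=$2
    awk -v k="$k" '
        function count_kmers(s,    i, n) {
            s = toupper(s)               # treat soft-masked bases like the rest
            n = length(s) - k + 1        # number of windows in this sequence
            for (i = 1; i <= n; i++) { counts[substr(s, i, k)]++; total++ }
        }
        /^>/ { count_kmers(seq); seq = ""; next }   # header: finish previous record
        { seq = seq $0 }                            # join wrapped sequence lines
        END {
            count_kmers(seq)
            for (kmer in counts)
                printf "%s\t%d\t%.4f\n", kmer, counts[kmer], counts[kmer] / total
        }
    ' "$fasta" | sort
}
```

For example, the sequence ACGTAC with k = 2 contains the windows AC, CG, GT, TA, and AC, so AC is reported with a count of 2 and a frequency of 0.4000. Sequences shorter than k contribute no k-mers.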


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.