hammerlab/infiltrate-rnaseq-pipeline

Name: infiltrate-rnaseq-pipeline

Owner: Hammer Lab

Description: (none)

Created: 2016-07-26 17:57:06

Updated: 2016-12-07 19:17:54

Pushed: 2017-11-10 23:03:00

Homepage: (none)

Size: 7606 KB

Language: Perl

README

Overview

Launches Kubernetes jobs to download and process FASTQ files:

[pipeline graphic from the August project review]

This is used by hammerlab/immune-infiltrate-explorations.

Runbook

Set up NFS for reading and writing data

Launch a single-node NFS server. I called it mz-nfs-vm and went with 8 vCPUs and 40 GB of RAM.
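For reference, a VM of that shape can be created from the command line roughly as below. This is a sketch under assumptions: the original deployment appears to have used a single-node file server with a ZFS volume (see Gotchas), and these flags only create the VM, not the NFS export itself.

gcloud compute instances create mz-nfs-vm \
--zone us-east1-b \
--custom-cpu 8 --custom-memory 40GB # matches the 8 vCPU / 40 GB sizing above
# you still need to install an NFS server on it (e.g. nfs-kernel-server) and export /mz-data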

Here's how to monitor it:

Here's how to mount the NFS into a GCE VM:

sudo apt-get install nfs-common
sudo mkdir /mnt/mz-data
sudo chmod a+w /mnt/mz-data
echo 'mz-nfs-vm:/mz-data /mnt/mz-data nfs rw 0 0' | sudo tee -a /etc/fstab
sudo mount -t nfs mz-nfs-vm:/mz-data /mnt/mz-data
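To sanity-check the mount (standard NFS utilities, not commands from the original runbook):

showmount -e mz-nfs-vm # should list the /mz-data export
df -h /mnt/mz-data # should show the NFS filesystem rather than the local disk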

Next, we must download the proper Kallisto index into the NFS. I executed this line from mz-nfs-vm:

gsutil cp gs://mz-hammerlab/index/Homo_sapiens.GRCh38.cdna.all.kallisto.idx /mz-data/

Create a cluster in Kubernetes. Below, I assume it's called some-cluster. You can create it from the Cloud Console (web UI) or from the command line like this:

gcloud container --project "pici-1286" clusters create "some-cluster" \
--zone "us-east1-b" --machine-type "n1-highmem-4" \
--scope "https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/taskqueue","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management" \
--num-nodes "5" --network "default" --enable-cloud-logging --no-enable-cloud-monitoring;

See cluster details:

kubectl cluster-info
kubectl config view

Set up persistent volume in Kubernetes. First, modify nfs/nfs-pv.yaml to have the right NFS VM name. Then execute these commands:

gcloud container clusters get-credentials some-cluster
kubectl create -f nfs/nfs-pv.yaml # persistent volume
kubectl create -f nfs/nfs-pvc.yaml # persistent volume claim
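I haven't reproduced the repo's YAML here, but an NFS persistent volume and claim of the required shape look roughly like the following; the server field is the part that must name your NFS VM, and the capacity and metadata names are illustrative:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 1Ti # illustrative; size to your data
  accessModes:
    - ReadWriteMany
  nfs:
    server: mz-nfs-vm # the NFS VM name you must edit
    path: /mz-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti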

First set of containers/jobs: downloading data

Build, test, and publish the image to Google Container Registry (Docker must be installed):

cd get_data
./build.sh
./test.sh
./publish_image.sh

(Note, I ran the above from a GCE VM, but you can do it locally as well.)
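I haven't inlined the scripts, but build.sh and publish_image.sh presumably amount to something like this (the image name is hypothetical; pici-1286 is the project used above):

docker build -t gcr.io/pici-1286/download-fastq . # hypothetical image name
gcloud docker -- push gcr.io/pici-1286/download-fastq # pushes to Google Container Registry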

The image is based on the containers from "A cloud-based workflow to quantify transcript-expression levels in public cancer compendia", except with custom shell scripts dropped in. The original containers are in original_containers_from_paper.

Create Kubernetes jobs from a YAML template, and launch them:

rm jobs/*
python make_jobs.py # creates files in jobs/ from template.yaml and ../list_of_data.txt
kubectl create -f ./jobs
kubectl get jobs | wc -l # should be 127 with header; subtract one = 126
wc -l ../list_of_data.txt # should be 126
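For orientation, here is the kind of Job that template.yaml plausibly expands to, one per accession in ../list_of_data.txt, with the NFS claim mounted so output lands on the shared volume. The image name and args are assumptions, not the repo's actual template:

apiVersion: batch/v1
kind: Job
metadata:
  name: download-err431606-1
spec:
  template:
    metadata:
      name: download-err431606-1
    spec:
      containers:
      - name: download
        image: gcr.io/pici-1286/download-fastq # hypothetical image name
        args: ["ERR431606_1"] # accession from list_of_data.txt
        volumeMounts:
        - name: nfs
          mountPath: /mz-data
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs
      restartPolicy: Never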

Monitor:

kubectl get jobs
kubectl get pods
kubectl describe jobs/download-err431606-1

When done, clean up: kubectl delete jobs --all.

Note that logs get garbage collected quickly (a known issue):

kubectl get pods -a
...
download-err431623-2-52tlr   0/1       Completed   0          1h
download-err431623-2-rds71   0/1       Error       0          2h
...
kubectl logs download-err431623-2-52tlr --previous
Error from server: previous terminated container "download-err431623-2" in pod "download-err431623-2-52tlr" not found
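One workaround (my suggestion, not part of the original runbook) is to snapshot every pod's logs before the garbage collector gets to them:

mkdir -p pod-logs
for pod in $(kubectl get pods -a -o name | sed 's|.*/||'); do
  kubectl logs "$pod" > "pod-logs/$pod.log" 2>&1 || true # || true: some pods have no retrievable logs
done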

Second set of containers/jobs: process

Same procedure:

cd process/
# run these on a GCE VM
./build.sh
./run.sh # may need to change some of the paths in here first to a tmp directory; all test files are available in gs://mz-hammerlab/data and gs://mz-hammerlab/index
./publish_image.sh

# run these from your local machine
rm jobs/*
python make_jobs.py
kubectl create -f ./jobs
kubectl get jobs | grep 'process' | wc -l # should be 63 = 126/2 -- one job per pair of paired-end reads
wc -l ../list_of_data.txt # 126

Gotchas

I experienced a weird NFS bug where I could no longer write to the NFS, and existing files appeared to be owned by usernames belonging to Googlers. See the GCE bug for details. I worked around it by ssh-ing into the NFS VM and enabling global write permissions: sudo chmod -R 777 /mz-data/*.

Then I ran out of space on that first NFS VM. I tried to extend the ZFS volume (following http://alittlestupid.com/2010/10/24/how-to-grow-a-zfs-volume/), but failed because the volume was always busy, so I just created a new NFS VM. Note that nfs-pv.yaml must be updated with the new NFS server's name; see last_few_jobs for the updated YAML. You must run kubectl delete pv,pvc --all, rerun the create commands for nfs-pv.yaml and nfs-pvc.yaml, and download the Kallisto index again.
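Putting those recovery steps together (the same commands as earlier, pointed at the new VM):

# after editing the server name in nfs-pv.yaml (see last_few_jobs):
kubectl delete pv,pvc --all
kubectl create -f nfs/nfs-pv.yaml
kubectl create -f nfs/nfs-pvc.yaml
# and on the new NFS VM, fetch the index again:
gsutil cp gs://mz-hammerlab/index/Homo_sapiens.GRCh38.cdna.all.kallisto.idx /mz-data/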

An easy way to test if NFS is working properly, by the way, is this:

kubectl create -f nfs/nfsFullTest.yaml
kubectl create -f nfs/nfsFullTest2.yaml
kubectl get jobs # check to see if they succeeded
kubectl delete jobs --all
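I haven't inlined those test files, but a job of that flavor is just a minimal Job (like the download sketch earlier) whose container writes and reads a file on the mounted claim; the contents below are assumed, not the repo's exact YAML:

apiVersion: batch/v1
kind: Job
metadata:
  name: nfs-full-test
spec:
  template:
    metadata:
      name: nfs-full-test
    spec:
      containers:
      - name: test
        image: busybox
        command: ["sh", "-c", "date > /mz-data/nfs-test && cat /mz-data/nfs-test"]
        volumeMounts:
        - name: nfs
          mountPath: /mz-data
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs
      restartPolicy: Never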

Finally, note that Kubernetes does not respect restartPolicy: Never for Jobs. I filed a feature request for being able to specify a maximum number of retries, because failed jobs get rescheduled on new pods indefinitely, and destroying all those pods takes forever (plus API requests are rate-limited, and the latest kubectl client hides throttling messages from you!). Using restartPolicy: OnFailure may help: it should keep restarting the broken container in the same pod instead of creating new pods, as sketched below.
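Concretely, that is a one-line change in the pod template of template.yaml:

restartPolicy: OnFailure # restart the failed container in the same pod instead of creating new pods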

Tear down

When done, spin down the cluster in the control panel. I pushed all the output and log tarballs from the two NFS servers to gs://mz-hammerlab, then tore those down as well.
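Roughly (cluster and zone as used above; the tarball paths are whatever you produced):

# on each NFS VM: push outputs and logs to GCS (paths illustrative)
gsutil -m cp /mz-data/*.tar.gz gs://mz-hammerlab/
# then delete the cluster and the NFS VMs
gcloud container clusters delete some-cluster --zone us-east1-b
gcloud compute instances delete mz-nfs-vm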


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.