Name: infiltrate-rnaseq-pipeline
Owner: Hammer Lab
Description: null
Created: 2016-07-26 17:57:06.0
Updated: 2016-12-07 19:17:54.0
Pushed: 2017-11-10 23:03:00.0
Homepage: null
Size: 7606
Language: Perl
Launches Kubernetes jobs to download and process FASTQ files. This is used by hammerlab/immune-infiltrate-explorations.
Launch a single-node NFS server using the singlefs click-to-deploy solution. I called it mz-nfs-vm and went with 8 vCPU, 40 GB RAM; the data lives under /mz-data (the export path is shown in the deployment details). Here's how to monitor it (this forwards the monitoring UI to localhost:3000):

gcloud compute ssh --ssh-flag=-L3000:localhost:3000 --project=pici-1286 --zone us-east1-b mz-nfs-vm
Here's how to mount the NFS into a GCE VM:
sudo apt-get install nfs-common
sudo mkdir /mnt/mz-data
sudo chmod a+w /mnt/mz-data
echo 'mz-nfs-vm:/mz-data /mnt/mz-data nfs rw 0 0' | sudo tee -a /etc/fstab
sudo mount -t nfs mz-nfs-vm:/mz-data /mnt/mz-data
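Before moving on, it's worth confirming that the client can actually write through the mount (foreshadowing the permissions bug described later). Here's a small helper for that; it is a hypothetical convenience, not a script from this repo:

```shell
# Hypothetical helper (not in the repo): probe a directory for write access
# by creating and removing a throwaway file.
check_writable() {
  local dir="$1"
  local probe
  probe="$dir/.nfs-write-probe.$$"   # unique-ish temp file name
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "writable"
  else
    echo "not writable"
  fi
}
```

For example, `check_writable /mnt/mz-data` should print `writable` once the mount above is in place.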
Next, we must download the proper Kallisto index into the NFS. I executed this line from mz-nfs-vm:

gsutil cp gs://mz-hammerlab/index/Homo_sapiens.GRCh38.cdna.all.kallisto.idx /mz-data/
Create a cluster in Kubernetes. Below I suppose it's called some-cluster. You can create it from the Cloud Console (web UI) or from the command line like this:
gcloud container --project "pici-1286" clusters create "some-cluster" \
 --zone "us-east1-b" --machine-type "n1-highmem-4" \
 --scopes "https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/taskqueue","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management" \
--num-nodes "5" --network "default" --enable-cloud-logging --no-enable-cloud-monitoring;
See cluster details:
kubectl cluster-info
kubectl config view
Set up persistent volume in Kubernetes. First, modify nfs/nfs-pv.yaml
to have the right NFS VM name. Then execute these commands:
gcloud container clusters get-credentials some-cluster
kubectl create -f nfs/nfs-pv.yaml # persistent volume
kubectl create -f nfs/nfs-pvc.yaml # persistent volume claim
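For reference, here is a minimal sketch of what the two manifests might contain. The real files live in nfs/ in this repo, so the resource names, capacity, and paths below are assumptions, not the actual contents:

```yaml
# Sketch only -- see nfs/nfs-pv.yaml and nfs/nfs-pvc.yaml for the real manifests.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mz-nfs-pv          # assumed name
spec:
  capacity:
    storage: 1000Gi        # assumed size
  accessModes:
    - ReadWriteMany
  nfs:
    server: mz-nfs-vm      # <-- the field to change when the NFS VM changes
    path: /mz-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mz-nfs-pvc         # assumed name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1000Gi
```

The `nfs.server` field is why the YAML must be edited (and the PV/PVC recreated) whenever a new NFS VM is brought up, as described later.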
Build, test, and publish image to Google Container Registry (Docker must be installed):
cd get_data
./build.sh
./test.sh
./publish_image.sh
(Note, I ran the above from a GCE VM, but you can do it locally as well.)
The image is based on the containers from "A cloud-based workflow to quantify transcript-expression levels in public cancer compendia", except with custom shell scripts dropped in. The original containers are in original_containers_from_paper.
Create Kubernetes jobs from a YAML template, and launch them:
rm jobs/*
python make_jobs.py # creates files in jobs/ from template.yaml and ../list_of_data.txt
kubectl create -f ./jobs
kubectl get jobs | wc -l # should be 127 with the header; subtract one = 126
wc -l ../list_of_data.txt # should be 126
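make_jobs.py is essentially a template-stamping script: one Kubernetes Job manifest per line of list_of_data.txt. A minimal sketch of the idea follows; the {{SAMPLE}} placeholder, the file naming, and the function signature are assumptions for illustration (the real script is in this repo):

```python
# Sketch of the template-stamping idea behind make_jobs.py (not the real script).
from pathlib import Path


def make_jobs(template: str, samples, out_dir):
    """Write one Kubernetes Job manifest per sample into out_dir.

    template: manifest text with a {{SAMPLE}} placeholder (assumed convention).
    samples:  iterable of sample IDs, e.g. lines from list_of_data.txt.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for raw in samples:
        sample = raw.strip()
        if not sample:
            continue  # skip blank lines in list_of_data.txt
        manifest = template.replace("{{SAMPLE}}", sample.lower())
        path = out_dir / f"download-{sample.lower()}.yaml"
        path.write_text(manifest)
        written.append(path)
    return written
```

The real template.yaml presumably also wires in image names and GCS paths; this only shows the fan-out from the sample list to one manifest per job, which is why the job count should match the line count of list_of_data.txt.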
Monitor:
kubectl get jobs
kubectl get pods
kubectl describe jobs/download-err431606-1
When done, clean up: kubectl delete jobs --all.
Note that logs get garbage collected quickly (a known issue):
kubectl get pods -a

download-err431623-2-52tlr 0/1 Completed 0 1h
download-err431623-2-rds71 0/1 Error 0 2h

kubectl logs download-err431623-2-52tlr --previous
Error from server: previous terminated container "download-err431623-2" in pod "download-err431623-2-52tlr" not found
The processing step follows the same procedure:

cd process/
# Run these on a GCE VM
./build.sh
./run.sh # may need to change some of the paths in here first to a tmp directory; all test files are available in gs://mz-hammerlab/data and gs://mz-hammerlab/index
./publish_image.sh
# Run these from your local machine
rm jobs/*
python make_jobs.py
kubectl create -f ./jobs
kubectl get jobs | grep 'process' | wc -l # should be 63 = 126/2 -- one job per pair of paired-end reads
wc -l ../list_of_data.txt # 126
I experienced a weird NFS bug where I could no longer write to the NFS, and existing files appeared to be owned by usernames that belonged to Googlers. See gce bug for the details. I worked around this by ssh-ing into the NFS VM and enabling global write permissions: sudo chmod -R 777 /mz-data/*.
Then I ran out of space on that first NFS VM. I tried to extend the ZFS volume (following http://alittlestupid.com/2010/10/24/how-to-grow-a-zfs-volume/), but failed because the volume was always busy. So I just created a new NFS VM. Note that nfs-pv.yaml must be updated with the new NFS server's name; see last_few_jobs for the updated YAML. You must run kubectl delete pv,pvc --all and then rerun the create commands for nfs-pv.yaml and nfs-pvc.yaml. You also must download the Kallisto index again.
An easy way to test whether NFS is working properly, by the way, is this:

kubectl create -f nfs/nfsFullTest.yaml
kubectl create -f nfs/nfsFullTest2.yaml
kubectl get jobs # check to see if they succeeded
kubectl delete jobs --all
Finally, note that Kubernetes does not respect restartPolicy: Never for Jobs. I filed a feature request for being able to specify a maximum number of retries, because failed jobs get rescheduled on new pods indefinitely, and destroying all those pods takes forever (plus API requests are rate limited, and the latest kubectl client hides throttling messages from you!). Note that using the OnFailure restart policy may help: it may keep rescheduling broken jobs in the same pods instead of creating new pods.
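For context, the relevant stanza of a Job manifest looks something like the sketch below (the names and image are illustrative placeholders, not this repo's actual template). Newer Kubernetes releases later added spec.backoffLimit, which caps retries in roughly the way the feature request asked for:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: download-err431606-1   # illustrative name
spec:
  backoffLimit: 4              # added in later Kubernetes releases; caps retries
  template:
    spec:
      restartPolicy: OnFailure # retries inside the same pod instead of spawning new pods
      containers:
        - name: worker         # illustrative
          image: IMAGE         # illustrative placeholder
```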
When done, spin down the cluster in the control panel. I pushed all the output and log tarballs from the two NFS servers to gs://mz-hammerlab, then tore them down as well.