DataBiosphere/leonardo

Name: leonardo

Owner: Data Biosphere

Description: Notebook service

Created: 2017-07-11 22:29:16.0

Updated: 2018-01-05 02:47:00.0

Pushed: 2018-02-12 21:40:24.0

Homepage: null

Size: 1572

Language: Scala

README

Leonardo

Leo provisions Spark clusters through Google Dataproc and installs Jupyter notebooks and Hail on them. It can also proxy end-user connections to the Jupyter interface in order to provide authorization for particular users.

For more information and an overview, see the wiki.

Swagger API documentation: https://notebooks.firecloud.org/

Project status

This project is under active development. It is not yet ready for independent production deployment. See the roadmap section of the wiki for details.

Configurability

Documentation on how to configure Leo is coming soon. Until then, a brief overview: there are two points at which Leonardo is pluggable.

Authorization provider

Leo provides two modes of authorization out of the box:

  1. By whitelist
  2. Through Sam, the Workbench IAM service

Users wanting to roll their own authorization mechanism can do so by subclassing LeoAuthProvider and setting up the Leo configuration file appropriately.
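As a rough illustration of the whitelist mode and of what a custom provider might look like, here is a minimal Scala sketch. The trait AuthProviderSketch, the ClusterAction type, and the isAllowed method are hypothetical names invented for this example. The real LeoAuthProvider interface may differ, so check the source before subclassing it.

```scala
// Hypothetical sketch: these names and signatures are illustrative only and
// are NOT Leonardo's actual LeoAuthProvider API.

/** Actions a user might attempt on a notebook cluster (assumed for this example). */
sealed trait ClusterAction
object ClusterAction {
  case object Connect extends ClusterAction
  case object Delete extends ClusterAction
}

/** A pluggable authorization check. */
trait AuthProviderSketch {
  /** Returns true if the user may perform the action. */
  def isAllowed(userEmail: String, action: ClusterAction): Boolean
}

/** Whitelist mode: any listed user may perform any action. */
final class WhitelistAuthSketch(allowed: Set[String]) extends AuthProviderSketch {
  // Normalise case and whitespace so "Alice@Example.org" matches "alice@example.org".
  private val normalised: Set[String] = allowed.map(_.trim.toLowerCase)

  override def isAllowed(userEmail: String, action: ClusterAction): Boolean =
    normalised.contains(userEmail.trim.toLowerCase)
}
```

A provider backed by an external IAM service would implement the same trait but delegate isAllowed to a remote call instead of a local set lookup.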

Service account provider

There are (up to) three service accounts used in the process of spinning up a notebook cluster:

  1. The Leo service account itself, used to make the call to Google Dataproc
  2. The service account passed to dataproc clusters create via the --service-account parameter, whose credentials will be used to set up the instance and localized into the GCE metadata server
  3. The service account that will be localized into the user environment and returned when any application asks for application default credentials.

Currently, Leo uses its own SA for #1, and the same per-user project-specific SA for #2 and #3, which it fetches from Sam. Users wanting to roll their own service account provisioning mechanism can do so by subclassing ServiceAccountProvider and setting up the Leo configuration file appropriately.
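The three roles can be modelled as below. This is only a sketch under stated assumptions: ServiceAccountProviderSketch, ClusterServiceAccounts, PerUserAccountProviderSketch, and the lookup function are hypothetical and do not reproduce the real ServiceAccountProvider interface. Accounts #2 and #3 are typed as Option because the text says "up to" three accounts are involved.

```scala
// Hypothetical sketch; not Leonardo's actual ServiceAccountProvider API.

final case class ServiceAccountEmail(value: String)

/** The three roles described above. */
final case class ClusterServiceAccounts(
  leo: ServiceAccountEmail,              // #1: makes the call to Google Dataproc
  cluster: Option[ServiceAccountEmail],  // #2: passed via --service-account
  notebook: Option[ServiceAccountEmail]  // #3: application default credentials in the user environment
)

trait ServiceAccountProviderSketch {
  def leoServiceAccount: ServiceAccountEmail
  def clusterServiceAccount(userEmail: String, googleProject: String): Option[ServiceAccountEmail]
  def notebookServiceAccount(userEmail: String, googleProject: String): Option[ServiceAccountEmail]

  /** Collects all three accounts for one cluster request. */
  final def resolve(userEmail: String, googleProject: String): ClusterServiceAccounts =
    ClusterServiceAccounts(
      leoServiceAccount,
      clusterServiceAccount(userEmail, googleProject),
      notebookServiceAccount(userEmail, googleProject)
    )
}

/** Mirrors the current behaviour: one per-user, per-project account serves as both #2 and #3.
  * The lookup function stands in for a call to an external service such as Sam. */
final class PerUserAccountProviderSketch(
  val leoServiceAccount: ServiceAccountEmail,
  lookup: (String, String) => Option[ServiceAccountEmail]
) extends ServiceAccountProviderSketch {
  def clusterServiceAccount(userEmail: String, googleProject: String): Option[ServiceAccountEmail] =
    lookup(userEmail, googleProject)

  def notebookServiceAccount(userEmail: String, googleProject: String): Option[ServiceAccountEmail] =
    lookup(userEmail, googleProject)
}
```

Keeping the two per-cluster lookups as separate methods lets an alternative provider return different accounts for #2 and #3, or none at all.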

Building and running Leonardo

Clone the repo.

git clone https://github.com/broadinstitute/leonardo.git
cd leonardo

Ensure docker is running. Spin up MySQL locally:

./docker/run-mysql.sh start leonardo

Build Leonardo and run tests.

export SBT_OPTS="-Xmx2G -Xms1G -Dmysql.host=localhost -Dmysql.port=3311"
sbt clean compile test
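The -Dmysql.host and -Dmysql.port flags are ordinary JVM system properties, so code running inside the sbt JVM can read them through sys.props. How Leonardo itself consumes them is not shown in this README. The sketch below only illustrates the general mechanism. The object name MySqlSettingsSketch, the Endpoint type, and the fallback values ("localhost" and MySQL's default port 3306) are assumptions made for this example.

```scala
// Sketch only: names and fallback values are hypothetical, not taken from Leonardo.
object MySqlSettingsSketch {
  final case class Endpoint(host: String, port: Int)

  /** Builds an endpoint from a property map, falling back to assumed defaults. */
  def fromProps(props: scala.collection.Map[String, String]): Endpoint =
    Endpoint(
      host = props.getOrElse("mysql.host", "localhost"),
      port = props.get("mysql.port").map(_.toInt).getOrElse(3306)
    )

  /** Reads the live JVM system properties (what -D flags populate). */
  def fromSystem(): Endpoint = fromProps(sys.props)
}
```

Taking the property map as a parameter keeps the parsing logic testable without mutating global JVM state.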

Once you're done, tear down MySQL.

./docker/run-mysql.sh stop leonardo

Building Leonardo docker image

To install git-secrets

brew install git-secrets

To ensure git hooks are run

cp -r hooks/ .git/hooks/
chmod 755 .git/hooks/apply-git-secrets.sh

To build the jar, the leonardo docker image, and the leonardo-notebooks docker image

./docker/build.sh jar -d build

To build the jar, the leonardo docker image, and the leonardo-notebooks docker image, then push the images to the broadinstitute/leonardo and broadinstitute/leonardo-notebooks repos tagged with the git hash

./docker/build.sh jar -d push

To build the leonardo-notebooks docker image with a given tag

bash ./jupyter-docker/build.sh build <TAG NAME>

To push the leonardo-notebooks docker image you built to repo broadinstitute/leonardo-notebooks

bash ./jupyter-docker/build.sh push <TAG NAME>

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.