hammerlab/stancache

Name: stancache

Owner: Hammer Lab

Description: Filecache for stan models

Created: 2016-10-31 12:38:20.0

Updated: 2017-06-20 12:50:18.0

Pushed: 2017-04-26 00:26:32.0

Homepage: null

Size: 63

Language: Python

README

stancache

author: Jacqueline Buros Novik

Overview

Filecache for stan models

Installation

You can install this package from PyPI using pip:

$ pip install stancache

Or clone the repo & run setup.py:

$ git clone https://github.com/hammerlab/stancache.git
$ cd stancache
$ python setup.py install

Introduction

This is a filecache for pystan models fit to data. Each pystan model fit to data consists of two parts: the compiled model code & the result of MCMC sampling of that model given data. Both model compilation & model sampling can be time-consuming operations, so both are cached as separate pickled objects on the filesystem.

This separation allows one to (for example) compile a model once & execute the model several times, caching the result each time. You might be testing the model on different samples of data, using different initializations, or passing in different parameters.

Loading pickled pystan fit objects into memory is also safer using cached_stan_fit(), since this ensures that the compiled model is unpickled before the fitted model that depends on it.
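The two-stage idea can be illustrated without pystan. The following is a minimal sketch, not stancache's actual implementation: cached, compile_model and sample are hypothetical stand-ins, and the cache key is assumed to be a hash of the function name and its JSON-serialized arguments.

```python
import hashlib
import json
import os
import pickle
import tempfile

def cached(func, cache_dir, **kwargs):
    """Return func(**kwargs), reusing a pickled result from cache_dir if present.

    Hypothetical helper: arguments must be JSON-serializable in this sketch.
    """
    key = json.dumps([func.__name__, kwargs], sort_keys=True)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, f"{func.__name__}.{digest}.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = func(**kwargs)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

calls = {"compile": 0, "sample": 0}  # counts how often the slow steps really run

def compile_model(model_code):
    # Stand-in for the slow compilation step.
    calls["compile"] += 1
    return {"compiled": model_code}

def sample(model, seed):
    # Stand-in for the slow MCMC sampling step.
    calls["sample"] += 1
    return {"model": model["compiled"], "seed": seed}

cache_dir = tempfile.mkdtemp()
for seed in (1, 2, 1):
    model = cached(compile_model, cache_dir, model_code="parameters { real mu; }")
    fit = cached(sample, cache_dir, model=model, seed=seed)
```

Here the stand-in model is compiled once and sampled twice, because the third iteration repeats seed 1 and is served from the cache.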

Getting started
Configuration

The configuration uses Python's configparser module, allowing the user to either load a config.ini file from disk or set the configuration in code.

By default, stancache looks for a config file at '~/.stancache.ini'. You can load a different file using stancache.config.load_config('/another/config/file.ini').
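As a rough illustration of how a configparser-based setup like this can combine defaults with a file on disk, here is a sketch using only the standard library. The section name "main", the default value for CACHE_DIR and the read_settings helper are assumptions made for the example; they are not taken from stancache's source.

```python
import configparser
import os
import tempfile

# Hypothetical default and section name, used for illustration only.
DEFAULTS = {"CACHE_DIR": ".cached_models"}
SECTION = "main"

def read_settings(path):
    """Start from DEFAULTS, then overlay whatever the ini file at `path` defines."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as CACHE_DIR upper-case
    parser.read_dict({SECTION: DEFAULTS})
    parser.read(os.path.expanduser(path))  # a missing file is skipped silently
    return dict(parser[SECTION])

# Write a small ini file to a temporary directory and read it back.
config_dir = tempfile.mkdtemp()
ini_path = os.path.join(config_dir, "stancache.ini")
with open(ini_path, "w") as f:
    f.write("[main]\nCACHE_DIR = /mnt/trial-analyses/cohort1/stancache\n")

from_file = read_settings(ini_path)
from_defaults = read_settings(os.path.join(config_dir, "missing.ini"))
```

The file value overrides the default when the file exists; otherwise the default is used unchanged.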

Currently, the config settings include

You can use config.set_value(NAME=value) to modify a setting.

For example, you might want to set up a shared NFS mount containing fitted models among your collaborators:

from stancache import config
config.set_value(CACHE_DIR='/mnt/trial-analyses/cohort1/stancache')

An updated list of configuration defaults is available in defaults.py.

Fitting cached models

Once you have configured your settings, you would then use stancache.cached_stan_fit to fit your model, like so:

from stancache import stancache
fit1 = stancache.cached_stan_fit(file='/path/to/model.stan', data=dict(), chains=4, iter=100)

The options to cached_stan_fit are the same as those to pystan.stan (see pystan.stan documentation).

Also see ?stancache.cached_stan_fit (in IPython or Jupyter) or help(stancache.cached_stan_fit) for more details.

Caching other items

The caching is very sensitive to certain things which would change the returned object, such as the sort order of your data elements within the dictionary. It is not sensitive to other things, such as whether you provide the stan code as a file or as a string containing the same code.
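To see why key order can matter, consider a cache key derived from the pickled bytes of the data dictionary. This is an assumption about the general approach, not a description of stancache's exact hashing; naive_key and canonical_key are hypothetical helpers. Two dictionaries that compare equal can still serialize differently, and sorting the keys before hashing is one way to make the key order-independent.

```python
import hashlib
import json
import pickle

def naive_key(data):
    # Hash the pickled bytes; dict insertion order leaks into the result.
    return hashlib.md5(pickle.dumps(data)).hexdigest()

def canonical_key(data):
    # Serialize with sorted keys first, so insertion order no longer matters.
    text = json.dumps(data, sort_keys=True)
    return hashlib.md5(text.encode("utf-8")).hexdigest()

a = {"J": 2, "y": [28, 8], "sigma": [15, 10]}
b = {"sigma": [15, 10], "y": [28, 8], "J": 2}

same_content = (a == b)                                # True
same_naive = (naive_key(a) == naive_key(b))            # False
same_canonical = (canonical_key(a) == canonical_key(b))  # True
```

If you build data dictionaries in several places, constructing them with a consistent key order avoids unexpected cache misses.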

In practice, we find that it can be helpful to cache data-preparation steps, especially when simulating data. There is thus a stancache.cached() wrapper function for this purpose. This will cache objects other than pystan.stan fits to disk, using the same file-cache settings as are used for the rest of stancache.

Avoiding re-executing a model

There are a number of scenarios where you might want to use a cache of fitted models in read-only mode. You can avoid accidentally re-fitting the model by setting cache_only=True.

For example, you may have fit a set of models which you want to read into a jupyter notebook for model exploration. Or, you may be reviewing a colleague's fitted model objects. Note that this is not foolproof, so please back up your work.
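The read-only behaviour can be pictured with a small sketch. This is not stancache's code: load_or_compute is a hypothetical helper, and raising ValueError on a cache miss is an assumption made for the example.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute, cache_only=False):
    """Load a pickled result from `path`, or compute and store it.

    With cache_only=True a cache miss raises instead of running `compute`.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    if cache_only:
        raise ValueError(f"No cached result at {path} and cache_only=True")
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

path = os.path.join(tempfile.mkdtemp(), "fit.pkl")

# Nothing is cached yet, so read-only mode refuses to compute.
try:
    load_or_compute(path, lambda: "expensive fit", cache_only=True)
    refit_attempted = True
except ValueError:
    refit_attempted = False

first = load_or_compute(path, lambda: "expensive fit")  # computes and caches
second = load_or_compute(path, lambda: "should not run", cache_only=True)  # cache hit
```

In read-only mode a missing cache entry surfaces as an error rather than silently triggering a long refit.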

Contributing

TBD

Examples

For example (borrowing from pystan's docs):

from stancache import stancache

schools_code = """
data {
    int<lower=0> J; // number of schools
    real y[J]; // estimated treatment effects
    real<lower=0> sigma[J]; // s.e. of effect estimates
}
parameters {
    real mu;
    real<lower=0> tau;
    real eta[J];
}
transformed parameters {
    real theta[J];
    for (j in 1:J)
        theta[j] <- mu + tau * eta[j];
}
model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}
"""

schools_dat = {'J': 8,
               'y': [28,  8, -3,  7, -1,  1, 18, 12],
               'sigma': [15, 10, 16, 11,  9, 11, 10, 18]}

# fit model to data
fit = stancache.cached_stan_fit(model_code=schools_code, data=schools_dat,
                                iter=1000, chains=4)

# load fit model from cache
fit2 = stancache.cached_stan_fit(model_code=schools_code, data=schools_dat,
                                 iter=1000, chains=4)

In addition, there are a number of publicly accessible ipynbs using stancache.

These include:

If you know of other examples, please let us know and we will add them to this list.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.