NaturalHistoryMuseum/raster-project

Name: raster-project

Owner: Natural History Museum

Description: Code for projecting R models using python

Forked from: ricardog/raster-project

Created: 2017-10-11 15:49:18.0

Updated: 2018-05-10 14:33:10.0

Pushed: 2018-05-10 14:33:08.0

Homepage: null

Size: 1391

Language: Python


README

Overview

This directory provides a fast implementation of the PREDICTS projection code. There are three main components:

  1. r2py
  2. rasterset
  3. predicts-specific code

All three components live in the projections directory and are installed as a single Python module (it's easier to install just one Python module for now).
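To give a feel for the rasterset idea, here is a toy sketch of the concept: named layers over aligned grids, with derived layers computed lazily from other layers. This is an illustration only, not the repo's actual API; the class and layer names are invented.

```python
import numpy as np

class MiniRasterSet:
    """Toy illustration of a rasterset: named layers plus derived
    layers that are computed on demand from other layers."""

    def __init__(self):
        self.layers = {}  # name -> ndarray, or callable(rasterset) -> ndarray

    def __setitem__(self, name, value):
        self.layers[name] = value

    def eval(self, name):
        value = self.layers[name]
        return value(self) if callable(value) else value

# Two small "rasters" on the same 2x2 grid (fractions of each cell).
rs = MiniRasterSet()
rs["cropland"] = np.array([[0.2, 0.5], [0.1, 0.0]])
rs["pasture"] = np.array([[0.3, 0.1], [0.4, 0.9]])
# A derived layer: evaluated lazily, cell by cell, when requested.
rs["human_dominated"] = lambda r: r.eval("cropland") + r.eval("pasture")

print(rs.eval("human_dominated"))  # elementwise sum of the two layers
```

The real code works the same way in spirit: you declare how output layers depend on input rasters, and the evaluation happens over the aligned grids.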

The directory also contains a number of driver scripts (luh2-test.py, luh5-test.py), utility scripts, and throw-away scripts I used while writing my master's dissertation. Over time I will try to clean this up and leave only code related to projections.

Data acquisition, cleanup, and normalization

The code requires the following datasets:

All map/grid-based data needs to be cleaned up and normalized, that is, made to share the same projection, dimensions, and cell size.
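A minimal sketch of what "same projection, dimensions, and cell size" amounts to, using rasterio-style metadata keys. The helper and the sample values below are illustrative, not code from this repo:

```python
def is_aligned(meta_a, meta_b):
    """True if two rasters share projection, dimensions, and cell size."""
    keys = ("crs", "width", "height", "transform")
    return all(meta_a.get(k) == meta_b.get(k) for k in keys)

# A 0.5-degree global grid in WGS84 (affine transform as a plain tuple:
# pixel width, row rotation, x origin, column rotation, pixel height, y origin).
half_deg = {"crs": "EPSG:4326", "width": 720, "height": 360,
            "transform": (0.5, 0.0, -180.0, 0.0, -0.5, 90.0)}
quarter_deg = {"crs": "EPSG:4326", "width": 1440, "height": 720,
               "transform": (0.25, 0.0, -180.0, 0.0, -0.25, 90.0)}

print(is_aligned(half_deg, dict(half_deg)))  # True: identical grids
print(is_aligned(half_deg, quarter_deg))     # False: cell sizes differ
```

Any raster failing a check like this needs to be reprojected or resampled before it can be combined with the others.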

How to

Before you begin

You need to get a copy of the source datasets and the scaled datasets for the resolution you want. For example, if you want to generate 0.5° rasters, you will need to take the TM-WORLDBORDERS shape file and rasterize it twice: once to generate a map of UN country codes and once to generate a map of UN subregion codes.
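As a quick sanity check on the sizes involved (not code from this repo), the shape of a global lat/lon grid follows directly from the cell size:

```python
def grid_shape(cell_size_deg):
    """Rows and columns of a global latitude/longitude grid."""
    cols = int(round(360.0 / cell_size_deg))  # longitude spans 360 degrees
    rows = int(round(180.0 / cell_size_deg))  # latitude spans 180 degrees
    return rows, cols

print(grid_shape(0.5))  # a 0.5-degree global raster is 360 rows x 720 columns
```

All rasters derived for a given resolution should come out with this same shape, which is exactly what the normalization step above enforces.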

Scripts

There are a number of scripts that can be used to generate projections using PREDICTS models.

These scripts are meant as starting points from which you should develop your own code. They have hard-coded assumptions about where to find input rasters and models, and where to save output rasters. They expect source data to be under $DATA_ROOT and generated data under $OUTDIR/<name>/...

You will need to have a number of input data rasters, e.g. UN sub-regions, reference human population density.
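The directory convention the scripts assume can be sketched as follows. The specific file names here are invented for illustration; only the $DATA_ROOT / $OUTDIR convention comes from the scripts themselves:

```python
import os
from pathlib import Path

# The scripts expect these two environment variables; /data and /ds are
# placeholders for wherever the source and generated data live.
os.environ.setdefault("DATA_ROOT", "/data")
os.environ.setdefault("OUTDIR", "/ds")

data_root = Path(os.environ["DATA_ROOT"])
outdir = Path(os.environ["OUTDIR"])

# Hypothetical example: a LUH2 source file and a derived output raster
# under $OUTDIR/<name>/ (both file names invented for illustration).
src = data_root / "luh2" / "states.nc"
dst = outdir / "luh2" / "un-subregions.tif"
print(src)
print(dst)
```

When adapting the scripts, the main thing to change is where these two roots point and which rasters are read from and written under them.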

The script setup.sh will attempt to generate all the derived data for all land-use data sources (luh2, luh5, rcp, 1km), but will likely not work under Windows. It does, at least, contain the recipes required to generate the data.

Code structure

Installation

Ubuntu
pip install -e .

Run that in the repository directory and it should take care of things. But I've found it often fails. On Linux, the following seems to do the trick:

 apt-get -y install software-properties-common
 apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
 add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/'
 apt-get update
 apt-get install virtualenv libgdal-dev libnetcdf-dev libnetcdf11 libproj-dev python-dev libgdal1-dev gdal-bin python-pip python-gdal libudunits2-dev libcairo2-dev libxt-dev mosh r-base
 virtualenv venv
 source venv/bin/activate
 pip install numpy
 CPLUS_INCLUDE_PATH=/usr/include/gdal C_INCLUDE_PATH=/usr/include/gdal pip install GDAL==1.11.2 --no-binary GDAL
 pip install -e .

This will install all the required libraries in a virtual environment (so you can keep Python packages for different projects separate).

You now need to set two environment variables that point to the root of the data folder and the output folder. Place the following two lines in .bash_profile, replacing /data and /ds with the location of the data and ds folders on your computer.

export DATA_ROOT=/data
export OUTDIR=/ds
Windows (or Mac) with Anaconda

The easiest way to install the code on Windows is to use Anaconda (or miniconda) and git. If they are not already installed on your system, follow the instructions in the Download page.

To follow these instructions, on Windows open an Anaconda prompt (it should be in the Start menu if Anaconda was installed properly). On macOS open a terminal. Type the commands below into the window you just opened.

Once you have conda installed, first define which channels to use. Make sure the channels are listed with the following priority:

  1. conda-forge
  2. r
  3. defaults

I've run into problems when conda decides to mix packages from different channels. I solved this problem by making conda-forge the highest priority channel since it has the largest selection of packages—hence better chance of solving the dependency quagmire. Adding a new channel will make it the highest priority, so add them in reverse order.

conda config --add channels defaults
conda config --add channels r
conda config --add channels conda-forge

The next step is to clone the repo using git. Unfortunately, because of the way things work on Windows, you may need to switch back and forth between the git window and the conda window. Use the git window (shell) to clone the repo (on macOS use the same terminal). If you are using [GitHub Desktop](https://desktop.github.com), clone the repo and then cd to the folder of the repo.

git clone https://github.com/NaturalHistoryMuseum/raster-project

Go back to the conda window and create and activate a new environment for your projections. Use cd to go to the repo you just checked out.

conda env create -n gis python=3.6 --file environment.yml
activate gis # For Windows

source activate gis # For macOS

The last step is to use pip to install the projections package.

pip install -e .

The -e flag tells pip to make the package editable. If you edit the code the changes will get picked up automatically. You only need to re-run this step if you add a new entry point.

You now need to set two variables that point to the root of the data folder and the output folder. Replace /data and /ds with the location of the data and ds folders on your computer.

set DATA_ROOT=/data # For Windows; use export on macOS
set OUTDIR=/ds

You will need to do this once in every window you use; if you close a window and open a new one, you will need to run these commands again, unless you put them in a startup file.

That's it. Don't forget: run projections from the conda window, but use git from the git window.

Docker

This is an alternative installation method. If you can't install using conda, try using Docker.

This repo contains a Dockerfile which you can use to build a docker image for generating projections. The image is built using jupyter/datascience-notebook as a base and therefore has the jupyter notebook server installed with support for python3, R, and Julia. In addition it contains many packages useful for fitting models so you can do both model fitting and projections in the same environment (but you don't have to).

The advantage of using docker is that anyone should be able to download (pull in docker-lingo) the image and get started using the code right away. All the packages are already installed and ready to go so you don't need to install anything.

Using Docker

Use docker pull to download the image on a different computer. Once the image is ready use docker run to run it

docker run --rm -it -v /path/to/data/folder:/data \
  -v /path/to/output/folder:/out -e GRANT_SUDO=yes \
  -p 8888:8888 --user root ricardog/project-notebook

It will print a URL you can use to access the notebook server. From there you can run or create notebooks and access the console. Notebooks can be in python, R, or Julia.

When using Docker Toolbox (instead of Docker for Windows or Docker for Mac) you will need to increase the resources of the default VM. See this page for instructions (see the section titled “Change default vm settings”). You will need to choose how much memory (RAM) and CPU to allocate. This depends on how “big” the computer is and how big the projections you want to run are. You don't need to change the size of the disk since the projections place output files in a shared folder (/out).

If you use the existing scripts to generate projections, they will write output files to /out, which is a shared folder between the container and the host computer. You specify which folder to use when running the container (see the -v options above).

When you are done, simply press Ctrl-C in the window where the container started and it will be destroyed and removed.

Build

Only follow these steps to build a new version of the image. See the steps above if you want to generate projections.

Assuming you have Docker Toolbox or Docker installed, run:

docker build . -t ricardog/project-notebook

If the build succeeds it is a good idea to verify that all the packages installed correctly (or at least verify some troublesome ones). In particular, I've had problems where the conda dependency calculation decides to install libgdal 2.2.* even though both fiona and rasterio are pinned to libgdal 2.1.*. The way I worked around this for now was to add an explicit dependency on libgdal 2.1.*, which seems to make conda do the “right thing”.
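Concretely, the workaround described above would look something like this in the image's conda dependency spec (an illustrative excerpt; the actual file and pinned versions may differ):

```yaml
dependencies:
  - libgdal=2.1.*   # explicit pin so conda doesn't pull in libgdal 2.2.*
  - fiona           # built against libgdal 2.1.* upstream
  - rasterio        # likewise pinned to libgdal 2.1.*
```

Pinning the shared C library directly, rather than relying on the Python packages' own pins, forces conda's solver to keep everything on the same libgdal series.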

If you are satisfied with the container, push it to docker hub with

docker push ricardog/project-notebook

See the linked instructions for more details. I am not certain whether anyone other than me (ricardog) can push to that image, but we will find out when someone tries.

Model building

I wrote these notes early on as I started looking through Tim's code. They are not relevant to using the code.
