pydata/parallel

Name: parallel

Owner: Python for Data

Created: 2016-03-23 19:23:23.0

Updated: 2018-02-25 16:22:48.0

Pushed: 2016-07-11 04:27:21.0

Size: 442

Language: Jupyter Notebook

README

Parallel Python: Analyzing Large Datasets

Student Goals

Students will come away with a high-level understanding of parallel problems and of how to reason about parallel computing frameworks, along with hands-on experience using several frameworks that are easily accessible from Python.

Student Level

Knowledge of Python and general familiarity with the Jupyter notebook are assumed. The tutorial is aimed at a beginner-to-intermediate audience.

Outline

In the first half, we cover basic ideas and common patterns encountered when analyzing large datasets in parallel. We work through a sequence of examples that require increasingly complex tools: starting from the most basic parallel API, map, we move on to general asynchronous programming with futures, then to high-level APIs for large datasets such as Spark RDDs and Dask collections, and finally to streaming patterns.

In the second half, we focus on the traits of particular parallel frameworks, including strategies for picking the right tool for your job. We finish with common challenges in parallel analysis, such as debugging parallel code when it goes wrong, and deployment and setup strategies.
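The first two steps of the outline, a parallel map and explicit futures, can be sketched with Python 3's standard-library concurrent.futures module. This is a minimal illustration, not code from the tutorial's notebooks: word_count, parallel_map, and parallel_futures are hypothetical names, and the toy task is an assumption. ThreadPoolExecutor is the default so the sketch runs anywhere; for CPU-bound pure-Python work, ProcessPoolExecutor exposes the same map and submit interface and avoids the GIL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def word_count(text):
    """Hypothetical per-item task: count whitespace-separated words."""
    return len(text.split())


def parallel_map(texts, executor_cls=ThreadPoolExecutor, max_workers=4):
    # executor.map applies the function to every item and yields results
    # in input order, just like the builtin map
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(word_count, texts))


def parallel_futures(texts, executor_cls=ThreadPoolExecutor, max_workers=4):
    # submit schedules one task and immediately returns a Future;
    # as_completed yields each Future as soon as it finishes (in any order),
    # so results are stored by input index and reassembled at the end
    results = {}
    with executor_cls(max_workers=max_workers) as executor:
        futures = {executor.submit(word_count, t): i for i, t in enumerate(texts)}
        for future in as_completed(futures):
            # result() re-raises any exception the task raised
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(texts))]


if __name__ == "__main__":
    docs = ["a b c", "hello world", "", "one"]
    print(parallel_map(docs))      # [3, 2, 0, 1]
    print(parallel_futures(docs))  # [3, 2, 0, 1]
```

Passing executor_cls=ProcessPoolExecutor switches to processes; the task function must then be picklable (defined at module top level), and the `if __name__ == "__main__":` guard is required on platforms that start fresh interpreters for workers.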

Installation
  1. Install Anaconda

  2. Update select packages

    Everyone:

    conda install -c conda-forge ipyparallel ujson
    pip install snakeviz
    

    Python 2 users:

    conda install futures
    

    Linux/Mac users:

    conda install -c quasiben spark
    

Test your installation:

python -c "import concurrent.futures, ipyparallel, dask, jupyter, pyspark"
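
The one-line check stops at the first failing import. The following Python 3 sketch (the script and the check_imports helper are hypothetical, not part of the repository) tries each package from that command separately and reports every one that is missing.

```python
import importlib

# Packages taken from the one-line installation check
REQUIRED = ["concurrent.futures", "ipyparallel", "dask", "jupyter", "pyspark"]


def check_imports(modules):
    """Hypothetical helper: map each module name to None (ok) or an error message."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = None
        except ImportError as exc:  # includes ModuleNotFoundError
            status[name] = str(exc)
    return status


if __name__ == "__main__":
    for name, error in check_imports(REQUIRED).items():
        print(name.ljust(20), "ok" if error is None else "MISSING: " + error)
```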

Sponsored Cloud Provider

Google Compute Engine


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.