Name: datascience
Owner: Data Science 8
Description: A Python library for introductory data science
Created: 2015-07-17 18:17:39.0
Updated: 2017-12-24 21:26:02.0
Pushed: 2018-01-10 05:32:20.0
Homepage: null
Size: 16151
Language: Jupyter Notebook
A Berkeley library for introductory data science.
Written by Professor John DeNero, Professor David Culler, Sam Lau, and Alvin Wan.
For an example of usage, see the Berkeley Data 8 class.
Use pip:
pip install datascience
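To confirm that the install landed in the interpreter you are actually using, a quick check from Python can help. The following is a minimal sketch that relies only on the standard library; the helper name `is_installed` is hypothetical and is not part of datascience.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path.

    Hypothetical helper for illustration; not part of the datascience API.
    """
    # find_spec returns None when no module of that name is importable.
    return importlib.util.find_spec(package_name) is not None
```

Calling `is_installed("datascience")` should return `True` once the pip command above has succeeded in the same environment.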
This project adheres to Semantic Versioning.
sample_proportions function.
OrderedDict bug in Table.hist.
CurrencyFormatter to handle commas.
Table.hist to keep histograms in the order of the columns.
join so that it keeps all rows in the inner join of two tables.
group_barh and group_bar to plot counts by a grouping category, a common use case.
hist to produce a histogram for each group on a column.
pivot_hist. Added an option to hist to simulate pivot_hist's behavior.
apply, hist, and bin to accept multiple columns without a list
hist argument name counts in favor of bin_column
with_column and with_columns (not a breaking change)
group and groups (not a breaking change)
proportions_from_distribution method to datascience.util. (993e3d2)
Table.column now throws a descriptive ValueError instead of a KeyError when the column isn't in the table. (ef8b319)
Breaking changes
table.sample to with_replacement=True instead of False. (3717b67)
Additions
Map.copy.
Map.overlay which overlays a feature(s) on a new copy of Map. (315bb63e)
table.hist
containing and contained_in. (#231)
API reference is at http://data8.org/datascience/.
The required environment for installation and tests is the Anaconda Python 3 distribution.
If you encounter an Image not found error on Mac OS X, you may need an XQuartz upgrade.
Start by cloning this repository:
git clone https://github.com/data-8/datascience
Install the dependencies into a Conda environment with:
conda env create -f osx_environment.yml -n datascience
# For Linux, use
conda env create -f linux_environment.yml -n datascience
Source the environment to use the correct packages while developing:
source activate datascience
# `source deactivate` will unload the environment
The above command must be run each time you develop in the package. You can also install direnv to auto-load/unload the environment.
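Because it is easy to forget the activation step, a script can check for it before doing any work. The sketch below relies on the CONDA_DEFAULT_ENV variable that Conda sets when an environment is activated. The helper names `active_conda_env` and `require_env` are hypothetical and are not part of this repository.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active Conda environment, or None."""
    env = os.environ if environ is None else environ
    # Conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return env.get("CONDA_DEFAULT_ENV") or None


def require_env(expected="datascience", environ=None):
    """Raise RuntimeError unless the expected environment is active."""
    current = active_conda_env(environ)
    if current != expected:
        raise RuntimeError(
            "Expected Conda environment %r but found %r; "
            "run `source activate %s` first." % (expected, current, expected)
        )
    return current
```

Calling `require_env()` at the top of a development script would fail fast with a readable message instead of a confusing import error later on.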
Install datascience locally with:
make install
Then, run the tests:
make test
After that, go ahead and start hacking!
Documentation is generated from the docstrings in the methods and is pushed online at http://data8.org/datascience/ automatically. If you want to preview the docs locally, use these commands:
make docs # Generates docs inside doc/ folder
make serve_docs # Starts a local server to view docs
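The recipe behind make serve_docs is not shown here, so the following is only a sketch of one way to preview generated HTML with the Python standard library. It assumes the docs live in the doc/ folder mentioned above; the function name `make_docs_server` and the default port are hypothetical.

```python
import functools
import http.server


def make_docs_server(directory="doc", port=8000):
    """Build an HTTP server for a directory of static files.

    Hypothetical helper; the project's Makefile may work differently.
    The server is created but does not start handling requests.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only; port 0 asks the OS for any free port.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_docs_server().serve_forever()` blocks and serves the folder until interrupted. Binding to 127.0.0.1 keeps the preview private to the local machine.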
We use Zenhub to organize development on this library. To get started, go ahead and install the Zenhub Chrome Extension.
Then navigate to the issue board or press b. You'll see a screen that looks something like this:
Here's another example.
python setup.py sdist upload -r pypi