JuliaPlots/StatPlots.jl

Name: StatPlots.jl

Owner: JuliaPlots

Description: Statistical plotting recipes for Plots.jl

Created: 2016-07-10 14:39:33.0

Updated: 2017-12-26 19:52:29.0

Pushed: 2018-01-18 14:17:56.0

Homepage: null

Size: 181

Language: Julia

README

StatPlots

Primary author: Thomas Breloff (@tbreloff)

This package contains many statistical recipes for concepts and types introduced in the JuliaStats organization, intended to be used with Plots.jl.

Initialize:

Pkg.clone("git@github.com:JuliaPlots/StatPlots.jl.git")
using StatPlots
gr(size=(400,300))

Table-like data structures, including DataFrames, IndexedTables, DataStreams, etc… (see here for an exhaustive list), are supported thanks to the macro @df which allows passing columns as symbols. Those columns can then be manipulated inside the plot call, like normal Arrays:

using DataFrames, IndexedTables
df = DataFrame(a = 1:10, b = 10 * rand(10), c = 10 * rand(10))
@df df plot(:a, [:b :c], colour = [:red :blue])
@df df scatter(:a, :b, markersize = 4 * log.(:c .+ 0.1))
t = table(1:10, rand(10), names = [:a, :b]) # IndexedTable
@df t scatter(2 * :b)
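
To give a feel for what "passing columns as symbols" means, here is a toy macro, written for this illustration only, that rewrites every quoted symbol in a call into a column lookup on the table. It is not the StatPlots implementation of @df: the names `@toy_df` and `replace_syms` are hypothetical, the table is a plain NamedTuple, Julia 1.x is assumed, and the toy makes no attempt to check whether a symbol really is a column name.

```julia
# Toy illustration only; `@toy_df` and `replace_syms` are hypothetical names,
# not part of StatPlots.

# Leave anything that is not a quoted symbol or an expression untouched.
replace_syms(tbl, ex) = ex

# A quoted symbol such as :a becomes a column lookup, getproperty(tbl, :a).
replace_syms(tbl, ex::QuoteNode) =
    ex.value isa Symbol ? :(getproperty($tbl, $ex)) : ex

# Recurse into nested expressions (calls, broadcasts, keyword arguments, ...).
replace_syms(tbl, ex::Expr) =
    Expr(ex.head, [replace_syms(tbl, arg) for arg in ex.args]...)

macro toy_df(tbl, call)
    esc(replace_syms(tbl, call))
end

tbl = (a = [1, 2, 3], b = [4.0, 5.0, 6.0])   # a NamedTuple standing in for a table

# Expands to sum(getproperty(tbl, :a) .* getproperty(tbl, :b))
total = @toy_df tbl sum(:a .* :b)
println(total)
```

Because the toy replaces every quoted symbol, a keyword value such as `:red` would also be treated as a column, which is the kind of ambiguity an escape mechanism has to resolve.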

Inside a @df macro call, the cols utility function can be used to refer to a range of columns:

@df df plot(:a, cols(2:3), colour = [:red :blue])

or to refer to a column whose symbol is represented by a variable:

s = :b
@df df plot(:a, cols(s))

In case of ambiguity, symbols not referring to DataFrame columns must be escaped by ^():

df[:red] = rand(10)
@df df plot(:a, [:b :c], colour = ^([:red :blue]))

The @df macro plays nicely with the new syntax of the Query.jl data manipulation package (v0.8 and above), in that a plot command can be added at the end of a query pipeline, without having to explicitly collect the outcome of the query first:

using Query, StatPlots
df |>
    @filter(_.a > 5) |>
    @map({_.b, d = _.c-10}) |>
    @df scatter(:b, :d)

The @df syntax is also compatible with Plots grouping machinery:

using RDatasets
school = RDatasets.dataset("mlmRev","Hsb82")
@df school density(:MAch, group = :Sx)

To group by more than one column, use a tuple of symbols:

@df school density(:MAch, group = (:Sx, :Sector), legend = :topleft)
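
Conceptually, grouping by a tuple of columns means that every distinct combination of values forms its own group. The sketch below shows that partitioning in plain Julia (assuming Julia 1.x) on made-up values. The column names are borrowed from the example above, but the data and the `groups` dictionary are purely illustrative and say nothing about how Plots implements grouping internally.

```julia
# Made-up stand-ins for three columns of a table (not the real Hsb82 data).
sx     = ["Male", "Female", "Male", "Female", "Male"]
sector = ["Public", "Public", "Catholic", "Public", "Public"]
mach   = [5.9, 19.7, 20.3, 8.8, 17.9]

# Collect the values of `mach` under each (sx, sector) combination.
groups = Dict{Tuple{String,String},Vector{Float64}}()
for (s, sec, m) in zip(sx, sector, mach)
    push!(get!(groups, (s, sec), Float64[]), m)
end

# Print one line per group, ordered by the group key.
for (key, vals) in sort(collect(groups), by = first)
    println(key, " => ", vals)
end
```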

The old syntax, in which the DataFrame was passed as the first argument to the plot call, is no longer supported.


marginalhist with DataFrames

using RDatasets
iris = dataset("datasets","iris")
@df iris marginalhist(:PetalLength, :PetalWidth)

corrplot and cornerplot

@df iris corrplot([:SepalLength :SepalWidth :PetalLength :PetalWidth], grid = false)

or also:

@df iris corrplot(cols(1:4), grid = false)

A correlation plot may also be produced from a matrix:

M = randn(1000,4)
M[:,2] += 0.8sqrt.(abs.(M[:,1])) - 0.5M[:,3] .+ 5
M[:,3] -= 0.7M[:,1].^2 .+ 2
corrplot(M, label = ["x$i" for i=1:4])
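
The pairwise relationships shown in such a plot can also be summarised numerically with Pearson correlations. A minimal sketch, assuming Julia 1.x, where `cor` lives in the `Statistics` standard library. It rebuilds the same kind of matrix and does not depend on StatPlots:

```julia
using Statistics   # provides `cor` in Julia 1.x

M = randn(1000, 4)
M[:, 2] += 0.8sqrt.(abs.(M[:, 1])) - 0.5M[:, 3] .+ 5
M[:, 3] -= 0.7M[:, 1].^2 .+ 2

C = cor(M)                       # 4×4 matrix of correlations between the columns of M
println(round.(C, digits = 2))
```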

cornerplot(M)

cornerplot(M, compact=true)


boxplot and violin

import RDatasets
singers = RDatasets.dataset("lattice","singer")
@df singers violin(:VoicePart,:Height,marker=(0.2,:blue,stroke(0)))
@df singers boxplot!(:VoicePart,:Height,marker=(0.3,:orange,stroke(2)))

Asymmetric violin plots can be created using the side keyword (:both, the default, :right or :left), e.g.:

singers_moscow = deepcopy(singers)
singers_moscow[:Height] = singers_moscow[:Height] .+ 5
@df singers violin(:VoicePart,:Height, side=:right, marker=(0.2,:blue,stroke(0)), label="Scala")
@df singers_moscow violin!(:VoicePart,:Height, side=:left, marker=(0.2,:red,stroke(0)), label="Moscow")

Equal-area histograms

The ea-histogram is an alternative histogram implementation, where every 'box' in the histogram contains the same number of sample points and all boxes have the same area. Areas with a higher density of points thus get higher boxes. This type of histogram shows spikes well, but may oversmooth in the tails. The y axis is not intuitively interpretable.

a = [randn(100); randn(100) .+ 3; randn(100) ./ 2 .+ 3]
ea_histogram(a, bins = :scott, fillalpha = 0.4)
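
The equal-area idea can be sketched without any plotting: if the bin edges are placed at equally spaced sample quantiles, every bin holds roughly the same share of the points, and dividing that share by the bin width gives bars of equal area. The following is a minimal illustration in plain Julia (assuming Julia 1.x and the `Statistics` standard library), not the StatPlots implementation. The choice of 10 bins and all variable names are assumptions made for the example.

```julia
using Statistics   # provides `quantile`

a = [randn(100); randn(100) .+ 3; randn(100) ./ 2 .+ 3]

nbins = 10                                  # arbitrary choice for this sketch
probs = (0:nbins) ./ nbins                  # 0.0, 0.1, ..., 1.0
edges = quantile(a, probs)                  # bin edges at equally spaced quantiles
widths = diff(edges)

# Each bin holds a share 1/nbins of the sample, so height * width = 1/nbins:
# narrow bins (dense regions) get tall bars, wide bins (tails) get low ones.
heights = (1 / nbins) ./ widths
areas = heights .* widths
```

Since wide tail bins are flattened into a single low bar, this construction also makes it plausible why such a histogram can oversmooth in the tails.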

Distributions

using Distributions
plot(Normal(3,5), fill=(0, .5,:orange))

dist = Gamma(2)
scatter(dist, leg=false)
bar!(dist, func=cdf, alpha=0.3)

Quantile-Quantile plots

The qqplot function compares the quantiles of two distributions; each argument can be either a vector of sample values or a Distribution. The qqnorm function is shorthand for comparing a distribution to the normal distribution. If the distributions are similar, the points lie on a straight line.

x = rand(Normal(), 100)
y = rand(Cauchy(), 100)

plot(
 qqplot(x, y, qqline = :fit), # qqplot of two samples, show a fitted regression line
 qqplot(Cauchy, y),           # compare with a Cauchy distribution fitted to y; pass an instance (e.g. Normal(0,1)) to compare with a specific distribution
 qqnorm(x, qqline = :R)       # the :R default line passes through the 1st and 3rd quartiles of the distribution
)
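
What a quantile-quantile comparison of two samples boils down to can be shown in a few lines: evaluate both samples at the same probabilities and pair up the results. This sketch uses only the `Statistics` standard library (assuming Julia 1.x) and deliberately avoids Distributions. The probability grid and the variable names are assumptions for the illustration and need not match what qqplot computes internally.

```julia
using Statistics   # provides `quantile`

x = randn(100)              # sample from a standard normal
y = 2 .* randn(100) .+ 1    # same shape, different location and scale

probs = (1:99) ./ 100       # probabilities at which both samples are evaluated
qx = quantile(x, probs)
qy = quantile(y, probs)

# If the two distributions have the same shape, the pairs (qx[i], qy[i])
# fall close to a straight line; a different shape bends the curve.
pairs_xy = collect(zip(qx, qy))
```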

Grouped Bar plots

groupedbar(rand(10,3), bar_position = :stack, bar_width=0.7)

This is the default:

groupedbar(rand(10,3), bar_position = :dodge, bar_width=0.7)

The group syntax is also possible in combination with groupedbar:

groupedbar([1, 2, 1, 2, 1, 2], rand(6), group = [1, 1, 2, 2, 3, 3])

GroupedErrors.jl for population analysis

Population analysis on table-like data structures can be done using the highly recommended GroupedErrors package.

This external package, in combination with StatPlots, greatly simplifies the creation of two types of plots:

1. Subject by subject plot (generally a scatter plot)

Some simple summary statistics are computed for each experimental subject (mean is default but any scalar valued function would do) and then plotted against some other summary statistics, potentially splitting by some categorical experimental variable.

2. Population plot (generally a ribbon plot in continuous case, or bar plot in discrete case)

Some statistical analysis is computed at the single subject level (for example the density/hazard/cumulative of some variable, or the expected value of a variable given another) and the analysis is summarized across subjects (taking for example mean and s.e.m), potentially splitting by some categorical experimental variable.
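
As a rough illustration of the population-level summary described above (not of the GroupedErrors API, which is not shown here), the sketch below computes one statistic per subject and then summarises it across subjects with the mean and the standard error of the mean. It assumes Julia 1.x with the `Statistics` standard library. The data and all names are made up.

```julia
using Statistics   # provides `mean` and `std`

# Made-up measurements for three hypothetical subjects.
measurements = Dict(
    "subject1" => [1.0, 2.0, 3.0],
    "subject2" => [2.0, 4.0, 6.0],
    "subject3" => [3.0, 3.0, 3.0],
)

# Step 1: one summary statistic per subject (the mean here).
subject_means = [mean(v) for v in values(measurements)]

# Step 2: summarise across subjects with the mean and the s.e.m.
population_mean = mean(subject_means)
population_sem = std(subject_means) / sqrt(length(subject_means))

println("mean = ", population_mean, ", s.e.m. = ", population_sem)
```

Splitting by a categorical variable would amount to repeating both steps within each category.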

For more information, please refer to the GroupedErrors.jl README.

A GUI based on QML and the GR Plots.jl backend to simplify the use of StatPlots.jl and GroupedErrors.jl even further can be found here (usable but still in alpha stage).


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.