axiom-data-science/pyaxiom

Name: pyaxiom

Owner: Axiom Data Science

Description: An ocean data toolkit developed and used by Axiom Data Science

Created: 2015-01-22 18:55:10.0

Updated: 2016-12-13 22:30:15.0

Pushed: 2017-04-13 16:57:24.0

Homepage: null

Size: 8744

Language: Python


README

pyaxiom

An ocean data toolkit developed and used by Axiom Data Science

Installation
Stable


```shell
conda install -c conda-forge pyaxiom
```
Development


```shell
conda install -c axiom-data-science pyaxiom
```
Enhanced netcdf4-python Dataset object

A subclass of the netCDF4.Dataset object that adds some additional features

Safe closing

Vanilla netCDF4.Dataset objects raise a RuntimeError when trying to close an already closed file. This won't raise.

```python
from netCDF4 import Dataset

nc = Dataset('http://thredds45.pvd.axiomalaska.com/thredds/dodsC/grabbag/USGS_CMG_WH_OBS/WFAL/9001rcm-a.nc')
nc.close()
nc.close()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-db44c06d8538> in <module>()
----> 1 nc.close()
/home/kwilcox/.virtualenvs/overlord/lib/python2.7/site-packages/netCDF4.so in netCDF4.Dataset.close (netCDF4.c:23432)()
RuntimeError: NetCDF: Not a valid ID
```

```python
from pyaxiom.netcdf.dataset import EnhancedDataset as Dataset

nc = Dataset('http://thredds45.pvd.axiomalaska.com/thredds/dodsC/grabbag/USGS_CMG_WH_OBS/WFAL/9001rcm-a.nc')
nc.close()
nc.close()  # No exception raised
```
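The safe-close behavior can be illustrated with a small standalone sketch (not pyaxiom's actual implementation). `FakeDataset` below is a hypothetical stand-in for `netCDF4.Dataset`, which raises a `RuntimeError` when `close()` is called on an already-closed file:

```python
class FakeDataset(object):
    """Hypothetical stand-in for netCDF4.Dataset's close() behavior."""
    def __init__(self):
        self._open = True

    def close(self):
        if not self._open:
            raise RuntimeError("NetCDF: Not a valid ID")
        self._open = False


class SafeDataset(FakeDataset):
    """Wrap close() so a second call is a harmless no-op."""
    def close(self):
        try:
            super(SafeDataset, self).close()
        except RuntimeError:
            pass  # already closed; ignore


ds = SafeDataset()
ds.close()
ds.close()  # second close does not raise
```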
Retrieving variables by attributes and values/callables
```python
from pyaxiom.netcdf.dataset import EnhancedDataset as Dataset

nc = Dataset('http://thredds45.pvd.axiomalaska.com/thredds/dodsC/grabbag/USGS_CMG_WH_OBS/WFAL/9001rcm-a.nc')

# Return variables with a standard_name attribute equal to 'latitude'
print nc.get_variables_by_attributes(standard_name='latitude')
[<type 'netCDF4.Variable'>
float64 latitude()
    units: degrees_north
    standard_name: latitude
    long_name: sensor latitude
unlimited dimensions:
current shape = ()
filling off
]

# Return all variables with a 'standard_name' attribute
variables = nc.get_variables_by_attributes(standard_name=lambda v: v is not None)
print [s.name for s in variables]
[u'latitude', u'longitude', u'depth', u'T_28', u'CS_300', u'CD_310', u'u_1205', u'v_1206', u'O_60', u'DO', u'time']

# Get creative... return all variables with the attribute 'units' equal to 'm/s' and a 'grid_mapping' attribute
variables = nc.get_variables_by_attributes(grid_mapping=lambda v: v is not None, units='m/s')
print [s.name for s in variables]
[u'CS_300', u'u_1205', u'v_1206']
```
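The attribute filter behaves roughly like the pure-Python sketch below: each keyword argument is either a literal compared with `==` or a callable applied to the attribute value (`None` when missing). The variable/attribute dicts here are illustrative, not read from the file above:

```python
def filter_by_attributes(variables, **kwargs):
    """Return names of variables whose attributes match every kwarg.

    A kwarg value may be a literal (compared with ==) or a callable
    that receives the attribute value (or None when the attribute
    is missing) and returns True/False.
    """
    matched = []
    for name, attrs in variables.items():
        for key, expected in kwargs.items():
            actual = attrs.get(key)
            if callable(expected):
                if not expected(actual):
                    break
            elif actual != expected:
                break
        else:
            matched.append(name)
    return matched


# Hypothetical variables, loosely modeled on the dataset above
variables = {
    'latitude': {'standard_name': 'latitude', 'units': 'degrees_north'},
    'T_28':     {'standard_name': 'sea_water_temperature', 'units': 'C'},
    'u_1205':   {'standard_name': 'eastward_sea_water_velocity',
                 'units': 'm/s', 'grid_mapping': 'crs'},
}

print(filter_by_attributes(variables, standard_name='latitude'))
print(filter_by_attributes(variables, units='m/s',
                           grid_mapping=lambda v: v is not None))
```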
IOOS URNs

More Information

URN Normalization
```python
from pyaxiom.urn import IoosUrn

u = IoosUrn(asset_type='station', authority='axiom', label='station1')
print u.__dict__
{'asset_type': 'station',
 'authority': 'axiom',
 'component': None,
 'label': 'station1',
 'version': None}
print u.urn
'urn:ioos:station:axiom:station1'
```
```python
from pyaxiom.urn import IoosUrn

u = IoosUrn.from_string('urn:ioos:station:axiom:station1')
print u.__dict__
{'asset_type': 'station',
 'authority': 'axiom',
 'component': None,
 'label': 'station1',
 'version': None}
print u.urn
'urn:ioos:station:axiom:station1'
```
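Under the hood an IOOS URN is a colon-delimited string. A minimal parsing sketch (not pyaxiom's actual implementation, which also handles versions, sensor modifiers after `#`, and validation) looks like:

```python
def parse_ioos_urn(urn):
    """Split 'urn:ioos:<asset_type>:<authority>:<label>[:<component>]'
    into a dict. Illustrative only; IoosUrn covers more cases."""
    parts = urn.split(':')
    if parts[:2] != ['urn', 'ioos'] or len(parts) < 5:
        raise ValueError('Not a valid IOOS URN: %s' % urn)
    return {
        'asset_type': parts[2],
        'authority': parts[3],
        'label': parts[4],
        'component': parts[5] if len(parts) > 5 else None,
    }


print(parse_ioos_urn('urn:ioos:station:axiom:station1'))
```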
NetCDF Integration
```python
from pyaxiom.utils import urnify, dictify_urn

# NetCDF variable attributes from a "sensor" urn
print dictify_urn('urn:ioos:sensor:axiom:station1')
{'standard_name': 'wind_speed'}

print dictify_urn('urn:ioos:sensor:axiom:foo:lwe_thickness_of_precipitation_amount#cell_methods=time:mean,time:variance;interval=pt1h')
{'standard_name': 'lwe_thickness_of_precipitation_amount',
 'cell_methods': 'time: mean time: variance (interval: PT1H)'}
```

```python
# URN from a `dict` of variable attributes
attributes = {'standard_name': 'wind_speed',
              'cell_methods': 'time: mean (interval: PT24H)'}
print urnify('authority', 'label', attributes)
'urn:ioos:sensor:authority:label:wind_speed#cell_methods=time:mean;interval=pt24h'
```

```python
# URN from a `netCDF4` Variable object
nc = netCDF4.Dataset('http://thredds45.pvd.axiomalaska.com/thredds/dodsC/grabbag/USGS_CMG_WH_OBS/WFAL/9001rcm-a.nc')
print urnify('authority', 'label', nc.variables['T_28'])
'urn:ioos:sensor:authority:label:sea_water_temperature'
```
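The simplest part of that round trip can be sketched in a few lines. `urnify_sketch` below is hypothetical and only appends the `standard_name`; the real `urnify` also encodes `cell_methods` and intervals into the `#` fragment:

```python
def urnify_sketch(authority, label, attributes):
    """Hypothetical sketch: build a sensor URN from variable attributes.

    Only handles standard_name; pyaxiom's urnify also encodes
    cell_methods/interval modifiers after '#'.
    """
    parts = ['urn', 'ioos', 'sensor', authority, label]
    standard_name = attributes.get('standard_name')
    if standard_name:
        parts.append(standard_name)
    return ':'.join(parts)


print(urnify_sketch('authority', 'label', {'standard_name': 'wind_speed'}))
# urn:ioos:sensor:authority:label:wind_speed
```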
Gridded NetCDF Collections
Binning files

pyaxiom installs an executable called binner that combines many NetCDF files into a single file, which is useful for cleanup and optimization.

If a script is opening and reading hundreds of files, the open operations alone are slow, so combine the files into one per bin first. Note that binner does not handle files that overlap in time or files that have data on both sides of a bin boundary.
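Binning by a calendar unit amounts to snapping each file's start time to the beginning of its bin. A rough sketch of the month case (illustrative, not binner's actual code), assuming bins are aligned to January:

```python
from datetime import datetime


def month_bin_start(dt, factor=1):
    """Snap a timestamp to the start of its (factor)-month bin,
    with bins aligned to the start of the year."""
    start_month = (dt.month - 1) // factor * factor + 1
    return datetime(dt.year, start_month, 1)


print(month_bin_start(datetime(2017, 4, 13)))            # 2017-04-01 00:00:00
print(month_bin_start(datetime(2017, 4, 13), factor=2))  # 2017-03-01 00:00:00
```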

```
usage: binner [-h] -o OUTPUT -d {day,month,week,year} [-f [FACTOR]]
              [-n [NCML_FILE]] [-g [GLOB_STRING]] [-a] [-s HARD_START]
              [-e HARD_END]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Directory to output the binned files to
  -d {day,month,week,year}, --delta {day,month,week,year}
                        Timedelta to bin by
  -f [FACTOR], --factor [FACTOR]
                        Factor to apply to the delta. Passing a '2' would be
                        (2) days or (2) months. Defaults to 1.
  -n [NCML_FILE], --ncml_file [NCML_FILE]
                        NcML containing an aggregation scan to use for the
                        individual files. One of 'ncml_file' or 'glob_string'
                        is required. If both are passed in, the 'glob_string'
                        is used to identify files for the collection and the
                        'ncml_file' is applied against each member.
  -g [GLOB_STRING], --glob_string [GLOB_STRING]
                        A Python glob.glob string to use for file
                        identification. One of 'ncml_file' or 'glob_string' is
                        required. If both are passed in, the 'glob_string' is
                        used to identify files for the collection and the
                        'ncml_file' is applied against each member.
  -a, --apply_to_members
                        Flag to apply the NcML to each member of the
                        aggregation before extracting metadata. Ignored if
                        using a 'glob_string'. Defaults to False.
  -s HARD_START, --hard_start HARD_START
                        A datetime string to start the aggregation from. Only
                        members starting on or after this datetime will be
                        processed.
  -e HARD_END, --hard_end HARD_END
                        A datetime string to end the aggregation on. Only
                        members ending before this datetime will be processed.
```
Examples

Directory globbing
```shell
binner \
  --output ./output/monthly_bins \
  --glob_string "pyaxiom/tests/resources/coamps/cencoos_4km/wnd_tru/10m/*.nc" \
  -d month \
  -f 1
```
Directory globbing and applying NcML file to each member
```shell
binner \
  --output ./output/monthly_bins \
  --glob_string "pyaxiom/tests/resources/coamps/cencoos_4km/wnd_tru/10m/*.nc" \
  -n pyaxiom/tests/resources/coamps_10km_wind.ncml \
  -d month \
  -f 1
```
NcML aggregation reading the <scan> element
```shell
binner \
  --output ./output/monthly_bins \
  -n pyaxiom/tests/resources/coamps_10km_wind.ncml \
  -d month \
  -f 1
```
Creating CF1.6 TimeSeries files
TimeSeries
```python
from pyaxiom.netcdf.sensors import TimeSeries

filename = 'test_timeseries.nc'
times = [0, 1000, 2000, 3000, 4000, 5000]  # Seconds since Epoch
verticals = None
ts = TimeSeries(output_directory='./output',
                latitude=32,   # WGS84
                longitude=-74, # WGS84
                station_name='timeseries_station',
                global_attributes=dict(id='myid'),
                output_filename='timeseries.nc',
                times=times,
                verticals=verticals)
values = [20, 21, 22, 23, 24, 25]
attrs = dict(standard_name='sea_water_temperature')
ts.add_variable('temperature', values=values, attributes=attrs)
ts.close()
```
TimeSeriesProfile
```python
import numpy as np
from pyaxiom.netcdf.sensors import TimeSeries

times = [0, 1000, 2000, 3000, 4000, 5000]  # Seconds since Epoch
verticals = [0, 1, 2]  # Meters down
ts = TimeSeries(output_directory='./output',
                latitude=32,   # WGS84
                longitude=-74, # WGS84
                station_name='timeseriesprofile_station',
                global_attributes=dict(id='myid'),
                output_filename='timeseriesprofile.nc',
                times=times,
                verticals=verticals)
values = np.repeat([20, 21, 22, 23, 24, 25], len(verticals))
attrs = dict(standard_name='sea_water_temperature')
ts.add_variable('temperature', values=values, attributes=attrs)
ts.close()
```
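The `np.repeat` call above lays the values out time-major: all verticals for the first time step, then all verticals for the second, and so on. A quick standalone check of that layout:

```python
import numpy as np

# Time-major layout: values[i * len(verticals) + j] corresponds to
# (times[i], verticals[j]). np.repeat produces exactly this ordering
# when one value is shared across all verticals at each time step.
times = [0, 1000]
verticals = [0, 1, 2]
values = np.repeat([20, 21], len(verticals))
print(values.tolist())  # [20, 20, 20, 21, 21, 21]
```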
Pandas Integration

Pandas integration assumes your DataFrame has a time column and a depth column. Data values are pulled from a column named 'value' by default, but you may pass the data_column argument to use a different column.

```python
import pandas as pd
from pyaxiom.netcdf.sensors import TimeSeries

df = pd.DataFrame({ 'time':   [0, 1, 2, 3, 4, 5],
                    'value':  [10, 20, 30, 40, 50, 60],
                    'depth':  [0, 0, 0, 0, 0, 0] })
TimeSeries.from_dataframe(df,
                          output_directory='./output',
                          latitude=30,   # WGS84
                          longitude=-74, # WGS84
                          station_name='dataframe_station',
                          global_attributes=dict(id='myid'),
                          variable_name='values',
                          variable_attributes=dict(),
                          output_filename='from_dataframe.nc')
```
```python
import pandas as pd
from pyaxiom.netcdf.sensors import TimeSeries

df = pd.DataFrame({ 'time':         [0, 1, 2, 3, 4, 5],
                    'temperature':  [10, 20, 30, 40, 50, 60],
                    'depth':        [0, 0, 0, 0, 0, 0] })
TimeSeries.from_dataframe(df,
                          output_directory='./output',
                          latitude=30,   # WGS84
                          longitude=-74, # WGS84
                          station_name='dataframe_station',
                          global_attributes=dict(id='myid'),
                          output_filename='from_dataframe.nc',
                          variable_name='temperature',
                          variable_attributes=dict(standard_name='air_temperature'),
                          data_column='temperature')
```
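Conceptually, from_dataframe splits the frame into coordinate columns and a data column before writing the NetCDF file. A rough sketch of that split (illustrative only, not pyaxiom's code; `split_timeseries_frame` is a hypothetical helper):

```python
import pandas as pd


def split_timeseries_frame(df, data_column='value'):
    """Illustrative: pull times, verticals, and values from a frame."""
    times = df['time'].tolist()
    verticals = sorted(df['depth'].unique().tolist())
    values = df[data_column].tolist()
    return times, verticals, values


df = pd.DataFrame({'time':  [0, 1, 2],
                   'value': [10, 20, 30],
                   'depth': [0, 0, 0]})
print(split_timeseries_frame(df))  # ([0, 1, 2], [0], [10, 20, 30])
```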

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.