CD2H gitForager

liip/ckanext-ddi

Name: ckanext-ddi

Owner: Liip

Description: CKAN extension for DDI, developed for the World Bank

Created: 2015-02-18 12:09:13.0

Updated: 2018-01-22 18:45:40.0

Pushed: 2015-09-04 17:05:37.0

Homepage: null

Size: 1260

Language: Python


README

ckanext-ddi

DDI extension for CKAN, developed for the World Bank.

Features:

Configuration of DDI fields to customize display
Import DDI data via command line
Upload DDI files (XML) to a CKAN instance
Manage DDI data manually via the CKAN frontend
Harvest data from a NADA instance

Installation

Requirement: This extension runs on CKAN 2.3 or higher.

Use pip to install this plugin. This example installs it in /home/www-data:

source /home/www-data/pyenv/bin/activate
pip install -e git+https://github.com/liip/ckanext-ddi.git#egg=ckanext-ddi --src /home/www-data
cd /home/www-data/ckanext-ddi
pip install -r requirements.txt
python setup.py develop

Make sure to add ddi_schema, ddi_theme and nada_harvester to ckan.plugins in your config file. If you don't want to use the frontend part of this extension, you can omit ddi_theme.

To use the nada_harvester, make sure the ckanext-harvest extension is installed as well.

Configuration

CKAN configuration (production.ini)

Four options are available:

ckanext.ddi.config_file = /path/to/my/config.yml
ckanext.ddi.default_license = CC0-1.0
ckanext.ddi.allow_duplicates = True
ckanext.ddi.override_datasets = False

The config_file option is the path to the DDI-specific configuration of this extension (see below). The default_license option sets the license used for all DDI imports when no license is specified explicitly. The allow_duplicates option determines whether duplicate datasets are allowed; duplicates are identified by the unique id_number attribute (defaults to False). The override_datasets option determines what happens when you import a dataset that already exists: either a new dataset is created or the existing one is overridden (defaults to False).
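As a rough illustration of how these four options and their documented defaults could be read, here is a minimal sketch using Python's standard configparser. It is not the extension's own code: the read_ddi_options helper, the [app:main] section name and the ckanext.ddi. key prefix are assumptions made for this example.

```python
import configparser

# Sample ini content; the [app:main] section and the "ckanext.ddi." prefix
# are assumptions made for this sketch.
SAMPLE_INI = """
[app:main]
ckanext.ddi.config_file = /path/to/my/config.yml
ckanext.ddi.default_license = CC0-1.0
ckanext.ddi.allow_duplicates = True
"""


def read_ddi_options(ini_text, section="app:main"):
    """Hypothetical helper: return the four DDI options with their defaults."""
    # Interpolation is disabled so that "%" characters in an ini file
    # are not treated as placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    prefix = "ckanext.ddi."
    return {
        "config_file": parser.get(section, prefix + "config_file", fallback=None),
        "default_license": parser.get(section, prefix + "default_license", fallback=None),
        # Both boolean options default to False, as described above.
        "allow_duplicates": parser.getboolean(section, prefix + "allow_duplicates", fallback=False),
        "override_datasets": parser.getboolean(section, prefix + "override_datasets", fallback=False),
    }


options = read_ddi_options(SAMPLE_INI)
print(options)
```

Because override_datasets is missing from the sample, the sketch falls back to False for it.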

DDI fields configuration

The display and structure of the DDI fields can be configured individually. A separate YAML config file is used for that.

There are 3 sections:

sections: describes the different sections used to group fields together
vocabularies: describes the available controlled vocabularies that can be referenced
fields: describes all fields, their type and how they are displayed. Only the predefined fields can be used:
- id
- name
- title
- url
- author
- author_email
- maintainer
- maintainer_email
- license_id
- copyright
- version
- version_notes
- notes
- tags
- abbreviation
- study_type
- series_info
- id_number
- description
- production_type
- production_date
- abstract
- kind_of_data
- unit_of_analysis
- description_of_scope
- country
- geographic_coverage
- time_period_covered
- universe
- primary_investigator
- other_producers
- funding
- sampling_procedure
- data_collection_dates
- access_authority
- conditions
- citation_requirement
- contact_persons
- contact_persons_email

Example:

fields:
    identification:
        title:
            type: text
            visible: False
            display: Title
        url:
            type: url
            display_field: title
            visible: True
            display: Source
    overview:
        abstract:
            type: markdown
            visible: True
            display: Abstract
        kind_of_data:
            type: vocabulary
            visible: True
            display: Kind of Data
    contact:
        contact_persons:
            type: text
            visible: False
            display: Contact Persons
        contact_persons_email:
            type: email
            display_field: contact_persons
            visible: True
            display: Contact person
            internal_display: Contact Email

vocabularies:
    kind_of_data:
        - Sample survey data [ssd]
        - Census/enumeration data [cen]
        - Administrative records data [adm]
        - Aggregate data [agg]
        - Clinical data [cli]
        - Event/transaction data [evn]
        - Observation data/ratings [obs]
        - Process-produced data [pro]

sections:
    identification: Identification
    overview: Overview
    contact: Contact information
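To make the relationship between sections and fields more concrete, the following sketch groups the visible fields under their section labels. It assumes the YAML file has already been parsed into a nested dict with the top-level keys fields and sections (for example by a YAML loader). The visible_fields_by_section helper is hypothetical and does not claim to reproduce how the extension renders its pages.

```python
# Parsed form of a shortened DDI fields configuration (assumed structure:
# top-level "fields" and "sections" keys, fields nested under section keys).
config = {
    "fields": {
        "identification": {
            "title": {"type": "text", "visible": False, "display": "Title"},
            "url": {"type": "url", "display_field": "title",
                    "visible": True, "display": "Source"},
        },
        "overview": {
            "abstract": {"type": "markdown", "visible": True,
                         "display": "Abstract"},
        },
    },
    "sections": {
        "identification": "Identification",
        "overview": "Overview",
    },
}


def visible_fields_by_section(config):
    """Hypothetical helper: map each section label to its visible field labels."""
    grouped = {}
    for section_key, section_label in config["sections"].items():
        fields = config["fields"].get(section_key, {})
        grouped[section_label] = [
            spec["display"]
            for spec in fields.values()
            if spec.get("visible")  # fields without "visible" are skipped
        ]
    return grouped


print(visible_fields_by_section(config))
```

In this sketch the title field is hidden on its own, which matches the example above where it is only referenced through display_field on the url field.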

Based on this configuration the web UI is generated:

Dataset page

Usage

Data can be imported either via command line or using the web interface.

Web interface

Import

If you are logged in and have the appropriate permissions, you will find a new button “Import Dataset from DDI/XML” on the dataset page.

Import Dataset from DDI/XML button

This button leads you to an import page, where a DDI XML file can either be uploaded or specified as a URL.

Import Dataset page

Manage DDI datasets

Instead of importing the DDI data, you can manually add datasets just like you would on any CKAN instance. The “Add Dataset” form is modified, so you can find all the fields from your DDI configuration (see above).

Dataset form

Run import from command line

This plugin lets you import DDI XML files using a paster command.

source /home/www-data/pyenv/bin/activate
paster --plugin=ckanext-ddi ddi import <path_or_url> [<license>] -c <path to config file>

<path_or_url> is a required parameter and, as the name implies, it can be either a local file or a publicly accessible URL.
<license> is an optional parameter to specify the license of the dataset. Ideally this is a value from the configured license group file.
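Since <path_or_url> accepts both local files and URLs, a caller has to tell the two apart somehow. The sketch below shows one plausible way to do that with the standard library. The is_url helper is hypothetical, and treating only http and https as URLs is an assumption, not documented behaviour of the import command.

```python
from urllib.parse import urlparse


def is_url(path_or_url):
    """Hypothetical helper: True for http(s) URLs, False for anything else."""
    scheme = urlparse(path_or_url).scheme.lower()
    # Assumption: only http and https count as "publicly accessible URLs".
    return scheme in ("http", "https")


print(is_url("http://example.org/study/ddi.xml"))
print(is_url("/home/www-data/data/ddi.xml"))
```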

NADA harvester

To add a harvester for a NADA instance, you should be logged in and visit /harvest on your CKAN installation (e.g. http://my.ckaninstance.org/harvest). There you can add a new harvest source with the type “NADA harvester for DDI”.

In the URL field, specify the base URL of your NADA instance. If the start page of your NADA instance is my.nada-instance.org/index.php/home, then please specify http://my.nada-instance.org as the URL for the harvester.
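The rule above (start page in, base URL out) can be sketched with the standard library. The nada_base_url helper is hypothetical, and it assumes the NADA instance is served from the root of its host; an instance installed under a sub-path would need different handling.

```python
from urllib.parse import urlsplit


def nada_base_url(start_page_url):
    """Hypothetical helper: reduce a NADA start page URL to scheme and host."""
    # Add a scheme if the address was copied without one.
    if "://" not in start_page_url:
        start_page_url = "http://" + start_page_url
    parts = urlsplit(start_page_url)
    # Assumption: NADA lives at the host root, so the path is dropped entirely.
    return f"{parts.scheme}://{parts.netloc}"


print(nada_base_url("my.nada-instance.org/index.php/home"))
```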

Example configuration for NADA harvester

You can specify the configuration as JSON:

user: The CKAN user that performs the harvesting (default: harvest)
license: A default license to apply to all harvested datasets (default: empty). If this is not specified, the config value ckanext.ddi.default_license is used (see above).
access_type: Parameter for NADA to specify the data access type of the datasets that should be harvested (default: public_use)

Possible values for access_type:

"" (empty string, i.e. all data access types are allowed)
"direct_access"
"public_use" (default)
"licensed"
"data_enclave"
"data_external"
"no_data_available"
"open_data"

JSON example:

{"user": "harvest", "access_type": "public_use", "license": "CC-BY-SA-4.0"}
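To show how the documented defaults and the list of allowed access_type values fit together, here is a minimal validation sketch. The load_harvester_config helper is hypothetical and is not the harvester's actual code. Representing the empty default license as an empty string is an assumption.

```python
import json

# Allowed values for access_type, as listed above.
ACCESS_TYPES = {
    "", "direct_access", "public_use", "licensed",
    "data_enclave", "data_external", "no_data_available", "open_data",
}

# Documented defaults; the empty license is modelled as "" (assumption).
DEFAULTS = {"user": "harvest", "license": "", "access_type": "public_use"}


def load_harvester_config(raw):
    """Hypothetical helper: parse the JSON config and fill in defaults."""
    config = dict(DEFAULTS)
    if raw and raw.strip():
        parsed = json.loads(raw)
        if not isinstance(parsed, dict):
            raise ValueError("harvester config must be a JSON object")
        config.update(parsed)
    if config["access_type"] not in ACCESS_TYPES:
        raise ValueError("unknown access_type: %r" % (config["access_type"],))
    return config


print(load_harvester_config('{"access_type": "licensed", "license": "CC-BY-SA-4.0"}'))
```

An empty configuration yields the defaults unchanged, and an unknown access_type raises a ValueError instead of being passed on silently.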

Development

This CKAN extension uses flake8 to ensure basic code quality.

You can add a pre-commit hook when you have installed flake8:

flake8 --install-hook

Travis CI is used to check the code for all PRs.

Acknowledgements

This module was developed with support from the World Bank to provide a solution for National Statistical Offices (NSOs) that need to publish data on CKAN platforms.

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.