OpenBudget/data-quality-dashboard

Name: data-quality-dashboard

Description: Data Quality Dashboards display statistics on a collection of published data.

Created: 2016-05-27 14:39:29.0

Updated: 2016-05-27 14:39:30.0

Pushed: 2016-06-21 11:36:14.0

Size: 503

Language: JavaScript

README

Data Quality Dashboard

Data Quality Dashboard provides access to, and displays statistics on, a collection of published data. This collection of data is logically related: for example, data published by a single government department, or a group of departments.

The Data Quality Dashboard has been developed in order to display data quality information on the 25K spend data published by the UK Government. The Dashboard can be used for any published collection of data by following a few key steps.

Local development
# Get the code
git clone https://github.com/okfn/data-quality-dashboard.git

# Install the dependencies
npm install

# Build the sources and run the server
npm run develop

# Just build the sources
npm run build

# Just run the server
npm run start

# View the app in your browser
open http://localhost:3000/

See the scripts section in package.json for more available commands.

Read on for details.

Application

The Data Quality Dashboard is a Node.js application written in ES6, largely using Express and React.

The app.backend module renders the basic views (using React on the server) and is responsible for preparing the data as JSON by parsing the CSV database. It also provides some simple routes for standard pages like FAQ and About.
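
As an illustration of the "prepare the data as JSON and serve it" idea, here is a minimal sketch of a JSON endpoint. It deliberately uses Node's built-in http module instead of Express to stay dependency-free, so it is not the project's actual routing code. The /api path and the createApiHandler and startServer names are hypothetical.

```javascript
const http = require('http');

// Returns a request handler that serves the prepared data as JSON.
// Assumption: '/api' is a made-up path, not the dashboard's real route.
function createApiHandler(data) {
  return function handle(req, res) {
    if (req.method === 'GET' && req.url === '/api') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(data));
    } else {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
    }
  };
}

// Starts an HTTP server around the handler and returns it.
function startServer(data, port) {
  return http.createServer(createApiHandler(data)).listen(port);
}
```

Calling startServer(data, 3000) would expose the data at http://localhost:3000/api. In an Express app the same handler body would typically live inside a route callback.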

The app.ui module is a React-Redux application for displaying the data to the user.

The codebase is written in Node.js-style CommonJS, using ES6 syntax. The app.ui code is bundled by Browserify, and app.backend is transformed using Babel at runtime.

Remote deployment

We push to Heroku, and a postinstall script ensures that app.ui is bundled before the app is served. Make sure you set NPM_CONFIG_PRODUCTION=false to include devDependencies on Heroku.

Data

The Data Quality Dashboard reads data from flat file storage, with data written to CSV and JSON. Any publicly available file storage will do, as long as the file naming and the data structure of the files are consistent.

Currently, we run the database for the UK Spend Publishing Dashboard from a public repository on GitHub. This gives easy access to the files, and enables a version history of the database.

As GitHub does not support CORS, we serve the files through RawGit, a proxy that does.

When the application loads, it reads the data from the database, parses the CSV content, and stores the result as JSON. This JSON representation is accessible via an API endpoint that the frontend app uses.
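
The parsing step can be sketched as follows. This is a simplified, dependency-free illustration of turning CSV text into an array of JSON-ready objects, not the parser the application actually uses. The function names are hypothetical, and the sketch assumes the first row is a header and that quoted fields do not contain line breaks.

```javascript
// Splits one CSV line into fields.
// Handles double-quoted fields with embedded commas and "" escapes.
function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Converts CSV text to an array of objects keyed by the header row.
// Missing trailing values are filled with empty strings.
function csvToJson(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const header = splitCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const values = splitCsvLine(line);
    const row = {};
    header.forEach((name, index) => {
      row[name] = values[index] !== undefined ? values[index] : '';
    });
    return row;
  });
}
```

For example, csvToJson('id,name\n1,"Dept, A"\n') yields one object with id '1' and name 'Dept, A'. The column names here are made up for the example and are not taken from the dashboard's schema.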

To configure the database, the application needs to know the base path as a URL.

For example:

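By default, the application expects to find the following files at that base: instance.json, sources.csv, publishers.csv, results.csv, performance.csv and runs.csv (described under Schema).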

Each of these files must conform to a certain data structure: think of them as tables in a database. As long as you conform to the structure and the expected data within that structure, it does not matter how the database is actually produced.

For how to change the database see the Configure database section.

Schema

The Data Quality Dashboard expects the following schema.

instance.json

A single object with the following fields:

sources.csv

A CSV with the following columns:

publishers.csv

A CSV with the following columns:

results.csv

A CSV with the following columns:

performance.csv

A CSV with the following columns:

runs.csv

A CSV with the following columns:

Configure database

The database can be configured through the following environment variables:

Following this pattern, you can also configure SOURCE_TABLE, RUN_TABLE, PERFORMANCE_TABLE and INSTANCE_TABLE.
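
A rough sketch of how such environment overrides might be resolved is shown below. It assumes that each variable holds a file name relative to the base URL and that the defaults match the file names in the Schema section. Both points are assumptions, and resolveTableUrls is a hypothetical helper rather than part of the application.

```javascript
// Assumed defaults, taken from the file names listed under Schema.
const DEFAULT_TABLES = {
  SOURCE_TABLE: 'sources.csv',
  RUN_TABLE: 'runs.csv',
  PERFORMANCE_TABLE: 'performance.csv',
  INSTANCE_TABLE: 'instance.json',
};

// Builds one URL per table, preferring a value from env over the default.
// Trailing slashes on the base URL are dropped to avoid double slashes.
function resolveTableUrls(baseUrl, env) {
  const base = baseUrl.replace(/\/+$/, '');
  const urls = {};
  Object.keys(DEFAULT_TABLES).forEach((key) => {
    const fileName = env[key] || DEFAULT_TABLES[key];
    urls[key] = `${base}/${fileName}`;
  });
  return urls;
}
```

In a running process this would be called as resolveTableUrls(someBaseUrl, process.env), where someBaseUrl stands for the configured base path.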

Tooling

To generate the result set for a Data Quality Dashboard, we built a command line utility that is designed to be run by a developer at regular intervals (as relevant for the data being assessed). This tool, Data Quality CLI, can be configured to assess data quality based on metrics of:

Note that, like the Data Quality Dashboard itself, the CLI has currently only been tested on the UK 25K spend data.

