librariesio/metrics

Name: metrics

Owner: Libraries.io

Description: :chart_with_upwards_trend: What to measure, how to measure it.

Created: 2017-07-06 09:52:58.0

Updated: 2018-05-19 16:27:33.0

Pushed: 2018-03-29 09:39:34.0

Homepage:

Size: 56

Language: null


README

Metrics

This repo aims to gather a diverse group of organisations, institutions and individuals to explore how best to measure, classify and otherwise infer information from software ecosystems and projects.

Goals
Process
Related work

A place to reference the significant works of others tackling these issues in industry, academia or as individuals.

Questions

For me, this process begins with a number of framing questions. These questions are user-centred, based on the needs of our users as defined in our personas. From there we can define measures and metrics.

Measures

Measures can be direct (a quoted metric, e.g. 'released on 1st Jan 2017'), derivative (information gleaned from data, e.g. 'released more than a year ago') or aggregated (compiled from more than one data point, e.g. 'all releases were within the last year').
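As a minimal sketch (not Libraries.io code), the three kinds of measure could be computed from a made-up list of release dates like this; the dates and the fixed 'today' are example data only:

```python
from datetime import date, timedelta

releases = [date(2017, 1, 1), date(2017, 6, 15), date(2017, 11, 3)]  # example data
today = date(2018, 3, 1)  # fixed "now" so the example is deterministic

# Direct measure: a metric quoted as-is.
latest_release = max(releases)  # 'released on 3rd Nov 2017'

# Derivative measure: information gleaned from a single datum.
released_over_a_year_ago = (today - latest_release) > timedelta(days=365)

# Aggregated measure: compiled from more than one data point.
all_within_last_year = all((today - r) <= timedelta(days=365) for r in releases)

print(latest_release, released_over_a_year_ago, all_within_last_year)
```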

Measures may be broken down into a number of areas. At Libraries.io we have been considering areas for code, community, distribution and documentation. The single, overarching measure in Libraries.io is SourceRank, which is defined in our documentation.

Data

Data can be 'harvested' (gathered automatically using APIs, data dumps, scraping etc) or 'farmed' (gathered from a community by contribution).

Libraries.io currently focusses on harvesting metrics. It is preferable for a metric to be present in many sources rather than a single source so that we can make like-for-like comparisons across supported package managers.
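As a rough illustration of 'harvesting', the sketch below pulls a single metric (star count) for a repository from the public GitHub REST API. The endpoint and field name are assumptions about that API; this is not Libraries.io's harvester:

```python
import json
import urllib.request

def harvest_stars(owner: str, repo: str) -> int:
    """Harvest one metric (star count) for a repository from the GitHub API."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return data["stargazers_count"]

if __name__ == "__main__":
    print(harvest_stars("librariesio", "metrics"))
```

The same metric would ideally be harvested from every supported package manager or host so that the resulting measures can be compared like for like.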

Sources

Sources contain metrics. They may also contain measures themselves. We think it is important not to rely on proprietary, third party services for measures.

Reproducibility is the key issue here. An inability to reproduce a measure from the source data (metrics) undermines like-for-like comparison of two pieces of software and ties all users of the classifier to that service. This is unacceptable(?).
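A minimal sketch of this point, using made-up values (the 'third_party_score' is hypothetical, not a real service): a measure defined over harvestable metrics can be recomputed by anyone, while an opaque proprietary score cannot.

```python
from datetime import date, timedelta

def recently_released(release_dates, today):
    """Reproducible measure: derived entirely from harvestable metrics."""
    return any(today - d <= timedelta(days=365) for d in release_dates)

# An opaque score with no published formula cannot be recomputed, so two
# projects cannot be compared like for like outside that service.
third_party_score = 87  # hypothetical value from a hypothetical service

print(recently_released([date(2017, 11, 3)], date(2018, 3, 1)))  # True
print(third_party_score)  # not reproducible from source metrics
```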

Contributing

While there will be no single approach that is right for everyone, our hope is that we can come to a consensus about what good measures look like and what metrics they require, so that Libraries.io can provide them.

Please feel free to fork and PR additions to any of the documents in this repo, to share your thoughts and ideas in an issue, or to propose draft specifications for areas, measures etc. as a PR or reference.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.