Name: velocity
Owner: Cloud Native Computing Foundation (CNCF)
Description: Track development velocity
Created: 2017-04-26 14:07:13.0
Updated: 2018-04-30 17:33:39.0
Pushed: 2018-04-03 09:03:37.0
Homepage: https://cncf.io
Size: 248182
Language: Ruby
# velocity
This tool set generates data for a Bubble/Motion Google Sheet Chart.
The main script is analysis.rb. The input is a CSV file created from BigQuery results.
This tool is used for periodic chart updates, as described in the following documents:
Guide to the CNCF projects chart creation
Guide to the LinuxFoundation projects chart creation
Guide to the Top-30 projects chart creation
https://www.cncf.io/blog/2017/06/05/30-highest-velocity-open-source-projects/
Links to various charts and videos generated using this project
ruby analysis.rb data/data_yyyymm.csv projects/projects_yyyymm.csv map/hints.csv map/urls.csv map/defmaps.csv skip.csv ranges.csv
Depending on data, the script will stop execution and present a command line.
pry(main)>
To continue, type 'quit' and hit enter/return.
Arguments list:
.sql files are stored in the BigQuery/ folder
projects/ folder
input.csv
data/data_yyyymm.csv from BigQuery, like the following:
org,repo,activity,comments,prs,commits,issues,authors
kubernetes,kubernetes/kubernetes,11243,9878,720,70,575,40
ethereum,ethereum/go-ethereum,10701,570,109,43,9979,14
output.csv
The file to be imported into Google Sheets (File -> Import); the chart is then created from this data. It looks like this:
org,repo,activity,comments,prs,commits,issues,authors,project,url
dotnet,corefx+coreclr+roslyn+cli+docs+core-setup+corefxlab+roslyn-project-system+sdk+corert+eShopOnContainers+core+buildtools,20586,14964,1956,1906,1760,418,dotnet,microsoft.com/net
kubernetes+kubernetes-incubator,kubernetes+kubernetes.github.io+test-infra+ingress+charts+service-catalog+helm+minikube+dashboard+bootkube+kargo+kube-aws+community+heapster,20249,15735,2013,1323,1178,423,Kubernetes,kubernetes.io
hints.csv
a CSV file with hints for repo –> project mapping. It has this format:
repo,project
Microsoft/TypeScript,Microsoft TypeScript
urls.csv
a csv file with project –> url mapping with the following format:
project,url
angular,angular.io
defmaps.csv
a CSV file with proper names for projects generated by the default grouping within an org:
,project
et,ASP.net
kgs,NixOS
e,=SKIP
The special flag '=SKIP' for a project means that this org should NOT be grouped.
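The default grouping and the '=SKIP' flag can be illustrated with a small Ruby sketch. This is not the code from analysis.rb: the helper name `default_project`, the hash-based `defmaps` lookup, and the choice to fall back to the repo name for a skipped org are assumptions made for illustration.

```ruby
# Hypothetical sketch of default grouping by org; analysis.rb may differ.
# defmaps: { 'org name' => 'proper project name' or '=SKIP' }
def default_project(org, repo, defmaps)
  mapped = defmaps[org]
  # Assumption: for an org marked '=SKIP', each repo stands on its own.
  return repo if mapped == '=SKIP'
  # Use the proper name when one is defined, otherwise group under the org.
  mapped || org
end

defmaps = { 'orgA' => 'Project A', 'orgB' => '=SKIP' }
puts default_project('orgA', 'orgA/tool', defmaps)
puts default_project('orgB', 'orgB/tool', defmaps)
puts default_project('orgC', 'orgC/tool', defmaps)
```

The three calls print the proper name, the repo name, and the bare org name respectively.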
skip.csv
a csv file that contains lists of repos and/or orgs and/or projects to be skipped in the analysis:
org,repo,project
idevs,csu2017sp314,thoughtbot,illacceptanything,RubySteps,RainbowEngineer",Microsoft/techcasestudies,"Apache (other),OpenStack (other)"
5firstcmsc100,swcarpentry,exercism,neveragaindottech,ituring","mozilla/learning.mozilla.org,Microsoft/HolographicAcademy,w3c/aria-practices,w3c/csswg-test",
"orgX,orgY","org1/repo1,org2/repo2","project1,project2"
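Because each cell of skip.csv holds a comma-separated list, reading it takes two levels of splitting. The following Ruby sketch shows one way to do that. It is an illustration only: the helper `load_skips` and the column names `org`, `repo`, and `project` are assumptions, and analysis.rb may read the file differently.

```ruby
require 'csv'
require 'set'

# Hypothetical sketch: collect skip lists from skip.csv-style text.
# The column names org, repo and project are assumed here.
def load_skips(csv_text)
  skips = { 'org' => Set.new, 'repo' => Set.new, 'project' => Set.new }
  CSV.parse(csv_text, headers: true).each do |row|
    skips.each_key do |col|
      # Each cell holds a comma-separated list; an empty cell adds nothing.
      (row[col] || '').split(',').each { |name| skips[col] << name.strip }
    end
  end
  skips
end

text = <<~DATA
  org,repo,project
  "orgX,orgY","org1/repo1,org2/repo2","project1,project2"
  orgZ,,
DATA
skips = load_skips(text)
puts skips['org'].to_a.inspect
puts skips['repo'].include?('org1/repo1')
```

Using sets keeps membership checks cheap when every input row has to be tested against the skip lists.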
ranges.csv
a CSV file that contains ranges of repo properties; a repo is included in the calculations only if its values fall within them.
It can constrain any of “commits, prs, comments, issues, authors” to be within the range n1 .. n2 (if n1 or n2 < 0 then that bound is skipped, so -1..-1 means unlimited).
There can also be exception repos/orgs that do not use those ranges:
min,max,exceptions
activity,50,-1,"kubernetes,docker/containerd,coreos/rkt"
comments,20,100000,"kubernetes,docker/containerd,coreos/rkt"
prs,10,-1,"kubernetes,docker/containerd,coreos/rkt"
commits,10,-1,"kubernetes,kubernetes-incubator"
issues,10,-1,"kubernetes,docker/containerd,coreos/rkt"
authors,3,-1,"kubernetes,docker/containerd,google/go-github"
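The range rule described above can be sketched in Ruby as follows. This is a minimal illustration, not the code from analysis.rb: the helper names `in_range?` and `repo_included?`, the shape of the `ranges` hash, and the way an exception is matched against either an org or a full repo name are all assumptions.

```ruby
# Hypothetical sketch of the ranges.csv rule; not taken from analysis.rb.

# A bound below zero is treated as "no limit" on that side.
def in_range?(value, min, max)
  return false if min >= 0 && value < min
  return false if max >= 0 && value > max
  true
end

# ranges: { 'commits' => { min: 10, max: -1, exceptions: ['kubernetes'] }, ... }
# row:    { 'org' => 'name', 'repo' => 'org/name', 'commits' => 12, ... }
def repo_included?(row, ranges)
  ranges.all? do |key, r|
    # Assumption: an exception matches either the org or the full repo name.
    next true if r[:exceptions].include?(row['org']) || r[:exceptions].include?(row['repo'])
    in_range?(row[key].to_i, r[:min], r[:max])
  end
end

ranges = {
  'commits' => { min: 10, max: -1, exceptions: ['kubernetes'] },
  'authors' => { min: 3, max: -1, exceptions: ['kubernetes'] }
}
small = { 'org' => 'example', 'repo' => 'example/tiny', 'commits' => 2, 'authors' => 1 }
k8s   = { 'org' => 'kubernetes', 'repo' => 'kubernetes/tiny', 'commits' => 2, 'authors' => 1 }
puts repo_included?(small, ranges) # false: below both minimums
puts repo_included?(k8s, ranges)   # true: the org is listed as an exception
```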
The generated output file contains all the input data (so it can be, for example, 600 rows for 1000 input rows). You should manually review the generated output and choose how many records you need.
hintgen.rb
is a tool that takes data already processed for the various charts created so far and creates a distinct projects hint file from it. Example usage:
hintgen.rb data.csv map/hints.csv
Run it multiple times, each time with a different data file (1st parameter), to generate the final hints.csv.
Data files existing in the repository:
analysis.rb based on data_YYYYMM.csv with map/: hints.csv, urls.csv, defmaps.csv, skip.csv, ranges.csv parameters
generate_motion.rb
a tool that merges data from multiple files into one to be used for motion chart. Usage:
ruby generate_motion.rb projects/files.csv motion/motion.csv motion/motion_sums.csv [projects/summaries.csv]
The file files.csv contains a list of data files to be merged. It has the following format:
,label
projects/projects_201601.csv,01/2016
projects/projects_201602.csv,02/2016
This tool generates 2 output files:
analysis.rb. The following column is a label that will be used as “time” for the Google Sheets motion chart.
Output format:
project,url,label,activity,comments,prs,commits,issues,authors,sum_activity,sum_comments,sum_prs,sum_commits,sum_issues,sum_authors
kubernetes,kubernetes.io,2016-01,6289,5211,548,199,331,73,174254,136104,18264,8388,11498,373
kubernetes,kubernetes.io,2016-02,13021,10620,1180,360,861,73,174254,136104,18264,8388,11498,373
kubernetes,kubernetes.io,2017-04,174254,136104,18264,8388,11498,373,174254,136104,18264,8388,11498,373
dotnet,microsoft.com/net,2016-01,8190,5933,779,760,718,158,158624,111553,17019,17221,12831,382
dotnet,microsoft.com/net,2016-02,17975,12876,1652,1908,1539,172,158624,111553,17019,17221,12831,382
dotnet,microsoft.com/net,2017-04,158624,111553,17019,17221,12831,382,158624,111553,17019,17221,12831,382
ode,code.visualstudio.com,2016-01,7526,5278,381,804,1063,112,155621,104386,9501,17650,24084,198
ode,code.visualstudio.com,2016-02,17139,11638,986,1899,2616,133,155621,104386,9501,17650,24084,198
ode,code.visualstudio.com,2017-04,155621,104386,9501,17650,24084,198,155621,104386,9501,17650,24084,198
Each row contains the data for its own label (separate or cumulative), whereas the columns starting with max_ contain cumulative data for all labels.
This is to make the data ready for google sheet motion chart without complex cell indexing.
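The idea of repeating an all-labels total on every row can be sketched in Ruby. This is only an illustration under stated assumptions, and generate_motion.rb may work differently: the helper `with_totals` is hypothetical, it handles a single `activity` property, it names the total column `sum_activity` after the output header shown above, and it assumes labels sort chronologically as strings (for example `2016-01`).

```ruby
# Hypothetical sketch; generate_motion.rb itself may work differently.
# rows: [{ project:, label:, activity: }, ...] holding per-label (separate) values.
def with_totals(rows)
  # Total per project across all labels, repeated on every output row.
  totals = Hash.new(0)
  rows.each { |r| totals[r[:project]] += r[:activity] }

  # Running (cumulative) value per project, in label order.
  running = Hash.new(0)
  rows.sort_by { |r| [r[:project], r[:label]] }.map do |r|
    running[r[:project]] += r[:activity]
    r.merge(cumulative: running[r[:project]], sum_activity: totals[r[:project]])
  end
end

rows = [
  { project: 'A', label: '2016-01', activity: 10 },
  { project: 'A', label: '2016-02', activity: 5 },
  { project: 'B', label: '2016-01', activity: 7 }
]
with_totals(rows).each { |r| puts r.values.join(',') }
```

Carrying the total on each row is what lets a spreadsheet chart read it directly, without looking up another row.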
The final (optional) file summaries.csv is used to read the number of authors. This is because the number of authors is computed differently.
Without the summaries file (or if a given project is not in the summaries file), we have the number of distinct authors in each period. Summary value is a sum of all periods max.
This is obviously not a real count of all distinct authors in all periods. The real number of authors is computed only if another file is supplied, one that contains summary data for a longer period equal to the sum of all periods.
Tool to create ranks per project (for all of a project's numeric properties): report_projects_ranks.rb & shells/report_cncf_project_ranks.sh
The shell script takes projects from projects/unlimited_both.csv and uses the reports/cncf_projects_config.csv file to get the list of projects that need to be included in the rank statistics.
File format is:
project
project1
project2
projectN
It outputs a rank statistics file reports/cncf_projects_ranks.txt
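Ranking per numeric property can be sketched in Ruby as below. This is not the logic of report_projects_ranks.rb, which may order values or break ties differently: the helper `ranks_for`, the `PROPS` list, and the rule that rank 1 goes to the highest value are assumptions.

```ruby
# Hypothetical sketch; report_projects_ranks.rb may rank differently (ties, ordering).
PROPS = %i[activity comments prs commits issues authors].freeze

# projects: [{ project: 'name', activity: 1, comments: 2, ... }, ...]
# wanted:   names of the projects to report, as listed in the config file.
# Returns { 'name' => { activity: rank, ... } } with rank 1 for the highest value.
def ranks_for(projects, wanted)
  result = Hash.new { |h, k| h[k] = {} }
  PROPS.each do |prop|
    # Rank among ALL projects, then keep only the wanted ones.
    sorted = projects.sort_by { |p| -p[prop] }
    sorted.each_with_index do |p, i|
      result[p[:project]][prop] = i + 1 if wanted.include?(p[:project])
    end
  end
  result
end

data = [
  { project: 'project1', activity: 1, comments: 5, prs: 1, commits: 1, issues: 1, authors: 1 },
  { project: 'project2', activity: 9, comments: 2, prs: 3, commits: 4, issues: 5, authors: 6 }
]
ranks_for(data, ['project1']).each do |name, ranks|
  puts "#{name}: #{ranks.map { |prop, rank| "#{prop}=#{rank}" }.join(' ')}"
end
```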