CD2H GitHub Organization

These are repositories owned by "data2health".

Repositories Owned by data2health


JSP support tag library for PMC acknowledgements


An application supporting search and browsing of the acknowledgment sections of PubMed Central papers.

  1. data2health
  2. pea


CD2H Administration Core

  1. data2health
  2. cd2h-core


annual meetings and other resources

  1. data2health


The Awareness Management Platform (AMP) seeks to move beyond repetitious surveys to a model based on tracking status changes. It is based on JD eSurvey, an open source enterprise survey web application written in Java and based on the Spring Framework. Check out the tutorial videos to find out more about the application features.

  1. data2health
  2. pea


Support library for AMP

  1. data2health
  2. pea
Title | Due On | Description | Creator
Demo Surveys null null amikaili


This project aims to provide a mechanism to track and provide attribution.

  1. data2health
  2. pea
  3. cd2hpm
Title | Due On | Description | Creator
Release Contribution Role Ontology 2019-08-02T00:00:00Z nicolevasilevsky
Pilot CRO in research information systems 2019-08-02T00:00:00Z nicolevasilevsky
Host attribution workshop 2019-06-28T00:00:00Z Goal is: Attribution workshop and community building nicolevasilevsky
Create an annotation file null The annotation file should be used for mapping contributor roles and research objects. nicolevasilevsky
Acknowledgement section mining 2019-08-31T00:00:00Z nicolevasilevsky
Finalize our GitHub repo 2019-03-31T00:00:00Z Creating a milestone for all the tickets related to getting this repo set up. My due date is arbitrary. nicolevasilevsky
Engagement 2019-07-01T00:00:00Z Engagement and outreach for the AA project kristiholmes
Demonstrator Project 2019-09-02T00:00:00Z Demonstrator for the attribution project. kristiholmes
Incorporate research objects/artifacts 2019-08-01T00:00:00Z Research artifacts are essential for a complete research graph. This work will leverage ongoing community efforts in this space (especially those that have been put through a peer review process) and coordinate incorporation of these concepts into a working model. kristiholmes
Phase 3 2020-12-31T00:00:00Z LisaOKeefe1


Welcome! This is where we'll be planning the CD2H Attribution workshop - 2019


A customizable kit to develop data science community in your institution.

Title | Due On | Description | Creator
Implementation of BDC Kit at Two Sites 2019-08-31T00:00:00Z At two institutions (one CD2H site, and one site external to CD2H). laderast
Biodata Club Kit Assessment Surveys Completed 2019-05-31T00:00:00Z Final drafts of two surveys to be completed: 1) Learning Needs Assessment Survey and 2) Learning Outcomes Assessment Survey. laderast
Competency Mapping Completed 2019-07-31T00:00:00Z Open Access workshop materials mapped to the Educational Resource and Competency Harmonization Project. laderast
Identify the creation of at least 1 new educational module 2019-08-31T00:00:00Z Identify the creation of at least 1 new educational module. laderast


Labs demonstration app regarding various aspects of CD2H (personnel, current projects, etc.)

  1. data2health


Java library supporting integration access for the CD2H Google Drive hierarchy.

  1. data2health
  2. pea


Demonstration web site for the Center for Data to Health and its collaborators

  1. data2health



Title | Due On | Description | Creator
Milestone 1 - Create Git Repo for project null tmdillon


Science of Translational Science platform


Testbed for search prototypes for the People, Expertise and Attribution working group of CD2H


WordPress theme for CD2H website; tickets for the website should be in website repo

  1. wordpress
  2. data2health


CD2H-specific utility library (primarily for Labs, RPPR, etc.)

  1. data2health
  2. pea




Harvesting and integration bridge between CD2H and CLIC


Use the FHIR Resource model to enhance the existing CDMH infrastructure for the collection of Real World Data (RWD) from EHRs for line-level SDTM data submission to FDA

  1. data2health
Title | Due On | Description | Creator
FHIR JSON File null null KenGersing
Integrate Authorization 2019-04-30T00:00:00Z KenGersing
US Core conformant null null KenGersing
Complete FHIR to CDMH-BRIDG mapping null null KenGersing
Create de-identified FHIR Research data warehouse null null KenGersing
Create a HIPAA / Safe Harbor compliant de-identification service null null KenGersing
Implement FHIR Server using HL7 Bulk FHIR to create a research data warehouse. null null KenGersing
Create a FHIR Query Plug In for CDM null null KenGersing
create mapping for each CDM to FHIR null null KenGersing
Deliver Query Management Console for local query management null null KenGersing
Export to FHIR from CDMH-BRIDG null null KenGersing
Setup staging database for results null null KenGersing
ETL results into CDMH-BRIDG null null KenGersing
Result Viewer on CDMH-BRIDG null null KenGersing




Secure cloud-based infrastructure for CTSA hub data sharing #data2health

Title | Due On | Description | Creator
Milestone 1: Convening 2019-04-01T00:00:00Z This milestone includes active engagements from CTSA hubs that are interested in Project Landscape, Analysis and Planning. Kick-off meeting to occur beginning of April. ezampino
Milestone 4: Pilot in the cloud 2019-08-30T00:00:00Z Pilot data sharing in the cloud between multiple health orgs based on Phase 1 use case (i.e., UW, Data QUEST) ezampino
Milestone 3: Pilot Leaf 2019-07-31T00:00:00Z Pilot Leaf across multiple instances of OMOP to explore investigator's ability for self-service, based on the Phase 1 use case ezampino
Milestone 5: NCATS cloud environment 2019-09-30T00:00:00Z Explore / define potential for NCATS cloud environment to meet needs across CTSA consortium for multi org data sharing (scope with Ken and community) ezampino
Milestone 2: Preliminary Pilot Plan 2019-05-28T00:00:00Z Defining the purpose, goals, and objectives; establishing success criteria; outlining the benefits of the pilot; defining scope and duration; writing the pilot work plan (infrastructure of team prep); minimizing risk. ezampino


This project is in the Tool & Cloud Infrastructure

  1. cd2hpm
  2. data2health
Title | Due On | Description | Creator
A) Identify the goals of creating a cloud environment that facilitates collaboration and software/data sharing across all CTSA hubs. null Due Apr 2019 / Status: Completed tmdillon
C) Design and implement a federated authorization and authentication process that will provide a common, reusable access process that will be available to all participating CTSA teams. null Due Sep 2019 Status: In progress tmdillon
C) Coordinate with NCATS on cloud design and access 2019-05-24T00:00:00Z tmdillon
D) Coordinate with Mortality-Prediction and LNP CD2H projects 2019-05-31T00:00:00Z tmdillon
E) NCATS create project image in the cloud 2019-06-06T00:00:00Z tmdillon
F) Create the GitLab container and deploy a GitLab instance 2019-06-10T00:00:00Z tmdillon
B) A joint effort with NCATS and CD2H resources will collect the technical, process and security requirements to design the cloud environment null Due Jul 2019 Status: in progress tmdillon
H) Create Mortality-Prediction container 2019-06-24T00:00:00Z tmdillon
J) Create NLP GitLab repo 2019-07-30T00:00:00Z tmdillon
I) G) Create NLP container 2019-07-31T00:00:00Z tmdillon
K) Create Guidebook/workflow to instruct CD2H team on deploying to the cloud 2019-08-27T00:00:00Z Create the Guidebook/workflow with experience from the M-P and NLP deployment tmdillon
D) Deploy Competitions and Dream Challenge in the NCATS cloud as proof of concept. null Due Sep 2019 Status: In progress tmdillon
E) NCATS technical staff and the application teams for Competitions and Dream Challenge will work as a team to identify the architecture resources required for these applications to be deployed in the cloud. null Due Jul 2019 Status: In progress tmdillon
F) Leverage the experience gained from deploying Competitions and Dream Challenge to create and publish a simple, repeatable process and workflow for future CD2H deployments. Responsibilities for each step will be identified. null Due Dec 2019 tmdillon
G) The NCATS cloud environment and a well-documented deployment process/workflow are available for other CTSA teams to deploy applications, tools, and algorithms. null Due Jan 2020 tmdillon
H) The app store concept and governance and best practices have been identified as processes that must be defined, approved, and documented. null Due: November 2019 e. App Store Update i. Timeline update on the App Store 1. When will the App Store be open? 2. What will be on the App Store? a. Update on the CTSA landscape analysis of tools, software, and algorithms 3. App Store governance a. When will the draft of a written governance plan / process for what programs can live on the App Store be posted? 4. Who is creating the "Amazon rating system" for software, tools, and algorithms? e. CD2H Labs Update Governance and best practices i. When will CD2H Labs governance and best practices be ready? 1. What are the guidelines for posting shared NLP/ML artifacts? 2. When will the draft document of how projects get chosen for the Sandbox or SaaS be completed? 3. When will the draft of the process for how projects get promoted to production be completed? a. For example, if the community creates an artifact, i.e., an algorithm for a phenotype for Alzheimer's, how is the algorithm tested, QA'ed, documented, and posted on the CD2H web site? tmdillon
I) Deploy the NLP solution in the NCATS cloud null Due: Jan 2020 tmdillon


Competitions is an open source tool to run NIH-style peer review of competitions, pilot projects, and research proposals in a cloud-based consortium-wide single sign-on platform. To report a bug or submit a feature request please head over to the code repository at

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
2 - Set up authentication 2019-08-23T00:00:00Z firaswehbe
1 - Set up cloud or local hosting environment 2019-05-31T00:00:00Z firaswehbe
3 - Code Competitions for cloud 2019-09-13T00:00:00Z firaswehbe
5 - User acceptance testing 2019-12-06T00:00:00Z firaswehbe
6 - Evaluation Work Stream 2020-01-17T00:00:00Z firaswehbe
4 - Competitive Analysis 2019-12-06T00:00:00Z lmkw


A simple data model to represent contributions made by agents to research artifacts


This ontology provides contribution roles for use in crediting persons or organizations.

  1. data2health
  2. pea
  3. obofoundry


Contributorship section for the authorship paper

  1. force11
  2. authorship
  3. scholarship
  4. credit
  5. contributor-roles
  6. manubot


COVID-19 DREAM Challenge

Title | Due On | Description | Creator
Challenge Release 2020-05-19T00:00:00Z Challenge announcement: asap tschaffter


COVID-19 literature analytics tools


An OWL implementation of CRediT, a high-level classification of the diverse roles performed in the work leading to a published research output in the sciences. Its purpose is to provide transparency in contributions to scholarly published work, to enable improved systems of attribution, credit, and accountability.

  1. data2health
  2. pea


Translational workforce roles and persona profiles

  1. translational-research
  2. pea
  3. data2health
  4. cd2hpm
Title | Due On | Description | Creator
Identify relevant roles in the CTS landscape 2019-02-26T00:00:00Z Compiling a list of CTS job titles and other stakeholder roles in the CTS landscape. saragon02
Evaluation 2019-09-30T00:00:00Z saragon02
Identify and outline hierarchy of CTS roles null saragon02
Inform Personas from interviews 2019-06-28T00:00:00Z saragon02
Inform Persona profiles from literature 2019-04-30T00:00:00Z Begin to write the Persona 1-pagers based on information learned about the various roles from job descriptions and literature. saragon02
Complete Persona 1-pagers 2019-07-31T00:00:00Z saragon02
Complete Personas guidebook and sample use cases 2019-08-30T00:00:00Z At the end of this process, disseminate the Personas to the CD2H and partner organizations along with the guidebook and sample use cases. saragon02
Choose elements for Persona templates 2019-03-29T00:00:00Z saragon02
Engagement 2019-09-30T00:00:00Z saragon02
Education 2019-09-30T00:00:00Z saragon02




Supporting multi-center research requires combining data created in different data models; this community coordination project aims to provide a data model adaptor for CTSA hubs.

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
Complete FHIR appropriateness evaluation 2019-05-15T00:00:00Z How well do FHIR resources function as a persistent canonical model for CTSA clinical data? cgchute
Complete FHIR appropriateness evaluation 2019-08-31T00:00:00Z Determine how well FHIR resources function as a persistent canonical model for CTSA clinical data. Evaluate open-source commercial options: Bunsen project from Cerner, as well as persistent FHIR data stores offered by Google and Microsoft. Presently, all these options are open source. cgchute
Create or modify one HL7 FHIR resource for CTSA use 2019-08-31T00:00:00Z Build on momentum and relationships in the translator program to extend a FHIR resource for translational research application. cgchute
Demonstration of FHIR to CDM 2019-08-31T00:00:00Z Leverage output of the clinical adapter project to demonstrate transition cgchute
Establish FHIR terminology server for CD2H use 2019-08-01T00:00:00Z Mount a FHIR terminology resource on the NCATS server to accommodate CTSA semantic use-cases. cgchute
Engagement with the Data Community null tricfran


Data Quality Methods and Tools to Support CTSA Hub Data Sharing

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
Milestone 1: Convening 2019-04-01T00:00:00Z This milestone includes active engagements from CTSA hubs that are interested in Project Landscape, Analysis and Planning. Kick-off meeting to occur beginning of April. ezampino
Milestone 3: Tool Refinement 2019-05-14T00:00:00Z UW DQe-c tool is being rewritten in Python in order to support scalability. ezampino
Milestone 4: Tool Pilot 2019-06-15T00:00:00Z Pilot the UW DQe-c tool. Software will be deployed to site. ezampino
Milestone 5: Paper 2019-08-30T00:00:00Z OMOP Paper ezampino
Milestone 6: Quality Control of Tool 2019-08-30T00:00:00Z ezampino
Milestone 2: Preliminary Pilot Plan 2019-05-28T00:00:00Z Defining the purpose, goals, and objectives; establishing success criteria; outlining the benefits of the pilot; defining scope and duration; writing the pilot work plan (infrastructure of team prep); minimizing risk. ezampino


CD2H Tool and Cloud Core Project



A JSP reference implementation for the direct2experts federated search protocol

  1. data2health
  2. pea


The Data Discovery Engine project by the CD2H Data working-group

  1. data2health


DQe-c is a tool for examining completeness in clinical research data repositories.


Re-engineering the DQe-c Data Quality package


EHR DREAM Challenge

  1. data2health
  2. sta
  3. ehr
  4. evaluation-method
  5. cd2hpm
Title | Due On | Description | Creator
Data Aggregation and Data Quality Assessment 2019-02-04T00:00:00Z Collect patient cohorts and use DQe-c to assess completeness of collected EHR data. trberg
Internal Evaluation of Mortality Models 2019-04-08T00:00:00Z Build existent models from the literature and evaluate their performance on UW OMOP EHR data. trberg
Survey CTSAs for Prediction Models 2019-06-03T00:00:00Z Survey the CTSAs to find which sites have mortality prediction models that would be willing to participate. trberg
Build Challenge Site 2019-05-17T00:00:00Z Build the Synapse pilot challenge site with instructions for participating in the challenge. trberg
Build Challenge Infrastructure 2019-05-20T00:00:00Z Build the infrastructure for facilitating the DREAM challenge, using Docker, Synapse, and UW servers. trberg
Open Phase 2019-09-03T00:00:00Z Phase 1 of the prediction challenge. Have a period of time where the parties identified in step 1 submit their models to predict on UW patients. trberg
Leaderboard Phase 2019-10-03T00:00:00Z Begin taking phase 2 submission for prospectively evaluating the model performances, evaluating accuracy between models. This phase will run for 6 months into December 2018. trberg
Packages for DREAM Challenge on EHR data 2020-01-15T00:00:00Z Make scripts and documentation available for the CTSAs. trberg


The Drupal configuration for the CD2H website.


Educational resource and competency harmonization project

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
Perform landscape analysis 2019-09-24T00:00:00Z This landscape analysis will include all translational educational resources relevant to the CTSA program. mellybelly
Perform educational resource discovery requirements analysis 2019-09-24T00:00:00Z This will include discovery and personalized (precision) matching of resources to individuals, as well as their being put together into series or pathways. mellybelly
Perform ontology requirements analysis 2019-09-24T00:00:00Z This will depend on both the landscape analysis and search requirements. mellybelly
create CTSA educational resource discovery strategic plan 2019-09-24T00:00:00Z This will inform community next steps and ensure coordination between different groups. mellybelly


This repository is for the organizers of the EHR DREAM Challenges to share code and documentation.


Conversion of EHR data (such as LOINC) to HPO codes

  1. cd2hpm
  2. data2health
Title | Due On | Description | Creator
Loinc2hpo paper 2019-03-08T00:00:00Z Submission of revised manuscript version; bioRxiv: todo pnrobinson
Loinc2Hpo annotations 2019-03-07T00:00:00Z Complete our first set of loinc2hpo annotations that will accompany the initial publication of the tool. Let us use the artificial threshold of 7000 total annotations. pnrobinson
HPO terms for Loinc project 2019-03-29T00:00:00Z We have added numerous new terms to the HPO for the LOINC project, and this will be an ongoing project. Currently, there are 178 closed and 17 open New Term Requests for LOINC-related terms on the HPO tracker. We will consider this milestone achieved if all 17 open NTRs have been processed. pnrobinson
Add LOINC data to HPO website 2019-05-31T00:00:00Z We would like to add the LOINC biocuration data to the HPO website. pnrobinson
Implement the SMART on FHIR app at a CD2H site 2019-06-14T00:00:00Z We would like to use LOINC2HPO at several CD2H sites. Currently, the app has been used in an Epic sandbox at OHSU. We would like to try the app in a "real-life" situation. The deliverable will be a report on what percentage of LOINC terms were covered by the analysis, new biocuration to cover gaps, and an analysis of common reasons for mapping failures. pnrobinson
Planning for CD2H-wide implementation of Ehr2Hpo project 2019-04-30T00:00:00Z pnrobinson
Data mining algorithm 2019-09-30T00:00:00Z A phenotype-driven approach opens up entirely new ways of mining EHR data for correlations that might be important in understanding disease pathophysiology, gender or age differences, and biomarkers. It is important to develop clever ways of analyzing the data. We expect that many phenotype abnormalities might be highly correlated in all disease states, and thus identifying such an "obvious" correlation would not be an interesting result. For instance, Abnormal hematocrit and Abnormal hemoglobin level are expected to be highly correlated. Here, we propose adapting the approach taken to characterize synergy networks in expression data [2], which was developed to find gene-gene interactions that are specifically associated with a phenotype (such as a particular cancer). The method is based on an information theoretic analysis of multivariate synergy that decomposes sets of genes into submodules, each of which contains synergistically interacting genes [3]. The method can be extended to phenotypes to search for pairs of markers (HPO terms) that show mutual information conditional upon the presence of a specific diagnosis (e.g., an ICD9 code, or possibly an eMERGE classification). The result would be a data-driven way of defining pairs of features that show a surprising correlation in the presence of a disease; this might lead to the discovery of potential biomarkers (in this case, if one finds some HPO term in a person with some disease, then "synergy" would suggest the other HPO term of the pair would be more likely to be present than expected by chance). We also believe this might be a good opportunity to engage CTSA hubs in data exploration or the use of this approach/resulting derived data for DREAM challenges. References: [1] Son JH, Xie G, Yuan C, Ena L, Li Z, Goldstein A, et al. Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. Am J Hum Genet. 2018;103:58-73. [2] Watkinson J, Wang X, Zheng T, Anastassiou D. Identification of gene interactions associated with disease from gene expression data using synergy networks. BMC Syst Biol. 2008;2:10. [3] Anastassiou D. Computational analysis of the synergy among multiple interacting genes. Mol Syst Biol. 2007;3:83. The deliverable in 6 months will be an implemented and tested algorithm in Java that can be easily integrated into the existing LOINC2HPO code. pnrobinson
Radiology 2019-09-02T00:00:00Z Plans for extending HPO2LOINC to the domain of radiology pnrobinson
Mutual information content and synergy network algorithm null Implement and test mutual information content and synergy network algorithm as described in implementation plan. pnrobinson
Cross-site clustering algorithm null pnrobinson
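
The "Data mining algorithm" milestone above describes scoring pairs of HPO terms by how much information they carry jointly about a diagnosis. A minimal sketch of that idea follows, assuming the bivariate synergy definition Syn(A,B;D) = I(A,B;D) - [I(A;D) + I(B;D)] and simple plug-in entropy estimates. The function names and the toy data are hypothetical and are not taken from the LOINC2HPO code; the sketch is written in Python for brevity, whereas the milestone itself targets Java.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable observations."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def synergy(term_a, term_b, diagnosis):
    """Bivariate synergy: I(A,B;D) - [I(A;D) + I(B;D)].

    Positive values mean the pair is more informative about the diagnosis
    than the two terms considered separately; negative values mean the
    terms are redundant with each other.
    """
    joint = list(zip(term_a, term_b))
    return (mutual_information(joint, diagnosis)
            - mutual_information(term_a, diagnosis)
            - mutual_information(term_b, diagnosis))


# Hypothetical toy cohort: the diagnosis is present only when exactly one
# of the two phenotype terms is present, so neither term alone is informative.
term_a = [0, 0, 1, 1]
term_b = [0, 1, 0, 1]
diagnosis = [0, 1, 1, 0]
xor_synergy = synergy(term_a, term_b, diagnosis)

# Two identical terms that each fully determine the diagnosis are redundant.
redundant_synergy = synergy(diagnosis, diagnosis, diagnosis)
```

In the toy cohort the pair scores positive synergy while the redundant pair scores negative, which is the distinction the milestone relies on to filter out "obvious" correlations such as abnormal hematocrit and abnormal hemoglobin level. Real EHR data would need far larger samples and a correction for estimation bias.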


Engagement effectiveness icon library


A JSP tag library supporting the Direct2Experts API.

  1. data2health
  2. pea


Application scaffold for GeoNames data

  1. data2health
  2. pea


JSP Tag library providing access to a local cache of GeoNames data

  1. data2health
  2. pea


JSP application framework for navigating a local copy of targeted GitHub metadata.


JSP tag library providing access to a local repository of GitHub metadata


CTSA Data Sharing Governance Pathways Project

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
Map Examples to Matrix (Table/Grid) 2019-09-01T00:00:00Z We will collect/define use cases, perform a landscape/gap analysis, and define requirements for a pathways decision process guide. mellybelly
CTSA Master Agreement for Research 2019-11-01T00:00:00Z Parking Lot. Drafted in collaboration with CTSA hubs, this master data sharing templating system will provide hubs a general agreement (i.e., DUA functionality) for general permissions and refinement using an ontology-based templating system. mellybelly
DUA Generator Tool 2020-06-30T00:00:00Z Application - "how to decide you need a DUA and how to execute one." Complements the DUA templating system and provides institutional guidance. mellybelly
Build Matrix | Stakeholders, Dimensions & Components 2019-07-01T00:00:00Z Create matrix that takes into account stakeholders, dimensions of data sharing and the vapors components. mellybelly
Data Use Agreement Guideline 2019-12-31T00:00:00Z Guidelines for CTSA data sharing kstephen0909
Manuscript | Grid and Mapping 2019-12-01T00:00:00Z ezampino
Survey CTSA Community 2019-10-01T00:00:00Z Survey community, identify gaps... and survey again ezampino
Draft of Paper on Data Sharing Principles 2019-12-31T00:00:00Z ezampino
Draft of paper for grid and mapping 2019-12-31T00:00:00Z ezampino


JSP Tag library providing a graph abstraction in support of D3 visualization


Application scaffold for GRID data


JSP Tag library providing access to a local cache of GRID data in RDF


Overarching repo for hot-fhir projects.

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
Establish FHIR terminology server for CD2H use null tricfran



Title | Due On | Description | Creator
Announce competition and collect applications 2019-02-14T00:00:00Z jmcmurry
Appoint reviewers and assign proposals 2019-03-01T00:00:00Z jmcmurry
convene study section and announce semi-finalists 2019-03-15T00:00:00Z jmcmurry
Announce 6 semi-finalists 2019-03-15T00:00:00Z jguinney
final proposals due 2019-04-05T00:00:00Z jguinney
3 finalists selected for presentations at CD2H Show & Tell 2019-04-19T00:00:00Z jguinney
Finalist presentations at CD2H show & tell 2019-05-03T00:00:00Z jguinney
winner announced; subcontract work initiated 2019-05-09T00:00:00Z jguinney


Java Server Page web application supporting entity extraction and navigation




CD2H Informatics Maturity and Best Practices Core

  1. data2health
  2. cd2h-core


An internal CD2H project supporting identification and interconnection of CD2H and CTSA information sources and flows.


InvenioRDM: an interdisciplinary open research repository

  1. data2health
  2. python
  3. invenio
  4. inveniosoftware
  5. data-indexing
  6. research-repo
  7. cd2hpm
Title | Due On | Description | Creator
Phase 2 Development 2019-08-31T00:00:00Z carsonicator
Engagement 2019-08-31T00:00:00Z carsonicator
Evaluation 2019-08-31T00:00:00Z carsonicator
Phase 3 Development null fenekku


A JSP tag library supporting access to various JSON constructs as tags. This is currently targeted at the V4 GitHub API.


Open Source Clinical Enterprise Data Warehouse (EDW) Data Browser (Leaf)

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
Milestone 1: Convene pilot interest 2019-04-01T00:00:00Z Recruit pilot sites. ezampino
Milestone 3: Engaged Multi-Instance Pilot 2019-12-31T00:00:00Z Engaged = pilot site contributes to code; features, enhancements, etc. Pilot Leaf with multi instances at a hub, but no data sharing with UW. Pilot to conclude 12/31. ezampino
Milestone 6: Manuscript Submission 2019-05-15T00:00:00Z Submit to JAMIA. ezampino
Milestone 5: Single Instance Pilot 2019-12-31T00:00:00Z Pilot Leaf as a single instance to a hub. Success based upon: pilot site loading their data; making sure it fits with their authentication structure; recommendations / planning stage to make request to go live at their institution. ezampino
Milestone 4: Release of Leaf Code 1.0 2019-05-15T00:00:00Z Code will be released to CTSA hubs after conclusion of first pilot. ezampino
Milestone 7: Evaluation of Pilot(s) Paper 2020-02-02T00:00:00Z A paper of lessons learned, and takeaways after pilots have concluded. ezampino
Milestone 8: Pilot Plan/Framework 2019-06-30T00:00:00Z Framework/ straw man created for pilot plan. ezampino


CD2H Tool and Cloud Core Project


Research Informatics and open science maturity model

  1. data2health
  2. cd2hpm
Title | Due On | Description | Creator
Main Adoption Model: Vignettes 2019-08-01T00:00:00Z Scoring and benchmarking information gathered from interviews. All face to face interviews will be completed. Evaluation: 1) List of institutions, interviewees, and interview dates will be posted 2) Transcripts, notes, and / or distillations from the interviews will be available davedorr9
Present to Data group 2019-03-01T00:00:00Z Wilcox davedorr9
Main Adoption Model: Adoption model tool 2019-09-15T00:00:00Z Adoption model tool as document describing model with descriptions of levels and distributions davedorr9
Main Adoption Model: Distribution 2019-08-31T00:00:00Z davedorr9
Software Assessment tool: requirements and validation 2019-09-30T00:00:00Z ezampino
Expanded Maturity Models: Existing 2019-08-30T00:00:00Z Topics prioritized, grouped and mapped to components of existing models where available ezampino
Software Assessment Tool: Development Plan 2019-10-31T00:00:00Z First draft completed on 10/24/19 - Qualtrics form recreated in Google Forms, linked here: ezampino
Software Assessment Tool: Version 1 of software 2019-12-01T00:00:00Z Software Assessment Tool: Version 1 of software developed and ready for pilot testing ezampino
Expanded Maturity Models: Community Engagement 2019-09-30T00:00:00Z Community responses used to complete top and bottom levels of maturity for prioritized topics ezampino
Software Assessment Tool V1.1.- Incorporate Community Feedback 2020-02-28T00:00:00Z Update the tool based on feedback from the 1/16, iCore community meeting. ramussa
Develop NLP model w/ community 2020-09-01T00:00:00Z ezampino
Develop Data Analysis model w/ Community 2020-09-01T00:00:00Z ezampino
Add 2-3 new models 2020-09-01T00:00:00Z ezampino
Develop Data Sharing model w/ Community 2020-09-01T00:00:00Z ezampino
Community Meeting 6.18.2020 2020-06-17T00:00:00Z Run a successful community meeting to get started on developing new maturity models. Interested parties sign up and work in break out room on a document related to the model they are interested in. ezampino
Community Meeting 7.16.2020 2020-07-15T00:00:00Z ezampino


JSP tag library implementing access to a local copy of the MEDLINE database.


CD2H Metadata Workshop 2019


Managing Translational Informatics Projects Tutorial

  1. data2health
Title | Due On | Description | Creator
Milestone 1 2019-03-26T00:00:00Z Testing pg05
Complete MTIP class 2019-03-25T00:00:00Z LisaOKeefe1
Testing milestone 2020-05-01T00:00:00Z readkev
Milepebble 2019-03-25T00:00:00Z theresenelson
Milerock 2019-03-25T00:00:00Z theresenelson
Mileboulder 2019-03-29T00:00:00Z lindsmith


CD2H Next Generation Data Sharing and Analytics Core

  1. data2health
  2. cd2h-core


Phase I project related to systematic review of the Natural Language Processing field

Title | Due On | Description | Creator
* 01. Develop Research Question 2018-07-01T00:00:00Z Developing the research question is the foundation of any high quality research. Ideally an iterative process between the literature, research staff, and relevant stakeholders, the goal is to scope a review question such that it is answerable, feasible and relevant. RoseRelevo
02. Consult Experts 2018-10-01T00:00:00Z Consulting with experts at the research question development stage can provide vital context and insight. Think broadly about experts: patients, allied health personnel, economists, statisticians, etc. may all be vital to understanding the question and to ensuring that the review is relevant. RoseRelevo
03. Development of Search Strategies 2018-08-30T00:00:00Z Developing a search strategy is the foundation of a quality review. The search needs to balance recall and precision. In an attempt to reduce bias the search should be broad and include multiple sources; however, resource limitations make precision an important component in the development of the search strategy. RoseRelevo
04. Bibliographic Management 2018-08-30T00:00:00Z Put effort in here! Properly setting up your bibliographic management system early will save much time later. RoseRelevo
05. Protocol - write and register 2019-02-13T00:00:00Z Eligible reviews should be registered with PROSPERO, the International Prospective Register of Systematic Reviews. RoseRelevo
06. Title and Abstract Review 2018-12-01T00:00:00Z Reviewers look at the titles and abstracts of all citations and make any exclusions. Citations should be categorized as Exclude; Proceed to Full Text Review; or Background. RoseRelevo
* 07. Full Text Review 2019-02-01T00:00:00Z This is the heart of the reviews. Reading the full text of the identified studies. Humans still do it best, conversations help. RoseRelevo
* 08. Data Abstraction 2019-03-19T00:00:00Z This is the methods abstraction, inclusion/exclusion criteria. RoseRelevo
09. Synthesis 2019-04-19T00:00:00Z This is the standard synthesis used for systematic reviews and is the summarization of the patterns. Includes narrative and quantitative. RoseRelevo
10. Manuscript Preparation null All methods, results, tables and preliminary figures are drafted. RoseRelevo
RR Meta-Milestones null This is to keep track of the things that I need to do within PM. This is a rolling list and refers to the management of repository as a template for all SRs rather than to this individual review. RoseRelevo


Cloud-based sandbox for text analytics

Title | Due On | Description | Creator
Task 1: Review two NLP methods 2020-03-26T00:00:00Z Review two NLP methods listed in #1. - Review methods (technology used, language, input/output format) - Check that each method is provided as a Docker image required for model-to-data benchmarking tschaffter
Task 2: Develop continuous benchmarking for NLP de-identification 2020-06-16T00:00:00Z tschaffter
NLP DREAM Challenge (Launch) 2020-10-01T00:00:00Z tschaffter
August 2020 Sprint 2020-08-31T00:00:00Z tschaffter
September 2020 Sprint 2020-09-30T00:00:00Z tschaffter


An internal project aimed at developing single source of truth workflows for program management.

Title Due On Description Creator
CD2H project dashboard 2019-06-15T00:00:00Z Improved dashboard overview of all past and current CD2H projects, with their milestones and milestone progress indicated. The dashboard will allow navigation to more detailed project overview pages and to progress on deliverables (e.g., Gantt charts) where appropriate. mellybelly
Project information feed to CD2H website 2019-07-01T00:00:00Z This deliverable/milestone will provide an automatically updated information feed to the CD2H website that gives non-technical users a big-picture overview of what CD2H is working on. mellybelly



Title Due On Description Creator
All 18 Phase II projects launched in GitHub 2019-03-15T00:00:00Z jmcmurry


JSP support tag library for ORCiD


Initial repository for standard web template


Web service for the NLP de-id method Philter developed by UCSF


Open source clinical text de-identification


template for project repositories

  1. data2health
Title Due On Description Creator
Milestone 1 example 2019-03-01T00:00:00Z Description of the milestone 1 jmcmurry
Deliverable: Report to xyz 2019-04-01T00:00:00Z Example milestone which is a deliverable jmcmurry
Milestone: partial 2019-02-23T00:00:00Z 75% complete mellybelly

A new system for managing OBO PURLs


CD2H project: Reusable Data best practice portal

  1. data2health
  2. reusable-datasets
  3. metadata
  4. schema
  5. schema-org
  6. cd2hpm
Title Due On Description Creator
Best Practice Guidebook - Identify Re-usability Practices to Address in V1.0 of the guidebook 2019-03-22T00:00:00Z rchampieux
Best Practice Guidebook - Create Guidebook Repo and Wiki Framework 2019-04-19T00:00:00Z rchampieux
Best Practice Guidebook - Complete draft of initial documentation for guidebook 2019-06-03T00:00:00Z rchampieux
Best Practice Guidebook - Publish V1.0 of Guidebook and Call for Community Contributions 2019-06-28T00:00:00Z rchampieux
Best Practice Widgets - Hosting 2019-06-30T00:00:00Z Develop a best-practice web widget for data hosting. newgene
Best Practice Widgets - Licensing 2019-06-30T00:00:00Z Develop a best-practice web widget for picking a data license. newgene
Best Practice Widgets - Metadata 2019-06-30T00:00:00Z Develop a best-practice web widget for authoring dataset metadata. newgene


This is a list of repositories, repository frameworks, and data catalogs. It focuses on technical architecture, how metadata is handled and what standards are used, and what next-generation repository features (if any) are implemented.

  1. repository-tools
  2. metadata
  3. data-catalog
  4. data2health


This ontology provides scholarly research outputs, for use in crediting persons or organizations.

  1. data2health
  2. pea


CD2H Resource Discovery Core

  1. cd2h-core
  2. data2health
Title Due On Description Creator
FAIR Roadmap/Vision null A roadmap for the core related to FAIR. The FAIR principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. kristiholmes


A collaborative guidebook for reusable data best practices

  1. data2health
  2. best-practices
  3. data-sharing
  4. reusable-datasets
Title Due On Description Creator
YR 2 Guidebook 2019-06-30T00:00:00Z Foundational work on the guidebook kristiholmes
YR 3 Guidebook 2020-06-30T00:00:00Z Putting the guidebook into practice and creating a community resource kristiholmes
Aug F2F Draft content 2019-08-01T00:00:00Z At the F2F we will review draft content for all chapters. mellybelly
Sept CTSA Program meeting release 2019-09-24T00:00:00Z We should have complete content for all chapters in time for the fall CTSA program meeting. mellybelly
Administrative Core Guidebook Chapters 2019-12-10T00:00:00Z Guidebook chapters to be authored by the Admin Core, per the 3.30.19 F2F. dietzr
Resource Discovery Core Guidebook Chapters 2019-12-10T00:00:00Z dietzr
Informatics Maturity & Best Practices Core Guidebook Chapters 2019-12-10T00:00:00Z dietzr
Next Generation Data Sharing Core Guidebook Chapters 2019-12-10T00:00:00Z dietzr
Tool & Cloud Infrastructure Core Guidebook Chapters 2019-12-10T00:00:00Z dietzr


Start here to explore the projects and repositories of the Center for Data to Health

  1. data2health


Hosting CD2H data schemas in standard

  1. data2health
  2. schema-org
  3. schema


Science of translational science research platform

  1. data2health
  2. pea
  3. cd2hpm
Title Due On Description Creator
User interface for data exploration beta 2019-06-01T00:00:00Z A beta user interface for exploration is implemented. mellybelly
Widgets for CTSA hub websites 2019-08-01T00:00:00Z These widgets can be implemented in CTSA hub sites for exploring local and national data, using CD2H data stores. mellybelly
User interface for data exploration release 2019-09-20T00:00:00Z Official release of CD2H search application. mellybelly
Scholar tracking demonstrator 2019-05-01T00:00:00Z mellybelly
Deploy disambiguation environment 2019-08-01T00:00:00Z mellybelly
Preliminary warehouse configuration on CD2H Labs 2019-04-15T00:00:00Z eichmann
Preliminary warehouse connection points deployed 2019-07-01T00:00:00Z eichmann
Version 1.0 of warehouse deployed 2019-07-01T00:00:00Z eichmann
Landscape analysis of scholar reporting best practices 2019-06-01T00:00:00Z eichmann
Scholar tracking guidebook, dashboard, use cases 2019-09-20T00:00:00Z eichmann
Initial connectivity between 4DM & faceted search 2019-03-01T00:00:00Z eichmann
Integration of CDEK data and warehouse 2019-08-01T00:00:00Z eichmann
Expansion to full semantic faceted search in 4DM 2019-09-01T00:00:00Z eichmann


A small utility web application used to capture CTSA hub service descriptions and align them with one or more CD2H service taxonomies.

  1. data2health
  2. pea


SPARC in the Cloud for CTSA Hubs


A JSP tag library providing functionality roughly equivalent to the JSTL SQL tag set, just for a triple store.

  1. data2health
  2. pea


Prototype taxonomy definition and mapping platform


Application for discovery and sharing of software resources across a community

Title Due On Description Creator
November 2019 2019-12-01T00:00:00Z tschaffter
December 2019 2019-12-31T00:00:00Z tschaffter
Tool Registry Prototype 2020-07-16T00:00:00Z tschaffter


CD2H Tools and Cloud Infrastructure Core

  1. data2health
  2. cd2h-core


Web app supporting curation of YouTube videos for inclusion in the CTSAsearch index.

  1. data2health
  2. pea




VIVO-like application using a pure connection to a VIVOISF-compliant triplestore.


JSP Tag library providing access to a local cache of VIVOISF-compliant data

Website source code for


Issues related to the CD2H website (

  1. data2health
Title Due On Description Creator
Stage instance deployed 2020-01-24T00:00:00Z A development instance is operational for configuration and experimentation. eichmann
Production instance deployed null New website is live. eichmann
User interface design completed null This includes configuration on the staging server eichmann
Relevant legacy information migrated to new platform null eichmann
Legacy system decommissioned null eichmann
Policies and procedures documented null eichmann