monarch-initiative/sleep-apnea-clustering

Name: sleep-apnea-clustering

Owner: Monarch Initiative

Description: R library and scripts to process sleep data for clustering

Created: 2017-01-13 19:09:25.0

Updated: 2017-09-10 03:28:02.0

Pushed: 2017-10-10 00:27:20.0

Homepage:

Size: 112

Language: R

README

sleep-apnea-clustering

This repository aims to be a pipeline for processing the Sleep Heart Health Study (SHHS) dataset for cluster analysis. SHHS is available for download through the National Sleep Research Resource (NSRR). The R scripts and library functions here work with the CSV files downloaded from the NSRR that are listed in shhs-process.R. (Users must apply through the NSRR for access to those files.)

Currently, these scripts are intended to be run in the order listed below. We are working on combining everything into one package, but to replicate the analysis done in my thesis, it is best to run the scripts in sequence.

1. shhs-process

Loads the data into the R workspace. The key files are a data CSV and a datadict CSV (the dictionary of all the variables). Note: you need to adjust the file paths to point to the directories where the files are located.
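The loading step can be sketched as follows. This is a minimal illustration, not the code in shhs-process.R: the helper name `load_shhs`, the file names, and the column names are all hypothetical, and the tiny stand-in files exist only so the example runs without the NSRR downloads.

```r
# Hypothetical helper: read the data CSV and its data dictionary CSV.
load_shhs <- function(data_path, dict_path) {
  # Fail early with a clear error if either path is wrong.
  stopifnot(file.exists(data_path), file.exists(dict_path))
  list(
    data     = read.csv(data_path, stringsAsFactors = FALSE),
    datadict = read.csv(dict_path, stringsAsFactors = FALSE)
  )
}

# Stand-in files for demonstration only (assumed names and columns).
data_dir  <- tempdir()
data_file <- file.path(data_dir, "example-data.csv")
dict_file <- file.path(data_dir, "example-datadict.csv")
write.csv(data.frame(id = 1:3, bmi = c(24.1, 31.5, 27.8)),
          data_file, row.names = FALSE)
write.csv(data.frame(id = c("id", "bmi"),
                     display_name = c("Participant ID", "Body mass index")),
          dict_file, row.names = FALSE)

shhs <- load_shhs(data_file, dict_file)
str(shhs$data)
```

To use real data, point `data_file` and `dict_file` at the directory holding your own NSRR downloads.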

2. workingSHHS1
3. factoranal

At this point, one would go back and select only the columns they want in df (that is, the important factors).
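A minimal sketch of that column selection, assuming `df` is a plain data frame. The column names here are invented for illustration and are not the SHHS variable names.

```r
# Toy data frame standing in for df (column names are hypothetical).
df <- data.frame(
  id  = 1:4,
  bmi = c(24.1, 31.5, 27.8, 29.0),
  ahi = c(3.2, 18.4, 9.7, 25.1),
  age = c(45, 62, 58, 71)
)

# Columns judged important after inspecting the factor analysis output.
keep_cols <- c("bmi", "ahi")

# Guard against typos in the column list before subsetting.
missing_cols <- setdiff(keep_cols, names(df))
if (length(missing_cols) > 0) {
  stop("Columns not found in df: ", paste(missing_cols, collapse = ", "))
}

# drop = FALSE keeps the result a data frame even if one column is selected.
df_selected <- df[, keep_cols, drop = FALSE]
```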

3a. parallelpretty

Function for plotting scree plot/parallel analysis in a nice, publication-worthy way.

Code taken from https://sakaluk.wordpress.com/2016/05/26/11-make-it-pretty-scree-plots-and-parallel-analysis-using-psych-and-ggplot2/

Sakaluk, J. K., & Short, S. D. (2016). A Methodological Review of Exploratory Factor Analysis in Sexuality Research: Used Practices, Best Practices, and Data Analysis Resources. Journal of Sex Research.
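For readers unfamiliar with parallel analysis, the idea behind the plot can be sketched in base R. This is not the code from the blog post above and does not use psych or ggplot2: it is a simplified, PCA-based version run on simulated toy data, and every name in it is an assumption made for illustration.

```r
set.seed(1)
n <- 200  # observations
p <- 6    # variables

# Toy data: two latent factors with three noisy indicators each.
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(
  f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5),
  f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5)
)

# Eigenvalues of the observed correlation matrix (the scree values).
observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random normal data of the same shape, averaged over runs.
n_sims <- 100
simulated <- replicate(n_sims, {
  noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
})
sim_mean <- rowMeans(simulated)

# Retain leading components whose eigenvalue exceeds the random baseline,
# stopping at the first component that does not.
n_retain <- sum(cumprod(observed > sim_mean))

print(data.frame(component = seq_len(p), observed, sim_mean))
```

The two columns printed at the end are the two lines drawn on a parallel analysis plot: the observed scree curve and the random-data baseline.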

4. calcGowerMat

Calculates the Gower distance matrix for hierarchical clustering (hc).
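As a hedged illustration of this step, the sketch below computes a Gower dissimilarity on a small mixed-type data frame and feeds it to hierarchical clustering. It assumes the `cluster` package (a recommended package shipped with most R installations) and uses invented data; it is not the code in calcGowerMat, and the linkage method and number of clusters are arbitrary choices.

```r
library(cluster)

# Toy mixed-type data: one numeric and two categorical variables (invented).
df <- data.frame(
  age    = c(45, 62, 58, 39, 71, 50),
  sex    = factor(c("f", "m", "m", "f", "m", "f")),
  snores = factor(c("yes", "yes", "no", "no", "yes", "no"))
)

# Gower dissimilarity handles numeric and categorical columns together;
# values are scaled to lie between 0 and 1.
gower_dist <- daisy(df, metric = "gower")

# daisy() returns an object that inherits from "dist", so it can be passed
# straight to hclust().
hc <- hclust(gower_dist, method = "average")

# Cut the tree into an illustrative number of clusters.
clusters <- cutree(hc, k = 2)
print(table(clusters))
```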

5. plotinternalmetric

The ideas and concepts for comparing clustering solutions came from: A consensus framework for clustering microarray data. Ted Laderas and Shannon McWeeney. OMICS: A Journal of Integrative Biology. 2007. 116-128.

At this point, pick a particular clustering method and number of clusters.
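One common internal metric for making that choice is the average silhouette width. The sketch below compares it across several cluster counts on simulated data. It is an assumption-laden illustration rather than the code in plotinternalmetric: it uses the `cluster` package, Euclidean distance on toy data instead of the Gower matrix, and a single linkage method.

```r
library(cluster)

set.seed(42)
# Toy data: two well-separated groups of 30 points in two dimensions.
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2)
)

d  <- dist(x)
hc <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters.
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})
names(avg_sil) <- ks

# Higher average silhouette width indicates better-separated clusters.
best_k <- ks[which.max(avg_sil)]
print(round(avg_sil, 3))
```

The same loop can be repeated over different clustering methods to compare solutions side by side before settling on one.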


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.