Name: sleep-apnea-clustering
Owner: Monarch Initiative
Description: R library and scripts to process sleep data for clustering
Created: 2017-01-13 19:09:25.0
Updated: 2017-09-10 03:28:02.0
Pushed: 2017-10-10 00:27:20.0
Size: 112
Language: R
This repository aims to be a pipeline for processing the Sleep Heart Health Study (SHHS) dataset for cluster analysis. SHHS is available for download through the National Sleep Research Resource (NSRR). The R scripts and library functions here work with the CSV files downloaded from the NSRR, which are listed in shhs-process.R. (Users must apply to the NSRR for access before they can download those files.)
Currently, these scripts are intended to be run in a fixed order. We are working on combining everything into one package, but to replicate the analysis done in my thesis, it is best to run the scripts one at a time.
Loads the data into the R workspace. The key files are a data CSV and a datadict CSV (the dictionary of all the variables). Note: you need to adjust the file paths to point to the directories where the files are located.
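As a rough illustration of this loading step, the sketch below reads a data CSV and its data dictionary CSV into one list. The helper name load_shhs and the placeholder paths are assumptions made for this example; they are not taken from the repository's scripts.

```r
# Hypothetical helper (not from the repository): read the data CSV and
# the data dictionary CSV, failing early if either path is wrong.
load_shhs <- function(data_path, dict_path) {
  stopifnot(file.exists(data_path), file.exists(dict_path))
  df <- read.csv(data_path, stringsAsFactors = FALSE)
  datadict <- read.csv(dict_path, stringsAsFactors = FALSE)
  list(df = df, datadict = datadict)
}

# Example call; both paths are placeholders to adjust for your machine.
# shhs <- load_shhs("path/to/data.csv", "path/to/datadict.csv")
# df <- shhs$df
# datadict <- shhs$datadict
```

Checking the paths before reading gives a clearer error than a failed read.csv call when a file pointer has not been adjusted yet.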
At this point, go back and select only the columns you want to keep in df (that is, the important factors).
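A minimal sketch of that column selection follows. The toy data frame and the column names (age, bmi, notes) are invented for illustration; with the real data, keep_cols would hold the variable names chosen from the data dictionary.

```r
# Toy stand-in for the loaded data frame; all column names are made up.
df <- data.frame(
  id    = 1:4,
  age   = c(55, 61, 48, 70),
  bmi   = c(27.1, 31.4, 24.9, 29.0),
  notes = c("a", "b", "c", "d")
)

# Columns to keep (assumed names, chosen only for this example).
keep_cols <- c("age", "bmi")

# Stop with a clear message if a requested column is not in df.
missing_cols <- setdiff(keep_cols, names(df))
if (length(missing_cols) > 0) {
  stop("Columns not found in df: ", paste(missing_cols, collapse = ", "))
}

# drop = FALSE keeps a data frame even when only one column is selected.
df_sel <- df[, keep_cols, drop = FALSE]
```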
Function for plotting a scree plot with parallel analysis in a publication-ready style.
Code taken from https://sakaluk.wordpress.com/2016/05/26/11-make-it-pretty-scree-plots-and-parallel-analysis-using-psych-and-ggplot2/
Sakaluk, J. K., & Short, S. D. (2016). A Methodological Review of Exploratory Factor Analysis in Sexuality Research: Used Practices, Best Practices, and Data Analysis Resources. Journal of Sex Research.
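To show the idea behind a scree plot with parallel analysis, here is a simplified base R sketch. It is not the psych and ggplot2 code referenced above: it compares eigenvalues of the observed correlation matrix with the mean eigenvalues from random normal data of the same size. The toy data, the number of simulations, and the function name plot_scree are all assumptions for this example.

```r
set.seed(1)

# Toy numeric data (hypothetical): 200 observations of 6 variables,
# with the first three variables made to correlate.
n <- 200
p <- 6
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
x[, 2] <- x[, 1] + rnorm(n, sd = 0.5)
x[, 3] <- x[, 1] + rnorm(n, sd = 0.5)

# Eigenvalues of the observed correlation matrix.
observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Parallel analysis: mean eigenvalues over random data of the same shape.
n_sim <- 100
sim <- replicate(n_sim, {
  r <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(r), symmetric = TRUE, only.values = TRUE)$values
})
simulated <- rowMeans(sim)

# Retain the leading components whose observed eigenvalue exceeds
# the simulated one.
n_retain <- sum(cumprod(observed > simulated))

# Basic scree plot with the simulated eigenvalues overlaid.
plot_scree <- function(observed, simulated) {
  plot(observed, type = "b", pch = 19,
       xlab = "Component number", ylab = "Eigenvalue",
       ylim = range(c(observed, simulated)))
  lines(simulated, type = "b", pch = 1, lty = 2)
  legend("topright", legend = c("Observed", "Simulated (mean)"),
         pch = c(19, 1), lty = c(1, 2))
}

# plot_scree(observed, simulated)
```

Components to the left of the point where the two lines cross are the ones the parallel analysis suggests keeping.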
Calculates the Gower distance matrix used for hierarchical clustering (hc).
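One common way to compute a Gower distance in R is daisy() from the cluster package, which handles mixed numeric and categorical columns. Whether the repository uses daisy() is an assumption here, as are the toy data and the average linkage passed to hclust().

```r
library(cluster)  # provides daisy()

# Toy mixed-type data (made-up values): two numeric columns, one factor.
df <- data.frame(
  age    = c(55, 61, 48, 70, 66, 52),
  bmi    = c(27.1, 31.4, 24.9, 29.0, 33.2, 23.5),
  snores = factor(c("yes", "yes", "no", "yes", "yes", "no"))
)

# Gower dissimilarity: numeric columns are range-scaled, factors
# contribute a simple match/mismatch, and the result lies in [0, 1].
gower_dist <- daisy(df, metric = "gower")

# Hierarchical clustering on the Gower distances.
# "average" linkage is an assumption for this sketch.
hc <- hclust(as.dist(gower_dist), method = "average")
```

Gower distance is a reasonable fit for survey-style data like this because it does not require converting categorical variables to numbers first.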
The ideas and concepts for comparing clustering solutions came from: A consensus framework for clustering microarray data. Ted Laderas and Shannon McWeeney. OMICS: A Journal of Integrative Biology. 2007. 116-128.
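As a simple illustration of comparing two clustering solutions, the sketch below cross-tabulates the labels and computes a plain Rand index in base R. This is a generic agreement measure, not the consensus framework from the cited paper; the function name rand_index and the toy label vectors are assumptions for this example.

```r
# Rand index: the share of observation pairs on which two partitions
# agree (both place the pair together, or both place it apart).
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  agree <- same_a == same_b
  mean(agree[upper.tri(agree)])
}

# Two toy cluster assignments for the same six observations.
solution_1 <- c(1, 1, 1, 2, 2, 2)
solution_2 <- c(2, 2, 1, 1, 1, 1)

# Cross-tabulation shows how clusters in one solution map to the other.
print(table(solution_1, solution_2))

ri <- rand_index(solution_1, solution_2)
```

A value of 1 means the two solutions group the observations identically, regardless of how the clusters are numbered.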
At this point, pick a particular clustering method and number of clusters.
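Once a method and a number of clusters are chosen, the cluster labels can be extracted with cutree(). The sketch below assumes hierarchical clustering with average linkage and k = 2 on toy data; these choices are placeholders, not the ones used in the original analysis.

```r
set.seed(42)

# Toy data (hypothetical): two well-separated groups of 10 points each.
x <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 5), ncol = 2)
)

# Chosen method: hierarchical clustering with average linkage (assumed).
hc <- hclust(dist(x), method = "average")

# Chosen number of clusters (assumed).
k <- 2
clusters <- cutree(hc, k = k)

# Cluster sizes for a quick sanity check.
print(table(clusters))
```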