CenterForOpenScience/rpp

Name: rpp

Owner: Center for Open Science

Description: Repository for working on the Reproducibility project

Created: 2015-03-19 15:44:38.0

Updated: 2017-12-10 23:16:45.0

Pushed: 2016-07-27 10:31:39.0

Homepage: null

Size: 67031

Language: PostScript

GitHub Committers

UserMost Recent Commit# Commits

Other Committers

UserEmailMost Recent Commit# Commits

README

Reproducibility Project: Psychology

Repository for working on the Reproducibility Project: Psychology. These files represent the analyses conducted in Tilburg and reported in the Science publication, as well as the code used to generate Figures 1-3. The Github repository also operates as the issue tracker for the analysis committee.

The data are contained in this repository and were retrieved from rpp_data.csv. The analysis script includes a function to download the .csv file directly from the Open Science Framework project page. If you would like to view the .csv file separately, you can visit it here: https://osf.io/fgjvw/. Differences might arise due to changes made subsequently to the data. The data used to generate the Figures 1-3 is downloaded from the OSF.

Running the analyses for the RP:P project

There are three ways of getting the files required to reproduce all analyses in the RPP manuscript:

  1. Download the masterscript.R file, and run this in R (downloads all dependencies from the OSF)
  2. Download the zip file rpp_reproduce.zip and extract the folder (this is for the non-git users). You can use this link to do that.
  3. Clone this git repository and run the masterscript.R (this is for the git-users. The command to do this would be git clone https://github.com/centerforopenscience/rpp FOLDERNAME, where FOLDERNAME is the name of the folder these files will be contained (note your working directory to know where this folder will be placed)

Once the files are downloaded, running the analyses has been made user-friendly (please make sure you have the R statistical package installed, downloadable here).

  1. Open the masterscript.R file in R.
  2. Run all
  3. Select the directory where you downloaded the files (i.e., the folder where masterscript.R, functions.R, RPP_figures.R, and rpp_data.csv are located)
  4. Now you can run all the results.
masterscript.R explained

The masterscript.R contains all the analyses conducted by the analysis team, save for the additional figures. These have been ordered to correspond with the Statistical Analyses section of the manuscript in the Supplemental Information (note there might be some changes in the ordering that were not picked up, but in general it corresponds; if you cannot find something in direct chronological order it usually follows later in that section).

Note that there was some code run in TURBO PASCAL, which was not run in R and is only available in the supplement.

Lines 46-56 in masterscript.R correspond to the Preliminary analyses section.

Lines 58-186 in masterscript.R correspond to the Evaluating replication effect against null hypothesis of no effect section. Note that this section also includes the part with TURBO PASCAL code.

Lines 188-261 in masterscript.R correspond to the Comparing original and replication effect sizes section.

Lines 263-307 in masterscript.R correspond to the figure in Evaluating replication effect against original effect size section. Code from [A5] is in lines 541-767

Lines 309-539 are data-ordering code.

Lines 783-928 in masterscript.R correspond to the Combining original and replication effect sizes for cumulative evidence? section.

Lines 981-1051 in masterscript.R correspond to the Meta­analysis of all original study effects, and of all replication study effects section.

Lines 930-979 in masterscript.R correspond to the Meta­analysis of difference of effect size between original and replication study section.

Lines 1109-1455 in masterscript.R correspond to the Moderator Analyses section.

Versions of packages used in the analyses


Generating the Figures in the RP:P project publication

The R code that can be used to generate Figure 1-3 of the RP:P Science article is in the file RPP_figures.R
Download this file either by cloning this repository as explained above, or by simply downloading the linked file (e.g. by right-clicking the name)

To generate the Figures:

  1. Open your local of the file RPP_figures.R in your R environment.
  2. Select run all from your R GUI or use the R console to run: source('RPP_figures.R', echo=TRUE) (this assumes the RPP_figures.R file is in current working directory)

Note: You will need an internet connection to run the script.

This will create 3 PDF files in the current working directory:

  1. RPP_Figure1_vioQtile.pdf - Figure 1: Violin Quantile Plots displaying Original and Replication study densities of p-values and effect sizes.
  2. RPP_Figure2_pvalues.pdf - Figure 2: Scatterplot of Original and Replication study p-values, with a 'blow-up' of the X-axis.
  3. RPP_Figure3_ESdensity.pdf - Figure 3: Scatterplot of Original and Replication study Effect sizes, with X and Y margin density plots.

Note Figure appearance will not be exactly the same as in the Science publication.

RPP_figures.R explained

The R source file contains three main sections, SETUP, FIGURE 1, FIGURE 2, FIGURE 3.


Section: SETUP

All the custom built functions that are required to recreate the Figures are available in a GitHub sourceable file: C-3PR.R. It is available here, but it is not required (or recommended) to download the file to your local environment.

Explanation of code in this section:

Lines 22-23:

ire(devtools)  
ce_url('https://raw.githubusercontent.com/FredHasselman/toolboxR/master/C-3PR.R')  

Source C-3PR.R directly from GitHub using source_url() in package devtools.

Line 26:

T(c('plyr','dplyr','ggplot2','RColorBrewer','scales','lattice','gridExtra'))

Load (and if necessary install) libraries used for data management and plotting using the in.IT() function that is now available due to sourcing C-3PR

Line 30:

ata <- get.OSFfile(code='https://osf.io/fgjvw/',dfCln=T)$df

Read the data from the OSF storage

Note: get.OSFfile() returns a list with the CSV data (df) and information (info) containing the URL download timestamp and original column and rownames (these names will be changed if dfCln=TRUE).

Lines 32-41:
These lines contain code to select the correct studies and prepare some variables needed for plotting.


Section: FIGURE 1

This section contains the code to generate the Violin Quantile Plots. The functions are inspired by the code Troy posted on Stack Overflow on 10-03-2014. Thank you Troy!

Explanation of code in this section:

Lines 48-55:
This code creates a temporary dataset convenient for plotting the violin plots (i.e. long format data with Study as a factor indicating an original or replication study).

Lines 56-82:
This code plots panel A of the figure containing the p-values.
First the quantiles are caclulated from the data and output to the R console:

i
            freq
.0004]        24
004,0.0069]   26
069,0.0232]   22
232,0.912]    26

p
           freq
.0077]       24
077,0.198]   26
98,0.534]    24
34,0.975]    24

Then the plot is created as follows:

t regular violinplot using package ggplot2
 <- ggplot(df,aes(x=grp,y=p.value)) + 
    geom_violin(aes(group=grp),scale="width",color="grey30",fill="grey30",trim=T,adjust=.7)
t at quantiles using vioQtile() in C-3PR
0 <- vioQtile(g.pv,qtiles,probs)
rnish 
1 <- g.pv0 + geom_hline(aes(yintercept=.05),linetype=2) + 
     ggtitle("A") + xlab("") + ylab("p-value") + 
     mytheme
ew
1

ncomment to save panel A as a seperate file
save("RPP_F1_VQPpv.eps",plot=g.pv1)

Lines 83-109:
This code plots panel B of the figure containing the effect sizes.
First the quantiles are caclulated from the data and output to the R console:

i
            freq
0464,0.222]   24
22,0.372]     25
72,0.551]     25
51,0.86]      24

p
           freq
45,0.0194]   24
194,0.12]    24
2,0.323]     25
23,0.923]    24

Then the plot is created as follows:

t regular violinplot using package ggplot2
  <- ggplot(df,aes(x=grp,y=EffectSize)) + 
     geom_violin(aes(group=grpN),scale="width",fill="grey40",color="grey40",trim=T,adjust=1) 
t at quantiles using vioQtile() in C-3PR
0 <- vioQtile(g.es,qtiles=qtiles,probs=probs) 
rnish
1 <- g.es0 + 
     ggtitle("B") + xlab("") + ylab("Effect Size") + 
     scale_y_continuous(breaks=c(-.25,-.5,0,.25,.5,.75,1),limits=c(-.5,1)) + mytheme 
ew
1 

Uncomment to save panel B as a seperate file
save("RPP_F1_VQPes.eps",plot=g.es1)

Lines 110-118:
To view and save the combined plot the function multi.PLOT() is used. This function was copied from the online version of Winston Chang's Cookbook for R. Thanks Winston!

EW panels in one plot using the multi.PLOT() function from C-3PR
i.PLOT(g.pv1,g.es1,cols=2)  

VE combined plots as PDF
"RPP_Figure1_vioQtile.pdf",pagecentre=T, width=20,height=8 ,paper = "special")
i.PLOT(g.pv1,g.es1,cols=2)
off()

Section: FIGURE 2

This section contains the code to generate the scatterplot comparing Original and Replication study p-values. The Figure contains a subplot which is a magnification of the scale of the X-axis. The grey lines in the published figure were added using image editing software.

Explanation of code in this section:

Lines 130-145:
This code sets up the viewport such that the main scatterplot and the magnification are plotted correctly.

e of many ways to manipulate positions of multiple plots in one output device
<- viewport(width = 0.6, height = 0.6, x=0.4, y=.7)
<- viewport(width = 0.6, height = 0.3, x=.4, y=.25)
.show.viewport(vpM)
.show.viewport(vpZ)
is will position a ggplot "main" and "sub" in their respective slots
 <- function() {
 plot.new()
 print(main,vp = vpM)
 print(sub, vp = vpZ)

If the function full() is called, an object main and sub will be arranged on the viewport according to information provided in vpM and vpZ. The command grid.show.viewport() of package grid will reveal the coordinates defined int hese variables.

Lines 146-171: This code creates two ggplot objects, main and sub.

 <- ggplot(RPPdata,aes(x=T.pval.USE.O,y=T.pval.USE.R))+ 
ale_x_continuous(breaks=c(0,.01,.05),limits=c(0,.06)) + 
title("") + xlab("Original Study p-value") + ylab("Replication p-value") + 
ale_color_brewer(name="Journal",palette="Set2") + 
ale_size_continuous(name="Replication:\nPower",breaks=seq(0,1,length=11))  + ylim(c(0,1)) + 
om_vline(aes(xintercept=0.05),linetype=2,color=mypalette[9]) + 
om_hline(aes(yintercept=0.05),linetype=2,color=mypalette[9]) + 
om_point(aes(size=Power.Rn,color=Journal.O),alpha=.8) + mytheme

ncomment to save subplot
save("RPP_F2_pvMain.png",plot=main)

<- ggplot(RPPdata,aes(x=T.pval.USE.O,y=T.pval.USE.R)) + 
om_vline(aes(xintercept=0.05),linetype=2,color=mypalette[9]) +
om_hline(aes(yintercept=0.05),linetype=2,color=mypalette[9]) + 
om_point(aes(size=Power.Rn,color=Journal.O),alpha=.8) + 
ale_x_continuous(breaks=c(0,.001,.005),limits=c(0,.005)) + 
title("0 < p < 0.005") + 
ab("")+ ylab("") + 
ale_size_continuous(guide=F) +
ale_color_brewer(palette="Set2",guide=F) +
im(c(0,1)) + mytheme

ncomment to save subplot
save("RPP_F2_pvsub.png",plot=sub)

Lines 172-179:
To view and save the combined plot call the function full() defined above.

EW combined plots
()

VE combined plots as PDF (note: the lines emphasizing the 'blow-up' figure in the publication were drawn in later using image editing software)
"RPP_Figure2_pvalues.pdf",pagecentre=T, width=15,height=8 ,paper = "special")
()
off()

Section: FIGURE 3

This section contains the code to generate the scatterplot comparing Original and Replication study effect sizes. The Figure margins contain density plots of the X and Y axis.

Explanation of code in this section:

Lines 187-202:
This code sets up some variables needed to create the plots.
The function plotHolder() is used to create a 'dummy' plot needed to create the margin plot.

Lines 203-243:
This code creates the marginplots xDense and yDense and the main scatterplot scatterP.

margin density plot (note: gg.theme() from C-3PR can be used directly in a ggplot2() call)
se <- ggplot(RPPdata, aes(x=T.r.O, fill=oriSig)) + 
om_density(aes(y= ..count..),trim=F,alpha=.5) + 
ab("") + ylab("") + xlim(0,1) +
.theme("noax") + 
eme(legend.position = "none",plot.margin = unit(c(0,0,0,4), "lines"))

ncomment to save subplot
save("RPP_F3_xDense.png",plot=xDense)

margin density plot (note: gg.theme() from C-3PR can be used directly in a ggplot2() call)
se <- ggplot(RPPdata, aes(x=T.r.R, fill=repSig)) + 
om_density(aes(y= ..count..),trim=F,alpha=.5) + 
ab("") + ylab("") + xlim(-.5,1) + 
ord_flip() + 
.theme("noax") + 
eme(legend.position = "none", plot.margin = unit(c(0,0,3,0), "lines")) 

ncomment to save subplot
save("RPP_F3_yDense.png",plot=yDense)

e main scatterplot (note: gg.theme() from C-3PR can be used directly in a ggplot2() call)
terP<-
plot(RPPdata,aes(x=T.r.O,y=T.r.R)) +  
om_hline(aes(yintercept=0),linetype=2) +
om_abline(intercept=0,slope=1,color="Grey60")+
om_point(aes(size=Power.Rn,fill=repSig),color="Grey30",shape=21,alpha=.8) + 
om_rug(aes(color=oriSig),size=1,sides="b",alpha=.6) + 
om_rug(aes(color=repSig),,size=1,sides="l",alpha=.6) + 
ale_x_continuous(name="Original Effect Size",limits=c(0,1),breaks=c(0,.25,.5,.75,1)) + 
ale_y_continuous(name="Replication Effect Size",limits=c(-.5,1),breaks=c(-.5,-.25,0,.25,.5,.75,1)) + 
title("") + xlab("") + ylab("") + 
ale_size_continuous(name="Replication Power",range=c(2,9)) + 
ale_color_discrete(name="p-value") +
ale_fill_discrete(name="p-value") +
.theme("clean") + 
eme(legend.position=c(.9,.6), plot.margin = unit(c(-2,-1.5,2,2), "lines")) 

ncomment to save subplot
save("RPP_F3_scatter.png",plot=scatterP)

Lines 244-250: To view and save the combined plots the function grid.arrange() from the gridExtra package is used.

.arrange(xDense, blankPlot, scatterP, yDense, ncol=2, nrow=2, widths=c(4, 1.4), heights=c(1.4, 4))

VE combined plots as PDF
"RPP_Figure3_ESdensity.pdf",pagecentre=T, width=15,height=12 ,paper = "special")
.arrange(xDense, blankPlot, scatterP, yDense, ncol=2, nrow=2, widths=c(4, 1.4), heights=c(1.4, 4))
off()
Versions of packages used to generate the figures (not showing imports)

This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.