# Reproducibility

## Reproducible Research

### [Better methods can’t make up for mediocre theory](https://www.nature.com/articles/d41586-019-03350-5)

<https://www.nature.com/articles/d41586-019-03350-5>

### A push for reproducibility in biomedical research

<http://www.emoryhealthsciblog.com/a-push-for-reproducibility-in-biomedical-research/?utm_source=feedburner&utm_medium=twitter&utm_campaign=Feed%3A+EmoryHealthNowBlog+%28Lab+Land%29>

## Statcheck

* Controversial software is proving surprisingly accurate at spotting errors in psychology papers

<http://www.sciencemag.org/news/2017/11/controversial-software-proving-surprisingly-accurate-spotting-errors-psychology-papers>

<http://statcheck.io/index.php>

* We need a similar program for pathology articles, although most pathology articles do not report statistics in APA style.

<http://statcheck.io/>

* Stat-checking software stirs up psychology

<http://www.nature.com/news/stat-checking-software-stirs-up-psychology-1.21049>
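
The kind of consistency check that tools like statcheck perform can be sketched in a few lines of R. The snippet below is a minimal illustration, not statcheck's actual implementation: it pulls a t statistic, its degrees of freedom, and the reported p-value out of an APA-style string with a simple regular expression, then recomputes the two-sided p-value. The function name `check_apa_t`, the regular expression, and the tolerance of 0.005 are assumptions made for this example.

```r
# Hypothetical helper: compare a reported p-value with the p-value
# recomputed from an APA-style t-test result such as
# "t(28) = 2.20, p = .04". Assumes a two-sided test.
check_apa_t <- function(text, tol = 0.005) {
  pattern <- "t\\(([0-9]+\\.?[0-9]*)\\) *= *(-?[0-9]+\\.?[0-9]*), *p *= *(0?\\.[0-9]+)"
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) == 0) {
    return(NULL)  # no APA-style t-test found in the text
  }
  df         <- as.numeric(m[2])
  t_value    <- as.numeric(m[3])
  reported_p <- as.numeric(m[4])
  # Two-sided p-value from the t distribution
  recomputed_p <- 2 * pt(-abs(t_value), df)
  list(
    df = df,
    t = t_value,
    reported_p = reported_p,
    recomputed_p = recomputed_p,
    consistent = abs(recomputed_p - reported_p) <= tol
  )
}

# A reported p = .01 cannot be right for t(28) = 1.00
check_apa_t("The groups differed, t(28) = 1.00, p = .01.")$consistent
```

Because the pattern only recognizes APA formatting, results reported in other styles would each need their own pattern, which is the obstacle noted above for pathology articles.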

## Coursera: Reproducible Templates for Analysis and Dissemination

<https://www.coursera.org/learn/reproducible-templates-analysis/supplement/Pw4r9/articles-resources-and-file-organization-examples>

## Reproducibility Articles

* [**“Why Most Published Research Findings Are False”**](https://doi.org/10.1371/journal.pmed.0020124) by John P. A. Ioannidis
* [**“Public Availability of Published Research Data in High-Impact Journals”**](https://doi.org/10.1371/journal.pone.0024357) by Alawi A. Alsheikh-Ali, Waqas Qureshi, Mouaz H. Al-Mallah, and John P. A. Ioannidis.
* [**“Gene name errors are widespread in the scientific literature”**](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7) by Mark Ziemann, Yotam Eren, and Assam El-Osta.
* [**“Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology”**](https://projecteuclid.org/euclid.aoas/1267453942) by Keith A. Baggerly and Kevin R. Coombes.
* [**“Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff”**](https://academic.oup.com/cje/article/38/2/257/1714018)

  by Thomas Herndon, Michael Ash, Robert Pollin.
* An example of an article that received the journal's highest designation for full reproducibility:

  [**“Air pollution and health in Scotland: a multicity study”**](https://academic.oup.com/biostatistics/article-lookup/doi/10.1093/biostatistics/kxp010)

  published in 2009 by Duncan Lee, Claire Ferguson, and Richard Mitchell. To see the article's marking, download the PDF and look for the marking letter in a bold box at the top right.
* [**“Ten Simple Rules for Reproducible Computational Research”**](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285)

  by Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, Eivind Hovig.
* [**“Open-Source Genomic Analysis of Shiga-Toxin–Producing E. coli O104:H4”**](http://www.nejm.org/doi/full/10.1056/NEJMoa1107643) (multiple authors). See also the associated [**GigaScience dataset**](http://gigadb.org/dataset/100001) and [**GitHub Wiki**](https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki).

## Document Conversion

* [**Pandoc**](https://pandoc.org/)
  * A universal document converter
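
As a minimal sketch of how Pandoc can be called from an R script, the snippet below converts a Markdown file to a Word document by passing the input file and the `-o` output option to the `pandoc` command-line tool. The helper name `convert_with_pandoc` and the file names are assumptions for illustration, and the conversion runs only if `pandoc` is found on the system path.

```r
# Hypothetical wrapper around the pandoc command-line tool.
# Equivalent shell command: pandoc manuscript.md -o manuscript.docx
convert_with_pandoc <- function(input, output) {
  args <- c(shQuote(input), "-o", shQuote(output))
  if (nzchar(Sys.which("pandoc"))) {
    system2("pandoc", args)
  } else {
    message("pandoc not found; would run: pandoc ", paste(args, collapse = " "))
  }
  invisible(args)
}

# Write a small Markdown file to a temporary folder and convert it
md_file <- file.path(tempdir(), "manuscript.md")
writeLines(c("# Results", "", "Tumor size was recorded for all cases."), md_file)
pandoc_args <- convert_with_pandoc(md_file, file.path(tempdir(), "manuscript.docx"))
```

Pandoc infers the output format from the file extension, so changing `manuscript.docx` to `manuscript.html` would produce HTML instead.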

## Other Resources

* rOpenSci's “rrrpkg” project was created to “facilitate reproducible research,” with a focus on creating a research compendium. Its illustrations of

  [**directory structure and file organization**](https://github.com/ropensci/rrrpkg)

  may be helpful to you.
* [**Center for Open Science**](https://cos.io/)
* [**Gitbook**](https://www.gitbook.com/)
* [**Fivethirtyeight.com**](http://fivethirtyeight.com/)
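
The research compendium idea from the rrrpkg entry above can be sketched with base R alone. The snippet creates a skeleton project with separate folders for data, analysis documents, and reusable functions. The helper name `create_compendium` and all folder and file names are assumptions chosen for illustration, not the exact layout prescribed by rrrpkg; the linked repository shows the layouts its authors use.

```r
# Hypothetical helper that creates a skeleton research compendium.
create_compendium <- function(root) {
  dirs <- c("data/raw", "data/derived", "analysis", "R")
  for (d in dirs) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  paths <- file.path(root, c("README.md", "DESCRIPTION", "analysis/paper.Rmd"))
  # Create only missing files so existing work is never truncated
  file.create(paths[!file.exists(paths)])
  invisible(root)
}

compendium <- file.path(tempdir(), "my_compendium")
create_compendium(compendium)

# Inspect the resulting layout
list.files(compendium, recursive = TRUE, include.dirs = TRUE)
```

Keeping raw data apart from derived data makes it easier to rerun the analysis from the original inputs.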

## Reproducible Reports

> Imagine this: you work in pancreatic pathology and you wonder, "How is our pancreas series doing?" All you do is press a few buttons, and the age, sex, tumor size, tumor type, stage, grade, lymph node status, and other details of every pancreas case reported in your department to date are compiled, together with survival plots, into a Word document. This is not a dream. It can be done. With a reasonable IT staff member, mandatory structured pathology reports completed in line with CAP and AJCC, access to the main data table, a little SQL, a little R, and a little R Markdown, doing this is no trouble at all.

<https://www.serdarbalci.com/2018/05/tekrarlanabilir-ve-otomatik-raporlar.html>
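
The "press a few buttons" workflow described above maps naturally onto a parameterized R Markdown report. The snippet below is a minimal sketch under stated assumptions: the template text, the `organ` parameter, and the file name are hypothetical, and a real report would query the department's data table rather than print a single sentence. Rendering is attempted only if the `rmarkdown` package and pandoc are both available.

```r
# Hypothetical parameterized R Markdown template, written to a temporary file
rmd_path <- file.path(tempdir(), "organ_series_report.Rmd")
writeLines(c(
  "---",
  "title: \"Case series report\"",
  "output: word_document",
  "params:",
  "  organ: \"pancreas\"",
  "---",
  "",
  "This report summarizes the `r params$organ` cases."
), rmd_path)

# Render to Word only when rmarkdown and pandoc are both available
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  rmarkdown::render(rmd_path, params = list(organ = "pancreas"), quiet = TRUE)
}
```

Changing the value passed in `params` would produce the same report for a different organ system without editing the template.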

```
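# tangram: https://github.com/spgarbet/tangram
# Example vignette: http://htmlpreview.github.io/?https://github.com/spgarbet/tg/blob/master/vignettes/example.html

library(tangram)
library(Hmisc)

# Load the primary biliary cirrhosis (pbc) example dataset via Hmisc
getHdata(pbc)
# View(pbc)

# Summary table with treatment arm (drug) as columns and the covariates as rows
table <- tangram(drug ~ bili + albumin + stage + protime + sex + age + spiders, data = pbc)

table          # print to the console
html5(table)   # render as HTML5
latex(table)   # render as LaTeX
index(table)   # build an index of the table's contents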

# Write the table as an HTML fragment in Hmisc style
write(
  html5(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders",
                pbc, msd = TRUE, quant = seq(0, 1, 0.25)),
        fragment = TRUE, inline = "hmisc.css", caption = "HTML5 Table Hmisc Style", id = "tbl2"),
  "tangram1.html")

# Same table in NEJM style
write(
  html5(tangram("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc),
        fragment = TRUE, inline = "nejm.css", caption = "HTML5 Table NEJM Style", id = "tbl3"),
  "tangram_nejm.html")

# Lancet style, with explicit formats for individual variables and p-values
tbl <- tangram("drug ~ bili[2] + albumin + stage::Categorical[1] + protime + sex[1] + age + spiders[1]",
               data = pbc,
               pformat = 5)
write(
  html5(tbl,
        fragment = TRUE,
        inline = "lancet.css",
        caption = "HTML5 Table Lancet Style", id = "tbl4"),
  "tangram_lancet.html")

# First 20 rows of the table index
index(tangram("drug ~ bili + albumin + stage::Categorical + protime + sex + age + spiders", pbc))[1:20, ]


library(readxl)

# Read the study data from a local Excel file and convert it to a plain data frame
MDL307_Data <- read_excel("MDL307 - Data.xlsx")
MDL307_Data <- as.data.frame(MDL307_Data)

names(MDL307_Data)
# View(MDL307_Data)

# Recode the outcome (biochemical recurrence) as a factor: "yok" = absent, "var" = present
MDL307_Data$biyokimyasalrekurrens <- as.factor(MDL307_Data$biyokimyasalrekurrens)
levels(MDL307_Data$biyokimyasalrekurrens)[1] <- "yok"
levels(MDL307_Data$biyokimyasalrekurrens)[2] <- "var"

# Columns to treat as categorical
collist <- c("gleasonskor",
             "tersiyer",
             "kribriform",
             "cerrahisinir",
             "ekstaprostatik",
             "lenfnodu",
             "seminalvezikul")

MDL307_Data[collist] <- lapply(MDL307_Data[collist], as.factor)


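# Summarize the covariates by biochemical recurrence.
# The outcome is the column variable, so it is not repeated among the row terms.
table <- tangram(biyokimyasalrekurrens ~ yas +
                   gleasonskor +
                   tersiyer +
                   kribriform +
                   kribriformyuzde +
                   cerrahisinir +
                   ekstaprostatik +
                   lenfnodu +
                   seminalvezikul,
                 data = MDL307_Data)
table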
```

* Export R output to a file

<https://www.r-bloggers.com/export-r-output-to-a-file/>

```
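# Capture the printed summary as a character vector, one element per line
out <- capture.output(summary(my_very_time_consuming_regression))

# Append a title and the captured lines to a text file, one per line
cat("My title", out, file="summary_of_my_very_time_consuming_regression.txt", sep="\n", append=TRUE)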
```
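
A self-contained variant of the snippet above, offered as a sketch: it fits a small model on the built-in `mtcars` data in place of a slow regression, writes the captured summary to a temporary file, and reads it back to confirm the export. The model, title, and file name are placeholders chosen for illustration.

```r
# Stand-in for a time-consuming model: a simple linear regression on built-in data
fit <- lm(mpg ~ wt, data = mtcars)

# Capture the printed summary, one element per output line
out <- capture.output(summary(fit))

# Write a title line followed by the captured output
out_file <- file.path(tempdir(), "summary_of_fit.txt")
cat("Summary of mpg ~ wt", out, file = out_file, sep = "\n")

# Read the file back to check what was saved
saved <- readLines(out_file, warn = FALSE)
head(saved)
```

Without `append = TRUE`, `cat()` overwrites the file on each run, which is usually what a fully scripted analysis wants.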

