class: center, middle, inverse, title-slide # Literate programming with R Markdown ### Computation Skills Workshop --- class: inverse, middle <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/reproducibility_court.png" title="A judge’s desk labeled "Reproducibility" with a witness stand right next to it. On the witness stand is a smiling and confident R Markdown document pointing at some lines of code on itself. A fuzzy monster lawyer in a polka-dot tie stands proudly saying "Nothing further!" The judge (also a cute fuzzy monster) is smiling with their hands raised in celebration of reproducible work." alt="A judge’s desk labeled "Reproducibility" with a witness stand right next to it. On the witness stand is a smiling and confident R Markdown document pointing at some lines of code on itself. A fuzzy monster lawyer in a polka-dot tie stands proudly saying "Nothing further!" The judge (also a cute fuzzy monster) is smiling with their hands raised in celebration of reproducible work." /> .footnote[Source: [@AllisonHorst](https://github.com/allisonhorst/stats-illustrations)] --- # Data science notebooks <iframe src="https://datasciencenotebook.org/" width="864" height="600px"></iframe> --- class: inverse <div style="width:100%;height:0;padding-bottom:62%;position:relative;"><iframe src="https://giphy.com/embed/mCClSS6xbi8us" width="100%" height="100%" style="position:absolute" frameBorder="0" class="giphy-embed" allowFullScreen></iframe></div> --- # A substantial debate - [I don't like notebooks](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/preview?slide=id.g362da58057_0_1) - [The first notebook war](https://yihui.org/en/2018/09/notebook-war/) -- - R vs. Python - Data analysis vs. software engineering --- # Jupyter Notebooks <img src="https://www.dataquest.io/wp-content/uploads/2019/01/interface-screenshot.png" title="A screenshot of the Jupyter Notebook interface, depicting code cells, executed output, and Markdown formatted text." alt="A screenshot of the Jupyter Notebook interface, depicting code cells, executed output, and Markdown formatted text." /> .footnote[Source: [Dataquest](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)] --- # Critiques about notebooks - Hidden state and out-of-order execution - Notebooks discourage modularity and testing - Hard to copy/paste into Slack/GitHub issues - Notebooks hinder reproducible and extensible science - Notebooks cannot easily be tracked under version control .footnote[Source: [The first notebook war](https://yihui.org/en/2018/09/notebook-war/)] -- **Data science is not the same as software engineering** --- class: inverse, middle .center[ <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/rmarkdown_rockstar.png" title="A glam rock band comprised of 3 fuzzy round monsters labeled as "Text", "Outputs", and "Code" performing together. Stylized title text reads: "R Markdown - we’re getting the band back together."" alt="A glam rock band comprised of 3 fuzzy round monsters labeled as "Text", "Outputs", and "Code" performing together. Stylized title text reads: "R Markdown - we’re getting the band back together."" width="80%" /> ] .footnote[Source: [@AllisonHorst](https://github.com/allisonhorst/stats-illustrations)] --- # R Markdown basics ```` --- title: "Gun deaths" date: "`r lubridate::today()`" output: html_document --- ```{r setup, include = FALSE} knitr::opts_chunk$set(echo = FALSE) ``` ```{r packages} library(tidyverse) library(rcfss) theme_set(theme_minimal()) ``` ```{r youths} youth <- gun_deaths %>% filter(age <= 65) ``` # Gun deaths by age We have data about `r nrow(gun_deaths)` individuals killed by guns. Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below: ```{r youth-dist, echo = FALSE} youth %>% ggplot(mapping = aes(age)) + geom_freqpoly(binwidth = 1) ``` # Gun deaths by race ```{r race-dist} youth %>% ggplot(mapping = aes(fct_infreq(race) %>% fct_rev())) + geom_bar() + coord_flip() + labs(x = "Victim race") ``` ```` --- # Major components 1. A **YAML header** surrounded by `---`s 1. **Chunks** of R code surounded by ` ``` ` 1. Text mixed with simple text formatting using the [Markdown syntax](../hw01-edit-README.html) --- # Knitting process .center[ ![](https://r4ds.had.co.nz/images/RMarkdownFlow.png)<!-- --> ] .footnote[Source: [R for Data Science](https://r4ds.had.co.nz/r-markdown.html)] --- # Text formatting with Markdown - Lightweight set of conventions for formatting plain text files - `\(\LaTeX\)` simplified -- - Demonstration Markdown file --- # Write a simple Markdown file - Edit `my-cv.md` to create a brief CV - Title should be your name - Create sections for education and employment (use headers) - Include bulleted list of jobs/degrees - Highlight the year in bold
08
:
00
--- # Render and edit an R Markdown document * Render `gun-deaths.Rmd` as an HTML document * Add text describing the frequency polygon
03
:
00
--- # Code chunks ```{r youth-dist, echo = FALSE, message = FALSE, warning = FALSE} # code goes here ``` -- * Naming code chunks * Code chunk options * `echo = FALSE` * `message = FALSE` or `warning = FALSE` * `eval = FALSE` * `include = FALSE` * `error = TRUE` * `cache = TRUE` --- # Global options ```r knitr::opts_chunk$set( echo = FALSE ) ``` --- # Inline code ``` We have data about `r nrow(gun_deaths)` individuals killed by guns. Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. ``` -- We have data about 100798 individuals killed by guns. Only 15687 are older than 65. --- # Customize code chunk options * Set `echo = FALSE` as a global option * Enable caching as a global option and render the document. Look at the file structure for the cache. Now render the document again. Does it run faster?
07
:
00
--- # YAML header ``` --- author: Benjamin Soltoff date: '2021-11-12' title: Gun deaths output: github_document --- ``` * **Y**et **A**nother **M**arkup **L**anguage * Standardized format for storing hierarchical data in a human-readable syntax * Defines how `rmarkdown` renders your `.Rmd` file --- # HTML document ``` --- author: Benjamin Soltoff date: '2021-11-12' title: Gun deaths output: html_document --- ``` --- # Table of contents ``` --- author: Benjamin Soltoff date: '2021-11-12' title: Gun deaths output: html_document: toc: true toc_depth: 2 --- ``` --- # Appearance and style ``` --- author: Benjamin Soltoff date: '2021-11-12' title: Gun deaths output: html_document: theme: readable highlight: pygments --- ``` --- # Customize YAML header * Add a table of contents * Use the `"cerulean"` theme * Modify the figures so they are 8x6
07
:
00
--- # PDF document ``` --- author: Benjamin Soltoff date: '2021-11-12' title: Gun deaths output: pdf_document --- ``` - Requires installation of `\(\LaTeX\)` - `tinytex::install_tinytex()` --- # Core R Markdown formats Output format | Creates --------------|--------- `html_document`| .html `pdf_document` | .pdf `word_document` | Microsoft Word (.docx) `md_document` | Markdown `github_document` | Markdown for GitHub --- # Extensions of R Markdown - [`flexdashboard`](https://pkgs.rstudio.com/flexdashboard/) - [`bookdown`](https://bookdown.org/) - [`distill`](https://rstudio.github.io/distill/) - [`blogdown`](https://pkgs.rstudio.com/blogdown/) --- # Slide presentations * [ioslides](http://rmarkdown.rstudio.com/ioslides_presentation_format.html) * [reveal.js](http://rmarkdown.rstudio.com/revealjs_presentation_format.html) * [Slidy](http://rmarkdown.rstudio.com/slidy_presentation_format.html) * [Beamer](http://rmarkdown.rstudio.com/beamer_presentation_format.html) * [`xaringan`](https://bookdown.org/yihui/rmarkdown/xaringan.html) --- # `render()` ```r rmarkdown::render("my-document.Rmd", output_format = "html_document") rmarkdown::render("my-document.Rmd", output_format = "all") ``` --- # Rendering different document formats - Render `gun-deaths.Rmd` as - HTML document - PDF document - Word document
05
:
00
--- # R Markdown notebooks - Interactive version of an R Markdown document - `*.nb.html` - More similar to Jupyter Notebook - Still plain-text files for version control --- # R scripts ``` # gun-deaths.R # 2017-02-01 # Examine the distribution of age of victims in gun_deaths # load packages library(tidyverse) library(rcfss) # filter data for under 65 youth <- gun_deaths %>% filter(age <= 65) # number of individuals under 65 killed nrow(gun_deaths) - nrow(youth) # graph the distribution of youth youth %>% ggplot(aes(age)) + geom_freqpoly(binwidth = 1) # graph the distribution of youth, by race youth %>% ggplot(aes(fct_infreq(race) %>% fct_rev())) + geom_bar() + coord_flip() + labs(x = "Victim race") ``` --- # When to use a script - Software development - For troubleshooting - Initial stages of project - Building a reproducible pipeline -- - **It depends** -- ## Running scripts - Interactively - Programmatically using `source()`