4  Scripting and Piping in R

4.1 Commenting

A comment is used to describe your code within an R Script. To comment your code in R, you will use the # key, and R will not execute any code after the symbol. The # key can be used to anywhere in the line, from beginning to midway. It will not execute any code coming after the #.

Additionally, commenting is a great way to debug long scripts of code or functions. You comment certain lines to see if any errors are being produced. It can be used to test code line by line with out having to delete everything.

4.2 Scripting

When writing a script, it is important to follow a basic structure for you to follow your code. While this structure can be anything, the following sections below has my main recommendations for writing a script. The most important part is the Beginning of the Script section.

4.2.1 Beginning of the Script

Load any R packages, functions/scripts, and data that you will need for the analysis. I always like to get the date and time of the

## Todays data 
analysis_data <- format(Sys.time(),"%Y-%m-%d-%H-%M")

## R Packages
library(tidyverse)
library(magrittr)

## Functions
source("fxs.R")
Rcpp::sourceCpp("fxs.cpp")

## Data
df1 <- read_csv("file.csv")
df2 <- load("file.RData") %>% get

4.2.2 Middle of the Script

Run the analysis, including pre and post analysis.

## Pre Analysis
df1_prep <- Prep_data(df1)
df2_prep <- Prep_data(df2)

## Analysis
df1_analysis <- analyze(df1_prep)
df2_analysis <- analyze(df2_prep)

## Post Analysis
df1_post <- Prep_post(df1_anlysis)
df2_post <- Prep_post(df2_anlysis)

4.2.3 End of the Script

Save your results in an R Data file:

## Save Results
res <- list(df1 = list(pre = df1_prep,
                       analysis = df1_analysis,
                       post = df1_post),
            df2 = list(pre = df2_prep,
                       analysis = df2_analysis,
                       post = df2_post))
file_name <- paste0("results_", analysis_data, ".RData")
save(res, file = file_name)

4.3 Pipes

In R, pipes are used to transfer the output from one function to the input of another function. Piping will then allow you to chain functions to run an analysis. Since R 4.1.0, there are two version of pipes, the base R pipe and the pipes from the magrittr package. The table below provides a brief description of each type pipes

Pipe Name Package Description
|> R Pipe Base This pipe will use the output of the previous function as the input for the first argument following function.
%>% Forward Pipe magrittr The forward pipe will use the output of the previous function as the input of the following function.
%$5 Exposition Pipe magrittr The exposition function will expose the named elements of an R object (or output) to the following function.
%T>% Tee Pipe magrittr The Tee pipe will evaluate the next function using the output of the previous function, but it will not retain the output of the next function and utilize the output of the previous function.
%<>% Assignment Pipe magrittr The assignment pipe will rewrite the object that is being piped into the next function.

When choosing between Base or magrittr’s pipes, I recommend using magrittr’s pipes due to the extended functionality. However, when writing production code or developing an R package, I recommend using the Base R pipe.

Lastly, when using the pipe, I recommend only stringing a limited amount of functions (~10) to maintain code readability and conciseness. Any more functions may make the code incoherent.

If you plan to use magrittr’s pipe, I recommend loading magrittr package instead of tidyverse package.

library(magrittr)

4.3.1 |>

The base pipe will use the output from the first function and use it as the input of the first argument in the second function. Below, we obtain the mpg variable from mtcars and pipe it in the mean() function.

mtcars$mpg |> mean()
[1] 20.09062

4.3.2 %>%

4.3.2.1 Uses

Magrittr’s pipe is the equivalent of Base R’s pipe, with some extra functionality. Below we repeat the same code as before:

mtcars$mpg %>% mean()
[1] 20.09062

Alternatively, we do not have to type the parenthesis in the second function:

mtcars$mpg %>% mean
[1] 20.09062

Below is another example where we will pipe the value 3 into the rep() with times=5, this will repeat the value 3 five times:

3 %>% rep(5)
[1] 3 3 3 3 3

If we are interested in piping the output to another argument other than the first argument, we can use the (.) placeholder in the second function to indicate which argument should take the previous output. Below, we repeat the vector c(1, 2) three times because the . is in the second argument:

3 %>% rep(c(1,2), .)
[1] 1 2 1 2 1 2

4.3.2.2 Creating Unary Functions

You can use %>% and . to create unary functions, a function with one argument, can be created. The following code will create a new function called logsqrt() which evaluates \(\sqrt{\log(x)}\):

logsqrt <- . %>% log(base = 10) %>% sqrt
logsqrt(10000)
[1] 2
sqrt(log10(10000))
[1] 2

4.3.3 %$%

The exposition pipe will expose the named elements of an object or output to the following function. For example, we will pipe the mtcars into the lm() function. However, we will use the %$% pipe to access the variables in the data frame for the formula= argument without having to specify the data= argument:

mtcars %$% lm(mpg ~ hp)

Call:
lm(formula = mpg ~ hp)

Coefficients:
(Intercept)           hp  
   30.09886     -0.06823  

4.3.4 %T>%

The Tee pipe will pipe the contents of the previous function into the following function, but will retain the previous functions output instead of the current function. For example, we use the Tee pipe to push the results from the lm() function to print out the summary table, then use the same lm() function results to print out the model standard error:

x_lm <- mtcars %$% lm(mpg ~ hp) %T>% 
  (\(x) print(summary(x))) %T>% 
  (\(x) print(sigma(x)))

Call:
lm(formula = mpg ~ hp)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,    Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

[1] 3.862962

4.4 Keyboard Shortcuts

Below is a list of recommended keyboard shortcuts:

Shortcut Windows/Linux Mac
%>% Ctrl+Shift+M Cmd+Shift+M
Run Current Line Ctrl+Enter Cmd+Return
Run Current Chunk Ctrl+Shift+Enter Cmd+Shift+Enter
Knit Document Ctrl+Shift+K Cmd+Shift+K
Add Cursor Below Ctrl+Alt+Down Cmd+Alt+Down
Comment Line Ctrl+Shift+C Cmd+Shift+C

I recommend modify these keyboard shortcuts in RStudio

Shortcut Windows/Linux Mac
%in% Ctrl+Shift+I Cmd+Shift+I
%$% Ctrl+Shift+D Cmd+Shift+D
%T>% Ctrl+Shift+T Cmd+Shift+T

Note you will need to install the extraInserts package:

remotes::install_github('konradzdeb/extraInserts')