Modified from Statistical Computing
Control flow is the order in which a computer carries out the statements of a task. A program contains statements that the computer reads and reacts to as it executes the task. This section briefly discusses the main components and statements used to complete tasks in R.
A vector can be a certain data type with a set number of elements. Here we construct a vector called x increasing from -5 to 5 by one unit:
(x <- -5:5)#> [1] -5 -4 -3 -2 -1 0 1 2 3 4 5
The vector x has 11 elements. If you want to know the 6th element of x, you can index the vector. To do this, we use [] square brackets after x. For example, we index the 6th element of x:
x[6]#> [1] 0
Whenever we use [] next to an R object, R returns the elements at the positions given inside the square brackets. We can index an R object with multiple values:
x[1:3]#> [1] -5 -4 -3
x[c(3,9)]#> [1] -3 3
Notice how the second line uses c(). This is necessary when we want to specify non-contiguous elements. Now let’s see how we can index a matrix.
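Before moving on to matrices, here is a minimal sketch of the same bracket indexing with the positions stored in a variable first. The name idx is a hypothetical choice made for this illustration.

```r
x <- -5:5

# Any vector of positions works inside [], including one saved earlier.
idx <- c(3, 9)   # hypothetical name for the positions to extract
x[idx]
#> [1] -3  3

# 1:3 is shorthand for the positions 1, 2, 3, so both calls return the same elements.
identical(x[1:3], x[c(1, 2, 3)])
#> [1] TRUE
```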
A matrix can be indexed the same way as a vector using the [] brackets. However, since a matrix is a 2-dimensional object, we need to include a comma to separate the dimensions: [,]. The value before the comma indexes the rows and the value after it indexes the columns. To begin, we create the following \(4 \times 3\) matrix:
(x <- matrix(1:12, nrow = 4, ncol = 3))#> [,1] [,2] [,3]
#> [1,] 1 5 9
#> [2,] 2 6 10
#> [3,] 3 7 11
#> [4,] 4 8 12
Now to index the element at row 2 and column 3, use x[2, 3]:
x[2, 3]#> [1] 10
We can also index an entire row or an entire column by leaving the other position blank:
x[2,]#> [1] 2 6 10
x[,3]#> [1] 9 10 11 12
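The same comma notation accepts several rows or columns at once. The sketch below rebuilds the \(4 \times 3\) matrix from above; the rows and columns selected, and the name sub, are arbitrary choices for illustration.

```r
x <- matrix(1:12, nrow = 4, ncol = 3)

# Rows 1 and 2, columns 1 and 3: the result is itself a 2 x 2 matrix.
sub <- x[1:2, c(1, 3)]
sub
#>      [,1] [,2]
#> [1,]    1    9
#> [2,]    2   10
dim(sub)
#> [1] 2 2
```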
There are several ways to index a data frame. Since it has a matrix-like format, you can index it the same way as a matrix. Here are a couple of examples using the mtcars data frame.
mtcars[,2]#> [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars[2,]
However, because a data frame has labeled components (variables), we can also index the data frame with the variable names within the brackets:
mtcars[, "cyl"]#> [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Lastly, a data frame can be indexed to a specific variable using the $ operator:
mtcars$cyl#> [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
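As a small sketch combining these approaches, rows can be selected by position and columns by name in the same call. The particular rows and variables chosen here, and the name small, are only for illustration.

```r
# First three rows of mtcars, keeping only the mpg and cyl variables.
small <- mtcars[1:3, c("mpg", "cyl")]
dim(small)
#> [1] 3 2

# The $ operator and the [, "name"] form return the same vector.
identical(mtcars$cyl, mtcars[, "cyl"])
#> [1] TRUE
```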
Lists contain elements holding different R objects. To index a specific element of a list, you will use [[]] double brackets. Below is a toy list:
toy_list <- list(mtcars = mtcars,
                 vector = rep(0, 4),
                 identity = diag(rep(1, 3)))
To access the second element, the vector element, you can type toy_list[[2]]:
toy_list[[2]]#> [1] 0 0 0 0
Since the elements are labeled within the list, you can place the label in quotes inside [[]]:
toy_list[["vector"]]#> [1] 0 0 0 0
An element of a list can also be accessed using the $ notation:
toy_list$vector#> [1] 0 0 0 0
Lastly, you can further index the list if needed. Here are three ways to access the mpg variable in mtcars from toy_list:
toy_list$mtcars$mpg#> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
#> [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
#> [31] 15.0 21.4
toy_list[["mtcars"]]$mpg#> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
#> [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
#> [31] 15.0 21.4
toy_list$mtcars[,'mpg']#> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
#> [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
#> [31] 15.0 21.4
In R, control flow statements dictate how a program is executed. The first set of statements we will talk about are if and else statements. The if statement evaluates a condition. If the condition is satisfied (yields TRUE), R conducts a certain task; if it fails (yields FALSE), the else statement guides R to a different task.
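The general format can be sketched as below. The variable value and the stored messages are placeholders chosen for this illustration.

```r
value <- 4   # hypothetical input

if (value > 0) {
  # Runs only when the condition yields TRUE
  message_out <- "condition was TRUE"
} else {
  # Runs only when the condition yields FALSE
  message_out <- "condition was FALSE"
}
print(message_out)
#> [1] "condition was TRUE"
```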
Below is an example where we generate x from a standard normal distribution and print the statement ‘positive’ or ‘non-positive’ based on the condition x > 0.
x <- rnorm(1)
## if statements
if (x > 0){
print("Positive")
} else {
print("Non-Positive")
}#> [1] "Positive"
What if we want to print the statement ‘negative’ as well if the value is negative? We will then need to add another if statement after the else statement since x > 0 only lets us know if the value is positive.
x <- rnorm(1)
if (x > 0){
print("Positive")
} else if (x < 0) {
print("Negative")
}#> [1] "Negative"
Above, we add the if statement with condition (x < 0) indicating if the number is negative. Lastly, if x is ever \(0\), we will want R to let us know it is \(0\). We can achieve this by adding one last else statement:
x <- rnorm(1)
if (x > 0){
print("Positive")
} else if (x < 0) {
print("Negative")
} else {
print("Zero")
}#> [1] "Positive"
for loops
A for loop is a way to repeat a task a certain number of times. Each time the loop repeats the task is called an iteration of the loop. For each iteration, we may change the inputs in a certain way, for example by taking the next value from an indexed vector, and repeat the task.
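The general anatomy of a for loop can be sketched as follows. The vector values and the running total inside the braces are placeholders assumed for this illustration.

```r
values <- c(10, 20, 30)   # hypothetical vector of values to loop over
total <- 0

for (i in values) {
  # Task to repeat; i holds the current value taken from the vector
  total <- total + i
}
total
#> [1] 60
```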
The for statement indicates that you will repeat the task inside the curly braces. The i in the parentheses controls how the task will be completed. The in statement tells R where i can look for its values, and vector stands for the R vector that contains the values i can take. The vector also controls how many times the task will be repeated, based on its length.
Learning about loops can be challenging. My recommendation is to read the section below and experiment with (and break) the example code so you can understand how a for loop works.
for loop
Let’s say we want R to print one to five separately. We can achieve this by repeating print() 5 times.
print(1); print(2); print(3); print(4); print(5)#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
However, this takes quite a while to type up. Let’s try to achieve the same task using a for loop.
for (i in 1:5){
print(i)
}#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
Here, i takes each value from the vector 1:5 in turn. Then, R prints out the value of i.
Now, let’s try another example with letters. To begin, create a new vector called letters_10 containing the first 10 letters of the alphabet. Use the vector letters to construct the new vector.
letters_10 <- letters[1:10]
Now, we will use a loop to print out the first 10 letters:
for (i in 1:10) {
print(letters_10[i])
}#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
#> [1] "e"
#> [1] "f"
#> [1] "g"
#> [1] "h"
#> [1] "i"
#> [1] "j"
Here, we have i take on the values 1 through 10. Using those values, we index the vector letters_10 by i. The resulting letter is then printed. This task is repeated 10 times.
Lastly, we can replace 1:10 by letters_10 instead:
for (i in letters_10){
print(i)
}#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
#> [1] "e"
#> [1] "f"
#> [1] "g"
#> [1] "h"
#> [1] "i"
#> [1] "j"
This works because letters_10 contains the values that we want to print, and i takes on each value of letters_10 in turn.
for loops
A nested for loop is a loop that contains another loop within it.
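A general sketch of loops nested within each other is shown below, here with three levels. The index names i, j, and k and the counter n_iter are assumptions made for this illustration.

```r
n_iter <- 0   # hypothetical counter of how often the innermost task runs

for (i in 1:2) {        # outer loop
  for (j in 1:3) {      # middle loop, runs fully for every i
    for (k in 1:4) {    # inner loop, runs fully for every (i, j) pair
      n_iter <- n_iter + 1
    }
  }
}
n_iter
#> [1] 24
```

The innermost task runs 2 × 3 × 4 = 24 times, which is why deeply nested loops can become slow quickly.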
As an example, we will use the greekLetters package and its greek_vector vector to obtain Greek letters in R. Then, create a vector called greek_10.
library(greekLetters)
greek_10 <- print_greeks()[1:10]#> alpha beta gamma
#> "α" "β" "γ"
#> delta epsilon zeta
#> "δ" "ε" "ζ"
#> eta theta iota
#> "η" "θ" "ι"
#> kappa lambda mu
#> "κ" "λ" "μ"
#> nu xi omicron
#> "ν" "ξ" "ο"
#> pi rho sigma
#> "π" "ρ" "σ"
#> tau upsilon phi
#> "τ" "υ" "φ"
#> chi psi omega
#> "χ" "ψ" "ω"
#> Alpha Beta Gamma
#> "Α" "Β" "Γ"
#> Delta Epsilon Zeta
#> "Δ" "Ε" "Ζ"
#> Eta Theta Iota
#> "Η" "Θ" "Ι"
#> Kappa Lambda Mu
#> "Κ" "Λ" "Μ"
#> Nu Xi Omicron
#> "Ν" "Ξ" "Ο"
#> Pi Rho Sigma
#> "Π" "Ρ" "Σ"
#> Tau Upsilon Phi
#> "Τ" "Υ" "Φ"
#> Chi Psi Omega
#> "Χ" "Ψ" "Ω"
#> infinity leftrightarrow forall
#> "∞" "⇔" "∀"
#> exist notexist emptyset
#> "∃" "∄" "∅"
#> elementof notelementof proportional
#> "∈" "∉" "∝"
#> asymptoticallyEqual notasymptoticallyEqual approxEqual
#> "≃" "≄" "≅"
#> almostEqual leq geq
#> "≈" "≤" "≥"
#> muchless muchgreater leftarrow
#> "≪" "≫" "⇐"
#> rightarrow equal notEqual
#> "⇒" "=" "≠"
#> integral doubleintegral tripleintegral
#> "∫" "∬" "∭"
#> logicalAnd logicalOr intersection
#> "∧" "∨" "∩"
#> union
#> "∪"
For this example, we want R to print “a” and “\(\alpha\)” together as demonstrated below:
print(paste0(letters_10[1], greek_10[1]))#> [1] "aα"
Now let’s repeat this process to print all possible combinations of the first 3 letters and the first 3 Greek letters:
for (i in 1:3){
for (ii in 1:3){
print(paste0(letters_10[i], greek_10[ii]))
}
}#> [1] "aα"
#> [1] "aβ"
#> [1] "aγ"
#> [1] "bα"
#> [1] "bβ"
#> [1] "bγ"
#> [1] "cα"
#> [1] "cβ"
#> [1] "cγ"
break
A break statement is used to stop a loop midway if a certain condition is met.
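A general setup can be sketched as below, where the loop stops at the first value that exceeds a threshold. The vector values, the threshold of 5, and the name last_seen are hypothetical.

```r
values <- c(1, 3, 7, 2)   # hypothetical data
last_seen <- NA

for (v in values) {
  if (v > 5) {
    break               # leave the loop entirely
  }
  last_seen <- v        # only reached when the condition is FALSE
}
last_seen
#> [1] 3
```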
The break statement sits inside an if statement within the loop. The if statement tells R when to break the loop. Without the if statement, the loop would break during its first iteration.
To demonstrate the break statement, we will simulate from a \(N(1,1)\) until we have 30 positive numbers or we simulate a negative number.
x <- rep(NA, length.out = 30)
for (i in seq_along(x)) {
  y <- rnorm(1, 1)
  if (y < 0) {
    break
  } else {
    x[i] <- y
  }
}
print(x)#> [1] 1.1536174 0.5876646 2.7237244 3.8131697 NA NA NA
#> [8] NA NA NA NA NA NA NA
#> [15] NA NA NA NA NA NA NA
#> [22] NA NA NA NA NA NA NA
#> [29] NA NA
print(y)#> [1] -0.2191125
Notice that the vector does not get filled up all the way. That is because the loop breaks once a negative number is simulated.
next
Similar to the break statement, the next statement is used inside loops. It tells R to skip the rest of the current iteration and move on to the next one if a certain condition is met.
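A general setup can be sketched as below, where negative values are skipped and the loop carries on. The vector values and the name kept are hypothetical.

```r
values <- c(1, -2, 3, -4, 5)   # hypothetical data
kept <- c()

for (v in values) {
  if (v < 0) {
    next                # skip the rest of this iteration
  }
  kept <- c(kept, v)    # only reached for non-negative values
}
kept
#> [1] 1 3 5
```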
The setup is the same as for break; the main difference is that a next statement is used instead of a break statement.
Going back to simulating positive numbers, we will use the same setup but change it to a next statement.
x <- rep(NA, length.out = 30)
for (i in seq_along(x)) {
  y <- rnorm(1, 1)
  if (y < 0) {
    next
  } else {
    x[i] <- y
  }
}
print(x)#> [1] 2.56445323 1.28195308 NA 1.09527041 1.02739384 2.31600037
#> [7] 0.17356929 0.14414393 NA 0.83590715 0.76820287 NA
#> [13] 1.00815926 1.30286248 0.40190957 2.36875999 0.11732494 1.81505100
#> [19] 1.03719769 0.99868101 0.69572100 0.39488943 1.03677512 NA
#> [25] 1.91617203 0.23186470 0.70988818 1.26809135 NA 0.05375247
As you can see, the vector contains missing values. These correspond to the iterations in which a negative number was simulated.
while loop
The last loop that we will discuss is a while loop. The while loop is used to keep a loop running until a certain condition is met. To construct a while loop, we use the while statement with a condition attached to it.
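The general format can be sketched as follows. The variable count and the limit of 3 are placeholders assumed for this illustration.

```r
count <- 0   # hypothetical quantity the condition depends on

while (count < 3) {
  # Task to repeat, followed by the update that eventually makes the condition FALSE
  count <- count + 1
}
count
#> [1] 3
```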
The while statement is followed by a condition. On each iteration, the loop completes its task and updates the quantities that the condition depends on. If the condition then yields FALSE, the loop stops; otherwise, it continues.
while loops
To implement a basic while loop, we return to the previous example of simulating positive numbers. We want to keep simulating from \(N(0,1)\) until we have 30 positive values. Here, the loop continues while we have fewer than 30 numbers. Therefore, we can use the following code to simulate the values:
x <- c()
size <- 0
while (size < 30){
y <- rnorm(1)
if (y > 0) {
x <- c(x, y)
}
size <- length(x)
}
print(size)#> [1] 30
print(x)#> [1] 1.271562094 0.281040692 0.587014986 0.140418368 0.624348470 0.174733956
#> [7] 0.300641743 0.431327240 0.178739337 0.623975095 0.212282944 0.372721337
#> [13] 0.811431683 0.470319955 0.003910115 1.126360645 1.926045559 0.663370779
#> [19] 0.013265289 0.430096301 1.205223047 1.325914514 0.334917098 0.077390232
#> [25] 0.211909634 1.156161137 0.288470540 0.226115484 0.537942969 0.524022776
Notice that we do not use an else statement. This is because we do not need R to complete a task if the condition fails.
while loops
With while loops, we must be wary of potential infinite loops. These occur when the condition never yields a FALSE value. Therefore, R never stops the loop because it does not know when to do so.
For example, let’s say we are interested in whether \(y=\sin(x)\) converges to a certain value as x increases. As you know, it does not converge; however, we can still construct a while loop:
x <- 1
diff <- 1
while (diff > 1e-20) {
old_x <- x
x <- x + 1
diff <- abs(sin(x) - sin(old_x))
}
print(x)
print(diff)
My condition above checks whether the absolute difference between sequential values is still larger than \(10^{-20}\). As you may know, the absolute difference will never become that small. Therefore, the loop will continue on without stopping.
To prevent an infinite while loop, we can add a counter to the condition statement. The counter condition must also be true for the loop to continue. Therefore, we can arbitrarily stop the loop once it has iterated a certain number of times. We just need to make sure to add one to the counter on every iteration. Below is the code that adds a counter to the while loop:
x <- 1
counter <- 0
diff <- 1
while (diff > 1e-20 && counter < 10^3) {
old_x <- x
x <- x + 1
diff <- abs(sin(x) - sin(old_x))
counter <- counter + 1
}
print(x)#> [1] 1001
print(diff)#> [1] 0.09311106
print(counter)#> [1] 1000
R’s functions are what make it so powerful compared to other statistical software. There are several pre-built functions, and you can extend R’s functionality further with the use of R packages.
There are several available functions in R to conduct specific statistical methods. The table below provides a set of commonly used functions:
| Functions | Description |
|---|---|
| aov() | Fits an ANOVA model |
| lm() | Fits a linear model |
| glm() | Fits a generalized linear model |
| t.test() | Conducts a t-test |
Several of these functions have help documentation that provides the following sections:
| Section | Description |
|---|---|
| Description | Provides a brief introduction to the function |
| Usage | Shows how the function is called |
| Arguments | Arguments that the function can take |
| Details | An in-depth description of the function |
| Value | Provides information about the output produced by the function |
| Notes | Any need-to-know information about the function |
| Authors | Developers of the function |
| References | References for the model and function |
| See Also | Provides information on supporting functions |
| Examples | Examples of the function |
To obtain the help documentation of each function, use the ? operator and function name in the console pane.
Commonly used functions, such as summary() and plot(), are generic functions: their behavior is determined by the class of the R object they are given. For example, summary() has methods for several classes of objects, including summary.aov(), summary.lm(), summary.glm(), and many more. Rather than choosing the appropriate method yourself, you call the generic function, e.g. summary(), which reads the class of the object and then applies the correct procedure to it.
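To see this dispatch in action, the sketch below calls summary() on two different kinds of objects and inspects the class of each result. The object name fit is an assumption made for the illustration.

```r
fit <- lm(mpg ~ hp, data = mtcars)   # hypothetical model object
class(fit)
#> [1] "lm"

# summary() dispatches to summary.lm() for the model ...
class(summary(fit))
#> [1] "summary.lm"

# ... and to the default method for a plain numeric vector.
class(summary(mtcars$mpg))
#> [1] "summaryDefault" "table"
```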
While R has many capable functions that can be used to analyze your data, you may need to create a custom function for specific needs. For example, if you find yourself writing the same code to repeat a task, you can wrap the code into a user-built function and use it for analysis.
To create a user-built function, you will use function() to create an R object that is a function. Inside the function() parentheses, write the arguments that need to be specified for your function. These are arguments you choose for the function.
In general, we construct a function with the following anatomy:
name_of_function <- function(data_1, data_2 = NULL,
argument_1, argument_2 = TRUE, argument_3 = NULL,
...){
# Conduct Task
# Conduct Task
output_object <- Tasks
return(output_object)
}
Here, we are creating an R function called name_of_function that takes the following arguments: data_1, data_2, argument_1, argument_2, argument_3, and .... The function requires us to supply values for data_1 and argument_1. Arguments data_2 and argument_3 are not required (they default to NULL), but can be utilized in the function if necessary. Argument argument_2 is also used by the function, but it has a default setting (in this case TRUE) if it is not specified. Lastly, the ... argument allows you to pass other arguments on to functions called inside your function. For example, we may use plot() to create graphics and want to manipulate the output plot further, but do not want to specify all of plot()’s arguments in the user-built function. In the function itself, we complete the necessary tasks and then use return() to return the output.
To begin, let’s create a function that squares any value:
x_square <- function(x){x^2}
Above, a new function called x_square is created; it takes a value x and squares it. Here are a couple of examples of x_square():
x_square(4)#> [1] 16
x_square(5)#> [1] 25
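Default values and the ... argument can be sketched with a second small function. The function scaled_mean() and its arguments are invented for this illustration; it has one required argument, one argument with a default, and ... to forward extra arguments to mean().

```r
scaled_mean <- function(x, divisor = 1, ...) {
  # ... is passed on to mean(), so callers can supply e.g. na.rm = TRUE
  avg <- mean(x, ...)
  output <- avg / divisor
  return(output)
}

# divisor falls back to its default of 1
scaled_mean(c(2, 4, 6))
#> [1] 4

# na.rm = TRUE travels through ... to mean()
scaled_mean(c(2, 4, NA), divisor = 3, na.rm = TRUE)
#> [1] 1
```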
The mtcars data set has several numeric variables that can be used for analysis. Let’s say we want to apply a function (x_square()) to the sum of a specific variable and return the value. Then let’s further complicate the function by allowing it to sum 2 variables, take the log of the result, and divide the value if necessary. Below is the code for such a function, called summing:
summing <- function(vec1, vec2 = NULL, FUN, log_val = FALSE, divisor_val = NULL){
FUN <- match.fun(FUN)
wk_vec <- c(vec1, vec2)
fun_sum_val <- FUN(sum(wk_vec))
lval <- NULL
if (isTRUE(log_val)){
lval <- log(fun_sum_val)
} else {
lval <- fun_sum_val
}
if (!is.null(divisor_val)){
dval <- divisor_val
} else {dval <- 1}
output <- lval/dval
return(output)
}
Now let’s try the function and compare its results with the direct calculations:
sum(mtcars$mpg)^2#> [1] 413320.4
summing(mtcars$mpg, FUN = x_square)#> [1] 413320.4
log(sum(c(mtcars$mpg,mtcars$disp))^2)#> [1] 17.98088
summing(mtcars$mpg, mtcars$disp, x_square, TRUE)#> [1] 17.98088
log(sum(c(mtcars$mpg,mtcars$disp))^2)/5#> [1] 3.596177
summing(mtcars$mpg, mtcars$disp, x_square, TRUE, 5)#> [1] 3.596177
*apply functions are used to iterate a function through a set of elements in a vector, matrix, or list. The process will return a vector or list depending on what is requested.
apply()
The apply() function is used to apply a function over the margins of an array or matrix. It iterates over the rows or columns, applies a function to the data, and returns a vector, array, or list as appropriate. To use the apply() function, you need to specify three arguments: X (the array), MARGIN (which margin to apply the function over, 1 for rows and 2 for columns), and FUN (the function).
Below we calculate the row means and column means using the apply function for a \(5 \times 4\) matrix containing the elements 1 through 20:
x <- matrix(1:20, nrow = 5, ncol = 4)
# Row Means
apply(x, 1, mean)#> [1] 8.5 9.5 10.5 11.5 12.5
# Col Means
apply(x, 2, mean)#> [1] 3 8 13 18
lapply()
The lapply() function is used to apply a function to all elements in a vector or list. The lapply() function will then return a list as the output.
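As a minimal sketch, the code below sums each element of a small list. The list parts and the name sums_list are hypothetical and only serve the illustration.

```r
parts <- list(a = 1:3, b = 4:6)   # hypothetical input list

# Apply sum() to each element; the result is a list with the same names.
sums_list <- lapply(parts, sum)
class(sums_list)
#> [1] "list"
sums_list[["b"]]
#> [1] 15
```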
sapply()
The sapply() function is used to apply a function to all elements in a vector or list. Afterwards, sapply() returns a “simplified” version of the list output. This could be a vector, matrix, or array.
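As a minimal sketch, the code below sums each element of a small list and shows that the result is simplified to a vector. The list parts and the name sums_vec are hypothetical.

```r
parts <- list(a = 1:3, b = 4:6)   # hypothetical input list

# Apply sum() to each element; sapply() simplifies the result to a named vector.
sums_vec <- sapply(parts, sum)
is.list(sums_vec)
#> [1] FALSE
sums_vec[["b"]]
#> [1] 15
```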
Anonymous functions are functions that are created temporarily, without a name, to conduct a task. They are commonly used with the *apply functions, with piping, or within other functions. To create an anonymous function, we use function() without assigning the result to a name.
For example, let x be a vector with the values 1 through 15. Let’s say we want to apply the function \(f(x) = x^2+\ln(x) + e^x/x!\). Because these operations are vectorized, we can evaluate the expression directly on x:
x <- 1:15
x^2 + log(x) + exp(x)/factorial(x)#> [1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
#> [7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
#> [13] 171.565020 198.639071 227.708053
Let’s say we could not do that and needed to evaluate the function for each value of x separately. We can use the sapply() function with an anonymous function:
sapply(x, function(x) x^2 + log(x) + exp(x) / factorial(x))#> [1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
#> [7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
#> [13] 171.565020 198.639071 227.708053
R 4.1.0 introduced a shorthand for creating functions. You can create a function using the \() expression, specifying the arguments for your function within the parentheses. Reworking the previous code, we can use \() instead of function():
sapply(x, \(x) x^2 + log(x) + exp(x)/factorial(x))#> [1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
#> [7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
#> [13] 171.565020 198.639071 227.708053
sapply(x, \(.) .^2 + log(.) + exp(.)/factorial(.))#> [1] 3.718282 8.387675 13.446202 19.661217 27.846214 38.352077
#> [7] 51.163496 66.153374 83.219555 102.308655 123.399395 146.485246
#> [13] 171.565020 198.639071 227.708053
Notice that the argument of the anonymous function can have any valid name.
This section provides the basic components to script an R file.
A comment is used to describe your code within an R script. To comment your code in R, use the # key; R will not execute anything after the symbol on that line. The # can be placed anywhere in the line, from the beginning to midway.
Additionally, commenting is a great way to debug long scripts of code or functions. You can comment out certain lines to see which ones produce errors. This lets you test code line by line without having to delete anything.
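A short sketch of the three uses described above: a full-line comment, a trailing comment, and a line that has been commented out. The variable total is a placeholder for the illustration.

```r
# A full-line comment: R ignores this entire line.
total <- 1 + 2   # A trailing comment: only the code before the # runs.

# Commenting out a line disables it without deleting it:
# total <- total * 100
total
#> [1] 3
```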
When writing a script, it is important to follow a basic structure so that you can follow your code. While this structure can be anything, the sections below contain my main recommendations for writing a script. The most important part is the Beginning of the Script section.
Load any R packages, functions/scripts, and data that you will need for the analysis. It is also recommended to record the date the script is being executed.
## Today's date
analysis_data <- format(Sys.time(),"%Y-%m-%d-%H-%M")
## R Packages
library(tidyverse)
library(magrittr)
## Functions
source("fxs.R")
Rcpp::sourceCpp("fxs.cpp")
## Data
df1 <- read_csv("file.csv")
df2 <- load("file.RData") %>% get
Run the analysis, including pre and post analysis.
## Pre Analysis
df1_prep <- Prep_data(df1)
df2_prep <- Prep_data(df2)
## Analysis
df1_analysis <- analyze(df1_prep)
df2_analysis <- analyze(df2_prep)
## Post Analysis
df1_post <- Prep_post(df1_analysis)
df2_post <- Prep_post(df2_analysis)
Save your results in an R Data file:
## Save Results
res <- list(df1 = list(pre = df1_prep,
analysis = df1_analysis,
post = df1_post),
df2 = list(pre = df2_prep,
analysis = df2_analysis,
post = df2_post))
file_name <- paste0("results_", analysis_data, ".RData")
save(res, file = file_name)
In R, pipes are used to transfer the output from one function to the input of another function. Piping allows you to chain functions to run an analysis. Since R 4.1.0, there are two kinds of pipes: the base R pipe and the pipes from the magrittr package. The table below provides a brief description of each type of pipe.
| Pipe | Name | Package | Description |
|---|---|---|---|
| \|> | R Pipe | Base | This pipe will use the output of the previous function as the input for the first argument of the following function. |
| %>% | Forward Pipe | magrittr | The forward pipe will use the output of the previous function as the input of the following function. |
| %$% | Exposition Pipe | magrittr | The exposition pipe will expose the named elements of an R object (or output) to the following function. |
| %T>% | Tee Pipe | magrittr | The Tee pipe will evaluate the next function using the output of the previous function, but it will discard the output of the next function and pass along the output of the previous function. |
| %<>% | Assignment Pipe | magrittr | The assignment pipe will overwrite the object that is being piped with the output of the next function. |
When using the pipe, it is recommended to string together only a limited number of functions (~10) to maintain code readability and conciseness. Any more functions may make the code incoherent.
If you plan to use magrittr’s pipes, it is recommended to load the magrittr package instead of the tidyverse package.
library(magrittr)
|>
The base pipe will use the output from the first function as the input of the first argument in the second function. Below, we obtain the mpg variable from mtcars and pipe it into the mean() function.
mtcars$mpg |> mean()#> [1] 20.09062
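Because each |> passes its left-hand side to the first argument of the next call, several calls can be chained. The sketch below is only an illustration: the choice of sqrt() and the names piped and nested are assumptions, and it requires R 4.1.0 or later.

```r
# Chained version: read left to right
piped <- mtcars$mpg |> sqrt() |> mean()

# Equivalent nested version: read inside out
nested <- mean(sqrt(mtcars$mpg))

identical(piped, nested)
#> [1] TRUE
```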
%>%
Magrittr’s pipe is the equivalent of base R’s pipe, with some extra functionality. Below we repeat the same code as before:
mtcars$mpg %>% mean()#> [1] 20.09062
Alternatively, we do not have to type the parentheses in the second function:
mtcars$mpg %>% mean#> [1] 20.09062
Below is another example where we pipe the value 3 into rep() with times = 5. This will repeat the value 3 five times:
3 %>% rep(5)#> [1] 3 3 3 3 3
If we are interested in piping the output to an argument other than the first, we can use the . placeholder in the second function to indicate which argument should take the previous output. Below, we repeat the vector c(1, 2) three times because the . is in the second argument:
3 %>% rep(c(1,2), .)#> [1] 1 2 1 2 1 2
You can use %>% and . to create a unary function, that is, a function with one argument. The following code creates a new function called logsqrt() which evaluates \(\sqrt{\log_{10}(x)}\):
logsqrt <- . %>% log(base = 10) %>% sqrt
logsqrt(10000)#> [1] 2
sqrt(log10(10000))#> [1] 2
%$%
The exposition pipe will expose the named elements of an object or output to the following function. For example, we will pipe mtcars into the lm() function. However, we will use the %$% pipe to access the variables in the data frame for the formula= argument without having to specify the data= argument:
mtcars %$% lm(mpg ~ hp)#>
#> Call:
#> lm(formula = mpg ~ hp)
#>
#> Coefficients:
#> (Intercept) hp
#> 30.09886 -0.06823
%T>%
The Tee pipe will pipe the contents of the previous function into the following function, but will retain the previous function’s output instead of the current function’s. For example, we use the Tee pipe to push the results from the lm() function to print out the summary table, then use the same lm() results to print out the residual standard error:
x_lm <- mtcars %$% lm(mpg ~ hp) %T>%
(\(x) print(summary(x))) %T>%
(\(x) print(sigma(x)))#>
#> Call:
#> lm(formula = mpg ~ hp)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.7121 -2.1122 -0.8854 1.5819 8.2360
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
#> hp -0.06823 0.01012 -6.742 1.79e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.863 on 30 degrees of freedom
#> Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
#>
#> [1] 3.862962
Below is a list of recommended keyboard shortcuts:
| Shortcut | Windows/Linux | Mac |
|---|---|---|
| %>% | Ctrl+Shift+M | Cmd+Shift+M |
| Run Current Line | Ctrl+Enter | Cmd+Return |
| Run Current Chunk | Ctrl+Shift+Enter | Cmd+Shift+Enter |
| Knit Document | Ctrl+Shift+K | Cmd+Shift+K |
| Add Cursor Below | Ctrl+Alt+Down | Cmd+Alt+Down |
| Comment Line | Ctrl+Shift+C | Cmd+Shift+C |
It is recommended to modify these keyboard shortcuts in RStudio:
| Shortcut | Windows/Linux | Mac |
|---|---|---|
| %in% | Ctrl+Shift+I | Cmd+Shift+I |
| %$% | Ctrl+Shift+D | Cmd+Shift+D |
| %T>% | Ctrl+Shift+T | Cmd+Shift+T |
Note you will need to install the extraInserts package:
remotes::install_github('konradzdeb/extraInserts')