```{r setup, include=FALSE}
library(tidyverse)
adv_plot <- ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(am),
             labeller = as_labeller(c(`1` = "Manual",
                                      `0` = "Automatic"))) +
  ggtitle("Mtcars Plot") +
  xlab("Miles Per Gallon") + ylab("Horse Power") +
  scale_color_discrete(labels = c("V-Shaped", "Straight"),
                       name = "") +
  theme_bw()
```
## Presentation Online
Presentation:
[www.inqs.info/files/hiss_3/hiss_3.html](https://www.inqs.info/files/hiss_3/hiss_3.html)
QMD:
[www.inqs.info/files/hiss_3/hiss_3.qmd](https://www.inqs.info/files/hiss_3/hiss_3.qmd)
Website:
[www.inqs.info](https://www.inqs.info)
Email:
iquin002\@ucr.edu
# Data Cleaning
## dplyr
- Known as the Grammar of Data Manipulation
- [dplyr.tidyverse.org](https://dplyr.tidyverse.org/)
## dplyr Functions
- `mutate()` adds new variables
- `select()` selects variables
- `filter()` filters data
- `if_else()` returns one of two values depending on whether a condition is met
- `group_by()` groups a data set by one or more variables
- `summarise()` computes summary statistics (one row per group)
## tidyr
- Used to create tidy data
- [tidyr.tidyverse.org](https://tidyr.tidyverse.org/)
## tidyr Functions
- `pivot_longer()` (supersedes `gather()`) transforms the data from wide to long
- `pivot_wider()` (supersedes `spread()`) transforms the data from long to wide
- `separate()` splits one variable into multiple variables
- `unite()` merges multiple variables into one variable
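
## tidyr Functions

`separate()` and `unite()` undo each other. A minimal sketch on a small hypothetical tibble (`toy`, `id`, and `measurement` are made-up names, not part of the course data):

```{r}
library(tidyr)
library(tibble)

# Hypothetical data: one column that stores two pieces of information
toy <- tibble(id = 1:2, measurement = c("v1/mean", "v2/median"))

# Split "v1/mean" into time = "v1" and stat = "mean"
split_toy <- separate(toy, col = measurement, into = c("time", "stat"), sep = "/")
split_toy

# Glue time and stat back together with "/"
rejoined <- unite(split_toy, col = "measurement", time, stat, sep = "/")
rejoined
```

`unite()` places the new column where the first united column was, so `rejoined` has the same column order as `toy`.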
## Pipe Operator `%>%`
- The pipe operator is the real power of the tidyverse.
- It passes the output of one function as the first argument of the next function.
- Tidyverse functions work best when data frames (tibbles) are used as inputs.
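
## Pipe Operator `%>%`

A minimal check of what the pipe does: `x %>% f(y)` is evaluated as `f(x, y)`, so the nested and piped versions below return the same rows. Base R (4.1 and later) also has a native pipe, `|>`, which behaves the same way in simple cases like this one.

```{r}
library(dplyr)

# Nested call: read from the inside out
nested <- head(filter(mtcars, cyl == 4), n = 3)

# Piped call: read from left to right
piped <- mtcars %>%
  filter(cyl == 4) %>%
  head(n = 3)

# Native pipe (R >= 4.1)
native <- mtcars |>
  filter(cyl == 4) |>
  head(n = 3)

identical(nested, piped)
identical(piped, native)
```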
## Data Set
- We will work on manipulating the `mtcars` data set
- The code below prints the first three rows:
::: fragment
```{r}
mtcars %>%
  head(n = 3)
```
:::
## `mutate()`
- Adds a new variable to a data frame
- Example:
::: fragment
```{r}
#| code-line-numbers: "|2"
mtcars %>%
  mutate(log_mpg = log(mpg)) %>%
  head(n = 3)
```
:::
## `mutate()`
- Each argument adds a new variable
- Example:
::: fragment
```{r}
#| code-line-numbers: "|2"
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  head(n = 3)
```
:::
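
## `mutate()`

Arguments are evaluated in order, so a later argument can use a variable created earlier in the same call. A small sketch (the name `log_mpg_sq` is only for illustration):

```{r}
library(dplyr)

mutated <- mtcars %>%
  mutate(log_mpg = log(mpg),
         # log_mpg already exists at this point, so it can be reused
         log_mpg_sq = log_mpg^2) %>%
  select(mpg, log_mpg, log_mpg_sq)

head(mutated, n = 3)
```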
## `select()`
- Selects the variables to keep in the data frame
- Example:
::: fragment
```{r}
#| code-line-numbers: "|3"
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  head(n = 3)
```
:::
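
## `select()`

`select()` also accepts selection helpers and negative selection. As a sketch, `starts_with()` keeps columns whose names begin with a given string, and a minus sign drops a column:

```{r}
library(dplyr)

# Keep columns whose names start with "d" (disp and drat in mtcars)
d_cols <- mtcars %>% select(starts_with("d"))
names(d_cols)

# Drop mpg and keep everything else
no_mpg <- mtcars %>% select(-mpg)
ncol(no_mpg)
```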
## `filter()`
- Selects observations that satisfy a condition
- Example:
::: fragment
```{r}
#| code-line-numbers: "|4"
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  filter(log_hp < 5) %>%
  head(n = 3)
```
:::
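
## `filter()`

Conditions separated by commas are combined with AND, so the two calls below keep the same rows (4-cylinder cars with a manual transmission, `am == 1`). Use `|` for OR:

```{r}
library(dplyr)

comma_version <- mtcars %>% filter(cyl == 4, am == 1)
and_version   <- mtcars %>% filter(cyl == 4 & am == 1)
identical(comma_version, and_version)

# OR: cars with either 4 cylinders or more than 200 hp
mtcars %>% filter(cyl == 4 | hp > 200) %>% head(n = 3)
```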
## `if_else()`
- Returns one value when a condition is `TRUE` and another when it is `FALSE` (here 1 and 0)
- Example:
::: fragment
```{r}
#| code-line-numbers: "|5"
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  filter(log_hp < 5) %>%
  mutate(hilhp = if_else(log_hp > mean(log_hp), 1, 0)) %>%
  head(n = 3)
```
:::
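
## `if_else()`

`if_else()` is vectorized: it checks the condition for every element and picks the matching element of `true` or `false`. Unlike base `ifelse()`, it requires `true` and `false` to have compatible types, and its `missing` argument sets the result where the condition is `NA`. A small sketch on made-up values:

```{r}
library(dplyr)

x <- c(100, 150, NA, 250)

# 1 when x > 120, 0 otherwise, and -1 where x is missing
flag <- if_else(x > 120, 1, 0, missing = -1)
flag
```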
## `group_by()`
- Groups the data frame by one or more variables, so that later steps such as `summarise()` operate within each group
- Example:
::: fragment
```{r}
#| code-line-numbers: "|6"
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  filter(log_hp < 5) %>%
  mutate(hilhp = if_else(log_hp > mean(log_hp), 1, 0)) %>%
  group_by(hilhp) %>%
  head(n = 3)
```
:::
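
## `group_by()`

A grouped data frame prints almost the same as before; grouping changes how later verbs behave. For example, `mean()` inside `mutate()` is computed within each group. A sketch that centers `mpg` within each cylinder group (`mpg_centered` is an illustrative name):

```{r}
library(dplyr)

centered <- mtcars %>%
  group_by(cyl) %>%
  # mean(mpg) is the mean of the current cyl group, not of all cars
  mutate(mpg_centered = mpg - mean(mpg)) %>%
  ungroup()

centered %>% select(cyl, mpg, mpg_centered) %>% head(n = 3)
```

`ungroup()` removes the grouping so that later steps are not computed per group by accident.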
## `summarise()`
- Creates summary statistics for each group (or for the whole data frame if it is not grouped)
- Example:
::: fragment
```{r}
#| code-line-numbers: "|7-8"
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  filter(log_hp < 5) %>%
  mutate(hilhp = if_else(log_hp > mean(log_hp), 1, 0)) %>%
  group_by(hilhp) %>%
  summarise(mean_mpg = mean(mpg), mean_lmpg = mean(log_mpg),
            sd_mpg = sd(mpg), sd_lmpg = sd(log_mpg)) %>%
  head(n = 3)
```
:::
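
## `summarise()`

`n()` counts the rows in each group, a useful check next to other summaries. As a sketch, grouping by `cyl` returns one row per cylinder count:

```{r}
library(dplyr)

cyl_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(),
            mean_mpg = mean(mpg),
            sd_mpg = sd(mpg))
cyl_summary
```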
# Wide to Long Example
## Wide to Long Data Example
We convert data from wide to long format using functions in the tidyr package. Many statistical analyses require long data.
## Load Data
Use `read_csv()` to read `data_3_4.csv` into an object called `data1`:
```{r,message=F}
data1 <- read_csv(file="http://www.inqs.info/files/hiss_3/data_3_4.csv")
```
## Wide Data
```{r}
#| echo: false
names(data1)
head(data1)
```
## Long Data
```{r,include=FALSE}
data1_long <- data1 %>%
  pivot_longer(cols = `v1/mean`:`v4/median`,
               names_to = "measurement", values_to = "value") %>%
  separate(measurement, c("time", "stat"), sep = "/") %>%
  pivot_wider(names_from = stat, values_from = value)
```
```{r}
#| echo: false
head(data1_long, n = 10)
```
## `pivot_longer()`
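- The `pivot_longer()` function takes the columns that repeat a measurement within an observation and stacks them into two columns, one holding the original column name (`measurement`) and one holding the value (`value`):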
::: fragment
```{r}
#| code-line-numbers: "|2"
data1 %>%
  pivot_longer(cols = `v1/mean`:`v4/median`, names_to = "measurement", values_to = "value") %>%
  head()
```
:::
## `separate()`
- The `separate()` function splits one variable into multiple variables:
::: fragment
```{r}
#| code-line-numbers: "|3"
data1 %>%
  pivot_longer(cols = `v1/mean`:`v4/median`, names_to = "measurement", values_to = "value") %>%
  separate(col = measurement, into = c("time", "stat"), sep = "/") %>%
  head()
```
:::
## `pivot_wider()`
- The `pivot_wider()` function then converts the long data back to wide data, creating one column for each value of `stat`:
::: fragment
```{r}
#| code-line-numbers: "|4"
data1 %>%
  pivot_longer(cols = `v1/mean`:`v4/median`, names_to = "measurement", values_to = "value") %>%
  separate(col = measurement, into = c("time", "stat"), sep = "/") %>%
  pivot_wider(names_from = stat, values_from = value) %>%
  head()
```
:::
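
## Wide to Long Data Example

Because `data_3_4.csv` is read from the web, here is a self-contained sketch of the same three steps on a small hypothetical wide tibble that uses the same `time/stat` column naming (the `id` column and all values are made up):

```{r}
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per id, one column per time/stat pair
wide <- tibble(id = 1:2,
               `v1/mean` = c(1.0, 2.0), `v1/median` = c(1.5, 2.5),
               `v2/mean` = c(3.0, 4.0), `v2/median` = c(3.5, 4.5))

long <- wide %>%
  # Stack every time/stat column into name/value pairs
  pivot_longer(cols = `v1/mean`:`v2/median`,
               names_to = "measurement", values_to = "value") %>%
  # Split "v1/mean" into time = "v1" and stat = "mean"
  separate(col = measurement, into = c("time", "stat"), sep = "/") %>%
  # One column per statistic, one row per id and time
  pivot_wider(names_from = stat, values_from = value)
long
```

`pivot_longer()` can also do the split on its own by passing `names_to = c("time", "stat")` together with `names_sep = "/"`.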
# Graphics
## ggplot2
- Based on the Grammar of Graphics
- [ggplot2.tidyverse.org](https://ggplot2.tidyverse.org/)
## Basics
- ggplot2 creates a plot by layering graphical elements on top of a base plot
- A base plot is created from the data
- The data must be a data frame or tibble
- Additional layers are added to the base plot with the `+` operator
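
## Basics

A ggplot is an ordinary R object, so a base plot can be stored and layers added step by step. `layer_data()` returns the data ggplot2 computed for a given layer, which is a handy way to check what a layer contains. A minimal sketch:

```{r}
library(ggplot2)

# Base plot: data and aesthetic mapping only, no layers yet
p <- ggplot(mtcars, aes(x = mpg, y = hp))

# Each + returns a new plot object with one more layer
p_points <- p + geom_point()
p_both <- p_points + geom_line()

# Data computed for the first layer: one row per car
nrow(layer_data(p_both, 1))
p_both
```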
## Using ggplot2
- Create Base Plot
- Add Geometric Elements
- Customize Plot
- Google
## Base Plot
- A base plot is created using `ggplot()`
  - `data`: specifies the data frame used to construct the base plot
  - `mapping`: specifies the aesthetic mapping for the plot
    - `aes()`: creates the aesthetic mapping
::: fragment
```{r}
base_plot <- ggplot(mtcars, aes(x=mpg))
```
:::
## Base Plot
```{r}
base_plot
```
## Univariate
- Histograms
  - `geom_histogram()`
- Density Plots
  - `geom_density()`
- QQ Plots
  - `geom_qq()`
  - `geom_qq_line()`
## Histograms
```{r}
base_plot + geom_histogram()
```
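
## Histograms

By default `geom_histogram()` uses 30 bins and prints a message suggesting a better value. Setting `binwidth` (or `bins`) explicitly removes the message; a width of 2 mpg is an arbitrary choice for this sketch:

```{r}
library(ggplot2)

hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, color = "white")
hist_plot
```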
## Density Plot
```{r}
base_plot + geom_density()
```
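
## Density Plot

A histogram and a density curve can share one panel if the bars are drawn on the density scale with `after_stat(density)` (available in ggplot2 3.3.0 and later). A sketch, again with an arbitrary bin width:

```{r}
library(ggplot2)

overlay_plot <- ggplot(mtcars, aes(x = mpg)) +
  # Bar heights become densities, so they match the curve's scale
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "grey80", color = "white") +
  geom_density()
overlay_plot
```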
## QQ Plot
```{r}
ggplot(mtcars, aes(sample = mpg)) +
  geom_qq() +
  geom_qq_line()
```
## Bivariate
- Scatter Plot
  - `geom_point()`
- Line Plot
  - `geom_line()`
## Bivariate Base Plot
```{r}
base_plot2 <- ggplot(mtcars, aes(x=mpg, y = hp))
base_plot2
```
## Scatter Plot
```{r}
base_plot2 + geom_point()
```
## Line Plot
```{r}
base_plot2 + geom_line()
```
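
## Line Plot

`geom_line()` connects points in order of the x variable, while `geom_path()` connects them in the order the rows appear in the data. For scatter-type data such as `mpg` versus `hp` the two look quite different; this sketch builds both and draws the `geom_path()` version:

```{r}
library(ggplot2)

line_plot <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_line()  # sorted by mpg
path_plot <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_path()  # row order
path_plot
```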
## Line & Scatter Plot
```{r}
base_plot2 +
  geom_point() +
  geom_line()
```
## Special Cases
::: columns
::: {.column width="50%"}
### Bivariate
- Heat Map
  - `geom_bin2d()`
- Contour Map
  - `geom_density_2d()`
:::
::: {.column width="50%"}
### Trivariate
- Filled Contour Map
  - `geom_contour_filled()`
- Contour Map
  - `geom_contour()`
:::
:::
## Heat Map
```{r}
base_plot2 + geom_bin2d()
```
## Contour Map
```{r}
base_plot2 +
  geom_density_2d()
```
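
## Contour Map

The trivariate geoms expect a third variable mapped to `z` on a regular x/y grid. `mtcars` has no such grid, so this sketch uses `faithfuld`, a gridded 2D density estimate of the Old Faithful geyser data that ships with ggplot2:

```{r}
library(ggplot2)

# faithfuld columns: eruptions, waiting, and a density value (used as z)
contour_plot <- ggplot(faithfuld, aes(x = waiting, y = eruptions, z = density)) +
  geom_contour_filled() +          # filled bands between contour levels
  geom_contour(color = "black")    # contour lines drawn on top
contour_plot
```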
## Trend Lines
- Regression Line
  - `geom_smooth(method = "lm")`
- LOESS
  - `geom_smooth()` (the default smoother when there are fewer than 1,000 observations)
## Regression Line
```{r}
base_plot2 +
  geom_point() +
  geom_smooth(method = "lm")
```
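
## Regression Line

`geom_smooth(method = "lm")` prints a message about the formula it uses and draws a confidence band by default. Passing `formula = y ~ x` and `se = FALSE` makes both choices explicit. The fitted line is evaluated at 80 points by default (the `n` argument), and those points should match `lm()` fitted directly:

```{r}
library(ggplot2)

lm_plot <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
lm_plot
```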
## LOESS Line
```{r}
base_plot2 +
  geom_point() +
  geom_smooth()
```
## Grouping Plots
- Faceting: splits the data into subsets by a categorical variable and draws one panel per subset
  - `facet_grid()`
  - `facet_wrap()`
- Grouping can also be done within the mapping function `aes()`
  - `color`
  - `group`
  - `shape`
## Facet
```{r}
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  facet_grid(vars(cyl))
```
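
## Facet

`facet_grid()` lays panels out in rows and/or columns defined by variables, while `facet_wrap()` takes the panels of one variable and wraps them into a grid. A sketch of both (only the wrapped version is drawn):

```{r}
library(ggplot2)

# Rows = transmission (am), columns = number of cylinders
grid_plot <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  facet_grid(rows = vars(am), cols = vars(cyl))

# One panel per cylinder count, wrapped into 2 columns
wrap_plot <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  facet_wrap(vars(cyl), ncol = 2)
wrap_plot
```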
## Mapping
```{r}
ggplot(mtcars, aes(x = mpg, y = hp, col = factor(cyl))) +
  geom_point()
```
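
## Mapping

Several aesthetics can be mapped at once. In this sketch `color` shows the number of cylinders and `shape` shows the transmission; wrapping the numeric codes in `factor()` makes ggplot2 treat them as discrete groups:

```{r}
library(ggplot2)

shape_plot <- ggplot(mtcars, aes(x = mpg, y = hp,
                                 color = factor(cyl),
                                 shape = factor(am))) +
  geom_point(size = 2)
shape_plot
```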
## Customization
- Title
  - `ggtitle()`
- Labels
  - X Label: `xlab()`
  - Y Label: `ylab()`
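
## Customization

`labs()` sets the title and axis labels in one call and can also name a legend by its aesthetic (here `color`). This sketch should give the same title and axis labels as calling `ggtitle()`, `xlab()`, and `ylab()` separately:

```{r}
library(ggplot2)

labelled_plot <- ggplot(mtcars, aes(x = mpg, y = hp, color = factor(cyl))) +
  geom_point() +
  labs(title = "Mtcars Plot",
       x = "Miles Per Gallon",
       y = "Horse Power",
       color = "Cylinders")
labelled_plot
```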
## Themes
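- The `theme()` function allows you to change any non-data component of the plot (fonts, backgrounds, grid lines, legend position, etc.)
- ggplot2 has several prebuilt themes:
  - `theme_bw()`
  - `theme_void()`
- Legends can be adjusted using the `scale_XX_YY()` functions, e.g. `scale_color_discrete()`
  - `XX`: the aesthetic being scaled (e.g. `color`, `fill`, `shape`)
  - `YY`: the type of scale (e.g. `discrete`, `continuous`, `manual`)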
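
## Themes

A sketch combining both ideas: `scale_color_manual()` (aesthetic `color`, scale type `manual`) sets the legend title, labels, and colors, and `theme()` is added after `theme_bw()` to move the legend. The colors and the legend title "Engine" are arbitrary choices:

```{r}
library(ggplot2)

themed_plot <- ggplot(mtcars, aes(x = mpg, y = hp, color = factor(vs))) +
  geom_point() +
  scale_color_manual(name = "Engine",
                     values = c("0" = "steelblue", "1" = "darkorange"),
                     labels = c("V-Shaped", "Straight")) +
  theme_bw() +
  # theme() goes after theme_bw(); a complete theme added later would reset it
  theme(legend.position = "bottom")
themed_plot
```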
## Advanced Example {auto-animate="true"}
![](adv_plot.png){fig-align="center"}
## Advanced Example {auto-animate="true"}
::: columns
::: {.column width="50%"}
![](adv_plot.png)
:::
::: {.column width="50%"}
- Base Plot
- Scatter Plot
- Add Regression Line
- Split the Plot
- Change the Labels
- Adjust the Legend
- Change the Theme
:::
:::
## Plot Code {auto-animate="true" visibility="uncounted"}
::: columns
::: {.column width="60%"}
```{r}
#| eval: false
ggplot(mtcars,
       aes(mpg, hp,
           color = factor(vs)))
```
:::
::: {.column width="40%"}
```{r}
#| echo: false
ggplot(mtcars, aes(mpg, hp, color = factor(vs)))
```
:::
:::
## Plot Code {auto-animate="true" visibility="uncounted"}
::: columns
::: {.column width="60%"}
```{r}
#| eval: false
ggplot(mtcars,
       aes(mpg, hp,
           color = factor(vs))) +
  geom_point()
```
:::
::: {.column width="40%"}
```{r}
#| echo: false
ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
geom_point()
```
:::
:::
## Plot Code {auto-animate="true" visibility="uncounted"}
::: columns
::: {.column width="60%"}
```{r}
#| eval: false
ggplot(mtcars,
       aes(mpg, hp,
           color = factor(vs))) +
  geom_point() +
  geom_smooth(method = "lm")
```
:::
::: {.column width="40%"}
```{r}
#| echo: false
ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
geom_point()+
geom_smooth(method = "lm")
```
:::
:::
## Plot Code {auto-animate="true" visibility="uncounted"}
::: columns
::: {.column width="60%"}
```{r}
#| eval: false
ggplot(mtcars,
       aes(mpg, hp,
           color = factor(vs))) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(am),
             labeller = as_labeller(c(
               `1` = "Manual",
               `0` = "Automatic")))
```
:::
::: {.column width="40%"}
```{r}
#| echo: false
ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
geom_point()+
geom_smooth(method = "lm") +
facet_grid(cols = vars(am),
labeller = as_labeller(c(`1` = "Manual",
`0` = "Automatic")))
```
:::
:::
## Plot Code {auto-animate="true" visibility="uncounted"}
::: columns
::: {.column width="60%"}
```{r}
#| eval: false
ggplot(mtcars,
       aes(mpg, hp,
           color = factor(vs))) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(am),
             labeller = as_labeller(c(
               `1` = "Manual",
               `0` = "Automatic"))) +
  ggtitle("Mtcars Plot") +
  xlab("Miles Per Gallon") +
  ylab("Horse Power")
```
:::
::: {.column width="40%"}
```{r}
#| echo: false
ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
geom_point()+
geom_smooth(method = "lm") +
facet_grid(cols = vars(am),
labeller = as_labeller(c(`1` = "Manual",
`0` = "Automatic"))) +
ggtitle("Mtcars Plot") +
xlab("Miles Per Gallon") +
ylab("Horse Power")
```
:::
:::
## Plot Code {auto-animate="true" visibility="uncounted"}
::: columns
::: {.column width="60%"}
```{r}
#| eval: false
ggplot(mtcars,
       aes(mpg, hp,
           color = factor(vs))) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(am),
             labeller = as_labeller(c(
               `1` = "Manual",
               `0` = "Automatic"))) +
  ggtitle("Mtcars Plot") +
  xlab("Miles Per Gallon") +
  ylab("Horse Power") +
  scale_color_discrete(
    labels = c("V-Shaped", "Straight"),
    name = "")
```
:::
::: {.column width="40%"}
```{r}
#| echo: false
ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
geom_point()+
geom_smooth(method = "lm") +
facet_grid(cols = vars(am),
labeller = as_labeller(c(
`1` = "Manual",
`0` = "Automatic"))) +
ggtitle("Mtcars Plot") +
xlab("Miles Per Gallon") +
ylab("Horse Power")+
scale_color_discrete(
labels = c("V-Shaped", "Straight"),
name = "")
```
:::
:::
## Plot Code {auto-animate="true" visibility="uncounted"}
::: columns
::: {.column width="60%"}
```{r}
#| eval: false
ggplot(mtcars,
       aes(mpg, hp,
           color = factor(vs))) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(cols = vars(am),
             labeller = as_labeller(c(
               `1` = "Manual",
               `0` = "Automatic"))) +
  ggtitle("Mtcars Plot") +
  xlab("Miles Per Gallon") +
  ylab("Horse Power") +
  scale_color_discrete(
    labels = c("V-Shaped", "Straight"),
    name = "") +
  theme_bw()
```
:::
::: {.column width="40%"}
```{r}
#| echo: false
ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
geom_point()+
geom_smooth(method = "lm") +
facet_grid(cols = vars(am),
labeller = as_labeller(c(
`1` = "Manual",
`0` = "Automatic"))) +
ggtitle("Mtcars Plot") +
xlab("Miles Per Gallon") +
ylab("Horse Power")+
scale_color_discrete(
labels = c("V-Shaped", "Straight"),
name = "") +
theme_bw()
```
:::
:::
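
## Saving the Plot

The advanced example is displayed from an image file, `adv_plot.png`. The slides do not show how that file was made; one common way to write a plot to disk is `ggsave()`. This sketch rebuilds a shortened version of the plot and saves it to a temporary file; the file name, size, and resolution are illustrative assumptions:

```{r}
library(ggplot2)

plot_to_save <- ggplot(mtcars, aes(mpg, hp, color = factor(vs))) +
  geom_point() +
  theme_bw()

# Hypothetical output location; a real project would use a path such as "adv_plot.png"
out_file <- file.path(tempdir(), "adv_plot_example.png")
ggsave(out_file, plot = plot_to_save, width = 7, height = 4, dpi = 150)
file.exists(out_file)
```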
## Final Thoughts
- Google is your friend!
- Practice!
- Read the documentation!
- Utilize Cheatsheets!
## Resources
-
-
-
-