Week 8: Wrangling Data with dplyr
Enter the Tidyverse and learn essential data manipulation verbs.
Core `dplyr` Verbs
`dplyr` revolves around a few key verbs (functions) that take a data frame as the first argument and return a modified data frame. We'll use the built-in `iris` dataset for examples.
# Load dplyr if not already loaded
library(dplyr)
# View the structure of the iris dataset
str(iris)
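As a quick check of the "data frame in, data frame out" pattern described above, the sketch below calls one verb and inspects the result. The variable name `result` is just an illustrative choice; the point is that the verb returns a new data frame and leaves `iris` itself untouched.

```r
library(dplyr)

# Call a verb with the data frame as the first argument
result <- filter(iris, Sepal.Length > 7)

# The return value is itself a data frame...
is.data.frame(result)

# ...and the original dataset still has all of its rows
nrow(iris)
nrow(result)
```

Because nothing is modified in place, you need to assign the result to a name (as done here) if you want to keep it.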
1. `filter()`: Subset rows based on conditions.
Selects rows that meet specified logical criteria.
# Filter for rows where Species is 'setosa'
setosa_data <- filter(iris, Species == "setosa")
head(setosa_data)
# Filter for rows where Sepal.Length is greater than 7
long_sepals <- filter(iris, Sepal.Length > 7)
head(long_sepals)
# Filter with multiple conditions (Species is virginica AND Petal.Width > 2)
virginica_wide <- filter(iris, Species == "virginica" & Petal.Width > 2)
head(virginica_wide)
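A few other common ways to write conditions in `filter()` are sketched below. The object names (`virginica_wide2`, `setosa_or_wide`, `two_species`) are illustrative only. Separating conditions with a comma has the same effect as joining them with `&`, `|` means OR, and `%in%` tests membership in a set of values.

```r
library(dplyr)

# Comma-separated conditions are combined with AND (same as using &)
virginica_wide2 <- filter(iris, Species == "virginica", Petal.Width > 2)

# Use | for OR: rows that are setosa OR have wide petals
setosa_or_wide <- filter(iris, Species == "setosa" | Petal.Width > 2)

# Use %in% to match any of several values
two_species <- filter(iris, Species %in% c("setosa", "versicolor"))

head(two_species)
```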
2. `select()`: Subset columns by name.
Keeps or removes columns.
# Select only the Sepal.Length and Species columns
sepal_species <- select(iris, Sepal.Length, Species)
head(sepal_species)
# Select all columns *except* Species
no_species <- select(iris, -Species)
head(no_species)
# Select columns starting with "Petal"
petal_cols <- select(iris, starts_with("Petal"))
head(petal_cols)
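`select()` supports a few more selection styles beyond the ones above. The following sketch shows a column range with `:`, the `ends_with()` helper, and `everything()` for moving a column to the front. The object names are illustrative.

```r
library(dplyr)

# Select a range of adjacent columns, from Sepal.Length through Petal.Length
first_three <- select(iris, Sepal.Length:Petal.Length)
names(first_three)

# Select columns whose names end with "Width"
width_cols <- select(iris, ends_with("Width"))
names(width_cols)

# Move Species to the front and keep all remaining columns after it
species_first <- select(iris, Species, everything())
names(species_first)
```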
3. `arrange()`: Reorder rows based on column values.
Sorts the rows.
# Arrange rows by Petal.Length (ascending by default)
arranged_data <- arrange(iris, Petal.Length)
head(arranged_data)
# Arrange rows by Petal.Length in descending order
arranged_desc <- arrange(iris, desc(Petal.Length))
head(arranged_desc)
# Arrange by Species, then by Sepal.Width within each species
arranged_multi <- arrange(iris, Species, Sepal.Width)
head(arranged_multi)
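Ascending and descending order can be mixed in a single `arrange()` call. As a small sketch (the name `widest_first` is illustrative), this sorts by Species alphabetically and then puts the widest sepals first within each species:

```r
library(dplyr)

# Species ascending, then Sepal.Width descending within each species
widest_first <- arrange(iris, Species, desc(Sepal.Width))

# The first rows are setosa flowers with the largest Sepal.Width values
head(widest_first)
```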
4. `mutate()`: Create new columns or modify existing ones.
Computes new variables and adds them as columns. Assigning to an existing column name overwrites that column.
# Create a new column 'Petal.Area'
iris_with_area <- mutate(iris, Petal.Area = Petal.Length * Petal.Width)
head(iris_with_area)
# Create multiple new columns
iris_mod <- mutate(iris,
  Petal.Area = Petal.Length * Petal.Width,
  Sepal.Area = Sepal.Length * Sepal.Width
)
head(iris_mod)
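The examples above only add columns. The sketch below shows the other two things `mutate()` can do: overwrite an existing column, and refer to a column created earlier in the same call. The names `iris_mm`, `iris_half`, and `Petal.Area.Half` are made up for illustration; the conversion assumes the `iris` measurements are in centimeters, as stated in the dataset's help page (`?iris`).

```r
library(dplyr)

# Overwrite an existing column: convert Sepal.Length from cm to mm
iris_mm <- mutate(iris, Sepal.Length = Sepal.Length * 10)
head(iris_mm)

# A column created earlier in the call can be used by later expressions
iris_half <- mutate(iris,
  Petal.Area = Petal.Length * Petal.Width,
  Petal.Area.Half = Petal.Area / 2
)
head(iris_half)
```

Note that overwriting a column changes only the returned data frame; the original `iris` still holds the centimeter values.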
These verbs form the foundation of data manipulation with `dplyr`. We'll cover `summarise()` and `group_by()` next week.
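To see how the four verbs fit together, here is one possible sketch that applies them in sequence, first with intermediate objects and then with the `%>%` pipe operator that `dplyr` makes available. The object names (`step1` to `step3`, `by_steps`, `by_pipe`) are illustrative, and the particular question asked (which virginica flowers have the largest petal area) is just an example.

```r
library(dplyr)

# Step by step, saving each intermediate result
step1 <- filter(iris, Species == "virginica")
step2 <- mutate(step1, Petal.Area = Petal.Length * Petal.Width)
step3 <- select(step2, Species, Petal.Area)
by_steps <- arrange(step3, desc(Petal.Area))

# The same sequence written with the pipe: each result is passed
# as the first argument of the next verb
by_pipe <- iris %>%
  filter(Species == "virginica") %>%
  mutate(Petal.Area = Petal.Length * Petal.Width) %>%
  select(Species, Petal.Area) %>%
  arrange(desc(Petal.Area))

head(by_pipe)
```

The pipe version works because every verb takes a data frame as its first argument and returns a data frame.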