Week 11: Polishing Plots and Basic Statistics
Learn to customize ggplot2 visualizations, save them, and perform basic statistical analysis in R.
Explore Chapter 11
Basic Statistical Functions in R
Beyond visualization, R is fundamentally a statistical programming language. Base R includes many functions for calculating common descriptive statistics.
These functions typically operate on numeric vectors. By default, most of them return `NA` if the input contains any missing value (`NA`), and `quantile()` stops with an error; pass `na.rm = TRUE` to drop missing values before the calculation.
data_vector <- c(1, 5, 2, 8, 3, NA, 7)
# Mean (Average)
mean(data_vector) # Output: [1] NA (because the vector contains an NA)
mean(data_vector, na.rm = TRUE) # Output: [1] 4.333333
# Median (Middle value)
median(data_vector, na.rm = TRUE) # Output: [1] 4
# Standard Deviation (sample, n - 1 denominator)
sd(data_vector, na.rm = TRUE) # Output: [1] 2.804758
# Variance (sample, n - 1 denominator)
var(data_vector, na.rm = TRUE) # Output: [1] 7.866667
# Minimum and Maximum
min(data_vector, na.rm = TRUE) # Output: [1] 1
max(data_vector, na.rm = TRUE) # Output: [1] 8
# Range (Minimum and Maximum)
range(data_vector, na.rm = TRUE) # Output: [1] 1 8
# Quantiles (e.g., 0%, 25%, 50%, 75%, 100%)
quantile(data_vector, na.rm = TRUE)
# Output:
#   0%  25%  50%  75% 100%
# 1.00 2.25 4.00 6.50 8.00
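Because `na.rm = TRUE` drops missing values silently, it can be worth checking how many there are before relying on the results. The following is a minimal sketch using base R's `is.na()`; the vector name `scores` is made up for illustration and simply repeats the values used above.

```r
# Hypothetical example vector (same values as data_vector above)
scores <- c(1, 5, 2, 8, 3, NA, 7)

# Count the missing values: is.na() returns TRUE/FALSE, and sum() counts the TRUEs
n_missing <- sum(is.na(scores))

# Positions of the missing values in the vector
missing_positions <- which(is.na(scores))

# Share of values that are missing (mean of a logical vector is a proportion)
prop_missing <- mean(is.na(scores))

# Drop the missing values explicitly instead of passing na.rm = TRUE each time
scores_complete <- scores[!is.na(scores)]

# Both approaches give the same mean
mean(scores_complete)
mean(scores, na.rm = TRUE)
```

If a large share of the values is missing, a summary computed with `na.rm = TRUE` describes only the observed values, which may not represent the full data.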
These functions are building blocks for exploring and understanding your data.
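As one illustration of combining these building blocks, the sketch below wraps several of them in a small helper. The function name `describe_vector` is hypothetical and not part of base R; it is just one way to collect the statistics into a single named vector.

```r
# Hypothetical helper: collect common descriptive statistics for a numeric vector
describe_vector <- function(x) {
  stopifnot(is.numeric(x))  # fail early on non-numeric input
  c(
    n       = sum(!is.na(x)),            # number of non-missing values
    missing = sum(is.na(x)),             # number of missing values
    mean    = mean(x, na.rm = TRUE),
    median  = median(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE),
    min     = min(x, na.rm = TRUE),
    max     = max(x, na.rm = TRUE)
  )
}

stats <- describe_vector(c(1, 5, 2, 8, 3, NA, 7))
round(stats, 2)
```

Base R's built-in `summary()` gives a similar quick overview (minimum, quartiles, mean, maximum, and a count of `NA` values), so a custom helper like this is mainly useful when you want a different set of statistics.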