Week 12: Applying Your Skills & Moving Forward

Consolidate your learning with a project, discover R Markdown, and plan your next steps in R.


Chapter 12: Consolidation and Future Learning

Mini Data Analysis Project

It's time to put everything you've learned together! This mini-project aims to consolidate your skills in data import, manipulation with `dplyr`, and visualization with `ggplot2`.

Project Goal

Choose a built-in R dataset (like `mtcars`, `diamonds` from ggplot2, or `iris`) or find a simple, clean CSV dataset online (e.g., from Kaggle Datasets, data.gov). Your goal is to explore the data and answer a few simple questions using R.

Suggested Steps

  1. Choose Data & Define Questions: Select your dataset. Formulate 2-3 specific questions you want to answer using the data (e.g., "What is the relationship between horsepower and fuel efficiency in `mtcars`?", "How does diamond price vary by cut in `diamonds`?", "Are there differences in petal length between iris species?").
  2. Load Data & Packages: Load the necessary packages (`tidyverse` or `dplyr`/`ggplot2`). If using an external CSV, import it using `read.csv()` or `readr::read_csv()`.
  3. Explore & Clean (if necessary): Use functions like `str()`, `summary()`, and `head()` to understand the data. Check for missing values (`NA`) and decide how to handle them (e.g., drop incomplete rows with `na.omit()`, or filter on a specific column with `dplyr::filter(!is.na(column))`).
  4. Manipulate & Analyze: Use `dplyr` verbs (`filter`, `select`, `group_by`, `summarise`, `mutate`, `arrange`) to process the data relevant to your questions. Calculate summary statistics.
  5. Visualize: Create informative plots using `ggplot2` (`geom_point`, `geom_boxplot`, `geom_bar`, etc.) to illustrate your findings. Customize labels and themes for clarity.
  6. Conclude: Briefly summarize your findings based on your analysis and visualizations. What answers did you find to your initial questions?
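
As one possible walk-through of steps 2 to 4, the sketch below uses the built-in `mtcars` dataset and asks whether fuel efficiency differs by number of cylinders. The question, the derived column `cyl_group`, and the summary column names are illustrative choices rather than part of the project brief, and the code assumes `dplyr` is installed.

```r
library(dplyr)

# Step 3: get a first look at the data
str(mtcars)
summary(mtcars$mpg)
head(mtcars)

# Count missing values per column (mtcars has none, but real CSVs often do)
colSums(is.na(mtcars))

# Step 4: summarise fuel efficiency by number of cylinders
mpg_by_cyl <- mtcars %>%
  filter(!is.na(mpg)) %>%               # keep only rows with a known mpg
  mutate(cyl_group = factor(cyl)) %>%   # treat cylinder count as a category
  group_by(cyl_group) %>%
  summarise(
    n_cars   = n(),
    mean_mpg = mean(mpg),
    sd_mpg   = sd(mpg),
    .groups  = "drop"
  ) %>%
  arrange(desc(mean_mpg))               # most efficient group first

print(mpg_by_cyl)
```

The resulting table has one row per cylinder group, which is usually enough to write a one-sentence answer to the question in step 6.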

The process of working through even a small project like this will significantly reinforce your understanding and build confidence.
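
To illustrate the visualization step, here is a minimal sketch for the horsepower and fuel efficiency question suggested in step 1, assuming `ggplot2` is installed. The object name `hp_plot`, the label text, and the choice of `theme_minimal()` are illustrative, not prescribed.

```r
library(ggplot2)

# Scatter plot of horsepower against fuel efficiency, coloured by cylinders
hp_plot <- ggplot(mtcars, aes(x = hp, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    title  = "Fuel efficiency vs horsepower in mtcars",
    x      = "Gross horsepower",
    y      = "Miles per (US) gallon",
    colour = "Cylinders"
  ) +
  theme_minimal()

print(hp_plot)

# A single number to accompany the plot in the written conclusion
hp_mpg_cor <- cor(mtcars$hp, mtcars$mpg)
hp_mpg_cor
```

Storing the plot in an object before printing makes it easy to reuse later, for example with `ggsave()`.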

Introduction to R Markdown

R Markdown (`.Rmd`) is a powerful framework that allows you to create dynamic, reproducible documents directly from R. It integrates your R code, the output of that code (including plots and tables), and narrative text (written using Markdown syntax) into a single file.

Why Use R Markdown?

  • Reproducibility: Your analysis, results, and explanations are all in one place, making it easy for others (and your future self) to understand and reproduce your work.
  • Automation: If your data changes, you simply "knit" the R Markdown document again to update all results and outputs automatically.
  • Multiple Output Formats: You can render your `.Rmd` file into various formats like HTML, PDF, Word documents, presentations (slides), dashboards, and more, often with minimal changes to the source file.
  • Integration: Seamlessly blends code, output, and text.

Basic Structure

An R Markdown file typically consists of:

  1. YAML Header: At the top, enclosed by `---`, specifies metadata like title, author, date, and output format.
  2. Markdown Text: Narrative text formatted using standard Markdown syntax (headings, lists, bold, italics, links, etc.).
  3. Code Chunks: Blocks of R code enclosed by `` ```{r chunk-name, options} `` and `` ``` ``. When the document is rendered ("knitted"), the code in these chunks is executed, and the results (text output, plots, tables) can be embedded in the final document according to chunk options (e.g., `echo=TRUE` shows the code, `eval=TRUE` runs it, and `fig.show='hold'` holds all plots until the end of the chunk).

A minimal example file:

---
title: "My Analysis Report"
author: "Your Name"
date: "`r Sys.Date()`" # R code can be embedded here!
output: html_document
---

## Introduction

This is regular text written using Markdown. We can include **bold** and *italic* text.

Here is an R code chunk:

```{r setup, include=FALSE}
# include=FALSE still runs the chunk but hides its code and output in the rendered document
library(tidyverse)
```

## Analysis Section

The `mpg` dataset ships with ggplot2, which the setup chunk loaded as part of the tidyverse. Let's create a plot:

```{r scatterplot, echo=TRUE, fig.cap="MPG Scatter Plot"}
# This code and its output (the plot) will be included
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  labs(title = "Displacement vs Highway MPG")
```

The plot above shows the relationship between engine displacement and fuel efficiency.

To create and render R Markdown documents, you'll typically use RStudio, which has excellent built-in support (File > New File > R Markdown..., and the "Knit" button).
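
Knitting can also be triggered from the R console or a script with `rmarkdown::render()`, which is handy when reports need to be regenerated automatically. The sketch below writes a tiny `.Rmd` file to a temporary directory and renders it. The file name `demo-report.Rmd` and its contents are made up for illustration, and the render step is skipped when the `rmarkdown` package or Pandoc is not available.

```r
# Write a minimal R Markdown file to a temporary location
rmd_path <- file.path(tempdir(), "demo-report.Rmd")
fence <- strrep("`", 3)  # three backticks, built here to avoid nesting fences

writeLines(c(
  "---",
  "title: \"Demo Report\"",
  "output: html_document",
  "---",
  "",
  "## Summary",
  "",
  paste0(fence, "{r cars-summary, echo=FALSE}"),
  "summary(mtcars$mpg)",
  fence
), rmd_path)

# Render only if the tooling is present on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
  message("Rendered to: ", out_file)
} else {
  message("rmarkdown or Pandoc not found; skipping render.")
}
```

Passing an `output_format` argument to `render()`, such as `"word_document"`, overrides the format declared in the YAML header.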

Where to Go Next?

Congratulations on completing this 12-week introduction to R! You've built a solid foundation in R programming, data manipulation, and visualization. But the R world is vast! Here are some ideas for continuing your journey:

Deepen Your Skills

  • More Tidyverse: Explore `purrr` for functional programming, `lubridate` for dates and times, and `stringr` for text manipulation.
  • Advanced ggplot2: Learn about facets, coordinate systems, advanced theme customization, and extension packages like `ggrepel` or `patchwork`.
  • Statistical Modeling: Dive into R's powerful modeling capabilities, covering linear models (`lm`), generalized linear models (`glm`), and machine learning packages (`caret`, `tidymodels`, `randomForest`, `xgboost`).
  • R Programming Concepts: Learn about R's object-oriented systems (S3, S4, R6), debugging tools, profiling, and creating your own packages.
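
As a small taste of the statistical modeling mentioned above, the following sketch fits a simple linear regression with base R's `lm()`. The choice of `mtcars`, the `mpg ~ hp` formula, and the two hypothetical horsepower values are purely illustrative.

```r
# Fit a simple linear model: fuel efficiency as a function of horsepower
fit <- lm(mpg ~ hp, data = mtcars)

# Coefficients, standard errors, and R-squared
summary(fit)

# Predict fuel efficiency for two hypothetical cars with 100 and 200 hp
new_cars <- data.frame(hp = c(100, 200))
predicted_mpg <- predict(fit, newdata = new_cars)
predicted_mpg
```

The same formula interface carries over to `glm()` and to many modeling packages, so it is worth getting comfortable with it early.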

Explore Specific Domains

  • Time Series Analysis: Packages like `forecast`, `xts`, `zoo`.
  • Spatial Analysis: Packages like `sf`, `terra`, `leaflet` (older code often uses `sp` and `raster`).
  • Bioinformatics: The Bioconductor project offers a huge range of specialized packages.
  • Interactive Visualizations & Dashboards: Packages like `shiny`, `plotly`, `flexdashboard`.

Engage with the Community

  • R-bloggers: An aggregator of R news and tutorials from numerous blogs.
  • Stack Overflow: Search for answers to specific R questions (use the `[r]` tag) and contribute answers once you're comfortable.
  • Twitter/Mastodon: Follow R users and hashtags like `#rstats`.
  • Conferences: Look for useR! (international) or local R conferences and user groups.
  • Contribute: Find an open-source R package you use and contribute bug reports, documentation improvements, or even code.

The key is to keep practicing, work on projects that interest you, and never stop learning. Good luck!
