Week 1: Your First Steps with R

Discover the R language and set up your RStudio environment.

Chapter 1: Introduction to R and Your Environment

What is R? Its History and Applications.

Welcome to the world of R! R is a powerful open-source programming language and software environment primarily designed for statistical computing and graphical representation. It's widely used by statisticians, data scientists, researchers, and analysts across various fields.

A Brief History

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in the early 1990s. It's an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. R's name comes partly from the first names of its authors and partly as a play on the name of S.

Key Characteristics

  • Interpreted: R commands are executed directly without needing prior compilation.
  • Vectorized Operations: R excels at operations on vectors, matrices, and data frames, making code more concise and often faster for data manipulation tasks.
  • Statistical Focus: It has a vast collection of built-in functions and packages specifically for statistical analysis and modeling.
  • Powerful Graphics: R is renowned for its high-quality static and dynamic graphics capabilities, especially through packages like `ggplot2`.
  • Extensible: The Comprehensive R Archive Network (CRAN) hosts thousands of user-contributed packages that extend R's functionality for almost any task imaginable.
  • Open Source: Free to use, distribute, and modify, with a large and active community.
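To make the "vectorized" point concrete, here is a minimal sketch. The `temps_c` vector and its values are made up for illustration; the arithmetic is applied to every element at once, with no explicit loop.

```r
# Hypothetical temperatures in degrees Celsius (illustrative values only)
temps_c <- c(12.5, 18.0, 21.3, 9.8)

# One expression converts every element to Fahrenheit
temps_f <- temps_c * 9 / 5 + 32
temps_f

# Comparisons are vectorized too: one TRUE/FALSE per element
temps_c > 15

# Summary functions reduce the whole vector to a single value
mean(temps_c)
```

The same pattern carries over to columns of a data frame, which is a large part of why R code for data manipulation tends to be short.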

Diverse Applications

R is a cornerstone in:

  • Data Analysis & Statistics: From basic descriptive statistics to complex modeling.
  • Data Visualization: Creating publication-quality plots and charts.
  • Machine Learning: Numerous packages for various ML algorithms.
  • Bioinformatics & Genomics: A standard tool in biological research.
  • Finance & Econometrics: Used extensively for financial modeling and analysis.
  • Social Sciences: Applied to survey analysis and the modeling of social phenomena.

Why Learn R? Advantages and Use Cases.

Choosing to learn R offers significant advantages, particularly if your interests lie in data analysis, statistics, or research:

  • Leading Tool for Data Science: R is one of the dominant languages in data science, especially for statistical modeling and visualization.
  • Exceptional Graphics: R's visualization capabilities (especially with `ggplot2`) are arguably best-in-class for creating insightful and beautiful plots.
  • Vast Package Ecosystem (CRAN): Access to thousands of specialized, cutting-edge packages for statistics, machine learning, visualization, reporting, and more.
  • Strong Community Support: A large, active, and supportive community provides help through forums (like Stack Overflow), mailing lists, and conferences.
  • Designed for Data Analysis: Its syntax and data structures (vectors, data frames) are inherently suited for data manipulation and analysis tasks.
  • Reproducibility: Tools like R Markdown allow seamless integration of code, results, and narrative text for reproducible research and reporting.
  • High Demand in Relevant Fields: Proficiency in R is highly valued in roles involving data analysis, statistics, research, and quantitative analysis.
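As a small illustration of the package ecosystem, the sketch below checks whether `ggplot2` is already installed and loads it if so. `install.packages()`, `requireNamespace()`, and `library()` are standard R functions; the installation line is left commented out here only because it needs an internet connection.

```r
pkg <- "ggplot2"

# install.packages(pkg)   # uncomment to download the package from CRAN (needs internet)

if (requireNamespace(pkg, quietly = TRUE)) {
  library(ggplot2)        # attach the package so its functions can be called directly
  message(pkg, " version ", as.character(packageVersion(pkg)), " is ready to use")
} else {
  message(pkg, " is not installed yet; run install.packages(\"", pkg, "\") first")
}
```

A package is installed once per machine but has to be loaded with `library()` in every new R session that uses it.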

Use Cases to Inspire You

Consider these real-world applications:

  • Analyzing clinical trial data in pharmaceuticals.
  • Building financial models for risk assessment.
  • Visualizing geographical data for environmental studies.
  • Conducting statistical analysis for academic research papers.
  • Developing machine learning models for customer churn prediction.

Setting up Your Environment: Installing R & RStudio.

To start coding in R, you need two components: R itself (the language interpreter) and RStudio, an Integrated Development Environment (IDE) that makes working with R much easier.

Step 1: Installing R

R must be installed first. Download the appropriate version for your operating system from the official Comprehensive R Archive Network (CRAN).

  • Go to https://cran.r-project.org/
  • Click on the link corresponding to your operating system (Linux, macOS, or Windows).
  • Follow the instructions on the page to download and install the latest base distribution of R. Accept the default settings during installation.

Step 2: Installing RStudio

RStudio provides a user-friendly interface with features like a code editor, console, environment viewer, plotting window, package manager, and debugger.

  • Go to the Posit website (the company behind RStudio): https://posit.co/download/rstudio-desktop/
  • Download the free RStudio Desktop version suitable for your operating system.
  • Run the installer, accepting the default settings. RStudio should automatically detect your R installation.

Once both are installed, launch RStudio. This will be your primary workspace for learning and using R.
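One quick way to confirm that RStudio has found your R installation is to ask R to describe itself. The following is a minimal check to type into the Console; the exact version and platform strings will differ from machine to machine.

```r
# Print the version of R that this session is running
R.version.string

# Show the platform R was built for
R.version$platform

# Summarize the session: R version, operating system, locale, and attached packages
sessionInfo()
```

If these commands print version details rather than an error, both pieces are working together.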

Tour of RStudio: Your R Workbench.

RStudio provides an integrated environment that makes working with R much more efficient. It's typically divided into four main panes (though the layout is customizable):

  1. Console (Bottom-Left): This is where you type and execute R commands directly. You'll see output and error messages here. It's like a command line specifically for R.
  2. Source Editor (Top-Left): This is where you write and save R scripts (`.R` files). Scripts allow you to write multiple lines of code, save them, and run them together. This pane might not be visible until you open or create a new script (File > New File > R Script).
  3. Environment/History/Connections (Top-Right):
    • Environment: Shows the objects (variables, data frames, functions, etc.) currently loaded in your R session.
    • History: Records the commands you've previously entered in the console.
    • Connections: For connecting to external databases or other resources.
  4. Files/Plots/Packages/Help/Viewer (Bottom-Right):
    • Files: Navigate your computer's file system.
    • Plots: Displays graphs and plots generated by your R code.
    • Packages: View installed R packages, load/unload them, and install new ones.
    • Help: Access R's built-in help documentation.
    • Viewer: Displays local web content, like interactive visualizations.

Familiarize yourself with these panes, as you'll be using them constantly throughout this course.
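A short way to see several panes react is to run the lines below in the Console. This is only a sketch; `scores` and its values are hypothetical.

```r
# Creating an object makes it appear in the Environment tab
scores <- c(72, 85, 90, 66, 78)   # illustrative values only

# ls() lists the objects in your session, mirroring the Environment tab
ls()

# Drawing a plot sends it to the Plots tab
plot(scores, type = "b", main = "Example scores")
```

Each command you run is also added to the History tab, and typing `?mean` in the Console opens the documentation for `mean()` in the Help tab.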

Running Your First R Commands: Saying Hello!

Let's start by interacting directly with the R console in RStudio.

Basic Calculations

You can use R as a powerful calculator. Type these directly into the Console pane and press Enter:

5 + 3           # Addition
10 - 4          # Subtraction
6 * 7           # Multiplication
15 / 3          # Division
2 ^ 5           # Exponentiation (2 to the power of 5)
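A few more calculator-style expressions, offered as a small supplement to the list above. They show operator precedence, parentheses, integer division (`%/%`), the remainder operator (`%%`), and one built-in function.

```r
2 + 3 * 4       # Multiplication is evaluated before addition: 14
(2 + 3) * 4     # Parentheses change the order: 20
17 %/% 5        # Integer division: 3
17 %% 5         # Remainder (modulo): 2
sqrt(16)        # Built-in square root function: 4
```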

Assigning Variables

Store values in variables using the assignment operator `<-` (preferred) or `=`. Variable names in R often use dots (`.`) or underscores (`_`).

x <- 10
my.variable <- "Hello"
result = x * 3

To see the value of a variable, simply type its name in the console and press Enter:

x
my.variable
result
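A point that often surprises beginners is that a variable stores a value, not the formula that produced it. The sketch below uses made-up names (`price`, `quantity`, `total`) to show this.

```r
price <- 20                 # hypothetical unit price
quantity <- 3
total <- price * quantity   # total stores the value 60, not the formula
total

price <- 25                 # reassigning price does not touch total
total                       # still 60

total <- price * quantity   # recompute to pick up the new price
total                       # 75
```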

Your First Script: "Hello, R!"

  1. New Script: In RStudio, go to File > New File > R Script. This opens the Source Editor pane.
  2. Write Code: Type the following line into the script editor:
    print("Hello, R!")

    The `print()` function displays output to the console.

  3. Save the Script: Go to File > Save As. Choose a location (e.g., create an "RLearn" folder) and name the file `hello.R`. R scripts typically have the `.R` extension.
  4. Run the Code:
    • You can run the current line by placing your cursor on it and pressing `Ctrl+Enter` (Windows/Linux) or `Cmd+Enter` (macOS).
    • You can run the entire script by clicking the "Source" button at the top of the editor pane or pressing `Ctrl+Shift+Enter` / `Cmd+Shift+Enter` (this shortcut also echoes each command to the console as it runs).
  5. See the Output: You should see the following output appear in the Console pane:
    [1] "Hello, R!"

    (The `[1]` indicates that "Hello, R!" is the first element of the output vector; R treats even a single value as a vector of length one.)

    Congratulations! You've successfully run your first R script.
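Behind the scenes, the "Source" button calls R's `source()` function on the saved file. The sketch below reproduces that from the Console. It writes the one-line script to a temporary folder (via `tempdir()`) only so that the example is self-contained; in practice you would pass the path where you saved `hello.R`.

```r
# Create a throwaway copy of the script in R's temporary directory
script_path <- file.path(tempdir(), "hello.R")
writeLines('print("Hello, R!")', script_path)

# Run every line of the file, as the Source button does
source(script_path)
```

Calling `source()` yourself becomes useful later, when one script needs to run another.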

Basic Syntax and Conventions: The Rules of R.

Like any language, R has its grammar and style. Understanding these basics is key to writing clear and functional R code.

  • Comments: Explain your code using comments. In R, comments start with the hash symbol `#`. Everything after `#` on that line is ignored by R.
    # This is a comment explaining the code below
    radius <- 5 # Assigning the value 5 to the variable 'radius'
  • Case Sensitivity: R is case-sensitive. `myVariable`, `MyVariable`, and `myvariable` are treated as distinct variables.
  • Statements: R code is executed statement by statement, typically one per line. You can put multiple statements on one line separated by semicolons `;`, but this is generally less readable.
    a <- 1; b <- 2; print(a+b) # Less common style
  • Assignment Operator: The preferred assignment operator in R is `<-`. While `=` also works for assignment in most contexts, `<-` is stylistically preferred and less ambiguous (as `=` is also used for function arguments).
    x <- 100   # Preferred style
    y = 200    # Also works, but less conventional
  • Blocks and Braces `{}`: Unlike Python's reliance on indentation, R uses curly braces `{}` to group multiple statements into a block, often used with control flow structures (like `if`, `for`) and functions. Indentation is still crucial for readability but doesn't define the block structure syntactically.
    if (x > 50) {
      print("x is large")
      print("This is part of the if block")
    }
  • Variable Naming Conventions:
    • Should be descriptive.
    • Can contain letters, numbers, dots (`.`), and underscores (`_`).
    • Must start with a letter or a dot (if starting with a dot, the second character cannot be a number).
    • Common styles include `snake_case` (e.g., `user_age`) or using dots (e.g., `user.age`). Consistency is key.
    • Avoid using names of built-in functions (e.g., `c`, `list`, `mean`).
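The rules above can be checked directly in the Console. This is a minimal sketch: all variable names are invented for illustration, `square()` is a hypothetical helper defined only for this example, and `make.names()` is a base R function that turns arbitrary strings into syntactically valid names, which makes it a convenient way to see which names R accepts unchanged.

```r
# Case sensitivity: these are two different variables
total <- 10
Total <- 20
total == Total                      # FALSE

# Naming rules: valid names come back unchanged, invalid ones are repaired
# ("2fast" becomes "X2fast" and ".2way" becomes "X.2way")
make.names(c("user_age", "user.age", "2fast", ".2way"))

# `=` inside a call names an argument; `<-` inside a call assigns a variable
square <- function(value) value^2   # hypothetical helper for this example
square(value = 4)                   # 16; no variable called `value` is created
square(value <- 4)                  # 16; `value` now exists and holds 4
value

# Braces group statements; the indentation is only for the reader
if (total < Total) {
  print("total is smaller")
  print("Both lines belong to the same block")
}
```

The `square(value <- 4)` line is shown to explain why `<-` and `=` are not interchangeable everywhere; assigning inside a function call is legal but rarely a good idea.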

Getting comfortable with these basic rules will make your R learning journey smoother.
