Week 7: Writing Reusable Code with Functions

Learn how to define and use functions to make your R code modular and efficient.


Chapter 7: Functions in R

Defining Functions.

Functions are fundamental building blocks in R (and in most programming languages). They let you encapsulate a block of code that performs a specific task, which makes your code more organized, easier to read, less repetitive (the DRY principle: Don't Repeat Yourself), and easier to debug and maintain.

In R, functions are first-class objects, meaning you can treat them like any other object: assign them to variables, pass them as arguments to other functions, and return them from functions.
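
As a minimal sketch of what "first-class" means in practice, the example below assigns a function to a second name, passes a function as an argument, and returns a function from another function. The names `square`, `apply_twice`, and `make_multiplier` are hypothetical and chosen only for illustration.

```r
# A small function to experiment with (hypothetical name)
square <- function(x) x^2

# 1. Assign the function object to another name
sq <- square
sq(4)                    # Output: [1] 16

# 2. Pass a function as an argument to another function
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)   # Output: [1] 81   (3 squared, then squared again)

# 3. Return a function from a function
make_multiplier <- function(k) {
  function(x) x * k
}
triple <- make_multiplier(3)
triple(5)                # Output: [1] 15
```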

Syntax

You define a function in R using the `function` keyword and assign it to a variable (which becomes the function's name).

function_name <- function(parameter1, parameter2, ...) {
  # Code block (function body)
  statement1
  statement2
  # ...
  # Optional return value (often the last expression evaluated)
  result_expression
}

The parts of this template are:

  • `function_name`: The name you choose for your function (following variable naming rules).
  • `<-`: The assignment operator used to assign the function object to the name.
  • `function`: The keyword indicating you are defining a function.
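  • `(parameter1, parameter2, ...)`: Parentheses containing the function's parameters (inputs). A function can have zero or more parameters. In this template the `...` simply stands for "any further parameters"; R also has a literal `...` parameter, described in the section on the ellipsis.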
  • `{ ... }`: Curly braces enclosing the function's body, which contains the R code to be executed when the function is called.
  • `result_expression`: Typically, the value of the last expression evaluated in the body is automatically returned. You can also use the `return()` function explicitly.

Example: A Simple Greeting Function

# Define the function
greet <- function(name) {
  message <- paste("Hello,", name, "!")
  print(message)
}

# Call the function
greet("Alice")   # Output: [1] "Hello, Alice !"
greet("Bob")     # Output: [1] "Hello, Bob !"

Function Parameters and Arguments.

Parameters are the variables defined within the function's parentheses that act as placeholders for the inputs. Arguments are the actual values supplied to the function when it is called.
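
To make the distinction concrete, here is a small sketch using a hypothetical function `rectangle_area`. It relies on base R's `formals()` to list the parameters of the definition, and then supplies arguments in a call.

```r
# 'width' and 'height' are parameters: placeholders in the definition
rectangle_area <- function(width, height) {
  width * height
}

# List the parameter names of the function
names(formals(rectangle_area))   # Output: [1] "width"  "height"

# 3 and 4 are arguments: the actual values supplied in this call
rectangle_area(3, 4)             # Output: [1] 12
```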

Positional Matching

By default, R matches arguments to parameters based on their position. The first argument goes to the first parameter, the second to the second, and so on.

calculate_power <- function(base, exponent) {
  result <- base ^ exponent
  print(paste(base, "to the power of", exponent, "is", result))
}

calculate_power(2, 3) # Output: [1] "2 to the power of 3 is 8"
# 2 is passed to 'base', 3 is passed to 'exponent'

Named Matching

You can explicitly match arguments to parameters by name. This makes the code more readable and allows you to supply arguments in any order.

calculate_power(exponent = 4, base = 3) # Output: [1] "3 to the power of 4 is 81"
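
Named and positional matching can also be mixed in one call: R matches the named arguments first and then fills the remaining parameters by position. The sketch below uses a hypothetical variant, `power_value`, that returns the result instead of printing it so the values are easy to compare.

```r
# Hypothetical variant of calculate_power() that returns its result
power_value <- function(base, exponent) {
  base ^ exponent
}

power_value(2, 5)                    # Positional: base = 2, exponent = 5 -> 32
power_value(exponent = 5, base = 2)  # Named, in any order             -> 32
power_value(exponent = 2, 5)         # exponent by name, so 5 fills base -> 25
```

Naming the less obvious arguments tends to keep calls readable while leaving the first, most obvious argument positional.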

Default Parameter Values

You can provide default values for parameters directly in the function definition. If an argument for that parameter is not provided during the function call, the default value is used.

greet_with_default <- function(name = "Guest", greeting = "Hello") {
  message <- paste(greeting, ",", name, "!")
  print(message)
}

greet_with_default()                        # Output: [1] "Hello , Guest !"
greet_with_default("Charlie")               # Output: [1] "Hello , Charlie !"
greet_with_default(greeting = "Hi")         # Output: [1] "Hi , Guest !"
greet_with_default("David", "Good morning") # Output: [1] "Good morning , David !"

The Ellipsis (`...`)

The special `...` argument allows a function to accept an arbitrary number of additional arguments. These are often passed down to other functions called within the main function. We won't delve deep into this now, but it's good to be aware of it.
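
For a first impression, the sketch below shows the two most common uses of `...`: forwarding extra arguments to another function, and capturing them with `list(...)`. The function names `average_of` and `count_extras` are hypothetical; `mean()` and its `na.rm` argument are part of base R.

```r
# Forward any extra arguments to mean() through `...`
average_of <- function(values, ...) {
  mean(values, ...)
}

scores <- c(4, 8, NA, 12)
average_of(scores)                # Output: [1] NA  (the NA propagates by default)
average_of(scores, na.rm = TRUE)  # Output: [1] 8   (na.rm is passed on to mean())

# Capture the extra arguments as a list to inspect them
count_extras <- function(...) {
  extras <- list(...)
  length(extras)
}
count_extras("a", "b", "c")       # Output: [1] 3
```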

Returning Values from Functions.

Functions often need to compute a result and send it back to the part of the script that called it. R handles return values in a specific way.

Implicit Return

By default, an R function implicitly returns the value of the last expression evaluated within its body.

add_numbers <- function(x, y) {
  sum_result <- x + y
  sum_result  # This is the last evaluated expression, so its value is returned
}

result <- add_numbers(10, 5)
print(result) # Output: [1] 15
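
The "last expression evaluated" is not always the last line of the body. In the sketch below (with the hypothetical name `describe_sign`), the whole `if`/`else` construct is the final expression, so whichever branch runs supplies the return value.

```r
describe_sign <- function(x) {
  # No return() needed: the value of the branch that runs
  # becomes the value of the function
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

describe_sign(-3)  # Output: [1] "negative"
describe_sign(0)   # Output: [1] "zero"
describe_sign(7)   # Output: [1] "positive"
```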

Explicit Return with `return()`

You can also use the `return()` function to explicitly specify the value to return and potentially exit the function early (e.g., based on a condition).

check_value <- function(x) {
  if (x <= 0) {
    return("Value must be positive") # Exit early and return message
  }
  # This code only runs if x > 0
  processed_value <- sqrt(x)
  return(processed_value) # Explicitly return the calculated value
}

output1 <- check_value(9)
print(output1) # Output: [1] 3

output2 <- check_value(-5)
print(output2) # Output: [1] "Value must be positive"
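
Note that `check_value()` returns a number in one case and a character string in the other, so the caller has to inspect the type of the result. One common alternative, sketched here as an assumption rather than as part of the original example, is to signal an error with base R's `stop()` and let the caller handle it with `tryCatch()`. The name `safe_sqrt` is hypothetical.

```r
safe_sqrt <- function(x) {
  if (x <= 0) {
    stop("Value must be positive")  # Signals an error instead of returning text
  }
  sqrt(x)
}

safe_sqrt(9)    # Output: [1] 3

# tryCatch() lets the caller decide what to do when the error occurs
outcome <- tryCatch(
  safe_sqrt(-5),
  error = function(e) conditionMessage(e)
)
print(outcome)  # Output: [1] "Value must be positive"
```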

Returning Multiple Values (as a List)

To return multiple values, you typically combine them into a list (often a named list for clarity).

calculate_stats <- function(numbers) {
  mean_val <- mean(numbers)
  sd_val <- sd(numbers)
  return(list(mean = mean_val, stdev = sd_val))
}

data <- 1:10
stats_result <- calculate_stats(data)
print(stats_result)
# Output:
# $mean
# [1] 5.5
#
# $stdev
# [1] 3.02765

# Access individual returned values
print(stats_result$mean) # Output: [1] 5.5
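
Besides `$`, elements of a returned list can be inspected with `names()` and extracted with double brackets. The following sketch uses a hypothetical function `summarise_range` to show the difference between `[[ ]]` and `[ ]`.

```r
summarise_range <- function(numbers) {
  list(lowest = min(numbers), highest = max(numbers))
}

range_result <- summarise_range(c(3, 9, 4))

names(range_result)         # Output: [1] "lowest"  "highest"
range_result[["highest"]]   # Output: [1] 9  (double brackets extract the element)
range_result["highest"]     # Single brackets return a one-element list instead
```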

Scope of Variables (Environments).

Scope determines where in your program a variable can be accessed. R uses a system of environments to manage scope.

Local Environment

When you call a function, R creates a new, temporary local environment for that function call. Variables defined inside the function body exist only within this local environment and are typically destroyed when the function finishes executing. These are called local variables.

my_function <- function() {
  local_var <- "I exist only inside the function"
  print(local_var)
}

my_function()
# print(local_var) # This would cause an error: "object 'local_var' not found"
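
Because every call gets its own local environment, local variables do not carry over from one call to the next. A minimal sketch, using the hypothetical names `count_up` and `tally`:

```r
count_up <- function() {
  tally <- 0          # Created fresh in the local environment of this call
  tally <- tally + 1
  tally
}

count_up()  # Output: [1] 1
count_up()  # Output: [1] 1  (the previous call's 'tally' no longer exists)

# 'tally' was never created in the global environment
exists("tally", envir = globalenv(), inherits = FALSE)  # FALSE in a fresh session
```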

Global Environment

Variables defined outside any function (e.g., directly in your script or console) exist in the global environment. These are called global variables.

A function can read a global variable as long as no local variable of the same name exists. If you assign to that name inside the function with `<-`, R creates a new local variable that masks the global one for the rest of the call and leaves the global variable unchanged.

global_var <- 100

access_global <- function() {
  print(paste("Inside function, global_var is:", global_var)) # Accesses global
}

shadow_global <- function() {
  global_var <- 50 # Creates a *local* variable named global_var
  print(paste("Inside function, local global_var is:", global_var))
}

access_global()  # Output: [1] "Inside function, global_var is: 100"
shadow_global()  # Output: [1] "Inside function, local global_var is: 50"
print(paste("Outside function, global_var is still:", global_var)) # Output: [1] "Outside function, global_var is still: 100"
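
One consequence worth seeing in code: a global variable is looked up when the function runs, not when the function is defined. In the sketch below (hypothetical names `threshold` and `is_above_threshold`), changing the global changes the result of an otherwise identical call, which is one reason passing values as arguments is often easier to reason about.

```r
threshold <- 10

is_above_threshold <- function(x) {
  x > threshold   # 'threshold' is not local, so R looks it up outside the function
}

is_above_threshold(15)  # Output: [1] TRUE

threshold <- 20         # Change the global variable
is_above_threshold(15)  # Output: [1] FALSE  (same call, different answer)
```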

Modifying Global Variables (Use with Caution)

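Modifying a global variable from inside a function is generally discouraged because it makes changes hard to track, but it is possible with the superassignment operator `<<-`. Rather than creating a local variable, `<<-` searches the enclosing environments for an existing variable of that name and assigns to it; if none is found, it creates the variable in the global environment. For a function defined at the top level, as here, this means the global variable is modified.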

counter <- 0

increment_global_counter <- function() {
  counter <<- counter + 1 # Modifies the global 'counter'
}

increment_global_counter()
increment_global_counter()
print(counter) # Output: [1] 2
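
A side-effect-free alternative is to pass the current value in and assign the returned value back at the call site. This is only a sketch of that pattern; the names `increment` and `visits` are hypothetical.

```r
increment <- function(value) {
  value + 1   # No side effects: the result is simply returned
}

visits <- 0
visits <- increment(visits)   # The caller decides where the new value is stored
visits <- increment(visits)
print(visits)  # Output: [1] 2
```

With this version every change to `visits` is visible where it happens, so there is no hidden state to track.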

Understanding environments and scope is key to avoiding bugs related to variable access in R.
