11  R: Functions

11.1 Functions in R

Functions in R are used to encapsulate reusable code. R provides a wide range of built-in functions for mathematical operations, statistics, and data manipulation, and users can also define their own functions.


11.1.1 Built-in Functions in R

R is a language specifically designed for statistical computing, with many of its core libraries developed by leading statisticians. It excels in statistical analysis and is widely used in academic research and applied statistics. While Python is more versatile for general-purpose programming and machine learning, R remains a strong choice for pure statistical work.

R has many predefined functions for mathematical operations, statistics, and data manipulation. These functions are part of R’s base and default packages, allowing users to perform various tasks without additional libraries.

11.1.1.1 mathematical Functions

Function Description
abs(x) Absolute value of x
sqrt(x) Square root of x
exp(x) Exponential function ( e^x )
log(x, base=n) Logarithm (default natural log)
log10(x) Base-10 logarithm
log2(x) Base-2 logarithm
round(x, n) Rounds x to n decimal places
floor(x) Rounds x down
ceiling(x) Rounds x up
trunc(x) Truncates x (removes decimal part)
sqrt(25)
[1] 5
abs(-10)
[1] 10
round(3.14159, 2)
[1] 3.14

11.1.1.2 Statistical Functions

Function Description
mean(x) Mean of x
median(x) Median of x
var(x) Variance of x
sd(x) Standard deviation of x
sum(x) Sum of all elements in x
prod(x) Product of all elements in x
min(x) Minimum value in x
max(x) Maximum value in x
range(x) Minimum and maximum of x
quantile(x, p) p-th quantile of x
cor(x, y) Correlation between x and y
cov(x, y) Covariance between x and y
# Create a sample numeric vector
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
y <- c(15, 25, 35, 45, 55, 65, 75, 85, 95, 105)

# 1. Basic statistical functions
mean_x <- mean(x)       # Mean of x
median_x <- median(x)   # Median of x
var_x <- var(x)         # Variance of x
sd_x <- sd(x)           # Standard deviation of x

# 2. Summation and product
sum_x <- sum(x)         # Sum of elements in x
prod_x <- prod(x[1:5])  # Product of first five elements of x

# 3. Minimum, Maximum, and Range
min_x <- min(x)         # Minimum value
max_x <- max(x)         # Maximum value
range_x <- range(x)     # Range (min and max)

# 4. Quantiles
quantiles_x <- quantile(x, probs = c(0.25, 0.5, 0.75))  # 25th, 50th, 75th percentile

# 5. Correlation and Covariance
cor_xy <- cor(x, y)  # Correlation between x and y
cov_xy <- cov(x, y)  # Covariance between x and y

# Print results
cat("Mean of x:", mean_x, "\n")
Mean of x: 55 
cat("Median of x:", median_x, "\n")
Median of x: 55 
cat("Variance of x:", var_x, "\n")
Variance of x: 916.6667 
cat("Standard deviation of x:", sd_x, "\n\n")
Standard deviation of x: 30.2765 
cat("Sum of x:", sum_x, "\n")
Sum of x: 550 
cat("Product of first five elements in x:", prod_x, "\n\n")
Product of first five elements in x: 1.2e+07 
cat("Min of x:", min_x, "\n")
Min of x: 10 
cat("Max of x:", max_x, "\n")
Max of x: 100 
cat("Range of x:", range_x, "\n\n")
Range of x: 10 100 
cat("Quantiles of x:\n")
Quantiles of x:
print(quantiles_x)
 25%  50%  75% 
32.5 55.0 77.5 
cat("\nCorrelation between x and y:", cor_xy, "\n")

Correlation between x and y: 1 
cat("Covariance between x and y:", cov_xy, "\n")
Covariance between x and y: 916.6667 

11.1.1.3 Random number generation

R provides various functions to generate random numbers from different distributions. These functions are essential for simulation, statistical modeling, and machine learning.

Function Description Example
runif(n, min, max) Uniform distribution runif(5, 0, 10)
rnorm(n, mean, sd) Normal distribution rnorm(5, mean=0, sd=1)
rbinom(n, size, prob) Binomial distribution rbinom(5, size=10, prob=0.5)
rpois(n, lambda) Poisson distribution rpois(5, lambda=3)
sample(x, size, replace) Random sampling sample(1:10, 5, replace=TRUE)
11.1.1.3.1 Setting a seed for reproducibility

Setting a seed ensures that random number generation produces the same results every time.

set.seed(42)
runif(3)
[1] 0.9148060 0.9370754 0.2861395
11.1.1.3.2 Generating uniform random numbers (runif())

The runif() function generates random numbers from a uniform distribution between a given min and max.

set.seed(42)  # Set seed for reproducibility
runif(5, min=0, max=10)
[1] 9.148060 9.370754 2.861395 8.304476 6.417455
11.1.1.3.3 Generating Normally Distributed Numbers (rnorm())

The rnorm() function generates random numbers from a normal (Gaussian) distribution.

set.seed(42)
rnorm(5, mean=0, sd=1)
[1]  1.3709584 -0.5646982  0.3631284  0.6328626  0.4042683
11.1.1.3.4 Random Sampling (sample())

The sample() function randomly selects elements from a given vector.

set.seed(42)
sample(1:10, 5, replace=TRUE)
[1]  1  5  1  9 10

11.2 User-defined Functions

11.2.1 Defining a function

Functions in R are defined using the keyword function(). All the statements within a function are enclosed with {} braces.

my_function <- function() { # create a function with the name my_function
  print("Hello R!")
}

11.2.2 Calling a function

After creating a Function, you have to call the function to use it.

my_function() # call the function named my_function
[1] "Hello R!"

11.2.3 Arguments

Information can be passed into functions as arguments.

Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want, just separate them with a comma.

The following example has a function with one argument (fname). When the function is called, we pass along a first name, which is used inside the function to print the full name:

my_function <- function(fname) {
  paste("Hello", fname)
}

my_function("Lizhen")
[1] "Hello Lizhen"
my_function("Sarah")
[1] "Hello Sarah"
my_function("Jack")
[1] "Hello Jack"

11.2.4 Return values

To let a function return a result, use the return() function:

# Function to return the largest of three numbers
largest_value <- function(a, b, c) {
  return(max(a, b, c))
}

# Example usage
result <- largest_value(10, 25, 15)
print(result) 
[1] 25

A function in R does not necessarily need to return a value explicitly; it can simply print output to the console. Look at the function defined below. It takes an integer as an argument, and prints whether the integer is odd or even.

odd_even <- function(intgr) {
  if (intgr %% 2 == 0) {
    print("even")
  } else {
      print("odd")
  }
}

odd_even(3)
[1] "odd"

11.3 Function arguments

Like Python, functions in R support multiple types of arguments, including positional arguments, default arguments, variable-length arguments, and keyword arguments. The behavior of function arguments in R is nearly identical to Python.

11.3.1 Positional arguments

Write a function that returns all prime numbers between \(a\) and \(b\), where \(a\) and \(b\) are parameters of the function.

prime <- function(a, b) {
  prime_numbers <- c()
  for (number in a:b) {
    prime = 1
    
    for (factor in 2:(number - 1)) {
      if (number %% factor == 0) {
        prime = 0
      }
    }
    
    if (prime == 1) prime_numbers <- c(prime_numbers, number)
  }
  return(prime_numbers)
}
prime(40, 60)
[1] 41 43 47 53 59

11.3.2 default arguments

greet <- function(name = "Guest", age = 25, country = "USA") {
  cat("Hello,", name, "!\n")
  cat("You are", age, "years old and from", country, ".\n")
}

# Calling the function with default values
greet()
Hello, Guest !
You are 25 years old and from USA .

11.3.3 keyword arguments

greet <- function(name = "Guest", age = 25, country = "USA") {
  cat("Hello,", name, "!\n")
  cat("You are", age, "years old and from", country, ".\n")
}

# Calling the function with default values
greet()
Hello, Guest !
You are 25 years old and from USA .
# Calling the function with keyword arguments
greet(name = "Alice", age = 30, country = "Canada")
Hello, Alice !
You are 30 years old and from Canada .
# Calling with positional arguments (order matters)
greet("Bob", 40, "UK")
Hello, Bob !
You are 40 years old and from UK .
# Mixing positional and named arguments
greet("Charlie", country = "Australia")  
Hello, Charlie !
You are 25 years old and from Australia .

11.3.4 Variable-number arguments

# Function that accepts variable-length arguments and computes their sum
sum_values <- function(...) {
  values <- c(...)  # Collect arguments into a vector
  sum(values, na.rm = TRUE)  # Sum with NA removal support
}

# Example usage
result <- sum_values(10, 20, 30, 40, 50)
print(result)  
[1] 150
# Function that accepts extra named arguments
display_info <- function(name, ...) {
  cat("Name:", name, "\n")
  
  # Capture extra arguments as a list
  extras <- list(...)
  for (key in names(extras)) {
    cat(key, ":", extras[[key]], "\n")
  }
}

# Example usage with additional named arguments
display_info("David", age = 28, country = "Germany", occupation = "Engineer")
Name: David 
age : 28 
country : Germany 
occupation : Engineer 

11.4 Variable scope

R has global and local variables, similar to Python, but there are some differences in how they are managed.

11.4.1 Local variables

  • Variables declared inside a function are local to that function.
  • They are not accessible outside the function.
my_function <- function() {
  x <- 10  # Local variable
  cat("Inside function: x =", x, "\n")
}

my_function()
print(x)  # Error: object 'x' not found

11.4.2 Global variables

  • Variables declared outside any function are global.
  • They can be accessed from anywhere unless shadowed by a local variable of the same name.
y <- 100  # Global variable

my_function <- function() {
  cat("Inside function: y =", y, "\n")  # Can access global variable
}

my_function()
Inside function: y = 100 
print(y)  # Still accessible
[1] 100

11.4.3 Modifying Global variables inside a function

Unlike Python (global keyword), R requires the special assignment operator <<- to modify a global variable inside a function.

counter <- 0  # Global variable

increase_counter <- function() {
  counter <<- counter + 1  # Modifies global variable
}

increase_counter()
increase_counter()
print(counter)  
[1] 2

11.5 Practice exercises

11.5.1 Problem 1

Write a function where the input is any word and the output is the number of letters in it. Store the output of the function, then print the sentence: “{} has {} letters”.

11.5.2 Problem 2

Write a function that has no input values and simulates rolling a die. In other words, it should generate a random integer between 1 and 6, inclusive. The function should return the integer. Then use a loop to “roll the die” 5 times.

11.5.3 Problem 3

Write a function that simulates rolling a die. The function should have an input for the number of sides of the die and the number of times the die is rolled. Have the default number of sides be 6. Within the function calculate the mean(), sum(), min(), max() of all the dice rolled.

The function should return the dice rolled and the 4 calculations. Run your function with any sided dice and any number of times.

Below is my function and make it work

roll_die <- function(size=6, roll_times=1){
  return (sample(1:size, roll_times, replace=TRUE))
}

paste("The average of", roll_times, "rolling a die with", size, "is", mean(roll_die(roll_times=5)))