11 R: Functions
11.1 Functions in R
Functions in R are used to encapsulate reusable code. R provides a wide range of built-in functions for mathematical operations, statistics, and data manipulation, and users can also define their own functions.
11.1.1 Built-in Functions in R
R is a language specifically designed for statistical computing, with many of its core libraries developed by leading statisticians. It excels in statistical analysis and is widely used in academic research and applied statistics. While Python is more versatile for general-purpose programming and machine learning, R remains a strong choice for pure statistical work.
R has many predefined functions for mathematical operations, statistics, and data manipulation. These functions are part of R’s base and default packages, allowing users to perform various tasks without additional libraries.
11.1.1.1 Mathematical Functions
Function | Description |
---|---|
abs(x) | Absolute value of x |
sqrt(x) | Square root of x |
exp(x) | Exponential function ( e^x ) |
log(x, base=n) | Logarithm of x to base n (natural log by default) |
log10(x) | Base-10 logarithm |
log2(x) | Base-2 logarithm |
round(x, n) | Rounds x to n decimal places |
floor(x) | Rounds x down |
ceiling(x) | Rounds x up |
trunc(x) | Truncates x (removes decimal part) |
sqrt(25)
[1] 5
abs(-10)
[1] 10
round(3.14159, 2)
[1] 3.14
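The rounding functions in the table differ mainly on negative inputs, and log() takes an optional base. The following is a small sketch using only base R; the input values are arbitrary and chosen for illustration.

```r
# Rounding behaviour on a negative number
x <- -2.7
floor(x)    # -3: rounds toward negative infinity
ceiling(x)  # -2: rounds toward positive infinity
trunc(x)    # -2: drops the fractional part (rounds toward zero)

# Logarithms with different bases
log(exp(1))       # 1: natural log is the default
log(8, base = 2)  # 3
log10(1000)       # 3
```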
11.1.1.2 Statistical Functions
Function | Description |
---|---|
mean(x) | Mean of x |
median(x) | Median of x |
var(x) | Variance of x |
sd(x) | Standard deviation of x |
sum(x) | Sum of all elements in x |
prod(x) | Product of all elements in x |
min(x) | Minimum value in x |
max(x) | Maximum value in x |
range(x) | Minimum and maximum of x |
quantile(x, p) | p-th quantile of x |
cor(x, y) | Correlation between x and y |
cov(x, y) | Covariance between x and y |
# Create a sample numeric vector
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
y <- c(15, 25, 35, 45, 55, 65, 75, 85, 95, 105)
# 1. Basic statistical functions
mean_x <- mean(x)      # Mean of x
median_x <- median(x)  # Median of x
var_x <- var(x)        # Variance of x
sd_x <- sd(x)          # Standard deviation of x
# 2. Summation and product
sum_x <- sum(x)         # Sum of elements in x
prod_x <- prod(x[1:5])  # Product of first five elements of x
# 3. Minimum, Maximum, and Range
min_x <- min(x)      # Minimum value
max_x <- max(x)      # Maximum value
range_x <- range(x)  # Range (min and max)
# 4. Quantiles
quantiles_x <- quantile(x, probs = c(0.25, 0.5, 0.75))  # 25th, 50th, 75th percentile
# 5. Correlation and Covariance
cor_xy <- cor(x, y)  # Correlation between x and y
cov_xy <- cov(x, y)  # Covariance between x and y
# Print results
cat("Mean of x:", mean_x, "\n")
Mean of x: 55
cat("Median of x:", median_x, "\n")
Median of x: 55
cat("Variance of x:", var_x, "\n")
Variance of x: 916.6667
cat("Standard deviation of x:", sd_x, "\n\n")
Standard deviation of x: 30.2765
cat("Sum of x:", sum_x, "\n")
Sum of x: 550
cat("Product of first five elements in x:", prod_x, "\n\n")
Product of first five elements in x: 1.2e+07
cat("Min of x:", min_x, "\n")
Min of x: 10
cat("Max of x:", max_x, "\n")
Max of x: 100
cat("Range of x:", range_x, "\n\n")
Range of x: 10 100
cat("Quantiles of x:\n")
Quantiles of x:
print(quantiles_x)
25% 50% 75%
32.5 55.0 77.5
cat("\nCorrelation between x and y:", cor_xy, "\n")
Correlation between x and y: 1
cat("Covariance between x and y:", cov_xy, "\n")
Covariance between x and y: 916.6667
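Real data often contains missing values, and most of the statistical functions above return NA when any element is missing unless told otherwise. The sketch below uses a made-up vector named scores (an assumption for illustration) to show the na.rm argument.

```r
# A vector with one missing value (illustrative data)
scores <- c(10, 20, NA, 40)

mean(scores)                # NA: the missing value propagates
mean(scores, na.rm = TRUE)  # 23.33333: mean of 10, 20, and 40
sum(scores, na.rm = TRUE)   # 70
sd(scores, na.rm = TRUE)    # standard deviation of the non-missing values
```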
11.1.1.3 Random number generation
R provides various functions to generate random numbers from different distributions. These functions are essential for simulation, statistical modeling, and machine learning.
Function | Description | Example |
---|---|---|
runif(n, min, max) | Uniform distribution | runif(5, 0, 10) |
rnorm(n, mean, sd) | Normal distribution | rnorm(5, mean=0, sd=1) |
rbinom(n, size, prob) | Binomial distribution | rbinom(5, size=10, prob=0.5) |
rpois(n, lambda) | Poisson distribution | rpois(5, lambda=3) |
sample(x, size, replace) | Random sampling | sample(1:10, 5, replace=TRUE) |
11.1.1.3.1 Setting a seed for reproducibility
Setting a seed ensures that random number generation produces the same results every time.
set.seed(42)
runif(3)
[1] 0.9148060 0.9370754 0.2861395
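To see what reproducibility means in practice, the sketch below resets the seed and checks that the same numbers come back. The seed value 123 and the variable names are arbitrary choices for this illustration.

```r
set.seed(123)            # any fixed integer works; 123 is arbitrary
first_draw <- runif(3)

set.seed(123)            # reset to the same seed
second_draw <- runif(3)

identical(first_draw, second_draw)  # TRUE: same seed, same numbers

third_draw <- runif(3)              # no reset: the random stream continues
identical(first_draw, third_draw)   # FALSE: the generator has moved on
```

Calling set.seed() once at the top of a script is usually enough to make the whole script repeatable.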
11.1.1.3.2 Generating uniform random numbers (runif())
The runif() function generates random numbers from a uniform distribution between a given min and max.
set.seed(42) # Set seed for reproducibility
runif(5, min=0, max=10)
[1] 9.148060 9.370754 2.861395 8.304476 6.417455
11.1.1.3.3 Generating normally distributed numbers (rnorm())
The rnorm() function generates random numbers from a normal (Gaussian) distribution.
set.seed(42)
rnorm(5, mean=0, sd=1)
[1] 1.3709584 -0.5646982 0.3631284 0.6328626 0.4042683
11.1.1.3.4 Random sampling (sample())
The sample() function randomly selects elements from a given vector.
set.seed(42)
sample(1:10, 5, replace=TRUE)
[1] 1 5 1 9 10
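sample() has a few other common uses besides sampling with replacement. The following sketch shows sampling without replacement, shuffling, and weighted sampling; the vector deck and the biased coin are hypothetical examples.

```r
set.seed(42)
deck <- 1:10

draw <- sample(deck, 5)   # replace = FALSE is the default
anyDuplicated(draw)       # 0: no element is drawn twice

shuffled <- sample(deck)  # with no size, returns a random permutation
sort(shuffled)            # 1 2 ... 10: same elements, different order

# Weighted sampling: a hypothetical coin that lands heads 80% of the time
flips <- sample(c("H", "T"), 10, replace = TRUE, prob = c(0.8, 0.2))
table(flips)
```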
11.2 User-defined Functions
11.2.1 Defining a function
Functions in R are defined using the keyword function(). All the statements within a function are enclosed in {} braces.
my_function <- function() {  # create a function with the name my_function
  print("Hello R!")
}
11.2.2 Calling a function
After defining a function, you have to call it to run its code.
my_function() # call the function named my_function
[1] "Hello R!"
11.2.3 Arguments
Information can be passed into functions as arguments.
Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want, just separate them with a comma.
The following example has a function with one argument (fname). When the function is called, we pass along a first name, which is used inside the function to build a greeting:
my_function <- function(fname) {
  paste("Hello", fname)
}
my_function("Lizhen")
[1] "Hello Lizhen"
my_function("Sarah")
[1] "Hello Sarah"
my_function("Jack")
[1] "Hello Jack"
11.2.4 Return values
To return a result explicitly, use the return() function. Without it, an R function returns the value of its last evaluated expression.
# Function to return the largest of three numbers
largest_value <- function(a, b, c) {
  return(max(a, b, c))
}
# Example usage
result <- largest_value(10, 25, 15)
print(result)
[1] 25
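Two related patterns are worth knowing: an R function returns the value of its last evaluated expression even without return(), and a function can hand back several values at once by returning a named list. The function names square and summarise_vec below are hypothetical, introduced only for this sketch.

```r
# Implicit return: the last evaluated expression is the result
square <- function(x) {
  x^2
}
square(4)  # 16

# Returning several values at once with a named list
summarise_vec <- function(v) {
  list(minimum = min(v), maximum = max(v), average = mean(v))
}
vec_summary <- summarise_vec(c(3, 9, 6))
vec_summary$maximum  # 9
vec_summary$average  # 6
```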
A function in R does not necessarily need to return a value explicitly; it can simply print output to the console. Look at the function defined below. It takes an integer as an argument, and prints whether the integer is odd or even.
odd_even <- function(intgr) {
  if (intgr %% 2 == 0) {
    print("even")
  } else {
    print("odd")
  }
}
odd_even(3)
[1] "odd"
11.3 Function arguments
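Like Python, R functions support positional arguments, default arguments, variable-length arguments, and keyword arguments (called named arguments in R). The behavior is broadly similar to Python, with a few differences: R matches named arguments first and then fills the remaining parameters by position, it allows argument names to be abbreviated (partial matching), and it evaluates arguments lazily, only when they are first used.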
11.3.1 Positional arguments
Write a function that returns all prime numbers between \(a\) and \(b\), where \(a\) and \(b\) are parameters of the function.
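prime <- function(a, b) {
  prime_numbers <- c()
  for (number in a:b) {
    if (number < 2) next  # 0, 1, and negative numbers are not prime
    is_prime <- TRUE
    if (number > 2) {     # guard: 2:(number - 1) would count down when number is 2
      for (factor in 2:(number - 1)) {
        if (number %% factor == 0) {
          is_prime <- FALSE
          break           # one divisor is enough to rule the number out
        }
      }
    }
    if (is_prime) prime_numbers <- c(prime_numbers, number)
  }
  return(prime_numbers)
}
prime(40, 60)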
[1] 41 43 47 53 59
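The same task can be sketched more compactly by testing divisors only up to the square root of each number, since any larger factor must pair with a smaller one. This is an alternative illustration, not the approach used above; is_prime is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if n is prime, FALSE otherwise
is_prime <- function(n) {
  if (n < 2) return(FALSE)  # 0, 1, and negatives are not prime
  if (n < 4) return(TRUE)   # 2 and 3 are prime
  # Check divisors 2, 3, ..., floor(sqrt(n)); none may divide n evenly
  all(n %% (2:floor(sqrt(n))) != 0)
}

# Keep only the primes in a range
Filter(is_prime, 40:60)  # 41 43 47 53 59
Filter(is_prime, 1:10)   # 2 3 5 7
```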
11.3.2 Default arguments
greet <- function(name = "Guest", age = 25, country = "USA") {
  cat("Hello,", name, "!\n")
  cat("You are", age, "years old and from", country, ".\n")
}
# Calling the function with default values
greet()
Hello, Guest !
You are 25 years old and from USA .
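Because R evaluates default values only when they are needed, a default can refer to another argument of the same function. The sketch below uses a hypothetical function rect_area to illustrate this.

```r
# height defaults to width, so a single argument describes a square
rect_area <- function(width, height = width) {
  width * height
}

rect_area(3)     # 9: height falls back to width
rect_area(3, 4)  # 12: both values supplied
```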
11.3.3 Keyword arguments
greet <- function(name = "Guest", age = 25, country = "USA") {
  cat("Hello,", name, "!\n")
  cat("You are", age, "years old and from", country, ".\n")
}
# Calling the function with default values
greet()
Hello, Guest !
You are 25 years old and from USA .
# Calling the function with keyword arguments
greet(name = "Alice", age = 30, country = "Canada")
Hello, Alice !
You are 30 years old and from Canada .
# Calling with positional arguments (order matters)
greet("Bob", 40, "UK")
Hello, Bob !
You are 40 years old and from UK .
# Mixing positional and named arguments
greet("Charlie", country = "Australia")
Hello, Charlie !
You are 25 years old and from Australia .
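Named arguments can also be given in any order, and R fills whatever parameters remain by position. The sketch below uses a hypothetical function describe, which returns a string instead of printing so the results are easy to compare.

```r
# describe() is a hypothetical function used only for this illustration
describe <- function(name = "Guest", age = 25, country = "USA") {
  paste0(name, " (", age, ", ", country, ")")
}

describe(country = "Japan", name = "Aiko")  # named arguments in any order
# [1] "Aiko (25, Japan)"

describe(age = 31, "Omar")  # age is matched by name, then "Omar" fills name
# [1] "Omar (31, USA)"
```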
11.3.4 Variable-number arguments
# Function that accepts variable-length arguments and computes their sum
sum_values <- function(...) {
  values <- c(...)           # Collect arguments into a vector
  sum(values, na.rm = TRUE)  # Sum with NA removal support
}
# Example usage
result <- sum_values(10, 20, 30, 40, 50)
print(result)
[1] 150
# Function that accepts extra named arguments
display_info <- function(name, ...) {
  cat("Name:", name, "\n")
  # Capture extra arguments as a list
  extras <- list(...)
  for (key in names(extras)) {
    cat(key, ":", extras[[key]], "\n")
  }
}
# Example usage with additional named arguments
display_info("David", age = 28, country = "Germany", occupation = "Engineer")
Name: David
age : 28
country : Germany
occupation : Engineer
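Another common use of ... is to forward extra arguments unchanged to a different function. The sketch below wraps mean() this way; forward_mean and count_args are hypothetical names, and ...length() assumes R 3.5.0 or later.

```r
# Hypothetical wrapper: passes any extra arguments straight to mean()
forward_mean <- function(x, ...) {
  mean(x, ...)
}

values <- c(1, 2, 3, 100, NA)
forward_mean(values)                             # NA: missing value propagates
forward_mean(values, na.rm = TRUE)               # 26.5
forward_mean(values, na.rm = TRUE, trim = 0.25)  # 2.5: lowest and highest values trimmed

# ...length() reports how many extra arguments were supplied
count_args <- function(...) ...length()
count_args("a", "b", "c")  # 3
```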
11.4 Variable scope
R has global and local variables, similar to Python, but there are some differences in how they are managed.
11.4.1 Local variables
- Variables declared inside a function are local to that function.
- They are not accessible outside the function.
my_function <- function() {
  x <- 10  # Local variable
  cat("Inside function: x =", x, "\n")
}
my_function()
print(x) # Error: object 'x' not found
11.4.2 Global variables
- Variables declared outside any function are global.
- They can be accessed from anywhere unless shadowed by a local variable of the same name.
y <- 100  # Global variable

my_function <- function() {
  cat("Inside function: y =", y, "\n")  # Can access global variable
}
my_function()
Inside function: y = 100
print(y) # Still accessible
[1] 100
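Shadowing can be seen directly: a local variable with the same name hides the global one inside the function but leaves it untouched outside. The function name show_y in this sketch is hypothetical.

```r
y <- 100  # global variable

show_y <- function() {
  y <- 5  # local variable with the same name shadows the global one
  y
}

show_y()  # 5: the local value is used inside the function
y         # 100: the global value is unchanged
```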
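11.4.3 Modifying global variables inside a function
Where Python uses the global keyword, R uses the special assignment operator <<- to modify a global variable inside a function. Strictly speaking, <<- searches the enclosing environments for an existing variable of that name and assigns to the first one it finds; if none exists, it creates the variable in the global environment.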
counter <- 0  # Global variable

increase_counter <- function() {
  counter <<- counter + 1  # Modifies global variable
}
increase_counter()
increase_counter()
print(counter)
[1] 2
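The <<- operator is not limited to global variables: it assigns to the nearest enclosing environment that already holds the name. The sketch below uses this to build counters that keep private state; make_counter is a hypothetical factory function introduced for illustration.

```r
# make_counter() returns a function that remembers its own count
make_counter <- function() {
  count <- 0             # lives in make_counter()'s environment
  function() {
    count <<- count + 1  # updates count in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_a()  # 1
counter_a()  # 2

counter_b <- make_counter()  # independent counter with its own count
counter_b()  # 1
```

Keeping the state inside the enclosing function avoids cluttering the global environment, which is often preferable to modifying a global variable.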
11.5 Practice exercises
11.5.1 Problem 1
Write a function where the input is any word and the output is the number of letters in it. Store the output of the function, then print the sentence: “{} has {} letters”.
11.5.2 Problem 2
Write a function that has no input values and simulates rolling a die. In other words, it should generate a random integer between 1 and 6, inclusive. The function should return the integer. Then use a loop to “roll the die” 5 times.
11.5.3 Problem 3
Write a function that simulates rolling a die. The function should have an input for the number of sides of the die and the number of times the die is rolled. Have the default number of sides be 6. Within the function calculate the mean(), sum(), min(), max() of all the dice rolled.
The function should return the dice rolled and the 4 calculations. Run your function with any sided dice and any number of times.
Below is my attempt at the function. Fix it so that it works.
roll_die <- function(size=6, roll_times=1){
  return (sample(1:size, roll_times, replace=TRUE))
}
paste("The average of", roll_times, "rolling a die with", size, "is", mean(roll_die(roll_times=5)))