12 R: Atomic vectors

Atomic vectors are the most basic data type in R.

An atomic vector is a one-dimensional collection of elements, where all elements must have the same data type. If you assign different data type values to an atomic vector, the vector will coerce all elements to the most flexible common data type. This is known as type coercion in R.

12.0.1 Creating a vector

Atomic vectors are created using the c() function, which stands for combine or concatenate

numeric_vec <- c(1, 2.5, 3.8)
print(numeric_vec)

[1] 1.0 2.5 3.8

# Character vector
char_vec <- c("apple", "banana", "cherry")
print(char_vec)

[1] "apple"  "banana" "cherry"

12.0.1.1 Type Coercion (mixed types)

When you assign elements of different types to an atomic vector, R does not raise an error.
Instead, it automatically performs type coercion to ensure all elements in the vector share the same type.
Coercion follows a hierarchy: Logical → Integer → Double → Character.

vec <- c(1, "apple", TRUE)  # Mixed types
typeof(vec)

[1] "character"

12.0.2 Atomic vector types

R has six atomic vector types:
1. Logical (TRUE, FALSE)
2. Integer (Whole numbers: 1L, 2L, -5L)
3. Double (Numeric) (Decimal numbers: 1.5, 3.14, -2.8)
4. Character (Text strings: "hello", "R programming")
5. Complex (Complex numbers: 1 + 2i)
6. Raw (Stores raw bytes)
In this course, we normally use the first four types: Logical, Integer, Double, and Character.

12.0.2.1 Checking Whether a Variable is an Atomic Vector

There are two common functions to check if a variable is an atomic vector:
- is.atomic(x): Returns TRUE if x is an atomic object.
- is.vector(x): Returns TRUE if x is a vector (with no attributes other than names).

is.atomic(numeric_vec)

[1] TRUE

is.vector(char_vec)

[1] TRUE

Note that atomic vectors are a subset of vectors in R. Vectors can be list or expression as well.

is.vector(list())

[1] TRUE

is.vector(expression())

[1] TRUE

is.atomic(list())

[1] FALSE

is.atomic(expression())

[1] FALSE

12.0.2.2 Checking the Data Type of an Atomic Vector

To determine the specific type of an atomic vector, use:
- typeof(x): Returns the underlying storage type of x.
- class(x): Returns the class of x.

typeof(numeric_vec)

[1] "double"

class(char_vec)

[1] "character"

12.0.3 Slicing a vector

An atomic vector can be sliced using the indices of the elements within [] brackets. In R

In R, no indexing backwards - you cannot index from the end
You can use negative numbers as an index - but they are for removing elements.

For example, considering this vector:

num_vec <- c(10, 20, 30, 40, 50)

Suppose, we wish to get the $3^{rd}$ element of the vector. We can get it using the index 3:

num_vec[3]

[1] 30

A sequence of consecutive elements can be sliced using the indices of the first element and the last element around the : operator. For example, let us slice elements from the $1^{st}$ index to the $3^{rd}$ element of the vector num_vec:

num_vec[1:3]

[1] 10 20 30

We can slice elements at different indices by putting the indices in an atomic vector within the [] brackets. Let us slice the $2^{nd}$, and $4^{th}$ elements of the vector num_vec:

num_vec[c(2,4)]

[1] 20 40

We can slice consecutive elements, and non-consecutive elements simultaneously. Let us slice the elements from the $1^{st}$ index to the $2^{nd}$ index and the $4^{th}$ element.

num_vec[c(1:2,4)]

[1] 10 20 40

12.0.3.1 Slicing using a logical atomic vector

An atomic vector can be sliced using a logical atomic vector of the same length. The logical atomic vector will have TRUE values corresponding to the indices where the element is to be selected, and FALSE where the element is to be discarded. See the example below.

num_vec[c(TRUE, FALSE, FALSE, TRUE, FALSE)]

[1] 10 40

filtered_num_vector <- num_vec[num_vec > 2]
print(filtered_num_vector)

[1] 10 20 30 40 50

12.0.4 Mutating a vector

In R, all the data structures are mutable. You can alter and access the content freely.

# modify an element
num_vec[2] <- 99
print(num_vec)

[1] 10 99 30 40 50

# adding an element at a specific position
append(num_vec, 3, after=2)

[1] 10 99  3 30 40 50

print(num_vec)

[1] 10 99 30 40 50

#expanding a vector
new_vec <- c(num_vec, 60, 70) 
print(new_vec)

[1] 10 99 30 40 50 60 70

12.0.5 Removing elements from a vector

Elements can be removed from the vector using the negative sign within [] brackets.

Remove the 2nd element from the vector:

new_vec <- new_vec[-3]
print(new_vec)

[1] 10 99 40 50 60 70

If multiple elements need to be removed, the indices of the elements to be removed can be given as an atomic vector.

Remove elements 1 to 3 and element 5 from the vector:

new_vec <- new_vec[-c(1:3, 5)]
print(new_vec)

[1] 50 70

12.0.6 Iterating

vec <- c(10, 20, 30, 40, 50)

for (val in vec) {
  print(val)
}

[1] 10
[1] 20
[1] 30
[1] 40
[1] 50

12.0.7 Using as a function input

You can define a function that takes a vector as input and performs operations on its elements.

sum_vector <- function(vec) {
  return(sum(vec))  # Sum of all elements
}

# Call the function with a vector
my_vec <- c(1, 2, 3, 4, 5)
sum_vector(my_vec)

[1] 15

12.0.8 Applying a Function to Each Element of a Vector

If you want to process each element individually, use sapply()

numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Function to check if a number is even or odd
is_even_or_odd <- function(x) {
  if (x %% 2 == 0) {
    return("Even")
  } else {
    return("Odd")
  }
}

# Apply the function to each element in the vector
result <- sapply(numbers, is_even_or_odd)

# Print the result
print(result)

 [1] "Odd"  "Even" "Odd"  "Even" "Odd"  "Even" "Odd"  "Even" "Odd"  "Even"

Another example:

vec <- 1:6
vec

[1] 1 2 3 4 5 6

Suppose, we wish to square each element of the vector. We can use the sapply() function as below:

sapply(vec, function(x) x**2)

[1]  1  4  9 16 25 36

Note that : function(x) x**2 is an anonymous function (also called a lambda function) in R.

12.0.9 The `rep()` function

The rep() function is used to repeat an object a fixed number of times.

rep(4, 10)

 [1] 4 4 4 4 4 4 4 4 4 4

rep(c(2, 3), 10)

 [1] 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3

12.0.10 The `which()` function

The which() function is used to find the indices of TRUE elements in a logical atomic vector.

vec <- c(8, 3, 4, 7, 9, 7, 5)

which(vec == 8)

[1] 1

In the above code, a logical vector is being created with vec == 8, and the which() function is returning the indices of the TRUE elements.

The index of the maximum and minimum values can be found using which.max() and which.min() respectively. In case of multple maximum or minimum elements, the smallest index is returned.

which.max(vec)

[1] 5

which.min(vec)

[1] 2

12.0.11 Element-wise operations on atomic vectors

When we use arithmetic operators (+, -, *, /, etc.) or comparison operators (>, >=, ==, etc.) between atomic vectors, these operations are applied element-wise, meaning they operate on corresponding elements at the same index in each vector.

Example: Element-wise Arithmetic Operations

vec1 <- c(2, 4, 6)
vec2 <- c(1, 2, 3)

# Element-wise addition
result_add <- vec1 + vec2  
print(result_add)

[1] 3 6 9

# Element-wise multiplication
result_mul <- vec1 * vec2  
print(result_mul)

[1]  2  8 18

# Element-wise exponentiation
result_exp <- vec1^vec2  
print(result_exp)

[1]   2  16 216

Example: Element-wise Comparison Operations

vec1 <- c(10, 20, 30)
vec2 <- c(5, 20, 35)

# Element-wise greater than
result_gt <- vec1 > vec2  
print(result_gt)

[1]  TRUE FALSE FALSE

# Element-wise equality check
result_eq <- vec1 == vec2  
print(result_eq)

[1] FALSE  TRUE FALSE

Recycling Rule in vector operations

If vectors are of different lengths, R applies the recycling rule, where the shorter vector is recycled (repeated) to match the length of the longer vector.

vec1 <- c(2, 4, 6, 8)  
vec2 <- c(1, 2)        

result_recycle <- vec1 + vec2  
print(result_recycle)

[1]  3  6  7 10

Operations return a new vector of the same length as the longest input

If an operator is applied between an atomic vector and a scalar, then the operation is performed on each element of the atomic vector and the scalar. See the examples below.

vec1*4

[1]  8 16 24 32

vec1 > 2

[1] FALSE  TRUE  TRUE  TRUE

Suppose, we wish to slice all elements from the object vec that are greater than 2. Here is one approach to do it. We will apply the > operator between vec and 2 to obtain a logical vector that is TRUE on indices where the condition is satisfied, and FALSE otherwise. We will then use this logical vector to slice vec. Below is the code.

vec1[vec1 > 2]

[1] 4 6 8

12.0.12 Commonly Used Functions for Atomic Vectors

Function	Description	Example
`c()`	Combines elements into a vector	`c(1, 2, 3, 4, 5)`
`rep(x, times)`	Repeats a value multiple times	`rep(5, times=3)` → `5 5 5`
`is.vector(x)`	Checks if an object is a vector	`is.vector(c(1, 2, 3))` → `TRUE`
`length(x)`	Returns the number of elements in a vector	`length(c(1, 2, 3))` → `3`
`unique(x)`	Removes duplicate values	`unique(c(1, 2, 2, 3))` → `1 2 3`
`sort(x)`	Sorts elements in ascending order	`sort(c(3, 1, 2))` → `1 2 3`
`rev(x)`	Reverses the order of elements	`rev(c(1, 2, 3))` → `3 2 1`
`order(x)`	Returns indices that sort the vector	`order(c(10, 5, 8))` → `2 3 1`

Mathematical and Statistical Functions

Function	Description	Example
`sum(x)`	Computes the sum of elements	`sum(c(1, 2, 3))` → `6`
`mean(x)`	Computes the mean (average)	`mean(c(2, 4, 6))` → `4`
`median(x)`	Computes the median	`median(c(1, 3, 5, 7))` → `4`
`sd(x)`	Computes standard deviation	`sd(c(2, 4, 6, 8))` → `2.58`
`prod(x)`	Computes the product of all elements	`prod(c(2, 3, 4))` → `24`
`range(x)`	Returns the min and max as a vector	`range(c(10, 5, 20))` → `5 20`
`min(x)`	Returns the smallest value	`min(c(3, 7, 1))` → `1`
`max(x)`	Returns the largest value	`max(c(3, 7, 1))` → `7`

Logical and Comparison Functions

Function	Description	Example
`any(x > value)`	Checks if any element meets a condition	`any(c(1, 2, 3) > 2)` → `TRUE`
`all(x > value)`	Checks if all elements meet a condition	`all(c(1, 2, 3) > 2)` → `FALSE`
`which(x == value)`	Returns indices of elements equal to `value`	`which(c(5, 10, 5) == 5)` → `1 3`

Next, let’s explore how vectorized operations in R improve performance over loops!

12.1 Vectorization in R

Vectorization in R refers to the ability to perform operations on entire vectors (or arrays) without the need for explicit loops. This makes computations faster and more efficient because R applies the operation element-wise internally using optimized C or Fortran code.

12.1.1 Comparing Loop vs. Vectorized Operations

Let’s compare the execution time of a loop-based operation versus a vectorized operation using system.time().

Example: Squaring Each Element in a Vector

Using a for Loop (Slow)

# Create a large vector
vec <- 1:1e6  # 1 to 1,000,000

# Squaring using a for loop
square_loop <- function(x) {
  result <- numeric(length(x))  # Pre-allocate memory
  for (i in seq_along(x)) {
    result[i] <- x[i]^2
  }
  return(result)
}

# Measure time taken
system.time(square_loop(vec))

   user  system elapsed 
   0.03    0.00    0.03

Using Vectorized Operation (Fast)

# Squaring using vectorized approach
square_vectorized <- function(x) {
  return(x^2)  # Directly applies to the whole vector
}

# Measure time taken
system.time(square_vectorized(vec))

   user  system elapsed 
      0       0       0

Key takeaway:

The for loop takes significantly longer (because it iterates element-by-element).
The vectorized operation is much faster (because R applies the operation to the entire vector at once).

12.1.2 Performance Comparison: Loop vs. `sapply()` vs. Vectorized

Example: Calculating square roots

# Create a large vector
x <- 1:1000000

# Using a for loop
system.time({
  result_loop <- numeric(length(x))
  for(i in seq_along(x)) {
    result_loop[i] <- sqrt(x[i])
  }
})

   user  system elapsed 
   0.03    0.00    0.03

# Using sapply
system.time({
  result_sapply <- sapply(x, sqrt)
})

   user  system elapsed 
   0.41    0.05    0.45

# Using vectorized operation (for comparison)
system.time({
  result_vector <- sqrt(x)
})

   user  system elapsed 
      0       0       0

for loop and `sapply()’ performs comparably.
The direct vectorized operation (sqrt(vec)) is the fastest.

12.1.3 Functions that support vectorization

Many R functions are already vectorized, such as:

Operation	Function
Arithmetic	`+`, `-`, `*`, `/`, `^`
Logical	`>`, `<`, `==`, `!=`
Math Functions	`sqrt()`, `log()`, `exp()`, `abs()`, `round()`
Aggregation	`sum()`, `mean()`, `max()`, `min()`, `sd()`

12.1.4 Understanding why vectorization is faster

Reasons Vectorized Operations are More Efficient:

R is optimized for vectorized operations (implemented in C under the hood).
No explicit loops are needed, reducing computation overhead.
Memory allocation is handled efficiently, unlike loops where repeated memory allocation slows execution.
Parallel execution is possible for some vectorized functions.

In R, both vectors and matrices support full vectorization, meaning element-wise operations can be applied directly without explicit loops.

12.2 Practice exercises

12.2.1 Exercise 1

USA’s GDP per capita from 1960 to 2021 is given by the vector G in the code chunk below. The values are arranged in ascending order of the year, i.e., the first value is for 1960, the second value is for 1961, and so on.

G = c(3007, 3067, 3244, 3375,3574, 3828, 4146, 4336, 4696, 5032,5234,5609,6094,6726,7226,7801,8592,9453,10565,11674,12575,13976,14434,15544,17121,18237,19071,20039,21417,22857,23889,24342,25419,26387,27695,28691,29968,31459,32854,34515,36330,37134,37998,39490,41725,44123,46302,48050,48570,47195,48651,50066,51784,53291,55124,56763,57867,59915,62805,65095,63028,69288)

12.2.1.1

What was the GDP per capita in 1990?

12.2.1.2

What is the mean, median, and standard deviation of the GDP per capita?

12.2.1.3

Find the first year where GDP per capita exceeded $50,000.

12.2.1.4

What is the range of GDP per capita over the years?

12.2.1.5

In which years did the GDP per capita increase by more than 10%?

Here is the loop version

years <- c()
for (i in 1:(length(G) - 1)) {
  diff <- (G[i+1] - G[i]) / G[i]
  if (diff > 0.1) years <- c(years, 1960 + i)
}
print(years)

[1] 1973 1976 1977 1978 1979 1981 1984

You need to solve this problem without using a for loop.

diff_gdp <- (G[-1] - G[-length(G)]) /G[-length(G)] *100
diff_gdp [diff_gdp > 10]

[1] 10.37086 10.13973 10.02095 11.76346 10.49692 11.14115 10.14539

which (diff_gdp>10) +1959

[1] 1972 1975 1976 1977 1978 1980 1983

12.2.1.6

Find the year with the highest GDP per capita growth (percentage-wise).

12.2.1.7

Identify all years where GDP per capita decreased.

12.2.2 Exercise 2

Below is a vector consisting of responses to the question: “At what age do you think you will marry?” from students of the STAT303-1 Fall 2022 class.

exp_marriage_age <- c('24','30','28','29','30','27','26','28','30+','26','28','30','30','30','probably never','30','25','25','30','28','30+ ','30','25','28','28','25','25','27','28','30','30','35','26','28','27','27','30','25','30','26','32','27','26','27','26','28','37','28','28','28','35','28','27','28','26','28','26','30','27','30','28','25','26','28','35','29','27','27','30','24','25','29','27','33','30','30','25','26','30','32','26','30','30','I wont','25','27','27','25','27','27','32','26','25','never','28','33','28','35','25','30','29','30','31','28','28','30','40','30','28','30','27','by 30','28','27','28','30-35','35','30','30','never','30','35','28','31','30','27','33','32','27','27','26','N/A','25','26','29','28','34','26','24','28','30','120','25','33','27','28','32','30','26','30','30','28','27','27','27','27','27','27','28','30','30','30','28','30','28','30','30','28','28','30','27','30','28','25','never','69','28','28','33','30','28','28','26','30','26','27','30','25','Never','27','27','25')

12.2.2.1 Cleaning data

Remove the elements that are not integers - such as ‘probably never’, ‘30+’, etc. Convert the reamining elements to integer. What is the length of the new vector?

Hint: Use is.na() to detect missing values.

exp_marriage_age_cleaned <- exp_marriage_age[!is.na (as.integer (exp_marriage_age))]

Warning: NAs introduced by coercion

exp_marriage_age_cleaned <- as.integer (exp_marriage_age_cleaned)
exp_marriage_age_cleaned

  [1]  24  30  28  29  30  27  26  28  26  28  30  30  30  30  25  25  30  28
 [19]  30  25  28  28  25  25  27  28  30  30  35  26  28  27  27  30  25  30
 [37]  26  32  27  26  27  26  28  37  28  28  28  35  28  27  28  26  28  26
 [55]  30  27  30  28  25  26  28  35  29  27  27  30  24  25  29  27  33  30
 [73]  30  25  26  30  32  26  30  30  25  27  27  25  27  27  32  26  25  28
 [91]  33  28  35  25  30  29  30  31  28  28  30  40  30  28  30  27  28  27
[109]  28  35  30  30  30  35  28  31  30  27  33  32  27  27  26  25  26  29
[127]  28  34  26  24  28  30 120  25  33  27  28  32  30  26  30  30  28  27
[145]  27  27  27  27  27  28  30  30  30  28  30  28  30  30  28  28  30  27
[163]  30  28  25  69  28  28  33  30  28  28  26  30  26  27  30  25  27  27
[181]  25

12.2.2.2 Capping unreasonably high values

Cap the values greater than 80 to 80, in the clean vector obtained above. What is the mean age when people expect to marry in the new vector?

exp_marriage_age_cleaned [exp_marriage_age_cleaned > 80] <- 80
mean (exp_marriage_age_cleaned)

[1] 28.9558

12.2.2.3 People marrying at 30 or more

length (exp_marriage_age_cleaned [exp_marriage_age_cleaned >= 30])/length(exp_marriage_age_cleaned) *100

[1] 37.01657

12.2.3 Exercise 3

Write a function that identifies if a word is a palindrome (A palindrome is a word that reads the same both backwards and forwards, for example, peep, rotator, madam, etc.). Apply the function to the vector of words below to count the number of palindrome words.

words_vec <- c('fat', 'civic', 'radar', 'mountain', 'noon','papa')

palindrome <- function(word) {
  for (i in 1:as.integer(nchar(word)/2)) {
    if (substr(word, i, i) != substr(word, nchar(word) - (i-1), nchar(word) - (i-1))) {
      return(FALSE)
    }
  }
  return(TRUE)
}
sum(sapply(words_vec, palindrome))

[1] 3

Try to solve this without using a for loop

Hints:

Define a function to reverse a word. You may need strsplit() to split the word into individual characters and paste() to concatenate them back into a word.
Use sapply() to apply a function to each element of an atomic vector.

reverse_word <- function (s){
  paste (rev(strsplit (s, "")[[1]]), collapse = "")
}
reversed_words_vec <- sapply (words_vec, reverse_word)

is_palindrome <- words_vec == reversed_words_vec
sum (is_palindrome)

[1] 3

Understanding `collapse` vs. `sep` in `paste()` in R

The paste() function in R allows you to combine strings, but collapse and sep have different roles.

`sep` (Separator Between Arguments)

Used when multiple arguments are passed to paste().
Defines the separator between these arguments.

paste("Hello", "World", sep = "-")

[1] "Hello-World"

paste("R", "is", "fun", sep = " | ")

[1] "R | is | fun"

Use sep when combining multiple elements passed separately.

`collapse` (Collapsing a Vector into One String)

Used when concatenating elements of a vector into a single string.
Defines the separator between vector elements.

char_vec <- c("R", "is", "fun")
paste(char_vec, collapse = " ")

[1] "R is fun"

char_vec <- c("apple", "banana", "cherry")
paste(char_vec, collapse = ", ")

[1] "apple, banana, cherry"

char_vec <- c("l", "o", "v", "e")
paste(char_vec, collapse = "")

[1] "love"