NRES Capstone Workshop
2026-03-03
Why learn R?
✅ Free forever
✅ Powerful graphics
✅ Thousands of packages
✅ Reproducible research
✅ High demand in job market
| | R | RStudio |
|---|---|---|
| What it is | The engine | The car dashboard |
| Function | Runs your code | Makes coding easier |
| Can you use alone? | Yes | No (needs R) |
| Recommended? | Install first | Use this daily |
Tip
Think of R as the engine and RStudio as the dashboard. You need both!
Important
Always install the latest version of R.
Note
Posit is the company that makes RStudio. It’s completely free!
Open RStudio and type this in the Console panel:
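The command itself did not survive on this slide. One command that prints the installed version (an assumption, not necessarily the one used in the workshop) is `R.version.string`:

```r
# Print the version of R that RStudio is running
R.version.string
```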
Tip
If you see a version number – congratulations! R is installed correctly 🎉
+---------------------+---------------------+
|                     |                     |
|    Source Editor    |    Environment /    |
|   (Write scripts)   |       History       |
|                     |                     |
+---------------------+---------------------+
|                     |                     |
|       Console       |   Files / Plots /   |
|   (Run code here)   |   Packages / Help   |
|                     |                     |
+---------------------+---------------------+
The working directory is the folder where R looks for files.
# Check your current working directory
getwd()
# Set a new working directory
setwd("C:/Users/Manoj/OneDrive - Kansas State University/00_NRES_research_project")
# Better way: Use RStudio Projects!
# File -> New Project -> New Directory
Tip
Best Practice: Always use RStudio Projects (.Rproj files). They automatically set your working directory!
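Once a project sets the working directory, files can be referenced with short relative paths instead of a long `setwd()` call. A minimal sketch, assuming a data folder named `01_RawData` inside the project; the file name `example.csv` is a placeholder for illustration:

```r
# Build a path relative to the working directory instead of hard-coding it.
# "example.csv" is a placeholder file name, not a file from this workshop.
data_path <- file.path("01_RawData", "example.csv")
data_path

# Check whether the file exists before trying to read it
file.exists(data_path)
```

Because the path is relative, the same script should work on any computer where the project folder has the same layout.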
Packages are add-ons that extend R’s capabilities.
# Install a package (only do this ONCE)
install.packages("tidyverse")
# Load a package (do this EVERY session)
library(tidyverse)
# Check what packages are installed
installed.packages()[, "Package"]
Warning
install.packages() = download from internet (once)
library() = load into session (every time you open R)
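The install-once, load-every-session rule can be wrapped in a small helper. This is a sketch only: `ensure_package()` is a hypothetical name, not a function that ships with R.

```r
# Hypothetical helper: install a package only when it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call never triggers a download
ensure_package("stats")
```

Calling `ensure_package("tidyverse")` would download the package the first time and simply load it afterwards.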
Go to Tools -> Global Options:
[1] 5
[1] 6
[1] 21
[1] 3.75
[1] 8
[1] 2
[1] 3
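The code behind the seven results above is missing from this slide. Basic arithmetic such as the following produces the same values; the specific operands are an assumption.

```r
2 + 3     # addition
10 - 4    # subtraction
3 * 7     # multiplication
15 / 4    # division
2^3       # exponent
17 %% 5   # remainder (modulo)
17 %/% 5  # integer division
```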
Note
The # symbol creates a comment. R ignores everything after # on a line.
# Assign values using <- (preferred) or =
my_name <- "Alice"
age <- 20
gpa <- 3.85
is_student <- TRUE
# View the value
my_name # Prints: "Alice"
[1] "Alice"
[1] 20
[1] "age" "gpa" "is_student" "my_name"
[5] "pandoc_dir" "quarto_bin_path"
Tip
Shortcut: Press Alt + - to type <- automatically in RStudio!
Tip
Use snake_case (words separated by underscores) – it’s the most readable style in R!
[1] "numeric"
[1] "character"
[1] "logical"
[1] "integer"
[1] TRUE
[1] TRUE
[1] TRUE
[1] 42
[1] "100"
[1] FALSE
[1] TRUE
[1] NA
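The calls that produced the type results above were lost from the slide. As a rough sketch of how types are inspected and converted in R (the example values are assumptions, not the original code):

```r
# Inspect the type of a value with class()
class(3.14)      # "numeric"
class("hello")   # "character"
class(TRUE)      # "logical"
class(5L)        # "integer" (the L suffix marks an integer)

# Test a type with the is.*() family
is.numeric(3.14)
is.character("hello")

# Convert between types with the as.*() family
as.numeric("42")     # 42
as.character(100)    # "100"
as.logical(0)        # FALSE
as.numeric("abc")    # NA, with a warning, because the text is not a number
```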
A vector is a sequence of values of the same type.
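The code for this slide is missing; only its printed results remain. The sketch below uses a `scores` vector chosen to be consistent with those results (length 5, sum 439, mean 87.8). The variable name and the exact calls are assumptions.

```r
# Combine values into a vector with c()
scores <- c(85, 92, 78, 96, 88)

length(scores)   # 5
sum(scores)      # 439
mean(scores)     # 87.8
max(scores)      # 96
min(scores)      # 78
sd(scores)       # 6.870226

scores[1]        # 85 (R counts from 1)
scores[2:4]      # 92 78 96
scores[-1]       # 92 78 96 88 (everything except the first)

# A logical condition inside [] keeps only the matching elements
scores[scores > 80 & scores < 95]   # 85 92 88
```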
[1] 5
[1] 439
[1] 87.8
[1] 96
[1] 78
[1] 6.870226
[1] 85
[1] 92 78 96
[1] 92 78 96 88
[1] FALSE
[1] TRUE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE
[1] 85 92 88
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
# Loop over a vector
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(paste("I like", fruit))
}
[1] "I like apple"
[1] "I like banana"
[1] "I like cherry"
# Practical: calculate squares
squares <- c()
for (i in 1:5) {
  squares <- c(squares, i^2)
}
print(squares) # 1 4 9 16 25
[1] 1 4 9 16 25
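Many loops like the squares example can be written without a loop, because R arithmetic works on whole vectors. A short sketch of two loop-free alternatives:

```r
# Arithmetic on a vector applies to every element at once
squares <- (1:5)^2
squares

# sapply() applies a function to each element and collects the results
sapply(1:5, function(i) i^2)
```

The vectorized form is usually shorter and avoids growing a vector one element at a time.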
[1] 4
[1] 7
[1] 3.14
[1] "HELLO"
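The four results above come from calls that did not survive on the slide. Built-in functions like the following would give the same output; which functions the workshop actually used is an assumption.

```r
sqrt(16)            # 4
abs(-7)             # 7
round(3.14159, 2)   # 3.14
toupper("hello")    # "HELLO"
```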
# Writing your OWN function
say_hello <- function(name) {
  message <- paste("Hello,", name, "!")
  return(message)
}
say_hello("Alice") # "Hello, Alice !"
[1] "Hello, Alice !"
# Function with default argument
greet <- function(name, greeting = "Hello") {
  paste(greeting, name)
}
greet("Bob") # "Hello Bob"
[1] "Hello Bob"
greet("Bob", greeting = "Hi there") # "Hi there Bob"
[1] "Hi there Bob"
# Create a data frame
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 21, 23),
  grade = c(92, 85, 78, 96),
  pass = c(TRUE, TRUE, TRUE, TRUE)
)
# View the data
students # Print the whole table
   name age grade pass
1 Alice  20    92 TRUE
2   Bob  22    85 TRUE
3 Carol  21    78 TRUE
4 David  23    96 TRUE
name age grade pass
1 Alice 20 92 TRUE
2 Bob 22 85 TRUE
3 Carol 21 78 TRUE
4 David 23 96 TRUE
[1] 4
[1] 4
[1] 4 4
'data.frame': 4 obs. of 4 variables:
$ name : chr "Alice" "Bob" "Carol" "David"
$ age : num 20 22 21 23
$ grade: num 92 85 78 96
$ pass : logi TRUE TRUE TRUE TRUE
name age grade pass
Length:4 Min. :20.00 Min. :78.00 Mode:logical
Class :character 1st Qu.:20.75 1st Qu.:83.25 TRUE:4
Mode :character Median :21.50 Median :88.50
Mean :21.50 Mean :87.75
3rd Qu.:22.25 3rd Qu.:93.00
Max. :23.00 Max. :96.00
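Beyond inspecting a data frame as a whole, individual columns and rows can be pulled out. A minimal sketch using the same `students` table (recreated here so the block runs on its own):

```r
students <- data.frame(
  name  = c("Alice", "Bob", "Carol", "David"),
  age   = c(20, 22, 21, 23),
  grade = c(92, 85, 78, 96),
  pass  = c(TRUE, TRUE, TRUE, TRUE)
)

# One column as a vector
students$grade

# Rows where grade is above 90 (note the comma: rows before it, columns after)
students[students$grade > 90, ]

# Two columns, sorted from highest to lowest grade
students[order(-students$grade), c("name", "grade")]

# Summary statistic on a single column
mean(students$grade)   # 87.75
```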
# Simple scatter plot
x <- 1:10
y <- x^2
plot(x, y,
     main = "My First R Plot",
     xlab = "X Values",
     ylab = "Y = X squared",
     col = "steelblue",
     pch = 16, # filled circles
     type = "b") # both points and lines
# Bar chart
grades <- c(A = 5, B = 8, C = 4, D = 1)
barplot(grades,
        main = "Grade Distribution",
        col = c("green", "blue", "orange", "red"))
# How to get help
# Use the help() function or the Help panel in RStudio
help("mean")
help("sum")
# Check function arguments
args(round)
# See a function's source code
print
# Useful for packages
vignette("dplyr")
Tip
Google is your friend! Search “R how to [do something]” – Stack Overflow has answers for almost everything.
| Error | What it Means | Fix |
|---|---|---|
| `object 'x' not found` | Variable doesn’t exist | Check spelling / create it first |
| `could not find function` | Package not loaded | Run `library(package)` |
| `unexpected symbol` | Syntax error | Check for missing commas/parentheses |
| `NAs introduced by coercion` | Type conversion failed | Check your data types |
| `subscript out of bounds` | Index too large | Check `length()` or `nrow()` |
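Two of these situations can be reproduced safely in the Console. The sketch below triggers a coercion warning and then catches an error with `tryCatch()`; the example values are made up for illustration.

```r
# as.numeric() returns NA (with a warning) when text is not a number
values <- c("12", "7.5", "n/a")
nums <- suppressWarnings(as.numeric(values))
nums

# Find which entries failed to convert
values[is.na(nums)]

# tryCatch() lets a script respond to an error instead of stopping
result <- tryCatch(
  log("ten"),
  error = function(e) {
    message("Caught an error: ", conditionMessage(e))
    NA
  }
)
result
```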
Try these in your RStudio Console:
# 1. Calculate: (15 + 3) * 2 / 4
# 2. Create a variable called "my_name" with your name
# 3. Create a vector of 5 of your favorite numbers
# 4. Find the mean of your vector
# 5. Which numbers in your vector are greater than the mean?
Note
Type the code yourself – don’t copy-paste! Muscle memory matters in programming.
# Write a function called "bmi_calculator" that:
# - Takes weight (kg) and height (m) as arguments
# - Calculates BMI = weight / height^2
# - Returns the BMI value
# Test it with: weight = 70 kg, height = 1.75 m
# Expected answer: 22.86
# BONUS: Add an if-else inside that prints:
# "Underweight" if BMI < 18.5
# "Normal" if BMI is at least 18.5 and below 25
# "Overweight" if BMI >= 25
Free Online Books:
Interactive Learning:
| Package | Purpose |
|---|---|
| `tidyverse` | Data wrangling & visualization |
| `ggplot2` | Beautiful graphics |
| `dplyr` | Data manipulation |
| `readr` | Reading CSV files |
| `tidyr` | Reshaping data |
| `lubridate` | Working with dates |
| `stringr` | Working with text |
# -- BASICS ------------------------------------------
x <- 5 # assign
c(1, 2, 3) # create vector
1:10 # sequence
length(x)
class(x) # inspect
# -- DATA FRAMES -------------------------------------
df <- data.frame(a=1:3, b=c("x","y","z"))
df$a # access column
df[1,] # access row
nrow(df); ncol(df) # dimensions
# -- CONTROL FLOW ------------------------------------
if (x > 0) { } else { } # conditional
for (i in 1:10) { } # loop
function(x) { return(x) } # function
# -- USEFUL FUNCTIONS --------------------------------
sum(); mean(); sd(); max(); min()
paste(); paste0() # combine strings
is.na(); na.omit() # handle missing
library(readxl)
library(dplyr)
library(tidyr)
setwd("C:/Users/Manoj/OneDrive - Kansas State University/00_NRES_research_project")
yr2020 <- read_excel("01_RawData/CRPPracticesbyCountyJAN20.xlsx", sheet = "JAN20", skip = 4, col_names = FALSE)
yr2021 <- read_excel("01_RawData/CRPPracticesbyCountyJAN21.xlsx", sheet = "JAN21", skip = 4, col_names = FALSE)
yr2022 <- read_excel("01_RawData/CRPPracticesbyCountyJAN22.xlsx", sheet = "JAN22", skip = 4, col_names = FALSE)
yr2023 <- read_excel("01_RawData/CRPPracticesbyCountyJAN23.xlsx", sheet = "JAN23", skip = 4, col_names = FALSE)
yr2024 <- read_excel("01_RawData/CRPPracticesbyCountyJAN24.xlsx", sheet = "JAN24", skip = 4, col_names = FALSE)
yr2025 <- read_excel("01_RawData/CRPPracticesbyCountySEP25.xlsx", sheet = "SEP25", skip = 4, col_names = FALSE)
col_2020 <- c(
"fips", "state", "county",
"cp1", "cp2", "cp3", "cp3a_pine", "cp3a_hardwood",
"cp4d", "cp4b", "cp5", "cp6_cp7",
"cp8", "cp9", "cp10", "cp11", "cp12",
"cp15", "cp16", "cp17", "cp18",
"cp21", "cp22",
"cp23", "cp23_floodplain", "cp23a_nonfloodplain",
"cp24", "cp25",
"cp27", "cp28", "cp29", "cp30",
"cp31", "cp32", "cp33", "cp36", "cp37", "cp38",
"cp39", "cp40", "cp41",
"cp42",
"cp87", "cp88",
"total"
)
col_2021_2023 <- c(
"fips", "state", "county",
"cp1", "cp2", "cp3", "cp3a_pine", "cp3a_hardwood",
"cp4d", "cp4b", "cp5", "cp6_cp7",
"cp8", "cp9", "cp10", "cp11", "cp12",
"cp15", "cp16", "cp17", "cp18",
"cp21", "cp22",
"cp23", "cp23_floodplain", "cp23a_nonfloodplain",
"cp24", "cp25",
"cp27", "cp28", "cp29", "cp30",
"cp31", "cp32", "cp33", "cp36", "cp37", "cp38",
"cp39", "cp40", "cp41",
"cp42", "cp43",
"cp87", "cp88", "cp90",
"total"
)
col_2024_2025 <- c(
"fips", "state", "county",
"cp1", "cp2", "cp3", "cp3a_pine", "cp3a_hardwood",
"cp4d", "cp4b", "cp5", "cp6_cp7",
"cp8", "cp9", "cp10", "cp11", "cp12",
"cp15", "cp16", "cp17", "cp18",
"cp21", "cp22",
"cp23", "cp23_floodplain", "cp23a_nonfloodplain",
"cp24", "cp25",
"cp27", "cp28", "cp29", "cp30",
"cp31", "cp33", "cp36", "cp37", "cp38",
"cp39", "cp40", "cp41",
"cp42", "cp43",
"cp87", "cp88", "cp90",
"total"
)
# Assign column names to each year
colnames(yr2020) <- col_2020
colnames(yr2021) <- col_2021_2023
colnames(yr2022) <- col_2021_2023
colnames(yr2023) <- col_2021_2023
colnames(yr2024) <- col_2024_2025
colnames(yr2025) <- col_2024_2025
# Add a year column to each dataset
yr2020$year <- 2020
yr2021$year <- 2021
yr2022$year <- 2022
yr2023$year <- 2023
yr2024$year <- 2024
yr2025$year <- 2025
# Remove junk rows (where fips is not a number)
yr2020 <- yr2020[grepl("^\\d+$", as.character(yr2020$fips)), ]
yr2021 <- yr2021[grepl("^\\d+$", as.character(yr2021$fips)), ]
yr2022 <- yr2022[grepl("^\\d+$", as.character(yr2022$fips)), ]
yr2023 <- yr2023[grepl("^\\d+$", as.character(yr2023$fips)), ]
yr2024 <- yr2024[grepl("^\\d+$", as.character(yr2024$fips)), ]
yr2025 <- yr2025[grepl("^\\d+$", as.character(yr2025$fips)), ]
# bind_rows() automatically fills missing columns with NA
crp_all <- bind_rows(yr2020, yr2021, yr2022, yr2023, yr2024, yr2025)
# Quick check
dim(crp_all) # should show total rows x columns
[1] 14334 48
[1] "fips" "state" "county"
[4] "cp1" "cp2" "cp3"
[7] "cp3a_pine" "cp3a_hardwood" "cp4d"
[10] "cp4b" "cp5" "cp6_cp7"
[13] "cp8" "cp9" "cp10"
[16] "cp11" "cp12" "cp15"
[19] "cp16" "cp17" "cp18"
[22] "cp21" "cp22" "cp23"
[25] "cp23_floodplain" "cp23a_nonfloodplain" "cp24"
[28] "cp25" "cp27" "cp28"
[31] "cp29" "cp30" "cp31"
[34] "cp32" "cp33" "cp36"
[37] "cp37" "cp38" "cp39"
[40] "cp40" "cp41" "cp42"
[43] "cp87" "cp88" "total"
[46] "year" "cp43" "cp90"
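The cleaning and combining steps can be tried on a toy example before running them on the real spreadsheets. This is a sketch with made-up values: `y1` and `y2` are hypothetical stand-ins for two yearly tables, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Toy stand-ins for two yearly tables; the values are made up for illustration
y1 <- data.frame(fips = c("1001", "Total", "1003"),
                 cp1  = c(10, 99, 20))
y2 <- data.frame(fips = c("1001", "1003"),
                 cp1  = c(11, 21),
                 cp43 = c(5, 6))   # column that exists only in the later year

# Keep only rows whose fips is made entirely of digits
y1 <- y1[grepl("^\\d+$", y1$fips), ]

y1$year <- 2020
y2$year <- 2021

# Columns are matched by name; cp43 is NA for the 2020 rows
combined <- bind_rows(y1, y2)
combined
```

Columns that appear only in later tables are appended at the end, which is why `cp43` and `cp90` come after `year` in the column list above.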
# Convert all CP columns to numeric
crp_all <- crp_all %>%
  mutate(across(starts_with("cp"), as.numeric),
         fips = as.integer(fips))
# Reshape from WIDE to LONG format
# One row per county per practice per year
crp_long <- crp_all %>%
  pivot_longer(
    cols = starts_with("cp"), # all CP practice columns
    names_to = "practice", # column name -> "practice"
    values_to = "acres" # column value -> "acres"
  ) %>%
  select(year, fips, state, county, practice, acres)
practice_lookup <- data.frame(
practice = c(
"cp1", "cp2", "cp3", "cp3a_pine", "cp3a_hardwood",
"cp4d", "cp4b", "cp5", "cp6_cp7",
"cp8", "cp9", "cp10", "cp11", "cp12",
"cp15", "cp16", "cp17", "cp18",
"cp21", "cp22",
"cp23", "cp23_floodplain", "cp23a_nonfloodplain",
"cp24", "cp25",
"cp27", "cp28", "cp29", "cp30",
"cp31", "cp32", "cp33", "cp36", "cp37", "cp38",
"cp39", "cp40", "cp41",
"cp42", "cp43",
"cp87", "cp88", "cp90"
),
practice_name = c(
"Grass Planting - Introduced",
"Grass Planting - Native",
"Tree Planting - Softwoods",
"Tree Planting - Longleaf Pine",
"Tree Planting - Hardwoods",
"Wildlife Habitat",
"Wildlife Corridors",
"Field Windbreaks",
"Diversions & Erosion Control",
"Grass Waterways",
"Shallow Water for Wildlife",
"Existing Grass",
"Existing Trees",
"Wildlife Food Plots",
"Contour Grass Strips",
"Shelterbelts",
"Living Snow Fences",
"Salinity Reducing Vegetation",
"Filter Strips",
"Riparian Buffers",
"Wetland Restoration",
"Wetland Restoration - Floodplain",
"Wetland Restoration - Non-Floodplain",
"Cross Wind Trap Strips",
"Rare and Declining Habitat",
"Farmable Wetland - Wetland",
"Marginal Pasture Buffer - Buffer",
"Marginal Pasture Buffer - Wildlife",
"Marginal Pasture Buffer - Wetland",
"Bottomland Hardwood Trees",
"Expired Hardwood Trees",
"Upland Bird Habitat Buffers",
"Longleaf Pine",
"Duck Nesting Habitat",
"State Acres for Wildlife Enhancement",
"Constructed Wetlands",
"Aquaculture Wetlands",
"Flooded Prairie Wetlands",
"Pollinator Habitat",
"Prairie Strips",
"CRP Grasslands - Introduced",
"CRP Grasslands - Native",
"Soil Health Perennial Conservation Cover"
)
)
crp_long <- left_join(crp_long, practice_lookup, by = "practice")
# A tibble: 6 × 7
year fips state county practice acres practice_name
<dbl> <int> <chr> <chr> <chr> <dbl> <chr>
1 2020 1001 ALABAMA AUTAUGA cp1 NA Grass Planting - Introduced
2 2020 1001 ALABAMA AUTAUGA cp2 NA Grass Planting - Native
3 2020 1001 ALABAMA AUTAUGA cp3 882. Tree Planting - Softwoods
4 2020 1001 ALABAMA AUTAUGA cp3a_pine 79.4 Tree Planting - Longleaf Pine
5 2020 1001 ALABAMA AUTAUGA cp3a_hardwood 39.4 Tree Planting - Hardwoods
6 2020 1001 ALABAMA AUTAUGA cp4d 10.1 Wildlife Habitat
[1] 616362 7
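The reshape and lookup steps are easier to follow on a table small enough to read in full. The sketch below uses made-up numbers and hypothetical names (`wide`, `long`, `lookup`), and assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Toy wide table; the numbers are made up for illustration
wide <- data.frame(
  year  = c(2020, 2020),
  fips  = c(1001L, 1003L),
  cp1   = c(10, NA),
  cp2   = c(3, 4),
  total = c(13, 4)
)

# Wide to long: one row per county per practice
long <- wide %>%
  pivot_longer(cols = starts_with("cp"),
               names_to = "practice",
               values_to = "acres") %>%
  select(year, fips, practice, acres)

# Small lookup table; cp2 is left out on purpose
lookup <- data.frame(practice = "cp1",
                     practice_name = "Grass Planting - Introduced")

# left_join() keeps every row of `long`; unmatched codes get NA names
joined <- left_join(long, lookup, by = "practice")
joined

# Rows of `long` whose practice code has no match in the lookup
anti_join(long, lookup, by = "practice")
```

Two wide rows with two practice columns become four long rows, the same multiplication that turns 14334 rows and 43 practice columns into 616362 rows above. Running `anti_join()` on the real data is a quick way to spot practice codes missing from the lookup table.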
Happy Coding! 🎉
Remember: Every expert was once a beginner.
The best way to learn R is to use R every day!
Introduction to R | NRES Capstone