Introduction to RStudio

Tufte Handout with R Markdown

DATASCI 120 Teaching Team

2026-03-27

R and RStudio

Installation

Even if you use RStudio, you’ll still need to download R to your computer. RStudio helps you use the version of R that lives on your computer, but it doesn’t come with a version of R on its own.

Install and load packages

Currently, the CRAN package repository features 20646 available packages.
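
As a minimal sketch of installing and loading a package, the snippet below installs ggplot2 only if it is missing and then attaches it. The choice of ggplot2 (used later in this handout) and of the cloud.r-project.org mirror are assumptions made for illustration; any CRAN package or mirror works the same way.

```r
# Install a package from CRAN only if it is not already installed.
# "ggplot2" and the mirror URL are illustrative choices, not requirements.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Attach the package so its functions can be called without a prefix
library(ggplot2)
```

Installation is needed once per computer, whereas library() must be called in every new R session that uses the package.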

R Markdown

This is an R Markdown document. It is a file format in R that lets you mix text, figures, R code, and the code’s results in one document, which can be HTML, PDF, or MS Word. When you click the Knit button, a document will be generated.

For more details on using R Markdown see the NYTcovid files on Canvas, as well as http://rmarkdown.rstudio.com.

Tufte handout style

The Tufte handout style is a style that Edward Tufte uses in his books and handouts. Tufte’s style is known for its extensive use of sidenotes, tight integration of graphics with text, and well-set typography.

Any use of a footnote will automatically be converted to a sidenote. (Sidenote 1: This is a sidenote that was entered using a footnote.)

To place ancillary information in the margin without the sidenote mark (the superscript number), you can use the margin_note() function. (Margin note: This is a margin note. Notice that there is no number preceding the note.)

Or, you can write a chunk starting with ```{marginfigure} instead of R code chunks ```{r}. See an example on the right about the first fundamental theorem of calculus. We know from the first fundamental theorem of calculus that for \(x\) in \([a, b]\): \[\frac{d}{dx}\left( \int_{a}^{x} f(u)\,du\right)=f(x).\]

By default, figures are placed in the main column.

When you want a margin figure, all you need to do is set the chunk option fig.margin = TRUE.

Figure: simple margin picture.

a <- c(1:10)
plot(a)

When you want a full-width figure, set fig.fullwidth = TRUE.

a <- c(1:10)
plot(a)

Figure: simple full-width picture.

This style provides first- and second-level headings (that is, # and ##), demonstrated in the next section. You may get unexpected output if you try to use ### or lower-level headings.

References can be displayed as margin notes for HTML output. For example, we can cite R here (R Core Team 2025).

Reference: R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

In R, both = and <- can be used for assigning a value to variables. R uses 1-indexing, meaning that the indexing of elements in vectors and matrices starts at 1, not 0 as in some other programming languages.
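
The two points above can be checked with a short sketch. The variable names (x, y, letters_vec) are made up for illustration.

```r
# Both operators assign a value; <- is the more common convention in R code
x = 5
y <- 5
print(identical(x, y))   # the two variables hold the same value

# Indexing starts at 1: the first element is at position 1
letters_vec <- c("a", "b", "c")
print(letters_vec[1])    # first element, "a"
print(letters_vec[3])    # last element, "c"

# Position 0 does not refer to any element; R returns an empty vector
print(length(letters_vec[0]))
```

Note that indexing with 0 does not raise an error, which can hide off-by-one mistakes when porting code from 0-indexed languages.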

Vectors

vector_a <- c(1:10)
cat("vector_a:", vector_a,'\n') 
## vector_a: 1 2 3 4 5 6 7 8 9 10
# cat stands for "concatenate and print"
vector_b <- seq(1, 10, by=2)  
# Create a vector with a sequence of numbers
cat("vector_b:", vector_b,'\n')
## vector_b: 1 3 5 7 9
vector_c <- rep(2, length(vector_b))  
# Replicate 2, matching the length of vector_b
cat("vector_c:", vector_c,'\n')
## vector_c: 2 2 2 2 2
vector_d <- vector_b + vector_c 
# Element-wise addition of vectors
cat("vector_d:", vector_d,'\n')
## vector_d: 3 5 7 9 11

Matrices

matrix_a <- matrix(1:9, nrow=3, byrow=TRUE)  
# A 3x3 matrix with values 1 to 9
print(matrix_a)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
matrix_b <- diag(3) 
# Create a 3x3 identity matrix
print(matrix_b)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
# Element-wise multiplication of matrices
element_wise_product <- matrix_a * matrix_b
print(element_wise_product)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    5    0
## [3,]    0    0    9
# Matrix multiplication
product_matrix <- matrix_a %*% matrix_b
print(product_matrix)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
# Applying a function to rows of a matrix
row_means <- apply(matrix_a, 1, mean)  
# Calculate mean of each row
print(row_means)
## [1] 2 5 8
# Eigenvalues and eigenvectors of a matrix
eigen_values_vectors <- eigen(matrix_a)
print(eigen_values_vectors$values)
## [1]  1.611684e+01 -1.116844e+00 -1.303678e-15
# Accessing eigenvectors
eigenvectors <- eigen_values_vectors$vectors
# Indexing an eigenvector, e.g., the first eigenvector
first_eigenvector <- eigenvectors[, 1]
# Print the first eigenvector
print(first_eigenvector)
## [1] -0.2319707 -0.5253221 -0.8186735
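
As a quick sanity check on the eigen() output above, the sketch below verifies the defining relation A v = λ v for the first eigenpair and confirms that the matrix is singular, which is why the third eigenvalue is zero up to floating-point error. The names A, e, v1, and lambda1 are chosen here for illustration.

```r
# Rebuild the same 3x3 matrix used above
A <- matrix(1:9, nrow = 3, byrow = TRUE)
e <- eigen(A)

# First eigenvalue and its eigenvector
lambda1 <- e$values[1]
v1 <- e$vectors[, 1]

# Check A %*% v1 against lambda1 * v1 (equal up to numerical tolerance)
lhs <- as.vector(A %*% v1)
rhs <- lambda1 * v1
print(all.equal(lhs, rhs))

# The determinant is numerically zero, so one eigenvalue is (near) zero
print(abs(det(A)) < 1e-8)
```

Using all.equal() rather than == is deliberate: floating-point results rarely match exactly, so comparisons should allow a small tolerance.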

Simple data analysis

Any content in the Tufte style can span the full width of the page. This section, along with those that follow, serves as an illustrative example.

R includes many pre-installed datasets accessible via data("datasetName"). For importing your own data, such as from a CSV file, read.csv("path/to/your/data.csv") is commonly used.
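
To illustrate the CSV route without relying on a file that may not exist on your computer, the sketch below first writes the built-in mtcars dataset to a temporary CSV file and then reads it back. The temporary path and the name cars_from_csv are assumptions for this example; in practice you would pass the path to your own file.

```r
# Write a built-in dataset to a temporary CSV file (stand-in for your own data)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = TRUE)

# Read the file back; row.names = 1 uses the first column as row names
cars_from_csv <- read.csv(csv_path, row.names = 1)

# Inspect the size and the first few rows of the imported data
print(dim(cars_from_csv))
print(head(cars_from_csv, 3))
```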

Here are a few simple data analysis commands using the built-in dataset mtcars. The mtcars dataset contains fuel consumption data (measured in miles per gallon, or mpg) and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). The dataset includes variables such as the number of cylinders (cyl), displacement (disp), horsepower (hp), and weight (wt).

Exploratory data analysis:

# Load the 'mtcars' dataset
data("mtcars")

# Summary statistics of the dataset
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
# Calculate the mean miles per gallon (mpg)
mean_mpg <- mean(mtcars$mpg)
cat("Average MPG:", mean_mpg)
## Average MPG: 20.09062
# Simple plot: Miles per Gallon (MPG) vs. Weight (1000 lbs)
plot(mtcars$wt, mtcars$mpg, main="MPG vs. Weight",
     xlab="Weight (1000 lbs)", ylab="Miles per Gallon",
     pch=19, col="blue")

# main specifies the main title of the plot
# xlab and ylab set the x-axis and y-axis labels
# pch stands for "plotting character."
# It specifies the symbol used in the plot for each point.
# col stands for color

Statistical models:

The lm function is part of the stats package, which ships with R and is loaded by default, so it does not require the installation of additional packages. It fits linear models, including linear regression models.

# Fit a linear model: MPG ~ Weight
lm_model <- lm(mpg ~ wt, data = mtcars)
summary(lm_model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
# Print coefficients of the linear model
cat("Coefficients of the linear model:\n")
## Coefficients of the linear model:
print(coef(lm_model))
## (Intercept)          wt 
##   37.285126   -5.344472
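# Load ggplot2 (not part of base R; install it first if needed)
library(ggplot2)

# Scatter plot of MPG vs. weight with a fitted regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +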
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "MPG vs. Weight with Regression Line (ggplot2)",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
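
To make the fitted coefficients concrete, the sketch below uses the same model to predict fuel economy for a hypothetical car weighing 3,000 lbs (wt = 3) and reproduces that prediction by hand from the intercept and slope. The car and the names new_car, predicted, and by_hand are made up for illustration.

```r
# Refit the same model so this block runs on its own
lm_model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical car: weight of 3 (in units of 1000 lbs)
new_car <- data.frame(wt = 3)

# Prediction using predict()
predicted <- predict(lm_model, newdata = new_car)

# The same prediction by hand: intercept + slope * weight
by_hand <- coef(lm_model)[["(Intercept)"]] + coef(lm_model)[["wt"]] * 3

print(unname(predicted))
print(by_hand)
```

Both values come out at about 21.25 miles per gallon, which matches 37.285 - 5.344 × 3 from the coefficient table.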