Scatter Plot in R

In this post, we will learn how to make scatter plots using R and the package ggplot2. This is the notebook for the scatter plot in R tutorial: https://www.marsja.se/how-to-make-a-scatter-plot-in-r-with-ggplot2/

Required R Packages

You need to install the packages used in this tutorial before continuing.

How to Install R Packages

You install packages with the install.packages() function. Make sure to uncomment (remove the '#') if you actually need to install the packages!

In [1]:
# install.packages(c("tidyverse", "GGally"))

Here are the individual packages used in the tutorial, if you only want to install those:

In [2]:
# to.install <- c("magrittr", "purrr", "tidyr",
#  "ggplot2", "dplyr", "broom", "GGally", "carData")
# install.packages(to.install)

How to Make a Scatter Plot in R

Time to learn how to produce a scatter plot using the R statistical programming environment. We start by using the mtcars dataset.

In [3]:
require(ggplot2)

head(mtcars)
Loading required package: ggplot2
Warning message:
"package 'ggplot2' was built under R version 3.6.1"
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Data can also be stored in Excel files.
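As a minimal sketch of reading such a file, the example below uses the readxl package. This is an assumption: readxl is not used elsewhere in this notebook and has to be installed separately. The workbook is an example file bundled with readxl that happens to contain an mtcars sheet; the path to your own file would go in its place.

```r
# Sketch: read one worksheet from an Excel file with readxl (assumed installed).
library(readxl)

# Path to an example workbook that ships with readxl; replace with your own file.
xlsx_path <- readxl_example("datasets.xlsx")

# Read the sheet named "mtcars" into a data frame (a tibble).
cars_xl <- read_excel(xlsx_path, sheet = "mtcars")

head(cars_xl)
```

Unlike the built-in mtcars, a data frame read this way has no row names, so the car names are not available unless the sheet stores them in a column.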

How to use ggplot2 to Produce Scatter Plots in R

In this section we will learn how to make scattergraphs in R using ggplot2.

How to Make a Scatter Plot in R

We will start by visualizing the variables wt (x-axis) and mpg (y-axis).

Before creating the first scatter plot in R, we will briefly cover ggplot2 and the plot functions we are going to use. First, we use ggplot() to create a plot object.

Inside the ggplot() function, we call the aes() function, which describes how variables in our data are mapped to visual properties. In this simple scatter plot example, we only use the x and y arguments, telling ggplot2 to put the variable wt on the x-axis and mpg on the y-axis. Adding geom_point() then draws one dot per row of the data.

In [4]:
require(ggplot2)

gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars)

gp + geom_point()

How to Change the Size of the Dots in a Scatter Plot

Here we'll change the size of the markers using the size argument.

In [5]:
gp + geom_point(size = 4)

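Note that we did not use aes() here: the size argument was passed directly to geom_point(), so every dot gets the same size. To let the dot size vary with a variable (here wt), put size inside aes():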

In [6]:
gp + geom_point(aes(size = wt))

How to Change the Number of Ticks using ggplot2

To change the x-axis we use the function scale_x_continuous(), and to change the y-axis we use scale_y_continuous(). Both take a limits argument, a numeric vector of length two that sets the minimum and maximum of the axis; changing the range also changes which ticks are shown.

In [7]:
gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars) + 
    geom_point()

gp + scale_y_continuous(limits=c(1, 40)) +
   scale_x_continuous(limits=c(0, 6))

Next, we also change the number of ticks by adding the breaks argument to the above functions. We use the seq() function to create the numeric vector of tick positions.

In [8]:
gp + scale_y_continuous(limits = c(1, 35),
                        breaks = seq(1, 35, 5)) +
   scale_x_continuous(limits = c(1.5, 5.5),
                      breaks = seq(1.5, 5.5, 1))
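To see exactly where the ticks end up, it can help to print the vectors that seq() produces. This small sketch uses only base R; the variable names y_breaks and x_breaks are made up for illustration.

```r
# seq(from, to, by): start at `from`, add `by` until `to` would be exceeded.
y_breaks <- seq(1, 35, 5)
y_breaks
# 1 6 11 16 21 26 31

x_breaks <- seq(1.5, 5.5, 1)
x_breaks
# 1.5 2.5 3.5 4.5 5.5
```

Because the y sequence starts at 1 and steps by 5, it stops at 31 rather than 35. If rounder tick values are wanted, both the limits and the start of the sequence would need to change.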

Grouped Scatter Plot in R

Here we group the dots by using the color argument, and we use the factor() function to convert the variable vs to a factor so that each level gets its own color.

In [9]:
gp <- ggplot(aes(x=wt, y=mpg, color=factor(vs)), 
             data=mtcars)
gp + geom_point()

Another option is to use the as.factor() function to convert vs to a factor in the data frame itself.

In [10]:
mtcars$vs <- as.factor(mtcars$vs)
gp <- ggplot(aes(x=wt, y=mpg, color=vs), 
             data=mtcars)
gp + geom_point()
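With either approach the legend shows the raw codes 0 and 1. As a hedged sketch of one way to get readable legend entries, the factor can be given labels when it is created. The column name engine and the object names are invented for this example; the meaning of the codes (0 = V-shaped, 1 = straight) is taken from the mtcars help page.

```r
library(ggplot2)

# Work on a copy so the original mtcars is left untouched.
cars <- mtcars

# vs is coded 0/1; attach descriptive labels while converting to a factor.
cars$engine <- factor(cars$vs, levels = c(0, 1),
                      labels = c("V-shaped", "straight"))

# Number of cars per engine type.
table(cars$engine)

# The legend now shows the labels instead of 0 and 1.
p_engine <- ggplot(cars, aes(x = wt, y = mpg, color = engine)) +
  geom_point()
p_engine
```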

Changing the Markers (the dots)

Here we add the aes() function inside the geom_point() function. In aes() we set the color and shape arguments to the class column (the categorical variable), so each class gets its own color and marker shape.

In [11]:
data(Burt, package = 'carData')
Burt$class <- as.factor(Burt$class)

gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class, 
             shape = class))

How to Add a Trend Line to a Scatter Plot in R

We use the geom_smooth() function with method = "lm" to add a linear regression line. Setting se = FALSE hides the confidence band around the line.

In [12]:
gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class, 
             shape = class)) +
      geom_smooth(method = "lm", se = FALSE)
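To make the connection to an ordinary regression explicit, here is a sketch that fits the model with lm() and draws the line by hand. It uses mtcars instead of Burt so that it runs without carData, and the object names fit, coefs, and p_line are illustrative only. By default, geom_smooth(method = "lm") fits the formula y ~ x, which is the model fitted below.

```r
library(ggplot2)

# Fit a simple linear regression of mpg on wt.
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(fit)   # named vector with "(Intercept)" and "wt"
coefs

# Draw the fitted line from its intercept and slope.
p_line <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_abline(intercept = coefs[["(Intercept)"]], slope = coefs[["wt"]])
p_line
```

One visible difference: geom_abline() extends across the whole panel, whereas geom_smooth() only draws the line over the range of the data.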

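In the next scatter plot example, we are going to add a separate regression line for each factor level (category). As before, we add the color and shape arguments to the geom_point() function; in addition, we add the color argument to aes() inside geom_smooth() so that one line is fitted per class: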

In [13]:
gp + geom_point(aes(color = class, 
             shape = class)) +
      geom_smooth(aes(color = class), method = "lm", se = FALSE)

Bivariate Distribution on a Scatter plot

We add a bivariate distribution to the scatter plot in R using the geom_density2d() function, which draws contour lines of a two-dimensional density estimate on top of the dots.

In [14]:
gp <- ggplot(aes(x=wt, y=mpg), 
             data=mtcars)
gp + geom_point() + geom_density2d()

How to Add Text to Scatter Plot in R

Let's carry out a correlation analysis using R, extract the r- and p-values, and then learn how to add them as text to our scatter plot.

In [15]:
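require(dplyr)
require(broom)
require(magrittr)  # provides the %$% exposition pipe used below

# Run the correlation test, tidy the result into a data frame,
# and round all numeric columns to four decimals
corr <- mtcars %$%
             cor.test(mpg, wt) %>%
             tidy() %>%
             mutate_if(is.numeric, round, 4)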

corr
Loading required package: dplyr

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Loading required package: broom
Warning message:
"package 'broom' was built under R version 3.6.1"
estimate statistic p.value parameter conf.low conf.high method                               alternative
 -0.8677    -9.559       0        30  -0.9338   -0.7441 Pearson's product-moment correlation two.sided
In [16]:
# p.value was rounded to four decimals above, so very small p-values show as 0
text <- paste0('r = ', corr$estimate, ', ',
             ifelse(corr$p.value < 0.05,
                           'p < 0.05',
                   paste0('p = ', corr$p.value))
             )

text
'r = -0.8677, p < 0.05'

Adding Text to a Plot in R

We add text using the annotate() function. The x and y arguments give the position of the label in data coordinates.

In [17]:
gp <- ggplot(aes(x = wt, y = mpg), 
             data = mtcars)
gp + geom_point() + geom_smooth(method = "lm", se = FALSE) +
    annotate('text',  x = 4.5, y = 35, label=text)

More Complex Correlation Test and Text Example

In [18]:
require(tidyr)
require(purrr)

data(Burt, package = 'carData')


corr <- Burt %>% group_by(class) %>%
  nest() %>% 
  mutate(Cor = map(data, ~ cor.test(.$IQbio, .$IQfoster)),
         p   = map_dbl(Cor, 'p.value'),
         est = map_dbl(Cor, 'estimate')
             ) %>%
  mutate_if(is.numeric, round, 4) %>%
  select(class, p, est, Cor)

# Build one label per class; report 'p < 0.05' only when that is actually true
text <- corr %>%
  mutate(
        text = paste0('r = ', est, ', ',
             ifelse(p < 0.05,
                           'p < 0.05',
                   paste0('p = ', p))))


Burt$class <- as.factor(Burt$class)

gp <- ggplot(aes(x = IQbio, y = IQfoster), 
             data = Burt) 


corrp <- gp + geom_point(aes(color = class, 
             shape=class)) +
      geom_smooth(aes(color = class), method = "lm", se = FALSE) +
      geom_text(aes(x = 120, y = 137, color="high", 
                    label=subset(text, class == "high")$text)) +
      geom_text(aes(x = 118, y = 109, color="medium", 
                    label=subset(text, class == "medium")$text)) +
      geom_text(aes(x = 124, y = 103, color="low", 
                    label=subset(text, class == "low")$text))

corrp
Loading required package: tidyr
Loading required package: purrr

How to Rotate the Axis Labels using ggplot2

Here's how to rotate the axis labels. First, we create a scatter plot with many ticks on the x-axis:

In [19]:
data(Salaries, package = "carData")
Salaries$rank <- as.factor(Salaries$rank)

gp <- ggplot(aes(x = salary, y = yrs.since.phd), 
             data = Salaries) + 
    geom_point(aes(color = rank, 
             shape = rank)) +
    geom_smooth(method = "lm") +
   scale_y_continuous(limits = c(0, 60)) +
   scale_x_continuous(limits = c(50000, 240000), 
                      breaks = seq(50000, 240000, by = 10000))

To rotate the x-axis tick labels by 90 degrees, do this:

In [20]:
gp + theme(axis.text.x = 
           element_text(angle = 90, hjust = 1))

How to Style a Scatter plot in R

Here we use the theme_bw() function to get a dark-on-light themed plot. Then, we make the scatter plot in black and grey colors using the scale_colour_grey() function. Finally, we add a theme layer using the function theme().

The function element_blank() draws nothing for the theme element it is assigned to. For instance, plot.background = element_blank() removes the plot background, and panel.grid.major = element_blank() removes the major grid lines.

In [21]:
corrp + theme_bw() +  scale_colour_grey() +
  theme(axis.line = element_line(colour = "black")
        ,plot.background = element_blank()
        ,panel.grid.major = element_blank()
        ,panel.grid.minor = element_blank()
        ,strip.background = element_blank()
        ,panel.border = element_blank() 
        ,legend.title=element_blank()
        ,legend.key = element_blank())

Pairplot in R: Scatterplot + Histogram

Let's create the pairplots using the package GGally.

In [22]:
require(GGally)

cols = c('mpg', 'wt', 'hp', 'qsec')
ggpairs(mtcars, columns = cols)
Loading required package: GGally
Warning message:
"package 'GGally' was built under R version 3.6.1"
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

Attaching package: 'GGally'

The following object is masked from 'package:dplyr':

    nasa

Saving a High Resolution Plot in R

In this section, we are going to learn how to save ggplot2 plots as PDF and TIFF files.

In [23]:
data(Salaries, package = "carData")


gp <- ggplot(aes(x=yrs.since.phd, y=salary), 
             data=Salaries) + 
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, colour="gray") +
    theme_bw() +
    theme(axis.line = element_line(colour = "black")
        ,plot.background = element_blank()
        ,panel.grid.major = element_blank()
        ,panel.grid.minor = element_blank()
        ,strip.background = element_blank()
        ,panel.border = element_blank() 
        ,legend.title=element_blank()
        ,legend.key = element_blank())  +
    xlab('Years since Ph.D.') +
    ylab('Salary')

Now we can use the ggsave() function to save the scatter plot. When no plot is passed to it, ggsave() saves the most recent plot.

How to Save a Scatter Plot to PDF in R

Let's save a pdf!

In [24]:
ggsave("salaries_by_year_scatterplot.pdf", device = "pdf",
       width = 12, height = 8,
       units = "cm", dpi = 300)

How to Save a Scatter Plot to TIFF in R

Let's save a TIFF!

In [25]:
ggsave("salaries_by_year_scatterplot.tiff", device = "tiff",
       width = 12, height = 8,
       units = "cm", dpi = 300)
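Both calls rely on ggsave() picking up the most recent plot. A slightly more defensive sketch passes the plot object explicitly through the plot argument. The example below uses mtcars and a temporary file so that it runs on its own; the names p and out_file are invented here, and a real file name would replace tempfile().

```r
library(ggplot2)

# A small plot to save; any ggplot object works here.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output location: a temporary file, so nothing is overwritten.
out_file <- tempfile(fileext = ".pdf")

# Passing plot = p makes the call independent of which plot was created last.
# The device is inferred from the ".pdf" file extension.
ggsave(out_file, plot = p, width = 12, height = 8, units = "cm")

# Check that the file was written.
file.exists(out_file)
```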