1 Short version
2 Basics
3 Exploratory data analysis (very brief)
- 3.1 Smoothers
4. Plot options
- 4.1 Point size & color
- 4.2 Axis tickmarks
5 Maps

1 Short version

Set up the R work environment to produce publication quality documents using ggplot. Use theme_set to define the base figure elements - in this case: no gridlines, a white background, and 12 pt times new roman font.

Note: this assumes you are working in a Windows environment

Load packages

library(extrafont)
#font_import() only do this one time - it takes a while
loadfonts(device="win")
windowsFonts(Times=windowsFont("TT Times New Roman"))

library(ggplot2)
theme_set(theme_bw(base_size=12, base_family='Times New Roman')+ 
  theme(panel.grid.major = element_blank(),
  panel.grid.minor = element_blank()))

Use ggsave to increase dpi from base R level (72 dpi) and to defining figure dimensions.

ggplot(data, aes(x=x, y=y)) + 
  geom_point()
ggsave("filename.png", dpi=300, height=4, width=5, units="in")

2 Basics

2.1 ggplot base figures

You say you want to create a publication quality figure using ggplot? Well then, you are in the right place!
First, let’s look at what ggplot produces as a base graphic. We will use the R built in data set mtcars.

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + 
  geom_point()

Figure 1. Base plot.

We get a gray background with small black dots, white grid lines, and a font that is not particularly legible when the figure is placed in this document. Most journals require a figure to be 300 dpi or greater not to mention the background should be white, without grid lines, and with a font that is consistent with the rest of the manuscript. Lets address these step by step.

2.2 White background

Want a white background? We’ve got that. Simply add a theme to the figure. In this case we will add theme_bw().

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() + 
  theme_bw()

Figure 1b. White background.

2.3 Remove grid lines

To remove gridlines add another theme, and use element_blank() to clear the lines.

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
   theme_bw() + 
   theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank())

Figure 1c. White background, no gridlines.

2.4 Fonts and resolution

Fonts are both easy and not so easy to deal with depending on what your needs are.
In this case we want a Times New Roman 12 pt font. The 12 pt is easy so let’s start with that. Simply add a base size option in theme_bw().

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
   theme_bw(base_size = 12) +
   theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank())

Fig 2. White background, no gridlines, plus a font size change.

Hey nothing happened, Figure 2 is the same as Figure 1c?
The reason for this is the file size. These figures would be saved as .png and R has a base figure output of 72 ppi. This resolution is too low for journal submissions so let’s up it. In this case I’m outputting a .png file with a set width and height and a resolution of 300.

Fig 2b. Changed font size and resolution.

In order to save the figure we will use ggsave. (Note the size increase of Figure 2b is due to presenting this on the web at 300 dpi - the ggsave function shown below will save a figure in a specified format at a chosen resolution and size).

ggsave("figure2b.png", dpi=300, dev='png', height=4.5, width=6.5, units="in")

Our figure is looking ok, but the font is not correct if you wanted Time New Roman. I’m on a Windows machine, so these procedures may be different for other operating systems. R is not terribly great at fonts so it is necessary to define the fonts clearly. This involves loading the extrafont package, then importing and unpacking the fonts. Note: Only font_import one time, it takes a while and once done you are good to go, I have it commented out as I’ve already run it.

library(extrafont)
#font_import()
loadfonts(device = "win")

Load the fonts for a windows device and define the font for windows.

windowsFonts(Times = windowsFont("TT Times New Roman"))

Now the font can be incorporated into the figure via the theme_bw() component (Figure 3).

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +
  theme_bw(base_size=12, base_family='Times New Roman') + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Fig 3. 12 pt Time New Roman font figure.

2.5 A better way: theme_set()

While all these adjustments to ggplot are great, there is a much better way for defining all of your ggfigures. It goes by the name theme_set. Here is a working example.

First load the fonts (you’ve already run the font_import() command).

library(extrafont)
library(ggplot2)
loadfonts(device = "win")
windowsFonts(Times = windowsFont("TT Times New Roman"))

then use theme_set to determine the desired font and figure parameters.

theme_set(theme_bw(base_size=12, base_family='Times New Roman')+
             theme(panel.grid.major = element_blank(), 
                   panel.grid.minor = element_blank()))

With the basic figure options set, less time is spent coding figures and more time can be spent exploring data. The code used to produce Figure 3 is now reduced to:

ggplot(mtcars, aes(wt, mpg)) + geom_point()

3 Exploratory data analysis (very brief)

3.1 Smoothers

Lets explore these data, starting with a smooth. The base smooth for ggplot is loess (Figure 4) for data groups <1000 and a gam for groups >1000.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() + 
  geom_smooth()

Fig 4. Loess smoother.

That is great and all but most of us do not use loess on a regular basis. However we can also implement other model structures (Figure 5).

In this case we added a linear regression (lm) and generalized linear regression with a polynomial (glm), a generalized additive model with knots (k) set at 5, and a non-linear least squares model. Note that the polynomial in the glm smooth could also be done via the lm smooth (also with the poly function instead of the identity component), however I’m mostly showing that there is a glm function that can be assigned different distributions (e.g., family=“binomial”), same goes for the gam smoother.

More info can be found here https://stats.idre.ucla.edu/r/faq/how-can-i-explore-different-smooths-in-ggplot2/. Other good things can be found on that site as well!

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point() +   
   geom_smooth(alpha=.1) +    # loess smoother
   geom_smooth(method='lm', alpha=.1, color=2, fill=2) +    # linear model
   geom_smooth(method='glm', alpha=.1, color=3, fill=3, formula=y~x+I(x^2)) +    # generalized linear model
   geom_smooth(method='gam', alpha=.1, color=4, fill=4, formula=y~s(x, k=5)) +   # generalized additive model
   geom_smooth(method = "nls", formula = y ~ a*x^2 + b * x + c,  # non-linear least squares
               method.args = list(start=list(a=.1,b=.5,c=.2)), 
               se = FALSE, linetype = 1,
               colour = 5, fill=5, alpha=.1)

Fig 5. Multiple smoothers.

4. Plot options

4.1 Point size & color

Want bigger points, no problem simply change the size in geom_point() (Figure). Jitter using geom_jitter(), etc.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(size=4) + 
  geom_smooth(alpha=.1)

Figure 6. Point size.

However, if you want the points to be different colors this can be added to the aesthetic component (in either the ggplot() or geom_point() component). For example lets change the size and color of the points by a car’s horsepower (hp) a continuous variable in the dataframe.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1)

Figure 7. Continuous horsepower for color and size.

The colors and sizes could also be determined by changing hp to a factor.

library(tidyverse)
mtcars %>% 
  mutate(hp = factor(hp)) %>% 
  ggplot(aes(wt, mpg)) + 
  geom_point(aes(size = hp, color=hp)) +
  geom_smooth(alpha=.1)

Figure 8. Factored horse power for point color and size.

However this will often create too many discrete groups to clearly understand what is being plotted, therefore a “better” method is to define the scales for the data points. This can be done with scale_color_gradientn().

mtcars %>% 
ggplot(aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp))+
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5)) +
  scale_size(range=c(0,7))

Figure 9. Continuous horsepower color gradient.

Want to define where the legend breaks? no problem, just add a vector of values.

bs <- seq(0,350,25) # define break locations

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs))

Figure 10. Continuous horsepower color gradient with defined breaks.

Note that the legend may fall off the plotted area of the figure. One way to fix this is to remove one of the legends, in this case the legend that references the size of the points.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE)

Figure 11. Continuous horsepower color gradient with defined breaks and one legend removed.

The legend now breaks in increments of 25 using a 5 color scale from the rainbow color palette.

4.2 Axis tickmarks

It is quite easy to change the axis tick marks. Change the y-axis to every 5th value.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
  scale_y_continuous(breaks= seq(0, 40, 5))

Figure 12. Change y-axis tick marks.

Also change the x-axis, this time to unequal spacings.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
  scale_y_continuous(breaks= seq(0, 40, 5)) +
  scale_x_continuous(breaks= c(0, 1, 1.5, 2, 3, 5))

Figure 13.Change y-axis and x-axis tick marks.

Your collaborator really prefers to have all of the tick marks present so… Change the labels while keeping the tickmarks.

ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha = 0.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0, 7), breaks=as.vector(bs),guide = FALSE) +
  scale_y_continuous(breaks= seq(0,40,5), labels=c(0,"",10,"",20,"",30,"",40)) +
  scale_x_continuous(breaks= seq(0,5,.5), labels=c(seq(0,2,.5),"","",3.5, "",4.5,""))

Figure 14.Change y-axis and x-axis tick marks.

However, the scale presented doesn’t include zero, something that is often nice to have. Most people approach this by using xlim() and/or ylim().

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
  xlim(0,5) + ylim(0,40)

Figure 15. Change y-axis and x-axis tick marks.

A number of things have now changed

1. the scales have reverted back to their original designations,
1. zero is now included on both axes, and
1. we have cut off data by constraining the axis at 5.

One function of ggplot is that it only evaluates the data that is in the figure pane. So any smooths, etc., are informed only by the data that is seen. This can be changed to include the data that inform a smooth, but allow the figure to be “zoomed in” on an area of interest. To do this we usecoord_cartesian().

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
  scale_y_continuous(breaks= seq(0,40,5), labels=c(0,"",10,"",20,"",30,"",40)) +
  scale_x_continuous(breaks= seq(0,5,.5), labels=c(seq(0,2,.5),"","",3.5, "",4.5,"")) +
  coord_cartesian(xlim=c(0,5), ylim=c(0,40))

Figure 16. Using coord_cartesian to “zoom in”.

Or if you simply want to have the axes include zero ou could use expand_limits.

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
  expand_limits(x = 0, y = 0)

Figure 16a. Include zero using “expand_limits”.

The smooth now extends to the end of the pane as it indeed includes the values that are off the page. The points can be brought back into the figure by changing the x-axis limits.

ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(size=hp, color=hp)) +
  geom_smooth(alpha=.1) +
  scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
  scale_size(range=c(0,7), breaks=as.vector(bs),guide = FALSE) +
  scale_y_continuous(breaks= seq(0,40,5), labels=c(0,"",10,"",20,"",30,"",40)) +
  scale_x_continuous(breaks= seq(0,5,.5), labels=c(seq(0,2,.5),"","",3.5, "",4.5,"")) +
  coord_cartesian(xlim=c(0,5.5), ylim=c(0,40))

Figure 17. Include zero and show all data.

5 Maps

5.1 Basic maps

There are a number of ways in R to plot maps and deal with raster images etc. I have no idea how to operate most of them. I do know a couple of easy ways to generate quick images though, they are presented below.

Going deeper into the mapping realm
(or the rabbit hole…)

Mapping can get very confusing very quickly depending upon what exactly you want to do, how you want to present information, etc. I am focusing on a limited subset of mapping, mainly generating maps for quick visualization of data for evaluating spatial components. There is a great deal of information on mapping be sure to have a look around the interwebs. Let us generate a map of the western Gulf of Alaska.

library(ggplot2)
library(mapdata)
ak <- map_data('worldHires','USA:Alaska')
ggplot() + 
  geom_polygon(data=ak, aes(long, lat, group=group), fill=8, color="black")

Figure 18. A map of Alaska, based on data from PBSmapping package.

Notice that this map in Figure 18 doesn’t seem quite right, there are a few ways to deal with this. If you don’t need the far western Aleutian Islands you can simply constrain the plot (that is the method I will present); two, if you do need the Aleutian Islands then use world2Hires as opposed to ‘worldHires’ for the data source, this centers the map on the Pacific Ocean, but you will then have to deal with latitude and longitude adjustments (longitude is no longer negative \(\leftarrow\) small rabbit hole).

Lets return to the map we do want. There are two ways to constrain the data for the Aleutians (remove it, or simply constrain the limits of the plot). The first method here removes any points that are further west (east?) than 180\(^o\). Then you can create a map with a light grey fill for the land and a black outline for the coast (Figure 19).

ak <- map_data('worldHires','USA:Alaska')
ak <- subset(ak, long<0) #drop the end of the Aleutian Islands, or use world2Hires 
ggplot() + 
  geom_polygon(data=ak, aes(long, lat, group=group), fill=8, color="black")

Figure 19. ggplot2 map of Alaska based upon worldHires data.

Or we can simply constrain the data for the map if we are only interested in a particular area of Alaska. We can also clean the map up while zooming into a particular areas (Figure 24), in this case the western Gulf of Alaska. Note that the code to zoom in is not simply done by changing the x and y limits. When you constrain the x and y limits in ggplot it throws out any data outside of the image, therefore the polygon shapes (“groups”) will not be complete and the image will not be pretty. In this plot I’ve incorporated the limits in a projection wrapper (coord_map), there are a number of different projections that you can use though this will lead to a BIG rabbit hole! I’ve added a background color (aliceblue) as well as incorporated degree symbols into the x- and y-axis labels. Data points and polygons can now be added as additional layers as shown earlier.

ggplot() + 
  geom_polygon(data = ak, aes(long, lat, group=group), fill = 8, color="black") +
  theme(panel.background = element_rect(fill = 'aliceblue')) +
  xlab(expression(paste(Longitude^o,~'W'))) +
  ylab(expression(paste(Latitude^o,~'N'))) +
  coord_map(xlim = c(-165, -145), ylim = c(54, 61))

Figure 20. ggplot2 map of Alaska all dressed up with color.

Figure 21 is the same map without color (better for reports, etc.).

ggplot() + 
  geom_polygon(data = ak, aes(long, lat, group = group), fill=8, color='black') +
   theme(panel.background = element_rect(fill = 'white')) + 
   xlab(expression(paste(Longitude^o,~'W'))) +
   ylab(expression(paste(Latitude^o,~'W'))) +
    coord_map(xlim = c(-165, -145), ylim = c(54, 61))

Figure 21. Zoomed in ggplot map of the western Gulf of Alaska based upon PBSmapping data.

Figure 22 is the same map without color but based upon a different base map layer. PBSmapping has a better defined coastline for Alaska than the worldHires database.

Note: rnaturalearth is another good base map data set to consider using.

library(PBSmapping)
library(tidyverse)
data('nepacLLhigh') 
nepacLLhigh %>% 
  dplyr::select(group=PID, POS=POS,long=X, lat=Y) -> ak 

ggplot() + 
  geom_polygon(data = ak, aes(long, lat, group = group), fill=8, color='black') +
   theme(panel.background = element_rect(fill = 'white')) + 
   xlab(expression(paste(Longitude^o,~'W'))) +
   ylab(expression(paste(Latitude^o,~'W')))+
    coord_map(xlim = c(-165, -145), ylim = c(54, 61))

Figure 22. Zoomed in ggplot map of the western Gulf of Alaska based upon PBSmapping data.

So you don’t like the projection do you? Well change it using + coord_map(). There are loads of different projections that can be used, here I show “gilbert”, but there are plenty more. Look at page 4 here for various types https://cran.r-project.org/web/packages/mapproj/mapproj.pdf.

library(mapproj)

ggplot() + 
  geom_polygon(data = ak, aes(long, lat, group = group), fill = 8, color = "black") +
  coord_map("gilbert")

Figure 23. ggplot2 map of Alaska based upon PBSmapping data with a “Gilbert” projection

5.3 Have shapefile - will map

The federal EEZ shapefile is available here https://github.com/ben-williams/excellent_figures/tree/master/shpfiles

This will require using rgdal and rgeos to manipulate the shapefile into a dataframe available for using in ggplot2. I’ve also used ggrepel as a great way to adjust labels in figures.

funcr is my own package of miscellaneous functions. It is available here install.packages("devtools") devtools::install_github("ben-williams/funcr")

Within it has theme_report() and tickr() functions to facilitate figure making.

library(PBSmapping)
library(tidyverse)
library(ggrepel)
library(rgeos)
library(rgdal)
library(maptools)
library(funcr)
theme_set(theme_report())

# load the shapefile
areas <- readOGR("shpfiles", layer = "NMFS_GOA_WGS1984")

## OGR data source with driver: ESRI Shapefile 
## Source: "D:\laptop\code\excellent_figures\shpfiles", layer: "NMFS_GOA_WGS1984"
## with 7 features
## It has 2 fields

areas@data$id = rownames(areas@data)
areas.points = broom::tidy(areas, region="id")
areas.df = left_join(areas.points, areas@data, by="id")

#load PBSmapping data set for N Pacific - much better resolution than world highres...
data('nepacLLhigh') 
nepacLLhigh %>% 
  dplyr::select(group=PID, POS=POS,long=X,lat=Y) -> ak 

ports <- data.frame(long = c(-152.433333, -160.493333, -162.318056, -165.775), 
                    lat = c(57.787, 55.336667, 55.072222, 54.1325), 
                    port = c("Kodiak", "Sand Point", "King Cove", "Akutan"))



ggplot() +
  geom_polygon(data=ak, aes(long, lat, group=group), fill="lightgray", color="lightgray") + 
  coord_map("gilbert", xlim = c(-170, -145), ylim = c(49, 62)) +
  xlab(expression(paste(Longitude^o, ~'W'))) +
  ylab(expression(paste(Latitude^o, ~'N'))) +
  geom_path(data=areas.df, aes(long, lat, group = group)) +
  geom_text_repel(data = ports, aes(long, lat, label = port, group = port), point.padding = 1.27, direction = "y") +
  annotate("text", x = -165, y = 51.5, label = "Area \n 610") +
  annotate("text", x = -157, y = 53, label = "Area \n 620") +
  annotate("text", x = -151, y = 55, label = "Area \n 630") +
  annotate("text", x = -146, y = 57.2, label = "Area \n 640") +
  annotate("text", x = -167, y = 57.5, label = "Bering Sea") +
  annotate("text", x = -159, y = 60.5, label = "Alaska") +
  annotate("text", x = -150, y = 51, label = "Gulf of Alaska")

Figure 24. ggplot2 shapefile map of Alaska with federal management areas and a “Gilbert” projection

5.4 Bathymetry maps

You need bathymetry data in order to make these. Can be provided as shapefiles or pulled from websources. Here are a few poorly explained examples.

library(marmap)
library(inlmisc)


#  Download data from the server
getNOAA.bathy(lon1 = -175, lon2 = -150, lat1 = 53, lat2 = 66, resolution=1) -> AKBath
AKBath_df <- Grid2Polygons(as.SpatialGridDataFrame(AKBath), level=TRUE, pretty=TRUE)
AKBath_map <- broom::tidy(AKBath_df) %>% 
  mutate(id = factor(id, levels = paste0("X", 1:26)))

# function for a color gradient - note the white is under the map
colfunc <- colorRampPalette(c("royalblue4", "gray88", "white"))

ggplot() + 
  coord_map("gilbert", xlim = c(-165, -151), ylim = c(54, 61)) + 
  geom_map(data=AKBath_map, map=AKBath_map,
                    aes(map_id=id, fill = id), color="black") +
  geom_polygon(data = ak, aes(long, lat, group=group), fill=8, color="black") + 
  scale_fill_manual(values = colfunc(26)) + 
  theme(legend.position = "none") +
  scale_x_continuous(name = expression(paste(Longitude^o, ~'W')),
                     breaks = funcr::tickr(AKBath_map, long, to=1, start = -165, end = -150)$breaks,
                     labels = funcr::tickr(AKBath_map, long, to=1, start = -165, end = -150)$labels) +
  scale_y_continuous(name = expression(paste(Latitude^o, ~'N')),
                     breaks = funcr::tickr(AKBath_map, lat, to=1, start = 54)$breaks,
                     labels = funcr::tickr(AKBath_map, lat, to=1, start = 54)$labels)

Figure 25. ggplot2 map of Alaska based upon PBSmapping and NOAA bathymetry data with a “Gilbert” projection

Additional plotting options

#  Plot bathy data
plot(AKBath, image=TRUE, deep=-6000, shallow=0, step=1000)
scaleBathy(AKBath, deg=2, x="topleft", inset=5)

autoplot.bathy(AKBath, geom=c("r", "c")) + 
  scale_fill_etopo()

Most excellent figures in R with ggplot2

Ben Williams

2020-05