Set up the R work environment to produce publication quality documents using ggplot. Use theme_set
to define the base figure elements - in this case: no gridlines, a white background, and 12 pt times new roman font.
Note: this assumes you are working in a Windows environment
Load packages
library(extrafont)
#font_import() only do this one time - it takes a while
loadfonts(device="win")
windowsFonts(Times=windowsFont("TT Times New Roman"))
library(ggplot2)
theme_set(theme_bw(base_size=12, base_family='Times New Roman')+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()))
Use ggsave
to increase dpi from base R level (72 dpi) and to defining figure dimensions.
ggplot(data, aes(x=x, y=y)) +
geom_point()
ggsave("filename.png", dpi=300, height=4, width=5, units="in")
You say you want to create a publication quality figure using ggplot
? Well then, you are in the right place!
First, let’s look at what ggplot
produces as a base graphic. We will use the R built in data set mtcars
.
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
geom_point()
We get a gray background with small black dots, white grid lines, and a font that is not particularly legible when the figure is placed in this document. Most journals require a figure to be 300 dpi or greater not to mention the background should be white, without grid lines, and with a font that is consistent with the rest of the manuscript. Lets address these step by step.
Want a white background? We’ve got that. Simply add a theme
to the figure. In this case we will add theme_bw()
.
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
theme_bw()
To remove gridlines add another theme
, and use element_blank() to clear the lines.
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
Fonts are both easy and not so easy to deal with depending on what your needs are.
In this case we want a Times New Roman 12 pt font. The 12 pt is easy so let’s start with that. Simply add a base size option in theme_bw()
.
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
theme_bw(base_size = 12) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
Hey nothing happened, Figure 2 is the same as Figure 1c?
The reason for this is the file size. These figures would be saved as .png and R has a base figure output of 72 ppi. This resolution is too low for journal submissions so let’s up it. In this case I’m outputting a .png file with a set width and height and a resolution of 300.
In order to save the figure we will use ggsave
. (Note the size increase of Figure 2b is due to presenting this on the web at 300 dpi - the ggsave function shown below will save a figure in a specified format at a chosen resolution and size).
ggsave("figure2b.png", dpi=300, dev='png', height=4.5, width=6.5, units="in")
Our figure is looking ok, but the font is not correct if you wanted Time New Roman. I’m on a Windows machine, so these procedures may be different for other operating systems. R is not terribly great at fonts so it is necessary to define the fonts clearly. This involves loading the extrafont
package, then importing and unpacking the fonts. Note: Only font_import
one time, it takes a while and once done you are good to go, I have it commented out as I’ve already run it.
library(extrafont)
#font_import()
loadfonts(device = "win")
Load the fonts for a windows device and define the font for windows.
windowsFonts(Times = windowsFont("TT Times New Roman"))
Now the font can be incorporated into the figure via the theme_bw()
component (Figure 3).
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
theme_bw(base_size=12, base_family='Times New Roman') +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
While all these adjustments to ggplot are great, there is a much better way for defining all of your ggfigures. It goes by the name theme_set
. Here is a working example.
First load the fonts (you’ve already run the font_import()
command).
library(extrafont)
library(ggplot2)
loadfonts(device = "win")
windowsFonts(Times = windowsFont("TT Times New Roman"))
then use theme_set
to determine the desired font and figure parameters.
theme_set(theme_bw(base_size=12, base_family='Times New Roman')+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()))
With the basic figure options set, less time is spent coding figures and more time can be spent exploring data. The code used to produce Figure 3 is now reduced to:
ggplot(mtcars, aes(wt, mpg)) + geom_point()
Lets explore these data, starting with a smooth. The base smooth for ggplot is loess (Figure 4) for data groups <1000 and a gam for groups >1000.
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_smooth()
That is great and all but most of us do not use loess on a regular basis. However we can also implement other model structures (Figure 5).
In this case we added a linear regression (lm) and generalized linear regression with a polynomial (glm), a generalized additive model with knots (k) set at 5, and a non-linear least squares model. Note that the polynomial in the glm smooth could also be done via the lm smooth (also with the poly function instead of the identity component), however I’m mostly showing that there is a glm function that can be assigned different distributions (e.g., family=“binomial”), same goes for the gam smoother.
More info can be found here https://stats.idre.ucla.edu/r/faq/how-can-i-explore-different-smooths-in-ggplot2/. Other good things can be found on that site as well!
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_smooth(alpha=.1) + # loess smoother
geom_smooth(method='lm', alpha=.1, color=2, fill=2) + # linear model
geom_smooth(method='glm', alpha=.1, color=3, fill=3, formula=y~x+I(x^2)) + # generalized linear model
geom_smooth(method='gam', alpha=.1, color=4, fill=4, formula=y~s(x, k=5)) + # generalized additive model
geom_smooth(method = "nls", formula = y ~ a*x^2 + b * x + c, # non-linear least squares
method.args = list(start=list(a=.1,b=.5,c=.2)),
se = FALSE, linetype = 1,
colour = 5, fill=5, alpha=.1)
Want bigger points, no problem simply change the size in geom_point()
(Figure). Jitter using geom_jitter()
, etc.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(size=4) +
geom_smooth(alpha=.1)
However, if you want the points to be different colors this can be added to the aesthetic component (in either the ggplot()
or geom_point()
component). For example lets change the size and color of the points by a car’s horsepower (hp) a continuous variable in the dataframe.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1)
The colors and sizes could also be determined by changing hp to a factor.
library(tidyverse)
mtcars %>%
mutate(hp = factor(hp)) %>%
ggplot(aes(wt, mpg)) +
geom_point(aes(size = hp, color=hp)) +
geom_smooth(alpha=.1)
However this will often create too many discrete groups to clearly understand what is being plotted, therefore a “better” method is to define the scales for the data points. This can be done with scale_color_gradientn()
.
mtcars %>%
ggplot(aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp))+
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5)) +
scale_size(range=c(0,7))
Want to define where the legend breaks? no problem, just add a vector of values.
bs <- seq(0,350,25) # define break locations
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs))
Note that the legend may fall off the plotted area of the figure. One way to fix this is to remove one of the legends, in this case the legend that references the size of the points.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE)
The legend now breaks in increments of 25 using a 5 color scale from the rainbow color palette.
It is quite easy to change the axis tick marks. Change the y-axis to every 5th value.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
scale_y_continuous(breaks= seq(0, 40, 5))
Also change the x-axis, this time to unequal spacings.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
scale_y_continuous(breaks= seq(0, 40, 5)) +
scale_x_continuous(breaks= c(0, 1, 1.5, 2, 3, 5))
Your collaborator really prefers to have all of the tick marks present so… Change the labels while keeping the tickmarks.
ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha = 0.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0, 7), breaks=as.vector(bs),guide = FALSE) +
scale_y_continuous(breaks= seq(0,40,5), labels=c(0,"",10,"",20,"",30,"",40)) +
scale_x_continuous(breaks= seq(0,5,.5), labels=c(seq(0,2,.5),"","",3.5, "",4.5,""))
However, the scale presented doesn’t include zero, something that is often nice to have. Most people approach this by using xlim() and/or ylim().
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
xlim(0,5) + ylim(0,40)
A number of things have now changed
One function of ggplot is that it only evaluates the data that is in the figure pane. So any smooths, etc., are informed only by the data that is seen. This can be changed to include the data that inform a smooth, but allow the figure to be “zoomed in” on an area of interest. To do this we usecoord_cartesian()
.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
scale_y_continuous(breaks= seq(0,40,5), labels=c(0,"",10,"",20,"",30,"",40)) +
scale_x_continuous(breaks= seq(0,5,.5), labels=c(seq(0,2,.5),"","",3.5, "",4.5,"")) +
coord_cartesian(xlim=c(0,5), ylim=c(0,40))
Or if you simply want to have the axes include zero ou could use expand_limits
.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs), guide = FALSE) +
expand_limits(x = 0, y = 0)
The smooth now extends to the end of the pane as it indeed includes the values that are off the page. The points can be brought back into the figure by changing the x-axis limits.
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(size=hp, color=hp)) +
geom_smooth(alpha=.1) +
scale_colour_gradientn(colours=rainbow(5), breaks=as.vector(bs)) +
scale_size(range=c(0,7), breaks=as.vector(bs),guide = FALSE) +
scale_y_continuous(breaks= seq(0,40,5), labels=c(0,"",10,"",20,"",30,"",40)) +
scale_x_continuous(breaks= seq(0,5,.5), labels=c(seq(0,2,.5),"","",3.5, "",4.5,"")) +
coord_cartesian(xlim=c(0,5.5), ylim=c(0,40))
There are a number of ways in R to plot maps and deal with raster images etc. I have no idea how to operate most of them. I do know a couple of easy ways to generate quick images though, they are presented below.
Going deeper into the mapping realm
(or the rabbit hole…)
Mapping can get very confusing very quickly depending upon what exactly you want to do, how you want to present information, etc. I am focusing on a limited subset of mapping, mainly generating maps for quick visualization of data for evaluating spatial components. There is a great deal of information on mapping be sure to have a look around the interwebs. Let us generate a map of the western Gulf of Alaska.
library(ggplot2)
library(mapdata)
ak <- map_data('worldHires','USA:Alaska')
ggplot() +
geom_polygon(data=ak, aes(long, lat, group=group), fill=8, color="black")
Notice that this map in Figure 18 doesn’t seem quite right, there are a few ways to deal with this. If you don’t need the far western Aleutian Islands you can simply constrain the plot (that is the method I will present); two, if you do need the Aleutian Islands then use world2Hires
as opposed to ‘worldHires’ for the data source, this centers the map on the Pacific Ocean, but you will then have to deal with latitude and longitude adjustments (longitude is no longer negative \(\leftarrow\) small rabbit hole).
Lets return to the map we do want. There are two ways to constrain the data for the Aleutians (remove it, or simply constrain the limits of the plot). The first method here removes any points that are further west (east?) than 180\(^o\). Then you can create a map with a light grey fill for the land and a black outline for the coast (Figure 19).
ak <- map_data('worldHires','USA:Alaska')
ak <- subset(ak, long<0) #drop the end of the Aleutian Islands, or use world2Hires
ggplot() +
geom_polygon(data=ak, aes(long, lat, group=group), fill=8, color="black")
Or we can simply constrain the data for the map if we are only interested in a particular area of Alaska. We can also clean the map up while zooming into a particular areas (Figure 24), in this case the western Gulf of Alaska. Note that the code to zoom in is not simply done by changing the x and y limits. When you constrain the x and y limits in ggplot it throws out any data outside of the image, therefore the polygon shapes (“groups”) will not be complete and the image will not be pretty. In this plot I’ve incorporated the limits in a projection wrapper (coord_map), there are a number of different projections that you can use though this will lead to a BIG rabbit hole! I’ve added a background color (aliceblue) as well as incorporated degree symbols into the x- and y-axis labels. Data points and polygons can now be added as additional layers as shown earlier.
ggplot() +
geom_polygon(data = ak, aes(long, lat, group=group), fill = 8, color="black") +
theme(panel.background = element_rect(fill = 'aliceblue')) +
xlab(expression(paste(Longitude^o,~'W'))) +
ylab(expression(paste(Latitude^o,~'N'))) +
coord_map(xlim = c(-165, -145), ylim = c(54, 61))
Figure 21 is the same map without color (better for reports, etc.).
ggplot() +
geom_polygon(data = ak, aes(long, lat, group = group), fill=8, color='black') +
theme(panel.background = element_rect(fill = 'white')) +
xlab(expression(paste(Longitude^o,~'W'))) +
ylab(expression(paste(Latitude^o,~'W'))) +
coord_map(xlim = c(-165, -145), ylim = c(54, 61))
Figure 22 is the same map without color but based upon a different base map layer. PBSmapping
has a better defined coastline for Alaska than the worldHires
database.
Note: rnaturalearth
is another good base map data set to consider using.
library(PBSmapping)
library(tidyverse)
data('nepacLLhigh')
nepacLLhigh %>%
dplyr::select(group=PID, POS=POS,long=X, lat=Y) -> ak
ggplot() +
geom_polygon(data = ak, aes(long, lat, group = group), fill=8, color='black') +
theme(panel.background = element_rect(fill = 'white')) +
xlab(expression(paste(Longitude^o,~'W'))) +
ylab(expression(paste(Latitude^o,~'W')))+
coord_map(xlim = c(-165, -145), ylim = c(54, 61))
So you don’t like the projection do you? Well change it using + coord_map()
. There are loads of different projections that can be used, here I show “gilbert”, but there are plenty more. Look at page 4 here for various types https://cran.r-project.org/web/packages/mapproj/mapproj.pdf.
library(mapproj)
ggplot() +
geom_polygon(data = ak, aes(long, lat, group = group), fill = 8, color = "black") +
coord_map("gilbert")
The federal EEZ shapefile is available here https://github.com/ben-williams/excellent_figures/tree/master/shpfiles
This will require using rgdal
and rgeos
to manipulate the shapefile into a dataframe available for using in ggplot2
. I’ve also used ggrepel
as a great way to adjust labels in figures.
funcr
is my own package of miscellaneous functions. It is available here install.packages("devtools") devtools::install_github("ben-williams/funcr")
Within it has theme_report()
and tickr()
functions to facilitate figure making.
library(PBSmapping)
library(tidyverse)
library(ggrepel)
library(rgeos)
library(rgdal)
library(maptools)
library(funcr)
theme_set(theme_report())
# load the shapefile
areas <- readOGR("shpfiles", layer = "NMFS_GOA_WGS1984")
## OGR data source with driver: ESRI Shapefile
## Source: "D:\laptop\code\excellent_figures\shpfiles", layer: "NMFS_GOA_WGS1984"
## with 7 features
## It has 2 fields
areas@data$id = rownames(areas@data)
areas.points = broom::tidy(areas, region="id")
areas.df = left_join(areas.points, areas@data, by="id")
#load PBSmapping data set for N Pacific - much better resolution than world highres...
data('nepacLLhigh')
nepacLLhigh %>%
dplyr::select(group=PID, POS=POS,long=X,lat=Y) -> ak
ports <- data.frame(long = c(-152.433333, -160.493333, -162.318056, -165.775),
lat = c(57.787, 55.336667, 55.072222, 54.1325),
port = c("Kodiak", "Sand Point", "King Cove", "Akutan"))
ggplot() +
geom_polygon(data=ak, aes(long, lat, group=group), fill="lightgray", color="lightgray") +
coord_map("gilbert", xlim = c(-170, -145), ylim = c(49, 62)) +
xlab(expression(paste(Longitude^o, ~'W'))) +
ylab(expression(paste(Latitude^o, ~'N'))) +
geom_path(data=areas.df, aes(long, lat, group = group)) +
geom_text_repel(data = ports, aes(long, lat, label = port, group = port), point.padding = 1.27, direction = "y") +
annotate("text", x = -165, y = 51.5, label = "Area \n 610") +
annotate("text", x = -157, y = 53, label = "Area \n 620") +
annotate("text", x = -151, y = 55, label = "Area \n 630") +
annotate("text", x = -146, y = 57.2, label = "Area \n 640") +
annotate("text", x = -167, y = 57.5, label = "Bering Sea") +
annotate("text", x = -159, y = 60.5, label = "Alaska") +
annotate("text", x = -150, y = 51, label = "Gulf of Alaska")
You need bathymetry data in order to make these. Can be provided as shapefiles or pulled from websources. Here are a few poorly explained examples.
library(marmap)
library(inlmisc)
# Download data from the server
getNOAA.bathy(lon1 = -175, lon2 = -150, lat1 = 53, lat2 = 66, resolution=1) -> AKBath
AKBath_df <- Grid2Polygons(as.SpatialGridDataFrame(AKBath), level=TRUE, pretty=TRUE)
AKBath_map <- broom::tidy(AKBath_df) %>%
mutate(id = factor(id, levels = paste0("X", 1:26)))
# function for a color gradient - note the white is under the map
colfunc <- colorRampPalette(c("royalblue4", "gray88", "white"))
ggplot() +
coord_map("gilbert", xlim = c(-165, -151), ylim = c(54, 61)) +
geom_map(data=AKBath_map, map=AKBath_map,
aes(map_id=id, fill = id), color="black") +
geom_polygon(data = ak, aes(long, lat, group=group), fill=8, color="black") +
scale_fill_manual(values = colfunc(26)) +
theme(legend.position = "none") +
scale_x_continuous(name = expression(paste(Longitude^o, ~'W')),
breaks = funcr::tickr(AKBath_map, long, to=1, start = -165, end = -150)$breaks,
labels = funcr::tickr(AKBath_map, long, to=1, start = -165, end = -150)$labels) +
scale_y_continuous(name = expression(paste(Latitude^o, ~'N')),
breaks = funcr::tickr(AKBath_map, lat, to=1, start = 54)$breaks,
labels = funcr::tickr(AKBath_map, lat, to=1, start = 54)$labels)
Additional plotting options
# Plot bathy data
plot(AKBath, image=TRUE, deep=-6000, shallow=0, step=1000)
scaleBathy(AKBath, deg=2, x="topleft", inset=5)
autoplot.bathy(AKBath, geom=c("r", "c")) +
scale_fill_etopo()