You are currently browsing the monthly archive for December 2015.
I’m very pleased to announce the release of ggplot2 2.0.0. I know I promised that there wouldn’t be any more updates, but while working on the 2nd edition of the ggplot2 book, I just couldn’t stop myself from fixing some long-standing problems.
On the scale of ggplot2 releases, this one is huge with over one hundred fixes and improvements. This might break some of your existing code (although I’ve tried to minimise breakage as much as possible), but I hope the new features make up for any short term hassle. This blog post documents the most important changes:
- ggplot2 now has an official extension mechanism.
- There are a handful of new geoms, and updates to existing geoms.
- The default appearance has been thoroughly tweaked so most plots should look better.
- Facets have a much richer set of labelling options.
- The documentation has been overhauled to be more helpful, and require less integration across multiple pages.
- A number of older and less used features have been deprecated.
These are described in more detail below. See the release notes for a complete list of all changes.
Extensibility
Perhaps the biggest news in this release is that ggplot2 now has an official extension mechanism. This means that others can now easily create their own stats, geoms and positions, and provide them in other packages. This should allow the ggplot2 community to flourish, even as less development work happens in ggplot2 itself. See vignette("extending-ggplot2") for details.
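As a rough illustration of what the extension mechanism makes possible, the sketch below defines a new stat that keeps only the points on the convex hull of each group. The names StatConvexHull and stat_convex_hull() are made up for this example; ggproto(), Stat and layer() are the ggplot2 pieces it builds on, and the vignette remains the authoritative guide.

```r
library(ggplot2)

# Hypothetical stat: reduce each group to the points on its convex hull.
StatConvexHull <- ggproto("StatConvexHull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    # chull() returns the row indices of the hull vertices, in order.
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

# Hypothetical constructor that wires the stat into a layer.
stat_convex_hull <- function(mapping = NULL, data = NULL, geom = "polygon",
                             position = "identity", na.rm = FALSE,
                             show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatConvexHull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Draw the points, then outline their convex hull with the new stat.
p_hull <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_convex_hull(fill = NA, colour = "black")
```

Because the stat is an ordinary ggproto object, a function like this could live in a separate package and still plug into any ggplot2 plot.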
Coupled with this change, ggplot2 no longer uses proto or reference classes. Instead, we now use ggproto, a new OO system designed specifically for ggplot2. Unlike proto and RC, ggproto supports clean cross-package inheritance, which is necessary for extensibility. Creating a new OO system isn’t usually the right solution, but I’m pretty sure it was necessary here. Read more about it in the vignette.
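For a feel of how ggproto objects behave, here is a minimal sketch of a parent object and a child that overrides one of its methods. Counter and DoubleCounter are invented names with no counterpart in ggplot2; only ggproto() and ggproto_parent() come from the package.

```r
library(ggplot2)

# A parent object with one field and one method.
Counter <- ggproto("Counter", NULL,
  n = 0,
  add = function(self, by = 1) {
    self$n <- self$n + by
    invisible(self)
  }
)

# A child that inherits `n` and overrides `add()` by delegating to the parent.
DoubleCounter <- ggproto("DoubleCounter", Counter,
  add = function(self, by = 1) {
    ggproto_parent(Counter, self)$add(by = 2 * by)
  }
)

DoubleCounter$add()  # the child stores its own n (0 + 2)
Counter$add()        # the parent is updated independently (0 + 1)

Counter$n        # 1
DoubleCounter$n  # 2
```

Methods receive the object itself as `self`, which is what lets a child defined in one package reuse behaviour from a parent defined in another.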
New and updated geoms
- ggplot no longer throws an error if your plot has no layers. Instead it automatically adds geom_blank():
  ggplot(mpg, aes(cyl, hwy))
- geom_count() (a new alias for the old stat_sum()) counts the number of points at unique locations on a scatterplot, and maps the size of the point to the count:
  ggplot(mpg, aes(cty, hwy)) + geom_point()
  ggplot(mpg, aes(cty, hwy)) + geom_count()
- geom_curve() draws curved lines in the same way that geom_segment() draws straight lines:
  df <- expand.grid(x = 1:2, y = 1:2)
  ggplot(df, aes(x, y, xend = x + 0.5, yend = y + 0.5)) +
    geom_curve(aes(colour = "curve")) +
    geom_segment(aes(colour = "segment"))
- geom_bar() now behaves differently from geom_histogram(). Instead of binning the data, it counts the number of unique observations at each location:
  ggplot(mpg, aes(cyl)) + geom_bar()
  ggplot(mpg, aes(cyl)) + geom_histogram(binwidth = 1)
  If you got into the (bad) habit of using geom_histogram() to create bar charts, or geom_bar() to create histograms, you’ll need to switch.
- Layers are now much stricter about their arguments – you will get an error if you’ve supplied an argument that isn’t an aesthetic or a parameter. This breaks the handful of geoms/stats that used ... to pass additional arguments on to the underlying computation. Now geom_smooth()/stat_smooth() and geom_quantile()/stat_quantile() use method.args instead; and stat_summary(), stat_summary_hex(), and stat_summary2d() use fun.args. This is likely to cause some short-term pain but in the long-term it will make it much easier to spot spelling mistakes and other errors.
- geom_text() has been overhauled to make labelling your data a little easier. You can use the nudge_x and nudge_y arguments to offset labels from their corresponding points. check_overlap = TRUE provides a simple way to avoid overplotting of labels: labels that would otherwise overlap are omitted.
  ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars))) +
    geom_point() +
    geom_text(nudge_y = 0.5, check_overlap = TRUE)
  (Labelling points well is still a huge pain, but at least these new features make life a little better.)
- geom_label() works like geom_text() but draws a rounded rectangle underneath each label:
  library(dplyr)  # provides %>% and mutate()
  grid <- expand.grid(
    x = seq(-pi, pi, length = 50),
    y = seq(-pi, pi, length = 50)
  ) %>%
    mutate(r = x ^ 2 + y ^ 2, z = cos(r ^ 2) * exp(-r / 6))
  ggplot(grid, aes(x, y)) +
    geom_raster(aes(fill = z)) +
    geom_label(data = data.frame(x = 0, y = 0), label = "Center") +
    theme(legend.position = "none") +
    coord_fixed()
- aes_() replaces aes_q(), and works like the SE (standard evaluation) functions in dplyr and my other recent packages. It supports formulas, so the most concise SE version of aes(carat, price) is now aes_(~carat, ~price). You may want to use this form in packages, as it will avoid spurious R CMD check warnings about undefined global variables.
  ggplot(mpg, aes_(~displ, ~cty)) + geom_point()
  # Same as
  ggplot(mpg, aes(displ, cty)) + geom_point()
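To make the stricter argument handling from the list above concrete, here is a small sketch of how extra arguments now reach the underlying computation. The choice of loess with degree = 1 and of mean_se() with mult = 2 is arbitrary, picked only to have something to pass through.

```r
library(ggplot2)

# Arguments for the fitting function go in method.args, not in `...`.
p_smooth <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "loess", method.args = list(degree = 1))

# Arguments for the summary function go in fun.args.
p_summary <- ggplot(mpg, aes(class, hwy)) +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 2))
```

Keeping these in a named list is what lets the layer reject anything else it does not recognise, such as a misspelled parameter.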
Appearance
I’ve made a number of small tweaks to the default appearance:
- The default theme_grey() background colour has been changed from “grey90” to “grey92”: this makes the background a little less visually prominent.
- Labels and titles have been tweaked for readability. Axis labels are darker, and legend titles get the same visual treatment as axis labels.
- The default font size dropped from 12 to 11. You might be surprised that I’ve made the default text size smaller as it was already hard for many people to read. It turns out there was a bug in RStudio (fixed in 0.99.724), that shrunk the text of all grid based graphics. Once that was resolved the defaults seemed too big to my eyes.
- scale_size() now maps values to area, not radius. Use scale_radius() if you want the old behaviour (not recommended, except perhaps for lines). Continue to use scale_size_area() if you want 0 values to have 0 area.
- Bar and rectangle legends no longer get a diagonal line. Instead, the border has been tweaked to make it visible, and more closely match the size of line drawn on the plot.
  ggplot(mpg, aes(factor(cyl), fill = drv)) +
    geom_bar(colour = "black", size = 1) +
    coord_flip()
- geom_point() now uses shape 19 instead of 16. This looks much better on the default Linux graphics device. (It’s very slightly smaller than the old point, but it shouldn’t affect any graphics significantly). You can now control the width of the outline on shapes 21-25 with the stroke parameter.
- The default legend will now allocate multiple rows (if vertical) or columns (if horizontal) in order to make a legend that is more likely to fit on the screen. You can override with the nrow/ncol arguments to guide_legend():
  p <- ggplot(mpg, aes(displ, hwy, colour = manufacturer)) +
    geom_point() +
    theme(legend.position = "bottom")
  p
  # Revert back to previous behaviour
  p + guides(colour = guide_legend(nrow = 1))
- Two new themes were contributed by Jean-Olivier Irisson: theme_void() is completely empty and theme_dark() has a dark background designed to make colours pop out.
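The change to scale_size() and the new stroke parameter mentioned in the list above are easiest to see side by side. The following is only a sketch; the variables mapped and the stroke width of 2 are arbitrary choices for illustration.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy, size = cty)) +
  geom_point(alpha = 0.5)

p_area   <- base + scale_size()    # new default: value maps to point area
p_radius <- base + scale_radius()  # old behaviour: value maps to radius

# stroke sets the outline width for the fillable shapes 21-25.
p_stroke <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(shape = 21, fill = "white", stroke = 2)
```

Printing p_area and p_radius next to each other should show that mapping to radius exaggerates the difference between small and large values.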
Facet labels
Thanks to the work of Lionel Henry, facet labels have received three major improvements:
- You can switch the position of facet labels so they’re next to the axes.
- facet_wrap() now supports custom labellers.
- You can create combined labels when facetting by multiple variables.
Switching the labels
The new switch argument allows you to switch the labels to display near the axes:
data <- transform(mtcars,
  am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")),
  gear = factor(gear, levels = 3:5, labels = c("Three", "Four", "Five"))
)
ggplot(data, aes(mpg, disp)) +
  geom_point() +
  facet_grid(am ~ gear, switch = "both")
This is especially useful when the labels directly characterise the axes. In that situation, switching the labels can make the plot clearer and more readable. You may also want to use a neutral label background by setting strip.background to element_blank():
library(dplyr)  # provides %>% and mutate()
data <- mtcars %>%
  mutate(
    Logarithmic = log(mpg),
    Inverse = 1 / mpg,
    Cubic = mpg ^ 3,
    Original = mpg
  ) %>%
  tidyr::gather(transformation, mpg2, Logarithmic:Original)
ggplot(data, aes(mpg2, disp)) +
  geom_point() +
  facet_wrap(~transformation, scales = "free", switch = "x") +
  theme(strip.background = element_blank())
Wrap labeller
A longstanding issue in ggplot was that facet_wrap() did not support custom labellers. Labellers are small functions that make it easy to customise the labels. You can now supply labellers to both wrap and grid facets:
ggplot(data, aes(mpg2, disp)) +
  geom_point() +
  facet_wrap(~transformation, scales = "free", labeller = "label_both")
Composite margins
Labellers now have better support for composite margins when you facet over multiple variables with +. All labellers gain a multi_line argument to control whether labels should be displayed as a single line or over multiple lines, one for each factor.
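As a sketch of what multi_line changes, the two plots below facet over vs + cyl with label_both(), once with the default and once with multi_line = FALSE. Wrapping the labeller in an anonymous function is just one way to pass the argument, chosen here for brevity.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Default: one strip line per faceting variable.
p_multi <- p + facet_grid(. ~ vs + cyl, labeller = label_both)

# multi_line = FALSE collapses the variables into a single strip line.
p_single <- p + facet_grid(
  . ~ vs + cyl,
  labeller = function(labels) label_both(labels, multi_line = FALSE)
)
```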
The labellers still work the same way except for label_bquote(). That labeller makes it easy to write mathematical expressions involving the values of facetted factors. Historically, label_bquote() could only specify a single expression for all margins and factors. The factor value was referred to via the backquoted placeholder .(x). Now that it supports expressions combining multiple factors, you must backquote the variable names themselves. In addition, you can provide different expressions for each margin:
my_labeller <- label_bquote(
  rows = .(am) / alpha,
  cols = .(vs) ^ .(cyl)
)
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(am ~ vs + cyl, labeller = my_labeller)
Documentation
I’ve given the documentation a thorough overhaul:
- Tightly linked geoms and stats (e.g. geom_boxplot() and stat_boxplot()) are now documented in the same file so you can see all the arguments in one place. Similarly, variations on a theme (like geom_path(), geom_line(), and geom_step()) are documented together.
- I’ve tried to reduce the use of ... so that you can see all the documentation in one place rather than having to follow links around. In some cases this has involved adding additional arguments to geoms to make it more clear what you can do.
- Thanks to Bob Rudis, the use of qplot() in examples has been greatly reduced. This is in line with the 2nd edition of the ggplot2 book, which eliminates qplot() in favour of ggplot().
Deprecated features
- The order aesthetic is officially deprecated. It never really worked, and was poorly documented.
- The stat and position arguments to qplot() have been deprecated. qplot() is designed for quick plots – if you need to specify position or stat, use ggplot() instead.
- The theme setting axis.ticks.margin has been deprecated: now use the margin property of axis.ticks.
- stat_abline(), stat_hline() and stat_vline() have been removed: these were never suitable for use other than with their corresponding geoms and were not documented.
- show_guide has been renamed to show.legend: this more accurately reflects what it does (controls appearance of layer in legend), and uses the same convention as other ggplot2 arguments (i.e. a . between names). (Yes, I know that’s inconsistent with function names (which use _) but it’s too late to change now.)
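For the show.legend rename in the list above, a minimal before-and-after sketch may help. The plot itself is arbitrary; the only point is where the argument goes.

```r
library(ggplot2)

# Previously: geom_smooth(..., show_guide = FALSE)
# Now the same request is spelled show.legend = FALSE.
p_legend <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, show.legend = FALSE)
```

Here the points still contribute to the colour legend while the smoothed lines are left out of it.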
A number of geoms have been renamed to be more consistent. The previous names will continue to work for the foreseeable future, but you should switch to the new names for new work.
- stat_binhex() and stat_bin2d() have been renamed to stat_bin_hex() and stat_bin_2d(). stat_summary2d() has been renamed to stat_summary_2d(), and geom_density2d()/stat_density2d() has been renamed to geom_density_2d()/stat_density_2d().
- stat_spoke() is now geom_spoke() since I realised it’s a reparameterisation of geom_segment().
- stat_bindot() has been removed because it’s so tightly coupled to geom_dotplot(). If you happened to use stat_bindot(), just change to geom_dotplot().
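A short sketch of the new spellings in use, built on the faithful dataset that ships with R. The choice of data and of bins = 20 is purely illustrative.

```r
library(ggplot2)

# New name for stat_bin2d(): bin the points into a 20 x 20 grid.
p_bin2d <- ggplot(faithful, aes(eruptions, waiting)) +
  stat_bin_2d(bins = 20)

# New name for geom_density2d(): overlay 2d density contours.
p_density <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_point() +
  geom_density_2d()
```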
All defunct functions have been removed.
I’m pleased to announce a new package for producing SVGs from R: svglite. This package is a fork of Matthieu Decorde’s RSvgDevice and wouldn’t be possible without his hard work. I’d also like to thank David Gohel who wrote the gdtools package: it solves all the hardest problems associated with making good SVGs from R.
Today, most browsers have good support for SVG and it is a great way of displaying vector graphics on the web. Unfortunately, R’s built-in svg() device is focussed on high quality rendering, not size or speed. It renders text as individual polygons: this ensures a graphic will look exactly the same regardless of what fonts you have installed, but makes output considerably larger (and harder to edit in other tools). svglite produces hand-optimised SVG that is as small as possible.
Features
svglite is a complete graphics device: that means you can give it any graphic and it will look the same as the equivalent .pdf or .png. Please file an issue if you discover a plot that doesn’t look right.
Use
In an interactive session, you use it like any other R graphics device:
svglite::svglite("myfile.svg")
plot(runif(10), runif(10))
dev.off()
If you want to use it in knitr, just set your chunk options as follows:
```{r setup, include = FALSE}
library(svglite)
knitr::opts_chunk$set(
  dev = "svglite",
  fig.ext = ".svg"
)
```
(Thanks to Bob Rudis for the tip)
There are also a few helper functions:
- htmlSVG() makes it easy to preview the SVG in RStudio.
- editSVG() opens the SVG file in your default SVG editor.
- xmlSVG() returns the SVG as an xml2 object.
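As a small sketch of the helpers listed above, xmlSVG() can be used to inspect what the device emits without writing a file. This assumes the xml2 package is installed; the plot and the 5 x 5 inch size are arbitrary.

```r
library(svglite)

# Capture a plot as an xml2 document instead of writing a file.
svg_doc <- xmlSVG(plot(1:10), width = 5, height = 5)

# Look up the <text> elements in the output: axis labels should appear
# as text nodes rather than as outlined shapes.
text_nodes <- xml2::xml_find_all(svg_doc, ".//text")
length(text_nodes)
```

Having the SVG as an xml2 object makes it straightforward to query or check the output programmatically.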
Are you ready to upgrade your R skills? Register soon to secure your seat.
On January 28 and 29, 2016, Hadley Wickham will teach his popular Master R Developer Workshop at the Westin San Francisco Airport. The workshop is offered only 3 times a year and the San Francisco class is already nearly 50% full. This is the only Master R Developer Workshop Hadley is planning for the US West Coast in 2016.
We look forward to seeing you there!
The RStudio IDE is bursting with capabilities and features. Do you know how to use them all? Tomorrow, we begin an “RStudio Essentials” webinar series. This will be the perfect way to learn how to use the IDE to its fullest. The series is split into six sections, each held on a Wednesday at 11 a.m. EDT:
- Programming Part 1 (Writing code in RStudio) – December 2nd
- Programming Part 2 (Debugging code in RStudio) – December 9th
- Programming Part 3 (Package Writing in RStudio) – December 16th
- Managing Change Part 1 (Projects in RStudio) – January 6th
- Managing Change Part 2 (Github and RStudio) – January 20th
- Managing Change Part 3 (Package version with Packrat) – February 3rd
Each webinar will be 30 minutes long, which will make them easy to attend. If you miss a live webinar or want to review them, recorded versions will be available to registrants. Register here.
p.s. Don’t forget that you can watch many useful past webinars at our webinars archive.