I’m planning to submit dplyr 0.6.0 to CRAN on May 11 (in four weeks’ time). In preparation, I’d like to announce that the release candidate, dplyr 0.5.0.9002, is now available. I would really appreciate it if you’d try it out and report any problems. This will ensure that the official release has as few bugs as possible.
Installation
Install the pre-release version with:
# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")
If you discover any problems, please file a minimal reprex on GitHub. You can roll back to the released version with:
install.packages("dplyr")
Features
dplyr 0.6.0 is a major release including over 100 bug fixes and improvements. There are three big changes that I want to touch on here:
- Databases
- Improved encoding support (particularly for CJK on Windows)
- Tidyeval, a new framework for programming with dplyr
You can see a complete list of changes in the draft release notes.
Databases
Almost all database-related code has been moved out of dplyr and into a new package, dbplyr. This makes dplyr simpler, and will make it easier to release fixes for bugs that only affect databases.
To install the development version of dbplyr so you can try it out, run:
devtools::install_github("hadley/dbplyr")
There’s one major change, as well as a whole heap of bug fixes and minor improvements: it is no longer necessary to create a remote “src”. Instead, you can work directly with the database connection returned by DBI, reflecting the robustness of the DBI ecosystem. Thanks largely to the work of Kirill Müller (funded by the R Consortium), DBI backends are now much more consistent, comprehensive, and easier to use. That means there’s no longer a need for a layer between you and DBI.
You can continue to use src_mysql(), src_postgres(), and src_sqlite() (which still live in dplyr), but I recommend a new style that makes the connection to DBI more clear:
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "iris", iris)
#> [1] TRUE
iris2 <- tbl(con, "iris")
iris2
#> Source: table<iris> [?? x 5]
#> Database: sqlite 3.11.1 [:memory:]
#>
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with more rows
This is particularly useful if you want to perform non-SELECT queries, as you can do whatever you want with DBI::dbGetQuery() and DBI::dbExecute().
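A minimal sketch of that workflow, assuming the DBI and RSQLite packages are installed: the same kind of in-memory connection used with tbl() above also accepts plain DBI calls, so a DELETE can go through dbExecute() (which returns the number of affected rows) and an ad hoc SELECT through dbGetQuery() (which returns a data frame). The query text and variable names are illustrative only.

```r
library(DBI)

# An in-memory SQLite database holding a copy of iris
# (dbWriteTable() stores the Species factor as text).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)

# Non-SELECT statement: dbExecute() returns the number of rows affected.
n_deleted <- dbExecute(con, "DELETE FROM iris WHERE Species = 'setosa'")

# Ordinary query: dbGetQuery() returns the result as a data frame.
counts <- dbGetQuery(
  con,
  "SELECT Species, COUNT(*) AS n FROM iris GROUP BY Species"
)

dbDisconnect(con)
```

Here n_deleted should be 50 (the setosa rows), and counts should list only versicolor and virginica.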
If you’ve implemented a database backend for dplyr, please read the backend news to see what’s changed from your perspective (not much). If you want to ensure your package works with both the current and previous versions of dplyr, see wrap_dbplyr_obj() for helpers.
Character encoding
We have done a lot of work to ensure that dplyr works with encodings other than Latin1 on Windows. This is most likely to affect you if you work with data that contains Chinese, Japanese, or Korean (CJK) characters. dplyr should now just work with such data.
Tidyeval
dplyr has a new approach to non-standard evaluation (NSE) called tidyeval. Tidyeval is described in detail in a new vignette about programming with dplyr but, in brief, it gives you the ability to interpolate values in contexts where dplyr usually works with expressions:
library(dplyr)
my_var <- quo(homeworld)
starwars %>%
group_by(!!my_var) %>%
summarise_at(vars(height:mass), mean, na.rm = TRUE)
#> # A tibble: 49 × 3
#> homeworld height mass
#> <chr> <dbl> <dbl>
#> 1 Alderaan 176.3333 64.0
#> 2 Aleen Minor 79.0000 15.0
#> 3 Bespin 175.0000 79.0
#> 4 Bestine IV 180.0000 110.0
#> 5 Cato Neimoidia 191.0000 90.0
#> 6 Cerea 198.0000 82.0
#> 7 Champala 196.0000 NaN
#> 8 Chandrila 150.0000 NaN
#> 9 Concord Dawn 183.0000 79.0
#> 10 Corellia 175.0000 78.5
#> # ... with 39 more rows
This will make it much easier to eliminate copy-and-pasted dplyr code by extracting repeated code into a function.
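As an illustration, the grouped summary above could be wrapped in a reusable function. This is a sketch, and the function name grouped_mean() is hypothetical: enquo() captures the bare column name the caller passes (together with the caller’s environment), and !! splices it into group_by(), mirroring how quo() and !! are used at the top level.

```r
library(dplyr)

# Hypothetical helper: group `df` by any column and average height and mass.
grouped_mean <- function(df, group_var) {
  # Capture the caller's unevaluated argument as a quosure.
  group_var <- enquo(group_var)
  df %>%
    group_by(!!group_var) %>%
    summarise_at(vars(height:mass), mean, na.rm = TRUE)
}

# The same code now works for any grouping column, with no copy-and-paste.
by_homeworld <- grouped_mean(starwars, homeworld)
by_species <- grouped_mean(starwars, species)
```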
This also means that the underscored versions of the main verbs (filter_(), select_(), etc.) are no longer needed, so these functions have been deprecated (but remain around for backward compatibility).
17 comments
April 13, 2017 at 9:13 pm
Phil
Will tidyeval eventually be applied to other packages in the tidyverse family?
April 14, 2017 at 8:59 am
hadleywickham
Yes!
April 13, 2017 at 11:25 pm
wreckord
Awesome. Thank you.
April 14, 2017 at 2:49 am
Joe
I’m concerned that the `:=` operator will conflict with the long-established `:=` function in data.table. Are there conflicts when both data.table and dplyr 0.6.0 are loaded? How will this affect dtplyr?
April 14, 2017 at 8:59 am
hadleywickham
It shouldn’t cause any problems, as far as I know
April 14, 2017 at 12:06 pm
Joe
Oh great! Excited to start working with the new tidyeval approach
April 14, 2017 at 2:57 am
Joe
Also, what’s best practice for using dplyr 0.6.0 in packages? If I recall correctly, you previously needed to use the SE versions (e.g. `select_()`) of dplyr functions to pass CRAN checks without any warnings.
April 14, 2017 at 9:02 am
hadleywickham
I don’t think that R CMD check notes will be a problem with the new system. If needed, you can use `.data$x`, as described in the programming with dplyr vignette.
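For readers wondering what the `.data$x` style looks like in practice, here is a sketch of a package-style function. The function name bmi_table() is hypothetical. Referring to columns through the `.data` pronoun marks `height` and `mass` as columns of the data rather than free variables, which is what lets R CMD check skip its "no visible binding for global variable" note.

```r
library(dplyr)

# Hypothetical package function: columns are referenced via .data$...
bmi_table <- function(df) {
  df %>%
    filter(!is.na(.data$height), !is.na(.data$mass)) %>%
    mutate(bmi = .data$mass / (.data$height / 100)^2)
}

res <- bmi_table(starwars)
```

Inside a real package, `.data` would also need to be imported (for example from dplyr or rlang) in the package NAMESPACE.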
April 14, 2017 at 5:23 am
Steven S.
Nice. There’s one thing that bugs me though: could you use ‘quote’ instead of ‘quo’ ? That seems much nicer and cleaner to me.
April 14, 2017 at 9:03 am
hadleywickham
As described in the vignette, quote() only captures the expression, not the environment where it should be evaluated. That makes it impossible to correctly compute the expression.
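A small sketch of the difference (the helper make_exprs() and the variable threshold are hypothetical): both calls below record `height > threshold`, but only the quosure remembers that `threshold` lives inside the function that created it.

```r
library(dplyr)

make_exprs <- function() {
  threshold <- 200  # local variable, visible only inside this function
  list(
    expr = quote(height > threshold),  # bare expression, no environment
    quosure = quo(height > threshold)  # expression plus this environment
  )
}
e <- make_exprs()

# The quosure carries its environment, so `threshold` is found.
tall <- filter(starwars, !!e$quosure)

# The bare expression does not: unless some `threshold` happens to exist
# where filter() is called, evaluation fails with an error.
res <- tryCatch(filter(starwars, !!e$expr), error = function(err) "error")
```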
April 14, 2017 at 8:02 am
EL-AD DAVID AMIR
Excited to see the next version of tidyr! I’ve been looking at the new quotation infrastructure and I think it’s trying to achieve a goal similar to replyr::let — may I ask why you have taken this approach over replyr’s more straightforward alias system?
April 14, 2017 at 9:05 am
hadleywickham
tidyeval is a much richer system than replyr::let. It will take more time to understand quasiquotation, but it’s a deeper theory, so gives you greater powers once you have mastered it.
April 14, 2017 at 9:28 am
El-ad David Amir
Thank you!
April 17, 2017 at 10:30 pm
Earl Brown
Is non-equi join available in dplyr? Did I miss it in dplyr 0.5.0? https://groups.google.com/forum/#!searchin/manipulatr/earl$20brown%7Csort:relevance/manipulatr/GTXH2B21O9Q/5UVDeOmrBGMJ
April 17, 2017 at 10:32 pm
ekbrown77
Did non-equi join make it into dplyr 0.5.0? If not, will it make it into 0.6.0? See conversation here: https://groups.google.com/forum/#!searchin/manipulatr/earl$20brown%7Csort:relevance/manipulatr/GTXH2B21O9Q/5UVDeOmrBGMJ
April 18, 2017 at 7:52 am
hadleywickham
No, and no, sorry.
April 22, 2017 at 10:53 am
Programming over R – Win-Vector Blog
[…] 0.6 is introducing a new execution system (alternately called rlang or tidyeval, see here) which uses a notation more like the following (but fewer parenthesis, and with the ability to […]