You are currently browsing the monthly archive for February 2016.
RStudio is pleased to notify account holders of recent updates to shinyapps.io.
Note: Action is required if your shiny application URL includes internal.shinyapps.io
What’s New?
We have updated the authentication and invitation system to improve the user experience, security, and extensibility for anyone with private applications. You may have already noticed some changes to the authentication flow for your applications if you are a Standard or Professional account holder.
As a part of these changes, we have eliminated the IFRAME and the associated RStudio branding, except for customers using custom domains where the IFRAME is still required.
For customers on free plans, we will replace the RStudio branding bar with a softer, less intrusive branding overlay.
Possible Action Required
If, like most account holders, you have used the URL provided by shinyapps.io for your shiny applications, no action is needed. Your applications will simply benefit from the improvements.
If your shiny application URL begins with internal.shinyapps.io, you must change it.
To complete the update, we will shut down all internal.shinyapps.io URLs on March 2, 2016. If you have publicly linked your application to internal.shinyapps.io, or you have embedded applications on your website by directly referring to the internal.shinyapps.io URL, you MUST change your links to the URL shown in the shinyapps.io dashboard for your application.
While relatively few accounts are impacted and no action is required for most shinyapps.io users, if you have questions please contact shinyapps-support@rstudio.com.
Thank you all for your help and thanks for using shinyapps.io!
The RStudio shinyapps.io Team
We’re pleased to announce that a new release of RStudio (v0.99.878) is available for download now. Highlights of this release include:
- Support for registering custom RStudio Addins.
- R Markdown editing improvements including outline view and inline UI for chunk execution.
- Support for multiple source windows (tear editor tabs off main window).
- Pane zooming for working distraction free within a single pane.
- Editor and IDE keyboard shortcuts can now be customized.
- New Emacs keybindings mode for the source editor.
- Support for parameterized R Markdown reports.
- Various improvements to RStudio Server Pro including multiple concurrent R sessions, use of multiple R versions, and shared projects for collaboration.
There are lots of other small improvements across the product; check out the release notes for full details.
RStudio Addins
RStudio Addins provide a mechanism for executing custom R functions interactively from within the RStudio IDE—either through keyboard shortcuts, or through the Addins menu. Coupled with the rstudioapi package, users can now write R code to interact with and modify the contents of documents open in RStudio.
An addin can be as simple as a function that inserts a commonly used snippet of text, and as complex as a Shiny application that accepts input from the user and uses it to transform the contents of the active editor. The sky is the limit!
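For instance, a snippet-insertion addin can be little more than an exported R function. The sketch below is illustrative only (the function names are invented): a pure helper builds the text, and a thin wrapper calls rstudioapi::insertText() to place it at the cursor. In a real package, the wrapper would be registered via an inst/rstudio/addins.dcf file.

```r
# Build a section-header comment. This part is a pure function, so it
# works (and can be tested) outside RStudio too.
make_header_comment <- function(title, width = 60) {
  bar <- paste(rep("-", width), collapse = "")
  paste0("# ", bar, "\n# ", title, "\n# ", bar, "\n")
}

# Hypothetical addin entry point: inserts the header at the cursor.
# Only runs inside RStudio, where the rstudioapi package is available.
insert_header_addin <- function() {
  rstudioapi::insertText(make_header_comment("Section"))
}
```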
Here’s an example of an addin that enables interactive subsetting of a data frame with live preview:
This addin is implemented using a Shiny Gadget (see the source code for more details). RStudio Addins are distributed as R packages. Once you’ve installed an R package that contains addins, they’ll immediately become available within RStudio.
You can learn more about using and developing addins here: http://rstudio.github.io/rstudioaddins/.
R Markdown
We’ve made a number of improvements to R Markdown authoring. There’s now an optional outline view that enables quick navigation across larger documents:
We’ve also added inline UI to code chunks for running individual chunks, running all previous chunks, and specifying various commonly used knit options:
Multiple Source Windows
There are two ways to open a new source window:
- Pop out an editor: click the Show in New Window button in any source editor tab.
- Tear off a pane: drag a tab out of the main window and onto the desktop; a new source window will be opened where you dropped the tab.
You can have as many source windows open as you like. Each source window has its own set of tabs; these tabs are independent of the tabs in RStudio’s main source pane.
Customizable Keyboard Shortcuts
You can now customize keyboard shortcuts in RStudio — you can bind keys to execute RStudio application commands, editor commands, or even user-defined R functions.
Access the keyboard shortcuts by clicking Tools -> Modify Keyboard Shortcuts...:
This will present a dialog that enables remapping of all available editor commands (commands that affect the current document’s contents, or the current selection) and RStudio commands (commands whose actions are scoped beyond just the current editor).
Emacs Keybindings
We’ve introduced a new keybindings mode to go along with the default bindings and Vim bindings already supported. Emacs mode provides a base set of keybindings for navigation and selection, including:
- C-p, C-n, C-b and C-f to move the cursor up, down, left and right by characters
- M-b, M-f to move left and right by words
- C-a, C-e to navigate to the start or end of line
- C-k to ‘kill’ to end of line, and C-y to ‘yank’ the last kill
- C-s, C-r to initiate an Emacs-style incremental search (forward / reverse)
- C-Space to set/unset mark, and C-w to kill the marked region
There are some additional keybindings that Emacs Speaks Statistics (ESS) users might find familiar:
- C-c C-v displays help for the object under the cursor
- C-c C-n evaluates the current line / selection
- C-x b allows you to visit another file
- M-C-a moves the cursor to the beginning of the current function
- M-C-e moves to the end of the current function
- C-c C-f evaluates the current function
We’ve also introduced a number of keybindings that allow you to interact with the IDE as you might normally do in Emacs:
- C-x C-n to create a new document
- C-x C-f to find / open an existing document
- C-x C-s to save the current document
- C-x k to close the current file
RStudio Server Pro
We’ve introduced a number of significant enhancements to RStudio Server Pro in this release, including:
- The ability to open multiple concurrent R sessions. Multiple concurrent sessions are useful for running multiple analyses in parallel and for switching between different tasks.
- Flexible use of multiple R versions on the same server. This is useful when you have some analysts or projects that require older versions of R or R packages and some that require newer versions.
- Project sharing for easy collaboration within workgroups. When you share a project, RStudio Server securely grants other users access to the project, and when multiple users are active in the project at once, you can see each other's activity and work together in a shared editor.
See the updated RStudio Server Pro page for additional details, including a set of videos which demonstrate the new features.
Try it Out
RStudio v0.99.878 is available for download now. We hope you enjoy the new release and as always please let us know how it’s working and what else we can do to make the product better.
On May 19 and 20, 2016, Hadley Wickham will teach his two day Master R Developer Workshop in the centrally located European city of Amsterdam.
Are you ready to upgrade your R skills? Register soon to secure your seat.
For the convenience of those who may travel to the workshop, it will be held at the Hotel NH Amsterdam Schiphol Airport.
Hadley teaches a few workshops each year and this is the only one planned for Europe. They are very popular and hotel rooms are limited. Please register soon.
We look forward to seeing you in May!
We are pleased to announce that version 1.0.0 of the memoise package is now available on CRAN. Memoization stores the value of a function call and returns the cached result when the function is called again with the same arguments.
The following function computes Fibonacci numbers and illustrates the usefulness of memoization. Because the function definition is recursive, the intermediate results can be looked up rather than recalculated at each level of recursion, which reduces the runtime drastically. When the memoised function is called again with the same arguments, the final result is simply returned from the cache, so no measurable execution time is recorded.
fib <- function(n) {
  if (n < 2) {
    return(n)
  } else {
    return(fib(n - 1) + fib(n - 2))
  }
}
system.time(x <- fib(30))
#> user system elapsed
#> 4.454 0.010 4.472
library(memoise)
fib <- memoise(fib)
system.time(y <- fib(30))
#> user system elapsed
#> 0.004 0.000 0.004
system.time(z <- fib(30))
#> user system elapsed
#> 0 0 0
all.equal(x, y)
#> [1] TRUE
all.equal(x, z)
#> [1] TRUE
Memoization is also very useful for storing queries to external resources, such as network APIs and databases.
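As a sketch of that pattern, the expensive resource below is just simulated with Sys.sleep(); in real code the function body would issue the network request or database query:

```r
library(memoise)

# Simulated expensive lookup: stands in for an API call or database query.
slow_lookup <- function(id) {
  Sys.sleep(1)
  paste0("record-", id)
}

fast_lookup <- memoise(slow_lookup)

fast_lookup("42")  # first call takes ~1 second and caches the result
fast_lookup("42")  # repeated call returns instantly from the cache
```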
Improvements in this release make memoised functions much nicer to use interactively. Memoised functions now have a print method which outputs the original function definition rather than the memoization code.
mem_sum <- memoise(sum)
mem_sum
#> Memoised Function:
#> function (..., na.rm = FALSE) .Primitive("sum")
Memoised functions now forward their arguments from the original function rather than simply passing them with .... This allows autocompletion to work transparently for memoised functions and also fixes a bug related to non-constant default arguments. [1]
mem_scan <- memoise(scan)
args(mem_scan)
#> function (file = "", what = double(), nmax = -1L, n = -1L, sep = "",
#> quote = if (identical(sep, "\n")) "" else "'\"", dec = ".",
#> skip = 0L, nlines = 0L, na.strings = "NA", flush = FALSE,
#> fill = FALSE, strip.white = FALSE, quiet = FALSE, blank.lines.skip = TRUE,
#> multi.line = TRUE, comment.char = "", allowEscapes = FALSE,
#> fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
#> NULL
Memoisation can now depend on external variables aside from the function arguments. This feature can be used in a variety of ways, such as invalidating the memoisation when a new package is attached.
mem_f <- memoise(runif, ~search())
mem_f(2)
#> [1] 0.009113091 0.988083122
mem_f(2)
#> [1] 0.009113091 0.988083122
library(ggplot2)
mem_f(2)
#> [1] 0.89150566 0.01128355
You can also invalidate the memoisation after a given amount of time has elapsed. A timeout() helper function is provided to make this feature easier to use.
mem_f <- memoise(runif, ~timeout(10))
mem_f(2)
#> [1] 0.6935329 0.3584699
mem_f(2)
#> [1] 0.6935329 0.3584699
Sys.sleep(10)
mem_f(2)
#> [1] 0.2008418 0.4538413
Many thanks for this release go to Kirill Müller, who wrote the argument forwarding implementation and added comprehensive tests to the package. [2, 3]
See the release notes for a complete list of changes.
I’m pleased to announce tidyr 0.4.0. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data vignette. Install it with:
install.packages("tidyr")
There are two big features in this release: support for nested data frames, and improved tools for turning implicit missing values into explicit missing values. These are described in detail below. As well as these big features, all tidyr verbs now handle grouped_df objects created by dplyr, gather() makes a character key column (instead of a factor), and there are lots of other minor fixes and improvements. Please see the release notes for a complete list of changes.
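The change to gather() is easy to verify; in this small example the generated key column is character rather than factor:

```r
library(tidyr)

df <- data.frame(x = 1:2, a = c(10, 20), b = c(30, 40))
long <- gather(df, key, value, a, b)

class(long$key)
#> [1] "character"
```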
Nested data frames
nest() and unnest() have been overhauled to support a new way of structuring your data: the nested data frame. In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.
For example, take the gapminder dataset:
library(gapminder)
library(dplyr)
gapminder
#> Source: local data frame [1,704 x 6]
#>
#> country continent year lifeExp pop gdpPercap
#> (fctr) (fctr) (int) (dbl) (int) (dbl)
#> 1 Afghanistan Asia 1952 28.8 8425333 779
#> 2 Afghanistan Asia 1957 30.3 9240934 821
#> 3 Afghanistan Asia 1962 32.0 10267083 853
#> 4 Afghanistan Asia 1967 34.0 11537966 836
#> 5 Afghanistan Asia 1972 36.1 13079460 740
#> 6 Afghanistan Asia 1977 38.4 14880372 786
#> 7 Afghanistan Asia 1982 39.9 12881816 978
#> 8 Afghanistan Asia 1987 40.8 13867957 852
#> .. ... ... ... ... ... ...
We can plot the trend in life expectancy for each country:
library(ggplot2)
ggplot(gapminder, aes(year, lifeExp)) +
  geom_line(aes(group = country))
But it’s hard to see what’s going on because of all the overplotting. One interesting solution is to summarise each country with a linear model. To do that most naturally, you want one data frame for each country. nest() creates this structure:
by_country <- gapminder %>%
  group_by(continent, country) %>%
  nest()
by_country
#> Source: local data frame [142 x 3]
#>
#> continent country data
#> (fctr) (fctr) (list)
#> 1 Asia Afghanistan <tbl_df [12,4]>
#> 2 Europe Albania <tbl_df [12,4]>
#> 3 Africa Algeria <tbl_df [12,4]>
#> 4 Africa Angola <tbl_df [12,4]>
#> 5 Americas Argentina <tbl_df [12,4]>
#> 6 Oceania Australia <tbl_df [12,4]>
#> 7 Europe Austria <tbl_df [12,4]>
#> 8 Asia Bahrain <tbl_df [12,4]>
#> .. ... ... ...
The intriguing thing about this data frame is that it now contains one row per group, and to store the original data we have a new data column, a list of data frames. If we look at the first one, we can see that it contains the complete data for Afghanistan (sans grouping columns):
by_country$data[[1]]
#> Source: local data frame [12 x 4]
#>
#> year lifeExp pop gdpPercap
#> (int) (dbl) (int) (dbl)
#> 1 1952 43.1 9279525 2449
#> 2 1957 45.7 10270856 3014
#> 3 1962 48.3 11000948 2551
#> 4 1967 51.4 12760499 3247
#> 5 1972 54.5 14760787 4183
#> 6 1977 58.0 17152804 4910
#> 7 1982 61.4 20033753 5745
#> 8 1987 65.8 23254956 5681
#> .. ... ... ... ...
This form is natural because there are other vectors where you’ll have one value per country. For example, we could fit a linear model to each country with purrr:
by_country <- by_country %>%
  mutate(model = purrr::map(data, ~ lm(lifeExp ~ year, data = .)))
by_country
#> Source: local data frame [142 x 4]
#>
#> continent country data model
#> (fctr) (fctr) (list) (list)
#> 1 Asia Afghanistan <tbl_df [12,4]> <S3:lm>
#> 2 Europe Albania <tbl_df [12,4]> <S3:lm>
#> 3 Africa Algeria <tbl_df [12,4]> <S3:lm>
#> 4 Africa Angola <tbl_df [12,4]> <S3:lm>
#> 5 Americas Argentina <tbl_df [12,4]> <S3:lm>
#> 6 Oceania Australia <tbl_df [12,4]> <S3:lm>
#> 7 Europe Austria <tbl_df [12,4]> <S3:lm>
#> 8 Asia Bahrain <tbl_df [12,4]> <S3:lm>
#> .. ... ... ... ...
Because we used mutate(), we get an extra column containing one linear model per country.
It might seem unnatural to store a list of linear models in a data frame. However, I think it is actually a really convenient and powerful strategy because it allows you to keep related vectors together. If you filter or arrange the vector of models, there’s no way for the other components to get out of sync.
nest() got us into this form; unnest() gets us out. You give it the list-columns that you want to unnest, and tidyr will automatically repeat the grouping columns. Unnesting data gets us back to the original form:
by_country %>% unnest(data)
#> Source: local data frame [1,704 x 6]
#>
#> continent country year lifeExp pop gdpPercap
#> (fctr) (fctr) (int) (dbl) (int) (dbl)
#> 1 Asia Afghanistan 1952 43.1 9279525 2449
#> 2 Asia Afghanistan 1957 45.7 10270856 3014
#> 3 Asia Afghanistan 1962 48.3 11000948 2551
#> 4 Asia Afghanistan 1967 51.4 12760499 3247
#> 5 Asia Afghanistan 1972 54.5 14760787 4183
#> 6 Asia Afghanistan 1977 58.0 17152804 4910
#> 7 Asia Afghanistan 1982 61.4 20033753 5745
#> 8 Asia Afghanistan 1987 65.8 23254956 5681
#> .. ... ... ... ... ... ...
When working with models, unnesting is particularly useful when you combine it with broom to extract model summaries:
# Extract model summaries:
by_country %>% unnest(model %>% purrr::map(broom::glance))
#> Source: local data frame [142 x 15]
#>
#> continent country data model r.squared
#> (fctr) (fctr) (list) (list) (dbl)
#> 1 Asia Afghanistan <tbl_df [12,4]> <S3:lm> 0.985
#> 2 Europe Albania <tbl_df [12,4]> <S3:lm> 0.888
#> 3 Africa Algeria <tbl_df [12,4]> <S3:lm> 0.967
#> 4 Africa Angola <tbl_df [12,4]> <S3:lm> 0.034
#> 5 Americas Argentina <tbl_df [12,4]> <S3:lm> 0.919
#> 6 Oceania Australia <tbl_df [12,4]> <S3:lm> 0.766
#> 7 Europe Austria <tbl_df [12,4]> <S3:lm> 0.680
#> 8 Asia Bahrain <tbl_df [12,4]> <S3:lm> 0.493
#> .. ... ... ... ... ...
#> Variables not shown: adj.r.squared (dbl), sigma (dbl),
#> statistic (dbl), p.value (dbl), df (int), logLik (dbl),
#> AIC (dbl), BIC (dbl), deviance (dbl), df.residual (int).
# Extract coefficients:
by_country %>% unnest(model %>% purrr::map(broom::tidy))
#> Source: local data frame [284 x 7]
#>
#> continent country term estimate std.error
#> (fctr) (fctr) (chr) (dbl) (dbl)
#> 1 Asia Afghanistan (Intercept) -1.07e+03 43.8022
#> 2 Asia Afghanistan year 5.69e-01 0.0221
#> 3 Europe Albania (Intercept) -3.77e+02 46.5834
#> 4 Europe Albania year 2.09e-01 0.0235
#> 5 Africa Algeria (Intercept) -6.13e+02 38.8918
#> 6 Africa Algeria year 3.34e-01 0.0196
#> 7 Africa Angola (Intercept) -6.55e+01 202.3625
#> 8 Africa Angola year 6.07e-02 0.1022
#> .. ... ... ... ... ...
#> Variables not shown: statistic (dbl), p.value (dbl).
# Extract residuals etc:
by_country %>% unnest(model %>% purrr::map(broom::augment))
#> Source: local data frame [1,704 x 11]
#>
#> continent country lifeExp year .fitted .se.fit
#> (fctr) (fctr) (dbl) (int) (dbl) (dbl)
#> 1 Asia Afghanistan 43.1 1952 43.4 0.718
#> 2 Asia Afghanistan 45.7 1957 46.2 0.627
#> 3 Asia Afghanistan 48.3 1962 49.1 0.544
#> 4 Asia Afghanistan 51.4 1967 51.9 0.472
#> 5 Asia Afghanistan 54.5 1972 54.8 0.416
#> 6 Asia Afghanistan 58.0 1977 57.6 0.386
#> 7 Asia Afghanistan 61.4 1982 60.5 0.386
#> 8 Asia Afghanistan 65.8 1987 63.3 0.416
#> .. ... ... ... ... ... ...
#> Variables not shown: .resid (dbl), .hat (dbl), .sigma
#> (dbl), .cooksd (dbl), .std.resid (dbl).
I think storing multiple models in a data frame is a powerful and convenient technique, and I plan to write more about it in the future.
Expanding
The complete() function allows you to turn implicit missing values into explicit missing values. For example, imagine you’ve collected some data on a yearly basis, but unfortunately some of your data has gone missing:
resources <- frame_data(
  ~year, ~metric, ~value,
  1999, "coal", 100,
  2001, "coal", 50,
  2001, "steel", 200
)
resources
#> Source: local data frame [3 x 3]
#>
#> year metric value
#> (dbl) (chr) (dbl)
#> 1 1999 coal 100
#> 2 2001 coal 50
#> 3 2001 steel 200
Here the value for steel in 1999 is implicitly missing: it’s simply absent from the data frame. We can use complete() to make this missing row explicit, adding that combination of the variables and inserting a placeholder NA:
resources %>% complete(year, metric)
#> Source: local data frame [4 x 3]
#>
#> year metric value
#> (dbl) (chr) (dbl)
#> 1 1999 coal 100
#> 2 1999 steel NA
#> 3 2001 coal 50
#> 4 2001 steel 200
With complete() you’re not limited to just the combinations that exist in the data. For example, here we know that there should be data for every year, so we can use the full_seq() function to generate every year over the range of the data:
resources %>% complete(year = full_seq(year, 1L), metric)
#> Source: local data frame [6 x 3]
#>
#> year metric value
#> (dbl) (chr) (dbl)
#> 1 1999 coal 100
#> 2 1999 steel NA
#> 3 2000 coal NA
#> 4 2000 steel NA
#> 5 2001 coal 50
#> 6 2001 steel 200
In other scenarios, you may not want to generate the full set of combinations. For example, imagine you have an experiment where each person is assigned one treatment. You don’t want to expand the combinations of person and treatment, but you do want to make sure every person has all replicates. You can use nesting() to prevent the full Cartesian product from being generated:
experiment <- data_frame(
  person = rep(c("Alex", "Robert", "Sam"), c(3, 2, 1)),
  trt = rep(c("a", "b", "a"), c(3, 2, 1)),
  rep = c(1, 2, 3, 1, 2, 1),
  measurment_1 = runif(6),
  measurment_2 = runif(6)
)
experiment
#> Source: local data frame [6 x 5]
#>
#> person trt rep measurment_1 measurment_2
#> (chr) (chr) (dbl) (dbl) (dbl)
#> 1 Alex a 1 0.7161 0.927
#> 2 Alex a 2 0.3231 0.942
#> 3 Alex a 3 0.4548 0.668
#> 4 Robert b 1 0.0356 0.667
#> 5 Robert b 2 0.5081 0.143
#> 6 Sam a 1 0.6917 0.753
experiment %>% complete(nesting(person, trt), rep)
#> Source: local data frame [9 x 5]
#>
#> person trt rep measurment_1 measurment_2
#> (chr) (chr) (dbl) (dbl) (dbl)
#> 1 Alex a 1 0.7161 0.927
#> 2 Alex a 2 0.3231 0.942
#> 3 Alex a 3 0.4548 0.668
#> 4 Robert b 1 0.0356 0.667
#> 5 Robert b 2 0.5081 0.143
#> 6 Robert b 3 NA NA
#> 7 Sam a 1 0.6917 0.753
#> 8 Sam a 2 NA NA
#> .. ... ... ... ... ...
httr 1.1.0 is now available on CRAN. The httr package makes it easy to talk to web APIs from R. Learn more in the quick start vignette.
Install the latest version with:
install.packages("httr")
When writing this blog post I discovered that I forgot to announce httr 1.0.0. This was a major release marking the transition from the RCurl package to the curl package, a modern binding to libcurl written by Jeroen Ooms. This makes httr more reliable, less likely to leak memory, and prevents the diabolical “easy handle already used in multi handle” error.
httr 1.1.0 includes a couple of new features:
- stop_for_status(), warn_for_status() and (new) message_for_status() replace the old message argument with a new task argument that optionally describes the current task. This allows API wrappers to provide more informative error messages on failure.
- http_error() replaces url_ok() and url_successful(). http_error() more clearly conveys intent and works with URLs, responses and status codes.
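You can try http_error() without any network access, since it accepts bare status codes as well as responses and URLs:

```r
library(httr)

http_error(404L)
#> [1] TRUE
http_error(200L)
#> [1] FALSE
```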
Otherwise, OAuth support continues to improve thanks to support from the community:
- Nathan Goulding added RSA-SHA1 signature support to oauth1.0_token(). He also fixed bugs in oauth_service_token() and improved the caching behaviour of refresh_oauth2.0(). This makes httr easier to use with Google’s service accounts.
- Graham Parsons added support for HTTP basic authentication to oauth2.0_token() with the use_basic_auth argument. This is now the default method used when retrieving a token.
- Daniel Lockau implemented user_params, which allows you to pass arbitrary additional parameters to the token access endpoint when acquiring or refreshing a token. This allows you to use httr with Microsoft Azure. He also wrote a demo so you can see exactly how this works.
To see the full list of changes, please read the release notes for 1.0.0 and 1.1.0.
Devtools 1.10.0 is now available on CRAN. Devtools makes package building so easy that a package can become your default way to organise code, data, documentation, and tests. You can learn more about creating your own package in R packages. Install devtools with:
install.packages("devtools")
This version is mostly a collection of bug fixes and minor improvements. For example:
- Devtools employs a new strategy for detecting Rtools on Windows: we now only check for Rtools if you need to load_all() or build() a package with compiled code. This should make life easier for most Windows users.
- Package installation received a lot of tweaks from the community. Devtools now makes use of the Additional_repositories field, which is useful if you’re using drat for non-CRAN packages. install_github() is now lazy and won’t reinstall if the currently installed version is the same as the one on GitHub. Local installs now add git and GitHub metadata, if available.
- use_news_md() adds a (very) basic NEWS.md template. CRAN now accepts NEWS.md files, so release() warns if you’ve previously added it to .Rbuildignore.
- use_mit_license() writes the necessary infrastructure to declare that your package is MIT licensed (in a CRAN-compliant way).
- check(cran = TRUE) automatically adds --run-donttest as this is a de facto CRAN standard.
To see the full list of changes, please read the release notes.