dplyr 1.2.0 comes with much faster and more memory-efficient `if_else()` and `case_when()` functions!
dplyr 1.2.0 fills in some important gaps in dplyr's API: we've added a new complement to `filter()` focused on dropping rows, and we've expanded the `case_when()` family with three new recoding and replacing functions!
This release deprecates `str_like(ignore_case)` and changes the behaviour of `str_replace_all()` for function replacements. It also introduces `str_ilike()` for case-insensitive SQL-like pattern matching, three new case conversion functions (`str_to_camel()`, `str_to_snake()`, and `str_to_kebab()`), and preserves names in all relevant functions
This release tightens up the package by removing long-deprecated functions, making `map_chr()` and predicate functions more type-safe, and requiring a newer version of carrier to make `in_parallel()` use easier. It also includes performance improvements to `every()`, `some()`, and `none()`, as well as a new getting started vignette
httr2 1.2.0 improves security for redacted headers, improves URL parsing and building, enhances debugging, and includes a bunch of other quality of life improvements
The functional programming toolkit for R gains new capabilities for parallel processing and distributed computing using mirai
duckplyr 1.1.0 is on CRAN! A drop-in replacement for dplyr, powered by DuckDB for speed. It is the most dplyr-like of dplyr backends
Choosing a data validation library for Polars? We compare Pandera, Patito, Pointblank, Validoopsie, and Dataframely
Want a free Pointblank workshop for your data team? Here's how to set one up
Pointblank's revamped user guide: spiral learning, clearer examples, and full API coverage
Automate responses to bad data with Pointblank's Actions and FinalActions
Get started with Pointblank for data validation using Polars, Pandas, or DuckDB
We are thrilled to announce Air, a new R formatter
nanoparquet 0.4.0 comes with a new and much faster `read_parquet()`, configurable type mappings in `write_parquet()`, and a new `append_parquet()`
httr2 1.1.0 introduces powerful new streaming capabilities with `req_perform_connection()`, as well as comprehensive URL manipulation tools, improved AWS support, and a bunch of bug fixes
Nanoparquet is a new R package that can read and write (flat) Parquet files. This post covers its features and limitations
Highlights of the most recent updates to `sparklyr` and friends
dbplyr 2.5.0 brings improved syntax for referring to tables nested in schemas and catalogs along with a bunch of minor SQL generation improvements
This release fixes a bunch of annoyances and catches up with innovations in DBI and dbplyr
httr2 is the successor to httr, providing a pipeable interface to generate HTTP requests and handle the responses. It's focussed on the needs of an R user wrapping a modern web API, but is flexible enough to handle just about any HTTP related task
dbplyr 2.4.0 brings improvements to SQL generation, better control over the generated SQL, some new translations, and a bunch of backend specific improvements
gmailr 2.0.0 streamlines the auth process and makes it easier to use gmailr in a cloud or deployed context
dplyr 1.1.1 is on CRAN! This patch release includes a number of performance regression fixes along with refinements to the multiple match join warnings that result in warnings being thrown much less often
Now including lubridate!
dtplyr brings initial support for dplyr 1.1.0 features, new translations, and a breaking change
This final post contains a grab-bag of new features, including: `pick()` for column selection inside of data-masking functions, `reframe()` as the new home for `summarise()`'s multi-row behavior, and major performance improvements to `arrange()`
All of the dplyr vector functions, like `between()` and `case_when()`, are now powered by vctrs. We've also added two powerful new helpers: `case_match()` and `consecutive_id()`
dplyr now supports an experimental per-operation grouping syntax. This serves as an alternative to `group_by()` and always returns an ungrouped data frame, meaning that you never need to remember to `ungroup()`
In dplyr 1.1.0, joins have been greatly reworked, including a new way to specify join columns, support for inequality, rolling, and overlap joins, and two new quality control arguments
There are no major new features in this version of forcats, but the 1.0.0 label now clearly advertises that this is a stable member of the tidyverse
tidyr 1.3.0 brings a new family of string separating functions, along with improvements to `unnest_longer()`, `unnest_wider()`, `pivot_longer()`, and `nest()`
dbplyr 2.3.0 brings improvements to SQL generation, improved error messages, a handful of new translations, and a bunch of backend specific improvements
purrr 1.0.0 brings a basket of updates. We deprecated a number of seldom-used functions to home in on the core purpose of purrr and implemented a swath of new features including progress bars, improved error reporting, and much much more!
It's been a long three years but a new version of stringr is now on CRAN! This release includes a bunch of small but useful new functions and some increased consistency with the rest of the tidyverse
dplyr 1.1.0 is coming soon! This post introduces some of the exciting new features coming in 1.1.0, and includes a call-for-feedback as we finalize the release
tidyselect 1.2.0 hit CRAN last week and includes a few updates to the syntax of selections in tidyverse functions
This release brings improvements to SQL translation, a new way of getting local data into the database, and support for dplyr's family of row modification functions
haven 2.5.0 adds support for custom character widths, creates FDA-compliant XPT files, and can use Stata's `strL` variable type
readxl 1.4.0 is a maintenance release with practically no user-facing changes, but extensive changes to package internals
tidyr 1.2.0 includes a bunch of new features and bug fixes, particularly for pivoting, rectangling, and grid specific tools
dtplyr 1.2.0 adds three new authors, a bunch of tidyr translations, new join translations, and many minor translation improvements
readr 2.1.0 is now on CRAN. This post explains the change that makes reading eager rather than lazy by default
archive 1.1.2 is now on CRAN! archive lets you work with file archives, such as ZIP, tar, 7-Zip and RAR and compression formats like gzip, bzip2, XZ and Zstandard
bigrquery 1.4.0 fixes a bug in `bq_table_download()`
Version 1.0.0 marks the graduation of googlesheets4 from experimental to stable
This major release of readr includes a new multi-threaded parsing engine powered by vroom and a number of user interface improvements
googledrive 2.0.0 adapts to Drive's pivot from Team Drives to shared drives and its shift to a "single parent" model of file organization
gargle has seen a lot of development over the past two years and five releases: cache relocation, credential rolling, a new auth method, an improved user interface, better verbosity control, and retries
This version provides much improved `labelled_spss()` support, improved date-time handling, the latest ReadStat, and a bunch of other small improvements
Introducing clock, a new package for working with date-times
The latest version of rvest brings new tools for extracting text, a radically improved `html_table()`, and a bunch of interface changes to better align rvest with the rest of the tidyverse
We've recently released a bunch of improvements to dplyr backends. multidplyr, which allows you to spread work across multiple cores, is now on CRAN. dtplyr adds translations for dplyr 1.0.0 and fixes many bugs. dbplyr 2.1.0 adds translations for many tidyr verbs, gains an author, and has improved `across()` translations
Minor release with major performance improvements for `across()` and two new functions `if_any()` and `if_all()`
Arthur Steinmetz, former Chairman, CEO, and President of OppenheimerFunds, uses R and the tidymodels package to explore the relationship between COVID-19 cases and mortality in the US
A new version of corrr features noteworthy improvements
This article, the first of three, describes how to use a code-oriented data science approach to Google Analytics data from a blog. It creates custom views of raw GA data while hiding the complexity of the Google Analytics data and interface
A new version of the magrittr package brings laziness, better performance, and leaner backtraces for debugging errors
dbplyr 2.0.0 adds missing features from dplyr 1.0.0, numerous improvements to SQL translation (including new Amazon Redshift and SAP HANA backends), and an improved system for extending dbplyr to work with other databases
furrr 0.2.0 is now on CRAN!
The newest release of readr brings improved argument consistency, better messages and more flexible output options
The newest release of broom features many new tidier methods, bug fixes, and improvements to internal consistency
Sparklyr 1.3 is now available, featuring integration of Spark higher-order functions, and data import/export in Avro and in user-defined serialization formats
A midterm report on a project to help people work with spreadsheets
haven now uses vctrs, which means labelled classes will be preserved in tidyr and dplyr operations
dplyr 1.0.0 is now available from CRAN!
tidyr 1.1.0 includes a bunch of quality of life improvements, particularly for pivoting and rectangling
googlesheets4 0.2.0: create and edit Google Sheets from R, now a full replacement for the googlesheets package
sparklyr 1.2: foreach parallel backend, Databricks Connect support, and Spark 3.0 compatibility
Learn about two last-minute additions to dplyr 1.0.0: a chattier `summarise()` with more options for controlling grouping of output, and new row manipulation functions inspired by SQL
dplyr now makes heavy use of [vctrs](http://vctrs.r-lib.org/) behind the scenes. This brings with it greater consistency and (hopefully!) more useful error messages
`rowwise()` has been renewed and revamped to make it easier to perform operations row-by-row. This makes it much easier to solve problems that previously required `lapply()`, `map()`, or friends
A new `across()` function makes it much easier to apply the same operation to multiple columns. It supersedes the `_if()`, `_at()`, and `_all()` function variants
`select()` and `rename()` can now select by position, name, function of name, type, and any combination thereof. A new `relocate()` function makes it easy to change the position of columns
In `summarise()`, a single summary expression can now create both multiple rows and multiple columns. This significantly increases its power and flexibility
This post focusses on the idea of the "function lifecycle" which helps you understand where functions in dplyr are going. Particularly important is the idea of a "superseded" function. A superseded function is not going away, but we no longer recommend using it in new code
Announcing the release of forcats 0.5.0 on CRAN
slider 0.1.0 is now available on CRAN. It provides a family of general purpose sliding window functions
sparklyr 1.1: Delta Lake support, Spark 3.0 preview, and barrier execution for deep learning
vroom 1.1.0 is now on CRAN!
odbc 1.2.0 is now on CRAN. It includes improvements to dealing with schemas, an API for immediate execution, and a new parameter to control timezone outputs
tidyverse 1.3.0 is on CRAN, and has a paper in the Journal of Open Source Software! This should make it easier to cite tidyverse packages
A total rewrite of dtplyr is now available on CRAN; it performs computation lazily (like dbplyr), making it much more performant
A new version of haven makes it easy to read only parts of a file
tidyr 1.0.0: pivot_longer() and pivot_wider() replace spread() and gather()
gmailr v1.0.0 is on CRAN
googledrive v1.0.0 is on CRAN
gargle is now on CRAN
dplyr 0.8.3 is now on CRAN
dplyr 0.8.2 is now on CRAN
dplyr 0.8.1 is now on CRAN
What's new in forcats 0.4.0?
Introducing the vroom package, extremely fast data import in R
dbplyr 1.4.0: simpler SQL generation, better translations, and improved window functions
tibble 2.1.1 is on CRAN now! This article describes and motivates the latest minor release of the tibble package
sparklyr 1.0: Apache Arrow for faster data transfers, XGBoost models, broom integration, and TFRecords
stringr 1.4.0 is now on CRAN!
haven 2.1.0 is now on CRAN!
dplyr 0.8.0 is now on CRAN
bigrquery 1.1.0 is now on CRAN
purrr 0.3.0 is now on CRAN
Tibbles are a modern reimagining of the data frame, keeping what time has shown to be effective, and throwing out what is not, with nicer default output too! This article describes the latest major release and provides an outlook on further developments
dbplyr 1.3.0 is now on CRAN
haven 2.0.0 is now on CRAN
readr 1.3.1 is now on CRAN
readxl 1.2.0 is now on CRAN
httr 1.4.0 is now on CRAN!
What you need to know about upcoming changes in dplyr 0.8.0
The upcoming tibble 2.0.0 release has internal changes relevant to package developers who depend on tibble
sparklyr 0.9: Spark structured streams for real-time data processing and Kubernetes cluster support
broom 0.5.0 is on CRAN
haven 1.1.2 is now on CRAN
dplyr 0.7.5 has reached CRAN. This is mostly a bugfix release with two important new changes
sparklyr 0.8: production ML pipelines with mleap export and graph analysis with graphframes
pillar 1.2.2 is on CRAN now, losing that extra dot for whole numbers and those extra zeros when there's nothing more to see. If you still prefer the good old data frame output, this post shows how to get it back for tibbles too
readxl 1.1.0 is now on CRAN
RStudio partners with Ursa Labs to build a cross-language data science runtime powered by Apache Arrow
pillar 1.2.1 is on CRAN now, a minor update that tweaks the output of tibbles again
stringr 1.3.0 is now on CRAN
A new version of forcats is now on CRAN
tibble 1.4.2 is on CRAN now, a minor update adding options and improving performance
New version now on CRAN. It features new database backends, an enhanced copy_to(), and initial stringr support
Tibbles are a modern reimagining of the data frame, keeping what time has shown to be effective, and throwing out what is not, with nicer default output too! This article showcases the changes in the most recent version
A new version of the tidyverse package is now on CRAN
A new version of lubridate is on CRAN!
We have updated tidyselect to revert a behaviour introduced in tidyr 0.7.0
The first release of googledrive is now on CRAN. Operate on Google Drive files from R
The next installment of tidyr is finally on CRAN! This version brings tidy eval to a crucial component of the tidyverse workflow
A new version of purrr is on CRAN! It features a new family of generic mappers, a tool for tidy plucking of deep data structures, and many other features and fixes
sparklyr 0.6: distributed R with spark_apply() and external data source connections
haven 1.1.0: SAS transport files, cols_only for selective reading, and bug fixes
dbplyr 1.1.0: database backends now work directly with DBI connections and feature improved SQL translation
bigrquery 0.4.0: query Google BigQuery with DBI and dplyr backends, now with full datetime support
dplyr 0.7.0: tidy evaluation for programming with dplyr, new datasets, and improved encoding support
readxl 1.0.0: target specific cells for reading Excel files plus new logical and list column types
dplyr 0.6.0 preview: database changes with dbplyr, CJK encoding support, and tidy evaluation
Tidyverse package updates: forcats 0.2.0, readr 1.1.0, stringr 1.2.0, and tibble 1.3.0
xml2 1.1.1 adds tools for creating and modifying XML, improved list conversion, and XML validation support
sparklyr 0.5 extends dplyr with do() and n_distinct(), adds experimental Livy support for remote Spark connections
haven 1.0.0 reads and writes SAS, SPSS, and Stata files with improved missing value and date/time support
Preview of ggplot2 2.2.0 with subtitles, captions, facet rewrites, theme improvements, and better stacking
Introducing sparklyr: use dplyr syntax to manipulate Spark data and run distributed machine learning from R
The tidyverse package installs and loads core tidyverse packages (ggplot2, dplyr, tidyr, readr, purrr, tibble) in one command
lubridate 1.6.0 adds flexible period/duration parsing from strings and date rounding with unit multipliers
Introducing forcats: tools for working with factors including fct_recode(), fct_lump(), and fct_reorder()
tibble 1.2.0 adds add_column(), improves add_row() with position control, and renames frame_data() to tribble()
stringr 1.1.0 adds practice datasets (fruit, words, sentences), boundary() in more functions, and str_view() for regex
tidyr 0.6.0 adds drop_na() to remove rows containing missing values in selected or all columns
readr 1.0.0 improves column guessing with printed specs, adds better date/time parsers and low-level file readers
xml2 1.0.0 adds XML creation and modification, xml_find_first() for ragged data, and easier namespace handling
tibble 1.1 introduces tibble(), as_tibble(), and is_tibble() naming, plus safer column extraction with warnings
httr 1.2.0 adds RETRY() for unreliable APIs with exponential backoff, and fixes POST redirect issues
dplyr 0.5.0 adds coalesce(), if_else(), case_when(), recode(), and the summarise_all/at/if family of functions
tidyr 0.5.0 adds separate_rows() for delimiter-separated values, plus sep arguments for spread() and unnest()
Introducing Feather: a fast binary file format for data frames that works in both R and Python, built on Apache Arrow
Introducing the tibble package: modern data frames with better printing, stricter subsetting, and no string-to-factor conversion
ggplot2 2.1.0 fixes bugs from 2.0.0 with better histogram binning, consistent argument ordering, and alpha behavior
tidyr 0.4.0 introduces nested data frames with nest()/unnest() and complete() for making implicit missing values explicit
httr 1.0.0 switches from RCurl to curl for reliability; 1.1.0 improves error messages and OAuth support
purrr 0.2.0 adds type-stable map functions (map_lgl/int/dbl/chr), flatten variants, and safely() for error handling
readr 0.2.0 adds locales for international data (encodings, date formats, decimal marks), comment support, and CSV/TSV writers
Introducing purrr: functional programming tools for R with map functions, formula shortcuts for anonymous functions, and list manipulation
rvest 0.3.0 switches to xml2 for better performance and no memory leaks. html() becomes read_html(), html_tag() becomes html_name()
tidyr 0.3.0 adds fill() for carrying forward values, replace_na(), complete() for missing combinations, and unnest() for list columns
dplyr 0.4.3 fixes mutate() crashes, improves non-ASCII column support, shows column types when printing, and adds bind_rows(.id)
stringr 1.0.0 is now powered by stringi for faster performance and better unicode. Adds str_subset(), boundary(), and locale-aware sorting
xml2 wraps libxml2 for easy XML/HTML parsing in R: navigate trees, extract nodes with XPath, and handle namespaces
readxl reads both .xls and .xlsx files with no external dependencies. Fast, handles dates, and auto-drops blank columns
Introducing readr: read CSV, TSV, and fixed-width files 10x faster than base R, with automatic type detection and no stringsAsFactors
haven reads SAS, SPSS, and Stata files into R, including SAS7BDAT and Stata 13 formats, with labelled variable support
Epoch.com sponsors RMySQL development, enabling improved build systems and CRAN binaries for all platforms
RMySQL 0.10.0 adds CRAN binaries for Windows/Mac, transaction support, and DBI 0.3 compatibility
dplyr 0.4.0 adds full SQL-style joins, bind_rows/bind_cols, data_frame(), list-columns, and improved printing
httr 0.6.0 adds handle_reset(), streaming with write_stream(), custom HTTP verbs, and Google OAuth2 service accounts
tidyr 0.2.0 adds expand() for filling missing combinations, unnest() for list-columns, and better separate() control
magrittr 1.5 adds functional sequences (define functions with pipes), lambda syntax with braces, and %T>%, %$%, %<>% operators
Introducing rvest: scrape web data with CSS selectors or XPath, extract text/tables, and navigate sites with sessions
RSQLite 1.0.0 cleans up the API, adds initExtension() for useful functions, and improves transaction handling
dplyr 0.3 adds distinct(), slice(), rename(), transmute(), count(), data_frame(), flexible joins, and set operations
httr 0.5 adds write_disk() to save response bodies directly to disk, plus preliminary HTTP caching support
httr 0.4 adds quick start and API package vignettes, headers()/cookies() extractors, encode argument, and progress bars
Four new data packages: babynames, fueleconomy, nasaweather, and nycflights13—large datasets for learning data analysis
Introducing tidyr: reshape messy data with gather() (wide to long), separate() (split columns), and spread() (long to wide)
dplyr 0.2 imports %>% from magrittr, overhauls do() for list-columns, and adds sample_n(), summarise_each(), glimpse()
reshape2 1.4 gets a C++ melt() for 10x speedup. Also: Kevin Ushey joins RStudio
httr 0.3 overhauls OAuth with caching and improved authentication. Query web APIs easily from R
dplyr 0.1.3 fixes Rcpp compatibility and several bugs that caused R crashes
dplyr 0.1.2 improves select() with starts_with(), ends_with(), contains(), and named arguments for renaming
dplyr 0.1.1 fixes crash bugs, adds sort argument to tally(), and renames explain_tbl() to explain()
dplyr: a new package for fast data manipulation with a consistent API that works on local data frames and remote databases
ggplot2 0.9.3 fixes bugs and adds stat warnings; plyr 1.8 brings sequential summarise() and .paropts
httr 0.2: an easier way to work with web APIs, with HTTP verbs, OAuth 1.0/2.0, and automatic cookie handling
lubridate 1.2.0 is 50x faster at parsing dates, adds stamp() for custom formatting, and %m+% for safe month math