Highlights of the most recent updates to `sparklyr` and friends
sparklyr 1.3 is now available, featuring integration of Spark higher-order functions and data import/export in Avro and in user-defined serialization formats
sparklyr 1.2: foreach parallel backend, Databricks Connect support, and Spark 3.0 compatibility
sparklyr 1.1: Delta Lake support, Spark 3.0 preview, and barrier execution for deep learning
sparklyr 1.0: Apache Arrow for faster data transfers, XGBoost models, broom integration, and TFRecords
sparklyr 0.9: Spark structured streams for real-time data processing and Kubernetes cluster support
sparklyr 0.7: ML Pipelines API for building, tuning, and deploying machine learning workflows at scale
sparklyr 0.6: distributed R with spark_apply() and external data source connections
sparklyr 0.5 extends dplyr with do() and n_distinct(), and adds experimental Livy support for remote Spark connections
Guest tutorial: Get started with SparkR for distributed computing—install locally, run map/reduce operations, and deploy on AWS
This is a recording of an RStudio webinar. You can subscribe to receive invitations to future webinars at https://www.rstudio.com/resources/webinars/. We try to host a couple each month with the goal of furthering the R community's understanding of R and RStudio's capabilities. We are always interested in receiving feedback, so please don't hesitate to comment or reach out with a personal message.