Delta Lake Blogs
Efficient Delta Vacuum with File Inventory
by Arun Ravi M V (Grab)
Today, Delta Lake is rapidly making its mark as a highly popular hybrid data format, earning widespread adoption across various organizations.
Rivian expands the Delta Lake ecosystem with Delta-Go
by Chelsea Jones, Staff Data Engineer, Rivian; Rahul Madnawat, Software Engineer II, Rivian; Jason Shiverick, Director of AI Platforms, Rivian
Real-time data ingestion for high-volume transactions, now available in open source
Pros and cons of Hive-style partitioning
by Matthew Powers, Martin Bode
This post discusses the pros and cons of Hive-style partitioning.
Structured Spark Streaming with Delta Lake: A Comprehensive Guide
by Delta Lake
The webinar demonstrates how to embrace structured streaming seamlessly from data emission to your final Delta table destination.
High-Performance Querying on Massive Delta Lake Tables with Daft
by Clark Zinzow, Jay Chia
This post introduces the distributed + parallel Delta Lake reader in Daft.
Delta Lake - State of the Project - Part 2
by Tathagata "TD" Das, Susan Pierce, Carly Akerly
Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.
Delta Lake Announces Pandas Enhancement: Real Pandas to Optimize Data Lakehouse Performance
by Carly Akerly
The Delta Lake project is thrilled to announce its latest and most exciting collaboration with the Pandas community!
Delta Lake - State of the Project - Part 1
by Tathagata "TD" Das, Susan Pierce, Carly Akerly
Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.
Delta Lake 3.1.0
by Carly Akerly
This post describes the exciting features in the Delta Lake 3.1.0 release
Delta Lake replaceWhere
Selectively overriding rows or partitions of a Delta Lake table with replaceWhere.
Delta Lake Performance
by Joe Harris
This post explains why Delta Lake is fast and describes improvements to Delta Lake performance over time.
Writing a Kafka Stream to Delta Lake with Spark Structured Streaming
by Bo Gao, Matthew Powers
This blog post explains how to write a Kafka stream to a Delta table with Spark Structured Streaming.
Using Delta Lake with AWS Glue
by Keerthi Josyula, Matthew Powers
This post shows how to register Delta tables in the AWS Glue Data Catalog with the AWS Glue Crawler.
New features in the Python deltalake 0.12.0 release
by Ion Koutsouris
This post explains the new features in the Python deltalake 0.12.0 release
Delta Lake 3.0.0
by Carly Akerly
This post describes the exciting features in the Delta Lake 3.0.0 release
Delta Lake vs. Parquet Comparison
This post compares the strengths and weaknesses of Delta Lake vs Parquet.
Unlock Delta Lakes for PyTorch Training with DeltaTorch
by Daniel Liden, Michael Shtelma
This post demonstrates how to create PyTorch DataLoaders using Delta tables as data sources for training deep learning models.
Introducing Delta Lake Table Features
by Nick Karpov
This post introduces Delta Lake Table Features, a discrete feature-based compatibility scheme that replaces the traditional integer protocol versioning for Delta Lake tables and clients.
Delta Lake Change Data Feed (CDF)
by Nick Karpov, Matthew Powers
This blog shows how to enable and use the Delta Lake Change Data Feed.
Delta Lake’s transaction log protocol and its implementations
This blog explains the Delta Lake transaction log protocol and its various implementations.
Delta Lake Deletion Vectors
by Nick Karpov
This blog introduces the new Deletion Vectors table feature for Delta Lake tables, and explains how Deletion Vectors speed up operations that modify existing data in your lakehouse.
Using Ibis with PySpark on Delta Lake tables
by Marlene Mhangami, Matthew Powers
This post explains how to use Ibis to query Delta tables with PySpark
Delta Lake 2.3.0 Released
by Allison Portis, Matthew Powers
This post explains some of the key features in the Delta Lake 2.3.0 release
Open source self-hosted Delta Sharing server
by Shingo OKAWA
This post gives basic instructions for the Kotosiro Delta Sharing server
How Delta Lake uses metadata to make certain aggregations much faster
by Matthew Powers, Scott Sandre
This post explains Delta Lake performance optimizations that make some aggregations execute quicker
How to use Delta Lake generated columns
How to create Delta Lake tables with generated columns and the benefits of this feature
Introducing Support for Delta Lake Tables in AWS Lambda
by Nick Karpov
How to use deltalake in AWS Lambda with AWS SDK for pandas
How to create and append to Delta Lake tables with pandas
This post explains how to create and append to Delta Lake tables with pandas
Running ML Workflows with Delta Lake and Ray
by Jim Hibbard
This post explains how you can read Delta Lake with the Ray compute framework
How to Convert from CSV to Delta Lake
This post explains how to convert from a CSV data lake to Delta Lake, which offers much better features.
Getting started contributing to Delta Lake Spark
by Nick Karpov
This post explains the full development loop with the Delta Lake Spark connector. You'll learn how to retrieve and navigate the codebase, make changes, and package and debug custom builds.
New features in the Python deltalake 0.7.0 release of delta-rs
by Will Jones, Matthew Powers
This post explains the new features in the deltalake 0.7.0 release
Delta Lake Schema Evolution
This post shows how to enable schema evolution in Delta tables and when this is a good option.
Delta Lake Time Travel
This post shows how to time travel between different versions of a Delta table.
Delta Lake Small File Compaction with OPTIMIZE
This post shows how to compact small files in Delta tables with OPTIMIZE.
Adding and Deleting Partitions in Delta Lake tables
by Matthew Powers, Ryan Zhu
This post shows how to add partitions to and remove partitions from Delta Lake tables.
Remove old files with the Delta Lake Vacuum Command
by Matthew Powers, Nick Karpov
This blog post explains how to remove files marked for deletion from storage with the Delta Lake Vacuum command.
Reading Delta Lake Tables into Polars DataFrames
by Matthew Powers, Chitral Verma
This post shows how to read Delta Lake tables into Polars DataFrames.
Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR
by Vedant Jain, Denny Lee
In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.
Data Sharing across Government Agencies using Delta Sharing
by Li Yu, Mubashir Kazia, Jon D. Ceanfaglione, Prabha Rajendran, Purushotam Shrestha, Shawn A. Benjamin
This post shows how government agencies are sharing data with Delta Sharing.
How to Delete Rows from a Delta Lake Table
This post teaches you how to delete rows from a Delta Lake table and how the operation is implemented under the hood.
Delta Lake Constraints and Checks
This post shows how to add constraints to your Delta table to prevent certain types of values from being appended.
Delta Lake Schema Enforcement
This post teaches you about schema enforcement in Delta Lake and why it's better than what's offered by data lakes
Why PySpark append and overwrite write operations are safer in Delta Lake than Parquet tables
This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.
How to Create Delta Lake tables
This post shows you how to create Delta Lake tables with Python, SQL, and PySpark.
How to Version Your Data with pandas and Delta Lake
This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.
Sharing a Delta Table’s Change Data Feed with Delta Sharing 0.5.0
by Will Girten
We are excited to announce the release of Delta Sharing 0.5.0.
How to Rollback a Delta Lake Table to a Previous Version with Restore
This post shows you how to roll back Delta Lake tables to previous versions with restore.
Converting from Parquet to Delta Lake
This post shows how to convert a Parquet table to a Delta Lake table.
Why we migrated to a Data Lakehouse on Delta Lake for T-Mobile Data Science and Analytics Team
by Robert Thompson, Geoff Freeman
In this post, we will discuss how and why we migrated from databases and data lakes to a data lakehouse on Delta Lake. Our lakehouse architecture allows reading and writing of data without blocking and scales out linearly....
How to drop columns from a Delta Lake table
This post shows you two ways to drop columns from Delta Lake tables.
Apache Flink Source Connector for Delta Lake tables
by Krzysztof Chmielewski, Scott Sandre, Denny Lee
We are excited to announce the release of Delta Connectors 0.5.0, which introduces the new Flink/Delta Source Connector on Apache Flink™ 1.13 that can read directly from Delta tables using Flink’s DataStream API.
Delta 2.0 - The Foundation of your Data Lakehouse is Open
by Tathagata Das, Denny Lee
We are happy to announce the release of the Delta Lake 2.0 on Apache Spark™ 3.2! The significance of Delta Lake 2.0 is not just a number - though it is timed quite nicely with Delta Lake’s 3rd birthday....
Multi-cluster writes to Delta Lake Storage in S3
by Scott Sandre, Denny Lee, Mariusz Kryński (Samba TV)
While Delta Lake has supported concurrent reads from multiple clusters since its inception, there were limitations for multi-cluster writes specifically to Amazon S3. Note, this was not a limitation for Azure ADLSgen2 nor Google GCS, as S3 currently lacks...
Delta Lake 1.2 - More Speed, Efficiency and Extensibility Than Ever
by Venki Korukanti, Scott Sandre, Tathagata Das, Allison Portis, Denny Lee, Vini Jaiswal
Introducing performance optimizations that will supercharge your data pipelines at any scale.
Writing to Delta Lake from Apache Flink
by Fabian Paul, Pawel Kubit, Scott Sandre, Tathagata Das, Denny Lee
Learn more about how you can write from Apache Flink to Delta Lake.
Extending Delta Sharing to Google Cloud Storage
by Will Girten, Shixiong Zhu
Learn more about the latest release of the open-source project Delta Sharing and how it enables sharing on Google Cloud Storage, among other enhancements.
Delta Connectors 0.3.0 Released
by Allison Portis
We are excited to announce the release of Delta Connectors 0.3.0.
Delta Lake 1.1.0 Released
by Scott Sandre
We are excited to announce the release of Delta Lake 1.1.0.
Delta Sharing 0.3.0 Released
by Lin Zhou
We are excited to announce the release of Delta Sharing 0.3.0.
Power BI Delta Sharing Connector
by Denny Lee
We are excited about the recently announced preview of the Power BI Delta Sharing connector
Delta Lake User Survey (2021 H2)
by Denny Lee
We would like to invite you to provide your feedback on Delta Lake OSS.
Delta Lake 1.0.0 Released
by Tathagata Das
We are excited to announce the release of Delta Lake 1.0.0 on Apache Spark 3.1.
AMA: Growing the Delta Lake ecosystem
by Denny Lee
On March 11th, 2021 9:00 am PT, join us for this fun Delta Lake AMA session where we discuss with QP Hou, Christian Williams, and Alexander Kushnir from Scribd on growing the Delta Lake open-source ecosystem.
Salesforce Engineering: Delta Lake Tech Talk Series
by Denny Lee
We are happy to announce the Salesforce Engineering Delta Lake Tech Talk Series for March and April 2021.
Salesforce Engineering: Delta Lake Blog Series
by Denny Lee
Salesforce Engineering has published a series of blogs on how they use Delta Lake.
Salesforce Engineering: Global Synchronousness and Ordering in Delta Lake
by Denny Lee
At Salesforce, we maintain a platform to capture customer activity — various kinds of sales events such as emails, meetings, and videos. These events are either consumed by downstream products in real time or stored in our data lake, which...
Salesforce Engineering: Engagement Activity Delta Lake, Redshift Spectrum supports Delta Lake
by Denny Lee
We have a couple of exciting call outs this week!
Getting Started with Delta Lake
by Denny Lee
Want to learn more about Delta Lake? Check out this series of Delta Lake videos.
Delta Lake Sessions at Spark+AI Summit North America 2020
by Denny Lee
We're really excited for the numerous Delta Lake training and conference sessions that will be showcased throughout Spark+AI Summit NA 2020.
Delta Lake 0.7.0 Released
by Denny Lee
We are excited to announce the release of Delta Lake 0.7.0 on Apache Spark 3.0. This is the first release on Spark 3.x and adds support for metastore-defined tables and SQL DDLs.
Delta Lake 0.6.1 Released
by Denny Lee
We are excited to announce the release of Delta Lake 0.6.1, which fixes a few critical bugs in merge operation and operation metrics. If you are using version 0.6.0, it is strongly recommended that you upgrade to version 0.6.1.
Delta Lake 0.6.0 Released
by Denny Lee
We are excited to announce the release of Delta Lake 0.6.0, which introduces schema evolution and performance improvements in merge, and operation metrics in table history.
Delta Lake Newsletter: 2020-03-20 Edition
by Denny Lee
For this edition of the Delta Lake Newsletter, find out more about the latest and upcoming tech talks and videos.
Diving into Delta Lake Online Tech Talk Series
by Denny Lee
For our next series of Delta Lake online tech talks, we're excited to dive into the internals with our Diving into Delta Lake series. This will be a fun set of tech talks with live demos and Q&A. Check them...
Delta Lake Online Tech Talks
by Denny Lee
We’re excited to announce the next series of Delta Lake online tech talks over the next few weeks. This will be a fun set of tech talks with live demos and Q&A. Check them out!
Delta Lake 0.5.0 Released
by Denny Lee
We are excited to announce the release of Delta Lake 0.5.0, which introduces Presto/Athena support and improved concurrency.
Delta Lake Newsletter: 2019-10-03 Edition (incl. SAIS EU 2019 Sessions)
by Denny Lee
In this edition of the Delta Lake Newsletter, find out more about the latest and upcoming webinars, meetups, and publications. For this edition, we will also focus on the many sessions at Spark+AI Summit EU 2019 in Amsterdam.
Delta Lake 0.4.0 Released
by Denny Lee
We are excited to announce the release of Delta Lake 0.4.0 which introduces Python APIs for manipulating and managing data in Delta tables.
Delta Lake 0.3.0 Released
by Denny Lee
We are happy to announce the availability of Delta Lake 0.3.0! Features include: Scala/Java APIs for DML commands, Scala/Java APIs for query commit history, and Scala/Java APIs for vacuuming old files.
Delta Lake 0.2.0 Released
by Denny Lee
We are happy to announce the availability of Delta Lake 0.2.0! It brings support for cloud storage (e.g. Amazon S3 and Azure Blob Storage) and improved concurrency.
Delta Lake 0.1.0 Released
by Denny Lee
We are happy to announce the availability of Delta Lake 0.1.0! Initial version of the open source Delta Lake.