Data Lakehouse Engineering with Apache Iceberg
Published 5/2025
Duration: 3h 31m | .MP4 1280x720, 30 fps | AAC, 44100 Hz, 2ch | 1.29 GB
Genre: eLearning | Language: English
Design scalable, versioned, and ACID-compliant data lakehouse solutions using Apache Iceberg from the ground up.
What you'll learn
- Gain a deep understanding of Apache Iceberg’s architecture, its role in the modern data lakehouse ecosystem, and why it outperforms traditional table formats like Hive.
- Learn how to create, manage, and query Iceberg tables using Python (PyIceberg), SQL interfaces, and metadata catalogs, with practical examples from real-world scenarios (see the PyIceberg sketch after this list).
- Build high-performance batch and streaming data pipelines by integrating Iceberg with leading engines like Apache Spark, Apache Flink, Trino, and DuckDB.
- Explore how to use cloud-native storage with AWS S3, and design scalable Iceberg tables that support large-scale, distributed analytics.
- Apply performance tuning techniques such as file compaction, partition pruning, and metadata caching to optimize query speed and reduce compute costs.
- Work with modern Python analytics tools like Polars and DuckDB for fast in-memory processing, enabling rapid exploration, testing, and data validation workflows.
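As a preview of the PyIceberg workflow taught in the course, here is a minimal sketch of loading and querying a table. The catalog name "default", the table identifier "analytics.events", and the row filter are illustrative assumptions, not values from the course.

    from pyiceberg.catalog import load_catalog

    # Load a configured catalog (connection details come from
    # ~/.pyiceberg.yaml or PYICEBERG_* environment variables).
    catalog = load_catalog("default")

    # "analytics.events" is a hypothetical namespace.table identifier.
    table = catalog.load_table("analytics.events")

    # Scan with a row filter and materialize the result as a pandas DataFrame.
    df = table.scan(row_filter="event_date >= '2025-01-01'").to_pandas()
    print(df.head())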
Requirements
- Basic knowledge of Python, SQL, and data concepts is helpful, but no prior experience with Apache Iceberg or cloud tools is required.
Description
Welcome to Data Lakehouse Engineering with Apache Iceberg: From Basics to Best Practices, your complete guide to mastering the next generation of open table formats for analytics at scale.
As the data world moves beyond traditional data lakes and expensive warehouses, Apache Iceberg is rapidly becoming the cornerstone of modern data architecture. Built for petabyte-scale datasets, Iceberg brings ACID transactions, schema evolution, time travel, partition pruning, and compatibility across multiple engines, all in an open, vendor-agnostic format.
In this hands-on course, you'll go far beyond the basics. You'll build real-world data lakehouse pipelines using powerful tools like:
- PyIceberg – programmatic access to Iceberg tables in Python
- Polars – lightning-fast DataFrame library for in-memory transformations
- DuckDB – local SQL powerhouse for interactive development
- Apache Spark and Apache Flink – large-scale batch and streaming processing
- Trino – query Iceberg with federated SQL
- AWS S3 – cloud-native object storage for Iceberg tables
- And many more: SQL, Parquet, Glue, Athena, and modern open-source utilities
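For a taste of the DuckDB and Polars integrations above, here is a small hedged sketch; installing and loading DuckDB's iceberg extension is standard, while the S3 paths below are placeholder assumptions.

    import duckdb
    import polars as pl

    # DuckDB: query an Iceberg table in place via the iceberg extension.
    con = duckdb.connect()
    con.execute("INSTALL iceberg")
    con.execute("LOAD iceberg")
    # 's3://my-bucket/warehouse/events' is a placeholder table location.
    con.sql("SELECT count(*) FROM iceberg_scan('s3://my-bucket/warehouse/events')").show()

    # Polars: lazily scan an Iceberg table from its metadata file (placeholder path).
    lazy = pl.scan_iceberg("s3://my-bucket/warehouse/events/metadata/v3.metadata.json")
    print(lazy.filter(pl.col("event_type") == "click").collect())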
What Makes This Course Special?
Hands-on & Tool-rich: Not just Spark! Learn to use Iceberg with modern engines like Polars, DuckDB, Flink, and Trino.
Cloud-Ready Architecture: Learn how to store and manage your Iceberg tables on AWS S3, enabling scalable and cost-effective deployments.
Concepts + Practical Projects: Understand table formats, catalog management, and schema evolution, then apply them using real datasets (a schema evolution sketch follows this list).
Open-source Focused: No vendor lock-in. You’ll build interoperable pipelines using open, community-driven tools.
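Schema evolution, mentioned above, is exposed in PyIceberg as a transaction-style API. A minimal sketch, assuming a hypothetical "analytics.events" table and a new "referrer" column:

    from pyiceberg.catalog import load_catalog
    from pyiceberg.types import StringType

    # Hypothetical catalog and table names.
    table = load_catalog("default").load_table("analytics.events")

    # Add a new optional column; Iceberg evolves the schema in metadata only,
    # without rewriting any existing data files.
    with table.update_schema() as update:
        update.add_column("referrer", StringType())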
What You’ll Learn:
The why and how of Apache Iceberg and its role in the data lakehouse ecosystem
Designing Iceberg tables with schema evolution, partitioning, and metadata management
How to query and manipulate Iceberg tables using Python (PyIceberg), SQL, and Spark
Real-world integration with Trino, Flink, DuckDB, and Polars
Using S3 object storage for cloud-native Iceberg tables
Performing time travel, incremental reads, and snapshot-based rollbacks (see the Spark SQL sketch after this list)
Optimizing performance with file compaction, statistics, and clustering
Building reproducible, scalable, and maintainable data pipelines
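To make time travel, rollbacks, and compaction concrete, here is a hedged Spark SQL sketch. It assumes a SparkSession named spark with an Iceberg catalog named demo and a table db.events; the snapshot id is a placeholder. rollback_to_snapshot and rewrite_data_files are standard Iceberg Spark procedures.

    # Time travel: read the table as of a timestamp or a specific snapshot.
    spark.sql("SELECT * FROM demo.db.events TIMESTAMP AS OF '2025-05-01 00:00:00'").show()
    spark.sql("SELECT * FROM demo.db.events VERSION AS OF 123456789").show()  # placeholder snapshot id

    # Roll the table back to an earlier snapshot.
    spark.sql("CALL demo.system.rollback_to_snapshot('db.events', 123456789)")

    # Compact small data files to speed up subsequent reads.
    spark.sql("CALL demo.system.rewrite_data_files(table => 'db.events')")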
Who Is This Course For?
Data Engineers and Architects building modern lakehouse systems
Python Developers working with large-scale datasets and analytics
Cloud Professionals using AWS S3 for data lakes
Analysts or Engineers moving from Hive, Delta Lake, or traditional warehouses
Anyone passionate about data engineering, analytics, and open-source innovation
Tools & Technologies You’ll Use:
Apache Iceberg, PyIceberg, Spark, Flink, Trino
DuckDB, Polars, Pandas, SQL, AWS S3, Parquet
Integration with Metastore/Catalogs (REST, Glue) (see the catalog configuration sketch after this list)
Hands-on with Jupyter Notebooks, CLI, and script-based workflows
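Catalog configuration is part of the hands-on work; as a small sketch (the URIs and warehouse path below are placeholders), connecting PyIceberg to a REST or AWS Glue catalog looks roughly like this:

    from pyiceberg.catalog import load_catalog

    # REST catalog: endpoint and warehouse are placeholder values.
    rest_catalog = load_catalog("rest", **{
        "type": "rest",
        "uri": "http://localhost:8181",
        "warehouse": "s3://my-bucket/warehouse",
    })

    # AWS Glue catalog: resolves credentials via the standard AWS chain.
    glue_catalog = load_catalog("glue", **{"type": "glue"})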
By the end of this course, you'll be able to design, deploy, and scale data lakehouse solutions using Apache Iceberg and a rich ecosystem of open-source tools, confidently and efficiently.
Who this course is for:
- This course is for data professionals and beginners who want to build scalable, modern data lakehouse solutions using Apache Iceberg and open-source tools.