Data Lakehouse Engineering with Apache Iceberg

Posted By: lucky_aut

Published 5/2025
Duration: 3h 31m | MP4 1280x720, 30 fps | AAC, 44100 Hz, 2ch | 1.29 GB
Genre: eLearning | Language: English

Design scalable, versioned, and ACID-compliant data lakehouse solutions using Apache Iceberg from the ground up.

What you'll learn
- Gain a deep understanding of Apache Iceberg’s architecture, its role in the modern data lakehouse ecosystem, and why it outperforms traditional table formats like Hive.
- Learn how to create, manage, and query Iceberg tables using Python (PyIceberg), SQL interfaces, and metadata catalogs — with practical examples from real-world use cases.
- Build high-performance batch and streaming data pipelines by integrating Iceberg with leading engines like Apache Spark, Apache Flink, Trino, and DuckDB.
- Explore how to use cloud-native storage with AWS S3, and design scalable Iceberg tables that support large-scale, distributed analytics.
- Apply performance tuning techniques such as file compaction, partition pruning, and metadata caching to optimize query speed and reduce compute costs.
- Work with modern Python analytics tools like Polars and DuckDB for fast in-memory processing, enabling rapid exploration, testing, and data validation workflows.

Requirements
- Basic knowledge of Python, SQL, and data concepts is helpful, but no prior experience with Apache Iceberg or cloud tools is required.

Description
Welcome to Data Lakehouse Engineering with Apache Iceberg: From Basics to Best Practices – your complete guide to mastering the next generation of open table formats for analytics at scale.

As the data world moves beyond traditional data lakes and expensive warehouses, Apache Iceberg is rapidly becoming the cornerstone of modern data architecture. Built for petabyte-scale datasets, Iceberg brings ACID transactions, schema evolution, time travel, partition pruning, and compatibility across multiple engines – all in an open, vendor-agnostic format.

In this hands-on course, you'll go far beyond the basics. You'll build real-world data lakehouse pipelines using powerful tools like:

- PyIceberg – programmatic access to Iceberg tables in Python
- Polars – lightning-fast DataFrame library for in-memory transformations
- DuckDB – local SQL powerhouse for interactive development
- Apache Spark and Apache Flink – for large-scale batch and streaming processing
- Trino – query Iceberg with federated SQL
- AWS S3 – cloud-native object storage for Iceberg tables
- And many more: SQL, Parquet, Glue, Athena, and modern open-source utilities

What Makes This Course Special?

Hands-on & Tool-rich: Not just Spark! Learn to use Iceberg with modern engines like Polars, DuckDB, Flink, and Trino.

Cloud-Ready Architecture: Learn how to store and manage your Iceberg tables on AWS S3, enabling scalable and cost-effective deployments.

Concepts + Practical Projects: Understand table formats, catalog management, and schema evolution, then apply them using real datasets.

Open-source Focused: No vendor lock-in. You’ll build interoperable pipelines using open, community-driven tools.

What You’ll Learn:

The why and how of Apache Iceberg and its role in the data lakehouse ecosystem

Designing Iceberg tables with schema evolution, partitioning, and metadata management

How to query and manipulate Iceberg tables using Python (PyIceberg), SQL, and Spark

Real-world integration with Trino, Flink, DuckDB, and Polars

Using S3 object storage for cloud-native Iceberg tables

Performing time travel, incremental reads, and snapshot-based rollbacks

Optimizing performance with file compaction, statistics, and clustering

Building reproducible, scalable, and maintainable data pipelines

Who Is This Course For?

Data Engineers and Architects building modern lakehouse systems

Python Developers working with large-scale datasets and analytics

Cloud Professionals using AWS S3 for data lakes

Analysts or Engineers moving from Hive, Delta Lake, or traditional warehouses

Anyone passionate about data engineering, analytics, and open-source innovation

Tools & Technologies You’ll Use:

Apache Iceberg, PyIceberg, Spark, Flink, Trino

DuckDB, Polars, Pandas, SQL, AWS S3, Parquet

Integration with Metastore/Catalogs (REST, Glue)

Hands-on with Jupyter Notebooks, CLI, and script-based workflows
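As one possible wiring for the REST and Glue catalogs, PyIceberg can read catalog settings from a `.pyiceberg.yaml` file. The sketch below shows both side by side; the catalog names, URI, and region are placeholders, not endpoints from the course:

```yaml
catalog:
  rest_demo:
    type: rest
    uri: http://localhost:8181   # placeholder REST catalog endpoint
  glue_demo:
    type: glue
    glue.region: us-east-1       # example AWS region
```

With this in place, `load_catalog("rest_demo")` or `load_catalog("glue_demo")` picks up the matching entry without any hardcoded configuration in code.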

By the end of this course, you'll be able to design, deploy, and scale data lakehouse solutions using Apache Iceberg and a rich ecosystem of open-source tools – confidently and efficiently.

Who this course is for:
- This course is for data professionals and beginners who want to build scalable, modern data lakehouse solutions using Apache Iceberg and open-source tools.