    Data Lakehouse Engineering with Apache Iceberg

    Posted By: lucky_aut

    Published 5/2025
    Duration: 3h 31m | MP4, 1280x720, 30 fps | AAC, 44.1 kHz, stereo | 1.29 GB
    Genre: eLearning | Language: English

    Design scalable, versioned, and ACID-compliant data lakehouse solutions using Apache Iceberg from the ground up.

    What you'll learn
    - Gain a deep understanding of Apache Iceberg’s architecture, its role in the modern data lakehouse ecosystem, and why it outperforms traditional table formats like Hive.
    - Learn how to create, manage, and query Iceberg tables using Python (PyIceberg), SQL interfaces, and metadata catalogs — with practical examples from real-world scenarios.
    - Build high-performance batch and streaming data pipelines by integrating Iceberg with leading engines like Apache Spark, Apache Flink, Trino, and DuckDB.
    - Explore how to use cloud-native storage with AWS S3, and design scalable Iceberg tables that support large-scale, distributed analytics.
    - Apply performance tuning techniques such as file compaction, partition pruning, and metadata caching to optimize query speed and reduce compute costs.
    - Work with modern Python analytics tools like Polars and DuckDB for fast in-memory processing, enabling rapid exploration, testing, and data validation workflows.

    Requirements
    - Basic knowledge of Python, SQL, and data concepts is helpful, but no prior experience with Apache Iceberg or cloud tools is required.

    Description
    Welcome to Data Lakehouse Engineering with Apache Iceberg: From Basics to Best Practices – your complete guide to mastering the next generation of open table formats for analytics at scale.

    As the data world moves beyond traditional data lakes and expensive warehouses, Apache Iceberg is rapidly becoming the cornerstone of modern data architecture. Built for petabyte-scale datasets, Iceberg brings ACID transactions, schema evolution, time travel, partition pruning, and compatibility across multiple engines, all in an open, vendor-agnostic format.

    In this hands-on course, you'll go far beyond the basics. You'll build real-world data lakehouse pipelines using powerful tools like:

    - PyIceberg – programmatic access to Iceberg tables in Python
    - Polars – lightning-fast DataFrame library for in-memory transformations
    - DuckDB – local SQL powerhouse for interactive development
    - Apache Spark and Apache Flink – for large-scale batch and streaming processing
    - Trino – query Iceberg with federated SQL
    - AWS S3 – cloud-native object storage for Iceberg tables
    - And many more: SQL, Parquet, Glue, Athena, and modern open-source utilities
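    As a small taste of the "local SQL powerhouse" role DuckDB plays in this stack, here is a minimal, self-contained sketch of interactive exploration. The events table and its columns are invented for illustration and stand in for an Iceberg-backed dataset; they are not taken from the course materials.

    ```python
    import duckdb

    # In-memory DuckDB connection: nothing to configure beyond `pip install duckdb`.
    con = duckdb.connect()

    # A hypothetical events table standing in for an Iceberg-backed dataset.
    con.sql("""
        CREATE TABLE events AS
        SELECT * FROM (VALUES
            (1, 'click',    DATE '2025-05-01'),
            (2, 'purchase', DATE '2025-05-02'),
            (3, 'click',    DATE '2025-05-02')
        ) AS t(event_id, event_type, event_date)
    """)

    # Interactive exploration: count events per type.
    counts = con.sql("""
        SELECT event_type, COUNT(*) AS n
        FROM events
        GROUP BY event_type
        ORDER BY event_type
    """).fetchall()
    print(counts)  # [('click', 2), ('purchase', 1)]
    ```

    The same workflow scales up naturally: once tables live in Iceberg, DuckDB can query the exported Arrow or Parquet data instead of an inline VALUES clause.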

    What Makes This Course Special?

    Hands-on & Tool-rich: Not just Spark! Learn to use Iceberg with modern engines like Polars, DuckDB, Flink, and Trino.

    Cloud-Ready Architecture: Learn how to store and manage your Iceberg tables on AWS S3, enabling scalable and cost-effective deployments.

    Concepts + Practical Projects: Understand table formats, catalog management, and schema evolution, then apply them using real datasets.

    Open-source Focused: No vendor lock-in. You’ll build interoperable pipelines using open, community-driven tools.

    What You’ll Learn:

    The why and how of Apache Iceberg and its role in the data lakehouse ecosystem

    Designing Iceberg tables with schema evolution, partitioning, and metadata management

    How to query and manipulate Iceberg tables using Python (PyIceberg), SQL, and Spark
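    A minimal sketch of what querying with PyIceberg looks like, assuming a catalog is already configured (for example in ~/.pyiceberg.yaml). The catalog name, table identifier, and filter below are placeholders, not course-specific values.

    ```python
    def scan_iceberg_table(catalog_name: str, table_ident: str, row_filter=None):
        """Read an Iceberg table into a PyArrow table via PyIceberg.

        Assumes a pre-configured catalog; names here are illustrative only.
        """
        # Lazy import so the function can be defined without pyiceberg installed.
        from pyiceberg.catalog import load_catalog

        catalog = load_catalog(catalog_name)          # resolve catalog by name
        table = catalog.load_table(table_ident)       # e.g. "analytics.events"
        scan = table.scan(row_filter=row_filter) if row_filter else table.scan()
        return scan.to_arrow()                        # materialize as PyArrow

    # Example call (requires a reachable, configured catalog):
    # arrow_tbl = scan_iceberg_table("default", "analytics.events",
    #                                "event_type = 'click'")
    ```

    PyIceberg pushes the row filter down into the scan, so partition pruning and file skipping happen before any data is read.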

    Real-world integration with Trino, Flink, DuckDB, and Polars

    Using S3 object storage for cloud-native Iceberg tables

    Performing time travel, incremental reads, and snapshot-based rollbacks
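    Time travel works because every Iceberg commit records a snapshot in table metadata; reading "as of" a point in time means scanning at an older snapshot ID. A hedged sketch of that pattern with PyIceberg (the `table` argument is assumed to be a pyiceberg Table; helper names are ours, not from the course):

    ```python
    def list_snapshot_ids(table):
        """Return all snapshot IDs recorded in the table's metadata history."""
        return [s.snapshot_id for s in table.snapshots()]

    def read_table_at_snapshot(table, snapshot_id=None):
        """Time travel: scan an Iceberg table as of a given snapshot.

        With snapshot_id=None the current snapshot is read.
        """
        scan = table.scan(snapshot_id=snapshot_id)
        return scan.to_arrow()
    ```

    Rolling back is the same idea in reverse: the engine sets the table's current snapshot pointer back to an earlier ID, without rewriting any data files.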

    Optimizing performance with file compaction, statistics, and clustering
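    For maintenance tasks like compaction, Iceberg ships Spark stored procedures such as `rewrite_data_files` and `expire_snapshots`. A sketch of invoking them from Python, assuming an active SparkSession with the Iceberg catalog configured; the catalog name "my_catalog" is a placeholder:

    ```python
    def compact_iceberg_table(spark, table_ident: str, catalog: str = "my_catalog"):
        """Compact small data files via Iceberg's rewrite_data_files procedure."""
        spark.sql(
            f"CALL {catalog}.system.rewrite_data_files(table => '{table_ident}')"
        )

    def expire_old_snapshots(spark, table_ident: str, catalog: str = "my_catalog"):
        """Expire old snapshots to trim metadata and free unreferenced files."""
        spark.sql(
            f"CALL {catalog}.system.expire_snapshots(table => '{table_ident}')"
        )
    ```

    Running compaction on a schedule keeps file counts low, so query planners scan fewer, larger files and metadata stays compact.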

    Building reproducible, scalable, and maintainable data pipelines

    Who Is This Course For?

    Data Engineers and Architects building modern lakehouse systems

    Python Developers working with large-scale datasets and analytics

    Cloud Professionals using AWS S3 for data lakes

    Analysts or Engineers moving from Hive, Delta Lake, or traditional warehouses

    Anyone passionate about data engineering, analytics, and open-source innovation

    Tools & Technologies You’ll Use:

    Apache Iceberg, PyIceberg, Spark, Flink, Trino

    DuckDB, Polars, Pandas, SQL, AWS S3, Parquet

    Integration with Metastore/Catalogs (REST, Glue)

    Hands-on with Jupyter Notebooks, CLI, and script-based workflows

    By the end of this course, you'll be able to design, deploy, and scale data lakehouse solutions using Apache Iceberg and a rich ecosystem of open-source tools, confidently and efficiently.

    Who this course is for:
    - This course is for data professionals and beginners who want to build scalable, modern data lakehouse solutions using Apache Iceberg and open-source tools.
