Join us for an exclusive in-person event on “Apache Iceberg: Basics, optimizations, features, streaming data, query execution” hosted by e6data in Hyderabad!
Lakehouse Days - Powered by AWS is designed specifically for data engineers, data architects, and senior software engineers who constantly seek to make their data architecture more price-performant while delivering the best user experience. In this edition, we will dive deep into the internal architecture of open table formats like Apache Iceberg; how Apache Kafka works; building a modern data platform that queries streaming and analytical data simultaneously on Iceberg; how Amazon S3 Tables delivers a fully managed Apache Iceberg experience to simplify large-scale analytics on Amazon S3; and how Arrow IPC accelerates streaming ingestion and query execution in Iceberg-based data lakes. Our aim is to raise awareness of these open table formats and build a deeper understanding of how they work.
Lakehouse Days - Powered by AWS is designed to enable fellow data geeks to meet, network, and have insightful discussions on the entropic world of data.
Topic: Streaming Data into a Lakehouse - Kafka Greets Iceberg
Summary: Join this session to learn how operational and analytical data estates are converging! Apache Kafka, the de facto standard for real-time streaming data, can now materialize events in a lakehouse (Iceberg/Delta Lake), and analytical queries can run on materialized Kafka topics. This session will start from the ground up on what Iceberg is, how Kafka works, and the community efforts bringing two of the most important frameworks, Apache Kafka and Apache Iceberg, closer together. The audience will learn how to build a modern data platform in which streaming and analytical data are simultaneously queried on Iceberg.
Time: 10:00 - 10:45 AM IST
Topic: Amazon S3 Tables: Scaling Apache Iceberg for High-Performance Analytics
Summary: Traditional data lakes provide immense scalability but often face performance, consistency, and interoperability challenges. In this session, David guides you through how Open Table Formats (OTFs) like Apache Iceberg are revolutionizing the way organizations store and process tabular data at scale. He’ll dive into Iceberg’s key features, its advantages over traditional approaches, and how Amazon S3 Tables, AWS’s latest innovation, delivers a fully managed Apache Iceberg experience to simplify large-scale analytics on Amazon S3. The audience will learn how S3 Tables enhances query performance, reduces operational overhead, and empowers businesses with seamless, high-performance analytics at scale.
Time: 11:00 - 11:45 AM IST
Topic: Fast Distributed Iceberg Writes and Queries with Apache Arrow IPC
Summary: In modern distributed analytical systems, efficient data movement and processing are critical for performance. Apache Arrow’s Inter-Process Communication (IPC) framework provides a high-performance, language-agnostic columnar format that eliminates serialization overhead and optimizes in-memory analytics. This talk explores how Arrow IPC enhances Apache Iceberg-based data lakes by accelerating streaming ingestion and query execution. Karthic will highlight Arrow IPC’s zero-copy data sharing and high-speed transport via Arrow Flight, which streamline data movement, and its vectorized computation capabilities, which align seamlessly with Iceberg’s columnar storage. Key applications include batching streaming data to mitigate the small-files problem during ingestion and optimizing data shuffling and result delivery during queries. Through practical examples, he will demonstrate how Arrow IPC unifies fast writes and queries, delivering efficiency and scalability to Iceberg data platforms.
Time: 12:00 - 12:45 PM IST
This is an exclusive and invite-only event. Please RSVP to reserve your spot through this link: https://lu.ma/ahuq2jqz?utm_source=website
Venue - Amazon Development Centre (HYD11), Nanakramguda
Date and time - Mar 8, 2025, from 10:00 AM to 2:00 PM
We are universally interoperable and open-source friendly. We can integrate with any object store, table format, data catalog, governance tool, BI tool, or other data application.
We use a usage-based pricing model based on vCPU consumption. Your billing is determined by the number of vCPUs used, ensuring you only pay for the compute power you actually consume.
We support all major file formats, including Parquet, ORC, Avro, JSON, and CSV.
e6data promises 5 to 10 times faster query speeds at any concurrency, with over 50% lower total cost of ownership across workloads compared to any compute engine in the market.
We support serverless and in-VPC deployment models.
We can integrate with your existing governance tools, and we also offer an in-house solution for data governance, access control, and security.