Join us for an exclusive in-person event on "Open table formats: Apache Iceberg, Delta, and Apache Hudi," hosted by e6data in collaboration with The Big Data Show. This meetup is designed for data engineers, data architects, and senior software engineers who are looking to optimise their data architecture, making it more price-performant while delivering the best user experience. In this edition, we will deep-dive into the internal architecture of open table formats: Apache Iceberg, Delta, and Apache Hudi. We aim to raise awareness of these open table formats and build a deeper understanding of them.

Lakehouse Days is designed to enable fellow data nerds to meet, network, and have insightful discussions on the entropic world of data.
Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow, old-school batch data processing with a powerful new incremental processing framework for low-latency, minute-level analytics. This talk will focus on the internal architecture of Apache Hudi and demonstrate its file layouts.
Time: 9:00 - 9:45 AM IST
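As a small taste of the file-layout internals this talk will cover: Hudi organizes data files into file groups, each holding versioned base files, and a snapshot query reads only the latest file slice per group. The sketch below follows Hudi's base-file naming convention (`<fileId>_<writeToken>_<instantTime>.parquet`), but the file names themselves are invented for illustration.

```python
# Hypothetical base-file names following Hudi's convention:
# <fileId>_<writeToken>_<instantTime>.parquet
files = [
    "f1_0-1-0_20240101093000.parquet",
    "f1_0-2-0_20240102093000.parquet",  # newer version in file group f1
    "f2_0-1-0_20240101093000.parquet",
]

def latest_file_slices(names):
    """Pick the newest base file per file group (highest instant time),
    which is what a snapshot query reads."""
    latest = {}
    for name in names:
        file_id, _, rest = name.partition("_")
        instant = rest.rsplit("_", 1)[1].removesuffix(".parquet")
        if file_id not in latest or instant > latest[file_id][0]:
            latest[file_id] = (instant, name)
    return {fid: name for fid, (_, name) in latest.items()}

print(latest_file_slices(files))
```

Older file slices remain on storage for incremental reads and time travel until cleaning removes them.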
Apache Iceberg is an open-source, high-performance format for huge analytic tables. It enables the use of SQL tables for big data while making it possible for engines like Spark, Trino, Flink, Presto, and e6data to safely work with the same tables at the same time. In this talk, we will discuss and demonstrate the data layer, the metadata layer, and metadata files, along with their subcomponents.
Time: 10:00 - 10:45 AM IST
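To give a flavor of the metadata layer this talk will walk through: an Iceberg table is described by a table-metadata JSON file whose current snapshot points, via a manifest list and manifests, at the data files in the data layer. The field names below follow the Iceberg table spec; the values are made up for illustration.

```python
import json

# A trimmed, hand-written example of an Iceberg table-metadata file
# (vN.metadata.json). Field names follow the Iceberg table spec;
# values are illustrative only.
metadata = json.loads("""
{
  "format-version": 2,
  "table-uuid": "9c12d441-03fe-4693-9a96-a0705ddf69c1",
  "location": "s3://bucket/warehouse/db/events",
  "current-snapshot-id": 3055729675574597004,
  "snapshots": [
    {
      "snapshot-id": 3055729675574597004,
      "timestamp-ms": 1555100955770,
      "manifest-list": "s3://bucket/warehouse/db/events/metadata/snap-3055729675574597004.avro"
    }
  ]
}
""")

# Resolve the current snapshot; its manifest list is the entry point
# to the manifests, which in turn track the data files.
current = next(
    s for s in metadata["snapshots"]
    if s["snapshot-id"] == metadata["current-snapshot-id"]
)
print(current["manifest-list"])
```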
Delta Lake is an open-source storage framework that enables building a format-agnostic Lakehouse architecture. In this talk, we will dive deep into Delta's internal file layout and what makes it so performant.
Time: 11:00 - 11:45 AM IST
A curated panel of senior data architects and engineers from leading enterprises will share insights into the evolving landscape and emerging use cases centred around data lakehouse architecture, covering emerging players in data catalogs, open table formats, query engines, and more.
Time: 11:45 AM - 12:30 PM IST
Pick a heavy workload
Choose a common cross-industry "heavy" workload, or work with our solution architect team to identify your own.
Define your 360° interop
Define all points of interop with your stack, e.g. catalog, BI tool, etc. e6data is serverless-first and available on AWS/Azure.
Pick a success metric
Supported dimensions: Speed/Latency, TCO, Latency Under Load. Pick any linear combination of these three dimensions.
Pick a kick-off date
Assemble your team (data engineer, architect, DevOps), and go live within 10 business days of kickoff.
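The "linear combination" of success dimensions mentioned above can be sketched as a simple weighted score. The weights and the normalized improvement values below are hypothetical, purely to show the mechanics:

```python
# Hypothetical weights over the three supported dimensions (sum to 1).
weights = {"speed": 0.40, "tco": 0.35, "latency_under_load": 0.25}

# Hypothetical measured improvements, normalized per dimension
# (e.g. 6x faster queries, 55% lower TCO, 4x better latency under load).
improvements = {"speed": 6.0, "tco": 0.55, "latency_under_load": 4.0}

def score(weights, values):
    """Linear combination: sum of weight * measured value per dimension."""
    return sum(weights[d] * values[d] for d in weights)

print(score(weights, improvements))
```

Setting a weight to zero drops that dimension, so a single-metric POC is just the degenerate case of the same formula.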
We are universally interoperable and open-source friendly. We can integrate with any object store, table format, data catalog, governance tool, BI tool, and other data applications.
We use a usage-based pricing model based on vCPU consumption. Your billing is determined by the number of vCPUs used, ensuring you only pay for the compute power you actually consume.
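A minimal sketch of how a usage-based, vCPU-driven bill works; the $0.05/vCPU-hour rate here is an assumed placeholder, not an e6data price:

```python
# Illustrative usage-based billing: cost = vCPU-hours consumed * rate.
# The rate below is a made-up placeholder for demonstration.
def bill(vcpu_hours, rate_per_vcpu_hour=0.05):
    return vcpu_hours * rate_per_vcpu_hour

print(bill(10_000))  # cost for 10,000 vCPU-hours at the assumed rate
```

Because the bill tracks vCPU consumption directly, idle time with no queries running incurs no compute cost under this model.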
We support all common file formats, including Parquet, ORC, JSON, CSV, Avro, and others.
e6data promises 5 to 10 times faster query speeds at any concurrency, with over 50% lower total cost of ownership across workloads, compared to any compute engine in the market.
We support serverless and in-VPC deployment models.
We can integrate with your existing governance tool, and also have an in-house offering for data governance, access control, and security.