Apache Druid Data Modeling (2024)

Data modeling is the key to leveraging your Apache Druid® database. Learn how to ingest data into Druid data models that are fast and scalable.

About this course

The Apache Druid® database powers analytical applications for organisations large and small around the world. Under the hood, a highly optimised data format and a shared-nothing microservices architecture help deliver performance, resilience, and availability. On the surface, a fully fledged SQL dialect lets you build flexible, interactive analytics UI components for your applications using data ingested in real time and in batches.

Who is this course for?

This course guides you, as a relatively new Apache Druid® user, through the functionality and first principles of effective ingestion and data modelling.

  • Hear from experts on how and why to apply different techniques.
  • Get hands-on with links to Python notebooks on specific Druid database features.
  • Gain a certification that you can share on social media.

What's in the course?

The course has four units. The first covers the basics of Druid table schemas. The second focuses on where best to employ Druid's processing power. The third covers data layout in Druid and why it matters so much to query performance. Finally, the fourth walks through summarisation and approximation in Apache Druid, two key techniques that can help you boost query performance.

Each unit contains presentations from experts in Apache Druid, together with specific Python notebooks to try out.

You can return to the course as often as you like, and even retake the final certification exam to improve your score!

What do I need?

All parts of this course are optional: you can skip straight to the final exam if that's all you need!

To run the Python notebooks that walk you through features in Apache Druid, you need the learning environment. Visit the site and make sure you can get the learn-druid environment running.

Curriculum

  • Introduction
      • Course introduction
      • Welcome!
      • Set up your learning environment
  • Design a good schema

    With Apache Druid, you can create TABLEs for storing event data, and LOOKUPs for key-value pair data.

    Hear from experts in both areas as they describe what these two structures are and how to use them effectively.

      • Expert interview
      • Exercises
  • Put processing in the right place
      • Expert interview
      • Exercises: functions
      • Exercises: JOINs
      • Learn more: JSON-based ingestion
  • Optimize segment layout and location
      • Segments and infrastructure
      • Expert interview
      • Exercises: partitioning and clustering
      • Learn more
      • Learn more: tiering in action
  • Summarise and sketch
      • Expert interview
      • Exercises: summarised tables
      • Approximation
      • Exercises: approximation
      • Tables
      • Exercises: UNION ALL
      • Learn more
  • Exam
      • Join the community
      • Feedback questionnaire
      • Exam introduction
      • Exam
      • Next steps
