Top 10 data engineering mistakes

Scale
06/11/2018 - 11:00 to 11:40
Palais Atelier
long talk (40 min)
Beginner

Session abstract: 

A large fraction of big data projects fail to deliver a return on investment, or take years to do so. The reasons are typically a combination of project management, leadership, organisation, available competence, and technical failures. In this presentation, I will focus on the technical aspects and present the most common or costly data engineering mistakes that I have experienced while building scalable data processing technology over the last five years, along with advice on how to avoid them. The presentation includes war stories from large-scale production environments, some of which led to reprocessing petabytes of data or DDoSing critical services with a Hadoop cluster, and what we learnt from the incidents.

Video: 

Slide: