When DataFrames fail, resort to mapPartitions

Scale

06/11/2018 - 17:20 to 18:00

Moon Lounge

long talk (40 min)

Intermediate

Session abstract:

DataFrames are an awesome interface for data manipulation in Spark, but when the complexity grows beyond the capabilities of Spark itself, you need to resort to "violence". In this talk I will walk through a project that became too complex to execute using the DataFrame API and had to be rewritten as custom code applied through the mapPartitions function. We will cover tips and tricks for reducing lineage complexity, share our process for analyzing pain points, and go into the details of mapPartitions, showing how to keep Spark's distributed processing capabilities and reliability while executing custom code.
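For readers unfamiliar with the pattern, here is a minimal sketch of what "custom code applied through mapPartitions" can look like in PySpark. It is an illustration only and is not taken from the talk: the `amount` column, the running-total logic, and the helper names `enrich_partition` and `run_with_spark` are all hypothetical. The per-partition function is a plain Python generator, so it can be exercised without a cluster.

```python
from typing import Dict, Iterable, Iterator


def enrich_partition(rows: Iterable[Dict[str, float]]) -> Iterator[Dict[str, float]]:
    """Hypothetical custom logic, called once per partition with an iterator of rows."""
    # Any expensive setup (opening a connection, loading a model) would go here,
    # so it is paid once per partition instead of once per row.
    running_total = 0.0
    for row in rows:
        running_total += row["amount"]  # "amount" is an assumed column name
        # Yield lazily so the whole partition is never materialized in memory.
        yield {**row, "running_total": running_total}


def run_with_spark(df):
    """Assumes `df` is a PySpark DataFrame that has a numeric `amount` column."""
    # Drop to the RDD API, turn each Row into a dict, then apply the custom
    # function to each partition. The result is an RDD of dicts.
    return df.rdd.map(lambda r: r.asDict()).mapPartitions(enrich_partition)


if __name__ == "__main__":
    # Local check without Spark: treat a plain list as a single partition.
    sample = [{"amount": 1.0}, {"amount": 2.5}]
    for out_row in enrich_partition(iter(sample)):
        print(out_row)
```

Note that the running total here restarts in every partition. State kept inside the function is local to one partition, which is the main design constraint to keep in mind when moving logic out of the DataFrame API.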

Video:

#bbuzz 2018: Matija Gobec – When DataFrames fail, resort to mapPartitions

Slides:

berlinbuzzwords2018-180620103826.pdf

Berlin Buzzwords
