Deploying Large Spark Models to production and model scoring in near real time

Scale

06/12/2018 - 11:00 to 11:40

Maschinenhaus

long talk (40 min)

Intermediate

Session abstract:

How does one build a PySpark model and deploy it in a Scala pipeline with no code rewrite, resolving the long-running fight between data scientists who want to code in Python and data engineers who prefer the tried and tested type safety of the JVM?
How does one beat the Spark context latency and serve Spark models in milliseconds to meet near-real-time business needs?
How does one build an ML model, zip it up, and deploy it across platforms in a completely vendor-neutral way, i.e. build the model on AWS and deploy it on GCP, or vice versa?
How does one leverage the years of effort spent on software engineering and apply it directly to building data science pipelines, without reinventing the wheel or repeating the pain?
How does one build a completely GDPR-compliant machine learning model with an area under the ROC curve of 0.88?