Weekend Project: Real World AirBnB Data Science and Pricing Bot

Scale
06/13/2017 - 14:30 to 16:00
Palais Atelier
Workshop
Intermediate

Session abstract: 

For those getting started in AI, finding an interesting yet solvable problem is often one of the greatest challenges. A project needs a set of data or stimuli that is rich yet easy to collect, and a relevant problem that motivates the practitioner. If the project is to be shared with a diverse audience, it must not require extensive industry background knowledge. Data from Airbnb’s API is a “one-stop shop” for new practitioners to experiment with a very diverse set of techniques and methods from the AI/Machine Learning canon. This tutorial introduces the Airbnb API, shows how to collect data from it, and presents a collection of examples of how artificial intelligence tools and algorithms can be used to extract value from the data.

This tutorial is motivated by the speaker’s own experience: he has a spare apartment and a guest apartment in the Logan Square neighborhood of Chicago that he began renting on Airbnb. Trapped indoors throughout the savage Chicago winter, he spent his time building a bot that seeks to optimize the listing price and description for his listings. The success of this project has led the speaker to consider purchasing another property in nearby Avondale as an Airbnb investment property.

This tutorial will present the highlights of the project, including the following.

A Python script collects the listings available for each of the next 60 days.  Each listing has structured information such as the offered price, number of bedrooms, unit rating, host rating, etc. and unstructured information such as a title, description of the unit, and pictures.

The JSON data is loaded into SparkSQL tables using Apache Spark. Advanced SparkSQL querying techniques convert the semi-structured JSON records into structured tables for consumption by machine learning algorithms. Additional columns such as “rented units” and “price” are inferred from changes between daily snapshots: a unit that was “available” yesterday but is not listed today is treated as rented.
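The snapshot comparison described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than SparkSQL, and it rests on assumptions not stated in the abstract: the `snapshots` layout and the `infer_rentals` name are hypothetical, and the rental price is assumed to be the last offered price. A night that disappears may also have been blocked or delisted by the host, so the result is an estimate.

```python
from datetime import date, timedelta


def infer_rentals(snapshots):
    """Infer rentals by comparing each daily snapshot with the previous day's.

    snapshots maps a collection date to a dict of
    {(listing_id, stay_date): offered_price} seen on that date.
    This layout is hypothetical; it is not the Airbnb API's format.
    """
    rentals = []
    for collected_on in sorted(snapshots):
        previous = snapshots.get(collected_on - timedelta(days=1))
        if previous is None:
            continue  # no snapshot for the day before, nothing to compare
        current = snapshots[collected_on]
        for (listing_id, stay_date), price in previous.items():
            if stay_date < collected_on:
                continue  # the night has passed, so it would vanish anyway
            if (listing_id, stay_date) not in current:
                # Assumption: the unit was rented at its last offered price.
                rentals.append({
                    "listing_id": listing_id,
                    "stay_date": stay_date,
                    "price": price,
                    "inferred_on": collected_on,
                })
    return rentals


# Usage with made-up data: listing "A" vanishes between the two snapshots.
night = date(2017, 6, 10)
example = {
    date(2017, 6, 1): {("A", night): 100.0, ("B", night): 80.0},
    date(2017, 6, 2): {("B", night): 85.0},
}
inferred = infer_rentals(example)
```

In Spark the same idea would typically be expressed as an anti-join between yesterday's and today's tables; the loop here only shows the logic.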

Methods for creating demand curves will be presented, such as a mixed-effects model using R and then Apache Mahout, as well as random forests and multi-layer perceptrons using SparkML. The usefulness of single-node machine learning libraries such as Python’s sklearn is also demonstrated, along with the scale at which such methods become impractical, which motivates the use of “Big Data” tools.
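To make the idea of a demand curve concrete, here is a deliberately simple stand-in: a one-variable logistic regression of "was the night rented?" on price, fitted by gradient descent in plain Python. It is not the mixed-effects model, random forest, or multi-layer perceptron named above, and the function name, hyperparameters, and data are all made up for illustration.

```python
import math


def fit_logistic_demand(prices, rented, learning_rate=0.1, epochs=5000):
    """Fit P(rented | price) as a logistic curve and return it as a function.

    prices: offered nightly prices; rented: 1 if the night was rented, else 0.
    """
    n = len(prices)
    mean = sum(prices) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in prices) / n) or 1.0
    xs = [(p - mean) / std for p in prices]  # standardize for stable steps

    intercept, slope = 0.0, 0.0
    for _ in range(epochs):
        grad_intercept, grad_slope = 0.0, 0.0
        for x, y in zip(xs, rented):
            predicted = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            grad_intercept += predicted - y
            grad_slope += (predicted - y) * x
        intercept -= learning_rate * grad_intercept / n
        slope -= learning_rate * grad_slope / n

    def demand(price):
        z = intercept + slope * (price - mean) / std
        return 1.0 / (1.0 + math.exp(-z))

    return demand


# Made-up observations: cheaper nights were rented more often.
observed_prices = [80, 90, 100, 110, 120, 130, 140, 150]
observed_rented = [1, 1, 1, 0, 1, 0, 0, 0]
demand_curve = fit_logistic_demand(observed_prices, observed_rented)
```

The returned `demand_curve(price)` gives an estimated probability of renting at that price. A realistic model would add covariates such as day of week, bedrooms, and ratings, which is where the methods listed above come in.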

Text analysis is performed on listing titles and descriptions, along with image feature recognition on listing images, using the “fixed effects” of previous models as the dependent variable. A simple tool is created to help users “score” the quality of their own listing descriptions. We will also present a toy application that uses images and facts about the apartment to algorithmically generate “optimal” listing descriptions using a long short-term memory neural network.

Finally, a bot finds an optimal schedule of prices for the next 60 days for all of our listings, and uses the Airbnb API to update our listing prices accordingly. We will explore various methods of creating such a bot, and how to incorporate machine learning findings.
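One simple way to turn a demand curve into a price schedule, offered here as a sketch rather than as the speaker's method, is to pick for each day the candidate price that maximizes expected revenue, that is, price times the probability of renting at that price. The helper names and the linear demand curves below are hypothetical, and the step that pushes prices back through the Airbnb API is omitted.

```python
def best_price(demand, candidate_prices):
    """Return the candidate price with the highest expected revenue.

    demand(price) is assumed to return the probability of renting at price.
    """
    return max(candidate_prices, key=lambda price: price * demand(price))


def price_schedule(demand_by_day, candidate_prices):
    """Choose a price independently for each day from its own demand curve."""
    return {
        day: best_price(demand, candidate_prices)
        for day, demand in demand_by_day.items()
    }


def linear_demand(ceiling):
    """Toy demand curve: probability falls linearly to zero at `ceiling`."""
    return lambda price: max(0.0, 1.0 - price / ceiling)


# Made-up example: weekend guests tolerate higher prices than weekday guests.
candidates = list(range(50, 301, 10))
schedule = price_schedule(
    {"weekday": linear_demand(200), "weekend": linear_demand(300)},
    candidates,
)
```

Treating each day independently ignores effects such as multi-night stays and the time remaining before the date, so this is only a baseline for the bot designs discussed in the session.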

The delayed reward problem is particularly applicable here because there are N days until a listing can possibly be rented. This is also a good example of the exploration-exploitation trade-off, as the bot must weigh lowering the price to increase the probability of renting the unit against raising the price to increase revenue. Finally, since each day in the future will only happen once and there are a limited number of days until the rental event happens, this is also a good fit for the generalization problem.
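The exploration-exploitation trade-off can be illustrated with an epsilon-greedy bandit over a handful of candidate prices. This is a generic textbook technique swapped in for illustration, not necessarily what the speaker's bot does, and it ignores the delayed-reward aspect by assuming each pricing decision gets immediate feedback. The class name and the simulated rental probabilities are made up.

```python
import random


class EpsilonGreedyPricer:
    """Epsilon-greedy bandit: each candidate price is one arm."""

    def __init__(self, prices, epsilon=0.1, rng=None):
        self.prices = list(prices)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {price: 0 for price in self.prices}
        self.mean_revenue = {price: 0.0 for price in self.prices}

    def best_price(self):
        """Exploit: the price with the highest observed average revenue."""
        return max(self.prices, key=lambda price: self.mean_revenue[price])

    def choose(self):
        """Explore a random price with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.prices)
        return self.best_price()

    def update(self, price, rented):
        """Record the outcome of offering `price` (revenue is 0 if unrented)."""
        revenue = float(price) if rented else 0.0
        self.pulls[price] += 1
        # Incremental update of the running mean revenue for this price.
        self.mean_revenue[price] += (
            revenue - self.mean_revenue[price]
        ) / self.pulls[price]


def simulate(bot, rent_probability, rounds, rng):
    """Run the bot against made-up rental probabilities per price."""
    for _ in range(rounds):
        price = bot.choose()
        bot.update(price, rng.random() < rent_probability[price])
    return bot
```

A larger `epsilon` spends more nights testing prices that look worse so far; a smaller one commits sooner to the current best estimate.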

Attendees will come away from this tutorial with a number of exciting ideas for machine learning and artificial intelligence projects that fit together into one larger project, where each sub-project can be explored at a depth ranging from beginner to advanced. This tutorial is great for autodidacts seeking a plan of study that exposes them to a diverse set of topics in the space, as well as those considering teaching a full course on the subject. Working starter code and instructions will be available.