Building an image search service from scratch

Abstract: Many products fundamentally appeal to our perception. When browsing through outfits on clothing sites, looking for a vacation rental on Airbnb, or choosing a pet to adopt, the way something looks is often an important factor in our decision. The way we perceive things is a strong predictor of what kind of items we will like, and therefore a valuable quality to measure.

However, making computers understand images the way humans do has been a computer science challenge for quite some time. Since 2012, Deep Learning has slowly started overtaking classical methods such as Histograms of Oriented Gradients (HOG) in perception tasks like image classification or object detection. One of the main reasons often credited for this shift is deep learning’s ability to automatically extract meaningful representations when trained on a large enough dataset.

This is why many teams — like at Pinterest, StitchFix, and Flickr — started using Deep Learning to learn representations of their images, and provide recommendations based on the content users find visually pleasing. Similarly, Fellows at Insight have used deep learning to build models for applications such as helping people find cats to adopt, recommending sunglasses to buy, and searching for art styles.

Many recommendation systems are based on collaborative filtering: leveraging user correlations to make recommendations (“users that liked the items you have liked have also liked…”). However, these models require a significant amount of data to be accurate, and struggle to handle new items that have not yet been viewed by anyone. Item representation can be used in what’s called content-based recommendation systems, which do not suffer from the problem above.

In addition, these representations allow consumers to efficiently search photo libraries for images that are similar to the selfie they just took (querying by image), or for photos of particular items such as cars (querying by text). Common examples of this include Google Reverse Image Search, as well as Google Image Search.

Based on our experience providing technical mentorship for many semantic understanding projects, we are bring a workshop to ODSC on how you would go about building your own representations, both for image and text data, and efficiently do similarity search. By the end of this workshop, you should be able to build a quick semantic search model from scratch, no matter the size of your dataset.

Bio: Coming Soon

Open Data Science Conference