Across our two brands, Badoo and Bumble, we have over 500 million registered users worldwide uploading millions of photos a day to our platform. These images provide us with a rich data set from which we derive a wealth of insights.
User Profile Information
Within our data science team, we created a service that takes images as input and returns information about their content. We use this service during our prototyping phase, during exploratory analysis, and for delivering ad-hoc insights about image content to the business.
For example, we have observed that if someone is wearing sunglasses in all of their pictures, they tend to receive fewer likes than users who clearly show their faces. This enables us to provide tips to users on how to enhance their profiles.
In this blog post, I give an overview of how we combined deep neural networks and Flask APIs to offer this service.
Computer Vision Tasks
Our API serves a variety of models, each providing a different type of information about image content. Some of the tasks these models perform are:
- Image classification
- Textual description of image content
- Object detection
Impact & Workflow
During the execution of projects, we carry out a number of potential impact analyses. These estimate the value we expect the project to deliver, and we revisit them as we learn more and progress through the process. The purpose of this is to ensure that resources are allocated effectively and that what we produce delivers maximum impact on the business. The visual below outlines the workflow we aim for.
During the prototyping phase of our computer vision models, we needed to be able to assess what the impact on the business might be if we were to allocate resources and productionise them. To assess this fully, we needed to scale the number of images we could score. To this end, we built a Flask framework that lets us serve the models at a greater scale than is possible on a local machine.
Once we had trained our models using Jupyter notebooks and .py scripts, we wanted other members of the team and people across the business to be able to use them to support their prototyping efforts and potential impact reviews. To achieve this, we decided to encapsulate the models in REST APIs. An API essentially allows you to interact with a service over HTTP, making requests to specific URLs and getting relevant data back in the response.
We decided to use APIs because they make it easy for applications written in different languages to work together. For example, when a front-end developer needs to use these models, they simply need the endpoint of the API; they do not need to be familiar with Python or have domain-specific knowledge.
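To illustrate how little a caller needs to know, here is a minimal client sketch that uses only the Python standard library. The URL, the `/classify` path, the raw-bytes request body and the JSON response shape are all assumptions made for illustration; the post does not describe the real service's interface.

```python
import json
import urllib.request

# Hypothetical endpoint URL; the real service address is not given in this post.
API_URL = "http://localhost:5000/classify"


def build_request(image_bytes: bytes, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request that carries the raw image bytes as its body."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def score_image(image_bytes: bytes, url: str = API_URL) -> dict:
    """Send the image to the API and decode the (assumed) JSON response."""
    with urllib.request.urlopen(build_request(image_bytes, url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller would read an image file in binary mode and pass its contents to `score_image`. The same request could be made just as easily from JavaScript, Java or a command-line tool, which is the point of putting the models behind HTTP.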
There are a host of third-party solutions offering machine vision APIs, including Google Cloud Vision and AWS Rekognition. We decided against going down this route in the interests of both minimising costs and keeping our data in-house. Instead, we used Flask to build and serve our API ourselves. Flask is a microframework for Python that offers a simple way of exposing Python functions as REST endpoints using route decorators.
Flask and Django are comparable Python web frameworks. We chose Flask over Django because it is very simple and easy to get started with, whereas Django is heavier than we needed for a service like this. Simplicity and flexibility were two key requirements for our service, and they also influenced our decision.
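As a rough sketch of what annotating a function with an endpoint looks like in Flask, consider the example below. The `/classify` route, the raw-bytes request body and the `classify_image` stub are hypothetical and only stand in for a real trained model; this is not our production code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_image(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for a trained model's prediction function."""
    # A real implementation would decode the image and run a neural network.
    return {"label": "placeholder", "num_bytes": len(image_bytes)}


@app.route("/classify", methods=["POST"])
def classify():
    # Assumption: the client sends the raw image bytes as the request body.
    image_bytes = request.get_data()
    if not image_bytes:
        return jsonify({"error": "request body is empty"}), 400
    return jsonify(classify_image(image_bytes))


if __name__ == "__main__":
    # Flask's built-in development server, suitable for local testing only.
    app.run(host="127.0.0.1", port=5000)
```

The decorator is all that ties the URL to the function, so adding another model to the service can be as simple as adding another decorated function.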
Hosting the Service
Once our Flask API was up and running on a local machine, we packaged the service as an application on one of our servers so that other people in the business could access it easily. These servers have GPUs, which help to reduce computation time.
To containerize the service on our server, we created a Docker container from an image. If you are not familiar with Docker, I would recommend taking a look at the Docker documentation online, as it is very thorough and digestible.
The infrastructure we have developed is primarily used by the Data Science team when carrying out exploratory pieces of work, performing ad-hoc analysis, and during our model prototyping phase. We also use the service to help determine whether we should invest resources in putting the models into production. We have found that the framework allows our team to work in a more unified yet flexible way.
During my talk at ODSC, “Image Detection as a Service: How we Use APIs and Deep Learning to Support our Products,” I’ll discuss this service in more detail, including the challenges we faced and some advice on best practices. I hope you will attend and enjoy the talk.
About the author/ODSC Speaker: Laura Mitchell
With over 10 years of experience in the tech and data science space, I am the Lead Data Scientist at MagicLab, whose brands, Badoo and Bumble, have connected the lives of over 500 million people through dating, social and business.
I am a published author in the field of Artificial Intelligence and have hands-on and leadership experience in the delivery of projects using natural language processing, computer vision, and recommender systems, from initial conception through to production.