Abstract: In this talk we present a new paradigm of computation where the intelligence is computed inside the database. Standard software systems must get the data from the database to execute a routine. If the size of the data is big, there are inefficiencies due to the data movement. Store procedures tried to solve this issue in the past, allowing for computing simple functions inside the database. However, only simple routines can be executed.
To showcase the capabilities of our new system, we created a lung cancer detection algorithm using Microsoft’s Cognitive Toolkit, also known as CNTK. We used transfer learning between ImageNet dataset, which contains natural images, and a lung cancer dataset, which contains scans of horizontal sections of the lung for healthy and sick patients. Specifically, a pretrained Convolutional Neural Network on ImageNet is used on the lung cancer dataset to generate features. Once the features are computed, a boosted tree is applied to predict whether the patient has cancer or not.
All this process is computed inside the database, so the data movement is minimized. We are even able to execute the algorithm using the GPU of the virtual machine that hosts the database. Using a GPU, we can compute the featurization in less than 1h, in contrast to using a CPU, that would take up to 32h. Finally, we set up an API to connect the solution to a web app, where a doctor can analyze the images and get a prediction of a patient.
Bio: Dr. Miguel González-Fierro is a Data Scientist at Microsoft UK, where his job consists of helping customers leverage their processes using Big Data and Machine Learning. Previously, he was CEO and founder of Samsamia Technologies, a company that created a visual search engine for fashion items allowing users to find products using images instead of words, and founder of the Robotics Society of Universidad Carlos III, which developed different projects related to UAVs, mobile robots, small humanoids competitions, and 3D printers. Miguel also worked as a robotics scientist at Universidad Carlos III of Madrid and King’s College London, where his research focused on learning from demonstration, reinforcement learning, computer vision, and dynamic control of humanoid robots. He holds a BSc and MSc in electrical engineering and an MSc and PhD in robotics.