Are Labels All You Need? Building Machine Learning Models at SentiLink


SentiLink builds models in areas of fraud where no dataset containing ground truth exists. Our models are therefore built from two sources: the massive amount of unlabeled data we receive from our partners, and labels that an internal team of fraud experts generates by manually reviewing cases escalated to them by both the data science team and external partners. This presents a number of unusual challenges. How do we use unlabeled data to supplement the labeled cases during model training? How do we select additional cases for these risk analysts to review, given what we’ve already labeled (a problem very similar to active learning, but with some special caveats for our domain)? How do we estimate how much of each kind of fraud is hitting our partners, given that we can’t label everything; that is, how do we know there aren’t significant fraud trends that we’re missing? In this talk, SentiLink Data Scientist Seth Weidman will tackle these questions and more. Attendees will come away with a better understanding of how to operate in the context of partially labeled datasets.


Seth Weidman is a Data Scientist at SentiLink, where he works on the core synthetic fraud and identity theft models that power SentiLink’s API-based solution for stopping fraud, as well as on new product development. Immediately before SentiLink, he was at Facebook. He is the author of Deep Learning From Scratch, published by O’Reilly in 2019, and has degrees in mathematics and economics from the University of Chicago.

Open Data Science
One Broadway
Cambridge, MA 02142
