Imagededup – Finding Duplicate Images Made Easy!


The problem of finding duplicates in an image collection is widespread. Many online businesses rely on image galleries to deliver a good customer experience and, consequently, generate more revenue, so those galleries need to be of the highest quality. Duplicates in such galleries can degrade the customer experience. In addition, image-based machine learning models can produce misleading results when duplicates are present in the training, evaluation, or test sets.

Finding and removing duplicates is therefore an important requirement across several use cases. In this talk, we present imagededup, a Python package we built to find exact and near-duplicates in an image collection. We will cover the motivation behind building it, walk through its functionality, and give a demo.
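
To make the "exact duplicate" half of the problem concrete, here is a minimal sketch that uses only the Python standard library. It is not imagededup's API or method: the function name `find_exact_duplicates` is hypothetical, and the approach (grouping files by the SHA-256 digest of their bytes) only catches byte-identical files. Near-duplicates, such as resized or recompressed copies, need a comparison based on image content, which this sketch does not attempt.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(image_dir):
    """Group files under image_dir whose bytes are identical.

    Hypothetical helper for illustration; not part of imagededup.
    """
    root = Path(image_dir)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Byte-identical files always share the same digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(str(path.relative_to(root)))
    # Keep only digests shared by more than one file.
    return [names for names in groups.values() if len(names) > 1]
```

Calling `find_exact_duplicates("path/to/images")` returns a list of groups, each group being the relative paths of files with identical content. A copy that has merely been re-saved at a different quality setting would land in a different group, which is the gap that near-duplicate detection is meant to close.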


Tanuj Jain has been a Senior Machine Learning Engineer at Axel Springer AI (located in Berlin) since December 2019. He was previously part of the Data Science team at idealo Internet GmbH. His current interests revolve around deep learning research for speech and image processing. He completed his M.Sc. in Electrical Engineering at Paderborn University in 2015 and his B.Tech at GGSIP University, New Delhi, in 2010. He is very interested in leveraging machine learning to empower businesses and in measuring the impact it creates.

Open Data Science




