Privacy-Law-Aware ML Data Preparation


India is implementing a new Personal Data Protection (PDP) Law, similar to the GDPR and the CCPA. All enterprise data services within the scope of the law, including analytics and data science, must comply with it. Almost all major geographies have now passed similar laws, and expectations that organizations handle data responsibly are rising.
Enrich, our product, is a high-trust data preparation platform that supplies data to enterprise analysts and models at scale every day. Because of their 'fan-out' nature (one preparation pipeline feeds many downstream consumers), such data preparation services sit on the critical path of an organization's compliance and privacy work. That same position makes them a convenient place to enforce policy and safety mechanisms.

In this talk we discuss some of the mechanisms we are building for clients in Enrich, our data preparation platform. They include an open-source compliance checklist to guide the process, a 'right to forget' service built on an anonymized lookup-key service, and a metadata service for tracking datasets. The focus will be on these generic capabilities rather than on Scribble Data or our product.
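The abstract does not say how the anonymized lookup-key service works, so the following is only a minimal sketch of one common way to support a 'right to forget' through pseudonymization, assuming an in-memory store. `LookupKeyService`, `tokenize`, `resolve`, and `forget` are hypothetical names for illustration, not Enrich's API. Downstream datasets hold only random tokens; deleting a person's mapping leaves their tokens in place but makes them unlinkable through the service.

```python
import secrets
from typing import Dict, Optional


class LookupKeyService:
    """Hypothetical pseudonymization service (illustrative sketch only).

    Maps real identifiers to random tokens. Downstream datasets store only
    the tokens; "forgetting" a person deletes their mapping here.
    """

    def __init__(self) -> None:
        self._id_to_token: Dict[str, str] = {}
        self._token_to_id: Dict[str, str] = {}

    def tokenize(self, real_id: str) -> str:
        # Reuse an existing token so joins across datasets stay consistent.
        token = self._id_to_token.get(real_id)
        if token is None:
            # Random, not derived from real_id, so it cannot be recomputed.
            token = secrets.token_hex(16)
            self._id_to_token[real_id] = token
            self._token_to_id[token] = real_id
        return token

    def resolve(self, token: str) -> Optional[str]:
        # In practice only authorized callers should be able to reverse a token.
        return self._token_to_id.get(token)

    def forget(self, real_id: str) -> bool:
        # Drop both directions of the mapping; False if the id is unknown.
        token = self._id_to_token.pop(real_id, None)
        if token is None:
            return False
        del self._token_to_id[token]
        return True


# Usage: a prepared dataset row carries only the token.
svc = LookupKeyService()
row = {"customer": svc.tokenize("alice@example.com"), "spend": 120}
svc.forget("alice@example.com")
print(svc.resolve(row["customer"]))  # None: the token no longer links back
```

Random tokens are used instead of hashes because anyone who knows an email address can recompute its hash, which would undo the forgetting. A production service would also need persistent, access-controlled storage, and deleting the key does not by itself remove quasi-identifiers that remain in downstream rows.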


Venkata Pingali is an academic-turned-entrepreneur. He works at the intersection of data, ML algorithms, and systems, building data products that speed up the adoption of ML in the enterprise. Scribble Data, his firm, offers a platform and managed service for production feature engineering.