Provenance: a Fundamental Data Governance Tool ⎯ a Case Study for Data Science Pipelines and Their Explanations


Data science pipelines are increasingly used to build new services for the general public. Significant concerns about privacy, fairness, bias and ethics have been raised and are now well documented. Increasingly, service providers and data controllers are obliged to provide explanations about the data they process.
Meanwhile, a decade of research on provenance, its standardisation at the World Wide Web Consortium (PROV), and the applications, toolkits and services adopting it have led to the recognition that provenance is a critical facet of good data governance for businesses, governments and organisations in general. Provenance, which is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing, turns out to be a solid foundation for constructing explanations about data science pipelines.
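To make the definition above concrete, here is a minimal sketch of a provenance record for a toy pipeline, written in plain Python. It is an illustration only: it is not the PROV toolkit or the approach used in PLEAD. The relation names (used, wasGeneratedBy, wasAssociatedWith) are borrowed from the PROV data model, while the ProvenanceRecord class, its methods, and the pipeline identifiers ("applications.csv", "train", "classify", and so on) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Toy provenance record: typed nodes plus PROV-style relations."""

    nodes: dict = field(default_factory=dict)      # id -> "entity" | "activity" | "agent"
    relations: list = field(default_factory=list)  # (relation, subject, object) triples

    def add(self, kind, node_id):
        """Declare an entity, activity, or agent."""
        self.nodes[node_id] = kind

    def relate(self, relation, subject, obj):
        """Record a relation between two previously declared nodes."""
        for node_id in (subject, obj):
            if node_id not in self.nodes:
                raise KeyError(f"undeclared node: {node_id}")
        self.relations.append((relation, subject, obj))

    def explain(self, node_id):
        """Walk backwards from a node and list everything that contributed to it."""
        lines, seen, stack = [], set(), [node_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for relation, subject, obj in self.relations:
                if subject == current:
                    lines.append(f"{subject} {relation} {obj}")
                    stack.append(obj)
        return lines


# Hypothetical pipeline: a model is trained on a dataset, then used to decide.
record = ProvenanceRecord()
record.add("entity", "applications.csv")
record.add("entity", "model")
record.add("entity", "decision")
record.add("activity", "train")
record.add("activity", "classify")
record.add("agent", "data-scientist")

record.relate("used", "train", "applications.csv")
record.relate("wasGeneratedBy", "model", "train")
record.relate("wasAssociatedWith", "train", "data-scientist")
record.relate("used", "classify", "model")
record.relate("wasGeneratedBy", "decision", "classify")

explanation = record.explain("decision")
for line in explanation:
    print(line)
```

Tracing back from the decision yields the chain of activities, data, and people behind it, which is the raw material an explanation could be built from. A real system would presumably render these statements as natural-language text tailored to its audience.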
In this talk, I will show how a provenance record can be constructed to describe a data science pipeline and to provide explanations about its outcomes. The talk will give an overview of the notion of provenance, outline the key steps in constructing explanations, and report on our experience in the project PLEAD: Provenance-driven and Legally-grounded Explanations for Automated Decisions (


Luc Moreau is a Professor of Computer Science and Head of the Department of Informatics at King's College London. Before joining King's, Luc was Head of the Web and Internet Science group in the Department of Electronics and Computer Science at the University of Southampton.

Luc was co-chair of the W3C Provenance Working Group, which resulted in four W3C Recommendations and nine W3C Notes specifying PROV, a conceptual data model for provenance on the Web, and its serializations in various Web languages. Previously, he initiated the successful Provenance Challenge series, which saw the involvement of over 20 institutions investigating provenance interoperability in three successive challenges, and which resulted in the specification of the community Open Provenance Model (OPM). Before that, he led the development of provenance technology in the FP6 Provenance project and the Provenance Aware Service Oriented Architecture (PASOA) project.

He is on the editorial board of "PeerJ Computer Science". Previously, he was editor-in-chief of the journal "Concurrency and Computation: Practice and Experience" and served on the editorial board of "ACM Transactions on Internet Technology".

Open Data Science




One Broadway
Cambridge, MA 02142
