Abstract: The past few years have seen enormous interest in making scientific data “FAIR”: findable, accessible, interoperable, and reusable. Unfortunately, most scientific datasets are not FAIR, and this problem hinders data sharing precisely when the fast-paced international response to the COVID-19 challenge demands rapid and effective data dissemination. Alas, when left to their own devices, scientists do an absolutely terrible job of creating the metadata that describe the experimental datasets that make their way into online repositories. A lack of metadata standardization makes it difficult for other investigators to locate relevant datasets, to reanalyze them, and to integrate those datasets with other data. The need for open science in response to the COVID-19 pandemic highlights the importance of making it easy for investigators to author metadata that adhere to community standards and that describe datasets in reproducible terms. The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to improve the authoring of comprehensive, standardized metadata, making online datasets more useful to the scientific community. In collaboration with the GO-FAIR International Support and Coordination Office, CEDAR is participating in the Virus Outbreak Data Network to develop robust approaches to sharing scientific datasets related to COVID-19. CEDAR’s approach enables scientists to manage libraries of metadata templates and to fill out those templates using terms from standard ontologies and value sets, making the datasets FAIR and immediately accessible and useful to a wide community of investigators.
Bio: Dr. Musen is Professor of Biomedical Informatics and of Biomedical Data Science at Stanford University, where he is Director of the Stanford Center for Biomedical Informatics Research. Dr. Musen conducts research related to open science, intelligent systems, computational ontologies, and biomedical decision support. His group developed Protégé, the world’s most widely used technology for building and managing terminologies and ontologies. He served as principal investigator of the National Center for Biomedical Ontology, one of the original National Centers for Biomedical Computing created by the U.S. National Institutes of Health (NIH). He directs the Center for Expanded Data Annotation and Retrieval (CEDAR), founded under the NIH Big Data to Knowledge Initiative. CEDAR develops semantic technology to ease the authoring and management of biomedical experimental metadata.