
Abstract: Revision control for source code - and especially Git - has caused a great leap forward in software development and delivery. A similar revolution has not yet taken place in data. This talk will discuss the various open source databases that are approaching this problem (with a focus on TerminusDB and TerminusHub), the underlying architectures and challenges in building both a 'Git for data' and a 'GitHub for data'.
It will posit that a truly collaborative and distributed system must be:
1) decentralized
2) offline-first: work offline and then resync when online again
3) reliable: conflicts are handled properly
4) private: end-to-end-encrypted, if desired
5) efficient: only changes (diffs) to the data set are transmitted between participants
6) collaborative: multiple people can work on the same data set
Finally, the talk will look to the future and the dawn of CI/CD for data.
Bio: Dr. Gavin Mendel-Gleason is the CTO of TerminusDB. He is a former research fellow at Trinity College Dublin in the School of Computer Science and Statistics. His research focuses on databases, logic, and verification in software engineering. His work includes contributing to the Seshat global historical databank, an ambitious project to record and analyze patterns in human history. He is the inventor of the Web Object Query Language and the primary architect of TerminusDB. He is interested in improving the best practices of the software development community and is a strong believer in formal methods and in the use of mathematics and logic as disciplines to increase the quality and robustness of software.