New Paper: “Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data”
The paper “Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data” was published in Harvard Data Science Review 3(4).
The lead author is our Key Researcher Andreas Rauber; our Senior Researcher Tomasz Miksa is one of the co-authors.
Abstract
Precisely identifying arbitrary subsets of data so that these can be reproduced is a daunting challenge in data-driven science, the more so if the underlying data source is dynamically evolving. Yet, an increasing number of settings exhibit exactly those characteristics: larger amounts of data being continuously ingested from a range of sources (be it sensor values, (on-line) questionnaires, documents, etc.), with error correction and quality improvement processes adding to the dynamics.
The Research Data Alliance (RDA) Working Group on Dynamic Data Citation has published 14 recommendations that are centered around time-stamping and versioning evolving data sources and identifying subsets dynamically via persistent identifiers that are assigned to the queries selecting the respective subsets. This paper provides an overview of the recommendations, reference implementations, and pilot systems deployed, and then analyses lessons learned from these implementations. This provides a basis for institutions and data stewards considering adding this functionality to their data systems.
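To make the core idea of the abstract more concrete, here is a minimal sketch of time-stamped, versioned data combined with identifiers assigned to queries. It is not taken from the paper or its reference implementations. The table layout, the function names (open_store, upsert, run_as_of, cite, resolve), the integer logical timestamps, the "pid:" prefix, and the result hash used as a check are all assumptions made for illustration.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory store: a versioned data table plus a query store."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE readings ("
        " sensor TEXT, value REAL,"
        " valid_from INTEGER NOT NULL,"  # logical time this version was added
        " valid_to INTEGER)"             # NULL while this version is current
    )
    db.execute(
        "CREATE TABLE query_store ("
        " pid TEXT PRIMARY KEY, query TEXT, as_of INTEGER, result_hash TEXT)"
    )
    return db


def upsert(db, sensor, value, now):
    """Add a new version; the previous one is closed, never overwritten."""
    db.execute(
        "UPDATE readings SET valid_to = ? WHERE sensor = ? AND valid_to IS NULL",
        (now, sensor),
    )
    db.execute("INSERT INTO readings VALUES (?, ?, ?, NULL)", (sensor, value, now))


def run_as_of(db, where, as_of):
    """Select a subset from the data as it existed at logical time `as_of`."""
    # Sketch only: `where` is interpolated as trusted text, which a real
    # system must not do with user input.
    sql = (
        "SELECT sensor, value FROM readings"
        f" WHERE ({where}) AND valid_from <= ?"
        " AND (valid_to IS NULL OR valid_to > ?)"
        " ORDER BY sensor"  # stable order keeps the result hash reproducible
    )
    return db.execute(sql, (as_of, as_of)).fetchall()


def _hash_rows(rows):
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


def cite(db, where, now):
    """Store the query with its timestamp and return an identifier for it."""
    digest = _hash_rows(run_as_of(db, where, now))
    # Hypothetical identifier scheme; a real system would mint a DOI or similar.
    pid = "pid:" + hashlib.sha256(f"{where}|{now}".encode("utf-8")).hexdigest()[:12]
    db.execute(
        "INSERT OR IGNORE INTO query_store VALUES (?, ?, ?, ?)",
        (pid, where, now, digest),
    )
    return pid


def resolve(db, pid):
    """Re-execute the stored query against the state at its timestamp."""
    where, as_of, digest = db.execute(
        "SELECT query, as_of, result_hash FROM query_store WHERE pid = ?", (pid,)
    ).fetchone()
    rows = run_as_of(db, where, as_of)
    if _hash_rows(rows) != digest:
        raise ValueError("re-executed subset differs from the cited one")
    return rows


db = open_store()
upsert(db, "a", 1.0, now=1)
upsert(db, "b", 2.0, now=2)
pid = cite(db, "value > 0", now=3)
upsert(db, "a", 9.0, now=4)  # later correction of sensor "a"
upsert(db, "c", 5.0, now=5)  # later ingestion of a new sensor
cited_rows = resolve(db, pid)
current_rows = run_as_of(db, "value > 0", 5)
```

In this sketch, resolving the identifier returns the subset as it was when cited, while the same query against the current state also reflects the later correction and the newly ingested record. Storing the query instead of a copy of the data is the design choice the abstract describes; the hash comparison is only one possible way to check that a re-executed subset matches the cited one.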
Andreas Rauber, Bernhard Gößwein, Carlo Maria Zwölf, C. Schubert, Florian Wörister, James Duncan, Katharina Flicker, Koji Zettsu, Kristof Meixner, Leslie D. McIntosh, Reyna Jenkyns, Stefan Pröll, Tomasz Miksa, and Mark A. Parsons: Precisely and persistently identifying and citing arbitrary subsets of dynamic data.
In: Harvard Data Science Review 3(4)
Date: 28 October 2021, DOI: 10.1162/99608f92.be565013