Towards an open source model for data and metadata standards
April 8th-9th, 2024, National Science Foundation, Alexandria, Virginia
Recent progress in machine learning and artificial intelligence promises to advance research and understanding across a wide range of fields and activities. In tandem, an increased awareness of the importance of open data for reproducibility and scientific transparency is making inroads in fields that have not traditionally produced large publicly available datasets. Data sharing requirements from publishers, funders and other stakeholders have also created pressure to make datasets with research and/or public interest value available through digital repositories. However, to make the best use of existing data, and to facilitate the creation of useful future datasets, robust, interoperable and usable standards need to evolve and adapt over time. The open-source development model offers significant potential benefits to the process of standard creation and adaptation. In particular, the development and adaptation of standards can draw on long-standing socio-technical processes that have been key to managing the development of software, and that allow broad community input to be incorporated into the formulation of these standards. By tying standards to formal descriptions (e.g., by implementing schemata for standard specification and/or automated standard validation), processes such as automated testing and continuous integration, which have been important in the development of open-source software, can be adopted in defining data and metadata standards as well. Similarly, open-source governance gives a range of stakeholders a voice in the development of standards, potentially surfacing use cases and concerns that would not be taken into account in a top-down model of standards development. On the other hand, open-source models carry unique risks that the process needs to account for.
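As a rough illustration of how a formal description makes a standard testable, the sketch below checks metadata records against a small schema and wraps the checks in tests of the kind a continuous integration service could run on every proposed change. The schema, its field names, and the `validate` helper are hypothetical and chosen only for illustration; they are not drawn from any existing standard or library.

```python
# Hypothetical, minimal schema for a dataset metadata record.
# Field names and types are illustrative assumptions, not a real standard.
SCHEMA = {
    "title": str,
    "license": str,
    "version": str,
    "keywords": list,
}


def validate(record, schema=SCHEMA):
    """Return a list of error messages; an empty list means the record conforms."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"field {field!r} should be {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors


def test_conforming_record():
    # A record that satisfies every rule in the hypothetical schema.
    record = {
        "title": "Example dataset",
        "license": "CC0-1.0",
        "version": "1.0.0",
        "keywords": ["example"],
    }
    assert validate(record) == []


def test_nonconforming_record():
    # The version is given as a number and the keywords are missing.
    record = {"title": "Example dataset", "license": "CC0-1.0", "version": 1}
    errors = validate(record)
    assert len(errors) == 2


if __name__ == "__main__":
    test_conforming_record()
    test_nonconforming_record()
    print("all checks passed")
```

In practice a community would more likely express the schema in an established schema language and rely on an existing validator; the point here is only that once the rules are machine-readable, a proposed change to the standard can be checked automatically before it is merged.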
The goal of this workshop is to discuss specific examples where an open-source model for standards development has had a significant impact on practice within a field, as well as cases where this model has not worked in the past or is not a good fit.
The workshop will convene attendees from a broad range of research disciplines and from various sectors (e.g., academia, government and industry) for two days at the NSF headquarters. The meeting will include a mix of activities: plenary talks from experts in the field who can share their experience developing open-source data and metadata standards; breakout discussions on a variety of sub-topics (e.g., governance of data and metadata standards, and strategies for evolving and adapting them); brainstorming sessions; and networking sessions. One of the goals of the workshop is to synthesize the discussions into a white paper that will summarize the state of the art and make concrete recommendations for the future evolution of robust, transparent and useful standards.