Between mid-February and mid-August 2020 I carried out a master’s thesis, titled “Enabling better aggregation and discovery of cultural heritage content for Europeana and its partner institutions”, in collaboration with the Europeana Research and Development (R&D) team.
This master’s thesis was completed as part of the final examination requirements of the Haute école de gestion de Genève (HEG-GE) to obtain the Master of Science HES-SO in Information Science. I obtained a mark of 5.7 out of 6, which is equivalent to an A on the European Credit Transfer and Accumulation System (ECTS) scale.
In this post are listed:
Europeana, a non-profit foundation launched in 2008, aims to improve access to Europe’s digital cultural heritage through its open data platform, which aggregates metadata and links to digital surrogates held by over 3700 providers. The data comes both directly from cultural heritage institutions (libraries, archives, museums) and through intermediary aggregators. Europeana’s current operating model relies on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and the Europeana Data Model (EDM) for data import through Metis, Europeana’s ingestion and aggregation service.
However, OAI-PMH is an outdated technology that is not web-centric, which imposes a high maintenance burden, in particular on smaller institutions. Consequently, Europeana is looking for alternative aggregation mechanisms that could complement or supersede it over the long term, and that could also bring further benefits. This master’s thesis extends the research on earlier aggregation experiments that Europeana successfully carried out with various technologies, such as aggregation based on Linked Open Data (LOD) datasets or through the International Image Interoperability Framework (IIIF) APIs.
The literature review first focuses on metadata standards and the aggregation landscape in the cultural heritage domain, and then provides an extensive overview of Web-based technologies with respect to two essential components that enable aggregation: data transfer and synchronisation, and data modelling and representation. Three key results were obtained. First, participation in the Europeana Common Culture project resulted in a revision of the documentation of the LOD-aggregator, a generic toolset for harvesting and transforming LOD. Second, 52 respondents completed an online survey to gauge the awareness, interest, and use of technologies other than OAI-PMH for (meta)data aggregation. Third, an assessment of potential aggregation pilots was carried out for the 23 organisations that expressed interest in follow-up experiments, on the basis of the available data and existing implementations. In the allotted time, one pilot was attempted using Sitemaps and Schema.org.
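To give a rough idea of what aggregation based on Sitemaps and Schema.org can look like, here is a minimal sketch in Python. It is not the code used in the pilot: the sample sitemap, the sample page, the `example.org` URLs and all function names are assumptions made up for illustration, and a real harvester would fetch these documents over HTTP. The sketch reads the URLs listed in a sitemap, keeps those modified since a given date, and extracts the Schema.org description embedded as JSON-LD in a page.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data standing in for a provider's sitemap and object page.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/objects/1</loc><lastmod>2020-06-01</lastmod></url>
  <url><loc>https://example.org/objects/2</loc><lastmod>2020-07-15</lastmod></url>
</urlset>"""

SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "VisualArtwork", "name": "Example painting"}
</script>
</head><body></body></html>"""


def sitemap_entries(xml_text):
    """Return the (loc, lastmod) pairs listed in a sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


def changed_since(entries, since):
    """Keep URLs whose lastmod is on or after `since` (ISO dates sort as text)."""
    return [loc for loc, lastmod in entries if lastmod and lastmod >= since]


class JsonLdExtractor(HTMLParser):
    """Collect the content of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def schema_org_records(html_text):
    """Return the JSON-LD documents embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.documents


if __name__ == "__main__":
    entries = sitemap_entries(SAMPLE_SITEMAP)
    print(changed_since(entries, "2020-07-01"))
    print(schema_org_records(SAMPLE_PAGE))
```

The `lastmod` filter is what makes incremental updates cheap in this kind of setup: only pages that changed since the previous harvest need to be fetched again. Mapping the extracted Schema.org records to EDM would be a separate step that this sketch does not cover.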
To encourage the adoption of new aggregation mechanisms, a list of recommendations was then established. All of these recommendations were aligned with the Europeana Strategy 2020-2025 and directed towards one or several of the key roles of the aggregation workflow (data provider, aggregator, Europeana). Even if a shift in Europeana’s operating model would require extensive human and technical resources, such an effort is clearly worthwhile, as the solutions presented in this dissertation are well-suited for data enrichment and for allowing data to be easily updated. The transition from OAI-PMH will also be facilitated by the integration of such mechanisms within the Metis Sandbox, Europeana’s new ad-hoc system where contributors will be able to test their data sources before ingestion into Metis. Ultimately, this shift is also expected to lead to better discoverability of digital cultural heritage objects.
The oral defence of the master’s thesis took place on 28 August 2020 at the HEG-GE (and virtually through Zoom). The jury that assessed the master’s thesis consisted of Arnaud Gaudinat, Associate Professor in Information Science at the HES-SO University of Applied Sciences and Arts, and Emmanuelle Bermès, Deputy Director for Services and Networks at the French National Library. Antoine Isaac, R&D Manager at Europeana, was also invited to participate.
Below is the recording of the defence, which began with a twenty-minute presentation followed by about forty minutes of questions.