Heterogeneity, Knowledge Latency, Custodianship
Keystones as Full-Fledged Actors
Linked Art is a community collaborating to define a metadata application profile for describing cultural heritage, and the technical means for conveniently interacting with it.
PhD Thesis supervised by:
Berners-Lee, Tim (1991, August 6). WorldWideWeb—Executive summary. https://archive.md/Lfopj
Guillem, Anaïs, Gros, Antoine & De Luca, Livio (2023). Faire parler les claveaux effondrés de la cathédrale Notre-Dame de Paris [Making the collapsed voussoirs of Notre-Dame de Paris Cathedral speak]. Recueil des communications du 4e colloque Humanistica. Humanistica 2023, Geneva, Switzerland. https://hal.science/hal-04106101
Hacıgüzeller, Piraye, Taylor, James Stuart & Perry, Sara (2021). On the Emerging Supremacy of Structured Digital Data in Archaeology: A Preliminary Assessment of Information, Knowledge and Wisdom Left Behind. Open Archaeology, 7(1), 1709-1730. https://doi.org/10.1515/opar-2020-0220
Rossenova, Lozana & Di Franco, Karen (2022). Iterative Pasts and Linked Futures: A Feminist
Approach to Modeling Data in Archives and Collections of Artists’ Publishing. Perspectives on Data.
https://www.artic.edu/digital-publications/37/perspectives-on-data/28/iterative-pasts-and-linked-futures
These images are part of the photographic archives of the Swiss Society for Folklore Studies. Licence: CC BY-NC 4.0
Hi everyone, my name is Julien Raemy, I am a PhD candidate in Digital Humanities here at the University of Basel, and I am very happy to be here to present some reflections on cultural heritage data in the context of my thesis.
I will first talk about the ways data can be interlinked on the Web in terms of vision and standards; then I will deconstruct a little the definition of cultural heritage data with a few examples; and finally I will talk about two communities active in the cultural heritage field that develop and maintain specifications, work that often goes hand in hand with the Open Science and Open Access movements.
I'd like to start by mentioning an event that took place more than thirty years ago: the advent of the World Wide Web, created at CERN in Geneva by Tim Berners-Lee in 1989. In one of his emails, archived by the Internet Archive, Berners-Lee mentions that the web project began with the idea that a large part of academic information should be freely available to everyone.
This Web, which has aspired to be a Semantic Web for many years now, has a centrepiece known as the Resource Description Framework (RDF), a general method for describing and exchanging graph data. The Semantic Web offers major opportunities for scholarship as it allows data to be reasoned over together, that is, to be understood by machines via RDF-based ontologies, a formal way to represent human-like knowledge.
With RDF, everything goes in threes: the data model consists of so-called triples (subject, predicate, object) that form graphs. Most components of these triples use Uniform Resource Identifiers (URIs) and are generally web-addressable, whether for naming subjects and objects (which may themselves appear in other triples) or relationships. Take, for instance, a graph where the subject, STS-CH, is a conference that started today and will end tomorrow. This statement, expressed as a graph, can be represented in different forms and formats and understood by computers.
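The STS-CH example above can be sketched in plain Python, with no RDF library: each triple is a (subject, predicate, object) tuple, and a graph is simply a collection of such tuples that can be queried by pattern. The URIs, prefixed predicate names and dates below are illustrative placeholders, not taken from any real vocabulary or dataset.

```python
# Minimal sketch of the RDF triple model: a list of (subject, predicate, object)
# tuples forming a small graph. All identifiers and dates are hypothetical.
triples = [
    ("https://example.org/STS-CH", "rdf:type", "schema:Event"),
    ("https://example.org/STS-CH", "schema:startDate", "2023-08-28"),
    ("https://example.org/STS-CH", "schema:endDate", "2023-08-29"),
]

def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Everything the graph says about the STS-CH conference:
for s, p, o in match(triples, s="https://example.org/STS-CH"):
    print(p, o)
```

Real RDF stores work on the same principle, only with proper IRIs, datatyped literals and query languages such as SPARQL layered on top.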
From data to cultural heritage data. I will divide this section into three parts, around three keywords.
Cultural heritage data refers to digital or data-driven affordances of cultural heritage, embodying **a rich and varied compilation of insights originating from a variety of disciplines, techniques, traditions, positions and technologies**. It encompasses both tangible and intangible aspects of a society's culture. On the right-hand side, there is a Male Face Mask from the Guro culture that embodies both tangible and intangible aspects, as it is worn on the occasion of a man's second funeral.
Another example comes from the restoration of Notre-Dame Cathedral in Paris. The restoration is often told from various viewpoints: stonecutters, conservators, archaeologists and architects. A contribution by Anaïs Guillem, Antoine Gros and Livio De Luca shifts the perspective by showcasing two keystones from the nave's F29-30 double arch, destroyed in the 2019 fire. They view the keystones not only as artefacts but as agents that reveal their unique journey: once part of the vault, they fell, underwent diverse stages (burial, cleaning, digitisation, restoration, etc.) and became active participants, echoing Actor-Network Theory principles.
Every dataset embodies an underlying potential that research and interpretation bring to light. A noticeable divide, especially within cultural heritage, exists between the generation of data, its description and its use, owing to the diverse array of unforeseen applications.
Rossenova and Di Franco's article delves into artists' books, focusing on Carolee Schneemann's "Parts of a Body House". Unlike traditional art acquisitions, artists' books landed in the libraries of art institutions following the 1960s surge in their recognition as art objects. These collections don't fit neatly into curatorial realms; instead, they are cataloged under library definitions. The challenge intensifies when materials resist categorization or adopt archival, serial, or ephemeral labels. Many artists' publications purposefully defy conventional library, archive, and collection norms. The study uncovered linked collections and publications stemming from Schneemann's contributions, though these distinct versions are categorized differently across institution-specific catalogs. For unconventional art archives like artists' books, linked data's network model provides a means to chart relationships between indefinable embodied versions, constructing intricate histories beyond categorization or canonization. However, community discussions revealed difficulty in fully describing the array of relationships across editions, reinterpretations, serializations, or appropriations of publications within the LOD model structure.
Through a process of manual annotation and mapping of short excerpts from a variety of archaeological texts from Çatalhöyük [tʃaˈtaɫhœjyc] in Turkey to the CIDOC Conceptual Reference Model (CRM), a high-level ontology, Hacıgüzeller, Taylor and Perry sought to examine and compare the representational affordances and resistances of data. But here I must say what I mean by ontology in computer science: an ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of existence; for knowledge-based systems, what "exists" is exactly that which can be represented. Their finding is that structured data (using the CRM) fails to map the more natural modes of expression found in various types of archaeological text.
- Structured data are "representations": data that do not conform to a consensus of the "norm" are hard to situate within any type of structured data.
- Structured data are not a "neutral resource": we need to consider (archaeological) databases that hold structured data not as containers of knowledge but as artefacts of previous (archaeological) knowledge-making episodes; that is, digital technologies should be seen as having their own agency. As such, structured datasets embody their own set of socio-technical relationships and cannot be neutral.
- Structured data may not be a "democratizing trend": data may be inadvertently promoting or reinforcing forms of social injustice. That the ontology itself manages to strip away much of the context through which wider audiences might derive relevance and meaning from the data should be a cause for concern. Even as community-driven or participatory practices grow in popularity, the fundamental redesign of workflows, methods, and data to embed communities and community values at their core is still lacking.
IIIF is a community-driven initiative that brings together key players in the academic and cultural heritage fields, and has defined open and shared APIs to standardise the way image-based resources are delivered on the Web. Implementing the IIIF APIs enables institutions to make better use of their digitised or born-digital material by providing, for instance, deep zooming, comparison, full-text search of OCRed objects, or annotation capabilities.
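Concretely, the IIIF Image API makes an image request a predictable URL following the pattern {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format} from the Image API specification, which is what lets any compliant viewer talk to any compliant server. The sketch below assembles such a URL; the host and identifier are hypothetical placeholders.

```python
# Sketch of IIIF Image API URL assembly. The path segment order
# (region/size/rotation/quality.format) follows the Image API spec;
# "https://example.org/iiif/3" and "abcd1234" are made-up placeholders.
def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Full image, scaled to 600 pixels wide (Image API 3.0 size syntax "600,"):
url = iiif_image_url("https://example.org/iiif/3", "abcd1234", size="600,")
print(url)
# https://example.org/iiif/3/abcd1234/full/600,/0/default.jpg
```

Because every parameter lives in the URL, deep zoom is just many such requests for small regions at increasing sizes, with no server-specific client code.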
The model and API are based on appropriate international standards to ensure that they are interoperable and easy to use. Linked Art treats usability of the data as a primary consideration, so that it is easy to implement and maintain. Some of the LOUD design principles:
B. Few barriers to entry: it should be easy to get started.
C. Comprehensible by introspection: the data should be understandable to a large degree simply by looking at it.
D. Documentation with working examples.
This could, to some extent, be a solution to democratise structured data, once more accessible tools are released.
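The "comprehensible by introspection" principle is easiest to see in a record itself. Below is a minimal Linked Art-style JSON-LD description: the @context URL and the "HumanMadeObject" type come from the Linked Art documentation, while the identifier and label are hypothetical, loosely echoing the Guro mask example from earlier.

```python
import json

# Minimal Linked Art-style record: readable without consulting the ontology,
# which is the point of the LOUD principles. The id and _label are made up.
record = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Male Face Mask (Guro)",
}

print(json.dumps(record, indent=2))
```

Even without knowing CIDOC CRM, a reader can tell this describes a physical object with a human-readable label, which is exactly what "few barriers to entry" is after.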
Here is one example of such a representation, showing the different processes: how the various stages from the metadata source to the visualisation are often depicted, and how the standards can interact with each other.
Example: the release of the IIIF Image and Presentation APIs 3.0. Tracing "from use cases to specifications" in IIIF and Linked Art reveals key actors. GitHub, a platform where developers store and share software and its documentation, orchestrates. API specification is also centrally mediated by format and by representational (JSON-LD) context. Validators control compliance. Servers and clients align, each with distinct technical dependencies. Objects embody purpose. Human collectives, such as the IIIF editors, steer. JSON-LD's API mediates context; intermediaries channel. Technologies, with layered abstraction, interweave, revealing a complex ecosystem.
That is my last slide, and I hope I have managed to highlight some of the dependencies that create a form of knowledge regime on the Web. I am doing my PhD in Digital Humanities on Linked Open Usable Data, with a focus on its (potential) use in the Humanities and the perspectives it could bring in terms of community practices and semantic interoperability. My research is grounded in the Participatory Knowledge Practices in Analogue and Digital Image Archives (PIA) research project, which aims to develop a Citizen Science platform around three photographic collections of the Swiss Society for Folklore Studies (SSFS).
PIA: the context of my research, where we leverage three collections of the Swiss Society for Folklore Studies.