Semantic Interoperability can be defined as the seamless exchange of well-formed, meaningful, and truthful data between distinct systems.
The [World Wide Web] project merges the techniques of information retrieval and hypertext to make an easy but powerful global information system. The project started with the philosophy that much academic information should be freely available to anyone.
The Semantic Web is an extension of the World Wide Web, through standards, to make it machine-readable.
With RDF, everything goes in threes. Most of the triples' components have Uniform Resource Identifiers (URIs).
Syntax: subject, predicate, object
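For instance, the statement "this photograph was created by Ernst Brunner" can be written as a single triple in JSON-LD; the URIs below are illustrative, not real identifiers:

{
  "@id": "https://example.org/photo/123",
  "http://purl.org/dc/terms/creator": {
    "@id": "https://example.org/person/ernst-brunner"
  }
}

The subject is the photograph's URI, the predicate is the Dublin Core creator property, and the object is the URI of a person, which can itself be the subject of further triples.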
LOD is too focused on publishing data. Our data needs to be reused to be meaningful and valuable.
Linked Open Usable Data (LOUD) seeks a balance that takes into account the needs for data completeness and accuracy (ontological construction) and pragmatic concerns (ease of use, scalability)
Cultural heritage is, in its broadest sense, both a product and a process, which provides societies with a wealth of resources that are inherited from the past, created in the present and bestowed for the benefit of future generations. Most importantly, it includes not only tangible, but also natural and intangible heritage.
Keystones as Full-Fledged Actors
Design Principles, Standards
Examples of specifications following the LOUD design principles:
Image delivery on the Web has historically been hard, slow, expensive, disjointed, and locked-up in silos.
The Image and Presentation APIs are referred to as the core IIIF APIs
The Image API specifies a RESTful web service that returns an image in response to a standard HTTP(S) request.
Base URI
{scheme}://{server}{/prefix}/{identifier}
Image Request
{$BASE}/{region}/{size}/{rotation}/{quality}.{format}
https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/full/1000,/0/default.jpg
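Other transformations follow the same pattern; for example, the request below would ask for the square region of the image, scaled to fit within 500×500 pixels, rotated by 90 degrees, and rendered in grayscale, assuming the server supports these optional features:

https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/square/!500,500/90/gray.jpg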
Image Information (Metadata)
{$BASE}/info.json
https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/info.json
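A typical info.json response looks roughly like this; the dimensions and compliance level below are made up for illustration:

{
  "@context": "http://iiif.io/api/image/3/context.json",
  "id": "https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2",
  "type": "ImageService3",
  "protocol": "http://iiif.io/api/image",
  "profile": "level2",
  "width": 5472,
  "height": 3648
}

Clients read the width, height, and profile from this document to know which regions, sizes, and features they can request.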
The Presentation API is a JSON-LD-based web service which provides the necessary information about the structure and layout of an object or collection to drive a remote viewing experience.
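As a sketch, a minimal Presentation API 3.0 manifest with a single canvas might look like the following; the example.org identifiers, label, and dimensions are hypothetical:

{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://example.org/iiif/sgv-12n-08589/manifest",
  "type": "Manifest",
  "label": { "en": [ "[Schwyzer Fasnacht]" ] },
  "items": [
    {
      "id": "https://example.org/iiif/sgv-12n-08589/canvas/1",
      "type": "Canvas",
      "width": 5472,
      "height": 3648,
      "items": [
        {
          "id": "https://example.org/iiif/sgv-12n-08589/page/1",
          "type": "AnnotationPage",
          "items": [
            {
              "id": "https://example.org/iiif/sgv-12n-08589/annotation/1",
              "type": "Annotation",
              "motivation": "painting",
              "target": "https://example.org/iiif/sgv-12n-08589/canvas/1",
              "body": {
                "id": "https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/full/max/0/default.jpg",
                "type": "Image",
                "format": "image/jpeg",
                "service": [
                  {
                    "id": "https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2",
                    "type": "ImageService3",
                    "profile": "level2"
                  }
                ]
              }
            }
          ]
        }
      ]
    }
  ]
}

A viewer such as Mirador or Universal Viewer can load such a manifest and drive deep zoom through the embedded Image API service.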
Linked Art is a community collaborating to define a metadata application profile (the model) for describing cultural heritage, and the technical means for conveniently interacting with it (the API).
Linked Art is an RDF profile of the CIDOC-CRM that uses JSON-LD and the Getty Vocabularies to describe object-based cultural heritage in an event-based framework for consumption by software applications. It uses a subset of classes from the CIDOC-CRM ontology along with other commonly-used RDF ontologies to provide interoperable patterns and models that can be interpreted either as JSON or as RDF.
| Level | Linked Art |
|---|---|
| Model | CIDOC Conceptual Reference Model (CRM) |
| Ontology | RDF encoding of CRM 7.1, plus extensions |
| Vocabulary | Getty Vocabularies (mainly AAT) |
| Profile | Object-based cultural heritage (mainly art museum oriented) |
| API | JSON-LD, following REST and web patterns |
Concepts
- Types, Materials, Languages, and others, as full records rather than external references
Digital Objects
- Images, services and other digital objects
Events
- Events and other non-specific activities that are related but not part of other entities
Groups
- Groups and Organizations
People
- People
Physical Objects
- Physical things, including artworks, buildings or other architecture, books, parts of objects, and more
Places
- Geographic places
Provenance Activities
- The various events that take place during the history of a physical thing
Sets
- Sets, including Collections and sets of objects used for exhibitions
Textual Works
- Texts worthy of description as distinct entities, such as the content carried by a book or journal article
Visual Works
- Image content worthy of description as distinct entities, such as the image shown by a painting or drawing

Black and White Negative modelled as a DigitalObject
DigitalObject
member_of
→ Collection (SGV_12) - pointing to a Set
subject_of
→ Web Pages / IIIF Manifest
current_owner
→ SSFS Photographic Archives
created_by
→ Through the digitisation of a negative
produced_by
→ Production of the negative
digitally_shows
→ Visual Content
identified_by
→ Names and Identifiers
access_point
→ IIIF Image API
"@context": "https://linked.art/ns/v1/linked-art.json",
"id": "https://data.participatory-archives.ch/digital/12033",
"type": "DigitalObject",
"_label": "PIA ID 12033 - [Schwyzer Fasnacht]",
"classified_as": [
{
"id": "http://vocab.getty.edu/aat/300215302",
"type": "Type",
"_label": "Digital Image"
}
],
"member_of": [
{
"id": "https://data.participatory-archives.ch/set/12",
"type": "Set",
"_label": "SGV_12 (Ernst Brunner)",
"classified_as": [
{
"id": "http://vocab.getty.edu/aat/300025976",
"type": "Type",
"_label": "Collection"
}
]
}
],
Yale Collections Discovery
→ LUX adheres to the Linked Open Usable Data (LOUD) design principles
Community Practices
Semantic Interoperability
These images are part of the photographic archives of the Swiss Society for Folklore Studies. Licence: CC BY-NC 4.0
I am doing my PhD in Digital Humanities on Linked Open Usable Data, with a focus on its (potential) use in the Humanities and the perspectives it could bring in terms of community practices and semantic interoperability. My research is grounded as part of the Participatory Knowledge Practices in Analogue and Digital Image Archives (PIA) research project, which aims to develop a Citizen Science platform around three photographic collections of the Swiss Society for Folklore Studies (SSFS).
As for the principles and technologies I consider necessary, the ones worth mentioning are Open Science / Open Scholarship, the FAIR Data Principles, and Linked Open Data. They have different focuses, and the one most useful within my thesis is the last, as I am most interested in interoperability and in creating a semantic framework not only for humans but also for machines.
Data are definable as constraining affordances, exploitable by a system as input of adequate queries. The alethic nature, or modality of truth, is the component that is the hardest to come by and to assess. In short, semantic information can also be described erotetically as data + queries. According to Trevor Owens (2011), data are constructed artefacts, interpretable texts, and processable information, and can hold evidentiary value.
Combining insights from Floridi and Sanderson: interoperability is a state in which two or more tested, independently developed technological systems can interact successfully, according to their scope, through the implementation of agreed-upon standards.
This Web, which has been claimed to be a Semantic Web for several years now, has a centrepiece known as the Resource Description Framework (RDF), a general method for describing and exchanging graph data. The Semantic Web offers major opportunities for scholarship as it allows data to be reasoned over together, that is, to be understood by machines via RDF-based ontologies, a formal way to represent human-like knowledge.
With RDF, everything goes in threes: the data model consists of so-called triples, that is subject, predicate, and object, which form graphs. Most components of these triples use Uniform Resource Identifiers (URIs) and are generally web-addressable, whether they name subjects and objects (which may themselves be the objects of other triples) or relationships.
5-star open data scheme:
1) make your stuff available on the Web (whatever format) under an open license
2) make it available as structured data
3) make it available in a non-proprietary open format (e.g., CSV instead of Excel)
4) use URIs to denote things, so that people can point at your stuff
5) link your data to other data to provide context
Linked Open Data has been around for many years. Resource Description Framework (RDF) is the underlying technology with which assertions are produced as triples. LOD has come under some criticism in terms of its uptake, and LOD projects have often not been sustained for very long. They have mainly been concerned with the publication and consumption of data and geared towards an expert audience with knowledge of RDF. With LOUD, the audience is slightly different: the specifications are intended for developers, and the best way to give developers data is to create APIs. LOUD is also an attempt to balance the trade-offs between completeness and precision of expression on the one hand and the usability of the resulting data constructs on the other.
In three parts...
The restoration of Notre-Dame Cathedral in Paris is often told from various viewpoints, from stonecutters, conservators, archaeologists, and architects. This contribution shifts the perspective to two keystones from the nave's F29-30 double arch, destroyed in the 2019 fire. Viewing them not only as artefacts but as agents reveals their unique journey—once part of the vault, they fell, underwent diverse stages (burial, cleaning, digitization, restoration, etc.), and became active participants, echoing Actor-Network Theory principles. This approach links traditional narrative with information science, emphasizing the keystones' role as central actors in their history.
Every dataset embodies an underlying potential that research and interpretation bring to light. A noticeable divide, especially within cultural heritage, exists between the generation of data, its description, and its use, owing to the diverse array of unforeseen applications.
This example draws on a great article by Lozana Rossenova and Karen Di Franco about artists' books, and specifically Parts of a Body House by Carolee Schneemann. Artists' book collections were established in the libraries of art schools and museums in response to the rapid proliferation of such publications as art objects starting in the 1960s. Unlike other modes of artistic practice that were accessioned by curatorial departments, these items were largely gathered by libraries, which has made artists' publishing subject to the definitions of the library catalog rather than those of the art collection "proper". This situation is further complicated in the case of materials that have, for a variety of reasons, either evaded categorization completely, or been located in archives and described as archival items, or been classed as serials, journals, or ephemera. But many artists' publications deliberately challenge the categories of library, archive, and collection catalog alike.
While the case-study research revealed the interconnections among the collections and publications activated by Schneemann’s contribution, these discrete iterations are sorted into different categories within a group of collection catalogs across institutions. For archives of nonstandard art objects such as net art or artists’ publishing (e.g., Schneemann’s work), the network model of LOD offers an opportunity to map out relations of embodied iterations that defy categorization (or canonization) and thus construct new, fuller and more nuanced histories around these materials. During community discussions, the sheer range of possible relations across editions, reinterpretations, serializations, or appropriations of publications proved challenging to describe completely within the structure of the LOD model. Rossenova & Di Franco (2022)
Through a process of manual annotation and mapping of short excerpts from a variety of archaeological texts from Çatalhöyük [tʃaˈtaɫhœjyc] in Turkey to the CIDOC Conceptual Reference Model, a high-level ontology, they sought to examine and compare the representational affordances and resistances of data. Structured data (using CRM) fails to capture the more natural modes of expression found in various types of archaeological text.
- Structured Data are "Representations": the implication is that data that do not conform to a consensus of the "norm" are hard to situate within any type of structured data.
- Structured Data are Not a "Neutral Resource": we need to consider (archaeological) databases that hold structured data not as containers of knowledge but as artefacts of previous (archaeological) knowledge-making episodes; that is, digital technologies should be seen as having their own agency. As such, structured data sets embody their own set of socio-technical relationships and cannot be neutral.
- Structured Data May Not Be a "Democratizing Trend": data may be inadvertently promoting or reinforcing forms of social injustice. That the ontology itself strips away much of the context by which wider audiences might derive relevance and meaning from the data should be a cause for concern.
Even as community-driven or participatory practices grow in popularity, the fundamental redesign of workflows, methods, and data to embed communities and community values at their core is still lacking.
The overall idea of LOUD is to make data easy to use for humans, especially for developers. JSON-LD allows for some mapping of ontological constructs into JSON, which is the lingua franca of modern developers and a cornerstone technology of LOUD (see the sketch below). Five design principles to promote data consumption have been conceived. The aim is to be part of the Web, not just on the Web.
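A context document is what does this mapping work: it binds developer-friendly keys to ontology terms, so the same document can be read as plain JSON or expanded to RDF. A toy example, with an illustrative identifier and an indicative CIDOC-CRM class URI:

{
  "@context": {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "Photograph": "http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object"
  },
  "@id": "https://example.org/photo/123",
  "@type": "Photograph",
  "label": "[Schwyzer Fasnacht]"
}

A developer can simply read the "label" key, while a triple store sees rdfs:label.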
IIIF is a community-driven initiative which brings together key players in the academic and cultural heritage fields and has defined open and shared APIs to standardise the way image-based resources are delivered on the Web. Implementing the IIIF APIs enables institutions to make better use of their digitised or born-digital material by providing, for instance, deep zooming, comparison, full-text search of OCR'd objects, or annotation capabilities.
So why do we need IIIF? Digital images are fundamental carriers of information across the fields of cultural heritage, STEM, and others. They help us understand complex processes through visualization. They grab our attention and help us quickly understand abstract concepts. They help document the past, and the present, and preserve it for the future. They are also ubiquitous: we interact with thousands of them every day, both in real life and on the web. In short, images are important and we interact with large volumes of them online.
Image 1: Female Figurine, Chupicuaro, 500/300 B.C.
Image 2: Vision of Saint Gregory, unknown artist, n.d.
Image 3: Iyo Province: Saijo, Utagawa Hiroshige, 1855
Deep zoom with large images
Compare images
Reunify
Search within
Annotate
Crowdsourcing - National Library of Wales
The two core specifications are the Image API and the Presentation API. The former is a web service for manipulating an image through a URL and the latter "specifies the information needed to drive a remote viewing experience".
The purpose of the API is to display descriptive information that is intended for humans; it does not aim to provide semantic metadata for search engines.
Abstraction Standards / Implementation Standards. "A profile is a selection of appropriate abstractions to encode the scope of what can be described. An API is a selection of appropriate technologies to give access to the data managed using the profile." (Robert Sanderson)
Event-based model
The Linked Art API is made up of different endpoints, each of which has a defined structure for the format of the data that will be returned from it. These align (mostly) with the core classes of the model, and are structured according to the API design principles.
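As a sketch, a physical-object endpoint might return a record whose skeleton looks like this; the URI and label are hypothetical, and real records carry many more properties:

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://example.org/object/12033",
  "type": "HumanMadeObject",
  "_label": "[Schwyzer Fasnacht] (negative)",
  "produced_by": {
    "type": "Production",
    "_label": "Production of the negative"
  }
}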
Harvest: runs nightly, triggered by an operating-system-level scheduler, and polls each stream to find and retrieve records that have changed since the previous harvest.
Transform: the records are passed through source-specific transformation routines to either map from arbitrary data formats or validate and clean up records already provided in Linked Art.
Reconcile: further identities are discovered across the various datasets so that all information about a particular entity can eventually be collected into a single record.
Re-Identify: the original URIs of the records are mapped to internal identifiers.
Merge: records from multiple sources that have been mapped to the same identifier are merged together into a single record.
Load: the resulting dataset is annotated with some additional features for indexing and exported to MarkLogic.
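The per-source streams polled by the harvest step typically follow the ActivityStreams pattern used by IIIF and Linked Art for change discovery; a minimal page of such a stream might look like this, with hypothetical identifiers and timestamp:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://example.org/activity-streams/page/1234",
  "type": "OrderedCollectionPage",
  "orderedItems": [
    {
      "type": "Update",
      "object": {
        "id": "https://example.org/data/object/12033",
        "type": "HumanMadeObject"
      },
      "endTime": "2024-01-15T03:00:00Z"
    }
  ]
}

The harvester compares the endTime of each activity against the time of its last run and only fetches the objects that have changed.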
1) Belonging to a given community - before 2011
2) People that have been active prior to 2021 tend to be more active
An important proposition arises from the observation that adherence to the LOUD design principles makes specifications more likely to be adopted. The primary benefit of adopting LOUD standards lies in their grassroots nature. The development and maintenance of LOUD standards by dedicated communities are characterised by collaboration, consensus building, and transparency. This grassroots approach not only aligns with the core values of openness and collaboration within the Digital Humanities community but also serves as a common denominator between DH practitioners and cultural heritage institutions. This unique alignment fosters a sense of shared purpose and common ground. However, it is essential to acknowledge that while LOUD and its associated standards, including IIIF, hold immense promise, their limited recognition in the wider socio-technical ecosystem may currently hinder their full potential impact.