Semantic Interoperability can be defined as the seamless exchange of well-formed, meaningful, and truthful data between distinct systems.
The [World Wide Web] project merges the techniques of information retrieval and hypertext to make an easy but powerful global information system. The project started with the philosophy that much academic information should be freely available to anyone.
The Semantic Web is an extension of the World Wide Web, through standards, to make it machine-readable.
With RDF, everything goes in threes. Most of the triples' components have Uniform Resource Identifiers (URIs).
Syntax: subject, predicate, object
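For instance, the statement "this photograph was created by Ernst Brunner" can be written as a single triple in JSON-LD; the URIs below are illustrative, not real identifiers:

{
  "@id": "https://example.org/photo/123",
  "http://purl.org/dc/terms/creator": {
    "@id": "https://example.org/person/ernst-brunner"
  }
}

The subject is the photograph's URI, the predicate is the Dublin Core creator property, and the object is the URI of a person, which can itself be the subject of further triples.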
LOD is too focused on publishing data. Our data needs to be reused to be meaningful and valuable.
Linked Open Usable Data (LOUD) seeks a balance that takes into account the needs for data completeness and accuracy (ontological construction) and pragmatic concerns (ease of use, scalability)
Cultural heritage is, in its broadest sense, both a product and a process, which provides societies with a wealth of resources that are inherited from the past, created in the present and bestowed for the benefit of future generations. Most importantly, it includes not only tangible, but also natural and intangible heritage.
Keystones as Full-Fledged Actors
Design Principles, Standards
Examples of specifications following the LOUD design principles:
Image delivery on the Web has historically been hard, slow, expensive, disjointed, and locked-up in silos.
The Image and Presentation APIs are referred to as the core IIIF APIs
The Image API specifies a RESTful web service that returns an image in response to a standard HTTP(S) request.
Base URI
{scheme}://{server}{/prefix}/{identifier}
Image Request
{$BASE}/{region}/{size}/{rotation}/{quality}.{format}
https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/full/1000,/0/default.jpg
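Other transformations follow the same pattern; for example, the request below would ask for the square region of the image, scaled to fit within 500×500 pixels, rotated by 90 degrees, and rendered in grayscale, assuming the server supports these optional features:

https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/square/!500,500/90/gray.jpg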
Image Information (Metadata)
{$BASE}/info.json
https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/info.json
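A typical info.json response looks roughly like this; the dimensions and compliance level below are made up for illustration:

{
  "@context": "http://iiif.io/api/image/3/context.json",
  "id": "https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2",
  "type": "ImageService3",
  "protocol": "http://iiif.io/api/image",
  "profile": "level2",
  "width": 5472,
  "height": 3648
}

Clients read the width, height, and profile from this document to know which regions, sizes, and features they can request.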
The Presentation API is a JSON-LD-based web service which provides the necessary information about the structure and layout of an object or collection to drive a remote viewing experience.
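As a sketch, a minimal Presentation API 3.0 manifest with a single canvas might look like the following; the example.org identifiers, label, and dimensions are hypothetical:

{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://example.org/iiif/sgv-12n-08589/manifest",
  "type": "Manifest",
  "label": { "en": [ "[Schwyzer Fasnacht]" ] },
  "items": [
    {
      "id": "https://example.org/iiif/sgv-12n-08589/canvas/1",
      "type": "Canvas",
      "width": 5472,
      "height": 3648,
      "items": [
        {
          "id": "https://example.org/iiif/sgv-12n-08589/page/1",
          "type": "AnnotationPage",
          "items": [
            {
              "id": "https://example.org/iiif/sgv-12n-08589/annotation/1",
              "type": "Annotation",
              "motivation": "painting",
              "target": "https://example.org/iiif/sgv-12n-08589/canvas/1",
              "body": {
                "id": "https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2/full/max/0/default.jpg",
                "type": "Image",
                "format": "image/jpeg",
                "service": [
                  {
                    "id": "https://sipi.participatory-archives.ch/SGV_12/SGV_12N_08589.jp2",
                    "type": "ImageService3",
                    "profile": "level2"
                  }
                ]
              }
            }
          ]
        }
      ]
    }
  ]
}

A viewer such as Mirador or Universal Viewer can load such a manifest and drive deep zoom through the embedded Image API service.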
Linked Art is a community collaborating to define a metadata application profile (the model) for describing cultural heritage, and the technical means for conveniently interacting with it (the API).
Linked Art is an RDF profile of the CIDOC-CRM that uses JSON-LD and the Getty Vocabularies to describe object-based cultural heritage in an event-based framework for consumption by software applications. It uses a subset of classes from the CIDOC-CRM ontology along with other commonly-used RDF ontologies to provide interoperable patterns and models that can be interpreted either as JSON or as RDF.
| Level | Linked Art |
|---|---|
| Model | CIDOC Conceptual Reference Model (CRM) |
| Ontology | RDF encoding of CRM 7.1, plus extensions |
| Vocabulary | Getty Vocabularies (mainly AAT) |
| Profile | Object-based cultural heritage (mainly art museum oriented) |
| API | JSON-LD, following REST and web patterns |
Concepts
- Types, Materials, Languages, and others, as full records rather than external references
Digital Objects
- Images, services and other digital objects
Events
- Events and other non-specific activities that are related but not part of other entities
Groups
- Groups and Organizations
People
- People
Physical Objects
- Physical things, including artworks, buildings or other architecture, books, parts of objects, and more
Places
- Geographic places
Provenance Activities
- The various events that take place during the history of a physical thing
Sets
- Sets, including Collections and sets of objects used for exhibitions
Textual Works
- Texts worthy of description as distinct entities, such as the content carried by a book or journal article
Visual Works
- Image content worthy of description as distinct entities, such as the image shown by a painting or drawing

Black and White Negative modelled as a DigitalObject
DigitalObject
member_of
→ Collection (SGV_12) - pointing to a Set
subject_of
→ Web Pages / IIIF Manifest
current_owner
→ SSFS Photographic Archives
created_by
→ Through the digitisation of a negative
produced_by
→ Production of the negative
digitally_shows
→ Visual Content
identified_by
→ Names and Identifiers
access_point
→ IIIF Image API
"@context": "https://linked.art/ns/v1/linked-art.json",
"id": "https://data.participatory-archives.ch/digital/12033",
"type": "DigitalObject",
"_label": "PIA ID 12033 - [Schwyzer Fasnacht]",
"classified_as": [
{
"id": "http://vocab.getty.edu/aat/300215302",
"type": "Type",
"_label": "Digital Image"
}
],
"member_of": [
{
"id": "https://data.participatory-archives.ch/set/12",
"type": "Set",
"_label": "SGV_12 (Ernst Brunner)",
"classified_as": [
{
"id": "http://vocab.getty.edu/aat/300025976",
"type": "Type",
"_label": "Collection"
}
]
}
],
Yale Collections Discovery
→ LUX adheres to the Linked Open Usable Data (LOUD) design principles
Community Practices
Semantic Interoperability
These images are part of the photographic archives of the Swiss Society for Folklore Studies. Licence: CC BY-NC 4.0
I am doing my PhD in Digital Humanities on Linked Open Usable Data, with a focus on its (potential) use in the Humanities and the perspectives it could bring in terms of community practices and semantic interoperability. My research is grounded as part of the Participatory Knowledge Practices in Analogue and Digital Image Archives (PIA) research project, which aims to develop a Citizen Science platform around three photographic collections of the Swiss Society for Folklore Studies (SSFS).
As for the principles and technologies I consider necessary, the ones worth mentioning are Open Science / Open Scholarship, the FAIR Data Principles, and Linked Open Data. They have different focuses, and the one most useful within my thesis is the last, as I am most interested in interoperability and in creating a semantic framework not only for humans but also for machines.
Data are definable as constraining affordances, exploitable by a system as input of adequate queries. The alethic nature, or modality of truth, is the component that is the hardest to come by and to assess. In short, semantic information can also be described erotetically as data + queries. According to Trevor Owens (2011), data are constructed artefacts, interpretable texts, and processable information, and can hold evidentiary value.
Combining insights from Floridi and Sanderson: interoperability is a state in which two or more tested, independently developed technological systems can interact successfully, according to their scope, through the implementation of agreed-upon standards.
This Web, which has been claimed to be a Semantic Web for several years now, has a centrepiece known as the Resource Description Framework (RDF), a general method for describing and exchanging graph data. The Semantic Web offers major opportunities for scholarship as it allows data to be reasoned over together, that is, to be understood by machines via RDF-based ontologies, a formal way to represent human-like knowledge.
With RDF, everything goes in threes: the data model consists of so-called triples, that is subject, predicate, and object, which form graphs. Most components of these triples use Uniform Resource Identifiers (URIs) and are generally web-addressable, whether they name subjects and objects (which may themselves be the objects of other triples) or relationships.
5-star open data scheme:
1) make your stuff available on the Web (whatever format) under an open license
2) make it available as structured data
3) make it available in a non-proprietary open format (e.g., CSV instead of Excel)
4) use URIs to denote things, so that people can point at your stuff
5) link your data to other data to provide context
Linked Open Data has been around for many years. Resource Description Framework (RDF) is the underlying technology with which assertions are produced as triples. LOD has come under some criticism in terms of its uptake, and LOD projects have often not been sustained for very long. They have mainly been concerned with the publication and consumption of data and geared towards an expert audience with knowledge of RDF. With LOUD, the audience is slightly different: the specifications are intended for developers, and the best way to give developers data is to create APIs. LOUD is also an attempt to balance the trade-offs between completeness and precision of expression on the one hand and the usability of the resulting data constructs on the other.
In three parts...
The restoration of Notre-Dame Cathedral in Paris is often told from various viewpoints, from stonecutters, conservators, archaeologists, and architects. This contribution shifts the perspective to two keystones from the nave's F29-30 double arch, destroyed in the 2019 fire. Viewing them not only as artefacts but as agents reveals their unique journey—once part of the vault, they fell, underwent diverse stages (burial, cleaning, digitization, restoration, etc.), and became active participants, echoing Actor-Network Theory principles. This approach links traditional narrative with information science, emphasizing the keystones' role as central actors in their history.
Every dataset embodies an underlying potential that research and interpretation bring to light. A noticeable divide, especially within cultural heritage, exists between the generation of data, its description, and its use, owing to the diverse array of unforeseen applications.
This example draws on a great article by Lozana Rossenova and Karen Di Franco about artists' books, and specifically Parts of a Body House by Carolee Schneemann. Artists' book collections were established in the libraries of art schools and museums in response to the rapid proliferation of such publications as art objects starting in the 1960s. Unlike other modes of artistic practice that were accessioned by curatorial departments, these items were largely gathered by libraries, which has made artists' publishing subject to the definitions of the library catalog rather than those of the art collection "proper". This situation is further complicated in the case of materials that have, for a variety of reasons, either evaded categorization completely, or been located in archives and described as archival items, or been classed as serials, journals, or ephemera. But many artists' publications deliberately challenge the categories of library, archive, and collection catalog alike.
While the case-study research revealed the interconnections among the collections and publications activated by Schneemann’s contribution, these discrete iterations are sorted into different categories within a group of collection catalogs across institutions. For archives of nonstandard art objects such as net art or artists’ publishing (e.g., Schneemann’s work), the network model of LOD offers an opportunity to map out relations of embodied iterations that defy categorization (or canonization) and thus construct new, fuller and more nuanced histories around these materials. During community discussions, the sheer range of possible relations across editions, reinterpretations, serializations, or appropriations of publications proved challenging to describe completely within the structure of the LOD model. Rossenova & Di Franco (2022)
Through a process of manual annotation and mapping of short excerpts from a variety of archaeological texts from Çatalhöyük [tʃaˈtaɫhœjyc] in Turkey to the CIDOC Conceptual Reference Model, a high-level ontology, they sought to examine and compare the representational affordances and resistances of data. Structured data (using CRM) fails to capture the more natural modes of expression found in various types of archaeological text.
- Structured Data are "Representations": the implication is that data that do not conform to a consensus of the "norm" are hard to situate within any type of structured data.
- Structured Data are Not a "Neutral Resource": we need to consider (archaeological) databases that hold structured data not as containers of knowledge but as artefacts of previous (archaeological) knowledge-making episodes; that is, digital technologies should be seen as having their own agency. As such, structured data sets embody their own set of socio-technical relationships and cannot be neutral.
- Structured Data May Not Be a "Democratizing Trend": data may be inadvertently promoting or reinforcing forms of social injustice. That the ontology itself strips away much of the context by which wider audiences might derive relevance and meaning from the data should be a cause for concern.
Even as community-driven or participatory practices grow in popularity, the fundamental redesign of workflows, methods, and data to embed communities and community values at their core is still lacking.
The overall idea of LOUD is to make data easy to use for humans, especially for developers. JSON-LD allows for some mapping of ontological constructs into JSON, which is the lingua franca of modern developers and a cornerstone technology of LOUD (see the sketch below). Five design principles to promote data consumption have been conceived. The aim is to be part of the Web, not just on the Web.
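A context document is what does this mapping work: it binds developer-friendly keys to ontology terms, so the same document can be read as plain JSON or expanded to RDF. A toy example, with an illustrative identifier and an indicative CIDOC-CRM class URI:

{
  "@context": {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "Photograph": "http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object"
  },
  "@id": "https://example.org/photo/123",
  "@type": "Photograph",
  "label": "[Schwyzer Fasnacht]"
}

A developer can simply read the "label" key, while a triple store sees rdfs:label.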
IIIF is a community-driven initiative which brings together key players in the academic and cultural heritage fields and has defined open and shared APIs to standardise the way image-based resources are delivered on the Web. Implementing the IIIF APIs enables institutions to make better use of their digitised or born-digital material by providing, for instance, deep zooming, comparison, full-text search of OCR'd objects, or annotation capabilities.
So why do we need IIIF? Digital images are fundamental carriers of information across the fields of cultural heritage, STEM, and others. They help us understand complex processes through visualization. They grab our attention and help us quickly understand abstract concepts. They help document the past, and the present, and preserve it for the future. They are also ubiquitous: we interact with thousands of them every day, both in real life and on the web. In short, images are important and we interact with large volumes of them online.
Image 1: Female Figurine, Chupicuaro, 500/300 B.C.
Image 2: Vision of Saint Gregory, unknown artist, n.d.
Image 3: Iyo Province: Saijo, Utagawa Hiroshige, 1855
Deep zoom with large images
Compare images
Reunify
Search within
Annotate
Crowdsourcing - National Library of Wales
The two core specifications are the Image API and the Presentation API. The former is a web service for manipulating an image through a URL and the latter "specifies the information needed to drive a remote viewing experience".
The purpose of the API is to display descriptive information that is intended for humans; it does not aim to provide semantic metadata for search engines.
Abstraction Standards / Implementation Standards. "A profile is a selection of appropriate abstractions to encode the scope of what can be described. An API is a selection of appropriate technologies to give access to the data managed using the profile." (Robert Sanderson)
Event-based model
The Linked Art API is made up of different endpoints, each of which has a defined structure for the format of the data that will be returned from it. These align (mostly) with the core classes of the model, and are structured according to the API design principles.
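As a sketch, a physical-object endpoint might return a record whose skeleton looks like this; the URI and label are hypothetical, and real records carry many more properties:

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://example.org/object/12033",
  "type": "HumanMadeObject",
  "_label": "[Schwyzer Fasnacht] (negative)",
  "produced_by": {
    "type": "Production",
    "_label": "Production of the negative"
  }
}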
Harvest: runs nightly, triggered by an operating-system-level scheduler, and polls each stream to find and retrieve records that have changed since the previous harvest.
Transform: the records are passed through source-specific transformation routines to either map from arbitrary data formats or validate and clean up records already provided in Linked Art.
Reconcile: further identities are discovered across the various datasets so that all information about a particular entity can eventually be collected into a single record.
Re-Identify: the original URIs of the records are mapped to internal identifiers.
Merge: records from multiple sources that have been mapped to the same identifier are merged together into a single record.
Load: the resulting dataset is annotated with some additional features for indexing and exported to MarkLogic.
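The per-source streams polled by the harvest step typically follow the ActivityStreams pattern used by IIIF and Linked Art for change discovery; a minimal page of such a stream might look like this, with hypothetical identifiers and timestamp:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://example.org/activity-streams/page/1234",
  "type": "OrderedCollectionPage",
  "orderedItems": [
    {
      "type": "Update",
      "object": {
        "id": "https://example.org/data/object/12033",
        "type": "HumanMadeObject"
      },
      "endTime": "2024-01-15T03:00:00Z"
    }
  ]
}

The harvester compares the endTime of each activity against the time of its last run and only fetches the objects that have changed.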
1) Belonging to a given community - before 2011
2) People that have been active prior to 2021 tend to be more active
An important proposition arises from the observation that adherence to the LOUD design principles makes specifications more likely to be adopted. The primary benefit of adopting LOUD standards lies in their grassroots nature. The development and maintenance of LOUD standards by dedicated communities are characterised by collaboration, consensus building, and transparency. This grassroots approach not only aligns with the core values of openness and collaboration within the Digital Humanities community but also serves as a common denominator between DH practitioners and cultural heritage institutions. This unique alignment fosters a sense of shared purpose and common ground. However, it is essential to acknowledge that while LOUD and its associated standards, including IIIF, hold immense promise, their limited recognition in the wider socio-technical ecosystem may currently hinder their full potential impact.