Recap

Author

Affiliations

Julien A. Raemy

docuteam SA

University of Bern

Published

February 20, 2026

Modified

March 11, 2026

Each week’s recap is a concise summary of the key concepts, exercises, and takeaways covered in class. Expandable More context sections provide additional detail on each topic. Use this page to review before the next session or before the final examination.

Week 1

What is Open Data?

Definitions

Openness: freely access, use, modify, and share — for any purpose
Legally open: under an open licence permitting reuse and redistribution
Technically open: machine-readable, no more than reproduction cost
Metadata: data about data — no fixed boundary between the two

A Brief History

1942 — Merton: researchers must contribute to the “common pot”
1995 — term “Open Data” first appears (geophysical/environmental data)
2005 — Open Knowledge Foundation publishes the Open Definition
2007 — Sebastopol Meeting: 8 principles of open government data
2009 — Berners-Lee at TED: “Raw Data Now”

More context

Open data requires both legal and technical openness. The concept predates the term — the idea that publicly funded knowledge must benefit the public has roots in 1940s science.

Movements and Principles

Movements

Open Access (OA): free, online access to scholarly publications — Gold, Green, Diamond, Hybrid, Bronze, Blue, Black
Open Science / Open Scholarship: entire research lifecycle made open — data, methods, peer review, software
FLOSS: Free/Libre and Open Source Software — 4 freedoms (run, study, change, distribute)

Principles

FAIR: Findable, Accessible, Interoperable, Reusable — the technical framework for data sharing
CARE: Collective Benefit, Authority to Control, Responsibility, Ethics — governance for Indigenous data
Collections as Data: GLAM collections reimagined as computational resources — openness by default, interoperability, ethical stewardship
LOUD: Linked Open Usable Data — developer-friendly, JSON-LD, community-driven

FAIR tells us how to structure data. CARE tells us whose interests must be protected.

More context

These movements and principles are complementary, not competing. Open Access focuses on publications, Open Science on the full research process, FLOSS on tools. FAIR and CARE together ensure data is both technically sound and ethically grounded.

IIIF and Linked Art

Two standards that put LOUD into practice in GLAM collections.

IIIF — International Image Interoperability Framework

IIIF defines standard APIs so that any compliant viewer can display any compliant resource, regardless of the hosting institution. The two core APIs are:

Image API: delivers pixels on demand — any region, size, rotation, and format; enables deep zoom without downloading the full file
Presentation API: assembles media resources (images, audio, video) and metadata into a manifest — a JSON-LD document describing how a digital object is structured and displayed

Understanding Tile Pyramids: How Deep Zoom Works

The Image API’s power comes from its use of tile pyramids, a multi-resolution strategy that makes seamless deep zoom possible. Rather than serving a single massive image file, the server pre-generates the image at multiple zoom levels, each subdivided into tiles. When you zoom or pan in a viewer, it requests only the specific tiles needed for your current viewport.

Explore how tile pyramids work interactively:

https://observablehq.com/@allmaps/tile-pyramid

Presentation API: Structuring Complex Objects

The Presentation API provides more than just single images. It uses JSON-LD to describe how media resources are organised and how they relate to each other.

Presentation API 3.0 Key types:

Collection: An ordered list of Manifests and/or further Collections. Allows grouping in a hierarchical structure for navigation, search results, or related resource sets.
Manifest: A description of a compound object’s structure and properties (title, descriptive metadata, licence, etc.). Usually represents a single object like a book, painting, or music album.
Canvas: A virtual display surface representing a view of the object. Content is associated with Canvases via Annotations. Provides spatial and temporal framing (borrowed from PDF, HTML, Photoshop).
Range: An ordered list of Canvases and/or further Ranges. Groups Canvases by content (chapters, scenes) or physical features (page gatherings, physical carriers). Called Sequence in IIIF Presentation API 2.1.
Content: Web resources (images, audio, video, text) associated with Canvases via Annotations or providing resource representations.

To give a simple example, a digitised book’s manifest contains multiple canvases (one per page), each of which links to an Image API service. Annotations may include page transcriptions or corrections. Ranges organise canvases into chapters. This human-readable JSON structure is accessible to developers while remaining semantically rich enough to capture complex cultural heritage objects.

A photograph from the Swiss postal archive is served through the IIIF Image API (https://iiif.ptt-archiv.ch/iiif/3/P-38-2-1851-08.jp2/info.json) in Leaflet-IIIF, a lightweight image viewer. You can zoom, pan and rotate the image directly in the browser.

The painting Irises by Vincent van Gogh (held at the Getty Museum in Los Angeles) is displayed in Mirador, a robust IIIF-compliant client, by means of a IIIF manifest (https://media.getty.edu/iiif/manifest/53be857e-41e8-4198-b45d-2e0f52d3051b).

More context

The Image API exposes a simple URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format} — any pixel crop can be requested without preprocessing. The Presentation API wraps one or more media resources into a manifest that tells a client (viewer/player) how to display or play the object Mirador and Universal Viewer are the two most widely deployed open-source IIIF viewers.

Linked Art

Linked Art is a community and shared data model for cultural heritage description, built on JSON-LD and CIDOC-CRM. It is a concrete implementation of LOUD — semantically rich enough for research, simple enough for developers. The LUX platform at Yale University is the largest production deployment, combining collections from Yale’s museums, libraries, and archives into a single queryable knowledge graph:

More context

Linked Art constrains the full complexity of CIDOC-CRM into a profile served as JSON-LD over standard HTTP APIs — any developer who can consume JSON can work with it. LUX exposes millions of records (objects, people, places, concepts, events) as Linked Art entities, all interlinked and queryable. This is the Web of Data made operational at institutional scale.

From Tables to Triples

A museum wants to describe Van Gogh’s Starry Night. Two approaches to modelling the same information:

Three siloed tables, linked by local integer IDs:

Table	Key columns
Artist	`id`, `name`, `nationality`, `birthday`
Artwork	`id`, `title`, `date`, `material`, `artist_id` ↗
Museum	`id`, `name`, `location`

Relationships via foreign keys + SQL JOIN
IDs are local — id: 1 means nothing outside this database
Schema is rigid — adding a column requires a migration
Data is siloed — not (usually) designed to link to any other external databases

Nine triples — every relationship is explicit data:

Subject	Predicate	Object
`ex:vangogh`	`rdf:type`	`foaf:Person`
`ex:vangogh`	`foaf:name`	`"Vincent van Gogh"`
`ex:vangogh`	`schema:nationality`	`"Dutch"`
`ex:vangogh`	`ex:created`	`ex:starrynight`
`ex:starrynight`	`rdf:type`	`schema:Painting`
`ex:starrynight`	`schema:name`	`"The Starry Night"`
`ex:starrynight`	`schema:location`	`ex:moma`
`ex:moma`	`rdf:type`	`schema:Museum`
`ex:moma`	`foaf:name`	`"Museum of Modern Art"`

IDs are global URIs — wd:Q5582 is Van Gogh everywhere
Schema is flexible — add triples without breaking anything
Relationships are globally linked — any entity can reference any URI

More context

This is the core Linked Data shift. In a relational DB, relationships are implicit (foreign keys, JOIN operations) and local. In a triplestore, every relationship is an explicit triple with a globally meaningful URI. The same Van Gogh (wd:Q5582) can be referenced by the Getty, Wikidata, and the Rijksmuseum simultaneously.

The Same Data as a Knowledge Graph

The nine triples from the Triplestore tab visualised as a directed graph. Orange rounded nodes are named resources (URIs); dark red rectangles are class types; dark teal rectangles are literal string values.

%%{init: {'flowchart': {'nodeSpacing': 90, 'rankSpacing': 90}, 'themeVariables': {'fontSize': '14px', 'fontFamily': '"trebuchet ms",verdana,arial,sans-serif', 'lineColor': '#555555'}}}%%
graph TD
    VG(["ex:vangogh"]) -->|"rdf:type"| PERSON["foaf:Person"]
    VG -->|"foaf:name"| VGNAME["'Vincent van Gogh'"]
    VG -->|"schema:nationality"| NAT["'Dutch'"]
    VG -->|"ex:created"| SN(["ex:starrynight"])
    SN -->|"rdf:type"| PAINT["schema:Painting"]
    SN -->|"schema:name"| SNNAME["'The Starry Night'"]
    SN -->|"schema:location"| MOMA(["ex:moma"])
    MOMA -->|"rdf:type"| MUS["schema:Museum"]
    MOMA -->|"foaf:name"| MNAME["'Museum of Modern Art'"]

    classDef object fill:#C0770E,stroke:#7A4A00,color:#ffffff,font-weight:bold
    classDef type fill:#5D1E1E,stroke:#3A0A0A,color:#ffffff
    classDef literal fill:#1E3A4A,stroke:#0A1F2A,color:#ECF0F1

    class VG,SN,MOMA object
    class PERSON,PAINT,MUS type
    class VGNAME,NAT,SNNAME,MNAME literal

More context

ex:moma could link to wd:Q188740 (MoMA on Wikidata), which in turn links to hundreds of other artists and artworks — that is the Web of Data. The graph is open-ended — any node can link out to any other URI on the web, connecting data across institutions globally.

Persistent Identifiers (PIDs)

Why PIDs matter: Without globally unique, stable identifiers, data remains siloed. PIDs ensure the same entity is recognised everywhere.

PID Characteristics

Unique: no two entities share the same PID
Persistent: remains valid even if hosting infrastructure changes
Resolvable: points to current location via standard resolver — but resolution only guarantees you can find the resource, not that you can access it; a DOI behind a paywall resolves to a landing page, not the content
Interoperable: recognised across systems and communities
Human & machine-readable: accessible both visually and programmatically

Common PID Types/Schemes

PID	Identifies	Example	Resolver
Digital Object Identifier (DOI)	Publications, datasets, objects	`10.1038/nature12373`	https://doi.org/10.1038/nature12373
Open Researcher and Contributor ID (ORCID)	Researchers, scholars	`0000-0002-5444-2280`	https://orcid.org/0000-0002-5444-2280
Research Organization Registry (ROR)	Research organisations	`01xkakk17`	https://ror.org/01xkakk17
Archival Resource Key (ARK)	Any digital or physical object	`ark:12148/btv1b108473193/`	https://n2t.net/ark:12148/btv1b108473193/
Handle	General-purpose resources	`20.500.14716/127190`	https://hdl.handle.net/20.500.14716/127190

More context

DOI is the industry standard for academic publishing, ORCID for researchers, ROR for institutions, and ARK for cultural heritage — though ARK is a general-purpose scheme usable for any digital or physical object. Each PID type has a dedicated resolver: doi.org, orcid.org, ror.org, hdl.handle.net, and n2t.net (ARK). A critical limitation: PIDs are often conflated with open access, but they are orthogonal — a DOI resolves regardless of whether the resource is open or behind a paywall. Persistence and resolvability are properties of the identifier, not of the access conditions.

For datasets specifically, DOI is the dominant scheme, but Handle (on which DOI is technically built), ARK, and various discipline-specific or institutional PIDs are also used. DOI is itself built on top of the Handle system — it is essentially a Handle with doi.org as its resolver and a governance layer managed by the International DOI Foundation.

Cool URIs: Linked Data introduces the concept of “cool URIs” (coined by Tim Berners-Lee) — URIs that do not change over time. A cool URI can be a PID, but it does not have to be: any stable, well-managed URI qualifies. Importantly, cool URIs are not necessarily URLs intended for human browsing; they function as globally unique identifiers for concepts and entities (e.g., wd:Q5582 for Van Gogh), and dereferencing them may return machine-readable RDF rather than a human-readable page.

Key Takeaways

Open Data = legally open + technically open (machine-readable)
Open Access (OA): free online access to scholarly publications — complements Open Data by opening the outputs that describe and contextualise datasets
Open Research Data (ORD): research datasets open by default — increasingly funder-mandated, central to university library services
Open Government Data (OGD): public-sector data published proactively — but retention periods, sensitivity, and archival obligations still apply and must be respected

FAIR: structure data for machine use — Findable, Accessible, Interoperable, Reusable
CARE: governance for data about people — Collective Benefit, Authority, Responsibility, Ethics
Collections as Data: GLAM holdings as computational datasets — bulk access, not just item discovery
LOUD: Linked Data that works in practice — JSON-LD, developer-friendly; instantiated by IIIF and Linked Art

Linked Data: global URIs as identifiers; triples (subject → predicate → object) as the data model (RDF as the syntax); SPARQL as the query language
PIDs: unique, persistent, resolvable — the same entity recognised across every institution

More context

Open Science Landscape: OA covers scholarly publications; ORD extends the logic to any dataset produced by research — the two are related but distinct, and funders increasingly mandate both. OA and Open Data are complementary: open publications provide the interpretive layer for open datasets, and together they make research fully reproducible. OGD applies open data principles to the public sector, but retention periods, data sensitivity, and archival obligations remain in force — proactive publication must be balanced against these constraints, which affect archives in particular.

Principles & Frameworks: FAIR and CARE are complementary — FAIR governs technical structure, CARE protects the interests of communities whose data is used. Collections as Data and LOUD are responses specific to GLAM and Linked Data contexts: the former reframes cultural heritage collections as first-class computational resources; the latter ensures that the Linked Data stack is actually adopted by developers, with IIIF and Linked Art as leading examples.

Technical Stack: PIDs and Linked Data each play a distinct role in the Web of Data — PIDs give entities stable, globally unique addresses; Linked Data makes the relationships between them explicit and traversable. Neither alone is sufficient, but both contribute to an open web where the same entity (wd:Q5582, Van Gogh) can be referenced and navigated across different platforms.

Open data is not just about making files available — it is about creating an interconnected web of usable, ethically governed knowledge.

Week 2

ORD Platforms

re3data is a global registry of research data repositories — a meta-repository: a dataset about datasets
Repositories vary in PID support, software, access conditions, certification, and APIs
re3data assigns DOIs to its own entries and exposes a REST API — applying to itself the same openness standards it documents in others

Switzerland’s ORD landscape

Swiss researchers are increasingly required by funders (SNSF, swissuniversities, Horizon Europe, etc.) to make their (meta)data as open as possible. This typically means depositing in a recognised repository:

Institutional: if the university hosts one (e.g. Yareta at the University of Geneva)
Cross-institutional / disciplinary: e.g. SwissUBase (social sciences), DaSCH (humanities), Zenodo (general-purpose)

Not all data is openly accessible by default — some repositories support restricted access (e.g. for sensitive data) or embargoes: a period during which data is retained but not yet publicly released (Yareta, for instance, supports this).

Warning

re3data entries depend on repository owners keeping their information up to date. If they do not, entries become stale — broken URLs, outdated policies, missing features. The registry is only as reliable as the community that actively curates it.

OGD Platforms

opendata.swiss is Switzerland’s central OGD catalogue — it does not store datasets itself; it points to where they are hosted. Publishers deposit metadata and links to the actual files, services, or APIs either manually or via semi-automatic harvesting from cantonal and communal portals
The metadata model behind opendata.swiss is DCAT-AP CH (the Swiss application profile of DCAT, the Data Catalogue Vocabulary), enabling interoperability with other OGD portals
Data can be consumed as bulk file downloads or via APIs (e.g. transport.opendata.ch) — APIs are far more practical for targeted queries on large datasets
LINDAS publishes Swiss federal government data as Linked Data — queryable via SPARQL; tools like visualize.admin.ch use it under the hood

Licensing in OGD: not all public bodies use Creative Commons licences. Some institutions define their own terms of use — though these often mirror CC licences in spirit (attribution, non-commercial restrictions, etc.). Always check the specific licence attached to a dataset before reusing it.

Key Takeaways

ORD and OGD platforms serve different communities (researchers vs. the public) but share the same core principles: open access, machine-readable formats, identifiers
For researchers in Switzerland, choosing a repository depends on funder mandates, institutional offerings, and disciplinary norms — data does not have to be immediately open, but it should be findable and accessible under clear conditions
Registries and portals are only useful if their metadata is actively maintained — community curation is not optional, it is part of the infrastructure

Week 3

Discussion: Ethics & Access

Building on Dişli & Candela (2025), we examined the tension between “guardian” mentalities and open access for public domain works. Key debates included the practical barriers to machine-readable licensing, navigating cross-border legal frameworks (Fair Use vs. DSM), and defining ethical boundaries between scholarly research and commercial AI training. We also explored how CARE principles may necessitate restricted access for Indigenous data, challenging the assumption that “open” is always the default solution.

Assessment & Quality Frameworks

We analyzed the Open Data Maturity (ODM) 2025 findings, noting Switzerland’s stable but stagnant position in the “Followers” group, which underscores the shift from measuring data volume to evaluating real-world impact and API interoperability. The session (very briefly) introduced holistic quality tools like the Periodic Table of Open Data Elements, emphasising that successful initiatives depend as much on governance, culture, and demand definition as on technical standards.

Tool: OpenRefine

OpenRefine was presented as the standard tool for data wrangling, enabling users to clean messy datasets, standardise inconsistent formats, and reconcile local identifiers with global authority files (e.g., Wikidata). Its power lies in transforming raw, unstructured exports into reliable, machine-actionable data ready for analysis or publication.

Key Takeaways

Ethics over Ideology: Openness is not absolute; CARE principles and commercial restrictions demonstrate that governance and context must dictate access levels, not just technical capability.
Technical Enforcement:** Cultural heritage institutions must move beyond passive terms of use to actively enforce access policies technically—using authentication tiers, API rate limiting, robots.txt, and secure data spaces to distinguish between scholarly research and commercial AI scraping.
Impact > Volume: High maturity (ODM) is no longer about publishing more files, but about ensuring data is interoperable (APIs), legally clear (machine-readable licenses), and measurably reused. For example, data.gouv.fr has introduced Model Context Protocol (MCP) access, a critical standard allowing LLMs to securely connect to live, contextual public data rather than relying on static datasets.
Cleaning is Research: Data wrangling with tools like OpenRefine is a critical scholarly activity that transforms static collections into dynamic, queryable datasets.

References

Dişli, M., & Candela, G. (2025). Copyright and Licencing for Cultural Heritage Collections As Data. Journal of Open Humanities Data, 11, 11. https://doi.org/10.5334/johd.263

Reuse

CC BY 4.0