%%{init: {'flowchart': {'nodeSpacing': 90, 'rankSpacing': 90}, 'themeVariables': {'fontSize': '14px', 'fontFamily': '"trebuchet ms",verdana,arial,sans-serif', 'lineColor': '#555555'}}}%%
graph TD
VG(["ex:vangogh"]) -->|"rdf:type"| PERSON["foaf:Person"]
VG -->|"foaf:name"| VGNAME["'Vincent van Gogh'"]
VG -->|"schema:nationality"| NAT["'Dutch'"]
VG -->|"ex:created"| SN(["ex:starrynight"])
SN -->|"rdf:type"| PAINT["schema:Painting"]
SN -->|"schema:name"| SNNAME["'The Starry Night'"]
SN -->|"schema:location"| MOMA(["ex:moma"])
MOMA -->|"rdf:type"| MUS["schema:Museum"]
MOMA -->|"foaf:name"| MNAME["'Museum of Modern Art'"]
classDef object fill:#C0770E,stroke:#7A4A00,color:#ffffff,font-weight:bold
classDef type fill:#5D1E1E,stroke:#3A0A0A,color:#ffffff
classDef literal fill:#1E3A4A,stroke:#0A1F2A,color:#ECF0F1
class VG,SN,MOMA object
class PERSON,PAINT,MUS type
class VGNAME,NAT,SNNAME,MNAME literal
Recap
Each week’s recap is a concise summary of the key concepts, exercises, and takeaways covered in class. Expandable More context sections provide additional detail on each topic. Use this page to review before the next session or before the final examination.
Week 1
What is Open Data?
Definitions
- Openness: freely access, use, modify, and share — for any purpose
- Legally open: under an open licence permitting reuse and redistribution
- Technically open: machine-readable, no more than reproduction cost
- Metadata: data about data — no fixed boundary between the two
A Brief History
- 1942 — Merton: researchers must contribute to the “common pot”
- 1995 — term “Open Data” first appears (geophysical/environmental data)
- 2005 — Open Knowledge Foundation publishes the Open Definition
- 2007 — Sebastopol Meeting: 8 principles of open government data
- 2009 — Berners-Lee at TED: “Raw Data Now”
Open data requires both legal and technical openness. The concept predates the term — the idea that publicly funded knowledge must benefit the public has roots in 1940s science.
Movements and Principles
Movements
- Open Access (OA): free, online access to scholarly publications — Gold, Green, Diamond, Hybrid, Bronze, Blue, Black
- Open Science / Open Scholarship: entire research lifecycle made open — data, methods, peer review, software
- FLOSS: Free/Libre and Open Source Software — 4 freedoms (run, study, change, distribute)
Principles
- FAIR: Findable, Accessible, Interoperable, Reusable — the technical framework for data sharing
- CARE: Collective Benefit, Authority to Control, Responsibility, Ethics — governance for Indigenous data
- Collections as Data: GLAM collections reimagined as computational resources — openness by default, interoperability, ethical stewardship
- LOUD: Linked Open Usable Data — developer-friendly, JSON-LD, community-driven
FAIR tells us how to structure data. CARE tells us whose interests must be protected.
These movements and principles are complementary, not competing. Open Access focuses on publications, Open Science on the full research process, FLOSS on tools. FAIR and CARE together ensure data is both technically sound and ethically grounded.
IIIF and Linked Art
Two standards that put LOUD into practice in GLAM collections.
IIIF — International Image Interoperability Framework
IIIF defines standard APIs so that any compliant viewer can display any compliant resource, regardless of the hosting institution. The two core APIs are:
- Image API: delivers pixels on demand — any region, size, rotation, and format; enables deep zoom without downloading the full file
- Presentation API: assembles media resources (images, audio, video) and metadata into a manifest — a JSON-LD document describing how a digital object is structured and displayed
Understanding Tile Pyramids: How Deep Zoom Works
The Image API’s power comes from its use of tile pyramids, a multi-resolution strategy that makes seamless deep zoom possible. Rather than serving a single massive image file, the server pre-generates the image at multiple zoom levels, each subdivided into tiles. When you zoom or pan in a viewer, it requests only the specific tiles needed for your current viewport.
Explore how tile pyramids work interactively:
Presentation API: Structuring Complex Objects
The Presentation API provides more than just single images. It uses JSON-LD to describe how media resources are organised and how they relate to each other.
Presentation API 3.0 Key types:
- Collection: An ordered list of Manifests and/or further Collections. Allows grouping in a hierarchical structure for navigation, search results, or related resource sets.
- Manifest: A description of a compound object’s structure and properties (title, descriptive metadata, licence, etc.). Usually represents a single object like a book, painting, or music album.
- Canvas: A virtual display surface representing a view of the object. Content is associated with Canvases via Annotations. Provides spatial and temporal framing (borrowed from PDF, HTML, Photoshop).
- Range: An ordered list of Canvases and/or further Ranges. Groups Canvases by content (chapters, scenes) or physical features (page gatherings, physical carriers). Called Sequence in IIIF Presentation API 2.1.
- Content: Web resources (images, audio, video, text) associated with Canvases via Annotations or providing resource representations.

To give a simple example, a digitised book’s manifest contains multiple canvases (one per page), each of which links to an Image API service. Annotations may include page transcriptions or corrections. Ranges organise canvases into chapters. This human-readable JSON structure is accessible to developers while remaining semantically rich enough to capture complex cultural heritage objects.
A photograph from the Swiss postal archive is served through the IIIF Image API (https://iiif.ptt-archiv.ch/iiif/3/P-38-2-1851-08.jp2/info.json) in Leaflet-IIIF, a lightweight image viewer. You can zoom, pan and rotate the image directly in the browser.
The painting Irises by Vincent van Gogh (held at the Getty Museum in Los Angeles) is displayed in Mirador, a robust IIIF-compliant client, by means of a IIIF manifest (https://media.getty.edu/iiif/manifest/53be857e-41e8-4198-b45d-2e0f52d3051b).
The Image API exposes a simple URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format} — any pixel crop can be requested without preprocessing. The Presentation API wraps one or more media resources into a manifest that tells a client (viewer/player) how to display or play the object Mirador and Universal Viewer are the two most widely deployed open-source IIIF viewers.
Linked Art
Linked Art is a community and shared data model for cultural heritage description, built on JSON-LD and CIDOC-CRM. It is a concrete implementation of LOUD — semantically rich enough for research, simple enough for developers. The LUX platform at Yale University is the largest production deployment, combining collections from Yale’s museums, libraries, and archives into a single queryable knowledge graph:
Linked Art constrains the full complexity of CIDOC-CRM into a profile served as JSON-LD over standard HTTP APIs — any developer who can consume JSON can work with it. LUX exposes millions of records (objects, people, places, concepts, events) as Linked Art entities, all interlinked and queryable. This is the Web of Data made operational at institutional scale.
From Tables to Triples
A museum wants to describe Van Gogh’s Starry Night. Two approaches to modelling the same information:
Three siloed tables, linked by local integer IDs:
| Table | Key columns |
|---|---|
| Artist | id, name, nationality, birthday |
| Artwork | id, title, date, material, artist_id ↗ |
| Museum | id, name, location |
- Relationships via foreign keys + SQL
JOIN - IDs are local —
id: 1means nothing outside this database - Schema is rigid — adding a column requires a migration
- Data is siloed — not (usually) designed to link to any other external databases
Nine triples — every relationship is explicit data:
| Subject | Predicate | Object |
|---|---|---|
ex:vangogh |
rdf:type |
foaf:Person |
ex:vangogh |
foaf:name |
"Vincent van Gogh" |
ex:vangogh |
schema:nationality |
"Dutch" |
ex:vangogh |
ex:created |
ex:starrynight |
ex:starrynight |
rdf:type |
schema:Painting |
ex:starrynight |
schema:name |
"The Starry Night" |
ex:starrynight |
schema:location |
ex:moma |
ex:moma |
rdf:type |
schema:Museum |
ex:moma |
foaf:name |
"Museum of Modern Art" |
- IDs are global URIs —
wd:Q5582is Van Gogh everywhere - Schema is flexible — add triples without breaking anything
- Relationships are globally linked — any entity can reference any URI
This is the core Linked Data shift. In a relational DB, relationships are implicit (foreign keys, JOIN operations) and local. In a triplestore, every relationship is an explicit triple with a globally meaningful URI. The same Van Gogh (wd:Q5582) can be referenced by the Getty, Wikidata, and the Rijksmuseum simultaneously.
The Same Data as a Knowledge Graph
The nine triples from the Triplestore tab visualised as a directed graph. Orange rounded nodes are named resources (URIs); dark red rectangles are class types; dark teal rectangles are literal string values.
ex:moma could link to wd:Q188740 (MoMA on Wikidata), which in turn links to hundreds of other artists and artworks — that is the Web of Data. The graph is open-ended — any node can link out to any other URI on the web, connecting data across institutions globally.
Persistent Identifiers (PIDs)
Why PIDs matter: Without globally unique, stable identifiers, data remains siloed. PIDs ensure the same entity is recognised everywhere.
PID Characteristics
- Unique: no two entities share the same PID
- Persistent: remains valid even if hosting infrastructure changes
- Resolvable: points to current location via standard resolver — but resolution only guarantees you can find the resource, not that you can access it; a DOI behind a paywall resolves to a landing page, not the content
- Interoperable: recognised across systems and communities
- Human & machine-readable: accessible both visually and programmatically
Common PID Types/Schemes
| PID | Identifies | Example | Resolver |
|---|---|---|---|
| Digital Object Identifier (DOI) | Publications, datasets, objects | 10.1038/nature12373 |
https://doi.org/10.1038/nature12373 |
| Open Researcher and Contributor ID (ORCID) | Researchers, scholars | 0000-0002-5444-2280 |
https://orcid.org/0000-0002-5444-2280 |
| Research Organization Registry (ROR) | Research organisations | 01xkakk17 |
https://ror.org/01xkakk17 |
| Archival Resource Key (ARK) | Any digital or physical object | ark:12148/btv1b108473193/ |
https://n2t.net/ark:12148/btv1b108473193/ |
| Handle | General-purpose resources | 20.500.14716/127190 |
https://hdl.handle.net/20.500.14716/127190 |
DOI is the industry standard for academic publishing, ORCID for researchers, ROR for institutions, and ARK for cultural heritage — though ARK is a general-purpose scheme usable for any digital or physical object. Each PID type has a dedicated resolver: doi.org, orcid.org, ror.org, hdl.handle.net, and n2t.net (ARK). A critical limitation: PIDs are often conflated with open access, but they are orthogonal — a DOI resolves regardless of whether the resource is open or behind a paywall. Persistence and resolvability are properties of the identifier, not of the access conditions.
For datasets specifically, DOI is the dominant scheme, but Handle (on which DOI is technically built), ARK, and various discipline-specific or institutional PIDs are also used. DOI is itself built on top of the Handle system — it is essentially a Handle with doi.org as its resolver and a governance layer managed by the International DOI Foundation.
Cool URIs: Linked Data introduces the concept of “cool URIs” (coined by Tim Berners-Lee) — URIs that do not change over time. A cool URI can be a PID, but it does not have to be: any stable, well-managed URI qualifies. Importantly, cool URIs are not necessarily URLs intended for human browsing; they function as globally unique identifiers for concepts and entities (e.g., wd:Q5582 for Van Gogh), and dereferencing them may return machine-readable RDF rather than a human-readable page.
Key Takeaways
- Open Data = legally open + technically open (machine-readable)
- Open Access (OA): free online access to scholarly publications — complements Open Data by opening the outputs that describe and contextualise datasets
- Open Research Data (ORD): research datasets open by default — increasingly funder-mandated, central to university library services
- Open Government Data (OGD): public-sector data published proactively — but retention periods, sensitivity, and archival obligations still apply and must be respected
- FAIR: structure data for machine use — Findable, Accessible, Interoperable, Reusable
- CARE: governance for data about people — Collective Benefit, Authority, Responsibility, Ethics
- Collections as Data: GLAM holdings as computational datasets — bulk access, not just item discovery
- LOUD: Linked Data that works in practice — JSON-LD, developer-friendly; instantiated by IIIF and Linked Art
- Linked Data: global URIs as identifiers; triples (subject → predicate → object) as the data model (RDF as the syntax); SPARQL as the query language
- PIDs: unique, persistent, resolvable — the same entity recognised across every institution
Open Science Landscape: OA covers scholarly publications; ORD extends the logic to any dataset produced by research — the two are related but distinct, and funders increasingly mandate both. OA and Open Data are complementary: open publications provide the interpretive layer for open datasets, and together they make research fully reproducible. OGD applies open data principles to the public sector, but retention periods, data sensitivity, and archival obligations remain in force — proactive publication must be balanced against these constraints, which affect archives in particular.
Principles & Frameworks: FAIR and CARE are complementary — FAIR governs technical structure, CARE protects the interests of communities whose data is used. Collections as Data and LOUD are responses specific to GLAM and Linked Data contexts: the former reframes cultural heritage collections as first-class computational resources; the latter ensures that the Linked Data stack is actually adopted by developers, with IIIF and Linked Art as leading examples.
Technical Stack: PIDs and Linked Data each play a distinct role in the Web of Data — PIDs give entities stable, globally unique addresses; Linked Data makes the relationships between them explicit and traversable. Neither alone is sufficient, but both contribute to an open web where the same entity (wd:Q5582, Van Gogh) can be referenced and navigated across different platforms.
Open data is not just about making files available — it is about creating an interconnected web of usable, ethically governed knowledge.
Week 2
ORD Platforms
- re3data is a global registry of research data repositories — a meta-repository: a dataset about datasets
- Repositories vary in PID support, software, access conditions, certification, and APIs
- re3data assigns DOIs to its own entries and exposes a REST API — applying to itself the same openness standards it documents in others
Switzerland’s ORD landscape
Swiss researchers are increasingly required by funders (SNSF, swissuniversities, Horizon Europe, etc.) to make their (meta)data as open as possible. This typically means depositing in a recognised repository:
- Institutional: if the university hosts one (e.g. Yareta at the University of Geneva)
- Cross-institutional / disciplinary: e.g. SwissUBase (social sciences), DaSCH (humanities), Zenodo (general-purpose)
Not all data is openly accessible by default — some repositories support restricted access (e.g. for sensitive data) or embargoes: a period during which data is retained but not yet publicly released (Yareta, for instance, supports this).
re3data entries depend on repository owners keeping their information up to date. If they do not, entries become stale — broken URLs, outdated policies, missing features. The registry is only as reliable as the community that actively curates it.
OGD Platforms
- opendata.swiss is Switzerland’s central OGD catalogue — it does not store datasets itself; it points to where they are hosted. Publishers deposit metadata and links to the actual files, services, or APIs either manually or via semi-automatic harvesting from cantonal and communal portals
- The metadata model behind opendata.swiss is DCAT-AP CH (the Swiss application profile of DCAT, the Data Catalogue Vocabulary), enabling interoperability with other OGD portals
- Data can be consumed as bulk file downloads or via APIs (e.g. transport.opendata.ch) — APIs are far more practical for targeted queries on large datasets
- LINDAS publishes Swiss federal government data as Linked Data — queryable via SPARQL; tools like visualize.admin.ch use it under the hood
Licensing in OGD: not all public bodies use Creative Commons licences. Some institutions define their own terms of use — though these often mirror CC licences in spirit (attribution, non-commercial restrictions, etc.). Always check the specific licence attached to a dataset before reusing it.
Key Takeaways
- ORD and OGD platforms serve different communities (researchers vs. the public) but share the same core principles: open access, machine-readable formats, identifiers
- For researchers in Switzerland, choosing a repository depends on funder mandates, institutional offerings, and disciplinary norms — data does not have to be immediately open, but it should be findable and accessible under clear conditions
- Registries and portals are only useful if their metadata is actively maintained — community curation is not optional, it is part of the infrastructure
Week 3
Discussion: Ethics & Access
Building on Dişli & Candela (2025), we examined the tension between “guardian” mentalities and open access for public domain works. Key debates included the practical barriers to machine-readable licensing, navigating cross-border legal frameworks (Fair Use vs. DSM), and defining ethical boundaries between scholarly research and commercial AI training. We also explored how CARE principles may necessitate restricted access for Indigenous data, challenging the assumption that “open” is always the default solution.
Assessment & Quality Frameworks
We analyzed the Open Data Maturity (ODM) 2025 findings, noting Switzerland’s stable but stagnant position in the “Followers” group, which underscores the shift from measuring data volume to evaluating real-world impact and API interoperability. The session (very briefly) introduced holistic quality tools like the Periodic Table of Open Data Elements, emphasising that successful initiatives depend as much on governance, culture, and demand definition as on technical standards.
Tool: OpenRefine
OpenRefine was presented as the standard tool for data wrangling, enabling users to clean messy datasets, standardise inconsistent formats, and reconcile local identifiers with global authority files (e.g., Wikidata). Its power lies in transforming raw, unstructured exports into reliable, machine-actionable data ready for analysis or publication.
Key Takeaways
- Ethics over Ideology: Openness is not absolute; CARE principles and commercial restrictions demonstrate that governance and context must dictate access levels, not just technical capability.
- Technical Enforcement:** Cultural heritage institutions must move beyond passive terms of use to actively enforce access policies technically—using authentication tiers, API rate limiting,
robots.txt, and secure data spaces to distinguish between scholarly research and commercial AI scraping. - Impact > Volume: High maturity (ODM) is no longer about publishing more files, but about ensuring data is interoperable (APIs), legally clear (machine-readable licenses), and measurably reused. For example, data.gouv.fr has introduced Model Context Protocol (MCP) access, a critical standard allowing LLMs to securely connect to live, contextual public data rather than relying on static datasets.
- Cleaning is Research: Data wrangling with tools like OpenRefine is a critical scholarly activity that transforms static collections into dynamic, queryable datasets.