Introduction to Open Data
  • Home
  • Syllabus
  • Exercises
    • Exercise 1: OA Deep Dive
    • Exercise 2: Movements and Principles
    • Exercise 3: Up to date with Linked Data
    • Exercise 4: Discovering ORD Platforms
    • Exercise 5: The Reuser’s Perspective (OGD)
    • Exercise 6: Reading Assignment
    • Exercise 7: Open Refine
    • Exercise 8: IIIF & ML
  • Course Sections
    • Characteristics of Open Data
    • Associated Movements
    • Associated Principles
    • Open Data Platforms and Organisations
    • Assessment, Data Quality, and Best Practices
    • Techniques, Software, and Tools
    • Showcases
  • Recap
  • References
  • About

On this page

  • Week 1
    • What is Open Data?
      • Definitions
      • A Brief History
    • Movements and Principles
      • Movements
      • Principles
    • IIIF and Linked Art
      • IIIF — International Image Interoperability Framework
      • Linked Art
    • From Tables to Triples
    • The Same Data as a Knowledge Graph
    • Persistent Identifiers (PIDs)
    • Key Takeaways
  • Week 2
  • Week 3
  • Edit this page
  • Report an issue

Recap

Author
Affiliations

Julien A. Raemy

docuteam SA

University of Bern

Published

February 20, 2026

Modified

February 20, 2026

Each week’s recap is a concise summary of the key concepts, exercises, and takeaways covered in class. Expandable More context sections provide additional detail on each topic. Use this page to review before the next session or before the final examination.


Week 1

What is Open Data?

Definitions

  • Openness: freely access, use, modify, and share — for any purpose
  • Legally open: under an open licence permitting reuse and redistribution
  • Technically open: machine-readable, no more than reproduction cost
  • Metadata: data about data — no fixed boundary between the two

A Brief History

  • 1942 — Merton: researchers must contribute to the “common pot”
  • 1995 — term “Open Data” first appears (geophysical/environmental data)
  • 2005 — Open Knowledge Foundation publishes the Open Definition
  • 2007 — Sebastopol Meeting: 8 principles of open government data
  • 2009 — Berners-Lee at TED: “Raw Data Now”
NoteMore context

Open data requires both legal and technical openness. The concept predates the term — the idea that publicly funded knowledge must benefit the public has roots in 1940s science.


Movements and Principles

Movements

  • Open Access (OA): free, online access to scholarly publications — Gold, Green, Diamond, Hybrid, Bronze, Blue, Black
  • Open Science / Open Scholarship: entire research lifecycle made open — data, methods, peer review, software
  • FLOSS: Free/Libre and Open Source Software — 4 freedoms (run, study, change, distribute)

Principles

  • FAIR: Findable, Accessible, Interoperable, Reusable — the technical framework for data sharing
  • CARE: Collective Benefit, Authority to Control, Responsibility, Ethics — governance for Indigenous data
  • Collections as Data: GLAM collections reimagined as computational resources — openness by default, interoperability, ethical stewardship
  • LOUD: Linked Open Usable Data — developer-friendly, JSON-LD, community-driven

FAIR tells us how to structure data. CARE tells us whose interests must be protected.

NoteMore context

These movements and principles are complementary, not competing. Open Access focuses on publications, Open Science on the full research process, FLOSS on tools. FAIR and CARE together ensure data is both technically sound and ethically grounded.


IIIF and Linked Art

Two standards that put LOUD into practice in GLAM collections.

IIIF — International Image Interoperability Framework

IIIF defines standard APIs so that any compliant viewer can display any compliant resource, regardless of the hosting institution. The two core APIs are:

  • Image API: delivers pixels on demand — any region, size, rotation, and format; enables deep zoom without downloading the full file
  • Presentation API: assembles media resources (images, audio, video) and metadata into a manifest — a JSON-LD document describing how a digital object is structured and displayed
  • Image API
  • Presentation API

A Swiss postal archive photograph served via IIIF Image API — zoom, pan, and rotate directly in the browser:

Irises by Vincent van Gogh (Getty Museum) — a IIIF manifest in Mirador:

NoteMore context

The Image API exposes a simple URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format} — any pixel crop can be requested without preprocessing. The Presentation API wraps one or more media resources into a manifest that tells a client (viewer/player) how to display or play the object Mirador and Universal Viewer are the two most widely deployed open-source IIIF viewers.

Linked Art

Linked Art is a community and shared data model for cultural heritage description, built on JSON-LD and CIDOC-CRM. It is a concrete implementation of LOUD — semantically rich enough for research, simple enough for developers. The LUX platform at Yale University is the largest production deployment, combining collections from Yale’s museums, libraries, and archives into a single queryable knowledge graph:

NoteMore context

Linked Art constrains the full complexity of CIDOC-CRM into a profile served as JSON-LD over standard HTTP APIs — any developer who can consume JSON can work with it. LUX exposes millions of records (objects, people, places, concepts, events) as Linked Art entities, all interlinked and queryable. This is the Web of Data made operational at institutional scale.


From Tables to Triples

A museum wants to describe Van Gogh’s Starry Night. Two approaches to modelling the same information:

  • Relational Database
  • Triplestore

Three siloed tables, linked by local integer IDs:

Table Key columns
Artist id, name, nationality, birthday
Artwork id, title, date, material, artist_id ↗
Museum id, name, location
  • Relationships via foreign keys + SQL JOIN
  • IDs are local — id: 1 means nothing outside this database
  • Schema is rigid — adding a column requires a migration
  • Data is siloed — not (usually) designed to link to any other external databases

Nine triples — every relationship is explicit data:

Subject Predicate Object
ex:vangogh rdf:type foaf:Person
ex:vangogh foaf:name "Vincent van Gogh"
ex:vangogh schema:nationality "Dutch"
ex:vangogh ex:created ex:starrynight
ex:starrynight rdf:type schema:Painting
ex:starrynight schema:name "The Starry Night"
ex:starrynight schema:location ex:moma
ex:moma rdf:type schema:Museum
ex:moma foaf:name "Museum of Modern Art"
  • IDs are global URIs — wd:Q5582 is Van Gogh everywhere
  • Schema is flexible — add triples without breaking anything
  • Relationships are globally linked — any entity can reference any URI
NoteMore context

This is the core Linked Data shift. In a relational DB, relationships are implicit (foreign keys, JOIN operations) and local. In a triplestore, every relationship is an explicit triple with a globally meaningful URI. The same Van Gogh (wd:Q5582) can be referenced by the Getty, Wikidata, and the Rijksmuseum simultaneously.


The Same Data as a Knowledge Graph

The nine triples from the Triplestore tab visualised as a directed graph. Orange rounded nodes are named resources (URIs); dark red rectangles are class types; dark teal rectangles are literal string values.

%%{init: {'flowchart': {'nodeSpacing': 90, 'rankSpacing': 90}, 'themeVariables': {'fontSize': '14px', 'fontFamily': '"trebuchet ms",verdana,arial,sans-serif', 'lineColor': '#555555'}}}%%
graph TD
    VG(["ex:vangogh"]) -->|"rdf:type"| PERSON["foaf:Person"]
    VG -->|"foaf:name"| VGNAME["'Vincent van Gogh'"]
    VG -->|"schema:nationality"| NAT["'Dutch'"]
    VG -->|"ex:created"| SN(["ex:starrynight"])
    SN -->|"rdf:type"| PAINT["schema:Painting"]
    SN -->|"schema:name"| SNNAME["'The Starry Night'"]
    SN -->|"schema:location"| MOMA(["ex:moma"])
    MOMA -->|"rdf:type"| MUS["schema:Museum"]
    MOMA -->|"foaf:name"| MNAME["'Museum of Modern Art'"]

    classDef object fill:#C0770E,stroke:#7A4A00,color:#ffffff,font-weight:bold
    classDef type fill:#5D1E1E,stroke:#3A0A0A,color:#ffffff
    classDef literal fill:#1E3A4A,stroke:#0A1F2A,color:#ECF0F1

    class VG,SN,MOMA object
    class PERSON,PAINT,MUS type
    class VGNAME,NAT,SNNAME,MNAME literal

NoteMore context

ex:moma could link to wd:Q188740 (MoMA on Wikidata), which in turn links to hundreds of other artists and artworks — that is the Web of Data. The graph is open-ended — any node can link out to any other URI on the web, connecting data across institutions globally.


Persistent Identifiers (PIDs)

Why PIDs matter: Without globally unique, stable identifiers, data remains siloed. PIDs ensure the same entity is recognised everywhere.

PID Characteristics

  • Unique: no two entities share the same PID
  • Persistent: remains valid even if hosting infrastructure changes
  • Resolvable: points to current location via standard resolver — but resolution only guarantees you can find the resource, not that you can access it; a DOI behind a paywall resolves to a landing page, not the content
  • Interoperable: recognised across systems and communities
  • Human & machine-readable: accessible both visually and programmatically

Common PID Types/Schemes

PID Identifies Example Resolver
Digital Object Identifier (DOI) Publications, datasets, objects 10.1038/nature12373 https://doi.org/10.1038/nature12373
Open Researcher and Contributor ID (ORCID) Researchers, scholars 0000-0002-5444-2280 https://orcid.org/0000-0002-5444-2280
Research Organization Registry (ROR) Research organisations 01xkakk17 https://ror.org/01xkakk17
Archival Resource Key (ARK) Cultural heritage objects ark:12148/btv1b108473193/ https://n2t.net/ark:12148/btv1b108473193/
Handle General-purpose resources 20.500.14716/127190 https://hdl.handle.net/20.500.14716/127190
NoteMore context

DOI is the industry standard for academic publishing, ORCID for researchers, ROR for institutions, and ARK for cultural heritage objects. Each PID type has a dedicated resolver: doi.org, orcid.org, ror.org, hdl.handle.net, and n2t.net (ARK). A critical limitation: PIDs are often conflated with open access, but they are orthogonal — a DOI resolves regardless of whether the resource is open or behind a paywall. Persistence and resolvability are properties of the identifier, not of the access conditions.


Key Takeaways

  • Open Science Landscape
  • Principles & Frameworks
  • Technical Stack
  • Open Data = legally open + technically open (machine-readable)
  • Open Access (OA): free online access to scholarly publications — complements Open Data by opening the outputs that describe and contextualise datasets
  • Open Research Data (ORD): research datasets open by default — increasingly funder-mandated, central to university library services
  • Open Government Data (OGD): public-sector data published proactively — but retention periods, sensitivity, and archival obligations still apply and must be respected
  • FAIR: structure data for machine use — Findable, Accessible, Interoperable, Reusable
  • CARE: governance for data about people — Collective Benefit, Authority, Responsibility, Ethics
  • Collections as Data: GLAM holdings as computational datasets — bulk access, not just item discovery
  • LOUD: Linked Data that works in practice — JSON-LD, developer-friendly; instantiated by IIIF and Linked Art
  • Linked Data: global URIs as identifiers; triples (subject → predicate → object) as the data model (RDF as the syntax); SPARQL as the query language
  • PIDs: unique, persistent, resolvable — the same entity recognised across every institution
NoteMore context

Open Science Landscape: OA covers scholarly publications; ORD extends the logic to any dataset produced by research — the two are related but distinct, and funders increasingly mandate both. OA and Open Data are complementary: open publications provide the interpretive layer for open datasets, and together they make research fully reproducible. OGD applies open data principles to the public sector, but retention periods, data sensitivity, and archival obligations remain in force — proactive publication must be balanced against these constraints, which affect archives in particular.

Principles & Frameworks: FAIR and CARE are complementary — FAIR governs technical structure, CARE protects the interests of communities whose data is used. Collections as Data and LOUD are responses specific to GLAM and Linked Data contexts: the former reframes cultural heritage collections as first-class computational resources; the latter ensures that the Linked Data stack is actually adopted by developers, with IIIF and Linked Art as leading examples.

Technical Stack: PIDs and Linked Data each play a distinct role in the Web of Data — PIDs give entities stable, globally unique addresses; Linked Data makes the relationships between them explicit and traversable. Neither alone is sufficient, but both contribute to an open web where the same entity (wd:Q5582, Van Gogh) can be referenced and navigated across different platforms.

Open data is not just about making files available — it is about creating an interconnected web of usable, ethically governed knowledge.


Week 2

Content coming after Week 2.

Week 3

Content coming after Week 3.

Back to top

Reuse

CC BY 4.0

Julien A. Raemy | Introduction to Open Data

 
  • Edit this page
  • Report an issue

Content is published under a Creative Commons Attribution 4.0 International licence