Julien A. Raemy | 7C2-CT-4A Introduction to Open Data | CC BY 4.0

Introduction to Open Data

Julien A. Raemy (University of Basel / DaSCH)

HES-SO University of Applied Sciences and Arts of Western Switzerland
HEG-GE Bachelor Information Science | Spring Semester 2023-2024


Round of introductions

Introduce yourself

  • Name
  • Background
  • Expectations of the course
  • Open Data?
Preamble

Course Outline

Date        Content
20.02.2024  Course Overview, Characteristics of Open Data
27.02.2024  Associated Movements and Principles, Platforms and Organisations
05.03.2024  Assessment, Data Quality, and Best Practices, Techniques, Software, and Tools
12.03.2024  Assignment Workshop, Showcases, Conclusion, References and Image Credits

Course Overview


Objectives

  1. To gain an understanding of Open Data, its essential aspects, and the principles of opening data;
  2. to learn how to find, analyse, and reuse open datasets;
  3. to learn the processes involved in preparing and publishing open datasets.

Methods

  • Ex cathedra presentations
  • Discussions
  • Theoretical and practical exercises
  • Exploration and comparison of online services
  • Individual assignment

Assessment

Individual assignment

  • Select one or more datasets from one or several platforms discussed during the course (if several datasets are selected, there must be a common thread)
  • Analyse, describe and identify the potential uses of the dataset(s)
  • Between 900 and 1,100 words (without references)
  • Short paper to be uploaded onto Cyberlearn by Friday 22 March, 7pm

This assignment is weighted at 20% of the 7C2-CT module.

Further details can be discussed at any time during the course and during the workshop.


Assessment

Criteria

Assessment Criteria                  Points
Introduction and Contextualisation        5
Analysis and Argumentation               20
Structure and Writing                    10
Presentation and Referencing              5
Total                                    40

Characteristics of Open Data

  • Definitions
  • History
  • The Impact of Open Data on Disciplines
  • Typology
  • Purposes
  • Requirements
  • Licences
  • Technical means

Definitions

Open(ness)

Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)

The Open Definition according to the Open Knowledge Foundation: https://opendefinition.org/


Definitions

Open(ness)

  • No limitations on access of any kind
  • No cost, no authentication, no national or institutional privileges

→ Knowledge funded by (i.e. produced under a mandate from) the public must benefit the public without any limitations

[Scholger 2023]


Definitions

Data


[Chen & Floridi 2013]


Definitions

Data

Data at its most basic level as the absence of uniformity, whether in the real world or in some symbolic system. Only once such data have some recognisable structure and are given some meaning can they be considered information.

[Floridi 2010]


Definitions

Metadata

Data whose purpose is to describe and give information about other data.

[Oxford English Dictionary 2023b]

(Meta)data: semantic transposition

(...) there is no fixed boundary between “data” and “metadata”, and that information viewed as data in one discipline may be metadata in another.

[Alter et al. 2023]


Definitions

Open Data

Open data and content can be freely used, modified, and shared by anyone for any purpose.

The Open Definition according to the Open Knowledge Foundation: https://opendefinition.org/


History

Robert K. Merton (1910-2003)

American sociologist, considered a founding father of modern sociology, said in 1942:

Each researcher must contribute to the 'common pot' and give up intellectual property rights to allow knowledge to move forward.

https://en.wikipedia.org/wiki/Timeline_of_the_open-access_movement

While the term “open data” isn’t even 20 years old, the author puts the concept in a historical context; the idea that scientific research should be free to all was popularized by Robert King Merton in the early 1940s. Research (which produces data) should be shared freely for the common good.

https://data.gov/blog/open-data-history/


History

Timeline

  • 1942: The concept starts with Robert K. Merton
  • 1995: The term 'Open Data' first appeared, related to the sharing of geophysical and environmental data.
  • November 2005: Open Knowledge Foundation creates the Open Definition
  • December 2007: The concept of open public data was discussed and defined in Sebastopol, CA, USA at a meeting of Internet activists. They identified 8 principles.
  • February 2009: Tim Berners-Lee presents The next web at TED2009. He famously asked for "raw data now".

https://devopedia.org/open-data
https://www.opendatasoft.com/en/what-is-open-data-practical-guide/


The Impact of Open Data on Disciplines

Catalysing interdisciplinary progress

  • A foundation for collaboration and innovation: Open data drives interdisciplinary research by providing universally accessible datasets, fostering collaboration across diverse fields such as information science, digital humanities and data science.

  • Enabling data-driven research: Open data facilitates advanced analysis in disciplines that rely heavily on data, which can increase the accuracy and depth of insights and discoveries.


The Impact of Open Data on Disciplines

Open Data in Information Science, Digital Humanities, and Data Science

  • Information Science: Improves data management and accessibility, enabling more efficient data retrieval, archiving and dissemination practices.

  • Digital Humanities: Enables new digital approaches to humanities research, providing new insights into historical, cultural and linguistic studies through data analysis.

  • Data Science: Leverages open data for predictive modelling, machine learning and big data analytics, enabling comprehensive analysis and informed decision-making in diverse fields such as health, finance and social sciences.


Typology

Main Sources

  • Research: Open Research Data (ORD) / Open Scientific Data
  • Government: Open Government Data (OGD)
  • Non-profit organisations
  • Private organisations

Typology

Disciplines

  • Cultural Heritage
  • Healthcare
  • Education
  • Transportation
  • Meteorology
  • Geospatial Information
  • Economics and Finance
  • Legal and Criminal Justice
  • Etc.

Open Research Data (ORD)

Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical). These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence. Data may be raw or primary (e.g. direct from measurement or collection) or derived from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set), or derived from existing sources where the rights may be held by others.

[Concordat Working Group 2016]


ORD

For funding agencies (and institutions)

  1. For the purposes of research assessment, consider the value and impact of all research outputs (including datasets and software) in addition to research publications, and consider a broad range of impact measures including qualitative indicators of research impact, such as influence on policy and practice.

For organizations that supply metrics

  1. Be open and transparent by providing data and methods used to calculate all metrics.
  2. Provide the data under a licence that allows unrestricted reuse, and provide computational access to data, where possible.

San Francisco Declaration on Research Assessment (DORA): https://sfdora.org/read/


ORD in Switzerland

SNSF Policy

The Swiss National Science Foundation (SNSF) expects all its funded researchers:

  • to store the research data they have worked on and produced during the course of their research work,
  • to share these data with other researchers, unless they are bound by legal, ethical, copyright, confidentiality or other clauses, and
  • to deposit their data and metadata onto existing public repositories in formats that anyone can find, access and reuse without restriction.

https://www.snf.ch/en/dMILj9t4LNk8NwyR/topic/open-research-data


ORD in Switzerland

Vision

By facilitating access to and reuse of research data, ORD promotes better, more effective, and more impactful research for the benefit of society as a whole. Through the principles of open access and reusability of research data, ORD practices support transparent and reproducible research findings. Moreover, ORD fosters collaboration by promoting exchange among researchers across disciplines, legal systems and national borders, thus enabling creativity and innovation to thrive.

[Open Science Delegation 2021a]


ORD in Switzerland

Action Areas

  • Support researchers and research communities in imagining and adopting ORD practices
  • Development, promotion, and maintenance of financially sustainable basic infrastructures and services for all researchers
  • Equipping researchers for ORD skills development and exchange of best practices
  • Building up systemic and supportive conditions for institutions and research communities

[Open Science Delegation 2021b]


Research Data Management (RDM)

Definition

RDM refers to the organisation, storage and preservation of data created during a research project.

Purpose

RDM ensures that research data are well-organised, maintained and accessible for current and future research, thereby improving their reliability, validity and reproducibility.


Data Management Plan (DMP)

Definition

A DMP is a formal document that outlines how data will be handled during and after a research project, covering aspects from collection to sharing and preservation.

Purpose

It serves as a guide for managing data efficiently and meets funding agency requirements for data stewardship. Most funding agencies now require a DMP as part of the grant application process. University libraries often have services and resources to assist researchers in creating these documents, providing expert guidance on best practices in data management.


RDM and DMP

Interconnected Roles

RDM encompasses the day-to-day management of research data, while a DMP provides a structured plan for how to manage, share, and preserve data throughout the research project.

Planning and Execution

A DMP is essentially a blueprint for RDM. It outlines the policies and standards to be applied to the data, ensuring that data management practices are thought through from the outset of the project.


Open Government Data (OGD)

The work of government involves collecting huge amounts of data, much of which is not confidential (economic data, demographic data, spending data, crime data, transport data, etc). The value of much of this data can be greatly enhanced by releasing it as open data, freeing it for re-use by business, research, civil society, data journalists, etc.

Open Data Handbook [Open Knowledge 2016]:
https://opendatahandbook.org/glossary/en/terms/government-data/


OGD in Switzerland

LMETA, Art. 10, para. 4

Data are made available online free of charge, in a timely manner, in machine-readable form and in an open format. They may be freely reused, subject to specific legal obligations to cite the source of the data.

[Loi fédérale sur l’utilisation des moyens électroniques pour l’exécution des tâches des autorités (LMETA) 2023]


Purposes

  • Transparency and democratic control
  • Participation
  • Self-empowerment
  • Improved or new private products and services
  • Innovation
  • Improved efficiency and effectiveness of government services
  • Impact measurement of policies
  • New knowledge from combined data sources and patterns in large data volumes

Open Data Handbook [Open Knowledge 2016]
https://opendatahandbook.org/guide/en/why-open-data/


Purposes

Open Government Data Principles

  1. Complete
  2. Primary
  3. Timely
  4. Accessible
  5. Machine processable
  6. Non-discriminatory
  7. Non-proprietary
  8. License-free

https://public.resource.org/8_principles.html


Requirements

Two main components

  1. Legally open: available under an open (data) licence that permits anyone freely to access, reuse and redistribute
  2. Technically open: data is available for no more than the cost of reproduction and in machine-readable and bulk form.

Open Data Handbook [Open Knowledge 2016]
https://opendatahandbook.org/glossary/en/terms/open-data/


Licences

Copyright

  • Grants creators exclusive rights to control use, reproduction, and distribution.
  • Designed to protect creators' economic interests; allows monetization of work.

Copyleft

  • Allows use, modification, and distribution with the condition of keeping works and derivatives open.
  • Promotes freedom, sharing of knowledge, and collaborative improvement.

For further information about licences → Competence Center in Digital Law: https://www.ccdigitallaw.ch/


Licences

Copyright (Anglo-American tradition)

  • Focuses on economic rights; treats copyright as transferable property.
  • Duration is either a fixed number of years after creation or publication, or the author's lifetime plus a set number of years.

European Author's Rights (droit d'auteur / Urheberrecht)

  • Emphasises moral rights alongside economic rights.
  • Grants inalienable moral rights to creators; duration includes lifetime plus post-death period (commonly 70 years).

Licences

Creative Commons (CC)

  • CC BY 4.0: Attribution
  • CC BY-SA 4.0: Attribution, Share Alike
  • CC BY-ND 4.0: Attribution, No Derivative Works
  • CC BY-NC 4.0: Attribution, No Commercial Use
  • CC BY-NC-SA 4.0: Attribution, No Commercial Use, Share Alike
  • CC BY-NC-ND 4.0: Attribution, No Commercial Use, No Derivative Works
  • Public Domain Dedication (CC0): No Rights Reserved
  • Public Domain Mark: No Known Copyright

https://creativecommons.org/
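
The licence codes above are built from four elements (BY, SA, NC, ND), so they can be decoded mechanically. The following is a minimal sketch, not an official Creative Commons tool: the names `ELEMENTS`, `licence_elements` and `is_open` are illustrative, and the openness check simply encodes the rule that licences with an NC or ND element restrict reuse and therefore do not conform to the Open Definition.

```python
ELEMENTS = {
    "BY": "Attribution",
    "SA": "Share Alike",
    "NC": "No Commercial Use",
    "ND": "No Derivative Works",
}


def licence_elements(code):
    """Return the element codes of a licence string, e.g. 'CC BY-NC-SA 4.0'."""
    tokens = code.upper().split()
    if tokens[0] == "CC0":
        return []  # public domain dedication: no conditions attached
    elements = tokens[1].split("-")
    unknown = [e for e in elements if e not in ELEMENTS]
    if unknown:
        raise ValueError(f"Unknown licence element(s): {unknown}")
    return elements


def is_open(code):
    """True if the licence has neither an NC nor an ND element."""
    return not {"NC", "ND"} & set(licence_elements(code))


for code in ["CC0", "CC BY 4.0", "CC BY-SA 4.0", "CC BY-NC 4.0", "CC BY-ND 4.0"]:
    labels = [ELEMENTS[e] for e in licence_elements(code)]
    print(code, labels, "open" if is_open(code) else "not open")
```

Only CC0, CC BY and CC BY-SA pass this check.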


Licences


Creative commons license spectrum [MJL 2020]


Licences

Rights Statements

  • 12 different rights statements that can be used by cultural heritage institutions
  • Three categories
    • In Copyright: statements for works that are in copyright
    • No Copyright: statements for works that are not in copyright
    • Other: statements for works where the copyright status is unclear

https://rightsstatements.org/


Licences

Open Data Commons Open Database License (ODbL)

  • Copyleft licence
  • Attribution and Share-Alike for Data/Databases

https://opendatacommons.org/licenses/odbl/

ODbL is somewhat equivalent to CC BY-SA [Santos 2020].

Public Domain Dedication and License (PDDL)

  • The PDDL places the data(base) in the public domain (waiving all rights)

https://opendatacommons.org/licenses/pddl/


Licences

Licences for software

  • GNU General Public License (GPL)
  • GNU Affero General Public License (AGPL)
  • Mozilla Public License (MPL)
  • MIT License
  • Apache License

And others... → see for instance https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository


Licences

Responsible AI Licenses (RAIL)

  • Responsible AI Pubs Licenses
    • AIPubs Open RAIL-S
    • AIPubs Open RAIL-M
    • AIPubs Research-Use RAIL-S
    • AIPubs Research-Use RAIL-M
  • Responsible AI End-User License (RAIL-A License)
  • Responsible AI Source Code License (Open RAIL-S License)
  • BigScience Open RAIL-M License (Open RAIL-M License)

https://www.licenses.ai/


Licences

I Hate AI License (IHAIL)

  • Based on CC BY 4.0
  • It prohibits the use of the material with Artificial Intelligence (AI) technologies while allowing sharing, adaptation, and commercial use under certain terms.

https://ihateailicense.eu/


Licences

Recommendations for Open (Research) Data

  1. CC0 (to the fullest extent allowed by law, as a complete waiver is not feasible under Swiss regulations)
  2. CC BY 4.0
  3. CC BY-SA 4.0

[Santos 2020]


Technical means

Important factors in providing structured data for machines

  • Data(set) formats
    • Text-based formats
    • Binary-encoded formats
  • Metadata standards / schemas (to describe the dataset)
  • Documentation
  • Protocols

And of course the underlying infrastructure...


Infrastructure

Definitions

A collective term for the subordinate parts of an undertaking; substructure, foundation.

[Oxford English Dictionary 2023a]

People commonly envision infrastructure as a system of substrates – railroad lines, pipes and plumbing, electrical power plants, and wires. It is by definition invisible, part of the background for other kinds of work. It is ready-to-hand. This image holds up well enough for many purposes – turn on the faucet for a drink of water and you use a vast infrastructure of plumbing and water regulation without usually thinking much about it.

[Star 1999]


Infrastructure

Dimensions

  1. Embeddedness
  2. Transparency
  3. Reach or scope
  4. Learned as part of membership
  5. Links with conventions of practice

Infrastructure

Dimensions

  6. Embodiment of standards
  7. Built on an installed base
  8. Becomes visible upon breakdown
  9. Is fixed in modular increments, not all at once or globally

[Star 1999]


Text-based Formats

Plain Text (TXT)

WSe2			WS2			MoS2	

dk	Intensity	dk	Intensity	dk	Intensity	

855.87628	63	848.96433	-39	855.87628	372
855.25787	72.25	848.34546	2	855.25787	424
854.63942	64.25	847.72654	-39	854.63942	460
854.02093	58	847.10759	-37	854.02093	362
853.40239	66	846.4886	-28	853.40239	440

Sohier, T., Ponomarev, E., Gibertini, M., Berger, H., Marzari, N., Ubrig, N., & Morpurgo, A. F. (2019). Enhanced Electron-Phonon Interaction in Multivalley Materials [Data set]. Université de Genève, Yareta. https://doi.org/10.26037/yareta:jlzyhiwj6vfjrnbza4bkvobjai

File (extract): f1b.txt
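
A plain-text table like this one carries no schema, so the reader has to supply the structure. The sketch below assumes that the columns are tab-separated and that each material occupies two consecutive columns (dk, then Intensity), as the two header rows suggest. The two sample rows are copied from the extract; `SAMPLE`, `MATERIALS` and `read_spectra` are illustrative names.

```python
import csv
import io

# First two data rows of the extract, assumed to be tab-separated.
SAMPLE = (
    "855.87628\t63\t848.96433\t-39\t855.87628\t372\n"
    "855.25787\t72.25\t848.34546\t2\t855.25787\t424\n"
)

# Order taken from the first header row of the file.
MATERIALS = ["WSe2", "WS2", "MoS2"]


def read_spectra(text):
    """Return {material: [(dk, intensity), ...]} from tab-separated rows."""
    spectra = {name: [] for name in MATERIALS}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Drop empty cells (e.g. from trailing tabs) and convert to numbers.
        values = [float(cell) for cell in row if cell.strip()]
        # Each material occupies two consecutive columns: dk, then intensity.
        for i, name in enumerate(MATERIALS):
            spectra[name].append((values[2 * i], values[2 * i + 1]))
    return spectra


spectra = read_spectra(SAMPLE)
print(spectra["WS2"][0])
```

Everything the code needs to know about the layout is hard-coded, which is exactly what documentation and metadata are meant to avoid.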


Text-based Formats

Markdown (MD)

# e-periodica OAI-PMH - Ethnology and Folklore
This script was done to download the metadata of [e-periodica](https://www.e-periodica.ch/) through 
their OAI-PMH endpoint (`https://www.e-periodica.ch/oai`) that could be interesting to the PIA research 
project as we want to link our image-based collections to the e-periodica articles. 

There are more than 16k metadata articles which have 
the 390 `setSpec` (Ethnology, folklore) on e-periodica. Probably, the more relevant articles 
come from the `Korrespondenzblättern der SGV` (more than 2k articles), divided into these three sources: 

- https://www.e-periodica.ch/digbib/volumes?UID=sgv-001 
- https://www.e-periodica.ch/digbib/volumes?UID=sgv-002
- https://www.e-periodica.ch/digbib/volumes?UID=sgv-003 

## Records in CSV
- [All records](data/records.csv)
- [Extract (SGV)](data/sgv.csv)

Raemy, J. A. (2023). e-periodica OAI-PMH - Ethnology and Folklore (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.7777797
File: README.MD
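
The README mentions harvesting metadata through an OAI-PMH endpoint. As a rough illustration of what such a request looks like, the sketch below only builds the request URL and performs no network access. The endpoint is the one named in the README; the function name `list_records_url` and the set identifier `ddc:390` are assumptions (the exact setSpec should be checked against the endpoint's ListSets response).

```python
from urllib.parse import urlencode

BASE_URL = "https://www.e-periodica.ch/oai"  # endpoint named in the README


def list_records_url(set_spec, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces
        # metadataPrefix and set when fetching the next page of results.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": set_spec,
        }
    return BASE_URL + "?" + urlencode(params)


# "ddc:390" is an assumed spelling of the setSpec, used for illustration only.
url = list_records_url("ddc:390")
print(url)
```

With more than 16k records, a harvester would call the function repeatedly, passing on the resumption token returned with each page.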


Text-based Formats

Comma-separated values (CSV)

"TRANSPORT_TYPE";"TRANSPORT_MEAN";"TRAVEL_REASON";"SOCIO_DEMO_VARIABLE_TYPE";"SOCIO_DEMO_VARIABLE";"PERIOD_REF";"UNIT_MEAS";"VALUE";"OBS_CONFIDENCE";"OBS_STATUS"
"TOT";"TOT";"ALL_REAS";"GEO";"CH";2015;"KM";36.8318;0.4602;"A"
"TOT";"TOT";"WORK";"GEO";"CH";2015;"KM";8.8512;0.1995;"A"
"TOT";"TOT";"SCHOOL";"GEO";"CH";2015;"KM";1.9104;0.1026;"A"
"TOT";"TOT";"SHOP";"GEO";"CH";2015;"KM";4.7651;0.1346;"A"
"TOT";"TOT";"LEISU";"GEO";"CH";2015;"KM";16.2548;0.3385;"A"
"TOT";"TOT";"SERV_ACC";"GEO";"CH";2015;"KM";1.8462;0.0982;"A"
"TOT";"TOT";"BUSIN";"GEO";"CH";2015;"KM";2.5514;0.1587;"A"
"TOT";"TOT";"OTH_REAS";"GEO";"CH";2015;"KM";0.6527;0.0759;"A"
"SOFT_MOB";"TOT";"ALL_REAS";"GEO";"CH";2015;"KM";2.8031;0.0440;"A"

Federal Statistical Office. Comportement de la population en matière de transports, tableaux de synthèse. https://opendata.swiss/fr/perma/18084205@bundesamt-fur-statistik-bfs
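
Despite the name, this CSV file uses semicolons as delimiters, which a parser has to be told explicitly. A minimal sketch with Python's standard `csv` module, using the header and three rows from the extract above (the variable names are illustrative):

```python
import csv
import io

# Header and three rows copied from the extract above.
SAMPLE = '''"TRANSPORT_TYPE";"TRANSPORT_MEAN";"TRAVEL_REASON";"SOCIO_DEMO_VARIABLE_TYPE";"SOCIO_DEMO_VARIABLE";"PERIOD_REF";"UNIT_MEAS";"VALUE";"OBS_CONFIDENCE";"OBS_STATUS"
"TOT";"TOT";"WORK";"GEO";"CH";2015;"KM";8.8512;0.1995;"A"
"TOT";"TOT";"SHOP";"GEO";"CH";2015;"KM";4.7651;0.1346;"A"
"TOT";"TOT";"LEISU";"GEO";"CH";2015;"KM";16.2548;0.3385;"A"
'''

# The delimiter is ";" rather than ",", so it must be set explicitly.
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")

# csv returns every cell as a string; numeric columns are converted by hand.
distance_by_reason = {row["TRAVEL_REASON"]: float(row["VALUE"]) for row in reader}

longest = max(distance_by_reason, key=distance_by_reason.get)
print(longest, distance_by_reason[longest])
```

The codes themselves (`LEISU`, `SERV_ACC`, ...) are only interpretable with the dataset's documentation.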


Text-based Formats

Extensible Markup Language (XML)

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/"
  xmlns:vcard="http://www.w3.org/2006/vcard/ns#">
  <dcat:Dataset rdf:about="https://ckan.opendata.swiss/perma/121911@bundesamt-fur-statistik-bfs">
    <dcat:keyword xml:lang="it">lavoro-e-reddito</dcat:keyword>
    <dct:language>fr</dct:language>
    <dcat:distribution>
      <dcat:Distribution rdf:about="https://ckan.opendata.swiss/dataset/90beaddf-4f48-4211-9e34-ff68d4308f98/resource/f9eb3fb4-0a11-4a40-a995-0aa13f377011">
        <dct:rights rdf:resource="http://dcat-ap.ch/vocabulary/licenses/terms_by_ask"/>
        <dcat:downloadURL rdf:resource="https://dam-api.bfs.admin.ch/hub/api/dam/assets/121910/master"/>
        <dcat:accessURL rdf:resource="https://dam-api.bfs.admin.ch/hub/api/dam/assets/121910/master"/>
        <dct:format rdf:resource="http://publications.europa.eu/resource/authority/file-type/XLS"/>
        <dct:title xml:lang="de">Kanton Genf: Erwerbsleben und Ausbildung</dct:title>
        <dct:identifier>121910-master@bundesamt-fur-statistik-bfs</dct:identifier>
        <dct:language>de</dct:language>
        <dcat:mediaType rdf:resource="http://www.iana.org/assignments/application/vnd.ms-excel"/>
        <dct:description xml:lang="de">Dieser Dataset präsentiert die Zahlen zu Erwerbsleben und Ausbildung (Eidgenössische Volkszählung 2000)</dct:description>
        <dct:license rdf:resource="http://dcat-ap.ch/vocabulary/licenses/terms_by_ask"/>
      </dcat:Distribution>

Federal Statistical Office. Canton de Genève: Vie active et formation
https://opendata.swiss/fr/perma/121911@bundesamt-fur-statistik-bfs
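
Because the catalogue record is XML, its distributions can be read with a generic XML parser. The sketch below uses a shortened version of the extract with the closing tags added so that it is well-formed. It treats RDF/XML as an ordinary XML tree, which works for this layout but is an assumption; a dedicated RDF library would be more robust against other serialisations of the same graph.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Shortened, well-formed version of the extract above (closing tags added).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcat="http://www.w3.org/ns/dcat#"
  xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="https://ckan.opendata.swiss/perma/121911@bundesamt-fur-statistik-bfs">
    <dcat:distribution>
      <dcat:Distribution>
        <dcat:downloadURL rdf:resource="https://dam-api.bfs.admin.ch/hub/api/dam/assets/121910/master"/>
        <dct:title xml:lang="de">Kanton Genf: Erwerbsleben und Ausbildung</dct:title>
        <dct:license rdf:resource="http://dcat-ap.ch/vocabulary/licenses/terms_by_ask"/>
      </dcat:Distribution>
    </dcat:distribution>
  </dcat:Dataset>
</rdf:RDF>
"""

# Attributes in a namespace are addressed as "{namespace-uri}local-name".
RDF_RESOURCE = "{" + NS["rdf"] + "}resource"

root = ET.fromstring(SAMPLE.encode("utf-8"))
distributions = []
for dist in root.iterfind(".//dcat:Distribution", NS):
    distributions.append({
        "title": dist.findtext("dct:title", namespaces=NS),
        "download_url": dist.find("dcat:downloadURL", NS).get(RDF_RESOURCE),
        "license": dist.find("dct:license", NS).get(RDF_RESOURCE),
    })

print(distributions[0])
```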


Text-based Formats

Terse RDF Triple Language (Turtle)

@prefix ns1: <https://data.tg.ch/ld/ontologies/div-energie-6/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://data.tg.ch/ld/resources/div-energie-6/div-energie-6-record/1740b190a1beca6edcd04e0b143c380b885ee1de/> 
    a ns1:div-energie-6-record ;
    ns1:andere "2690"^^xsd:int ;
    ns1:einwohner "266510"^^xsd:int ;
    ns1:energiebezugsflaeche "25055573"^^xsd:int ;
    ns1:erdgas "308728"^^xsd:int ;
    ns1:erdoelbrennstoffe "307734"^^xsd:int ;
    ns1:jahr "2015-01-01"^^xsd:date ;
    ns1:kehricht "78654"^^xsd:int ;
    ns1:total "1291573"^^xsd:int ;
    ns1:treibstoffe "593764"^^xsd:int .

Kanton Thurgau. CO2-Gesamtemissionen nach Energieträgern (Ebene Kanton)
https://opendata.swiss/de/perma/div-energie-6@kanton-thurgau
https://data.tg.ch/ld/resources/div-energie-6/div-energie-6-record/1740b190a1beca6edcd04e0b143c380b885ee1de/
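
Each value in the Turtle record carries its datatype (`xsd:int`, `xsd:date`), so a consumer knows how to interpret it without guessing. The sketch below illustrates this with a regular expression over three lines of the extract. It is deliberately not a Turtle parser and only handles this simple `predicate "value"^^xsd:type` pattern; in practice an RDF library should be used.

```python
import re
from datetime import date

# Three property lines copied from the record above.
SAMPLE = '''
    ns1:einwohner "266510"^^xsd:int ;
    ns1:jahr "2015-01-01"^^xsd:date ;
    ns1:total "1291573"^^xsd:int .
'''

# Map the XSD datatypes used in the extract to Python conversions.
CASTS = {"int": int, "date": date.fromisoformat}

# Matches: ns1:<name> "<value>"^^xsd:<datatype>
PATTERN = re.compile(r'ns1:([\w-]+)\s+"([^"]*)"\^\^xsd:(\w+)')

record = {
    name: CASTS[datatype](value)
    for name, value, datatype in PATTERN.findall(SAMPLE)
}
print(record)
```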


Text-based Formats

JavaScript Object Notation (JSON)

[{
        "author": "Hemingway, Ernest",
        "available_at": [
            {
                "isil": "AG0066"
            }
        ],
        "bid": "AGR0005487",
        "cited": [
            "VEA1112819"
        ],
        "citing": [],
        "dewey_classifications": null,

EPFL. Citations extracted from monographs about the history of Venice. https://opendata.swiss/de/perma/EPFL-LinkedBooksMonographs@openglam
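
JSON maps directly onto the lists and dictionaries of most programming languages. A minimal sketch with Python's standard `json` module follows; the extract is closed off after its last visible field so that it parses, and the pairing of `bid` with the identifiers listed under `cited` is shown without assuming what that relation means in the dataset.

```python
import json

# The extract above, closed off after the last visible field so it parses.
SAMPLE = """
[{
    "author": "Hemingway, Ernest",
    "available_at": [{"isil": "AG0066"}],
    "bid": "AGR0005487",
    "cited": ["VEA1112819"],
    "citing": [],
    "dewey_classifications": null
}]
"""

records = json.loads(SAMPLE)

# Pair each record identifier with every identifier listed under "cited".
pairs = [(rec["bid"], other) for rec in records for other in rec["cited"]]

print(records[0]["author"])
print(pairs)
```

Note that JSON `null` becomes Python `None`, and an empty array becomes an empty list.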


Text-based Formats

JavaScript Object Notation for Linked Data (JSON-LD)

{
  "id": "https://lux.collections.yale.edu/data/group/8b757ad2-f853-425e-a30d-19686aa779ee",
  "type": "Group",
  "_label": "American Academy of Arts and Sciences",
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "formed_by": {
    "type": "Formation",
    "timespan": {
      "type": "TimeSpan",
      "identified_by": [
        {
          "type": "Name",
          "content": "1780-05-04",
          "classified_as": [
            {
              "id": "https://lux.collections.yale.edu/data/concept/5088ec29-065b-4c66-b49e-e61d3c8f3717",
              "type": "Type",
              "_label": "Display Title" }

LUX. American Academy of Arts and Sciences
https://lux.collections.yale.edu/data/group/8b757ad2-f853-425e-a30d-19686aa779ee
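
A JSON-LD document is still ordinary JSON, so it can be read with a plain JSON parser; the `@context` is what lets Linked Data tools map the keys to shared vocabularies. The sketch below closes the brackets of the extract so that it parses and then collects every `id` it contains. The helper name `linked_ids` is illustrative, and no JSON-LD processing (context expansion) is performed.

```python
import json

# The extract above with its brackets closed so that it parses.
SAMPLE = """
{
  "id": "https://lux.collections.yale.edu/data/group/8b757ad2-f853-425e-a30d-19686aa779ee",
  "type": "Group",
  "_label": "American Academy of Arts and Sciences",
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "formed_by": {
    "type": "Formation",
    "timespan": {
      "type": "TimeSpan",
      "identified_by": [
        {
          "type": "Name",
          "content": "1780-05-04",
          "classified_as": [
            {
              "id": "https://lux.collections.yale.edu/data/concept/5088ec29-065b-4c66-b49e-e61d3c8f3717",
              "type": "Type",
              "_label": "Display Title"
            }
          ]
        }
      ]
    }
  }
}
"""


def linked_ids(node):
    """Yield every "id" value found anywhere in the nested document."""
    if isinstance(node, dict):
        if "id" in node:
            yield node["id"]
        for value in node.values():
            yield from linked_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from linked_ids(item)


doc = json.loads(SAMPLE)
formation_names = [n["content"] for n in doc["formed_by"]["timespan"]["identified_by"]]
ids = list(linked_ids(doc))

print(doc["_label"], formation_names)
print(ids)
```

Each collected `id` is a URI that can in principle be followed to retrieve a further description, which is the "linked" part of Linked Data.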


Binary-encoded Formats

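Binary files store data such as images, audio, or executable programs as byte sequences that are not meant to be read as text. Note that a few formats in the list below are in fact text-based (for example SVG, OBJ, and the .gltf file of glTF) or exist in both text and binary variants (STL).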

  • Image Formats: BMP, GIF, JPEG, JPEG2000, PNG, TIFF
  • Vector Graphics Formats: EPS, PSD, SVG
  • 3D Formats: 3MF, GLB, GLTF, OBJ, STL
  • Audio Formats: AAC, FLAC, MP3, OGG, WAV
  • Video Formats: AVI, FFV1/MKV, MOV, MP4, WEBM
  • Documents: DOCX, ODT, PDF, PDF/A
  • Scientific Data Formats: CDF, DICOM, FITS
  • Archive File Formats: 7-ZIP, GZIP, TAR, ZIP
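
Binary formats usually begin with a fixed byte signature ("magic number"), which is how software recognises them independently of the file extension. The sketch below checks a handful of well-known signatures; the table is far from complete, and the names `SIGNATURES` and `sniff_format` are illustrative.

```python
# A few well-known file signatures; many formats are missing from this table.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container (also DOCX, ODT, 3MF)"),
    (b"\x1f\x8b", "GZIP"),
    (b"fLaC", "FLAC"),
]


def sniff_format(first_bytes):
    """Guess a file format from the first bytes of its content."""
    for magic, name in SIGNATURES:
        if first_bytes.startswith(magic):
            return name
    return "unknown"


print(sniff_format(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(sniff_format(b"%PDF-1.7"))
print(sniff_format(b"plain text"))
```

To check a real file, read its first bytes in binary mode, for example `open(path, "rb").read(16)`, and pass them to `sniff_format`.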

1706-11-30_Verzaglia_Giuseppe-Bernoulli_Johann_I
https://iiif.dasch.swiss/0801/4VjgCwiTn8p-CTaooIqSZBO.jpx/full/max/0/default.jpg
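
The address above appears to follow the URL pattern of the IIIF Image API, in which the last path segments encode region, size, rotation, quality and format of the requested image. The sketch below splits the URL on that assumption. The function name `parse_iiif_image_url` is illustrative, and the code does not separate the optional prefix from the identifier, since that boundary cannot be derived from the URL alone.

```python
from urllib.parse import urlsplit


def parse_iiif_image_url(url):
    """Split an IIIF Image API style URL into its request parameters."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # The last four segments are region, size, rotation and quality.format;
    # everything before them is the (optional) prefix plus the identifier.
    *prefix_and_identifier, region, size, rotation, last = segments
    quality, _, image_format = last.rpartition(".")
    return {
        "server": parts.netloc,
        "prefix_and_identifier": "/".join(prefix_and_identifier),
        "region": region,
        "size": size,
        "rotation": rotation,
        "quality": quality,
        "format": image_format,
    }


request = parse_iiif_image_url(
    "https://iiif.dasch.swiss/0801/4VjgCwiTn8p-CTaooIqSZBO.jpx/full/max/0/default.jpg"
)
print(request)
```

Here the stored file is a JPEG 2000 image (`.jpx`), while the request asks the server to deliver the full image at maximum size as a JPEG.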


Metadata standards / schemas

  • Metadata standards are sets of rules and guidelines that dictate how metadata should be formatted and used. They ensure consistency and interoperability across different systems and platforms.
    • CIDOC Conceptual Reference Model (CIDOC-CRM), Dublin Core, Machine-Readable Cataloging (MARC), Preservation Metadata: Implementation Strategies (PREMIS)
  • Metadata schemas are specific implementations of metadata standards. They outline the structure, elements, and attributes of metadata for a specific purpose.
    • Encoded Archival Description (EAD), Lightweight Information Describing Object (LIDO), Metadata Object Description Schema (MODS)

Data Catalog Vocabulary (Application Profiles)

Data Catalog Vocabulary (DCAT)

  • Resource Description Framework (RDF) vocabulary to facilitate interoperability between data catalogues published on the Web.
  • Current version: DCAT 3

Data Catalog Vocabulary Application Profile (DCAT-AP)

Specifications based on DCAT for describing public sector datasets

  • DCAT Application Profile for data portals in Europe: DCAT-AP 3.0
  • DCAT Application Profile for the United States of America: DCAT-US - Version 3
  • DCAT Application Profile for Data Portals in Switzerland (DCAT-AP CH): eCH-0200

DCAT

Seven classes ("entities")

  1. dcat:Catalog represents a catalogue, which is a dataset in which each individual item is a metadata record describing some resource
  2. dcat:Resource represents a dataset, a data service or any other resource that may be described by a metadata record in a catalogue.
  3. dcat:Dataset represents a collection of data, published or curated by a single agent or identifiable community.
  4. dcat:Distribution represents an accessible form of a dataset such as a downloadable file.
  5. dcat:DataService represents a collection of operations accessible through an interface (API) that provide access to one or more datasets or data processing functions.
  6. dcat:DatasetSeries is a dataset that represents a collection of datasets that are published separately, but share some characteristics that group them.
  7. dcat:CatalogRecord represents a metadata record in the catalogue, primarily concerning the registration information, such as who added the record and when.

https://www.w3.org/TR/vocab-dcat-3/#dcat-scope
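
As a rough illustration of how three of these classes nest (a catalogue lists datasets, and each dataset has one or more distributions), here is a minimal Python sketch. The class and attribute names are simplified assumptions for teaching purposes, not DCAT property names, and the example data is hypothetical. No RDF is produced.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Distribution:
    """An accessible form of a dataset, e.g. a downloadable file."""
    download_url: str
    media_type: str


@dataclass
class Dataset:
    """A collection of data published or curated by a single agent."""
    title: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    """A catalogue whose items are metadata records describing resources."""
    title: str
    datasets: List[Dataset] = field(default_factory=list)

    def media_types(self) -> Set[str]:
        """Collect the media types offered across all datasets."""
        return {
            dist.media_type
            for dataset in self.datasets
            for dist in dataset.distributions
        }


# Hypothetical example data (not taken from a real portal)
catalog = Catalog(
    title="Imaginary Catalog",
    datasets=[
        Dataset(
            title="Air quality 2023",
            distributions=[
                Distribution("https://data.example.org/air.csv", "text/csv"),
                Distribution("https://data.example.org/air.json", "application/json"),
            ],
        )
    ],
)
print(sorted(catalog.media_types()))
```

The point of the nesting is that one dataset can be offered in several forms without being described twice.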


DCAT

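@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# The ex: namespace is a placeholder assumed here; it is not given on the slide.
@prefix ex: <http://example.org/> .

ex:catalog
  a dcat:Catalog ;
  dcterms:title "Imaginary Catalog"@en ;
  dcterms:title "Catálogo imaginario"@es ;
  rdfs:label "Imaginary Catalog"@en ;
  rdfs:label "Catálogo imaginario"@es ;
  foaf:homepage <http://dcat.example.org/catalog> ;
  dcterms:publisher ex:transparency-office ;
  dcterms:language <http://id.loc.gov/vocabulary/iso639-1/en>  ;
  dcat:dataset ex:dataset-001 , ex:dataset-002 , ex:dataset-003 ;
  .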

https://www.w3.org/TR/vocab-dcat-3/#basic-example


DCAT-AP CH

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# ---------- class Catalog --------------------------------------------------
<https://swisstopo/opendata/catalog>
  a dcat:Catalog ;

  # mandatory properties
  dct:description "Datenkatalog der Stadt Zurich"@de ;
  dct:publisher <https://publishers/swisstopo> ;
  dct:title "Open Data City of Zurich"@en ,
            "Offene Daten der Stadt Zurich"@de .

https://www.dcat-ap.ch/releases/2.0/dcat-ap-ch.html#Class:Catalog
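
To make the idea of mandatory properties concrete, the following sketch checks a catalogue description held as a plain Python dictionary. The list of mandatory properties is taken only from the excerpt above (description, publisher, title); the full profile may require more, and the dictionary layout and the function name `missing_mandatory` are assumptions made for this illustration.

```python
from typing import Dict, List

# Properties marked as mandatory in the excerpt above (assumed incomplete).
MANDATORY_CATALOG_PROPERTIES = ("dct:description", "dct:publisher", "dct:title")


def missing_mandatory(record: Dict[str, object]) -> List[str]:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [prop for prop in MANDATORY_CATALOG_PROPERTIES if not record.get(prop)]


# The catalogue from the excerpt, rewritten as a dictionary.
catalog = {
    "dct:description": {"de": "Datenkatalog der Stadt Zurich"},
    "dct:publisher": "https://publishers/swisstopo",
    "dct:title": {
        "en": "Open Data City of Zurich",
        "de": "Offene Daten der Stadt Zurich",
    },
}

# A hypothetical incomplete record.
incomplete = {"dct:title": {"en": "Untitled catalogue"}}

print(missing_mandatory(catalog))
print(missing_mandatory(incomplete))
```

A real portal would typically validate the RDF itself; the dictionary check only shows the principle.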


Documentation

Comprehensive and understandable information about the data, including its source, structure, context, and how to use it effectively.

Types of Documentation

  • Data field descriptions/data models
  • User guides
  • Metadata
  • Developer documentation
  • Source code documentation

Protocols

  • Application Programming Interface (API): a mechanism that enables two software components to communicate with each other.
    • Representational State Transfer (REST): an architectural style for APIs that uses HTTP requests for communication. REST is stateless: the server keeps no session state between requests, so each request must carry all the information needed to process it.
    • Simple Object Access Protocol (SOAP): an XML-based protocol for exchanging structured information in web services, with extensions for message-level security (WS-Security) and reliable delivery (WS-ReliableMessaging). SOAP can support both stateless and stateful operations.
  • File Transfer Protocol (FTP): used to transfer data files, particularly large datasets, from one host to another.
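
Statelessness can be illustrated by building a request that is complete on its own: the query, the page number, the expected format and any credentials all travel with it. This is a minimal sketch using Python's standard library; the endpoint `https://data.example.org/api/datasets`, the parameter names `q` and `page`, and the function name are hypothetical. The request is only constructed, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(
    base_url: str, query: str, page: int, token: Optional[str] = None
) -> Request:
    """Build a self-contained GET request for a hypothetical dataset search API."""
    params = urlencode({"q": query, "page": page})
    headers = {"Accept": "application/json"}
    if token:
        # Credentials are repeated on every request; the server remembers nothing.
        headers["Authorization"] = f"Bearer {token}"
    return Request(f"{base_url}?{params}", headers=headers, method="GET")


req = build_search_request(
    "https://data.example.org/api/datasets", "air quality", page=2
)
print(req.get_method(), req.full_url)
```

Because nothing depends on an earlier exchange, the same request can be retried, cached or routed to a different server.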

Protocols

  • Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): a protocol for harvesting metadata descriptions of records in an archive, particularly used in digital libraries.
  • Really Simple Syndication or Rich Site Summary (RSS) / Atom Feeds: used for regularly updating or publishing data that changes frequently. Feeds enable publishers to syndicate data automatically.
  • SPARQL Protocol and RDF Query Language (SPARQL): used for querying and manipulating RDF data.
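
Both OAI-PMH and the SPARQL protocol are used over plain HTTP, so a request is essentially a URL with a few well-known parameters. The sketch below only builds such URLs. `verb`, `metadataPrefix` and `set` are OAI-PMH request parameters, and `query` is the SPARQL protocol parameter; the base URLs and function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def oai_pmh_list_records(
    base_url: str, metadata_prefix: str = "oai_dc", set_spec: Optional[str] = None
) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def sparql_get(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL carrying the query as a parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


harvest_url = oai_pmh_list_records("https://repository.example.org/oai")
query_url = sparql_get(
    "https://sparql.example.org/query",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
print(harvest_url)
print(query_url)
```

A harvester would fetch the first URL repeatedly, following resumption tokens, while a SPARQL client would usually also set an `Accept` header for the result format.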

Exercise – Imagine an ORD/OGD Platform

Conceptualising an Open Data Platform

In pairs or small groups, you will conceptualise a new platform for open data. Imagine what kind of datasets you would accept and showcase one example. Please consider the following:

  • Accepted datasets
    • Criteria and subjects
  • Accepted licence(s)
    • Open licence, possible restrictions
  • Accepted (meta)data formats
  • Metadata standard to describe the datasets
  • Protocols/Services available on the platform

Associated Movements and Principles

  • Open Access
  • Open Science / Open Scholarship
  • Open Source / Free Software / F(L)OSS
  • FAIR Data Principles
  • CARE Principles for Indigenous Data Governance
  • Collections as Data
  • Linked (Open) (Usable) Data

Open Access

Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities

  1. Authors and right holders must grant all users a free, irrevocable, worldwide, right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship as well as the right to make small numbers of printed copies for their personal use.
  2. A complete version of the work and all supplemental materials, including a copy of the permission as stated above, in an appropriate standard electronic format is deposited in at least one online repository using suitable technical standards.

[Max Planck Society & European Cultural Heritage Online 2003]


Open Access

Definition

Open access (OA) is a broad international movement that seeks to grant free and open online access to academic information, such as publications and data. A publication is defined 'open access' when there are no financial, legal or technical barriers to accessing it - that is to say when anyone can read, download, copy, distribute, print, search for and search within the information, or use it in education or in any other way within the legal agreements.

https://www.openaccess.nl/en/what-is-open-access


Open Access

Definition

OA is a publishing model for scholarly communication that makes research information available to readers at no cost, as opposed to the traditional subscription model in which readers have access to scholarly information by paying a subscription (usually via libraries).

https://www.openaccess.nl/en/what-is-open-access


Open Access

Gold Open Access

Publications are made freely accessible by the publisher immediately upon publication. It often involves Article Processing Charges (APCs) paid by the author, their institution, or a funder.

→ Immediate OA via publisher

Green Open Access (Self-Archiving)

Authors publish their work in any journal and then self-archive a version of the article (typically the pre-print or the accepted manuscript) in a repository for free public use, sometimes after an embargo period.

→ Immediate or delayed OA via self-archiving method/repository


Open Access

Hybrid Open Access

Subscription-based journals allow authors to make their individual articles OA upon payment of an APC.

→ Immediate OA via publisher

Diamond/Platinum Open Access

Journals do not charge authors APCs and provide immediate OA to all their articles. It operates without direct cost to the authors; funding often comes from institutions, societies, or donations.

→ Immediate OA via publisher


Open Access

Bronze Open Access

Articles made freely accessible on the publisher's website without an explicit OA licence.

Blue Open Access

Through blue OA, authors can archive the post-print or the publisher’s final version.

Black Open Access

It refers to the unauthorised distribution of published content through various channels, such as pirate sites or peer-to-peer networks.


Open Science

Definition

Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.

[FOSTER 2019]


Open Science


[Morrison 2021, citing Persic 2021]


Open Scholarship

Open Scholarship: Expanding the Reach of Open Science

  • Broader Approach
    • Extends beyond traditional scientific disciplines to include arts and humanities.
    • Engages not just the research community but also the wider public, including non-experts, educators, and policymakers.
  • Supporting Collaboration and Innovation
    • Facilitates interdisciplinary collaboration across arts, humanities, and other fields.
    • Encourages the use of open educational resources for collaborative teaching and learning.
    • Advances open data practices for the sharing and reuse of cultural heritage resources.

[Tennant et al. 2020]


Open Source

Definition and Philosophy

Open Source refers to software with source code that can be inspected, modified, and enhanced by anyone. It emphasises collaboration and community-oriented development.

Key Characteristics

It includes free redistribution, access to source code, and allowance for derived works.


Open Source

Criteria

  1. Free redistribution
  2. Source code must be included
  3. Derived works must be allowed
  4. Integrity of the author's source code
  5. No discrimination against persons or groups
  6. No discrimination against fields of endeavour
  7. Distribution of licence
  8. Licence must not be specific to a product
  9. Licence must not restrict other software
  10. Licence must be technology-neutral

https://opensource.org/osd/


Free Software

  • Free Software is centred around the idea of user freedom – the freedom to run, study, change, and distribute the software. "Free" refers to freedom, not price.
  • It has four essential freedoms
    • The freedom to run the program as you wish, for any purpose (freedom 0).
    • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
    • The freedom to redistribute copies so you can help others (freedom 2).
    • The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

Free Software

“Free software” means software that respects users' freedom and community. Roughly, it means that the users have the freedom to run, copy, distribute, study, change and improve the software. Thus, “free software” is a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech,” not as in “free beer.” We sometimes call it “libre software,” borrowing the French or Spanish word for “free” as in freedom, to show we do not mean the software is gratis.

https://www.gnu.org/philosophy/free-sw.en.html


F(L)OSS

Free/Libre and Open Source Software

This is software for which the licensee can get the source code, and is allowed to modify this code and to redistribute the software and the modifications. Many terms are used: free, referring to the freedom to use (not to “free of charge”), libre, which is the French translation of Free/freedom, and which is preferred by some writers to avoid the ambiguous reference to free of charge, and open source, which focuses more on the access to the sources than on the freedom to redistribute. In practice, the differences are not great, and more and more scholars are choosing the term FLOSS to name this whole movement.

[Jullien 2009]

"Neutral stance": See https://www.gnu.org/philosophy/floss-and-foss.en.html


FAIR

FAIR Data Principles: https://www.go-fair.org/fair-principles/

[Wilkinson et al. 2016]


FAIR

Findable

F1. (Meta)data are assigned a globally unique and persistent identifier

F2. Data are described with rich metadata (defined by R1)

F3. Metadata clearly and explicitly include the identifier of the data they describe

F4. (Meta)data are registered or indexed in a searchable resource


FAIR

Accessible

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol

  • A1.1 The protocol is open, free, and universally implementable

  • A1.2 The protocol allows for an authentication and authorisation procedure, where necessary

A2. Metadata are accessible, even when the data are no longer available


FAIR

Interoperable

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (Meta)data use vocabularies that follow FAIR principles

I3. (Meta)data include qualified references to other (meta)data


FAIR

Reusable

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes

  • R1.1. (Meta)data are released with a clear and accessible data usage license

  • R1.2. (Meta)data are associated with detailed provenance

  • R1.3. (Meta)data meet domain-relevant community standards
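
A few of the FAIR sub-principles can be turned into simple, automatable checks on a metadata record. The sketch below is a deliberately crude heuristic, not a FAIR assessment tool: the field names `identifier`, `license` and `provenance` are assumptions about how a record might be structured, and a non-empty field says nothing about its quality.

```python
from typing import Dict, List


def fair_gaps(record: Dict[str, str]) -> List[str]:
    """List the checked FAIR sub-principles for which the record has no value."""
    checks = {
        "F1 identifier": bool(record.get("identifier")),
        "R1.1 licence": bool(record.get("license")),
        "R1.2 provenance": bool(record.get("provenance")),
    }
    return [name for name, passed in checks.items() if not passed]


# Hypothetical metadata record with a licence but no provenance statement.
record = {
    "identifier": "https://doi.org/10.1234/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(fair_gaps(record))
```

Checks like these only cover what is machine-verifiable; principles such as R1.3 (community standards) still need human judgement.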


CARE


CARE Principles for Indigenous Data Governance: https://www.gida-global.org/care

[Carroll et al. 2020]


CARE

Collective Benefit

C1. For inclusive development and innovation

C2. For improved governance and citizen engagement

C3. For equitable outcomes


CARE

Authority to Control

A1. Recognizing rights and interests

A2. Data for governance

A3. Governance of data


CARE

Responsibility

R1. For positive relationships

R2. For expanding capability and capacity

R3. For Indigenous languages and worldviews


CARE

Ethics

E1. For minimizing harm and maximizing benefit

E2. For justice

E3. For future use


CARE and FAIR

Operationalising CARE and FAIR

[Carroll et al. 2021]


Collections as Data

Summits

  • 2017: Santa Barbara Statement [Padilla et al. 2017]
  • 2023: Vancouver Statement [Padilla, Scates Kettler, Varner, et al. 2023]

Main outputs

  • 10 principles
  • 'Part to Whole' Report
  • Related checklist and initiatives

https://collectionsasdata.github.io/statement/


Collections as Data

10 Principles

  1. Collections as Data development aims to encourage computational use of digitised and born digital collections.
  2. Collections as Data stewards are guided by ongoing ethical commitments.
  3. Collections as Data stewards aim to lower barriers to use.
  4. Collections as Data designed for everyone serve no one.
  5. Shared documentation helps others find a path to doing the work.
  6. Collections as Data should be made openly accessible by default, except in cases where ethical or legal obligations preclude it.
  7. Collections as Data development values interoperability.
  8. Collections as Data stewards work transparently in order to develop trustworthy, long-lived collections.
  9. Data as well as the data that describe those data are considered in scope.
  10. The development of collections as data is an ongoing process and does not necessarily conclude with a final version.

Collections as Data

Collections as Data: Part to Whole

  • Boundary Object Concept: Collections-as-data serve as flexible tools adaptable to various needs while maintaining a common identity.
  • Ethical Considerations: Emphasis on ethical development and use of collections, especially concerning marginalized communities.
  • Community Engagement: Essential for respecting and understanding the context of collections.
  • Organisational Structure Support: Effective initiatives require collaboration across various organisational departments.

Boundary Object, cf. Star & Griesemer [1989]


Collections as Data

Collections as Data: Part to Whole

  • Documentation Importance: Crucial for understanding and maintaining collections in the future.
  • Community of Practice: Emphasises the need for skill sharing and collaborative environments.
  • Future Challenges and Opportunities
    • Integration of AI and computational tools in collections.
    • Navigating the balance between global collaboration and local cultural sensitivities.
    • Addressing financial and resource limitations for global community growth.
    • Potential and risks of using collections as data for AI training.

[Padilla, Scates Kettler, & Shorish, 2023]


Collections as Data

Checklist to publish Collections as Data in GLAM institutions

  1. Provide a clear license allowing reuse of the dataset without restrictions
  2. Provide a suggestion of how to cite your dataset
  3. Include documentation about the dataset
  4. Use a public platform to publish the dataset
  5. Share examples of use as additional documentation
  6. Give structure to the dataset

Galleries, Libraries, Archives, Museums (GLAM)


Collections as Data

Checklist to publish Collections as Data in GLAM institutions

  7. Provide machine-readable metadata (about the dataset itself)
  8. Include your dataset in collaborative edition platforms
  9. Offer an API to access your repository
  10. Develop a portal page
  11. Add a terms of use

[Candela et al. 2023]


Collections as Data

Workflow


https://marketplace.sshopencloud.eu/workflow/I3JvP6


Collections as Data

Implementation at the Royal Library of Belgium

  • Data-level access to collections
  • Digital Humanities Research


https://www.kbr.be/en/projects/data-kbr-be/


An Open Vision of the Web

The [World Wide Web] project merges the techniques of information retrieval and hypertext to make an easy but powerful global information system. The project started with the philosophy that much academic information should be freely available to anyone.

[Berners-Lee 1991]


Linked Data

Linked Data refers to a set of best practices for publishing structured data on the Web.

Linked Data Principles

  • Use Uniform Resource Identifiers (URIs) as names for things
  • Use HTTP URIs so that people can look up those names.
  • When someone looks up a URI, provide useful information, using the standards (e.g. RDF, RDFS, SPARQL, etc.)
  • Include links to other URIs so that they can discover more things.

[Berners-Lee 2006]


Linked Open Data (LOD)


5-star deployment scheme for Open Data: https://5stardata.info/


The Semantic Web or the Web of Data

The Semantic Web is an extension of the World Wide Web, through standards, to make it machine-readable.


Tweaked Semantic Web Layer Cake [Idehen 2017]


Resource Description Framework (RDF)

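With RDF, everything goes in threes: each statement is a triple made of a subject, a predicate, and an object. Most components of a triple are identified by Uniform Resource Identifiers (URIs); objects may also be literal values such as strings or dates.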
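
The triple structure can be mimicked with plain Python tuples. This is a toy sketch, not an RDF library: the prefixed names are kept as ordinary strings, the sample statements are adapted from the DCAT catalogue example, and the `match` helper is a hypothetical stand-in for a very simple triple pattern, where `None` acts as a wildcard.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

triples: List[Triple] = [
    ("ex:catalog", "rdf:type", "dcat:Catalog"),
    ("ex:catalog", "dcterms:title", '"Imaginary Catalog"@en'),
    ("ex:catalog", "dcat:dataset", "ex:dataset-001"),
    ("ex:catalog", "dcat:dataset", "ex:dataset-002"),
]


def match(
    data: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples matching the pattern; None matches anything."""
    return [
        t
        for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


datasets = [obj for _, _, obj in match(triples, s="ex:catalog", p="dcat:dataset")]
print(datasets)
```

A SPARQL query works on the same principle, with variables in place of the `None` wildcards.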


Linked Open Usable Data (LOUD)

The concept of LOUD extends LOD by emphasising not just the openness and interlinking of data but also its usability.

LOUD

  • The term was coined by Robert Sanderson [2018, 2019] who has been involved in the conception and maintenance of web standards, mainly in the cultural heritage field.
  • LOUD's goal is to achieve the Semantic Web's intent on a global scale in a usable fashion by leveraging community-driven and JSON-LD-based specifications.
  • It has five main design principles to make the data more easily accessible to software developers, who play a key role in interacting with the data and building software and services on top of it, and to some extent to academics.

Linked Open Usable Data (LOUD)

LOUD design principles

  • The right Abstraction for the audience
  • Few Barriers to entry
  • Comprehensible by introspection
  • Documentation with working examples
  • Few Exceptions, instead many consistent patterns

https://linked.art/loud/


LOUD Standards

Specifications that follow the LOUD principles

  • International Image Interoperability Framework (IIIF)
  • W3C Web Annotation Data Model
  • Linked Art

LOUD-driven Communities

IIIF and Linked Art: social fabrics of sound socio-technical practices

  • Synergy of effective social and technical integration with an emphasis on usability
  • Unified by shared expertise and leadership
  • Collaboration beyond technical boundaries
  • Inclusivity and diversity in participation
  • Openness and friendliness as core values
  • Commitment to transparency
  • Organisation of online and face-to-face meetings

[Newbury 2018; Raemy 2023]


International Image Interoperability Framework (IIIF)

IIIF

  • A model for presenting and annotating content
  • A global community that develops shared application programming interfaces (APIs), implements them in software, and exposes interoperable content

https://iiif.io


A global community...

...but mostly from the Northern hemisphere


https://bit.ly/iiifmap


IIIF Community

  • State and National Libraries: Bavarian State Library, French National Library (BnF), British Library, National Library of Estonia, New York Public Library, Vatican Library, etc.

  • Archives: Blavatnik Foundation Archive, Indigenous Digital Archive, Internet Archive, Swedish National Archives, Swiss Federal Archives, etc.

  • Museums & Galleries: Art Institute Chicago, J. Paul Getty Trust, Smithsonian, Victoria & Albert Museum, MIT Museum, National Gallery of Art, Van Gogh Worldwide, etc.

  • Universities & Research Institutions: Cambridge, Cornell University, Ghent University, Swiss National Data and Service Center for the Humanities (DaSCH), Kyoto University, Oxford, Stanford, University of Toronto, Yale University, etc.

  • Aggregators/Facilitators: Europeana, Cuba-IIIF, Cultural Japan, OCLC ContentDM, etc.


IIIF Community


2019 IIIF Conference, Göttingen, Germany


IIIF Community Practices


Images are fundamental carriers of information


The Problem

A world of silos and duplication

Image delivery on the Web has historically been hard, slow, expensive, disjointed, and locked-up in silos.


IIIF – Deep Zoom with Large Images


https://purl.stanford.edu/hs631zg4177


IIIF – Compare Images


Letter from Alexander Hamilton Papers (September 6, 1780), Library of Congress: https://prtd.app/#72f604db-6869-4c08-91ce-7c79502a7f35


IIIF – Reunify


https://demos.biblissima.fr/chateauroux/demo/


IIIF – Search within


Franks, Kendal; Royal College of Surgeons of England. The Germ Theory. via Wellcome Library.


IIIF – Storytelling


Storiiies: http://storiiies.cogapp.com/


IIIF – Crowdsource


Crowdsourcing initiative from the National Library of Wales


IIIF – Machine-generated Annotations


See Cornut et al. [2023]


IIIF – Beyond Images


https://ddmal.music.mcgill.ca/IIIF-AV-player/


IIIF – Layers of digitisation


Leiden Collection's Curtain Viewer:
https://www.theleidencollection.com/viewer/david-and-uriah/


Application Programming Interface (API)


IIIF Specifications

  • Image API
  • Presentation API
  • Authorization Flow API
  • Change Discovery API
  • Content Search API
  • Content State API

The Image and Presentation APIs are referred to as the core IIIF APIs

https://iiif.io/api


Core IIIF APIs


IIIF Image API

The IIIF Image API specifies a RESTful web service that returns an image in response to a standard HTTP(S) request.


https://iiif.io/api/image


IIIF Image API

Uniform Resource Identifier (URI) Syntax
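
An IIIF Image API request follows the pattern {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below assembles such a URI in Python. The function name and its defaults are assumptions made for illustration; the base URL and identifier are taken from a DaSCH image link that appears elsewhere in these slides.

```python
from urllib.parse import quote


def iiif_image_url(
    base: str,
    identifier: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Assemble an IIIF Image API request URI from its path segments."""
    # Reserved characters inside the identifier must be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


base = "https://iiif.dasch.swiss/0801"
identifier = "4VjgCwiTn8p-CTaooIqSZBO.jpx"

# The full image at maximum size
print(iiif_image_url(base, identifier))

# A 1024 x 1024 pixel region from the top-left corner, scaled to 512 pixels wide
print(iiif_image_url(base, identifier, region="0,0,1024,1024", size="512,"))
```

Changing only the region and size segments is what lets a viewer request tiles for deep zoom without downloading the whole image.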


IIIF Presentation API

The IIIF Presentation API is a JSON-LD based web service which provides the necessary information about the object or collection structure and layout.

https://iiif.io/api/presentation


IIIF Presentation API


{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://iiif.participatory-archives.ch/SGV_12N_08589/manifest.json",
  "type": "Manifest",
  "label": {
    "en": [
      "[Ringtanz während der Masüras auf der Alp Sura]"
    ]
  },
  "metadata": [
    {
      "label": {
        "de": [
          "Titel"
        ]
      },
      "value": {
        "de": [
          "[Ringtanz während der Masüras auf der Alp Sura]"
        ]
      }
    },

https://iiif.participatory-archives.ch/SGV_12N_08589/manifest.json
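
Because a Manifest is ordinary JSON, it can be read with any JSON parser. The sketch below parses a shortened, closed copy of the excerpt above (the real Manifest contains many more fields) and reads its language maps. The helper name `first_value` and its fallback to the first available language are assumptions, not behaviour prescribed by the specification.

```python
import json
from typing import Dict, List, Optional

# Shortened copy of the excerpt above, closed so that it parses as valid JSON.
MANIFEST_EXCERPT = """
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://iiif.participatory-archives.ch/SGV_12N_08589/manifest.json",
  "type": "Manifest",
  "label": {"en": ["[Ringtanz während der Masüras auf der Alp Sura]"]},
  "metadata": [
    {
      "label": {"de": ["Titel"]},
      "value": {"de": ["[Ringtanz während der Masüras auf der Alp Sura]"]}
    }
  ]
}
"""


def first_value(language_map: Dict[str, List[str]], lang: str) -> Optional[str]:
    """Return the first string for `lang`, else for any available language."""
    values = language_map.get(lang) or next(iter(language_map.values()), [])
    return values[0] if values else None


manifest = json.loads(MANIFEST_EXCERPT)
title = first_value(manifest["label"], "en")
pairs = [
    (first_value(entry["label"], "de"), first_value(entry["value"], "de"))
    for entry in manifest["metadata"]
]
print(title)
print(pairs)
```

Viewers such as Mirador do essentially this when they display a label and the metadata panel.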


Core IIIF APIs in Mirador


IIIF Ecosystem



Web Annotation Data Model


https://www.w3.org/TR/annotation-model/


Web Annotation Data Model

Motivations

Motivations express the reasons why the Annotation was created, or why the Textual Body was included in the Annotation.

Some of the Motivations: commenting, highlighting, identifying, tagging


Web Annotation Data Model in a IIIF setting

{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://iiif.participatory-archives.ch/annotations/SGV_12N_08589-p1-list.json",
  "type": "AnnotationPage",
  "items": [
    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "https://iiif.participatory-archives.ch/annotations/SGV_12N_08589-p1-list/annotation-436121.json",
      "motivation": "commenting",
      "type": "Annotation",
      "body": [
        {
          "type": "TextualBody",
          "value": "person",
          "purpose": "commenting"
        },
        {
          "type": "TextualBody",
          "value": "Object Detection (vitrivr)",
          "purpose": "tagging"
        },
        {
          "type": "TextualBody",
          "value": "<br><small>Detection score: 0.9574</small>",
          "purpose": "commenting"
        }
      ],
      "target": {
        "source": "https://iiif.participatory-archives.ch/SGV_12N_08589/canvas/p1",
        "selector": {
          "type": "FragmentSelector",
          "conformsTo": "http://www.w3.org/TR/media-frags/",
          "value": "xywh=319,2942,463,523"
        },
        "dcterms:isPartOf": {
          "type": "Manifest",
          "id": "https://iiif.participatory-archives.ch/SGV_12N_08589/manifest.json"
        }}},
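
To illustrate how such an annotation can be consumed, here is a minimal sketch that reads the target region and the tagging bodies. The dictionary is a shortened copy of the excerpt above, and parse_xywh is a hypothetical helper that only handles the plain pixel form of the xywh media fragment.

```python
def parse_xywh(fragment):
    """Parse a media-fragment value such as 'xywh=319,2942,463,523'."""
    prefix = "xywh="
    if not fragment.startswith(prefix):
        raise ValueError(f"not an xywh fragment: {fragment!r}")
    x, y, w, h = (int(part) for part in fragment[len(prefix):].split(","))
    return {"x": x, "y": y, "w": w, "h": h}


# Shortened copy of the annotation shown above.
annotation = {
    "type": "Annotation",
    "motivation": "commenting",
    "body": [
        {"type": "TextualBody", "value": "person", "purpose": "commenting"},
        {"type": "TextualBody", "value": "Object Detection (vitrivr)", "purpose": "tagging"},
    ],
    "target": {
        "source": "https://iiif.participatory-archives.ch/SGV_12N_08589/canvas/p1",
        "selector": {"type": "FragmentSelector", "value": "xywh=319,2942,463,523"},
    },
}

box = parse_xywh(annotation["target"]["selector"]["value"])
tags = [body["value"] for body in annotation["body"] if body["purpose"] == "tagging"]

# The same four numbers form the region parameter of an IIIF Image API request.
region = f"{box['x']},{box['y']},{box['w']},{box['h']}"
print(box, tags, region)
```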

Linked Art

Linked Art is a community and a CIDOC (ICOM International Committee for Documentation) Working Group collaborating to define a metadata application profile for describing cultural heritage, and the technical means for conveniently interacting with it (the API).

https://linked.art


Linked Art Community

Institutions (some of them)

The American Numismatic Society, Europeana, The Frick Collection, J. Paul Getty Trust, The Metropolitan Museum of Art, The Museum of Modern Art (NY), National Gallery of Art (US), Oxford University (OERC), The Philadelphia Museum of Art, Rijksmuseum (NL), University of Basel (Digital Humanities Lab), University of the Arts London, Victoria and Albert Museum, Yale Center for British Art

https://linked.art/community/


Linked Art Community Practices


https://groups.google.com/g/linked-art/c/8DcbDIExdS8/m/RTRQtOBsFQAJ


Intent of Linked Art: finding the right balance


[Sanderson 2023]


Linked Art Overview


Linked Art Data Model Fundamentals

| Level | Linked Art |
| --- | --- |
| Conceptual Model | CIDOC Conceptual Reference Model (CRM) |
| Ontology | RDF encoding of CRM 7.1, plus extensions |
| Vocabulary | Getty Vocabularies, mainly the Art & Architecture Thesaurus (AAT), as well as the Thesaurus of Geographic Names (TGN) and the Union List of Artist Names (ULAN) |
| Profile | Object-based cultural heritage (mainly art museum oriented) |
| API | JSON-LD 1.1, following REST (representational state transfer) and web patterns |

Linked Art from 50k feet


[Raemy et al. 2023, adapted from Sanderson 2018]


Linked Art

Digital Object


https://linked.art/api/1.0/endpoint/digital_object/


Linked Art

Digital Object

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://linked.art/example/digital/0",
  "type": "DigitalObject",
  "_label": "Digital Image of Self-Portrait of Van Gogh",
  "classified_as": [
    {
      "id": "http://vocab.getty.edu/aat/300215302",
      "type": "Type",
      "_label": "Digital Image"
    }
  ],


Linked Art Digital Integration (with IIIF)


LOUD-Driven Infrastructure


[Felsing et al. 2023]


Linked Open Usable Data (LOUD)

LOUD in a nutshell

  • The grassroots development of IIIF and Linked Art, grounded in collaboration and transparency, is a key success factor, but implementations need to be carried out in parallel (specifications versus demonstrability).

  • LOUD standards, when used in conjunction, enhance semantic interoperability, even if this comes at the cost of ontological purity.

  • LOUD practices and standards should serve as common denominators for cultural heritage institutions, public bodies as well as research projects.


Exercise – Identifying the Movements and Principles

Match the movements and principles to these statements

In pairs or small groups, relate the movements and principles (OA, Open Data, Open Science, FLOSS, FAIR, CARE, Collections as Data, LOUD) to the following propositions (multiple answers possible).

  1. Software development
  2. Image dissemination
  3. Metadata dissemination
  4. Publication of scientific articles
  5. Persistent identifier assignment
  6. Digital object interoperability

Exercise – Identifying the Movements and Principles

  7. Development and maintenance of APIs
  8. Documentation
  9. Open license
  10. Inclusivity
  11. Machine-readable (meta)data
  12. Collaboration
  13. Ethical commitment
  14. Semantic interoperability
  15. Reusability

Platforms and Organisations


Open Data Platforms

ORD Platforms

  • Registry: re3data.org
  • Cross-disciplinary: SWISSUbase
  • Humanities: DaSCH Service Platform
  • Cross-institutional: OLOS
  • Institutional: Yareta
  • Generic: Zenodo

re3data.org

Registry of Research Data Repositories

  • Platform launched in 2012
  • Registry that includes data repositories from various academic disciplines
  • Embeddable widgets and tools
    • Additional information, Data Accessibility, Terms of use and licences, Policy, Persistent Identifier (PID) system, Certification
  • All metadata are available for open use under CC0. It also provides an API to access the content

http://www.re3data.org/


SWISSUbase

National cross-disciplinary solution

  • Launched in 2021, superseding FORSbase
  • Operated by FORS – Swiss Centre of Expertise in the Social Sciences – and the Universities of Lausanne, Neuchâtel and Zurich
    • Data catalogue, mostly from social sciences and linguistics
  • Own metadata schema for studies, datasets, and data files
  • Digital Object Identifier (DOI) at the dataset level

https://www.swissubase.ch/


DaSCH Service Platform (DSP)

Swiss National Data and Service Center for the Humanities

  • Institutionalised in 2017 by the Digital Humanities Lab of the University of Basel
  • Operated as a national research infrastructure since 2021 by DaSCH and primarily funded by the SNSF
  • Project-based data models that rely on a core base ontology (Knora), own metadata schema describing the whole project
  • RESTful API (JSON-LD), IIIF Image API, Archival Resource Keys (ARKs) with timestamps

https://app.dasch.swiss/


OLOS

Consultation and archive portal of Switzerland

  • National instance deployed in 2021
  • Developed as part of the Data Life-Cycle Management (DLCM) project and operated by an association composed of the University of Fribourg, the HEG-GE and the HES-SO
  • Dataset description based on the DataCite Metadata Schema
  • DOI at the dataset level

https://olos.swiss/portal/


Yareta

Research Data Repository for Geneva's Higher Education Institutions

  • Platform launched in 2019
  • Developed as part of the DLCM project, based on OLOS, and operated by the University of Geneva

https://yareta.unige.ch/


Zenodo

Generalist repository built at CERN

  • Created by CERN as a generic solution for storing data
  • Anyone can deposit data, with or without embargo
  • Own metadata schema
  • DOI per version

https://zenodo.org/


Open Data Platforms

OGD Platforms

Switzerland

  • National: opendata.swiss
  • Cantonal: Open Data Basel-Stadt
  • Municipal: Stadt Zürich Open Data
  • Public-Law Body: Open Data Portal of Geneva Public Transport

International

  • EU: European Data
  • USA: DATA.GOV

opendata.swiss

Swiss public administration’s central portal for OGD

  • National platform launched in 2013 (first as opendata.admin.ch) under the direction of the Swiss Federal Archives. It has existed as opendata.swiss since 2016 and has been overseen by the Federal Statistical Office since 2019. It provides an overview of OGD published in Switzerland and is a joint project of the Confederation and the cantons.
  • Source code accessible on GitHub
  • Metadata accessible via a JSON API (CKAN) and an XML file according to the DCAT-AP CH standard. Unique identifier per dataset
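
As a minimal sketch of how a CKAN-based catalogue can be queried, the snippet below builds a request for CKAN's package_search action and reads dataset identifiers from the response envelope. The API root is an assumption to be checked against the portal documentation, and the sample response is hand-written rather than real data, so no network call is made.

```python
from urllib.parse import urlencode

# Assumed API root for opendata.swiss; verify it in the portal documentation.
CKAN_ROOT = "https://ckan.opendata.swiss/api/3/action"


def package_search_url(query, rows=5, root=CKAN_ROOT):
    """Build the URL of a CKAN package_search request."""
    return f"{root}/package_search?{urlencode({'q': query, 'rows': rows})}"


def dataset_names(response):
    """Extract dataset identifiers (the CKAN 'name' field) from a search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [package["name"] for package in response["result"]["results"]]


# Hand-written sample that mimics the CKAN response envelope (not real data).
sample_response = {
    "success": True,
    "result": {"count": 1, "results": [{"name": "example-dataset"}]},
}

print(package_search_url("velo", rows=2))
print(dataset_names(sample_response))
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to dataset_names.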

https://opendata.swiss/


Open Data Basel-Stadt

Canton of Basel-Stadt's OGD

  • Cantonal platform officially launched in 2019 (pilot project in 2017-2018)
  • Platform based on Opendatasoft
  • Own metadata schema which comprises some DCAT and DCAT-AP CH properties. JSON API to explore the catalogue and the datasets. Dedicated dashboard. Opaque identifier per dataset.

https://data.bs.ch/


Stadt Zürich Open Data

City of Zurich's OGD

  • First OGD Platform in Switzerland, launched in 2012.
  • Documentation and dedicated scripts on GitHub
  • Specialist Unit for Open Government Data Canton of Zurich
  • Own metadata schema which comprises DCAT-AP CH properties. (Geo)JSON APIs (see documentation). Non-opaque identifier per dataset

https://data.stadt-zuerich.ch/


Open Data Portal of Geneva Public Transport

opendata.tpg

  • Platform launched in 2022. First open data initiative in 2015 through their real-time transit time data API.
  • Democratisation process: transparency, efficiency, innovation, and citizen participation
  • Several metadata schemas and download options, including DCAT in RDF/XML. Dataset schema in JSON which comprises GeoJSON, a format for encoding a variety of geographic data structures. Non-opaque identifier per dataset.

https://opendata.tpg.ch/


European Data

The official portal for European data

  • Platform launched in 2021 (beta version in 2015)
  • Source code available on GitLab
  • Metadata displayed using DCAT-AP (currently version 2.1.1) and accessible through a variety of APIs and a SPARQL endpoint (see documentation). Opaque identifier per dataset.

https://data.europa.eu/


DATA.GOV

The Home of the U.S. Government's Open Data

  • Platform launched in 2009 which provides access to datasets published by agencies across the federal government of the United States
  • Based on open source applications (such as CKAN)
  • Metadata displayed using DCAT-US Schema v1.1

https://data.gov/


Open Data Organisations

  • Open Knowledge Network
  • Opendata.ch
  • Open Data Beer

Open Knowledge Network

A Global Network for Open Data

  • Launched in 2011
  • Foundation which comprises several established chapters, members, and contributors from around the world
  • Their mission is to create a fair, sustainable and open digital future, advancing open knowledge as a design principle beyond just data.

https://okfn.org/


Opendata.ch

Opendata.ch: Swiss Open Data Association

  • Founded in 2011
  • Swiss chapter of the Open Knowledge Network
  • Dedicated working group for the GLAM (Galleries, Libraries, Archives, Museums) sector: OpenGLAM

https://opendata.ch/


Open Data Beer

  • Events since 2018 (around three events per year)
  • Founded by Open Data practitioners in Switzerland
    • Federal Statistical Office
    • Canton of Basel-Stadt
    • Canton of Thurgau
    • Canton of Zurich
    • City of Zurich
    • SBB CFF FFS

https://opendatabeer.ch/


And much more...

Awesome Open Government Data Switzerland

A curated list of OGD portals, websites, APIs, tools and other related resources in Switzerland (and beyond)

https://github.com/rnckp/awesome-ogd-switzerland


Exercise – Comparing Open Data Portals

Short comparative analysis of open data portals

In pairs or small groups, you will conduct a comparative analysis of one ORD portal and one OGD portal, neither of which has been previously discussed in our course. Your analysis will involve comparing these portals with similar ones that have already been presented, based on specific criteria.

  1. Choose one ORD portal and one OGD portal now. Announce your chosen portals to ensure no overlap.
  2. Dimensions to conduct the analysis: Launch Year, Purpose and Theme, Data Types, Access, Metadata Standards, Dataset Identifiers
  3. Prepare a concise 5-minute presentation of your findings (with or without visual aids)

Assessment, Data Quality, and Best Practices


Open Data Maturity (ODM)

Annual assessment

Exercise done by the EU since 2015 to measure the progress of European countries in promoting and facilitating the availability and reuse of public sector information (→ mostly OGD).

  1. Policy – It investigates the open data policies and strategies in place in the participating countries, the national governance models for managing open data and the measures applied to implement those policies and strategies.
  2. Impact – It analyses the willingness, preparedness and ability of countries to measure both the reuse of open data and the impact created through this reuse.

Open Data Maturity (ODM)

Annual assessment

  3. Portal – It investigates the functionality of national open data portals, the extent to which users’ needs and behaviour are examined to improve the portal, the availability of open data across different domains and the approach to ensuring the portal’s sustainability.
  4. Quality – It assesses the measures adopted by portal managers to ensure the systematic harvesting of metadata, the monitoring of metadata quality and compliance with the DCAT-AP metadata standard, and the quality of deployment of the published data on the national portal.

https://data.europa.eu/en/publications/open-data-maturity


ODM 2023

  • 35 participating countries: EU-27, 3 European Free Trade Association Countries (Iceland, Norway and Switzerland), 5 candidate countries (Bosnia and Herzegovina, Montenegro, Albania, Serbia and Ukraine)
  • Switzerland is 24th with an Open Data Maturity of 79%.

[Page et al. 2023]

https://data.europa.eu/en/publications/open-data-maturity/2023


Measuring Impact

Impact Monitoring Framework

Used in Switzerland for the opendata.swiss platform

  • Method to measure the value of OGD initiatives and projects.
  • Based on a structured and consistent list of criteria
  • Leverages the Social Return on Investment (SROI) approach to measure impact (input, output, outcome, impact)

[Stürmer 2016]


Data Quality and Best Practices

Data Quality


Data Quality and Best Practices

Best Practices, Toolkits

And of course the important principles from FAIR, CARE, Collections as Data, LOUD (and surely others)


Periodic Table of Open Data Elements


English: https://odimpact.org/periodic-table.html
French: https://open.datactivist.coop/apps/periodic-table [Pichot Damon 2024]


Techniques, Software, and Tools

How to get and work with data?


Techniques

Data Scraping

  • Process of extracting data from websites or other online sources, typically using automated software or scripts.
    • Advantages: It allows for efficient data collection from multiple sources, can automate repetitive tasks, and is capable of handling large volumes of data.
    • Challenges: Data scraping faces issues like website layout changes, legal and ethical considerations, as well as handling dynamic content loaded through JavaScript.
  • Examples
    • Extracting exhibition data from museum websites using Beautiful Soup, a Python library;
    • Scraping historical records or archives from government websites with Scrapy.
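
The scraping idea can be sketched without any third-party dependency. The example below uses Python's built-in html.parser rather than Beautiful Soup or Scrapy, and the HTML snippet, the class name "exhibition" and the parser class are all invented for illustration.

```python
from html.parser import HTMLParser


class ExhibitionTitleParser(HTMLParser):
    """Collect the text of every <h2 class="exhibition"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "h2" and ("class", "exhibition") in attrs:
            self._inside = True
            self.titles.append("")

    def handle_data(self, data):
        if self._inside:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h2":
            self._inside = False


# Invented page fragment standing in for a downloaded museum web page.
page = """
<h1>Museum</h1>
<h2 class="exhibition">Alpine Photography</h2>
<p>Until June</p>
<h2 class="exhibition">Maps of Basel</h2>
<h2 class="news">Opening hours</h2>
"""

parser = ExhibitionTitleParser()
parser.feed(page)
titles = [title.strip() for title in parser.titles]
print(titles)
```

The selector logic is tied to the page layout, which is exactly why layout changes are listed among the challenges of scraping.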

Techniques

API Integration

  • API integration involves connecting and pulling data from external services.
    • Advantages: It provides structured and often real-time access to data, allows for automation, and ensures data consistency and reliability.
    • Challenges: Complexity in handling API limits/rate limiting, maintaining integration after API updates, and managing data from disparate APIs.
  • Examples
    • Integrating social media data from platforms like Instagram, LinkedIn or Mastodon;
    • Retrieving weather information from meteorological APIs.
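
One common way to cope with rate limiting is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: the exception RateLimited and the function fetch_with_backoff are hypothetical names, and the flaky fetch function simulates a service instead of calling a real API.

```python
import time


class RateLimited(Exception):
    """Raised by a fetch function when the service answers 'too many requests'."""


def fetch_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries):
        try:
            return fetch()
        except RateLimited:
            if attempt == retries - 1:
                raise  # give up after the last allowed attempt
            sleep(base_delay * 2 ** attempt)


# Simulated service that rejects the first two calls.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited()
    return {"temperature": 12.5}


# Record the delays instead of actually sleeping, to keep the demo instant.
delays = []
result = fetch_with_backoff(flaky_fetch, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the waiting strategy testable; a real client would leave the default time.sleep in place.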

Techniques

Data Mining

  • Data mining is the process of analysing large datasets to discover patterns, correlations, and insights.
    • Advantages: Helps in identifying trends, making predictions, and informing decision-making processes; can uncover hidden patterns in data.
    • Challenges: Requires significant computational resources, potential privacy concerns, and the need for skilled interpretation of results.
  • Examples
    • Analysing visitor data patterns using RapidMiner, a Java-based data science platform;
    • Mining public opinion data from government surveys with WEKA (Waikato Environment for Knowledge Analysis), a Java-based Machine Learning software.

Techniques

Data wrangling/munging

  • Data wrangling, or munging, involves transforming and mapping raw data into a more structured and usable format.
    • Advantages: Makes data more accessible and useful for analysis, helps in cleaning and standardising data, and improves data quality.
    • Challenges: Time-consuming, requires expertise in data manipulation, and can be complex with large and diverse datasets.
  • Examples
    • Formatting and combining different datasets for a research project using Python's pandas library;
    • Harmonising open government datasets from different departments for comparative analysis.
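
A small wrangling step can be sketched with the standard library alone (pandas would offer the same operations at scale). The CSV content below is invented, and the column names, the apostrophe thousands separator and the day.month.year date format are assumptions chosen for the illustration.

```python
import csv
import io
from datetime import datetime

# Invented raw export: stray spaces, apostrophes as thousands separators,
# and dates written as day.month.year.
raw = (
    "Commune ; Population;Date\n"
    " Town A;1'200;31.12.2023\n"
    "Town B ;950;31.12.2023\n"
)


def wrangle(text):
    """Normalise headers, trim cells, and convert numbers and dates."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = [name.strip().lower() for name in next(reader)]
    rows = []
    for record in reader:
        if not record:  # skip empty lines
            continue
        row = dict(zip(header, (cell.strip() for cell in record)))
        row["population"] = int(row["population"].replace("'", ""))
        row["date"] = datetime.strptime(row["date"], "%d.%m.%Y").date().isoformat()
        rows.append(row)
    return rows


clean = wrangle(raw)
print(clean)
```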

Techniques

Data Integration

  • Data integration involves combining data from different sources to provide a unified view of the data.
    • Advantages: Provides a comprehensive view of data, enhances data usability and analysis, and supports better decision-making.
    • Challenges: Managing data format and schema discrepancies, ensuring data quality and consistency, and handling large-scale integration.
  • Examples:
    • Combining spatial data from various archaeological digs and historical GIS databases for comprehensive mapping and analysis;
    • Combining financial data from various business units.
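
The schema discrepancies mentioned above can be made concrete with a minimal sketch. Two invented sources describe the same sites under different field names ("site_id" versus "id"), and the hypothetical function integrate maps both onto one shared record layout.

```python
# Two invented sources with different field names for the same key.
digs = [
    {"site_id": "S1", "lat": 46.2, "lon": 6.1},
    {"site_id": "S2", "lat": 47.5, "lon": 7.6},
]
gis = [
    {"id": "S1", "period": "Roman"},
    {"id": "S3", "period": "Medieval"},
]


def integrate(dig_records, gis_records):
    """Merge both sources into one list of records sharing a single schema."""
    unified = {
        d["site_id"]: {"site_id": d["site_id"], "lat": d["lat"],
                       "lon": d["lon"], "period": None}
        for d in dig_records
    }
    for g in gis_records:
        # Create an empty record when a site only exists in the second source.
        record = unified.setdefault(
            g["id"], {"site_id": g["id"], "lat": None, "lon": None, "period": None}
        )
        record["period"] = g["period"]
    return sorted(unified.values(), key=lambda r: r["site_id"])


sites = integrate(digs, gis)
for site in sites:
    print(site)
```

Missing values are kept as None rather than dropped, so gaps in either source stay visible in the unified view.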

Techniques

Stream processing

  • Stream processing is the technique of processing data in real-time as it flows in streams from various sources.
    • Advantages: Enables real-time data analysis and decision-making, can handle high throughput, and is suitable for time-sensitive data.
    • Challenges: Requires handling data velocity and volume, ensuring system scalability and reliability, and managing out-of-order data streams.
  • Examples:
    • Real-time analysis of social media feeds using Apache Kafka;
    • Processing live public transport data for city management using Apache Flink.
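
The core idea of windowed stream processing can be sketched in plain Python, without Kafka or Flink. The generator below counts events per tumbling window; it assumes events arrive in time order (out-of-order handling, one of the challenges above, is deliberately left out), and the timestamps and line numbers are invented.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Yield (window_start, Counter) per window for (timestamp, key) events."""
    current_start, counts = None, Counter()
    for timestamp, key in events:
        start = timestamp - timestamp % window_seconds
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete: emit it and open a new one.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts


# Invented events: (seconds since start, transport line).
events = [(0, "12"), (30, "12"), (59, "14"), (60, "12"), (130, "14")]
windows = list(tumbling_window_counts(events))
for start, counts in windows:
    print(start, dict(counts))
```

Because it is a generator, results are emitted as soon as a window closes instead of after the whole stream has been read. Windows without any event are not emitted.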

Techniques

Data Quality Management

  • Data quality management involves ensuring the accuracy, completeness, and reliability of data in a dataset.
    • Advantages: Increases the trustworthiness of data, improves decision-making, and reduces the risk of errors in data analysis.
    • Challenges: Continuously maintaining data quality, especially with large and evolving datasets, and integrating quality management into existing processes.
  • Examples
    • Using OpenRefine to clean and standardise metadata across different collections of historical artifacts;
    • Ensuring accuracy in patient data in healthcare databases.
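
Completeness and validity checks of this kind can be sketched as a small report function. The records, the required fields and the four-digit year rule below are assumptions made for the illustration, and quality_report is a hypothetical name.

```python
import re

YEAR = re.compile(r"^\d{4}$")


def quality_report(records, required=("id", "title", "year")):
    """Report completeness per field, invalid years and duplicate identifiers."""
    if not records:
        raise ValueError("no records to assess")
    total = len(records)
    completeness = {
        field: sum(1 for r in records if str(r.get(field) or "").strip()) / total
        for field in required
    }
    invalid_years = [r["id"] for r in records
                     if r.get("year") and not YEAR.match(str(r["year"]))]
    ids = [r["id"] for r in records]
    duplicate_ids = sorted({i for i in ids if ids.count(i) > 1})
    return {"completeness": completeness,
            "invalid_years": invalid_years,
            "duplicate_ids": duplicate_ids}


# Invented collection metadata with typical defects.
records = [
    {"id": "a1", "title": "Vase", "year": "1890"},
    {"id": "a2", "title": "", "year": "c. 1900"},
    {"id": "a2", "title": "Coin", "year": None},
    {"id": "a3", "title": "Mask", "year": "1925"},
]

report = quality_report(records)
print(report)
```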

Techniques

Extract, Transform, Load (ETL)

  • ETL is a process where data is extracted from various sources, transformed into a suitable format, and loaded into a target system.
    • Advantages: Facilitates data consolidation, supports complex data transformations, and enables effective data storage and analysis.
    • Challenges: Managing data from disparate sources, ensuring data transformation accuracy, and maintaining ETL process performance.
  • Examples
    • Extracting economic and demographic data from various government departments using Apache NiFi, transforming it for consistency, and loading it into an aggregated portal;
    • Analysing sales data from different systems.
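
The three ETL stages can be sketched end to end with the standard library, using an in-memory SQLite database as the target system instead of a tool such as Apache NiFi. The CSV content, the table name indicator and its columns are invented for the illustration.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: trim and upper-case codes, convert numbers to proper types."""
    return [(r["canton"].strip().upper(), int(r["year"]), float(r["value"]))
            for r in rows]


def load(records, connection):
    """Load: write the cleaned records into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS indicator (canton TEXT, year INTEGER, value REAL)"
    )
    connection.executemany("INSERT INTO indicator VALUES (?, ?, ?)", records)
    connection.commit()


# Invented source data with inconsistent formatting.
source = "canton,year,value\n ge ,2022,1.5\nbs,2022,2.0\n"

conn = sqlite3.connect(":memory:")
load(transform(extract(source)), conn)

cantons = conn.execute("SELECT canton FROM indicator ORDER BY canton").fetchall()
total = conn.execute("SELECT SUM(value) FROM indicator").fetchone()[0]
print(cantons, total)
```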

Comprehensive Knowledge Archive Network (CKAN)

Open Source Data Management System (DMS)

  • CKAN is an open source DMS, mainly written in Python, for powering data hubs and portals. It has been maintained by the Open Knowledge Foundation since 2006.
  • It contains a PostgreSQL database, a Solr index, an API, and has several extensions.

https://ckan.org/
https://github.com/ckan/ckan


Apache Tools

  • Apache Kafka: a distributed event streaming platform designed for high-throughput, real-time data feeds, excelling as a scalable, durable, and fault-tolerant message broker for large-scale data integration and streaming
  • Apache Flink: a stream processing framework optimised for stateful computations and complex event processing on unbounded data streams, offering robust event time processing, advanced windowing, and real-time analytics capabilities.
  • Apache NiFi: a data flow management tool providing a user-friendly interface for automating, controlling, and monitoring data flows between systems, with strengths in data routing, transformation, and ensuring data provenance and compliance.

OpenRefine

  • Open source tool initially released in 2010 (first as Freebase Gridworks and then as Google Refine) for data cleanup and transformation
  • It operates as a local web application to clean messy data and can be installed on Windows, macOS and Linux
  • It handles various types of data (CSV, TSV, JSON, XML) and can connect to and import data from databases and other sources
  • It supports scripting in languages like General Refine Expression Language (GREL) and Jython, allowing for advanced data manipulation
  • It has various features: faceting/filtering, clustering, reconciliation, undo/redo
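
To give an idea of what clustering does, here is a simplified re-implementation in the spirit of OpenRefine's key-collision "fingerprint" method. It is a sketch and not OpenRefine's own code; the names are invented and the exact normalisation steps in OpenRefine may differ in detail.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalised key: lower-case, ASCII, no punctuation, sorted unique tokens."""
    text = value.strip().lower()
    # Fold accented characters to their ASCII base letters (ü becomes u).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share the same fingerprint; keep groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Invented name variants, as they might appear in an agents column.
names = ["Müller, Hans", "Hans Müller", "hans muller ", "Anna Meier"]
clusters = cluster(names)
print(clusters)
```

Each cluster is a candidate for merging into one preferred spelling, which the user confirms or rejects.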

https://openrefine.org/
https://github.com/OpenRefine/OpenRefine


Exercise – OpenRefine

Goal: gain hands-on experience in setting up and navigating OpenRefine

Step 1. Getting started with OpenRefine

  1. Install the software (https://openrefine.org/docs)
  2. Run it locally (accessible at http://127.0.0.1:3333/)
  3. Have a look at the different pages and functionalities
  4. Create a new project by importing any supported files

Exercise – OpenRefine

Step 2. Create a project with an extract from the CAS photographic archives

  1. Create a project by importing the data extract from the CAS photographic archives: ekws_extract.csv
  2. Review the dataset
  3. Clean the dataset by removing unnecessary columns
  4. Undertake some reconciliation with external services for agents (people and institutions).

Exercise – OpenRefine

Step 3. Going further

  1. Create a new project by importing a dataset from one of the ORD/OGD portals
  2. Analyse and curate the dataset

Alternatively: go through this tutorial from Library Carpentry: https://librarycarpentry.org/lc-open-refine/


Exercise – Analysing datasets from opendata.swiss

Starter code for all CSV datasets on opendata.swiss

  1. Find a dataset from the preconfigured starter code files: https://rnckp.github.io/starter-code_opendataswiss/
  2. Open it on Google Colab
  3. Run the code snippets
  4. You can also open it on GitHub (Jupyter Notebook or Rmd) and download the file

Explanation and Showcases: https://opendata.swiss/de/showcase/starter-code-fur-alle-csv-datensatze-auf-opendata-swiss


Assignment Workshop


Assignment

Work on your assignment


Showcases


Our World in Data

Research and data to make progress against the world’s largest problems

  • Our World in Data is a collaborative effort between researchers at the University of Oxford and the non-profit organisation Global Change Data Lab (GCDL).
  • It is a comprehensive online resource that presents empirical research and data on a wide array of global issues, focusing on large-scale problems like poverty, disease, hunger, climate change, war, existential risks, and inequality.
  • The platform aims to provide accessible, comprehensible, and transparent data to inform readers about the state of the world and to support informed decision-making.

https://ourworldindata.org/


Infectious Diseases Dashboard (IDD)

  • The Infectious Diseases Dashboard (IDD) is managed by the Federal Office of Public Health (FOPH)
  • The IDD provides information on cases of infection and illness in Switzerland and the Principality of Liechtenstein caused by various pathogens.

https://idd.bag.admin.ch/


Sportanlagen-Finder

The sports facility finder shows sports and exercise facilities operated by the canton of Basel-Stadt as well as all cantonal sports facilities outside the cantonal and national borders. The dataset also lists cantonal premises that are used and rented for sports activities.


Federal Popular Votes

Managed by the Federal Statistical Office, this platform continuously updates the results of popular votes in Switzerland.


https://abstimmungen.admin.ch/


Heisse Preise

Launched in 2023 by Mario Zechner, this site offers a comprehensive platform for comparing food prices across various supermarkets in Austria, tracking and analysing price trends over time.


Animal Crossing Art Generator

  • Innovative tool for integrating virtual art: The generator allows players to turn any image from the Getty Museum's open-access collection into miniature works of art for use in the Animal Crossing: New Horizons game.
  • Creative Expression and Customisation: Players can use the tool to add famous artworks to their game by applying them to clothing, wallpaper, canvas, etc., enhancing their virtual environment with museum-quality art.
  • Technical foundation and accessibility: Uses open source code from the Animal Crossing Pattern Tool and includes a IIIF manifest converter for broader art integration, making it easy to import art from various institutions into the game.

https://experiments.getty.edu/ac-art-generator/


Animal Crossing Art Generator

center

Showcases
Julien A. Raemy | 7C2-CT-4A Introduction to Open Data | CC BY 4.0

12 sunsets: Exploring Ed Ruscha's Archive

  • Interactive platform launched in 2020 by the J. Paul Getty Trust to explore Sunset Boulevard as photographed by Ed Ruscha over more than four decades (between 1965 and 2007)
  • The 65,000 photographs are IIIF-compliant and are all linked to the Getty Research Institute

https://12sunsets.getty.edu/

LUX: Yale Collections Discovery

LUX provides a unified gateway to more than 41 million cultural heritage resources held by Yale's museums, archives and libraries: Yale University Library, Yale Center for British Art, Yale Peabody Museum, Yale University Art Gallery.

Built on open standards

  • Linked Art, IIIF, W3C Activity Streams
  • Widespread technologies: Python, JavaScript, Node.js, React, AWS
  • Multimodal database (NoSQL): MarkLogic Server

https://lux.collections.yale.edu/

[Metcalfe Hurst 2023]

Data pipeline and architecture

[Figure: LUX data pipeline and architecture]

[Raemy & Sanderson 2023]

Data Transformation Pipeline Code: https://github.com/project-lux/data-pipeline

Conclusion

Long-term archiving of ORD

  • Archival appraisal: Identifying which research data warrants long-term preservation, taking into account scientific, historical or legal significance as data volumes increase.

  • Documentation and contextualisation: Ensuring comprehensive documentation for each dataset, including the context of its creation, to maintain its relevance and intelligibility over time.

  • Infrastructure: Addressing the challenges of physical and software obsolescence, file format changes and the risk of content loss over time.

  • Methodology/Process: Determining the most effective time and method for archiving, whether periodically or at project completion, to prevent data loss and ensure data integrity.

Movements and principles impacting Open Data

  • The rise of movements such as Open Access and Open Science, along with principles like FAIR and CARE, significantly shape the relevance and implementation of open data.

  • However, achieving true openness requires not only adherence to these principles and movements but also the backing of sufficient funding and the cultivation of necessary skills among data practitioners. This is crucial for ensuring that open data is not just available but also meaningful and usable.

The Evolution of Open Data

  • While open data in itself is a commendable goal, the concept of Linked Open (Usable) Data takes it a step further.

  • Linked Open Data enhances the value of open data by ensuring it is not only available but also interconnected, making it more discoverable and useful for a wider range of applications and analyses.

  • LOUD is about enhancing usability and semantic interoperability leveraging community-driven standards and practices.

OGD and ORD for GLAM institutions

  • ORD and OGD can be viewed both as a service provided to the public and as a process that requires active management and continuous improvement.

  • Institutions in the GLAM sector need to consider how these open data initiatives fit within their practices, both in terms of contributing data and utilising data for research, curation, and public engagement.

Public Engagement and Empowerment

  • Open data empowers the public by providing access to information that was previously inaccessible or difficult to obtain.

  • This not only fosters a more informed citizenry but also enables individuals and communities to participate more actively in civic and cultural discourses.

Transparency and Accountability

  • Open data plays a pivotal role in enhancing transparency and accountability, particularly in sectors where public trust is paramount.

  • By making data freely accessible, open data initiatives allow for greater scrutiny and analysis, leading to more accountable governance and institutional practices.

AI and ML

  • Open data serves as a critical fuel for AI systems, providing the large datasets necessary for training ML models. The availability of diverse, high-quality open datasets enables more robust and inclusive AI developments.

  • By leveraging open data, AI can be applied across a wide array of domains, from improving healthcare diagnostics to enhancing climate change models, thus contributing significantly to societal advancements and problem-solving.

  • Open data plays a pivotal role in fostering transparency and ethical practices in AI. By using open datasets, AI researchers and developers can ensure a level of accountability in their models, allowing for external validation and reducing biases.

Collaboration is Key

  • Collaboration is a fundamental aspect of open data initiatives.

    • Discussing best practices grounded in collaboration, such as leveraging the Collections as Data checklist
    • Participating in the IIIF and Linked Art communities for the cultural heritage field (and beyond, notably for the STEM sector)
    • OGD meet-ups (Open Data Beer)
    • Etc.
  • Such collaboration is vital for addressing global challenges, encouraging innovation, and ensuring the sustainable development of open data ecosystems.

A multitude of tools

For a better understanding of the past,
Our images have to be enhanced,
A new dialogue in three dimensions,
Must have openness at its heart,
For somewhere within the archive
Of our aggregated minds
Are a multitude of questions
And a multitude of answers,
Simply awaiting to be found.

[Mr Gee 2023], Data Poet at EuropeanaTech 2023

References and Image Credits

References

Alter, G., Rizzolo, F., & Schleidt, K. (2023). View points on data points: A shared vocabulary for cross-domain conversations on data and metadata. IASSIST Quarterly, 47(1), 1–39. https://doi.org/10.29173/iq1051

Berners-Lee, T. (1991, August 6). WorldWideWeb — Executive summary. Archive.Md. https://archive.md/Lfopj

Berners-Lee, T. (2006, July 27). Linked Data. W3C. https://www.w3.org/DesignIssues/LinkedData.html

Candela, G., Gabriëls, N., Chambers, S., Dobreva, M., Ames, S., Ferriter, M., Fitzgerald, N., Harbo, V., Hofmann, K., Holownia, O., Irollo, A., Mahey, M., Manchester, E., Pham, T.-A., Potter, A., & Van Keer, E. (2023). A checklist to publish collections as data in GLAM institutions. Global Knowledge, Memory and Communication, ahead-of-print. https://doi.org/10.1108/GKMC-06-2023-0195

Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43. https://doi.org/10.5334/dsj-2020-043

Carroll, S. R., Herczog, E., Hudson, M., Russell, K., & Stall, S. (2021). Operationalizing the CARE and FAIR Principles for Indigenous data futures. Scientific Data, 8(1), 108. https://doi.org/10.1038/s41597-021-00892-0

Chen, M., & Floridi, L. (2013). An analysis of information visualisation. Synthese, 190(16), 3421–3438. https://doi.org/10.1007/s11229-012-0183-y

Concordat Working Group. (2016). Concordat on Open Research Data. UK Research and Innovation. https://www.ukri.org/wp-content/uploads/2020/10/UKRI-020920-ConcordatonOpenResearchData.pdf

Cornut, M., Raemy, J. A., & Spiess, F. (2023). Annotations as Knowledge Practices in Image Archives: Application of Linked Open Usable Data and Machine Learning. Journal on Computing and Cultural Heritage, 16(4), 1–19. https://doi.org/10.1145/3625301

Felsing, U., Fornaro, P., Frischknecht, M., & Raemy, J. A. (2023). Community and Interoperability at the Core of Sustaining Image Archives. Digital Humanities in the Nordic and Baltic Countries Publications, 5(1), 40–54. https://doi.org/10.5617/dhnbpub.10649

Floridi, L. (2010). Information: A very short introduction. Oxford University Press. ISBN 978-0-19-955137-8

FOSTER. (2019). Open Science. In Foster Taxonomy. FACILITATE OPEN SCIENCE TRAINING FOR EUROPEAN RESEARCH. https://www.fosteropenscience.eu/taxonomy/term/100

Idehen, K. U. (2017, July 24). Semantic Web Layer Cake Tweak, Explained. OpenLink Software Blog. https://medium.com/openlink-software-blog/semantic-web-layer-cake-tweak-explained-6ba5c6ac3fab

Jullien, N. (2009). A Historical Analysis of the Emergence of Free Cooperative Software Production: In M. Pagani (Ed.), Encyclopedia of Multimedia Technology and Networking, Second Edition (pp. 605–612). IGI Global. https://doi.org/10.4018/978-1-60566-014-1.ch081

Loi fédérale sur l’utilisation des moyens électroniques pour l’exécution des tâches des autorités (LMETA), Pub. L. No. FF 2023 787, 22.022 Confédération suisse. Secrétariat général DFF (2023). https://fedlex.data.admin.ch/eli/fga/2023/787

Max Planck Society & European Cultural Heritage Online. (2003). Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. Max Planck Society. https://openaccess.mpg.de/Berlin-Declaration

Metcalfe Hurst, E. (2023). LUX: Yale Collections Discovery. ARLIS/NA Multimedia & Technology Reviews, 2023(4), 1–4. https://doi.org/10.17613/3hy1-pv45

MJL. (2020). Creative commons license spectrum. https://commons.wikimedia.org/wiki/File:Creative_commons_license_spectrum.svg

Morrison, R. (2021). Redrawn slide from presentation of Ana Persic, Division of Science Policy and Capacity-Building (SC/PCB), UNESCO (France) presentation to Open Science Conference 2021, ZBW — Leibniz Information Centre for Economics, Germany. Own work. https://commons.wikimedia.org/wiki/File:Osc2021-unesco-open-science-no-gray.png

Mr Gee. (2023, October 12). Day 2 Closing – A multitude of tools. EuropeanaTech 2023. EuropeanaTech 2023, The Hague, Netherlands. https://youtu.be/pOX9CrvAG7I

Newbury, D. (2018). LOUD: Linked Open Usable Data and linked.art. 2018 CIDOC Conference, 1–11. https://cidoc.mini.icom.museum/wp-content/uploads/sites/6/2021/03/CIDOC2018_paper_153.pdf

OFS. (2023). Masterplan Open Government Data 2024-2027 (p. 24) [Masterplan OGD]. Office fédéral de la statistique. https://www.newsd.admin.ch/newsd/message/attachments/84864.pdf

Open Knowledge. (2016). The Open Data Handbook. Open Data Handbook. https://opendatahandbook.org/

Open Science Delegation. (2021a). Swiss National Open Research Data Strategy. swissuniversities. https://www.swissuniversities.ch/fileadmin/swissuniversities/Dokumente/Hochschulpolitik/ORD/Swiss_National_ORD_Strategy_en.pdf

Open Science Delegation. (2021b). Swiss National Strategy Open Research Data: Action Plan. swissuniversities. https://www.swissuniversities.ch/fileadmin/swissuniversities/Dokumente/Hochschulpolitik/ORD/ActionPlanV1.0_December_2021_def.pdf

Oxford English Dictionary. (2023a). Infrastructure. In Oxford English Dictionary (OED). Oxford University Press. https://doi.org/10.1093/OED/1206711036

Oxford English Dictionary. (2023b). Metadata. In Oxford English Dictionary (OED). Oxford University Press. https://doi.org/10.1093/OED/7968104326

Padilla, T., Allen, L., Frost, H., Potvin, S., Russey Roke, E., & Varner, S. (2017). Always Already Computational: Collections as Data. Collections as Data. https://doi.org/10.17605/OSF.IO/MX6UK

Padilla, T., Scates Kettler, H., & Shorish, Y. (2023). Collections as Data: Part to Whole (p. 19) [Final Report]. Always Already Computational - Collections as Data. https://doi.org/10.5281/zenodo.10161976

Padilla, T., Scates Kettler, H., Varner, S., & Shorish, Y. (2023). Vancouver Statement on Collections as Data [White paper]. Internet Archive Canada. https://doi.org/10.5281/zenodo.8341519

Page, M., Hajduk, E., Lincklaen Arriëns, E. N., Cecconi, G., & Brinkhuis, S. (2023). 2023 Open Data Maturity Report [ODM Report]. European Union. https://doi.org/10.2830/384422

Persic, A. (2021, February). Building a Global Consensus on Open Science – the future UNESCO Recommendation on Open Science. https://doi.org/10.5446/53434

Pichot Damon, E. (2024, January 12). Table périodique: Les facteurs de succès d’un projet d’open data. Open Datactivist. https://open.datactivist.coop/docs/tableau-periodique

Raemy, J. A. (2023). Characterising the IIIF and Linked Art Communities: Survey report (p. 29) [Report]. University of Basel. https://doi.org/10.5451/unibas-ep95340

Raemy, J. A., Gray, T., Collinson, A., & Page, K. R. (2023, July 12). Enabling Participatory Data Perspectives for Image Archives through a Linked Art Workflow (Poster). Digital Humanities 2023 Posters. Digital Humanities 2023, Graz, Austria. https://doi.org/10.5281/zenodo.7878358

Raemy, J. A., & Sanderson, R. (2023). Analysis of the Usability of Automatically Enriched Cultural Heritage Data (arXiv:2309.16635). arXiv. https://doi.org/10.48550/arXiv.2309.16635

Sanderson, R. (2018, May 15). Shout it Out: LOUD. EuropeanaTech Conference 2018, Rotterdam, the Netherlands. https://www.slideshare.net/Europeana/shout-it-out-loud-by-rob-sanderson-europeanatech-conference-2018

Sanderson, R. (2019). Keynote: Standards and Communities: Connected People, Consistent Data, Usable Applications. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 28. https://doi.org/10.1109/JCDL.2019.00009

Sanderson, R. (2023, October 13). Understanding Linked Art. Linked Art face-to-face meeting, Amsterdam, The Netherlands. https://www.slideshare.net/azaroth42/understanding-linked-art

Santos, A. (2020). Données de la recherche : cadre juridique et licences [Mémoire de master, HES-SO University of Applied Sciences and Arts, Haute école de gestion de Genève]. https://doi.org/10.5281/zenodo.3967402

Scholger, W. (2023, October 20). Legal Aspects of Arts and Humanities Data. DARIAH-CH Study Day 2023, Bern, Switzerland. https://www.dariah.ch/_files/ugd/8756fc_af72e01542284294ac0b7cf5c6064160.pdf

Star, S. L., & Griesemer, J. R. (1989). Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science, 19(3), 387–420. https://www.jstor.org/stable/285080

Star, S. L. (1999). The Ethnography of Infrastructure. American Behavioral Scientist, 43(3), 377–391. https://doi.org/10.1177/00027649921955326

Stürmer, M. E. (2016). Measuring the promise of open data: Development of the Impact Monitoring Framework. 1–12. https://doi.org/10.7892/boris.75031

Tennant, J., Agarwal, R., Baždarić, K., Brassard, D., Crick, T., Dunleavy, D. J., Evans, T. R., Gardner, N., Gonzalez-Marquez, M., Graziotin, D., Greshake Tzovaras, B., Gunnarsson, D., Havemann, J., Hosseini, M., Katz, D. S., Knöchelmann, M., Madan, C. R., Manghi, P., Marocchino, A., … Yarkoni, T. (2020). A tale of two ‘opens’: Intersections between Free and Open Source Software and Open Scholarship [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/2kxq8

Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

Image Credits

Cultural Anthropology Switzerland (CAS)

These images are part of the photographic archives of Cultural Anthropology Switzerland, formerly the Swiss Society for Folklore Studies, based in Basel, Switzerland. Licence: CC BY-NC 4.0

  • Brunner, Ernst. [Katze auf einer Mauer]. Ort und Datum unbekannt. Black and White Negative, 6x6cm. SGV_12 Ernst Brunner. SGV_12N_19553. Alte Bildnummer: HV 53. https://archiv.sgv-sstp.ch/resource/441788
  • Brunner, Ernst. [Ringtanz während der Masüras auf der Alp Sura]. Guarda, 1939. Black and White Negative, 6x6cm. SGV_12 Ernst Brunner. SGV_12N_08589. Alte Bildnummer: DL 89. https://archiv.sgv-sstp.ch/resource/430824

Classes take place over four Tuesday afternoons.

"In December 2007, thirty thinkers and activists of the Internet held a meeting in Sebastopol, north of San Francisco. Their aim was to define the concept of open public data and have it adopted by the US presidential candidates. Among them, were two well-known figures: Tim O’Reilly and Lawrence Lessig. The first is familiar to the techies: this American author and editor is the originator of many vanguard computer and Internet movements; he defined and popularized expressions such as the open source and Web 2.0. Lawrence Lessig, Professor of Law at Stanford University (California), is the founder of Creative Commons licenses, based on the idea of copyleft and free dissemination of knowledge. Participants of the Sebastopol meeting mostly come from the free software and culture movements. These movements are at the heart of many innovations in the field of computers and the Internet over the last fifteen years. Some of these innovations are now familiar – think of the collaborative encyclopedia Wikipedia. Other open source creations are less known to the general public despite playing a fundamental role in online services: for instance, the Apache software for the servers is used to host most websites. Some activists and entrepreneurs who already used public data were attending the Sebastopol meeting too: Adrian Holovaty (the founder of EveryBlock, a localized information service) and Briton Tom Steinberg (initiator of the FixMyStreet site). One of the youngest of the group was no other than the late Aaron Swartz, inventor of the RSS and free knowledge activist. Together, they created the principles that allow us today to define and evaluate open public data." Source: https://www.paristechreview.com/2013/03/29/brief-history-open-data/

The data is made available online free of charge, in a timely manner, in machine-readable form and in an open format

NaDB: The Federal Council expects to make data management in the public sector easier and more efficient by reusing data: persons and businesses will only need to report certain information once (once-only principle).

i14y: National Data Catalogue, which ensures the efficient exchange of data between authorities, companies and citizens.

The Digital Switzerland Division is part of the Federal Chancellery’s Digital Transformation and ICT Steering (DTI) Sector. The division coordinates the ongoing development and implementation of the Digital Switzerland Strategy.

Government data shall be considered open if it is made public in a way that complies with the principles below:

  1. Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
  2. Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
  3. Timely: Data is made available as quickly as necessary to preserve the value of the data.
  4. Accessible: Data is available to the widest range of users for the widest range of purposes.
  5. Machine processable: Data is reasonably structured to allow automated processing.
  6. Non-discriminatory: Data is available to anyone, with no requirement of registration.
  7. Non-proprietary: Data is available in a format over which no entity has exclusive control.
  8. License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
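As a rough illustration, the eight principles can be treated as a checklist applied to a dataset. The sketch below is only one possible way to do this: the class name `OpenDataCheck`, its field names and the helper `unmet_principles` are hypothetical and do not come from any standard or tool.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class OpenDataCheck:
    """One boolean per principle (hypothetical field names)."""
    complete: bool
    primary: bool
    timely: bool
    accessible: bool
    machine_processable: bool
    non_discriminatory: bool
    non_proprietary: bool
    license_free: bool


def unmet_principles(check: OpenDataCheck) -> List[str]:
    """Return the names of the principles the dataset does not satisfy."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


# Example: a dataset published late and only after registration.
example = OpenDataCheck(
    complete=True,
    primary=True,
    timely=False,
    accessible=True,
    machine_processable=True,
    non_discriminatory=False,
    non_proprietary=True,
    license_free=True,
)
print(unmet_principles(example))
```

A dataset would count as open under this definition only when the returned list is empty.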

Ownership and Control: Copyright grants the creator exclusive rights over their work, including the right to control how it's used, reproduced, and distributed. Others must obtain permission from the copyright holder to use the work, often requiring payment or adherence to specific conditions. Copyleft, on the other hand, is a concept in the realm of free and open-source software. It allows anyone to use, modify, and distribute the work, but with the stipulation that any derivative work must also be distributed under the same or compatible license terms. This ensures that the work and its derivatives remain free and open.

Purpose and Philosophy: Copyright is designed to protect the economic interests of creators by granting them exclusive rights to monetize their work. It supports the traditional model of intellectual property rights. Copyleft is motivated by the idea of promoting freedom and sharing of knowledge. It is intended to keep creative works accessible and reusable for the public, encouraging collaborative improvement and innovation.

Focus and Foundation: The Anglo-Saxon copyright system (common in countries like the U.S. and the U.K.) is largely focused on the economic rights of authors. It treats copyright as a type of property that can be bought, sold, or transferred, and emphasizes the monetary value of creative works. The European author's rights model places a stronger emphasis on the moral rights of the creator, alongside the economic rights. This includes the right to be recognized as the author of a work and to object to any distortion or modification that could harm the author's reputation.

Duration and Transferability: In the Anglo-Saxon system, copyright is often seen as a more transferable and commercial asset. The duration of copyright is typically based on a set number of years post-creation or the author's life plus a certain number of years. The European model tends to grant authors inalienable moral rights that remain with the creator regardless of the economic rights being sold or transferred. The duration can also vary, but it usually includes the author's lifetime plus a period after their death (commonly 70 years in many European countries).

The rights statements have been specifically developed for the needs of cultural heritage institutions and online cultural heritage aggregation platforms and are not intended to be used by individuals to license their own creations.

One of the specific features of this licence is that attribution is not required in the case of derived content that is not a database but is produced from one, such as graphics, diagrams or maps. [Santos 2020, citing Ball 2014]

Responsible AI Licenses (RAIL) are a class of licenses designed to encourage the responsible use of the AI artifact being licensed by including a set of use restrictions applied to that artifact. RAILs can be more or less restrictive depending on the aims of the licensor. For instance, a license can be RAIL while being a proprietary license, or a license just allowing the use of the AI feature for research purposes and without allowing distribution of derivative versions. In contrast, Open & Responsible AI Licenses (OpenRAIL) are a subclass of RAIL licenses that permit free-of-charge open access and re-use of AI artifacts for commercial purposes, while including usage restrictions. Note that usage restrictions in RAIL licenses also apply to any derivatives of the AI artifact. RAILs can be used to license data (D), apps (A), models (M), and source code (S): depending on the AI feature(s) being licensed, the suffix D, A, M, or S is added.
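The suffix convention can be sketched in a few lines of Python. This is an illustration only: the function `rail_license_name`, the artifact keys, the hyphen separator and the D, A, M, S letter ordering are assumptions made for this example, not an official naming tool.

```python
from typing import Iterable

# Artifact kinds and their suffix letters, in the order listed in the note above.
ARTIFACT_SUFFIXES = {"data": "D", "app": "A", "model": "M", "source": "S"}


def rail_license_name(artifacts: Iterable[str], open_license: bool = False) -> str:
    """Compose a RAIL-style license label (hypothetical helper).

    artifacts: kinds being licensed, drawn from ARTIFACT_SUFFIXES keys.
    open_license: True for the OpenRAIL subclass.
    """
    requested = set(artifacts)
    unknown = requested - set(ARTIFACT_SUFFIXES)
    if unknown:
        raise ValueError(f"Unknown artifact kind(s): {sorted(unknown)}")
    if not requested:
        raise ValueError("At least one artifact kind is required")
    # Keep a stable letter order regardless of the input order.
    suffix = "".join(
        letter for kind, letter in ARTIFACT_SUFFIXES.items() if kind in requested
    )
    prefix = "OpenRAIL" if open_license else "RAIL"
    return f"{prefix}-{suffix}"


print(rail_license_name(["model"], open_license=True))
print(rail_license_name(["source", "data"]))
```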

Definition of an infrastructure according to Susan Leigh Star

Three of the nine dimensions:

  • Embeddedness: Infrastructure is sunk into and inside of other structures, social arrangements, and technologies. People do not necessarily distinguish the several coordinated aspects of infrastructure.
  • Links with conventions of practice: Infrastructure both shapes and is shaped by the conventions of a community of practice.
  • Embodiment of standards: Modified by scope and often by conflicting conventions, infrastructure takes on transparency by plugging into other infrastructures and tools in a standardised fashion.

While standards provide the "what" and "why" of metadata, schemas offer the "how" for specific data types or field needs.

DCAT-AP CH is a subprofile of DCAT-AP

  • Enhances Usability: Makes data more accessible and understandable to a wider audience, including non-experts and developers.
  • Facilitates Data Quality and Trust: Offers transparency about the data’s sources, methodologies, and underlying code, building trust among users.
  • Supports Data Integration and Development: Helps in combining data from different sources and in the development of applications using the open data.

Set of rules and standards that govern the exchange and accessibility of open data through the internet.

  • Enables Accessibility: Facilitates easy and standardized access to open data, essential for fostering innovation and transparency.
  • Supports Interoperability: Ensures that open data from various sources can be integrated and used together efficiently.

In a nutshell, OA refers to the practice of providing unrestricted access via the Internet to peer-reviewed scholarly research.

Relevance in Arts and Humanities

  • Addresses complex cultural materials and narratives with societal implications.
  • Promotes cultural understanding and engagement with broader societal issues.

It emphasises users' rights and community benefits, going beyond mere practical advantages. Key projects include the GNU Operating System and Free Software Directory.

FLOSS merges the social and ethical emphasis of Free Software with the pragmatic model of Open Source.

Data oriented principles

Indigenous data sovereignty reinforces the rights to engage in decision-making in accordance with Indigenous values and collective interests.

Data ecosystems shall be designed and function in ways that enable Indigenous Peoples to derive benefit from the data.

Indigenous Peoples’ rights and interests in Indigenous data must be recognised and their authority to control such data be empowered. Indigenous data governance enables Indigenous Peoples and governing bodies to determine how Indigenous Peoples, as well as Indigenous lands, territories, resources, knowledges and geographical indicators, are represented and identified within data.

Those working with Indigenous data have a responsibility to share how those data are used to support Indigenous Peoples’ self-determination and collective benefit. Accountability requires meaningful and openly available evidence of these efforts and the benefits accruing to Indigenous Peoples.

Indigenous Peoples’ rights and wellbeing should be the primary concern at all stages of the data life cycle and across the data ecosystem.

FAIR and CARE are complementary perspectives which enable maximum value through the appropriate and ethical reuse of Indigenous data. However, assessing the FAIR-ness of a data set is typically a technical exercise which can be done independently by the researcher to prepare the final data set for reuse. On the other hand, the CARE Principles require engagement with people to address the cultural, ethical, legal, and social dimensions associated with the intended uses of the dataset. As Indigenous communities expect CARE-full data practices to be enacted at each step of the data lifecycle, we will need to reflect a broader temporal dimension to our application of the CARE Principles. At present there is no process to assess whether a research project meets the CARE Principles. Creating such an assessment represents the next stage towards an equitable cyberinfrastructure that supports the FAIR and CARE-full use of Indigenous data. [Carroll et al. 2021]

The statement highlights the growing global engagement with collections as data. It promotes the responsible computational use of collections to empower memory, knowledge and data practitioners. It emphasises ethical concerns, openness and participatory design, as well as the need for transparent documentation and sustainable infrastructure. The statement, comprising ten recommendations, also recognises the potential impact of data consumption by AI, and the importance of considering climate impacts and exploitative labour.

Purpose: The purpose of this study is to offer a checklist that can be used for both creating and evaluating digital collections, which are also sometimes referred to as data sets as part of the collections as data movement, suitable for computational use.

Design/methodology/approach: The checklist was built by synthesising and analysing the results of relevant research literature, articles and studies and the issues and needs obtained in an observational study. The checklist was tested and applied both as a tool for assessing a selection of digital collections made available by galleries, libraries, archives and museums (GLAM) institutions as proof of concept and as a supporting tool for creating collections as data.

Findings: Over the past few years, there has been a growing interest in making available digital collections published by GLAM organisations for computational use. Based on previous work, the authors defined a methodology to build a checklist for the publication of Collections as data. The authors’ evaluation showed several examples of applications that can be useful to encourage other institutions to publish their digital collections for computational use.

Originality/value: While some work on making available digital collections suitable for computational use exists, giving particular attention to data quality, planning and experimentation, to the best of the authors’ knowledge, none of the work to date provides an easy-to-follow and robust checklist to publish collection data sets in GLAM institutions. This checklist intends to encourage small- and medium-sized institutions to adopt the collections as data principles in daily workflows following best practices and guidelines.

Linked Data refers to a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF, and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.

Linked Open Data is a subset of Linked Data that is open, meaning it is freely accessible and reusable by anyone. It adheres to the principles of being accessible under an open license, available in a machine-readable format, using open standards from the W3C (such as RDF and SPARQL), and linked to other datasets to increase its utility.

The Web, which has aspired to be a Semantic Web for several years now, has as its centrepiece the Resource Description Framework (RDF), a general method for describing and exchanging graph data. The Semantic Web offers major opportunities for scholarship because it allows data to be reasoned over together, that is, to be interpreted by machines through RDF-based ontologies, which are a formal way to represent human-like knowledge.

With RDF, everything goes in threes: the data model consists of so-called triples (subject, predicate, object) that together form graphs. Most of the components of these triples use Uniform Resource Identifiers (URIs) and are generally web-addressable, whether for naming subjects and objects (which may themselves also be objects of other triples) or relationships.
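
To make the triple model concrete, here is a minimal sketch in plain Python (no RDF library). It stores a few triples, shows one resource acting as the object of one triple and the subject of another, and prints them in N-Triples syntax. The `https://example.org/` namespace, the sample dataset and the helper names `match` and `to_ntriples` are assumptions made for illustration; only the Dublin Core Terms and FOAF property URIs are real vocabularies.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a URI, or a literal written with surrounding double quotes


EX = "https://example.org/"          # hypothetical namespace for this sketch
DCT = "http://purl.org/dc/terms/"    # Dublin Core Terms
FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF vocabulary

# A tiny graph: the publisher is the object of one triple
# and the subject of another, which is what links the data together.
graph = {
    Triple(EX + "dataset/1", DCT + "title", '"Tree inventory"'),
    Triple(EX + "dataset/1", DCT + "publisher", EX + "org/city"),
    Triple(EX + "org/city", FOAF + "name", '"City administration"'),
}


def match(triples, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples line (URIs in angle brackets)."""
    def term(x: str) -> str:
        return x if x.startswith('"') else f"<{x}>"
    return f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."


if __name__ == "__main__":
    for triple in match(graph, s=EX + "dataset/1"):
        print(to_ntriples(triple))
```

In practice a dedicated library and a SPARQL endpoint would replace the `match` helper; the sketch only illustrates the subject, predicate, object structure.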

IIIF is a community-driven initiative that brings together key players in the academic and cultural heritage (CH) fields. It has defined open and shared APIs to standardise the way in which image-based resources are delivered on the Web. Implementing the IIIF APIs enables institutions to make better use of their digitised or born-digital material by providing, for instance, deep zooming, comparison, full-text search of OCR objects or annotation capabilities.
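
As an illustration of what a shared API means in practice, the IIIF Image API addresses every image request with the same URL pattern: `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such URLs in Python. The server `https://example.org/iiif` and the identifier are hypothetical, the function names are chosen for this example, and the default `size="max"` assumes a server that supports that keyword (Image API 3.0; older 2.x servers typically use `full`).

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API request URL from its path segments."""
    # Identifiers must be percent-encoded, including any slashes they contain.
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


def iiif_info_url(base: str, identifier: str) -> str:
    """Build the URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://example.org/iiif"  # hypothetical image server
    # Whole image at the largest size the server offers.
    print(iiif_image_url(base, "abc/123"))
    # A 500 x 500 pixel detail from the top-left corner, scaled to 250 px wide.
    print(iiif_image_url(base, "abc/123", region="0,0,500,500", size="250,"))
    print(iiif_info_url(base, "abc/123"))
```

Because the region and size are expressed in the URL itself, a viewer can request only the tiles it needs, which is what makes deep zooming possible.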

Organisations

And individuals/meetings

So why do we need IIIF? Digital images are fundamental carriers of information across the fields of cultural heritage, STEM, and others. They help us understand complex processes through visualization. They grab our attention and help us quickly understand abstract concepts. They help document the past and the present, and preserve them for the future. They are also ubiquitous: we interact with thousands of them every day both in real life and on the web. In short, images are important and we interact with large volumes of them online.

Image 1: Female Figurine, Chupicuaro, 500/300 B.C.
Image 2: Vision of Saint Gregory, unknown artist, n.d.
Image 3: Iyo Province: Saijo, Utagawa Hiroshige, 1855

- Linked Art is focused on usability, not full precision / completeness
- Consistently solves actual challenges from real data
- Development is iterative, as new use cases are found

A Periodic Table of Open Data Elements details the enabling conditions and disabling factors that often determine the impact of open data initiatives. It has five main elements:

- Problem and Demand Definition
- Capacity and Culture
- Governance
- Partnerships
- Risks

While Kafka, Flink, and NiFi each serve distinct purposes in data streaming and processing – Kafka for data integration and transportation, Flink for in-depth processing and analytics, and NiFi for flow management and data routing – their combined use can create a comprehensive, efficient, and robust data management architecture.

LUX enhances and prepares Yale collections data for further collaboration, use, and re-use. With the support of a well-resourced research university–including Yale’s Vice-Provost office–in addition to the support of active committees with members across Yale and a meticulous technical team, LUX is well-positioned to help bridge gaps and create more accessible and diverse representations of cultural heritage collections. [Metcalfe Hurst 2023]

In summary, open data is more than a concept; it's a dynamic ecosystem that thrives on principles, collaboration, and continuous evolution. Its impact spans from enhancing transparency to empowering public engagement, and its future hinges on effective funding, skill development, and collaborative efforts.