Julien A. Raemy | 7C2-CT-4A Introduction to Open Data |
Open Access
Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities
Authors and right holders must grant all users a free, irrevocable, worldwide, right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship as well as the right to make small numbers of printed copies for their personal use.
A complete version of the work and all supplemental materials, including a copy of the permission as stated above, in an appropriate standard electronic format is deposited in at least one online repository using suitable technical standards.
[Max Planck Society & European Cultural Heritage Online 2003]
Open Access
Definition
Open access (OA) is a broad international movement that seeks to grant free and open online access to academic information, such as publications and data. A publication is defined as 'open access' when there are no financial, legal or technical barriers to accessing it, that is to say when anyone can read, download, copy, distribute, print, search for and search within the information, or use it in education or in any other way within the legal agreements.
Open Access
Definition
OA is a publishing model for scholarly communication that makes research information available to readers at no cost, as opposed to the traditional subscription model in which readers have access to scholarly information by paying a subscription (usually via libraries).
Open Access
Gold Open Access
Publications are made freely accessible by the publisher immediately upon publication. It often involves Article Processing Charges (APCs) paid by the author, their institution, or a funder.
→ Immediate OA via publisher
Green Open Access (Self-Archiving)
Authors publish their work in any journal and then self-archive a version of the article (typically the pre-print or the accepted manuscript) for free public use in a repository (sometimes after an embargo period).
→ Immediate or delayed OA via self-archiving method/repository
Open Access
Hybrid Open Access
Subscription-based journals allow authors to make their individual articles OA upon payment of an APC.
→ Immediate OA via publisher
Diamond/Platinum Open Access
Journals do not charge authors APCs and provide immediate OA to all their articles. It operates without direct cost to the authors; funding often comes from institutions, societies, or donations.
→ Immediate OA via publisher
Open Access
Bronze Open Access
Articles made freely accessible on the publisher's website without an explicit OA licence.
Blue Open Access
Through blue OA, authors can archive the post-print or the publisher’s final version.
Black Open Access
It refers to the unauthorised distribution of published content through various channels, such as pirate sites or peer-to-peer networks.
Open Science
Definition
Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.
[FOSTER 2019]
Open Science
[Morrison 2021, citing Persic 2021]
Open Scholarship
Open Scholarship: Expanding the Reach of Open Science
Broader Approach
Extends beyond traditional scientific disciplines to include arts and humanities.
Engages not just the research community but also the wider public, including non-experts, educators, and policymakers.
Supporting Collaboration and Innovation
Facilitates interdisciplinary collaboration across arts, humanities, and other fields.
Encourages the use of open educational resources for collaborative teaching and learning.
Advances open data practices for the sharing and reuse of cultural heritage resources.
[Tennant et al. 2020]
Open Source
Definition and Philosophy
Open Source refers to software with source code that can be inspected, modified, and enhanced by anyone. It emphasises collaboration and community-oriented development.
Key Characteristics
It includes free redistribution, access to source code, and allowance for derived works.
Open Source
Criteria
Free redistribution
Source code must be included
Derived works must be allowed
Integrity of the author's source code
No discrimination against persons or groups
Free Software
Free Software is centred around the idea of user freedom – the freedom to run, study, change, and distribute the software. "Free" refers to freedom, not price.
It has four essential freedoms
The freedom to run the program as you wish, for any purpose (freedom 0).
The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
The freedom to redistribute copies so you can help others (freedom 2).
The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
Free Software
“Free software” means software that respects users' freedom and community. Roughly, it means that the users have the freedom to run, copy, distribute, study, change and improve the software. Thus, “free software” is a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech,” not as in “free beer.” We sometimes call it “libre software,” borrowing the French or Spanish word for “free” as in freedom, to show we do not mean the software is gratis.
F(L)OSS
Free/Libre and Open Source Software
This is software for which the licensee can get the source code, and is allowed to modify this code and to redistribute the software and the modifications. Many terms are used: free, referring to the freedom to use (not to “free of charge”), libre, which is the French translation of Free/freedom, and which is preferred by some writers to avoid the ambiguous reference to free of charge, and open source, which focuses more on the access to the sources than on the freedom to redistribute. In practice, the differences are not great, and more and more scholars are choosing the term FLOSS to name this whole movement.
An Open Vision of the Web
The [World Wide Web] project merges the techniques of information retrieval and hypertext to make an easy but powerful global information system. The project started with the philosophy that much academic information should be freely available to anyone.
[Berners-Lee 1991]
Linked Data
Linked Data refers to a set of best practices for publishing structured data on the Web.
Linked Data Principles
Use Uniform Resource Identifiers (URIs) as names for things
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (e.g. RDF, RDFS, SPARQL, etc.)
Include links to other URIs so that they can discover more things.
[Berners-Lee 2006]
The Semantic Web or the Web of Data
The Semantic Web is an extension of the World Wide Web, through standards, to make it machine-readable.
Tweaked Semantic Web Layer Cake [Idehen 2017]
Resource Description Framework (RDF)
With RDF, everything goes in threes: the data model consists of triples of the form subject, predicate, object. Most of a triple's components are identified by Uniform Resource Identifiers (URIs).
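A minimal sketch of a triple in Python using the rdflib library; the resource URI and the schema.org properties are illustrative assumptions, not data from the course:

# Minimal subject-predicate-object example with rdflib (pip install rdflib).
# The URIs below are hypothetical and only illustrate the triple pattern.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

g = Graph()
photo = URIRef("https://example.org/photo/08589")                 # subject
g.add((photo, RDF.type, SCHEMA.Photograph))                       # predicate rdf:type, object schema:Photograph
g.add((photo, SCHEMA.name, Literal("Ringtanz auf der Alp Sura", lang="de")))

print(g.serialize(format="turtle"))                               # prints the graph as Turtle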
Linked Open Usable Data (LOUD)
The concept of LOUD extends LOD by emphasising not just the openness and interlinking of data but also its usability.
LOUD
The term was coined by Robert Sanderson [2018, 2019], who has been involved in the conception and maintenance of web standards, mainly in the cultural heritage field.
LOUD's goal is to achieve the Semantic Web's intent on a global scale in a usable fashion by leveraging community-driven and JSON-LD-based specifications.
It has five main design principles intended to make the data more easily accessible to software developers, who play a key role in interacting with the data and building software and services on top of it, and, to some extent, to academics.
IIIF Community
State and National Libraries: Bavarian State Library, French National Library (BnF), British Library, National Library of Estonia, New York Public Library, Vatican Library, etc.
Archives: Blavatnik Foundation Archive, Indigenous Digital Archive, Internet Archive, Swedish National Archives, Swiss Federal Archives, etc.
Museums & Galleries: Art Institute Chicago, J. Paul Getty Trust, Smithsonian, Victoria & Albert Museum, MIT Museum, National Gallery of Art, Van Gogh Worldwide, etc.
Universities & Research Institutions: Cambridge, Cornell University, Ghent University, Swiss National Data and Service Center for the Humanities (DaSCH), Kyoto University, Oxford, Stanford, University of Toronto, Yale University, etc.
Aggregators/Facilitators: Europeana, Cuba-IIIF, Cultural Japan, OCLC ContentDM, etc.
IIIF Community
2019 IIIF Conference, Göttingen, Germany
IIIF Community Practices
IIIF Community Practices
Images are fundamental carriers of information
The Problem
A world of silos and duplication
Image delivery on the Web has historically been hard, slow, expensive, disjointed, and locked-up in silos.
The Problem
IIIF Presentation API
IIIF Presentation API
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://iiif.participatory-archives.ch/SGV_12N_08589/manifest.json",
  "type": "Manifest",
  "label": {"en": ["[Ringtanz während der Masüras auf der Alp Sura]"]},
  "metadata": [
    {
      "label": {"de": ["Titel"]},
      "value": {"de": ["[Ringtanz während der Masüras auf der Alp Sura]"]}
    },
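A hedged illustration of how such a manifest can be consumed programmatically: the Python sketch below fetches the manifest shown above and reads its label and canvases, assuming network access and a Presentation API 3.0 structure in which canvases sit under the top-level items property.

# Sketch: read a IIIF Presentation API 3.0 manifest with requests (pip install requests).
import requests

url = "https://iiif.participatory-archives.ch/SGV_12N_08589/manifest.json"
manifest = requests.get(url, timeout=10).json()

print(manifest["type"])                  # "Manifest"
print(manifest["label"]["en"][0])        # human-readable label
canvases = manifest.get("items", [])     # in version 3.0, canvases are listed under "items"
print(f"{len(canvases)} canvas(es) in this manifest")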
Linked Art
Linked Art is a community and a CIDOC (ICOM International Committee for Documentation) Working Group collaborating to define a metadata application profile for describing cultural heritage, and the technical means for conveniently interacting with it (the API).
Linked Art Community
Institutions (some of them)
The American Numismatic Society, Europeana, The Frick Collection, J. Paul Getty Trust, The Metropolitan Museum of Art, The Museum of Modern Art (NY), National Gallery of Art (US), Oxford University (OERC), The Philadelphia Museum of Art, Rijksmuseum (NL), University of Basel (Digital Humanities Lab), University of the Arts London, Victoria and Albert Museum, Yale Center for British Art
Getty Vocabularies, mainly the Art & Architecture Thesaurus (AAT), as well as the Thesaurus of Geographic Names (TGN) and the Union List of Artist Names (ULAN)
Profile
Object-based cultural heritage (mainly art museum oriented)
API
JSON-LD 1.1, following REST (representational state transfer) and web patterns
Linked Art from 50k feet
[Raemy et al. 2023, adapted from Sanderson 2018]
Linked Art
Digital Object
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://linked.art/example/digital/0",
  "type": "DigitalObject",
  "_label": "Digital Image of Self-Portrait of Van Gogh",
  "classified_as": [
    {
      "id": "http://vocab.getty.edu/aat/300215302",
      "type": "Type",
      "_label": "Digital Image"
    }
  ],
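A small Python sketch that rebuilds the DigitalObject above as a dictionary and serialises it as JSON-LD; only the fields visible on the slide are included, nothing further is assumed.

# Sketch: assemble the Linked Art DigitalObject shown above and dump it as JSON-LD.
import json

digital_object = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://linked.art/example/digital/0",
    "type": "DigitalObject",
    "_label": "Digital Image of Self-Portrait of Van Gogh",
    "classified_as": [
        {
            "id": "http://vocab.getty.edu/aat/300215302",  # AAT concept used for "digital image"
            "type": "Type",
            "_label": "Digital Image",
        }
    ],
}

print(json.dumps(digital_object, indent=2))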
Linked Art Digital Integration (with IIIF)
LOUD-Driven Infrastructure
[Felsing et al. 2023]
Linked Open Usable Data (LOUD)
LOUD in a nutshell
The grassroots development of IIIF and Linked Art, grounded in collaboration and transparency, is one of the key factors, but implementations need to be conducted in parallel with the specifications (specifications versus demonstrability).
LOUD standards, when used in conjunction, enhance semantic interoperability, even if this comes at the cost of ontological purity.
LOUD practices and standards should serve as common denominators for cultural heritage institutions, public bodies as well as research projects.
Exercise – Identifying the Movements and Principles
Match the movements and principles to these statements
In pairs or small groups, relate the movements and principles (OA, Open Data, Open Science, FLOSS, FAIR, CARE, Collections as Data, LOUD) to the following propositions (multiple answers possible).
Software development
Image dissemination
Metadata dissemination
Publication of scientific articles
Persistent identifier assignment
Digital object interoperability
Exercise – Identifying the Movements and Principles
Development and maintenance of APIs
Documentation
Open license
Inclusivity
Machine-readable (meta)data
Collaboration
Ethical commitment
Semantic interoperability
Reusability
Platforms and Organisations
Open Data Platforms
ORD Platforms
Registry: re3data.org
Cross-disciplinary: SwissUBase
Humanities: DaSCH Service Platform
Cross-institutional: OLOS
Institutional: Yareta
Generic: Zenodo
re3data.org
Registry of Research Data Repositories
Platform launched in 2012
Registry that includes data repositories from various academic disciplines
Embeddable widgets and tools
Additional information recorded per repository: data accessibility, terms of use and licences, policy, Persistent Identifier (PID) system, certification
All metadata are available for open use under CC0. The registry also provides an API to access its content.
OLOS
Consultation and archive portal of Switzerland
National instance deployed in 2021
Developed as part of the Data Life-Cycle Management (DLCM) project and operated by an association composed of the University of Fribourg, the HEG-GE and the HES-SO
Dataset description based on the DataCite Metadata Schema
Open Data Platforms
OGD Platforms
Switzerland
National: opendata.swiss
Cantonal: Open Data Basel-Stadt
Municipal: Stadt Zürich Open Data
Public-Law Body: Open Data Portal of Geneva Public Transport
International
EU: European Data
USA: DATA.GOV
opendata.swiss
Swiss public administration’s central portal for OGD
National platform launched in 2013 (first as opendata.admin.ch) under the direction of the Swiss Federal Archives. It has existed as opendata.swiss since 2016 and has been overseen by the Federal Statistical Office since 2019. It provides an overview of OGD published in Switzerland and is a joint project of the Confederation and the cantons.
Open Data Basel-Stadt
Canton of Basel-Stadt's OGD
Cantonal platform officially launched in 2019 (pilot project in 2017-2018)
Platform based on Opendatasoft
Own metadata schema comprising some DCAT and DCAT-AP CH properties. JSON API to explore the catalogue and the datasets. Dedicated dashboard. Opaque identifier per dataset.
Open Data Portal of Geneva Public Transport
opendata.tpg
Platform launched in 2022. First open data initiative in 2015 through its real-time transit data API.
Democratisation process: transparency, efficiency, innovation, and citizen participation
Several metadata schemes and download possibilities including DCAT in RDF/XML. Dataset schema in JSON which comprises GeoJSON, a format for encoding a variety of geographic data structures. Non-opaque identifier per dataset.
Metadata displayed using DCAT-AP (currently version 2.1.1) and accessible through a variety of APIs and a SPARQL endpoint (see documentation). Opaque identifier per dataset.
Exercise – Comparing Open Data Portals
Short comparative analysis of open data portals
In pairs or small groups, you will conduct a comparative analysis of one ORD portal and one OGD portal, neither of which has been previously discussed in our course. Your analysis will involve comparing these portals with similar ones that have already been presented, based on specific criteria.
Choose one ORD portal and one OGD portal now. Announce your chosen portals to ensure no overlap.
Dimensions to conduct the analysis: Launch Year, Purpose and Theme, Data Types, Access, Metadata Standards, Dataset Identifiers
Prepare a concise 5-minute presentation of your findings (with or without visual aids)
Assessment, Data Quality, and Best Practices
Open Data Maturity (ODM)
Annual assessment
Assessment carried out by the EU since 2015 to measure the progress of European countries in promoting and facilitating the availability and reuse of public sector information (→ mostly OGD).
Policy – It investigates the open data policies and strategies in place in the participating countries, the national governance models for managing open data and the measures applied to implement those policies and strategies.
Impact – It analyses the willingness, preparedness and ability of countries to measure both the reuse of open data and the impact created through this reuse.
Open Data Maturity (ODM)
Annual assessment
Portal – It investigates the functionality of national open data portals, the extent to which users’ needs and behaviour are examined to improve the portal, the availability of open data across different domains and the approach to ensuring the portal’s sustainability.
Quality – It assesses the measures adopted by portal managers to ensure the systematic harvesting of metadata, the monitoring of metadata quality and compliance with the DCAT-AP metadata standard, and the quality of deployment of the published data on the national portal.
ODM 2023
35 participating countries: EU-27, 3 European Free Trade Association Countries (Iceland, Norway and Switzerland), 5 candidate countries (Bosnia and Herzegovina, Montenegro, Albania, Serbia and Ukraine)
Switzerland is 24th with an Open Data Maturity of 79%.
Techniques
Data Scraping
Process of extracting data from websites or other online sources, typically using automated software or scripts.
Advantages: It allows for efficient data collection from multiple sources, can automate repetitive tasks, and is capable of handling large volumes of data.
Challenges: Data scraping faces issues like website layout changes, legal and ethical considerations, as well as handling dynamic content loaded through JavaScript.
Examples
Extracting exhibition data from museum websites using Beautiful Soup, a Python library;
Scraping historical records or archives from government websites with Scrapy.
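A minimal sketch of the Beautiful Soup example above; the URL and the HTML structure (list items with an exhibition class) are hypothetical placeholders for a real museum site.

# Sketch: scrape exhibition titles with requests + Beautiful Soup
# (pip install requests beautifulsoup4). URL and markup are hypothetical.
import requests
from bs4 import BeautifulSoup

url = "https://example-museum.org/exhibitions"       # hypothetical page
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Assumed markup: <li class="exhibition"><h3>Title</h3>...</li>
for item in soup.select("li.exhibition h3"):
    print(item.get_text(strip=True))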
Techniques
API Integration
API integration involves connecting to external services and pulling data from them.
Advantages: It provides structured and often real-time access to data, allows for automation, and ensures data consistency and reliability.
Challenges: Complexity in handling API limits/rate limiting, maintaining integration after API updates, and managing data from disparate APIs.
Examples
Integrating social media data from platforms like Instagram, LinkedIn or Mastodon;
Retrieving weather information from meteorological APIs.
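As a sketch of the rate-limiting challenge mentioned above, the snippet below calls a hypothetical REST endpoint with requests and backs off when the server answers HTTP 429; the endpoint and parameters are placeholders.

# Sketch: call a (hypothetical) REST API and respect rate limits with a simple backoff.
import time
import requests

def fetch_json(url, params=None, max_retries=3):
    """GET a JSON resource, retrying with backoff when the API answers HTTP 429."""
    for attempt in range(max_retries):
        response = requests.get(url, params=params, timeout=10)
        if response.status_code == 429:                          # rate limited
            wait = int(response.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("Rate limit not lifted after retries")

data = fetch_json("https://api.example.org/v1/observations", params={"city": "Basel"})
print(data)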
Techniques
Data Mining
Data mining is the process of analysing large datasets to discover patterns, correlations, and insights.
Advantages: Helps in identifying trends, making predictions, and informing decision-making processes; can uncover hidden patterns in data.
Challenges: Requires significant computational resources, potential privacy concerns, and the need for skilled interpretation of results.
Examples
Analysing visitor data patterns using RapidMiner, a Java-based data science platform;
Mining public opinion data from government surveys with WEKA (Waikato Environment for Knowledge Analysis), a Java-based Machine Learning software.
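RapidMiner and WEKA are graphical platforms; as a scriptable stand-in for the same idea, the hedged sketch below clusters synthetic visitor data with scikit-learn's KMeans.

# Sketch: pattern discovery on synthetic visitor data with scikit-learn,
# used here as a Python substitute for GUI tools such as RapidMiner or WEKA.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
durations = rng.normal(loc=45, scale=15, size=200)   # visit duration in minutes (synthetic)
rooms = rng.integers(1, 15, size=200)                # number of rooms visited (synthetic)
visits = np.column_stack([durations, rooms])

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(visits)
print("Cluster sizes:", np.bincount(model.labels_))
print("Cluster centres:\n", model.cluster_centers_)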
Techniques
Data wrangling/munging
Data wrangling, or munging, involves transforming and mapping raw data into a more structured and usable format.
Advantages: Makes data more accessible and useful for analysis, helps in cleaning and standardising data, and improves data quality.
Challenges: Time-consuming, requires expertise in data manipulation, and can be complex with large and diverse datasets.
Examples
Formatting and combining different datasets for a research project using Python's pandas library;
Harmonising open government datasets from different departments for comparative analysis.
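A minimal pandas sketch of the wrangling example above; the file names and columns are hypothetical.

# Sketch: clean and combine two (hypothetical) CSV exports with pandas.
import pandas as pd

visits = pd.read_csv("museum_visits_2022.csv")   # e.g. columns: date, venue, visitors
events = pd.read_csv("public_events_2022.csv")   # e.g. columns: date, venue, event_name

for df in (visits, events):
    df.columns = df.columns.str.strip().str.lower()               # normalise headers
    df["date"] = pd.to_datetime(df["date"], errors="coerce")      # enforce a common date type

combined = visits.merge(events, on=["date", "venue"], how="left")
combined = combined.dropna(subset=["visitors"]).drop_duplicates()
print(combined.head())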
Techniques
Data Integration
Data integration involves combining data from different sources to provide a unified view of the data.
Advantages: Provides a comprehensive view of data, enhances data usability and analysis, and supports better decision-making.
Challenges: Managing data format and schema discrepancies, ensuring data quality and consistency, and handling large-scale integration.
Examples:
Combining spatial data from various archaeological digs and historical GIS databases for comprehensive mapping and analysis;
Combining financial data from various business units.
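A hedged sketch of schema-level integration with pandas: two hypothetical site inventories with different column names are mapped to a common schema and stacked into one unified table.

# Sketch: integrate two (hypothetical) archaeological inventories with different schemas.
import pandas as pd

dig_a = pd.DataFrame({
    "site_id": ["A-01", "A-02"], "lat": [47.55, 47.56], "lon": [7.59, 7.60], "period": ["Roman", "Medieval"],
})
dig_b = pd.DataFrame({
    "code": ["B-11"], "latitude": [46.95], "longitude": [7.44], "epoch": ["Roman"],
})

# Map the second schema onto the first, then stack both sources with a provenance column.
dig_b = dig_b.rename(columns={"code": "site_id", "latitude": "lat", "longitude": "lon", "epoch": "period"})
unified = pd.concat([dig_a.assign(source="dig_a"), dig_b.assign(source="dig_b")], ignore_index=True)
print(unified)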
Techniques
Stream processing
Stream processing is the technique of processing data in real-time as it flows in streams from various sources.
Advantages: Enables real-time data analysis and decision-making, can handle high throughput, and is suitable for time-sensitive data.
Challenges: Requires handling data velocity and volume, ensuring system scalability and reliability, and managing out-of-order data streams.
Examples:
Real-time analysis of social media feeds using Apache Kafka;
Processing live public transport data for city management using Apache Flink.
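A minimal consumer sketch using the kafka-python client; it assumes a broker on localhost:9092, a hypothetical transport-positions topic and JSON-encoded messages (Apache Flink would take over for richer stateful processing).

# Sketch: consume a (hypothetical) live public-transport topic with kafka-python
# (pip install kafka-python).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transport-positions",                                  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:                                    # blocks and yields records as they arrive
    position = message.value
    print(position.get("vehicle_id"), position.get("delay_seconds"))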
Techniques
Data Quality Management
Data quality management involves ensuring the accuracy, completeness, and reliability of data in a dataset.
Advantages: Increases the trustworthiness of data, improves decision-making, and reduces the risk of errors in data analysis.
Challenges: Continuously maintaining data quality, especially with large and evolving datasets, and integrating quality management into existing processes.
Examples
Using OpenRefine to clean and standardise metadata across different collections of historical artifacts;
Ensuring accuracy in patient data in healthcare databases.
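OpenRefine is used interactively; as a scriptable counterpart, the hedged pandas sketch below runs a few basic quality checks on a hypothetical metadata export.

# Sketch: basic data-quality checks with pandas on a (hypothetical) metadata file.
import pandas as pd

records = pd.read_csv("artefact_metadata.csv")   # hypothetical export with object_id, title, year, material

report = {
    "rows": len(records),
    "duplicate_ids": int(records["object_id"].duplicated().sum()),
    "missing_titles": int(records["title"].isna().sum()),
    "years_out_of_range": int((~records["year"].between(1000, 2024)).sum()),
}
print(report)

# Simple standardisation: trim whitespace and harmonise case in a free-text column.
records["material"] = records["material"].str.strip().str.lower()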
Techniques
Extract, Transform, Load (ETL)
ETL is a process where data is extracted from various sources, transformed into a suitable format, and loaded into a target system.
Advantages: Facilitates data consolidation, supports complex data transformations, and enables effective data storage and analysis.
Challenges: Managing data from disparate sources, ensuring data transformation accuracy, and maintaining ETL process performance.
Examples
Extracting economic and demographic data from various government departments using Apache NiFi, transforming it for consistency, and loading it into an aggregated portal (see the sketch below).
Analysing sales data from different systems.
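A compact sketch of the three ETL steps in Python with SQLite as the target; file, column and table names are hypothetical, and a tool such as Apache NiFi would orchestrate this at scale.

# Sketch: a tiny extract-transform-load pipeline with pandas and SQLite.
import sqlite3
import pandas as pd

# Extract: read a source export.
raw = pd.read_csv("demographics_by_commune.csv")

# Transform: harmonise column names and types, drop unusable rows.
raw.columns = raw.columns.str.lower().str.replace(" ", "_")
raw["population"] = pd.to_numeric(raw["population"], errors="coerce")
clean = raw.dropna(subset=["population"])

# Load: write into a target database table.
with sqlite3.connect("aggregated_portal.db") as conn:
    clean.to_sql("demographics", conn, if_exists="replace", index=False)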
Comprehensive Knowledge Archive Network (CKAN)
Open Source Data Management System (DMS)
CKAN is an open source DMS, mainly written in Python, for powering data hubs and portals. It has been maintained by the Open Knowledge Foundation since 2006.
It relies on a PostgreSQL database and a Solr index, exposes an API, and has several extensions.
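Since CKAN exposes a standard action API, the sketch below queries the package_search endpoint; the base URL points to CKAN's public demo instance as an assumed example, and any CKAN-based portal should respond the same way.

# Sketch: query a CKAN portal through its action API.
import requests

base_url = "https://demo.ckan.org"               # assumed example instance
response = requests.get(
    f"{base_url}/api/3/action/package_search",
    params={"q": "transport", "rows": 5},
    timeout=10,
)
result = response.json()["result"]

print("Matching datasets:", result["count"])
for dataset in result["results"]:
    print("-", dataset["title"])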
Apache Tools
Apache Kafka: a distributed event streaming platform designed for high-throughput, real-time data feeds, excelling as a scalable, durable, and fault-tolerant message broker for large-scale data integration and streaming
Apache Flink: a stream processing framework optimised for stateful computations and complex event processing on unbounded data streams, offering robust event time processing, advanced windowing, and real-time analytics capabilities.
Apache NiFi: a data flow management tool providing a user-friendly interface for automating, controlling, and monitoring data flows between systems, with strengths in data routing, transformation, and ensuring data provenance and compliance.
OpenRefine
Open source tool initially released in 2010 (first as Freebase Gridworks and then as Google Refine) for data cleanup and transformation
It operates as a local web application to clean messy data and can be installed on Windows, macOS and Linux
It handles various types of data (CSV, TSV, JSON, XML) and can connect to and import data from databases and other sources
It supports scripting in languages like General Refine Expression Language (GREL) and Jython, allowing for advanced data manipulation
It has various features: faceting/filtering, clustering, reconciliation, undo/redo
Assignment
Work on your assignment
Showcases
Our World in Data
Research and data to make progress against the world’s largest problems
Our World in Data is a collaborative effort between researchers at the University of Oxford and the non-profit organisation Global Change Data Lab (GCDL).
It is a comprehensive online resource that presents empirical research and data on a wide array of global issues, focusing on large-scale problems like poverty, disease, hunger, climate change, war, existential risks, and inequality.
The platform aims to provide accessible, comprehensible, and transparent data to inform readers about the state of the world and to support informed decision-making.
Sportanlagen-Finder
The sports facility finder shows sports and exercise facilities operated by the canton of Basel-Stadt as well as all cantonal sports facilities outside the cantonal and national borders. The dataset also lists cantonal premises that are used and rented for sports activities.
Heisse Preise
Launched in 2023 by Mario Zechner, this platform enables comprehensive comparison of food prices across various supermarkets in Austria, tracking and analysing price trends over time.
Animal Crossing Art Generator
Innovative tool for integrating virtual art: The generator allows players to turn any image from the Getty Museum's open-access collection into miniature works of art for use in the Animal Crossing: New Horizons game.
Creative Expression and Customisation: Players can use the tool to add famous artworks to their game by applying them to clothing, wallpaper, canvas, etc., enhancing their virtual environment with museum-quality art.
Technical foundation and accessibility: Uses open source code from the Animal Crossing Pattern Tool and includes an IIIF manifest converter for broader art integration, making it easy to import art from various institutions into the game.
Animal Crossing Art Generator
12 sunsets: Exploring Ed Ruscha's Archive
Interactive platform launched in 2020 by the J. Paul Getty Trust to explore Sunset Boulevard across more than four decades (between 1965 and 2007) as photographed by Ed Ruscha
The 65,000 photographs are IIIF-compliant and are all linked to the Getty Research Institute
12 sunsets: Exploring Ed Ruscha's Archive
LUX: Yale Collections Discovery
LUX provides a unified gateway to more than 41 million cultural heritage resources held by Yale's museums, archives and libraries: Yale University Library, Yale Center for British Art, Yale Peabody Museum, Yale University Art Gallery.
Conclusion
Long-term archiving of ORD
Archival appraisal: Identifying which research data warrants long-term preservation, taking into account scientific, historical or legal significance as data volumes increase.
Documentation and contextualisation: Ensuring comprehensive documentation for each dataset, including the context of its creation, to maintain its relevance and intelligibility over time.
Infrastructure: Addressing the challenges of physical and software obsolescence, file format changes and the risk of content loss over time.
Methodology/Process: Determining the most effective time and method for archiving, including periodic or project completion, to prevent data loss and ensure data integrity.
Conclusion
Movements and principles impacting Open Data
The rise of movements such as Open Access and Open Science, along with principles like FAIR and CARE, significantly shape the relevance and implementation of open data.
However, achieving true openness requires not only adherence to these principles and movements but also the backing of sufficient funding and the cultivation of necessary skills among data practitioners. This is crucial for ensuring that open data is not just available but also meaningful and usable.
Conclusion
The Evolution of Open Data
While open data in itself is a commendable goal, the concept of Linked Open (Usable) Data takes it a step further.
Linked Open Data enhances the value of open data by ensuring it is not only available but also interconnected, making it more discoverable and useful for a wider range of applications and analyses.
LOUD is about enhancing usability and semantic interoperability by leveraging community-driven standards and practices.
Conclusion
OGD and ORD for GLAM institutions
ORD and OGD can be viewed both as a service provided to the public and as a process that requires active management and continuous improvement.
Institutions in the GLAM sector need to consider how these open data initiatives fit within their practices, both in terms of contributing data and utilising data for research, curation, and public engagement.
Conclusion
Public Engagement and Empowerment
Open data empowers the public by providing access to information that was previously inaccessible or difficult to obtain.
This not only fosters a more informed citizenry but also enables individuals and communities to participate more actively in civic and cultural discourses.
Conclusion
Transparency and Accountability
Open data plays a pivotal role in enhancing transparency and accountability, particularly in sectors where public trust is paramount.
By making data freely accessible, open data initiatives allow for greater scrutiny and analysis, leading to more accountable governance and institutional practices.
Conclusion
AI and ML
Open data serves as a critical fuel for AI systems, providing the large datasets necessary for training ML models. The availability of diverse, high-quality open datasets enables more robust and inclusive AI developments.
By leveraging open data, AI can be applied across a wide array of domains, from improving healthcare diagnostics to enhancing climate change models, thus contributing significantly to societal advancements and problem-solving.
Open data plays a pivotal role in fostering transparency and ethical practices in AI. By using open datasets, AI researchers and developers can ensure a level of accountability in their models, allowing for external validation and reducing biases.
Conclusion
Collaboration is Key
Collaboration is a fundamental aspect of open data initiatives.
Discussing best practices grounded in collaboration, such as leveraging the Collections as Data checklist
Participating in the IIIF and Linked Art communities for the cultural heritage field (and beyond, notably for the STEM sector)
OGD meet-ups (Open Data Beer)
Etc.
Such collaboration is vital for addressing global challenges, encouraging innovation, and ensuring the sustainable development of open data ecosystems.
A multitude of tools
For a better understanding of the past,
Our images have to be enhanced,
A new dialogue in three dimensions,
Must have openness at its heart,
For somewhere within the archive
Of our aggregated minds
Are a multitude of questions
And a multitude of answers,
Simply awaiting to be found.
[Mr Gee 2023], Data Poet at EuropeanaTech 2023
References and Image Credits
References
Alter, G., Rizzolo, F., & Schleidt, K. (2023). View points on data points: A shared vocabulary for cross-domain conversations on data and metadata. IASSIST Quarterly, 47(1), 1–39. https://doi.org/10.29173/iq1051
Berners-Lee, T. (1991, August 6). WorldWideWeb — Executive summary. Archive.Md. https://archive.md/Lfopj
Candela, G., Gabriëls, N., Chambers, S., Dobreva, M., Ames, S., Ferriter, M., Fitzgerald, N., Harbo, V., Hofmann, K., Holownia, O., Irollo, A., Mahey, M., Manchester, E., Pham, T.-A., Potter, A., & Van Keer, E. (2023). A checklist to publish collections as data in GLAM institutions. Global Knowledge, Memory and Communication, ahead-of-print. https://doi.org/10.1108/GKMC-06-2023-0195
Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43. https://doi.org/10.5334/dsj-2020-043
Carroll, S. R., Herczog, E., Hudson, M., Russell, K., & Stall, S. (2021). Operationalizing the CARE and FAIR Principles for Indigenous data futures. Scientific Data, 8(1), 108. https://doi.org/10.1038/s41597-021-00892-0
Cornut, M., Raemy, J. A., & Spiess, F. (2023). Annotations as Knowledge Practices in Image Archives: Application of Linked Open Usable Data and Machine Learning. Journal on Computing and Cultural Heritage, 16(4), 1–19. https://doi.org/10.1145/3625301
Felsing, U., Fornaro, P., Frischknecht, M., & Raemy, J. A. (2023). Community and Interoperability at the Core of Sustaining Image Archives. Digital Humanities in the Nordic and Baltic Countries Publications, 5(1), 40–54. https://doi.org/10.5617/dhnbpub.10649
Floridi, L. (2010). Information: A very short introduction. Oxford University Press. ISBN 978-0-19-955137-8
Jullien, N. (2009). A Historical Analysis of the Emergence of Free Cooperative Software Production. In M. Pagani (Ed.), Encyclopedia of Multimedia Technology and Networking, Second Edition (pp. 605–612). IGI Global. https://doi.org/10.4018/978-1-60566-014-1.ch081
Loi fédérale sur l’utilisation des moyens électroniques pour l’exécution des tâches des autorités (LMETA), Pub. L. No. FF 2023 787, 22.022 Confédération suisse. Secrétariat général DFF (2023). https://fedlex.data.admin.ch/eli/fga/2023/787
References
Max Planck Society & European Cultural Heritage Online. (2003). Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. Max Planck Society. https://openaccess.mpg.de/Berlin-Declaration
Morrison, R. (2021). Redrawn slide from presentation of Ana Persic, Division of Science Policy and Capacity-Building (SC/PCB), UNESCO (France) presentation to Open Science Conference 2021, ZBW — Leibniz Information Centre for Economics, Germany. Own work. https://commons.wikimedia.org/wiki/File:Osc2021-unesco-open-science-no-gray.png
Mr Gee. (2023, October 12). Day 2 Closing – A multitude of tools. EuropeanaTech 2023. EuropeanaTech 2023, The Hague, Netherlands. https://youtu.be/pOX9CrvAG7I
Padilla, T., Allen, L., Frost, H., Potvin, S., Russey Roke, E., & Varner, S. (2017). Always Already Computational: Collections as Data. Collections as Data. https://doi.org/10.17605/OSF.IO/MX6UK
Padilla, T., Scates Kettler, H., & Shorish, Y. (2023). Collections as Data: Part to Whole (p. 19) [Final Report]. Always Already Computational - Collections as Data. https://doi.org/10.5281/zenodo.10161976
Padilla, T., Scates Kettler, H., Varner, S., & Shorish, Y. (2023). Vancouver Statement on Collections as Data [White paper]. Internet Archive Canada. https://doi.org/10.5281/zenodo.8341519
References
Page, M., Hajduk, E., Lincklaen Arriëns, E. N., Cecconi, G., & Brinkhuis, S. (2023). 2023 Open Data Maturity Report [ODM Report]. European Union. https://doi.org/10.2830/384422
Persic, A. (2021, February). Building a Global Consensus on Open Science – the future UNESCO Recommendation on Open Science. https://doi.org/10.5446/53434
Raemy, J. A. (2023). Characterising the IIIF and Linked Art Communities: Survey report (p. 29) [Report]. University of Basel. https://doi.org/10.5451/unibas-ep95340
Raemy, J. A., Gray, T., Collinson, A., & Page, K. R. (2023, July 12). Enabling Participatory Data Perspectives for Image Archives through a Linked Art Workflow (Poster). Digital Humanities 2023 Posters. Digital Humanities 2023, Graz, Austria. https://doi.org/10.5281/zenodo.7878358
Raemy, J. A., & Sanderson, R. (2023). Analysis of the Usability of Automatically Enriched Cultural Heritage Data (arXiv:2309.16635). arXiv. https://doi.org/10.48550/arXiv.2309.16635
Santos, A. (2020). Données de la recherche : cadre juridique et licences [Mémoire de master, HES-SO University of Applied Sciences and Arts, Haute école de gestion de Genève]. https://doi.org/10.5281/zenodo.3967402
Star, S. L., & Griesemer, J. R. (1989). Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science, 19(3), 387–420. https://www.jstor.org/stable/285080
Stürmer, M. E. (2016). Measuring the promise of open data: Development of the Impact Monitoring Framework. 1–12. https://doi.org/10.7892/boris.75031
References
Tennant, J., Agarwal, R., Baždarić, K., Brassard, D., Crick, T., Dunleavy, D. J., Evans, T. R., Gardner, N., Gonzalez-Marquez, M., Graziotin, D., Greshake Tzovaras, B., Gunnarsson, D., Havemann, J., Hosseini, M., Katz, D. S., Knöchelmann, M., Madan, C. R., Manghi, P., Marocchino, A., … Yarkoni, T. (2020). A tale of two ‘opens’: Intersections between Free and Open Source Software and Open Scholarship [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/2kxq8
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
Brunner, Ernst. [Katze auf einer Mauer]. Place and date unknown. Black and white negative, 6x6 cm. SGV_12 Ernst Brunner. SGV_12N_19553. Old image number: HV 53. https://archiv.sgv-sstp.ch/resource/441788
Brunner, Ernst. [Ringtanz während der Masüras auf der Alp Sura]. Guarda, 1939. Black and white negative, 6x6 cm. SGV_12 Ernst Brunner. SGV_12N_08589. Old image number: DL 89. https://archiv.sgv-sstp.ch/resource/430824
The course takes place over four Tuesday afternoons.
"In December 2007, thirty thinkers and activists of the Internet held a meeting in Sebastopol, north of San Francisco. Their aim was to define the concept of open public data and have it adopted by the US presidential candidates.
Among them, were two well-known figures: Tim O’Reilly and Lawrence Lessig. The first is familiar to the techies: this American author and editor is the originator of many vanguard computer and Internet movements; he defined and popularized expressions such as the open source and Web 2.0. Lawrence Lessig, Professor of Law at Stanford University (California), is the founder of Creative Commons licenses, based on the idea of copyleft and free dissemination of knowledge.
Participants of the Sebastopol meeting mostly come from the free software and culture movements. These movements are at the heart of many innovations in the field of computers and the Internet over the last fifteen years. Some of these innovations are now familiar – think of the collaborative encyclopedia Wikipedia. Other open source creations are less known to the general public despite playing a fundamental role in online services: for instance, the Apache software for the servers is used to host most websites.
Some activists and entrepreneurs who already used public data were attending the Sebastopol meeting too: Adrian Holovaty (the founder of EveryBlock, a localized information service) and Briton Tom Steinberg (initiator of the FixMyStreet site). One of the youngest of the group was no other than the late Aaron Swartz, inventor of the RSS and free knowledge activist. Together, they created the principles that allow us today to define and evaluate open public data."
Source: https://www.paristechreview.com/2013/03/29/brief-history-open-data/
The data is made available online free of charge, in a timely manner, in machine-readable form and in an open format
NaDB: The Federal Council expects to make data management in the public sector easier and more efficient by reusing data: Persons and businesses will only need to report certain information once (once only principle).
i14y: National Data Catalogue which ensures the efficient exchange of data between authorities, companies and citizens
The Digital Switzerland Division is part of the Federal Chancellery’s Digital Transformation and ICT Steering (DTI) Sector. The division coordinates the ongoing development and implementation of the Digital Switzerland Strategy.
Government data shall be considered open if it is made public in a way that complies with the principles below:
1. Complete
All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
2. Primary
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
3. Timely
Data is made available as quickly as necessary to preserve the value of the data.
4. Accessible
Data is available to the widest range of users for the widest range of purposes.
5. Machine processable
Data is reasonably structured to allow automated processing.
6. Non-discriminatory
Data is available to anyone, with no requirement of registration.
7. Non-proprietary
Data is available in a format over which no entity has exclusive control.
8. License-free
Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
Ownership and Control:
Copyright grants the creator exclusive rights over their work, including the right to control how it's used, reproduced, and distributed. Others must obtain permission from the copyright holder to use the work, often requiring payment or adherence to specific conditions.
Copyleft, on the other hand, is a concept in the realm of free and open-source software. It allows anyone to use, modify, and distribute the work, but with the stipulation that any derivative work must also be distributed under the same or compatible license terms. This ensures that the work and its derivatives remain free and open.
Purpose and Philosophy:
Copyright is designed to protect the economic interests of creators by granting them exclusive rights to monetize their work. It supports the traditional model of intellectual property rights.
Copyleft is motivated by the idea of promoting freedom and sharing of knowledge. It is intended to keep creative works accessible and reusable for the public, encouraging collaborative improvement and innovation.
Focus and Foundation:
The Anglo-Saxon copyright system (common in countries like the U.S. and the U.K.) is largely focused on the economic rights of authors. It treats copyright as a type of property that can be bought, sold, or transferred, and emphasizes the monetary value of creative works.
The European author's rights model places a stronger emphasis on the moral rights of the creator, alongside the economic rights. This includes the right to be recognized as the author of a work and to object to any distortion or modification that could harm the author's reputation.
Duration and Transferability:
In the Anglo-Saxon system, copyright is often seen as a more transferable and commercial asset. The duration of copyright is typically based on a set number of years post-creation or the author's life plus a certain number of years.
The European model tends to grant authors inalienable moral rights that remain with the creator regardless of the economic rights being sold or transferred. The duration can also vary, but it usually includes the author's lifetime plus a period after their death (commonly 70 years in many European countries).
The rights statements have been specifically developed for the needs of cultural heritage institutions and online cultural heritage aggregation platforms and are not intended to be used by individuals to license their own creations.
One of the specific features of this licence is that attribution is not required in the case of derived content that is not a database but is produced from one, such as graphics, diagrams or maps. [Santos 2020, citing Ball 2014]
Responsible AI Licenses (RAIL) are a class of licenses designed to encourage the responsible use of an AI artifact being licensed by including a set of use restrictions applied to AI artifact. RAILs can be more or less restrictive depending on the aims of the licensor. For instance, a license can be RAIL while being a proprietary license, or a license just allowing the use of the AI feature for research purposes and without allowing distribution of derivative versions.
In contrast, Open & Responsible AI Licenses (OpenRAIL) are a subclass of RAIL licenses that permit free-of-charge open access and re-use of AI artifacts for commercial purposes, while including usage restrictions. Note that usage restrictions in RAIL Licenses also apply to any derivatives of AI artifact.
RAILs can be used to license data (D), applications (A), models (M), and source code (S). Depending on the AI artifact(s) being licensed, the corresponding suffix D, A, M, or S is added.
Definition of an infrastructure according to Susan Leigh Star
Three of the nine dimensions...
Embeddedness: Infrastructure is sunk into and inside of other structures, social arrangements, and technologies. People do not necessarily distinguish the several coordinated aspects of infrastructure.
Links with conventions of practice: Infrastructure both shapes and is shaped by the conventions of a community of practice.
Embodiment of standards: Modified by scope and often by conflicting conventions, infrastructure takes on transparency by plugging into other infrastructures and tools in a standardised fashion.
While standards provide the "what" and "why" of metadata, schemas offer the "how" for specific data types or field needs.
DCAT-AP CH is a subprofile of DCAT-AP
Enhances Usability: Makes data more accessible and understandable to a wider audience, including non-experts and developers.
Facilitates Data Quality and Trust: Offers transparency about the data’s sources, methodologies, and underlying code, building trust among users.
Supports Data Integration and Development: Helps in combining data from different sources and in the development of applications using the open data.
Set of rules and standards that govern the exchange and accessibility of open data through the internet.
Enables Accessibility: Facilitates easy and standardized access to open data, essential for fostering innovation and transparency.
Supports Interoperability: Ensures that open data from various sources can be integrated and used together efficiently.
In a nutshell, OA refers to the practice of providing unrestricted access via the Internet to peer-reviewed scholarly research.
Relevance in Arts and Humanities
- Addresses complex cultural materials and narratives with societal implications.
- Promotes cultural understanding and engagement with broader societal issues.
It emphasises users' rights and community benefits, going beyond mere practical advantages. Key projects include the GNU Operating System and Free Software Directory.
FLOSS merges the social and ethical emphasis of Free Software with the pragmatic model of Open Source.
Data oriented principles
Indigenous data sovereignty reinforces the rights to engage in decision-making in accordance with Indigenous values and collective interests.
Data ecosystems shall be designed and function in ways that enable Indigenous Peoples to derive benefit from the data.
Indigenous Peoples’ rights and interests in Indigenous data must be recognised and their authority to control such data be empowered. Indigenous data governance enables Indigenous Peoples and governing bodies to determine how Indigenous Peoples, as well as Indigenous lands, territories, resources, knowledges and geographical indicators, are represented and identified within data.
Those working with Indigenous data have a responsibility to share how those data are used to support Indigenous Peoples’ self-determination and collective benefit. Accountability requires meaningful and openly available evidence of these efforts and the benefits accruing to Indigenous Peoples.
Indigenous Peoples’ rights and wellbeing should be the primary concern at all stages of the data life cycle and across the data ecosystem.
FAIR and CARE are complementary perspectives which enable maximum value through the appropriate and ethical reuse of Indigenous data. However, assessing the FAIR-ness of a data set is typically a technical exercise which can be done independently by the researcher to prepare the final data set for reuse. On the other hand, the CARE Principles require engagement with people to address the cultural, ethical, legal, and social dimensions associated with the intended uses of the dataset. As Indigenous communities expect CARE-full data practices to be enacted at each step of the data lifecycle, we will need to reflect a broader temporal dimension to our application of the CARE Principles. At present there is no process to assess whether a research project meets the CARE Principles. Creating such an assessment represents the next stage towards an equitable cyberinfrastructure that supports the FAIR and CARE-full use of Indigenous data.
[Carroll et al. 2021]
The statement highlights the growing global engagement with collections as data. It promotes the responsible computational use of collections to empower memory, knowledge and data practitioners. It emphasises ethical concerns, openness and participatory design, as well as the need for transparent documentation and sustainable infrastructure. The statement, comprising of ten recommendations, also recognises the potential impact of data consumption by AI, and the importance of considering climate impacts and exploitative labour.
Purpose: The purpose of this study is to offer a checklist that can be used for both creating and evaluating digital collections, which are also sometimes referred to as data sets as part of the collections as data movement, suitable for computational use.
Design/methodology/approach: The checklist was built by synthesising and analysing the results of relevant research literature, articles and studies and the issues and needs obtained in an observational study. The checklist was tested and applied both as a tool for assessing a selection of digital collections made available by galleries, libraries, archives and museums (GLAM) institutions as proof of concept and as a supporting tool for creating collections as data.
Findings: Over the past few years, there has been a growing interest in making available digital collections published by GLAM organisations for computational use. Based on previous work, the authors defined a methodology to build a checklist for the publication of collections as data. The authors’ evaluation showed several examples of applications that can be useful to encourage other institutions to publish their digital collections for computational use.
Originality/value: While some work on making available digital collections suitable for computational use exists, giving particular attention to data quality, planning and experimentation, to the best of the authors’ knowledge, none of the work to date provides an easy-to-follow and robust checklist to publish collection data sets in GLAM institutions. This checklist intends to encourage small- and medium-sized institutions to adopt the collections as data principles in daily workflows following best practices and guidelines.
Linked Data refers to a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF, and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.
Linked Open Data is a subset of Linked Data that is open, meaning it is freely accessible and reusable by anyone. It adheres to the principles of being accessible under an open license, available in a machine-readable format, using open standards from the W3C (such as RDF and SPARQL), and linked to other datasets to increase its utility.
This Web, which has claimed to be a Semantic Web for several years now, has a centrepiece known as Resource Description Framework (RDF), a general method for describing and exchanging graph data. The Semantic Web offers major opportunities for scholarship as it allows data to be reasoned together, that is to be understood by machines via those RDF-based ontologies, a formal way to represent human-like knowledge.
With RDF, everything goes in threes, the data model contains so-called triples: that is subject, predicate, object that form graphs.
Most of the components of these triples use Uniform Resource Identifiers (URIs) and are generally web-addressable, whether for naming subjects and objects (which may themselves also be objects of other triples) or relationships
IIIF is a community-driven initiative, which brings together key players in the academic and CH fields, and has defined open and shared APIs to standardise the way in which image-based resources are delivered on the Web. Implementing the IIIF APIs enables institutions to make better use of their digitised or born-digital material by providing, for instance, deep zooming, comparison, full-text search of OCR objects or annotation capabilities.
Organisations
And individuals/meetings
So why do we need IIIF? Digital images are fundamental carriers of information across the fields of cultural heritage, STEM, and others. They help us understand complex processes through visualization. They grab our attention and help us quickly understand abstract concepts. They help document the past (and the present) and preserve it for the future. They are also ubiquitous: we interact with thousands of them every day, both in real life and on the web. In short, images are important and we interact with large volumes of them online.
Image 1: Female Figurine, Chupicuaro, 500/300 B.C
Image 2: Vision of Saint Gregory, unknown artist, n.d.
Image 3: Iyo Province: Saijo, Utagawa Hiroshige, 1855
- Linked Art is focused on usability, not full precision / completeness
- Consistently solves actual challenges from real data
- Development is iterative, as new use cases are found
A Periodic Table of Open Data Elements detailing the enabling conditions and disabling factors that often determine the impact of open data initiatives.
Five main elements:
Problem and Demand Definition
Capacity and Culture
Governance
Partnerships
Risks
While Kafka, Flink, and NiFi each serve distinct purposes in data streaming and processing – Kafka for data integration and transportation, Flink for in-depth processing and analytics, and NiFi for flow management and data routing – their combined use can create a comprehensive, efficient, and robust data management architecture.
LUX enhances and prepares Yale collections data for further collaboration, use, and re-use. With the support of a well-resourced research university–including Yale’s Vice-Provost office–in addition to the support of active committees with members across Yale and a meticulous technical team, LUX is well-positioned to help bridge gaps and create more accessible and diverse representations of cultural heritage collections.
[Metcalfe Hurst 2023]
In summary, open data is more than a concept; it's a dynamic ecosystem that thrives on principles, collaboration, and continuous evolution. Its impact spans from enhancing transparency to empowering public engagement, and its future hinges on effective funding, skill development, and collaborative efforts.