Exercise 3: Up to date with Linked Data
Linked Data is about publishing structured data on the web so that it can be interlinked and become more useful through semantic queries. It extends standard web technologies (HTTP, URIs) to share information in a machine-readable way.
The Four Principles of Linked Data
Tim Berners-Lee defined four principles for publishing data on the web:
- Use URIs as names for things - Give everything a unique web address
- Use HTTP URIs - Make those addresses accessible via the web
- Provide useful information - When someone looks up a URI, return data using standards (RDF, SPARQL)
- Include links to other URIs - Enable discovery of related information
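The third principle, looking up a URI and getting data back, can be sketched as an HTTP request that asks for RDF instead of an HTML page. This is a minimal illustration under assumptions: it uses Wikidata's entity URI for Vincent van Gogh as the "thing", the helper name build_lookup is hypothetical, and it only prepares the request without sending it.

```python
from urllib.request import Request

# Assumption: Wikidata's entity URI for Vincent van Gogh (Q5582) stands in for
# "a thing named by an HTTP URI"; any other Linked Data URI could be used.
THING_URI = "http://www.wikidata.org/entity/Q5582"


def build_lookup(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET that asks the server for RDF (here: Turtle)."""
    return Request(uri, headers={"Accept": media_type})


request = build_lookup(THING_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# Passing `request` to urllib.request.urlopen would perform the lookup.
# That step needs network access, so it is left out of this sketch.
```

Servers that support content negotiation use the Accept header to decide whether to return a web page for people or RDF for machines.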
Understanding RDF Triples
RDF (Resource Description Framework) represents data as triples: subject-predicate-object statements.
Example: “Vincent van Gogh created the Starry Night painting”
subject: <http://example.org/artist/vangogh>
predicate: <http://example.org/vocab/created>
object: <http://example.org/artwork/starrynight>
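To make the three-part structure concrete, the same statement can be held in a small data structure. This is only a sketch: the Triple class and the as_ntriples helper are hypothetical names introduced for illustration, and the output format shown is the line-based N-Triples form for triples made entirely of URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    object: str


statement = Triple(
    subject="http://example.org/artist/vangogh",
    predicate="http://example.org/vocab/created",
    object="http://example.org/artwork/starrynight",
)


def as_ntriples(triple: Triple) -> str:
    """Write a URI-only triple as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


print(as_ntriples(statement))
```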
Turtle Syntax
Turtle (Terse RDF Triple Language) is a human-readable syntax for RDF. Let’s look at a cultural heritage example:
@prefix ex: <http://example.org/> .
@prefix schema: <http://schema.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:vangogh a foaf:Person ;
    foaf:name "Vincent van Gogh" ;
    foaf:birthday "1853-03-30" ;
    schema:nationality "Dutch" ;
    ex:created ex:starrynight .

ex:starrynight a schema:Painting ;
    schema:name "The Starry Night" ;
    schema:dateCreated "1889-06" ;
    schema:material "Oil on canvas" ;
    schema:creator ex:vangogh ;
    schema:location ex:moma .

ex:moma a schema:Museum ;
    foaf:name "Museum of Modern Art" ;
    schema:location "New York, USA" .

Key elements:
- @prefix declares namespace prefixes
- a means “is of type”
- ; continues statements about the same subject
- . ends a group of statements
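The effect of the “;” and “a” shorthands can be shown by expanding one grouped description into explicit triples. The sketch below is an assumption-laden illustration, not a Turtle parser: the dictionary layout and the expand function are hypothetical, and prefixed names are kept as plain strings.

```python
RDF_TYPE = "rdf:type"

# One subject with its predicate-object pairs, mirroring the ';' grouping
# used for ex:starrynight in the Turtle example.
description = {
    "ex:starrynight": [
        ("a", "schema:Painting"),
        ("schema:name", '"The Starry Night"'),
        ("schema:dateCreated", '"1889-06"'),
        ("schema:material", '"Oil on canvas"'),
        ("schema:creator", "ex:vangogh"),
        ("schema:location", "ex:moma"),
    ],
}


def expand(grouped):
    """Turn grouped predicate-object pairs into full (s, p, o) triples."""
    triples = []
    for subject, pairs in grouped.items():
        for predicate, obj in pairs:
            if predicate == "a":  # 'a' is shorthand for rdf:type
                predicate = RDF_TYPE
            triples.append((subject, predicate, obj))
    return triples


triples = expand(description)
for s, p, o in triples:
    print(s, p, o, ".")
print(len(triples), "triples")
```

Each predicate-object pair becomes its own triple with the shared subject repeated, which is the reading to apply when counting triples in a Turtle snippet.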
SPARQL Queries
SPARQL is the query language for RDF data. It allows you to find patterns in the data.
Basic SPARQL query structure:
Find all paintings:
Find artworks created by Van Gogh:
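The idea behind these queries, matching triple patterns that contain variables, can be imitated in a few lines. This is a simplified sketch rather than a SPARQL engine: the match function is hypothetical, only a single triple pattern is supported, and the SPARQL text in the comments is an assumed form of such queries against the Turtle example, not the exercise's original wording.

```python
# A subset of the triples from the Turtle example, prefixed names as strings.
TRIPLES = [
    ("ex:vangogh", "rdf:type", "foaf:Person"),
    ("ex:vangogh", "foaf:name", "Vincent van Gogh"),
    ("ex:vangogh", "ex:created", "ex:starrynight"),
    ("ex:starrynight", "rdf:type", "schema:Painting"),
    ("ex:starrynight", "schema:name", "The Starry Night"),
    ("ex:starrynight", "schema:creator", "ex:vangogh"),
    ("ex:moma", "rdf:type", "schema:Museum"),
]


def match(pattern, triples=TRIPLES):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.get(term, value) != value:
                    break
                bindings[term] = value
            elif term != value:
                break
        else:
            results.append(bindings)
    return results


# General shape: SELECT ?variables WHERE { triple patterns }

# SELECT ?painting WHERE { ?painting a schema:Painting . }
paintings = match(("?painting", "rdf:type", "schema:Painting"))

# SELECT ?artwork WHERE { ?artwork schema:creator ex:vangogh . }
by_vangogh = match(("?artwork", "schema:creator", "ex:vangogh"))

print(paintings)
print(by_vangogh)
```

A real SPARQL endpoint combines several such patterns and joins them on shared variables; the single-pattern case is enough to show how a variable gets bound to matching values.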
Exercises (15-20 minutes)
A: Understanding Turtle Syntax (4-5 minutes)
Look at the following Turtle snippet describing a manuscript:
@prefix ex: <http://library.org/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix schema: <http://schema.org/> .

ex:manuscript_042 a schema:Book ;
    dc:title "Grandes Chroniques de France" ;
    dc:creator "Anonymous" ;
    schema:dateCreated "1375" ;
    ex:heldBy ex:chateauroux_library .

ex:chateauroux_library a schema:Library ;
    schema:name "Bibliothèque municipale de Châteauroux" ;
    schema:location "France" .

- What type of resource is ex:manuscript_042?
- How many triples have ex:manuscript_042 as their subject?
- What relationship links the manuscript to the library?
- Type of resource: Pay attention to the keyword “a” in Turtle, which is shorthand for rdf:type. What comes right after “a” on the line where ex:manuscript_042 is first described?
- Counting triples: Each property-value pair (including the “a” statement) about a subject is one triple. Remember that “;” separates multiple triples sharing the same subject. Count each line carefully.
- Relationship to the library: Look for the triple where the object is ex:chateauroux_library. The predicate (property) in that triple is your answer.
B: Real-World SPARQL Exploration (6-8 minutes)
Visit the Wikidata Query Service: https://query.wikidata.org/
Try this query to find Van Gogh paintings with images:
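A query of this kind might look like the sketch below. It is an assumption, not the exercise's original text: it relies on the Wikidata identifiers wdt:P31 (instance of), wd:Q3305213 (painting), wdt:P170 (creator) and wdt:P18 (image), and the helper name build_query is hypothetical. The Python wrapper only builds the query text and a request URL for the public endpoint; it does not contact Wikidata.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint behind the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(artist_qid: str) -> str:
    """Return a SPARQL query for paintings with images by the given artist."""
    return f"""
SELECT ?painting ?paintingLabel ?image WHERE {{
  ?painting wdt:P31 wd:Q3305213 ;    # instance of: painting
            wdt:P170 wd:{artist_qid} ;    # creator
            wdt:P18 ?image .         # has an image
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


query = build_query("Q5582")  # Q5582 is Vincent van Gogh
url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

print(query)
print(url)
```

The printed query text can be pasted into the Query Service editor and run there. Because it is a reconstruction, the number of rows it returns may differ from the figure mentioned in the notes for this exercise.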
- How many paintings with images did you find?
- Try modifying the query to find paintings by a different artist (hint: search Wikidata for another artist’s Q-number, e.g., Pablo Picasso is wd:Q5593). What artist did you choose and how many results did you get?
- Number of paintings: After running the query, the Wikidata Query Service shows the total result count alongside the results (as “N results in M ms”). You should find several dozen results for Van Gogh.
- Modifying the query: The only part you need to change is wd:Q5582 (Van Gogh’s identifier). To find another artist’s Q-number, use the search bar at wikidata.org and look for the Q-identifier on their page. Replace it in the wdt:P170 line. Some artists may have significantly more or fewer results depending on how well their works are documented on Wikidata.
C: Reflection Questions (5-7 minutes)
- What are the main advantages of using Linked Data for cultural heritage collections?
- What challenges might institutions face when implementing Linked Data?
- Based on your experience with the Wikidata Query Service, what makes a SPARQL endpoint useful or difficult to use?
- Advantages: Think about what happens when collections from different institutions can reference the same entities (people, places, concepts). Consider aspects like discoverability, interoperability across systems, and the ability to enrich records by linking to external knowledge bases.
- Challenges: Consider the practical side — what does an institution need in terms of expertise, tooling, and ongoing maintenance? Think also about data quality, choosing the right vocabularies/ontologies, and dealing with legacy cataloguing systems.
- SPARQL usability: Reflect on your own experience — was the query language intuitive? Did features like auto-completion help? Think about the balance between the power of the query language and the learning curve for newcomers.
Additional Resources
- Linked Data Principles: https://www.w3.org/DesignIssues/LinkedData.html
- Turtle Specification: https://www.w3.org/TR/turtle/
- SPARQL 1.1 Query Language Specification: https://www.w3.org/TR/sparql11-query/
- Wikidata SPARQL Examples: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples