Jupyter SPARQL Fun

Bob DuCharme

Setup

First, load the libraries that will be used by code in later cells and define a function to output query results as a nice HTML table. If the contents of this next cell got much longer (while staying as re-usable) I'd move it to a separate library and just import that.

In [15]:
import rdflib
from IPython.display import display, HTML
import RDFClosure  # install from https://github.com/RDFLib/OWL-RL (newer releases are imported as owlrl)

def queryResultToHTMLTable(queryResult):
    HTMLResult = '<table><tr style="color:white;background-color:gray;font-weight:bold">'
    # print variable names
    for varName in queryResult.vars:
        HTMLResult = HTMLResult + '<td>' + varName + '</td>'
    HTMLResult = HTMLResult + '</tr>'
    # print values from each row
    for row in queryResult:
        HTMLResult = HTMLResult + '<tr>'
        for column in row:
            HTMLResult = HTMLResult + '<td>' + column + '</td>'
        HTMLResult = HTMLResult + '</tr>'
    HTMLResult = HTMLResult + '</table>'
    display(HTML(HTMLResult))

# In a fancier script, you may want to create more than one Graph, but 
# I'll create just one and keep it in the setup section for simplicity. 
diskFileGraph = rdflib.Graph()   
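
The table function above assumes that every value in every row is a string. A query with an OPTIONAL pattern can leave a variable unbound, which comes back as None, and values may contain characters such as < or &. What follows is a minimal sketch of a more defensive variant, not a replacement for the cell above. The name rows_to_html_table is hypothetical; it takes plain column names and rows so that it can be tried without a graph, and it returns the HTML string instead of displaying it.

```python
from html import escape

def rows_to_html_table(var_names, rows):
    """Build an HTML table string from column names and an iterable of rows.

    Hypothetical helper: None (an unbound variable) becomes an empty cell,
    and every other value is converted to str and HTML-escaped.
    """
    header_style = "color:white;background-color:gray;font-weight:bold"
    parts = ['<table><tr style="' + header_style + '">']
    for name in var_names:
        parts.append("<td>" + escape(str(name)) + "</td>")
    parts.append("</tr>")
    for row in rows:
        parts.append("<tr>")
        for value in row:
            text = "" if value is None else escape(str(value))
            parts.append("<td>" + text + "</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)

# One row with a value that needs escaping and one unbound value.
example_html = rows_to_html_table(["s", "o"], [("a<b", None)])
```

With a real query result, something like display(HTML(rows_to_html_table(queryResult.vars, queryResult))) should give the same kind of table as before.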

Most of the remaining Python you see here is rdflib code. rdflib and I go way back. Here, I load a sample data file from Learning SPARQL.

In [3]:
triples = diskFileGraph.parse("lq012.ttl",format="turtle")

Running a query

This is the heart of using SPARQL with Jupyter: put the query between the pair of triple quotes in the following cell. (Triple quotes let you create multi-line strings in Python.) The triples.query() call performs the query, and the line after it calls the queryResultToHTMLTable() function that I defined above to output the query result as an HTML table.

I could have combined everything in this cell into one nested function call instead of storing the query result in queryResult and then passing that to my function. I could have even put the formatting and the call to triples.query() into one function, but I chose not to.

In [5]:
queryResult = triples.query("""
SELECT *
  WHERE
  {?s ?p ?o}
""")

queryResultToHTMLTable(queryResult)
s | p | o
http://learningsparql.com/ns/data#i8301 | http://learningsparql.com/ns/addressbook#email | [email protected]
http://learningsparql.com/ns/data#i8301 | http://learningsparql.com/ns/addressbook#lastName | Ellis
http://learningsparql.com/ns/data#i0432 | http://learningsparql.com/ns/addressbook#homeTel | (229) 276-5135
http://learningsparql.com/ns/data#i0432 | http://learningsparql.com/ns/addressbook#email | [email protected]
http://learningsparql.com/ns/data#i9771 | http://learningsparql.com/ns/addressbook#homeTel | (245) 646-5488
http://learningsparql.com/ns/data#i8301 | http://learningsparql.com/ns/addressbook#email | [email protected]
http://learningsparql.com/ns/data#i0432 | http://learningsparql.com/ns/addressbook#lastName | Mutt
http://learningsparql.com/ns/data#i0432 | http://learningsparql.com/ns/addressbook#firstName | Richard
http://learningsparql.com/ns/data#i9771 | http://learningsparql.com/ns/addressbook#firstName | Cindy
http://learningsparql.com/ns/data#i9771 | http://learningsparql.com/ns/addressbook#lastName | Marshall
http://learningsparql.com/ns/data#i9771 | http://learningsparql.com/ns/addressbook#email | [email protected]
http://learningsparql.com/ns/data#i8301 | http://learningsparql.com/ns/addressbook#firstName | Craig
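
For comparison, the combined function that I mentioned before the query cell might look something like the following. This is only a sketch: the name show_query and its render parameter are hypothetical, not part of rdflib or of this notebook, and the only thing assumed about graph is that it has a query() method, as an rdflib Graph does.

```python
def show_query(graph, sparql, render):
    """Run a SPARQL query on graph and pass the result straight to render.

    Hypothetical helper: graph is anything with a query() method (such as
    an rdflib Graph), and render is a one-argument function such as
    queryResultToHTMLTable. The result is also returned for later reuse.
    """
    result = graph.query(sparql)
    render(result)
    return result
```

In this notebook the call would be show_query(triples, "SELECT * WHERE {?s ?p ?o}", queryResultToHTMLTable). Keeping the two steps separate, as the cells here do, makes it easier to inspect queryResult before formatting it.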

Adding data to the in-memory graph

Despite its name, diskFileGraph is an in-memory graph, and it is mutable: parsing another file into it adds that file's triples and leaves the original data in place. Here, we add the furniture data. See my blog entry Trying Out Blazegraph to see all of the triples in the furniture data and how I used it to demo some inferencing.

In [17]:
triples = diskFileGraph.parse("furniture.ttl",format="turtle")

The next query retrieves a list of the predicates used in the data, and we see the original address book predicates from the lq012.ttl file plus some new ones from the furniture data. Note how the cell is identical to the one above with the SPARQL query, except for the query itself. If you're running this with your own Jupyter server, you can modify the query or copy the cell's contents into new cells and execute those.

In [7]:
queryResult = triples.query("""
SELECT DISTINCT ?p
  WHERE
  {?s ?p ?o}
""")

queryResultToHTMLTable(queryResult)
p
http://learningsparql.com/ns/addressbook#lastName
http://learningsparql.com/ns/addressbook#homeTel
http://learningsparql.com/ns/demo#locatedIn
http://www.w3.org/2000/01/rdf-schema#subClassOf
http://learningsparql.com/ns/addressbook#email
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://learningsparql.com/ns/addressbook#firstName

SPARQL UPDATE queries

You can run SPARQL UPDATE queries using the update() method of an rdflib Graph object. Here it is called on triples, the object that diskFileGraph.parse() returned.

In [8]:
triples.update("DELETE {?s ?p ?o} where { ?s ?p ?o }")
In [9]:
# Show all predicates again
queryResult = triples.query("""
SELECT DISTINCT ?p
  WHERE
  {?s ?p ?o}
""")

queryResultToHTMLTable(queryResult)
p

This query result shows that there are no predicates, because there are no triples. The previous command deleted them all.

Inferencing

The RDFClosure library above lets us do inferencing with the in-memory graph. I described how I used it to do data integration on a Hadoop cluster in Driving Hadoop data integration with standards-based models instead of code. Here, we'll start by reading the furniture data back into memory, and then we'll tell the library to do OWL RL inferencing.

In [19]:
triples = diskFileGraph.parse("furniture.ttl",format="turtle")
In [18]:
# do OWL RL inferencing
RDFClosure.DeductiveClosure(RDFClosure.OWLRL_Semantics).expand(triples)

Next, we'll query for furniture in building100. As described in Trying Out Blazegraph, the furniture data had no triples saying that there was furniture in that building, but it had triples saying that desks and chairs were furniture, that a certain desk and two chairs were locatedIn rooms that were locatedIn that building, and that the locatedIn property is transitive. (Semantics!) This was enough for the inferencing engine to infer what furniture was in building100:

In [13]:
queryResult = triples.query("""
PREFIX dm: <http://learningsparql.com/ns/demo#> 
PREFIX d: <http://learningsparql.com/ns/data#> 
SELECT ?furniture
WHERE 
{ 
  ?furniture a dm:Furniture .
  ?furniture dm:locatedIn d:building100 . 
}
""")
queryResultToHTMLTable(queryResult)
furniture
http://learningsparql.com/ns/data#desk22
http://learningsparql.com/ns/data#chair15
http://learningsparql.com/ns/data#chair23
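
To make the inference step a little less magical, here is a toy sketch in plain Python of the two rules that matter for this query: subclass membership and a transitive locatedIn. It is not how the OWL RL library works internally, and the room names and the exact triples are assumptions made up for illustration, not copied from furniture.ttl.

```python
# Hypothetical triples in the spirit of the furniture data.
SUBCLASS_OF = {"Desk": "Furniture", "Chair": "Furniture"}
TYPES = {"desk22": "Desk", "chair15": "Chair", "chair23": "Chair"}
LOCATED_IN = {
    ("desk22", "room101"),
    ("chair15", "room101"),
    ("chair23", "room102"),
    ("room101", "building100"),
    ("room102", "building100"),
}

def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def furniture_in(place):
    """Return the sorted items that are Furniture and located in place."""
    located = transitive_closure(LOCATED_IN)
    return sorted(
        item for item, cls in TYPES.items()
        if SUBCLASS_OF.get(cls) == "Furniture" and (item, place) in located
    )

print(furniture_in("building100"))  # prints ['chair15', 'chair23', 'desk22']
```

None of the input pairs says that a desk or chair is in building100; those pairs only appear after the closure step, which is the same kind of derived triple that the query above relies on.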