Objectives

  • install Nexus Forge,
  • configure a Knowledge Graph forge,
  • download the MovieLens data,
  • transform the data,
  • load the transformed data into the project,
  • search for data using a SPARQL query.

Prerequisites

Installing Nexus Forge

In [ ]:
!pip install nexusforge

Import Libraries

In [ ]:
import getpass
import yaml
import pandas as pd
import numpy as np
import nexussdk as nxs
from kgforge.core import KnowledgeGraphForge

Setup

Next, define the Nexus Sandbox API endpoint, along with the organization and project you configured in the first part of the tutorial. Remember to replace the values in the code below with your own organization and project labels.

In [ ]:
ORGANIZATION = "tutorialnexus"
PROJECT = "mytutorial" # Provide your project label here
DEPLOYMENT = "https://sandbox.bluebrainnexus.io/v1"

In order to authenticate yourself, go to the Nexus Sandbox and copy your token. You can then run the following line of code and input the token:

In [ ]:
# Paste your Blue Brain Nexus token when prompted. You can obtain it after logging in to the sandbox: https://sandbox.bluebrainnexus.io/web/
TOKEN = getpass.getpass()

As we will be using the MovieLens data, it's useful to describe the context for the entities that we want to import in the knowledge graph. Here's an example of how to define the context.

In [ ]:
context = {
  "@id": "https://context.org",
  "@context": {
    "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/",
    "schema": "http://schema.org/",
    "Movie": {
      "@id": "schema:Movie"
    },
    "Rating": {
      "@id": "schema:Rating"
    }
  }
}
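To make the role of the context more concrete, here is a simplified sketch of how a short term is resolved to a full IRI. It is only an illustration of the idea, not a complete JSON-LD processor; `sample_context` and `expand_term` are names introduced here for the example.

```python
# Simplified illustration of JSON-LD term expansion (not a full JSON-LD processor).
# `sample_context` mirrors the inner "@context" mapping defined above.
sample_context = {
    "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/",
    "schema": "http://schema.org/",
    "Movie": {"@id": "schema:Movie"},
    "Rating": {"@id": "schema:Rating"},
}


def expand_term(term, ctx):
    """Return the IRI that a short term stands for under the given context."""
    definition = ctx.get(term)
    if isinstance(definition, dict):
        definition = definition["@id"]
    if definition is None:
        # Terms without an explicit entry fall back to the default vocabulary.
        return ctx["@vocab"] + term
    prefix, sep, suffix = definition.partition(":")
    if sep and prefix in ctx and not suffix.startswith("//"):
        # Compact IRI such as "schema:Movie": swap the prefix for its namespace.
        return ctx[prefix] + suffix
    return definition


print(expand_term("Movie", sample_context))  # explicit mapping to schema.org
print(expand_term("title", sample_context))  # falls back to "@vocab"
```

Under this context, `Movie` and `Rating` map to schema.org types, while any other property or type (such as `title` or `Tag`) is placed under the `@vocab` namespace, which is the namespace the SPARQL query later in this notebook refers to with the `vocab:` prefix.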

The next step is thus to push that context as a resource to your project (i.e. knowledge graph):

In [ ]:
nxs.config.set_environment(DEPLOYMENT)
nxs.config.set_token(TOKEN)
nxs.resources.create(ORGANIZATION, PROJECT, context)

To let Nexus Forge work with the Nexus Delta store, we need to provide a small configuration that describes the model, the store, and the identifier formatters:

In [ ]:
config = {
    "Model": {
        "name": "RdfModel",
        "origin": "store",
        "source": "BlueBrainNexus",
        "context": {
            "iri": "https://context.org",
            "bucket": f"{ORGANIZATION}/{PROJECT}"
        }
    },
    "Store": {
        "name": "BlueBrainNexus",
        "endpoint": DEPLOYMENT,
        "versioned_id_template": "{x.id}?rev={x._store_metadata._rev}",
        "file_resource_mapping": "../../configurations/nexus-store/file-to-resource-mapping.hjson",
    },
    "Formatters": {
        "identifier": "https://movielens.org/{}/{}"
    }
}
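Both the `identifier` formatter and `versioned_id_template` look like ordinary Python format strings. The sketch below fills them with plain `str.format`, assuming the library substitutes values in a comparable way; the `sample` object and its values are made up for illustration.

```python
from types import SimpleNamespace

# Same template as the "identifier" formatter in the configuration above.
identifier_template = "https://movielens.org/{}/{}"
print(identifier_template.format("movies", 1))

# Same template as "versioned_id_template"; `x` stands in for a resource.
versioned_id_template = "{x.id}?rev={x._store_metadata._rev}"

# Hypothetical stand-in for a registered resource, with made-up values.
sample = SimpleNamespace(
    id="https://movielens.org/movies/1",
    _store_metadata=SimpleNamespace(_rev=3),
)
print(versioned_id_template.format(x=sample))
```

The two placeholders in the identifier template are what the later calls of the form `forge.format("identifier", "movies", x)` fill in.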

Now set up the Forge:

In [ ]:
forge = KnowledgeGraphForge(config, token=TOKEN, bucket=f"{ORGANIZATION}/{PROJECT}")

MovieLens Data

Let's download the MovieLens datasets and load the data in Python:

In [ ]:
# Download the archive with curl and unzip it.
# The '!' prefix runs a shell command from inside a notebook; remove it if you run the command from a terminal.
!curl -s -O http://files.grouplens.org/datasets/movielens/ml-latest-small.zip && unzip -qq ml-latest-small.zip && cd ml-latest-small && ls 

directory  = "./ml-latest-small" # Location of the files (from unzipping)

movies_df  = pd.read_csv(f"{directory}/movies.csv")
ratings_df = pd.read_csv(f"{directory}/ratings.csv", dtype={"movieId":"string"})
tags_df    = pd.read_csv(f"{directory}/tags.csv", dtype={"movieId":"string"})
links_df   = pd.read_csv(f"{directory}/links.csv")

movies_links_df = pd.merge(movies_df, links_df, on='movieId') # Merge movies and links

Resources in Nexus Forge

Let's define the types of our data frames:

In [ ]:
movies_links_df["type"] = "Movie"
ratings_df["type"] = "Rating"
tags_df["type"] = "Tag"

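We can also apply some data transformations: we split the genres into lists and build an identifier for each movie. The ratings and tags get a "movieId.id" column that points to the identifier of the movie they refer to: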

In [ ]:
movies_links_df["id"] = movies_links_df["movieId"].apply(lambda x: forge.format("identifier", "movies", x))
movies_links_df["genres"] = movies_links_df["genres"].apply(lambda x: x.split("|"))
ratings_df["movieId.id"] = ratings_df["movieId"].apply(lambda x: forge.format("identifier", "movies", x))
tags_df["movieId.id"] = tags_df["movieId"].apply(lambda x: forge.format("identifier", "movies", x))

Finally, let's register these data frames as Forge resources:

In [ ]:
movies_resources = forge.from_dataframe(movies_links_df, np.nan, ".")
ratings_resources = forge.from_dataframe(ratings_df, np.nan, ".")
tags_resources = forge.from_dataframe(tags_df, np.nan, ".")
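The last argument, `"."`, is the nesting separator: a column such as `movieId.id` is meant to become an `id` field nested under `movieId`. The following plain-Python sketch illustrates that idea on a single row. It is not the library's implementation; `nest` and `sample_row` are names introduced here, and the row values are made up.

```python
def nest(flat_row, separator="."):
    """Turn a flat mapping like {"a.b": 1} into a nested one like {"a": {"b": 1}}."""
    nested = {}
    for column, value in flat_row.items():
        *parents, leaf = column.split(separator)
        target = nested
        for key in parents:
            # Create the intermediate dictionary on first use.
            target = target.setdefault(key, {})
        target[leaf] = value
    return nested


# Hypothetical rating row; the plain "movieId" column is left out to keep the sketch simple.
sample_row = {
    "userId": 1,
    "movieId.id": "https://movielens.org/movies/1",
    "rating": 4.0,
    "type": "Rating",
}
print(nest(sample_row))
```

Nesting the movie identifier this way is what turns the rating or tag into a resource that links to a movie, rather than one that merely carries a string.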

Visualize the results:

In [ ]:
print(movies_resources[0])
print(ratings_resources[0])
print(tags_resources[629])

Register Resources into Nexus Delta

Now that we have the resources, let's push them to our Sandbox deployment:

In [ ]:
forge.register(movies_resources)
forge.register(ratings_resources)
forge.register(tags_resources)

That's it! You can check your project in the web interface to see the newly created resources.

Query Resources with Nexus Forge from Nexus Delta

The registered resources are now being indexed in the Elasticsearch and Blazegraph indices. Once indexing has caught up, which normally takes only a short while, we can query them with SPARQL.
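Because indexing runs in the background, a query sent right after registration may return no results yet. The following is a minimal sketch of a retry helper for that situation; `query_until_results`, its parameters, and the `run_query` callable are hypothetical names introduced here, not part of Nexus Forge.

```python
import time


def query_until_results(run_query, attempts=5, delay_seconds=2.0):
    """Call run_query() until it returns a non-empty result or attempts run out."""
    results = []
    for attempt in range(attempts):
        results = run_query()
        if results:
            break
        if attempt < attempts - 1:
            # Give the indexers some time to catch up before trying again.
            time.sleep(delay_seconds)
    return results
```

Once the forge and a query string are defined, the helper could wrap the SPARQL call used later in this notebook, for example `query_until_results(lambda: forge.sparql(query, limit=100))`.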

If you are new to SPARQL, you can watch this introduction video before moving on.

Let's list "thought-provoking" movies. First we write the query:

In [ ]:
query = """
  PREFIX vocab: <https://sandbox.bluebrainnexus.io/v1/vocabs/>
  PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
  SELECT DISTINCT ?id ?title
  WHERE {
  ?id nxv:self ?self ;
      nxv:deprecated false ;
      vocab:title ?title ;
      ^vocab:movieId / vocab:tag "thought-provoking" .
  }
"""

We can then use the Forge to query the endpoint:

In [ ]:
resources = forge.sparql(query, limit=100, debug=True)
set(forge.as_dataframe(resources).title)
movie = forge.retrieve(resources[0].id)
print(movie)
movie._store_metadata
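The least obvious part of the query above is the property path `^vocab:movieId / vocab:tag`. Read from the movie's point of view, `^vocab:movieId` walks the `movieId` link backwards to every resource that points at the movie, and `/ vocab:tag` then follows that resource's `tag` property. The plain-Python sketch below mimics this on a few in-memory records; the records, titles, and the function name are made up for illustration.

```python
# Hypothetical in-memory stand-ins for registered movie and tag resources.
sample_movies = {
    "https://movielens.org/movies/1": {"title": "Movie A"},
    "https://movielens.org/movies/2": {"title": "Movie B"},
}
sample_tags = [
    {"movieId": "https://movielens.org/movies/1", "tag": "thought-provoking"},
    {"movieId": "https://movielens.org/movies/2", "tag": "funny"},
    {"movieId": "https://movielens.org/movies/1", "tag": "thought-provoking"},
]


def movies_with_tag(wanted):
    """Return (id, title) pairs of movies that some tag resource labels with `wanted`."""
    # ^vocab:movieId : from a movie, reach every resource whose movieId points to it.
    # / vocab:tag    : then follow that resource's tag property and compare the value.
    matching_ids = {t["movieId"] for t in sample_tags if t["tag"] == wanted}
    # The set plays the role of SELECT DISTINCT: each movie is listed once.
    return sorted(
        (movie_id, sample_movies[movie_id]["title"])
        for movie_id in matching_ids
        if movie_id in sample_movies
    )


print(movies_with_tag("thought-provoking"))
```

Even though the first movie is tagged twice, it appears only once in the output, which is the effect `DISTINCT` has in the SPARQL query.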

If you want to try some other examples of Nexus Forge, you can use these notebooks.

The next step is to use this query to create a Studio view in Nexus Fusion.