Globus Compute is a Function-as-a-Service (FaaS) platform for science that enables you to register functions in a cloud-hosted service and then reliably execute those functions on a remote Globus Compute endpoint. This tutorial is configured to use a tutorial endpoint hosted by the Globus Compute team. You can set up and use your own endpoint by following the Globus Compute documentation.
The Globus Compute Python SDK provides programming abstractions for interacting with the Globus Compute service. Before running this tutorial you should first install the Globus Compute SDK as follows:
$ pip install globus-compute-sdk
The Globus Compute SDK exposes a Client object for all interactions with the Globus Compute service. To use the service, you must first authenticate using one of hundreds of supported identity providers (e.g., your institution, ORCID, Google). As part of the authentication process you must grant permission for Globus Compute to access your identity information (to retrieve your email address) and Globus Groups management access (to share endpoints).
from globus_compute_sdk import Client
gcc = Client()
The following example demonstrates how you can register and execute a function.
Globus Compute works like any other FaaS platform: you must first register a function with Globus Compute before being able to execute it on a remote endpoint. The registration process will serialize the function body and store it securely in the Globus Compute service. As we will see below, you may share functions with others and discover functions that are shared with you.
Upon registration of a function, Globus Compute will return a UUID for that function. This UUID can then be used to manage and invoke the function.
def hello_world():
    return "Hello World!"
func_uuid = gcc.register_function(hello_world)
print(func_uuid)
To invoke a function, you must provide a) the function's UUID, and b) the endpoint_id of the endpoint on which you wish to execute that function. Note: here we use the public Globus Compute tutorial endpoint; you may change the endpoint_id to the UUID of any endpoint for which you have permission to execute functions.
Globus Compute functions are designed to be executed remotely and asynchronously. Rather than blocking until the function finishes, an invocation (called a task) immediately returns a UUID that can be used to monitor the task's execution status and retrieve its result.
The Globus Compute service manages the reliable execution of a task, for example, by queueing tasks when the endpoint is busy or offline and retrying tasks in case of node failures.
tutorial_endpoint = '4b116d3c-1703-4f8f-9f6f-39921e5864df' # Public tutorial endpoint
res = gcc.run(endpoint_id=tutorial_endpoint, function_id=func_uuid)
print(res)
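Because functions, endpoints, and tasks are all identified by UUIDs, a malformed identifier is an easy mistake to catch before contacting the service. The following is a minimal sketch using only the standard library; is_valid_uuid is a hypothetical helper for illustration and is not part of the Globus Compute SDK.

```python
import uuid

def is_valid_uuid(value):
    """Return True if value parses as a UUID string (hypothetical helper)."""
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False

tutorial_endpoint = '4b116d3c-1703-4f8f-9f6f-39921e5864df'
print(is_valid_uuid(tutorial_endpoint))   # well-formed identifier
print(is_valid_uuid('not-an-endpoint'))   # malformed identifier
```

A check like this only confirms the format; it cannot tell you whether the endpoint exists or whether you have permission to use it.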
When the task has finished executing, you can retrieve its result through the Globus Compute client as follows. Note: while the task is still being processed, get_result raises an exception that reflects the current stage of the lifecycle (e.g., waiting for the endpoint).
try:
    print(gcc.get_result(res))
except Exception as e:
    print("Exception: {}".format(e))
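The try/except above checks the task once. If you want to wait until the task finishes, one option is a small polling loop. The sketch below is an illustration under stated assumptions: wait_for_result and PendingError are hypothetical names, and PendingError stands in for whatever exception your SDK version raises while a task is pending. The loop is exercised here with a fake get_result so it runs without contacting the service.

```python
import time

class PendingError(Exception):
    """Hypothetical stand-in for the exception raised while a task is pending."""

def wait_for_result(get_result, task_id, pending_exceptions=(PendingError,),
                    timeout=60.0, interval=1.0):
    """Poll get_result(task_id) until it returns or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return get_result(task_id)
        except pending_exceptions:
            # Only "still pending" errors are retried; anything else propagates.
            if time.monotonic() >= deadline:
                raise TimeoutError("task {} still pending".format(task_id))
            time.sleep(interval)

# Fake client call: pending on the first two polls, then returns a result.
attempts = {"count": 0}

def fake_get_result(task_id):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise PendingError("waiting for endpoint")
    return "Hello World!"

print(wait_for_result(fake_get_result, "task-1", interval=0.01))
```

With the real client you would pass gcc.get_result and the pending exception type your SDK version defines. Exceptions raised by the function itself are not retried and surface immediately.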
Globus Compute supports registration and invocation of functions with arbitrary arguments and return values. Globus Compute will serialize any *args and **kwargs when invoking a function, and it will serialize any return values or exceptions. Note: Globus Compute uses standard Python serialization libraries (e.g., Pickle, Dill). It also limits the size of input arguments and return values to 5 MB.
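To get a rough sense of whether a set of arguments fits under the 5 MB limit, you can measure their pickled size locally before submitting. This is only an estimate: the service's own serialization and size accounting may differ from plain pickle, and both helper names below are hypothetical.

```python
import pickle

# 5 MB limit stated in the text; treating it as 5 * 1024 * 1024 bytes is an assumption.
MAX_PAYLOAD_BYTES = 5 * 1024 * 1024

def estimated_payload_size(*args, **kwargs):
    """Estimate the serialized size of call arguments using pickle."""
    return len(pickle.dumps((args, kwargs)))

def fits_payload_limit(*args, **kwargs):
    """Return True if the pickled arguments are within the assumed limit."""
    return estimated_payload_size(*args, **kwargs) <= MAX_PAYLOAD_BYTES

print(fits_payload_limit([1, 2, 3, 4, 5]))          # small list of integers
print(fits_payload_limit(bytes(6 * 1024 * 1024)))   # 6 MiB of raw bytes
```

For data larger than the limit, a common approach is to pass a reference (such as a file path or URL reachable from the endpoint) rather than the data itself.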
The following example shows a function that computes the sum of a list of input arguments. First, we register the function as above:
def get_sum(items):
    return sum(items)
sum_function = gcc.register_function(get_sum)
When invoking the function you can pass in arguments like any other function, either by position or with keyword arguments.
items = [1, 2, 3, 4, 5]
res = gcc.run(items, endpoint_id=tutorial_endpoint, function_id=sum_function)
print (gcc.get_result(res))
Globus Compute requires that functions explicitly state all dependencies within the function body. It also assumes that the dependent libraries are available on the endpoint on which the function will execute. For example, in the following function, we explicitly import date from the datetime module inside the function body.
def get_date():
    from datetime import date
    return date.today()
date_function = gcc.register_function(get_date)
res = gcc.run(endpoint_id=tutorial_endpoint, function_id=date_function)
print(gcc.get_result(res))
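To see why imports belong inside the function body, it can help to mimic a function arriving in an environment that has not run your module-level imports. The sketch below is a local simulation only, not how Globus Compute deserializes functions: it defines each function from source in an empty namespace, so any name the body relies on must be imported by the body itself. The helper run_in_fresh_namespace is hypothetical.

```python
GOOD_SOURCE = '''
def circle_area(r):
    import math  # imported where the function runs
    return math.pi * r ** 2
'''

BAD_SOURCE = '''
def circle_area(r):
    return math.pi * r ** 2  # relies on an import made somewhere else
'''

def run_in_fresh_namespace(source, name, *args):
    """Define a function from source in an empty namespace and call it."""
    namespace = {}
    exec(source, namespace)
    return namespace[name](*args)

print(run_in_fresh_namespace(GOOD_SOURCE, "circle_area", 1.0))

try:
    run_in_fresh_namespace(BAD_SOURCE, "circle_area", 1.0)
except NameError as err:
    print("NameError:", err)
```

The second call fails because nothing in the fresh namespace ever imported math, which is the situation a remotely executed function is in when it depends on imports made outside its body.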
Depending on the configuration of the Globus Compute endpoint, you can often invoke external applications that are available in the endpoint environment.
def echo(name):
    import os
    return os.popen("echo Hello {}".format(name)).read()
echo_function = gcc.register_function(echo)
res = gcc.run("World", endpoint_id=tutorial_endpoint, function_id=echo_function)
print(gcc.get_result(res))
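The example above builds a shell command by string formatting, so characters such as ; in the argument would be interpreted by the shell. As a hedged alternative, the sketch below passes the arguments as a list to subprocess.run, which bypasses the shell. It assumes an echo executable is on the PATH, as on most POSIX systems, and it runs locally here; whether external commands are permitted on an endpoint still depends on how that endpoint is configured.

```python
def echo(name):
    # Import inside the body, as required for remotely executed functions.
    import subprocess
    completed = subprocess.run(
        ["echo", "Hello", name],   # argument list: no shell parsing of `name`
        capture_output=True,
        text=True,
        check=True,                # raise CalledProcessError on non-zero exit
    )
    return completed.stdout

print(echo("World"), end="")
print(echo("World; echo injected"), end="")  # printed literally, not executed
```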
When functions fail, the exception is captured and serialized by the Globus Compute endpoint, and reraised when you try to get the result. In the following example, the 'deterministic failure' exception is raised when gcc.get_result is called on the failing function.
def failing():
    raise Exception("deterministic failure")
failing_function = gcc.register_function(failing)
res = gcc.run(endpoint_id=tutorial_endpoint, function_id=failing_function)
gcc.get_result(res)
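The capture-and-reraise behaviour can be illustrated locally with a pickle round trip. This is a simplified sketch of the idea, not the endpoint's actual mechanism (which, among other things, may also carry traceback information); capture and unwrap are hypothetical helpers.

```python
import pickle

def failing():
    raise Exception("deterministic failure")

def capture(func):
    """Run func and return a serialized result or a serialized exception."""
    try:
        return pickle.dumps(("ok", func()))
    except Exception as exc:
        return pickle.dumps(("error", exc))

def unwrap(payload):
    """Return the result, or re-raise the captured exception."""
    kind, value = pickle.loads(payload)
    if kind == "error":
        raise value
    return value

payload = capture(failing)   # plays the role of the endpoint side
try:
    unwrap(payload)          # plays the role of get_result on the client side
except Exception as err:
    print("Re-raised:", err)
```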
After registering a function you can invoke it repeatedly. The following example shows how the Monte Carlo method can be used to estimate pi.
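Specifically, if a circle with radius $r$ is inscribed inside a square with side length $2r$, the area of the circle is $\pi r^2$ and the area of the square is $(2r)^2$. Thus, if $N$ uniformly distributed points are dropped at random locations within the square, approximately $N\pi/4$ will fall inside the circle, so $\pi$ can be estimated as four times the fraction of points that land inside. The function below samples only the unit square in the first quadrant, which covers a quarter of the circle and a quarter of the square and therefore gives the same ratio.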
import time
# function that estimates pi by placing points in a box
def pi(num_points):
    from random import random
    inside = 0
    for i in range(num_points):
        x, y = random(), random()  # Drop a point randomly within the box.
        if x**2 + y**2 < 1:        # Count points within the circle.
            inside += 1
    return (inside*4 / num_points)
# register the function
pi_function = gcc.register_function(pi)
# execute the function 3 times
estimates = []
for i in range(3):
    estimates.append(gcc.run(10**5, endpoint_id=tutorial_endpoint, function_id=pi_function))
# wait for all tasks to complete
for e in estimates:
    while gcc.get_task(e)['pending']:
        time.sleep(3)
# get the results and calculate the total
results = [gcc.get_result(i) for i in estimates]
total = 0
for r in results:
    total += r
# print the results
print("Estimates: {}".format(results))
print("Average: {:.5f}".format(total/len(results)))
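Before sending a function to an endpoint, it can be useful to run it locally as a sanity check. The sketch below is a local variant of the estimator above, with one change that is an assumption for illustration: it accepts an optional seeded random.Random instance so the run is repeatable. It does not contact the Globus Compute service.

```python
import math
import random

def pi(num_points, rng=None):
    """Estimate pi from num_points random points in the unit square."""
    if rng is None:
        rng = random.Random()
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()  # point in the unit square
        if x**2 + y**2 < 1:                # inside the quarter circle
            inside += 1
    return inside * 4 / num_points

rng = random.Random(42)  # fixed seed so the run is repeatable
estimates = [pi(10**5, rng) for _ in range(3)]
average = sum(estimates) / len(estimates)
print("Estimates: {}".format(estimates))
print("Average: {:.5f} (error {:.5f})".format(average, abs(average - math.pi)))
```

With 10**5 points per run, the standard error of a single estimate is on the order of 0.005, so agreement with pi to about two decimal places is what you should expect.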
After registering a function, you might want to invoke that function many times without making individual calls to the Globus Compute service. Such examples occur when running Monte Carlo simulations, ensembles, and parameter sweep applications.
Globus Compute provides a batch interface that enables specification of a range of function invocations. To use this interface, you must create a Globus Compute batch object and then add each invocation to that object. You can then pass the constructed object to the batch_run interface.
gcc = Client()
def squared(x):
    return x**2
squared_function = gcc.register_function(squared)
inputs = list(range(10))
batch = gcc.create_batch()
for x in inputs:
    batch.add(x, endpoint_id=tutorial_endpoint, function_id=squared_function)
batch_res = gcc.batch_run(batch)
Similarly, Globus Compute provides an interface to retrieve the status of the entire batch of invocations.
gcc.get_batch_result(batch_res)
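Once you have status information for a batch, you will often want to separate finished tasks from those still running. The sketch below assumes the status comes back as a mapping from task ID to a dict with a boolean 'pending' key, mirroring the dict that get_task returns in the earlier Monte Carlo example. That shape is an assumption and may differ between SDK versions, so inspect the actual return value first. split_by_pending is a hypothetical helper, and the data is made up for illustration.

```python
def split_by_pending(statuses):
    """Split a {task_id: status_dict} mapping into finished and pending IDs."""
    done, pending = [], []
    for task_id, info in statuses.items():
        # Treat a missing 'pending' key as still pending, to be conservative.
        if info.get("pending", True):
            pending.append(task_id)
        else:
            done.append(task_id)
    return done, pending

# Hypothetical status data in the assumed shape.
example = {
    "task-a": {"pending": False, "result": 0},
    "task-b": {"pending": True},
    "task-c": {"pending": False, "result": 4},
}
done, pending = split_by_pending(example)
print("done:", done)
print("pending:", pending)
```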
You can retrieve information about endpoints including status and information about how the endpoint is configured.
gcc.get_endpoint_status(tutorial_endpoint)