Preparations

Let's start with the necessary imports and definitions. You have to install requests first by running pip install --user requests.

In [33]:
import requests
import pprint
import sys

GRAPHQL = 'http://api.catalysis-hub.org/graphql'

def fetch(query):
    return requests.get(
        GRAPHQL, {'query': query}
    ).json()['data']
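
The fetch helper above assumes that every query succeeds. As a minimal sketch (not part of the original tutorial), the function below shows one way to fail loudly instead. It relies on the GraphQL convention that a failed query reports its problems under a top-level errors key. The name unwrap and both sample payloads are hypothetical and written by hand for illustration.

```python
def unwrap(payload):
    """Return payload['data'], or raise if the server reported errors."""
    errors = payload.get('errors')
    if errors:
        messages = '; '.join(str(e.get('message', e)) for e in errors)
        raise RuntimeError('GraphQL query failed: ' + messages)
    return payload['data']


# Hand-written payloads for illustration only (not real server output).
ok = {'data': {'publications': {'edges': []}}}
bad = {'errors': [{'message': 'Cannot query field "titel"'}], 'data': None}

print(unwrap(ok))

error_text = None
try:
    unwrap(bad)
except RuntimeError as exc:
    error_text = str(exc)
    print(error_text)
```

Inside fetch, one could replace .json()['data'] with unwrap(....json()). Calling raise_for_status() on the requests response beforehand would additionally catch HTTP-level failures.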

List of Publications

Let's start flexing our querying muscles by querying a list of publications.

In [21]:
raw_publications = fetch("""{publications {
  edges {
    node {
      id
      authors
      title
      journal
      year
      doi
    }
  }
}}
""")['publications']['edges']
publications = list(map(lambda x: x['node'], raw_publications))
pprint.pprint(publications[:3])
[{'authors': '["Boes, Jacob"]',
  'doi': None,
  'id': 'UHVibGljYXRpb246NjA=',
  'journal': None,
  'title': 'Adsorption energies on fcc 111 transition metals',
  'year': 2018},
 {'authors': '["Montoya, Joseph H.", "Tsai, Charlie", "Vojvodic, Aleksandra", '
             '"Norskov, Jens K."]',
  'doi': '10.1002/cssc.201500322',
  'id': 'UHVibGljYXRpb246NjQ=',
  'journal': 'ChemSusChem',
  'title': 'The Challenge of Electrochemical Ammonia Synthesis: A New '
           'Perspective on the Role of Nitrogen Scaling Relations',
  'year': 2015},
 {'authors': '["Catapp"]',
  'doi': None,
  'id': 'UHVibGljYXRpb246MzM=',
  'journal': None,
  'title': None,
  'year': 2012}]

We only show the first 3 results here for brevity, but you can retrieve the full list by removing the [:3] slice. The ['edges']['node'] nesting may seem a little annoying, but it will allow us to page through responses that would be too large for a single request, as we will see below.
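
Note that in the output above the authors field is a JSON-encoded string rather than a Python list. The following sketch decodes it with the standard json module. The helper name parse_authors is an illustrative choice, and the two records are copied from the output above with some fields left out.

```python
import json

# Two records copied from the output above (some fields omitted).
publications = [
    {'authors': '["Boes, Jacob"]',
     'title': 'Adsorption energies on fcc 111 transition metals',
     'year': 2018},
    {'authors': '["Montoya, Joseph H.", "Tsai, Charlie", '
                '"Vojvodic, Aleksandra", "Norskov, Jens K."]',
     'title': 'The Challenge of Electrochemical Ammonia Synthesis: A New '
              'Perspective on the Role of Nitrogen Scaling Relations',
     'year': 2015},
]


def parse_authors(publication):
    """Return a copy of the record with 'authors' decoded into a list."""
    record = dict(publication)
    # Fall back to an empty list if the field is None or empty.
    record['authors'] = json.loads(record['authors'] or '[]')
    return record


parsed = [parse_authors(p) for p in publications]
for p in parsed:
    print(p['year'], len(p['authors']), 'author(s):', '; '.join(p['authors']))
```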

Query Reactions

Next, let's query some reactions. This is the same type of query as you would get from the Reaction Energetics App. Let's get all reaction energies for reactions that end with CO adsorbed on the surface and have some palladium in the surface. The tilde (~) before the Pd indicates that the field only has to contain Pd. If you want an exact match, drop the tilde. Here we have al

In [25]:
fetch("""
{reactions(first: 10, products:"CO", chemicalComposition:"~Pd") {
  totalCount
  pageInfo {
    hasNextPage
    hasPreviousPage
    startCursor
    endCursor
  }
  edges {
    node {
      reactants
      products
      Equation
      reactionEnergy
      chemicalComposition
    }
  }
}}
""")
Out[25]:
{'reactions': {'totalCount': 74,
  'pageInfo': {'hasNextPage': True,
   'hasPreviousPage': False,
   'startCursor': 'YXJyYXljb25uZWN0aW9uOjA=',
   'endCursor': 'YXJyYXljb25uZWN0aW9uOjk='},
  'edges': [{'node': {'reactants': '{"star": 1, "COgas": 1}',
     'products': '{"COstar": 1}',
     'Equation': 'CO(g) + * -> CO*',
     'reactionEnergy': -2.01383127677,
     'chemicalComposition': 'Pd4'}},
   {'node': {'reactants': '{"star": 1, "COgas": 1}',
     'products': '{"COstar": 1}',
     'Equation': 'CO(g) + * -> CO*',
     'reactionEnergy': -1.74274934594,
     'chemicalComposition': 'Pd36'}},
   {'node': {'reactants': '{"star": 1, "CHCOstar": 1}',
     'products': '{"CHstar": 1, "COstar": 1}',
     'Equation': 'CHCO* + * -> CH* + CO*',
     'reactionEnergy': -0.971622912111,
     'chemicalComposition': 'Pd36'}},
   {'node': {'reactants': '{"CHOstar": 1}',
     'products': '{"COstar": 1, "hfH2gas": 1}',
     'Equation': 'CHO* -> hfH2(g) + CO*',
     'reactionEnergy': -0.71475,
     'chemicalComposition': 'Co3Pd'}},
   {'node': {'reactants': '{"star": 1, "CH3COstar": 1}',
     'products': '{"COstar": 1, "CH3star": 1}',
     'Equation': 'CH3CO* + * -> CH3* + CO*',
     'reactionEnergy': -0.531420322484,
     'chemicalComposition': 'Pd36'}},
   {'node': {'reactants': '{"star": 1, "COgas": 1}',
     'products': '{"COstar": 1}',
     'Equation': 'CO(g) + * -> CO*',
     'reactionEnergy': -0.356524751,
     'chemicalComposition': 'Zn3Pd'}},
   {'node': {'reactants': '{"star": 1, "COgas": 1}',
     'products': '{"COstar": 1}',
     'Equation': 'CO(g) + * -> CO*',
     'reactionEnergy': -0.466524751,
     'chemicalComposition': 'Cd3Pd'}},
   {'node': {'reactants': '{"star": 1, "COgas": 1}',
     'products': '{"COstar": 1}',
     'Equation': 'CO(g) + * -> CO*',
     'reactionEnergy': -0.968702757017,
     'chemicalComposition': 'Pd4'}},
   {'node': {'reactants': '{"star": 1, "CHOstar": 1}',
     'products': '{"Hstar": 1, "COstar": 1}',
     'Equation': 'CHO* + * -> CO* + H*',
     'reactionEnergy': -1.23463251346,
     'chemicalComposition': 'Pd36'}},
   {'node': {'reactants': '{"star": 1, "COgas": 1}',
     'products': '{"COstar": 1}',
     'Equation': 'CO(g) + * -> CO*',
     'reactionEnergy': 0.97,
     'chemicalComposition': 'HH- Pd-MoS2'}}]}}
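
As with the authors of a publication, the reactants and products fields come back as JSON-encoded strings. The sketch below decodes them and sorts the reactions by energy. It works on three edges copied from the output above, so it runs without a network connection; the variable names are illustrative.

```python
import json

# Three edges copied from the output above.
edges = [
    {'node': {'reactants': '{"star": 1, "COgas": 1}',
              'products': '{"COstar": 1}',
              'Equation': 'CO(g) + * -> CO*',
              'reactionEnergy': 0.97,
              'chemicalComposition': 'HH- Pd-MoS2'}},
    {'node': {'reactants': '{"CHOstar": 1}',
              'products': '{"COstar": 1, "hfH2gas": 1}',
              'Equation': 'CHO* -> hfH2(g) + CO*',
              'reactionEnergy': -0.71475,
              'chemicalComposition': 'Co3Pd'}},
    {'node': {'reactants': '{"star": 1, "COgas": 1}',
              'products': '{"COstar": 1}',
              'Equation': 'CO(g) + * -> CO*',
              'reactionEnergy': -2.01383127677,
              'chemicalComposition': 'Pd4'}},
]

reactions = []
for edge in edges:
    node = dict(edge['node'])
    # Decode the JSON strings into dictionaries of species -> count.
    node['reactants'] = json.loads(node['reactants'])
    node['products'] = json.loads(node['products'])
    reactions.append(node)

# Most negative reaction energy first.
reactions.sort(key=lambda r: r['reactionEnergy'])
for r in reactions:
    print(f"{r['reactionEnergy']:8.3f}  {r['chemicalComposition']:<12} {r['Equation']}")
```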

Query Systems

Next up is systems. Here we use a different kind of filter to select energies > -14 eV, so that should give us H or H2 at best.

In [26]:
fetch("""
{systems(first: 100, energy: -14, op:">") {
  totalCount
  edges {
    node {
      id
      Formula
      Cifdata
      energy
      calculatorParameters
    }
  }
}}
""")
Out[26]:
{'systems': {'totalCount': 3,
  'edges': [{'node': {'id': 'U3lzdGVtOjEyNTA4',
     'Formula': 'H2',
     'Cifdata': 'data_image0\n_cell_length_a       14\n_cell_length_b       15\n_cell_length_c       16.7372\n_cell_angle_alpha    90\n_cell_angle_beta     90\n_cell_angle_gamma    90\n\n_symmetry_space_group_name_H-M    "P 1"\n_symmetry_int_tables_number       1\n\nloop_\n  _symmetry_equiv_pos_as_xyz\n  \'x, y, z\'\n\nloop_\n  _atom_site_label\n  _atom_site_occupancy\n  _atom_site_fract_x\n  _atom_site_fract_y\n  _atom_site_fract_z\n  _atom_site_thermal_displace_type\n  _atom_site_B_iso_or_equiv\n  _atom_site_type_symbol\n  H1       1.0000 0.50000  0.50000  0.52241  Biso   1.000  H\n  H2       1.0000 0.50000  0.50000  0.47760  Biso   1.000  H\n',
     'energy': -6.7714919,
     'calculatorParameters': '{}'}},
   {'node': {'id': 'U3lzdGVtOjI5Mzc=',
     'Formula': 'H2',
     'Cifdata': 'data_image0\n_cell_length_a       14\n_cell_length_b       15\n_cell_length_c       16.7372\n_cell_angle_alpha    90\n_cell_angle_beta     90\n_cell_angle_gamma    90\n\n_symmetry_space_group_name_H-M    "P 1"\n_symmetry_int_tables_number       1\n\nloop_\n  _symmetry_equiv_pos_as_xyz\n  \'x, y, z\'\n\nloop_\n  _atom_site_label\n  _atom_site_occupancy\n  _atom_site_fract_x\n  _atom_site_fract_y\n  _atom_site_fract_z\n  _atom_site_thermal_displace_type\n  _atom_site_B_iso_or_equiv\n  _atom_site_type_symbol\n  H1       1.0000 0.50000  0.50000  0.52243  Biso   1.000  H\n  H2       1.0000 0.50000  0.50000  0.47757  Biso   1.000  H\n',
     'energy': -6.75954945,
     'calculatorParameters': '{}'}},
   {'node': {'id': 'U3lzdGVtOjI3MjY=',
     'Formula': 'H2',
     'Cifdata': 'data_image0\n_cell_length_a       14\n_cell_length_b       15\n_cell_length_c       16.7372\n_cell_angle_alpha    90\n_cell_angle_beta     90\n_cell_angle_gamma    90\n\n_symmetry_space_group_name_H-M    "P 1"\n_symmetry_int_tables_number       1\n\nloop_\n  _symmetry_equiv_pos_as_xyz\n  \'x, y, z\'\n\nloop_\n  _atom_site_label\n  _atom_site_occupancy\n  _atom_site_fract_x\n  _atom_site_fract_y\n  _atom_site_fract_z\n  _atom_site_thermal_displace_type\n  _atom_site_B_iso_or_equiv\n  _atom_site_type_symbol\n  H1       1.0000 0.50000  0.50000  0.52241  Biso   1.000  H\n  H2       1.0000 0.50000  0.50000  0.47760  Biso   1.000  H\n',
     'energy': -6.7714919,
     'calculatorParameters': '{}'}}]}}
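
The Cifdata strings are easier to work with once they are written to files. The sketch below does this for a response shaped like the one above. The helper name save_cifs and the file naming are illustrative choices. The ids shown above happen to be base64 encodings of strings such as System:12508; this is an observation about this output, not a documented guarantee. The sample Cifdata is cut off after the cell lengths for brevity.

```python
import base64
import tempfile
from pathlib import Path


def save_cifs(response, directory):
    """Write the Cifdata of every system in `response` into `directory`.

    Returns the list of written paths.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for edge in response['systems']['edges']:
        node = edge['node']
        # Assumption: the id decodes to text like 'System:12508'.
        name = base64.b64decode(node['id']).decode('ascii').replace(':', '_')
        path = directory / (name + '.cif')
        path.write_text(node['Cifdata'])
        paths.append(path)
    return paths


# Stand-in for the response above; Cifdata is truncated for brevity.
sample = {'systems': {'edges': [{'node': {
    'id': 'U3lzdGVtOjEyNTA4',
    'Cifdata': 'data_image0\n_cell_length_a       14\n'
               '_cell_length_b       15\n_cell_length_c       16.7372\n',
}}]}}

with tempfile.TemporaryDirectory() as tmp:
    written = save_cifs(sample, tmp)
    written_names = [p.name for p in written]
    first_line = written[0].read_text().splitlines()[0]

print(written_names, first_line)
```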

Combining Queries and Stepping Through Large Queries

The main tables that catalysis-hub.org offers are reactions, systems, and publications. Often it is useful to query more than one table at once (similar to an SQL join), filtering on one table while retrieving the associated data from another. For example, we may want to filter for a certain type of reaction and get the structures associated with it.

In [28]:
reaction_systems = fetch("""{reactions(first: 1, after:"", products:"CO", chemicalComposition:"~Pd") {
  totalCount
  pageInfo {
    hasNextPage
    hasPreviousPage
    startCursor
    endCursor
  }
  edges {
    node {
      id
      reactants
      products
      Equation
      reactionEnergy
      chemicalComposition
      systems{
        InputFile(format:"vasp")
      }
    }
  }
}}
""")
reaction_systems
Out[28]:
{'reactions': {'totalCount': 74,
  'pageInfo': {'hasNextPage': True,
   'hasPreviousPage': False,
   'startCursor': 'YXJyYXljb25uZWN0aW9uOjA=',
   'endCursor': 'YXJyYXljb25uZWN0aW9uOjA='},
  'edges': [{'node': {'id': 'UmVhY3Rpb246OTI=',
     'reactants': '{"star": 1, "COgas": 1}',
     'products': '{"COstar": 1}',
     'Equation': 'CO(g) + * -> CO*',
     'reactionEnergy': -2.01383127677,
     'chemicalComposition': 'Pd4',
     'systems': [{'InputFile': ' C  O Pd \n 1.0000000000000000\n    11.2810760000000005    0.0000000000000000    0.0000000000000000\n     5.6405399999999997    9.7697000000000003    0.0000000000000000\n     0.0000000000000000    0.0000000000000000   20.9082210000000011\n   1   1  64\nCartesian\n  1.4101346666666701  0.8141416666666670 15.2669614237632008\n  1.4101346666666701  0.8141416666666670 16.4476892742715997\n -0.0105757391761526 -0.0061056535666691 13.9826267161823008\n  1.4101350719169199  2.4546371834091101 13.9826267161823008\n  2.8197367200103298  4.8848366356889299 13.8956023395766000\n  4.2301502216020603  7.3277440144596904 13.8956023395766000\n  2.8308448830099899 -0.0061056578967144 13.9826267161823008\n  4.2594496261227102  2.4591947404922401 13.8967557275773999\n  5.6390405883208699  4.8857154407210297 13.8924823774328008\n  7.0506740719169203  7.2937363790691201 13.8967557275773999\n  5.6402595703251999 -0.0004547779845310 13.8956023395766000\n  7.0506730719169202  2.4406949911171898 13.8924823774328008\n  8.4623065555129493  4.8857154401075098 13.8924823774328008\n  9.8711979222318096  7.3277440143553401 13.8956023395766000\n  8.4610855735086492 -0.0004547780985577 13.8956023395766000\n  9.8418965177111204  2.4591947523843500 13.8967557275773999\n 11.2816104238234995  4.8848366354705801 13.8956023395766000\n 12.6912120719168993  7.3272752906485801 13.8955858240988999\n  2.8189270855692601  1.6275084501303201 11.6033213505450004\n  4.2297032849700500  4.0706417361641396 11.5841631011236998\n  5.6402465574528096  6.5137738846480797 11.5841631011236998\n  7.0506743516650996  8.9571083225495993 11.6033213505450004\n  5.6401300791823497  1.6277096017355099 11.5841631011236998\n  7.0506733516651003  4.0707084074202298 11.5863926674548008\n  8.4611011458774197  6.5137738845281801 11.5841631011236998\n  9.8610284784020106  8.9612827632587102 11.6035304482832995\n  8.4612156241478793  1.6277096015683501 11.5841631011236998\n  9.8716434183601400  
4.0706417358770901 11.5841631011236998\n 11.2805271761174009  6.5128157635844000 11.5876907260538999\n 12.6912123516651008  8.9561936953171308 11.5876907260538999\n 11.2824186177609000  1.6275084495807499 11.6033213505450004\n 12.6912113516650997  4.0592596998026798 11.6035304482832995\n 14.1018965272127996  6.5128157633591401 11.5876907260538999\n 15.5213962249281998  8.9612827591992907 11.6035304482832995\n  1.4101346652565301  0.8141416658525250  9.3027403703600005\n  2.8202696652565300  3.2565666658525201  9.3027403703600005\n  4.2304046652565299  5.6989916658525299  9.3027403703600005\n  5.6405396652565303  8.1414166658525193  9.3027403703600005\n  4.2304036652565298  0.8141416658525250  9.3027403703600005\n  5.6405386652565301  3.2565666658525201  9.3027403703600005\n  7.0506736652565296  5.6989916658525299  9.3027403703600005\n  8.4608086652565309  8.1414166658525193  9.3027403703600005\n  7.0506726652565304  0.8141416658525250  9.3027403703600005\n  8.4608076652565298  3.2565666658525201  9.3027403703600005\n  9.8709426652565302  5.6989916658525299  9.3027403703600005\n 11.2810776652565004  8.1414166658525193  9.3027403703600005\n  9.8709416652565292  0.8141416658525250  9.3027403703600005\n 11.2810766652564993  3.2565666658525201  9.3027403703600005\n 12.6912116652564997  5.6989916658525299  9.3027403703600005\n 14.1013466652565000  8.1414166658525193  9.3027403703600005\n  0.0000000000000000  0.0000000000000000  7.0000001319882204\n  1.4101349999999999  2.4424250000000001  7.0000001319882204\n  2.8202699999999998  4.8848500000000001  7.0000001319882204\n  4.2304050000000002  7.3272750000000002  7.0000001319882204\n  2.8202690000000001  0.0000000000000000  7.0000001319882204\n  4.2304040000000001  2.4424250000000001  7.0000001319882204\n  5.6405390000000004  4.8848500000000001  7.0000001319882204\n  7.0506739999999999  7.3272750000000002  7.0000001319882204\n  5.6405380000000003  0.0000000000000000  7.0000001319882204\n  7.0506729999999997  
2.4424250000000001  7.0000001319882204\n  8.4608080000000001  4.8848500000000001  7.0000001319882204\n  9.8709430000000005  7.3272750000000002  7.0000001319882204\n  8.4608070000000009  0.0000000000000000  7.0000001319882204\n  9.8709419999999994  2.4424250000000001  7.0000001319882204\n 11.2810769999999998  4.8848500000000001  7.0000001319882204\n 12.6912120000000002  7.3272750000000002  7.0000001319882204\n'},
      {'InputFile': ' C  O \n 1.0000000000000000\n    14.0000000000000000    0.0000000000000000    0.0000000000000000\n     0.0000000000000000   15.0000000000000000    0.0000000000000000\n     0.0000000000000000    0.0000000000000000   16.0000000000000000\n   1   1\nCartesian\n  6.9999985159999998  7.4999951400000002  7.4337279040000004\n  7.0000014840000002  7.5000048599999998  8.5662720960000005\n'},
      {'InputFile': 'Pd \n 1.0000000000000000\n     2.8202690000000001    0.0000000000000000    0.0000000000000000\n     1.4101349999999999    2.4424250000000001    0.0000000000000000\n     0.0000000000000000    0.0000000000000000   20.9082210000000011\n   4\nCartesian\n  0.0000000000000000  0.0000000000000000  7.0000001319882204\n  1.4101346652565301  0.8141416658525250  9.3027403703600005\n  2.8202693516650998  1.6282834074202299 11.5917470855843998\n  0.0000000719169190  0.0000002906485750 13.9042749849117993\n'}]}}]}}
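
To get a quick overview of what each returned structure contains, one can read the atom counts from the InputFile text. The sketch below assumes the layout seen in the output above: element symbols on the first line and the matching counts on the sixth line. The helper name count_atoms is hypothetical, and files that follow a different VASP layout would need different line indices. The sample string is the gas-phase CO structure copied from the output.

```python
def count_atoms(poscar_text):
    """Map element symbols to atom counts for a VASP-style input string.

    Assumes symbols on line 1 and counts on line 6, as in the output above.
    """
    lines = poscar_text.splitlines()
    symbols = lines[0].split()
    counts = [int(n) for n in lines[5].split()]
    return dict(zip(symbols, counts))


# Gas-phase CO structure copied from the output above.
co_gas = (
    ' C  O \n 1.0000000000000000\n'
    '    14.0000000000000000    0.0000000000000000    0.0000000000000000\n'
    '     0.0000000000000000   15.0000000000000000    0.0000000000000000\n'
    '     0.0000000000000000    0.0000000000000000   16.0000000000000000\n'
    '   1   1\nCartesian\n'
    '  6.9999985159999998  7.4999951400000002  7.4337279040000004\n'
    '  7.0000014840000002  7.5000048599999998  8.5662720960000005\n'
)

co_counts = count_atoms(co_gas)
print(co_counts)
```

Applied to the first structure in the output, the same function would report one C, one O, and 64 Pd atoms.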

One constraint we have to work with is that our server times out requests after 30 seconds (this gives others a chance to query, too). Especially when generating a lot of structures, we can quickly run into this limitation. To get around it, we can use the pageInfo attributes together with the first and after keywords to roll our own pagination and combine the pages into one data set. We write a simple loop that runs indefinitely and break out of it when pageInfo indicates that we are done. To step through a large query, do this:

In [39]:
end_cursor = ''
reaction_systems = {}

while True:
    response = fetch("{reactions(first: 5, after:\"" + end_cursor + """", products:"CO", chemicalComposition:"~Pd") {
  totalCount
  pageInfo {
    hasNextPage
    hasPreviousPage
    startCursor
    endCursor
  }
  edges {
    node {
      id
      reactants
      products
      Equation
      reactionEnergy
      chemicalComposition
      systems{
        InputFile(format:"vasp")
      }
    }
  }
}}""")
    for edge in response['reactions']['edges']:
        reaction_systems[edge['node']['id']] = edge['node']
    # Book-keeping for pagination
    if not response['reactions']['pageInfo']['hasNextPage']:
        sys.stdout.write(' Done!\n')
        break

    end_cursor = response['reactions']['pageInfo']['endCursor']
    sys.stdout.write('.')
.............. Done!
In [42]:
len(reaction_systems)
Out[42]:
74
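
The same loop can be wrapped in a reusable generator. The following is a sketch under stated assumptions, not the tutorial's own code: paginate and fake_fetch_page are hypothetical names, and the fake backend with its integer cursors only stands in for the server so that the example runs offline.

```python
def paginate(fetch_page, page_size=5):
    """Yield nodes from a paginated connection, one page at a time.

    `fetch_page(first, after)` must return a dict with 'edges' and
    'pageInfo' keys, shaped like response['reactions'] above.
    """
    cursor = ''
    while True:
        connection = fetch_page(page_size, cursor)
        for edge in connection['edges']:
            yield edge['node']
        page_info = connection['pageInfo']
        if not page_info['hasNextPage']:
            break
        cursor = page_info['endCursor']


# In-memory stand-in for the server (hypothetical data).
ITEMS = [{'id': 'item-%d' % i} for i in range(12)]


def fake_fetch_page(first, after):
    """Return `first` items following the item with index `after`."""
    start = int(after) + 1 if after else 0
    chunk = ITEMS[start:start + first]
    end = start + len(chunk) - 1
    return {
        'edges': [{'node': node} for node in chunk],
        'pageInfo': {'hasNextPage': end < len(ITEMS) - 1,
                     'endCursor': str(end)},
    }


nodes = list(paginate(fake_fetch_page, page_size=5))
print(len(nodes))
```

Against the real API, fetch_page would wrap fetch, insert first and after into the query string, and return the 'reactions' part of the response.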

Now we can do further analysis with this combined data set. Note that some reaction energies do not come with geometries (especially older ones). For purely technical reasons they have a placeholder geometry with only one hydrogen atom and a 1x1x1 Angstrom unit cell.

More Resources

To quickly test which queries are possible, we have a GraphiQL Interface. You can write your own queries, and GraphiQL will try to complete your keywords. Once you are happy with the results, you can copy the query back into e.g. a Jupyter Notebook for further analysis. Also check out our Documentation for a complete reference of the database schema and more tutorials and examples.