This notebook shows how you can use requests and pandas to gather and explore your data. Often you will need to supply your data by other methods.
The API that we will be using is the Materials Project. Link to the API description
import requests
# No trailing slash: the request paths below already start with '/'
base_url = 'https://materialsproject.org/rest/v2'
This link details the steps necessary to obtain an API key. Once you have one, set API_KEY to this value. The subprocess method is how I store my passwords on my own computer and will not work for you; use the commented-out assignment instead.
Afterwards, in the next cell, we will test that our API key works. This is done by performing a GET or POST request to https://www.materialsproject.org/rest/v1/api_check.
import subprocess
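# strip() removes any trailing newline from the command output, which is not valid in an HTTP header
API_KEY = subprocess.check_output('gopass www/materialsproject.com apikey'.split()).decode('utf-8').strip()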
# API_KEY = "<apikey-here>"
session = requests.Session()
session.headers.update({'X-API-KEY': API_KEY})
# for some reason the v2 API does not include an API check method??
response = session.get('https://www.materialsproject.org/rest/v1/api_check')
data = response.json()
print(data)
if not data['api_key_valid']:
    raise ValueError('You are not authenticated!')
{'valid_response': True, 'api_key_valid': True}
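If you do not use gopass, one alternative to pasting the key into the notebook is to read it from an environment variable. The sketch below is only an illustration: the function name load_api_key and the variable name MP_API_KEY are hypothetical choices, not something the Materials Project prescribes.

```python
import os


def load_api_key(env_var="MP_API_KEY"):
    """Return the API key stored in an environment variable.

    The default name MP_API_KEY is a hypothetical choice for this sketch.
    """
    # Strip whitespace so a stray newline never ends up in the HTTP header
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable exported in your shell, API_KEY = load_api_key() can take the place of the subprocess call, and the rest of the cell stays the same.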
The Materials Project provides a RESTful API for getting material properties, which is detailed here.
If you have followed the steps above you should be ready to parse Materials Project data.
A RESTful API is a nice way to expose data over the web. While the API provides convenient methods for getting each individual material property, it has a limit of 500 queries per day, so we need to be efficient in our queries. To do this we will use the npquery to get properties in batch.
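To make the idea of batching concrete, here is a rough sketch that splits a list of material ids into chunks and prepares one request payload per chunk, so that many materials cost only a few queries. The payload layout (JSON-encoded criteria and properties fields, with a MongoDB-style $in filter) is an assumption made for illustration, and both helper names are hypothetical; check the API description for the actual batch query format before relying on it.

```python
import json


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_payloads(material_ids, properties, batch_size=500):
    """Build one payload dict per batch of material ids.

    Assumption: the batch endpoint accepts JSON-encoded 'criteria' and
    'properties' form fields. This layout is illustrative, not confirmed.
    """
    payloads = []
    for batch in chunked(list(material_ids), batch_size):
        criteria = {"material_id": {"$in": batch}}
        payloads.append({
            "criteria": json.dumps(criteria),
            "properties": json.dumps(list(properties)),
        })
    return payloads
```

Each payload would then be sent with a single POST request, so 2,000 materials at a batch size of 500 would need four queries instead of 2,000.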
Let's start by getting a list of materials that are composed of the following elements: Fe, O, Ni, He, Zn, Cu. This does not affect your API limit.
def get_materials(elements):
    # The endpoint expects the elements joined by dashes, e.g. 'Fe-O-Ni'
    elements_str = '-'.join(elements)
    response = session.get(f'{base_url}/materials/{elements_str}/mids')
    data = response.json()
    print(f'Found {len(data["response"])} Materials in the Materials Project with the elements: {elements}')
    return data['response']
def get_material_experimental_properties(mid):
    response = session.get(f'{base_url}/materials/{mid}/exp/')
    print(response.content)
    data = response.json()['response'][0]
    print(data)
    return data
def get_material_vasp_properties(mid, piezoelectric=False, dielelectric=False):
    response = session.get(f'{base_url}/materials/{mid}/vasp/')
    material_data = response.json()['response'][0]
    # Piezoelectric and dielectric data each need a separate request
    if piezoelectric:
        response = session.get(f'{base_url}/materials/{mid}/vasp/piezo')
        data = response.json()
        if not data['valid_response']:
            material_data['piezoelectric'] = None
        else:
            material_data['piezoelectric'] = data['response']
    if dielelectric:
        response = session.get(f'{base_url}/materials/{mid}/vasp/diel')
        data = response.json()
        if not data['valid_response']:
            material_data['dielelectric'] = None
        else:
            material_data['dielelectric'] = data['response']
    return material_data
material_ids = get_materials(['Fe', 'O', 'Ni', 'He', 'Zn', 'Cu'])
Found 385 Materials in the Materials Project with the elements: ['Fe', 'O', 'Ni', 'He', 'Zn', 'Cu']
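The helper functions above index straight into response.json()['response'], which raises an exception if a request fails or a material has no data. As a hedged sketch, a small helper like the one below could centralize those checks. The name extract_response is hypothetical, and the only fields it relies on are valid_response and response, both of which appear in the replies shown in this notebook.

```python
def extract_response(response):
    """Return the 'response' payload of an API reply, or None if unusable.

    `response` is any object with `status_code` and `json()`, such as a
    requests.Response.
    """
    if response.status_code != 200:
        return None
    try:
        data = response.json()
    except ValueError:
        # The body was not valid JSON
        return None
    if not isinstance(data, dict) or not data.get("valid_response", False):
        return None
    return data.get("response")
```

A caller can then write result = extract_response(session.get(url)) and treat None as "no data for this material" instead of wrapping every call in its own checks.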
Includes: energy, energy_per_atom, volume, formation_energy_per_atom, nsites, unit_cell_formula, pretty_formula, e_above_hull, spacegroup, icsd_ids, cif
Properties: band_gap, density, energy, energy_per_atom, formation_energy_per_atom, elasticity, total_magnetization
But some properties are still not included: piezo, diel
# MgO
material_id = 'mp-1265'
# Na2O
material_id = 'mp-776952'
data = get_material_vasp_properties(material_id, piezoelectric=True, dielelectric=True)
data.keys()
dict_keys(['energy', 'energy_per_atom', 'volume', 'formation_energy_per_atom', 'nsites', 'unit_cell_formula', 'pretty_formula', 'is_hubbard', 'elements', 'nelements', 'e_above_hull', 'hubbards', 'is_compatible', 'spacegroup', 'task_ids', 'band_gap', 'density', 'icsd_id', 'icsd_ids', 'cif', 'total_magnetization', 'material_id', 'oxide_type', 'tags', 'elasticity', 'full_formula', 'piezoelectric', 'dielelectric'])
The experimental data turns out to be thermochemical data and is not worth looking at.
get_material_experimental_properties(material_id)
The Materials Project is definitely not enforcing its 500 materials per day rate limit.
Also, a query that matches more than 3,000 materials fails, which is why some elements are commented out.
materials_data = {}
# Let's just grab a bunch of materials
material_ids = get_materials([
    'H', 'He',
    # 'Li', 'Be',
    # 'B', 'C', 'N',
    'O',
    # 'F', 'Ne',
    # 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar'
    'K', 'Ca',
    'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn',
    # 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr',
])
print('Number of materials', len(material_ids))
Found 2661 Materials in the Materials Project with the elements: ['H', 'He', 'O', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']
Number of materials 2661
# store the results, skipping materials that were already downloaded
for mid in material_ids:
    if mid in materials_data:
        continue
    materials_data[mid] = get_material_vasp_properties(mid)
len(materials_data)
6928
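The download loop above makes one request per material, so a network error late in the run can lose a lot of work. One possible variation, sketched below, saves the dictionary to disk every so often and can pause between requests. The function name fetch_all and its parameters (checkpoint_path, every, delay) are hypothetical; fetch stands in for any callable that takes a material id, such as get_material_vasp_properties.

```python
import json
import time


def fetch_all(material_ids, fetch, cache, checkpoint_path=None, every=100, delay=0.0):
    """Fetch properties for ids missing from `cache`, saving progress periodically.

    Returns the number of materials that were newly fetched.
    """
    fetched = 0
    for mid in material_ids:
        if mid in cache:
            continue  # already downloaded in an earlier run
        cache[mid] = fetch(mid)
        fetched += 1
        # Write a checkpoint after every `every` new materials
        if checkpoint_path and fetched % every == 0:
            with open(checkpoint_path, "w") as f:
                json.dump(cache, f)
        if delay:
            time.sleep(delay)  # be polite to the server
    # Final save so the file always reflects the full cache
    if checkpoint_path and fetched:
        with open(checkpoint_path, "w") as f:
            json.dump(cache, f)
    return fetched
```

For example, fetch_all(material_ids, get_material_vasp_properties, materials_data, checkpoint_path='mpdata.json') would behave like the loop above while also keeping the JSON file up to date.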
import json
# Use a context manager so the file is flushed and closed after writing
with open('mpdata.json', 'w') as f:
    json.dump(materials_data, f)
! du -sh *
12K	1-gather-data.ipynb
4.0K	Overview.ipynb
22M	mpdata.json
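Since the goal is to explore the data with pandas, the saved dictionary needs to become a table at some point. The following is a minimal sketch of that step using only the standard library. The helper names to_records and load_records are hypothetical, and the column names are whichever keys of the downloaded data you choose to keep.

```python
import json


def to_records(materials_data, columns):
    """Turn {material_id: properties} into a list of flat row dicts.

    Missing properties become None so every row has the same keys.
    """
    records = []
    for mid, props in materials_data.items():
        row = {"material_id": mid}
        for column in columns:
            row[column] = props.get(column)
        records.append(row)
    return records


def load_records(path, columns):
    """Read the saved JSON file and return rows for the chosen columns."""
    with open(path) as f:
        return to_records(json.load(f), columns)
```

The resulting list of dictionaries can be passed directly to pandas.DataFrame, for example pandas.DataFrame(load_records('mpdata.json', ['pretty_formula', 'band_gap', 'density'])).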