Matminer Examples: Data Mining in Materials Science

If you have any questions about matminer, please consult the official documentation first. If the documentation does not cover your question, search the matminer forum and, if it has not already been answered, post it there, as others may have the same question. If you find matminer useful in your work, please cite the matminer paper.

Background

  • This tutorial provides examples of how to use different parts of matminer, a Python package for materials data mining, from gathering data to featurization, visualization, and training machine learning models that predict materials properties.
  • You may be going through these notebooks via Binder, in which case you can double-click any cell, change its source, and press shift + enter to run it and see the result of your change. Binder creates your own isolated environment, so you are not changing any source code. Alternatively, you can clone the matminer_examples GitHub repository to run these notebooks on your own machine and to check out the examples written as plain Python scripts, which are not shown here.
  • Finally, a basic knowledge of the Python programming language and of pandas (a spreadsheet-like data library for Python) will help you get the most out of these examples.
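For readers new to pandas, the following is a minimal sketch of the kind of table handling the notebooks rely on. The column names and all values are illustrative placeholders invented for this sketch; they are not taken from matminer or from any materials database.

```python
import pandas as pd

# Illustrative placeholder table: formulas and values are made up for this
# sketch and do not come from any database.
df = pd.DataFrame(
    {
        "formula": ["A2B", "AB", "AB3", "A3B2"],
        "band_gap_eV": [0.0, 1.5, 3.2, 0.4],
        "source": ["computed", "computed", "experimental", "experimental"],
    }
)

# Select rows by a condition, much like filtering a spreadsheet column.
semiconductors = df[df["band_gap_eV"] > 0.1]

# Summarize one column per group.
mean_gap_by_source = df.groupby("source")["band_gap_eV"].mean()

print(semiconductors)
print(mean_gap_by_source)
```

If these two operations (boolean filtering and group-wise aggregation) look familiar, the "Basic" notebooks should be easy to follow.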

Notebook examples

In these notebooks we show how to use matminer to retrieve materials data (e.g. compositions and band gaps), generate features for materials (e.g. from their formulas or crystal structures), visualize data, train machine learning models, and interpret and evaluate those models.
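To make "generating features from a formula" concrete, here is a minimal standard-library sketch that turns a formula string into a fixed-length vector of atomic fractions. It is a stand-in for illustration only and is not matminer's API: the function names `element_fractions` and `featurize` are hypothetical, and matminer's own featurizers compute far richer descriptors.

```python
import re
from collections import Counter


def element_fractions(formula):
    """Return atomic fractions for a simple formula such as 'Fe2O3'.

    Hypothetical helper for this sketch: it handles only element symbols
    followed by optional numeric counts (no parentheses, no hydrates).
    """
    counts = Counter()
    # Each match is (element symbol, count string); the count string is
    # empty when the element appears without an explicit number.
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no elements found in {formula!r}")
    return {symbol: n / total for symbol, n in counts.items()}


def featurize(formula, elements):
    """Map a formula to a fixed-length vector, one entry per listed element."""
    fractions = element_fractions(formula)
    return [fractions.get(element, 0.0) for element in elements]


elements = ["Fe", "Na", "Cl", "O"]
for formula in ["Fe2O3", "NaCl"]:
    print(formula, featurize(formula, elements))
```

The point of the sketch is the shape of the result: every material, whatever its formula, becomes a numeric vector of the same length, which is what a machine learning model needs as input.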

Note: "Advanced" notebooks refer to those that require some knowledge of pandas or scikit-learn.

Data Retrieval

  • Basic: Retrieve data from the following platforms:
    • The Materials Project, Citrine Informatics, The Materials Data Facility and The Materials Platform for Data Science
  • Basic: Use the data retrieval tools to compare computed and experimental band gaps, and plot the results with FigRecipes
  • Basic: Work with the NIST JARVIS database
  • Basic: Work with the Materials Platform for Data Science (MPDS) to conduct U-O bond length analysis (requires MPDS API key)

Visualization

  • Basic: Basic plots with the figrecipes package
  • Advanced: Show data retrieval results in nicely formatted plots, all within matminer

More visualization examples (as Python scripts rather than Jupyter notebooks) can be found in the figrecipes-py folder of the examples directory.

Machine Learning

  • Basic: Predicting the bulk/shear moduli of compounds
  • Advanced (note: this requires ~2 CPU hours):
    • Generate structure features in matminer, train a model, and cross-validate it
  • Advanced: Predict the formation enthalpy of OQMD compounds using composition descriptors only, and cross-validate the model
  • Advanced: Use an sklearn Pipeline
    • Generate composition features in matminer, train and validate a machine learning model, and optimize its hyperparameters
  • Advanced: Train random forests with uncertainty estimates
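The train, tune, and cross-validate workflow that the advanced notebooks describe can be sketched with scikit-learn alone. This is a minimal sketch under stated assumptions, not the notebooks' actual code: the data are synthetic (from `make_regression`) rather than matminer features, the hyperparameter grid is deliberately tiny, and the spread of per-tree predictions is only a crude heuristic for uncertainty that may differ from the method used in the random forest notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix and a target property.
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

# Chain preprocessing and the model so both are refit inside every CV fold.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("model", RandomForestRegressor(n_estimators=30, random_state=0)),
    ]
)

# Inner loop: choose max_depth by 3-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"model__max_depth": [3, None]},
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    scoring="r2",
)

# Outer loop: estimate how well the tuned pipeline generalizes.
scores = cross_val_score(
    search, X, y, cv=KFold(n_splits=3, shuffle=True, random_state=1), scoring="r2"
)
print("outer-fold R^2 scores:", scores)

# Refit on all data, then use the spread across trees as a rough
# per-sample uncertainty indicator (a heuristic, not a calibrated interval).
search.fit(X, y)
best = search.best_estimator_
X_new = best.named_steps["scale"].transform(X[:5])
per_tree = np.stack(
    [tree.predict(X_new) for tree in best.named_steps["model"].estimators_]
)
spread = per_tree.std(axis=0)
print("best parameters:", search.best_params_)
print("per-sample spread across trees:", spread)
```

Putting the scaler inside the pipeline keeps preprocessing from seeing the held-out fold, and wrapping the grid search in an outer cross-validation separates hyperparameter selection from the final performance estimate.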

Automatminer

  • Basic: Use automatminer to make predictions on a benchmark dataset.