Machine Learning for Catalysis

Our training is designed to support collaborative, student-driven research for transformative advances in chemical and data sciences.

Harnessing the Data Revolution for Catalyst Discovery

Our CataLST training model seeks to bridge artificial intelligence (Catalog, Learn, Search) with human intelligence (Search and Test) in order to make extraordinary leaps in the field of catalysis.
Graphic depicting the connection between human and artificial intelligence via Catalog, Learn, Search, Test (CataLST)

CataLST Traineeship Model

To train students to harness data for catalyst discovery, we are building a unique, scalable, and transferable traineeship model called CataLST (pronounced "catalyst").

In this model, trainees will learn how to conduct research at the interface between chemical and data sciences. Specifically, they will work in collaborative teams to:

  1. Catalog the literature with data mining,
  2. Learn from this knowledge base using machine learning to uncover new insights,
  3. Search for new catalysts and reaction conditions with these fundamental insights complemented by computational chemistry models, and
  4. Test and validate catalytic performance experimentally.

Collaborative Research

Our research does not simply involve passing information over a fence from one area of expertise to another. Rather, the training is designed to support collaborative learning, with each trainee contributing ideas toward a common goal.

This means that chemists, engineers, and computer scientists all engage in dialogue to catalog and interpret the literature, find ways to automatically extract key data, and then learn from the resulting insights to uncover new relationships and patterns in the search for new catalysts to test experimentally.

The CataLST Model is Adaptable for Many Research Projects

Traditionally, machine learning has been applied to organic catalysts in the cheminformatics field, where researchers seek to predict how molecular structure affects activity. 

Our revolutionary approach, which combines machine learning with natural language processing, is needed to understand authentic systems under real conditions. The number of reported catalysts for key reactions has grown beyond what can readily be gleaned from traditional literature searches, review articles, and tabulations of experimental and computational data.