GPU-accelerated materials informatics

Predicting the Electronic Future of Materials, Instantly

CrystaLogix demonstrates a two-stage hurdle framework for electronic bandgap prediction: classify metallic phases first, then estimate the nonmetallic bandgap with an ensemble regressor and conformal uncertainty.

Stage One

Metal Classification Gate

XGBoost Classifier | Threshold: 0.28

Stage Two

Ensemble Regressor

Conformal Prediction Intervals

200,487

Crystals Analyzed

87

Selected Features

0.2336 eV

Global MAE

0.8945

End-to-end R²

Dissertation core

A model built around the physical split between metals and nonmetals.

The Materials Project bandgap target is not a conventional regression target: more than half of the corpus sits exactly at Eg = 0 eV. The hurdle framework handles that spike with a classifier first, then models the continuous positive-gap distribution separately.
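The two-stage logic can be sketched in a few lines. This is a minimal illustration, not the project's code: `hurdle_predict`, `toy_classifier`, and `toy_regressor` are hypothetical names, and the toy models stand in for the tuned XGBoost classifier and ensemble regressor.

```python
from typing import Callable, Sequence


def hurdle_predict(
    features: Sequence[float],
    p_nonmetal: Callable[[Sequence[float]], float],
    regress_gap: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> float:
    """Return a bandgap in eV: 0.0 for predicted metals, regressor output otherwise."""
    if p_nonmetal(features) < threshold:
        return 0.0  # Stage 1 says metallic, so the gap is exactly zero
    # Stage 2 only runs for predicted nonmetals; clip so the gap is never negative
    return max(0.0, regress_gap(features))


# Hypothetical stand-ins for the trained models (assumptions for illustration only)
def toy_classifier(x: Sequence[float]) -> float:
    return 0.9 if x[0] > 1.0 else 0.1


def toy_regressor(x: Sequence[float]) -> float:
    return 1.5 * x[0]


metal_pred = hurdle_predict([0.2], toy_classifier, toy_regressor)
nonmetal_pred = hurdle_predict([2.0], toy_classifier, toy_regressor)
print(metal_pred, nonmetal_pred)
```

The regressor never sees the zero spike, so it can be fitted to the positive-gap distribution alone.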

52.2%

Metallic share

Eg = 0 eV ENTRIES

95,920

Nonmetal subset

SENT TO STAGE 2

72% / 8% / 20%

Train / Calibration / Test

PROPORTIONAL PHASE SPLIT

40,098

Holdout test

WITHHELD MATERIALS
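A proportional 72% / 8% / 20% split can be approximated by splitting each phase (metal, nonmetal) separately, so every subset keeps roughly the same metallic share. The sketch below uses only the standard library. `stratified_split` is a hypothetical helper, and the 1,000-item toy label list is an assumption chosen to mirror the 52.2% metallic share.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.72, 0.08, 0.20), seed=0):
    """Split indices into train/calibration/test, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, calib, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_calib = round(fractions[1] * len(indices))
        train += indices[:n_train]
        calib += indices[n_train:n_train + n_calib]
        test += indices[n_train + n_calib:]  # remainder goes to the holdout
    return train, calib, test


# Toy labels: 0 = metal, 1 = nonmetal (illustrative counts, not the real corpus)
labels = [0] * 522 + [1] * 478
train_idx, calib_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(calib_idx), len(test_idx))
```

The calibration slice is kept apart from training so that it can later supply unbiased residuals for the conformal intervals.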

TECHNICAL IMPLEMENTATION

Engineering a High-Throughput Pipeline.

01 RAPIDS cuDF

GPU-resident curation

The corpus is processed on an NVIDIA GeForce RTX 3050 with an approximately 280 MB in-memory footprint, making 200k-entry screening practical on consumer-grade hardware.

02 XGBoost gate

Classifier hurdle

A tuned binary classifier separates metals from nonmetals. Lowering the decision threshold to 0.28 prioritizes nonmetal recall, reducing false negatives (nonmetals misclassified as metals) from 976 to 411.
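The effect of moving the threshold can be illustrated on toy data. This sketch assumes the positive class is "nonmetal" and that the classifier outputs P(nonmetal). The probabilities and the helper `count_false_negatives` are made up for illustration and do not reproduce the 976 and 411 counts.

```python
def count_false_negatives(p_nonmetal, is_nonmetal, threshold):
    """Count true nonmetals whose P(nonmetal) falls below the threshold."""
    return sum(1 for p, y in zip(p_nonmetal, is_nonmetal) if y and p < threshold)


# Illustrative probabilities and ground truth (assumed values, not project data)
probs = [0.95, 0.45, 0.30, 0.20, 0.60, 0.10]
truth = [True, True, True, True, False, False]

fn_default = count_false_negatives(probs, truth, threshold=0.50)
fn_lowered = count_false_negatives(probs, truth, threshold=0.28)
print(fn_default, fn_lowered)
```

A lower threshold generally lets more true metals through to Stage 2 as well, so the choice trades nonmetal recall against extra work for the regressor.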

03 5-model ensemble

Nonmetal regressor

Only positive-bandgap entries are passed to an Optuna-tuned XGBoost ensemble trained on log(1 + Eg), isolating the continuous prediction problem from the zero spike.
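The log(1 + Eg) target and its inverse can be sketched as follows. The page does not say how the five members are combined, so averaging in log space before back-transforming is an assumption here, and `ensemble_predict` is a hypothetical helper.

```python
import math


def to_log_target(eg_ev: float) -> float:
    """Forward transform used for training: log(1 + Eg)."""
    return math.log1p(eg_ev)


def from_log_target(z: float) -> float:
    """Inverse transform back to eV: exp(z) - 1."""
    return math.expm1(z)


def ensemble_predict(log_predictions) -> float:
    """Average member outputs in log space (assumed), then map back to eV."""
    mean_log = sum(log_predictions) / len(log_predictions)
    return max(0.0, from_log_target(mean_log))


# Five hypothetical member outputs, all in log(1 + Eg) space
members = [to_log_target(g) for g in (1.9, 2.0, 2.1, 2.0, 2.0)]
print(round(ensemble_predict(members), 3))
```

The transform compresses the long high-gap tail, which is one common reason to train on log(1 + Eg) rather than on Eg directly.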

04 Conformal PI

Bias and uncertainty layer

Bin-wise correction reduces high-energy tail bias, while split conformal prediction converts residuals into calibrated 90% and 95% prediction intervals.
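Split conformal prediction can be sketched with absolute residuals from the calibration set. This is the textbook symmetric variant, shown under assumptions: the residuals are synthetic, clipping the lower bound at 0 eV is an added choice, and the bin-wise bias correction is not shown.

```python
import math


def conformal_quantile(abs_residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration residual."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(abs_residuals)[rank - 1]


def prediction_interval(pred_ev, q):
    """Symmetric interval around the prediction, clipped at 0 eV (assumption)."""
    return max(0.0, pred_ev - q), pred_ev + q


# Synthetic calibration residuals |y - y_hat| in eV: 0.01, 0.02, ..., 0.99
calib_residuals = [0.01 * i for i in range(1, 100)]

q90 = conformal_quantile(calib_residuals, alpha=0.10)
q95 = conformal_quantile(calib_residuals, alpha=0.05)
low90, high90 = prediction_interval(1.2, q90)
print(q90, q95, (low90, high90))
```

Because the quantile comes from held-out calibration data, the intervals have the stated marginal coverage under exchangeability, regardless of how the regressor was trained.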

DOWNSTREAM IMPACT

Real-world R&D Deployment Scenarios.

Semiconductor and power electronics

Retune the gate and objective around high-energy gaps to triage power-device candidates before expensive validation.

Photovoltaic manufacturing

Prioritize materials around the Shockley-Queisser window of roughly 1.15-1.35 eV and avoid candidates outside the useful bandgap range.

Risk-aware R&D screening

Use conformal interval width as a decision variable, ranking candidates by both predicted Eg and confidence.
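One way to use interval width as a decision variable is to keep candidates whose predicted Eg falls in a target window and rank the survivors by how narrow their interval is. The sketch below is illustrative only: the `Candidate` class, the candidate names, and the values are hypothetical, and the 1.15 to 1.35 eV window is borrowed from the photovoltaic scenario.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    eg_pred: float  # predicted bandgap in eV
    lower: float    # conformal interval lower bound in eV
    upper: float    # conformal interval upper bound in eV

    @property
    def width(self) -> float:
        return self.upper - self.lower


def rank_candidates(candidates, target_low, target_high):
    """Keep predictions inside the target window, narrowest interval first."""
    in_window = [c for c in candidates if target_low <= c.eg_pred <= target_high]
    return sorted(in_window, key=lambda c: c.width)


# Hypothetical candidates (made-up values for illustration)
pool = [
    Candidate("A", eg_pred=1.20, lower=0.90, upper=1.50),
    Candidate("B", eg_pred=1.30, lower=1.10, upper=1.50),
    Candidate("C", eg_pred=2.40, lower=2.30, upper=2.50),
]
ranked = rank_candidates(pool, target_low=1.15, target_high=1.35)
print([c.name for c in ranked])
```

A stricter variant could require the whole interval, not just the point prediction, to lie inside the window.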