GPU-accelerated materials informatics
Predicting the Electronic Future of Materials, Instantly
CrystaLogix demonstrates a two-stage hurdle framework for electronic bandgap prediction: classify metallic phases first, then estimate the nonmetallic bandgap with an ensemble regressor and conformal uncertainty.
Stage One
Metal Classification Gate
XGBoost Classifier | Threshold: 0.28
Stage Two
Ensemble Regressor
Conformal Prediction Intervals
200,487
Crystals Analyzed
87
Selected Features
0.2336 eV
Global MAE
0.8945
End-to-end R²
Dissertation core
A model built around the physical split between metals and nonmetals.
The Materials Project bandgap is not a well-behaved regression target: more than half of the corpus sits exactly at Eg = 0 eV. The hurdle framework treats that zero spike as a classification problem first, then models the continuous positive-gap distribution separately.
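The routing idea can be sketched in a few lines. This is a minimal illustration rather than the CrystaLogix implementation: the names `hurdle_predict`, `p_nonmetal`, and `eg_regressor_ev` are hypothetical, the 0.5 default threshold is an assumption, and the stage outputs are hard-coded stand-ins for a trained classifier and regressor.

```python
def hurdle_predict(p_nonmetal, eg_regressor_ev, threshold=0.5):
    """Combine a stage-one probability with a stage-two estimate.

    p_nonmetal: classifier probability that the material has Eg > 0
                (assumes 'nonmetal' is the positive class).
    eg_regressor_ev: regressor estimate in eV, used only when routed to stage two.
    threshold: decision threshold (0.5 is an illustrative default).
    """
    if p_nonmetal < threshold:
        return 0.0  # routed as metallic: the predicted gap is exactly zero
    return max(eg_regressor_ev, 0.0)  # guard against a negative regressor output


# Toy stand-ins for model outputs (illustrative values only).
candidates = [(0.05, 0.3), (0.92, 1.8), (0.40, 0.6)]
predictions = [hurdle_predict(p, eg) for p, eg in candidates]
```

Only the second candidate clears the gate here, so the regressor's estimate is used for it alone; the other two are reported as metallic with a zero gap.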
52.2%
Metallic share
Eg = 0 eV ENTRIES
95,920
Nonmetal subset
SENT TO STAGE 2
72% / 8% / 20%
Train / Calibration / Test
PROPORTIONAL PHASE SPLIT
40,098
Holdout test
WITHHELD MATERIALS
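A proportional phase split of this kind might look like the sketch below. It assumes that "proportional" means stratifying by the metal/nonmetal label so each partition keeps the corpus class balance. The function name `stratified_split`, the seed, and the toy label counts are hypothetical.

```python
import random


def stratified_split(labels, fractions=(0.72, 0.08, 0.20), seed=0):
    """Split indices into train/calibration/test, preserving class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label (0 = metal, 1 = nonmetal).
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, calib, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(fractions[0] * n)
        n_calib = round(fractions[1] * n)
        train += indices[:n_train]
        calib += indices[n_train:n_train + n_calib]
        test += indices[n_train + n_calib:]  # remainder goes to the holdout
    return train, calib, test


# Toy corpus mirroring the 52.2% metallic share: 522 metals, 478 nonmetals.
labels = [0] * 522 + [1] * 478
train, calib, test = stratified_split(labels)
```

On the full corpus, 20% of 200,487 entries is roughly 40,097, in line with the 40,098-entry holdout quoted above.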
TECHNICAL IMPLEMENTATION
Engineering a High-Throughput Pipeline.
GPU-resident curation
The corpus is processed on an NVIDIA GeForce RTX 3050 with an approximately 280 MB in-memory footprint, making 200k-entry screening practical on consumer-grade hardware.
Classifier hurdle
A tuned binary classifier separates metals from nonmetals. Lowering the decision threshold to 0.28 prioritizes nonmetal recall, reducing false negatives (nonmetals misrouted as metals) from 976 to 411.
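The effect of moving the threshold can be illustrated on toy data. The sketch assumes that nonmetal is the positive class and that the baseline threshold is 0.5; neither is stated on this page. The labels and probabilities are invented for illustration and are unrelated to the 976 and 411 counts above.

```python
def confusion_counts(y_true, p_nonmetal, threshold):
    """Count outcomes, treating 'nonmetal' (1) as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, p in zip(y_true, p_nonmetal):
        pred = 1 if p >= threshold else 0
        if truth == 1:
            counts["tp" if pred == 1 else "fn"] += 1
        else:
            counts["fp" if pred == 1 else "tn"] += 1
    return counts


def recall(counts):
    """Fraction of true nonmetals that pass the gate."""
    return counts["tp"] / (counts["tp"] + counts["fn"])


# Invented labels and classifier probabilities (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
p_nonmetal = [0.90, 0.45, 0.30, 0.10, 0.35, 0.20, 0.05, 0.60]

baseline = confusion_counts(y_true, p_nonmetal, threshold=0.50)
lowered = confusion_counts(y_true, p_nonmetal, threshold=0.28)
```

In this toy case the lower threshold cuts false negatives from 3 to 1 and raises false positives from 1 to 2: more metals reach the regressor in exchange for fewer nonmetals being lost at the gate.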
Nonmetal regressor
Only positive-bandgap entries are passed to an Optuna-tuned XGBoost ensemble trained on log(1 + Eg), isolating the continuous prediction problem from the zero spike.
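The log(1 + Eg) target and its inverse can be written with the standard library. The helper names below are hypothetical, and averaging ensemble members in log space before inverting is an assumption about how the ensemble is combined, not a documented detail.

```python
import math


def to_log_target(eg_ev):
    """Forward transform applied to the training target: log(1 + Eg)."""
    return math.log1p(eg_ev)


def from_log_target(y):
    """Inverse transform back to eV: exp(y) - 1."""
    return math.expm1(y)


def ensemble_predict_ev(member_predictions_log):
    """Average member predictions in log space, then invert to eV.

    Assumption: simple unweighted mean; the real ensemble may weight members.
    """
    mean_log = sum(member_predictions_log) / len(member_predictions_log)
    return from_log_target(mean_log)


# Round trip on a 2.5 eV gap, and a toy three-member ensemble.
round_trip_ev = from_log_target(to_log_target(2.5))
toy_prediction_ev = ensemble_predict_ev(
    [to_log_target(1.0), to_log_target(1.2), to_log_target(1.1)]
)
```

Because the transform compresses large gaps, errors on wide-gap materials weigh less during training than they would on the raw eV scale. That may help explain why a separate high-energy bias correction is applied afterwards.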
Bias and uncertainty layer
Bin-wise correction reduces high-energy tail bias, while split conformal prediction converts residuals into calibrated 90% and 95% prediction intervals.
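Split conformal prediction can be sketched as follows. This is a minimal version under stated assumptions: residuals are taken in eV on the held-out calibration set, the nonconformity score is the absolute residual, and the lower bound is clipped at 0 eV because a bandgap cannot be negative. The function names and toy residuals are hypothetical.

```python
import math


def conformal_quantile(calib_residuals, alpha):
    """Finite-sample quantile of absolute residuals for coverage 1 - alpha."""
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the score to use
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[rank - 1]


def prediction_interval(point_ev, q):
    """Symmetric interval around the point estimate, clipped at 0 eV."""
    return max(point_ev - q, 0.0), point_ev + q


# Toy calibration residuals (true minus predicted, in eV), alternating in sign.
calib_residuals = [(-1) ** i * i / 50 for i in range(1, 51)]

q90 = conformal_quantile(calib_residuals, alpha=0.10)
q95 = conformal_quantile(calib_residuals, alpha=0.05)
interval_90 = prediction_interval(0.5, q90)
```

The 95% half-width is never smaller than the 90% one, and every material receives the same half-width in this plain variant. An interval that adapts per material would need a normalized or binned score, which this sketch does not attempt.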
INTERACTIVE CORE
Inspect the Framework Dynamics.
Trace the hurdle architecture
Follow the data path from Materials Project records to GPU featurization, phase routing, regression, bias correction, and conformal calibration.
Audit the model behavior
Inspect stage metrics, conformal interval coverage, benchmark improvements, error regimes, and known validity threats.
Screen candidate materials
Adjust the classifier threshold, confidence level, and acquisition preference to see how routing and uncertainty change decisions.
DOWNSTREAM IMPACT
Real-world R&D Deployment Scenarios.
Semiconductor and power electronics
Retune the gate and objective around high-energy gaps to triage power-device candidates before expensive validation.
Photovoltaic manufacturing
Prioritize materials around the Shockley-Queisser window of roughly 1.15-1.35 eV and avoid candidates outside the useful bandgap range.
Risk-aware R&D screening
Use conformal interval width as a decision variable, ranking candidates by both predicted Eg and confidence.
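One way to rank on both quantities is a single score that penalizes distance from a target gap and interval width together. The scoring rule, the `width_penalty` weight, and the candidate records below are hypothetical; the 1.25 eV target is simply the midpoint of the photovoltaic window quoted above.

```python
def rank_candidates(candidates, target_ev, width_penalty=0.5):
    """Sort candidates by distance from the target gap plus a width penalty.

    Lower scores rank first. width_penalty trades closeness to the target
    against confidence (narrower conformal intervals).
    """
    def score(candidate):
        width = candidate["hi_ev"] - candidate["lo_ev"]
        return abs(candidate["eg_ev"] - target_ev) + width_penalty * width

    return sorted(candidates, key=score)


# Invented candidates: predicted gap and 90% interval bounds, all in eV.
candidates = [
    {"id": "A", "eg_ev": 1.30, "lo_ev": 0.40, "hi_ev": 2.20},
    {"id": "B", "eg_ev": 1.10, "lo_ev": 0.90, "hi_ev": 1.30},
    {"id": "C", "eg_ev": 2.40, "lo_ev": 2.20, "hi_ev": 2.60},
]
ranked = rank_candidates(candidates, target_ev=1.25)
ranked_ids = [c["id"] for c in ranked]
```

Candidate A is closest to the target but carries a wide interval, so the more confident candidate B ranks ahead of it. Raising or lowering `width_penalty` shifts that balance.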