Results
The end-to-end pipeline improves accuracy while exposing risk.
The dissertation validates the framework on a withheld Materials Project test set, then separates the story into classifier behavior, nonmetal regression, calibrated intervals, and error regimes.
0.9843
Stage 1 ROC-AUC
PHASE GATE DISCRIMINATION
97.86%
Nonmetal recall
OPTIMIZED FOR RECALL
0.3758 eV
Stage 2 MAE
BIN-CORRECTED NONMETALS
0.8734
Stage 2 R²
POSITIVE-GAP SUBSET
0.2336 eV
Global MAE
ALL MATERIAL CLASSES
0.8945
Global R²
END-TO-END PIPELINE
Conformal prediction for Stage 2 regression
Stage 2 prediction intervals are calibrated in log space, then converted back to eV.
PI90 coverage
90.56%
2.17 eV mean width
PI95 coverage
95.09%
2.87 eV mean width
Calibration set
16,039
unused in model training
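The log-space calibration described above can be sketched as a split-conformal procedure. This is a minimal illustration, not the dissertation's implementation: the `log1p`/`expm1` transform, the absolute-residual score, the helper names, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal quantile with the usual (n + 1) finite-sample correction."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1.0 - alpha)))  # rank of the calibration score to use
    k = min(k, n)  # simplification: clamp to the largest score when n is tiny
    return float(np.sort(scores)[k - 1])

def log_space_interval(pred_ev, q_log):
    """Build the interval in log1p space, then map it back to eV with expm1."""
    center = np.log1p(pred_ev)
    lower = np.expm1(center - q_log)
    upper = np.expm1(center + q_log)
    return np.maximum(lower, 0.0), upper  # a band gap cannot be negative

rng = np.random.default_rng(0)

def synthetic_split(n):
    """Hypothetical stand-in for (true gap, Stage 2 prediction) pairs in eV."""
    y = rng.uniform(0.1, 6.0, size=n)
    pred = y * np.exp(rng.normal(0.0, 0.2, size=n))  # multiplicative noise
    return y, pred

y_cal, pred_cal = synthetic_split(2000)    # held-out calibration split
y_test, pred_test = synthetic_split(2000)  # evaluation split

# Nonconformity score: absolute residual in log space on the calibration split.
scores = np.abs(np.log1p(y_cal) - np.log1p(pred_cal))
q90 = conformal_quantile(scores, alpha=0.10)

lower, upper = log_space_interval(pred_test, q90)
coverage = float(np.mean((y_test >= lower) & (y_test <= upper)))
mean_width = float(np.mean(upper - lower))
print(f"PI90 coverage: {coverage:.3f}, mean width: {mean_width:.2f} eV")
```

Because the half-width is fixed in log space, the intervals returned in eV are asymmetric and widen as the predicted gap grows.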
BENCHMARKING VS GRAPH NEURAL NETWORK BASELINES
DFT-PBE
1.0000 eV
CGCNN
0.3880 eV
MEGNet
0.3299 eV
GATGNN
0.3222 eV
CrystaLogiX
0.2336 eV
Lower MAE is better. Baseline MAEs are back-calculated from the dissertation's reported percentage improvements over CGCNN, MEGNet, and GATGNN.
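The back-calculation can be checked with a few lines of arithmetic. The sketch below assumes "percentage improvement" means relative MAE reduction, (baseline - model) / baseline; that definition is an assumption, and the function names are illustrative. It derives the improvement implied by each listed MAE and confirms that inverting it recovers the baseline.

```python
model_mae = 0.2336  # eV, CrystaLogiX global MAE from the benchmark above
baseline_mae = {"CGCNN": 0.3880, "MEGNet": 0.3299, "GATGNN": 0.3222}  # eV

def implied_improvement(baseline, model):
    """Relative MAE reduction (assumed definition of 'percentage improvement')."""
    return 1.0 - model / baseline

def back_calculated_baseline(model, improvement):
    """Invert the relation: baseline = model / (1 - improvement)."""
    return model / (1.0 - improvement)

implied = {}
recovered = {}
for name, mae in baseline_mae.items():
    implied[name] = implied_improvement(mae, model_mae)
    recovered[name] = back_calculated_baseline(model_mae, implied[name])
    print(f"{name}: {100 * implied[name]:.1f}% lower MAE, "
          f"baseline {recovered[name]:.4f} eV")
```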
GLOBAL PERFORMANCE VALIDATION
Evaluating Residual Distributions & Pipeline Accuracy.

The Parity Analysis
This parity plot maps the predicted electronic bandgaps against the DFT ground-truth values across the entire withheld validation corpus.
- Zero-Spike Handling: Notice the high density of accurately mapped points anchored at the origin (0, 0). This shows the Stage 1 XGBoost classifier gate routing the large majority of metallic phases out of the regression stage.
- High-Density Convergence: The majority of semiconductor entries cluster tightly within the shaded ±0.5 eV calibration band along the perfect-prediction line y = x.
- Variance at Higher Gaps: The minor dispersion above 6.0 eV corresponds to wide-bandgap insulators, which is expected given the extreme scarcity of wide-gap insulator samples in open crystal-structure databases.
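The routing and parity metrics discussed above can be illustrated with a small sketch. It assumes the gate assigns 0 eV to anything it classifies as a metal and passes everything else to the Stage 2 regressor; the 0.5 threshold, the function names, and the toy arrays are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def route_predictions(p_nonmetal, stage2_gap_ev, threshold=0.5):
    """Stage 1 gate: entries below the threshold are treated as metals (0 eV)."""
    is_nonmetal = p_nonmetal >= threshold
    return np.where(is_nonmetal, stage2_gap_ev, 0.0)

def mae(y_true, y_pred):
    """Mean absolute error in eV."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination relative to the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fraction_within_band(y_true, y_pred, half_width=0.5):
    """Share of points inside the +/- half_width eV band around y = x."""
    return float(np.mean(np.abs(y_true - y_pred) <= half_width))

# Toy data: two metals, three nonmetals; the last nonmetal is misrouted.
y_true = np.array([0.0, 0.0, 1.2, 3.4, 6.1])
p_nonmetal = np.array([0.02, 0.10, 0.95, 0.99, 0.40])
stage2_gap = np.array([0.3, 0.5, 1.0, 3.6, 5.5])

y_pred = route_predictions(p_nonmetal, stage2_gap)
global_mae = mae(y_true, y_pred)
global_r2 = r2(y_true, y_pred)
in_band = fraction_within_band(y_true, y_pred)
print(y_pred, global_mae, global_r2, in_band)
```

In the toy data a single misrouted wide-gap entry dominates the global error, which is the same pattern the error analysis attributes to Stage 1 gate mistakes.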
Error anatomy
The largest remaining error is not random; it depends on Stage 1 routing and on gap energy.
Correctly routed samples achieved an MAE of 0.1910 eV, while misrouted samples rose to 0.7595 eV.
Narrow-gap materials in the 0-1 eV range were overestimated by roughly 0.222 eV on average.
Wide-gap materials above 5 eV were underestimated by roughly 0.420 eV on average.
The remaining PI90 coverage shortfall is attributable to Stage 1 gate errors rather than the conformal regressor alone.
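A breakdown of this kind can be reproduced by stratifying errors on a routing flag and on true-gap bins. The sketch below is an assumption about how such a report might be computed; the function name, bin edges, and toy values are illustrative and do not reproduce the dissertation's data.

```python
import numpy as np

def stratified_errors(y_true, y_pred, routed_ok, bin_edges):
    """MAE split by routing outcome, plus mean signed error per true-gap bin."""
    err = y_pred - y_true  # positive means overestimate
    report = {
        "mae_routed": float(np.mean(np.abs(err[routed_ok]))),
        "mae_misrouted": float(np.mean(np.abs(err[~routed_ok]))),
        "bias_by_bin": {},
    }
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (y_true >= lo) & (y_true < hi)
        bias = float(np.mean(err[mask])) if mask.any() else float("nan")
        report["bias_by_bin"][(lo, hi)] = bias
    return report

# Toy data in eV; the last entry is a nonmetal the gate sent to the metal branch.
y_true = np.array([0.2, 0.8, 2.5, 5.5, 6.5, 2.0])
y_pred = np.array([0.4, 1.0, 2.4, 5.0, 6.2, 0.0])
routed_ok = np.array([True, True, True, True, True, False])
edges = [0.0, 1.0, 5.0, float("inf")]

report = stratified_errors(y_true, y_pred, routed_ok, edges)
for (lo, hi), bias in report["bias_by_bin"].items():
    print(f"{lo}-{hi} eV: mean signed error {bias:+.3f} eV")
print(report["mae_routed"], report["mae_misrouted"])
```

Reporting the signed error alongside the MAE is what separates systematic over- or underestimation from symmetric noise.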
Limits
The model is practical, but its validity boundary is explicit.
Gradient-boosted trees have an extrapolation ceiling for sparse regions such as wide-gap insulators above 5 eV.
Static Magpie descriptors cannot fully encode defect states, surface reconstruction, spin-orbit effects, or complex f-block behavior.
PBE ground-truth labels impose a noise floor for strongly correlated oxides and absolute experimental gap prediction.
Marginal conformal coverage is not automatically conditional across every crystal system or compositional family.
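The gap between marginal and conditional coverage can be made concrete by computing coverage per group. This is a sketch under assumptions: the group labels, function name, and toy intervals are hypothetical and only illustrate how a set of intervals can reach 90% overall while covering one subgroup far less often.

```python
import numpy as np

def coverage_by_group(y_true, lower, upper, groups):
    """Empirical interval coverage overall and within each group label."""
    covered = (y_true >= lower) & (y_true <= upper)
    result = {"marginal": float(covered.mean())}
    for g in np.unique(groups):
        result[str(g)] = float(covered[groups == g].mean())
    return result

# Toy data: ten entries, eight in one crystal system and two in another.
y_true = np.arange(1.0, 11.0)
lower = y_true - 0.5
upper = y_true + 0.5
lower[-1] = y_true[-1] + 0.1  # the last interval misses its true value
groups = np.array(["cubic"] * 8 + ["triclinic"] * 2)

cov = coverage_by_group(y_true, lower, upper, groups)
print(cov)
```

In the toy case the marginal figure looks healthy while the smaller group is covered only half the time, which is why a per-family audit is needed before relying on the intervals within a specific crystal system.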