Results

The end-to-end pipeline improves accuracy while exposing risk.

The dissertation validates the framework on a withheld Materials Project test set, then separates the story into classifier behavior, nonmetal regression, calibrated intervals, and error regimes.

0.9843

Stage 1 ROC-AUC

PHASE GATE DISCRIMINATION

97.86%

Nonmetal recall

OPTIMIZED FOR RECALL

0.3758 eV

Stage 2 MAE

BIN-CORRECTED NONMETALS

0.8734

Stage 2 R²

POSITIVE-GAP SUBSET

0.2336 eV

Global MAE

ALL MATERIAL CLASSES

0.8945

Global R²

END-TO-END PIPELINE

Conformal prediction for Stage 2 regression

Stage 2 prediction intervals are calibrated in log space, then transformed back to eV.

PI90 coverage

90.56%

2.17 eV mean width

PI95 coverage

95.09%

2.87 eV mean width

Calibration set

16,039

unused in model training
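The calibration step can be illustrated with a minimal split-conformal sketch. It assumes a natural-log transform of the positive gap and absolute log-space residuals as the conformity score; the dissertation's exact transform and score are not stated here, and the data below are synthetic stand-ins rather than Materials Project values. The function names are hypothetical.

```python
import numpy as np

def conformal_halfwidth(log_true_cal, log_pred_cal, alpha):
    """Finite-sample conformal quantile of absolute log-space residuals."""
    scores = np.abs(log_true_cal - log_pred_cal)
    n = scores.size
    # Rank ceil((n + 1) * (1 - alpha)) of the sorted scores, capped at n.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[k - 1]

def interval_ev(log_pred, halfwidth):
    """Build the interval in log space, then map both ends back to eV."""
    return np.exp(log_pred - halfwidth), np.exp(log_pred + halfwidth)

# Synthetic stand-in data (assumption): positive gaps with multiplicative
# noise on the point predictions. Not the dissertation's data.
rng = np.random.default_rng(0)
gap_cal = rng.uniform(0.1, 8.0, size=5000)
gap_test = rng.uniform(0.1, 8.0, size=5000)
log_pred_cal = np.log(gap_cal) + rng.normal(0.0, 0.3, size=gap_cal.size)
log_pred_test = np.log(gap_test) + rng.normal(0.0, 0.3, size=gap_test.size)

# Calibrate on the held-out calibration split only, then evaluate on test.
q90 = conformal_halfwidth(np.log(gap_cal), log_pred_cal, alpha=0.10)
lower, upper = interval_ev(log_pred_test, q90)
coverage = np.mean((gap_test >= lower) & (gap_test <= upper))
mean_width = np.mean(upper - lower)
print(f"PI90 coverage: {coverage:.3f}, mean width: {mean_width:.2f} eV")
```

Because the interval is symmetric in log space, it is asymmetric in eV and widens with the predicted gap, which is one reason a mean width in eV is reported alongside coverage.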

BENCHMARKING VS GRAPH NEURAL NETWORK BASELINES

MODEL          MAE (eV)
DFT-PBE        1.0000
CGCNN          0.3880
MEGNet         0.3299
GATGNN         0.3222
CrystaLogiX    0.2336

Lower MAE is better. Baseline MAEs are back-calculated from the dissertation's reported percentage improvements over CGCNN, MEGNet, and GATGNN.
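The back-calculation mentioned in the note can be sketched as follows. This assumes "percentage improvement" means the relative MAE reduction with respect to each baseline, (baseline - model) / baseline; the dissertation's exact definition is not given here. The function names are hypothetical, and the MAE values are taken from the table above.

```python
# MAE values in eV, copied from the benchmark table.
baseline_mae_ev = {"CGCNN": 0.3880, "MEGNet": 0.3299, "GATGNN": 0.3222}
model_mae_ev = 0.2336  # CrystaLogiX global MAE

def relative_improvement(baseline, model):
    """Fractional MAE reduction relative to the baseline (assumed definition)."""
    return (baseline - model) / baseline

def back_calculated_baseline(model, improvement):
    """Invert the definition above to recover a baseline MAE."""
    return model / (1.0 - improvement)

for name, mae in baseline_mae_ev.items():
    imp = relative_improvement(mae, model_mae_ev)
    recovered = back_calculated_baseline(model_mae_ev, imp)
    print(f"{name}: {100 * imp:.1f}% lower MAE, round trip {recovered:.4f} eV")
```

Under a different definition of improvement (for example, one normalized by the new model's MAE), the recovered baselines would differ, so the table values should be read with that caveat.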

GLOBAL PERFORMANCE VALIDATION

Evaluating Residual Distributions & Pipeline Accuracy.

Parity plot showing predicted vs. true bandgap values for the full pipeline, with a reference y = x line and error distribution shading.

The Parity Analysis

This parity plot maps predicted electronic bandgaps against the DFT ground-truth values across the entire withheld test set.

Zero-Spike Handling: A dense cluster of accurately mapped points anchors the origin at (0, 0). It shows the Stage 1 XGBoost classifier gate routing metallic phases away from the regression stage with high, though not perfect, accuracy.
High-Density Convergence: Most semiconductor entries cluster tightly within the shaded ±0.5 eV calibration band along the perfect-prediction line y = x.
Variance at Higher Gaps: The modest dispersion above 6.0 eV corresponds to wide-bandgap insulators. This is expected given the scarcity of wide-gap insulator samples in open crystal-structure databases.

Error anatomy

The biggest remaining risk is not random; it depends on routing and on gap energy.

Correctly routed samples achieved an MAE of 0.1910 eV, while misrouted samples rose to 0.7595 eV.

Narrow-gap materials in the 0-1 eV range were overestimated by roughly +0.222 eV.

Wide-gap materials above 5 eV were underestimated by roughly -0.420 eV.

The remaining PI90 coverage shortfall is attributable to Stage 1 gate errors rather than the conformal regressor alone.
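A breakdown of this kind can be computed as in the sketch below. It assumes that "correctly routed" means the Stage 1 metal/nonmetal prediction matches the true class, and that each gap regime is a half-open interval (lo, hi] on the true gap so that zero-gap metals fall outside the narrow-gap bin. The function name, bin labels, and toy arrays are hypothetical and are not the dissertation's data.

```python
import numpy as np

def error_anatomy(y_true, y_pred, nonmetal_true, nonmetal_pred, bins):
    """MAE split by routing correctness, plus signed bias per true-gap bin."""
    abs_err = np.abs(y_pred - y_true)
    signed_err = y_pred - y_true
    routed_ok = nonmetal_true == nonmetal_pred
    report = {
        "mae_routed": abs_err[routed_ok].mean() if routed_ok.any() else np.nan,
        "mae_misrouted": abs_err[~routed_ok].mean() if (~routed_ok).any() else np.nan,
    }
    for label, (lo, hi) in bins.items():
        mask = (y_true > lo) & (y_true <= hi)  # half-open (lo, hi]
        report[f"bias_{label}"] = signed_err[mask].mean() if mask.any() else np.nan
    return report

# Toy example: the third sample is a nonmetal misrouted as a metal,
# so the pipeline returns a 0 eV gap for it.
y_true = np.array([0.0, 0.5, 2.0, 6.0])
y_pred = np.array([0.0, 0.7, 0.0, 5.5])
nonmetal_true = np.array([False, True, True, True])
nonmetal_pred = np.array([False, True, False, True])
bins = {"narrow_0_1": (0.0, 1.0), "wide_gt_5": (5.0, np.inf)}

report = error_anatomy(y_true, y_pred, nonmetal_true, nonmetal_pred, bins)
for key, value in report.items():
    print(f"{key}: {value:+.3f} eV")
```

A positive bias means overestimation and a negative bias means underestimation, matching the sign convention of the figures quoted above.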

Limits

The model is practical, but its validity boundary is explicit.

Gradient-boosted trees have an extrapolation ceiling for sparse regions such as wide-gap insulators above 5 eV.

Static Magpie descriptors cannot fully encode defect states, surface reconstruction, spin-orbit effects, or complex f-block behavior.

PBE ground-truth labels impose a noise floor for strongly correlated oxides and absolute experimental gap prediction.

Marginal conformal coverage is not automatically conditional across every crystal system or compositional family.
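The gap between marginal and conditional coverage can be checked with a per-group audit such as the sketch below. The grouping variable (crystal system here), the function name, and the toy numbers are illustrative assumptions; any categorical label such as a compositional family could be substituted.

```python
import numpy as np

def coverage_by_group(y_true, lower, upper, groups):
    """Empirical interval coverage overall and within each group."""
    covered = (y_true >= lower) & (y_true <= upper)
    result = {"marginal": float(covered.mean())}
    for g in np.unique(groups):
        result[str(g)] = float(covered[groups == g].mean())
    return result

# Toy example: the marginal figure hides a weaker hexagonal subgroup.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
lower = np.array([0.5, 1.5, 3.5, 3.0])
upper = np.array([1.5, 2.5, 4.5, 5.0])
groups = np.array(["cubic", "cubic", "hexagonal", "hexagonal"])

result = coverage_by_group(y_true, lower, upper, groups)
print(result)
```

If a subgroup falls well below the nominal level, group-wise calibration is one possible remedy, at the cost of needing enough calibration samples per group.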