Alpha-D Surrogate: Axial-Profile MLP Tutorial¶
This guide walks through extracting Darcy resistance coefficient profiles from the flow contraction-expansion parametric study and training an MLP surrogate model.
Problem overview¶
The parametric study runs 1000 MOOSE k-epsilon RANS simulations of flow through a pipe with an internal contraction-expansion. Three parameters are varied:
Parameter |
Symbol |
Range |
Spacing |
|---|---|---|---|
Reynolds number |
Re |
5 000 – 250 000 |
log-spaced (10 values) |
Diameter ratio |
Dr |
0.05 – 0.9 |
linear (10 values) |
Length ratio |
Lr |
0.01 – 0.2 |
linear (10 values) |
The surrogate predicts the Darcy resistance coefficient alpha_D(z) along
the contraction region as a function of case parameters and axial position:
[log10(Re), Dr, Lr, z_hat, d_local/D_big, region_flags] --> log(alpha_D)
The model is PhysicsNeMo’s FullyConnected MLP.
Prerequisites¶
moose-physicsnemoconda environment (or theetlDocker service)If using Docker Compose, use
etloretl-ngcfor training/evaluation.etl-devis ETL-only and does not include PyTorch.Completed parametric study cases under
data/flow_contraction_expansion/parametric_study/(each case directory must containsimulation_out.eandcase_metadata.txt)
Verify that you have completed cases:
ls data/flow_contraction_expansion/parametric_study/Re_5000__Dr_0p144__Lr_0p01/simulation_out.e
Step 1: Extract alpha_D profiles (ETL)¶
The alpha_D ETL pipeline reads each case’s simulation_out.e, extracts the
contraction region, computes the Darcy resistance coefficient at axial
stations, and writes per-case .zarr stores.
Run the extraction¶
cd src && python cases/alpha_d/run_etl.py \
etl.source.input_dir=../data/flow_contraction_expansion/parametric_study \
etl.sink.output_dir=../data/flow_contraction_expansion/parametric_study/processed
This processes all cases with simulation_out.e files. Use
etl.processing.num_processes=8 to speed up with more workers.
What the ETL does¶
For each case:
Opens
simulation_out.evia netCDF4 and reads node coordinates, element connectivity, and the pressure field at the final (converged) time step.Applies mesh scaling (
1.0by default).Computes element centroids and identifies the contraction region of interest (ROI), defined as the throat plus a buffer of 1 pipe diameter upstream and downstream.
Bins ROI elements into 50 axial stations and computes area-weighted (radially weighted for axisymmetric geometry) average pressure at each.
Derives the Darcy coefficient via the pressure gradient:
alpha_D(z) = -dP/dz * 2 * D_h(z) / (rho * V_bulk^2)
Constructs feature vectors:
log10(Re),Dr,Lr, normalized axial positionz_hat, local diameter ratiod_local/D_big, and region flags.Stores
log(alpha_D)as the target (log transform for positivity and better regression over wide ranges).
Output Zarr layout¶
{case_name}.zarr/
features float32 [50, 13] -- 50 stations x 13 input features
targets float32 [50, 2] -- log_alpha_D and signed_log1p_alpha_D
sample_weight float32 [50] -- per-station weights (higher in throat)
metadata/
attrs: case_id, feature_names, target_names, Re, Dr, Lr, delta_p_case
Feature reference¶
The per-station feature row written by AlphaDTransformation and the
engineered features synthesized at load time by
cases/alpha_d/feature_data.py::build_engineered_feature_map.
Notation:
Re_global = ρ V_bulk D / μis the case-level Reynolds number (V_bulk = ρ = 1in the simulation, soμ = D / Re_global).Dis the outer pipe diameter;d_local(z)is the local diameter, equal toDupstream/downstream andDr × Dinside the throat.D_h_local = d_localfor the axisymmetric circular cross-section.V_local(z)is the area-mean axial velocity (vel_z) at axial stationz, area-weighted withrover the cross-section to match the pressure averaging.
Base features (written by ETL into each case’s Zarr)¶
Feature |
Expression / definition |
Per-row varying? |
Source |
|---|---|---|---|
|
|
case-level |
|
|
|
case-level |
|
|
|
case-level |
|
|
|
local |
Geometry, ∈ [0, 1] across ROI |
|
|
local |
Geometry (abrupt model: |
|
|
local |
Identity for circular cross-section |
|
|
local |
MOOSE |
|
|
local |
One-hot region flag |
|
|
local |
One-hot region flag |
|
|
local |
One-hot region flag |
|
|
local |
Discrete axial gradient of |
|
|
local |
Signed: positive past the entry (inside throat / downstream), negative upstream |
|
|
local |
Signed: positive upstream of the exit (inside throat / upstream), negative past it. Note the sign convention is opposite of |
Engineered features (synthesized at load time)¶
These are included in the default selected_from_allowlist: null
candidate set and can be filtered down explicitly by listing the base
names. They are pure functions of base features so TabularPairDataset
can reproduce them at training time without any extra storage.
Feature |
Expression |
Per-row varying? |
Notes |
|---|---|---|---|
|
|
case-level |
Re scaled by throat diameter |
|
|
local |
Local Re using local V, local D, constant ν per case |
|
|
case-level |
|
|
|
case-level |
|
|
|
local |
|
|
|
local |
|
|
|
local |
Unsigned |
Historical note: dist_to_throat_start and dist_to_throat_end are still
listed in ENGINEERED_FEATURES (and the build_engineered_feature_map
fallback computation), but the ETL writes them into Zarr directly, and
load_feature_matrix prefers the Zarr column when a name is in both
sets. They are documented under Base features above.
In ideal incompressible flow V_local/V_bulk equals 1/A_local_over_A
exactly, so log10_Re_local would algebraically reduce to
log10_Re − log10(d_local/D). The simulation-derived V_local lets
this term carry any deviation between the as-computed velocity and the
round-pipe identity (e.g., near separation regions).
delta_p_case (total ROI pressure drop) is stored in metadata/ for
reference but is intentionally excluded from the allowlist — it is
target-adjacent and would leak.
Verify extraction¶
# Check that Zarr stores were created
ls data/flow_contraction_expansion/parametric_study/processed/*.zarr | head -5
# Inspect one store (requires Python with zarr)
cd src && python -c "
import zarr, json
root = zarr.open('../data/flow_contraction_expansion/parametric_study/processed/Re_5000__Dr_0p144__Lr_0p01.zarr', mode='r')
print('features shape:', root['features'].shape)
print('targets shape:', root['targets'].shape)
meta = root['metadata']
print('case_id:', meta.attrs['case_id'])
print('Re:', meta.attrs['Re'])
print('feature_names:', json.loads(meta.attrs['feature_names']))
print('target_names:', json.loads(meta.attrs['target_names']))
"
Expected output:
features shape: (50, 13)
targets shape: (50, 2)
case_id: Re_5000__Dr_0p144__Lr_0p01
Re: 5000.0
feature_names: ['log10_Re', 'Dr', 'Lr', 'z_hat', 'd_local_over_D', 'A_local_over_A', 'V_local_over_V_bulk', 'is_upstream', 'is_throat', 'is_downstream', 'dD_dz_local', 'dist_to_throat_start', 'dist_to_throat_end']
target_names: ['log_alpha_D', 'signed_log1p_alpha_D']
ETL configuration reference¶
Config file: src/cases/alpha_d/configs/etl.yaml
Key |
Default |
Description |
|---|---|---|
|
|
Parallel workers |
|
(required) |
Parametric study root directory |
|
|
Mesh coordinate scale factor |
|
|
Exodus filename in each case dir |
|
|
Axial stations per case |
|
|
Upstream/downstream buffer (pipe diameters) |
|
|
Fluid density (kg/m^3) |
|
(required) |
Directory for output Zarr stores |
Step 2: PyCaret feature selection¶
The MLP training config selects its inputs from a selected_features.txt
file written by PyCaret. Run this step after ETL and before training:
cd src && python cases/alpha_d/run_feature_selection_pycaret.py
The script reads the per-case Zarr stores from
../data/flow_contraction_expansion/parametric_study/processed/ (override
with data.zarr_dir=…), runs PyCaret regression with GroupKFold by
case_id over the ALLOWLIST-constrained candidate set, and writes
selected_features.txt to
../data/feature_analysis/pycaret/default/ (override with
output.dir=…).
That output path is exactly where train_mlp.yaml’s
data.input_columns_file points, so the MLP training step picks it up
with no extra flags.
When you can skip this step¶
Conv1D (
train_conv1d) hard-codes its input columns intrain_conv1d.yaml, so the Conv1D path does not need PyCaret output.For the MLP, you can also bypass PyCaret by passing an explicit list on the CLI:
python train.py --config-path cases/alpha_d/configs --config-name train_mlp \ data.input_columns_file=null \ 'data.input_columns=[log10_Re,Dr,Lr,z_hat,d_local_over_D,V_local_over_V_bulk]'
Common knobs¶
# Keep more features than the default 6
python cases/alpha_d/run_feature_selection_pycaret.py \
pycaret.setup.n_features_to_select=8
# Write output somewhere else
python cases/alpha_d/run_feature_selection_pycaret.py \
output.dir=../data/feature_analysis/pycaret/run_1
If you change output.dir, also override
data.input_columns_file=…/selected_features.txt in Step 3 so training
reads from the same place.
Step 3: Train the MLP¶
The training config src/cases/alpha_d/configs/train_mlp.yaml is
pre-configured for the axial-profile MLP.
Run training¶
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp'
This trains a 6-layer FullyConnected MLP (layer_size=128,
activation=silu, skip_connections=true) on log(alpha_D) using MSE loss.
Override defaults from the CLI¶
# More epochs, different learning rate
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp \
training.epochs=500 training.lr=1e-4'
# Larger model
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp \
model.params.layer_size=256 model.params.num_layers=8'
# Different data directory (if your Zarr stores are elsewhere)
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp \
data.zarr_dir=../data/my_processed_cases'
Training output¶
Checkpoint:
data/cases/train_mlp/model.mdlus(PhysicsNeMo format; path =${output.root_dir}/${hydra:job.config_name}/model.mdlus)Run metadata:
data/cases/train_mlp/run_meta.json(records split, model params, final loss)
Lock a split for fair model comparisons¶
If you want future experiments to use exactly the same train/test cases,
export the split from an existing run_meta.json once and then switch
training to the file-based split strategy.
cd src && python export_split.py \
--run-meta ../data/models/run_meta.json \
--output-dir ../data/splits/alpha_d_locked_v1
This writes:
../data/splits/alpha_d_locked_v1/train.txt../data/splits/alpha_d_locked_v1/test.txt
Then train future models against that same split:
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp \
data.split.strategy=file \
data.split.train_file=../data/splits/alpha_d_locked_v1/train.txt \
data.split.test_file=../data/splits/alpha_d_locked_v1/test.txt'
This is the easiest way to make evaluation metrics comparable across feature sets, HPO versions, and model architectures.
Verify training¶
Check that the loss decreases:
Training model='mlp' adapter='pointwise' on 612 train case(s), 153 test case(s), device=cpu.
epoch 1/200: loss=2.345678e+00
epoch 10/200: loss=4.567890e-01
epoch 50/200: loss=1.234567e-02
...
Training config reference¶
Config file: src/cases/alpha_d/configs/train_mlp.yaml
Key |
Default |
Description |
|---|---|---|
|
|
Model from registry (PhysicsNeMo FullyConnected) |
|
|
Hidden layer width |
|
|
Number of hidden layers |
|
|
Activation function |
|
|
Skip connections every 2 layers |
|
|
Directory of alpha_D Zarr stores |
|
|
Explicit column list; |
|
|
PyCaret-written file (Step 2). Set |
|
|
Target column |
|
|
Split strategy ( |
|
|
Fraction of cases for training |
|
|
Training epochs |
|
|
Batch size (rows, not cases) |
|
|
Learning rate |
|
|
Loss function |
Step 4: Evaluate¶
docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/alpha_d/configs --config-name train_mlp'
eval.checkpoint defaults to ${output.checkpoint} (the path written during
training), so no override is required for the standard flow.
The evaluator:
Loads the checkpoint and
run_meta.json.Reconstructs the dataset and train/test split from metadata.
Computes per-field MSE and RMSE on the held-out test cases.
Expected output¶
Evaluated adapter='pointwise' on 153 test case(s), overall mse=X.XXXXXXe-XX, rmse=X.XXXXXXe-XX.
log_alpha_D: mse=X.XXXXXXe-XX, rmse=X.XXXXXXe-XX
A good baseline RMSE on log(alpha_D) is below 0.5 (meaning the predicted
alpha_D is within a factor of ~1.6 of the CFD value on average). Lower values
indicate better fits.
Verify with metrics JSON¶
docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/alpha_d/configs --config-name train_mlp \
output.metrics_out=../data/cases/train_mlp/metrics.json'
Plots¶
Pass output.plot_dir=<path> to the evaluator to generate the full
diagnostic set:
python evaluate.py --config-path cases/alpha_d/configs --config-name train_conv1d \
output.plot_dir=../data/cases/train_conv1d/plots
The plotter writes (for the pointwise and profile adapters):
File |
What it shows |
|---|---|
|
Best-fit case: α_D(z) — three curves: ground truth (solid + markers), prediction (dashed), and the closed-form analytical baseline (dotted gray). |
|
Same but for the worst-fit case by pointwise RMSE on the test split. |
|
Per-station scatter of predicted vs. ground-truth α_D across the entire test set, with |
|
Per-case scatter of predicted ΔP (trapezoidal integral of predicted α_D over the ROI) vs. ground-truth |
output.plot_dir defaults to null, so no plots are produced unless
you opt in. Baseline / extended metrics (per-region MSE, R², Δp
relative-error breakdown by Dr) are written to
output.metrics_out regardless.
Then inspect:
python -c "import json; m=json.load(open('../data/cases/train_mlp/metrics.json')); print(json.dumps(m, indent=2))"
Quick-reference command summary¶
# 1. Extract alpha_D profiles from CFD
docker compose run --rm etl bash -lc 'cd src && python cases/alpha_d/run_etl.py'
# 2. PyCaret feature selection (writes selected_features.txt)
docker compose run --rm etl bash -lc 'cd src && python cases/alpha_d/run_feature_selection_pycaret.py'
# 3. Train (HPO + retrain best, all in one command)
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp'
# 3b. Train directly without HPO (power user)
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp hpo=null'
# 4. Evaluate on held-out cases
docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/alpha_d/configs --config-name train_mlp'
If you only want the Conv1D model, skip step 2 and substitute
--config-name train_conv1d in step 3.
Apptainer (HPC) equivalent¶
On systems without Docker, use the pre-built multifid-th-gpu.sif at the repo root
(or build one yourself per
Getting Started → Build a SIF image).
Add --nv for GPU; drop it for CPU-only. Replace /path/to/project with the
absolute path to your repo checkout (or set APPTAINER_BIND once and omit
--bind).
# 1. Extract alpha_D profiles (CPU is fine for ETL — drop --nv)
apptainer exec --bind /path/to/project:/path/to/project multifid-th-gpu.sif \
bash -lc 'cd /path/to/project/src && python cases/alpha_d/run_etl.py'
# 2. PyCaret feature selection (CPU is fine — drop --nv)
apptainer exec --bind /path/to/project:/path/to/project multifid-th-gpu.sif \
bash -lc 'cd /path/to/project/src && python cases/alpha_d/run_feature_selection_pycaret.py'
# 3. Train (HPO + retrain best)
apptainer exec --nv --bind /path/to/project:/path/to/project multifid-th-gpu.sif \
bash -lc 'cd /path/to/project/src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp'
# 3b. Train directly without HPO
apptainer exec --nv --bind /path/to/project:/path/to/project multifid-th-gpu.sif \
bash -lc 'cd /path/to/project/src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp hpo=null'
# 4. Evaluate
apptainer exec --nv --bind /path/to/project:/path/to/project multifid-th-gpu.sif \
bash -lc 'cd /path/to/project/src && python evaluate.py --config-path cases/alpha_d/configs --config-name train_mlp'
multifid-th-gpu.sif (built from docker/gpu.def) ships PyTorch CUDA 12.4 wheels +
PhysicsNeMo and matches the etl-gpu Docker service; it runs CPU-only without
--nv. For a smaller CPU-only image, build multifid-th-cpu.sif from
docker/physicsnemo-cpu.def.
Coupling the surrogate back into MOOSE¶
Once a model is trained, its per-station α_D(z) predictions can drive
a porous-medium Forchheimer closure inside a PINSFV simulation. The
end-to-end pipeline (exporter → MOOSE input → verifier) lives under
src/cases/alpha_d/ and is described in
Coupling Physics Reference, which
covers the volume-averaged momentum equation, the α_D → C_F mapping
F = α_D · ε² / D_h, and a quantitative comparison of the coupled
MOOSE pressure drop against the resolved-CFD ground truth.
Architecture notes¶
The alpha_D pipeline integrates with the existing generic training framework:
Model:
mlpin the model registry (training.models.mlp), wrapping PhysicsNeMo’sFullyConnected.Adapter:
training.adapters.PointwiseAdapter, handling tabular(B, D_in) -> (B, D_out)data.Dataset:
training.datasets_tabular.TabularPairDataset, reads per-case.zarrstores and concatenates row-wise. Splitting is done at the case level (never mixing rows from the same CFD case across train and test).ETL:
AlphaDSource/AlphaDTransformation/AlphaDZarrSinkincases.alpha_d.etl.source,cases.alpha_d.etl.transform, andcases.alpha_d.etl.sink, following the PhysicsNeMo Curator pattern.Experiment hooks:
cases.alpha_d.experiment.AlphaDExperimentadds throat-weighted loss and the plotting hooks (decode_for_plotting(),baseline_for_plotting()) the runner uses for per-case profile and parity plots.Feature data:
cases.alpha_d.feature_dataowns the ALLOWLIST, engineered-feature synthesis, and theFeatureAnalysisDataloader.
Note: HPO is built into training¶
The train_mlp.yaml config includes an hpo section with a search
space. Both entry points — the top-level train.py and the case
wrapper cases/alpha_d/train.py — call the same
training.runner.train_or_hpo dispatcher, which runs Optuna HPO first
and then retrains with the best hyperparameters whenever the config
contains a non-empty hpo.search_space.
To skip HPO and train directly with the parameters in the config, set
hpo=null on the CLI:
cd src && python train.py --config-path cases/alpha_d/configs --config-name train_mlp hpo=null
# or, equivalently, via the case wrapper:
cd src && python cases/alpha_d/train.py hpo=null
See Hyperparameter Optimization Guide for details on search-space format, study settings, and output artifacts.
Before training, use the
Case Distribution Analysis tool to
preview how much data you have in each Dr / Re / Lr bin –
especially after applying min_Dr or exclude_cases filters.
After running multiple HPO versions, review training progress and compare evaluation metrics across versions using whichever notebook or script you’ve adopted for cross-run comparison.
Status¶
Landed since the original tutorial draft:
Region-weighted loss —
data.throat_weight/data.downstream_weightinTabularPairDataset(Conv1D + MLP).Stratified case split —
data.split.strategy: stratifiedwithn_binscovering Re × Dr × Lr.Parity and per-case profile plots — see Plots above.
Per-case Δp integral evaluation —
extended.delta_p.per_caseineval_metrics.jsonreports trapezoidal-integrated ΔP_pred vs. ground-truthdelta_p_casefor every test case.
Explicitly not done (tried, regressed, removed):
A pressure-drop-consistency training loss (
delta_p_weightin the experiment). Two attempts (delta_p_weight = 0.25direct;delta_p_weight = 0.05via HPO selection) degraded the test Δp by 30–130% versus the pure relative_l2 baseline, even when val loss looked comparable. The log-space Δp loss has ~unit magnitude, so any non-trivial weight starves the per-station signal that the model actually fits well. The term and its hooks were removed fromAlphaDExperiment.MOOSE offline coupling: export predictions for closed-loop verification