cases¶
Per-case packages. Each case owns its ETL pipeline, datasets, physics
helpers, experiment hook, and Hydra configs. The training core does
not import from cases/* — coupling flows in one direction only.
cases.alpha_d¶
Experiment¶
Alpha-D experiment.
Specialises training.experiment.Experiment with the alpha-D
case’s evaluation metrics (per-region pointwise + Δp integral) and the
plotting hooks the runner uses for per-case profile and parity plots.
The training step itself is inherited unchanged.
- class cases.alpha_d.experiment.AlphaDExperiment(model, optimizer, loss_fn, adapter, device, **kwargs)[source]¶
Bases: Experiment
Alpha-D-specific evaluation + plotting hooks.
- compute_extended_metrics(eval_dataset, all_preds, all_targets)[source]¶
Pointwise + Δp metrics for the alpha-D adapter.
Requires a TabularPairDataset / AlphaDProfileDataset (gated by
_row_case_idx). Other adapters fall through to {}.
- print_extended_metrics(metrics)[source]¶
Pretty-print the dict returned by compute_extended_metrics.
Default: no-op (the generic runner already prints overall + per-field MSE/RMSE). Subclasses override to format case-specific blocks.
- Return type:
- prepare_for_training(train_dataset, val_dataset, device)[source]¶
Bind alpha-D-specific state from the datasets onto the experiment.
- Return type:
Feature data¶
Data loader for feature analysis.
Flattens the per-case zarr stores produced by the alpha_D ETL pipeline
into dense numpy arrays (X, y, groups) suitable for sklearn.
Leakage controls¶
ALLOWLIST hard-codes the input features that are safe to consider. The YAML config may restrict this set via selected_from_allowlist but cannot extend it. Unknown names raise ValueError. Extending the allowlist is a code change so it appears in review.
groups is the per-row case index. Callers must use sklearn.model_selection.GroupKFold for all CV and pass groups=groups to its split() call; rows inside a case are spatially correlated.
Metadata attributes such as delta_p_case are never included as features here: they are target-adjacent and would leak.
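A minimal sketch of the grouped split described above, assuming scikit-learn is installed. The arrays X, y, and groups are synthetic stand-ins for what the loader returns, not real alpha-D data. Note that groups is handed to split(), since the GroupKFold constructor only takes the fold settings.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Synthetic stand-in: 6 cases with 5 station rows each (not real alpha-D data).
rng = np.random.default_rng(0)
n_cases, rows_per_case = 6, 5
groups = np.repeat(np.arange(n_cases), rows_per_case)  # per-row case index
X = rng.normal(size=(groups.size, 3))
y = rng.normal(size=groups.size)

cv = GroupKFold(n_splits=3)
folds = list(cv.split(X, y, groups=groups))

for train_idx, test_idx in folds:
    # No case contributes rows to both sides of a fold.
    assert not set(groups[train_idx]) & set(groups[test_idx])
```

Because every row of a case lands on the same side of each fold, spatial correlation between neighbouring stations cannot inflate the validation score.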
- cases.alpha_d.feature_data.build_engineered_feature_map(features, raw_feature_names)[source]¶
Build leakage-safe engineered features from raw per-row feature columns.
- cases.alpha_d.feature_data.engineered_features_spec()[source]¶
Entrypoint returning (names, builder) for TabularPairDataset.
Used by the pointwise adapter via the data.engineered_features_entrypoint config key.
- cases.alpha_d.feature_data.load_feature_matrix(zarr_dir, *, target='log_alpha_D', selected_from_allowlist=None, local_velocity_normalization=True, min_Dr=None, exclude_cases=None)[source]¶
Load all cases under zarr_dir into a flat FeatureAnalysisData.
- Parameters:
target (str) – Target column name. Must exist in zarr target_names.
selected_from_allowlist (list[str] | None) – Restrict the ALLOWLIST to this subset. Cannot add new names.
local_velocity_normalization (bool) – If True, rescale alpha_D-family targets to the local-velocity basis before returning them. Matches training behaviour when the MLP is trained with local_velocity_normalization: true.
min_Dr (float | None) – Drop cases with diameter ratio below this value (matches training).
exclude_cases (list[str] | None) – Drop cases whose stem is in this list.
- Return type:
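The restrict-only behaviour of selected_from_allowlist can be sketched as follows. This is an illustration under assumptions, not the module's code: the feature names in ALLOWLIST and the helper resolve_features are hypothetical, and preserving allowlist order is a design choice made for this example.

```python
# Hypothetical allowlist; the real ALLOWLIST lives in cases.alpha_d.feature_data.
ALLOWLIST = ("z_hat", "d_local_over_D", "Re", "Dr", "Lr")


def resolve_features(selected_from_allowlist=None):
    """Return the allowed feature names, optionally restricted to a subset."""
    if selected_from_allowlist is None:
        return list(ALLOWLIST)
    unknown = [name for name in selected_from_allowlist if name not in ALLOWLIST]
    if unknown:
        # The config can narrow the set but never widen it.
        raise ValueError(f"Features not in ALLOWLIST: {unknown}")
    # Keep allowlist order so column order does not depend on the config.
    return [name for name in ALLOWLIST if name in selected_from_allowlist]


subset = resolve_features(["Dr", "z_hat"])
```

Asking for a target-adjacent name such as delta_p_case raises ValueError in this sketch, which mirrors the leakage rule stated for this module.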
Transforms and metrics¶
Case-specific target transforms for the alpha-D surrogate.
The generic TabularPairDataset accepts a target_transform callable
that rewrites encoded targets before the dataset materialises tensors.
This module provides the alpha-D closed-form residual transform, which
optionally applies local-velocity normalisation first and then subtracts
the closed-form baseline:
encoded_truth_lv = LV_norm(encoded_truth) if requested
encoded_residual = encoded_truth_lv − encoded_baseline_lv
where encoded_baseline is the per-station alpha-D baseline encoded with
the same target convention as the truth (see cases.alpha_d.physics).
A transform returns (transformed_y, extras) where extras is a dict
of well-known extras the dataset stashes on self. Recognised keys:
baseline_encoded: ndarray of the encoded baseline, stashed at dataset._baseline_encoded so metrics / plotting / the Δp integral can re-add it at decode boundaries.
local_velocity_normalization: bool indicating whether LV-norm was actually applied (the dataset propagates this onto dataset.local_velocity_normalization).
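The (transformed_y, extras) contract can be illustrated with a toy transform. This is a sketch under assumptions: constant_baseline_transform and its constant 0.5 baseline are hypothetical stand-ins for the closed-form baseline, and the argument values are invented for the example. Only the keyword signature and the two extras keys come from this module's documentation.

```python
import numpy as np


def constant_baseline_transform(full_y, full_x, *, target_names, feature_names,
                                case_meta_list, rows_per_case,
                                local_velocity_normalization=False):
    """Toy transform: subtract a constant baseline from every target column."""
    baseline = np.full_like(full_y, 0.5)  # hypothetical stand-in for the closed form
    extras = {
        "baseline_encoded": baseline,
        # This toy never applies LV-norm, so it reports False whatever was requested.
        "local_velocity_normalization": False,
    }
    return full_y - baseline, extras


full_y = np.array([[1.0], [2.0], [3.0]])
full_x = np.zeros((3, 2))
residual, extras = constant_baseline_transform(
    full_y, full_x,
    target_names=["log_alpha_D"], feature_names=["z_hat", "d_local_over_D"],
    case_meta_list=[{}], rows_per_case=[3],
)
# Re-adding the stashed baseline recovers the original encoded targets.
restored = residual + extras["baseline_encoded"]
```

Returning the baseline through extras rather than recomputing it later keeps the decode step tied to exactly the values that were subtracted.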
- cases.alpha_d.transforms.alpha_d_residual_transform(full_y, full_x, *, target_names, feature_names, case_meta_list, rows_per_case, local_velocity_normalization=False)[source]¶
Optionally LV-normalise and subtract the closed-form alpha-D baseline.
No-op (returns (full_y, {})) when the dataset cannot satisfy the prerequisites: z_hat / d_local_over_D features missing, or no alpha-D-shaped column in target_names.
Alpha-D-specific extended evaluation metrics.
The generic runner calls Experiment.compute_extended_metrics(), which
AlphaDExperiment (in cases/alpha_d/experiment.py) overrides to invoke
the helpers in this module. Base experiments inherit a no-op and skip this
module entirely, so the runner stays alpha-D-agnostic.
Three functions live here:
compute_pointwise_extended_metrics(): R², physical-space, per-region, per-case error breakdown for the TabularPairDataset adapter.
compute_delta_p_metrics(): trapezoidal integration of the predicted α_D profile per case and comparison vs the ground-truth delta_p_case.
print_extended_metrics(): human-readable summary used by evaluate().
- cases.alpha_d.metrics.compute_delta_p_metrics(model, eval_dataset, device, *, alpha_d_target_name='log_alpha_D', local_velocity_normalization=False)[source]¶
Per-case Δp prediction error statistics.
Integrates the predicted α_D profile via the trapezoidal rule to obtain delta_p_pred, then compares with delta_p_case stored in the zarr metadata. Per-case geometry constants (D_big, outer_height_m, buffer_diams, rho, V_bulk) are read from each case’s metadata and fall back to the historical AlphaD-ETL defaults when missing.
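The integration step can be sketched in a few lines of NumPy. This is an illustration, not the module's implementation: integrate_delta_p is a hypothetical helper, the profile and the reference value are synthetic, and the pressure-drop gradient uses the bulk-basis relation α_D · ρ V_bulk² / (2 D_h) quoted in the baseline module.

```python
import numpy as np


def integrate_delta_p(alpha_d, z, *, d_h, rho=1.0, v_bulk=1.0):
    """Trapezoidal integral of alpha_D * rho * V_bulk**2 / (2 * D_h) along z."""
    alpha_d = np.asarray(alpha_d, dtype=float)
    z = np.asarray(z, dtype=float)
    d_h = np.broadcast_to(np.asarray(d_h, dtype=float), z.shape)
    gradient = alpha_d * rho * v_bulk**2 / (2.0 * d_h)
    # Explicit trapezoid rule; works for non-uniform station spacing.
    return float(np.sum(0.5 * (gradient[1:] + gradient[:-1]) * np.diff(z)))


z = np.linspace(0.0, 1.0, 50)       # physical axial coordinate in metres
alpha = np.full_like(z, 0.04)       # synthetic, constant alpha_D profile
delta_p_pred = integrate_delta_p(alpha, z, d_h=0.2)
delta_p_true = 0.1                  # hypothetical reference value
rel_err = abs(delta_p_pred - delta_p_true) / abs(delta_p_true)
```

With a constant profile the integral reduces to gradient times length, which makes the example easy to check by hand.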
Datasets¶
Per-case profile dataset for 1D-conv α_D training.
Wraps a TabularPairDataset and exposes per-case views shaped
(features, stations) so that a 1D conv along the station axis can
treat each case as a single sample.
The wrapper delegates flat row-level state to the inner tabular dataset,
which is what every existing access site in runner.py,
experiments/alpha_d.py, and plotting.py reads. Subsets are built
by delegating to TabularPairDataset.subset_by_case_indices so that
flat state stays aligned with the train/val/test case split.
- class cases.alpha_d.datasets.profile.AlphaDProfileDataset(*args, **kwargs)[source]¶
Bases: Dataset
Per-case dataset producing (x, y, w, case_idx) profile tensors.
- Shapes per item:
x : [in_features, n_stations]
y : [out_features, n_stations]
w : [1, n_stations] (broadcast-compatible with y)
case_idx : scalar long tensor
Stations are sorted by z_hat per case so the conv sees a monotone spatial sequence.
- property norm_stats¶
- property normalize¶
- property has_target_baseline¶
- property local_velocity_normalization¶
- property exclude_cases¶
- property input_columns¶
- property output_columns¶
- property in_features¶
- property out_features¶
- property sim_names¶
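How flat rows become per-case profiles can be sketched with NumPy alone. This is a simplified illustration, assuming every case has the same number of stations; to_profiles is a hypothetical helper and the arrays are synthetic.

```python
import numpy as np


def to_profiles(x_flat, z_hat, case_idx):
    """Group flat rows [N, F] into per-case arrays [F, n_stations], sorted by z_hat."""
    profiles = {}
    for case in np.unique(case_idx):
        rows = np.flatnonzero(case_idx == case)
        order = rows[np.argsort(z_hat[rows], kind="stable")]
        profiles[int(case)] = x_flat[order].T  # [F, n_stations]
    return profiles


# Two cases with three stations each; rows arrive in arbitrary order.
case_idx = np.array([0, 1, 0, 1, 0, 1])
z_hat = np.array([0.5, 0.0, 0.0, 1.0, 1.0, 0.5])
x_flat = np.stack([z_hat, 10.0 * case_idx + z_hat], axis=1)  # F = 2
profiles = to_profiles(x_flat, z_hat, case_idx)
```

Each value in profiles has the station axis last, which is the layout a 1D convolution along the stations expects.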
- cases.alpha_d.datasets.profile.build_dataset(data_cfg)[source]¶
Construct an AlphaDProfileDataset from a Hydra data config.
Called by training.adapters.ProfileAdapter.build_dataset after it resolves data.dataset_entrypoint. Reads all the alpha-D-flavoured kwargs the generic adapter no longer knows about; engineered features and target transform default-injection still happen inside AlphaDProfileDataset.__init__.
- Return type:
Physics¶
Shared conversions for alpha_D training targets.
The alpha_D surrogate can be trained with different encoded targets while
still needing a consistent way to recover physical alpha_D for
pressure-drop integration, evaluation, and plotting.
- cases.alpha_d.physics.targets.is_alpha_d_target(field_name)[source]¶
Return True when field_name encodes alpha_D.
- Return type:
- cases.alpha_d.physics.targets.encode_alpha_d_target(alpha_d_values, *, target_name)[source]¶
Encode physical alpha_D into a model target representation.
- cases.alpha_d.physics.targets.decode_alpha_d_target(encoded_values, *, target_name)[source]¶
Decode a model target representation back to physical alpha_D.
- cases.alpha_d.physics.targets.alpha_d_values_to_bulk(encoded_values, *, target_name, d_over_D=None, local_velocity_normalization=False)[source]¶
Decode target-space values into bulk-velocity-basis alpha_D.
- cases.alpha_d.physics.targets.alpha_d_bulk_to_values(alpha_d_bulk, *, target_name, d_over_D=None, local_velocity_normalization=False)[source]¶
Encode bulk-basis alpha_D into the requested model-space target.
- cases.alpha_d.physics.targets.convert_alpha_d_values_between_bases(encoded_values, *, target_name, d_over_D, from_local_velocity_normalization, to_local_velocity_normalization)[source]¶
Re-encode alpha_D values while switching bulk/local-velocity basis.
- cases.alpha_d.physics.targets.field_values_to_physical(values, *, field_name, d_over_D=None, local_velocity_normalization=False)[source]¶
Convert model-space values into physical-space values.
For alpha_D targets this returns bulk-basis alpha_D. For other logarithmic fields, it applies the corresponding inverse transform. Non-logarithmic fields are returned unchanged.
Closed-form sudden contraction + throat-friction alpha_D baseline.
For an axisymmetric pipe with an abrupt contraction-then-expansion this baseline keeps two terms whose CFD signature is genuinely localised:
ΔP_baseline = K_c · q_throat + f_Darcy · (L_throat / D_throat) · q_throat
with q_throat = 0.5 ρ V_throat², and
β² = (D_throat / D_big)² = Dr²
K_c = 0.5 · (1 − β²)                  # sudden contraction (Idelchik)
V_throat = V_bulk / Dr²
Re_throat = Re_bulk / Dr
f_Darcy = 0.316 · Re_throat^(-1/4)    # Blasius (turbulent)
For this dataset Re_throat ∈ [5.5e3, 7.5e5] — always turbulent, so the laminar branch is omitted.
The Borda-Carnot expansion loss K_e = (1 − β²)² is deliberately not included. In CFD the expansion loss is dissipated gradually over several diameters downstream, and the local α_D right at the expansion plane is dominated by Bernoulli-like pressure recovery (often slightly negative). Adding K_e as a single positive spike at the outlet bin forces the model to fight a phantom artifact (worth O(K_e · D_h / dz) in α_D), which empirically destroyed the residual fit. We let the model learn the distributed downstream loss on its own.
The baseline is exposed as a profile α_D(z) so that integrating
-dp/dz = α_D · ρ V_bulk² / (2 D_h) # bulk-basis convention
over the ROI reproduces ΔP_baseline above. K_c is localised at the contraction-inlet bin (single station of physical width dz_phys = L_roi / n_stations); friction is uniform across the throat interior.
The closed form is intentionally simple — its job is to provide a smooth shape prior so the MLP can fit the residual against CFD, not to replace the CFD computation.
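The two retained terms can be evaluated directly from the formulas above. The sketch below is illustrative only: baseline_delta_p is a hypothetical helper, the throat length is passed in metres because its relation to Lr is not spelled out here, and the defaults copy the BaselineGeometry defaults (D_big=0.2, rho=1.0, V_bulk=1.0).

```python
def baseline_delta_p(re_bulk, dr, l_throat, *, d_big=0.2, rho=1.0, v_bulk=1.0):
    """Return (contraction, friction, total) pressure drop from the closed form."""
    beta_sq = dr**2
    k_c = 0.5 * (1.0 - beta_sq)             # sudden contraction
    v_throat = v_bulk / dr**2
    re_throat = re_bulk / dr
    f_darcy = 0.316 * re_throat ** (-0.25)  # Blasius, turbulent only
    q_throat = 0.5 * rho * v_throat**2
    d_throat = dr * d_big
    dp_contraction = k_c * q_throat
    dp_friction = f_darcy * (l_throat / d_throat) * q_throat
    return dp_contraction, dp_friction, dp_contraction + dp_friction


# Example inputs chosen for easy hand-checking, not taken from the dataset.
dp_c, dp_f, dp_total = baseline_delta_p(re_bulk=1.0e4, dr=0.5, l_throat=0.2)
```

For these inputs K_c = 0.375 and q_throat = 8, so the contraction term is 3.0 and dominates the friction term.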
- class cases.alpha_d.physics.baseline.BaselineGeometry(Re, Dr, Lr, D_big=0.2, outer_height_m=1.0, buffer_diams=1.0, rho=1.0, V_bulk=1.0, n_stations=50)[source]¶
Bases: object
Per-case geometry constants required to evaluate the baseline.
- cases.alpha_d.physics.baseline.alpha_d_baseline_profile(z_hat, geom)[source]¶
Compute the bulk-basis α_D baseline at each station.
- Parameters:
z_hat (ndarray) – Per-station normalised axial coordinate, shape [n_stations].
geom (BaselineGeometry) – Geometry + flow constants for the case.
- Returns:
alpha_D – Bulk-basis α_D = -dp/dz · 2 D_h / (ρ V_bulk²) at each station.
- Return type:
ndarray
ETL (Source · Transform · Sink)¶
AlphaDSource: reads MOOSE simulation_out.e + case_metadata.txt.
Subclasses physicsnemo_curator DataSource. Each file represents one parametric-study case. The source returns raw mesh + field arrays together with the case parameters (Re, Dr, Lr, geometry dimensions) needed by the downstream transformation.
- class cases.alpha_d.etl.source.AlphaDSource(cfg, input_dir, manifest=None, mesh_scale=1.0, exodus_filename='simulation_out.e')[source]¶
Bases:
DataSourceReads
simulation_out.efiles from the parametric study.- Parameters:
AlphaDTransformation: extract contraction ROI, compute Darcy coefficient.
Implements the DataTransformation ABC from physicsnemo-curator.
- For each CFD case the transformation:
Computes element centroids from node coords + connectivity.
Identifies the contraction region from geometry parameters.
Bins elements into axial (z) stations.
Computes cross-section-averaged pressure at each station.
Derives the Darcy resistance coefficient alpha_D via the pressure gradient and local hydraulic diameter.
Constructs the feature table that the MLP will consume.
Output dict contains features [N_stations, D_in], targets
[N_stations, 1], and metadata.
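The binning, averaging, and gradient steps listed above can be sketched as follows. This is a simplified illustration under assumptions: station_alpha_d is a hypothetical helper, the average is unweighted (the real transformation may weight by element area or volume), a single hydraulic diameter is used for all stations, and the input is a synthetic linear pressure field.

```python
import numpy as np


def station_alpha_d(z, p, *, n_stations, d_h, rho=1.0, v_bulk=1.0, min_elements=3):
    """Bin element centroids along z, average pressure, and derive alpha_D."""
    z = np.asarray(z, dtype=float)
    p = np.asarray(p, dtype=float)
    span = z.max() - z.min()
    idx = np.minimum(((z - z.min()) / span * n_stations).astype(int), n_stations - 1)
    counts = np.bincount(idx, minlength=n_stations)
    if counts.min() < min_elements:
        return None  # caller skips the case
    z_mean = np.bincount(idx, weights=z, minlength=n_stations) / counts
    p_mean = np.bincount(idx, weights=p, minlength=n_stations) / counts
    dpdz = np.gradient(p_mean, z_mean)
    return z_mean, -dpdz * 2.0 * d_h / (rho * v_bulk**2)


z = np.linspace(0.0, 1.0, 1000)     # synthetic element-centroid z coordinates
p = 5.0 - 0.3 * z                   # linear pressure drop of 0.3 per unit length
z_st, alpha_d = station_alpha_d(z, p, n_stations=10, d_h=0.2)
```

A linear pressure field gives a constant profile, here 0.3 · 2 · 0.2 = 0.12 at every station, which is a convenient sanity check for the sign convention.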
- class cases.alpha_d.etl.transform.AlphaDTransformation(cfg, n_stations=50, buffer_diams=1.0, rho=1.0, min_elements=3)[source]¶
Bases: DataTransformation
Extract contraction-region Darcy resistance profiles.
- Parameters:
cfg (ProcessingConfig) – ProcessingConfig.
n_stations (int) – Number of axial stations to bin into.
buffer_diams (float) – Number of pipe diameters of upstream/downstream buffer.
rho (float) – Fluid density (kg/m^3), matching the CFD setup.
min_elements (int) – Minimum elements required per station; skip case if not met.
AlphaDZarrSink: writes per-case alpha_D profiles to Zarr stores.
Zarr store layout per case:
{case_name}.zarr/
features float32 [N_stations, D_in]
targets float32 [N_stations, D_out]
metadata/
attrs: case_id, feature_names, target_names,
Re, Dr, Lr, delta_p_case
cases.case_pressure_drop¶
Feature-selection helpers for the case-level pressure-drop workflow.
- class cases.case_pressure_drop.feature_selection.MethodResult(mean_score, mean_rank, per_fold_ranks, stability)[source]¶
Bases: object
- mean_score: numpy.ndarray¶
- mean_rank: numpy.ndarray¶
- per_fold_ranks: numpy.ndarray¶
- stability: numpy.ndarray¶
- class cases.case_pressure_drop.feature_selection.SelectionResult(selected_features, report, report_path, manifest_path, selected_features_path, used_case_ids)[source]¶
Bases: object
cases.moose_grid¶
ETL data sources¶
ExodusDataSource: reads MOOSE Exodus (.e) simulation output files.
Subclasses physicsnemo_curator’s DataSource ABC.
- Each Exodus file represents one simulation run. The reader:
Extracts mesh geometry (node coordinates, element connectivity).
Extracts element solution fields for every time step.
Optionally co-reads matching CSV line-probe files via CSVProbeSource.
The returned dict is keyed so that MOOSEDataTransformation can consume it directly without field-name guessing.
- class cases.moose_grid.etl.data_sources.exodus_source.ExodusDataSource(cfg, input_dir, data_dir=None)[source]¶
Bases: DataSource
Reads MOOSE Exodus files and co-reads matching CSV probe files.
- Parameters:
CSVProbeSource: reads MOOSE CSV line-probe output files.
- CSV files produced by MOOSE VectorPostprocessors follow the naming pattern:
{sim_prefix}_out_{probe_name}_{timestep:04d}.csv
All CSVs belonging to the same simulation run share the same {sim_prefix}. Each file holds a column-per-field table (TKE, TKED, id, pressure, vel_x, vel_y, x, y, z, …) with one row per sample point along the probe.
This helper is called by ExodusDataSource.read_file() — it is not a DataSource subclass because it does not manage its own file list.
- cases.moose_grid.etl.data_sources.csv_source.find_probe_files(sim_prefix, data_dir)[source]¶
Find all CSV probe files that belong to a simulation run.
- Parameters:
- Return type:
- Returns:
Mapping from probe name to sorted list of CSV file paths (one entry per time step).
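The naming pattern can be matched with a regular expression. The sketch below is not the module's implementation: group_probe_files is a hypothetical helper that works on a list of file names rather than scanning data_dir, and it assumes the time step is the final run of at least four digits before the extension.

```python
import re
from collections import defaultdict


def group_probe_files(filenames, sim_prefix):
    """Map probe name -> file names sorted by time step, for one simulation run."""
    pattern = re.compile(
        rf"^{re.escape(sim_prefix)}_out_(?P<probe>.+)_(?P<step>\d{{4,}})\.csv$"
    )
    found = defaultdict(list)
    for name in filenames:
        match = pattern.match(name)
        if match:
            found[match["probe"]].append((int(match["step"]), name))
    # Sort by the numeric time step, then drop it from the result.
    return {probe: [n for _, n in sorted(items)] for probe, items in found.items()}


names = [
    "run1_out_line_a_0001.csv",
    "run1_out_line_a_0000.csv",
    "run1_out_wall_0002.csv",
    "run2_out_wall_0000.csv",   # different simulation run, ignored
    "run1_out.e",               # not a probe CSV, ignored
]
grouped = group_probe_files(names, "run1")
```

Sorting on the parsed integer keeps the last list entry equal to the latest time step, which is the one read_all() uses for steady-state runs.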
- class cases.moose_grid.etl.data_sources.csv_source.CSVProbeSource(data_dir)[source]¶
Bases: object
Reads and aggregates MOOSE CSV line-probe files for one simulation run.
- read_all(sim_prefix)[source]¶
Read all probe CSVs for a simulation run.
- Returns:
A pair (probe_data, probe_columns).
probe_data: dict mapping probe_name to a numpy array of shape [Np, C] where Np is the number of sample points and C the number of columns. When multiple time steps are found, data from the last time step is used (steady-state typical).
probe_columns: ordered list of column names shared across probes.
- cases.moose_grid.etl.data_sources.csv_source.read_csv(path)[source]¶
Read a MOOSE output CSV file into a numpy array.
- Returns:
arr : [Np, C] float32 array
columns : list of column name strings
MOOSEZarrSink: writes processed MOOSE simulation data to Zarr format.
Subclasses physicsnemo_curator’s DataSource ABC (write-only role).
Zarr store layout per simulation run:
{sim_name}.zarr/
├── mesh/
│ ├── coords float32 [N, D] node coordinates
│ ├── connectivity int32 [E, K] element→node (0-indexed)
│ ├── edge_src int32 [M] graph edge source nodes
│ └── edge_dst int32 [M] graph edge destination nodes
├── fields/
│ ├── {field_name} float32 [T, E] normalized element solution field
│ └── ...
├── probes/
│ ├── {probe_name} float32 [Np, C] CSV line-probe values
│ └── ...
├── grid/
│ ├── x float32 [Nx] grid x-coordinates
│ ├── y float32 [Ny] grid y-coordinates
│ └── {field_name} float32 [T,Nx,Ny] interpolated field on regular grid
└── metadata/
├── time_steps float32 [T]
├── field_names str attrs on /metadata
├── probe_columns str attrs on /metadata
└── norm_stats/{field_name} attrs: mean, std
- class cases.moose_grid.etl.data_sources.zarr_sink.MOOSEZarrSink(cfg, output_dir, overwrite_existing=True, compression_level=3, compression_method='zstd', chunk_size_mb=1.0)[source]¶
Bases: DataSource
Writes processed MOOSE data to per-simulation Zarr stores.
- Parameters:
cfg (ProcessingConfig) – ProcessingConfig.
output_dir (str) – Directory where .zarr stores will be written.
overwrite_existing (bool) – If True (default) overwrite existing stores.
compression_level (int) – Blosc compression level 1-9.
compression_method (str) – Blosc codec name (default ‘zstd’).
chunk_size_mb (float) – Target chunk size in MB.
ETL transformations¶
MOOSEDataTransformation: normalize, build graph, interpolate to grid.
Implements the DataTransformation ABC from physicsnemo-curator.
- Pipeline:
Per-field mean/std normalization across all time steps and elements.
Graph edge construction from element→node connectivity (all node pairs within each element, both directions).
Bilinear interpolation of element-centroid values onto a regular Nx×Ny grid using scipy.interpolate.griddata.
Returns a dict containing all fields of MOOSEProcessedData, ready for MOOSEZarrSink to write to disk.
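The first two pipeline stages can be sketched with NumPy. This is an illustration under assumptions, not the transformation's code: normalize_fields and build_edges are hypothetical helpers, constant fields are left unscaled to avoid division by zero, and duplicate edges from shared element sides are removed. The grid interpolation stage is omitted.

```python
import itertools
import numpy as np


def normalize_fields(fields):
    """Normalize [T, E, F] fields per field over all time steps and elements."""
    mean = fields.mean(axis=(0, 1))
    std = fields.std(axis=(0, 1))
    std = np.where(std > 0, std, 1.0)  # assumption: leave constant fields unscaled
    return (fields - mean) / std, mean, std


def build_edges(connectivity):
    """Directed edges between all node pairs of each element, both directions."""
    pairs = set()
    for element in connectivity:
        for a, b in itertools.permutations(element.tolist(), 2):
            pairs.add((a, b))  # the set drops duplicates from shared sides
    edges = np.array(sorted(pairs), dtype=np.int32)
    return edges[:, 0], edges[:, 1]


connectivity = np.array([[0, 1, 2], [1, 2, 3]])  # two triangles sharing a side
edge_src, edge_dst = build_edges(connectivity)
fields = np.random.default_rng(0).normal(2.0, 3.0, size=(4, 2, 2))
normed, mean, std = normalize_fields(fields)
```

The two triangles share the side between nodes 1 and 2, so the five undirected node pairs yield ten directed edges after deduplication.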
Schemas and validators¶
Data schemas for MOOSE simulation ETL pipeline.
MOOSERawData – raw data extracted directly from Exodus + CSV files.
MOOSEProcessedData – normalized, graph-ready, grid-ready data for ML training.
- class cases.moose_grid.etl.schemas.MOOSERawData(coords, connectivity, field_names, fields, time_steps, probe_data, probe_columns, sim_name)[source]¶
Bases: object
Raw data extracted from a single MOOSE simulation run.
coords : [N, D] node coordinates (D=2 for 2-D, D=3 for 3-D)
connectivity : [E, K] element→node connectivity (0-indexed, K nodes/element)
field_names : list[str] ordered list of element solution field names
fields : [T, E, F] element solution values (time, element, field)
time_steps : [T] simulation time values
probe_data : dict probe_name → [Np, C] CSV line probe arrays
probe_columns : list[str] column names shared by all CSV probes
sim_name : str unique simulation identifier (stem of the .e file)
- coords: numpy.ndarray¶
- connectivity: numpy.ndarray¶
- fields: numpy.ndarray¶
- time_steps: numpy.ndarray¶
- probe_data: dict[str, numpy.ndarray]¶
- class cases.moose_grid.etl.schemas.NormStats(mean, std)[source]¶
Bases: object
Per-field normalization statistics.
- class cases.moose_grid.etl.schemas.MOOSEProcessedData(coords, connectivity, edge_src, edge_dst, fields, field_names, norm_stats, probe_data, probe_columns, grid_fields, grid_x, grid_y, time_steps, sim_name)[source]¶
Bases: object
Processed, normalized data ready for PhysicsNeMo ML training.
coords : [N, D] node coordinates
connectivity : [E, K] element→node connectivity (0-indexed)
edge_src : [M] graph edge source node indices
edge_dst : [M] graph edge destination node indices
fields : [T, E, F] normalized element solution fields
field_names : list[str] ordered field names matching last dim of fields
norm_stats : dict field_name → NormStats(mean, std)
probe_data : dict probe_name → [Np, C] CSV probe arrays (raw)
probe_columns : list[str] column names for probe arrays
grid_fields : [T, Nx, Ny, F] fields interpolated onto a regular grid
grid_x : [Nx] x-coordinates of grid columns
grid_y : [Ny] y-coordinates of grid rows
time_steps : [T] simulation time values
sim_name : str unique simulation identifier
- coords: numpy.ndarray¶
- connectivity: numpy.ndarray¶
- edge_src: numpy.ndarray¶
- edge_dst: numpy.ndarray¶
- fields: numpy.ndarray¶
- probe_data: dict[str, numpy.ndarray]¶
- grid_fields: numpy.ndarray¶
- grid_x: numpy.ndarray¶
- grid_y: numpy.ndarray¶
- time_steps: numpy.ndarray¶
MOOSEDatasetValidator: validates processed Zarr dataset structure.
Checks that every .zarr store in the output directory has the expected group hierarchy and required arrays before ML training begins.
- class cases.moose_grid.etl.validators.MOOSEDatasetValidator(cfg, output_dir)[source]¶
Bases: DatasetValidator
Validates a directory of processed Zarr stores.
- Parameters:
cfg (ProcessingConfig) – ProcessingConfig.
output_dir (str) – Directory containing .zarr stores to validate.
- validate_single_item(item)[source]¶
Validate a single .zarr store.
- Return type:
list[ValidationError]
- Checks:
Required groups exist.
Required arrays within each group exist.
Required metadata attributes are present.
fields/ group contains at least one dataset.
Shape consistency (coords, connectivity).