cases

Per-case packages. Each case owns its ETL pipeline, datasets, physics helpers, experiment hook, and Hydra configs. The training core does not import from cases/* — coupling flows in one direction only.

cases.alpha_d

Experiment

Alpha-D experiment.

Specialises training.experiment.Experiment with the alpha-D case’s evaluation metrics (per-region pointwise + Δp integral) and the plotting hooks the runner uses for per-case profile and parity plots. The training step itself is inherited unchanged.

class cases.alpha_d.experiment.AlphaDExperiment(model, optimizer, loss_fn, adapter, device, **kwargs)[source]

Bases: Experiment

Alpha-D-specific evaluation + plotting hooks.

compute_extended_metrics(eval_dataset, all_preds, all_targets)[source]

Pointwise + Δp metrics for the alpha-D adapter.

Requires a TabularPairDataset / AlphaDProfileDataset (gated by _row_case_idx). Other adapters fall through to {}.

Return type:

dict[str, Any]

print_extended_metrics(metrics)[source]

Pretty-print the dict returned by compute_extended_metrics.

Default: no-op (the generic runner already prints overall + per-field MSE/RMSE). Subclasses override to format case-specific blocks.

Return type:

None

prepare_for_training(train_dataset, val_dataset, device)[source]

Bind alpha-D-specific state from the datasets onto the experiment.

Return type:

None

decode_for_plotting(values, dataset, field_name, mask)[source]

Re-add encoded baseline, decode to bulk α_D for profile plotting.

baseline_for_plotting(dataset, field_name, mask)[source]

Decode the analytical alpha-D baseline for the masked stations.

Reuses decode_for_plotting with a zero residual so the physical decode pipeline (baseline re-add + unit conversion) stays in one place.

Feature data

Data loader for feature analysis.

Flattens the per-case zarr stores produced by the alpha_D ETL pipeline into dense numpy arrays (X, y, groups) suitable for sklearn.

Leakage controls

  1. ALLOWLIST hard-codes the input features that are safe to consider. The YAML config may restrict this set via selected_from_allowlist but cannot extend it. Unknown names raise ValueError. Extending the allowlist is a code change so it appears in review.

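  2. groups is the per-row case index. Callers must use sklearn.model_selection.GroupKFold for all CV and pass groups to its split() method (the constructor does not accept it); rows inside a case are spatially correlated, so splitting a case across folds would leak.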

  3. Metadata attributes such as delta_p_case are never included as features here — they are target-adjacent and would leak.
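The group-wise CV requirement in item 2 can be sketched as follows. The arrays are synthetic stand-ins for (X, y, groups), not output of load_feature_matrix; the only library call assumed is scikit-learn's GroupKFold.split(X, y, groups=...).

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Synthetic stand-ins: 4 cases with 5 stations (rows) each.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(4), 5)        # per-row case index
X = rng.normal(size=(groups.size, 3))
y = rng.normal(size=groups.size)

# groups goes to split(), not to the GroupKFold constructor.
cv = GroupKFold(n_splits=4)
folds = list(cv.split(X, y, groups=groups))

for train_idx, test_idx in folds:
    # No case may appear on both sides of a split.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

With one case per fold, as here, every held-out fold contains all stations of exactly one case.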

cases.alpha_d.feature_data.build_engineered_feature_map(features, raw_feature_names)[source]

Build leakage-safe engineered features from raw per-row feature columns.

Return type:

dict[str, ndarray]

cases.alpha_d.feature_data.engineered_features_spec()[source]

Entrypoint returning (names, builder) for TabularPairDataset.

Used by the pointwise adapter via the data.engineered_features_entrypoint config key.

cases.alpha_d.feature_data.load_feature_matrix(zarr_dir, *, target='log_alpha_D', selected_from_allowlist=None, local_velocity_normalization=True, min_Dr=None, exclude_cases=None)[source]

Load all cases under zarr_dir into a flat FeatureAnalysisData.

Parameters:
  • zarr_dir (str | Path) – Directory containing *.zarr stores.

  • target (str) – Target column name. Must exist in zarr target_names.

  • selected_from_allowlist (list[str] | None) – Restrict the ALLOWLIST to this subset. Cannot add new names.

  • local_velocity_normalization (bool) – If True, rescale alpha_D-family targets to the local-velocity basis before returning them. Matches training behaviour when the MLP is trained with local_velocity_normalization: true.

  • min_Dr (float | None) – Drop cases with diameter ratio below this value (matches training).

  • exclude_cases (list[str] | None) – Drop cases whose stem is in this list.

Return type:

FeatureAnalysisData
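The "restrict but never extend" rule for selected_from_allowlist might look roughly like the sketch below. The allowlist contents and the helper name resolve_features are hypothetical; the real names live in cases.alpha_d.feature_data.

```python
# Hypothetical allowlist, for illustration only.
ALLOWLIST = ("z_hat", "d_local_over_D", "Re", "Dr", "Lr")


def resolve_features(selected_from_allowlist=None):
    """Return the feature names to load, restricted to the allowlist."""
    if selected_from_allowlist is None:
        return list(ALLOWLIST)
    unknown = [n for n in selected_from_allowlist if n not in ALLOWLIST]
    if unknown:
        # Config may narrow the set, never widen it.
        raise ValueError(f"Features not in ALLOWLIST: {unknown}")
    # Keep allowlist order so column order is stable across configs.
    return [n for n in ALLOWLIST if n in selected_from_allowlist]
```

Requesting a target-adjacent name such as delta_p_case fails loudly in this sketch instead of being silently dropped.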

Transforms and metrics

Case-specific target transforms for the alpha-D surrogate.

The generic TabularPairDataset accepts a target_transform callable that rewrites encoded targets before the dataset materialises tensors. This module provides the alpha-D closed-form residual transform, which optionally applies local-velocity normalisation first and then subtracts the closed-form baseline:

encoded_truth_lv = LV_norm(encoded_truth)    (if requested)
encoded_residual = encoded_truth_lv − encoded_baseline_lv

where encoded_baseline_lv is the per-station alpha-D baseline encoded with the same target convention as the truth (see cases.alpha_d.physics).

A transform returns (transformed_y, extras) where extras is a dict of well-known extras the dataset stashes on self. Recognised keys:

  • baseline_encoded: ndarray of the encoded baseline, stashed at dataset._baseline_encoded so metrics / plotting / the Δp integral can re-add it at decode boundaries.

  • local_velocity_normalization: bool indicating whether LV-norm was actually applied (the dataset propagates this onto dataset.local_velocity_normalization).
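A minimal sketch of the (transformed_y, extras) contract, assuming the baseline is already encoded and LV-norm is not applied. The function name residual_transform and its two-argument signature are illustrative only and are simpler than the real alpha_d_residual_transform.

```python
import numpy as np


def residual_transform(full_y, baseline_encoded):
    """Hypothetical transform: subtract an encoded baseline from encoded targets."""
    residual = full_y - baseline_encoded
    extras = {
        "baseline_encoded": baseline_encoded,
        "local_velocity_normalization": False,  # LV-norm not applied in this sketch
    }
    return residual, extras


# Made-up encoded values for three stations and one target column.
encoded_truth = np.array([[0.2], [1.5], [0.4]])
encoded_baseline = np.array([[0.1], [1.2], [0.1]])

residual, extras = residual_transform(encoded_truth, encoded_baseline)

# Re-adding the stashed baseline at a decode boundary recovers the encoded truth.
recovered = residual + extras["baseline_encoded"]
```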

cases.alpha_d.transforms.alpha_d_residual_transform(full_y, full_x, *, target_names, feature_names, case_meta_list, rows_per_case, local_velocity_normalization=False)[source]

Optionally LV-normalise and subtract the closed-form alpha-D baseline.

No-op (returns (full_y, {})) when the dataset cannot satisfy the prerequisites: z_hat / d_local_over_D features missing, or no alpha-D-shaped column in target_names.

Return type:

tuple[ndarray, dict[str, Any]]

Alpha-D-specific extended evaluation metrics.

The generic runner calls Experiment.compute_extended_metrics(), which AlphaDExperiment (in cases/alpha_d/experiment.py) overrides to invoke the helpers in this module. Base experiments inherit a no-op and skip this module entirely, so the runner stays alpha-D-agnostic.

Three functions live here:

cases.alpha_d.metrics.compute_delta_p_metrics(model, eval_dataset, device, *, alpha_d_target_name='log_alpha_D', local_velocity_normalization=False)[source]

Per-case Δp prediction error statistics.

Integrates the predicted α_D profile via the trapezoidal rule to obtain delta_p_pred, then compares with delta_p_case stored in the zarr metadata. Per-case geometry constants (D_big, outer_height_m, buffer_diams, rho, V_bulk) are read from each case’s metadata and fall back to the historical AlphaD-ETL defaults when missing.

Return type:

dict[str, Any]
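The trapezoidal Δp integral can be sketched in plain Python as below. This assumes physical station coordinates in metres and the bulk-basis convention −dp/dz = α_D · ρ V_bulk² / (2 D_h); the helper name, the sample profile, and the reference value are made up for illustration and do not reproduce the module's handling of geometry metadata.

```python
def integrate_delta_p(z, alpha_d, d_h, rho=1.0, v_bulk=1.0):
    """Trapezoidal integral of -dp/dz = alpha_D * rho * V_bulk^2 / (2 * D_h)."""
    grad = [a * rho * v_bulk**2 / (2.0 * d) for a, d in zip(alpha_d, d_h)]
    return sum(
        0.5 * (grad[i] + grad[i + 1]) * (z[i + 1] - z[i])
        for i in range(len(z) - 1)
    )


# Made-up profile: constant alpha_D over five stations.
z = [0.0, 0.1, 0.2, 0.3, 0.4]
alpha_d = [0.02] * 5
d_h = [0.2] * 5

delta_p_pred = integrate_delta_p(z, alpha_d, d_h)
delta_p_case = 0.021  # made-up reference, standing in for the zarr metadata value
rel_error = abs(delta_p_pred - delta_p_case) / abs(delta_p_case)
```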

cases.alpha_d.metrics.compute_pointwise_extended_metrics(preds, targets, dataset, output_fields)[source]

R², physical-space, per-region, and per-case metrics.

Only meaningful for the pointwise adapter with TabularPairDataset.

Return type:

dict[str, Any]

cases.alpha_d.metrics.print_extended_metrics(metrics)[source]

Human-readable summary of α_D extended metrics.

Return type:

None

Datasets

Per-case profile dataset for 1D-conv α_D training.

Wraps a TabularPairDataset and exposes per-case views shaped (features, stations) so that a 1D conv along the station axis can treat each case as a single sample.

The wrapper delegates flat row-level state to the inner tabular dataset, which is what every existing access site in runner.py, experiments/alpha_d.py, and plotting.py reads. Subsets are built by delegating to TabularPairDataset.subset_by_case_indices so that flat state stays aligned with the train/val/test case split.

class cases.alpha_d.datasets.profile.AlphaDProfileDataset(*args, **kwargs)[source]

Bases: Dataset

Per-case dataset producing (x, y, w, case_idx) profile tensors.

Shapes per item:

x        : [in_features, n_stations]
y        : [out_features, n_stations]
w        : [1, n_stations] (broadcast-compatible with y)
case_idx : scalar long tensor

Stations are sorted by z_hat per case so the conv sees a monotone spatial sequence.
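The per-item layout can be illustrated with NumPy arrays standing in for tensors. The helper rows_to_profile is hypothetical and only shows the sort-by-z_hat and transpose steps for a single case; unit weights are an assumption.

```python
import numpy as np


def rows_to_profile(x_rows, y_rows, z_hat):
    """Reshape one case's flat rows into (features, stations) arrays sorted by z_hat."""
    order = np.argsort(z_hat, kind="stable")
    x = x_rows[order].T             # [in_features, n_stations]
    y = y_rows[order].T             # [out_features, n_stations]
    w = np.ones((1, order.size))    # [1, n_stations], broadcasts against y
    return x, y, w


# Four unsorted stations with two input features and one target.
z_hat = np.array([0.5, 0.0, 1.0, 0.25])
x_rows = np.stack([z_hat, z_hat**2], axis=1)   # [n_stations, in_features]
y_rows = (10.0 * z_hat)[:, None]               # [n_stations, out_features]

x, y, w = rows_to_profile(x_rows, y_rows, z_hat)
```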

property norm_stats
property normalize
property has_target_baseline
property local_velocity_normalization
property exclude_cases
property input_columns
property output_columns
property in_features
property out_features
property sim_names
subset_by_case_indices(case_indices)[source]
Return type:

AlphaDProfileDataset

add_baseline_to_encoded(encoded, row_mask=None, field_idx=None)[source]

Delegate to the inner TabularPairDataset.

cases.alpha_d.datasets.profile.build_dataset(data_cfg)[source]

Construct an AlphaDProfileDataset from a Hydra data config.

Called by training.adapters.ProfileAdapter.build_dataset after it resolves data.dataset_entrypoint. Reads all the alpha-D-flavoured kwargs the generic adapter no longer knows about; engineered features and target transform default-injection still happen inside AlphaDProfileDataset.__init__.

Return type:

AlphaDProfileDataset

Physics

Shared conversions for alpha_D training targets.

The alpha_D surrogate can be trained with different encoded targets while still needing a consistent way to recover physical alpha_D for pressure-drop integration, evaluation, and plotting.

cases.alpha_d.physics.targets.is_alpha_d_target(field_name)[source]

Return True when field_name encodes alpha_D.

Return type:

bool

cases.alpha_d.physics.targets.encode_alpha_d_target(alpha_d_values, *, target_name)[source]

Encode physical alpha_D into a model target representation.

cases.alpha_d.physics.targets.decode_alpha_d_target(encoded_values, *, target_name)[source]

Decode a model target representation back to physical alpha_D.

cases.alpha_d.physics.targets.alpha_d_values_to_bulk(encoded_values, *, target_name, d_over_D=None, local_velocity_normalization=False)[source]

Decode target-space values into bulk-velocity-basis alpha_D.

cases.alpha_d.physics.targets.alpha_d_bulk_to_values(alpha_d_bulk, *, target_name, d_over_D=None, local_velocity_normalization=False)[source]

Encode bulk-basis alpha_D into the requested model-space target.

cases.alpha_d.physics.targets.convert_alpha_d_values_between_bases(encoded_values, *, target_name, d_over_D, from_local_velocity_normalization, to_local_velocity_normalization)[source]

Re-encode alpha_D values while switching bulk/local-velocity basis.

cases.alpha_d.physics.targets.field_values_to_physical(values, *, field_name, d_over_D=None, local_velocity_normalization=False)[source]

Convert model-space values into physical-space values.

For alpha_D targets this returns bulk-basis alpha_D. For other logarithmic fields, it applies the corresponding inverse transform. Non-logarithmic fields are returned unchanged.

Closed-form sudden contraction + throat-friction alpha_D baseline.

For an axisymmetric pipe with an abrupt contraction-then-expansion this baseline keeps two terms whose CFD signature is genuinely localised:

ΔP_baseline = K_c · q_throat + f_Darcy · (L_throat / D_throat) · q_throat

with q_throat = 0.5 ρ V_throat², and

β²        = (D_throat / D_big)² = Dr²
K_c       = 0.5 · (1 − β²)              # sudden contraction (Idelchik)
V_throat  = V_bulk / Dr²
Re_throat = Re_bulk / Dr
f_Darcy   = 0.316 · Re_throat^(-1/4)    # Blasius (turbulent)

For this dataset Re_throat ∈ [5.5e3, 7.5e5] — always turbulent, so the laminar branch is omitted.

The Borda-Carnot expansion loss K_e = (1 − β²)² is deliberately not included. In CFD the expansion loss is dissipated gradually over several diameters downstream, and the local α_D right at the expansion plane is dominated by Bernoulli-like pressure recovery (often slightly negative). Adding K_e as a single positive spike at the outlet bin forces the model to fight a phantom artifact (worth O(K_e · D_h / dz) in α_D), which empirically destroyed the residual fit. We let the model learn the distributed downstream loss on its own.

The baseline is exposed as a profile α_D(z) so that integrating

−dp/dz = α_D · ρ V_bulk² / (2 D_h) # bulk-basis convention

over the ROI reproduces ΔP_baseline above. K_c is localised at the contraction-inlet bin (single station of physical width dz_phys = L_roi / n_stations); friction is uniform across the throat interior.

The closed form is intentionally simple — its job is to provide a smooth shape prior so the MLP can fit the residual against CFD, not to replace the CFD computation.
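As a worked instance of the formulas above, the sketch below evaluates the two retained terms for one made-up case. The throat length is passed in directly because its relation to Lr is not stated here; the function name and the sample numbers are assumptions, not the module's API.

```python
def baseline_delta_p(re_bulk, dr, d_big, l_throat, rho=1.0, v_bulk=1.0):
    """Return (contraction, friction) contributions to the baseline pressure drop."""
    beta_sq = dr**2                           # (D_throat / D_big)^2
    k_c = 0.5 * (1.0 - beta_sq)               # sudden contraction
    v_throat = v_bulk / dr**2                 # continuity
    re_throat = re_bulk / dr
    f_darcy = 0.316 * re_throat ** (-0.25)    # Blasius (turbulent)
    q_throat = 0.5 * rho * v_throat**2        # throat dynamic pressure
    d_throat = dr * d_big
    contraction = k_c * q_throat
    friction = f_darcy * (l_throat / d_throat) * q_throat
    return contraction, friction


# Made-up case: Re_throat = 2e4, inside the stated turbulent range.
contraction, friction = baseline_delta_p(
    re_bulk=1.0e4, dr=0.5, d_big=0.2, l_throat=0.2
)
delta_p_baseline = contraction + friction
```

For these inputs the contraction term is 3.0 and the friction term is roughly 0.43, so the localised contraction loss dominates the baseline.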

class cases.alpha_d.physics.baseline.BaselineGeometry(Re, Dr, Lr, D_big=0.2, outer_height_m=1.0, buffer_diams=1.0, rho=1.0, V_bulk=1.0, n_stations=50)[source]

Bases: object

Per-case geometry constants required to evaluate the baseline.

Re: float
Dr: float
Lr: float
D_big: float = 0.2
outer_height_m: float = 1.0
buffer_diams: float = 1.0
rho: float = 1.0
V_bulk: float = 1.0
n_stations: int = 50
property L_roi: float
property dz_phys: float
property dz_norm: float
property z_throat_start_norm: float
property z_throat_end_norm: float
cases.alpha_d.physics.baseline.alpha_d_baseline_profile(z_hat, geom)[source]

Compute the bulk-basis α_D baseline at each station.

Parameters:
  • z_hat (ndarray) – Per-station normalised axial coordinate, shape [n_stations].

  • geom (BaselineGeometry) – Geometry + flow constants for the case.

Returns:

alpha_D – Bulk-basis α_D = -dp/dz · 2 D_h / (ρ V_bulk²) at each station.

Return type:

ndarray

cases.alpha_d.physics.baseline.integrated_baseline_delta_p(geom)[source]

ΔP predicted by the closed-form profile (K_c + throat friction).

K_e is intentionally excluded — see the module docstring.

Return type:

float

ETL (Source · Transform · Sink)

AlphaDSource: reads MOOSE simulation_out.e + case_metadata.txt.

Subclasses physicsnemo_curator DataSource. Each file represents one parametric-study case. The source returns raw mesh + field arrays together with the case parameters (Re, Dr, Lr, geometry dimensions) needed by the downstream transformation.

class cases.alpha_d.etl.source.AlphaDSource(cfg, input_dir, manifest=None, mesh_scale=1.0, exodus_filename='simulation_out.e')[source]

Bases: DataSource

Reads simulation_out.e files from the parametric study.

Parameters:
  • cfg (ProcessingConfig) – ProcessingConfig.

  • input_dir (str) – Root of parametric study (contains case sub-directories).

  • manifest (str | None) – Path to cases_manifest.csv.

  • mesh_scale (float) – Scale factor applied to mesh coordinates (default 1.0).

  • exodus_filename (str) – Name of the output Exodus file inside each case dir.

get_file_list()[source]

Return paths to simulation_out.e files in case sub-directories.

Return type:

list[str]

read_file(filename)[source]

Read one case’s Exodus output and metadata.

Return type:

dict[str, Any]

AlphaDTransformation: extract contraction ROI, compute Darcy coefficient.

Implements the DataTransformation ABC from physicsnemo-curator.

For each CFD case the transformation:
  1. Computes element centroids from node coords + connectivity.

  2. Identifies the contraction region from geometry parameters.

  3. Bins elements into axial (z) stations.

  4. Computes cross-section-averaged pressure at each station.

  5. Derives the Darcy resistance coefficient alpha_D via the pressure gradient and local hydraulic diameter.

  6. Constructs the feature table that the MLP will consume.

Output dict contains features [N_stations, D_in], targets [N_stations, 1], and metadata.
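Steps 3 to 5 can be sketched with NumPy as follows. This is a simplified stand-in, not the transformation itself: it uses an unweighted mean per station instead of a true cross-section average, a single constant hydraulic diameter instead of the local one, and a hypothetical function name.

```python
import numpy as np


def alpha_d_profile(z, p, d_h, n_stations, rho=1.0, v_bulk=1.0):
    """Bin element samples into axial stations and derive alpha_D per station."""
    edges = np.linspace(z.min(), z.max(), n_stations + 1)
    # Station index per element; the clip keeps z == z.max() in the last bin.
    idx = np.clip(np.digitize(z, edges) - 1, 0, n_stations - 1)
    counts = np.bincount(idx, minlength=n_stations)
    if np.any(counts == 0):
        raise ValueError("empty station; use fewer stations or more elements")
    z_mean = np.bincount(idx, weights=z, minlength=n_stations) / counts
    p_mean = np.bincount(idx, weights=p, minlength=n_stations) / counts
    dp_dz = np.gradient(p_mean, z_mean)       # finite differences on station means
    return z_mean, -dp_dz * 2.0 * d_h / (rho * v_bulk**2)


# Synthetic check: a linear pressure drop gives a constant alpha_D.
z = np.linspace(0.0, 1.0, 200)
p = 5.0 - 0.3 * z
z_st, alpha_d = alpha_d_profile(z, p, d_h=0.2, n_stations=10)
```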

class cases.alpha_d.etl.transform.AlphaDTransformation(cfg, n_stations=50, buffer_diams=1.0, rho=1.0, min_elements=3)[source]

Bases: DataTransformation

Extract contraction-region Darcy resistance profiles.

Parameters:
  • cfg (ProcessingConfig) – ProcessingConfig.

  • n_stations (int) – Number of axial stations to bin into.

  • buffer_diams (float) – Number of pipe diameters of upstream/downstream buffer.

  • rho (float) – Fluid density (kg/m^3), matching the CFD setup.

  • min_elements (int) – Minimum elements required per station; skip case if not met.

transform(data)[source]

Compute alpha_D axial profile for one CFD case.

Return type:

Optional[dict[str, Any]]

AlphaDZarrSink: writes per-case alpha_D profiles to Zarr stores.

Zarr store layout per case:

{case_name}.zarr/
    features    float32 [N_stations, D_in]
    targets     float32 [N_stations, D_out]
    metadata/
        attrs: case_id, feature_names, target_names,
               Re, Dr, Lr, delta_p_case
class cases.alpha_d.etl.sink.AlphaDZarrSink(cfg, output_dir, overwrite_existing=True)[source]

Bases: DataSource

Writes alpha_D profile data to per-case Zarr stores.

Parameters:
  • cfg (ProcessingConfig) – ProcessingConfig.

  • output_dir (str) – Directory where .zarr stores will be written.

  • overwrite_existing (bool) – Overwrite existing stores (default True).

get_file_list()[source]
Return type:

list[str]

read_file(filename)[source]
Return type:

dict[str, Any]

should_skip(filename)[source]
Return type:

bool

cleanup_temp_files()[source]
Return type:

None

cases.case_pressure_drop

Feature-selection helpers for the case-level pressure-drop workflow.

class cases.case_pressure_drop.feature_selection.MethodResult(mean_score, mean_rank, per_fold_ranks, stability)[source]

Bases: object

mean_score: numpy.ndarray
mean_rank: numpy.ndarray
per_fold_ranks: numpy.ndarray
stability: numpy.ndarray
class cases.case_pressure_drop.feature_selection.SelectionResult(selected_features, report, report_path, manifest_path, selected_features_path, used_case_ids)[source]

Bases: object

selected_features: list[str]
report: dict[str, Any]
report_path: Path
manifest_path: Path
selected_features_path: Path
used_case_ids: list[str]
cases.case_pressure_drop.feature_selection.run_feature_selection(dataset, *, feature_names, methods, top_k, n_splits, seed, stability_min, mutual_info_n_seeds, output_dir, config, redundancy_threshold=0.95)[source]

Rank candidate features on the training split and write artifacts.

Return type:

SelectionResult

cases.moose_grid

ETL data sources

ExodusDataSource: reads MOOSE Exodus (.e) simulation output files.

Subclasses physicsnemo_curator’s DataSource ABC.

Each Exodus file represents one simulation run. The reader:
  1. Extracts mesh geometry (node coordinates, element connectivity).

  2. Extracts element solution fields for every time step.

  3. Optionally co-reads matching CSV line-probe files via CSVProbeSource.

The returned dict is keyed so that MOOSEDataTransformation can consume it directly without field-name guessing.

class cases.moose_grid.etl.data_sources.exodus_source.ExodusDataSource(cfg, input_dir, data_dir=None)[source]

Bases: DataSource

Reads MOOSE Exodus files and co-reads matching CSV probe files.

Parameters:
  • cfg (ProcessingConfig) – ProcessingConfig from the curator framework.

  • input_dir (str) – Directory containing Exodus (.e) files.

  • data_dir (str | None) – Directory containing CSV probe files. If omitted, defaults to input_dir.

get_file_list()[source]

Return sorted list of Exodus file paths.

Return type:

list[str]

read_file(filename)[source]

Read one Exodus file and its associated CSV probes.

Returns a dict that can be passed directly to MOOSEDataTransformation.

Return type:

dict[str, Any]

CSVProbeSource: reads MOOSE CSV line-probe output files.

CSV files produced by MOOSE VectorPostprocessors follow the naming pattern:

{sim_prefix}_out_{probe_name}_{timestep:04d}.csv

All CSVs belonging to the same simulation run share the same {sim_prefix}. Each file holds a column-per-field table (TKE, TKED, id, pressure, vel_x, vel_y, x, y, z, …) with one row per sample point along the probe.

This helper is called by ExodusDataSource.read_file() — it is not a DataSource subclass because it does not manage its own file list.
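The naming pattern can be matched with a regular expression along these lines. The helper group_probe_files and the sample filenames are hypothetical, and the sketch follows the pattern literally (sim_prefix, then _out_, then the probe name); it is not the module's find_probe_files.

```python
import re
from collections import defaultdict


def group_probe_files(sim_prefix, filenames):
    """Map probe name to filenames sorted by time step."""
    pattern = re.compile(
        rf"^{re.escape(sim_prefix)}_out_(?P<probe>.+)_(?P<step>\d{{4}})\.csv$"
    )
    by_probe = defaultdict(list)
    for name in filenames:
        match = pattern.match(name)
        if match:
            by_probe[match["probe"]].append((int(match["step"]), name))
    return {probe: [n for _, n in sorted(items)] for probe, items in by_probe.items()}


files = [
    "cavity_out_centerline_x_0002.csv",
    "cavity_out_centerline_x_0001.csv",
    "cavity_out_wall_0002.csv",
    "other_out_wall_0001.csv",   # different simulation, ignored
    "cavity.e",                  # not a probe CSV, ignored
]
probes = group_probe_files("cavity", files)
```

Because each list is sorted by time step, the last entry per probe is the file a "use the last time step" policy would read.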

cases.moose_grid.etl.data_sources.csv_source.find_probe_files(sim_prefix, data_dir)[source]

Find all CSV probe files that belong to a simulation run.

Parameters:
  • sim_prefix (str) – Stem of the Exodus file, i.e. the filename without its extension (e.g. ‘lid-driven-segregated_out’). The trailing ‘_out’ does not need to be stripped.

  • data_dir (Path) – Directory to search for CSV files.

Return type:

dict[str, list[Path]]

Returns:

Mapping from probe name to sorted list of CSV file paths (one entry per time step).

class cases.moose_grid.etl.data_sources.csv_source.CSVProbeSource(data_dir)[source]

Bases: object

Reads and aggregates MOOSE CSV line-probe files for one simulation run.

read_all(sim_prefix)[source]

Read all probe CSVs for a simulation run.

Returns:

  • probe_data: dict mapping probe_name to a numpy array of shape [Np, C] where Np is the number of sample points and C the number of columns. When multiple time steps are found, data from the last time step is used, which is the usual choice for steady-state runs.

  • probe_columns: ordered list of column names shared across probes.

Return type:

A pair (probe_data, probe_columns)

cases.moose_grid.etl.data_sources.csv_source.read_csv(path)[source]

Read a MOOSE output CSV file into a numpy array.

Returns:

  • arr: [Np, C] float32 array.

  • columns: list of column name strings.

Return type:

A pair (arr, columns)

MOOSEZarrSink: writes processed MOOSE simulation data to Zarr format.

Subclasses physicsnemo_curator’s DataSource ABC (write-only role).

Zarr store layout per simulation run:

{sim_name}.zarr/
├── mesh/
│   ├── coords          float32 [N, D]   node coordinates
│   ├── connectivity    int32   [E, K]   element→node (0-indexed)
│   ├── edge_src        int32   [M]      graph edge source nodes
│   └── edge_dst        int32   [M]      graph edge destination nodes
├── fields/
│   ├── {field_name}    float32 [T, E]   normalized element solution field
│   └── ...
├── probes/
│   ├── {probe_name}    float32 [Np, C]  CSV line-probe values
│   └── ...
├── grid/
│   ├── x               float32 [Nx]     grid x-coordinates
│   ├── y               float32 [Ny]     grid y-coordinates
│   └── {field_name}    float32 [T,Nx,Ny] interpolated field on regular grid
└── metadata/
    ├── time_steps      float32 [T]
    ├── field_names     str attrs on /metadata
    ├── probe_columns   str attrs on /metadata
    └── norm_stats/{field_name}  attrs: mean, std
class cases.moose_grid.etl.data_sources.zarr_sink.MOOSEZarrSink(cfg, output_dir, overwrite_existing=True, compression_level=3, compression_method='zstd', chunk_size_mb=1.0)[source]

Bases: DataSource

Writes processed MOOSE data to per-simulation Zarr stores.

Parameters:
  • cfg (ProcessingConfig) – ProcessingConfig.

  • output_dir (str) – Directory where .zarr stores will be written.

  • overwrite_existing (bool) – If True (default) overwrite existing stores.

  • compression_level (int) – Blosc compression level 1-9.

  • compression_method (str) – Blosc codec name (default ‘zstd’).

  • chunk_size_mb (float) – Target chunk size in MB.

get_file_list()[source]
Return type:

list[str]

read_file(filename)[source]
Return type:

dict[str, Any]

should_skip(filename)[source]
Return type:

bool

cleanup_temp_files()[source]

Remove orphaned *.zarr_temp directories from interrupted runs.

Return type:

None

ETL transformations

MOOSEDataTransformation: normalize, build graph, interpolate to grid.

Implements the DataTransformation ABC from physicsnemo-curator.

Pipeline:
  1. Per-field mean/std normalization across all time steps and elements.

  2. Graph edge construction from element→node connectivity (all node pairs within each element, both directions).

  3. Bilinear interpolation of element-centroid values onto a regular Nx×Ny grid using scipy.interpolate.griddata.

Returns a dict containing all fields of MOOSEProcessedData, ready for MOOSEZarrSink to write to disk.
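Pipeline steps 1 and 2 can be sketched as below. The helper names are hypothetical, and de-duplicating edges shared by neighbouring elements is an assumption of this sketch; the grid interpolation step is left out.

```python
from itertools import permutations

import numpy as np


def normalize_fields(fields, eps=1e-8):
    """Per-field mean/std normalization over time steps and elements ([T, E, F])."""
    mean = fields.mean(axis=(0, 1))
    std = fields.std(axis=(0, 1))
    return (fields - mean) / (std + eps), mean, std


def build_edges(connectivity):
    """Directed edges between all node pairs of each element, both directions."""
    pairs = set()  # a set drops duplicates from shared element sides (assumption)
    for element in connectivity:
        pairs.update(permutations((int(n) for n in element), 2))
    edges = np.array(sorted(pairs), dtype=np.int32)
    return edges[:, 0], edges[:, 1]


# Two triangles sharing the side between nodes 1 and 2.
connectivity = np.array([[0, 1, 2], [1, 2, 3]])
edge_src, edge_dst = build_edges(connectivity)

# Synthetic fields with T=2 time steps, E=3 elements, F=2 fields.
fields = np.arange(12, dtype=np.float64).reshape(2, 3, 2)
normed, mean, std = normalize_fields(fields)
```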

class cases.moose_grid.etl.transformations.moose_transform.MOOSEDataTransformation(cfg, grid_nx=64, grid_ny=64, eps=1e-08)[source]

Bases: DataTransformation

Normalize, build graph, and interpolate MOOSE simulation data.

Parameters:
  • cfg (ProcessingConfig) – ProcessingConfig from the curator framework.

  • grid_nx (int) – Number of grid columns for the regular-grid output.

  • grid_ny (int) – Number of grid rows for the regular-grid output.

  • eps (float) – Small value added to std to avoid division by zero.

transform(data)[source]

Transform raw MOOSE data into ML-ready form.

Parameters:

data (dict[str, Any]) – dict produced by ExodusDataSource.read_file().

Return type:

Optional[dict[str, Any]]

Returns:

dict of MOOSEProcessedData fields, or None to skip this sample.

Schemas and validators

Data schemas for MOOSE simulation ETL pipeline.

MOOSERawData – raw data extracted directly from Exodus + CSV files.
MOOSEProcessedData – normalized, graph-ready, grid-ready data for ML training.

class cases.moose_grid.etl.schemas.MOOSERawData(coords, connectivity, field_names, fields, time_steps, probe_data, probe_columns, sim_name)[source]

Bases: object

Raw data extracted from a single MOOSE simulation run.

coords        : [N, D] node coordinates (D=2 for 2-D, D=3 for 3-D)
connectivity  : [E, K] element→node connectivity (0-indexed, K nodes/element)
field_names   : list[str] ordered list of element solution field names
fields        : [T, E, F] element solution values (time, element, field)
time_steps    : [T] simulation time values
probe_data    : dict probe_name → [Np, C] CSV line probe arrays
probe_columns : list[str] column names shared by all CSV probes
sim_name      : str unique simulation identifier (stem of the .e file)

coords: numpy.ndarray
connectivity: numpy.ndarray
field_names: list[str]
fields: numpy.ndarray
time_steps: numpy.ndarray
probe_data: dict[str, numpy.ndarray]
probe_columns: list[str]
sim_name: str
class cases.moose_grid.etl.schemas.NormStats(mean, std)[source]

Bases: object

Per-field normalization statistics.

mean: float
std: float
class cases.moose_grid.etl.schemas.MOOSEProcessedData(coords, connectivity, edge_src, edge_dst, fields, field_names, norm_stats, probe_data, probe_columns, grid_fields, grid_x, grid_y, time_steps, sim_name)[source]

Bases: object

Processed, normalized data ready for PhysicsNeMo ML training.

coords        : [N, D] node coordinates
connectivity  : [E, K] element→node connectivity (0-indexed)
edge_src      : [M] graph edge source node indices
edge_dst      : [M] graph edge destination node indices
fields        : [T, E, F] normalized element solution fields
field_names   : list[str] ordered field names matching last dim of fields
norm_stats    : dict field_name → NormStats(mean, std)
probe_data    : dict probe_name → [Np, C] CSV probe arrays (raw)
probe_columns : list[str] column names for probe arrays
grid_fields   : [T, Nx, Ny, F] fields interpolated onto a regular grid
grid_x        : [Nx] x-coordinates of grid columns
grid_y        : [Ny] y-coordinates of grid rows
time_steps    : [T] simulation time values
sim_name      : str unique simulation identifier

coords: numpy.ndarray
connectivity: numpy.ndarray
edge_src: numpy.ndarray
edge_dst: numpy.ndarray
fields: numpy.ndarray
field_names: list[str]
norm_stats: dict[str, NormStats]
probe_data: dict[str, numpy.ndarray]
probe_columns: list[str]
grid_fields: numpy.ndarray
grid_x: numpy.ndarray
grid_y: numpy.ndarray
time_steps: numpy.ndarray
sim_name: str

MOOSEDatasetValidator: validates processed Zarr dataset structure.

Checks that every .zarr store in the output directory has the expected group hierarchy and required arrays before ML training begins.

class cases.moose_grid.etl.validators.MOOSEDatasetValidator(cfg, output_dir)[source]

Bases: DatasetValidator

Validates a directory of processed Zarr stores.

Parameters:
  • cfg (ProcessingConfig) – ProcessingConfig.

  • output_dir (str) – Directory containing .zarr stores to validate.

validate()[source]

Validate all .zarr stores in output_dir.

Return type:

list[ValidationError]

validate_single_item(item)[source]

Validate a single .zarr store.

Return type:

list[ValidationError]

Checks:
  • Required groups exist.

  • Required arrays within each group exist.

  • Required metadata attributes are present.

  • fields/ group contains at least one dataset.

  • Shape consistency (coords, connectivity).
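The first few checks can be illustrated on a dict-of-dicts stand-in for one store. The REQUIRED mapping is an assumption inferred from the MOOSEZarrSink layout, not the validator's actual list, and the sketch returns plain strings where the validator returns ValidationError objects.

```python
# Assumed requirements, for illustration only.
REQUIRED = {
    "mesh": ("coords", "connectivity"),
    "fields": (),
    "metadata": ("time_steps",),
}


def validate_store(store):
    """Return a list of error strings for a dict-of-dicts stand-in for one store."""
    errors = []
    for group, arrays in REQUIRED.items():
        if group not in store:
            errors.append(f"missing group: {group}")
            continue
        for name in arrays:
            if name not in store[group]:
                errors.append(f"missing array: {group}/{name}")
    if "fields" in store and not store["fields"]:
        errors.append("fields/ group is empty")
    return errors


good = {
    "mesh": {"coords": [], "connectivity": []},
    "fields": {"pressure": []},
    "metadata": {"time_steps": []},
}
bad = {"mesh": {"coords": []}, "fields": {}}

good_errors = validate_store(good)
bad_errors = validate_store(bad)
```

Collecting every problem in a list, instead of raising on the first one, lets a single pass report all issues for a store.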