dataset

The public PyTorch Dataset consumed by downstream user code. For the narrative overview see Dataset API.

MOOSEDataset: PyTorch Dataset over processed Zarr simulation stores.

Supports three representation modes for different PhysicsNeMo model families:

"graph"  (GNN / MeshGraphNet)
─────────────────────────────────────────────────────────────────────────
coords       float32 [N, D]     node spatial coordinates
edge_index   int64   [2, M]     COO edge list (src / dst)
node_fields  float32 [T, N, F]  per-node fields (interpolated from elements)
elem_fields  float32 [T, E, F]  per-element fields (raw)
probe_data   dict               probe_name → float32 [Np, C]

"point_cloud"  (PointNet / Transformer)
─────────────────────────────────────────────────────────────────────────
coords       float32 [N, D]     node spatial coordinates
node_fields  float32 [T, N, F]  per-node fields

"grid"  (CNN: U-Net, FNO)
─────────────────────────────────────────────────────────────────────────
grid_x       float32 [Nx]             column x-coordinates
grid_y       float32 [Ny]             row y-coordinates
grid_fields  float32 [T, Nx, Ny, F]   fields on regular grid

All modes also include:

field_names  list[str]     field name for each F index
norm_stats   dict          field_name → {"mean": float, "std": float}
sim_name     str           unique simulation identifier
time_steps   float32 [T]

If time_idx ≥ 0, only that time step is returned and the T dimension is dropped. With the default time_idx = -1, all time steps are returned.

Denormalization
───────────────

Call dataset.denormalize("pressure", tensor) to recover a tensor in original physical units.

class dataset.moose_dataset.MOOSEDataset(zarr_dir, mode='graph', time_idx=-1)

Bases: Dataset

Dataset over a directory of processed MOOSE Zarr stores.

Parameters:
  • zarr_dir (str | Path) – Path to the directory containing *.zarr stores.

  • mode (str) – One of "graph", "point_cloud", "grid".

  • time_idx (int) – If ≥ 0, return only this time step (removes T dimension). If -1 (default), return all time steps.

MODES = ('graph', 'point_cloud', 'grid')

denormalize(field_name, tensor)

Reverse the z-score normalization for a single field.

Parameters:
  • field_name (str) – Name of the field (must be in norm_stats).

  • tensor (Tensor) – Normalized tensor of any shape.

Return type:

Tensor

Returns:

Tensor in original physical units.
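
Z-score normalization maps x to (x - mean) / std, so reversing it amounts to x_norm * std + mean. The following is a minimal sketch of that arithmetic, assuming norm_stats has the documented layout field_name → {"mean": float, "std": float}. The standalone function denormalize_sketch and the sample statistics are hypothetical and are not the library's implementation.

```python
import torch

def denormalize_sketch(norm_stats, field_name, tensor):
    """Undo z-score normalization: x = x_norm * std + mean."""
    if field_name not in norm_stats:
        raise KeyError(f"unknown field: {field_name!r}")
    stats = norm_stats[field_name]
    return tensor * stats["std"] + stats["mean"]

# Hypothetical statistics, for illustration only.
norm_stats = {"pressure": {"mean": 101325.0, "std": 250.0}}
normalized = torch.tensor([-1.0, 0.0, 2.0])
physical = denormalize_sketch(norm_stats, "pressure", normalized)
```

Because the operation is elementwise, it applies to a tensor of any shape, which matches the documented contract of denormalize.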

dataset.moose_dataset.to_tensor(arr)

Convert a zarr array to a float32 torch tensor.

Return type:

Tensor
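
A conversion of this kind could look like the sketch below. It assumes the array supports reading its full contents with arr[...], which NumPy arrays do and which is assumed here for zarr arrays as well. The name to_tensor_sketch is hypothetical, and a NumPy array stands in for the zarr array.

```python
import numpy as np
import torch

def to_tensor_sketch(arr):
    """Read an array-like fully into memory and return a float32 tensor."""
    # Indexing with [...] reads the whole array into a NumPy array.
    data = np.asarray(arr[...])
    return torch.from_numpy(data).to(torch.float32)

# A NumPy array stands in for a zarr array in this illustration.
stand_in = np.arange(6, dtype=np.float64).reshape(2, 3)
t = to_tensor_sketch(stand_in)
```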

dataset.moose_dataset.load_fields(fields_grp, field_names)

Load and stack element fields from a zarr group → [T, E, F].

Return type:

Tensor
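
To illustrate the stacking step, here is a rough sketch under the assumption that each field is stored under its own name as a [T, E] array. That storage layout is a guess, not something this page states. A plain dict of NumPy arrays stands in for the zarr group, and load_fields_sketch is a hypothetical name.

```python
import numpy as np
import torch

def load_fields_sketch(fields_grp, field_names):
    """Stack per-field [T, E] arrays into a single [T, E, F] tensor."""
    per_field = [
        torch.from_numpy(np.asarray(fields_grp[name][...])).to(torch.float32)
        for name in field_names
    ]
    # Stacking on the last axis makes index f along F match field_names[f].
    return torch.stack(per_field, dim=-1)

# Stand-in for the zarr group (assumed layout), with T=4 and E=5.
fake_grp = {
    "pressure": np.zeros((4, 5)),
    "temperature": np.ones((4, 5)),
}
fields = load_fields_sketch(fake_grp, ["pressure", "temperature"])
```

Stacking in the order of field_names keeps the F axis aligned with the field_names entry returned in each sample.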

dataset.moose_dataset.slice_time(tensor, time_idx)

If time_idx >= 0, select that time step and remove the T dimension.

Return type:

Tensor
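
The effect on shapes can be sketched as follows, assuming T is the leading dimension as in the documented [T, N, F] layout. slice_time_sketch is a hypothetical stand-in, not the library function.

```python
import torch

def slice_time_sketch(tensor, time_idx):
    """Return tensor[time_idx] if time_idx >= 0, else the tensor unchanged."""
    if time_idx >= 0:
        return tensor[time_idx]  # integer indexing drops the leading T axis
    return tensor

node_fields = torch.zeros(10, 100, 3)            # [T, N, F]
all_steps = slice_time_sketch(node_fields, -1)   # stays [T, N, F]
one_step = slice_time_sketch(node_fields, 4)     # becomes [N, F]
```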

dataset.moose_dataset.elem_to_node(elem_fields, connectivity, n_nodes)

Average element fields onto nodes (scatter mean over element→node map).

elem_fields  : [..., E, F]  (leading dims may include T)
connectivity : [E, K]       0-indexed node indices
n_nodes      : N

Returns node_fields : [..., N, F]

Return type:

Tensor
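
A scatter mean of this kind can be sketched with index_add_: each element contributes its value to every node it touches, and the per-node sum is divided by the number of contributing elements. The function name elem_to_node_sketch is hypothetical, and the handling of nodes that belong to no element (left at zero) is an assumption. The library's actual behavior may differ.

```python
import torch

def elem_to_node_sketch(elem_fields, connectivity, n_nodes):
    """Average element values onto nodes.

    elem_fields:  [..., E, F] float tensor
    connectivity: [E, K] int64 tensor of 0-indexed node indices
    Returns:      [..., N, F]
    """
    n_elems, k = connectivity.shape
    flat_nodes = connectivity.reshape(-1)        # [E*K], element-major order
    elem_dim = elem_fields.dim() - 2             # position of the E axis
    # Repeat each element's values once per node it touches: [..., E*K, F].
    contrib = elem_fields.repeat_interleave(k, dim=elem_dim)
    lead, n_feat = elem_fields.shape[:-2], elem_fields.shape[-1]
    sums = torch.zeros(*lead, n_nodes, n_feat,
                       dtype=elem_fields.dtype, device=elem_fields.device)
    sums.index_add_(elem_dim, flat_nodes, contrib)
    counts = torch.zeros(n_nodes, dtype=elem_fields.dtype,
                         device=elem_fields.device)
    counts.index_add_(0, flat_nodes,
                      torch.ones_like(flat_nodes, dtype=elem_fields.dtype))
    # Assumption: nodes touched by no element keep a value of zero.
    return sums / counts.clamp(min=1).unsqueeze(-1)

# Two 2-node elements sharing node 1; T=1, E=2, F=1.
connectivity = torch.tensor([[0, 1], [1, 2]])
elem_fields = torch.tensor([[[2.0], [4.0]]])
node_fields = elem_to_node_sketch(elem_fields, connectivity, n_nodes=3)
```

In this example the shared node 1 receives the average of both element values, while nodes 0 and 2 each take the value of their single adjacent element.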

dataset.moose_dataset.load_norm_stats(meta_grp)

Read per-field normalization stats from metadata/norm_stats/.

Return type:

dict[str, dict[str, float]]