Getting Started¶

This guide covers day-to-day usage with Docker Compose or Apptainer (for HPC systems where Docker is unavailable):

setting up and running the ETL
choosing the right container image
troubleshooting container builds
adding new Python packages to images

Prerequisites¶

Docker Desktop (or Docker Engine + Compose v2) or Apptainer (HPC)
Git submodules initialized:

git submodule update --init --recursive

Environment variables¶

Copy .env.example to .env and fill in values for your environment:

cp .env.example .env

Key variables:

Variable	Used by	Description
`APPTAINER_BIND`	Apptainer	Default bind mounts (e.g. `/path/to/project:/path/to/project`)
`DOCKER_PLATFORM`	Docker Compose	Force `linux/amd64` on Apple Silicon
`CA_CERT_DIR`	Docker Compose	Path to directory with custom CA certs
`EXTRA_CA_CERT_B64`	Docker Compose	Base64-encoded CA cert (alternative to `CA_CERT_DIR`)
`INSTALL_PHYSICSNEMO`	Docker Compose (`etl`)	Set to `0` to skip PhysicsNeMo install
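`UV_ALLOW_INSECURE_HOST_FLAGS`	Docker Compose (`etl`)	Extra `--allow-insecure-host` flags passed to uv (TLS bypass, last resort)
`PIP_TRUSTED_HOST_FLAGS`	Docker Compose (`etl-dev`, `etl-ngc`)	Extra `--trusted-host` flags passed to pip (TLS bypass, last resort)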
`HTTP_PROXY` / `HTTPS_PROXY` / `NO_PROXY`	Both	Corporate proxy settings

Docker Compose reads .env automatically. Apptainer does not; see Step 1 (Source environment variables) in the Apptainer section below.
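Before building, it can help to see which of the variables in the table are still empty in your .env. The sketch below is a minimal, illustrative check and not part of the repository: `parse_env_text`, `unset_variables`, and `check_env_file` are hypothetical helper names, and the parser assumes simple `KEY=VALUE` lines (optional `export` prefix, optional surrounding quotes). Not every variable has to be set, so treat the result as informational.

```python
from pathlib import Path

# Variable names copied from the "Key variables" table above.
KEY_VARIABLES = [
    "APPTAINER_BIND", "DOCKER_PLATFORM", "CA_CERT_DIR", "EXTRA_CA_CERT_B64",
    "INSTALL_PHYSICSNEMO", "UV_ALLOW_INSECURE_HOST_FLAGS",
    "PIP_TRUSTED_HOST_FLAGS", "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
]


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def unset_variables(values: dict) -> list:
    """Return the key variables that are missing or empty."""
    return [name for name in KEY_VARIABLES if not values.get(name)]


def check_env_file(path: str = ".env") -> list:
    """Convenience wrapper: read a file and report unset key variables."""
    return unset_variables(parse_env_text(Path(path).read_text()))
```

Calling `check_env_file()` from the repository root returns the list of variables you have not filled in yet.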

Choose a container image¶

Service	Dockerfile	Apptainer def	Base	Approx. size	Best for
`etl-dev`	`docker/Dockerfile.dev`	`docker/dev.def`	`python:3.11-slim`	~300 MB	Fast ETL iteration (no PhysicsNeMo/PyTorch)
`etl`	`docker/Dockerfile.physicsnemo-cpu`	`docker/physicsnemo-cpu.def`	`python:3.11-slim`	~1 GB	Full CPU stack from PyPI
`etl-gpu`	`docker/Dockerfile.gpu`	`docker/gpu.def`	`python:3.11-slim` + PyTorch cu124 wheels	~4 GB	CPU + NVIDIA GPU (CUDA 12.4, amd64 only)
`etl-ngc`	`docker/Dockerfile.ngc`	`docker/ngc.def`	`nvcr.io/nvidia/physicsnemo/physicsnemo:25.11`	~13 GB	NVIDIA pre-tested stack (amd64 only)

etl-dev and etl run on Apple Silicon (arm64) and Intel (amd64) without a GPU. etl-gpu and etl-ngc are amd64-only and support NVIDIA GPUs.
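The selection rules in the table can be summarized as a small decision function. This is only a sketch of the guidance above, not project code: `choose_service` and `choose_for_host` are hypothetical names, and the logic looks at the native CPU architecture only (it ignores emulation via `DOCKER_PLATFORM`).

```python
import platform


def choose_service(machine: str, need_gpu: bool, need_physicsnemo: bool,
                   prefer_ngc: bool = False) -> str:
    """Map requirements to a Compose service name from the table above.

    prefer_ngc only matters when a GPU is requested; it selects the
    NVIDIA pre-tested NGC stack instead of the PyPI-based GPU image.
    """
    is_amd64 = machine.lower() in {"x86_64", "amd64"}
    if need_gpu:
        if not is_amd64:
            # etl-gpu and etl-ngc are documented as amd64-only.
            raise ValueError("GPU images (etl-gpu, etl-ngc) are amd64-only")
        return "etl-ngc" if prefer_ngc else "etl-gpu"
    if need_physicsnemo:
        return "etl"      # full CPU stack from PyPI
    return "etl-dev"      # fast ETL iteration, no PhysicsNeMo/PyTorch


def choose_for_host(need_gpu: bool, need_physicsnemo: bool) -> str:
    """Same decision, using the architecture reported by this machine."""
    return choose_service(platform.machine(), need_gpu, need_physicsnemo)
```

For example, `choose_for_host(need_gpu=False, need_physicsnemo=False)` returns `etl-dev` on any supported host.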

Note

None of these images include MOOSE itself — they ship the Python / PhysicsNeMo stack used to consume MOOSE outputs. To produce the .e files the ETL reads, see Running MOOSE Simulations, which documents the separate moose-dev-openmpi-x86_64.sif Apptainer image and links to MOOSE’s official install paths.

Build and run with Docker Compose¶

Option A: direct run from host terminal¶

docker compose build etl-dev
docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py'

cases/moose_grid/run_etl.py defaults to cases/moose_grid/configs/etl.yaml (the lid-driven flow). To run in a different image, replace etl-dev with etl or etl-ngc.

Build and run with Apptainer (HPC)¶

Use Apptainer on HPC systems where Docker is not available (e.g., INL ROD).

Step 1: Source environment variables¶

Apptainer does not read .env automatically. Source it before every session:

set -a && source .env && set +a

This loads APPTAINER_BIND and any proxy settings into your shell so subsequent apptainer commands pick them up without needing --bind flags.

Step 2: Build a SIF image¶

# Minimal dev image (ETL only, ~300 MB)
apptainer build multifid-th-dev.sif docker/dev.def

# Full CPU image with PhysicsNeMo (~1 GB)
apptainer build multifid-th-cpu.sif docker/physicsnemo-cpu.def

# CUDA 12.4 GPU image — CPU-only without --nv, GPU with --nv (~5 GB)
apptainer build multifid-th-gpu.sif docker/gpu.def

# NGC image — CPU-only without --nv, GPU with --nv (~13 GB)
apptainer build multifid-th-ngc.sif docker/ngc.def

Step 3: Run with project folder bound¶

Bind your project directory so the container can read inputs and write outputs:

# CPU-only
apptainer run \
  --bind /path/to/project:/path/to/project \
  multifid-th-cpu.sif

# GPU (--nv exposes host NVIDIA drivers to the container)
apptainer run --nv \
  --bind /path/to/project:/path/to/project \
  multifid-th-gpu.sif

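By default, Apptainer binds your $HOME directory into the container, so files under $HOME are normally accessible without an explicit --bind. This does not apply if the site configuration disables the home mount or you run with --no-home.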

Run a script directly¶

# CPU
apptainer exec \
  --bind /path/to/project:/path/to/project \
  multifid-th-cpu.sif \
  bash -c 'cd /path/to/src && python cases/moose_grid/run_etl.py'

# GPU
apptainer exec --nv \
  --bind /path/to/project:/path/to/project \
  multifid-th-gpu.sif \
  bash -c 'cd /path/to/src && python train.py --config-path cases/moose_grid/configs --config-name train_fno'

Verify GPU access inside the container¶

apptainer exec --nv multifid-th-gpu.sif python -c \
  "import torch; print(torch.cuda.get_device_name(0))"

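The one-liner above raises an exception when no GPU is visible, which can be hard to tell apart from a missing PyTorch install. The following sketch reports both cases separately; `gpu_report` is a hypothetical helper name, and it uses only `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.get_device_name`.

```python
import importlib.util


def gpu_report() -> dict:
    """Summarize PyTorch/CUDA visibility without raising on CPU-only hosts."""
    report = {"torch_installed": False, "cuda_available": False, "devices": []}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this image (e.g. the dev image).
        return report
    import torch

    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report
```

Saved as a script inside the bound project directory, it can be run with `apptainer exec --nv multifid-th-gpu.sif python` in the same way as the commands above; an empty `devices` list with `torch_installed` set to `True` usually points to a missing `--nv` flag or driver issue.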
Set a default bind (optional)¶

To avoid typing --bind every time, export it in your shell profile:

export APPTAINER_BIND="/path/to/project:/path/to/project"

Then run without --bind:

apptainer run multifid-th-cpu.sif
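When several directories need to be bound (for example the project plus a scratch area), `APPTAINER_BIND` accepts a comma-separated list of `source:destination` pairs. The sketch below builds such a value; `build_bind_spec` is a hypothetical helper and the paths in the usage note are placeholders.

```python
def build_bind_spec(mounts) -> str:
    """Build an APPTAINER_BIND value.

    mounts: iterable of host paths (bound at the same path inside the
    container) or (host_path, container_path) pairs.
    """
    specs = []
    for mount in mounts:
        if isinstance(mount, str):
            src, dest = mount, mount
        else:
            src, dest = mount
        specs.append(f"{src}:{dest}")
    # Apptainer separates multiple bind specs with commas.
    return ",".join(specs)
```

For instance, `build_bind_spec(["/path/to/project", ("/scratch/data", "/data")])` yields `/path/to/project:/path/to/project,/scratch/data:/data`, which can be placed in .env or exported in your shell profile.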

The default lid-driven flow config, which run_etl.py loads in both the Docker Compose and Apptainer runs above, lives at src/cases/moose_grid/configs/etl.yaml:

defaults:
  - etl_base
  - _self_

etl:
  processing:
    num_processes: 4
  source:
    input_dir: ../data/lid-driven
    data_dir: ../data/lid-driven
  sink:
    output_dir: ../data/processed/lid-driven

Option A2: same run with CLI overrides (no dedicated YAML file)¶

docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py \
  etl.source.input_dir=../data/lid-driven \
  etl.source.data_dir=../data/lid-driven \
  etl.sink.output_dir=../data/processed/lid-driven \
  etl.processing.num_processes=4'
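Each `key.path=value` argument replaces one leaf of the nested config. Assuming run_etl.py uses Hydra-style overrides (as the `defaults` list in etl.yaml suggests), the effect can be illustrated with plain dictionaries. This is a simplified model, not Hydra's implementation: `apply_overrides` is a hypothetical function, and values stay as strings here, whereas Hydra also parses types.

```python
import copy


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate levels if the base config lacks them.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Base values mirror the lid-driven etl.yaml shown above.
BASE_CONFIG = {
    "etl": {
        "processing": {"num_processes": 4},
        "source": {"input_dir": "../data/lid-driven",
                   "data_dir": "../data/lid-driven"},
        "sink": {"output_dir": "../data/processed/lid-driven"},
    }
}
```

`apply_overrides(BASE_CONFIG, ["etl.sink.output_dir=../data/processed/my_case"])` changes only the output directory and leaves every other key at its YAML value.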

Create your own config¶

Use the lid-driven config as a template for a new dataset:

cp src/cases/moose_grid/configs/etl.yaml src/cases/moose_grid/configs/my_case.yaml

Edit these keys in src/cases/moose_grid/configs/my_case.yaml:

etl.source.input_dir
etl.source.data_dir
etl.sink.output_dir
(optional) etl.processing.num_processes

Run with your new config name:

docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py --config-name my_case'

Option B: interactive shell¶

docker compose run --rm etl-dev

Then inside the container:

cd src
python cases/moose_grid/run_etl.py

Input and output conventions¶

Input file pattern	Description
`{sim_name}.e`	Exodus II mesh + element fields
`{sim_prefix}_out_{probe_name}_{timestep:04d}.csv`	CSV line probes

Output directory (with the default lid-driven config): data/processed/lid-driven/
Output format: one {sim_name}.zarr per simulation
Exodus and CSV prefixes do not need to match
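The naming patterns above can be checked programmatically before launching the ETL. The sketch below is illustrative and not the pipeline's own parser: `parse_probe_filename` and `zarr_store_name` are hypothetical names, and the regular expression assumes that `_out_` occurs once in the file name and that the timestep is the final underscore-separated field with at least four digits.

```python
import re
from pathlib import PurePath

# Mirrors {sim_prefix}_out_{probe_name}_{timestep:04d}.csv
PROBE_RE = re.compile(
    r"^(?P<sim_prefix>.+)_out_(?P<probe_name>.+)_(?P<timestep>\d{4,})\.csv$"
)


def parse_probe_filename(name: str):
    """Split a CSV line-probe file name into its parts, or return None."""
    match = PROBE_RE.match(PurePath(name).name)
    if match is None:
        return None
    return {
        "sim_prefix": match["sim_prefix"],
        "probe_name": match["probe_name"],
        "timestep": int(match["timestep"]),
    }


def zarr_store_name(exodus_name: str) -> str:
    """Expected output store for {sim_name}.e, i.e. {sim_name}.zarr."""
    return PurePath(exodus_name).stem + ".zarr"
```

A file that returns `None` from `parse_probe_filename` does not follow the CSV probe convention and is worth renaming or inspecting before a long run.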

FNO training and evaluation¶

Use the etl or etl-ngc service for PhysicsNeMo + PyTorch scripts. Edit this template first:

src/cases/moose_grid/configs/train_fno.yaml

train_fno.yaml is a Hydra config that inherits from src/training/config/default.yaml (via hydra.searchpath: pkg://training.config) and defines an example FNO setup used for both training and evaluation.

Train¶

docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/moose_grid/configs --config-name train_fno'

Evaluate¶

docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/moose_grid/configs --config-name train_fno'

Generate velocity-field comparison plots during evaluation:

docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/moose_grid/configs --config-name train_fno \
  output.plot_dir=../data/models/lid_driven_fno_plots'

CLI flags override YAML values:

docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/moose_grid/configs --config-name train_fno training.epochs=50'
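If you launch several runs that differ only in their overrides, it can be convenient to assemble the command in Python and pass it to `subprocess.run`. The following is a sketch under that assumption; `compose_run_command` is a hypothetical helper, and it only builds the argument list (it does not start Docker).

```python
import shlex


def compose_run_command(service: str, script: str, config_name: str,
                        overrides=()) -> list:
    """Build the `docker compose run` argument list used in this guide."""
    inner = [
        "python", script,
        "--config-path", "cases/moose_grid/configs",
        "--config-name", config_name,
        *overrides,
    ]
    # shlex.join quotes any argument that needs it for the inner bash -lc.
    inner_cmd = "cd src && " + shlex.join(inner)
    return ["docker", "compose", "run", "--rm", service,
            "bash", "-lc", inner_cmd]
```

For example, `subprocess.run(compose_run_command("etl", "train.py", "train_fno", ["training.epochs=50"]), check=True)` is equivalent to the last command above.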

Logs¶

During build:

docker compose build --progress=plain etl-ngc

During runtime:

docker compose logs -f etl-ngc
docker compose logs --tail=100 etl-ngc

docker compose run --rm ... removes the container when it exits, including its stored logs. Omit --rm if you need to inspect logs after a run.

Build the documentation¶

The site you’re reading is built with Sphinx + MyST + Furo. The multifid-th-cpu.sif image already has the full Sphinx stack installed, so no extra install step is required.

From the repository root:

# Apptainer (preferred on HPC)
apptainer exec --bind "$PWD:$PWD" --pwd "$PWD" multifid-th-cpu.sif \
    make -C docs html

# Docker (workstation)
docker compose run --rm etl bash -lc 'make -C docs html'

Open docs/_build/html/index.html in a browser to preview.

Rebuild after:

editing any .md file under docs/,
editing module / class / function docstrings (autodoc pulls them into the API pages), or
adding / removing classes or functions exposed via automodule directives in docs/api/.

For live reload, the nitpicky strict target (warnings-fatal, matching CI), and the full list of make targets, see Building the documentation. docs/_build/ is git-ignored — don’t commit anything under it.

Troubleshooting builds¶

If you see TLS errors such as CERTIFICATE_VERIFY_FAILED or UnknownIssuer, your environment may require a corporate CA certificate.

Add CA file (`etl-dev`, `etl`, `etl-ngc`)¶

Place a CA cert in docker/certs/ (.pem, .crt, .cer) and rebuild:

docker compose build --no-cache etl-dev  # or etl / etl-ngc

etl-dev validates custom certs and skips any file that is malformed or that is a leaf certificate (CA:FALSE) rather than a CA certificate (CA:TRUE).

You can also point to a different host cert directory at build time:

CA_CERT_DIR=/path/to/certs docker compose build --no-cache etl-dev

Pass CA via environment variable (`etl`, `etl-ngc`)¶

EXTRA_CA_CERT_B64="$(base64 < /path/to/your-org-ca.crt | tr -d '\n')" \
docker compose build etl                  # or etl-ngc
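On systems where the `base64` utility behaves differently (some variants wrap lines or need other flags), the same single-line value can be produced with Python's standard library. This is a sketch; `encode_ca_cert` and `encode_ca_cert_file` are hypothetical helper names.

```python
import base64
from pathlib import Path


def encode_ca_cert(data: bytes) -> str:
    """Base64-encode certificate bytes as one line (no newlines)."""
    return base64.b64encode(data).decode("ascii")


def encode_ca_cert_file(path: str) -> str:
    """Read a certificate file and return its single-line Base64 form."""
    return encode_ca_cert(Path(path).read_bytes())
```

The string returned by `encode_ca_cert_file("/path/to/your-org-ca.crt")` can be assigned to `EXTRA_CA_CERT_B64` in .env or on the command line.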

Bypass TLS as a last resort¶

For etl-dev:

PIP_TRUSTED_HOST_FLAGS="--trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org" \
docker compose build etl-dev

For etl:

UV_ALLOW_INSECURE_HOST_FLAGS="--allow-insecure-host pypi.org --allow-insecure-host files.pythonhosted.org" \
docker compose build etl

For etl-ngc:

PIP_TRUSTED_HOST_FLAGS="--trusted-host pypi.org --trusted-host files.pythonhosted.org" \
docker compose build etl-ngc

Corporate proxy¶

HTTP_PROXY=http://proxy.example.com:8080 \
HTTPS_PROXY=http://proxy.example.com:8080 \
NO_PROXY=localhost,127.0.0.1 \
docker compose build etl                  # or etl-ngc

Apple Silicon: force `amd64`¶

DOCKER_PLATFORM=linux/amd64 docker compose build etl       # or etl-ngc
DOCKER_PLATFORM=linux/amd64 docker compose run --rm etl    # or etl-ngc

Skip full PhysicsNeMo install (`etl` only)¶

INSTALL_PHYSICSNEMO=0 docker compose build etl

Add new Python packages to Docker images¶

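Do not rely on pip install in a running container for persistent changes: containers started with docker compose run --rm are removed on exit, and anything installed in them is lost. Add the package to the matching Dockerfile, then rebuild.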

Select service(s) that need the dependency.
Edit the matching Dockerfile:

Service	Dockerfile	Install command style
`etl-dev`	`docker/Dockerfile.dev`	`pip install ...`
`etl`	`docker/Dockerfile.physicsnemo-cpu`	`uv ... pip install --system ...`
`etl-gpu`	`docker/Dockerfile.gpu`	`uv ... pip install --system ...`
`etl-ngc`	`docker/Dockerfile.ngc`	`pip install ...`

Rebuild and rerun:

docker compose build etl-dev
docker compose run --rm etl-dev

Verify inside container:

python -c "import your_package; print(your_package.__version__)"
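Not every package defines `__version__`. A more general check reads the installed distribution metadata with the standard-library `importlib.metadata` module. In this sketch, `installed_version` is a hypothetical helper, and the argument is the distribution name as used by pip, which can differ from the import name.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Inside the rebuilt container, `installed_version("your-package")` returning `None` means the Dockerfile change did not take effect, for example because a cached layer was reused.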

Further reading¶

Alpha-D surrogate tutorial – extract Darcy resistance profiles and train an MLP
Hyperparameter optimization – Optuna-based HPO for any model
ETL pipeline internals
Dataset API
FNO training and evaluation details