Getting Started

This guide covers day-to-day usage with Docker Compose or Apptainer (for HPC systems where Docker is unavailable):

  • setting up and running the ETL

  • choosing the right container image

  • troubleshooting container builds

  • adding new Python packages to images

Prerequisites

  • Docker Desktop (or Docker Engine + Compose v2) or Apptainer (HPC)

  • Git submodules initialized:

git submodule update --init --recursive

Environment variables

Copy .env.example to .env and fill in values for your environment:

cp .env.example .env

Key variables:

| Variable | Used by | Description |
|---|---|---|
| APPTAINER_BIND | Apptainer | Default bind mounts (e.g. /path/to/project:/path/to/project) |
| DOCKER_PLATFORM | Docker Compose | Force linux/amd64 on Apple Silicon |
| CA_CERT_DIR | Docker Compose | Path to directory with custom CA certs |
| EXTRA_CA_CERT_B64 | Docker Compose | Base64-encoded CA cert (alternative to CA_CERT_DIR) |
| INSTALL_PHYSICSNEMO | Docker Compose (etl) | Set to 0 to skip PhysicsNeMo install |
| UV_ALLOW_INSECURE_HOST_FLAGS | Docker Compose (etl) | TLS bypass for uv |
| PIP_TRUSTED_HOST_FLAGS | Docker Compose (etl-dev, etl-ngc) | TLS bypass for pip |
| HTTP_PROXY / HTTPS_PROXY / NO_PROXY | Both | Corporate proxy settings |

Docker Compose reads .env automatically. For Apptainer, see the source step in the Apptainer section below.
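
As a quick sanity check before building, the sketch below parses a .env file and lists which of the variables from the table are unset or empty. It is a minimal illustration, not part of the project: the helper names parse_env_text and unset_variables are hypothetical, and the parser only handles plain KEY=VALUE lines (no multi-line values or variable expansion). Most of these variables are optional, so an "unset" entry is informational rather than an error.

```python
from pathlib import Path

# Variable names copied from the table above.
KEY_VARIABLES = [
    "APPTAINER_BIND", "DOCKER_PLATFORM", "CA_CERT_DIR", "EXTRA_CA_CERT_B64",
    "INSTALL_PHYSICSNEMO", "UV_ALLOW_INSECURE_HOST_FLAGS",
    "PIP_TRUSTED_HOST_FLAGS", "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def unset_variables(values: dict[str, str]) -> list[str]:
    """Return the key variables that are missing or empty."""
    return [name for name in KEY_VARIABLES if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = unset_variables(parse_env_text(env_path.read_text()))
        print("Unset or empty:", ", ".join(missing) or "none")
    else:
        print("No .env found; copy .env.example to .env first.")
```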

Choose a container image

| Service | Dockerfile | Apptainer def | Base | Approx. size | Best for |
|---|---|---|---|---|---|
| etl-dev | docker/Dockerfile.dev | docker/dev.def | python:3.11-slim | ~300 MB | Fast ETL iteration (no PhysicsNeMo/PyTorch) |
| etl | docker/Dockerfile.physicsnemo-cpu | docker/physicsnemo-cpu.def | python:3.11-slim | ~1 GB | Full CPU stack from PyPI |
| etl-gpu | docker/Dockerfile.gpu | docker/gpu.def | python:3.11-slim + PyTorch cu124 wheels | ~4 GB | CPU + NVIDIA GPU (CUDA 12.4, amd64 only) |
| etl-ngc | docker/Dockerfile.ngc | docker/ngc.def | nvcr.io/nvidia/physicsnemo/physicsnemo:25.11 | ~13 GB | NVIDIA pre-tested stack (amd64 only) |

etl-dev and etl run on Apple Silicon (arm64) and Intel (amd64) without a GPU. etl-gpu and etl-ngc are amd64-only and support NVIDIA GPUs.
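
The selection rules above can be summarized as a small decision helper. This is only a sketch of one reasonable reading of the table: the function names are hypothetical, and returning etl-gpu rather than etl-ngc for GPU work is an arbitrary choice here (etl-ngc is the alternative if you prefer NVIDIA's pre-tested stack).

```python
# Per the table above, these two services are amd64-only.
AMD64_ONLY = {"etl-gpu", "etl-ngc"}


def normalize_arch(machine: str) -> str:
    """Map common machine strings to the names used in this guide."""
    name = machine.lower()
    if name in {"x86_64", "amd64"}:
        return "amd64"
    if name in {"arm64", "aarch64"}:
        return "arm64"
    raise ValueError(f"unrecognized architecture: {machine!r}")


def suggest_service(machine: str, need_gpu: bool, need_physicsnemo: bool) -> str:
    """Suggest a Compose service name for the given host and requirements."""
    arch = normalize_arch(machine)
    if need_gpu:
        choice = "etl-gpu"      # assumption: prefer the smaller GPU image
    elif need_physicsnemo:
        choice = "etl"
    else:
        choice = "etl-dev"
    if choice in AMD64_ONLY and arch != "amd64":
        raise ValueError(f"{choice} is amd64-only; host is {arch}")
    return choice


print(suggest_service("arm64", need_gpu=False, need_physicsnemo=True))
```

On a real host, the first argument could come from platform.machine() in the standard library.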

Note

None of these images include MOOSE itself — they ship the Python / PhysicsNeMo stack used to consume MOOSE outputs. To produce the .e files the ETL reads, see Running MOOSE Simulations, which documents the separate moose-dev-openmpi-x86_64.sif Apptainer image and links to MOOSE’s official install paths.

Build and run with Docker Compose

Option A: direct run from host terminal

docker compose build etl-dev
docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py'

By default, cases/moose_grid/run_etl.py loads cases/moose_grid/configs/etl.yaml (the lid-driven flow config). Replace etl-dev with etl or etl-ngc to run the same ETL in a different image.

Build and run with Apptainer (HPC)

Use Apptainer on HPC systems where Docker is not available (e.g., INL ROD).

Step 1: Source environment variables

Apptainer does not read .env automatically. Source it before every session:

set -a && source .env && set +a

This loads APPTAINER_BIND and any proxy settings into your shell so subsequent apptainer commands pick them up without needing --bind flags.

Step 2: Build a SIF image

# Minimal dev image (ETL only, ~300 MB)
apptainer build multifid-th-dev.sif docker/dev.def

# Full CPU image with PhysicsNeMo (~1 GB)
apptainer build multifid-th-cpu.sif docker/physicsnemo-cpu.def

# CUDA 12.4 GPU image: CPU-only without --nv, GPU with --nv (~4 GB)
apptainer build multifid-th-gpu.sif docker/gpu.def

# NGC image — CPU-only without --nv, GPU with --nv (~13 GB)
apptainer build multifid-th-ngc.sif docker/ngc.def

Step 3: Run with project folder bound

Bind your project directory so the container can read inputs and write outputs:

# CPU-only
apptainer run \
  --bind /path/to/project:/path/to/project \
  multifid-th-cpu.sif

# GPU (--nv exposes host NVIDIA drivers to the container)
apptainer run --nv \
  --bind /path/to/project:/path/to/project \
  multifid-th-gpu.sif

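Apptainer binds your $HOME directory by default, so files under $HOME are normally accessible without an explicit --bind. Site administrators can disable this default, so add an explicit --bind if files under $HOME are not visible in the container.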

Run a script directly

# CPU
apptainer exec \
  --bind /path/to/project:/path/to/project \
  multifid-th-cpu.sif \
  bash -c 'cd /path/to/src && python cases/moose_grid/run_etl.py'

# GPU
apptainer exec --nv \
  --bind /path/to/project:/path/to/project \
  multifid-th-gpu.sif \
  bash -c 'cd /path/to/src && python train.py --config-path cases/moose_grid/configs --config-name train_fno'
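
If you launch these commands from a Python driver script (for example, inside a batch job), the argument list can be assembled programmatically. The sketch below is an illustration only: build_apptainer_exec is a hypothetical helper, and it uses just the flags shown above (exec, --nv, --bind). It prints the command rather than running it.

```python
import shlex


def build_apptainer_exec(image, command, binds=(), gpu=False):
    """Return an argv list for `apptainer exec ... bash -c <command>`."""
    argv = ["apptainer", "exec"]
    if gpu:
        argv.append("--nv")  # expose host NVIDIA drivers to the container
    for bind in binds:
        argv += ["--bind", bind]
    argv += [image, "bash", "-c", command]
    return argv


argv = build_apptainer_exec(
    "multifid-th-cpu.sif",
    "cd /path/to/src && python cases/moose_grid/run_etl.py",
    binds=["/path/to/project:/path/to/project"],
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

To execute the command instead of printing it, pass argv to subprocess.run(argv, check=True).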

Verify GPU access inside the container

apptainer exec --nv multifid-th-gpu.sif python -c \
  "import torch; print(torch.cuda.get_device_name(0))"

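The one-liner above fails with an exception when no GPU is visible, which can be hard to read in job logs. A slightly more defensive check is sketched below; describe_gpu is a hypothetical helper name, and the script would be saved to a file of your choosing and run with apptainer exec --nv in place of the python -c call.

```python
def describe_gpu() -> str:
    """Return a one-line summary of CUDA visibility for this environment."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch found, but no CUDA device is visible (was --nv passed?)"
    names = [
        torch.cuda.get_device_name(index)
        for index in range(torch.cuda.device_count())
    ]
    return "CUDA devices: " + ", ".join(names)


print(describe_gpu())
```
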
Set a default bind (optional)

To avoid typing --bind every time, export it in your shell profile:

export APPTAINER_BIND="/path/to/project:/path/to/project"

Then run without --bind:

apptainer run multifid-th-cpu.sif

The lid-driven flow config lives at src/cases/moose_grid/configs/etl.yaml:

defaults:
  - etl_base
  - _self_

etl:
  processing:
    num_processes: 4
  source:
    input_dir: ../data/lid-driven
    data_dir: ../data/lid-driven
  sink:
    output_dir: ../data/processed/lid-driven

Option A2: same run with CLI overrides (no dedicated yaml)

docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py \
  etl.source.input_dir=../data/lid-driven \
  etl.source.data_dir=../data/lid-driven \
  etl.sink.output_dir=../data/processed/lid-driven \
  etl.processing.num_processes=4'
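
Conceptually, each key=value override replaces one leaf of the nested YAML config. The sketch below mimics that behavior on a plain dictionary to show the mapping. It is not Hydra's implementation (Hydra also handles typing, validation, and config groups), and apply_overrides is a hypothetical name.

```python
import copy


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        # Simplification: only non-negative integers are converted.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return result


base = {
    "etl": {
        "processing": {"num_processes": 4},
        "source": {"input_dir": "../data/lid-driven", "data_dir": "../data/lid-driven"},
        "sink": {"output_dir": "../data/processed/lid-driven"},
    }
}
merged = apply_overrides(base, ["etl.processing.num_processes=8"])
print(merged["etl"]["processing"])
```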

Create your own config

Use the lid-driven config as a template for a new dataset:

cp src/cases/moose_grid/configs/etl.yaml src/cases/moose_grid/configs/my_case.yaml

Edit these keys in src/cases/moose_grid/configs/my_case.yaml:

  • etl.source.input_dir

  • etl.source.data_dir

  • etl.sink.output_dir

  • (optional) etl.processing.num_processes

Run with your new config name:

docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py --config-name my_case'

Option B: interactive shell

docker compose run --rm etl-dev

Then inside the container:

cd src
python cases/moose_grid/run_etl.py

Input and output conventions

| Pattern | Description |
|---|---|
| {sim_name}.e | Exodus II mesh + element fields |
| {sim_prefix}_out_{probe_name}_{timestep:04d}.csv | CSV line probes |

  • Output directory (with the default lid-driven config): data/processed/lid-driven/

  • Output format: one {sim_name}.zarr per simulation

  • Exodus and CSV prefixes do not need to match
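
To check that your files follow these naming conventions before running the ETL, a pattern match along the following lines may help. This is a sketch under stated assumptions, not the ETL's own parser: the regular expression is one reading of the CSV pattern above (if a prefix itself contains _out_, the split is ambiguous and the first occurrence wins), and taking sim_name from the Exodus file stem is an assumption.

```python
import re
from pathlib import Path

# One reading of {sim_prefix}_out_{probe_name}_{timestep:04d}.csv;
# the timestep is zero-padded to at least four digits.
PROBE_CSV = re.compile(
    r"^(?P<sim_prefix>.+?)_out_(?P<probe_name>.+)_(?P<timestep>\d{4,})\.csv$"
)


def parse_probe_csv(filename):
    """Return (sim_prefix, probe_name, timestep) or None if no match."""
    match = PROBE_CSV.match(Path(filename).name)
    if match is None:
        return None
    return match["sim_prefix"], match["probe_name"], int(match["timestep"])


def zarr_path_for(exodus_file, output_dir):
    """Expected output store, assuming sim_name is the .e file's stem."""
    return Path(output_dir) / (Path(exodus_file).stem + ".zarr")


print(parse_probe_csv("lid_out_centerline_0012.csv"))
print(zarr_path_for("data/lid-driven/run_001.e", "data/processed/lid-driven"))
```

The file names lid_out_centerline_0012.csv and run_001.e are made-up examples.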

FNO training and evaluation

Use the etl or etl-ngc service for PhysicsNeMo + PyTorch scripts. Edit this template first:

  • src/cases/moose_grid/configs/train_fno.yaml

train_fno.yaml is a Hydra config that inherits from src/training/config/default.yaml (via hydra.searchpath: pkg://training.config) and defines an example FNO setup used for both training and evaluation.

Train

docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/moose_grid/configs --config-name train_fno'

Evaluate

docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/moose_grid/configs --config-name train_fno'

Generate velocity-field comparison plots during evaluation:

docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/moose_grid/configs --config-name train_fno \
  output.plot_dir=../data/models/lid_driven_fno_plots'

CLI flags override YAML values:

docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/moose_grid/configs --config-name train_fno training.epochs=50'

Logs

During build:

docker compose build --progress=plain etl-ngc

During runtime:

docker compose logs -f etl-ngc
docker compose logs --tail=100 etl-ngc

docker compose run --rm ... removes the container when it exits, including its stored logs. Omit --rm if you need to inspect logs after a run.

Build the documentation

The site you’re reading is built with Sphinx + MyST + Furo. The multifid-th-cpu.sif image already has the full Sphinx stack installed, so no extra install step is required.

From the repository root:

# Apptainer (preferred on HPC)
apptainer exec --bind "$PWD:$PWD" --pwd "$PWD" multifid-th-cpu.sif \
    make -C docs html

# Docker (workstation)
docker compose run --rm etl bash -lc 'make -C docs html'

Open docs/_build/html/index.html in a browser to preview.

Rebuild after:

  • editing any .md file under docs/,

  • editing module / class / function docstrings (autodoc pulls them into the API pages), or

  • adding / removing classes or functions exposed via automodule directives in docs/api/.

For live reload, the nitpicky strict target (warnings-fatal, matching CI), and the full list of make targets, see Building the documentation. docs/_build/ is git-ignored — don’t commit anything under it.

Troubleshooting builds

If you see TLS errors such as CERTIFICATE_VERIFY_FAILED or UnknownIssuer, your environment may require a corporate CA certificate.

Add CA file (etl-dev, etl, etl-ngc)

Place a CA cert in docker/certs/ (.pem, .crt, .cer) and rebuild:

docker compose build --no-cache etl-dev  # or etl / etl-ngc

etl-dev validates custom certs and skips any file that is malformed or that is a leaf certificate (CA:FALSE) rather than a CA certificate (CA:TRUE).

You can also point to a different host cert directory at build time:

CA_CERT_DIR=/path/to/certs docker compose build --no-cache etl-dev

Pass CA via environment variable (etl, etl-ngc)

EXTRA_CA_CERT_B64="$(base64 < /path/to/your-org-ca.crt | tr -d '\n')" \
docker compose build etl                  # or etl-ngc
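
On systems where the base64 command-line tool behaves differently or is unavailable, the same single-line value can be produced with Python's standard library. This is a minimal sketch: encode_cert is a hypothetical helper, and the sample bytes stand in for the contents of your real certificate file.

```python
import base64


def encode_cert(cert_bytes: bytes) -> str:
    """Return the certificate as one line of base64 text (no newlines)."""
    return base64.b64encode(cert_bytes).decode("ascii")


# Placeholder content; in practice read the bytes from your CA file,
# e.g. Path("/path/to/your-org-ca.crt").read_bytes().
sample = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
encoded = encode_cert(sample)

# Round-trip check: decoding must give back the original bytes.
assert base64.b64decode(encoded) == sample
print(encoded)
```

The printed value is what would be assigned to EXTRA_CA_CERT_B64.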

Bypass TLS as last resort

For etl-dev:

PIP_TRUSTED_HOST_FLAGS="--trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org" \
docker compose build etl-dev

For etl:

UV_ALLOW_INSECURE_HOST_FLAGS="--allow-insecure-host pypi.org --allow-insecure-host files.pythonhosted.org" \
docker compose build etl

For etl-ngc:

PIP_TRUSTED_HOST_FLAGS="--trusted-host pypi.org --trusted-host files.pythonhosted.org" \
docker compose build etl-ngc

Corporate proxy

HTTP_PROXY=http://proxy.example.com:8080 \
HTTPS_PROXY=http://proxy.example.com:8080 \
NO_PROXY=localhost,127.0.0.1 \
docker compose build etl                  # or etl-ngc

Apple Silicon: force amd64

DOCKER_PLATFORM=linux/amd64 docker compose build etl       # or etl-ngc
DOCKER_PLATFORM=linux/amd64 docker compose run --rm etl    # or etl-ngc

Skip full PhysicsNeMo install (etl only)

INSTALL_PHYSICSNEMO=0 docker compose build etl

Add new Python packages to Docker images

Packages installed with pip install in a running container are lost when the container is removed (for example, after docker compose run --rm). For persistent changes, add packages to the Dockerfile, then rebuild.

  1. Select service(s) that need the dependency.

  2. Edit the matching Dockerfile:

| Service | Dockerfile | Install command style |
|---|---|---|
| etl-dev | docker/Dockerfile.dev | pip install ... |
| etl | docker/Dockerfile.physicsnemo-cpu | uv ... pip install --system ... |
| etl-gpu | docker/Dockerfile.gpu | uv ... pip install --system ... |
| etl-ngc | docker/Dockerfile.ngc | pip install ... |

  3. Rebuild and rerun:

docker compose build etl-dev
docker compose run --rm etl-dev

  4. Verify inside container:

python -c "import your_package; print(your_package.__version__)"
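
Not every package defines __version__, so the one-liner above can fail even when the install succeeded. A sketch of an alternative check using the standard library's importlib.metadata follows. installed_versions is a hypothetical helper, and the names passed to it are distribution names (what you give to pip install), which can differ from import names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Replace the placeholder with the packages you added to the Dockerfile.
for name, ver in installed_versions(["your-package"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```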

Next references