Getting Started¶
This guide covers day-to-day usage with Docker Compose or Apptainer (for HPC systems where Docker is unavailable):
setting up and running the ETL
choosing the right container image
troubleshooting container builds
adding new Python packages to images
Prerequisites¶
Docker Desktop (or Docker Engine + Compose v2) or Apptainer (HPC)
Git submodules initialized:
git submodule update --init --recursive
Environment variables¶
Copy .env.example to .env and fill in values for your environment:
cp .env.example .env
Key variables:
| Variable | Used by | Description |
|---|---|---|
| | Apptainer | Default bind mounts (e.g. |
| | Docker Compose | Force |
| | Docker Compose | Path to directory with custom CA certs |
| | Docker Compose | Base64-encoded CA cert (alternative to |
| | Docker Compose ( | Set to |
| | Docker Compose ( | TLS bypass for uv |
| | Docker Compose ( | TLS bypass for pip |
| | Both | Corporate proxy settings |
Docker Compose reads .env automatically. Apptainer does not; source the file
manually as described in Step 1 of the Apptainer section below.
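A missing variable often shows up only later, as a failed build or an unbound path, so it can help to check .env before starting. The following is a minimal sketch and not part of the repository: `parse_env_text` and `missing_keys` are hypothetical helpers, and the parser assumes simple KEY=VALUE lines (no multi-line values, no shell expansion).

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: dict, required: list) -> list:
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


# Illustrative content only; real values come from your own .env file.
example = """
# proxy settings
HTTP_PROXY=http://proxy.example.com:8080
APPTAINER_BIND="/path/to/project:/path/to/project"
NO_PROXY=
"""
env = parse_env_text(example)
print(missing_keys(env, ["APPTAINER_BIND", "HTTP_PROXY", "NO_PROXY"]))  # ['NO_PROXY']
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env_text`; which keys count as required depends on your environment.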
Choose a container image¶
| Service | Dockerfile | Apptainer def | Base | Approx. size | Best for |
|---|---|---|---|---|---|
| | | | | ~300 MB | Fast ETL iteration (no PhysicsNeMo/PyTorch) |
| | | | | ~1 GB | Full CPU stack from PyPI |
| | | | | ~4 GB | CPU + NVIDIA GPU (CUDA 12.4, amd64 only) |
| | | | | ~13 GB | NVIDIA pre-tested stack (amd64 only) |
etl-dev and etl run on Apple Silicon (arm64) and Intel (amd64) without a GPU.
etl-gpu and etl-ngc are amd64-only and support NVIDIA GPUs.
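The architecture rules above can be expressed as a small lookup. This is an illustrative sketch only: `native_services` is a hypothetical helper, the mapping restates the two sentences above, and it covers native execution only (not emulation through a forced platform).

```python
import platform

# Architectures each service runs on natively, as stated in the text above.
SERVICE_ARCHES = {
    "etl-dev": {"arm64", "amd64"},
    "etl": {"arm64", "amd64"},
    "etl-gpu": {"amd64"},
    "etl-ngc": {"amd64"},
}

# platform.machine() reports different spellings for the same architecture.
_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "arm64": "arm64", "aarch64": "arm64"}


def native_services(machine: str) -> list:
    """Return the services that run natively on the given machine type."""
    arch = _ALIASES.get(machine.lower())
    return [name for name, arches in SERVICE_ARCHES.items() if arch in arches]


print(native_services(platform.machine()))
```

On an Apple Silicon host this would list only etl-dev and etl; an NVIDIA GPU is still required to use the GPU features of the amd64-only images.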
Note
None of these images include MOOSE itself — they ship the Python /
PhysicsNeMo stack used to consume MOOSE outputs. To produce the .e
files the ETL reads, see Running MOOSE Simulations,
which documents the separate moose-dev-openmpi-x86_64.sif Apptainer
image and links to MOOSE’s official install paths.
Build and run with Docker Compose¶
Option A: direct run from host terminal¶
docker compose build etl-dev
docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py'
cases/moose_grid/run_etl.py defaults to cases/moose_grid/configs/etl.yaml (the lid-driven flow). Replace etl-dev with etl or etl-ngc if needed.
Build and run with Apptainer (HPC)¶
Use Apptainer on HPC systems where Docker is not available (e.g., INL ROD).
Step 1: Source environment variables¶
Apptainer does not read .env automatically. Source it before every session:
set -a && source .env && set +a
This loads APPTAINER_BIND and any proxy settings into your shell so subsequent
apptainer commands pick them up without needing --bind flags.
Step 2: Build a SIF image¶
# Minimal dev image (ETL only, ~300 MB)
apptainer build multifid-th-dev.sif docker/dev.def
# Full CPU image with PhysicsNeMo (~1 GB)
apptainer build multifid-th-cpu.sif docker/physicsnemo-cpu.def
# CUDA 12.4 GPU image — CPU-only without --nv, GPU with --nv (~5 GB)
apptainer build multifid-th-gpu.sif docker/gpu.def
# NGC image — CPU-only without --nv, GPU with --nv (~13 GB)
apptainer build multifid-th-ngc.sif docker/ngc.def
Step 3: Run with project folder bound¶
Bind your project directory so the container can read inputs and write outputs:
# CPU-only
apptainer run \
--bind /path/to/project:/path/to/project \
multifid-th-cpu.sif
# GPU (--nv exposes host NVIDIA drivers to the container)
apptainer run --nv \
--bind /path/to/project:/path/to/project \
multifid-th-gpu.sif
Apptainer bind-mounts your $HOME directory by default, so files under $HOME are
normally accessible without an explicit --bind. Site configuration or flags such
as --no-home can disable this.
Run a script directly¶
# CPU
apptainer exec \
--bind /path/to/project:/path/to/project \
multifid-th-cpu.sif \
bash -c 'cd /path/to/project/src && python cases/moose_grid/run_etl.py'
# GPU
apptainer exec --nv \
--bind /path/to/project:/path/to/project \
multifid-th-gpu.sif \
bash -c 'cd /path/to/project/src && python train.py --config-path cases/moose_grid/configs --config-name train_fno'
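When these commands are launched from a job script or a Python driver, building the argument list programmatically avoids quoting mistakes. The sketch below is one possible approach, not something the repository provides: `apptainer_exec_cmd` is a hypothetical helper, it uses only the `exec`, `--nv`, and `--bind` options shown above, and it assumes the `src` directory sits directly under the bound project directory.

```python
import shlex


def apptainer_exec_cmd(image: str, project_dir: str, script: str, gpu: bool = False) -> list:
    """Build an `apptainer exec` argument list suitable for subprocess.run."""
    cmd = ["apptainer", "exec"]
    if gpu:
        cmd.append("--nv")  # expose host NVIDIA drivers to the container
    # Bind the project directory at the same path inside the container.
    cmd += ["--bind", f"{project_dir}:{project_dir}", image]
    # Assumption: the scripts are run from <project_dir>/src.
    inner = f"cd {shlex.quote(project_dir + '/src')} && python {shlex.quote(script)}"
    cmd += ["bash", "-c", inner]
    return cmd


cmd = apptainer_exec_cmd("multifid-th-cpu.sif", "/path/to/project", "cases/moose_grid/run_etl.py")
print(shlex.join(cmd))  # printable, shell-safe form of the same command
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it; printing the joined form first makes it easy to compare against the hand-written commands above.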
Verify GPU access inside the container¶
apptainer exec --nv multifid-th-gpu.sif python -c \
"import torch; print(torch.cuda.get_device_name(0))"
Set a default bind (optional)¶
To avoid typing --bind every time, export it in your shell profile:
export APPTAINER_BIND="/path/to/project:/path/to/project"
Then run without --bind:
apptainer run multifid-th-cpu.sif
The lid-driven flow config lives at src/cases/moose_grid/configs/etl.yaml:
defaults:
  - etl_base
  - _self_
etl:
  processing:
    num_processes: 4
  source:
    input_dir: ../data/lid-driven
    data_dir: ../data/lid-driven
  sink:
    output_dir: ../data/processed/lid-driven
Option A2: same run with CLI overrides (no dedicated yaml)¶
docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py \
etl.source.input_dir=../data/lid-driven \
etl.source.data_dir=../data/lid-driven \
etl.sink.output_dir=../data/processed/lid-driven \
etl.processing.num_processes=4'
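Each override is the dotted path of a YAML key followed by `=value`. The sketch below illustrates that mapping by flattening a nested dictionary into override strings; `to_overrides` is a hypothetical helper written for this example, not a Hydra function.

```python
def to_overrides(config: dict, prefix: str = "") -> list:
    """Flatten a nested dict into Hydra-style `dotted.key=value` strings."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


# Same values as the lid-driven config.
config = {
    "etl": {
        "processing": {"num_processes": 4},
        "source": {
            "input_dir": "../data/lid-driven",
            "data_dir": "../data/lid-driven",
        },
        "sink": {"output_dir": "../data/processed/lid-driven"},
    }
}
overrides = to_overrides(config)
for item in overrides:
    print(item)
```

The printed lines match the four arguments in the command above, which is why the run needs no dedicated YAML file.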
Create your own config¶
Use the lid-driven config as a template for a new dataset:
cp src/cases/moose_grid/configs/etl.yaml src/cases/moose_grid/configs/my_case.yaml
Edit these keys in src/cases/moose_grid/configs/my_case.yaml:
etl.source.input_dir
etl.source.data_dir
etl.sink.output_dir
etl.processing.num_processes (optional)
Run with your new config name:
docker compose run --rm etl-dev bash -lc 'cd src && python cases/moose_grid/run_etl.py --config-name my_case'
Option B: interactive shell¶
docker compose run --rm etl-dev
Then inside the container:
cd src
python cases/moose_grid/run_etl.py
Input and output conventions¶
| Pattern | Description |
|---|---|
| | Exodus II mesh + element fields |
| | CSV line probes |
Output directory (with the default lid-driven config): data/processed/lid-driven/
Output format: one {sim_name}.zarr per simulation
Exodus and CSV prefixes do not need to match
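After a run, a quick way to confirm that every simulation produced output is to list the stores in the output directory. This is a minimal sketch under one assumption: each `{sim_name}.zarr` is written as a directory store, so a filesystem listing is enough. `list_zarr_stores` is a hypothetical helper, and the demo uses a throwaway directory instead of real ETL output.

```python
import tempfile
from pathlib import Path


def list_zarr_stores(output_dir: str) -> list:
    """Return the simulation names of all `*.zarr` directories in output_dir."""
    root = Path(output_dir)
    return sorted(p.stem for p in root.glob("*.zarr") if p.is_dir())


# Self-contained demo against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "run_a.zarr").mkdir()
    (Path(tmp) / "run_b.zarr").mkdir()
    (Path(tmp) / "notes.txt").touch()  # ignored: not a .zarr directory
    found = list_zarr_stores(tmp)

print(found)  # ['run_a', 'run_b']
```

For the default config, call `list_zarr_stores("data/processed/lid-driven")` from the repository root and compare the names against your input simulations.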
FNO training and evaluation¶
Use the etl or etl-ngc service for PhysicsNeMo + PyTorch scripts.
Edit this template first:
src/cases/moose_grid/configs/train_fno.yaml
train_fno.yaml is a Hydra config that inherits src/training/config/default.yaml
(via hydra.searchpath: pkg://training.config) and defines an example FNO setup
used by both train.py and evaluate.py.
Train¶
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/moose_grid/configs --config-name train_fno'
Evaluate¶
docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/moose_grid/configs --config-name train_fno'
Generate velocity-field comparison plots during evaluation:
docker compose run --rm etl bash -lc 'cd src && python evaluate.py --config-path cases/moose_grid/configs --config-name train_fno \
output.plot_dir=../data/models/lid_driven_fno_plots'
Hydra command-line overrides (key=value arguments) take precedence over YAML values:
docker compose run --rm etl bash -lc 'cd src && python train.py --config-path cases/moose_grid/configs --config-name train_fno training.epochs=50'
Logs¶
During build:
docker compose build --progress=plain etl-ngc
During runtime:
docker compose logs -f etl-ngc
docker compose logs --tail=100 etl-ngc
docker compose run --rm ... removes the container when it exits, including its
stored logs. Omit --rm if you need to inspect logs after a run.
Build the documentation¶
The site you’re reading is built with Sphinx + MyST + Furo. The
multifid-th-cpu.sif image already has the full Sphinx stack installed,
so no extra install step is required.
From the repository root:
# Apptainer (preferred on HPC)
apptainer exec --bind "$PWD:$PWD" --pwd "$PWD" multifid-th-cpu.sif \
make -C docs html
# Docker (workstation)
docker compose run --rm etl bash -lc 'make -C docs html'
Open docs/_build/html/index.html in a browser to preview.
Rebuild after:
editing any .md file under docs/,
editing module / class / function docstrings (autodoc pulls them into the API pages), or
adding / removing classes or functions exposed via automodule directives in docs/api/.
For live reload, the strict target (nitpicky mode with warnings treated as
errors, matching CI), and the full list of make targets, see
Building the documentation. docs/_build/
is git-ignored, so do not commit anything under it.
Troubleshooting builds¶
If you see TLS errors such as CERTIFICATE_VERIFY_FAILED or UnknownIssuer,
your environment may require a corporate CA certificate.
Add CA file (etl-dev, etl, etl-ngc)¶
Place a CA cert in docker/certs/ (.pem, .crt, .cer) and rebuild:
docker compose build --no-cache etl-dev # or etl / etl-ngc
etl-dev validates custom certs and skips any file that is malformed or that is a
leaf certificate (CA:FALSE) rather than a CA certificate (CA:TRUE).
You can also point to a different host cert directory at build time:
CA_CERT_DIR=/path/to/certs docker compose build --no-cache etl-dev
Pass CA via environment variable (etl, etl-ngc)¶
EXTRA_CA_CERT_B64="$(base64 < /path/to/your-org-ca.crt | tr -d '\n')" \
docker compose build etl # or etl-ngc
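The shell pipeline above produces single-line Base64 text. Where the `base64` utility behaves differently across platforms, the same value can be produced in Python. This is a sketch only: `encode_cert` is a hypothetical helper, and the sample bytes are a placeholder, not a real certificate.

```python
import base64


def encode_cert(cert_bytes: bytes) -> str:
    """Return the certificate as single-line Base64 text (no newlines)."""
    return base64.b64encode(cert_bytes).decode("ascii")


# Placeholder bytes standing in for the contents of your CA file.
sample = b"placeholder certificate bytes\n"
encoded = encode_cert(sample)
print(encoded)
```

For a real file, pass `pathlib.Path("/path/to/your-org-ca.crt").read_bytes()` to `encode_cert` and export the result as EXTRA_CA_CERT_B64 before running the build.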
Bypass TLS as last resort¶
For etl-dev:
PIP_TRUSTED_HOST_FLAGS="--trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org" \
docker compose build etl-dev
For etl:
UV_ALLOW_INSECURE_HOST_FLAGS="--allow-insecure-host pypi.org --allow-insecure-host files.pythonhosted.org" \
docker compose build etl
For etl-ngc:
PIP_TRUSTED_HOST_FLAGS="--trusted-host pypi.org --trusted-host files.pythonhosted.org" \
docker compose build etl-ngc
Corporate proxy¶
HTTP_PROXY=http://proxy.example.com:8080 \
HTTPS_PROXY=http://proxy.example.com:8080 \
NO_PROXY=localhost,127.0.0.1 \
docker compose build etl # or etl-ngc
Apple Silicon: force amd64¶
DOCKER_PLATFORM=linux/amd64 docker compose build etl # or etl-ngc
DOCKER_PLATFORM=linux/amd64 docker compose run --rm etl # or etl-ngc
Skip full PhysicsNeMo install (etl only)¶
INSTALL_PHYSICSNEMO=0 docker compose build etl
Add new Python packages to Docker images¶
Do not rely on pip install in a running container for persistent changes. Add
packages to Dockerfiles, then rebuild.
Select service(s) that need the dependency.
Edit the matching Dockerfile:
| Service | Dockerfile | Install command style |
|---|---|---|
| | | |
| | | |
| | | |
| | | |
Rebuild and rerun:
docker compose build etl-dev
docker compose run --rm etl-dev
Verify inside container:
python -c "import your_package; print(your_package.__version__)"
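Not every package defines `__version__`, so the check above can fail even when the install succeeded. A hedged alternative is to query the installed distribution metadata from the standard library; `installed_version` is a hypothetical helper, and `your-package` stands for the distribution name you added to the Dockerfile.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("your-package"))
```

A result of `None` inside the rebuilt container suggests the package was not installed into the image, or that the distribution name differs from the import name.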
Next references¶
Alpha-D surrogate tutorial – extract Darcy resistance profiles and train an MLP
Hyperparameter optimization – Optuna-based HPO for any model