Command Line Interface (CLI)#

This page documents the main CLI entry points under mmml/cli with example invocations, minimal input files, and Slurm submission scripts.

Prerequisites#

Python virtual environment with mmml installed and model/data dependencies available
Access to GPU/CPU as required by the chosen command
For Slurm examples, an HPC partition with CUDA modules or suitable CPU nodes

make_res.py#

Purpose

Create a residue (or small molecule) template and minimal inputs for subsequent steps.

Usage

python -m mmml.cli.make_res \
  --resname WAT \
  --pdb water.pdb \
  --out water_res

Inputs

--resname: residue name (e.g., WAT, ETH, ACE)
--pdb: input PDB containing the residue

Outputs

Directory water_res with processed residue files

make_box.py#

Purpose

Build boxes from residues and write PDB/PSF (or equivalent) for simulation setup.

Usage

python -m mmml.cli.make_box \
  --residue water_res \
  --count 1000 \
  --box 30 \
  --out water_box

Inputs

--residue: residue directory from make_res
--count: number of molecules
--box: box edge length (Å)

Outputs

Directory water_box with PDB (and auxiliary) files
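
Before building a box it can help to check that --count and --box imply a sensible density. The sketch below is not part of mmml; the helper names box_density and edge_for_density are hypothetical, and it assumes a cubic box. For the usage above (1000 water molecules in a 30 Å box) it gives about 1.11 g/cm^3, roughly 11% above ambient liquid water (0.997 g/cm^3).

```python
# Sanity-check the mass density implied by --count and --box (cubic box assumed).
AVOGADRO = 6.02214076e23      # molecules per mole
A3_TO_CM3 = 1e-24             # one cubic angstrom expressed in cm^3


def box_density(count, molar_mass, box_edge):
    """Return the mass density (g/cm^3) of `count` molecules in a cubic box.

    molar_mass is in g/mol and box_edge in angstrom.
    """
    mass_g = count * molar_mass / AVOGADRO
    volume_cm3 = box_edge ** 3 * A3_TO_CM3
    return mass_g / volume_cm3


def edge_for_density(count, molar_mass, target_density):
    """Return the cubic box edge (angstrom) that yields target_density (g/cm^3)."""
    volume_cm3 = count * molar_mass / AVOGADRO / target_density
    return (volume_cm3 / A3_TO_CM3) ** (1.0 / 3.0)


# 1000 water molecules (18.015 g/mol) in a 30 angstrom box
rho = box_density(1000, 18.015, 30.0)
edge = edge_for_density(1000, 18.015, 0.997)
print(f"density = {rho:.3f} g/cm^3, edge for 0.997 g/cm^3 = {edge:.2f} A")
```

If the density is far from the expected value, adjust --box or --count before running a simulation.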

make_training.py#

Purpose

Prepare and/or run training for a PhysNetJAX (or compatible) model.

Common flags

--data: path to dataset (npz)
--tag: run name tag
--model: model definition (JSON/INP); if omitted, a default EF model is created
--n_train / --n_valid: split sizes
--num_epochs: number of epochs
--batch_size: batch size
--learning_rate: optimizer learning rate
--num_atoms: number of atoms per structure (auto-detected from data if not specified)
--ckpt_dir: checkpoints directory

Usage (basic - num_atoms auto-detected)

python -m mmml.cli.make_training \
  --data data/dimers.npz \
  --tag physnet_run1 \
  --num_epochs 5 \
  --batch_size 4 \
  --learning_rate 1e-3 \
  --ckpt_dir checkpoints/physnet_run1

Usage (explicit num_atoms)

python -m mmml.cli.make_training \
  --data data/dimers.npz \
  --tag physnet_run1 \
  --num_atoms 60 \
  --num_epochs 5 \
  --batch_size 4 \
  --learning_rate 1e-3 \
  --ckpt_dir checkpoints/physnet_run1

Outputs

Checkpoints in checkpoints/physnet_run1
Parameter snapshots paramsYYYY-mm-dd_HH-MM-SS.json

Notes

When --num_atoms is omitted, it is auto-detected as max(N) over the dataset
Padding is automatically removed if detected (e.g., 60 padded → 10 actual atoms)
Training uses only the actual number of atoms for efficiency
Unpadded files are saved for reuse (e.g., data_train_unpadded.npz)
You can still specify --num_atoms explicitly if needed
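
The padding removal described above can be sketched as follows. This is an illustration, not the code used by make_training: the function name strip_padding is hypothetical, and it assumes R and F have shape (n_structures, n_padded, 3), Z has shape (n_structures, n_padded), and real atoms occupy the leading slots of each row.

```python
import numpy as np


def strip_padding(data):
    """Trim trailing padded atom slots so arrays are only max(N) atoms wide.

    Assumes real atoms come first in each row and padding follows them.
    """
    num_atoms = int(np.max(data["N"]))          # widest real structure
    trimmed = dict(data)                        # leave the input untouched
    for key in ("R", "F"):
        if key in trimmed:
            trimmed[key] = trimmed[key][:, :num_atoms, :]
    if "Z" in trimmed:
        trimmed["Z"] = trimmed["Z"][:, :num_atoms]
    return trimmed, num_atoms


# Toy dataset: 4 structures padded to 60 slots, only 10 real atoms each
n_structures, n_padded, n_real = 4, 60, 10
data = {
    "R": np.zeros((n_structures, n_padded, 3)),
    "F": np.zeros((n_structures, n_padded, 3)),
    "Z": np.zeros((n_structures, n_padded), dtype=int),
    "E": np.zeros(n_structures),
    "N": np.full(n_structures, n_real),
}
data["Z"][:, :n_real] = 6
trimmed, num_atoms = strip_padding(data)
print(num_atoms, trimmed["R"].shape)
```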

run_sim.py#

Purpose

Run a short ASE+MM/ML hybrid simulation (or energy/force evaluation) using a trained model.

Common flags

--pdbfile: input PDB to load
--checkpoint: path to trained model checkpoint directory
--n-monomers / --n-atoms-monomer: topology assumptions for ML partitions
--temperature: target temperature (K) for MD
--num-steps / --timestep: MD length and integration step (fs)
--output-prefix: prefix for trajectory/outputs

Usage

python -m mmml.cli.run_sim \
  --pdbfile water_box/water.pdb \
  --checkpoint checkpoints/physnet_run1 \
  --n-monomers 1000 \
  --n-atoms-monomer 3 \
  --temperature 100 \
  --timestep 0.1 \
  --num-steps 10000 \
  --output-prefix md_simulation

Outputs

Trajectory md_simulation_trajectory_100K_10000steps.traj
Console logs of energy/temperature

calculator.py#

Purpose

Provides a generic ASE calculator for trained MMML models. It can be used as a Python module or from the command line.

Common flags

--checkpoint: path to checkpoint file or directory
--cutoff: neighbor list cutoff distance (Angstroms)
--use-dcmnet-dipole: use DCMNet dipole if available
--test-molecule: test with predefined molecule (CO2, H2O, CH4, NH3)

Usage as module

from mmml.cli.calculator import MMMLCalculator
from ase import Atoms

calc = MMMLCalculator.from_checkpoint('checkpoints/my_model')
atoms = Atoms('CO2', positions=[[0,0,0], [1.16,0,0], [-1.16,0,0]])
atoms.calc = calc

energy = atoms.get_potential_energy()
forces = atoms.get_forces()
dipole = atoms.get_dipole_moment()

Usage from command line

python -m mmml.cli.calculator \
  --checkpoint checkpoints/my_model \
  --test-molecule CO2

Outputs

Energy, forces, dipole moment, and atomic charges for test molecule

clean_data.py#

Purpose

Clean and validate NPZ datasets by removing structures with quality issues and keeping only essential training fields.

Common flags

input: input NPZ file to clean
-o, --output: output NPZ file (cleaned)
--max-force: maximum allowed force magnitude (eV/Å), default: 10.0
--min-distance: minimum allowed interatomic distance (Å), default: 0.4
--no-check-distances: skip distance checks (faster, recommended)
--quiet: suppress output

Essential fields kept

E, F, R, Z, N: Required for energy/force training
D, Dxyz: Optional dipole data
All other fields (cube_*, orbital_*, metadata) are removed

Usage (recommended - fast, keeps 99%+ data)

python -m mmml.cli.clean_data input.npz -o cleaned.npz --no-check-distances

Usage (stricter - removes overlapping atoms)

python -m mmml.cli.clean_data input.npz -o cleaned.npz \
  --max-force 10.0 --min-distance 0.4

Custom thresholds

python -m mmml.cli.clean_data input.npz -o cleaned.npz \
  --max-force 5.0 --min-distance 0.3

Outputs

Cleaned NPZ file with only essential fields (E, F, R, Z, N, D, Dxyz)
Invalid structures removed
Statistics about removed structures and failure reasons

Notes

Use --no-check-distances for faster cleaning and higher data retention (recommended)
Only removes clear SCF failures, keeping good training data
Automatically strips unnecessary QM fields (orbital energies, cube data, etc.)
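
The two quality checks can be pictured with the sketch below. It is not the clean_data implementation: valid_mask is a hypothetical helper, and it assumes that "force magnitude" means the per-atom force norm and that N gives the number of real (unpadded) atoms per structure.

```python
import numpy as np


def valid_mask(R, F, N, max_force=10.0, min_distance=0.4, check_distances=True):
    """Return a boolean mask of structures that pass the quality checks."""
    keep = np.ones(len(R), dtype=bool)
    for i in range(len(R)):
        n = int(N[i])                       # ignore padded atom slots
        forces = np.linalg.norm(F[i, :n], axis=1)
        if forces.size and forces.max() > max_force:
            keep[i] = False
            continue
        if check_distances and n > 1:
            diff = R[i, :n, None, :] - R[i, None, :n, :]
            dist = np.linalg.norm(diff, axis=-1)
            # upper triangle excludes self-distances and duplicate pairs
            if dist[np.triu_indices(n, k=1)].min() < min_distance:
                keep[i] = False
    return keep


# Three two-atom structures: fine, huge force, overlapping atoms
R = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
])
F = np.zeros_like(R)
F[1, 0, 0] = 50.0
N = np.array([2, 2, 2])

mask = valid_mask(R, F, N)
mask_fast = valid_mask(R, F, N, check_distances=False)
print(mask, mask_fast)
```

Skipping the distance check avoids the pairwise distance matrix, which is why it is faster and retains structures such as the third one here.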

inspect_checkpoint.py#

Purpose

Inspect model checkpoints and infer configuration from parameter structure.

Common flags

--checkpoint: path to checkpoint file or directory
--save-config: save inferred configuration to JSON file
--quiet: suppress detailed output

Usage

python -m mmml.cli.inspect_checkpoint --checkpoint model/best_params.pkl

Save configuration

python -m mmml.cli.inspect_checkpoint --checkpoint model/ \
  --save-config inferred_config.json

Outputs

Total parameter count
Parameter structure breakdown by component
Inferred model configuration (features, iterations, etc.)
Optionally saves configuration to JSON
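
A parameter count broken down by component can be sketched as a walk over a nested dictionary of arrays. The parameter tree below is hypothetical (real checkpoints will use different names and shapes), and count_parameters is an illustrative helper, not part of inspect_checkpoint.

```python
import numpy as np


def count_parameters(tree, prefix=""):
    """Walk a nested dict of arrays and return {path: parameter_count}."""
    counts = {}
    for name, value in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            counts.update(count_parameters(value, path))
        else:
            counts[path] = int(np.size(value))
    return counts


# Hypothetical parameter tree used only for illustration
params = {
    "embedding": {"kernel": np.zeros((28, 64))},
    "interaction_0": {"kernel": np.zeros((64, 64)), "bias": np.zeros(64)},
    "output": {"kernel": np.zeros((64, 1))},
}
counts = count_parameters(params)
total = sum(counts.values())
for path, n in counts.items():
    print(f"{path:24s} {n:8d}")
print(f"{'total':24s} {total:8d}")
```

Array shapes are also what make configuration inference possible: in this toy tree, the second dimension of the embedding kernel (64) would correspond to a feature width.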

convert_npz_traj.py#

Purpose

Convert NPZ datasets to ASE trajectory format for visualization.

Common flags

input: input NPZ file
-o, --output: output trajectory file (.traj, .xyz, .pdb, etc.)
--max-structures: maximum number of structures to convert
--stride: use every Nth structure
--quiet: suppress output

Usage

python -m mmml.cli.convert_npz_traj data.npz -o trajectory.traj

Convert subset

python -m mmml.cli.convert_npz_traj data.npz -o traj.traj \
  --max-structures 100 --stride 10

To XYZ format

python -m mmml.cli.convert_npz_traj data.npz -o structures.xyz

Outputs

ASE trajectory file (can be viewed with ase gui)
Removes padding automatically
Includes energies and forces if available
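
To show what the conversion involves, here is a minimal sketch that writes plain XYZ frames using NumPy only. It is not the convert_npz_traj code: write_xyz and the small SYMBOLS table are hypothetical, the dataset layout (R, Z, N, optional E) is assumed, and applying the stride before the structure limit is also an assumption.

```python
import io
import numpy as np

SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}   # extend as needed


def write_xyz(data, handle, stride=1, max_structures=None):
    """Write structures from an R/Z/N dataset as plain XYZ frames.

    Returns the number of frames written.
    """
    indices = list(range(0, len(data["R"]), stride))
    if max_structures is not None:
        indices = indices[:max_structures]
    written = 0
    for i in indices:
        n = int(data["N"][i])                # drop padded slots
        comment = f"E={data['E'][i]:.6f}" if "E" in data else ""
        handle.write(f"{n}\n{comment}\n")
        for z, (x, y, zc) in zip(data["Z"][i, :n], data["R"][i, :n]):
            handle.write(f"{SYMBOLS[int(z)]} {x:.6f} {y:.6f} {zc:.6f}\n")
        written += 1
    return written


# Toy dataset: 5 water-like structures padded to 4 atom slots
n_structures, n_padded = 5, 4
data = {
    "R": np.zeros((n_structures, n_padded, 3)),
    "Z": np.tile(np.array([8, 1, 1, 0]), (n_structures, 1)),
    "N": np.full(n_structures, 3),
    "E": np.linspace(-1.0, -0.6, n_structures),
}
buffer = io.StringIO()                       # stands in for an output file
n_frames = write_xyz(data, buffer, stride=2, max_structures=2)
print(n_frames)
```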

split_dataset.py#

Purpose

Split datasets into train/valid/test sets with optional unit conversion.

Common flags

input: input NPZ file (single file mode)
--efd: energy/force/dipole file (multi-file mode)
--grid: ESP grid file (multi-file mode)
-o, --output-dir: output directory
--train, --valid, --test: split ratios (default: 0.8/0.1/0.1)
--convert-units: convert Hartree→eV and Hartree/Bohr→eV/Å
--seed: random seed for reproducibility

Usage (single file)

python -m mmml.cli.split_dataset data.npz -o splits/

Usage (with unit conversion)

python -m mmml.cli.split_dataset data.npz -o splits/ --convert-units

Usage (multiple files - EFD + Grid)

python -m mmml.cli.split_dataset \
  --efd energies_forces_dipoles.npz \
  --grid grids_esp.npz \
  -o training_data --convert-units

Custom split ratios

python -m mmml.cli.split_dataset data.npz -o splits/ \
  --train 0.7 --valid 0.15 --test 0.15

Outputs

data_train.npz, data_valid.npz, data_test.npz
split_indices.npz (reproducible split indices)
Optionally converts units to ASE standard (eV, eV/Å)
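
A seeded split and the unit conversion can be sketched as below. This is an illustration under assumptions, not the split_dataset source: split_indices is a hypothetical helper, the rounding of split sizes may differ from the tool, and the conversion constants are rounded values of the standard Hartree and Bohr definitions.

```python
import numpy as np

HARTREE_TO_EV = 27.211386            # eV per Hartree
BOHR_TO_ANGSTROM = 0.529177          # angstrom per Bohr
FORCE_FACTOR = HARTREE_TO_EV / BOHR_TO_ANGSTROM   # Hartree/Bohr -> eV/angstrom


def split_indices(n, train=0.8, valid=0.1, seed=42):
    """Shuffle 0..n-1 with a fixed seed and cut into train/valid/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(n * train))
    n_valid = int(round(n * valid))
    return (order[:n_train],
            order[n_train:n_train + n_valid],
            order[n_train + n_valid:])       # the remainder is the test set


train_idx, valid_idx, test_idx = split_indices(100)

# Convert toy energies and forces to ASE units
E_hartree = np.array([-76.4, -76.3])
F_hartree_bohr = np.array([[0.01, 0.0, -0.02]])
E_ev = E_hartree * HARTREE_TO_EV
F_ev_ang = F_hartree_bohr * FORCE_FACTOR
print(len(train_idx), len(valid_idx), len(test_idx), round(FORCE_FACTOR, 3))
```

Saving the index arrays alongside the split files, as split_indices.npz does, lets the same partition be reapplied to related files such as ESP grids.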

explore_data.py#

Purpose

Explore and visualize NPZ datasets with statistical summaries.

Common flags

input: input NPZ file
--detailed: detailed analysis including geometry
--plots: generate distribution plots
--output-dir: output directory for plots
--quiet: suppress output

Usage

python -m mmml.cli.explore_data data.npz

With plots

python -m mmml.cli.explore_data data.npz --plots --output-dir exploration

Detailed analysis

python -m mmml.cli.explore_data data.npz --detailed --plots --output-dir analysis

Outputs

Statistical summaries (energy, forces, dipoles)
Bond length analysis (if --detailed)
Distribution plots (if --plots)
Data quality checks

evaluate_model.py#

Purpose

Evaluate trained models on datasets with detailed metrics (under development).

Common flags

--checkpoint: model checkpoint directory or file
--data: single dataset to evaluate
--train, --valid, --test: evaluate on multiple splits
--detailed: compute per-structure breakdown
--plots: generate correlation and error distribution plots
--output-dir: output directory for results

Usage

python -m mmml.cli.evaluate_model --checkpoint model/ --data test.npz

Multiple splits

python -m mmml.cli.evaluate_model --checkpoint model/ \
  --train train.npz --valid valid.npz --test test.npz \
  --output-dir evaluation

Outputs

Error metrics (MAE, RMSE, R²) for energy, forces, dipoles
Correlation plots (if --plots specified)
Per-structure analysis (if --detailed specified)
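
For reference, the three error metrics can be computed as in the sketch below. The helper regression_metrics is hypothetical, and it assumes R² means the coefficient of determination (1 - SS_res/SS_tot); the tool may define it differently, for example as a squared Pearson correlation.

```python
import math


def regression_metrics(reference, predicted):
    """Return MAE, RMSE and the coefficient of determination R^2."""
    n = len(reference)
    errors = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy reference and predicted energies
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(reference, predicted)
print(metrics)
```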

dynamics.py#

Purpose

Molecular dynamics and vibrational analysis with multiple framework support (ASE, JAX MD).

Common flags

--checkpoint: model checkpoint directory or file
--molecule: predefined molecule (CO2, H2O, CH4, NH3)
--structure: load structure from file (XYZ, PDB, etc.)
--optimize: optimize geometry
--frequencies: calculate vibrational frequencies
--ir-spectra: calculate IR spectrum (requires --frequencies)
--md: run molecular dynamics
--framework: MD framework (ase or jaxmd)
--ensemble: MD ensemble (nve, nvt, npt)
--temperature: temperature (K)
--timestep: MD timestep (fs)
--nsteps: number of MD steps
--output-dir: output directory

Usage - Optimization

python -m mmml.cli.dynamics --checkpoint model/ --molecule CO2 \
  --optimize --output-dir co2_opt

Usage - Vibrational analysis

python -m mmml.cli.dynamics --checkpoint model/ --molecule CO2 \
  --frequencies --ir-spectra --output-dir co2_vib

Usage - Molecular dynamics (ASE)

python -m mmml.cli.dynamics --checkpoint model/ --molecule CO2 \
  --md --framework ase --ensemble nvt --temperature 300 --nsteps 10000 \
  --output-dir co2_md

Usage - Full workflow

python -m mmml.cli.dynamics --checkpoint model/ --structure molecule.xyz \
  --optimize --frequencies --ir-spectra --md --nsteps 5000 \
  --output-dir full_analysis

Outputs

Optimized geometries (XYZ format)
Vibrational frequencies and normal modes
IR spectra (plots and data)
MD trajectories (ASE trajectory format)
Analysis results and statistics

plot_training.py#

Purpose

Visualize training history and analyze model parameters from saved checkpoints.

Common flags

history_files: one or more training history JSON files
--compare: compare two training runs (requires 2 history files)
--names: display names for the compared runs (used with --compare)
--params: parameter pickle file(s) for analysis
--analyze-params: analyze and plot parameter structure
--output-dir: output directory for plots
--dpi: DPI for output images (default: 150)
--format: output format (png, pdf, svg, jpg)
--smoothing: exponential smoothing factor (0-1, 0=none)
--summary-only: only print text summary, no plots

Usage (single model)

python -m mmml.cli.plot_training \
  checkpoints/my_model/history.json \
  --output-dir plots --dpi 300

Usage (comparison)

python -m mmml.cli.plot_training \
  model1/history.json model2/history.json \
  --compare --names "Model A" "Model B" \
  --smoothing 0.9

With parameter analysis

python -m mmml.cli.plot_training history.json \
  --params best_params.pkl \
  --analyze-params

Outputs

Training history plots showing loss curves and metrics
Parameter analysis plots (if requested)
Text summary of training performance
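
The effect of --smoothing can be illustrated with a standard exponential moving average. This is a sketch, not the plot_training code: the function name smooth is hypothetical, and seeding the average with the first data point is an assumption about how the tool initializes it.

```python
def smooth(values, factor):
    """Exponentially smooth a sequence; factor=0 returns the input unchanged."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value                 # seed with the first point
        else:
            previous = factor * previous + (1.0 - factor) * value
        smoothed.append(previous)
    return smoothed


# Toy loss curve
losses = [4.0, 2.0, 1.0, 0.5]
half = smooth(losses, 0.5)
none = smooth(losses, 0.0)
print(half, none)
```

Larger factors weight the running average more heavily, so a value such as 0.9 suppresses epoch-to-epoch noise at the cost of lagging behind real changes in the loss.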

Minimal example files#

Model args (EF) JSON (if constructing a default model):

{
  "features": 64,
  "max_degree": 0,
  "num_basis_functions": 32,
  "num_iterations": 2,
  "n_res": 2,
  "cutoff": 8.0,
  "max_atomic_number": 28,
  "zbl": false,
  "efa": false
}
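
Before passing a model file to --model, it can be checked for missing keys or wrong value types. The sketch below is hypothetical: check_model_args is not part of mmml, and the expected keys and types are inferred solely from the example above, so the real loader may accept more or fewer fields.

```python
import json

# Keys and types inferred from the example file above (an assumption)
EXPECTED_KEYS = {
    "features": int, "max_degree": int, "num_basis_functions": int,
    "num_iterations": int, "n_res": int, "cutoff": float,
    "max_atomic_number": int, "zbl": bool, "efa": bool,
}

EXAMPLE = """{
  "features": 64, "max_degree": 0, "num_basis_functions": 32,
  "num_iterations": 2, "n_res": 2, "cutoff": 8.0,
  "max_atomic_number": 28, "zbl": false, "efa": false
}"""


def _matches(value, expected):
    """Type check that does not confuse booleans with integers."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):              # bool is a subclass of int
        return False
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def check_model_args(text):
    """Parse model arguments and return a list of problems (empty if none)."""
    args = json.loads(text)
    problems = []
    for key, expected in EXPECTED_KEYS.items():
        if key not in args:
            problems.append(f"missing key: {key}")
        elif not _matches(args[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    return problems


print(check_model_args(EXAMPLE))
```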

Dataset layout

A single NPZ file containing at least these arrays: R (positions), Z (atomic numbers), E (energies), F (forces)
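
A toy dataset with this layout can be built and checked as follows. The array shapes are assumptions (a leading structure axis, with R and F of shape (n_structures, n_atoms, 3)), the values are random placeholders rather than real data, and N is included because other tools on this page use it for atom counts.

```python
import io
import numpy as np

n_structures, n_atoms = 5, 3
rng = np.random.default_rng(0)
dataset = {
    "R": rng.normal(size=(n_structures, n_atoms, 3)),       # positions
    "Z": np.tile(np.array([8, 1, 1]), (n_structures, 1)),   # atomic numbers
    "E": rng.normal(size=n_structures),                     # energies
    "F": rng.normal(size=(n_structures, n_atoms, 3)),       # forces
    "N": np.full(n_structures, n_atoms),                    # atoms per structure
}

buffer = io.BytesIO()        # stands in for a file such as data.npz
np.savez(buffer, **dataset)
buffer.seek(0)

# Reload and confirm the required arrays are present
loaded = np.load(buffer)
missing = {"R", "Z", "E", "F"} - set(loaded.files)
print(sorted(loaded.files), missing)
```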

Minimal Slurm scripts#

Training (1 GPU)

#!/bin/bash
#SBATCH -J mmml-train
#SBATCH -A your_account
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --gres=gpu:1
#SBATCH -t 12:00:00
#SBATCH -o slurm-%j.out

module load cuda/12.1  # if needed
source /path/to/venv/bin/activate

srun python -m mmml.cli.make_training \
  --data /path/to/data.npz \
  --tag physnet_run1 \
  --num_epochs 20 \
  --batch_size 8 \
  --learning_rate 1e-3 \
  --ckpt_dir /scratch/$USER/mmml_checkpoints/physnet_run1

MD run (CPU or GPU)

#!/bin/bash
#SBATCH -J mmml-md
#SBATCH -A your_account
#SBATCH -p gpu
#SBATCH -N 1
#SBATCH -c 8
#SBATCH --gres=gpu:1
#SBATCH -t 02:00:00
#SBATCH -o slurm-%j.out

module load cuda/12.1  # if needed
source /path/to/venv/bin/activate

srun python -m mmml.cli.run_sim \
  --pdbfile /path/to/box.pdb \
  --checkpoint /scratch/$USER/mmml_checkpoints/physnet_run1 \
  --n-monomers 1000 \
  --n-atoms-monomer 3 \
  --temperature 100 \
  --timestep 0.1 \
  --num-steps 10000 \
  --output-prefix md_simulation

Debug (short) job

#!/bin/bash
#SBATCH -J mmml-debug
#SBATCH -A your_account
#SBATCH -p debug
#SBATCH -N 1
#SBATCH -c 4
#SBATCH -t 00:10:00
#SBATCH -o slurm-%j.out

source /path/to/venv/bin/activate
srun python -m mmml.cli.make_res --resname WAT --pdb water.pdb --out water_res

Notes#

For reproducible results, set seeds where provided by flags.
Ensure the box size in run_sim.py is physically reasonable for your system.
If running on CPU-only nodes, remove CUDA module loads.
The calculator.py module provides a generic interface that automatically detects model types.
Use plot_training.py to visualize and compare training runs from JSON history files.
All CLI tools support --help for detailed usage information.
