Command Line Interface (CLI)#
This page documents the main CLI entry points under mmml/cli with example
invocations, minimal input files, and Slurm submission scripts.
Prerequisites#
Python virtual environment with mmml installed and model/data dependencies available
Access to GPU/CPU as required by the chosen command
For Slurm examples, an HPC partition with CUDA modules or suitable CPU nodes
make_res.py#
- Purpose
Create a residue (or small molecule) template and minimal inputs for subsequent steps.
- Usage
python -m mmml.cli.make_res \ --resname WAT \ --pdb water.pdb \ --out water_res
- Inputs
--resname: residue name (e.g., WAT, ETH, ACE)--pdb: input PDB containing the residue
- Outputs
Directory
water_reswith processed residue files
make_box.py#
- Purpose
Build boxes from residues and write PDB/PSF (or equivalent) for simulation setup.
- Usage
python -m mmml.cli.make_box \ --residue water_res \ --count 1000 \ --box 30 \ --out water_box
- Inputs
--residue: residue directory frommake_res--count: number of molecules--box: box edge length (Å)
- Outputs
Directory
water_boxwith PDB (and auxiliary) files
make_training.py#
- Purpose
Prepare and/or run training for a PhysNetJAX (or compatible) model.
- Common flags
--data: path to dataset (npz)--tag: run name tag--model: model definition (JSON/INP); if omitted, a default EF model is created--n_train/--n_valid: split sizes--num_epochs: number of epochs--batch_size: batch size--learning_rate: optimizer learning rate--num_atoms: number of atoms per structure (auto-detected from data if not specified)--ckpt_dir: checkpoints directory
- Usage (basic - num_atoms auto-detected)
python -m mmml.cli.make_training \ --data data/dimers.npz \ --tag physnet_run1 \ --num_epochs 5 \ --batch_size 4 \ --learning_rate 1e-3 \ --ckpt_dir checkpoints/physnet_run1
- Usage (explicit num_atoms)
python -m mmml.cli.make_training \ --data data/dimers.npz \ --tag physnet_run1 \ --num_atoms 60 \ --num_epochs 5 \ --batch_size 4 \ --learning_rate 1e-3 \ --ckpt_dir checkpoints/physnet_run1
- Outputs
Checkpoints in
checkpoints/physnet_run1Parameter snapshots
paramsYYYY-mm-dd_HH-MM-SS.json
- Notes
The
--num_atomsparameter is now auto-detected from max(N) in the datasetPadding is automatically removed if detected (e.g., 60 padded → 10 actual atoms)
Training uses only the actual number of atoms for efficiency
Unpadded files are saved for reuse (e.g.,
data_train_unpadded.npz)You can still specify
--num_atomsexplicitly if needed
run_sim.py#
- Purpose
Run a short ASE+MM/ML hybrid simulation (or energy/force evaluation) using a trained model.
- Common flags
--pdbfile: input PDB to load--checkpoint: path to trained model checkpoint directory--n-monomers/--n-atoms-monomer: topology assumptions for ML partitions--temperature: target temperature (K) for MD--num-steps/--timestep: MD length and integration step (fs)--output-prefix: prefix for trajectory/outputs
- Usage
python -m mmml.cli.run_sim \ --pdbfile water_box/water.pdb \ --checkpoint checkpoints/physnet_run1 \ --n-monomers 1000 \ --n-atoms-monomer 3 \ --temperature 100 \ --timestep 0.1 \ --num-steps 10000 \ --output-prefix md_simulation
- Outputs
Trajectory
md_simulation_trajectory_100K_10000steps.trajConsole logs of energy/temperature
calculator.py#
- Purpose
Provides a generic ASE calculator for trained MMML models. Can be used as a Python module or from command line.
- Common flags
--checkpoint: path to checkpoint file or directory--cutoff: neighbor list cutoff distance (Angstroms)--use-dcmnet-dipole: use DCMNet dipole if available--test-molecule: test with predefined molecule (CO2, H2O, CH4, NH3)
- Usage as module
from mmml.cli.calculator import MMMLCalculator from ase import Atoms calc = MMMLCalculator.from_checkpoint('checkpoints/my_model') atoms = Atoms('CO2', positions=[[0,0,0], [1.16,0,0], [-1.16,0,0]]) atoms.calc = calc energy = atoms.get_potential_energy() forces = atoms.get_forces() dipole = atoms.get_dipole_moment()
- Usage from command line
python -m mmml.cli.calculator \ --checkpoint checkpoints/my_model \ --test-molecule CO2
- Outputs
Energy, forces, dipole moment, and atomic charges for test molecule
clean_data.py#
- Purpose
Clean and validate NPZ datasets by removing structures with quality issues and keeping only essential training fields.
- Common flags
input: input NPZ file to clean-o, --output: output NPZ file (cleaned)--max-force: maximum allowed force magnitude (eV/Å), default: 10.0--min-distance: minimum allowed interatomic distance (Å), default: 0.4--no-check-distances: skip distance checks (faster, recommended)--quiet: suppress output
- Essential fields kept
E, F, R, Z, N: Required for energy/force training
D, Dxyz: Optional dipole data
All other fields (cube_*, orbital_*, metadata) are removed
- Usage (recommended - fast, keeps 99%+ data)
python -m mmml.cli.clean_data input.npz -o cleaned.npz --no-check-distances
- Usage (stricter - removes overlapping atoms)
python -m mmml.cli.clean_data input.npz -o cleaned.npz \ --max-force 10.0 --min-distance 0.4
- Custom thresholds
python -m mmml.cli.clean_data input.npz -o cleaned.npz \ --max-force 5.0 --min-distance 0.3
- Outputs
Cleaned NPZ file with only essential fields (E, F, R, Z, N, D, Dxyz)
Invalid structures removed
Statistics about removed structures and failure reasons
- Notes
Use
--no-check-distancesfor faster cleaning and higher data retention (recommended)Only removes clear SCF failures, keeping good training data
Automatically strips unnecessary QM fields (orbital energies, cube data, etc.)
inspect_checkpoint.py#
- Purpose
Inspect model checkpoints and infer configuration from parameter structure.
- Common flags
--checkpoint: path to checkpoint file or directory--save-config: save inferred configuration to JSON file--quiet: suppress detailed output
- Usage
python -m mmml.cli.inspect_checkpoint --checkpoint model/best_params.pkl
- Save configuration
python -m mmml.cli.inspect_checkpoint --checkpoint model/ \\ --save-config inferred_config.json
- Outputs
Total parameter count
Parameter structure breakdown by component
Inferred model configuration (features, iterations, etc.)
Optionally saves configuration to JSON
convert_npz_traj.py#
- Purpose
Convert NPZ datasets to ASE trajectory format for visualization.
- Common flags
input: input NPZ file-o, --output: output trajectory file (.traj, .xyz, .pdb, etc.)--max-structures: maximum number of structures to convert--stride: use every Nth structure--quiet: suppress output
- Usage
python -m mmml.cli.convert_npz_traj data.npz -o trajectory.traj
- Convert subset
python -m mmml.cli.convert_npz_traj data.npz -o traj.traj \\ --max-structures 100 --stride 10
- To XYZ format
python -m mmml.cli.convert_npz_traj data.npz -o structures.xyz
- Outputs
ASE trajectory file (can be viewed with
ase gui)Removes padding automatically
Includes energies and forces if available
split_dataset.py#
- Purpose
Split datasets into train/valid/test sets with optional unit conversion.
- Common flags
input: input NPZ file (single file mode)--efd: energy/force/dipole file (multi-file mode)--grid: ESP grid file (multi-file mode)-o, --output-dir: output directory--train, --valid, --test: split ratios (default: 0.8/0.1/0.1)--convert-units: convert Hartree→eV and Hartree/Bohr→eV/Å--seed: random seed for reproducibility
- Usage (single file)
python -m mmml.cli.split_dataset data.npz -o splits/
- Usage (with unit conversion)
python -m mmml.cli.split_dataset data.npz -o splits/ --convert-units
- Usage (multiple files - EFD + Grid)
python -m mmml.cli.split_dataset \\ --efd energies_forces_dipoles.npz \\ --grid grids_esp.npz \\ -o training_data --convert-units
- Custom split ratios
python -m mmml.cli.split_dataset data.npz -o splits/ \\ --train 0.7 --valid 0.15 --test 0.15
- Outputs
data_train.npz, data_valid.npz, data_test.npz
split_indices.npz (reproducible split indices)
Optionally converts units to ASE standard (eV, eV/Å)
explore_data.py#
- Purpose
Explore and visualize NPZ datasets with statistical summaries.
- Common flags
input: input NPZ file--detailed: detailed analysis including geometry--plots: generate distribution plots--output-dir: output directory for plots--quiet: suppress output
- Usage
python -m mmml.cli.explore_data data.npz
- With plots
python -m mmml.cli.explore_data data.npz --plots --output-dir exploration
- Detailed analysis
python -m mmml.cli.explore_data data.npz --detailed --plots --output-dir analysis
- Outputs
Statistical summaries (energy, forces, dipoles)
Bond length analysis (if –detailed)
Distribution plots (if –plots)
Data quality checks
evaluate_model.py#
- Purpose
Evaluate trained models on datasets with detailed metrics (under development).
- Common flags
--checkpoint: model checkpoint directory or file--data: single dataset to evaluate--train, --valid, --test: evaluate on multiple splits--detailed: compute per-structure breakdown--plots: generate correlation and error distribution plots--output-dir: output directory for results
- Usage
python -m mmml.cli.evaluate_model --checkpoint model/ --data test.npz
- Multiple splits
python -m mmml.cli.evaluate_model --checkpoint model/ \\ --train train.npz --valid valid.npz --test test.npz \\ --output-dir evaluation
- Outputs
Error metrics (MAE, RMSE, R²) for energy, forces, dipoles
Correlation plots (if –plots specified)
Per-structure analysis (if –detailed specified)
dynamics.py#
- Purpose
Molecular dynamics and vibrational analysis with multiple framework support (ASE, JAX MD).
- Common flags
--checkpoint: model checkpoint directory or file--molecule: predefined molecule (CO2, H2O, CH4, NH3)--structure: load structure from file (XYZ, PDB, etc.)--optimize: optimize geometry--frequencies: calculate vibrational frequencies--ir-spectra: calculate IR spectrum (requires –frequencies)--md: run molecular dynamics--framework: MD framework (ase or jaxmd)--ensemble: MD ensemble (nve, nvt, npt)--temperature: temperature (K)--timestep: MD timestep (fs)--nsteps: number of MD steps--output-dir: output directory
- Usage - Optimization
python -m mmml.cli.dynamics --checkpoint model/ --molecule CO2 \\ --optimize --output-dir co2_opt
- Usage - Vibrational analysis
python -m mmml.cli.dynamics --checkpoint model/ --molecule CO2 \\ --frequencies --ir-spectra --output-dir co2_vib
- Usage - Molecular dynamics (ASE)
python -m mmml.cli.dynamics --checkpoint model/ --molecule CO2 \\ --md --framework ase --ensemble nvt --temperature 300 --nsteps 10000 \\ --output-dir co2_md
- Usage - Full workflow
python -m mmml.cli.dynamics --checkpoint model/ --structure molecule.xyz \\ --optimize --frequencies --ir-spectra --md --nsteps 5000 \\ --output-dir full_analysis
- Outputs
Optimized geometries (XYZ format)
Vibrational frequencies and normal modes
IR spectra (plots and data)
MD trajectories (ASE trajectory format)
Analysis results and statistics
plot_training.py#
- Purpose
Visualize training history and analyze model parameters from saved checkpoints.
- Common flags
history_files: one or more training history JSON files--compare: compare two training runs (requires 2 history files)--params: parameter pickle file(s) for analysis--analyze-params: analyze and plot parameter structure--output-dir: output directory for plots--dpi: DPI for output images (default: 150)--format: output format (png, pdf, svg, jpg)--smoothing: exponential smoothing factor (0-1, 0=none)--summary-only: only print text summary, no plots
- Usage single model
python -m mmml.cli.plot_training \ checkpoints/my_model/history.json \ --output-dir plots --dpi 300
- Usage comparison
python -m mmml.cli.plot_training \ model1/history.json model2/history.json \ --compare --names "Model A" "Model B" \ --smoothing 0.9
- With parameter analysis
python -m mmml.cli.plot_training history.json \ --params best_params.pkl \ --analyze-params
- Outputs
Training history plots showing loss curves and metrics
Parameter analysis plots (if requested)
Text summary of training performance
Minimal example files#
Model args (EF) JSON (if constructing a default model):
{
"features": 64,
"max_degree": 0,
"num_basis_functions": 32,
"num_iterations": 2,
"n_res": 2,
"cutoff": 8.0,
"max_atomic_number": 28,
"zbl": false,
"efa": false
}
- Dataset layout
Single
npzfile with arrays at least:R(positions),Z(atomic numbers),E(energies),F(forces)
Minimal Slurm scripts#
- Training (1 GPU)
#!/bin/bash #SBATCH -J mmml-train #SBATCH -A your_account #SBATCH -p gpu #SBATCH -N 1 #SBATCH -c 8 #SBATCH --gres=gpu:1 #SBATCH -t 12:00:00 #SBATCH -o slurm-%j.out module load cuda/12.1 # if needed source /path/to/venv/bin/activate srun python -m mmml.cli.make_training \ --data /path/to/data.npz \ --tag physnet_run1 \ --num_epochs 20 \ --batch_size 8 \ --learning_rate 1e-3 \ --ckpt_dir /scratch/$USER/mmml_checkpoints/physnet_run1
- MD run (CPU or GPU)
#!/bin/bash #SBATCH -J mmml-md #SBATCH -A your_account #SBATCH -p gpu #SBATCH -N 1 #SBATCH -c 8 #SBATCH --gres=gpu:1 #SBATCH -t 02:00:00 #SBATCH -o slurm-%j.out module load cuda/12.1 # if needed source /path/to/venv/bin/activate srun python -m mmml.cli.run_sim \ --pdbfile /path/to/box.pdb \ --checkpoint /scratch/$USER/mmml_checkpoints/physnet_run1 \ --n-monomers 1000 \ --n-atoms-monomer 3 \ --temperature 100 \ --timestep 0.1 \ --num-steps 10000 \ --output-prefix md_simulation
- Debug (short) job
#!/bin/bash #SBATCH -J mmml-debug #SBATCH -A your_account #SBATCH -p debug #SBATCH -N 1 #SBATCH -c 4 #SBATCH -t 00:10:00 #SBATCH -o slurm-%j.out source /path/to/venv/bin/activate srun python -m mmml.cli.make_res --resname WAT --pdb water.pdb --out water_res
Notes#
For reproducible results, set seeds where provided by flags.
Ensure the box size in
run_sim.pyis physically reasonable for your system.If running on CPU-only nodes, remove CUDA module loads.
The
calculator.pymodule provides a generic interface that automatically detects model types.Use
plot_training.pyto visualize and compare training runs from JSON history files.All CLI tools support
--helpfor detailed usage information.