Manual Workflow

Step-by-step parameter sweep using individual joshpy components

Introduction

Josh is an ecological simulation runtime for agent-based modeling developed by the Eric and Wendy Schmidt Center for Data Science and Environment. This demo assumes familiarity with Josh’s simulation language and runtime.

joshpy is a Python client that enables:

  • Orchestration: Define parameter sweeps, expand job configurations, and execute simulations programmatically
  • Tracking: Register runs in a DuckDB-backed registry with session and config tracking
  • Data Loading: Import cell-level CSV exports into queryable tables
  • Analysis: Query results across parameter values and replicates
  • Diagnostics: Quick matplotlib visualizations for simulation sanity checks
  • Visualization: Create publication-quality plots with R/ggplot2 integration

This demo walks through a complete parameter sweep workflow using each component directly. This approach provides maximum control and visibility into each step. For a simplified workflow using SweepManager, see SweepManager Workflow.

We vary the maxGrowth parameter from 10 to 100 meters per step in increments of 10. This gives 10 jobs with 3 replicates each (30 simulation runs in total). We then load, query, and visualize the results.

Prerequisites

Ensure the Josh JAR is available at jar/joshsim-fat.jar and joshpy is installed:

pip install -e '.[all]'

For visualization, ensure R is installed with the following packages:

install.packages(c("reticulate", "ggplot2", "dplyr"))

Step 1: Setup - Define Parameter Sweep

The first step is to define our experiment configuration. joshpy uses three key abstractions:

  • JobConfig: The top-level configuration specifying source files, templates, and sweep parameters
  • SweepConfig: Defines which parameters to sweep and their values
  • SweepParameter: A single parameter with a name and list of values

The JobExpander will later compute the Cartesian product of all swept parameters, generating one job per combination.

from pathlib import Path

from joshpy.jobs import JobConfig, SweepConfig, SweepParameter

# Paths to source files (optimized for fast tutorial builds)
SOURCE_PATH = Path("../../examples/tutorial_sweep.josh")
TEMPLATE_PATH = Path("../../examples/templates/sweep_config.jshc.j2")

# Parameter sweep: maxGrowth from 10 to 100 in steps of 10
MAX_GROWTH_VALUES = list(range(10, 101, 10))

config = JobConfig(
    template_path=TEMPLATE_PATH,
    source_path=SOURCE_PATH,
    simulation="Main",
    replicates=3,
    sweep=SweepConfig(
        parameters=[
            # maxGrowth is swept - creates one job per value
            SweepParameter(name="maxGrowth", values=MAX_GROWTH_VALUES),
            # Note: initialTreeCount is a static value in the template (not swept)
        ]
    ),
)

The sweep creates one job per maxGrowth value (10, 20, …, 100). Static values like initialTreeCount are defined directly in the template, not as sweep parameters.
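To make the Cartesian-product expansion concrete, here is a minimal sketch using only the standard library. It is not the JobExpander implementation. The helper expand_parameters and the second parameter seedDensity are hypothetical and exist only for illustration.

```python
from itertools import product


def expand_parameters(sweep):
    """Return one dict per combination of swept values (Cartesian product)."""
    names = list(sweep)
    value_lists = [sweep[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# The sweep from this tutorial: a single parameter with ten values.
single = expand_parameters({"maxGrowth": list(range(10, 101, 10))})
print(len(single), single[0], single[-1])

# Hypothetical two-parameter sweep (seedDensity is NOT part of this tutorial).
double = expand_parameters({
    "maxGrowth": [10, 20, 30],
    "seedDensity": [1, 2],
})
print(len(double))
```

With a single parameter the product reduces to one job per value (10 here). Adding a second parameter multiplies the job count (3 × 2 = 6 in the hypothetical case), which is why sweeps over several parameters grow quickly.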

Let’s examine the source files. The .josh file defines the simulation, and the .jshc.j2 template provides parameterized configuration:

Josh Source

print(SOURCE_PATH.read_text())
# Tutorial sweep simulation - optimized for fast documentation builds
# Uses larger grid cells (5000m) for faster execution with same extent

start simulation Main

  grid.size = 5000 m
  grid.low = 33.7 degrees latitude, -115.4 degrees longitude
  grid.high = 34.0 degrees latitude, -116.4 degrees longitude
  grid.patch = "Default"

  steps.low = 0 count
  steps.high = 10 count

  exportFiles.patch = "file:///tmp/tutorial_sweep_{maxGrowth}_{replicate}.csv"

end simulation

start patch Default

  ForeverTree.init = create 10 count of ForeverTree

  export.averageAge.step = mean(ForeverTree.age)
  export.averageHeight.step = mean(ForeverTree.height)

end patch

start organism ForeverTree

  # Static config value - same for all sweep runs (initial tree count)
  initialTreeCount.init = config sweep_config.initialTreeCount

  # Swept config value - varies across sweep runs
  maxGrowth.init = config sweep_config.maxGrowth

  age.init = 0 year
  age.step = prior.age + 1 year

  height.init = 0 meters
  # maxGrowth is swept via sweep_config.jshc
  height.step = prior.height + sample uniform from 0 meters to maxGrowth

end organism

start unit year

  alias years
  alias yr
  alias yrs

end unit

Template Configuration

print(TEMPLATE_PATH.read_text())
# Auto-generated configuration for tutorial_sweep.josh
# Parameter sweep: maxGrowth={{ maxGrowth }}

# =============================================================================
# STATIC CONFIG VALUES
# These values are the same for all runs in the sweep.
# Use static values for constants that don't need to vary across experiments.
# =============================================================================

# Initial tree count per organism (constant across all sweep runs)
initialTreeCount = 10 count

# =============================================================================
# SWEPT CONFIG VALUES
# These values vary across sweep runs. Each unique combination creates a job.
# Use swept values for parameters you want to explore or optimize.
# =============================================================================

# Maximum growth per timestep (meters) - SWEPT via Jinja template
maxGrowth = {{ maxGrowth }} meters

Notice how the configuration template has two types of values:

  • Static values (e.g., initialTreeCount = 10 count): Fixed values that don’t use Jinja templating. These are the same for all runs in the sweep.
  • Swept values (e.g., maxGrowth = {{ maxGrowth }} meters): Values that use Jinja variables. These vary across sweep runs based on the SweepParameter definitions.

The .josh file references both via config sweep_config.variableName. At runtime, each config variable pulls its value from the generated .jshc file.
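The difference between static and swept values can be illustrated with a simplified stand-in for the rendering step. This sketch assumes only plain {{ name }} placeholders and uses a regular expression instead of Jinja, so it is not how joshpy renders templates. The render helper is hypothetical.

```python
import re

# Shortened copy of the template shown above (comments omitted).
TEMPLATE = """\
initialTreeCount = 10 count
maxGrowth = {{ maxGrowth }} meters
"""

# Matches simple placeholders such as {{ maxGrowth }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, params):
    """Replace each {{ name }} with params[name]; raises KeyError if missing."""
    def substitute(match):
        return str(params[match.group(1)])
    return PLACEHOLDER.sub(substitute, template)


# One rendered configuration per swept value.
rendered = {value: render(TEMPLATE, {"maxGrowth": value}) for value in (10, 20)}
print(rendered[10])
print(rendered[20])
```

The static line is identical in both outputs, while the maxGrowth line differs. A real Jinja template also supports filters, conditionals, and loops, which this stand-in deliberately ignores.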

Step 2: Initialize Registry and Expand Jobs

The RunRegistry provides experiment tracking backed by DuckDB. It stores:

  • Sessions: High-level experiment metadata
  • Configs: Rendered configuration files with parameter values and input file hashes
  • Runs: Individual execution records with timing and exit codes

The JobExpander takes our JobConfig and generates concrete jobs - one per parameter combination, each with a unique run hash for tracking. The run hash includes the .josh file content, rendered .jshc content, and hashes of any input data files.

from joshpy.jobs import JobExpander
from joshpy.registry import RunRegistry

# Registry path - saved to disk for use in analysis tutorial
REGISTRY_PATH = "demo_registry.duckdb"

# Create registry (overwrites if exists)
registry = RunRegistry(REGISTRY_PATH)

# Expand config into individual jobs
expander = JobExpander()
job_set = expander.expand(config)

# Create a session to track this experiment
# Note: create_session() takes a JobConfig directly and auto-stores it in metadata
session_id = registry.create_session(
    config=config,
    experiment_name="growth_rate_sweep",
)

# Register each job's configuration in the registry
for job in job_set.jobs:
    registry.register_run(
        session_id=session_id,
        run_hash=job.run_hash,
        josh_path=str(job.source_path),
        config_content=job.config_content,
        file_mappings=job.file_mappings,
        parameters=job.parameters,
    )
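The idea behind a content-based run hash can be sketched as follows. This is an assumption-laden illustration, not joshpy's algorithm: the choice of SHA-256, the 12-character truncation, the field ordering, and the function name sketch_run_hash are all hypothetical.

```python
import hashlib


def sketch_run_hash(josh_source, rendered_config, input_file_hashes):
    """Combine simulation source, rendered config, and input hashes into one ID."""
    digest = hashlib.sha256()
    for part in (josh_source, rendered_config):
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    # Sort by name so the result does not depend on dict ordering.
    for name in sorted(input_file_hashes):
        digest.update(f"{name}={input_file_hashes[name]};".encode("utf-8"))
    return digest.hexdigest()[:12]


source = "start simulation Main\nend simulation\n"
hash_10 = sketch_run_hash(source, "maxGrowth = 10 meters\n", {})
hash_10_again = sketch_run_hash(source, "maxGrowth = 10 meters\n", {})
hash_20 = sketch_run_hash(source, "maxGrowth = 20 meters\n", {})

print(hash_10 == hash_10_again)  # identical inputs give the same hash
print(hash_10 == hash_20)        # a different rendered config changes the hash
```

Hashing content rather than file names means that a job keeps its identity when rerun unchanged, and any edit to the simulation, the rendered configuration, or an input file produces a new identifier.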

Step 3: Run the Simulations

The JoshCLI executes jobs via the Josh command-line interface. The run_sweep() function handles execution and automatically records runs in the registry when registry and session_id are provided.

from joshpy.cli import JoshCLI
from joshpy.jobs import run_sweep

# Create CLI targeting the local fat JAR
cli = JoshCLI(josh_jar=Path("../../jar/joshsim-fat.jar"))

# Run all jobs with automatic tracking
# Note: run_sweep() automatically manages session status:
#   - Sets status to "running" at start
#   - Sets status to "completed" if all jobs succeed
#   - Sets status to "failed" if any job fails
results = run_sweep(cli, job_set, registry=registry, session_id=session_id)
Running 10 jobs (30 total replicates)
[1/10] Running (local): {'maxGrowth': 10}
  [OK] Completed successfully
[2/10] Running (local): {'maxGrowth': 20}
  [OK] Completed successfully
[3/10] Running (local): {'maxGrowth': 30}
  [OK] Completed successfully
[4/10] Running (local): {'maxGrowth': 40}
  [OK] Completed successfully
[5/10] Running (local): {'maxGrowth': 50}
  [OK] Completed successfully
[6/10] Running (local): {'maxGrowth': 60}
  [OK] Completed successfully
[7/10] Running (local): {'maxGrowth': 70}
  [OK] Completed successfully
[8/10] Running (local): {'maxGrowth': 80}
  [OK] Completed successfully
[9/10] Running (local): {'maxGrowth': 90}
  [OK] Completed successfully
[10/10] Running (local): {'maxGrowth': 100}
  [OK] Completed successfully
Completed: 10 succeeded, 0 failed

Step 4: Load Cell Data from CSVs

Josh exports simulation data to CSV files. The recover_sweep_results() function automatically discovers export paths from the Josh file (using inspect_exports), resolves template variables for each job, and loads results into the registry.

from joshpy.sweep import recover_sweep_results

# Automatically discover and load CSV results
recover_sweep_results(cli, job_set, registry)
Loading patch results from: /tmp/tutorial_sweep_{maxGrowth}_{replicate}.csv
  Loaded 1463 rows from tutorial_sweep_10_0.csv
  Loaded 1463 rows from tutorial_sweep_10_1.csv
  Loaded 1463 rows from tutorial_sweep_10_2.csv
  Loaded 1463 rows from tutorial_sweep_20_0.csv
  Loaded 1463 rows from tutorial_sweep_20_1.csv
  Loaded 1463 rows from tutorial_sweep_20_2.csv
  Loaded 1463 rows from tutorial_sweep_30_0.csv
  Loaded 1463 rows from tutorial_sweep_30_1.csv
  Loaded 1463 rows from tutorial_sweep_30_2.csv
  Loaded 1463 rows from tutorial_sweep_40_0.csv
  Loaded 1463 rows from tutorial_sweep_40_1.csv
  Loaded 1463 rows from tutorial_sweep_40_2.csv
  Loaded 1463 rows from tutorial_sweep_50_0.csv
  Loaded 1463 rows from tutorial_sweep_50_1.csv
  Loaded 1463 rows from tutorial_sweep_50_2.csv
  Loaded 1463 rows from tutorial_sweep_60_0.csv
  Loaded 1463 rows from tutorial_sweep_60_1.csv
  Loaded 1463 rows from tutorial_sweep_60_2.csv
  Loaded 1463 rows from tutorial_sweep_70_0.csv
  Loaded 1463 rows from tutorial_sweep_70_1.csv
  Loaded 1463 rows from tutorial_sweep_70_2.csv
  Loaded 1463 rows from tutorial_sweep_80_0.csv
  Loaded 1463 rows from tutorial_sweep_80_1.csv
  Loaded 1463 rows from tutorial_sweep_80_2.csv
  Loaded 1463 rows from tutorial_sweep_90_0.csv
  Loaded 1463 rows from tutorial_sweep_90_1.csv
  Loaded 1463 rows from tutorial_sweep_90_2.csv
  Loaded 1463 rows from tutorial_sweep_100_0.csv
  Loaded 1463 rows from tutorial_sweep_100_1.csv
  Loaded 1463 rows from tutorial_sweep_100_2.csv

Results:
  Jobs in sweep: 10
  Jobs with results loaded: 10
  Total rows loaded: 43890
43890
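The log above can be cross-checked by hand. The sketch below resolves the export path template from the .josh file for every job and replicate, then verifies the reported row total. It is an illustration of the template resolution only and does not call joshpy. The helper resolve_export_paths is hypothetical.

```python
from itertools import product
from urllib.parse import urlparse

EXPORT_TEMPLATE = "file:///tmp/tutorial_sweep_{maxGrowth}_{replicate}.csv"
MAX_GROWTH_VALUES = list(range(10, 101, 10))
REPLICATES = 3
ROWS_PER_FILE = 1463  # taken from the log output above


def resolve_export_paths(template, values, replicates):
    """Fill {maxGrowth} and {replicate} for every job/replicate pair."""
    path_template = urlparse(template).path  # drop the file:// scheme
    return [
        path_template.format(maxGrowth=value, replicate=rep)
        for value, rep in product(values, range(replicates))
    ]


paths = resolve_export_paths(EXPORT_TEMPLATE, MAX_GROWTH_VALUES, REPLICATES)
total_rows = len(paths) * ROWS_PER_FILE

print(paths[0])
print(paths[-1])
print(len(paths), total_rows)
```

Ten jobs with three replicates give 30 files, and 30 × 1463 = 43890 rows, which matches the reported total. If each file holds one row per patch per step, then 1463 rows over the 11 steps (0 to 10) would correspond to 133 patches.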

Step 5: Verify Data Loaded

Let’s verify the data is in the registry and ready for analysis:

# Get summary of loaded data
summary = registry.get_data_summary()
print(summary)
Registry Data Summary
========================================
Sessions: 1
Configs:  10
Runs:     10
Rows:     43,890

Variables: averageAge, averageHeight
Entity types: patch
Parameters: maxGrowth
Steps: 0 - 10
Replicates: 0 - 2
Spatial extent: lon [-115.37, -114.40], lat [33.41, 33.68]
registry.list_export_variables()
['averageAge', 'averageHeight']
registry.list_config_parameters()
['maxGrowth']

Next Steps: Analysis

Now that data is loaded, see Analysis & Visualization Tutorial for comprehensive coverage of:

  • Diagnostic Plots (SimulationDiagnostics) - quick matplotlib visualizations
  • Custom Queries (DiagnosticQueries) - get pandas DataFrames
  • Direct SQL - full DuckDB access for advanced analysis
  • R/ggplot2 - publication-quality figures

Quick example:

from joshpy.diagnostics import SimulationDiagnostics

diag = SimulationDiagnostics(registry)
diag.plot_comparison(
    "averageHeight",
    group_by="maxGrowth",
    title="Tree Height by Growth Rate Parameter",
)
Figure 1: Tree height trajectories across maxGrowth values.
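As a rough sanity check for the plot, the growth rule in the .josh file (height.step = prior.height + sample uniform from 0 meters to maxGrowth) implies an expected height of about steps × maxGrowth / 2. The sketch below compares that formula with a small Monte Carlo simulation in plain Python. It assumes height is 0 at step 0 and that each later step adds exactly one uniform draw. Both function names are hypothetical, and nothing here runs Josh.

```python
import random


def expected_mean_height(max_growth, steps):
    """Mean of a sum of `steps` draws from Uniform(0, max_growth)."""
    return steps * max_growth / 2


def simulate_mean_height(max_growth, steps, trees=10_000, seed=0):
    """Monte Carlo estimate of mean tree height after `steps` growth steps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trees):
        total += sum(rng.uniform(0, max_growth) for _ in range(steps))
    return total / trees


for max_growth in (10, 50, 100):
    analytic = expected_mean_height(max_growth, steps=10)
    simulated = simulate_mean_height(max_growth, steps=10)
    print(max_growth, analytic, round(simulated, 1))
```

Under these assumptions the trajectories in the figure should be roughly linear in the step number, with slopes proportional to maxGrowth. Large deviations from that pattern would suggest a problem with the configuration or the data load.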

Summary

This demo illustrated the manual joshpy workflow using each component directly:

  1. Define a parameter sweep using JobConfig and SweepConfig
  2. Expand jobs with JobExpander to get concrete job specifications
  3. Register jobs with registry.register_run() for tracking
  4. Execute with run_sweep() for automatic recording
  5. Load outputs with recover_sweep_results() for automatic path discovery
  6. Analyze - see Analysis Tutorial for visualization and queries

Key Design Principles:

  • Thin CLI wrapper: JoshCLI maps 1:1 to CLI commands
  • Thin DuckDB wrapper: Direct registry.conn access for custom SQL
  • Convenience helpers: run_sweep() and recover_sweep_results() for common patterns
  • Full control: Each step is explicit and visible

Cleanup

job_set.cleanup()  # Remove temporary config files
registry.close()

The registry has been saved to demo_registry.duckdb. Run the Analysis Tutorial to explore the results.