Finding Bottlenecks

Find and fix slow attributes in josh simulations

Introduction

Josh offers two complementary profiling approaches depending on what you need to understand:

Approach                              What it measures                                              Who it's for
Josh profiling (--enable-profiler)    Wall-clock time per attribute evaluation in the josh language  Model developers optimizing simulation logic
JFR profiling (Java Flight Recorder)  CPU hotspots, GC pauses, memory, I/O inside the JVM            The Josh team diagnosing runtime performance

Most users will want josh profiling – it tells you which attributes in your model are expensive to evaluate, using the same units and export pipeline you already work with. JFR is a lower-level tool that exposes the Java stack trace and is primarily useful for the Josh development team.

Josh Profiling

Josh has a built-in evaluation profiler that measures how long each attribute takes to resolve. It works through a synthetic .evalDuration suffix that you can append to any attribute in your josh script. This tells you which parts of your model are expensive – useful when optimizing a simulation that takes hours.

How it works

Append .evalDuration to any attribute to get its resolution time in milliseconds. Use sum() across an entity population to get total time spent evaluating that attribute, or use it directly on a patch-level attribute.

Without enable_profiler=True, all .evalDuration values are guaranteed 0 ms with zero overhead. .evalDuration is a reserved suffix – you cannot define your own attributes with that name.

Example: finding what to optimize

Imagine a meadow model where shrub cover is tracked as 5000 individual Shrub organisms. Each has a cover attribute, and the patch computes shrubPctCover = count(Shrub) / 10000 – the fraction of a 10,000-shrub carrying capacity. Profiling reveals that resolving cover for all those entities dominates runtime:

from pathlib import Path

print(Path("../../examples/profiler_before.josh").read_text())
start simulation MeadowModel

  grid.size = 1 count
  grid.low = 0 count latitude, 0 count longitude
  grid.high = 3 count latitude, 3 count longitude

  steps.low = 0 count
  steps.high = 10 count

  exportFiles.patch = "file:///tmp/eval_duration_before.csv"

end simulation


start patch Default

  Tree.init = create 200 count of Tree
  Shrub.init = create 5000 count of Shrub

  # Shrub cover as fraction of max capacity (10000)
  shrubPctCover.step = count(Shrub) / 10000

  export.avgTreeHeight.step = mean(Tree.height)
  export.shrubPctCover.step = shrubPctCover
  export.treeHeightMs.step = sum(Tree.height.evalDuration)
  export.shrubCoverMs.step = sum(Shrub.cover.evalDuration)

end patch


start organism Tree

  height.init = sample uniform from 1 m to 5 m
  height.step = prior.height + sample uniform from 0 m to 0.5 m

end organism


start organism Shrub

  cover.init = sample uniform from 0% to 5%
  cover.step = limit prior.cover + sample uniform from 0% to 0.5% to [0%, 100%]

end organism
from joshpy.cli import JoshCLI, RunConfig
from joshpy.jar import JarMode
import pandas as pd

cli = JoshCLI(josh_jar=JarMode.DEV)

result = cli.run(RunConfig(
    script=Path("../../examples/profiler_before.josh"),
    simulation="MeadowModel",
    enable_profiler=True,
))

df_before = pd.read_csv("/tmp/eval_duration_before.csv")
patch = df_before.groupby("step").first().reset_index()

The timing columns (treeHeightMs, shrubCoverMs) appear alongside the regular export columns in the CSV. Let’s pull them out to see where time is being spent:

print("=== Eval duration (ms) per step ===")
print(patch[["step", "treeHeightMs", "shrubCoverMs"]].to_string(index=False))
print()
print(f"Total tree height eval:  {patch['treeHeightMs'].sum():.0f} ms")
print(f"Total shrub cover eval:  {patch['shrubCoverMs'].sum():.0f} ms")
=== Eval duration (ms) per step ===
 step  treeHeightMs  shrubCoverMs
    0      4.719780    658.500628
    1      0.914415    198.071841
    2      0.664683    209.907304
    3      0.293025    119.620102
    4      0.383193     62.064980
    5      0.491278    273.991658
    6      0.474356    237.879601
    7      0.325867    150.985240
    8      0.312148    161.980191
    9      0.277461    132.324344
   10      0.513607    148.344850

Total tree height eval:  9 ms
Total shrub cover eval:  2354 ms

The profiler shows shrubCoverMs dominates – 5000 per-entity evaluations per patch per step add up. But none of the shrub model logic actually requires individual identity; the only output is an aggregate cover fraction. The fix: replace the organism population with a single patch-level attribute that computes the same quantity directly:

print(Path("../../examples/profiler_after.josh").read_text())
start simulation MeadowModel

  grid.size = 1 count
  grid.low = 0 count latitude, 0 count longitude
  grid.high = 3 count latitude, 3 count longitude

  steps.low = 0 count
  steps.high = 10 count

  exportFiles.patch = "file:///tmp/eval_duration_after.csv"

end simulation


start patch Default

  Tree.init = create 200 count of Tree

  # Same quantity as before, but as a direct patch-level attribute
  shrubPctCover.init = 5000 / 10000
  shrubPctCover.step = limit prior.shrubPctCover + sample uniform from 0 to 0.01 to [0, 1]

  export.avgTreeHeight.step = mean(Tree.height)
  export.shrubPctCover.step = shrubPctCover
  export.treeHeightMs.step = sum(Tree.height.evalDuration)
  export.shrubCoverMs.step = shrubPctCover.evalDuration

end patch


start organism Tree

  height.init = sample uniform from 1 m to 5 m
  height.step = prior.height + sample uniform from 0 m to 0.5 m

end organism
result = cli.run(RunConfig(
    script=Path("../../examples/profiler_after.josh"),
    simulation="MeadowModel",
    enable_profiler=True,
))

df_after = pd.read_csv("/tmp/eval_duration_after.csv")
patch = df_after.groupby("step").first().reset_index()

print("=== Eval duration (ms) per step ===")
print(patch[["step", "treeHeightMs", "shrubCoverMs"]].to_string(index=False))
print()
print(f"Total tree height eval:  {patch['treeHeightMs'].sum():.0f} ms")
print(f"Total shrub cover eval:  {patch['shrubCoverMs'].sum():.0f} ms")
=== Eval duration (ms) per step ===
 step  treeHeightMs  shrubCoverMs
    0      1.495436      0.472081
    1      1.241889      0.218667
    2      0.971120      0.164279
    3      1.373083      0.194442
    4      1.094326      0.284344
    5      0.673712      0.163500
    6      0.712840      0.151736
    7      0.510147      0.182008
    8      0.412372      0.109939
    9      0.463403      0.111222
   10      0.606622      0.684023

Total tree height eval:  10 ms
Total shrub cover eval:  3 ms

Same shrub cover fraction, same Tree population – but shrubCoverMs drops to near zero because there are no per-entity evaluations to sum. The profiler made the bottleneck obvious.
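The size of the win can be quantified directly from the exported timing columns. A minimal sketch, using abridged per-step numbers from the runs above as illustrative data (the helper is not part of joshpy):

```python
import pandas as pd

def total_eval_ms(df: pd.DataFrame, column: str) -> float:
    """Sum a per-step .evalDuration export column across all steps."""
    return float(df[column].sum())

# Illustrative per-step timings, abridged from the runs above.
before = pd.DataFrame({"shrubCoverMs": [658.5, 198.1, 209.9, 119.6, 62.1]})
after = pd.DataFrame({"shrubCoverMs": [0.47, 0.22, 0.16, 0.19, 0.28]})

speedup = total_eval_ms(before, "shrubCoverMs") / total_eval_ms(after, "shrubCoverMs")
print(f"shrub cover eval speedup: ~{speedup:.0f}x")
```

In practice you would pass the df_before and df_after frames loaded earlier instead of the hand-copied samples.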

Profiling in sweeps

enable_profiler threads through SweepManager.run() and run_sweep():

results = manager.run(enable_profiler=True)

Key details

  • .evalDuration is a reserved suffix – you cannot define your own attributes named evalDuration.
  • The profiler adds ~4% overhead when enabled and none when disabled.
  • Timing is wall-clock milliseconds.
  • Standard aggregates (mean, sum, etc.) work on .evalDuration across entity populations.
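As a sketch of the aggregate forms (the entity and attribute names here are hypothetical, not from the example above):

```
# Hypothetical attributes, for illustration only
export.totalGrowthMs.step = sum(Animal.growth.evalDuration)
export.meanGrowthMs.step = mean(Animal.growth.evalDuration)
export.patchAttrMs.step = soilMoisture.evalDuration
```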

JFR Profiling

Warning

JFR output exposes the underlying Java stack trace, which is unlikely to help diagnose a scientific error in your simulation. It is most useful for the Josh team to identify and fix performance bottlenecks in the Java runtime. If you have a performance issue, please include the JFR summary and recording when filing a GitHub issue!

For diagnosing scientific errors (e.g., “why are my results different from expected?”), use debug output instead.

When a simulation or preprocessing step is unexpectedly slow, Java Flight Recorder (JFR) can capture detailed JVM profiling data with minimal overhead (~1-2%). JFR is built into OpenJDK 11+ and records CPU hotspots, GC pauses, memory allocation, thread activity, and I/O.

joshpy makes JFR easy to use:

  1. Pass a JfrConfig to any CLI method to enable recording
  2. Use diagnose_jfr() for a resource breakdown (CPU, memory, GC, I/O)
  3. Use summarize_jfr() for a raw text summary to attach to GitHub issues

No extra dependencies are needed – JFR is part of the JDK that Josh already requires.

Step 1: Run a Simulation with JFR

Enable JFR by passing a JfrConfig to any CLI method. The only required field is output – the path where the recording will be written.

from pathlib import Path
from joshpy.cli import JoshCLI, RunConfig, JfrConfig
from joshpy.jar import JarMode

cli = JoshCLI(josh_jar=JarMode.DEV)

jfr_path = Path("/tmp/simulation_profile.jfr")

result = cli.run(
    RunConfig(
        script=Path("../../examples/quickstart.josh"),
        simulation="Main",
        replicates=1,
    ),
    jfr=JfrConfig(output=jfr_path),
)

print(f"Simulation: {'OK' if result.success else 'FAILED'}")
print(f"JFR file exists: {jfr_path.exists()}")
if jfr_path.exists():
    size_kb = jfr_path.stat().st_size / 1024
    print(f"JFR file size: {size_kb:.0f} KB")
Simulation: OK
JFR file exists: True
JFR file size: 1057 KB

The simulation runs normally – JFR records in the background with negligible impact on execution time.
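If you script many profiled runs, a small sanity check after each run can catch missing or truncated recordings early. This helper is a sketch, not part of joshpy:

```python
from pathlib import Path

def check_recording(jfr_path: Path, min_kb: float = 10.0) -> bool:
    """Return True if a JFR recording exists and is non-trivial in size.

    A missing or tiny file usually means the JVM exited before the
    recording was flushed.
    """
    if not jfr_path.exists():
        print(f"missing recording: {jfr_path}")
        return False
    size_kb = jfr_path.stat().st_size / 1024
    if size_kb < min_kb:
        print(f"suspiciously small recording ({size_kb:.0f} KB): {jfr_path}")
        return False
    return True
```

Called as `check_recording(jfr_path)` right after `cli.run(...)`, it replaces the ad-hoc existence check above.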

Step 2: Diagnose Resource Usage

diagnose_jfr() parses the recording into sections — CPU, memory, GC, file I/O, and thread contention — with a diagnosis of the likely bottleneck:

profile = cli.diagnose_jfr(jfr_path)
print(profile.summary())
JFR Resource Diagnostic (7.0s recording)
========================================

--- CPU ---
How much processor time the JVM uses.

  JVM user:      69.7%
  JVM system:    2.5%
  Machine total: 76.2%
  (6 samples)

--- MEMORY ---
Java heap and OS-level memory footprint.

  Heap used:     217 MB / 3.9 GB reserved
  Process RSS:   480 MB peak
  Physical RAM:  15.6 GB

--- GARBAGE COLLECTION ---
GC pauses stop all threads to reclaim memory. High GC time
often means too many entities or too little heap.

  Pauses: 27 collections, 1139 ms total, 110.0 ms max
  GC fraction: 16.3% of recording

--- FILE I/O ---
Time reading/writing files (data loading, CSV export).

  No file I/O events recorded.

--- THREAD CONTENTION ---
Lock contention when running with --parallel.

  Monitor enters: 12 events, 156 ms total

--- DIAGNOSIS ---
Likely bottleneck: GC-bound
GC pauses account for 16% of recording time. This usually means too many entities creating allocation pressure, or the heap is too small. Try reducing entity count or increasing JVM heap size (-Xmx).

The same analysis is available from the command line without writing any Python:

python -m joshpy.jfr /tmp/simulation_profile.jfr
python -m joshpy.jfr /tmp/simulation_profile.jfr --detailed

Step 3: Get a Raw Text Summary

For filing GitHub issues, summarize_jfr() produces the raw JFR event table using the JDK’s bundled jfr CLI tool:

summary = cli.summarize_jfr(jfr_path)

if summary.success:
    # Print the first 80 lines of the summary
    lines = summary.stdout.strip().split("\n")
    for line in lines[:80]:
        print(line)
    if len(lines) > 80:
        print(f"\n... ({len(lines) - 80} more lines)")
else:
    print(f"Summary failed: {summary.stderr}")
Version: 2.1
 Chunks: 1
 Start: 2026-04-11 19:27:35 (UTC)
 Duration: 7 s

 Event Type                              Count  Size (bytes) 
=============================================================
 jdk.GCPhaseParallel                      2972         77599
 jdk.ObjectAllocationSample               1922         29013
 jdk.PromoteObjectInNewPLAB                930         15701
 jdk.ExecutionSample                       606          6665
 jdk.ModuleExport                          582          5641
 jdk.BooleanFlag                           505         15365
 jdk.TenuringDistribution                  405          4154
 jdk.ActiveSetting                         360         10391
 jdk.LongFlag                              161          5220
 jdk.ModuleRequire                         155          1395
 jdk.Deoptimization                        147          3519
 jdk.UnsignedLongFlag                      132          4484
 jdk.Checkpoint                            118        740531
 jdk.MetaspaceChunkFreeListSummary         108          1944
 jdk.GCReferenceStatistics                 108          1092
 jdk.OldObjectSample                       104          3297
 jdk.GCPhasePauseLevel1                    103          4112
 jdk.InitialEnvironmentVariable             97          6106
 jdk.ThreadPark                             86          3324
 jdk.GCPhasePauseLevel2                     66          2832
 jdk.GCHeapSummary                          54          2062
 jdk.MetaspaceSummary                       54          2646
 jdk.G1HeapSummary                          54          1274
 jdk.InitialSecurityProperty                48          2508
 jdk.ExecuteVMOperation                     42           690
 jdk.SafepointBegin                         40           564
 jdk.NativeLibrary                          39          2897
 jdk.ThreadAllocationStatistics             35           361
 jdk.Compilation                            34           922
 jdk.ThreadSleep                            32           608
 jdk.JavaErrorThrow                         30          1269
 jdk.SystemProcess                          29          6978
 jdk.StringFlag                             28           917
 jdk.GarbageCollection                      27           621
 jdk.GCCPUTime                              27           523
 jdk.GCPhasePause                           27           621
 jdk.IntFlag                                25           935
 jdk.PromoteObjectOutsidePLAB               19           297
 jdk.YoungGarbageCollection                 16           208
 jdk.G1GarbageCollection                    16           208
 jdk.G1MMU                                  16           208
 jdk.EvacuationInformation                  16           402
 jdk.G1EvacuationYoungStatistics            16           418
 jdk.G1EvacuationOldStatistics              16           327
 jdk.G1BasicIHOP                            16           587
 jdk.G1AdaptiveIHOP                         16           614
 jdk.NativeMethodSample                     16           171
 jdk.InitialSystemProperty                  15           892
 jdk.DoubleFlag                             15           605
 jdk.UnsignedIntFlag                        14           473
 jdk.JavaMonitorEnter                       12           288
 jdk.ThreadStart                            11           130
 jdk.SystemGC                               11           165
 jdk.OldGarbageCollection                   11           132
 jdk.CompilationFailure                     10           438
 jdk.ResidentSetSize                         6           100
 jdk.GCHeapMemoryPoolUsage                   6           219
 jdk.CPULoad                                 6           114
 jdk.JavaThreadStatistics                    6            66
 jdk.ClassLoadingStatistics                  6            60
 jdk.ClassLoaderStatistics                   6           165
 jdk.CompilerStatistics                      6           169
 jdk.CodeCacheStatistics                     6           192
 jdk.ExceptionStatistics                     6            78
 jdk.ThreadEnd                               2            18
 jdk.GCHeapMemoryUsage                       2            40
 jdk.PhysicalMemory                          2            34
 jdk.GCConfiguration                         2            50
 jdk.Metadata                                1        103215
 jdk.DirectBufferStatistics                  1            23
 jdk.MetaspaceGCThreshold                    1            16
 jdk.ContainerConfiguration                  1            67
 jdk.Shutdown                                1            40

... (112 more lines)

This summary is compact enough to paste directly into a GitHub issue.
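If a summary is still too long to paste comfortably, you can truncate it the same way the cell above does. A small sketch (not a joshpy function):

```python
def trim_summary(text: str, max_lines: int = 40) -> str:
    """Keep the first max_lines lines of a jfr summary and note what was cut."""
    lines = text.strip().splitlines()
    if len(lines) <= max_lines:
        return text.strip()
    kept = lines[:max_lines]
    kept.append(f"... ({len(lines) - max_lines} more lines)")
    return "\n".join(kept)
```

Then `print(trim_summary(summary.stdout))` keeps the issue body short while the attached .jfr file carries the full detail.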

Step 4: Profile Preprocessing

JFR works on any CLI method, not just run(). If preprocessing is slow, profile it the same way:

from joshpy.cli import NetcdfPreprocessConfig

result = cli.preprocess(
    NetcdfPreprocessConfig(
        script=Path("simulation.josh"),
        simulation="Main",
        data_file=Path("temperature.nc"),
        variable="temp",
        units="K",
        output=Path("temperature.jshd"),
    ),
    jfr=JfrConfig(output=Path("/tmp/preprocess_profile.jfr")),
)

The same pattern applies to validate(), discover_config(), inspect_jshd(), and all other CLI methods.

Step 5: Profile a Parameter Sweep

JFR also works through SweepManager.run() and run_sweep(). Each job automatically gets its own recording file with the run_hash in the filename:

from joshpy.sweep import SweepManager
from joshpy.cli import JfrConfig

manager = (
    SweepManager.builder(config)
    .with_registry("experiment.duckdb", experiment_name="profiled_sweep")
    .with_cli(cli)
    .build()
)

# Each job writes to /tmp/sweep_profile_{run_hash}.jfr
results = manager.run(jfr=JfrConfig(output=Path("/tmp/sweep_profile.jfr")))

For a sweep with 3 jobs, this produces:

/tmp/sweep_profile_a1b2c3d4e5f6.jfr
/tmp/sweep_profile_f7e8d9c0b1a2.jfr
/tmp/sweep_profile_3456789abcde.jfr

You can then summarize any individual job’s recording:

for job, result in results:
    jfr_path = Path(f"/tmp/sweep_profile_{job.run_hash}.jfr")
    if jfr_path.exists():
        summary = cli.summarize_jfr(jfr_path)
        print(f"=== {job.parameters} ===")
        print(summary.stdout[:500])

JfrConfig Options

Field     Type        Default     Description
output    Path        (required)  Where to write the .jfr file
settings  str         "profile"   "profile" (~2% overhead, more detail) or "default" (~1%, lighter)
maxsize   str | None  None        Max recording size (e.g., "500m", "1g")

# Default: profile settings, no size limit
JfrConfig(output=Path("recording.jfr"))

# Lighter recording for long runs
JfrConfig(output=Path("recording.jfr"), settings="default")

# Cap file size for very long simulations
JfrConfig(output=Path("recording.jfr"), maxsize="500m")

Analyzing .jfr Files

Quick text summary (no tools needed)

summary = cli.summarize_jfr(Path("recording.jfr"))
print(summary.stdout)

JDK CLI tool (terminal)

The jfr command is bundled with the JDK:

# High-level summary
jfr summary recording.jfr

# Full event dump (verbose)
jfr print recording.jfr

# Filter by event type
jfr print --events jdk.GCPhasePause recording.jfr
jfr print --events jdk.CPULoad recording.jfr

GUI tools (detailed analysis)

For deeper investigation:

  • JDK Mission Control: Free, open-source GUI. Download from adoptium.net and open the .jfr file.
  • IntelliJ IDEA: Open the .jfr file directly (built-in profiler support).

Reporting Performance Issues

When filing a GitHub issue for a performance problem:

  1. Reproduce with JFR enabled:

    result = cli.run(config, jfr=JfrConfig(output=Path("issue_profile.jfr")))
  2. Get the text summary:

    summary = cli.summarize_jfr(Path("issue_profile.jfr"))
    print(summary.stdout)
  3. Attach both to your issue:

    • Paste the text summary in the issue body
    • Attach the .jfr file for detailed analysis

Cleanup

import os

for f in ["/tmp/simulation_profile.jfr", "/tmp/eval_duration_before.csv", "/tmp/eval_duration_after.csv"]:
    if os.path.exists(f):
        os.remove(f)

Learn More