Josh offers two complementary profiling approaches depending on what you need to understand:
| Approach | What it measures | Who it's for |
| --- | --- | --- |
| Josh profiling (`--enable-profiler`) | Wall-clock time per attribute evaluation in the josh language | Model developers optimizing simulation logic |
| JFR profiling (Java Flight Recorder) | CPU hotspots, GC pauses, memory, I/O inside the JVM | Josh team diagnosing runtime performance |
Most users will want josh profiling – it tells you which attributes in your model are expensive to evaluate, using the same units and export pipeline you already work with. JFR is a lower-level tool that exposes the Java stack trace and is primarily useful for the Josh development team.
Josh Profiling
Josh has a built-in evaluation profiler that measures how long each attribute takes to resolve. It works through a synthetic .evalDuration suffix that you can append to any attribute in your josh script. This tells you which parts of your model are expensive – useful when optimizing a simulation that takes hours.
How it works
Append .evalDuration to any attribute to get its resolution time in milliseconds. Use sum() across an entity population to get total time spent evaluating that attribute, or use it directly on a patch-level attribute.
Without `enable_profiler=True`, every `.evalDuration` value is guaranteed to be 0 ms and adds zero overhead. `.evalDuration` is a reserved suffix – you cannot define your own attributes with that name.
Example: finding what to optimize
Imagine a meadow model where shrub cover is tracked as 5000 individual Shrub organisms. Each has a cover attribute, and the patch computes shrubPctCover = count(Shrub) / 10000 – the fraction of a 10,000-shrub carrying capacity. Profiling reveals that resolving cover for all those entities dominates runtime:
```python
from pathlib import Path

print(Path("../../examples/profiler_before.josh").read_text())
```
```
start simulation MeadowModel
  grid.size = 1 count
  grid.low = 0 count latitude, 0 count longitude
  grid.high = 3 count latitude, 3 count longitude
  steps.low = 0 count
  steps.high = 10 count
  exportFiles.patch = "file:///tmp/eval_duration_before.csv"
end simulation

start patch Default
  Tree.init = create 200 count of Tree
  Shrub.init = create 5000 count of Shrub

  # Shrub cover as fraction of max capacity (10000)
  shrubPctCover.step = count(Shrub) / 10000

  export.avgTreeHeight.step = mean(Tree.height)
  export.shrubPctCover.step = shrubPctCover
  export.treeHeightMs.step = sum(Tree.height.evalDuration)
  export.shrubCoverMs.step = sum(Shrub.cover.evalDuration)
end patch

start organism Tree
  height.init = sample uniform from 1 m to 5 m
  height.step = prior.height + sample uniform from 0 m to 0.5 m
end organism

start organism Shrub
  cover.init = sample uniform from 0% to 5%
  cover.step = limit prior.cover + sample uniform from 0% to 0.5% to [0%, 100%]
end organism
```
The timing columns (treeHeightMs, shrubCoverMs) appear alongside the regular export columns in the CSV. Let’s pull them out to see where time is being spent:
```python
print("=== Eval duration (ms) per step ===")
print(patch[["step", "treeHeightMs", "shrubCoverMs"]].to_string(index=False))
print()
print(f"Total tree height eval: {patch['treeHeightMs'].sum():.0f} ms")
print(f"Total shrub cover eval: {patch['shrubCoverMs'].sum():.0f} ms")
```
```
=== Eval duration (ms) per step ===
 step  treeHeightMs  shrubCoverMs
    0      4.719780    658.500628
    1      0.914415    198.071841
    2      0.664683    209.907304
    3      0.293025    119.620102
    4      0.383193     62.064980
    5      0.491278    273.991658
    6      0.474356    237.879601
    7      0.325867    150.985240
    8      0.312148    161.980191
    9      0.277461    132.324344
   10      0.513607    148.344850

Total tree height eval: 9 ms
Total shrub cover eval: 2354 ms
```
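With many timing columns, a few lines of pandas can rank attributes by total evaluation time. A small sketch, using the column names from the export above but illustrative values (not a real run):

```python
import pandas as pd

# Illustrative values in the shape of the patch export above
patch = pd.DataFrame({
    "step": [0, 1, 2],
    "treeHeightMs": [4.72, 0.91, 0.66],
    "shrubCoverMs": [658.50, 198.07, 209.91],
})

# Total eval time per attribute, most expensive first
totals = patch.drop(columns="step").sum().sort_values(ascending=False)
print(totals.idxmax())  # -> shrubCoverMs
```

The same pattern scales to any number of `*Ms` export columns, so the dominant attribute surfaces immediately.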
The profiler shows shrubCoverMs dominates – 5000 per-entity evaluations per patch per step add up. But none of the shrub model logic actually requires individual identity; the only output is an aggregate cover fraction. The fix: replace the organism population with a single patch-level attribute that computes the same quantity directly:
```
start simulation MeadowModel
  grid.size = 1 count
  grid.low = 0 count latitude, 0 count longitude
  grid.high = 3 count latitude, 3 count longitude
  steps.low = 0 count
  steps.high = 10 count
  exportFiles.patch = "file:///tmp/eval_duration_after.csv"
end simulation

start patch Default
  Tree.init = create 200 count of Tree

  # Same quantity as before, but as a direct patch-level attribute
  shrubPctCover.init = 5000 / 10000
  shrubPctCover.step = limit prior.shrubPctCover + sample uniform from 0 to 0.01 to [0, 1]

  export.avgTreeHeight.step = mean(Tree.height)
  export.shrubPctCover.step = shrubPctCover
  export.treeHeightMs.step = sum(Tree.height.evalDuration)
  export.shrubCoverMs.step = shrubPctCover.evalDuration
end patch

start organism Tree
  height.init = sample uniform from 1 m to 5 m
  height.step = prior.height + sample uniform from 0 m to 0.5 m
end organism
```
```python
import pandas as pd

result = cli.run(RunConfig(
    script=Path("../../examples/profiler_after.josh"),
    simulation="MeadowModel",
    enable_profiler=True,
))
df_after = pd.read_csv("/tmp/eval_duration_after.csv")
patch = df_after.groupby("step").first().reset_index()

print("=== Eval duration (ms) per step ===")
print(patch[["step", "treeHeightMs", "shrubCoverMs"]].to_string(index=False))
print()
print(f"Total tree height eval: {patch['treeHeightMs'].sum():.0f} ms")
print(f"Total shrub cover eval: {patch['shrubCoverMs'].sum():.0f} ms")
```
```
=== Eval duration (ms) per step ===
 step  treeHeightMs  shrubCoverMs
    0      1.495436      0.472081
    1      1.241889      0.218667
    2      0.971120      0.164279
    3      1.373083      0.194442
    4      1.094326      0.284344
    5      0.673712      0.163500
    6      0.712840      0.151736
    7      0.510147      0.182008
    8      0.412372      0.109939
    9      0.463403      0.111222
   10      0.606622      0.684023

Total tree height eval: 10 ms
Total shrub cover eval: 3 ms
```
Same shrub cover fraction, same Tree population – but shrubCoverMs drops to near zero because there are no per-entity evaluations to sum. The profiler made the bottleneck obvious.
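The gain mirrors a familiar pattern outside josh as well. A toy analogue in plain Python (not joshpy; the counts and ranges echo the example above but the timings are whatever your machine produces):

```python
import random
import time

# Per-entity update: step 5000 individual shrub covers, clamped to [0, 1]
random.seed(0)
covers = [random.uniform(0.0, 0.05) for _ in range(5000)]

t0 = time.perf_counter()
covers = [min(max(c + random.uniform(0.0, 0.005), 0.0), 1.0) for c in covers]
per_entity_s = time.perf_counter() - t0

# Aggregate update: one clamped fraction, computed directly
t0 = time.perf_counter()
shrub_pct_cover = min(max(0.5 + random.uniform(0.0, 0.01), 0.0), 1.0)
aggregate_s = time.perf_counter() - t0

print(f"per-entity step: {per_entity_s * 1000:.3f} ms")
print(f"aggregate step:  {aggregate_s * 1000:.3f} ms")
```

One arithmetic update versus 5000 is the whole story; the profiler's job is just to tell you which of your attributes hides the 5000.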
Profiling in sweeps
The `enable_profiler` flag threads through `SweepManager.run()` and `run_sweep()` as well:

```python
results = manager.run(enable_profiler=True)
```
Key details
- `.evalDuration` is a reserved suffix – you cannot define your own attributes named `evalDuration`.
- The profiler adds ~4% overhead when enabled and none when disabled.
- Timings are wall-clock milliseconds.
- Standard aggregates (`mean`, `sum`, etc.) work on `.evalDuration` across entity populations.
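For example, following the `export` syntax used in the scripts above, a mean per-entity cost can be exported alongside the total (the `shrubCoverMeanMs` name is illustrative):

```
# Average per-shrub evaluation cost instead of the population total
export.shrubCoverMeanMs.step = mean(Shrub.cover.evalDuration)
```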
JFR Profiling
Warning
JFR output exposes the underlying Java stack trace, which is unlikely to help diagnose a scientific error in your simulation. It is most useful for the Josh team to identify and fix performance bottlenecks in the Java runtime. If you have a performance issue, please include the JFR summary and recording when filing a GitHub issue!
For diagnosing scientific errors (e.g., “why are my results different from expected?”), use debug output instead.
When a simulation or preprocessing step is unexpectedly slow, Java Flight Recorder (JFR) can capture detailed JVM profiling data with minimal overhead (~1-2%). JFR is built into OpenJDK 11+ and records CPU hotspots, GC pauses, memory allocation, thread activity, and I/O.
joshpy makes JFR easy to use:
- Pass a `JfrConfig` to any CLI method to enable recording
- Use `diagnose_jfr()` for a resource breakdown (CPU, memory, GC, I/O)
- Use `summarize_jfr()` for a raw text summary to attach to GitHub issues
No extra dependencies are needed – JFR is part of the JDK that Josh already requires.
Step 1: Run a Simulation with JFR
Enable JFR by passing a JfrConfig to any CLI method. The only required field is output – the path where the recording will be written.
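A sketch of what this step can look like, reusing the `cli`/`RunConfig` objects from the josh-profiling examples above – note that attaching the `JfrConfig` via a `jfr` keyword is an assumption about the API, not confirmed:

```python
from pathlib import Path

# Sketch only: JfrConfig(output=...) is documented; the `jfr` keyword on
# RunConfig is an assumed attachment point for the recording config.
jfr_path = Path("/tmp/meadow.jfr")
result = cli.run(RunConfig(
    script=Path("../../examples/profiler_before.josh"),
    simulation="MeadowModel",
    jfr=JfrConfig(output=jfr_path),  # assumed keyword
))

# Resource breakdown of the recording (CPU, memory, GC, I/O)
print(cli.diagnose_jfr(jfr_path))
```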
```
JFR Resource Diagnostic (7.0s recording)
========================================

--- CPU ---
How much processor time the JVM uses.
  JVM user:      69.7%
  JVM system:     2.5%
  Machine total: 76.2%
  (6 samples)

--- MEMORY ---
Java heap and OS-level memory footprint.
  Heap used:    217 MB / 3.9 GB reserved
  Process RSS:  480 MB peak
  Physical RAM: 15.6 GB

--- GARBAGE COLLECTION ---
GC pauses stop all threads to reclaim memory. High GC time
often means too many entities or too little heap.
  Pauses:      27 collections, 1139 ms total, 110.0 ms max
  GC fraction: 16.3% of recording

--- FILE I/O ---
Time reading/writing files (data loading, CSV export).
  No file I/O events recorded.

--- THREAD CONTENTION ---
Lock contention when running with --parallel.
  Monitor enters: 12 events, 156 ms total

--- DIAGNOSIS ---
Likely bottleneck: GC-bound
GC pauses account for 16% of recording time. This usually means
too many entities creating allocation pressure, or the heap is
too small. Try reducing entity count or increasing JVM heap
size (-Xmx).
```
The same analysis is also available from the command line, without writing any Python.
For filing GitHub issues, summarize_jfr() produces the raw JFR event table using the JDK’s bundled jfr CLI tool:
```python
summary = cli.summarize_jfr(jfr_path)
if summary.success:
    # Print the first 80 lines of the summary
    lines = summary.stdout.strip().split("\n")
    for line in lines[:80]:
        print(line)
    if len(lines) > 80:
        print(f"\n... ({len(lines) - 80} more lines)")
else:
    print(f"Summary failed: {summary.stderr}")
```
When a sweep writes one recording per job, you can then summarize any individual job's recording:
```python
for job, result in results:
    jfr_path = Path(f"/tmp/sweep_profile_{job.run_hash}.jfr")
    if jfr_path.exists():
        summary = cli.summarize_jfr(jfr_path)
        print(f"=== {job.parameters} ===")
        print(summary.stdout[:500])
```
JfrConfig Options
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `output` | `Path` | (required) | Where to write the `.jfr` file |
| `settings` | `str` | `"profile"` | `"profile"` (~2% overhead, more detail) or `"default"` (~1%, lighter) |
| `maxsize` | `str \| None` | `None` | Max recording size (e.g., `"500m"`, `"1g"`) |
```python
# Default: profile settings, no size limit
JfrConfig(output=Path("recording.jfr"))

# Lighter recording for long runs
JfrConfig(output=Path("recording.jfr"), settings="default")

# Cap file size for very long simulations
JfrConfig(output=Path("recording.jfr"), maxsize="500m")
```