Create a self-contained bottle archive from a registered run.
By default, copies data files into the archive and raises if any are missing. Use omit_jshd=True for lightweight archives when the recipient has the data locally.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| label_or_hash | str | Run label or run_hash to bottle. | required |
| output_dir | str \| Path | Directory for the archive. Default: ./bottles/. | Path('bottles') |
| cli | Any \| None | Optional JoshCLI instance for JAR metadata. | None |
| omit_jshd | bool | If True, skip copying .jshd data files. | False |

Returns
| Name | Type | Description |
|---|---|---|
| | Path | Path to the created .tar.gz archive. |

Raises
| Name | Type | Description |
|---|---|---|
| | KeyError | If the run is not found. |
| | ValueError | If josh content is not stored for the run. |
| | FileNotFoundError | If omit_jshd is False and a data file is missing. |
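The difference between the default behavior and omit_jshd=True can be sketched with the standard library alone. This is only an illustration of the documented contract, not joshpy's implementation: the `make_bottle` helper, its arguments, and the archive layout are all hypothetical.

```python
import tarfile
from pathlib import Path


def make_bottle(name, josh_file, data_files, output_dir="bottles", omit_jshd=False):
    """Hypothetical stand-in: pack a .josh file (and optionally data) into a .tar.gz."""
    josh_file = Path(josh_file)
    data_files = [Path(p) for p in data_files]

    if not omit_jshd:
        # Mirror the documented contract: fail before writing anything
        # if a data file is missing.
        missing = [p for p in data_files if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"missing data files: {missing}")

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    archive = output_dir / f"{name}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(josh_file, arcname=josh_file.name)
        if not omit_jshd:
            # Data files are only copied into full (non-lightweight) archives.
            for p in data_files:
                tar.add(p, arcname=f"data/{p.name}")
    return archive
```

With omit_jshd=True the missing-file check is skipped entirely, which is why the lightweight form only makes sense when the recipient already has the data.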
check_sparsity
registry.RunRegistry.check_sparsity()
Check for sparse columns in cell_data.
Sparse columns (>50% NULL by default) often indicate that different simulation types are being mixed in the same registry, which hurts query performance.
Returns
| Name | Type | Description |
|---|---|---|
| | SparsityReport | SparsityReport with statistics for each variable column. |
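As a rough illustration of what a sparsity check measures, the sketch below computes the NULL fraction per column and flags columns above the 50% threshold. It uses the standard-library sqlite3 module as a stand-in for DuckDB, and the table contents, the `fire.severity` column, and both helper functions are assumptions made for the example rather than part of the registry API.

```python
import sqlite3


def null_fractions(conn, table, columns):
    """Return {column: fraction of rows that are NULL} for the given columns.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if total == 0:
        return {col: 0.0 for col in columns}
    fractions = {}
    for col in columns:
        # COUNT(col) counts only non-NULL values, so the difference is the NULL count.
        non_null = conn.execute(f'SELECT COUNT("{col}") FROM "{table}"').fetchone()[0]
        fractions[col] = (total - non_null) / total
    return fractions


def sparse_columns(fractions, threshold=0.5):
    """Columns whose NULL fraction is strictly above the threshold."""
    return sorted(col for col, frac in fractions.items() if frac > threshold)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE cell_data (step INTEGER, "avg.height" REAL, "fire.severity" REAL)')
conn.executemany(
    "INSERT INTO cell_data VALUES (?, ?, ?)",
    [(0, 1.0, None), (1, 1.5, None), (2, 2.0, None), (3, 2.5, 0.3)],
)

fractions = null_fractions(conn, "cell_data", ["avg.height", "fire.severity"])
print(sparse_columns(fractions))  # ['fire.severity'] (3 of 4 rows are NULL)
```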
Job configuration containing simulation, template, and sweep info. Must have simulation, template_path, and to_dict() attributes (typically a JobConfig from joshpy.jobs).
required
experiment_name
str | None
Name for the experiment. Defaults to config.simulation.
None
session_id
str | None
Optional externally-provided session ID. If None, generates a UUID. This allows the frontend/API layer to manage session IDs (e.g., using project IDs).
Get the number of distinct replicates for a run hash from cell_data.
This is the source-of-truth count, derived from actual loaded data rather than from job_runs metadata. Returns 0 if no data has been loaded yet.
Counts distinct (run_id, replicate) pairs rather than just distinct replicate values, because pooled runs may reuse replicate numbers across different CLI invocations.
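The reason for counting (run_id, replicate) pairs can be seen in a small sketch. It uses sqlite3 in place of DuckDB, and the simplified cell_data schema and the run IDs are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell_data (run_hash TEXT, run_id TEXT, replicate INTEGER, step INTEGER)"
)
rows = [
    # First CLI invocation: replicates 0 and 1, two steps each.
    ("abc123", "run-a", 0, 0), ("abc123", "run-a", 0, 1),
    ("abc123", "run-a", 1, 0), ("abc123", "run-a", 1, 1),
    # Second invocation pooled into the same run hash reuses replicate 0.
    ("abc123", "run-b", 0, 0), ("abc123", "run-b", 0, 1),
]
conn.executemany("INSERT INTO cell_data VALUES (?, ?, ?, ?)", rows)

# Counting replicate values alone merges the two "replicate 0" runs.
naive = conn.execute(
    "SELECT COUNT(DISTINCT replicate) FROM cell_data WHERE run_hash = ?",
    ["abc123"],
).fetchone()[0]

# Counting distinct (run_id, replicate) pairs keeps them separate.
pair_query = (
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT run_id, replicate FROM cell_data WHERE run_hash = ?"
    ")"
)
pairs = conn.execute(pair_query, ["abc123"]).fetchone()[0]

# A run hash with no loaded data yields 0.
empty = conn.execute(pair_query, ["not-loaded"]).fetchone()[0]

print(naive, pairs, empty)  # 2 3 0
```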
Assign a human-readable label to a run configuration.
Labels are unique within a registry. When a collision occurs, the behavior depends on force and on_collision:
- Default: raise ValueError
- force=True: silently drop the old label and reassign
- on_collision="timestamp": rename the old label with a timestamp suffix (e.g., baseline → baseline_20260402_153000) and assign the bare label to the new run
If True, reassign the label even if already taken.
False
on_collision
str | None
Collision strategy. "timestamp" archives the old label with a timestamp suffix. Mutually exclusive with force.
None
Raises
| Name | Type | Description |
|---|---|---|
| | KeyError | If run_hash does not exist. |
| | ValueError | If label is already assigned to a different run and neither force nor on_collision is set, or if both force and on_collision are set, or if on_collision has an invalid value. |
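The collision rules above can be sketched over a plain dictionary. This is a hypothetical model, not the registry's implementation: `assign_label`, the `labels` mapping, and the `now` argument (added so the timestamp is reproducible) are assumptions, and the KeyError check for an unknown run_hash is left out for brevity.

```python
from datetime import datetime


def assign_label(labels, run_hash, label, force=False, on_collision=None, now=None):
    """Hypothetical sketch: `labels` maps label -> run_hash and is updated in place."""
    if force and on_collision is not None:
        raise ValueError("force and on_collision are mutually exclusive")
    if on_collision not in (None, "timestamp"):
        raise ValueError(f"invalid on_collision: {on_collision!r}")

    owner = labels.get(label)
    if owner is not None and owner != run_hash:
        if on_collision == "timestamp":
            # Archive the old assignment under a timestamp-suffixed label.
            stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
            labels[f"{label}_{stamp}"] = owner
        elif not force:
            raise ValueError(f"label {label!r} is already assigned to {owner}")
        # With force=True the old assignment is simply overwritten below.
    labels[label] = run_hash
    return labels


labels = {"baseline": "aaa111"}
assign_label(
    labels, "bbb222", "baseline",
    on_collision="timestamp", now=datetime(2026, 4, 2, 15, 30, 0),
)
print(labels)
# {'baseline': 'bbb222', 'baseline_20260402_153000': 'aaa111'}
```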
list_config_columns
registry.RunRegistry.list_config_columns()
List all parameter column names in config_parameters.
Returns the dynamically-added parameter columns. Column names preserve original names with special characters (e.g., ‘soil.moisture’).
List all export variable names from simulation outputs.
These are the variables exported by Josh simulations, stored as typed columns in the cell_data table. Variable names preserve original .josh names (e.g., ‘avg.height’).
When session_id is provided, only returns variables that have at least one non-NULL value for runs in that session.
Parameters
Name
Type
Description
Default
session_id
str | None
Optional session ID to filter by. If provided, only returns variables with data in that session.
Load debug messages for a run from registered debug output files.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| label_or_hash | str | Run label or run_hash. | required |
| run_id | str \| None | Optional explicit run execution ID. If omitted, uses latest. | None |
| entity_types | list[str] \| None | Optional debug entity types to include. | None |
| existing_only | bool | If True, only load files that currently exist. | True |

Returns
| Name | Type | Description |
|---|---|---|
| | Any | DebugMessageStore with messages merged across all selected files. |

Raises
| Name | Type | Description |
|---|---|---|
| | KeyError | If run/run execution is not found. |
| | ValueError | If no matching debug files are available. |
| | FileNotFoundError | If existing_only=False and any file is missing. |
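How existing_only interacts with the two file-related errors can be sketched as a small selection step. The `select_debug_files` helper below is hypothetical and only models the documented behavior for a list of registered paths; it does not parse debug output or build a DebugMessageStore.

```python
from pathlib import Path


def select_debug_files(paths, existing_only=True):
    """Hypothetical sketch of choosing which registered debug files to load."""
    paths = [Path(p) for p in paths]
    if existing_only:
        # Quietly skip files that have been moved or deleted since registration.
        selected = [p for p in paths if p.is_file()]
    else:
        # Strict mode: any missing file is an error.
        missing = [p for p in paths if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"missing debug files: {missing}")
        selected = paths
    if not selected:
        raise ValueError("no matching debug files are available")
    return selected
```

Under this reading, existing_only=True can still raise ValueError when every registered file has disappeared, while existing_only=False fails earlier on the first sign of a missing file.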
query
registry.RunRegistry.query(sql, params=None)
Execute a SQL query with parameters.
This provides direct access to DuckDB for custom queries beyond the pre-built methods. Use this when you need to run complex queries or explore the data in ways not covered by the API.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| sql | str | SQL query with ? placeholders for parameters. | required |
| params | list \| None | List of parameter values. | None |

Returns
| Name | Type | Description |
|---|---|---|
| | Any | DuckDB relation (call .df() for DataFrame, .fetchall() for tuples). |
Examples
>>> # Get DataFrame
>>> df = registry.query(
...     "SELECT * FROM cell_data WHERE step BETWEEN ? AND ?",
...     [0, 10]
... ).df()

>>> # Get raw results
>>> rows = registry.query(
...     "SELECT COUNT(*) FROM cell_data WHERE run_hash = ?",
...     ["abc123"]
... ).fetchone()
Locate the original .jshc file on disk and check if it still matches.
Looks up the session metadata to find the original config_path, then checks whether the file exists and whether its content has changed since it was registered.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| run_hash | str | The run hash to look up. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | ConfigSourceInfo | A ConfigSourceInfo describing the file’s status. |
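The "exists and still matches" check can be approximated by hashing the file on disk and comparing it with the content captured at registration time. The sketch below is an assumption about how such a check might work: `SourceStatus` is a hypothetical stand-in for ConfigSourceInfo (whose real fields are not listed here), and the use of SHA-256 is illustrative.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SourceStatus:
    """Hypothetical stand-in for ConfigSourceInfo."""
    path: Path
    exists: bool
    matches: bool


def check_source(config_path, registered_content):
    """Compare the file at config_path with the content stored at registration."""
    path = Path(config_path)
    if not path.is_file():
        # The original file was moved or deleted after registration.
        return SourceStatus(path, exists=False, matches=False)
    expected = hashlib.sha256(registered_content.encode("utf-8")).hexdigest()
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return SourceStatus(path, exists=True, matches=(actual == expected))
```

Keeping existence and match as separate flags lets a caller distinguish "file is gone" from "file was edited since registration".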
>>> # Nested with time filter
>>> with registry.spatial_filter(geojson=park_boundary):
...     with registry.time_filter(step_range=(0, 50)):
...         df = queries.get_timeseries("height", run_hash="abc123")