registry.RunRegistry

registry.RunRegistry(
    db_path='josh_runs.duckdb',
    enable_spatial=True,
    _conn=None,
    _spatial_filter_bbox=None,
    _spatial_filter_geojson=None,
    _time_filter_range=None,
)

DuckDB-backed registry for tracking parameter sweeps and job runs.

Supports both file-based persistence and in-memory mode (using ":memory:").

Attributes

Name Type Description
db_path Path | str Path to the DuckDB database file, or ":memory:" for in-memory.
enable_spatial bool If True, load spatial extension and create geometry column.
_conn Any DuckDB connection (created automatically).

Examples

>>> # File-based (persistent)
>>> registry = RunRegistry("experiment.duckdb")
>>> # In-memory (for testing)
>>> registry = RunRegistry(":memory:")
>>> # With spatial support enabled (default)
>>> registry = RunRegistry("experiment.duckdb", enable_spatial=True)
>>> # Context manager
>>> with RunRegistry("experiment.duckdb") as registry:
...     session_id = registry.create_session(...)

Methods

Name Description
check_sparsity Check for sparse columns in cell_data.
close Close the database connection.
complete_run Record the completion of a job run.
create_session Create a new sweep session.
export_results_df Export session results as a pandas DataFrame.
get_config_by_hash Get config information by run hash.
get_configs_for_session Get all configs for a session.
get_data_summary Get summary of all data in registry.
get_run Get run information by ID.
get_runs_by_parameters Query runs by parameter values.
get_runs_for_hash Get all runs for a run hash.
get_session Get session information by ID.
get_session_summary Get aggregated statistics for a session.
list_config_columns List all parameter column names in config_parameters.
list_config_parameters List all config parameter names from sweep configurations.
list_entity_types List all entity types found in cell_data.
list_export_variables List all export variable names from simulation outputs.
list_parameters Deprecated alias for list_config_parameters(); use list_config_parameters() instead.
list_sessions List sessions, optionally filtered by experiment name.
list_variable_columns List all variable column names in cell_data.
list_variables Deprecated alias for list_export_variables(); use list_export_variables() instead.
query Execute a SQL query with parameters.
register_output Register an output file from a run.
register_run Register a job configuration (run specification).
spatial_filter Context manager for spatial filtering of queries.
start_run Record the start of a job run.
time_filter Context manager for temporal filtering of queries.
to_csv Export a table to CSV format.
to_parquet Export a table to Parquet format.
update_session_status Update the status of a session.

check_sparsity

registry.RunRegistry.check_sparsity()

Check for sparse columns in cell_data.

Sparse columns (>50% NULL by default) often indicate that different simulation types are being mixed in the same registry, which hurts query performance.

Returns

Name Type Description
SparsityReport SparsityReport with statistics for each variable column.

Examples

>>> report = registry.check_sparsity()
>>> if report.should_warn:
...     print(report)
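
To make the 50% threshold concrete, the sketch below computes the NULL fraction per column over a few in-memory rows. It does not use RunRegistry or SparsityReport: `null_fractions`, `sparse_columns`, and the sample rows are hypothetical names chosen for illustration, and the library's actual check may be implemented differently.

```python
def null_fractions(rows, columns):
    """Return {column: fraction of rows where the value is None}."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }


def sparse_columns(rows, columns, threshold=0.5):
    """Columns whose NULL fraction is strictly above the threshold."""
    fractions = null_fractions(rows, columns)
    return sorted(col for col, frac in fractions.items() if frac > threshold)


# Hypothetical rows mixing two simulation types: one exports treeCount,
# the other exports grassCover, so each leaves the other column NULL.
rows = [
    {"treeCount": 12, "grassCover": None},
    {"treeCount": 9, "grassCover": None},
    {"treeCount": 15, "grassCover": None},
    {"treeCount": None, "grassCover": 0.4},
]
print(null_fractions(rows, ["treeCount", "grassCover"]))
print(sparse_columns(rows, ["treeCount", "grassCover"]))
```

Here grassCover is NULL in three of four rows, so it is the only column flagged.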

close

registry.RunRegistry.close()

Close the database connection.

complete_run

registry.RunRegistry.complete_run(run_id, exit_code, error_message=None)

Record the completion of a job run.

Parameters

Name Type Description Default
run_id str The run ID to update. required
exit_code int Process exit code (0 = success). required
error_message str | None Error message if run failed. None
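
One way to obtain the exit_code and error_message values is to run the job as a subprocess and inspect the result. The sketch below is an assumption about how a caller might do this, not part of the library: `run_and_collect` is a hypothetical helper, and the failing command is a stand-in for a real simulation. The final call to complete_run is shown only as a comment because it needs a live registry and a run_id from start_run().

```python
import subprocess
import sys


def run_and_collect(cmd):
    """Run a command; return (exit_code, error_message or None)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return 0, None
    # Keep only the last stderr line as a short error message.
    lines = result.stderr.strip().splitlines()
    message = lines[-1] if lines else f"exited with code {result.returncode}"
    return result.returncode, message


# Stand-in for a real simulation command; it writes to stderr and exits with 3.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom\\n'); sys.exit(3)"]
exit_code, error_message = run_and_collect(failing)
print(exit_code, error_message)

# With a registry and a run_id from start_run(), the values would be passed as:
# registry.complete_run(run_id, exit_code, error_message=error_message)
```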

create_session

registry.RunRegistry.create_session(
    config,
    experiment_name=None,
    session_id=None,
)

Create a new sweep session.

Parameters

Name Type Description Default
config Any Job configuration containing simulation, template, and sweep info. Must provide simulation and template_path attributes and a to_dict() method (typically a JobConfig from joshpy.jobs). required
experiment_name str | None Name for the experiment. Defaults to config.simulation. None
session_id str | None Optional externally-provided session ID. If None, generates a UUID. This allows the frontend/API layer to manage session IDs (e.g., using project IDs). None

Returns

Name Type Description
str The session ID (generated or provided).

Examples

>>> from pathlib import Path
>>> from joshpy.jobs import JobConfig, SweepConfig, SweepParameter
>>> config = JobConfig(
...     template_path=Path("template.jshc.j2"),
...     source_path=Path("simulation.josh"),
...     simulation="Main",
...     sweep=SweepConfig(
...         parameters=[SweepParameter(name="maxGrowth", values=[10, 20, 30])]
...     ),
... )
>>> session_id = registry.create_session(config=config)

Note

total_jobs and total_replicates are computed from the JobSet after job expansion. Use job_set.total_jobs and job_set.total_replicates.
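
The session_id behavior described above (use the caller's ID if given, otherwise generate a UUID) can be sketched as follows. `resolve_session_id` is a hypothetical helper, and the choice of uuid4 is an assumption: this reference does not state which UUID version or string format the library uses.

```python
import uuid


def resolve_session_id(session_id=None):
    """Return the caller's ID if given, otherwise a fresh UUID string."""
    # Assumption: uuid4 is used here purely for illustration.
    if session_id is not None:
        return session_id
    return str(uuid.uuid4())


# An externally managed ID (for example, a project ID) is passed through.
print(resolve_session_id("project-42"))

# With no ID, a new one is generated on every call.
generated = resolve_session_id()
print(len(generated))
```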

export_results_df

registry.RunRegistry.export_results_df(session_id)

Export session results as a pandas DataFrame.

Parameters

Name Type Description Default
session_id str The session to export. required

Returns

Name Type Description
Any pandas DataFrame with run results and parameters.

Raises

Name Type Description
ImportError If pandas is not installed.

get_config_by_hash

registry.RunRegistry.get_config_by_hash(run_hash)

Get config information by run hash.

Parameters

Name Type Description Default
run_hash str The run hash to look up. required

Returns

Name Type Description
ConfigInfo | None ConfigInfo if found, None otherwise.

get_configs_for_session

registry.RunRegistry.get_configs_for_session(session_id)

Get all configs for a session.

Parameters

Name Type Description Default
session_id str The session ID to get configs for. required

Returns

Name Type Description
list[ConfigInfo] List of ConfigInfo objects.

get_data_summary

registry.RunRegistry.get_data_summary(session_id=None)

Get summary of all data in registry.

Provides counts, available variables, parameters, and data ranges for diagnostic purposes.

Parameters

Name Type Description Default
session_id str | None Optional session ID to filter by. None

Returns

Name Type Description
DataSummary DataSummary with counts and metadata.

get_run

registry.RunRegistry.get_run(run_id)

Get run information by ID.

Parameters

Name Type Description Default
run_id str The run ID to look up. required

Returns

Name Type Description
RunInfo | None RunInfo if found, None otherwise.

get_runs_by_parameters

registry.RunRegistry.get_runs_by_parameters(**params)

Query runs by parameter values.

Parameters

Name Type Description Default
**params Any Parameter name-value pairs to filter by. {}

Returns

Name Type Description
list[dict[str, Any]] List of dicts containing run info and parameters.
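
Because parameter names may contain characters such as a dot (for example 'soil.moisture'), they cannot always be written as plain keyword arguments. The sketch below shows the standard Python workaround of unpacking a dict into **params. `filter_runs` and the sample runs are hypothetical, and combining the filters with AND and exact equality is an assumption about the matching semantics.

```python
def filter_runs(runs, **params):
    """Keep runs whose parameters equal every given name-value pair."""
    return [
        run
        for run in runs
        if all(run["parameters"].get(name) == value for name, value in params.items())
    ]


runs = [
    {"run_id": "r1", "parameters": {"maxGrowth": 10, "soil.moisture": 0.2}},
    {"run_id": "r2", "parameters": {"maxGrowth": 20, "soil.moisture": 0.2}},
    {"run_id": "r3", "parameters": {"maxGrowth": 10, "soil.moisture": 0.5}},
]

# Plain keyword syntax works for names that are valid identifiers.
by_growth = [r["run_id"] for r in filter_runs(runs, maxGrowth=10)]
print(by_growth)

# 'soil.moisture' is not a valid identifier, so it is passed by
# unpacking a dict.
by_both = [
    r["run_id"]
    for r in filter_runs(runs, **{"maxGrowth": 10, "soil.moisture": 0.2})
]
print(by_both)
```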

get_runs_for_hash

registry.RunRegistry.get_runs_for_hash(run_hash)

Get all runs for a run hash.

Parameters

Name Type Description Default
run_hash str The run hash to get runs for. required

Returns

Name Type Description
list[RunInfo] List of RunInfo objects.

get_session

registry.RunRegistry.get_session(session_id)

Get session information by ID.

Parameters

Name Type Description Default
session_id str The session ID to look up. required

Returns

Name Type Description
SessionInfo | None SessionInfo if found, None otherwise.

get_session_summary

registry.RunRegistry.get_session_summary(session_id)

Get aggregated statistics for a session.

Parameters

Name Type Description Default
session_id str The session ID to summarize. required

Returns

Name Type Description
SessionSummary | None SessionSummary with counts, or None if session not found.

list_config_columns

registry.RunRegistry.list_config_columns()

List all parameter column names in config_parameters.

Returns the dynamically-added parameter columns. Column names preserve original names with special characters (e.g., 'soil.moisture').

Returns

Name Type Description
list[str] Sorted list of parameter column names.

Examples

>>> registry.list_config_columns()
['maxGrowth', 'scenario', 'soil.moisture']

list_config_parameters

registry.RunRegistry.list_config_parameters(session_id=None)

List all config parameter names from sweep configurations.

These are the parameters you defined in your JobConfig sweep, stored as typed columns in the config_parameters table.

Parameters

Name Type Description Default
session_id str | None Optional session ID to filter by. None

Returns

Name Type Description
list[str] Sorted list of parameter names.

Examples

>>> registry.list_config_parameters()
['maxGrowth', 'scenario', 'survivalProb']

list_entity_types

registry.RunRegistry.list_entity_types(session_id=None)

List all entity types found in cell_data.

Parameters

Name Type Description Default
session_id str | None Optional session ID to filter by. None

Returns

Name Type Description
list[str] Sorted list of entity type names.

list_export_variables

registry.RunRegistry.list_export_variables(session_id=None)

List all export variable names from simulation outputs.

These are the variables exported by Josh simulations, stored as typed columns in the cell_data table. Variable names preserve original .josh names (e.g., 'avg.height').

When session_id is provided, only returns variables that have at least one non-NULL value for runs in that session.

Parameters

Name Type Description Default
session_id str | None Optional session ID to filter by. If provided, only returns variables with data in that session. None

Returns

Name Type Description
list[str] Sorted list of variable column names.

Examples

>>> registry.list_export_variables()
['averageAge', 'avg.height', 'treeCount']
>>> registry.list_export_variables(session_id="abc123")
['treeCount']  # Only variables with data in this session
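
The session filter can be pictured as keeping only the columns that hold at least one non-NULL value among that session's rows. The sketch below illustrates this on in-memory dicts. `variables_with_data` is a hypothetical helper, and storing session_id directly on each row is a simplification made for the example, not a statement about the cell_data schema.

```python
def variables_with_data(rows, variables, session_id=None):
    """Sorted variable names; with a session, only those with non-NULL data."""
    if session_id is None:
        return sorted(variables)
    selected = [row for row in rows if row["session_id"] == session_id]
    return sorted(
        var for var in variables
        if any(row.get(var) is not None for row in selected)
    )


variables = ["treeCount", "averageAge", "avg.height"]
rows = [
    {"session_id": "abc123", "treeCount": 4, "averageAge": None, "avg.height": None},
    {"session_id": "other", "treeCount": 2, "averageAge": 7.5, "avg.height": 1.2},
]

all_vars = variables_with_data(rows, variables)
session_vars = variables_with_data(rows, variables, session_id="abc123")
print(all_vars)
print(session_vars)
```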

list_parameters

registry.RunRegistry.list_parameters(session_id=None)

Deprecated alias for list_config_parameters(); use list_config_parameters() instead.

list_sessions

registry.RunRegistry.list_sessions(experiment_name=None, limit=100)

List sessions, optionally filtered by experiment name.

Parameters

Name Type Description Default
experiment_name str | None Filter by experiment name (optional). None
limit int Maximum number of sessions to return. 100

Returns

Name Type Description
list[SessionInfo] List of SessionInfo objects, ordered by creation time (newest first).

list_variable_columns

registry.RunRegistry.list_variable_columns()

List all variable column names in cell_data.

Returns the dynamically-added variable columns. Column names preserve original names with special characters (e.g., 'avg.height').

Returns

Name Type Description
list[str] Sorted list of variable column names.

Examples

>>> registry.list_variable_columns()
['averageAge', 'avg.height', 'treeCount']

list_variables

registry.RunRegistry.list_variables(session_id=None)

Deprecated alias for list_export_variables(); use list_export_variables() instead.

query

registry.RunRegistry.query(sql, params=None)

Execute a SQL query with parameters.

This provides direct access to DuckDB for custom queries beyond the pre-built methods. Use this when you need to run complex queries or explore the data in ways not covered by the API.

Parameters

Name Type Description Default
sql str SQL query with ? placeholders for parameters. required
params list | None List of parameter values. None

Returns

Name Type Description
Any DuckDB relation (call .df() for DataFrame, .fetchall() for tuples).

Examples

>>> # Get DataFrame
>>> df = registry.query(
...     "SELECT * FROM cell_data WHERE step BETWEEN ? AND ?",
...     [0, 10]
... ).df()
>>> # Get raw results
>>> rows = registry.query(
...     "SELECT COUNT(*) FROM cell_data WHERE run_hash = ?",
...     ["abc123"]
... ).fetchone()
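
The reason to pass values through params rather than formatting them into the SQL string is that bound values are treated as data, never as SQL text. The sketch below demonstrates this with Python's built-in sqlite3 module, which is used here only as a dependency-free stand-in that also accepts ? placeholders. It is not DuckDB, and the table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cell_data (run_hash TEXT, step INTEGER, treeCount REAL)")
conn.executemany(
    "INSERT INTO cell_data VALUES (?, ?, ?)",
    [("abc123", 0, 5.0), ("abc123", 5, 7.0), ("abc123", 20, 9.0), ("def456", 5, 3.0)],
)

# Values are bound positionally to the ? placeholders, in order.
rows = conn.execute(
    "SELECT step, treeCount FROM cell_data "
    "WHERE run_hash = ? AND step BETWEEN ? AND ? ORDER BY step",
    ["abc123", 0, 10],
).fetchall()
print(rows)

# A value containing quotes is bound as data, not spliced into the SQL text,
# so it simply matches no rows.
odd = conn.execute(
    "SELECT COUNT(*) FROM cell_data WHERE run_hash = ?",
    ["abc' OR '1'='1"],
).fetchone()
print(odd)
conn.close()
```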

register_output

registry.RunRegistry.register_output(
    run_id,
    output_type,
    file_path,
    file_size=None,
    row_count=None,
)

Register an output file from a run.

Parameters

Name Type Description Default
run_id str The run this output belongs to. required
output_type str Type of output (e.g., 'csv', 'log', 'error'). required
file_path str Path to the output file. required
file_size int | None Size of the file in bytes. None
row_count int | None Number of rows (for tabular data). None

Returns

Name Type Description
str The generated output ID.

register_run

registry.RunRegistry.register_run(
    session_id,
    run_hash,
    josh_path,
    config_content,
    file_mappings,
    parameters,
)

Register a job configuration (run specification).

Parameters

Name Type Description Default
session_id str Session this config belongs to. required
run_hash str MD5 hash of josh + config + file_mappings (12 chars). required
josh_path str Path to the .josh script file. required
config_content str Full text of the rendered configuration. required
file_mappings dict[str, dict[str, str]] | None Dict mapping names to {"path": "...", "hash": "..."}. required
parameters dict[str, Any] Parameter values used to generate this config. required
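
For callers that need to produce a run_hash themselves, the sketch below shows one plausible way to derive a 12-character MD5-based hash from the three inputs. This is an assumption for illustration only: `sketch_run_hash` is a hypothetical function, and this reference does not specify how the library serializes or combines the inputs, so hashes produced this way should not be expected to match the library's.

```python
import hashlib
import json


def sketch_run_hash(josh_text, config_content, file_mappings=None):
    """Illustrative 12-character MD5-based hash of the three inputs."""
    md5 = hashlib.md5()
    md5.update(josh_text.encode("utf-8"))
    md5.update(config_content.encode("utf-8"))
    # sort_keys makes the serialized mapping independent of dict order.
    md5.update(json.dumps(file_mappings or {}, sort_keys=True).encode("utf-8"))
    # Assumption: the 12 characters are the start of the hex digest.
    return md5.hexdigest()[:12]


# Placeholder inputs; the names and values are made up.
mappings = {
    "soil": {"path": "soil.csv", "hash": "aa11"},
    "rain": {"path": "rain.csv", "hash": "bb22"},
}
example = sketch_run_hash("josh script text", "config text", mappings)
print(example)
```

Identical inputs always give the same hash, so the same configuration registered twice maps to one run_hash, while any change to the script, config, or file mappings yields a different one.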

spatial_filter

registry.RunRegistry.spatial_filter(bbox=None, geojson=None)

Context manager for spatial filtering of queries.

All DiagnosticQueries calls made within this context are spatially filtered. Can be nested with time_filter().

Parameters

Name Type Description Default
bbox tuple[float, float, float, float] | None Bounding box as (min_lon, max_lon, min_lat, max_lat). None
geojson str | dict | None GeoJSON polygon string or dict. None

Raises

Name Type Description
ValueError If both bbox and geojson are provided.

Examples

>>> # `queries` is assumed to be a DiagnosticQueries instance for this registry
>>> with registry.spatial_filter(bbox=(-116, -115, 33.5, 34.0)):
...     df = queries.get_timeseries("height", run_hash="abc123")
>>> # Nested with time filter (`park_boundary` is a GeoJSON polygon defined elsewhere)
>>> with registry.spatial_filter(geojson=park_boundary):
...     with registry.time_filter(step_range=(0, 50)):
...         df = queries.get_timeseries("height", run_hash="abc123")
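
Note that the bbox order documented here, (min_lon, max_lon, min_lat, max_lat), differs from the GeoJSON bbox convention of (min_lon, min_lat, max_lon, max_lat). The sketch below reorders a GeoJSON-style box into the documented order. `to_registry_bbox` and `bbox_contains` are hypothetical helpers, and treating the bounds as inclusive is an assumption.

```python
def to_registry_bbox(min_lon, min_lat, max_lon, max_lat):
    """Reorder a GeoJSON-style bbox into (min_lon, max_lon, min_lat, max_lat)."""
    # This sketch does not handle boxes that cross the antimeridian.
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("bbox minimums must not exceed maximums")
    return (min_lon, max_lon, min_lat, max_lat)


def bbox_contains(bbox, lon, lat):
    """True if the point lies inside the box (bounds assumed inclusive)."""
    min_lon, max_lon, min_lat, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# GeoJSON order in, documented order out.
bbox = to_registry_bbox(-116, 33.5, -115, 34.0)
print(bbox)
print(bbox_contains(bbox, -115.5, 33.8))
print(bbox_contains(bbox, -114.0, 33.8))
```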

start_run

registry.RunRegistry.start_run(
    run_hash,
    replicate=0,
    output_path=None,
    metadata=None,
)

Record the start of a job run.

Parameters

Name Type Description Default
run_hash str Run hash for this run. required
replicate int Replicate number (0-indexed). 0
output_path str | None Path where output will be written. None
metadata dict[str, Any] | None Additional metadata. None

Returns

Name Type Description
str The generated run ID.

time_filter

registry.RunRegistry.time_filter(step_range)

Context manager for temporal filtering of queries.

All DiagnosticQueries calls made within this context are filtered to the specified step range. Can be nested with spatial_filter().

Parameters

Name Type Description Default
step_range tuple[int, int] Tuple of (min_step, max_step) inclusive. required

Examples

>>> # `queries` is assumed to be a DiagnosticQueries instance for this registry
>>> with registry.time_filter(step_range=(0, 50)):
...     df = queries.get_timeseries("height", run_hash="abc123")
>>> # Nested with spatial filter
>>> with registry.time_filter(step_range=(10, 20)):
...     with registry.spatial_filter(bbox=(-116, -115, 33.5, 34.0)):
...         df = queries.get_comparison("height", group_by="maxGrowth")
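
To illustrate how nestable filter contexts of this kind can behave, the sketch below uses contextlib to set a filter on entry and restore the previous value on exit. `FilterState` is a hypothetical stand-in written for this example. It is not the library's implementation, and it models only the inclusive step range and a stored bbox.

```python
from contextlib import contextmanager


class FilterState:
    """Hypothetical stand-in holding the currently active filters."""

    def __init__(self):
        self.step_range = None
        self.bbox = None

    @contextmanager
    def time_filter(self, step_range):
        previous = self.step_range
        self.step_range = step_range
        try:
            yield self
        finally:
            # Restore the outer filter even if the body raises.
            self.step_range = previous

    @contextmanager
    def spatial_filter(self, bbox):
        previous = self.bbox
        self.bbox = bbox
        try:
            yield self
        finally:
            self.bbox = previous

    def keeps_step(self, step):
        """Inclusive on both ends, matching (min_step, max_step)."""
        if self.step_range is None:
            return True
        low, high = self.step_range
        return low <= step <= high


state = FilterState()
with state.time_filter((10, 20)):
    with state.spatial_filter((-116, -115, 33.5, 34.0)):
        inside = (state.step_range, state.bbox)
        kept = [s for s in (9, 10, 20, 21) if state.keeps_step(s)]
after = (state.step_range, state.bbox)
print(inside, kept, after)
```

Both filters are active inside the inner block, steps 10 and 20 are kept because the range is inclusive, and both filters are cleared once the blocks exit.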

to_csv

registry.RunRegistry.to_csv(path, table='cell_data')

Export a table to CSV format.

Parameters

Name Type Description Default
path str | Path Output file path. required
table str Table name to export (default: cell_data). 'cell_data'

Examples

>>> registry.to_csv("results.csv")

In R:

>>> # df <- readr::read_csv("results.csv")

to_parquet

registry.RunRegistry.to_parquet(path, table='cell_data')

Export a table to Parquet format.

Parquet is recommended for R/Python analysis due to compression and type preservation.

Parameters

Name Type Description Default
path str | Path Output file path. required
table str Table name to export (default: cell_data). 'cell_data'

Examples

>>> registry.to_parquet("results.parquet")

In R:

>>> # df <- arrow::read_parquet("results.parquet")

In Python:

>>> import pandas as pd
>>> df = pd.read_parquet("results.parquet")

update_session_status

registry.RunRegistry.update_session_status(session_id, status)

Update the status of a session.

Parameters

Name Type Description Default
session_id str The session to update. required
status str New status (e.g., 'running', 'completed', 'failed'). required