The output parameter accepts a dict with a uri (or url) field that enables syncing the local output file to a remote location (S3, GCS, etc.). Syncing can be triggered two ways:
Sync button: a button in the widget UI that uploads on click
ba.sync(): programmatic upload from the notebook
Under the hood the sync methods rely on jupyter_bioacoustic.audio.io.write(), using the app-configuration to set the source and destination locations.
Author: Brookie Guzder-Williams (bguzder-williams@berkeley.edu)
Affiliation: The Eric and Wendy Schmidt Center for Data Science & Environment
Website: https://dse.berkeley.edu/

from jupyter_bioacoustic import BioacousticAnnotator
DATA = 'data/annotate-data.csv'
AUDIO = 'https://dse-soundhub.s3.us-west-2.amazonaws.com/public/audio/dev/20230522_200000.flac'

1. Output Config
When output is a string it behaves as before — a local file path. When output is a dict, the following keys are available:
| Key | Type | Description |
|---|---|---|
| path | str | (required) Local output file path, same as the string form |
| uri / url | str | Remote destination for sync (S3, GCS, etc.) |
| sync_button | bool or str | Show a sync button in the widget. True shows “Sync”; a string sets the label. Defaults to True when uri/url is set |
| recursive | bool | Passed to io.write(); for uploading directories (e.g. partitioned parquet) |
| secrets | list | Auth kwargs for io.write(). Uses the same {key, value} format as data_secrets |
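
The rules in the table can be pictured with a small normalizing function. This is only a sketch: `normalize_output` is a hypothetical name, not part of jupyter_bioacoustic, and the defaults for `recursive`, `secrets`, and for `sync_button` when no uri is set are assumptions made for illustration.

```python
from typing import Any, Dict, Union


def normalize_output(output: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: expand `output` into a full config dict."""
    if isinstance(output, str):
        # String form: just a local file path, no remote sync.
        output = {'path': output}
    if 'path' not in output:
        raise ValueError("output dict requires a 'path' key")
    # `uri` and `url` are treated as aliases for the remote destination.
    uri = output.get('uri') or output.get('url')
    return {
        'path': output['path'],
        'uri': uri,
        # The button defaults to on only when a remote destination exists
        # (the "off" default without a uri is an assumption).
        'sync_button': output.get('sync_button', uri is not None),
        'recursive': output.get('recursive', False),  # assumed default
        'secrets': output.get('secrets', []),         # assumed default
    }
```

With this sketch, `normalize_output('outputs/results.csv')` yields a config with no uri and no button, while passing a dict with `url` set turns the button on unless `sync_button` says otherwise.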
Authentication
S3 auth uses boto3 defaults (env vars, ~/.aws/credentials, IAM role). To specify a named profile or pass credentials explicitly, use the secrets field:
output:
  path: outputs/results.csv
  uri: s3://my-bucket/project/results.csv
  secrets:
    - key: profile_name
      value: env:AWS_PROFILE  # reads $AWS_PROFILE

Secret values support three formats:
env:VAR_NAME reads from an environment variable
dialog prompts the user via an input dialog
any other string is used as a literal value
The secret keys are passed as kwargs to io.write(). For S3 the relevant keys are profile_name, region_name, and client (a pre-configured boto3 S3 client).
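
The three value formats can be illustrated with a short resolver. This is a minimal sketch under stated assumptions, not the library's implementation: `resolve_secrets` is a hypothetical name, and the `prompt` callable stands in for whatever input dialog the widget actually shows.

```python
import os
from typing import Callable, Dict, List


def resolve_secrets(
    secrets: List[Dict[str, str]],
    prompt: Callable[[str], str] = input,
) -> Dict[str, str]:
    """Hypothetical helper: turn {key, value} entries into keyword arguments."""
    kwargs = {}
    for entry in secrets:
        key, value = entry['key'], entry['value']
        if value.startswith('env:'):
            # env:VAR_NAME -> read from the environment (KeyError if unset).
            kwargs[key] = os.environ[value[len('env:'):]]
        elif value == 'dialog':
            # dialog -> ask the user; `prompt` stands in for the widget dialog.
            kwargs[key] = prompt(f'{key}: ')
        else:
            # Anything else is used verbatim.
            kwargs[key] = value
    return kwargs
```

The resulting dict is what would then be unpacked into the upload call, e.g. `{'profile_name': 'soundhub'}` when `$AWS_PROFILE` is `soundhub`.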
2. Sync Button
When a uri is configured the widget adds a sync button to the bottom-right of the form panel. Clicking it uploads the current output file, overwriting the remote copy. The button disables during upload and re-enables when complete.
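
The disable-during-upload behaviour follows a common pattern, sketched below in plain Python. `SyncButton` and `click` are hypothetical stand-ins, not the widget's real classes, and re-enabling the button after a failed upload is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyncButton:
    """Stand-in for the widget button; only tracks the disabled state."""
    label: str = 'Sync'
    disabled: bool = False


def click(button: SyncButton, upload: Callable[[], None]) -> None:
    """Run `upload` with the button disabled for the duration."""
    if button.disabled:
        return  # an upload is already in progress; ignore the click
    button.disabled = True
    try:
        upload()
    finally:
        # Re-enable whether the upload succeeded or raised (assumed behaviour).
        button.disabled = False
```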
The config below uses sync_button: 'Sync to S3' for a custom label. Set sync_button: false to hide the button while still allowing programmatic sync via ba.sync().
ba = BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    output={
        'path': 'outputs/sync-example.csv',
        'uri': 's3://my-bucket/project/annotations/sync-example.csv',
        'sync_button': 'Sync to S3',
        'secrets': [{'key': 'profile_name', 'value': 'soundhub'}],
    },
    form_config={
        'title': {'value': 'REVIEW', 'progress_tracker': True},
        'pass_value': {'source_column': 'id', 'column': 'detection_id'},
        'select': {
            'label': 'Is Valid',
            'column': 'is_valid',
            'required': True,
            'items': ['yes', 'no'],
        },
        'submission_buttons': {
            'line': True,
            'next': {'label': 'Skip'},
            'submit': {'label': 'Verify'},
        },
    },
)
ba.open()
ba.output()

The same configuration via a config file:
ba = BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    config='config/sync-example.yaml',
)
ba.open()
ba.output()
ba.sync(dest='s3://dse-soundhub/dev/annotations/sync-example-2.csv')

# config/sync-example.yaml
ident_column: common_name
data_columns: [common_name, confidence, start_time, county]
display_columns: [scientific_name]
output:
  path: outputs/sync-example.csv
  uri: s3://my-bucket/project/annotations/sync-example.csv
  sync_button: Sync to S3
  secrets:
    - key: profile_name
      value: env:AWS_PROFILE
form_config:
  title:
    value: REVIEW
    progress_tracker: true
  pass_value:
    source_column: id
    column: detection_id
  select:
    label: Is Valid
    column: is_valid
    required: true
    items:
      - label: 'yes'
        value: 'yes'
      - label: 'no'
        value: 'no'
  submission_buttons:
    line: true
    next:
      label: Skip
    submit:
      label: Verify

3. Programmatic Sync
ba.sync() uploads the output file to the configured uri. This is useful for scripted workflows, scheduled uploads, or syncing after a batch of annotations.
# sync to the configured uri (s3://my-bucket/project/annotations/sync-example.csv)
ba.sync()

You can override the destination or pass additional auth kwargs:
# override destination
ba.sync(dest='s3://other-bucket/backup/sync-example.csv')
# override auth
ba.sync(profile_name='prod', region_name='us-east-1')

For more control, use io.write() directly; ba.sync() is a convenience wrapper around it:
from jupyter_bioacoustic.audio import io
io.write(
    'outputs/sync-example.csv',
    's3://my-bucket/project/annotations/sync-example.csv',
    profile_name='my-profile',
)
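
The wrapper relationship between ba.sync() and io.write() can be sketched as follows. This is an illustration, not the library's code: the `sync` function, the `write` parameter (standing in for io.write), and the `auth` key holding already-resolved secrets are all hypothetical, and the rule that call-time kwargs win over configured ones is an assumption.

```python
from typing import Any, Callable, Dict, Optional


def sync(
    config: Dict[str, Any],
    write: Callable[..., Any],
    dest: Optional[str] = None,
    **overrides: Any,
) -> None:
    """Hypothetical wrapper: upload config['path'] via `write`."""
    # An explicit dest takes precedence over the configured uri.
    target = dest or config.get('uri')
    if target is None:
        raise ValueError('no destination: set uri/url or pass dest')
    # Configured auth kwargs first; call-time overrides win on conflict.
    kwargs = {**config.get('auth', {}), **overrides}
    write(config['path'], target, **kwargs)
```

Passing the writer in as an argument keeps the sketch testable without network access: any callable with a `(source, destination, **kwargs)` shape can be substituted.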