
Sync

Authors
Affiliations
The Eric and Wendy Schmidt Center for Data Science & Environment
University of California, Berkeley

The output parameter accepts a dict with a uri (or url) field that enables syncing the local output file to a remote location (S3, GCS, etc.). Syncing can be triggered two ways:

  • Sync button — a button in the widget UI that uploads on click

  • ba.sync() — programmatic upload from the notebook

Under the hood, both sync methods call jupyter_bioacoustic.audio.io.write(), using the app configuration to set the source (the local output path) and the destination (the configured uri).
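The source handed to io.write() is a local path, and the destination is a remote URI. As a rough illustration of what such a destination encodes, the hypothetical helper below splits an S3 or GCS URI into scheme, bucket, and key using only the standard library. How io.write() actually dispatches on the scheme is not documented here, so treat the dispatch idea as an assumption.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_remote_uri(uri: str) -> Tuple[str, str, str]:
    """Split 's3://bucket/key'-style URIs into (scheme, bucket, key).

    Hypothetical helper for illustration; not part of jupyter_bioacoustic.
    """
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        # Plain paths (or file:// URIs) have no bucket; the whole path is the key.
        return ("file", "", parsed.path)
    # For s3:// and gs:// the bucket is the netloc; drop the leading slash from the key.
    return (parsed.scheme, parsed.netloc, parsed.path.lstrip("/"))


print(split_remote_uri("s3://my-bucket/project/results.csv"))
```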

Author: Brookie Guzder-Williams (bguzder-williams@berkeley.edu)
Affiliation: The Eric and Wendy Schmidt Center for Data Science & Environment
Website: https://dse.berkeley.edu/
from jupyter_bioacoustic import BioacousticAnnotator

DATA = 'data/annotate-data.csv'
AUDIO = 'https://dse-soundhub.s3.us-west-2.amazonaws.com/public/audio/dev/20230522_200000.flac'

1. Output Config

When output is a string, it behaves as before: a local file path. When output is a dict, the following keys are available:

Key          Type          Description
path         str           (required) Local output file path; same as the string form
uri / url    str           Remote destination for sync (S3, GCS, etc.)
sync_button  bool or str   Show a sync button in the widget. True shows “Sync”; a string sets the label. Defaults to True when uri/url is set
recursive    bool          Passed to io.write(); for uploading directories (e.g. partitioned parquet)
secrets      list          Auth kwargs for io.write(). Uses the same {key, value} format as data_secrets
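The rules in the table can be summarized as a small normalization step: the string form versus the dict form, uri/url aliasing, and the sync_button default. The function below is a hypothetical sketch of that logic, not the library's internal code. The returned key names, such as button_label, are assumptions made for illustration.

```python
from typing import Any, Dict, Optional, Union


def normalize_output(output: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Normalize the `output` argument into one dict (hypothetical sketch)."""
    if isinstance(output, str):
        # String form: a local path only, so there is nothing to sync.
        return {"path": output, "uri": None, "button_label": None,
                "recursive": False, "secrets": []}
    if "path" not in output:
        raise ValueError("output dict requires a 'path' key")
    # 'uri' and 'url' are aliases for the remote destination.
    uri: Optional[str] = output.get("uri", output.get("url"))
    # sync_button defaults to True only when a remote destination is set.
    sync_button = output.get("sync_button", uri is not None)
    button_label: Optional[str]
    if uri is None or sync_button is False:
        button_label = None            # no destination, or button explicitly hidden
    elif isinstance(sync_button, str):
        button_label = sync_button     # custom label, e.g. 'Sync to S3'
    else:
        button_label = "Sync"          # True falls back to the default label
    return {
        "path": output["path"],
        "uri": uri,
        "button_label": button_label,
        "recursive": bool(output.get("recursive", False)),
        "secrets": list(output.get("secrets", [])),
    }
```

With `{'path': 'outputs/results.csv', 'url': 's3://my-bucket/results.csv'}` this sketch resolves the uri from the url alias and labels the button “Sync”. Setting `'sync_button': False` keeps the uri (so programmatic sync still works) but drops the button.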

Authentication

S3 auth uses boto3 defaults (env vars, ~/.aws/credentials, IAM role). To specify a named profile or pass credentials explicitly, use the secrets field:

output:
    path: outputs/results.csv
    uri: s3://my-bucket/project/results.csv
    secrets:
        - key: profile_name
          value: env:AWS_PROFILE      # reads $AWS_PROFILE

Secret values support three formats:

  • env:VAR_NAME — reads from an environment variable

  • dialog — prompts the user via input dialog

  • any other string — used as a literal value

The secret keys are passed as kwargs to io.write(). For S3 the relevant keys are profile_name, region_name, and client (a pre-configured boto3 S3 client).
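The three secret formats amount to a simple resolution pass over the {key, value} list. The sketch below is a hypothetical illustration, not the library's implementation. The prompt parameter is an assumption standing in for the widget's input dialog so the logic can run outside Jupyter. Raising on a missing environment variable is this sketch's choice; the real behavior is not documented.

```python
import os
from typing import Any, Callable, Dict, List


def resolve_secrets(
    secrets: List[Dict[str, str]],
    prompt: Callable[[str], str] = input,
) -> Dict[str, Any]:
    """Turn [{key, value}] entries into io.write() kwargs (hypothetical sketch)."""
    kwargs: Dict[str, Any] = {}
    for entry in secrets:
        key, value = entry["key"], entry["value"]
        if value.startswith("env:"):
            # 'env:AWS_PROFILE' -> contents of $AWS_PROFILE
            var = value[len("env:"):]
            if var not in os.environ:
                # This sketch fails loudly; the library's behavior may differ.
                raise KeyError(f"environment variable {var!r} is not set")
            kwargs[key] = os.environ[var]
        elif value == "dialog":
            # The widget shows an input dialog; here any callable works.
            kwargs[key] = prompt(f"{key}: ")
        else:
            # Any other string is used as a literal value.
            kwargs[key] = value
    return kwargs
```

For the YAML example under Authentication, this would produce `{'profile_name': <value of $AWS_PROFILE>}`. That is the kind of dict that ends up as keyword arguments to io.write().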


2. Sync Button

When a uri is configured, the widget adds a sync button to the bottom-right of the form panel. Clicking it uploads the current output file, overwriting the remote copy. The button is disabled during the upload and re-enabled when it completes.
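The disable/re-enable behavior is a common guard against overlapping uploads from repeated clicks. The class below is a hypothetical, widget-free stand-in used only to show the pattern. The source does not say whether the real button re-enables after a failed upload; this sketch uses try/finally so that it does.

```python
from typing import Callable


class SyncButton:
    """Minimal stand-in for the widget's sync button (hypothetical sketch)."""

    def __init__(self, label: str = "Sync") -> None:
        self.label = label
        self.disabled = False

    def click(self, upload: Callable[[], None]) -> None:
        # Disable first so repeated clicks cannot start overlapping uploads.
        self.disabled = True
        try:
            upload()
        finally:
            # Re-enable even if the upload raises, so the user can retry.
            self.disabled = False
```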

The config below uses sync_button: 'Sync to S3' for a custom label. Set sync_button: false to hide the button while still allowing programmatic sync via ba.sync().

ba = BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    output={
        'path': 'outputs/sync-example.csv',
        'uri': 's3://my-bucket/project/annotations/sync-example.csv',
        'sync_button': 'Sync to S3',
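        # secrets use the documented list-of-{key, value} format; profile_name is the boto3 kwarg
        'secrets': [{'key': 'profile_name', 'value': 'soundhub'}],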
    },
    form_config={
        'title': {'value': 'REVIEW', 'progress_tracker': True},
        'pass_value': {'source_column': 'id', 'column': 'detection_id'},
        'select': {
            'label': 'Is Valid',
            'column': 'is_valid',
            'required': True,
            'items': ['yes', 'no'],
        },
        'submission_buttons': {
            'line': True,
            'next': {'label': 'Skip'},
            'submit': {'label': 'Verify'},
        },
    },
)
ba.open()
ba.output()

The same configuration via a config file:

ba = BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    config='config/sync-example.yaml',
)
ba.open()
ba.output()
ba.sync(dest='s3://dse-soundhub/dev/annotations/sync-example-2.csv')
# config/sync-example.yaml
ident_column: common_name
data_columns: [common_name, confidence, start_time, county]
display_columns: [scientific_name]

output:
    path: outputs/sync-example.csv
    uri: s3://my-bucket/project/annotations/sync-example.csv
    sync_button: Sync to S3
    secrets:
        - key: profile_name
          value: env:AWS_PROFILE

form_config:
    title:
        value: REVIEW
        progress_tracker: true
    pass_value:
        source_column: id
        column: detection_id
    select:
        label: Is Valid
        column: is_valid
        required: true
        items:
            - label: 'yes'
              value: 'yes'
            - label: 'no'
              value: 'no'
    submission_buttons:
        line: true
        next:
            label: Skip
        submit:
            label: Verify

3. Programmatic Sync

ba.sync() uploads the output file to the configured uri. This is useful for scripted workflows, scheduled uploads, or syncing after a batch of annotations.

# sync to the configured uri (s3://my-bucket/project/annotations/sync-example.csv)
ba.sync()

You can override the destination or pass additional auth kwargs:

# override destination
ba.sync(dest='s3://other-bucket/backup/sync-example.csv')

# override auth
ba.sync(profile_name='prod', region_name='us-east-1')
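The override behavior can be pictured as a thin wrapper. An explicit dest replaces the configured uri, and call-time keyword arguments are layered over those resolved from secrets. The function below is a hypothetical sketch, not ba.sync() itself. In particular, it assumes that call-time kwargs override config secrets key by key rather than replacing them entirely. A fake write function stands in for io.write() so the example runs without cloud credentials.

```python
from typing import Any, Callable, Dict, Optional


def sync(
    path: str,
    configured_uri: Optional[str],
    configured_kwargs: Dict[str, Any],
    write: Callable[..., Any],
    dest: Optional[str] = None,
    **kwargs: Any,
) -> Any:
    """Hypothetical sketch of a ba.sync()-style wrapper around a write function."""
    # An explicit dest wins over the uri from the output config.
    target = dest if dest is not None else configured_uri
    if target is None:
        raise ValueError("no destination: set output['uri'] or pass dest=")
    # Call-time kwargs take precedence over those resolved from config secrets.
    merged = {**configured_kwargs, **kwargs}
    return write(path, target, **merged)


calls = []
fake_write = lambda src, dst, **kw: calls.append((src, dst, kw))
sync("outputs/sync-example.csv", "s3://my-bucket/project/annotations/sync-example.csv",
     {"profile_name": "dev"}, fake_write, profile_name="prod", region_name="us-east-1")
print(calls[-1])
```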

For more control, call io.write() directly; ba.sync() is a convenience wrapper around it:

from jupyter_bioacoustic.audio import io

io.write(
    'outputs/sync-example.csv',
    's3://my-bucket/project/annotations/sync-example.csv',
    profile_name='my-profile',
)
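Calling io.write() directly is also where the recursive option from the output table comes into play. It covers directory outputs such as partitioned parquet. One plausible way to picture a recursive upload is as a mapping from every file under the local directory to a key under the remote prefix, with relative paths preserved. The planner below is a hypothetical illustration built on pathlib. It uploads nothing, and io.write() is not necessarily implemented this way.

```python
from pathlib import Path
from typing import List, Tuple


def plan_recursive_upload(local_dir: str, dest_prefix: str) -> List[Tuple[str, str]]:
    """List (local file, remote URI) pairs for a directory upload (hypothetical)."""
    root = Path(local_dir)
    prefix = dest_prefix.rstrip("/")
    pairs: List[Tuple[str, str]] = []
    for file in sorted(root.rglob("*")):
        if file.is_file():
            # Preserve the directory layout, e.g. year=2023/part-0.parquet.
            rel = file.relative_to(root).as_posix()
            pairs.append((str(file), f"{prefix}/{rel}"))
    return pairs
```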