Visualizations - Jupyter Bioacoustic

The spectrogram player renders each audio clip as an interactive visualization. You can switch between visualization types from the dropdown, adjust the resolution, buffer, and zoom, and capture the current view as a PNG.

Built-in Visualizations

Five visualization types are included and can be referenced by string name in the visualizations parameter:

Name	Description
`'linear'` (or `'spectrogram'`)	Linear-frequency STFT magnitude spectrogram
`'mel'`	Mel-scale spectrogram — compresses the frequency axis to better match human auditory perception
`'log_frequency'`	Log-frequency spectrogram — more visual detail at lower frequencies, similar to a constant-Q transform
`'bandpass'`	Bandpass spectrogram focused on 1–8 kHz (typical birdsong range)
`'waveform'`	Time-domain waveform plot (amplitude vs. time, not a spectrogram)
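To illustrate what "compresses the frequency axis" means for the mel scale, here is a minimal sketch using the common HTK-style conversion formula. Which mel variant the widget uses is not stated here, so `hz_to_mel` and `mel_to_hz` are illustrative helpers, not part of jupyter_bioacoustic.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel (HTK-style formula; an assumption, see lead-in)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 1 kHz step covers fewer mels at high frequencies than at low ones,
# which is why a mel axis gives more room to the lower part of the spectrum.
low_step = hz_to_mel(2000.0) - hz_to_mel(1000.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(7000.0)
print('1-2 kHz spans', round(low_step), 'mel; 7-8 kHz spans', round(high_step), 'mel')
```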

Source: Visualizations

BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    partial_download=False,
    visualizations=[
        'linear', 'mel', 'log_frequency', 'bandpass', 'waveform'],
).open()

The first item in the list is the default. To make a different item the default, prefix it with selected:: (e.g. ['linear', 'selected::mel', 'log_frequency']).
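The default-selection rule can be sketched in a few lines. This is only an illustration of the rule as described, not the widget's actual code: `resolve_default` is a hypothetical helper, it handles string names only, and letting the last `selected::` item win when several are present is an assumption.

```python
SELECTED_PREFIX = 'selected::'


def resolve_default(visualizations):
    """Return (names, default_name) for a list of visualization names."""
    names = []
    default = None
    for item in visualizations:
        if item.startswith(SELECTED_PREFIX):
            item = item[len(SELECTED_PREFIX):]  # strip the prefix from the name
            default = item                      # assumption: last marked item wins
        names.append(item)
    if default is None and names:
        default = names[0]                      # no prefix: first item is the default
    return names, default


names, default = resolve_default(['linear', 'selected::mel', 'log_frequency'])
print(names, default)
```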

Capture

The capture button saves the current spectrogram/visualization as a PNG file. Configure it with:

capture: True — show the button with default label
capture: 'Save Spectrogram' — custom button label
capture: False — hide the button
capture_dir: 'spectrograms' — output directory for captures

BioacousticAnnotator(
    data='detections.csv',
    audio='recording.flac',
    capture='Save Spectrogram',
    capture_dir='spectrograms',
).open()

Standalone Usage

The visualization functions in jupyter_bioacoustic.utils.visualizations can be used outside the widget — for analysis, figures, or custom pipelines.

Source: Visualizations

import soundfile as sf
from jupyter_bioacoustic.utils import visualizations as vis

# Load the audio, keep the first 15 seconds, and mix down to mono if needed
audio_data, sample_rate = sf.read('audio/test-default.flac')
duration = 15
clip = audio_data[:sample_rate * duration]
mono = clip.mean(axis=1) if clip.ndim > 1 else clip

# Generate a log-frequency spectrogram (arguments: mono, sr, width)
result = vis.log_frequency(mono, sample_rate, 2000)
print('matrix shape:', result['matrix'].shape)
print('freq_min:', result['freq_min'], 'freq_max:', result['freq_max'])
print('freq_scale:', result['freq_scale'])
print('matrix min:', result['matrix'].min(), 'max:', result['matrix'].max())

# Plot it standalone; vis.plot() returns (fig, ax)
fig, ax = vis.plot(result, cmap='inferno')
ax.set_title('Log-Frequency Spectrogram (standalone)')
fig.savefig('log-freq-standalone.png')
fig  # display the figure when this is the last line of a notebook cell

The vis.plot() helper renders any visualization dict as a matplotlib figure. It handles dB normalization, colormap rendering, and frequency-axis tick labels for linear, mel, and log scales.

Available functions: vis.spectrogram(), vis.mel(), vis.log_frequency(), vis.bandpass(), vis.waveform().

vis.render_png(matrix, width, cmap, dynamic_range_db) converts a raw 2D matrix to PNG bytes — useful when building custom visualizations that need colormap control without full matplotlib layout.
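For readers unfamiliar with dB normalization and dynamic range, the following is a rough sketch of the general idea using NumPy. It is not the library's implementation: `normalize_db` is a hypothetical helper, and both the amplitude convention (20·log10) and the 80 dB default are assumptions.

```python
import numpy as np


def normalize_db(matrix, dynamic_range_db=80.0):
    """Map a non-negative magnitude matrix to [0, 1] over a fixed dB range."""
    magnitude = np.maximum(np.asarray(matrix, dtype=float), 1e-10)  # avoid log(0)
    db = 20.0 * np.log10(magnitude)
    db -= db.max()                            # the loudest bin sits at 0 dB
    db = np.clip(db, -dynamic_range_db, 0.0)  # anything quieter is floored
    return (db + dynamic_range_db) / dynamic_range_db


demo = np.array([[1.0, 0.1], [0.001, 0.0]])
scaled = normalize_db(demo, dynamic_range_db=60.0)
print(scaled)
```

A smaller dynamic range floors more of the quiet background, which tends to increase the contrast of loud events once a colormap is applied to the [0, 1] values.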

Custom Visualizations

Custom visualization functions can be passed directly to the visualizations parameter alongside built-in names. Each function must have the signature (mono, sr, width) and return a dict containing freq_min, freq_max, freq_scale, and one of the following:

matrix — a 2D numpy array (freq × time). The widget renders it automatically with dB normalization.
png_bytes — raw PNG image bytes for full rendering control.

In both forms, freq_scale must be 'linear', 'mel', or 'log'. A matrix result can optionally include matrix_scale: 'db' to skip the dB conversion (for matrices that are already in decibels).
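When developing a custom function, it can help to check its return value against these rules before handing it to the widget. The sketch below is one way to do that; `check_visualization_result` is a hypothetical helper written for this page, not something jupyter_bioacoustic provides, and it only covers the rules stated above.

```python
VALID_FREQ_SCALES = {'linear', 'mel', 'log'}


def check_visualization_result(result):
    """Return a list of problems found in a custom visualization's return dict."""
    problems = []
    for key in ('freq_min', 'freq_max', 'freq_scale'):
        if key not in result:
            problems.append(f'missing required key: {key}')
    scale = result.get('freq_scale')
    if scale is not None and scale not in VALID_FREQ_SCALES:
        problems.append(f'freq_scale must be one of {sorted(VALID_FREQ_SCALES)}')
    has_matrix = result.get('matrix') is not None
    has_png = result.get('png_bytes') is not None
    if not (has_matrix or has_png):
        problems.append("need either 'matrix' or 'png_bytes'")
    if result.get('matrix_scale') is not None and not has_matrix:
        problems.append("'matrix_scale' only applies to 'matrix' results")
    return problems


good = {'freq_min': 0.0, 'freq_max': 8000.0, 'freq_scale': 'linear', 'png_bytes': b'...'}
print(check_visualization_result(good))
```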

import numpy as np

def my_custom_vis(mono: np.ndarray, sr: float, width: int) -> dict:
    ...
    return {
        'freq_min': ...,      # minimum frequency
        'freq_max': ...,      # maximum frequency
        'freq_scale': ...,    # frequency scale: one of 'linear', 'mel', or 'log'
        'png_bytes': ...,     # [Required if matrix is None] raw PNG image bytes
        'matrix': ...,        # [Required if png_bytes is None] a 2D numpy array (freq × time)
        'matrix_scale': ...,  # [Optional] 'db' or None; only applies to 'matrix'
    }
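As a concrete instance of the matrix form, here is a minimal sketch of a custom function that computes a linear-frequency magnitude spectrogram with NumPy alone. The name `simple_stft_vis`, the `n_fft` default, and the choice to put the lowest frequency in row 0 are assumptions made for illustration; the row order the widget expects is not stated here.

```python
import numpy as np


def simple_stft_vis(mono, sr, width, n_fft=512):
    """Linear-frequency magnitude spectrogram with `width` time columns."""
    mono = np.asarray(mono, dtype=float)
    if len(mono) < n_fft:
        mono = np.pad(mono, (0, n_fft - len(mono)))  # pad short clips to one frame
    # Spread frame start positions evenly so the output has `width` columns.
    starts = np.linspace(0, len(mono) - n_fft, num=width).astype(int)
    window = np.hanning(n_fft)
    frames = np.stack([mono[s:s + n_fft] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)
    return {
        'matrix': magnitude,      # raw magnitudes; no 'matrix_scale', so not in dB
        'freq_min': 0.0,
        'freq_max': sr / 2.0,     # Nyquist frequency
        'freq_scale': 'linear',
    }


# Quick check with a synthetic 3 kHz tone
sr = 22050
tone = np.sin(2 * np.pi * 3000.0 * np.arange(sr) / sr)
result = simple_stft_vis(tone, sr, width=200)
print('matrix shape:', result['matrix'].shape)
```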

Matrix Example

Source: Visualizations

def birdsong_spectrogram(mono, sr, width):
    """Bandpass 1-8 kHz with viridis colormap and tighter dynamic range.
    
    Uses vis.bandpass() for the matrix, then vis.render_png()
    with a custom colormap and 60 dB dynamic range for contrast.
    """
    result = vis.bandpass(mono, sr, width, f_lo=1000.0, f_hi=8000.0)
    png = vis.render_png(
        result['matrix'], width=width, cmap='viridis', dynamic_range_db=60,
    )
    return {
        'png_bytes': png,
        'freq_min': 1000.0,
        'freq_max': 8000.0,
        'freq_scale': 'linear',
    }


BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    visualizations=[
        'linear', 'mel',
        {'fn': birdsong_spectrogram, 'label': 'Birdsong (1-8 kHz)'},
    ],
).open()

PNG Example

Source: Visualizations

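import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; figures are rendered off-screen
import matplotlib.pyplot as plt
import numpy as np
from jupyter_bioacoustic.utils import visualizations as vis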


def waveform_and_spectrogram(mono, sr, width):
    """Composite: waveform on top, spectrogram on bottom, single PNG.
    
    This example renders with matplotlib directly for full layout control.
    """
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(width / 100, 5),
                                    gridspec_kw={'height_ratios': [1, 3]}, dpi=100)
    fig.subplots_adjust(hspace=0.05)

    t = np.linspace(0, len(mono) / sr, len(mono))
    ax1.plot(t, mono, color='#89b4fa', linewidth=0.3)
    ax1.set_xlim(0, len(mono) / sr)
    ax1.set_ylabel('Amp', fontsize=7, color='#cdd6f4')
    ax1.tick_params(labelsize=6, colors='#6c7086')
    ax1.set_facecolor('#1e1e2e')
    ax1.spines[:].set_visible(False)

    ax2.specgram(mono, Fs=sr, NFFT=1024, noverlap=512, cmap='inferno')
    ax2.set_ylabel('Hz', fontsize=7, color='#cdd6f4')
    ax2.tick_params(labelsize=6, colors='#6c7086')
    ax2.set_facecolor('#1e1e2e')
    ax2.spines[:].set_visible(False)

    fig.patch.set_facecolor('#1e1e2e')
    buf = io.BytesIO()
    fig.savefig(buf, format='png', dpi=100, bbox_inches='tight', pad_inches=0.02)
    plt.close(fig)

    return {
        'png_bytes': buf.getvalue(),
        'freq_min': 0.0,
        'freq_max': sr / 2.0,
        'freq_scale': 'linear',
    }


def inferno_spectrogram(mono, sr, width):
    """Plain spectrogram rendered with 'inferno' colormap via vis.render_png()."""
    result = vis.spectrogram(mono, sr, width)
    png = vis.render_png(result['matrix'], width=width, cmap='inferno')
    return {
        'png_bytes': png,
        'freq_min': result['freq_min'],
        'freq_max': result['freq_max'],
        'freq_scale': result['freq_scale'],
    }


BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    visualizations=[
        'linear',
        'mel',
        {'fn': inferno_spectrogram, 'label': 'Inferno Colormap'},
        {'fn': waveform_and_spectrogram, 'label': 'Waveform + Spectrogram'},
    ],
).open()

Third-party Visualizations

Any audio library can be wrapped as a custom visualization. The demo notebooks show integrations with:

OpenSoundscape — Spectrogram.from_audio(), MelSpectrogram, .bandpass()
Librosa — librosa.feature.melspectrogram(), HPSS harmonic separation, chromagrams
SciPy — scipy.signal.spectrogram() with configurable window functions (Hann, Blackman, Kaiser, Tukey)
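The wrapper functions used in the examples in this section are defined in the demo notebooks and are not reproduced here. As a rough idea of what such a wrapper can look like, the sketch below adapts scipy.signal.spectrogram() to the (mono, sr, width) signature. `make_scipy_vis` and its parameters are hypothetical, and thinning the time columns down to `width` is a design choice of this sketch, not documented widget behavior.

```python
import numpy as np
from scipy import signal


def make_scipy_vis(window='hann', nperseg=1024):
    """Build a (mono, sr, width) visualization function for a given window."""
    def vis_fn(mono, sr, width):
        freqs, times, magnitude = signal.spectrogram(
            np.asarray(mono, dtype=float), fs=sr, window=window,
            nperseg=nperseg, noverlap=nperseg // 2, mode='magnitude',
        )
        if magnitude.shape[1] > width:
            # Keep at most `width` evenly spaced time columns.
            cols = np.linspace(0, magnitude.shape[1] - 1, num=width).astype(int)
            magnitude = magnitude[:, cols]
        return {
            'matrix': magnitude,  # shape: (freq, time)
            'freq_min': float(freqs[0]),
            'freq_max': float(freqs[-1]),
            'freq_scale': 'linear',
        }
    return vis_fn


# SciPy accepts a window name or a (name, parameter) tuple.
hann_vis = make_scipy_vis('hann')
kaiser_vis = make_scipy_vis(('kaiser', 14.0))

sr = 16000
tone = np.sin(2 * np.pi * 2000.0 * np.arange(2 * sr) / sr)
result = hann_vis(tone, sr, width=50)
print('matrix shape:', result['matrix'].shape)
```

Functions built this way could then be listed in visualizations as {'fn': hann_vis, 'label': '...'} entries, in the same way as the notebook functions used in this section.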

OpenSoundscape

Source: Custom Visualizations: Third-Party Libraries

BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    visualizations=[
        'linear',
        {'fn': oss_spectrogram, 'label': 'OSS Linear'},
        {'fn': oss_mel_spectrogram, 'label': 'OSS Mel (400 bins)'},
        {'fn': oss_bandpass, 'label': 'OSS Bandpass (2-10 kHz)'},
    ],
).open()

Librosa

Source: Custom Visualizations: Third-Party Libraries

BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    visualizations=[
        'linear',
        {'fn': librosa_mel, 'label': 'Librosa Mel (128 bins)'},
        {'fn': librosa_harmonic, 'label': 'Librosa Harmonic (HPSS)'},
        {'fn': librosa_chromagram, 'label': 'Librosa Chromagram'},
    ],
).open()

SciPy

Source: Custom Visualizations: Third-Party Libraries

BioacousticAnnotator(
    data=DATA,
    audio=AUDIO,
    visualizations=[
        'linear',
        {'fn': scipy_hann, 'label': 'SciPy Hann (magma)'},
        {'fn': scipy_blackman, 'label': 'SciPy Blackman (inferno)'},
        {'fn': scipy_kaiser, 'label': 'SciPy Kaiser β=14 (viridis)'},
        {'fn': scipy_tukey, 'label': 'SciPy Tukey α=0.5 (plasma)'},
    ],
).open()

See the Custom Visualizations and Third-Party Libraries notebooks for complete examples.