An audio/acoustic activity detection and audio segmentation tool

amsehili/auditok

auditok is an Audio Activity Detection tool that processes online data (from an audio device or standard input) and audio files. It can be used via the command line or through its API.

Full documentation is available on Read the Docs.

Installation

auditok requires Python 3.7 or higher.

To install the latest stable version, use pip:

pip install auditok

To install the latest development version from GitHub:

pip install git+https://github.com/amsehili/auditok

Alternatively, clone the repository and install it manually:

git clone https://github.com/amsehili/auditok.git
cd auditok
pip install .

Basic example

Here's a simple example of using auditok to detect audio events:

import auditok

# `split` returns a generator of AudioRegion objects
audio_events = auditok.split(
    "audio.wav",
    min_dur=0.2,     # Minimum duration of a valid audio event in seconds
    max_dur=4,       # Maximum duration of an event
    max_silence=0.3, # Maximum tolerated silence duration within an event
    energy_threshold=55 # Detection threshold
)

for i, r in enumerate(audio_events):
    # AudioRegions returned by `split` have defined 'start' and 'end' attributes
    print(f"Event {i}: {r.start:.3f}s -- {r.end:.3f}s")

    # Play the audio event
    r.play(progress_bar=True)

    # Save the event with start and end times in the filename
    filename = r.save("event_{start:.3f}-{end:.3f}.wav")
    print(f"Event saved as: {filename}")

Example output:

Event 0: 0.700s -- 1.400s
Event saved as: event_0.700-1.400.wav
Event 1: 3.800s -- 4.500s
Event saved as: event_3.800-4.500.wav
Event 2: 8.750s -- 9.950s
Event saved as: event_8.750-9.950.wav
Event 3: 11.700s -- 12.400s
Event saved as: event_11.700-12.400.wav
Event 4: 15.050s -- 15.850s
Event saved as: event_15.050-15.850.wav
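
To make the roles of min_dur, max_dur and max_silence more concrete, here is a minimal sketch of how such parameters can turn a sequence of per-frame "active / silent" flags into events. It is a simplified illustration, not auditok's actual tokenizer: the function name `segment`, the `frame_dur` parameter and the choice to drop trailing silence are all assumptions made for this example.

```python
def segment(active, frame_dur, min_dur, max_dur, max_silence):
    """Group per-frame activity flags into (start, end) events in seconds.

    Simplified illustration; not auditok's actual tokenizer.
    """
    min_frames = round(min_dur / frame_dur)
    max_frames = round(max_dur / frame_dur)
    max_gap = round(max_silence / frame_dur)
    events = []
    start = None     # first frame of the event being built, if any
    silent_run = 0   # consecutive silent frames at the end of that event

    def close(end):
        # `end` is exclusive; events shorter than min_dur are discarded.
        if end - start >= min_frames:
            events.append((round(start * frame_dur, 3), round(end * frame_dur, 3)))

    for i, is_active in enumerate(active):
        if start is None:
            if is_active:
                start, silent_run = i, 0
            continue
        silent_run = 0 if is_active else silent_run + 1
        too_much_silence = silent_run > max_gap
        too_long = i + 1 - start >= max_frames
        if too_much_silence or too_long:
            close(i + 1 - silent_run)  # trailing silence is dropped
            start = None
    if start is not None:
        close(len(active) - silent_run)
    return events


# Hypothetical flags, one per 0.1 s frame (1 = energy above threshold)
flags = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
events = segment(flags, frame_dur=0.1, min_dur=0.2, max_dur=1.0, max_silence=0.3)
print(events)  # [(0.2, 0.9)]
```

In this run the two bursts separated by 0.2 s of silence are merged into one event because the gap is shorter than max_silence, while the isolated single active frame is discarded because it is shorter than min_dur.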

Split and plot

Visualize the audio signal with detected events:

import auditok
region = auditok.load("audio.wav") # Returns an AudioRegion object
regions = region.split_and_plot(...) # Or simply use `region.splitp()`

Example output:

doc/figures/example_1.png

Split an audio stream and re-join (glue) audio events with silence

The following code detects audio events within an audio stream, then inserts 1 second of silence between them to create audio with pauses:

# Create a 1-second silent audio region
# Audio parameters must match the original stream
from auditok import split, make_silence
silence = make_silence(duration=1,
                       sampling_rate=16000,
                       sample_width=2,
                       channels=1)
events = split("audio.wav")
audio_with_pauses = silence.join(events)

Alternatively, use split_and_join_with_silence:

from auditok import split_and_join_with_silence
audio_with_pauses = split_and_join_with_silence(silence_duration=1, input="audio.wav")
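
The reason the silence parameters must match the stream can be illustrated with plain bytes. The sketch below does not use auditok; `make_silence_bytes` and `duration_of` are hypothetical helpers, and the event data is made up. It assumes signed PCM, where zero-valued bytes represent silence.

```python
def make_silence_bytes(duration, sampling_rate, sample_width, channels):
    """Return `duration` seconds of zero-valued (signed PCM) samples."""
    n_frames = int(duration * sampling_rate)
    return b"\x00" * (n_frames * sample_width * channels)


def duration_of(data, sampling_rate, sample_width, channels):
    """Duration in seconds of raw PCM data played with the given parameters."""
    return len(data) / (sampling_rate * sample_width * channels)


sr, sw, ch = 16000, 2, 1

# Two made-up events of 0.5 s and 0.25 s (16-bit mono at 16 kHz)
events = [b"\x01\x00" * 8000, b"\x02\x00" * 4000]

silence = make_silence_bytes(1, sr, sw, ch)
glued = silence.join(events)  # same idea as bytes.join / str.join
total = duration_of(glued, sr, sw, ch)
print(total)  # 1.75

# Silence built for 8 kHz lasts only half as long in a 16 kHz stream
mismatched = make_silence_bytes(1, 8000, sw, ch)
print(duration_of(mismatched, sr, sw, ch))  # 0.5
```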

Export an AudioRegion as a numpy array

from auditok import load, AudioRegion
audio = load("audio.wav") # or use `AudioRegion.load("audio.wav")`
x = audio.numpy()
assert x.shape[0] == audio.channels
assert x.shape[1] == len(audio)
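
The (channels, samples) layout checked by the assertions above can be pictured without numpy. The following sketch de-interleaves raw 16-bit little-endian PCM into one row per channel using only the standard library; `deinterleave` is a hypothetical helper written for this illustration, not part of auditok.

```python
import struct


def deinterleave(pcm_bytes, channels):
    """Split interleaved 16-bit little-endian PCM into one list per channel."""
    n_values = len(pcm_bytes) // 2
    values = struct.unpack("<%dh" % n_values, pcm_bytes)
    # Samples are stored as L0 R0 L1 R1 ..., so stride by the channel count
    return [list(values[c::channels]) for c in range(channels)]


# Three made-up stereo frames: left = 1, 2, 3 and right = -1, -2, -3
pcm = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
rows = deinterleave(pcm, channels=2)
print(rows)  # [[1, 2, 3], [-1, -2, -3]]
```

The outer length is the number of channels and the inner length is the number of samples per channel, which mirrors the shape of the array in the example above.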

Limitations

The detection algorithm is based on audio signal energy. While it performs well in low-noise environments (e.g., podcasts, language lessons, or quiet recordings), performance may drop in noisy settings. Additionally, the algorithm does not distinguish between speech and other sounds, so it is not suitable for Voice Activity Detection in multi-sound environments.
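
A small sketch can show why an energy threshold struggles with noise. It assumes the energy of an analysis window is measured as 10 * log10 of the mean squared sample value; this is a common definition chosen for illustration, and auditok's exact formula may differ. The helper names and the synthetic signals are hypothetical.

```python
import math


def log_energy(samples):
    """Log energy in dB of a window of integer samples (assumed definition)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_square, 1e-10))  # avoid log10(0)


def square_wave(amplitude, n=160):
    """Synthetic window alternating between +amplitude and -amplitude."""
    return [amplitude if i % 2 == 0 else -amplitude for i in range(n)]


threshold = 55  # same value as energy_threshold in the basic example

quiet_background = log_energy(square_wave(100))   # about 40 dB
speech_like = log_energy(square_wave(1000))       # about 60 dB
loud_background = log_energy(square_wave(800))    # about 58 dB

for name, energy in [("quiet background", quiet_background),
                     ("speech-like signal", speech_like),
                     ("loud background", loud_background)]:
    print(f"{name}: {energy:.1f} dB, active={energy >= threshold}")
```

With a quiet background only the speech-like window exceeds the threshold. Once the background itself is loud enough to cross it, an energy detector has no way to tell the two apart, which is the limitation described above.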

License

MIT.