bbc-vamp-plugins  1.0
SpeechMusicSegmenter Class Reference

Calculates boundaries between speech and music.

#include <SpeechMusicSegmenter.h>

Protected Attributes

vector< double > m_zcr
int m_nframes
int resolution
double margin
double change_threshold
double decision_threshold
double min_music_length

Detailed Description

Calculates boundaries between speech and music.

Outputs

Segmentation
Impulses at the boundary points.
Detection function
Function used to find boundaries.

Parameters

Resolution
The window size, in frames, at which candidate changes may be found (default = 256)
Change threshold
The threshold of skewness difference at which a candidate change will be marked (default = 0.0781)
Decision threshold
The threshold used to classify segments as speech or music (default = 0.2734)
Margin
The margin around the mean zero-crossing rate (ZCR) within which ZCR samples are ignored when computing the ZCR skewness (default = 14)
Minimum music segment length
Music segments shorter than this minimum length are discarded (default = 0)
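
To illustrate how Margin could enter the ZCR skewness, the sketch below counts the ZCR values that fall more than a margin below or above the window mean and ignores the values inside that band. It is a minimal sketch under stated assumptions, not the plugin's code: zeroCrossings and marginSkewness are hypothetical names, the margin is assumed to be in the same unit as the ZCR values, and the sign convention (positive when more values lie below the mean than above it) is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of sign changes within one frame of samples.
int zeroCrossings(const std::vector<float>& frame) {
    int count = 0;
    for (std::size_t i = 1; i < frame.size(); ++i) {
        const bool prevNegative = frame[i - 1] < 0.0f;
        const bool currNegative = frame[i] < 0.0f;
        if (prevNegative != currNegative) ++count;
    }
    return count;
}

// Hypothetical skewness measure over a window of per-frame ZCR values.
// Values within +/- margin of the window mean are not counted.
// Assumed sign convention: (count below band - count above band) / window size.
double marginSkewness(const std::vector<double>& zcr, double margin) {
    if (zcr.empty()) return 0.0;

    double mean = 0.0;
    for (double v : zcr) mean += v;
    mean /= static_cast<double>(zcr.size());

    int above = 0;
    int below = 0;
    for (double v : zcr) {
        if (v > mean + margin) ++above;
        else if (v < mean - margin) ++below;
    }
    return static_cast<double>(below - above) / static_cast<double>(zcr.size());
}
```

With a margin of 0 every value other than the mean itself is counted; a larger margin makes the measure depend only on values far from the mean.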

Description

This Vamp plugin is heavily inspired by the approach described in [1].

The algorithm works as follows:

  1. Measure the skewness of the distribution of zero-crossing rate across the audio file;
  2. Find points at which this distribution changes drastically;
  3. For each candidate change point found, classify the corresponding segment as follows:
    • Mean skewness > decision threshold: speech
    • Mean skewness < decision threshold: music
  4. If the segment has the same type as the previous one, merge the two.
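
Steps 2 to 4 above can be sketched as follows. This is an illustrative sketch, not the plugin's implementation: Segment, meanOf, findChangePoints and classifyAndMerge are hypothetical names, the change test (comparing the mean skewness of adjacent windows of `resolution` values) is an assumed reading of step 2, and a mean exactly equal to the threshold is arbitrarily treated as music.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

enum class SegmentType { Speech, Music };

struct Segment {
    std::size_t start;  // first index (inclusive)
    std::size_t end;    // last index (exclusive)
    SegmentType type;
};

// Mean of values[start, end); the caller guarantees start < end <= size.
double meanOf(const std::vector<double>& values, std::size_t start, std::size_t end) {
    const auto first = values.begin() + static_cast<std::ptrdiff_t>(start);
    const auto last = values.begin() + static_cast<std::ptrdiff_t>(end);
    return std::accumulate(first, last, 0.0) / static_cast<double>(end - start);
}

// Step 2 (assumed form): mark a window boundary as a candidate change when the
// mean skewness of the windows on either side differs by more than the threshold.
std::vector<std::size_t> findChangePoints(const std::vector<double>& skewness,
                                          std::size_t resolution,
                                          double changeThreshold) {
    std::vector<std::size_t> points;
    if (resolution == 0) return points;
    for (std::size_t b = resolution; b + resolution <= skewness.size(); b += resolution) {
        const double left = meanOf(skewness, b - resolution, b);
        const double right = meanOf(skewness, b, b + resolution);
        if (std::fabs(right - left) > changeThreshold) points.push_back(b);
    }
    return points;
}

// Steps 3 and 4: label each span between change points by its mean skewness,
// then merge it into the previous segment when both have the same type.
// changePoints is assumed to be sorted in ascending order.
std::vector<Segment> classifyAndMerge(const std::vector<double>& skewness,
                                      const std::vector<std::size_t>& changePoints,
                                      double decisionThreshold) {
    std::vector<std::size_t> bounds(changePoints);
    bounds.push_back(skewness.size());  // the end of the signal closes the last span

    std::vector<Segment> segments;
    std::size_t start = 0;
    for (std::size_t end : bounds) {
        if (end <= start || end > skewness.size()) continue;  // skip empty or invalid spans
        const double mean = meanOf(skewness, start, end);
        const SegmentType type =
            mean > decisionThreshold ? SegmentType::Speech : SegmentType::Music;
        if (!segments.empty() && segments.back().type == type) {
            segments.back().end = end;  // same type as the previous segment: merge
        } else {
            segments.push_back({start, end, type});
        }
        start = end;
    }
    return segments;
}
```

A caller would pass the resulting change points straight into classifyAndMerge; the boundaries between the returned segments correspond to the impulses of the Segmentation output.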

This is a very early prototype, so it is not very accurate. It is relatively fast, however: processing a 20-minute file takes around 1 s.

References

[1] J. Saunders, "Real-time discrimination of broadcast speech/music," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 993-999, 7-10 May 1996.


Member Data Documentation

vector<double> SpeechMusicSegmenter::m_zcr [protected]

The documentation for this class was generated from the following file: