Yaafe core features¶
Yaafe core audio features.
Available features¶
AmplitudeModulation¶
- class yaafefeatures.AmplitudeModulation¶
Tremolo and grain description, according to [SE2005] and [AE2001].
- AmplitudeModulation uses Envelope to describe tremolo and grain. Analyzed frequency ranges are:
- Tremolo : 4 - 8 Hz
- Grain : 10 - 40 Hz
- For each of these ranges, it computes :
- Frequency of maximum energy in range
- Difference of the energy of this frequency and the mean energy over all frequencies
- Difference of the energy of this frequency and the mean energy in range
- Product of the first two values.
[AE2001] A. Eronen, Automatic musical instrument recognition. Master’s Thesis, Tampere University of Technology, 2001.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
AmplitudeModulation EnDecim=200 blockSize=32768 stepSize=16384
AutoCorrelation¶
- class yaafefeatures.AutoCorrelation¶
Compute autocorrelation coefficients ac on each frame.
- Parameters:
- ACNbCoeffs (default=49): Number of autocorrelation coefficients to keep
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
AutoCorrelation ACNbCoeffs=49 blockSize=1024 stepSize=512
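As a rough illustration of what these coefficients represent, here is a minimal pure-Python sketch. It assumes the plain, unnormalized definition ac(k) = sum of x(i) * x(i + k) and that lag 0 counts as the first coefficient; the function name is hypothetical and Yaafe's own normalization may differ.

```python
def autocorrelation(frame, nb_coeffs):
    """Return ac[k] = sum_i frame[i] * frame[i + k] for k = 0 .. nb_coeffs - 1.

    Assumption: unnormalized sum, lag 0 included as the first coefficient.
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(min(nb_coeffs, n))
    ]


ac = autocorrelation([1.0, 2.0, 3.0, 4.0], nb_coeffs=3)
print(ac)  # [30.0, 20.0, 11.0]
```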
ComplexDomainOnsetDetection¶
- class yaafefeatures.ComplexDomainOnsetDetection¶
Compute onset detection using a complex domain spectral flux method [CD2003].
[CD2003] C. Duxbury et al., Complex domain onset detection for musical signals, Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
ComplexDomainOnsetDetection FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
Energy¶
Envelope¶
- class yaafefeatures.Envelope¶
Extract the amplitude envelope using a Hilbert transform, low-pass filtering and decimation.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
Envelope EnDecim=200 blockSize=32768 stepSize=16384
EnvelopeShapeStatistics¶
- class yaafefeatures.EnvelopeShapeStatistics¶
Centroid, spread, skewness and kurtosis of each frame’s amplitude envelope. For more details about moments, see Shape Statistics.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
EnvelopeShapeStatistics EnDecim=200 blockSize=32768 stepSize=16384
Frames¶
- class yaafefeatures.Frames¶
Segment input signal into frames.
First frame has zeros on left half so that it is centered on time 0s, then consecutive frames are equally spaced. Consequently, frame i (starting from 0) is centered on sample i * stepSize.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Frames blockSize=1024 stepSize=512
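The centering rule above can be sketched in pure Python as follows. This is only an illustration: the function name is hypothetical, and both the stop rule (emit frames while the frame center lies inside the signal) and the zero-padding of the last frames are assumptions, not confirmed Yaafe behavior.

```python
def split_frames(signal, block_size=1024, step_size=512):
    """Return frames such that frame i is centered on sample i * step_size."""
    half = block_size // 2
    # Left-pad with half a block of zeros so frame 0 is centered on sample 0.
    padded = [0.0] * half + list(signal)
    frames = []
    start = 0
    # Assumed stop rule: keep going while the frame center is inside the signal.
    while start < len(signal):
        frame = padded[start:start + block_size]
        frame += [0.0] * (block_size - len(frame))  # right-pad the tail frames
        frames.append(frame)
        start += step_size
    return frames


frames = split_frames([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], block_size=4, step_size=2)
print(frames[0])  # [0.0, 0.0, 1.0, 2.0]
```

With `block_size=4` and `step_size=2`, the center sample of frame `i` (index `block_size // 2` inside the frame) is `signal[i * step_size]`.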
LPC¶
- class yaafefeatures.LPC¶
Compute the Linear Predictor Coefficients (LPC) of a signal frame, using autocorrelation and the Levinson-Durbin algorithm. See [JM1975].
[JM1975] J. Makhoul, Linear Prediction: A Tutorial Review, Proc. IEEE, Vol. 63, pp. 561-580, 1975.
- Parameters:
- LPCNbCoeffs (default=2): Number of Linear Predictor Coefficients to compute
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
LPC LPCNbCoeffs=2 blockSize=1024 stepSize=512
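For readers unfamiliar with the Levinson-Durbin recursion, a minimal sketch follows. It takes autocorrelation values and returns predictor coefficients under the convention x[n] ≈ sum_j a[j] * x[n - j]; the function name, the sign convention and the returned error term are assumptions and may not match Yaafe's output layout.

```python
def levinson_durbin(ac, order):
    """Solve the normal equations for an order-`order` linear predictor.

    ac: autocorrelation values ac[0] .. ac[order] (ac[0] must be non-zero).
    Returns (coefficients a[1..order], final prediction error).
    """
    a = [0.0] * (order + 1)
    err = ac[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = ac[i] - sum(a[j] * ac[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)
```

The example autocorrelation sequence is that of a first-order process, so the second coefficient comes out as zero.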
LSF¶
- class yaafefeatures.LSF¶
Compute the Line Spectral Frequency (LSF) coefficients of a signal frame. The algorithm was adapted from [TB2006] and [SH1976].
[TB2006] Tom Backstrom, Carlo Magi, Properties of line spectrum pair polynomials–A review, Signal Processing, Volume 86, Issue 11, Special Section: Distributed Source Coding, November 2006, Pages 3286-3298, ISSN 0165-1684, DOI: 10.1016/j.sigpro.2006.01.010.
[SH1976] Schussler, H., A stability theorem for discrete systems, Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 24, no. 1, pp. 87-89, Feb 1976.
- Parameters:
- LSFDisplacement (default=1): LSF Displacement parameter: 1 for classical LSF, 0 for Schussler polynomials, >1 is a generalization
- LSFNbCoeffs (default=10): Number of Line Spectral Frequencies to compute
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
LSF LSFDisplacement=1 LSFNbCoeffs=10 blockSize=1024 stepSize=512
Loudness¶
- class yaafefeatures.Loudness¶
The loudness coefficients are the energy in each Bark band, normalized by the overall sum. See [GP2004] and [MG1997] for more details.
[MG1997] Moore, Glasberg, et al., A Model for the Prediction of Thresholds, Loudness, and Partial Loudness, J. Audio Eng. Soc. 45: 224-240, 1997.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- LMode (default=Relative): “Specific” computes loudness without normalization, “Relative” normalizes the bands so that they sum to 1, “Total” just returns the sum of loudness over all bands.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Loudness FFTLength=0 FFTWindow=Hanning LMode=Relative blockSize=1024 stepSize=512
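The three LMode values can be illustrated with a small sketch operating on already-computed per-band loudness values. The function name and the handling of an all-zero input are hypothetical; how Yaafe derives the per-band values themselves is not shown here.

```python
def apply_lmode(band_loudness, mode="Relative"):
    """Post-process per-band loudness values according to an LMode-like switch."""
    total = sum(band_loudness)
    if mode == "Specific":
        return list(band_loudness)            # no normalization
    if mode == "Relative":
        if total == 0:
            return [0.0] * len(band_loudness)  # assumed behavior for silence
        return [v / total for v in band_loudness]
    if mode == "Total":
        return [total]                        # a single value per frame
    raise ValueError("mode must be Specific, Relative or Total")


print(apply_lmode([1.0, 3.0], "Relative"))  # [0.25, 0.75]
```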
MFCC¶
- class yaafefeatures.MFCC¶
Compute the Mel-frequency cepstrum coefficients [DM1980].
Mel filter bank is built as 40 log-spaced filters laid out on the mel scale. Each filter is a triangular filter. MFCCs are then computed using DCT II.
[DM1980] S.B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28:357-366, 1980.
- Parameters:
- CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
- CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MFCC CepsIgnoreFirstCoeff=1 CepsNbCoeffs=13 FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
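A minimal sketch of the mel layout and the DCT II step is given below. It assumes the common mel-scale convention mel(f) = 2595 * log10(1 + f / 700), an unnormalized DCT II, and that ignoring the first coefficient still returns CepsNbCoeffs values; all function names are hypothetical, and the exact scale, filter height and scaling used by Yaafe may differ.

```python
import math


def hz_to_mel(f):
    # Assumed mel-scale convention; not confirmed by this page.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(nb_filters=40, min_freq=130.0, max_freq=6854.0):
    """Center frequencies (Hz) of filters equally spaced on the mel scale."""
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    step = (hi - lo) / (nb_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(nb_filters)]


def dct_ii(values, nb_coeffs):
    """Unnormalized DCT II of `values`, truncated to `nb_coeffs` outputs."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(nb_coeffs)
    ]


def cepstrum(band_energies, nb_coeffs=13, ignore_first=True):
    """DCT II of log band energies, optionally dropping coefficient 0."""
    log_e = [math.log(max(e, 1e-12)) for e in band_energies]  # floor avoids log(0)
    first = 1 if ignore_first else 0
    return dct_ii(log_e, nb_coeffs + first)[first:]


centers = mel_center_frequencies()
print(len(centers))  # 40
```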
MagnitudeSpectrum¶
- class yaafefeatures.MagnitudeSpectrum¶
Compute frame’s magnitude spectrum, using an analysis window (Hanning or Hamming), or not.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MagnitudeSpectrum FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
MelSpectrum¶
- class yaafefeatures.MelSpectrum¶
Compute the Mel-frequency spectrum [DM1980].
Mel filter bank is built as 40 log-spaced filters laid out on the mel scale. Each filter is a triangular filter.
- Parameters:
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MelSpectrum FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
OBSI¶
- class yaafefeatures.OBSI¶
Compute octave band signal intensity using a triangular octave filter bank ([SE2005]).
[SE2005] S. Essid, Classification automatique des signaux audio-frequences: reconnaissance des instruments de musique. PhD, UPMC, 2005.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
OBSI FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
OBSIR¶
- class yaafefeatures.OBSIR¶
Compute the log of the OBSI ratio between consecutive octaves.
- Parameters:
- DiffNbCoeffs (default=0): Maximum number of coefficients to keep. 0 keeps N-1 values (with N the input feature size)
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
OBSIR DiffNbCoeffs=0 FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
PerceptualSharpness¶
- class yaafefeatures.PerceptualSharpness¶
Compute the sharpness of Loudness coefficients, according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
PerceptualSharpness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
PerceptualSpread¶
- class yaafefeatures.PerceptualSpread¶
Compute the spread of Loudness coefficients, according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
PerceptualSpread FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralCrestFactorPerBand¶
- class yaafefeatures.SpectralCrestFactorPerBand¶
Compute spectral crest factor per log-spaced band of 1/4 octave.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralCrestFactorPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralDecrease¶
- class yaafefeatures.SpectralDecrease¶
Compute spectral decrease according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralDecrease FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralFlatness¶
- class yaafefeatures.SpectralFlatness¶
Compute global spectral flatness using the ratio between geometric and arithmetic mean.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlatness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
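The geometric-to-arithmetic mean ratio can be sketched as follows. Whether Yaafe applies it to magnitudes or to power values, and how it handles zero bins, is not stated here; the small floor and the function name are assumptions.

```python
import math


def spectral_flatness(magnitudes, floor=1e-12):
    """Ratio of geometric mean to arithmetic mean of a spectrum.

    Values are floored (assumption) so that log(0) never occurs.
    """
    vals = [max(m, floor) for m in magnitudes]
    geometric = math.exp(sum(math.log(v) for v in vals) / len(vals))
    arithmetic = sum(vals) / len(vals)
    return geometric / arithmetic


print(spectral_flatness([2.0, 2.0, 2.0, 2.0]))  # close to 1 for a flat spectrum
print(spectral_flatness([1.0, 0.0, 0.0, 0.0]))  # close to 0 for a single peak
```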
SpectralFlatnessPerBand¶
- class yaafefeatures.SpectralFlatnessPerBand¶
Compute spectral flatness per log-spaced band of 1/4 octave, as proposed in MPEG7 standard.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlatnessPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralFlux¶
- class yaafefeatures.SpectralFlux¶
Compute the flux of the spectrum between consecutive frames.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- FluxSupport (default=All): Support of the flux computation. If ‘All’, use all bins (default); if ‘Increase’, use only bins whose value is increasing.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlux FFTLength=0 FFTWindow=Hanning FluxSupport=All blockSize=1024 stepSize=512
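The effect of FluxSupport can be illustrated with a small sketch. It assumes flux is the sum of squared bin-wise differences between two magnitude spectra, with no normalization; the function name is hypothetical and Yaafe's exact definition may include a normalization step.

```python
def spectral_flux(prev_mag, cur_mag, support="All"):
    """Sum of squared differences between two consecutive magnitude spectra."""
    diffs = [c - p for p, c in zip(prev_mag, cur_mag)]
    if support == "Increase":
        diffs = [d for d in diffs if d > 0]  # keep only bins that grew
    elif support != "All":
        raise ValueError("support must be 'All' or 'Increase'")
    return sum(d * d for d in diffs)


print(spectral_flux([1.0, 2.0, 3.0], [2.0, 1.0, 3.0], "All"))       # 2.0
print(spectral_flux([1.0, 2.0, 3.0], [2.0, 1.0, 3.0], "Increase"))  # 1.0
```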
SpectralRolloff¶
- class yaafefeatures.SpectralRolloff¶
Spectral roll-off is the frequency below which 99% of the energy is contained. See [SS1997].
[SS1997] E. Scheirer, M. Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. IEEE International Conference on Acoustics, Speech and Signal Processing, p.1331-1334, 1997.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralRolloff FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
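A minimal sketch of the 99% rule follows. It assumes energy is the squared magnitude, that the input holds bins 0 to N/2 of an N-point FFT (at least two bins), and that the result is reported in Hz; the function and its `sample_rate` parameter are hypothetical.

```python
def spectral_rolloff(magnitudes, sample_rate, fraction=0.99):
    """Lowest bin frequency below which `fraction` of the energy is contained."""
    energy = [m * m for m in magnitudes]
    threshold = fraction * sum(energy)
    fft_length = 2 * (len(magnitudes) - 1)  # input covers bins 0 .. N/2
    cumulative = 0.0
    for k, e in enumerate(energy):
        cumulative += e
        if cumulative >= threshold:
            return k * sample_rate / fft_length
    return sample_rate / 2.0  # fallback, only reachable through rounding


print(spectral_rolloff([0.0, 1.0, 0.0, 0.0, 0.0], sample_rate=8000))  # 1000.0
```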
SpectralShapeStatistics¶
- class yaafefeatures.SpectralShapeStatistics¶
Compute shape statistics of MagnitudeSpectrum (see [GR2004]).
Shape statistics are centroid, spread, skewness and kurtosis.
[GR2004] O. Gillet, G. Richard, Automatic transcription of drum loops, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, Canada, 2004.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralShapeStatistics FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
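The four shape statistics can be sketched with the textbook moment definitions, treating the amplitudes as weights over the bin index. This is an assumption: the exact formulas used by Yaafe (for instance whether 3 is subtracted from the kurtosis, or whether the axis is in bins or Hz) are not reproduced here, and the function name is hypothetical.

```python
import math


def shape_statistics(amplitudes):
    """Centroid, spread, skewness and kurtosis of a non-negative sequence.

    Assumes a non-zero total and a non-zero spread.
    """
    total = sum(amplitudes)

    def moment(order, center):
        return sum(a * (k - center) ** order for k, a in enumerate(amplitudes)) / total

    centroid = moment(1, 0.0)
    spread = math.sqrt(moment(2, centroid))
    skewness = moment(3, centroid) / spread ** 3
    kurtosis = moment(4, centroid) / spread ** 4  # not excess kurtosis
    return centroid, spread, skewness, kurtosis


print(shape_statistics([1.0, 2.0, 1.0]))
```

For the symmetric input above, the centroid is the middle bin and the skewness is zero.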
SpectralSlope¶
- class yaafefeatures.SpectralSlope¶
SpectralSlope is computed by linear regression of the spectral amplitude. (see [GP2004])
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralSlope FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
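The linear regression can be sketched with the ordinary least-squares slope formula. The sketch regresses amplitude against bin index and applies no normalization; [GP2004] and Yaafe may use a frequency axis or normalize by the total amplitude, so treat the function below as a hypothetical illustration.

```python
def spectral_slope(amplitudes):
    """Least-squares slope of amplitude versus bin index (needs >= 2 bins)."""
    n = len(amplitudes)
    sum_k = sum(range(n))
    sum_k2 = sum(k * k for k in range(n))
    sum_a = sum(amplitudes)
    sum_ka = sum(k * a for k, a in enumerate(amplitudes))
    return (n * sum_ka - sum_k * sum_a) / (n * sum_k2 - sum_k * sum_k)


print(spectral_slope([1.0, 3.0, 5.0, 7.0]))  # 2.0
```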
SpectralVariation¶
- class yaafefeatures.SpectralVariation¶
SpectralVariation is the normalized correlation of spectrum between consecutive frames. (see [GP2004])
[GP2004] Geoffroy Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralVariation FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
TemporalShapeStatistics¶
- class yaafefeatures.TemporalShapeStatistics¶
Compute shape statistics of signal frames.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
TemporalShapeStatistics blockSize=1024 stepSize=512
Available feature transforms¶
AutoCorrelationPeaksIntegrator¶
- class yaafefeatures.AutoCorrelationPeaksIntegrator¶
Feature transform that computes peaks of the autocorrelation function and outputs the peaks and their amplitudes.
- Parameters:
- ACPInterPeakMinDist (default=5): Minimal distance between consecutive autocorrelation peaks, expressed in lags.
- ACPNbPeaks (default=3): Number of autocorrelation peaks to keep
- ACPNorm (default=No): can be No|BPM|Hz. Normalize output to be expressed respectively in lag, BPM, Hz
- NbFrames (default=60): Number of frames to integrate together
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
AutoCorrelationPeaksIntegrator ACPInterPeakMinDist=5 ACPNbPeaks=3 ACPNorm=No NbFrames=60 StepNbFrames=30
Cepstrum¶
- class yaafefeatures.Cepstrum¶
Feature transform that computes cepstrum coefficients of input feature frames, using DCT II.
- Parameters:
- CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
- CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.
Declaration example:
Cepstrum CepsIgnoreFirstCoeff=1 CepsNbCoeffs=13
Derivate¶
- class yaafefeatures.Derivate¶
Compute the temporal derivative of the input feature. The derivative is approximated by an orthogonal polynomial fit over a finite-length window (see [RR1993] p.117).
[RR1993] L.R. Rabiner, Fundamentals of Speech Processing. Prentice Hall Signal Processing Series. PTR Prentice-Hall, 1993.
- Parameters:
- DO1Len (default=4): Horizon used to compute order 1 derivative.
- DO2Len (default=1): Horizon used to compute order 2 derivative. Unused if DOrder=1.
- DOrder (default=1): Order of the derivative to compute.
Declaration example:
Derivate DO1Len=4 DO2Len=1 DOrder=1
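A first-order polynomial fit over a symmetric window reduces to the familiar regression ("delta") formula, sketched below for a scalar feature. The mapping of DO1Len to the `horizon` argument, the edge handling (repeating the first and last values) and the function name are assumptions.

```python
def delta(values, horizon=4):
    """First-order derivative estimate by linear regression over 2*horizon+1 points."""
    denom = 2.0 * sum(k * k for k in range(1, horizon + 1))
    last = len(values) - 1
    out = []
    for t in range(len(values)):
        num = 0.0
        for k in range(1, horizon + 1):
            ahead = values[min(t + k, last)]   # clamp at the end (assumption)
            behind = values[max(t - k, 0)]     # clamp at the start (assumption)
            num += k * (ahead - behind)
        out.append(num / denom)
    return out


ramp = [float(i) for i in range(20)]
print(delta(ramp)[10])  # 1.0 for a unit-slope ramp, away from the edges
```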
HistogramIntegrator¶
- class yaafefeatures.HistogramIntegrator¶
Feature transform that computes the histogram of input values.
- Parameters:
- HInf (default=0): Minimal value to take into consideration
- HNbBins (default=10): Number of histogram bins
- HSup (default=1): Maximal value to take into consideration
- HWeighted (default=0): Set it to 1 if input values are weighted. If 1, the input is considered to be a list of (value, weight) pairs.
- NbFrames (default=60): Number of frames to integrate together
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
HistogramIntegrator HInf=0 HNbBins=10 HSup=1 HWeighted=0 NbFrames=60 StepNbFrames=30
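The roles of HInf, HSup, HNbBins and HWeighted can be sketched on a flat list of values. The sketch assumes equal-width bins, that values outside [HInf, HSup] are ignored, and that a value equal to HSup falls into the last bin; the function name and these edge rules are hypothetical.

```python
def histogram(values, h_inf=0.0, h_sup=1.0, nb_bins=10, weights=None):
    """Equal-width histogram over [h_inf, h_sup]; out-of-range values are skipped."""
    counts = [0.0] * nb_bins
    if weights is None:
        weights = [1.0] * len(values)  # unweighted case: every value counts once
    for v, w in zip(values, weights):
        if v < h_inf or v > h_sup:
            continue
        idx = int((v - h_inf) * nb_bins / (h_sup - h_inf))
        counts[min(idx, nb_bins - 1)] += w  # v == h_sup goes to the last bin
    return counts


print(histogram([0.05, 0.15, 0.15, 0.95, 1.5]))
```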
SlopeIntegrator¶
- class yaafefeatures.SlopeIntegrator¶
Feature transform that computes the slope of the input feature over the given number of frames.
- Parameters:
- NbFrames (default=60): Number of frames to integrate together
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
SlopeIntegrator NbFrames=60 StepNbFrames=30
StatisticalIntegrator¶
- class yaafefeatures.StatisticalIntegrator¶
Feature transform that computes the temporal mean and standard deviation of the input feature over the given number of frames.
- Parameters:
- NbFrames (default=60): Number of frames to integrate together
- SICompute (default=MeanStddev): if ‘MeanStddev’ then compute mean and standard deviation, if ‘Mean’ compute only mean, if ‘Stddev’ compute only standard deviation.
- StepNbFrames (default=30): Number of frames to skip between two integrations
Declaration example:
StatisticalIntegrator NbFrames=60 SICompute=MeanStddev StepNbFrames=30
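The interplay of NbFrames and StepNbFrames can be sketched for a scalar feature as a sliding window. The sketch assumes StepNbFrames is the hop between window starts, that incomplete trailing windows are dropped, and that the standard deviation is the population form (division by NbFrames); the function name is hypothetical.

```python
import math


def integrate(frames, nb_frames=60, step_nb_frames=30, compute="MeanStddev"):
    """Mean and/or standard deviation over sliding windows of feature values."""
    out = []
    for start in range(0, len(frames) - nb_frames + 1, step_nb_frames):
        window = frames[start:start + nb_frames]
        mean = sum(window) / nb_frames
        std = math.sqrt(sum((v - mean) ** 2 for v in window) / nb_frames)
        if compute == "MeanStddev":
            out.append((mean, std))
        elif compute == "Mean":
            out.append((mean,))
        elif compute == "Stddev":
            out.append((std,))
        else:
            raise ValueError("compute must be MeanStddev, Mean or Stddev")
    return out


print(integrate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], nb_frames=4, step_nb_frames=2, compute="Mean"))
# [(2.5,), (4.5,)]
```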