1. MVDR beamformer

class torchiva.MVDRBeamformer(mask_model, ref_mic=0, eps=1e-05, mvdr_type='rtf', n_power_iter=None)

Implementation of the minimum variance distortionless response (MVDR) beamformer. This class assumes DNN-based beamforming. It also supports the case where three masks are estimated.
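
In mask-based beamforming, the estimated masks are typically used to weight the mixture when computing target and noise spatial covariance matrices. The following is a minimal NumPy sketch of that step, not torchiva's implementation: the function name `masked_scm`, the normalization by the mask sum, and the random masks standing in for a network output are all assumptions made for illustration.

```python
import numpy as np

def masked_scm(X, mask, eps=1e-5):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    X:    complex STFT, shape (n_chan, n_freq, n_frames)
    mask: real-valued mask, shape (n_freq, n_frames)
    Returns an array of shape (n_freq, n_chan, n_chan).
    """
    # Sum of mask-weighted outer products x x^H over the frames.
    R = np.einsum("ft,cft,dft->fcd", mask, X, X.conj())
    # Normalize by the total mask weight in each frequency bin (assumed convention).
    return R / (mask.sum(axis=-1)[:, None, None] + eps)

rng = np.random.default_rng(0)
n_chan, n_freq, n_frames = 4, 5, 50
shape = (n_chan, n_freq, n_frames)
X = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Hypothetical stand-in for the mask model output: a target mask and its complement.
target_mask = rng.uniform(size=(n_freq, n_frames))
noise_mask = 1.0 - target_mask

R_target = masked_scm(X, target_mask)
R_noise = masked_scm(X, noise_mask)
print(R_target.shape, R_noise.shape)
```

The two covariance estimates are the inputs that the filter formulas for MVDR, MWF, and GEV operate on.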

Parameters
  • mask_model (torch.nn.Module) – A function that is given one spectrogram and returns 2 or 3 masks of the same size as the input. When 3 masks (1 for the target and the remaining 2 for noise) are estimated, they are used as in [10].

  • ref_mic (int, optional) – Reference channel (default: 0)

  • eps (float, optional) – A small constant to make divisions and the like numerically stable (default: 1e-5).

  • mvdr_type (str, optional) – How the MVDR weights are obtained. If set to 'rtf', the relative transfer function is computed and used to obtain the MVDR weights. If set to 'scm', the MVDR weights are obtained directly from the spatial covariance matrices [11] (default: 'rtf').

  • n_power_iter (int, optional) – Use the power iteration method to compute the relative transfer function instead of the full generalized eigenvalue decomposition (GEVD). The desired number of iterations should be provided. If set to None, the full GEVD is used (default: None).
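
The two `mvdr_type` options can be illustrated with a small NumPy sketch. It is written under stated assumptions and is not torchiva's code: `mvdr_scm` uses the covariance-only formula w = (R_n^-1 R_s) e_ref / tr(R_n^-1 R_s), and `mvdr_rtf` estimates the relative transfer function from the principal generalized eigenvector of (R_s, R_n) and then applies w = R_n^-1 a / (a^H R_n^-1 a). Both function names and the toy rank-1 data are hypothetical.

```python
import numpy as np

def mvdr_scm(R_s, R_n, ref_mic=0, eps=1e-5):
    """MVDR weights computed directly from the covariance matrices."""
    num = np.linalg.solve(R_n, R_s)                # R_n^-1 R_s, (n_freq, n_chan, n_chan)
    tr = np.trace(num, axis1=-2, axis2=-1)         # (n_freq,)
    return num[..., ref_mic] / (tr[..., None] + eps)

def mvdr_rtf(R_s, R_n, ref_mic=0, eps=1e-5):
    """MVDR weights from an RTF estimated with a full GEVD of (R_s, R_n)."""
    # The principal eigenvector of R_n^-1 R_s solves R_s v = lambda R_n v.
    vals, vecs = np.linalg.eig(np.linalg.solve(R_n, R_s))
    top = np.argmax(vals.real, axis=-1)
    v = np.take_along_axis(vecs, top[:, None, None], axis=-1)[..., 0]
    # Steering vector estimate R_n v, scaled so that the reference entry is 1.
    a = (R_n @ v[..., None])[..., 0]
    a = a / a[..., ref_mic:ref_mic + 1]
    Rinv_a = np.linalg.solve(R_n, a[..., None])[..., 0]
    denom = np.sum(a.conj() * Rinv_a, axis=-1, keepdims=True)
    return Rinv_a / (denom + eps)

rng = np.random.default_rng(0)
n_freq, n_chan = 3, 4

# Toy data: rank-1 target covariance and a well-conditioned noise covariance.
a_true = rng.standard_normal((n_freq, n_chan)) + 1j * rng.standard_normal((n_freq, n_chan))
R_s = 2.0 * a_true[:, :, None] * a_true[:, None, :].conj()
B = rng.standard_normal((n_freq, n_chan, n_chan)) + 1j * rng.standard_normal((n_freq, n_chan, n_chan))
R_n = B @ B.conj().transpose(0, 2, 1) + np.eye(n_chan)

w_scm = mvdr_scm(R_s, R_n, eps=1e-12)
w_rtf = mvdr_rtf(R_s, R_n, eps=1e-12)

# Response of the filter to the target steering vector: w^H a.
response = np.sum(w_scm.conj() * a_true, axis=-1)
print(np.abs(w_scm - w_rtf).max())
```

For a rank-1 target covariance, the two variants give the same weights up to numerical error, and the response to the target equals the steering vector entry at the reference microphone, which is the distortionless constraint. With estimated, full-rank covariances the two generally differ.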

forward(X, mask_model=None, ref_mic=None, eps=None, mvdr_type=None, n_power_iter=None)
Parameters

X (torch.Tensor) – The input mixture in STFT-domain, shape (..., n_chan, n_freq, n_frames)

Returns

Y – The separated signal in STFT-domain

Return type

torch.Tensor, shape (..., n_src, n_freq, n_frames)

References

[10] C. Boeddeker et al., “Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation”, ICASSP, 2021.

[11] M. Souden, J. Benesty, and S. Affes, “On optimal frequency-domain multichannel linear filtering for noise reduction”, IEEE Trans. Audio, Speech, and Language Process., 2009.

2. MWF beamformer

class torchiva.MWFBeamformer(mask_model, ref_mic=0, eps=1e-05, time_invariant=True)

Implementation of the multichannel Wiener filter (MWF) beamformer described in [12]. This class assumes DNN-based beamforming.

Parameters
  • mask_model (torch.nn.Module) – A function that is given one spectrogram and returns 2 masks of the same size as the input.

  • ref_mic (int, optional) – Reference channel (default: 0)

  • eps (float, optional) – A small constant to make divisions and the like numerically stable (default: 1e-5).

  • time_invariant (bool, optional) – If set to True, the time-invariant version of MWF is used. If set to False, the time-varying MWF is used instead (default: True).
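
As a rough illustration of the time-invariant case, the textbook MWF for the reference channel is w = (R_s + R_n)^-1 R_s e_ref, computed per frequency. The NumPy sketch below assumes this standard form. The function name `mwf_time_invariant` and the use of `eps` as diagonal loading are assumptions, and the time-varying variant is not covered.

```python
import numpy as np

def mwf_time_invariant(R_s, R_n, ref_mic=0, eps=1e-5):
    """Time-invariant MWF weights for each frequency bin.

    R_s, R_n: target and noise covariances, shape (n_freq, n_chan, n_chan)
    Returns weights of shape (n_freq, n_chan).
    """
    n_chan = R_s.shape[-1]
    # Mixture covariance with a small diagonal loading (assumed use of eps).
    R_x = R_s + R_n + eps * np.eye(n_chan)
    # Solve R_x w = R_s e_ref, where R_s e_ref is the ref_mic column of R_s.
    rhs = R_s[..., ref_mic:ref_mic + 1]
    return np.linalg.solve(R_x, rhs)[..., 0]

rng = np.random.default_rng(0)
n_freq, n_chan = 3, 4

def random_psd(rng, n_freq, n_chan):
    """Random Hermitian positive definite matrices used as toy covariances."""
    B = rng.standard_normal((n_freq, n_chan, n_chan)) + 1j * rng.standard_normal((n_freq, n_chan, n_chan))
    return B @ B.conj().transpose(0, 2, 1) + np.eye(n_chan)

R_s = random_psd(rng, n_freq, n_chan)
R_n = random_psd(rng, n_freq, n_chan)

w = mwf_time_invariant(R_s, R_n, eps=1e-12)
print(w.shape)
```

The filter would be applied to a mixture as y = w^H x in each time-frequency bin. When the noise covariance vanishes, the weights reduce to the unit vector selecting the reference microphone, so the reference channel passes through unchanged.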

forward(X, mask_model=None, ref_mic=None, eps=None, time_invariant=None)
Parameters

X (torch.Tensor) – The input mixture in STFT-domain, shape (..., n_chan, n_freq, n_frames)

Returns

Y – The separated signal in STFT-domain

Return type

torch.Tensor, shape (..., n_src, n_freq, n_frames)

References

[12] Y. Masuyama et al., “Consistency-aware multi-channel speech enhancement using deep neural networks”, ICASSP, 2020.

3. GEV beamformer

class torchiva.GEVBeamformer(mask_model, ref_mic=0, eps=1e-05)

Implementation of the generalized eigenvalue (GEV) beamformer. This class assumes DNN-based beamforming.

Parameters
  • mask_model (torch.nn.Module) – A function that is given one spectrogram and returns 2 masks of the same size as the input.

  • ref_mic (int, optional) – Reference channel (default: 0)

  • eps (float, optional) – A small constant to make divisions and the like numerically stable (default: 1e-5).
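
The GEV beamformer is commonly defined as the filter that maximizes the output SNR w^H R_s w / w^H R_n w, which is the principal generalized eigenvector of (R_s, R_n). The following NumPy sketch shows one way to compute it through Cholesky whitening. The function names `gev_weights` and `snr` are hypothetical, and any scale normalization tied to `ref_mic` is left out because this section does not describe it.

```python
import numpy as np

def gev_weights(R_s, R_n):
    """Principal generalized eigenvector of (R_s, R_n) for each frequency bin."""
    L = np.linalg.cholesky(R_n)                    # R_n = L L^H
    L_inv = np.linalg.inv(L)
    L_inv_H = L_inv.conj().transpose(0, 2, 1)
    # The whitened target covariance is Hermitian, so eigh applies.
    A = L_inv @ R_s @ L_inv_H
    vals, vecs = np.linalg.eigh(A)                 # eigenvalues in ascending order
    # Map the top eigenvector back: w = L^-H u solves R_s w = lambda R_n w.
    w = (L_inv_H @ vecs[..., -1:])[..., 0]
    return w, vals[..., -1]

def snr(w, R_s, R_n):
    """Output SNR w^H R_s w / w^H R_n w for each frequency bin."""
    num = np.einsum("fc,fcd,fd->f", w.conj(), R_s, w).real
    den = np.einsum("fc,fcd,fd->f", w.conj(), R_n, w).real
    return num / den

rng = np.random.default_rng(0)
n_freq, n_chan = 3, 4

def random_psd(rng, n_freq, n_chan):
    """Random Hermitian positive definite matrices used as toy covariances."""
    B = rng.standard_normal((n_freq, n_chan, n_chan)) + 1j * rng.standard_normal((n_freq, n_chan, n_chan))
    return B @ B.conj().transpose(0, 2, 1) + np.eye(n_chan)

R_s = random_psd(rng, n_freq, n_chan)
R_n = random_psd(rng, n_freq, n_chan)

w, lam_max = gev_weights(R_s, R_n)
print(snr(w, R_s, R_n), lam_max)
```

The output SNR of the returned filter equals the largest generalized eigenvalue, and no other filter achieves a higher value. The filter is only defined up to a complex scale per frequency, which is why practical implementations add a normalization step.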

forward(X, mask_model=None, ref_mic=None, eps=None)
Parameters

X (torch.Tensor) – The input mixture in STFT-domain, shape (..., n_chan, n_freq, n_frames)

Returns

Y – The separated signal in STFT-domain

Return type

torch.Tensor, shape (..., n_src, n_freq, n_frames)

4. FIVE

class torchiva.FIVE(n_iter=10, model=None, proj_back_mic=0, eps=None, n_power_iter=None)

Fast independent vector extraction (FIVE) [8]. FIVE extracts one source from the input signal.

Parameters
  • n_iter (int, optional) – The number of iterations (default: 10).

  • model (torch.nn.Module, optional) – The model of source distribution (default: LaplaceModel).

  • proj_back_mic (int, optional) – Index of the reference microphone used for projection back. If set to None, projection back is not applied (default: 0).

  • eps (float, optional) – A small constant to make divisions and the like numerically stable (default: None).

  • n_power_iter (int, optional) – The number of power iterations. If set to None, a full eigendecomposition is used instead (default: None).
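
The power iteration method referred to by `n_power_iter` approximates one eigenvector by repeated matrix-vector products instead of computing a full decomposition. Below is a generic NumPy sketch for the dominant eigenvector of a Hermitian positive semidefinite matrix. The function name `power_iteration`, the random initialization, and the toy matrix are assumptions, and the matrix to which FIVE applies the iteration internally is not specified here.

```python
import numpy as np

def power_iteration(A, n_power_iter=10, seed=0):
    """Approximate the dominant eigenvector of Hermitian PSD matrices.

    A: array of shape (..., n, n). Returns unit-norm vectors of shape (..., n).
    """
    rng = np.random.default_rng(seed)
    # Random complex starting vector (assumed initialization).
    v = rng.standard_normal(A.shape[:-1]) + 1j * rng.standard_normal(A.shape[:-1])
    for _ in range(n_power_iter):
        v = (A @ v[..., None])[..., 0]                          # multiply by A
        v = v / np.linalg.norm(v, axis=-1, keepdims=True)       # renormalize
    return v

rng = np.random.default_rng(1)
n = 4

# Toy Hermitian matrix with known eigenvalues and a clear gap after the largest.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
eigvals = np.array([10.0, 1.0, 0.5, 0.1])
A = (Q * eigvals) @ Q.conj().T

v = power_iteration(A, n_power_iter=20)

# Compare with the known top eigenvector (up to a complex phase).
overlap = abs(np.vdot(Q[:, 0], v))
print(overlap)
```

Convergence depends on the ratio between the two largest eigenvalues, so a small number of iterations is adequate only when the dominant eigenvalue is well separated. The trade-off is fewer operations per call in exchange for an approximate eigenvector.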

forward(X, n_iter=None, model=None, proj_back_mic=None, eps=None)
Parameters

X (torch.Tensor) – The input mixture in STFT-domain, shape (..., n_chan, n_freq, n_frames)

Returns

Y – The single extracted signal in STFT-domain.

Return type

torch.Tensor, shape (..., n_freq, n_frames)

References

[8] R. Scheibler and N. Ono, “Fast independent vector extraction by iterative SINR maximization”, ICASSP, 2020, https://arxiv.org/pdf/1910.10654.pdf.