1. T_ISS

class torchiva.T_ISS(n_iter=10, n_taps=0, n_delay=0, n_src=None, model=None, proj_back_mic=0, use_dmc=False, eps=None)

Joint dereverberation and separation with time-decorrelation iterative source steering (T-ISS) [1].

Every parameter can also be passed to forward(). A value passed this way is used for that call only and does not overwrite the class attribute.
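The per-call override behavior can be pictured with a small, dependency-free sketch. The Separator class and its _resolve helper are hypothetical stand-ins, not torchiva code. The sketch only assumes that a forward argument left at None falls back to the stored attribute.

```python
class Separator:
    """Hypothetical stand-in for a separation module (not torchiva code)."""

    def __init__(self, n_iter=10, eps=None):
        self.n_iter = n_iter
        self.eps = eps

    def _resolve(self, name, value):
        # Assumption: a per-call value of None means "use the stored attribute".
        return getattr(self, name) if value is None else value

    def forward(self, X, n_iter=None, eps=None):
        n_iter = self._resolve("n_iter", n_iter)
        eps = self._resolve("eps", eps)
        # A real module would run n_iter update steps on X here.
        return {"n_iter": n_iter, "eps": eps}


sep = Separator(n_iter=10)
used = sep.forward(X=None, n_iter=30)  # override applies to this call only
later = sep.forward(X=None)            # falls back to the stored default
```

After both calls, sep.n_iter is still the value given to the constructor.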

Parameters

n_iter (int, optional) – The number of iterations (default: 10).
n_taps (int, optional) – The length of the dereverberation filter. If set to 0, the method reduces to standard AuxIVA with ISS updates [2] (default: 0).
n_delay (int, optional) – The number of delay frames for dereverberation (default: 0).
n_src (int, optional) – The number of sources to be separated. When n_src < n_chan, a computationally cheaper variant (Over-T-ISS) [3] is used. If set to None, n_src is set to n_chan (default: None).
model (torch.nn.Module, optional) – The source distribution model. A mask-estimation neural network can also be used. If None, a spherical Laplace model is used (default: None).
proj_back_mic (int, optional) – The index of the reference microphone used for projection back. If set to None, projection back is not applied (default: 0).
use_dmc (bool, optional) – If set to True, memory-efficient demixing matrix checkpointing (DMC) [4] is used to compute the gradient. It reduces the memory cost to that of a single iteration when training a neural source model (default: False).
eps (float, optional) – A small constant that keeps divisions and similar operations numerically stable (default: None).
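To make the roles of the default source model and eps more concrete, here is a minimal pure-Python sketch of the per-frame weight that a spherical Laplace prior typically yields in auxiliary-function IVA: the reciprocal of the frame norm taken across all frequencies. The function name, the nested-list layout, the default eps value, and the omission of any constant factor are assumptions made for illustration. torchiva's own implementation may differ.

```python
import math


def spherical_laplace_weights(Y, eps=1e-5):
    """Y[f][t] is one complex STFT bin of a single source estimate.

    Returns one weight per frame, 1 / max(r_t, eps), where r_t is the
    norm of frame t taken over all frequencies.
    """
    n_frames = len(Y[0])
    weights = []
    for t in range(n_frames):
        # The norm couples all frequencies: the "spherical" part of the model.
        r = math.sqrt(sum(abs(Y_f[t]) ** 2 for Y_f in Y))
        # eps keeps the division finite when a frame is silent.
        weights.append(1.0 / max(r, eps))
    return weights


# Two frequency bins, two frames; the second frame is silent.
Y = [[3 + 0j, 0j], [4j, 0j]]
w = spherical_laplace_weights(Y)
```

Without the eps floor, the silent frame would trigger a division by zero.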

forward(X, n_iter=None, n_taps=None, n_delay=None, n_src=None, model=None, proj_back_mic=None, use_dmc=None, eps=None)

Parameters: X (torch.Tensor) – The input mixture in STFT-domain, shape (..., n_chan, n_freq, n_frames)
Returns: Y – The separated and dereverberated signal in STFT-domain
Return type: torch.Tensor, shape (..., n_src, n_freq, n_frames)
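The relation between input and output shapes can be sketched as a small helper. The function name is hypothetical, and the check that rejects n_src > n_chan is an assumption: the documentation above only describes the cases n_src < n_chan and n_src = n_chan.

```python
def separated_shape(x_shape, n_src=None):
    """Shape of Y given the shape of X, (..., n_chan, n_freq, n_frames)."""
    *batch, n_chan, n_freq, n_frames = x_shape
    if n_src is None:
        # Documented default: as many sources as channels.
        n_src = n_chan
    if n_src > n_chan:
        # Assumption: more sources than channels is not supported.
        raise ValueError("n_src must not exceed n_chan")
    return (*batch, n_src, n_freq, n_frames)


full = separated_shape((8, 4, 257, 100))           # determined case
over = separated_shape((8, 4, 257, 100), n_src=2)  # overdetermined case
```

Only the channel axis changes. Batch, frequency, and frame dimensions pass through untouched.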

Note

This class covers several BSS methods that use the ISS update rule, depending on the specified arguments:

IVA-ISS: n_taps=0, n_delay=0, n_chan==n_src, model=LaplaceModel() or GaussModel()
ILRMA-ISS: n_taps=0, n_delay=0, n_chan==n_src, model=NMFModel()
DNN-IVA-ISS: n_taps=0, n_delay=0, n_chan==n_src, model=*DNN*
OverIVA-ISS: n_taps=0, n_delay=0, n_chan > n_src
ILRMA-T-ISS [1]: n_taps>0, n_delay>0, n_chan==n_src, model=NMFModel()
DNN-T-ISS [4]: n_taps>0, n_delay>0, n_chan==n_src, model=*DNN*
Over-T-ISS [3]: n_taps>0, n_delay>0, n_chan > n_src
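The mapping in this note can be written as a small lookup, shown here as a dependency-free sketch. The function name and the string labels for the model ("laplace", "gauss", "nmf", "dnn") are hypothetical. The sketch keys the dereverberation variants on n_taps > 0 alone, whereas the list pairs that condition with n_delay > 0. A Laplace or Gauss model with n_taps > 0 is not named in the list, so the sketch falls back to the generic label "T-ISS" for it.

```python
def iss_variant(n_chan, n_src=None, n_taps=0, model="laplace"):
    """Name the method selected by a given argument combination."""
    if n_src is None:
        n_src = n_chan
    if n_src > n_chan:
        raise ValueError("n_src must not exceed n_chan")
    dereverb = n_taps > 0
    if n_src < n_chan:
        # Overdetermined case: more channels than sources.
        return "Over-T-ISS" if dereverb else "OverIVA-ISS"
    if model == "nmf":
        return "ILRMA-T-ISS" if dereverb else "ILRMA-ISS"
    if model == "dnn":
        return "DNN-T-ISS" if dereverb else "DNN-IVA-ISS"
    # Laplace or Gauss source model.
    return "T-ISS" if dereverb else "IVA-ISS"


names = [
    iss_variant(2),
    iss_variant(4, n_src=2),
    iss_variant(4, n_src=2, n_taps=5),
    iss_variant(2, n_taps=5, model="nmf"),
]
```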

References

[1] T. Nakashima, R. Scheibler, M. Togami, and N. Ono, “Joint dereverberation and separation with iterative source steering”, ICASSP, 2021, https://arxiv.org/pdf/2102.06322.pdf.
[2] R. Scheibler and N. Ono, “Fast and stable blind source separation with rank-1 updates”, ICASSP, 2020.
[3] R. Scheibler, W. Zhang, X. Chang, S. Watanabe, and Y. Qian, “End-to-End Multi-speaker ASR with Independent Vector Analysis”, arXiv preprint arXiv:2204.00218, 2022, https://arxiv.org/pdf/2204.00218.pdf.
[4] K. Saijo and R. Scheibler, “Independence-based Joint Speech Dereverberation and Separation with Neural Source Model”, arXiv preprint arXiv:2110.06545, 2022, https://arxiv.org/pdf/2110.06545.pdf.

2. AuxIVA_IP

class torchiva.AuxIVA_IP(n_iter=10, n_src=None, model=None, proj_back_mic=0, eps=None)

Independent vector analysis (IVA) with iterative projection (IP) update [5].

We do not support ILRMA-T with IP updates.

Parameters

n_iter (int, optional) – The number of iterations (default: 10).
n_src (int, optional) – The number of sources to be separated. When n_src < n_chan, a computationally cheaper variant (OverIVA) [6] is used. If set to None, n_src is set to n_chan (default: None).
model (torch.nn.Module, optional) – The source distribution model. If None, a spherical Laplace model is used (default: None).
proj_back_mic (int, optional) – The index of the reference microphone used for projection back. If set to None, projection back is not applied (default: 0).
eps (float, optional) – A small constant that keeps divisions and similar operations numerically stable (default: None).
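Projection back resolves the scale ambiguity of blind separation by rescaling each separated source to match the reference microphone. The sketch below shows one common formulation at a single frequency bin, using an independent least-squares scale per source. The function name, the list layout, and the default eps value are assumptions. torchiva's implementation may solve a joint problem instead.

```python
def project_back(Y, x_ref, eps=1e-12):
    """Rescale each source in Y to best match the reference channel.

    Y[s][t]  : complex value of source s at frame t (one frequency bin)
    x_ref[t] : complex value of the reference microphone at frame t
    """
    out = []
    for y in Y:
        # Least-squares scale c minimizing sum_t |x_ref[t] - c * y[t]|^2.
        num = sum(x * v.conjugate() for x, v in zip(x_ref, y))
        den = sum(abs(v) ** 2 for v in y)
        c = num / max(den, eps)  # eps guards against an all-zero source
        out.append([c * v for v in y])
    return out


# The source is the reference signal up to an unknown complex gain.
x_ref = [2j, 4j]
Y = [[1 + 0j, 2 + 0j]]
Z = project_back(Y, x_ref)
```

Here the estimated scale is 2j, so the rescaled source coincides with the reference signal.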

forward(X, n_iter=None, n_src=None, model=None, proj_back_mic=None, eps=None)

Parameters: X (torch.Tensor) – The input mixture in STFT-domain, shape (..., n_chan, n_freq, n_frames)
Returns: Y – The separated signal in STFT-domain
Return type: torch.Tensor, shape (..., n_src, n_freq, n_frames)

Note

This class can handle the following BSS methods with the IP update rule, depending on the specified arguments:

AuxIVA-IP: n_chan==n_src, model=LaplaceModel() or GaussModel()
ILRMA-IP: n_chan==n_src, model=NMFModel()
OverIVA-IP [6]: n_chan > n_src

References

[5] N. Ono, “Stable and fast update rules for independent vector analysis based on auxiliary function technique”, WASPAA, 2011.
[6] R. Scheibler and N. Ono, “Independent vector analysis with more microphones than sources”, WASPAA, 2019, https://arxiv.org/pdf/1905.07880.pdf.

3. AuxIVA_IP2

class torchiva.AuxIVA_IP2(n_iter=10, model=None, proj_back_mic=0, eps=None)

Blind source separation based on independent vector analysis with alternating updates of the mixing vectors [7].

Parameters

n_iter (int, optional) – The number of iterations (default: 10).
model (torch.nn.Module, optional) – The source distribution model. If None, a spherical Laplace model is used (default: None).
proj_back_mic (int, optional) – The index of the reference microphone used for projection back. If set to None, projection back is not applied (default: 0).
eps (float, optional) – A small constant that keeps divisions and similar operations numerically stable (default: None).

forward(X, n_iter=None, model=None, proj_back_mic=None, eps=None)

Parameters: X (torch.Tensor) – The input mixture in STFT-domain, shape (..., n_chan, n_freq, n_frames)
Returns: Y – The separated signal in STFT-domain
Return type: torch.Tensor, shape (..., n_chan, n_freq, n_frames)

References

[7] N. Ono, “Fast stereo independent vector analysis and its implementation on mobile phone”, IWAENC, 2012.