1. T_ISS
- class torchiva.T_ISS(n_iter=10, n_taps=0, n_delay=0, n_src=None, model=None, proj_back_mic=0, use_dmc=False, eps=None)
Joint dereverberation and separation with time-decorrelation iterative source steering (T-ISS) [1].
Any of these parameters can also be passed to a forward call. A value passed this way is used for that call only and does not overwrite the class attribute.
- Parameters
n_iter (int, optional) – The number of iterations (default: 10).
n_taps (int, optional) – The length of the dereverberation filter. If set to 0, this method works as the normal AuxIVA with ISS update [2] (default: 0).
n_delay (int, optional) – The number of delay frames for dereverberation (default: 0).
n_src (int, optional) – The number of sources to be separated. When n_src < n_chan, a computationally cheaper variant (Over-T-ISS) [3] is used. If set to None, n_src is set to n_chan (default: None).
model (torch.nn.Module, optional) – The model of the source distribution. A mask-estimation neural network can also be used. If None, spherical Laplace is used (default: None).
proj_back_mic (int, optional) – The index of the reference microphone used for projection back. If set to None, projection back is not applied (default: 0).
use_dmc (bool, optional) – If set to True, memory-efficient demixing matrix checkpointing (DMC) [4] is used to compute the gradient. It reduces the memory cost to that of a single iteration when training a neural source model (default: False).
eps (float, optional) – A small constant to make divisions and the like numerically stable (default: None).
- forward(X, n_iter=None, n_taps=None, n_delay=None, n_src=None, model=None, proj_back_mic=None, use_dmc=None, eps=None)
- Parameters
X (torch.Tensor) – The input mixture in the STFT domain, shape (..., n_chan, n_freq, n_frames)
- Returns
Y – The separated and dereverberated signal in the STFT domain
- Return type
torch.Tensor, shape (..., n_src, n_freq, n_frames)
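The per-call override behaviour (a forward argument applies to one call and leaves the class attribute alone) can be pictured with a small stand-in class. This is a minimal sketch of the pattern only, not torchiva's implementation: the class Separator and the helper _resolve are hypothetical names, and the sketch returns the resolved settings instead of processing a signal.

```python
class Separator:
    """Hypothetical stand-in that mimics the override pattern (not torchiva code)."""

    def __init__(self, n_iter=10, n_taps=0):
        self.n_iter = n_iter
        self.n_taps = n_taps

    def _resolve(self, **overrides):
        # Use the forward-call value when one is given, otherwise fall back
        # to the class attribute. The attribute itself is never modified.
        return {
            name: getattr(self, name) if value is None else value
            for name, value in overrides.items()
        }

    def forward(self, n_iter=None, n_taps=None):
        # Return the settings this call would run with.
        return self._resolve(n_iter=n_iter, n_taps=n_taps)


sep = Separator(n_iter=10, n_taps=0)
per_call = sep.forward(n_iter=30)  # override n_iter for this call only
later = sep.forward()              # stored defaults are used again
```

One consequence of using None as the "not given" marker in this sketch is that None cannot itself be passed as a per-call override.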
Note
- This class can handle various BSS methods with the ISS update rule, depending on the specified arguments:
IVA-ISS: n_taps=0, n_delay=0, n_chan==n_src, model=LaplaceModel() or GaussModel()
ILRMA-ISS: n_taps=0, n_delay=0, n_chan==n_src, model=NMFModel()
DNN-IVA-ISS: n_taps=0, n_delay=0, n_chan==n_src, model=*DNN*
OverIVA-ISS: n_taps=0, n_delay=0, n_chan > n_src
ILRMA-T-ISS [1]: n_taps>0, n_delay>0, n_chan==n_src, model=NMFModel()
DNN-T-ISS [4]: n_taps>0, n_delay>0, n_chan==n_src, model=*DNN*
Over-T-ISS [3]: n_taps>0, n_delay>0, n_chan > n_src
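The correspondence between arguments and method names can be written as a small lookup function. The sketch below is illustrative only: iss_variant and the string label model_kind are hypothetical (T_ISS itself takes a torch.nn.Module, not a label), the function keys on n_taps alone because the n_taps description says that 0 gives plain AuxIVA with ISS updates, and it follows the n_src description in treating n_src < n_chan as the overdetermined case.

```python
def iss_variant(n_chan, n_src=None, n_taps=0, model_kind="laplace"):
    """Name the ISS-based method that a configuration corresponds to.

    model_kind is a hypothetical label ("laplace", "gauss", "nmf", "dnn")
    used only in this sketch.
    """
    n_src = n_chan if n_src is None else n_src  # None means n_src = n_chan
    if not 1 <= n_src <= n_chan:
        raise ValueError("n_src must be between 1 and n_chan")

    dereverb = n_taps > 0  # a non-empty filter enables dereverberation

    if n_src < n_chan:
        return "Over-T-ISS" if dereverb else "OverIVA-ISS"
    if model_kind == "nmf":
        return "ILRMA-T-ISS" if dereverb else "ILRMA-ISS"
    if model_kind == "dnn":
        return "DNN-T-ISS" if dereverb else "DNN-IVA-ISS"
    if model_kind in ("laplace", "gauss"):
        # The list names no dereverberating variant for these models;
        # the generic "T-ISS" label is this sketch's own choice.
        return "T-ISS" if dereverb else "IVA-ISS"
    raise ValueError(f"unknown model_kind: {model_kind!r}")


print(iss_variant(n_chan=4, n_src=2, n_taps=5))
```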
References
- [1] T. Nakashima, R. Scheibler, M. Togami, and N. Ono, “Joint dereverberation and separation with iterative source steering”, ICASSP, 2021, https://arxiv.org/pdf/2102.06322.pdf.
- [2] R. Scheibler and N. Ono, “Fast and stable blind source separation with rank-1 updates”, ICASSP, 2021.
- [3] R. Scheibler, W. Zhang, X. Chang, S. Watanabe, and Y. Qian, “End-to-End Multi-speaker ASR with Independent Vector Analysis”, arXiv preprint arXiv:2204.00218, 2022, https://arxiv.org/pdf/2204.00218.pdf.
- [4] K. Saijo and R. Scheibler, “Independence-based Joint Speech Dereverberation and Separation with Neural Source Model”, arXiv preprint arXiv:2110.06545, 2022, https://arxiv.org/pdf/2110.06545.pdf.
2. AuxIVA_IP
- class torchiva.AuxIVA_IP(n_iter=10, n_src=None, model=None, proj_back_mic=0, eps=None)
Independent vector analysis (IVA) with iterative projection (IP) update [5].
We do not support ILRMA-T with IP updates.
- Parameters
n_iter (int, optional) – The number of iterations (default: 10).
n_src (int, optional) – The number of sources to be separated. When n_src < n_chan, a computationally cheaper variant (OverIVA) [6] is used. If set to None, n_src is set to n_chan (default: None).
model (torch.nn.Module, optional) – The model of the source distribution. If None, spherical Laplace is used (default: None).
proj_back_mic (int, optional) – The index of the reference microphone used for projection back. If set to None, projection back is not applied (default: 0).
eps (float, optional) – A small constant to make divisions and the like numerically stable (default: None).
- forward(X, n_iter=None, n_src=None, model=None, proj_back_mic=None, eps=None)
- Parameters
X (torch.Tensor) – The input mixture in the STFT domain, shape (..., n_chan, n_freq, n_frames)
- Returns
Y – The separated signal in the STFT domain
- Return type
torch.Tensor, shape (..., n_src, n_freq, n_frames)
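The shape contract above (only the channel axis changes, from n_chan to n_src) can be checked without running any separation. The helper below is a hypothetical illustration, not part of torchiva; it applies the documented rule that n_src=None means n_src = n_chan.

```python
def separated_shape(mixture_shape, n_src=None):
    """Expected output shape for an input of shape (..., n_chan, n_freq, n_frames)."""
    if len(mixture_shape) < 3:
        raise ValueError("expected at least (n_chan, n_freq, n_frames)")

    # Leading axes (for example a batch axis) pass through unchanged.
    *batch, n_chan, n_freq, n_frames = mixture_shape

    n_src = n_chan if n_src is None else n_src
    if not 1 <= n_src <= n_chan:
        raise ValueError("n_src must be between 1 and n_chan")

    return (*batch, n_src, n_freq, n_frames)


# A batch of 8 four-channel mixtures, separated into 2 sources.
print(separated_shape((8, 4, 257, 100), n_src=2))
```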
Note
- This class can handle the following BSS methods with the IP update rule, depending on the specified arguments:
AuxIVA-IP: n_chan==n_src, model=LaplaceModel() or GaussModel()
ILRMA-IP: n_chan==n_src, model=NMFModel()
OverIVA-IP [6]: n_chan > n_src
References
- [5] N. Ono, “Stable and fast update rules for independent vector analysis based on auxiliary function technique”, WASPAA, 2011.
- [6] R. Scheibler and N. Ono, “Independent vector analysis with more microphones than sources”, WASPAA, 2019, https://arxiv.org/pdf/1905.07880.pdf.
3. AuxIVA_IP2
- class torchiva.AuxIVA_IP2(n_iter=10, model=None, proj_back_mic=0, eps=None)
Blind source separation based on independent vector analysis with alternating updates of the mixing vectors [7].
- Parameters
n_iter (int, optional) – The number of iterations (default: 10).
model (torch.nn.Module, optional) – The model of the source distribution. If None, spherical Laplace is used (default: None).
proj_back_mic (int, optional) – The index of the reference microphone used for projection back. If set to None, projection back is not applied (default: 0).
eps (float, optional) – A small constant to make divisions and the like numerically stable (default: None).
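Projection back addresses the scale ambiguity of blind separation: each separated signal is rescaled so that it matches the reference microphone as closely as possible. The sketch below shows one common formulation for a single source in a single frequency bin, a least-squares complex gain. It is an assumption-laden illustration, not torchiva's routine, which may differ (for instance by solving jointly across sources); projection_back_scale is a hypothetical name.

```python
def projection_back_scale(y, x_ref, eps=1e-15):
    """Complex gain c that minimizes sum_t |x_ref[t] - c * y[t]|^2.

    y and x_ref are sequences of complex STFT values over frames.
    """
    num = sum(yt.conjugate() * xt for yt, xt in zip(y, x_ref))
    den = sum(abs(yt) ** 2 for yt in y)
    return num / max(den, eps)  # eps guards against an all-zero y


# Reference-microphone values for one frequency bin over three frames.
x_ref = [1 + 2j, -0.5j, 3 + 0j]

# A separated signal that equals the reference up to an unknown gain.
true_gain = 0.5 - 0.25j
y = [xt / true_gain for xt in x_ref]

c = projection_back_scale(y, x_ref)
restored = [c * yt for yt in y]  # rescaled to the reference microphone
```

In this toy case the recovered gain equals true_gain up to rounding, so restored matches x_ref.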
- forward(X, n_iter=None, model=None, proj_back_mic=None, eps=None)
- Parameters
X (torch.Tensor) – The input mixture in the STFT domain, shape (..., n_chan, n_freq, n_frames)
- Returns
X – The separated signal in the STFT domain
- Return type
torch.Tensor, shape (..., n_chan, n_freq, n_frames)
References
- [7] N. Ono, “Fast stereo independent vector analysis and its implementation on mobile phone”, IWAENC, 2012.