Chen L.,National Engineering Laboratory for Speech and Language Information Processing | Lee K.A.,Institute for Infocomm Research | Ma B.,Institute for Infocomm Research | Guo W.,National Engineering Laboratory for Speech and Language Information Processing | And 2 more authors.
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Year: 2015

This paper investigates the use of frame alignment given by a deep neural network (DNN) for the text-constrained speaker verification task, where the lexical content of the test utterances is limited to a finite vocabulary. The DNN uses information carried by the target frame and its contextual frames to assign the frame probabilistically to one of the phonetic states. The frame alignment is therefore more precise and less ambiguous than that generated by a Gaussian mixture model (GMM). Using the DNN alignment, we show that an i-vector can be decomposed into segments of local variability vectors, each corresponding to a monophone, where each local vector models session variability given the phonetic context. Based on the local vectors, content matching between the utterances under comparison can be accomplished in the PLDA scoring. Experiments conducted on the RSR2015 database show that the proposed phone-centric local variability vector performs better than the i-vector. Copyright © 2015 ISCA.
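As a rough illustration of how probabilistic frame alignment feeds an i-vector style front end, the sketch below accumulates zeroth- and first-order sufficient statistics per phonetic class from frame posteriors. It is a generic statistics step, not the paper's local variability vector extraction; the function name `baum_welch_stats` and the toy posteriors and features are assumptions made for illustration.

```python
def baum_welch_stats(posteriors, frames):
    """Accumulate zeroth- and first-order statistics per phonetic class.

    posteriors: T rows, each a list of C class posteriors (e.g. from a DNN)
    frames:     T rows, each a list of D acoustic feature values
    """
    num_classes = len(posteriors[0])
    dim = len(frames[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]
    for gamma_t, x_t in zip(posteriors, frames):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma                  # soft frame count for class c
            for d, value in enumerate(x_t):
                first[c][d] += gamma * value    # posterior-weighted feature sum
    return zeroth, first


# Toy input (hypothetical): 3 frames, 2 phonetic classes, 2-dimensional features.
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
frames = [[1.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
zeroth, first = baum_welch_stats(posteriors, frames)
```

Because each row of posteriors sums to one, the zeroth-order statistics sum to the number of frames; a sharper alignment concentrates each frame's mass in fewer classes.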


Chen L.,National Engineering Laboratory for Speech and Language Information Processing | Chen L.,Institute for Infocomm Research | Chen L.,Nanyang Technological University | Lee K.A.,Institute for Infocomm Research | And 4 more authors.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | Year: 2016

The i-vector has been shown to be very effective in speaker verification with long-duration speech utterances. When test utterances are short, however, content mismatch between the enrollment and test utterances limits the performance of an i-vector system. This paper proposes extracting local session variability vectors on different phonetic classes from the utterances, instead of estimating the session variability across the whole utterance as the i-vector does. Using the posteriors given by a deep neural network (DNN) trained for phone state classification, the local vectors represent the session variability contained in specific phonetic content. Our experiments show that the content-aware local vectors are better at coping with the content mismatch between training and test utterances of short duration for text-independent, text-constrained and text-dependent tasks. © 2016 IEEE.


Bao G.,University of Electronic Science and Technology of China | Bao G.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,University of Electronic Science and Technology of China | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing | And 4 more authors.
IEEE Transactions on Audio, Speech and Language Processing | Year: 2013

This paper discusses underdetermined blind source separation (BSS) using a two-stage compressed sensing (CS) approach. In the first stage, a modified K-means method is used to estimate the unknown mixing matrix. In the second stage, the sources are separated from the mixed signals using the estimated mixing matrix and a two-layer sparsity model. The two-layer sparsity model assumes that the low-frequency components of speech signals are sparse on a K-SVD dictionary and the high-frequency components are sparse on a discrete cosine transform (DCT) dictionary. By taking advantage of two dictionaries, this model can produce effective separation performance even if the sources are not sparse in the time-frequency (TF) domain. © 2006-2012 IEEE.
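To make the phrase "sparse on a DCT dictionary" concrete, the following minimal sketch builds an orthonormal DCT-II dictionary and shows that a signal composed of two atoms has exactly two significant coefficients. It does not reproduce the paper's two-layer model or the K-SVD dictionary; the signal, its length, and the helper names are assumptions.

```python
import math


def dct_dictionary(n):
    """Return the n x n orthonormal DCT-II matrix; row k is the k-th atom."""
    atoms = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        atoms.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                      for i in range(n)])
    return atoms


def analyze(atoms, signal):
    """Project the signal onto every atom (valid because the atoms are orthonormal)."""
    return [sum(a * s for a, s in zip(atom, signal)) for atom in atoms]


n = 32
atoms = dct_dictionary(n)
# Hypothetical test signal built from two atoms only.
signal = [2.0 * atoms[3][i] + 0.5 * atoms[10][i] for i in range(n)]
coeffs = analyze(atoms, signal)
# Indices of coefficients that are not numerically zero.
support = [k for k, c in enumerate(coeffs) if abs(c) > 1e-9]
```

Here `support` contains only the indices 3 and 10, so the 32-sample signal is fully described by two numbers in this dictionary.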


Xu X.,University of Electronic Science and Technology of China | Xu X.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,University of Electronic Science and Technology of China | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing
IET Radar, Sonar and Navigation | Year: 2012

In this study, a new two-dimensional direction of arrival (2D DOA) estimation method is proposed for a uniform rectangular array (URA). The impinging signals are a mixture of uncorrelated and coherent signals. The method consists of two steps. The DOAs of the uncorrelated signals are first estimated by a modified 2D estimation of signal parameters via rotational invariance techniques (ESPRIT). Then the contributions of the uncorrelated signals and noise are eliminated by a subtraction operation on the elements of the covariance matrix, so that only those of the coherent signals remain. Based on these subtracted elements, a decorrelating matrix with a larger size is constructed to estimate the DOAs of the coherent signals. The two steps can be carried out in parallel because there is no inherent relationship between them. The proposed method has high estimation precision, needs no 2D angle searching, and applies to arrays with either an odd or an even number of sensors. Simulation results demonstrate the effectiveness and performance of the proposed method. © 2012 The Institution of Engineering and Technology.


Zhang J.,University of Electronic Science and Technology of China | Huang L.,University of Electronic Science and Technology of China | Zhang L.,University of Electronic Science and Technology of China | Zhang B.,University of Electronic Science and Technology of China | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | Year: 2015

For noncircular signals, the optimal widely linear (WL) minimum variance distortionless response (MVDR) beamformer performs well by exploiting the noncircularity of the received signals. Although the noncircularity rate can be estimated from the steering vector (SV) of the signal of interest (SOI), the performance degrades when there are errors in the SOI's SV. This paper introduces a new robust WL beamformer. In the proposed approach, the assumed extended steering vector (ESV) of the SOI is used to construct an interference-plus-noise subspace projection matrix, and the new ESV is estimated by maximizing the WL beamformer output power under a constraint that prevents the ESV from converging to the interference. The proposed algorithm needs only imprecise knowledge of the antenna array geometry and the SOI's angular sector. Simulations verify the effectiveness of the proposed algorithm. © 2015 IEEE.
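For orientation, the sketch below computes conventional (strictly linear) MVDR weights, w = R^{-1} a / (a^H R^{-1} a), and checks the distortionless constraint. It is not the widely linear or robust variant described in the abstract. The half-wavelength uniform linear array, the directions, and the interference power are assumed toy values, and the small linear solver is written out only to keep the example dependency-free.

```python
import cmath
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def steering(num_sensors, angle_deg):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    phase = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(-1j * phase * m) for m in range(num_sensors)]


def mvdr_weights(cov, sv):
    """Conventional MVDR weights: R^{-1} a / (a^H R^{-1} a)."""
    r_inv_a = solve(cov, sv)
    denom = sum(a.conjugate() * v for a, v in zip(sv, r_inv_a))
    return [v / denom for v in r_inv_a]


num_sensors = 4
a_soi = steering(num_sensors, 0.0)    # assumed signal-of-interest direction
a_int = steering(num_sensors, 40.0)   # assumed interferer direction
# Toy interference-plus-noise covariance: 10 * a_int a_int^H + I.
cov = [[10.0 * a_int[i] * a_int[k].conjugate() + (1.0 if i == k else 0.0)
        for k in range(num_sensors)] for i in range(num_sensors)]
w = mvdr_weights(cov, a_soi)
# Array response magnitude |w^H a| toward the SOI and toward the interferer.
gain_soi = abs(sum(wi.conjugate() * ai for wi, ai in zip(w, a_soi)))
gain_int = abs(sum(wi.conjugate() * ai for wi, ai in zip(w, a_int)))
```

The response toward the SOI stays at unity while the interferer is strongly attenuated. If `a_soi` were wrong, the same formula would treat the true SOI as interference, which is the sensitivity a robust design aims to reduce.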


Tong R.,University of Electronic Science and Technology of China | Tong R.,National Engineering Laboratory for Speech and Language Information Processing | Bao G.,University of Electronic Science and Technology of China | Bao G.,National Engineering Laboratory for Speech and Language Information Processing | And 2 more authors.
IEEE Signal Processing Letters | Year: 2015

In this letter, we propose a tensor factorization approach for multichannel speech enhancement that remains effective even when the noise level is high. Specifically, we extend the well-known subspace approach to arbitrary orders and present the higher-order subspace approach for multichannel speech enhancement. Unlike previous algorithms, the proposed approach constructs a third-order tensor from the noisy data and then applies a tensor operation to reduce the noise. In this way it preserves the original data structure and makes full use of the spatial and temporal correlations in the multichannel data. The proposed approach adopts an iterative, step-wise procedure that usually converges in a few iterations. At each step, a subspace filter of the same form as in the conventional subspace approach is updated. Experiments show that it achieves a considerable segmental signal-to-noise ratio improvement under white Gaussian noise. Rapid convergence of the proposed approach is also reported. © 2015 IEEE.
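The evaluation metric named above, segmental signal-to-noise ratio improvement, can be sketched as follows. The frame length and the clamping range of -10 dB to 35 dB are common conventions assumed here, not values taken from the letter, and the "enhanced" signal is a stand-in that simply halves the noise.

```python
import math
import random


def segmental_snr(clean, estimate, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Average per-frame SNR in dB, with each frame clamped to [floor_db, ceil_db]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = estimate[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if signal_energy == 0.0:
            continue  # skip silent frames
        if error_energy == 0.0:
            snr = ceil_db
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, floor_db), ceil_db))
    if not values:
        raise ValueError("no non-silent frames to evaluate")
    return sum(values) / len(values)


rng = random.Random(0)
clean = [math.sin(0.05 * i) for i in range(1024)]
noise = [rng.gauss(0.0, 0.1) for _ in clean]
noisy = [s + n for s, n in zip(clean, noise)]
# Stand-in for an enhancer output: the noise amplitude is halved.
enhanced = [s + 0.5 * n for s, n in zip(clean, noise)]
improvement = segmental_snr(clean, enhanced) - segmental_snr(clean, noisy)
```

Halving the noise amplitude in every frame yields an improvement of 20·log10(2), about 6 dB, as long as no frame hits the clamping limits.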


Hu N.,Anhui University of Science and Technology | Hu N.,National Engineering Laboratory for Speech and Language Information Processing | Hu N.,CAS Hefei Institutes of Physical Science | Ye Z.,Anhui University of Science and Technology | And 8 more authors.
Signal Processing | Year: 2012

A new algorithm involving sparse recovery is proposed to address the problem of direction-of-arrival (DOA) estimation using weighted subspace fitting (WSF). The proposed algorithm is shown to be a modified version of ℓ1-SVD that uses an optimal weighting matrix, and a scheme for setting the regularization between the sparsity penalty and the subspace fitting error is given for the entire SNR range. Numerical simulations verify the efficiency of the proposed algorithm and illustrate the performance improvement at low SNR. © 2012 Elsevier B.V. All rights reserved.


Zhang Y.,Anhui University of Science and Technology | Zhang Y.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,Anhui University of Science and Technology | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing | And 4 more authors.
Signal Processing | Year: 2014

A new method based on a novel model for off-grid direction-of-arrival (DOA) estimation is presented. The model is based on the sample covariance matrix and the off-grid representation of the steering vector. Under this model, the equivalent signals are assumed to follow independent Gaussian distributions and the noise variance can be normalized to 1. The off-grid DOAs are estimated by the block sparse Bayesian algorithm. The advantages of the proposed method are that it accounts for the temporal correlation present in each row of the equivalent signal sample matrix and that the normalized noise variance does not need to be estimated. Moreover, the algorithm works without knowledge of the number of signals. Numerical simulations demonstrate the superior performance of the proposed method. © 2013 Elsevier B.V.
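One common form of an off-grid steering vector representation is a first-order Taylor expansion around the nearest grid angle, a(θ) ≈ a(θ_g) + (θ − θ_g)·a′(θ_g). The sketch below compares this with the plain on-grid approximation. Whether the paper uses exactly this expansion is an assumption, as are the half-wavelength uniform linear array, the 8 sensors, and the angles.

```python
import cmath
import math


def steering(num_sensors, theta):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    return [cmath.exp(-1j * math.pi * m * math.sin(theta))
            for m in range(num_sensors)]


def steering_derivative(num_sensors, theta):
    """Derivative of the steering vector with respect to theta (radians)."""
    return [-1j * math.pi * m * math.cos(theta)
            * cmath.exp(-1j * math.pi * m * math.sin(theta))
            for m in range(num_sensors)]


def norm_diff(u, v):
    """Euclidean norm of the difference between two complex vectors."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(u, v)))


num_sensors = 8
grid_theta = math.radians(10.0)   # nearest grid point (assumed)
true_theta = math.radians(10.5)   # true DOA lying between grid points (assumed)
offset = true_theta - grid_theta

exact = steering(num_sensors, true_theta)
on_grid = steering(num_sensors, grid_theta)
deriv = steering_derivative(num_sensors, grid_theta)
# First-order off-grid approximation around the grid point.
off_grid = [a + offset * b for a, b in zip(on_grid, deriv)]

err_on_grid = norm_diff(exact, on_grid)
err_off_grid = norm_diff(exact, off_grid)
```

In this toy setting the first-order correction reduces the modelling error by roughly an order of magnitude relative to snapping the DOA to the grid.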


Zhang Y.,Anhui University of Science and Technology | Zhang Y.,National Engineering Laboratory for Speech and Language Information Processing | Hu N.,Anhui University of Science and Technology | Hu N.,National Engineering Laboratory for Speech and Language Information Processing | And 2 more authors.
Signal Processing | Year: 2013

A new method is proposed for detecting the number of sources in array signal processing. The method is based on the orthogonality of the signal subspace and the noise subspace. First, a set of bootstrap snapshots is obtained from the original snapshots. Next, the direction of an arbitrary incident source is estimated, the covariance matrix is eigen-decomposed, and the weighted inner product vector is computed. These steps are repeated many times, and the resulting weighted inner product vectors are averaged. Finally, a clustering algorithm is employed to determine the number of sources. The simulation results show the superiority of the proposed method with a small number of snapshots and/or a low signal-to-noise ratio (SNR). © 2012 Elsevier B.V. All rights reserved.
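The first step of the procedure, drawing bootstrap snapshots and forming a covariance matrix from them, might look like the sketch below. The later steps (direction estimation, eigen-decomposition, weighted inner products, clustering) are not reproduced. The array size, the number of snapshots, and the random data are assumptions.

```python
import random


def bootstrap_snapshots(snapshots, rng):
    """Draw len(snapshots) snapshots with replacement from the original set."""
    return [rng.choice(snapshots) for _ in snapshots]


def sample_covariance(snapshots):
    """Sample covariance (1/N) * sum_n x_n x_n^H of complex snapshot vectors."""
    num_sensors = len(snapshots[0])
    cov = [[0j] * num_sensors for _ in range(num_sensors)]
    for x in snapshots:
        for i in range(num_sensors):
            for k in range(num_sensors):
                cov[i][k] += x[i] * x[k].conjugate()
    count = len(snapshots)
    return [[value / count for value in row] for row in cov]


rng = random.Random(0)
# Hypothetical data: 20 snapshots from a 3-sensor array, complex Gaussian entries.
snapshots = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(3)]
             for _ in range(20)]
resampled = bootstrap_snapshots(snapshots, rng)
cov = sample_covariance(resampled)
```

Repeating the resampling many times gives a population of covariance estimates from the same small data set, which is what makes averaging over repetitions possible when few snapshots are available.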


Cao S.,Anhui University of Science and Technology | Cao S.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,Anhui University of Science and Technology | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing | And 4 more authors.
Signal Processing | Year: 2013

A method based on fourth-order cumulants (FOC) for direction-of-arrival (DOA) estimation in the presence of sensor gain-phase errors is presented. The method applies when the signals are non-Gaussian and the noise is Gaussian. The DOAs are estimated from the Hadamard product of an FOC matrix and its conjugate. The advantage of the proposed method is that it performs independently of the phase errors. Moreover, it remains practicable when the noise is spatially colored. Simulation results demonstrate the effectiveness of the proposed method. © 2013 Elsevier B.V. All rights reserved.
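One way to see why a Hadamard product with the conjugate can remove phase errors: if the errors multiply each matrix entry by a unit-modulus factor, the element-wise product of the matrix with its conjugate leaves only squared magnitudes. The sketch below checks this on a small generic complex matrix used as a stand-in; it does not build an actual FOC matrix, and the phase values are assumptions.

```python
import cmath


def apply_phase_errors(matrix, phases):
    """Return G M G^H with G = diag(exp(j * phase_i)), i.e. unit-modulus errors."""
    n = len(matrix)
    return [[cmath.exp(1j * (phases[i] - phases[k])) * matrix[i][k]
             for k in range(n)] for i in range(n)]


def hadamard_with_conjugate(matrix):
    """Element-wise product of a matrix with its complex conjugate."""
    return [[value * value.conjugate() for value in row] for row in matrix]


# Generic complex matrix used as a stand-in (not a real cumulant matrix).
matrix = [[2 + 0j, 1 - 1j, 0.5j],
          [1 + 1j, 3 + 0j, -1 + 2j],
          [-0.5j, -1 - 2j, 1 + 0j]]
phases = [0.0, 0.7, -1.3]   # hypothetical sensor phase errors in radians

clean_product = hadamard_with_conjugate(matrix)
perturbed_product = hadamard_with_conjugate(apply_phase_errors(matrix, phases))
# Largest entry-wise difference between the two products.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(clean_product, perturbed_product)
               for a, b in zip(row_a, row_b))
```

The two products agree to numerical precision, so any estimator built only on this product is unaffected by the assumed phase errors. Gain errors are not cancelled by this operation.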
