National Engineering Laboratory for Speech and Language Information Processing

Hefei, China


Tong R.,Hefei University of Technology | Tong R.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,Hefei University of Technology | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing
Signal Processing | Year: 2017

In this paper, the problem of parallel waveform enhancement via multi-sensor fusion is studied. By representing the multiple noisy observations as a 3-D tensor, we propose two novel time-domain approaches, i.e. the transforming and filtering (TAF) approach and the direct multidimensional filtering (DMF) approach, for parallel waveform recovery and interference suppression. The term “parallel” indicates that the system can produce an estimate of the clean waveform in each sensor channel simultaneously. Specifically, the TAF approach transforms the observed tensor into a different domain where the noise can then be filtered by discarding the insignificant coefficients. The DMF approach directly reduces the noise level by applying multidimensional filtering to the observed tensor. Both DMF and TAF are “blind” because they do not require precise frequency responses between the desired source and the distributed sensors. Simulations show that TAF yields satisfactory performance for spatially white noise, while DMF produces satisfactory results for spatially colored noise. In addition, both TAF and DMF work well in complex real environments. © 2017 Elsevier B.V.


Chen L.,Microsoft | Lee K.A.,Institute for Infocomm Research | Ma B.,Institute for Infocomm Research | Ma L.,Microsoft | And 3 more authors.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | Year: 2017

Probabilistic linear discriminant analysis (PLDA) is widely described as an effective model for text-independent speaker verification in the i-vector space. The PLDA scoring function is typically formulated as the likelihood ratio between the speaker-adapted and the universal PLDAs. In this case, the adaptation of PLDA is performed through the speaker factors. In this paper, we show that the channel factors of the PLDA could be equivalently exploited to deal with multi-source conditions. In speaker verification, with the proposed method, a PLDA model trained on conversational telephone speech could be adequately adapted for interview-style microphone recordings. Experimental results on the NIST SRE'08 and SRE'10 datasets confirm that the proposed method is effective, especially when enrollment and test utterances are captured from different sources. © 2017 IEEE.
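To make the likelihood-ratio scoring idea concrete, here is a minimal sketch for scalar features under a simplified two-covariance model (a speaker variable with variance `between_var` plus within-speaker variability with variance `within_var`). The function names and the scalar model are assumptions made for illustration; this is not the channel-factor adaptation proposed in the paper, and real i-vector systems work with multivariate Gaussians.

```python
import math

def log_gauss_1d(x, var):
    """Log density of a zero-mean scalar Gaussian with variance `var`."""
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)

def log_gauss_2d(x1, x2, var, cov):
    """Log density of a zero-mean bivariate Gaussian with equal variances
    `var` and cross-covariance `cov`."""
    det = var * var - cov * cov
    quad = (var * x1 * x1 - 2 * cov * x1 * x2 + var * x2 * x2) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def plda_llr(x1, x2, between_var, within_var):
    """Log-likelihood ratio: same-speaker hypothesis vs. different-speaker."""
    total = between_var + within_var
    # Same speaker: the two observations share the speaker variable,
    # so they are correlated with covariance `between_var`.
    same = log_gauss_2d(x1, x2, total, between_var)
    # Different speakers: the two observations are independent.
    diff = log_gauss_1d(x1, total) + log_gauss_1d(x2, total)
    return same - diff

score_close = plda_llr(1.0, 1.0, between_var=1.0, within_var=0.1)
score_far = plda_llr(1.0, -1.0, between_var=1.0, within_var=0.1)
```

A positive score favors the same-speaker hypothesis; in this toy setting `score_close` is positive and `score_far` is negative.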


Tong R.,University of Electronic Science and Technology of China | Tong R.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,University of Electronic Science and Technology of China | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing
IEEE Transactions on Consumer Electronics | Year: 2017

In this paper, a bilinear Wiener filtering method is proposed for multichannel audio acquisition without knowing the array configurations and frequency responses. Compared with mainstream algorithms, the proposed method has two important features: blindness and robustness. For the first feature, the method does not require any prior knowledge about the array manifold. The "blindness" is very attractive to the industry because even microphones picked from the same batch of the same manufacturer can be inconsistent. For the second feature, even in the presence of strong sensor noise, the method can yield a good performance in suppressing directional interferences. This helps bring better "robustness" to the method and make it more practical for real environments. In the method, a rectangular window is adopted to slide continuously over the parallel multichannel audio streams and then bilinear Wiener filtering is performed on the matrix denoting windowed parallel audio streams. Experiments in a recording studio show the method can suppress directional interferences well. Besides, it converges rapidly and is an appealing choice for consumer electronics. © 2017 IEEE.
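The windowing step described above can be sketched as follows. The helper name `sliding_windows` and its `hop` parameter are assumptions introduced for illustration; the bilinear Wiener filter applied to each windowed matrix is not reproduced here because the abstract does not specify it.

```python
def sliding_windows(channels, width, hop=1):
    """Yield (start, matrix) pairs, where matrix has one row per channel
    and `width` columns taken from the same time span in every channel."""
    n = min(len(ch) for ch in channels)
    for start in range(0, n - width + 1, hop):
        yield start, [list(ch[start:start + width]) for ch in channels]

# Two short, synthetic "audio" channels, purely for demonstration.
channels = [[0, 1, 2, 3], [10, 11, 12, 13]]
windows = list(sliding_windows(channels, width=3))
```

Each yielded matrix corresponds to one position of the rectangular window over the parallel streams.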


Bao G.,University of Electronic Science and Technology of China | Bao G.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,University of Electronic Science and Technology of China | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing | And 4 more authors.
IEEE Transactions on Audio, Speech and Language Processing | Year: 2013

This paper discusses underdetermined blind source separation (BSS) using a compressed sensing (CS) approach, which consists of two stages. In the first stage, a modified K-means method is used to estimate the unknown mixing matrix. In the second stage, the sources are separated from the mixed signals using the estimated mixing matrix and a two-layer sparsity model. The two-layer sparsity model assumes that the low-frequency components of speech signals are sparse on a K-SVD dictionary and the high-frequency components are sparse on a discrete cosine transform (DCT) dictionary. This model, taking advantage of two dictionaries, can produce effective separation performance even if the sources are not sparse in the time-frequency (TF) domain. © 2006-2012 IEEE.
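The first stage can be illustrated with a toy example: two mixtures of three sources, where at most one source is active per sample, so that the mixture samples line up along the directions of the mixing-matrix columns. The sketch below uses plain 1-D K-means on those directions, not the modified K-means of the paper; the function name, the deterministic initialization, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def estimate_mixing_angles(x1, x2, k, iters=50, min_energy=1e-3):
    """Cluster the directions of 2-channel mixture samples into k angles.
    Wrap-around near 0 and pi is not handled in this sketch."""
    angles = []
    for a, b in zip(x1, x2):
        if math.hypot(a, b) > min_energy:
            # A sign flip of the source maps to the same line, hence mod pi.
            angles.append(math.atan2(b, a) % math.pi)
    angles.sort()
    # Deterministic init: spread the centers over the sorted angles.
    centers = [angles[(2 * i + 1) * len(angles) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for t in angles:
            j = min(range(k), key=lambda c: abs(t - centers[c]))
            groups[j].append(t)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

random.seed(0)
true_angles = [0.4, 1.2, 2.3]  # column i of the mixing matrix is (cos, sin)
x1, x2 = [], []
for t in range(300):
    theta = true_angles[t % 3]  # exactly one source active per sample
    s = random.choice([-1, 1]) * random.uniform(0.5, 1.5)
    x1.append(math.cos(theta) * s + random.gauss(0, 0.01))
    x2.append(math.sin(theta) * s + random.gauss(0, 0.01))

estimated = estimate_mixing_angles(x1, x2, k=3)
mixing_columns = [(math.cos(a), math.sin(a)) for a in estimated]
```

The estimated columns are only defined up to scaling and ordering, which is the usual ambiguity in blind source separation.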


Xu X.,University of Electronic Science and Technology of China | Xu X.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,University of Electronic Science and Technology of China | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing
IET Radar, Sonar and Navigation | Year: 2012

In this study, a new two-dimensional direction of arrival (2D DOA) estimation method is proposed for a uniform rectangular array (URA). The impinging signals are a mixture of uncorrelated and coherent signals. The method consists of two steps. The DOAs of uncorrelated signals are first estimated by a modified 2D estimation of signal parameters via rotational invariance techniques (ESPRIT). Then the contributions of uncorrelated signals and noises are eliminated after performing a subtraction operation on the elements of the covariance matrix and only those of coherent signals remain. Based on these subtracted elements, a decorrelating matrix with a larger size is constructed to estimate the DOAs of coherent signals. These two-step processes can be carried out in parallel because there is no inherent relationship between them. The proposed method has high estimation precision, needs no 2D angle searching and is suitable for the array no matter whether the number of sensors is odd or even. Simulation results demonstrate the effectiveness and performance of the proposed method. © 2012 The Institution of Engineering and Technology.


Zhang J.,University of Electronic Science and Technology of China | Huang L.,University of Electronic Science and Technology of China | Zhang L.,University of Electronic Science and Technology of China | Zhang B.,University of Electronic Science and Technology of China | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | Year: 2015

For noncircular signals, the optimal widely linear (WL) minimum variance distortionless response (MVDR) beamformer performs well by exploiting the noncircularity of the received signals. Although the noncircularity rate can be estimated from the steering vector (SV) of the signal of interest (SOI), the performance degrades when there are errors in the SOI's SV. This paper introduces a new robust WL beamformer. In the proposed approach, the assumed extended steering vector (ESV) of the SOI is used to construct an interference-plus-noise subspace projection matrix, and the new ESV is estimated by maximizing the WL beamformer output power under a constraint that prevents the ESV from converging to the interference. The proposed algorithm needs only imprecise knowledge of the antenna array geometry and the SOI's angular sector. Simulations verify the effectiveness of the proposed algorithm. © 2015 IEEE.


Hu N.,Anhui University of Science and Technology | Hu N.,National Engineering Laboratory for Speech and Language Information Processing | Hu N.,CAS Hefei Institutes of Physical Science | Ye Z.,Anhui University of Science and Technology | And 8 more authors.
Signal Processing | Year: 2012

A new algorithm involving sparse recovery is proposed to address the problem of direction-of-arrival (DOA) estimation using weighted subspace fitting (WSF). The proposed algorithm is shown to be a modified version of ℓ1-SVD that uses an optimal weighting matrix, and a regularization scheme balancing the sparsity penalty against the subspace fitting error is given for the whole SNR range. Numerical simulations verify the efficiency of the proposed algorithm and illustrate the performance improvement at low SNR. © 2012 Elsevier B.V. All rights reserved.


Zhang Y.,Anhui University of Science and Technology | Zhang Y.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,Anhui University of Science and Technology | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing | And 4 more authors.
Signal Processing | Year: 2014

A new method based on a novel model for off-grid direction-of-arrival (DOA) estimation is presented. The novel model is based on the sample covariance matrix and the off-grid representation of the steering vector. Based on this model, its equivalent signals are assumed to follow independent Gaussian distributions and its noise variance can be normalized to 1. The off-grid DOAs are estimated by the block sparse Bayesian algorithm. The advantages of the proposed method are that it accounts for the temporal correlation existing in each row of the equivalent signal sample matrix and that the normalized noise variance does not need to be estimated. Moreover, the algorithm can work without knowledge of the number of signals. Numerical simulations demonstrate the superior performance of the proposed method. © 2013 Elsevier B.V.
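One common off-grid representation is a first-order Taylor expansion of the steering vector around the nearest grid point; whether the paper uses exactly this form is an assumption. The sketch below compares it with the plain on-grid approximation for a uniform linear array with half-wavelength spacing. All function names and the array setup are illustrative, and the covariance-based model and the block sparse Bayesian estimator are not reproduced.

```python
import cmath
import math

def steering(theta, m, spacing=0.5):
    """ULA steering vector; theta in radians, spacing in wavelengths."""
    return [cmath.exp(-2j * math.pi * spacing * i * math.sin(theta))
            for i in range(m)]

def steering_derivative(theta, m, spacing=0.5):
    """Element-wise derivative of the steering vector with respect to theta."""
    return [(-2j * math.pi * spacing * i * math.cos(theta)) * a
            for i, a in enumerate(steering(theta, m, spacing))]

def off_grid_steering(theta, grid_theta, m, spacing=0.5):
    """First-order approximation a(grid) + a'(grid) * (theta - grid)."""
    a = steering(grid_theta, m, spacing)
    b = steering_derivative(grid_theta, m, spacing)
    delta = theta - grid_theta
    return [ai + bi * delta for ai, bi in zip(a, b)]

def error(u, v):
    """Euclidean distance between two complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(u, v)))

m = 8
theta = math.radians(10.7)       # true direction, between grid points
grid_theta = math.radians(10.0)  # nearest point on a 2-degree grid
exact = steering(theta, m)
on_grid_error = error(exact, steering(grid_theta, m))
off_grid_error = error(exact, off_grid_steering(theta, grid_theta, m))
```

For this setup the linearized vector is markedly closer to the exact steering vector than the on-grid one, which is the motivation for estimating the offset `delta` jointly with the sparse signal.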


Zhang Y.,Anhui University of Science and Technology | Zhang Y.,National Engineering Laboratory for Speech and Language Information Processing | Hu N.,Anhui University of Science and Technology | Hu N.,National Engineering Laboratory for Speech and Language Information Processing | And 2 more authors.
Signal Processing | Year: 2013

A new method is proposed for source number detection in array signal processing. The method is based on the orthogonality of the signal subspace and the noise subspace. First, a set of bootstrap snapshots is obtained from the original snapshots. Then the direction of an arbitrary incident source is estimated, the covariance matrix is eigen-decomposed, and the weighted inner product vector is computed. These steps are repeated many times, and the resulting weighted inner product vectors are averaged. Finally, a clustering algorithm is employed to determine the number of sources. The simulation results show the superiority of the proposed method with a small number of snapshots and/or a low signal-to-noise ratio (SNR). © 2012 Elsevier B.V. All rights reserved.


Cao S.,Anhui University of Science and Technology | Cao S.,National Engineering Laboratory for Speech and Language Information Processing | Ye Z.,Anhui University of Science and Technology | Ye Z.,National Engineering Laboratory for Speech and Language Information Processing | And 4 more authors.
Signal Processing | Year: 2013

A method based on fourth-order cumulants (FOC) for direction-of-arrival (DOA) estimation in the presence of sensor gain-phase errors is presented. The method applies when the signals are non-Gaussian and the noise is Gaussian. The DOAs are estimated from the Hadamard product of an FOC matrix and its conjugate. The advantage of the proposed method is that it performs independently of the phase errors. Moreover, it remains practicable when the noise is spatially colored. Simulation results demonstrate the effectiveness of the proposed method. © 2013 Elsevier B.V. All rights reserved.
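The reason fourth-order cumulants suit the non-Gaussian-signal, Gaussian-noise setting is that the fourth-order cumulant of a Gaussian variable is zero, while cumulants of independent variables add. The sketch below checks this numerically for a real scalar signal; it is a simplified illustration with assumed names and synthetic data, not the FOC-matrix construction for complex array data used in the paper.

```python
import random

def fourth_order_cumulant(samples):
    """Sample estimate of E[x^4] - 3 E[x^2]^2 for a real signal,
    computed after removing the sample mean."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    m2 = sum(x * x for x in centered) / n
    m4 = sum(x ** 4 for x in centered) / n
    return m4 - 3 * m2 * m2

random.seed(1)
n = 50000
signal = [random.choice([-1.0, 1.0]) for _ in range(n)]  # BPSK-like, non-Gaussian
noise = [random.gauss(0.0, 1.0) for _ in range(n)]       # Gaussian noise
noisy = [s + v for s, v in zip(signal, noise)]

c_signal = fourth_order_cumulant(signal)  # theoretical value: -2
c_noise = fourth_order_cumulant(noise)    # theoretical value: 0
c_noisy = fourth_order_cumulant(noisy)    # theoretical value: -2
```

Up to estimation error, the cumulant of the noisy signal matches that of the clean signal, so the Gaussian noise contributes nothing at fourth order.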
