Kaohsiung, Taiwan

Wang K.-C.,Shin Chien University | Chin C.-L.,Chung Shan Medical University
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | Year: 2011

In this paper, we present an approach to detecting speech presence in which the decision rule is based on a combination of multiple features using a sigmoid function. Minimum classification error (MCE) training is used to adjust the weights of the combination. The features, consisting of three parameters (the ratio of the zero-crossing rate (ZCR), the spectral energy, and the spectral entropy), are combined linearly with weights derived from the sub-band domain. First, the Bark-scale wavelet decomposition (BSWD) is used to split the input speech into 24 critical sub-bands. Next, the feature parameters are derived from the selected frequency sub-bands to form robust voice feature parameters. In order to discard seriously corrupted frequency sub-bands, a strategy of adaptive frequency sub-band extraction (AFSE) dependent on the sub-band SNR is then applied so that only the useful sub-bands are used. Finally, these three feature parameters, which consider only the useful sub-bands, are combined through a sigmoid-type function incorporating optimal weights based on MCE training to classify each frame as speech-present or speech-absent. Experimental results show that the performance of the proposed algorithm is superior to standard methods such as G.729B and AMR2. © 2011 The Institute of Electronics, Information and Communication Engineers.
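The decision rule described in this abstract can be illustrated with a minimal sketch. The abstract does not give the feature definitions, weights, or threshold, so the function names, the bias term, and the 0.5 decision threshold below are assumptions made for illustration only; in the paper the weights come from MCE training.

```python
import math

def speech_presence_score(features, weights, bias=0.0):
    """Sigmoid of a weighted linear combination of per-frame features.

    `features` would hold values such as a ZCR ratio, spectral energy and
    spectral entropy; `weights` and `bias` are hypothetical placeholders
    for values that a training procedure would supply.
    """
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def is_speech_frame(features, weights, bias=0.0, threshold=0.5):
    """Label a frame as speech-present when the score exceeds the threshold.

    The 0.5 threshold is an assumption, not a value from the paper.
    """
    return speech_presence_score(features, weights, bias) > threshold
```

A frame whose weighted feature sum is positive scores above 0.5 and is labeled speech-present; a negative sum scores below 0.5 and is labeled speech-absent.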

Wang K.-C.,Shin Chien University | Chin C.-L.,Chung Shan Medical University
WSEAS Transactions on Information Science and Applications | Year: 2010

In this paper, we propose a novel wavelet coefficient threshold (WCT) that depends on both time and frequency information to provide robustness in non-stationary and correlated noisy environments. A perceptual wavelet filter-bank (PWFB) is first used to decompose the noisy speech signal into critical bands according to the psycho-acoustic model of the human auditory system. The estimate of the WCT is then adjusted according to the posterior SNR, which is determined from the estimated noise power, through the well-known "Quantum Neural Networks (QNN)". To suppress the musical residual noise produced by the thresholding process, we exploit the masking properties of the human auditory system. Simulation results show that the proposed system is capable of reducing noise with little speech degradation and that its overall performance is superior to several competitive methods.

Kung C.M.,Shin Chien University
Journal of Multimedia | Year: 2010

Due to the rapid development of computer networks and data communication technologies, communication using digital media (text, picture, sound, video, etc.) has become more and more frequent. Digital media can be readily duplicated, modified, and transmitted, making them easy for people to create, manipulate, and enjoy. Thus the protection of the intellectual property rights of digital images has become an important issue. Watermarking is an effective and popular technique for discouraging illegal copying and distribution of copyrighted digital image information. In this paper, we propose a method for robust watermarking. First, the watermarking scheme is performed in the frequency domain and can be used to prove ownership. Second, the scheme provides a high degree of robustness against JPEG compression attacks through source coding, and protects the transmitted information through channel coding. We adopt a data distribution approach to guard against attacks on contiguous information, which would otherwise destroy the entire error correction scheme. Experimental results are also presented to demonstrate the validity and robustness of the approach. © 2010 ACADEMY PUBLISHER.

Wang K.-C.,Shin Chien University
International Journal of Computers and Applications | Year: 2011

To obtain reliable performance from a Voice Activity Detector (VAD) algorithm, this paper uses an entropy-based measure to characterize the straight lines on the spectrogram of speech activity, which are robust against noise. The entropy measure is defined on the energy domain of the harmonic subbands. It is shown that the entropy-based measure is well suited for detecting speech in white or quasi-white noise, but performs poorly for coloured noise. To compensate for this limitation, refined minima controlled recursive averaging, which can be updated quickly and accurately even under rapidly increasing noise levels, is used to desensitize the entropy measure to various types of noise. Consequently, the proposed VAD algorithm is shown to significantly outperform the commonly used energy-based algorithm when the SNR drops rapidly, and moreover is insensitive to changes in the noise level. Experimental results demonstrate that the performance of the proposed VAD is comparable to modern standard VADs such as ITU-T G.729B and the ETSI front-end VAD, and to statistical model-based VADs.
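As a rough illustration of why an entropy measure separates speech from white noise, the sketch below computes the Shannon entropy of normalized subband energies. It is a generic spectral-entropy calculation, not the harmonic-subband measure of the paper, whose exact definition is not given in the abstract; the function name and the natural-log base are assumptions.

```python
import math

def spectral_entropy(band_energies):
    """Shannon entropy (in nats) of subband energies normalized to sum to 1.

    A flat distribution (white-noise-like) gives the maximum value
    log(N); energy concentrated in a few bands gives a lower value.
    """
    total = sum(band_energies)
    if total <= 0:
        return 0.0  # silent frame: no distribution to measure
    probs = [e / total for e in band_energies if e > 0]
    return abs(-sum(p * math.log(p) for p in probs))
```

A flat set of band energies reaches the maximum entropy for that number of bands, while a frame dominated by one or two bands scores much lower, which is the contrast an entropy-based detector relies on.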

Wang K.-C.,Shin Chien University
IEICE Transactions on Information and Systems | Year: 2010

Traditional wavelet-based speech enhancement algorithms are ineffective in the presence of highly non-stationary noise because of the difficulty of accurately estimating the local noise spectrum. In this paper, a simple method of noise estimation employing a voice activity detector (VAD) is proposed. The output of a wavelet-based speech enhancement algorithm in the presence of random noise bursts can be improved according to the VAD decision. The noisy speech is first preprocessed using bark-scale wavelet packet decomposition (BSWPD) to convert the noisy signal into wavelet coefficients (WCs). It is found that a VAD using the bark-scale spectral entropy parameter, called BS-Entropy, is superior to energy-based approaches, especially under variable noise levels. The wavelet coefficient threshold (WCT) of each subband is then temporally adjusted according to the VAD result. In a speech-dominated frame, the speech is categorized as either a voiced frame or an unvoiced frame. A voiced frame possesses a strong tone-like spectrum in the lower subbands, so the WCs of the lower bands must be preserved. In contrast, the WCT tends to increase in the lower bands if the speech is categorized as unvoiced. In a noise-dominated frame, the background noise can be almost completely removed by increasing the WCT. Objective and subjective experiments are then used to evaluate the proposed system. The experiments show that the algorithm is effective under various noise conditions, especially colored noise and non-stationary noise. Copyright ©2010 The Institute of Electronics, Information and Communication Engineers.
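The frame-dependent thresholding described in this abstract might look roughly like the sketch below. The abstract does not say whether soft or hard thresholding is used, nor how much the threshold changes, so soft thresholding, the frame-type labels, and every scale factor here are hypothetical choices for illustration.

```python
import math

def soft_threshold(coeffs, threshold):
    """Shrink each wavelet coefficient toward zero by `threshold`.

    Coefficients with magnitude at or below the threshold become 0.0.
    Soft thresholding is an assumption; the abstract does not name the rule.
    """
    out = []
    for c in coeffs:
        mag = abs(c) - threshold
        out.append(math.copysign(mag, c) if mag > 0 else 0.0)
    return out

def frame_threshold(base, frame_type, low_band):
    """Pick a subband threshold from the VAD/voicing decision.

    The scale factors (3.0, 2.0, 0.0) are hypothetical placeholders.
    """
    if frame_type == "noise":
        return base * 3.0   # noise-dominated: raise threshold everywhere
    if frame_type == "voiced" and low_band:
        return 0.0          # voiced: preserve lower-band coefficients
    if frame_type == "unvoiced" and low_band:
        return base * 2.0   # unvoiced: raise threshold in lower bands
    return base
```

For example, `soft_threshold(coeffs, frame_threshold(base, "voiced", True))` leaves lower-band coefficients untouched, whereas the same call with `"noise"` removes everything below three times the base threshold.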
