Erlangen, Germany

Ebem D.U.,University of Nigeria | Beerends J.G.,TNO | Van Vugt J.,TNO | Schmidmer C.,OPTICOM GmbH | And 2 more authors.
AES: Journal of the Audio Engineering Society | Year: 2011

This paper investigates the extent to which the modeling used in objective speech quality algorithms depends on the cultural background of listeners and on language characteristics, using American English and Igbo, an African tone language. Two different approaches were used to separate behavioral aspects from speech signal aspects. In the first approach, degraded American English sentences were presented to Igbo listeners and American listeners, showing that Igbo subjects are more disturbed by additive noise, relative to other degradations, than American subjects. In the second approach, objective modeling using ITU-T P.863 (POLQA) showed that Igbo subjects listening to degraded Igbo speech are more disturbed by background noise and low-level listening than predicted by the P.863 standard, which was trained on Western languages using native listeners. The most likely conclusion is that low-level signal parts of the Igbo tone language are relatively more important than low-level signal parts of American English. In judging the quality of their own language, Igbo listeners thus need a higher signal level and a higher signal-to-noise ratio to perceive high quality than American subjects require in judging their own language. When Igbo subjects judge the quality of American speech samples, the impact of noise is overestimated, but low-level listening does not have a significant impact on the perceived speech quality. The results show that one cannot build a universal objective speech quality measurement system; adaptation toward the behavior of a set of subjects is necessary. Further investigation into the impact of tone language signal characteristics and the behavior of subjects who are raised in a specific cultural environment is necessary before a new speech quality measure for assessing voice quality in that environment can be developed. The results also suggest that speech communication systems have to be optimized depending on the cultural context in which the system is used and/or the languages for which the system is intended.


Beerends J.G.,TNO | Schmidmer C.,OPTICOM GmbH | Berger J.,SwissQual AG | Obermann M.,OPTICOM GmbH | And 3 more authors.
AES: Journal of the Audio Engineering Society | Year: 2013

In two closely related papers we present POLQA (Perceptual Objective Listening Quality Assessment), the third-generation perceptual objective speech quality measurement algorithm, standardized by the International Telecommunication Union (ITU-T) as Recommendation P.863 in 2011. This measurement algorithm simulates subjects who rate the quality of a speech fragment in a listening test using a five-point opinion scale. The new standard provides significantly improved performance in predicting subjective speech quality, in terms of Mean Opinion Scores, when compared to PESQ (Perceptual Evaluation of Speech Quality), the second generation of objective speech quality measurements. The new POLQA algorithm allows for predicting speech quality over a wide range of distortions, from "High Definition" super-wideband speech (HD Voice, audio bandwidth up to 14 kHz) to extremely distorted narrowband telephony speech (audio bandwidth down to 2 kHz), using sample rates between 48 and 8 kHz. POLQA is suited for distortions that are outside the scope of PESQ, such as linear frequency response distortions, time stretching/compression as found in Voice-over-IP, certain types of codec distortions, reverberations, and the impact of playback volume. POLQA outperforms PESQ in assessing any kind of degradation, making it an ideal tool for all speech quality measurements in today's and future mobile and IP-based networks. This paper (Part II) outlines the core elements of the underlying perceptual model and presents the final results.


Beerends J.G.,TNO | Schmidmer C.,OPTICOM GmbH | Berger J.,SwissQual AG | Obermann M.,OPTICOM GmbH | And 3 more authors.
AES: Journal of the Audio Engineering Society | Year: 2013

In two closely related papers we present POLQA (Perceptual Objective Listening Quality Assessment), the third-generation perceptual objective speech quality measurement algorithm, standardized by the International Telecommunication Union (ITU-T) as Recommendation P.863 in 2011. The algorithm is composed of two separate parts: a temporal alignment that finds speech parts that belong together, and a perceptual model that builds an internal representation of the aligned input and output of the device under test. This paper (Part I) provides the basics of the POLQA approach and outlines the core elements of the underlying temporal alignment. The newly developed alignment approach allows assessing the latest Voice over IP technology, which often introduces sudden alignment jumps (mostly in silent intervals) as well as slowly changing time scalings (mostly during speech activity), either using a pitch-preserving technique like PSOLA (Pitch Synchronous Overlap Add) or a straightforward technique equivalent to sample rate changes.
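To make the idea of temporal alignment concrete, the following is a minimal sketch of estimating a single constant delay between a reference and a degraded signal by brute-force cross-correlation. It is not the POLQA alignment described in the paper, which also handles delay jumps and time scaling; the function name `estimate_delay`, the `max_lag` parameter, and the sample signals are hypothetical and chosen only for illustration.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) at which `degraded` best matches `reference`.

    Generic cross-correlation search over lags in [-max_lag, max_lag];
    an illustrative assumption, not the P.863 alignment procedure.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical signals: the degraded signal is the reference delayed by 3 samples.
reference = [0.0, 0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
degraded = [0.0] * 3 + reference[:-3]

delay = estimate_delay(reference, degraded, max_lag=5)
```

A constant-delay search like this would not cope with the mid-call delay jumps or gradual time scaling mentioned above, which is why a per-segment alignment is needed for Voice over IP.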


Pinson M.H.,National Telecommunications and Information Administration NTIA | Janowski L.,AGH University of Science and Technology | Pepion R.,University of Nantes | Huynh-Thu Q.,Technicolor Research and Innovation | And 6 more authors.
IEEE Journal on Selected Topics in Signal Processing | Year: 2012

Traditionally, audio quality and video quality are evaluated separately in subjective tests. Best practices within the quality assessment community were developed before many modern mobile audiovisual devices and services came into use, such as internet video, smartphones, tablets and connected televisions. These devices and services raise unique questions that require jointly evaluating both the audio and the video within a subjective test. However, audiovisual subjective testing is a relatively under-explored field. In this paper, we address the question of determining the most suitable way to conduct audiovisual subjective testing on a wide range of audiovisual quality. Six laboratories from four countries conducted a systematic study of audiovisual subjective testing. The stimuli and scale were held constant across experiments and labs; only the environment of the subjective test was varied. Some subjective tests were conducted in controlled environments and some in public environments (a cafeteria, patio or hallway). The audiovisual stimuli spanned a wide range of quality. Results show that these audiovisual subjective tests were highly repeatable from one laboratory and environment to the next. The number of subjects was the most important factor. Based on this experiment, 24 or more subjects are recommended for Absolute Category Rating (ACR) tests. In public environments, 35 subjects were required to obtain the same Student's t-test sensitivity. The second most important variable was individual differences between subjects. Other environmental factors, such as language, country, lighting, background noise, wall color, and monitor calibration, had minimal impact. Analyses indicate that Mean Opinion Scores (MOS) are relative rather than absolute. Our analyses show that the results of experiments done in pristine laboratory environments are highly representative of those devices in actual use, in a typical user environment. © 2007-2012 IEEE.
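As a rough illustration of the quantities mentioned in this abstract, the sketch below computes a Mean Opinion Score from ACR ratings and a two-sample Student's t statistic (pooled variance) comparing two conditions. The function names and the ratings are hypothetical and are not data from the study; the exact analysis used by the authors may differ.

```python
from math import sqrt
from statistics import mean, variance


def mos(ratings):
    """Mean Opinion Score: arithmetic mean of ACR ratings on a 1-5 scale."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ACR ratings must lie in 1..5")
    return mean(ratings)


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical ratings from eight subjects per condition (illustrative only).
cond_a = [4, 5, 4, 4, 3, 5, 4, 4]
cond_b = [3, 3, 2, 4, 3, 2, 3, 3]

t_stat, dof = student_t(cond_a, cond_b)
```

For a fixed difference in MOS and fixed rating variance, the magnitude of the t statistic grows with the number of subjects, which is consistent with the abstract's finding that the number of subjects was the most important factor for test sensitivity.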
