SwissQual AG

Zuchwil, Switzerland


Beerends J.G.,TNO | Schmidmer C.,OPTICOM GmbH | Berger J.,SwissQual AG | Obermann M.,OPTICOM GmbH | And 3 more authors.
AES: Journal of the Audio Engineering Society | Year: 2013

In two closely related papers we present POLQA (Perceptual Objective Listening Quality Assessment), the third-generation perceptual objective speech quality measurement algorithm, standardized by the International Telecommunication Union (ITU-T) as Recommendation P.863 in 2011. This measurement algorithm simulates subjects who rate the quality of a speech fragment in a listening test using a five-point opinion scale. The new standard provides significantly improved performance in predicting subjective speech quality in terms of Mean Opinion Scores when compared to PESQ (Perceptual Evaluation of Speech Quality), the second generation of objective speech quality measurements. The new POLQA algorithm allows for predicting speech quality over a wide range of distortions, from "High Definition" super-wideband speech (HD Voice, audio bandwidth up to 14 kHz) to extremely distorted narrowband telephony speech (audio bandwidth down to 2 kHz), using sample rates between 8 and 48 kHz. POLQA is suited for distortions that are outside the scope of PESQ, such as linear frequency response distortions, time stretching/compression as found in Voice-over-IP, certain types of codec distortions, reverberation, and the impact of playback volume. POLQA outperforms PESQ in assessing any kind of degradation, making it an ideal tool for all speech quality measurements in today's and future mobile and IP-based networks. This paper (Part II) outlines the core elements of the underlying perceptual model and presents the final results.
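For illustration only, the following minimal Python sketch mimics the general full-reference idea behind such listening-quality models: compare time-frequency representations of reference and degraded speech and map the accumulated difference onto a MOS-like 1-5 scale. The frame size, the log-spectral distance, and the mapping constants are assumptions chosen for the sketch; this is not the standardized P.863 algorithm.

# Illustrative sketch only -- NOT the ITU-T P.863 (POLQA) algorithm.
# It mimics the general idea of a full-reference listening-quality model:
# compare time-frequency representations of reference and degraded speech
# and map the accumulated perceptual difference to a MOS-like 1..5 score.
# Frame size, weighting, and mapping constants are arbitrary assumptions.
import numpy as np

def spectral_frames(signal, frame_len=512, hop=256):
    """Magnitude spectra of overlapping Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = max(1, 1 + (len(signal) - frame_len) // hop)
    frames = []
    for i in range(n_frames):
        chunk = signal[i * hop:i * hop + frame_len]
        if len(chunk) < frame_len:
            chunk = np.pad(chunk, (0, frame_len - len(chunk)))
        frames.append(np.abs(np.fft.rfft(chunk * window)))
    return np.array(frames)

def mos_like_score(reference, degraded):
    """Map mean log-spectral distortion to a pseudo-MOS in [1, 5]."""
    ref = spectral_frames(reference)
    deg = spectral_frames(degraded)
    n = min(len(ref), len(deg))          # assume signals already time-aligned
    eps = 1e-9
    dist = np.mean(np.abs(np.log10(ref[:n] + eps) - np.log10(deg[:n] + eps)))
    return float(np.clip(5.0 - 4.0 * dist, 1.0, 5.0))  # ad-hoc mapping

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    clean = np.sin(2 * np.pi * 440 * t)
    noisy = clean + 0.05 * np.random.randn(len(t))
    print("pseudo-MOS:", mos_like_score(clean, noisy))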


Beerends J.G.,TNO | Schmidmer C.,OPTICOM GmbH | Berger J.,SwissQual AG | Obermann M.,OPTICOM GmbH | And 3 more authors.
AES: Journal of the Audio Engineering Society | Year: 2013

In two closely related papers we present POLQA (Perceptual Objective Listening Quality Assessment), the third-generation perceptual objective speech quality measurement algorithm, standardized by the International Telecommunication Union (ITU-T) as Recommendation P.863 in 2011. The algorithm is composed of two separate parts: a temporal alignment that finds speech parts that belong together, and a perceptual model that builds an internal representation of the aligned input and output of the device under test. This paper (Part I) provides the basics of the POLQA approach and outlines the core elements of the underlying temporal alignment. The newly developed alignment approach allows assessing the latest Voice-over-IP technology, which often introduces sudden alignment jumps (mostly in silent intervals) as well as slowly changing time scaling (mostly during speech activity), produced either by a pitch-preserving technique such as PSOLA (Pitch Synchronous Overlap Add) or by a straightforward technique equivalent to a sample rate change.
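As a hedged illustration of the alignment problem only (not the POLQA temporal alignment itself), the sketch below estimates a delay per reference segment by cross-correlation, which is enough to expose a sudden delay jump between the input and the output of a device under test. Segment length and search range are arbitrary assumptions.

# Minimal sketch, not the POLQA temporal alignment described in the paper:
# it only illustrates per-segment delay tracking by cross-correlation,
# which can expose sudden delay jumps between reference and degraded speech.
import numpy as np

def segment_delays(reference, degraded, seg_len=2048, max_lag=400):
    """Estimate a delay (in samples) for each reference segment."""
    delays = []
    for start in range(0, len(reference) - seg_len, seg_len):
        ref_seg = reference[start:start + seg_len]
        lo = max(0, start - max_lag)
        hi = min(len(degraded), start + seg_len + max_lag)
        deg_win = degraded[lo:hi]
        # full cross-correlation; the peak position gives the local delay
        corr = np.correlate(deg_win, ref_seg, mode="valid")
        delays.append(int(np.argmax(corr)) + lo - start)
    return delays

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(20000)
    # simulate a sudden delay jump of 120 samples half-way through
    deg = np.concatenate([ref[:10000], np.zeros(120), ref[10000:]])
    print(segment_delays(ref, deg))   # delays switch from 0 to 120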


Moller S.,Deutsche Telekom AG | Berger J.,SwissQual AG | Raake A.,Deutsche Telekom AG | Waltermann M.,Deutsche Telekom AG | Weiss B.,Deutsche Telekom AG
2011 3rd International Workshop on Quality of Multimedia Experience, QoMEX 2011 | Year: 2011

In this paper, we identify quality dimensions which are relevant for speech communication services, such as mobile telephony or Voice-over-IP. These include dimensions perceived when listening to degraded speech, talking against echoes, double-talk capabilities, interacting with delay, conversing over channels with time-varying characteristics, and service-related dimensions experienced during speech connection set-up and maintenance. For each dimension, we review subjective evaluation metrics and instrumental quality prediction models. We group these dimensions in a framework model which is able to diagnostically assess speech communication services, and may be used for monitoring and maintenance. © 2011 IEEE.
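A possible way to organise such a diagnostic framework in code is sketched below; the grouping into listening, conversational, and service-related dimensions follows the abstract, but the data structure and the worst_dimension helper are assumptions made for illustration, not the authors' model.

# Hedged illustration of how a diagnostic framework might be organised in
# code; the grouping follows the abstract, the structure is an assumption.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SpeechServiceAssessment:
    """Container for per-dimension quality scores of one connection."""
    listening: Dict[str, float] = field(default_factory=dict)       # degraded-speech dimensions
    conversational: Dict[str, float] = field(default_factory=dict)  # echo, double-talk, delay
    service: Dict[str, float] = field(default_factory=dict)         # set-up, maintenance

    def worst_dimension(self) -> Optional[str]:
        """Return the lowest-scoring dimension across all groups for diagnosis."""
        all_scores = {**self.listening, **self.conversational, **self.service}
        return min(all_scores, key=all_scores.get) if all_scores else None

if __name__ == "__main__":
    a = SpeechServiceAssessment(
        listening={"noisiness": 4.1, "coloration": 3.8},
        conversational={"echo": 2.2, "delay": 3.5},
        service={"setup_time": 4.5},
    )
    print(a.worst_dimension())   # -> "echo"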


Llagostera Casanovas A.,SwissQual AG | Cavallaro A.,Queen Mary, University of London
Multimedia Tools and Applications | Year: 2014

We present a multimodal method for the automatic synchronization of audio-visual recordings captured with a set of independent cameras. The proposed method jointly processes data from audio and video channels to estimate inter-camera delays that are used to temporally align the recordings. Our approach is composed of three main steps. First, we extract from each recording temporally sharp audio-visual events. These audio-visual events are short and characterized by an audio onset occurring jointly with a well-localized spatio-temporal change in the video data. Then, we estimate the inter-camera delays by assessing the co-occurrence of the events in the various recordings. Finally, we use a cross-validation procedure that combines the results for all camera pairs and aligns the recordings on a global timeline. An important feature of the proposed method is the estimation of a confidence level on the results, which allows us to automatically reject recordings that are not reliable for the alignment. Results show that our method outperforms state-of-the-art approaches based on audio-only or video-only analysis, with both fixed and hand-held moving cameras. © 2014, Springer Science+Business Media New York.
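The sketch below illustrates only the delay-estimation step, under stated assumptions: given per-camera lists of audio-visual event times, it searches for the inter-camera delay that maximises the number of co-occurring events. The event extraction, confidence estimation, and cross-validation stages of the paper are not reproduced; the tolerance and search range are assumed values.

# Minimal sketch under stated assumptions -- not the authors' full method.
# Given per-camera lists of audio-visual event times (in seconds), find the
# inter-camera delay that maximises the number of co-occurring events.
import numpy as np

def estimate_delay(events_a, events_b, max_delay=10.0, step=0.01, tol=0.05):
    """Return the delay d (seconds) that best maps camera B onto camera A."""
    events_a = np.asarray(events_a)
    events_b = np.asarray(events_b)
    best_delay, best_hits = 0.0, -1
    for d in np.arange(-max_delay, max_delay + step, step):
        shifted = events_b + d
        # count events in B that land within `tol` seconds of an event in A
        hits = sum(np.any(np.abs(events_a - t) <= tol) for t in shifted)
        if hits > best_hits:
            best_hits, best_delay = hits, float(d)
    return best_delay, best_hits

if __name__ == "__main__":
    cam1 = [1.20, 3.45, 7.80, 12.10]
    cam2 = [t - 2.5 for t in cam1]       # camera 2 started 2.5 s later
    print(estimate_delay(cam1, cam2))    # delay close to 2.5 s, 4 hits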


Berger J.,SwissQual AG | Llagostera A.,SwissQual AG
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Year: 2015

The quality of speech samples has traditionally been evaluated in subjective listening tests using 5-point Absolute Category Rating (ACR) scales in Listening Only Tests (LOT), as recommended in ITU-T P.800 [1]. Those tests provide the listening-quality aspect of speech quality. Other tests are under discussion and have been proposed in order to assess individual perceptual dimensions of speech in detail. In this paper we investigate the relationship between the overall listening quality obtained in an ITU-T P.800 ACR subjective test and the rating of the same signals on four dimensions proposed by Wältermann [2], namely noisiness, discontinuity, coloration, and loudness. The database we use is composed of conditions and speech signals extracted from an ACR LOT used in the ITU-T P.863 evaluation, processed by simulated and live telecommunication channels [3]. The signals have been re-scored using the four mentioned scales and are foreseen as a contribution to the ITU-T P.AMD project. This paper focuses on the modeling of an ACR LOT score based on individual dimensional ratings under the assumption of orthogonality of the four dimensions. Copyright © 2015 ISCA.
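The modeling idea can be sketched as follows with made-up example data: under the orthogonality assumption, the overall ACR MOS is approximated as a linear combination of the four dimensional ratings, fitted by least squares. The toy ratings and the resulting coefficients are illustrative assumptions and are not the values reported in the paper.

# Sketch of the modelling idea only, with made-up example data.
import numpy as np

# columns: noisiness, discontinuity, coloration, loudness (MOS-like ratings)
dims = np.array([
    [4.2, 4.5, 4.0, 4.3],
    [2.1, 3.8, 3.5, 4.0],
    [3.9, 1.9, 3.7, 4.1],
    [3.5, 3.6, 2.2, 3.9],
    [4.0, 4.1, 3.8, 2.4],
])
mos = np.array([4.1, 2.6, 2.9, 3.0, 3.2])   # overall ACR scores (toy values)

# least-squares fit of MOS = w0 + w1*noi + w2*dis + w3*col + w4*loud
X = np.hstack([np.ones((len(dims), 1)), dims])
weights, *_ = np.linalg.lstsq(X, mos, rcond=None)
predicted = X @ weights
print("weights:", np.round(weights, 3))
print("predicted MOS:", np.round(predicted, 2))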


Borer S.,SwissQual AG
2010 2nd International Workshop on Quality of Multimedia Experience, QoMEX 2010 - Proceedings | Year: 2010

Transmission of digital video over band-limited and possibly error-prone channels is an important source of temporal impairments, such as reduced frame rates, frame freezes, and frame drops. This often results in a non-fluent, non-smooth presentation of the video during playback; perceptually, this is called jerkiness. In this paper a model of perceived jerkiness is proposed. It can be applied to video sequences showing any pattern of jerkiness, with constant or variable frame rate, and it includes a parametrisation of the viewing conditions, so it can be used to predict jerkiness for small to large resolutions. The model is compared to data from subjective experiments containing a large variety of temporal degradations and spanning the whole range of currently used video resolutions from QCIF up to HD. On these data it shows excellent performance. The model does not require any comparison to a reference signal and can therefore be applied in no-reference monitoring approaches. ©2010 IEEE.
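A simplified sketch of a no-reference jerkiness measure in this spirit is given below: long frame display times during high motion are weighted more heavily than short ones. The sigmoid weighting and its constants are assumptions for illustration and do not reproduce the model proposed in the paper.

# Simplified sketch inspired by the idea of a no-reference jerkiness measure:
# long frame display times (freezes, low frame rate) during high motion are
# weighted more heavily. Weighting function and constants are assumptions.
import numpy as np

def jerkiness(display_times, motion):
    """display_times: seconds each decoded frame stays on screen.
    motion: per-frame motion intensity in [0, 1] (e.g. mean abs frame diff)."""
    display_times = np.asarray(display_times, dtype=float)
    motion = np.asarray(motion, dtype=float)
    # perceptual weighting: display times near 40 ms (25 fps) are unobtrusive,
    # weight grows smoothly for longer pauses (sigmoid with assumed constants)
    weight = 1.0 / (1.0 + np.exp(-(display_times - 0.120) / 0.040))
    # accumulate weighted, motion-scaled pause durations, normalised by length
    return float(np.sum(display_times * weight * motion) / np.sum(display_times))

if __name__ == "__main__":
    smooth = jerkiness([0.04] * 250, [0.5] * 250)               # fluent 25 fps clip
    frozen = jerkiness([0.04] * 200 + [0.5] * 4, [0.5] * 204)   # clip with freezes
    print(smooth, frozen)    # the freezing clip scores markedly higher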
