Stafylakis T.,École de Technologie Supérieure of Montreal | Katsouros V.,Institute for Language and Speech Processing ILSP | Kenny P.,École de Technologie Supérieure of Montreal | Dumouchel P.,École de Technologie Supérieure of Montreal
2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012 | Year: 2012

This paper provides the theory and the machinery for generalizing the celebrated mean-shift algorithm to exponential families. We show that the baseline version of the algorithm is a special case of the proposed one, namely the case of the multivariate normal exponential family with known covariance matrix. The proposed generalization makes it possible to cluster entities that lie on other probabilistic manifolds, and hence significantly increases the algorithm's applicability. An example is given for the problem of speaker clustering. © 2012 IEEE.
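For reference, the baseline that the abstract identifies as the special case (normal family, known covariance) is ordinary Gaussian-kernel mean shift. The sketch below covers only that baseline, not the paper's exponential-family generalization. It assumes an isotropic kernel with a single `bandwidth` and, as an arbitrary choice, merges modes that lie within half a bandwidth of each other; the function name is hypothetical.

```python
import math

def gaussian_mean_shift(points, bandwidth, tol=1e-6, max_iter=500):
    """Baseline mean shift with an isotropic Gaussian kernel (sketch).

    Each point is repeatedly moved to the kernel-weighted mean of the data
    until it stops moving; points that reach the same mode share a cluster.
    """
    modes = []
    for x in points:
        y = list(x)
        for _ in range(max_iter):
            # Gaussian kernel weight of every data point relative to y.
            weights = [math.exp(-math.dist(p, y) ** 2 / (2 * bandwidth ** 2))
                       for p in points]
            total = sum(weights)
            new_y = [sum(w * p[k] for w, p in zip(weights, points)) / total
                     for k in range(len(y))]
            shift = math.dist(new_y, y)
            y = new_y
            if shift < tol:
                break
        modes.append(y)
    # Assumed merge rule: modes closer than half a bandwidth are the same cluster.
    labels, centers = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers
```

The number of clusters is not fixed in advance; it emerges from the bandwidth, which is the property the generalization carries over to other probabilistic manifolds.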


Vatakis A.,Institute for Language and Speech Processing ILSP | Papadelis G.,Aristotle University of Thessaloniki
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2011

Events and actions evolve over time, making the perception of time one of the most central issues in Cognitive Science. However, to date, many questions remain regarding time perception and the nature of time. A newly formed network of scientists has recently joined forces to advance the human understanding of time through a multidisciplinary approach. This network tackles issues ranging from defining the concept of time to developing rehabilitation techniques, and it is fostered by the COST-ESF framework. © 2011 Springer.


Katsouros V.,Institute for Language and Speech Processing ILSP | Papavassiliou V.,Institute for Language and Speech Processing ILSP | Simistira F.,Institute for Language and Speech Processing ILSP | Simistira F.,University of Fribourg | Gatos B.,Institute of Informatics and Telecommunications
Proceedings - 12th IAPR International Workshop on Document Analysis Systems, DAS 2016 | Year: 2016

Optical Character Recognition (OCR) of ancient Greek polytonic scripts is a challenging task due to the large number of character classes resulting from variations of diacritical marks on the vowel letters. Classical OCR systems require a character segmentation phase, which, in the case of Greek polytonic scripts, is the main source of errors and ultimately affects overall OCR performance. This paper proposes a segmentation-free, HMM-based recognition system and compares its performance with other commercial, open-source, and state-of-the-art OCR systems. The evaluation was carried out on a challenging new dataset of degraded Greek polytonic texts and showed that the HMM-based OCR yields character and word error rates of 8.61% and 25.30%, respectively. This outperforms most of the available OCR systems and is comparable with the performance of the recently proposed state-of-the-art system based on LSTM networks. © 2016 IEEE.
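Character and word error rates such as those quoted above are conventionally computed as the Levenshtein (edit) distance between the recognized text and the ground truth, divided by the length of the reference. The abstract does not spell out the scoring protocol, so the following is a minimal sketch under that conventional definition. One design choice matters for polytonic Greek: the same accented vowel can be encoded either as a single precomposed code point or as a base letter plus combining diacritics. The sketch therefore normalizes both strings to Unicode NFC before counting. This is an assumption about sensible scoring, not the paper's stated procedure.

```python
import unicodedata

def levenshtein(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # delete r
                            curr[j - 1] + 1,          # insert h
                            prev[j - 1] + (r != h)))  # substitute (or match)
        prev = curr
    return prev[-1]

def cer(ref_text, hyp_text):
    """Character error rate; NFC makes precomposed and combining forms compare equal."""
    ref = unicodedata.normalize("NFC", ref_text)
    hyp = unicodedata.normalize("NFC", hyp_text)
    return levenshtein(ref, hyp) / len(ref)

def wer(ref_text, hyp_text):
    """Word error rate over whitespace-separated tokens."""
    ref = unicodedata.normalize("NFC", ref_text).split()
    hyp = unicodedata.normalize("NFC", hyp_text).split()
    return levenshtein(ref, hyp) / len(ref)
```

With this normalization, a missing smooth breathing (for example "εν" recognized for "ἐν") counts as one character substitution rather than as a deletion of a separate combining mark.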


Vatakis A.,Cognitive Systems Research Institute CSRI | Pastra K.,Cognitive Systems Research Institute CSRI | Pastra K.,Institute for Language and Speech Processing ILSP
Scientific Data | Year: 2016

In the longstanding effort of defining object affordances, a number of resources have been developed on objects and associated knowledge. These resources, however, have limited potential for modeling and generalization, mainly due to the restricted, stimulus-bound data collection methodologies adopted. To date, therefore, there exists no resource that truly captures object affordances in a direct, multimodal, and naturalistic way. Here, we present the first such resource of 'thinking aloud', spontaneously generated verbal and motoric data on object affordances. This resource was developed from the reports of 124 participants divided into three behavioural experiments with visuo-tactile stimulation, which were captured audiovisually from two camera views (frontal/profile). This methodology allowed the acquisition of approximately 95 hours of video, audio, and text data covering: object-feature-action data (e.g., perceptual features, namings, functions), Exploratory Acts (haptic manipulation for feature acquisition/verification), gestures and demonstrations for object/feature/action description, and reasoning patterns (e.g., justifications, analogies) for attributing a given characterization. The wealth and content of the data make this corpus a one-of-a-kind resource for the study and modeling of object affordances.


Gatos B.,Institute of Informatics and Telecommunications | Stamatopoulos N.,Institute of Informatics and Telecommunications | Louloudis G.,Institute of Informatics and Telecommunications | Sfikas G.,Institute of Informatics and Telecommunications | And 4 more authors.
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR | Year: 2015

Recognition of old Greek document images containing polytonic (multi-accent) characters is a challenging task due to the large number of existing character classes (more than 270), which cannot be handled sufficiently by current OCR technologies. Since the Greek polytonic system was used from late antiquity until recently, a large number of scanned Greek documents still lack full-text search capabilities. To assist the progress of relevant research, this paper introduces GRPOLY-DB, the first publicly available old Greek polytonic database for the evaluation of several document image processing tasks. It contains both machine-printed and handwritten documents, as well as ground-truth annotation that can be used for training and evaluation of the most common document image processing tasks, i.e., text line and word segmentation, text recognition, isolated character recognition and word spotting. Results using several representative baseline technologies are also presented in order to help researchers evaluate their methods and advance the frontiers of old Greek document image recognition and word spotting. © 2015 IEEE.


Athanaselis T.,Institute for Language and Speech Processing ILSP | Bakamidis S.,Institute for Language and Speech Processing ILSP | Dologlou I.,Institute for Language and Speech Processing ILSP | Argyriou E.N.,National Technical University of Athens | Symvonis A.,National Technical University of Athens
Multimedia Tools and Applications | Year: 2014

This work presents our effort to incorporate a state-of-the-art speech recognition engine into a new assistive reading platform aimed at improving the reading ability of Greek dyslexic students. The platform was developed within the Agent-DYSL IST project and helps dyslexic children learn to read fluently. Unlike previously presented approaches, the system aims not only to enable access to reading materials within an inclusive learning system but also to promote the development of reading skills by adjusting and adapting in light of feedback to the system. The idea is to improve speech recognition performance so that the system can gradually increase the user's reading capabilities and gradually reduce the assistance it provides, until the user is able to read like a non-dyslexic reader. The evaluation results show that both learners' reading pace and reading accuracy increased. © 2012 Springer Science+Business Media, LLC.


Guimaraes R.,Institute for Language and Speech Processing ILSP | Athanaselis T.,Institute for Language and Speech Processing ILSP | Bakamidis S.,Institute for Language and Speech Processing ILSP | Dologlou I.,Institute for Language and Speech Processing ILSP | Fotinea S.-E.,Institute for Language and Speech Processing ILSP
2010 IEEE International Conference on Imaging Systems and Techniques, IST 2010 - Proceedings | Year: 2010

This paper reviews the technology needed to develop a Vocal User Interface to be integrated into jMRUI [1]. jMRUI allows magnetic resonance (MR) spectroscopists to easily perform time-domain analysis of in vivo MR data and might in the future be used during intraoperative MRI scanning. An operating room with an MRI scanner is a highly noisy environment, which degrades speech recognition. These specific circumstances must be taken into consideration; they are described in this paper along with the structure of the vocal interface. © 2010 IEEE.


Athanaselis T.,Institute for Language and Speech Processing ILSP | Bakamidis S.,Institute for Language and Speech Processing ILSP | Dologlou I.,Institute for Language and Speech Processing ILSP | Fotinea S.-E.,Institute for Language and Speech Processing ILSP
International Journal of Signal and Imaging Systems Engineering | Year: 2010

In this paper, two well-known speech enhancement techniques are compared as Magnetic Resonance Imaging (MRI) scanner noise reduction schemes applied prior to speech recognition. Our study compares Non-linear Spectral Subtraction (NSS) with iterative overestimation and Singular Value Decomposition (SVD)-based noise reduction for enhancing speech with medical content that is contaminated by MRI scanner noise. Experiments show that both techniques can improve the recognition performance of voice commands for controlling an MRI scanner. Going one step further, the paper also investigates the performance of both signal enhancement techniques in recognizing speech utterances containing medical information. The recognition results show that, as expected, each technique improved recognition accuracy, but NSS outperformed SVD. Copyright © 2010 Inderscience Enterprises Ltd.
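To make the idea of spectral subtraction with overestimation concrete, here is a minimal sketch of classic power spectral subtraction with an SNR-dependent over-subtraction factor, using the linear rule popularized by Berouti et al. (the factor falls from 4.75 at -5 dB to 1 at 20 dB). This is a swapped-in textbook variant: it does not reproduce the non-linear, iterative NSS evaluated in the paper. The default `alpha0` and `beta` values and the function names are assumptions. The sketch operates on per-bin power spectra of single frames, so no FFT library is needed.

```python
import math

def estimate_noise(noise_frames):
    """Average per-bin power over frames assumed to contain noise only."""
    n = len(noise_frames)
    return [sum(bins) / n for bins in zip(*noise_frames)]

def spectral_subtract(frame_power, noise_power, alpha0=4.0, beta=0.01):
    """Power spectral subtraction with SNR-dependent over-subtraction (sketch).

    frame_power: |Y(k)|^2 of one noisy frame; noise_power: estimate of |N(k)|^2.
    The over-subtraction factor alpha grows as the frame SNR falls, and
    beta * noise_power acts as a spectral floor that limits musical noise.
    """
    signal = max(sum(frame_power), 1e-12)
    noise = max(sum(noise_power), 1e-12)
    snr = 10 * math.log10(signal / noise)
    snr = min(max(snr, -5.0), 20.0)   # clamp to the range the linear rule covers
    alpha = alpha0 - 0.15 * snr       # 4.75 at -5 dB down to 1.0 at 20 dB
    return [max(y - alpha * n, beta * n) for y, n in zip(frame_power, noise_power)]
```

In a full system, this would be applied to every short-time Fourier frame, and the enhanced magnitudes would be resynthesized with the noisy phase before being passed to the recognizer.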


Karabetsos S.,Institute for Language and Speech Processing ILSP | Tsiakoulis P.,Institute for Language and Speech Processing ILSP | Chalamandaris A.,Institute for Language and Speech Processing ILSP | Raptis S.,Institute for Language and Speech Processing ILSP
IEEE Signal Processing Letters | Year: 2010

This letter introduces one-class classification as a framework for spectral join cost calculation in unit selection speech synthesis. Instead of quantifying the spectral cost by a single distance measure, a data-driven approach is adopted that exploits the natural similarity of consecutive speech frames in the speech database. A pair of consecutive frames is jointly represented as a vector of spectral distance measures, and these vectors provide the training data for the one-class classifier. At synthesis runtime, speech units are selected based on the scores derived from the classifier. Experimental results provide evidence of the effectiveness of the proposed method, which clearly outperforms the conventional approaches currently employed. © 2010 IEEE.
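The pipeline described above can be sketched as follows. The abstract does not name the classifier or the distance measures, so both are assumptions here. As a stand-in one-class model, the sketch fits a diagonal Gaussian to the distance vectors of naturally consecutive frames and scores a candidate join by its squared normalized distance from that distribution. The three distance measures (Euclidean, L1, maximum absolute difference) and all function names are hypothetical. A real unit selection system would also combine this cost with target costs in a Viterbi search, rather than ranking candidates for a single join in isolation.

```python
import math
from statistics import fmean, pstdev

def join_features(frame_a, frame_b):
    """Hypothetical vector of spectral distance measures between two frames
    (e.g. cepstral vectors): Euclidean, L1 and maximum absolute difference."""
    diffs = [abs(a - b) for a, b in zip(frame_a, frame_b)]
    return [math.sqrt(sum(d * d for d in diffs)), sum(diffs), max(diffs)]

class GaussianOneClass:
    """Diagonal-Gaussian one-class model of 'natural' joins, trained only on
    feature vectors of frames that are consecutive in the speech database."""

    def fit(self, vectors):
        columns = list(zip(*vectors))
        self.mean = [fmean(c) for c in columns]
        self.std = [max(pstdev(c), 1e-9) for c in columns]  # avoid division by zero
        return self

    def cost(self, vector):
        """Squared normalized distance from the natural-join distribution;
        larger values mean a less natural join."""
        return sum(((x - m) / s) ** 2
                   for x, m, s in zip(vector, self.mean, self.std))

def train_from_database(utterances):
    """Collect join features for every pair of consecutive frames and fit the model."""
    vectors = [join_features(a, b)
               for frames in utterances
               for a, b in zip(frames, frames[1:])]
    return GaussianOneClass().fit(vectors)

def pick_unit(model, last_frame, candidate_first_frames):
    """Index of the candidate unit whose first frame joins most naturally."""
    costs = [model.cost(join_features(last_frame, f)) for f in candidate_first_frames]
    return min(range(len(costs)), key=costs.__getitem__)
```

Because training uses only positive (natural) examples, no hand-labeled "bad joins" are required, which is the main appeal of the one-class formulation.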


Dologlou I.,Institute for Language and Speech Processing ILSP | Bakamidis S.,Institute for Language and Speech Processing ILSP | Carayannis G.,Institute for Language and Speech Processing ILSP
European Signal Processing Conference | Year: 2015

A new algorithm for the design of Hidden Markov Models (HMM) from observed symbol and bisymbol probabilities is presented. The algorithm provides a global optimum and makes use of linear vectorial models of sequences of probabilistic vectors estimated during an off-line learning process. Moreover, a method to enhance observed data so as to comply with the constraints of HMM strings is proposed. An optimal estimate of the number of states of the HMM for the given observed data is also provided. © 2000 EUSIPCO.
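The design algorithm works from observed symbol and bisymbol probabilities back to an HMM. To show what those target quantities are, the sketch below runs the forward direction: for a given stationary HMM with transition matrix `A` and emission matrix `B`, it computes P(o_t = k) and P(o_t = k, o_{t+1} = l). This is not the paper's design algorithm. It assumes an ergodic chain observed in its stationary regime, so that the "observed" probabilities correspond to long-run frequencies; the function names are hypothetical.

```python
def stationary_distribution(A, iters=1000):
    """Long-run state probabilities of transition matrix A by power iteration
    (assumes an ergodic chain)."""
    n = len(A)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]
    return p

def symbol_probs(pi, B):
    """P(o_t = k) = sum_i pi_i * B[i][k]."""
    return [sum(pi[i] * B[i][k] for i in range(len(pi))) for k in range(len(B[0]))]

def bisymbol_probs(pi, A, B):
    """P(o_t = k, o_{t+1} = l) = sum_{i,j} pi_i * B[i][k] * A[i][j] * B[j][l]."""
    n, m = len(pi), len(B[0])
    return [[sum(pi[i] * B[i][k] * A[i][j] * B[j][l]
                 for i in range(n) for j in range(n))
             for l in range(m)]
            for k in range(m)]
```

Any valid set of HMM parameters must produce bisymbol probabilities whose row and column sums reproduce the symbol probabilities. Observed estimates generally violate this consistency slightly, which is the kind of constraint the abstract's data-enhancement step addresses.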
