Lisbon, Portugal


Carvalho G.,University of Lisbon | Fale I.,University of Lisbon | De Matos D.M.,L2F INESC ID Lisbon | De Matos D.M.,University of Lisbon | Rocio V.,University of Lisbon
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

A mixed corpus of Portuguese is one in which texts of different origins produce different spelling variants for the same word. A new norm, which brings together the written texts produced in Portugal and in Brazil under a more uniform orthography, has been in effect since 2009. But what happens, from a search perspective, to corpora created before the norm came into practice, or within the transition period? Is the information they contain outdated and worthless? Do they need to be converted to the new norm? In the present work we analyse these questions. © 2012 Springer-Verlag.
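One way to search such a mixed corpus without converting it is query expansion over known spelling variants. The sketch below is only an illustration of this idea, with a tiny hand-made variant table; it is not the procedure evaluated in the paper.

    # Hedged sketch: expand query terms with pre/post-1990-agreement
    # spelling variants so a single query matches both norms.
    VARIANTS = {
        "ação": {"acção"},        # new norm -> old European spelling
        "ótimo": {"óptimo"},
        "direção": {"direcção"},
    }

    # Build a symmetric lookup so any variant retrieves all the others.
    LOOKUP = {}
    for new, olds in VARIANTS.items():
        group = {new} | olds
        for form in group:
            LOOKUP[form] = group

    def expand_query(terms):
        """Replace each query term by the set of its known spelling variants."""
        return [sorted(LOOKUP.get(t, {t})) for t in terms]

    print(expand_query(["ação", "popular"]))  # [['acção', 'ação'], ['popular']]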


Ribeiro R.,Instituto Universitário de Lisboa (ISCTE-IUL) | De Matos D.M.,L2F INESC ID Lisbon | De Matos D.M.,University of Lisbon
Journal of Artificial Intelligence Research | Year: 2011

In automatic summarization, centrality-as-relevance means that the most important content of an information source, or of a collection of information sources, corresponds to its most central passages, under a representation where such a notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that relies on support sets to better estimate the relevant content. Geometric proximity is used to compute semantic relatedness. Centrality (relevance) is determined by considering the whole input source (and not only local information), and by taking into account the existence of minor topics or lateral subjects in the information sources to be summarized. The method consists of creating, for each passage of the input source, a support set consisting only of the most semantically related passages; the most relevant content is then determined by selecting the passages that occur in the largest number of support sets. This model produces extractive summaries that are generic, and language- and domain-independent. Thorough automatic evaluation shows that the method achieves state-of-the-art performance in both written text and automatically transcribed speech summarization, including when compared to considerably more complex approaches. © 2011 AI Access Foundation. All rights reserved.
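The support-set procedure lends itself to a compact illustration. The following is a minimal sketch, assuming bag-of-words vectors and cosine similarity as the geometric proximity measure; the paper's actual representation and parameters differ.

    import math
    from collections import Counter

    def bow(passage):
        return Counter(passage.lower().split())

    def cosine(a, b):
        num = sum(a[t] * b.get(t, 0) for t in a)
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    def support_set_summary(passages, k=2, size=2):
        vecs = [bow(p) for p in passages]
        # For each passage, keep the k most semantically related passages.
        support_sets = []
        for i, v in enumerate(vecs):
            sims = sorted(((cosine(v, w), j) for j, w in enumerate(vecs)
                           if j != i), reverse=True)
            support_sets.append({j for _, j in sims[:k]})
        # Relevance = number of support sets a passage occurs in.
        counts = Counter(j for s in support_sets for j in s)
        top = sorted(counts, key=counts.get, reverse=True)[:size]
        return [passages[i] for i in sorted(top)]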


Carvalho G.,University of Lisbon | Martins De Matos D.,L2F INESC ID Lisbon | Martins De Matos D.,University of Lisbon | Rocio V.,University of Lisbon
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2010

IdSay is a Question Answering system for Portuguese that participated in QA@CLEF 2008 with a baseline version (IdSayBL). Despite the encouraging results, there was still much room for improvement. The participation of six systems in the Portuguese task, with very good results both individually and in a hypothetical combination run, provided a valuable source of information. We analysed all the answers submitted by all systems to identify their strengths and weaknesses, and used the conclusions of that analysis to guide our improvements, keeping in mind the two key characteristics we want for the system: efficiency in terms of response time and robustness in handling different types of data. As a result, an improved version of IdSay was developed, whose most important enhancement is the introduction of semantic information. We obtained significantly better results, moving from a first-answer accuracy of 32.5% with IdSayBL to 50.5% with IdSay, without degrading response time. © Springer-Verlag Berlin Heidelberg 2010.
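The hypothetical combination run mentioned above can be pictured as plurality voting over the systems' first answers. The sketch below is an assumption for illustration only, not the combination scheme used at QA@CLEF.

    from collections import Counter

    def combine_first_answers(system_answers):
        """system_answers: first answer returned by each participating system."""
        votes = Counter(a.strip().lower() for a in system_answers if a)
        answer, _ = votes.most_common(1)[0]
        return answer

    print(combine_first_answers(["Lisboa", "lisboa", "Porto"]))  # -> 'lisboa'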


Aparicio M.,L2F INESC ID Lisbon | Aparicio M.,Instituto Universitário de Lisboa (ISCTE-IUL) | Figueiredo P.,L2F INESC ID Lisbon | Figueiredo P.,University of Lisbon | And 7 more authors.
Pattern Recognition Letters | Year: 2016

We assess the performance of generic text summarization algorithms applied to films and documentaries, using extracts from news articles produced by reference models of extractive summarization. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used to compare the generated summaries against news abstracts, plot summaries, and synopses. We show that the best-performing algorithms are LSA for news articles and documentaries, and LexRank and Support Sets for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles. © 2016 Elsevier B.V. All rights reserved.
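ROUGE scores a generated summary by its n-gram overlap with a reference. As a worked example of the simplest variant, the sketch below computes ROUGE-1 recall by hand; the paper uses the standard ROUGE toolkit, not this code.

    from collections import Counter

    def rouge1_recall(reference, summary):
        # Fraction of reference unigrams also present in the summary,
        # with clipped counts for repeated words.
        ref = Counter(reference.lower().split())
        gen = Counter(summary.lower().split())
        overlap = sum(min(c, gen[t]) for t, c in ref.items())
        return overlap / sum(ref.values()) if ref else 0.0

    print(rouge1_recall("the cat sat on the mat",
                        "the cat lay on a mat"))  # 4/6 ≈ 0.667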


Marujo L.,LTI CMU | Marujo L.,University of Lisbon | Ribeiro R.,L2F INESC ID Lisbon | Ribeiro R.,Instituto Universitário de Lisboa (ISCTE-IUL) | And 6 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document, and are often used to index the document or as features in further processing; this makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy, and our experiments confirmed this hypothesis: eliminating as little as 10% of the document sentences led to a 2% improvement in AKE precision and recall. Our AKE system is built on top of the MAUI toolkit, which follows a supervised learning approach. We trained and tested our AKE method on a gold standard made of 8 BN programs containing 110 manually annotated news stories. The experiments were conducted within a Multimedia Monitoring Solution (MMS) system for TV and radio news/programs, running daily and monitoring 12 TV and 4 radio channels. © 2012 Springer-Verlag.
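The light-filtering step can be sketched as follows: score every sentence against the document centroid and drop the least relevant fraction before key phrase extraction. The centroid scoring below is an assumption for illustration; the paper's filter and its MAUI-based extractor are more elaborate.

    import math
    from collections import Counter

    def bow(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        num = sum(a[t] * b.get(t, 0) for t in a)
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    def light_filter(sentences, drop_fraction=0.10):
        # Rank sentences by similarity to the whole-document centroid
        # and drop the bottom fraction as marginally relevant.
        centroid = bow(" ".join(sentences))
        ranked = sorted(sentences, key=lambda s: cosine(bow(s), centroid))
        kept = set(ranked[int(len(sentences) * drop_fraction):])
        # Preserve the original sentence order in the filtered document.
        return [s for s in sentences if s in kept]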


Rodrigues H.,L2F INESC ID Lisbon | Coheur L.,L2F INESC ID Lisbon
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

In this paper we look into Who Wants to Be a Millionaire, a contest of multiple-choice questions, as an answer selection subproblem. Answer selection allows Question Answering systems to rank one or more correct candidate answers above the remaining candidates. In this subproblem we consider only a set of four candidate answers, of which one is correct. The platform we built is language-independent and supports languages other than English with little effort. We compare several techniques for answer selection, applying them to both English and Portuguese in the context of Who Wants to Be a Millionaire. The results show that the strategy can be applied to more than one language without hurting its performance, achieving accuracies around 73%. © 2012 Springer-Verlag.
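One simple answer-selection strategy for this setting is to score each of the four candidates by its co-occurrence with the question's words in retrieved text. The sketch below assumes a pre-fetched passage list standing in for search results and is not one of the paper's exact techniques; note that it relies on string matching only, which is what makes such strategies largely language-independent.

    def score_candidate(question, candidate, passages):
        """Count passages containing the candidate alongside question words."""
        q_terms = set(question.lower().split())
        return sum(1 for p in passages
                   if candidate.lower() in p.lower()
                   and q_terms & set(p.lower().split()))

    def select_answer(question, candidates, passages):
        # Pick the candidate best supported by the retrieved passages.
        return max(candidates,
                   key=lambda c: score_candidate(question, c, passages))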


Mota P.,L2F INESC ID Lisbon | Coheur L.,L2F INESC ID Lisbon | Curto S.,L2F INESC ID Lisbon | Fialho P.,L2F INESC ID Lisbon
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

In this paper we target Natural Language Understanding in the context of Conversational Agents that answer questions about their topics of expertise and have question/answer pairs in their knowledge base, limiting the understanding problem to the task of finding the question in the knowledge base that triggers the most appropriate answer to a given (new) question. We implement such an agent and test different state-of-the-art techniques covering several paradigms, moving from lab experiments to tests with real users. First, we test the implemented techniques on a corpus built by the agent's developers, corresponding to the expected questions; then we test the same techniques on a corpus of interactions between the agent and real users. Interestingly, the results show that the best "lab" techniques are not necessarily the best for real scenarios, even when only in-domain questions are considered. © 2012 Springer-Verlag.
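Reduced to its core, the matching task is: given a new question, return the answer paired with the most similar question in the knowledge base. The sketch below uses Jaccard word overlap as a stand-in for the several matching paradigms the paper compares; the knowledge-base entries are hypothetical.

    import re

    def words(text):
        # Lowercased word set, punctuation stripped.
        return set(re.findall(r"\w+", text.lower()))

    def jaccard(a, b):
        wa, wb = words(a), words(b)
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    def answer(new_question, qa_pairs):
        """qa_pairs: list of (known_question, canned_answer) tuples."""
        _, best_answer = max(qa_pairs,
                             key=lambda qa: jaccard(new_question, qa[0]))
        return best_answer

    kb = [("What is your name?", "I am the agent."),
          ("Where do you work?", "At INESC-ID, in Lisbon.")]
    print(answer("Tell me your name", kb))  # -> 'I am the agent.'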
