Lisbon, Portugal

Carvalho G., University of Lisbon | Martins De Matos D., L2F INESC-ID Lisbon | Martins De Matos D., University of Lisbon | Rocio V., University of Lisbon
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

IdSay is a Question Answering system for Portuguese that participated in QA@CLEF 2008 with a baseline version (IdSayBL). Despite the encouraging results, there was still much room for improvement. The participation of six systems in the Portuguese task, with very good results both individually and in a hypothetical combination run, provided a valuable source of information. We analysed all the answers submitted by all systems to identify their strengths and weaknesses, and used the conclusions of that analysis to guide our improvements, keeping in mind the two key characteristics we want for the system: efficiency in terms of response time and robustness in handling different types of data. As a result, an improved version of IdSay was developed, the most important enhancement being the introduction of semantic information. We obtained significantly better results, raising first-answer accuracy from 32.5% (IdSayBL) to 50.5% (IdSay), without degrading response time. © Springer-Verlag Berlin Heidelberg 2010.

Ribeiro R., Instituto Universitário de Lisboa (ISCTE-IUL) | De Matos D.M., L2F INESC-ID Lisbon | De Matos D.M., University of Lisbon
Journal of Artificial Intelligence Research

In automatic summarization, centrality-as-relevance means that the most important content of an information source, or of a collection of information sources, corresponds to the most central passages, under a representation where such a notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that relies on support sets to better estimate the relevant content. Geometric proximity is used to compute semantic relatedness. Centrality (relevance) is determined by considering the whole input source (not only local information) and by taking into account the existence of minor topics or lateral subjects in the information sources to be summarized. The method consists of creating, for each passage of the input source, a support set containing only its most semantically related passages. The most relevant content is then determined by selecting the passages that occur in the largest number of support sets. This model produces extractive summaries that are generic, and language- and domain-independent. Thorough automatic evaluation shows that the method achieves state-of-the-art performance in both written-text and automatically transcribed speech summarization, including when compared to considerably more complex approaches. © 2011 AI Access Foundation. All rights reserved.
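A minimal sketch of the support-set idea described in the abstract, under simplifying assumptions: passages are represented as TF-IDF vectors, cosine similarity stands in for the paper's geometric proximity, and each support set has a fixed size k. Names such as support_set_summary are illustrative, not taken from the paper.

```python
# Hedged sketch: each passage gets a support set of its k most similar
# passages; the passages appearing in the most support sets are selected.
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def support_set_summary(passages, k=3, n_select=2):
    """Return the n_select passages that occur in the most support sets."""
    vectors = TfidfVectorizer().fit_transform(passages)
    sim = cosine_similarity(vectors)
    np.fill_diagonal(sim, -1.0)  # a passage never supports itself

    counts = Counter()
    for i in range(len(passages)):
        # Support set of passage i: indices of its k most similar passages.
        support = np.argsort(sim[i])[::-1][:k]
        counts.update(support.tolist())

    ranked = [idx for idx, _ in counts.most_common(n_select)]
    return [passages[idx] for idx in sorted(ranked)]  # keep original order


if __name__ == "__main__":
    doc = [
        "The new model improves summarization of broadcast news.",
        "Summarization selects the most central passages of a document.",
        "Centrality is estimated from support sets of related passages.",
        "The weather in Lisbon was sunny during the evaluation campaign.",
        "Passages occurring in many support sets are considered relevant.",
    ]
    for sentence in support_set_summary(doc):
        print(sentence)
```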

Carvalho G., University of Lisbon | Fale I., University of Lisbon | De Matos D.M., L2F INESC-ID Lisbon | De Matos D.M., University of Lisbon | Rocio V., University of Lisbon
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

A mixed corpus of Portuguese is one in which texts of different origins produce different spelling variants of the same word. A new orthographic norm, which brings the written texts produced in Portugal and Brazil closer together and gives them a more uniform orthography, has been in effect since 2009. But, from a search perspective, what happens to corpora created before the norm came into practice, or during the transition period? Is the information they contain outdated and worthless? Do they need to be converted to the new norm? In the present work we analyse these questions. © 2012 Springer-Verlag.
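To make the search problem concrete, here is an illustrative sketch, not taken from the paper, of one way a mixed corpus could be searched without converting it: each query term is expanded to its pre- and post-agreement spellings before matching. The variant table is a tiny hand-made sample for illustration only.

```python
# Hedged sketch: query expansion over Portuguese orthographic variants.
OLD_TO_NEW = {
    "acção": "ação",
    "direcção": "direção",
    "óptimo": "ótimo",
    "actor": "ator",
}
# Build a symmetric lookup so either spelling retrieves both forms.
VARIANTS = {}
for old, new in OLD_TO_NEW.items():
    VARIANTS[old] = {old, new}
    VARIANTS[new] = {old, new}


def expand_query(term):
    """Return all known spellings of a query term (itself if none known)."""
    return VARIANTS.get(term, {term})


def search(corpus, term):
    """Return documents containing any orthographic variant of the term."""
    forms = expand_query(term.lower())
    return [doc for doc in corpus if any(f in doc.lower().split() for f in forms)]


if __name__ == "__main__":
    corpus = [
        "a direcção do projecto foi anunciada em 2007",  # pre-agreement text
        "a direção do projeto foi renovada em 2012",     # post-agreement text
    ]
    print(search(corpus, "direção"))  # matches both documents
```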

Marujo L., LTI CMU | Marujo L., University of Lisbon | Ribeiro R., L2F INESC-ID Lisbon | Ribeiro R., Instituto Universitário de Lisboa (ISCTE-IUL) | and 6 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document; they are often used to index the document or as features in further processing, which makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy, and our experiments confirmed this hypothesis: eliminating as little as 10% of the document sentences led to a 2% improvement in AKE precision and recall. Our AKE is built on top of the MAUI toolkit, which follows a supervised learning approach. We trained and tested the method on a gold standard consisting of 8 BN programs containing 110 manually annotated news stories. The experiments were conducted within a Multimedia Monitoring Solution (MMS) system for TV and radio news/programs, which runs daily and monitors 12 TV and 4 radio channels. © 2012 Springer-Verlag.
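A hedged sketch of the light-filtering step described in the abstract, under simplifying assumptions: sentence relevance is approximated by average cosine similarity over TF-IDF vectors, and a toy top-TF-IDF n-gram extractor stands in for the supervised, MAUI-based key phrase extractor the paper actually uses. Function names such as light_filter and toy_key_phrases are illustrative.

```python
# Hedged sketch: drop the least relevant sentences, then extract key phrases.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def light_filter(sentences, drop_fraction=0.10):
    """Remove the drop_fraction least central sentences from a document."""
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    centrality = (sim.sum(axis=1) - 1.0) / (len(sentences) - 1)  # exclude self
    n_drop = int(round(drop_fraction * len(sentences)))
    keep = np.argsort(centrality)[n_drop:]        # drop the lowest-scoring
    return [sentences[i] for i in sorted(keep)]   # preserve document order


def toy_key_phrases(text, top_n=5):
    """Stand-in extractor: highest TF-IDF unigrams/bigrams of the document."""
    vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
    scores = vec.fit_transform([text]).toarray()[0]
    terms = vec.get_feature_names_out()
    return [terms[i] for i in np.argsort(scores)[::-1][:top_n]]


if __name__ == "__main__":
    story = [
        "The broadcast news story covers the national election results.",
        "Election results were announced by the electoral commission tonight.",
        "The anchor also mentioned tomorrow's weather in passing.",
        "Analysts discussed how the election results affect parliament.",
    ]
    filtered = light_filter(story, drop_fraction=0.25)
    print(toy_key_phrases(" ".join(filtered)))
```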

Aparicio M., L2F INESC-ID Lisbon | Aparicio M., Instituto Universitário de Lisboa (ISCTE-IUL) | Figueiredo P., L2F INESC-ID Lisbon | Figueiredo P., University of Lisbon | and 7 more authors.
Pattern Recognition Letters

We assess the performance of generic text summarization algorithms applied to films and documentaries, using extracts from news articles produced by reference models of extractive summarization. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used to compare the generated summaries against news abstracts, plot summaries, and synopses. We show that the best-performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles. © 2016 Elsevier B.V. All rights reserved.
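A minimal sketch of the ROUGE-style comparison mentioned above: a bare-bones ROUGE-1 (unigram overlap) computation between a generated extract and a reference synopsis. The official ROUGE toolkit adds stemming, stop-word handling, and multiple-reference support; this version only illustrates the core measure, and the example strings are invented.

```python
# Hedged sketch: ROUGE-1 precision/recall/F1 as plain unigram overlap.
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) for unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    generated = "the detective finds the missing painting in the old villa"
    synopsis = "a detective recovers a missing painting hidden in a villa"
    print(rouge_1(generated, synopsis))
```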
