Grant
Agency: European Commission | Branch: FP7 | Program: CP | Phase: ICT-2011.4.2 | Award Amount: 3.68M | Year: 2011

The recent massive growth in online media and the rise of user-authored content (e.g. weblogs, Twitter, Facebook) have led to challenges of how to access and interpret these strongly multilingual data in a timely, efficient, and affordable manner. Scientifically, streaming online media pose new challenges due to their shorter, noisier, and more colloquial nature. Moreover, they form a temporal stream strongly grounded in events and context. Consequently, existing language technologies fall short on accuracy, scalability and portability. The goal of this project is to deliver innovative, portable open-source real-time methods for cross-lingual mining and summarisation of large-scale stream media. TrendMiner will achieve this through an interdisciplinary approach, combining deep linguistic methods from text processing, knowledge-based reasoning from web science, machine learning, economics, and political science. No expensive human-annotated data will be required, due to our use of time-series data (e.g. financial markets, political polls) as a proxy. A key novelty will be weakly supervised machine learning algorithms for automatic discovery of new trends and correlations. Scalability and affordability will be addressed through a cloud-based infrastructure for real-time text mining from stream media. Results will be validated in two high-profile case studies: financial decision support (with analysts, traders, regulators, and economists) and political analysis and monitoring (with politicians, economists, and political journalists). The techniques will be generic, with many business applications: business intelligence, customer relations management, community support. The project will also benefit society and ordinary citizens by enabling enhanced access to government data archives, summarisation of online health information, and tracking of hot societal issues. TrendMiner addresses Objective ICT-2011.4.2 Language Technologies, target outcome b) Information access and mining.


Martinez P.,Charles III University of Madrid | Segura I.,Charles III University of Madrid | Declerck T.,German Research Center for Artificial Intelligence | Martinez J.L.,DAEDALUS - Data, Decisions and Language
Procesamiento de Lenguaje Natural | Year: 2014

The recent massive growth in online media and the rise of user-authored content (e.g. weblogs, Twitter, Facebook) have led to challenges of how to access and interpret the strongly multilingual data in a timely, efficient, and affordable manner. The goal of this project is to deliver innovative, portable open-source real-time methods for cross-lingual mining and summarization of large-scale stream media. Results are validated in three high-profile case studies: financial decision support (with analysts, traders, regulators, and economists), political analysis and monitoring (with politicians, economists, and political journalists), and monitoring patient postings in the health domain to detect adverse drug reactions. © 2014 Sociedad Española para el Procesamiento del Lenguaje Natural.


Schneider J.M.,Charles III University of Madrid | Declerck T.,German Research Center for Artificial Intelligence | Fernandez J.L.M.,DAEDALUS - Data, Decisions and Language | Fernandez P.M.,Charles III University of Madrid
Procesamiento de Lenguaje Natural | Year: 2013

This paper explains the application of ontologies in financial domains to a query expansion process. The final goal is to improve financial information retrieval effectiveness. The system is composed of an ontology and a Lucene index that stores and retrieves natural language concepts. An initial evaluation with a limited number of queries has been performed. Obtained results show that ambiguity remains a problem when expanding a query. The filtering of entities in the expansion process by selecting only companies or references to markets helps in the reduction of ambiguity. © 2013 Sociedad Española Para el Procesamiento del Lenguaje Natural.
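
The filtering step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a toy in-memory ontology; the `ONTOLOGY` dictionary, the entity-type names and the `expand_query` function are hypothetical and are not taken from the paper, which uses a real ontology and a Lucene index.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: query term -> (related label, entity type).
ONTOLOGY: Dict[str, List[Tuple[str, str]]] = {
    "santander": [("Banco Santander", "company"), ("Santander (city)", "place")],
    "ibex": [("IBEX 35", "market")],
}


def expand_query(query: str, allowed_types: Set[str]) -> List[str]:
    """Return the original terms plus ontology labels of the allowed types."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for label, entity_type in ONTOLOGY.get(term, []):
            # Keeping only selected entity types is what reduces ambiguity.
            if entity_type in allowed_types and label not in expanded:
                expanded.append(label)
    return expanded


print(expand_query("Santander shares", {"company", "market"}))
```

Restricting `allowed_types` to companies and markets drops the place reading of the ambiguous term, which is the effect the abstract reports.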


Schneider J.M.,Charles III University of Madrid | Luis Martinez Fernandez J.,DAEDALUS - Data, Decisions and Language | Martinez P.,Charles III University of Madrid
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

Automatic speech recognition (ASR) systems cannot recognize entities that are not present in their vocabulary. This paper considers the misrecognition of named entities in Spanish voice queries and introduces a proof-of-concept for named entity correction that offers alternatives to incorrectly recognized entities by retrieving phonetically similar entities from a dictionary. The system is domain-dependent, using sports news, especially football news, but is independent of the automatic speech recognition system used. The correction process exploits the query structure and its semantic information to detect where a named entity appears. The system finds the most suitable alternative entity from a dictionary previously generated from the existing named entities. © Springer International Publishing Switzerland 2014.
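
Retrieving a phonetically similar entity from a dictionary can be sketched as follows. The abstract does not say which phonetic algorithm the authors use, so this is only an assumption: a crude Spanish-oriented key (`phonetic_key`, hypothetical) compared with the standard-library `difflib.SequenceMatcher`. The entity list and the `cutoff` value are invented for illustration.

```python
import difflib
import unicodedata
from typing import List, Optional


def phonetic_key(text: str) -> str:
    """Crude phonetic normalisation; an illustrative simplification only."""
    text = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (accents; this also maps "ñ" to "n").
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    rules = (("ch", "C"), ("qu", "k"), ("ll", "y"), ("ce", "se"),
             ("ci", "si"), ("v", "b"), ("z", "s"), ("c", "k"), ("h", ""))
    for old, new in rules:
        text = text.replace(old, new)
    return text


def best_match(heard: str, entities: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the entity whose phonetic key is most similar, or None."""
    key = phonetic_key(heard)
    best, best_score = None, cutoff
    for entity in entities:
        score = difflib.SequenceMatcher(None, key, phonetic_key(entity)).ratio()
        if score >= best_score:
            best, best_score = entity, score
    return best


teams = ["Valencia", "Villarreal", "Sevilla"]
print(best_match("balensia", teams))
```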


Gonzalez M.,Charles III University of Madrid | Moreno J.,Charles III University of Madrid | Martinez J.L.,DAEDALUS - Data, Decisions and Language | Martinez P.,Charles III University of Madrid
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

Automatic speech recognition technology can be integrated into an information retrieval process to allow searching of multimedia content. However, to ensure adequate retrieval performance, it is necessary to establish the quality of the recognition phase, especially in speaker-independent and domain-independent environments. This paper introduces a methodology for evaluating different speech recognition systems in several scenarios, also considering the creation of new corpora of different types (broadcast news, interviews, etc.), especially in languages other than English that are not widely addressed in the speech community. © 2013 Springer-Verlag Berlin Heidelberg.


Moreno J.,Charles III University of Madrid | Garrote M.,Charles III University of Madrid | Martinez P.,Charles III University of Madrid | Martinez-Fernandez J.L.,DAEDALUS - Data, Decisions and Language
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2011

This paper describes some tests performed on different types of voice/audio input applying three commercial speech recognition tools. Three multimedia retrieval scenarios are considered: a question answering system, an automatic transcription of audio from video files and a real-time captioning system used in the classroom for deaf students. A software tool, RET (Recognition Evaluation Tool), has been developed to test the output of commercial ASR systems. © 2011 Springer-Verlag Berlin Heidelberg.
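
The abstract does not state which measures RET computes. As an assumption, the sketch below shows word error rate (WER), a common way to score ASR output against a reference transcript; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In this example one substitution and one deletion against six reference words give a WER of 2/6.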


Villena-Roman J.,DAEDALUS - Data, Decisions and Language | Villena-Roman J.,Charles III University of Madrid | Gonzalez-Cristobal J.C.,DAEDALUS - Data, Decisions and Language | Gonzalez-Cristobal J.C.,Technical University of Madrid
CEUR Workshop Proceedings | Year: 2014

This paper describes our participation in the PAN 2014 author profiling task. Our idea was to define, develop and evaluate a simple machine learning classifier able to guess the gender and age of a given user from his/her texts, which could become part of the company's solution portfolio. We were interested not in finding the classifier with the highest possible accuracy, but in finding the best balance between performance and throughput using the simplest strategy with the least dependence on external systems. Results show that our software, using Naive Bayes Multinomial with a term vector model representation of the text, ranks quite well among the participants in terms of accuracy.
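
A multinomial Naive Bayes classifier over term counts can be sketched in a few lines. This is not the authors' software: the class name, the whitespace tokenisation, the Laplace smoothing value and the toy training data with labels `group_a` and `group_b` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import List, Optional, Tuple


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words term counts."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_counts: Counter = Counter()   # documents per class
        self.term_counts = defaultdict(Counter)  # term frequencies per class
        self.vocabulary: set = set()

    def fit(self, docs: List[Tuple[str, str]]) -> "MultinomialNB":
        for text, label in docs:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.term_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        tokens = [t for t in text.lower().split() if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_terms = sum(self.term_counts[label].values())
            for token in tokens:
                count = self.term_counts[label][token]
                score += math.log((count + self.alpha)
                                  / (total_terms + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; real author profiling uses far larger corpora.
training = [
    ("football match goal", "group_a"),
    ("goal team win", "group_a"),
    ("market shares bank", "group_b"),
    ("bank stocks market", "group_b"),
]
model = MultinomialNB().fit(training)
print(model.predict("goal match"))
```

Training and prediction are both a single pass over the tokens, which is consistent with the abstract's emphasis on throughput over peak accuracy.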


Villena-Roman J.,DAEDALUS - Data, Decisions and Language | Villena-Roman J.,Charles III University of Madrid | Luna-Cobos A.,DAEDALUS - Data, Decisions and Language | Luna-Cobos A.,Technical University of Madrid | And 2 more authors.
CEUR Workshop Proceedings | Year: 2014

This paper describes a highly configurable, real-time analysis system that automatically records, analyzes and visualizes high-level aggregated information on user interventions in Twitter. The system is designed to give public entities a powerful tool to understand, rapidly and easily, what the trends in citizen behavior are and what citizens think about city services, events, etc. It may also be used as a primary alert system that could improve the efficiency of emergency systems. The citizen is observed here as a proactive city sensor capable of generating huge amounts of very rich, high-level and valuable data through social media platforms; once properly processed, summarized and annotated, these data allow city administrators to better understand citizens' needs. The architecture and component blocks are described, and some key details of the design, implementation and application scenarios are discussed.


Trademark
DAEDALUS - Data, Decisions and Language | Date: 2015-05-01

Computer software for use in topics extraction, text classification, sentiment analysis, text proofreading, lemmatization, parsing, language identification, text clustering and corporate reputation analysis; Computer applications software for personal computers, mobile phones, handheld computers, tablets, namely, software for use in topics extraction, text classification, sentiment analysis, text proofreading, lemmatization, parsing, language identification, text clustering and corporate reputation analysis; Application programs, namely, computer applications software for personal computers, mobile phones, handheld computers, tablets, namely, software for use in topics extraction, text classification, sentiment analysis, text proofreading, lemmatization, parsing, language identification, text clustering and corporate reputation analysis; downloadable electronic dictionaries. Word processing; Computerized word processing; Compilation of information into computer databases; Compilation of statistics; Data processing services; Systemization of information into computer databases; Data search in computer files for others; Compilation of statistical data for business purposes; Collection and systematization of business data. 
Creation, design, programming and development of computer software; Software engineering; Advice and consultancy relating to computer software; Information technology consultancy; Software as a service (SaaS) services featuring software for use in topics extraction, text classification, sentiment analysis, text proofreading, lemmatization, parsing, language identification, text clustering and corporate reputation analysis; Application service provider services, namely, providing software for use in topics extraction, text classification, sentiment analysis, text proofreading, lemmatization, parsing, language identification, text clustering and corporate reputation analysis; Development of computer software application solutions; Creation, designing, programming and development of computer software for text analysis engines.

