Imran M.,Qatar Computing Research Institute |
Chawla S.,Qatar Computing Research Institute |
Proceedings - IEEE International Conference on Data Mining, ICDM | Year: 2017
An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient. © 2016 IEEE.
Rafael-Palou X.,EURECAT |
Zambrana C.,EURECAT |
Vargiu E.,EURECAT |
Communications in Computer and Information Science | Year: 2015
People who need assistance, such as elderly or disabled people, may be affected by a decline in daily functioning that usually involves the reduction and discontinuity of daily routines and a worsening of the overall quality of life. Thus, there is a need for intelligent systems able to monitor indoor and outdoor activities of users to detect emergencies, recognize activities, send notifications, and provide a summary of all the relevant information. To this end, several sensor-based telemonitoring and home support systems have been presented in the literature. Unfortunately, the performance of those systems depends, among other characteristics, on the reliability of the adopted sensors. Although binary sensors are widely used in the literature and in commercial solutions to identify user’s activities, they are prone to noise and errors. In this chapter, we present a hierarchical approach, based on machine learning techniques, aimed at reducing errors from the sensors. The proposed approach aims to improve classification accuracy in detecting whether a user is at home, away, alone, or with visitors. It has been integrated into a sensor-based telemonitoring and home support system. After being evaluated with a control user, the overall system has been installed in 8 elderly people’s homes in Barcelona; results are presented in this chapter. © Springer International Publishing Switzerland 2015.
Miralles F.,Eurecat |
Vargiu E.,Eurecat |
Casals E.,University Pompeu Fabra |
Cordero J.A.,Polytechnic University of Catalonia |
International Journal of E-Health and Medical Communications | Year: 2015
Telemonitoring makes it possible to remotely assess the health status and quality of life of individuals. By acquiring heterogeneous data coming from sensors (physiological, biometric, environmental; non-invasive, adaptive and transparent to the user) and data coming from other sources to become aware of user context; by inferring user behaviour and detecting anomalies from this data; and by providing elaborated and smart knowledge to clinicians, therapists, carers, families, and the patients themselves, we will be able to foster preventive, predictive and personalized care actions, decisions and support. In this paper, relying on a novel sensor-based telemonitoring and home support system, the authors focus on monitoring mobility activities, the ultimate goal being to automatically assess people's quality of life. In particular, the authors aim to answer one item of a quality-of-life questionnaire, namely "Mobility". Although the authors are interested in assisting disabled people, they performed preliminary experiments with a healthy user as a proof of concept. Results show that the approach is promising. Thus, the authors are now in the process of installing the final system in a number of disabled people's homes under the umbrella of the BackHome project. Copyright © 2015, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Garcia R.,Eurecat |
Gomez D.,University of Santiago de Chile |
Parra D.,University of Santiago de Chile |
Trattner C.,Norwegian University of Science and Technology |
And 2 more authors.
HT 2015 - Proceedings of the 26th ACM Conference on Hypertext and Social Media | Year: 2015
Using Twitter during academic conferences is a way of engaging and connecting an audience that is inherently multicultural by the nature of scientific collaboration. English is expected to be the lingua franca bridging the communication and integration between native speakers of different mother tongues. However, little research has been done to support this assumption. In this paper we examined how integrated language communities are by analyzing the scholars' tweets used in 26 Computer Science conferences over a time span of five years. We found that although English is the most popular language used to tweet during conferences, a significant proportion of people also tweet in other languages. In addition, people who tweet solely in English interact mostly within the same group (English monolinguals), while people who speak other languages interact more with different lingua groups. Finally, we also found higher interaction between people tweeting in different languages. These results suggest a relation between the number of languages a user speaks and their interaction dynamics in online communities. © 2015 ACM.
Mehmood Y.,University Pompeu Fabra |
Bonchi F.,ISI Foundation |
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2016
What is the set of nodes of a social network that, under a probabilistic contagion model, would get infected if a given node s gets infected? We call this set the sphere of influence of s. Due to the stochastic nature of the contagion model we need to define a notion of "expected" or "typical" cascade: this is a set of nodes which is the closest to all the possible cascades starting from s. We thus formalize the Typical Cascade problem, which requires, for a given source node s, finding the set of nodes minimizing the expected Jaccard distance to all the possible cascades from s. The expected cost of a typical cascade also provides us with a measure of the stability of cascade propagation, i.e., how much random cascades from a source node s deviate from the "typical" cascade. In this sense, source nodes with lower expected costs are more reliable. We show that, while computing the quality of a candidate solution is #P-hard, a method based on (1) sampling random cascades and (2) computing their Jaccard Median can obtain a multiplicative approximation with just O(1) samples. We then devise an index that allows us to efficiently compute the sphere of influence for any node in the network. Finally, we propose to approach the influence maximization problem as an instance of set cover on the spheres of influence. Through exhaustive evaluation using real-world networks and different methods of assigning the influence probability to each edge, we show that our approach outperforms in quality the theoretically optimal greedy algorithm. © 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
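The sample-then-median idea in the abstract above can be illustrated with a minimal sketch. It assumes the independent cascade model as the contagion model and, as a simplification, only considers the sampled cascades themselves as candidate solutions; this is not the authors' Jaccard Median algorithm or index. The function names and the graph representation (`node -> [(neighbor, probability), ...]`) are hypothetical.

```python
import random


def jaccard_distance(a, b):
    """Jaccard distance between two sets; taken as 0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sample_cascade(graph, source, rng):
    """Draw one cascade from `source` under the independent cascade model
    (an assumption; the abstract only says "probabilistic contagion model").
    `graph` maps each node to a list of (neighbor, probability) pairs."""
    infected = {source}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for neighbor, p in graph.get(node, []):
            # Each edge gets a single activation attempt.
            if neighbor not in infected and rng.random() < p:
                infected.add(neighbor)
                frontier.append(neighbor)
    return frozenset(infected)


def typical_cascade(graph, source, n_samples=200, seed=0):
    """Return (cascade, empirical cost): the sampled cascade with the lowest
    mean Jaccard distance to all samples. Restricting candidates to the
    samples is a simplifying heuristic, not the paper's method."""
    rng = random.Random(seed)
    samples = [sample_cascade(graph, source, rng) for _ in range(n_samples)]

    def cost(candidate):
        return sum(jaccard_distance(candidate, s) for s in samples) / len(samples)

    # Deduplicate while keeping first-seen order so ties break deterministically.
    candidates = list(dict.fromkeys(samples))
    best = min(candidates, key=cost)
    return set(best), cost(best)
```

A lower returned cost corresponds to what the abstract calls a more reliable source node: its random cascades deviate less from the typical one.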
Arapakis I.,Eurecat |
Arapakis I.,Yahoo! |
Leiva L.A.,Sciling Inc. |
Leiva L.A.,Polytechnic University of Valencia
SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval | Year: 2016
Predicting user engagement with direct displays (DD) is of paramount importance to commercial search engines, as well as to search performance evaluation. However, understanding within-content engagement on a web page is not a trivial task, for two main reasons: (1) engagement is subjective and different users may exhibit different behavioural patterns; (2) existing proxies of user engagement (e.g., clicks, dwell time) suffer from certain caveats, such as the well-known position bias, and are not as effective in discriminating between useful and non-useful components. In this paper, we conduct a crowdsourcing study and examine how users engage with a prominent web search engine component such as the knowledge module (KM) display. To this end, we collect and analyse more than 115k mouse cursor positions from 300 users, who perform a series of search tasks. Furthermore, we engineer a large number of meta-features which we use to predict different proxies of user engagement, including attention and usefulness. In our experiments, we demonstrate that our approach is able to predict different levels of user engagement more accurately and outperform existing baselines. © 2016 ACM.
Hajian S.,Eurecat |
Bonchi F.,ISI Foundation |
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | Year: 2016
Algorithms and decision making based on Big Data have become pervasive in all aspects of our daily lives (offline and online), as they have become essential tools in personal finance, health care, hiring, housing, education, and policies. It is therefore of societal and ethical importance to ask whether these algorithms can be discriminative on grounds such as gender, ethnicity, or health status. It turns out that the answer is positive: for instance, recent studies in the context of online advertising show that ads for high-income jobs are presented to men much more often than to women; and ads for arrest records are significantly more likely to show up on searches for distinctively black names. This algorithmic bias exists even when there is no discrimination intention in the developer of the algorithm. Sometimes it may be inherent to the data sources used (software making decisions based on data can reflect, or even amplify, the results of historical discrimination), but even when the sensitive attributes have been suppressed from the input, a well-trained machine learning algorithm may still discriminate on the basis of such sensitive attributes because of correlations existing in the data. These considerations call for the development of data mining systems which are discrimination-conscious by design. This is a novel and challenging research area for the data mining community. The aim of this tutorial is to survey algorithmic bias, presenting its most common variants, with an emphasis on the algorithmic techniques and key ideas developed to derive efficient solutions. The tutorial covers two main complementary approaches: algorithms for discrimination discovery and discrimination prevention by means of fairness-aware data mining. We conclude by summarizing promising paths for future research. © 2016 Copyright held by the owner/author(s).
Imran M.,Qatar Computing Research Institute |
Meier P.,The World Bank |
Castillo C.,Eurecat |
Lesa A.,UNICEF |
DH 2016 - Proceedings of the 2016 Digital Health Conference | Year: 2016
In response to growing HIV/AIDS and other health-related issues, UNICEF through their U-Report platform receives thousands of messages (SMS) every day to provide prevention strategies, health case advice, and counseling support to vulnerable populations. Due to a rapid increase in U-Report usage (up to 300% in the last 3 years), plus approximately 1,000 new registrations each day, the volume of messages has continued to increase, which made it impossible for the team at UNICEF to process them in a timely manner. In this paper, we present a platform designed to perform automatic classification of short messages (SMS) in real time to help UNICEF categorize and prioritize health-related messages as they arrive. We employ a hybrid approach, which combines human and machine intelligence and seeks to resolve the information overload issue by introducing processing of large-scale data at high speed while maintaining a high classification accuracy. The system has recently been tested in conjunction with UNICEF in Zambia to classify short messages received via the U-Report platform on various health-related issues. The system is designed to enable UNICEF to make sense of a large volume of short messages in a timely manner. In terms of evaluation, we report design choices, challenges, and performance of the system observed during the deployment to validate its effectiveness.
Graells-Garrido E.,Telefonica |
ACM International Conference Proceeding Series | Year: 2016
Travel surveys provide rich information about urban mobility and commuting patterns. But, at the same time, they have drawbacks: they are static pictures of a dynamic phenomenon, are expensive to make, and take prolonged periods of time to finish. Nowadays, the availability of mobile usage data (Call Detail Records) makes the study of urban mobility possible at spatiotemporal granularity levels that surveys do not reach. This has been done in the past with good results - mobile data makes it possible to find and understand aggregated mobility patterns. In this paper, we propose to analyze mobile data at the individual level by estimating daily journeys, and use those journeys to build Origin-Destination matrices to understand urban flow. We evaluate this approach with large anonymized CDRs from Santiago, Chile, and find that our method has a high correlation (ρ = 0.89) with the current travel survey, and that it captures external anomalies in daily travel patterns, making our method suitable for inclusion into urban computing applications. © 2016 Copyright held by the owner/author(s).
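The evaluation step described above (aggregate estimated journeys into an origin-destination matrix, then correlate it with survey counts) can be sketched as follows. This is only an illustration: it assumes ρ denotes a Spearman rank correlation, which the abstract does not state, and the zone names, data layout, and function names are hypothetical.

```python
from collections import Counter


def od_matrix(journeys):
    """Count trips per (origin_zone, destination_zone) pair."""
    return Counter(journeys)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def compare(estimated, survey):
    """Correlate two OD matrices over the union of their zone pairs,
    treating missing pairs as zero trips."""
    pairs = sorted(set(estimated) | set(survey))
    return spearman([estimated.get(p, 0) for p in pairs],
                    [survey.get(p, 0) for p in pairs])
```

For example, `compare(od_matrix(journeys), survey_counts)` takes a list of `(origin, destination)` tuples and a dictionary of survey counts keyed by the same tuples.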
PubMed | Eurecat, Labs and Institute Universitari Of Neurorehabilitacio Adscrit Uab
Type: Journal Article | Journal: International journal of environmental research and public health | Year: 2015
The objective of this research is to provide a standardized platform to monitor and predict indicators of people with traumatic brain injury using the International Classification of Functioning, Disability and Health, and analyze its potential benefits for people with disabilities, health centers and administrations. We developed a platform that allows automatic standardization and automatic graphical representations of indicators of the status of individuals and populations. We used data from 730 people with acquired brain injury performing periodic comprehensive evaluations in the years 2006-2013. Health professionals noted that the use of color-coded graphical representation is useful for quickly diagnosing failures, limitations or restrictions in rehabilitation. The prognosis system achieves 41% accuracy and sensitivity in the prediction of emotional functions, and 48% accuracy and sensitivity in the prediction of executive functions. This monitoring and prognosis system has the potential to: (1) save costs and time, (2) provide more information to make decisions, (3) promote interoperability, (4) facilitate joint decision-making, and (5) improve policies of socioeconomic evaluation of the burden of disease. Professionals found the monitoring system useful because it generates a more comprehensive understanding of health oriented to the profile of the patients, instead of their diseases and injuries.
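For readers unfamiliar with the two metrics reported above, the sketch below shows how accuracy and per-class sensitivity (recall) are commonly computed from true and predicted labels. The abstract does not say how the authors computed or averaged them, so the label names and function names here are purely illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def sensitivity(y_true, y_pred, label):
    """Recall for one class: correctly predicted members of the class
    divided by all actual members of the class."""
    predicted_for_class = [p for t, p in zip(y_true, y_pred) if t == label]
    if not predicted_for_class:
        raise ValueError(f"no true examples of class {label!r}")
    return sum(p == label for p in predicted_for_class) / len(predicted_for_class)


# Hypothetical labels, for illustration only.
y_true = ["mild", "mild", "severe", "severe", "severe"]
y_pred = ["mild", "severe", "severe", "severe", "mild"]
overall = accuracy(y_true, y_pred)
recall_mild = sensitivity(y_true, y_pred, "mild")
recall_severe = sensitivity(y_true, y_pred, "severe")
```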