Park K.,KAIST |
Weber I.,QCRI |
Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW | Year: 2016
As the world becomes more digitized and interconnected, information that was once considered private, such as one's health status, is now being shared publicly. To understand this new phenomenon better, it is crucial to study what types of health information are being shared on social media, why, and by whom. In this paper, we study the traits of users who share their personal health and fitness information on social media by analyzing fitness status updates that MyFitnessPal users have shared via Twitter. We investigate how features such as user profile, fitness activity, and fitness network in social media can affect the long-term engagement of fitness app users. We also discuss the implications of our findings for improving the retention of these users and for encouraging them to share more status updates. © 2016 ACM.
Banerjee S.,Pennsylvania State University |
Mitra P.,QCRI |
Sugiyama K.,National University of Singapore
IJCAI International Joint Conference on Artificial Intelligence | Year: 2015
Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise, informative summaries. In this work, we aim to develop an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the sentences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters, employing a novel integer linear programming (ILP) model with the objective of maximizing the information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, the information score, and the linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.
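The word-graph step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the edge weighting (inverse bigram frequency), the lowercase whitespace tokenization, the restriction to paths without repeated words, and the function names `build_word_graph` and `k_shortest_paths` are all assumptions made for illustration. The ILP selection step is not shown.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"

def build_word_graph(sentences):
    """Merge the sentences of one cluster into a directed word graph.

    Assumption: an edge's weight is the inverse of how often the
    word pair occurs, so frequently shared transitions are cheap.
    """
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), c in counts.items():
        graph[a][b] = 1.0 / c
    return graph

def k_shortest_paths(graph, k):
    """Enumerate up to k cheapest START-to-END paths without repeated words.

    Best-first expansion of partial paths: with non-negative weights,
    complete paths are popped in order of increasing cost.
    """
    heap = [(0.0, [START])]
    results = []
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            results.append((cost, " ".join(path[1:-1])))
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no word reused)
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return results

# Hypothetical cluster of similar sentences.
cluster = [
    "police arrested two suspects on monday",
    "police arrested two suspects in london",
    "officers arrested two suspects on monday",
]
for cost, text in k_shortest_paths(build_word_graph(cluster), 3):
    print(round(cost, 3), text)
```

Each returned path is a candidate fused sentence; in the approach described above, such candidates would then be scored and chosen by the ILP model rather than taken directly.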
Kwak H.,QCRI |
Blackburn J.,Telefonica |
Han S.,University of Washington
Conference on Human Factors in Computing Systems - Proceedings | Year: 2015
In this work, we explore cyberbullying and other toxic behavior in team competition online games. Using a dataset of over 10 million player reports on 1.46 million toxic players, along with the corresponding crowdsourced decisions, we test several hypotheses drawn from theories explaining toxic behavior. Besides providing a large-scale, empirically based understanding of toxic behavior, our work can be used as a basis for building systems to detect, prevent, and counteract toxic behavior. © Copyright 2015 ACM.
Fan W.,University of Edinburgh |
Fan W.,Beihang University |
Geerts F.,University of Antwerp |
Tang N.,QCRI |
Yu W.,University of Edinburgh
Proceedings - International Conference on Data Engineering | Year: 2013
This paper introduces a new approach to conflict resolution: given a set of tuples pertaining to the same entity, the goal is to identify a single tuple in which each attribute has the most current and consistent value in the set. This problem is important in data integration, data cleaning, and query answering. It is challenging, however, since, among other things, reliable timestamps are often absent in practice. We propose a model for conflict resolution that specifies data currency in terms of partial currency orders and currency constraints, and enforces data consistency with constant conditional functional dependencies. We show that identifying data currency orders helps us repair inconsistent data, and vice versa. We investigate a number of fundamental problems associated with conflict resolution and establish their complexity. In addition, we introduce a framework and develop algorithms for conflict resolution that integrate data currency and consistency inferences into a single process and interact with users. We experimentally verify the accuracy and efficiency of our methods using real-life and synthetic data. © 2013 IEEE.
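To make the idea of partial currency orders concrete, here is a simplified sketch, not the paper's algorithm. It assumes that each attribute's currency order is given as a set of hypothetical `(older, newer)` value pairs, and it omits currency constraints across attributes and conditional functional dependencies entirely. When the order does not single out one most current value, the attribute is left as `None`, standing in for the point at which a user would be consulted.

```python
def transitive_closure(pairs):
    """Extend (older, newer) pairs with everything they imply transitively."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def resolve_entity(tuples, currency_orders):
    """Pick, per attribute, the value that no other present value supersedes.

    tuples: list of dicts describing the same entity (same keys).
    currency_orders: attribute -> set of (older, newer) value pairs.
    Returns a dict; an attribute maps to None when it stays unresolved.
    """
    resolved = {}
    for attr in tuples[0]:
        candidates = {t[attr] for t in tuples}
        order = transitive_closure(currency_orders.get(attr, set()))
        superseded = {old for old, new in order
                      if old in candidates and new in candidates}
        latest = candidates - superseded
        resolved[attr] = latest.pop() if len(latest) == 1 else None
    return resolved

# Hypothetical records for one person, with no timestamps available.
records = [
    {"status": "single", "city": "Edinburgh"},
    {"status": "divorced", "city": "Beijing"},
]
orders = {"status": {("single", "married"), ("married", "divorced")}}
print(resolve_entity(records, orders))
```

In this example the status is resolved through the implied order from "single" to "divorced", while the city remains unresolved because no currency information relates the two values.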
Abbar S.,QCRI |
Amer-Yahia S.,French National Center for Scientific Research |
Indyk P.,Massachusetts Institute of Technology |
Mahabadi S.,Massachusetts Institute of Technology
WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web | Year: 2013
News articles typically drive a lot of traffic in the form of comments posted by users on a news site. Such user-generated content tends to carry additional information such as entities and sentiment. In general, when articles are recommended to users, only popularity (e.g., most shared and most commented), recency, and sometimes (manual) editors' picks (based on daily hot topics) are considered. We formalize a novel recommendation problem where the goal is to find the closest, most diverse articles to the one the user is currently browsing. Our diversity measure incorporates entities and sentiment extracted from comments. Given the real-time nature of our recommendations, we explore the applicability of nearest neighbor algorithms to solve the problem. Our user study on real opinion articles from aljazeera.net and reuters.com validates the use of entities and sentiment extracted from articles and their comments to achieve news diversity when compared to content-based diversity. Finally, our performance experiments show the real-time feasibility of our solution. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
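The "closest, most diverse" objective can be sketched with a brute-force baseline. This is an illustration under stated assumptions, not the method evaluated in the paper: closeness is taken to be Jaccard distance over hypothetical `terms` sets, diversity is Jaccard distance over hypothetical `entities` sets extracted from comments, sentiment is ignored, and a greedy max-min heuristic replaces any real-time nearest neighbor index.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def diverse_neighbors(query, articles, radius, k):
    """Pick up to k articles near the query that differ from one another.

    Step 1: keep articles whose content lies within `radius` of the query.
    Step 2: start from the closest one, then greedily add the article whose
            comment entities are farthest from everything chosen so far.
    """
    def closeness(article):
        return jaccard_distance(query["terms"], article["terms"])

    near = sorted((a for a in articles if closeness(a) <= radius), key=closeness)
    if not near:
        return []
    chosen = [near[0]]
    while len(chosen) < min(k, len(near)):
        remaining = [a for a in near if a not in chosen]
        best = max(
            remaining,
            key=lambda a: min(
                jaccard_distance(a["entities"], c["entities"]) for c in chosen
            ),
        )
        chosen.append(best)
    return chosen

# Hypothetical articles; `entities` would come from user comments.
articles = [
    {"id": "a1", "terms": {"election", "vote", "senate"}, "entities": {"obama", "romney"}},
    {"id": "a2", "terms": {"election", "vote", "poll"}, "entities": {"obama", "romney"}},
    {"id": "a3", "terms": {"election", "senate", "budget"}, "entities": {"merkel", "eu"}},
    {"id": "a4", "terms": {"football", "goal"}, "entities": {"messi"}},
]
query = {"terms": {"election", "vote", "senate"}}
print([a["id"] for a in diverse_neighbors(query, articles, radius=0.6, k=2)])
```

The linear scan over all articles is what a nearest neighbor index would replace in a real-time setting; the greedy step only illustrates how diversity among the results can be traded against closeness to the current article.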