Daldal R.,Sabanci University |
Gamzu I.,Yahoo! Research |
Segev D.,Haifa University |
Unluyurt T.,Sabanci University
Naval Research Logistics | Year: 2016
We introduce and study a generalization of the classic sequential testing problem, asking to identify the correct state of a given series system that consists of independent stochastic components. In this setting, costly tests are required to examine the state of individual components, which are sequentially tested until the correct system state can be uniquely identified. The goal is to propose a policy that minimizes the expected testing cost, given a-priori probabilistic information on the stochastic nature of each individual component. Unlike the classic setting, where variables are tested one after the other, we allow multiple tests to be conducted simultaneously, at the expense of incurring an additional set-up cost. The main contribution of this article consists in showing that the batch testing problem can be approximated in polynomial time within factor 6.829 + ϵ, for any fixed ϵ ∈ (0,1). In addition, we explain how, in spite of its highly nonlinear objective function, the batch testing problem can be formulated as an approximate integer program of polynomial size, while blowing up its expected cost by a factor of at most 1 + ϵ. Finally, we conduct extensive computational experiments, to demonstrate the practical effectiveness of these algorithms as well as to evaluate their limitations. © 2016 Wiley Periodicals, Inc. Naval Research Logistics 63: 275–286, 2016. © 2016 Wiley Periodicals, Inc.
Giannopoulos G.,IMIS Institute |
Koniaris M.,National Technical University of Athens |
Weber I.,Qatar Computing Research Institute |
Jaimes A.,Yahoo! Research |
Sellis T.,RMIT University
Journal of Intelligent Information Systems | Year: 2014
In this paper, we introduce an approach for diversifying user comments on news articles. We claim that, although content diversity suffices for the keyword search setting, as proven by existing work on search result diversification, it is not enough when it comes to diversifying comments of news articles. Thus, in our proposed framework, we define comment-specific diversification criteria in order to extract the respective diversification dimensions in the form of feature vectors. These criteria involve content similarity, sentiment expressed within comments, named entities, quality of comments and combinations of them. Then, we apply diversification on comments, utilizing the extracted features vectors. The outcome of this process is a subset of the initial set that contains heterogeneous comments, representing different aspects of the news article, different sentiments expressed, different writing quality, etc. We perform an experimental analysis showing that the diversity criteria we introduce result in distinctively diverse subsets of comments, as opposed to the baseline of diversifying comments only w.r.t. to their content. We also present a prototype system that implements our diversification framework on news articles comments. © 2014, Springer Science+Business Media New York.
Grabowicz P.A.,Saarland University |
Grabowicz P.A.,University of the Balearic Islands |
Aiello L.M.,Yahoo! Research |
Menczer F.,Indiana University Bloomington
EPJ Data Science | Year: 2014
Detecting and visualizing what are the most relevant changes in an evolving network is an open challenge in several domains. We present a fast algorithm that filters subsets of the strongest nodes and edges representing an evolving weighted graph and visualize it by either creating a movie, or by streaming it to an interactive network visualization tool. The algorithm is an approximation of exponential sliding time-window that scales linearly with the number of interactions. We compare the algorithm against rectangular and exponential sliding time-window methods. Our network filtering algorithm: (i) captures persistent trends in the structure of dynamic weighted networks, (ii) smoothens transitions between the snapshots of dynamic network, and (iii) uses limited memory and processor time. The algorithm is publicly available as open-source software. © 2014 Grabowicz et al.
Gullo F.,Yahoo! Research |
Domeniconi C.,George Mason University |
Tagarelli A.,University of Calabria
Machine Learning | Year: 2013
The Projective Clustering Ensemble (PCE) problem is a recent clustering advance aimed at combining the two powerful tools of clustering ensembles and projective clustering. PCE has been formalized as either a two-objective or a single-objective optimization problem. Two-objective PCE has been recognized as more accurate than its single-objective counterpart, although it is unable to jointly handle the object-based and feature-based cluster representations.In this paper, we push forward the current PCE research, aiming to overcome the limitations of all existing PCE formulations. We propose a novel single-objective PCE formulation so that (i) the object-based and feature-based cluster representations are jointly considered, and (ii) the resulting optimization strategy follows a metacluster-based methodology borrowed from traditional clustering ensembles. As a result, the proposed formulation features best suitability to the PCE problem, thus guaranteeing improved effectiveness. Experiments on benchmark datasets have shown how the proposed approach achieves better average accuracy than all existing PCE methods, as well as efficiency superior to the most accurate existing metacluster-based PCE method on larger datasets. © 2013, The Author(s).
Gamzu I.,Yahoo! Research |
Medina M.,Tel Aviv University
Algorithmica | Year: 2016
An instance of the maximum mixed graph orientation problem consists of a mixed graph and a collection of source-target vertex pairs. The objective is to orient the undirected edges of the graph so as to maximize the number of pairs that admit a directed source-target path. This problem has recently arisen in the study of biological networks, and it also has applications in communication networks. In this paper, we identify an interesting local-to-global orientation property. This property enables us to modify the best known algorithms for maximum mixed graph orientation and some of its special structured instances, due to Elberfeld et al. (Theor. Comput. Sci. 483:96–103, 2013), and obtain improved approximation ratios. We further proceed by developing an algorithm that achieves an even better approximation guarantee for the general setting of the problem. Finally, we study several well-motivated variants of this orientation problem. © 2014, Springer Science+Business Media New York.
Singh G.,University College London |
Mantrach A.,Yahoo! Research |
Silvestri F.,Yahoo! Research
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2016
The majority of online users do not engage highly with services that are offered via Web. This is a well-known fact and it is also one of the main issues that personalization algorithms try to overcome. A popular way of personalizing an online service is to record users’ actions into user profiles. Weakly-engaged users lead to sparsely populated user profiles, or weak profiles as we name them. Such weak profiles constitute a source of potential increase in user engagement and as a consequence, windfall profits for Internet companies. In this paper, we define the novel problem of enhancing weak profiles in positive space and propose an effective solution based on learning collective embedding space in order to capture a low-dimensional manifold designed to specifically reconstruct sparse user profiles. Our method consistently outperforms baselines consisting of kNN and collective factorization without constraints on user profile. Experiments on two datasets, news and video, from a popular online portal show improvements of up to more than 100% in terms of MAP for extremely weak profiles, and up to around 10% for moderately weak profiles. In order to evaluate the impact of our method on learned latent space embeddings for users and items, we generate recommendations exploiting our user profile constrained approach. The generated recommendations outperform state-of-the-art techniques based on low-rank collective matrix factorization in particular for users that clicked at most four times (78–82% of the total) on the items published by the online portal we consider. © Springer International Publishing Switzerland 2016.
Sadeghi S.,RMIT University |
Blanco R.,Yahoo! Research |
Mika P.,Yahoo! Research |
Sanderson M.,RMIT University |
And 2 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2015
In this study, we address the problem of identifying if users are attempting to re-find information and estimating the level of difficulty of the refinding task. We propose to consider the task information (e.g. multiple queries and click information) rather than only queries. Our resultant prediction models are shown to be significantly more accurate (by 2%) than the current state of the art. While past research assumes that previous search history of the user is available to the prediction model, we examine if re-finding detection is possible without access to this information. Our evaluation indicates that such detection is possible, but more challenging. We further describe the first predictive model in detecting re-finding difficulty, showing it to be significantly better than existing approaches for detecting general search difficulty. © Springer International Publishing Switzerland 2015.
PubMed | Yahoo! Research
Type: Journal Article | Journal: Behavior research methods | Year: 2012
Amazons Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this article is to demonstrate how to use this Web site for conducting behavioral research and to lower the barrier to entry for researchers who could benefit from this platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, such as easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments. While other methods of conducting behavioral research may be comparable to or even better than Mechanical Turk on one or more of the axes outlined above, we will show that when taken as a whole Mechanical Turk can be a useful tool for many researchers. We will discuss how the behavior of workers compares with that of experts and laboratory subjects. Then we will illustrate the mechanics of putting a task on Mechanical Turk, including recruiting subjects, executing the task, and reviewing the work that was submitted. We also provide solutions to common problems that a researcher might face when executing their research on this platform, including techniques for conducting synchronous experiments, methods for ensuring high-quality work, how to keep data private, and how to maintain code security.
PubMed | Yahoo! Research
Type: Journal Article | Journal: PloS one | Year: 2011
A longstanding idea in the literature on human cooperation is that cooperation should be reinforced when conditional cooperators are more likely to interact. In the context of social networks, this idea implies that cooperation should fare better in highly clustered networks such as cliques than in networks with low clustering such as random networks. To test this hypothesis, we conducted a series of web-based experiments, in which 24 individuals played a local public goods game arranged on one of five network topologies that varied between disconnected cliques and a random regular graph. In contrast with previous theoretical work, we found that network topology had no significant effect on average contributions. This result implies either that individuals are not conditional cooperators, or else that cooperation does not benefit from positive reinforcement between connected neighbors. We then tested both of these possibilities in two subsequent series of experiments in which artificial seed players were introduced, making either full or zero contributions. First, we found that although players did generally behave like conditional cooperators, they were as likely to decrease their contributions in response to low contributing neighbors as they were to increase their contributions in response to high contributing neighbors. Second, we found that positive effects of cooperation were contagious only to direct neighbors in the network. In total we report on 113 human subjects experiments, highlighting the speed, flexibility, and cost-effectiveness of web-based experiments over those conducted in physical labs.
PubMed | Yahoo! Research
Type: Journal Article | Journal: PloS one | Year: 2012
We live in a computerized and networked society where many of our actions leave a digital trace and affect other peoples actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.