Sunnyvale, CA, United States

Miyazawa F.K.,University of Campinas | Pedrosa L.L.C.,University of Campinas | Schouery R.C.S.,University of Campinas | Sviridenko M.,Yahoo! Labs | Wakabayashi Y.,University of Sao Paulo
Algorithmica | Year: 2015

We consider the problem of packing a set of circles into a minimum number of unit square bins. To obtain rational solutions, we use augmented bins of height 1+γ, for some arbitrarily small number γ > 0. For this problem, we obtain an asymptotic approximation scheme (APTAS) that is polynomial in ln(1/γ), and thus γ may be given as part of the problem input. For the special case that γ is constant, we give a (one-dimensional) resource augmentation scheme, that is, we obtain a packing into bins of unit width and height 1+γ using no more than the number of bins in an optimal packing without resource augmentation. Additionally, we obtain an APTAS for the circle strip packing problem, whose goal is to pack a set of circles into a strip of unit width and minimum height. Our algorithms are the first approximation schemes for circle packing problems. They are based on a novel idea of iteratively separating small and large items, and may be extended to a wide range of packing problems that satisfy certain conditions. These extensions comprise problems with different kinds of items, such as regular polygons, or with bins of different shapes, such as circles and spheres. As an example, we obtain APTASs for the problems of packing d-dimensional spheres into hypercubes under the L_p-norm. © 2015 Springer Science+Business Media New York
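To make the strip packing setting concrete, the following is a minimal greedy "shelf" heuristic for packing circles into a unit-width strip while keeping the height small. This is a sketch of the problem only, not the paper's APTAS; the shelf strategy and function name are illustrative assumptions.

```python
def shelf_strip_pack(radii):
    """Greedily pack circles (radii <= 0.5) into a unit-width strip.

    Circles are sorted by decreasing radius; each shelf's height is the
    diameter of its first (largest) circle, and circles sit side by side
    at diameter spacing, so they never overlap. Returns total height used.
    """
    shelves = []  # each shelf: [shelf_height, x_cursor]
    for r in sorted(radii, reverse=True):
        d = 2 * r
        placed = False
        for shelf in shelves:
            if shelf[1] + d <= 1.0:  # circle fits beside the previous ones
                shelf[1] += d
                placed = True
                break
        if not placed:
            shelves.append([d, d])   # open a new shelf of height d
    return sum(s[0] for s in shelves)

# Four circles of diameter 0.5: two per shelf, two shelves, height 1.0.
print(shelf_strip_pack([0.25, 0.25, 0.25, 0.25]))  # 1.0
```

Shelf heuristics like this can waste space inside each shelf; the paper's scheme instead separates small and large circles iteratively to get within any constant factor of the optimum asymptotically.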

Anandkumar A.,University of California at Irvine | Foster D.P.,Yahoo! Labs | Hsu D.,Columbia University | Kakade S.M.,Microsoft | Liu Y.-K.,U.S. National Institute of Standards and Technology
Algorithmica | Year: 2015

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem: estimating the topic-word distributions when only words are observed and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third-order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments. © 2014, Springer Science+Business Media New York.
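The trigram statistics the abstract refers to can be pictured as an empirical third-order moment tensor over the vocabulary: each three-word document contributes one rank-one count. The toy sketch below only builds that tensor; the paper's full procedure (whitening and orthogonal tensor decomposition) is omitted.

```python
import numpy as np

def trigram_moment(docs, V):
    """Empirical third-order moment tensor from 3-word documents.

    docs: list of (w1, w2, w3) word-index triples over a vocabulary of
    size V. Each triple adds one count at position [w1, w2, w3];
    dividing by the number of documents gives the empirical moment.
    """
    M3 = np.zeros((V, V, V))
    for w1, w2, w3 in docs:
        M3[w1, w2, w3] += 1.0
    return M3 / len(docs)

docs = [(0, 1, 2), (0, 1, 2), (1, 1, 0), (2, 0, 1)]
M3 = trigram_moment(docs, V=3)
print(M3[0, 1, 2])  # 0.5: half of the documents are the triple (0, 1, 2)
```

Under the LDA model, this tensor has (after whitening) a low-rank orthogonal structure whose components correspond to the topic-word distributions, which is what makes the moment-based recovery possible.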

Zhang C.,CAS Institute of Software | Wang H.,CAS Institute of Software | Cao L.,Yahoo! Labs | Wang W.,CAS Institute of Software | Xu F.,CAS Institute of Software
Knowledge-Based Systems | Year: 2016

Topic detection, the task of discovering topics in online media, has attracted much attention. A topic is typically characterized by a set of informative keywords or terms. Traditional approaches are usually based on topic models, such as Latent Dirichlet Allocation (LDA), which cluster terms into a topic by mining semantic relations between terms. However, co-occurrence relations across documents are commonly neglected, which leads to the detection of incomplete information. Furthermore, the inability to discover latent co-occurrence relations via the context or other bridge terms prevents important but rare topics from being detected. To tackle this issue, we propose a hybrid relations analysis approach that integrates semantic relations and co-occurrence relations for topic detection. Specifically, the approach fuses multiple relations into a term graph and detects topics from the graph using a graph analytical method. It not only detects topics more effectively by combining mutually complementary relations, but also mines important rare topics by leveraging latent co-occurrence relations. Extensive experiments demonstrate the advantage of our approach over several benchmarks. © 2015 Elsevier B.V. All rights reserved.
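The fusion idea can be sketched as follows: combine a co-occurrence weight with a (here hypothetical) semantic-similarity weight into one term graph, then read off candidate topics from the graph. The mixing weight `alpha`, the edge threshold, and the use of connected components are illustrative assumptions; the paper uses a more sophisticated graph analytical method.

```python
from itertools import combinations
from collections import defaultdict

def build_term_graph(docs, semantic_sim, alpha=0.5):
    """Fuse co-occurrence counts and semantic similarity into a term graph.

    docs: list of token lists; semantic_sim: dict mapping a sorted term
    pair to a similarity score. Edge weight is a convex combination of
    the two relations; only sufficiently strong edges are kept.
    """
    cooc = defaultdict(float)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            cooc[(u, v)] += 1.0
    graph = defaultdict(set)
    for (u, v), c in cooc.items():
        w = alpha * c + (1 - alpha) * semantic_sim.get((u, v), 0.0)
        if w > 0.5:  # keep only sufficiently strong hybrid edges
            graph[u].add(v)
            graph[v].add(u)
    return graph

def topics(graph):
    """Candidate topics as connected components of the fused term graph."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

docs = [["nlp", "topic"], ["nlp", "topic"], ["graph", "node"]]
print(topics(build_term_graph(docs, semantic_sim={})))  # the strong pair forms one topic
```

Because the semantic and co-occurrence scores enter the same edge weight, a rare term pair with little co-occurrence evidence can still be connected through a strong semantic score, which is the mechanism the abstract credits for recovering important but rare topics.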

Agarwal N.,University of Arkansas at Little Rock | Liu H.,Arizona State University | Tang L.,Yahoo! Labs | Yu P.S.,University of Illinois at Chicago
Social Network Analysis and Mining | Year: 2012

Blogging has become a popular and convenient way to communicate, publish information, share preferences, voice opinions, provide suggestions, report news, and form virtual communities in the blogosphere. The blogosphere obeys a power-law distribution: very few blogs are extremely influential, while a huge number remain largely unknown. Regardless of whether a (multi-author) blog is itself influential, it can host influential bloggers. The sheer number of blogs makes it extremely challenging to study each one, so one way to analyze them is to find influential bloggers and treat them as community representatives. Influential bloggers can impact fellow bloggers in various ways. In this paper, we study the problem of identifying influential bloggers. We define influential bloggers, investigate their characteristics, discuss the challenges of identification, develop a model to quantify their influence, and pave the way for further research leading to more sophisticated models that enable categorization of various types of influential bloggers. To highlight these issues, we conduct experiments using blog data, evaluate multiple facets of the problem, and present a unique and objective evaluation strategy given the subjectivity in defining influence, in addition to various other analytical capabilities. We conclude with interesting findings and future work. © 2011, Springer-Verlag.
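A model that quantifies a blogger's influence might score each post from simple statistics and take the blogger's best post as their influence. The factor names and weights below are hypothetical illustrations, not the paper's actual model, which the abstract does not spell out.

```python
def post_influence(inlinks, comments, outlinks, w_in=1.0, w_c=0.5, w_out=0.25):
    """Toy per-post influence score (hypothetical weights).

    Recognition signals (inlinks, comments) raise influence; heavy
    reliance on other posts (outlinks) lowers it. Clamped at zero.
    """
    return max(0.0, w_in * inlinks + w_c * comments - w_out * outlinks)

def blogger_influence(posts):
    """A blogger's influence: the maximum influence over their posts."""
    return max(post_influence(*p) for p in posts)

posts = [(10, 4, 2), (0, 1, 5)]  # (inlinks, comments, outlinks) per post
print(blogger_influence(posts))  # 11.5: the first post dominates
```

Taking the maximum over posts (rather than, say, the sum) reflects the intuition that a single widely cited post can make a blogger influential even if their other posts go unnoticed.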

Fernandes E.R.,Federal University of Mato Grosso do Sul | Brefeld U.,Leuphana University of Lüneburg | Blanco R.,Yahoo! Labs | Atserias J.,University of the Basque Country
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2016

Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. First, Wikipedia entries in the source language are labeled with the NERC tag set. Second, Wikipedia language links are exploited to propagate the annotations into the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings. © Springer International Publishing Switzerland 2016.
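The three steps above can be sketched in miniature: propagate entity labels from source-language titles through language links, then tag exact mentions in target-language text. All titles, tags, and the tokens below are made-up illustrations; real Wikipedia dumps and link tables are needed in practice.

```python
def propagate_labels(source_labels, lang_links):
    """Carry NERC tags from source-language titles to target-language
    titles via Wikipedia language links (titles without a link are lost)."""
    return {lang_links[t]: tag for t, tag in source_labels.items() if t in lang_links}

def annotate(tokens, target_labels):
    """Tag every token span that exactly matches a labeled entity.

    Everything else stays 'O', including real entities that never got a
    label, which is why the resulting corpus is only *partially* annotated.
    """
    tags = ["O"] * len(tokens)
    for entity, tag in target_labels.items():
        parts = entity.split()
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] == parts:
                for j in range(len(parts)):
                    tags[i + j] = tag
    return tags

source_labels = {"Barcelona": "LOC", "Lionel Messi": "PER"}       # hypothetical titles
lang_links = {"Barcelona": "Bartzelona", "Lionel Messi": "Lionel Messi"}
target = propagate_labels(source_labels, lang_links)
# "Messi" alone never matches the full title "Lionel Messi", so it stays
# 'O': exactly the unannotated-entity noise the learning extensions handle.
print(annotate("Messi Bartzelona taldean aritu zen".split(), target))
```

This noise pattern motivates the paper's extensions of HMMs and structural perceptrons: the learners must treat an 'O' tag as "possibly an entity" rather than as a reliable negative label.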
