Yahoo! Labs

Sunnyvale, CA, United States


Chen K.,Wright State University | Bai J.,Microsoft | Zheng Z.,Yahoo! Labs
ACM Transactions on Information Systems | Year: 2011

Machine-learned ranking functions have shown success in Web search engines. With the increasing demand for effective ranking functions across different search domains, a major bottleneck has emerged: insufficient labeled training data, which has significantly slowed the development and deployment of machine-learned ranking functions for new domains. There are two possible approaches to this problem: (1) combining labeled training data from similar domains with the small amount of labeled data from the target domain for training, or (2) training on pairwise preference data extracted from user clickthrough logs for the target domain. In this article, we propose a new approach called tree-based ranking function adaptation (Trada) to effectively utilize these data sources for training cross-domain ranking functions. Tree adaptation assumes that ranking functions are trained with the Stochastic Gradient Boosting Trees method, a gradient boosting method on regression trees. It takes such a ranking function from one domain and tunes its tree-based structure with a small amount of training data from the target domain. Its unique features include (1) automatic identification of the parts of the model that need adjustment for the new domain and (2) appropriate weighting of training examples considering both local and global distributions. Based on a novel pairwise loss function that we developed for pairwise learning, the basic tree adaptation algorithm is also extended (Pairwise Trada) to utilize the pairwise preference data from the target domain to further improve the effectiveness of adaptation. Experiments on real datasets show that tree adaptation can provide better-quality ranking functions for a new domain than other methods. © 2011, ACM. All rights reserved.
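The adaptation idea described in the abstract can be illustrated with a minimal sketch. This is a simplification, not the published Trada algorithm: it keeps the tree structure learned on the source domain and blends each leaf's value with the mean of the target-domain labels that are routed to that leaf. All names (`route`, `adapt_leaves`, `blend`) and the dict-based tree encoding are hypothetical.

```python
# Illustrative sketch (not the published Trada algorithm): adapt one
# source-domain regression tree by blending each leaf's value with the
# mean of the target-domain labels routed to that leaf.

def route(tree, x):
    """Return the leaf node dict that example x falls into."""
    node = tree
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def adapt_leaves(tree, target_X, target_y, blend=0.5):
    """Shift each leaf value toward the target-domain mean, weighted by `blend`."""
    buckets = {}
    for x, y in zip(target_X, target_y):
        leaf = route(tree, x)
        buckets.setdefault(id(leaf), (leaf, []))[1].append(y)
    for leaf, ys in buckets.values():
        target_mean = sum(ys) / len(ys)
        leaf["leaf"] = (1 - blend) * leaf["leaf"] + blend * target_mean

# A toy one-split tree "trained" on a source domain:
tree = {"feature": 0, "threshold": 0.5,
        "left": {"leaf": 0.2}, "right": {"leaf": 0.8}}
adapt_leaves(tree, [[0.1], [0.9]], [0.6, 1.0], blend=0.5)
```

The full method additionally decides which parts of the model to adjust and weights examples by local and global distributions, which this sketch omits.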

Fernandes E.R.,Federal University of Mato Grosso do Sul | Brefeld U.,Lüneburg University | Blanco R.,Yahoo! Labs | Atserias J.,University of the Basque Country
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2016

Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings. © Springer International Publishing Switzerland 2016.
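The three-step projection procedure can be sketched on toy data. All entries, links, and labels below are illustrative, not drawn from the paper's corpora:

```python
# Toy sketch of annotation projection via Wikipedia:
# step 1: label source-language entries; step 2: follow language links;
# step 3: tag mentions of the linked entities in target-language text.

source_labels = {"Barack Obama": "PER", "Berlin": "LOC"}            # step 1
lang_links = {"Barack Obama": "Barack Obama", "Berlin": "Berlín"}   # step 2 (Spanish)

target_labels = {lang_links[e]: tag for e, tag in source_labels.items()}

def annotate(tokens, entity_labels):
    """Tag target-language tokens; unmatched tokens stay 'O' (step 3)."""
    tags = ["O"] * len(tokens)
    for entity, tag in entity_labels.items():
        parts = entity.split()
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] == parts:
                for j in range(i, i + len(parts)):
                    tags[j] = tag
    return tags

tags = annotate("Viajó a Berlín ayer".split(), target_labels)  # → ["O", "O", "LOC", "O"]
```

The many tokens left as "O" are exactly the partial-annotation problem the paper's HMM and perceptron extensions are designed to handle.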

BEIJING, October 28, 2016 /PRNewswire/ -- Phoenix New Media Limited (NYSE: FENG), a leading new media company in China ("Phoenix New Media", "ifeng" or the "Company"), today announced that it has appointed media veteran Mr. Tong Chen as Co-President of ifeng, effective October 31, 2016. Mr. Chen has also been appointed by Particle Inc. ("Yidian") as its President, effective the same date. Mr. Chen will be fully responsible for ifeng's content operations management and Yidian's content, product operations, and public relations.

"We are extremely excited to have Tong, a seasoned media executive and pioneer of China's internet media ecosystem, join us at ifeng," stated Mr. Shuang Liu, CEO of ifeng. "Over the years, Tong has helped lead the media industry's content consumption evolution and spearheaded the development and success of China's portal, blog and weibo products. He is a proven leader with significant media and content development expertise and a deep understanding of the changes and direction that are taking place, not only in our industry, but within China overall, and recognizes the vast opportunities they present for ifeng. As we continue to improve our cutting-edge technology and provide our users with the highest quality, customized content across our mobile and PC platforms, we aim to leverage Tong's vision and track record of delivering results to lead us in the right direction. We look forward to working with him to build upon the Company's strong foundation and accelerate our success for years to come."

Mr. Chen is a well-known, highly accomplished media executive with nearly 20 years of experience in the media and content development space. From November 2014 to October 2016, he served as Vice President of Xiaomi Inc., responsible for its content investment and operations. Prior to joining Xiaomi Inc., Mr. Chen spent 17 years at SINA Corporation and served as its Chief Editor and Executive Vice President from February 2007 to November 2014. Mr. Chen holds an M.B.A. from China-Europe International Business School, an M.A. in Journalism from Renmin University of China, an M.A. in Communications from Beijing Institute of Technology, and a B.S. in electronic engineering from Beijing University of Technology.

Phoenix New Media Limited (NYSE: FENG) is a leading new media company providing premium content on an integrated platform across Internet, mobile and TV channels in China. Having originated from Phoenix TV, a leading global Chinese-language TV network based in Hong Kong, the Company enables consumers to access professional news and other quality information and to share user-generated content on the Internet and through their mobile devices. Phoenix New Media's platform includes its channel, consisting of its website and web-based game platform; its video channel, comprised of its dedicated video vertical and mobile video services; and its mobile channel, including its mobile Internet website, mobile applications and mobile value-added services.

Yidian owns Yidian Zixun, an interest-oriented mobile app that integrates cutting-edge search and recommendation technologies to provide its users with unique personalized content. Yidian is dedicated to building a next-generation, interest-based content engine. Yidian was co-founded by Mr. Xuyang Ren, Chairman of Yidian and former vice president of Baidu; Dr. Zhaohui Zheng, CEO of Yidian and former founding head of Yahoo! Labs in China; and Dr. Xin Li and Mr. Rongqing Lu, both Internet technology veterans with years of experience at top-notch Silicon Valley high-tech companies.

This announcement contains forward-looking statements. These statements are made under the "safe harbor" provisions of the U.S. Private Securities Litigation Reform Act of 1995. These forward-looking statements can be identified by terminology such as "will," "expects," "anticipates," "future," "intends," "plans," "believes," "estimates" and similar statements. Among other things, the business outlook and quotations from management in this announcement, as well as Phoenix New Media's strategic and operational plans, contain forward-looking statements. Phoenix New Media may also make written or oral forward-looking statements in its periodic reports to the U.S. Securities and Exchange Commission ("SEC") on Forms 20-F and 6-K, in its annual report to shareholders, in press releases and other written materials and in oral statements made by its officers, directors or employees to third parties. Statements that are not historical facts, including statements about Phoenix New Media's beliefs and expectations, are forward-looking statements. Forward-looking statements involve inherent risks and uncertainties. A number of factors could cause actual results to differ materially from those contained in any forward-looking statement, including but not limited to the following: the Company's goals and strategies; the Company's future business development, financial condition and results of operations; the expected growth of the online and mobile advertising, online video and mobile paid services markets in China; the Company's reliance on online and mobile advertising and MVAS for a majority of its total revenues; the Company's expectations regarding demand for and market acceptance of its services; the Company's expectations regarding maintaining and strengthening its relationships with advertisers, partners and customers; fluctuations in the Company's quarterly operating results; the Company's plans to enhance its user experience, infrastructure and service offerings; the Company's reliance on mobile operators in China to provide most of its MVAS; changes by mobile operators in China to their policies for MVAS; competition in its industry in China; and relevant government policies and regulations relating to the Company. Further information regarding these and other risks is included in the Company's filings with the SEC, including its registration statement on Form F-1, as amended, and its annual reports on Form 20-F. All information provided in this press release and in the attachments is as of the date of this press release, and Phoenix New Media does not undertake any obligation to update any forward-looking statement, except as required under applicable law.

Miyazawa F.K.,University of Campinas | Pedrosa L.L.C.,University of Campinas | Schouery R.C.S.,University of Campinas | Sviridenko M.,Yahoo! Labs | Wakabayashi Y.,University of Sao Paulo
Algorithmica | Year: 2015

We consider the problem of packing a set of circles into a minimum number of unit square bins. To obtain rational solutions, we use augmented bins of height (Formula presented.), for some arbitrarily small number (Formula presented.). For this problem, we obtain an asymptotic approximation scheme (APTAS) that is polynomial in (Formula presented.), and thus (Formula presented.) may be given as part of the problem input. For the special case that (Formula presented.) is constant, we give a (one-dimensional) resource augmentation scheme; that is, we obtain a packing into bins of unit width and height (Formula presented.) using no more than the number of bins in an optimal packing without resource augmentation. Additionally, we obtain an APTAS for the circle strip packing problem, whose goal is to pack a set of circles into a strip of unit width and minimum height. Our algorithms are the first approximation schemes for circle packing problems; they are based on novel ideas for iteratively separating small and large items, and may be extended to a wide range of packing problems that satisfy certain conditions. These extensions comprise problems with different kinds of items, such as regular polygons, or with bins of different shapes, such as circles and spheres. As an example, we obtain APTASs for the problems of packing d-dimensional spheres into hypercubes under the (Formula presented.)-norm. © 2015 Springer Science+Business Media New York

Agarwal N.,University of Arkansas at Little Rock | Liu H.,Arizona State University | Tang L.,Yahoo! Labs | Yu P.S.,University of Illinois at Chicago
Social Network Analysis and Mining | Year: 2012

Blogging has become a popular and convenient way to communicate, publish information, share preferences, voice opinions, provide suggestions, report news, and form virtual communities in the Blogosphere. The blogosphere follows a power law distribution, with very few blogs being extremely influential and a huge number of blogs being largely unknown. Regardless of whether a (multi-author) blog is itself influential, it can host influential bloggers. However, the sheer number of such blogs makes it extremely challenging to study each one of them. One way to analyze these blogs is to find influential bloggers and consider them as the community representatives. Influential bloggers can impact fellow bloggers in various ways. In this paper, we study the problem of identifying influential bloggers. We define influential bloggers, investigate their characteristics, discuss the challenges of identifying them, develop a model to quantify their influence, and pave the way for further research leading to more sophisticated models that enable categorization of various types of influential bloggers. To highlight these issues, we conduct experiments using data from blogs, evaluate multiple facets of the problem, and present a unique and objective evaluation strategy given the subjectivity in defining influence, in addition to various other analytical capabilities. We conclude with interesting findings and future work. © 2011, Springer-Verlag.
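A minimal influence score in the spirit of the abstract might combine a post's recognition (inlinks), the activity it generates (comments), its citation of others (outlinks), and its length. The weights, threshold, and function names below are hypothetical illustrations, not the paper's actual model:

```python
# Hypothetical influence score (not the paper's exact model): reward
# recognition (inlinks) and generated activity (comments), discount
# heavy citation of others (outlinks), and require a minimum post length.

def post_influence(inlinks, comments, outlinks, length,
                   w_in=1.0, w_comment=0.5, w_out=0.25, min_length=50):
    if length < min_length:          # too short to carry influence
        return 0.0
    return w_in * inlinks + w_comment * comments - w_out * outlinks

def blogger_influence(posts):
    """Score a blogger by their single most influential post."""
    return max(post_influence(**p) for p in posts)

score = blogger_influence([
    {"inlinks": 10, "comments": 4, "outlinks": 2, "length": 300},
    {"inlinks": 1, "comments": 0, "outlinks": 5, "length": 40},
])
```

Taking the maximum over posts reflects the intuition that one widely cited post can make a blogger influential even if their other posts are not.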

Anandkumar A.,University of California at Irvine | Foster D.P.,Yahoo! Labs | Hsu D.,Columbia University | Kakade S.M.,Microsoft | Liu Y.-K.,U.S. National Institute of Standards and Technology
Algorithmica | Year: 2015

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments. © 2014, Springer Science+Business Media New York.
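The trigram statistics the abstract refers to are empirical third-order moments. A sketch of estimating them from documents of three words follows; the tensor decomposition step itself is omitted, and the toy corpus is illustrative:

```python
# Sketch of the trigram statistics the method consumes: the empirical
# third-order moment tensor over a vocabulary, estimated from documents
# containing (at least) three words.
import numpy as np

def trigram_moment(docs, vocab_size):
    """M3[i, j, k] = empirical probability of observing word triple (i, j, k)."""
    m3 = np.zeros((vocab_size,) * 3)
    for w1, w2, w3 in docs:
        m3[w1, w2, w3] += 1.0
    return m3 / len(docs)

# Four toy 3-word documents over a vocabulary of 3 word ids:
docs = [(0, 1, 2), (0, 1, 2), (1, 0, 2), (2, 2, 2)]
m3 = trigram_moment(docs, vocab_size=3)
```

The guarantees in the paper come from the fact that, for models such as LDA, an orthogonalized version of this tensor has a decomposition whose components recover the topic-word distributions.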

Han S.,University of Pittsburgh | He D.,University of Pittsburgh | Yue Z.,Yahoo! Labs | Brusilovsky P.,University of Pittsburgh
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2015

The wide adoption of smartphones eliminates the time and location barriers to people’s daily information access, but the small mobile screen size also limits users’ information exploration activities. Thus, cross-device web search, where people initiate information needs on one device but complete them on another, is frequently observed in modern search engines, especially for exploratory information needs. This paper aims to support cross-device web search for exploratory tasks, on top of the commonly used context-sensitive retrieval framework. To better model users’ search context, our method utilizes not only the search history (query history and click-through) but also the mobile touch interactions (MTI) on mobile devices. Specifically, we combine MTI’s ability to locate relevant subdocument content [10] with the idea of social navigation, aggregating MTIs from other users who visit the same page. To demonstrate the effectiveness of our proposed approach, we designed a user study to collect cross-device web search logs on three different types of tasks from 24 participants and then compared our approach with two baselines: a traditional full-text relevance feedback approach and a self-MTI based subdocument relevance feedback approach. Our results show that the social navigation-based MTIs outperformed both baselines. A further analysis shows that the performance improvements are related to several factors, including the quality and quantity of click-through documents, task types, and users’ search conditions. © Springer International Publishing Switzerland 2015.
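The social-navigation aggregation of MTIs can be sketched as follows. The data model (per-passage dwell times pooled across users) is an assumption for illustration, not the paper's exact feature set:

```python
# Illustrative sketch (hypothetical data model): pool mobile touch
# interactions (MTIs) from many users of the same page to estimate which
# passages are relevant, in the spirit of social navigation.
from collections import defaultdict

def aggregate_mti(logs):
    """logs: (user, passage_id, dwell_seconds) triples from one page.
    Returns passage weights normalized to sum to 1."""
    dwell = defaultdict(float)
    for _user, passage, seconds in logs:
        dwell[passage] += seconds
    total = sum(dwell.values())
    return {p: s / total for p, s in dwell.items()}

weights = aggregate_mti([
    ("u1", "p1", 12.0), ("u1", "p2", 3.0),
    ("u2", "p1", 8.0),  ("u3", "p2", 2.0),
])
```

Such passage weights could then bias a relevance-feedback model toward the subdocument content that many users attended to, rather than the full text.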

Zhang C.,CAS Institute of Software | Wang H.,CAS Institute of Software | Cao L.,Yahoo! Labs | Wang W.,CAS Institute of Software | Xu F.,CAS Institute of Software
Knowledge-Based Systems | Year: 2016

Topic detection, which discovers topics from online media, has attracted much attention. Generally, a topic is characterized by a set of informative keywords/terms. Traditional approaches are usually based on topic models such as Latent Dirichlet Allocation (LDA); they cluster terms into a topic by mining semantic relations between terms. However, co-occurrence relations across documents are commonly neglected, which leads to the detection of incomplete information. Furthermore, the inability to discover latent co-occurrence relations via the context or other bridge terms prevents important but rare topics from being detected. To tackle this issue, we propose a hybrid relations analysis approach that integrates semantic relations and co-occurrence relations for topic detection. Specifically, the approach fuses multiple relations into a term graph and detects topics from the graph using a graph analytical method. It can not only detect topics more effectively by combining mutually complementary relations, but also mine important rare topics by leveraging latent co-occurrence relations. Extensive experiments demonstrate the advantage of our approach over several benchmarks. © 2015 Elsevier B.V. All rights reserved.
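A minimal sketch of the term-graph idea: fuse co-occurrence relations into a graph over terms and read topics off its structure. Connected components stand in for the paper's more sophisticated graph analytical method, and the `min_cooccur` threshold is an illustrative assumption:

```python
# Toy term-graph topic detection: edges from document-level co-occurrence,
# topics as connected components (a stand-in for the paper's method).
from itertools import combinations

def term_graph(docs, min_cooccur=2):
    """Keep an edge between two terms if they co-occur in enough documents."""
    counts = {}
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return {e for e, c in counts.items() if c >= min_cooccur}

def topics(edges):
    """Connected components of the term graph, via union-find."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())

docs = [["election", "vote"], ["election", "vote", "poll"],
        ["goal", "match"], ["goal", "match"]]
found = topics(term_graph(docs))
```

The paper's hybrid approach additionally folds in semantic relations and latent co-occurrence via bridge terms, which is what lets it surface rare topics that a raw co-occurrence threshold like this one would miss.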

Bentley F.,Yahoo! Labs | Cramer H.,Yahoo! Labs | Muller J.,TU Berlin
Personal and Ubiquitous Computing | Year: 2015

Mobile services are integrating into the places and routines of daily life. But which types of places afford the use of various services, and how important are these places in our lives? Through several studies, we explore the types of places that are most important to people in their cities, and compare these to the place types where different location-based services are used. We find that services were used quite consistently between cities, but that places of personal salience, such as parks, are less common in the use of today’s check-in services than in location-based storytelling systems. Supported by data from the service, we suggest that focusing on selective sharing and storytelling can facilitate use at these more personally meaningful places. © 2014, Springer-Verlag London.

KOZAREVA Z.,Yahoo! Labs | NASTASE V.,Fondazione Bruno Kessler | MIHALCEA R.,University of Michigan
Natural Language Engineering | Year: 2015

Graph structures naturally model connections. In natural language processing (NLP), connections are ubiquitous, at anything from small to web scale. We find them between words – as grammatical, collocation or semantic relations – contributing to the overall meaning and maintaining the cohesive structure of the text and the discourse unity. We find them between concepts in ontologies or other knowledge repositories – since the early days of artificial intelligence, associative or semantic networks have been proposed and used as knowledge stores, because they naturally capture the language units and the relations between them, and allow for a variety of inference and reasoning processes, simulating some of the functionalities of the human mind. We find them between complete texts or web pages, and between entities in a social network, where they model relations at web scale. Beyond the more often encountered ‘regular’ graphs, hypergraphs have also appeared in our field to model relations between more than two units. Copyright © Cambridge University Press 2015
