
Gattani A.,AtWalmartLabs | Lamba D.S.,AtWalmartLabs | Garera N.,AtWalmartLabs | Tiwari M.,LinkedIn | And 7 more authors.
Proceedings of the VLDB Endowment | Year: 2013

Many applications that process social data, such as tweets, must extract entities from tweets (e.g., "Obama" and "Hawaii" in "Obama went to Hawaii"), link them to entities in a knowledge base (e.g., Wikipedia), classify tweets into a set of predefined topics, and assign descriptive tags to tweets. Few solutions exist today to solve these problems for social data, and they are limited in important ways. Further, even though several industrial systems such as OpenCalais have been deployed to solve these problems for text data, little if any has been published about them, and it is unclear if any of the systems has been tailored for social media. In this paper we describe in depth an end-to-end industrial system that solves these problems for social data. The system has been developed and used heavily in the past three years, first at Kosmix, a startup, and later at WalmartLabs. We show how our system uses a Wikipedia-based global "real-time" knowledge base that is well suited for social data, how we interleave the tasks in a synergistic fashion, how we generate and use contexts and social signals to improve task accuracy, and how we scale the system to the entire Twitter firehose. We describe experiments that show that our system outperforms current approaches. Finally we describe applications of the system at Kosmix and WalmartLabs, and lessons learned. © 2013 VLDB.

Lam W.,AtWalmartLabs | Liu L.,AtWalmartLabs | Prasad S.,AtWalmartLabs | Rajaraman A.,AtWalmartLabs | And 3 more authors.
Proceedings of the VLDB Endowment | Year: 2012

MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, to achieve low latency and high scalability? In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce, but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our current solutions. Finally, we describe our experience and lessons learned with Muppet, which has been used extensively at Kosmix and WalmartLabs to power a broad range of applications in social media and e-commerce. © 2012 VLDB Endowment.
