CA, United States

News Article | December 12, 2016

Richard Craib is a 29-year-old South African who runs a hedge fund in San Francisco. Or rather, he doesn’t run it. He leaves that to an artificially intelligent system built by several thousand data scientists whose names he doesn’t know. Under the banner of a startup called Numerai, Craib and his team have built technology that masks the fund’s trading data before sharing it with a vast community of anonymous data scientists. Using a method similar to homomorphic encryption, this tech works to ensure that the scientists can’t see the details of the company’s proprietary trades, but also organizes the data so that these scientists can build machine learning models that analyze it and, in theory, learn better ways of trading financial securities. “We give away all our data,” says Craib, who studied mathematics at Cornell University in New York before going to work for an asset management firm in South Africa. “But we convert it into this abstract form where people can build machine learning models for the data without really knowing what they’re doing.” He doesn’t know these data scientists because he recruits them online and pays them for their trouble in a digital currency that can preserve anonymity. “Anyone can submit predictions back to us,” he says. “If they work, we pay them in bitcoin.” So, to sum up: They aren’t privy to his data. He isn’t privy to them. And because they work from encrypted data, they can’t use their machine learning models on other data, and neither can he. But Craib believes the blind can lead the blind to a better hedge fund. Numerai’s fund has been trading stocks for a year. Though he declines to say just how successful it has been, due to government regulations around the release of such information, he does say it’s making money. 
And an increasingly large number of big-name investors have pumped money into the company, including the founder of Renaissance Technologies, an enormously successful “quant” hedge fund driven by data analysis. Craib and company have just completed their first round of venture funding, led by the New York venture capital firm Union Square Ventures. Union Square has invested $3 million in the round, with an additional $3 million coming from others. Hedge funds have been exploring the use of machine learning algorithms for a while now, including established Wall Street names like Renaissance and Bridgewater Associates as well as tech startups like Sentient Technologies and Aidyia. But Craib’s venture represents new efforts to crowdsource the creation of these algorithms. Others are working on similar projects, including Two Sigma, a second data-centric New York hedge fund. But Numerai is attempting something far more extreme. The company comes across as some sort of Silicon Valley gag: a tiny startup that seeks to reinvent the financial industry through artificial intelligence, encryption, crowdsourcing, and bitcoin. All that’s missing is the virtual reality. And to be sure, it’s still very early for Numerai. Even one of its investors, Union Square partner Andy Weissman, calls it an “experiment.” But others are working on similar technology that can help build machine learning models more generally from encrypted data, including researchers at Microsoft. This can help companies like Microsoft better protect all the personal information they gather from customers. Oren Etzioni, the CEO of the Allen Institute for AI, says the approach could be particularly useful for Apple, which is pushing into machine learning while taking a hardline stance on data privacy. But such tech can also lead to the kind of AI crowdsourcing that Craib espouses. Craib dreamed up the idea while working for that financial firm in South Africa. 
He declines to name the firm, but says it runs an asset management fund spanning $15 billion in assets. He helped build machine learning algorithms that could help run this fund, but these weren’t all that complex. At one point, he wanted to share the company’s data with a friend who was doing more advanced machine learning work with neural networks, and the company forbade him. But its stance gave him an idea. “That’s when I started looking into these new ways of encrypting data, looking for a way of sharing the data with him without him being able to steal it and start his own hedge fund,” he says. The result was Numerai. Craib put a million dollars of his own money in the fund, and in April, the company announced $1.5 million in funding from a group that included Howard Morgan, one of the founders of Renaissance Technologies. Morgan has invested again in the Series A round alongside Union Square and First Round Capital. It’s an unorthodox play, to be sure. This is obvious just when you visit the company’s website, where Craib describes the company’s mission in a short video. He’s dressed in black-rimmed glasses and a silver racer jacket, and the video cuts him into a visual landscape reminiscent of The Matrix. “When we saw those videos, we thought: ‘this guy thinks differently,’” says Weissman. As Weissman admits, the question is whether the scheme will work. The trouble with homomorphic encryption is that it can significantly slow down data analysis tasks. “Homomorphic encryption requires a tremendous amount of computation time,” says Ameesh Divatia, the CEO of Baffle, a company that’s building encryption similar to what Craib describes. “How do you get it to run inside a business decision window?” Craib says that Numerai has solved the speed problem with its particular form of encryption, but Divatia warns that this may come at the expense of data privacy. 
According to Raphael Bost, a PhD student at Université de Rennes 1 in France who has explored the use of machine learning with encrypted data, Numerai is likely using a method similar to the one described by Microsoft, where the data is encrypted but not in a completely secure way. “You have to be very careful with side-channels on the algorithm that you are running,” he says of anyone who uses this method. In any event, Numerai is ramping up its effort. Three months ago, about 4,500 data scientists had built about 250,000 machine learning models that drove about 7 billion predictions for the fund. Now, about 7,500 data scientists are involved, building a total of 500,000 models that drive about 28 billion predictions. As with the crowdsourced data science marketplace Kaggle, these data scientists compete to build the best models, and they can earn money in the process. For Numerai, part of the trick is that this is done at high volume. Through a statistics and machine learning technique called stacking or ensembling, Numerai can combine the best of myriad algorithms to create a more powerful whole. Though most of these data scientists are anonymous, a small handful are not, including Phillip Culliton of Buffalo, New York, who also works for a data analysis company called Multimodel Research, which has a grant from the National Science Foundation. He has spent many years competing in data science competitions on Kaggle and sees Numerai as a more attractive option. “Kaggle is lovely and I enjoy competing, but only the top few competitors get paid, and only in some competitions,” he says. “The distribution of funds at Numerai among the top 100 or so competitors, in fairly large amounts at the top of the leaderboard, is quite nice.” Each week, one hundred scientists earn bitcoin, with the company paying out over $150,000 in the digital currency so far. 
If the fund reaches a billion dollars under management, Craib says, it would pay out over $1 million each month to its data scientists. Culliton says it’s more difficult to work with the encrypted data and draw his own conclusions from it, and another Numerai regular, Jim Fleming, who helps run a data science consultancy called the Fomoro Group, says much the same thing. But this isn’t necessarily a problem. After all, machine learning is more about the machine drawing the conclusions. In many cases, even when working with unencrypted data, Culliton doesn’t know what it actually represents, but he can still use it to build machine learning models. “Encrypted data is like turning off the sound at the party,” Culliton says. “You’re no longer listening in on people’s private conversations, but you can still get very good signal on how close they feel to one another.” If this works across Numerai’s larger community of data scientists, as Richard Craib hopes it will, Wall Street will be listening more closely, too.
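
The article says Numerai combines many submitted models through "stacking or ensembling" but gives no detail on how. As a rough illustration only, the sketch below uses simple averaging of predicted probabilities, which is the most basic form of ensembling; stacking proper would instead train a second model on the base models' outputs. The function names, the three models, and all numbers are hypothetical and are not taken from Numerai.

```python
from statistics import mean


def average_ensemble(predictions_per_model):
    """Combine per-model probability predictions by simple averaging.

    This is plain averaging, not stacking: no meta-model is trained.
    """
    if not predictions_per_model:
        raise ValueError("need at least one model's predictions")
    n_rows = len(predictions_per_model[0])
    if any(len(p) != n_rows for p in predictions_per_model):
        raise ValueError("all models must predict the same number of rows")
    # zip(*...) groups the predictions for each row across all models.
    return [mean(row) for row in zip(*predictions_per_model)]


def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of rows where the thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)


# Hypothetical labels and predictions, invented for illustration.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.6, 0.4, 0.2]
model_b = [0.7, 0.3, 0.45, 0.6]
model_c = [0.4, 0.1, 0.9, 0.3]

ensemble = average_ensemble([model_a, model_b, model_c])
```

In this toy data each model makes at least one mistake, but the mistakes fall on different rows, so the averaged prediction is correct on all four. That is the intuition behind combining many imperfect models; it does not hold when the models tend to make the same errors.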

News Article | November 29, 2016

Google’s artificial intelligence can play the ancient game of Go better than any human. It can identify faces, recognize spoken words, and pull answers to your questions from the web. But the promise is that this same kind of technology will soon handle far more serious work than playing games and feeding smartphone apps. One day, it could help care for the human body. Demonstrating this promise, Google researchers have worked with doctors to develop an AI that can automatically identify diabetic retinopathy, a leading cause of blindness among adults. Using deep learning, the same breed of AI that identifies faces, animals, and objects in pictures uploaded to Google’s online services, the system detects the condition by examining retinal photos. In a recent study, it succeeded at about the same rate as human ophthalmologists, according to a paper published today in the Journal of the American Medical Association. “We were able to take something core to Google, classifying cats and dogs and faces, and apply it to another sort of problem,” says Lily Peng, the physician and biomedical engineer who oversees the project at Google. But the idea behind this AI isn’t to replace doctors. Blindness is often preventable if diabetic retinopathy is caught early. The hope is that the technology can screen far more people for the condition than doctors could on their own, particularly in countries where healthcare is limited, says Peng. The project began, she says, when a Google researcher realized that doctors in his native India were struggling to screen all the locals that needed to be screened. In many places, doctors are already using photos to diagnose the condition without seeing patients in person. “This is a well validated technology that can bring screening services to remote locations where diabetic retinal eye screening is less available,” says David McColloch, a clinical professor of medicine at the University of Washington who specializes in diabetes. 
That could provide a convenient on-ramp for an AI that automates the process. Peng’s project is part of a much wider effort to detect disease and illness using deep neural networks, pattern recognition systems that can learn discrete tasks by analyzing vast amounts of data. Researchers at DeepMind, a Google AI lab in London, have teamed with Britain’s National Health Service to build various technologies that can automatically detect when patients are at risk of disease and illness, and several other companies, including a startup called Enlitic, are exploring similar systems. At Kaggle, an internet site where data scientists compete to solve real-world problems using algorithms, groups have worked to build their own machine learning systems that can automatically identify diabetic retinopathy. Peng is part of Google Brain, a team inside the company that provides AI software and services for everything from search to security to Android. Within this team, she now leads a group spanning dozens of researchers that focuses solely on medical applications for AI. The work on diabetic retinopathy started as a “20 Percent project” about two years ago, before becoming a full-time effort. Researchers began working with hospitals in the Indian cities of Aravind and Sankara that were already collecting retinal photos for doctors to examine. Then the Google team asked more than four dozen doctors in India and the US to identify photos where mini-aneurysms, hemorrhages, and other issues indicated that diabetic patients could be at risk for blindness. At least three doctors reviewed each photo, before Peng and team fed about 128,000 of these images into their neural network. Ultimately, the system identified the condition slightly more consistently than the original group of doctors. 
At its most sensitive, the system avoided both false negatives and false positives more than 90 percent of the time, exceeding the National Institutes of Health’s recommended standard of at least 80 percent accuracy and precision for diabetic retinopathy screens. Given the success of deep learning algorithms with other machine vision tasks, the results of the original trial aren’t surprising. But Yaser Sheikh, a professor of computer science at Carnegie Mellon who is working on other forms of AI for healthcare, says that actually moving this kind of thing into the developing world can be difficult. “It is the kind of thing that sounds good, but actually making it work has proven to be far more difficult,” he says. “Getting technology to actually help in the developing world—there are many, many systematic barriers.” But Peng and her team are pushing forward. She says Google is now running additional trials with photos taken specifically to train its diagnostic AI. Preliminary results, she says, indicate that the system once again performs as well as trained doctors. The machines, it seems, are gaining new kinds of sight. And some day, they might save yours.
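
The article describes the system's performance in terms of how often it avoids false negatives and false positives. A common way to express those two rates is sensitivity and specificity; that mapping is an assumption here, since the article does not name the metrics. The sketch below computes both from a list of predictions and true labels. The function name and the ten example cases are hypothetical and are not data from the study.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for binary labels (1 = disease present).

    sensitivity = TP / (TP + FN): share of true cases the screen catches.
    specificity = TN / (TN + FP): share of healthy cases correctly cleared.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screening results: 4 patients with the condition, 6 without.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(predicted, actual)
```

With these invented labels the screen misses one of four true cases and wrongly flags one of six healthy ones, so sensitivity is 3/4 and specificity is 5/6. A screening threshold can usually be moved to trade one rate against the other.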

News Article | November 20, 2016

Facial recognition software is most commonly known as a tool to help police identify a suspected criminal by using machine learning algorithms to analyze his or her face against a database of thousands or millions of other faces. The larger the database, with a greater variety of facial features, the smarter and more successful the software becomes – effectively learning from its mistakes to improve its accuracy. Now, this type of artificial intelligence is starting to be used in fighting a specific but pervasive type of crime – illegal fishing. Rather than picking out faces, the software tracks the movement of fishing boats to root out illegal behavior. And soon, using a twist on facial recognition, it may be able to recognize when a boat’s haul includes endangered and protected fish. The latest effort to use artificial intelligence to fight illegal fishing is coming from Virginia-based The Nature Conservancy (TNC), which launched a contest on Kaggle – a crowdsourcing site based in San Francisco that uses competitions to advance data science – earlier this week. TNC hopes the winning team will write software to identify specific species of fish. The program will run on cameras, called electronic monitors, which are installed on fishing boats and used for documenting the catch. The software will put a marker at each point in the video when a protected fish is hauled in. Inspectors, who currently spend up to six hours manually reviewing a single 10-hour fishing day, will then be able to go directly to those moments and check a fishing crew’s subsequent actions to determine whether they handled the bycatch legally – by making best efforts to return it to the sea unharmed. TNC expects this approach could cut review time by up to 40% and increase the monitoring on a boat. 
Despite rules that call for government-approved auditors to be stationed on 5% of commercial fishing boats in the Western and Central Pacific, in practice the auditors are found on only around 2% of the fishing boats, including tuna long liners. As a result, fishermen sometimes keep protected fish that they hook – including sharks that are killed for their lucrative fins. In the Pacific’s $7bn tuna fishery, illegal, unreported and unregulated (IUU) fishing not only harms fragile fish stocks, it takes an economic toll of up to $1.5bn. The impact shows up in many ways, including lost income for fishermen in the legal marketplace and harm to the tourist economy that sells snorkelers and divers the opportunity to witness protected species in the wild. Worldwide, cost estimates related to IUU reach $23bn annually, and the take represents up to 20% of all seafood. Using technology to track and prevent illegal fishing presents an opportunity for technology companies as the fishing industry seeks ways to comply with the growing demand for transparency from governments and consumers. “If using facial recognition software to track fish were easy, we’d already be using it,” says Matthew Merrifield, TNC’s chief technology officer. Whereas images from security cameras installed inside banks or other buildings are consistent and predictable, “the data from (electronic monitoring) cameras on boats is dirty, because the ships are always moving and the light keeps changing”. Because of the “dirty” data, it will not be easy to write facial recognition software that can accurately spot protected species when the variable conditions on the high seas could lead to blurry images on the video. Given those challenges, it’s too early to know how large this market will grow, or how quickly. While the use of artificial intelligence to reduce illegal catch is relatively new, the Kaggle contest isn’t the first time it is being applied to the fishing industry. 
San Francisco-based startup Pelagic Data Systems (PDS) has developed technology that illuminates the activity of some of the 4.6m small-scale commercial fishing boats that ply coastal waters around the world. Using data from a UN Food and Agriculture Organization report, PDS estimates that roughly 95% of those boats don’t have the types of communications and tracking radios that larger boats are required to have, partly because the boats are too small or lack the power source to run the radios. PDS installs a solar-powered radio with an integrated GPS receiver and cellular modem on boats. The company collects the location data and analyzes it to create a map to show where the boat traveled and deduce its activities, such as where it stopped to set out nets or other gear and where and for how long it hauled in a catch. This data is vital because it shows whether the boat fished inside or outside marine protected areas. The device doesn’t have an on/off switch, a design to prevent a fishing crew from tampering with data collection. The software also generates heat maps to indicate where the heaviest fishing activities are taking place within a coastal region. By pairing that data with the movements of the boats, PDS can also estimate the quantity and even the size of the fish pulled from those waters, says Dave Solomon, CEO of PDS. The company sells its technology to governments, nonprofits, academic researchers and companies in the fishing industry, and expects the number of boats installed with its device to reach 1,000 in regions such as West Africa, North America and Mexico by the end of the year, Solomon says. Some of his customers install the devices in the boats of their suppliers for another reason: to win over customers by demonstrating transparency in fishing practices. 
Another effort to use data to fight illegal fishing comes from the nonprofit SkyTruth, which tracks the movement of large ships by mining data broadcast by ships and collected by satellites. Its technology is used by Global Fishing Watch, which is backed by Google, Oceana and the Leonardo DiCaprio Foundation. SkyTruth’s data helped the island nation Kiribati to bust illegal fishing operations. But Kaggle has a habit of taking on unusual technical challenges. Earlier this year, it launched a contest with State Farm to develop machine learning software, to be embedded in dashboard cameras, to classify a driver’s behavior, such as being distracted by a smartphone when behind the wheel. Kaggle, with a membership of 650,000 data scientists, hasn’t tackled an environmental problem before. But its CEO, Anthony Goldbloom, thinks the TNC contest could represent the start of environmental competitions on its site because scientists from government agencies and academic institutions are collecting a growing amount of field data using cameras and sensors. The TNC contest attracted 44 teams within the first day. Each team has five months to submit its software. While the contest presents an appealing opportunity to do something good for the environment, it doesn’t promise a big payoff. That will make it difficult for software developers and data scientists to raise venture capital to fund their efforts. “Silicon Valley only invests in places with big money [potential],” says Andrew Bosworth, vice president of ads and business platform for Facebook and a board member of land conservation group Peninsula Open Space Trust. “Plus, everyone underestimates [environmental] challenges. Going to the moon is easier than tracking fishing. It really is. So these are big challenges without financial incentives to solve them.” But, he adds, Silicon Valley does provide important undergirding for using technology to solve environmental problems. 
Bosworth argues that the advancement in core technologies behind things like multiplayer gaming software and smartphone apps has propelled the rise of machine learning and artificial intelligence and lowered the development costs over time. The winning team of the contest will earn a prize of $150,000. Then, as part of its campaign to reduce bycatch and illegal fishing in the region, TNC will work with the governments of Palau, Federated States of Micronesia, Solomon Islands and Marshall Islands to install the software, for free, on the electronic monitors of selected fishing boats. If the software proves effective in reducing the labor costs and improving the accuracy of identifying protected species, then it could become a standard feature in electronic monitors. TNC will own the intellectual property of the winning software and make it free to the equipment makers, which include Satlink and Archipelago. The software could become even more widely used if large retailers such as Walmart begin to require electronic monitors on their vendors’ fleets. But it is still early days for policing the fishing industry. For Melissa Garren, chief scientific officer of PDS, that means the market potential is huge. “We should be treating the oceans more like we treat airspace,” she says. “If we had this lack of visibility in the skies, it would be nuts.”

News Article | February 23, 2017

Falkonry, Inc., a leading provider of pattern-based artificial intelligence software to improve overall equipment effectiveness (OEE), announced today that it has received funding from Zetta Venture Partners, Polaris Partners, and Start Smart Labs. In addition to this transaction, Mark Gorenberg, the founding partner of Zetta Venture Partners, will be joining Falkonry’s Board of Directors. “Patterns are prevalent in operational data used by industrial organizations. The automated discovery of patterns in industrial data streams is essential to realize the benefits of smart factories, connected industry, and Industrial IoT (IIoT),” said Mark Gorenberg, new Falkonry Board Member. “Falkonry is unique in its ability to productize solutions to costly, complex industrial business problems. We foresee that domain-specific solutions using Falkonry software will be broadly deployed to fundamentally transform industries.” Over 200,000 industrial facilities around the world operate manufacturing, energy, and transportation systems. These facilities and their organizations have always endeavored to improve yield, quality, efficiency, and uptime through industrial engineering and process improvement methods. Falkonry is a revolutionary step forward on that journey as it enables the same practitioners to create more intelligent control systems using existing data on their own. “Falkonry is excited to partner with Zetta and Polaris as we roll out our AI-based pattern recognition technology to the Global 2000 industrial customer base,” said Nikunj Mehta, Founder & CEO of Falkonry. “Zetta has the reputation of being a top intelligent enterprise venture partner and we are excited to leverage their grasp of early business models and growth strategy. We are also delighted to have Mark join our board.” “Falkonry has a scalable business model that is essential to growth in the industrial software marketplace,” said Gary Swart of Polaris Partners. 
“They have developed an ideal solution for a critical need that every industrial company is being challenged to address today in order to remain competitive and succeed in the demanding global marketplace. Falkonry customers are able to find actionable insights from underutilized data to improve productivity, increase yields, and raise efficiency.” Falkonry has global customers in APAC and North America. Falkonry software is offered via a term licensing model and it can be deployed by customers in the cloud or on-premises. Users, such as industrial engineers, can rapidly apply Falkonry software to their own operations after minimal training. In other words, Falkonry customers are up and running and seeing measurable results within weeks, not months.
About Zetta Venture Partners
Founded in 2013, Zetta Venture Partners is the first fund focused on intelligent enterprise software and has $160 million under management. Current portfolio companies include Appdiff, Clearbit, Domo, Domino Data Lab, EventBoard, Focal Systems, InsideSales, Kaggle, Lilt, Lucid Design Group and Tractable.
About Polaris Partners
Polaris Partners invests in exceptional technology and healthcare companies across all stages of their life cycles. With offices in Boston, San Francisco, and Dublin, Polaris partners globally with an unparalleled network of entrepreneurs, top scientists and emerging innovators who are making significant contributions in their fields and improving the way in which we live and work.
About Falkonry, Inc.
Falkonry, a Silicon Valley company on the cutting edge of industrial transformation, helps the Global 2000 improve operations efficiency through pattern recognition AI. Falkonry democratizes machine learning to discover, recognize, and predict time series patterns for downtime, quality, yield, and efficiency. 
Falkonry software enables industrial engineers to create real-time process control insights from industrial and IoT time series data. The company’s patent-pending core AI technology continuously improves as it analyzes more input data and expert labels. The Falkonry business model comprises engaged, global, diversified distribution and leading technology partners, including OSIsoft, SAP, Vegam, MDS Technology, Microsoft, PTC, PubNub, and Oracle. Falkonry is headquartered in Santa Clara, California with offices in Seoul, Korea and Mumbai, India. For more information about Falkonry and its products, partners and services, call +1 (408) 461-9286.

News Article | October 28, 2016

In this context, BBC veteran Nik Gowing and change guru Chris Langdon carried out confidential in-depth interviews with 60 business leaders and their equivalent in public service, asking them what they honestly feel about their situation: in particular their personal ability to spot what is coming, and to put proactive plans in place. In a report entitled Thinking the Impossible: a New Imperative for Leadership in the Digital Age, they share the results. They found that many leaders, once guaranteed anonymity, admit to a dire state of doubt and inadequacy, and many say their insecurity burgeoned in 2014. Gowing and Langdon describe their findings as “deeply troubling”. They talk of 2014 as “the great wake-up” year, because of the multiple geopolitical and strategic disruptions it threw at the world. They found that the insecurity of leadership, and the unwillingness of leaders to square up to “unpalatable” issues, is particularly marked in the digital domain. “In what is fast becoming a new disruptive age of digital public empowerment, big data and metadata,” they write, “leadership finds it hard to recognise these failings, let alone find answers and solutions.” In the light of these conclusions, it is interesting to consider the full extent of the challenges energy-industry leaders face today. Let me consider five themes, all greatly relevant to the energy markets of the future: transition, data, artificial intelligence, robotics, and capital. First, the challenges of transition from fossil fuel dependency to zero carbon. Business models are dying in the incumbency, yet the best operable replacements are far from obvious.  Prizes are huge and penalties dire. On the one hand, Tesla can raise $400m of free money from customers tabling deposits for a product (the Model 3) that can’t even be delivered to them for a year. On the other hand, SunEdison can plunge from multi-billion-dollar status to bankruptcy within a year. 
Second, the world of big data has largely yet to manifest in energy markets. It will. Ever more advanced algorithms have allowed tech companies to grow in recent years from nothing to multiple-billion-dollar valuations. They have done so by employing a broad array of strategies, including use of real-time data (eg, Waze), peer to peer bypassing (Skype), hyper-personalisation (Amazon), the leveraging of assets in the citizenry (Airbnb), leveraging of assets and workers (Uber), sharing of assets (Zipcar), outsourcing of data processing (Kaggle), and people-power financing (Kickstarter). Tech giants with operations that span these strategies, such as Apple, Google and Facebook, have made their first plays in energy. This in a world where, to take one example of massive relevance, the UK national electricity grid achieved a first in October by transmitting data down its own wires. Third, artificial intelligence. This new technology, which has so many potential benefits for society alongside inherent threats, is breaking out all around us. Machine learning of a kind only dreamed of for years is now reality, and applicable to multiple business sectors. Fourth, robotics, with which AI will go hand in hand. Toyota, for example, has launched a $400 robot with the intelligence of a five-year-old for use in homes. Fifth, capital. In the Financial Times, columnist Gillian Tett recently wrote that our future has become “unfathomable”, and investors generally are particularly ill equipped to cope with it. Banks come high on the inadequacy list. They have lost consumer trust on a grand scale since the financial crisis, and are now leaking customers to alternative service providers of many kinds. One bank chief executive says openly that his sector is becoming “not really investable”. The banks are trying to fight back by grasping the changes under way in the use of technology. 
Some are pitching to central banks a narrative that holds they will become more efficient if they are granted use of a utility settlement coin for clearing and settling blockchain trades. UK banks are readying to roll out robot tellers, aiming to improve customer service via learned empathy. So pity the poor confused and insecure chief executives of the energy industry in their casino, as they try to make sense of a world changing as fast as this. But not too much. Some of them will grab the chips, shuffle them around and place the right bets. These people will come to know what it feels like to ride an exponential company rocket. And, if we are lucky, to improve society as they do so. The Winning of the Carbon War, the author’s account of the dramas in energy and climate from 2013 to last year’s Paris summit, is available as a free download. It is also available for order as a printed book, with all proceeds going to SolarAid.

Guyon I.,ChaLearn | Athitsos V.,University of Texas at Arlington | Jangyodsuk P.,University of Texas at Arlington | Hamner B.,Kaggle | Escalante H.J.,National Institute of Astrophysics, Optics and Electronics
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops | Year: 2012

We organized a challenge on gesture recognition: http://gesture.chalearn.org. We made available a large database of 50,000 hand and arm gestures video-recorded with a Kinect™ camera providing both RGB and depth images. We used the Kaggle platform to automate submissions and entry evaluation. The focus of the challenge is on "one-shot-learning", which means training gesture classifiers from a single video clip example of each gesture. The data are split into subtasks, each using a small vocabulary of 8 to 12 gestures, related to a particular application domain: hand signals used by divers, finger codes to represent numerals, signals used by referees, marshalling signals to guide vehicles or aircraft, etc. We limited the problem to single users for each task and to the recognition of short sequences of gestures punctuated by returning the hands to a resting position. This situation is encountered in computer interface applications, including robotics, education, and gaming. The challenge setting fosters progress in transfer learning by providing for training a large number of subtasks related to, but different from, the tasks on which the competitors are tested. © 2012 IEEE.
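As a rough illustration of what one-shot learning means here, the sketch below stores a single template per gesture and labels a new sample by its nearest template. It is not the method used by the challenge organizers or by any entrant. The three-number feature vectors and the labels are hypothetical placeholders; a real system would first have to extract features from the RGB and depth video and segment gesture sequences.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_one_shot(examples):
    """Keep exactly one template per gesture label (the 'one shot')."""
    templates = {}
    for label, features in examples:
        if label in templates:
            raise ValueError(f"more than one example for {label!r}")
        templates[label] = list(features)
    return templates


def predict(templates, features):
    """Return the label whose single template is closest to the sample."""
    return min(templates, key=lambda label: distance(templates[label], features))


# Hypothetical vocabulary of three gestures, one training example each.
templates = fit_one_shot([
    ("stop", [0.0, 1.0, 0.0]),
    ("go", [1.0, 0.0, 0.0]),
    ("turn", [0.0, 0.0, 1.0]),
])

prediction = predict(templates, [0.9, 0.1, 0.2])
print(prediction)
```

With only one example per class there is nothing to average over, so the quality of the feature representation carries most of the weight. That is presumably why the abstract stresses transfer learning from related subtasks.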

Guyon I.,ChaLearn | Athitsos V.,University of Texas at Arlington | Jangyodsuk P.,University of Texas at Arlington | Escalante H.J.,National Institute of Astrophysics, Optics and Electronics | Hamner B.,Kaggle
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

The Kinect™ camera has revolutionized the field of computer vision by making available low-cost 3D cameras recording both RGB and depth data, using a structured light infrared sensor. We recorded and made available a large database of 50,000 hand and arm gestures. With these data, we organized a challenge emphasizing the problem of learning from very few examples. The data are split into subtasks, each using a small vocabulary of 8 to 12 gestures, related to a particular application domain: hand signals used by divers, finger codes to represent numerals, signals used by referees, marshalling signals to guide vehicles or aircraft, etc. We limited the problem to single users for each task and to the recognition of short sequences of gestures punctuated by returning the hands to a resting position. This situation is encountered in computer interface applications, including robotics, education, and gaming. The challenge setting fosters progress in transfer learning by providing for training a large number of subtasks related to, but different from, the tasks on which the competitors are tested. © 2013 Springer-Verlag.

News Article | November 10, 2016

The lesson of Trump’s victory is not that data is dead. The lesson is that data is flawed. It has always been flawed—and always will be. Before Donald Trump won the presidency on Tuesday night, everyone from Nate Silver to The New York Times to CNN predicted a Trump loss—and by sizable margins. “The tools that we would normally use to help us assess what happened failed,” Trump campaign reporter Maggie Haberman said in the Times. As Haberman explained, this happened on both sides of the political divide. Appearing on MSNBC, Republican strategist Mike Murphy told America that his crystal ball had shattered. “Tonight, data died,” he said. But this wasn’t so much a failure of the data as it was a failure of the people using the data. It’s a failure of the willingness to believe too blindly in data, not to see it for how flawed it really is. “This is a case study in limits of data science and statistics,” says Anthony Goldbloom, a data scientist who once worked for Australia’s Department of Treasury and now runs Kaggle, a company dedicated to grooming data scientists. “Statistics and data science gets more credit than it deserves when it’s correct—and more blame than it deserves when it’s incorrect.” With presidential elections, these limits are myriad. The biggest problem is that so little data exists. The United States only elects a president once every four years, and that’s enough time for the world to change significantly. In the process, data models can easily lose their way. In the months before the election, pollsters can ask people about their intentions, but this is harder than it ever was as Americans move away from old-fashioned landline phones towards cell phones, where laws limit such calls. “We sometimes fool ourselves into thinking we have a lot of data,” says Dan Zigmond, who helps oversee data science at Facebook and previously handled data science for YouTube and Google Maps. “But the truth is that there’s just not a lot to build on. 
There are very small sample sizes, and in some ways, each of these elections is unique.” In the wake of Trump’s victory, Investor’s Business Daily is making the media rounds boasting that it correctly predicted the election’s outcome. Part of the trick, says IBD spokesperson Terry Jones, is that the poll makes more calls to smartphones than landlines, and that the people it calls represent the wide range of people in the country. “We have a representative sample of even the types of phones used,” he says. But this poll was the exception that proved the rule: the polling on the 2016 presidential election was flawed. In the years to come, the electorate—and the technology used by the electorate—will continue to change, ensuring future polls will have to evolve to keep up. As the world makes the internet its primary means of communication, that transition brings with it the promise of even more data—so-called “Big Data,” in Silicon Valley marketing-speak. In the run-up to the election, a company called Networked Insights mined data on Twitter and other social networks in an effort to better predict which way the electoral winds would blow. It had some success—the company predicted a much tighter race than more traditional poll aggregators, and other companies and researchers are moving in similar directions. But this data is also flawed. With a poll, you’re asking direct questions of real people. On the Internet, a company like Networked Insights must not only find accurate ways of determining opinion and intent from a sea of online chatter, but build a good way of separating the fake chatter from the real, the bots from the humans. “As a data scientist, I always think more data is better. But we really don’t know how to interpret this data,” Zigmond says. 
“It’s hard to figure out how all these variables are related.” Meanwhile, at least among the giants of the Internet, the even bigger promise is that artificial intelligence will produce better predictions than ever before. But this too still depends on data that can never really provide a perfect picture on which to base a prediction. A deep neural network can’t forecast an election unless you give it the data to make the forecast, and the way things work now, this data must be carefully labeled by humans for the machines to understand what they’re ingesting. Yes, AI systems have gotten very good at recognizing faces and objects in photos because people have uploaded so many millions of photos to places like Google and Facebook already, photos whose contents have been labeled such that neural networks can learn to “see” what they depict. The same kind of clean, organized data on presidential elections doesn’t exist to train neural nets. People will always say they’ve cracked the problem. IBD is looking mighty good this week. Meanwhile, as Donald Trump edged towards victory Tuesday, his top data guru, Matt Oczkowski, told WIRED the campaign had known for weeks that a win was possible. “Our models predicted most of these states correctly,” he said. But let’s look at these two with as much skepticism as we’re now giving to Silver and the Times. Naturally, Oczkowski shot down the “data is dead” meme. “Data’s alive and kicking,” he said. “It’s just how you use it and how you buck normal political trends to understand your data.” In a way, he’s right. But this is also part of the problem. We don’t know what Oczkowski’s methods were. And in data science, people tend to pick data that supports their point of view. This is a problem whether you’re using basic statistical analysis or neural networks. “The way that bias creeps into any analysis is the way the data is selected,” Goldbloom says. 
In other words, the data used to predict the outcome of one of the most important events in recent history was flawed. And so are we.

When crowdsourced labor company CrowdFlower recently raised funding from Microsoft, co-founder Lukas Biewald told me his team was focused on technology that allows businesses to supplement algorithms and artificial intelligence with human judgment from crowdsourced labor pools. Now CrowdFlower is bringing on more experts to shape the development of that technology. Specifically, it has formed a three-person scientific advisory board, made up of Barney Pell (founder/co-founder of startups including Powerset, LocoMobi and Moon Express, who also led an artificial intelligence team at NASA), Anthony Goldbloom (founder and CEO of Kaggle) and Pete Warden (a staff research engineer at Google, where he’s the technical lead on the TensorFlow Mobile machine learning project). “With all these different customers and all these different applications, we wanted them to be confident that they’re going to get a high-quality algorithm,” said Biewald. (He was previously CrowdFlower’s CEO and now serves as its chief data scientist and executive chairman. He’s also a friend of mine from college, although we really only talk about CrowdFlower now, which is kinda sad when you think about it.) “One way to make sure all the product decisions we make really reflect the cutting edge was to get some of the world leaders come in and look at our product.” Pell, who will be co-chairing the advisory board with Biewald, has a long history with CrowdFlower: he’s already an investor in the company, and he noted that Biewald first came up with the idea while working at Powerset. He said CrowdFlower’s “human in the loop” approach, where humans can help provide training and quality control to AI, could become increasingly important. “When people think about AI, they’re generally thinking about 100 percent automated solutions,” Pell said. 
But the reality is, “If there’s people in the loop somewhere, then where you’re really confident, [the algorithm] can handle those cases, and then the rest of the marginal cases go to people.” Pell added that CrowdFlower (which launched at the TechCrunch50 conference) works with customers whose technology might seem fully automated at first, such as self-driving cars. Even in that case, they still need humans to help train the vision systems and help with the labeling. As for the board’s role, Pell said it will look both at individual products under development and at the broader CrowdFlower roadmap. I brought up another possible benefit: In an industry where “artificial intelligence” and “machine learning” have become buzzwords thrown around by every startup, this kind of board can add an important layer of credibility. “The real challenge here for any company that’s trying to do machine learning is that there’s so much research that it’s impossible for anybody to synthesize it all,” Biewald replied. “I think you’ll see more and more companies trying to adopt an approach like this.”

Goldbloom A.,Kaggle
Proceedings - IEEE International Conference on Data Mining, ICDM | Year: 2010

Data prediction competitions facilitate a step change in the evolution of analytics outsourcing. They offer companies and researchers a cost-effective way to harness the 'cognitive surplus' of data scientists who are hungry for real-world data and motivated to excel whatever the prize. Competitions are effective because there are any number of techniques that can be applied to any modeling problem but we can't know in advance which will be most effective. By exposing the problem to a wide audience, competitions are an effective way to reach the frontier of what is possible from a given dataset. © 2010 IEEE.
