News Article | May 10, 2017
One area where machine learning and neural networks are set to make a huge impact is in financial markets. This field is rich in the two key factors that make machine-learning techniques successful: the computing resources necessary to run powerful neural networks, and the existence of huge annotated data sets that neural networks can learn from. For those who pioneer this approach, there is likely to be much low-hanging fruit.

And yet the details of how this fruit is being harvested in real markets are hard to come by. If financial organizations are experimenting with neural networks (and they surely are), they’re playing their cards close to their chests. So insights into the challenges that analysts face in applying neural networks to trading data are eagerly anticipated.

Enter Swetava Ganguli and Jared Dunnmon at Stanford University in Palo Alto, California. They ask how good different machine-learning techniques can be at predicting the future price of bonds. For this they compare fairly standard “shallow learning” techniques with more exotic neural-network techniques. And their results highlight the clear advantages that some techniques have over others.

First, some background. Bonds are a form of debt, a kind of IOU, that can be traded in an open market. They are different from stocks in various important ways. Stocks are a claim on the future profits of a company, so their value is intimately linked to the company’s profitability for the foreseeable future. For this reason, stocks can rocket in price if a company’s profitability increases. They can also collapse if a company runs into trouble. Bonds are generally much less volatile. A bond is a loan that the issuer promises to repay on a specific date while paying interest along the way. Its value is determined by the amount of cash it is likely to pay during its lifetime.
This is generally just a few years, and this fixed time limit means bonds do not generally soar in price or collapse catastrophically. And yet their price does vary according to factors such as interest rates, potential changes in interest rates, company performance, the likelihood the debt will be repaid, the time until the bond must be repaid, and so on. Predicting this future price is an important task for bond traders, and it is not easy.

One significant problem is a lack of easily available pricing information. Stock traders can generally see offers, bids, and trades within 15 minutes of them being made. But bond traders are much less well served, say Ganguli and Dunnmon, because “the analogous information on bonds is only available for a fee and even then only in relatively small subsets compared to the overall volume of bond trades.” And that leads to a curious situation. “Many bond prices are days old and do not accurately represent recent market developments,” say the researchers.

So Ganguli and Dunnmon ask an obvious question: is it possible to do better by mining the information that is available? Their approach uses a data set of bond prices and other information that was posted to the online predictive modeling and dataset host Kaggle.com by Breakthrough Securities in 2014. This data set consists of the last 10 trades in each of 750,000 different bonds, along with a wide range of other parameters for each bond, such as whether it can be called in early, whether a trade was a customer buy or sell or a deal between traders, a fair price estimate based on the hazard associated with the bond, and so on.

An important task in this kind of analysis is to work out which of the parameters are useful in predicting future prices and which are not. So as a first pass, Ganguli and Dunnmon look for parameters that are highly correlated with each other and therefore reveal redundancy in the data set.
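That kind of redundancy screen is conceptually simple. Here is a minimal sketch in plain Python, not the authors’ code; the feature names and the 0.95 cutoff are purely illustrative. It computes pairwise Pearson correlations between parameter columns and flags any pair correlated strongly enough that one member is a candidate for removal:

```python
def pearson(a, b):
    """Pearson correlation between two equal-length feature columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va ** 0.5 * vb ** 0.5)

def redundant_pairs(features, threshold=0.95):
    """Flag pairs of columns whose |correlation| exceeds the threshold;
    one member of each flagged pair is a candidate for removal."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) > threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical bond parameters: "price_copy" is nearly a rescaled "price".
features = {
    "price":      [1.0, 2.0, 3.0, 4.0],
    "price_copy": [2.0, 4.0, 6.0, 8.1],
    "coupon":     [5.0, 1.0, 4.0, 2.0],
}
print(redundant_pairs(features))  # the near-duplicate pair is flagged
```

On a real data set the same loop runs over every pair of the bond parameters, and one column from each flagged pair is dropped before modeling.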
Having removed this redundancy, they then test a number of data-mining techniques to see how well, and how quickly, they can predict future prices. The techniques include principal component analysis, which removes redundant parameters and leaves those with true predictive power; generalized linear models, which are a shallow form of machine learning; and neural networks, which can find patterns in highly non-linear data sets and are thought of as a deeper form of machine learning.

Perhaps unsurprisingly, the best predictions come from the neural networks, which forecast future prices with an error of around 70 cents. To put that in context, bonds are usually priced around the $1,000 mark. Interestingly, increasing the complexity of the networks has relatively little effect on their accuracy.

But the accuracy of a prediction is just one part of its utility. Just as important in the real world is how quickly the prediction can be made. And here, neural networks fall short, taking several hours to work their magic. By contrast, the best shallow learning techniques make predictions with an error of around 80 cents, and they do it in just a few seconds. By combining some of these techniques into a hybrid system, Ganguli and Dunnmon say, they can make a prediction with an error of 85 cents in just four seconds. It’s not hard to imagine which prediction a bond trader would choose.

It’s interesting work that throws some light on the dilemma that financial institutions must be facing in applying machine-learning techniques to financial data. Sure, there is plenty of scope for improving predictions, but can this be done on a time scale that is relevant? That might explain why there are so few stories about the amazing gains that neural networks can make in the financial world. Perhaps those using these techniques are busy secretly hoovering up the low-hanging fruit before revealing their extraordinary successes to the world.
That’s certainly happened in the past. But there is another possibility. Perhaps they are struggling to find ways of doing this work efficiently on a time scale that makes a difference. If so, the low-hanging fruit is still there.

Ref: arxiv.org/abs/1705.01142: Machine Learning for Better Models for Predicting Bond Prices
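An aside on the toolkit above: the principal component analysis used to strip redundant parameters can be worked by hand in the two-feature case. This is an illustrative toy, not the researchers’ code: center the data, form the 2x2 covariance matrix, and solve its eigenvalues in closed form. The share of variance on the first component says how much of the data a single combined parameter captures.

```python
import math

def pca_2d(xs, ys):
    """Hand-rolled PCA for two features: eigenvalues of the 2x2
    covariance matrix via the quadratic formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cyy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[cxx, cxy], [cxy, cyy]]
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    disc = math.sqrt(tr * tr / 4 - det)
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    explained = l1 / (l1 + l2)  # variance share on the first component
    return l1, l2, explained

# Two perfectly correlated features: one component explains everything,
# so the second feature is pure redundancy.
l1, l2, share = pca_2d([1, 2, 3, 4], [2, 4, 6, 8])
print(share)  # 1.0
```

When the explained-variance share of the leading components is high, the remaining directions can be discarded with little loss, which is exactly the pruning role PCA plays in the study described above.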
News Article | May 9, 2017
A contest aimed at automating the detection of lung cancer shows how machine learning may be poised to overhaul medical imaging. The challenge offered $1 million in prizes for the algorithms that most accurately identified signs of lung cancer in low-dose computed tomography images. The winning algorithms won’t necessarily be adopted by clinicians, but they could inspire algorithmic innovations that find their way into medical imaging.

Low-dose CT scans have shown great potential in recent years for detecting lung cancer earlier. They use less radiation and do not require a contrast dye to be injected into the body. But diagnosis from these scans is very difficult, which means a high number of false positives and too many unnecessary medical procedures.

A machine-learning technique known as deep learning has proven especially effective for finding patterns in images in recent years (see “10 Breakthrough Technologies 2013: Deep Learning”). There is now growing hope that this and other machine-learning methods may help improve standards of diagnosis in medicine by automatically recognizing patterns that indicate disease—including ones that are too subtle for the human eye to catch. Deep learning has already been used to detect skin cancer in images with roughly the same number of errors as made by professional dermatologists. And the technique has proven effective for detecting a common cause of blindness in retinal images. There is now growing interest, among doctors and entrepreneurs, in deploying the technique more broadly. As this happens, however, more effort may be needed to make such algorithms explainable (see “The Dark Secret at the Heart of AI”).

Keyvan Farahani, a program director at the National Cancer Institute, which supplied the imaging data used in the contest, says reducing the number of false lung cancer diagnoses made from low-dose CT scans would make a real difference for patients. There are about 222,500 new cases of lung cancer in the U.S.
each year, according to the American Cancer Society. Farahani says existing software for identifying signs of lung cancer is unreliable. “Preliminary results suggest [the top algorithms] are better than what’s available already,” he says. Farahani does not foresee algorithms taking the place of medical experts, though. “Deep learning will help digest large amounts of data,” he says. “I don’t think they’re going to replace doctors or radiologists.”

One of the key challenges in this contest was the fact that only 2,000 images were made available to teams. Machine learning often requires very large data sets in order to develop an effective algorithm. But other data, like details of the equipment used, were included. The winning team employed a neural network and put extra effort into annotating images to provide more data points. It also used an additional data set, and broke the challenge into two parts: identifying nodules and then diagnosing cancer. It isn’t yet clear how the best algorithm might measure up to a doctor, because each algorithm provides a probability rather than a definitive outcome. “We think that explicitly dividing this problem into two stages is critical, which seems also to be what human experts would do,” says Zhe Li, a member of the winning team and a student at Tsinghua University, one of China’s foremost academic institutions.

Besides hinting at the potential for deep learning in medical imaging, the lung cancer contest highlights the growing reputation of Chinese AI researchers. The contest, held on the data science site Kaggle, was organized by Booz Allen Hamilton, a management consulting firm that has arranged several other major data science contests. The $1 million in prize money came from the Laura and John Arnold Foundation. Kaggle was founded in 2010 and acquired earlier this year by Google.
The site has proven to be a powerful way of crowdsourcing the development of machine-learning algorithms, and is also a popular way to identify talent. Josh Sullivan, who leads the data science team at Booz Allen Hamilton, says one motivation for the contest is talent acquisition, noting that 238 entrants have also applied for jobs at the company. He adds that the company is making the winning algorithms available for free to maximize the potential benefits to the medical community. Li, of the winning team, says developing something that might save people’s lives is gratifying, but the real reason for taking part was a bit less altruistic. “To be honest, the major motivation is to win the prize money,” he says.
News Article | May 24, 2017
To homeowners, sellers and buyers, the Zestimate home valuation remains an important data point. Combined with other information, like recent home sales, and the guidance of real estate professionals, the Zestimate helps consumers make smarter financial decisions about their homes. To data scientists, the Zestimate home valuation is known as the ultimate algorithm, one of the highest-profile, most accurate and sophisticated examples of machine learning. Zillow Prize will mark the first time that a portion of the proprietary data that powers the Zestimate home valuation will be available to individuals outside of Zillow.

Zillow's data science team continually works to improve the accuracy of the Zestimate home valuation, as measured by how close the Zestimate is to the eventual sale price of a home. The U.S. median absolute percent error currently stands at 5 percent, improved from 14 percent in 2006. "We still spend enormous resources on improving the Zestimate, and are proud that with advancements in machine learning and cloud computing, we've brought the error rate down to 5 percent nationwide," said Stan Humphries, creator of the Zestimate home valuation and Zillow Group chief analytics officer. "While that error rate is incredibly low, we know the next round of innovation will come from imaginative solutions involving everything from deep learning to hyperlocal data sets -- the type of work perfect for crowdsourcing within a competitive environment."

The contest is being administered by Kaggle, a platform designed to connect data scientists with complex machine learning problems. It will be staggered into two rounds: a public qualifying round, which opens today and concludes Jan. 17, 2018[i], and a private final round, which kicks off Feb. 1, 2018 and ends Jan. 15, 2019[ii]. Contest participants have until Oct. 16, 2017 to register for the qualifying round, download and explore the competition data set[iii], and develop a model to improve the Zestimate residual error.
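The headline accuracy figure, median absolute percent error, is straightforward to compute: for each home, take the absolute gap between the estimate and the eventual sale price as a fraction of the sale price, then take the median across all homes. A minimal sketch in plain Python (the numbers are invented for illustration, not Zillow's data):

```python
def median_abs_pct_error(estimates, sale_prices):
    """Median across homes of |estimate - sale price| / sale price."""
    errs = sorted(abs(e - s) / s for e, s in zip(estimates, sale_prices))
    n = len(errs)
    mid = n // 2
    # Median: middle element for odd n, mean of the two middle for even n.
    return errs[mid] if n % 2 else (errs[mid - 1] + errs[mid]) / 2

# Three hypothetical homes: estimates vs. what they actually sold for.
estimates = [105_000, 95_000, 200_000]
sales     = [100_000, 100_000, 250_000]
print(median_abs_pct_error(estimates, sales))  # 0.05, i.e. 5 percent
```

Because it is a median rather than a mean, the metric is insensitive to the occasional wildly mispriced home, which is one reason it suits a market with outliers.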
The top 100 teams from the qualifying round, those whose solutions most reduce the difference between the Zestimate home valuation and the actual sale price of the homes within the dataset, will be invited to participate in the final round and compete for the $1 million prize. In the final round, the winning team must build an algorithm to predict the actual sale price itself, using innovative data sources to engineer new features that will give the model an edge over other competitors. The home value predictions from each algorithm submission will be evaluated against real-time home sales in August through October 2018. To take home the $1 million grand prize, the winning algorithm must beat Zillow's benchmark accuracy on the final round competition data set[iv] and enhance the accuracy further than any other competitor. A $100,000 second place prize and $50,000 third place prize will also be awarded in the final round. A total of $50,000 will also be awarded to the top three ranking teams in the qualifying round.

Zillow publishes Zestimates on more than 110 million homes across the country based on 7.5 million statistical and machine learning models that examine hundreds of data points on each individual home. To calculate the Zestimate home valuation, Zillow uses data from county and tax assessor records, and direct feeds from hundreds of multiple listing services and brokerages. Additionally, homeowners have the ability to update facts about their homes and see an immediate change to their Zestimate. More than 70 million homes on Zillow have been updated by the community of users. More details on registering and competing for the Zillow Prize are available at www.zillow.com/promo/zillow-prize/.

Zillow® is the leading real estate and rental marketplace dedicated to empowering consumers with data, inspiration and knowledge around the place they call home, and connecting them with the best local professionals who can help.
Zillow serves the full lifecycle of owning and living in a home: buying, selling, renting, financing, remodeling and more. In addition to Zillow.com®, Zillow operates the most popular suite of mobile real estate apps, with more than two dozen apps across all major platforms. Launched in 2006, Zillow is owned and operated by Zillow Group (NASDAQ: Z and ZG) and headquartered in Seattle. Zillow, Zestimate and Zillow.com are registered trademarks of Zillow, Inc.

[i] Detailed qualifying round timeline is as follows: All submissions are due on October 16, 2017. There will be a three-month evaluation period, which begins on October 17, 2017, when submissions will be evaluated against the actual sale prices of the homes. The final leaderboard will be revealed on January 17, 2018.

[ii] Detailed final round timeline is as follows: All contest submissions are due on June 29, 2018. Because real estate transaction data is public information, there will be a one-month gap followed by a three-month evaluation period, which begins on August 1, 2018, when submissions will be evaluated against the actual sale prices of the homes. The final leaderboard and final prize winners will be revealed on or about January 15, 2019.

[iii] The qualifying round data set will encompass a list of real estate properties in Los Angeles, Orange and Ventura counties in California.

[iv] Final round participants are challenged to beat the Zillow benchmark model, a modified version of the Zestimate algorithm that will be trained using the exact same data set available to everyone in the final round. This benchmark model has been created for the purposes of this competition and is different from the standard Zestimate displayed on the website.

To view the original version on PR Newswire, visit: http://www.prnewswire.com/news-releases/zillow-launches-1-million-zestimate-competition-for-data-scientists-300462943.html
News Article | May 2, 2017
Tens of thousands of dating profile pictures were taken from Tinder by a programmer who then made them publicly available on the web. The dataset contained 40,000 images - half of which were of men, half of women - but it is now offline. Stuart Colianni wrote a program to compile the cache of photos, intending to use them for machine learning research. Tinder accused Mr Colianni of violating its terms of service.

Tech news site TechCrunch reported that the dataset originally contained many thousands of pictures from Tinder users in the Bay Area, around San Francisco in California. Some users had "multiple" photos scraped from their profiles, TechCrunch added. "Tinder gives you access to thousands of people within miles of you," wrote Mr Colianni on a web page that previously linked to the data. He explained that he was looking for a way of gathering more detailed data on human faces, adding, "Why not leverage Tinder to build a better, larger facial dataset?" He had added folders containing the photos to Kaggle, a Google-run service that allows programmers to experiment with artificial intelligence (AI) programs. AI algorithms can be trained on large sets of photographs in order to perform facial recognition tasks, but it is not clear what purpose Mr Colianni had in mind for the data.

However, over the weekend he posted an update saying that he had removed the pictures. "I have spoken with representatives at Kaggle, and they have received a request from Tinder to remove the dataset," he explained. Tinder said it continued to implement measures "against the automated use" of its API (application programming interface), including steps "to deter and prevent scraping". "This person has violated our terms of service (Sec. 11) and we are taking appropriate action and investigating further," the statement added. The firm also noted that all profile images are available to anyone using the app.
Programs that scrape data from the web - to compare prices on e-commerce websites, for example - are very common, noted Glenn Wilkinson, an independent security researcher. "People would have an assumption that their profile is quite private," he explained, but added that getting access to such data is not usually very difficult, even if it is prohibited - as in Tinder's case - by the terms and conditions of the service. There were potential privacy threats that could result from this, said Mr Wilkinson, pointing out that it might be possible to use profile pictures to connect people's identities on separate social media sites. "People do like to keep their dating and work life separate - but if you use the same photo on Tinder and LinkedIn, those things could get linked together," he told the BBC.
News Article | April 20, 2017
Planet, the satellite imaging company that operates the largest commercial Earth imaging constellation in existence, is hosting a new data science competition on the Kaggle platform, with the specific aim of developing machine learning techniques around forestry research. Planet will open up access to thousands of image ‘chips,’ or blocks covering around 1 square kilometre, and will give away a total of $60,000 to participants who place in the top three when coming up with new methods for analyzing the data available in these images.

Planet notes that each minute, we lose a portion of forest the size of approximately 48 football fields, which is a heck of a lot of forest. The hope is that by releasing this data and hosting this competition, Planet can encourage academics and researchers worldwide to apply advances in machine learning that have been put to great use in efforts like facial recognition and detection, to this pressing ecological problem.

“We’re putting together this competition as a way to get people excited about the kinds of data that Planet provides,” explained Planet machine learning engineer Kat Scott in an interview. “Particularly when you’re analyzing imaging and that sort of thing, everyone works off the same sort of jpgs, but our satellites have these sort of superpowers. We get multiple bands at very high resolution, and deep bit depth, so we put together this interesting data set of all the interesting things that are going on right now that we’d like to monitor. So things like deforestation, new agriculture, what we call artisanal mining which is basically illegal mining, and all these other effects.”

The goal is to see if competitors can come up with new ways to monitor these situations with machine learning tools created to make sense of the data.
It’s a bit like finding a needle in a haystack, according to Scott, which is why the need exists for this machine learning-driven approach, taken on by multiple teams tackling the data from multiple angles. “These mining areas might only be a couple of kilometers across, but we’re providing you with 37 million acres of imagery, so how do you sort that really quickly, and find those changes that occur over time,” Scott explained.

Competitors will submit their results using Kaggle, and Planet retains ownership rights to the IP, but the plan is to release all resulting data under a Creative Commons share-alike license to help continue to drive large improvements in machine learning in this area over time. The first-place winner will receive $30,000 for their efforts, and the second and third-place teams will get $20,000 and $10,000 respectively. More information regarding the competition can be found at the Kaggle website.
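Finding “those changes that occur over time” can be illustrated with a toy in plain Python. This is not Planet’s or any competitor’s method, just the simplest possible change detector: given two co-registered single-band image grids from different dates, flag the cells whose pixel value moved by more than a threshold, a crude stand-in for spotting new clearings or mines between acquisitions.

```python
def changed_cells(before, after, threshold=30):
    """Compare two same-shape image grids (lists of rows of pixel
    values) and return (row, col) coordinates where the value
    changed by more than `threshold` between the two dates."""
    flagged = []
    for r, (row_b, row_a) in enumerate(zip(before, after)):
        for c, (pb, pa) in enumerate(zip(row_b, row_a)):
            if abs(pa - pb) > threshold:
                flagged.append((r, c))
    return flagged

# Toy 2x2 scene: one cell brightened sharply between passes.
before = [[100, 100],
          [100, 100]]
after  = [[100, 160],
          [100, 100]]
print(changed_cells(before, after))  # [(0, 1)]
```

Real entries have to do far more, handling clouds, lighting, multiple spectral bands, and classifying *what* changed, which is exactly why the contest exists.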
News Article | November 20, 2016
Facial recognition software is most commonly known as a tool to help police identify a suspected criminal by using machine learning algorithms to analyze his or her face against a database of thousands or millions of other faces. The larger the database, with a greater variety of facial features, the smarter and more successful the software becomes – effectively learning from its mistakes to improve its accuracy.

Now, this type of artificial intelligence is starting to be used in fighting a specific but pervasive type of crime – illegal fishing. Rather than picking out faces, the software tracks the movement of fishing boats to root out illegal behavior. And soon, using a twist on facial recognition, it may be able to recognize when a boat’s haul includes endangered and protected fish.

The latest effort to use artificial intelligence to fight illegal fishing comes from Virginia-based The Nature Conservancy (TNC), which launched a contest on Kaggle – a crowdsourcing site based in San Francisco that uses competitions to advance data science – earlier this week. TNC hopes the winning team will write software to identify specific species of fish. The program will run on cameras, called electronic monitors, which are installed on fishing boats and used for documenting the catch. The software will put a marker at each point in the video when a protected fish is hauled in. Inspectors, who currently spend up to six hours manually reviewing a single 10-hour fishing day, will then be able to go directly to those moments and check a fishing crew’s subsequent actions to determine whether they handled the bycatch legally – by making best efforts to return it to the sea unharmed. TNC expects this approach could cut review time by up to 40% and increase the amount of monitoring on a boat.
Despite rules that call for government-approved auditors to be stationed on 5% of commercial fishing boats in the Western and Central Pacific, in practice the auditors are found on only around 2% of the fishing boats, including tuna long liners. As a result, fishermen sometimes keep protected fish that they hook – including sharks that are killed for their lucrative fins.

In the Pacific’s $7bn tuna fishery, illegal, unreported and unregulated (IUU) fishing not only harms fragile fish stocks, it takes an economic toll of up to $1.5bn. The impact shows up in many ways, including lost income for fishermen in the legal marketplace and harm to the tourist economy that sells snorkelers and divers the opportunity to witness protected species in the wild. Worldwide, cost estimates related to IUU fishing reach $23bn annually, and the take represents up to 20% of all seafood.

Using technology to track and prevent illegal fishing presents an opportunity for technology companies as the fishing industry seeks ways to comply with the growing demand for transparency from governments and consumers. “If using facial recognition software to track fish were easy, we’d already be using it,” says Matthew Merrifield, TNC’s chief technology officer. Whereas images from security cameras installed inside banks or other buildings are consistent and predictable, “the data from (electronic monitoring) cameras on boats is dirty, because the ships are always moving and the light keeps changing”. Because of this “dirty” data, it will not be easy to write facial recognition software that can accurately spot protected species when the variable conditions on the high seas could lead to blurry images on the video. Given those challenges, it’s too early to know how large this market will grow, or how quickly.

While the use of artificial intelligence to reduce illegal catch is relatively new, the Kaggle contest isn’t the first time it is being applied to the fishing industry.
San Francisco-based startup Pelagic Data Systems (PDS) has developed technology that illuminates the activity of some of the 4.6m small-scale commercial fishing boats that ply coastal waters around the world. Using data from a UN Food and Agriculture Organization report, PDS estimates that roughly 95% of those boats don’t have the types of communications and tracking radios that larger boats are required to have, partly because the boats are too small or lack the power source to run the radios.

PDS installs a solar-powered radio with an integrated GPS receiver and cellular modem on boats. The company collects the location data and analyzes it to create a map to show where the boat traveled and deduce its activities, such as where it stopped to set out nets or other gear and where and for how long it hauled in a catch. This data is vital because it shows whether the boat fished inside or outside marine protected areas. The device doesn’t have an on/off switch, a design choice intended to prevent a fishing crew from tampering with data collection. The software also generates heat maps to indicate where the heaviest fishing activities are taking place within a coastal region. By pairing that data with the movements of the boats, PDS can also estimate the quantity and even the size of the fish pulled from those waters, says Dave Solomon, CEO of PDS.

The company sells its technology to governments, nonprofits, academic researchers and companies in the fishing industry, and expects the number of boats installed with its device to reach 1,000 in regions such as West Africa, North America and Mexico by the end of the year, Solomon says. Some of his customers install the devices in the boats of their suppliers for another reason: to win over customers by demonstrating transparency in fishing practices.
Another effort to use data to fight illegal fishing comes from the nonprofit SkyTruth, which tracks the movement of large ships by mining data broadcast by ships and collected by satellites. Its technology is used by Global Fishing Watch, which is backed by Google, Oceana and the Leonardo DiCaprio Foundation. SkyTruth’s data helped the island nation Kiribati to bust illegal fishing operations.

But Kaggle has a habit of taking on unusual technical challenges. Earlier this year, it launched a contest with State Farm to develop machine learning software, to be embedded in dashboard cameras, to classify a driver’s behavior, such as being distracted by a smartphone when behind the wheel. Kaggle, with a membership of 650,000 data scientists, hasn’t tackled an environmental problem before. But its CEO, Anthony Goldbloom, thinks the TNC contest could represent the start of environmental competitions on its site because scientists from government agencies and academic institutions are collecting a growing amount of field data using cameras and sensors. The TNC contest attracted 44 teams within the first day. Each team has five months to submit its software.

While the contest presents an appealing opportunity to do something good for the environment, it doesn’t promise a big payoff. That will make it difficult for software developers and data scientists to raise venture capital to fund their efforts. “Silicon Valley only invests in places with big money [potential],” says Andrew Bosworth, vice president of ads and business platform for Facebook and a board member of land conservation group Peninsula Open Space Trust. “Plus, everyone underestimates [environmental] challenges. Going to the moon is easier than tracking fishing. It really is. So these are big challenges without financial incentives to solve them.” But, he adds, Silicon Valley does provide important undergirding for using technology to solve environmental problems.
Bosworth argues that the advancement of core technologies behind things like multiplayer gaming software and smartphone apps has propelled the rise of machine learning and artificial intelligence and lowered development costs over time.

The winning team of the contest will earn a prize of $150,000. Then, as part of its campaign to reduce bycatch and illegal fishing in the region, TNC will work with the governments of Palau, the Federated States of Micronesia, the Solomon Islands and the Marshall Islands to install the software, for free, on the electronic monitors of selected fishing boats. If the software proves effective in reducing labor costs and improving the accuracy of identifying protected species, then it could become a standard feature in electronic monitors. TNC will own the intellectual property of the winning software and make it free to the equipment makers, which include Satlink and Archipelago. The software could become even more widely used if large retailers such as Walmart begin to require electronic monitors on their vendors’ fleets.

But it is still early days for policing the fishing industry. For Melissa Garren, chief scientific officer of PDS, that means the market potential is huge. “We should be treating the oceans more like we treat airspace,” she says. “If we had this lack of visibility in the skies, it would be nuts.”
News Article | November 29, 2016
Google’s artificial intelligence can play the ancient game of Go better than any human. It can identify faces, recognize spoken words, and pull answers to your questions from the web. But the promise is that this same kind of technology will soon handle far more serious work than playing games and feeding smartphone apps. One day, it could help care for the human body.

Demonstrating this promise, Google researchers have worked with doctors to develop an AI that can automatically identify diabetic retinopathy, a leading cause of blindness among adults. Using deep learning—the same breed of AI that identifies faces, animals, and objects in pictures uploaded to Google’s online services—the system detects the condition by examining retinal photos. In a recent study, it succeeded at about the same rate as human ophthalmologists, according to a paper published today in the Journal of the American Medical Association. “We were able to take something core to Google—classifying cats and dogs and faces—and apply it to another sort of problem,” says Lily Peng, the physician and biomedical engineer who oversees the project at Google.

But the idea behind this AI isn’t to replace doctors. Blindness is often preventable if diabetic retinopathy is caught early. The hope is that the technology can screen far more people for the condition than doctors could on their own, particularly in countries where healthcare is limited, says Peng. The project began, she says, when a Google researcher realized that doctors in his native India were struggling to screen all the locals that needed to be screened. In many places, doctors are already using photos to diagnose the condition without seeing patients in person. “This is a well validated technology that can bring screening services to remote locations where diabetic retinal eye screening is less available,” says David McColloch, a clinical professor of medicine at the University of Washington who specializes in diabetes.
That could provide a convenient on-ramp for an AI that automates the process. Peng’s project is part of a much wider effort to detect disease and illness using deep neural networks, pattern recognition systems that can learn discrete tasks by analyzing vast amounts of data. Researchers at DeepMind, a Google AI lab in London, have teamed with Britain’s National Health Service to build various technologies that can automatically detect when patients are at risk of disease and illness, and several other companies, including Salesforce.com and a startup called Enlitic, are exploring similar systems. At Kaggle, an internet site where data scientists compete to solve real-world problems using algorithms, groups have worked to build their own machine learning systems that can automatically identify diabetic retinopathy. Peng is part of Google Brain, a team inside the company that provides AI software and services for everything from search to security to Android. Within this team, she now leads a group spanning dozens of researchers that focuses solely on medical applications for AI. The work on diabetic retinopathy started as a “20 percent project” about two years ago, before becoming a full-time effort. Researchers began working with the Indian hospitals Aravind and Sankara, which were already collecting retinal photos for doctors to examine. Then the Google team asked more than four dozen doctors in India and the US to identify photos where microaneurysms, hemorrhages, and other issues indicated that diabetic patients could be at risk for blindness. At least three doctors reviewed each photo before Peng and her team fed about 128,000 of these images into their neural network. Ultimately, the system identified the condition slightly more consistently than the original group of doctors. 
At its most sensitive, the system avoided both false negatives and false positives more than 90 percent of the time, exceeding the National Institutes of Health’s recommended standard of at least 80 percent accuracy and precision for diabetic retinopathy screens. Given the success of deep learning algorithms with other machine vision tasks, the results of the original trial aren’t surprising. But Yaser Sheikh, a professor of computer science at Carnegie Mellon who is working on other forms of AI for healthcare, says that actually moving this kind of thing into the developing world can be difficult. “It is the kind of thing that sounds good, but actually making it work has proven to be far more difficult,” he says. “Getting technology to actually help in the developing world—there are many, many systematic barriers.” But Peng and her team are pushing forward. She says Google is now running additional trials with photos taken specifically to train its diagnostic AI. Preliminary results, she says, indicate that the system once again performs as well as trained doctors. The machines, it seems, are gaining new kinds of sight. And some day, they might save yours.
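The 90 percent figures above correspond to two standard screening metrics: sensitivity (the rate at which true cases are caught, i.e. false negatives avoided) and specificity (the rate at which healthy cases are correctly cleared, i.e. false positives avoided). A minimal sketch of how these are computed, using invented counts rather than the study's actual data:

```python
# Toy illustration of the screening metrics discussed above.
# The counts below are invented for illustration; they are NOT the study's data.
def screening_metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)  # fraction of diseased eyes correctly flagged
    specificity = tn / (tn + fp)  # fraction of healthy eyes correctly cleared
    return sensitivity, specificity

# Hypothetical screen of 1,000 retinal photos:
sens, spec = screening_metrics(tp=180, fn=20, tn=730, fp=70)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A screening tool that clears both bars above 0.90 would exceed the 80 percent threshold the article mentions on both axes.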
News Article | February 23, 2017
Falkonry, Inc. (http://falkonry.com/), a leading provider of pattern-based artificial intelligence software to improve overall equipment effectiveness (OEE), announced today that it has received funding from Zetta Venture Partners, Polaris Partners, and Start Smart Labs. In addition to this transaction, Mark Gorenberg, the founding partner of Zetta Venture Partners, will be joining Falkonry’s Board of Directors. “Patterns are prevalent in operational data used by industrial organizations. The automated discovery of patterns in industrial data streams is essential to realize the benefits of smart factories, connected industry, and Industrial IoT (IIoT),” said Mark Gorenberg, new Falkonry Board Member. “Falkonry is unique in its ability to productize solutions to costly, complex industrial business problems. We foresee that domain-specific solutions using Falkonry software will be broadly deployed to fundamentally transform industries.” Over 200,000 industrial facilities around the world operate manufacturing, energy, and transportation systems. These facilities and their organizations have always endeavored to improve yield, quality, efficiency, and uptime through industrial engineering and process improvement methods. Falkonry is a revolutionary step forward on that journey, as it enables the same practitioners to create more intelligent control systems using existing data on their own. “Falkonry is excited to partner with Zetta and Polaris as we roll out our AI-based pattern recognition technology to the Global 2000 industrial customer base,” said Nikunj Mehta, Founder & CEO of Falkonry. “Zetta has the reputation of being a top intelligent enterprise venture partner and we are excited to leverage their grasp of early business models and growth strategy. We are also delighted to have Mark join our board.” “Falkonry has a scalable business model that is essential to growth in the industrial software marketplace,” said Gary Swart of Polaris Partners. 
“They have developed an ideal solution for a critical need that every industrial company is being challenged to address today in order to remain competitive and succeed in the demanding global marketplace. Falkonry customers are able to find actionable insights from underutilized data to improve productivity, increase yields, and raise efficiency.” Falkonry has global customers in APAC and North America. Falkonry software is offered via a term licensing model and can be deployed by customers in the cloud or on-premises. Users, such as industrial engineers, can rapidly apply Falkonry software to their own operations after minimal training. In other words, Falkonry customers are up and running and seeing measurable results within weeks, not months.

About Zetta Venture Partners
Founded in 2013, Zetta Venture Partners is the first fund focused on intelligent enterprise software and has $160 million under management. Current portfolio companies include Appdiff, Clearbit, Domo, Domino Data Lab, EventBoard, Focal Systems, InsideSales, Kaggle, Lilt, Lucid Design Group and Tractable. Visit http://www.zettavp.com for more information.

About Polaris Partners
Polaris Partners invests in exceptional technology and healthcare companies across all stages of their life cycles. With offices in Boston, San Francisco, and Dublin, Polaris partners globally with an unparalleled network of entrepreneurs, top scientists and emerging innovators who are making significant contributions in their fields and improving the way in which we live and work. For more information, visit http://www.polarispartners.com.

About Falkonry, Inc.
Falkonry, a Silicon Valley company on the cutting edge of industrial transformation, helps the Global 2000 improve operations efficiency through pattern recognition AI. Falkonry democratizes machine learning to discover, recognize, and predict time series patterns for downtime, quality, yield, and efficiency. 
Falkonry software enables industrial engineers to create real-time process control insights from industrial and IoT time series data. The company’s patent-pending core AI technology continuously improves as it analyzes more input data and expert labels. The Falkonry business model comprises engaged, global, diversified distribution and leading technology partners, including OSIsoft, SAP, Vegam, MDS Technology, Microsoft, PTC, PubNub, and Oracle. Falkonry is headquartered in Santa Clara, California with offices in Seoul, Korea and Mumbai, India. For more information about Falkonry and its products, partners and services, visit http://falkonry.com/ or call +1 (408) 461-9286.
News Article | October 28, 2016
In this context, BBC veteran Nik Gowing and change guru Chris Langdon carried out confidential in-depth interviews with 60 business leaders and their equivalents in public service, asking them what they honestly feel about their situation: in particular, their personal ability to spot what is coming and to put proactive plans in place. In a report entitled Thinking the Unthinkable: A New Imperative for Leadership in the Digital Age, they share the results. They found that many leaders, once guaranteed anonymity, admit to a dire state of doubt and inadequacy, and many say their insecurity burgeoned in 2014. Gowing and Langdon describe their findings as “deeply troubling”. They talk of 2014 as “the great wake-up” year, because of the multiple geopolitical and strategic disruptions it threw at the world. They found that the insecurity of leadership, and the unwillingness of leaders to square up to “unpalatable” issues, is particularly marked in the digital domain. “In what is fast becoming a new disruptive age of digital public empowerment, big data and metadata,” they write, “leadership finds it hard to recognise these failings, let alone find answers and solutions.” In the light of these conclusions, it is interesting to consider the full extent of the challenges energy-industry leaders face today. Let me consider five themes, all greatly relevant to the energy markets of the future: transition, data, artificial intelligence, robotics, and capital. First, the challenges of transition from fossil fuel dependency to zero carbon. Business models are dying in the incumbency, yet the best operable replacements are far from obvious. Prizes are huge and penalties dire. On the one hand, Tesla can raise $400m of free money from customers tabling deposits for a product (the Model 3) that can’t even be delivered to them for a year. On the other hand, SunEdison can plunge from multi-billion-dollar status to bankruptcy within a year. 
Second, the world of big data has largely yet to manifest in energy markets. It will. Ever more advanced algorithms have allowed tech companies to grow in recent years from nothing to multiple-billion-dollar valuations. They have done so by employing a broad array of strategies, including use of real-time data (e.g., Waze), peer-to-peer bypassing (Skype), hyper-personalisation (Amazon), the leveraging of assets in the citizenry (Airbnb), leveraging of assets and workers (Uber), sharing of assets (Zipcar), outsourcing of data processing (Kaggle), and people-power financing (Kickstarter). Tech giants with operations that span these strategies, such as Apple, Google and Facebook, have made their first plays in energy. This in a world where, to take one example of massive relevance, the UK national electricity grid achieved a first in October by transmitting data down its own wires. Third, artificial intelligence. This new technology, which has so many potential benefits for society alongside inherent threats, is breaking out all around us. Machine learning of a kind only dreamed of for years is now a reality, and applicable to multiple business sectors. Fourth, robotics, with which AI will go hand in hand. Toyota, for example, has launched a $400 robot with the intelligence of a five-year-old for use in homes. Fifth, capital. In the Financial Times, columnist Gillian Tett recently wrote that our future has become “unfathomable”, and investors generally are particularly ill-equipped to cope with it. Banks come high on the inadequacy list. They have lost consumer trust on a grand scale since the financial crisis, and are now leaking customers to alternative service providers of many kinds. One bank chief executive says openly that his sector is becoming “not really investable”. The banks are trying to fight back by grasping the changes under way in the use of technology. 
Some are pitching to central banks a narrative that holds they will become more efficient if they are granted use of a utility settlement coin for clearing and settling blockchain trades. UK banks are readying to roll out robot tellers, aiming to improve customer service via learned empathy. So pity the poor confused and insecure chief executives of the energy industry in their casino, as they try to make sense of a world changing as fast as this. But not too much. Some of them will grab the chips, shuffle them around and place the right bets. These people will come to know what it feels like to ride an exponential company rocket. And, if we are lucky, to improve society as they do so. Visit www.jeremyleggett.net for a free download of The Winning of the Carbon War, Jeremy Leggett’s account of the dramas in energy and climate from 2013 to last year’s Paris summit. Also available to order as a printed book, with all proceeds going to SolarAid.
News Article | November 10, 2016
The lesson of Trump’s victory is not that data is dead. The lesson is that data is flawed. It has always been flawed—and always will be. Before Donald Trump won the presidency on Tuesday night, everyone from Nate Silver to The New York Times to CNN predicted a Trump loss—and by sizable margins. “The tools that we would normally use to help us assess what happened failed,” Trump campaign reporter Maggie Haberman said in the Times. As Haberman explained, this happened on both sides of the political divide. Appearing on MSNBC, Republican strategist Mike Murphy told America that his crystal ball had shattered. “Tonight, data died,” he said. But this wasn’t so much a failure of the data as it was a failure of the people using the data—a failure of those too willing to believe blindly in data, unable to see how flawed it really is. “This is a case study in limits of data science and statistics,” says Anthony Goldbloom, a data scientist who once worked for Australia’s Department of Treasury and now runs Kaggle, a company dedicated to grooming data scientists. “Statistics and data science gets more credit than it deserves when it’s correct—and more blame than it deserves when it’s incorrect.” With presidential elections, these limits are myriad. The biggest problem is that so little data exists. The United States only elects a president once every four years, and that’s enough time for the world to change significantly. In the process, data models can easily lose their way. In the months before the election, pollsters can ask people about their intentions, but this is harder than ever as Americans move away from old-fashioned landline phones toward cell phones, where laws limit such calls. “We sometimes fool ourselves into thinking we have a lot of data,” says Dan Zigmond, who helps oversee data science at Facebook and previously handled data science for YouTube and Google Maps. “But the truth is that there’s just not a lot to build on. 
There are very small sample sizes, and in some ways, each of these elections is unique.” In the wake of Trump’s victory, Investor’s Business Daily is making the media rounds boasting that it correctly predicted the election’s outcome. Part of the trick, says IBD spokesperson Terry Jones, is that the poll makes more calls to smartphones than landlines, and that the people it calls represent the wide range of people in the country. “We have a representative sample of even the types of phones used,” he says. But this poll was the exception that proved the rule: the polling on the 2016 presidential election was flawed. In the years to come, the electorate—and the technology used by the electorate—will continue to change, ensuring future polls will have to evolve to keep up. As the world makes the internet its primary means of communication, that transition brings with it the promise of even more data—so-called “Big Data,” in Silicon Valley marketing-speak. In the run-up to the election, a company called Networked Insights mined data on Twitter and other social networks in an effort to better predict which way the electoral winds would blow. It had some success—the company predicted a much tighter race than more traditional poll aggregators—and other companies and researchers are moving in similar directions. But this data is also flawed. With a poll, you’re asking direct questions of real people. On the Internet, a company like Networked Insights must not only find accurate ways of determining opinion and intent from a sea of online chatter, but also build a good way of separating the fake chatter from the real, the bots from the humans. “As a data scientist, I always think more data is better. But we really don’t know how to interpret this data,” Zigmond says. 
“It’s hard to figure out how all these variables are related.” Meanwhile, at least among the giants of the Internet, the even bigger promise is that artificial intelligence will produce better predictions than ever before. But this too still depends on data that can never really provide a perfect picture on which to base a prediction. A deep neural network can’t forecast an election unless you give it the data to make the forecast, and the way things work now, this data must be carefully labeled by humans for the machines to understand what they’re ingesting. Yes, AI systems have gotten very good at recognizing faces and objects in photos, because people have uploaded so many millions of photos to places like Google and Facebook already, photos whose contents have been labeled such that neural networks can learn to “see” what they depict. The same kind of clean, organized data on presidential elections doesn’t exist to train neural nets. People will always say they’ve cracked the problem. IBD is looking mighty good this week. Meanwhile, as Donald Trump edged towards victory Tuesday, his top data guru, Matt Oczkowski, told WIRED the campaign had known for weeks that a win was possible. “Our models predicted most of these states correctly,” he said. But let’s look at these two with as much skepticism as we’re now giving to Silver and the Times. Naturally, Oczkowski shot down the “data is dead” meme. “Data’s alive and kicking,” he said. “It’s just how you use it and how you buck normal political trends to understand your data.” In a way, he’s right. But this is also part of the problem. We don’t know what Oczkowski’s methods were. And in data science, people tend to pick data that supports their point of view. This is a problem whether you’re using basic statistical analysis or neural networks. “The way that bias creeps into any analysis is the way the data is selected,” Goldbloom says. 
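Goldbloom’s point—that bias enters an analysis through how the data is selected—can be illustrated with a toy poll simulation (all numbers invented for illustration): two polls of the same electorate give different answers, and the only thing that differs is the selection mechanism, not the arithmetic.

```python
# Toy simulation of selection bias in polling. The electorate and
# sample sizes below are invented for illustration only.
import random

random.seed(0)

# A hypothetical electorate: 52% support candidate A, 48% candidate B.
electorate = ["A"] * 52_000 + ["B"] * 48_000

# An unbiased poll samples voters uniformly at random.
unbiased = random.sample(electorate, 1_000)

# A biased poll over-samples one group (say, landline-only respondents
# who happen to lean toward A): same electorate, skewed selection.
a_voters = [v for v in electorate if v == "A"]
b_voters = [v for v in electorate if v == "B"]
biased = random.sample(a_voters, 600) + random.sample(b_voters, 400)

def share(sample):
    return sample.count("A") / len(sample)

print(f"true share of A:    {share(electorate):.2f}")
print(f"unbiased poll says: {share(unbiased):.2f}")
print(f"biased poll says:   {share(biased):.2f}")
```

The unbiased poll lands near the true 52 percent; the biased poll reports 60 percent no matter how many respondents it adds, which is why more data alone does not fix a skewed selection mechanism.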
In other words, the data used to predict the outcome of one of the most important events in recent history was flawed. And so are we.