by Alexander Gelfand
The Florida Engineer
WHEN Ron Bowes set out to harvest the name of every person with a searchable profile on Facebook, he figured he'd wind up with more than a few.
After all, upwards of 500 million people around the world use the social networking service, which lives in that mass of distributed Web servers and hard drives known as "the cloud."
Still, he figured that Facebook would at least put up a fight.
"I wasn't surprised I got a big set of names," says Bowes, a "white-hat" hacker and independent security researcher who wanted the names for password research. (By studying the relationships between people's names and the passwords they choose, Bowes aims to learn how passwords can be made stronger and less easily hacked.) "But I was surprised they not only let me, but they made it easy."
Fearing that Facebook would ban him once it got wise to his scheme, Bowes used a bit of tailor-made code called a "spider" to request just one user's information from the site every 10 seconds. But no one seemed to notice. Or care. So he started making requests once every 5 seconds, then once every 3 seconds, then once every second. Towards the end, Bowes was pulling data as fast as he could from three different servers, all at once.
And by the time he was done, Ron Bowes and his itsy-bitsy spider had netted 171 million names, together with their accompanying Facebook IDs and the URLs for their profile pages.
All of which he proceeded to make available for download in a 2.8 GB file from Pirate Bay, the popular peer-to-peer file sharing site. A file that also contains a list of the most common probable usernames associated with those personal names (e.g., "jdoe" for "John Doe").
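To make the idea of "probable usernames" concrete, here is a minimal Python sketch of how a list of personal names might be turned into ranked username guesses. Only the "jdoe" pattern comes from the example above; the other patterns, and the function names `candidate_usernames` and `most_common_usernames`, are illustrative assumptions and not a description of Bowes's actual tooling.

```python
from collections import Counter


def candidate_usernames(full_name: str) -> list[str]:
    """Return plausible usernames for a personal name (patterns are assumptions)."""
    # Lowercase each word and keep letters only, so "O'Brien" becomes "obrien".
    parts = ["".join(ch for ch in word if ch.isalpha()) for word in full_name.lower().split()]
    parts = [p for p in parts if p]
    if len(parts) < 2:
        return parts
    first, last = parts[0], parts[-1]
    candidates = [
        first[0] + last,     # "jdoe", the pattern cited in the article
        first + last,        # "johndoe" (assumed pattern)
        first + "." + last,  # "john.doe" (assumed pattern)
        first + last[0],     # "johnd" (assumed pattern)
    ]
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(candidates))


def most_common_usernames(names: list[str], top: int) -> list[tuple[str, int]]:
    """Count how often each candidate username occurs across a list of names."""
    counts = Counter()
    for name in names:
        counts.update(candidate_usernames(name))
    return counts.most_common(top)


if __name__ == "__main__":
    sample = ["John Doe", "Jane Doe", "Jim Dole"]
    print(candidate_usernames("John Doe"))
    print(most_common_usernames(sample, 1))
```

Because "John Doe" and "Jane Doe" both collapse to "jdoe," that guess rises to the top of the count, which is why a large enough list of real names reveals which usernames are most worth trying first.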
Security researchers like Bowes use such data to probe for password vulnerabilities through so-called "brute force" attacks, which employ automated software tools to guess every conceivable username and password combination a group of users might select.
Of course, it's also precisely the kind of data that malicious hackers and cyber thieves can use to infiltrate a computerized system and wreak all kinds of havoc.
Bowes is a good guy, and the information he collected, though personal, was already publicly available, even if no one had ever thought to assemble and analyze it in quite the way, or on quite the scale, that he did. Still, his actions caused a furor in the media, and led many commentators to lambaste him for what they perceived to be either his recklessness or his evil intent.
They were wrong on both counts. As most other security experts who weighed in on "the Facebook hacker" were quick to point out, Bowes hadn't really endangered anyone, though he might well have made it easier for Internet marketers and their data-mining ilk to find fresh targets.
But you can hardly blame the privacy police for getting their knickers in a twist. If there were ever a time to be paranoid about threats to online privacy and security, now would be it.
The past year or two have seen a slew of costly data breaches, from the theft of more than 130 million credit and debit card numbers from the Hannaford supermarket chain to the interception of personal and financial data belonging to nearly 600,000 card accounts stored on Web servers maintained by the e-commerce provider Network Solutions.
Perhaps even more worryingly, researchers at Carnegie Mellon University demonstrated that they could accurately predict the Social Security numbers of nearly 5,000,000 Americans using nothing but some sophisticated algorithms and a wealth of publicly available online information.
And Facebook, lovable Facebook, everyone's favorite social-networking tool, committed so many privacy lapses (making members' information public by default, giving users access to information in their friends' accounts that was supposed to be private) that the Electronic Privacy Information Center filed a complaint against the site with the Federal Trade Commission, and more than 2.2 million users joined a Facebook group devoted solely to protesting the most egregious of the site's policies and actions.
Of course, complaining about Facebook's privacy policies through the site itself is kind of like publicizing your car's lousy safety record by racing around in it with a big sign on the roof that reads, "This car is unsafe!" And therein lies the great paradox of online privacy and data security: while many people express concern over these issues, most also seem willing to accept them as the cost of living life online.
Yet the potential trouble that cloud-based operations can cause private citizens is only the tip of the iceberg. For just as the benefits of social networking continue to attract users who may harbor concerns about privacy, so do the benefits of cloud computing in general continue to attract corporations, universities, and other large institutions that seek the convenience and cost savings that come from outsourcing their IT infrastructure.
THE BASIC idea behind cloud computing is simple: users run software applications and store data not on their own machines but on servers housed in enormous "mega data centers." Each of these servers, in turn, is populated by multiple virtual machines (VMs), which are essentially software emulations of real computers. The hardware and software that make up this vast web of networked resources are known as the cloud (or perhaps more accurately, "a cloud," since there can be more than one). By having large numbers of physical machines generate even larger numbers of virtual ones, cloud computing providers can exploit economies of scale to offer users seemingly limitless computing capacity at discount rates.
While the debate over the relative merits and dangers of cloud computing may be fresh, the concept itself can be traced back to the 1960s, and to resource-sharing models like grid computing, in which separate clusters of computers run by different groups are united in a single, large network; and utility computing, in which customers pay to use computing resources in much the same way they might pay for metered utilities like electricity and water. Today, "the cloud" refers to any large group of networked computers that offers services over the Internet. Providers can offer access to particular software applications (e.g., Google Docs, Salesforce.com); software development platforms (e.g., Google App Engine, Joyent); and even virtual machines (e.g., Amazon EC2, GoGrid). If you've ever used Gmail or Flickr, you've enjoyed the benefits of cloud computing: such services, and many others, are creatures of the cloud, relying on it for processing power and data storage. Indeed, a survey of more than 2000 adults commissioned by the Pew Internet and American Life Project in 2008 indicated that 69% of online users have engaged in at least one common form of cloud computing, from using a webmail service to storing videos and computer files online.
For users, the advantages are clear: you can access your data and run your applications anywhere and any time, enjoying the illusion of infinite computing resources on any device (a netbook, a PDA, a smartphone) that has Internet access. No longer must businesses and other organizations that want to perform sophisticated analytics on massive data sets or provide Web services to millions of people invest the time, money, and resources necessary to build and maintain their own data centers. Instead, they can buy capacity from a provider like Amazon, which rents virtual machines for less than $0.10 an hour and storage space for as little as $0.12 per gigabyte per month. Such services are elastic, meaning that you can rapidly add or subtract resources to meet demand, paying only for what you need at any given moment. Some can even spool capacity up or down automatically, a trick borrowed from the field of autonomic computing, which seeks to develop self-managed computer systems. (The University contributes to the advancement of autonomic computing technology through the Center for Autonomic Computing, a national research organization which it helped found in conjunction with Rutgers University and the University of Arizona.)
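The arithmetic behind elastic pricing is easy to sketch. The Python below compares paying only for the virtual machines in use each hour against provisioning for peak demand around the clock. The rates are rounded from the Amazon figures quoted above (10 cents per VM-hour, 12 cents per gigabyte-month); the demand figures and function names are invented purely for illustration, and real price lists are more complicated.

```python
# Illustrative rates in whole cents, rounded from the figures quoted in the article.
VM_CENTS_PER_HOUR = 10
STORAGE_CENTS_PER_GB_MONTH = 12


def elastic_vm_cost(hourly_demand: list[int]) -> int:
    """Cost in cents when you pay only for the VMs running in each hour."""
    return sum(hourly_demand) * VM_CENTS_PER_HOUR


def fixed_vm_cost(hourly_demand: list[int]) -> int:
    """Cost in cents when enough VMs for the peak hour run the entire time."""
    if not hourly_demand:
        return 0
    return max(hourly_demand) * len(hourly_demand) * VM_CENTS_PER_HOUR


def storage_cost(gigabytes: int, months: int = 1) -> int:
    """Cost in cents of keeping data stored for a number of months."""
    return gigabytes * months * STORAGE_CENTS_PER_GB_MONTH


if __name__ == "__main__":
    # Hypothetical demand: two VMs most hours, with a one-hour spike to ten.
    demand = [2, 2, 10, 2]
    print("elastic:", elastic_vm_cost(demand), "cents")
    print("fixed:  ", fixed_vm_cost(demand), "cents")
    print("storage:", storage_cost(500), "cents per month for 500 GB")
```

In this made-up case the elastic bill covers 16 VM-hours while the fixed one covers 40, which is the gap that makes renting attractive to anyone whose demand comes in spikes.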
Researchers, too, benefit from cloud computing; rather than having to build their own supercomputers or pony up the resources necessary to participate in a grid network, they can buy time on commercial clouds or mooch capacity from institutional ones. The Advanced Computing and Information Systems Laboratory (ACIS) at the University of Florida, for example, is one of several academic centers that provide cloud-computing power to the scientific community; its "science cloud," dubbed Stratus, offers researchers the opportunity to run scientific applications on virtual machines using a system modeled after Amazon's EC2 service.
But cloud computing has its downside, as well: users must relinquish control over their data and applications to the cloud, and to the people who run it (see Facebook, above). This, in turn, raises a variety of security concerns. Individuals, for example, might fear having sensitive information (like their social security numbers, or the medical data that will be stored in the electronic health records that the federal government has committed to establishing for every American by 2014) lost, corrupted, or stolen. Companies, meanwhile, must consider the risk of entrusting their confidential business data to someone else. "Suppose you're Citibank, and you have your customer data with Amazon," says Sanjay Ranka, a professor in the department of computer information science and engineering at the University of Florida who uses game theory to make large computer networks more efficient. "Now suppose someone hacks into Amazon's servers and gets that data, and that data is life and death to your business. Are you willing to risk storing it in the cloud?"
External threats aren't the only ones that cloud providers and their customers need to worry about, however. There's also the risk that an insider within the cloud provider (a sticky-fingered systems administrator, for example) might tamper with or steal customer data. "From a security perspective, it's always possible," says Wenjing Lou, a professor in the department of computer and electrical engineering at Worcester Polytechnic Institute. It can also be expensive. Writing in the November 13 issue of the journal Science, University of Virginia computer scientists William Wulf and Anita Jones claimed that in at least one sample of intrusions into financial systems, the FBI found that "attacks by insiders were twice as likely as ones from outsiders, and the cost of an intrusion by an insider was 30 times as great."
FORTUNATELY, researchers are already finding ways to mitigate the risks of cloud computing, sometimes by exploiting the cloud itself. Renato Figueiredo, a professor of electrical and computer engineering at the University of Florida, works with the SocialVPN project within ACIS to develop software that allows users to create their own virtual private networks (VPNs), secure communication channels that send and receive encrypted data via the Internet, using the cloud infrastructure provided by social networking sites like Facebook and Google Chat. The software, which can be downloaded at SocialVPN.org, is user-friendly and self-configuring. Once downloaded by members of a particular social networking site, the program will automatically generate IP addresses and cryptographic keys for each user. It will then use the social networking site's own computers to store and search for the addresses and keys of each user's friends, allowing them to establish a secure network amongst themselves. The resulting "social VPN" lets these friends communicate with one another and engage in activities that may not even be supported by the social networking service itself (trading iTunes playlists, engaging in multiplayer online games, controlling one another's desktops) within a fully encrypted environment. The software has been downloaded 4050 times since September 2009, and Figueiredo knows of at least one academic researcher who uses it to connect securely to Amazon's cloud.
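The publish-and-look-up step described above can be sketched in a few lines of Python. This is not SocialVPN's code: the `SocialDirectory` class stands in for the social networking site's servers, the address scheme in `virtual_ip` is a hypothetical one chosen for simplicity (it can produce collisions), and the "key" is just a random token standing in for a real public key.

```python
import hashlib
import ipaddress
import secrets


def virtual_ip(user_id: str) -> str:
    """Derive a stable private address in 10.0.0.0/8 from a user ID (assumed scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:3], "big")  # 24 bits fit inside the /8 block
    return str(ipaddress.IPv4Address("10.0.0.0") + offset)


class SocialDirectory:
    """Stand-in for the social site's servers, which store each user's record."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, str]] = {}

    def publish(self, user_id: str) -> dict[str, str]:
        """Generate and store an address and a placeholder key for one user."""
        record = {
            "ip": virtual_ip(user_id),
            "key": secrets.token_hex(32),  # placeholder, not a real public key
        }
        self._records[user_id] = record
        return record

    def lookup_friends(self, friends: list[str]) -> dict[str, dict[str, str]]:
        """Return records only for friends who have published one."""
        return {f: self._records[f] for f in friends if f in self._records}


if __name__ == "__main__":
    directory = SocialDirectory()
    directory.publish("alice")
    directory.publish("bob")
    # Carol has not installed the software, so she is absent from the result.
    print(directory.lookup_friends(["bob", "carol"]))
```

The point of the design is that the social network only brokers introductions; the encrypted traffic itself would travel directly between friends once each side holds the other's address and key.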
VPNs are also used to protect business data on its way to and from a cloud provider's servers. But as Figueiredo points out, once a customer's data is actually on a provider's machines, it's no longer necessarily secure, in part because it must be unencrypted before it can be manipulated. "You still must trust the host infrastructure," Figueiredo says.
And that's a problem, particularly for those who pay to run virtual machines and store data on a provider's hardware, an arrangement known as "infrastructure-as-a-service," or IaaS. Indeed, the very same virtualization technology that makes IaaS economically attractive to start-ups and other organizations that lack the capital to build their own data centers also appears to make it fundamentally insecure. By populating their servers with multiple virtual machines, providers can squeeze the maximum possible performance out of each physical machine; as a result, however, different clients can end up inhabiting the same servers and sharing the same physical resources, a situation that practically invites all manner of cyber-attacks. "It's like office workers sharing a secretary," says Ranka. Except that in this case, some of the workers might be using the secretary to spy on or steal from their colleagues.
Indeed, just this past fall, a group of researchers at MIT and the University of California at San Diego demonstrated that they could place their own "attacker" VMs on the same servers as "target" VMs within Amazon's EC2 cloud. They began by mapping the cloud, using network probes and other techniques to infer where particular VMs were running. Then, by exploiting the fact that all VMs running on the same physical machine share certain resources, such as CPU caches, the researchers were able to covertly eavesdrop on target VMs. Hackers, corporate rivals, and other potential adversaries could use similar methods to extract information from target VMs, or even steal passwords by capturing users' keystrokes.
Amazon may have provided the test bed, but the experiment carries broader implications. "The basic vulnerability is not specific to EC2," Eran Tromer, a postdoc in MIT's computer science and artificial intelligence laboratory, wrote in an e-mail, "but rather to the virtualization technology used by all 'infrastructure-as-a-service' cloud providers." Tromer contends that, at present, the only way to plug this particular security hole would be to let customers pay for the privilege of having entire servers to themselves, something that would make the economics of cloud computing far less appealing. But computer scientists and engineers are already working on other potential solutions.
For example, Nuno Santos and his colleagues at the Max Planck Institute in Berlin have developed a model of "trusted cloud computing" that uses trusted platform modules, or TPMs (security-enhancing computer chips loaded with cryptographic keys), and specially designed software to protect virtual machines running in a cloud environment from attack. The cryptographic protocols devised by Santos and his fellow researchers allow a user to verify that his own virtual machines are running on secure servers, while preventing even privileged insiders like systems administrators from hacking them. This kind of system, which turns each VM into a tamper-proof "closed box," is still hypothetical; not all servers ship with TPMs, and the system would require a trusted third party, which does not yet exist, to guarantee its security, much in the same way that a certificate authority like VeriSign attests that Web sites are encrypted and secure. But it could ultimately form part of a suite of solutions that renders cloud computing less risky.
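The core check in such a scheme, confirming that a server is running the software it claims to run, can be illustrated with a toy Python sketch. This is a simplification under loud assumptions and not the protocol Santos and his colleagues devised: real TPMs sign with asymmetric keys held in hardware, whereas here a shared secret and an HMAC stand in for the chip's signature, and the names `measure`, `quote`, and `verify` are invented for illustration.

```python
import hashlib
import hmac
import secrets


def measure(software_stack: bytes) -> bytes:
    """Fingerprint the software a server is running (a stand-in for TPM measurement)."""
    return hashlib.sha256(software_stack).digest()


def quote(tpm_key: bytes, measurement: bytes, nonce: bytes) -> bytes:
    """Server side: bind the measurement to a fresh nonce using the chip's key.

    Assumption: an HMAC over a shared secret replaces the TPM's real signature.
    """
    return hmac.new(tpm_key, measurement + nonce, hashlib.sha256).digest()


def verify(tpm_key: bytes, expected: bytes, nonce: bytes, reported: bytes) -> bool:
    """Verifier side: accept only if the quote matches the known-good measurement."""
    return hmac.compare_digest(quote(tpm_key, expected, nonce), reported)


if __name__ == "__main__":
    tpm_key = secrets.token_bytes(32)   # would live inside the chip
    known_good = measure(b"hypervisor v1.0")
    nonce = secrets.token_bytes(16)     # fresh per check, so old quotes cannot be replayed

    honest = quote(tpm_key, measure(b"hypervisor v1.0"), nonce)
    tampered = quote(tpm_key, measure(b"hypervisor v1.0 + backdoor"), nonce)
    print("honest server accepted:  ", verify(tpm_key, known_good, nonce, honest))
    print("tampered server accepted:", verify(tpm_key, known_good, nonce, tampered))
```

A server whose software has been altered produces a different fingerprint, so its quote no longer matches, which is how a verifier could refuse to place a customer's virtual machine on a compromised host.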
That's a good thing, because you needn't look to academic researchers to find people who are intent on hacking online systems. And as more and more data is stored in the cloud, more and more people will be inspired to find ways to tamper with it, whether for fun or profit. "I'm sure lots of people are figuring out how to break Amazon," Figueiredo says. Which only makes the need for a broad array of innovative and robust defenses even more pressing.
Still, technology alone will never render the cloud absolutely bulletproof. "No system is perfectly secure," says Figueiredo. And business leaders know this: A survey commissioned last year by the global IT consultancy Avanade found that, while a majority of more than 500 chief executives and IT managers in 17 countries recognized the economic benefits of cloud computing, most continued to "trust existing internal systems over cloud-based systems due to fear about security threats and loss of control of data and systems."
So what can be done? Perhaps a combination of better technology and better policies will provide the reassurance that users crave. Figueiredo, for example, contends that cloud providers should offer their customers contracts that explicitly spell out the risks of cloud computing, indicate who is liable for what under which particular circumstances, and provide remedies up to and including monetary compensation. He likens such contracts both to insurance, and to the agreements that credit card companies already offer their customers.
It is a surprisingly low-tech solution to a very high-tech problem. But it might be the best way of ensuring that even the worst cloud has a silver lining.
Copyright ©2011 Alexander Gelfand