
New grid on the block

From Washtech
October 24, 2001

By Laura Sivitz

Thursday, Oct. 25, 2001

Efforts to turn the Internet into a supercomputer could revolutionize the way the region's tech companies do business. But will security concerns keep them behind firewalls?

For a starting price of $120,000, Parabon Computation, a private Fairfax company, will sell you software to weave your company's desktop computers into a virtual supercomputer for crunching computation-heavy problems.

Sounds pricey. But for Celera Genomics of Rockville, a Parabon system might be cheaper than a new supercomputer. John Reynders, Celera's vice president of information systems, wants extra power for high-throughput protein analysis in the pursuit of drugs and diagnostics for lung and pancreatic cancer.

"Yeah, I might have a one-teraflop center," he says, referring to the company's 800 Compaq Alphas running at peak speed of a billion computations per second. "But if I have over 1,000 Intel chips [in desktop and lab PCs], that's a whole lot of compute power that rivals my supercomputer." While he hasn't made a final decision, an in-house grid "looks very cost-effective to explore."

Celera could tap into even more power on the Internet through Parabon's online grid of 36,000 individual PCs around the world, but Reynders won't consider it. How would he guarantee the security of Celera's proprietary data as it flew across the Web?

"I think there might be some algorithmic approaches to security," he suggests. But fear overrides his own solution. "What if one person put 10,000 computers on a grid and used them to reconstruct the algorithm?" he hypothesizes. For now, he plans to stick with a grid tucked safely inside Celera's firewalls.

Such is the state of the grid-computing movement.

Supporters envision the Internet becoming a Great Global Grid that makes supercomputing power as accessible as electricity, allowing small companies to do elaborate calculations in hours, rather than days or weeks.

But concerns about intellectual-property security, computational accuracy and cost, not to mention the lack of a standard grid protocol, make many businesspeople wary.

High-profile investments in recent weeks by IBM and the National Science Foundation will aim to resolve some of these issues. Still, large companies tend to prefer internal grids, no matter how powerful a public one might be. Some computation-heavy organizations even question the need for a grid at all.

Proponents of grid computing say it could profoundly reshape the competitive field of drug discovery. The impact could be felt both in Maryland, one of the nation's leading biotech centers, and Virginia, where economic development officials hope to secure a foothold in the emerging field of bioinformatics.

"If a grid is available ... a small company has access to what Celera has without having to buy what Celera bought," says Bruno Walther S. Sobral, director of the Virginia Bioinformatics Institute in Blacksburg.

The race to develop targeted drugs and diagnostics involves intensive computation. To identify the genes and proteins involved in diseases, researchers must screen billions of units of genetic code and study the extraordinarily complex properties of proteins. GenBank, the central repository of genetic information managed by the National Institutes of Health in Bethesda, has doubled in size about every 14 months, according to NIH.

If small companies could access a public grid for a reasonable fee, they might engage in biotech research and development without buying and supporting their own supercomputers. "It's very much an opportunity for a biotech company to be able to compete with a large pharmaceutical in terms of raw computing power," says Stuart of Entropia, a distributed-computing software company.

Sobral says he hopes the Virginia Bioinformatics Institute will form part of a grid providing affordable access to tools and computational power to the biotech community.

But there are skeptics.

"You'd probably spend as much money coordinating [a grid] as setting up computers in-house because computers are unbelievably cheap right now," says Steven Salzberg, the senior director of bioinformatics at The Institute for Genomic Research in Rockville. An entrepreneur could buy a 40-gigahertz system for about $40,000, he estimates ???enough to do tasks such as searching for a gene in a genetic database.

Customers can use Parabon Computation's online grid to crunch data for a fee of $6 per hour per 100 computers (not all 36,000 are operating all the time), according to Mark Weitner, vice president of marketing. He declined to reveal how many customers use the service.
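To put that metered rate in concrete terms, here is a rough back-of-envelope calculation; the job size and machine count are hypothetical figures chosen only for illustration, not numbers from Parabon.

```python
# Illustrative only: back-of-envelope cost under Parabon's published rate of
# $6 per hour per 100 machines. Job size and machine count are hypothetical.

RATE_PER_HOUR_PER_100 = 6.00  # dollars, per the article

def grid_job_cost(machines: int, hours: float) -> float:
    """Estimated fee for renting `machines` grid nodes for `hours` hours."""
    return (machines / 100) * hours * RATE_PER_HOUR_PER_100

# Example: a 24-hour run spread across 1,000 of the grid's PCs.
print(f"${grid_job_cost(1_000, 24):,.2f}")  # -> $1,440.00
```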

The company also sells a software application for its grid platform that allows researchers to compare DNA and protein databases in the drug-discovery process. About 18 organizations, mostly biotechs and pharmaceuticals, are testing or have purchased the application at a starting price of $60,000, Weitner said, although many of them use it on an in-house grid.

Parabon hopes to stimulate the development of software that runs on its grid platform, called Frontier, to generate more customers. So last month, the company released a free kit for third-party software developers to create applications for Frontier.

Developers can take 30 days of free time on the grid to fine-tune their products. Most people who are interested in the kit plan to develop life-science applications, according to Jim Gannon, Parabon's chief technology officer.

For instance, a gene-chip software company adapted an algorithm to Frontier and used it to re-examine the results of a breast-cancer study published this year in the New England Journal of Medicine. The company, BioDiscovery of Marina del Rey, Calif., claims that running its algorithm on the Parabon grid revealed that 45 of the 51 genes reported to be active during the published experiment were actually false positives. In other words, Parabon maintains, doing more computation produced a more accurate result.

"A common theme we see is that people use less sensitive approaches [for data analysis] that require less power, but miss important relationships in the data," Gannon explains.

Outside biotechnology, a software vendor called eVision has tapped into the computing power of Parabon's system to test its product for searching the Web using images instead of text.

"[They're] able to crunch 100 times more images in the same period than in the past," Gannon says.

But firms that run operations on a public grid such as Parabon's could be putting millions, or billions, of dollars in intellectual property and future profits at risk, a major security concern. This is enough to deter Entropia's large pharmaceutical clients from even considering public-grid use, Stuart says. Even so, it appears the benefits would outweigh the risks for many small entrepreneurs.

"We've had sufficient interest from smaller pharmaceutical and biotech companies that don't expect to build an in-house compute source," he says. "They see the grid as a viable alternative."

Still, in-house grids support the bottom line for both Entropia and Parabon.

"The pharmaceutical companies we look at have 10,000 to 90,000 employees with desktop computers," Gannon says. "Our sales force can hit these large sales, rather than lots of smaller sales."

Parabon generates the majority of its revenue from a version of the Frontier distributed-computing platform that's modified for use inside a corporation, says Weitner. Called Frontier Enterprise, this is the product Celera is considering for an internal grid. The product also starts at $60,000, Weitner said, as does the software that must be loaded on a company's desktop computers to make use of the platform.

Although he wouldn't disclose the number or names of customers that have bought Frontier Enterprise, he said Parabon has doubled its client base each quarter of 2001, the first year it generated revenue. Gannon projects the company will reach profitability in the fourth quarter of 2002.

Parabon and Celera plan to collaborate on the development of software applications that will run inside the Rockville company on a Frontier Enterprise platform, if Celera decides to buy it, and will speed protein sequencing and identification. The companies had their kick-off meeting in early October, although Celera has been testing Frontier Enterprise since the second quarter. "It did a good job of flying below the radar," says Celera's Reynders. During workday tests, employees barely noticed that the Parabon system was putting their computers to work whenever the machines sat idle.

Not surprisingly, security is the most daunting challenge for global-grid developers. To protect intellectual property, grid users need some sort of key granting them exclusive access to specific tools and data. Likewise, the providers of computer power will want control over who accesses their systems.

E-commerce Web sites already have a secure system for users to metaphorically knock on the door and for sites to identify who's there and decide whether to let them in. Groups such as the Global Grid Forum, a collection of 169 organizations supporting the development of computational grids, are modifying the e-commerce protocol for grid use.

Perhaps the most challenging security issue is the so-called restricted delegation problem. Grid users will have to rely on various agents, or brokers, to schedule their jobs, divvy them up among hundreds or thousands of computers and handle other tasks. But a broker, itself a software program, will likely be so complicated that it could easily make a mistake, such as delegating a job and then wiping out the results five hours later. Or a malicious program could disguise itself as an honest broker while intending to do harm.

"You cannot trust this agent that it will only do good things for you," says Marty Humphrey, a research assistant professor at University of Virginia. A co-chair of the Global Grid Forum's security committee, he and members of the university's Legion project are working on a way for users to strictly limit what any agent can do. Computer scientists at Argonne National Laboratory and many other places have the same goal.

"You need a specialized security mechanism that doesn't exist yet," he explains, adding, "It's still a couple years off."

Some grid software developers are attempting to tackle the security problem for businesses by creating products that permit the sharing of specific data systems while protecting all other corporate information.

Avaki of Boston, which grew out of computer-science research at the University of Virginia and maintains an office in Charlottesville, has such a strategy, according to Andrew Grimshaw, the chief technology officer.

Today, he says, many companies have hundreds of little buffer zones, each a firewall-protected subdomain where a client can examine the host's database. "It's tedious and labor intensive, and a lot of people are in the loop," he says. The Avaki data grid could replace the subdomains with a lattice of interconnecting software components that automatically grant selective, secure access for each customer while protecting the host's private information.

The grid-computing movement has been building for years, but only recently have heavy hitters taken steps to move the vision into commercial range.

It got a boost in August when IBM announced it would provide hardware, software and services to the United Kingdom's e-Science Core Program for building a grid to support global scientific collaboration. Later that month, the company won a part of a $53 million contract to help build the Distributed Terascale Facility, or "TeraGrid," for the National Science Foundation.

In addition, Big Blue says it plans to grid-enable some of its systems and technologies for commercial use. The Institute of Electrical and Electronics Engineers estimated IBM will invest a billion dollars or more in these projects.

The TeraGrid will be a training ground for grid-computing applications and a resource for the scientific and business communities, according to the NSF. By linking two supercomputing centers, a federal lab and a university, it is expected to perform up to 11.6 trillion calculations per second.

Rita R. Colwell, the foundation's director, illustrated the benefit of such vast computational power during a recent speech. "It can take just 20 milliseconds for a nascent protein to fold into its functional conformation," she said. "Until recently, it took 40 months of computer time to simulate that folding. With new terascale computer systems, operating at one trillion operations per second, we have reduced that time to one day."

An individual organization would have to spend more than $110 million to purchase an equally powerful supercomputer. That's how much it cost IBM to build ASCI White for the Department of Energy. The company claims the machine can do more than 12.3 trillion floating-point operations per second.

For an at-cost fee (well below $110 million), companies will get to experiment with the TeraGrid once it starts running in mid-2002. The hope is that these businesses will learn enough to go out and build their own grids, says Robert R. Borchers, former division director of advanced computational infrastructure and research at NSF. Up to 10 percent of the grid's resources will be for sale; the remaining 90 percent will be dedicated to government and academic research.

"We hope the product that comes out of this is grid software," says Borchers, who retired from NSF in late September.

On the consumer side, Microsoft's well-publicized .Net solution appears to have some qualities in common with grid computing, but its focus is not on computation. Rather, it aims to make the Internet the basis of a new operating system where consumers' data would reside, making information more accessible and easier to integrate. The security and protocol-development challenges for the .Net initiative appear similar to those of grid computing; indeed, Microsoft is a sponsor of The Global Grid Forum, which aims to develop grid-computing standards and technologies.

Another sponsor, Sun Microsystems of Palo Alto, Calif., makes software called Sun Grid Engine that manages the resources of an in-house cluster and is available in open-source code. The company is in the midst of developing its next level of grid software, Global Resource Director, according to director of grid computing Wolfgang Gentzsch. It will be another five years before the product is fully automated and running at customer sites, he says, declining to put a dollar figure on the company's investment. Sun also plans to spend millions of dollars assembling three to six major grids around the globe over a period of years.

Hewlett-Packard is likely to produce a grid-computing solution, too, according to Stacey Quandt, an associate analyst with Giga Information Group in Santa Clara, Calif.

Running one of the best-known public grid applications, SETI@home, has taught David Anderson a lot about what it will take to make a global grid viable.

Based at the University of California at Berkeley, the SETI research project taps into the combined power of about 500,000 computers worldwide to analyze data from space for signs of extraterrestrial intelligence.

"The main lessons learned are things that should have been totally obvious in the first place," he says.

For instance, as the number of SETI participants grew from thousands to hundreds of thousands, the organization had to beef up its servers to handle the gargantuan amount of data coming and going. "Most of the work we've done has to do with continually expanding the size and power of our own servers," says Anderson, who's also the chief technology officer of United Devices.

The SETI team also discovered that, with super-data-intensive problems, sending data to a public grid could cost more than buying the equivalent supercomputing power, because the process eats up so much capacity on telecommunications networks. Neither SETI nor United Devices has reached that point, Anderson says. Given today's telecommunications costs, he estimates the threshold is when you need to send more than one gigabyte of data to occupy one day of computer time. But SETI@home is UC-Berkeley's biggest single source of outgoing Internet traffic.

"Any kind of commercial attempt at this is going to have to take [network costs] into account," he says.

Commercial and research entities alike also will have to account for the fact that PCs on a grid sometimes spew out wrong answers. "A small but non-ignorable percentage of our results are numerically wrong," according to Anderson. While hardware issues account for most of those errors, a few users try to foul up the project by sending miscalculations intentionally, he says.

Both SETI and United Devices try to ensure they get the right answers by sending the same calculation to multiple computers. Statistically, most of the machines will compute it correctly. So the servers compare the results, most of which are the same, and toss out the mistakes.
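In outline, that redundancy check amounts to a majority vote over duplicate results. The sketch below uses invented function names and values purely to illustrate the approach; it is not SETI@home's or United Devices' actual code.

```python
# Minimal sketch of the redundancy check: the same work unit goes to several
# volunteer PCs, and the server keeps the answer the majority agrees on.
# Function names and values are illustrative only.

from collections import Counter

def accept_result(replies: list) -> tuple:
    """Return (accepted_answer, number_of_machines_that_agreed)."""
    answer, votes = Counter(replies).most_common(1)[0]
    return answer, votes

# Three machines return the same value; one faulty or malicious machine
# returns garbage. The majority answer wins and the outlier is discarded.
replies = [42.7183, 42.7183, 42.7183, -9999.0]
answer, votes = accept_result(replies)
print(answer, f"({votes} of {len(replies)} machines agree)")
```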

Achieving a global, commercial grid looks about as easy as putting a man on the moon. The nation did it, but enormous technical challenges and safety concerns stood in the way. In the near term, the market for distributed-computing platforms inside companies looks most promising, especially as more software applications become available. It's hard to see Internet-based grid platforms becoming spectacularly profitable anytime soon. Like walking on the moon, it will take breakthroughs to make the Great Global Grid a fact of life.

© 2001 Post Newsweek Tech Media Group
