Enterprise-Wide Computing

From Science
August 12, 1994

By Andrew Grimshaw

For over thirty years science fiction writers have spun yarns featuring worldwide networks of interconnected computers that behave as a single entity. Until recently such fantasies have been just that. Technological changes are now occurring that may expand computational power just as the invention of desktop calculators and personal computers did. In the near future, computationally demanding applications will no longer be executed primarily on supercomputers and single workstations dependent on local data sources. Instead, enterprise-wide systems, and someday nationwide systems, will be used that consist of workstations, vector supercomputers, and parallel supercomputers connected by local and wide-area networks. Users will be presented with the illusion of a single, very powerful computer, rather than a collection of disparate machines. The system will schedule application components on processors, manage data transfer, and provide communication and synchronization so as to dramatically improve application performance. Further, boundaries between computers will be invisible, as will the location of data and the failure of processors.

To illustrate the concept of an enterprise-wide system, first consider the workstation or personal computer on your desk. By itself it can execute applications at a rate that is loosely a function of its cost, manipulate local data stored on local disks, and make printouts on local printers. Sharing of resources with other users is minimal and difficult. If your workstation is attached to a department-wide local area network (LAN), not only are the resources of your workstation available to you, but so are the network file system and network printers. This allows expensive hardware such as disks and printers to be shared, and allows data to be shared among users on the LAN. With department-wide systems, processor resources can be shared in a primitive fashion by remote login to other machines. To realize an enterprise-wide system, many department-wide systems within a larger organization, such as a university, company, or national lab, are connected, as are more powerful resources such as vector supercomputers and parallel machines. However, connection alone does not make an enterprise-wide system. If it did then we would have enterprise-wide systems today. To convert a collection of machines into an enterprise-wide system requires software that makes sharing resources such as databases and processor cycles as easy as sharing printers and files on a LAN; it is just that software that is now being developed.

The benefits of enterprise-wide systems will include more effective collaboration by putting coworkers in the same virtual workplace; higher application performance due to parallel execution and the exploitation of off-site resources; improved access to data; improved productivity resulting from more effective collaboration; and a considerably simpler programming environment for applications programmers.

Three key technological changes make enterprise-wide computing possible. The first is the much-heralded information superhighway, or national information infrastructure (NII), and the gigabit (10^9 bits per second) networks that form its backbone. These networks can carry orders of magnitude more data than current systems. The effect is to "shrink the distance" between computers connected by the network. This, in turn, lowers the cost of computer-to-computer communication, enabling computers to more easily exchange both information and work to be performed.

The second technological change is the development and maturation of parallelizing compiler technology for distributed-memory parallel computers. Distributed-memory parallel computers consist of many processors, each with its own memory and capable of running a different program, connected together by a network. Parallelizing compilers are programs that take source programs in a language such as High Performance Fortran and generate programs that execute in parallel across multiple processors, reducing the time required to perform the computation. Depending on the application and the equipment used, the performance improvement can range from a modest factor of two or three to as much as two orders of magnitude. Most distributed-memory parallel computers to date have been tightly coupled, with all of the processors in one cabinet, connected by a special-purpose, high-performance network. Loosely coupled systems, constructed of high-performance workstations and local area networks, are now competitive with tightly coupled distributed-memory parallel computers on some applications. These workstation farms [1,2] have become increasingly popular as cost-effective alternatives to expensive parallel computers.
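To make the idea concrete, the fragment below sketches, in C, the kind of program such a compiler generates from a single data-parallel loop: each processor sums its own block of the index range, and one reduction combines the partial results. The use of the MPI message-passing library, the problem size, and the per-element work are assumptions made purely for illustration; a parallelizing compiler would emit the data distribution and communication automatically.

    /* Sketch (illustrative only): a hand-written version of the code a
     * parallelizing compiler might generate for a data-parallel summation.
     * Each processor works on its own block; one reduction combines results. */
    #include <stdio.h>
    #include <mpi.h>

    #define N 1000000                 /* illustrative problem size */

    int main(int argc, char **argv)
    {
        int rank, nprocs, i, lo, hi;
        double local = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which processor am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* how many processors?  */

        /* Block distribution: processor "rank" owns indices [lo, hi). */
        lo = rank * (N / nprocs);
        hi = (rank == nprocs - 1) ? N : lo + (N / nprocs);

        for (i = lo; i < hi; i++)
            local += (double)i;       /* stand-in for the real per-element work */

        /* Combine the partial sums on processor 0. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f\n", total);

        MPI_Finalize();
        return 0;
    }

A program written this way can run either on a tightly coupled machine or on a workstation farm; only the underlying network differs.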

The third technological change is the maturation of heterogeneous distributed systems technology [3]. A heterogeneous distributed system consists of multiple computers, called hosts, connected by a network. The distinguishing feature is that the hosts have different processors (80486 versus 68040), different operating systems (Unix versus VMS), and different available resources (memory or disk). These differences and the distributed nature of the system introduce complications not present in traditional, single-processor mainframe systems. After twenty years of research, solutions have been found to many of the difficulties that arise in heterogeneous distributed systems.
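One of the simplest of those difficulties illustrates the flavor of the problem: an 80486 stores integers with the least significant byte first, while a 68040 stores them with the most significant byte first, so an integer copied verbatim across the network is misread. The C fragment below sketches the conventional remedy, converting values to a common network byte order at each end. The standard BSD routines htonl() and ntohl() are used; the helper names are hypothetical and the surrounding send and receive code is omitted.

    /* Sketch: passing a 32-bit integer between hosts whose processors
     * disagree on byte order (e.g., a little-endian 80486 and a
     * big-endian 68040).  Both sides convert through a common "network"
     * byte order, so neither needs to know the other's representation. */
    #include <stdint.h>
    #include <arpa/inet.h>            /* htonl(), ntohl() */

    /* Sender: host representation -> network (big-endian) byte order. */
    uint32_t encode_value(uint32_t host_value)
    {
        return htonl(host_value);
    }

    /* Receiver: network byte order -> the local host's representation. */
    uint32_t decode_value(uint32_t wire_value)
    {
        return ntohl(wire_value);
    }

Heterogeneous distributed system software performs conversions of this kind, among many other chores, on the application's behalf.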

The combination of mature parallelizing compiler technology and gigabit networks means that it is possible for applications to run in parallel on an enterprise-wide system. The gigabit networks also permit applications to more readily manipulate data regardless of its location because they will provide sufficient bandwidth to either move the data to the application or to move the application to the data. The addition of heterogeneous distributed system technology to the mix means that issues such as data representation and alignment, processor faults, and operating system differences can be managed.

Today enterprise-wide computing is just beginning; as yet, these technologies have not been fully integrated. However, projects are underway at the University of Virginia and elsewhere [4] that, if successful, will lead to operational enterprise-wide systems. For the moment, systems available for use with networked workstations fall into three non-mutually-exclusive categories: (i) heterogeneous distributed systems, (ii) throughput-oriented parallel systems, and (iii) response-time-oriented parallel systems. Heterogeneous distributed systems are ubiquitous in research labs today. Such systems allow different computers to interoperate and exchange data. The most significant feature is the shared file system, which permits users to see the same file system, and thus share files, regardless of which machine they are using or its type. The single file-naming environment significantly reduces the barriers to collaboration and increases productivity. Throughput-oriented systems focus on exploiting available resources in order to service the largest number of jobs, where a job is a single program that does not communicate with other jobs. The benefit of these systems is that available, otherwise idle, processor resources within an organization can be exploited. While no single job runs any faster than it would on the owner's workstation, the total number of jobs executed in the organization can be significantly increased. For example, in such a system I could submit five jobs at the same time in a manner reminiscent of old-style batch systems. The system would then select five idle processors on which to execute my jobs. If insufficient resources were available, some of the jobs would be queued for execution at a later time. Response-time-oriented systems are concerned with minimizing the execution time of a single application, that is, with harnessing the available workstations to act as a virtual parallel machine. The purpose is to solve problems more quickly, and to solve larger problems, than would otherwise be possible on a single workstation. Unfortunately, to achieve the performance benefits an application must be rewritten to use the parallel environment. The difficulty of parallelizing applications has limited the acceptance of parallel systems.
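As a rough sketch of the placement decision a throughput-oriented system makes, the C fragment below assigns each submitted job to an idle workstation if one exists and queues it otherwise. The host names, the fixed-size arrays, and the first-come, first-served policy are simplifying assumptions for illustration; real systems of this kind, such as those surveyed in [2], typically also weigh machine load, ownership, and per-job resource requirements.

    /* Sketch: the placement decision a throughput-oriented system makes.
     * Jobs are matched to idle workstations; when none are idle the job
     * is queued for later execution.  Purely illustrative. */
    #include <stdio.h>

    #define NHOSTS 3
    #define NJOBS  5

    int main(void)
    {
        const char *host[NHOSTS] = { "ws-chem-1", "ws-ee-4", "ws-cs-7" }; /* hypothetical names */
        int busy[NHOSTS] = { 0, 0, 0 };
        const char *job[NJOBS] = { "job1", "job2", "job3", "job4", "job5" };
        int j, h;

        for (j = 0; j < NJOBS; j++) {
            int placed = 0;
            for (h = 0; h < NHOSTS && !placed; h++) {
                if (!busy[h]) {              /* found an idle workstation */
                    busy[h] = 1;
                    printf("%s -> %s\n", job[j], host[h]);
                    placed = 1;
                }
            }
            if (!placed)                     /* no idle host: queue the job */
                printf("%s queued for later execution\n", job[j]);
        }
        return 0;
    }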

The Legion project at the University of Virginia is working toward providing system services that present the illusion of a single virtual machine to users, a virtual machine that provides both improved response time and greater throughput [5]. Legion is targeted toward nation-wide computing. Rather than construct a full-scale system from scratch, we have chosen to construct a campus-wide testbed, the campus-wide virtual computer, by extending an existing parallel processing system. Even though the campus-wide system is smaller than a full-scale nation-wide system, and its components are much closer together, it presents many of the same challenges. The processors are heterogeneous; the interconnection network is irregular, with orders-of-magnitude differences in bandwidth and latency; and the machines are currently in use for on-site applications that must not be hampered. Further, each department operates essentially as an island of service, with its own file system.

The campus-wide system is both a working prototype and a demonstration project. The objectives are to demonstrate the usefulness of network-based, heterogeneous, parallel processing to computational science problems; provide a shared high-performance resource for university researchers; provide a given level of service (as measured by turn-around time) at reduced cost; and act as a testbed for the larger-scale Legion system. The prototype is now operational and consists of over sixty workstations from three different manufacturers in four different buildings. At the University of Virginia, we are using two production applications for performance testing: Complib, a biochemistry application that compares DNA and protein sequences, and ATPG, an electrical engineering application that generates test patterns for VLSI circuits. Early results are encouraging. Other production applications are planned.

I believe that projects such as Legion will lead to the widespread availability and use of enterprise-wide systems. Although significant challenges remain, there is no reason to doubt that such systems can be built. The introduction of enterprise-wide systems will result in another leap forward in the usefulness of computers. Productivity will increase owing to the more powerful computing environment, the tearing down of barriers to collaboration between geographically separated researchers, and increased access to remote information such as digital libraries and databases.

References and Notes

  1. B. Buzbee, Science 261, 852 (1993).
  2. J.A. Kaplan and M.L. Nelson, "A Comparison of Queuing, Cluster, and Distributed Computing Systems," NASA Technical Memorandum 109025 (NASA Langley Research Center, Langley, Virginia, 1993).
  3. D. Notkin et al., Commun. ACM 30, 132 (1987).
  4. R. Rouselle et al., "The Virtual Computing Environment," Proceedings of the Third International Symposium on High Performance Distributed Computing, IEEE Computer Society Press, August, 1994.
  5. For more information on heterogeneous parallel processing, use Mosaic to access http://www.cs.virginia.edu/~mentat/

