Wednesday, October 08, 2008

Grid computing…

In my previous post, I mentioned how grid computing is helping to process the petabytes of data generated by the LHC. Today, let’s explore the concept of grid computing in a little more depth.

According to Wikipedia, grid computing is a form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform very large tasks. This technology has been applied to computationally-intensive scientific, mathematical, and academic problems through volunteer computing, and it is used in commercial enterprises for such diverse applications as drug discovery, economic forecasting, seismic analysis, and back-office data processing in support of e-commerce and web services.

The term grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid, an idea set out in Ian Foster and Carl Kesselman’s seminal work, "The Grid: Blueprint for a New Computing Infrastructure".

An excellent resource on grid computing is available at http://www.gridcomputing.com/

CPU scavenging and volunteer computing were later popularized, beginning in 1997 with distributed.net and in 1999 with SETI@home, as ways to harness the power of networked PCs worldwide to solve CPU-intensive research problems.

One of the most famous cycle-scavenging networks is SETI@home, which as of September 2001 was using more than 3 million computers to achieve 23.37 sustained teraflops (979 lifetime teraflops). Being deeply interested in the question of extraterrestrial life, I myself participated in the SETI@home project with my old 486!

Grid computing requires the use of software that can divide and farm out pieces of a program to as many as several thousand computers. Grid computing can be thought of as distributed and large-scale cluster computing and as a form of network-distributed parallel processing. It can be confined to the network of computer workstations within a corporation or it can be a public collaboration (in which case it is also sometimes known as a form of peer-to-peer computing).
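
To make "divide and farm out" a little more concrete, here is a minimal Python sketch of the split/scatter/gather pattern. It is only an illustration under loud assumptions: a thread pool stands in for the remote computers (real grid middleware would ship each piece over the network, and Python threads would not actually speed up this CPU-bound work), and split_into_chunks, farm_out and count_primes are hypothetical names, not part of any grid product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[Sequence[T]]:
    """Divide items into at most n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks: List[Sequence[T]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        if start < end:
            chunks.append(items[start:end])
        start = end
    return chunks


def farm_out(work: Callable[[Sequence[T]], R], items: Sequence[T], n_nodes: int) -> List[R]:
    """Scatter one chunk per 'node' (here just a thread) and gather results in chunk order."""
    chunks = split_into_chunks(items, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(work, chunks))


def count_primes(chunk: Sequence[int]) -> int:
    """Example work unit: count the primes in one chunk by trial division."""
    def is_prime(n: int) -> bool:
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    return sum(1 for n in chunk if is_prime(n))


# Split 2..9999 into 8 pieces, process them "remotely", then combine locally.
partial_counts = farm_out(count_primes, range(2, 10_000), n_nodes=8)
total_primes = sum(partial_counts)
print(total_primes)  # 1229
```

The key point is that only the small per-chunk results travel back to be combined; each piece of work runs without knowing about the others.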

A number of corporations, professional groups, university consortiums, and other groups have developed or are developing frameworks and software for managing grid computing projects. The European Union (EU) is sponsoring a project for a grid for high-energy physics, earth observation, and biology applications. In the United States, the National Technology Grid is prototyping a computational grid for infrastructure and an access grid for people. Sun Microsystems offers Grid Engine software. Described as a distributed resource management (DRM) tool, Grid Engine allows engineers at companies like Sony and Synopsys to pool the computer cycles on up to 80 workstations at a time. (At this scale, grid computing can be seen as a more extreme case of load balancing.)
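
To illustrate the load-balancing view, here is a small Python sketch of one classic greedy heuristic, longest-processing-time-first: sort the jobs by estimated cost and hand each one to whichever workstation currently has the least work queued. This is not how Grid Engine actually schedules jobs; the job names, cost estimates, workstation names and the assign_jobs function are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_jobs(job_costs: Dict[str, float], workstations: List[str]) -> Dict[str, List[str]]:
    """Longest-processing-time-first: give each job to the currently least-loaded workstation."""
    # Min-heap of (accumulated load, workstation name); ties break on the name.
    heap: List[Tuple[float, str]] = [(0.0, ws) for ws in workstations]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {ws: [] for ws in workstations}
    # Placing the biggest jobs first tends to leave a more even final balance.
    for job, cost in sorted(job_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, ws = heapq.heappop(heap)
        plan[ws].append(job)
        heapq.heappush(heap, (load + cost, ws))
    return plan


def loads(plan: Dict[str, List[str]], job_costs: Dict[str, float]) -> Dict[str, float]:
    """Total estimated cost queued on each workstation."""
    return {ws: sum(job_costs[j] for j in jobs) for ws, jobs in plan.items()}


jobs = {"render": 5.0, "simulate": 4.0, "compile": 3.0, "verify": 2.0}
plan = assign_jobs(jobs, ["ws1", "ws2"])
print(plan)               # {'ws1': ['render', 'verify'], 'ws2': ['simulate', 'compile']}
print(loads(plan, jobs))  # {'ws1': 7.0, 'ws2': 7.0}
```

A real DRM tool also has to cope with machines joining and leaving, jobs whose costs are unknown in advance, and user priorities, none of which this toy handles.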

What distinguishes grid computing from typical cluster computing systems is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing grid may be dedicated to a specialized application, it is often constructed with the aid of general-purpose grid software libraries and middleware.

"Distributed" or "grid" computing in general is a special type of parallel computing which relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus.

The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which when combined can produce similar computing resources to a multiprocessor supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various processors and local storage areas do not have high-speed connections. This arrangement is thus well-suited to applications in which multiple parallel computations can take place independently, without the need to communicate intermediate results between processors.
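
A Monte Carlo estimate of pi is a textbook example of the kind of job that suits this arrangement. In the Python sketch below (function names are hypothetical), each node needs only a seed and a sample count, never exchanges intermediate results with any other node, and sends back a single integer; the list comprehension in estimate_pi simply stands in for dispatching those tasks to separate machines.

```python
import random
from typing import List


def node_task(seed: int, samples: int) -> int:
    """Work done on one node: count random points in the unit square that fall
    inside the quarter circle of radius 1. Needs no data from other nodes."""
    rng = random.Random(seed)  # a separate, reproducible random stream per node
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_nodes: int, samples_per_node: int) -> float:
    # Each node_task call could run on a different machine, in any order;
    # the only data sent back is one integer per node.
    hits: List[int] = [node_task(seed, samples_per_node) for seed in range(n_nodes)]
    # Area ratio: quarter circle / unit square = pi / 4.
    return 4.0 * sum(hits) / (n_nodes * samples_per_node)


print(estimate_pi(n_nodes=4, samples_per_node=50_000))  # an approximation of pi
```

Adding more nodes here costs almost nothing in communication, which is exactly why such workloads tolerate the slow links between grid machines.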

Geographically dispersed grids generally scale well at the high end, because their nodes need little connectivity relative to the capacity of the public Internet.

There are also some differences in programming and deployment. It can be costly and difficult to write programs so that they can be run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs to run on multiple machines (but each given a different part of the same problem). This makes it possible to write and debug on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.
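
Here is a rough Python sketch of that idea, assuming a hypothetical script (say sumsq.py) that a thin grid layer would launch on several machines, each with a different --part argument. The program knows nothing about the grid; it just computes a sum of squares over its own slice of [0, N). Because it is an ordinary standalone program, every part can be run and checked on a single machine first. The script name, options and functions are illustrative assumptions.

```python
import argparse
from typing import List, Optional, Tuple


def part_bounds(total: int, part: int, parts: int) -> Tuple[int, int]:
    """Return the [start, stop) slice of range(total) owned by this part."""
    start = total * part // parts
    stop = total * (part + 1) // parts
    return start, stop


def sum_of_squares(start: int, stop: int) -> int:
    """The actual work: an ordinary sequential computation."""
    return sum(n * n for n in range(start, stop))


def run(argv: Optional[List[str]] = None) -> int:
    """Parse command-line options, compute this part's result and return it."""
    parser = argparse.ArgumentParser(description="Sum of squares over one part of [0, N).")
    parser.add_argument("--n", type=int, required=True, help="size of the whole problem")
    parser.add_argument("--part", type=int, default=0, help="which part to compute (0-based)")
    parser.add_argument("--parts", type=int, default=1, help="how many parts in total")
    args = parser.parse_args(argv)
    if args.parts < 1 or not 0 <= args.part < args.parts:
        parser.error("need parts >= 1 and 0 <= part < parts")
    start, stop = part_bounds(args.n, args.part, args.parts)
    return sum_of_squares(start, stop)


# Debugging on one machine: run every part locally and compare with the unsplit run.
whole = run(["--n", "1000"])
pieces = [run(["--n", "1000", "--part", str(i), "--parts", "4"]) for i in range(4)]
print(whole, sum(pieces))  # 332833500 332833500
```

On an actual grid node, the script would instead end with print(run()) under the usual if __name__ == "__main__": guard, so that running python sumsq.py --n 1000 --part 2 --parts 4 prints the partial sum for the third quarter of the range for the middleware to collect.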

By the way, Nortel also joined the Global Grid Forum (GGF) as far back as April 2004. The charter of the GGF’s Grid High-Performance Networking (GHPN) research group is to establish rich two-way communication between the community of Grid application developers and the networking communities (in both academia and industry).

Another well-known project is the World Community Grid. The World Community Grid's mission is to create the largest public computing grid benefiting humanity. This work is built on the belief that technological innovation combined with visionary scientific research and large-scale volunteerism can change our world for the better. IBM Corporation has donated the hardware, software, technical services and expertise to build the infrastructure for World Community Grid and provides free hosting, maintenance and support.

During 2007, the term cloud computing came into popularity. It is conceptually similar to the canonical Foster definition of grid computing, in which computing resources are consumed the way electricity is consumed from the power grid. Indeed, grid computing is often (but not always) associated with the delivery of cloud computing systems.

Nearly every major company in the computing industry is working in this area in one way or another. Microsoft is joining the cloud-computing trend: CEO Steve Ballmer has said that a "Windows Cloud" OS, aimed at developers creating cloud-computing apps, will be launched at Microsoft's Professional Developers Conference. Microsoft, IBM, Intel and Oracle are all getting involved in cloud computing.

For example, Oracle is shifting its grid focus to the application. "Think of WebLogic Application Grid as similar to a service-oriented architecture," said Mike Piech, an Oracle senior director of product marketing, during a recent briefing. "It's not a single product, not a single technology, but an infrastructure with a certain set of characteristics to provide on-demand behavior. Our approach is to have all the foundation-level middleware technologies play into that basic idea of the grid: pooling and sharing resources, using them more efficiently, but also providing a higher quality of service."

IBM is not being left behind either. Relevant resources from IBM can be found at this link.

Grid computing appears to be a promising trend for three reasons:

(1) It makes more cost-effective use of a given amount of computer resources,

(2) It offers a way to solve problems that can't be approached without an enormous amount of computing power, and

(3) It suggests that the resources of many computers can be cooperatively and perhaps synergistically harnessed and managed as a collaboration toward a common objective.

In some grid computing systems, the computers may collaborate rather than being directed by one managing computer. One likely area for the use of grid computing will be pervasive computing applications: those in which computers pervade our environment without our necessarily being aware of them.
