Grid Computing Overview

Introduction

Until recently, the mainstream understanding of computer simulation of a power system was to run a simulation program, such as a Loadflow, Transient Stability or Voltage Stability program, or a combination of such programs, on a single computer. Recent advances in network technology and in software development concepts and tools have made it possible to perform "Grid" or "Cloud" computing, where a set of connected computers, located locally or remotely, forms a computational grid that is used to solve certain categories of computationally intensive problems.

In 1967, W. F. Tinney, in Ref. [1], introduced the sparse matrix solution approach to power system simulation, which made it possible to simulate a power system of (practically) any size and complexity on one computer. The approach forms the foundation of today's off-line power system planning studies. Then, in 1974, B. Stott, in Ref. [2], introduced the Fast Decoupled load flow approach, which made the analysis of high-voltage utility networks much faster. The B-matrix approximation used in the Fast Decoupled approach forms the foundation of today's on-line real-time power system security analysis and assessment. Today, it is widely accepted that the requirements of real-time on-line power system simulation, where thousands of contingency cases need to be evaluated in a very short period of time, cannot be met using only ONE computer and the accurate AC Loadflow network model without some sort of approximation (simplification, reduction or screening).

Now comes the Grid/Cloud Computing era. Google has used Grid Computing successfully to meet the Internet search challenge. We believe Grid Computing can be used to solve on-line real-time power system simulation problems using the accurate AC Loadflow model, and that this will happen in the near future. The following experiment was carried out by a research group at a Chinese university using InterPSS Grid Computing:

    • Loadflow: based on a base case of 1245 buses and 1994 branches, perform an N-1 contingency analysis of 2500 cases by running 2500 full AC loadflow calculations in parallel
    • Transient Stability: perform 1000 transient stability simulations in parallel

The results in the above figure indicate that the InterPSS Grid Computing approach can achieve linear scalability.

InterPSS Grid Computing Solution

The InterPSS grid computing solution, in a nutshell, features the following:

    • The InterPSS core simulation engine, implemented in Java, can be automatically distributed over the network and deployed to any remote grid node with minimal administration overhead.
    • An InterPSS simulation object, a fully object-oriented power system simulation model, can be serialized to an XML document and distributed over the network to a remote grid node. There the XML document is de-serialized into the original object model, which is then ready for power system simulation.
    • InterPSS has an open architecture, which allows any interested party to implement their own grid computing algorithms on top of the InterPSS grid computing foundation and offer customized solutions.
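
The serialize, distribute and de-serialize cycle described above can be illustrated with a minimal Java sketch. It is not the InterPSS implementation: the BusData class, its fields and the XML layout are hypothetical, and only the standard JDK XML APIs are used. It simply shows an object being turned into an XML string on the sending side and rebuilt on the receiving side.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** XML round trip of a tiny object; a stand-in for a real simulation model. */
final class XmlRoundTripSketch {

    /** Hypothetical, heavily simplified simulation object (not the InterPSS model). */
    static final class BusData {
        final String id;
        final double voltagePu;

        BusData(String id, double voltagePu) {
            this.id = id;
            this.voltagePu = voltagePu;
        }
    }

    /** Master node side: serialize the object to an XML string. */
    static String toXml(BusData bus) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element e = doc.createElement("bus");
            e.setAttribute("id", bus.id);
            e.setAttribute("voltagePu", Double.toString(bus.voltagePu));
            doc.appendChild(e);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("XML serialization failed", ex);
        }
    }

    /** Grid node side: rebuild the object from the XML string. */
    static BusData fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element e = doc.getDocumentElement();
            return new BusData(e.getAttribute("id"),
                    Double.parseDouble(e.getAttribute("voltagePu")));
        } catch (Exception ex) {
            throw new IllegalStateException("XML de-serialization failed", ex);
        }
    }

    public static void main(String[] args) {
        BusData sent = new BusData("Bus1", 1.05);
        String xml = toXml(sent);            // this string is what would cross the network
        BusData received = fromXml(xml);
        System.out.println(xml);
        System.out.println(received.id + " " + received.voltagePu);
    }
}
```

In a real grid the XML string would travel over the network between the two calls; here both sides run in one process to keep the sketch self-contained.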

With the Grid Computing approach, the constraint imposed by computation speed is greatly relaxed. This allows us to re-think and re-design the analysis algorithms used in on-line real-time power system simulation.

Grid Computing Concept

Grid computing began as a way to share unused CPU power over the Internet. More recently, the concept has been expanded to include high-performance distributed parallel computing using a set of connected computers. Our goal is to create a software system that performs complex power system simulation in a distributed and parallel way. Our target environment is a network of servers in a LAN (Local Area Network).

Split/Aggregate (Map/Reduce)

The trademark of computational grids is the ability to split a process into a set of sub-processes, execute them in parallel, and aggregate the sub-results into one final result. Note that the split can, and often does, occur recursively. The split and aggregate (a.k.a. Map/Reduce) design makes it possible to parallelize task execution, gaining performance and/or scalability. For example, with a simple and inexpensive 10-node grid you can achieve up to "an order of magnitude" performance increase, assuming your application is computation-intensive and can be split into independent sub-processes. For a more in-depth discussion of this concept, please refer to the Google article "MapReduce: Simplified Data Processing on Large Clusters".
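
As a rough illustration of split and aggregate, the following Java sketch uses a local thread pool as a stand-in for grid nodes. This is an assumption made for brevity; it does not use GridGain or InterPSS. The simulateCase function is a hypothetical placeholder for one independent, computation-intensive sub-process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Split/aggregate sketch: local worker threads play the role of grid nodes. */
final class SplitAggregateSketch {

    /** Hypothetical stand-in for one independent sub-process (e.g. one study case). */
    static long simulateCase(int caseId) {
        long acc = 0;
        for (int i = 1; i <= 1_000; i++) {
            acc += ((long) caseId * i) % 7;
        }
        return acc;
    }

    /** Split the cases into one chunk per worker, run the chunks in parallel, sum the sub-results. */
    static long runParallel(int caseCount, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split: worker w takes cases w, w + workers, w + 2 * workers, ...
            List<Future<Long>> subResults = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int first = w;
                Callable<Long> chunk = () -> {
                    long sum = 0;
                    for (int c = first; c < caseCount; c += workers) {
                        sum += simulateCase(c);
                    }
                    return sum;
                };
                subResults.add(pool.submit(chunk));
            }
            // Aggregate: combine the sub-results into one final result.
            long total = 0;
            for (Future<Long> f : subResults) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sub-results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-process failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Reference implementation: the same cases evaluated one after another. */
    static long runSequential(int caseCount) {
        long total = 0;
        for (int c = 0; c < caseCount; c++) {
            total += simulateCase(c);
        }
        return total;
    }

    public static void main(String[] args) {
        // Both paths evaluate every case exactly once, so the totals agree.
        System.out.println(runParallel(2500, 10) == runSequential(2500));
    }
}
```

The sketch only works this simply because the cases are independent of each other; that independence is the same assumption the text above makes about the application.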

GridGain is an open-source project that provides a platform for implementing computational grids. Its goal is to improve the general performance of processing-intensive applications by splitting and parallelizing the workload. GridGain is commonly used to achieve better overall throughput, scalability or availability of services. For a high-level view of the project, we recommend watching the 15-minute GridGain introduction screencast.

The InterPSS grid computing solution is currently based on the GridGain platform. However, InterPSS and GridGain are loosely coupled, which would allow us to move the InterPSS grid computing solution to another grid computing platform relatively easily in the future, if necessary.

Terminology

    • InterPSS Grid Node - A GridGain agent instance, running on a physical computer, to which compiled InterPSS simulation code can be deployed remotely and automatically from the master node through the network to perform a simulation job. One or more grid nodes can be hosted on one physical computer.
    • InterPSS Master Node - A computer with an InterPSS installation. You can have multiple master nodes in your computing grid. You can also run one or more grid nodes on a master node computer.
    • InterPSS Computation Grid - At least one master node and one or more grid nodes in a LAN (Local Area Network) form an InterPSS computational grid.
    • InterPSS Grid Job - A unit of simulation work that can be distributed to any grid node and run independently to perform a power system simulation, such as Loadflow or Transient Stability.
    • InterPSS Grid Task - Represents a simulation problem that can be broken into a set of grid jobs and distributed to grid nodes. A grid task is responsible for splitting itself into grid jobs, distributing them to remote grid nodes, and aggregating the simulation sub-results from the remote grid nodes for decision making or display.

Grid Computing Implementation

With grid computing, solutions to power system simulation problems are in many cases no longer limited by computation speed. One can quickly build a grid computing environment at reasonable cost to achieve high-performance computation.

Simulation Job Creation

InterPSS Grid Computing provides two ways to create simulation jobs, as shown in the above diagram:

    • Master Node Job Creation - This is the default behavior. From a base case, simulation jobs are created at the master node and then distributed to the remote nodes to perform the simulation. If you have many simulation jobs, sending thousands of them through the network in real time may take significant time and might congest the network.
    • Remote Node Job Creation - In many situations, it may be more efficient to distribute the base case to the remote nodes only once. For example, in the case of N-1 contingency analysis, the base case and a list of contingencies can be sent to a remote node. The simulation job for each contingency, for example opening line Bus1->Bus2, can then be created at the remote node.
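
The trade-off between the two modes can be sketched in Java as follows. Everything here is illustrative rather than taken from InterPSS: the Job class, the method names, and the use of character counts as a crude proxy for network traffic are all assumptions. The base case is represented by a plain string standing in for a serialized network model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Compares the two job creation modes; all names and sizes are illustrative. */
final class JobCreationSketch {

    /** Hypothetical grid job: one contingency applied to the base case. */
    static final class Job {
        final String baseCase;      // stand-in for the serialized network model
        final String outagedBranch; // e.g. "Bus1->Bus2"

        Job(String baseCase, String outagedBranch) {
            this.baseCase = baseCase;
            this.outagedBranch = outagedBranch;
        }
    }

    /** Master node job creation: every job sent carries the case plus its contingency. */
    static long masterCreationChars(String baseCase, List<String> contingencies) {
        long chars = 0;
        for (String c : contingencies) {
            chars += baseCase.length() + c.length();
        }
        return chars;
    }

    /** Remote node job creation: the base case goes once to each node, plus the contingency names. */
    static long remoteCreationChars(String baseCase, List<String> contingencies, int nodes) {
        long chars = (long) nodes * baseCase.length();
        for (String c : contingencies) {
            chars += c.length();
        }
        return chars;
    }

    /** Runs on the remote node: build the jobs locally from the base case and the list. */
    static List<Job> createJobsAtRemoteNode(String baseCase, List<String> contingencies) {
        List<Job> jobs = new ArrayList<>();
        for (String branch : contingencies) {
            jobs.add(new Job(baseCase, branch));
        }
        return jobs;
    }

    public static void main(String[] args) {
        String baseCase = new String(new char[100_000]); // placeholder for a serialized case
        List<String> contingencies = Arrays.asList("Bus1->Bus2", "Bus2->Bus3", "Bus3->Bus4");
        System.out.println("master creation: "
                + masterCreationChars(baseCase, contingencies) + " chars sent");
        System.out.println("remote creation: "
                + remoteCreationChars(baseCase, contingencies, 2) + " chars sent");
        System.out.println("jobs built remotely: "
                + createJobsAtRemoteNode(baseCase, contingencies).size());
    }
}
```

Under these assumptions, remote creation sends less data whenever there are more contingencies than grid nodes, since the base case dominates the message size.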

When defining InterPSS custom run XML scripts, you can set remoteJobCreation to true to tell InterPSS to perform remote node job creation.

<ipss:gridRun>
    <ipss:enableGridRun>true</ipss:enableGridRun>
    <ipss:remoteJobCreation>true</ipss:remoteJobCreation>
    ...
</ipss:gridRun>

Near Real-time Batch Processing Based Solution

Ref. [4] describes a near real-time, batch-processing-based parallel simulation approach. In this approach, a job scheduler is used to start multiple PSS/E instances to perform power system simulation in parallel. In the information technology world, this approach falls into the near real-time batch processing category of simulation process automation. While there are some benefits, this kind of approach is known to have serious drawbacks. Large IT organizations around the world are in the process of migrating their batch processing architectures to a real-time service-oriented architecture (SOA).

Jim Waldo, Geoff Wyant, Ann Wollrath and Sam Kendall of Sun Microsystems, in Ref. [3], wrote: "We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. ... Further, work in distributed object-oriented systems that is based on a model that ignores or denies these differences is doomed to failure, and could easily lead to an industry-wide rejection of the notion of distributed object-based systems." Commercial power system simulation software currently on the market was conceived 20-30 years ago and intended to run on one computer, that is, in a "single address space". For example, none of these programs is multi-thread safe.

References

[1] W. F. Tinney, et al., "Direct Solutions of Sparse Network Equations by Optimally Ordered Triangular Factorization", Proceedings of the IEEE, Vol. 55, No. 11, pp. 1801-1809, 1967

[2] B. Stott, et al., "Fast Decoupled Load Flow", IEEE Trans. on PAS, Vol. PAS-93, No. 3, pp. 859-869, Mar 1974

[3] Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall, "A Note on Distributed Computing"

[4] Yachi Lin, et al., "Simple Application of Parallel Processing Techniques to Power System Contingency Calculations", North American T&D Conference & Expo, June 13-15, 2006, Montreal, Canada