#### Performance Analysis of Multiple Bus

### Interconnection Networks with Hierarchical Requesting Model

## Jang-Ping Sheu

#### Department of Electrical Engineering National Central University

#### Abstract

In this paper, we study the performance of multiple bus networks with full bus-memory connection, single bus-memory connection, and partial bus-memory connection. Besides, we also propose one type of multiple bus networks, called partial bus networks with K classes. Under a nonuniform requesting model, hierarchical requesting model, the performance of the above multiple bus networks is analyzed. The cost and fault-tolerant capability of each multiple bus network is evaluated and compared with one another. It can be shown that the proposed networks are useful in applications requiring high performance and degree of fault-tolerance with moderate cost.

#### I. Introduction

With the advent of VLSI technologies, a great deal of attention has been paid to the design of multiprocessor systems to achieve high levels of computation power. However, the performance of a multiprocessor system depends significantly on the efficiency of its interconnection network. Several interconnection networks have been proposed for the multiprocessor systems, such as the crossbar, single bus, multiple bus, multistage interconnection networks and others [5].

The multiple bus networks with the following features become an attractive solution for connecting processors and memory modules in a multiprocessor system [4,8]. First, they provide a moderate throughput and cost comparing to that of the single bus networks and the crossbars. Second, they allow easy incremental expansion as the number of processors, memory modules, and buses grow. Finally, the multiple bus networks possess fault-tolerant capability. In case a bus fails, the multiprocessor system can still function with other nonfaulty ones.

This paper is concerned with studying the performance of various multiple bus multiprocessor systems containing N processors, M memory modules, and B buses, where the memory modules are shared among all processors and  $B \leq \min(M,N)$ . One type of N x M x B multiple bus networks is shown in Figure 1.1. There are N processors, M memory modules, and B buses. Each bus is connected to all N processors and M memory modules. Many performance measures can be used to evaluate a system. Here, we shall use the effective memory bandwidth as a performance metric. The memory bandwidth is defined as the number of successful requests per memory cycle [1,2,6,11]. Wen-Tsuen Chen

#### Institute of Computer and Decision Sciences National Tsing Hua University

In this paper, we propose an architecture of N x M x B multiple bus networks, called partial bus networks with K classes. In this architecture, each processor is connected to all buses. However, each memory module is connected only to a subset of buses. It is more flexible and less costly than that of multiple bus network with full connection of each processor and memory module to all buses. Under a nonuniform memory reference model, called hierarchical requesting model [3], the performance of the proposed N x M x B networks and other earlier proposed ones is analyzed and compared with one another. The cost and fault-tolerance of these networks are also evaluated.

# II. The Multiple Bus Networks and Their Cost

In this section, we first define various types of multiple bus networks, then evaluate their cost and fault-tolerant capabilities.

#### A. Multiple bus networks

Performance analyses of the multiple bus networks have appeared recently in several papers [4,6,7,10,11,12]. All the above authors focus their attention on the multiple bus network with full connection of each processor and memory module to all the buses. Such a multiple bus network is too costly for large multiprocessor systems. Lang et al. [8] proposed a less costly type of multiple bus networks, called partial bus networks. In an N x M x B partial bus architecture, the shared memory modules and buses are divided into g groups. All the processors are connected to all the buses, whereas each group of M/g memory modules is connected to a set of B/g buses. Figure 2.1 shows a partial bus network with g = 2.

Here, we propose another architecture of N x M x B multiple bus networks, called partial bus networks with K classes. In this type of network, there are K classes of memory modules, where K  $\leq$  B. The memory modules in class C<sub>k</sub> are connected to B buses from bus 1 to bus B, memory modules in C<sub>k-1</sub> are connected to B - 1 buses from bus 1 to bus B - 1. In general, memory modules in class C<sub>i</sub> are connected to i + B - K buses from bus 1 to bus i + B - K, for 1  $\leq$  i  $\leq$  K. A 3 x 6 x 4 partial bus network with three classes is shown in Figure 2.2.

With our proposed networks, we can have the following two principles for the memory modules being connected to the buses in order to enhance system fault-tolerance and performance. One is that the memory modules which need higher fault-tolerance for buses failure are connected to more number of buses than those which need lower fault-tolerance for buses failure. The other is that the memory modules which are more frequently referenced are



Figure 1.1 An Nx M x B multiple bus network.



Figure 2.1 An N x M x B partial bus network with g = 2.

connected to more number of buses than those which are less frequently referenced. As will be clear in the following sections, the performance and cost of the partial bus networks with K classes are close to the partial bus networks with g = 2. However, the fault-tolerance of the former is more flexible than that of the latter.

For ease of description, we give the following definitions for various types of multiple bus networks. A multiple bus network with full bus-memory connection is one with each processor and each memory module connected to all buses. A multiple bus network with single bus-memory connection is one with each processor connected to all buses, but each memory module is only connected to a single bus. A multiple bus network with partial bus-memory connection means that each processor of the network is connected to all buses, but each of its memory module can be connected to a subset of buses. The partial bus networks proposed in [8] and the partial bus networks with K classes are two examples of the multiple bus networks with partial bus-memory connection.

In the above N x M x B multiple bus networks, two types of requesting conflicts can occur. One type of conflict arises when more than one processor attempts to access the same idle memory module simultaneously, or a referenced memory module might be busy at the requesting time. This is called memory contention or memory interference. The other type of conflict arises when one or more processors attempt to access an idle memory module but no buses are available. This is called bus contention or bus interference. A two-stage arbitration scheme proposed by Lang et al. [8] can be used to resolve memory and bus contentions.



Figure 2. 2 A 3 x 6 x 4 partial bus network with three classes .

## **B.** Cost analysis

The cost and fault-tolerance of the partial bus networks with K classes will be evaluated in the following. The cost of the multiple bus networks is measured by the number of connections and the load of each bus. It is obvious that the cost of a multiple bus network is proportional to the number of connections in the network. The capacitive loads and drive requirements of a bus are proportional to the number of connections on the bus.

The number of connections of the N x M x B partial bus network with K classes is proportional to

$$NB + \sum_{i=1}^{K} Mi(i + B - K),$$

where Mi is the number of memory modules in class  $C_i$ , for  $1 \leq i \leq K$ . According to the connection scheme of this type of networks, bus B is connected by the memory modules in class  $C_k$ , bus B-1 is connected by the memory modules in class  $C_k$  and class  $C_{k-i}$ . In general, bus i is connected by the memory modules which belong to classes K, K-1, ..., max(i+K-B, 1). Thus, the load of each bus i is proportional to

$$N + \sum_{i=\max(i+\kappa-B,1)}^{\kappa} M_{i}, \text{ for } 1 \le i \le B$$

Because the memory modules in class  $C_1$  are only connected to B - K + 1 buses, the degree of fault-tolerant of this network is equal to B - K. However, accesses to the memory modules in class  $C_i$  are more fault-tolerant than those to class  $C_{i-1}$ , for  $i \leq K$ .

The cost and fault-tolerant capability of various types of multiple bus networks are summarized in Table 2.1. In view of the results of Table 2.1 we conclude that the cost and fault-tolerant capability of networks with partial bus-memory connection scheme are intermediate between the networks with full bus-memory connection and the networks with single bus-memory connection. In section IV, the performance-cost ratio of various types of multiple bus networks will be compared with one another.

### III. Performance Analysis

Most of the previous performance analyses of the multiple bus networks are based on the assumption that a processor addresses any one of the shared memory modules with the same probability. In Das and Bhuyan [4], they assumed that each processor is likely to address a

<u>Table 2.1</u> The cost and fault-tolerance of various multiple bus networks

| Connection<br>schemes                                   | No. of connections              | Load of each<br>bus i                  | Degree of<br>Fault-<br>Tolerance |
|---------------------------------------------------------|---------------------------------|----------------------------------------|----------------------------------|
| Multiple bus<br>with full<br>bus-memory<br>connection   | B(N + M)                        | N + M                                  | B – 1                            |
| Multiple bus<br>with single<br>bus-memory<br>connection | BN + M                          | N + Mi                                 | 0                                |
| Partial bus<br>network                                  | B(N + M/g)                      | N + M/g                                | B/g-1                            |
| Partial<br>bus network<br>with K<br>classes             | $NB + \sum_{i=1}^{K} Mi(i+B-K)$ | $N + \sum_{j=\max(i+k-B,1)}^{k} M_{j}$ | B – K                            |

particular memory module more frequently than others. The equally likely requesting case is a special case of the Das's model. In this paper, a general memory reference model, called hierarchical requesting model, is proposed. Under this model, the performance of various bus-memory connection schemes of the N x M x B multiple bus networks is analyzed. In the following, we shall describe the hierarchical requesting model.

## A. The hierarchical requesting model

In the multiprocessing environment, a job to be run on the system usually consists of a set of communicating tasks. To execute these tasks efficiently, the system should be organized in such a way that communication overhead among these tasks is minimized. The task assignment procedure should assign those tasks that have large amount of communications to the same processor or to a cluster of processors with low communication cost. It leads to that the traffic between a processor and other processors belonging to the same cluster is higher than that with those processors belonging to other clusters.

To model such a system, a hierarchical requesting model is proposed [3]. We assume that a cluster of processors have a cluster of memory modules as their favorite memory modules. These memory modules may be used for storing the tasks assigned for these processors. Besides, the relations of the processors with their favorite memory modules can be classified into n-level hierarchy. Each processor has different fractions of requests to the memory modules belonging to different level of subclusters. In the following, we shall describe the n-level hierarchy for the N x N x B multiple bus networks and N x M x B multiple bus networks.

For an N x N x B multiple bus network, we assume that  $N = k_1 k_2 \dots k_n$ . Each processor  $P_i$  has a memory module  $MM_i$  as its favorite memory module, for  $1 \le i \le N$ . These processors and memory modules are organized into an n-level hierarchy. First, the N processors and memory modules are partitioned into  $k_1$  clusters in the first level, each cluster contains N/k<sub>1</sub> pairs of processor-memory. In the second level, each of  $k_1$  clusters is partitioned into  $k_2$  subclusters with equal size, and so on. Finally, each subcluster in the (n-1)th level contains  $k_n$  pairs of processor-memory. For example, with a three-level

hierarchy, an N x N x B network, where  $N = k_1 k_2 k_3$ , can be partitioned into  $k_1$  clusters, and each cluster can be partitioned into  $k_2$  subclusters again. Finally, each subcluster contains  $k_3$  pairs of processor and its favorite memory module.

With an n-level hierarchy, there are n+1 different requesting rates for a processor accessing to the memory modules in different subclusters. On the other hand, each memory module is requested by different processors with different requesting rates. Each processor has n+1 types of requests: namely, requests to its favorite memory module with fraction  $m_0$ , requests to each of other memory modules in the same subcluster of the (n-1)th level except its favorite memory module with fraction  $m_1$ , requests to each of the memory modules in the same subcluster of the (n-2)th level excluding the memory modules in the (n-1)th level with fraction  $m_2$ , requests to each of the memory modules in the same subcluster of the (n-3)th level excluding the memory modules in the (n-2)th level with fraction  $m_3$ , and so on. We assume that the 0th level includes the whole network.

In general, the fraction of a processor requesting connection to its favorite memory module is higher than that of requesting connection to nonfavorite ones. Furthermore, the fraction of requests to a memory module within the same subcluster is higher than that of requests to a memory module in other subclusters. That is,  $m_0 > m_1 > \dots > m_n$ . Let  $N_i$  be the number of processors or memory modules belonging to the same subcluster in the (n-i)th level, excluding those in the (n-i+1)th level, for  $0 \le i \le n$ . The  $N_i$ 's can be computed as follows

$$N_{i} = (k_{n-i+1} - 1) k_{n-i+2} \dots k_{n-1} k_{n}, \text{ for } 1 \leq i \leq n, \text{ and}$$

$$\sum_{i=1}^{n} N_{i} = 1, \quad \text{where } N_{0} = 1. \quad (1)$$

For example, with a three-level hierarchy, let  $N = k_1 k_2 k_3$ . From formula (1), we have  $N_0 = 1$ ,  $N_1 = k_3 - 1$ ,  $N_2 = (k_2 - 1) k_3$ , and  $N_3 = (k_1 - 1) k_2 k_3$ .

In the following, let us consider an N x M x B multiple bus networks, N and M are restricted to  $N = k_1 k_2 \dots k_{n-1} k_n$  and  $M = k_1 k_2 \dots k_{n-1} k'_n$ , respectively. The partitioning way of the N x M x B networks is the same as that of the N x N x B networks. However, each subcluster in the (n-1)th level of the N x M x B networks contains  $k_n$  processors and  $k'_n$  memory modules. The  $k'_n$ memory modules in the (n-1)th level are used as the favorite memory modules of the  $k_n$  processors with the same subcluster. We assume that each processor with equal probability requests to any one of its favorite memory modules. For example, with a three-level hierarchy, an N x M x B network, where  $N = k_1 k_2 k_3$  and  $M = k_1 k_2 k'_3$ , can be partitioned into  $k_1$  clusters, and each cluster can be partitioned into  $k_2$  subclusters again. Finally, each subcluster contains  $k_3$  processors and  $k'_3$ favorite memory modules.

With an n-level hierarchy, there are n different requesting rates for a processor accessing to the memory modules in different subclusters. In fact, the requesting case of the N x M x B networks can be treated as that a

i=0

special case of the N x N x B networks with  $m_0 = 0$ . So, we only consider the case of N x N x B multiple bus networks in the performance analysis. The performance of the N x M x B networks can be obtained from the formulas derived in the case of N x N x B networks with  $m_0 = 0.$ 

Based on the hierarchical requesting model, the performance of various types of N x N x B multiple bus networks is analyzed with the following assumptions:

1) The multiple bus networks operate in a synchronous mode. The requests of all processors are issued at the same time and each processor has an identical memory cycle time. 2) Each processor P<sub>i</sub> generates random and independent

requests.

requests. 3) At the beginning of every memory cycle, each processor generates a new request with probability r. Thus, r is also the average number of requests generated per memory cycle by each processor. 4) The propagation delays and arbitration times associated with the multiple bus networks are included

in the memory cycle time.

5) The requests which are blocked (not accepted) are ignored. That is, the requests issued at the next cycle are independent of the previous cycle.

### B. The multiple bus networks with full bus-memory connection

The performance of the multiple bus networks with various bus-memory connection schemes can be analyzed by considering memory interference and bus interference in the networks. First, we consider the memory interference. Let X be the probability that there is at least one request for a particular memory module  $MM_i$ . Let  $P_0$  be the probability of a processor  $P_j$  requesting a connection to its favorite memory module  $MM_j$ . Let  $P_i$  be the probability that at least one request is generated by those processors which have fraction  $m_i$  requesting connection to memory module  $MM_j$ , for  $1 \le i \le n$ . The number of processors with fraction  $m_i$  requesting connection to  $MM_j$  is equal to  $N_i$  as given in formula (1). It follows that

$$P_{i} = 1 - (1 - r m_{i})^{N_{i}}$$

Hence, the probability of at least one processor requesting connection to MM; is

...

$$X = 1 - (1 - P_0)(1 - P_1) \dots (1 - P_n)$$
  
= 1 - (1 - r m\_0)(1 - r m\_1)^{N\_1} \dots (1 - r m\_n)^{N\_n}. (2)

Second, we consider the bus interference. The multiple bus network with B buses can allow at most B requests per memory cycle. With the probability X of at least one processor requesting connection to a memory module given by (2), the probability that exactly i of the N memory-request arbiters output a memory request is given by

$$P(i) = {\binom{N}{i}} X^{i} (1 - X)^{N - i}.$$
 (3)

The network gets saturated when more than B requests are generated and allows only B processor-memory connections simultaneously. As a result, the memory bandwidth MBWf of the multiple bus networks with full bus-memory connection is given by

$$MBWf = \sum_{i=1}^{B} i P(i) + \sum_{i=B+1}^{N} B P(i)$$
$$= \sum_{i=1}^{N} i P(i) - \sum_{i=B+1}^{N} (i-B) P(i)$$
$$= NX - \sum_{i=B+1}^{N} (i-B) P(i). \quad (4)$$

The memory bandwidth MBWs of the multiple bus networks with single bus-memory connection can be derived as follows. Let Mi be the number of memory modules connected to the bus i, for  $1 \leq i \leq B$ . Then the probability that there is at least one memory service in bus i is given by

$$Y_i = 1 - (1 - X)^{Mi},$$
 (5)

- - -

where X is the probability that there is at least one processor requesting to a particular memory module and is given by (2). Thus, the memory bandwidth of the network is expressed as

$$MBWs = \sum_{i=1}^{B} Y_i.$$
 (6)

## C. The partial bus networks

In this subsection, we shall derive the performance of the partial bus networks under the hierarchical requesting model. In fact, the memory interference analysis for the partial bus networks is the same as before, since it is independent of the bus configuration. However, the bus interference analysis needs some modification.

Assume that an N x N x B partial bus network is partitioned into g equal groups. Each of N/g memory modules is connected to all B/g buses. The equation resulted from the memory interference is the same as equation (2). Consider the bus interference in each group of buses. Equation (3) can be rewritten as follows

$$P_{\mathbf{g}}(\mathbf{i}) = {\binom{N/g}{\mathbf{i}} \mathbf{X}^{\mathbf{i}} (1 - \mathbf{X})^{N/g - \mathbf{i}}}.$$
 (7)

Because each group of B/g buses allows at most B/g requests, the memory bandwidth of each subnetwork with B/g buses and N/g memory modules is given by

$$MBWg = \sum_{i=1}^{B/g} \sum_{i=1}^{N/g} P_{g}(i) + \sum_{i=B/g+1}^{N/g} \sum_{i=B/g+1}^{N/g} P_{g}(i)$$
$$= \sum_{i=1}^{N/g} P_{g}(i) - \sum_{i=B/g+1}^{N/g} \sum_{i=B/g+1}^{N/g} P_{g}(i)$$
$$= (N/g) X - \sum_{i=B/g+1}^{N/g} \sum_{i=B/g+1}^{N/g} P_{g}(i). (8)$$

The memory bandwidth MBWp of the partial bus networks can be obtained from the summation of g groups of subnetworks. Hence, we obtain

$$MBW_{P} = g MBWg$$
  
= N X -  $\sum_{i=B/g+1}^{N/g} (g i - B) P_{g}(i).$  (9)

If g = 1, then equation (9) is equal to equation (4).

## D. The partial bus networks with K classes

Before presenting the memory bandwidth of the N x N x B partial bus networks with K classes, we give a simple and fair bus-assignment procedure for assigning the requested memory modules in each class  $C_i$  to their i + B - K connected buses. The bus-assignment procedure can be divided into two steps that are described in [9].

In the first step, it concerns to select the requested memory modules from each class  $C_i$  and assign them to the i + B - K connected buses. For each class  $C_i$ , min(i+B-K, R) memory modules from the R memory modules which have at least one request are selected. The selected memory modules in class  $C_i$  are assigned to the buses from bus i+B-K to bus i + B - K - min(i+B-K, R) + 1. For example, let B = 4 and K = 3. The memory modules in class  $C_2$  are connected buses 3, 2, and 1. If there are three requested memory modules are selected from class  $C_2$ , then the buses 3, 2, and 1 will be assigned to the selected memory modules. After the first step, a bus i may be requested by several memory-request arbiters from different classes. In the second step, each bus arbiter makes assignment in a random selection or cyclic fashion. Based on the bus-assignment procedure, the memory bandwidth of the network can be derived by the following method.

Assume that each class  $C_i$  contains Mi memory modules, for  $1 \leq i \leq K$ . In the part of memory interference, let X be the probability that there is at least one request for a particular memory module  $MM_{jk}$  which belongs to class  $C_j$  and X is derived from equation (2). In the part of bus interference, let  $Y_i$  be the probability that there is at least one memory-request arbiter output a request in bus i. Then the memory bandwidth of the partial bus network with K classes is equal to

$$MBWp' = \sum_{i=1}^{B} Y_i$$

The formulas of  $Y_i$ 's are derived as follows. Given the X, the probability that exactly m memory services are requested to the memory modules in class  $C_j$  is given by

$$P_{j}(\mathbf{m}) = \begin{pmatrix} \mathbf{M}j \\ \mathbf{m} \end{pmatrix} \mathbf{X}^{\mathbf{m}} (1-\mathbf{X}), \quad \text{for } 1 \leq j \leq \mathbf{K}.$$
(10)

From the connection scheme of the network, the bus i is connected by the memory modules which belong to classes K, K - 1, ..., max(i+K-B, 1). According to the bus-assignment procedures, the bus B will be requested if there is at least one memory service in the class  $C_k$ . Thus,

## $\mathbf{Y}\mathbf{B} = 1 - P_{\mathbf{k}}(0).$

The case that bus B - 1 is not requested is given by the conditions of no any memory service in class  $C_{k-1}$  and at most one memory service in class  $C_k$ . That is,

$$Y_{B-1} = 1 - P_{k-1}(0)(P_k(0) + P_k(1))$$

In general, the case that bus i is not requested is given by the conditions of no any memory service in class  $C_{i+k-B_i}$ , at most one memory service in class  $C_{i+k-B_{+1}}$ , at most two memory services in class  $C_{i+k-B_{+2}}$ ..., and at most B - imemory services in class  $C_k$ . Notice that, if the subscript d of a class  $C_d$  is smaller than one, then this class is a dummy (empty) class. Then equation (10) for the dummy class  $C_d$  is

$$\begin{array}{rll} P_{\mathbf{d}}(0) &= 1, & \text{and} \\ P_{\mathbf{d}}(\mathbf{m}) &= 0 & \text{for } \mathbf{m} > 0, & \text{where } \mathbf{d} \leq 0. \end{array}$$

Let A = i+K-B, then

$$Y_{i} = 1 - P_{A}(0)(P_{A+1}(0) + P_{A+1}(1)) \dots (P_{k}(0) + \dots + P_{k}(B-i))$$
  
= 1 - 
$$\prod_{i=A}^{K} \sum_{m=0}^{j-A} P_{j}(m), \qquad (11)$$

where  $P_j(m)$  is equal to zero if m > Mj. Hence, the memory bandwidth of the partial bus network with K classes is

$$MBWp' = \sum_{i=1}^{B} (1 - \prod_{j=A}^{K} \sum_{m=0}^{j=A} P_j(m))$$
  
= 
$$B - \sum_{i=1}^{B} \prod_{j=A}^{K} \sum_{m=0}^{j-A} P_j(m).$$
(12)

Notice that, if K = 1, i.e., there is only one class, then equation (12) is equal to equation (4). Therefore, the multiple bus networks with full bus-memory connection can be considered as a special case of the partial bus networks with K classes.

## IV. Numerical Results

In this section, we give some numerical results obtained from our analyses for the N x N x B multiple bus networks with various bus-memory connection schemes. The results are evaluated under the two-level hierarchy and uniform requesting model for r = 1.0 and 0.5. In the uniform requesting case, each processor requests connection to all the memory modules with equal probability. In the two-level hierarchy, we assume that an N x N x B network is partitioned into four clusters, and each cluster contains N/4 processors and memory modules. Each processor with probability 0.6 addressing to its favorite memory module, probability 0.3 addressing to other memory modules within the same cluster, and probability 0.1 addressing to the memory modules in other clusters.

Table 4.1 and Table 4.2 list the memory bandwidth of the N x N x B networks with full bus-memory connection for various values of N and B. The results show that the memory bandwidth in the two-level hierarchy is higher than that in the uniform requesting model for various values of N and B. Under the two-level hierarchy with r = 1.0, a processor requests its favorite memory module most of the time, thereby reducing the memory access conflicts. The memory bandwidth of the system is then most affected by the number of buses. If a processor generates a request in every cycle, then the network should have at least N/2 buses to provide comparable performance with that of the network with N buses or a crossbar network. However, for r = 0.5, Table 4.2 shows that the network with B = N/2 buses performs close to that of network with B = N buses. When r is less than 1, the network with large number of buses is underutilized. The results indicate that the number of buses for the networks should be determined by taking both requesting rate r and requesting pattern into consideration.

|  | Ta | ble | 4. | ] |
|--|----|-----|----|---|
|--|----|-----|----|---|

Memory bandwidth of N x N x B networks with full bus-memory connection for r = 1.0.

| No                                                                                                        | N = 8                                                                                  | N = 12                                               | N = 16                                               |
|-----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|
| Buses                                                                                                     | Hier. Unif                                                                             | Hier. Unif.                                          | Hier. Unif.                                          |
| $     \begin{array}{c}       1 \\       2 \\       4 \\       8 \\       12 \\       16     \end{array} $ | $\begin{array}{cccc} 1.0 & 1.0 \\ 2.0 & 2.0 \\ 3.97 & 3.87 \\ 5.98 & 5.25 \end{array}$ | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ |
| N x N<br>Crossbar                                                                                         | 5.98 5.25                                                                              | 8.86 7.78                                            | 11.78 10.30                                          |

 $\label{eq:table 4.2} \underline{Table 4.2}$  Memory bandwidth of N x N x B networks with

|     | ~            |            |       |        | _ |
|-----|--------------|------------|-------|--------|---|
| ful | l bus-memory | connection | for r | = 0.5. |   |
|     |              |            |       |        |   |

| No.<br>of<br>Buses           | N = 8                                            | N = 12                                               | N = 16                                                                                                                  |  |
|------------------------------|--------------------------------------------------|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|--|
|                              | Hier. Unif.                                      | Hier. Unif.                                          | Hier. Unif.                                                                                                             |  |
| 1<br>2<br>4<br>8<br>12<br>16 | 0.99 0.98<br>1.91 1.88<br>3.15 2.99<br>3.47 3.23 | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ | $\begin{array}{ccccccc} 1.0 & 1.0 \\ 2.0 & 2.0 \\ 3.95 & 3.91 \\ 6.52 & 6.15 \\ 6.86 & 6.37 \\ 6.87 & 6.37 \end{array}$ |  |
| N x N<br>Crossbar            | 3.47 3.23                                        | 5.16 4.80                                            | 6.87 6.37                                                                                                               |  |

Table 4.3 lists the memory bandwidth of the multiple bus networks with single bus-memory connection. The memory bandwidth of the network is evaluated under the case that N memory modules are distributed over the B buses. That is, each bus is connected by N/B memory modules. Table 4.4 lists the memory bandwidth of the partial bus networks with g = 2. The load of each bus is proportional to N + N/2. The results show that the memory bandwidth with the two-level hierarchical requesting model is higher than that with the uniform requesting model for various values of N and B. Comparison of the results in Table 4.3 and Table 4.4 shows that the memory bandwidth of the partial bus networks with g = 2 is higher than that of the networks with single bus-memory connection scheme. However, the cost of the networks with single bus-memory connection is less than that of the partial bus networks.

## <u>Table 4.3</u>

Memory bandwidth of N x N x B networks with single bus-memory connection.

| r = 1.0                                                                                                   |                                                                                                     |                                                      |                                                                                                                            |
|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| No.                                                                                                       | N = 8                                                                                               | N = 16                                               | N = 32                                                                                                                     |
| Buses                                                                                                     | Hier. Unif.                                                                                         | Hier. Unif.                                          | Hier. Unif.                                                                                                                |
| $     \begin{array}{c}       1 \\       2 \\       4 \\       8 \\       16 \\       32     \end{array} $ | 1.0         1.0           1.99         1.97           3.74         3.53           5.97         5.25 | $\begin{array}{rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr$ | $\begin{array}{cccccccc} 1.0 & 1.0 \\ 2.0 & 2.0 \\ 4.0 & 4.0 \\ 7.96 & 7.86 \\ 14.87 & 13.90 \\ 23.48 & 20.41 \end{array}$ |

 Table 4.4

 Memory bandwidth of N x N x B partial bus networks with g = 2.

| r = 1.0                 |                                     |                                                 |                                                      |
|-------------------------|-------------------------------------|-------------------------------------------------|------------------------------------------------------|
| No.                     | N = 8                               | N = 16                                          | N = 32                                               |
| Buses                   | Hier. Unif.                         | Hier. Unif.                                     | Hier. Unif.                                          |
| 2<br>4<br>8<br>16<br>32 | 1.99 1.97<br>3.89 3.73<br>5.97 5.25 | 2.0 2.0<br>4.0 3.99<br>7.92 7.71<br>11.78 10.30 | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ |

Table 4.5 lists the memory bandwidth of the partial bus network with K classes. The results are obtained from the case of the number of classes K = B and each class contains N/K memory modules. The number of connections in the network is proportional to NB + (B + 1)N/2. This connection cost is nearly equal to the partial bus networks with g = 2. The memory bandwidth of both networks are also very close for various values of N and B. In the partial bus networks with K classes, the memory modules which belong to class C<sub>i</sub> can tolerate at least i + B - K - 1 buses failure, while all the memory modules in the partial bus networks can tolerate B/g - 1 buses failure. The fault-tolerance of the former is more flexible than that of the latter.

<u>Table 4.5</u>

memory bandwidth of N x N x B partial bus network with K = B classes.

| r = 1.0                 |                                    |                                                      |                                                      |
|-------------------------|------------------------------------|------------------------------------------------------|------------------------------------------------------|
| No.                     | N = 8                              | N = 16                                               | N = 32                                               |
| Buses                   | Hier. Unif.                        | Hier. Unif.                                          | Hier. Unif.                                          |
| 2<br>4<br>8<br>16<br>32 | 2.0 1.98<br>3.85 3.68<br>5.97 5.25 | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ |

From the performance-cost ratio comparison, the network with single bus-memory connection is more cost-effective than the partial bus networks. However, the single bus-memory connection scheme lacks fault-tolerance. Similarly, comparison of the results of Table 4.1, Table 4.3, Table 4.4 and Table 4.5 shows that the performance of the networks with full bus-memory connection is higher than that of the partial bus networks, but the multiple bus with full bus-memory connection is less cost-effective. The performance, cost, and fault-tolerant capability of the networks with partial bus-memory connection scheme are intermediate between the networks with single bus-memory connection and the networks with full bus-memory connection.

#### V. Conclusions

In this paper, we propose an architecture of N x M x B multiple bus networks, called partial bus networks with K classes. It is more flexible and less costly than that of multiple bus network with full connection of each processor and memory module to all buses. We also propose a general memory reference model, called hierarchical requesting model. Under this model, the performance of the multiple bus networks with various types of bus-memory connections is analyzed. The cost and fault-tolerant capability for various types of N x M x B networks are also compared with one another. The numerical results show that the memory bandwidth of all the networks in the hierarchical requesting case.

The multiple bus networks with full bus-memory connection has higher memory bandwidth but less cost-effective than all the other types of multiple bus networks. The multiple bus network with single bus-memory connection is the most cost-effective, but it lacks fault-tolerance. The performance, cost, and fault-tolerant capability of the networks with partial bus-memory connection scheme are intermediate between the networks with single bus-memory connection and the networks with full bus-memory connection. Which type of the networks is selected would depend on the requirement of the multiprocessor systems.

## REFERENCES

- F. Baskett and A. J. Smith, "Interference in multiprocessor computer systems with interleaved memory," *Commun. of ACM*, Vol. 19, No. 6, pp. 327-334, Jun. 1976.
- [2] D. P. Bhandarkar, "Analysis of memory interference in multiprocessors," *IEEE Trans. Comput.*, Vol. C-24, pp. 897-908, Sept. 1975.
- [3] W. T. Chen and J. P. Sheu, "Performance analysis of multistage interconnection networks with hierarchical requesting model," to appear in *IEEE Trans. Comput*
- [4] C. R. Das and L. N. Bhuyan, "Bandwidth availability of multiple-bus multiprocessors," *IEEE Trans. Comput.*, Vol. C-34, pp. 918-926, Oct. 1985.
- [5] T. Y. Feng, "A survey of interconnection networks," *IEEE Computer*, pp. 12-27, Dec. 1981.
- [6] A. Goyal and T. Agerwala, "Performance analysis of future shared storage systems," *IBM J. Res. Develop.*, Vol. 28, No. 1, pp. 95-108, Jan. 1984.
- [7] K. B. Irani and I. H. Onyuksel, "A closed form solution for the performance analysis of multiple-bus multiprocessor systems," *IEEE Trans. Comput.*, Vol. C-33, pp. 1004-1012, Nov. 1984.
- [8] T. Lang, M. Valero, and I. Alegre, "Bandwidth of crossbar and multiple-bus connections for multiprocessors," *IEEE Trans. Comput.*, Vol. C-31, pp. 1227-1234, Dec. 1982.
- [9] T. Lang M. Valero and M. A. Fiol, "Reduction of connections for multibus organization," *IEEE Trans.* Comput., Vol. C-32, pp. 707-716, Aug. 1983.
- [10] M. A. Marsan and M. Gerla, "Markov models for multiple bus multiprocessor systems," *IEEE Trans. Comput.*, Vol. C-31, pp. 239-248, Mar. 1982.
- [11] T. N. Mudge and H. B. Al-Sadoun, "A semi-markov model for the performance of multiple-bus systems," *IEEE Trans. Comput.*, Vol. C-34, pp. 934-942, Oct. 1985.
- [12] D. Towsley, "Approximate models of multiple bus multiprocessor systems," *IEEE Trans. Comput.*, Vol. C-35, pp. 220-228, Mar. 1986.