Computer Science
MIMD
MIMD, or Multiple Instruction, Multiple Data, is a parallel computing architecture where multiple processors execute different instructions on different pieces of data simultaneously. This allows for independent processing of multiple tasks, making it suitable for complex and diverse computational workloads. MIMD systems can be either shared memory or distributed memory architectures.
Written by Perlego with AI-assistance
12 Key excerpts on "MIMD"
- eBook - PDF
- Henry Levy, Richard Eckhouse(Authors)
- 2014(Publication Date)
- Digital Press(Publisher)
• MIMD (multiple instruction stream, multiple data stream). An MIMD machine, shown in Figure 16.2d, has multiple independent control units, each of which fetches its own instructions. Each control unit then issues instructions to its own arithmetic processor, which fetches operands, possibly from a large shared memory. Most contemporary microprocessor-based multiprocessors are MIMD machines. The MIMD structure accommodates both the execution of multiple independent processes (for example, for timesharing) and the execution of single, multithreaded applications (that is, parallel programs).

The significance of Flynn's classification scheme is that it makes it easier to describe systems. For example, if a machine is built that includes the ability to operate on a vector, such as performing a simple arithmetic operation like add, subtract, multiply, or divide, on all the elements of the vector at the same time, then this machine can be classified as an SIMD machine. Alternatively, if several computers are interconnected so as to share the same memory and file space, but they operate independently of each other, then that machine can be classified as an MIMD computer. There are, of course, many other classifications, depending on the underlying architecture of the machine.

Gordon Bell described machines based on their degree of parallelism. His four categories, shown in Table 16.1, relate more to the needs of the application than to the specific hardware. Bell's model is based on the granularity of parallelism permitted by a particular system. A system with fine-grained parallelism permits an application to spawn multiple parallel threads, each of which executes only a few instructions. In such a system, the overhead of managing the parallel activities must be small, so that the benefits of executing such small parallel tasks are not dominated by that overhead.
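The independent instruction streams described above can be sketched in software. The following is a minimal, illustrative example only, assuming Python's multiprocessing module as a stand-in for separate control units; the task functions and data are invented for the illustration and are not from the excerpt.

```python
# Minimal MIMD-style sketch (assumption: Python multiprocessing as the
# parallel substrate): each process runs a *different* program on its
# *own* data, analogous to independent control units issuing their own
# instruction streams.
from multiprocessing import Process, Queue

def sum_task(data, out):
    # First instruction stream: accumulate one data set.
    out.put(("sum", sum(data)))

def sort_task(data, out):
    # Second, different instruction stream: sort a different data set.
    out.put(("sorted", sorted(data)))

if __name__ == "__main__":
    results = Queue()
    workers = [
        Process(target=sum_task, args=([1, 2, 3, 4], results)),
        Process(target=sort_task, args=([9, 3, 7, 1], results)),
    ]
    for w in workers:   # launch both streams; they run asynchronously
        w.start()
    for w in workers:   # wait for completion
        w.join()
    for _ in workers:
        print(results.get())
```
- eBook - PDF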
- Harry Wechsler(Author)
- 2014(Publication Date)
- Academic Press(Publisher)
MIMD is concerned with interactive processes that share resources and is thus characterized by asynchronous parallelism. Loosely coupled and tightly coupled PEs are the two main subdivisions within MIMD. Multicomputers are the first subclass, and their operating mode is message passing, while the second subclass is labeled multiprocessors, which derives its functionality from PEs sharing memory. As we move from SIMD to MIMD, note that SIMD and MIMD are equivalent in that they can simulate each other: an SIMD machine could interpret the PE data as different instructions, while the MIMD could execute only one instruction rather than many across the PE array. There are no strict boundaries between architectures as we have seen so far, and the basic question is that of efficiency and cost for solving a specific problem. We proceed by looking into the multicomputers and multiprocessors classes, respectively.

9.4.1. Message Passing Multicomputers

Message-passing multicomputers are lattices of PE nodes connected by a message-passing network. The basic computational paradigm is that of concurrency of processes, where processes are instances of programs. The PEs include private memory, and there is a global name space (PE #, process #) for variables across the multicomputer. The (N x N) network is the binary n-cube or mesh and facilitates locality of communication between the N nodes. Multiprogramming operating systems available at the PEs and coordination through message passing facilitate concurrency. Thus, the multicomputers constitute a physical and logical distributed system. Athas and Seitz (1988) provide a good review and taxonomy for such systems. According to the grain size—medium (Mbyte of memory per PE) or fine (Kbyte of memory per PE)—different architectures can be defined.
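As a rough illustration of the message-passing subclass, the sketch below uses a Python multiprocessing Pipe as a stand-in for the interconnection network; this is an assumption for illustration, not the architecture described in the excerpt. Each process keeps its data in private memory and communicates only by sending messages.

```python
# Message-passing sketch (assumption: multiprocessing.Pipe models the
# network). Each node owns its memory; nothing is shared.
from multiprocessing import Process, Pipe

def worker_node(conn):
    # Compute a partial result entirely in private memory ...
    partial = sum(range(100))
    conn.send(partial)      # ... then communicate it as a message.
    conn.close()

def host_node(conn):
    print("received partial sum:", conn.recv())

if __name__ == "__main__":
    host_end, worker_end = Pipe()
    nodes = [Process(target=worker_node, args=(worker_end,)),
             Process(target=host_node, args=(host_end,))]
    for n in nodes:
        n.start()
    for n in nodes:
        n.join()
```
- eBook - PDF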
- Eduard L Lafferty(Author)
- 2012(Publication Date)
- William Andrew(Publisher)
SECTION 2 MACHINE ARCHITECTURES

2.1 Introduction

There are several architectural dimensions to consider in parallel computing systems. How many processors are being used: tens, hundreds, or thousands? How are they interconnected? Can the system be expanded incrementally? Can the architecture be scaled up to support a large number of processors? How powerful are the individual processors? Are they single-bit processors, commercial microprocessors, or powerful custom processors? What is the granularity of parallelism that is supported: fine-grained (at the instruction level); medium-grained (at the cooperating concurrent task level with process synchronization occurring every few hundred machine instructions); or coarse-grained (at the nearly independent process level with process synchronization occurring every few thousand machine instructions)? This section addresses these considerations by discussing some commercially available machines.

2.2 Multiple Instruction Stream, Multiple Data Stream Machines

Multiple instruction stream, multiple data stream (MIMD) machines possess a number of processors that function asynchronously and independently; at any given time different processors may be executing different instructions on different pieces of data. They are ideal for medium to coarse-grained parallelism. The processors used in these machines range from conventional microprocessors, such as the Motorola 68040 used in the Myrias SPS-3, to powerful proprietary vector processors such as those used in the Cray Y-MP. This type of architecture has been used in a number of application areas; for example, computer-aided design/computer-aided manufacturing (CAD/CAM), simulation, modeling, and as communication switches. These MIMD machines can be broken down into the shared-memory and distributed-memory subcategories based on how their processors access memory.
- eBook - PDF
- Vojin G. Oklobdzija(Author)
- 2019(Publication Date)
- CRC Press(Publisher)
A sequential machine is considered to have a single instruction stream executing on a single data stream; this is called SISD. An SIMD machine has a single instruction stream executing on multiple data streams in the same cycle. MIMD has multiple instruction streams executing on multiple data streams simultaneously. All are shown in Fig. 1.17. An MISD is not shown but is considered to be a systolic array.

Four categories of MIMD systems, dataflow, multithreaded, out-of-order execution, and very long instruction word (VLIW), are of particular interest, and seem to be the tendency for the future. These categories can be applied to a single CPU, providing parallelism by having multiple functional units. All four attempt to use fine-grain parallelism to maximize the number of instructions that may be executing in the same cycle. They also use fine-grain parallelism to make use of cycles that could otherwise be lost to the large latency in the execution of an instruction. Latency increases when the execution of one instruction is temporarily stalled while waiting for some resource that is currently not available, such as the results of a cache miss, or even a cache fetch, the results of a floating-point instruction (which takes longer than a simpler instruction), or the availability of a needed functional unit. This could cause delays in the execution of other instructions. If there is very fine grain parallelism, other instructions can use available resources while the stalled instruction is waiting. This is one area where much computing power has been reclaimed.

Two other compelling issues exist in parallel systems: portability (once a program has been developed, it should not need to be recoded to run efficiently on a parallel system) and scalability (the performance of a system should increase in proportion to the size of the system). Scalability is problematic since unexpected bottlenecks occur when more processors are added to many parallel systems.
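The latency-hiding idea, using independent work to fill cycles that would otherwise be lost to a stall, can be mimicked at the software level. The sketch below is only a rough analogy (an assumption of this illustration, not the hardware mechanism the excerpt describes): a simulated long-latency operation overlaps with independent work, so the total elapsed time is close to the longer of the two rather than their sum.

```python
# Software analogy for latency hiding (assumption: time.sleep stands in
# for a long-latency event such as a cache miss).
import time
from concurrent.futures import ThreadPoolExecutor

def long_latency_load():
    time.sleep(0.5)                               # the "stalled" instruction
    return 42

def independent_work():
    return sum(i * i for i in range(100_000))     # work that can proceed

if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        stalled = pool.submit(long_latency_load)
        useful = pool.submit(independent_work)
        print(stalled.result(), useful.result())
    # Elapsed time is close to the 0.5 s stall alone, because the
    # independent work overlapped with it.
    print(f"elapsed: {time.perf_counter() - start:.2f} s")
```
- eBook - PDF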
Embedded Multiprocessors
Scheduling and Synchronization, Second Edition
- Sundararajan Sriram, Shuvra S. Bhattacharyya(Authors)
- 2018(Publication Date)
- CRC Press(Publisher)
2 APPLICATION-SPECIFIC MULTIPROCESSORS

The extensive research results and techniques developed for general purpose high-performance computation are naturally applicable to signal processing systems. However, in an application-specific domain, it is often possible to simplify the parallel machine architecture and the interconnect structure, thereby potentially achieving the requisite performance in terms of throughput and power consumption at a lower system cost. System costs include not only the dollar cost of manufacturing, but also hardware and software development costs, and testing costs; these costs are particularly crucial for the embedded and consumer applications that are targeted by a majority of current DSP-based systems. In this chapter we discuss some application-specific parallel processing strategies that have been employed for signal processing.

2.1 Parallel Architecture Classifications

The classic Flynn categorization of parallel processors as Single Instruction Multiple Data (SIMD) or Multiple Instruction Multiple Data (MIMD) [Fly66] classifies machines according to how they partition control and data among different processing elements. An SIMD machine partitions input data among processors executing identical programs, whereas an MIMD machine partitions input data and allows processors to execute different programs on each data portion. Modern parallel machines may employ a mix of SIMD and MIMD type processing, as we shall see in some of the examples discussed in this section. Parallelism can be exploited at different levels of granularity: the processing elements making up the parallel machine could either be individual functional units (adders, multipliers, etc.) to achieve fine-grain parallelism, or the elements could themselves be self-contained processors that exploit parallelism
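The SIMD/MIMD distinction drawn above, one identical program over data partitions versus different programs per partition, can be illustrated with a small sketch. It assumes Python's multiprocessing Pool purely for illustration; the function names are invented and not from the excerpt.

```python
# SIMD-style vs MIMD-style partitioning (assumption: multiprocessing.Pool
# stands in for the set of processing elements).
from multiprocessing import Pool

def square_all(chunk):          # the single program of the SIMD-style case
    return [x * x for x in chunk]

def negate_all(chunk):          # a different program for the MIMD-style case
    return [-x for x in chunk]

if __name__ == "__main__":
    chunks = [[1, 2, 3], [4, 5, 6]]
    with Pool(2) as pool:
        # SIMD-style: identical program mapped over every data partition.
        simd_like = pool.map(square_all, chunks)
        # MIMD-style: each partition gets its own program.
        tasks = [pool.apply_async(fn, (c,))
                 for fn, c in zip([square_all, negate_all], chunks)]
        mimd_like = [t.get() for t in tasks]
    print(simd_like)   # [[1, 4, 9], [16, 25, 36]]
    print(mimd_like)   # [[1, 4, 9], [-4, -5, -6]]
```
- eBook - PDF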
- H. Stephen Morse(Author)
- 2014(Publication Date)
- Academic Press(Publisher)
For the low end of the market, where, for example, a lab might need to turbo-charge one aspect of an often-repeated calculation, the cost advantages of an SIMD machine, and the naturalness of its programming paradigm for many scientific applications, can make it an attractive alternative. As we will see in Chapter 4, the SIMD programming paradigm is especially powerful for many application areas, and there is increasing interest in implementing this paradigm on top of underlying MIMD hardware. Array operations in FORTRAN 90 are an important example of this trend (see Chapters 4 and 5).

3.4 DISTRIBUTED MEMORY MIMD MACHINES

The following sections describe and assess the second major category of MPPs—machines with an MIMD control structure which provide separate, private memory banks to each processor.

3.4.1 A Top-Level Block Diagram

Recall the three components described in Figure 3-2: the controller, C; the processor, P; and the storage or memory, S. In an SIMD machine, a single controller drives a multitude of slave processors, each with its own private storage. In distributed memory MIMD machines (DM-MIMD), not only the processors and storage are replicated, but also the controllers. This is illustrated in Figure 3-6. What is apparent immediately is that each of the replicated processing nodes (or, more loosely, processors) looks like a replica of the original vanilla-flavor sequential computer pictured in Figure 3-2. Each is capable of storing and executing its own program, on its own data, completely independent of, and asynchronous with, all other processing nodes. The term multicomputer is sometimes used for this type of architecture, and the name fits. Figure 3-6 grossly simplifies the interconnection network. As discussed in Section 3.2, this is often the most expensive, difficult, and critical component of the entire system.
- eBook - PDF
Obstacle Avoidance In Multi-robot Systems, Experiments In Parallel Genetic Algorithms
Experiments in Parallel Genetic Algorithms
- Mark A C Gill, Albert Y Zomaya(Authors)
- 1998(Publication Date)
- World Scientific(Publisher)
SIMD machines are extremely efficient in handling matrix and vector operations where there is inherent parallelism in the data.

Figure 2.4: SIMD architecture

2.2.1.3 MISD Machines

In this class, there are N processing units (Processing unit i, i = 1, 2, ..., N), as shown in Figure 2.5. Each processing unit has its own control unit (Control unit i, i = 1, 2, ..., N), but they share a common memory containing data. There are N separate instructions (Instruction stream i, i = 1, 2, ..., N) that operate simultaneously on the same item of data. Each processing unit does different things to the same data. This type of architecture is very rare and impractical. Systolic arrays fall into this category (Hwang 1993).

2.2.1.4 MIMD Machines

These machines are the most general and most powerful (Akl 1989). In this machine there are N processing units (Processing unit i, i = 1, 2, ..., N), along with their own instruction streams (Instruction stream i, i = 1, 2, ..., N) from their own control units (Control unit i, i = 1, 2, ..., N). Each processing unit receives data from its own data stream (Data stream i, i = 1, 2, ..., N), as shown in Figure 2.6.

Figure 2.6: MIMD architecture

This machine is like a collection of SISD machines operating together asynchronously. There are several varieties of MIMD machines. These range from fine-grained to coarse-grained, and the coarse-grained systems can be further subdivided into loosely coupled and tightly coupled systems. Data flow machines are fine-grained systems, and are data-driven systems (Dennis 1980). Unlike von Neumann machines, instructions are activated by the availability of their operands and not under control of a control unit. In the loosely coupled coarse-grained systems, each processor contains its own local memory and is connected to the others via an interconnection network and shares various resources on the network.
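The dataflow idea mentioned above, instructions firing when their operands become available rather than under a central control unit, can be sketched in a few lines. This is a toy, purely sequential simulation under assumptions of its own (a hand-built dependency graph), not any of the machines cited in the excerpt.

```python
# Toy dataflow simulation (assumption: operations stored as
# name -> (function, operand names); an operation "fires" once all of its
# operands have produced values).
operations = {
    "a": (lambda: 3, ()),
    "b": (lambda: 4, ()),
    "c": (lambda a, b: a + b, ("a", "b")),
    "d": (lambda c: c * 10, ("c",)),
}

values = {}
pending = dict(operations)
while pending:
    for name, (fn, deps) in list(pending.items()):
        if all(d in values for d in deps):                  # operands ready?
            values[name] = fn(*(values[d] for d in deps))   # fire
            del pending[name]

print(values)   # {'a': 3, 'b': 4, 'c': 7, 'd': 70}
```
- eBook - PDF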
Control and Dynamic Systems V40: Advances in Robotic Systems Part 2 of 2
Advances in Theory and Applications
- C.T. Leondes(Author)
- 2012(Publication Date)
- Academic Press(Publisher)
Finally, some concluding remarks are made in the last section.

II. BACKGROUND

In the literature, a wide variety of MIMD (Multiple Instruction-Multiple Data) and SIMD (Single Instruction-Multiple Data) algorithms (and architectures) for computation of the inverse dynamics problem are proposed. Several pipeline algorithms are also reported. In their pioneering work, Luh and Lin [13] developed an MIMD algorithm by decomposing the N-E formulation for the Stanford arm into a task graph. They considered a linear array of processors, i.e., an MIMD architecture with local shared memory, and proposed a branch-and-bound technique for optimal mapping of the task graph on the architecture. Kasahara and Narita [14] also considered the Stanford arm and used a different scheduling scheme for a bus-connected architecture, i.e., an MIMD architecture with global shared memory. Barhen [15,16] considered a hypercube architecture (NCUBE), i.e., an MIMD message-passing architecture, and developed a load-balancing scheme to map the task graph of the Stanford arm onto the hypercube space. SIMD algorithms are reported by Lathrop [17], and Lee and Chang [18]. Pipeline algorithms are proposed by Lathrop [17], Lee and Chang [18], and Orin et al. [19]. Some researchers have argued that the N-E formulation, due to its recursive form, is inherently serial and have proposed different formulations or modified versions of the N-E formulation to achieve a higher degree of parallelism in the computation [20]-[23]. Zheng and Hemami [20] presented an algorithm based on the N-E state-space formulation developed by Hemami [21]. Binder and Herzog [22] proposed an algorithm in which parallelism in the N-E formulation was increased by replacing the propagating variables with predicted ones. It is important to note that in both approaches an MIMD architecture with local shared memory was considered.
- eBook - PDF
- Zbigniew J. Czech(Author)
- 2017(Publication Date)
- Cambridge University Press(Publisher)
The SIMD computations employing the MMX and SSE extensions are used in the multimedia field, permitting faster processing of multimedia files including images, videos, and audio tracks. SIMD systems are also increasingly being used in graphics processing units (GPUs), which we discuss in more detail in Section 5.7.

5.3 MULTIPROCESSOR COMPUTERS

Multiprocessor computers, or briefly multiprocessors, contain a multiplicity of independent processors. Each processor operates on a separate clock and is equipped with its own memory, arithmetic registers, program counter, etc. Processors asynchronously perform multiple streams of instructions that process different data. (In the early 1990s, processor arrays accounted for only a few percent of the annually compiled lists of the 500 parallel computers with the highest rates of computation (www.top500.org); since 1998 processor arrays have not appeared on these lists.) The multiple streams of instructions may be related to each other. For example, they may correspond to subproblems that have been identified during problem decomposition. Therefore each multiprocessor is provided with appropriate hardware and software means permitting processors to communicate with each other. According to Flynn's taxonomy, a multiprocessor is a multiple-instruction multiple-data (MIMD) architecture. Multiprocessors can be divided into two categories: multiprocessors with shared memory and multiprocessors with distributed memory.

5.3.1 Shared-memory Multiprocessors

In this architecture, processors share a common (also called global or shared) memory address space that allows them to store results of computation and to communicate with each other (Figure 5.6, a structure of a typical shared-memory multiprocessor).
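A minimal shared-memory sketch in the spirit of this subsection, assuming Python's multiprocessing shared Value and Lock as the common address space (an illustrative stand-in, not the hardware of Figure 5.6): several processes communicate by updating the same memory location.

```python
# Shared-memory sketch (assumption: multiprocessing.Value models the
# shared address space, Lock models the coordination mechanism).
from multiprocessing import Process, Value, Lock

def add_chunk(total, lock, chunk):
    local = sum(chunk)          # private computation
    with lock:                  # coordinated access to shared memory
        total.value += local

if __name__ == "__main__":
    total = Value("i", 0)       # shared integer visible to all processes
    lock = Lock()
    chunks = [range(0, 50), range(50, 100)]
    procs = [Process(target=add_chunk, args=(total, lock, c)) for c in chunks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(total.value)          # 4950
```
- eBook - PDF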
- S. Levialdi(Author)
- 1988(Publication Date)
- Academic Press(Publisher)
Chapter Eight: Image Processing Experiments on a Commercial MIMD System
L. Carrioli, S. Cuniolo and M. Ferretti, Dipartimento di Informatica e Sistemistica, Pavia, and I.A.N. C.N.R., Pavia, Italy

1 INTRODUCTION

This paper describes some experiments carried out to evaluate the performance of the multiprocessor machine Sequent Balance 8000 in image processing. The goal of these tests is to verify whether a general purpose commercial MIMD machine can support image processing tasks, which, at the early processing level, require the elaboration of a great amount of data. Although much work has been done to develop dedicated machines (a good review of such architectures can be found in [1-3]) by means of which true real time can be reached in low-level image processing, it is worth noting that other kinds of machines can be successfully used in many cases. The main advantage in using commercially available machines is the completeness of these systems from both the hardware and software points of view. In fact they do not require any front-end computer for the man-machine interface and for mass storage manipulation, any controller, any ad-hoc input device, or any particular programming language. Moreover, the flexibility of a MIMD machine permits both low-level and high-level image processing. From the user's point of view, the availability of a well known operating system (with all its programming facilities) and of parallelization tools inserted in the language structure allows a quick design and an easy implementation of the algorithms. After the description of the machine (a common bus multiprocessor machine), some experimental results are illustrated. They have been collected on a set of three well known image processing tasks (thresholding, mathematical morphology closing, and distance transform); in measuring performance, we have used a set of parameters which try to capture all sources of overhead introduced by system activity and the parallelizing strategy.
- eBook - PDF
Computational Physics
Problem Solving with Python
- Rubin H. Landau, Manuel J Páez, Cristian C. Bordeianu, Manuel J. Páez(Authors)
- 2015(Publication Date)
- Wiley-VCH(Publisher)
Although we recognize that there are major differences between the clusters on the top 500 list of computers and the ones that a university researcher may set up in his or her lab, we will not distinguish these fine points in the introductory materials we present here. For a message-passing program to be successful, the data must be divided among nodes so that, at least for a while, each node has all the data it needs to run an independent subtask. When a program begins execution, data are sent to all the nodes. When all the nodes have completed their subtasks, they exchange data again in order for each node to have a complete new set of data to perform the next subtask. This repeated cycle of data exchange followed by processing continues until the full task is completed. Message-passing MIMD programs are also single-program, multiple-data programs, which means that the programmer writes a single program that is executed on all the nodes. Often a separate host program, which starts the programs on the nodes, reads the input files and organizes the output.

10.10 Parallel Performance

Imagine a cafeteria line in which all the servers appear to be working hard and fast yet the ketchup dispenser has some relish partially blocking its output and so everyone in line must wait for the ketchup lovers up front to ruin their food before moving on. This is an example of the slowest step in a complex process determining the overall rate. An analogous situation holds for parallel processing, where the ketchup dispenser may be a relatively small part of the program that can be executed only as a series of serial steps. Because the computation cannot advance until these serial steps are completed, this small part of the program may end up being the bottleneck of the program. As we soon will demonstrate, the speedup of a program will not be significant unless you can get ∼90% of it to run in parallel, and even then most of the speedup
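The excerpt's claim about needing roughly 90% of a program to run in parallel can be checked with Amdahl's law, S = 1/((1 - f) + f/p), where f is the parallel fraction and p the number of processors; naming it Amdahl's law is an assumption here, since the excerpt stops before stating the formula.

```python
# Amdahl's law check (assumption: the excerpt's "speedup" follows the
# standard fixed-workload model).
def amdahl_speedup(parallel_fraction, processors):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)

for f in (0.50, 0.90, 0.99):
    print(f"parallel fraction {f:.0%}: "
          f"speedup on 16 nodes = {amdahl_speedup(f, 16):.1f}, "
          f"on 256 nodes = {amdahl_speedup(f, 256):.1f}")
# Even with 90% of the program parallel, 256 nodes yield only about a
# 9.7x speedup; the serial 10% dominates, as the cafeteria analogy suggests.
```
- eBook - ePub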
- Sajjan G. Shiva(Author)
- 2018(Publication Date)
- CRC Press(Publisher)
R/C ratio. The definitions of Section 5.4 for the speedup, the efficiency, and the cost of parallel computer architectures apply to MIMD systems also, as illustrated by the following examples.

Example 6.9

Consider again the problem of accumulating N numbers. The execution time on an SISD is of O(N). On an MIMD with N processors and a ring interconnection network between the processors, the execution requires (N − 1) time units for communication and (N − 1) time units for addition. Thus the total time required is 2(N − 1), or O(2N), and hence the speedup is N/(2(N − 1)), which is approximately 0.5.

If the processors in the MIMD are interconnected by a hypercube network, this computation requires log2 N communication steps and log2 N additions, resulting in a total run time of 2 log2 N. Hence,

Speedup S = N/(2 log2 N), or O(N/log2 N)
Efficiency E = 1/(2 log2 N), or O(1/log2 N), and
Cost = N × 2 log2 N, or O(N log2 N)

Example 6.10

The problem of accumulating N numbers can be solved in two ways on an MIMD with p processors and a hypercube interconnection network. Here, p < N and we assume that N/p is less than or equal to p.

In the first method, each block of p numbers is accumulated in (2 log2 p). Since there are N/p such blocks, the execution time is (2N/p log2 p). The resulting N/p partial sums are then accumulated in (2 log2 p). Thus, the total run time is (2N/p log2 p + 2 log2 p). In the second method, each of the p blocks of N/p numbers is allocated to a processor. The run time for computing the partial sums is then O(N/p). These partial sums are accumulated using the perfect shuffle network in (2 log2 p). Thus the total run time is (N/p + 2 log2 p
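A quick numeric check of the run-time expressions in Examples 6.9 and 6.10, using the excerpt's unit costs (one time unit per addition and per communication step); the chosen values N = 256 and p = 16 are illustrative and satisfy the stated assumption N/p ≤ p.

```python
# Run-time expressions from Examples 6.9 and 6.10 (unit-cost model).
import math

def ring_time(n):                    # (N-1) communications + (N-1) additions
    return 2 * (n - 1)

def hypercube_time(n):               # log2 N communications + log2 N additions
    return 2 * math.log2(n)

def blocked_hypercube_time(n, p):    # Example 6.10, first method
    return 2 * (n / p) * math.log2(p) + 2 * math.log2(p)

def local_sum_hypercube_time(n, p):  # Example 6.10, second method
    return n / p + 2 * math.log2(p)

n, p = 256, 16
print("ring, N processors:      ", ring_time(n))                   # 510
print("hypercube, N processors: ", hypercube_time(n))              # 16.0
print("hypercube, method 1:     ", blocked_hypercube_time(n, p))   # 136.0
print("hypercube, method 2:     ", local_sum_hypercube_time(n, p)) # 24.0
```

The second method wins because it performs most additions locally within each processor and pays the logarithmic hypercube communication cost only once.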
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.











