Computer Science

Pipelining

Pipelining in computer science refers to a technique that allows multiple instructions to be overlapped in execution. It involves breaking down the processing of instructions into a series of stages, with each stage handling a different part of the instruction. This approach helps to improve the overall throughput and efficiency of the processor.

Written by Perlego with AI-assistance

9 Key excerpts on "Pipelining"

  • Computer Architecture
    Fundamentals and Principles of Computer Design, Second Edition
    • Joseph D. Dumas II (Author)
    • 2016 (Publication Date)
    • CRC Press (Publisher)
    A component that is unused part of the time is not giving us our money's worth; designers should search for a way to make more use of it. Conversely, a component that is overused (needed more often than it is available) creates a structural hazard; it will often have other components waiting on it and will thus become a bottleneck, slowing down the entire system. Designers may need to replicate such a component to improve throughput. The art of designing a modern processor involves balancing the workload on all the parts of the CPU such that they are kept busy doing useful work as much of the time as possible without any of them clogging up the works and making the other parts wait. As the reader might expect, this balancing act is not a trivial exercise. Pipelining, which we are about to investigate, is an essential technique for helping bring about this needed balance. Pipelining, in its most basic form, means breaking up a task into smaller subtasks and overlapping the performance of those subtasks for different instances of the task. (The same concept, when applied to the manufacture of automobiles or other objects, is called an assembly line.) To use terms more specifically related to computing, pipelining means dividing a computational operation into steps and overlapping those steps over successive computations. This approach, although much more common in today's computers than it was 30 or 40 years ago, is hardly new. The first use of pipelining in computers dates back to the IBM Stretch and Univac LARC machines of the late 1950s. Pipelining, as we shall see, improves the performance of a processor in much the same way that low-order interleaving improves the performance of main memory, while being subject to many of the same considerations and limitations. To understand how pipelining works, consider a task that can be broken down into three parts performed sequentially. Let us refer to these parts as step 1, step 2, and step 3 (see Figure 4.1). The time taken to perform step 1 is represented as t1, and t2 and t3 represent the times required to perform steps 2 and 3.
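    A short worked equation makes the payoff of this three-step example concrete (a sketch of the standard timing argument, not taken from the book; the symbol n for the number of task instances is our addition):

```latex
% Sequential execution of n tasks, each requiring steps of t1, t2, t3:
\[ T_{\mathrm{seq}} = n\,(t_1 + t_2 + t_3) \]
% Pipelined execution: the first task takes t1 + t2 + t3; each later task
% finishes one pipeline cycle behind it, where the cycle time is set by
% the slowest step:
\[ T_{\mathrm{pipe}} \approx (t_1 + t_2 + t_3) + (n - 1)\max(t_1, t_2, t_3) \]
% With t1 = t2 = t3 = 1 ns and n = 100: 300 ns sequential versus 102 ns
% pipelined, approaching the ideal 3x speedup for a three-stage pipeline.
```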
  • Computer Architecture
    A Quantitative Approach
    • John L. Hennessy, David A. Patterson (Authors)
    • 2011 (Publication Date)
    • Morgan Kaufmann (Publisher)
    Readers unfamiliar with the concepts of precise and imprecise interrupts and resumption after exceptions will find this material useful, since they are key to understanding the more advanced approaches in Chapter 3. Section C.5 discusses how the five-stage pipeline can be extended to handle longer-running floating-point instructions. Section C.6 puts these concepts together in a case study of a deeply pipelined processor, the MIPS R4000/4400, including both the eight-stage integer pipeline and the floating-point pipeline. Section C.7 introduces the concept of dynamic scheduling and the use of scoreboards to implement dynamic scheduling. It is introduced as a crosscutting issue, since it can serve as an introduction to the core concepts in Chapter 3, which focuses on dynamically scheduled approaches. Section C.7 is also a gentle introduction to the more complex Tomasulo's algorithm covered in Chapter 3. Although Tomasulo's algorithm can be covered and understood without introducing scoreboarding, the scoreboarding approach is simpler and easier to comprehend.
    What Is Pipelining? Pipelining is an implementation technique whereby multiple instructions are overlapped in execution; it takes advantage of parallelism that exists among the actions needed to execute an instruction. Today, pipelining is the key implementation technique used to make fast CPUs. A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, although on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment.
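    The overlap is easiest to see cycle by cycle. A minimal sketch below prints the occupancy of the classic five-stage pipeline (IF, ID, EX, MEM, WB); the stage names follow the usual textbook convention, but the script itself is our illustration:

```python
# Print a cycle-by-cycle occupancy chart for the classic five-stage
# pipeline: instruction i enters stage s in cycle i + s (0-indexed).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions: int) -> None:
    total_cycles = num_instructions + len(STAGES) - 1
    print("      " + "".join(f"{c:>5}" for c in range(1, total_cycles + 1)))
    for i in range(num_instructions):
        cells = [f"{STAGES[c - i]:>5}" if 0 <= c - i < len(STAGES) else " " * 5
                 for c in range(total_cycles)]
        print(f"i{i + 1:<5}" + "".join(cells))

# Four instructions finish in 4 + 5 - 1 = 8 cycles, versus 20 if each
# instruction ran all five stages to completion before the next began.
pipeline_diagram(4)
```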
  • Advanced Computer Architectures
    • Sajjan G. Shiva (Author)
    • 2018 (Publication Date)
    • CRC Press (Publisher)
    3 Pipelining
    As mentioned earlier, pipelining offers an economical way of realizing parallelism in computer systems. The concept of pipelining is similar to that of an assembly line in an industrial plant wherein the task at hand is subdivided into several subtasks and each subtask is performed by a stage (segment) in the pipeline. In this context, the task is the processing performed by the conglomeration of all the stages in the pipeline, and the subtask is the processing done by a stage. For example, in the car assembly line described earlier, "building a car" is the task, and it was partitioned into four subtasks. The tasks are streamed into the pipeline and all the stages operate concurrently. At any given time, each stage will be performing a subtask belonging to a different task. That is, if there are N stages in the pipeline, N different tasks will be processed simultaneously and each task will be at a different stage of processing.
    The processing time required to complete a task is not reduced by the pipeline. In fact, it is increased, due to the buffering needed between the stages in the pipeline. But since several tasks are processed simultaneously by the pipeline (in an overlapped manner), the task completion rate is higher than with sequential processing of tasks. That is, the total processing time of a program consisting of several tasks is shorter compared to sequential execution of tasks. As shown earlier, the throughput of an N-stage pipelined processor is nearly N times that of the nonpipelined processor.
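    The "nearly N times" claim follows directly from the overlap arithmetic. A small sketch (our own, using the standard speedup formula for an N-stage pipeline processing n tasks with equal stage times):

```python
# Speedup of an N-stage pipeline over sequential execution for n tasks,
# assuming equal stage times: sequential takes n*N cycles, while the
# pipeline takes N + (n - 1) cycles (fill the pipe, then one completion
# per cycle thereafter).
def speedup(n_stages: int, n_tasks: int) -> float:
    sequential = n_tasks * n_stages
    pipelined = n_stages + (n_tasks - 1)
    return sequential / pipelined

for n_tasks in (10, 100, 10_000):
    print(n_tasks, round(speedup(5, n_tasks), 2))
# Prints 3.57, 4.81, 5.0: the speedup approaches N = 5 as n grows.
```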
    The next section provides a model for the pipeline and describes the types of pipelines commonly used. Section 3.2 describes pipeline control strategies. Section 3.3 deals with data interlock and other problems in pipeline design, and Section 3.4 describes dynamic pipelines. Almost all computer systems today employ pipelining techniques to one degree or another. Section 3.5 provides a selected set of examples.
  • Computer Programming and Architecture
    • Henry Levy, Richard Eckhouse (Authors)
    • 2014 (Publication Date)
    • Digital Press (Publisher)
    • Multiple stages mean more results per unit of time.
    • An exception somewhere within the pipe is more difficult to handle because it does not necessarily relate to the current results.
    • Stopping the pipeline to start up another sequence of instructions helps to negate some of the speed-up improvements (this occurs when a branch is taken or any change to the PC occurs). In fact, if a change to the PC occurs late in the pipeline, in the worst case we may need to undo some of the computations that have been carried out by partially completed instructions in previous stages.
    There is no reason to believe that instruction execution pipelining can be subdivided into only three stages. Pipelines of five, six, or seven stages are not uncommon. Pipelining can also be applied in several places within a single processor. One common application of pipelining is within a floating-point unit, where, for example, each floating-point addition is broken down into its various stages of subtracting exponents, aligning the mantissas, adding, shifting, normalizing, and so on. Pipelining can be applied at several levels. For example, in the VAX 8600 processors, VAX instructions are pipelined; at any point, several VAX instructions are being executed. The VAX 8600 pipeline has four stages: instruction fetch and decode, operand-address generation and fetch, execute, and result store. On the VAX 8800, however, a five-stage pipeline is used for executing microinstructions; that is, the fetching and execution of microinstructions is overlapped, and multiple microinstructions are in the process of being executed at any given time. The effect is to reduce the microcycle time, thereby reducing the instruction execution time.
    Multiple Functional Units and Hazards
    In addition to pipelining, it is possible to increase parallelism in the execution of a sequential instruction stream through multiple functional units.
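    The cost of those pipeline stops can be estimated with a standard back-of-the-envelope model (our sketch; the branch frequencies and penalty used are illustrative assumptions, not figures from the book):

```python
# Effective cycles per instruction (CPI) for a pipeline that loses
# `flush_penalty` cycles every time a taken branch (or other change to
# the PC) forces it to discard partially completed instructions.
def effective_cpi(base_cpi: float, branch_freq: float,
                  taken_freq: float, flush_penalty: int) -> float:
    return base_cpi + branch_freq * taken_freq * flush_penalty

# Illustrative numbers: 20% of instructions are branches, 60% of those
# are taken, and each flush costs 3 cycles.
print(effective_cpi(1.0, 0.20, 0.60, 3))  # 1.36: the ideal CPI of 1.0
# degrades by 36% purely from restarting the instruction sequence.
```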
  • Modern Computer Architecture and Organization
    Learn x86, ARM, and RISC-V architectures and the design of smartphones, PCs, and cloud servers
    • Jim Ledin, Dave Farley (Authors)
    • 2022 (Publication Date)
    • Packt Publishing (Publisher)
    Superpipelining consists of increasing the number of pipeline stages by breaking complex stages into multiple simpler stages. A superpipeline is, in essence, a processor pipeline with many stages, potentially numbering in the dozens. In addition to being superscalar, modern high-performance processors are generally superpipelined.
    Breaking a pipeline into a larger number of superpipeline stages permits the simplification of each stage, reducing the time required to execute each stage. With faster-executing stages, it is possible to increase the processor clock speed. Superpipelining provides an instruction execution rate increase corresponding to the percentage increase in processor clock speed.
    Reduced instruction set computer (RISC) processor instruction sets are designed to support effective pipelining. Most RISC instructions perform simple operations, such as moving data between registers and memory or adding two registers together. RISC processors usually have shorter pipelines compared to complex instruction set computer (CISC) processors. CISC processors, and their richer, more complex instruction sets, benefit from longer pipelines by breaking up long-running instructions into a series of sequential stages.
    A big part of the challenge of efficiently pipelining processors based on legacy instruction sets such as x86 is that the original design of the instruction set did not fully consider the potential for later advances such as superscalar processing and superpipelining. As a result, modern x86-compatible processors devote a substantial proportion of their die area to complex logic implementing these performance-enhancing features for instructions that were not designed to operate in such an environment.
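    The clock-speed argument can be quantified with a simple model (our sketch; the 10 ns total logic depth and 0.2 ns per-stage latch overhead are assumed values chosen only for illustration):

```python
# Model the clock period of a pipeline whose total logic delay is split
# evenly across N stages, with a fixed pipeline-latch overhead added to
# every stage. More stages shorten the logic per stage, not the latches.
def clock_period_ns(total_logic_ns: float, n_stages: int,
                    latch_overhead_ns: float) -> float:
    return total_logic_ns / n_stages + latch_overhead_ns

for n in (1, 5, 10, 20, 40):
    period = clock_period_ns(10.0, n, 0.2)
    print(f"{n:>2} stages: {period:.2f} ns -> {1.0 / period:.2f} GHz")
# Frequency rises with stage count (0.10 GHz unpipelined, 2.22 GHz at
# 40 stages here), but the fixed latch overhead means superpipelining
# shows diminishing returns as stages multiply.
```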
  • Microprocessor Architecture
    From Simple Pipelines to Chip Multiprocessors
    2 The Basics
    This chapter reviews features that are found in all modern microprocessors: (i) instruction pipelining and (ii) a main memory hierarchy with caches, including the virtual-to-physical memory translation. It does not dwell on many details – that is what subsequent chapters will do. It provides solely a basis on which we can build later on.
    2.1 Pipelining
    Consider the steps required to execute an arithmetic instruction in the von Neumann machine model, namely:
    1. Fetch the (next) instruction (the one at the address given by the program counter).
    2. Decode it.
    3. Execute it.
    4. Store the result and increment the program counter.
    In the case of a load or a store instruction, step 3 becomes two steps: calculate a memory address, and activate the memory for a read or for a write. In the latter case, no subsequent storing is needed. In the case of a branch, step 3 sets the program counter to point to the next instruction, and step 4 is voided. Early on in the design of processors, it was recognized that complete sequentiality between the executions of instructions was often too restrictive and that parallel execution was possible. One of the first forms of parallelism that was investigated was the overlap of the mentioned steps between consecutive instructions. This led to what is now called pipelining. (In early computer architecture texts, the terms overlap and look-ahead were often used instead of pipelining, which was reserved for the pipelining of functional units; cf. Section 2.1.6.)
    2.1.1 The Pipelining Process
    In concept, pipelining is similar to an assembly line process. Jobs A, B, and so on, are split into n sequential subjobs A1, A2, ..., An (B1, B2, ..., Bn, etc.), with each Ai (Bi, etc.) taking approximately the same amount of processing time. Each subjob is processed by a different station, or equivalently the job passes through a series of stages, where each stage processes a different Ai.
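    The excerpt's caveat that each subjob should take "approximately the same amount of processing time" matters because the pipeline clock must accommodate the slowest station. A brief sketch of why (ours; the stage times and 0.1 ns buffering overhead are made-up values for illustration):

```python
# Total time for n jobs through a pipeline whose clock fits the slowest
# stage plus inter-stage buffering. Unbalanced stages waste the time of
# the fast ones, even when the total work per job is identical.
def pipeline_time_ns(stage_times_ns: list[float], n_jobs: int,
                     buffer_overhead_ns: float = 0.1) -> float:
    cycle = max(stage_times_ns) + buffer_overhead_ns
    return (len(stage_times_ns) + n_jobs - 1) * cycle

balanced = [1.0, 1.0, 1.0, 1.0]
unbalanced = [0.5, 0.5, 2.5, 0.5]   # same 4.0 ns of work, one slow stage
print(pipeline_time_ns(balanced, 1000))    # 1103.3 ns
print(pipeline_time_ns(unbalanced, 1000))  # 2607.8 ns, over 2x slower
```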
  • The Compiler Design Handbook
    Optimizations and Machine Code Generation, Second Edition
    • Y.N. Srikant, Priti Shankar (Authors)
    • 2018 (Publication Date)
    • CRC Press (Publisher)
    20.3.4.4 Software Pipelining for Multi-Core Architectures
    Coarse-grained software pipelining has been used to exploit parallelism for streaming applications (image, video, DSP, etc.) [49]. These applications are naturally represented by a set of autonomous actors, referred to as filters, which communicate over explicit data channels. The filters are fired repeatedly in execution. To facilitate understanding, one may think of a filter as a macro instruction, and the filters (macro instructions) compose the body of a loop. Software pipelining is especially attractive when there are loop-carried dependences between the filters. The approach proposed in [49] performs software pipelining followed by core assignment. First, the architecture is treated as a conventional single-core processor, without considering the interconnections between the cores, where each core is a functional unit. With this underlying assumption, software pipelining schedules the filters to the cores the way traditional software pipelining schedules instructions to functional units. The prolog is constructed to buffer enough data items such that the filters in the kernel are guaranteed to be independent. This allows each filter to execute completely independently during each iteration of the kernel, as they are reading and writing to buffers rather than communicating directly. The buffers could be stored in a variety of places, such as the local memory of the core, a hardware FIFO, a shared on-chip cache, or an off-chip DRAM. In the second step, the core assignment is performed. As the filters are independent, any set of filters, contiguous or not, can be mapped to the same core. The mapping follows two criteria: load balancing and synchronization minimization. To achieve load balancing, filters are sorted in order of decreasing work (computation load), and then they are assigned to the cores in that order. A filter is assigned to the core that has the least amount of work so far.
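    That load-balancing rule is the classic longest-processing-time greedy heuristic. A compact sketch of it (our illustration of the rule the excerpt describes; the filter names and work estimates are hypothetical):

```python
import heapq

# Greedy core assignment as described above: sort filters by decreasing
# work, then always place the next filter on the least-loaded core.
def assign_filters(filter_work: dict[str, float], n_cores: int):
    cores = [(0.0, core_id, []) for core_id in range(n_cores)]
    heapq.heapify(cores)
    for name, work in sorted(filter_work.items(),
                             key=lambda kv: kv[1], reverse=True):
        load, core_id, assigned = heapq.heappop(cores)  # least-loaded core
        assigned.append(name)
        heapq.heappush(cores, (load + work, core_id, assigned))
    return sorted(cores, key=lambda c: c[1])

# Hypothetical streaming filters with estimated per-iteration work.
work = {"decode": 8.0, "transform": 5.0, "filter": 4.0, "encode": 7.0}
for load, core_id, assigned in assign_filters(work, 2):
    print(f"core {core_id}: {assigned} (load {load})")
# core 0: ['decode', 'filter'] (load 12.0)
# core 1: ['encode', 'transform'] (load 12.0)
```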
  • Time-Predictable Architectures
    • Christine Rochange, Pascal Sainrat, Sascha Uhrig (Authors)
    • 2014 (Publication Date)
    • Wiley-ISTE (Publisher)
    3 Current Processor Architectures
    Modern processor architectures implement advanced hardware schemes to accelerate the execution of instructions: pipelining, out-of-order execution, branch prediction, speculative execution. All of them challenge the timing analysis of code snippets, such as basic blocks. In this chapter, we review these schemes, show how they can be analyzed and provide some recommendations for timing predictability.
    3.1. Pipelining
    Current processors are mostly implemented in a pipelined fashion. In contrast to a non-pipelined processor, where all steps of instruction execution are performed one after another, a pipelined processor executes several steps of the execution of different instructions in parallel. A more detailed description of a pipelined processor is presented by Hennessy and Patterson [HEN 07]. They describe a simple pipeline with four stages. For a better understanding of the pipeline effects that arise in modern processors, the following sections focus on a longer pipeline structure. Figure 3.1 shows a rough block diagram of the basic stages of a pipelined processor with six pipeline stages. The first stage (instruction fetch stage) is responsible for fetching instructions out of some kind of memory and delivering them to the following pipeline stage. The second stage is called the decode stage, because it determines the kind of the instruction, the instruction format, the operands and the operation itself. The required operands are read from the register file in the third pipeline stage and the operation is performed in the fourth stage. If a memory access is required, the following stage is responsible for hiding the memory latency. The last pipeline stage (write back) writes the result of the operation or the value read from the memory back into the register file.
  • Digital Design Using VHDL
    A Systems Approach
    By parallelizing texture retrieval, the overall throughput of the GPU is increased. Pipelining is also applied to increase the throughput of modern processors. In a simple five-stage pipeline, shown in Figure 23.3(b), an instruction is fetched from memory in the first stage.
    [Figure 23.3. (a) Graphics pipeline; (b) processor pipeline; (c) packet processing pipeline.]
    Next, the registers are read in the second stage and operands are passed into the third stage, which performs an arithmetic operation. If needed, memory is accessed in the fourth stage. Finally, the result of the instruction is written back into the register file. In a CPU pipeline, both the instruction fetch stage and the memory stage can cause accesses to a shared level-two cache (Section 25.4). If such a conflict occurs, the memory access stage will typically win arbitration. (The instruction fetch stage needs to wait until the memory stage has been completed before proceeding.) Giving priority to the downstream stage avoids a potential deadlock situation. The register file in a CPU pipeline is shared between the register read and writeback stages. However, each stage has exclusive access to one or more register ports. The register read stage accesses the two read ports of the register file while the writeback stage has exclusive access to the write port. Care must still be taken to avoid reading a value from the register file before it has been written, since an earlier instruction reaches the writeback stage after a later instruction accesses the register file in the read stage.
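    That read-before-write timing window is the classic read-after-write (RAW) hazard. A minimal sketch of how one might check for it given the stage positions described above (our illustration; the cycle-offset model, which treats a writeback as visible to reads in the same or later cycles, is a simplifying assumption):

```python
# Detect a read-after-write hazard in the five-stage pipeline described
# above: register read happens in stage 2, writeback in stage 5, so a
# reader issued too soon after its producer sees a stale register value.
READ_STAGE, WRITE_STAGE = 2, 5

def raw_hazard(writer_issue: int, reader_issue: int) -> bool:
    """True if the reader reaches the register read stage before the
    writer's writeback has completed (issue numbers are cycle offsets)."""
    write_done = writer_issue + WRITE_STAGE
    read_at = reader_issue + READ_STAGE
    return read_at < write_done

# A reader issued 1 or 2 cycles behind its producer hits the hazard;
# under this model, 3 or more cycles of separation is safe.
for gap in (1, 2, 3, 4):
    print(gap, raw_hazard(0, gap))  # 1 True, 2 True, 3 False, 4 False
```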
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.