Unlike multiprocessor systems, which use multiple processors to carry out concurrent operations, multicore CPUs use multiple cores within a single processor. The idea is to divide the processor into multiple cores (dual-core, quad-core, and so on) that carry out operations in parallel. The main advantage of these systems is that they improve the potential performance of the overall system. A major example is Intel processors, whose clock rate increased from 10 MHz to 4 GHz. This value is considered the limit for most or all CMOS-based chips because of power constraints. Beyond this limit, further performance gains come from ILP (instruction-level parallelism) mechanisms, which are based on superscalar architecture and speculative execution.
Some systems use many-core GPUs (Graphics Processing Units) that contain thousands of processor cores. Like multicore CPUs, these GPUs can handle multiple instruction threads, though at a different magnitude.
Some of the processors that are based on multicore and multithreaded processing are the Intel i7, AMD Opteron, IBM POWER6 and many more.
Multithreading: Multithreading is a feature that enables multiple threads to execute on a single processor in an overlapping manner. A thread is the smallest unit of execution within a process, and a process is usually made up of many threads. In a multithreading environment, the resources of a processor are shared by multiple threads, but each thread keeps a separate copy of its own state. This per-thread state generally includes a register file, a separate Program Counter (PC) and a separate page table to enable virtual memory access, which in turn enables multiple programs to execute simultaneously by sharing the memory.
To enable multithreading, the hardware must be able to switch between threads quickly. A thread switch is much more efficient than a process switch, which usually takes many clock cycles. Because threads are lightweight, they can be switched during execution at little cost. Therefore they are considered more efficient and faster than processes.
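The idea that a thread switch is cheap because each thread already owns its register file and Program Counter can be illustrated with a minimal sketch. The names ThreadContext, MultithreadedCore, switch_to, step and NUM_REGS are hypothetical and chosen only for illustration; real hardware does this in circuitry, not software.

```python
from dataclasses import dataclass, field

NUM_REGS = 8  # assumed size of the per-thread register file


@dataclass
class ThreadContext:
    """Per-thread state that the hardware keeps a separate copy of."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)


class MultithreadedCore:
    """One context per hardware thread; the core itself is shared."""

    def __init__(self, num_threads):
        self.contexts = [ThreadContext() for _ in range(num_threads)]
        self.active = 0  # index of the thread currently running

    def switch_to(self, thread_id):
        # Nothing is saved or restored: every thread already has its own
        # register file and PC, so a switch only changes the active index.
        self.active = thread_id

    def step(self):
        # Stand-in for executing one instruction of the active thread:
        # update one of its registers and advance its PC.
        ctx = self.contexts[self.active]
        ctx.regs[0] += 1
        ctx.pc += 4
```

Stepping thread 0 once, switching to thread 1 and stepping it twice leaves the two contexts with different PC and register values, which shows that neither thread disturbs the state of the other.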
Multithreading can be implemented in two ways:
1. Fine-grained multithreading
2. Coarse-grained multithreading
1. Fine-Grained Multithreading: In this approach, the processor switches between threads on each instruction, so the execution of multiple threads is interleaved. The delay caused by the switch operation is very small. A thread that is stalled when its turn comes is skipped. The next thread is chosen from a pool of waiting threads in a round-robin fashion. For the approach to be effective, the processor must be able to switch threads at every clock cycle.
Advantage: Fine-grained multithreading can efficiently recover the throughput losses that come from both short and long stalls, because instructions from other threads execute while one thread is stalled.
Disadvantage: The execution of each individual thread is slowed down. Even a thread that is ready to execute without stalls is delayed, since instructions from other threads are executed in its place.
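The round-robin selection that skips stalled threads can be sketched as a small cycle-by-cycle simulation. This is a simplified model under stated assumptions, not a description of any real processor: each thread is a list of integers, where each integer is the number of stall cycles that follow that instruction, and the function name fine_grained_schedule is hypothetical.

```python
def fine_grained_schedule(threads):
    """Return, for each cycle, the id of the thread that issued (or None).

    threads: list of lists; threads[t][i] is the assumed number of stall
    cycles thread t suffers after issuing its i-th instruction.
    """
    n = len(threads)
    next_instr = [0] * n   # index of the next instruction of each thread
    ready_at = [0] * n     # first cycle at which each thread may issue again
    trace = []
    last = -1              # thread that issued most recently
    cycle = 0

    while any(next_instr[t] < len(threads[t]) for t in range(n)):
        issued = None
        # Round-robin: start with the thread after the last one that issued
        # and skip every thread that is finished or still stalled.
        for offset in range(1, n + 1):
            t = (last + offset) % n
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][next_instr[t]]
                next_instr[t] += 1
                ready_at[t] = cycle + 1 + stall
                issued = last = t
                break
        trace.append(issued)  # None marks a cycle in which no thread was ready
        cycle += 1
    return trace
```

With two stall-free threads of three instructions each, the trace alternates 0, 1, 0, 1, 0, 1, so each thread takes twice as long as it would alone. If thread 0 stalls for two cycles after its first instruction, as in fine_grained_schedule([[2, 0], [0, 0, 0]]), thread 1 fills those cycles and the trace is 0, 1, 1, 0, 1 with no idle cycle.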
2. Coarse-Grained Multithreading: Coarse-grained multithreading is another approach for implementing multithreading. In this approach, threads are switched only when a costly stall is encountered. A costly stall is one in which a thread waits for a resource for many CPU clock cycles, far more than a thread switch would take.
A level 2 cache miss is an example of a costly stall. When such a stall is encountered in the coarse-grained approach, another thread replaces the stalled thread and executes until the stalled thread has recovered.
In contrast to the fine-grained approach, in the coarse-grained approach a thread that does not stall executes without interruption until a costly stall is encountered.
The main disadvantage of coarse-grained multithreading is its pipeline start-up cost. When a thread encounters a costly stall, the instruction pipeline that was executing it is frozen and must be emptied. The new thread that replaces the stalled thread has to wait until the emptied pipeline is filled before its instructions can complete. This delay is significant and is an overhead. As a result, the coarse-grained approach does not have the ability to recover the throughput losses caused by shorter stalls.
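The switch-on-costly-stall policy and its pipeline refill overhead can be sketched in the same style. This is a simplified model with assumed parameters: the function name coarse_grained_schedule, the threshold costly (stall length, in cycles, that counts as costly) and the penalty refill (cycles needed to refill the pipeline after a switch) are hypothetical and not taken from any real design.

```python
def coarse_grained_schedule(threads, costly=10, refill=2):
    """Return a per-cycle trace: a thread id, "refill", or None (idle).

    threads[t][i] is the assumed number of stall cycles after the i-th
    instruction of thread t. Stalls shorter than `costly` are waited out.
    """
    n = len(threads)
    next_instr = [0] * n
    ready_at = [0] * n
    in_costly_stall = [False] * n
    trace = []
    current = 0        # thread that currently owns the pipeline
    refill_left = 0    # refill cycles still owed after a switch
    cycle = 0

    def has_work(t):
        return next_instr[t] < len(threads[t])

    while any(has_work(t) for t in range(n)):
        if refill_left > 0:
            # The pipeline is being refilled for the new thread.
            trace.append("refill")
            refill_left -= 1
        elif has_work(current) and ready_at[current] <= cycle:
            stall = threads[current][next_instr[current]]
            next_instr[current] += 1
            ready_at[current] = cycle + 1 + stall
            in_costly_stall[current] = stall >= costly
            trace.append(current)
        else:
            # The current thread cannot issue. Switch only if its stall is
            # costly (or it has finished) and another thread is ready.
            target = None
            if in_costly_stall[current] or not has_work(current):
                for offset in range(1, n):
                    t = (current + offset) % n
                    if has_work(t) and ready_at[t] <= cycle:
                        target = t
                        break
            if target is not None:
                current = target
                refill_left = refill
                continue  # the switch itself is charged as refill cycles
            trace.append(None)  # short stall, or nobody else is ready
        cycle += 1
    return trace
```

In this model, coarse_grained_schedule([[3, 0], [0]]) leaves three idle cycles after the first instruction of thread 0, because a three-cycle stall is below the threshold and no switch happens. A stall of ten cycles, by contrast, triggers a switch that costs only the two refill cycles.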
The key advantage of coarse-grained multithreading is that it stops the execution of a thread that encounters a costly stall and replaces it with a new thread. This pays off because a costly stall consumes more clock cycles than the time taken to remove the frozen thread and bring a new thread into the pipeline.
Simultaneous Multithreading (SMT): In addition to the two multithreading approaches, there is another approach, which is implemented on a superscalar processor. A superscalar processor is a processor that issues multiple instructions at the same time, exploiting Instruction Level Parallelism (ILP), in which multiple instructions are executed at the same time. A scalar processor, by contrast, issues at most one instruction per cycle, each operating on a single unit of data. This approach is a variation of fine-grained multithreading and is called Simultaneous Multithreading (SMT). Simultaneous multithreading allows multiple threads to execute at the same time and also lets the processor execute multiple instructions from those threads at the same time.
By allowing instructions from multiple independent threads to be executed together, a higher degree of efficiency is achieved. Instructions are dynamically scheduled so that the processor can run instructions from multiple threads without conflicts among their dependencies. To keep the threads apart, the registers are renamed continuously, so that the registers of one thread are not confused with those of another.
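How issue slots are shared among threads, and how renaming keeps their registers apart, can be sketched as follows. This is a deliberately simplified model: each instruction is represented only by the name of the register it writes, dependencies and stalls within a thread are ignored, and the names smt_issue and issue_width are hypothetical.

```python
from itertools import count


def smt_issue(threads, issue_width=2):
    """Fill up to `issue_width` issue slots per cycle from all threads.

    threads[t] is a list of destination register names, such as "r1".
    Returns (cycles, rename): cycles[c] lists the (thread, arch_reg,
    phys_reg) tuples issued in cycle c, and rename maps each
    (thread, arch_reg) pair to its most recent physical register.
    """
    n = len(threads)
    fresh = count()          # supply of unused physical register numbers
    rename = {}
    next_instr = [0] * n
    cycles = []
    start = 0                # thread that gets the first slot this cycle

    while any(next_instr[t] < len(threads[t]) for t in range(n)):
        slots = []
        progress = True
        # Keep taking instructions round-robin until the slots are full
        # or no thread has anything left to issue.
        while len(slots) < issue_width and progress:
            progress = False
            for offset in range(n):
                if len(slots) == issue_width:
                    break
                t = (start + offset) % n
                if next_instr[t] < len(threads[t]):
                    arch = threads[t][next_instr[t]]
                    next_instr[t] += 1
                    # Renaming: every write gets a fresh physical register,
                    # keyed by thread, so "r1" of two threads never collide.
                    phys = f"p{next(fresh)}"
                    rename[(t, arch)] = phys
                    slots.append((t, arch, phys))
                    progress = True
        cycles.append(slots)
        start = (start + 1) % n  # rotate priority for fairness
    return cycles, rename
```

For smt_issue([["r1", "r2", "r1"], ["r1"]]), the first cycle issues one instruction from each thread, and both write "r1" without conflict because they are renamed to p0 and p1. The second cycle is filled entirely by thread 0, since thread 1 has nothing left to issue.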