
Energy Evaluation of a 5- and a 7-Stage Processor Pipeline Across Nominal and Near-Threshold Supply Voltages

Arpad Jokai
Göteborg: Chalmers University of Technology, 2015. 73 pp.
[Master's thesis, advanced level]

Pipelined architectures have been used in commercial processors for the last three decades, achieving significant speedups over non-pipelined processors. Besides architectural advances, CMOS technology scaling has improved hardware performance by reducing transistor switching delays with every new technology generation. Technology scaling, however, is inextricably linked to supply voltage (VDD) scaling. The downside of VDD scaling is that, within a given technology generation, performance degrades as VDD is lowered. On the other hand, lowering VDD reduces switching power dissipation, which, owing to its quadratic dependence on VDD, decreases faster than performance degrades. Because the problem spans many design dimensions, designing processor pipelines that account for performance, power, and, consequently, energy is highly complex.
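The trade-off above can be illustrated with two textbook first-order models: switching power P = alpha*C*VDD^2*f, and the alpha-power-law gate delay t_d ~ VDD / (VDD - Vth)^a. The sketch below is not taken from the thesis. The threshold voltage (0.35 V), the exponent (1.3), and the 1.1-V to 0.8-V step are hypothetical values chosen only for illustration, and the alpha-power law is a super-threshold approximation that loses accuracy as VDD approaches Vth.

```python
"""First-order sketch: switching power falls faster than performance as VDD drops.

Assumed (hypothetical) parameters: Vth = 0.35 V, alpha-power exponent a = 1.3.
Both models are standard super-threshold approximations, not thesis data.
"""


def relative_delay(vdd: float, vth: float = 0.35, a: float = 1.3) -> float:
    """Alpha-power-law gate delay up to a constant factor: t_d ~ VDD / (VDD - Vth)^a."""
    if vdd <= vth:
        raise ValueError("alpha-power law is only meaningful for VDD > Vth")
    return vdd / (vdd - vth) ** a


def switching_power(vdd: float, freq: float, activity: float = 1.0, cap: float = 1.0) -> float:
    """Dynamic power P = alpha * C * VDD^2 * f (normalized units)."""
    return activity * cap * vdd ** 2 * freq


def scale_vdd(v_nom: float, v_low: float) -> tuple[float, float, float]:
    """Return (performance ratio, power ratio, energy-per-cycle ratio) for v_nom -> v_low.

    The clock is assumed to track the gate delay, so f ~ 1 / t_d.
    """
    perf = relative_delay(v_nom) / relative_delay(v_low)  # f_low / f_nom
    power = switching_power(v_low, perf) / switching_power(v_nom, 1.0)
    energy = power / perf  # energy per cycle = P / f
    return perf, power, energy


if __name__ == "__main__":
    perf, power, energy = scale_vdd(1.1, 0.8)
    print(f"performance ratio:      {perf:.3f}")
    print(f"switching power ratio:  {power:.3f}")
    print(f"energy per cycle ratio: {energy:.3f}  (equals (0.8/1.1)^2)")
```

With these assumed parameters the clock frequency drops to roughly 71 % of nominal while switching power drops to roughly 37 %, so the energy per cycle follows the purely quadratic factor (0.8/1.1)^2, about 0.53.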

In this thesis, a five-stage and a seven-stage pipeline are investigated with respect to metrics such as timing, power, and energy. Evaluations are made in the nominal-VDD domain, where the five-stage pipeline serves as the baseline against which the seven-stage pipeline, with different branch target buffer (BTB) implementations, is compared. Assessments are also carried out in the near-threshold VDD domain, where the seven-stage design is synthesized using nominal-VDD gate libraries and then mapped to libraries recharacterized for near-threshold VDDs. Five different EEMBC benchmarks are used to compare the efficiency of the configurations across algorithms with different requirements.

In the nominal 1.1-V VDD domain of a 65-nm process, the most energy-efficient seven-stage pipeline design dissipates 20 μW while running at 650 MHz at the worst-case process corner. At the lowest available VDD of 0.4 V in the near-threshold domain, the same design consumes only 0.035 μW at 13.7 MHz at the typical process corner. The most timing-critical paths are also reported and are found to lie toward the input ports of the instruction and data caches. Although power dissipation keeps decreasing as VDD is lowered, energy consumption is expected to decrease only down to a certain VDD and then increase again, due to longer execution times and the resulting increase in leakage energy. Based on the trends in the thesis results and in other works, it is concluded that the optimal energy point for the seven-stage design lies at a VDD of around 350 mV.
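The existence of an energy-optimal VDD follows from splitting the energy per cycle into a switching part, C*VDD^2, and a leakage part, VDD*I_leak*t_cycle, where t_cycle grows roughly exponentially once VDD drops below Vth. The sketch below illustrates that argument with a normalized model; it is not the method used in the thesis. The EKV-style current interpolation, Vth = 0.4 V, the subthreshold slope factor n = 1.5, and the leakage weight K_LEAK = 1000 are all hypothetical, so the location of the minimum it finds is illustrative only. It also converts the two reported operating points to energy per cycle (E = P/f); note that they were obtained at different process corners (worst-case vs. typical), so that comparison is not like-for-like.

```python
"""Normalized sketch of a minimum-energy-point (MEP) argument.

All device parameters below are hypothetical; only the two (power, frequency)
pairs in REPORTED come from the abstract.
"""
import math

PHI_T = 0.026    # thermal voltage at room temperature [V]
N_SLOPE = 1.5    # hypothetical subthreshold slope factor
VTH = 0.40       # hypothetical threshold voltage [V]
K_LEAK = 1000.0  # hypothetical leakage weight (leaking width x logic depth)


def on_current(v_gs: float) -> float:
    """EKV-style interpolation, usable from sub- to super-threshold (normalized)."""
    x = (v_gs - VTH) / (2 * N_SLOPE * PHI_T)
    return math.log1p(math.exp(x)) ** 2


def energy_per_cycle(vdd: float) -> float:
    """E = C*VDD^2 (switching) + VDD * I_leak * t_cycle (leakage), with C = 1.

    t_cycle ~ C*VDD / I_on(VDD), so the leakage term is K * VDD^2 * I_off / I_on.
    """
    i_off = on_current(0.0)  # leakage current of an "off" device (V_GS = 0)
    return vdd ** 2 * (1.0 + K_LEAK * i_off / on_current(vdd))


def find_mep(v_min: float = 0.15, v_max: float = 1.10, step: float = 0.005) -> float:
    """Grid search for the VDD that minimizes energy per cycle."""
    n = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(n + 1)]
    return min(grid, key=energy_per_cycle)


# Reported operating points from the abstract: (power [W], frequency [Hz]).
REPORTED = {
    "1.1 V, worst-case corner": (20e-6, 650e6),
    "0.4 V, typical corner": (0.035e-6, 13.7e6),
}

if __name__ == "__main__":
    for label, (p, f) in REPORTED.items():
        print(f"{label}: {p / f * 1e15:.2f} fJ per cycle")
    v_opt = find_mep()
    print(f"model MEP (hypothetical parameters): {v_opt:.3f} V")
```

The reported figures correspond to about 31 fJ per cycle at 1.1 V and about 2.6 fJ per cycle at 0.4 V. With the assumed device parameters the model's minimum lands near 0.3 V; changing VTH or K_LEAK moves it, which is why it should not be read as a prediction for the thesis design.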



The publication was registered on 2015-06-17. It was last modified on 2015-06-17.

CPL ID: 218492
