IBM Previews the POWER6

At the MicroProcessor Forum, Dr. Brad McCredie of IBM continued to tease out particulars of the POWER6. The presentation covered many general microarchitecture features but revealed few specific details; a full disclosure of the microarchitecture will likely have to wait until ISSCC next February. Still, the details that were revealed make it clear that the POWER6 inherits many characteristics from its predecessors while making substantial improvements in other areas.
The POWER6 is targeted to run at 4-5GHz and is fabricated on IBM's 65nm SOI process with 10 layers of metal. Compared to IBM's 90nm process, the 65nm process delivers a 30% performance increase at a given power level, largely due to dual-stress liner technology. IBM's 65nm process offers a 0.65µm² high-performance SRAM cell and a 0.45µm² cell for density. The SRAM arrays use a lower supply voltage than the logic to reduce power consumption. By all accounts, IBM heavily emphasized circuit design in the POWER6 as the means to increase frequency, whereas prior designs relied extensively on automated tools and logic design. This helps explain how IBM was able to increase the frequency so dramatically, but it is still hard to believe that such optimizations were never made before; leaving a 2x performance boost on the table seems unconscionable from a competitive positioning point of view.
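To put the two 65nm SRAM cell sizes (0.65 and 0.45 square microns per bit) in perspective, the short Python sketch below converts them into a raw bit density. It is a back-of-the-envelope estimate under a loud assumption: it counts bit-cell area only, ignoring the sense amplifiers, decoders, redundancy and wiring that real arrays need, so achievable density is noticeably lower. The function names are hypothetical and exist only for this illustration.

```python
# Rough bit-density comparison of the two 65nm SRAM cells quoted in the article.
# ASSUMPTION: raw bit-cell area only; real arrays also spend area on sense
# amplifiers, decoders, redundancy and wiring, so actual density is lower.

UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2

CELL_AREA_UM2 = {
    "high-performance": 0.65,  # um^2 per bit cell (from the article)
    "density": 0.45,           # um^2 per bit cell (from the article)
}


def raw_mbit_per_mm2(cell_area_um2: float) -> float:
    """Upper bound on Mbit per mm^2 if an area held nothing but bit cells."""
    return UM2_PER_MM2 / cell_area_um2 / 1e6


def density_advantage(hp_area_um2: float, dense_area_um2: float) -> float:
    """How many times more bits per unit area the denser cell provides."""
    return hp_area_um2 / dense_area_um2


if __name__ == "__main__":
    for name, area in CELL_AREA_UM2.items():
        print(f"{name:>16}: {raw_mbit_per_mm2(area):.2f} Mbit/mm^2 (raw cells only)")
    adv = density_advantage(CELL_AREA_UM2["high-performance"], CELL_AREA_UM2["density"])
    print(f"dense cell: ~{adv:.2f}x more bits per unit area")
```

Under these assumptions the high-performance cell tops out around 1.54 Mbit/mm² and the density cell around 2.22 Mbit/mm², roughly 1.44x more bits in the same area, which is why the denser cell suits large arrays where capacity matters more than speed.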
Like the previous two generations, the POWER6 targets big-system environments, where system architecture makes a substantial difference. Each POWER6 MPU is a two-way CMP design, integrating two simultaneous multithreaded (SMT) processor cores, each with a private L2 cache, on a 340mm² die. For high-end models, four POWER6 MPUs will be packaged in a single multi-chip module (MCM) along with four L3 victim caches of 32MB each. Figure 1 below shows a high-level comparison of the POWER5+ and POWER6 MPUs.
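The resources in one high-end POWER6 multi-chip module can be tallied directly from the figures above. The sketch below does that arithmetic in Python. One value is an assumption not stated in the text: it takes each SMT core to run two hardware threads. The class name `Power6Mcm` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Power6Mcm:
    """Back-of-the-envelope resource count for a high-end POWER6 MCM."""

    mpus: int = 4              # POWER6 MPUs per MCM (from the article)
    cores_per_mpu: int = 2     # two-way CMP (from the article)
    threads_per_core: int = 2  # ASSUMPTION: 2-way SMT; the article says only "SMT"
    l3_caches: int = 4         # L3 victim caches per MCM (from the article)
    l3_cache_mb: int = 32      # capacity of each L3 victim cache (from the article)

    @property
    def cores(self) -> int:
        return self.mpus * self.cores_per_mpu

    @property
    def hardware_threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def total_l3_mb(self) -> int:
        return self.l3_caches * self.l3_cache_mb


if __name__ == "__main__":
    mcm = Power6Mcm()
    print(f"{mcm.cores} cores, {mcm.hardware_threads} hardware threads, "
          f"{mcm.total_l3_mb}MB of L3 per MCM")
```

With the defaults this yields 8 cores, 16 hardware threads (given the two-thread assumption) and 128MB of L3 victim cache per MCM. Changing `threads_per_core` shows how sensitive the thread count is to that one assumption.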
Figure 1: POWER5+ and POWER6 MPU Comparison

As the diagram indicates, the POWER6 has enormous bandwidth to feed its processors. At 5GHz, each MPU has roughly 300GB/s of aggregate bandwidth: about 80GB/s from the L3 cache, 75GB/s from memory, 80GB/s across the intra-MCM busses, 50GB/s from remote processors, and 20GB/s from local I/O. Generally, the POWER6 doubles the bandwidth of POWER5+ systems, thanks to higher frequencies and some new interfaces. All of the POWER6's non-core functions run at one half the core frequency, in the 2-2.5GHz range, compared to roughly 0.8-1.15GHz for the various POWER5+ processors. The POWER6 also adds an additional memory controller and intra-MCM fabric link, and raises the I/O frequency from one third to one half of the CPU frequency. Each memory controller connects to memory through the third generation of IBM's synchronous memory interface (SMI). Like Fully Buffered DIMMs, these SMI chips enable larger memory configurations and support different memory types (typically older DDR variants for capacity or newer DDR2/DDR3 for bandwidth). The memory controllers and L3 cache all have separate address and data busses (address busses are not shown in the diagram), while the interconnect fabric and GX+ I/O bus multiplex address and data.
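The per-MPU bandwidth figures and the half-speed non-core clock can be cross-checked with a few lines of arithmetic. The Python sketch below sums the quoted per-interface numbers, converts the total into bytes per core clock cycle (GB/s divided by GHz), and derives the non-core clock from the core clock. The helper names are hypothetical; the input numbers are the article's rounded figures.

```python
# Per-MPU bandwidth budget at a 5GHz core clock, using the article's rounded figures.

CORE_GHZ = 5.0

BANDWIDTH_GBPS = {
    "L3 cache": 80,
    "memory": 75,
    "intra-MCM busses": 80,
    "remote processors": 50,
    "local I/O": 20,
}


def total_bandwidth_gbps(budget: dict[str, float]) -> float:
    """Aggregate bandwidth across all of the MPU's interfaces."""
    return sum(budget.values())


def bytes_per_core_cycle(total_gbps: float, core_ghz: float) -> float:
    """Bytes the whole MPU can move per core clock cycle (GB/s divided by GHz)."""
    return total_gbps / core_ghz


def noncore_ghz(core_ghz: float, ratio: float = 0.5) -> float:
    """Clock of the non-core logic, which runs at half the core frequency."""
    return core_ghz * ratio


if __name__ == "__main__":
    total = total_bandwidth_gbps(BANDWIDTH_GBPS)
    print(f"aggregate: {total} GB/s")
    print(f"per core cycle: {bytes_per_core_cycle(total, CORE_GHZ):.0f} bytes")
    for f in (4.0, 5.0):
        print(f"core {f}GHz -> non-core {noncore_ghz(f)}GHz")
```

The individual figures sum to 305GB/s, consistent with the "roughly 300GB/s" total, or about 61 bytes per core cycle for the MPU as a whole. Halving a 4-5GHz core clock gives the 2-2.5GHz non-core range quoted above.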