1.4 billion transistors. Just let that number sink in for a little while. One point four billion transistors. That’s how big the GTX 200 GPU at the heart of Nvidia’s new high-end graphics cards is. To put that in perspective, Intel’s latest 45nm quad-core CPUs with 12MB of cache weigh in at about 820 million transistors; the GTX 200 packs roughly 70% more.
Nvidia won’t share the precise size of its chip in square millimeters, but the GPU is still built on a 65nm manufacturing process. The 65nm G92 processor used in so many recent Nvidia graphics cards squeezed about half as many transistors (754 million) into a chip that was about 330mm^2. Even with improved transistor density, it’s a sure bet that the GTX 200 chip is nearly 600mm^2, if not bigger.
That’s one enormous chip, and you sure don’t get a whole lot of them on a wafer from the fabrication plant. If we assume a 300mm (12-inch) wafer, that’s fewer than 100 dies per wafer with normal margins. In other words, it’s going to go into some expensive products. Expensive, fast products, one would hope.
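The back-of-the-envelope math behind those two estimates can be sketched in a few lines. This is only a rough illustration: the die-area figure assumes the GTX 200 has exactly the same transistor density as the G92, and the dies-per-wafer function uses a common first-order approximation (wafer area divided by die area, minus an edge-loss term) rather than anything Nvidia or TSMC has published. The function names are our own.

```python
import math


def scaled_die_area(ref_area_mm2, ref_transistors, new_transistors):
    """Estimate die area, assuming the same transistor density as a reference chip."""
    return ref_area_mm2 * new_transistors / ref_transistors


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order estimate of whole dies per wafer (ignores defects and scribe lines)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Partial dies around the wafer edge are lost; this term approximates them.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


# G92: about 754 million transistors in about 330 mm^2.
area_at_g92_density = scaled_die_area(330, 754e6, 1.4e9)
print(f"GTX 200 area at G92 density: ~{area_at_g92_density:.0f} mm^2")

# Our 600 mm^2 guess on a 300 mm wafer.
print(f"Gross dies per 300 mm wafer: ~{gross_dies_per_wafer(300, 600)}")
```

At G92 density the chip would come out a little over 600mm^2, and a 600mm^2 die yields roughly 90 candidates per 300mm wafer before any defective dies are thrown out.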
The two products built using the GTX 200 chip are the GeForce GTX 280, which fully enables all the chip’s capabilities, and the less expensive GeForce GTX 260, the so-called “salvage chip” that has some of the chip’s parts disabled (typically, this is done to get use out of chips with small defects that would otherwise have been thrown out).
No doubt about it, Nvidia is aiming straight at the high end with this product launch. With price tags estimated at $650 for the GTX 280 cards and $400 for the GTX 260 cards, these single-GPU cards already occupy the rarefied air typically reserved for dual-GPU cards. Does the performance match up to the sticker shock?
GTX 200 Architecture Features
The chip powering these two new graphics cards, called the GTX 200 chip, is an absolute monster of a processor. Nvidia proudly proclaims that it’s the biggest processor TSMC (Nvidia’s primary chip fabrication partner) has ever built. It’s not just a clocked-up or expanded version of the G92 chip powering Nvidia’s most recent high-end cards, but a totally new architecture.
The basic layout of the GTX 280 looks like this: You have a geometry shader processing unit, vertex shader, and setup/raster units. These feed into the unified shader array of no fewer than 240 stream processing units. That’s 10 blocks of 24 shader units, with each block made up of three groups of eight.
Some L2 texture cache sits between these and the memory interface units, typically referred to as “ROPs” or render back-ends (eight “blocks” of those with four-per-block, which is double the ROP power of the G92).
Each stream processor includes a register file twice the size of those in the stream processors of earlier Nvidia chips, along with a floating point compute unit, an integer compute unit, and a move/compare unit. There’s also a double-precision floating point unit (IEEE compliant), which is useful for some GP-GPU tasks but not particularly handy for graphics.
Call that configuration one processing “core,” if you will. Eight of them each access a 16K block of local shared memory. Three of those eight-processor groups together access the same bank of L1 cache, creating a 24-unit processing “block.” The block has eight texture address/filter units associated with it. The GTX 280 chip has 10 of these blocks, for a total of 240 stream processors and 80 texture units.
A block diagram of the GTX 200 GPU’s stream processing unit, 24 compute cores. (source: Nvidia)
In the GeForce GTX 280 products, all of these functional elements are enabled. In the GeForce GTX 260 products, some of the units are disabled. These are typically called “salvage chips,” where GPUs that had some defects can have certain parts disabled and still function well as lower-performing parts, allowing Nvidia to effectively use some of the “bad” chips on a wafer.
In this case, the GeForce GTX 260 cards have two blocks of stream processors disabled for a total of 192 stream processors and 64 texture mapping/filtering units. One of the eight memory access/ROP units is also disabled, for 28 render back-ends instead of 32.
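To make the unit counts easier to follow, here is a small sketch that derives them from the building blocks described above. The `Gtx200Config` class and its field names are hypothetical helpers of ours, and the 64 bits of memory bus per ROP partition is an inference (a 512-bit bus split evenly across eight partitions), not a figure Nvidia states directly.

```python
from dataclasses import dataclass

# Building blocks as described in the text.
SPS_PER_GROUP = 8            # stream processors sharing one 16K local memory
GROUPS_PER_BLOCK = 3         # groups sharing one bank of L1 cache
TEX_UNITS_PER_BLOCK = 8      # texture address/filter units per block
ROPS_PER_PARTITION = 4       # render back-ends per ROP/memory partition
# Assumption: the 512-bit bus is split evenly across the eight partitions.
BUS_BITS_PER_PARTITION = 64


@dataclass(frozen=True)
class Gtx200Config:
    """Hypothetical description of how much of the GTX 200 chip is enabled."""
    blocks: int
    rop_partitions: int

    @property
    def stream_processors(self) -> int:
        return self.blocks * GROUPS_PER_BLOCK * SPS_PER_GROUP

    @property
    def texture_units(self) -> int:
        return self.blocks * TEX_UNITS_PER_BLOCK

    @property
    def rops(self) -> int:
        return self.rop_partitions * ROPS_PER_PARTITION

    @property
    def bus_width_bits(self) -> int:
        return self.rop_partitions * BUS_BITS_PER_PARTITION


gtx_280 = Gtx200Config(blocks=10, rop_partitions=8)
gtx_260 = Gtx200Config(blocks=8, rop_partitions=7)  # two blocks, one partition disabled

for name, cfg in (("GTX 280", gtx_280), ("GTX 260", gtx_260)):
    print(name, cfg.stream_processors, cfg.texture_units, cfg.rops, cfg.bus_width_bits)
```

With all ten blocks and eight partitions enabled, this gives 240 stream processors, 80 texture units, 32 ROPs, and a 512-bit bus. Dropping two blocks and one partition gives 192, 64, 28, and 448 bits, which lines up with the GTX 260 figures in the spec table.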
Whether you consider the GTX 200 GPU an extension of the G92 class is a matter of perspective. Certainly some of the capabilities are similar, and there is no added support for features present in DirectX 10.1, for instance. On the other hand, there are some significant tweaks to the design.
The scheduler has been upgraded to handle more threads at once, which it would need to do to effectively utilize all those stream processing units. We already mentioned the larger register file and support for double precision floating point math. Instruction co-issuing in the stream processors is now more efficient. The texture filter units employ an improved scheduler, which Nvidia claims makes them 22% more efficient than those in the G92 chip.
The blend rate of the ROP units is doubled (per-ROP) compared to previous generation chips. Geometry shading performance, a real sore spot of the G80 and G92 generation GPUs, has been massively improved in the GTX 200 GPU.
GTX 200 vs. The World
We’ve already discussed how the GTX 200 is an absolute monster of a chip, with a huge 1.4 billion transistor count and extremely large surface area. Let’s get into some specifics of the two graphics cards the chip will appear on: the GeForce GTX 280 and 260.
The top-end card has all the functional units enabled, including a 512-bit memory bus that brings back echoes of ATI’s Radeon 2900 XT. On that bus sits a full gigabyte of GDDR3 memory at around 1100 MHz, for a total of over 140GB/sec of memory bandwidth.
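That bandwidth number follows directly from the bus width and memory clock. A minimal sketch of the arithmetic, assuming GDDR3 transfers data twice per clock and using a nominal 1100 MHz (the function name is ours, and real cards run at slightly different clocks, so published figures differ by a fraction of a GB/sec):

```python
def memory_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/sec (1 GB = 10^9 bytes).

    transfers_per_clock=2 models double-data-rate memory such as GDDR3.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# 512-bit bus at a nominal 1100 MHz.
print(f"512-bit: {memory_bandwidth_gb_s(512, 1100):.1f} GB/sec")

# The same memory on a 256-bit bus, to show how much the bus width matters.
print(f"256-bit: {memory_bandwidth_gb_s(256, 1100):.1f} GB/sec")
```

The 512-bit case works out to 140.8 GB/sec, and halving the bus width at the same clock halves the bandwidth to 70.4 GB/sec.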
The less expensive GTX 260 variant eliminates one of the ROPs/memory interface units, so it effectively has a 448-bit interface and 896MB of GDDR3 RAM. The clock speeds are somewhat lower on those parts, too. Here’s how the new products compare with some other recent graphics cards:
| Spec | GeForce GTX 280 | GeForce GTX 260 | GeForce 9800 GX2 | GeForce 9800 GTX | Radeon HD 3870 X2 | Radeon HD 3870 |
|---|---|---|---|---|---|---|
| Price | ~$650 | ~$400 | $480 | $260 | ~$360 | ~$160 |
| GPU | GTX 200 | GTX 200 | 2x G92 | G92 | 2x RV670 | RV670 |
| Manufacturing Process | 65nm | 65nm | 65nm | 65nm | 55nm | 55nm |
| Core Clock | 602 MHz | 576 MHz | 675 MHz | 675 MHz | 825 MHz | 775 MHz |
| Stream Processor Clock | 1.29 GHz | 1.24 GHz | 1.69 GHz | 1.69 GHz | 825 MHz | 775 MHz |
| Memory Clock | 2.2 GHz DDR | 2.0 GHz DDR | 2.2 GHz DDR | 2.2 GHz DDR | 1.8 GHz DDR | 2.25 GHz DDR |
| Stream Processors | 240 | 192 | 2x 128 | 128 | 640 | 320 |
| Texture Units | 80 | 64 | 2x 64 | 64 | 16 | 16 |
| Render back end (ROPs) | 32 | 28 | 2x 16 | 16 | 2x 16 | 16 |
| Frame Buffer | 1024MB GDDR3 | 896MB GDDR3 | 2x 512MB GDDR3 | 512MB GDDR3 | 2x 512MB GDDR3 | 512MB GDDR3 |
| Memory Interface | 512 bits | 448 bits | 2x 256 bits | 256 bits | 2x 256 bits | 256 bits |
| Memory Bandwidth | 141.7 GB/sec | 111.9 GB/sec | 2x 64 GB/sec | 70.4 GB/sec | 2x 57.6 GB/sec | 72 GB/sec |

Though Nvidia claims plenty of improvements in the GTX 200 GPU, don’t expect a lot of improvement in the area of video processing. It still uses the VP2 video engine found in the G92. Not that we’re complaining—it’s a perfectly good video solution.
We should note a few other board-level features. The GTX 280 and 260 both support Nvidia’s new HybridPower feature. If you plug them into a supporting nForce motherboard with integrated graphics, you can power off the card entirely when you’re just tooling around on your desktop, and power it up only when you have a 3D game running.
Both cards are 10.5 inches long and double-wide, so they may not fit in all cases. This, of course, is not unusual for cards in this price bracket. Nvidia has done a lot to optimize the idle power consumption, but max power consumption should be higher than previous cards (we’ll examine that later).
The GTX 280 variant uses both a 6-pin and an 8-pin PCIe power connector, so expect power utilization to be relatively high under heavy load, and you’ll want to make sure you have a compatible power supply before considering a purchase. The GTX 260 model uses a more standard configuration of two 6-pin power connectors.
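For a rough sense of what those connector choices allow, the sketch below adds up the power each card could draw within the connector limits. The limits used here (75 W from the x16 slot, 75 W per 6-pin plug, 150 W per 8-pin plug) are the commonly cited PCI Express figures rather than anything stated by Nvidia for these cards, and they are ceilings, not measured consumption. The function name is ours.

```python
# Commonly cited PCI Express power limits, in watts (assumed, not from Nvidia).
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def max_board_power(six_pin_plugs, eight_pin_plugs):
    """Upper bound on board power implied by the slot plus auxiliary connectors."""
    return SLOT_W + six_pin_plugs * SIX_PIN_W + eight_pin_plugs * EIGHT_PIN_W


print("GTX 280 ceiling:", max_board_power(six_pin_plugs=1, eight_pin_plugs=1), "W")
print("GTX 260 ceiling:", max_board_power(six_pin_plugs=2, eight_pin_plugs=0), "W")
```

Under those assumptions the GTX 280’s connector layout allows up to 300 W and the GTX 260’s up to 225 W, which is why the 8-pin plug on the bigger card is a hint about its appetite under load.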
Also, note that a 512-bit memory bus carries with it a certain amount of baggage. You can only fit so many pins in so much space, so no amount of manufacturing process change will ever make the GTX 200 GPU “small” without also reducing the width of the memory bus interface. A bus that wide also requires boards with more layers and trickier layout, increasing board costs.
In other words, there may be some pricing flexibility, but with a chip this big and boards this costly, don’t expect GeForce GTX 280 cards to come down dramatically in price unless Nvidia just starts taking a loss on each one sold.