Life In The Fast Lane

Desktop memory has taken one more step forward, even as video memory gets its much-awaited shot in the arm. The next generation is here. We answer the what and the why.

 

Michael Browne

We’ve come a long way: from monochrome displays and arcane command-line syntax to the (then) revolutionary, simplified Start menu and Windows Explorer courtesy of Windows 95, and on to MS Vista’s 3D Aero interface. As the adage goes, technology changes everything. Today, news updates are an RSS feed away, and you can say goodbye to daily logins at www.cnn.com; stock, weather, travel and even calendar data is all available as an active task pane on your desktop, and the computer has become even more newbie-friendly with utilities built into the latest OSes and applications. This so-called interactive experience is taken to the next level with content like 3D games and Blu-ray movies, the new generation of entertainment. The animation and graphics industry, the world of cinema, and the content-encoding, security and encryption domains all cry out for the next generation of computing hardware to accelerate productivity, reduce latencies and satisfy our computing demands. The latest hardware is quickly brought to its knees by taxing applications, as developers code their socks off to take advantage of the extra oomph that a shiny new quad-core processor or an 800-stream-processor graphics card provides. So the cycle of hardware development continues, and the main drivers behind it are games, 3D modelling and animation, and multimedia and interactive content in general.

Memory is one of the performance-enhancing components that enjoy a relatively long market life, since two or three processor generations may be based around a single memory standard; DDR2, for example, has been around since the Pentium 4 days. There are two types of memory in a computing system: main memory (RAM), and video memory (VRAM), which resides on graphics cards.

The acronym RAM is currently synonymous with DDR2, but this wasn’t always the case. DDR RAM was its predecessor, and DDR RAM itself was preceded by SDRAM. Now facing retirement, DDR2 should soon be put out to pasture by one of our limelight hogs: DDR3. DDR3 is already available for purchase, but it’s very costly at the moment, owing to very low market acceptance. Then there is the cost of a new motherboard to consider, and DDR3-based boards aren’t cheap. Other than the odd enthusiast, few have adopted DDR3 so far, but that will change. Industry experts expect DDR3 to become the norm sometime early next year, when manufacturers will be doing sufficient volumes to drastically reduce prices and faster DDR3 modules appear, widening the currently nominal performance advantage DDR3 holds over DDR2. Intel’s upcoming Nehalem processors and the accompanying X58 chipset will be DDR3-only. AMD’s Sandtiger, the successor to the Phenom processor range, may also go the DDR3 way, but that largely depends on Intel, the market leader. Intel made DDR2 the success it was, and it seems they’re going to lay it to rest soon as well.

The other newcomer we’re spotlighting is an exciting new VRAM standard: GDDR5, the successor to the ageing GDDR3, which has been around since the GeForce 6800 Ultra / Radeon X800 XT days. But wait! Where’s GDDR4? Largely defunct, thanks to the industry having jumped straight from GDDR3 to GDDR5. With its high latencies and GDDR3-like speeds, GDDR4 was mostly a no-go; other than the odd appearance on AMD/ATI’s X1950 XTX and HD 3870, it has been left alone. Both NVIDIA and ATI have tried all the tricks to raise memory bandwidth, namely bumping up clock speeds and/or widening the memory bus. GDDR3 has hit its MHz threshold at 2.4 GHz, and developing wider-than-256-bit memory interfaces is very costly. Enter GDDR5, a new standard that vendors like Qimonda (spun off from Infineon) and Samsung claim can easily hit speeds of 5 GHz while consuming less power: no more than 1.5 V at whatever speed. Add extra features geared towards reliability and you’ve got a surefire recipe for a successor.

 

DDR3: In Pursuit Of 8 Lane Expressways

DDR3 comes in the same 240-pin configuration as DDR2, but the two are incompatible, and the physical notches on the modules’ PCBs sit in different positions to prevent modules being slotted into the wrong sockets. The first thing that’s radically different about DDR3 is the improved clock speed, to the tune of 2000 MHz, whereas the fastest DDR2 modules could do no more than 1200 MHz. It’s early days, and DDR3 speeds could go as high as 2400 MHz, an exciting prospect. Unfortunately, latency (wasted clock cycles) also increases with higher clocks, and DDR3 has much higher latencies than DDR2, which somewhat nullifies the clock advantage. The increase in clock frequency is possible because DDR3 has double the prefetch of DDR2: the DDR3 core fetches 8 bits per clock, while DDR2 could only manage 4 bits. Doubling the prefetch is achieved by doubling the number of internal banks between the DRAM core and the I/O buffers, which speeds things up and, thanks to the added parallelism, helps offset the latency penalty. While the advantages of DDR3 may not be apparent at 1066 MHz (the current maximum official frequency for DDR2), at speeds of 1333 MHz and above the newcomer should start baring its teeth.
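
To put those figures in perspective, here’s a quick back-of-the-envelope calculation (a minimal Python sketch; the module speeds and the 64-bit channel width are illustrative assumptions, not measurements):

    # Peak transfer rate for one memory channel: transfers per second
    # multiplied by the bus width in bytes. Figures are illustrative.
    def peak_bandwidth_gb_s(data_rate_mt_s, bus_width_bits=64):
        return data_rate_mt_s * (bus_width_bits // 8) / 1000.0

    # DDR2-800:  200 MHz core, 4-bit prefetch -> 800 MT/s
    # DDR3-1600: 200 MHz core, 8-bit prefetch -> 1600 MT/s
    for name, rate in [("DDR2-800", 800), ("DDR2-1066", 1066),
                       ("DDR3-1333", 1333), ("DDR3-1600", 1600)]:
        print(f"{name}: {peak_bandwidth_gb_s(rate):.1f} GB/s per 64-bit channel")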

Compared with DDR2, DDR3 brings improvements in clock speed, data throughput and power consumption, with the last dropping (clock for clock) thanks to the lower operating voltage of 1.5 V against DDR2’s 1.8 V. The BGA packaging remains unchanged, although DDR3 chips have a higher pin count; the extra pins are mainly there to improve signal integrity at faster clock speeds. It’s highly undesirable (a gross understatement) for memory modules to end up with corrupt data in their buffers as a result of signal loss, which becomes more likely once clock speeds are bumped up.
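
A rough, first-order illustration of why that voltage drop matters (a Python sketch assuming dynamic power scales with the square of the supply voltage, with everything else held equal):

    # Dynamic power scales roughly with C x V^2 x f; capacitance and
    # frequency are held constant here purely for illustration.
    v_ddr2, v_ddr3 = 1.8, 1.5
    ratio = (v_ddr3 / v_ddr2) ** 2
    print(f"Clock for clock, DDR3 at 1.5 V draws roughly {ratio:.0%} "
          f"of the dynamic power of DDR2 at 1.8 V")   # about 69%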

The architecture adopted by DDR3 is called fly-by, and it features additional goodies such as on-module signal termination and other flow-stabilisation commands to aid data integrity. Unlike DDR2’s architecture, where signals are sent to all the memory chips simultaneously, DDR3 follows a more serial approach, sending signals sequentially from chip to chip. The internal working of DDR3 memory in terms of reading and writing data is therefore quite different from DDR2. This sequential operation needs some sort of counter or timer to link past, current and future operations belonging to the same process, and the solution is to build such a sequencing ability into the memory controller itself. This gives the controller the ability to manage capsules of data and match them to their relevant processes; the technical term for it is write levelling. The signal protocols for DDR3 have changed too, a direct result of the much higher bus frequencies.
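
Conceptually, the levelling step looks something like this (a Python sketch of the idea only, not real controller code; the per-chip delay figure is an assumption):

    # On a fly-by bus the clock/command signal reaches each DRAM chip a
    # little later than the one before it. During training the controller
    # learns that skew and delays each chip's write data by the same
    # amount, so every chip sees data and clock lined up.
    FLY_BY_DELAY_PS = 70  # assumed per-chip propagation delay

    def train_write_levelling(num_chips):
        return {chip: chip * FLY_BY_DELAY_PS for chip in range(num_chips)}

    for chip, delay in train_write_levelling(8).items():
        print(f"chip {chip}: delay write data by {delay} ps")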

64-bit operating systems mean greater memory-addressing capacity than before. Combined with ever more bandwidth-hungry applications, this dictates that memory density should increase, not just frequency. With DDR2 we saw memory modules of up to 2 GB, built from high-density chips of 128 and even 256 MB (each module carries either eight or sixteen such chips). With DDR3, manufacturers promise as much as 512 MB on a single chip, meaning a module with a single bank of chips (chips on one side of the module) could hold as much as 4 GB! We’ve also heard of manufacturers working on 1024 MB chips. Double-bank modules built from those (16 GB on a single module!) will be prohibitively expensive and a rarity, but the prospect of eight gigabytes on a single module is exciting, and higher densities would also drive prices lower. By our estimates 2009 should see a sharp decline in DDR3 prices, much like DDR2 saw back in mid-2007. Faster, lower-latency modules should become available at the price of today’s slower ones, and the ones already in the market will get cheaper. 2009 should be the time for users to migrate to DDR3; as of now, DDR2 hasn’t relinquished the throne in terms of market popularity.
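
The capacity arithmetic is straightforward (a Python sketch using the chip densities mentioned above; packaging details are simplified):

    # Module capacity = capacity per chip x chips per side x sides.
    def module_capacity_gb(chip_mb, chips_per_side=8, sides=1):
        return chip_mb * chips_per_side * sides / 1024

    print(module_capacity_gb(256))            # 2 GB  - today's high-end DDR2
    print(module_capacity_gb(512))            # 4 GB  - single-bank DDR3
    print(module_capacity_gb(1024))           # 8 GB  - with 1024 MB chips
    print(module_capacity_gb(1024, sides=2))  # 16 GB - double-bank DDR3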

In terms of performance things are about even, with fast, low-latency DDR2 modules competing hard with comparable DDR3 modules. Once 1.6 GHz and 1.8 GHz DDR3 modules become the norm, however, DDR2 will trail behind.

 

GDDR5—Speed Demon

After a hiatus of nearly four years, the world of video memory is about to receive its much-awaited shot in the arm, courtesy of GDDR5. The current norm, GDDR3, has done a great job: it has gone from a mere 256 bits and 1 GHz on the GeForce 6800 GT back in 2004 to a 512-bit, 2.2 GHz part on the GeForce GTX 280. This industry is all about change, though, and a transition of sorts is about to commence. We’ve already seen GDDR5 make an entry on ATI’s latest Radeon HD 4870, where the memory sits on a simple 256-bit bus, as opposed to NVIDIA’s GTX 280 with its massive 512-bit memory interface. The aim of both vendors is simple: more bandwidth. The approach, however, is radically different. NVIDIA uses a wider memory bus; ATI has gone with GDDR5 and its higher clocks.

ATI’s move makes sense, because widening the bus is neither easy nor cost effective. Each memory chip needs more pins, which means a larger chip, which means fewer chips per wafer (yes, memory chips are also cut from silicon wafers). This drives up cost, which is why mainstream and budget graphics cards usually have 128-bit or 256-bit memory interfaces. Simply bumping up clock speeds doesn’t help either, as thermal thresholds are eventually hit. GDDR5 is all about transmitting more data per pin (the pins on a BGA memory package) per clock. GDDR5 also has the benefit of requiring less power and fewer pins overall, which simplifies manufacturing.

So what does GDDR5 do radically differently from GDDR3 and GDDR4? First up, there’s an exciting new feature called clock data recovery (CDR hereon), which is all about maintaining clean signals while transporting data, and thus preserving data integrity. With CDR the memory clock is recovered from the data stream itself rather than supplied separately by the memory controller. The bus between the memory chips and the graphics processor is actually trained, or synced, and the command and address clocks are synchronised in the same way; for this reason the feature is also sometimes called interface training, or clock training. The clock is picked up and read from the data stream, and data, addresses and commands are adjusted, or de-skewed (the technical term), for the best possible data integrity. This is one of the new features built into GDDR5 chips to achieve higher clock speeds.
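
A conceptual sketch of what such a training pass amounts to (illustration only; the lane count and the skew figures are made up):

    import random

    # During training the controller sends a known pattern, measures how
    # early or late each data lane arrives, and stores a compensating
    # delay that is applied to all subsequent transfers.
    def train_lanes(num_lanes, measure_skew_ps):
        return {lane: -measure_skew_ps(lane) for lane in range(num_lanes)}

    measured = lambda lane: random.randint(-40, 40)  # stand-in for a real measurement
    adjustments = train_lanes(32, measured)
    print(adjustments)  # per-lane delay corrections in picoseconds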

Traditionally, GDDR3 and other memory technologies have used a single clock to time data, addresses and the read and write operations. GDDR5 uses two clocks instead of one: a differential command clock (CK) for commands and addressing, and a forwarded differential write clock (WCK) for data. The CK runs at half the WCK’s frequency, so if CK is running at 1.5 GHz, WCK runs at 3 GHz. Each clock gets a byte of storage, which allows data to be aligned and synchronised during the initialisation and training sequences explained in the previous paragraph; data is synchronised on the basis of the values stored against each clock. Separating data from commands in this way helps minimise noise and jitter, preserving data integrity at the higher clock speeds GDDR5 can do.
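
Here’s how those clocks relate to the headline “effective” speed, assuming, as GDDR5 is commonly described, that data moves on both edges of WCK (a quick Python illustration):

    ck_ghz = 1.5                       # command/address clock
    wck_ghz = ck_ghz * 2               # forwarded write clock
    effective_gt_s = wck_ghz * 2       # two transfers per WCK cycle
    print(f"CK {ck_ghz} GHz -> WCK {wck_ghz} GHz -> {effective_gt_s} GT/s effective")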

In conjunction with CDR and the dual clocks, one of the new protocols introduced by GDDR5 is error detection for read/write operations. Note we said detection, not correction: errors are not corrected, the data is simply re-requested if one is detected. This boosts reliability, with one caveat: as the memory is clocked higher, more errors occur, and real-world bandwidth may actually suffer as more and more instructions and data have to be re-requested. At normal clocks and with moderate overclocking, however, this feature should help maintain signal integrity, and therefore data integrity and system stability.
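
In code, the behaviour is essentially detect-and-retry (a minimal sketch; the checksum is a stand-in for the real error-detection code, and the function names are hypothetical):

    # Re-issue a transfer until its checksum matches, or give up.
    # GDDR5 detects errors and re-requests; it does not correct them.
    def transfer_with_retry(send, checksum, max_retries=3):
        for _ in range(max_retries):
            data, received_check = send()
            if checksum(data) == received_check:
                return data            # clean transfer
        raise IOError("transfer kept failing; the bus is probably unstable")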

Hand in hand with performance, GDDR5 also has a noteworthy trick that helps save power. It’s called address and data bus inversion, and its main function is to reduce the number of binary 0s transmitted on the data and address lines. To achieve this, a couple of extra pins are added to each GDDR5 chip: DBI (data bus inversion) and ABI (address bus inversion) pins, one of each for every byte (8 bits) of data.

On the data and address buses, a binary one (1) is signalled as a high voltage and a zero (0) as a low voltage. If more than 50 per cent of the bits in a byte are zeroes, the chip flags that byte on the DBI pin and inverts it, so ones become zeroes and zeroes become ones. The result is that in any given clock cycle, no byte of data, instructions or addresses is ever transmitted with more than 50 per cent of its bits as zeroes. This especially helps in clock cycles where very little data is being passed through memory and the bus would otherwise be full of zeroes, because even those zero values consume power.
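
Here’s what that inversion looks like for a single byte (a Python sketch of the idea; the real signalling is, of course, done in hardware):

    # If more than half the bits would be driven low, invert the byte and
    # set the DBI flag so the receiver knows to flip it back.
    def encode_dbi(byte):
        zeroes = 8 - bin(byte).count("1")
        if zeroes > 4:
            return byte ^ 0xFF, 1      # inverted data, DBI flag set
        return byte, 0                 # sent as-is

    def decode_dbi(byte, dbi):
        return byte ^ 0xFF if dbi else byte

    data, flag = encode_dbi(0b00000011)          # six zeroes -> inverted
    assert decode_dbi(data, flag) == 0b00000011  # receiver recovers the original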

And what makes GDDR5 really fast? Very simply, GDDR5 moves twice as much data per DQ (data) pin per clock as predecessors like GDDR3 and GDDR4, so at the same clock speed it (theoretically) delivers double the bandwidth. Where GDDR3 at 1 GHz had its data effectively clocked at 2 GHz, GDDR5 runs the same data at an effective 4 GHz. Note that the core clock speeds of GDDR3 and GDDR5 are identical; only the data transmission rate has changed. At the same clock, then, 512-bit GDDR3 memory yields the same throughput as 256-bit GDDR5 memory, which is a tremendous cost-saving factor in favour of the newcomer. GDDR5 is also being developed with higher capacities in mind, so each chip is denser than its GDDR3 counterpart; this further simplifies the PCB design of graphics cards that use the new standard, since fewer chips, and therefore fewer data paths, are needed for the same amount of memory. As of now GDDR5 will be expensive to manufacture, with only one real outlet for the chips in ATI’s Radeon HD 4870, but both vendors (NVIDIA and ATI) should switch to GDDR5 over the coming months. The allure of reduced costs, better performance scaling and lower power consumption will ensure that GDDR3 dies a natural death. It’s not a race though; more like a relay, and the baton is about to change hands.
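
That 512-bit-GDDR3 versus 256-bit-GDDR5 equivalence is easy to check (an illustrative Python calculation using the 1 GHz figure above):

    # Peak throughput = base clock x transfers per clock x bus width in bytes.
    def peak_gb_s(base_clock_ghz, transfers_per_clock, bus_width_bits):
        return base_clock_ghz * transfers_per_clock * bus_width_bits / 8

    gddr3_512 = peak_gb_s(1.0, 2, 512)   # double data rate, wide bus
    gddr5_256 = peak_gb_s(1.0, 4, 256)   # twice the data per pin, half the bus
    print(gddr3_512, gddr5_256)          # 128.0 128.0 GB/s - identical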

 

The Future?

We’re told that DDR5, the RAM equivalent of GDDR5, is also destined for the desktop sometime in 2011, and its clock speeds could well exceed the fastest processor clocks. Quad cores, and even hex- and octal-core processors, will probably be the norm by then. One revolutionary product spawns several others, just as Intel’s Core 2 Duo made the DDR2 standard a runaway success as people migrated from their older, slower Athlon 64/DDR systems. Graphics technologies won’t stagnate either: aside from work on new standards like GDDR6, we can expect a lot more stream processors on the next generation of cards, and memory management techniques will also evolve. The thing to remember is that all this advancement is very necessary and not just a race; software, and we the users, are slowly evolving towards these new horizons.

michael.browne@thinkdigit.com
