In the good old days, when wireline data rates were below a few hundred Mbit/s, a metal line connecting two chips was simply treated as a load that buffers could easily drive. Even when the data transitions were slow, one unit interval (UI) of the data signal was wide enough to allow a long settling time. As semiconductor channel lengths shrank and transistors became able to switch digital signals rail-to-rail with fast transitions, the achievable UI of an integrated system fell below a hundred picoseconds, which corresponds to a data rate of 10~100 Gbit/s. However, the bandwidth of the physical channel (e.g. PCB trace or optical cable) has improved comparatively slowly in industry, and so the inter-symbol interference (ISI) problem arises. Additionally, the number of IO pads that can be brought out from a single die (that is, the number of channels available for data transmission) is limited. Designers have therefore raised the per-pin data rate with faster transistors and serialized parallel data buses onto as few high-speed lanes as possible. As a result, more complex wireline transceiver systems are now required to deal with channel ISI and to transmit and receive high-speed data with high signal integrity.
The IO transceiver normally consists of a Tx FIR driver, an Rx CTLE and an Rx DFE, as shown in the right-side figure. These three circuits share the burden of ISI cancellation, and using them allows the data rate to increase. Since the transmitter operates at high frequency, and sometimes at several different data rates, a high-performance PLL is required to generate a clean, high-speed clock with a synthesizable frequency tuning range. On the receiver side, the timing of the received data is unknown, yet a clock aligned with the received data must trigger the Rx DFE and the following de-serializer. A clock and data recovery (CDR) system aligns the clock timing with the data timing. New ideas for designing fast wireline transceivers that remove ISI at low power are badly needed.
ISI is a deterministic noise and the main reason why wireline transceivers have become complicated. If the driver transmits a high-speed [0 1 0 0 0 0 0 0] data pattern into a lossy channel with limited bandwidth, the residual tail of the transition, caused by the slow slew rate, interferes with the data in the following UIs and causes erroneous decisions at the receiver-side samplers. IO transceivers remove this ISI, and they are typically designed to be bidirectional to save pin budget, as shown on the right. While the transceiver operates in receiver mode, the transmitter FIR driver presents a high impedance at its output so that it does not affect the receiver input signal, and vice versa. The bidirectional scheme is also useful for initial functional testing when the chip comes back from fabrication: the transmitted signal can be fed directly into the receiver, and the basic operation of the whole transceiver can be checked without any data frequency offset.
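The effect of the [0 1 0 0 0 0 0 0] pattern can be illustrated with a minimal sketch. It assumes the channel behaves like a single-pole RC low-pass filter sampled once at the end of each UI; the function name `channel_samples` and the time constants are hypothetical choices, not values from the figures.

```python
import math

def channel_samples(bits, tau_over_ui=1.0):
    """Sample an assumed first-order low-pass channel at the end of each UI."""
    decay = math.exp(-1.0 / tau_over_ui)  # fraction of the old level left after one UI
    v = 0.0                               # the line is assumed to start at logic 0
    samples = []
    for b in bits:
        v = b + (v - b) * decay           # RC settling toward the driven level
        samples.append(v)
    return samples

# Single-bit pulse response: the main cursor and the post-cursor ISI tail.
pulse = channel_samples([0, 1, 0, 0, 0, 0, 0, 0], tau_over_ui=1.0)
main_cursor, post_cursors = pulse[1], pulse[2:]

# A slower channel (tau = 2 UI): a lone 0 after a run of 1s is sliced incorrectly.
run = [1, 1, 1, 1, 0, 1, 1, 1]
decisions = [1 if s > 0.5 else 0 for s in channel_samples(run, tau_over_ui=2.0)]
errors = [i for i, (d, b) in enumerate(zip(decisions, run)) if d != b]
print(main_cursor, post_cursors, errors)
```

The non-zero entries in `post_cursors` are the residual tail that leaks into later UIs, and `errors` lists the bit positions where a fixed 0.5 threshold makes a wrong decision under this simple model.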
IO channels are terminated with 50 ohm at both the transmitter output and the receiver input to minimize reflection noise. To produce a vertical eye opening of a few hundred mV (e.g. more than 500 mV) across such small termination loads, the transmitter has to burn a current on the order of 10 mA or more, and the size of the final-stage transistors inevitably increases. The pre-drivers that drive them become large as well. In addition, implementing a pre-emphasis FIR function complicates the Tx structure in this region and makes the problem worse. IO transmitters therefore normally occupy a large chip area and consume a large current, which makes the design "power hungry".
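As an illustration of what a pre-emphasis FIR does to the transmitted levels, here is a minimal sketch of a 2-tap filter (main cursor plus one post-cursor tap). The tap weight of 0.25, the normalization of the peak swing to 1, and the function name `tx_fir` are assumptions made for the example, not values from the original design.

```python
def tx_fir(bits, post_tap=0.25):
    """Assumed 2-tap FIR: y[n] = c0*x[n] - c1*x[n-1], with c0 + c1 = 1."""
    c0, c1 = 1.0 - post_tap, post_tap
    symbols = [2 * b - 1 for b in bits]  # map 0/1 to -1/+1
    prev = symbols[0]                    # assume the line idled at the first level
    out = []
    for x in symbols:
        out.append(c0 * x - c1 * prev)   # full swing on a transition, reduced otherwise
        prev = x
    return out

levels = tx_fir([0, 0, 1, 1, 1, 0])
print(levels)
```

Bits that follow a transition are sent at full swing, while repeated bits are sent at a reduced level, which boosts the high-frequency content relative to the low-frequency content. Each extra tap needs its own weighted driver slice and delayed data path, which is one reason the FIR adds to Tx complexity.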
One way to alleviate the power issue is to adopt a half-rate clocking structure for the Tx. In other words, making the data rate twice the frequency of the triggering clock halves the operating speed of the transmitter circuits, which saves considerable power. This is called "double data rate (DDR)", which is also the name of the well-known CPU-to-memory interface standard. Although the number of blocks doubles because both even-clocked and odd-clocked blocks are needed, the power saved by the speed reduction outweighs the cost of the doubled blocks.
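The half-rate idea can be sketched behaviorally: two data paths each run at half the bit rate, and a 2:1 mux driven by the clock level interleaves them so that one clock period carries two UIs. The function name and the choice of which clock phase selects the even path are assumptions for illustration.

```python
def half_rate_serialize(even_bits, odd_bits):
    """Interleave two half-rate streams with a 2:1 mux selected by the clock level."""
    serial = []
    for even, odd in zip(even_bits, odd_bits):  # one loop pass = one clock period
        for clk in (1, 0):                      # high phase, then low phase
            serial.append(even if clk else odd)
    return serial

data = [1, 0, 1, 1, 0, 0, 1, 0]
even_path, odd_path = data[0::2], data[1::2]    # each path toggles at half the bit rate
serial = half_rate_serialize(even_path, odd_path)
clock_periods = len(even_path)
print(serial, clock_periods)
```

Eight bits leave the mux in four clock periods, so the clock and every block before the mux run at half the bit rate, at the cost of duplicating the even and odd paths.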
A voltage-mode driver theoretically consumes one fourth of the current of a current-mode driver for the same signal swing when both the Tx output and the Rx input are terminated with 50 ohm, and so it is preferred in industry. Segmented drivers, shown in the bottom-left corner, are useful for making the pre-emphasis strength programmable while maintaining a 50 ohm output impedance, which is held by a background calibration. Symmetry between the timing of the even and odd clocks is important to maximize the horizontal eye opening at the transmitter output. Deriving differential clocks from a single clock source and reducing the duty-cycle error with a latch topology, as shown in the right-middle figure, reduces the timing error between the even and odd clocks.
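The one-fourth figure can be checked with a short calculation. The sketch below assumes ideal differential drivers with double 50 ohm termination and counts only the output-stage current (pre-drivers and calibration loops are ignored); the function names are hypothetical.

```python
Z0 = 50.0  # ohm, termination at both the Tx output and the Rx input

def current_mode_current(vdpp, z0=Z0):
    # The tail current I flows into 50 ohm || 50 ohm = z0/2 on the active side,
    # so the differential peak-to-peak swing is 2 * I * (z0/2) = I * z0.
    return vdpp / z0

def voltage_mode_current(vdpp, z0=Z0):
    # Supply -> z0 pull-up -> 2*z0 differential Rx termination -> z0 pull-down -> ground.
    # The current is Vs / (4*z0), and the differential peak-to-peak swing equals Vs.
    return vdpp / (4.0 * z0)

swing = 0.5  # V, differential peak-to-peak at the receiver (example value)
i_cm = current_mode_current(swing)
i_vm = voltage_mode_current(swing)
print(f"current-mode: {i_cm * 1e3:.1f} mA, voltage-mode: {i_vm * 1e3:.1f} mA")
```

For a 500 mV swing this gives 10 mA for the current-mode driver and 2.5 mA for the voltage-mode driver under these assumptions.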
The transmitter consumes high power because the load it drives is a small resistance. In contrast, the loads of the receiver blocks are MOS gates, so the requirement for large devices is relaxed. However, the required operating speed is still at the GHz level, and the transistors must be sized as in RF circuits. A continuous-time linear equalizer (CTLE), shown at the bottom left, has a zero formed by the degeneration capacitor Cs and resistor Rs, which gives the high-frequency-peaking transfer function (TF) shown at the bottom right. Since the channel has a low-pass characteristic, the overall TF from the channel input to the CTLE output becomes flat in the frequency domain. Correspondingly, in the time domain the ISI is reduced, which widens the eye opening. Using a low-Q helical inductor in place of the plain load resistor RL creates another zero in the TF (at RL/LD), so more peaking can be generated.
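The peaking behavior can be sketched numerically. The transfer function below is the commonly quoted first-order model of a source-degenerated differential pair, which is an assumption here because the original gives only the figure; the load capacitance `cp` and all component values are hypothetical.

```python
import math

def ctle_gain_db(f_hz, gm=10e-3, rl=200.0, rs=400.0, cs=0.4e-12, cp=50e-15):
    """Gain of an assumed CTLE model:
    H(s) = gm*RL*(1 + s*Rs*Cs) / ((1 + gm*Rs/2 + s*Rs*Cs) * (1 + s*RL*Cp))
    """
    s = 1j * 2 * math.pi * f_hz
    num = gm * rl * (1 + s * rs * cs)
    den = (1 + gm * rs / 2 + s * rs * cs) * (1 + s * rl * cp)
    return 20 * math.log10(abs(num / den))

zero_hz = 1 / (2 * math.pi * 400.0 * 0.4e-12)  # zero set by Rs and Cs
dc_gain = ctle_gain_db(0.0)
hf_gain = ctle_gain_db(5e9)
print(f"zero at {zero_hz / 1e9:.2f} GHz, DC {dc_gain:.1f} dB, 5 GHz {hf_gain:.1f} dB")
```

At low frequency the degeneration resistor lowers the gain; above the Rs-Cs zero the capacitor bypasses it and the gain rises toward gm*RL until the output pole takes over. The difference between `hf_gain` and `dc_gain` is the peaking available to offset channel loss in this model.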
Compared with the Rx DFE, the CTLE amplifier is simple and occupies a small area. Its power efficiency is better for compensating an equivalent loss, but matching the peaking slope to the channel loss in the frequency domain is very challenging, and the peaking slope cannot be tuned flexibly enough to cope with a variety of channel slopes. Unlike RF amplifiers, where the input signal is typically a few mV and the gain is more than 20 dB, the CTLE receives high-speed digital signals of hundreds of mV with ISI. The amplifier therefore needs a wide linear input range, and its typical gain is 0~6 dB. Since the CTLE operates without a clock signal, it can provide the minimum eye opening that the DFE needs to start its tap calibration properly.
A decision feedback equalizer (DFE) cancels the post-cursor ISI that remains after the Tx FIR driver and the Rx CTLE have mitigated the ISI. Normally, the proper cancellation strengths of the taps (C1, C2) are found adaptively by monitoring, in the digital backend, the error that remains after cancellation. The following de-serializer parallelizes the high-speed DFE output signal to reduce system power.
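A behavioral sketch of a 2-tap DFE is given below. It assumes a hypothetical channel pulse response of [1.0, 0.7, 0.4] (main cursor, first and second post-cursors) and tap weights C1 and C2 that have already converged to the post-cursor values; the adaptation loop itself is not modeled.

```python
def channel(bits, h):
    """Received samples for +/-1 symbols through an assumed pulse response h."""
    syms = [2 * b - 1 for b in bits]
    return [sum(h[k] * syms[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(syms))]

def slice_plain(rx):
    """Slicer without equalization."""
    return [1 if v > 0 else 0 for v in rx]

def dfe(rx, c1, c2):
    """2-tap DFE: subtract the ISI predicted from the two previous decisions."""
    d1 = d2 = 0.0                       # previous decisions as +/-1 (0 = no history yet)
    out = []
    for v in rx:
        z = v - c1 * d1 - c2 * d2       # summer output
        d = 1.0 if z > 0 else -1.0      # sampler decision
        out.append(1 if d > 0 else 0)
        d2, d1 = d1, d
    return out

bits = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]
rx = channel(bits, [1.0, 0.7, 0.4])
plain = slice_plain(rx)
equalized = dfe(rx, c1=0.7, c2=0.4)
print(plain, equalized)
```

In this example the post-cursors together exceed the main cursor, so the plain slicer misreads the 0 that follows two 1s, while the DFE recovers the transmitted pattern because each decision removes its own tail from the following samples.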
The critical timing path formed by the summer, the sampler latch and tap C1 sets the speed limit of the whole DFE. A speculative scheme, shown at the bottom left, can be used to relax this speed requirement. Two replica summers compute the results for both cases, in which the previous data bit is 0 or 1. After slicers have decided the digital value of each compensated signal, a domino-type mux selects the one that corresponds to the actual previous bit. This scheme is also called the "loop unrolling" technique.
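The equivalence between the direct feedback loop and the speculative scheme can be sketched for a single tap. The channel post-cursor of 0.6, the assumption that the bit before the sequence is 0, and the function names are hypothetical.

```python
def dfe_direct(rx, c1, first_prev=0):
    """1-tap DFE with the feedback inside the critical path."""
    prev = first_prev
    out = []
    for v in rx:
        prev = 1 if v - c1 * (2 * prev - 1) > 0 else 0  # summer, then sampler
        out.append(prev)
    return out

def dfe_unrolled(rx, c1, first_prev=0):
    """Speculative 1-tap DFE: both outcomes are precomputed, a mux picks one."""
    if_prev_was_1 = [1 if v - c1 > 0 else 0 for v in rx]  # replica summer + slicer
    if_prev_was_0 = [1 if v + c1 > 0 else 0 for v in rx]  # replica summer + slicer
    prev = first_prev
    out = []
    for hi, lo in zip(if_prev_was_1, if_prev_was_0):
        prev = hi if prev else lo      # only this mux remains in the feedback loop
        out.append(prev)
    return out

bits = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
syms = [-1] + [2 * b - 1 for b in bits]             # assume a 0 was sent before the sequence
rx = [syms[n + 1] + 0.6 * syms[n] for n in range(len(bits))]
print(dfe_direct(rx, 0.6), dfe_unrolled(rx, 0.6))
```

Both functions return the same decisions. The difference is that in the unrolled version the analog subtraction and slicing no longer wait for the previous decision, so only the mux selection has to finish within one UI.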
A half-rate scheme can be adopted here as well to reduce power. The figure on the bottom right presents a half-rate loop-unrolling DFE, in which the mux selections of the data-0 bank and the data-180 bank cross each other, because the correct selection for the 1st tap of each bank is held in the other bank. Extending the unrolled taps (i.e. unrolling both the 1st and 2nd taps) or moving to a quarter-rate scheme doubles the number of replica summer banks, making the DFE power hungry.
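The cross-coupled mux selection and the growth in replica count can be sketched as follows. This is a behavioral model under assumed values (a single post-cursor of 0.6 and a 0 sent before the sequence); the function names are hypothetical and the sketch does not model the timing of the actual circuit.

```python
def spec_pair(v, c1):
    """Speculative slicer outputs for one sample: (previous bit was 0, previous bit was 1)."""
    return (1 if v + c1 > 0 else 0, 1 if v - c1 > 0 else 0)

def half_rate_unrolled_dfe(rx, c1, first_prev=0):
    even_bank = [spec_pair(v, c1) for v in rx[0::2]]  # sampled on clock phase 0
    odd_bank = [spec_pair(v, c1) for v in rx[1::2]]   # sampled on clock phase 180
    out = []
    prev_odd = first_prev
    for i, even_spec in enumerate(even_bank):
        even_bit = even_spec[prev_odd]        # even mux is selected by the odd bank
        out.append(even_bit)
        if i < len(odd_bank):
            prev_odd = odd_bank[i][even_bit]  # odd mux is selected by the even bank
            out.append(prev_odd)
    return out

def replica_slicer_count(phases, unrolled_taps):
    """Each clock phase needs one replica per combination of unrolled previous bits."""
    return phases * 2 ** unrolled_taps

bits = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
syms = [-1] + [2 * b - 1 for b in bits]
rx = [syms[n + 1] + 0.6 * syms[n] for n in range(len(bits))]
decoded = half_rate_unrolled_dfe(rx, 0.6)
print(decoded, replica_slicer_count(2, 1), replica_slicer_count(2, 2), replica_slicer_count(4, 1))
```

Under this counting, a half-rate design with one unrolled tap needs four replicas, and either unrolling a second tap or moving to quarter-rate raises that to eight.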