Transistor_Stripes

For most of the 50 years of CMOS design, the mobility of a PMOS device has been roughly half that of an NMOS device. (In the most recent processes this is no longer the case, and you also can't route in poly.)

Half the mobility means the PMOS must be twice as wide to achieve the same "on resistance" as an NMOS device. The PMOS on resistance determines the "pull up" time constant of a logic signal, while the NMOS on resistance determines the "pull down" time constant. To simplify timing, it's usually desirable to have symmetric rise (pull up) and fall (pull down) times.
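As a rough sketch of the width argument, assume a first-order model in which on resistance is inversely proportional to mobility times width. The function names and the normalized values below are illustrative assumptions, not data from any particular process.

```python
def on_resistance(mobility, width, length=1.0, cox=1.0, overdrive=1.0):
    """First-order on resistance of a MOSFET in the linear region.

    R_on ~ L / (mobility * Cox * W * Vov); all values are normalized
    placeholders, not data from any real process.
    """
    return length / (mobility * cox * width * overdrive)


def matched_pmos_width(nmos_width, mu_n, mu_p):
    """PMOS width that gives the same on resistance as the NMOS."""
    return nmos_width * mu_n / mu_p


mu_n = 1.0   # normalized NMOS mobility (assumed)
mu_p = 0.5   # PMOS at half the NMOS mobility, as described above
w_n = 1.0    # one unit of NMOS width

w_p = matched_pmos_width(w_n, mu_n, mu_p)
print("PMOS width in units:", w_p)
print("NMOS R_on:", on_resistance(mu_n, w_n))
print("PMOS R_on:", on_resistance(mu_p, w_p))
```

With the mobility ratio set to one half, the matched PMOS comes out two units wide and both on resistances are equal. In a process where the mobilities are closer, the same function gives a smaller ratio.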

Here is what this means for layout, with the two devices placed side by side so the scaling is obvious:

This is the top view of the 1 unit NMOS in yellow and the 1 unit PMOS in red. The width scaling of both transistors provides the same on resistance and therefore the same rise/fall times for the common load being driven. The yellow is the p-type substrate, the pink is the p-type source/drain regions, the orange is the poly-silicon gate, the red is the nwell, and the darker yellow is the n-type source/drain regions.

For an inverter, the single unit has a single unit of parasitic input capacitance, determined by the total capacitance of the NMOS and PMOS. The single unit inverter also has the smallest drive ability. Fan Out Description: the thing to note is that the rise and fall times for the same fanout will be the same for any unit size.
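The claim that rise and fall times depend on fanout and not on unit size can be checked with a simple RC sketch. It assumes an inverter of size k has 1/k of the unit on resistance and k times the unit input capacitance, and it ignores the driver's own drain parasitics. The function name and normalized values are hypothetical.

```python
def fanout_time_constant(size, fanout, r_unit=1.0, c_unit=1.0):
    """RC time constant of a size-`size` inverter driving `fanout` copies of itself.

    Assumes drive resistance scales as 1/size and input capacitance as size;
    the driver's self-loading is ignored.
    """
    r_drive = r_unit / size
    c_load = fanout * size * c_unit
    return r_drive * c_load


for size in (1, 2, 4, 8):
    print("size", size, "FO4 time constant", fanout_time_constant(size, fanout=4))
```

Under these assumptions the size cancels out, so every line prints the same time constant: four times the unit RC product.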

Typically a fan out of 4 is used to characterize a given process node and to make certain architectural decisions. What would a flat 1 unit vs 4 unit transistor area look like?

The clay models are a little rough but the idea is in there... On the left is the one unit inverter and on the right is the 4 unit inverter if each transistor were laid out as a single device. Two immediate problems are associated with using the single device approach:

1) The gate resistance can become appreciable, affecting the speed and unit matching.

2) There isn't a really good way to unitize the design for automation, as is needed by digital synthesis.
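To put a rough number on the first problem, the sketch below compares one wide gate against the same total width split into several shorter gates in parallel, which is where the layouts that follow are heading. It assumes the gate poly behaves as a simple sheet resistor running along the device width and contacted at one end. The sheet resistance and dimensions are made-up placeholders, and the distributed-RC correction factor is left out because it applies equally to both cases.

```python
def gate_resistance(total_width, gate_length, r_sheet, fingers=1):
    """Lumped gate poly resistance for a device split into parallel fingers.

    Each finger is a poly strip of length (total_width / fingers) and width
    gate_length; the fingers are then connected in parallel.
    """
    finger_width = total_width / fingers
    r_finger = r_sheet * finger_width / gate_length
    return r_finger / fingers


# Hypothetical values: 4 units of width, unit gate length, 10 ohms/square poly.
single = gate_resistance(total_width=4.0, gate_length=1.0, r_sheet=10.0, fingers=1)
split = gate_resistance(total_width=4.0, gate_length=1.0, r_sheet=10.0, fingers=4)
print("single wide gate:", single)
print("four parallel gates:", split)
print("ratio:", single / split)
```

In this model the resistance falls with the square of the number of fingers, since each finger is both shorter and in parallel with the others.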

So you could lay out individual transistors and gang them together. (In processes where you're allowed to use any poly orientation this works; otherwise imagine the gates are connected in metal 1 with contacts.)

This is effectively what you get when you use an array of unit devices or use the multiplier function in the schematic capture. 

Note that the sources and drains are all present even though the devices could easily be oriented to be connected to the same node. 

Let's look into device striping options... Suppose that, instead of using a unit array or the multiplier function, you specify the width as the 4 unit width for both the PMOS and the NMOS, and then specify that you want to break each transistor into 4 "stripes". The tools will then automatically merge the sources and drains as appropriate, saving area and parasitic capacitance.
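The saving from merged sources and drains can be illustrated by counting diffusion regions. This sketch assumes the striped device alternates source and drain across the row and starts with a source on the outside; the function names are hypothetical and the counts say nothing about the exact areas a given tool would produce.

```python
def diffusion_regions_separate(n):
    """n individual transistors: each keeps its own source and its own drain."""
    return {"source": n, "drain": n}


def diffusion_regions_striped(n):
    """One transistor split into n stripes with shared diffusions.

    n gates in a row leave n + 1 diffusion regions, alternating
    source, drain, source, ... starting from a source on the outside.
    """
    regions = ["source" if i % 2 == 0 else "drain" for i in range(n + 1)]
    return {"source": regions.count("source"), "drain": regions.count("drain")}


print("4 separate devices:", diffusion_regions_separate(4))
print("4 stripes:         ", diffusion_regions_striped(4))
```

For four stripes this gives five diffusion regions instead of eight, and only two of them sit on the drain, which is typically the output node whose junction capacitance matters for speed.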

It's also important to use the appropriate setting for stripes versus multiplier in a schematic capture environment linked to a SPICE netlister. The SPICE model of 4 individual transistors is rather different from that of a single 4x wide transistor broken into 4 stripes.

An additional important benefit is how nicely these two devices will fit into the same row, greatly improving layout automation.