I participated as an undergraduate intern in the ICAS Lab (Integrated Circuits and Systems Design Lab) at Sungkyunkwan University under the supervision of Professor Yoon-Myung Lee.
I studied charge-domain and current-domain analog In-Memory Computing (IMC) by reading papers, and then designed an IMC circuit based on one of them: A 7-nm Compute-in-Memory SRAM Macro Supporting Multi-Bit Input, Weight and Output and Achieving 351 TOPS/W and 372.4 GOPS (IEEE Journal of Solid-State Circuits, 2021).
Overall, the IMC circuit consists of a Digital-to-Analog Converter (DAC), a Multiply-and-Accumulate (MAC) block, and an Analog-to-Digital Converter (ADC). The DAC uses Pulse Frequency Modulation (PFM), and the ADC is a flash ADC. The MAC block performs the MAC operation by charge sharing among binary-weighted capacitors. I focused on designing and developing the MAC and ADC parts.
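To make the roles of the two converters concrete, here is a minimal behavioral sketch. It assumes 4-bit unsigned inputs, models the PFM DAC simply as "input value = number of unit pulses in a fixed window", and models the flash ADC as 15 comparators against evenly spaced references whose thermometer output is counted into a 4-bit code. The function names and voltage range are hypothetical, not taken from the paper.

```python
# Behavioral sketch of the converters around the MAC block.
# Assumptions: 4-bit unsigned inputs, PFM modeled as a pulse count
# in a fixed window, and 15 evenly spaced flash-ADC references.

def pfm_pulses(x: int, bits: int = 4) -> list[int]:
    """Encode input x as x unit pulses in a window of 2**bits - 1 slots."""
    slots = 2**bits - 1
    if not 0 <= x <= slots:
        raise ValueError("input out of range")
    return [1] * x + [0] * (slots - x)


def flash_adc(v_in: float, v_low: float, v_high: float, bits: int = 4) -> int:
    """Compare v_in with 2**bits - 1 references; the count of ones is the code."""
    n_comp = 2**bits - 1
    step = (v_high - v_low) / (n_comp + 1)
    vrefs = [v_low + step * (k + 1) for k in range(n_comp)]
    thermometer = [1 if v_in > vref else 0 for vref in vrefs]
    return sum(thermometer)


print(pfm_pulses(5))
print(flash_adc(0.53, 0.0, 1.0))
```

Because the RBLs discharge, a larger MAC result gives a lower voltage; in this sketch that simply means the MAC code would be read as 15 minus the comparator count.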
M. E. Sinangil et al., JSSC, 2021
[Figure 1] MAC & ADC
The overall operation consists of four steps, as shown in [Figure 2]. Note that in [Figure 1] the input capacitances of the comparators serve as the computation capacitors. For an accurate MAC operation, the total capacitance attached to each of the four RBLs must be identical and must stay constant during Step 2 to ensure a linear discharge.
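Why equal RBL capacitance matters can be seen with an ideal constant-charge model of Step 2: each pulse on a row whose stored weight bit is 1 removes a fixed charge from that RBL, so the RBL voltage drop is (pulse count × unit charge) / C. The sketch below assumes hypothetical values for the precharge voltage, the unit charge, and the capacitances, and ignores saturation of the discharge.

```python
V_PRE = 0.8     # RBL precharge voltage [V] (hypothetical)
Q_UNIT = 1e-15  # charge removed per pulse by one active cell [C] (hypothetical)


def rbl_voltages(inputs, weights, caps):
    """Return RBL[0..3] voltages after Step 2 (ideal, linear discharge).

    inputs  -- pulse count per row (4-bit input value)
    weights -- 4-bit weight per row; bit j is stored on RBL[j]
    caps    -- total capacitance on RBL[0..3] [F]
    """
    volts = []
    for j, c in enumerate(caps):
        # Pulses that hit cells storing weight bit j = 1.
        pulses = sum(x * ((w >> j) & 1) for x, w in zip(inputs, weights))
        volts.append(V_PRE - pulses * Q_UNIT / c)
    return volts


inputs = [3, 5]
weights = [0b1010, 0b0110]
matched = rbl_voltages(inputs, weights, [20e-15] * 4)
mismatched = rbl_voltages(inputs, weights, [20e-15, 20e-15, 20e-15, 22e-15])
print(matched)
print(mismatched)
```

With matched capacitances every RBL drops by the same voltage per pulse, so the four voltages are directly comparable; a 10% larger capacitance on RBL[3] shrinks the MSB's contribution by the same 10%, which later corrupts the binary weighting.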
Before Step 3 (charge sharing), the RBLs and compensation capacitors must be disconnected from the computation capacitors. In the ideal case (e.g., ignoring channel-charge injection), this does not change the voltages at nodes A~D in [Figure 2]. Since the SRAM cells connected to RBL[3] store the MSBs of the 4-bit weights, the voltage at node A results from the MAC discharge between the 4-bit inputs and the weight MSBs. Because the RBLs and compensation capacitors are disconnected, the total charge on the computation capacitors is conserved when nodes A~D are connected. As a result, the voltage after charge sharing is (8V_A + 4V_B + 2V_C + V_D)/15, which weights the per-bit MAC results by 8:4:2:1 and thus implements the multi-bit MAC operation. This voltage is then applied to the input of the flash ADC.
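The charge-sharing formula can be checked numerically. The sketch below assumes ideal switches, computation capacitances of 8Cu, 4Cu, 2Cu, and Cu on nodes A~D, and a hypothetical linear discharge of K volts per pulse per active cell. It confirms that charge conservation turns the four per-bit voltages into V_PRE minus K times the full 4-bit dot product divided by 15.

```python
V_PRE = 0.8  # hypothetical precharge voltage [V]
K = 0.01     # hypothetical discharge per pulse per active cell [V]


def charge_share(v_a, v_b, v_c, v_d, cu=1.0):
    """Connect 8Cu, 4Cu, 2Cu, Cu; total charge is conserved."""
    caps = [8 * cu, 4 * cu, 2 * cu, 1 * cu]
    volts = [v_a, v_b, v_c, v_d]
    total_charge = sum(c * v for c, v in zip(caps, volts))
    return total_charge / sum(caps)


def bit_line_voltage(inputs, weights, bit):
    """Node voltage after discharging by input pulses on cells with weight bit = 1."""
    s = sum(x * ((w >> bit) & 1) for x, w in zip(inputs, weights))
    return V_PRE - K * s


inputs = [3, 7, 15, 1]
weights = [9, 4, 12, 15]
# Node A holds the MSB (RBL[3]) result, node D the LSB (RBL[0]) result.
v_a, v_b, v_c, v_d = (bit_line_voltage(inputs, weights, b) for b in (3, 2, 1, 0))
v_out = charge_share(v_a, v_b, v_c, v_d)
mac = sum(x * w for x, w in zip(inputs, weights))
print(v_out, V_PRE - K * mac / 15)
```

Both printed values agree (up to floating-point rounding), which is the point of the binary-weighted capacitors: no digital shift-and-add is needed after the per-bit discharges.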
Kim, H., Yu, C., Kim, B. (2023). SRAM-Based Processing-in-Memory (PIM). In: Kim, JY., Kim, B., Kim, T.TH. (eds) Processing-in-Memory for AI. Springer, Cham. https://doi.org/10.1007/978-3-030-98781-7_3
[Figure 2] Four steps of the overall operation.
There are some trade-offs in this circuit.
First, a MOSFET switch is not ideal: even when it is turned on, it has a finite on-resistance. Together with the line capacitance, this resistance introduces RC delay while the Read Bit Line (RBL) discharges, and the RC delay degrades the linearity of the discharge. To prevent this, both the resistance and the capacitance must be reduced.
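A first-order way to see why both R and C matter: through a single-pole RC path, the fraction of a voltage step that has not yet arrived after time t is exp(-t / RC). The sketch below is only this textbook approximation, with hypothetical values for the step time and the capacitance behind the switch; a real RBL discharge is not a clean single-pole step.

```python
import math


def settling_error(r_on, c, t):
    """Fraction of a voltage step still missing after time t through a
    single-pole RC path: exp(-t / (R*C))."""
    return math.exp(-t / (r_on * c))


T_STEP = 200e-12  # hypothetical time allotted per discharge step [s]
C_NODE = 10e-15   # hypothetical capacitance behind the switch [F]
for r in (5e3, 2.5e3):
    err = settling_error(r, C_NODE, T_STEP)
    print(f"R_on = {r / 1e3:.1f} kOhm: residual = {err:.2e}")
```

Halving either R or C halves the time constant, so the residual error falls exponentially, not linearly; this is why both knobs are worth pulling even though each carries its own cost.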
Enlarging the switches reduces their on-resistance, but it also increases channel-charge injection, which degrades the accuracy of the MAC operation. Larger injected charge is even more harmful when the comparator input capacitance is small, because the same charge produces a larger voltage error on a smaller capacitor.
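The switch-sizing trade-off can be sketched with the standard square-law expressions: triode on-resistance R_on = 1 / (μCox(W/L)V_ov) and channel charge Q_ch = W·L·Cox·V_ov, of which some fraction (commonly assumed to be half) lands on the load capacitor at turn-off. All process numbers below are hypothetical and only illustrate the scaling.

```python
MU = 0.02    # carrier mobility [m^2/(V*s)] (hypothetical)
COX = 0.015  # gate capacitance per area [F/m^2] (hypothetical)
L = 20e-9    # channel length [m] (hypothetical)
V_OV = 0.3   # switch overdrive V_GS - V_th [V] (hypothetical)


def r_on(w):
    """Triode on-resistance of a MOSFET switch: 1 / (mu*Cox*(W/L)*V_ov)."""
    return 1.0 / (MU * COX * (w / L) * V_OV)


def injection_step(w, c_load, fraction=0.5):
    """Voltage error when `fraction` of the channel charge W*L*Cox*V_ov
    lands on c_load as the switch turns off."""
    q_ch = w * L * COX * V_OV
    return fraction * q_ch / c_load


for w in (100e-9, 200e-9):
    for c in (2e-15, 1e-15):
        print(f"W = {w * 1e9:.0f} nm, C = {c * 1e15:.0f} fF: "
              f"R_on = {r_on(w):.0f} ohm, error = {injection_step(w, c) * 1e3:.2f} mV")
```

In this model the product R_on × error equals fraction · L² / (μ·C), which does not depend on W at all: widening the switch only moves the error from RC delay into charge injection, and only a larger load capacitance (or shorter channel) improves both.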
To reduce the input capacitance, I could use small transistors in the comparators. However, very small transistors trade off against the RBL discharge range, as explained below.
[Figure 3] Trade-Offs
Because the input capacitances of the comparators are used as computation capacitors, they must be equal during the MAC operation. When a MOSFET is used as a capacitor, we usually require operation in the strong-inversion region. However, since the reference voltages must be set within the RBL discharge range, the gate-source voltage, which is the difference between the RBL voltage and the reference voltage, can fall below the threshold voltage in some situations, so the MOSFETs cannot be guaranteed to stay in strong inversion. Instead, the paper operates the input devices in the depletion region to obtain equal input capacitance. As shown in [Figure 4], the comparator's input capacitance is the sum of the gate capacitances (Cgg) of MPL, MPR, MNL, and MNR. In this way, the voltage dependence of the gate capacitance is mitigated, and linearity is improved.
However, to stay in the depletion region, the gate-source voltage must remain below the threshold voltage. The largest gate-source voltage is roughly the difference between VDD and the minimum RBL voltage, so a larger threshold voltage allows a wider RBL discharge range. Since I was restricted to a specific transistor model, the body effect was the only way to raise the threshold voltage. I could exploit the body effect in the PMOS devices but not in the NMOS devices: in standard CMOS fabrication all NMOS transistors share the common p-type substrate, so it is hard to give MNL and MNR a body voltage different from the other NMOS transistors.
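The link between body bias and discharge range can be sketched with the long-channel body-effect equation |Vth| = |Vth0| + γ(√(2φF + V_rev) − √(2φF)), where V_rev is the reverse body bias (for a PMOS, how far the n-well sits above the source). The depletion condition VDD − V_RBL < |Vth| then sets the minimum RBL voltage. The parameters (Vth0 = 0.35 V, γ = 0.4 V^0.5, 2φF = 0.8 V, VDD = 0.8 V) are hypothetical, and an advanced-node device may show a much weaker body effect than this model predicts.

```python
import math


def vth_body(vth0, gamma, phi2f, v_rev):
    """Threshold magnitude with reverse body bias v_rev (long-channel model)."""
    return vth0 + gamma * (math.sqrt(phi2f + v_rev) - math.sqrt(phi2f))


VDD = 0.8  # hypothetical supply [V]
for v_rev in (0.0, 0.2, 0.4):
    vth = vth_body(0.35, 0.4, 0.8, v_rev)
    # PMOS input stays out of inversion while VDD - V_RBL < |Vth|,
    # so the usable RBL discharge range is about |Vth|.
    print(f"V_rev = {v_rev:.1f} V: |Vth| = {vth:.3f} V, "
          f"min RBL = {VDD - vth:.3f} V")
```

Every millivolt added to |Vth| by body bias is a millivolt of extra discharge range, which is why the body effect was worth exploiting on the PMOS side.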
To work around this, I made the PMOS devices much larger than the NMOS devices, which reduces the contribution of MNL and MNR to the total input capacitance. However, larger PMOS devices increase the input capacitance, which causes considerable RC delay and dynamic power consumption. Moreover, high RBL voltages and VREFs are unsuitable for sensing with small NMOS devices.
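The cost of suppressing the NMOS contribution by upsizing can be sketched by assuming equal gate capacitance per unit width for both device types and widening MPL/MPR by a factor k over MNL/MNR. This is a deliberate simplification (real PMOS and NMOS Cgg per width differ, and depletion-region capacitance is below Cox), so only the trend is meaningful.

```python
def nmos_share(k, w_n=1.0):
    """Return (NMOS fraction, total) of comparator input capacitance when
    MPL/MPR are k times wider than MNL/MNR (equal gate cap per width)."""
    c_n = 2 * w_n       # MNL + MNR, in units of gate cap per unit width
    c_p = 2 * k * w_n   # MPL + MPR
    return c_n / (c_n + c_p), c_n + c_p


for k in (1, 4, 9):
    share, total = nmos_share(k)
    print(f"k = {k}: NMOS share = {share:.0%}, total cap = {total:.0f} units")
```

The NMOS share falls only as 1/(1 + k) while the total capacitance grows linearly with k, so pushing the NMOS share from 50% to 10% costs a fivefold increase in input capacitance.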
M. E. Sinangil et al., JSSC, 2021
[Figure 4] Input Capacitance of Comparator
A large comparator input capacitance causes high power consumption and long RC delay. My idea is to remove MNL and MNR from the comparator. Then MPL and MPR no longer need to be upsized, so a small input capacitance is possible.
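A rough comparison of the two comparator options, assuming hypothetical capacitances (20 fF for upsized PMOS plus NMOS, 2 fF for minimum-size PMOS only), a hypothetical 5 kΩ switch, and the first-order dynamic-power formula P = α·C·VDD²·f for the 15 comparator inputs. Treating every input node as fully charged and discharged each cycle overstates the absolute power, so only the ratio is meaningful.

```python
# Hypothetical values for illustration only.
C_UPSIZED = 20e-15   # upsized MPL/MPR plus MNL/MNR [F]
C_PMOS_ONLY = 2e-15  # minimum-size MPL/MPR, no MNL/MNR [F]
R_SWITCH = 5e3       # switch on-resistance [ohm]
VDD = 0.8            # supply [V]
F_CLK = 100e6        # comparison rate [Hz]
N_COMP = 15          # flash ADC comparators


def dynamic_power(c, vdd, f, alpha=1.0):
    """P = alpha * C * VDD^2 * f for a node charged and discharged once per cycle."""
    return alpha * c * vdd**2 * f


for name, c in (("upsized PMOS + NMOS", C_UPSIZED), ("PMOS only", C_PMOS_ONLY)):
    tau = R_SWITCH * c
    p = N_COMP * dynamic_power(c, VDD, F_CLK)
    print(f"{name}: tau = {tau * 1e12:.1f} ps, P = {p * 1e6:.2f} uW")
```

Both the time constant and the switching power scale linearly with C, so in this model removing MNL/MNR and keeping MPL/MPR small improves both by the full capacitance ratio.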
However, a PMOS-only comparator is unsuitable for sensing high RBL voltages and VREFs, so I proposed changing the structure of the 8T SRAM. The reason for using an 8T SRAM rather than a 6T SRAM is its separate write and read ports, which avoid read/write disturbance. However, the disturbance problem only matters during IMC operation, not during normal read operation. So, unlike the 8T SRAM used in the paper, I perform normal reads just as in a 6T SRAM (not through the 8T read port) and change the read port to PMOS. This port is used only for IMC and no longer for reads, so it would be more accurately called a computing port. Because the port is PMOS, Step 1 of [Figure 2] becomes RBL discharging and Step 2 becomes RBL charging. Because the RBL voltages and VREFs are then much lower, a PMOS-only comparator can compare them.
However, PMOS devices are slower than NMOS devices because holes have lower mobility than electrons, so this design runs slower.
Method 1: Bottom-Plate Sampling (Failed)
Channel-charge injection at the comparators caused significant errors during ADC operation. The first method I considered was bottom-plate sampling. Charge injection from the right-side SH switches, which deliver VREF, can easily be handled with transmission gates (TGs) because the VREFs are fixed. For the varying RBL voltages, I expected that turning off the left-side SH switches later than the right-side switches would keep the voltage across Cu constant, independent of the RBL voltage.
However, because the comparators were already disconnected from the rest of the circuit, the injected charge had no path to the RBL side, so this approach did not work and I tried another idea.
M. E. Sinangil et al., JSSC, 2021
[Figure 5] Solution 1: Bottom-Plate Sampling
Method 2: Differential Circuit (Success)
The injected charge depends on the gate-source voltage of the SH switches, whose gate voltages are the control signals and whose source voltages are the RBL voltage and VREF. When the RBL voltage and VREF differ only slightly, the injected charges are almost identical and cancel in the comparison. When the RBL voltage and VREF differ significantly, the injected charges also differ, but because the input difference is then large, the mismatch in charge injection barely affects the comparison.
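This argument can be checked with a linear charge-injection model. The sketch assumes identical NMOS SH switches and identical comparator input capacitances on both sides, each switch dumping half of its channel charge W·L·Cox·(VG − V_in − VTH) onto the held node at turn-off. All values are hypothetical.

```python
K_CH = 3e-17   # W*L*Cox of each SH switch [F] (hypothetical)
FRAC = 0.5     # fraction of channel charge dumped onto the comparator side
VG = 0.8       # SH switch gate voltage while on [V] (hypothetical)
VTH = 0.3      # SH switch threshold [V] (hypothetical)
C_IN = 2e-15   # comparator input capacitance [F] (hypothetical)


def after_turn_off(v_in):
    """NMOS switch: channel electrons injected at turn-off pull the held voltage down."""
    q_ch = K_CH * max(VG - v_in - VTH, 0.0)
    return v_in - FRAC * q_ch / C_IN


def held_difference(v_rbl, v_ref):
    """Differential voltage seen by the comparator after both switches open."""
    return after_turn_off(v_rbl) - after_turn_off(v_ref)


for v_rbl, v_ref in ((0.301, 0.300), (0.100, 0.300), (0.450, 0.300)):
    before = v_rbl - v_ref
    after = held_difference(v_rbl, v_ref)
    print(f"before = {before * 1e3:+.2f} mV, after = {after * 1e3:+.4f} mV")
```

In this model the held difference is just the input difference scaled by (1 + FRAC·K_CH/C_IN), about 1.0075 here: the injection mismatch grows with the input difference but can never flip its sign, which matches the observation that it only slightly affects the comparison. The model relies on the two sides being symmetric.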
The remaining problem is that the RBL side and the VREF side have different output impedances. My solution was simple: add a TG switch on the VREF side to balance the impedances. Ideally, each of the 15 comparators would use a differently sized TG switch that cancels the charge injection for its own VREF; in practice, finding a single optimal TG size for all 15 comparators is the recommended approach.
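One way to frame "a single optimal TG size for 15 comparators" is as a minimax problem. The sketch assumes, hypothetically, that each comparator has an ideal TG width that would cancel its own charge injection, and that the residual error grows linearly with the distance from that width. The ideal widths below are invented for illustration, and the simple sweep is only one possible search method.

```python
def worst_case_residual(w, ideal_widths, sens=1.0):
    """Largest residual error over all comparators for a shared TG width w
    (hypothetical linear error model)."""
    return max(sens * abs(w - wi) for wi in ideal_widths)


def best_shared_width(ideal_widths, steps=1000):
    """Sweep widths between the smallest and largest ideal width and keep
    the one with the lowest worst-case residual."""
    lo, hi = min(ideal_widths), max(ideal_widths)
    candidates = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(candidates, key=lambda w: worst_case_residual(w, ideal_widths))


# Hypothetical ideal TG widths [nm], one per comparator / VREF.
ideal = [120 + 4 * i for i in range(15)]
w_star = best_shared_width(ideal)
print(w_star, worst_case_residual(w_star, ideal))
```

Under this linear model the sweep simply recovers the midpoint of the smallest and largest ideal widths; with a nonlinear error model taken from simulation, the same sweep structure would still apply, but the optimum would no longer have a closed form.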
M. E. Sinangil et al., JSSC, 2021
[Figure 6] Solution 2: Differential Circuit