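When a packet arrives at a network port, it goes through the following seven main steps.
(1) The network card receives the packet.
(2) The network card DMAs the packet over PCIe directly into DMA memory.
(3) Once the copy finishes, the network card raises a hardware interrupt to the CPU.
(4) On receiving the hardware interrupt, the CPU runs the interrupt handler, which raises a software interrupt (softirq).
(5) ksoftirqd wakes up to process the softirq and calls the poll function that the network card driver registered.
(6) That thread then copies the frames from DMA memory into the ring buffer.
(7) The protocol code copies the data into memory that user space can access.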
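The seven steps above can be sketched as a toy simulation. This is only an illustration of the ordering, not kernel code: the class `ToyReceivePath`, its method names, the `budget` parameter (loosely modelled on a per-poll packet limit) and the 14-byte header length are all assumptions made for this example.

```python
from collections import deque


class ToyReceivePath:
    """Toy model of the receive path; every name here is hypothetical."""

    def __init__(self):
        self.dma_memory = deque()    # frames written by the NIC via DMA
        self.ring_buffer = deque()   # frames picked up by the poll function
        self.user_buffer = []        # payloads visible to user space
        self.softirq_pending = False

    def nic_receive(self, frame: bytes) -> None:
        # Steps 1-3: the NIC gets a frame, DMAs it, then raises a hard IRQ.
        self.dma_memory.append(frame)
        self.hard_irq()

    def hard_irq(self) -> None:
        # Step 4: the handler does as little as possible and defers the rest.
        self.softirq_pending = True

    def ksoftirqd(self, budget: int = 64) -> int:
        # Step 5: the softirq thread runs the driver's poll function.
        if not self.softirq_pending:
            return 0
        self.softirq_pending = False
        return self.poll(budget)

    def poll(self, budget: int) -> int:
        # Step 6: move at most `budget` frames from DMA memory to the ring.
        done = 0
        while self.dma_memory and done < budget:
            self.ring_buffer.append(self.dma_memory.popleft())
            done += 1
        if self.dma_memory:
            self.softirq_pending = True  # work left over, poll again later
        return done

    def protocol_deliver(self, header_len: int = 14) -> None:
        # Step 7: strip an assumed 14-byte link header, copy payload upward.
        while self.ring_buffer:
            frame = self.ring_buffer.popleft()
            self.user_buffer.append(frame[header_len:])


path = ToyReceivePath()
path.nic_receive(b"\x00" * 14 + b"hello")
path.ksoftirqd()
path.protocol_deliver()
print(path.user_buffer)  # [b'hello']
```

The point of the split between `hard_irq` and `ksoftirqd` is that the hardware interrupt only sets a flag, while the actual copying happens later in a schedulable thread.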
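Direct memory access (DMA) lets a device read and write memory directly, without the CPU having to copy the data itself; the device only interrupts the CPU once the transfer is done.
Because DMA changes memory without going through the CPU, the CPU caches have to be kept coherent with memory. There are two main kinds of cache coherence protocol: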
(1) Snooping protocol: the snooping protocol only works with a bus-based system. It uses a number of states to determine whether it needs to update cache entries and whether it has control over writing to the block.
(2) Directory-based protocol: a directory holds information about which memory locations are shared in multiple caches and which are used exclusively by one core’s cache.
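To make the difference between the two protocols concrete, here is a minimal sketch. It assumes a simplified three-state (Modified / Shared / Invalid) scheme and ignores write-back data, evictions and timing. The classes `SnoopingBus` and `Directory` and their counters are hypothetical names for this example only, and the directory counter tracks only the probes forwarded to other caches, not the request to the directory itself.

```python
class SnoopingBus:
    """Every miss or upgrade is broadcast on a shared bus that all caches watch."""

    def __init__(self, n_caches: int):
        # One dict per cache: address -> "M" or "S"; a missing key means Invalid.
        self.caches = [dict() for _ in range(n_caches)]
        self.broadcasts = 0

    def read(self, cid: int, addr: int) -> None:
        if addr in self.caches[cid]:
            return                        # hit in M or S, no bus traffic
        self.broadcasts += 1              # read miss seen by every cache
        for other in self.caches:
            if other.get(addr) == "M":
                other[addr] = "S"         # previous owner downgrades
        self.caches[cid][addr] = "S"

    def write(self, cid: int, addr: int) -> None:
        if self.caches[cid].get(addr) == "M":
            return                        # already has write control
        self.broadcasts += 1              # invalidate seen by every cache
        for i, other in enumerate(self.caches):
            if i != cid:
                other.pop(addr, None)
        self.caches[cid][addr] = "M"


class Directory:
    """A directory records who holds each line, so only those caches are probed."""

    def __init__(self):
        self.sharers = {}     # address -> set of cache ids holding the line
        self.exclusive = {}   # address -> cache id with write control, or None
        self.probes = 0       # point-to-point invalidate/downgrade messages

    def read(self, cid: int, addr: int) -> None:
        holders = self.sharers.setdefault(addr, set())
        if cid in holders:
            return                        # hit
        if self.exclusive.get(addr) is not None:
            self.probes += 1              # downgrade only the exclusive owner
            self.exclusive[addr] = None
        holders.add(cid)

    def write(self, cid: int, addr: int) -> None:
        holders = self.sharers.setdefault(addr, set())
        if self.exclusive.get(addr) == cid:
            return                        # already has write control
        self.probes += len(holders - {cid})  # invalidate actual sharers only
        self.sharers[addr] = {cid}
        self.exclusive[addr] = cid


bus, directory = SnoopingBus(4), Directory()
for proto in (bus, directory):
    proto.read(0, 0x40)
    proto.read(1, 0x40)
    proto.write(0, 0x40)
print(bus.broadcasts, directory.probes)  # 3 1
```

In this run the bus carries three broadcasts that all four caches must snoop, while the directory sends a single invalidate to the one cache that actually shares the line. That is the usual argument for directories once there is no single shared bus.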
Intel’s Core 2 Duo tries to speed up cache coherence by being able to query the second core’s L1 cache and the shared L2 cache simultaneously. Having a shared L2 cache also has the added benefit that a coherence protocol does not need to be set for this level. AMD’s Athlon 64 X2, however, has to monitor cache coherence in both L1 and L2 caches. This is made faster by using the HyperTransport connection, but still has more overhead than Intel’s model.
Intel is developing its QuickPath Interconnect (QPI), which is a 20-bit wide bus running at between 4.8 and 6.4 GT/s (gigatransfers per second); AMD’s HyperTransport 3.0 is a 32-bit wide bus and runs at 5.2 GT/s. Using five mesh networks gives the Tile architecture a per-core bandwidth of up to 1.28 Tbps (terabits per second).
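As a rough back-of-the-envelope check, the raw bandwidth of such a link is just its width multiplied by its transfer rate. The sketch below assumes that the rates quoted above are transfers per second on each wire, counts one direction only, and ignores encoding and protocol overhead, so the usable payload bandwidth is lower. The helper `raw_gbit_per_s` is a name made up for this example.

```python
def raw_gbit_per_s(width_bits: int, giga_transfers_per_s: float) -> float:
    """Raw one-direction link bandwidth in Gbit/s: width times transfer rate."""
    return width_bits * giga_transfers_per_s


links = {
    "QPI, low end (20 bit, 4.8)": raw_gbit_per_s(20, 4.8),
    "QPI, high end (20 bit, 6.4)": raw_gbit_per_s(20, 6.4),
    "HyperTransport 3.0 (32 bit, 5.2)": raw_gbit_per_s(32, 5.2),
}

for name, gbit in links.items():
    # Divide by 8 to express the same figure in gigabytes per second.
    print(f"{name}: {gbit:.1f} Gbit/s raw, {gbit / 8:.1f} GB/s")
```

Under these assumptions the wider HyperTransport link comes out ahead in raw terms even though its per-wire rate is lower than the fastest QPI setting.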
With Intel DDIO (Data Direct I/O) on Xeon processors, a device can access the L3 cache directly.
Ref: https://www.sciencedirect.com/topics/engineering/cache-coherence