Lec3 Cache

阅读信息

608 字 3 分钟本页总访问量加载中...

Cache miss的处理

关于cache miss代价的几个概念：

Latency（延迟）：等多久才开始取第一个字
Bandwidth（带宽）：把整个block取完的时间
Miss penalty：从内存中取数据的时间，等于latency+bandwidth

Cache miss的分类：

Compulsory miss：第一次访问时，cache中没有对应数据
Capacity miss：cache太小导致要用的数据被替换
Conflict miss：不同地址映射到同一位置，tag不匹配导致的miss

Cache设计

映射策略

地址划分：

\[ \text{address} = \text{tag\, |\, index\, |\, offset} \]

Cache line结构：

\[ \text{cache line} = \text{valid bit\, |\, dirty bit\, |\, tag\, |\, data block} \]

映射策略：direct mapping, set-associative, fully associative。（略）

替换策略

FIFO, LRU, random, OPT。循环访问的情况下，FIFO和LRU可能全miss。（略）

Q：堆栈式替换算法是什么？

用 \(B_t(n)\) 表示时间t、每个set中n个block时，cache中存储的block集合。如果满足 \(B_t(n)\in B_{t+1}(n)\)，若增大cache则一定包含原有的块，则称为堆栈式替换算法（stack replacement algorithm）。

E.g.，LRU为堆栈式替换算法。可按“最近使用程度”排成栈，栈顶表示最近访问。

堆栈式算法LRU示例

访问序列：2 3 2 1 5 2 4 5 3 2

横轴表示访问地址，纵轴表示堆栈，栈顶为最近访问，*表示当前地址hit所需的栈的大小。

	2	3	2	1	5	2	4	5	3	2
St(1)	2	3	2	1	5	2	4	5	3	2
St(2)		2	3*	2	1	5	2	4	5	3
St(3)				3	2	1*	5	2*	4	5
St(4)					3	3	1	1	2	4*
St(5)							3	3	1*	1

Block数与命中率的关系：

若使用堆栈式替换算法，block数增加后，hit rate上升或不变。
Belady现象：若使用非堆栈式替换算法（如FIFO），block数增加后，hit rate可能反而下降。本质为小cache中保留的块，大cache中不一定仍保留。

Q：怎么硬件实现LRU？

比较对方法（comparison pair method）：对每两个block维护一个单比特触发器，记录它们之间谁最近被访问。若共 \(n\) 个block，则需 \(n(n+1)/2\) 个触发器。访问某个block后，更新所有与之相关的 \(n-1\) 个触发器。替换时，对每个block判断是否满足LRU。

读写策略

Write hit时的策略：

Write through：同时写cache和memory，两者始终一致。（可优化添加write buffer，防止CPU卡住。）
Write back：只写cache，并维护dirty bit，等写的block被替换时在写回memory。

Write miss时的策略：

No write allocate：和write through对应，直接写memory，不写cache。
Write allocate：和write back对应，先将block从memory中读进cache，再只写cache。

策略组合：

Write through + no write allocate（少见）：cache中有则统一写，直接读；cache中没有则只写memory，等read miss时将已写的block加载到cache。
Write back + write allocate（常见）：cache中有则只写cache，直接读；cache中没有则将旧块写回，加载新块后写cache。

两种策略读写

操作序列：

Text Only
1 2 3 4 5	`write Mem[100] write Mem[100] read Mem[200] write Mem[200] write Mem[100]`

write allocate + write back:

	cache	memory
miss	y(d)	n
hit	y(d)	n
miss	write back
hit	y(d)	n
miss	y(d)	n

write allocate + write through:

	cache	memory
miss	n	y
miss	n	y
miss
hit	y	y
miss	n	y

访存时间计算

AMAT（average memory access time）指平均访存时间。

\[ \text{AMAT} = \text{hit time} + \text{miss rate} \times \text{miss penalty} \]

当有多级cache（L1、L2）时，L1的miss penalty嵌套L2的访问，公式为：

\[ \text{AMAT} = \text{hit time}_{L_1} + \text{miss rate}_{L_1} \times \text{miss penalty}_{L_1} \]

\[ \text{miss penalty}_{L_1} = \text{hit time}_{L_2} + \text{miss rate}_{L_2} \times \text{miss penalty}_{L_2} \]

在计算CPU时间时，额外的stall会影响CPI：

\[ \begin{align*} \text{CPI} &= \text{CPI}_{ideal} + \text{memory stall} \\ &= \text{CPI}_{ideal} + \text{miss rate} \times \text{miss penalty} \times \text{memory access rate} \end{align*} \]

因此，优化cache有三种方向：

降低miss rate：增大cache，提高associativity，优化替换策略
降低miss penalty：多级cache，指令预取，关键词优先
降低hit time：简单cache，直接映射（存在trade-off）