Memory ordering
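Memory ordering is the order in which a CPU accesses main memory. The order can be determined by the compiler at compile time or by the CPU at run time. It reflects the reordering of memory operations and out-of-order execution, both of which exist to make full use of the bus bandwidth of the different kinds of memory.
Most modern processors execute instructions out of order, so memory barriers are needed to keep multiple threads synchronized.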
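Compile-time memory ordering
Compile-time memory barriers
These memory barriers stop the compiler from reordering instructions at compile time, but they have no effect at run time.
- GNU inline assembly statement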
asm volatile("" ::: "memory");
or
__asm__ __volatile__ ("" ::: "memory");
- C11/C++11
atomic_signal_fence(memory_order_acq_rel);
prevents the compiler from reordering read and write instructions across it.[2]
- The Intel C++ Compiler uses a "full compiler fence"
__memory_barrier()
- Microsoft Visual C++
_ReadWriteBarrier()
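As a minimal sketch of where a compile-time barrier is enough, the example below orders accesses between a thread and a signal handler that runs on that same thread. No other CPU is involved, so only the compiler has to be restrained, which is what atomic_signal_fence does. The names payload, ready, seen, on_signal and publish_and_signal are illustrative and not part of the original text. With GCC or Clang, the asm volatile("" ::: "memory") statement listed above could stand in for the two fences, at the cost of portability.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// All names in this sketch are illustrative assumptions.
std::atomic<int>  payload{0};
std::atomic<bool> ready{false};
std::atomic<int>  seen{-1};

// The handler runs on the same thread that raised the signal.
extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed)) {
        // The compiler may not move the payload read above this fence.
        std::atomic_signal_fence(std::memory_order_acquire);
        seen.store(payload.load(std::memory_order_relaxed),
                   std::memory_order_relaxed);
    }
}

int publish_and_signal() {
    // Atomics used inside a signal handler must be lock-free.
    assert(payload.is_lock_free() && ready.is_lock_free());
    std::signal(SIGINT, on_signal);

    payload.store(42, std::memory_order_relaxed);
    // The compiler may not move the payload write below this fence.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);

    std::raise(SIGINT);  // the handler has finished when raise() returns
    return seen.load(std::memory_order_relaxed);
}
```

Calling publish_and_signal() returns 42. The fences emit no CPU instruction, so the same pattern would not be sufficient between two threads running on different cores.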
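Run-time memory ordering
- happens-before: within a thread, operations take effect in the order in which the program's source code lists them.
- synchronized-with: between different threads, operations on the same atomic variable need a synchronization relationship in which the store() comes before the load(). In other words, for an atomic variable x, writing x in one thread and then reading that value of x in another thread is a synchronizing operation.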
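The two relations above can be sketched with a release store and an acquire load on one atomic flag. This is an illustrative example, not code from the original text; run_message_passing, payload, ready and observed are assumed names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative helper: returns the value the consumer thread observed.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // atomic flag carrying the synchronization
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) ordered before (2) in this thread
        ready.store(true, std::memory_order_release);  // (2) release store
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) // (3) acquire load that reads (2)
            std::this_thread::yield();
        observed = payload;                            // (4) happens after (1)
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```

Step (2) synchronizes with step (3) once the load reads true, so step (1) happens before step (4) and the read of the non-atomic payload is not a data race.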
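Symmetric multiprocessing (SMP) systems
SMP systems support several memory consistency models.
- Sequential consistency: atomic operations within one thread keep their happens-before (program) order, while operations from different threads may interleave arbitrarily; all threads observe the same interleaving.
- Relaxed consistency (some kinds of reordering are allowed): if an operation only needs to be atomic and needs no other synchronization guarantee, relaxed ordering can be used. A counter maintained by a program is a typical use case.
- Weak consistency: reads and writes may be reordered arbitrarily, limited only by explicit memory barriers.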
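The counter use case mentioned for relaxed ordering might look like the sketch below. It is an illustration under assumed names (count_events, counter, workers), not code from the original text.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative helper: several threads bump one shared counter.
long count_events(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Only atomicity is needed: no other data is published
                // through the counter, so relaxed ordering is enough.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() orders every increment before the final read below.
    for (auto& w : workers) w.join();
    return counter.load(std::memory_order_relaxed);
}
```

No increment is lost because fetch_add is atomic under every memory order. What relaxed ordering gives up is any guarantee about how the increments are ordered relative to accesses to other variables.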
Type | Alpha | ARMv7 | MIPS | LoongISA | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA-64 | z/Architecture |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Loads reordered after loads | Y | Y | * | Y | Y | Y | Y | | | | Y | | Y | |
Loads reordered after stores | Y | Y | * | Y | Y | Y | Y | | | | Y | | Y | |
Stores reordered after stores | Y | Y | * | Y | Y | Y | Y | Y | | | Y | | Y | |
Stores reordered after loads | Y | Y | * | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
Atomic reordered with loads | Y | Y | * | | | Y | Y | | | | | | Y | |
Atomic reordered with stores | Y | Y | * | | | Y | Y | Y | | | | | Y | |
Dependent loads reordered | Y | | * | | | | | | | | | | | |
Incoherent instruction cache pipeline | Y | Y | * | | | Y | Y | Y | Y | Y | Y | | Y | |
* MIPS: not specified by the architecture itself; determined by the microarchitecture/chip implementation.
Some older x86 systems have weaker memory ordering.[8]
SPARC memory ordering modes:
- SPARC TSO = total store order (default)
- SPARC RMO = relaxed-memory order (not supported on recent CPUs)
- SPARC PSO = partial store order (not supported on recent CPUs)
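The "Stores reordered after loads" row, the one reordering that even x86 and AMD64 allow, can be illustrated with the classic store-buffering test. The sketch below is an illustration under assumed names (both_loads_saw_zero, r1, r2). It uses sequentially consistent operations, which forbid the outcome in which both threads read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative litmus test: returns true if any round ended with
// r1 == 0 && r2 == 0.
bool both_loads_saw_zero(int rounds) {
    assert(rounds > 0);
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);   // store ...
            r1 = y.load(std::memory_order_seq_cst);  // ... then load
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        a.join();
        b.join();

        if (r1 == 0 && r2 == 0) return true;  // would mean a store passed a load
    }
    return false;
}
```

With memory_order_seq_cst the function always returns false, because at least one of the two stores must precede the other thread's load in the single total order. If all four operations used memory_order_relaxed instead, a true result would be permitted, and it can occur in practice on hardware that lets a store be reordered after a later load.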
Hardware memory barriers
- x86, x86-64: lfence (asm), void _mm_lfence(void); sfence (asm), void _mm_sfence(void);[9] mfence (asm), void _mm_mfence(void)[10]
- PowerPC: sync (asm)
- MIPS: sync (asm)
- Itanium: mf (asm)
- POWER: dcs (asm)
- ARMv7: dmb (asm), dsb (asm), isb (asm)
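A rough sketch of how these instructions might be reached from C++ follows. The wrapper full_hardware_barrier and the helper store_then_load are hypothetical names, the preprocessor checks cover only a few common toolchains, and the inline-assembly branches assume GCC-compatible syntax. The C++ memory model says nothing about raw instructions, so outside the fallback branch the ordering relies on the hardware and on the compiler treating the intrinsic or asm statement as a memory clobber.

```cpp
#include <atomic>
#include <cassert>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_mfence (SSE2)
#endif

// Hypothetical wrapper: issues one full hardware barrier.
inline void full_hardware_barrier() {
#if defined(__x86_64__) || defined(_M_X64)
    _mm_mfence();                                   // x86-64: mfence
#elif defined(__GNUC__) && \
      (defined(__aarch64__) || (defined(__ARM_ARCH) && __ARM_ARCH >= 7))
    __asm__ __volatile__("dmb ish" ::: "memory");   // ARM: data memory barrier
#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__powerpc64__))
    __asm__ __volatile__("sync" ::: "memory");      // PowerPC: sync
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Store to one location, then load another, with a full barrier between
// them so the store cannot be reordered after the load.
int store_then_load(std::atomic<int>& mine, const std::atomic<int>& theirs) {
    assert(mine.is_lock_free() && theirs.is_lock_free());
    mine.store(1, std::memory_order_relaxed);
    full_hardware_barrier();
    return theirs.load(std::memory_order_relaxed);
}
```

In portable code the fallback branch alone is usually the better choice, since the compiler then picks the appropriate instruction for the target.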
Compiler support for hardware memory barriers
- GCC,[12] version 4.4.0 and later,[13] has __sync_synchronize.
- C11/C++11 add the atomic_thread_fence() function.
- Microsoft Visual C++[14] has MemoryBarrier().
- Sun Studio Compiler Suite[15] has __machine_r_barrier, __machine_w_barrier and __machine_rw_barrier.
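Of the options above, atomic_thread_fence() is the portable one. The sketch below shows one way it might be used: a single release fence publishes several plain fields at once, and a matching acquire fence on the reading side makes them visible. Reading, sum_published_reading, slot and published are illustrative names, not part of the original text.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Reading {  // hypothetical payload type
    int id;
    int value;
};

// Illustrative helper: returns id + value as seen by the reader thread.
int sum_published_reading() {
    Reading slot{0, 0};
    std::atomic<bool> published{false};
    int sum = -1;

    std::thread writer([&] {
        slot.id = 7;
        slot.value = 35;
        // Everything written above becomes visible to a thread that sees
        // the flag below and then executes an acquire fence.
        std::atomic_thread_fence(std::memory_order_release);
        published.store(true, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        while (!published.load(std::memory_order_relaxed))
            std::this_thread::yield();
        std::atomic_thread_fence(std::memory_order_acquire);
        sum = slot.id + slot.value;
    });

    writer.join();
    reader.join();
    assert(sum == 42);
    return sum;
}
```

Unlike a release store, the fence is not tied to one atomic variable, which is why the flag itself can use relaxed ordering here.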
References
- [1] GCC compiler-gcc.h. Retrieved 2018-12-06. Archived from the original on 2011-07-24.
- [2] Archived copy. Retrieved 2018-12-06. Archived from the original on 2020-08-10.
- [3] ECC compiler-intel.h. Retrieved 2018-12-06. Archived from the original on 2011-07-24.
- [4] Intel(R) C++ Compiler Intrinsics Reference (archived at the Internet Archive): "Creates a barrier across which the compiler will not schedule any data access instruction. The compiler may allocate local data in registers across a memory barrier, but not global data."
- [5] Visual C++ Language Reference, _ReadWriteBarrier (archived at the Internet Archive).
- [6] Memory Ordering in Modern Microprocessors by Paul McKenney (PDF). Retrieved 2018-12-06. Archived (PDF) from the original on 2020-10-31.
- [7] Memory Barriers: a Hardware View for Software Hackers (archived at the Internet Archive), Figure 5 on page 16.
- [8] Table 1, Summary of Memory Ordering (archived at the Internet Archive), from "Memory Ordering in Modern Microprocessors, Part I".
- [9] SFENCE - Store Fence. Retrieved 2018-12-06. Archived from the original on 2019-06-13.
- [10] MFENCE - Memory Fence. Retrieved 2018-12-06. Archived from the original on 2019-09-05.
- [11] Data Memory Barrier, Data Synchronization Barrier, and Instruction Synchronization Barrier. Retrieved 2020-12-20. Archived from the original on 2020-06-19.
- [12] Atomic Builtins. Retrieved 2018-12-06. Archived from the original on 2017-11-08.
- [13] Archived copy. Retrieved 2018-12-06. Archived from the original on 2020-10-31.
- [14] MemoryBarrier macro. Retrieved 2018-12-06. Archived from the original on 2017-04-04.
- [15] Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fence (archived at the Internet Archive).
Further reading
- Computer Architecture: A Quantitative Approach. 4th edition. J. Hennessy, D. Patterson, 2007. Chapter 4.6
- Sarita V. Adve, Kourosh Gharachorloo, Shared Memory Consistency Models: A Tutorial (archived at the Internet Archive)
- Intel 64 Architecture Memory Ordering White Paper (archived at the Internet Archive)
- Memory ordering in Modern Microprocessors part 1 (archived at the Internet Archive)
- Memory ordering in Modern Microprocessors part 2 (archived at the Internet Archive)
- IA (Intel Architecture) Memory Ordering - Google Tech Talk, on YouTube