CPU Study Notes (Cache Coherence)
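In "x86 Assembly Language Study Notes (1)", an article I wrote in 2004, I touched on the fact that code compiled by gcc keeps the stack 16-byte aligned by default. The reason is mainly performance optimization.

Most modern CPUs have their L1 and L2 caches on-die. In many designs the L1 cache is write-through, while the L2 cache is write-back: a write to L2 is not immediately propagated to memory, so the contents of the cache and of memory can diverge. In addition, in an MP (multiprocessor) environment each CPU has its own private cache, so the caches of different CPUs can also hold inconsistent copies of the same data. For this reason many MP architectures, whether ccNUMA or SMP, implement a cache coherence mechanism, that is, a mechanism that keeps the caches of different CPUs consistent.

One way to implement cache coherence is a cache-snooping protocol, in which every CPU snoops the bus to monitor the cache reads and writes of the other CPUs. First, note that a cache line is the smallest unit of data transferred between the cache and memory.

1. When CPU1 writes to its cache, every other CPU checks the corresponding cache line in its own cache. If its copy is dirty, it writes the line back to memory so that CPU1's copy of the line can be refreshed, and then invalidates its own copy; if its copy is not dirty, it simply invalidates it.
2. When CPU1 reads the line, every other CPU writes the dirty parts of its corresponding cache line back to memory, and CPU1's copy of the line is refreshed from memory.

So raising the CPU cache hit rate and reducing data transfers between the caches and memory improves system performance.

That is why keeping memory allocations in programs and binary objects cache-line aligned matters so much. Without cache-line alignment, it becomes quite likely that processes or threads running in parallel on different CPUs read and write the same cache line at the same time. The caches and memory then go through write-backs and refreshes over and over again; this situation is called cache thrashing (when the threads actually touch different variables that merely share a line, it is also widely known as false sharing).

There are usually two ways to avoid cache thrashing effectively:

1. For heap allocations, many systems enforce alignment in their malloc implementation.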
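2. For stack allocations, many compilers provide an option to keep the stack aligned.

Of course, if stack alignment is requested from the compiler, the program becomes larger and uses more memory, so this trade-off needs careful consideration. Below is a discussion I found through Google:

one of our customers complained about the additional code generated to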
maintain the stack aligned to 16-byte boundaries, and suggested us to
default to the minimum alignment when optimizing for code size. this
has the caveat that, when you link code optimized for size with code
optimized for speed, if a function optimized for size calls a
performance-critical function with the stack misaligned, the
performance-critical function may perform poorly.
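To make the two snooping rules from the cache-coherence section concrete, here is a toy model in C of a single cache line shared by several CPUs. It is a minimal sketch under simplifying assumptions: only one line is modeled, the three states (INVALID, CLEAN, DIRTY) and the names reset, cpu_read, cpu_write and run_increments are hypothetical, and a read that hits in the CPU's own cache is assumed to need no bus traffic. Real hardware uses richer protocols such as MESI.

```c
#include <assert.h>

/* Toy model of ONE cache line shared by NCPU CPUs, following the two
 * snooping rules in the text. Hypothetical names and states; this is
 * not MESI or any real protocol. */
#define NCPU 4

enum line_state { INVALID, CLEAN, DIRTY };

struct cache_line {
    enum line_state st;
    int data;
};

static struct cache_line cache[NCPU]; /* each CPU's private copy */
static int memory;                    /* the line's contents in RAM */
static int writebacks;                /* dirty copies written to memory */
static int refills;                   /* copies (re)loaded from memory */

void reset(void)
{
    for (int i = 0; i < NCPU; i++) {
        cache[i].st = INVALID;
        cache[i].data = 0;
    }
    memory = writebacks = refills = 0;
}

/* The other CPUs snoop a bus request from `cpu`: a dirty copy is written
 * back; for a write request every other copy is then invalidated. */
static void snoop(int cpu, int is_write)
{
    for (int i = 0; i < NCPU; i++) {
        if (i == cpu || cache[i].st == INVALID)
            continue;
        if (cache[i].st == DIRTY) {
            memory = cache[i].data;
            writebacks++;
            cache[i].st = CLEAN;
        }
        if (is_write)
            cache[i].st = INVALID;
    }
}

int cpu_read(int cpu)
{
    if (cache[cpu].st == INVALID) {      /* miss: snoop, then refill */
        snoop(cpu, 0);
        cache[cpu].data = memory;
        cache[cpu].st = CLEAN;
        refills++;
    }
    return cache[cpu].data;              /* a hit needs no bus traffic */
}

void cpu_write(int cpu, int value)
{
    if (cache[cpu].st != DIRTY) {        /* others may hold copies */
        snoop(cpu, 1);
        if (cache[cpu].st == INVALID) {  /* write-allocate: fetch line */
            cache[cpu].data = memory;
            refills++;
        }
    }
    cache[cpu].data = value;
    cache[cpu].st = DIRTY;               /* write-back: memory is stale */
}

/* `iters` read-modify-write increments, taking turns among `users` CPUs. */
void run_increments(int users, int iters)
{
    assert(users >= 1 && users <= NCPU);
    reset();
    for (int i = 0; i < iters; i++) {
        int cpu = i % users;
        cpu_write(cpu, cpu_read(cpu) + 1);
    }
}
```

With a single CPU, run_increments(1, 4) costs one refill and no write-backs. With two CPUs taking turns on the same line, run_increments(2, 4) already needs 3 write-backs and 4 refills, because the line migrates between the two caches on every step: the ping-pong described above as cache thrashing.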
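For the heap item in the list of remedies: standard malloc only promises alignment suitable for any fundamental type (alignof(max_align_t), commonly 8 or 16 bytes), which is usually smaller than a cache line. The sketch below assumes a 64-byte cache line and uses C11's aligned_alloc (posix_memalign is the POSIX alternative) to give each thread's counter a line of its own; struct padded_counter, same_line and alloc_counters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, typical of current x86 CPUs. */
#define CACHE_LINE 64

/* Hypothetical per-thread counter: alignas pads each element to a full
 * line, so counters updated by different threads never share one. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

/* True if the two addresses fall in the same cache line. */
bool same_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}

/* Allocate n zeroed counters starting on a cache-line boundary.
 * C11 aligned_alloc expects size to be a multiple of the alignment,
 * which sizeof(struct padded_counter) guarantees. */
struct padded_counter *alloc_counters(size_t n)
{
    struct padded_counter *c = aligned_alloc(CACHE_LINE, n * sizeof *c);
    if (c == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        c[i].value = 0;
    assert((uintptr_t)c % CACHE_LINE == 0);
    return c;
}
```

Because each counter occupies a whole line, two threads incrementing c[0] and c[1] no longer trigger the write-back/refresh cycle on a shared line. The price is memory: 64 bytes per counter instead of sizeof(long).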
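For the stack item and the quoted trade-off: GCC's x86 option -mpreferred-stack-boundary=num sets the default stack boundary to 2^num bytes (the default of 4 gives the 16-byte alignment mentioned at the start). When only a few locals need more, a per-variable alignas is another route. The sketch below assumes a 64-byte cache line; aligned_scratch_offset is a hypothetical helper. Because the requested alignment exceeds the ABI's default stack alignment (16 bytes on x86-64 System V), the compiler realigns this one function's frame, adding a few prologue/epilogue instructions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64

/* Hypothetical helper: a scratch buffer on the stack that gets its own
 * cache line. The compiler realigns the frame to honor alignas.
 * Returns the buffer's offset within its cache line (always 0). */
size_t aligned_scratch_offset(void)
{
    alignas(CACHE_LINE) unsigned char scratch[CACHE_LINE];
    memset(scratch, 0, sizeof scratch);      /* use the buffer */
    size_t offset = (size_t)((uintptr_t)scratch % CACHE_LINE);
    assert(offset == 0);                     /* guaranteed by alignas */
    return offset;
}
```

Raising the boundary globally with the compiler option affects every function's frame layout, which is the program-wide trade-off the quoted discussion is about; per-variable alignas confines the realignment code to the functions that actually need it.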