God's-Eye View: Load Balancing on Multicore Systems

Date: 2021-10-14 16:55:09

Keywords: load balancing



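To reduce interference between CPUs, each CPU has its own task queue. At run time, some CPUs can end up very busy while others sit nearly idle, as the figure below shows:

To avoid this situation, the Linux kernel implements load balancing across the per-CPU queues of runnable processes.

Because load balancing happens across multiple cores, we first look at the multicore architecture before going into load balancing itself.

Load balancing is the process of moving tasks from a heavily loaded CPU to a relatively lightly loaded CPU for execution.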
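
The idea of migrating tasks from a busy CPU to an idle one can be sketched in user space. The following is only a toy model, not the kernel's algorithm: each run queue is reduced to a count of runnable tasks, and the names `toy_rq`, `toy_balance_once` and `toy_balance` are hypothetical. The real scheduler weighs task load and respects the CPU hierarchy described later in this article.

```c
#include <stddef.h>

/* Toy model: a CPU's run queue is reduced to a count of runnable tasks. */
struct toy_rq {
    int nr_running;
};

/*
 * Move one task from the busiest queue to the least busy one.
 * Returns 1 if a task was migrated, or 0 if the queues already
 * differ by at most one task.
 */
int toy_balance_once(struct toy_rq *rqs, size_t ncpus)
{
    size_t busiest = 0, idlest = 0;

    if (ncpus < 2)
        return 0;

    for (size_t i = 1; i < ncpus; i++) {
        if (rqs[i].nr_running > rqs[busiest].nr_running)
            busiest = i;
        if (rqs[i].nr_running < rqs[idlest].nr_running)
            idlest = i;
    }

    if (rqs[busiest].nr_running - rqs[idlest].nr_running <= 1)
        return 0;

    rqs[busiest].nr_running--;
    rqs[idlest].nr_running++;
    return 1;
}

/* Repeat single migrations until balanced; returns the migration count. */
int toy_balance(struct toy_rq *rqs, size_t ncpus)
{
    int moved = 0;

    while (toy_balance_once(rqs, ncpus))
        moved++;
    return moved;
}
```

Starting from queue lengths of 6, 0, 1 and 1 on four CPUs, `toy_balance()` performs four migrations and leaves two tasks on every CPU.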

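Multicore Architecture

Taking the Arm64 NUMA (Non-Uniform Memory Access) architecture as an example, let's look at how a multicore system is organized.

As the figure shows, memory access is non-uniform. Each CPU accesses its local memory faster and with lower latency. Because of the Interconnect module, all memory forms a single pool, so a CPU can also access remote memory, but more slowly and with higher latency than local memory.

A multicore SoC (system on chip) has a complex internal structure. The kernel uses a CPU topology to describe the architecture of an SoC, and uses scheduling domains and scheduling groups to describe the hierarchical relationships between CPUs.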
CPU Topology
Each CPU maintains one instance of the following structure, which describes the CPU topology.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
};
thread_id: obtained from the mpidr_el1 register.
core_id: obtained from the mpidr_el1 register.
cluster_id: obtained from the mpidr_el1 register.
thread_sibling: the sibling threads of the current CPU.
core_sibling: the sibling cores of the current CPU, that is, the CPUs in the same cluster.
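
To make the two sibling masks concrete, here is a minimal user-space sketch of how they could be derived from the three IDs. It is an illustration under assumptions, not the kernel's code: `toy_cpu_topology` and `toy_update_siblings` are hypothetical names, and a `uint64_t` bitmap stands in for `cpumask_t`, so at most 64 CPUs are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for struct cpu_topology; uint64_t replaces cpumask_t. */
struct toy_cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    uint64_t thread_sibling;   /* bit N set: CPU N shares this core    */
    uint64_t core_sibling;     /* bit N set: CPU N shares this cluster */
};

/*
 * Recompute both sibling masks for every CPU (up to 64 CPUs):
 *  - same cluster_id             -> core sibling
 *  - same cluster_id and core_id -> thread sibling as well
 * A CPU is always its own sibling.
 */
void toy_update_siblings(struct toy_cpu_topology *topo, size_t ncpus)
{
    for (size_t a = 0; a < ncpus && a < 64; a++) {
        topo[a].thread_sibling = 0;
        topo[a].core_sibling = 0;

        for (size_t b = 0; b < ncpus && b < 64; b++) {
            if (topo[a].cluster_id != topo[b].cluster_id)
                continue;
            topo[a].core_sibling |= UINT64_C(1) << b;

            if (topo[a].core_id == topo[b].core_id)
                topo[a].thread_sibling |= UINT64_C(1) << b;
        }
    }
}
```

For a six-CPU system with CPUs 0-3 in cluster 0 and CPUs 4-5 in cluster 1, and no multithreading, CPU 0 ends up with `core_sibling == 0x0f` and `thread_sibling == 0x01`, while CPU 4 gets `core_sibling == 0x30`.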
The CPU topology information can be inspected under /sys/devices/system/cpu/cpuX/topology.
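
As a rough sketch of reading that directory from a program, the helpers below build the path to one topology attribute and read its first line. The function names `topology_path` and `read_topology_attr` are hypothetical. The attribute files assumed here (for example `core_id` or `core_siblings_list`) are standard sysfs entries, but which ones are present depends on the kernel version and platform.

```c
#include <stdio.h>
#include <string.h>

#define SYSFS_CPU_BASE "/sys/devices/system/cpu"

/*
 * Build "<base>/cpu<N>/topology/<attr>" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int topology_path(char *buf, size_t len, const char *base,
                  int cpu, const char *attr)
{
    int n = snprintf(buf, len, "%s/cpu%d/topology/%s", base, cpu, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read the first line of one topology attribute into out, with the
 * trailing newline stripped. Returns 0 on success, -1 on any error
 * (for example when the attribute does not exist on this system).
 */
int read_topology_attr(const char *base, int cpu, const char *attr,
                       char *out, size_t len)
{
    char path[256];
    FILE *f;

    if (len == 0 || topology_path(path, sizeof(path), base, cpu, attr) != 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

A call such as `read_topology_attr(SYSFS_CPU_BASE, 0, "core_siblings_list", buf, sizeof(buf))` would then fill `buf` with the list of CPUs that share a cluster with CPU 0, provided the attribute exists.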
The cpu_topology structures are populated by parse_dt_topology(), which parses the information in the DTS. The call chain is:
kernel_init() -> kernel_init_freeable() -> smp_prepare_cpus() -> init_cpu_topology() -> parse_dt_topology()
static int __init parse_dt_topology(void)
{
    struct device_node *cn, *map;
    int ret = 0;
    int cpu;

    cn = of_find_node_by_path("/cpus");          /* (1) */
    if (!cn) {
        pr_err("No CPU information found in DT\n");
        return 0;
    }

    /*
     * When topology is provided cpu-map is essentially a root
     * cluster with restricted subnodes.
     */
    map = of_get_child_by_name(cn, "cpu-map");   /* (2) */
    if (!map)
        goto out;

    ret = parse_cluster(map, 0);                 /* (3) */
    if (ret != 0)
        goto out_map;

    topology_normalize_cpu_scale();

    /*
     * Check that all cores are in the topology; the SMP code will
     * only mark cores described in the DT as possible.
     */
    for_each_possible_cpu(cpu)
        if (cpu_topology[cpu].cluster_id == -1)
            ret = -EINVAL;

out_map:
    of_node_put(map);
out:
    of_node_put(cn);
    return ret;
}
(1) Find "/cpus", the root node of the CPU topology in the DTS.
(2) Find the "cpu-map" node.
(3) Parse the clusters in "cpu-map".
Take the i.MX8QM as an example. Its topology is "4A53 2A72", and the DTS defines it as follows:
# imx8qm.dtsi
cpus: cpus {
        #address-cells = <2>;
        #size-cells = <0>;

        A53_0: cpu@0 {
                device_type = "cpu";
                compatible = "arm,cortex-a53", "arm,armv8";
                reg = <0x0 0x0>;
                clocks = <