
Thursday, May 03, 2007

Starting from setup_arch!

This article is purely my own analysis, based on the 2.6.20.6 kernel I ported.
setup_arch is reached via the following call path:
start_kernel() [init/main.c] -> setup_arch() [arch/arm/kernel/setup.c]
The body of the function makes the following calls:
1) First, point tags at the default init_tags:
/*
* This holds our defaults.
*/
static struct init_tags {
struct tag_header hdr1;
struct tag_core core;
struct tag_header hdr2;
struct tag_mem32 mem;
struct tag_header hdr3;
} init_tags __initdata = {
{ tag_size(tag_core), ATAG_CORE },
{ 1, PAGE_SIZE, 0xff },
{ tag_size(tag_mem32), ATAG_MEM },
{ MEM_SIZE, PHYS_OFFSET },
{ 0, ATAG_NONE }
};
This tag list contains only CORE and MEM32 entries and is normally replaced later. As long as boot_params
in struct machine_desc is non-zero, the default tag list is replaced by the following statement:
if (mdesc->boot_params)
tags = phys_to_virt(mdesc->boot_params);
boot_params is the physical memory address where the kernel parameters are stored; it must be converted
with phys_to_virt before the kernel can access it. So when we write a bootloader, we have to place the
tag list at the physical memory address that boot_params points to.
However, if the bootloader put an old-style parameter struct at boot_params, the kernel first tries to
convert the old-style list into the tag list used today. The conversion may fail, in which case the
default list is used again. The first entry of a tag list must be ATAG_CORE, so that is used to check
whether the format is valid.
/*
* If we have the old style parameters, convert them to
* a tag list.
*/
if (tags->hdr.tag != ATAG_CORE)
convert_to_tag_list(tags);
if (tags->hdr.tag != ATAG_CORE)
tags = (struct tag *)&init_tags;
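The walk over an ATAG list that the validity check and parse_tags rely on can be sketched in user space. This is a minimal sketch with simplified structures: `count_tags` and the sample list below are hypothetical; only the tag constants and the advance-by-`size`-words rule mirror the kernel's layout.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified ATAG constants; the real definitions live in
 * include/asm-arm/setup.h. */
#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u
#define ATAG_NONE 0x00000000u

struct tag_header {
    unsigned int size;  /* length of header + body, in 32-bit words */
    unsigned int tag;
};

/* Count the tags in a list, stopping at ATAG_NONE; returns -1 if the
 * list does not start with ATAG_CORE (the check setup_arch uses). */
static int count_tags(const struct tag_header *t)
{
    int n = 0;
    if (t->tag != ATAG_CORE)
        return -1;
    while (t->tag != ATAG_NONE) {
        n++;
        /* advance by t->size 32-bit words to the next tag header */
        t = (const struct tag_header *)((const unsigned int *)t + t->size);
    }
    return n;
}
```

The `size` field counting words (not bytes) is what makes `tag_size(tag_core)` in init_tags come out as header plus body length divided by four.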
2) Analysis of setup_processor and setup_machine
setup_processor first calls lookup_processor_type to search the linker-built list of supported
processor types for an entry matching the processor we are actually running on. Once found, it returns
a pointer to the matching struct proc_info_list, which carries information about that processor such as
its name and its TLB and cache functions; the code then prints this information.
Next it calls cpu_proc_init. This is really an indirect call: what actually runs is
cpu_armXXX_proc_init in arch/arm/mm/proc-XXX.S.
Let us first look at lookup_processor_type, which lives in arch/arm/kernel/head-common.S.
/*
* Read processor ID register (CP#15, CR0), and look up in the linker-built
* supported processor list. Note that we can't use the absolute addresses
* for the __proc_info lists since we aren't running with the MMU on
* (and therefore, we are not in the correct address space). We have to
* calculate the offset.
*
* r9 = cpuid
* Returns:
* r3, r4, r6 corrupted
* r5 = proc_info pointer in physical address space
* r9 = cpuid (preserved)
*/
.type __lookup_processor_type, %function
__lookup_processor_type:
adr r3, 3f @ if this is called from head.S, the MMU is still off, so the address of label 3 in r3 is a physical address, @
@ while __proc_info_begin etc. are link-time virtual addresses; physical and virtual addresses differ by a fixed offset @
@ but if the C wrapper lookup_processor_type is called from setup.c, the MMU is already on, so the address of label 3 @
@ in r3 is a virtual address too; all addresses are then uniformly virtual, and the subtraction and additions below @
@ still work out @
ldmda r3, {r5 - r7} @ r5=__proc_info_begin, r6=__proc_info_end, r7=link-time (virtual) address of label 3 @
sub r3, r3, r7 @ get offset between virt&phys
add r5, r5, r3 @ convert virt addresses to
add r6, r6, r3 @ physical address space
1: ldmia r5, {r3, r4} @ value, mask
and r4, r4, r9 @ mask wanted bits
teq r3, r4 @ r3 holds the linker-recorded CPU id value of this entry; compare it with the masked id of the running CPU @
beq 2f
add r5, r5, #PROC_INFO_SZ @ sizeof(proc_info_list)
cmp r5, r6
blo 1b
mov r5, #0 @ unknown processor
2: mov pc, lr

/*
* This provides a C-API version of the above function.
*/
ENTRY(lookup_processor_type)
stmfd sp!, {r4 - r7, r9, lr}
mov r9, r0
bl __lookup_processor_type
mov r0, r5 @ the call above left in r5 the physical address of the matching proc_info_list, or NULL; return it @
ldmfd sp!, {r4 - r7, r9, pc}

/*
* Look in include/asm-arm/procinfo.h and arch/arm/kernel/arch.[ch] for
* more information about the __proc_info and __arch_info structures.
*/
.long __proc_info_begin
.long __proc_info_end
3: .long .
.long __arch_info_begin
.long __arch_info_end

Similarly, there is a function lookup_machine_type, called by setup_machine to determine the supported
machine_arch_type. It returns a struct machine_desc pointer, which is the per-board configuration
structure every board must provide, wrapped with MACHINE_START and MACHINE_END. For example, our board:
MACHINE_START(ZR4230, "ZR4230 ZOLO Board")
/* Maintainer: Cory WX Xie */
.phys_io = SPU_BASE,
.io_pg_offst = ((io_p2v(SPU_BASE)) >> 18) & 0xfffc,
.fixup = fixup_zolo,
.boot_params = 0x00000100,/*phy sdram addr for params*/
.map_io = zr4230_zolo_map_io,
.init_irq = zr4230_zolo_init_irq,
.init_machine = zr4230_zolo_init,
.timer = &zr4230_kernel_sys_timer,
MACHINE_END
/*
* Lookup machine architecture in the linker-build list of architectures.
* Note that we can't use the absolute addresses for the __arch_info
* lists since we aren't running with the MMU on (and therefore, we are
* not in the correct address space). We have to calculate the offset.
*
* r1 = machine architecture number
* Returns:
* r3, r4, r6 corrupted
* r5 = mach_info pointer in physical address space
*/
.type __lookup_machine_type, %function
__lookup_machine_type:
adr r3, 3b
ldmia r3, {r4, r5, r6}
sub r3, r3, r4 @ get offset between virt&phys
add r5, r5, r3 @ convert virt addresses to
add r6, r6, r3 @ physical address space
1: ldr r3, [r5, #MACHINFO_TYPE] @ get machine type
teq r3, r1 @ matches loader number?
beq 2f @ found
add r5, r5, #SIZEOF_MACHINE_DESC @ next machine_desc
cmp r5, r6
blo 1b
mov r5, #0 @ unknown machine
2: mov pc, lr

/*
* This provides a C-API version of the above function.
*/
ENTRY(lookup_machine_type)
stmfd sp!, {r4 - r6, lr}
mov r1, r0
bl __lookup_machine_type
mov r0, r5
ldmfd sp!, {r4 - r6, pc}
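The value/mask scan that __lookup_processor_type performs over the linker-built table can be mimicked in C. A sketch under simplifying assumptions: `proc_entry` and `lookup_proc` are made up for illustration; only the `(cpuid & mask) == val` test and the NULL-on-miss convention (r5 = 0 above) follow the assembly.

```c
#include <assert.h>
#include <stddef.h>

/* One table entry, like the head of struct proc_info_list. */
struct proc_entry {
    unsigned int cpu_val;
    unsigned int cpu_mask;
    const char  *name;
};

/* Scan the table: mask the wanted bits of cpuid and compare against
 * each entry's value, exactly like the teq loop in the assembly. */
static const struct proc_entry *lookup_proc(const struct proc_entry *tbl,
                                            size_t n, unsigned int cpuid)
{
    for (size_t i = 0; i < n; i++)
        if ((cpuid & tbl[i].cpu_mask) == tbl[i].cpu_val)
            return &tbl[i];
    return NULL; /* unknown processor, like r5 = 0 */
}
```

The 0x0007b000/0x0007f000 pair from __v6_proc_info later in this article is a real example of such a value/mask entry.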
Note that the two __-prefixed helpers above are written so they can run even before the MMU is turned on.
Looking a little further at the code around them: the two entry points above are really just C-API
wrappers; at the kernel's very earliest stage the assembly versions are called directly. Here is the
kernel entry point:
/*
* Kernel startup entry point.
* ---------------------------
*
* This is normally called from the decompressor code. The requirements
* are: MMU = off, D-cache = off, I-cache = dont care, r0 = 0,
* r1 = machine nr.
*
* This code is mostly position independent, so if you link the kernel at
* 0xc0008000, you call this at __pa(0xc0008000).
*
* See linux/arch/arm/tools/mach-types for the complete list of machine
* numbers for r1.
*
* We're trying to keep crap to a minimum; DO NOT add any machine specific
* crap here - that's what the boot loader (or in extreme, well justified
* circumstances, zImage) is for.
*/
__INIT
.type stext, %function
ENTRY(stext)
msr cpsr_c, #PSR_F_BIT | PSR_I_BIT | SVC_MODE @ ensure svc mode
@ and irqs disabled
mrc p15, 0, r9, c0, c0 @ get processor id
bl __lookup_processor_type @ r5=procinfo r9=cpuid
movs r10, r5 @ invalid processor (r5=0)?
beq __error_p @ yes, error 'p'
bl __lookup_machine_type @ r5=machinfo
movs r8, r5 @ invalid machine (r5=0)?
beq __error_a @ yes, error 'a'
bl __create_page_tables

/*
* The following calls CPU specific code in a position independent
* manner. See arch/arm/mm/proc-*.S for details. r10 = base of
* xxx_proc_info structure selected by __lookup_machine_type
* above. On return, the CPU will be ready for the MMU to be
* turned on, and r0 will hold the CPU control register value.
*/
ldr r13, __switch_data @ address to jump to after
@ mmu has been enabled
adr lr, __enable_mmu @ return (PIC) address
add pc, r10, #PROCINFO_INITFUNC
Here the code first forces the CPU into SVC mode and reads CP15 c0 to obtain the CPU id, then calls
__lookup_processor_type to get the proc_info_list pointer for the current CPU. The proc_info_list
structure is defined in include/asm-arm/procinfo.h:
/*
* Note! struct processor is always defined if we're
* using MULTI_CPU, otherwise this entry is unused,
* but still exists.
*
* NOTE! The following structure is defined by assembly
* language, NOT C code. For more information, check:
* arch/arm/mm/proc-*.S and arch/arm/kernel/head.S
*/
struct proc_info_list {
unsigned int cpu_val;
unsigned int cpu_mask;
unsigned long __cpu_mm_mmu_flags; /* used by head.S */
unsigned long __cpu_io_mmu_flags; /* used by head.S */
unsigned long __cpu_flush; /* used by head.S */
const char *arch_name;
const char *elf_name;
unsigned int elf_hwcap;
const char *cpu_name;
struct processor *proc;
struct cpu_tlb_fns *tlb;
struct cpu_user_fns *user;
struct cpu_cache_fns *cache;
};
Let us look at this structure for an ARMv6 processor; it lives in arch/arm/mm/proc-v6.S:
.section ".proc.info.init", #alloc, #execinstr @ this entry is linked into the .proc.info.init section @
@ the linker brackets this section with @
@ __proc_info_begin and __proc_info_end @
/*
* Match any ARMv6 processor core.
*/
.type __v6_proc_info, #object
__v6_proc_info:
.long 0x0007b000 @cpu_val@
.long 0x0007f000 @cpu_mask@
.long PMD_TYPE_SECT | \
PMD_SECT_BUFFERABLE | \
PMD_SECT_CACHEABLE | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ @__cpu_mm_mmu_flags@
.long PMD_TYPE_SECT | \
PMD_SECT_XN | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ @__cpu_io_mmu_flags@
b __v6_setup @__cpu_flush; note that this slot holds a branch instruction@
.long cpu_arch_name
.long cpu_elf_name
.long HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP|HWCAP_JAVA
.long cpu_v6_name
.long v6_processor_functions
.long v6wbi_tlb_fns
.long v6_user_fns
.long v6_cache_fns
.size __v6_proc_info, . - __v6_proc_info
We saw that shortly after the kernel entry point there is a jump performed by assigning to PC:
add pc, r10, #PROCINFO_INITFUNC
Here r10 is the address of __v6_proc_info above, and arch/arm/kernel/asm-offsets.c contains:
DEFINE(PROCINFO_INITFUNC, offsetof(struct proc_info_list, __cpu_flush));
That is, PROCINFO_INITFUNC is the offset of the __cpu_flush member within struct proc_info_list.
So the PC assignment above points PC at the __cpu_flush slot of __v6_proc_info, which means the
following instruction:
b __v6_setup
gets executed.
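Why the PC assignment lands exactly on the `b __v6_setup` slot is easiest to see with `offsetof()`, the same mechanism asm-offsets.c uses. The struct below is a simplified stand-in for struct proc_info_list (shorter, hypothetical member names), not the real definition:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct proc_info_list: only the members
 * up to the __cpu_flush slot matter for the offset. */
struct proc_info_list_sketch {
    unsigned int  cpu_val;
    unsigned int  cpu_mask;
    unsigned long mm_mmu_flags;
    unsigned long io_mmu_flags;
    unsigned long cpu_flush;   /* in the kernel this slot holds a branch */
};

/* Generated the same way asm-offsets.c generates the real constant. */
#define PROCINFO_INITFUNC offsetof(struct proc_info_list_sketch, cpu_flush)
```

`add pc, r10, #PROCINFO_INITFUNC` is then just "base address plus member offset" done in assembly: the PC ends up on whatever instruction sits in that slot.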
.section ".text.init", #alloc, #execinstr

/*
* __v6_setup
*
* Initialise TLB, Caches, and MMU state ready to switch the MMU
* on. Return in r0 the new CP15 C1 control register setting.
*
* We automatically detect if we have a Harvard cache, and use the
* Harvard cache control instructions insead of the unified cache
* control instructions.
*
* This should be able to cover all ARMv6 cores.
*
* It is assumed that:
* - cache type register is implemented
*/
__v6_setup:
#ifdef CONFIG_SMP
/* Set up the SCU on core 0 only */
mrc p15, 0, r0, c0, c0, 5 @ CPU core number
ands r0, r0, #15
moveq r0, #0x10000000 @ SCU_BASE
orreq r0, r0, #0x00100000
ldreq r5, [r0, #SCU_CTRL]
orreq r5, r5, #1
streq r5, [r0, #SCU_CTRL]

#ifndef CONFIG_CPU_DCACHE_DISABLE
mrc p15, 0, r0, c1, c0, 1 @ Enable SMP/nAMP mode
orr r0, r0, #0x20
mcr p15, 0, r0, c1, c0, 1
#endif
#endif
@ for non-SMP configurations, the real work starts here @
mov r0, #0
mcr p15, 0, r0, c7, c14, 0 @ clean+invalidate D cache
mcr p15, 0, r0, c7, c5, 0 @ invalidate I cache
mcr p15, 0, r0, c7, c15, 0 @ clean+invalidate cache
mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
#ifdef CONFIG_MMU
mcr p15, 0, r0, c8, c7, 0 @ invalidate I + D TLBs
mcr p15, 0, r0, c2, c0, 2 @ TTB control register
#ifdef CONFIG_SMP
orr r4, r4, #TTB_RGN_WBWA|TTB_S @ mark PTWs shared, outer cacheable
#endif
mcr p15, 0, r4, c2, c0, 1 @ load TTB1 @r4 was set to the page table base@
@earlier in __create_page_tables@
#endif /* CONFIG_MMU */
adr r5, v6_crval
ldmia r5, {r5, r6}
mrc p15, 0, r0, c1, c0, 0 @ read control register
bic r0, r0, r5 @ clear bits them
orr r0, r0, r6 @ set them
mov pc, lr @ return to head.S:__ret
@recall that adr lr, __enable_mmu was executed before entry, so this@
@jumps to __enable_mmu, with the new value of the CP15 c1 control@
@register already prepared in r0@


/*
* V X F I D LR
* .... ...E PUI. .T.T 4RVI ZFRS BLDP WCAM
* rrrr rrrx xxx0 0101 xxxx xxxx x111 xxxx < forced
* 0 110 0011 1.00 .111 1101 < we want
*/
.type v6_crval, #object
v6_crval:
crval clear=0x01e0fb7f, mmuset=0x00c0387d, ucset=0x00c0187c

Reading on:
/*
* Setup common bits before finally enabling the MMU. Essentially
* this is just loading the page table pointer and domain access
* registers.
*/
.type __enable_mmu, %function
__enable_mmu:
#ifdef CONFIG_ALIGNMENT_TRAP
orr r0, r0, #CR_A
#else
bic r0, r0, #CR_A
#endif
#ifdef CONFIG_CPU_DCACHE_DISABLE
bic r0, r0, #CR_C
#endif
#ifdef CONFIG_CPU_BPREDICT_DISABLE
bic r0, r0, #CR_Z
#endif
#ifdef CONFIG_CPU_ICACHE_DISABLE
bic r0, r0, #CR_I
#endif
mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
domain_val(DOMAIN_IO, DOMAIN_CLIENT))
mcr p15, 0, r5, c3, c0, 0 @ load domain access register
mcr p15, 0, r4, c2, c0, 0 @ load page table pointer
b __turn_mmu_on @r0 already holds the prepared control register value@

/*
* Enable the MMU. This completely changes the structure of the visible
* memory space. You will not be able to trace execution through this.
* If you have an enquiry about this, *please* check the linux-arm-kernel
* mailing list archives BEFORE sending another post to the list.
*
* r0 = cp#15 control register
* r13 = *virtual* address to jump to upon completion
*
* other registers depend on the function called upon completion
*/
.align 5
.type __turn_mmu_on, %function
__turn_mmu_on:
mov r0, r0
mcr p15, 0, r0, c1, c0, 0 @ write control reg
mrc p15, 0, r3, c0, c0, 0 @ read id reg
mov r3, r3 @these NOP instructions matter: they pad the CPU instruction pipeline@
mov r3, r3 @the three preceding instructions execute after the MMU is on,@
@so their addresses are already virtual@
mov pc, r13 @r13 was loaded earlier by ldr r13, __switch_data@
@note that ldr was used, not adr, so r13 too already holds
@a virtual address@
Once again we see a jump via PC assignment, this time into virtual address space. Here is the
definition of __switch_data, at the top of arch/arm/kernel/head-common.S:
.type __switch_data, %object
__switch_data:
.long __mmap_switched
.long __data_loc @ r4
.long __data_start @ r5
.long __bss_start @ r6
.long _end @ r7
.long processor_id @ r4
.long __machine_arch_type @ r5
.long cr_alignment @ r6
.long init_thread_union + THREAD_START_SP @ sp

In other words, we have now jumped to __mmap_switched:
/*
* The following fragment of code is executed with the MMU on in MMU mode,
* and uses absolute addresses; this is not position independent.
*
* r0 = cp#15 control register
* r1 = machine ID
* r9 = processor ID
*/
.type __mmap_switched, %function
__mmap_switched:
adr r3, __switch_data + 4 @r3 holds the address of the word labelled __data_loc@

ldmia r3!, {r4, r5, r6, r7}
cmp r4, r5 @ Copy data segment if needed
1: cmpne r5, r6
ldrne fp, [r4], #4
strne fp, [r5], #4
bne 1b

mov fp, #0 @ Clear BSS (and zero fp)
1: cmp r6, r7
strcc fp, [r6],#4
bcc 1b

ldmia r3, {r4, r5, r6, sp}
str r9, [r4] @ Save processor ID
str r1, [r5] @ Save machine type
bic r4, r0, #CR_A @ Clear 'A' bit
stmia r6, {r0, r4} @ Save control register values
b start_kernel @init/main.c@
This copies the data segment, clears the BSS, and so on, then stores the processor id and machine id
into the global variables processor_id and __machine_arch_type, both defined in
arch/arm/kernel/setup.c.

3) Back in setup_arch
Now it is time to parse the tag list, but before parsing, the kernel gives the person porting to a
particular board one more chance to intervene, through mdesc->fixup. As you can see, the fixup function
pointer takes pointers to struct machine_desc, struct tag, the cmdline boot parameters, and struct
meminfo. We can define our own fixup function to modify or assign the contents these pointers refer to,
to suit our own board.
if (mdesc->fixup)
mdesc->fixup(mdesc, tags, &from, &meminfo);
For example, our board defines the following fixup function:
static void __init fixup_zolo(struct machine_desc *desc,
struct tag *tags, char **cmdline, struct meminfo *mi)
{
mi->nr_banks=1;
mi->bank[0].start =0x00000000;
mi->bank[0].node = 0;
mi->bank[0].size = (SZ_128M);
}
Here, because we have only one bank of physical DDR SDRAM, and we want both the tags and the cmdline to
be passed in by the bootloader, we fix up only struct meminfo *mi.

if (tags->hdr.tag == ATAG_CORE) {
if (meminfo.nr_banks != 0)
squash_mem_tags(tags);@if the kernel already filled in a meminfo structure, memory information need not come from the@
@ATAG_MEM tags; this function therefore marks those tags ATAG_NONE, signalling the end@
parse_tags(tags);@dispatch each tag to the matching parse function according to its tag id@
}
Next, init_mm is initialised and saved_command_line is saved and parsed:
init_mm.start_code = (unsigned long) &_text;
init_mm.end_code = (unsigned long) &_etext;
init_mm.end_data = (unsigned long) &_edata;
init_mm.brk = (unsigned long) &_end;

memcpy(saved_command_line, from, COMMAND_LINE_SIZE);
saved_command_line[COMMAND_LINE_SIZE-1] = '\0';
parse_cmdline(cmdline_p, from);
Parsing the cmdline is much like parsing the tag list: scan for key fields, find the matching entry,
and call the corresponding handler function.
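That keyword scan can be sketched in a few lines. `parse_mem_mb` is a hypothetical helper, not the kernel's parser: the real parse_cmdline dispatches to handlers registered via __early_param and handles K/M suffixes and multiple `mem=` options properly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find "mem=" in the command line and read a size in megabytes.
 * Hypothetical, cut-down version of what a mem= handler does. */
static unsigned long parse_mem_mb(const char *cmdline)
{
    const char *p = strstr(cmdline, "mem=");
    if (!p)
        return 0;                       /* keyword absent */
    return strtoul(p + 4, NULL, 10);    /* suffix handling omitted */
}
```

Every option on the kernel command line is parsed in this find-keyword-then-dispatch style; only the per-option handler differs.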

4) Analysis of paging_init
Next comes the call paging_init(&meminfo, mdesc); this function lives in arch/arm/mm/mmu.c:
/*
* paging_init() sets up the page tables, initialises the zone memory
* maps, and sets up the zero page, bad page and bad page tables.
*/
void __init paging_init(struct meminfo *mi, struct machine_desc *mdesc)
{
void *zero_page;

build_mem_type_table();
prepare_page_table(mi);
bootmem_init(mi);
devicemaps_init(mdesc);

top_pmd = pmd_off_k(0xffff0000);

/*
* allocate the zero page. Note that we count on this going ok.
*/
zero_page = alloc_bootmem_low_pages(PAGE_SIZE);
memzero(zero_page, PAGE_SIZE);
empty_zero_page = virt_to_page(zero_page);
flush_dcache_page(empty_zero_page);
}

First look at build_mem_type_table(). Near the top of this file there is an array defined as
struct mem_types mem_types[], which defines the MMU protection attributes for the various kinds of
memory region. MT_DEVICE, MT_LOW_VECTORS, MT_HIGH_VECTORS, MT_MEMORY, MT_ROM and so on index this
array; each entry is the following struct mem_types:
struct mem_types {
unsigned int prot_pte;
unsigned int prot_l1;
unsigned int prot_sect;
unsigned int domain;
};

static struct mem_types mem_types[] __initdata = {
[MT_DEVICE] = {
.prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
L_PTE_WRITE,
.prot_l1 = PMD_TYPE_TABLE,
.prot_sect = PMD_TYPE_SECT | PMD_BIT4 | PMD_SECT_UNCACHED |
PMD_SECT_AP_WRITE,
.domain = DOMAIN_IO,
},
@some entries omitted here@
[MT_LOW_VECTORS] = {
.prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
L_PTE_EXEC,
.prot_l1 = PMD_TYPE_TABLE,
.domain = DOMAIN_USER,
},
[MT_HIGH_VECTORS] = {
.prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
L_PTE_USER | L_PTE_EXEC,
.prot_l1 = PMD_TYPE_TABLE,
.domain = DOMAIN_USER,
},
[MT_MEMORY] = {
.prot_sect = PMD_TYPE_SECT | PMD_BIT4 | PMD_SECT_AP_WRITE,
.domain = DOMAIN_KERNEL,
},
[MT_ROM] = {
.prot_sect = PMD_TYPE_SECT | PMD_BIT4,
.domain = DOMAIN_KERNEL,
},
@some entries omitted here@
[MT_NONSHARED_DEVICE] = {
.prot_l1 = PMD_TYPE_TABLE,
.prot_sect = PMD_TYPE_SECT | PMD_BIT4 | PMD_SECT_NONSHARED_DEV |
PMD_SECT_AP_WRITE,
.domain = DOMAIN_IO,
}
};
Near the top of the same file there is another array, describing the cache policy in use; the possible
policies are uncached, buffered, writethrough, writeback and writealloc. The array is indexed by the
following constants:
#define CPOLICY_UNCACHED 0
#define CPOLICY_BUFFERED 1
#define CPOLICY_WRITETHROUGH 2
#define CPOLICY_WRITEBACK 3
#define CPOLICY_WRITEALLOC 4
which select the concrete policy structure:
struct cachepolicy {
const char policy[16];
unsigned int cr_mask;
unsigned int pmd;
unsigned int pte;
};

static struct cachepolicy cache_policies[] __initdata = {
{
.policy = "uncached",
.cr_mask = CR_W|CR_C,
.pmd = PMD_SECT_UNCACHED,
.pte = 0,
}, {
.policy = "buffered",
.cr_mask = CR_C,
.pmd = PMD_SECT_BUFFERED,
.pte = PTE_BUFFERABLE,
}, {
.policy = "writethrough",
.cr_mask = 0,
.pmd = PMD_SECT_WT,
.pte = PTE_CACHEABLE,
}, {
.policy = "writeback",
.cr_mask = 0,
.pmd = PMD_SECT_WB,
.pte = PTE_BUFFERABLE|PTE_CACHEABLE,
}, {
.policy = "writealloc",
.cr_mask = 0,
.pmd = PMD_SECT_WBWA,
.pte = PTE_BUFFERABLE|PTE_CACHEABLE,
}
};

The file also defines a cachepolicy variable recording the currently selected policy:
static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
Thus, once a cache policy is chosen, the corresponding PMD and PTE entries have fixed values.
Of course, the values set here are only the usual defaults; for particular CPUs and ARM architecture
versions they may need adjusting, and that is exactly what build_mem_type_table() does. Based on the
user's configuration, e.g. CONFIG_CPU_DCACHE_DISABLE or CONFIG_CPU_DCACHE_WRITETHROUGH, and on the ARM
architecture version of the current CPU, it first selects a specific cachepolicy and ecc_mask value.
It then sets or clears the PMD_BIT4 bit of mem_types[i].prot_sect depending on whether the core is an
XScale, and applies further architecture-specific settings; for ARMv6, for example:
/*
* ARMv6 and above have extended page tables.
*/
if (cpu_arch >= CPU_ARCH_ARMv6 && (cr & CR_XP)) {@on ARMv6, CR_XP set means subpage AP bits are disabled@
/*
* bit 4 becomes XN which we must clear for the
* kernel memory mapping.
*/
mem_types[MT_MEMORY].prot_sect &= ~PMD_SECT_XN;
mem_types[MT_ROM].prot_sect &= ~PMD_SECT_XN;

/*
* Mark cache clean areas and XIP ROM read only
* from SVC mode and no access from userspace.
*/
mem_types[MT_ROM].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
mem_types[MT_MINICLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
mem_types[MT_CACHECLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;

/*
* Mark the device area as "shared device"
*/
mem_types[MT_DEVICE].prot_pte |= L_PTE_BUFFERABLE;
mem_types[MT_DEVICE].prot_sect |= PMD_SECT_BUFFERED;

}
Now let us see what prepare_page_table does:
static inline void prepare_page_table(struct meminfo *mi)
{
unsigned long addr;

/*
* Clear out all the mappings below the kernel image.
*/
for (addr = 0; addr < MODULE_START; addr += PGDIR_SIZE)
pmd_clear(pmd_off_k(addr));

#ifdef CONFIG_XIP_KERNEL
/* The XIP kernel is mapped in the module area -- skip over it */
addr = ((unsigned long)&_etext + PGDIR_SIZE - 1) & PGDIR_MASK;
#endif
for ( ; addr < PAGE_OFFSET; addr += PGDIR_SIZE)
pmd_clear(pmd_off_k(addr));

/*
* Clear out all the kernel space mappings, except for the first
* memory bank, up to the end of the vmalloc region.
*/
for (addr = __phys_to_virt(mi->bank[0].start + mi->bank[0].size);
addr < VMALLOC_END; addr += PGDIR_SIZE)
pmd_clear(pmd_off_k(addr));
}
This function zeroes the PMD entries for virtual addresses from 0 up to VMALLOC_END (sparing the
mapping of the first memory bank), in preparation for the steps that follow.

Next comes the analysis of bootmem_init; this function lives in arch/arm/mm/init.c:
void __init bootmem_init(struct meminfo *mi)
{
unsigned long memend_pfn = 0;
int node, initrd_node, i;

/*
* Invalidate the node number for empty or invalid memory banks
*/
for (i = 0; i < mi->nr_banks; i++)
if (mi->bank[i].size == 0 || mi->bank[i].node >= MAX_NUMNODES)
mi->bank[i].node = -1;

memcpy(&meminfo, mi, sizeof(meminfo));

/*
* Locate which node contains the ramdisk image, if any.
*/
initrd_node = check_initrd(mi);

/*
* Run through each node initialising the bootmem allocator.
*/
for_each_node(node) {/* in practice we have only one node, since ours is a UMA system */
unsigned long end_pfn;

end_pfn = bootmem_init_node(node, initrd_node, mi);

/*
* Remember the highest memory PFN.
*/
if (end_pfn > memend_pfn)
memend_pfn = end_pfn;
}

high_memory = __va(memend_pfn << PAGE_SHIFT);

/*
* This doesn't seem to be used by the Linux memory manager any
* more, but is used by ll_rw_block. If we can get rid of it, we
* also get rid of some of the stuff above as well.
*
* Note: max_low_pfn and max_pfn reflect the number of _pages_ in
* the system, not the maximum PFN.
*/
max_pfn = max_low_pfn = memend_pfn - PHYS_PFN_OFFSET;
}
Here, the struct meminfo *mi passed to this function is in fact a pointer to the meminfo defined in
arch/arm/kernel/setup.c, which was actually filled in by the fixup function described earlier. The file
arch/arm/mm/init.c also defines a meminfo structure; that one is a copy obtained inside bootmem_init
via memcpy(&meminfo, mi, sizeof(meminfo));.
The next important task is locating which memory bank holds the initrd. This is done by comparing the
values of phys_initrd_start and phys_initrd_size against the start address and size of each bank in
meminfo: if the initrd region falls inside a bank, the initrd lives in that bank. check_initrd does
exactly this and returns the node number of the bank containing the initrd. phys_initrd_start and
phys_initrd_size were obtained earlier while parsing the tags or the cmdline.
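The containment test check_initrd relies on can be sketched as follows. `struct bank` is pared down from struct meminfo's bank array, and `initrd_node_of` is a hypothetical stand-in for the real function:

```c
#include <assert.h>

/* Cut-down bank descriptor: physical start, size, owning node. */
struct bank {
    unsigned long start;
    unsigned long size;
    int node;
};

/* Return the node of the bank wholly containing [rd_start, rd_start+rd_size),
 * or -1 if the initrd region does not fit inside any bank. */
static int initrd_node_of(const struct bank *b, int nr,
                          unsigned long rd_start, unsigned long rd_size)
{
    for (int i = 0; i < nr; i++)
        if (rd_start >= b[i].start &&
            rd_start + rd_size <= b[i].start + b[i].size)
            return b[i].node;
    return -1;
}
```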

Next comes a very important piece of code that initialises the bootmem memory allocator. Although this
allocator is only used during kernel startup, it matters a great deal: the other allocators, the page
allocator included, need some form of memory allocation during their own initialisation to obtain
dynamic data structures, and for that they depend on the bootmem allocator. So let us look at it.

Here we need the concepts of UMA and NUMA; put simply, they concern the uniformity of memory access
within a system. Some systems, typically multiprocessor systems, have a distributed memory
architecture: one CPU can access the memory of another CPU in the system, but for physical reasons that
access is not the same as accessing local memory, i.e. access is non-uniform (NUMA). Typical embedded
systems, by contrast, are basically uniform (UMA). The kernel treats each uniform memory region of a
NUMA system as one node (represented by pg_data_t); each node may contain several banks, and on some
architectures its memory further needs to be treated as several zones; every node needs its own bootmem
allocator descriptor.
typedef struct pglist_data {
struct zone node_zones[MAX_NR_ZONES];
struct zonelist node_zonelists[MAX_NR_ZONES];
int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP
struct page *node_mem_map;
#endif
struct bootmem_data *bdata;
#ifdef CONFIG_MEMORY_HOTPLUG
/*
* Must be held any time you expect node_start_pfn, node_present_pages
* or node_spanned_pages stay constant. Holding this will also
* guarantee that any pfn_valid() stays that way.
*
* Nests above zone->lock and zone->size_seqlock.
*/
spinlock_t node_size_lock;
#endif
unsigned long node_start_pfn;
unsigned long node_present_pages; /* total number of physical pages */
unsigned long node_spanned_pages; /* total size of physical page
range, including holes */
int node_id;
wait_queue_head_t kswapd_wait;
struct task_struct *kswapd;
int kswapd_max_order;
} pg_data_t; [include/linux/mmzone.h]

In arch/arm/mm/discontig.c, the following arrays are defined:
/*
* Our node_data structure for discontiguous memory.
*/

static bootmem_data_t node_bootmem_data[MAX_NUMNODES];

pg_data_t discontig_node_data[MAX_NUMNODES] = {
{ .bdata = &node_bootmem_data[0] },
{ .bdata = &node_bootmem_data[1] },
{ .bdata = &node_bootmem_data[2] },
{ .bdata = &node_bootmem_data[3] },
};
In addition, include/asm-arm/mmzone.h defines the following accessor macro:
/*
* Return a pointer to the node data for node n.
*/
#define NODE_DATA(nid) (&discontig_node_data[nid])

bootmem_data_t is the structure describing a bootmem allocator, defined in include/linux/bootmem.h:
/*
* node_bootmem_map is a map pointer - the bits represent all physical
* memory pages (including holes) on the node.
*/
typedef struct bootmem_data {
unsigned long node_boot_start;
unsigned long node_low_pfn;
void *node_bootmem_map;
unsigned long last_offset;
unsigned long last_pos;
unsigned long last_success; /* Previous allocation point. To speed
* up searching */
struct list_head list;
} bootmem_data_t;
With these data structures in hand, we can continue analysing the initialisation of the bootmem
allocator; back in bootmem_init:

A for_each_node(node) loop calls bootmem_init_node(node, initrd_node, mi) for each node. That function
does the following:
1) For each bank in the node, it calls map_memory_bank(bank) to map physical addresses to virtual
addresses, and it accumulates the node's overall start_pfn and end_pfn. In our case there is just one
node with a single bank, so start_pfn and end_pfn are simply the start and end page frame numbers of
the physical memory we configured into meminfo in the fixup function earlier.
2) The following two lines compute the number of physical pages needed for the bootmem bitmap and the
starting physical page frame of that bitmap, stored in boot_pages and boot_pfn respectively; boot_pfn
starts at the first page after the physical page corresponding to _end of the kernel image:
/*
* Allocate the bootmem bitmap page.
*/
boot_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
boot_pfn = find_bootmap_pfn(node, mi, boot_pages);
3) It then runs:
/*
* Initialise the bootmem allocator for this node, handing the
* memory banks over to bootmem.
*/
node_set_online(node);
pgdat = NODE_DATA(node);
init_bootmem_node(pgdat, boot_pfn, start_pfn, end_pfn);

init_bootmem_node is defined in mm/bootmem.c and directly calls the following function:
/*
* Called once to set up the allocator itself.
*/
static unsigned long __init init_bootmem_core(pg_data_t *pgdat,
unsigned long mapstart, unsigned long start, unsigned long end)
{
bootmem_data_t *bdata = pgdat->bdata;
unsigned long mapsize;

bdata->node_bootmem_map = phys_to_virt(PFN_PHYS(mapstart));
bdata->node_boot_start = PFN_PHYS(start);
bdata->node_low_pfn = end;
link_bootmem(bdata);

/*
* Initially all pages are reserved - setup_arch() has to
* register free RAM areas explicitly.
*/
mapsize = get_mapsize(bdata);
memset(bdata->node_bootmem_map, 0xff, mapsize);

return mapsize;
}
So a bitmap is created starting at the first page after the physical page holding the kernel image's
_end; it tracks whether each memory page may be allocated, one bit per page: a 1 bit means not
allocatable, a 0 bit means allocatable. node_bootmem_map is the virtual address of this bitmap;
node_boot_start is the physical start address of all pages managed as bootmem; node_low_pfn is the last
physical page frame managed by bootmem, i.e. the end of so-called low memory. As we can see, after
setting up the mapping, the code fills the whole bitmap with ones, so initially no bootmem page may be
allocated.
So when do pages become allocatable? Back in bootmem_init_node in arch/arm/mm/init.c, we read on:
for_each_nodebank(i, mi, node)
free_bootmem_node(pgdat, mi->bank[i].start, mi->bank[i].size);
That is, free_bootmem_node runs over every bank of this node; its main job is clearing the bitmap bits
of the corresponding physical pages, marking them available for dynamic allocation by the bootmem
allocator:
/*
* Round up the beginning of the address.
*/
sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
eidx = PFN_DOWN(addr + size - bdata->node_boot_start);

for (i = sidx; i < eidx; i++) {
if (unlikely(!test_and_clear_bit(i, bdata->node_bootmem_map)))
BUG();
}
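The reserve-everything-then-free discipline described above is easy to model with a small user-space bitmap. The helpers below are illustrative, not the kernel's (which operate on node_bootmem_map with test_and_clear_bit and handle partial pages at range edges):

```c
#include <assert.h>
#include <string.h>

/* One bit per page; 32 pages is enough for the demo. */
#define NPAGES 32
static unsigned char bmap[NPAGES / 8];

/* init_bootmem_core: start with every page reserved (all ones). */
static void bmap_init(void)        { memset(bmap, 0xff, sizeof(bmap)); }
/* free_bootmem: clear the bit, the page becomes allocatable. */
static void bmap_free(int pfn)     { bmap[pfn / 8] &= (unsigned char)~(1u << (pfn % 8)); }
/* reserve_bootmem: set the bit back, the page is off limits again. */
static void bmap_reserve(int pfn)  { bmap[pfn / 8] |= (unsigned char)(1u << (pfn % 8)); }
static int  bmap_reserved(int pfn) { return (bmap[pfn / 8] >> (pfn % 8)) & 1; }
```

The kernel's sequence maps directly onto these helpers: fill with 0xff at init, clear bits for every bank, then set bits again for the bitmap itself, the initrd, and the node-0 special regions.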
Allocation is not unrestricted, though: some regions, such as the one holding the kernel code itself,
must never be allocated dynamically. So next, several such special regions are reserved, i.e. their
bitmap bits are set back to 1:
1) the bitmap region itself:
/*
* Reserve the bootmem bitmap for this node.
*/
reserve_bootmem_node(pgdat, boot_pfn << PAGE_SHIFT,
boot_pages << PAGE_SHIFT);
2) if the initrd lives in this node, its region is reserved:
#ifdef CONFIG_BLK_DEV_INITRD
/*
* If the initrd is in this node, reserve its memory.
*/
if (node == initrd_node) {
reserve_bootmem_node(pgdat, phys_initrd_start,
phys_initrd_size);
initrd_start = __phys_to_virt(phys_initrd_start);
initrd_end = initrd_start + phys_initrd_size;
}
#endif
3) reserve_node_zero is called to reserve some special regions of node 0, such as the region holding
the kernel, the region holding the page directory, and regions particular boards treat specially; for
example:
/*
* Reserve the page tables. These are already in use,
* and can only be in node 0.
*/
reserve_bootmem_node(pgdat, __pa(swapper_pg_dir),
PTRS_PER_PGD * sizeof(pgd_t));
These regions are reserved specifically on node 0 because all systems, NUMA systems included, place the
kernel and related resources on node 0.

At this point the bootmem allocator has in fact finished initialising and is ready for use. The code
below is still called from bootmem_init_node, but it is already a user of the bootmem allocator's
allocation service.

4) Ordinarily, for a conventional system such as an x86 PC, the memory of a node needs to be split into
three zones: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM (recent code adds a ZONE_DMA32 as well). On an
architecture like ARM this split seems unnecessary: DMA can target all of physical memory, and current
embedded systems do not yet have enough memory to need HIGHMEM. So although the code that follows could
initialise three zones, it actually populates only the first:

memset(zone_size, 0, sizeof(zone_size));
memset(zhole_size, 0, sizeof(zhole_size));

/*
* The size of this node has already been determined. If we need
* to do anything fancy with the allocation of this memory to the
* zones, now is the time to do it.
*/
zone_size[0] = end_pfn - start_pfn;

/*
* For each bank in this node, calculate the size of the holes.
* holes = node_size - sum(bank_sizes_in_node)
*/
zhole_size[0] = zone_size[0];
for_each_nodebank(i, mi, node)
zhole_size[0] -= mi->bank[i].size >> PAGE_SHIFT;

/*
* Adjust the sizes according to any special requirements for
* this machine type.
*/
arch_adjust_zones(node, zone_size, zhole_size);

free_area_init_node(node, pgdat, zone_size, start_pfn, zhole_size);
Here free_area_init_node is called to initialise the physical page allocator:
void __meminit free_area_init_node(int nid, struct pglist_data *pgdat,
unsigned long *zones_size, unsigned long node_start_pfn,
unsigned long *zholes_size)
{
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
calculate_node_totalpages(pgdat, zones_size, zholes_size);

alloc_node_mem_map(pgdat);

free_area_init_core(pgdat, zones_size, zholes_size);
}
alloc_node_mem_map computes the amount of local memory in the node (in 4 KB page frames), call it N,
allocates from the bootmem allocator enough memory to hold N struct page structures, and keeps the
result in node_mem_map. This is where the bootmem allocator we just initialised gets used:
map = alloc_remap(pgdat->node_id, size);
if (!map)
map = alloc_bootmem_node(pgdat, size);
pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);

free_area_init_node then calls free_area_init_core to finish the job:
/*
* Set up the zone data structures:
* - mark all pages reserved
* - mark all memory queues empty
* - clear the memory bitmaps
*/
static void __meminit free_area_init_core(struct pglist_data *pgdat,
unsigned long *zones_size, unsigned long *zholes_size);
We will not analyse the initialisation of the physical page allocator in detail here, but one point is
worth stressing: although the page allocator is initialised at this point, all of its memory is still
in the reserved state; it cannot actually be used until later, when the bootmem allocator "retires".

Back in bootmem_init once more. With bootmem_init_node analysed, the rest is simple:
bootmem_init_node returns the node's last physical page frame number in end_pfn, and memend_pfn keeps
the highest physical page frame number over all nodes:
/*
* Remember the highest memory PFN.
*/
if (end_pfn > memend_pfn)
memend_pfn = end_pfn;
Then memend_pfn, obtained by scanning all nodes, is converted to a virtual address and saved in
high_memory:
high_memory = __va(memend_pfn << PAGE_SHIFT);
A few less important globals are saved as well:
/*
* This doesn't seem to be used by the Linux memory manager any
* more, but is used by ll_rw_block. If we can get rid of it, we
* also get rid of some of the stuff above as well.
*
* Note: max_low_pfn and max_pfn reflect the number of _pages_ in
* the system, not the maximum PFN.
*/
max_pfn = max_low_pfn = memend_pfn - PHYS_PFN_OFFSET;

Back in paging_init in mmu.c. Having analysed bootmem_init, the next function to analyse is
devicemaps_init.
It does the following:
1) allocates the vectors page early from the bootmem allocator, to hold the vector table:
/*
* Allocate the vector page early.
*/
vectors = alloc_bootmem_low_pages(PAGE_SIZE);
BUG_ON(!vectors);
2) clears the mappings for virtual addresses above VMALLOC_END, which also wipes out the mappings the
kernel created right at startup in __create_page_tables, including the serial-output mapping provided
by CONFIG_DEBUG_LL:
for (addr = VMALLOC_END; addr; addr += PGDIR_SIZE)
pmd_clear(pmd_off_k(addr));
3) depending on whether CONFIG_XIP_KERNEL, FLUSH_BASE, FLUSH_BASE_MINICACHE and a few other macros are
defined, maps the corresponding regions. A mapping is made by filling a struct map_desc map with the
physical address, the virtual address, the length, and the type, and calling create_mapping(&map);
4) maps the vector page allocated earlier, in much the same way:
/*
* Create a mapping for the machine vectors at the high-vectors
* location (0xffff0000). If we aren't using high-vectors, also
* create a mapping at the low-vectors virtual address.
*/
map.pfn = __phys_to_pfn(virt_to_phys(vectors));
map.virtual = 0xffff0000;
map.length = PAGE_SIZE;
map.type = MT_HIGH_VECTORS;
create_mapping(&map);

if (!vectors_high()) {
map.virtual = 0;
map.type = MT_LOW_VECTORS;
create_mapping(&map);
}
5) calls mdesc->map_io to perform board-specific static I/O mappings:
/*
* Ask the machine support to map in the statically mapped devices.
*/
if (mdesc->map_io)
mdesc->map_io();
In our case this is the following function [arch/arm/mach-zr4230/board-zolo.c]:
static struct map_desc zr4230_io_desc[] __initdata = {
{
.virtual = ZR4230_INTERNAL_REG_BASE_VIRT,
.pfn = __phys_to_pfn(ZR4230_INTERNAL_REG_BASE_PHYS),
.length = ZR4230_INTERNAL_REG_SPACE_SIZE,
.type = MT_DEVICE
}
};
static void __init zr4230_zolo_map_io(void)
{
iotable_init(zr4230_io_desc, ARRAY_SIZE(zr4230_io_desc));
}
The function above is really just a hook that collects the static I/O mapping configuration of the
board port; the work is still done by iotable_init, which in turn calls create_mapping:
/*
* Create the architecture specific mappings
*/
void __init iotable_init(struct map_desc *io_desc, int nr)
{
int i;

for (i = 0; i < nr; i++)
create_mapping(io_desc + i);
}
6) Since the mapping tables have now changed completely, the TLB and caches must be flushed so that
these changes reach actual physical memory:
/*
* Finally flush the caches and tlb to ensure that we're in a
* consistent state wrt the writebuffer. This also ensures that
* any write-allocated cache lines in the vector page are written
* back. After this point, we can start to touch devices again.
*/
local_flush_tlb_all();
flush_cache_all();

All the steps above rely on create_mapping for the actual mapping, so the function deserves a closer
look:
1) it obtains the values needed for the mapping entries from the descriptor argument:
domain = mem_types[md->type].domain;
prot_pte = __pgprot(mem_types[md->type].prot_pte);
prot_l1 = mem_types[md->type].prot_l1 | PMD_DOMAIN(domain);
prot_sect = mem_types[md->type].prot_sect | PMD_DOMAIN(domain);
Here mem_types, introduced earlier, is the array describing the attributes of memory regions; indexing
it with the memory type yields the descriptor structure for that type, and hence the attribute values.
2) Following a "trim the head, trim the tail, map the middle" pattern, the region is mapped as pages,
sections, and supersections, via alloc_init_page, alloc_init_section and alloc_init_supersection
respectively; for example:
/* N.B. ARMv6 supersections are only defined to work with domain 0.
* Since domain assignments can in fact be arbitrary, the
* 'domain == 0' check below is required to insure that ARMv6
* supersections are only allocated for domain 0 regardless
* of the actual domain assignments in use.
*/
if ((cpu_architecture() >= CPU_ARCH_ARMv6 || cpu_is_xsc3())
&& domain == 0) {
/*
* Align to supersection boundary if !high pages.
* High pages have already been checked for proper
* alignment above and they will fail the SUPSERSECTION_MASK
* check because of the way the address is encoded into
* offset.
*/
if (md->pfn <= 0x100000) {
while ((virt & ~SUPERSECTION_MASK ||
(virt + off) & ~SUPERSECTION_MASK) &&
length >= (PGDIR_SIZE / 2)) {
alloc_init_section(virt, virt + off, prot_sect);

virt += (PGDIR_SIZE / 2);
length -= (PGDIR_SIZE / 2);
}
}

while (length >= SUPERSECTION_SIZE) {
alloc_init_supersection(virt, virt + off, prot_sect);

virt += SUPERSECTION_SIZE;
length -= SUPERSECTION_SIZE;
}
}
We will not examine the functions called here in detail; they simply fill in page directory and page
table entries, and the ARM documentation explains exactly how to fill them.
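The "trim the head, trim the tail, map the middle" chunking can be sketched as pure arithmetic. `split_mapping` below is illustrative: it only counts 4 KiB pages and 1 MiB sections (ignoring the 16 MiB supersection case) instead of writing PMD/PTE entries:

```c
#include <assert.h>

#define PAGE_SZ    0x1000UL     /* 4 KiB page */
#define SECTION_SZ 0x100000UL   /* 1 MiB section */

/* Split [virt, virt + len) into pages until section-aligned, then
 * sections, then trailing pages; report how many of each. */
static void split_mapping(unsigned long virt, unsigned long len,
                          int *pages, int *sections)
{
    *pages = *sections = 0;
    /* head: individual pages up to the next section boundary */
    while ((virt & (SECTION_SZ - 1)) && len >= PAGE_SZ) {
        virt += PAGE_SZ; len -= PAGE_SZ; (*pages)++;
    }
    /* middle: whole sections */
    while (len >= SECTION_SZ) {
        virt += SECTION_SZ; len -= SECTION_SZ; (*sections)++;
    }
    /* tail: leftover pages */
    while (len >= PAGE_SZ) {
        virt += PAGE_SZ; len -= PAGE_SZ; (*pages)++;
    }
}
```

Sections are preferred because one 1 MiB section entry in the first-level table replaces a whole second-level page table, saving memory and TLB entries.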

Good. With devicemaps_init inside paging_init analysed, the remaining work is simple:
top_pmd = pmd_off_k(0xffff0000);
Here top_pmd saves the page table pointer of the topmost section;
then a global zero page is allocated:
/*
* allocate the zero page. Note that we count on this going ok.
*/
zero_page = alloc_bootmem_low_pages(PAGE_SIZE);
memzero(zero_page, PAGE_SIZE);
empty_zero_page = virt_to_page(zero_page);
flush_dcache_page(empty_zero_page);

We can finally return to setup.c and analyse the next function setup_arch calls after paging_init,
namely request_standard_resources(&meminfo, mdesc).
The interesting part of this function is the following code:
kernel_code.start = virt_to_phys(&_text);
kernel_code.end = virt_to_phys(&_etext - 1);
kernel_data.start = virt_to_phys(&__data_start);
kernel_data.end = virt_to_phys(&_end - 1);

for (i = 0; i < mi->nr_banks; i++) {
unsigned long virt_start, virt_end;

if (mi->bank[i].size == 0)
continue;

virt_start = __phys_to_virt(mi->bank[i].start);
virt_end = virt_start + mi->bank[i].size - 1;

res = alloc_bootmem_low(sizeof(*res));
res->name = "System RAM";
res->start = __virt_to_phys(virt_start);
res->end = __virt_to_phys(virt_end);
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

request_resource(&iomem_resource, res);

if (kernel_code.start >= res->start &&
kernel_code.end <= res->end)
request_resource(res, &kernel_code);
if (kernel_data.start >= res->start &&
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
}
In effect, the system resources are organised into a tree via request_resource, which ultimately calls
the following [kernel/resource.c]:
/* Return the conflict entry if you can't request it */
static struct resource * __request_resource(struct resource *root, struct resource *new)
{
resource_size_t start = new->start;
resource_size_t end = new->end;
struct resource *tmp, **p;

if (end < start)
return root;
if (start < root->start)
return root;
if (end > root->end)
return root;
p = &root->child;
for (;;) {
tmp = *p;
if (!tmp || tmp->start > end) {
new->sibling = tmp;
*p = new;
new->parent = root;
return NULL;
}
p = &tmp->sibling;
if (tmp->end < start)
continue;
return tmp;
}
}
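Since __request_resource is self-contained pointer manipulation, it can be replayed in user space to watch the sorted-sibling insertion and the conflict-return convention. The struct below keeps only the fields the function touches, with plain unsigned long in place of resource_size_t:

```c
#include <assert.h>
#include <stddef.h>

struct resource {
    unsigned long start, end;
    const char *name;
    struct resource *parent, *sibling, *child;
};

/* Same logic as __request_resource: insert `new` among root's children,
 * keeping them sorted by start; return NULL on success or the
 * conflicting entry (root itself if `new` does not fit inside root). */
static struct resource *request_res(struct resource *root, struct resource *new)
{
    struct resource *tmp, **p;

    if (new->end < new->start || new->start < root->start || new->end > root->end)
        return root;
    p = &root->child;
    for (;;) {
        tmp = *p;
        if (!tmp || tmp->start > new->end) {
            new->sibling = tmp;     /* splice in before tmp */
            *p = new;
            new->parent = root;
            return NULL;
        }
        p = &tmp->sibling;
        if (tmp->end < new->start)  /* tmp is wholly before new: keep going */
            continue;
        return tmp;                 /* overlap: report the conflict */
    }
}
```

This is how "System RAM" becomes a child of iomem_resource and "Kernel code"/"Kernel data" become children of the RAM resource, while any overlapping request is rejected with a pointer to the conflicting node.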

After this, cpu_init initialises the stack pointers for the IRQ, ABT and UND modes; it does little else
of substance.
cpu_init();
/*
* Set up various architecture-specific pointers
*/
init_arch_irq = mdesc->init_irq;
system_timer = mdesc->timer;
init_machine = mdesc->init_machine;
Finally, a few pointers handed over from mdesc are saved; they will be used after setup_arch returns to
start_kernel.
