URL:Compiling GCC for MinGW & Cross-Compiling with Linux


Freescale MX31 bootloader Program : HAB Toolkit (repost)


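The MX31 contains an internal ROM, the boot ROM, whose code is pre-written by Freescale.
At boot, the boot ROM code reads the high/low state of the boot configuration pins to decide what to do:

Run the ROM loader
Boot from NOR flash (8/16 bit)
Boot from NAND flash (8/16 bit, 256/512 block)

The ROM loader can be used to download code and data into RAM.

The loader provides download interfaces over UART and USB.

Freescale has published the loader's protocol and commands.

Freescale provides a tool, the HAB Toolkit, for communicating with the ROM loader.

After the ROM loader starts, it initializes only the UART and USB peripherals. Before downloading, you must therefore issue extra commands to configure the DDR controller. These commands come from an init file.

The HAB Toolkit can also program data from RAM into flash. It does this by running a piece of ARM code on the MX31: the flash-programming commands are packaged and placed in RAM.
So when a designer changes the flash part, they can follow Freescale's instructions to implement the required flash commands and package them as a .bin file. The HAB Toolkit then downloads that file into RAM for the ROM loader to use.

The HAB Toolkit can also download and run code, but it always jumps to the download start address.

If the HAB Toolkit reports a DDR Error during download, the cause may be wrong DDR controller settings, an unpowered target board, an unplugged cable, or a similar problem.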
==============================================
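Thanks to the author for sharing. This article is reposted here for easy reference; if this causes any offense, please let me know by email!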

Main i.MX31 Development Resources (repost)


http://www.freescale.com/webapp/sps/site/overview.jsp?nodeId=0127260061033202A7

ISO images of the BSPs for the i.MX1, i.MX21, i.MX27 and i.MX31 ADS boards are all here. Registration is required, but the download is free.

http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=i.MX31&nodeId=0162468rH31143297336425774&fpsp=1&tab=Design_Tools_Tab

This page has essentially all of the official i.MX31 development documents.

http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=M9328MX31ADSE&nodeId=0162468rH31143297336425774&fpsp=1&tab=Design_Tools_Tab

This page has all of the official development documents for the MCIMX31ADSE.

http://www.codesourcery.com/gnu_toolchains/arm

A cross-compilation toolchain.

http://www.bitshrine.org/autodocs/bsp_ext_ava_imx31ads.html

A fairly complete set of i.MX31 kernel source patches, packaged as a development kit built with LTIB.

http://mx31.lbox.ca/download/

Source patches for the MCIMX31LITEKIT board.

http://www.developmentdevices.com/imx31-lpd/

Another set of source patches for the MCIMX31LITEKIT board.

Collected by 缥缈-九哥
yuanxihua@21cn.com
QQ:14131338

====================================================

http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=i.MX31&nodeId=01J4Fs2973ZrDR
From this site you can get the Linux BSP, but you must register to obtain it. It is a 455 MB ISO image.
=================================================================
http://www.logicpd.com/support/faq/
has an FAQ for the i.MX31 Lite kit.

URL:RealView Assembler User's Guide


ARM GCC Inline Assembler Cookbook

About this Document

The GNU C compiler for ARM RISC processors lets you embed assembly language code into C programs. This feature can be used to hand-optimize time-critical parts of the software, or to use specific processor instructions that are not available in the C language.

It is assumed that you are familiar with writing ARM assembler programs, because this is not an ARM assembler programming tutorial. It is not a C language tutorial either.

This document describes version 3.4 of the compiler.

GCC asm Statement

Let's start with a simple example of rotating bits. It takes the value of one integer variable, right rotates the bits by one and stores the result in a second integer variable.

```
asm("mov %0, %1, ror #1" : "=r" (result) : "r" (value));
```

Each asm statement is divided by colons into up to four parts:

The assembler instructions, defined as a single string constant:
```
"mov %0, %1, ror #1" 
```
A list of output operands, separated by commas. Our example uses just one:
```
"=r" (result)
```
A comma separated list of input operands. Again our example uses one operand only:
```
"r" (value)
```
Clobbered registers, left empty in our example.

You can write assembler instructions in much the same way as you would write assembler programs. However, registers and constants are used differently if they refer to expressions of your C program. The connection between registers and C operands is specified in the second and third parts of the asm instruction, the lists of output and input operands respectively. The general form is

```
asm(code : output operand list : input operand list : clobber list);
```

In the code section, operands are referenced by a percent sign followed by a single digit: %0 refers to the first operand, %1 to the second, and so forth. From the above example:

%0 refers to "=r" (result) and
%1 refers to "r" (value)

The last part of the asm instruction, the clobber list, is mainly used to tell the compiler about modifications done by the assembler code.

This may still look a little odd now, but the syntax of an operand list will be explained soon. Let us first examine the part of a compiler listing which may have been generated from our example:

```
00309DE5              ldr     r3, [sp, #0]    @ value, value
E330A0E1              mov     r3, r3, ror #1  @ tmp69, value
04308DE5              str     r3, [sp, #4]    @ tmp71, result
```

The compiler selected register r3 for the bit rotation, but it could have selected any other register. It might not explicitly load or store the value, and it may even decide not to include your assembler code at all. All these decisions are part of the compiler's optimization strategy. For example, if you never use the variable result in the remaining part of the C program, the compiler will most likely remove your code unless you switched off optimization.

You can add the volatile attribute to the asm statement to instruct the compiler not to optimize your assembler code.

```
asm volatile("mov %0, %1, ror #1" : "=r" (result) : "r" (value));
```
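In practice the rotate is usually wrapped in a small function. The sketch below uses a hypothetical name, ror1. It uses the asm statement only when compiling for 32-bit ARM in ARM state, and a plain C rotate otherwise, so the same source also builds and can be tested on other hosts. The keywords are spelled __asm__ and __volatile__, the aliases that also work in strict ANSI mode.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by one bit. */
static uint32_t ror1(uint32_t value)
{
    uint32_t result;
#if defined(__arm__) && !defined(__thumb__)
    /* ARM state: the asm statement from the text */
    __asm__ __volatile__("mov %0, %1, ror #1" : "=r" (result) : "r" (value));
#else
    /* Any other target: portable C rotate */
    result = (value >> 1) | (value << 31);
#endif
    return result;
}
```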

As with the clobber list in our example, trailing parts of the asm statement may be omitted if unused. The following statement provides only the code part and does nothing but consume CPU time. It is also known as a NOP (no operation) statement and is typically used for tiny delays.

```
asm volatile ("mov r0, r0");
```

If an unused part is followed by one which is used, it must be left empty. The following example uses an input, but no output value.

```
asm volatile ("msr cpsr, %0" : : "r" (status));
```

Even the code part may be left empty, though an empty string is required. The next statement specifies a special clobber to tell the compiler that memory contents may have changed.

```
asm volatile ("" : : : "memory");
```

With inline assembly you can use the same instruction mnemonics as you would use when writing pure ARM assembly code. You can also write more than one assembler instruction in a single inline asm statement. To keep it readable, put each instruction on a separate line.

```
asm volatile(
 "mov     r0, r0\n\t"
 "mov     r0, r0\n\t"
 "mov     r0, r0\n\t"
 "mov     r0, r0"
);
```

The linefeed and tab characters make the assembler listing generated by the compiler more readable. This may look a bit odd the first time, but it is how the compiler formats its own assembler code. Also note that eight characters are reserved for the assembler instruction mnemonic.

Input and Output Operands

Each input and output operand is described by a constraint string followed by a C expression in parentheses. For ARM processors, GCC 3.4 provides the following constraint characters.

Constraint	Used for	Range
f	Floating point registers
I	Immediate operands	8 bits, possibly shifted.
J	Indexing constants	-4095 .. 4095
K	Negated value in rhs	-4095 .. 4095
L	Negative value in rhs	-4095 .. 4095
M	For shifts.	0..32 or power of 2
r	General registers

Constraint characters may be prefixed by a single constraint modifier. Constraints without a modifier specify read-only operands. Modifiers are:

Modifier	Specifies
=	Write-only operand, usually used for all output operands.
+	Read-write operand (not supported by inline assembler)
&	Register should be used for output only

Output operands must be write-only, and the C expression must be an lvalue, which means that the operand must be valid on the left side of an assignment. Note that the compiler will not check whether the operands are of a reasonable type for the kind of operation used in the assembler instructions.

Input operands are, you guessed it, read-only. Never ever write to an input operand. But what if you need the same operand for input and output? As stated above, read-write operands are not supported in inline assembler code. But there is another solution.

For input operands it is possible to use a single digit in the constraint string. Using digit n tells the compiler to use the same register as for the n-th operand, counting from zero. Here is an example:

```
asm volatile("mov %0, %0, ror #1" : "=r" (value) : "0" (value));
```

This is similar to our initial example. It rotates the contents of the variable value to the right by one bit. In contrast to our first example, the result is not stored in another variable; instead, the original contents of the input variable are modified. Constraint "0" tells the compiler to use the same register for this input as for the first output operand.
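The sketch below applies the tied-operand form to a whole buffer. The "0" constraint makes GCC use one register for both input and output, so each element is rotated in place. ror1_all is a hypothetical name. On targets other than 32-bit ARM in ARM state, the #else branch does the same rotation in plain C.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate every element of buf right by one bit, in place. */
static void ror1_all(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = buf[i];
#if defined(__arm__) && !defined(__thumb__)
        /* "0": the input shares output operand 0's register */
        __asm__("mov %0, %0, ror #1" : "=r" (v) : "0" (v));
#else
        v = (v >> 1) | (v << 31);
#endif
        buf[i] = v;
    }
}
```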

Note however, that this doesn't automatically imply the reverse case. The compiler may choose the same registers for input and output, even if not told to do so. In our initial example it did indeed choose the same register r3.

This is not a problem in most cases, but it may be fatal if the assembler code modifies the output operand before it uses an input operand. If your code depends on different registers being used for input and output operands, you must add the & constraint modifier to your output operand. The following example demonstrates this problem.

```
asm volatile("ldr     %0, [%1]"         "\n\t"
             "str     %2, [%1, #4]"     "\n\t"
             : "=&r" (rdv)
             : "r" (&table), "r" (wdv)
             : "memory"
            );
```

In this example a value is read from a table, and then another value is written to another location in this table. Suppose the compiler had chosen the output register to be the same as one of the inputs. The first instruction would then overwrite that input (the table address or the value to be written) before the second instruction uses it. Fortunately, this example uses the & constraint modifier. It tells the compiler not to assign the output value to any register that holds an input operand.
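The same read/write pair can be written as a function with a portable fallback. The sketch uses a hypothetical name, table_read_write. It reads table[0], stores wdv into table[1], and returns the old table[0]. The early-clobber "=&r" is what keeps rdv out of the registers holding table and wdv.

```c
#include <stdint.h>

static uint32_t table_read_write(uint32_t *table, uint32_t wdv)
{
    uint32_t rdv;
#if defined(__arm__) && !defined(__thumb__)
    /* rdv is written by ldr before str reads table and wdv, hence "=&r" */
    __asm__ __volatile__("ldr     %0, [%1]\n\t"
                         "str     %2, [%1, #4]"
                         : "=&r" (rdv)
                         : "r" (table), "r" (wdv)
                         : "memory");
#else
    rdv = table[0];
    table[1] = wdv;
#endif
    return rdv;
}
```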

Clobbers

If you use registers that have not been passed as operands, you need to inform the compiler. The following code rounds a value up to a multiple of four. It uses r3 as a scratch register and tells the compiler so by listing r3 in the clobber list. The ands instruction also modifies the CPU status flags, so adding the pseudo register cc to the clobber list informs the compiler of that change as well.

```
asm volatile("ands    r3, %1, #3"     "\n\t"
             "eor     %0, %0, r3"     "\n\t"
             "addne   %0, %0, #4"
             : "=r" (len)
             : "0" (len)
             : "cc", "r3"
            );
```
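Here is the same rounding as a function. It is a sketch with a hypothetical name, round_up4, and a C fallback for non-ARM hosts. The asm clears the low two bits with eor, then adds 4 only if ands found any of them set, which is (len + 3) & ~3 in C.

```c
static unsigned long round_up4(unsigned long len)
{
#if defined(__arm__) && !defined(__thumb__)
    __asm__ __volatile__("ands    r3, %1, #3\n\t"   /* r3 = len & 3, sets flags */
                         "eor     %0, %0, r3\n\t"   /* clear the low two bits */
                         "addne   %0, %0, #4"       /* add 4 if any were set */
                         : "=r" (len)
                         : "0" (len)
                         : "cc", "r3");
#else
    len = (len + 3u) & ~3ul;
#endif
    return len;
}
```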

Our previous example, which stored a value in a table

```
asm volatile("ldr     %0, [%1]"         "\n\t"
             "str     %2, [%1, #4]"     "\n\t"
             : "=&r" (rdv)
             : "r" (&table), "r" (wdv)
             : "memory"
            );
```

uses another so-called pseudo register named "memory" in the clobber list. This special clobber informs the compiler that the assembler code may modify any memory location. Before executing the assembler code, the compiler must write back every variable whose contents are currently held in a register. And of course, everything has to be reloaded after this code.

Assembler Macros

To reuse your assembler language parts, it is useful to define them as macros and put them into include files. Nut/OS comes with some of these; they can be found in its include subdirectory. Such include files may produce compiler warnings when used in modules compiled in strict ANSI mode. To avoid that, you can write __asm__ instead of asm and __volatile__ instead of volatile. These are equivalent aliases.

C Stub Functions

Macro definitions will include the same assembler code whenever they are referenced. This may not be acceptable for larger routines. In this case you may define a C stub function, containing nothing other than your assembler code.

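```
unsigned long htonl(unsigned long val)
{
   asm volatile ("eor r3, %1, %1, ror #16\n\t"   /* r3 = val ^ ror(val, 16)  */
                 "bic r3, r3, #0x00FF0000\n\t"   /* clear bits 16-23 of r3   */
                 "mov %0, %1, ror #8\n\t"        /* result = ror(val, 8)     */
                 "eor %0, %0, r3, lsr #8"        /* patch bytes 2 and 0      */
                 : "=r" (val)
                 : "0"(val)
                 : "r3"
       );
   return val;
}
```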

The purpose of this function is to swap all bytes of an unsigned 32-bit value. In other words, it converts a big-endian value to little-endian or vice versa.
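The sketch below is the same byte swap under a different, hypothetical name, swap32, chosen so it does not clash with the C library's htonl. On 32-bit ARM in ARM state it uses the four instructions above; elsewhere it falls back to shifts and masks. GCC 4.3 and later also provide __builtin_bswap32, which is usually the simpler choice.

```c
#include <stdint.h>

static uint32_t swap32(uint32_t val)
{
#if defined(__arm__) && !defined(__thumb__)
    __asm__ __volatile__("eor r3, %1, %1, ror #16\n\t"
                         "bic r3, r3, #0x00FF0000\n\t"
                         "mov %0, %1, ror #8\n\t"
                         "eor %0, %0, r3, lsr #8"
                         : "=r" (val)
                         : "0" (val)
                         : "r3");
#else
    /* move each byte to the mirrored position */
    val = (val >> 24) | ((val >> 8) & 0x0000FF00u) |
          ((val << 8) & 0x00FF0000u) | (val << 24);
#endif
    return val;
}
```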

C Names Used in Assembler Code

By default GCC uses the same symbolic names of functions or variables in C and assembler code. You can specify a different name for the assembler code by using a special form of the asm statement:

```
unsigned long value asm("clock") = 3686400;
```

This statement instructs the compiler to use the symbol name clock rather than value. This makes sense only for external or static variables, because local variables do not have symbolic names in the assembler code. However, local variables may be held in registers.

With GCC you can further demand the use of a specific register:

```
void Count(void) {
   register unsigned char counter asm("r3");

   /* ... some code ... */
   asm volatile("eor r3, r3, r3");
   /* ... more code ... */
}
```

The assembler instruction "eor r3, r3, r3" will clear the variable counter. Be warned that this sample is bad in most situations, because it interferes with the compiler's optimizer. Furthermore, GCC will not completely reserve the specified register: if the optimizer recognizes that the variable will not be referenced any longer, the register may be re-used. The compiler is also unable to check whether this register usage conflicts with any predefined register. If you reserve too many registers in this way, the compiler may even run out of registers during code generation.

In order to change the name of a function, you need a prototype declaration, because the compiler will not accept the asm keyword in the function definition:

```
extern long Calc(void) asm ("CALCULATE");
```

Calling the function Calc() will create assembler instructions to call the function CALCULATE.
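The sketch below combines both forms of asm label. The symbol name cpu_clock_hz and the body of Calc are hypothetical. C code keeps using value and Calc(), while the object file exports cpu_clock_hz and CALCULATE, which you can confirm with nm. The label is spelled __asm__ so the block also compiles in strict ANSI mode.

```c
/* C name: value; assembler/linker name: cpu_clock_hz */
unsigned long value __asm__("cpu_clock_hz") = 3686400;

/* The label goes on a prototype; the definition below then uses it. */
extern long Calc(void) __asm__("CALCULATE");

long Calc(void)
{
    return (long)(value / 1000);   /* clock in kHz */
}
```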

Register Usage

Typically the following registers are used by the compiler for specific purposes.

Register	Alt. Name	Usage
r0	a1	First function argument; integer function result; scratch register
r1	a2	Second function argument; scratch register
r2	a3	Third function argument; scratch register
r3	a4	Fourth function argument; scratch register
r4	v1	Register variable
r5	v2	Register variable
r6	v3	Register variable
r7	v4	Register variable
r8	v5	Register variable
r9	v6 rfp	Register variable; real frame pointer
r10	sl	Stack limit
r11	fp	Frame pointer
r12	ip	Temporary workspace
r13	sp	Stack pointer
r14	lr	Link register; workspace
r15	pc	Program counter

Links

For a more thorough discussion of inline assembly usage, see the GCC manual. The latest version of the GCC manual is always available here:
http://gcc.gnu.org/onlinedocs/

Pin: A tool for dynamic instrumentation of programs

Purpose. Pin is a tool for the dynamic instrumentation of programs. It supports Linux binary executables for Intel (R) Xscale (R), IA-32, Intel64 (64 bit x86), and Itanium (R) processors; Windows executables for IA-32 and Intel64; and MacOS executables for IA-32. Pin was designed to provide functionality similar to the popular ATOM toolkit for Compaq's Tru64 Unix on Alpha, i.e. arbitrary code (written in C or C++) can be injected at arbitrary places in the executable. Unlike ATOM, Pin does not instrument an executable statically by rewriting it, but rather adds the code dynamically while the executable is running. This also makes it possible to attach Pin to an already running process.

The API. Pin provides a rich API that abstracts away the underlying instruction set idiosyncrasies and allows context information such as register contents to be passed to the injected code as parameters. Pin automatically saves and restores the registers that are overwritten by the injected code so the application continues to work. Limited access to symbol and debug information is available as well.

See the Pin Wiki Page

MIPS Architecture Study Notes (repost)


张驿风

20060905

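I have recently been studying the MIPS architecture. I read many good articles on MIPS on the System Computer Research Institute website, and the notes below are excerpts from them.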

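1: Remembering the MIPS register aliases
While studying the MIPS CPU architecture, I had long been puzzled by the conventional aliases of the 32 MIPS registers. Today I read an article on the System Computer Research Institute website (http://www.xtrj.org/) that explains them, and it suddenly became clear: the v, a and t prefixes are simply abbreviations of English words. (Ha, I could not find any helpful explanation in the <> book before.)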

;REGISTER NAME USAGE
$0 $zero constant value 0
$2-$3 $v0-$v1 function return values (values for results and expression evaluation)
$4-$7 $a0-$a3 function call arguments
$8-$15 $t0-$t7 temporaries (free to use)
$16-$23 $s0-$s7 saved (must be saved/restored if used)
$24-$25 $t8-$t9 temporaries (free to use)
$28 $gp global pointer
$29 $sp stack pointer
$30 $fp frame pointer
$31 $ra return address

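2: MIPS address space layout
MIPS divides the 32-bit address space into 4 segments: kuseg, kseg0, kseg1 and kseg2.
1. 0xC000 0000 to 0xFFFF FFFF: kseg2, 1 GB, mapped
2. 0xA000 0000 to 0xBFFF FFFF: kseg1, 512 MB, unmapped, uncached
3. 0x8000 0000 to 0x9FFF FFFF: kseg0, 512 MB, unmapped, cached
4. 0x0000 0000 to 0x7FFF FFFF: kuseg, 2 GB, mapped

kseg1 is the only segment that is both unmapped (no virtual-to-physical translation through the TLB) and uncached. It is therefore the only region the CPU can boot from, and the boot flash is placed there. kseg0 can be used only after the caches have been initialized. The mapped segments (kuseg, kseg2) can be used only after the MMU's TLB has been initialized.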
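The following is a minimal sketch of the segment map above, assuming the 32-bit MIPS address space. It classifies a virtual address and, for the unmapped kseg0/kseg1 windows, computes the physical address by keeping the low 29 bits. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum mips_seg { KUSEG, KSEG0, KSEG1, KSEG2 };

/* Which segment of the 32-bit MIPS address map va falls in. */
static enum mips_seg mips_segment(uint32_t va)
{
    if (va < 0x80000000u) return KUSEG;   /* 2 GB, mapped through the TLB */
    if (va < 0xA0000000u) return KSEG0;   /* 512 MB, unmapped, cached */
    if (va < 0xC0000000u) return KSEG1;   /* 512 MB, unmapped, uncached */
    return KSEG2;                         /* 1 GB, mapped through the TLB */
}

/* kseg0 and kseg1 are fixed windows onto physical 0..512 MB.
 * Mapped segments need a TLB lookup, so -1 is returned for them. */
static int64_t kseg_to_phys(uint32_t va)
{
    enum mips_seg s = mips_segment(va);
    if (s == KSEG0 || s == KSEG1)
        return (int64_t)(va & 0x1FFFFFFFu);
    return -1;
}
```

For example, the MIPS reset vector 0xBFC00000 falls in kseg1 and corresponds to physical address 0x1FC00000, where the boot flash is decoded.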

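3: The MIPS CPU has three operating modes
1. User Mode.
2. Supervisor Mode.
3. Kernel Mode.
For simplicity, let's just talk about User Mode and Kernel Mode.
Please always keep this in mind:
The CPU can ONLY access the kuseg memory area when running in User Mode.
The CPU MUST be in Kernel Mode to access the kseg0, kseg1 and kseg2 memory areas (Supervisor Mode can reach only part of kseg2).
As you can see, the MIPS CPU modes are basically the same as those of x86 and especially ARM. User-level code has restricted access to the physical address space, one of the advances of modern operating systems. Code that needs such access must run in kernel mode, for example as a driver.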

4: MMU TLB
The MIPS CPU uses the TLB to translate all virtual addresses generated by the CPU.
Next, the ASID (Address Space Identifier). Basically, the ASID plus the VA (virtual address) form the primary key of a TLB entry. In other words, a virtual address alone does not uniquely identify a TLB entry. Generally, the ASID value is the corresponding process ID. Note that the ASID can minimize TLB reloads, since several TLB entries can have the same virtual page number but different ASIDs. In a multitasking operating system, every task has its own 4 GB virtual address space.

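5: MMU control registers
For a kernel engineer, the MMU is handled mainly through its control registers. The MIPS architecture integrates a unit called the System Control Coprocessor (CP0), which is what we usually call the MMU controller (it also holds other system control registers). Besides the TLB entries (for example, the RM5200 has 48 pairs, i.e. 96 TLB entries), CP0 provides control registers the OS kernel uses to control the MMU's behavior.
Each CP0 control register has a unique register number, and MIPS provides special instructions to operate on CP0:
mfc0 reg, CP0_REG
mtc0 reg, CP0_REG
mtc0 copies the value of a general-purpose register (GPR) into a CP0 register, which is how we control the MMU. mfc0 copies a CP0 register back into a GPR.
Below are a few TLB-related CP0 control registers.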
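Index Register
This register selects the TLB entry to be read or written. For example, with 48 TLB pairs (as on the RM5200 mentioned above), the Index register ranges from 0 to 47. In other words, each TLB write operates on a whole pair, unlike TLB reads and writes on other CPUs' MMUs.
EntryLo0, EntryLo1
These two registers specify the even and odd physical page addresses of a TLB pair.
Be sure to note:
EntryLo0 is used for even pages; EntryLo1 is used for odd pages.
Otherwise, the MMU will raise an exception.
Entry Hi
The EntryHi register holds VPN2, the virtual address part of a TLB entry. Note that the ASID value is also held here.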
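Page Mask
The MIPS TLB supports variable-size address mappings: a page can be 4K, 16K, 64K, 256K, 1M, 4M or 16M. Variable page sizes give good flexibility, especially for embedded system software.
A big difference of embedded system software is that it cannot tolerate large numbers of page faults. This is the fatal weakness of traditional, general-purpose OSes when used as embedded OSes, and it is what POSIX.1b sets out to address. A principle of traditional OS memory management is "page on demand", which most embedded systems cannot allow. An embedded system usually configures all of its memory during system initialization, so that no page fault can occur while the system runs.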
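To make the variable page sizes concrete, here is a sketch that computes the PageMask value for a given page size. It assumes the R4000-style layout, where the mask occupies bits 24:13, so a 4 KB page has mask 0 and each 4x larger size sets two more bits. The helper name is hypothetical; check your CPU manual for the exact layout.

```c
#include <stdint.h>

/* PageMask for a page size in bytes (4 KB, 16 KB, ..., 16 MB).
 * Returns UINT32_MAX for sizes the TLB does not support. */
static uint32_t pagemask_for(uint32_t size)
{
    int pow2 = size != 0 && (size & (size - 1u)) == 0;
    /* supported sizes are 4^k * 4 KB: an even power of two in 2^12..2^24 */
    if (!pow2 || size < 0x1000u || size > 0x1000000u || !(size & 0x55555555u))
        return UINT32_MAX;
    return ((size >> 12) - 1u) << 13;
}
```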

Besides mapping a virtual page, these registers also set the page's attributes, including:
writable or not; valid or not; cache write-back or write-through

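Next, a brief word about the MIPS JTLB.

MIPS CPUs such as the R5000 provide a JTLB, which stands for Joint TLB. This means the TLB holds mixed instruction and data mappings, whereas some CPUs keep separate instruction TLB and data TLB buffers.
The MIPS R5000 does also have two small, separate instruction and data TLBs. They are very small, exist mainly for performance, and are transparent to system software.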
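Now let's look at the relationship between the MMU TLB and the CPU's level 1 cache.
The level 1 caches of MIPS (and of most CPUs) are virtually indexed and physically tagged. With this scheme, the OS does not need to flush the cache on every process switch.
Why?
Take an example:
Virtual address Addr1 of process A maps to physical address PA1;
Virtual address Addr1 of process B maps to physical address PA2;
At some moment process A is running, and Addr1 is in the level 1 cache.
Then the OS does a context switch and brings process B up, putting process A to sleep.
Now, let's assume that the first instruction/data fetch process B does is to access its own virtual address Addr1.
Will the CPU wrongly return process A's cached data for Addr1 (from PA1)?
No, it will not.
The reason is:
On a process switch, the OS writes process B's ASID (or PID) into the ASID register. Remember: for TLB access, (ASID + VPN) is the primary key. Because the MIPS cache is virtually indexed and physically tagged, every access makes the CPU issue a TLB translation request to the MMU. The resulting physical address is then used for level 1 cache matching.
At the same time, the CPU passes the virtual address to the level 1 cache controller. We then have to wait for the physical address from the MMU. Only when the physical tag also matches do we have a cache hit.
So we do not need to worry about different processes using the same virtual address.
Review the direct-mapped, fully associative and set-associative schemes we discussed earlier. They show why the cache can hold several entries with the same virtual address, for example the Addr1 of process A and the Addr1 of process B above.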
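The Addr1 example can be reproduced with a toy software model of the joint TLB. Entries are keyed by (VPN2, ASID), and each entry maps an even/odd pair of pages, as with EntryLo0/EntryLo1. This is a sketch that assumes 4 KB pages and ignores the global bit, page masks and attribute bits. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct tlb_entry {
    uint32_t vpn2;     /* va >> 13: one entry covers two 4 KB pages */
    uint8_t  asid;     /* address space identifier */
    uint32_t pfn[2];   /* [0] even page (EntryLo0), [1] odd page (EntryLo1) */
    int      valid;
};

/* Returns 1 and stores the physical address on a hit, 0 on a miss
 * (a real CPU would raise a TLB refill exception instead). */
static int tlb_lookup(const struct tlb_entry *tlb, size_t n,
                      uint8_t asid, uint32_t va, uint32_t *pa)
{
    uint32_t vpn2 = va >> 13;
    unsigned odd = (va >> 12) & 1u;          /* bit 12 selects even/odd */
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].vpn2 == vpn2 && tlb[i].asid == asid) {
            *pa = (tlb[i].pfn[odd] << 12) | (va & 0xFFFu);
            return 1;
        }
    }
    return 0;
}
```

Two processes can use the same virtual address 0x00400010 with ASIDs 1 and 2 and still reach different physical pages. A lookup with an ASID that has no entry misses.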

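System Computing Research Institute

The System Computing Research Institute site has some good material, including <MIPS体系结构剖析，编程与实践> (MIPS Architecture: Analysis, Programming and Practice)
and <PowerPC and Linux Kernel Inside>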

Two Emails about ARM VFP

Below are two emails about ARM VFP:
===========================================================
First: VFPv3 Compiler Option
===========================================================
Amit Mandil wrote:
> Hi,
>
> I am using GNU ARM Toolchain for GNU/Linux as host and target both.
> I have seen two compiler options available for VFP instructions in its
> source- vfp and vfp3.
>
> I am using -mfpu=vfp3 for vfpv3 instructions. Am I right or it is meant
> for something else? Please reply!!

That should be fine. Note that you'll also need "-mfloat-abi=softfp" to
get the compiler to emit VFP instructions.

Julian
===========================================================
Second:
[arm-gnu] Best library for transcendental functions (sin, cos) for ARM with VFP
===========================================================
On Fri, Jun 15, 2007 at 12:24:00PM -0400, Michael Bergandi wrote:
> I am working on an embedded application that makes extensive use of
> floating point and transcendental functions (ie. sin, cos). The
> target is an arm1136jf-s, which has a VFP unit. The application is
> being compiled with your tool chain (arm-2006q3) using:
>
> CFLAGS = -mcpu=arm1136jf-s -mfpu=vfp -mfloat-abi=softfp -static
> -Wall -O3

The use of -mfpu=vfp is only needed for older toolchains. The
(-mfloat-abi=softfp) will enable the use of the VFP.

> The problem is that performing any transcendentals are extremely
> slow. From my research, I know that the transcendentals are being
> trapped and pushed up to the kernel to handle in software. I'm fine
> with that. However, the implementation in software is done using
> Taylor series approximations that contain mostly floating point
> multiplication and can certainly be accelerated in the VFP.
> Unfortunately, it does not seem that this is being done.

Transcendentals are not trapped and pushed up to the kernel, they are
computed in userspace by the glibc libm functions. CodeSourcery
currently provides a software floating point math library
(-mfloat-abi=soft), and this means that the VFP is not used to
accelerate transcendentals. CodeSourcery does have plans for enabling
accelerated math functions.

> My application is running on top of a 2.6.19 kernel that was
> generated using the Freescale Linux Target Image Builder package
> (that also uses your cross-compiler). The kernel was compiled using
> the following flags:
>
> CFLAGS = -march=armv6 -mtune=arm1136j-s -O2 -fsigned-char
> -mabi=aapcs-linux -mfpu=vfp -mfloat-abi=softfp

This means your kernel uses VFP instructions, but not necessarily
userspace, and not the provided libm.

> Do you have any suggestions as to how I might improve this
> performance problem?
> -- Is the math library in one library package faster than another?
> -- Can I rebuild the a library with the VFP enabling compiler flags?
> Would that help?

If you need fast transcendental functions I recommend implementing
them yourself or using a 3rd party library and compiling with
(-mfloat-abi=softfp).

Cheers,
Carlos.
--
Carlos O'Donell
CodeSourcery
carlos@codesourcery.com
(650) 331-3385 x716
===========================================================

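Advanced: The Ultimate Guide to Memory Technology (Complete/Advanced Edition)

The Storage Era (存储时代) site has a fairly good introductory article on memory technology under this title.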

Key Parts of the ARM S3C2410 Hardware Manual (reposted)

[The article below is well written]

a.Memory Controller
b.Nand Flash
c.UART
d.Interrupt
e.Timer

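Memory Controller
SDRAM:
The S3C2410 provides interfaces for external ROM, SRAM, SDRAM, NOR flash and NAND flash. Its external memory space is divided into 8 banks of 128 MB each. Accessing BANKx (x from 0 to 7) means accessing the address range x*128M to (x+1)*128M-1.
The SDRAM uses BANK6, whose physical start address is 6*128M = 0x30000000.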

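SDRAM (refresh):
It is called DRAM (dynamic RAM) because it must be refreshed continuously to retain its data, which makes refresh the most important DRAM operation.
How often must the refresh be repeated? The accepted standard is that the data in a storage cell's capacitor stays valid for at most 64 ms, so each row must be refreshed once every 64 ms. The refresh rate is therefore: number of rows / 64 ms.
There are two kinds of refresh: Auto Refresh (AR) and Self Refresh (SR). Neither needs a row address from outside, because refresh is an internal, automatic operation. For AR, the SDRAM contains a row address generator (also called a refresh counter) that generates the row addresses in sequence.
Refresh involves all L-Banks (logical banks), so all of them stop working while it runs. Each refresh takes 9 clock cycles (PC133 standard), after which normal operation can resume. During those 9 clock cycles, all commands must wait and cannot be executed.
SR is mainly used to retain data in low-power sleep states; the best-known application is STR (Suspend to RAM). If CKE is driven inactive when the AR command is issued, the SDRAM enters SR mode. It then no longer runs from the system clock but refreshes from its internal clock.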

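SDRAM (register settings):
This experiment shows how to use the SDRAM, which requires setting 13 registers. Since we only use BANK6, we can ignore most of them:
1. BWSCON: covers BANK0-BANK7, with 4 bits per bank. The 4 bits are:
a. STx: enables/disables the SDRAM data mask pins; 0 for SDRAM, 1 for SRAM.
b. WSx: whether the memory's WAIT signal is used; usually 0.
c. DWx: two bits setting the memory data width: 00 = 8-bit, 01 = 16-bit, 10 = 32-bit, 11 = reserved.
d. The 4 bits for BANK0 are special: they are set by hardware jumpers and are read-only.
This board combines two 32 MB, 16-bit-wide SDRAM chips into a 64 MB, 32-bit-wide memory, so the corresponding BWSCON bits are 0010. BWSCON can be set to 0x22111110. In fact only the 4 bits for BANK6 need to be 0010, and the other values do not matter here; this value is the one given in the manual.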
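2. BANKCON0-BANKCON5: not used; keep the default value 0x00000700.
3. BANKCON6-BANKCON7: set to 0x00018005. Of the 8 banks, only BANK6 and BANK7 can use SRAM or SDRAM, so BANKCON6-7 differ somewhat from BANKCON0-5:
a. MT ([16:15]): selects whether the bank is connected to SRAM or SDRAM: SRAM = 0b00, SDRAM = 0b11.
b. When MT = 0b11, two more parameters must be set:
Trcd ([3:2]): RAS to CAS delay; set to the recommended value 0b01.
SCAN ([1:0]): number of SDRAM column address bits. The HY57V561620CT-H SDRAM on this board has 9 column address bits, so SCAN = 0b01. If you use a different SDRAM, check its datasheet to choose SCAN: 00 = 8 bits, 01 = 9 bits, 10 = 10 bits.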
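4. REFRESH (SDRAM refresh control register): set to 0x008e0000 + R_CNT. R_CNT controls the SDRAM refresh period and occupies bits [10:0] of REFRESH. It is calculated as follows (the SDRAM clock frequency is HCLK):
R_CNT = 2^11 + 1 - SDRAM clock frequency (MHz) * SDRAM refresh period (µs)
Without the PLL, the SDRAM clock frequency equals the crystal frequency, 12 MHz.
The refresh period is given in the SDRAM datasheet. The datasheet of the HY57V561620CT-H on this board states "8192 refresh cycles / 64ms", so the refresh period = 64 ms / 8192 = 7.8125 µs.
For this experiment, R_CNT = 2^11 + 1 - 12 * 7.8125 = 1955.25, truncated to 1955, and REFRESH = 0x008e0000 + 1955 = 0x008e07a3.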
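The R_CNT arithmetic can be checked with a small C sketch. The helper names are hypothetical. The formula and the 0x008e0000 base value are the ones given above, and the result is truncated to an integer as in the worked example.

```c
#include <stdint.h>

/* R_CNT = 2^11 + 1 - HCLK(MHz) * refresh period(us), truncated */
static uint32_t sdram_rcnt(double hclk_mhz, double period_us)
{
    return (uint32_t)(2048.0 + 1.0 - hclk_mhz * period_us);
}

/* REFRESH = 0x008e0000 + R_CNT, with R_CNT in bits [10:0] */
static uint32_t sdram_refresh_reg(double hclk_mhz, double period_us)
{
    return 0x008e0000u | (sdram_rcnt(hclk_mhz, period_us) & 0x7FFu);
}
```

The refresh period is passed as 64000.0 / 8192 µs, from the "8192 refresh cycles / 64ms" datasheet line.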
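5. BANKSIZE: 0x000000b2
Bit [7] = 1: enable burst operation
Bit [5] = 1: SDRAM power-down mode enable
Bit [4] = 1: SCLK is active only during the access (recommended)
Bits [2:0] = 010: BANK6 and BANK7 are mapped differently from BANK0-5. Each of BANK0-5 has a fixed 128 MB address space, from (x*128M) to (x+1)*128M-1, where x is 0 to 5. The start address of BANK7, however, is variable. See "Table 5-1. Bank 6/7 Addresses" in chapter 5 of the S3C2410 datasheet for how the BANK6/7 address ranges depend on the configured size.
This board uses only 64 MB of BANK6, so bits [2:0] can be 010 (128M/128M) or 001 (64M/64M). Either works: software detects the extra space, so nonexistent memory is never used. The bootloader and Linux kernel described later both probe the memory. Bits [6] and [3] are unused.
6. MRSRB6, MRSRB7: 0x00000030
Only bits [6:4] (CL, the CAS latency) may be changed. The HY57V561620CT-H does not support CL = 1, so bits [6:4] are 010 (CL = 2) or 011 (CL = 3); 0x30 selects CL = 3.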
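The 13 values above are usually written with a simple loop over consecutive registers. This sketch assumes the registers are consecutive 32-bit words starting at BWSCON (0x48000000 on the S3C2410 in the usual BSP headers; verify against the datasheet). BANKCON6/7 is built from its MT, Trcd and SCAN fields to show where 0x00018005 comes from. sdram_cfg and sdram_init are hypothetical names.

```c
#include <stdint.h>

/* Register order: BWSCON, BANKCON0-7, REFRESH, BANKSIZE, MRSRB6, MRSRB7 */
static const uint32_t sdram_cfg[13] = {
    0x22111110u,                               /* BWSCON: BANK6 = 0010 */
    0x00000700u, 0x00000700u, 0x00000700u,     /* BANKCON0-2: default */
    0x00000700u, 0x00000700u, 0x00000700u,     /* BANKCON3-5: default */
    (3u << 15) | (1u << 2) | 1u,               /* BANKCON6: MT=11, Trcd=01, SCAN=01 */
    (3u << 15) | (1u << 2) | 1u,               /* BANKCON7: same */
    0x008e07a3u,                               /* REFRESH: R_CNT = 1955 */
    0x000000b2u,                               /* BANKSIZE: 128M/128M */
    0x00000030u,                               /* MRSRB6: CL = 3 */
    0x00000030u                                /* MRSRB7: CL = 3 */
};

/* On the board: sdram_init((volatile uint32_t *)0x48000000); (assumed address) */
static void sdram_init(volatile uint32_t *base)
{
    for (int i = 0; i < 13; i++)
        base[i] = sdram_cfg[i];
}
```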

Nand Flash

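When OM1 and OM0 are both low (that is, when the BOOT SEL jumper is fitted on the board), the S3C2410 boots from NAND flash. The first 4 KB of the NAND flash is automatically copied into the internal SRAM, and that 4 KB of code must then load the rest of the code from NAND flash into SDRAM.
NAND flash is operated through six registers: NFCONF, NFCMD, NFADDR, NFDATA, NFSTAT and NFECC. Before continuing, open the S3C2410 datasheet and the datasheet of the NAND flash K9F1208U0M.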

Page 218 of the S3C2410 datasheet gives the sequence for reading and writing NAND flash:
1.Set NAND flash configuration by NFCONF register.
2.Write NAND flash command onto NFCMD register.
3.Write NAND flash address onto NFADDR register.
4.Read/Write data while checking NAND flash status by NFSTAT register. R/nB signal should be checked before read operation or after program operation.

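1. NFCONF: set to 0xf830. This
enables the NAND flash controller, initializes the ECC, and sets the NAND flash chip enable nFCE = 1 (inactive; it is driven to 0 only when the chip is actually accessed)

and sets TACLS, TWRPH0 and TWRPH1.
As shown on page 218 of the S3C2410 datasheet, these three parameters control the timing of the NAND flash signals CLE/ALE relative to the write strobe nWE.
We set TACLS = 0, TWRPH0 = 3, TWRPH1 = 0, which means TACLS = 1 HCLK cycle, TWRPH0 = 4 HCLK cycles and TWRPH1 = 1 HCLK cycle.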

Page 13 of the K9F1208U0M datasheet, in the table "AC Timing Characteristics for Command / Address / Data Input", gives: CLE setup time = 0 ns, CLE hold time = 10 ns, ALE setup time = 0 ns, ALE hold time = 10 ns, WE pulse width = 25 ns. Even with HCLK = 100 MHz, TACLS + TWRPH0 + TWRPH1 = 6/100 µs = 60 ns, which still meets the K9F1208U0M timing requirements.
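The timing check above can be written as two small C helpers. They assume that each of TACLS, TWRPH0 and TWRPH1 lasts (setting + 1) HCLK cycles, as stated, and that nWE is held low during the TWRPH0 phase. The names are hypothetical.

```c
/* Whole CLE/ALE write cycle in ns: each phase lasts (setting + 1) HCLKs. */
static unsigned nand_cycle_ns(unsigned hclk_mhz, unsigned tacls,
                              unsigned twrph0, unsigned twrph1)
{
    unsigned cycles = (tacls + 1) + (twrph0 + 1) + (twrph1 + 1);
    return cycles * 1000u / hclk_mhz;
}

/* Assumed: nWE is low for the TWRPH0 phase. */
static unsigned nand_we_pulse_ns(unsigned hclk_mhz, unsigned twrph0)
{
    return (twrph0 + 1) * 1000u / hclk_mhz;
}
```

At HCLK = 100 MHz the whole cycle is 60 ns and the nWE pulse is 40 ns, above the 25 ns WE pulse width the K9F1208U0M requires.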

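2. NFCMD: command codes generally differ between flash models. For the K9F1208U0M used on this board, see "Table 1. Command Sets" on page 8 of its datasheet.

3. NFADDR: address

4. NFDATA: data; only the low 8 bits are used

5. NFSTAT: status; only bit 0 is used: 0 = busy, 1 = ready

6. NFECC: ECC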

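Now let's see how to read data from NAND flash:
1. NFCONF = 0xf830
2. Before the first NAND flash operation, it is usual to reset the chip:
NFCONF &= ~0x800 (enable the NAND flash chip)
NFCMD = 0xff (reset command)
Poll bit 0 of NFSTAT until it reads 1
3. NFCMD = 0 (read command)
4. This step needs some care. The table on page 7 of the K9F1208U0M datasheet lists the address lines sent in each of the 4 address cycles. A8 is not sent, because the read command itself selects the first (00h) or second (01h) half of the page:
NFADDR = addr & 0xff
NFADDR = (addr>>9) & 0xff (note: shift right by 9 bits, not 8)
NFADDR = (addr>>17) & 0xff (shift by 17, not 16)
NFADDR = (addr>>25) & 0xff (shift by 25, not 24)
5. Poll bit 0 of NFSTAT until it reads 1
6. Read NFDATA 512 times in a row to get one page of data (512 bytes)
7. NFCONF |= 0x800 (disable the NAND flash chip)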
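Steps 1 to 7 map directly onto C. The sketch below passes the registers in through a small struct of pointers. This design is hypothetical; it lets the sequence run against plain variables off-target. On the board, each pointer would be set to the register's memory-mapped address from the datasheet, and register widths are simplified here.

```c
#include <stdint.h>

struct nand_regs {
    volatile uint32_t *nfconf;
    volatile uint8_t  *nfcmd;
    volatile uint8_t  *nfaddr;
    volatile uint8_t  *nfdata;
    volatile uint32_t *nfstat;
};

#define NFCONF_INIT 0xf830u   /* controller on, ECC init, nFCE inactive */
#define NFCONF_NFCE 0x800u    /* 1 = chip disabled, 0 = chip enabled */
#define NAND_PAGE   512

static void nand_wait_ready(const struct nand_regs *r)
{
    while ((*r->nfstat & 1u) == 0)   /* NFSTAT bit 0: 0 = busy, 1 = ready */
        ;
}

/* Read the 512-byte page that starts at byte address addr. */
static void nand_read_page(const struct nand_regs *r, uint32_t addr, uint8_t *buf)
{
    *r->nfconf = NFCONF_INIT;                       /* 1 */
    *r->nfconf &= ~NFCONF_NFCE;                     /* 2: enable the chip, */
    *r->nfcmd = 0xff;                               /*    reset it */
    nand_wait_ready(r);
    *r->nfcmd = 0x00;                               /* 3: read command */
    *r->nfaddr = (uint8_t)(addr & 0xff);            /* 4: A0-A7 */
    *r->nfaddr = (uint8_t)((addr >> 9) & 0xff);     /*    A9-A16 (A8 skipped) */
    *r->nfaddr = (uint8_t)((addr >> 17) & 0xff);    /*    A17-A24 */
    *r->nfaddr = (uint8_t)((addr >> 25) & 0xff);    /*    A25 */
    nand_wait_ready(r);                             /* 5 */
    for (int i = 0; i < NAND_PAGE; i++)             /* 6 */
        buf[i] = *r->nfdata;
    *r->nfconf |= NFCONF_NFCE;                      /* 7: disable the chip */
}
```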
UART

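The UART has as many as 11 x 3 registers (3 UARTs). I chose the simplest approach for this experiment, which still uses 8 registers: 5 for initialization and the remaining 3 for receiving and sending data.
1. Initialization:
a. Configure pins GPH2 and GPH3 as TXD0 and RXD0:
GPHCON |= 0xa0; GPHUP |= 0x0c (internal pull-up control; on the S3C2410 a 1 in GPHUP disables the pull-up on that pin)
b. ULCON0 (UART channel 0 line control register): set to 0x03. This means 8 data bits, 1 stop bit, no parity, and normal operating mode (as opposed to Infra-Red mode, which represents 0s and 1s in a special way).
c. UCON0 (UART channel 0 control register): set to 0x05. Apart from bits [3:0], all bits keep their default values. Bits [3:0] = 0b0101 select "interrupt request or polling mode" for both transmit and receive; this experiment uses polling.
d. UFCON0 (UART channel 0 FIFO control register): set to 0x00. Each UART has a 16-byte transmit FIFO and a 16-byte receive FIFO, but this experiment does not use them, so the default 0 is kept.
e. UMCON0 (UART channel 0 modem control register): set to 0x00. This experiment does not use flow control, so the default 0 is kept.

f. UBRDIV0 (baud rate divisor register 0): set to 12. This experiment does not use the PLL, so PCLK = 12 MHz. For 57600 baud, the formula UBRDIVn = (int)(PCLK / (bps x 16)) - 1 gives UBRDIV0 = 12. Use the error formula on page 314 of the S3C2410 datasheet to check that this baud rate is within the tolerable error. If it is not, choose a different baud rate (the 57600 used here is fine).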
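```
void init_uart(void)
{   /* initialize UART0 */
    GPHCON |= 0xa0;   /* GPH2, GPH3 used as TXD0, RXD0 */
    GPHUP   = 0x0c;   /* GPH2, GPH3: 1 = internal pull-up disabled */
    ULCON0  = 0x03;   /* 8N1: 8 data bits, no parity, 1 stop bit */
    UCON0   = 0x05;   /* Tx and Rx in interrupt/polling mode (polled here) */
    UFCON0  = 0x00;   /* FIFOs not used */
    UMCON0  = 0x00;   /* no flow control */
    UBRDIV0 = 12;     /* 57600 baud with PCLK = 12 MHz */
}
```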
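Here is a sketch of the divisor formula together with the percentage error of the resulting baud rate. The helper names are hypothetical, and the acceptable error limit should come from the datasheet's error formula mentioned above.

```c
/* UBRDIVn = (int)(PCLK / (bps x 16)) - 1 */
static unsigned ubrdiv(unsigned long pclk, unsigned long bps)
{
    return (unsigned)(pclk / (bps * 16UL)) - 1u;
}

/* Percentage error of the baud rate actually produced by divisor div */
static double baud_error_pct(unsigned long pclk, unsigned long bps, unsigned div)
{
    double actual = (double)pclk / (16.0 * (div + 1u));
    return (actual - (double)bps) * 100.0 / (double)bps;
}
```

With PCLK = 12 MHz, 57600 baud gives UBRDIV0 = 12 and an error of about 0.16 %. 115200 baud would give UBRDIV0 = 5 and an error of about 8.5 %, which is why 57600 is used here.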

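2. Sending data:
a. UTRSTAT0 (UART channel 0 Tx/Rx status register): bit [2] is set to 1 automatically when no data is being transmitted. Before sending through the serial port, read this bit to check whether the transmitter is busy. Bit [1] indicates whether the transmit buffer is empty; this experiment does not use it. Bit [0] is set to 1 when the receive buffer holds data; this experiment polls it continuously to see whether data has been received.
b. UTXH0 (UART channel 0 transmit buffer register): write the data to be sent into this register.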

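```
#define TXD0READY (1 << 2)   /* UTRSTAT0 bit 2: transmitter empty */

void putc(unsigned char c)
{
    while (!(UTRSTAT0 & TXD0READY))   /* poll until the transmitter is free */
        ;
    UTXH0 = c;                        /* send the byte */
}
```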
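3. Receiving data:
a. UTRSTAT0: as described in "2. Sending data" above, we use bit [0].
b. URXH0 (UART channel 0 receive buffer register): when polling finds UTRSTAT0 bit [0] = 1, read this register to get the byte received by the serial port.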

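```
#define RXD0READY (1 << 0)   /* UTRSTAT0 bit 0: receive buffer has data */

unsigned char getc(void)
{
    while (!(UTRSTAT0 & RXD0READY))   /* poll until a byte has arrived */
        ;
    return URXH0;                     /* return the received byte */
}
```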

Interrupt

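The SUBSRCPND and SRCPND registers show which interrupts have been triggered and are waiting to be serviced (pending);
SUBMASK (the INTSUBMSK register) and MASK (the INTMSK register) are used to mask individual interrupts.
1. When a source in "Request sources (without sub-register)" is triggered, its bit in SRCPND is set to 1. If the interrupt is not masked by INTMSK, or if it is a fast interrupt (FIQ), it is processed further.
2. When a source in "Request sources (with sub-register)" is triggered, its bit in SUBSRCPND is set to 1. If the interrupt is not masked by INTSUBMSK, its bit in SRCPND is also set to 1. From then on, it is handled exactly like a source without a sub-register.
Page 357 of the S3C2410 datasheet shows "Figure 14-2. Priority Generating Block". The interrupt sources first pass through six first-level priority arbiters, each of which selects its highest-priority interrupt. A second-level arbiter then selects the highest-priority interrupt among those. IRQ priorities are set by the PRIORITY register (datasheet page 365), where ARB_SELn (n from 0 to 6) sets the priority order of the inputs of arbiter n. For example, ARB_SEL6 [20:19] (0 is highest, decreasing in order):
00 = REQ 0-1-2-3-4-5 01 = REQ 0-2-3-4-1-5
10 = REQ 0-3-4-1-2-5 11 = REQ 0-4-1-2-3-5
The PRIORITY register has another special feature: if ARB_MODEn is set to 1, the priorities of the interrupt inputs to arbiter n rotate. For example, with ARB_MODE6 set to 1, the priorities of the six inputs of arbiter 6 rotate as shown on datasheet page 358.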
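Steps for using interrupts:
1. When an IRQ occurs, the CPU enters IRQ mode and uses the IRQ-mode stack; when an FIQ occurs, the CPU enters FIQ mode and uses the FIQ-mode stack. So set up the stacks for these modes before using interrupts.
2. For interrupts in "Request sources (with sub-register)", clear the corresponding bit in INTSUBMSK to 0
3. Clear the corresponding bit in INTMSK to 0
4. Decide how the interrupt is delivered: FIQ or IRQ.
a. For FIQ, set the corresponding bit in INTMOD to 1
b. For IRQ, set its priority in the PRIORITY register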

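5. Prepare the interrupt handlers:
a. Interrupt vectors: set up the vectors so that triggering an FIQ or IRQ jumps to the right function. The IRQ and FIQ vector addresses are 0x00000018 and 0x0000001c respectively
b. For IRQ, the jump function reads INTPND or INTOFFSET to determine the interrupt source, then calls the specific handler
c. For FIQ, only one interrupt can be configured as FIQ, so there is no need to determine the source
d. Entering and returning from the handler

6. Clear the F bit (for FIQ) or I bit (for IRQ) in CPSR to 0 to enable interrupts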
IRQ entry and return:
```
sub   lr, lr, #4              @ compute the return address
stmdb sp!, { r0-r12,lr }      @ save the registers used
@ ... handler body ...
ldmia sp!, { r0-r12,pc }^     @ return from interrupt; ^ copies spsr to cpsr
```

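For FIQ, the entry and return code is:
```
sub   lr, lr, #4              @ compute the return address
stmdb sp!, { r0-r7,lr }       @ save the registers used (r8-r14 are banked in FIQ mode)
@ ... handler body ...
ldmia sp!, { r0-r7,pc }^      @ return from FIQ; ^ copies spsr to cpsr
```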
Before returning, clear the interrupt by writing 1 to the corresponding bits in SUBSRCPND (if used), SRCPND and INTPND. For INTPND, the simplest way is "INTPND = INTPND".
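Step 5b and the clearing rule can be combined into a C-level IRQ dispatcher. This is a sketch. It assumes the controller registers are consecutive 32-bit words in the order SRCPND, INTMOD, INTMSK, PRIORITY, INTPND, INTOFFSET, SUBSRCPND, INTSUBMSK, starting at 0x4A000000 on the S3C2410 (check the datasheet). It also assumes the assembly IRQ entry code calls it. The handler table and timer0_isr are hypothetical.

```c
#include <stdint.h>

struct s3c_intc {   /* assumed layout, consecutive 32-bit registers */
    volatile uint32_t SRCPND, INTMOD, INTMSK, PRIORITY,
                      INTPND, INTOFFSET, SUBSRCPND, INTSUBMSK;
};

typedef void (*isr_fn)(void);
static isr_fn isr_table[32];          /* indexed by INTOFFSET */

static volatile int timer0_ticks;     /* example handler state */
static void timer0_isr(void) { timer0_ticks++; }

/* Called from the IRQ entry code after registers have been saved. */
static void irq_dispatch(struct s3c_intc *ic)
{
    uint32_t off = ic->INTOFFSET;     /* number of the source being serviced */
    if (off >= 32)
        return;
    if (isr_table[off])
        isr_table[off]();
    ic->SRCPND = 1u << off;           /* write 1 to clear: SRCPND first, */
    ic->INTPND = 1u << off;           /* then INTPND */
}
```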

Timer: key registers

1. TCFG0 and TCFG1: set to 119 and 0x03 respectively
These two registers set the clock of the "Control Logic", calculated as:
Timer input clock Frequency = PCLK / {prescaler value+1} / {divider value}
For TIMER0, prescaler value = TCFG0[7:0], and the divider value is selected by TCFG1[3:0] (0b0000: 2, 0b0001: 4, 0b0010: 8, 0b0011: 16, 0b01xx: use the external TCLK0).
For this experiment, the TIMER0 clock = 12 MHz / (119+1) / 16 = 6250 Hz
2. TCNTB0: set to 3125
At 6250 Hz, this corresponds to 0.5 s
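3. TCON:
Bits [3:0] belong to TIMER0:
bit[3] selects whether TCMPB0 and TCNTB0 are automatically reloaded into TCMP0 and TCNT0 when TCNT0 counts down to 0 (auto-reload)
bit[2] selects whether the TOUT0 output is inverted (not used in this experiment)
bit[1] manually updates TCMP0 and TCNT0. Before using the timer for the first time, set this bit to 1 so that TCMPB0 and TCNTB0 are loaded into TCMP0 and TCNT0
bit[0] starts TIMER0
4. TCNTO0: read-only register used to read the current count of TCNT0.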
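The clock calculation and the TCON bits above can be expressed as a short sketch. Helper and macro names are hypothetical. timer0_start sets the manual-update bit in one write and clears it again in the write that starts the timer, because the datasheet requires that bit to be cleared at the next write.

```c
#include <stdint.h>

#define TCON_T0_START      (1u << 0)   /* start TIMER0 */
#define TCON_T0_MANUAL_UPD (1u << 1)   /* load TCNTB0/TCMPB0 into TCNT0/TCMP0 */
#define TCON_T0_INVERTER   (1u << 2)   /* invert TOUT0 */
#define TCON_T0_AUTORELOAD (1u << 3)   /* reload from the buffers at zero */

/* Timer input clock = PCLK / (prescaler + 1) / divider */
static uint32_t timer_input_hz(uint32_t pclk, uint32_t prescaler, uint32_t divider)
{
    return pclk / (prescaler + 1u) / divider;
}

/* Count value for a period in milliseconds at the given timer clock */
static uint32_t tcntb_for_ms(uint32_t timer_hz, uint32_t ms)
{
    return (uint32_t)((uint64_t)timer_hz * ms / 1000u);
}

/* Manual update with auto-reload, then start with manual update cleared. */
static void timer0_start(volatile uint32_t *tcon)
{
    *tcon = (*tcon & ~0xFu) | TCON_T0_MANUAL_UPD | TCON_T0_AUTORELOAD;
    *tcon = (*tcon & ~0xFu) | TCON_T0_START | TCON_T0_AUTORELOAD;
}
```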

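New Lotus Leaves at the Yuanmingyuan

On my few visits to the Yuanmingyuan I saw only withered lotus, because I always went in autumn, winter or early spring. Some of my
colleagues chose a better time and went to this once-magnificent place in midsummer June, and photographed
a set of lotus leaves full of life. So I have borrowed a few of their pictures, as if I too had visited the Yuanmingyuan in midsummer!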

TinyOS: another part-time job.

This is the third part-time job I have done at Tsinghua University. TinyOS is designed for Wireless Sensor Network environments. It is not a conventional OS like ThreadX, VxWorks or Linux. Its scheduler is very simple: it does not even use a timer for scheduling, just a FIFO scheme. Simple as it is, TinyOS has some new features worth exploring:
1) A new language for a new OS. TinyOS uses nesC, which is much like C but borrows features from C++ and Java. It has interfaces, modules and configurations, connected by wiring.
2) Component based. This is a good solution for resource-limited devices: only the components actually used are built into the final system image.
3) Event driven. Although the OS does not use interrupts for scheduling, interrupts are widely used for event handling. This gives more flexibility for low-power devices.
And more...
Within two weeks I have ported the OS to an S3C2410-based board as a prototype. The design will be used in a ZigBee project led by Professor Deng, who has several PhD students working for him.

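China Open Resources for Education (中国开放教育资源平台)

http://www.core.org.cn
A good site with many translated slides and other materials from well-known foreign universities!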

小水在线 (Xiaoshui Online)

Thursday, June 28, 2007

URL:Compiling GCC for MinGW & Cross-Compiling with Linux

Tuesday, June 26, 2007

Freescale MX31 bootloader Program : HAB Toolkit (repost)

Main i.MX31 Development Resources (repost)

URL:RealView Assembler User's Guide

Friday, June 22, 2007

ARM GCC Inline Assembler Cookbook

Pin: A tool for dynamic instrumentation of programs

Tuesday, June 19, 2007

MIPS Architecture Study Notes (repost)

System Computing Research Institute

Two Emails about ARM VFP

Monday, June 18, 2007

Advanced: The Ultimate Guide to Memory Technology (Complete/Advanced Edition)

Key Parts of the ARM S3C2410 Hardware Manual (reposted)

Sunday, June 17, 2007

New Lotus Leaves at the Yuanmingyuan

Friday, June 15, 2007

TinyOS: another part-time job.

Monday, June 11, 2007

China Open Resources for Education
