The Kernel Boot Process

The previous post explained how computers boot up right up to the point where the boot loader, after stuffing the kernel image into memory, is about to jump into the kernel entry point. This last post about booting takes a look at the guts of the kernel to see how an operating system starts life. Since I have an empirical bent, I’ll link heavily to the sources for Linux kernel 2.6.25.6 at the Linux Cross Reference. The sources are very readable if you are familiar with C-like syntax; even if you miss some details you can get the gist of what’s happening. The main obstacle is the lack of context around some of the code, such as when or why it runs or the underlying features of the machine. I hope to provide a bit of that context. For the sake of brevity (hah!) a lot of fun stuff - like interrupts and memory - gets only a nod for now. The post ends with the highlights of the Windows boot.

At this point in the Intel x86 boot story the processor is running in real-mode, is able to address 1 MB of memory, and RAM looks like this for a modern Linux system:

RAM contents after boot loader is done

The kernel image has been loaded into memory by the boot loader using the BIOS disk I/O services. This image is an exact copy of the file on your hard drive that contains the kernel, e.g. /boot/vmlinuz-2.6.22-14-server. The image is split into two pieces: a small part containing the real-mode kernel code is loaded below the 640K barrier; the bulk of the kernel, which runs in protected mode, is loaded after the first megabyte of memory.

The action starts in the real-mode kernel header pictured above. This region of memory is used to implement the Linux boot protocol between the boot loader and the kernel. Some of the values there are read by the boot loader while doing its work. These include amenities such as a human-readable string containing the kernel version, but also crucial information like the size of the real-mode kernel piece. The boot loader also writes values to this region, such as the memory address for the command-line parameters given by the user in the boot menu. Once the boot loader is finished it has filled in all of the parameters required by the kernel header. It’s then time to jump into the kernel entry point. The diagram below shows the code sequence for the kernel initialization, along with source directories, files, and line numbers:
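To make the header less abstract, here is a minimal sketch that pulls a few fields out of a kernel image. The offsets (0x1f1 for the setup size, 0x1fe for the boot flag, 0x202 for the "HdrS" signature, 0x206 for the protocol version, 0x20e for the version-string pointer) come from the Linux x86 boot protocol document rather than from this post, and the function name and returned keys are made up for illustration.

```python
import struct

HEADER_MAGIC = b"HdrS"  # signature found in boot-protocol 2.00+ headers


def parse_setup_header(image: bytes) -> dict:
    """Extract a few boot-protocol fields from the start of a kernel image.

    Hypothetical helper: the field offsets follow the Linux x86 boot
    protocol; the names of the returned keys are illustrative only.
    """
    if len(image) < 0x210:
        raise ValueError("image too short to contain the header")

    setup_sects = image[0x1F1]                       # size of setup code, in sectors
    (boot_flag,) = struct.unpack_from("<H", image, 0x1FE)
    magic = image[0x202:0x206]
    if boot_flag != 0xAA55 or magic != HEADER_MAGIC:
        raise ValueError("not a boot-protocol 2.00+ kernel image")

    (protocol,) = struct.unpack_from("<H", image, 0x206)
    (version_ptr,) = struct.unpack_from("<H", image, 0x20E)

    version = ""
    if version_ptr:
        # The pointer is relative to offset 0x200 and names a NUL-terminated string.
        start = 0x200 + version_ptr
        end = image.index(b"\0", start)
        version = image[start:end].decode("ascii", "replace")

    sectors = setup_sects or 4                       # 0 means 4 for old kernels
    return {
        "setup_sects": sectors,
        "real_mode_bytes": (sectors + 1) * 512,      # boot sector + setup code
        "protocol": f"{protocol >> 8}.{protocol & 0xFF:02d}",
        "kernel_version": version,
    }
```

Feeding it the first few kilobytes of a real image (for instance the bytes of a file such as /boot/vmlinuz-..., which usually requires root to read) should report the human-readable version string and the size of the real-mode piece mentioned above.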

Architecture-specific Linux Kernel Initialization

The early kernel start-up for the Intel architecture is in file arch/x86/boot/header.S. It’s in assembly language, which is rare for the kernel at large but common for boot code. The start of this file actually contains boot sector code, a leftover from the days when Linux could work without a boot loader. Nowadays this boot sector, if executed, only prints a “bugger_off_msg” to the user and reboots. Modern boot loaders ignore this legacy code. After the boot sector code we have the first 15 bytes of the real-mode kernel header; these two pieces together add up to 512 bytes, the size of a typical disk sector on Intel hardware.

After these 512 bytes, at offset 0x200, we find the very first instruction that runs as part of the Linux kernel: the real-mode entry point. It’s in header.S:110 and it is a 2-byte jump written directly in machine code as 0x3aeb. You can verify this by running hexdump on your kernel image and seeing the bytes at that offset - just a sanity check to make sure it’s not all a dream. The boot loader jumps into this location when it is finished, which in turn jumps to header.S:229 where we have a regular assembly routine called start_of_setup. This short routine sets up a stack, zeroes the bss segment (the area that contains static variables, so they start with zero values) for the real-mode kernel and then jumps to good old C code at arch/x86/boot/main.c:122.
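The 2-byte jump can be decoded by hand, and a short sketch makes the arithmetic explicit. It assumes the standard x86 encoding of a short jump (opcode 0xEB followed by a signed 8-bit displacement measured from the end of the instruction); the helper name is hypothetical, and the resulting offset is only what the bytes quoted above imply, not something checked against a particular build.

```python
def decode_short_jump(code: bytes, offset: int) -> int:
    """Return the target offset of an x86 short jump located at `offset`."""
    opcode, rel8 = code[0], code[1]
    if opcode != 0xEB:
        raise ValueError("not a short jump (opcode 0xEB)")
    if rel8 >= 0x80:            # the displacement is a signed byte
        rel8 -= 0x100
    return offset + 2 + rel8    # relative to the end of the 2-byte instruction


# 0x3aeb read as a little-endian 16-bit word is the byte pair EB 3A on disk.
code = (0x3AEB).to_bytes(2, "little")
target = decode_short_jump(code, 0x200)
print(code.hex(), hex(target))  # prints "eb3a 0x23c"
```

So with these bytes the jump skips 0x3a bytes of header fields and lands at offset 0x23c of the image, which is where the code after the header would have to begin.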

main() does some housekeeping like detecting memory layout, setting a video mode, etc. It then calls go_to_protected_mode(). Before the CPU can be set to protected mode, however, a few tasks must be done. There are two main issues: interrupts and memory. In real-mode the interrupt vector table for the processor is normally at memory address 0, whereas in protected mode the table can live anywhere and its location is stored in a CPU register called IDTR. (Strictly speaking the CPU consults IDTR in real-mode too, but it is almost always left at 0.) Meanwhile, the translation of logical memory addresses (the ones programs manipulate) to linear memory addresses (a raw number from 0 to the top of the memory) is different between real-mode and protected mode. Protected mode requires a register called GDTR to be loaded with the address of a Global Descriptor Table for memory. So go_to_protected_mode() calls setup_idt() and setup_gdt() to install a temporary interrupt descriptor table and global descriptor table.
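The difference in address translation is easier to see with numbers. The sketch below assumes the standard x86 rules: in real-mode the linear address is segment * 16 + offset, while in protected mode a segment selector picks an 8-byte descriptor out of the table GDTR points to. The helper names are hypothetical, and the "flat" descriptor values (base 0, 4 GB limit) are the conventional ones for a boot-time table, not values copied from setup_gdt().

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode translation: shift the segment left 4 bits and add the offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def gdt_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a segment descriptor into the 64-bit layout the CPU expects."""
    d = limit & 0xFFFF                       # limit bits 0-15
    d |= (base & 0xFFFFFF) << 16             # base bits 0-23
    d |= (access & 0xFF) << 40               # present, privilege level, type
    d |= ((limit >> 16) & 0xF) << 48         # limit bits 16-19
    d |= (flags & 0xF) << 52                 # granularity and operand size
    d |= ((base >> 24) & 0xFF) << 56         # base bits 24-31
    return d


# Real-mode: the classic boot sector address 07C0:0000 is linear 0x7C00.
print(hex(real_mode_linear(0x07C0, 0x0000)))             # 0x7c00

# Protected mode: flat 4 GB code and data segments (assumed, conventional values).
print(hex(gdt_descriptor(0, 0xFFFFF, 0x9A, 0xC)))        # 0xcf9a000000ffff
print(hex(gdt_descriptor(0, 0xFFFFF, 0x92, 0xC)))        # 0xcf92000000ffff
```

With a base of 0 and a limit covering the whole 4 GB, logical and linear addresses end up identical, which is why a temporary table like this is enough to get going.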

We’re now ready for the plunge into protected mode, which is done by protected_mode_jump, another assembly routine. This routine enables protected mode by setting the PE bit in the CR0 CPU register. At this point we’re running with paging disabled; paging is an optional feature of the processor, even in protected mode, and there’s no need for it yet. What’s important is that we’re no longer confined to the first megabyte of memory and can now address up to 4GB of RAM. The routine then calls the 32-bit kernel entry point, which is startup_32 for compressed kernels. This routine does some basic register initializations and calls decompress_kernel(), a C function to do the actual decompression.
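As a rough model of what flipping that bit means, the sketch below treats CR0 as a plain integer. The bit positions (PE is bit 0, PG is bit 31) are the architectural ones; the function name and the starting value are hypothetical, and real code does this with mov to and from CR0 in assembly, not in Python.

```python
CR0_PE = 1 << 0    # Protection Enable: real-mode -> protected mode
CR0_PG = 1 << 31   # Paging: only meaningful once PE is set


def describe_mode(cr0: int) -> str:
    """Name the CPU mode implied by a CR0 value (simplified model)."""
    if not cr0 & CR0_PE:
        return "real mode"
    if cr0 & CR0_PG:
        return "protected mode, paging on"
    return "protected mode, paging off"


cr0 = 0x10                      # assumed starting value, PE and PG both clear
print(describe_mode(cr0))       # real mode
cr0 |= CR0_PE                   # what protected_mode_jump does, in spirit
print(describe_mode(cr0))       # protected mode, paging off
print(2**20, 2**32)             # 1 MB addressable before, 4 GB after
```

The paging bit stays clear at this stage, matching the description above; it only gets set later in the boot.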

decompress_kernel() prints the familiar “Decompressing Linux…” message. Decompression happens in-place and once it’s finished the uncompressed kernel image has overwritten the compressed one pictured in the first diagram. Hence the uncompressed contents also start at 1MB. decompress_kernel() then prints “done.” and the comforting “Booting the kernel.” By “Booting” it means a jump to the final entry point in this whole story, given to Linus by God himself atop Mountain Halti, which is the protected-mode kernel entry point at the start of the second megabyte of RAM (0x100000). That sacred location contains a routine called, uh, startup_32. But this one is in a different directory, you see.

The second incarnation of startup_32 is also an assembly routine, but it contains 32-bit mode initializations. It clears the bss segment for the protected-mode kernel (which is the true kernel that will now run until the machine reboots or shuts down), sets up the final global descriptor table for memory, builds page tables so that paging can be turned on, enables paging, initializes a stack, creates the final interrupt descriptor table, and finally jumps to the architecture-independent kernel start-up, start_kernel(). The diagram below shows the code flow for the last leg of the boot:
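To give a feel for what "building page tables" buys, here is a small sketch of how a 32-bit linear address is looked up once paging is on. It assumes classic two-level paging with 4 KB pages (no PAE), and it assumes the kernel is mapped at 0xC0000000, the conventional split on 32-bit x86 Linux; neither detail is stated in this post, and the nested dictionaries merely stand in for the real in-memory tables.

```python
PAGE_OFFSET = 0xC0000000  # assumption: conventional 3 GB / 1 GB split


def split_linear(addr: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def translate(page_directory: dict, addr: int) -> int:
    """Walk a toy two-level page table and return the physical address."""
    directory_index, table_index, offset = split_linear(addr)
    frame = page_directory[directory_index][table_index]  # KeyError = page fault
    return frame + offset


# Map the page holding the kernel entry point: virtual 0xC0100000 -> physical 1 MB.
kernel_virtual = PAGE_OFFSET + 0x100000
page_directory = {0x300: {0x100: 0x00100000}}

print(split_linear(kernel_virtual))                       # (768, 256, 0)
print(hex(translate(page_directory, kernel_virtual + 0xABC)))  # 0x100abc
```

The top 10 bits pick a page directory entry, the next 10 pick a page table entry, and the low 12 bits are carried over unchanged as the offset within the page.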

Architecture-independent Linux Kernel Initialization

start_kernel() looks more like typical kernel code, which is nearly all C and machine independent. The function is a long list of calls to initializations of the various kernel subsystems and data structures. These include the scheduler, memory zones, time keeping, and so on. start_kernel() then calls rest_init(), at which point things are almost all working. rest_init() creates a kernel thread passing another function, kernel_init(), as the entry point. rest_init() then calls schedule() to kickstart task scheduling and goes to sleep by calling cpu_idle(), which is the idle thread for the Linux kernel. cpu_idle() runs forever and so does process zero, which hosts it. Whenever there is work to do - a runnable process - process zero gets booted out of the CPU, only to return when no runnable processes are available.
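The relationship between process zero and everything else can be mimicked with a toy model. This is not how the Linux scheduler picks tasks (the round-robin queue here is a deliberate simplification, and every name is made up); it only illustrates the rule described above: the idle task runs exactly when nothing else is runnable.

```python
from collections import deque

IDLE_PID = 0  # process zero, which hosts the idle loop


def run(arrivals: dict, ticks: int) -> list:
    """Simulate `ticks` time slices and return which PID ran in each one.

    `arrivals` maps a tick number to a list of (pid, ticks_of_work) pairs
    that become runnable at that tick.
    """
    queue = deque()
    timeline = []
    for tick in range(ticks):
        for pid, work in arrivals.get(tick, []):
            queue.append([pid, work])
        if not queue:
            timeline.append(IDLE_PID)      # nothing runnable: idle gets the CPU
            continue
        task = queue.popleft()             # something runnable: idle is displaced
        timeline.append(task[0])
        task[1] -= 1
        if task[1] > 0:
            queue.append(task)             # still has work, goes to the back
    return timeline


print(run({1: [(1, 2)], 2: [(7, 1)]}, 6))  # [0, 1, 1, 7, 0, 0]
```

PID 0 holds the CPU at the start, is pushed out as soon as work shows up, and comes back once the queue drains.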

But here’s the kicker for us. This idle loop is the end of the long thread we followed since boot; it’s the final descendant of the very first jump executed by the processor after power up. All of this mess, from reset vector to BIOS to MBR to boot loader to real-mode kernel to protected-mode kernel, all of it leads right here: jump by jump by jump, it ends in the idle loop for the boot processor, cpu_idle(). Which is really kind of cool. However, this can’t be the whole story, otherwise the computer would do no work.

At this point, the kernel thread started previously is ready to kick in, displacing process 0 and its idle thread. And so it does, at which point kernel_init() starts running since it was given as the thread entry point. kernel_init() is responsible for initializing the remaining CPUs in the system, which have been halted since boot. All of the code we’ve seen so far has been executed in a single CPU, called the boot processor. As the other CPUs, called application processors, are started they come up in real-mode and must run through several initializations as well. Many of the code paths are common, as you can see in the code for startup_32, but there are slight forks taken by the late-coming application processors. Finally, kernel_init() calls init_post(), which tries to execute a user-mode process in the following order: /sbin/init, /etc/init, /bin/init, and /bin/sh. If all fail, the kernel will panic. Luckily init is usually there, and starts running as PID 1. It checks its configuration file to figure out which processes to launch, which might include the X Window System, programs for logging in on the console, network daemons, and so on. Thus ends the boot process as yet another Linux box starts running somewhere. May your uptime be long and untroubled.
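The fallback order in init_post() boils down to "try each path, take the first that works, give up loudly otherwise". A minimal sketch of that logic follows; the function and parameter names are hypothetical, the can_exec callback stands in for the kernel actually trying to execute the file, and the exception stands in for the kernel panic.

```python
INIT_CANDIDATES = ("/sbin/init", "/etc/init", "/bin/init", "/bin/sh")


def pick_init(can_exec, candidates=INIT_CANDIDATES) -> str:
    """Return the first candidate that can be executed, in the listed order.

    `can_exec` is a stand-in predicate; in the kernel, a successful exec
    simply never returns, so the next path is tried only after a failure.
    """
    for path in candidates:
        if can_exec(path):
            return path
    raise RuntimeError("no init found (the real kernel panics at this point)")


# A system missing /sbin/init and /etc/init falls through to /bin/init.
available = {"/bin/init", "/bin/sh"}
print(pick_init(lambda path: path in available))  # /bin/init
```

The trailing /bin/sh entry is what gives you a bare shell on a badly broken system instead of a panic.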

The process for Windows is similar in many ways, given the common architecture. Many of the same problems are faced and similar initializations must be done. When it comes to boot, one of the biggest differences is that Windows packs all of the real-mode kernel code, and some of the initial protected mode code, into the boot loader itself (C:\NTLDR). So instead of having two regions in the same kernel image, Windows uses different binary images. Plus Linux completely separates boot loader and kernel; in a way this automatically falls out of the open source process. The diagram below shows the main bits for the Windows kernel:

Windows Kernel Initialization

The Windows user-mode start-up is naturally very different. There’s no /sbin/init, but rather Csrss.exe and Winlogon.exe. Winlogon spawns Services.exe, which starts all of the Windows Services, and Lsass.exe, the Local Security Authority Subsystem Service. The classic Windows login dialog runs in the context of Winlogon.

This is the end of this boot series. Thanks everyone for reading and for feedback. I’m sorry some things got superficial treatment; I’ve gotta start somewhere and only so much fits into blog-sized bites. But one day at a time; my plan is to do regular “Software Illustrated” posts like this series along with other topics. Meanwhile, here are some resources:

[Update: Thanks to Marius Barbu for catching a mistake where I wrote "CR3" instead of GDTR]

Comments

42 Responses to “The Kernel Boot Process”

  1. FAb on June 23rd, 2008 8:40 am

    Great article, I loved it. Thanks.

    What tool do you use to generate these cute schema and illustrations ?

  2. Frank Spychalski on June 23rd, 2008 8:44 am

    Excellent article, thanks!

  3. xxx on June 23rd, 2008 8:45 am

    > Wow, that sounds very complicated. Is the process really that complicated?

    No, he made it all up. What were you thinking?!

  4. Maurice on June 23rd, 2008 8:45 am

    Only for people that spam.

  5. Gustavo Duarte on June 23rd, 2008 9:21 am

    @FAb: cool, you’re welcome. The diagrams were all done in Visio 2007.

    @Frank: thanks for reading.

    @xxx: hahaha.

    @Maurice: Yea, that comment cum URL is borderline. Sigh.

  6. Traverse Davies on June 23rd, 2008 9:34 am

    I have been seeing that ultimate anonymity crap come up in so many comment threads lately. Funny, always under different URL’s too. Other than that, great article (although I really could have used it about two weeks ago when trying to fix some weird boot errors, ah well, muddled through them in the end)

  7. Gustavo Duarte on June 23rd, 2008 9:39 am

    Alright, then I’m deleting it. Thanks for the heads up.

  8. Dreamtorrent on June 23rd, 2008 11:26 am

    Kick ass article, I enjoyed it!

    It fills in a few blanks I got in my vague knowledge of this process, and being pretty humble in my knowledge, don’t think you under-complicated it at all - however I now am curious for more detail.

    Oh, yeah. Did an upgrade which crashed and it deleted /sbin/init - at least now I know what step of the process that was … in hindsight. LOL

    Keep it up!

    H

  9. meneame.net on June 23rd, 2008 11:36 am

    The boot sequence in the Linux kernel (in English)…

    Great explanation of how the Linux operating system starts up. It may be a bit complex for non-IT people, but I found it interesting….

  10. kaizen on June 23rd, 2008 12:26 pm

    what software do you use to create those nice diagrams?

  11. [FAQ] How the Kernel Starts Up - Overclock.net - Overclocking.net on June 23rd, 2008 1:39 pm

    [...] startup of a linux kernel primarily, but describes how the windows kernel is different in the end. Link __________________ BIG BROTHERWe apologize for the inconvenience IS [...]

  12. Marius Barbu on June 23rd, 2008 2:42 pm

    Nice writeup, subscribed!

    However, there’s a little error in the article:
    “Protected mode requires a register called CR3 to be loaded with the address of a Global Descriptor Table for memory”.

    CR3 is the PDBR (Page Directory Base Register, holds the physical address of the page directory) so is only needed when paging is enabled. The Global Descriptor Table is loaded into GDTR (special register just like IDTR) by the lgdt instruction.

  13. Stop Being Carbon · Things I wanna read in the next few days on June 23rd, 2008 2:55 pm

    [...] Kernel Boot Process Diary of a failed Startup Who needs a Computer Science Degree when there’s Wikipedia Programmer Insecurity Metaclass Programming in Python [...]

  14. Patrick Moroney on June 23rd, 2008 2:58 pm

    I also highly recommend Linux Kernel Development by Robert Love
    Much less dry than Understanding the Linux Kernel, and also more recent.

    http://www.amazon.com/Linux-Kernel-Development-Novell-Press/dp/0672327201

  15. Gustavo Duarte on June 24th, 2008 12:44 am

    @Dreamtorrent: thanks :)

    @kaizen: MS Visio 2007. I use ‘themes’, which make it easy to make decent-looking stuff.

    @Marius: thanks for catching it. Fixed in the text.

    @Patrick: thanks for the reference, I’ll add it to the text as well.

  16. Jeff Moser on June 24th, 2008 9:07 pm

    Thanks for the well researched post! I especially liked the links to the specific functions in the Linux kernel source.

    I’ve subscribed to your feed and look forward to upcoming posts.

    Keep up the great work!

  17. pligg.com on June 25th, 2008 6:08 am

    The Kernel Boot Process Explained: 2.6.25.6…

    The kernel boot process for linux 2.6.25.6 explained…

  18. Nikesh on June 25th, 2008 7:50 am

    Cannot get better than this, awesome!!!

    Thanks.

  19. The Burgeoning Openly Owned Web » links for 2008-06-28 on June 27th, 2008 7:09 pm

    [...] The Kernel Boot Process : Gustavo Duarte the birth of “life” (tags: linux kernel boot bootloader) [...]

  20. Sara Eulodue on June 29th, 2008 6:07 pm

    This would be infinitely more useful if you showed the Windows boot process first since that is what most computers actually use. And THEN at the end you can use the academic Linux boot process for completeness. Nope, sorry, this gets a thumbs down from me on SU. Next.

  21. Kevin DuBois on June 29th, 2008 10:01 pm

    Great trifecta of boot-up articles, thanks!

  22. Naseer on July 1st, 2008 5:14 am

    Awesome article, Thank you !

  23. Gustavo Duarte on July 1st, 2008 10:18 am

    Thank you all for reading and for the feedback. :)

  24. Justin Blanton | The kernel boot process on July 2nd, 2008 1:16 am

    [...] The kernel boot process. [...]

  25. Ben Petering on July 7th, 2008 12:40 am

    Very good article. I love the ‘illustrated’ style you used.

    Incidentally, I’ve just skimmed your entire blog, and I’m rather impressed. Not only is your English quite good (a concern you voiced in one post - IMO, reading TCP/IP Illustrated is a damn good start if you’re doing technical writing :), but _every_ post you’ve written so far looks interesting and substantial.

    Keep up the good work. I’ll be back.

    -ben

  26. Linkdump: Teorija kategorija, kako radi kernel… by Nikola Plejić on July 7th, 2008 4:14 am

    [...] Chipsets and the Memory Map, How Computers Boot Up and The Kernel Boot Process. For those interested in how computers work on the inside, Gustavo Duarte has written a series of articles whose [...]

  27. The Kernel Boot Process « Vietwow’s Weblog on July 8th, 2008 8:09 am
  28. Christian on July 8th, 2008 1:59 pm

    Very good, Gustavo!
    May I translate it and put it on my blog, with a reference back here?

    Cheers

  29. eto demerzel on July 8th, 2008 2:51 pm

    It’s fake, photoshopped. Look, you can see the blurred pixel area :D

    Great article, definitely the best explanation of the boot-up flow I’ve found; the graphics are top-notch.

    Good work.

  30. Gustavo Duarte on July 9th, 2008 4:56 am

    @Ben: thanks a ton :) I’m a huge fan of the W. Richard Stevens books as well, so I fully agree they’re a damn good start. What I meant by the English comment was that I sometimes feel a lack of non-tech reading has hampered my English. Say, when it’s time to come up with a metaphor or the ‘right word’, that kind of thing. But I’ve been here in the US for a few years now, so it’s less of a problem now.

    @Christian: thanks, and feel free to translate it, as long as it has the link. If you want, I can send you the Visio 2007 files for the images or translate them for you.

    @eto: hahaha, fake computer pr0n. Anyhow, thanks for the kind words

  31. Regular (S)expressions :: Entries :: linkz on July 10th, 2008 7:18 am

    [...] linux kernel boot process; http://duartes.org/gustavo/blog/post/kernel-boot-process The previous post explained how computers boot up right up to the point where the boot loader, [...]

  32. Christian on July 15th, 2008 8:31 am

    Hi Gustavo!

    Don’t worry, I will include the link.
    Please send me the files so I can translate them.

    When I finish translating everything, I’ll send it to you for a review, if you’d like?

    Thanks and best regards!

    Christian

  33. Idefix on July 16th, 2008 4:55 am

    Excellent article, but I have one question:
    If decompression happens in-place, how come the compressed parts don’t get overwritten by uncompressed data before those compressed parts are read?

  34. Alfredo Reino » Archivo del Blog » Cómo arrancan los ordenadores on July 16th, 2008 8:07 am

    [...] The kernel boot process [...]

  35. Gustavo Duarte on July 16th, 2008 10:43 am

    @Idefix: the compressed image is temporarily moved up in memory a notch, creating a ‘buffer zone’ between the place in memory where uncompressed contents are being written to and the place where compressed contents are read from.

    The code is here.

    cheers

  36. Amjith on July 16th, 2008 11:05 am

    Hi Gustavo,
    The whole process of computer boot up from memory map to kernel loading was amazing. I linked your articles to http://www.osnews.com/story/20064/Computer_Boot_Up_Process. It is refreshing to see articles that are succinct and resourceful.

  37. Mojes on July 16th, 2008 3:24 pm

    Thank You!

    These three articles show something very complicated in an easy way. Good job!
    I was looking for such text for a long time.

    -mojes

  38. Kilian Hekhuis on July 17th, 2008 1:45 am

    “In real-mode the interrupt vector table for the processor is always at memory address 0, whereas in protected mode the location of the interrupt vector table is stored in a CPU register called IDTR” - This is not true. Also in real mode, the CPU uses the IDTR to locate the (real mode) IVT. In practice, the IDTR is always set to 0, but it could be changed.

  39. Gustavo Duarte on July 17th, 2008 2:01 am

    @Amjith: thank you for the kind words and also for the link. I got a ton of traffic from you.

    @Mojes: you’re very welcome!

    @Kilian: Thanks for noting this. I’ll change the language to be more accurate.

  40. nakisa on August 8th, 2008 12:46 am

    thanks alot , that was great

    i will be so much glad :)))))) if you post more and more such sweets.

  41. Frederik Braun on August 11th, 2008 8:00 am

    Well explained. I think I got it, despite the fact that I didn’t know much about this topic.
    So, thank you ;)

    Frederik

    P.S.: More posts on this topic will be appreciated ;)

  42. Memory Translation and Segmentation : Gustavo Duarte on August 12th, 2008 2:33 am

    [...] segmentation, protection, and paging in Intel-compatible (x86) computers, in the spirit of the boot series, as the next step down the path of how kernels work. As usual, I’ll link to Linux kernel [...]
