Getting Physical With Memory

When trying to understand complex systems, you can often learn a lot by stripping away abstractions and looking at their lowest levels. In that spirit we take a look at memory and I/O ports at their simplest and most fundamental level: the interface between the processor and the bus. These details underlie higher-level topics like thread synchronization and the need for the Core i7. Also, since I’m a programmer I ignore things EE people care about. Here’s our friend the Core 2 again:

Physical Memory Access

A Core 2 processor has 775 pins, about half of which only provide power and carry no data. Once you group the pins by functionality, the physical interface to the processor is surprisingly simple. The diagram shows the key pins involved in a memory or I/O port operation: address lines, data pins, and request pins. These operations take place in the context of a transaction on the front side bus (FSB). FSB transactions go through 5 phases: arbitration, request, snoop, response, and data. The components on the FSB, called agents, play different roles throughout these phases. Normally the agents are all the processors plus the northbridge.

We only look at the request phase in this post, in which the request agent, usually a processor, outputs 2 packets. Here are the juiciest bits of the first packet, output by the address and request pins:

FSB Request Phase, Packet A

The address lines output the starting physical memory address for the transaction. We have 33 bits, but they are interpreted as bits 35-3 of an address in which bits 2-0 are zero. Hence we have a 36-bit address, aligned to 8 bytes, for a total of 2^36 bytes, or 64 GB, of addressable physical memory. This has been the case since the Pentium Pro. The request pins specify what type of transaction is being initiated; in I/O requests the address pins specify an I/O port rather than a memory address. After the first packet is output, the same pins transmit a second packet in the subsequent bus clock cycle:

FSB Request Phase, Packet B
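The address arithmetic for packet A can be made concrete with a minimal Python sketch. It only models the bit layout described above (33 address lines carrying bits 35-3, with bits 2-0 implied to be zero), not real bus signalling, and the name `decode_address_lines` is made up for illustration.

```python
ADDRESS_LINE_BITS = 33   # the address lines carry bits 35-3
IMPLIED_ZERO_BITS = 3    # bits 2-0 are always zero (8-byte alignment)


def decode_address_lines(lines: int) -> int:
    """Turn the 33-bit value on the address lines into a 36-bit physical address."""
    if not 0 <= lines < (1 << ADDRESS_LINE_BITS):
        raise ValueError("the address lines hold exactly 33 bits")
    return lines << IMPLIED_ZERO_BITS


# Highest address that can be named: all 33 lines set.
highest = decode_address_lines((1 << ADDRESS_LINE_BITS) - 1)

# Size of the resulting physical address space in bytes.
addressable_bytes = 1 << (ADDRESS_LINE_BITS + IMPLIED_ZERO_BITS)

print(hex(highest))                # highest 8-byte-aligned address
print(addressable_bytes // 2**30)  # address space size in GB
```

Every address produced this way is a multiple of 8, which is why the bus can only name 8-byte chunks directly.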

The attribute signals are interesting: they reflect the 5 types of memory caching behavior available in Intel processors. By putting this information on the FSB, the request agent lets other processors know how this transaction affects their caches, and how the memory controller (northbridge) should behave. The processor determines the type of a given memory region mainly by looking at page tables, which are maintained by the kernel.

Typically kernels treat all RAM as write-back, which yields the best performance. In write-back mode the unit of memory access is the cache line, 64 bytes in the Core 2. If a program reads a single byte in memory, the processor loads the whole cache line that contains that byte into the L2 and L1 caches. When a program writes to memory, the processor only modifies the line in the cache, but does not update main memory. Later, when it becomes necessary to post the modified line to the bus, the whole cache line is written at once. So most requests have binary 11 in their length field, for 64 bytes. Here’s a read example in which the data is not in the caches:

Memory Read Sequence Diagram
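To see which cache line a given byte lives in, here is a small Python sketch. It assumes the 64-byte line size of the Core 2 mentioned above; the helper `cache_line_of` is a hypothetical name, not part of any real API.

```python
CACHE_LINE_SIZE = 64  # bytes, as on the Core 2


def cache_line_of(address: int) -> tuple:
    """Return (line base address, offset within the line) for a byte address."""
    base = address & ~(CACHE_LINE_SIZE - 1)   # clear the low 6 bits
    offset = address & (CACHE_LINE_SIZE - 1)  # keep only the low 6 bits
    return base, offset


# Reading the single byte at 0x12345 pulls in the whole line 0x12340-0x1237F.
base, offset = cache_line_of(0x12345)
print(hex(base), offset)
```

Any other byte in the range starting at `base` and running for 64 bytes is then already in the cache.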

Some of the physical memory range in an Intel computer is mapped to devices like hard drives and network cards instead of actual RAM. This allows drivers to communicate with their devices by writing to and reading from memory. The kernel marks these memory regions as uncacheable in the page tables. Accesses to uncacheable memory regions are reproduced on the bus exactly as requested by a program or driver. Hence it’s possible to read or write single bytes, words, and so on. This is done via the byte enable mask in packet B above.
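As a rough illustration of how a byte enable mask can select part of an 8-byte chunk, here is a Python sketch. It assumes one mask bit per byte of the aligned chunk, with bit 0 standing for the lowest address; the actual signal encoding on the FSB is not covered in this post, and `byte_enable_mask` is a made-up helper.

```python
CHUNK_SIZE = 8  # the address lines name an 8-byte aligned chunk


def byte_enable_mask(address: int, size: int) -> tuple:
    """Return (chunk address, 8-bit mask) for an access that fits in one chunk.

    Assumption: bit i of the mask enables the byte at chunk address + i.
    """
    chunk = address & ~(CHUNK_SIZE - 1)
    offset = address - chunk
    if size < 1 or offset + size > CHUNK_SIZE:
        raise ValueError("access must fit inside one 8-byte chunk")
    mask = ((1 << size) - 1) << offset
    return chunk, mask


# A 2-byte (word) write at 0x1002 enables bytes 2 and 3 of the chunk at 0x1000.
chunk, mask = byte_enable_mask(0x1002, 2)
print(hex(chunk), format(mask, "08b"))
```

The point is only that an aligned address plus a mask is enough to describe a single-byte or single-word access to a device register.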

The primitives discussed here have many implications. For example:

  1. Performance-sensitive applications should try to pack data that is accessed together into the same cache line. Once the cache line is loaded, further reads are much faster and extra RAM accesses are avoided.
  2. Any single read or write of up to 64 bits that falls within a single cache line is guaranteed to be atomic (assuming write-back memory). Such an access is serviced by the processor’s L1 cache and the data is read or written all at once; it cannot be affected halfway by other processors or threads. In particular, 32-bit and 64-bit operations that don’t cross cache line boundaries are atomic.
  3. The front side bus is shared by all agents, which must arbitrate for bus ownership before they can start a transaction. Moreover, all agents must listen to all transactions in order to maintain cache coherence. Thus bus contention becomes a severe problem as more cores and processors are added to Intel computers. The Core i7 addresses this by attaching processors directly to memory and having them communicate point-to-point rather than by broadcast.
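Points 1 and 2 both come down to which cache lines an access touches. Here is a minimal Python sketch, again assuming 64-byte lines; `lines_touched` and `crosses_line` are hypothetical helpers written for this illustration.

```python
CACHE_LINE_SIZE = 64  # bytes, as on the Core 2


def lines_touched(address: int, size: int) -> int:
    """Count the cache lines covered by `size` bytes starting at `address`."""
    if size < 1:
        raise ValueError("size must be at least 1 byte")
    first = address // CACHE_LINE_SIZE
    last = (address + size - 1) // CACHE_LINE_SIZE
    return last - first + 1


def crosses_line(address: int, size: int) -> bool:
    """True if the access spans a cache line boundary."""
    return lines_touched(address, size) > 1


# An 8-byte value at offset 60 straddles two lines; at offset 56 it does not.
print(crosses_line(60, 8), crosses_line(56, 8))

# Two 4-byte fields used together: adjacent they share one line,
# 64 bytes apart they need two lines.
print(lines_touched(0, 8), lines_touched(0, 4) + lines_touched(64, 4))
```

A check like `crosses_line` is a quick way to reason about whether a 32-bit or 64-bit field could be split across lines, and `lines_touched` gives a rough feel for how many line fills a data layout costs.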

These are the highlights of physical memory requests; the bus will surface again later in connection with locking, multi-threading, and cache coherence. The first time I saw FSB packet descriptions I had a huge “ahhh!” moment so I hope someone out there gets the same benefit. In the next post we’ll go back up the abstraction ladder to take a thorough look at virtual memory.

Comments

15 Responses to “Getting Physical With Memory”

  1. Casey on January 16th, 2009 5:32 pm

    Sorry to nit-pick. On the first graphic, “4 Request Pins REQ[4:0]” looks like a typo to me.

  2. Gustavo Duarte on January 16th, 2009 11:46 pm

    @Casey: Don’t be sorry, I appreciate it. Fixed. Thanks!

  3. JK on January 16th, 2009 11:51 pm

    I wish your blog was around when I was at college. Very useful stuff.

  4. Santiago on January 17th, 2009 4:05 am

    I think the “Attribute Signals” table has another typo: Both values for the Write-protected and Write-back items are “110”. Cheers

  5. Gustavo Duarte on January 17th, 2009 12:10 pm

    @Santiago: Fixed, thanks for letting me know. The Write-back should have been 111. Have a great weekend.

  6. McGrew Security Blog » Blog Archive » Gustavo Duarte’s Great Internals Series on January 27th, 2009 3:23 pm

    [...] Getting Physical With Memory [...]

  7. links for 2009-01-27 « Donghai Ma on January 27th, 2009 9:00 pm

    [...] Getting Physical With Memory : Gustavo Duarte (tags: memory hardware x86 internals computer) [...]

  8. Romeo on January 29th, 2009 12:17 pm

    Great post. Looking how ugly the Intel architecture is, it reminded me of a famous quotation of a computer scientist whose name I don’t remember now, that once said something like this: “Cache is not architecture. It is a performance hack”. Indeed. Cheers,

    Romeo

    P.S. Your work has been considered extremely educational. Greetings from a fellow countryman.

  9. avaz on February 17th, 2009 1:29 am

    ..awesome blog, really useful stuff..hope you don’t mind me pinching some of your diagrams for my network engineering course ;-)

    Cheers

  10. Gustavo Duarte on February 18th, 2009 8:55 am

    @avaz: That’s fine, I’m happy to hear about the stuff being used in courses. I only ask that you credit the blog (say, put the URL somewhere so people might reach it). Cheers.

  11. Ya-tou & me » Blog Archive » How The Kernel Manages Your Memory on February 19th, 2009 1:43 am

    [...] as a large block called the physical address space. While memory operations on the bus are somewhat involved, we can ignore that here and assume that physical addresses range from zero to the top of available [...]

  12. How The Kernel Manages Your Memory « Motherboard Blog on May 14th, 2009 4:41 pm

    [...] as a large block called the physical address space. While memory operations on the bus are somewhat involved, we can ignore that here and assume that physical addresses range from zero to the top of available [...]

  13. Khushal Singh Narooka on July 19th, 2009 9:14 pm

    Fantastic blog I ever found over web, well explained and exceptionally good diagrams.

    Regards
    Khushal

  14. avinash on July 22nd, 2009 3:59 am

    good job gustavo…… I am become fan of you :)

  15. Samip on February 3rd, 2010 6:01 pm

    I had that “aagh” moment as I always wondered how “volatile” keyword does its function and description of this bus explains a lot of things. I assume kernel marks them as uncacheable and hence everything else is taken care by hardware…. Thank You for wonderful article…
