What Your Computer Does While You Wait
This post takes a look at the speed - latency and throughput - of various subsystems in a modern commodity PC, an Intel Core 2 Duo at 3.0GHz. I hope to give a feel for the relative speed of each component and a cheatsheet for back-of-the-envelope performance calculations. I’ve tried to show real-world throughputs (the sources are posted as a comment) rather than theoretical maximums. Time units are nanoseconds (ns, 10^-9 seconds), milliseconds (ms, 10^-3 seconds), and seconds (s). Throughput units are in megabytes and gigabytes per second. Let’s start with CPU and memory, the north of the northbridge:
The first thing that jumps out is how absurdly fast our processors are. Most simple instructions on the Core 2 take one clock cycle to execute, hence a third of a nanosecond at 3.0GHz. For reference, light only travels ~4 inches (10 cm) in the time taken by a clock cycle. It’s worth keeping this in mind when you’re thinking of optimization - instructions are comically cheap to execute nowadays.
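As a rough check on those two figures, here is the arithmetic as a small Python sketch. The speed of light is the standard vacuum value; the variable names are just for illustration.

```python
# Back-of-the-envelope check: how long is one clock cycle at 3.0 GHz,
# and how far does light travel in that time?

CLOCK_HZ = 3.0e9                       # Core 2 Duo clock rate from the post
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # speed of light in a vacuum

cycle_seconds = 1.0 / CLOCK_HZ
cycle_ns = cycle_seconds * 1e9
light_cm = SPEED_OF_LIGHT_M_PER_S * cycle_seconds * 100
light_inches = light_cm / 2.54

print(f"one cycle: {cycle_ns:.3f} ns")
print(f"light per cycle: {light_cm:.1f} cm ({light_inches:.1f} in)")
```

The result is about a third of a nanosecond per cycle and roughly 10 cm, or just under 4 inches, of light travel.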
As the CPU works away, it must read from and write to system memory, which it accesses via the L1 and L2 caches. The caches use static RAM, a much faster (and more expensive) type of memory than the DRAM used as the main system memory. The caches are part of the processor itself, and in exchange for the pricier memory we get very low latency. One way in which instruction-level optimization is still very relevant is code size. Due to caching, there can be massive performance differences between code that fits wholly into the L1/L2 caches and code that needs to be marshalled into and out of the caches as it executes.
Normally when the CPU needs to touch the contents of a memory region they must either be in the L1/L2 caches already or be brought in from the main system memory. Here we see our first major hit, a massive ~250 cycles of latency that often leads to a stall, when the CPU has no work to do while it waits. To put this into perspective, reading from L1 cache is like grabbing a piece of paper from your desk (3 seconds), L2 cache is picking up a book from a nearby shelf (14 seconds), and main system memory is taking a 4-minute walk down the hall to buy a Twix bar.
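The office analogy is simply a change of scale: stretch time so that one CPU cycle lasts one second. A minimal sketch follows, assuming L1 and L2 latencies of 3 and 14 cycles (inferred from the 3-second and 14-second figures above) and the ~250 cycles quoted for main memory; `human_scale` is a made-up helper name.

```python
# Office analogy: stretch time so that one CPU cycle lasts one second.
# The L1 and L2 cycle counts are inferred from the analogy (3 s and 14 s);
# ~250 cycles for main memory is the figure quoted in the post.

LATENCY_CYCLES = {
    "L1 cache": 3,
    "L2 cache": 14,
    "main memory": 250,
}

def human_scale(cycles):
    """Format a cycle count as minutes and seconds at one second per cycle."""
    minutes, seconds = divmod(cycles, 60)
    return f"{minutes} min {seconds} s" if minutes else f"{seconds} s"

for name, cycles in LATENCY_CYCLES.items():
    print(f"{name}: {cycles} cycles -> {human_scale(cycles)}")
```

At this scale main memory comes out at 4 minutes 10 seconds, which is where the 4-minute walk comes from.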
The exact latency of main memory is variable and depends on the application and many other factors. For example, it depends on the CAS latency and specifications of the actual RAM stick that is in the computer. It also depends on how successful the processor is at prefetching - guessing which parts of memory will be needed based on the code that is executing and having them brought into the caches ahead of time.
Looking at L1/L2 cache performance versus main memory performance, it is clear how much there is to gain from larger L2 caches and from applications designed to use them well. For a discussion of all things memory, see Ulrich Drepper’s What Every Programmer Should Know About Memory (pdf), a fine paper on the subject.
People refer to the bottleneck between CPU and memory as the von Neumann bottleneck. Now, the front side bus bandwidth, ~10GB/s, actually looks decent. At that rate, you could read all of 8GB of system memory in less than one second or read 100 bytes in 10ns. Sadly this throughput is a theoretical maximum (unlike most others in the diagram) and cannot be achieved due to delays in the main RAM circuitry. Many discrete wait periods are required when accessing memory. The electrical protocol for access calls for delays after a memory row is selected, after a column is selected, before data can be read reliably, and so on. The use of capacitors calls for periodic refreshes of the data stored in memory lest some bits get corrupted, which adds further overhead. Certain consecutive memory accesses may happen more quickly but there are still delays, and more so for random access. Latency is always present.
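To see why the theoretical figure flatters the bus, it helps to put the transfer times next to the latency of a single memory access. A small sketch, using only the ~10GB/s and ~250 cycle figures from this post (the function name is illustrative):

```python
# Theoretical transfer times over the front side bus, ignoring latency,
# compared with the latency of one main memory access.

FSB_BYTES_PER_S = 10e9   # ~10 GB/s, the theoretical peak quoted above
CLOCK_HZ = 3.0e9         # 3.0 GHz Core 2 Duo
RAM_LATENCY_CYCLES = 250 # approximate main memory latency from the post

def transfer_seconds(num_bytes, bytes_per_s=FSB_BYTES_PER_S):
    """Time to move num_bytes at a fixed rate, with zero latency."""
    return num_bytes / bytes_per_s

all_memory_s = transfer_seconds(8e9)          # all 8 GB of system memory
small_read_ns = transfer_seconds(100) * 1e9   # a 100-byte read
ram_latency_ns = RAM_LATENCY_CYCLES / CLOCK_HZ * 1e9

print(f"8 GB at peak rate: {all_memory_s:.1f} s")
print(f"100 bytes at peak rate: {small_read_ns:.0f} ns")
print(f"one main memory access: {ram_latency_ns:.0f} ns of latency")
```

The 100-byte read would take about 10ns of pure transfer time, but a single trip to main memory costs roughly 83ns before any of it arrives, so for small reads the latency dominates.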
Down in the southbridge we have a number of other buses (e.g., PCIe, USB) and peripherals connected:
Sadly the southbridge hosts some truly sluggish performers, for even main memory is blazing fast compared to hard drives. Keeping with the office analogy, waiting for a hard drive seek is like leaving the building to roam the earth for one year and three months. This is why so many workloads are dominated by disk I/O and why database performance can drive off a cliff once the in-memory buffers are exhausted. It is also why plentiful RAM (for buffering) and fast hard drives are so important for overall system performance.
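The "one year and three months" figure can be run backwards to see what real seek time it stands for. This sketch assumes the same one-cycle-per-second scale as the office analogy and a 3.0GHz clock; the year length of 365.25 days is my choice.

```python
# Convert the "one year and three months" analogy back into a real seek time,
# at a scale of one CPU cycle per second of office time.

CLOCK_HZ = 3.0e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

analogy_seconds = 1.25 * SECONDS_PER_YEAR   # one year and three months
seek_cycles = analogy_seconds               # one cycle per analogy second
seek_ms = seek_cycles / CLOCK_HZ * 1e3

print(f"{seek_cycles / 1e6:.1f} million cycles, about {seek_ms:.1f} ms")
```

That works out to roughly 39 million cycles, or about 13 milliseconds, spent waiting on one seek.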
While the “sustained” disk throughput is real in the sense that it is actually achieved by the disk in real-world situations, it does not tell the whole story. The bane of disk performance is the seek, which involves moving the read/write heads across the platter to the right track and then waiting for the platter to spin around to the right position so that the desired sector can be read. Disk RPMs refer to the speed of rotation of the platters: the higher the RPM, the less time you wait on average for the rotation to give you the desired sector, hence higher RPMs mean faster disks. A cool place to read about the impact of seeks is the paper where a couple of Stanford grad students describe the Anatomy of a Large-Scale Hypertextual Web Search Engine (pdf).
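The rotational part of the wait follows directly from the RPM: on average the desired sector is half a rotation away. A minimal sketch of that calculation (the RPM values are common drive speeds picked for illustration, not figures from the diagram):

```python
# Average rotational latency: on average the platter must turn half a
# rotation before the desired sector passes under the head.

def avg_rotational_latency_ms(rpm):
    """Average wait for the sector, in milliseconds, for a given RPM."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2

for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

A 7200 RPM disk averages a little over 4 ms of rotational delay on top of the head movement, while a 15,000 RPM disk cuts that to 2 ms.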
When the disk is reading one large contiguous file it achieves greater sustained read speeds due to the lack of seeks. Filesystem defragmentation aims to keep files in contiguous chunks on the disk to minimize seeks and boost throughput. When it comes to how fast a computer feels, sustained throughput is less important than seek times and the number of random I/O operations (reads/writes) that a disk can do per time unit. Solid state disks can make for a great option here.
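A rough model shows how far apart random and sequential performance can be. Every number in this sketch is an assumption chosen for illustration (an 8.5 ms average seek, a 7200 RPM drive, 60 MB/s sustained reads, 4 KB random reads), not a measurement from the diagram.

```python
# Rough model of random I/O on a spinning disk. All constants below are
# illustrative assumptions, not measured values.

AVG_SEEK_MS = 8.5           # assumed average head movement time
RPM = 7200                  # assumed rotation speed
SUSTAINED_MB_PER_S = 60.0   # assumed sequential read rate
READ_KB = 4                 # assumed size of each random read

rotational_ms = 60_000.0 / RPM / 2                          # half a rotation
transfer_ms = (READ_KB / 1000) / SUSTAINED_MB_PER_S * 1000  # moving the data
ms_per_random_read = AVG_SEEK_MS + rotational_ms + transfer_ms

iops = 1000.0 / ms_per_random_read
random_mb_per_s = iops * READ_KB / 1000

print(f"random reads per second: {iops:.0f}")
print(f"random throughput: {random_mb_per_s:.2f} MB/s "
      f"vs {SUSTAINED_MB_PER_S:.0f} MB/s sequential")
```

Under these assumptions the disk manages fewer than 100 random reads per second, which is well under 1 MB/s of useful data even though the same drive streams sequential data two orders of magnitude faster.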
Hard drive caches also help performance. Their tiny size - a 16MB cache in a 750GB drive covers only 0.002% of the disk - suggests they’re useless, but in reality their contribution is allowing a disk to queue up writes and then perform them in one bunch, thereby allowing the disk to plan the order of the writes in a way that - surprise - minimizes seeks. Reads can also be grouped in this way for performance, and both the OS and the drive firmware engage in these optimizations.
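To make the reordering idea concrete, here is a sketch of the classic "elevator" ordering: sweep the head in one direction, then back. This is a textbook illustration only; I am not claiming any particular drive firmware or OS scheduler works exactly this way, and the track numbers are hypothetical.

```python
# Textbook "elevator" reordering of queued requests, compared with serving
# them in arrival order. Track numbers are hypothetical.

def head_travel(start, tracks):
    """Total distance the head moves when visiting tracks in the given order."""
    total, position = 0, start
    for track in tracks:
        total += abs(track - position)
        position = track
    return total

def elevator_order(start, tracks):
    """One upward sweep from the head position, then one downward sweep."""
    upward = sorted(t for t in tracks if t >= start)
    downward = sorted((t for t in tracks if t < start), reverse=True)
    return upward + downward

pending = [98, 183, 37, 122, 14, 124, 65, 67]   # queued requests by track
head = 53                                        # current head position

print("arrival order :", head_travel(head, pending), "tracks travelled")
print("elevator order:", head_travel(head, elevator_order(head, pending)),
      "tracks travelled")
```

For this queue the head travels 640 tracks in arrival order and 299 after reordering, which is the whole point of letting writes pile up briefly before committing them.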
Finally, the diagram has various real-world throughputs for networking and other buses. Firewire is shown for reference but is not available natively in the Intel X48 chipset. It’s fun to think of the Internet as a computer bus. The latency to a fast website (say, google.com) is about 45ms, comparable to hard drive seek latency. In fact, while hard drives are 5 orders of magnitude removed from main memory, they’re in the same order of magnitude as the Internet. Residential bandwidth still lags behind that of sustained hard drive reads, but the ‘network is the computer’ in a pretty literal sense now. What happens when the Internet is faster than a hard drive?
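The orders-of-magnitude claim is easy to check with a logarithm. In this sketch the main memory latency is ~250 cycles at 3.0GHz and the Internet latency is the 45ms quoted above; the ~13ms disk seek is an assumption, being what the "one year and three months" analogy works out to at 3.0GHz.

```python
import math

# How many orders of magnitude separate main memory, a disk seek, and a
# round trip to a fast website?

RAM_LATENCY_S = 250 / 3.0e9   # ~250 cycles at 3.0 GHz, about 83 ns
DISK_SEEK_S = 13e-3           # assumed: implied by the office analogy
INTERNET_S = 45e-3            # latency to a fast website, from the post

def orders_apart(slow_s, fast_s):
    """Base-10 logarithm of the ratio between two latencies."""
    return math.log10(slow_s / fast_s)

print(f"disk vs main memory: {orders_apart(DISK_SEEK_S, RAM_LATENCY_S):.1f}")
print(f"Internet vs disk:    {orders_apart(INTERNET_S, DISK_SEEK_S):.1f}")
```

The disk sits a bit more than 5 orders of magnitude away from main memory, while the Internet is only about half an order of magnitude away from the disk.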
I hope this diagram is useful. It’s fascinating for me to look at all these numbers together and see how far we’ve come. Sources are posted as a comment. I posted a full diagram showing both north and south bridges here if you’re interested.
Comments
114 Responses to “What Your Computer Does While You Wait”
EDIT: updated 3/23/2009
Instruction and L1/L2 latencies come from Intel’s Optimization Reference Manual.
DMI speed is from Intel’s datasheet on the chipset shown in the diagram, which you can find here.
Front side bus bandwidth is from Ulrich Drepper’s paper and common knowledge of the FSB.
The hardest bit of info in the diagram is actually the RAM latency. I looked at several of Drepper’s experiments to decide on ~250 cycles. I believe this is a fair number, close to real-world situations, but reality is much more complicated than a single value conveys. TLB is ignored as well in this post.
SATA bandwidth from Wikipedia.
Hard drive I/O specs from Storage Review.
Firewire and USB 2.0 speed: http://www.barefeats.com/usb2.html,
http://www.firewire-1394.com/firewire-vs-usb.htm
USB 1.0: http://www.tomshardware.com/reviews/step,449-7.html
It’s hard to find comprehensive information for Ethernet speeds. Andre went into detail in a great comment below on why many of the speeds we see posted online are too slow. He also posted results from a netio run for his network. Some slower speeds (with a Windows bias, hence hampered by SMB) can be found in:
http://www.ezlan.net/net_speed.html
http://www.ezlan.net/giga.html
http://www.hanselman.com/blog/WiringTheHouseForAHomeNetworkPart5GigabitThroughputAndVista.aspx
As per Scott Hanselman’s post, jumbo frames can make a big difference in Gigabit performance, boosting throughput by as much as 100%.
These bus and network throughput values match up with rules of thumb I often see in the trenches. If you know of more thorough benchmarks corroborating or contradicting anything, please let me know. Thanks!
@Gustavo: x16 PCIe slot has wrong speed, it’s 500MB/s per lane @ PCIe 2.0 so x16 would be 8GB/s…
Great post Gustavo, keep up the good work!
The L2 cache can catch you out when working on large data structures. I remember when L2 caches were 4MB getting caught out scaling images for a texture mapped game. Once the images approached 4MB in size the performance dropped off a cliff. I know that now the GPU does a lot of this scaling and heavy-lifting for you but the principle when managing any large structure is the same.
Very informative, enjoyed reading it! Thanks.
I’m jealous of your hypothetical 7500 GB hard drive.
It would be interesting to compare this to the new Intel X58 chipset.
This is a great compilation of memory hierarchy performance numbers. One word of caution though—you mix in the numbers for latency and bandwidth, which might lead a superficial reader to confuse the effect of the two. Bandwidth is the crucial one, because if it’s not adequate, there’s nothing you can do.
Latency, on the other hand, can be remediated: even though bad latency can kill a good bandwidth, caching/prefetching or amortization over large transfers can mask it out. Unfortunately, those techniques assume specific data access patterns, and if those assumptions fail, latency becomes a problem. This leads to a circular trap in system design: the access pattern has to be well characterized for a successful latency remediation, so that only programs with ‘conforming’ access pattern run well on this system. Paradoxically, the better-optimized systems penalize weird and non-standard access patterns even more.
Gigabit Ethernet is a good example: 1 Gbps is a great peak bandwidth, but the actual practical bandwidth is much lower, due to relatively short packets (you quote 30 MBps instead of expected 100 megabytes/sec). The actual number for a stupid protocol (e.g. synchronous exchange of short packets) would be much smaller and probably not much faster than 100baseT, because it would have been dominated by packet send/receive latency.
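The point about latency-dominated protocols can be put into rough numbers. This is a sketch under assumed values: a synchronous exchange that sends one 1000-byte payload and waits a 100 microsecond round trip before sending the next. Both figures are hypothetical, and the raw line rate of gigabit Ethernet is taken as 1 Gbps, i.e. 125 MB/s before any protocol overhead.

```python
# Effective throughput of a synchronous exchange of short packets:
# each payload must be sent and acknowledged before the next one goes out.
# Payload size and round-trip time are hypothetical.

def effective_mb_per_s(link_bits_per_s, payload_bytes, round_trip_s):
    """Payload bytes delivered per second, in MB/s, for a stop-and-wait exchange."""
    wire_s = payload_bytes * 8 / link_bits_per_s   # time on the wire
    return payload_bytes / (wire_s + round_trip_s) / 1e6

PAYLOAD_BYTES = 1000    # assumed short packet
ROUND_TRIP_S = 100e-6   # assumed LAN send/receive latency

gigabit = effective_mb_per_s(1e9, PAYLOAD_BYTES, ROUND_TRIP_S)
fast_ethernet = effective_mb_per_s(100e6, PAYLOAD_BYTES, ROUND_TRIP_S)

print(f"gigabit:   {gigabit:.2f} MB/s")
print(f"100baseT:  {fast_ethernet:.2f} MB/s")
```

With these assumptions a link that is ten times faster on paper delivers less than twice the throughput, because most of each exchange is spent waiting rather than transmitting.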
Great post. Think I found a typo:
“Now, the front side bus bandwidth, ~10GB/s, actually looks decent. At that rate, you could read all of 8GB of system memory under 10 seconds or read 100 bytes in 10ns.”
Shouldn’t that only take 1 second (or less) to read all of main memory, not 10s (in theory)?
Excellent article, thank you!
I’m curious, what software did you use to create the graphics in the article?
great article - very interesting.
i have the same chip in my home pc - good to know what it’s up to
Thanks, I really enjoyed this article
Wow, totally fascinating article indeed. Well done.
jess
Thank you all for the feedback!
@Miha: Thank you, will fix the diagram.
@Michael: I wish. hahah. Fixed.
@Przemek: that was an interesting point. Good material for another post
@Kevin: yep, typo indeed, thanks!
Also, I think I got the amount of L1 cache wrong, will fix that too.
Excellent article, thank you very much. What software did you use for the graphics? (Magnus has also asked this).
Light travels 11.78 inches in a nanosecond.
@Magnus, @saurabh: Thanks for the feedback. The diagrams are all made in Visio 2007.
@Keith: but we have 3 cycles per nanosecond, so 11.78/3 ~= 4.
Wonderful article. I love all the pages in this site!
Nice post! Good diagrams. And finally, a clear illustration of why optimization is the LAST thing most developers should worry about, as long as bug-infested software is still the norm.
Great post! Thanks!
WHAT A WASTE OF TIME AND INTERNET SPACE. WHY NOT REPORT ON THE GOOD THAT OUR TROOPS ARE DOING IN IRAQ? WHERE ARE THE GOOD STORIES?? OUR FREEDOM FIGHTERS HAVE OCCOMPLISHED SO MUCH IN THE LAST FEW YEARS, AND THE LIBERAL INTERNET IGNORES THEM, JUST LIKE THEY IGNORE GOD AND THE TEN COMMANDMENTS!!!
IF YOU MUST HATE THE TROOPS FROM YOUR LIBERAL IDEOLOGY, WHY NOT AT LEAST TALK ABOUT JESUS? I SAW NO REFERENCES TO JESUS ON YOUR WEBSITE!! WHY NO BIBLE PASSAGES AT LEAST?? THESE MEN ARE DYING FOR YOUR FREEDOM TO SIT AROUND AND TYPE USELESS CRAP, AND ALL YOU CAN DO IS DEFICATE ON THEIR IMAGE AND THE SAVIOR FOR WHOM THEY ARE WORKING.
YOU ARE JUST ANOTHER LIBERAL TERRORIST. HOPEFULLY THIS ECONOMY WILL CLEAN UP THE INTERNET AND GET RID OF WEBSITES LIKE THIS. ENJOY HOMELESSNESS, APOSTATE.
Thanks for a very interesting article.
To be consistent, the 1Gbit ethernet should have ~100MB/sec next to it — the rest of the speeds and bandwidth listed are theoretical maxes, whereas the linked article provides the throughput for a particular protocol (tcp/samba) and configuration.
Great article. Now I know why my external USB hard drives and wired ethernet network drives were so slow. Maybe you could add eSATA the next go around.
Gustavo,QUE BUENO!
Well done and expect the criticism. Action will always trump knowledge, and you have both.
This will help my boardroom analogies for the not so technical purse strings.
L
maybe pencil in what the new fast usb 3.0 standard would mean?
What a wonderful post — you are a great teacher. Thanks for posting this! I loved “roam the earth”.
rofl @ #22
I stopped reading when you said this: “It’s worth keeping this in mind when you’re thinking of optimization - instructions are comically cheap to execute nowadays.”
That’s so Java School ignorant.
#22 WTF???
Great article!
Kudos from Indonesia!
Great article! I have the same processor overclocked to 3.8GHz on air, not a big boost, but a little bit helps, can’t remember my RAM speeds, I think my RAM latency is 5-5-5-15. Nice to know what the little buggers up to.
Where the heck did #22 come from? That’s just psychotic.
The comment
“… but in reality their contribution is allowing a disk to queue up writes and then perform them in one bunch, thereby allowing the disk to plan the order of the writes in a way that - surprise - minimizes seeks. Reads can also be grouped in this way for performance, and both the OS and the drive firmware engage in these optimizations.”
is misleading. Actually, Native Command Queuing (NCQ) (the re-ordering to minimize seeks) is only available on certain hard drives (some SATA, SAS and SCSI). Also, NCQ improves performance when the disk requests are somewhat random, for example multiple users accessing different files on a network fileshare. It does not improve performance on a desktop system that is running only one or two applications. In fact it might slow things down a bit because applications may be delayed by the optimization that minimizes seek times at the expense of getting data to the application.
> Residential bandwidth still lags behind that of sustained
> hard drive reads, but the ‘network is the computer’ in a
> pretty literal sense now. What happens when the Internet
> is faster than a hard drive?
Perhaps not on a home PC, but in any datacenter worth the name, roundtrip to the next rack over is already faster than hitting disk. With lots of spindles (ie, database hosts) you can still get greater sustained throughput locally, but for very latency-sensitive applications it’s at the point now that the cheapest way to get 90 gigs of low-latency storage is three hosts with 32G of RAM running memcached.
I don’t imagine it will be long before you can run applications out of S3 as fast as off your local disk. “Cloud computing” is a stupid term, but that doesn’t mean the underlying idea is wrong.
- BB
I think someone forgot about spyware.
I have to concur about the Gigabit Ethernet speeds–I have little difficulty achieving 90MB/s sustained with my NAS over HTTP. Most other protocols are somewhat slower, and I have done slightly better in my UDP experiments, but 90MB/s is a normal speed for actual file transfers for me.
Nice article. Can we get more inputs on how HyperTransport / Intel QPI would fit into the picture.
This was a good post, thanks for the good brush-up on computer architecture. Something I would really like to see though is “What does Windows do while you’re watching the hourglass?”
For instance, you right click on an icon, and it takes 2 seconds for the context menu to pop up. Or you click on the control panel menu, and it takes 10 seconds for the menu to come out. I’m sure that it’s timing out trying to access a network resource or something… but it would be nice to know
Hi!
First off, I love the post. Hours/week of studying in Operating Systems and Computer Architecture courses paid off!
I would like to make a poster of the combined north/south bridge diagram (for personal, non-commercial use). Would you mind if I did so?
#22 : neocon-religio-freak — Christ, that was hilarious!!
However, you really need to go back to grade school and ‘lern how too spel’, or clean all of the ‘defecate’ out of your tiny brain.
Gustavo;
that was the most interesting article I’ve read all day — certainly puts things into perspective. I can only suggest you might adjust the diagram to show bandwidth in relative, graphical terms — although I concede that might prove difficult (eg. 3 seconds : 15 months) may distort the scale.
Great post, really informative. Now to get through the “What every programmer should know about memory” doc.
Thanks Gustavo for this very informative article!
I also liked the document “What Every Programmer Should Know about Memory” - that’s a very thorough paper!
This was pretty interesting, I might combine the images, and make a little poster just for myself, I’ve always been interested in such things.
Post #22 was really just unnecessary, I feel. What is really a waste of time, space and money is having to host that stupid message. Jeez.
I like this line, “A cool place to read about the impact of seeks is the paper where a couple of Stanford grad students describe the Anatomy of a Large-Scale Hypertextual Web Search Engine”. Find out who those grads are
Thanks for the article. The diagrams are great!
Very cool and informative article. Like you said, it’s nice to put things into perspective.
I remember talking about the same stuff many years ago, as I tried to explain to friends why - when the time came for CPUs to be fast enough, it would be faster to write compressed data to hard drives than uncompressed data. (Assuming the CPU would be fast enough to compress the data making it smaller so it would spend less time accessing the “slow” hard drive.)
And the same thing applies to the internet, as you said.
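The compression trade-off described above can be written as a simple break-even model. This is only a sketch: it assumes compression and the disk write happen one after the other rather than overlapping, and the disk rate, compression rate and compression ratio are all hypothetical.

```python
# When is it faster to compress data before writing it to a slow disk?
# Model: compress everything first, then write the smaller result.
# All rates and the compression ratio are hypothetical.

def write_seconds(size_mb, disk_mb_per_s):
    """Time to write the data uncompressed."""
    return size_mb / disk_mb_per_s

def compressed_write_seconds(size_mb, disk_mb_per_s, compress_mb_per_s, ratio):
    """Time to compress the data, then write the compressed output.

    ratio is compressed size divided by original size (0.5 = half the size).
    """
    return size_mb / compress_mb_per_s + size_mb * ratio / disk_mb_per_s

def break_even_compress_rate(disk_mb_per_s, ratio):
    """Compression speed above which compressing first is the faster option."""
    return disk_mb_per_s / (1.0 - ratio)

SIZE_MB, DISK, CPU, RATIO = 1000, 60.0, 200.0, 0.5   # assumed values

plain = write_seconds(SIZE_MB, DISK)
packed = compressed_write_seconds(SIZE_MB, DISK, CPU, RATIO)
print(f"uncompressed: {plain:.1f} s, compressed: {packed:.1f} s")
print(f"break-even compression speed: {break_even_compress_rate(DISK, RATIO):.0f} MB/s")
```

Under these assumptions the compressed write wins once the CPU can compress faster than 120 MB/s, and the same reasoning carries over to a slow network link in place of the disk.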
My computer just says don’t wait…later!
What font did you use for your diagrams? Thanks.
Thank you for the informative article. Added it to my wall in my office. Also to all the relevant persons leaving comments, please stop feeding the troll. Thanks again for a great article!
First I’d suggest labeling every number with Peak or observed, there’s a large difference. I’d also suggest any time you mention latency or bandwidth, that you mention the complementary value as well. So if you mention 83ns of observed latency, mention both the observed bandwidth and the peak bandwidth (1333 MHz * 128 bits)=20.8GB/sec. For devices where random is significantly different than sequential access I’d suggest mentioning both numbers. So sequential and random for memory, and sequential and random for disk.
The throughput of 1 instruction per cycle is particularly bad, it can vary significantly higher and hugely lower based on what you are doing. Maybe mention the peak, and then the issue rate for a particular workload or benchmark?
Bandwidth would be interesting and useful for the caches.
FSB and RAM aren’t the same thing. They often run at different speeds, and of course the mentioned DDR is connected to the north bridge not the FSB. So remove DDR3 from the FSB description.
For a real world memory bandwidth number I’d suggest McCalpin’s stream benchmark (ask google).
The PCI-e numbers you quote are for version 2.0, probably worth mentioning that. The usual PCI-e is 1/2 as fast.
If you are going to mention 83ns of memory latency, I’d mention the bandwidth as well (see stream above).
For the disk, again, you should mention peak (from the specification for the drive) and observed. Keep in mind that the time to read, say, 1GB from a disk will often vary by a factor of 2 based on where the file sits on the disk. I bought a cheap 320GB disk drive that gets 50MB/sec or so on the inside of the disk, and 115MB/sec on the outside.
GIG-e easily manages 90+ MB/sec, without Jumbo frames, of course that assumes an OS that does networking reasonably well. Linux does as do many others, you just have to be careful not to be disk limited.
If you need help collecting some of these numbers let me know.
Of course a related diagram for the Core i7 would be very interesting as well.
Thank you all very much for the feedback and the comments.
There are tons of comments bringing up good points. I’m going to post some replies tomorrow (Thursday) and do some updates on the post reflecting them.
cheers
@b: Absolutely. 99% of the time you want the cleanest, simplest code you can possibly write. In a few hotspots, which you discover by PROFILING the code rather than guessing, you optimize for performance if it’s really called for.
@Jason: When I looked for benchmarks, I found a lot of folks capping out at 30MB/s, 40MB/s for gigabit ethernet. Certainly though OS, protocol, and network gear make a difference. Do you have some benchmarks handy for gigabit ethernet?
@lrbell: thanks for the suggestion. I’ll keep USB 3.0 in mind when it’s time to update this thing.
@SirPwn4g3: that’s cool, I have not been overclocking lately, but I used to enjoy it a ton. Just no time
@zenium: this conflicts with the knowledge I have regarding the buffering behavior of disks. I may well be wrong, but I’d appreciate if you could point me to some resources covering this stuff.
When it comes to apps being delayed, true if the OS is doing it, certainly there is room for different I/O scheduling strategies depending on usage, which is what the Linux kernel offers. But this does not apply to the write back cache since the disk accepts the writes immediately and then proceeds to do them in a seek-minimizing way.
@BB: yep, totally agreed. That’s why I said residential there, in a data center or even LAN there are a lot of interesting possibilities. I am into memcached and other RAM solutions as well.
@Dan: I’m going to mention this in the post. Do you have any idea of how average throughputs work out? Most of the benchmarks I found were well below 90MB/s. I’d love more benchmarks on this.
@Billamama: thanks for the suggestion. I’ll queue this up in the !ideas.txt here
@Mike: you might want to try some Sysinternals tools (now Microsoft tools since they were bought out). They shed light on a lot of questions like this for Windows.
@Nick: totally ok with me. I have however posted a combined diagram, which I linked to in the last sentence of the article. Feel free to use my materials, I appreciate a link back though or mention of where it came from.
@kryzstoff: HAH, funny you should mention this
I’m a Tufte fan and I had all sorts of fancy plans for the diagrams, for example I had thought of:
1. Making widths proportional to throughput. Maybe lengths.
2. Making the DISTANCES proportional to latency.
Given the brutal order-of-magnitude differences I experimented with a log scale and so on. But in the end, it looked like crap hahah and I don’t think it conveyed much
So I went for simple simple. I’m sure a more talented person could come up with a way to make it work though.
@Filipe: hey, check the last paragraph, I have a link to the combined image
@Carlos: exactly, compression on the disk is definitely an interesting side effect of all this stuff.
@Sithdartha: It’s Consolas 9pt in Visio 2007. I also set Visio to “Higher quality text display” under Tools > Options > View.
@Bill: thank you for the suggestions. You make a number of good points, but I need to work some of the information in a way that still keeps the diagrams simple and readable. There are tradeoffs involved, some of which pertain to the target audience and the appropriate level of detail. I appreciate the offer for help, I may write you with a couple of questions when I find some time to work on the blog.
Again, thanks everybody for all the kind comments and feedback. It’s great to be able to help out however minimally.
Take care.
Great article.
I just want to part with these wise words :
Bandwidth you can buy, latency is defined by nature.
Very informative, enjoyed reading it! Thanks
“It’s fascinating for me to look at all these numbers together and see how far we’ve come.”
That’s the same thing i thought =)
And the best has yet to come
@ 22,
You wouldn’t be able to use your computer to spread your thoughts if it weren’t for computer experts such as Mr. Duarte. You really should apologize for your overly passionate rant above.
Please try to look at the bigger picture in life, I think you might enjoy yourself if you tone down the emphasis on Jesus…You know, God already knows you appreciate him, so there really is no need to spread your message that way. Good luck in life bud.
Ohh, and I don’t think I want to join your organization there, Republicans for Jesus; I hope the other members aren’t as over the top as you are…
Seriously, good luck.
Very informative and well written, thanks!
My patient waiting for a new article was finally rewarded, thanks Gustavo!
I would like to comment on the analogies, they don’t feel right to me. When you look at the numbers, everything fits, but the _perceived_ delay is much longer; especially with the “1 year 3 months” case.
I interpreted that as “one year and 3 months of standing still”, but then I audited my thought and remembered that in practice the CPU is doing other things (for another process) while the HDD fetches a new chunk of data.
In other words, this can mislead some people (ex: journalists who “translate” articles like these for the masses); for them it is better to convert everything to Libraries of Congress, just to be sure. Otherwise modern systems will be perceived as horribly inefficient.
Hi Alex,
Thanks a ton for the feedback.
That’s a good point you bring up. In a server environment with lots of tasks going on, it is true that you might have workloads that keep the CPU busy by a combination of multiple processes / decent balance between I/O and CPU usage.
In fact, top notch data centers always aim for that, to come up with workloads that keep the processor busy, since efficiency/money and efficiency/electricity is a big concern and idle hardware is money down the drain.
However, I often see servers where what you describe IS the case - the CPUs are always idling whereas the disks are very busy. Sometimes the server is dog slow - due to I/O - while the CPU is barely breaking 25% usage (one core out of four).
So I think modern systems _indeed_ became horribly inefficient. This million-cycle business is terrible. I think that’s why Jeremy Zawodny said the post “made him sad”.
We need to improve this. Maybe SSDs, maybe something else, but something’s gotta give. CPU stalls due to disk are now catastrophic.
Nice to see you again
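To put the “million-cycle business” into rough numbers, here is a minimal sketch. The 3.0 GHz clock comes from the post; the 13 ms seek time is only an illustrative assumption, and the names are made up for this example. The post's analogy appears to map one cycle to about one second (250 cycles of memory latency become a 4-minute walk), so the same scale is applied here.

```python
# Back-of-the-envelope: how many CPU cycles elapse during one disk seek?
CLOCK_HZ = 3.0e9       # from the post: Core 2 Duo at 3.0 GHz
DISK_SEEK_S = 13e-3    # ASSUMPTION: illustrative seek time, not a figure from the post
SECONDS_PER_DAY = 86_400


def cycles_during(latency_s, clock_hz=CLOCK_HZ):
    """Clock cycles that tick by while waiting latency_s seconds."""
    return latency_s * clock_hz


def scaled_days(cycles):
    """Apply the analogy (one cycle feels like one second); result in days."""
    return cycles / SECONDS_PER_DAY


seek_cycles = cycles_during(DISK_SEEK_S)
print(f"{seek_cycles:,.0f} cycles per seek")
print(f"{scaled_days(seek_cycles):.0f} days at one second per cycle")
```

With these assumed inputs a single seek costs tens of millions of cycles, which on the human scale lands in the same range as the “1 year 3 months” case discussed above.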
Great Post Gustavo! The diagrams are great and the analogies help non-technical people get a grasp of what is going on at the time scales that matter to a Computer instruction cycle.
Any thoughts on updating the post once the newer architectures from Intel, such as the QuickPath Interconnect for the i7, become more popular?
@Wanderer: thanks
Yep, a couple of people brought up the i7, I hope to write something on it
[...] fed to the CPU, where optimizations such as branch prediction, memory prefetching and caching have drastic performance implications. What’s worse, much of the above can and does change between different versions of compilers, [...]
[...] blogger Gustavo Duarte offers us an excellent answer to this question in his post “What Your Computer Does While You Wait“. Filed under: Informática [...]
Good article.
It matches other things I’ve been reading. This pyramid of latency and the corresponding trends will have a profound impact on how we program.
Here are some more pointers on recent computer trends: http://blog.monstuff.com/archives/000333.html
In particular, one of the direct consequences of the numbers you present above is that any large data processing needs to read from the disk sequentially (no seeks), like tapes. That’s the design behind Google MapReduce, Hadoop and Cosmos/Dryad: http://www.lexemetech.com/2008/03/disks-have-become-tapes.html
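The “disks have become tapes” point can be sketched with simple arithmetic. Every figure below (seek time, sequential transfer rate, block size) is an assumption chosen for illustration, not a measurement from the post, and the function names are hypothetical.

```python
# Compare reading 1 GiB sequentially vs. as scattered small blocks.
SEEK_S = 10e-3            # ASSUMPTION: average seek + rotational delay
SEQ_BYTES_PER_S = 60e6    # ASSUMPTION: sustained sequential read rate
BLOCK_BYTES = 4096        # ASSUMPTION: size of each random read
TOTAL_BYTES = 1 << 30     # 1 GiB of data to process


def sequential_seconds(total, rate=SEQ_BYTES_PER_S, seek=SEEK_S):
    """One initial seek, then a single streaming read."""
    return seek + total / rate


def random_seconds(total, block=BLOCK_BYTES, rate=SEQ_BYTES_PER_S, seek=SEEK_S):
    """Every block pays a full seek before its transfer."""
    reads = -(-total // block)  # ceiling division
    return reads * (seek + block / rate)


seq = sequential_seconds(TOTAL_BYTES)
rnd = random_seconds(TOTAL_BYTES)
print(f"sequential: {seq:.1f} s, random: {rnd:.1f} s, ratio: {rnd / seq:.0f}x")
```

Under these assumptions the seek-per-block pattern is more than a hundred times slower, which is why batch systems lay data out so it can be streamed.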
Hi,
A very good article. The analogy comparing speeds across the various layers of the memory architecture is superb. The article takes the latest trend in Intel technology, dual core, as its reference, which is quite commendable. Overall, I believe every reader will have enjoyed this article!
.. #22 was obviously bad satire..
Anyway, good post.
Great analogies. I had to come up with a few weird ones of my own when people asked me what possible difference an extra 8MB of cache on the HDD could make. Now I just rattle off the paper example you used in the article. Thanks again.
[...] by the OS when it comes to files. The first one is the mind-blowing slowness of hard drives, and disk seeks in particular, relative to memory. The second is the need to load file contents in physical memory once and share [...]
Hi Gustavo,
A bit late, but I just found your blog last week (following a quote of the week from LWN). Great article; IMO it just missed spelling out the punchline: what is our computer doing while we wait? It’s waiting, too!
I also wanted to complain about your 30MB/s GigE figure. You asked for benchmarks that show 1000Base Ethernet performing better than that. Here is one using netio between two HP ProLiant class servers (neither of them is too new, still P4-Xeon class, and one of them is even pre-PCIe):
[root@alucard ~]# ./netio-126 -t dracula
NETIO - Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel
TCP connection established.
Packet size 1k bytes: 114411 KByte/s Tx, 111119 KByte/s Rx.
Packet size 2k bytes: 114058 KByte/s Tx, 111604 KByte/s Rx.
Packet size 4k bytes: 113891 KByte/s Tx, 113995 KByte/s Rx.
Packet size 8k bytes: 113587 KByte/s Tx, 114077 KByte/s Rx.
Packet size 16k bytes: 111699 KByte/s Tx, 113767 KByte/s Rx.
Packet size 32k bytes: 112168 KByte/s Tx, 84575 KByte/s Rx.
Done.
There is a physical distance of 9km separating these two servers. Each one is connected using 1000BaseT to a switch, the switches are connected using 10GBase-LR (servers are in a single broadcast domain). I’m testing from a CentOS 4.latest, the other side is running W2k3. There are no jumbo frames in use (or, for that matter, any kind of non-standard optimization).
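For context, the figures above can be compared against the ceiling for TCP payload on gigabit Ethernet with standard 1500-byte frames. This is a rough sketch: it assumes IPv4 and TCP headers without options, and it assumes netio's “KByte” means 1024 bytes, which the output itself does not state.

```python
# Ceiling for TCP payload throughput on 1000Base Ethernet, standard frames.
LINE_RATE_BYTES_PER_S = 1e9 / 8     # 1 Gbit/s expressed in bytes per second
MTU = 1500                          # standard Ethernet payload size
IP_TCP_HEADERS = 20 + 20            # ASSUMPTION: IPv4 + TCP, no options
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12 # preamble/SFD, header, FCS, inter-frame gap
KBYTE = 1024                        # ASSUMPTION: netio's unit is 1024 bytes


def tcp_goodput_ceiling():
    """Best-case TCP payload bytes per second at line rate."""
    payload = MTU - IP_TCP_HEADERS
    on_wire = MTU + ETHERNET_OVERHEAD
    return LINE_RATE_BYTES_PER_S * payload / on_wire


measured_tx = 114411 * KBYTE        # the 1k packet size Tx line above
ceiling = tcp_goodput_ceiling()
print(f"ceiling:  {ceiling / 1e6:.1f} MB/s")
print(f"measured: {measured_tx / 1e6:.1f} MB/s ({measured_tx / ceiling:.1%} of ceiling)")
```

Under those assumptions the measured transmit rate sits within a couple of percent of what the wire can carry.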
Whenever a poor measurement is seen in the wild, it’s usually one or a combination of:
* Crappy hardware. Not so often today with PCIe, but earlier incarnations
of 1000BaseT NICs were often attached insufficiently (PCI 32@33).
There’s also still a difference between an Intel Server NIC and the
typical RealTek cheap-as-dirt stuff. Offloading to the NIC (Checksums
or even the entire TCP processing) also helps a lot.
* Inadequate measuring means. SMB is completely useless, as it is both an
inefficient protocol involving ping-pong latency (you don’t achieve
10MB/s on 100Base either) and is usually disk-and-filesystem-backed.
* Challenged transport protocol implementations. The bandwidth-latency-
product of GigE is demanding, some older TCP stacks cannot really make
use of it (especially with multiple switch hops between the end systems)
So when measuring adequate hardware (both the CPU and I/O subsystem including the NIC have to fit in here) and software (a modern server OS with properly designed TCP stack [window scaling etc] and a pure TCP pumping benchmark program like netio or iperf) you will see 100% standard compliant GigE doing significantly more than 90MB/s without a hitch.
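The bandwidth-latency product mentioned in the list can be made concrete with a small sketch. The round-trip times below are hypothetical values for illustration, not measurements from this setup; 65,535 bytes is the largest TCP window that can be advertised without the window scaling option.

```python
# Bandwidth-delay product for GigE, and the rate an unscaled TCP window allows.
LINK_BITS_PER_S = 1e9
MAX_UNSCALED_WINDOW = 65_535   # bytes; 16-bit TCP window field without scaling


def bdp_bytes(rtt_s, link_bits_per_s=LINK_BITS_PER_S):
    """Bytes that must be in flight to keep the link full."""
    return link_bits_per_s * rtt_s / 8


def window_limited_rate(window_bytes, rtt_s):
    """Upper bound on bytes per second: one window per round trip."""
    return window_bytes / rtt_s


for rtt_ms in (0.2, 0.5, 1.0, 5.0):   # ASSUMPTION: hypothetical round-trip times
    rtt = rtt_ms / 1000
    need = bdp_bytes(rtt)
    cap = window_limited_rate(MAX_UNSCALED_WINDOW, rtt)
    print(f"RTT {rtt_ms:>4} ms: BDP {need / 1024:7.1f} KiB, "
          f"64 KiB window caps at {cap / 1e6:6.1f} MB/s")
```

Once the round trip reaches about a millisecond, the product exceeds what an unscaled window can cover, so a stack without window scaling can no longer fill the link.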
HTH and keep up the good work,
Andre.
[...] I found the article on Gustavo Duarte’s blog and thought it was so good and complete that I didn’t hesitate for a second to ask him for the necessary permissions [...]
[...] Why your computer is slow Published December 1, 2008 Hardware , Uncategorized Tags: PC Why your computer is slow [...]
Requirements…
Requirements End user response time 250ms for all pages 100ms roundtrip between request and response our side…….
@abpsoft: thank you for posting the benchmarks. I have fixed the diagram to reflect them (I went with 100 MB/s). And sorry for the delay in the reply.
Hello!
Very interesting post! Thank you for such an interesting resource!
PS: Sorry for my bad English, I’ve just started to learn this language
See you!
Your, Raiul Baztepo
good article.
regards
shivlu jain
[...] What Your Computer Does While You Wait [...]
Here’s a possible source for gigabit throughput numbers: http://www.smallnetbuilder.com/component/option,com_chart/Itemid,189/
[...] your computer does while you wait By suprgeek Gustavo has a great post on his blog What your computer does while you wait. It has a couple of very excellent diagrams detailing the state of affairs and some nifty work [...]