For a number of years I’ve configured my desktops so that most tasks can be done
using only home row keys on the keyboard, a technique I call home row
computing. It takes the Vi idea of staying on the home row to every app, all
the time, but without modes, which keeps things simpler.
I described an implementation for Windows some years ago, but I have since
moved to Macs and back to a QWERTY layout (away from Dvorak). This post
describes my current setup. It uses familiar Vi key bindings and is far more
suitable than the old one. It's fairly painless to configure on a Mac and has
never given me any problems, thanks to Takayama Fumihiko's awesome keyboard apps.
Using this is a joy. It’s really fast, easy on the hands, and makes you feel
like a geek god. If you don’t use Vim, you’ll now have one of its benefits in
your favorite editor and in other apps, plus a weapon against smug Vimmers. If
you already use Vim, your cherished hjkl keys become universal and pressing
Esc gets a hell of a lot easier.
Some of the important keys to move onto the home row are the arrow keys,
Esc, delete (backspace), and forward delete. Another helpful home row task is
moving and resizing windows. The key to all of this is remapping Caps Lock so
that combinations of Caps Lock plus a home row key perform these tasks.
Again, there are no modes involved: Caps Lock works as a modifier, like
the cmd and fn keys. Here's a good start:
I have left several keys unmapped so you can customize your own setup, and we’ll
get to window management in a moment. The first step is to set Caps Lock to
No Action in System Preferences > Keyboard > Modifier keys:
Now we must remap the Caps Lock key code to something else. To do so, you need
a small tool called Seil
(open source). You can map Caps Lock to any
other key, like cmd or option. So if you don’t want to go all-out home row,
you can still benefit from the remapping.
I like to remap Caps Lock into something that guarantees no conflicts ever
for our combos. So I use key code 110, which is the Apps key on a Windows
keyboard and is safely absent from Apple keyboards:
Now we're in business; the world - or at least the keyboard - is our oyster. The
maker of Seil also makes Karabiner,
also open source and an outstanding
keyboard customizer for OS X. I have no affiliation with these tools, apart from
being a happy user for years. If you end up using them, please donate. So go
ahead and install Karabiner, and you’ll see a plethora of keyboard tweak
possibilities:
Each of the tweaks can be toggled on and off. There are even native Vi, Vim, and
Emacs modes. However, I don’t like the built-in ones, so I built my own config.
Go to Misc & Uninstall and click Open private.xml:
In this file, ~/Library/Application Support/Karabiner/private.xml, you can
define your own keyboard remapping scheme. I actually symlink that to
a Dropbox file to keep the configuration consistent across my machines, but
at any rate, here is a file you can use to implement what we have
discussed so far. Drop the file in, click ReloadXML and you’ll have this:
Home Row Computing is at the top (prefixed with ! for sorting). Toggle it on,
and you’re done. Enjoy your new keyboard layout, do a search on Spotlight and
see how fast and smooth it is to choose an option.
Finally, there is window management. That’s an area where you can fumble quite
a bit, resizing and moving about clumsily with a mouse. My favorite options to
make it fast and homerow-friendly are
ShiftIt (open source) and
Moom (best $10 I ever spent, no affiliation).
There are some others, but to me Moom towers above the rest. It has a great
two-step usage, where one hot key activates it:
And the following key triggers a command you get to define using window
primitives like move, zoom, resize, and change monitors. You can also define
shortcuts that run commands directly. Moom has some handy default actions:
Out of the box, arrow keys can be used to send a window to the left, right, top,
or bottom of the screen, and Moom natively interprets hjkl as arrows, making it
easy to stay on the home row. You can associate keys with various commands and precise
window positions:
This is gold for large monitors like the Apple Thunderbolt Display.
I remap Caps Lock + M into the global Moom shortcut for painless activation.
This allows me to set the shortcut itself to something bizarre that won’t
conflict with anything but would be a dog to type. Currently it’s an
improbable Fn + Control + Command + M.
I also have Caps Lock + N activating a Moom command that cycles a window
between my two monitors. Both of these shortcuts are in the keyboard map
I provided.
If you have any questions, let me know. I know a number of keyboard nuts out
there use this scheme on Windows and Linux, and I hope this makes it easy to do
so on Macs.
I hate to break it to you, but a user application is a helpless brain in a vat:
Every interaction with the outside world is mediated by the kernel through
system calls. If an app saves a file, writes to the terminal, or opens a TCP
connection, the kernel is involved. Apps are regarded as highly suspicious: at
best a bug-ridden mess, at worst the malicious brain of an evil genius.
These system calls are function calls from an app into the kernel. They use
a specific mechanism for safety reasons, but really you’re just calling the
kernel’s API. The term “system call” can refer to a specific function offered by
the kernel (e.g., the open() system call) or to the calling mechanism. You
can also say syscall for short.
This post looks at system calls, how they differ from calls to a library, and
tools to poke at this OS/app interface. A solid understanding of what happens
within an app versus what happens through the OS can turn an impossible-to-fix
problem into a quick, fun puzzle.
So here’s a running program, a user process:
It has a private virtual address space, its very own memory sandbox.
The vat, if you will. In its address space, the program’s binary file plus the
libraries it uses are all memory mapped. Part of the address
space maps the kernel itself.
Below is the code for our program, pid, which simply retrieves its process id
via getpid(2):
#include <stdio.h>
#include <unistd.h>

int main(void) { pid_t p = getpid(); printf("%d\n", p); }
In Linux, a process isn’t born knowing its PID. It must ask the kernel, so this
requires a system call:
It all starts with a call to the C library’s getpid(), which is
a wrapper for the system call. When you call functions like open(2),
read(2), and friends, you’re calling these wrappers. This is true for many
languages where the native methods ultimately end up in libc.
Wrappers offer convenience atop the bare-bones OS API, helping keep the kernel
lean. Lines of code are where bugs live, and all kernel code runs in privileged
mode, where mistakes can be disastrous. Anything that can be done in user mode
should be done in user mode. Let the libraries offer friendly methods and fancy
argument processing a la printf(3).
Compared to web APIs, this is analogous to building the simplest possible HTTP
interface to a service and then offering language-specific libraries with
helper methods. Or maybe some caching, which is what libc’s
getpid() does: when first called it actually performs a system
call, but the PID is then cached to avoid the syscall overhead in subsequent
invocations.
Once the wrapper has done its initial work, it's time to jump into
the kernel. The mechanics of this transition vary by
processor architecture. In Intel processors, arguments and the
syscall number are loaded into registers,
then an instruction is executed to put the CPU
in privileged mode and immediately transfer control to a global syscall
entry point within the kernel. If you’re interested in
details, David Drysdale has two great articles in LWN (first,
second).
The kernel then uses the syscall number as an index into
sys_call_table, an array of function pointers to each syscall implementation.
Here, sys_getpid is called:
In Linux, syscall implementations are mostly arch-independent C functions,
sometimes trivial, insulated from the syscall mechanism by
the kernel’s excellent design. They are regular code working on general data
structures. Well, apart from being completely paranoid about argument
validation.
Once their work is done they return normally, and the arch-specific code takes
care of transitioning back into user mode where the wrapper does some post
processing. In our example, getpid(2) now caches the PID returned by the
kernel. Other wrappers might set the global errno variable if the kernel
returns an error. Small things to let you know GNU cares.
If you want to be raw, glibc offers the syscall(2) function, which makes
a system call without a wrapper. You can also do so yourself in assembly.
There’s nothing magical or privileged about a C library.
This syscall design has far-reaching consequences. Let’s start with the
incredibly useful strace(1), a tool you can use to spy on system calls made by
Linux processes (in Macs, see dtruss(1m) and the amazing dtrace; in Windows,
see sysinternals). Here’s strace on pid:
Each line of output shows a system call, its arguments, and a return value.
If you put getpid(2) in a loop running 1000 times, you would still have only
one getpid() syscall because of the PID caching. We can also see that
printf(3) calls write(2) after formatting the output string.
strace can start a new process and also attach to an already running one. You
can learn a lot by looking at the syscalls made by different programs. For
example, what does the sshd daemon do all day?
~/code/x86-os$ sudo strace -p 12218
Process 12218 attached - interrupt to quit
select(7, [3 4], NULL, NULL, NULL
[ ... nothing happens ... ]

No fun; it's just waiting for a connection using select(2). If we wait long
enough, we might see new keys being generated and so on, but let's attach
again, tell strace to follow forks (-f), and connect via SSH:
~/code/x86-os$ sudo strace -p 12218 -f
[lots of calls happen during an SSH login, only a few shown]
SSH is a large chunk to bite off, but it gives a feel for strace usage. Being
able to see which files an app opens can be useful (“where the hell is this
config coming from?”). If you have a process that appears stuck, you can strace
it and see what it might be doing via system calls. When some app is quitting
unexpectedly without a proper error message, check if a syscall failure explains
it. You can also use filters, time each call, and so on.
I encourage you to explore these tools in your OS. Using them well is like
having a super power.
But enough useful stuff, let’s go back to design. We’ve seen that a userland app
is trapped in its virtual address space running in ring 3 (unprivileged). In
general, tasks that involve only computation and memory accesses do not
require syscalls. For example, C library functions like strlen(3) and
memcpy(3) have nothing to do with the kernel. Those happen within the app.
The man page section for a C library function (the 2 or 3 in parentheses) also
offers clues. Section 2 is used for system call wrappers, while section
3 contains other C library functions. However, as we saw with printf(3),
a library function might ultimately make one or more syscalls.
If you’re curious, here are full syscall listings for Linux
(also Filippo’s list) and
Windows. They have ~310 and ~460 system
calls, respectively. It’s fun to look at those because, in a way, they represent
all that software can do on a modern computer. Plus, you might find gems to
help with things like interprocess communication and performance. This is an
area where “Those who do not understand Unix are condemned to reinvent it,
poorly.”
Many syscalls perform tasks that take eons compared to CPU cycles, for
example reading from a hard drive. In those situations the calling process is
often put to sleep until the underlying work is completed. Because CPUs are so
fast, your average program is I/O bound and spends most of its life
sleeping, waiting on syscalls. By contrast, if you strace a program busy with
a computational task, you often see no syscalls being invoked. In such a case,
top(1) would show intense CPU usage.
The overhead involved in a system call can be a problem. For example, SSDs are
so fast that general OS overhead can be more expensive than the I/O
operation itself. Programs doing large numbers of reads and writes can also have
OS overhead as their bottleneck. Vectored I/O can help some. So can
memory mapped files, which allow a program to read and write from
disk using only memory access. Analogous mappings exist for things like video
card memory. Eventually, the economics of cloud computing might lead us to
kernels that eliminate or minimize user/kernel mode switches.
Finally, syscalls have interesting security implications. One is that no matter
how obfuscated a binary, you can still examine its behavior by looking at the
system calls it makes. This can be used to detect malware, for example. We can
also record profiles of a known program’s syscall usage and alert on deviations,
or perhaps whitelist specific syscalls for programs so that exploiting
vulnerabilities becomes harder. There is a ton of research in this area and
a number of tools, but no killer solution yet.
And that's it for system calls. I'm sorry for the length of this post; I hope it
was helpful. More (and shorter) next week, RSS and Twitter. Also, last night
I made a promise to the universe. This post is dedicated to the glorious Clube
Atlético Mineiro.
In the last post I said the fundamental axiom of OS behavior is that at any
given time, exactly one and only one task is active on a CPU. But if
there’s absolutely nothing to do, then what?
It turns out that this situation is extremely common, and for most personal
computers it’s actually the norm: an ocean of sleeping processes, all waiting on
some condition to wake up, while nearly 100% of CPU time is going into the
mythical “idle task.” In fact, if the CPU is consistently busy for a normal
user, it’s often a misconfiguration, bug, or malware.
Since we can’t violate our axiom, some task needs to be active on a CPU.
First because it’s good design: it would be unwise to spread special cases all
over the kernel checking whether there is in fact an active task. A design is
far better when there are no exceptions. Whenever you write an if statement,
Nyan Cat cries. And second, we need to do something with all those idle CPUs,
lest they get spunky and, you know, create Skynet.
So to keep design consistency and be one step ahead of the devil, OS developers
create an idle task that gets scheduled to run when there’s no other work.
We have seen in the Linux boot process that the idle task is process 0,
a direct descendant of the very first instruction that runs when a computer is
first turned on. It is initialized in rest_init, where init_idle_bootup_task
initializes the idle scheduling class.
Briefly, Linux supports different scheduling classes for things like real-time
processes, regular user processes, and so on. When it’s time to choose a process
to become the active task, these classes are queried in order of priority. That
way, the nuclear reactor control code always gets to run before the web browser.
Often, though, these classes return NULL, meaning they don’t have a suitable
process to run - they’re all sleeping. But the idle scheduling class, which runs
last, never fails: it always returns the idle task.
That’s all good, but let’s get down to just what exactly this idle task is
doing. So here is cpu_idle_loop, courtesy of open source:
cpu_idle_loop:

while (1) {
    while (!need_resched())
        cpuidle_idle_call();

    /* [Note: switch to a different task. We will return to this
       loop when the idle task is again selected to run.] */
    schedule_preempt_disabled();
}
I’ve omitted many details, and we’ll look at task switching closely later on,
but if you read the code you’ll get the gist of it: as long as there’s no need
to reschedule, meaning change the active task, stay idle. Measured in elapsed
time, this loop and its cousins in other OSes are probably the most executed
pieces of code in computing history. For Intel processors, staying idle
traditionally meant running the halt instruction:
hlt stops code execution in the processor and puts it in a halted state. It’s
weird to think that across the world millions and millions of Intel-like CPUs
are spending the majority of their time halted, even while they’re powered up.
It's also not terribly efficient, energy-wise, which led chip makers to develop
deeper sleep states for processors, trading lower power consumption for
longer wake-up latency. The kernel's cpuidle subsystem is
responsible for taking advantage of these power-saving modes.
Now once we tell the CPU to halt, or sleep, we need to somehow bring it back to
life. If you’ve read the last post, you might suspect interrupts are
involved, and indeed they are. Interrupts spur the CPU out of its halted state
and back into action. So putting this all together, here’s what your system
mostly does as you read a fully rendered web page:
Other interrupts besides the timer interrupt also get the processor moving
again. That’s what happens if you click on a web page, for example: your mouse
issues an interrupt, its driver processes it, and suddenly a process is runnable
because it has fresh input. At that point need_resched() returns true, and the
idle task is booted out in favor of your browser.
But let’s stick to idleness in this post. Here’s the idle loop over time:
In this example the timer interrupt was programmed by the kernel to happen every
4 milliseconds (ms). This is the tick period. That means we get 250 ticks per
second, so the tick rate or tick frequency is 250 Hz. That’s a typical value
for Linux running on Intel processors, with 100 Hz being another crowd favorite.
This is defined in the CONFIG_HZ option when you build the kernel.
Now that looks like an awful lot of pointless work for an idle CPU, and it is.
Without fresh input from the outside world, the CPU will remain stuck in this
hellish nap getting woken up 250 times a second while your laptop battery is
drained. If this is running in a virtual machine, we’re burning both power and
valuable cycles from the host CPU.
The solution here is to have a dynamic tick so that when the CPU is idle, the
timer interrupt is either deactivated or reprogrammed to
happen at a point where the kernel knows there will be work to do (for
example, a process might have a timer expiring in 5 seconds, so we must not
sleep past that). This is also called tickless mode.
Finally, suppose you have one active process in a system, for example
a long-running CPU-intensive task. That’s nearly identical to an idle system:
these diagrams remain about the same, just substitute the one process for the
idle task and the pictures are accurate. In that case it’s still pointless to
interrupt the task every 4 ms for no good reason: it’s merely OS jitter slowing
your work ever so slightly. Linux can also stop the fixed-rate tick in this
one-process scenario, in what’s called adaptive-tick mode. Eventually,
a fixed-rate tick may be gone altogether.
That’s enough idleness for one post. The kernel’s idle behavior is an important
part of the OS puzzle, and it’s very similar to other situations we’ll see, so
this helps us build the picture of a running kernel. More next week, RSS and
Twitter.
Here’s a question: in the time it takes you to read this sentence, has your OS
been running? Or was it only your browser? Or were they perhaps both idle,
just waiting for you to do something already?
These questions are simple but they cut through the essence of how software
works. To answer them accurately we need a good mental model of OS behavior,
which in turn informs performance, security, and troubleshooting decisions.
We’ll build such a model in this post series using Linux as the primary OS, with
guest appearances by OS X and Windows. I’ll link to the Linux kernel sources
for those who want to delve deeper.
The fundamental axiom here is that at any given moment, exactly one task is
active on a CPU. The task is normally a program, like your browser or music
player, or it could be an operating system thread, but it is one task. Not
two or more. Never zero, either. One. Always.
This sounds like trouble. For what if, say, your music player hogs the CPU and
doesn’t let any other tasks run? You would not be able to open a tool to kill
it, and even mouse clicks would be futile as the OS wouldn’t process them. You
could be stuck blaring “What does the fox say?” and incite a workplace riot.
That’s where interrupts come in. Much as the nervous system interrupts the
brain to bring in external stimuli - a loud noise, a touch on the shoulder - the
chipset in a computer’s motherboard interrupts the CPU to deliver news of
outside events - key presses, the arrival of network packets, the completion of
a hard drive read, and so on. Hardware peripherals, the interrupt controller on
the motherboard, and the CPU itself all work together to implement these
interruptions, called interrupts for short.
Interrupts are also essential in tracking that which we hold dearest: time.
During the boot process the kernel programs a hardware timer to issue timer
interrupts at a periodic interval, for example every 10 milliseconds.
When the timer goes off, the kernel gets a shot at the CPU to update system
statistics and take stock of things: has the current program been running for
too long? Has a TCP timeout expired? Interrupts give the kernel a chance to both
ponder these questions and take appropriate actions. It’s as if you set periodic
alarms throughout the day and used them as checkpoints: should I be doing what
I’m doing right now? Is there anything more pressing? One day you find ten
years have got behind you.
These periodic hijackings of the CPU by the kernel are called ticks, so
interrupts quite literally make your OS tick. But there’s more: interrupts are
also used to handle some software events like integer overflows and page faults,
which involve no external hardware. Interrupts are the most frequent and
crucial entry point into the OS kernel. They’re not some oddity for the EE
people to worry about, they’re the mechanism whereby your OS runs.
Enough talk, let’s see some action. Below is a network card interrupt in an
Intel Core i5 system. The diagrams now have image maps, so you can click on
juicy bits for more information. For example, each device links to its Linux
driver.
Let’s take a look at this. First off, since there are many sources of
interrupts, it wouldn’t be very helpful if the hardware simply told the CPU
“hey, something happened!” and left it at that. The suspense would be
unbearable. So each device is assigned an interrupt request line, or IRQ,
during power up. These IRQs are in turn mapped into interrupt vectors,
a number between 0 and 255, by the interrupt controller. By the time an
interrupt reaches the CPU it has a nice, well-defined number insulated from the
vagaries of hardware.
The CPU in turn has a pointer to what’s essentially an array of 256 functions,
supplied by the kernel, where each function is the handler for that
particular interrupt vector. We’ll look at this array, the Interrupt Descriptor Table (IDT), in more detail later on.
Whenever an interrupt arrives, the CPU uses its vector as an index into the
IDT and runs the appropriate handler. This happens as a special function call
that takes place in the context of the currently running task, allowing the OS
to respond to external events quickly and with minimal overhead. So web servers
out there indirectly call a function in your CPU when they send you data,
which is either pretty cool or terrifying. Below we show a situation where
a CPU is busy running a Vim command when an interrupt arrives:
Notice how the interrupt’s arrival causes a switch to kernel mode
and ring zero but it does not change the active task. It’s as if Vim made
a magic function call straight into the kernel, but Vim is still there, its
address space intact, waiting for that call to return.
Exciting stuff! Alas, I need to keep this post-sized, so let’s finish up for
now. I understand we have not answered the opening question and have in fact
opened up new questions, but you now suspect ticks were taking place while
you read that sentence. We’ll find the answers as we flesh out our model of
dynamic OS behavior, and the browser scenario will become clear. If you
have questions, especially as the posts come out, fire away and I’ll try to
answer them in the posts themselves or as comments. Next installment is
tomorrow on RSS and Twitter.
The last post in this series looks at closures, objects, and other creatures
roaming beyond the stack. Much of what we’ll see is language neutral, but I’ll
focus on JavaScript with a dash of C. Let’s start with a simple C program that
reads a song and a band name and outputs them back to the user:
puts("Enter song, then band:");
song = read();
band = read();

printf("\n%s\nby %s\n", song, band);
return 0;
}
If you run this gem, here’s what you get (=> denotes program output):
./stackFolly
=> Enter song, then band:
The Past is a Grotesque Animal
of Montreal
=> ?ǿontreal
=> by ?ǿontreal
Ayeee! Where did things go so wrong? (Said every C beginner, ever.)
It turns out that the contents of a function’s stack variables are only valid
while the stack frame is active, that is, until the function returns. Upon
return, the memory used by the stack frame is deemed free and
liable to be overwritten in the next function call.
Below is exactly what happens in this case. The diagrams now have image maps,
so you can click on a piece of data to see the relevant gdb output (gdb commands
are here). As soon as read() is done with the song
name, the stack is thus:
At this point, the song variable actually points to the song name. Sadly, the
memory storing that string is ready to be reused by the stack frame of
whatever function is called next. In this case, read() is called again, with
the same stack frame layout, so the result is this:
The band name is read into the same memory location and overwrites the
previously stored song name. band and song end up pointing to the exact
same spot. Finally, we didn’t even get “of Montreal” output correctly. Can you
guess why?
And so it happens that the stack, for all its usefulness, has this serious
limitation. It cannot be used by a function to store data that needs to outlive
the function’s execution. You must resort to the heap and say
goodbye to the hot caches, deterministic instantaneous operations, and easily
computed offsets. On the plus side, it works:
The price is you must now remember to free() memory or take a performance hit
on a garbage collector, which finds unused heap objects and frees them. That’s
the fundamental tradeoff between stack and heap: performance vs. flexibility.
Most languages’ virtual machines take a middle road that mirrors what
C programmers do. The stack is used for value types, things like integers,
floats and booleans. These are stored directly in local variables and object
fields as a sequence of bytes specifying a value (like argc above). In
contrast, heap inhabitants are reference types such as strings and
objects. Variables and fields contain a memory address that
references these objects, like song and band above.
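A quick sketch of the difference:

```javascript
// Value type: the number is copied into b, so changing b leaves a alone.
var a = 10;
var b = a;
b++;

// Reference type: o and p hold the same heap address.
var o = { n: 10 };
var p = o;
p.n++;

console.log(a, b);     // 10 11
console.log(o.n, p.n); // 11 11
```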
Consider this JavaScript function:
function fn() {
  var a = 10;
  var b = { name: 'foo', n: 10 };
}
This might produce the following:
I say “might” because specific behaviors depend heavily on implementation. This
post takes a V8-centric approach with many diagram shapes linking to relevant
source code. In V8, only small integers are
stored as values. Also,
from now on I’ll show strings directly in objects to reduce visual noise, but
keep in mind they exist separately in the heap, as shown above.
Now let’s take a look at closures, which are simple but get weirdly hyped up and
mythologized. Take a trivial JS function:
function add(a, b) {
  var c = a + b;
  return c;
}
This function defines a lexical scope, a happy little kingdom where the
names a, b, and c have precise meanings. They are the two parameters and
one local variable declared by the function. The program might use those same
names elsewhere, but within add that’s what they refer to. And while
lexical scope is a fancy term, it aligns well with our intuitive understanding:
after all, we can quite literally see the bloody thing, much as a lexer
does, as a textual block in the program’s source.
Having seen stack frames in action, it’s easy to imagine an implementation for
this name specificity. Within add, these names refer to stack locations
private to each running instance of the function. That’s in fact how it
often plays out in a VM.
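The makeGreeter listing itself isn't reproduced here; a sketch consistent with the calls below (defaulting the greeting to 'hi' when none is passed - an assumption on my part) would be:

```javascript
function makeGreeter(greeting) {
  greeting = greeting || 'hi';     // default for the no-argument call
  function greet(name) {
    console.log(greeting + ', ' + name);
  }
  return greet;
}
```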
var hi = makeGreeter();
hi('dear reader'); // prints "hi, dear reader"
That’s more interesting. Function hi is built at runtime within makeGreeter.
It has its own lexical scope, where name is an argument on the stack, but
visually it sure looks like it can access its parent’s lexical scope as well,
which it can. Let’s take advantage of that:
var heya = makeGreeter('HEYA');
heya('dear reader'); // prints "HEYA, dear reader"
A little strange, but pretty cool. There’s something about it though that
violates our intuition: greeting sure looks like a stack variable, the kind
that should be dead after makeGreeter() returns. And yet, since greet()
keeps working, something funny is going on. Enter the closure:
The VM allocated an object to store the parent variable used by the inner
greet(). It’s as if makeGreeter's lexical scope had been closed over at
that moment, crystallized into a heap object for as long as needed (in this case,
the lifetime of the returned function). Hence the name closure, which makes
a lot of sense when you see it that way. If more parent variables had been used
(or captured), the Context object would have more properties, one per
captured variable. Naturally, the code emitted for greet() knows to read
greeting from the Context object, rather than expect it on the stack.
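The listing for this array-taking greeter isn't reproduced here either; a sketch that matches the behavior below, deliberate bug included, might be:

```javascript
function makeGreeter(greetings) {
  var count = 0;
  var greeter = {};
  for (var i = 0; i < greetings.length; i++) {
    var greeting = greetings[i];  // var is function-scoped: one shared slot
    greeter[greeting] = function (name) {
      count++;
      console.log(greeting + ', ' + name);
    };
  }
  greeter.count = function () { return count; };
  return greeter;
}
```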
var greeter = makeGreeter(["hi", "hello", "howdy"]);
greeter.hi('poppet');     // prints "howdy, poppet"
greeter.hello('darling'); // prints "howdy, darling"
greeter.count();          // returns 2
Well… count() works, but our greeter is stuck in howdy. Can you tell why?
What we’re doing with count is a clue: even though the lexical scope is closed
over into a heap object, the values taken by the variables (or object
properties) can still be changed. Here’s what we have:
There is one common context shared by all functions. That’s why count works.
But the greeting is also being shared, and it was set to the last value iterated
over, “howdy” in this case. That’s a pretty common error, and the easiest way to
avoid it is to introduce a function call to take the closed-over variable as an
argument. In CoffeeScript, the do command provides an easy way to
do so. Here’s a simple solution for our greeter:
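One way to write that fix, as a sketch: wrap the loop body in an immediately-invoked function so each iteration captures its own greeting:

```javascript
function makeGreeter(greetings) {
  var count = 0;
  var greeter = {};
  for (var i = 0; i < greetings.length; i++) {
    (function (greeting) {       // each call creates a fresh binding
      greeter[greeting] = function (name) {
        count++;
        console.log(greeting + ', ' + name);
      };
    })(greetings[i]);
  }
  greeter.count = function () { return count; };
  return greeter;
}
```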
var greeter = makeGreeter(["hi", "hello", "howdy"]);
greeter.hi('poppet');     // prints "hi, poppet"
greeter.hello('darling'); // prints "hello, darling"
greeter.count();          // returns 2
It now works, and the result becomes:
That’s a lot of arrows! But here’s the interesting feature: in our code, we
closed over two nested lexical contexts, and sure enough we get two linked
Context objects in the heap. You could nest and close over many lexical
contexts, Russian-doll style, and you end up with essentially a linked list of
all these Context objects.
Of course, just as you can implement TCP over carrier pigeons, there are many
ways to implement these language features. For example, the ES6 spec defines
lexical environments as consisting of an environment record (roughly, the
local identifiers within a block) plus a link to an outer environment record,
allowing the nesting we have seen. The logical rules are nailed down by the spec
(one hopes), but it’s up to the implementation to translate them into bits and
bytes.
You can also inspect the assembly code produced by V8 for specific cases.
Vyacheslav Egorov has great posts and explains this process along with
V8 closure internals in detail. I’ve only started studying V8, so
pointers and corrections are welcome. If you know C#, inspecting the IL code
emitted for closures is enlightening - you will see the analog of V8 Contexts
explicitly defined and instantiated.
Closures are powerful beasts. They provide a succinct way to hide information
from a caller while sharing it among a set of functions. I love that they
truly hide your data: unlike object fields, callers cannot access or even
see closed-over variables. Keeps the interface cleaner and safer.
But they’re no silver bullet. Sometimes an object nut and a closure fanatic will
argue endlessly about their relative merits. Like most tech discussions, it’s
often more about ego than real tradeoffs. At any rate, this epic koan by
Anton van Straaten settles the issue:
The venerable master Qc Na was walking with his student, Anton. Hoping to
prompt the master into a discussion, Anton said “Master, I have heard that
objects are a very good thing - is this true?” Qc Na looked pityingly at
his student and replied, “Foolish pupil - objects are merely a poor man’s
closures.”
Chastised, Anton took his leave from his master and returned to his cell,
intent on studying closures. He carefully read the entire “Lambda: The
Ultimate…” series of papers and its cousins, and implemented a small
Scheme interpreter with a closure-based object system. He learned much, and
looked forward to informing his master of his progress.
On his next walk with Qc Na, Anton attempted to impress his master by
saying “Master, I have diligently studied the matter, and now understand
that objects are truly a poor man’s closures.” Qc Na responded by hitting
Anton with his stick, saying “When will you learn? Closures are a poor man’s
object.” At that moment, Anton became enlightened.
And that closes our stack series. In the future I plan to cover other language
implementation topics like object binding and vtables. But the call of the
kernel is strong, so there’s an OS post coming out tomorrow. I invite you to
subscribe and follow me.