# Buttery Smooth Emacs
Daniel Colascione, Friday, October 27, 2016
Emacs has flickered for 30 years. Now, it should be flicker-free. I've
just [landed][1] support for double-buffered rendering for the X11
port. Now you should be able to edit, resize, and introduce bugs in
your awful codebase without seeing a partially-rendered buffer or
being incited to murder by barely-perceptible white flashes while
editing that disappear when you look at them.
## History
You might say, "That's great, but double-buffered rendering is the
textbook solution to the problem of displaying incomplete rendering to
users and driving them to kill their dogs in maniacal frustration."
That's true, but Emacs predates those textbooks. GNU Emacs is an
old-school C program emulating a 1980s Symbolics Lisp Machine
emulating an old-fashioned Motif-style Xt toolkit emulating a 1970s
text terminal emulating a 1960s teletype. Compiling Emacs is a
challenge. Adding modern rendering features to the redisplay engine is
a miracle.
### Read-Eval-WTF
"Emacs is a great operating system", the old joke goes, "but it just
needs a decent text editor." That's not so far from the truth ---
Emacs is basically a Lisp interpreter married to a big bunch of C code
called [redisplay][2]. (Iä! Iä! Fhtagn!) Under normal operation, Emacs
sits idle waiting for input, reads that input, maps the input to a
command function, executes the command function, and displays the
result of executing that command. It's a fairly simple model: at a
high level, it's not so different from the read-eval-print loop that
you see when you run /usr/bin/python3.
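As a rough illustration of that loop, here is a toy sketch in Python. It is not how Emacs implements its command loop: `KEYMAP`, `command_loop`, and the two commands are names invented for this example, and "display" is just a string.

```python
# Toy command loop: read a key, map it to a command, run it, redisplay.
# Every name here is hypothetical; none of it comes from the Emacs sources.

def insert_char(buffer, key):
    """Default command: the key inserts itself."""
    buffer.append(key)

def delete_backward(buffer, key):
    """Remove the last character, if there is one."""
    if buffer:
        buffer.pop()

KEYMAP = {"DEL": delete_backward}  # any key not listed self-inserts

def redisplay(buffer):
    """Stand-in for redisplay: produce what the user would see."""
    return "".join(buffer)

def command_loop(keys):
    buffer = []
    screens = []
    for key in keys:                            # read
        command = KEYMAP.get(key, insert_char)  # map input to a command
        command(buffer, key)                    # eval
        screens.append(redisplay(buffer))       # "print"
    return screens

screens = command_loop(["h", "i", "DEL", "o"])
print(screens)
```

Each key press produces exactly one redisplay, after the command has finished. That ordering is the whole model.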
But Python has a simple command line interface. Emacs is a visual
system, so "display[ing] the result" has some subtlety to it. Emacs
organizes its view of the outside world into frames (what the rest of
the world calls "windows"), windows (which the rest of the world calls
"panes"), and buffers (which the rest of the world calls "documents").
At any given time, a user might be looking at any number of frames,
displaying any number of buffers distributed into any number of panes.
When a user hits a key, Emacs does whatever the key-command says to
do, then updates these frames, windows, and buffers to reflect the
results of whatever changes that command made. The act of updating
display to reflect Emacs' internal model of the world is called
redisplay. One simple approach to implementing redisplay is to just
redraw all the frames, windows, and buffers from scratch. This
approach might be good enough for a shitty 2016 video game like
Nuclide or Eclipse, but not Emacs.
Emacs was designed for much more constrained systems. Men were men,
women were women, and bandwidth was expensive. Consequently, Emacs
tries very hard to optimize redisplay. Internally, Emacs has a model
of what each frame used to look like, before the last invocation of
redisplay. This model is one of redisplay's inputs. Another input is
the current contents of each Emacs buffer. Redisplay essentially diffs
the last-known display configuration and what it's supposed to be
displaying right now, then emits a minimal set of terminal control
codes needed to change the last-known state to the current-good state.
(Incidentally, this approach, applied to the web and mobile, is the
core of [React][3]. Jordan Walke, eat your heart out.)
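The diffing idea can be sketched in a few lines of Python. This is a deliberately crude, line-granularity version: the function names `diff_redisplay` and `apply_updates` are hypothetical, and the real redisplay is far more elaborate than comparing whole lines.

```python
def diff_redisplay(old_lines, new_lines):
    """Return (row, text) updates for only the rows that differ."""
    updates = []
    height = max(len(old_lines), len(new_lines))
    for row in range(height):
        old = old_lines[row] if row < len(old_lines) else ""
        new = new_lines[row] if row < len(new_lines) else ""
        if old != new:
            updates.append((row, new))
    return updates

def apply_updates(screen, updates):
    """Play the updates back onto a copy of the old screen."""
    screen = list(screen)
    for row, text in updates:
        while len(screen) <= row:
            screen.append("")
        screen[row] = text
    return screen

old = ["int main()", "{", "  return 0;", "}"]
new = ["int main()", "{", "  return 1;", "}"]

updates = diff_redisplay(old, new)
print(updates)
```

Only one of the four rows goes over the wire. On a slow link, that is the difference between usable and unusable.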
All of this redisplay code was written a long time ago. At one time,
it had K&R function prototypes. The authors (mainly [RMS][4], whose
present sanity reflects this effort) intended redisplay to be used on
text terminals over slow links. (Emacs, to this day, has code that
activates if it thinks you're running on a connection slower than 2600
baud.) In this environment, redisplay works very well and affords
compelling advantages.
### Graphical Redisplay
One day, a fool wanted to run Emacs in a GUI as a native GUI program.
The rest is ChangeLog.
To understand why Emacs is so unusual, it's important to understand
how a normal GUI program differs from a normal terminal (henceforth,
TUI) program. A TUI program is driven by its read-eval-print loop,
just like python3 above. Until the program does something, the world
stands still. The program reads input, does something, and squirts out
some bytes in response. Life is simple. The worst problem that a TUI
program has to consider is that the terminal changes size.
By contrast, a GUI program is event driven: innumerable things can
happen to the program, outside of its control. A user can move or
resize a window, click on a button, use VR goggles to lovingly caress
the titlebar, and do other things that are generally unpredictable and
that happen at completely unexpected times. When you write a GUI
program from scratch, you usually register some kind of [callbacks][5]
that run in response to various events happening. In each callback,
the program does some work and displays the result. These callbacks
can happen in arbitrary order at arbitrary times. The GUI event model
is not a hard programming model: it's just different from the TUI one,
because the set of events is much richer.
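For contrast with the read-eval-print loop, here is a minimal sketch of the callback style in Python. The `EventLoop` class and the event names are made up for illustration and do not correspond to any particular toolkit's API.

```python
from collections import defaultdict

class EventLoop:
    """Hypothetical callback registry: handlers run when events arrive."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, event_type, callback):
        self.handlers[event_type].append(callback)

    def dispatch(self, event_type, payload):
        for callback in self.handlers[event_type]:
            callback(payload)

log = []
loop = EventLoop()
loop.on("resize", lambda size: log.append(f"relayout to {size[0]}x{size[1]}"))
loop.on("expose", lambda rect: log.append(f"repaint {rect}"))
loop.on("click", lambda pos: log.append(f"click at {pos}"))

# The program does not choose this order; the outside world does.
incoming = [
    ("expose", (0, 0, 80, 24)),
    ("click", (3, 7)),
    ("resize", (100, 30)),
    ("expose", (0, 0, 100, 30)),
]
for event_type, payload in incoming:
    loop.dispatch(event_type, payload)

print(log)
```

Note that there is no "read a command, then display" step anywhere. Each handler does its bit of work whenever its event shows up.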
Whoever made Emacs into a native [X11][6] program didn't port Emacs to
the event driven model, of which TUI is a neat subset. Instead, he
pretended the GUI was a text terminal. Everything that is wrong with
Emacs stems from this decision. Emacs does not, like most GUI
programs, just receive GUI messages and respond to them. Emacs's main
mode of operation is still a honey-badger-esque read-eval-print loop.
Everything Emacs does to respond to window events happens inside the
read and (horrifyingly) eval parts of this process.
Rendering is worth mentioning. One of the callbacks a normal GUI
program can receive is called Expose. (That's the X11 name: the
Windows equivalent is WM_PAINT.) An Expose event says "I need you [the
program] to render this part of your window. Do it." Most programs
are perfectly happy just responding to Expose callbacks and drawing
what they need to draw, but Emacs is not most programs. Emacs is a
1980s Lisp Machine pretending that it's running on a text terminal.
It's going to draw when it wants to draw, not when some stupid "GUI
system" tells it to draw.
Consequently, Emacs window rendering is "push", not "pull". When Emacs
gets an Expose event, it draws a god damn white square to tide the
window system over until it gets around to letting redisplay (which
still thinks it's talking to a 1960s teletype) redraw the display.
This redisplay happens in terms of character cells and cursor
positions, not pixels. Emacs demands that the window system let Emacs
draw onto the screen whenever it wants, not just in response to an
Expose event.
The pretending doesn't stop at terminals though. The first GUI ports
of Emacs were based on a GUI framework called [Xt][7]. Xt worked well
for many years. (Does "Motif" ring a bell? Yes? If so, you are old
enough to have seen some shit.) But modern, non-Xt toolkits came along
eventually. Xt works very differently from GTK. GTK+ is much better,
but has a different model.
Did Emacs just adapt to whatever these non-Xt toolkits did? Did Emacs
adopt modern best practices? [GTK+][8] is a modern GUI library. Emacs
supports GTK+. Is Emacs a well-behaved GTK+ program now?
LOLOLOLOLOLOLOLOLOLOLOLOL
Of course not. Emacs pretends GTK+ is an old-fashioned Xt toolkit. The
entire Emacs philosophy is to force $MODERN_THING to behave just like
Xt just like a 1960s TTY. Emacs does awful things to GTK+ to maintain
this illusion.
Keep in mind that Emacs xdisp.c tries to support five different
toolkits (including two different major versions of GTK) with #ifdefs.
There is no runtime abstraction. We define three or four different
versions of each damn function. It's a nightmare.
(When I was at Facebook, I was famous for "convincing" Android to do
things it was never intended to do. Do you think that I gained an
appreciation for this perversion when I joined Facebook? Emacs was my
first and best school of awful hacks.)
### SIGIO
Remember how Emacs just does crap, then displays the result, oblivious
to the outside world? This model doesn't work very well when combined
with a window system that can ask Emacs to do arbitrary things at
arbitrary times. While Emacs is in the middle of syntax-highlighting a
20,000 line C++ file, the window system can say "You! Paint your
window! Now!" Emacs could just wait for a convenient time to get
around to doing this painting, but this strategy would produce a poor
user experience.
To provide a better user experience, Emacs installs a SIGIO signal
handler for the X11 socket. Whatever Emacs is doing, wherever it is in
its code, if the GUI wants to tell Emacs something, Emacs stops what
it's doing and runs redisplay. So now redisplay is not only a
fiendishly complicated algorithm designed to minimize 1980s modem
bills, but it also needs to be thread-safe with respect to every other
part of Emacs.
In the SIGIO callback (which runs whenever the GUI asks Emacs to do
something), Emacs runs a very limited version of redisplay. If this
gimped version of redisplay says that it can't cope with the current
state, Emacs arranges for the full version of redisplay to be done
later. In this case, Emacs usually just paints a white background over
whatever area redisplay can't consider at the moment.
SIGIO handlers can interrupt Emacs at any moment. In a sense, it's
like thread safety. Have you ever tried to make a single-threaded
program safe for threads? Hard, yes? Well, Emacs is like that. Except
that we don't acknowledge that we have threads. (Our global lock is
called block_input().)
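To make the "global lock" idea concrete, here is a hedged Python sketch of a counter that defers interrupt work while a critical section is open. Only the name `block_input` comes from the text above; `InputBlocker`, `unblock_input`, `signal_arrived`, and the behavior shown are assumptions for illustration, not a description of the actual Emacs code.

```python
class InputBlocker:
    """Nestable critical sections that postpone an asynchronous handler."""

    def __init__(self, handler):
        self.depth = 0        # how many block_input calls are outstanding
        self.pending = False  # did a signal arrive while blocked?
        self.handler = handler

    def block_input(self):
        self.depth += 1

    def unblock_input(self):
        assert self.depth > 0, "unbalanced unblock_input"
        self.depth -= 1
        if self.depth == 0 and self.pending:
            self.pending = False
            self.handler()    # run the deferred work exactly once

    def signal_arrived(self):
        """What the signal handler would do in this sketch."""
        if self.depth > 0:
            self.pending = True   # not safe right now; remember for later
        else:
            self.handler()

handled = []
blocker = InputBlocker(lambda: handled.append("handled X event"))

blocker.block_input()
blocker.block_input()       # critical sections nest
blocker.signal_arrived()    # deferred
blocker.signal_arrived()    # coalesced with the first
during = list(handled)      # nothing has run yet

blocker.unblock_input()
blocker.unblock_input()     # depth reaches zero, handler runs
print(during, handled)
```

The catch, as with any lock, is that every piece of code touching shared state has to remember to take it.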
What's particularly hilarious is that SIGIO can happen in the middle
of redisplay. The REPL loop (in the Emacs case, not Read Eval Print,
but Read Eval WTF) can be recursive.
### Flicker
Emacs flickers like crazy. This flickering is a predictable
consequence of the "do what the fuck I want when I want it" redisplay
strategy Emacs uses. There's no coordination between your video card,
your GUI system, Emacs, and your sinful soul. Say we're about to draw
a line of text. Step one is to erase that line with the background
color. Step two is to draw each character of text, one by one, of that
line. If your video card happens to refresh in the middle of this
process, you'll see, momentarily, incomplete state. The next frame
will probably be perfect. The GUI system sampled Emacs in the middle
of its drawing operation. You perceive this incomplete rendering as
flicker.
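The erase-then-draw sequence is easy to simulate. The Python sketch below is a toy model, with a hypothetical `draw_line_steps` helper, that records what is on screen after every step; whichever step the video refresh happens to land on is what you see.

```python
def draw_line_steps(width, text, background=" "):
    """Yield the visible row after each step of a direct, unbuffered draw."""
    row = [background] * width
    yield "".join(row)            # step one: erase with the background
    for col, ch in enumerate(text[:width]):
        row[col] = ch
        yield "".join(row)        # step two: one character at a time

frames = list(draw_line_steps(5, "emacs"))
for frame in frames:
    print(repr(frame))
```

Of the six states a refresh could sample here, only the last one is the line you meant to show.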
A program like Emacs can minimize flicker by minimizing the amount of
drawing that it actually does. If Emacs were to redraw every window
every frame, you'd see massive flickering. It's because redisplay
(which is optimized for modems) is pretty good at optimizing the
updating of the screen that you usually don't see flicker. But you
still see it sometimes. It's a fundamental problem. In a
single-buffer, immediate-mode, direct-drawing system, you can always
get unlucky and your GUI can always show you Emacs in the middle of
changing its underwear.
The amount of flicker you actually see depends on things like whether
redisplay optimized the last update, your video driver, and the purity
of your soul. In a single-buffer environment, the compositing manager
and your video driver sample Emacs at essentially arbitrary intervals.
It's only through double buffering that we can guarantee that you see
either valid old state or valid new state, not some random bullshit
in-between.
## Killing Flicker
I am a sinner. Unrepentant. Damned. Thusly, for me, Emacs flickers
constantly. I hate flicker. I love Emacs. Something has to change. I
decided to hack Emacs to eliminate this antediluvian flickering.
Eliminating flicker is not a conceptually hard problem. The basic idea
is that you do your rendering and drawing into some off-screen area,
then, when it's done, atomically (i.e., all at once, all-or-nothing)
show your human user the result of that drawing. A user sees either
the complete new state or the complete old state. Modern GUI toolkits
like Qt and GTK deal with this problem automatically: when a modern
GTK program gets an Expose event, it asks everything affected by this
Expose event to draw onto a bitmap, and when everything has drawn, it
blits this bitmap to the main screen. You never see embarrassing
intermediate state.
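Here is the same idea as a small Python sketch: draw into a back buffer, then publish it in one step. The `DoubleBufferedRow` class and `redraw` helper are invented for this example, and a single assignment stands in for whatever atomic swap a real window system provides.

```python
class DoubleBufferedRow:
    """One row of text with a visible front buffer and a hidden back buffer."""

    def __init__(self, width):
        self.width = width
        self.front = " " * width     # what the user sees
        self.back = [" "] * width    # where all drawing happens

    def erase(self):
        self.back = [" "] * self.width

    def draw_char(self, col, ch):
        self.back[col] = ch

    def flip(self):
        # Publish the finished back buffer in a single step.
        self.front = "".join(self.back)


def redraw(row, text, observe):
    """Erase and redraw `text`, calling observe() after every drawing step."""
    row.erase()
    observe()
    for col, ch in enumerate(text[: row.width]):
        row.draw_char(col, ch)
        observe()
    row.flip()
    observe()


row = DoubleBufferedRow(5)
redraw(row, "hello", lambda: None)   # initial contents

seen = []
redraw(row, "emacs", lambda: seen.append(row.front))
print(sorted(set(seen)))
```

However many times the front buffer is sampled during the redraw, it only ever shows the complete old line or the complete new one.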
It's elegant. It's also easy to implement --- if you're not Emacs.
Recall that Emacs still thinks it's running in a terminal and has
complete control over all output. Flicker was driving me crazy. I had
to retrofit double buffering onto this horrible system.
### Double buffering extension
Modern incarnations of the X Window System have a nice extension
called DOUBLE-BUFFER. This extension lets a program pretend it's
rendering directly to the user while in fact rendering to some
intermediate buffer. Under program control, X11 (the GUI, recall) will
copy this intermediate buffer to the primary display. This
functionality is perfect for Emacs.
### Modernity for Emacs
In order to eliminate flickering from Emacs rendering, I needed to
retrofit double buffering into a Byzantine system. The X double buffer
extension helped. Most of Emacs still believes it's drawing to a
normal X window. The reality is that it's rendering to a back buffer.
Keep in mind that Emacs can draw at any time. It's not enough to just
copy this intermediate buffer to the primary display at the end of
processing each command.
We either render too often, imposing unacceptable load on the X
server, or too seldom, and generate user-visible bugs. Remember that
Emacs can draw at arbitrary points, so there's no clear point at which
we should expose the back buffer, the one that contains the results of
our accumulated and thus-far invisible drawing operations. The code is
reentrant, "thread"-safe, and full of special cases, but it achieves
the result I desired.
The problem is imposing a well-defined render->publish cycle on a
free-for-all program.
### Global Variables Are Not Awful
Eventually, I settled on a solution: a global "block redraw" count.
Emacs has no way of indicating dirtiness in drawing, so I just decided
that any code that asked to draw would mark its parent frame as
"dirty". At the end of processing each X11 event (unless blocked) or
when the blocked count reached zero, we walk over all frames and
buffer-flip any that might have been dirtied since the last
buffer-flip operation. We want to minimize the number of buffer-flip
operations, so we try to coalesce as many as possible, which is
challenging when SIGIO can interrupt anything.
I started out by trying to figure out what parts of the program might
redraw the display and doing a "display flip" (i.e., atomic redraw)
after each, but I quickly started licking the walls and cuddling
myself as I stared off into the distance and imagined better days.
Emacs can redraw the display from anywhere.
Instead, I just created a system where Emacs keeps track of dirty
(i.e., drawn) regions and we redraw at the end of any X command.
Inefficient? Maybe. Satisfying? Eh. Can I sleep tonight? Probably!
When the global lock count transitions from one to zero, we flip dirty
buffers. Redisplay locks buffer flipping, but other components do too,
depending on context.
Now, we redraw when the "block redraw" count transitions from one to
zero. Redisplay always blocks redraw. Asynchronous input blocks
redraw. Timers block redraw. Eventually, it all works out, we
decrement a counter from one to zero, and we show you a new view onto
your shitty awful code.
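The counter-plus-dirty-flags scheme described above can be modeled in Python roughly as follows. This is a sketch of the idea only: the `Display` class and its method names are hypothetical, and the real change has to cope with signals and many special cases that this toy ignores.

```python
class Display:
    """Toy model: drawing marks frames dirty; flips wait for count == 0."""

    def __init__(self, frame_names):
        self.frames = list(frame_names)
        self.block_count = 0   # the global "block redraw" count
        self.dirty = set()     # frames drawn to since their last flip
        self.flips = []        # record of buffer flips, for inspection

    def block_redraw(self):
        self.block_count += 1

    def unblock_redraw(self):
        assert self.block_count > 0, "unbalanced unblock_redraw"
        self.block_count -= 1
        if self.block_count == 0:      # the one-to-zero transition
            self.flip_dirty_frames()

    def draw(self, frame):
        # Any code that draws simply marks its frame dirty.
        self.dirty.add(frame)

    def end_of_event(self):
        # After each event, publish unless something is blocking redraw.
        if self.block_count == 0:
            self.flip_dirty_frames()

    def flip_dirty_frames(self):
        for frame in self.frames:
            if frame in self.dirty:
                self.flips.append(frame)
        self.dirty.clear()


d = Display(["frame-a", "frame-b"])
d.block_redraw()            # e.g. redisplay starts
d.draw("frame-a")
d.block_redraw()            # e.g. a timer fires inside it
d.draw("frame-a")
d.draw("frame-b")
d.unblock_redraw()          # two to one: nothing published yet
flips_while_blocked = list(d.flips)
d.unblock_redraw()          # one to zero: flip each dirty frame once
d.end_of_event()            # nothing dirty, so nothing more to flip
print(flips_while_blocked, d.flips)
```

Three draws across two frames collapse into two flips, and none of them happens while anything still holds the count above zero.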
## Overall Result
Emacs should now render itself as smoothly as any other modern GUI
program. It just provides this functionality through a mechanism
that's completely alien and antithetical to modern GUI frameworks.
Internally, Emacs still believes it's a text program, and we pretend Xt
is a text terminal, and we pretend GTK is an Xt toolkit. It's a
fractal of delusion.
Emacs uses an X11 extension, DOUBLE-BUFFER, largely seen as a
historical artifact. This extension allows us to reuse our existing
drawing code and redirect it to an off-screen buffer. GTK+ or Lucid or
Motif or whatever we're using is oblivious. There are other hacks I
didn't describe, like putting scrollbars in their own X11 window,
contrary to the intent and design of the GTK people: my diff turns
scrollbars and other widgets that share screen space with the
double-buffered region into independent X windows. Overall, it's a
giant hack.
But it works. Somehow, it all works. And as a result, Emacs is as
smooth and flicker-free as any other modern GUI program, and regular
users have no idea what horrors lie beneath.
Damn, I love working on this program.
[1]:
https://lists.gnu.org/archive/html/emacs-diffs/2016-10/msg00307.html
[2]:
https://github.com/emacs-mirror/emacs/blob/c29071587c64efb30792bd72248d3c791abd9337/src/xdisp.c#L13517
[3]:
https://react.dev/
[4]:
https://en.wikipedia.org/wiki/Richard_Stallman
[5]:
https://developer.android.com/reference/android/app/Activity.html#onPause()
[6]:
https://en.wikipedia.org/wiki/X_Window_System#Release_history
[7]:
https://en.wikipedia.org/wiki/X_Toolkit_Intrinsics
[8]:
http://www.gtk.org/