= 31 days of text editors: Sed

:Author: Seth Kenlon
:Email: [email protected]

Few UNIX commands are as famous as sed, LINK-TO-GREP-ARTICLE[grep], and https://opensource.com/article/20/9/awk-ebook[awk].
They often get grouped together, probably because they have strange names, but also because they are powerful tools for parsing text.
They also share some syntactic and logical similarities.
And while they're all useful for parsing text, they each have their specialities.
This article examines the `sed` command, which is a _stream editor_.

I've written about https://opensource.com/article/20/12/sed[sed], as well as its distant relative https://opensource.com/article/20/12/gnu-ed[ed].
To get comfortable with sed, it helps to have some familiarity with ed, because ed forces you to get used to the idea of buffers.
This article also assumes that you're familiar with the very basics of sed, meaning you've at least run the classic `s/foo/bar/` style find-and-replace command.

== Installing

If you're using Linux, BSD, or macOS, then you already have either GNU or BSD sed installed.
These are two unique reimplementations of the original `sed` command, and while they're similar, there can be minor differences.
This article, however, has been tested on both Linux and NetBSD, so you can use whatever sed you find on your computer.

GNU Sed is generally regarded as the most feature-rich sed available, so you might want to try it whether or not you're running Linux.
If you can't find GNU Sed (often called `gsed` on non-Linux systems) in your ports tree, then you can http://www.gnu.org/software/sed/[download its source code from the GNU website].
The nice thing about having GNU Sed installed is that it can be used for its extra functions, but it can also be constrained to conform to just the https://opensource.com/article/19/7/what-posix-richard-stallman-explains[POSIX] specifications of Sed, should you require portability.

Mac users can find GNU sed on https://opensource.com/article/20/6/homebrew-mac[Homebrew] or https://opensource.com/article/20/11/macports[MacPorts].

On Windows, you can https://chocolatey.org/packages/sed[install GNU Sed] with https://opensource.com/article/20/3/chocolatey[Chocolatey].
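
If you're not sure which sed you have, a quick check like the following can help.
This is only a sketch: it assumes that GNU Sed identifies itself with the string `(GNU sed)` in its `--version` output, and it uses GNU's `--posix` option to switch off GNU extensions.
The variable names are just for illustration.

[source,bash]
----
# Heuristic: GNU sed prints "(GNU sed)" in its version banner;
# other seds either reject --version or print something else
if sed --version 2>/dev/null | grep -q '(GNU sed)'; then
    flavor="GNU"
    # --posix tells GNU sed to disable its extensions
    converted=$(echo 'foo' | sed --posix 's/foo/bar/')
else
    flavor="non-GNU"
    converted=$(echo 'foo' | sed 's/foo/bar/')
fi

echo "sed flavor: $flavor"
echo "$converted"
----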

== Pattern space and hold space

Sed works on exactly one line at a time.
Because it has no visual display, it creates a _pattern space_, a space in memory containing the current line from the input stream (with any trailing newline character removed).
Once the pattern space is populated, your instructions to sed are executed.
When the end of the commands is reached, sed prints the contents of the pattern space to the output stream.
The default output stream is *stdout*, but it can be redirected to a file, or written back to the input file itself with GNU Sed's `--in-place=.bak` option, which saves a backup of the original with a `.bak` suffix.

Then the cycle begins again with the next input line.
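
The cycle described above can be observed with a tiny experiment.
This is a minimal sketch: `demo.txt` is a hypothetical scratch file, and the short `-i.bak` form of the in-place option is assumed to be accepted by your sed (it is not part of POSIX).

[source,bash]
----
# demo.txt is a scratch file used only for this sketch
printf 'alpha\nbeta\n' > demo.txt

# With an empty script, each cycle ends by printing the pattern space,
# so the input passes through unchanged
passthrough=$(sed '' demo.txt)
printf '%s\n' "$passthrough"

# With -n, the automatic print at the end of each cycle is suppressed
silent=$(sed -n '' demo.txt)

# Edit the file in place, keeping the original as demo.txt.bak
sed -i.bak 's/alpha/ALPHA/' demo.txt
cat demo.txt demo.txt.bak
----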

For a little more flexibility as you scrub through files, sed also provides a _hold space_ (sometimes also called a _hold buffer_), a space in sed's memory reserved for temporary data storage.
You can think of hold space as a clipboard, and in fact that's exactly what this article demonstrates: how to implement a copy and paste, and a cut and paste, with sed.

First, create a sample text file, with this text as its content:

[source,text]
----
Line one
Line three
Line two
----
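
If you'd rather create the file from a terminal, one way to do it is with a here-document.
The filename `example.txt` matches the one used in the commands throughout this article.

[source,bash]
----
# Write the three sample lines to example.txt
cat > example.txt << 'EOF'
Line one
Line three
Line two
EOF

# Display the file to confirm its contents
cat example.txt
----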

== Copying data to hold space

To place something in sed's hold space, you use the `h` or `H` command.
A lower-case `h` tells sed to overwrite the current contents of hold space, while a capital `H` tells it to append data to whatever's already in hold space.

Used on its own, there's not much to see:

[source,bash]
----
$ sed --quiet -e '/three/ h' example.txt
$
----

All that's happened here is that any line containing `three` has been copied to the hold space.
Nothing is printed, because `--quiet` (the long form of `-n`) suppresses sed's automatic output.

== Copying data from hold space

To get some insight into hold space, you can copy its contents into the pattern space with the `g` command, which overwrites whatever the pattern space held.
Watch what happens:

[source,bash]
----
$ sed -n -e '/three/h' -e 'g;p' example.txt

Line three
Line three
----

The first blank line is printed because the hold space is empty when it's first copied into pattern space.
The next two lines contain 'Line three' because that's what's in hold space from line 2 onward.
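
Now that `g` can reveal what's in hold space, the difference between `h` and `H` becomes visible.
The following is a sketch rather than a recipe: it stores every line, and then uses the `$` address (the last line of input) to copy hold space back and print it once.
The variable names are only for illustration.

[source,bash]
----
# Recreate the sample file so this snippet runs on its own
printf 'Line one\nLine three\nLine two\n' > example.txt

# h overwrites hold space on every line, so only the last line survives
overwritten=$(sed -n 'h;${g;p;}' example.txt)
printf '%s\n' "$overwritten"
# Line two

# H appends a newline plus the current line each time, so hold space
# accumulates the whole file, preceded by one empty line
appended=$(sed -n 'H;${g;p;}' example.txt)
printf '%s\n' "$appended"
----

The empty first line in the second result appears because `H` always adds a newline before the text it appends, even when hold space is empty.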

== Appending data to pattern space

The `G` command appends a newline character and the contents of the hold space to the pattern space.

[source,bash]
----
$ sed -n -e '/three/h' -e 'G;p' example.txt
Line one

Line three
Line three
Line two
Line three
----

The first two lines of this example output contain both the contents of the pattern space (`Line one`) and the empty hold space.
The next two lines both read `Line three`: that line matches the search text, so it is copied to hold space and then appended to its own pattern space.
The hold space doesn't change for the third pair of lines, so the pattern space (`Line two`) is printed with the hold space (still `Line three`) trailing at the end.
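
A side effect of this behavior is that `G` on its own, with nothing ever placed in hold space, appends only the newline.
As a small illustration (the output file `doubled.txt` is a hypothetical name), this double-spaces a file:

[source,bash]
----
# Recreate the sample file so this snippet runs on its own
printf 'Line one\nLine three\nLine two\n' > example.txt

# Hold space stays empty, so G adds a newline and nothing else;
# without -n, sed prints each pattern space automatically
sed G example.txt > doubled.txt
cat doubled.txt
----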

== Cut and paste with sed

Now that you know how to juggle a string from pattern to hold space and back again, you can devise a sed script that copies, then deletes, and then pastes a line within a document.
For instance, the sample file for this article has `Line three` out of order.
Sed can fix that:

[source,bash]
----
$ sed -n -e '/three/ h' -e '/three/ d' \
-e '/two/ G;p' example.txt
Line one
Line two
Line three
----

* The first script finds a line containing the string `three`, and copies it from pattern space to hold space, replacing anything currently in hold space.
* The second script deletes any line containing the string `three`. This completes the equivalent of a *cut* action in a word processor or text editor.
* The final script finds a line containing `two`, and _appends_ the contents of hold space to pattern space. The `p` that follows is not bound to the `/two/` address, so it prints the pattern space of every line that reaches it.

Job done.
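
A copy and paste works the same way, minus the delete.
In this sketch, the line containing `three` stays where it is and a copy of it is pasted after the line containing `two`.
The variable name `copied` is only for illustration.

[source,bash]
----
# Recreate the sample file so this snippet runs on its own
printf 'Line one\nLine three\nLine two\n' > example.txt

# Copy the matching line to hold space, paste it after "two",
# and print every line
copied=$(sed -n -e '/three/ h' -e '/two/ G' -e 'p' example.txt)
printf '%s\n' "$copied"
# Line one
# Line three
# Line two
# Line three
----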

== Sed scripting

I tend to divide actions into separate `-e` scripts.
This isn't always necessary.
The cut-and-paste example above can be written as one script for the same results:

[source,bash]
----
$ sed -n -e \
'/three/h ; /three/d ; /two/ G;p' \
example.txt
Line one
Line two
Line three
----

The important thing is to recognize distinct actions, and to understand when sed moves to the next line, and what the pattern and hold space can be expected to contain.
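
When a script grows beyond a line or two, it can also live in its own file and be loaded with the `-f` option.
The following sketch assumes a hypothetical script file named `cut-paste.sed`, and groups the two `three` actions in braces so the address is written only once.

[source,bash]
----
# Recreate the sample file so this snippet runs on its own
printf 'Line one\nLine three\nLine two\n' > example.txt

# Save the sed commands, one per line, in a script file
cat > cut-paste.sed << 'EOF'
# Cut: copy the matching line to hold space, then delete it
/three/{
h
d
}
# Paste: append hold space after the line containing "two"
/two/G
# Print the pattern space (the script is run with -n)
p
EOF

result=$(sed -n -f cut-paste.sed example.txt)
printf '%s\n' "$result"
# Line one
# Line two
# Line three
----

Because `d` ends the current cycle immediately, the `G` and `p` commands are never reached for the deleted line.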

Of course, the more predictable the text you need to parse, the easier it is to solve your problem with sed.
It's usually not practical to invent "recipes" for sed actions (such as a copy and paste) because the condition to trigger that action is probably different from file to file.
However, the more fluent with sed's commands you become, the easier it is to devise complex actions based on the input you need to parse.

== Download the cheat sheet

Sed is complex.
It has only a couple dozen commands, and yet its flexible syntax and raw power mean it's full of endless potential.
I used to reference pages of clever one-liners in an attempt to get the most use out of sed for myself, but it wasn't until I started inventing (and sometimes reinventing) my own solutions that I felt like I was starting to actually learn sed.
If you're looking for gentle reminders of commands and helpful tips on syntax, LINK-TO-CHEAT-SHEET[download our sed cheat sheet], and start learning sed once and for all!