The vi/ex Editor, Part 7: A Little "R" and "r": The Fine Points
of those Replacement Commands
There's more to R than to r
Quoting in Characters
Readers Ask
Tommy Spratlin writes:
Thai-Nghia Dinh writes:
Next Time Around
This installment of our Vi/Ex tutorial series is a diversion from | |
the subjects I promised at the end of the previous part -- the | |
change is my fault, and yet it is necessary. When I blithely | |
suggested last time that the R command is just like the familiar r | |
command, except for a few differences I mentioned, I was leading | |
you astray. | |
There are several differences, and they can cause problems in
certain uses unless you understand them. And you won't really
comprehend the greatest of them until you know about
metacharacters in insert mode. But as an encouragement to follow
all this, consider that almost all of what I say here about the R | |
command also is valid with all the other commands that put you into | |
text insertion mode: | |
a A i I o O c s :a :i etcetera. | |
There's more to R than to r | |
The r command replaces whatever character is presently under the | |
cursor, so there must be some character under the cursor for it to | |
replace -- otherwise it just gives you an error beep. Not so with | |
R. | |
You can give the R command on an empty line; whatever you type | |
after that, up to the next escape character, will take the place of | |
that empty line just as though you had typed past the end of an | |
existing line after giving an R command. (I was going to say | |
"just as though you had given an a command", but I'm now very | |
leery of making comparisons that are incomplete without paragraphs
of explanations.) You can even start entering text into a | |
brand-new file via the R command. | |
This property can be useful in various situations; I only have
space to mention one. At times I want to type new characters over
blank spaces in a region where some of the lines are empty: they
contain no blanks, indeed no characters at all. But I do not
have to look at each line before I start typing on it, to see
whether I should use an R or an a command, because R will work in
either case.
The R command is more forgiving of your typing errors, too. | |
Whatever character you type after an r is final. If you | |
accidentally typed the wrong character, you can only put back what | |
was there by typing a u command, if the mistake was the last | |
editing command you typed, or put in the replacement you had in | |
mind by returning the cursor to the spot and running another, more | |
careful, r command. | |
But if you mistype during an R command, you can backspace over the | |
error with the backspace key. Then you can type in the character | |
(or characters; you can back up multiple spaces by repeating the | |
backspace key) you should have typed. And if you simply typed too | |
far, you'll be glad to know that backspacing doesn't just remove | |
the incorrect characters, it restores the characters that were | |
there, either right away or as soon as you hit the escape key. You | |
can even backspace over everything you've typed during this R | |
command before you type escape, because the editor does not object | |
to a replacement string length of zero. | |
One caveat here, though, lest my clarification turn out to need a | |
clarification of its own. With either of these commands it is | |
possible to break a line, just by typing the return key as a | |
replacement character, and with the R command this linebreaking can | |
be done either while actually replacing characters or when typing | |
on beyond the end of the existing line. With almost all versions | |
of the editor, it is not possible to backspace over an inserted | |
linebreak, even while you are still in R insertion mode. | |
The most important difference, though, is the handling of | |
metacharacters. Yes, text insertion utilizes metacharacters too, | |
quite apart from the ones that the replacement patterns in | |
:substitute commands use. The r command recognizes hardly any of | |
these metacharacters, and quoting those in as literal characters is | |
very simple. The R command, though, recognizes almost all of them, | |
and quoting characters in with R is rather complicated. | |
Quoting in Characters | |
The phrase "quoting in" is standard terminology, but it is rather | |
misleading in the editor. Unlike Unix shells, the editor does not | |
use any of the ASCII quotation marks: ` ' " (backquote, single and | |
double quote) to quote characters into a file. Instead, it uses | |
the backslash ("\") and control-V ("^V"); the latter is what | |
you send when you press the V key while holding the CONTROL or CTRL | |
key down. In either case, you quote a character in by typing the | |
quoting character just prior to the character you want to quote in. | |
So if @ is your line kill character, and you want to put that | |
character in the text you are typing in, you would have to type | |
either \@ or ^V@ to get it there. And if you want several | |
consecutive characters quoted in, you must quote each of them | |
individually. That is, if you want to put @@@ into a line, you | |
must type either ^V@^V@^V@ or \@\@\@ to put that string there. | |
But \ and ^V are not always interchangeable. In many cases either | |
will work; but sometimes you must choose the right one. Which one | |
to use depends both on what character you want to quote in and | |
whether you're using the r or R command. | |
One obvious use for quoting is to insert a character that normally | |
erases part or all of what you've just typed in. The ASCII | |
backspace character, control-H, must be quoted in, and so must your | |
own line-kill character (@ in the example above) and your own erase | |
character if it is not control-H. With the r command you quote in | |
any of these with a backslash; when using R you may quote any of | |
these in using either backslash or control-V. | |
A pause here, to answer a question that might be in the minds of | |
people who know a little about Unix internals. Ordinarily it is | |
the asynchronous serial terminal line (or TTY) driver that | |
recognizes the erase and line-kill characters and edits the input | |
line accordingly without including these characters in the final | |
result. Then, how can one enter these same input-line characters | |
into the edit buffer if they don't get past the TTY driver? | |
Because Vi/Ex places the TTY driver into a special "raw" mode that
ignores the line-editing characters, passing them on to the editor.
Otherwise you would not be able to quote these characters in.
Also, the editor is set up to discover your erase and line-kill | |
characters by querying your personal environment, and then | |
interpret these characters as the line driver would have. A nifty | |
feature -- but unfortunately, the editor has no way to let the user | |
turn this feature off. | |
The editor's creators came up with a curious method for repeating | |
short text insertions, where the text to go in is always the same | |
but any outgoing text varies. They decided that when you are in | |
screen mode, and have just gone into typing-in-text submode, and | |
make Control-@ ("^@") the first character you type in, then the | |
editor should insert the last piece of text you had previously | |
inserted (if it was not more than 128 characters long) and take you | |
back to command mode. Unfortunately, they never made this work as | |
promised. | |
In actuality, ^@ operates anywhere in a text insertion, not just in | |
the first character position. What a ^@ does there depends on the | |
situation. If your last c d y command, or one of their variants | |
such as s D etcetera, removed or copied a full line of text or | |
parts of two or more lines, or if you haven't run one of those | |
commands in your current editing session, then typing ^@ is just a | |
nuisance. It will take you out of text input submode and probably | |
move the cursor back a few characters from where the input ended. | |
But if you have done at least one c d y command or a variant, and | |
if the very last one you did removed or copied only a part of a | |
single line of text, then surprise! Typing a ^@ in this case will | |
do three things: | |
First, unless you typed it at the first character position on a
line, it will move the cursor back one character. This will move
over the last character you typed in if you've typed any, or over
one existing character if you type ^@ as the first character of
your insertion, but will not erase the character it passes over.
Second, just to the left of the new cursor position, the editor
will insert the text that was removed or copied by your last c d y
command or variant. (If you went into text-insertion submode via a
c command or a variant of it, the text you just took out is what
will be put back in.)
Finally, the text insertion will automatically end and you will be
back in command submode, with the cursor positioned at the start of
the last simple word that was inserted by the ^@ metacharacter.
Quoting a ^@ into your text isn't possible, because the editor | |
reserves that character for internal use and will not accept it as | |
itself in any file you may edit. Not that there would be any | |
reason to put ^@ in a file anyway: it is the ASCII character NUL, a | |
padding character that is routinely inserted in data streams by | |
device drivers, and just as routinely stripped at the receiving | |
end, so any ^@ characters you might add would be lost in the | |
shuffle. But when you are using the R command, or any other | |
command that lets you insert an indefinite amount of text, you can | |
quote a ^@ anyway by preceding it with a ^V. The result will be to | |
quote ^[Pb into your file at that point; this being the command | |
string the editor issues to perform the odd operation I've detailed | |
above. | |
Those of you who are skillful with the editor may wonder why the ^@ | |
insertion operates only when your last text extraction was a | |
fragment of one line. After all, the P command by itself inserts | |
the contents of the unnamed buffer, and that buffer holds whatever | |
was extracted last, be it half a line or a hundred lines, doesn't | |
it? The answer lies in one of the editor's undocumented features. | |
When you give a command to insert text, even the r command that | |
only inserts a single character, the editor simultaneously flushes | |
the unnamed buffer and leaves it empty -- if and only if that | |
buffer contained more than a fragment of one line. So, when you | |
entered the text insertion mode from which ^@ operates, you emptied | |
the unnamed buffer unless there was only a fragment of one line in | |
it. | |
At times you may want to use the beautify option to the set | |
command. This tells the editor to throw away most, but not all, | |
control characters you may try to type in -- the exceptions usually | |
are the tab (^I), newline (^J), and form feed (^L) -- in order to | |
keep you from inadvertently putting in invisible control characters | |
that will be hard to detect later. This option is normally off, | |
but you can type :se bf to turn it on. | |
But even when you want most control characters thrown out, there | |
will be occasions when one must go in. This is not possible using | |
an r command. The usual r technique of backslashing will usually
bite back in this case -- the editor will interpret the control | |
character by acting on its control meaning rather than inserting it | |
in the text. Using R, though, you can insert most control | |
characters by preceding each with ^V. | |
Even this may not be enough. Some systems are set up so that when | |
certain control characters are typed in, even though preceded by | |
^V, the system acts on them as control characters before the editor | |
ever sees them. To get around this problem, many implementations | |
of the editor, especially older ones, interpret an ordinary | |
character typed right after a ^V as a control character. That is, | |
on these systems, typing ^VF or ^Vf while running an R command | |
inserts a ^F in the file, just as typing ^V^F would on systems that | |
don't have this challenge. | |
Readers Ask | |
Here are the latest questions, and my solutions, from inquiring | |
readers with problems you might face someday. | |
Tommy Spratlin writes: | |
Hi Walter, | |
In moving files from Windows machines to UNIX, some of our users do | |
binary transfers which result in ^M characters in the ASCII files. | |
Usually they occur at the ends of individual lines and I do: | |
:1,$ s/^M//g | |
where ^M is generated by ^V^M and everything works fine to delete | |
these characters. I now have a new problem: I found a file with ^M | |
characters embedded in it, but the file is one long line. I need | |
to replace them with Vi's line-end character to split this long | |
line into multiple lines. But I can't because it's the same as | |
pressing the ENTER or RETURN key in the middle of the substitution | |
command. How can I replace the superfluous carriage return? We | |
have several files like this and it's causing problems viewing them | |
with Web browsers. | |
I tried substituting a newline with the character code and the | |
octal code unsuccessfully, and tried the ^M as a last unsuccessful | |
resort. | |
Things aren't as complicated as you make them seem, Tommy. First | |
of all, Web browsers generally ignore carriage-return and/or | |
linefeed characters while formatting text for display. If your | |
browser is choking on these all-one-line files, it is probably | |
because the lines are too long for your browser, or for some other | |
cause not related to embedded ^M characters. | |
Now, as you have deduced, the difference between Microsoft and Unix | |
text file formats is that Microsoft operating systems seem to favor | |
carriage-return followed by linefeed (^J) as the line separator, | |
while Unix systems use linefeed alone. | |
As you've discovered, you cannot directly quote a ^J into any | |
editor command. And yet, you put a ^J into your file every time | |
you hit return during text entry, although the return key on most | |
terminals sends a ^M character. That's the trick; the substitute
command regards a ^M in the replacement pattern as a signal to
insert a ^J and discard the ^M. So you only need to get that ^M
into the replacement pattern by typing in your command line like
this:
:1,$ s/^V^M/^V^M/g | |
You just have to overlook the appearance of futility in this | |
command line, as though it were going to replace each ^M with | |
itself. That first ^M is in the outgoing pattern, so it matches a | |
real ^M. The second, in the replacement pattern, calls for a ^J as | |
I explained above. | |
However, these all-one-line files may be too long for the Vi | |
editor, which cannot handle lines much more than a thousand | |
characters long in most common implementations, with shorter limits | |
in older versions. The editor will truncate lines that exceed the | |
limit, with only a minimal and rather cryptic warning. In such | |
cases, use the tr utility to replace the ^M characters (which is a | |
very straightforward job with that tool), before you bring the file | |
into the Vi editor. | |
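As a rough sketch of that tr step, assuming a POSIX shell, something like the following should do. The sample text is invented for illustration, and the file names in the last comment are placeholders.

```shell
# Sample data standing in for an all-one-line file: three "lines" that end
# in ^M (carriage return) instead of the linefeed that Unix expects.
# tr translates every carriage return into a linefeed.
fixed=$(printf 'first\rsecond\rthird\r' | tr '\r' '\n')
printf '%s\n' "$fixed"

# On real files (placeholder names): tr '\r' '\n' < oneline.txt > fixed.txt
```

Because tr works on a byte stream and never holds a whole line in memory, the editor's line-length limit does not come into play.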
You may wonder then, how you would use the substitute command to | |
put ^M characters into your file. The answer is to backslash the | |
quoted-in ^M. To add a ^M at the end of every line in your file, so | |
as to conform it to Microsoft practice, type this command: | |
:%s/$/\^V^M | |
(Note that it is important to type the \ first, then the ^V, | |
followed by the ^M.) The ^V puts the immediately-following ^M into | |
the command line, and the backslash tells the command that this ^M | |
is to be considered a real one, not a metacharacter for ^J. In | |
fact, these are the general principles for quoting characters | |
almost everywhere except in typing-in-text mode: | |
Precede a character by ^V to keep that character from being | |
interpreted as a metacharacter at the moment you type it. In this case, | |
you don't want typing ^M to immediately end the substitution command. | |
Precede a character by a backslash to keep that character from acting | |
as a metacharacter later, when what you've typed is interpreted by the | |
editor -- for example, when what you have typed in is run as a command, | |
or interpreted as a search pattern. This command uses a backslash to | |
keep the command from inserting ^J instead of ^M at the time it executes. | |
When you must use both, as in this case, type the backslash before | |
you type the ^V. (If you think that this backslash would then | |
affect the immediately following ^V rather than the later ^M, | |
remember that the ^V is not there when the backslash takes effect. | |
The ^V disappears as soon as it tells the editor to insert the ^M | |
in the command instead of taking the ^M as the signal to end the | |
command.) | |
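If you would rather do the same job outside the editor, where no keystroke quoting is involved, here is a minimal sketch. It assumes a POSIX shell with awk and od; the two-line sample is made up, and the od dump is there only to make the added ^M bytes (hex 0d) visible.

```shell
# awk rewrites each line with an explicit carriage return before the linefeed.
# od dumps the bytes as hex; tr strips od's spacing so the result is one string.
crlf_hex=$(printf 'ab\ncd\n' | awk '{ printf "%s\r\n", $0 }' | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$crlf_hex"
```

In the dump, each 0d0a pair is a ^M followed by the linefeed that ends the line.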
Finally, you can replace linefeed characters with something else | |
via line mode commands, but you must use two commands and only one | |
of them is the substitute command. Suppose you need to change a | |
short file's format from a number of lines to the format Tommy | |
encountered: a single line with ^M separators. That is, replace | |
each ^J (except the last) with a ^M. (This had better be a fairly | |
short file, because even newer versions of the editor can't handle | |
any lines longer than 1024 characters.) | |
Start by using a command similar to the one above to put ^M at the | |
end of every line except the last. (Since these ^M characters are | |
to separate lines, there's no use for one at the end of the last | |
line.) Then use this command: | |
:%j! | |
to join all the lines into one. The "j" in this command line is | |
the shortest abbreviation for the line mode join command, and the | |
"!" switch at the end of it tells the command not to insert blank | |
space between the lines it joins. | |
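For comparison, the same two-step result can be sketched in one pass outside the editor. This assumes a POSIX shell with awk and od, and the three-line sample is hypothetical: awk writes a ^M before every line except the first, so no separator trails the last line, and then ends the single joined line with a linefeed.

```shell
# Join all lines into one, separated by ^M (hex 0d), ending with one linefeed.
joined_hex=$(printf 'a\nb\nc\n' |
    awk 'NR > 1 { printf "\r" } { printf "%s", $0 } END { printf "\n" }' |
    od -An -tx1 | tr -d ' \n')
printf '%s\n' "$joined_hex"
```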
Thai-Nghia Dinh writes: | |
Hi, | |
I have a question (rather simple, really) but no one seem able to | |
know the answer. Not even the help desk (with all the Vi gurus | |
:)). I'm hoping you can help me with it. | |
I have a text file of unknown length. Each line of the file can be | |
very short or very long (from 3 characters up to 1000 characters). | |
Within this file, I'm trying to locate (search) the nth occurrence | |
of a word. | |
Here are a few things I've tried: | |
The simple solution would be (from visual command mode): a
/foobar command followed by the n command typed n-1 times. (But
what if n is large, say 200 or greater?)
:1,$ global /^/ /foobar/ (and its variations) Nothing useful...
Can you suggest a better way? | |
Yes, although it involves a slightly tricky procedure. Consider | |
the following command string: | |
:$|/\<foobar\>/s//QQQ | |
The first command in this string takes us to the last line of our | |
file and -- incidentally -- displays it on our screen, which is not | |
important here. The second command searches forward for a line | |
containing "foobar" as a word, and starting from the last line the | |
search must wrap around and find the first instance in the file. | |
Then that second command replaces the word "foobar" with "QQQ", | |
leaving the cursor at the point where the substitution was made. | |
Now let us make an addition to the start of this command string: | |
:1,199g/^/$|/\<foobar\>/s//QQQ | |
This revised string runs the procedure once for each of the first
199 lines, so the file must be at least that long; each time the
first instance of "foobar" remaining in the file is the one
replaced. So we end up sitting on the "QQQ" string that replaced
the 199th instance of "foobar"; simply typing n will bring us to | |
the 200th instance. And if we move off that 200th instance for any | |
reason, going to the top of the file and searching for "foobar" | |
will bring us right back to it, because the first 199 are now gone. | |
When we are finished with that 200th "foobar", this command: | |
:%s/QQQ/foobar/g | |
will change those 199 "QQQ" strings back to "foobar". Of course, | |
if there is any chance that "QQQ" might occur in the document as | |
itself, we can choose another dummy string. | |
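If altering the file, even temporarily, is unwelcome, a different approach is to find the line number from outside the editor. The following is only a sketch, assuming a POSIX shell and awk. The sample text and the word and n parameters are illustrative, and splitting on non-word characters only approximates what \< and \> match in the editor.

```shell
# Hypothetical sample text; "foobars" must not count as the word "foobar".
sample='foobar one
foobars do not count
two: foobar
three (foobar) here'

# Report the number of the line holding the nth whole-word occurrence.
nth_line=$(printf '%s\n' "$sample" | awk -v word=foobar -v n=3 '
{
    # Split the line on runs of non-word characters.
    pieces = split($0, part, /[^A-Za-z0-9_]+/)
    for (i = 1; i <= pieces; i++) {
        if (part[i] == word && ++seen == n) { print NR; exit }
    }
}')
printf '%s\n' "$nth_line"    # prints 4
```

With the line number in hand, typing 4G in the editor (or the line mode command :4) goes to that line.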
And while I'm at it, I've got another question. | |
How do I delete all lines beginning with a certain string, say, | |
!@#$ (or foobar for that matter). And a related question: how to | |
delete lines containing the word foobar (anywhere within the line)? | |
The first command line following will solve your first problem, and | |
the second will solve your second: | |
:g/^foobar/d | |
:g/\<foobar\>/d | |
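The same deletions can be sketched outside the editor with grep -v, which prints only the lines that do not match. This assumes a POSIX shell. The sample text is invented, and the -w option is an assumption: it is common in GNU and BSD grep, but check that yours has it.

```shell
sample='foobar at start
keep this line
!@#$ marker line
mid foobar here
some foobars stay'

# Like :g/^foobar/d, drop lines that begin with foobar.
no_leading=$(printf '%s\n' "$sample" | grep -v '^foobar')

# Like :g/\<foobar\>/d, drop lines containing foobar as a whole word.
no_word=$(printf '%s\n' "$sample" | grep -vw 'foobar')

# A literal leading !@#$ needs its $ escaped so grep does not read it
# as the end-of-line anchor.
no_marker=$(printf '%s\n' "$sample" | grep -v '^!@#\$')

printf '%s\n' "$no_word"
```

Note that "some foobars stay" survives the whole-word deletion, because "foobars" is a different word.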
Next Time Around | |
To make room to answer two readers' questions, I had to skip | |
presenting three great Vi tools -- autoindent, abbreviate, and map! | |
-- and the effect their metacharacters have in text-insertion mode. | |
They'll be first up in the next part of this tutorial. | |
More answers to reader questions are coming, too. I have queries | |
to answer about the semicolon address separator and about yanking | |
within macros -- and if a few more significant problems arrive | |
here, I'll try to fit them in, too. | |
And this time you won't have to wait and wait for the next tutorial | |
part. As I write this paragraph, I'm already in the middle of | |
creating the next part, so you should see it within a month after | |
this part appears online. | |
Part 8: Indent, Like a Typewriter | |