cs.utexas.edu!convex!convex!tchrist Thu Jul 16 18:12:32 CDT 1992
From the keyboard of [email protected] (Nathan D. Lane):
:Hello all,
: Having struggled with this all weekend and not figured it out
:even with the help of the Camel Book, the manpage, or the FAQ, I've
:decided it's time to post. Could someone please tell me how to remove
:characters such as the bullet and foreign characters from a file? I'm
:trying to convert files from a CPT 8525 word processing system into a
:format that makes more sense on an IBM RS/6000 220 or 340 or a Sun 3
:or Sparc. I need to remove invalid control characters and characters with
:the 8th bit set. I don't want to remove linefeed (^J), however. I'd
:LOVE to have it convert the codes to troff (or so my husband tells me :-)
:..any ideas? If this is too trivial a question to answer with a post, I'd
:still really appreciate email. Thanks in advance for *any* replies!
In general, it's hard to know how to fix up a file with word-processing
magic in it unless you've specialized in said magic.
But if all you want to do is throw away the stuff you don't recognize,
you can edit the files in place with perl like this:
perl -i.bak -p -e 'y/\000-\200-\377//d' file1 file2 file3 ...
which strips high-bit characters. If you want to remove all the
nonprintables except for space, tab, and newline, you could do this:
perl -i.bak -p -e 'y/\000-\010\013-\037\177-\377//d' file1 file2 file3 ...
I skipped characters 011 and 012 (octal) because they're \t and \n.
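Spelled out as a script, -i.bak -p amounts to a read, modify, print loop over each named file. This is only a rough sketch, assuming Perl 5 and plain byte input: the sub name strip_nonprintables_in_place is hypothetical, and it relies on the $^I variable (the in-script equivalent of -i) plus a localized @ARGV, which is where <> gets its file list.

```perl
use strict;
use warnings;

# Hypothetical helper: roughly what `perl -i.bak -p -e 'y/.../.../d' files...` does.
sub strip_nonprintables_in_place {
    my ($backup_suffix, @files) = @_;
    local $^I   = $backup_suffix;   # like -i.bak: rewrite in place, keep a backup
    local @ARGV = @files;           # <> reads these files one after another
    while (<>) {                    # like -p: visit every line...
        # Delete control chars except \t (011) and \n (012), plus DEL (177)
        # and every byte with the high bit set (200-377).
        tr/\000-\010\013-\037\177-\377//d;
        print;                      # ...and write it back to the edited file
    }
}
```

Calling strip_nonprintables_in_place('.bak', @ARGV) from a script should behave much like the one-liner: each file is rewritten and the original is kept alongside it with a .bak suffix.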
--tom
--
Tom Christiansen
[email protected] convex!tchrist
signal(i, SIG_DFL); /* crunch, crunch, crunch */
--Larry Wall in doarg.c from the perl source code
cs.utexas.edu!convex!convex!tchrist Thu Jul 16 18:12:45 CDT 1992
I wrote:
: perl -i.bak -p -e 'y/\000-\200-\377//d' file1 file2 file3 ...
But that won't even compile: perl rejects \000-\200-\377 as an ambiguous
range. I meant to write something more like:
: perl -i.bak -p -e 'y/\000-\010\013-\037\200-\377//d' file1 file2 file3 ...
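The compile failure can be seen directly by compiling the bad y/// inside a string eval, which traps the error instead of dying. The second half sketches a filter that, unlike a plain \000-\037 range, leaves \t (011) and \n (012) alone: \000-\037 covers 012 as well, so under -p it would glue every line together. The helper name strip_controls_and_high_bit is hypothetical, and it assumes byte strings; note that it keeps DEL (\177), unlike the y/// in the first post.

```perl
use strict;
use warnings;

# A range followed directly by another '-' is rejected when tr/// is compiled;
# q{} keeps the backslashes literal so the eval sees the original y///.
my $ok  = eval q{ my $s = "x"; $s =~ y/\000-\200-\377//d; 1 };
my $err = $ok ? '' : $@;
print "first version rejected: $err" if $err;

# Hypothetical helper: drop control chars except \t and \n,
# plus every byte with the high bit set.
sub strip_controls_and_high_bit {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\000-\010\013-\037\200-\377//d;
    return $clean;
}
```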
--tom
--
Tom Christiansen
[email protected] convex!tchrist
Real programmers can write assembly code in any language. :-)
--Larry Wall in <[email protected]>