Convert Media
Sometimes one simply needs to convert a video, audio file, or document to another format.
Text encoding
Text encoding can go badly wrong, especially when the language uses special
characters such as à, ä or ç. The iconv command converts from one encoding to another.
# iconv -f <from_encoding> -t <to_encoding> <input_file>
# iconv -f ISO8859-1 -t UTF-8 file.input > file_utf8
# iconv -l # List known coded character sets
Without the -f option, iconv assumes the input uses the character set of the current
locale, which is usually fine if the document displays correctly.
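A quick way to check that a conversion did what was intended is to compare the raw bytes before and after. The following is only a sketch: the file names latin1.txt and utf8.txt are made up for the example, and it assumes an iconv that knows the names ISO-8859-1 and UTF-8.

```shell
# Work in a scratch directory so no existing file is overwritten.
tmpdir=$(mktemp -d)

# Write "café" plus a newline in ISO-8859-1: é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > "$tmpdir/latin1.txt"

# Convert to UTF-8; é becomes the two bytes 0xC3 0xA9.
iconv -f ISO-8859-1 -t UTF-8 "$tmpdir/latin1.txt" > "$tmpdir/utf8.txt"

# Dump both files as hex bytes to compare them.
od -An -tx1 "$tmpdir/latin1.txt"
od -An -tx1 "$tmpdir/utf8.txt"
```

The second dump should be one byte longer than the first, because the accented character now takes two bytes.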
Convert file names (not file content) from one encoding to another. This also works if
only some of the files are already UTF-8.
# convmv -r -f utf8 --nfd -t utf8 --nfc /dir/* --notest
Unix - DOS newlines
Convert DOS (CR/LF) to Unix (LF) newlines and back within a Unix shell. See also dos2unix
and unix2dos if you have them.
# sed 's/.$//' dosfile.txt > unixfile.txt # DOS to UNIX (assumes every line ends with CR)
# awk '{sub(/\r$/,"");print}' dosfile.txt > unixfile.txt # DOS to UNIX
# awk '{sub(/$/,"\r");print}' unixfile.txt > dosfile.txt # UNIX to DOS
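The awk variants above can be wrapped in small functions and checked with a round trip. This is a sketch only: the function names dos2unix_awk and unix2dos_awk and the file names are hypothetical, not part of any standard tool.

```shell
# Hypothetical helper: strip one trailing CR per line; lines without CR pass unchanged.
dos2unix_awk() { awk '{ sub(/\r$/, ""); print }' "$1"; }

# Hypothetical helper: end every line with exactly one CR (never doubles an existing one).
unix2dos_awk() { awk '{ sub(/\r$/, ""); print $0 "\r" }' "$1"; }

tmpdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$tmpdir/dos.txt"

dos2unix_awk "$tmpdir/dos.txt" > "$tmpdir/unix.txt"
unix2dos_awk "$tmpdir/unix.txt" > "$tmpdir/back.txt"

# cmp is silent and returns 0 when the round trip reproduced the input.
cmp "$tmpdir/dos.txt" "$tmpdir/back.txt" && echo "round trip OK"
```

Unlike sed 's/.$//', these helpers do not eat the last character of a line that has no CR, so they are safe on files with mixed line endings.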
Convert Unix to DOS newlines within a Windows environment. Use sed or awk from mingw or
cygwin.
# sed -n p unixfile.txt > dosfile.txt
# awk 1 unixfile.txt > dosfile.txt # UNIX to DOS (with a cygwin shell)
Replace old Mac newlines (CR, displayed as ^M) with Unix newlines. tr understands the
escape \r; to type a literal ^M instead, use Ctrl-V then Ctrl-M.
# tr '\r' '\n' < macfile.txt > unixfile.txt
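Before converting, it helps to know which newline convention a file actually uses. The sketch below counts CR and LF bytes. The function name newline_style is hypothetical, and the classification is a simple heuristic: any file containing both CR and LF is reported as dos.

```shell
# Hypothetical helper: print the newline convention of a file (dos, mac or unix).
newline_style() {
    # Count CR and LF bytes by deleting every other byte.
    cr=$(($(tr -cd '\r' < "$1" | wc -c)))
    lf=$(($(tr -cd '\n' < "$1" | wc -c)))
    if [ "$cr" -eq 0 ]; then
        echo unix      # no CR at all
    elif [ "$lf" -eq 0 ]; then
        echo mac       # CR only
    else
        echo dos       # CR and LF both present
    fi
}

tmpdir=$(mktemp -d)
printf 'a\rb\r' > "$tmpdir/mac.txt"
newline_style "$tmpdir/mac.txt"

# Convert CR to LF and check again.
tr '\r' '\n' < "$tmpdir/mac.txt" > "$tmpdir/unix.txt"
newline_style "$tmpdir/unix.txt"
```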
PDF to JPEG and concatenate PDF files
Convert a PDF document with gs (GhostScript) to JPEG (or PNG) images, one per page. The
same is much shorter with convert and mogrify (from ImageMagick or GraphicsMagick).
# gs -dBATCH -dNOPAUSE -sDEVICE=jpeg -r150 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
-dMaxStripSize=8192 -sOutputFile=unixtoolbox_%d.jpg unixtoolbox.pdf
# convert unixtoolbox.pdf unixtoolbox-%03d.png
# convert *.jpeg images.pdf # Create a simple PDF with all pictures
# convert image000* -resample 120x120 -compress JPEG -quality 80 images.pdf
# mogrify -format png *.ppm # convert all ppm images to png format
Ghostscript can also concatenate multiple pdf files into a single one. This only works well
if the PDF files are "well behaved".
# gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=all.pdf \
file1.pdf file2.pdf ... # On Windows use '#' instead of '='
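The concatenation command can be wrapped in a small function that checks its inputs first, so gs is not started on a missing file. This is a sketch: the function name pdf_concat is hypothetical, and the gs options are the ones shown above.

```shell
# Hypothetical wrapper: pdf_concat OUTPUT.pdf INPUT.pdf...
pdf_concat() {
    if [ "$#" -lt 2 ]; then
        echo "usage: pdf_concat output.pdf input.pdf..." >&2
        return 2
    fi
    out=$1
    shift
    # Refuse to start if any input file is missing or unreadable.
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "pdf_concat: cannot read $f" >&2
            return 1
        fi
    done
    # Same Ghostscript invocation as above; inputs are appended in the given order.
    gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -sOutputFile="$out" "$@"
}

# Usage: pdf_concat all.pdf file1.pdf file2.pdf
```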
Extract images from a PDF document using pdfimages from poppler or xpdf
(http://foolabs.com/xpdf/download.html).
# pdfimages document.pdf dst/ # extract all images and put in dst
# yum install poppler-utils # install poppler-utils if needed. or:
# apt-get install poppler-utils
Convert video
Compress the Canon digicam video with an mpeg4 codec and repair the crappy sound.
# mencoder -o videoout.avi -oac mp3lame -ovc lavc -srate 11025 \
-channels 1 -af-adv force=1 -lameopts preset=medium -lavcopts \
vcodec=msmpeg4v2:vbitrate=600 -mc 0 videoin.AVI
See sox for sound processing.
Copy an audio cd
The program cdparanoia (http://xiph.org/paranoia/) can save the audio tracks (FreeBSD
port in audio/cdparanoia/), oggenc can encode in Ogg Vorbis format, and lame converts to mp3.
# cdparanoia -B # Copy the tracks to wav files in current dir
# lame -b 256 in.wav out.mp3 # Encode in mp3 256 kb/s
# for i in *.wav; do lame -b 256 "$i" "$(basename "$i" .wav).mp3"; done
# oggenc -b 256 -o out.ogg in.wav # Encode in Ogg Vorbis 256 kb/s
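When a rip is interrupted and resumed, it is convenient to encode only the tracks that have no mp3 yet. The sketch below does that. The function name wav_to_mp3 is hypothetical, and the lame options are the ones shown above.

```shell
# Hypothetical helper: encode every .wav in a directory (default: current dir)
# to mp3 at 256 kb/s, skipping tracks that already have an .mp3 next to them.
wav_to_mp3() {
    dir=${1:-.}
    for wav in "$dir"/*.wav; do
        # With no matching file the pattern stays literal; skip it.
        [ -e "$wav" ] || continue
        mp3=${wav%.wav}.mp3
        # Do not encode a track twice.
        [ -e "$mp3" ] && continue
        # Quoting keeps file names with spaces intact.
        lame -b 256 "$wav" "$mp3"
    done
}

# Usage: wav_to_mp3 /path/to/ripped/tracks
```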