Convert Media

  Sometimes one simply needs to convert a video, audio file or document to another format.

Text encoding

  Text encoding can go badly wrong, especially when the language requires special
  characters like à ä ç. The command iconv can convert from one encoding to another.
# iconv -f <from_encoding> -t <to_encoding> <input_file>
# iconv -f ISO8859-1 -t UTF-8 file.input > file_utf8
# iconv -l                           # List known coded character sets
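
  When a directory mixes files that are already UTF-8 with older ones, converting
  everything blindly would double-encode the good files. The sketch below first asks
  iconv whether the file is valid UTF-8 and only converts when it is not. The function
  name to_utf8 and the ISO-8859-1 default are assumptions for illustration, not part of
  iconv itself.

```shell
# to_utf8 FILE [FROM]: print FILE as UTF-8 on stdout (hypothetical helper).
# FROM defaults to ISO-8859-1, which is only a guess about the input.
to_utf8() {
    file=$1
    from=${2:-ISO-8859-1}
    # iconv exits non-zero if FILE contains bytes that are not valid UTF-8.
    if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        cat "$file"                        # already UTF-8: pass through
    else
        iconv -f "$from" -t UTF-8 "$file"  # convert from the assumed encoding
    fi
}
```

  Usage: to_utf8 file.input > file_utf8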

  Without the -f option, iconv assumes the character set of the current locale, which is
  usually fine if the document displays well.
  convmv converts filenames (not file content) from one encoding to another. It also works
  if only some files are already UTF-8.
# convmv -r -f utf8 --nfd -t utf8 --nfc /dir/* --notest

Unix - DOS newlines

  Convert DOS (CR/LF) to Unix (LF) newlines and back within a Unix shell. See also dos2unix
  and unix2dos if you have them.
# sed 's/.$//' dosfile.txt > unixfile.txt                  # DOS to UNIX (all lines must end in CR)
# awk '{sub(/\r$/,"");print}' dosfile.txt > unixfile.txt   # DOS to UNIX
# awk '{sub(/$/,"\r");print}' unixfile.txt > dosfile.txt   # UNIX to DOS

  Convert Unix to DOS newlines within a Windows environment. Use sed or awk from mingw or
  cygwin.
# sed -n p unixfile.txt > dosfile.txt
# awk 1 unixfile.txt > dosfile.txt   # UNIX to DOS (with a cygwin shell)

  Replace the classic Mac newline (^M, a carriage return) with a Unix newline. To type a
  literal ^M, press Ctrl-V then Ctrl-M.
# tr '^M' '\n' < macfile.txt
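
  Before picking one of the conversions above it helps to know which convention a file
  actually uses. A minimal sketch, assuming only tr, wc and awk are available;
  newline_type is a hypothetical helper name.

```shell
# newline_type FILE: print dos, mac, unix or mixed (hypothetical helper).
newline_type() {
    # Count CR and LF bytes by deleting every other byte.
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    # Count lines whose last character before the LF is a CR.
    crlf=$(awk '/\r$/ {n++} END {print n+0}' "$1")
    if [ "$cr" -eq 0 ]; then
        echo unix                 # no CR at all (also reported for empty files)
    elif [ "$lf" -eq 0 ]; then
        echo mac                  # CR only, never LF
    elif [ "$cr" -eq "$lf" ] && [ "$crlf" -eq "$lf" ]; then
        echo dos                  # every LF is preceded by exactly one CR
    else
        echo mixed
    fi
}
```

  Usage: newline_type dosfile.txt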

PDF to Jpeg and concatenate PDF files

  Convert a PDF document with gs (GhostScript) to jpeg (or png) images for each page. The
  same task is much shorter with convert and mogrify (from ImageMagick or GraphicsMagick).
# gs -dBATCH -dNOPAUSE -sDEVICE=jpeg -r150 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
-dMaxStripSize=8192 -sOutputFile=unixtoolbox_%d.jpg unixtoolbox.pdf
# convert unixtoolbox.pdf unixtoolbox-%03d.png
# convert *.jpeg images.pdf          # Create a simple PDF with all pictures
# convert image000* -resample 120x120 -compress JPEG -quality 80 images.pdf
# mogrify -format png *.ppm          # convert all ppm images to png format

  Ghostscript can also concatenate multiple pdf files into a single one. This only works well
  if the PDF files are "well behaved".
# gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=all.pdf \
file1.pdf file2.pdf ...              # On Windows use '#' instead of '='
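
  A cheap sanity check before concatenating is to skip anything that does not even start
  with the PDF magic bytes. This is only a sketch: a %PDF- header does not prove that a
  file is "well behaved", and is_pdf and merge_pdfs are hypothetical helper names. The gs
  options are the same as in the command above.

```shell
# is_pdf FILE: succeed if FILE starts with the "%PDF-" magic bytes.
is_pdf() {
    [ "$(dd if="$1" bs=5 count=1 2>/dev/null)" = "%PDF-" ]
}

# merge_pdfs OUT IN...: concatenate the inputs that pass is_pdf into OUT.
merge_pdfs() {
    out=$1
    shift
    n=$#                          # number of candidate input files
    for f in "$@"; do
        if is_pdf "$f"; then
            set -- "$@" "$f"      # append accepted files after the originals
        else
            echo "skipping $f: no %PDF- header" >&2
        fi
    done
    shift "$n"                    # drop the originals, keep the accepted files
    [ "$#" -gt 0 ] || return 1    # nothing left to merge
    gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -sOutputFile="$out" "$@"
}
```

  Usage: merge_pdfs all.pdf file1.pdf file2.pdf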

  Extract images from a PDF document using pdfimages from poppler or
  xpdf (http://foolabs.com/xpdf/download.html).
# pdfimages document.pdf dst/        # extract all images and put in dst
# yum install poppler-utils          # install poppler-utils if needed. or:
# apt-get install poppler-utils

Convert video

  Compress the Canon digicam video with an mpeg4 codec and repair the crappy sound.
# mencoder -o videoout.avi -oac mp3lame -ovc lavc -srate 11025 \
-channels 1 -af-adv force=1 -lameopts preset=medium -lavcopts \
vcodec=msmpeg4v2:vbitrate=600 -mc 0 videoin.AVI

  See sox for sound processing.

Copy an audio cd

  The program cdparanoia (http://xiph.org/paranoia/) can save the audio tracks (FreeBSD port
  in audio/cdparanoia/), oggenc can encode in Ogg Vorbis format, lame converts to mp3.
# cdparanoia -B                      # Copy the tracks to wav files in current dir
# lame -b 256 in.wav out.mp3         # Encode in mp3 256 kb/s
# for i in *.wav; do lame -b 256 "$i" "${i%.wav}.mp3"; done
# oggenc -b 256 -o out.ogg in.wav    # Encode in Ogg Vorbis 256 kb/s
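
  For a whole directory of ripped tracks, the loop can be made restartable by skipping
  tracks that already have an mp3. A minimal sketch: encode_dir and the LAME override
  variable are hypothetical names introduced here, and the lame options are the ones used
  above.

```shell
# encode_dir DIR: encode every DIR/*.wav to DIR/*.mp3 at 256 kb/s, skipping
# tracks that were already encoded. Set LAME=echo for a dry run.
encode_dir() {
    for wav in "$1"/*.wav; do
        [ -e "$wav" ] || continue     # no match: the glob stays literal
        mp3=${wav%.wav}.mp3           # swap the suffix, spaces stay intact
        [ -e "$mp3" ] && continue     # already encoded
        ${LAME:-lame} -b 256 "$wav" "$mp3"
    done
}
```

  Usage: encode_dir . (or LAME=echo encode_dir . to only print what would be encoded)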