Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!news.mathworks.com!fu-berlin.de!cs.tu-berlin.de!phade
From: [email protected] (Frank Gadegast)
Newsgroups: alt.answers,comp.answers,news.answers
Subject: MPEG-FAQ: multimedia compression [1/9]
Followup-To: alt.binaries.multimedia
Date: 9 Nov 1996 09:32:20 GMT
Organization: Technical University of Berlin, Germany
Lines: 1304
Approved: [email protected]
Expires: 31 Dec 1996 12:00:00 GMT
Message-ID: <[email protected]>
Reply-To: [email protected]
NNTP-Posting-Host: 130.149.22.20
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Summary: This is a summary of the ISO video and audio formats MPEG 1, 2 and 4
Keywords: MPEG, FAQ, Compression
Xref: senator-bedfellow.mit.edu alt.answers:21694 comp.answers:22304 news.answers:86419

Archive-name: mpeg-faq/part1
Last-modified: 1996/06/02
Version: v 4.1 96/06/02
Posting-Frequency: bimonthly

===========================================================================

~Subject: SECTION 0. - INTRO

       ====================================================
       THE MPEG-FAQ            [Version 4.1 - 1. June 1996]
       ====================================================
       PHADE Software
       Inh. Dipl-Inform. Frank Gadegast
       Leibnizstr. 30
       10625 Berlin, GERMANY

       Fon/Fax   ++ 49 30 3128103
       E-mail    [email protected]
       Web site  http://www.powerweb.de/mpeg


This is the eighth publication of this file. Lots of information has been
changed (which has surely brought errors with it, see Murphy's Law).

This eighth compilation is very different from the previous one, Version 4.0.

First:    The location of this file is:

         Text-Version : URL: ftp://ftp.powerweb.de/mpeg/faq/mpegfa41.zip
                             [194.77.15.46]
         HTML-Version : URL: http://www.powerweb.de/mpeg/faq/

         My MPEG-related software and my DOS-ports of several
         programs can be found there too.

Second:   "The Internet MPEG Audio Archive" is there ! Our brilliant
         collection of everything that belongs to MPEG audio. For only
         DM 49,- ! Get it ! More than 400 MB of songs, documentation
         and utilities ! Read below about how to order !

Third:    "The Internet MPEG CD-Rom" is still available ! The unique
         collection of everything that belongs to MPEG. For only
         DM 49,90 ! Get it ! More than 600 MB of movies, songs,
         documentation and utilities ! Read below about how to order !

         Another CD-Rom containing material for MPEG-2 is about to be
         released ! It will be called the "MPEG-2 Movie Toolbox".

Fourth:   This FAQ and the famous MPEG Archive have a completely new
         home now on the PowerWeb site ! The newest FAQ and other
         MPEG-related information and utilities for all platforms
         can always be loaded using WWW from:

                URL=http://www.powerweb.de/mpeg

         And surely, there are more interesting things to find ;o)


I add my comments in brackets [], lines (---- or ====) separate the
chapters and questions.

Please try to find out more information yourself. I had enough to do
getting and preparing this information. And only bother me with file
requests if it's not possible for you to get the file somewhere else !!!

If you want to contribute to this FAQ in any way, please email directly to
(probably by replying to this posting):

 [email protected]

If you want to contribute to the MPEG Archive, please upload via ftp to
ftp://ftp.powerweb.de/incoming/mpeg and notify [email protected] via
e-mail about your contribution.

Other useful information related to MPEG can be e-mailed to

 [email protected]

Or send any additional information via fax or e-mail.

Enjoy MPEG, KeyJ "MPEG" Phade (Frank Gadegast)


-------------------------------------------------------------------------------

~Subject: Disclaimer

           I HAVE NOTHING TO DO WITH THE NAMED COMPANIES, NO BUSINESS,
           IT'S JUST MY PERSONAL INTEREST. COMPANIES ARE NAMED
           BECAUSE THEY ARE THE FIRST BRINGING REAL MULTIMEDIA TO THE
           WORLD. SURE, I ADVERTISE FOR THEM WITH THIS FAQ, BUT HOPEFULLY
           YOU, AS A READER OF THIS FAQ, WILL FORCE THEM TO PRODUCE
           MORE AND BETTER PRODUCTS.

           MOST ADDITIONAL INFORMATION IS WRITTEN AS PERSONAL COMMENT
           AND SHOULD NOT BE TAKEN AS PROVEN FACT. INFORMATION IS
           PRESENTED "AS IS", COULD BE OUT OF DATE AND CANNOT BE
           GUARANTEED TO BE TRUE. THIS INFORMATION COMES WITHOUT
           WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION
           WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
           PURPOSE AND NON-INFRINGEMENT.

           UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, TORT, CONTRACT,
           OR OTHERWISE, SHALL THE AUTHOR BE LIABLE TO YOU OR ANY OTHER
           PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL
           DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES
           FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR
           MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES.

           Frank Gadegast

-------------------------------------------------------------------------------

~Subject: Copyright information

           THIS COMPILATION OF INFORMATION IS COPYRIGHTED BY THE AUTHOR
           AND MAINTAINER, CURRENTLY FRANK GADEGAST. ANY NON-COMMERCIAL
           USE OF IT, OR PARTS OF IT, IS ALLOWED, PROVIDED THAT THE USE IS
           REPORTED TO THE AUTHOR AND THE COMPILATION IS KEPT UNCHANGED.
           ADDITIONALLY, IF PARTS OF IT ARE USED, INFORMATION HAS TO BE
           ADDED TO THAT PART STATING WHO THE AUTHOR OF THAT PART IS, THAT
           IT BELONGS TO THE COMPLETE COMPILATION AND WHERE TO FIND THE
           COMPLETE COMPILATION.

           COMMERCIAL USE CAN BE GRANTED IN SPECIAL CIRCUMSTANCES, FEEL
           FREE TO ASK AND SEND A DESCRIPTION OF THE INTENDED USE, TO
           RECEIVE A CERTIFICATION.

           ANY NON-REPORTED OR NON-CERTIFIED COMMERCIAL USE OF THIS
           COMPILATION IS A VIOLATION OF GERMAN COPYRIGHT LAW !

           ANY RE-PUBLICATION OF THE INFORMATION IN THIS COMPILATION SHOULD
           BE REPORTED TO THE AUTHOR AND SHOULD BE QUOTED IN THE NEW
           PUBLICATION.

           ANY RE-DISTRIBUTION OF THE COMPLETE FILE ON NON-COMMERCIAL
           ARCHIVES, LIKE FTP- OR FAQ-MIRRORS IS ALLOWED.

-------------------------------------------------------------------------------

~Subject: Digest format

It should be possible to read this FAQ with a threaded newsreader or emacs
in FAQ-mode, which lets you jump from one question to another, because
this FAQ is organized as a digest.

You can move to the next question with the digest commands in gnus, rn or
other newsreaders, or with a regex search for ^~Subject or ^--.
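
If your reader has no digest commands, the regex search mentioned above
can also be scripted. The following is only a minimal sketch in Python;
the function name list_subjects and the sample text are made up for
illustration and are not part of any newsreader.

```python
import re

# In this FAQ every digest entry opens with a line starting "~Subject:".
SUBJECT_RE = re.compile(r"^~Subject:[ \t]*(.*)$", re.MULTILINE)

def list_subjects(faq_text):
    """Return the title of every digest entry, in file order."""
    return [m.group(1).strip() for m in SUBJECT_RE.finditer(faq_text)]

# Tiny hand-made sample in the same layout as this file.
sample = """\
~Subject: Disclaimer

some text
-------------------------------------------------------------------------------

~Subject: Digest format
"""

print(list_subjects(sample))
```

Run it on the saved text version of the FAQ to get a quick table of
contents.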

-------------------------------------------------------------------------------

~Subject: Recommendations

Well, to stop some of the most annoying questions from those who do not read
this FAQ at all, I recommend the following players/decoders and encoders.
Search the FAQ for these words and download them BEFORE e-mailing me !

DOS:     VMPEG, MAPLAYPC and CMPEG, ENC11BIN
Windows: VMPEG, SoftPeg, COOL 1.5.3 and Maplay 1.2 for Win32
Unix:    XMPLAY and VCR

CD-I's and Video-CDs are currently only supported by VMPEG and SoftPeg !

-------------------------------------------------------------------------------

~Subject: What questions are getting answered in this FAQ ?

SECTION 0. - INTRO
   Disclaimer
   Copyright information
   Digest format
   What questions are getting answered in this FAQ ?
SECTION 1. - WHAT IS MPEG-VIDEO/VIDEO
   What is MPEG ?
   What is MPEG-Audio then ?
   What is the Audio Layer 3 then ?
   What is MPEG-1+ ?
   What is MPEG-2 ?
   What happened at the MPEG - NY meeting ?
   What's about Video-CD and CD-I ?
SECTION 2. - PROFESSIONAL SOFTWARE
SUBSECTION - DOS
   MPEG Encoder by Xing
SUBSECTION - WINDOWS
   MPEG ARCADE(TM)
   XingSound
   XingCD
SUBSECTION - UNIX
   Xing Distributed Media Architecture
   NVR Research Kit
   Demo of NVR Digital Media Development Kit
   How will I get the NVR-Software ?
SECTION 3. - FREE AVAILABLE SOFTWARE
SUBSECTION - DOS
   layr_100
   mpeg2ppm
   vmpeg
   cmpeg
   dmpeg
   secmpeg
   mpegstat
   enc11dos
   pvrg MPEG
SUBSECTION - Windows
   XingIt
   mpgaudio
SUBSECTION - WINDOWS-NT
   mpeg2ply
   mpegplay
SUBSECTION - OS/2
   mp
SUBSECTION - X-WINDOWS and UNIX
   Berkeley's MPEG Tools
   MPEG-1 Video Software Encoder
   MPEG Video Software Decoder
   MPEG Video Software Analyzer
   MPEG Blocks Analyzer
   MPEG Video Software Statistics Gatherer
   xmg
   mpegstat
   mplex
   xmplay
   xplayer
   xmpeg.tk
   mpeg2encode / mpeg2decode
   mpegaudio
   maplay
   Scanning MPEG's ...
   MPEG decoder...
   MPEGTool
   What is "SECMPEG" ?
   PVRG-MPEG Codec
   wdgt
SUBSECTION - VMS
   vms MPEG
SUBSECTION - MacIntosh
   Sparcle
   Qt2MPEG
   Audio on Macintosh ?!
SUBSECTION - Atari
SUBSECTION - Amiga
   MPEG2DCTV
SUBSECTION - NeXT
   MPEG_Play.app
   mpegnext
SUBSECTION - SGI
SECTION 4. - MPEG-RELATED HARDWARE
   MPEG audio Layer-3
   Video-Maker
   Some MPEG chips
   Optibase
   ReelMagic
   Cinerama
   XingIt!-card
   MPEG-decompression hardware list
   Amiga CD32
SECTION 5. - MAILBOX-ACCESS
   Genoabox
   Xing Technologies BBS and fax
SECTION 6. - FTP-ACCESS
   FTP-ACCESS - Overview
   MPEG-2 validation bitstreams
   Audio streams and utils
   Accessing Aminet
   Where will I find test-material for MPEG-encoders ?
SECTION 7. - WWW-ACCESS
   Where is the WWW-home of this FAQ ?
   An Interactive Explanation on the Web ?
   Where is the WWW-demo of "The Internet MPEG CD-Rom" ?
   Which archive is mostly related to MPEG-Audio ?
   What's with Bryan Woodworth ftp-area ?
   Rock'n'Roll stored in MPEG on the Web ?
   Where can I find space movies coded in MPEG ?
   Movies on Web-site
   Where can I find fractal movies coded in MPEG ?
   Is qt2mpeg on the Web ?
   What are other good URL's ?
SECTION 8. - MAIL ORDER
   The Internet MPEG CD-Rom
   Conversion, WWW and CD-Rom production service
   How can I order information from C-CUBE ?
SECTION 9. - ADDITIONAL INFORMATION
   What are the MPEG standard documents ?
   So, the Xing decoder is cheating, right ?
   What is Aware Inc. doing ?
   Will MPEG be included in QuickTime ?
   What's about MPEG-2 software ?
   What about good MPEG Hardware encoders (Optivision) ?
   What's about CD-I ?
   What is the PCMotion Player ?
   What is the MPEG-2 ISO number ?
   Some papers about MPEG-audio
   Where can I find more documents about what Berkeley is doing ?
   Is there a book about MPEG ?
   Who are CD-I producers ?
   Where can I get VideoCD and CD-I coding ?
   Where can I do MPEG encoding ?
   What the problem with all these file extensions for MPEG-files ?
   How can I do RTP encapsulation of MPEG1/MPEG2 ?
   Where can I order the MPEG standard ?
SECTION 10. - WHERE TO FIND MORE INFOS
   What newsgroups discuss MPEG ?
   How can 'archie' help me ?
SECTION 11. - QUESTIONS

===========================================================================

~Subject: SECTION 1. - WHAT IS MPEG-VIDEO/VIDEO

-------------------------------------------------------------------------------

~Subject: What is MPEG ?

From comp.compression Mon Oct 19 15:38:38 1992
Sender: [email protected]
Author: Mark Adler <[email protected]>

[71] Introduction to MPEG (long)
      What is MPEG?
      Does it have anything to do with JPEG?
      Then what's JBIG and MHEG?
      What has MPEG accomplished?
      So how does MPEG I work?
      What about the audio compression?
      So how much does it compress?
      What's phase II?
      When will all this be finished?
      How do I join MPEG?
      How do I get the documents, like the MPEG I standard?

[ There is no newer version of this part so far. Whoever wants to update ]
[ this description, should do the job and send it over.                  ]

Written by Mark Adler <[email protected]>.

Q. What is MPEG?
A. MPEG is a group of people that meet under ISO (the International
  Organization for Standardization) to generate standards for digital video
  (sequences of images in time) and audio compression.  In particular,
  they define a compressed bit stream, which implicitly defines a
  decompressor.  However, the compression algorithms are up to the
  individual manufacturers, and that is where proprietary advantage
  is obtained within the scope of a publicly available international
  standard.  MPEG meets roughly four times a year for roughly a week
  each time.  In between meetings, a great deal of work is done by
  the members, so it doesn't all happen at the meetings.  The work
  is organized and planned at the meetings.

Q. So what does MPEG stand for?
A. Moving Picture Experts Group.

Q. Does it have anything to do with JPEG?
A. Well, it sounds the same, and they are part of the same subcommittee
  of ISO along with JBIG and MHEG, and they usually meet at the same
  place at the same time.  However, they are different sets of people
  with few or no common individual members, and they have different
  charters and requirements.  JPEG is for still image compression.

Q. Then what's JBIG and MHEG?
A. Sorry I mentioned them. Ok, I'll simply say that JBIG is for binary
  image compression (like faxes), and MHEG is for multi-media data
  standards (like integrating stills, video, audio, text, etc.).
  For an introduction to JBIG, see question 74 below.

Q. Ok, I'll stick to MPEG.  What has MPEG accomplished?
A. So far (as of January 1996), they have completed the "Standard"
  of MPEG phase I, colloquially called MPEG I. This defines
  a bit stream for compressed video and audio optimized to fit into
  a bandwidth (data rate) of 1.5 Mbits/s. This rate is special
  because it is the data rate of (uncompressed) audio CD's and DAT's.
  The standard is in three parts, video, audio, and systems, where the
  last part gives the integration of the audio and video streams
  with the proper timestamping to allow synchronization of the two.
  They have also gotten well into MPEG phase II, whose task is to
  define a bitstream for video and audio coded at around 3 to 10
  Mbits/s.

Q. So how does MPEG I work?
A. First off, it starts with a relatively low resolution video
  sequence (possibly decimated from the original) of about 352 by
  240 pixels by 30 frames/s (US--different numbers for Europe),
  but original high (CD) quality audio.  The images are in color,
  but converted to YUV space, and the two chrominance channels
  (U and V) are decimated further to 176 by 120 pixels.  It turns
  out that you can get away with a lot less resolution in those
  channels and not notice it, at least in "natural" (not computer
  generated) images.

  [Figures in the HTML version: yuv411.gif, yuv422.gif, yuv444.gif]

  The basic scheme is to predict motion from frame to frame in the
  temporal direction, and then to use DCT's (discrete cosine
  transforms) to organize the redundancy in the spatial directions.
  The DCT's are done on 8x8 blocks, and the motion prediction is
  done in the luminance (Y) channel on 16x16 blocks.  In other words,
  given the 16x16 block in the current frame that you are trying to
  code, you look for a close match to that block in a previous or
  future frame (there are backward prediction modes where later
  frames are sent first to allow interpolating between frames).
  The DCT coefficients (of either the actual data, or the difference
  between this block and the close match) are "quantized", which
  means that you divide them by some value to drop bits off the
  bottom end.  Hopefully, many of the coefficients will then end up
  being zero.  The quantization can change for every "macroblock"
  (a macroblock is 16x16 of Y and the corresponding 8x8's in both
  U and V).  The results of all of this, which include the DCT
  coefficients, the motion vectors, and the quantization parameters
  (and other stuff) is Huffman coded using fixed tables.  The DCT
  coefficients have a special Huffman table that is "two-dimensional"
  in that one code specifies a run-length of zeros and the non-zero
  value that ended the run.  Also, the motion vectors and the DC
  DCT components are DPCM (subtracted from the last one) coded.
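
  The quantization, run-length and DPCM steps described above can be
  sketched in a few lines of Python. This is only an illustration under
  simplifying assumptions: a single step size per block, truncation toward
  zero, and a DPCM predictor that starts at 0. The function names are made
  up, and the fixed Huffman tables themselves are not reproduced.

```python
def quantize(coeffs, step):
    """Divide each DCT coefficient by the step size, dropping the low bits."""
    return [int(c / step) for c in coeffs]  # int() truncates toward zero

def run_level_pairs(values):
    """Turn a coefficient list into (zero_run, nonzero_value) pairs."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs  # trailing zeros produce no pair in this sketch

def dpcm(values):
    """Code each value as its difference from the previous one."""
    out, prev = [], 0  # starting the predictor at 0 is an assumption here
    for v in values:
        out.append(v - prev)
        prev = v
    return out

block = [100, -37, 12, 5, -3, 0, 9, 1]   # made-up coefficients
q = quantize(block, 8)
print(q)
print(run_level_pairs(q))
print(dpcm([50, 52, 51, 51]))            # made-up DC values of four blocks
```

  Each (run, value) pair is what the "two-dimensional" table would assign
  one code to.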

Q. So is each frame predicted from the last frame?
A. No.  The scheme is a little more complicated than that.  There are
  three types of coded frames.  There are "I" or intra frames.  They
  are simply a frame coded as a still image, not using any past
  history.  You have to start somewhere.  Then there are "P" or
  predicted frames.  They are predicted from the most recently
  reconstructed I or P frame.  (I'm describing this from the point
  of view of the decompressor.)  Each macroblock in a P frame can
  either come with a vector and difference DCT coefficients for a
  close match in the last I or P, or it can just be "intra" coded
  (like in the I frames) if there was no good match.

  Lastly, there are "B" or bidirectional frames.  They are predicted
  from the closest two I or P frames, one in the past and one in the
  future.  You search for matching blocks in those frames, and try
  three different things to see which works best.  (Now I have the
  point of view of the compressor, just to confuse you.)  You try using
  the forward vector, the backward vector, and you try averaging the
  two blocks from the future and past frames, and subtracting that from
  the block being coded.  If none of those work well, you can intra-
  code the block.
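
  The three-way choice for a B-frame block could be sketched like this.
  It is a sketch only: the sum of absolute differences as the match
  measure, the plain integer average and the intra threshold are
  assumptions for illustration, and blocks are flat lists of pixel values
  rather than 16x16 arrays.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def choose_b_mode(block, past_match, future_match, intra_threshold):
    """Pick forward, backward or interpolated prediction, else intra."""
    average = [(p + f) // 2 for p, f in zip(past_match, future_match)]
    candidates = {
        "forward": past_match,      # predicted from the past I or P frame
        "backward": future_match,   # predicted from the future I or P frame
        "interpolated": average,    # average of both matches
    }
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    if sad(block, candidates[mode]) > intra_threshold:
        return "intra", list(block)  # no good match: code the block itself
    residual = [x - y for x, y in zip(block, candidates[mode])]
    return mode, residual

print(choose_b_mode([10, 20, 30, 40], [10, 20, 30, 44], [12, 22, 32, 36], 8))
```

  The residual (or the block itself in the intra case) is what then goes
  through the DCT and quantization.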

  The sequence of decoded frames usually goes like:

  IBBPBBPBBPBBIBBPBBPB...

  Where there are 12 frames from I to I (for US and Japan anyway.)
  This is based on a random access requirement that you need a
  starting point at least once every 0.4 seconds or so.  The ratio
  of P's to B's is based on experience.

  Of course, for the decoder to work, you have to send that first
  P *before* the first two B's, so the compressed data stream ends
  up looking like:

  0xx312645...

  where those are frame numbers.  xx might be nothing (if this is
  the true starting point), or it might be the B's of frames -2 and
  -1 if we're in the middle of the stream somewhere.

  You have to decode the I, then decode the P, keep both of those
  in memory, and then decode the two B's.  You probably display the
  I while you're decoding the P, and display the B's as you're
  decoding them, and then display the P as you're decoding the next
  P, and so on.
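
  The reordering from display order to transmission order can be
  reproduced with a small Python sketch. The function name coded_order is
  made up, and sending trailing B's last is an assumption of this sketch.

```python
def coded_order(display_types):
    """Map a display-order string such as 'IBBPBBP' to transmission order.

    B frames are held back until the next I or P frame (which they
    depend on) has been sent.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)       # send the I or P frame first
            order.extend(pending_b)   # then the B's that precede it
            pending_b = []
    order.extend(pending_b)           # B's with no later I/P: sent last here
    return order

print(coded_order("IBBPBBP"))
```

  For "IBBPBBP" this gives the frame numbers 0, 3, 1, 2, 6, 4, 5, which is
  the sequence shown above without the xx part.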

Q. You've got to be kidding.
A. No, really!

Q. Hmm.  Where did they get 352x240?
A. That derives from the CCIR-601 digital television standard which
  is used by professional digital video equipment.  It is (in the US)
  720 by 243 by 60 fields (not frames) per second, where the fields
  are interlaced when displayed.  (It is important to note though
  that fields are actually acquired and displayed a 60th of a second
  apart.)  The chrominance channels are 360 by 243 by 60 fields a
  second, again interlaced.  This degree of chrominance decimation
  (2:1 in the horizontal direction) is called 4:2:2.  The source
  input format for MPEG I, called SIF, is CCIR-601 decimated by 2:1
  in the horizontal direction, 2:1 in the time direction, and an
  additional 2:1 in the chrominance vertical direction.  And some
  lines are cut off to make sure things divide by 8 or 16 where
  needed.

Q. What if I'm in Europe?
A. For 50 Hz display standards (PAL, SECAM) change the number of lines
  in a field from 243 or 240 to 288, and change the display rate to
  50 fields/s or 25 frames/s.  Similarly, change the 120 lines in
  the decimated chrominance channels to 144 lines.  Since 288*50 is
  exactly equal to 240*60, the two formats have the same source data
  rate.
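
  The equal-rate claim is easy to check. This is a minimal Python sketch
  using only the numbers quoted above; the 352-pixel width for the 50 Hz
  case is assumed to be the same as in the US format.

```python
def sif_pixel_rate(width, lines, frames_per_second):
    """Luminance pixels per second for one SIF variant."""
    return width * lines * frames_per_second

# Lines per field times fields per second, as stated in the text.
assert 288 * 50 == 240 * 60

us_rate = sif_pixel_rate(352, 240, 30)       # 60 Hz countries
europe_rate = sif_pixel_rate(352, 288, 25)   # 50 Hz countries (PAL, SECAM)
print(us_rate, europe_rate, us_rate == europe_rate)
```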

Q. You didn't mention anything about the audio compression.
A. Oh, right.  Well, I don't know as much about the audio compression.
  Basically they use very carefully developed psychoacoustic models
  derived from experiments with the best obtainable listeners to
  pick out pieces of the sound that you can't hear.  There are what
  are called "masking" effects where, for example, a large component
  at one frequency will prevent you from hearing lower energy parts
  at nearby frequencies, where the relative energy vs. frequency
  that is masked is described by some empirical curve.  There are
  similar temporal masking effects, as well as some more complicated
  interactions where a temporal effect can unmask a frequency, and
  vice-versa.

  The sound is broken up into spectral chunks with a hybrid scheme
  that combines sine transforms with subband transforms, and the
  psychoacoustic model written in terms of those chunks.  Whatever
  can be removed or reduced in precision is, and the remainder is
  sent.  It's a little more complicated than that, since the bits
  have to be allocated across the bands.  And, of course, what is
  sent is entropy coded.

Q. So how much does it compress?
A. As I mentioned before, audio CD data rates are about 1.5 Mbits/s.
  You can compress the same stereo program down to 256 Kbits/s with
  no loss in discernable quality.  (So they say.  For the most part
  it's true, but every once in a while a weird thing might happen
  that you'll notice.  However the effect is very small, and it takes
  a listener trained to notice these particular types of effects.)
  That's about 6:1 compression.  So, a CD MPEG I stream would have
  about 1.25 MBits/s left for video.  The number I usually see though
  is 1.15 MBits/s (maybe you need the rest for the system data
  stream).  You can then calculate the video compression ratio from
  the numbers here to be about 26:1.  If you step back and think
  about that, it's little short of a miracle.  Of course, it's lossy
  compression, but it can be pretty hard sometimes to see the loss,
  if you're comparing the SIF original to the SIF decompressed.  There
  is, however, a very noticeable loss if you're coming from CCIR-601
  and have to decimate to SIF, but that's another matter.  I'm not
  counting that in the 26:1.

  The standard also provides for other bit rates ranging from 32 Kbits/s
  for a single channel, up to 448 Kbits/s for stereo.
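
  The 6:1 and 26:1 figures can be recomputed from the numbers given in
  this answer. This is a rough sketch in Python. It assumes 8 bits per
  sample and 1 Mbit = 1,000,000 bits; neither is stated in the text.

```python
BITS_PER_SAMPLE = 8  # assumption, not stated in the text

def sif_raw_bitrate(width=352, height=240, fps=30):
    """Uncompressed SIF bit rate: full-size Y plus quarter-size U and V."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)   # U and V at 176 by 120
    return (luma + chroma) * BITS_PER_SAMPLE * fps

raw_video = sif_raw_bitrate()
video_ratio = raw_video / 1_150_000     # 1.15 Mbits/s left for video
audio_ratio = 1_500_000 / 256_000       # "about 1.5 Mbits/s" down to 256 Kbits/s

print(raw_video, round(video_ratio, 1), round(audio_ratio, 1))
```

  The video ratio comes out a little above 26 and the audio ratio a little
  below 6, in line with the figures quoted.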

Q. What's phase II?
A. As I said, there is a considerable loss of quality in going from
  CCIR-601 to SIF resolution.  For entertainment video, it's simply
  not acceptable.  You want to use more bits and code all or almost
  all the CCIR-601 data.  From subjective testing at the Japan
  meeting in November 1991, it seems that 4 MBits/s can give very
  good quality compared to the original CCIR-601 material.  The
  objective of phase II is to define a bit stream optimized for these
  resolutions and bit rates.

Q. Why not just scale up what you're doing with MPEG I?
A. The main difficulty is the interlacing.  The simplest way to extend
  MPEG I to interlaced material is to put the fields together into
  frames (720x486x30/s).  This results in bad motion artifacts that
  stem from the fact that moving objects are in different places
  in the two fields, and so don't line up in the frames.  Compressing
  and decompressing without taking that into account somehow tends to
  muddle the objects in the two different fields.

  The other thing you might try is to code the even and odd field
  streams separately.  This avoids the motion artifacts, but as you
  might imagine, doesn't get very good compression since you are not
  using the redundancy between the even and odd fields where there
  is not much motion (which is typically most of the image).

  Or you can code it as a single stream of fields.  Or you can
  interpolate lines.  Or, etc. etc.  There are many things you can
  try, and the point of MPEG II is to figure out what works well.
  MPEG II is not limited to consider only derivations of MPEG I.
  There were several non-MPEG I-like schemes in the competition in
  November, and some aspects of those algorithms may or may not
  make it into the final standard for entertainment video compression.

Q. So what works?
A. Basically, derivations of MPEG I worked quite well, with one that
  used wavelet subband coding instead of DCT's that also worked very
  well.  Also among the worked-very-well's was a scheme that did not
  use B frames at all, just I and P's.  All of them, except maybe one,
  did some sort of adaptive frame/field coding, where a decision is
  made on a macroblock basis as to whether to code that one as one
  frame macroblock or as two field macroblocks.  Some other aspects
  are how to code I-frames--some suggest predicting the even field
  from the odd field.  Or you can predict evens from evens and odds
  or odds from evens and odds or any field from any other field, etc.

Q. So what works?
A. Ok, we're not really sure what works best yet.  The next step is
  to define a "test model" to start from, that incorporates most of
  the salient features of the worked-very-well proposals in a
  simple way.  Then experiments will be done on that test model,
  making a mod at a time, and seeing what makes it better and what
  makes it worse.  Example experiments are, B's or no B's, DCT vs.
  wavelets, various field prediction modes, etc.  The requirements,
  such as implementation cost, quality, random access, etc. will all
  feed into this process as well.

Q. When will all this be finished?
A. I don't know.  I'd have to hope in about a year or less.

Q. How do I join MPEG?
A. You don't join MPEG.  You have to participate in ISO as part of a
  national delegation.  How you get to be part of the national
  delegation is up to each nation.  I only know the U.S., where you
  have to attend the corresponding ANSI meetings to be able to
  attend the ISO meetings.  Your company or institution has to be
  willing to sink some bucks into travel since, naturally, these
  meetings are held all over the world.  (For example, Paris,
  Santa Clara, Kurihama Japan, Singapore, Haifa Israel, Rio de
  Janeiro, London, etc.)

Q. Well, then how do I get the documents, like the MPEG I standard ?
A. MPEG is an ISO standard. Its exact name is ISO CD 11172.
  The standard consists of three parts: System, Video, and Audio. The
  System part (11172-1) deals with synchronization and multiplexing
  of audio-visual information, while the Video (11172-2) and Audio
  part (11172-3) address the video and the audio compression techniques
  respectively.

  You may order it from your national standards body (e.g. ANSI in
  the USA) or buy it from companies like
    OMNICOM
    phone +44 438 742424
    FAX +44 438 740154

  Or from 'ISO Online' at http://www.iso.ch/welcome.html

-------------------------------------------------------------------------------

~Subject: What is MPEG-Audio then ?

From: "Harald Popp" <[email protected]>
From: [email protected]
Date:          Fri, 25 Mar 1994 19:09:06 +0100

Q.      What is MPEG?
A.      MPEG is an ISO committee that proposes standards for
       compression of Audio and Video. MPEG deals with 3 issues:
       Video, Audio, and System (the combination of the two into one
       stream). You can find more info on the MPEG committee in other
       parts of this document.

Q.      I've heard about MPEG Video. So this is the same compression
       applied to audio?
A.      Definitely no. The eye and the ear... even if they are only a
       few centimeters apart, work very differently... The ear has
       a much higher dynamic range and resolution. It can pick out
       more details but it is "slower" than the eye.
       The MPEG committee chose to recommend 3 compression methods
       and named them Audio Layer-1, Layer-2, and Layer-3.

Q.      What does it mean exactly?
A.      MPEG-1, IS 11172-3, describes the compression of audio
       signals using high performance perceptual coding schemes.
       It specifies a family of three audio coding schemes,
       simply called Layer-1,-2,-3, with increasing encoder
       complexity and performance (sound quality per bitrate).
       The three codecs are compatible in a hierarchical
       way, i.e. a Layer-N decoder is able to decode bitstream data
       encoded in Layer-N and all Layers below N (e.g., a Layer-3
       decoder may accept Layer-1,-2 and -3, whereas a Layer-2
       decoder may accept only Layer-1 and -2.)

Q.      So we have a family of three audio coding schemes. What does
       the MPEG standard define, exactly?
A.      For each Layer, the standard specifies the bitstream format
       and the decoder. It does *not* specify the encoder to
       allow for future improvements, but an informative chapter
       gives an example for an encoder for each Layer.

Q.      What do the three audio Layers have in common?
A.      All Layers use the same basic structure. The coding scheme can
       be described as "perceptual noise shaping" or "perceptual
       subband / transform coding".
       The encoder analyzes the spectral components of the audio
       signal by calculating a filterbank or transform and applies
       a psychoacoustic model to estimate the just noticeable
       noise-level. In its quantization and coding stage, the
       encoder tries to allocate the available number of data
       bits in a way to meet both the bitrate and masking
       requirements.
       The decoder is much less complex. Its only task is to
       synthesize an audio signal out of the coded spectral
       components.
       All Layers use the same analysis filterbank (polyphase with
       32 subbands). Layer-3 adds an MDCT transform to increase
       the frequency resolution.
       All Layers use the same "header information" in their
       bitstream, to support the hierarchical structure of the
       standard.
       All Layers use a bitstream structure that contains parts that
       are more sensitive to biterrors ("header", "bit
       allocation", "scalefactors", "side information") and parts
       that are less sensitive ("data of spectral components").
       All Layers may use 32, 44.1 or 48 kHz sampling frequency.
       All Layers are allowed to work with similar bitrates:
       Layer-1: from 32 kbps to 448 kbps
       Layer-2: from 32 kbps to 384 kbps
       Layer-3: from 32 kbps to 320 kbps
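
       The hierarchical compatibility rule and the limits listed above
       can be written down as a small Python check. This is a sketch
       only: it tests the ranges quoted in this answer, not the exact
       bitrate tables of the standard, and the function names are made up.

```python
MIN_KBPS = 32
MAX_KBPS = {1: 448, 2: 384, 3: 320}        # upper limits quoted per Layer
SAMPLE_RATES_HZ = (32000, 44100, 48000)    # 32, 44.1 and 48 kHz

def can_decode(decoder_layer, stream_layer):
    """A Layer-N decoder accepts Layer-N and all Layers below N."""
    return 1 <= stream_layer <= decoder_layer <= 3

def within_stated_limits(layer, kbps, sample_rate_hz):
    """Check a stream description against the ranges quoted in the text."""
    return (layer in MAX_KBPS
            and MIN_KBPS <= kbps <= MAX_KBPS[layer]
            and sample_rate_hz in SAMPLE_RATES_HZ)

print(can_decode(3, 1), can_decode(2, 3))
print(within_stated_limits(3, 128, 44100), within_stated_limits(3, 384, 44100))
```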

Q.      What are the main differences between the three Layers, from a
       global view?
A.      From Layer-1 to Layer-3,
       complexity increases (mainly true for the encoder),
       overall codec delay increases, and
       performance increases (sound quality per bitrate).

Q.      Which Layer should I use for my application?
A.      Good Question. Of course, it depends on all your requirements.
       But as a first approach, you should consider the available
       bitrate of your application as the Layers have been
       designed to support certain areas of bitrates most
       efficiently, i.e. with a minimum drop of sound quality.
       Let us look a little closer at the strong domains of each
       Layer.

       Layer-1: Its ISO target bitrate is 192 kbps per audio
       channel.
       Layer-1 is a simplified version of Layer-2. It is most useful
       at "high" bitrates around or above 192 kbps. A version of
       Layer-1 is used as "PASC" with the DCC recorder.

       Layer-2: Its ISO target bitrate is 128 kbps per audio
       channel.
       Layer-2 is identical with MUSICAM. It has been designed as a
       trade-off between sound quality per bitrate and encoder
       complexity. It is most useful at "medium" bitrates of 128 or
       even 96 kbps per audio channel. The DAB (EU 147) proponents
       have decided to use Layer-2 in the future Digital Audio
       Broadcasting network.

       Layer-3: Its ISO target bitrate is 64 kbps per audio channel.
       Layer-3 merges the best ideas of MUSICAM and ASPEC. It has
       been designed for best performance at "low" bitrates
       around 64 kbps or even below. The Layer-3 format specifies
       a set of advanced features that all address one goal: to
       preserve as much sound quality as possible even at rather
       low bitrates. Today, Layer-3 is already in use in various
       telecommunication networks (ISDN, satellite links, and so
       on) and speech announcement systems.

Q.      So how does MPEG audio work?
A.      Well, first you need to know how sound is stored in a
       computer. Sound is pressure differences in air. When picked up
       by a microphone and fed through an amplifier this becomes
       voltage levels. The voltage is sampled by the computer a
       number of times per second. For CD audio quality you need to
       sample 44100 times per second and each sample has a resolution
       of 16 bits. In stereo this gives you about 1.4 Mbit per second
       and you can probably see the need for compression.
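
       The arithmetic behind that figure can be checked with a few
       lines of Python. This is only an illustrative sketch; the
       function name pcm_bitrate is made up for this example.

```python
# Raw (uncompressed) PCM bitrate:
# samples per second x bits per sample x number of channels.
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    return sample_rate_hz * bits_per_sample * channels

cd_stereo = pcm_bitrate(44100, 16, 2)   # bit/s for CD stereo
cd_mono = pcm_bitrate(44100, 16, 1)     # bit/s for a single channel
print(cd_stereo / 1e6, "Mbit/s stereo")
print(cd_mono / 1e3, "kbit/s per channel")
```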

       To compress audio MPEG tries to remove the irrelevant parts
       of the signal and the redundant parts of the signal. Parts of
       the sound that we do not hear can be thrown away. To do this
       MPEG Audio uses psychoacoustic principles.

Q.      Tell me more about sound quality. How good is MPEG audio
       compression? And how do you assess that?
A.      Today, there is no alternative to expensive listening tests.
       During the ISO-MPEG-1 process, 3 international listening tests
       have been performed, with a lot of trained listeners,
       supervised by Swedish Radio. They took place in July 1990,
       March 1991 and November 1991. Another international listening
       test was performed by CCIR, now ITU-R, in 1992.
       All these tests used the "triple stimulus, hidden reference"
       method and the so-called CCIR impairment scale to assess the
       audio quality.
       The listening sequence is "ABC", with A = original, BC = pair
       of original / coded signal with random sequence, and the
       listener has to evaluate both B and C with a number
       between 1.0 and 5.0. The meaning of these values is:
       5.0 = transparent (this should be the original signal)
       4.0 = perceptible, but not annoying (first differences
             noticeable)
       3.0 = slightly annoying
       2.0 = annoying
       1.0 = very annoying
       With perceptual codecs (like MPEG audio), all traditional
       parameters (like SNR, THD+N, bandwidth) are rather
       useless.

       Fraunhofer-IIS (among others) works on objective quality
       assessment tools, like the NMR meter (Noise-to-Mask-Ratio),
       too. If you need more information about NMR, please
       contact [email protected]

Q.      Now that I know how to assess quality, come on, tell me the
       results of these tests.
A.      Well, for details you should study one of those AES papers
       listed below. One main result is that for low bitrates (60
       or 64 kbps per channel, i.e. a compression ratio of around
       12:1), Layer-2 scored between 2.1 and 2.6, whereas Layer-3
       scored between 3.6 and 3.8.
       This is a significant increase in sound quality, indeed!
       Furthermore, the selection process for critical sound material
       showed that it was rather difficult to find worst-case
       material for Layer-3 whereas it was not so hard to find
       such items for Layer-2.
       For medium and high bitrates (120 kbps or more per channel),
       Layer-2 and Layer-3 scored rather similarly, i.e. even
       trained listeners found it difficult to detect differences
       between original and reconstructed signal.

Q.      So how does MPEG achieve this compression ratio?
A.      Well, with audio you basically have two alternatives. Either
       you sample less often or you sample with less resolution (less
       than 16 bits per sample). If you want quality you can't do much
       with the sample frequency. Humans can hear sounds with
       frequencies from about 20 Hz to 20 kHz. According to the
       Nyquist theorem you must sample at a rate of at least twice
       the highest frequency you want to reproduce. Allowing for
       imperfect filters, a 44.1 kHz sampling rate is a fair minimum. So
       you either set out to prove the Nyquist theorem is wrong or
       go to work on reducing the resolution. The MPEG committee
       chose the latter.
       Now, the real reason for using 16 bits is to get a good
       signal-to-noise (s/n) ratio. The noise we're talking
       about here is quantization noise from the digitizing
       process. For each bit you add, you get about 6 dB
       better s/n. (6 dB corresponds to a doubling of the signal
       amplitude.) CD-audio achieves about 90 dB s/n. This
       matches the dynamic range of the ear fairly well. That is, you
       will not hear any noise coming from the system itself (well,
       there are still some people arguing about that, but let's not
       worry about them for the moment).
       So what happens when you sample to 8 bit resolution? You get
       a very noticeable noise floor in your recording. You can
       easily hear this in silent moments in the music or between
       words or sentences if your recording is a human voice.
       Waitaminnit. You don't notice any noise in loud passages,
       right? This is the masking effect and is the key to MPEG Audio
       coding. Stuff like the masking effect belongs to a science
       called psycho-acoustics that deals with the way the human
       brain perceives sound.
       And MPEG uses psychoacoustic principles when it does its
       thing.
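
       The "6 dB per bit" rule can be sketched as follows. This is an
       idealized estimate only: it ignores the small constant term that
       depends on the test signal, and the function name
       snr_db_per_bits is invented for this example. Real converters
       fall short of the ideal figure, which fits the "about 90 dB"
       quoted above for CD audio.

```python
import math

def snr_db_per_bits(bits):
    # Each extra bit doubles the number of quantization steps, which adds
    # 20*log10(2) ~= 6.02 dB to the signal-to-quantization-noise ratio.
    return 20 * math.log10(2) * bits

for bits in (8, 16):
    print(bits, "bits ->", round(snr_db_per_bits(bits), 1), "dB")
```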

Q.      Explain this masking effect.
A.      OK, say you have a strong tone with a frequency of 1000Hz.
       You also have a tone nearby of say 1100Hz. This second tone is
       18 dB lower. You are not going to hear this second tone. It is
       completely masked by the first 1000Hz tone. As a matter of
       fact, any relatively weak sound near a strong sound is
       masked. If you introduce another tone at 2000Hz also 18 dB
       below the first 1000Hz tone, you will hear this.
       You will have to turn down the 2000Hz tone to something like
       45 dB below the 1000Hz tone before it will be masked by the
       first tone. So the further you get from a sound the less
       masking effect it has.
       The masking effect means that you can raise the noise floor
       around a strong sound because the noise will be masked anyway.
       And raising the noise floor is the same as using fewer bits
       and using fewer bits is the same as compression. Do you get it?

Q.      I don't get it.
A.      Well, let me try to explain how the MPEG Audio Layer-2 encoder
       goes about its thing. It divides the frequency spectrum (20Hz
       to 20kHz) into 32 subbands. Each subband holds a little slice
       of the audio spectrum. Say, in the upper region of subband 8,
       a 6500Hz tone with a level of 60dB is present. OK, the
       coder calculates the masking effect of this sound and finds
       that there is a masking threshold for the entire 8th
       subband (all sounds w. a frequency...) 35dB below this tone.
       The acceptable s/n ratio is thus 60 - 35 = 25 dB. This equals
       about 4 bits of resolution. In addition there are masking
       effects on bands 9-13 and on bands 5-7, the effect decreasing
       with the distance from band 8.
       In a real-life situation you have sounds in most bands and the
       masking effects are additive. In addition the coder considers
       the sensitivity of the ear for various frequencies. The ear
       is a lot less sensitive in the high and low frequencies. Peak
       sensitivity is around 2 - 4kHz, the same region that the human
       voice occupies.
       The subbands should match the ear, that is each subband should
       consist of frequencies that have the same psychoacoustic
       properties. In MPEG Layer 2, each subband is 750Hz wide
       (with 48 kHz sampling frequency). It would have been better if
       the subbands were narrower in the low frequency range and
       wider in the high frequency range. That is the trade-off
       Layer-2 took in favour of a simpler approach.
       Layer-3 has a much higher frequency resolution (18 times
       more) - and that is one of the reasons why Layer-3 has a much
       better low bitrate performance than Layer-2.
       But there is more to it. I have explained concurrent masking,
       but the masking effect also occurs before and after a strong
       sound (pre- and postmasking).
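
       The numbers in the subband example can be reproduced with a
       short sketch. It assumes 32 equal-width subbands covering 0 Hz
       up to half the sampling frequency (which gives the 750 Hz
       width at 48 kHz), zero-based subband numbering, and roughly
       6 dB of s/n per bit. The helper names are invented for this
       illustration; a real encoder works from its psychoacoustic
       model, not from this shortcut.

```python
import math

SUBBANDS = 32

def subband_width_hz(sample_rate_hz):
    # The equal-width subbands span 0 .. sample_rate / 2.
    return (sample_rate_hz / 2) / SUBBANDS

def subband_index(freq_hz, sample_rate_hz):
    # Zero-based index of the subband that contains freq_hz (assumed numbering).
    return int(freq_hz // subband_width_hz(sample_rate_hz))

def bits_for_snr(snr_db):
    # About 6.02 dB of s/n per bit of resolution.
    return snr_db / (20 * math.log10(2))

width = subband_width_hz(48000)        # width of one subband in Hz
band = subband_index(6500, 48000)      # subband holding the 6500 Hz tone
needed = bits_for_snr(60 - 35)         # bits for a 25 dB s/n ratio
print(width, band, round(needed, 2))
```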

Q.      Before?
A.      Yes, if there is a significant (30 - 40 dB) shift in level.
       The reason is believed to be that the brain needs some
       processing time. Premasking lasts only about 2 to 5 ms.
       Postmasking can last up to 100 ms.
       Other bit-reduction techniques involve considering tonal and
       non-tonal components of the sound. For a stereo signal you
       may have a lot of redundancy between channels. All MPEG
       Layers may exploit these stereo effects by using a "joint-
       stereo" mode, with a most flexible approach for Layer-3.
       Furthermore, only Layer-3 further reduces the redundancy
       by applying Huffman coding.

Q.      What are the downsides?
A.      The coder calculates masking effects by an iterative process
       until it runs out of time. It is up to the implementor to
       spend bits in the least obtrusive fashion.
       For Layer 2 and Layer 3, the encoder works on 24 ms of sound
       (with 1152 samples, and fs = 48 kHz) at a time. For some
       material, the time-window can be a problem. This is
       normally in a situation with transients where there are large
       differences in sound level over the 24 ms. The masking is
       calculated on the strongest sound and the weak parts will
       drown in quantization noise. This is perceived as a "noise-
       echo" by the ear. Layer 3 addresses this problem
       specifically by using a smaller analysis window (4 ms), if
       the encoder encounters an "attack" situation.
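
       The 24 ms figure follows directly from the frame length and
       the sampling frequency. A small sketch (the function name is
       made up for this example) shows how the frame duration changes
       with the other MPEG-1 sampling frequencies:

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    # Duration of one block of samples in milliseconds.
    return 1000.0 * samples_per_frame / sample_rate_hz

for fs in (32000, 44100, 48000):
    print(fs, "Hz ->", round(frame_duration_ms(1152, fs), 2), "ms")
```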

Q.      Tell me about the complexity. What are the hardware demands?
A.      Alright. First, we have to distinguish between decoder and
       encoder.
       Remember: MPEG coding is asymmetrical, with a much
       larger workload on the encoder than on the decoder.
       For a stereo decoder, various real-time implementations exist
       for Layer-2 and Layer-3. They are either based on single-DSP
       solutions or on dedicated MPEG audio decoder chips. So
       you need not worry about decoder complexity.
       For a stereo Layer-2-encoder, various DSP based solutions with
       one or more DSPs exist (with different quality, also).
       For a stereo Layer-3-encoder achieving ISO reference quality,
       the current real-time implementations use two DSP32C and
       two DSP56002.

Q.      How many audio channels?
A.      MPEG-1 allows for two audio channels. These can be either
       single (mono), dual (two mono channels), stereo or
       joint stereo (intensity stereo (Layer-2 and Layer-3) or m/s-
       stereo (Layer-3 only)).
       In normal (l/r) stereo one channel carries the left audio
       signal and one channel carries the right audio signal. In
       m/s stereo one channel carries the sum signal (l+r) and the
       other the difference (l-r) signal. In intensity stereo the
       high frequency part of the signal (above 2kHz) is combined.
       The stereo image is preserved but only the temporal envelope
       is transmitted.
       In addition MPEG allows for pre-emphasis, copyright marks and
       original/copy marks. MPEG-2 allows for several channels in
       the same stream.
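
       The sum/difference idea behind m/s stereo can be sketched on
       plain sample values. This is a simplification: a real Layer-3
       coder applies a scaling factor and works on spectral values,
       which is left out here, and the function names are invented
       for this example.

```python
def ms_encode(left, right):
    # Sum ("mid") and difference ("side") signals, sample by sample.
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    # Invert the transform: l = (m + s) / 2, r = (m - s) / 2.
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.5]
right = [0.5, 0.25, -0.25]
mid, side = ms_encode(left, right)
print(side)   # small values wherever left and right are similar
print(ms_decode(mid, side) == (left, right))
```

       When both channels are nearly identical, the difference signal
       is close to zero and needs very few bits.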

Q.      What about the audio codec delay?
A.      Well, the standard gives some figures of the theoretical
       minimum delay:
       Layer-1: 19 ms (<50 ms)
       Layer-2: 35 ms (100 ms)
       Layer-3: 59 ms (150 ms)
       The practical values are significantly above that. As they
       depend on the implementation, exact figures are hard to
       give. So the figures in brackets are just rough
       rule-of-thumb values.
       Yes, for some applications, a very short delay is of critical
       importance. E.g. in a feedback link, a reporter can only talk
       intelligibly if the overall delay is below around 10 ms.
       If broadcasters want to apply MPEG audio coding, they have to
       use "N-1" switches in the studio to overcome this problem
       (or appropriate echo-cancellers) - or they have to forget
       about MPEG altogether.
       But with most applications, these figures are small enough to
       present no extra problem. At least, if one can accept a
       Layer-2 delay, one can most likely also accept the higher
       Layer-3 delay.

Q.     OK, I am hooked! Where can I find more technical
      information about MPEG audio coding, especially about
      Layer-3?
A.     Well, there is a variety of AES papers, e.g.

      K. Brandenburg, G. Stoll, ...: "The ISO/MPEG-Audio Codec: A
      Generic Standard for Coding of High Quality Digital Audio",
      92nd AES, Vienna 1992, pp.3336

      E. Eberlein, H. Popp, ...: "Layer-3, a Flexible Coding
      Standard",    94th AES, Berlin 93, pp.3493

      K. Brandenburg, G. Zimmer, ...: "Variable Data-Rate Recording
      on a PC Using MPEG-Audio Layer-3", 95th AES, New York 93

      B. Grill, J. Herre,... : "Improved MPEG-2 Audio Multi-Channel
      Encoding", 96th AES, Amsterdam 94

      And for further information, please contact [email protected]

Q.     Where can I get more details about MPEG audio?
A.     Still more details? No shit. You can get the full ISO spec
      from Omnicom. The specs do a fairly good job of obscuring
      exactly how these things are supposed to work... Jokes aside,
      there is no description of the coder in the specs. The specs
      describe the bitstream in great detail and suggest
      psychoacoustic models.

Originally written by Morten Hjerde <100034,[email protected]>,
modified and updated by Harald Popp ([email protected]).

Harald Popp
Audio & Multimedia ("Music is the *BEST*" - F. Zappa)
Fraunhofer-IIS-A, Weichselgarten 3, D-91058 Erlangen, Germany
Phone: +49-9131-776-340
Fax:   +49-9131-776-399
email: [email protected]

-------------------------------------------------------------------------------

~Subject: What is the Audio Layer 3 then ?

Information about MPEG Audio Layer-3
Version 1.51 - 1. 95

This text is organized as a kind of Mini-FAQ (Frequently Asked
Questions). It covers several topics:

1. ISO-MPEG Standard
2. MPEG Audio Codec Family ("Layer 1, 2, 3")
3. Applications
4. Products
5. Support by Fraunhofer-IIS
6. Shareware Information

For further comments and questions regarding Layer-3, please contact:
-       [email protected]

For further information about MPEG, you may also like to contact:
-       [email protected]


1. ISO-MPEG Standard

Q: What is MPEG, exactly?
A: MPEG is the "Moving Picture Experts Group", working under the joint
direction of the International Organization for Standardization (ISO) and
the International Electrotechnical Commission (IEC). This group works on
standards for the coding of moving pictures and associated audio.

Q: What is the status of MPEG's work, then? What about MPEG-1, -2, and so
on?
A: MPEG approaches the growing need for multimedia standards step-by-
step. Today, three "phases" are defined:

MPEG-1:"Coding of Moving Pictures and Associated Audio for
Digital Storage Media at up to about 1.5 MBit/s"
Status: International Standard IS-11172, completed in 10.92

MPEG-2:"Generic Coding of Moving Pictures and Associated
Audio"
Status: International Standard IS-13818, completed in 11.94

MPEG-3: no longer exists (has been merged into MPEG-2)

MPEG-4: "Very Low Bitrate Audio-Visual Coding"
Status: Call for Proposals first deadline 1. 10. 95

Q: MPEG-1 and MPEG-2 are ready for use. What do the standards look like?
A: Both standards consist of 4 main parts.
The structure is the same for MPEG-1 and MPEG-2.
-1: System              describes synchronization and multiplexing of video
                        and audio
-2: Video               describes compression of video signals
-3: Audio               describes compression of audio signals
-4: Compliance Testing  describes procedures for determining the
                        characteristics of coded bitstreams and the decoding
                        process and for testing compliance with the
                        requirements stated in the other parts.

Q: How do I get the MPEG documents?
A: You order them from your national standards body.
E.g., in Germany, please contact:
DIN-Beuth Verlag, Auslandsnormen
Mrs. Niehoff, Burggrafenstr. 6, D-10772 Berlin, Germany
Phone: +49-30-2601-2757, Fax: +49-30-2601-1231


2. MPEG Audio Codec Family ("Layer 1, 2, 3")

Q: Talking about MPEG audio coding, I heard a lot about "Layer 1, 2 and 3".
What does it mean, exactly?
A: MPEG describes the compression of audio signals using high performance
perceptual coding schemes. It specifies a family of three audio coding
schemes, simply called Layer-1,-2,-3, with increasing encoder complexity
and performance (sound quality per bitrate) from 1 to 3.
The three codecs are compatible in a hierarchical way, i.e. a Layer-N
decoder is able to decode bitstream data encoded in Layer-N and all Layers
below N (e.g., a Layer-3 decoder may accept Layer-1,-2 and -3, whereas a
Layer-2 decoder may accept only Layer-1 and -2.)

Q: So we have a family of three audio coding schemes. What does the MPEG
standard define, exactly?
A: For each Layer, the standard specifies the bitstream format and the
decoder. To allow for future improvements, it does *not* specify the
encoder, but an informative chapter gives an example for an encoder for
each Layer.

Q: What do the three audio Layers have in common?
A: All Layers use the same basic structure. The coding scheme can be
described as "perceptual noise shaping" or "perceptual subband / transform
coding".
The encoder analyzes the spectral components of the audio signal by
calculating a filterbank or transform and applies a psychoacoustic model
to estimate the just noticeable noise-level. In its quantization and coding
stage, the encoder tries to allocate the available number of data bits in a
way to meet both the bitrate and masking requirements.
The decoder is much less complex. Its only task is to synthesize an audio
signal out of the coded spectral components.
All Layers use the same analysis filterbank (polyphase with 32 subbands).
Layer-3 adds an MDCT transform to increase the frequency resolution.
All Layers use the same "header information" in their bitstream, to support
the hierarchical structure of the standard.
All Layers have a similar sensitivity to biterrors. They use a bitstream
structure that contains parts that are more sensitive to biterrors ("header",
"bit allocation", "scalefactors", "side information") and parts that
are less sensitive ("data of spectral components").
All Layers support the insertion of program-associated information
("ancillary data") into their audio data bitstream.
All Layers may use 32, 44.1 or 48 kHz sampling frequency.
All Layers are allowed to work with similar bitrates:
Layer-1: from 32 kbps to 448 kbps
Layer-2: from 32 kbps to 384 kbps
Layer-3: from 32 kbps to 320 kbps
The last two statements refer to MPEG-1; with MPEG-2, there is an
extension for the sampling frequencies and bitrates (see below).

Q: What are the main differences between the three Layers, from a global
view?
A: From Layer-1 to Layer-3,
complexity increases (mainly true for the encoder),
overall codec delay increases, and
performance increases (sound quality per bitrate).

Q: What are the main differences between MPEG-1 and MPEG-2 in the audio
part?
A: MPEG-1 and MPEG-2 use the same family of audio codecs, Layer-1, -2
and -3. The new audio features of MPEG-2 are:
"low sample rate extension" to address very low bitrate applications
with limited bandwidth requirements (the new sampling frequencies
are 16, 22.05 or 24 kHz, the bitrates extend down to 8 kbps),
"multichannel extension" to address surround sound applications
with up to 5 main audio channels (left, center, right, left surround,
right surround) and optionally 1 extra "low frequency enhancement
(LFE)" channel for subwoofer signals; in addition, a "multilingual
extension" allows the inclusion of up to 7 more audio channels.

Q: A lot of new stuff! Is this all compatible with each other?
A: Well, more or less, yes - with the exception of the low sample rate
extension. Obviously, a pure MPEG-1 decoder is not able to handle the
new "half" sample rates.

Q: You mean: compatible!? With all these extra audio channels? Please
explain!
A: Compatibility has been a major topic during the MPEG-2 definition phase.
The main idea is to use the same basic bitstream format as defined in
MPEG-1, with the main data field carrying two audio signals (called L0
and R0) as before, and the ancillary data field carrying the multichannel
extension information. Without going further into details, three terms can
be explained here:
"forwards compatible": the MPEG-2 decoder has to accept any
MPEG-1 audio bitstream (that represents one or two audio channels)
"backwards compatible": the MPEG-1 decoder should be able to
decode the audio signals in the main data field (L0 and R0) of the
MPEG-2 bitstream
"Matrixing" may be used to get the surround information into L0 and
R0:
L0 = left signal + a * center signal + b * left surround signal
R0 = right signal + a * center signal + b * right surround signal
Therefore, an MPEG-1 decoder can reproduce a comprehensive downmix of
the full 5-channel information. An MPEG-2 decoder uses the multichannel
extension information (3 more audio signals) to reconstruct the five
surround channels.
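
The matrixing equations above can be tried out with a small sketch. The
function name compat_downmix and the default values for a and b are
assumptions made for this illustration only; the text above does not state
which coefficients an encoder actually uses.

```python
import math

def compat_downmix(left, center, right, left_sur, right_sur,
                   a=1 / math.sqrt(2), b=1 / math.sqrt(2)):
    # L0 = left  + a * center + b * left surround
    # R0 = right + a * center + b * right surround
    # The default a and b are assumed example values, not taken from the text.
    l0 = left + a * center + b * left_sur
    r0 = right + a * center + b * right_sur
    return l0, r0

# One sample per channel; the values are arbitrary illustration data.
l0, r0 = compat_downmix(0.2, 0.4, -0.1, 0.0, 0.1, a=0.5, b=0.5)
print(l0, r0)
```

A decoder that only knows L0 and R0 plays this downmix as ordinary stereo.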

Q: I heard something about a new NBC mode for MPEG-2 audio? What does
it mean?
A: "NBC" stands for "non-backwards compatible". During the development
of the backwards compatible MPEG-2 standard, the experts encountered
some trouble with the compatibility matrix. The introduced quantisation
noise may become audible after dematrixing. Although some clever
strategies have been devised to overcome this problem, the question
remained how much better a non-compatible multichannel codec might
perform.
So ISO-MPEG decided to address that issue in a "NBC" working group -
among the proponents are AT&T, Dolby, Fraunhofer, IRT, Philips, and
Sony. Their work will lead to an addendum to the MPEG-2 standard
(13818-8).

Q: O.K., that should do for a first overview. Are there any papers with more
detailed information?
A: Sure! You'll find more technical information about MPEG audio coding
in a variety of AES papers (AES = Audio Engineering Society). The AES
organizes two conventions per year, and perceptual audio coding has been
a topic since the middle of the 80s. Some interesting papers might be:

K. Brandenburg, G. Stoll, et al.: "The ISO/MPEG-Audio Codec: A
Generic Standard for Coding of High Quality Digital Audio", 92nd
AES, Vienna Mar. 92, pp. 3336; revised version ("ISO-MPEG-1
Audio: A Generic Standard...") published in the Journal of AES,
Vol.42, No. 10, Oct. 94

S. Church, B. Grill, et al.: "ISDN and ISO/MPEG Layer-3 Audio
Coding: Powerful New tools for Broadcast and Audio Production",
95th AES, New York Oct. 93, pp. 3743

E. Eberlein, H. Popp, et al.: "Layer-3, a Flexible Coding Standard",
94th AES, Berlin Mar. 93, pp. 3493

B. Grill, J. Herre, et al.: "Improved MPEG-2 Audio Multi-Channel
Encoding", 96th AES, Amsterdam Feb. 94, pp. 3865

J. Herre, K. Brandenburg, et al.: "Second Generation ISO/MPEG
Audio Layer-3 Coding", 98th AES, Paris Feb. 95

F.-O. Witte, M. Dietz, et al.: "Single Chip Implementation of an
ISO/MPEG Layer-3 Decoder", 96th AES, Amsterdam Feb. 94, pp.
3805

For ordering information, contact:

AES
60 East 42nd Street, Suite 2520
New York, NY 10165-2520, USA
phone: (212) 661-8528, fax: (212) 682-0477

Another interesting publication: the "Proceedings of the Sixth Tirrenia
International Workshop on Digital Communications", Tirrenia Sep. 93,
Elsevier Science B.V. Amsterdam 94 (ISBN 0 444 81580 5).

An excellent tutorial about MPEG-2 has recently been published in a
German technical journal (Fernseh- und Kino-Technik); part 4, by E. F.
Schroeder and J. Spille, talks about the audio part (7/8 94, p. 364 ff).

And for further information, please feel free to contact [email protected].


3. Applications

Q: O.K., let us concentrate on one or two audio channels. Which Layer shall I
use for my application?
A: Good question. Of course, it depends on all your requirements. But as a
first approach, you should consider the available bitrate of your
application as the Layers have been designed to support certain areas of
bitrates most effectively. Roughly, today you can achieve a data reduction
of around
1:4       with Layer-1 (or 192 kbps per audio channel),
1:6..8    with Layer-2 (or 128..96 kbps per audio channel), and
1:10..12  with Layer-3 (or 64..56 kbps per audio channel),
and still the reconstructed audio signal will maintain a "CD-like" sound
quality. This may be used as a first rule of thumb - let's talk about details
later on.
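
These ratios can be roughly reproduced by dividing the uncompressed bitrate
of one channel by the coded bitrate. The sketch below assumes the reference
is a 16-bit channel sampled at 44.1 kHz, which the text does not state
explicitly; the names are invented for this example.

```python
# Assumed reference: one 16-bit PCM channel at 44.1 kHz.
CD_MONO_BPS = 44100 * 16

def reduction_ratio(kbps_per_channel):
    # Uncompressed bitrate divided by coded bitrate, both per channel.
    return CD_MONO_BPS / (kbps_per_channel * 1000)

for layer, kbps in (("Layer-1", 192), ("Layer-2", 128), ("Layer-2", 96),
                    ("Layer-3", 64), ("Layer-3", 56)):
    print(layer, kbps, "kbps -> about 1:%d" % round(reduction_ratio(kbps)))
```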

Q: Why does the performance increase with the number of the Layer? Why
does the standard define a family of audio codecs instead of one single
powerful algorithm?
A: Well, the MPEG standard has forged together two main coding schemes
that offered advantages either in complexity (MUSICAM) or in
performance (ASPEC).
Layer-2 is identical with the MUSICAM format. It has been designed as a
trade-off between sound quality per bitrate and encoder complexity. So it is
most useful for the "medium" range of bitrates (96..128 kbps per channel).
For higher bitrates, even a simplified version, Layer-1, performs well
enough. Layer-1 was originally developed for a target bitrate of 192
kbps per channel. It is used as "PASC" within the DCC recorder.
For lower bitrates (64 kbps per channel or even less), the Layer-2 format
suffers from its built-in limitations, and with decreasing bitrate, artefacts
become more and more audible. Here is the strong domain of the most
powerful MPEG audio format, Layer-3. It specifies a set of unique features
that all address one goal: to preserve as much sound quality as possible
even at very low bitrates.

Q: Wait a second! I understand that Layer-3 has been an important asset to
the MPEG-1 standard, to address the high-quality low bitrate
applications. With the advent of the "low sample rate extension (LSF)" in
MPEG-2, is it still necessary to rely on Layer-3 to achieve a high-quality
sound at low bitrates?
A: Yes, for sure! Please, don't mix up MPEG-1 and MPEG-2 LSF. MPEG-2
LSF is useful only for applications with limited bandwidth (11.25 kHz, at
best). For applications with full bandwidth, MPEG-1 Layer-3 at 64 or 56
kbps per channel achieves the best sound quality of all ISO codecs.
For applications with limited bandwidth, MPEG-2 LSF Layer-3 provides
an excellent sound quality at 56 kbps for monophonic speech signals and
still a good sound quality at only 64 kbps total bitrate for stereo music
signals (with around 10 kHz bandwidth). The latest MPEG ISO listening
test (in September 94 at NTT Japan, doc. MPEG 94/437) proved the
superior performance of Layer-3 in MPEG-1 and MPEG-2 LSF.

Q: Tell me more about sound quality. How do you assess that?
A: Today, there is no alternative to expensive listening tests. During the ISO-
MPEG process, a number of international listening tests have been
performed, with a lot of trained listeners. All these tests used the "triple
stimulus, hidden reference" method and the "CCIR impairment scale" to
assess the sound quality.
The listening sequence is "ABC", with A = original, BC = pair of original
/ coded signal with random sequence, and the listener has to evaluate both
B and C with a number between 1.0 and 5.0. The meaning of these values
is:
  5.0 = transparent (this should be the original signal)
  4.0 = perceptible, but not annoying (first differences noticeable)
  3.0 = slightly annoying
  2.0 = annoying
  1.0 = very annoying

Q: Is there really no alternative to listening tests?
A: No, there is not. With perceptual codecs, all traditional "quality"
parameters (like SNR, THD+N, bandwidth) are rather useless, as any
codec may introduce noise and distortions as long as it does not affect the
perceived sound quality. So, listening tests are necessary, and, if carefully
prepared and performed, lead to rather reliable results.
Nevertheless, Fraunhofer-IIS works on objective sound quality assessment
tools, too. There is already a first product available, the NMR meter, a
real-time DSP-based measurement tool that nicely supports the analysis of