The USMARC Formats: Background and Principles

         The following statement of background and principles for
content designation in the USMARC formats was approved in 1982 and
revised in 1989 by the American Library Association's
RTSD/LITA/RASD Machine-Readable Bibliographic Information Committee
(MARBI), in consultation with representatives from United States
and Canadian national libraries and designated bibliographic
networks.  The statement includes the principles under which the
USMARC formats were developed and constitutes a set of working
principles for the ongoing process of format development.  This
document will be revised as necessary.

1.   Introduction
1.1.      The USMARC formats are standards for the representation
         and communication of bibliographic and related
         information in machine-readable form.
1.2.      A USMARC record involves three elements: the record
         structure, the content designation, and the data content
         of the record.
1.2.1.         The structure of USMARC records is an
              implementation of national and international
              standards, e.g., Bibliographic Information
              Interchange (ANSI Z39.2) and Format for
              Bibliographic Information Interchange on Magnetic
              Tape (ISO 2709).
1.2.2.         Content designation, the codes and conventions
              established to identify explicitly and characterize
              further the data elements within a record and to
              support the manipulation of those data, is defined
              in the USMARC formats.
1.2.3.         The content of most data elements is defined by
              standards outside the formats, e.g., Anglo-American
              Cataloguing Rules, Library of Congress Subject
              Headings, National Library of Medicine
              Classification.  The content of other data
              elements, e.g., coded data (see section 9. below),
              is defined in the USMARC formats.
1.3.      A USMARC format is a set of codes and content designators
         defined for encoding a particular type of machine-
         readable record.  USMARC formats are defined for the
         following types of data: bibliographic, holdings, and
         authority.
1.3.1.         USMARC Format for Bibliographic Data contains
              format specifications for encoding data elements
              needed to describe, retrieve, and control various
              forms of bibliographic material.  The USMARC Format
              for Bibliographic Data is an integrated format
              defined for the identification and description of
              different forms of bibliographic material.  USMARC
              specifications are defined for books, archival and
              manuscripts control, computer files, maps, music,
              visual materials, and serials.  With the full
              integration of the previously discrete
              bibliographic formats, consistent definition and
              usage are maintained for different forms of
              material.
1.3.2.         USMARC Format for Holdings Data contains format
              specifications for encoding data elements pertinent
              to holdings and location data for all forms of
              material.
1.3.3.         USMARC Format for Authority Data contains format
              specifications for encoding data elements that
              identify or control the content and content
              designation of those portions of a bibliographic
              record that may be subject to authority control.
1.4.      The USMARC formats are maintained by the Library of
         Congress in consultation with various user communities.
1.4.1.         Through maintenance and revision, content
              designation is added to and existing content
              designation is made obsolete or deleted from
              formats.  Content designation is made obsolete when
              it is found to be no longer appropriate or when the
              data element involved is no longer needed.  An
              obsolete content designator may continue to appear
              in records created prior to the date it was made
              obsolete.  Obsolete content designators are not
              used in new records.  A deleted content designator
              is one that had been reserved in USMARC but had not
              been defined, or one that had been defined but is
              known with near certainty never to have been used.
1.4.2.         The principles stated in this document have
              developed over time.  The formats contain
              exceptions to the principles due to early format
              development decisions.  While many exceptions have
              been made obsolete, others remain because of the
              need to maintain upward compatibility of the
              formats in current development.

2.   General Considerations
2.1.      The USMARC formats are communication formats, primarily
         designed to provide specifications for the exchange of
         bibliographic and related information between systems.
         They are widely used in a variety of exchange and
         processing environments.  As communication formats, they
         do not mandate internal storage or display formats to be
         used by individual systems.
2.2.      The USMARC formats, particularly the bibliographic and
         authority formats, were developed to enable the Library
         of Congress to communicate its catalog records to other
         institutions.  The formats have had a close relationship
         to the needs and practices of United States libraries.
         They reflect both the various cataloging codes applied in
         the library community and the requirements of the
         archives community.
2.3.      The USMARC formats were designed to facilitate the
         exchange of bibliographic and related information on
         magnetic tape within the United States.  An attempt has
         been made to preserve compatibility with other national
         and international formats, e.g., CANMARC and UNIMARC.
         Lack of international agreement on cataloging codes and
         practices has made complete compatibility impossible.
2.4.      National agencies in the United States and Canada
         (Library of Congress, National Agricultural Library,
         National Library of Medicine, United States Government
         Printing Office, and National Library of Canada) are
         given special emphasis and consideration in the formats
         because they serve as sources of authoritative cataloging
         and as agencies responsible for certain data elements.
2.5.      The institutions responsible for the content, content
         designation, and transcription accuracy of bibliographic
         and authority data within a USMARC record are identified
         at the record level in field 008/39 (Fixed-Length Data
         Elements--Cataloging source) and in field 040 (Cataloging
         Source).  This responsibility may be evaluated in terms
         of the following rule.
2.5.1.         Responsible Parties Rule:
2.5.1.1.       Unmodified records--The institution identified as
              the cataloging institution (field 040$a) is
              considered responsible for data content in the
              record except for agency-assigned data (see section
              2.5.2.1. below).  The institution identified as the
              transcribing institution (field 040$c) is
              considered responsible for content designation and
              transcription accuracy for all data.
2.5.1.2.       Modified records--Institutions identified as
              cataloging or modifying institutions (field
              040$a,$d) are considered collectively responsible
              for data content in the record except for agency-
              assigned and authoritative-agency data (see section
              2.5.2. below).  Institutions identified as
              transcribing or modifying institutions (field
              040$c,$d) are considered collectively responsible
              for content designation and transcription accuracy.
2.5.2.         Exceptions to Responsible Parties Rule:
2.5.2.1.       Certain data elements are defined in the USMARC
              formats as being exclusively assigned by particular
              agencies, e.g., International Standard Serial
              Number (field 022), Library of Congress Control
              Number (field 010).  The content of such agency-
              assigned elements is always the responsibility of
              the agency.
2.5.2.2.       Certain data elements have been defined in the
              USMARC formats in relation to one or more
              authoritative agencies that maintain the lists or
              rules upon which the data is based, e.g., Library
              of Congress Call Number (field 050), National
              Library of Medicine Call Number (field 060).  Where
              it is possible for other agencies to create similar
              or identical content for these data elements,
              content designation may be provided to distinguish
              between content actually assigned by the
              authoritative agency and that assigned by other
              agencies.  In the former case, responsibility for
              content rests with the authoritative agency.  In
              the latter case, the Responsible Parties Rule
              applies, and no further identification of the
              assigning agency is provided.
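
         As an illustration only, the Responsible Parties Rule can
         be sketched as a small Python function.  The function
         name, the result keys, and the institution codes AAA, BBB,
         and CCC are hypothetical, and the sketch does not model the
         agency-assigned and authoritative-agency exceptions of
         section 2.5.2.

```python
def responsible_parties(field_040):
    """Apply the Responsible Parties Rule to the subfields of field 040.

    field_040 is a list of (subfield_code, institution) pairs.
    Agency-assigned and authoritative-agency data are NOT handled here.
    """
    cataloging = [inst for code, inst in field_040 if code == "a"]
    transcribing = [inst for code, inst in field_040 if code == "c"]
    modifying = [inst for code, inst in field_040 if code == "d"]
    return {
        # A record is treated as modified when any $d is present.
        "modified": bool(modifying),
        # $a alone (unmodified) or $a plus $d collectively (modified).
        "data_content": cataloging + modifying,
        # $c alone (unmodified) or $c plus $d collectively (modified).
        "content_designation_and_transcription": transcribing + modifying,
    }


# Hypothetical institution codes, for illustration only.
unmodified = responsible_parties([("a", "AAA"), ("c", "BBB")])
modified = responsible_parties([("a", "AAA"), ("c", "BBB"), ("d", "CCC")])
```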
2.6.      The USMARC bibliographic format provides content
         designation only for data that are applicable to all
         copies of the bibliographic entity described.
2.6.1.         Information which applies only to some copies (or
              even to a single copy) of a title may be of
              interest beyond the institutions holding such
              copies. The USMARC formats provide limited content
              designation for the encoding of this information
              and for identifying the holding institution, e.g.,
              subfield $5 in the 700-740 added entry fields in
              the bibliographic format.
2.6.2.         Information that does not apply to all copies of a
              title, and is not of interest to other
              institutions, is coded in local fields. For
              instance, the 59X block is reserved for local notes
              in the bibliographic format (see section 6.7
              below).
2.7.      Although a USMARC record is usually autonomous, data
         elements are provided that contain information used to
         link related records.  These linkages may be implicit,
         through identical access points in each record, or
         explicit, through a linking entry field.  The 76X-78X
         linking entry fields in the bibliographic format may
         contain either selected data elements that identify the
         related item or a control number that identifies the
         related record.  In addition, an explicit code in the
         leader identifies a record that is linked to another
         record through a control number.

3.   Structural Features
3.1.      The USMARC formats are an implementation of
         Bibliographic Information Interchange (ANSI Z39.2).  The
         formats also incorporate other relevant ANSI standards,
         e.g., Magnetic Tape Labels and File Structure for
         Information Interchange (ANSI X3.27).
3.2.      All information in a USMARC record is stored in character
         form.  USMARC communications records are coded in
         Extended ASCII, as defined in the USMARC Specifications
         for Record Structure, Character Sets, Tapes.
3.3.      The length of each variable field can be determined
         either from the length-of-field portion of the directory
         entry or from the occurrence of the field terminator
         character [hex 1E, 8-bit].  The length of a record can be
         determined either from the logical record length element
         in Leader/00-04 or from the occurrence of the record
         terminator character [hex 1D, 8-bit].  The location of each
         variable field is explicitly stated in the starting
         character position element in its directory entry.
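
         The two ways of finding record boundaries described in 3.3
         can be sketched as follows.  This is a minimal
         illustration, not a full reader: the function names are
         hypothetical, and the sample "record" is an artificial one
         consisting of a blank leader, an empty directory, and the
         two terminators.

```python
FIELD_TERMINATOR = b"\x1e"   # hex 1E
RECORD_TERMINATOR = b"\x1d"  # hex 1D


def split_by_length(stream: bytes) -> list:
    """Split a stream of records using the record length in Leader/00-04."""
    records, pos = [], 0
    while pos < len(stream):
        length = int(stream[pos:pos + 5])  # five decimal digits
        if length < 5:
            raise ValueError("implausible record length at offset %d" % pos)
        records.append(stream[pos:pos + length])
        pos += length
    return records


def split_by_terminator(stream: bytes) -> list:
    """Split the same stream by looking for the record terminator instead."""
    chunks = stream.split(RECORD_TERMINATOR)
    return [chunk + RECORD_TERMINATOR for chunk in chunks if chunk]


# Artificial 26-character record: 24-character leader whose first five
# positions give the length, then a directory that is empty apart from
# its field terminator, then the record terminator.
record = b"00026" + b" " * 19 + FIELD_TERMINATOR + RECORD_TERMINATOR
stream = record * 2
```

         Both functions return the same two records for this
         stream; a reader could use one method to cross-check the
         other.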

4.   Content Designation
4.1.      The goal of content designation is to identify and
         characterize the data elements that comprise a USMARC
         record with sufficient precision to support manipulation
         of the data for a variety of functions.
4.2.      USMARC content designation is designed to support
         functions that include:
         a.   Display--the formatting of data for display on a
              CRT, for printing on 3x5 cards or in book catalogs,
              for production of COM catalogs, or for other visual
              presentation of the data.
         b.   Information retrieval--the identification,
              categorization, and retrieval of any identifiable
              data element in a record.
4.3.      Some fields serve multiple functions.  For example, field
         245 (Title Statement) serves both as the bibliographic
         transcription of the title and the statement of
         responsibility and as an access point for the title.
4.4.      The USMARC formats provide for display constants.  A
         display constant is a term, phrase, and/or spacing or
         punctuation convention that may be system generated under
         prescribed circumstances to make a visual presentation of
         data in a record more meaningful to a user.  Such display
         constants are not carried in the data, but may be
         supplied for display by the processing system.  For
         example, subfield $x in Series Statement field 490 (and
         in some other fields) implies the display constant
         "ISSN"; also, the combination of tag 780 (Preceding
         Entry) and second indicator value 3 implies the display
         constant "Supersedes in part:".
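
         A processing system might supply the two display constants
         named in 4.4 along the following lines.  This is a sketch
         under stated assumptions: the table and function names are
         hypothetical, only the two constants mentioned above are
         included, and the sample data are invented.

```python
# Display constants implied by a tag plus subfield code.
SUBFIELD_CONSTANTS = {("490", "x"): "ISSN"}
# Display constants implied by a tag plus second indicator value.
INDICATOR2_CONSTANTS = {("780", "3"): "Supersedes in part:"}


def display_field(tag, ind1, ind2, subfields):
    """Format a field for display, generating constants not carried in the data.

    subfields is a list of (code, value) pairs; ind1 is accepted for
    completeness but is not used by the two constants modeled here.
    """
    parts = []
    lead = INDICATOR2_CONSTANTS.get((tag, ind2))
    if lead:
        parts.append(lead)
    for code, value in subfields:
        constant = SUBFIELD_CONSTANTS.get((tag, code))
        parts.append(f"{constant} {value}" if constant else value)
    return " ".join(parts)


# Invented sample data, for illustration only.
series = display_field("490", "0", " ", [("a", "Sample series,"), ("x", "0000-0000")])
entry = display_field("780", "0", "3", [("t", "Sample journal")])
```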
4.5.      The USMARC formats support the sorting of data only to a
         limited extent.  In general, sorting must be accomplished
         through the application of external algorithms to the
         data.

5.   Organization of the Record
5.1.      A USMARC record consists of three main sections: the
         leader, the directory, and the variable fields.
5.2.      The leader consists of data elements that contain coded
         values and are identified by relative character position.
         Data elements in the leader define parameters for
         processing the record.  The leader is fixed in length (24
         characters) and occurs at the beginning of each USMARC
         record.
5.3.      The directory contains the tag, starting location, and
         length of each field within the record.  Directory
         entries for variable control fields appear first, in
         ascending tag order.  Entries for variable data fields
         follow, arranged in ascending order according to the
         first character of the tag.  The order of the fields in
         the record does not necessarily correspond to the order
         of directory entries.  Duplicate tags are distinguished
         only by the location of the respective fields within the
         record.  The length of the directory entry is defined in
         the entry map elements in Leader/20-23.  In the USMARC
         formats, the length of a directory entry is 12
         characters.  The directory ends with a field terminator
         character.
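
         The directory layout described in 5.3 can be read with a
         few lines of code.  The sketch below assumes the usual
         division of the 12-character entry into a 3-character tag,
         a 4-digit field length, and a 5-digit starting position;
         that division comes from the record structure
         specifications, not from this statement.  The function
         names and the hand-built sample record are hypothetical.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
LEADER_LENGTH = 24
ENTRY_LENGTH = 12  # assumed split: 3 (tag) + 4 (field length) + 5 (start)


def parse_directory(record: bytes):
    """Return ([(tag, length, start), ...], offset of the first field)."""
    # The directory ends with the first field terminator after the leader.
    end = record.index(FIELD_TERMINATOR, LEADER_LENGTH)
    directory = record[LEADER_LENGTH:end]
    if len(directory) % ENTRY_LENGTH != 0:
        raise ValueError("directory length is not a multiple of 12")
    entries = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        entries.append((entry[0:3].decode("ascii"), int(entry[3:7]), int(entry[7:12])))
    return entries, end + 1


def read_fields(record: bytes):
    """Return (tag, raw field bytes) pairs in directory order."""
    entries, base = parse_directory(record)
    return [(tag, record[base + start:base + start + length])
            for tag, length, start in entries]


# Hand-built sample: one control field and one data field (invented data).
_f001 = b"12345" + FIELD_TERMINATOR
_f245 = b"10\x1faTitle" + FIELD_TERMINATOR
_directory = b"001" + b"0006" + b"00000" + b"245" + b"0010" + b"00006"
_body = _directory + FIELD_TERMINATOR + _f001 + _f245 + RECORD_TERMINATOR
SAMPLE = b"%05d" % (LEADER_LENGTH + len(_body)) + b"nam  2200049   4500" + _body
```

         Because the tag is held only in the directory, read_fields
         must pair each directory entry with the bytes it points to;
         the field itself carries no tag.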
5.4.      The data content of a record is divided into variable
         fields.  The USMARC formats distinguish two types of
         variable fields: variable control fields and variable
         data fields.  Control and data fields are distinguished
         only by structure (see sections 7 and 8 below).  The term
         "fixed fields" is occasionally used in USMARC
         documentation, referring either to control fields
         generally or to specific coded-data fields, e.g., 007
         (Physical Description Fixed Field) or 008 (Fixed-Length
         Data Elements).

6.   Variable Fields and Tags
6.1.      The data in a USMARC record is organized into fields,
         each identified by a three-character tag.
6.2.      According to ANSI Z39.2, the tag must consist of
         alphabetic or numeric ASCII graphic characters, i.e.,
         decimal integers 0-9 or letters A-Z (uppercase or
         lowercase, but not both).  The MARC formats have used
         only numeric tags.
6.3.      The tag is stored in the directory entry for the field,
         not in the field itself.
6.4.      Variable fields are grouped into blocks according to the
         first character of the tag, which identifies the function
         of the data within a record, e.g., main entry, added
         entry, subject entry.  The type of information in the
         field, e.g., personal name, corporate name, or title, is
         identified by the remainder of the tag.
6.4.1.         Bibliographic format blocks:
                        0XX = Control information, numbers, and
                             codes
                        1XX = Main entry
                        2XX = Titles and title paragraph (title,
                             edition, imprint)
                        3XX = Physical description, etc.
                        4XX = Series statements
                        5XX = Notes
                        6XX = Subject access fields
                        7XX = Added entries other than subject
                             or series; linking fields
                        8XX = Series added entries, etc.
                        9XX = Reserved for local implementation
6.4.2.         Authority format blocks:
                        0XX = Control information, numbers, and
                             codes
                        1XX = Heading
                        2XX = Complex see references
                        3XX = Complex see also references
                        4XX = See from tracings
                        5XX = See also from tracings
                        6XX = Reference notes, treatment
                             decisions, notes, etc.
                        7XX = Not defined
                        8XX = Not defined
                        9XX = Reserved for local implementation
6.4.3.         Holdings format blocks:
                        0XX = Control information, numbers, and
                             codes
                        1XX = Not defined
                        2XX = Not defined
                        3XX = Not defined
                        4XX = Not defined
                        5XX = Notes
                        6XX = Not defined
                        7XX = Not defined
                        8XX = Holdings and location data, notes
                        9XX = Reserved for local implementation
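
         The block tables in 6.4.1 through 6.4.3 amount to a lookup
         keyed on the first character of the tag.  A minimal sketch
         follows; the names BLOCKS and block_function are
         hypothetical, and the labels are copied from the tables
         above.

```python
NOT_DEFINED = "Not defined"
LOCAL = "Reserved for local implementation"
CONTROL = "Control information, numbers, and codes"

# One list per format; the list index is the first digit of the tag.
BLOCKS = {
    "bibliographic": [
        CONTROL,
        "Main entry",
        "Titles and title paragraph (title, edition, imprint)",
        "Physical description, etc.",
        "Series statements",
        "Notes",
        "Subject access fields",
        "Added entries other than subject or series; linking fields",
        "Series added entries, etc.",
        LOCAL,
    ],
    "authority": [
        CONTROL,
        "Heading",
        "Complex see references",
        "Complex see also references",
        "See from tracings",
        "See also from tracings",
        "Reference notes, treatment decisions, notes, etc.",
        NOT_DEFINED,
        NOT_DEFINED,
        LOCAL,
    ],
    "holdings": [
        CONTROL,
        NOT_DEFINED, NOT_DEFINED, NOT_DEFINED, NOT_DEFINED,
        "Notes",
        NOT_DEFINED, NOT_DEFINED,
        "Holdings and location data, notes",
        LOCAL,
    ],
}


def block_function(record_format, tag):
    """Return the function of the block that a three-digit tag belongs to."""
    return BLOCKS[record_format][int(tag[0])]
```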
6.5.      Certain blocks in the USMARC bibliographic and authority
         formats contain data which may be subject to authority
         control (1XX, 4XX, 6XX, 7XX, 8XX for bibliographic
         records; 1XX, 4XX, 5XX for authority records).
6.5.1.         In these blocks, certain parallels of content
              designation are preserved.  The following meanings
              are generally given to the final two characters of
              the tag:
                        X00 = Personal names
                        X10 = Corporate names
                        X11 = Meeting names
                        X30 = Uniform titles
                        X40 = Bibliographic titles
                        X50 = Topical terms
                        X51 = Geographic names
              Further content designation (indicators and
              subfield codes) for data elements subject to
              authority control is defined consistently across
              the bibliographic and authority formats.  These
              guidelines apply only to the main range of fields
              in each block, not to secondary ranges, e.g., the
              linking entry fields 760-787 in the bibliographic
              format.
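
              The parallel described in 6.5.1 can be sketched as a
              lookup on the final two characters of the tag,
              limited to the blocks listed in 6.5.  The names below
              are hypothetical, and because the meanings are only
              "generally given", a result from this sketch is a
              hint rather than a definition of the field.

```python
HEADING_TYPES = {
    "00": "Personal names",
    "10": "Corporate names",
    "11": "Meeting names",
    "30": "Uniform titles",
    "40": "Bibliographic titles",
    "50": "Topical terms",
    "51": "Geographic names",
}

# First tag characters of the blocks subject to authority control.
CONTROLLED_BLOCKS = {"bibliographic": "14678", "authority": "145"}


def heading_type(record_format, tag):
    """Return the general heading type for a tag, or None if no parallel applies."""
    if tag[0] not in CONTROLLED_BLOCKS[record_format]:
        return None
    # Secondary ranges such as 760-787 have no entry in the table.
    return HEADING_TYPES.get(tag[1:])
```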
6.5.2.         Within fields subject to authority control, data
              elements may exist which are not subject to
              authority control and which may vary from record to
              record containing the same heading, e.g., subfield
              $e, Relator.
6.5.3.         In fields not subject to authority control, each
              tag is defined independently.  Parallel meanings
              have been preserved whenever possible.
6.6.      Principles have been established to assist in determining
         when a separate field should be defined for note data and
         when the data should be included in a general note field.
6.6.1.         In the USMARC bibliographic format, a specific 5XX
              note field is defined when at least one of the
              following is true:
              a.   Categorical indexing or retrieval is required
                   on the data defined for the note.  The note is
                   used for structured access purposes but does
                   not have the nature of a controlled access
                   point.
              b.   Special manipulation of that specific category
                   of data is a routine requirement.  Such
                   manipulation includes special print/display
                   formatting or selection/suppression from
                   display or printed product.
              c.   Specialized structuring of information for
                   reasons other than those given in (a) or (b),
                   e.g., to support particular standards of data
                   content when they cannot be supported in
                   existing fields.
6.6.2.         In the USMARC authority format, the specifications
              for notes are covered in the following two
              conditions:
              a.   A specific note field is needed when special
                   manipulation of that specific category of data
                   is a routine requirement.  Such manipulation
                   includes special print/display formatting or
                   selection/suppression from display or printed
                   product.
              b.   Multiple notes are generally not established
                   to accommodate the same type of information
                   for different types of authorities.  Notes are
                   thus not differentiated by or limited to
                   subject, name, or series if the same
                   information applies to more than one type.
6.7.      Certain tags have been reserved for local implementation.
         The USMARC formats specify no structure or meaning for
         local fields.  Communication of local fields between
         systems is governed by mutual agreements on the content
         and content designation of the fields communicated.
6.7.1.         The 9XX block is reserved for local implementation.
6.7.2.         In general, any tag containing the character 9 is
              reserved for local implementation within the block
              structure (see section 6.4 above).
6.7.3.         The historical development of the USMARC formats
              has left one exception to this general principle:
              field 490 (Series Statement) in the bibliographic
              format.  There are several obsolete fields with
              tags containing the character 9.
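
              As a rough illustration of 6.7.1 through 6.7.3, a
              system could test whether a bibliographic tag is
              reserved for local implementation as shown below.
              The function name is hypothetical.  Obsolete fields
              whose tags contain the character 9 are not listed in
              this statement and are therefore not modeled.

```python
# The one current exception named in 6.7.3 (bibliographic format).
HISTORICAL_EXCEPTIONS = {"490"}  # Series Statement


def is_local_tag(tag):
    """Return True if a bibliographic tag is reserved for local implementation."""
    return "9" in tag and tag not in HISTORICAL_EXCEPTIONS
```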
6.8.      Theoretically, all fields, except field 001 (Control
         Number) and field 005 (Date and Time of Latest
         Transaction), may be repeated.  The nature of the data,
         however, often precludes repetition.  For example, a
         bibliographic record may contain only one field 245
         (Title Statement) and an authority record may contain
         only one 1XX heading field.  The
         repeatability/nonrepeatability of each field is defined
         in the USMARC formats.
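
         The repeatability examples in 6.8 can be checked
         mechanically.  The sketch below covers only the cases
         named above (fields 001 and 005, field 245 in a
         bibliographic record, and the 1XX heading in an authority
         record); the function name is hypothetical, and a real
         validator would take repeatability from the format
         documentation for every field.

```python
from collections import Counter


def repeat_violations(tags, record_format):
    """Return the tags (or tag groups) that occur more often than 6.8 allows."""
    counts = Counter(tags)
    violations = [tag for tag in ("001", "005") if counts[tag] > 1]
    if record_format == "bibliographic" and counts["245"] > 1:
        violations.append("245")
    if record_format == "authority":
        headings = sum(n for tag, n in counts.items() if tag.startswith("1"))
        if headings > 1:
            violations.append("1XX")
    return violations
```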

7.   Variable Control Fields
7.1.      The 00X fields in the USMARC formats are variable control
         fields.
7.2.      Variable control fields consist of data and a field
         terminator.  They contain neither indicators nor
         subfield codes (see sections 8.3 and 8.4 below).
7.3.      Variable control fields contain either a single data
         element or a series of fixed-length data elements
         identified by relative character position.
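
         References such as 008/39 or Leader/00-04 name a data
         element by relative character position.  The following
         sketch, with hypothetical function names and an invented
         sample leader, turns such a reference into a slice of the
         field data.  Positions are counted from zero, and a range
         includes both ends.

```python
def parse_position(ref):
    """Split a reference like '008/39' or 'Leader/00-04' into (name, first, last)."""
    name, _, span = ref.partition("/")
    first, _, last = span.partition("-")
    start = int(first)
    end = int(last) if last else start
    return name, start, end


def extract(ref, data):
    """Return the characters of data that the reference points to."""
    _, start, end = parse_position(ref)
    return data[start:end + 1]


# Invented 24-character leader, for illustration only.
sample_leader = "00066nam  2200049   4500"
```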

8.   Variable Data Fields
8.1.      All fields except 00X are variable data fields.
8.2.      Four levels of content designation are provided for
         variable data fields in ANSI Z39.2:
         a.   a three-character tag, stored in the directory
              entry;
         b.   indicators stored at the beginning of each variable
              data field, the number of indicators being
              reflected in Leader/10 (Indicator count);
         c.   subfield codes preceding each data element, the
              length of the code being reflected in Leader/11
              (Subfield code count); and
         d.   a field terminator following the last data element
              in the field.
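
         Levels (b) through (d) can be sketched as a parser for one
         variable data field.  The function name and the sample
         data are hypothetical; the defaults of two indicators and
         a two-character subfield code follow 8.3.2 and 8.4.2.

```python
DELIMITER = "\x1f"         # hex 1F, first character of every subfield code
FIELD_TERMINATOR = "\x1e"  # hex 1E


def parse_data_field(raw, indicator_count=2, code_length=2):
    """Return (indicators, [(identifier, data), ...]) for one variable data field."""
    if not raw.endswith(FIELD_TERMINATOR):
        raise ValueError("field does not end with a field terminator")
    body = raw[:-1]
    indicators = body[:indicator_count]
    identifier_length = code_length - 1  # the code minus its delimiter
    subfields = []
    # Text before the first delimiter is empty in a well-formed field.
    for chunk in body[indicator_count:].split(DELIMITER)[1:]:
        subfields.append((chunk[:identifier_length], chunk[identifier_length:]))
    return indicators, subfields


# Invented field content, for illustration only.
indicators, subfields = parse_data_field("10\x1faTitle :\x1fbsubtitle\x1e")
```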
8.3.      Indicators
8.3.1.         Indicators contain values conveying information
              that interprets or supplements the data found in
              the field.
8.3.2.         The USMARC formats specify two indicator positions
              at the beginning of each variable data field.
8.3.3.         Indicators are defined independently for each
              field.  Parallel meanings are preserved whenever
              possible.
8.3.4.         Indicator values are interpreted independently;
              meaning is not ascribed to the two indicators taken
              together.
8.3.5.         Indicators may be any lowercase alphabetic or
              numeric character or a blank (#).  Numeric values
              are defined first.  A blank (#) is used in an
              undefined indicator position or to mean "information
              not provided" in a defined indicator position.
8.3.6.         The value 9 is reserved for local implementation.
8.4.      Subfield Codes
8.4.1.         Subfield codes identify data elements within a
              field that require (or might require) separate
              manipulation.
8.4.2.         Subfield codes in the USMARC formats consist of two
              characters: a delimiter [hex 1F, 8-bit] followed by
              a data element identifier.  A data element
              identifier may be any lowercase alphabetic or
              numeric character.
8.4.2.1.       Numeric identifiers are defined for parametric data
              used to process the field, or coded data needed to
              interpret the field.  (Note that not all numeric
              identifiers defined in the past have followed this
              specification.)
8.4.2.2.       Alphabetic identifiers are defined for the separate
              elements that constitute the data content of the
              field.
8.4.2.3.       The character 9 and the following graphic symbols
              are reserved for local definition as data element
              identifiers:  ! " # $ % & ' ( ) * + , - . / : ; < = > ?
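
              The three classes of data element identifier in
              8.4.2.1 through 8.4.2.3 can be sketched as a small
              classifier.  The names are hypothetical.  The set of
              reserved graphic symbols is assumed to be the ASCII
              punctuation from hex 21 through hex 2F and from hex
              3A through hex 3F, which is one reading of the list
              above.

```python
# Assumed reading of the reserved symbols: ASCII hex 21-2F and hex 3A-3F.
LOCAL_SYMBOLS = ({chr(c) for c in range(0x21, 0x30)}
                 | {chr(c) for c in range(0x3A, 0x40)})


def classify_identifier(ch):
    """Classify a one-character data element identifier."""
    if len(ch) != 1:
        raise ValueError("identifier must be a single character")
    if ch == "9" or ch in LOCAL_SYMBOLS:
        return "local"
    if ch in "012345678":
        return "parametric or coded"
    if ch in "abcdefghijklmnopqrstuvwxyz":
        return "data content"
    raise ValueError("not a valid data element identifier: %r" % ch)
```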
8.4.3.         Subfield codes are defined independently for each
              field.  Parallel meanings are preserved whenever
              possible.
8.4.4.         Subfield codes are defined for purposes of
              identification, not arrangement.  The order of
              subfields is specified by content standards, e.g.,
              cataloging rules.  In some cases, however, such
              specifications may be incorporated in the USMARC
              format documentation.
8.4.5.         Theoretically, all data elements may be repeated.
              The nature of the data, however, often precludes
              repetition. The repeatability/nonrepeatability of
              each subfield code is defined in the USMARC
              formats.

9.   Coded Data
9.1.      In addition to content designation, the USMARC formats
         include specifications for the content of certain data
         elements, particularly those that provide for the
         representation of data by coded values.
9.2.      Coded values consist of fixed-length character strings.
         Individual elements within a coded-data field or subfield
         are identified by relative character position.
9.3.      Although coded data occur most frequently in the leader,
         directory, and variable control fields, any field or
         subfield may be defined for coded-data elements.
9.4.      Certain common values have been defined whenever
         applicable:
              #    Undefined (element not defined)
              n    Not applicable (element is not applicable to
                   the item)
              u    Unknown (record creator was unable to
                   determine value)
              z    Other (value other than those defined for the
                   element)
              |    Fill character (record creator has chosen not
                   to provide information)
         Historical exceptions do occur in the formats.  In
         particular, the blank (#) often has been defined as not
         applicable or has been assigned a specific meaning.
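
         The common values in 9.4 can be held in a simple table.
         In this sketch the names are hypothetical, and the blank
         is stored as an actual space, since the # in the list
         above is only the documentation symbol for it.  Because of
         the historical exceptions just noted, the table gives the
         general meaning only; the definition of each element in
         the formats takes precedence.

```python
COMMON_VALUES = {
    " ": "Undefined (element not defined)",
    "n": "Not applicable (element is not applicable to the item)",
    "u": "Unknown (record creator was unable to determine value)",
    "z": "Other (value other than those defined for the element)",
    "|": "Fill character (record creator has chosen not to provide information)",
}


def describe_common_value(value):
    """Return the general meaning of a common coded value, or None if it has none."""
    return COMMON_VALUES.get(value)
```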


STANDARDS AND OTHER DOCUMENTS RELATED TO USMARC FORMATS

National and international standards:
         These publications are available from the American
National Standards Institute, Inc., 1430 Broadway, New York, NY
10018.

Bibliographic Information Interchange (ANSI Z39.2-1985)
Format for Bibliographic Information Interchange on Magnetic Tape
(ISO 2709-1981)
Magnetic Tape Labels and File Structure for Information Interchange
(ANSI X3.27-1987)


USMARC standards:
         These publications are available from the Library of
Congress, Cataloging Distribution Service, Washington, DC 20541.

USMARC Concise Formats
USMARC Format for Authority Data
USMARC Format for Bibliographic Data
USMARC Format for Holdings Data
USMARC Specifications for Record Structure, Character Sets, Tapes
USMARC Code List for Languages
USMARC Code List for Countries
USMARC Code List for Geographic Areas
USMARC Code List for Relators, Sources, Descriptive Conventions