This document is an unofficial English translation of the original
Japanese specification made by someone who has no knowledge of Japanese.
Implement at your own risk.

Overview

This document describes Hina-Di, the metadata format used by Asahina
Antenna. In this document, “metadata” means data about a web page, such
as its last update time or its author. Asahina Antenna acts as a feed
reader for Hina-Di.

Conventions used in this document

This document uses the augmented Backus-Naur Form (BNF) notation of
RFC 822 to formally specify the format.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and
“OPTIONAL” in this document are to be interpreted as described in BCP 14
(RFC 2119, RFC 8174) when, and only when, they appear in all capitals,
as shown here.

Data Types

The basic data types that constitute Hina-Di are listed below. The
US-ASCII character set is defined by ANSI X3.4-1986.

   OCTET     = <any 8-bit sequence of data>
   CHAR      = <any US-ASCII character (octets 0 - 127)>
   UPALPHA   = <any US-ASCII uppercase letter "A".."Z">
   LOALPHA   = <any US-ASCII lowercase letter "a".."z">
   ALPHA     = UPALPHA | LOALPHA
   DIGIT     = <any US-ASCII digit "0".."9">
   WORD      = 1*(ALPHA|DIGIT)

   CTL       = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
   CR        = <US-ASCII CR, carriage return (13)>
   LF        = <US-ASCII LF, linefeed (10)>
   SP        = <US-ASCII SP, space (32)>
   HT        = <US-ASCII HT, horizontal-tab (9)>
   <">       = <US-ASCII double-quote mark (34)>

   CRLF      = CR LF

   TEXT      = <any OCTET except CTLs, but including HT>
   TOKEN     = <any TEXT that does not start with SP or HT>

   SEPARATOR = ":" 1*(SP|HT)
   DELIMITER = "," *(SP|HT)
   SLASH     = "/" *(SP|HT)

Structure

A Hina-Di file consists of a series of blocks that summarize the
metadata on a website: a header block, followed by one or more entity
blocks.

   hina-di = header-block
             1*( entity-block )

Block

A block is a set of metadata for a document. Each piece of metadata is
represented as a single header, in a manner similar to RFC 822, with a
field name and a field value.

Field names in a block MUST be unique. A block with duplicate field
names MUST be discarded.

Field names are case-insensitive. Unless explicitly stated for a
particular field, a field’s value is case-insensitive.

   line-format = field-name SEPARATOR field-value CRLF
   field-name  = WORD *( "-" WORD)
   field-value = TOKEN
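
The line format and the duplicate-field rule above can be sketched in
Python as follows. This is a minimal illustration, not part of the
specification: the names LINE_RE, parse_field and parse_block are
hypothetical, the input is assumed to be already decoded text with the
CRLF removed, and a field value is assumed to contain at least one
character.

```python
import re

# field-name  = WORD *( "-" WORD)
# SEPARATOR   = ":" 1*(SP|HT)
# field-value = TOKEN (TEXT that does not start with SP or HT)
# Assumption: the value holds at least one character.
LINE_RE = re.compile(
    r"(?P<name>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)"
    r":[ \t]+"
    r"(?P<value>[^\x00-\x20\x7f][^\x00-\x08\x0a-\x1f\x7f]*)"
)


def parse_field(line):
    """Parse one field line (without its CRLF) into (name, value).

    The name is lowercased because field names are case-insensitive.
    """
    match = LINE_RE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a valid field line: {line!r}")
    return match.group("name").lower(), match.group("value")


def parse_block(lines):
    """Parse the field lines of one block into a dict.

    Returns None when a field name appears twice, since a block with
    duplicate field names is discarded.
    """
    fields = {}
    for line in lines:
        name, value = parse_field(line)
        if name in fields:
            return None
        fields[name] = value
    return fields
```

For instance, parse_block(["Title: a", "TITLE: b"]) returns None because
the two names differ only in case.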

Header block

Exactly one header block MUST appear in a Hina-Di file, and it MUST be
the first block. It holds metadata about the Hina-Di file itself.

   header-block  = HINA
                   Hinadi-Header
                   CRLF
   Hinadi-Header = 1*( User-Agent
                       | Content-Type
                       | Date )

Entity block

One or more entity blocks MUST be present after the header block. Each
entity block defines metadata about a specific document.

   entity-block = URL *( HINA-Version
                      | Virtual
                      | Content-Type
                      | Date
                      | Title
                      | Author-Name
                      | Expires
                      | Expire
                      | Last-Modified
                      | Last-Modified-Detected
                      | Server
                      | Authorized
                      | Authorized-url
                      | Method
                      | Keyword
                      | Image-Width
                      | Image-Height
                      | Experimental-field
                      | Undefined-field )
                      CRLF
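
As an illustration of the overall structure, the following Python
sketch splits a decoded Hina-Di file into its header block and entity
blocks, using the empty line (a lone CRLF) that ends each block. The
function names and the sample file are hypothetical; in particular, the
agent name "example-agent/0.1" and the page title are made up for the
example.

```python
def split_blocks(text):
    """Split decoded Hina-Di text into blocks, each a list of lines."""
    blocks, current = [], []
    for line in text.split("\r\n"):
        if line:
            current.append(line)
        elif current:
            # An empty line closes the block collected so far.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def split_file(text):
    """Return (header_block, entity_blocks) or raise ValueError."""
    blocks = split_blocks(text)
    if len(blocks) < 2:
        raise ValueError("expected a header block and at least one entity block")
    header, entities = blocks[0], blocks[1:]
    # The HINA field must be the first field of the file.
    if not header[0].upper().startswith("HINA/"):
        raise ValueError("header block must start with the HINA field")
    return header, entities


# Hypothetical sample file, for illustration only.
SAMPLE = (
    "HINA/2.2beta\r\n"
    "User-Agent: example-agent/0.1\r\n"
    "\r\n"
    "URL: http://www.hoge.jp/foo/\r\n"
    "Title: Example page\r\n"
    "\r\n"
)
```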

Fields

This section defines the various fields that may be found in blocks. All
fields are OPTIONAL and case-insensitive unless otherwise specified.

HINA

Indicates that this is a Hina-Di file, and includes its version. This
field is REQUIRED as the first field of Hina-Di files.

   HINA           = "HINA" "/" hinadi-version CRLF
   hinadi-version = "2.2beta"

User-Agent

Name of the user agent that created this Hina-Di file. This field is
REQUIRED in header blocks. The value of this field is case-sensitive.

   User-Agent = "User-Agent" SEPARATOR TOKEN CRLF

URL

URL of the document, compliant with RFC 2396.

This field is REQUIRED in entity blocks.
Making this field the first field of an entity block is RECOMMENDED.

The scheme and domain portions of the URL are not case-sensitive. If the
other portions of the URL are also case-insensitive, they SHOULD be
written using lowercase characters.

   URL         = "URL" SEPARATOR rfc2396-url CRLF
   rfc2396-url = <URI described in section 5.1.2 "Request-URI" in RFC 2396>

Implementations can use this field as a unique key that distinguishes
the entity block from other blocks. To ensure proper uniqueness of this
field, the following conditions MUST be respected by the providing
Hina-Di user agents or their administrators:

-   If the URL can end in a slash (/), then it SHOULD end in a slash.
   Prefer http://www.hoge.jp/foo/ over http://www.hoge.jp/foo
-   If the URL includes a file name, but the file name can be omitted,
   then it SHOULD be omitted.
   Prefer http://www.hoge.jp/foo/ over
   http://www.hoge.jp/foo/index.html
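
A provider could approximate these conventions with a helper such as the
one below. This is only a sketch under stated assumptions: the function
name normalize_url and the INDEX_NAMES list are hypothetical, the URL is
assumed to carry no user information before the host, and the helper
cannot know whether a path such as /foo may end in a slash, so it leaves
such paths untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: file names that the server is known to serve for a directory.
INDEX_NAMES = ("index.html", "index.htm")


def normalize_url(url, index_names=INDEX_NAMES):
    """Lowercase the scheme and host and drop an omissible file name."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # An empty path is equivalent to the root path.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail in index_names:
        path = head + "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

With these assumptions, normalize_url("http://www.hoge.jp/foo/index.html")
yields http://www.hoge.jp/foo/.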

HINA-Version

Specifies that the integrity of the entity block was guaranteed
according to the specification of a specific Hina-Di version. If this
field is missing from an entity block, it means the block might be
incomplete.

   HINA-Version = "HINA-Version" SEPARATOR version CRLF
   version      = "HINA" "/" 1*( DIGIT ) "." 1*( DIGIT )

Virtual

URL of another Hina-Di file that holds the entity block, compliant with
RFC 2396.

If the entity block contains fields other than Virtual, the Virtual
field has the same meaning as the regular URL field.

The case-sensitivity and URL uniqueness conditions defined for the URL
field MUST be followed for this field.

   Virtual     = "Virtual" SEPARATOR rfc2396-url CRLF
   rfc2396-url = <URI described in section 5.1.2 "Request-URI" in RFC 2396>

 Note that the original version of the document defines the Virtual
 field as Vitural.

Content-Type

MIME type of the Hina-Di file or the document, as described in RFC 1521.
The value of this field is case-sensitive to the extent defined by RFC
1521.

   Content-Type     = "Content-Type" SEPARATOR rfc1521-type CRLF
   rfc1521-type     = type "/" subtype *(";" parameter)
   type             = "application"
                    | "audio"
                    | "image"
                    | "message"
                    | "multipart"
                    | "text"
                    | "video"
                    | extension-token
   extension-token  = x-token / iana-token
   iana-token       = <a publicly-defined extension token,
                       registered with IANA, as specified in
                       appendix E of RFC1521>
   x-token          = <The two characters "X-" or "x-" followed, with
                       no intervening white space, by any token>
   subtype          = TOKEN
   parameter        = attribute "=" value
   attribute        = TOKEN   ; case-insensitive

   value            = token / quoted-string

   token            =  1*<any (ASCII) CHAR except SPACE, CTLs or tspecials>

   tspecials        =  "(" / ")" / "<" / ">" / "@"
                    /  "," / ";" / ":" / "\" / <">
                    /  "/" / "[" / "]" / "?" / "="
                    ; Must be in quoted-string to use within parameter values

Date

The date and time when the block or the Hina-Di file was generated. The
dates MUST comply with RFC 1123. The value of this field is
case-sensitive.

   Date            = "Date" SEPARATOR rfc1123-date CRLF
   rfc1123-date    = wkday "," SP day-month-year SP time SP "GMT"
   wkday           = "Mon" | "Tue" | "Wed" | "Thu" | "Fri" | "Sat" | "Sun"
   day-month-year  = 2DIGIT SP month SP 4DIGIT        ; day month year (e.g. 02 Jun 1982)
   time            = 2DIGIT ":" 2DIGIT ":" 2DIGIT     ; 00:00:00 - 23:59:59
   month           = "Jan" | "Feb" | "Mar" | "Apr"
                   | "May" | "Jun" | "Jul" | "Aug"
                   | "Sep" | "Oct" | "Nov" | "Dec"
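
The date grammar above can be checked with a sketch like the following.
The names parse_date and format_date are hypothetical. The sketch
assumes a two-digit day, a four-digit year and two-digit time parts, as
in the examples given in the grammar comments, and it additionally
rejects a weekday that does not match the date, which the grammar itself
does not require.

```python
import re
from datetime import datetime, timezone

WKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Case-sensitive on purpose: the value of a date field is case-sensitive.
DATE_RE = re.compile(
    r"(?P<wkday>" + "|".join(WKDAYS) + r"), "
    r"(?P<day>[0-9]{2}) (?P<month>" + "|".join(MONTHS) + r") (?P<year>[0-9]{4}) "
    r"(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):(?P<second>[0-9]{2}) GMT"
)


def parse_date(value):
    """Parse an rfc1123-date value into a timezone-aware UTC datetime."""
    match = DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an rfc1123-date: {value!r}")
    # datetime() itself rejects impossible values such as day 31 in February.
    parsed = datetime(
        int(match["year"]), MONTHS.index(match["month"]) + 1, int(match["day"]),
        int(match["hour"]), int(match["minute"]), int(match["second"]),
        tzinfo=timezone.utc,
    )
    # Extra strictness (assumption): the weekday must agree with the date.
    if WKDAYS[parsed.weekday()] != match["wkday"]:
        raise ValueError("weekday does not match the date")
    return parsed


def format_date(moment):
    """Format a timezone-aware datetime as an rfc1123-date in GMT."""
    moment = moment.astimezone(timezone.utc)
    return (
        f"{WKDAYS[moment.weekday()]}, {moment.day:02d} "
        f"{MONTHS[moment.month - 1]} {moment.year:04d} "
        f"{moment.hour:02d}:{moment.minute:02d}:{moment.second:02d} GMT"
    )
```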

Title

The title of the document.

   Title = "Title" SEPARATOR TOKEN CRLF

Author-Name

Name of the author of the document. The value of this field is
case-sensitive.

   Author-Name = "Author-Name" SEPARATOR TOKEN CRLF

Expires

Expiration date for the block. The dates MUST comply with RFC 1123. The
value of this field is case-sensitive to the extent defined by RFC 1123.

   Expires = "Expires" SEPARATOR rfc1123-date CRLF

Expire

Alias for the Expires field, included for backwards compatibility.

   Expire = "Expire" SEPARATOR rfc1123-date CRLF

Last-Modified

Date and time when the document was last updated. The dates MUST comply
with RFC 1123. The value of this field is case-sensitive to the extent
defined by RFC 1123.

   Last-Modified = "Last-Modified" SEPARATOR rfc1123-date CRLF

Last-Modified-Detected

Date and time representing when the user agent retrieved the document’s
metadata. The dates MUST comply with RFC 1123. The value of this field
is case-sensitive to the extent defined by RFC 1123.

   Last-Modified-Detected = "Last-Modified-Detected" SEPARATOR rfc1123-date CRLF

Server

User agent string of the server used to retrieve the metadata of the
document described by this entity block.

   Server = "Server" SEPARATOR TOKEN CRLF

Authorized

The user agent that retrieved the metadata of the document described by
this entity block.

   Authorized = "Authorized" SEPARATOR TOKEN CRLF

Authorized-url

URL of a page describing the user agent referred to in the Authorized
field, compliant with RFC 2396.

The case-sensitivity and URL uniqueness conditions defined for the URL
field MUST be followed for this field.

   Authorized-url = "Authorized-url" SEPARATOR rfc2396-url CRLF

Method

Describes the chain of propagation that this entity block went through.

   Method      = "Method" SEPARATOR method-type *(SLASH method-type) (SLASH result-code) CRLF
   method-type = "GET" | "HEAD" | "FILE" | "REMOTE"
   result-code = <URI described on "???????" in RFC 2396>

Method types

GET
   Metadata retrieved using an HTTP GET request.

HEAD
   Metadata retrieved using an HTTP HEAD request.

FILE
   Metadata retrieved from a local file’s timestamp.

REMOTE
   Metadata retrieved from an entity block generated by another agent.

Example

   Method: REMOTE/REMOTE/GET/200

1.  A first user agent retrieved the metadata on the document using an
   HTTP GET request and got a 200 response code (GET/200).
2.  A second user agent retrieved the first user agent’s Hina-Di file,
   then propagated it to its own file (REMOTE).
3.  A third user agent retrieved the second user agent’s Hina-Di file,
   then propagated it to its own file (REMOTE).
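
The chain in this example can be read and extended with a sketch such as
the one below. The names parse_method and propagate_method are
hypothetical. The result code is treated as an opaque string because its
format is not spelled out here, and prepending REMOTE on each
propagation is an assumption drawn from the example above.

```python
METHOD_TYPES = ("GET", "HEAD", "FILE", "REMOTE")


def parse_method(value):
    """Split a Method value into (list of method types, result code)."""
    # SLASH = "/" *(SP|HT): whitespace may follow each slash.
    parts = [part.lstrip(" \t") for part in value.split("/")]
    if len(parts) < 2:
        raise ValueError("expected at least one method type and a result code")
    *methods, result = parts
    # Field values are case-insensitive unless stated otherwise.
    methods = [method.upper() for method in methods]
    for method in methods:
        if method not in METHOD_TYPES:
            raise ValueError(f"unknown method type: {method!r}")
    return methods, result


def propagate_method(value):
    """Return the Method value to publish after one more propagation."""
    parse_method(value)  # validate before extending the chain
    return "REMOTE/" + value
```

Applied twice to GET/200, propagate_method reproduces the value shown in
the example, REMOTE/REMOTE/GET/200.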

Keyword

Words that can be used to give an overview of the document described by
this entity block; tags, categories, etc. The value of this field is
case-sensitive.

   Keyword  = "Keyword" SEPARATOR keywords CRLF
   keywords = TOKEN *(DELIMITER TOKEN)

Image-Width

Width of an image described by an entity block, in pixels.

This field MUST NOT be used for entity blocks that do not describe
images.

   Image-Width = "Image-Width" SEPARATOR width CRLF
   width       = 1*DIGIT

Image-Height

Height of an image described by an entity block, in pixels.

This field MUST NOT be used for entity blocks that do not describe
images.

   Image-Height = "Image-Height" SEPARATOR height CRLF
   height       = 1*DIGIT

Experimental fields

Implementations MAY define custom fields with an X- prefix to provide
additional metadata not covered in this specification. Implementations
MUST NOT assume that every client will use these fields. Clients that
do not support a given experimental field SHOULD ignore it.

Experimental fields MAY include data that is not directly related to
the document’s metadata, and SHOULD be used whenever an implementor
creates a field for that purpose.

   Experimental-field = x-field-name SEPARATOR TOKEN CRLF
   x-field-name       = "X-" WORD *("-" WORD)

Undefined fields

Any field that is not defined in this specification. Implementations
that encounter such fields and do not support them SHOULD ignore them.

   Undefined-field  = undef-field-name SEPARATOR TOKEN CRLF
   undef-field-name = WORD *("-" WORD)
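
A reader has to decide which of the three kinds a field name belongs to.
The sketch below shows one way to do it. The names DEFINED_FIELDS and
classify_field are hypothetical, and the list contains only the fields
that use the "name: value" line format in this document.

```python
# Fields defined in this document, compared in lowercase because field
# names are case-insensitive.
DEFINED_FIELDS = frozenset(
    name.lower()
    for name in (
        "User-Agent", "URL", "HINA-Version", "Virtual", "Content-Type",
        "Date", "Title", "Author-Name", "Expires", "Expire",
        "Last-Modified", "Last-Modified-Detected", "Server", "Authorized",
        "Authorized-url", "Method", "Keyword", "Image-Width", "Image-Height",
    )
)


def classify_field(name):
    """Return "defined", "experimental" or "undefined" for a field name."""
    lowered = name.lower()
    if lowered in DEFINED_FIELDS:
        return "defined"
    if lowered.startswith("x-"):
        return "experimental"
    return "undefined"
```

A client that supports neither experimental nor undefined fields can
then skip every field whose class is not "defined".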

Encoding

The character encoding of the Hina-Di file SHOULD be specified as a
parameter of the Content-Type field of the header block. If it is not
specified, it defaults to EUC-JP.
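
The fallback described above can be sketched as follows. The name
decode_hina_di is hypothetical, and the sketch assumes that the encoding
is carried in a parameter named "charset", following common MIME
practice; this section does not name the parameter.

```python
import re

DEFAULT_CHARSET = "euc_jp"  # Python codec name for EUC-JP

# Assumption: the encoding is given as a "charset" parameter of the
# Content-Type field in the header block.
CHARSET_RE = re.compile(
    rb'^content-type:[ \t]+[^\r\n]*?;[ \t]*charset="?([A-Za-z0-9._-]+)"?',
    re.IGNORECASE | re.MULTILINE,
)


def decode_hina_di(data):
    """Decode the raw bytes of a Hina-Di file into text."""
    # Only the header block (everything before the first empty line)
    # is searched for the encoding.
    header, _, _ = data.partition(b"\r\n\r\n")
    match = CHARSET_RE.search(header)
    charset = match.group(1).decode("ascii") if match else DEFAULT_CHARSET
    # An unknown codec name raises LookupError; bad bytes raise
    # UnicodeDecodeError.
    return data.decode(charset)
```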

Propagation

In Hina-Di, metadata propagation consists of acquiring metadata from
other agents, then republishing it unchanged in the user agent’s own
Hina-Di file. This can be used for aggregation services or a
peer-to-peer network.

The Authorized and Authorized-url fields identify the user agent from
which the metadata originally came, to help ensure its legitimacy.
Propagation MUST be performed only if both fields are defined and the
user agent is trusted.

When propagating, all fields of an entity block defined in this
specification, with the exception of experimental and undefined fields
or of fields with empty values, MUST be reproduced without modification.
Propagating experimental or undefined fields is not guaranteed. A header
block, or any field that is part of it, MUST NOT be propagated.

The Method field MUST be handled as described in the Method section.
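
The propagation rules can be summarized in a sketch like the one below.
Several points are assumptions rather than requirements of this
document: the names propagate_entity and PROPAGATED_FIELDS are
hypothetical, trust is modelled as a simple set of agent names,
experimental and undefined fields are dropped (their propagation is not
guaranteed), and the Method field is extended by prepending REMOTE.

```python
# Entity-block fields defined in this document, in lowercase.
PROPAGATED_FIELDS = frozenset((
    "url", "hina-version", "virtual", "content-type", "date", "title",
    "author-name", "expires", "expire", "last-modified",
    "last-modified-detected", "server", "authorized", "authorized-url",
    "method", "keyword", "image-width", "image-height",
))


def propagate_entity(fields, trusted_agents):
    """Return the fields to republish, or None if propagation is refused.

    fields maps lowercase field names to values; trusted_agents is a set
    of Authorized values that this agent trusts (assumed trust model).
    """
    agent = fields.get("authorized")
    agent_url = fields.get("authorized-url")
    # Both fields must be defined and the agent must be trusted.
    if not agent or not agent_url or agent not in trusted_agents:
        return None
    # Keep defined fields with non-empty values, unmodified.
    result = {
        name: value
        for name, value in fields.items()
        if name in PROPAGATED_FIELDS and value
    }
    # Assumption: one more propagation step prepends REMOTE to the chain.
    if "method" in result:
        result["method"] = "REMOTE/" + result["method"]
    return result
```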

Appendix: Related terms

Asahina-Antenna
   Metadata acquisition agent based on Hina-Di.

metadata
   Information about the content, such as the author, title and update
   time.

hina-di
   Metadata transfer format used by Asahina-Antenna 2.x.

hina.txt
   Metadata transfer format used by Asahina-Antenna 1.x,
   made obsolete by hina-di.

DI
   Document Information. Metadata transfer format used by DIXS.
   Hina-Di has been influenced by DI.