\"      Id: mandoc_html.3,v 1.23 2020/04/24 13:13:06 schwarze Exp
\"
\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <[email protected]>
\"
\" Permission to use, copy, modify, and distribute this software for any
\" purpose with or without fee is hereby granted, provided that the above
\" copyright notice and this permission notice appear in all copies.
\"
\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
\"
Dd April 24, 2020
Dt MANDOC_HTML 3
Os
Sh NAME
Nm mandoc_html
Nd internals of the mandoc HTML formatter
Sh SYNOPSIS
In sys/types.h
Fd #include """mandoc.h"""
Fd #include """roff.h"""
Fd #include """out.h"""
Fd #include """html.h"""
Ft void
Fn print_gen_decls "struct html *h"
Ft void
Fn print_gen_comment "struct html *h" "struct roff_node *n"
Ft void
Fn print_gen_head "struct html *h"
Ft struct tag *
Fo print_otag
Fa "struct html *h"
Fa "enum htmltag tag"
Fa "const char *fmt"
Fa ...
Fc
Ft void
Fo print_tagq
Fa "struct html *h"
Fa "const struct tag *until"
Fc
Ft void
Fo print_stagq
Fa "struct html *h"
Fa "const struct tag *suntil"
Fc
Ft void
Fn html_close_paragraph "struct html *h"
Ft enum roff_tok
Fo html_fillmode
Fa "struct html *h"
Fa "enum roff_tok tok"
Fc
Ft int
Fo html_setfont
Fa "struct html *h"
Fa "enum mandoc_esc font"
Fc
Ft void
Fo print_text
Fa "struct html *h"
Fa "const char *word"
Fc
Ft void
Fo print_tagged_text
Fa "struct html *h"
Fa "const char *word"
Fa "struct roff_node *n"
Fc
Ft char *
Fo html_make_id
Fa "const struct roff_node *n"
Fa "int unique"
Fc
Ft struct tag *
Fo print_otag_id
Fa "struct html *h"
Fa "enum htmltag tag"
Fa "const char *cattr"
Fa "struct roff_node *n"
Fc
Ft void
Fn print_endline "struct html *h"
Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
Xr mandoc 1
and
Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help to use it correctly and
to prevent XSS vulnerabilities.
Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
Po
Pa *_html.c ,
see
Sx FILES
Pc
are not supposed to print directly to
Dv stdout
using functions like
Xr printf 3 ,
Xr putc 3 ,
Xr puts 3 ,
or
Xr write 2 .
Instead, they are expected to use the output functions declared in
Pa html.h
and implemented as part of the main HTML formatting engine in
Pa html.c .
Ss Data structures
These structures are declared in
Pa html.h .
Bl -tag -width Ds
It Vt struct html
Internal state of the HTML formatter.
It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
Fa "enum htmltag tag"
and
Fa "struct tag *next" .
El
Ss Private interface functions
The function
Fn print_gen_decls
prints the opening
Aq Pf \&! Ic DOCTYPE
declaration.
Pp
The function
Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
Aq Ic HTML
element.
Pass the first
Dv ROFFT_COMMENT
node in
Fa n .
Pp
The function
Fn print_gen_head
prints the opening
Aq Ic META
and
Aq Ic LINK
elements for the document
Aq Ic HEAD ,
using the
Fa style
member of
Fa h
unless that is
Dv NULL .
It uses
Fn print_otag
which takes care of properly encoding attributes,
which is relevant for the
Fa style
link in particular.
Pp
The function
Fn print_otag
prints the start tag of an HTML element with the name
Fa tag ,
optionally including the attributes specified by
Fa fmt .
If
Fa fmt
is the empty string, no attributes are written.
Each letter of
Fa fmt
specifies one attribute to write.
Most attributes require one
Va char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
Dv NULL ,
the respective attribute is not written.
Bl -tag -width 1n -offset indent
It Cm c
Print a
Cm class
attribute.
It Cm h
Print a
Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
Cm R ,
it formats the link as a local one by prefixing a
Sq #
character.
If followed by
Cm I ,
it interpretes the argument as a header file name
and generates a link using the
Xr mandoc 1
Fl O Cm includes
option.
If followed by
Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
Xr mandoc 1
Fl O Cm man
option.
It Cm i
Print an
Cm id
attribute.
It Cm \&?
Print an arbitrary attribute.
This format letter requires two
Vt char *
arguments, the attribute name and the value.
The name must not be
Dv NULL .
It Cm s
Print a
Cm style
attribute.
If present, it must be the last format letter.
It requires two
Va char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
Dv NULL .
The
Cm s
Ar fmt
letter can be repeated, each repetition requiring an additional pair of
Va char *
arguments.
El
Pp
Fn print_otag
uses the private function
Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
Fa h
that the element is open.
The function
Fn print_tagq
is used to close out all open elements up to and including
Fa until ;
Fn print_stagq
is a variant to close out all open elements up to but excluding
Fa suntil .
The function
Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
Pp
The function
Fn html_fillmode
switches to fill mode if
Fa want
is
Dv ROFF_fi
or to no-fill mode if
Fa want
is
Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
Aq Ic PRE
element.
Switching in the opposite direction closes the
Aq Ic PRE
element, but does not open a new paragraph.
If
Fa want
matches the mode that is already active, no elements are closed nor opened.
If
Fa want
is
Dv TOKEN_NONE ,
the mode remains as it is.
Pp
The function
Fn html_setfont
selects the
Fa font ,
which can be
Dv ESCAPE_FONTROMAN ,
Dv ESCAPE_FONTBOLD ,
Dv ESCAPE_FONTITALIC ,
Dv ESCAPE_FONTBI ,
or
Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
Fa font
argument is
Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
Pp
The function
Fn print_text
prints HTML element content.
It uses the private function
Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
Xr roff 7
Ic \ef
font escape sequence,
Fn print_text
wraps
Fa word
in an HTML font selection element using the
Fn print_otag
and
Fn print_tagq
functions.
Pp
The function
Fn print_tagged_text
is a variant of
Fn print_text
that wraps
Fa word
in an
Aq Ic A
element of class
Qq permalink
if
Fa n
is not
Dv NULL
and yields a segment identifier when passed to
Fn html_make_id .
Pp
The function
Fn html_make_id
allocates a string to be used for the
Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
Aq Ic A
element.
If
Fa n
contains a
Fa tag
attribute, it is used; otherwise, child nodes are used.
If
Fa n
is an
Ic \&Sh ,
Ic \&Ss ,
Ic \&Sx ,
Ic SH ,
or
Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
Dv NULL
is returned instead.
If the
Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
Dv NODE_ID ,
so the integer is incremented before use.
If the
Fa unique
argument is 2, this is ssumed to be the second call for this tag
at this location, typically for use by
Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
Pp
The function
Fn print_otag_id
opens a
Fa tag
element of class
Fa cattr
for the node
Fa n .
If the flag
Dv NODE_ID
is set in
Fa n ,
it attempts to generate an
Cm id
attribute with
Fn html_make_id .
If the flag
Dv NODE_HREF
is set in
Fa n ,
an
Aq Ic A
element of class
Qq permalink
is added:
outside if
Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
Fn html_make_id
and
Fn print_otag ,
automatically chosing the
Fa unique
argument appropriately and setting the
Fa fmt
arguments to
Qq chR
and
Qq ci ,
respectively.
Pp
The function
Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
Fa h
object.
Pp
The functions
Fn print_eqn ,
Fn print_tbl ,
and
Fn print_tblclose
are not yet documented.
Sh RETURN VALUES
The functions
Fn print_otag
and
Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
Xr free 3 Ns d
when
Fn print_tagq
is called on it or when
Fn print_stagq
is called on a parent element.
Pp
The function
Fn html_fillmode
returns
Dv ROFF_fi
if fill mode was active before the call or
Dv ROFF_nf
otherwise.
Pp
The function
Fn html_make_id
returns a newly allocated string or
Dv NULL
if
Fa n
lacks text data to create the attribute from.
The caller is responsible for
Xr free 3 Ns ing
the returned string after using it.
Pp
In case of
Xr malloc 3
failure, these functions do not return but call
Xr err 3 .
Sh FILES
Bl -tag -width mandoc_aux.c -compact
It Pa main.h
declarations of public functions for use by the main program,
not yet documented
It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
It Pa html.c
main HTML formatting engine and utility functions
It Pa mdoc_html.c
Xr mdoc 7
HTML formatter
It Pa man_html.c
Xr man 7
HTML formatter
It Pa tbl_html.c
Xr tbl 7
HTML formatter
It Pa eqn_html.c
Xr eqn 7
HTML formatter
It Pa roff_html.c
Xr roff 7
HTML formatter, handling requests like
Ic br ,
Ic ce ,
Ic fi ,
Ic ft ,
Ic nf ,
Ic rj ,
and
Ic sp .
It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
It Pa out.c
private functions for shared use by all mandoc formatters
It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
Xr mandoc 3
It Pa mandoc_aux.c
implementation of common mandoc utility functions
El
Sh SEE ALSO
Xr mandoc 1 ,
Xr mandoc 3 ,
Xr man.cgi 8
Sh AUTHORS
An -nosplit
The mandoc HTML formatter was written by
An Kristaps Dzonsons Aq Mt [email protected] .
It is maintained by
An Ingo Schwarze Aq Mt [email protected] ,
who also wrote this manual.