\documentclass[a4paper]{article}

\usepackage[colorlinks]{hyperref}
\usepackage{bookmark}
\usepackage{booktabs}

\usepackage{lmodern}
% \usepackage{kerkis}
% \usepackage{gfsdidot}

% allow other files to load this file with different setup
\providecommand{\greeksetup}{
 \usepackage[LGR,T1]{fontenc}
 \usepackage{textalpha}
 \usepackage{alphabeta}

 % Greek utf8 definitions work with and without "Babel",
 % with monotonic, polytonic, and ancient Greek variants.
 % The new implementation of \MakeUppercase requires Babel
 % for the Greek localisation
 \usepackage[greek,english]{babel}
 % \languageattribute{greek}{polutoniko}
 % \languageattribute{greek}{ancient}
}
\greeksetup

% Fallbacks:
\ProvideTextCommandDefault{\greekscript}{\fontencoding{LGR}\selectfont
                                        \def\encodingdefault{LGR}}
\providecommand{\latinscript}{\fontencoding{T1}\selectfont
                             \def\encodingdefault{T1}}
\ProvideTextCommandDefault{\ensuregreek}[1]{\leavevmode{\greekscript #1}}
\providecommand{\ensureascii}[1]{{\fontencoding{T1}\selectfont #1}}


\begin{document}

\title{Greek Unicode with 8-bit TeX}
\author{Günter Milde}
\maketitle

\begin{abstract}
\noindent
The definitions in \texttt{lgrenc.dfu} provide UTF-8 support for the Greek
script based on the \emph{LaTeX internal character representation} macros
(LICRs) defined in the \emph{greek-fontenc} package.
\end{abstract}

\tableofcontents

\section{Introduction}

The default input encoding for 8-bit LaTeX changed from 7-bit ASCII to UTF-8
in April 2018.\footnote{%
 The XeTeX and LuaTeX engines use UTF-8 as input, internal, and font
 encoding. They do not require (and, except in 8-bit compatibility mode,
 do not work with) the \emph{greek-inputenc} package.}
However, the standard setup lacks definitions for Greek Unicode characters.
\emph{Greek-inputenc} adds definitions to allow the use of
literal characters for Greek letters and symbols in the document source.

As with all input encoding definitions, this only works if the active font
encoding supports the characters.
For the Greek script, this is usually the \emph{LGR} font encoding set up by
\href{https://ctan.org/pkg/greek-fontenc}{\emph{greek-fontenc}}.
% e.g. Π produces:
% ! LaTeX Error: Command \textPi unavailable in encoding T1.
% just like Ж produces:
% ! LaTeX Error: Command \CYRZH unavailable in encoding T1.
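
A minimal sketch of this dependency (an illustration, not part of the package
sources; the sample word is arbitrary): the Greek literals are typeset
without error only while LGR is the active font encoding.
%
\begin{verbatim}
  \documentclass{article}
  \usepackage[LGR,T1]{fontenc}  % T1, listed last, is the default encoding
  \begin{document}
  % Works: LGR is active inside the group.
  {\fontencoding{LGR}\selectfont λόγος}
  % Without the switch, the same word is expected to stop with an error
  % of the form: Command \textlambda unavailable in encoding T1.
  \end{document}
\end{verbatim}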

\section{Usage}

Since 2018, it is no longer necessary to load the \emph{inputenc} package
for UTF-8 encoded sources.\footnote{%
 The legacy input encodings \emph{iso-8859-7} and
 \emph{macgreek} are selected by giving them as options to the
 \href{https://ctan.org/pkg/inputenc}{\emph{inputenc}} package.}
The character definitions in the file \texttt{lgrenc.dfu} are loaded
automatically if the LGR font encoding is loaded by one of the following
alternatives:

\begin{itemize}

\item With \emph{fontenc}, e.g.,
 %
 \begin{verbatim}
   \usepackage[LGR,T1]{fontenc}
\end{verbatim}
 %
 Ensure that LGR is the active font encoding whenever a Greek character is
 used in the text (see fntguide.pdf for font encoding switching commands).
 \begin{quote}
   \greekscript
   Τί φήις; Ἱδὼν ἐνθέδε παῖδ’ ἐλευθέραν
   τὰς πλησίον Νύμφας στεφανοῦσαν, Σώστρατε,
   ἐρῶν άπῆλθες εὐθύς;
 \end{quote}

\item For text in the Greek language, it is recommended to use the
 \href{https://ctan.org/pkg/babel}{\emph{Babel}} package with the Greek
 language definitions in
 \href{https://ctan.org/pkg/babel-greek}{\emph{babel-greek}}.
 Babel sets the font encoding automatically to LGR and Greek Unicode
 characters work as expected. Write in the preamble, e.g.,
 %
 \begin{verbatim}
   \usepackage[english,greek,german]{babel}
\end{verbatim}
 %
 and use \verb+\foreignlanguage+ or \verb+\selectlanguage+ to set the text
 language to Greek (see the
 \href{https://ctan.org/pkg/babel-greek}{\emph{babel-greek}} documentation
 for detailed examples).

\item In combination with the
 \href{http://mirrors.ctan.org/language/greek/greek-fontenc/textalpha.sty.html}%
 {\emph{textalpha}} package from \emph{greek-fontenc}, Greek Unicode
 characters can be used in text with any font encoding -- just like the
 symbols provided by the ``textcomp'' package (i.e. with some limitations
 described in
 \href{https://mirrors.ctan.org/language/greek/greek-fontenc/textalpha-doc.pdf}
 {textalpha-doc}).
\makeatletter
\ifdefined\textalpha@define@breathings % textalpha package is loaded
 With the preamble line
 \begin{verbatim}
   \usepackage{textalpha}
\end{verbatim}
 it is straightforward to write about π-mesons, γ-radiation, or a 50\,kΩ
 resistor.\footnote{%
   The MICRO SIGN and OHM SIGN characters are set up with
   \emph{textcomp} characters for any font encoding while
   GREEK CAPITAL LETTER OMEGA works only with the LGR font encoding.}
 Words and phrases should be wrapped in \verb|\ensuregreek| to preserve
 kerning, or in the Babel command \verb|\foreignlanguage{greek}| to also
 ensure correct hyphenation.

\item \sloppy
 In combination with the
 \href{http://mirrors.ctan.org/language/greek/greek-fontenc/alphabeta.sty.html}%
 {\emph{alphabeta}} package (also from \emph{greek-fontenc}),
 Greek Unicode literals can also be used in math mode:
 %
 \begin{verbatim}
   \usepackage{alphabeta}
\end{verbatim}
   \[
      \tan β = \frac{\sin β}{\cos β}.
   \]
\fi
\makeatother

\item Greek literal characters can also be used in PDF-strings (bookmarks and
 ToC entries with \href{https://ctan.org/pkg/hyperref}{\emph{hyperref}}).
 See \href{https://ctan.org/pkg/greek-fontenc}{\emph{greek-fontenc}}
 for a
 \href{https://mirrors.ctan.org/language/greek/greek-fontenc/hyperref-with-greek.pdf}
 {hyperref test and usage example}.

\end{itemize}
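
Taken together, the alternatives above can be sketched as one small document.
This is a hedged example, not an official template: the preamble mirrors the
setup used for this document, and the sample phrase is arbitrary.
%
\begin{verbatim}
  \documentclass{article}
  \usepackage[LGR,T1]{fontenc}       % LGR for Greek, T1 as default
  \usepackage{textalpha}             % Greek literals in text, any encoding
  \usepackage{alphabeta}             % Greek literals in math mode
  \usepackage[greek,english]{babel}  % main language: English
  \begin{document}
  An English sentence with a Greek phrase,
  \foreignlanguage{greek}{καλημέρα κόσμε},
  a single letter α, and the formula $α + β = γ$.
  \end{document}
\end{verbatim}
%
Here \verb|\foreignlanguage| also selects Greek hyphenation patterns, while
the isolated letter and the formula rely on \emph{textalpha} and
\emph{alphabeta}, respectively.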


\section{Warning: unsafe ASCII input}

LGR is not a ``standard font encoding''. Latin characters and some other ASCII
symbols are mapped to Greek equivalents if LGR is the active font encoding.
(See
\href{https://mirrors.ctan.org/language/babel/contrib/greek/usage.pdf}%
{usage.pdf} for a description of this Latin-Greek transliteration.)

This means you need an explicit language and/or font-encoding switch for
Latin words and abbreviations in Greek text, e.g., not
\ensuregreek{((μία αντίσταση 750-kΩ))} but
\ensuregreek{((μία αντίσταση 750-\ensureascii{k}Ω))}.
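
In source form, the switch can be sketched as follows (a hedged example:
\verb|\ensuregreek| and \verb|\ensureascii| are the commands used for this
document, and the Babel variant assumes that both ``greek'' and ``english''
are loaded):
%
\begin{verbatim}
  % The k is transliterated to a kappa while LGR is active:
  \ensuregreek{αντίσταση 750-kΩ}
  % Explicit switch to a Latin font encoding for the k:
  \ensuregreek{αντίσταση 750-\ensureascii{k}Ω}
  % With Babel, a language switch has the same effect:
  \foreignlanguage{greek}{αντίσταση
    750-\foreignlanguage{english}{k}Ω}
\end{verbatim}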

Special care is also required with the question mark characters:
\begin{itemize}
 \item The Unicode standard says that U+003B SEMICOLON, and not
       U+037E GREEK QUESTION MARK, is the preferred character for a
       ``Greek question mark'' (erotimatiko).
 \item The LGR font encoding maps a SEMICOLON to a middle dot (ano teleia),
       while the Latin question mark ``?'' is mapped to the erotimatiko.
\end{itemize}
Only the deprecated character U+037E GREEK QUESTION MARK works with both
Xe/LuaTeX and 8-bit TeX. However, Unicode treats it as canonically equivalent
to U+003B SEMICOLON, so a quote copy-pasted from a source using U+037E may
end up with U+003B and middle dots instead of erotimatiko!
\makeatletter
\ifdefined\textalpha@define@breathings
 Compare the source \url{greek-utf8.tex} and the PDF output:

 \begin{tabular}{llc}
   Input               & \latinencoding{} & \greekfontencoding  \\
   \midrule
   003F QUESTION MARK       & ?           & \ensuregreek{?} \\
   037E GREEK QUESTION MARK & not defined & \ensuregreek{;} \\
   003B SEMICOLON           & ;           & \ensuregreek{;} \\
   00B7 MIDDLE DOT          & ·           & \ensuregreek{·}
 \end{tabular}
\fi
\makeatother
\\
With the \href{https://ctan.org/pkg/babel-greek}{babel-greek} language
attribute ``keep-semicolon'' or the \emph{textalpha} package's
``keep-semicolon'' option, the SEMICOLON character can be used for the
erotimatiko with LGR-encoded fonts as well.
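
Both variants can be sketched as follows (a hedged example: the option and
attribute names are the ones given above, the surrounding preamble is an
assumption; one of the two alternatives should suffice):
%
\begin{verbatim}
  % Alternative 1: option of the textalpha package
  \usepackage[keep-semicolon]{textalpha}

  % Alternative 2: attribute of the Greek language in Babel
  \usepackage[greek,english]{babel}
  \languageattribute{greek}{keep-semicolon}
\end{verbatim}
%
With either setting, a SEMICOLON typed after a Greek question is expected to
be printed as erotimatiko instead of ano teleia.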


\section{Supported Characters}

Unicode definitions exist for all non-ASCII characters that can be rendered
with an LGR-encoded font.


\subsection{Greek and Coptic}

\greekscript
\begin{tabular}{cccccccccccccccc}
\toprule
     0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 &
     A & B & \latinscript C & \latinscript D & E & \latinscript F\\
\midrule
 ␣ & ␣ & ␣ & ␣ & ʹ & ͵ & ␣ & ␣ &   &   & ͺ & ␣ & ␣ & ␣ & ; &  \\
   &   &   &   & ΄ & ΅ & Ά & · & Έ & Ή & Ί &   & Ό &   & Ύ & Ώ\\
 ΐ & Α & Β & Γ & Δ & Ε & Ζ & Η & Θ & Ι & Κ & Λ & Μ & Ν & Ξ & Ο\\
 Π & Ρ &   & Σ & Τ & Υ & Φ & Χ & Ψ & Ω & Ϊ & Ϋ & ά & έ & ή & ί\\
 ΰ & α & β & γ & δ & ε & ζ & η & θ & ι & κ & λ & μ & ν & ξ & ο\\
 π & ρ & ς & σ & τ & υ & φ & χ & ψ & ω & ϊ & ϋ & ό & ύ & ώ &  \\
 ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & Ϙ & ϙ & Ϛ & ϛ & Ϝ & ϝ & ␣ & ϟ\\
 Ϡ & ϡ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣\\
 ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣ & ␣\\
\bottomrule
\end{tabular}
\latinscript

\smallskip\noindent
Legend: ␣ glyph missing in LGR, <\emph{space}> Unicode code point not defined


\subsection{Greek Extended}

\greekscript
\begin{tabular}{cccccccccccccccc}
\toprule
     0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 &
     A & B & \latinscript C & \latinscript D & E & \latinscript F\\
\midrule
ἀ & ἁ & ἂ & ἃ & ἄ & ἅ & ἆ & ἇ & Ἀ & Ἁ & Ἂ & Ἃ & Ἄ & Ἅ & Ἆ & Ἇ\\
ἐ & ἑ & ἒ & ἓ & ἔ & ἕ &   &   & Ἐ & Ἑ & Ἒ & Ἓ & Ἔ & Ἕ &   &  \\
ἠ & ἡ & ἢ & ἣ & ἤ & ἥ & ἦ & ἧ & Ἠ & Ἡ & Ἢ & Ἣ & Ἤ & Ἥ & Ἦ & Ἧ\\
ἰ & ἱ & ἲ & ἳ & ἴ & ἵ & ἶ & ἷ & Ἰ & Ἱ & Ἲ & Ἳ & Ἴ & Ἵ & Ἶ & Ἷ\\
ὀ & ὁ & ὂ & ὃ & ὄ & ὅ &   &   & Ὀ & Ὁ & Ὂ & Ὃ & Ὄ & Ὅ &   &  \\
ὐ & ὑ & ὒ & ὓ & ὔ & ὕ & ὖ & ὗ &   & Ὑ &   & Ὓ &   & Ὕ &   & Ὗ\\
ὠ & ὡ & ὢ & ὣ & ὤ & ὥ & ὦ & ὧ & Ὠ & Ὡ & Ὢ & Ὣ & Ὤ & Ὥ & Ὦ & Ὧ\\
ὰ & ά & ὲ & έ & ὴ & ή & ὶ & ί & ὸ & ό & ὺ & ύ & ὼ & ώ &   &  \\
ᾀ & ᾁ & ᾂ & ᾃ & ᾄ & ᾅ & ᾆ & ᾇ & ᾈ & ᾉ & ᾊ & ᾋ & ᾌ & ᾍ & ᾎ & ᾏ\\
ᾐ & ᾑ & ᾒ & ᾓ & ᾔ & ᾕ & ᾖ & ᾗ & ᾘ & ᾙ & ᾚ & ᾛ & ᾜ & ᾝ & ᾞ & ᾟ\\
ᾠ & ᾡ & ᾢ & ᾣ & ᾤ & ᾥ & ᾦ & ᾧ & ᾨ & ᾩ & ᾪ & ᾫ & ᾬ & ᾭ & ᾮ & ᾯ\\
ᾰ & ᾱ & ᾲ & ᾳ & ᾴ &   & ᾶ & ᾷ & Ᾰ & Ᾱ & Ὰ & Ά & ᾼ & ᾽ & ι & ᾿\\
῀ & ῁ & ῂ & ῃ & ῄ &   & ῆ & ῇ & Ὲ & Έ & Ὴ & Ή & ῌ & ῍ & ῎ & ῏\\
ῐ & ῑ & ῒ & ΐ &   &   & ῖ & ῗ & Ῐ & Ῑ & Ὶ & Ί &   & ῝ & ῞ & ῟\\
ῠ & ῡ & ῢ & ΰ & ῤ & ῥ & ῦ & ῧ & Ῠ & Ῡ & Ὺ & Ύ & Ῥ & ῭ & ΅ & `\\
  &   & ῲ & ῳ & ῴ &   & ῶ & ῷ & Ὸ & Ό & Ὼ & Ώ & ῼ & ´ & ῾ &  \\
\bottomrule
\end{tabular}
\latinscript


\subsection{Other Unicode Blocks}

\begin{description}

\item [Latin-1 Supplement:] \ensuregreek{¨ « ¯ ´ · »}
\item [IPA Extensions:] \ensuregreek{ə} LATIN SMALL LETTER SCHWA
\item [Spacing Modifier Letters:]
     \ensuregreek{˘α} (BREVE, here followed by letter alpha)
\item [General Punctuation:]
     \ensuregreek{– — ‘ ’ ‰} ZWNJ (zero width non-joiner, prevents kerning
     and ligatures, e.g. \ensuregreek{A‌‌U} vs. \ensuregreek{AU} and
     \ensuregreek{'‌a} vs. \ensuregreek{'a})
\item [Currency Symbols:] \ensuregreek{€}
\item [Letter-like Symbols:] Ω  % OHM SIGN, preferred representation is 03A9
\item [Ancient Greek Numbers:] \ensuregreek{
     𐅄 \textpentedeka{}    % GREEK ACROPHONIC ATTIC FIFTY
     𐅅 \textpentehekaton{} % GREEK ACROPHONIC ATTIC FIVE HUNDRED
     𐅆 \textpenteqilioi{}  % GREEK ACROPHONIC ATTIC FIVE THOUSAND
     𐅇 \textpentemuria{}   % GREEK ACROPHONIC ATTIC FIFTY THOUSAND
     }
\end{description}


\section{Up/downcasing}

Capital Greek letters carry diacritics (except the dialytika, macron, and
breve) to the left of the letter instead of above it. In all-uppercase text,
accents and breathings are dropped while the dialytika is kept, e.g.
\ensuregreek{μαΐστρος → ΜΑΪΣΤΡΟΣ}.

The implementation of \verb|\MakeUppercase| changed significantly in the
2022/06 LaTeX release (cf. LaTeX News 35). Since then, Greek uppercase rules
are only applied if the text language is set to ``greek'' with Babel.
See \href{https://ctan.org/pkg/babel-greek}{\emph{babel-greek}} for details
and a comprehensive test document.
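
A minimal sketch of the language dependence (a hedged example: the expected
result is the one from the example above; English as main language and the
use of \emph{textalpha} for Greek literals outside Greek text are
assumptions):
%
\begin{verbatim}
  % Preamble:
  \usepackage{textalpha}
  \usepackage[greek,english]{babel}

  % Document body, text language English:
  % Greek uppercase rules are not applied.
  \MakeUppercase{μαΐστρος}
  % Text language Greek: the accent is dropped, the dialytika is kept.
  \foreignlanguage{greek}{\MakeUppercase{μαΐστρος}}
\end{verbatim}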


\section{Test kerning/ligatures}


Check for kerning and unwanted ligatures:

\begin{quote}
 \greekscript

Αἀα Αἁα Αἂα Αἃα Αἄα Αἅα Αἆα Αἇα ΑἈα ΑἉα ΑἊα ΑἋα ΑἌα ΑἍα ΑἎα ΑἏα

Αἐα Αἑα Αἒα Αἓα Αἔα Αἕα ΑἘα ΑἙα ΑἚα ΑἛα ΑἜα ΑἝα

Αἠα Αἡα Αἢα Αἣα Αἤα Αἥα Αἦα Αἧα ΑἨα ΑἩα ΑἪα ΑἫα ΑἬα ΑἭα ΑἮα ΑἯα

Αἰα Αἱα Αἲα Αἳα Αἴα Αἵα Αἶα Αἷα ΑἸα ΑἹα ΑἺα ΑἻα ΑἼα ΑἽα ΑἾα ΑἿα

Αὀα Αὁα Αὂα Αὃα Αὄα Αὅα ΑὈα ΑὉα ΑὊα ΑὋα ΑὌα ΑὍα

Αὐα Αὑα Αὒα Αὓα Αὔα Αὕα Αὖα Αὗα ΑὙα ΑὛα ΑὝα ΑὟα

Αὠα Αὡα Αὢα Αὣα Αὤα Αὥα Αὦα Αὧα ΑὨα ΑὩα ΑὪα ΑὫα ΑὬα ΑὭα ΑὮα ΑὯα

Αὰα Αάα Αὲα Αέα Αὴα Αήα Αὶα Αία Αὸα Αόα Αὺα Αύα Αὼα Αώα

Αᾀα Αᾁα Αᾂα Αᾃα Αᾄα Αᾅα Αᾆα Αᾇα Αᾈα Αᾉα Αᾊα Αᾋα Αᾌα Αᾍα Αᾎα Αᾏα

Αᾐα Αᾑα Αᾒα Αᾓα Αᾔα Αᾕα Αᾖα Αᾗα Αᾘα Αᾙα Αᾚα Αᾛα Αᾜα Αᾝα Αᾞα Αᾟα

Αᾠα Αᾡα Αᾢα Αᾣα Αᾤα Αᾥα Αᾦα Αᾧα Αᾨα Αᾩα Αᾪα Αᾫα Αᾬα Αᾭα Αᾮα Αᾯα

Αᾰα Αᾱα Αᾲα Αᾳα Αᾴα Αᾶα Αᾷα ΑᾸα ΑᾹα ΑᾺα ΑΆα Αᾼα Α᾽α Αια Α᾿α

Α῀α Α῁α Αῂα Αῃα Αῄα Αῆα Αῇα ΑῈα ΑΈα ΑῊα ΑΉα Αῌα Α῍α Α῎α Α῏α

Αῐα Αῑα Αῒα Αΐα Αῖα Αῗα ΑῘα ΑῙα ΑῚα ΑΊα Α῝α Α῞α Α῟α

Αῠα Αῡα Αῢα Αΰα Αῤα Αῥα Αῦα Αῧα ΑῨα ΑῩα ΑῪα ΑΎα ΑῬα Α῭α Α΅α Α`α

Αῲα Αῳα Αῴα Αῶα Αῷα ΑῸα ΑΌα ΑῺα ΑΏα Αῼα Α´α Α῾α

\end{quote}

\end{document}


Problems with text-extraction from PDF with Kerkis:

    0  1 2 3 4  5   6 7  8  9  A  B  C D E    F
370  ␣  ␣ ␣ ␣  ΄  ͵  ␣ ␣         ι ␣  ␣ ␣   ;
380            ΄ ΅  ΄Α   ΄Ε ΄Η  ΄Ι   ΄Ο   ΄Υ  ΄Ω
390   ΐ Α Β Γ ∆  Ε   Ζ Η Θ    Ι Κ  Λ Μ  Ν Ξ    Ο
3Α0  Π  Ρ   Σ Τ  Υ  Φ  Χ Ψ   Ω   Ϊ Ϋ  ά έ  ή    ί
3Β0  ΰ  α ϐ γ δ  ε   Ϲ η  ϑ   ι κ  λ  µ ν  ξ   ο
3῝0  π  ϱ ς σ τ  υ   ϕ χ  ψ  ω   ϊ ϋ  ό ύ  ώ
3∆0  ␣  ␣ ␣ ␣ ␣  ␣   ␣ ␣     Ϟ  Ϝ  ϝ  Ϝ ϝ  ␣   ϟ
3Ε0     ϡ ␣ ␣ ␣  ␣   ␣ ␣  ␣  ␣  ␣  ␣  ␣ ␣  ␣   ␣
3Φ0  ␣  ␣ ␣ ␣ ␣  ␣   ␣ ␣  ␣  ␣  ␣  ␣  ␣ ␣  ␣   ␣

03B6    GREEK SMALL LETTER ZETA   replaced by 03F9      GREEK CAPITAL LUNATE SIGMA SYMBOL
03B8    GREEK SMALL LETTER THETA  replaced by 03D1      GREEK THETA SYMBOL
03C1    GREEK SMALL LETTER RHO    replaced by 03F1      GREEK RHO SYMBOL
03C6    GREEK SMALL LETTER PHI    replaced by 03D5      GREEK PHI SYMBOL


and GFS Didot:

      0   1   2  3  4  5   6 7  8  9  A  B  C D  E   F
370 ␣      ␣   ␣  ␣   ´  ͵ ␣  ␣         ι ␣  ␣ ␣   ;
380                   ´  ῆ Α
                        ´ ´    ´Ε ´Η  ´Ι   ´Ο   ´Υ  ´Ω
390 ῆ  ´ι Α   Β   Γ  ∆  Ε Ζ   Η  Θ  Ι  Κ  Λ Μ  Ν  Ξ   Ο
3Α0 Π      Ρ       Σ  Τ  Υ Φ   Χ Ψ   Ω   ῆ
                                       Ι ῆ
                                         Υ  ά έ  ή    ί
3Β0 ῆ ´υ  α    β  γ  δ  ε ζ   η  ϑ   ι κ  λ  μ ν  ξ   ο
3῝0 π    ρ    ς  σ  τ  υ φ   χ  ψ  ω   ι
                                       ῆ υ
                                         ῆ  ό ύ ώ
3∆0 ␣      ␣   ␣  ␣  ␣  ␣ ␣   ␣           Ϛ  Ϝ Ϝ  ␣   Ϟ
3Ε0            ␣  ␣  ␣  ␣ ␣   ␣  ␣  ␣   ␣ ␣  ␣ ␣  ␣   ␣
3Φ0 ␣      ␣   ␣  ␣  ␣  ␣ ␣   ␣  ␣  ␣   ␣ ␣  ␣ ␣  ␣   ␣