%%
%% This is file `elsarticle-template-num.tex',
%% generated with the docstrip utility.
%%
%% The original source files were:
%%
%% elsarticle.dtx  (with options: `numtemplate')
%%
%% Copyright 2007, 2008 Elsevier Ltd.
%%
%% This file is part of the 'Elsarticle Bundle'.
%% -------------------------------------------
%%
%% It may be distributed under the conditions of the LaTeX Project Public
%% License, either version 1.2 of this license or (at your option) any
%% later version.  The latest version of this license is in
%%    http://www.latex-project.org/lppl.txt
%% and version 1.2 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.
%%
%% The list of all files belonging to the 'Elsarticle Bundle' is
%% given in the file `manifest.txt'.
%%

%% Template article for Elsevier's document class `elsarticle'
%% with numbered style bibliographic references
%% SP 2008/03/01

%%%original
%\documentclass[preprint,12pt]{elsarticle}

%% Use the option review to obtain double line spacing
%\documentclass[authoryear,preprint,review,12pt,times]{elsarticle}
\documentclass[authoryear,preprint,12pt,times]{elsarticle}

%% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
%% for a journal layout:
%% \documentclass[final,1p,times]{elsarticle}
%% \documentclass[final,1p,times,twocolumn]{elsarticle}
%% \documentclass[final,3p,times]{elsarticle}
%% \documentclass[final,3p,times,twocolumn]{elsarticle}
%% \documentclass[final,5p,times]{elsarticle}
%% \documentclass[final,5p,times,twocolumn]{elsarticle}

%% if you use PostScript figures in your article
%% use the graphics package for simple commands
%% \usepackage{graphics}
%% or use the graphicx package for more complicated commands
%% \usepackage{graphicx}
%% or use the epsfig package if you prefer to use the old commands
%% \usepackage{epsfig}

%% The amssymb package provides various useful mathematical symbols
\usepackage{amssymb}
%% The amsthm package provides extended theorem environments
%% \usepackage{amsthm}

%% The lineno packages adds line numbers. Start line numbering with
%% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
%% for the whole article with \linenumbers.
%% \usepackage{lineno}

\journal{Journal of Visual Communication and Image Representation}



%% nic

%%% Automatic path for figures
\graphicspath{{figures/}}

%%% For URLs
\usepackage{url}

%%% Algorithm
\usepackage{algorithm}
\usepackage{listings,relsize}
\lstset{language=Python,flexiblecolumns=true,fontadjust=true,basicstyle=\fontfamily{pbk}\selectfont\smaller,mathescape=true,numbers=left,numberstyle=\tiny}
%\lstset{language=Python,fontadjust=true,mathescape=true}
%\lstset{language=Python,mathescape=true}

%%%subfigures
\usepackage{subfigure}

%% nic





\begin{document}

\begin{frontmatter}

%% Title, authors and addresses

%% use the tnoteref command within \title for footnotes;
%% use the tnotetext command for theassociated footnote;
%% use the fnref command within \author or \address for footnotes;
%% use the fntext command for theassociated footnote;
%% use the corref command within \author for corresponding author footnotes;
%% use the cortext command for theassociated footnote;
%% use the ead command for the email address,
%% and the form \ead[url] for the home page:
%% \title{Title\tnoteref{label1}}
%% \tnotetext[label1]{}
%% \author{Name\corref{cor1}\fnref{label2}}
%% \ead{email address}
%% \ead[url]{home page}
%% \fntext[label2]{}
%% \cortext[cor1]{}
%% \address{Address\fnref{label3}}
%% \fntext[label3]{}

\title{A method to encode high-quality regions in JPEG files using multiplication overflow}

%% use optional labels to link authors explicitly to addresses:
%% \author[label1,label2]{}
%% \address[label1]{}
%% \address[label2]{}

\author[label1]{Nicolau Leal Werneck\corref{cor1}\fnref{money}}
\ead{[email protected]}
\ead[url]{http://www.lti.pcs.usp.br/\~{}nwerneck}
\address[label1]{%
Av. Prof. Luciano Gualberto, trav.3, n.158\\
Escola Polit\'ecnica (LTI---PCS---EPUSP)\\
Universidade de S\~ao Paulo\\
05508-900 S\~ao Paulo, SP, Brasil\\
Tel: +55-11-3091-5397\\
Fax: +55-11-3091-5294\\
(Tel2: +55-11-7014-1017)\\
}
\cortext[cor1]{}
%%
\author[label2]{Hani Camille Yehia\fnref{money}}
\ead{[email protected]}
\address[label2]{CEFALA--CPDE--UFMG}
%%
\author[label1]{Anna Helena Reali Costa\fnref{money}}
\ead{[email protected]}
\fntext[money]{The authors were supported by grants from CNPq and CAPES.}





\begin{abstract}
The lossy encoding of images with regions of interest (ROI) is a contemporary trend in image compression. Recent standards such as JPEG~2000 often support this kind of encoding, but the difficulty of replacing the popular JPEG standard motivates the search for a simple modification to JPEG that endows it with this feature. This paper contributes such a modification. The resulting codec is capable of encoding images with high-quality regions while keeping the original JPEG encoding in the rest of the image. There is no need to modify the file format; only the baseline codec programs must be slightly modified. The technique is based on modular arithmetic, and was successfully implemented. Tests show an increase in file size when the technique is applied to whole images. The explanation of this effect offers insight into how quantization contributes to the reduction of file size. Usually the elimination of information is regarded as the only relevant factor, but there are cases where the overall shaping of the probability distribution of the encoded values must be considered.
\end{abstract}

\begin{keyword}
%% keywords here, in the form: keyword \sep keyword
%% PACS codes here, in the form: \PACS code \sep code
%% MSC codes here, in the form: \MSC code \sep code
%% or \MSC[2008] code \sep code (2000 is the default)
JPEG \sep Region of interest (ROI) \sep Quantization \sep Number
 theory \sep Entropy encoding \sep Image compression
\end{keyword}

\end{frontmatter}

%% \linenumbers







%% main text

%%
%%
%%------------------------------------------------------------------------
\section{Introduction}
\label{intro}
In image compression it is often possible to sacrifice image quality for compression rate. Lossy compression schemes explicitly exploit this compromise, giving rise to parametric codecs where the user can select the desired quality or where bit-rate limitations can be met.

In some applications the reduction of overall image quality is not desirable. Instead, it would be preferable to preserve a higher quality in a few {\em regions of interest} (ROI) of the image while reducing the quality only outside of them. These regions may simply be defined by a user according to his or her needs, but encoders can also select regions automatically, allocating bits to different areas of the image in a heterogeneous way that optimizes image quality at a given compression rate~\citep{Bradley2003232,content-dep}. Regions may also be selected by some other automatic criterion in more specialized applications. The complete input to a codec capable of ROI encoding is then the original image along with a definition of how quality should vary across it.

This kind of heterogeneous encoding is a relatively recent trend. The first lossy compression methods to become widely available could only apply the same quality setting throughout the image. This is the case of the JPEG image coding standard, which does not support ROI encoding natively. Newer standards proposed to replace JPEG, such as JPEG~2000, are often able to perform ROI encoding, but no such standard has yet come close to becoming as popular as JPEG.

The JPEG standard defines a lossy method to encode and compress images, capable of high compression rates while retaining good perceptual quality. It has become a widely accepted standard for photographic images. JPEG was created in the early 1990s, and approved as the ITU-T standard T.81 on 18 September 1992~\citep{103089,T.81,salomon}. The same text also became the ISO/IEC standard 10918-1.
% which reference would this be?... \citep{Gibson:1998:DCM}

Since its creation, JPEG has enjoyed great popularity. It became the format of choice for portable digital photographic cameras and for photographic images in general. Coding schemes similar to JPEG are also used in other standards, such as the MPEG video format~\citep{MPEG-2}. Though JPEG was conceived with relative flexibility, almost all applications use a single standardized set of parameters, called the baseline specification. Other possibilities such as lossless, progressive and hierarchical compression, and also arithmetic coding, are seldom used.

Newer coding standards such as JPEG~2000 focus not only on higher compression performance, but also on other functionalities~\citep{overview,sceb:j2k-vs-stds-spie,salomon}. One of these is support for the coding of regions of interest~\citep{Bradley2003232,content-dep}.

Because ROI encoding is a valuable feature that JPEG lacks, and given the strong hegemony of this standard, we propose here a modified JPEG codec capable of ROI encoding. This implementation requires only minor modifications to the original transcoder algorithms. Because of this simplicity, the modified codec still benefits from much of the research done on JPEG, such as post-processing techniques and optimized software and hardware designs. The file format is not changed at all, although the files generated are not intended to be read by traditional decoders. There is forward compatibility, apart from a minor detail, and there is backward compatibility only in the regions outside the ROI. There are no extra data structures to be transmitted, nor standard data structures to be omitted, in the files used by the new codec.

Not only is the method proposed here a valuable tool, but understanding its properties also gives insight into the nature of scalar quantization and its role in the JPEG technology. While this article is largely directed at this specific compression standard, the technique presented is based on a general principle. Our proposal affects only the scalar quantizer stage of the transcoding, and thus it can also be applied to any other system where this technique is used, as in JPEG~2000 itself.

The rest of this paper is organized as follows: in Section~\ref{sec:tradsys} the original JPEG codec is presented, as well as some other proposals to enhance it. In Section~\ref{sec:newsys} we detail the proposed modifications to JPEG in order to support ROI. Experiments and their results are presented in Section~\ref{sec:res}. Finally, in Section~\ref{sec:conc} conclusions are drawn based on the experiments conducted.











%%
%%
%%------------------------------------------------------------------------
\section{The original system and some proposed modifications}
\label{sec:tradsys}
The JPEG codec is first and foremost based on the use of a transform operation.  The image is tiled, each tile is transformed to the discrete cosine transform (DCT) domain, and the coefficients in this domain are the values that are encoded~\citep{salomon}. This contrasts with methods that operate directly in the spatial domain, such as run-length encoding~\citep{salomon}. The DCT is also used in other codecs, such as the DV and MPEG-2 video compression standards~\citep{DV,MPEG-2}. JPEG also applies a quantization step to these coefficients, and performs an entropy encoding in a final stage. The values that are transmitted to the decoder through the entropy encoding are the coefficient values scaled by quantization factors, and will often be referred to in this article as the {\em transmitted values} or also {\em quantized values} --- although this latter term might more accurately describe the coefficient values after the scale is restored.

In this section we review the operation of the parts of the JPEG encoder and decoder that are relevant to our work, and also review some of the modifications already proposed for JPEG in order to endow it with complementary features.

We focused our studies on grayscale, i.e.\ single-channel, images. The encoding of color images in JPEG is directly based on the encoding of grayscale images, and its peculiarities are not relevant to our work.

\subsection{Encoding}
The JPEG codec is composed of several stages. Figure~\ref{fig:block} displays the stages that are most relevant to our work. First the image is divided into square blocks 8 pixels wide. The DCT stage receives these blocks in sequence, and outputs the DCT coefficients of each one of them.

According to the T.81 standard, the input image to a JPEG encoder must have a bit depth of either 8 bits --- used in the baseline specification --- or 12 bits~\citep{T.81}. A newer standard called T.851 specifies an encoding based on T.81 that allows the use of $8$ to $16$ bits of precision~\citep{T.851}. The range of the resulting DCT coefficients is 8 times larger than that of the spatial-domain samples (i.e., the bit depth is 3 bits wider), therefore the DCT coefficients have 11 bits in the 8 bits-per-pixel case.

In the next stages these spectral-domain coefficients are quantized, and the resulting values are entropy-encoded. The final bit-stream consists of both the compressed data and complementary information, and is stored in a structured file format such as JFIF~\citep{JFIF}.

\begin{figure}
%84 or 174mm
%  \includegraphics[width=84mm]{figures/blocks.eps}
%\resizebox{84mm}{!} {\input{figures/vblocks.pstex_t}}
\resizebox{70mm}{!} {\input{figures/vblocks.pdf_t}}
\caption{Block diagram for the JPEG codec.}
\label{fig:block}
\end{figure}

The quantization stage performs an inherently lossy operation, and is the main source of quality control and information loss in the transcoding process. This stage is controlled by a list of quantization values $Q_n$ supplied {\it a priori}. These values determine the granularity of the $n^{\rm th}$ DCT coefficient of each block, and the same values are used over the whole image. For each DCT coefficient $c_n$ the quantizer outputs a transmitted value $t_n$ given by:
\begin{equation}
t_n = \left\lfloor{c_n\over Q_n}+ {1\over 2}\right\rfloor,
\end{equation}
where $\left\lfloor r\right\rfloor$ denotes the largest integer not greater than $r$.
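For illustration, this rounding division can be written as the following minimal Python sketch (the function name is ours, not taken from any reference implementation):
\begin{lstlisting}
import math

# Scalar quantizer of the encoder: t_n = floor(c_n / Q_n + 1/2).
def quantize_coefficient(c_n, Q_n):
    return math.floor(c_n / Q_n + 0.5)
\end{lstlisting}
For example, with $Q_n=5$ all coefficients $c_n$ from $3$ to $7$ are mapped to the single value $t_n=1$.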

The output of the quantization stage is delivered to an entropy encoder, the third block in the Figure~\ref{fig:block} diagram. Each possible value $t_n$ constitutes a symbol to be transcoded by this stage. The T.81 standard allows the use of either Huffman coding or the QM arithmetic coder~\citep{T.81}. Huffman coding is by far the most popular. The T.851 standard adds a third option, called the Q15-coder~\citep{T.851}.

\subsection{Decoding}
The decoder reads a bitstream from the input file, which is decoded using the appropriate routines for the entropy coding method selected at encoding time. This stage yields values that are fed to the dequantizer, and the numbers produced there are finally used to compose blocks of coefficient values that are processed by an inverse DCT to produce each block of the image; the blocks are then concatenated to form the whole image.

The dequantizer step works by simply multiplying the transmitted value $t_n$ --- which was calculated in the quantizer step of the encoder --- by the corresponding quantization factor $Q_n$. Therefore the dequantizer output $\hat{c}_n$ is
\begin{equation}
\label{eq:c-norig}
 \hat{c}_n = t_nQ_n.
\end{equation}
These operations are all deterministic, but the overall effect is an irreversible function that reduces the number of possible values of the decoded DCT coefficients, and is given by the expression
\begin{equation}
 \hat{c}_n = Q_n\left\lfloor{c_n\over Q_n}+ {1\over 2}\right\rfloor.
\end{equation}
In the original image we had $c_n$ at the input of the quantizer block, but in the decoded image the corresponding values $\hat{c}_n$ will mostly be inaccurate values that only approximate the original ones. The difference between the original and retrieved coefficient values can be seen as a form of noise added to the signal, often referred to as {\em quantization noise}.
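
The complete round trip of a coefficient through the quantizer and the clipping dequantizer can be sketched in Python as follows (a simplified illustration, assuming the baseline coefficient range from $-1024$ to $1023$; the function names are ours):
\begin{lstlisting}
import math

def quantize(c_n, Q_n):
    # Encoder side: rounding division by the quantization factor.
    return math.floor(c_n / Q_n + 0.5)

def dequantize_clipped(t_n, Q_n, low=-1024, high=1023):
    # Decoder side: multiplication followed by clipping to the legal range.
    return max(low, min(high, t_n * Q_n))

Q_n = 5
for c_n in range(16):
    t_n = quantize(c_n, Q_n)
    c_hat = dequantize_clipped(t_n, Q_n)
    print(c_n, t_n, c_hat, c_hat - c_n)  # last column: the quantization noise
\end{lstlisting}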








%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Some proposed modifications to JPEG}
Ever since the specification and adoption of JPEG, researchers have tried to fully understand its qualities and limitations, and to find out how to extend its capabilities. This research can be roughly divided into two classes: proposals constrained by the standard, and proposals of more drastic modifications to the codecs.

Among the coding techniques that explore the original JPEG specification, we can mention methods to find optimal quantization tables for a given image~\citep{wu1993}, pre- and post-processors, especially those directed at eliminating coding artifacts~\citep{DBLP:journals/tip/PopoviciW07,nosratinia01,shen98,877222}, and finally modifications limited to either the encoder~\citep{DBLP:journals/tip/RamchandranV94} or to the decoder, such as biased dequantization~\citep{844179}.

Some more substantial modification proposals for JPEG include modulating the quantization table along the image~\citep{golner2002,rosenholtz1996,watson:202}, discarding high-frequency coefficients~\citep{DBLP:conf/itcc/Hsieh05}, and transmitting some of the coefficients using watermarking techniques~\citep{DBLP:conf/acivs/KurodaMFI07}. There is also the simple solution of transmitting extra data separately from the JPEG data, and then combining them in a post-processing step. For example, one can transmit a JPEG file and then a second file containing the coordinates of a ROI inside the first image and a bitmap with that region encoded with a different method. The final image is then produced by taking the values for the pixels inside the ROI from the second image, while using the JPEG image otherwise.

The codec proposed in this article belongs to this second class, where there is a great departure from the original standard, but a peculiarity is that the transmitted data have exactly the same structure as in the original JPEG codec. The same number of DCT coefficient values per block is transmitted, and the transmitted values are used in a very similar way as in the standard. Also, no complementary processing steps are needed, only modifications to the behavior of the standard processing blocks --- more specifically the quantizer and dequantizer blocks.

These characteristics are advantageous because they make the implementation of our method very easy compared to other codec modification proposals. The transmission of complementary data structures that is usually necessary in other proposals requires the programmer to deal with the parsing and integration of the data. Implementing this can be daunting, especially when modifying existing code instead of producing a new codec from scratch. For example, the number of coefficients transmitted in each block might change, or there may be a need to recalculate the dequantized coefficients considering values obtained from complementary sources. This does not happen in the approach we propose, where the data handling operations are kept exactly as in the original codec. We simply modify the functions used in the quantization and dequantization stages, and thereby create a codec based on JPEG that supports the encoding of high-quality regions of interest in the image.

As detailed in the next section, the proposed encoder simply selects the suitable values to transmit in order to obtain the desired effect at decoding time. This is similar to what happens in proposals of pre-processors and of more elaborate encoders that do not simply use the division operation in the quantization step, but still generate files conforming to the standard decoder. However, since we also need a modification in the decoder, our proposal is an actual modification of JPEG instead of just an exploration of obscure possibilities allowed by the standard. The files generated by the proposed encoder are just like the ones from the original standard, but are not intended to be read by the original decoder. A standard decoder can interpret the file, but in the regions of interest of images encoded with our technique it would render a meaningless signal. The other regions can be decoded exactly, apart from a minor detail.











%%
%%
%%------------------------------------------------------------------------
\section{The proposed modification}
\label{sec:newsys}
This section explains how the technique was developed, starting with the modification made to the decoder. The modification needed in the encoder arises as a natural counterpart to this new decoder. The section closes with a description of how to modify an existing codec to implement the technique, and of how we actually modified one specific and very popular real-world codec.

\subsection{Principle of operation of the proposed decoder}
To understand the technique proposed here we must first look at the quantization operation in an unconventional way. Table~\ref{tab:1} illustrates the effect of quantization on the input values $c_n$ for $Q_n=5$. The reduction in the number of possible values for the decoded coefficients $\hat{c}_n$ and the quantization error introduced are evident. For example, all five $c_n$ values from $3$ to $7$ are mapped to the single $\hat{c}_n$ value of $5$ in the table.

\begin{table}[H]
\caption{Effect of the original encoding on the input.}
\label{tab:1}       % Give a unique label
\footnotesize
\begin{tabular}{rl}
\hline\noalign{\smallskip} $c_n$
=&\verb+0 1 2 3 4 5 6 7  8  9 10 11 12 13 14 15+\\ $t_n$
=&\verb+0 0 0 1 1 1 1 1  2  2  2  2  2  3  3  3+\\ $\hat{c}_n$
=&\verb+0 0 0 5 5 5 5 5 10 10 10 10 10 15 15 15+\\ \noalign{\smallskip}\hline
\end{tabular}
\end{table}

The coefficient values $c_n$ produced by the DCT stage have a limited range.  As mentioned previously, the precision of the values output by this step is $11$ bits, which means a range $-1024\le c_n\le 1023$. This limited range for the $c_n$ values determines the range of possible transmitted values $t_n$ for a given quantization value. In our example, with $Q_n=5$ we have $-205\le t_n\le205$. Note that for coefficient values near the extremes the dequantization may produce values of $\hat{c}_n$ outside the range of legal values for the coefficients. This is the case in our example, since $205\times5=1025$, above the upper limit of $1023$.

In the original decoder this decoded value $\hat{c}_n$ is clipped, i.e.\ limited so that it stays inside the allowed range. In our example any value $t_n\ge 205$ will result in $\hat{c}_n=1023$, and any $t_n\le -205$ in $\hat{c}_n=-1024$. This clipping results in proper values for the coefficients, as close as possible to the previously calculated ones. This hard clipping of values happens elsewhere too: it is also used after the inverse DCT to ensure valid pixel levels.

In the conventional operation of the codec these clippings are most often subtle changes to the calculated values, but it is still possible for a file complying with the standard to contain transmitted values that result in drastic clippings. This could happen if a file is decoded with the wrong quantization table, for example. It is important to understand at this point that there is no enforced restriction whatsoever on the quantized values $t_n$ that are transmitted in a JPEG file in relation to the corresponding $Q_n$. The limits we observe in the values of $t_n$ at decoding time are merely an indirect consequence of how the codec works. The only limit these values must comply with is the precision of the original DCT coefficient. The codec must be able to handle values of $t_n$ at least that large because of the possibility of having $Q_n=1$. Unexpected values of $t_n$, breaking that artificial limit, merely cause a significant clipping to occur at decoding time, but they are successfully transmitted nonetheless.

The idea our method is based on is to abandon this clipping and, instead, allow the multiplication to overflow when unexpected values of $t_n$ are processed.  This was inspired by what naturally happens in many digital circuits, and might even already happen in some existing oversimplified decoder implementations.  Instead of limiting the result, we take its modulus with the proper base $\beta$.

By using overflow instead of clipping, the restored coefficient values will now be calculated by
\begin{equation}
\hat{c}_n = t_n Q_n \bmod \beta.
\end{equation}
The difference between this equation and Equation~\ref{eq:c-norig} is the modulus operation, where originally we would have a clipping function --- which is not shown in the original formula because it is considered a separate process in the original codec. Both equations are similar, and their results differ only when the product $t_nQ_n$ reaches a certain threshold value.

Table~\ref{tab:2} shows what happens to the decoded values if we use the modulus instead of clipping. In this example, to illustrate how the technique works, we consider only positive values and an upper limit of $15$ instead of the actual value of $1023$. This limit implies a base $\beta=16$ for the modulus, whereas in the actual decoder the value is $\beta=2048$. Notice that to deal with negative numbers we offset the values before applying the modulus operation, and compensate for this afterwards. Thus the processing is essentially moved to the domain of non-negative numbers. In this example with $\beta=16$ we do not consider negative values, for didactic purposes, but it is the base we would use when working with the interval from $-8$~to~$7$.

Figure~\ref{fig:mod16} shows the values of the $\beta=16$ example in graphical form, where it is possible to identify the familiar curves that result from the clipping and modulus operations. It is possible to see in the latter case that, for these specific values of $\beta$ and $Q_n$, this modified dequantization becomes an invertible function, because there are no repeated output values for inputs from zero to~$\beta-1$.

\begin{table}[H]
\caption{Values of $\hat{c}_n$ obtained with the traditional (clip) and proposed (mod) decoders.}
\label{tab:2}       % Give a unique label
\footnotesize
\begin{tabular}{rl}
\hline\noalign{\smallskip} $t_n$
=&\verb+0 1  2  3  4  5  6  7  8  9 10 11 12+\\ $t_n\cdot Q_n$
=&\verb+0 5 10 15 20 25 30 35 40 45 50 55 60+\\ $\hat{c}_n$ (clip)
=&\verb+0 5 10 15 15 15 15 15 15 15 15 15 15+\\ $\hat{c}_n$ (mod)
=&\verb+0 5 10 15  4  9 14  3  8 13  2  7 12+\\ \noalign{\smallskip}\hline
\end{tabular}
\end{table}
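
The two decoding rules compared in Table~\ref{tab:2} can be reproduced with the following toy Python sketch, under the same simplifying assumptions ($\beta=16$, $Q_n=5$, non-negative values only):
\begin{lstlisting}
# Toy comparison of the two decoders for beta = 16 and Q_n = 5.
beta, Q_n = 16, 5

def dequantize_clip(t_n):
    # Traditional decoder: clip the product to the upper limit.
    return min(t_n * Q_n, beta - 1)

def dequantize_mod(t_n):
    # Proposed decoder: let the multiplication overflow (modulus).
    return (t_n * Q_n) % beta

for t_n in range(13):
    print(t_n, t_n * Q_n, dequantize_clip(t_n), dequantize_mod(t_n))
\end{lstlisting}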

\begin{figure}
%  \includegraphics[angle=-90,width=84mm]{mod16-crop.pdf}
 \includegraphics[width=80mm,clip=true,trim=0 30 0 0]{zigzag.pdf}
\caption{Values generated by multiplication followed by either clipping of the result, or by finding the remainder from a division by $\beta=16$.}
\label{fig:mod16}
\end{figure}

When applying our technique, the former dequantization operation becomes a multiplication in the quotient ring $\mathbf{Z}/\beta\mathbf{Z}$~\citep{rotman05}. For usual values of $t_n$, found by the standard quantization operation in the encoder and therefore indirectly limited as discussed before, the new decoding operation will still yield the same desired values (except for the boundary values that used to get clipped).  The interesting fact is that if we now transmit unusual values of $t_n$, values that cannot be produced by the standard quantization, the corresponding $\hat{c}_n$ calculated will not only be valid values that can be used without problem by the following inverse DCT step, but values that we can predict and control.

%% I N Herstein, Abstract Algebra IME /QA162^H572a
%% Joseph A. Gallian, Contemporary Abstract Algebra IME /QA150^G168c


This dequantization step originally received values of $t_n$ whose range was indirectly restricted by $Q_n$ and by the largest possible value of the coefficients. If a large $t_n$ value is received by the original decoder, the recovered $\hat{c}_n$ will simply be too large and get clipped. However, since we are now performing multiplications in a finite ring, the modified dequantization step yields valid coefficient numbers, and the values obtained from the unusual values of $t_n$ can be values that were formerly discarded by the quantizer.

Originally, the values of $\hat{c}_n$ were only multiples of $Q_n$, plus the extreme values attained when clipping occurs. But with our modified dequantization stage, if $Q_n$ and $\beta$ are coprime, any number $0\le \hat{c}_n<\beta$ can be obtained at the new decoder. This is a well-known fact from abstract algebra~\citep[Proposition 2.71]{rotman05}.

When $\beta$ is a power of 2, the coprimality restriction only means that $Q_n$ must be an odd number. To be able to obtain any possible $\hat{c}_n$ at decoding time, all we have to do now is make use of all possible values of $t_n$, and not just the restricted set that the original quantizer offers.

In standard operation each $Q_n$ is a control parameter that restricts the range of $t_n$ and the set of possible values recovered as $\hat{c}_n$. In our method $Q_n$ is used as the generator of a cyclic group~\citep{rotman05}. The value $t_n$ is now an index into the sequence of integers $g^p$ obtained by $g^{p+1} = g^p + Q_n \bmod \beta$, with $g^{0}=0$.\footnote{In abstract algebra the exponential notation $g^p$ is more usual than the multiplicative $g\cdot p$ that might be more familiar to some readers.}
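
This indexing can be made concrete with a short Python sketch (a didactic illustration with $\beta=16$, not part of the actual codec):
\begin{lstlisting}
from math import gcd

# Q_n as the generator of a cyclic group of order beta.
beta, Q_n = 16, 5
assert gcd(Q_n, beta) == 1   # coprime: Q_n must be odd when beta is a power of 2

g, sequence = 0, []
for p in range(beta):
    sequence.append(g)       # g^p, indexed by the transmitted value t_n = p
    g = (g + Q_n) % beta     # g^(p+1) = g^p + Q_n mod beta

print(sequence)              # every value from 0 to beta-1 appears exactly once
\end{lstlisting}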

\subsection{Building an encoder for the decoder}
In the previous subsection we described how to modify the standard JPEG decoder in order to allow the production of any desired coefficient value for the inverse DCT operation, while retaining much similarity to the way the original decoder works.

In order to actually exploit this modified decoder we need to create an encoder that finds the values of $t_n$ that must be transmitted so that the desired $\hat{c}_n$ is produced at the decoder. While the modification made to the decoder was very simple, the corresponding modification that must be made to the encoder is somewhat more complicated. The values that must be found at encoding time are the indices of the desired $\hat{c}_n$ in the previously mentioned sequence, and one way to solve the problem is simply to generate the sequence for each $Q_n$ and perform searches.

Another way to solve this problem is to find the multiplicative inverse $k$ of $Q_n$ in the codec base. The multiplication of a number by its multiplicative inverse can be understood as the modular-arithmetic analogue of the division operation, which is used in the original quantization procedure.

The multiplicative inverse $k$ of a number $Q_n$ in the base $\beta$ is given by:
\begin{equation}
kQ_n\equiv 1\pmod\beta.
\end{equation}
If the transmitted values are then calculated as $t_n=c_nk$ (reduced modulo $\beta$), the decoder will calculate
\begin{equation}
\hat{c}_n=t_nQ_n \bmod\beta = c_nkQ_n\bmod\beta = c_n.
\end{equation}
The coprimality restriction ensures that this multiplicative inverse will exist, and it is possible to find it by means of the Extended Euclidean Algorithm~\citep[Chapter 33]{DBLP:books/mg/CormenLRS01}.
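
A minimal Python sketch of this inversion and of the resulting encode--decode round trip is given below (the function name is ours; in recent versions of Python the built-in \lstinline{pow(Q_n, -1, beta)} returns the same inverse):
\begin{lstlisting}
def multiplicative_inverse(q, beta):
    # Extended Euclidean algorithm: returns k with k * q = 1 (mod beta).
    old_r, r = q, beta
    old_s, s = 1, 0
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
    assert old_r == 1, "q and beta must be coprime"
    return old_s % beta

beta, Q_n = 2048, 5
k = multiplicative_inverse(Q_n, beta)
for c_n in (0, 1, 7, 1023, 2047):
    t_n = (c_n * k) % beta            # value transmitted by the modified encoder
    assert (t_n * Q_n) % beta == c_n  # the modified decoder recovers c_n exactly
\end{lstlisting}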

Therefore, not only can any possible value be obtained at the decoder, but there are ways to effectively transmit any desired value through the encoding and decoding chain. To create a ROI in an encoded file using this modified codec, all we have to do is calculate the appropriate $t_n$ for the coefficients of the blocks where we want enhanced image quality. In the rest of the image we can simply use the value given by the original quantizer.

This means that the image does not carry information about where the ROI is located, placing this method within a family of similar codecs that do not receive this information explicitly. The decoder can nevertheless try to detect these regions, by checking whether the received $t_n$ values are the ones expected for the corresponding $Q_n$.
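
One possible sketch of this detection heuristic, in Python and assuming the baseline coefficient range, is the following (the function name and interface are hypothetical):
\begin{lstlisting}
import math

def looks_like_roi(t, Q, c_min=-1024, c_max=1023):
    # Flag a block whose transmitted values fall outside the range that the
    # standard quantizer could have produced for each Q_n.
    for t_n, Q_n in zip(t, Q):
        t_low = math.floor(c_min / Q_n + 0.5)
        t_high = math.floor(c_max / Q_n + 0.5)
        if not t_low <= t_n <= t_high:
            return True
    return False
\end{lstlisting}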

\subsection{Summary of the new encoder and decoder algorithms}
The encoder of our proposed codec works almost exactly as in the JPEG standard in regions of the image where higher quality is not desired. One modification is needed: the encoder must avoid transmitting quantized values that, when multiplied by the quantization factor after rounding, would fall outside the allowed range of the coefficients. A form of pre-emptive limiting is therefore needed, because there will be no clipping of values at the decoder.

In image blocks where higher quality is desired, the encoder must calculate, for each value $c_n$ of the DCT coefficients, its product with the multiplicative inverse of the quantization factor $Q_n$ of that coefficient. This product must be calculated with modular arithmetic, and it helps to first translate the range of possible values of $c_n$ to non-negative numbers, and then back again. The base of these modular operations is $\beta=2048$ for the baseline JPEG case, because the precision of the DCT coefficients is $11$ bits.

At the decoder there is a multiplication by $Q_n$ of the quantized values $t_n$, which are the values read from the JPEG file. In the standard decoder the resulting product must be clipped to make it conform to the limited range of the DCT coefficients. In the modified decoder, instead of clipping this value we calculate the remainder of its division by $\beta$. This remainder must be expressed in the less usual signed form, spanning negative and positive numbers, and can be easily calculated by translating the value to bring the problem into the domain of non-negative numbers and then moving back again, as done in the encoder.
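
These two summaries can be condensed into the following Python sketch of the modified quantizer and dequantizer used in the ROI blocks (the names are illustrative; \lstinline{pow(Q_n, -1, beta)} requires Python~3.8 or later):
\begin{lstlisting}
beta, offset = 2048, 1024          # baseline case: 11-bit DCT coefficients

def signed_mod(value):
    # Remainder mapped back to the signed range from -1024 to 1023.
    return (value + offset) % beta - offset

def roi_quantize(c_n, k):
    # Modified encoder: multiply by the inverse k of Q_n, modulo beta.
    return signed_mod(c_n * k)

def roi_dequantize(t_n, Q_n):
    # Modified decoder: multiply by Q_n and let the product overflow.
    return signed_mod(t_n * Q_n)

Q_n = 5
k = pow(Q_n, -1, beta)             # multiplicative inverse of Q_n modulo beta
for c_n in (-1024, -3, 0, 7, 1023):
    assert roi_dequantize(roi_quantize(c_n, k), Q_n) == c_n
\end{lstlisting}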

\subsection{Implementation}
Algorithm~\ref{alg:codec} presents a draft of the proposed encoder and decoder algorithms, written in the Python language. Lines 15, 29 and 31 are the lines where the quantization and dequantization happen in the original codec. The only modifications performed on the original program to generate this one are the introduction of lines 12, 13 and 30, and the creation in the encoder of the control mechanism that checks whether each block is inside the ROI or not, and applies either the classic or the new method at the quantization. Note that the clipping operation at line 31 could be removed, because the modulus at line 30 guarantees valid values. The effect of the clipping will be negligible, though, so it can be maintained as long as the operations happen in that order. While in theory the modulus replaces the clipping, when implementing the technique by modifying an existing program it is easier just to perform the modulus right after the multiplication, and leave the lines that perform the clipping where they are.

\begin{algorithm}
%\footnotesize
\lstinputlisting{algo.py}
\caption{Description of the proposed codec in Python language.}
\label{alg:codec}
\end{algorithm}

The encoder and decoder are pipelines with corresponding inverse stages, as shown in Figure~\ref{fig:block}. The input to the \lstinline{encode} function is a raster image that is split into blocks of 8$\times$8 pixels. Each block is then iteratively processed by the \lstinline{encode_block} function. This function begins by performing the DCT of the pixel values, originating the vector $c$ of coefficient values, which is quantized by the \lstinline{quantize} function to create the vector $t$ of values to be transmitted. The list \lstinline{symbols} contains the $t$ vector of each block in the image, and this sequence of all transmitted values is processed by the \lstinline{entropy_encode} function to create the final bitstream that is stored in a file. The function \lstinline{is_roi} sets a flag telling \lstinline{quantize} whether the block is inside the region of interest and should be encoded with the proposed algorithm, or with the original one.

To decode the file, the bitstream is input to the \lstinline{decode} function. This function recovers the $t$ vector of each block of the image, and applies the \lstinline{decode_block} function to them iteratively. This function calls the \lstinline{dequantize} function to produce a $c$ vector and creates image blocks by applying the inverse DCT to its values. This list of blocks is finally used to assemble the decoded image, which is returned.

The modifications performed are in the \lstinline{quantize} and \lstinline{dequantize} functions. The original quantization is the case where \lstinline{is_roi} is false, and there is a simple division of each coefficient value $c_n$ by the corresponding $Q_n$ value, read from the input vectors $c$ and $Q$. The original dequantization receives $t$ and $Q$, and simply performs the multiplication corresponding to the quantization division, followed by a clipping operation (function \lstinline{clip}) to ensure that the values of $c$ are inside the valid range.

In the proposed codec, the division in the \lstinline{quantize} function is replaced, in the selected blocks, by a modular multiplication by the appropriate $k$ factor, calculated for each $Q_n$ using the \lstinline{find_multiplicative_inverse} function. The dequantization in the proposed decoder starts with a multiplication just as in the original algorithm. The difference is that in the next step this calculated product is input to the \lstinline{mod} function with the appropriate base, instead of being clipped. For most coefficient values calculated with the original quantization algorithm this modification makes no difference, and the few boundary cases where it does can be circumvented in a more sophisticated implementation.


We successfully produced the codec programs using our technique, making small modifications to the IJG library~\citep{libJPEG6}. A new version of this library has been released~\citep{libJPEG7}, but the modifications described here still apply. First we modified the macro called {\tt DEQUANTIZE} in the file {\tt jidctint.c}. This macro contains the line where the multiplication between an input coefficient value and the corresponding quantization factor happens. It was replaced by a function that implements the overflow behavior, calculating the remainder of the division of the product by the base. To handle negative values, an offset equal to the magnitude of the lower limit is added to the value before this modulus calculation and subtracted right afterwards.

In the encoder, we first had to implement the Extended Euclidean Algorithm.  This routine is called for each value of $Q_n$ to find the corresponding multiplicative inverse $k$. We then modified the function {\tt forward\_DCT} in the file {\tt jcdctmgr.c}: in the blocks inside the region of interest, where full image quality is desired, the value of $c_nk$ is sent to the next stage instead of the original $t_n$.

%%
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results}
\label{sec:res}
Figure~\ref{fig:solid} shows the proposed decoder working. The first image has 256 blocks of the same size used in the JPEG segmentation, $8\times8$ pixels. The blocks have solid color, covering all 256 possible gray intensities, and the image was encoded with $Q_n=1$. The file was then manually corrupted to make the DC level quantization factor $Q_{00}=3$.\footnote{Notice we use the $n$ index in $Q_n$ as a short notation for the row and column in the $\mathbf{Q}$ matrix. It may also mean the position in the vector obtained by stacking the elements of $\mathbf{Q}$ as is done in JPEG, using the so-called zig-zag sequence.}  The original program outputs Figure~\ref{fig:block_b} for the corrupted file, where clipping occurred in the top and bottom sections, and in the middle there is a scaling of the original intensity values by a factor of $3$. This image has only 87 different pixel intensity levels.

Figure~\ref{fig:block_c} was decoded by our program. Its blocks have all 256 possible intensities, although in a different order from the non-corrupted file. In the middle section this image and the previous one are equal, but in the top and bottom sections of Figure~\ref{fig:block_c} the modulus in the proposed decoder translates the dequantized values into the allowed range, creating two other intensity gradients with values that are skipped in the other sections of the image. Figure~\ref{fig:mod16} can be useful to understand the properties of these images.


\begin{figure}
 \subfigure[]{\includegraphics[width=27mm]{solid100.png}\label{fig:block_a}}~
 \subfigure[]{\includegraphics[width=27mm]{solid-det3.png}\label{fig:block_b}}~
 \subfigure[]{\includegraphics[width=27mm]{solid-det3-nic.png}\label{fig:block_c}}
\caption{Image~\subref{fig:block_a} is the original image with 256 blocks. Images~\subref{fig:block_b} and~\subref{fig:block_c} are the decodings of the corrupted file by the original decoder and by the proposed decoder, respectively.}
\label{fig:solid}
\end{figure}
%\begin{figure*}
%       \includegraphics[width=56mm]{figures/solid100.eps}~
%       \includegraphics[width=56mm]{figures/solid-det3.eps}~
%       \includegraphics[width=56mm]{figures/solid-det3-nic.eps}
%\caption{Original image with 256 blocks (left), and decodification of the corrupted file by the original decoder (center), and by ours (right).}
%\label{fig:solid}
%\end{figure*}

All images presented in Figure~\ref{fig:lena} are decodings of JFIF files that share the same quantization table. In Figure~\ref{fig:lena_a} we used our technique over the whole image. In Figure~\ref{fig:lena_b} only 42 blocks over Lena's face were kept with full precision. In Figure~\ref{fig:lena_d} the image does not have any full-quality areas, and is equivalent to an image encoded by the original system.

\begin{figure}
       \subfigure[]{\includegraphics[width=40mm]{lena_full.png}\label{fig:lena_a}}~
       \subfigure[]{\includegraphics[width=40mm]{lena_new.png}\label{fig:lena_b}}\\[1mm]
       \subfigure[]{\includegraphics[width=40mm]{lena_back.png}\label{fig:lena_c}}~
       \subfigure[]{\includegraphics[width=40mm]{lena_old.png}\label{fig:lena_d}}
\caption{Image \subref{fig:lena_a} uses our technique in the whole area. Image \subref{fig:lena_b} has a ROI in the middle, and \subref{fig:lena_d} does not have
 a ROI. Image \subref{fig:lena_c} is the result of opening \subref{fig:lena_b} with
 a standard decoder. They all have the same quantization table. }
\label{fig:lena}
\end{figure}

%\begin{figure*}
%       \includegraphics[width=56mm]{figures/lena_full.eps}~
%       \includegraphics[width=56mm]{figures/lena_new.eps}~
%       \includegraphics[width=56mm]{figures/lena_old.eps}
%\caption{First image uses our technique in the whole area. The second image has a ROI in the middle, and the last image does not have a ROI.}
%\label{fig:lena}
%\end{figure*}

Figure~\ref{fig:lena_c} shows what happens if an image encoded with this technique is opened by a standard decoder. The ROI where we wanted higher quality will most probably be rendered as unrecognizable garbage, making the method only partially backward-compatible.  Forward compatibility is not complete only because, when extreme values are decoded in the original system, we sometimes obtain coefficients slightly outside the allowed range, which should be clipped. In our decoder, such values will be mapped to others that are probably very distant from the original coefficient values.

Next we present an analysis of the impact of the proposed method on file size. We performed the test with 102 similar photographic images produced by professionals and available from a commercial website. They had about $10^5$ pixels each and a $2/3$ aspect ratio. The images were encoded with different encoders and settings, and the resulting file sizes relative to the original sizes are presented in Table~\ref{tab:sizes}. The error estimates are $2.5$ times the standard deviation.

\begin{table}[H]
\centering
\footnotesize
\begin{tabular}{rll}
\hline\noalign{\smallskip} \multicolumn{1}{r}{Encoder}& Standard & Optimized \\
%
\hline\noalign{\smallskip}
Original JPEG (full quality) &
$ 60.5\% \pm  24.9\%$ & $ 57.4\% \pm  20.7\%$ \\
Original JPEG (low quality) &
$ 10.5\% \pm   6.5\%$ & $ 10.1\% \pm   6.5\%$ \\
Proposal &
$198.2\% \pm  81.6\%$ & $ 94.6\% \pm  34.4\%$ \\
\hline
\end{tabular}
\caption{Size of the files produced by different encoders and parameters.}
\label{tab:sizes}
\end{table}


The original encoder was first used with a quantization table with all $Q_n=1$, generating the files in the first line of Table~\ref{tab:sizes}. Then it was used with a quantization table tuned for high compression and low quality, generating the results in the second line of the table. This quantization table was obtained with slight modifications from the one created by the original encoder program with a quality parameter of $25$. Finally the images were encoded using this same quantization table but this time applying our method over the whole area, generating the results in the third line of the table. All these encodings were also performed both with the standard and with optimized Huffman tables at the entropy encoding stage, and each case corresponds to a column of Table~\ref{tab:sizes}.

While the images encoded with the proposed method use a low-quality quantization table, they are just as good as the images encoded without quantization. While the quality is the same, the application of our method resulted in a significant increase in file size. This should not pose a problem for the application of the method to images with small ROIs, though. But it is a phenomenon that deserves further analysis. Another interesting related fact is that the use of the standard table for the Huffman coding with the proposed technique resulted in very large files, but the creation of optimized tables worked very well to reduce this effect, while having little effect in the tests with the original encoder. This can be seen by comparing the two columns of each line in Table~\ref{tab:sizes}. The use of optimized codes can reduce the resulting file sizes by half only when the proposed technique has been used; otherwise the gain is very small.

To understand why the files get so large when the proposed technique is used, and why the default Huffman code is far from optimal in this case, we must look at the probability density function (PDF) of the values transmitted by each method.  Figure~\ref{fig:hist} presents a measurement of this PDF for the files encoded by the original algorithm with and without quantization. Figure~\ref{fig:hist2} shows the PDF of the original encoder without quantization and also the PDF that results from the proposed technique when using the low-quality quantization table. The curves were obtained by counting the number of values in intervals of $16$ units. The graph also presents approximate models for two of the histograms: a uniform distribution and a Laplacian.



\begin{figure}
\centering
%  \includegraphics[width=84mm]{histjpeg.pdf}
 \includegraphics[width=90mm]{histjpeg.pdf}
\caption{Normalized histograms of the transmitted values in the original JPEG codec without quantization ($Q_n=1$) and with a typical quantization table. The dotted curve shows one approximate formula for this probability distribution.}
\label{fig:hist}
\end{figure}

\begin{figure}
\centering
 \includegraphics[width=90mm]{histnq.pdf}
\caption{Normalized histogram of the transmitted values.}
\label{fig:hist2}
\end{figure}


%std 0,0010926329457041203
%mean 0,00048828125000000011


In the original JPEG codec the distribution maintains the natural Laplacian shape of the AC components~\citep{844179}. Low-quality compression has a similar curve, except that it is scaled down, with reduced variance. With our method the PDF approaches the uniform distribution covering the whole range of possible values. This resulting distribution has a larger variance and entropy than in the original encoder. That is the reason for the increase in file size, and why the default Huffman table, targeted at a peaked distribution, performs so badly with our technique. The PDF of the values in the original encoder has a shape that can benefit from the entropy encoding. Our method maps the values in such a way that this property is lost, creating an approximately uniform PDF that is very different from the one assumed by the standard coding table, and cannot benefit much from a subsequent entropy encoding at all.
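
The argument can be illustrated numerically: the Shannon entropy of a distribution that is nearly uniform over the $2048$ possible transmitted values is close to the maximum of $11$ bits per symbol, while a peaked, Laplacian-like distribution has a much smaller entropy. The Python sketch below uses synthetic distributions for this comparison; it does not reproduce the measured histograms.
\begin{lstlisting}
import math

def entropy_bits(p):
    # Shannon entropy of a discrete distribution, in bits per symbol.
    return -sum(x * math.log2(x) for x in p if x > 0)

n = 2048
uniform = [1.0 / n] * n                # roughly the proposed method
scale = 8.0                            # assumed width of the Laplacian model
laplacian = [math.exp(-abs(v - n // 2) / scale) for v in range(n)]
total = sum(laplacian)
laplacian = [x / total for x in laplacian]   # roughly the original codec

print(entropy_bits(uniform))           # 11.0 bits per symbol
print(entropy_bits(laplacian))         # considerably fewer bits per symbol
\end{lstlisting}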





\section{Conclusion}
\label{sec:conc}
This article describes a method to make quantization-based codecs more flexible, which is especially suited to creating regions of interest (ROI) where the coding quality is higher than in the rest of the encoded signal. It is based on replacing the clipping function that is usually applied after the dequantization step with a modulus operation. This modification can be exploited to allow the transmission of arbitrary coefficient values. The encoder only has to multiply the desired value by the multiplicative inverse of the quantization factor in the base of the modulus operation. An application of this method to the JPEG image codec was implemented and tested.

The best quality JPEG can offer is obtained when no quantization is performed, with all quantization factors set to $1$. Our method can be applied to create images with the same high quality, but using typical quantization tables. While the resulting quality is the same, the files created with the proposed encoder are larger than those created with the original one. The explanation of this effect requires a detailed understanding of how JPEG works. Usually the reduction of the number of possible DCT coefficient levels is regarded as the main cause for the decrease in file size. But the proposed encoder produced large files because the PDF of the values it transmits has a larger entropy than in the original encoder. So the quantization step must be seen not just as reducing the number of possible values and the entropy of each individual DCT coefficient, but as shaping the overall PDF and determining its entropy. By shaping the PDF appropriately the quantizer can make the subsequent entropy encoding more or less efficient. Sophisticated quantization strategies might be devised to produce optimal results.

The method presented here can be used to modify any coding technique based on quantization to implement ROI encoding. Because it can achieve sub-optimal compression levels in some cases, it is only advisable for applications where the selected regions are not very large. What is attractive in the technique is the ease of implementation by modification of an existing codec. It was carried out through small modifications to the standard JPEG encoding and decoding procedures, and the data structures stored in the intermediate file are exactly the same ones found in a standard JPEG file. This contrasts with many other techniques for ROI encoding where supplemental data must be stored and the transcoding pipeline must be extended with new processing blocks.

One of the possibilities we would like to study next is creating not only areas of high quality, but also areas of intermediate quality. We would also like to see the technique applied to other real-life quantization-based transcoding systems that might benefit from it.

%-------------------------------------------------------------------------

























%% The Appendices part is started with the command \appendix;
%% appendix sections are then done as normal sections
%% \appendix

%% \section{}
%% \label{}





%\begin{thebibliography}{00}

\bibliographystyle{elsarticle-num-names}
\bibliography{roi_jpeg}

%\end{thebibliography}

\end{document}
\endinput
%%
%% End of file `elsarticle-template-num.tex'.




%% Local Variables:
%% mode: latex
%% mode: longlines
%% fill-column:84
%% default-justification: left
%% end: