NAME
Catalyst::Plugin::Params::Demoronize - convert common UTF-8 and
Windows-1252 characters to their ASCII equivalents
SYNOPSIS
# Be sure to load the Unicode plugin if you want to handle Unicode
# replacement.
use Catalyst qw(Unicode Params::Demoronize);
# Optionally enable replacement of common Unicode "smart" characters.
MyApp->config->{demoronize} = { replace_unicode => 1 };
DESCRIPTION
To borrow a few passages from the documentation packaged with John
Walker's demoronizer.pl:
...as is usually the case when you encounter something shoddy in the
vicinity of a computer, Microsoft incompetence and gratuitous
incompatibility were to blame. Western language HTML documents are
written in the ISO 8859-1 Latin-1 character set, with a specified
set of escapes for special characters. Blithely ignoring this
prescription, as usual, Microsoft use their own "extension" to
Latin-1, in which a variety of characters which do not appear in
Latin-1 are inserted in the range 0x82 through 0x95--this having the
merit of being incompatible with both Latin-1 and Unicode, which
reserve this region for additional control characters.
These characters include open and close single and double quotes, em
and en dashes, an ellipsis and a variety of other things you've been
dying for, such as a capital Y umlaut and a florin symbol. Well,
okay, you say, if Microsoft want to have their own little
incompatible character set, why not? Because it doesn't stop
there--in their inimitable fashion (who would want to?)--they
aggressively pollute the Web pages of unknowing and innocent victims
worldwide with these characters, with the result that the owners of
these pages look like semi-literate morons when their pages are
viewed on non-Microsoft platforms (or on Microsoft platforms, for
that matter, if the user has selected as the browser's font one of
the many TrueType fonts which do not include the incompatible
Microsoft characters).
You see, "state of the art" Microsoft Office applications sport a
nifty feature called "smart quotes." (Rule of thumb--every time
Microsoft use the word "smart," be on the lookout for something
dumb). This feature is on by default in both Word and PowerPoint,
and can be disabled only by finding the little box buried among the
dozens of bewildering option panels these products contain. If
enabled, and you type the string,
"Halt," he cried, "this is the police!"
"smart quotes" transforms the ASCII quote characters automatically
into the incompatible Microsoft opening and closing quotes. ASCII
single and double quotes are similarly transformed (even though
ASCII already contains apostrophe and single open quote characters),
and double hyphens are replaced by the incompatible em dash symbol.
What other horrors occur, I know not. If the user notices this
happening at all, their reaction might be "Thank you Billy-boy--that
looks ever so much nicer," not knowing they've been set up to look
like a moron to folks all over the world.
These characters are commonly inserted into form elements via cut and
paste operations. In many cases, the browser converts them to UTF-8.
This plugin replaces both the Unicode characters and the Windows-1252
characters with sane ASCII equivalents.
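The kind of replacement described above can be sketched roughly as
follows. This is an illustration only, not the plugin's actual code: the
map contents and the name demoronize_sketch are assumptions made for the
example, and the real table may differ.

```perl
use strict;
use warnings;

# Hypothetical map for illustration; NOT the plugin's actual table.
# Keys are Windows-1252 code points in the 0x80-0x9F range.
my %cp1252_to_ascii = (
    "\x{85}" => '...',   # ellipsis
    "\x{91}" => "'",     # opening single quote
    "\x{92}" => "'",     # closing single quote
    "\x{93}" => '"',     # opening double quote
    "\x{94}" => '"',     # closing double quote
    "\x{96}" => '-',     # en dash
    "\x{97}" => '--',    # em dash
);

# Hypothetical helper: replace mapped characters, leave the rest alone.
sub demoronize_sketch {
    my ($text) = @_;
    return $text unless defined $text;
    $text =~ s/([\x{80}-\x{9F}])/exists $cp1252_to_ascii{$1} ? $cp1252_to_ascii{$1} : $1/ge;
    return $text;
}

print demoronize_sketch("\x{93}Halt,\x{94} he cried\x{97}loudly"), "\n";
```

Characters in that range with no entry in the map pass through
unchanged, so an incomplete table never deletes input.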
UNICODE
Demoronize assumes that you are using Catalyst::Plugin::Unicode to
convert incoming parameters into Unicode characters. If you enable the
optional "replace_unicode" flag without that plugin, the Unicode
replacement may not behave as expected.
CONFIG
replace_unicode
If this flag is enabled (it is off by default) then commonly substituted
Unicode characters will be converted to their ASCII equivalents.
replace_map
A map of Unicode characters and their ASCII equivalents that will be
swapped. This can be overridden, but defaults to:
METHODS
prepare_parameters
Converts incoming request parameters, replacing the characters described
above with their ASCII equivalents.
AUTHOR
Mike Eldridge
CONTRIBUTORS
* Cory Watson
* Chisel Wright
* Michele Beltrame