git clone git://bitreich.org/sacc/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65…
---
commit edab539b23594219bbfc83729822da917a18a243
parent c416c8c73d0a33eb8c428b1a9b9eaaffc098ee5b
Author: Hiltjo Posthuma <[email protected]>
Date: Tue, 5 Jan 2021 21:21:03 +0100

mbsprint: improve printing output when it has invalid UTF data

Reset the decode state when mbtowc returns -1. The OpenBSD mbtowc(3)
man page says: "If a call to mbtowc() resulted in an undefined internal
state, mbtowc() must be called with s set to NULL to reset the internal
state before it can safely be used again."

Print the UTF replacement character (codepoint 0xfffd) for the invalid
codepoint or incomplete sequence and continue printing the line
(instead of stopping).

Remove the 0 return code as it can't happen because we're already
checking the string length in the loop.

Diffstat:
M sacc.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
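
The commit message names the replacement character by codepoint (0xfffd), while the code has to write it as UTF-8 bytes. As a rough illustration of how the two relate, here is a minimal encoder sketch. The name utf8_encode is hypothetical and is not part of sacc; it only assumes the standard UTF-8 encoding rules.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of sacc: encode a Unicode code point as
 * UTF-8 into buf (which must hold at least 4 bytes). Returns the number
 * of bytes written, or 0 if cp is a surrogate or out of range.
 */
size_t
utf8_encode(unsigned long cp, unsigned char *buf)
{
	if (cp < 0x80) {
		buf[0] = cp;
		return 1;
	}
	if (cp < 0x800) {
		buf[0] = 0xc0 | (cp >> 6);
		buf[1] = 0x80 | (cp & 0x3f);
		return 2;
	}
	if (cp < 0x10000) {
		if (cp >= 0xd800 && cp <= 0xdfff)
			return 0; /* surrogates cannot be encoded */
		buf[0] = 0xe0 | (cp >> 12);
		buf[1] = 0x80 | ((cp >> 6) & 0x3f);
		buf[2] = 0x80 | (cp & 0x3f);
		return 3;
	}
	if (cp < 0x110000) {
		buf[0] = 0xf0 | (cp >> 18);
		buf[1] = 0x80 | ((cp >> 12) & 0x3f);
		buf[2] = 0x80 | ((cp >> 6) & 0x3f);
		buf[3] = 0x80 | (cp & 0x3f);
		return 4;
	}
	return 0;
}
```

For 0xfffd this yields the three bytes ef bf bd, and for 0x2026 (horizontal ellipsis) it yields e2 80 a6. The longest encoding is 4 bytes, which is consistent with the 4-byte cap passed to mbtowc in mbsprint.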
---
diff --git a/sacc.c b/sacc.c
@@ -110,12 +110,18 @@ mbsprint(const char *s, size_t len)
 	slen = strlen(s);
 	for (i = 0; i < slen; i += rl) {
-		if ((rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4)) <= …
-			break;
+		rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
+		if (rl == -1) {
+			mbtowc(NULL, NULL, 0); /* reset state */
+			fputs("\xef\xbf\xbd", stdout); /* replacement characte…
+			col++;
+			rl = 1;
+			continue;
+		}
 		if ((w = wcwidth(wc)) == -1)
 			continue;
 		if (col + w > len || (col + w == len && s[i + rl])) {
-			fputs("\xe2\x80\xa6", stdout);
+			fputs("\xe2\x80\xa6", stdout); /* ellipsis */
 			col++;
 			break;
 		}