fix unicode glitch in DCS strings, patch by Tim Allen - st - simple terminal
git clone git://git.suckless.org/st
---
commit 818ec746f4caae453d09368b101c3e841cf39870
parent 9ba7ecf7b15ec2986c6142036706aa353b249ef9
Author: Hiltjo Posthuma <[email protected]>
Date: Wed, 17 Jun 2020 21:35:39 +0200

fix unicode glitch in DCS strings, patch by Tim Allen

Reported on the mailing list:

"
I discovered recently that if an application running inside st tries to
send a DCS string, subsequent Unicode characters get messed up. For
example, consider the following test-case:

    printf '\303\277\033P\033\\\303\277'

...where:

  - \303\277 is the UTF-8 encoding of U+00FF LATIN SMALL LETTER Y WITH
    DIAERESIS (ÿ).
  - \033P is ESC P, the token that begins a DCS string.
  - \033\\ is ESC \, a token that ends a DCS string.
  - \303\277 is the same ÿ character again.

If I run the above command in a VTE-based terminal, or xterm, or
QTerminal, or pterm (PuTTY), I get the output:

    ÿÿ

...which is to say, the empty DCS string is ignored. However, if I run
that command inside st (as of commit 9ba7ecf), I get:

    ÿÃ¿

...where those last two characters are \303\277 interpreted as ISO8859-1
characters, instead of UTF-8.

I spent some time tracing through the state machines in st.c, and so far
as I can tell, this is how it works currently:

  - ESC P sets the "ESC_DCS" and "ESC_STR" flags, indicating that
    incoming bytes should be collected into the strescseq buffer, rather
    than being interpreted.
  - ESC \ sets the "ESC_STR_END" flag (when ESC is received), and then
    calls strhandle() (when \ is received) to interpret the collected
    bytes.
  - If the collected bytes begin with 'P' (i.e. if this was a DCS
    string) strhandle() sets the "ESC_DCS" flag again, confusing the
    state machine.

If my understanding is correct, fixing the problem should be as easy as
removing the line that sets ESC_DCS from strhandle():

diff --git a/st.c b/st.c
index ef8abd5..b5b805a 100644
--- a/st.c
+++ b/st.c
@@ -1897,7 +1897,6 @@ strhandle(void)
 		xsettitle(strescseq.args[0]);
 		return;
 	case 'P': /* DCS -- Device Control String */
-		term.mode |= ESC_DCS;
 	case '_': /* APC -- Application Program Command */
 	case '^': /* PM -- Privacy Message */
 		return;

I've tried the above patch and it fixes my problem, but I don't know if
it introduces any others.
"
Diffstat:
 M st.c | 1 -
1 file changed, 0 insertions(+), 1 deletion(-)
---
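
The byte sequence in the reported test-case, and the two ways of reading its final two bytes, can be sketched in C. This is an illustration only: `testcase`, `decode_utf8_2` and `decode_latin1` are hypothetical names, not functions from st.c, and the UTF-8 helper handles only two-byte sequences and does not reject overlong forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes emitted by: printf '\303\277\033P\033\\\303\277' */
static const unsigned char testcase[] = {
	0xC3, 0xBF,  /* U+00FF encoded as UTF-8 */
	0x1B, 'P',   /* ESC P: begins a DCS string */
	0x1B, '\\',  /* ESC \: ends the string */
	0xC3, 0xBF   /* U+00FF encoded as UTF-8 again */
};

/*
 * Decode one two-byte UTF-8 sequence (hypothetical helper).
 * Returns 0 if the bytes do not have the 110xxxxx 10xxxxxx shape.
 */
static uint32_t
decode_utf8_2(const unsigned char *s)
{
	if ((s[0] & 0xE0) != 0xC0 || (s[1] & 0xC0) != 0x80)
		return 0;
	return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
}

/* In ISO8859-1 every byte is its own code point (hypothetical helper). */
static uint32_t
decode_latin1(unsigned char b)
{
	return b;
}
```

Read as UTF-8, the last two bytes decode to the single code point U+00FF (ÿ). Read byte by byte as ISO8859-1, they are U+00C3 (Ã) and U+00BF (¿), which matches the output the report describes for st before the fix.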
diff --git a/st.c b/st.c
@@ -1897,7 +1897,6 @@ strhandle(void)
 		xsettitle(strescseq.args[0]);
 		return;
 	case 'P': /* DCS -- Device Control String */
-		term.mode |= ESC_DCS;
 	case '_': /* APC -- Application Program Command */
 	case '^': /* PM -- Privacy Message */
 		return;
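
The effect of the removed line can be illustrated with a toy model of the three steps described in the report. This is a sketch, not st's code: the `struct toy`, the `T_ESC_*` flag values, and the helpers `toy_strhandle`, `toy_feed` and `toy_run` are all hypothetical. It assumes, in line with the reported symptom, that a flag left set after the string has ended is what makes later bytes get handled one at a time instead of going through UTF-8 decoding. The `buggy` parameter stands in for the presence of the removed line.

```c
#include <stddef.h>

/* Hypothetical flag values; they are not copied from st.c. */
enum { T_ESC_START = 1, T_ESC_STR = 2, T_ESC_STR_END = 4, T_ESC_DCS = 8 };

struct toy {
	int esc;        /* escape-sequence state flags */
	int leftover;   /* flags left set by the string handler */
	char first;     /* byte that introduced the string, 'P' for DCS */
	int raw_bytes;  /* printable bytes handled without UTF-8 decoding */
	int utf8_bytes; /* printable bytes handed to UTF-8 decoding */
};

/* Models the string handler: with buggy != 0 it sets the DCS flag again. */
static void
toy_strhandle(struct toy *t, int buggy)
{
	if (t->first == 'P' && buggy)
		t->leftover |= T_ESC_DCS;
}

static void
toy_feed(struct toy *t, unsigned char c, int buggy)
{
	if (t->esc & T_ESC_STR_END) {      /* previous byte was ESC inside a string */
		t->esc = 0;
		if (c == '\\')
			toy_strhandle(t, buggy);
		return;
	}
	if (t->esc & T_ESC_STR) {          /* collecting string bytes */
		if (c == 0x1B) {
			t->esc &= ~(T_ESC_STR | T_ESC_DCS);
			t->esc |= T_ESC_STR_END;
		}
		return;                        /* other bytes would be buffered here */
	}
	if (t->esc & T_ESC_START) {        /* byte following a bare ESC */
		t->esc = 0;
		if (c == 'P') {
			t->esc = T_ESC_STR | T_ESC_DCS;
			t->first = 'P';
		}
		return;
	}
	if (c == 0x1B) {
		t->esc = T_ESC_START;
		return;
	}
	if (t->leftover & T_ESC_DCS)       /* assumed consequence of the stale flag */
		t->raw_bytes++;
	else
		t->utf8_bytes++;
}

/* Feed a whole buffer through a fresh toy terminal and return its state. */
static struct toy
toy_run(const unsigned char *s, size_t n, int buggy)
{
	struct toy t = {0};
	size_t i;

	for (i = 0; i < n; i++)
		toy_feed(&t, s[i], buggy);
	return t;
}
```

Feeding the eight bytes of the reported test-case through `toy_run` with `buggy` set leaves the two bytes after the DCS string counted as raw; with `buggy` clear all four printable bytes go to UTF-8 decoding. Note that the removed line in the diff ORs `ESC_DCS` into `term.mode`; the sketch models that with the separate `leftover` field rather than reusing `esc`.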