=================
No tech support
=================

A few days ago I wished to play with computer networks. Now I almost
wish I did not have to (though I am not exactly getting to play with
them anyway: just having issues with others' networks).

Yesterday I noticed an issue with calls, and tried to debug it today:
oddly, an XMPP client kept failing to set up a call and reconnecting
afterwards. Then I noticed that some messages were getting stuck after
that, too. Then noticed that no messages went to the server when it
happened, though they did come back. It is the same with different
clients and devices: everything gets stuck upon sending a large packet
(with all the transport candidates). Then I discovered that packets
(either TCP or ICMP) of 1487 bytes or more don't make it to the
Hetzner networks (neither this server nor hetzner.com) from me
(connected via Rostelecom), though the expected MTU is 1500.
Apparently it requires rooting to set a lower MTU on an Android phone,
and that would be an awkward workaround anyway. It also works fine
from other ISPs (work servers, mobile network operator).
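
For illustration, the search for the largest packet size that still
gets through can be sketched in Python. This is only a minimal sketch
under stated assumptions, not the exact procedure used above: the
function names are made up, ``probe`` stands for any check that sends
one packet of a given total size with the DF bit set (for instance, by
running iputils ``ping -M do -s <payload>``), and IPv4 without options
is assumed, so that an ICMP echo of N bytes on the wire carries N - 28
bytes of payload.

```python
from typing import Callable

IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_size(ip_packet_size: int) -> int:
    """Payload size (as given to ``ping -s``) for a total IPv4 packet size."""
    return ip_packet_size - IPV4_HEADER - ICMP_HEADER


def largest_passing_size(probe: Callable[[int], bool],
                         low: int = 576, high: int = 1500) -> int:
    """Binary search for the largest packet size for which probe() is true.

    Assumes that probe(low) succeeds and that every size above the
    limit fails (i.e., success is monotonic in the packet size).
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid gets through, look higher
        else:
            high = mid - 1  # mid is lost, look lower
    return low


if __name__ == "__main__":
    # Stand-in for a real probe; it mimics the behaviour described
    # above, where packets of 1487 bytes or more are lost.
    def fake_probe(size: int) -> bool:
        return size <= 1486

    best = largest_passing_size(fake_probe)
    print(best, ping_payload_size(best))
```

With the stand-in probe this finds 1486 bytes, which corresponds to a
ping payload of 1458 bytes; a real probe would have to tolerate
occasional unrelated packet loss, e.g. by retrying a few times.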

Without really hoping to get help, I decided at least to try to
contact Rostelecom's tech support, thinking that it wouldn't harm, and
is perhaps the right thing to do in this situation. Sent all the ping
and traceroute data at once, described the issue, mentioned that it is
the same with TCP, bypassed the chat bot and reached a human. The
human refused to help if the computer from which I checked it is
connected via a home router (hence the title). When I asked whether
they could at least check whether large packets get to Hetzner from
the Rostelecom network, they said that they can't ping "on my behalf".

I tried it without a router then, even though it is silly (perhaps I
should have just lied), and attempted again to get it
investigated/fixed, though I was even less hopeful than before
contacting the support. They asked whether I had access to resources
despite the issues with packet sending; I said that I have trouble
accessing services there because of the lost packets. And they finally
mentioned that there were issues with other servers today, but that
everything is fine on their end, and that I should contact the
resource owners.

It is like that almost every time I try to contact the support,
Rostelecom's or other local ones. Ugh, I can't save this file now,
since I tried to edit it in Emacs, remotely, and it is more than 1500
bytes. Will try to paste it via SSH instead. Tried now, failed to;
will have to paste slowly. Though the SSH connection I had is stuck
now, and I am failing to open a new one. Did ``sudo ip link set enp6s0
mtu 1486`` now, but it is awkward and wrong, and still does not fix
calls on the phone.

Maybe I will finally try to switch ISPs yet again once I run out of
money on the balance with this one. Though they all seem to be
awful like that, as are mobile network operators. And right now I am
not quite certain whether the issue is with Rostelecom or farther on
the path (dataix.eu, Hetzner itself, somewhere between those; though
Rostelecom's AS12389 is peering with Hetzner's AS24940, and traceroute
from me goes through both of those, so there are not many
organizations in the chain).

Update: the same thing happens with wikipedia.org
(91.198.174.192). Image loading seemed quite laggy from Wikipedia in
the past week or so, possibly that's related. I only hope it is
related to technical issues, and not to censorship and intentional
blocking (of which there is a lot these days). And maybe in the worst
case (if things don't get fixed), and before changing ISPs, I could
try setting the MTU on the router, to set it everywhere via DHCP (if
Android reads that).

Update 2: ended up setting "26,1486" in the DHCP server options on the
router for now, so that the phone can use it (and apparently the phone
does indeed use the MTU advertised via DHCP).
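
For reference, 26 is the code of the "Interface MTU" DHCP option from
RFC 2132, which carries the MTU as a 16-bit unsigned integer (at least
68). The following Python snippet is only a sketch of how such an
option is laid out inside a DHCP packet; the function names are made
up, and the router builds the option by itself from the "26,1486"
setting.

```python
import struct

DHCP_OPT_INTERFACE_MTU = 26  # "Interface MTU" option code, RFC 2132


def encode_mtu_option(mtu: int) -> bytes:
    """Encode the option as code, length (2) and a big-endian 16-bit MTU."""
    if not 68 <= mtu <= 0xFFFF:
        raise ValueError("MTU must be between 68 and 65535")
    return struct.pack("!BBH", DHCP_OPT_INTERFACE_MTU, 2, mtu)


def decode_mtu_option(data: bytes) -> int:
    """Decode an encoded Interface MTU option back into an integer."""
    code, length, mtu = struct.unpack("!BBH", data[:4])
    if code != DHCP_OPT_INTERFACE_MTU or length != 2:
        raise ValueError("not an Interface MTU option")
    return mtu


if __name__ == "__main__":
    raw = encode_mtu_option(1486)
    print(raw.hex())               # 1a0205ce
    print(decode_mtu_option(raw))  # 1486
```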

Update 3: added an optional path MTU discovery setting (the
IP_MTU_DISCOVER socket option) into rexmpp, for TCP
sockets. Explicitly enabling it actually won't help in this case, and
it can be configured system-wide (/proc/sys/net/ipv4/ip_no_pmtu_disc,
see the ip(7) man page) as well, but it wouldn't harm to have that
configurable. I also noticed that IP_PMTUDISC_DONT actually does help,
since it disables the DF bit (i.e., allows fragmentation), and then
the packets get through. So apparently a router on the path doesn't
simply drop the packets, but wants to fragment them, and fails to
communicate it back.
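
To show what that socket option looks like in use, here is a minimal,
Linux-only Python sketch. It is not rexmpp's code: the helper name is
made up, and the numeric fallbacks are the Linux values from
``<linux/in.h>``, used as an assumption for the case where Python's
socket module does not export these constants.

```python
import socket

# Linux-specific constants; fall back to the values from <linux/in.h>
# if this Python build does not export them (an assumption, Linux only).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)  # never set DF
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)      # always set DF


def set_pmtu_discovery(sock: socket.socket, mode: int) -> int:
    """Set the path MTU discovery mode and return the value read back."""
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # With IP_PMTUDISC_DONT the kernel does not set the DF bit, so
        # routers on the path are allowed to fragment the packets.
        print(set_pmtu_discovery(s, IP_PMTUDISC_DONT))
```

The option is set per socket, before connecting, so it only affects
that one connection, unlike the system-wide ip_no_pmtu_disc setting.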

Update 4: tried to poke Hetzner's support ([email protected]) as
well, but they replied in 3 days, asking to authenticate on their
website and submit a support request via a form, "for security and
privacy reasons". Trying that, too, though it begins the way it did
with Rostelecom: they ask me to jump through some silly hoops, which
isn't very promising. They replied a week later, after checking it, at
which point it was already clear that the issue was at Rostelecom; I
thanked them and asked to close the ticket.

Update 5: no reply from Hetzner by 2023-04-22, but found that the last
hop I see with ``traceroute -F 157.90.29.18 1487`` is the last
Rostelecom router on the way, which suggests that the issue is between
it and the router at DATAIX (which later turned out to be Rostelecom's
as well, see below). Maybe it is time to try writing to the
dataix.eu NOC directly (actually in the past I had a better experience
with writing to a NOC directly than with getting issues fixed via the
ISP's tech support, though I don't remember if it was Rostelecom or a
different one). Or that of Rostelecom, if I manage to find its
address (probably that's [email protected]; later looked up my mail
archive, that's the correct address and they were helpful 7 years ago,
when there was an issue between Rostelecom and another network, though
the packets weren't getting through at all back then).

Update 6: tried that traceroute command a few more times, and another
dataix.eu (actually Rostelecom's, as it turned out later) router,
178.18.225.153, replied with "!F-1486" (an ICMP "packet too big"
message). So, perhaps the issue is at their router through which it
usually goes, 178.18.227.8.

Update 7: actually even ping by itself eventually discovers the
correct MTU, receiving a reply about it from 178.18.225.153, though it
takes a few minutes (hundreds of ping packets) until an ICMP PTB
message arrives. The 178.18.225.153 DATAIX address also shows up
instead of 178.18.227.8 when running traceroute the other way around
(from Hetzner to Rostelecom).

Update 8: 2023-04-23, wrote to the DATAIX NOC. They replied quickly
that things seem to be fine on their end, mentioned that
178.18.225.153 is Rostelecom's router, and attached ping output
showing that it doesn't reply to larger ping packets, while it does to
smaller ones (to me or to a Hetzner server it doesn't reply to
any). Wrote into the Rostelecom tech support's awkward chat again. Had
no reply in an hour, then discovered that apparently the message
wasn't sent at all, and there were some JavaScript errors on further
attempts to send messages from the same web browser tab. Sent it as an
attachment (a text file) from another browser tab. Then spent another
hour (actually a bit more) convincing some clueless person that there
is an issue, and that the routers in question seem to be theirs
(Rostelecom's, at least according to the DATAIX NOC; the support
person did claim at some point that "it is not our zone of
responsibility", as did another one previously). That involved sending
them screenshots of traceroute output because they can't read
plaintext attachments, reloading the buggy chat, etc. Then they
finally filed a ticket further, though without a ticket monitoring
method: I can't see what its status is, or what was filed by that
clueless person. I suspect that critical details could easily have
gone missing, and that something like "hetzner.com is not available"
was reported.

Update 9: the next day Rostelecom called, but to advertise some junk
services and software (they have been trying to push Kaspersky
software for years, and something else on top now) rather than about
the ticket; I wonder whether they use tickets as an excuse to spam, or
it's just a coincidence. Meanwhile the issue is still there. Blocked
and reported that phone number (to whatever service Google uses by
default for spam tracking). Later in the day I missed a call, which
apparently was from the technical support.

Update 10: 2023-04-25, the issue is finally (and hopefully
permanently) fixed: the route is the same, but larger IP packets (up
to 1500 bytes) do get through now, to both Hetzner servers and
Wikipedia. Tried to confirm that it is fixed via the chat, in case
they need feedback from the client, but the support told me that the
ticket is open, and that they will just call again today (they only
called on 2023-04-28, asking whether the issue is still there).

Update 11: received a message from DATAIX NOC mentioning that they
were unable to obtain more information from Rostelecom, but they've
also noticed that the 178.18.225.153 router now responds to larger
ICMP echo requests. Thanked them and confirmed that it appears to be
fixed (and that I did not receive any useful information from
Rostelecom either); it is nice of them to follow through.

Not sure what the lessons learned are here: I knew that NOCs/engineers
tend to be helpful, and that the first-line tech support tends to be
awful (pretty much everywhere; even worse with government services,
where I just give up sometimes), though perhaps the memory faded a
bit. Maybe I will try to keep less money on the balance, as I do with
Beeline/VimpelCom (which likes to silently drain the balance with
automatically and silently enabled services), so that it would be
easier to switch in case it is too challenging next time to get the
issues actually looked at and fixed.


----

:Date: 2023-04-14