* * * * *
“Every time the shipper takes away a pallet from the shipping room, the
server times out within two seconds.”
> Last year at my job we had a pretty severe problem just as unexplainable.
>
> The day after an unscheduled closing (hurricane), I started getting calls
> from users complaining about database connection timeouts. Since I had a
> very simple network with less than 32 nodes and barely any bandwidth in
> use, it was quite scary that I could ping to the database server for 15-20
> minutes and then get "request timed out" for about 2 minutes. I had
> performance monitors etc. running on the server and was pinging the server
> from multiple sources. Pretty much every machine except the server was able
> to talk to the others constantly. I tried to isolate a faulty switch or a
> bad connection but there was no way to explain the random yet periodic
> failures.
>
> I asked my coworker to observe the lights on a switch in the warehouse
> while I ran trace routes and unplugged different devices. After 45-50
> minutes on the walkie-talkie with him saying "ya it's down, ok it's back
> up," I asked if he noticed any patterns. He said, "Yeah… I did. But you're
> going to think I'm nuts. Every time the shipper takes away a pallet from
> the shipping room, the server times out within 2 seconds." I said "WHAT???"
> He said "Yeah. And the server comes back up once he starts processing the
> next order."
>
Via Hacker News [1], “chime comments on The case of the 500-mile email [2]”
This is every bit as amusing as the 500-mile email (the story about a server
that refused to send email more than 500 miles away) [3], and it shows that
bugs can be very hard to track down, especially when they aren't caused by
bug-ridden code.
I'm fortunate in that I've never had to debug such issues.
[1]
https://news.ycombinator.com/item?id=13347058
[2]
https://www.reddit.com/r/reddit.com/comments/vunp/the_case_of_the_500mi
[3]
http://www.ibiblio.org/harris/500milemail.html?