(C) Alec Muffett's DropSafe blog.
Author Name: Alec Muffett
This story was originally published on alecmuffett.com. [1]
License: CC-BY-SA 3.0.[2]
Andy Burrows has insinuating questions about WhatsApp Security; I don’t work for WhatsApp but I think I can answer most of them: The War On Regular Expressions
2023-03-21 09:27:03+00:00
So Andy posted this thread, and I thought I would have a go at answering it here because (a) he blocks me for previous corrections I’ve issued to his posts, and (b) he’s restricted who can respond…
https://twitter.com/_andyburrows/status/1637770139314561025
A few days ago @WhatsApp VP of Engineering @nagupta made the charge I was spreading ‘misinformation’ about what scanning activity the company performs on a user’s device. However he’s so far chosen not to reply to the following points: (1/6)
Okay, let’s do this:
1. Does @WhatsApp scan files on a user’s device for the use case of malware detection? This FAQ suggests it does:
https://faq.whatsapp.com/667552568038157/?helpref=hc_fnav&cms_platform=android (2/6)
The page Andy mentions is related to attachments that the user may send or receive in WhatsApp. Generally it’s triggered by trying to send a file of a type (or with a filename extension) which the receiving phone will not be able to handle, or which might present a security risk.
This sounds dramatic and complicated to implement – insinuating that “surely just a little more effort could be applied in order to protect children!” – but basically the test is something like:
Does the filename end in .EXE?
…or something like that; or it might be that someone is attempting to send a file whose name ends in `.APK` (i.e. an Android application) in a way that would circumvent the age-checks and virus-checking of the Google Play Store on Android devices.
It’s basic string-matching stuff.
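To show how small such a test can be, here is a minimal Python sketch of a filename-extension check. The function name `has_risky_extension` and the two-entry extension list are my own illustrative assumptions; I have no knowledge of what WhatsApp's code actually looks like or which extensions it checks.

```python
import re

# Hypothetical list of extensions, taken from the examples in this post;
# this is NOT a known WhatsApp blocklist.
RISKY_EXTENSION = re.compile(r"\.(exe|apk)\Z", re.IGNORECASE)


def has_risky_extension(filename: str) -> bool:
    """Return True if the filename ends in one of the assumed risky extensions.

    Only the filename (metadata) is examined; the file's content is never read.
    """
    return RISKY_EXTENSION.search(filename) is not None
```

Note that the function only ever receives a string; there is no file handle anywhere in sight, which is rather the point.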
Does this or any other activity, performed on a user’s device, use a text filtering engine which could allow targeting of an identifiable individual or group? If so, that feels much more privacy invasive & open to potential misuse than, say, scanning to detect a CSA hash (3/6)
The answer to the above would be “no” because these are all basic tests which can be implemented using what are called regular expressions, a popular text-matching tool, and the tests are applied to metadata of the attachment (i.e. the filename) rather than the content.
This clarity is important when @WhatsApp is actively influencing legislation which claims that on-device scanning for child abuse is incompatible with privacy and would undermine E2E. Strong safety and privacy are both important outcomes
https://www.bbc.co.uk/news/technology-64863448.amp (4/6)
Checks on metadata are not the same as scanning file contents. Likewise, if a message contains a URL, then in the process of making that URL clickable regular expressions are applied to see if the URL contains accents, diacritics, or other UNICODE which might indicate that it’s attempting to deceive the user, for instance:
http://ämažon.com
The above looks suspicious, doesn’t it? Well spotted, you’re a regular expression!
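A rough Python sketch of that kind of check might look like the following. The names `looks_deceptive`, `HOST` and `NON_ASCII` are hypothetical, the host-extraction pattern is deliberately crude, and restricting the test to the hostname is my assumption; again, this is guesswork about the general technique rather than anybody's actual code.

```python
import re

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

# Crude pattern to pull the hostname out of an http(s) URL; illustrative only,
# a real implementation would use a proper URL parser.
HOST = re.compile(r"^https?://([^/:?#]+)", re.IGNORECASE)


def looks_deceptive(url: str) -> bool:
    """Return True if the URL's hostname contains non-ASCII characters."""
    match = HOST.match(url)
    if match is None:
        return False  # not an http(s) URL, so this sketch has no opinion
    return NON_ASCII.search(match.group(1)) is not None
```

With this sketch, `looks_deceptive("http://ämažon.com")` is true and `looks_deceptive("http://amazon.com")` is false.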
Many of us will hold differing positions on how we uphold fundamental human rights & what are proportionate threat responses. But some clarity and candour about what @WhatsApp @Meta currently does is important when technical considerations will influence legislative design (5/6)
Regular Expressions. They are a very simple string-matching tool applied to filenames and to URLs when they are being rendered as “clickable”.
They are a very, very long distance from “scanning files on the user’s device” not to mention “processing file contents to create a fuzzy hash which can be matched against a per-nation-state supplied database of hashes of known abusive content (and leaked top-secret documents)”
These are important issues which deserve scrutiny. U.K. & EU legislators deserve to understand the facts when scrutinising the #OnlineSafetyBill and CSA legislative proposal, & if @Meta won’t share details, @CommonsDCMS @CommonsHomeAffs might wish to secure them (ENDS)
I recommend learning Python and how it provides Regular Expressions; it’s very accessible.
Whilst we’re here: in 2015 WhatsApp (opinion) abused the regular-expression URL-checking feature to prevent people clicking on Telegram group-chat invitation links; this business-focused usage was later rescinded, but it was a substantial influence on the reason that I left Facebook in 2016 – see “Reason 1, the Telegram Thing” in that blogpost.
Regular Expressions can likewise be used for good or ill; the best use of them is minimal, looking for cues of suspicious behaviour like someone wanting to send a .EXE file with a filename containing right-to-left UNICODE and a fake .PDF extension.
That’s trivial to test for, fortunately.
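For the curious, a minimal Python sketch of such a test follows. It flags a filename that really ends in `.exe` but also contains a Unicode bidirectional control character (such as U+202E, RIGHT-TO-LEFT OVERRIDE) which can make it display as though it ended in `.pdf`. The function name `is_spoofed_executable` and the exact set of characters checked are my own assumptions.

```python
import re

# Unicode bidirectional control characters: LRM/RLM, the embedding and
# override controls (U+202A to U+202E) and the isolate controls
# (U+2066 to U+2069).
BIDI_CONTROL = re.compile(r"[\u200E\u200F\u202A-\u202E\u2066-\u2069]")

# The true extension, i.e. how the stored filename actually ends.
EXECUTABLE = re.compile(r"\.exe\Z", re.IGNORECASE)


def is_spoofed_executable(filename: str) -> bool:
    """Return True for an .exe whose name contains bidi control characters."""
    return (
        EXECUTABLE.search(filename) is not None
        and BIDI_CONTROL.search(filename) is not None
    )
```

For example, the name `"invoice\u202Efdp.exe"` is stored with a `.exe` ending, but many displays will render the characters after the override in reverse, so it appears to end in `exe.pdf`; the sketch flags it, while a plain `invoice.pdf` passes.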
Again: all of this is guesswork and I am not affiliated with WhatsApp, but I will bet that I’m right.
Codicil
ps: For politicians who don’t see the difference between checking metadata vs: scanning content…
It’s the same difference as a politician leaving a lumpy parcel on the doorstep because it has wires hanging out and an Ulster postmark, vs: having Royal Mail open-and-read all the post
Originally tweeted by Alec Muffett (@AlecMuffett) on 2023/03/21.
Postscript
Via @khaleesicodes, how on earth did I forget to include the requisite XKCD comic?
https://xkcd.com/208/
[END]
[1] URL:
https://alecmuffett.com/article/45435
[2] URL:
https://creativecommons.org/licenses/by-sa/3.0/
DropSafe Blog via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/alecmuffett/