What does it do? It opens a file, reads it, and then prints the length
of what it read plus "an integer representing the Unicode code point
of" the first character in that file.
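Roughly speaking, a minimal sketch of such a test.py could look like
this (the original script isn't reproduced here, so treat this as an
assumption about its shape):

#!/usr/bin/env python3

# Read the file and print the length of the result plus the code
# point of its first character in hex.
with open('foo', 'r') as fp:
    content = fp.read()

print(f'{len(content)}, 0x{ord(content[0]):X}')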
So, after consulting "man ascii" and doing an "echo a >foo", you'd
expect this to print "2, 0x61" (length of 2 due to the final newline).
*A lot* of Python code I've seen and written does a simple
"open(filename, mode)". But, uhm ... The type of the variable "content"
is "str", which, in Python, means a "sequence of Unicode code points".
In other words, Python *decodes* the file you're reading on the fly. But
according to which encoding? ASCII? UTF-8? Something else? Well, you
can't tell from the code alone; it is platform-specific and depends on
your locale.
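To see what "platform-specific" means on your machine: without an
explicit encoding, open() falls back to the locale's preferred
encoding, which you can inspect like this (a quick sketch):

import locale

# This is the codec that text-mode open() picks by default.
print(locale.getpreferredencoding(False))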
Let's make the example more obvious. At a shell prompt, do this:
$ printf '\360\237\220\247\n' >foo
This writes 5 bytes to the file. On my system, running the Python script
now prints:
$ ./test.py
2, 0x1F427
Python decoded the file and "content" is now a "str" of length 2. It
holds exactly one Unicode code point for the penguin emoji, plus the
final newline. In my case, Python decoded the file using UTF-8, because
I'm using a UTF-8 locale (`en_US.UTF-8`).
But if you use another locale, you might get this:
$ LANG=en_US.ISO-8859-1 ./test.py
5, 0xF0
Completely different result.
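You can reproduce the difference without touching a file by decoding
the same five bytes with both codecs (a small sketch):

# The exact bytes that the printf above wrote.
data = b'\xf0\x9f\x90\xa7\n'

# UTF-8 collapses the first four bytes into one code point ...
print(len(data.decode('UTF-8')))        # 2
# ... ISO-8859-1 maps every single byte to its own code point.
print(len(data.decode('ISO-8859-1')))   # 5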
If you want to force Python to use UTF-8, you have to do this:
with open('foo', 'r', encoding='UTF-8') as fp:
    content = fp.read()
Now the calls above show this:
$ ./test.py
2, 0x1F427
$ LANG=en_US.ISO-8859-1 ./test.py
2, 0x1F427
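If you're ever unsure which codec a text-mode file object ended up
with, you can also just ask it (a quick sketch):

with open('foo', 'r') as fp:
    # .encoding reports the codec in use, whether it was given
    # explicitly or picked from the locale.
    print(fp.encoding)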
To be honest, I wasn't aware of this platform-specific behaviour and I
assumed that Python defaulted to UTF-8 here. Well, now I know that it
doesn't.