Avoiding Roko's basilisk, part II

* * * * *

The other day I came across this comment on Lobsters [1]:

> On a personal level I have helped various people get value out of AI
> (Artificial Intelligence) tools where they initially did not understand how
> to use it properly. But that setting is more of a 1:1 for a specific
> situation. For generic how to use agentic tools, there are so many articles
> already. Peter Steinberger has a multi hour talk online of him using an
> army of agents to write on his project.
>
> If someone has a specific situation where they failed using an agent,
> ideally with some open source code, I would be happy to have a look at it.
> It’s just hard to engage on abstract “does not work for me” posts.
>

“Comment on “AI Changes Everything” [2]”

I failed using an agent a few months ago [3]. It was on an open source
project of mine. Perhaps mitsuhiko would be happy to have a look at it. So I
replied [4].

And mitsuhiko was happy to look at it.

Or rather, spend a few minutes telling his “coding agent” to look at the code
and let it do its thing. So I took a look [5].

Development was done on a Mac, which doesn't have the vm86() system call, so
his agent, “Claude,” started writing an 8086 emulator. Or I should say, an
80386 emulator since that's the most common architecture these days. It also
came up with a few tests, and once those tests were working, it stopped.

When I tried the code, attempting to run RACTER.EXE, it just sat there,
turning my computer into a space heater. Looking a bit further, I saw there
was an option for debug output (although the option has to go at the end of
the command line, not right after the command name where every other Unix
command expects it). Then I saw line after line of

-----[ data ]-----
..
Execute: 2010:0020: 8B
Unhandled opcode at 2010:0020: 8B
Execute: 2010:0021: EC
Unhandled opcode at 2010:0021: EC
Execute: 2010:0022: 81
Unhandled opcode at 2010:0022: 81
Execute: 2010:0023: EC
Unhandled opcode at 2010:0023: EC
Execute: 2010:0024: 02
Unhandled opcode at 2010:0024: 02
Execute: 2010:0025: 00
Unhandled opcode at 2010:0025: 00
Execute: 2010:0026: 9A
Unhandled opcode at 2010:0026: 9A
Execute: 2010:0027: C2
Unhandled opcode at 2010:0027: C2
Execute: 2010:0028: 10
Unhandled opcode at 2010:0028: 10
Execute: 2010:0029: 52
Unhandled opcode at 2010:0029: 52
Execute: 2010:002A: 24
Unhandled opcode at 2010:002A: 24
Execute: 2010:002B: 9A
Unhandled opcode at 2010:002B: 9A
Execute: 2010:002C: A2
Unhandled opcode at 2010:002C: A2
Execute: 2010:002D: 19
Unhandled opcode at 2010:002D: 19
Execute: 2010:002E: 52
Unhandled opcode at 2010:002E: 52
..
-----[ END OF LINE ]-----
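
For what it's worth, those bytes are ordinary 8086 code. Below is a minimal
Python sketch, not code from either repository, of what decoding them
involves. It assumes 16-bit real mode and handles only the three opcodes in
the excerpt, and only the register-to-register forms of 8B and 81. The names
decode_one, disassemble and LOG_BYTES are made up for illustration.

```python
# Sketch only: a toy decoder for the bytes in the debug log above.
# Assumptions: 16-bit real mode, register forms (ModRM mod == 3) only.
# All names here are hypothetical, not taken from the emulator.

REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
GROUP1 = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]


def word(code, pos):
    """Read a little-endian 16-bit value starting at pos."""
    return code[pos] | (code[pos + 1] << 8)


def decode_one(code, pos):
    """Return (text, length) for the instruction at pos, or None if this
    sketch cannot decode it (unknown opcode or not enough bytes left)."""
    op = code[pos]
    if op == 0x8B and pos + 1 < len(code):      # MOV r16, r/m16
        modrm = code[pos + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 3:
            return f"MOV {REG16[reg]},{REG16[rm]}", 2
    elif op == 0x81 and pos + 3 < len(code):    # group 1: op r/m16, imm16
        modrm = code[pos + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 3:
            imm = word(code, pos + 2)
            return f"{GROUP1[reg]} {REG16[rm]},{imm:04X}h", 4
    elif op == 0x9A and pos + 4 < len(code):    # CALL FAR segment:offset
        offset = word(code, pos + 1)
        segment = word(code, pos + 3)
        return f"CALL {segment:04X}:{offset:04X}", 5
    return None


def disassemble(code):
    """Decode instructions in sequence, stopping at the first one that
    this sketch cannot handle."""
    out = []
    pos = 0
    while pos < len(code):
        decoded = decode_one(code, pos)
        if decoded is None:
            out.append(f"?? {code[pos]:02X} (truncated or not handled)")
            break
        text, length = decoded
        out.append(text)
        pos += length
    return out


# The fifteen bytes logged at 2010:0020 through 2010:002E.
LOG_BYTES = bytes.fromhex("8BEC81EC02009AC21052249AA21952")

if __name__ == "__main__":
    for line in disassemble(LOG_BYTES):
        print(line)
```

On the fifteen bytes above this gives MOV BP,SP, then SUB SP,0002h, then
CALL 2452:10C2, and then stops at a second far call (9A) whose operand runs
past the end of the excerpt. The point is that 8B is the first byte of a
two-byte instruction, so an emulator that reports it as unhandled and steps
ahead a single byte ends up treating operand bytes as opcodes, which is what
the log shows.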

To say I was underwhelmed is an understatement.

The thread somewhat petered out.

I noticed today that mitsuhiko gave it another attempt [6]. He put the whole
thing into Docker so he could run it under a Linux VM, and the code could now
run enough of RACTER.EXE to display the banner:

-----[ shell ]-----
[spc]lucy:/tmp/racter>/tmp/NaNoGenMo-2015/C/msdos RACTER.EXE

.-----------------------------------------------------,
| |
| A CONVERSATION WITH RACTER |
| |
| COPYRIGHTED BY INRAC CORPORATION, 1984 |
| PORTIONS COPYRIGHTED BY MICROSOFT CORPORATION, 1982 |
| ........... |
`-----------------------------------------------------'

Hello, I'm Racter. You are?
>Sean
Sean
-----[ END OF LINE ]-----

But that's it. It's still chugging along, turning my computer into a space
heater. I'm still unimpressed.

This isn't to fault mitsuhiko. I'm sure he finds value in AI agents coding
for him, but I think this was way out of his bailiwick, which is why he
didn't bother to understand what I was trying to attempt. “Claude” got to the
point of printing the banner from RACTER.EXE and stopped, because I think
that's all it was instructed to do, besides attempting to buffer the input.

I'll close this out with the last few comments in the thread:

Sean: What type of programming do you do? Or rather, what type of programming
    do you have Claude do for you? Because I am still unconvinced it will
    be any benefit to the programming I do.
mitsuhiko: Right now I’m building a backend for a prototype of the next
    project I’m working on. That is a rather complex web application
    using both Python and Rust. Over the last year or so I used it
    quite a bit to extend minijinja (but that wasn’t agentic yet).
Sean: Ah, stuff that is definitely over-represented in the training sets.
    Gotcha.
mitsuhiko: Considering that I’m doing a very fringe thing I’m not so sure
    that this is a very accurate assessment :)
Sean: Python, Rust and web applications are over-represented in the training
    sets. The 6809, RACTER.EXE and ANS Forth aren’t. What you are doing
    might be novel, but the tech being used isn’t. The stuff I described
    isn’t novel (well, maybe having RACTER and Eliza chat, but I was
    riffing on an article written in the 80s about doing that) but using
    tech that (in my opinion) is novel (that is, not mainstream). There’s a
    difference.

I do appreciate the attempt though.

Update on Friday, June 6th, 2025 at 3:06 AM

One last comment from mitsuhiko in the thread: “I had excellent results with
completely niche technology too. For as long as you have a way for the
machine to validate it’s [sic] outputs it can even program in languages that
you just invented.”

I think I'll have to keep this in mind for next time.

[1] https://lobste.rs/
[2] https://lobste.rs/s/n2lvmy/ai_changes_everything#c_m1ra0b
[3] gopher://gopher.conman.org/0Phlog:2024/12/19.1
[4] https://lobste.rs/s/n2lvmy/ai_changes_everything#c_wtt60w
[5] https://github.com/spc476/NaNoGenMo-2015/tree/d7d8631ad3609d37b315586ba7a1619826b8f55d
[6] https://github.com/mitsuhiko/NaNoGenMo-2015/tree/d7d8631ad3609d37b315586ba7a1619826b8f55d
