Start with TV quality video ...
[bpf = bits per frame, bps = bits per second]
320x240 x YUV422 = 614400 bpf
Using 16x16 squares (would look like shit)
20 x 16 x YUV422 = 2560 bpf
Using 8x8 squares (looks blurred to hell but we can fix it later)
40 x 32 x YUV422 = 10240 bpf
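The bits-per-frame arithmetic above can be sketched as follows. An assumption here: "YUV422" in these notes means an 8-bit per-cell colour (4 bits Y, 2 bits U, 2 bits V), and the block grids are padded upward (hence 20x16 and 40x32 rather than 20x15 and 40x30 for 240 rows).

```python
# Sketch of the bpf figures above. Assumption: "YUV422" = 8 bits per
# cell (4 Y + 2 U + 2 V); grids padded as the 20x16 / 40x32 counts imply.
BITS_PER_CELL = 4 + 2 + 2  # YUV422 -> 8 bits

def bpf(cols, rows, bits=BITS_PER_CELL):
    """Bits per frame for a cols x rows grid of cells."""
    return cols * rows * bits

print(bpf(320, 240))  # raw pixels: 614400 bpf
print(bpf(20, 16))    # 16x16 squares: 2560 bpf
print(bpf(40, 32))    # 8x8 squares: 10240 bpf
```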
At 1 fps, with 8000 bps reserved for very compressed audio, that leaves
about 2000 bpf for detail... (shall we aim for 1 frame per 2s? ;)
Now the squares can be compacted: adjacent squares tend to be similar,
so we can store full YUV422 for one square and YUV211 differentials for
the others in most cells.
i.e.
[422][211][211][211][211], giving roughly a 25% saving
It's now:
40 x 32 x YUVS = 7680 bpf
Keeping 1 bit per supercell (20 x 16) gives 320 bpf of shift information
on the 211's, selecting 211-high or 211-low bits.
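The compacted-square budget works out as below. An assumption: the mix of full 422 cells and 211 differentials averages out to 6 bits per cell (a 25% saving on 8 bits), which is what the 7680 bpf figure implies.

```python
# Sketch of the compacted budget: 8-bit YUV422 cells mixed with 4-bit
# YUV211 differentials, plus one shift bit per 20x16 supercell.
# Assumption: the mix averages 6 bits/cell (25% saving on 8 bits).
CELLS = 40 * 32          # 8x8 cells per frame
AVG_BITS = 8 * 0.75      # 25% saving -> 6 bits/cell average
SUPERCELLS = 20 * 16     # one 211 shift bit each

print(int(CELLS * AVG_BITS))  # 7680 bpf for the cell colours
print(SUPERCELLS)             # 320 bpf of 211 shift bits
```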
Now look for the 8x8 cells that have the most detail (detail being
measured by the difference between pixels and the proposed block colour).
JPEG DCT-encode these cells and write a cell map for these in YUV422.
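The detail-selection step might be sketched like this: score each 8x8 cell by summed absolute difference between its pixels and the proposed block colour, then take the top N for DCT encoding. The function and variable names here are illustrative, not from the notes.

```python
# Hypothetical sketch of detail selection: score = sum of
# |pixel - block colour| over the cell's luma, keep the top N cells.

def detail_score(cell_luma, block_colour):
    """Summed absolute difference over an 8x8 cell's luma values."""
    return sum(abs(p - block_colour) for row in cell_luma for p in row)

def pick_detail_cells(cells, budget):
    """cells: list of (index, cell_luma, block_colour) tuples.
    Returns indices of the `budget` highest-detail cells."""
    scored = sorted(cells, key=lambda c: detail_score(c[1], c[2]),
                    reverse=True)
    return [idx for idx, _, _ in scored[:budget]]

flat = (0, [[10] * 8] * 8, 10)       # matches its block colour exactly
busy = (1, [[0, 255] * 4] * 8, 128)  # high-contrast cell
print(pick_detail_cells([flat, busy], 1))  # -> [1]
```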
On a 56Kbit link, going for 2 fps, we have:
Realistic bandwidth  42000 bps
- audio               8000 bps
Video bandwidth      34000 bps
Square maps x2       16000 bps
Detail x2            18000 bps
So we have 9000 bpf available for detail per frame. We can encode the detail
in several ways. Suggested are (4-bit encoding, 1111 reserved; see below):