Start with TV quality video ...
[bpf = bits per frame, bps = bits per second]
320x240 x YUV422 = 614400 bpf
Using 16x16 squares (would look like shit)
20 x 16 x YUV422 = 2560 bpf
Using 8x8 squares (looks blurred to hell but we can fix it later)
40 x 32 x YUV422 = 10240 bpf
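The bits-per-frame arithmetic above can be sketched as follows. An assumption here: "YUV422" in these notes means an 8-bit per-cell colour (4 bits Y, 2 bits U, 2 bits V), and the block grids are padded upward (hence 20x16 and 40x32 rather than 20x15 and 40x30 for 240 rows).

```python
# Sketch of the bpf figures above. Assumption: "YUV422" = 8 bits per
# cell (4 Y + 2 U + 2 V); grids padded as the 20x16 / 40x32 counts imply.
BITS_PER_CELL = 4 + 2 + 2  # YUV422 -> 8 bits

def bpf(cols, rows, bits=BITS_PER_CELL):
    """Bits per frame for a cols x rows grid of cells."""
    return cols * rows * bits

print(bpf(320, 240))  # raw pixels: 614400 bpf
print(bpf(20, 16))    # 16x16 squares: 2560 bpf
print(bpf(40, 32))    # 8x8 squares: 10240 bpf
```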
At 1 fps, with 8000 bps reserved for very compressed audio, that leaves
about 2000 bpf for detail... (shall we aim for 1 frame per 2s? ;)
Now the squares can be compacted: adjacent squares tend to be similar,
so we can store full YUV422 for one square and YUV211 differentials for
the others in most cells.
i.e.
[422][211][211][211][211], giving roughly a 25% saving
It's now:
40 x 32 x YUVS = 7680 bpf
Keeping 1 bit per supercell (20 x 16) gives 320 bpf of shift information
on the 211's, selecting 211-high or 211-low bits.
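The compacted-square budget works out as below. An assumption: the mix of full 422 cells and 211 differentials averages out to 6 bits per cell (a 25% saving on 8 bits), which is what the 7680 bpf figure implies.

```python
# Sketch of the compacted budget: 8-bit YUV422 cells mixed with 4-bit
# YUV211 differentials, plus one shift bit per 20x16 supercell.
# Assumption: the mix averages 6 bits/cell (25% saving on 8 bits).
CELLS = 40 * 32          # 8x8 cells per frame
AVG_BITS = 8 * 0.75      # 25% saving -> 6 bits/cell average
SUPERCELLS = 20 * 16     # one 211 shift bit each

print(int(CELLS * AVG_BITS))  # 7680 bpf for the cell colours
print(SUPERCELLS)             # 320 bpf of 211 shift bits
```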
Now look for the 8x8 cells that have the most detail (detail being
measured by the difference between pixels and the proposed block colour).
JPEG DCT-encode these cells and write a cell map for these in YUV422.
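The detail-selection step might be sketched like this: score each 8x8 cell by summed absolute difference between its pixels and the proposed block colour, then take the top N for DCT encoding. The function and variable names here are illustrative, not from the notes.

```python
# Hypothetical sketch of detail selection: score = sum of
# |pixel - block colour| over the cell's luma, keep the top N cells.

def detail_score(cell_luma, block_colour):
    """Summed absolute difference over an 8x8 cell's luma values."""
    return sum(abs(p - block_colour) for row in cell_luma for p in row)

def pick_detail_cells(cells, budget):
    """cells: list of (index, cell_luma, block_colour) tuples.
    Returns indices of the `budget` highest-detail cells."""
    scored = sorted(cells, key=lambda c: detail_score(c[1], c[2]),
                    reverse=True)
    return [idx for idx, _, _ in scored[:budget]]

flat = (0, [[10] * 8] * 8, 10)       # matches its block colour exactly
busy = (1, [[0, 255] * 4] * 8, 128)  # high-contrast cell
print(pick_detail_cells([flat, busy], 1))  # -> [1]
```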
On a 56Kbit link, going for 2 fps, we have:
Realistic bandwidth  42000 bps
- audio               8000 bps
Video bandwidth      34000 bps
Square maps x2       16000 bps
Detail x2            18000 bps
So we have 9000 bpf available for detail per frame. We can encode the detail
in several ways. Suggested are (4-bit encoding, 1111 reserved; see below):