+---------------------------------------------------------------------------+
| Fetched from the "Vintage Computer Forum" the 02 Nov 2016                 |
| Original URL: http://www.vcfed.org/forum/archive/index.php/t-51985.html   |
+---------------------------------------------------------------------------+

-----------------------------------------------------------------------------
From: PgrAm [March 31st, 2016, 04:56 PM]

This forum has been incredibly helpful for me in the past, so I figure I might
be able to find someone who can explain how the VGA latches work. I'm writing
a point-and-click LucasArts-style adventure game, and I'm doing it in mode X
so I can get a few more pixels on screen. I'm reading Michael Abrash's
Graphics Programming Black Book
(http://www.phatcode.net/res/224/files/html/index.html). I've got a pretty
good grasp of how mode X works and I've got page flipping working, but I'm
trying to work out how to do a display-memory-to-display-memory copy, and
Abrash doesn't really do a good job of explaining how latched copying works.
Can anyone point me in the right direction?

What I'm trying to do, basically, is store the 320x240 background on the third
page of video memory and fast-copy that onto the screen.
-----------------------------------------------------------------------------
From: Scali [April 1st, 2016, 01:03 AM]

Well, as you probably know by now, mode X works with 4 different 'bitplanes'.
This is basically the EGA memory layout, except that pixels are now
interpreted as bytes rather than bits.
EGA introduced latches to manipulate these bitplanes. Think of them as
internal registers. Each bitplane has a dedicated latch.
If you read a byte from video memory, only one of the bitplanes can be
selected for the CPU to read the result. However, all 4 latches will load the
respective byte from their bitplane. So after the read, the value is now
'cached' in the latch registers of the ALU.
For the write, there are various ways to configure the ALU. It basically
performs an operation (e.g. AND, OR, XOR) between the value written by the CPU
and the value of the latch, and then writes the result to all enabled
bitplanes.
Configure the ALU so that it writes the value of the latch to all bitplanes,
et voila, you have a 'latch copy' routine.
That is, for every byte you read with the CPU, 4 bytes are read into the
latches, and for every byte you write, 4 bytes are written. So this is far
more effective than using the CPU to read and write the individual bytes.
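
To make the read/write sequence concrete, here is a small software model of
the four planes and their latches. It is only a sketch that runs on any C
compiler, not VGA code: the names plane, latch, vga_read and
vga_write_latches are made up for illustration, and the write function
assumes the ALU has already been configured to pass the latches through.

```c
#include <assert.h>

#define PLANE_SIZE 16   /* tiny planes, just for the demonstration */

static unsigned char plane[4][PLANE_SIZE]; /* the four bitplanes      */
static unsigned char latch[4];             /* one latch per bitplane  */

/* CPU read: only one plane's byte is returned to the CPU,
   but all four latches are loaded from the same offset. */
static unsigned char vga_read(unsigned offset, unsigned read_plane)
{
    unsigned p;
    assert(offset < PLANE_SIZE && read_plane < 4);
    for (p = 0; p < 4; p++)
        latch[p] = plane[p][offset];
    return plane[read_plane][offset];
}

/* CPU write with the ALU passing the latches through (assumption):
   the CPU value is ignored, each enabled plane receives its own latch. */
static void vga_write_latches(unsigned offset, unsigned char cpu_value,
                              unsigned map_mask)
{
    unsigned p;
    (void)cpu_value;
    assert(offset < PLANE_SIZE);
    for (p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            plane[p][offset] = latch[p];
}
```

In this model, vga_read(3, 0) followed by vga_write_latches(9, 0, 0x0F)
copies the byte at offset 3 to offset 9 in all four planes with one CPU read
and one CPU write.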

The downside is that it only works with 'bitplane-aligned' data, so you can
only do this per 4-pixel boundary.
For more accurate movement you'd need to have pre-shifted data in memory,
because you can't move the data from one latch to the next, so you can't read
data from one bitplane and write it to another.
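
The alignment restriction follows from how mode X addresses pixels. A minimal
sketch, assuming a 320-pixel-wide mode X screen (the helper names are
hypothetical, not from any library):

```c
#include <assert.h>

#define SCREEN_WIDTH 320u   /* assumed mode X width in pixels */

/* Byte offset within a plane for pixel (x, y):
   four horizontally adjacent pixels share one address. */
static unsigned modex_offset(unsigned x, unsigned y)
{
    assert(x < SCREEN_WIDTH);
    return (unsigned)(((unsigned long)y * SCREEN_WIDTH + x) >> 2);
}

/* Plane that holds pixel (x, y): the low two bits of x. */
static unsigned modex_plane(unsigned x)
{
    return x & 3u;
}

/* A latch copy keeps every byte in its own plane, so a pixel can only
   land on a destination x that lives in the same plane as the source x.
   Returns 1 if that holds, 0 if pre-shifted data would be needed. */
static int latch_copy_possible(unsigned src_x, unsigned dst_x)
{
    return modex_plane(src_x) == modex_plane(dst_x);
}
```

For example, moving an image from x = 8 to x = 100 keeps each pixel in its
plane, while moving it from x = 8 to x = 10 does not.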

Note that the latches are only 1 byte large on VGA and most clones, so trying
to use this technique with 16-bit or 32-bit writes at a time will not work.
-----------------------------------------------------------------------------
From: PgrAm [April 1st, 2016, 09:08 AM]

So if I've got this right what I do is basically

for each pixel on the plane:
read the pixel I want to copy from (this loads all four latches with a pixel
from each plane), I assume this is a normal read like:

char pixel = vgamem[y * width + x];
select all planes
tell the vga card to write to the pixel I want (this will copy from the latch
to the pixel on all four planes), i'm unclear on how to do this.

some pseudocode would be helpful if anyone can do that
-----------------------------------------------------------------------------
From: Scali [April 1st, 2016, 09:37 AM]

> char pixel = vgamem[y * width + x];

To be safe, you want to make pixel 'volatile', to prevent the compiler from
optimizing it out (you won't actually need the value).

> select all planes

You don't want to do this every pixel, it's expensive. First select all
planes, then go into the copy loop. The settings are the same for all pixels.

> tell the vga card to write to the pixel I want (this will copy from the
> latch to the pixel on all four planes), i'm unclear on how to do this.
>
> some pseudocode would be helpful if anyone can do that

Writing the pixels is the opposite of reading:
vgamem[y * width + x] = pixel;

The key is in setting up the ALU properly for the copy loop. You can find a
reasonable reference on osdev: http://wiki.osdev.org/VGA_Hardware
You will want to use write mode 0.
As you can see, the key to combining the CPU pixel value with the latches is
in the logical operation.
You can choose any of the three options really... after all:

latch AND 0xFF == latch;
latch OR 0x00 == latch;
latch XOR 0x00 == latch;

That's pretty much it.
So, for example, set write mode 0, enable all planes, and set it to OR-mode.
Then you can simply do this to write the latches to all 4 bitplanes:

vgamem[y * width + x] = 0;
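
The three identities can be checked against a small model of the write mode 0
data path. This is a sketch under simplifying assumptions: set/reset is
disabled and the rotate count is 0. The name alu_write_mode0 is invented for
illustration; the enum values follow the function-select field (bits 3-4) of
the Data Rotate register.

```c
#include <assert.h>

/* Logical operation, as encoded in Data Rotate register bits 3-4. */
enum alu_op { ALU_REPLACE = 0, ALU_AND = 1, ALU_OR = 2, ALU_XOR = 3 };

/* Byte that ends up in ONE plane after a write mode 0 write.
   cpu:      byte written by the CPU
   latch:    that plane's latch, loaded by the previous read
   bit_mask: Bit Mask register; 1 = take the ALU result,
             0 = take the latch bit unchanged */
static unsigned char alu_write_mode0(unsigned char cpu, unsigned char latch,
                                     enum alu_op op, unsigned char bit_mask)
{
    unsigned char r;
    assert(op >= ALU_REPLACE && op <= ALU_XOR);
    switch (op) {
    case ALU_AND: r = (unsigned char)(cpu & latch); break;
    case ALU_OR:  r = (unsigned char)(cpu | latch); break;
    case ALU_XOR: r = (unsigned char)(cpu ^ latch); break;
    default:      r = cpu;                          break;
    }
    return (unsigned char)((r & bit_mask) | (latch & ~bit_mask));
}
```

In this model, AND with a CPU value of 0xFF, and OR or XOR with a CPU value
of 0x00, all return the latch. A Bit Mask of 0x00 has the same effect for any
CPU value and any operation, since every bit is then taken from the latch.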

In fact, I just checked the IBM VGA docs, and it seems that write mode 1
simply writes the latches directly, so you don't even need to change the
logical operation.
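
Selecting write mode 1 could look roughly like this. The write mode lives in
bits 0-1 of the Graphics Mode register (Graphics Controller index 5), so the
sketch does a read-modify-write to leave the other bits (such as the
256-color bit) alone. To keep the example runnable anywhere, port_out() and
port_in() are hypothetical stand-ins backed by a fake register file; with a
real DOS compiler you would call its outp()/inp() (or equivalent) instead.
The port numbers are the standard VGA ones.

```c
#include <assert.h>

#define SC_INDEX 0x3C4  /* Sequencer index port, data at +1           */
#define MAP_MASK 0x02   /* Sequencer: Map Mask register               */
#define GC_INDEX 0x3CE  /* Graphics Controller index port, data at +1 */
#define GC_MODE  0x05   /* Graphics Mode register, bits 0-1 = write mode */

/* Fake register file so the sketch runs without VGA hardware. */
static unsigned char fake_gc[9], fake_sc[5];
static unsigned char gc_idx, sc_idx;

static void port_out(unsigned port, unsigned char v)  /* stand-in for outp */
{
    switch (port) {
    case GC_INDEX:     gc_idx = v; break;
    case GC_INDEX + 1: assert(gc_idx < 9); fake_gc[gc_idx] = v; break;
    case SC_INDEX:     sc_idx = v; break;
    case SC_INDEX + 1: assert(sc_idx < 5); fake_sc[sc_idx] = v; break;
    default: break;
    }
}

static unsigned char port_in(unsigned port)           /* stand-in for inp */
{
    if (port == GC_INDEX + 1) { assert(gc_idx < 9); return fake_gc[gc_idx]; }
    if (port == SC_INDEX + 1) { assert(sc_idx < 5); return fake_sc[sc_idx]; }
    return 0xFF;
}

/* Set the write mode (0-3), preserving the other Graphics Mode bits. */
static void vga_set_write_mode(unsigned char mode)
{
    unsigned char old;
    assert(mode <= 3);
    port_out(GC_INDEX, GC_MODE);
    old = port_in(GC_INDEX + 1);
    port_out(GC_INDEX + 1, (unsigned char)((old & ~0x03) | mode));
}

/* Enable writes to all four planes. */
static void vga_enable_all_planes(void)
{
    port_out(SC_INDEX, MAP_MASK);
    port_out(SC_INDEX + 1, 0x0F);
}
```

Before a latch copy you would call vga_enable_all_planes() and
vga_set_write_mode(1) once, and switch back with vga_set_write_mode(0)
afterwards.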
-----------------------------------------------------------------------------
From: PgrAm [April 1st, 2016, 10:27 AM]

Wow, that's deceptively simple. I assumed vgamem[y * width + x] = 0; would
just set that color to zero, but now I actually understand how the ALU factors
into it and combines the value with the latches. Here's what I have. This
function copies an entire page in VRAM to another one.

void vramCopyPage(unsigned int dest, unsigned int src) {
 outpw(SC_INDEX, ((word)0xff << 8) + MAP_MASK); //select all planes
 outpw(GC_INDEX, 0x08); //index 8 = Bit Mask, value 0x00: write latches only

 int j;
 for(j = 0; j < SCREEN_SIZE; j++) { //all bytes of one plane, e.g. 19200
   volatile char pixel = VGA[src + j]; //read pixel to load the latches
   VGA[dest + j] = 0; //write four pixels
 }

 outp(GC_INDEX + 1, 0xff); //restore Bit Mask (index 8 is still selected)
}

Now that I've figured that out, it makes sense how I could use the ALU to do
some transparency.
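
A side note on the 16-bit port writes used above: outpw() to an index port
sends the low byte to the index register and the high byte to the data
register at port + 1. So the word 0x0008 selects Graphics Controller register
8 (Bit Mask) and writes 0x00 to it, which makes every written bit come from
the latches. Selecting the OR function instead would go through the Data
Rotate register (index 3, bits 3-4). A small sketch of how such words are
composed, with hypothetical helper names:

```c
#include <assert.h>

#define GC_DATA_ROTATE 0x03 /* bits 3-4: 0 replace, 1 AND, 2 OR, 3 XOR */
#define GC_BIT_MASK    0x08

/* Word for a 16-bit write to a VGA index port:
   low byte = register index, high byte = register value. */
static unsigned vga_reg_word(unsigned char index, unsigned char value)
{
    return ((unsigned)value << 8) | index;
}

/* Word that selects a logical operation (rotate count left at 0). */
static unsigned gc_logical_op_word(unsigned op)
{
    assert(op <= 3);
    return vga_reg_word(GC_DATA_ROTATE, (unsigned char)(op << 3));
}

/* Word that sets the Bit Mask register. */
static unsigned gc_bit_mask_word(unsigned char mask)
{
    return vga_reg_word(GC_BIT_MASK, mask);
}
```

With these, gc_bit_mask_word(0x00) gives 0x0008, the value passed to
outpw(GC_INDEX, ...) in the function above, and gc_logical_op_word(2) gives
0x1003 for OR.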
-----------------------------------------------------------------------------
From: Scali [April 1st, 2016, 11:08 AM]

If you set write mode to 1 instead of 0, then technically it does not matter
what value you write to memory.
This allows you to use rep movsb (not movsw/movsd because the latches are 1
byte) to copy blocks of memory very efficiently.
That said, I wonder if newer CPUs have 'optimizations' for rep movsb that turn
it into 16-bit or 32-bit operations... That would not work, as the address
of every read and write needs to be generated on the bus explicitly for the
VGA chip to pick it up.
If it doesn't work, you can still use lodsb/stosb on these CPUs.

One could argue, though, that rep movsb should specifically copy bytes, since
you can already write rep movsw or rep movsd yourself. So if you write rep
movsb, you mean rep movsb. But with all that micro-op/macro-op fusion going
on, who knows.
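
In C, one way to get the same one-byte-read, one-byte-write pattern is a loop
over volatile byte pointers. A minimal sketch, assuming write mode 1 and all
planes enabled have been set up beforehand; latch_copy_bytes is a
hypothetical name. The volatile qualifier should keep the compiler from
merging the accesses into wider ones or replacing the loop with a block copy,
though the exact instructions generated are still up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count bytes, strictly one byte read and one byte write at a time.
   On VGA memory in write mode 1, each read loads the four latches and each
   write stores them; the value written is ignored by the hardware.
   On ordinary RAM this is just a plain byte copy. */
static void latch_copy_bytes(volatile unsigned char *dst,
                             const volatile unsigned char *src,
                             size_t count)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < count; i++)
        dst[i] = src[i];
}
```

Because the loop copies the byte it read, it behaves sensibly even when run
against normal memory, which makes it easy to test off the target machine.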
-----------------------------------------------------------------------------
From: PgrAm [April 1st, 2016, 12:03 PM]

Here's a more general blit function I've written in case anyone might find it
useful:

void vramBlitPage(unsigned int dest_page, int source_x, int source_y,
                 int dest_x, int dest_y, unsigned int width,
                 unsigned int height, unsigned int src_page) {
 if (!clipRectangle(source_x, source_y, dest_x, dest_y, width, height)) {
   return;
 }

 outpw(SC_INDEX, ((word)0xff << 8) + MAP_MASK); //select all planes
 outpw(GC_INDEX, 0x08); //index 8 = Bit Mask, value 0x00: write latches only

 unsigned int source_offset = (((uint32_t)source_y * (uint32_t)SCREEN_WIDTH + source_x) >> 2) + src_page;
 unsigned int dest_offset = (((uint32_t)dest_y * (uint32_t)SCREEN_WIDTH + dest_x) >> 2) + dest_page;

 for (int line = 0; line < height; line++) { //for each scan line
   for (int x = 0; x < width >> 2; x++) { //for each group of four pixels
     volatile char pixel = VGA[source_offset + x]; //read pix to load latches
     VGA[dest_offset + x] = 0; //write four pixels
   }

   source_offset += SCREEN_WIDTH >> 2;
   dest_offset += SCREEN_WIDTH >> 2;
 }

 outp(GC_INDEX + 1, 0xff); //restore Bit Mask (index 8 is still selected)
}

If you have any pointers on transparency, I'd love to hear them.
-----------------------------------------------------------------------------
EOF