*******************************************
* How to make raster effects on the NES ? *
*    from a game programmer's viewpoint   *
*               by Bregalad               *
*******************************************

History :

February 17th, 2009

Added a chapter about precise scanline operation
Added a chapter about switching nametables midframe (what have I been missing)
Some other minor changes

January 21st, 2009            

original release



In order to enhance the graphics available on the NES, it is necessary to do "mid-frame" effects, that is, changing the state of some graphics
registers while a frame is being rendered.
Because television standards render pixels like text, from left to right then top to bottom, it's possible to change parameters in the
middle of the screen so that the area above the split has some parameters and the area below has other parameters. Understanding this may
not always be easy, but it is crucial for getting good effects in games and for getting rid of some limitations the system normally
imposes on the programmer.

While many documents are available on the net to explain such behavior, many may turn newbie game programmers off as they are almost
always written from the emulator author's viewpoint, and never from a game programmer's viewpoint. I consider that knowing at what exact
clock cycle a latch inside the NES changes its state is not meaningful to game programmers, but knowing how to put such a behaviour to good
use is.

It is assumed that the reader already knows a bit about the NES architecture, PPU registers, tiles, nametables and a bit about scrolling.
If you don't know what I'm talking about you should first check other documents that cover basic graphic rendering for the NES.
It is not necessary to read the whole document to understand the concept. You can go straight to the end if the middle does not interest
you. This document ended up *much* longer than what I expected to write at first, but since I wrote it I don't want to just delete it.

Content list :
- About scanlines
- List of possible effects about
	* $2000
	* $2001
	* $2002
	* $2003/4
	* $2005
	* $2006/7
- Bankswitching CHR midframe
- Changing mirroring midframe
- Practical use
- Synchronizing with the PPU
- Writing timed code
- Low level scanline timing
- Warning about some evil instructions

---------------
About scanlines
---------------

I won't go into details here, because details are rarely relevant when making a game or tech-demo; they are for emulator writers. Some
documents I've found give headache-inducing information about how the NES renders scanlines. I took the time to rewrite that information so
that a simple guy wanting to write a game can easily figure out what's going on without spending too much time on useless details.

- The NES outputs 240 scanlines of 256 pixels to the television. Pixels are rendered left to right, and then top to bottom.

- After pixel 256 of a scanline, it takes 85 more "pixels" to go to the next line, while no image is rendered. This is called the HBlank.

- After scanline 240, it takes 20 (NTSC) or 70 (PAL) "scanlines" to go to the first line again. This is called the VBlank. The non-VBlank
  time is called "a frame".

- The whole process is repeated at the rate of 60 (NTSC) or 50 (PAL) times per second.

- Only during the VBlank period should you write to VRAM via $2006/$2007. The rest of the time the PPU reads the VRAM automatically
  to render its graphics. Using $2007 during that time will screw up its operation. This also applies to $2004 and $4014.

- For that reason it's only possible to write to VRAM (nametables and pattern tables) during a short period. An NMI can be configured to be
  triggered at the start of that period, so that a program can write potential updates to a buffer, and write the buffer to the actual
  VRAM in the NMI interrupt routine.

- Before the end of that VBlank period, but after all VRAM updates are done, you want to reset the scrolling by using $2005 (and possibly
  $2000) for the frame that is to come in the standard way. Any write to VRAM through $2006/$2007 will overwrite the scroll value.

- A forced blanking mode is available so that you can write to $2006/$2007 at any time, but no image is rendered (very useful when
  completely changing the screen).

-------------------------
List of possible effects
-------------------------

A "raster effect" is any graphical effect, that could be extremely simple or absolutely incredible, that changes the state of the
PPU between scanlines through registers. I refer to the point where the state is changed as a "split point". Also, it's possible to have
many split points. Up to 240 in theory, but more on that later.

The PPU has 8 registers, ranging from $2000 to $2007. I will mention what is possible for each register.

--------------------------------------
Mid-frame effects possible with $2000
--------------------------------------

- Changing nametable -

Bits 0&1 of $2000 are the 9th scrolling bits, so see the scrolling sections below for details. By changing bit $2000.0,
it is possible to change the nametable displayed at any time. You can get interesting effects with that.
For example if both nametables show similar data but there is a difference somewhere, you can do some effects at the place where the
graphics change, without altering the rest of the screen. You can also have the same nametable data in both tables, but different attribute
tables, and this enables fun effects with the color of objects on the screen, or transparency effects.
However, changing $2000.1 midframe has no direct effect. This implies that this kind of effect requires vertical mirroring.

- Changing patterns -

Bits 3&4 control which pattern tables are used for sprites and BG respectively. It is also possible to change them anytime in the frame, so
that the patterns used to draw the graphics change from one scanline to another. This can also lead to many effects.
For example if the patterns in both tables are close but with a different luminosity you can get transparency effects.
But the main purpose of this is to bypass the 256-tile limit for either BG or sprites, at the expense of the other.

- Changing sprite mode -

It is also possible to change bit 5 of $2000, so that the sprite size (8x8 or 8x16) is changed in the middle of the screen.
This is not really recommended, as it may have weird effects at the time you switch. In addition to this, you cannot get really awesome
effects out of it as far as I know. You could get 8x8 sprites for the status bar and 8x16 for the gameplay (or the other way around) and this
could be useful. If you do that be sure to test it extensively on the real hardware.

- Disabling interrupts -

Not really an effect, but by changing bit 7 of $2000 you can disable VBlank interrupts. You can do that anytime if you don't want your code
to be interrupted, as the "sei" instruction does not prevent VBlank NMIs from happening. However, by the time you write a value with bit 7
set to $2000 again, you may have missed an interrupt.
If the VBlank has not ended yet, an interrupt will start right after the write, and as a consequence the timing will not be the same as
usual. The VBlank updates may start later, and end too late.
So when in doubt, read $2002 before re-enabling interrupts to be sure that no interrupt will fire straight away.
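
As a minimal sketch of that advice (ppuctrl_copy is a hypothetical RAM variable of my own, assumed to hold the last value written to $2000,
since the register itself cannot be read back) :

   bit $2002          ;Clear a VBlank flag that may still be pending, so that no NMI fires right after the write
   lda ppuctrl_copy   ;Hypothetical RAM copy of the last $2000 value
   ora #%10000000     ;Set bit 7 = enable VBlank NMI
   sta ppuctrl_copy
   sta $2000          ;The next NMI will be the one of the next VBlank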

All other bits of $2000 are not useful to change midframe as far as I know.

-------------------------------------
Mid-frame effects possible with $2001
-------------------------------------

- Changing color emphasis -

Bits 0, 5, 6 and 7 of $2001 allow you to use gray-scale mode and color emphasis. They can be changed anytime in the frame and take effect
immediately. Not only can you get some graphic effects with that, but it is very useful to see how much time a routine you wrote takes.
You can see how busy the CPU is by setting the grayscale bit while it is busy, and clearing it afterwards. By seeing how large the gray
band is, you can instantly draw your conclusions.
You can get great transparency effects by setting both grayscale and emphasis. Also, it is not only possible to change these bits mid-frame,
but mid-scanline, so that the left part of the screen is gray and the right part is normal. Unfortunately the CPU is much slower than the
PPU, so the edge of the gray bar will be shaking a lot and you cannot get any precision in doing that.
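
A minimal sketch of that profiling trick, assuming rendering is normally turned on with the value %00011110 in $2001, and where MyRoutine
is a hypothetical name for the routine you want to measure :

   lda #%00011111     ;Same value as usual, but with bit 0 (grayscale) set
   sta $2001
   jsr MyRoutine      ;Hypothetical : the code you want to measure
   lda #%00011110     ;Back to normal colors
   sta $2001

The height of the gray band then shows roughly how many scanlines the routine took.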

- Changing sprite or Background enable -

Bits 3&4 of $2001 can be changed anytime to display or hide background and sprites. They take effect immediately.
As long as you only hide the background or only the sprites, this can lead to interesting effects (such as disabling sprites on your status
bar). However, if you set both of those bits to '0', the PPU will stop functioning as usual, and will enter its "Forced VBlank" mode.
Normally, only the background color (the color you write at $3f00) is seen in this mode.

- Forced VBlank mode -

This mode allows the screen to be "blanked" constantly, and normally when starting your program the first thing you do without knowing is
entering this mode by writing $00 to $2000 and $2001. You can write to $2006 and $2007 as if it was in VBlank. By cleverly blanking
areas of the screen while leaving other areas enabled, you gain time for heavy pattern/name table uploads in order to get some effects. It
is also possible to rewrite the palette that way. But be warned that if you set $2006 to address a palette, that color will be seen on the
screen instead of the background color. This can lead to interesting effects, but can be undesirable if you only want to change the
palette. Also, blanking the screen is the *only* way to change the palette midframe. You cannot get a lot of colors on the same screen
unless you blank the screen midframe.
If you ever do this kind of effect, be sure to extensively test on the real hardware.

Important : While in forced blank mode, the scrolling counters are not updated, the OAM DRAM is not refreshed, and the Palette DRAM is not
refreshed either. As a consequence, when you re-enable the rendering, the scrolling will be lost and set to some unpredictable
values as you wrote to $2006 (see below for details).
Another important thing is that the PPU still sends a background color to the screen and continues to execute VBlank interrupts normally in
that mode, even if the screen is constantly "blanked".
The only way to directly render a proper frame when re-enabling the screen is to enable VBlank NMI (but *not* enable the rendering), and
set proper scroll values, OAM and palette in this NMI, along with turning normal rendering on via $2001.
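
A rough sketch of the body of such an NMI, under a few assumptions of mine : the sprite buffer lives at $0200-$02ff, PaletteBuffer is a
hypothetical 32-byte RAM buffer, and the frame should start at the top left of the first nametable. Saving and restoring the registers is
left out to keep it short :

   bit $2002          ;Acknowledge the NMI and reset the $2005/$2006 write toggle
   lda #$00
   sta $2003
   lda #$02           ;Assumed : sprite buffer at $0200-$02ff
   sta $4014          ;Sprite DMA rewrites the whole OAM
   lda #$3f
   sta $2006
   lda #$00
   sta $2006          ;VRAM address = $3f00, start of the palette
   ldx #$00
-  lda PaletteBuffer,x   ;Hypothetical 32-byte buffer
   sta $2007
   inx
   cpx #32
   bne -
   lda #%10000000     ;NMI stays enabled, first nametable (the other bits are an assumption, use your own)
   sta $2000
   lda #$00
   sta $2005          ;Horizontal scroll = 0
   sta $2005          ;Vertical scroll = 0, written *after* the $2006/$2007 accesses
   lda #%00011110     ;Turn BG and sprites on
   sta $2001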

-------------------------------------
Mid-frame effects possible with $2002
-------------------------------------

$2002 is a read only register, so you can't directly do effects with it. However, reading it is key to doing mid-frame effects.

- VBlank flag -

$2002.7 tells you if you are in VBlank. This flag is set to '1' when a VBlank period starts (at scanline 240), and it is cleared to '0' when
the VBlank period stops *or* when you read the $2002 register. As a consequence, if you read a '1' from this flag, it will automatically
read '0' the next time. This flag is used to decide whether an NMI fires (if enabled via $2000.7), and reading it acknowledges the NMI. You
must read $2002 in your NMI routine to acknowledge it.
In theory, you could have many NMI sources and if $2002.7 is clear on the NMI you know it's from another source. However the only other
possible source is the almost never used expansion port on the bottom of your NES. So it is usually assumed any NMI is triggered from
$2002.7 without any further check.

Reading $2002 at the exact time of a VBlank start will clear the flag *and* return a '0' in bit 7, implying you missed a VBlank. For that
reason if you don't want to randomly miss frames, you should not poll $2002, but use NMIs instead.

- Sprite zero hit -

$2002.6 is a very interesting flag. This flag is always cleared at the start of a new frame (at the end of the VBlank), and is set as soon
as we reach a non-transparent pixel of sprite zero that hits a non-transparent pixel of the background. Sprite zero is the first
sprite found in OAM. By placing the sprite at the right place, you just have to check this flag by reading $2002.6. By checking regularly,
when you detect a '0' to '1' transition, you know the rendering is exactly where you placed your sprite zero, and you can start doing
effects (more on that later).

The only problem is that you can only get one sprite zero hit per frame. So you can only synchronize your code with the PPU 3 times per
frame : once with the VBlank NMI, and a second time with the sprite zero hit. You can also synchronize by detecting the '1' to '0'
transition of this flag to know when a VBlank is about to end, but it's not that interesting, as you normally already synchronized on the
beginning of VBlank.

At least one pixel should "hit" so that this flag becomes true. If no pixel hits, it will remain clear forever, and in some cases, this
will lead your program to freeze, which is a *bad* thing. Also I'm pretty sure that the hit shouldn't happen on the far right of the
screen, else it will be ignored for some reason. It shouldn't happen inside the left-most clipping area either if left-most clipping is
enabled via $2001 (for either BG or sprites), because there is no hit there. It's important to respect those rules. It's up to your
imagination to hide the sprite zero so that it's not visible, or barely visible.

- Sprite overflow flag -

$2002.5 is also interesting. It is cleared at the beginning of each frame, and it is set as soon as more than 8 sprites are met on the same
line. As you may know, when more than 8 sprites are supposed to be seen on the same line, the NES discards the lower priority sprites,
and they won't be visible.
When that happens, this flag is set, and once set it remains set until the end of the frame. You could use that flag to synchronize with
the PPU, for example you can put 8 sprites on the bottom of the status bar (which is at the top of the screen) on the same line, and rely
on a '0' to '1' transition to synchronize with the PPU without a sprite zero hit. However, I've heard that this flag is unreliable, though
I don't really know why. So if you do something like that, you must test extensively on the real hardware.
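
A polling loop for this flag could look like the sketch below. It is assumed the flag is clear when entering the loop, and as with
the sprite zero hit, the program freezes if the overflow never happens :

   lda #%00100000     ;Mask for bit 5 of $2002
-  bit $2002          ;The Z flag is set when (A and $2002) is zero
   beq -              ;Loop as long as bit 5 is still clear
                      ;When the CPU exits the loop, the PPU has met the overflowing line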

- Abusing the 8 sprite per line limitation -

Not really an effect with $2002, but something interesting while we're on the topic. If you want to disable some sprites on some scanlines,
but don't want to disable all sprites, or if you just don't want to have code that synchronizes with the PPU, or both, you can
actually rely on the 8 sprites per line limitation to hide your sprites. However, many people will play your game/demo with the sprite
limit disabled on their emulators as this limit is often more annoying than useful, so you'd have to mention it in your documentation.

-------------------------------------------
Mid-frame effects possible with $2003/$2004
-------------------------------------------

If there are any registers on the NES which are really obscure even many years after the console was first reverse-engineered, it's without
a doubt $2003 and $2004. You can most likely not get interesting effects by writing to $2003 and $2004 during the frame, and it will
probably have no effect or just add glitches to your sprites, but I can't give a 100% guarantee on that.

However, while $2003 is write-only, $2004 can also be read. During VBlank this allows you to read back what you wrote in OAM, but I really
see no point in doing that. During the frame, the PPU will have to read sprite data from OAM, and while doing that, the data read is also
mirrored in $2004. Unfortunately, the CPU is much slower than the PPU, and the chances of putting that to a good use are slim.

It has been recently discovered that by using many sprites made of all $00 (using tile zero, color zero, all flags to zero, and X position
to zero) on the same Y position, you can synchronize with the PPU by reading $2004 and looping until you read the value $00. However I'm
not sure how reliable this is.
You could get false positives (read the value $00 before expected), so to prevent that you want to make sure all other sprites are never
placed at X=0, are never using tile $00, and that if their color is zero then another flag should be turned on (vertical flipping,
horizontal flipping or behind background), so that all values for other sprites are never $00. In addition to that, you may also loop until
two consecutive $00 are read from $2004, but I'm not exactly sure about that.

Finally, as this has been discovered recently, no emulator, even the most accurate ones, really implements this properly yet. Most
emulators will just read OAM back via $2004, even during the frame, so you'll get only a small delay before you think the PPU has reached
the point where your zero sprites are placed. How many sprites are needed depends on the loop that polls $2004 and on whether you're on
NTSC or PAL, I guess, so you should just try some values and see what happens.

This could also work with another value than $00, but that value should not change when and-ed with $e3, as the flags are not all
implemented in OAM, and writing $ff to it and reading it back will read $e3. Also, other values would be harder to avoid for other sprites.
But for $00, any sprite at X position $00 is useless if left-clipping is turned on, so the only true limitation is that when using color
$00, another flag has to be turned on.

If you do any tricks using $2003 or $2004, you want to test it extensively on the real hardware.

-------------------------------------
Mid-frame effects possible with $2005
-------------------------------------

- Changing horizontal scrolling -

Writing to $2005 midframe will, simply put, change the horizontal scrolling. Writing a second time to it will *not* change the vertical
scrolling, it will get ignored. Writing a third time will change horizontal scrolling again, etc... You can read $2002 to make sure your
next write is the horizontal one. Contrary to popular belief, after a horizontal write there is absolutely no need for a second dummy
$2005 write, it works perfectly with only one write. This can lead to many interesting effects.

Combined with $2000.0, it is possible to change the scrolling to any horizontal position possible.
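
A sketch of such a horizontal split, to be placed right after a synchronization point. scroll_x and ppuctrl_copy are hypothetical RAM
variables of yours : the low 8 bits of the scroll, and the value for $2000 whose bit 0 is the 9th bit :

   bit $2002          ;Make sure the next $2005 write is the horizontal one
   lda scroll_x
   sta $2005          ;Low 8 bits of the horizontal scroll
   lda ppuctrl_copy
   sta $2000          ;Bit 0 = 9th bit of the horizontal scroll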

- Unwanted effects -

Writing to $2005 mid-scanline can cause undesirable glitches on the screen. In order to change the scrolling without any glitches, you want
to experiment by trial and error, adding or removing some NOPs before your $2005 write, and fine-tune your timing until you get a setup
with no glitches.
Read the new section about scanline timing if you want details.


-------------------------------------------
Mid-frame effects possible with $2006/$2007
-------------------------------------------

- Set scrolling to a fixed value -

$2006 writes are much more interesting. Most other docs say complicated things about it, because emulating it is complicated, but on the
programmer's side it's really simple.
When you want to write something to the PPU memory, you write an address to $2006. Writing an address during the frame will simply make the
corresponding name table tile show up on the left of the next scanline.
This allows you to change both vertical and horizontal scrolling, but with a resolution of 8 pixels. For example, if you write $21 then $40
to $2006, the scrolling will be affected in a way so that the tile at address $2140 will immediately be shown on the far left of the next
scanline. You don't even have to compute which scroll value this corresponds to, and this is really useful.
Note : From now on the "tile-based" 8 pixel resolution scroll will be called the coarse scroll, and the lower 3 bits of the scroll, which
scroll within a tile, will be called the fine scroll, for both axes.
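
The $2140 example, written as code. The address is simply $2000 + 32 * row + column of the wanted tile, here row 10 and column 0 of the
first nametable. Where exactly in the scanline these writes should land is not shown here, it has to be found by trial and error :

   lda #$21
   sta $2006          ;High byte of the address
   lda #$40
   sta $2006          ;Low byte : the tile at $2140 will be on the far left of the next scanline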

- Fine scrolling -

This is a rather odd feature. Bits 12 and 13 of the address you write (bits 4 and 5 of the first byte) allow you to increase the vertical
resolution. In the example above the mentioned tile will be shown, but will be scrolled 2 pixels down, because those 2 bits say '2'. If
you wrote $01 then $40, the same tile would be shown, but it won't be scrolled, it will be shown from the top. It's then impossible to
scroll more than 3 pixels down that way : writing $41 then $40 has the same effect as writing $01 then $40, it won't scroll 4 pixels down
(you could say bits 14 and 15 of the address have no effect at all, which is nothing new as PPU addresses are 14 bits).

Because of this fine scroll feature, there is no way you can trick the PPU and use pattern table data as name table data by writing an
address that is between $0000 and $1fff, it will point to nametables with a different fine scroll value (it's such a shame).
However, you can trick the PPU by using attribute table data as name table data, by writing an address between $x3c0 and $x3ff. Sadly,
it will also be used as attribute at the same time, which is not practical. By setting all 4 background palettes to be identical, it could
become more useful.

- Undesirable glitches -

Writing to $2006 mid-scanline can cause undesirable glitches on the screen. In order to change the scrolling without any glitches, you want
to experiment by trial and error, adding or removing some NOPs before your $2006 writes, and fine-tune your timing until you get a setup
with no glitches. Sometimes it even seems hard to come up with no glitches at all (also many emulators won't show glitches when they
actually are there, you must test under an accurate emulator or the real hardware).
Read the new section about scanline timing if you want details.

- Multidirectional scrolling -

By writing to $2006, $2006 then $2005, you can set the screen to a known vertical position, and then set the desired horizontal scroll. If
you do that, I recommend that you write a horizontal position with $2006 that is as close as possible to its $2005 counterpart, and then
write the $2005 scroll so that you get rid of the 8-pixel resolution limit. This seems to avoid glitches, although I'm not exactly sure why.
A $2000 write is never needed when doing that, as you choose the used nametable with $2006.

If you do not do a write to $2005, the fine horizontal scrolling seems to be unaffected by $2006 writes. For example if you write $21 then
$40 to $2006, and the previous horizontal scrolling value written to $2005 was $34, the tile at address $2140 will be shown on the
left-most border on the next scanline, but the tile will be scrolled 2 pixels down (as mentioned above) and 4 pixels right (because the
fine scrolling is not affected by the $2006 writes).
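
A sketch of the $2006, $2006, $2005 sequence described in this section. split_addr_hi, split_addr_lo and scroll_x are hypothetical
variables of yours. Following the recommendation above, the low 5 bits of split_addr_lo (the tile column) should match scroll_x
divided by 8 :

   lda split_addr_hi
   sta $2006          ;Nametable, fine vertical scroll (0-3) and top bits of the tile row
   lda split_addr_lo
   sta $2006          ;Rest of the tile row, and the tile column
   lda scroll_x
   sta $2005          ;Horizontal scroll, which brings the fine scroll back in

After these 3 writes the PPU is waiting for a "second" write, so read $2002 before the next $2005 or $2006 sequence.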

- About $2007 -

As far as I know accessing $2007 during the frame is a *bad* idea and will only introduce glitches in your graphics. Writes will *not* have
any other effect than glitching your graphics, even during HBlank. You must use the "forced blanking" mode mentioned above, even if it's
for a very short time. For reads I have no idea how reading $2007 could be useful, maybe there is an obscure behavior here that hasn't been
discovered yet. Do not rely on any emulator to behave correctly when it comes to $2007 access during the frame.

--------------------------
Bankswitching CHR midframe
--------------------------

If you use a mapper with CHR bankswitching, it's possible to bankswitch your CHR ROM or CHR RAM midframe so that you can get rid of the
256 tiles limit for both the background and sprites, without "eating" space of the other as you would do the "old" way by using $2000.

In addition to having more tiles, you can also bankswitch similar but different patterns in order to get transparency effects or things like
that. Or you could bankswitch a CHR bank which contains all zeroes to simply disable some backgrounds or some sprites, but not all of them.

---------------------------
Changing mirroring midframe
---------------------------

If you use a mapper with mirroring control, it's possible to change the mirroring midframe.
The main use of this is to have the play field in one nametable and the status bar in the other nametable (with 1-screen mirroring) but it
could have other uses as well.

-------------
Practical use
-------------

You get nothing for nothing

While most of those effects are cool, it is very impractical to do full screen effects in a game, simply because it would take 100% of the
CPU time, leaving no time to do any game logic. Also, it's only possible to synchronize with the PPU 3 times per frame under standard
conditions (VBlank, end of VBlank, sprite zero hit), and the latter two rely on the fact that there is a sprite zero hit, and are not
suitable for all cases. To get more than one split point per frame, you'll have to write timed code from one synchronized point, and while
this can be fun it's tedious to do.

The NES by itself only has 2 IRQ sources, DMC and APU frame. Neither is synchronized with the PPU (both are timed by the APU), and both are
mostly useless. The APU frame IRQ will probably never ever be useful, because it triggers at a rate close to the VBlank NMI, but is
unsynchronized with the PPU.

DMC IRQ on the other hand can be useful to free CPU time : You can play a silent DMC sample, and an IRQ will trigger when it ends. You can
make the sample so that it starts during VBlank, and ends above your sprite zero hit. In the IRQ interrupt, acknowledge the IRQ and poll
the sprite zero flag. There will be a very large jitter window in which the IRQ might happen, so you can only use it as a tool to free CPU
time that would be wasted waiting for the hit, you can't use it for synchronizing in itself.

External IRQs (from the cartridge) are the most useful, but you have to use a mapper that supports them. With such a mapper,
it's much less tedious to get effects, as you can synchronize with the PPU more easily under some conditions.

-------------------------
Synchronizing with the PPU
-------------------------

As mentioned above, you'll need a reliable source of synchronisation. The easiest to use is the sprite zero hit, that you have to set up in
a way so that it does collide with the background. Remember that all color #0 pixels are considered transparent, no matter if this is a BG
or sprite tile. A non-transparent BG pixel has to collide with a non-transparent sprite #0 pixel so that the flag at $2002.6 gets set to
'1'. Once set, it is only cleared to '0' at the end of VBlank, so at scanline 0.

Synchronizing at VBlank :

NMI		;Pointer at $fffa points here
   pha
   txa		;Never go on without first saving the regs
   pha
   tya
   pha
   bit $2002	;Acknowledge the interrupt
   ......	;You know you're at scanline 240 here if the 'N' flag is set (it normally always will be unless another NMI source exists)
   ......
   pla		;Restore the regs in reverse order : Y first, then X, then A
   tay
   pla
   tax
   pla
   rti		;Depending on your coding style, you may or may not still be in VBlank here

Synchronizing at scanline 0 by detecting a '1' to '0' transition :

		;It is assumed the sprite zero hit flag is already set here
-  bit $2002
   bvs -
  		;When the CPU goes outside of that loop you know we're at scanline 0

When the flag has a '0' to '1' transition, we know we're at the location of the sprite 0 :

		;It is assumed the flag is already clear here
-  bit $2002
   bvc -
   		;When the CPU exits the loop, we know we're at the location of the first pixel that makes the collision

Because a bit / bvc loop takes 7 clock cycles to complete, you can exit the loop anywhere in a 7 clock cycle window without moving the
sprite zero at all. This is normal, and you'll have to deal with that. An IRQ or NMI can also trigger within a 7 clock cycle window, as
some 6502 instructions take up to 7 clock cycles to complete (although they are rare).

------------------
Writing timed code
------------------

If you want more than one split point, it will be necessary to write some timed code. This will allow cool things like changing the state
of some registers each scanline (not just once) to produce cool effects.
You will just have to remember a few points when writing timed code :

- Each instruction takes a certain amount of clock cycles, you should look them up in tables to figure this out.
  There is a good detailed table available at http://6502.org/tutorials/6502opcodes.html, but it's not the only one.
- An NTSC scanline is exactly 113 + 2/3 clock cycles
- A PAL scanline is exactly 106 + 9/16 clock cycles
- You will have to make a loop that continuously writes to a register, where each write is separated by the number of c.c. mentioned above
- A PAL instruction takes, relative to the PPU, 16/15 of its NTSC counterpart.


- To handle the fractional clock cycles -

A convenient way of doing it is to have something like this in each iteration of the loop :

	lda var
	clc
	adc #$ab
	sta var
	bcs +		;This instruction takes 2 or 3 cc., almost exactly 2 + 2/3 clock cycles on average, for NTSC timed code
+	....

	lda var
	clc
	adc #$90
	sta var
	bcs +		;This instruction takes 2 or 3 cc., exactly 2 + 9/16 clock cycles on average, for PAL timed code
+	.....

Var does not need any initialisation but should not be touched by the rest of the loop. This is not the *only* way to do this, but I
find it's one of the most convenient. Feel free to come up with your own ideas in your game/demo.
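
To put the points above together, here is a sketch of a complete NTSC loop that writes a new horizontal scroll on each scanline. It is only
an illustration under my own assumptions : ScrollTable (a table of scroll values), NumLines (the number of scanlines to process) and the
label ScanlineLoop are hypothetical names, var is in zero page, and no branch or table access crosses a page boundary. Everything but the
bcs adds up to 111 cc., so one iteration takes 113 or 114 cc., and 113 + 2/3 on average :

   ldy #$00
ScanlineLoop
   bit $2002          ;4 cc. Make sure the next $2005 write is the horizontal one
   lda ScrollTable,y  ;4 cc.
   sta $2005          ;4 cc.
   lda var            ;3 cc.
   clc                ;2 cc.
   adc #$ab           ;2 cc.
   sta var            ;3 cc.
   bcs +              ;2 or 3 cc., the fractional part
+  ldx #15            ;2 cc.
-  dex                ;15 * 2 = 30 cc.
   bne -              ;14 * 3 + 2 = 44 cc.
   nop                ;2 cc.
   nop                ;2 cc.
   nop                ;2 cc.
   iny                ;2 cc.
   cpy #NumLines      ;2 cc.
   bne ScanlineLoop   ;3 cc. when taken

For PAL the same structure works with adc #$90 and a delay shortened so that everything but the bcs adds up to 104 cc.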


- To adjust the starting point of the loop -

Often after a sprite zero hit (or an IRQ) you do not want to immediately start the loop (or even a single split) doing mid-frame writes to
the PPU registers. The reason is that you're likely to get glitches, and by finding a way to make it happen later or sooner, by trial and
error, you will be able to get rid of the glitches. There is no instruction that takes one single c.c., but nop takes 2 and lda zeropage
takes 3. So in order to adjust the loop you'd want to add something like this :

nop
nop          ;Takes 4 cc., but too early
.....

nop
nop
nop          ;Takes 6 cc., but too late ?
.....

lda $ff      ;Dummy read that will not matter (3 cc.)
nop          ;Takes 5 cc. in total -> get more precision and draw your conclusions
......


- Longer delays -

If you want longer delays than just a few cycles (like if the sprite zero is always some scanlines above the desired split) you do not want
to have a long useless chain of nops. Instead just write a loop like this :

   ldx #Constant
-  dex
   bne -

And adjust your constant by trial and error. Increasing the constant by 1 means 5 cc. more delay. You must follow this loop with the nop
technique above to fine-tune the timing within 5 c.c. For those who like a more theoretical approach, you have to find the # of cc. to
wait, and divide it by 5. The remainder of the division is the # of cc. to wait after the loop :
2 cc = 1 nop, 3 cc = 1 lda zeropage, 4 cc = 2 nops. If the remainder is 1, decrease the loop count by one and add 3 nops.

Also, when converting from NTSC to PAL, multiply the constant by 15/16. When converting from PAL to NTSC, multiply the constant by 16/15.
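
As a worked example with an exact count : ldx takes 2 cc. and each pass of the loop takes 5 cc., except the last one where bne is not taken
and takes 2 cc. instead of 3. The whole loop therefore takes 5 * Constant + 1 cc. (assuming the branch does not cross a page). The extra
cycle is easily absorbed by the trial and error, but it matters if you count by hand. For a delay of 38 cc. for instance :

   ldx #7             ;2 cc.
-  dex                ;7 * 2 = 14 cc.
   bne -              ;6 * 3 + 2 = 20 cc.
   nop                ;2 cc. -> 2 + 14 + 20 + 2 = 38 cc. in total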

------------------------------------
Low level scanline timing
------------------------------------

This complicated chapter is only for people who really want to understand the low level things, or for those who really can't get rid of
graphical glitches in their raster effects.

Like I already said above, the PPU renders scanlines of 256 visible pixels followed by 85 "pixels" of HBlank, 341 in total. Each PPU clock
cycle, a "pixel" is rendered.
For a NTSC NES, a CPU clock cycle is 3 PPU clock cycles long. For a PAL NES, 5 CPU clock cycles are 16 PPU clock cycles long.

What is tricky to understand is how exactly the PPU fetches data. I won't give any details as there is another document that describes
all the details. However remembering the following can help :
- The PPU fetches nametable, attribute table and background pattern table data for the current scanline during cycles 0-255
- The PPU fetches OAM and sprite pattern table data for the following scanline during cycles 256-322
- The PPU fetches nametable data again for the following scanline during cycles 322-341

How they decided which of those cycles is cycle 0 is beyond me (since this repeats over and over) but that's how Nintendulator's debugger
counts them. It would have made more sense to make cycle 256 the first one I think.

So when updating scrolling registers for the background, you get a window of 66 PPU cycles (cycles 256-322, that is 22 CPU cycles on NTSC) where
you can freely update the counters without any risk of glitches. The good news is that the PPU fetches some 36 tiles, when only 31-33 are
actually visible. So this makes the actual glitch-free window larger.

Another thing to understand is jitter. A loop that waits to sync with the PPU takes several clock cycles per pass, so the moment where your
code actually notices the event can be off by up to that many cycles. This uncertainty is called jitter, or a timing window.
Just remember that :
- a bit $2002/bvc loop, or an interrupt over random code, gives 7 cycles of jitter (= 21 pixels NTSC, about 22 pixels PAL = 3 tiles)
- an interrupt over a lda zeropage/branch loop or a jmp here loop gives 3 cycles of jitter (= 9 pixels NTSC, about 10 pixels PAL = 2 tiles)

So if you really want to reduce jitter, synchronize with the NMI interrupt instead of a sprite zero hit, and make sure the idle loop only
uses short 3-cycle instructions. I don't know of any way to reduce jitter even further, but maybe there is one.
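
A minimal sketch of the two ways to sync compared above. The label Idle is made up; the sprite zero hit flag is bit 6 of $2002, which the
bit instruction copies into the V flag :

;Sprite zero hit polling : 7 cc. per pass, so up to 7 cc. of jitter
-  bit $2002       ;4 cc.
   bvc -           ;3 cc. as long as the hit has not happened yet

;Idle loop waiting to be interrupted by the NMI : up to 3 cc. of jitter
Idle:
   jmp Idle        ;3 cc.

With the second form, the timed code has to run from the NMI handler, since the main program never leaves the idle loop on its own.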

One thing which is important to understand is that on cycle 256, the PPU increments its internal row counter, and the coarse horizontal
scrolling gets reloaded.
There are basically 2 "correct" ways to update the scrolling registers glitch-free (or almost glitch-free) : Either you always do it before
cycle 256 or always after. If your writes are sometimes before and sometimes after this cycle because of the jitter, you will get some shaking
graphics. Just remember the following points :

- If you write before cycle 256, the fine vertical scroll will be incremented and the coarse horizontal scroll will get updated just after
  your write. If you're too early, glitches may appear on the right side of the screen. This explains the glitches above Shadow Man's face
  in Mega Man 3 for example : They did a scrolling write WAY too early

- If you write after cycle 256, the coarse horizontal scroll won't get updated before the next scanline. If you're too late, glitches may
  appear on the right side of the screen.

- Only the second $2006 write takes effect, the first write is just buffered and can be done "too early" without this causing any problems.

If you only write to $2000 and $2005, you should do it the "before" way to avoid glitches. If you write to $2006 however, it depends on your
needs and tastes.
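
A sketch of how the buffered first $2006 write can be put to use. AddrHi and AddrLo are made-up variables holding a PPU address computed
earlier, the write toggle is assumed to be in its first-write state (for example after reading $2002), and the delay in the middle is
tuned with the delay techniques described earlier :

   lda AddrHi
   sta $2006       ;First write : only buffered, can safely be done early
   ldx AddrLo      ;Load the second value in advance
   ;... timed delay goes here ...
   stx $2006       ;Second write : this is the one that takes effect

This way only a single 4 cc. store has to land at the right moment.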

-------------------------------------
List of possible mid-scanline effects
-------------------------------------

It's possible to do multiple writes to the same registers during a single scanline to achieve even further effects. The list is quite short,
and all of them are hard to pull off because they need really precise timing and you have to deal with at best 2 tiles of jitter (sync with NMI
with a 3-cycle instruction idle loop, no sprite zero hit). As far as I know only 2 emulators emulate them properly (Nintendulator, Nestopia), and
this will require you to test extensively on real hardware before using them.

$2000 : You can change which pattern table is used for BG mid-scanline in order to bypass the 256 tile limitation.
You need a zone of at least 2 tiles which are shared by both pattern tables for each switch (else it will look glitchy). As far as I know only the
game "Marble Madness" did that (when displaying text over the playfield).

$2001 : You can turn the grayscale and color emphasis bits on and off any time you want (along with the left clipping - not very useful). As far
as I know only the game "Final Fantasy" did that (when you light an orb).

$2005 : You can change fine scrolling (low 3 bits) anytime. You need a zone of at least 2 tiles of a single uniform color for each change
(else it will look glitchy). The coarse scrolling only gets loaded at cycle 256. As far as I know this was never used intentionally.

$2006 : You can force the PPU to fetch any tile any time, but as long as there is jitter this will never be useful. If one could ever somehow
get the jitter window to fit inside a single tile, it might become possible to do CRAZY effects with that, simulating multiple background layers.

Other : It's possible to bankswitch CHR-ROM mid-scanline, in the same way as you can change the pattern tables with $2000. As far as I know only the
game "Mother (J)" did that (when opening the menu).
If you want to be evil, you could try to do constant bankswitches between BG and sprite fetches, so you could use 512 tiles for sprites (using
8x16 mode) and 256 more tiles for BG. The MMC5 does that automatically, but I don't know how well it'll work with another mapper - I haven't tested it,
so if you attempt this be sure to test extensively on real hardware.

Other (bis) : For mappers that support selectable nametable mirroring, it is possible to switch which nametable is displayed mid-scanline. Again
you'll need a zone of at least 2 tiles which are identical between the two nametables. You CAN'T do that with $2000.0 or $2000.1, it WON'T work. To
do that you'll need 1-screen mirroring, or to switch between H and V mirroring while displaying from the table at $2400 or at $2800, or anything else
that allows the PPU to fetch tiles from a different part of memory while reading the same nametable address.
As far as I know this effect was never used intentionally.
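
As an illustration of the $2000 entry above, a mid-scanline switch boils down to two precisely timed stores. This is only a sketch : the
values assume the usual $2000 layout (bit 7 enables NMI, bit 4 selects the BG pattern table), and the delays have to be found by testing
on real hardware.

   lda #%10010000  ;NMI enabled, BG uses the pattern table at $1000
   ldx #%10000000  ;NMI enabled, BG uses the pattern table at $0000
   ;... wait until the PPU reaches the zone of shared tiles ...
   sta $2000       ;Switch to the pattern table at $1000
   ;... wait until the next zone of shared tiles ...
   stx $2000       ;Switch back to the pattern table at $0000

The same skeleton works for $2001, for example with #%00011111 (rendering on, grayscale on) and #%00011110 (rendering on, normal colors).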

------------------------------------
Warning about some evil instructions
------------------------------------

Some instructions take a variable number of clock cycles, depending on whether a "page boundary" is crossed (that is, if the high byte of the
destination address changes).

For example :
ldx #$00
adc $6ff,X	;Argument is located at $6ff : takes 4 clock cycles
inx
adc $6ff,X	;Argument is located at $700 : takes 5 clock cycles instead of the normal 4

Also :
$efff:	lda Var
$f001:	bmi $efff	;If the branch is taken, BMI takes 4 cycles here instead of the normal 3 (it goes from page $f0 back to page $ef)

Be sure to avoid such instructions, and if you use them, find a way to make sure they never (or always) cross a page boundary so that
your timing is not screwed up.
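
One way to guarantee that a timed loop never crosses a page boundary is to start it on a page boundary. The sketch below assumes your
assembler has an align directive (the exact spelling varies between assemblers, so check its documentation). The label Delay is made up,
and the routine is meant to be called with jsr so that the padding bytes are never executed :

   .align 256      ;Assumed directive : pad up to the next $xx00 address
Delay:
   ldx #Constant   ;2 cc.
-  dex             ;2 cc.
   bne -           ;The whole loop sits at the start of a page : always 3 cc. when taken
   rts             ;6 cc.

Remember to count the jsr (6 cc.) and the rts (6 cc.) in your delay.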

Also, the DPCM channel must be inactive (turned OFF via $4015.4) when doing such effects, else your timing will be screwed up as well, because
the CPU is halted for a few cycles each time a sample byte is fetched.
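
A minimal sketch of turning DPCM off, assuming the 4 other sound channels should stay enabled (adjust the low 4 bits to whatever your
sound engine expects) :

   lda #%00001111  ;Bit 4 clear = DPCM off, bits 0-3 set = squares, triangle and noise enabled
   sta $4015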

----------
Conclusion
----------

I hope this doc helped people who want to write a program that does some complicated graphics tricks on the NES but do not want to go
through the headache of the unnecessary stuff. It covers most aspects of raster effects from the practical/coder's side and not from the
emulator author's side. It explains how to exploit the effects, not what they are due to. I hope this has been useful to you.

I'd like to thank all active members of NESdev, as without them I wouldn't have the knowledge to write this, even less to understand the
"headache documents" for emu authors. (I hope this one wasn't too much of a headache for you).
If you have any questions or comments, please contact me via the Nesdev BBS on http://nesdev.parodius.com/bbs