View Full Version : Genesis ROM speed
kool kitty89
05-18-2011, 02:27 AM
Exactly how fast was the ROM used in Genesis games?
I know the PCE consistently used ROM fast enough for zero wait states from the CPU (better than 140 ns), but that was exceptionally fast ROM for the time and well beyond the speed for commodity mass-market ROM. (which was in the 400-500 ns range towards the end of the 80s and very early 90s)
I also got the impression that at least some early MD games used very slow ROMs (~520 ns, barely fast enough for the CPU to access without wait states), but now I realize that might not make sense given the speed at which memory is accessed with vblank DMA for block transfer to VRAM.
The VDP does block transfers at about 2.6 or 3.3 MB/s, but that means it's reading and writing that much, so it's actually accessing memory 2x as fast (presumably at the clock rate of the VDP, 5.37 or 6.71 MHz, though it isn't doing block transfers continuously for the entire length of a scanline in vblank, so the actual transfer rate is slightly lower). So that would mean the VDP is accessing memory every 186 or 149 ns, but since it would only ever access ROM every other cycle, it should be possible to configure things such that ROM would only need to be capable of 372 or 298 ns. (but that's still significantly faster than the 520 ns claimed to be used for early MD games)
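A quick way to check that arithmetic is a minimal Python sketch. It assumes, as the post does, that the VDP makes one access per pixel clock and only touches ROM on every other access; the function and variable names are made up for illustration.

```python
# Rough check of the VDP access timing discussed above.
# Assumption (from the post, not verified hardware behaviour): one access
# per pixel clock, with ROM read only on every other access.
PIXEL_CLOCKS_MHZ = (5.37, 6.71)  # the two VDP clock rates quoted in the post


def period_ns(freq_mhz):
    """Length of one clock period in nanoseconds."""
    return 1000.0 / freq_mhz


for mhz in PIXEL_CLOCKS_MHZ:
    access = period_ns(mhz)      # time between consecutive VDP accesses
    rom_window = 2 * access      # ROM is only read on every other access
    print(f"{mhz} MHz: access every {access:.0f} ns, ROM window {rom_window:.0f} ns")
```

Under those assumptions the ROM window comes out at roughly 372 ns for the slower clock and 298 ns for the faster one.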
However, those are the access speeds needed if the VDP's block transfer logic can only make 8-bit accesses, but if it could make 16-bit reads (followed by 2 consecutive 8-bit writes to VRAM) or if there were an external 16-bit bus latch employed for a similar purpose, it should have been possible to get by with ROM that was half that speed again. (so 520 ns would be acceptable for either VDP speed)
Otherwise, Sega would definitely have had to be using faster ROM to allow use of DMA at those speeds.
Chilly Willy
05-18-2011, 04:55 AM
The default rom space /DTACK always results in four 68000 clock cycles unless held off by something like DMA or the 32X. In that period, you can access either a byte or a word.
200ns is the absolute minimum speed you can use in the MD without getting graphical corruption from failing DMA transfers.
EDIT: according to Eke-Eke's calculations it would be 300ns
Eke-Eke
05-18-2011, 05:12 AM
Most (all?) ROMs are 16-bit, so the cartridge does not make any difference between byte/word reads and always returns a word on the data bus. The CPU picks the upper or lower byte in the case of an 8-bit read, and the VDP most likely reads one 16-bit word at once during DMA for performance reasons.
VDP DMA transfer time is tied to the maximum number of accesses to its internal bus, which is one slot every 2 pixels, and that indeed makes ~300ns between slots (at the fastest 6.71 MHz pixel clock).
For VRAM, one byte is written per slot, which means one word is read from ROM every 2 slots = 600ns, but for VSRAM/CRAM access it's one word per slot, so I too wonder how it could work with ROM slower than 300ns... most likely, all released games are using ROM with suitable speed or this would be immediately visible ;)
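The slot model described here can be sketched in a few lines of Python. This only restates the figures from the post (one slot every 2 pixels, one byte per slot for VRAM, one word per slot for CRAM/VSRAM); the helper name is hypothetical.

```python
# Sketch of the DMA slot timing described above, at the fastest pixel clock.
PIXEL_CLOCK_MHZ = 6.71
PIXELS_PER_SLOT = 2

# Time between two consecutive access slots on the VDP's internal bus.
slot_ns = PIXELS_PER_SLOT * 1000.0 / PIXEL_CLOCK_MHZ


def rom_read_interval_ns(bytes_written_per_slot):
    """Time between 16-bit ROM reads, given how many bytes each slot writes."""
    slots_per_word = 2 / bytes_written_per_slot  # a ROM read always yields 2 bytes
    return slots_per_word * slot_ns


print(f"slot:       {slot_ns:.0f} ns")
print(f"VRAM DMA:   one ROM word every {rom_read_interval_ns(1):.0f} ns")
print(f"CRAM/VSRAM: one ROM word every {rom_read_interval_ns(2):.0f} ns")
```

So on these numbers a VRAM transfer leaves ROM about twice as long per word as a CRAM or VSRAM transfer does.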
kool kitty89
05-18-2011, 08:41 PM
For VRAM, one byte is written per slot, which means one word is read from ROM every 2 slots = 600ns, but for VSRAM/CRAM access it's one word per slot, so I too wonder how it could work with ROM slower than 300ns... most likely, all released games are using ROM with suitable speed or this would be immediately visible ;)
Perhaps games planned to use slow ROM would have the programmers take that into account. (can VSRAM/VRAM be updated manually by the CPU or use DMA to 68k work RAM rather than ROM -have the 68k load the necessary data into work RAM ahead of time)
There's definitely no buffering that would allow CRAM updates using slower accesses to ROM? (it seems rather wasteful to use fast ROM just for that purpose . . . not like the PCE where fast ROM is necessary for peak CPU speed, or SNES for that matter)
I was mainly wondering if NEC was the main exception to using much faster than commodity ROM speeds (I know the SNES uses slightly faster than average ROM, but it's not a huge leap from the cheap ~420 ns ROM the likes of the lynx was using -actually the Lynx and Jaguar were the main cases where ROM speed was brought into question -both avoiding ROM access as much as possible and relying on RAM to maintain good performance; the Jaguar in particular only using 375 ns -I think that might be hardcoded too, but I'm not sure -if not, that could mean some massive boosts in flexibility with ROM on homebrew releases, or even some of the low-profile late 90s releases)
Jorge Nuno
05-18-2011, 08:58 PM
Even 300ns is too much. The VDP reads the ROM in one pixel and writes to CRAM (or VSRAM) on the next pixel, but the data has to be valid for a great part of the write cycle, not just for a split picosecond at the end :p (because of setup and hold times, propagation delays, etc). So at the end of the pixel where the data has been read, there should already be valid data. It doesn't need to be, but if it's not, then the write signal pulse has to be a "half-pixel" pulse.
I say 150ns should be the safe minimum, 180 or 200 may work.
villahed94
05-18-2011, 11:40 PM
The access time is much higher than the typical RAM chip used (100 ns). However, the failing write-read may be related to the CPU speed. At a much higher speed (12 MHz) the CPU is reading and writing data much faster than both the ROM and the VDP can, causing graphic garbage. This is a rule for most games. On the other side, there are games in which too-fast writeback routines cause them to crash. There are some games in which the ROM is fast enough to have a constant data transfer and don't have any problem with this, but they do have problems with PCM.
You have no idea what you just said....
Joe Redifer
05-19-2011, 01:36 AM
He does that. :)
kool kitty89
05-19-2011, 01:58 AM
Even 300ns is too much. The VDP reads the ROM in one pixel and writes to CRAM (or VSRAM) on the next pixel, but the data has to be valid for a great part of the write cycle, not just for a split picosecond at the end :p (because of setup and hold times, propagation delays, etc). So at the end of the pixel where the data has been read, there should already be valid data. It doesn't need to be, but if it's not, then the write signal pulse has to be a "half-pixel" pulse.
I say 150ns should be the safe minimum, 180 or 200 may work.
Again, could you get around that by not using DMA to update CRAM? (ie if a programmer knew a game would be using slow ROM, could they set things up such that CRAM was only updated by the CPU manually copying data over -or copying into work RAM and then loading via DMA from that RAM)
Chilly Willy
05-19-2011, 02:08 AM
Again, could you get around that by not using DMA to update CRAM? (ie if a programmer knew a game would be using slow ROM, could they set things up such that CRAM was only updated by the CPU manually copying data over -or copying into work RAM and then loading via DMA from that RAM)
Yeah, they could do that. There are basically three completely different timings used for the ROM access: the CPU uses four 68000 cycle accesses as mentioned; the VDP uses its own timing for DMA; finally, the Z80 uses yet another completely different timing for its own accesses. The CPU is easy to understand - the other two aren't very well documented.
You have no idea what you just said....
I don't think I have any idea what he said either... x_X
kool kitty89
05-19-2011, 03:50 PM
Yeah, they could do that. There are basically three completely different timings used for the ROM access: the CPU uses four 68000 cycle accesses as mentioned; the VDP uses its own timing for DMA; finally, the Z80 uses yet another completely different timing for its own accesses. The CPU is easy to understand - the other two aren't very well documented.
In the 68000's case, doesn't it need to have the data ready on the bus by the end of the 3rd cycle of a memory access? (which would imply you'd need ~390 ns memory to avoid wait states in the MD, unless there's some buffering that could get around that . . . but I'd gotten the impression that it was with buffering that you could get away with 3 cycles -ie using a 16-bit latch like in the ST and Amiga- otherwise the 68k needs to access the bus on all 4 cycles of an access iirc)
Chilly Willy
05-19-2011, 04:28 PM
In the 68000's case, doesn't it need to have the data ready on the bus by the end of the 3rd cycle of a memory access? (which would imply you'd need ~390 ns memory to avoid wait states in the MD, unless there's some buffering that could get around that . . . but I'd gotten the impression that it was with buffering that you could get away with 3 cycles -ie using a 16-bit latch like in the ST and Amiga- otherwise the 68k needs to access the bus on all 4 cycles of an access iirc)
The standard 68000 read cycle without wait states has you assert the /DTACK by S4; the data bus must be stable by that point. So it's the end of cycle two, not three. With a latch, you could cut the bus short a cycle by isolating the 68000 from the bus and keeping the latch supplying the data to the 68000 for the fourth cycle. The 68000 always goes four cycles (plus any waits).
The Amiga was a little more sophisticated. Since the first two cycles aren't used by the 68000 (except as address setup), the custom chips used the first two cycles for themselves, then the last two cycles for the 68000. That made much of the DMA transparent to the 68000, but required much faster RAM.
villahed94
05-19-2011, 10:04 PM
You have no idea what you just said....
Of course I do. Frying MDs by super overclocking and some analysis of the games has led me to 2 conclusions about how games can behave differently from each other, and how some games tolerate higher speeds much better than others.
1.- Games that have a slow chip (they don't tolerate writeback routines as fast as the CPU, if the latter runs at much higher speeds)
2.- Bad programming of the game itself(Even if the chip is fast enough to be running well, how the game was programmed can lead to disastrous results).
Typical ROM speed should be around 250-300 ns, while the OC-tolerant ones might be around 200 ns or less. Fantasia is a good example of a bad ROM, being near 500 ns or more.
Jorge Nuno
05-19-2011, 10:11 PM
MDs don't die from "super clocking", you killed it by doing something else. Also game ROMs can't be 500ns, they wouldn't even run at the default speed.
Chilly Willy
05-19-2011, 11:28 PM
MDs don't die from "super clocking", you killed it by doing something else. Also game ROMs can't be 500ns, they wouldn't even run at the default speed.
It depends on if he means cycle time or access time. A 500ns cycle time is fine. A 500ns access time is not. The access time without overclocking needs to be at most 260ns or the 68000 might not be able to read it.
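To put numbers on the cycle time versus access time distinction, here is a rough Python sketch. It assumes the NTSC 68000 clock of about 7.67 MHz (master clock divided by 7), a four-clock bus cycle, and data due by the end of the second clock as described earlier in the thread. It ignores address setup and propagation delays, so real margins are tighter than this. The function names are hypothetical.

```python
# Cycle time vs access time on a stock (non-overclocked) NTSC MD 68000.
M68K_CLOCK_MHZ = 53.693175 / 7  # NTSC master clock / 7, about 7.67 MHz


def cycles_to_ns(cycles, clock_mhz=M68K_CLOCK_MHZ):
    """Duration of a number of CPU clock cycles in nanoseconds."""
    return cycles * 1000.0 / clock_mhz


bus_cycle_ns = cycles_to_ns(4)      # full read cycle, no wait states
access_budget_ns = cycles_to_ns(2)  # data must be stable by the end of cycle two


def rom_ok(access_ns, cycle_ns):
    """True if a ROM with these ratings fits the simplified timing above."""
    return access_ns <= access_budget_ns and cycle_ns <= bus_cycle_ns


print(f"bus cycle:     {bus_cycle_ns:.0f} ns")
print(f"access budget: {access_budget_ns:.0f} ns")
print("250 ns access / 500 ns cycle:", rom_ok(250, 500))
print("500 ns access / 500 ns cycle:", rom_ok(500, 500))
```

That lines up with the figures in the thread: a full bus cycle is about 520 ns, while the access time budget is about 260 ns.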
villahed94
05-20-2011, 01:14 AM
Jorge, they do die if you overclock them to about 30 MHz or so :P
Chilly, you are right, that means I've got to lower the values a bit, but Fantasia is definitely at the lower end
kool kitty89
05-20-2011, 04:44 AM
It depends on if he means cycle time or access time. A 500ns cycle time is fine. A 500ns access time is not. The access time without overclocking needs to be at most 260ns or the 68000 might not be able to read it.
I was talking about actual access times (that was the context where 520 ns initially came up) . . . but if the MD has no buffering and wait state generation hardware, there's no way the 68000 could work from such slow ROM at all. (and I haven't seen any mention of wait state management logic in the MD -in fact, Tomaitheous mentioned that as a potential added cost if Sega wanted to use a faster CPU)
If there WERE wait states, it would have worked, but with a performance hit to the 68k (you'd also want to emphasize use of RAM wherever possible in such circumstances).
Edit: here's the original discussion where ROM speed was mentioned:
http://www.atariage.com/forums/topic/119048-its-1993-youre-in-charge-of-the-jag-what-do-you-do/page__st__975__p__2013542#entry2013542
Wouldn't the Lynx fit in that category? (it's significantly slower than the older PC Engine/TG-16 using a similar CPU, plus the SNES if you take into account the 2x bus speeds necessary for the 65816 multiplexing -though I think the necessary access times for the MD's 68k would be lower)
Wouldn't the 68k's accessing in the Jaguar be slow enough to not be hindered working in ROM compared to RAM?
Low-cost ROMs run at between 350 and 500ns. For example, the Lynx ROMs are rated at 440ns.
Lynx DRAM is between two and four times that speed: 250ns in random access mode and 125ns in page mode. The Lynx uses page mode frequently, for handling graphics and even for sequential 6502 accesses such as opcode fetches.
On the Jaguar, the ROM speed is 375ns. This is slower than the memory access time of the 68000 which is 225ns.
For comparison: The Genesis ROM speed is 525ns. The somewhat newer SNES's ROM access time is 375ns.
- KS
The standard 68000 read cycle without wait states has you assert the /DTACK by S4; the data bus must be stable by that point. So it's the end of cycle two, not three. With a latch, you could cut the bus short a cycle by isolating the 68000 from the bus and keeping the latch supplying the data to the 68000 for the fourth cycle. The 68000 always goes four cycles (plus any waits).
The Amiga was a little more sophisticated. Since the first two cycles aren't used by the 68000 (except as address setup), the custom chips used the first two cycles for themselves, then the last two cycles for the 68000. That made much of the DMA transparent to the 68000, but required much faster RAM.
I'm nearly positive the ST uses the same set-up. (on the ST, the SHIFTER and DMA chips -floppy, ACSI, etc- use the first 2 cycles -250 ns- and the 68k uses the second 2 -another 250 ns- so slightly faster than the Amiga)
Still, that wouldn't have meant especially fast RAM for the time . . . just fast enough to respond in 279 ns (or 250 for the ST), and in the Amiga's case, that meant using 150 ns DRAM (rated by CAS speed -which made more sense once page mode was supported). With 150 ns DRAM, CAS+RAS could be completed within 279 ns (might have been barely adequate or with plenty of leeway, I'm not sure of the exact timing). That same memory might only allow a 3.58 MHz 6502 to run without wait states -without interleaving. ;) (and that's if it could be configured/buffered to allow full single cycle access times and not require the data to be ready within 1/2 a cycle of a request . . . otherwise it might be limited to just 1.79 MHz)
The Apple II used a similar interleaving scheme iirc, memory that could respond twice as fast as the CPU. (so 500 ns in that case -or faster depending on how the 6502 was interfaced)
The Acorn BBC Micro did that with a 2 MHz 6502. (interleaved access for video and CPU without wait states)
That sort of bus sharing becomes impractical extremely quickly as you move onto faster speeds. As you go to faster and faster memory, CAS speed increases consistently, but RAS has diminishing returns by comparison. (hence why efficient use of page mode accesses became so important . . . not to mention that even with common RAM speeds in the mid 80s, page mode accesses were somewhat faster, but the gap wasn't that huge -let alone orders of magnitude that it is today -ignoring things like multi-bank RAM configurations allowing multiple pages to be held open)
By the early 90s with commodity 100 ns DRAM, you might manage 200 ns random accesses (not sure on the exact timing, but definitely only a modest improvement in RAS over what the ST or Amiga were using), 80 ns would be somewhere around 180 ns, and 75 ns FPM DRAM was 175 ns (this one I know for sure, that was directly referenced in a jaguar discussion). So, from mid 80s to early/mid 90s commodity DRAM you haven't even gotten a 2:1 performance improvement . . . actually, if you wanted to use the Amiga's 50/50 split with that sort of DRAM, you wouldn't be able to boost CPU speed beyond ~11.43 MHz (without wait states, and assuming you're using a 68000 and not something with faster access times).
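The 50/50 split arithmetic can be checked with a short Python sketch. It assumes each bus master gets 2 of the 68000's 4 clocks per access and that a full random DRAM access must fit inside that half. The CPU clocks (7.16 MHz Amiga, 8 MHz ST) are the ones implied by the 279 ns and 250 ns figures above, and the function names are made up for illustration.

```python
# Interleaved (50/50) bus sharing: each master gets 2 CPU clocks per access.
CYCLES_PER_HALF = 2


def half_slot_ns(cpu_mhz):
    """Time available to each bus master per access at a given CPU clock."""
    return CYCLES_PER_HALF * 1000.0 / cpu_mhz


def max_clock_mhz(random_access_ns):
    """Fastest CPU clock at which a full random access still fits in one half."""
    return CYCLES_PER_HALF * 1000.0 / random_access_ns


print(f"Amiga (7.16 MHz): {half_slot_ns(7.16):.0f} ns per half")
print(f"ST    (8.00 MHz): {half_slot_ns(8.0):.0f} ns per half")
for access_ns in (200, 180, 175):
    print(f"{access_ns} ns random access -> max {max_clock_mhz(access_ns):.2f} MHz")
print(f"75 ns page mode -> {1000.0 / 75:.1f} MHz accesses")
```

With 175 ns random accesses this gives the ~11.43 MHz ceiling mentioned in the post.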
So a configuration optimized around page mode accesses (with an emphasis on serial bus accessing and caching/buffering -or a multi-bank/bus design) would be critical for making good use of bandwidth in DRAM. (in such a case, having any bus master that couldn't utilize page mode at peak bandwidth -in the case of 75 ns, that's 13.3 MHz accesses- would be a detriment to the system . . . albeit so would any bus master not capable of operating at the full bus width -whatever that may be)
A totally different story with SRAM, of course, no CAS/RAS or refresh to worry about, just plain accesses. ;) (takes a lot more silicon though -more cost and more power to use . . . not to mention lower volumes driving prices higher)
Same for ROM for that matter. ;)
Jorge Nuno
05-20-2011, 06:22 AM
ROMs and SRAMs are not like DRAM: the cycle time of ROMs can be pushed all the way back to the access time, if the data can be stored on the target with infinitesimal hold times. Unlike DRAMs, which have shitty cycle times, especially if the row address is changed.
The thing here is that ROM/SRAM cycle times are limited by the access time of the chip and by the hold time the target device requires, as opposed to DRAM, where you have to respect the minimum CAS (and/or RAS) pulse width even if the data could be held for less time.
Whatever 500ns means, it's always too much, considering a cram dma cycle is 300ns >_>
And no, 30MHz won't kill an MD by itself. I already tried more than that, and a simple clock signal just doesn't kill.
villahed94
05-20-2011, 03:23 PM
Well, gotta add there was some overheating in the 68k area :P
kool kitty89
05-20-2011, 11:31 PM
ROMs and SRAMs are not like DRAM: the cycle time of ROMs can be pushed all the way back to the access time, if the data can be stored on the target with infinitesimal hold times. Unlike DRAMs, which have shitty cycle times, especially if the row address is changed.
But that's the context of the DRAM speeds mentioned. Not talking about fast accesses, but true slow full random access (time from RAS cycle to the next complete RAS). The complete access time. (the RAM in the Amiga can totally complete a random access within 280 ns, that's the total access time, not the cycle time -that's also the only way the Amiga can access its DRAM, no page support, just true slow random accesses like on most 8-bit computers)
And no, 30MHz won't kill an MD by itself. I already tried more than that, and a simple clock signal just doesn't kill.
Couldn't it burn out the CPU? (overheat)
villahed94
05-21-2011, 01:04 AM
Couldn't it burn out the CPU? (overheat)
Of course it can, as the silicon used in the 68k can't oscillate that fast, and generates heat.
kool kitty89
05-21-2011, 02:29 AM
It would depend on the model used too . . . older NMOS 68ks would be worst, newer NMOS ones (assuming they switched to smaller interconnect at some point) would be somewhat better, and CMOS would be best . . . especially the very cool (temp wise) HD68HC000.
villahed94
05-21-2011, 04:19 AM
They were 68ks made by Motorola, used in the VA6 Model 1
The chips will not overheat.... there's a point where propagation delays get so high that successive clocks get missed and execution gets messed up, causing the chip to crash and stop working properly until it's reset. I can feed 100MHz to my 68Ks and none of them will get damaged from it... but if I feed 12V in it then we'll get some damage, and I'm quite sure 12V is the number 1 killer... people like following guides that use a 7805 directly as the source for power, and people also like to strip wires very long, or make mistakes, and then wonder why nothing works afterwards :roll: :fail: :rofl:
villahed94
05-21-2011, 04:49 AM
And what about that in one MD, my 150 W iron fell onto the 68k? That's sure frying :P Anyway, NMOS ones are the worst for these purposes, CMOS is better.
that was just plain stupidity
Guntz
05-21-2011, 05:08 AM
What I'd like to know is how'd he get his hands on an iron like that? Irons with that high a wattage are for stained glass soldering (or maybe welding :lol:), not delicate electronics soldering.
villahed94
05-21-2011, 03:01 PM
that was just plain stupidity
You are right tiido, having such a powerful iron for modding wasn't really a good idea :P Going to decommission that iron ASAP
kool kitty89
05-21-2011, 06:18 PM
The chips will not overheat.... there's a point where propagation delays get so high that successive clocks get missed and execution gets messed up, causing the chip to crash and stop working properly until it's reset. I can feed 100MHz to my 68Ks and none of them will get damaged from it...
Really? Interesting, so it's just stability then? (not even heat problems from older NMOS chips in VA0/1/2 boards? -though those are already several years newer than what you'd find with really early 68000s)
Hmm, then again, NMOS circuitry doesn't have the same sort of power dissipation curves as CMOS. (CMOS draws more and more at higher speeds, NMOS consumes a lot at low speeds, but not much more at high speeds -of course, it's so much higher consumption by default that you'd need a really, really fast CMOS chip to actually consume more . . . let alone if there's a big disparity in interconnect size -ie CMOS chips would also tend to use newer, smaller processes -I believe most of the 68000s being produced by the early 90s were 1 micron)
There's just not enough silicon/logic used to have overheating problems, I guess.
It also seems like a surprisingly high percentage of old CPUs can be substantially overclocked without crashing (the systems themselves are generally the limiting factor, so with that aside, it seems like a lot of older CPUs are stable well above their actual ratings).
It makes me wonder why more manufacturers weren't overclocking CPUs as routine. (with any really unstable chips not meeting benchmark standards being used in down-rated systems -or just accepted the occasional return of a bad machine if the failure margins were small enough)
but if I feed 12V in it then we'll get some damage, and I'm quite sure 12V is the number 1 killer... people like following guides that use a 7805 directly as the source for power, and people also like to strip wires very long, or make mistakes, and then wonder why nothing works afterwards :roll: :fail: :rofl:
Heh, you mean people trying to pull 5V from the 7805, but accidentally pulling the unregulated 9/10/etc external power instead? ;)
Oh, reversed polarity can kill things too . . . even going through a voltage regulator, it can fry things on the system before the regulator blows. (that happened to some of the main chips on a VCS I have -literally blew a chunk off the package of one of them)
Heh, and then there's crazy stuff like this where the CPUs can self-destruct:
http://www.youtube.com/watch?v=4YEL7Jx26Wk
http://www.youtube.com/watch?v=mZ7pUADoo58
Yeah, just a minor engineering flaw, right? :rofl:
What I'd like to know is how'd he get his hands on an iron like that? Irons with that high a wattage are for stained glass soldering (or maybe welding :lol:), not delicate electronics soldering.
They're also good for general sheet metal soldering . . . not nearly hot enough to do any brazing (let alone welding) though. You can also use a blow torch for sheet metal soldering and such (also much too cool for brazing -unless you use MAPP gas and a swirl flame burner/nozzle -maybe propane with a swirly flame, but it will be slow going)
Oh, and then there's proper soldering coppers used with a small soldering furnace (we used those in metal shop back in middle school).
High wattage soldering irons are also good for wood burning. (for inscribing, stenciling, and such)
I really need a finer tip for our soldering iron, we probably have some somewhere, but I'm not sure where. (as it is, it's not too bad, but not great for really fine stuff)
One of those cold heat things might be good for really fine work too. (also less worry about burning the board or case)
Chilly Willy
05-21-2011, 08:27 PM
Heh, and then there's crazy stuff like this where the CPUs can self-destruct:
http://www.youtube.com/watch?v=4YEL7Jx26Wk
http://www.youtube.com/watch?v=mZ7pUADoo58
Yeah, just a minor engineering flaw, right? :rofl:
Yeah, that is TOTALLY due to software and NOT the 45V @ 4A applied to the boards as mentioned elsewhere. ;) :cool:
Hint that it MIGHT be fake: the AVR doesn't even HAVE a divide instruction.
villahed94
05-21-2011, 10:41 PM
A bit suspicious indeed, how can a software error lead to a CPU's self-destruction????
In one video with a DIP chip you could see one logic chip explode too... I guess the power supply indeed got a stack overflow or division by zero in it :P
kool kitty89
05-27-2011, 12:57 AM
In one video with a DIP chip you could see one logic chip explode too... I guess the power supply indeed got a stack overflow or division by zero in it :P
Heh, I thought that smoke was from a capacitor at first, but it looks like it actually burnt that other chip.
tomaitheous
07-06-2011, 11:52 AM
Well, it is an MCU. Maybe the stack overflow or whatever software crash caused code to execute that turned input ports into output ports and drove them to a high or low state, causing havoc on some external components, which in a roundabout way came back to damage the chip (as well as some other circuit devices). It's possible (no, I didn't watch the videos).
Back on topic, what if a specific game only DMA'd from work ram? Thus you could get away with slower rom, right? I mean, if the VDP isn't trying to access the rom at faster than specific speeds (and at all), then I don't see why local vDMA would be an influencing factor.
Chilly Willy
07-06-2011, 05:32 PM
Back on topic, what if a specific game only DMA'd from work ram? Thus you could get away with slower rom, right? I mean, if the VDP isn't trying to access the rom at faster than specific speeds (and at all), then I don't see why local vDMA would be an influencing factor.
I think someone mentioned something similar earlier in the thread. Yes, if you put the data in the work ram and DMA'd it from there, the rom could be slower. The rom needs to be faster if you DMA straight from rom.
There's also another reason the rom should be faster - there's a bug in the IO hardware where if the Z80 is accessing the rom while the 68000 accesses the controller ports, the Z80 rom accesses will be shortened from then on. The short access cycle WILL cause slow roms to fail on the Z80 access. For slow roms where this is a danger, SEGA recommends requesting the Z80 bus before you read the controller ports, then release the Z80 when done. All the example SEGA code for reading the pads all do this as a result. If you use a faster rom, the short Z80 access won't cause a problem, so you don't need to request the bus from the Z80 before reading the IO.
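The order of operations for that workaround can be sketched in Python against a mock bus. This is only an illustration of the sequence (request the Z80 bus, wait for the grant, read the pad, release), not SEGA's example code. The register addresses (0xA11100 for the Z80 bus request, 0xA10003 for controller port 1 data) come from general Mega Drive documentation rather than this thread, the FakeBus class is entirely hypothetical, and a real pad read also toggles the TH line, which is left out here.

```python
Z80_BUSREQ = 0xA11100  # Z80 bus request register (assumed from MD documentation)
PAD1_DATA = 0xA10003   # controller port 1 data register (assumed likewise)


class FakeBus:
    """Hypothetical stand-in for the 68000 bus; logs accesses in order."""

    def __init__(self, pad_value=0x3F):
        self.log = []
        self.z80_has_bus = True
        self.pad_value = pad_value

    def write16(self, addr, value):
        self.log.append(("write", addr, value))
        if addr == Z80_BUSREQ:
            # Bit 8 set requests the bus from the Z80; clear hands it back.
            self.z80_has_bus = (value & 0x0100) == 0

    def read16(self, addr):
        if addr == Z80_BUSREQ:
            # Bit 8 reads back as 0 once the Z80 has given up the bus.
            return 0x0100 if self.z80_has_bus else 0x0000
        return 0

    def read8(self, addr):
        self.log.append(("read", addr))
        return self.pad_value


def read_pad_safely(bus):
    """Read the pad with the Z80 held off the bus for the duration."""
    bus.write16(Z80_BUSREQ, 0x0100)         # request the Z80 bus
    while bus.read16(Z80_BUSREQ) & 0x0100:  # wait until the bus is granted
        pass
    value = bus.read8(PAD1_DATA)            # Z80 cannot be mid-ROM-access here
    bus.write16(Z80_BUSREQ, 0x0000)         # let the Z80 run again
    return value


bus = FakeBus()
print(hex(read_pad_safely(bus)), bus.log)
```

The point of the sketch is just that the controller port read sits entirely inside the request/release pair, so the Z80 can never be in the middle of a ROM access when it happens.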
kool kitty89
07-06-2011, 07:07 PM
Back on topic, what if a specific game only DMA'd from work ram? Thus you could get away with slower rom, right? I mean, if the VDP isn't trying to access the rom at faster than specific speeds (and at all), then I don't see why local vDMA would be an influencing factor.
I mentioned that earlier, but wouldn't that also involve the CPU copying data to work RAM beforehand -and then more overhead -CPU halted- for DMA to copy that to VRAM? Why not just have the CPU directly copy data to VRAM during vblank instead? (unless you could optimize the game's code better by having the CPU copy data to work RAM when it was more convenient -rather than only in vblank- and spend significantly less time "stuck" in vblank due to DMA's much faster copy rate -still more CPU time down to updating VRAM, but potentially more conveniently spread out, potentially avoiding certain performance bottlenecks)
Plus, if you had the data compressed in ROM, you'd probably end up decompressing it into work RAM before dma'ing it to VRAM. (granted, you'd probably only manage fairly simple compression schemes on the fly -like RLE- unless perhaps you did very gradual updates and could thus do more intensive schemes without heavy continuous overhead)
tomaitheous
07-06-2011, 09:32 PM
I think someone mentioned something similar earlier in the thread. Yes, if you put the data in the work ram and DMA'd it from there, the rom could be slower. The rom needs to be faster if you DMA straight from rom.
There's also another reason the rom should be faster - there's a bug in the IO hardware where if the Z80 is accessing the rom while the 68000 accesses the controller ports, the Z80 rom accesses will be shortened from then on. The short access cycle WILL cause slow roms to fail on the Z80 access. For slow roms where this is a danger, SEGA recommends requesting the Z80 bus before you read the controller ports, then release the Z80 when done. All the example SEGA code for reading the pads all do this as a result. If you use a faster rom, the short Z80 access won't cause a problem, so you don't need to request the bus from the Z80 before reading the IO.
Ahh. I remember reading that Sega recommended halting the z80 before reading the I/O ports, but I didn't remember any particular reason for it (just something about a potential lockup, but no specific details). So that's why. Interesting and good to know.
I mentioned that earlier, but wouldn't that also involve the CPU copying data to work RAM beforehand -and then more overhead -CPU halted- for DMA to copy that to VRAM? Why not just have the CPU directly copy data to VRAM during vblank instead? (unless you could optimize the game's code better by having the CPU copy data to work RAM when it was more convenient -rather than only in vblank- and spend significantly less time "stuck" in vblank due to DMA's much faster copy rate -still more CPU time down to updating VRAM, but potentially more conveniently spread out, potentially avoiding certain performance bottlenecks)
Plus, if you had the data compressed in ROM, you'd probably end up decompressing it into work RAM before dma'ing it to VRAM. (granted, you'd probably only manage fairly simple compression schemes on the fly -like RLE- unless perhaps you did very gradual updates and could thus do more intensive schemes without heavy continuous overhead)
It's extremely common for games to compress sprite and tile data, as well as tile map data. Almost all Genesis games I've seen mentioned with compression schemes use an LZ variant for tile/sprite data. That can't exactly be decompressed fast enough for realtime/single frame requests. And even if it could, it normally still needs a full area in RAM to decompress to - unless you use a circular buffer and directly write to an I/O port, but this means the CPU writing to the port or calling the vDMA in small bursts as aligned segments are decompressed into the ring buffer. Either way, it's slow going and requires RAM (small circular buffer or full decompressed area needed). Tilemaps can usually get away with RLE variant compression and can be made direct port write friendly if needed. But if you've got free CPU resources during active display, then there's no reason why you shouldn't decompress to RAM and just vDMA during vblank. I think the only exception where most developers probably wouldn't compress data would be palette blocks. The overall palette data to ROM size is incredibly tiny. Probably less than 1% of 1%. If a developer is choosing to use a slow ROM, then they only have to make sure to copy the CRAM data using the CPU or copy it to RAM first before vDMA. Not a big deal IMO.
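For a sense of why RLE is cheap enough for tilemaps, here is a minimal Python sketch of a byte-oriented run-length codec. The (count, value) pair format is a hypothetical one chosen for illustration, not a scheme taken from any particular Genesis game.

```python
def rle_encode(raw: bytes) -> bytes:
    """Encode bytes as (run length, value) pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(raw):
        run = 1
        while i + run < len(raw) and raw[i + run] == raw[i] and run < 255:
            run += 1
        out += bytes([run, raw[i]])
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Expand (run length, value) pairs back into the original bytes."""
    if len(data) % 2:
        raise ValueError("RLE stream must be a whole number of pairs")
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out.extend([value] * count)  # each pair expands independently
    return bytes(out)


tilemap_row = bytes([0x10] * 20 + [0x22, 0x23] + [0x10] * 18)
packed = rle_encode(tilemap_row)
print(len(tilemap_row), "->", len(packed), "bytes")
```

Because each pair expands on its own with no history window, the decoder's output could be written straight to a data port as it is produced, which is the "direct port write friendly" property mentioned above. An LZ scheme needs access to already-decoded data, hence the RAM buffer.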
There's also another reason the rom should be faster - there's a bug in the IO hardware where if the Z80 is accessing the rom while the 68000 accesses the controller ports, the Z80 rom accesses will be shortened from then on. The short access cycle WILL cause slow roms to fail on the Z80 access. For slow roms where this is a danger, SEGA recommends requesting the Z80 bus before you read the controller ports, then release the Z80 when done. All the example SEGA code for reading the pads all do this as a result. If you use a faster rom, the short Z80 access won't cause a problem, so you don't need to request the bus from the Z80 before reading the IO.
Only that access gets shortened if I recall correctly, not all following accesses. Also, I was under the impression that the problem wasn't slow memory response but the slow processor (basically, Z80 tries to read from memory, but 68000 stomps into place and changes the bus contents while the Z80 is still on its reading cycle, causing the Z80 to read garbage).
Plus, if you had the data compressed in ROM, you'd probably end up decompressing it into work RAM before dma'ing it to VRAM. (granted, you'd probably only manage fairly simple compression schemes on the fly -like RLE- unless perhaps you did very gradual updates and could thus do more intensive schemes without heavy continuous overhead)
UFTC =P
Also quoting myself from another topic: (http://www.sega-16.com/forum/showthread.php?17697-MDTools-GIT-repository&p=376196&viewfull=1#post376196)
Also just now I bothered to do a stress test of UFTC, I had forgotten to do it >_> Tested against Stephany sprites in Project MD, which seems like a good candidate for this thing:
Uncompressed size: 44,576 bytes (100%)
Compressed size: 19,482 bytes (43.7%)
Well, it isn't much, but it's still some important saving =| This also goes to prove that UFTC works well when used in the domain it was designed for.
Chilly Willy
07-06-2011, 11:58 PM
Ahh. I remember reading that Sega recommended halting the z80 before reading the I/O ports, but I didn't remember any particular reason for it (just something about a potential lockup, but no specific details). So that's why. Interesting and good to know.
Yeah, there's a tech bulletin about it that goes into it at some depth, even showing how it occurs and how much the Z80 cycle is shortened. They include the latest recommended pad read code that halts the Z80 before reading. I had a talk with Steve Snake about that years back over at SpritesMind over whether that halt is really needed or not. With "modern" (for the mid-90's) roms and flash cards, it's not. But if you were trying to use as slow a rom as possible for some reason, it would be something to keep in mind. The bulletin was more trying to explain why some devs lost sound WAAAAAAAAY back in the stone age of MegaDrive development.
kool kitty89
07-07-2011, 07:38 PM
It's extremely common for games to compress sprite and tile data, as well as tile map data. Almost all Genesis games I've seen mentioned with compression schemes use an LZ variant for tile/sprite data. That can't exactly be decompressed fast enough for realtime/single frame requests. And even if it could, it normally still needs a full area in ram to decompress to - unless you use a circular buffer and write directly to an I/O port, but that means the CPU writing to the port or calling the vDMA in small bursts as aligned segments are decompressed into the ring buffer. Either way, it's slow going and requires ram (a small circular buffer or a full decompressed area). Tilemaps can usually get away with RLE variant compression and can be made direct port write friendly if needed. But if you've got free cpu time during active display, there's no reason why you shouldn't decompress to ram and just vDMA during vblank.
Yes, anything loaded ahead of time (between levels, before a boss fight, etc) would use heavier/more optimal compression, but compression that's light on decoding resources could be an attractive alternative in areas where you'd otherwise have to use uncompressed textures. That could be an RLE derivative or something else, or perhaps column or line based RLE chosen on a per-texture basis. You'd really need textures that cater well to RLE for decent results in either case. The examples of (apparently) RLE encoded FMV on the MCD obviously use almost no dithering, and possibly additional preprocessing to cater even better to RLE. They also use full-frame bitmaps rather than smaller textures, which tends towards better compression ratios, plus you have the sub-CPU and ASIC to help out there too.
Doing a 4-bit specific RLE format could be particularly attractive for small/medium sized objects (or anything that doesn't have considerable amounts of solid color lines, or dithered pairs if you did RLE on a paired pixel basis). You could use 1 byte per run, with the 1st nybble defining the color and the 2nd defining the number of pixels to fill (1 to 16). That only wastes 4 bits for each case of a single pixel rather than 12 (compared to uncompressed 4-bpp graphics), and you'd rarely need more than 16 pixels in a row anyway.
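To make the byte layout concrete, here's a minimal sketch of that nybble RLE in Python (just for illustration, a real decoder would be 68k code). Assumptions of mine: the color goes in the high nybble, the run length is stored as length minus one so that 0-15 maps to 1-16 pixels, and the function names `rle4_encode`/`rle4_decode` are made up.

```python
def rle4_encode(pixels):
    """Pack 4-bit pixels into one byte per run: high nybble = color, low nybble = run length - 1."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        color = pixels[i]
        if not 0 <= color <= 15:
            raise ValueError("pixel values must fit in 4 bits")
        # Extend the run while the color repeats, capped at 16 pixels per byte.
        run = 1
        while run < 16 and i + run < len(pixels) and pixels[i + run] == color:
            run += 1
        out.append((color << 4) | (run - 1))
        i += run
    return bytes(out)


def rle4_decode(data):
    """Expand each byte back into (low nybble + 1) pixels of the color in the high nybble."""
    pixels = []
    for byte in data:
        pixels.extend([byte >> 4] * ((byte & 0x0F) + 1))
    return pixels


# One 16-pixel row: 8 bytes uncompressed at 4 bpp, 4 bytes as runs.
row = [0] * 5 + [7] * 3 + [2] + [0] * 7
packed = rle4_encode(row)
```

Runs longer than 16 pixels simply spill into a second byte, and a lone pixel costs a full byte instead of a nybble, which is the 4 bits of waste mentioned above.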
But back to the issue of slow ROM, that's also where DMA directly to VRAM would be important (games with lots of on the fly animation with VRAM maxed out, so no compression or light compression). I think it was mentioned that SFII uses a ton of uncompressed sprite animation that gets DMA'd straight from ROM . . . so if DMA DOES ever need ROM faster than the 68k needs, you might have some games where you couldn't DMA graphics from ROM. (of course, most such games would be rather large in general, and thus tend to be later games, facilitating faster ROM at low cost, with most small/early games tending to load everything into VRAM beforehand)
And yes, CRAM updates would be trivial to buffer as mentioned before. (also noted was that CRAM DMA would need significantly faster access times than normal graphics block transfer)
bloodflowers
02-28-2012, 05:21 PM
I just reprogrammed a Mega Turrican US cart so that it works on a Japanese unmodified console - problem is I'm seeing odd instances of full screen corruption like it didn't read the tiles or map data quickly enough - just does it for a frame. Do you think it might be EPROM speed? I used a 27C800-120 chip. Just ordered a few 100s but I keep reading that 150 'should' be OK.
Has anyone seen this happen before? I suppose it's possible I have a somewhat flaky C800 but..
150ns is on the bleeding edge for DMA, 200ns will have faulty DMAs. 120ns and lower should work fine
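For reference, the 149 ns and 186 ns figures from the first post are just one period of the 6.71 MHz and 5.37 MHz VDP clocks. A quick sketch of that arithmetic, assuming the NTSC master clock of 53.693175 MHz divided by 8 and by 10 (those values are my assumption, they aren't given in this thread). It says nothing about how many clocks a DMA read really gets, so treat it as a rough yardstick only.

```python
MASTER_CLOCK_HZ = 53_693_175  # assumed NTSC Mega Drive master clock


def period_ns(divider):
    """Length of one clock period in nanoseconds for master clock / divider."""
    return 1e9 * divider / MASTER_CLOCK_HZ


fast_period = period_ns(8)    # ~6.71 MHz clock
slow_period = period_ns(10)   # ~5.37 MHz clock

for access_ns in (100, 120, 150, 200):
    # Compare each EPROM speed grade against a single fast-clock period.
    verdict = "within" if access_ns <= fast_period else "over"
    print(f"{access_ns} ns part: {verdict} one {fast_period:.1f} ns period")
```

By that yardstick a 150 ns part lands just over a single 149 ns period, while 120 ns and 100 ns parts have some margin.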
bloodflowers
03-06-2012, 01:34 PM
150ns is on the bleeding edge for DMA, 200ns will have faulty DMAs. 120ns and lower should work fine
Well I swapped it for a 100ns chip, seems less glitchy but still the occasional glitch on screen and hangs from time to time. It's a strange one, I took the Mega Turrican US rev, changed the region and fixed the checksum, byteswapped it and am running it on a known good system. I'd love to know just how fast the original ROM was, but there's no part number on it - just the datestamp and EPR number.
Maybe it's the make of chip - what are people using these days? Does anywhere sell dev boards with on board flash I could use instead? It needs to fit inside the original cart casing though with no modifications.
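Since the checksum fix and byteswap are the steps most likely to go wrong silently, here's a small sketch for double-checking an image before burning it. It assumes the commonly documented cartridge header layout (16-bit checksum stored big-endian at $18E, computed as the sum of all 16-bit words from $200 to the end of the ROM), which isn't spelled out in this thread, and the function names are made up. Whether the byteswap is needed at all depends on how the programmer expects the data for a 16-bit EPROM.

```python
CHECKSUM_OFFSET = 0x18E  # assumed location of the 16-bit checksum in the header
DATA_START = 0x200       # assumed start of the checksummed area


def compute_checksum(rom):
    """Sum every big-endian 16-bit word from DATA_START to the end, truncated to 16 bits."""
    if len(rom) % 2:
        raise ValueError("ROM image must contain a whole number of 16-bit words")
    total = 0
    for i in range(DATA_START, len(rom), 2):
        total = (total + ((rom[i] << 8) | rom[i + 1])) & 0xFFFF
    return total


def fix_checksum(rom):
    """Return a copy of the image with the header checksum field rewritten."""
    patched = bytearray(rom)
    patched[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = compute_checksum(rom).to_bytes(2, "big")
    return bytes(patched)


def byteswap16(rom):
    """Swap the two bytes of every 16-bit word."""
    if len(rom) % 2:
        raise ValueError("ROM image must contain a whole number of 16-bit words")
    swapped = bytearray(len(rom))
    swapped[0::2] = rom[1::2]
    swapped[1::2] = rom[0::2]
    return bytes(swapped)
```

The order matters: patch the region byte and checksum on the normal (unswapped) image first, then byteswap the result as the last step. Reading the chip back and comparing it against the file would at least rule out a bad burn before blaming the access time.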