

Thread: How capable would the Neo Geo MVS/AES be in 3D polygon graphics?

  1. #31
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,795
    Rep Power
    50

    Default

    Quote Originally Posted by Guntz View Post
    Isn't Silpheed close enough to a fully 3D polygonal Sega CD game? Pretty much every other 3D game I've seen has a bunch of 2D stuff to optimize for speed.
    No, not at all. There's very little polygon rendering in Silpheed - all those polygon graphics you see in the game are just video streamed off the CD. The majority of the game is simply scaled sprites.

  2. #32
    Sports Talker
    Join Date
    Jun 2011
    Posts
    36
    Rep Power
    0

    Default

    This may be kind of a dumb idea, but how about saving a ton of triangles (or maybe quadrilaterals?) at varying angles/proportions in ROM to try and pre-render the polygons instead?
    It would be an enormous waste of space but the Neo can handle 1024mb of graphics without bankswitching and ROM is cheap today.

    The 68k need then only set the sprites' priority, zoom, and palette (out of 256). Even if you include a single background layer and a few bitmaps, you would still have around 300 "polygons" left to play with, which is probably more than what StarFox used in normal gameplay.
    Could this work or is it just impossible?

    On another note, in my opinion SNK missed a huge opportunity by not putting a 32-bit/3D processor in the Neo CD. Imagine all that memory used for textured polygons...

  3. #33
    Mastering your Systems Hero of Algol TmEE's Avatar
    Join Date
    Oct 2007
    Location
    Estonia, Rapla City
    Age
    23
    Posts
    9,071
    Rep Power
    68

    Default

    Scrubbing together 1GB NOR flash or some EPROMs will cost a nice chunk of money...
    Death To MP3, :3
    Mida sa loed ? Nagunii aru ei saa "Gnirts test is a shit" New and growing website of total jawusumness !

  4. #34
    Banned by Administrators
    Join Date
    Jun 2010
    Posts
    537
    Rep Power
    0

    Default

    Quote Originally Posted by axel View Post
    On another note, in my opinion SNK missed a huge opportunity by not putting a 32-bit/3D processor in the Neo CD. Imagine all that memory used for textured polygons...
    Too bad this bombed:
    http://www.youtube.com/watch?v=IwVXC...eature=related
    http://www.youtube.com/watch?v=-lRsh...eature=related

  5. #35
    Hero of Algol kool kitty89's Avatar
    Join Date
    Mar 2009
    Location
    San Jose, CA
    Age
    23
    Posts
    9,127
    Rep Power
    49

    Default

    Quote Originally Posted by TmEE View Post
    you can completely forget that idea, you don't know when one or the other does an access and you cannot stop anything there
    Is it an asynchronicity issue? (differing clock speeds meaning unmatched memory accesses . . . the same thing that would prevent interleaving with the MD VDP's DMA transfer engine -unless they'd included a mode where the DMA mechanism ran at a speed separate from the rest of the VDP, or at a different speed in vblank -of course, you'd need fast enough ROM to allow such an interleaved mode)

    Anyway, yeah, double buffering would make things a lot easier in that case . . . except I thought the Neo's VDP ran at 6 or 12 MHz, similar to the CPU. (which, again, should facilitate such interleaving -except similar clock speeds with total asynchronicity would still prevent the exact timing needed for such sharing)


    You could use 1 bank with contention and only have the CPU access in vblank, but there's little need to even consider that anyway. (for that matter, an interleaving scheme would be a bit unnecessary to consider even if it did work since we're talking about the Neo Geo and thus don't need to consider special low-cost concerns, even if they were using it in a game back then )
    Actually, if they WERE using it in a game back then, they probably wouldn't burden the CPU with so much and probably would have put a bunch more coprocessor support on-cart instead. (a fast CPU, TMS340, DSP, and/or blitter, or something totally custom)
    Actually, I'm a little surprised that (with all the expense of Neo Geo games) they didn't push something like that for certain games that would benefit from such effects.

    On NES you just do your write through the PPU into VRAM, no extra headache or anything else...
    That includes VRAM on-cart?







    Quote Originally Posted by Chilly Willy View Post
    You can do a SINGLE write faster with the CPU; if you stored a long, you'd do four pixels in 6 cycles. The problem is the REST of the 68000 code around that single memory access. Remember that most 68000 commands are 8 to 14 cycles long, and you'll need at least a few even with unrolling loops. That makes the ASIC faster than anything but the most basic FILL operation, even though it works on nibbles. So solid color polygons may be faster to draw with the 68000, but that would be about it.
    Even for solid filling or such (that could be slower with the ASIC), could there be any advantage for offloading that drawing to the ASIC to free up the 68k for other things? (primitive rendering pipelining of sorts, send commands to the ASIC to render, go back to processing other things, and then send another command as needed -especially if the ASIC supports lists/chains of commands to be sent at a time)



    Quote Originally Posted by Chilly Willy View Post
    Ah! I see what they're doing - they're making the textures match the triangles beforehand. That does allow you to render the triangle in one pass with something like the ASIC. Nifty idea. More work at the start to make the end much faster. My explanation was for conventional rendering of triangles where the texture is "normal", not pre-processed. The one thing you lose with pre-processing the texture is space: you have to have a square of cleared space that completely surrounds the triangle, otherwise it will overdraw already drawn parts of the display. You also need to set the memory mode to overdraw for this to work. Oh, that's another thing - it still overdraws the triangle as it will always draw a square surrounding the triangle, but some/much of the overdraw will be zero.
    Would that method eat up more RAM (for buffering the textures) than the "normal" line by line method?

    From the sound of it, that same method would also be useful for the Jaguar (the blitter's scaling/rotation/texture feature is rather like the Sega CD ASIC). Or maybe even for a triangle rasterizer on the Saturn or 3DO. (as an alternative to folded quads or line by line rasterization)






    Quote Originally Posted by Kamahl View Post
    It's an FMV game with what SEEM like 3D ships, I won't even bet on that.
    Love it anyway. Only game that really used the FMV well during gameplay.
    Quote Originally Posted by Chilly Willy View Post
    No, not at all. There's very little polygon rendering in Silpheed - all those polygon graphics you see in the game are just video streamed off the CD. The majority of the game is simply scaled sprites.
    Yeah, for a while I thought the small polygons (ships/objects/etc) were realtime polygons, but looking carefully it's pretty obvious that they're scaled animated sprites (carefully optimized for flips at that).

    It makes sense too given that would give a much faster/smoother result and leave CPU time for other things. (especially since it's constantly streaming video data, meaning the sub-CPU is going to be occupied much of the time)


    That and the original Silpheed also used pre-rendered polygonal objects as "sprites" (really software blitter objects given the platforms it was released for -and the original PC8801 in particular), and probably not realtime scaled either, but plain animation due to resource limitations. (it also uses a mostly solid color BG for simplified blitting and 8 pixel movement increments in the PC88 version -not sure about EGA- to work easily on byte boundaries -it's a planar bitmap)
    6 days older than SEGA Genesis
    -------------
    Quote Originally Posted by evilevoix View Post
    Dude it’s the bios that marries the 16 bit and the 8 bit that makes it 24 bit. If SNK released their double speed bios revision SNK would have had the world’s first 48 bit machine, IDK how you keep ignoring this.

  6. #36
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,795
    Rep Power
    50

    Default

    Quote Originally Posted by kool kitty89 View Post
    Even for solid filling or such (that could be slower with the ASIC), could there be any advantage for offloading that drawing to the ASIC to free up the 68k for other things? (primitive rendering pipelining of sorts, send commands to the ASIC to render, go back to processing other things, and then send another command as needed -especially if the ASIC supports lists/chains of commands to be sent at a time)
    No command lists for the ASIC... it's one of the improvements for VDP1 on the Saturn.

    The 68000 stores to the appropriate registers, one of which starts the operation. When done, the ASIC sets a flag and (if enabled) generates a Level 1 interrupt. So lists of operations can be handled asynchronously... via interrupt handler, but they still require CPU intervention.
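    A minimal sketch of that interrupt-driven queueing, written as a Python simulation rather than 68000 code. `FakeAsic`, `RenderQueue`, `start`, `finish` and the job names are all hypothetical, made up for illustration; only the flow (write registers, start the operation, get a completion interrupt, feed the next job from the handler) comes from the description above.

```python
from collections import deque

class FakeAsic:
    """Stand-in for the rendering hardware: one job at a time, no command list."""
    def __init__(self):
        self.busy = False
        self.current = None
        self.on_done = None      # hypothetical "Level 1 interrupt" callback
        self.completed = []

    def start(self, job):
        # Models the register writes, the last of which starts the operation.
        assert not self.busy, "hardware only accepts one operation at a time"
        self.busy = True
        self.current = job

    def finish(self):
        # Models the hardware finishing: set the flag, raise the interrupt.
        self.busy = False
        self.completed.append(self.current)
        if self.on_done:
            self.on_done()

class RenderQueue:
    """Software-side list of jobs, drained from the interrupt handler."""
    def __init__(self, asic):
        self.asic = asic
        self.pending = deque()
        asic.on_done = self.irq_handler

    def submit(self, job):
        self.pending.append(job)
        if not self.asic.busy:
            self.asic.start(self.pending.popleft())

    def irq_handler(self):
        # CPU intervention is still needed here, but only briefly per job.
        if self.pending:
            self.asic.start(self.pending.popleft())

asic = FakeAsic()
queue = RenderQueue(asic)
for name in ("floor", "wall", "sprite"):
    queue.submit(name)
while asic.busy:
    asic.finish()            # pretend the hardware completes each job in turn
print(asic.completed)
```

    The main code only calls `submit`; everything else happens in the handler, which is the "asynchronous, but still needs the CPU" behaviour described above.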


    Would that method eat up more RAM (for buffering the textures) than the "normal" line by line method?
    Yes. No packing the textures. However, remember that the "texture" in the SCD is a stamp map - you can reuse tiles; the tiles can also be flipped. So you just need to be more creative about your textures.
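    A rough sketch of what reusing and flipping tiles through a stamp map buys you. The 2x2 stamp size, the `(stamp id, hflip)` entry layout and all names here are assumptions for illustration, not the actual Sega CD stamp map format.

```python
# Hypothetical 2x2-pixel "stamps" (real stamps are larger); values are 4-bit colours.
STAMPS = {
    0: [[0, 0], [0, 0]],          # blank
    1: [[1, 2], [3, 4]],          # one piece of artwork, reused below
}

# Each map entry: (stamp id, horizontal flip). The layout is an assumption,
# not the real hardware format.
STAMP_MAP = [
    [(1, False), (1, True)],
    [(0, False), (1, False)],
]
STAMP_SIZE = 2

def texel(x, y):
    """Fetch one texture pixel by going through the stamp map."""
    stamp_id, hflip = STAMP_MAP[y // STAMP_SIZE][x // STAMP_SIZE]
    sx, sy = x % STAMP_SIZE, y % STAMP_SIZE
    if hflip:
        sx = STAMP_SIZE - 1 - sx
    return STAMPS[stamp_id][sy][sx]

texture = [[texel(x, y) for x in range(4)] for y in range(4)]
for row in texture:
    print(row)
```

    One stored stamp covers three of the four map cells here, so the 4x4 texture costs one stamp of pixel data plus the map.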


    From the sound of it, that same method would also be useful for the Jaguar (the blitter's scaling/rotation/texture feature is rather like the Sega CD ASIC). Or maybe even for a triangle rasterizer on the Saturn or 3DO. (as an alternative to folded quads or line by line rasterization)
    Yeah, I could see the Saturn using this in particular to avoid needing to reprogram triangle based engines. It's probably easier to split the textures into triangles than to make pre-distorted textures for quads. You'll have essentially the same render time as they're both overdrawing, but overdrawing 0s is probably just slightly faster.

  7. #37
    Mastering your Systems Hero of Algol TmEE's Avatar
    Join Date
    Oct 2007
    Location
    Estonia, Rapla City
    Age
    23
    Posts
    9,071
    Rep Power
    68

    Default

    Quote Originally Posted by kool kitty89 View Post
    Is it an asynchronicity issue? (differing clock speeds meaning unmatched memory accesses . . . the same thing that would prevent interleaving with the MD VDP's DMA transfer engine -unless they'd included a mode where the DMA mechanism ran at a speed separate from the rest of the VDP, or at a different speed in vblank -of course, you'd need fast enough ROM to allow such an interleaved mode)

    Anyway, yeah, double buffering would make things a lot easier in that case . . . except I thought the Neo's VDP ran at 6 or 12 MHz, similar to the CPU. (which, again, should facilitate such interleaving -except similar clock speeds with total asynchronicity would still prevent the exact timing needed for such sharing)
    It's synchronous to some degree, but nothing you can exploit, and the VDP runs at 24MHz and is using up most of the cycles given to it.

    That includes VRAM on-cart?
    Yes, it's all in the same address space.

  8. #38
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    You know, how many pixels does the ASIC write per access? Because memory is 16-bit, and having to access the same word four times (once per pixel) seems... stupid.

  9. #39
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,795
    Rep Power
    50

    Default

    Quote Originally Posted by Sik View Post
    You know, how many pixels does the ASIC write per access? Because memory is 16-bit, and having to access the same word four times (once per pixel) seems... stupid.
    You'd have to do four writes with the CPU (or set aside a register to accumulate four pixels) if you do anything other than a straight copy. Remember that the pixels can come from non-adjacent locations in the texture. The four pixels may be in a row in the destination, but probably not in the source.

    But that's one reason GPUs went VERY quickly to only supporting 16 or 32 bit output - the pixels match the bus better. Not the only reason, of course, but if you're only going to write one pixel, might as well match the bus width.

  10. #40
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    Quote Originally Posted by Chilly Willy View Post
    You'd have to do four writes with the CPU (or set aside a register to accumulate four pixels) if you do anything other than a straight copy. Remember that the pixels can come from non-adjacent locations in the texture. The four pixels may be in a row in the destination, but probably not in the source.
    Touché, but then shouldn't we take into account the calculation time for both reading and writing? Also, what's the clock speed of the ASIC? How many cycles does it take to draw four pixels?

    Quote Originally Posted by Chilly Willy View Post
    But that's one reason GPUs went VERY quickly to only supporting 16 or 32 bit output - the pixels match the bus better. Not the only reason, of course, but if you're only going to write one pixel, might as well match the bus width.
    Technically that isn't true anymore since everything goes through a cache that's using a larger data bus. It's more something to do with alignment than with bus size (also, 8-bit graphics mode had the issue of being paletted, which gave lots of trouble with graphics calculations...).

  11. #41
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,795
    Rep Power
    50

    Default

    Quote Originally Posted by Sik View Post
    Touché, but then shouldn't we take into account the calculation time for both reading and writing? Also, what's the clock speed of the ASIC? How many cycles does it take to draw four pixels?
    The timing for the ASIC does take into account writes as well as reads... as well as CPU accesses of word ram, and refresh cycles. The ASIC clock is the same as the CPU, 12.5MHz. The exact timing formulas are illegible in the docs that exist, but at a guess, I'd say it's 3 clocks for every access made, be it reading a pixel, writing a pixel, CPU access, or refresh. There are some boundary issues that affect the clock, and also some differences for the memory mode, but like I said, the current scan of the docs can't be read for this one area. I do wish someone came up with a clean scan as I'd love to see the exact timing.
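    To put a rough number on that guess, here is a back-of-the-envelope sketch. The 12.5MHz clock and the 3-clocks-per-access figure are from the post above (the latter is only a guess); treating each pixel as exactly one read plus one write, and ignoring CPU accesses and refresh, are extra assumptions made for this estimate.

```python
ASIC_CLOCK_HZ = 12_500_000   # from the post above
CLOCKS_PER_ACCESS = 3        # the guess from the post above, not a documented figure

def pixels_per_second(accesses_per_pixel=2):
    """Assumes each pixel costs one texture read plus one bitmap write,
    and ignores CPU accesses and refresh."""
    return ASIC_CLOCK_HZ / (CLOCKS_PER_ACCESS * accesses_per_pixel)

rate = pixels_per_second()
print(f"{rate / 1e6:.2f} Mpixels/s")
print(f"{rate / 60:.0f} pixels per 60 Hz frame")
```

    That is an upper bound under those assumptions; real throughput would be lower once word RAM contention and refresh are counted.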


    Technically that isn't true anymore since everything goes through a cache that's using a larger data bus. It's more something to do with alignment than with bus size (also, 8-bit graphics mode had the issue of being paletted, which gave lots of trouble with graphics calculations...).
    Yes, modern GPUs don't have this issue anymore... among many others. It was more an issue in early generations of accelerators.

  12. #42
    Hero of Algol kool kitty89's Avatar
    Join Date
    Mar 2009
    Location
    San Jose, CA
    Age
    23
    Posts
    9,127
    Rep Power
    49

    Default

    Quote Originally Posted by Sik View Post
    You know, how many pixels does the ASIC write per access? Because memory is 16-bit, and having to access the same word four times (once per pixel) seems... stupid.
    The only use of the 16-bit wide memory is for the CPU and Sub-CPU to access.

    The ASIC does pixel-by-pixel rendering: it can't work on bytes or words, just nybbles, and doesn't support more flexible blitting operations, just copying/moving texture stamps along with the scaling/rotation algorithm to stretch and rotate those stamps.
    If it supported simple block copy or block fill (or copy/fill with hardware masking support for that matter), that could have been implemented on 16 bit words, or if the scaling/rotation rendering mechanism supported modes for working with 8 or 16-bit pixels rather than just 4-bit ones (which, of course, would only be useful for drawing dithered objects on the Genesis -with pairs or groups of 4 pixels- or for future add-ons like the 32x).
    It's a bit of a shame it at least didn't have simple DMA block copy like the MD VDP (but a bit faster), especially with connectivity to program RAM to allow much faster updates to word RAM than the CD's 68k can manage. (a 16-bit 12.5 MHz block copy engine should manage at least 4.17 MB/s assuming 3 cycle long random accesses -80 ns FPM RAM cycle times should be 180 ns so 3 cycles is as fast as you can go- with no special timing/pipelining to set the destination address concurrently and cut out wait states for writes, and perhaps 6.25 MB/s if such pipelining is used, 2x that peak if fast page mode was supported)
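    For anyone who wants to check those block copy figures, here is the arithmetic as a small sketch. The clock and the 3-cycle random access are from the paragraph above; the 4 and 2 cycles per copied word in the pipelined and page mode cases are my reading of how those peaks come out, so treat them as assumptions.

```python
CLOCK_HZ = 12_500_000
BYTES_PER_WORD = 2

def copy_rate_mb_s(cycles_per_word_copied):
    """Bytes copied per second, in MB/s (1 MB = 10**6 bytes)."""
    return CLOCK_HZ / cycles_per_word_copied * BYTES_PER_WORD / 1e6

cycle_ns = 1e9 / CLOCK_HZ                  # 80 ns per clock
print(cycle_ns, "ns per clock; 2 clocks = 160 ns < 180 ns, so 3 clocks per random access")

# Read (3) + write (3): no overlap between the two accesses.
print(round(copy_rate_mb_s(6), 2))         # unpipelined
# Assumed 4 clocks per word when the destination address is set up concurrently.
print(round(copy_rate_mb_s(4), 2))         # pipelined
# Assumed 2 clocks per word with fast page mode on both sides.
print(round(copy_rate_mb_s(2), 2))         # page mode peak
```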


    Using the texture mapping feature with 16-bit reads or writes with 4-bit pixels would require a significantly more advanced chip with read/write buffers supporting fetching up to 4 pixels at a time and building a row of up to 4 pixels to spit back out. (or just a write buffer to make multiple single pixel reads, but still allow up to 4 pixels to be buffered for output)
    The Saturn, Jaguar, and (I think) 3DO all lack such a feature, but the Playstation and Jaguar II (unreleased) do and I believe most/all later platforms do as well. (along with far more sophisticated caching and pipelining)
    In fact, the Jaguar's texture mapping feature works very much like the Sega CD's ASIC (rendering scaled/rotated rectangles) except it adds 8bpp to 16bpp indexing as well as the ability to work with variable pixel depths of 8, 16, or 32-bits (it might do 4-bits too, but that would be to a 16 color framebuffer since I don't think there's any 16-color indexing support, unfortunately -there is for "sprite" objects rendered by the object processor, but not for the blitter, and no indexing for 32-bits either, only 8 to 16 bit indexing).

    So in the jaguar's case, you'd obviously be making use of 16-bit writes most of the time (and either 8 or 16-bit reads -probably 8-bit indexed to save space), but still only a fraction of the main bus, and also unbuffered to make use of fast page mode, so each read and write takes 5 cycles. (much like how the MCD's ASIC takes 3 cycles per read/write -though it may have some automatic clearing support for overwrite, and that could make use of fast page mode for the second access and only add 1 more cycle rather than 3 -same for the Jaguar, if overwrites occur)


    However, that's not the only area you could speed things up, the 3DO and Saturn both separate source (texture memory) and destination (framebuffer) into different banks/buses of RAM to allow fast page mode much of the time (without more advanced design to allow heavy on-chip line buffers and caching like the PSX did -or the jaguar for many other operations, just not texture mapping -the Jaguar also could have done that and actually supports 2-bank interleaving so it wouldn't even need the complexity of an added bus, but would need the cost of more RAM chips for the 2nd bank, though the CoJag did it and with VRAM at that -as it is, the 4k GPU scratchpad can be used to accelerate textures somewhat).

    Anyway, for the MCD, that's pretty significant as you already have a 2nd bus to work with like the 3DO (halt the CPU to do a texture fetch in CPU RAM), or an even more useful option could have been allowing the 2 word RAM chips (2 64kx16-bit DRAMs) to support interleaved fast page mode accesses (employ a DRAM controller capable of holding 1 page open on each chip simultaneously) and thus allow considerable use of fast page mode (80 ns) reads/writes from one bank to another. (one as source and another as destination)
    Not only would that be very useful for accelerating rendering in general (with textures in one bank and buffer in the other), but it also could have been useful for the 2-pass rendering method Chilly Willy and I were discussing a while back. (use the ASIC for column rendered games sideways and then do a 2nd pass to rotate the lines to columns and complete a final pass with the line based floor/ceiling -for a doom-like game- ) Of course, for such a scheme, you'd need to have some textures in each bank and buffer space in each as well. (so RAM use is a bit tighter and updates by the 68k more frequent -still probably more efficient than contending over the 68k's bus)









    Quote Originally Posted by Chilly Willy View Post
    You'd have to do four writes with the CPU (or set aside a register to accumulate four pixels) if you do anything other than a straight copy. Remember that the pixels can come from non-adjacent locations in the texture. The four pixels may be in a row in the destination, but probably not in the source.
    Yes, and that's what I meant by a write buffer (a register to accumulate pixels for longer words/phrases -and same issue with the Jaguar, except you'd want a 64-bit write buffer there).

    And yes, a write buffer would be more often needed than for reads, though having both could still be useful (multiple pixels and work with them internally to fill the write buffer and do additional fetches as needed -more useful in some cases than others). You could also do buffers longer than the bus width to take advantage of page mode. (if you had a 64-bit write buffer on the CD ASIC -or 256 bits in the Jaguar-, that would mean 1 16-bit random write followed by 3 page mode writes -of course, only at peak use in cases where the destination would be at least 16 pixels wide)

    But that's one reason GPUs went VERY quickly to only supporting 16 or 32 bit output - the pixels match the bus better. Not the only reason, of course, but if you're only going to write one pixel, might as well match the bus width.
    Except they also moved very quickly to 64-bits (and not just for faster framebuffer reads and 2D blitting), and that requires additional buffers and caching to make proper use of. (just as systems working with 4/8/16-bit pixels on a 32-bit bus would -like the Playstation)
    And then moved to 128-bits in the late 1990s, requiring even heavier buffering of phrases of pixels to make full use of the bandwidth. (not to mention buffering/caching to make optimal use of page mode accesses to DRAM -even more so if working with shared memory with the CPU, like in some laptops and the Xbox or 360 for that matter)
    And more recently 256 bits, but with 128 and occasional 64-bit examples still floating around some newer stuff. (especially looking at ATi and NVidia's offerings over the last 2 decades)
    The N64's RSP (not sure about RDP) worked on 128-bits internally, but 8 (or 9) externally.

    Dreamcast went 64-bit, PS2 was 16-bit for main with RDRAM (which was also 16-bits on PCs standard -not sure what the width on-die GPU memory is in the PS2), Xbox was 128-bit (dual channel), and I'm not sure what the GC used. (either for external DRAM or the on-die PSRAM for the GPU or GPU -same for Wii)
    I think both the 360 and PS3's GPUs use 256-bit external buses.


    The Saturn was one of the very few examples of a major 3D accelerator product where the blitter/GPU bus was the width of the pixels (assuming there's not a write buffer for 8bpp mode). The Jaguar had buffering for some 3D and most 2D operations to work on 64-bits and use fast page mode (but not for texture mapping, which Flare guessed would be more of a "spice" used sparingly when they were laying down the design in 1990), and the PSX GPU definitely buffers heavily for both 32-bit read/write and page mode use.
    The original Rage accelerator also seems to have buffered for 16 (not sure about 8) bit pixels/texels with 32-bit reads/writes and page mode use. (actually, wiki's specs on the ATi GPU page lists 320 MB/s max bandwidth, but with 40 MHz EDO DRAM, that would need a 64-bit bus and not 32-bit as mentioned . . . odd -it also mentions 40 M pixels/texels per second, though that figure would conform more to 32-bits at 40 MHz for 16-bit pixels -that would be like the PSX's 33M 16-bit texels per second on a 33 MHz 32-bit bus, with that also being the theoretical peak and not real-world performance with refresh, page breaks, and framebuffer scanning overhead taken into account -granted, you WOULD reach that speed for short bursts in real-world performance, if not actually faster for cached textures)
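    The bus-width reasoning in that last aside can be checked with a couple of lines. This is only a sketch of peak figures: it assumes one bus transfer per clock and that every textured pixel costs one texel read plus one framebuffer write of the same size, which is my reading of how the 40M and 33M numbers line up. The function names are made up.

```python
def bus_mb_s(clock_mhz, bus_bits):
    """Peak bandwidth in MB/s for one transfer per clock."""
    return clock_mhz * bus_bits // 8

def texels_m_s(clock_mhz, bus_bits, texel_bits=16):
    """Peak textured pixels per second (millions), assuming every output
    pixel costs one texel read plus one framebuffer write of the same size."""
    return bus_mb_s(clock_mhz, bus_bits) * 8 // texel_bits // 2

print(bus_mb_s(40, 32), bus_mb_s(40, 64))   # 32-bit vs 64-bit bus at 40 MHz
print(texels_m_s(40, 32))                   # 16-bit texels on the 32-bit bus
print(texels_m_s(33, 32))                   # the PSX-style comparison
```

    So 320 MB/s only falls out of a 64-bit bus at 40 MHz, while the 40M texel figure fits a 32-bit one.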








    Quote Originally Posted by Chilly Willy View Post
    The timing for the ASIC does take into account writes as well as reads... as well as CPU accesses of word ram, and refresh cycles. The ASIC clock is the same as the CPU, 12.5MHz. The exact timing formulas are illegible in the docs that exist, but at a guess, I'd say it's 3 clocks for every access made, be it reading a pixel, writing a pixel, CPU access, or refresh. There are some boundary issues that affect the clock, and also some differences for the memory mode, but like I said, the current scan of the docs can't be read for this one area. I do wish someone came up with a clean scan as I'd love to see the exact timing.
    Yes, assuming the approximate 180 ns memory cycle time for 80 ns FPM DRAM is accurate, that would mean no fewer than 3 cycles per complete read/write. (aside from possible cases of page mode use -which may have been employed for drawing on top of other graphics -where a few consecutive reads/writes would be useful for clearing and writing -it would be useful for masking if you ever did more than 1 pixel at a time, but that's a bit moot with single pixels -ie just don't draw transparent pixels at all)

    Depending how refresh is handled, it might not be a significant hit at all, but another consideration would also be any additional cycles lost to processing between reads and writes. (I believe the jaguar's blitter loses one more cycle between reads and writes for its textures, no idea what the Sega CD is like though -or the Saturn VDP1 for that matter)

  13. #43
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    Quote Originally Posted by kool kitty89 View Post
    The only use of the 16-bit wide memory is for the CPU and Sub-CPU to access.

    The ASIC does pixel-by-pixel rendering: it can't work on bytes or words, just nybbles, and doesn't support more flexible blitting operations, just copying/moving texture stamps along with the scaling/rotation algorithm to stretch and rotate those stamps.
    Except the 68000 simply can't deal with memory that doesn't have a 16-bit data bus, period (Z80 RAM has an 8-bit data bus, but the hack needed to get that to work means the 68000 can't do a word access to it). The ASIC uses the same memory the 68000 accesses, so it must be 16-bit too. If the ASIC only did 4-bit accesses, that means the memory must support 4-bit writes too. I really doubt that's the case, as far as I know all 16-bit RAM only provided at most 8-bit writes (with the /USB and /LSB lines). So, it must be accessing more than 4-bit at once.

    Also here's what I mean. With your logic, it'd go like this:
    1. Read word from bitmap
    2. Read word from texture
    3. Write word into bitmap
    4. Read word from bitmap
    5. Read word from texture
    6. Write word into bitmap
    7. Read word from bitmap
    8. Read word from texture
    9. Write word into bitmap
    10. Read word from bitmap
    11. Read word from texture
    12. Write word into bitmap

    What I mean is that the ASIC is probably doing this:
    1. Read word from texture
    2. Read word from texture
    3. Read word from texture
    4. Read word from texture
    5. Write word into bitmap

    See my point?
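    The two sequences above, counted out in a short sketch for four 4-bit pixels in one 16-bit word. The function names are made up, and this only counts memory accesses; it says nothing about what the real ASIC does internally.

```python
def per_pixel_rmw(pixels):
    """One nibble at a time: read bitmap word, read texture word, write bitmap word."""
    accesses = []
    for _ in range(pixels):
        accesses += ["read bitmap", "read texture", "write bitmap"]
    return accesses

def buffered_word(pixels, pixels_per_word=4):
    """Gather texels into a word-sized buffer, then write the whole word once."""
    accesses = []
    for i in range(pixels):
        accesses.append("read texture")
        if (i + 1) % pixels_per_word == 0:
            accesses.append("write bitmap")
    if pixels % pixels_per_word:
        accesses.append("write bitmap")   # flush a partial word (ignores the masking it would need)
    return accesses

print(len(per_pixel_rmw(4)), len(buffered_word(4)))
```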

  14. #44
    Hero of Algol kool kitty89's Avatar
    Join Date
    Mar 2009
    Location
    San Jose, CA
    Age
    23
    Posts
    9,127
    Rep Power
    49

    Default

    Quote Originally Posted by Sik View Post
    Except the 68000 simply can't deal with memory that doesn't have a 16-bit data bus, period (Z80 RAM has an 8-bit data bus, but the hack needed to get that to work means the 68000 can't do a word access to it). The ASIC uses the same memory the 68000 accesses, so it must be 16-bit too. If the ASIC only did 4-bit accesses, that means the memory must support 4-bit writes too. I really doubt that's the case, as far as I know all 16-bit RAM only provided at most 8-bit writes (with the /USB and /LSB lines). So, it must be accessing more than 4-bit at once.
    If doing actual 4-bit aligned addressing and individual 4-bit reads/writes are an issue, it might just be working on 4 bits and then masking that to a word (or possibly byte) aligned output.

    Also here's what I mean. With your logic, it'd go like this:
    1. Read word from bitmap
    2. Read word from texture
    3. Write word into bitmap
    4. Read word from bitmap
    5. Read word from texture
    6. Write word into bitmap
    7. Read word from bitmap
    8. Read word from texture
    9. Write word into bitmap
    10. Read word from bitmap
    11. Read word from texture
    12. Write word into bitmap

    What I mean is that the ASIC is probably doing this:
    1. Read word from texture
    2. Read word from texture
    3. Read word from texture
    4. Read word from texture
    5. Write word into bitmap

    See my point?
    To do the latter it would need more advanced logic to support a multi-pixel wide write buffer, that's something the jaguar also lacked that would have helped a lot with texture mapping bandwidth.

    And those are things I already mentioned . . . they're actually things that came up on Atariage in context of the Jaguar. (again, the Jag's texture mapping works rather like the ASIC with single pixels at a time for scaled/rotated rectangular stamps -with added CPU/GPU grunt for warped 3D- but it can work on different pixel depths up to 32-bits, but all of those depths are still single pixel reads/writes -usually 8 or 16 bit reads are used and 16-bit writes)

    It was explicitly mentioned (in that Jaguar discussion) that adding a multi-pixel write buffer would have been one of the simpler changes to the blitter to dramatically improve texture mapping performance on the Jaguar. (for 16-bit pixels, a 64-bit write buffer would be about 2.5x faster than single pixel texture mapping)
    I don't think the Saturn's VDP1 even supports a word buffer for writes in 8bpp mode (though that would make 8bpp rendering more attractive), not sure about the 3DO. (it's a 32-bit bus and normally using a 16bpp framebuffer, so 2 pixels buffered maximum)


    Also, why would you need to read a word from the bitmap if you positively knew it was going to be zeros (for non overwritten scaled objects -like for non overlapping objects on sprite or BG cells)?
    And for cases where you DO want overwrites, wouldn't it be more efficient to read the bitmap (destination) after reading the texture? (since the next access will be to that same address for the bitmap and thus facilitate page mode operation)
    Last edited by kool kitty89; 06-18-2011 at 01:33 AM.

  15. #45
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    Quote Originally Posted by kool kitty89 View Post
    If doing actual 4-bit aligned addressing and individual 4-bit reads/writes are an issue, it might just be working on 4 bits and then masking that to a word (or possibly byte) aligned output.
    You're going by the assumption the ASIC is just 4-bit. In fact, we say ASIC to refer to the renderer, but if I recall correctly the ASIC actually contained all the custom hardware in the Mega CD (much like its ASIC counterpart in the Mega Drive side), so it's already pretty complex for starters. It's still less complex than the ASICs in the MD though, meaning it should have had enough die space for 12 more bits...


    Quote Originally Posted by kool kitty89 View Post
    Also, why would you need to read a word from the bitmap if you positively knew it was going to be zeros (for non overwritten scaled objects -like for non overlapping objects on sprite or BG cells)?
    Because of your own flawed assumption. If the ASIC only manipulates 4 bits at once, but memory needs 16-bit, it needs to buffer 16-bit of data from memory, modify the 4-bit that are affected, then write them back. This also means the ASIC already has a 16-bit buffer to hold the data... by which point you may as well just modify all 16-bit at once. See my logic?
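    A small sketch of that read-modify-write argument: patching one nibble in 16-bit memory forces a read, a merge and a write, while building the word first needs a single write. The nibble order (pixel 0 in the top bits) and the function names are assumptions for illustration only.

```python
def write_nibble_rmw(memory, word_index, nibble_index, value):
    """Change one 4-bit pixel in 16-bit memory: read the word, patch it, write it back."""
    shift = (3 - nibble_index) * 4          # nibble 0 is the leftmost pixel (assumed order)
    word = memory[word_index]               # read
    word = (word & ~(0xF << shift) & 0xFFFF) | ((value & 0xF) << shift)   # modify
    memory[word_index] = word               # write

def write_word(memory, word_index, pixels):
    """Same result for four pixels, but with a single write and no read."""
    word = 0
    for p in pixels:
        word = (word << 4) | (p & 0xF)
    memory[word_index] = word

a = [0x0000]
for i, p in enumerate([0x1, 0x2, 0x3, 0x4]):
    write_nibble_rmw(a, 0, i, p)            # four reads + four writes

b = [0x0000]
write_word(b, 0, [0x1, 0x2, 0x3, 0x4])      # one write

print(hex(a[0]), hex(b[0]))
```

    Both paths end with the same word in memory; the difference is only in how many times memory gets touched.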

    Quote Originally Posted by kool kitty89 View Post
    And for cases where you DO want overwrites, wouldn't it be more efficient to read the bitmap (destination) after reading the texture? (since the next access will be to that same address for the bitmap and thus facilitate page mode operation)
    My point still stands, it's still doing more reads than it could when not doing overwrites.
