

Thread: Comparison of 5th generation ("32/64-bit") game console hardware

  1. #196
    Hero of Algol kool kitty89's Avatar
    Join Date
    Mar 2009
    Location
    San Jose, CA
    Age
    23
    Posts
    9,171
    Rep Power
    50


    Quote Originally Posted by Crazyace View Post
    VDP1 has 3 separate sets of address/data pins going to the 3 RAM chips - so it's not a single bus, and should support simultaneous read/write operations. ( That's why I compare it to the OPL on Jaguar, rather than a blitter system which would be issuing reads/writes sequentially )
    Hmm, I knew framebuffer and texture RAM were on separate buses, but I didn't realize they could be simultaneously accessed by the VDP. (I assumed it switched between the 2 via a shared set of address/data lines -like the 32x CPUs do)

    If VDP1 was pipelined to allow writes to the framebuffer while fetching the next texel, that would certainly boost texture fill rate to a similar speed as plain line filling (and perhaps faster than Gouraud shading -depending on what logic was dedicated to that).
    So a lot faster than I was thinking. (and also a much better match for the PSX GPU, for that matter -and could probably have made for a quite nice 2D engine for the time even without VDP2 . . . and obviously similarly capable in 3D as it already is)
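
A rough way to put numbers on that idea is a one-read, one-write-per-pixel model. This is only a sketch: the 28.6 MHz clock and the 1-cycle read/write costs are assumptions for illustration, not measured VDP1 timings, and `pixels_per_second` is a made-up helper name.

```python
def pixels_per_second(clock_hz, read_cycles, write_cycles, pipelined):
    """Estimate textured fill rate for a one-read, one-write-per-pixel model."""
    if pipelined:
        # Reads and writes overlap on separate buses, so the slower one sets the pace.
        cycles_per_pixel = max(read_cycles, write_cycles)
    else:
        # A shared bus has to do the texel read, then the pixel write, back to back.
        cycles_per_pixel = read_cycles + write_cycles
    return clock_hz / cycles_per_pixel


ASSUMED_CLOCK_HZ = 28.6e6  # assumption: VDP1 clock taken as 28.6 MHz

shared = pixels_per_second(ASSUMED_CLOCK_HZ, 1, 1, pipelined=False)
overlapped = pixels_per_second(ASSUMED_CLOCK_HZ, 1, 1, pipelined=True)
print(f"shared bus: {shared / 1e6:.1f} Mpix/s, overlapped: {overlapped / 1e6:.1f} Mpix/s")
```

With equal read and write costs the overlapped case comes out at twice the shared-bus case, which is the "same speed as a plain fill" result; page breaks and setup overhead are ignored here.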

    Hmm, I wonder if the 3DO supports that too . . . that would be a much better use of the separate source and destination buses (VRAM for the framebuffer and shared DRAM for textures), or for that matter if it has 32-bit source and destination buffers to take advantage of the full bus width (or at least a destination buffer). In the 3DO's case, the max fillrate (with FPM and 32-bit optimizations) would be 25 Mpix/s for 16-bit pixels. (not sure of the actual 3DO texture performance though)

    It's also odd that VDP1 would have separate buses for both framebuffer banks though, since it would only be accessing 1 per-frame (with VDP2 accessing the other) . . . unless VDP1 also contains the bus-steering logic to flip the banks between VDP1 and VDP2 (except that would require an additional set of address/data lines to connect to VDP2 . . . though since it should just be scanning a linear framebuffer, that could be limited to a serial pixel bus -somewhat like VRAM uses)

    I found this quite interesting ( Saturn schematic )

    http://www.geocities.jp/atx197/Ssmn052S1_10.pdf

    along with a datasheet for the ram - which is SDRAM for VDP1 frame buffers.

    http://www.datasheetarchive.com/HM52...datasheet.html

    seems to show single cycle burst writes...
    Hmm, interesting, especially the dual bank support (which would make that slightly closer to SGRAM in some respects -except the banks are fixed, so more like 2 separate 128k chips with interleaving . . . which would also explain the odd 2Mbit density). That should mean the 32x could get a significant performance boost from having the master and slave SH2s work mostly in separate 128k banks of SDRAM (reducing page breaks).

    The speed ratings confirm my previous knowledge on the Saturn/32x SDRAM . . . much faster rated speed than the system was actually clocked at. (in that respect, a single bus using the same SDRAM chips clocked at double the VDP speed could have performed similarly to the separate buses -assuming source/destination were still interleaved as separate banks . . . or double the current bandwidth if used at full speed and with simultaneous accesses on separate buses)




    Quote Originally Posted by Chilly Willy View Post
    The 32X uses a FIFO for reading/writing the frame buffer. So you can read from the cache/SDRAM/cart at the same time a write to the frame buffer is pending in the FIFO. I would expect VDP1 was setup similarly, so it SHOULD be able to read from a texture/sprite while writes to the frame buffer are pending in a FIFO, meaning you SHOULD be able to get the same fill rate as a plain fill operation. SEGA was good about using FIFOs like that to get slightly better performance from hardware.
    In this case, it should be a good bit more than slightly more, but potentially double the performance.

    The 32x example is limited since the framebuffers are quite slow in general, and the CPUs wouldn't be reading and writing pixels simultaneously even if the buffers were at SDRAM speed. (ie you wouldn't even be able to do block copy faster than 11.5 M words/s)
    6 days older than SEGA Genesis
    -------------
    Quote Originally Posted by evilevoix View Post
    Dude it’s the bios that marries the 16 bit and the 8 bit that makes it 24 bit. If SNK released their double speed bios revision SNK would have had the world’s first 48 bit machine, IDK how you keep ignoring this.

  2. #197
    Mastering your Systems Hero of Algol TmEE's Avatar
    Join Date
    Oct 2007
    Location
    Estonia, Rapla City
    Age
    23
    Posts
    9,137
    Rep Power
    71


    Multiple buses are used so that parallel accesses can happen. VRAM is read and the result is written to one of the framebuffers, while the other framebuffer is read by the other VDP and shown along with the BG stuff.
    Death To MP3, :3
    What are you reading? You won't understand it anyway. "Gnirts test is a shit" New and growing website of total jawusumness !

  3. #198
    Wildside Expert
    Join Date
    May 2011
    Posts
    144
    Rep Power
    4


    Quote Originally Posted by kool kitty89 View Post
    Hmm, I knew framebuffer and texture RAM were on separate buses, but I didn't realize they could be simultaneously accessed by the VDP. (I assumed it switched between the 2 via a shared set of address/data lines -like the 32x CPUs do)

    If VDP1 was pipelined to allow writes to the framebuffer while fetching the next texel, that would certainly boost texture fill rate to a similar speed as plain line filling (and perhaps faster than Gouraud shading -depending on what logic was dedicated to that).
    So a lot faster than I was thinking. (and also a much better match for the PSX GPU, for that matter -and could probably have made for a quite nice 2D engine for the time even without VDP2 . . . and obviously similarly capable in 3D as it already is)
    It makes sense if you think of VDP1 as a souped up arcade sprite board ( system X or Y ) with good performance for scaled sprites - general performance would depend on the page break penalty ( or half of it given that the SDRAM supports 2 open banks )

    Quote Originally Posted by kool kitty89 View Post
    Hmm, I wonder if the 3DO supports that too . . . that would be a much better use of the separate source and destination buses (VRAM for the framebuffer and shared DRAM for textures), or for that matter if it has 32-bit source and destination buffers to take advantage of the full bus width (or at least a destination buffer). In the 3DO's case, the max fillrate (with FPM and 32-bit optimizations) would be 25 Mpix/s for 16-bit pixels. (not sure of the actual 3DO texture performance though)
    Haven't looked into the 3DO for a long long time - what width was the VRAM? ( and how fast was it clocked ) - I remember some comments about 2 render engines to speed things up, but everything was high level.

    Quote Originally Posted by kool kitty89 View Post
    It's also odd that VDP1 would have separate buses for both framebuffer banks though, since it would only be accessing 1 per-frame (with VDP2 accessing the other) . . . unless VDP1 also contains the bus-steering logic to flip the banks between VDP1 and VDP2 (except that would require an additional set of address/data lines to connect to VDP2 . . . though since it should just be scanning a linear framebuffer, that could be limited to a serial pixel bus -somewhat like VRAM uses)
    Again it makes sense if you think of VDP1 as an arcade sprite part - supporting rotated scan out of the frame buffer to display rather than rotating when drawing.


    Quote Originally Posted by kool kitty89 View Post
    Hmm, interesting, especially the dual bank support (which would make that slightly closer to SGRAM in some respects -except the banks are fixed, so more like 2 separate 128k chips with interleaving . . . which would also explain the odd 2Mbit density). That should mean the 32x could get a significant performance boost from having the master and slave SH2s work mostly in separate 128k banks of SDRAM (reducing page breaks).
    I read it as supporting two open banks at once, which would be useful for speeding up memory copies in general. Obviously it's a RAM feature, so it depends on the controller. ( Also why the reference to the 32X - this is the chip on VDP1 )

    Quote Originally Posted by kool kitty89 View Post
    The speed ratings confirm my previous knowledge on the Saturn/32x SDRAM . . . much faster rated speed than the system was actually clocked at. (in that respect, a single bus using the same SDRAM chips clocked at double the VDP speed could have performed similarly to the separate buses -assuming source/destination were still interleaved as separate banks . . . or double the current bandwidth if used at full speed and with simultaneous accesses on separate buses)
    Page breaks would still be the same if the clock was higher though I guess.

  4. #199
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,832
    Rep Power
    51


    Quote Originally Posted by kool kitty89 View Post
    The 32x example is limited since the framebuffers are quite slow in general, and the CPUs wouldn't be reading and writing pixels simultaneously even if the buffers were at SDRAM speed. (ie you wouldn't even be able to do block copy faster than 11.5 M words/s)
    Which is why I said reading from cache/SDRAM/cart. Reading from the frame buffer is slower than writing it, but faster than reading uncached SDRAM or about the same speed as reading the cart. So it would be beneficial in some cases to use frame buffer memory for storing data. That would be more useful with 256 color displays than 16-bit color as the latter takes most of the frame buffer. As an example, cell data for a 2D game would not be a good thing to store as you would be constantly reading/writing the frame buffer at the same time, but name table data (to use the MD term) would be fine as you wouldn't be reading that data at the same time you are trying to write the frame buffer.

  5. #200
    Raging in the Streets Da_Shocker's Avatar
    Join Date
    Apr 2009
    Location
    Cashville,TN
    Posts
    3,830
    Rep Power
    32


    So I know this system flopped bigger than the 32X and VB but has anyone played the Apple Bandai Pippin before?

  6. #201
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,832
    Rep Power
    51


    Quote Originally Posted by Da_Shocker View Post
    So I know this system flopped bigger than the 32X and VB but has anyone played the Apple Bandai Pippin before?
    I've got the closest thing to it - a Performa 5200CD.

  7. #202
    Raging in the Streets Da_Shocker's Avatar
    Join Date
    Apr 2009
    Location
    Cashville,TN
    Posts
    3,830
    Rep Power
    32


    Quote Originally Posted by Chilly Willy View Post
    I've got the closest thing to it - a Performa 5200CD.
    LOL those were more advanced than the LC 2 and 3. I remember the ole IIe or whatever they're called; we used them from 1st-6th grade. Then we got the LCs in 7th grade and they were so advanced and shit.

  8. #203
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,832
    Rep Power
    51


    Quote Originally Posted by Da_Shocker View Post
    LOL those were more advanced than the LC 2 and 3. I remember the ole IIe or whatever they're called; we used them from 1st-6th grade. Then we got the LCs in 7th grade and they were so advanced and shit.
    It was basically a Quadra 605 with the 68040 replaced by a PPC603 and some cache. It wasn't a BAD machine... for the time. The best thing about it was the video card - it had RF/svideo/composite inputs, a remote, and could capture video. It ran in YUYV mode for better quality than the previous Apple video cards. I used to digitize my video tapes using it. It wasn't very good for 3D - no hardware acceleration and the 603 wasn't any better than a Pentium for software 3D.

  9. #204
    Hero of Algol kool kitty89's Avatar
    Join Date
    Mar 2009
    Location
    San Jose, CA
    Age
    23
    Posts
    9,171
    Rep Power
    50


    Quote Originally Posted by TmEE View Post
    Multiple buses are used so that parallel accesses can happen. VRAM is read and the result is written to one of the framebuffers, while the other framebuffer is read by the other VDP and shown along with the BG stuff.
    Yes, I already knew that . . . having hardware flipped framebuffer banks eliminates contention (like the 32x framebuffers or MCD word RAM in flip-flop 1M mode) and does so without needing dual-port memory (3DO does similarly using VRAM). Having separate buses for source and destination also allows use of fast page mode copying without extensive on-chip buffering (something the 3DO also takes advantage of), though having a single bus with bank interleaving (to allow 1 page to be held open in each bank) also allows that. (something the Jaguar supported -and used for the ROM/RAM banks, though also supported a 2nd DRAM bank which wasn't populated for cost reasons -the CoJag used VRAM in that bank)

    However, what I had failed to consider was the possibility of reading and writing to both buses on the same VDP clock cycle. (and using buffering -ie FIFOs- to pipeline texture rendering would certainly facilitate that . . . and technically they could even have clocked the memory at 2x the VDP speed with 32-bit read/write buffers -even more so if those buffers could be configured as either 16 or 8-bit words to double the speed of 8-bit mode; that would have been much more efficient use of the already fast-rated SDRAM, and would have made VDP1 considerably faster than the PSX GPU . . . for that matter, they could have done the same for CPU memory on the 32x, with memory at 46 MHz and a 32-bit latch to allow 23 MHz 32-bit reads . . . or to reduce cost on the Saturn's main memory by using narrower SDRAM chips at 2x speed and a latch to make use of the bandwidth -Atari was actually considering doing that for the Jaguar II since SDRAM was becoming cheaper by that point and a 16-bit bus at 66.67 MHz would allow roughly the same bandwidth as 64-bit FPM DRAM at 16.67 MHz . . . or slightly more due to reduced wait states)
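
The "narrow bus at a higher clock" comparisons above are just bus width times clock. A quick sketch of that arithmetic, peak figures only, assuming one transfer per clock with no wait states or page breaks (`bandwidth_mb_per_s` is an illustrative helper, not anything from a real toolchain):

```python
def bandwidth_mb_per_s(bus_bits, clock_mhz):
    """Peak bandwidth in MB/s, assuming one transfer per clock and no wait states."""
    return bus_bits / 8 * clock_mhz


# Jaguar II idea: 16-bit SDRAM at 66.67 MHz vs 64-bit FPM DRAM at 16.67 MHz
print(bandwidth_mb_per_s(16, 66.67), bandwidth_mb_per_s(64, 16.67))

# 32x idea: 16-bit memory at 46 MHz behind a 32-bit latch vs 32-bit reads at 23 MHz
print(bandwidth_mb_per_s(16, 46), bandwidth_mb_per_s(32, 23))
```

Both pairs land on roughly the same peak figure (about 133 MB/s and 92 MB/s), so any real difference would have to come from wait states and page-break behaviour rather than raw width times clock.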

    And, again, I wonder if the 3DO takes advantage of parallel reads/writes . . . actually, in that case the GPU (Cel Engine) is clocked at 2x the bus speed (80 ns FPM DRAM and VRAM), so it wouldn't even necessarily be using FIFOs, but rather doing reads (to DRAM) on 1 cycle and writes (to VRAM) on the next. (sort of like what the MD's VDP DMA does with ROM at 1/2 the speed of the VDP, except VRAM isn't at 1/2 speed too)
    I'm not sure how much actual buffering the 3DO has (let alone for different drawing modes), but with the separate buses for textures and framebuffer (and dual-port VRAM cutting out any contention for framebuffer scanning), and with 32-bit wide 80 ns RAM, the 3DO's Cel Engine could potentially push 16bpp textures at 25 Mpix/s peak (which would be fairly close to the Saturn or PSX). Albeit there's the issue of CPU/GPU contention for main RAM (where textures are stored) and the lack of a CPU cache (and a relatively slow CPU in general) greatly exacerbating the issue. Plus that's just the potential that the RAM would allow . . . I'm not sure if the Cel can actually do that. (though having that sort of parallel rendering would make the separate bus for textures much more sensible -otherwise they could have had the Cel work on the video bus alone with 2 interleaved banks with a DRAM texture buffer and VRAM framebuffer -so no parallel reads/writes, but still using FPM bandwidth- . . . or dropped the VRAM entirely in favor of DRAM framebuffer banks more like the 32x -for that matter, having 1 MB of VRAM as a glorified framebuffer was rather excessive for the 3DO in general . . . 512k should have been fine for the time, or 256k at a conservative minimum -though given the deluxe nature of the 3DO, 512k would make more sense . . . 256k would only allow 320x204 16bpp double buffered or 288x224, and you'd need 450k to double buffer 320x240x24-bpp -ie for VCD support, or potentially for Truecolor FMV . . . the only thing you'd need 1 MB for is 640x480 truecolor still images)
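
The framebuffer sizes quoted above can be checked with a few lines of arithmetic. A minimal sketch, assuming tightly packed pixels with no padding or alignment (`framebuffer_bytes` is a hypothetical helper):

```python
KIB = 1024


def framebuffer_bytes(width, height, bits_per_pixel, buffers=2):
    """Bytes needed for `buffers` tightly packed framebuffers (no row padding)."""
    return width * height * bits_per_pixel // 8 * buffers


for label, args in [
    ("320x204 16bpp, double buffered", (320, 204, 16, 2)),
    ("288x224 16bpp, double buffered", (288, 224, 16, 2)),
    ("320x240 16bpp, double buffered", (320, 240, 16, 2)),
    ("320x240 24bpp, double buffered", (320, 240, 24, 2)),
    ("640x480 24bpp, single buffer", (640, 480, 24, 1)),
]:
    print(f"{label}: {framebuffer_bytes(*args) / KIB:.0f} KiB")
```

Under those assumptions the first two modes fit in 256 KiB, double-buffered 320x240 at 16bpp needs 300 KiB, the 24bpp double buffer needs 450 KiB, and a single 640x480 truecolor frame needs 900 KiB.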









    Quote Originally Posted by Crazyace View Post
    It makes sense if you think of VDP1 as a souped up arcade sprite board ( system X or Y ) with good performance for scaled sprites - general performance would depend on the page break penalty ( or half of it given that the SDRAM supports 2 open banks )
    It seems more like an evolution of the MCD's ASIC/blitter . . . did the X/Y boards even use framebuffers? (they had no support for sprite rotation/warping -if they had had simple affine mapping like the MCD ASIC, that could have made for some interesting 3D/Pseudo 3D games beyond the scaled sprite stuff)

    My problem was I just hadn't considered parallel reads/writes to both buses on the same VDP clock cycle, but again, that makes the use of independent buses for source and destination much more sensible. (otherwise you could just use 2 separate/interleaved banks on a shared bus for lower cost/complexity -like the Jaguar supported for ROM and the 2nd DRAM bank)

    Haven't looked into the 3DO for a long long time - what width was the VRAM? ( and how fast was it clocked ) - I remember some comments about 2 render engines to speed things up, but everything was high level.
    I'm not sure how much low-level documentation was ever available for the 3DO (after all, developers had to use 3DO's libraries alone . . . so any such documentation would have had to have been leaked separately from official dev manuals).
    However, from what I recall seeing (and some comments from Kskunk), the 3DO used 80 ns 32-bit FPM DRAM for main RAM and 80 ns 32-bit VRAM for the framebuffer (I believe 4 512kx8-bit chips for main and 2 256kx16-bit dual port VRAMs)

    Albeit, texture mapping rates in tech demos/benchmarks for the 3DO (or quoted texels/s figures) should shed some light on the 3DO's peak texture bandwidth.

    If it didn't do parallel reads/writes, using separate buses for source/destination would have been rather wasteful (since 2 banks on the same bus would give the same bandwidth)

    Again it makes sense if you think of VDP1 as an arcade sprite part - supporting rotated scan out of the frame buffer to display rather than rotating when drawing.
    No, I mean it's odd that VDP1 has traces that connect separate address/data lines to each of the framebuffers rather than a single set of address/data lines that connects to a middle man (bus-steering chip) that sits between VDP1 and VDP2. (unless VDP1 has that logic built into itself and has access ports connecting to VDP2)

    I wasn't surprised that the 2 framebuffers were flipped between 2 buses . . . that's pretty obvious: 1 RAM bank is on VDP2's bus while the other is on VDP1's (just like the 32x framebuffers or MCD word RAM in flip-flop mode)

    I read it as supporting two open banks at once, which would be useful for speeding up memory copies in general. Obviously it's a RAM feature, so it depends on the controller. ( Also why the reference to the 32X - this is the chip on VDP1 )
    Yes, so similar to having 2 separate 64kx16-bit SDRAMs, but with 1/2 the board space/traces needed. (and not so much like SGRAM, since that allows 2 pages to be opened anywhere in the chip -not just 2 distinct banks)
    So, again, rather like the Jaguar's bank interleaving.

    And, again, since the 32x uses the same 128kx16-bit SDRAM, that could mean a performance boost if the master/slave SH2s kept mostly to separate 128k banks. (fewer page breaks)
    Hmm, for that matter, it would mean less buffering was needed for the way SDRAM was used in the Genesis 3. (a single 128kx16-bit SDRAM chip was used in place of 68k work RAM -normally 32kx16-bit PSRAM- and VRAM -normally 64kx8-bitx2 port VRAM- . . . and while a single 16-bit SDRAM chip could potentially provide enough bandwidth to cover both of those buses seamlessly, it would require much less buffering if said chip had 2 banks allowing 1 page to be held open in each)

    Page breaks would still be the same if the clock was higher though I guess.
    I was thinking in terms of having 32-bit latches/buffers for reads/writes . . . so for page mode accesses, a 32-bit read/write (on a double speed 16-bit bus) would be just as fast as a 16-bit read/write on the normal speed equivalent (albeit only useful when reading/writing 32-bits -or 2 consecutive 16-bit words or 4 consecutive bytes).
    Now, for cases of page breaks, you already have the separate buses (and 2 banks per SDRAM chip) to greatly reduce that overhead, but the 32-bit reads/writes themselves would still roughly double bandwidth even with page changes. (since the 2nd 16-bit word read will always be an FPM access and only 1/2 a VDP clock cycle, so rowchange+read+read -or write+write)

    Actually, having double clocked RAM might even speed things up for plain 16-bit reads/writes due to faster RAM cycle times allowing fewer wait states for random (non page mode) accesses. Having DRAM clocked at 57.3/53.7 MHz vs 28.6/26.8 MHz might make the difference of 3 vs 4 cycles for random access memory cycle times . . . actually, it might get it down to 2 VDP cycles. (I'd have to see the actual SDRAM cycle times for those speeds) For that matter, it might even be able to squeeze 1 random access + 1 page mode access (ie for a 32-bit read/write) into 2 VDP clock cycles.

    Hmm, and unless you optimized for 8-bpp rendering, you'd only ever need to deal with 2 pixels at a time (for 32-bit reads/writes), so you still would avoid needing to read the framebuffer before a write (if 1 of the 2 pixels being written is transparent, just do a single 16-bit write), though you'd need to read the framebuffer for translucent blending effects.
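
To make the pixel-pair idea concrete, here is a hypothetical sketch of how a renderer could pick the bus write for two adjacent 16-bit pixels without reading the framebuffer first. The transparent value of 0x0000 and the big-endian packing (first pixel in the high half) are assumptions for illustration, not documented VDP1 behaviour.

```python
def plan_pair_write(pixel0, pixel1, transparent=0x0000):
    """Return the bus writes needed for a 2-pixel pair as
    (pixel_offset, width_bits, value) tuples; no framebuffer read is required."""
    opaque0 = pixel0 != transparent
    opaque1 = pixel1 != transparent
    if opaque0 and opaque1:
        # Both pixels visible: one combined 32-bit write (assumed big-endian packing).
        return [(0, 32, (pixel0 << 16) | pixel1)]
    if opaque0:
        return [(0, 16, pixel0)]  # only the first pixel is written
    if opaque1:
        return [(1, 16, pixel1)]  # only the second pixel is written
    return []  # both transparent: skip the write entirely


print(plan_pair_write(0x1234, 0x5678))
print(plan_pair_write(0x0000, 0x5678))
```

Translucent blending would still need the read-modify-write path, since the destination pixel feeds into the result.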








    Quote Originally Posted by Chilly Willy View Post
    Which is why I said reading from cache/SDRAM/cart. Reading from the frame buffer is slower than writing it, but faster than reading uncached SDRAM or about the same speed as reading the cart. So it would be beneficial in some cases to use frame buffer memory for storing data. That would be more useful with 256 color displays than 16-bit color as the latter takes most of the frame buffer. As an example, cell data for a 2D game would not be a good thing to store as you would be constantly reading/writing the frame buffer at the same time, but name table data (to use the MD term) would be fine as you wouldn't be reading that data at the same time you are trying to write the frame buffer.
    Reads from SDRAM are slower than the framebuffer?
    By "uncached", I assume you mean non-sequential (ie non page mode) accesses to SDRAM . . . so comparing that to best case (page mode) accesses to framebuffer DRAM (which is at 7.67 MHz - so 3 SH2 clocks for a complete FPM read or write), that would mean SDRAM takes 4 cycles for a random access? (I can't imagine it being any slower than that, since that's already 173.8 ns . . .)

    However, you didn't mention cached/sequential reads to the framebuffer . . . so are you implying page-mode DRAM reads, or not? (since random accesses of 7.67 MHz -130 ns- FPM DRAM should be somewhere around 230 ns . . . or 6 SH2 cycles -though I think Tiido mentioned that worst-case access time for the framebuffer was 7 cycles)
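
For reference, the cycle/nanosecond conversions used above, as a small sketch. The 23.01 MHz SH2 clock and 7.67 MHz DRAM clock are the figures from this discussion; the helper names are made up.

```python
import math

SH2_CLOCK_HZ = 23.01e6   # SH2 clock as used in this discussion
FB_DRAM_CLOCK_HZ = 7.67e6  # framebuffer DRAM clock assumed in the post above


def cycles_to_ns(cycles, clock_hz=SH2_CLOCK_HZ):
    """Duration of a number of clock cycles in nanoseconds."""
    return cycles / clock_hz * 1e9


def ns_to_cycles(ns, clock_hz=SH2_CLOCK_HZ):
    """Whole clock cycles needed to cover a given time (rounded up)."""
    return math.ceil(ns * clock_hz / 1e9)


print(f"4 SH2 cycles = {cycles_to_ns(4):.1f} ns")
print(f"one 7.67 MHz cycle = {cycles_to_ns(1, FB_DRAM_CLOCK_HZ):.1f} ns")
print(f"230 ns = {ns_to_cycles(230)} SH2 cycles")
```

Three SH2 cycles and one 7.67 MHz cycle both come out at about 130 ns, which is where the "3 SH2 clocks per FPM access" figure comes from.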

  10. #205
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,832
    Rep Power
    51


    Quote Originally Posted by kool kitty89 View Post
    Reads from SDRAM are slower than the framebuffer?
    By "uncached", I assume you mean non-sequential (ie non page mode) accesses to SDRAM . . . so comparing that to best case (page mode) accesses to framebuffer DRAM (which is at 7.67 MHz - so 3 SH2 clocks for a complete FPM read or write), that would mean SDRAM takes 4 cycles for a random access? (I can't imagine it being any slower than that, since that's already 173.8 ns . . .)

    However, you didn't mention cached/sequential reads to the framebuffer . . . so are you implying page-mode DRAM reads, or not? (since random accesses of 7.67 MHz -130 ns- FPM DRAM should be somewhere around 230 ns . . . or 6 SH2 cycles -though I think Tiido mentioned that worst-case access time for the framebuffer was 7 cycles)
    The SDRAM is set to do burst reads, and non-burst writes. A burst read fetches 8 words in 12 cycles - REGARDLESS of cached or uncached. When doing a cached read, the eight words exactly fill one cache line. When doing an uncached read, the one word required is used and the other seven are discarded. So reading a word in uncached mode takes 12 cycles! Reading a long takes 24 cycles!! So when using uncached memory for cross SH2 communications, use WORDS whenever possible. In some cases, flushing the caches and then doing cached reads will be much faster than using uncached memory.

    The non-burst write takes 2 cycles to write a word to SDRAM and is the same for cached or uncached.

    The frame buffer uses non-burst reads and writes, but has to wait on the FIFO because the DRAM used for the frame buffer is slower than the SH2. If the FIFO is empty, it takes 3 cycles to write a word; if it's full, it takes 5 cycles. So when copying large blocks to the frame buffer, the average time to write a word is 5 cycles. So doing "stuff" between writes to the frame buffer makes better use of CPU time as you can do things while waiting for the FIFO to empty. Reading the frame buffer takes 6 cycles minimum, and 13 max. Writing being faster than reading is common where frame buffers are concerned. It gets ludicrously lopsided on many PC cards with writing being in the GBytes/sec while reading is in the low MBytes per second. 1000:1 seems absurd, but that's the way they are.

    Non-burst cycles are the same regardless of cached or uncached - they use single random cycles with optional waits. It's only burst transfers that have such a lopsided difference between cached and uncached as explained above.
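
The timings above can be turned into a rough cost model. This is a sketch only: it counts bus cycles using the numbers quoted in this post, assumes cached reads are sequential and always miss (so each cache line costs one burst), and ignores cache hit time and refresh. The function names are illustrative.

```python
SH2_CLOCK_HZ = 23.01e6  # assumed SH2 clock
BURST_WORDS = 8         # words fetched per SDRAM burst read
BURST_CYCLES = 12       # SH2 cycles per burst read
SDRAM_WRITE_CYCLES = 2  # non-burst word write
FB_WRITE_CYCLES_FIFO_EMPTY = 3
FB_WRITE_CYCLES_FIFO_FULL = 5


def sdram_read_cycles(n_words, cached):
    """Bus cycles to read n_words (16-bit) from SDRAM."""
    if cached:
        # One burst fills a whole cache line, so sequential reads share it.
        lines = -(-n_words // BURST_WORDS)  # ceiling division
        return lines * BURST_CYCLES
    # Uncached: every word triggers a full burst and 7 of the 8 words are discarded.
    return n_words * BURST_CYCLES


def words_per_second(cycles_per_word):
    """Sustained word rate for a fixed per-word cycle cost."""
    return SH2_CLOCK_HZ / cycles_per_word


print("uncached word:", sdram_read_cycles(1, cached=False), "cycles")
print("uncached long:", sdram_read_cycles(2, cached=False), "cycles")
print("cached line of 8 words:", sdram_read_cycles(8, cached=True), "cycles")
print(f"frame buffer block write: {words_per_second(FB_WRITE_CYCLES_FIFO_FULL) / 1e6:.1f} M words/s")
```

On these assumptions, an uncached read costs 8x as much per word as a cached line fill, which is why flushing and then reading cached can win for anything larger than a word or two.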

  11. #206
    Wildside Expert
    Join Date
    May 2011
    Posts
    144
    Rep Power
    4


    Quote Originally Posted by kool kitty89 View Post
    It seems more like an evolution of the MCD's ASIC/blitter . . . did the X/Y boards even use framebuffers? (they had no support for sprite rotation/warping -if they had had simple affine mapping like the MCD ASIC, that could have made for some interesting 3D/Pseudo 3D games beyond the scaled sprite stuff)
    The arcade boards had scaled sprites - and then rotation of the sprite buffer on scanout ( for Galaxy Force ) - which is one of the VDP1 features. Look at the MAME source ( video\segaic16.c ); it talks about frame buffer sprites starting with the Outrun boards.

    Quote Originally Posted by kool kitty89 View Post
    I'm not sure how much low-level documentation was ever available for the 3DO (after all, developers had to use 3DO's libraries alone . . . so any such documentation would have had to have been leaked separately from official dev manuals).
    However, from what I recall seeing (and some comments from Kskunk), the 3DO used 80 ns 32-bit FPM DRAM for main RAM and 80 ns 32-bit VRAM for the framebuffer (I believe 4 512kx8-bit chips for main and 2 256kx16-bit dual port VRAMs)

    Albeit, texture mapping rates in tech demos/benchmarks for the 3DO (or quoted texels/s figures) should shed some light on the 3DO's peak texture bandwidth.

    If it didn't do parallel reads/writes, using separate buses for source/destination would have been rather wasteful (since 2 banks on the same bus would give the same bandwidth)
    It was specced at 50MBytes/second ( 12.5MHz@32bit ) with 9-16MPixels peak fillrate - an interesting limitation, less than your 25MPixel cap. ( I wonder if the 2 renderers work on separate lines and combine into 32bit writes as much as possible )

  12. #207
    Hero of Algol kool kitty89's Avatar
    Join Date
    Mar 2009
    Location
    San Jose, CA
    Age
    23
    Posts
    9,171
    Rep Power
    50


    Quote Originally Posted by Crazyace View Post
    The arcade boards had scaled sprites - and then rotation of the sprite buffer on scanout ( for Galaxy Force ) - which is one of the VDP1 features. Look at the MAME source ( video\segaic16.c ); it talks about frame buffer sprites starting with the Outrun boards.
    The only rotation I know of was full layer rotation (several Sega Arcade boards supported a sprite plane that could be rotated -but never on a per-sprite basis, and always with an additional non-rotated sprite plane . . . and usually additional text/character planes)

    It was specced at 50MBytes/second ( 12.5MHz@32bit ) with 9-16MPixels peak fillrate - an interesting limitation, less than your 25MPixel cap. ( I wonder if the 2 renderers work on separate lines and combine into 32bit writes as much as possible )
    The 25 Mpix comment was for theoretical peak throughput for block copy of 16-bit pixels. (with no added waits between reads and writes)
    16 Mpix/s would correspond to 3 25 MHz cycles per 32-bit write . . . so there's an added cycle for something there. (possibly a read to the framebuffer prior to the write -though, again, since you'd only ever write a single pixel or a consecutive pair at any time -simply not writing transparent pixels, that step could be avoided anyway -aside from doing blending effects-)

    I'd imagine simple block/line filling would be done at closer to 1 clock per 16-bit pixel.
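
The relationship between those fill-rate figures and cycles per access works out as follows. A sketch assuming a 25 MHz Cel Engine clock, a 32-bit bus and 16-bit pixels, with every access carrying two useful pixels (`fill_rate_mpix` is an illustrative helper):

```python
def fill_rate_mpix(clock_mhz, cycles_per_access, bus_bits=32, pixel_bits=16):
    """Peak fill rate in Mpix/s when each bus access moves bus_bits of pixel data."""
    pixels_per_access = bus_bits // pixel_bits
    return clock_mhz / cycles_per_access * pixels_per_access


# 2 cycles per 32-bit pair (read on one cycle, write on the next): block-copy peak
print(f"{fill_rate_mpix(25, 2):.2f} Mpix/s")
# 3 cycles per 32-bit pair: one extra cycle per access
print(f"{fill_rate_mpix(25, 3):.2f} Mpix/s")
```

The 2-cycle case gives the 25 Mpix/s theoretical peak, and the 3-cycle case gives about 16.67 Mpix/s, which lines up with the top of the quoted 9-16 Mpix/s range.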

    In any case, that figure is close enough that it would have been reasonably competitive with the PSX/Saturn graphics bandwidth . . . if not for the CPU contention issue. (and overall CPU performance limiting things even further -and shared audio bandwidth having some impact too) And, of course, the limitation of programming only to 3DO's libraries also limited potential performance.



    On another note, looking at the Hitachi SDRAM datasheet, the double clocked SDRAM idea wouldn't have been possible at all, it seems. It lists the max system clock cycle time as 30/34/40 ns (for the 3 grades listed -66 MHz, 58 MHz, and 50 MHz), so the RAM Sega was using was already being pushed to the rated speed. (the clock speeds are 2x the data/bus rates . . . which is how most/all DRAM -and DRAM controllers- are clocked . . . thus the RAM in question would be 30/34/40 ns DRAM -so similar in speed to commonly available EDO DRAM at the time, though with the other performance advantages of SDRAM -and easier interfacing, plus the dual bank for these specific chips)

    Oh, and it seems that random accesses to SDRAM can already be done in 2 VDP/CPU cycles . . . especially given Chilly Willy's comment on 32x SDRAM timing.








    Quote Originally Posted by Chilly Willy View Post
    The SDRAM is set to do burst reads, and non-burst writes. A burst read fetches 8 words in 12 cycles - REGARDLESS of cached or uncached. When doing a cached read, the eight words exactly fill one cache line. When doing an uncached read, the one word required is used and the other seven are discarded. So reading a word in uncached mode takes 12 cycles! Reading a long takes 24 cycles!! So when using uncached memory for cross SH2 communications, use WORDS whenever possible. In some cases, flushing the caches and then doing cached reads will be much faster than using uncached memory.
    Hmm, and these are all SH2 clock cycles?

    That would mean 1.5 SH2 (23.01 MHz) cycles per burst read to FPM DRAM . . . which would be 65.19 ns read cycle times, which would mean that RAM is definitely not clocked at 7.67 MHz as Tiido implied a while back, but almost certainly at 15.34 MHz (ie 2x the MD clock rate) . . . though that would technically be overclocking the 80 ns FPM DRAM they used. (though certainly well within the capabilities of FPM DRAM grades in common use at the time -which went up to 45 ns)
    Err. . . technically the DRAM would be 2x all of those clock rates listed above (as would the DRAM controller), though the FPM/burst read/write cycle times (and effective bus speed) would be the rates I listed.

    The non-burst write takes 2 cycles to write a word to SDRAM and is the same for cached or uncached.
    That's quite good . . . only 87 ns for a random (non-page-mode) read/write.
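
    For reference, the cycle counts quoted above can be turned into times with a short Python sketch. It assumes the 23.01 MHz SH2 clock used in this post and that all counts are SH2 cycles; `cycles_to_ns` is a made-up helper, not anything from Sega's or Hitachi's documentation.

```python
SH2_CLOCK_MHZ = 23.01  # 32X SH2 clock as quoted in the post (assumption)

def cycles_to_ns(cycles, clock_mhz=SH2_CLOCK_MHZ):
    """Convert a number of clock cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz

burst_per_word = cycles_to_ns(12 / 8)  # burst read: 8 words in 12 cycles
uncached_word = cycles_to_ns(12)       # uncached word read: whole burst, 7 words discarded
uncached_long = cycles_to_ns(24)       # uncached long read: two bursts
write_word = cycles_to_ns(2)           # non-burst word write

print(f"burst read, per word: {burst_per_word:.2f} ns")
print(f"uncached word read:   {uncached_word:.2f} ns")
print(f"uncached long read:   {uncached_long:.2f} ns")
print(f"non-burst word write: {write_word:.2f} ns")
```

    The per-word burst figure and the 2-cycle write come out at about 65 ns and 87 ns, matching the numbers in the post; the uncached reads are roughly 0.5 and 1 microsecond.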
    6 days older than SEGA Genesis
    -------------
    Quote Originally Posted by evilevoix View Post
    Dude it’s the bios that marries the 16 bit and the 8 bit that makes it 24 bit. If SNK released their double speed bios revision SNK would have had the world’s first 48 bit machine, IDK how you keep ignoring this.

  13. #208
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,832
    Rep Power
    51

    Default

    Quote Originally Posted by kool kitty89 View Post
    Hmm, and these are all SH2 clock cycles?
    Yes.


    That would mean 1.5 SH2 (23.01 MHz) cycles per burst read to FPM DRAM . . . which would be 65.19 ns read cycle times, which would mean that RAM is definitely not clocked at 7.67 MHz as Tiido implied a while back, but almost certainly at 15.34 MHz (ie 2x the MD clock rate) . . . though that would technically be overclocking the 80 ns FPM DRAM they used. (though certainly well within the capabilities of FPM DRAM grades in common use at the time -which went up to 45 ns)
    Errr - the main memory is SDRAM, not FPM DRAM. It's connected directly to and controlled by the SH2(s). It's all handled by the SH2 bus controller. You might want to look at the SH2 hardware manual to see how burst and non-burst SDRAM bus cycles are handled exactly.

    The frame buffer is DRAM. That could be clocked using some multiple of the MD clock since it's writable by both the SH2 (through the control chip through a FIFO), or by the 68000.

  14. #209
    Wildside Expert
    Join Date
    May 2011
    Posts
    144
    Rep Power
    4

    Default

    Quote Originally Posted by kool kitty89 View Post
    The only rotation I know of was full layer rotation (several Sega Arcade boards supported a sprite plane that could be rotated -but never on a per-sprite basis, and always with an additional non-rotated sprite plane . . . and usually additional text/character planes)
    Yes - that's the rotation I'm referring to. It's really superfluous if you have full rotation in drawing, but is still present in VDP1/VDP2 - which is why I treat VDP1 as an extended version of the System Y sprite board with added Gouraud shading/general quad texturing.


    Quote Originally Posted by kool kitty89 View Post
    The 25 Mpix comment was for theoretical peak throughput for block copy of 16-bit pixels. (with no added waits between reads and writes)
    16 Mpix/s would correspond to 3 25 MHz cycles per 32-bit write . . . so there's an added cycle for something there. (possibly a read to the framebuffer prior to the write -though, again, since you'd only ever write a single pixel or a consecutive pair at any time -simply not writing transparent pixels, that step could be avoided anyway -aside from doing blending effects-)
    Does the 3DO have a 25 MHz clock? That doesn't make sense - a 12.5 MHz clock matches the CPU clock and memory bandwidth better.


    Quote Originally Posted by kool kitty89 View Post
    In any case, that figure is close enough that it would have been reasonably competitive with the PSX/Saturn graphics bandwidth . . . if not for the CPU contention issue. (and overall CPU performance limiting things even further -and shared audio bandwidth having some impact too) And, of course, the limitation of programming only to 3DO's libraries also limited potential performance.
    Not sure about that - it's better than the Jaguar, though.

  15. #210
    Hero of Algol kool kitty89's Avatar
    Join Date
    Mar 2009
    Location
    San Jose, CA
    Age
    23
    Posts
    9,171
    Rep Power
    50

    Default

    Quote Originally Posted by Chilly Willy View Post
    Errr - the main memory is SDRAM, not FPM DRAM. It's connected directly to and controlled by the SH2(s). It's all handled by the SH2 bus controller. You might want to look at the SH2 hardware manual to see how burst and non-burst SDRAM bus cycles are handled exactly.
    Oh, I misread part of your previous post as commenting on the framebuffer DRAM rather than SDRAM . . . (the 8 words in 12 cycles would make much more sense for a burst SDRAM access, though I'd have thought it would be a bit less than that -like 9 or 10 cycles for 8 words accessed in the same page . . . 9 cycles assuming the first read took 2 cycles and the other 7 took 1 cycle each)
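
    The 9-cycle guess above is just the following arithmetic, sketched in Python. The 2-cycle first access and 1-cycle follow-ups are this post's assumptions, not figures from the SH2 manual, and `burst_cycles` is a made-up name.

```python
def burst_cycles(words, first_access=2, subsequent=1):
    """Estimated cycles for a same-page burst of `words` accesses."""
    return first_access + (words - 1) * subsequent

estimate = burst_cycles(8)  # 2 cycles for the first word + 1 for each of the other 7
measured = 12               # figure reported by Chilly Willy for an 8-word burst

print(f"estimated {estimate} cycles, reported {measured}, gap {measured - estimate}")
```

    Under those assumptions there are 3 cycles unaccounted for, which could plausibly be setup overhead in the bus controller (row activate, CAS latency, precharge), though that is a guess.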

    The frame buffer is DRAM. That could be clocked using some multiple of the MD clock since it's writable by both the SH2 (through the control chip through a FIFO), or by the 68000.
    OK, but you implied that reading from the framebuffer is sometimes faster than reading from SDRAM:
    Quote Originally Posted by Chilly Willy View Post
    Reading from the frame buffer is slower than writing it, but faster than reading uncached SDRAM or about the same speed as reading the cart. So it would be beneficial in some cases to use frame buffer memory for storing data.
    So how would read times compare in this case?
