View Full Version : Bad Apple demo thread
So, AnalYoGirl uploaded version 0.07:
http://hotfile.com/dl/130776084/c73493f/apple_0.07.zip.html
Updates I found so far:
Some playback controls
Four-level grayscale
Two modes: 15 FPS without any artifacts or 30 FPS with vertical interlacing
The speech problem was... um... hidden :v
Making this a new thread to avoid bumping an old one for something that wouldn't even be on-topic (like that even exists in Sega-16) and to avoid triple posting =|
I'm glad to see a topic for this demo :)
I find it really impressive so far, and if I find some time I'll try to test some stuff with it ;)
Edit :
Done :) The total ROM size is 8 MB, but it plays at full 320x224 resolution, 30 FPS and 2bpp color.
Sound is 4-bit ADPCM @ 13 kHz.
Source code (also contains the first part of the video data) :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_src.7z
YouTube :
www.youtube.com/watch?v=2vPe452cegU
Final 4 MB version :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_p1.bin
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_p2.bin
Final 8 MB version (without bank switching) :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple.bin
Note that the 8 MB version only works with a Mega Everdrive or a custom flash cart supporting full 8 MB mapping (without SSF2-style bank switching).
Some special emulators can run it as well (such as this (http://umk3.hacking-cult.org/2.11hack.zip) one).
You can download previous versions here :
version 1 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple1.bin
version 2 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple2.bin
version 3 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple3.bin
version 4 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple4.bin
version 5 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple5.bin
version 6 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple6.bin
version 7 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple7.bin
version 8 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple8.bin
version 9 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple9_p1.bin
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple9_p2.bin
final version :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_p1.bin
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_p2.bin
tomaitheous
10-12-2011, 12:02 PM
I'm in the middle of writing a compressor for Bad Apple in 2-bit format. It's fairly extensive. I can't wait to see what sort of overall compression ratio I get. Though... I'm only using 3 shades instead of 4, for better compression.
I'm also working on that. What I initially want to do is keep 8 gray levels with lossless compression.
Right now, at 320x224 resolution (from a smooth downscale of a higher-resolution source), it takes about 13 MB of pure data :p
But I still have some stuff I can improve ;)
tomaitheous
10-16-2011, 04:59 PM
What kind of compression scheme are you using?
I just got my compressor working through the first few compression stages and it's working out pretty well. A typical 256x192 frame (12288 bytes at 2 bits per pixel) currently compresses down to 806 bytes. Though I still need to implement more of the sub-level schemes.
cdoty
10-17-2011, 08:56 PM
I had it down to less than 8MB, with a specialized run length compression, but I was only using 4 levels of grey.
First I detect tilemap changes between frames, then I encode plain tile data as tilemap info. The remaining tiles are compressed using simple RLE (with a 90° rotation if it gives a better ratio). This is on 320x224 video with 8 gray levels (actually it could be 16, as I encode the color on 4 bits).
Even with that I still end up with 13 MB of data (10 MB if I use a lower-quality base video).
I now plan to use similar tiles, encoding one tile as an already existing tile plus the differing pixels, to reduce the size again.
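The "RLE with a 90° rotation if it gives a better ratio" step could look roughly like the sketch below. This is only an illustration, not the actual encoder from the post: the tile layout (a list of 8 rows of shade values) and the helper names `rle_runs`, `rotate_cw` and `best_orientation` are made up for the example, and "better ratio" is approximated by simply counting runs.

```python
from itertools import groupby

def rle_runs(pixels):
    """Collapse a flat pixel sequence into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(pixels)]

def rotate_cw(tile):
    """Rotate a square tile 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]

def flatten(tile):
    """Read a tile row by row into one flat list."""
    return [p for row in tile for p in row]

def best_orientation(tile):
    """Return (rotated_flag, runs) for whichever scan order needs fewer runs."""
    plain = rle_runs(flatten(tile))
    rotated = rle_runs(flatten(rotate_cw(tile)))
    return (True, rotated) if len(rotated) < len(plain) else (False, plain)

# Hypothetical tile with vertical stripes: poor for row scans, good once rotated.
striped = [[0, 0, 7, 7, 0, 0, 7, 7] for _ in range(8)]
flag, runs = best_orientation(striped)
print(flag, runs)
```

A decoder would need the rotation flag stored per tile (one bit) so it knows to rotate the tile back after expanding the runs.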
kool kitty89
10-18-2011, 11:10 PM
First I detect tilemap changes between frames, then I encode plain tile data as tilemap info. The remaining tiles are compressed using simple RLE (with a 90° rotation if it gives a better ratio). This is on 320x224 video with 8 gray levels (actually it could be 16, as I encode the color on 4 bits).
Even with that I still end up with 13 MB of data (10 MB if I use a lower-quality base video).
I now plan to use similar tiles, encoding one tile as an already existing tile plus the differing pixels, to reduce the size again.
Since you're only using 8 colors (shades), couldn't you optimize the RLE scheme for that:
i.e. use 1 bit to define whether the nybble is a line or a single pixel, then read the other 3 bits to define the shade and a second nybble to define the length of the line (if not a single pixel), or (if a single pixel) the next nybble defines the next pixel/line, and so on. (You could also then have the 2nd nybble define run length as 2-17 pixels rather than 1-16, since you'd handle 1-pixel segments separately.)
Does your current scheme also allow tile flips? (so any tiles that are H/V mirrors of other tiles can be omitted)
(You could also then have the 2nd nybble define run length as 2-17 pixels rather than 1-16, since you'd handle 1-pixel segments separately.)
If you use two nibbles for RLE, then a two pixel run will take up the same space whether compressed or not. As such, wouldn't it be better to define the range as 3-18?
kool kitty89
10-19-2011, 03:00 AM
If you use two nibbles for RLE, then a two pixel run will take up the same space whether compressed or not. As such, wouldn't it be better to define the range as 3-18?
Yes, that's a good point, and the same would apply to a 128 color scheme using 1 or 2 bytes rather than nybbles. (the CD-i's video chipset actually supports such a scheme in hardware iirc, though I'm not sure it uses 3-258 for the run lengths)
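For reference, here is a minimal sketch of the nybble RLE being discussed, with the 3-18 run range. The exact bit layout is an assumption made for this example (bit 3 = run flag, bits 0-2 = shade, a flagged nybble followed by a length nybble storing run length minus 3); nobody in the thread posted real code, and `encode_nibbles` / `decode_nibbles` are hypothetical names.

```python
def encode_nibbles(pixels):
    """Encode 3-bit shades (0-7) as a list of nybbles.

    Assumed layout: bit 3 = run flag, bits 0-2 = shade.
    A flagged nybble is followed by one nybble holding (run_length - 3),
    so a run covers 3 to 18 pixels.
    """
    out = []
    i = 0
    while i < len(pixels):
        shade = pixels[i]
        run = 1
        while i + run < len(pixels) and pixels[i + run] == shade and run < 18:
            run += 1
        if run >= 3:
            out += [0x8 | shade, run - 3]
        else:
            # Runs of 1 or 2 cost the same (or less) as plain literal nybbles.
            out += [shade] * run
        i += run
    return out

def decode_nibbles(nibbles):
    """Inverse of encode_nibbles."""
    out = []
    it = iter(nibbles)
    for n in it:
        shade = n & 0x7
        if n & 0x8:
            out += [shade] * (next(it) + 3)
        else:
            out.append(shade)
    return out

row = [0] * 20 + [5] + [7, 7] + [3] * 3
packed = encode_nibbles(row)
print(len(row), "pixels ->", len(packed), "nybbles")
```

With this layout a 2-pixel run is just two literal nybbles, which is why starting the run range at 3 wastes nothing.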
Since you're only using 8 colors (shades), couldn't you optimize the RLE scheme for that:
i.e. use 1 bit to define whether the nybble is a line or a single pixel, then read the other 3 bits to define the shade and a second nybble to define the length of the line (if not a single pixel), or (if a single pixel) the next nybble defines the next pixel/line, and so on. (You could also then have the 2nd nybble define run length as 2-17 pixels rather than 1-16, since you'd handle 1-pixel segments separately.)
Does your current scheme also allow tile flips? (so any tiles that are H/V mirrors of other tiles can be omitted)
Yeah, I guess I can save some bytes with that, but I don't think it will improve the actual compression ratio that much... Anyway, it's still some bytes, I have to test that ;)
Initially I wanted to stay with 16 colors as I planned to code a generic video encoder, but if I want to achieve a decent compression ratio for this video I have to exploit its specifics...
I tried to use H/V flips, but out of a total of 950000 tiles I found only about 10000 copies (and that's using both the H and V flip capabilities)... So now I'm just ignoring copies, as we gain so little from them.
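A rough sketch of how such a flip-aware duplicate search might be done: reduce each tile to a canonical form shared by all four H/V mirror variants, then count repeats. The function names are hypothetical and tiny 2x2 tiles are used only to keep the example readable; the real tiles are 8x8. The last line redoes the percentage for the numbers quoted above.

```python
def flips(tile):
    """Yield the tile and its H, V and H+V mirrored versions as hashable tuples."""
    rows = [tuple(r) for r in tile]
    h = [r[::-1] for r in rows]
    yield tuple(rows)        # original
    yield tuple(h)           # horizontal flip
    yield tuple(rows[::-1])  # vertical flip
    yield tuple(h[::-1])     # both flips

def canonical(tile):
    """Smallest of the four variants, so mirrored tiles share one key."""
    return min(flips(tile))

def count_flip_copies(tiles):
    """Count tiles that duplicate an earlier tile, allowing H/V flips."""
    seen = set()
    copies = 0
    for tile in tiles:
        key = canonical(tile)
        if key in seen:
            copies += 1
        else:
            seen.add(key)
    return copies

a = [[1, 0], [0, 0]]
b = [[0, 1], [0, 0]]   # H flip of a
c = [[0, 0], [0, 1]]   # H+V flip of a
d = [[1, 1], [0, 0]]   # unrelated tile
print(count_flip_copies([a, b, c, d]))

saving_pct = 100 * 10_000 / 950_000   # copies found / total tiles, from the post
print(round(saving_pct, 2))
```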
I tried to use H/V flips, but out of a total of 950000 tiles I found only about 10000 copies (and that's using both the H and V flip capabilities)... So now I'm just ignoring copies, as we gain so little from them.
I assume you'd get a larger gain if the format was lossy.
Of course :) But first I want to try lossless compression; then if I can't reduce it enough I'll switch to lossy compression...
sega16
10-19-2011, 04:40 PM
I tried to use H/V flips, but out of a total of 950000 tiles I found only about 10000 copies (and that's using both the H and V flip capabilities)... So now I'm just ignoring copies, as we gain so little from them.
That is still a lot in my opinion.
A saving of ~1.05% isn't a lot in my opinion.
It seems like a lot if we count 10000 x 32 bytes (unpacked), but as Sik said it represents only 1% of the total size, really not enough to care about.
I modified my RLE code to handle only 8 colors and changed the repetition range to 3-18; the tile data is now a bit less than 10 MB.
I forgot to mention, but I also have the tilemap data, which takes 1.4 MB using RLE compression :p
Stupid question, what video are you trying to encode?
The one we are talking about in this topic :D
I used an HQ source like this one : www.youtube.com/watch?v=G3C-VevI36s
sega16
10-19-2011, 09:43 PM
The one we are talking about in this topic :D
I used an HQ source like this one : www.youtube.com/watch?v=G3C-VevI36s
I don't get it, why Bad Apple? How about something useful (http://www.youtube.com/watch?v=uVM-74K50RM), and yes, I consider the Sega intro useful. Or how about something we all love to listen to (http://www.youtube.com/watch?v=dQw4w9WgXcQ&ob=av2e). Also, did you think about reusing a tile if it is the same in the next frame? All you would do is load the tile that stays constant for x frames last in VRAM so it doesn't get overwritten the next frame, and just not store the tile for the next frame. You would still end up using a tilemap so the VDP knows where the reused tile is.
I don't get it, why Bad Apple?
Blame AnalYoGirl, and in turn blame Touhou fans =P
kool kitty89
10-20-2011, 12:34 AM
A saving of ~1.05% isn't a lot in my opinion.
It seems like a lot if we count 10000 x 32 bytes (unpacked), but as Sik said it represents only 1% of the total size, really not enough to care about.
I modified my RLE code to handle only 8 colors and changed the repetition range to 3-18; the tile data is now a bit less than 10 MB.
I forgot to mention, but I also have the tilemap data, which takes 1.4 MB using RLE compression :p
There's also the information needed to specify/map those repeated tiles. (so it could potentially take more space, depending how you packed things)
Blame AnalYoGirl, and in turn blame Touhou fans =P
Plus, the heavily stylized animation allows for some novel compression techniques less practical for other types of video. (which would be better with something more like Sega's Cinepak . . . except you wouldn't be able to manage the super low bitrates these demos are using, and a cart-specific codec could optimize around random access to any point in ROM vs linear streaming off CD -so highly variable bitrates, large codebooks, etc become practical)
OTOH, it would be interesting to see what sort of compression you could get for full-color FMV using a format heavily oriented around the MD VDP tilemap scheme. (lossy compression using repeated tiles and interframing on a tile basis -possibly with further compression of the individual tiles, perhaps somewhat like cinepak or just RLE)
It would probably look worse for most things than something more like Sega's Cinepak (at least for a case not further compressing tiles -ie compressing only through using repeating tiles with uncompressed tile data), but it would still be interesting to see. (there's probably more worthwhile routes to take for improving upon Sega's Cinepak format -or something similar)
You could also do something really simple like this, with a fixed set of character patterns and colors (which I assume is how Analyogirl's format works, though without color obviously)
http://www.youtube.com/watch?v=BrwGxwLuo5I
Except, unlike CGA video you wouldn't be stuck with a fixed 16 color palette, could use more than 2 colors per tile, and could use custom character patterns -and more than 256 characters. (so even with a fixed character set and fixed palettes for the entire video, you'd be a good bit better off than that)
Plus, the heavily stylized animation allows for some novel compression techniques less practical for other types of video.
Honestly I think the only reason Bad Apple was used is because AnalYoGirl's codec was originally monochrome and that video lends itself well to being downgraded to just two colors (since most of the video is indeed monochrome).
(which I assume is how Analyogirl's format works, though without color obviously)
0.7 is completely different, especially since it added support for four colors (also resolution is odd - either you get proper updates at 15 FPS or interlaced updates at 30 FPS).
kool kitty89
10-20-2011, 04:30 AM
0.7 is completely different, especially since it added support for four colors (also resolution is odd - either you get proper updates at 15 FPS or interlaced updates at 30 FPS).
Yes, I was thinking of the earlier builds.
Hmm, on that note though, how is it doing the interlaced updates like that? (can the VDP's DMA be set-up to copy every other pixel like that?) The 15 FPS updates (for 256x216 with 44 lines of DMA time per frame; 44x166x4=29216 bytes of bandwidth per 15 Hz screen) is pretty straightforward, but the interlacing route seems interesting.
Oh, and Stef, what frame rate is your compression format using?
30 FPS. But I still have to test that I can actually maintain it on real hardware :p
From some tests, I need to upload up to 400 tiles per "movie frame" (and this over several consecutive frames, so I have to sustain that rate).
That makes 200 tiles per NTSC frame, ~6400 bytes, plus the tilemap update.
I hope to reduce this value by updating only tile deltas in VRAM when possible, but I would still need to use DMA heavily :-/
That needs optimization anyway; 400 tiles per movie frame is a lot, and it would be nice to reduce that number to 250 or so.
Hmm, on that note though, how is it doing the interlaced updates like that? (can the VDP's DMA be set-up to copy every other pixel like that?) The 15 FPS updates (for 256x216 with 44 lines of DMA time per frame; 44x166x4=29216 bytes of bandwidth per 15 Hz screen) is pretty straightforward, but the interlacing route seems interesting.
Nope, interlacing isn't done by DMA - it's just skipping every other pixel horizontally, but with DMA the best you can get is a granularity of 4 consecutive pixels.
It looks like it's just CPU bound. I guess it's using both tilemaps and then just processing the pixels as if they were bytes (leaving one of the nibbles as always zero so the other tilemap can show through). Doesn't seem to be anything interesting from that viewpoint, although admittedly the higher framerate looks better despite the interlacing getting in the way.
Chilly Willy
10-20-2011, 03:01 PM
Nope, interlacing isn't done by DMA - it's just skipping every other pixel horizontally, but with DMA the best you can get is a granularity of 4 consecutive pixels.
One interlace mode leaves the cells as 8x8 and just does the same display for each field. The other interlace mode expands the cells to 8x16 and does every other line in the cells on every other field.
One interlace mode leaves the cells as 8x8 and just does the same display for each field. The other interlace mode expands the cells to 8x16 and does every other line in the cells on every other field.
He was referring to the interlacing in the video... (press B in the 0.7 build to see what I mean), not the VDP's interlaced resolutions...
EDIT: also the interlaced rendering has another advantage over non-interlaced rendering for some reason. Compare the amount of shades:
http://www.mdscene.net/user/~sik/.junk/apple_0.07003.png
http://www.mdscene.net/user/~sik/.junk/apple_0.07001.png
I would have shrunk the screenshots down but it turns out they're running at 256×224 :/ Remind me to set Fusion to output raw screenshots next time.
kool kitty89
10-20-2011, 04:06 PM
Nope, interlacing isn't done by DMA - it's just skipping every other pixel horizontally, but with DMA the best you can get is a granularity of 4 consecutive pixels.
It looks like it's just CPU bound. I guess it's using both tilemaps and then just processing the pixels as if they were bytes (leaving one of the nibbles as always zero so the other tilemap can show through). Doesn't seem to be anything interesting from that viewpoint, although admittedly the higher framerate looks better despite the interlacing getting in the way.
OK, so CPU based updates? (though that would seem like a lot of data for the 68k to copy within vblank -assuming it's a full bitmap and not specifically limiting/sorting tile updates)
If updates had been interleaved on a scanline basis (ie every other line rather than every other column), you wouldn't have to bother working with nybble boundaries like that. (and DMA should be practical)
I assume the use of columns was to facilitate horizontal blending via composite video (or emulator filters) . . . though it could also exploit the linescroll h-blur technique to blend the interleaved columns.
One interlace mode leaves the cells as 8x8 and just does the same display for each field. The other interlace mode expands the cells to 8x16 and does every other line in the cells on every other field.
The context wasn't for using actual 480i interlaced modes, but interlaced (interleaved) updates to the video, specifically updating every other pixel column each frame (so effectively interlacing 2 128x216 frames at 30 Hz to 256x216).
He was referring to the interlacing in the video... (press B in the 0.7 build to see what I mean), not the VDP's interlaced resolutions...
EDIT: also the interlaced rendering has another advantage over non-interlaced rendering for some reason. Compare the amount of shades:
It's not just the shades either, but the actual granularity of the pixels . . . or, rather the 15 Hz mode doesn't improve the spatial resolution. (granted, that would really need a separate demo that was optimized for 15 Hz video with a higher resolution -rather than just using 128x216 with 4 shades . . . though I'm not sure why the non-interlaced mode couldn't have interpreted that to 128x216 with 8 shades at least)
The 30 Hz mode has better shading and better preservation of the motion of the original animation.
OK, so CPU based updates? (though that would seem like a lot of data for the 68k to copy within vblank -assuming it's a full bitmap and not specifically limiting/sorting tile updates)
No, I meant processing bound (decompression, rendering, etc., not the transfer to VRAM).
If updates had been interleaved on a scanline basis (ie every other line rather than every other column), you wouldn't have to bother working with nybble boundaries like that. (and DMA should be practical)
I think the issue here isn't updating (since ultimately it's still transferring the same amount of data to VRAM, for the record), but processing the video stream. By doing it this way, each pixel is effectively one byte, making it much easier to process. This is also true of the non-interlaced mode actually, but since you need to fill the second nibble and that requires bit shifting, OR operations and a spare register, that imposes a performance penalty (while with interlacing you get it "for free" without doing anything extra).
It's not just the shades either, but the actual granularity of the pixels . . . or, rather the 15 Hz mode doesn't improve the spatial resolution.
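To make the "one pixel per byte" idea concrete, here is a small sketch of what is being guessed at. It is speculation layered on speculation: nobody in the thread has seen the demo's source, the function names are invented, and the sketch assumes the stream already holds each value in the nibble it will be shown in. VDP tile data is 4bpp, so one byte holds two pixels; leaving one nibble at 0 (the transparent index) lets the other plane show through in that column.

```python
def interlaced_bytes(shades, odd_field):
    """One source pixel per output byte; the unused nibble stays 0 (transparent)."""
    return bytes(s if odd_field else s << 4 for s in shades)

def doubled_bytes(shades):
    """Non-interlaced variant: the same pixel fills both nibbles (shift + OR)."""
    return bytes((s << 4) | s for s in shades)

even = interlaced_bytes([1, 2, 3], odd_field=False)  # goes to one plane
odd = interlaced_bytes([3, 2, 1], odd_field=True)    # goes to the other plane

# What the viewer sees once both planes are overlaid:
combined = bytes(a | b for a, b in zip(even, odd))
print(even.hex(), odd.hex(), combined.hex(), doubled_bytes([1, 2, 3]).hex())
```

The point of the comparison is that `doubled_bytes` needs the extra shift and OR per pixel, which is the performance penalty mentioned for the non-interlaced mode.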
I think both modes have the same resolution, but the interlacing makes the 30 FPS mode look like it has a higher resolution because of the way pixels get updated. Not the first time AnalYoGirl pulls off something like this =P
EDIT: also pretty sure the resolution is halved vertically as well, so if anything, it isn't 128×216, but 128×108 (didn't check the actual resolution though, I'm going by what kool kitty said).
sega16
10-20-2011, 06:12 PM
Blame AnalYoGirl, and in turn blame Touhou fans =P
I googled touhou and got:
http://en.wikipedia.org/wiki/Touhou_Project
I read the article and it did not mention Bad Apple, so what is this Touhou thing you speak of?
Also:
http://img832.imageshack.us/img832/4335/goodapple.png
and just like zun I develop my images with Photoshop
ZUN develops his games with Visual Studio, Adobe Photoshop, and Cubase SX, according to his interview in Bohemian Archive in Japanese Red.[15]
I read the article and it did not mention Bad Apple, so what is this Touhou thing you speak of?
Bad Apple is a fan video =/
Lyrics (http://touhou.wikia.com/wiki/Lyrics:_Bad_Apple!!) :v
sega16
10-20-2011, 06:31 PM
That makes more sense. Is the monochrome footage the video itself, or the alpha channel of the original video, so the foreground can be placed on a still image background, similar to what some Philips CD-i games such as Hotel Mario did? The reason for that was so it would compress better by not trying to store the background for 1000 frames or so.
That makes more sense. Is the monochrome footage the video itself, or the alpha channel of the original video?
The video is like that.
kool kitty89
10-20-2011, 08:25 PM
That makes more sense. Is the monochrome footage the video itself, or the alpha channel of the original video, so the foreground can be placed on a still image background, similar to what some Philips CD-i games such as Hotel Mario did? The reason for that was so it would compress better by not trying to store the background for 1000 frames or so.
Here's the original video:
http://www.youtube.com/watch?v=UkgK8eUdpAo
There have also been some fan video responses using full color rendered 3D models. (all of those were done after the fact and all look more amateurish from what I've seen -and obviously none was part of the original animation -ie not related to the original 3D models used for the animation)
I think the issue here isn't updating (since ultimately it's still transferring the same amount of data to VRAM, for the record), but processing the video stream. By doing it this way, each pixel is effectively one byte, making it much easier to process. This is also true of the non-interlaced mode actually, but since you need to fill the second nibble and that requires bit shifting, OR operations and a spare register, that imposes a performance penalty (while with interlacing you get it "for free" without doing anything extra).
OK, then my original question still stands: how could it be transferring 30 FPS in H32, even assuming the screen is clipped to 216 pixels? You'd only get 44 lines of DMA tops per 60 Hz frame, so 7304 bytes per frame and up to 228 tiles per 60 Hz frame or 456 tiles at 30 FPS. (with no tilemap updates) So only slightly more than 1/2 of what you'd need for a 256x216 screen.
That is, unless it's doing something like Stef's format and sorting out redundant tiles between frames and within frames, and limiting tile updates to available bandwidth constraints.
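The bandwidth argument above can be checked with a few lines of arithmetic. The 44 blanked lines and 166 bytes per line are simply the figures quoted in the thread, not independently measured values, and the constant names are made up for the example.

```python
DMA_BYTES_PER_LINE = 166   # H32 figure used in the thread
BLANK_LINES = 44           # lines available for DMA per 60 Hz frame, per the thread
TILE_BYTES = 32            # one 8x8 tile at 4 bits per pixel

bytes_per_frame = BLANK_LINES * DMA_BYTES_PER_LINE      # per 60 Hz frame
tiles_per_frame = bytes_per_frame // TILE_BYTES
tiles_at_30fps = (2 * bytes_per_frame) // TILE_BYTES    # two 60 Hz frames per update
bytes_at_15fps = 4 * bytes_per_frame                    # four 60 Hz frames per update

screen_tiles = (256 // 8) * (216 // 8)                  # tiles in a 256x216 window
screen_bytes = screen_tiles * TILE_BYTES

print(bytes_per_frame, tiles_per_frame, tiles_at_30fps)
print(f"30 FPS covers {tiles_at_30fps / screen_tiles:.0%} of the screen")
print("15 FPS full redraw fits:", bytes_at_15fps >= screen_bytes)
```

So a full redraw fits the budget at 15 FPS but only about half the tiles can change per update at 30 FPS, which is why partial tile updates come up next.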
I think both modes have the same resolution, but the interlacing makes the 30 FPS mode look like it has a higher resolution because of the way pixels get updated. Not the first time AnalYoGirl pulls off something like this =P
It seems like double the horizontal resolution too, at least as much as interlaced video on a TV is double the vertical resolution over 240p (ie effective resolution is limited by combing artifacts, thus degrading areas of high motion, but improving resolution for areas of lesser motion) except in the Bad Apple demo, it's interlaced at 30 Hz rather than 60 (or 50 for PAL), so it's even more motion-sensitive. (the added "shades" of the interlacing are sometimes intentional shading -obvious for nearly static frames where it's used as dithering, but mostly just combing artifacts -which look like motion blur when filtered/blended)
Another issue is that, unlike 240p 60 vs 480i 60, you don't have a higher effective framerate in the non-interlaced mode (for 480i vs 240p, it's effectively 30 vs 60 Hz), but 15 FPS vs 30 FPS. If the non-interlaced mode had simply used 30 Hz with each "field" displayed independently (like showing 480i video as 240p/60), it would be another story.
EDIT: also pretty sure the resolution is halved vertically as well, so if anything, it isn't 128×216, but 128×108 (didn't check the actual resolution though, I'm going by what kool kitty said).
It looks like much of the video has 2 pixel high steps, but a fair amount also has single pixel high steps, so it may be compression artifacting making it look lower res rather than actually being encoded/rendered at that lower res.
Well, for the interlaced mode at least . . . non-interlaced seems to be totally 128x108.
As for it being 216 lines, the actual displayed animation window is definitely within 216 lines, but I'm not positive that there's 44 lines available for DMA. (it's hard to tell where vblank actually ends just by looking, but it seems like active display does extend at least slightly beyond 216 lines -given where the flickering garbage appears, it may be 217 lines active -see attachment)
sega16
10-20-2011, 08:39 PM
OK, thanks for clearing that up. It must have been an artistic effect, and I gotta admit black and white can sure look cool:
http://img832.imageshack.us/img832/4335/goodapple.png
I love my picture for some reason.
OK, then my original question still stands: how could it be transferring 30 FPS in H32, even assuming the screen is clipped to 216 pixels?
Pretty sure it isn't updating all the tiles.
tomaitheous
10-22-2011, 01:15 PM
Honestly I think the only reason Bad Apple was used is because AnalYoGirl's codec was originally monochrome and that video lends itself well to being downgraded to just two colors (since most of the video is indeed monochrome).
I'd say the NES/Famicom demo a year before doing the Bad Apple demo video might have been a big influence for AnalYoGirl's demo.
I'd say the NES/Famicom demo a year before doing the Bad Apple demo video might have been a big influence for AnalYoGirl's demo.
The point is that Bad Apple is mostly monochrome, with a few parts in gray shades that aren't important. Yet it's still pretty varied overall, and quite a long video. From a technical standpoint, if you're making a monochrome FMV format, Bad Apple looks like a good choice.
Here's where I am on compression for 320x224 resolution, 8 gray levels, with a 30 FPS playback rate :
- 7.8 MB of tile data
- 1.4 MB of tilemap data.
So still about 9.4 MB of data... way too much.
I reduced the size by using differences against tiles already placed in VRAM only, so I can improve the tile data ratio by applying the same algorithm to all tiles (it will take ages for the algorithm to do that though).
Besides that, I can gain a bit on the tilemap data too (I used a bad RLE compression on that one).
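The "difference against tiles already in VRAM" idea might be sketched as follows. This is not the actual encoder: tiles are flat lists of 64 shade values, `encode_tile` / `apply_delta` are invented names, and the `max_diffs` cut-off is an arbitrary assumption about when a delta stops being cheaper than a raw tile.

```python
def pixel_diffs(base, tile):
    """List (index, new_value) for every pixel where tile differs from base."""
    return [(i, t) for i, (b, t) in enumerate(zip(base, tile)) if b != t]

def encode_tile(tile, vram_tiles, max_diffs=8):
    """Pick the closest resident tile; fall back to raw data if none is close enough."""
    best = None
    for slot, base in enumerate(vram_tiles):
        diffs = pixel_diffs(base, tile)
        if best is None or len(diffs) < len(best[1]):
            best = (slot, diffs)
    if best is not None and len(best[1]) <= max_diffs:
        return ("delta", best[0], best[1])
    return ("raw", list(tile))

def apply_delta(base, diffs):
    """Decoder side: patch a copy of the base tile with the stored pixel changes."""
    out = list(base)
    for i, v in diffs:
        out[i] = v
    return out

vram = [[0] * 64, [7] * 64]          # two tiles assumed to be resident in VRAM
almost_white = [7] * 64
almost_white[5] = 3                  # differs from slot 1 by a single pixel
half_and_half = [0] * 32 + [7] * 32  # far from both resident tiles

print(encode_tile(almost_white, vram))
print(encode_tile(half_and_half, vram)[0])
```

Searching every previously seen tile instead of only the resident ones is the same loop over a much longer list, which matches the remark that it would take ages.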
sega16
10-24-2011, 07:38 PM
Here's where I am on compression for 320x224 resolution, 8 gray levels, with a 30 FPS playback rate :
- 7.8 MB of tile data
- 1.4 MB of tilemap data.
So still about 9.4 MB of data... way too much.
I reduced the size by using differences against tiles already placed in VRAM only, so I can improve the tile data ratio by applying the same algorithm to all tiles (it will take ages for the algorithm to do that though).
Besides that, I can gain a bit on the tilemap data too (I used a bad RLE compression on that one).
How about 320x112, then line double it? Also, I personally would use 61 colors and a tilemap. Then what you would want to do is find all tiles that can be flipped or are the same. Then you need to look for tiles that are the same in any frame, load them in a place where they won't get overwritten, and change the tilemap data to match where the tile is now placed in VRAM. And of course you could RLE compress the tilemaps to save memory.
How about 320x112, then line double it? Also, I personally would use 61 colors and a tilemap. Then what you would want to do is find all tiles that can be flipped or are the same. Then you need to look for tiles that are the same in any frame, load them in a place where they won't get overwritten, and change the tilemap data to match where the tile is now placed in VRAM. And of course you could RLE compress the tilemaps to save memory.
An important point is that I really don't want to lose resolution. I can try reducing to 4 colors (instead of 8) to improve compression, but I won't sacrifice the resolution. Also, I tried to find tile copies (flipped or not), but that only gave me about 1% copies over the whole tile data, so I trashed the idea... maybe a bit too early, as I could use it to encode some tiles from other ones with small deltas (only a few different pixels).
What do you mean by using a 61-color tilemap?
kool kitty89
10-25-2011, 05:08 PM
An important point is that I really don't want to lose resolution. I can try reducing to 4 colors (instead of 8) to improve compression, but I won't sacrifice the resolution. Also, I tried to find tile copies (flipped or not), but that only gave me about 1% copies over the whole tile data, so I trashed the idea... maybe a bit too early, as I could use it to encode some tiles from other ones with small deltas (only a few different pixels).
One thing about halving the vertical resolution would be halving DMA bandwidth needed (you could leave every other line blank and line double by using the 2nd BG plane with a 1 pixel vertical offset). Albeit, if bandwidth was the only issue, and you didn't want to clip the screen for more DMA time, you could opt to do interlaced updates at 60 Hz (ie updating 112 lines per 60 Hz frame, for an effective 30 Hz display -with some combing artifacts, but not hard mid-screen tearing)
What do you mean by using a 61-color tilemap?
I think he means using all 4 palettes with optimized palette selections per-frame (ie grays mapped to different values in different palettes) to facilitate tilemap based compression. (ie far more re-used tiles, but using different palettes and flips to better optimize for that)
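Where "61" comes from, and how remapped palettes could let one stored tile stand in for several: the Mega Drive has 4 palettes of 16 entries, index 0 of each is transparent, and one backdrop colour is shared. The palette contents and the `render` helper below are purely illustrative assumptions, not anything from the posts.

```python
PALETTES = 4
ENTRIES_PER_PALETTE = 16

# Index 0 is transparent in every palette; add one shared backdrop colour.
usable_colors = PALETTES * (ENTRIES_PER_PALETTE - 1) + 1
print(usable_colors)

# Hypothetical per-frame palettes: same indices, different gray levels.
palettes = [
    {1: 0, 2: 7},   # palette 0: index 1 = black, index 2 = white
    {1: 7, 2: 0},   # palette 1: the two shades swapped
]

def render(tile, palette_id):
    """Resolve palette indices to gray levels for one tile (flat list of indices)."""
    return [palettes[palette_id][p] for p in tile]

stored_tile = [1, 1, 2, 2]           # a single pattern kept in VRAM
print(render(stored_tile, 0))        # dark-to-light edge
print(render(stored_tile, 1))        # inverted edge, no extra tile data needed
```

The tilemap entry selects the palette, so the inverted version costs only a different tilemap word rather than another 32 bytes of tile data.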
Granted, that would be even more useful for a lossy format.
bgvanbur
10-30-2011, 01:01 PM
It would probably look worse for most things than something more like Sega's Cinepak (at least for a case not further compressing tiles -ie compressing only through using repeating tiles with uncompressed tile data), but it would still be interesting to see. (there's probably more worthwhile routes to take for improving upon Sega's Cinepak format -or something similar)
I guess it's my job to make the Cinepak for Sega version. I took the YouTube video and converted it to a 256x192, 15 fps, 8-color grayscale video with 16276 Hz mono audio for Cinepak for Sega. It resulted in 9950039 video bytes and 3511593 audio bytes. This means my Cinepak for Sega encoder reduced the video data to 12% of the uncompressed size (lots of tile reuse, and codebooks were useful). This is what I expected. Also note that Cinepak for Sega cannot handle such a large video size at 30 fps due to VDP bandwidth (it sends all the tile data for each frame).
Link for ISO: http://dl.dropbox.com/u/26821164/BADAPPLE.ISO
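As a sanity check on the 12% figure, the numbers above can be replayed. Two assumptions go beyond what the post states: that the audio is 8-bit PCM (one byte per sample), which is used here to estimate the running time, and that "uncompressed" means plain 4bpp VDP tile data.

```python
VIDEO_BYTES = 9_950_039
AUDIO_BYTES = 3_511_593
AUDIO_RATE_HZ = 16_276
FPS = 15
WIDTH, HEIGHT = 256, 192

# Assumption: 8-bit mono PCM, so one byte per sample.
duration_s = AUDIO_BYTES / AUDIO_RATE_HZ

# Assumption: uncompressed frames are 4 bits per pixel (VDP tile format).
frame_bytes = WIDTH * HEIGHT // 2
uncompressed_bytes = duration_s * FPS * frame_bytes

ratio = VIDEO_BYTES / uncompressed_bytes
print(f"{duration_s:.1f} s, {frame_bytes} bytes/frame, ratio {ratio:.1%}")
```

Under those assumptions the ratio lands at roughly 12.5%, consistent with the quoted 12%.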
I guess it's my job to make the Cinepak for Sega version. I took the YouTube video and converted it to a 256x192, 15 fps, 8-color grayscale video with 16276 Hz mono audio for Cinepak for Sega. It resulted in 9950039 video bytes and 3511593 audio bytes. This means my Cinepak for Sega encoder reduced the video data to 12% of the uncompressed size (lots of tile reuse, and codebooks were useful). This is what I expected. Also note that Cinepak for Sega cannot handle such a large video size at 30 fps due to VDP bandwidth (it sends all the tile data for each frame).
Link for ISO: http://dl.dropbox.com/u/26821164/BADAPPLE.ISO
Well done! It does look really good, far better than I expected!
The compression seems almost lossless, and having 8-color grayscale really improves the gradients and overall quality :)
The 256x192 resolution looks nice too (as it preserves the aspect ratio). The only drawback is the 15 FPS frame rate, but that's not too bad for that resolution (far better than many Sega CD FMVs).
TrekkiesUnite118
10-30-2011, 08:44 PM
Now we just need a Saturn version and we're all set!
Confirmed, Bad Apple is the most whored out video in the history of homebrew =P
bgvanbur
10-30-2011, 11:31 PM
The compression seems almost lossless
Well, it's lossless in terms of the Sega VDP. I converted from FLV to PNGs (using ffmpeg), so quality is lost in scaling to 256x192 and changing the frame rate to 15 fps. I then converted these PNGs to the Sega VDP format (each pixel converted to the nearest Sega VDP grey color, so there's color quality loss). But the step that converts this optimal uncompressed Sega VDP data to the Cinepak for Sega format is lossless.
Confirmed, Bad Apple is the most whored out video in the history of homebrew =P
Holy shit, I was right (http://www.youtube.com/watch?v=uJaAYD0YT44) o_O; [/off-topic]
sega16
10-31-2011, 05:55 PM
I cannot help but feel that people are wasting their talents on the same video over and over. Lots of people who post here are very smart and fully capable of doing some awesome stuff, but in my opinion recreating other people's awesome stuff is not always as cool. What made me think of this is when I ran bgvanbur's Sega CD demo. IT WAS AMAZING, that is what I thought: good quality video and audio. I have spent a long time trying to make a good Sega Genesis FMV player and yet it is still not really that amazing (see link below for my C-based FMV player) (I am not at Sega CD level yet, as BEX doesn't count). I have seen official Sega CD games with worse quality, and this demo was perfect except for one thing: IT IS THE SAME VIDEO!
Download my FMV player; it may not be as cool, but at least it is not Bad Apple (http://www.mediafire.com/?w683ewo2pda8fm8)
Hint: Sega already did a shitton of FMVs back in the MCD days, and FMV is also one of the most common things done in homebrew demos, so technically the mere fact of doing FMVs in itself is just lame with your criteria. Ever thought about that? If you want to be interesting then go make something that isn't FMV for starters =P
Also somebody elsewhere mentioned that the main reason so many people pick Bad Apple is because from a technical standpoint it's actually pretty easy to work with while still having interesting material (there's a lot of stuff going on, and it's relatively long, being over three minutes), so it has sort of become the B&W FMV equivalent of the Utah teapot (http://en.wikipedia.org/wiki/Utah_teapot) when it comes to homebrew stuff.
sega16
10-31-2011, 10:03 PM
Hint: Sega already did a shitton of FMVs back in the MCD days, and FMV is also one of the most common things done in homebrew demos, so technically the mere fact of doing FMVs in itself is just lame with your criteria. Ever thought about that? If you want to be interesting then go make something that isn't FMV for starters =P
Also somebody elsewhere mentioned that the main reason so many people pick Bad Apple is because from a technical standpoint it's actually pretty easy to work with while still having interesting material (there's a lot of stuff going on, and it's relatively long, being over three minutes), so it has sort of become the B&W FMV equivalent of the Utah teapot (http://en.wikipedia.org/wiki/Utah_teapot) when it comes to homebrew stuff.
You do have a really good point. I did not want a really long FMV; my goal was just to play a few clips of 5 or so seconds, like maybe glass shattering or a spaceship blowing up, just little stuff to enhance the game. Also, I see your point about the Utah teapot: it is just something standard, and B&W is easier to work with on older hardware.
Yeah seems that video was used a lot in laser art :
http://www.youtube.com/watch?v=iL5wv06WSb0&feature=related
http://www.youtube.com/watch?v=HKwv0iGaeOc&feature=related
http://www.youtube.com/watch?v=uHeb67Njz7U&feature=related
As explained above, this video is very interesting because technically it's quite easy to reproduce, even on a weak system, as it's almost monochrome.
Also, the video has very smooth, high-quality animation with good musical dynamics; I think that's why so many people use it :)
bgvanbur
11-01-2011, 09:13 AM
Well I only used this video because of this thread.
And regarding sega16's comment: I don't feel like I wasted my time to make a Bad Apple demo. It only took about 2 hours to make since I already had all the tools to make simple Cinepak for Sega demos since I already made a few. And it helped me refine my scripts a bit since I made a few enhancements to my scripts for this demo.
sega16
11-01-2011, 04:49 PM
Well I only used this video because of this thread.
And regarding sega16's comment: I don't feel like I wasted my time to make a Bad Apple demo. It only took about 2 hours to make since I already had all the tools to make simple Cinepak for Sega demos since I already made a few. And it helped me refine my scripts a bit since I made a few enhancements to my scripts for this demo.
I wish it would take me only 2 hours to make a Sega CD FMV player. Wait, do you mean you already had the FMV player coded and you just modified the code a bit?
tomaitheous
11-01-2011, 07:49 PM
I can not help but feel that people are wasting using there talents on the same video over and over.Lots of people who post here are very smart and fully capable of doing some awesome stuff but in my opinion recreating other people's awesome stuff is not always as cool.What made me think of this is when I ran bgvanbur's sega cd demo.IT WAS AMAZING that is what I thought good quality video and audio.I have spent along time trying to make a good sega genesis and yet it is still not relay that amazing (see link below for my c based fmv player) (I am not at sega cd level yet as bex doesn't count)I have seen official sega cd games with worse quality and this demo was perfect except one thing IT IS THE SAME VIDEO!
Download my fmv player it may not be as cool but at-least it is not bad apple (http://www.mediafire.com/?w683ewo2pda8fm8)
Why is it your concern what other people do in their free time? There are others out there who do things that I feel are a complete waste of time, but I don't go around subjecting them to my opinion. I mean, to the point that you're almost getting bent out of shape in your reply. People should be able to do what they consider fun in their free time/hobby life/etc.
I like the Bad Apple demo thing if only for the fact that it presents an interesting challenge: write the best/most efficient compressor you can. Anything additional to help along, like display filters and such, is included in that, in regards to retaining as much resolution (and possible greyscale range) as possible. It's not just a simple matter of updating uncompressed frames every so many intervals.
So, since you posted a FMV demo - give the details on it. Is it more than a simple small 16 color uncompressed batch of images? Do you use multiple subpalettes (and/or layers) to break the 16 color per tile limit? If so, what algo did you use for reduction and isolation? Are the tiles themselves compressed? Etc.
sega16
11-01-2011, 10:06 PM
So, since you posted a FMV demo - give the details on it. Is it more than a simple small 16 color uncompressed batch of images? Do you use multiple subpalettes (and/or layers) to break the 16 color per tile limit? If so, what algo did you use for reduction and isolation? Are the tiles themselves compressed? Etc.
My FMV player is a series of 16-color images with tile maps. I used ffmpeg to resize any video file to 160x56 and save each frame as a .png, and then I run a program that generates a .bat file which runs ImageMagick. The conversion code looks like this:
convert img1.png -dither FloydSteinberg -remap MDColor_fixed.png out1_2.png
convert out1_2.png -dither FloydSteinberg -colors 16 out1_3.png
convert out1_3.png -dither None -remap MDColor_fixed.png PNG8:out1_genesis.png
convert img2.png -dither FloydSteinberg -remap MDColor_fixed.png out2_2.png
convert out2_2.png -dither FloydSteinberg -colors 16 out2_3.png
convert out2_3.png -dither None -remap MDColor_fixed.png PNG8:out2_genesis.png
convert img3.png -dither FloydSteinberg -remap MDColor_fixed.png out3_2.png
convert out3_2.png -dither FloydSteinberg -colors 16 out3_3.png
convert out3_3.png -dither None -remap MDColor_fixed.png PNG8:out3_genesis.png
convert img4.png -dither FloydSteinberg -remap MDColor_fixed.png out4_2.png
convert out4_2.png -dither FloydSteinberg -colors 16 out4_3.png
convert out4_3.png -dither None -remap MDColor_fixed.png PNG8:out4_genesis.png
and so on.
Once that is done, it runs sixpack to convert all the images to Genesis tile maps.
Then another program is run which puts all the output files into .sfmv format, also called sega16 full motion video format.
The FMV player uses a raster effect that doubles each line to form a 160x112 picture.
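The batch-file generator described above could look something like the following Python sketch. This is not sega16's actual program (that isn't shown in the thread); it only reproduces the three `convert` lines per frame listed in the post, and the function names are invented for illustration.

```python
def frame_commands(n, palette="MDColor_fixed.png"):
    """Return the three ImageMagick commands listed in the post for frame n."""
    return [
        # 1. Dither the source frame onto the Mega Drive color set.
        f"convert img{n}.png -dither FloydSteinberg -remap {palette} out{n}_2.png",
        # 2. Reduce to 16 colors for a single palette line.
        f"convert out{n}_2.png -dither FloydSteinberg -colors 16 out{n}_3.png",
        # 3. Snap the 16 chosen colors back onto the Mega Drive color set, no dithering.
        f"convert out{n}_3.png -dither None -remap {palette} PNG8:out{n}_genesis.png",
    ]

def batch_text(frame_count):
    """Build the full contents of the .bat file for frames 1..frame_count."""
    lines = []
    for n in range(1, frame_count + 1):
        lines.extend(frame_commands(n))
    return "\r\n".join(lines) + "\r\n"

print(batch_text(2))
```

Writing `batch_text(count)` to a file with a `.bat` extension would give the same kind of script as the listing above.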
I wish it would take me only 2 hours to make a sega cd fmv player.Wait do you mean you already had the fmv player coded and you just modified the code a bit?
Cinepak is one of the most common codecs used in Sega CD games and he had already messed with that before.
bgvanbur
11-02-2011, 09:14 AM
I wish it would take me only 2 hours to make a sega cd fmv player.Wait do you mean you already had the fmv player coded and you just modified the code a bit?
Like Sik said, I use the existing Cinepak for Sega code. I just set up the arguments to call it, and run it (and obviously make a Cinepak for Sega movie file). I plan to release how to do this in the future, but I am still polishing it for public release.
kool kitty89
11-02-2011, 03:08 PM
I guess its my job to make the Cinepak for Sega version. I took the youtube video and converted it to a 256x192 size 15 fps 8 color grayscale video with 16276 Hz mono audio for Cinepak for Sega. It resulted in 9950039 video bytes and 3511593 audio bytes. This means my Cinepak for Sega encoder reduced the video data to 12% of the uncompressed size (lots of tile reuse, and codebooks were useful). This is what I expected. Also note that Cinepak for Sega cannot handle such a large video size with 30 fps due to VDP bandwidth (it sends all the tile data for each frame).
Link for ISO: http://dl.dropbox.com/u/26821164/BADAPPLE.ISO
This is using the conventional Sega Cinepak format (as far as decoding is concerned)?
By tile re-use, do you mean having tiles unchanged from 2 frames prior? (since there isn't any other way to "re-use" tiles with cinepak, right?)
And why wouldn't 30 FPS be possible? You're running in H40 with 192 lines, so you should have 68 lines available for DMA per 60 Hz frame/field. (which should be enough to allow 256x192+tilemap updates at 30 Hz)
sega16
11-02-2011, 04:17 PM
This is using the conventional Sega Cinepak format (as far as decoding is concerned)?
By tile re-use, do you mean having tiles unchanged from 2 frames prior? (since there isn't any other way to "re-use" tiles with cinepak, right?)
And why wouldn't 30 FPS be possible? You're running in H40 with 192 lines, so you should have 68 lines available for DMA per 60 Hz frame/field. (which should be enough to allow 256x192+tilemap updates at 30 Hz)
Even if 30 fps is possible, it is still good to use 15 fps because it cuts the file size in half and still looks OK when frame blending is used to downsample from 30 fps to 15.
bgvanbur
11-02-2011, 04:46 PM
This is using the conventional Sega Cinepak format (as far as decoding is concerned)?
By tile re-use, do you mean having tiles unchanged from 2 frames prior? (since there isn't any other way to "re-use" tiles with cinepak, right?)
And why wouldn't 30 FPS be possible? You're running in H40 with 192 lines, so you should have 68 lines available for DMA per 60 Hz frame/field. (which should be enough to allow 256x192+tilemap updates at 30 Hz)
My Cinepak movie encoder reuses tiles from two frames ago if possible. This is supported by the original Cinepak for Sega decoder, but was never used during the 90s by any Cinepak movies.
So every frame I need to copy 256/8 * 192/8 * (32 + 2) = 26112 bytes. 256 is the movie width, 192 is the movie height, 8 is the tile width and tile height, 32 is the number of data bytes in a single tile, and 2 is the number of tile map bytes for a single tile. Using the genvdp.txt document, an NTSC frame has 262 scanlines (38 blanking and 224 active scanlines). So a 68k to VRAM DMA will have 167 * 38 + 224 * 16 = 9930 bytes transferred per NTSC frame. This means the DMA alone requires 2.63 NTSC frames for the copy. So just this aspect makes 30 fps impossible for this frame size with the Cinepak for Sega codec.
When I tried 30 fps originally, I got slow video playback (with major audio glitches as PCM samples got looped while waiting for the main CPU to consume frames). When I tried 20 fps originally, the video appeared to run at normal speed but it caused audio glitches every once in a while. This is due to the main CPU needing more than 3 NTSC frames to display one of my movie frames. I imagine this is because, besides the 2.63 NTSC frames' worth of VDP DMA, the main CPU also has other things to do (CRAM buffer, tile map generation and buffering, syncing to vblank for the CRAM copy and tile map copy, handshaking with the sub CPU).
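The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the numbers quoted in the post (167 bytes per blanking line, 16 bytes per active line, 38 + 224 lines); the function names are made up, and it ignores all non-DMA work, so the frame rate it gives is an upper bound from DMA alone.

```python
import math

def movie_frame_bytes(width, height):
    """Tile data (32 bytes) plus tile map entry (2 bytes) for every 8x8 tile."""
    return (width // 8) * (height // 8) * (32 + 2)

def dma_bytes_per_ntsc_frame(blank_lines=38, active_lines=224,
                             blank_rate=167, active_rate=16):
    """68k-to-VRAM DMA throughput per NTSC frame, using the quoted rates."""
    return blank_lines * blank_rate + active_lines * active_rate

needed = movie_frame_bytes(256, 192)
budget = dma_bytes_per_ntsc_frame()
ntsc_frames = needed / budget

print(needed, budget, round(ntsc_frames, 2))
# DMA-only ceiling: one movie frame per whole number of 60 Hz frames.
print("max fps from DMA alone:", 60 // math.ceil(ntsc_frames))
```

With the DMA needing just under 3 NTSC frames, 20 fps is the ceiling before any other main CPU work is counted, which fits the occasional audio glitches seen at 20 fps.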
even if 30fps is possible it is still good to use 15fps because it cuts the file size in half and still looks ok with frame blending use to down sample it to 15 fps from 30
At 30 fps, the total Cinepak movie file only came to 16M. This is about a 33% increase in image data (instead of the 100% increase you are suggesting), because at 30 fps a lot more tiles can be reused from previous frames.
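A quick check of the 33% figure in Python, as a sketch under two assumptions of mine: that "16M" means 16 * 1024 * 1024 bytes, and that the audio track is the same 3511593 bytes at both frame rates. Any container overhead is ignored.

```python
VIDEO_15FPS = 9950039           # video bytes at 15 fps (from the first post)
AUDIO = 3511593                 # audio bytes (assumed identical at 30 fps)
TOTAL_30FPS = 16 * 1024 * 1024  # ASSUMPTION: "16M" read as 16 MiB

video_30fps = TOTAL_30FPS - AUDIO
increase = video_30fps / VIDEO_15FPS - 1

print(f"video at 30 fps: {video_30fps} bytes, {increase * 100:.1f}% larger")
```

Under those assumptions the video data grows by roughly a third rather than doubling.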
kool kitty89
11-02-2011, 04:53 PM
Like Sik said, I use the exisiting Cinepak for Sega code. I just setup the arguments to call it, and run it (and obviously make a Cinepak for Sega movie file). I plan to release how to do this in the future, but I am still polishing it for public release.
You made your own encoding software though, right? (and quality/features of the encoder is a huge part of any video format, compressed or otherwise . . . a poorly encoded VCD might look worse than 12-bit Cinepak of the same bitrate with a good encoder . . . or 256 color cinepak for that matter -or even Sega CD cinepak in extreme cases, especially depending on what the source video is -Bad Apple would obviously be an extreme example of that ;))
Is your encoder based on any of Sega's utilities, or is it totally custom?
The (albeit primitive) tile-based interframing mechanism (unchanged from 2 frames ago) wasn't ever implemented in commercial Sega CD games, was it?
Edit: you answered that already; I missed that post.
As explained above this video is very interesting as technically it's quite easy to reproduce it, even on weak system as it's almost monochrome.
Also the video has very smooth and quality animations with a good music dynamic, i think that's why so many people use it :)
Now all we need is a demo from a platform from the 70s. ;)
Doing it on an A8 or TI99 wouldn't be much more impressive than the NES (and the fact that both have actual 1bpp char modes makes that even more efficient in some respects -plus A8 has framebuffer/bitmap support), but doing it on the Apple II, TRS-80, or PET would be more impressive (especially with the PET's limitations -weaker semigraphics than the TRS-80) . . . not to mention the VCS or Channel F. (except the Channel F would actually be better off than the TRS-80 or PET in several aspects)
Of course, there's the Spectrum and ZX81 to consider too, but both of those aren't in the "70s" category.
even if 30fps is possible it is still good to use 15fps because it cuts the file size in half and still looks ok with frame blending use to down sample it to 15 fps from 30
That's only if we're talking uncompressed video . . . higher framerates for compressed video (depending on the format and the source video being converted) can have far less impact from higher framerates. Or, if you're using lossy compression, you could avoid any increase in file size (or bitrate) at all, albeit with some trade-offs for artifacts. (the current video is totally lossless though -compared to uncompressed 256x192 8-shade grayscale images) The existing demos on the MD obviously made trade-offs in image quality to preserve motion. (and lossy compression to Sega CD cinepak -or something similar- might be practical to use to get down to the 4 MB file size used by the MD demos)
And DMA bandwidth was specifically mentioned as the limiting factor on the framerate for that video . . .
kool kitty89
11-02-2011, 05:23 PM
My Cinepak movie encoder reuses tiles from two frames ago if possible. This is supported by the original Cinepak for Sega decoder, but was never used during the 90s by any Cinepak movies.
So every frame I need to copy 256/8 * 192/8 * (32 + 2) = 26112 bytes. 256 is the movie width, 192 is the movie height, 8 is the tile width and tile height, 32 is the single tile data bytes, and 2 is the single tile map data bytes. Using the genvdp.txt document, an NTSC frame has 262 scanlines (38 blanking and 224 active scanlines). So a 68k to VRAM DMA will have 167 * 38 + 224 * 16 = 9930 bytes transfered per NTSC frame. This means just for this DMA this requires 2.63 NTSC frames for the copy. So just this aspect makes 30 fps impossible for this frame size with the Cinepak for Sega codec.
That's only if you leave the display active on all 224 lines . . . you can extend vblank beyond that, and many games do exactly that (for faster animation and/or software rendering, etc), and I assume Bad Apple 0.07 is doing that too. (given the graphical garbage in the extended border)
Also, 2 lines per frame are reserved for something else (sync or something, I forget the specifics -Tomaitheous mentioned this before), so the actual number of vblank lines available for DMA is 262 - (2+number of active lines), or 36 in the normal 224 line display.
And, actually, at 224 lines (with 36 for vblank DMA at 167 bytes/line), you'd only get 6012 bytes in vblank, plus 224*16=3584 in hblank, for 9596 bytes per NTSC frame. (albeit that would still be enough to allow 20 FPS for 256x192)
However, running in H40 increases the VDP DMA speed by 25% (equal to the clock speed increase) to approximately 208 bytes per line (vs 167 in H32 -though I recall that actually being 166 bytes for some reason).
So you'd actually have 36*208+16*224= 11072 bytes per frame. (but still not enough for 30 FPS)
But, if you clipped the active display to 192 lines, that would give you 68 lines for vblank DMA . . . so 208*68= 14144 bytes per NTSC frame without even using hblank. (or 28288 bytes in 2 NTSC frames, so 30 FPS should be possible)
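The three budgets worked out above (H32 at 224 lines, H40 at 224 lines, H40 clipped to 192 lines) can be put side by side with a short Python sketch. It uses only the rates quoted in this post (167 or 208 bytes per blanking line, 16 bytes per active line, 2 lines reserved); the function names are hypothetical and the figures are estimates from the thread, not measured values.

```python
def vblank_lines(active_lines, total_lines=262, reserved=2):
    """Blanking lines usable for DMA: 262 - (2 + active lines)."""
    return total_lines - (reserved + active_lines)

def dma_budget(active_lines, blank_rate, active_rate=16, use_active=True):
    """Bytes transferable per NTSC frame with the quoted per-line rates."""
    total = vblank_lines(active_lines) * blank_rate
    if use_active:
        total += active_lines * active_rate
    return total

FRAME_BYTES = 26112  # 256x192 tiles + tile map, as computed earlier in the thread

print("H32, 224 lines:", dma_budget(224, 167))
print("H40, 224 lines:", dma_budget(224, 208))
h40_192 = dma_budget(192, 208, use_active=False)
print("H40, 192 lines, blanking only:", h40_192)
print("fits in 2 NTSC frames:", 2 * h40_192 >= FRAME_BYTES)
```

This only covers raw DMA bandwidth; whether the main CPU can keep up with everything else is a separate question.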
When I tried 30 fps originally, I got slow video playback (with major audio glitches as PCM samples got looped while waiting for the main CPU to consume frames). When I tried 20 fps originally, the video appeared to run at normal speed but it caused audio glitches every once in a while. This is due to the main CPU needing more than 3 NTSC frames to display one of my movie frames. I imagine this is due besides the 2.63 NTSC frames worth of VDP DMA, the main CPU also has other things to do (CRAM buffer, tile map generation and buffering, syncing to vblank for CRAM copy and tile map copy, handshaking with sub CPU).
Avoiding use of hblank should greatly reduce contention for the main CPU. (and, again, at 192 lines in H40, there's more than enough bandwidth in vblank alone to update at 30 fps -and leave all of active display for main CPU work time)
At 30 fps, it only made the total Cinepak movie file at 16M. This is about a 33% increase in image data (instead of a 100% increase you are suggesting). This is because since at 30 fps a lot more tiles can be reused from previous frames.
And that's all without going lossy. :)
bgvanbur
11-02-2011, 05:29 PM
I made my own encoding software (I plan to release this aspect soon with examples). The only aspect I did not create is the main CPU code and sub CPU code (I just code up an argument list on the stack and call the official Cinepak for Sega code). I do not have access to the original Cinepak encoder, but by analyzing a majority of the Cinepak files I drew conclusions about the capabilities of the original Cinepak encoder.
I looked at the main CPU code, and it only uses DMA for the tile data (except the first long word which it copies with non-DMA after the DMA copy). It uses non-DMA writes to copy the tile map (probably since each row has a gap where tile data is not written). You assert that 30 fps is possible, but just because you can squeeze the DMA copy in (which is the most time consuming activity), there are other things the main CPU must do.
So really this is main CPU limited, which is primarily caused by the VDP copies (DMA and non-DMA, and copies to a buffer so they can be copied just in time during vblank).
Jorge Nuno
11-02-2011, 05:51 PM
Not sure if running at H40 helps anything, on H32 the amount of data needed is smaller (and slower to transfer), and switching video modes dynamically probably disrupts horizontal timing.
tomaitheous
11-02-2011, 08:24 PM
Not sure if running at H40 helps anything, on H32 the amount of data needed is smaller (and slower to transfer), and switching video modes dynamically probably disrupts horizontal timing.
I thought I remember Fonzy doing tests for this and couldn't get H32 display with a H40 vblank (or inactive screen) within a single frame, but I don't know the details of his tests.
And why wouldn't 30 FPS be possible? You're running in H40 with 192 lines, so you should have 68 lines available for DMA per 60 Hz frame/field. (which should be enough to allow 256x192+tilemap updates at 30 Hz)
Probably because the Cinepak decoder didn't support doing this at all.
doing it on the Apple II, TRS-80, or PET would be more impressive (especially with the PET's limitations -weaker semigraphics than the TRS-80)
I dare you to try this on a 4KB PET >=P
Not sure if running at H40 helps anything, on H32 the amount of data needed is smaller (and slower to transfer), and switching video modes dynamically probably disrupts horizontal timing.
Depends on when you switch the resolution. If you switch the resolution in the middle of a line, the video signal gets screwed up because a scanline will have different length from the rest. But there's one area where pixels always have H32-like sizes, so if you change the resolution in this area then you probably can avoid this issue. But this requires some really tight timing.
Jorge Nuno
11-03-2011, 01:12 PM
In H40, the region of constant H32 pixels is 8 pixels, then 1 H40 pixel, then 8 H32 pixels and so on.. during the sync pulse, so timing has to be REALLY tight :P
(assuming the change is performed immediately and not in a designated spot)
kool kitty89
11-04-2011, 04:00 AM
I made my encoding software (I plan to release this aspect soon with examples). The only aspect I did not create is the the main CPU code and sub CPU code (I just code up an argument list on the stack and call the official Cinepak for Sega code). I do not have access to the original Cinepak encoder, but by analyzing a majority of the cinepak files I made conclusions about the capabilities of the original Cinepak encoder.
I looked at the main CPU code, and it only uses DMA for the tile data (except the first long word which it copies with non-DMA after the DMA copy). It uses non-DMA writes to copy the tile map (probably since each row has a gap where tile data is not written). You assert that 30 fps is possible, but just because you can squeeze the DMA copy in (which is the most time consuming activity), there are other things the main CPU must do.
So really this is main CPU limited, which is primarily caused by the VDP copies (DMA and non-DMA, and copies to a buffer so they can be copied just in time during vblank).
The current decoder is already using DMA hblank, right? That alone would be a massive burden on the main CPU (halted for nearly entire frames at a time) and a massive waste for video not actually using 224 lines.
Not sure if running at H40 helps anything, on H32 the amount of data needed is smaller (and slower to transfer), and switching video modes dynamically probably disrupts horizontal timing.
His video is already running at H40 . . . it's 256x192 in a window with a horizontal and vertical border. (I believe he mentioned that resolution was used for the closer to original/square pixel aspect ratio, but it has the obvious benefit of increased DMA bandwidth)
Probably because the Cinepak decoder didn't support doing this at all.
That would be rather odd (and unfortunate) given how many cutscenes run letterboxed and how that mechanism is the most common/practical method for increasing DMA time.
It's even stranger that they'd support using hblank DMA (which bgvanbur seemed to be addressing in his previous post) but not support extending vblank.
On another note:
In the context of a demo/video format running at 1/2 vertical resolution, rather than scaling 2x vertically and updating full-res tiles, and aside from interlaced/interleaved line updates (which could technically be used at full vertical resolution) or simply leaving every other line blank on each tile, you could also use 2 tilemaps at a 1-pixel v-scroll offset to line-double tiles with every other line left as transparent pixels. (albeit in the case of just leaving every other line blank, you could potentially disable the display entirely on those lines, for more DMA time per-frame)
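To illustrate the two-tilemap idea, here is a small Python simulation, not VDP code. It models each plane as a grid of pixels with `None` standing in for transparent pixels (color index 0 on real hardware), and assumes both planes point at the same tile data, with plane B scrolled down by one pixel. All names are invented for the sketch.

```python
TRANSPARENT = None  # stands in for color index 0 on the real hardware

def build_plane(half_res_rows):
    """Place each source row on an even line; odd lines stay transparent."""
    plane = []
    for row in half_res_rows:
        plane.append(list(row))
        plane.append([TRANSPARENT] * len(row))
    return plane

def scroll_down(plane, pixels=1):
    """Shift a plane down, as a 1-pixel vertical scroll offset would."""
    width = len(plane[0])
    blank = [[TRANSPARENT] * width for _ in range(pixels)]
    return blank + plane[:len(plane) - pixels]

def composite(front, back):
    """Show the front plane, letting the back plane through transparent pixels."""
    return [[f if f is not TRANSPARENT else b for f, b in zip(frow, brow)]
            for frow, brow in zip(front, back)]

def line_double(half_res_rows):
    plane_a = build_plane(half_res_rows)
    plane_b = scroll_down(plane_a, 1)   # same tile data, second tilemap
    return composite(plane_a, plane_b)

for row in line_double([[1, 2], [3, 4]]):
    print(row)
```

Each source row ends up on two consecutive lines, while only the half-height tile data would need to be uploaded.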
On another note:
In the context of a demo/video format running at 1/2 vertical resolution, rather than scaling 2x vertically and updating full-res tiles, and aside from interlaced/interleaved line updates (which could technically be used at full vertical resolution) or simply leaving every other line blank on each tile, you could also use 2 tilemap at a 1-pixel v-scroll offset to line double tiles with every other line left as transparent pixels. (albeit in the case of just leaving every other line blank, you could potentially disable the display entirely on those lines, for more DMA time per-frame)
This is pretty much part of the trick I mentioned some time ago to have a pseudo-linear bitmap =/
kool kitty89
11-04-2011, 06:20 AM
This is pretty much part of the trick I mentioned some time ago to have a pseudo-linear bitmap =/
Yes . . . I thought that was related . . .I should have made note of that too. (though your trick involved more than just line doubling like that -that would also allow different palettes to be used on even and odd lines -or shadow for that matter, though I think I mentioned that when you originally commented on the pseudo-linear bitmap trick)
You could also use that set-up for a V-blur effect (changing layer priority or scroll position to switch even/odd lines every other frame and flicker-blur the scanlines), sort of like the scroll-based h-blur technique (though taking double the VRAM space -so more like a full-screen flicker effect, except 1/2 res and requiring less DMA bandwidth . . . and should also reduce visible flicker artifacts compared to full-screen flicker based pseudo-color)
that would also allow different palettes to be used on even and odd lines
Yeah, but I don't see how much use you'd have for that since you have to use the same color indexes on both planes anyways, and you can only change the palette on a per-tile basis. The only use I can see for it is to create a scanlines effect... that'd be cool actually :v
You could also use that set-up for a V-blur effect (changing layer priority or scroll position to switch even/odd lines every other frame and flicker-blur the scanlines), sort of like the scroll-based h-blur technique (though taking double the VRAM space -so more like a full-screen flicker effect, except 1/2 res and requiring less DMA bandwidth . . . and should also reduce visible flicker artifacts compared to full-screen flicker based pseudo-color)
Not of much use if you get the same pixels on both planes anyways. Unless you want to use the palette trickery to fake shades not possible with the VDP, but that doesn't look very good anyways =/ (Socket does this but with vertical lines in one of the sprites, and that doesn't look very good - but on real hardware it isn't much of a problem due to blurring of the vertical lines)
bgvanbur
11-04-2011, 10:32 AM
The current decoder is already using DMA hblank, right? That alone would be a massive burden on the main CPU (halted for nearly entire frames at a time) and a massive waste for video not actually using 224 lines.
The cinepak code only used the rectangle in plane A needed to show the movie. The border of plane A, the entire plane B, and sprites were still available for use by the game engine so all of the lines could be in use. Therefore the cinepak codec should not restrict non-cinepak lines from being shown.
I wondered why the Cinepak VDP DMA copy is split into two halves in later revisions; perhaps this helps with IRQs being handled in a timely manner for larger image sizes. Does anyone know: if a vblank IRQ happens during a DMA operation, does the IRQ code get called immediately and suspend the DMA transfer, or does the vblank IRQ stay pending until the DMA transfer is over?
ints stay pending when DMA is in progress on MD side. Bus is locked and 68K is halted, and Z80 too if it does a ROM access
kool kitty89
11-05-2011, 03:04 AM
The cinepak code only used the rectangle in plane A needed to show the movie. The border of plane A, the entire plane B, and sprites were still available for use by the game engine so all of the lines could be in use. Therefore the cinepak codec should not restrict non-cinepak lines from being shown.
You'd obviously want to make vblank size programmable (and thus DMA bandwidth per-frame variable) depending on the specific instance used . . . or specific game (if a fixed screen resolution/vblank size was used for all instances of FMV in a given game).
If a game needed a graphical border beyond the cinepak window (especially vertically), you obviously would have to work within the limits of that. (albeit, for dedicated cutscenes, that should never be the case . . . and many examples of in-game FMV could work fine with letterboxing -and many games did do that, regardless of additional DMA time being used)
Regardless of FMV, many games clipped the screen for more vblank time . . . and many FMV games had totally blank letterboxed borders as well. And, while cinepak may not support extending vblank DMA, several other FMV examples certainly did. (especially obvious for games with the telltale border artifacts visible in the letterboxing)
Also, your previous comments seem to include hblank DMA bandwidth . . . or is this referring to something else:
So every frame I need to copy 256/8 * 192/8 * (32 + 2) = 26112 bytes. 256 is the movie width, 192 is the movie height, 8 is the tile width and tile height, 32 is the single tile data bytes, and 2 is the single tile map data bytes. Using the genvdp.txt document, an NTSC frame has 262 scanlines (38 blanking and 224 active scanlines). So a 68k to VRAM DMA will have 167 * 38 + 224 * 16 = 9930 bytes transfered per NTSC frame. This means just for this DMA this requires 2.63 NTSC frames for the copy. So just this aspect makes 30 fps impossible for this frame size with the Cinepak for Sega codec.
If not hblank DMA, then what does the 224 * 16 figure refer to?
Hblank does not exist as free time, all of the active lines are used up from beginning to end with some access slots thrown in
Chilly Willy
11-05-2011, 02:12 PM
Hblank does not exist as free time, all of the active lines are used up from beginning to end with some access slots thrown in
Yeah, there are single accesses in a larger "zone" of pixels across the line. There's a diagram over at SpritesMind in one of the threads. I think it is one access slot per 64 pixels, but I'm not positive since it's been a while since I saw the diagram. That's why changing the CRAM (palette) during the display causes garbage in the display - if the access slots were all in the hblank, that would be invisible.
there's 3 slots every 64 pixels, 4th slot is turned into refresh
kool kitty89
11-07-2011, 04:03 AM
Ah, that makes more sense . . . too bad the 68k gets halted for the whole line for updates in active display rather than only being halted when the VDP is actually accessing the 68k's bus.
Still, it seems odd that Cinepak wouldn't support extending vblank DMA.
It is nearly impossible to give the 68K the bus back when there's no access slot in sight, especially since it would only let you run a few instructions, and the 68K won't give the bus back immediately, which means you'll miss the slot
Ah, that makes more sense . . . too bad the 68k gets halted for the whole line for updates in active display rather than only being halted when the VDP is actually accessing the 68k's bus.
The 68000 gets halted when the FIFO is full and it tries to send yet more commands. If the VDP didn't halt the 68000 in that situation, then writes that can't get in the FIFO would get lost forever.
Still, it seems odd that Cinepak wouldn't support extending vblank DMA.
It sounds more like it leaves that to yourself instead of doing it on its own.
Chilly Willy
11-07-2011, 01:14 PM
The 68000 gets halted when the FIFO is full and it tries to send yet more commands. If the VDP didn't halt the 68000 in that situation, then writes that can't get in the FIFO would get lost forever.
Which is EXACTLY what happens on the SMS VDP - if you write the FIFO too fast, the old data is lost. During the vblank, the Z80 cannot write the VDP fast enough for that to happen, but it can in the active display. SEGA gave the "standard" delay between writes to the VDP on the Mark III/SMS/GG as "push ix; pop ix" instructions between outs. You'll see other instructions depending on the developer, but the basic idea is to waste enough time for the FIFO to get an access slot and write the data before you try to write it again.
That's the most common mistake for SMS homebrew - not waiting between writes during the active display. No SMS/GG emulators emulate this timing issue, so when they test on emulators, it looks fine. When run on real hardware, you have tons of missing graphics.
Emukon / eSMS does attempt to emulate the speed limits and comes fairly close. It's not quite exact, but it gives an estimate: when the output is clean there, you can be 99% sure real HW is clean too.
Also, the MD VDP is very generous with speed, as is the Game Gear.
Which is EXACTLY what happens on the SMS VDP - if you write the FIFO too fast, the old data is lost. During the vblank, the Z80 cannot write the VDP fast enough for that to happen, but it can in the active display. SEGA gave the "standard" delay between writes to the VDP on the Mark III/SMS/GG as "push ix; pop ix" instructions between outs. You'll see other instructions depending on the developer, but the basic idea is to waste enough time for the FIFO to get an access slot and write the data before you try to write it again.
That's the most common mistake for SMS homebrew - not waiting between writes during the active display. No SMS/GG emulators emulate this timing issue, so when they test on emulators, it looks fine. When run on real hardware, you have tons of missing graphics.
To be fair, I was always under the impression that there was no FIFO at all in the SMS VDP, and that any attempts to write outside vblank at all (registers aside) were completely ignored, no matter what you tried. But I suppose that somehow the gradient in Sonic Blast GG has to work.
Chilly Willy
11-07-2011, 04:30 PM
Emukon / eSMS does attempt to emulate the speed limits and comes fairly close. It's not quite exact, but it gives an estimate: when the output is clean there, you can be 99% sure real HW is clean too.
Also, the MD VDP is very generous with speed, as is the Game Gear.
The MD VDP in SMS mode is no different for VRAM writing than a real SMS in my tests. Homebrew that is too fast for the SMS fails the exact same way on the MD. I've tried this on a Model 2, CDX, Nomad, and a real SMS. No idea on the Game Gear, though. Don't have one.
To be fair, I was always under the impression that there was no FIFO at all in the SMS VDP, and that any attempts to write outside vblank at all (registers aside) were completely ignored, no matter what you tried. But I suppose that somehow the gradient in Sonic Blast GG has to work.
There would have to be a latch at the very least because the Z80 isn't held off when writing. The Z80 writes to the latch/FIFO and continues on. I would guess in the SMS it's more likely a single latch than a FIFO. TECHNICALLY, a latch can be thought of as a single entry FIFO, so sometimes we'll say FIFO for anything when in a few cases it is implied that it's really a latch.
kool kitty89
11-07-2011, 09:32 PM
It sounds more like it leaves that to yourself instead of doing it on its own.
In that case, my comments to bgvanbur (about clipping for more DMA time) would still apply.
If there's no specific issues with Sega's Cinepak decoder that causes problems when vblank is extended (and there probably isn't, given how the same player works fine in both PAL and NTSC -so you already get a lot more vblank time in PAL), then it should just be up to the programmer to set active display to less than 224 lines.
And, again, clipping for more vblank time could not only provide the bandwidth needed for 30 FPS (in this specific case), but could do that without needing to halt the CPU in active display (ie 2 vblank DMA periods alone would be enough for 1 frame update), so you'd get a lot more MD CPU time overall. (with the current set-up, you'd have the 68k halted roughly 60% of the time vs ~13.6% with the screen clipped to 192 lines -or ~25.3% for 30 FPS updates)
kool kitty89
11-08-2011, 03:20 AM
I guess its my job to make the Cinepak for Sega version. I took the youtube video and converted it to a 256x192 size 15 fps 8 color grayscale video with 16276 Hz mono audio for Cinepak for Sega. It resulted in 9950039 video bytes and 3511593 audio bytes. This means my Cinepak for Sega encoder reduced the video data to 12% of the uncompressed size (lots of tile reuse, and codebooks were useful). This is what I expected. Also note that Cinepak for Sega cannot handle such a large video size with 30 fps due to VDP bandwidth (it sends all the tile data for each frame).
Link for ISO: http://dl.dropbox.com/u/26821164/BADAPPLE.ISO
I forgot to ask: what sort of preprocessing/filtering did you do for the audio?
It seems a lot cleaner than most 16 kHz 8-bit PCM use on the MCD. (did you low pass filter the audio to stay within nyquist limits -ie filter out everything beyond ~8 kHz? . . . I know not filtering has the advantage of retaining high pitch sounds at the expense of aliasing -though your example doesn't sound very muffled either)
The MD VDP in SMS mode is no different for VRAM writing than a real SMS in my tests. Homebrew that is too fast for the SMS fails the exact same way on the MD. I've tried this on a Model 2, CDX, Nomad, and a real SMS. No idea on the Game Gear, though. Don't have one.
Then it is definitely MD model dependent. The 315-5487 is very generous with VDP access and allows for hefty overclocking in SMS mode, while the 315-5660, which is also used in the CDX and most MD2s, does not allow much overclocking and thus does not give extra VDP cycles...
When I was testing my SMS dabbling on my MD2 I quickly discovered that side: the image was nice and clean on MD but completely jumbled up on SMS...
Gee, that makes me wonder how many VDP revisions are out there then. I know Jorge found out at least one revision has different behavior for the vscroll bug, and now there's this? =/
315-5313
315-5315A
315-5313A-01
315-5487
315-5487-10
315-5660
315-5660-02
315-5700
315-5708
315-5786
315-5960
315-6123
:P
I meant different revisions of the VDP itself - i.e. VDPs with different behavior. Those are ASIC revisions =P
Each of those ASICs has one potentially different VDP implementation :P
Yes, but we have yet to prove every single one of them is different =P
Jorge Nuno
11-08-2011, 08:34 AM
I can prove the 315-6123 is different :V
bgvanbur
11-08-2011, 09:11 AM
I forgot to ask: what sort of preprocessing/filtering did you do for the audio?
It seems a lot cleaner than most 16 kHz 8-bit PCM use on the MCD. (did you low pass filter the audio to stay within nyquist limits -ie filter out everything beyond ~8 kHz? . . . I know not filtering has the advantage of retaining high pitch sounds at the expense of aliasing -though your example doesn't sound very muffled either)
I have released the source for several of my Cinepak demos at: http://forums.sonicretro.org/index.php?showtopic=26243&view=findpost&p=636875.
Specifically to answer your question, I convert the audio using ffmpeg (this does all the hard work of converting to linear 8 bit PCM) and scdwav2pcm (part of my SCDTools scripts that convert 8 bit PCM in the wave to the Sega CD 8 bit format so lossless as long as you avoid 0xFF).
ffmpeg -i badapple.flv -acodec pcm_u8 -ac 1 -ar 16276 -vol 224 pcm.wav
scdwav2pcm pcm.wav pcm.pcm
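For anyone without the SCDTools scripts, the second step can be roughly approximated as below. This is only a sketch, not the real scdwav2pcm: the function name `u8_to_scd_pcm` is made up, and it assumes the commonly described Sega CD PCM layout (sign-magnitude bytes where bit 7 set means positive, with 0xFF reserved as the stop/loop marker). It expects the raw unsigned 8-bit sample bytes, without the WAV header.

```python
def u8_to_scd_pcm(samples: bytes) -> bytes:
    """Convert unsigned 8-bit PCM (0x80 = silence) to sign-magnitude bytes.

    Assumed layout (not taken from scdwav2pcm itself):
    bit 7 set = positive, bits 0-6 = magnitude, 0xFF = reserved marker.
    """
    out = bytearray()
    for u in samples:
        if u >= 0x80:
            value = 0x80 | (u - 0x80)    # positive half: 0x80..0xFF
            if value == 0xFF:            # never emit the reserved marker
                value = 0xFE
        else:
            value = min(0x80 - u, 0x7F)  # negative half: magnitude 1..127
        out.append(value)
    return bytes(out)
```

Only the two extreme input values get clamped in this sketch, which fits the remark that the conversion is lossless as long as 0xFF is avoided.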
kool kitty89
11-08-2011, 04:32 PM
Then it is definitely MD model dependent. The 315-5487 is very generous with VDP access and allows for hefty overclocking in SMS mode, while the 315-5660, which is also used in the CDX and most MD2s, does not allow much overclocking and thus does not give extra VDP cycles...
When I was testing my SMS dabbling on my MD2 I quickly discovered that side: the image was nice and clean on MD but completely jumbled up on SMS...
315-5313
315-5315A
315-5313A-01
315-5487
315-5487-10
315-5660
315-5660-02
315-5700
315-5708
315-5786
315-5960
315-6123
:P
At which point did the VDP switch to CMOS? (I'd imagine that alone could have led to some of the more significant differences . . . not necessarily the most extreme difference in behavior/performance, but certainly a fairly dramatic change in actual silicon implementation if nothing else -then again, switching to smaller chip processes -be it NMOS or CMOS- could lead to significant changes too -namely if optimized for using chip space in the new process rather than a simple die shrink)
bgvanbur
11-10-2011, 11:51 AM
If there's no specific issues with Sega's Cinepak decoder that causes problems when vblank is extended (and there probably isn't, given how the same player works fine in both PAL and NTSC -so you already get a lot more vblank time in PAL), then it should just be up to the programmer to set active display to less than 224 lines.
Looking at all the Cinepak for Sega movies I have decoded (over 6000), one of the movies from Fahrenheit has the highest VDP DMA tile transfer utilization. It requires 64% of the maximum VDP DMA transfer while displaying 224 lines. The next highest is 51%. Most movies fall in the range of 40% to 50% as shown in the following chart. And note, my Bad Apple demo at 15 fps was at 54%, which is above all but one Cinepak movie. When I tried 20 fps which would be 72%, it caused slight glitches (remember more than just tile VDP DMA needs to be done). Since all but one Cinepak movie required less VDP DMA bandwidth, they obviously didn't need to play tricks with the active lines to get higher VDP DMA. Later I should try 18 fps to test out the 64% VDP DMA usage.
http://img256.imageshack.us/img256/8788/cpdma.png
Can't you measure the DMA usage including everything (even overhead if possible), not just tiles? =/
kool kitty89
11-10-2011, 07:34 PM
Looking at all the Cinepak for Sega movies I have decoded (over 6000), one of the movies from Fahrenheit has the highest VDP DMA tile transfer utilization. It requires 64% of the maximum VDP DMA transfer while displaying 224 lines. The next highest is 51%. Most movies fall in the range of 40% to 50% as shown in the following chart. And note, my Bad Apple demo at 15 fps was at 54%, which is above all but one Cinepak movie. When I tried 20 fps which would be 72%, it caused slight glitches (remember more than just tile VDP DMA needs to be done). Since all but one Cinepak movie required less VDP DMA bandwidth, they obviously didn't need to play tricks with the active lines to get higher VDP DMA. Later I should try 18 fps to test out the 64% VDP DMA usage.
Again, those are using transfers in active display too, right?
That's a massive waste of MD CPU time that's unnecessary unless you absolutely can't get enough bandwidth in vblank alone . . . especially for games that use cropped FMV to fewer than 224 lines (where even more vblank time would be quite practical).
It would have made far more sense for most FMV to do all transfers in vblank, period, with the exceptions only being when vblank bandwidth was totally maxed out.
Let's take your specific video as an example (which, granted, pushed much more bandwidth than most).
You're doing 256x192 (26112 bytes) at 15 FPS (ie 1 per 4 60 Hz frames) in H40 with 224 lines active.
So, using vblank alone, you'd have 7488 bytes per frame (208 bytes per line, 36 lines available for DMA) which would take 3.49/4 60 Hz frames to transfer (or ~87% of available vblank bandwidth). That would also mean the 68k would be halted roughly 12% of the time, leaving ~88% or ~6.75 MHz performance.
However, using active display, you'd get another 16 bytes per line (not sure if it's different in H40 from H30), so 11072 bytes per 60 Hz frame and ~2.36/4 frames to transfer 26112 bytes. However, this case leaves the 68k halted for entire frames at a time, and would thus mean having the CPU halted roughly 59% of the time (or a bit better ~51.8% if that last .36 frame worth of data was done all in vblank), or effectively ~3.14 MHz or 3.7 MHz best-case. (a massive performance loss over restricting DMA to vblank)
At 20 FPS (1 in 3 60 Hz frames), you wouldn't have enough bandwidth in vblank alone (without disabling more of the display), but using active display would mean ~2.36/3 60 Hz frames . . . or ~78.6% of overall bandwidth and CPU time (or ~69.1% CPU time, best-case) leaving CPU performance at just ~1.64 MHz or 2.37 MHz best-case.
But, if you clipped the display to 192 lines (as per the visible screen window), that would increase available vblank DMA lines from 36 to 68, or 14144 bytes per frame (with zero use of active display).
And 20 FPS in that case would need only ~1.85/3 frames (16.0% CPU time) leaving ~6.44 MHz CPU performance.
Or, for 30 FPS (impossible at 224 lines, even saturating active display with 68k halted 100% of the time), at 192 lines clipped, you'd need 1.85/2 frames (~24.0% CPU time) leaving ~5.83 MHz CPU performance. (note, still substantially better than even the 15 Hz example using active display)
So, obviously, there's tons of practicality to extending vblank and (more so) in avoiding DMA in active display at all costs. (ie used only when there are absolutely no other practical options for increasing DMA bandwidth)
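The byte counts in this post can be reproduced with a few lines of Python. This is just a sketch of the arithmetic using the thread's own per-line figures (167 or 208 bytes per blanking line, 16 bytes per active line, 262 NTSC lines); the function names are made up, and the "two lines lost" rule is only inferred from the 36 and 68 line figures quoted above.

```python
LINES_PER_FRAME = 262                            # NTSC
BLANK_BYTES_PER_LINE = {"H32": 167, "H40": 208}  # figures quoted in the thread
ACTIVE_BYTES_PER_LINE = 16

def movie_frame_bytes(width, height, tile_bytes=32, map_bytes=2):
    """Bytes to upload per movie frame: tile data plus tile map entries."""
    return (width // 8) * (height // 8) * (tile_bytes + map_bytes)

def dma_bytes_per_frame(active_lines, mode="H40", use_active=False):
    """DMA budget for one 60 Hz frame.

    Assumption: usable blanking lines = 262 - active - 2, which matches
    the 36 (224 active) and 68 (192 active) figures in the post.
    """
    blank_lines = LINES_PER_FRAME - active_lines - 2
    budget = blank_lines * BLANK_BYTES_PER_LINE[mode]
    if use_active:
        budget += active_lines * ACTIVE_BYTES_PER_LINE
    return budget

def frames_needed(payload, active_lines, mode="H40", use_active=False):
    """How many 60 Hz frames it takes to upload one movie frame."""
    return payload / dma_bytes_per_frame(active_lines, mode, use_active)

payload = movie_frame_bytes(256, 192)
print(payload)                                           # 26112
print(round(frames_needed(payload, 224), 2))             # vblank only, 224 lines
print(round(frames_needed(payload, 224, use_active=True), 2))
print(round(frames_needed(payload, 192), 2))             # display clipped to 192
```

A movie frame fits a target rate when `frames_needed(...)` is no larger than 60 divided by the target FPS, for example 2 for 30 FPS.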
That's obviously much more important still for plain MD games (without all the added resources from the CD), and any video decompression drivers on the MD alone (ie all the MD-specific Bad Apple demos -among others . . . like the FMV intro used in Sonic 3D Blast) would need to use vblank DMA alone out of absolute necessity. (and most/all MD games -aside from cases needing very little 68k resource . . . with many games making the sacrifice to letterbox the screen to facilitate such -like SFII, Virtua Racing, etc . . . the letterbox sacrifice is a much more realistic one to make most of the time, and even much of the time with the MCD -unless you're really doing almost nothing with the MD CPU)
However, since almost all FMV could be done in 36 lines of vblank alone, this really shouldn't have been an issue either (ie neither active display DMA nor clipping should have been necessary at all -at least for plain cutscenes, where significant additional sprite/BG bandwidth wasn't needed). But it's odd that your player would push DMA in active display if that were the case.
6012 (H32) or 7488 (H40) bytes per 60 Hz frame was enough to handle 256x152 15 FPS in H32, 288x192 12 FPS in H40, 256x224 10 FPS in H32, let alone lower bandwidth examples. (though batman and robin's 256x192 ~15 FPS H32 would need more than that since you'd max out at ~13.8 FPS otherwise at that resolution -but clipping to 216 active lines would have been more than enough to allow a full 15 FPS, hell clipping to the existing 192 lines would give more than enough for 20 FPS . . . 288x192 clipped to 192 lines should allow 20 FPS with an even wider margin -though not enough for 30 FPS . . . maybe enough for a consistent 24 FPS- and 256x152 H32 clipped to 152 lines could allow 30 FPS with tons of bandwidth to spare)
bgvanbur
11-10-2011, 10:55 PM
Can't you measure the DMA usage including everything (even overhead if possible), not just tiles? =/
Yes, this DMA is just the tile data DMA. The tile map and CRAM are written using VDP register writes, not quite sure how you would measure this speed.
6012 (H32) or 7488 (H40) bytes per 60 Hz frame was enough to handle 256x152 15 FPS in H32, 288x192 12 FPS in H40, 256x224 10 FPS in H32, let alone lower bandwidth examples. (though batman and robin's 256x192 ~15 FPS H32 would need more than that since you'd max out at ~13.8 FPS otherwise at that resolution -but clipping to 216 active lines would have been more than enough to allow a full 15 FPS, hell clipping to the existing 192 lines would give more than enough for 20 FPS . . . 288x192 clipped to 192 lines should allow 20 FPS with an even wider margin -though not enough for 30 FPS . . . maybe enough for a consistent 24 FPS- and 256x152 H32 clipped to 152 lines could allow 30 FPS with tons of bandwidth to spare)
Batman and Robin is 12 FPS. It requires 24576 byte tile data DMA, 1536 bytes tile map VDP register writes, and 128 bytes CRAM register writes. For the tile data DMA, you have (12*(256/8*192/8*32+1536+128))/(60*(36*167+224*16)), or 54% of the potential DMA (though the tile map and CRAM writes are probably slower than DMA). Your calculation of 13.8 fps is off since you assume no VDP data is sent during active lines, so you are missing 3584 bytes for each frame.
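The utilization figure can be checked with a short script. This is a sketch of the same formula and nothing more: the function name is made up, and all numbers are the ones given in the post.

```python
def dma_utilization(fps, payload_bytes, budget_bytes_per_frame, display_hz=60):
    """Fraction of the available per-second transfer budget a movie uses."""
    return (fps * payload_bytes) / (display_hz * budget_bytes_per_frame)

tile_bytes = (256 // 8) * (192 // 8) * 32   # 24576 bytes of tile data
payload = tile_bytes + 1536 + 128           # plus tile map and CRAM writes
budget = 36 * 167 + 224 * 16                # blanking lines + active lines (H32)

usage = dma_utilization(12, payload, budget)
print(f"{usage:.1%}")
```

The result lands between 54% and 55%, so the 54% quoted above is the truncated value.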
So how do you implement 192 line clipping? I was looking at the VDP docs and couldn't find any clues.
kool kitty89
11-11-2011, 02:04 AM
Batman and Robin is 12 FPS. It requires 24576 byte tile data DMA, 1536 bytes tile map VDP register writes, and 128 bytes CRAM register writes. For the tile data DMA, you have (12*(256/8*192/8*32+1536+128))/(60*(36*167+224*16)), or 54% of the potential DMA (though the tile map and CRAM writes are probably slower than DMA). Your calculation of 13.8 fps is off since you assume no VDP data is sent during active lines, so you are missing 3584 bytes for each frame.
OK, then you'd avoid the need for DMA in active display there too. (12 FPS at 24576 bytes per frame would fit well within the 6012 byte/60 Hz frame limit . . . you'd only need 24576 bytes updated once every 5 60 Hz frames, and you've got up to 30060 bytes in vblank bandwidth alone in that time)
So maybe no examples of Cinepak bother with DMA in active display or clipping . . . though some other FMV examples (and certainly software/ASIC framebuffer renderers -and several fighting games, Virtua Racing, etc) make use of extended vblank for more DMA time.
So how do you implement 192 line clipping? I was looking at the VDP docs and couldn't find any clues.
You'd need to ask Tiido, Chilly Willy, Sik, or one of the other programmers (not sure if it's been detailed on spritesmind before . . . but I'd imagine it has at least been discussed).
I can't actually program for the MD . . . I only know about these things from discussions on these matters in the past. (it's come up many times, usually in the context of sprite/BG animation limits or -especially- software rendering to a pseudo framebuffer -or FMV, or ASIC rendering)
From what I understand, the VDP allows active display to be programmable from 0 up to 224 (non-interlaced) lines in 60 Hz or up to 240 lines in 50 Hz, but I don't know how to actually program the VDP to do that.
But it shouldn't be difficult to see the advantage to sacrificing a bit of vertical display for more bandwidth . . . especially to avoid using DMA in active display. (clipping to 200 lines -like SFII- in H32 allows more DMA bandwidth in 1 vblank period than you'd get in an entire 224 line frame+vblank while offering much, much less impact on CPU performance, and H40 would give a much bigger advantage)
It's somewhat more practical to use for the MCD (since you could design a game that ran mostly on the MCD CPU, and just leave enough main 68k time to handle I/O, FM updates, etc), but even then it's rather wasteful compared to the modest sacrifice of vertical resolution/screen size. (and, obviously, for a plain MD game, it's far less attractive to use DMA in active display like that . . . a massive sacrifice in CPU time for a relatively tiny gain in DMA bandwidth -plus, it's unattractive for anything that's not double buffered . . . unless you really time things precisely so you don't update something too early mid-screen)
Batman and Robin is 12 FPS.
The FMVs? Because I'm pretty sure the game runs at 15 FPS, and it's updating a 256×192 screen in H32 mode.
So how do you implement 192 line clipping? I was looking at the VDP docs and couldn't find any clues.
Use hblank at line 192
Disable display
DMA everything you need
Enable display
You can use VDP register $0A to tell the VDP how often to trigger hblank. Set it to 192 to make it trigger every 192 lines (but since there're 224 lines, that effectively means it only triggers once, at line 192).
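As a rough illustration of the steps above, here are the 16-bit words that would be written to the VDP control port. The register-set word format (0x8000 | register << 8 | value) and the display-enable bit in register $01 are standard; the base value chosen for register $01 is only an example, and the helper name is made up. The hblank interrupt also has to be enabled separately (bit 4 of register $00), and the exact counter value may need adjusting by one line on real hardware.

```python
def vdp_reg_write(reg: int, value: int) -> int:
    """Build the control-port word that sets one VDP register."""
    if not (0 <= reg <= 0x1F and 0 <= value <= 0xFF):
        raise ValueError("register or value out of range")
    return 0x8000 | (reg << 8) | value

HINT_COUNTER_REG = 0x0A   # how often the hblank interrupt fires
MODE2_REG = 0x01
DISPLAY_ENABLE = 0x40     # bit 6 of register $01

# Example base value for register $01 (an assumption, not from the thread):
# vblank interrupt on, DMA on, mode 5, display bit left clear.
MODE2_BASE = 0x34

hint_setup = vdp_reg_write(HINT_COUNTER_REG, 192)
display_off = vdp_reg_write(MODE2_REG, MODE2_BASE)
display_on = vdp_reg_write(MODE2_REG, MODE2_BASE | DISPLAY_ENABLE)

print(f"{hint_setup:04X} {display_off:04X} {display_on:04X}")
```

The hblank handler would write `display_off` at line 192 and start the DMA; `display_on` goes out again before the first visible line of the next frame.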
bgvanbur
11-11-2011, 08:52 AM
The FMVs? Because I'm pretty sure the game runs at 15 FPS, and it's updating a 256×192 screen in H32 mode.
The FMVs are 12 FPS. The game could easily run at 15 FPS since it can preload or reuse tiles whereas the Cinepak codec does not preload or reuse any tiles per frame (with regard to the VDP).
Use hblank at line 192
Disable display
DMA everything you need
Enable display
You can use VDP register $0A to tell the VDP how often to trigger hblank. Set it to 192 to make it trigger every 192 lines (but since there're 224 lines, that effectively means it only triggers once, at line 192).
You couldn't just add a hblank routine and have this work for Cinepak. Since the Cinepak performs the DMA, you could have pending hblank if line 192 occurs during a DMA, or you could disable the display, and the DMA makes you miss enabling the display for the next frame. To support this, you would need to modify the Cinepak MAIN CPU code since you can't do it just by adding a user defined hblank.
The FMVs are 12 FPS. The game could easily run at 15 FPS since it can preload or reuse tiles whereas the Cinepak codec does not preload or reuse any tiles per frame (with regard to the VDP).
Pretty sure it just redraws the entire screen (but then again, it uses the ASIC for that, which is a huge speed boost).
You couldn't just add a hblank routine and have this work for Cinepak. Since the Cinepak performs the DMA, you could have pending hblank if line 192 occurs during a DMA, or you could disable the display, and the DMA makes you miss enabling the display for the next frame. To support this, you would need to modify the Cinepak MAIN CPU code since you can't do it just by adding a user defined hblank.
So it does everything by itself after all =|
kool kitty89
11-17-2011, 02:00 AM
You couldn't just add a hblank routine and have this work for Cinepak. Since the Cinepak performs the DMA, you could have pending hblank if line 192 occurs during a DMA, or you could disable the display, and the DMA makes you miss enabling the display for the next frame. To support this, you would need to modify the Cinepak MAIN CPU code since you can't do it just by adding a user defined hblank.
Wouldn't DMA only occur on line 192 if you were using DMA in active display?
ie, if you had a Cinepak decoder that never, ever used DMA outside of vblank, this wouldn't be a problem, right? (and, again, it seems extremely wasteful -if not silly- to use non-vblank DMA at all for this purpose; it would have made far more sense to disallow DMA outside of vblank, especially since most/all FMV fit within vblank DMA bandwidth limits even of a 224 line display, let alone the many cases where letterboxing already exists and would thus mean no sacrifice for extending vblank)
I changed the video to 4 gray levels, still at full resolution (320x224) and with lossless compression.
I tweaked my compression a bit and now I get this:
Original tiles size : 13118672
Packed tiles size : 5532992
RAW : 18946 tiles
packed RLE : 185222 tiles
packed RLE Rot : 212647 tiles
packed Plain + Pix : 170209 tiles
copy (flipped or not) : 232893 where 72150 tiles are used as reference.
Packed tilemap data size : 708437
Total data size : 6241429 bytes
Still a lot more than the 3.5 MB max I want :o
Looking at the packed tile size alone, I get a compression ratio of about 42% (packed/original)... not really good :-/
I have not abandoned my project :D
After a lot of work I reduced the size of the data a lot:
4 gray levels at 320x224 resolution with lossless compression :
Original (not plain) tile size: 13118672
Packed tile size: 3621268 (28970146 bits)
Details of packed tiles :
Plain sharing tilemap: 200579 tiles - 101277 bytes (810220 bits)
RAW : 1476 tiles - 24354 bytes (194832 bits)
Plain + pix : 41727 tiles - 108081 bytes (864654 bits)
Dico: 270659 tiles - 1355088 bytes (10840709 bits)
Dico Rotated: 314278 tiles - 1528549 bytes (12228395 bits)
Derive previous: 83733 tiles - 262103 bytes (2096824 bits)
Derive other: 67382 tiles - 221483 bytes (1771864 bits)
Copy: 40662 tiles - 20331 bytes (162648 bits)
Tilemap size : 278384 bytes
Total data size : 3899652 bytes (~3.72 MB)
That is not the 3.5 MB I wanted, but it is really close and enough to fit in 4 MB :)
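A quick sanity check of those totals, using only the byte counts listed above (the variable names are arbitrary):

```python
ORIGINAL_TILES = 13118672   # original (not plain) tile size
PACKED_TILES = 3621268      # packed tile size
TILEMAP = 278384            # tilemap size
ROM_4MB = 4 * 1024 * 1024

total = PACKED_TILES + TILEMAP
ratio = PACKED_TILES / ORIGINAL_TILES
left_over = ROM_4MB - total

print(total, f"{total / 2**20:.2f} MB")   # total video data
print(f"{ratio:.1%}")                     # packed tiles vs original tiles
print(left_over)                          # bytes left in a 4 MB ROM
```

The packed tiles come to a bit under 28% of the original tile data, compared with about 42% for the earlier attempt, and the remainder is what is left in a 4 MB ROM for code and sound.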
The dictionary I used is definitely not the best one: with only my main dictionary, the packed tile size is about 3800000 bytes...
I had to add 2 alternate dictionaries to achieve better compression (about 3600000 bytes).
I guess we could get slightly better compression with an optimal dictionary here.
I'm posting the first demo; it is a beta, as it is not optimized. The interesting point is that the video does fit in 4 MB :)
Right now the video is very choppy, buggy in some parts, and very slow; I need to optimize the code *a lot*... I do not know if I will be able to optimize it enough to get a stable 30 FPS playback rate, but well, I will try :)
When that is done I will also try to add some FM music (not enough free space for PCM).
You can download the rom from here :
version 1 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple1.bin
version 2 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple2.bin
Kamahl
08-18-2012, 02:00 PM
15-20 fps should be good enough.
For me it really is very different ;) Of course 15/20 FPS already looks "good", but the feeling is very different from a smooth and stable 30 FPS :)
15/20 FPS would be nice, 30 FPS would be awesome :)
evildragon
08-19-2012, 02:06 PM
Man not too sure how you got it down that much.
When I wrote a Bad Apple demo for an Intel 8086 PC, with RLE compression it was down to like 12MB, and it came out like this (no sound): http://www.youtube.com/watch?v=OKyolreDW04
1-bit monochrome, no shades of grey. I think I need a new compression scheme.
(PS, it looks interlaced in the video, cause when in CGA video modes, the video memory IS interleaved).
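For reference, the basic idea of run-length encoding a 1-bit frame looks roughly like this. It is a generic sketch, not the format used in the 8086 demo: pixels are a flat sequence of 0/1 values, and the output is a list of run lengths that always starts with a run of 0s (possibly of length zero).

```python
def rle_encode(pixels):
    """Encode 0/1 pixels as alternating run lengths, starting with 0s."""
    runs = []
    current, length = 0, 0
    for p in pixels:
        if p == current:
            length += 1
        else:
            runs.append(length)       # close the current run
            current, length = p, 1    # start a run of the other color
    runs.append(length)
    return runs

def rle_decode(runs):
    """Rebuild the pixel list from alternating run lengths."""
    pixels = []
    value = 0
    for length in runs:
        pixels.extend([value] * length)
        value ^= 1
    return pixels

frame = [0, 0, 0, 1, 1, 0]
print(rle_encode(frame))
```

Long black or white areas collapse into single numbers, which is why plain RLE does reasonably well on this video; it gains nothing from similarity between consecutive frames, though.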
New version :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple2.bin
Speed is a bit better (not by much), but at least I fixed the last bugs. The video is no longer choppy and buggy in some places :)
I am still using C, but even with ASM I will need some major optimizations to get things running at the correct speed.
Man not too sure how you got it down that much.
Honestly I spent a lot of time on that. I am not really good at compression stuff; fortunately the video offers good compression opportunities...
Still, keeping the video at 320x224 resolution and 2bpp with lossless compression was really not easy.
When I wrote a Bad Apple demo for an Intel 8086 PC, with RLE compression it was down to like 12MB, and it came out like this (no sound): http://www.youtube.com/watch?v=OKyolreDW04
1-bit monochrome, no shades of grey. I think I need a new compression scheme.
(PS, it looks interlaced in the video, cause when in CGA video modes, the video memory IS interleaved).
At least it looks really smooth :D
Having speed and compression is the hard part :p
By the way what is the resolution on your video, hard to tell by looking at it.
evildragon
08-19-2012, 05:09 PM
On my video, the screen is in a CGA 320x200 mode, palette 1 high intensity mode. The video itself however was 64x48.
Yeah, I was asking about the video resolution :) 64x48, ok, thanks. I guess you needed to reduce it to maintain a good frame rate? And maybe also for the size...
evildragon
08-19-2012, 05:43 PM
Yea I had to reduce it.. I had a few test shots that were higher resolution in the video (and using LZW compression....AND using BIOS calls to update the screen) and the result was absolute trash.
Here I tried to use shades of grey, in 320x200 64 shades of grey mode: http://www.youtube.com/watch?v=idK4meBnUJw
And here, same compression and video, but instead in CGA mode (thus the interlacing as CGA modes memory addresses are interleaved and I don't bother optimizing the writing method to video RAM to get around the effect) and 1-bit monochrome video: http://www.youtube.com/watch?v=lm_mgy9c4aQ
I then found that using BIOS calls sucked hardcore and had to just access everything directly myself and that's where the final version came from.. The test videos above though used a higher resolution video.. I forget what resolution it was though..
R3000A
08-19-2012, 06:22 PM
Has this been done on the SNES?
evildragon
08-19-2012, 06:46 PM
Not the SNES, but it has been done on the NES.
http://www.youtube.com/watch?v=cuMkI6cDKMs
R3000A
08-19-2012, 07:05 PM
Impressive. Can the SNES do this?
evildragon
08-19-2012, 07:06 PM
I don't see why not.. I mean, if the NES can do it, why not?
R3000A
08-19-2012, 07:13 PM
Just curious. Does the SNES have the bandwidth to handle it?
R3000A
08-19-2012, 07:20 PM
What's the DMA transfer speed for both systems?
Kamahl
08-19-2012, 07:24 PM
The SNES can do it, it just depends at what quality.
The transfer speeds are somewhere in the 4th gen discussion thread... good luck finding them.
The SNES can do it, but not at the same quality, as you don't have the same bandwidth (I think it is at least halved), and you also won't be able to achieve the same compression ratio because of the weaker CPU.
R3000A
08-19-2012, 07:29 PM
Thanks for the replies. What about the custom graphics chip?
Joe Redifer
08-19-2012, 07:33 PM
I really like all of the Bad Apple demos. I'd be curious to see what an 8MB or even 10MB version (with PCM?) would look and sound like. That could be run on the Mega Everdrive. Regular Everdrive is limited to 4MB, though.
sega16
08-19-2012, 07:38 PM
For some reason I get real bothered when I do not understand what people are saying for example in a song so what is the story of bad apple? What makes the apple so bad. Are there good apples in Touhou? Is the bad apple referring to a person for example the bad guy/villain is a bad apple and "bad apple" is a bad translation for a bad person?(insert other generic questions here:D)
Kamahl
08-19-2012, 07:39 PM
Thanks for the replies. What about the custom graphics chip?
What do you mean? Are you asking if the SNES hardware has some advantages in producing this kind of video? If so, it depends. It uses a bitplane format which for fully black and white video works as an extremely cheap form of compression. There's also mode-7 to do the video in half-res and then scale it up... but mode-7 doesn't use a bitplane format.
SNES hardware is weird.
R3000A
08-19-2012, 07:42 PM
Yes, all the Bad Apple demos I've seen are nice. I'd like to buy an Everdrive, but can't afford it.
R3000A
08-19-2012, 07:55 PM
What do you mean? Are you asking if the SNES hardware has some advantages in producing this kind of video? If so, it depends. It uses a bitplane format which for fully black and white video works as an extremely cheap form of compression. There's also mode-7 to do the video in half-res and then scale it up... but mode-7 doesn't use a bitplane format.
SNES hardware is weird.
I was just curious if the SNES's GPU could play this video. From what I've read, the custom chips in the SNES were designed to offset the weak CPU.
R3000A
08-19-2012, 08:16 PM
I was just curious if the SNES's GPU could play this video. From what I've read, the custom chips in the SNES were designed to offset the weak CPU. Is this right?
R3000A
08-19-2012, 08:32 PM
176 Kb/sec for the DMA transfers - is this right?
evildragon
08-19-2012, 08:36 PM
EDIT button.
R3000A
08-19-2012, 08:40 PM
Structure=0
Joe Redifer
08-19-2012, 10:27 PM
I agree, you have no structure.
R3000A
08-20-2012, 02:05 AM
Can the SNES transfer 176 Kb per second? How many DMA channels does it have?
I really like all of the Bad Apple demos. I'd be curious to see what an 8MB or even 10MB version (with PCM?) would look and sound like. That could be run on the Mega Everdrive. Regular Everdrive is limited to 4MB, though.
Of course having 8 MB or 10 MB could help a lot : we could have PCM playback with full resolution video and easier decompression than mine :)
What do you mean? Are you asking if the SNES hardware has some advantages in producing this kind of video? If so, it depends. It uses a bitplane format which for fully black and white video works as an extremely cheap form of compression. There's also mode-7 to do the video in half-res and then scale it up... but mode-7 doesn't use a bitplane format.
SNES hardware is weird.
I don't think bitplane mode is that useful here. I believe classic tile display is better suited to this type of compression, as you can use plain tiles for fast color fills. The limited bandwidth prevents you from doing large updates.
What can be useful on the SNES, in my particular case, is the 2bpp tile mode. On the Genesis I have to convert all my 2bpp data back to 4bpp, which is really time consuming; I could avoid that on the SNES by using the 2bpp tile mode.
The SNES has a lower resolution so it won't look as nice as on the Genesis, but fortunately that helps with the more limited bandwidth :)
Can the SNES transfer 176 Kb per second? How many DMA channels does it have?
176 Kb per second seems really low; honestly I don't know how much you can do with SNES DMA. Having several channels doesn't help, as you can only do one transfer at a time.
R3000A
08-20-2012, 05:48 PM
I read the MD can transfer 176 Kb per second somewhere. I'm just trying to understand the SNES's hardware.
SNES has lower transfer counts than MD, mainly because it has lower resolution and all transfers are tied to the video subsystem. The speed is definitely way above 176KB/sec.
If you count only VBL on a normal frame then MD does 860KB/sec in 50Hz and 440KB/sec in 60Hz at most. SNES would be closer to 360KB/sec for 60Hz and 700KB/sec for 50Hz (those are figures of MD 256 pixel res).
R3000A
08-20-2012, 07:44 PM
Thanks. Wow - that's a lot higher than I expected - huge difference when using NTSC and PAL.
Chilly Willy
08-20-2012, 07:53 PM
Thanks. Wow - that's a lot higher than I expected - huge difference when using NTSC and PAL.
Look at the numbers: there's only 7% more pixels (240 vs 224... IF you use 240 mode in PAL), but there's 20% more time per frame due to the 50 Hz refresh vs 60 Hz on NTSC (put the other way, an NTSC frame is about 17% shorter). So you've got 20% more time to deal with only 7% more pixels. That leaves a lot of free time that can be devoted to DMA.
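As a rough check of these percentages, here is a minimal sketch of the arithmetic. It only uses the refresh rates and line counts mentioned in the post; measured from the NTSC side a 50 Hz frame lasts 20% longer, which is the same gap as saying a 60 Hz frame is about 17% shorter.

```python
# Compare frame time and visible lines for 60 Hz (NTSC) and 50 Hz (PAL).
NTSC_HZ, PAL_HZ = 60, 50
NTSC_LINES, PAL_LINES = 224, 240   # 240 only applies if the 240-line mode is used on PAL

frame_time_ratio = (1 / PAL_HZ) / (1 / NTSC_HZ)   # how much longer a PAL frame lasts
pixel_ratio = PAL_LINES / NTSC_LINES              # how many more lines there are to draw

print(f"frame time: +{(frame_time_ratio - 1) * 100:.0f}%")
print(f"pixels:     +{(pixel_ratio - 1) * 100:.0f}%")
```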
R3000A
08-20-2012, 08:04 PM
I understand. How long does it take the MD to draw a line (NTSC mode)? What's the impact on the YM2612?
Chilly Willy
08-20-2012, 09:06 PM
I understand. How long does it take the MD to draw a line (NTSC mode)? What's the impact on the YM2612?
It depends on how you draw the line... are you drawing to a bitmap in ram then DMAing it to vram? Are you drawing it in vram directly? Are you merely changing patterns to different predefined patterns?
The impact on the 2612 is negligible for music. For PCM, it depends on how long your DMA blocks are vs how fast your sample rate is for the PCM.
R3000A
08-20-2012, 09:27 PM
Isn't it faster to write to VRAM directly?
sega16
08-20-2012, 09:59 PM
Isn't it faster to write to VRAM directly?
Do you mean in software? If so, the answer is no. During vblank, DMA is a lot faster; outside vblank, DMA is about the same speed as software.
R3000A
08-20-2012, 10:15 PM
Hardware. I'm not that familiar with the MD's architecture - sorry. Is there any way to disable the Z80?
evildragon
08-20-2012, 10:16 PM
Why don't you start a new thread? You're kinda derailing this one dude.
R3000A
08-20-2012, 10:17 PM
The Z80 seems like a huge bottleneck to this system.
R3000A
08-20-2012, 10:20 PM
Sorry, what should I call it?
evildragon
08-20-2012, 10:54 PM
R3000A's Questionnaire Bonanza.
R3000A
08-20-2012, 11:10 PM
Sorry, I'm trying to learn about these systems.
Joe Redifer
08-20-2012, 11:31 PM
Yeah, start a new thread. A Sega forum also might not be the best place to learn about the SNES.
R3000A
08-20-2012, 11:42 PM
Ok, thanks.
Flygon
08-21-2012, 12:48 AM
Yeah, start a new thread. A Sega forum also might not be the best place to learn about the SNES.
Nonsense, a good warrior keeps his friends close. A better one keeps his enemies closer.
We have a lot of better warriors.
Edit: And I just realized the implications of my avatar.
evildragon
08-21-2012, 04:21 AM
So I'm trying to improve my 8086 version of this demo, with music, but I can't find a damn MIDI of the Bad Apple theme..
There always is a MIDI of everything but apparently not this one.
The last version from AnalYoGirl uses FM for music... probably converted from a MIDI version.
evildragon
08-21-2012, 12:46 PM
I think I found a MIDI, but it's weird, OPL3 won't play it, nothing is heard, but my Mac can play it in QuickTime.
Looks like it may be a non-standard MIDI format. Then again I didn't try real OPL3 hardware, I was trying in DOSBox.
I'd actually like the MIDI of the original song from Touhou 4. I like how that tune sounds too.
New version for my bad apple demo :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple3.bin
Better speed but still far from what I need :p
I had to add a 128 KB lookup table to improve the 2bpp to 4bpp tile conversion speed... that hurts when you are so close to the 4 MB limit.
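To illustrate the lookup table idea, here is a minimal sketch of a 2bpp to 4bpp expansion. The table layout is an assumption: this one is indexed by a single byte (four 2bpp pixels) and is only 512 bytes, while the 128 KB table mentioned in the post presumably uses a wider index whose exact layout is not stated.

```python
# Hypothetical sketch: expand 2bpp pixels to 4bpp with a lookup table.
# One input byte holds four 2bpp pixels; each entry is 16 bits (four 4bpp pixels).
def build_lut():
    lut = []
    for byte in range(256):
        out = 0
        for i in range(4):
            pixel = (byte >> (6 - 2 * i)) & 0b11   # assume pixels are stored MSB first
            out = (out << 4) | pixel               # widen each pixel to a nibble
        lut.append(out)
    return lut

LUT = build_lut()

def expand_row(two_bpp_row: bytes) -> bytes:
    """Convert one 8-pixel tile row (2 bytes at 2bpp) to 4 bytes at 4bpp."""
    out = bytearray()
    for b in two_bpp_row:
        out += LUT[b].to_bytes(2, "big")
    return bytes(out)
```

The trade-off is the usual one: a bigger table converts more pixels per lookup but costs ROM space, which matters this close to the 4 MB limit.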
R3000A
08-31-2012, 12:47 AM
Do you have a YouTube video of this? Great compression.
evildragon
08-31-2012, 01:00 AM
Why does he need a youtube video if he uploaded a ROM?
Do you have a YouTube video of this? Great compression.
The YouTube video won't show more than the actual Bad Apple videos you can already find, except at a slower frame rate and without any sound...
It also doesn't show that it fits in 4 MB :p
I think I found a MIDI, but it's weird, OPL3 won't play it, nothing is heard, but my Mac can play it in QuickTime.
Looks like it may be a non-standard MIDI format. Then again I didn't try real OPL3 hardware, I was trying in DOSBox.
I'd actually like the MIDI of the original song from Touhou 4. I like how that tune sounds too.
At some point I would like to find the MIDI version of the music too, to convert it to YM2612 music... I'm still not there for the moment ;)
Joe Redifer
08-31-2012, 04:13 AM
Stef I tried your last Bad Apple demo on real hardware. I haven't tried this latest one yet, but I hope to soon. Anyway it looked really nice but played at maybe 1 frame every 2 seconds or so and almost seemed to get slower and slower. Is this normal?
Yeah, the first version is really choppy; the last one is better but still really slow compared to what it should be (i.e. 30 FPS).
It doesn't get slower and slower, but the frame rate depends heavily on inter-frame changes, so when there are many changes between frames the frame rate becomes very low.
Chilly Willy
08-31-2012, 03:30 PM
Yeah, if not much changes, it's full speed, but it slows quickly. Hope you can eventually get that up.
R3000A
08-31-2012, 04:00 PM
Real hardware performs differently than the current emulators (Kega and Regen.)
TrekkiesUnite118
08-31-2012, 04:10 PM
Real hardware performs differently than the current emulators (Kega and Regen.)
http://www.sega-16.com/forum/attachment.php?attachmentid=5340
Hopefully Kega and Regen are close enough that you don't get too surprised when you test on real hardware, especially regarding the VRAM access time during the active period, which is something I have to care about in my demo ;)
Joe Redifer
09-01-2012, 01:49 PM
Real hardware performs differently than the current emulators (Kega and Regen.)
I know... emulators still aren't very accurate. I rarely ever have any need for emulators when I have the real thing.
New version :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple4.bin
I moved almost all of the tile unpack algorithm to ASM code.
Unfortunately it is still too slow :-/
I don't see much more room for big improvements now...
~600000 of the 850000 total tiles are packed with the dictionary method.
Unfortunately the dictionary unpack code is the most complex and slowest one: I believe 20% to 70% of CPU time (depending on the frame complexity) is eaten by that code.
I profiled the time to unpack a single 2bpp tile with the dictionary method: 5 to 16 scanlines (close to 8000 cycles in the worst case)! And we can have 250 tiles to unpack per frame. I think I should find a simpler unpacking method :p
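A rough budget check shows why this is too slow. The timing constants below are assumptions (approximate figures for an NTSC Mega Drive with the 68000 at about 7.67 MHz), not numbers from the post, and the case where all 250 tiles use the dictionary method is a hypothetical worst case.

```python
# Assumed timing figures: NTSC Mega Drive, 68000 at roughly 7.67 MHz.
CYCLES_PER_SCANLINE = 488      # approximate 68000 cycles per scanline
SCANLINES_PER_FRAME = 262      # NTSC, including blanking

def scanlines_needed(tiles, scanlines_per_tile):
    return tiles * scanlines_per_tile

worst_tile_cycles = 16 * CYCLES_PER_SCANLINE   # matches "close to 8000 cycles"
budget = 2 * SCANLINES_PER_FRAME               # 30 FPS video = 2 display frames per video frame
best = scanlines_needed(250, 5)                # all 250 tiles at the fastest measured time
worst = scanlines_needed(250, 16)              # all 250 tiles at the slowest measured time
print(worst_tile_cycles, budget, best, worst)
```

Even the best case needs more scanlines than the two display frames available per video frame, so under these assumptions a frame full of dictionary tiles cannot be unpacked at 30 FPS.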
Again a new version :)
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple5.bin
Again a nice speed boost but still too slow for 30 FPS (actually it can maintain it, but only for a few periods).
I wrote a tool that generates the dico (dictionary) unpacking assembly code.
The code is a bit larger than storing the dicos directly, but not by much (a few kilobytes). The dico unpacking code is now really fast: it took between 5 and 17 scanlines with the previous assembly code using dico objects, and the generated code now takes between 2 and 6 scanlines, which is almost 3 times faster! I cannot improve the dico unpacking code any further but I can still improve some of the other unpacking paths... unfortunately that won't help much in the worst case. I guess I could still improve performance by 30% at best, but I don't think I'll spend time on that.
What I will try to do is simplify the decompression so I can unpack the movie at 30 FPS; the drawback is that it won't fit in 4 MB anymore (probably 6 or 7 MB). I think I will cut it in half, and that will also make room for PCM sound :)
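The "generate code instead of storing data" idea can be sketched as follows. This is purely illustrative: the demo's real generated code is not shown in the thread, and the label names, the 16-byte entry size and the use of a0 as the output pointer are all assumptions.

```python
# Hypothetical generator: turn each dictionary entry into straight-line 68000 code,
# so unpacking an entry is a jump to its routine instead of a data-driven loop.
def emit_entry(index: int, tile: bytes) -> str:
    """tile: 16 bytes (one 8x8 tile at 2bpp). a0 is assumed to point at the output buffer."""
    assert len(tile) == 16
    lines = [f"dico_{index}:"]
    for off in range(0, 16, 4):
        value = int.from_bytes(tile[off:off + 4], "big")
        lines.append(f"    move.l  #${value:08X},(a0)+")   # write 4 bytes as an immediate
    lines.append("    rts")
    return "\n".join(lines)

print(emit_entry(0, bytes(range(16))))
```

Immediates embedded in instructions take a little more space than raw data, which would be consistent with the code being "a bit larger" than the stored dictionaries.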
R3000A
10-13-2012, 10:11 PM
Thanks a lot for this!
evildragon
10-13-2012, 10:19 PM
Yea this is pretty freakin cool. I can't wait till it's near the final stages and has a sound track to go with it.
But, for the time being, it would probably be good if the text has some kind of black stroke around it, so when the white text is against a white background, you can still read it though. So that we know what's going on.
Thanks for the comments, I hope I will get it finished at some point ;)
About the text: yeah, I will improve that,
even if the numbers aren't of much interest to you.
The first number is the number of vblanks.
The second number is the number of played frames (at correct speed it should be vblank / 2).
The third number represents CPU time wasted in the waiting loop (synchronization between the vblank process and the active period process).
evildragon
10-14-2012, 05:14 PM
Thanks for the detailed information. I mean the numbers may not mean much to the normal user, but it does help.
Time for a new version !
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple6.bin
I made many changes but the result does not necessarily show it.
The encoder now directly generates 4bpp tiles with 2 frames packed in one, so I don't have to handle that in the decoder :) This is nice for speed but has the drawback of reducing the compression ratio, as it slightly increases the amount of data to unpack...
I use 8 dictionaries; they are quite similar, but compared to 1 dico I gain about 450 KB. I still generate them "manually" and we could get better compression with an algorithm that finds the best alternate dictionaries.
Unfortunately I am still not at full speed, and far from it in some parts :-(
The number of tiles to unpack is just too high and I cannot make it fast enough for the moment. I can have up to 500 tiles to unpack in 800 scanlines, which is just impossible with my current unpacking method... But I have some more ideas to improve that :p
You will notice the video is no longer complete (around 2/3 of the original) because it no longer fits in 4 MB and also because I want to keep some room for the future PCM inclusion :)
Some numbers for those interested:
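One common way to pack two 2bpp frames into a single 4bpp tile is to give each frame two bits of the palette index and pick the visible frame by switching palettes. Whether the demo does exactly this is an assumption; the bit layout and gray values below are hypothetical.

```python
# Assumed layout (not confirmed by the post): low 2 bits = frame A, high 2 bits = frame B.
GRAYS = [0, 85, 170, 255]   # four gray levels for a 2bpp pixel

def pack_pixel(a: int, b: int) -> int:
    return (b << 2) | a     # one 4bpp palette index carries a pixel from both frames

# Two 16-entry palettes: switching palettes selects which frame is visible.
PALETTE_A = [GRAYS[i & 0b11] for i in range(16)]
PALETTE_B = [GRAYS[i >> 2] for i in range(16)]

idx = pack_pixel(1, 3)
print(PALETTE_A[idx], PALETTE_B[idx])
```

With such a layout the tile data is uploaded once for two video frames, and showing the second frame only costs a palette change.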
Tile :
------
RAW size : 75264000 bytes
Packed size : 3345212 bytes (26761699 bits)
Plain : 2017342 tiles - packed in tilemap
RAW : 2064 tiles - 67080 bytes (536640 bits)
PlainAndPix : 31057 tiles - 141510 bytes (1132084 bits)
Dico : 119325 tiles - 1301830 bytes (10414642 bits)
DeriveSame : 7821 tiles - 45888 bytes (367108 bits)
DeriveOther : 83578 tiles - 625065 bytes (5000526 bits)
Copy : 3874 tiles - 1937 bytes (15496 bits)
Tilemap :
----------
packed size : 206162 bytes
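For a quick feel of these numbers, the sketch below derives an overall compression ratio and the average cost of a dictionary tile from the figures listed above. Treating "raw size" as tile data only is an assumption, so the ratio is approximate.

```python
RAW_SIZE = 75_264_000        # bytes of raw tile data (from the list above)
PACKED_TILES = 3_345_212     # bytes of packed tile data
PACKED_TILEMAP = 206_162     # bytes of packed tilemap

total_packed = PACKED_TILES + PACKED_TILEMAP
ratio = RAW_SIZE / total_packed

# Average cost of a tile stored with the dictionary method, in bits.
dico_bits_per_tile = 10_414_642 / 119_325

print(total_packed, round(ratio, 1), round(dico_bits_per_tile, 1))
```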
R3000A
10-29-2012, 07:57 PM
Impressive.
evildragon
10-30-2012, 03:16 AM
Have you tried RLE compression? That's what I used for my bad apple demo, but it was pure 1-bit monochrome.
Yeah, RLE was the first thing I tried but the compression wasn't good enough at that time (when I was trying to fit everything in 4 MB). Now I may consider retrying it, but I have some ideas which are even less CPU intensive than 4bpp RLE :)
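For readers unfamiliar with RLE, here is a minimal sketch for 1-bit monochrome pixels. The (value, run length) pair format and the 255 cap on run length are assumptions for illustration, not the format used by either demo.

```python
def rle_encode(bits):
    """Encode a sequence of 0/1 pixels as (value, run_length) pairs, runs capped at 255."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit and runs[-1][1] < 255:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([bit, 1])      # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

row = [0] * 10 + [1] * 5 + [0] * 3
encoded = rle_encode(row)
assert rle_decode(encoded) == row
```

Long flat areas collapse to a few pairs, which is why it suits this video, but every extra gray level or noisy edge adds runs.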
tomaitheous
10-30-2012, 02:12 PM
Have you tried RLE compression? That's what I used for my bad apple demo, but it was pure 1-bit monochrome.
Where's your demo? I don't remember seeing it.
evildragon
10-30-2012, 04:39 PM
Here it is.
http://www.youtube.com/watch?v=OKyolreDW04
tomaitheous
10-30-2012, 10:28 PM
Here it is.
http://www.youtube.com/watch?v=OKyolreDW04
No download link? What tools did you use? C++? Did you write the video converter yourself? What did you code the player with (language)?
I started writing a compressor for PCE that used 2bpp format, but I limited the colors (err shades) to 3 - in C++. I just converted the image to YUV, stripped UV and posterized the Y channel. It was tile based and RLE compressed per row in the tile. But I had other compression schemes like special shift/rotate last row (just looking at the video I converted, it was pretty apparent that there were redundant 'shapes' that looked a lot like padded shifts - shifts with repeating leading pixel).
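The "keep Y, drop UV, posterize" step can be sketched like this. The post does not say which YUV matrix or which thresholds were used, so the BT.601 luma weights and the equal-thirds split below are assumptions.

```python
def luma(r, g, b):
    # BT.601 luma weights (assumed; the post only says "converted the image to YUV").
    return 0.299 * r + 0.587 * g + 0.114 * b

def posterize3(y):
    """Map luma 0..255 to one of 3 shades (0 = black, 1 = grey, 2 = white)."""
    return min(2, int(y * 3 / 256))

pixels = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
shades = [posterize3(luma(*p)) for p in pixels]
print(shades)
```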
evildragon
10-31-2012, 12:13 AM
No download link because you literally have to have a model 25 IBM for it to work. It was written for the MCGA, and the addresses are a little bit different. It's CGA compatible, but how it handles things is a tad different, to prevent snow on the picture. It was also corrupting HDs, so I gave up on it.
Mostly assembly, but the video frames are RLE compressed; they are simply made from PCX format images, stitched together into one large file. Around 14MB. The program simply displays them using direct access to the MCGA hardware. At first I was using BIOS calls and it was painfully slow. I used BIOS calls because for a while I didn't know how to even access the MCGA directly. It's mainly the same as CGA: it's interleaved, and I found that if the program reads the frames interleaved and places them directly into video RAM, it works out pretty fast, though with a super small picture.
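The interleaving mentioned here refers to the standard CGA graphics layout, where even scanlines live in one 8 KB bank and odd scanlines in another. A minimal sketch of the address calculation follows; which exact mode the demo used is not stated, so the 80 bytes per row figure (a CGA-compatible 200-line mode) is an assumption.

```python
# Standard CGA graphics layout: even scanlines in the first bank, odd ones in the second.
BYTES_PER_ROW = 80      # 640 pixels at 1bpp, or 320 pixels at 2bpp
ODD_BANK = 0x2000       # odd scanlines start 8 KB into video RAM

def cga_offset(y: int, byte_x: int = 0) -> int:
    """Offset into video RAM (segment 0xB800) of a byte on scanline y."""
    return (y & 1) * ODD_BANK + (y >> 1) * BYTES_PER_ROW + byte_x
```

Storing the frames already split into even and odd rows means each bank can be filled with one sequential copy instead of computing this offset per line.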
It's mainly 8086 assembly btw. That's about the only assembly language I know, and not very much of it.
I started writing a compressor for PCE that used 2bpp format, but I limited the colors (err shades) to 3 - in C++.
I just converted the image to YUV, stripped UV and posterized the Y channel. It was tile based and RLE compressed per row in the tile. But I had other compression schemes like special shift/rotate last row (just looking at the video I converted, it was pretty apparent that there were redundant 'shapes' that looked a lot like padded shifts - shifts with repeating leading pixel).
I guess you use the YUV conversion to optimize the 3-color rendering; also, you probably use the last color (0 or 3) to store your RLE information?
What resolution and what size do you obtain with your compression schemes? I'm really looking forward to seeing what you can achieve on the PCE :)
tomaitheous
11-01-2012, 04:06 PM
I guess you use the YUV conversion to optimize the 3-color rendering; also, you probably use the last color (0 or 3) to store your RLE information?
What resolution and what size do you obtain with your compression schemes? I'm really looking forward to seeing what you can achieve on the PCE :)
The res is 256x192 @ 30fps @ 3 colors . I have a separate mask-map that uses variable length encoded binary commands (binary tree). The data is broken down into either; 8x8 tiles, or 8x1 rows. There's a flag that sets the layout for each frame; either vertical stored or horizontally stored data. The PCE Background plane is set to 2bpp mode, so I save on space in vram AND bandwidth. I also double buffer (I have two tilemaps in vram, one for vertical stored tiles and one for horizontal stored tiles so they correspond with the buffer layout as needed). I don't have any compression on the VDC side; data is decompressed with full frames even with redundant tiles. I haven't seen a need to change this as of yet as I have the cpu and VDC bandwidth for it. The tile data itself is stored in planar format (PCE's bitplane interleaved planar layout).
I haven't worked on this for a while. I have padding bit shift compression scheme (one for left padding bit and one for right padding bit) which I haven't implemented yet, but should get some additional decent savings. I don't do any frame difference compression.
Here's some specs:
http://www.pcedev.net/bad_apple/test.bmp
http://www.pcedev.net/bad_apple/BA_frame.png
Mask-map is $2c bytes for the frame. Compressed data is $327 bytes for the frame. Format 2bpp.
Again, the compressor isn't finished. I've always been a fan of multiple compression scheme setups, so still have this in mind (and ID tags which style of compression its using). I.e. Some frames compress better with different altered schemes.
The res is 256x192 @ 30fps @ 3 colors . I have a separate mask-map that uses variable length encoded binary commands (binary tree).
The data is broken down into either; 8x8 tiles, or 8x1 rows. There's a flag that sets the layout for each frame; either vertical stored or horizontally stored data. The PCE Background plane is set to 2bpp mode, so I save on space in vram AND bandwidth. I also double buffer (I have two tilemaps in vram, one for vertical stored tiles and one for horizontal stored tiles so they correspond with the buffer layout as needed). I don't have any compression on the VDC side; data is decompressed with full frames even with redundant tiles. I haven't seen a need to change this as of yet as I have the cpu and VDC bandwidth for it. The tile data itself is stored in planar format (PCE's bitplane interleaved planar layout).
It's neat that you have direct access to a 2bpp mode on the PCE; even in mode 4 we don't have access to that on the MD, which is a shame in this particular case as I waste both bandwidth and CPU time with the 4bpp mode. I guess your vertical map corresponds to the horizontal one with a 90° rotation? Currently I'm doing the rotation manually but I can probably use a prepared vertical tilemap as you do :) When you say you have no compression on the VDC side, do you mean you really unpack all tiles instead of using predefined plain tiles? If my calculation is correct that represents about 6144 bytes per frame to send to the VDC... I don't know how much you can send on the PCE.
With my resolution a trivial frame transfer would eat 8960 bytes per frame. Given the 38 blank lines with 200 bytes max per scanline, I get barely 7600 bytes, so I have to use plain tiles.
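The per-frame figures above can be reproduced with a short sketch. It assumes both numbers mean "2bpp worth of data for one video frame, spread over the 2 display frames that a 30 FPS video frame lasts at 60 Hz"; that reading is an inference from the numbers, not something the post states.

```python
def frame_bytes(width, height, bpp):
    return width * height * bpp // 8

DISPLAY_FRAMES_PER_VIDEO_FRAME = 2   # 30 FPS video on a 60 Hz display (assumed)

# PCE: 256x192 at 2bpp
pce_per_display_frame = frame_bytes(256, 192, 2) // DISPLAY_FRAMES_PER_VIDEO_FRAME

# MD: 320x224, 2bpp worth of data per video frame
md_per_display_frame = frame_bytes(320, 224, 2) // DISPLAY_FRAMES_PER_VIDEO_FRAME

# Vblank-only budget quoted in the post: 38 blank lines, 200 bytes per line at most.
vblank_budget = 38 * 200

print(pce_per_display_frame, md_per_display_frame, vblank_budget)
```

Since the MD figure exceeds the vblank budget, a full uncompressed upload every frame does not fit, which is why plain (predefined) tiles are needed.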
I haven't worked on this for a while. I have padding bit shift compression scheme (one for left padding bit and one for right padding bit) which I haven't implemented yet, but should get some additional decent savings. I don't do any frame difference compression.
I don't get exactly what the padding bit shift compression you are talking about is, but I guess it is something that works nicely for this particular video.
Mask-map is $2c bytes for the frame. Compressed data is $327 bytes for the frame. Format 2bpp.
Again, the compressor isn't finished. I've always been a fan of multiple compression scheme setups, so still have this in mind (and ID tags which style of compression its using). I.e. Some frames compress better with different altered schemes.
With $327 bytes for that frame, if we take about $300 bytes as the average frame cost, the complete video would be ~5MB, which is nice :)
I started with many complex compression schemes using binary tree dictionaries, frame differences, tile rotation... and I was finally able to compress the whole video into a bit less than 4 MB. But I realized I would never be able to unpack it at full speed, so lately I have simplified the compression instead; that is the only way to achieve the 30 fps rate :-/
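The ~5MB estimate can be checked with a small sketch. The video duration is an assumption (about 3 minutes 39 seconds), and the per-frame mask-map of $2c bytes is left out, so this is only a ballpark.

```python
AVG_FRAME_BYTES = 0x300      # "about $300 bytes" average per frame
FPS = 30
DURATION_S = 219             # assumption: the video runs about 3 min 39 s

frames = FPS * DURATION_S
total = frames * AVG_FRAME_BYTES
print(frames, total, round(total / 2**20, 2))
```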
Here's my last version :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple7.bin
As you can see I'm now almost at full speed; I say almost because there are still some parts where I'm not, but I only need to improve the tilemap compression now :)
Unfortunately the compression ratio is now really low: I can store only half of the video in 4 MB :-/
dr apocalipsis
11-02-2012, 08:06 AM
You are doing this faster than expected. Great job.
Can't wait for the music version.
Thanks :) I hope to get it done quickly too :) Honestly the last version posted is not that good... It's good for speed but really not good for compression. I have to improve it a bit if I want to fit PCM playback into the free space.
tomaitheous
11-02-2012, 03:42 PM
It's neat that you have direct access to a 2bpp mode on the PCE; even in mode 4 we don't have access to that on the MD, which is a shame in this particular case as I waste both bandwidth and CPU time with the 4bpp mode. I guess your vertical map corresponds to the horizontal one with a 90° rotation? Currently I'm doing the rotation manually but I can probably use a prepared vertical tilemap as you do :) When you say you have no compression on the VDC side, do you mean you really unpack all tiles instead of using predefined plain tiles? If my calculation is correct that represents about 6144 bytes per frame to send to the VDC... I don't know how much you can send on the PCE.
With my resolution a trivial frame transfer would eat 8960 bytes per frame. Given the 38 blank lines with 200 bytes max per scanline, I get barely 7600 bytes, so I have to use plain tiles.
Yeah, PCE has the option to turn BG and/or SPRITEs into 2bpp mode. You can change this per scanline (it makes for kind of a cool effect since it sort of 'banks' the 4bpp tile into two 2bpp tiles. There's a bit to select which 2bpp set of the 4bpp tile is shown, i.e. it won't read from the alternate 'banked' bitplanes, so it doesn't show up all corrupt if the original tile was a real 4bpp graphic. I did a cheap transparency effect once using this on an hsync interrupt call). But anyway, the VDC port is a 16bit data bus, but there's a pin on the VDC that, when pulled to ground, splits the port into two 8bit ports for CPUs with an 8bit data bus, with the latch on the second port for vram read/write (register access via the control port has no latch, so you can update just the LSB or just the MSB). So, even though the VDC has no 1bpp storage mode, you can exploit this by writing only to the MSB, which triggers the latch. The last value written to the LSB is held in a buffer. It makes for fast 1bpp writes, since you only need to write a byte at a time. Kinda cool for some things.
As far as VDC bandwidth goes, the CPU will never overtake the VDC access slots during active display (even with a data block move transfer instruction, Txx). The only time the VDC will stall the CPU is in a short time frame inside part of hblank if there's a lot of sprite data to be fetched (it does all sprite data fetching during part of hblank for the line), and during the first three scanlines immediately after vblank if the sprite DMA auto flag is set. So pretty much full vram access. I did a few tests back in the day where I raced the 'beam' by writing to the tilemap (it was for some dynamic transparency effects).
Also, 32x24 sets of 2bpp tiles is 12288 bytes. I have 238,872 cpu cycles in two frame time window (figure 30fps). That's like 19cycles per byte to decompress and write to vram. That ain't gonna happen. It's 11 cpu cycles for a simple load (indirect), store PORT. Looks like I'm gonna have to compress on the VDC side with the tilemap for redundant tiles after all, to get upload expenditure back down :/ As of now, I don't have any streaming code setup on the PCE, just the single frame decompressor (I wanted the compression scheme to be fairly mature before moving onto that) and I didn't realize this. Oh well, not too difficult.
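The cycle budget above works out as follows. All inputs come from the post; only the variable names are made up for this sketch.

```python
CYCLES_PER_TWO_FRAMES = 238_872        # CPU cycles in a two-frame window (30 fps)
BYTES_PER_VIDEO_FRAME = 32 * 24 * 16   # 32x24 tiles, 16 bytes per 2bpp tile
LOAD_STORE_CYCLES = 11                 # simple indirect load + store to the port

cycles_per_byte = CYCLES_PER_TWO_FRAMES / BYTES_PER_VIDEO_FRAME
left_for_decompression = cycles_per_byte - LOAD_STORE_CYCLES
print(BYTES_PER_VIDEO_FRAME, round(cycles_per_byte, 1), round(left_for_decompression, 1))
```

With fewer than 9 cycles per byte left after the bare copy, there is essentially no room for a decompressor, hence the plan to skip redundant tiles via the tilemap.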
I don't get exactly what the padding bit shift compression you are talking about is, but I guess it is something that works nicely for this particular video.
Look at the large pic I posted. Look at their butt/back side (hehe). Or just their outline on the sides and such. A lot of the image detail has small incremental staircasing. Since this is confined to a tile (see the thin red border in the pic), I can represent that with an expanded shift. Like so:
If the first row of pixels is 00000111 and the next row below it is 00011111, then it could be represented as two logical shifts to the left. But I want the first bit on the right to 'trail', so I shift then pad (set or reset the bit to what it originally was). And I do this again for however many times the control code asks (1 pixel shift, 2 pixel shift, 3 pixel shift, etc). My data is already in planar format, so this is pretty easy resource wise. What would normally take 2 bytes to store a single row of 8 pixels (2bpp), I can now store as a control code in something like 4 bits or 6 bits. And the same for shifting to the right instead of the left, etc. This is one of the reasons why I kept the colors down to 3 instead of 4. 1 white and 1 grey is more compression friendly for this 'shift' method than using an extra shade of grey (full 2bpp). Of course 1bpp would really benefit from this, but since I'm working with 'planes', I can still optimize which plane gets shifted, so the trailing grey bits don't hurt much.
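The padded shift described above can be sketched on a single 8-pixel bitplane row. The function names and the limit of 3 shifts are hypothetical, and only the left-shift variant is shown; the real control code format is not given in the post.

```python
def padded_shift_left(row: int, n: int) -> int:
    """Shift an 8-pixel bitplane row left by n, refilling with the original rightmost bit."""
    pad = row & 1
    for _ in range(n):
        row = ((row << 1) & 0xFF) | pad   # shift, then restore the trailing bit
    return row

def find_shift(prev: int, cur: int, max_shift: int = 3):
    """Return n if cur equals prev padded-shifted left by n (1..max_shift), else None."""
    for n in range(1, max_shift + 1):
        if padded_shift_left(prev, n) == cur:
            return n
    return None

print(find_shift(0b00000111, 0b00011111))
```

An encoder would call something like `find_shift` for each row against the row above it and emit a short control code when it succeeds, falling back to raw storage otherwise.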
Heh. I might end up adding a small dynamic dictionary scheme to it as well (something like 32 tiles to be referenced from within the frame). Probably won't save much, but at this point there are no big savings. It's now down to all the little savings I can get :P
Here's my last version :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple7.bin
As you can see I'm now almost at full speed; I say almost because there are still some parts where I'm not, but I only need to improve the tilemap compression now :)
Unfortunately the compression ratio is now really low: I can store only half of the video in 4 MB :-/
Nice :D
Yeah, PCE has the option to turn BG and/or SPRITEs into 2bpp mode. You can change this per scanline (it makes for kind of a cool effect since it sort of 'banks' the 4bpp tile into two 2bpp tiles. There's a bit to select which 2bpp set of the 4bpp tile is shown, i.e. it won't read from the alternate 'banked' bitplanes, so it doesn't show up all corrupt if the original tile was a real 4bpp graphic. I did a cheap transparency effect once using this on an hsync interrupt call).
Practical :) I guess the 2bpp mode is not used much in games (maybe to save vram space) but it can be cool for some effects as you can enable it on a per-scanline basis :)
But anyway, the VDC port is a 16bit data bus, but there's a pin on the VDC that, when pulled to ground, splits the port into two 8bit ports for CPUs with an 8bit data bus, with the latch on the second port for vram read/write (register access via the control port has no latch, so you can update just the LSB or just the MSB).
A CPU with an 8bit data bus, but that is the case for the HuC6280, right? It's really weird to have a 16 bit port and connect an 8 bit CPU to it...
So, even though the VDC has no 1bpp storage mode, you can exploit writing to VRAM by only writing to the MSB with latch. The last value written to the LSB is held in a buffer. It makes for writing a fast 1bpp and only need to write a byte at a time. Kinda cool for some things.
I see, indeed the bitplane storage helps here at least :p
As far as VDC bandwidth goes, the CPU will never overtake the VDC access slots during active display (even with a data block move transfer instruction, Txx). The only time the VDC will stall the CPU is in a short time frame inside part of hblank if there's a lot of sprite data to be fetched (it does all sprite data fetching during part of hblank for the line), and during the first three scanlines immediately after vblank if the sprite DMA auto flag is set. So pretty much full vram access. I did a few tests back in the day where I raced the 'beam' by writing to the tilemap (it was for some dynamic transparency effects).
That's a really big advantage over other systems here :) Maybe because of the 8 bit CPU versus the fast 16 bit VDC.
What is the max speed at which the CPU can feed the VDC?
Also, 32x24 sets of 2bpp tiles is 12288 bytes. I have 238,872 cpu cycles in two frame time window (figure 30fps).
Yeah, it's what i calculated in previous post, 6144 bytes per frame :)
That's like 19cycles per byte to decompress and write to vram. That ain't gonna happen. It's 11 cpu cycles for a simple load (indirect), store PORT.
Yeah, I'm not really surprised, I faced the exact same problem... I could transfer up to (190*38) + (18*224) = 11252 bytes per frame but that would eat 100% of CPU time (without any decompression), so even 8960 would be very difficult with minor compression.
With my current compression the maximum I have to transfer is 512 4bpp tiles (16384 bytes) + the complete tilemap (2240 bytes) for 2 movie frames (4 Genesis frames).
That becomes 4656 bytes / frame, which can be transferred easily during vblank with DMA :)
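The transfer figures above can be reproduced with a short sketch. The 40x28 tilemap with 2 bytes per entry is an inference from the 320x224 resolution and the 2240 byte figure; the other inputs are taken from the post.

```python
TILE_BYTES_4BPP = 32                 # 8x8 pixels at 4bpp
max_tiles = 512
tilemap_bytes = 40 * 28 * 2          # 320x224 screen = 40x28 tiles, 2 bytes per entry (inferred)

per_two_video_frames = max_tiles * TILE_BYTES_4BPP + tilemap_bytes
per_display_frame = per_two_video_frames // 4      # 2 movie frames = 4 Genesis frames

# CPU-only upper bound quoted in the post: blank lines plus active lines.
cpu_only_limit = 190 * 38 + 18 * 224

print(per_two_video_frames, per_display_frame, cpu_only_limit)
```

The per-frame figure sits well under the vblank budget of 38 lines at roughly 200 bytes per line, which is consistent with the claim that DMA during vblank is enough.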
Looks like I'm gonna have to compress on the VDC side with the tilemap for redundant tiles after all, to get upload expenditure back down :/ As of now, I don't have any streaming code setup on the PCE, just the single frame decompressor (I wanted the compression scheme to be fairly mature before moving onto that) and I didn't realize this. Oh well, not too difficult.
I followed the same path: I got the codec done so that the whole video fit in 4 MB, but then I realized it would be just impossible to optimize it enough to get full speed playback :p
Look at the large pic I posted. Look at their butt/back side (hehe). Or just their outline on the sides and such. A lot of the image detail has small incremental staircasing. Since this is confined to a tile (see the thin red border in the pic), I can represent that with an expanded shift. Like so:
If the first row of pixels is 00000111 and the next row below it is 00011111, then it could be represented as two logical shifts to the left. But I want the first bit on the right to 'trail', so I shift then pad (set or reset the bit to what it originally was). And I do this again for however many times the control code asks (1 pixel shift, 2 pixel shift, 3 pixel shift, etc). My data is already in planar format, so this is pretty easy resource wise. What would normally take 2 bytes to store a single row of 8 pixels (2bpp), I can now store as a control code in something like 4 bits or 6 bits. And the same for shifting to the right instead of the left, etc. This is one of the reasons why I kept the colors down to 3 instead of 4. 1 white and 1 grey is more compression friendly for this 'shift' method than using an extra shade of grey (full 2bpp). Of course 1bpp would really benefit from this, but since I'm working with 'planes', I can still optimize which plane gets shifted, so the trailing grey bits don't hurt much.
Thanks for explaining, I got it perfectly :) Still, even if the compression looks simple, I'm not sure you will be able to unpack it at full speed :-/
Heh. I might end up adding a small dynamic dictionary scheme to it as well (something like 32 tiles to be referenced from within the frame). Probably won't save much, but at this point there are no big savings. It's now down to all the little savings I can get :P
Hehe yeah, and then you realize you start running out of CPU cycles :p I initially had 7 kinds of compression schemes for tiles and 3 binary tree dictionaries. In my last version I only use a single fixed size dictionary X'D
tomaitheous
11-02-2012, 09:39 PM
CPU with 8bit data bus, but that is the case of the HuC6280, right? It's really weird to have a 16 bit port and connect an 8 bit CPU to it...
It is a bit strange. Given that the VDC, subpalette, VCE, etc design was all new (and expensive), why they went with an 8bit processor is strange. Hudson developed all three chips (CPU, VDC, VCE) in the system. The VDC is used in a couple of arcade setups, though that's kind of rare. The HuC6280 is also used in arcades, but only as a sound processor and usually run at 1mhz (and at that, they ignore the on-chip audio channels). Anyway, it's strange because the 6280 isn't just a repackaged 65C02. It's based on the 65C02S (which was a special Rockwell version with additional instructions that WDC later added to the late-model 65C02s) plus further additional instructions, and some of those instructions are made specifically for interfacing with the VDC or 16bit devices in general (16bit ports). That, and the banking is all internal unlike other 8bit processors (the external address range is 21bit). I'm sure making a custom processor like the 6280 wasn't cheap. And to run at 7.16mhz too. There's a reason why the original 65x processors didn't hit such high clock speeds: the memory access timings are REALLY tight on those processors, so you need much faster and more expensive ram (not as bad as the 65816, but still up there). The external memory timing requirements on the 6280 are more relaxed in that respect. The CPU clock cycles are 139ns and you can get away with 120ns memory. A 65x supposedly can't at this speed (I forget all the details off hand). It also has block transfer instructions like the 65816, but had them before the 65816 came out (and it has more 'modes' of them than the '816). CPU instruction cycles are changed too. There are no more page boundary cycle penalties like on the 8bit 65x's, but now the one extra cycle is included for most bus operand instructions. It was a pretty big redesign. I'm sure it wasn't cheap to design/make. Why not a TI 16bit processor or a Motorola?
My only guess is that they wanted to 'woo' over famicom developers with a similar processor and the 65816 wasn't ready in time. One big time Hudson employee that was part of the PCE design stated that the system hardware and chip specs were designed by software engineers, not hardware engineers (for whatever that means).
The other weird thing, when coding for the console, is that all VDC access and addressing is done in WORDs. Not byte addressing. So you have 32kwords instead of 64kbytes. It's a little strange when you go back and forth between byte addressing on the cpu and word addressing on the gpu.
That's a really big advantage versus other systems here :) Maybe because of the 8 bit CPU versus the fast 16 bit VDC.
What is the max speed at which the CPU can feed the VDC?
A simple " lda [indirect],y store PORT" loop is ~41 bytes per scanline. A TIA block transfer to a 16bit port is 6 cycles a byte, but the last bank of the PCE is #$ff, known as the hardware bank, and any write to the $0000-07ff address range gets a 1 CPU cycle wait state (regardless of what's there, but $0800-$1fff has no wait cycle), so that's 7 cycles per byte to the VDC... ~65 bytes per scanline. There are three special opcodes that write directly to the VDC even if the hardware bank isn't mapped: ST0/ST1/ST2 (store immediate byte to the corresponding VDC port). You can set up graphics as embedded opcodes (works great if you have this code in ram for self modification. Some demos I've done do this with these opcodes). That's 5 CPU cycles per byte, so... ~91 bytes per scanline is the max/fastest the cpu can write to the VDC. At 91 bytes per scanline, you still won't saturate the VDC.
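The three figures above follow from dividing the CPU cycles available in one scanline by the cycles spent per byte. A small worked check, assuming an NTSC line rate of about 15.734 kHz (not stated in the post) and the 7.16 MHz CPU clock, which gives roughly 455 cycles per scanline. The 11 cycles per byte for the lda/sta loop is inferred from the ~41 figure and is an assumption, not something the post states.

```python
CPU_HZ = 7_159_090   # HuC6280 high-speed clock, the "7.16mhz" in the post
LINE_HZ = 15_734     # assumed NTSC horizontal line rate

CYCLES_PER_LINE = CPU_HZ / LINE_HZ   # about 455 CPU cycles per scanline


def bytes_per_scanline(cycles_per_byte: int) -> int:
    """Whole bytes the CPU can push to the VDC in one scanline."""
    return int(CYCLES_PER_LINE // cycles_per_byte)


methods = {
    "lda/sta loop (assumed 11 cycles per byte)": 11,
    "TIA block transfer (6 cycles + 1 wait state)": 7,
    "ST1/ST2 embedded opcodes (5 cycles)": 5,
}
for name, cycles in methods.items():
    print(f"{name}: ~{bytes_per_scanline(cycles)} bytes per scanline")
```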
The VDC runs at three different speeds, depending on the DOT clock (resolution): 5.37mhz, 7.16mhz, 10.74mhz. A VCE register controls this; the VCE divides the master clock and sends it to the VDC, so the VDC has no idea what speed it's running at. The VCE also builds out the NTSC signal/timing frame and sends Hsync and Vsync to the VDC (the VDC can generate its own, or it can run in slave mode where external sync defines it). The VDC has all sorts of registers for controlling the width of hblank, hsync, active display, active display line end blanking, starting blanked scanlines, total scanlines (a max of 512 scanlines), vblank length, etc. It's pretty complicated. You can set up the VDC to generate multiple 'VDC' frames inside a single VCE frame, 'cause if it doesn't get hsync from the VCE, it times out and just goes on. So you can do multiple vertical frames as well as multiple scanlines on a single VCE scanline. You can also change the speed of the DOT clock at any time and at any point during a scanline. But I digress. The VDC during active display has 8 memory access slots. Each access slot is one WORD wide (read or write) and 1 VDC pixel long, so access is broken down into 8 dot clock cycle segments. The CPU is given 4 of those DOT clock access slots. So the VDC gives the CPU up to four WORDs of access per 8 pixels, and that speed depends on the speed of the DOT clock. In low res mode (5.37mhz) something like 341 bytes could be written during the scanline (assuming nothing else, like sprite data fetching during the second half of hblank, lowers that number). It's basically fast because it's not fetching from a second tilemap and a second BG layer. During vblank or with the screen turned off, the CPU has access to all 8 access slots. That's a crazy amount of bandwidth that the CPU will never touch.
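The ~341 bytes figure can be reproduced from the slot description: 4 CPU slots of one WORD each per 8 dot clocks works out to one byte per dot clock. A short check, assuming a 5.369 MHz low-res dot clock and an NTSC line rate of about 15.734 kHz (both exact values are assumptions), and counting the whole scanline as available:

```python
LINE_HZ = 15_734            # assumed NTSC horizontal line rate
LOW_RES_DOT_HZ = 5_369_318  # assumed exact value of the "5.37mhz" dot clock

CPU_SLOTS_PER_8_DOTS = 4    # from the post: CPU gets 4 of the 8 access slots
BYTES_PER_SLOT = 2          # each slot moves one 16-bit WORD


def cpu_bytes_per_scanline(dot_hz: int) -> int:
    """Upper bound on bytes the VDC can accept from the CPU in one scanline."""
    dots_per_line = dot_hz / LINE_HZ
    return int(dots_per_line / 8 * CPU_SLOTS_PER_8_DOTS * BYTES_PER_SLOT)


print(cpu_bytes_per_scanline(LOW_RES_DOT_HZ))
```

That is well above the ~91 bytes per scanline the CPU can actually deliver, which is why the VDC is never saturated.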
Christuserloeser
11-03-2012, 02:01 PM
No download link?
There's no download because it doesn't exist.
- or to quote "evildragon": "because for some reason, when it runs, it corrupts the HD it is played from."
Same guy who ported Sonic to NES.
tomaitheous
11-03-2012, 09:39 PM
1bpp display mode on the Genesis:
Both planes A and B need to be 64x64 size. Res needs to be H32 cell mode. You have 256 tiles loaded into vram and remain unchanged. Only the first top row of pixels of the tiles matters (the first 8 pixels of the tiles). You need to setup the scroll table for line resolution. The first 64x64 map (plane A) is divided into four 32x32 maps. The scroll table is setup so that the next scanline points to the next row in the tilemap with the horizontal position at 0 each line. This gives you 64 bitmap scanlines going down the screen. Next, you restart the same layout but you have X position set to $100. This will give you the next 64 scanlines. So you have plane A giving you the top 128 scanlines. Next, you setup plane B the exact same way. Plane B should start right where plane A ends (i.e. they don't overlap). This will give you 256 scanlines total, but you only need what you want to show (192,224,240,whatever).
Now, the preloaded set of tiles in vram correspond to the first 8bits of the tilemap. A normal byte that contains eight 1bit pixels gets written to the tilemap. Thus, 1bit pseudo-bitmap on the Genesis. Double buffer should be possible because you can change where the plane A and B data are read from in vram (plane A is coarser, but still fine for this). Unless you can use some mixed SMS/Genesis VDP mode (that I don't know about) to exploit only writing 1 byte to the VDP for increments of 2, you're gonna have to waste bandwidth and VDMA (or manually cpu copy) that upper unneeded byte of the tilemap to vram as well. So instead of only 6144bytes (for 256x192) to copy, you'll need double that.
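A sketch of the fixed tile set this trick relies on, assuming the standard Genesis tile layout (8x8 pixels, 4bpp, 4 bytes per row, high nibble is the left pixel) and that the most significant bit of the bitmap byte is the leftmost pixel. The function names and the choice of color index 1 are hypothetical. Only the top row of each tile is filled, since that is the only row the scroll setup ever displays.

```python
def make_1bpp_tiles(color: int = 1) -> list[bytes]:
    """Build 256 tiles where tile N's top row shows byte N as 8 one-bit pixels."""
    tiles = []
    for value in range(256):
        top_row = bytearray()
        for pair in range(4):                        # two pixels (nibbles) per byte
            left = color if value & (0x80 >> (2 * pair)) else 0
            right = color if value & (0x40 >> (2 * pair)) else 0
            top_row.append((left << 4) | right)
        tiles.append(bytes(top_row) + bytes(28))     # other 7 rows are never shown
    return tiles


def frame_bytes(width: int, height: int) -> tuple[int, int]:
    """Return (1bpp bitmap size, tilemap bytes actually copied) for one frame."""
    bitmap = width * height // 8
    return bitmap, bitmap * 2    # every tilemap entry is a 16-bit word


tiles = make_1bpp_tiles()
print(len(tiles) * 32, "bytes of VRAM for the tile set")
print(frame_bytes(256, 192))
```

Writing the bitmap byte as the low byte of a tilemap entry with a zero upper byte then selects the matching tile, which is where the doubled transfer size comes from.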
Anyway, some downsides and some upsides. The upside is that you have planar mode.... kind of (I always think of 1bpp as single plane planar graphics :P). So you have some exploits/tricks that go along with this mode. The other thing is that it requires the minimal space on a cart to store (and no conversion to 4bpp). Downside is copying of the redundant byte of the tilemap. But at 6144 bytes per frame (assuming 30fps), it should be doable. Especially if you turn off the display early to gain extra bandwidth (but I don't see that as necessary. I mean, you can put the VDP in H40 just for vblank to get a faster VDMA and turn it back to H32 before the display starts, right? I thought Fonzie said he got that trick to work).
Any ideas of building on this trick for Bad Apple?
There's no download because it doesn't exist.
- or to quote "evildragon": "because for some reason, when it runs, it corrupts the HD it is played from."
Same guy who ported Sonic to NES.
I didn't forget (I specifically remember that thread when it was new over at spritesmind).
evildragon
11-04-2012, 12:39 AM
If it doesn't exist, how did I run it on camera then? The sonic thing was clearly fake. I was bored, young, and an asshole. It was also the most clearly faked screenshots ever (that somehow people actually fell for it).
But to be honest, what's stopping people from just making an animated GIF, and running a DOS GIF player? It'd probably work, just not that fast.
In honesty, you should just consider my program an animated GIF player, but not GIF, plays RLE frames, and instead of using BIOS calls like all software, uses direct MCGA hardware access. BIOS calls are always going to be slower than direct calls. Maybe that would then be more believable? With that in mind, an 8086 can quite easily pull this off. Especially since my 8086 is loaded with flash storage.
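The actual frame format of that player is not described, so the following is only a generic sketch of run-length decoding, assuming a hypothetical layout of (run length, color) byte pairs. A real player would then copy the decoded bytes into the video framebuffer rather than return them.

```python
def rle_decode(packed: bytes) -> bytes:
    """Expand (run length, color) byte pairs into raw pixel bytes."""
    if len(packed) % 2:
        raise ValueError("RLE data must be a sequence of (count, value) pairs")
    pixels = bytearray()
    for i in range(0, len(packed), 2):
        count, value = packed[i], packed[i + 1]
        pixels += bytes([value]) * count    # repeat the color `count` times
    return bytes(pixels)


# Three black pixels followed by two pixels of color 15.
print(list(rle_decode(bytes([3, 0, 2, 15]))))
```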
It is a bit strange. Given that the VDC, subpalette, VCE, etc design was all new (and expensive).. why then went with an 8bit processor is strange. Hudson developed all three chips (cpu,vdc,vce) in the system. The VDC is used in a couple arcade setups, though kind of rare. The HuC6280 is also used in arcades, but only as a sound processor and usually ran at 1mhz (and at that, they ignore on the on chip audio channels). Anyway, it's strange because the 6280 isn't just a repackaged 65C02. It's based on the 65C02S (which was a special Rockwell version which had additional instructions that WDC later added to the 65C02 late models) and additional instructions and some of the instructions are made specifically for interfacing with the VDC or 16bit devices in general (16bit ports). That and the banking is all internal unlike other 8bit processors (the external address range is 21bit). I'm sure making a custom processor like the 6280 wasn't cheap. And to run at 7.16mhz too. There's a reason why the original 65x processors didn't hit such high clock speeds, the memory access timings are REALLY tight on those processors so you need much faster and more expensive ram (not as bad as the 65816, but still up there). The external memory timing requirements on the 6280 are more relaxed in that respect. The CPU clock cycles are 139ns and you can get away with 120ns memory. A 65x supposedly can't at this speed (I forget all the details off hand). It also has block transfer instructions like the 65816, but before it came out (and it has more 'modes' of them than the '816s). CPU instruction cycles are changed too. There's no more 8bit page boundary cycle penalties like the 8bit 65x's, but now the one extra cycle is included for most bus operand instructions. It was a pretty big redesign. I'm sure it wasn't cheap to design/make. Why not a TI 16bit processor or a motorola? 
My only guess is that they wanted to 'woo' over famicom developers with a similar processor and the 65816 wasn't ready in time. One big time Hudson employee that was part of the PCE design, stated the system hardware and chips specs were design by software engineers, not hardware engineers (for whatever that means).
I'm not sure it was that expensive. Of course it was expensive in research, but then probably not too much in production (the 6502 was the cheapest CPU to produce because of its very simple die). There are cost advantages to packing banking and other IO circuits on the chip. You said the 6280 includes some extra 16 bit port instructions? Do you mean the 6280 can take advantage of the 16 bit VDC port? I discussed this with a PCE fan who told me the fastest way to feed the VDC was still the block transfer instruction.
Also about the speed, I heard the 7.16 MHz speed was rarely used with HuCards on Japanese systems because of overheating problems, so it looks like part of the solution to get that speed was a nice VDD increase ;)
Also, later WDC 65C02s were able to reach frequencies as high as 14 MHz, but they probably redesigned the chip to reduce power consumption.
As you said, one of the reasons for using that chip would be to poach NES devs; also the high frequency coupled with the per-MHz efficiency of the CPU was really impressive at that time.
Still, I believe you could do a lot more with a 68000: the 16/32 bit architecture helps a lot in some situations and C compilers produce better code for this type of CPU, which is a big advantage for developers...
The other weird thing, when coding for the console, is that all VDC access and addressing is done in WORDs. Not byte addressing. So you have 32kwords instead of 64kbytes. It's a little strange when you go back and fourth between byte addressing on the cpu and word addressing on the gpu.
That really looks like they changed some things late in their design, but who knows...
Simple " lda [indirect],y store PORT" is ~41bytes per scanline. TIA block transfer is to a 16bit port is 6 cycles a byte, but the last bank of the PCE is #$ff known as the hardware bank. And any writes to $0000-07ff address range gets a 1 cpu cycle wait state (regardless of what's there, but $0800-$1fff has no wait cycle), so 7 cycles per byte to the VDC... ~65bytes per scanline. There are three special opcodes that write directly to the VDC even if the hardware bank isn't mapped - ST0/ST1/ST2 (store immediate byte to VDC corresponding port). You can setup graphics as embedded opcodes (works great if you have this code in ram for self modification. Some demos I've done do this with these opcodes). That's 5 CPU cycles per byte, so... ~91 bytes per scanline is the max/fastest the cpu can write to the VDC. At 91 bytes per scanline, you still won't saturate the VDC.
91 bytes per scanline is really nice, but I guess that's a particular case of VDC filling (with the specific ST0/ST1/ST2 instructions) more than a real memory transfer?
Still, even 65 bytes per scanline permits a lot, as you can transfer at almost full speed even in the active period.
On Genesis you are really limited in the active period, so you would prefer to use the blanking period and DMA, where you can transfer up to ~200 bytes / scanline.
What I don't understand is why they did an 8 bit port for the VRAM whereas CRAM and VSRAM are 16 bits (and so have double the transfer rate compared to VRAM). Of course these are all technical choices I probably can't understand, but still it's really lame for developers to see CRAM being filled at 400 bytes / scanline...
At least that permitted the cool Direct Color mode demo :D
http://gendev.spritesmind.net/forum/viewtopic.php?t=1203
The VDC runs at three different speeds, depending on the DOT clock (resolution). 5.37mhz, 7.16mhz, 10.74mhz. The VCE register controls this, divides the master clock, and sends it to the VDC. So the VDC has no idea what speed it's running at. The VCE also builds out the NTSC signal/timing frame and sends Hsync and Vsync to the VDC (VDC can generate its own, or it can run in slave mode where external sync defines it). The VDC has all sorts of register for controlling the width of hblank, hsync, active display, active display line end blanking, starting blanked scanlines, total scanlines (a max of 512 scanlines), vblank length, etc. It's pretty complicated. You can setup the VDC to generate multiple 'VDC' frames insides a single VCE frame. 'Cause if it doesn't get hsync from the VCE, it times out and just goes on. So, you can do multiple vertical frames as well as multiple scanlines on a single VCE scanlines. You can also change the speed of the DOT clock at any time and any point during a scanline. But I digress. The VDC during active display has 8 memory access slots. Each access slot is one WORD wide (read or write). Each slot is 1 VDC pixel long. It's broken down into 8 dot clock cycle segments. The CPU is given 4 of those DOT clock access slots. So the VDC gives the CPU up to four WORDs of access per 8 pixels, and that speed depends on the speed of the DOT clock. Low res mode (5.37mhz) is like 341bytes could be written during the scanline (assuming nothing during the second half of hblank like sprite data, lowers that number). It's basically fast because it's not fetching from a second tilemap and a second BG layer. During vblank or screen turned off, cpu has access to all 8 access slots. That's a crazy amount of bandwidth that the CPU will never touch.
Hehe, fun that they gave access to such low-level stuff on the VDC; you can probably use that to do unexpected effects :D
1bpp display mode on the Genesis:
Both planes A and B need to be 64x64 size. Res needs to be H32 cell mode. You have 256 tiles loaded into vram and remain unchanged. Only the first top row of pixels of the tiles matters (the first 8 pixels of the tiles). You need to setup the scroll table for line resolution. The first 64x64 map (plane A) is divided into four 32x32 maps. The scroll table is setup so that the next scanline points to the next row in the tilemap with the horizontal position at 0 each line. This gives you 64 bitmap scanlines going down the screen. Next, you restart the same layout but you have X position set to $100. This will give you the next 64 scanlines. So you have plane A giving you the top 128 scanlines. Next, you setup plane B the exact same way. Plane B should start right where plane A ends (i.e. they don't overlap). This will give you 256 scanlines total, but you only need what you want to show (192,224,240,whatever).
Now, the preloaded set of tiles in vram correspond to the first 8bits of the tilemap. A normal byte that contains eight 1bit pixels gets written to the tilemap. Thus, 1bit pseudo-bitmap on the Genesis. Double buffer should be possible because you can change where the plane A and B data are read from in vram (plane A is coarser, but still fine for this). Unless you can use some mixed SMS/Genesis VDP mode (that I don't know about) to exploit only writing 1 byte to the VDP for increments of 2, you're gonna have to waste bandwidth and VDMA (or manually cpu copy) that upper unneeded byte of the tilemap to vram as well. So instead of only 6144bytes (for 256x192) to copy, you'll need double that.
Anyway, some downsides and some upsides. The upside is that you have planar mode.... kind of (I always think of 1bpp as single plane planar graphics :P). So you have some exploits/tricks that go along with this mode. The other thing is that it requires the minimal space on a cart to store (and no conversion to 4bpp). Downside is copying of the redundant byte of the tilemap. But at 6144 bytes per frame (assuming 30fps), it should be doable. Especially if you turn off the display early to gain extra bandwidth (but I don't see that as necessary. I mean, you can put the VDP in H40 just for vblank to get a faster VDMA and turn it back to H32 before the display starts, right? I thought Fonzie said he got that trick to work).
Any ideas of building on this trick for Bad Apple?
Oh I see the idea, really neat! I'm always very fond of these tricks :)
Why not use a 32x128 plane size instead, by the way?
So you could keep the HScroll unchanged (always 0) and only increase the vertical scroll by 8 at each H interrupt instead of doing +8 -1 +8 -1...
About the bandwidth, you are right, we have to double it. A 256x224 frame requires 14336 bytes.
For a 30 FPS video that gives 7168 bytes per displayed (60 Hz) frame... It is just about possible to transfer all that in VBlank if you use the H40 mode switch :p
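The two numbers above come from one tilemap word per 8 pixels per scanline, with each 30 FPS video frame spread over the two vertical blanks of a 60 Hz display. A quick check, assuming a 60 Hz NTSC display; the function name is hypothetical:

```python
DISPLAY_HZ = 60   # assumed NTSC field rate
VIDEO_FPS = 30    # frame rate of the source video


def tilemap_bytes(width_px: int, height_px: int) -> int:
    """Bytes to copy per video frame: one 16-bit tilemap entry per 8 pixels per line."""
    entries = (width_px // 8) * height_px
    return entries * 2


per_video_frame = tilemap_bytes(256, 224)
per_vblank = per_video_frame // (DISPLAY_HZ // VIDEO_FPS)
print(per_video_frame, per_vblank)
```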
What is nice with this method is that you could use a fast decoder to unpack the 1bpp bitmap (which should compress fairly well with RLE) during the active period :)
Honestly, I have already spent a lot of time on my Bad Apple version, which is not yet finished (but close to), so I won't start something new ;)
By the way, here is the latest version :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple8.bin
This time I am finally at full speed or very close to it :) I believe there are still 2 or 3 parts with a minor frame rate drop.
I went back to several compression schemes even for tiles (though simplified compared to the first ones), but I had to move all the tile unpacking code to assembly, as well as some of the tilemap unpacking, to get it running fast enough...
The tilemap also uses several compression schemes (plainAndPix, Deriving previous, RLE, RLE binary...) but that is less relevant, as the tiles are far ahead in terms of CPU usage :p
Note that I fixed the address error on real hardware present in version 7 (too much optimization :p), so it's now back to working on real hardware :)
Here are the numbers :
RAW tiles size : 59709440 bytes
Packed tiles size : 3252400 bytes (26019200 bits)
Tile distribution :
name        | number  | size in bytes
--------------------------------------------------
Plain tiles | 1605845 | packed in tilemap
RAW tiles   |    4758 | 157014
PlainAndPix |   43562 | 339980
Dico        |  122670 | 1845700
DeriveSame  |    7931 | 67145
DeriveOther |   77809 | 839216
Copy        |    3345 | 3345
tilemap packed size : 195945 bytes
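The figures in this table are internally consistent, which can be checked with a few lines. The sketch assumes the raw size counts every tile at 32 bytes (the size of a Genesis 4bpp tile in VRAM) and that plain tiles cost nothing in the tile data because they are encoded in the tilemap; both are readings of the table rather than statements from the post.

```python
TILE_BYTES = 32   # assumed: one 8x8 4bpp tile as stored in VRAM

# name: (number of tiles, packed size in bytes)
tile_stats = {
    "Plain tiles": (1605845, 0),      # "packed in tilemap", so no tile data
    "RAW tiles": (4758, 157014),
    "PlainAndPix": (43562, 339980),
    "Dico": (122670, 1845700),
    "DeriveSame": (7931, 67145),
    "DeriveOther": (77809, 839216),
    "Copy": (3345, 3345),
}

total_tiles = sum(count for count, _ in tile_stats.values())
raw_size = total_tiles * TILE_BYTES
packed_size = sum(size for _, size in tile_stats.values())

print("total tiles :", total_tiles)
print("raw size    :", raw_size)
print("packed size :", packed_size, "bytes =", packed_size * 8, "bits")
print(f"packed/raw  : {packed_size / raw_size:.1%}")
```

The totals match the raw and packed sizes given above, and the second packed figure is the same size expressed in bits. A RAW tile also works out to 33 bytes (157014 / 4758), which would be consistent with 32 data bytes plus a one-byte header, though the post does not say so.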
dr apocalipsis
11-04-2012, 12:27 PM
You nailed it. It's impressive to see it side by side with the youtube version and not lose sync.
A silly idea that could give it a more flashy look could be to add subs once you have added the PCM. Over the video and optional, of course.
Yeah almost done :D It has been a lot of effort but I am happy it's finally there.
Too bad I can't make it fit in 4MB; even separated into 2 parts it will be difficult with PCM data.
Subs may come later I guess... I still have to figure out how to make the PCM data fit in now :)
I made several tests for the PCM part and I finally chose to use a modified version of the SGDK 4 bit ADPCM driver. The modification just lowers the sample rate to 13 kHz so the PCM just fits in the ROM.
Still, I am very unsatisfied with that solution and I will probably develop a new driver to avoid the horrible distortion I get on real hardware (but not on emulator). It is due to the heavy DMA I am doing, which makes the Z80 stall for a long time when it wants to access the 68k bus. I can't avoid DMA, as I transfer up to 4640 bytes per vblank and I cannot do that with the CPU.
To avoid the distortion on real hardware I will need to develop a driver which buffers samples during the active period and avoids any 68k bus access during vblank. That means I will have to use the V interrupt to synchronize the Z80 processing... tricky for the PCM timings but possible :)
In the meantime, here is the "pre final" version; I had to split the ROM into 2 parts to have the complete video sequence :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple9_p1.bin
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple9_p2.bin
Don't try it on real hardware as it sounds really awful :p
Raijin
11-08-2012, 05:42 PM
This is quite nice. I don't think it sounds good on emulator either though, as it is very scratchy, and the voice is drowned out by everything else. Possibly overpeaked also? Not sure how hard it would be to optimize if you use PCM only.
What about using an FM/PSG version instead? Not sure of the original BPM of the song, but it shouldn't be hard to match.
tomaitheous
11-08-2012, 06:01 PM
The 68k can read/write z80 ram, right? Just not the z80 accessing 68k ram?
evildragon
11-08-2012, 06:40 PM
I thought the z80 accessed 68k ram when in master system compatibility mode
This is quite nice. I don't think it sounds good on emulator either though, as it is very scratchy, and the voice is drowned out by everything else. Possibly overpeaked also? Not sure how hard it would be to optimize if you use PCM only.
What about using an FM/PSG version instead? Not sure of the original BPM of the song, but it shouldn't be hard to match.
My base sample is not very good and has a saturation problem which makes the voices not very audible. If someone has a better version of the sample I would be very grateful :)
When I listen on YouTube the sample sounds OK, but as soon as I try to extract the MP3 from it I get a saturated sample, and unfortunately I cannot record from the stereo mix as my sound card is a bitch...
Also take into consideration that I have to make 2 minutes of sample fit in 700 KB, so the quality cannot be as good as the original since I need to compress it a lot...
I used 13 kHz 4 bit ADPCM compression to make it fit in 700 KB and honestly I think it does not sound "that bad" compared to the original. You can easily hear the lower sample rate, but the distortion is acceptable :)
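A quick budget check for those numbers, assuming exactly 13000 samples per second, 4 bits per sample, and 1 KB = 1024 bytes (the exact driver rate is not given in the post, so the results are approximate):

```python
def adpcm_bytes_per_second(rate_hz: int, bits_per_sample: int = 4) -> int:
    """Storage cost of one second of ADPCM audio."""
    return rate_hz * bits_per_sample // 8


def seconds_that_fit(budget_bytes: int, rate_hz: int) -> float:
    """How much audio fits in a given ROM budget at a given sample rate."""
    return budget_bytes / adpcm_bytes_per_second(rate_hz)


BUDGET = 700 * 1024
for rate in (13_000, 22_000):
    print(f"{rate} Hz: {adpcm_bytes_per_second(rate)} bytes/s, "
          f"{seconds_that_fit(BUDGET, rate):.0f} s in 700 KB")
```

At 13 kHz this gives a little under two minutes in 700 KB, in line with the post; a 22 kHz stream would need roughly 1.7 times as much space for the same duration.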
I plan to do a better quality version if I find the time. I will try to make an 8 MB version of the demo so I could use 22 kHz ADPCM instead, as the movie will have better compression.
FM/PSG version??? Well, I just can't do that, I don't have any knowledge of music composition :-/ and honestly I'm not sure what you can achieve with FM/PSG for that type of song; the voices bring a lot here :)
The 68k can read/write z80 ram, right? Just not the z80 accessing 68k ram?
Yeah the 68k can read/write the Z80 ram... by taking its bus and so stopping it :)
Without that possibility you wouldn't have any way of uploading the driver to Z80 RAM.
The Z80 cannot access 68k RAM, just the ROM. At least Sega said you cannot access it, but it turns out that you can write 68k RAM from the Z80 on real hardware (reads return garbage).
Raijin
11-08-2012, 11:32 PM
Here's a small YM2612/SN76489 demonstration I made just to show you what it could sound like.
https://dl.dropbox.com/u/59931286/Bad%20Apple%21%21.vgm
It's VGM format 1.60, so you'll need VGMPlay, in_vgm or an updated foo_gep to play it.
Edit: Actually not everyone will have or want those VGM players, so here is an mp3 also.
https://dl.dropbox.com/u/59931286/Bad%20Apple%21%21.mp3
Edit 2: Replaced first link with VGM 1.50.
Chilly Willy
11-09-2012, 12:14 AM
Here's a small YM2612/SN76489 demonstration I made just to show you what it could sound like.
https://dl.dropbox.com/u/59931286/Bad%20Apple%21%21.vgm
It's VGM format 1.60, so you'll need VGMPlay, in_vgm or an updated foo_gep to play it.
Edit: Actually not everyone will have or want those VGM players, so here is an mp3 also.
https://dl.dropbox.com/u/59931286/Bad%20Apple%21%21.mp3
Played the VGM on real hardware - whatever you used to generate the VGM has a common problem - the init at the start doesn't enable the DAC, so drums don't play.
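For reference, the missing init is a single register write, assuming the problem is the one described: on the YM2612, bit 7 of register $2B enables the DAC, and in a VGM stream a YM2612 port 0 write is the command byte 0x52 followed by register and value. The helper names below are hypothetical, and splicing the command into an existing file would also mean fixing the header offsets, which is not shown.

```python
VGM_YM2612_PORT0_WRITE = 0x52   # VGM command: write to YM2612 port 0
YM2612_REG_DAC_ENABLE = 0x2B    # bit 7 of this register turns the DAC on
DAC_ON = 0x80


def dac_enable_command() -> bytes:
    """The 3-byte VGM command that enables the YM2612 DAC."""
    return bytes([VGM_YM2612_PORT0_WRITE, YM2612_REG_DAC_ENABLE, DAC_ON])


def has_dac_enable(port0_writes) -> bool:
    """Check already-parsed (register, value) port 0 writes for a DAC enable."""
    return any(reg == YM2612_REG_DAC_ENABLE and value & DAC_ON
               for reg, value in port0_writes)


print(dac_enable_command().hex(" "))
print(has_dac_enable([(0x22, 0x00), (0x27, 0x00)]))
```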
Raijin
11-09-2012, 12:31 AM
Oh great... Well, I used VGM Music Maker. If this is a bug with the tracker, it will never be fixed...
I wonder if dumping it to VGM 1.50 would fix the problem. Doesn't hurt to try anyway. I'll update that link above and replace it with a 1.50 link instead.
I am unable to test this on Hardware since I don't have a flash cart (or a VGM player that plays on Hardware, if that exists?).
Also, not sure if this would be the problem, but I calculated the BPM of the original to 138~ BPM. In order to avoid having to track at 900BPM, I used a custom clock speed (55Hz) in the tracker to get as close to the right BPM as I needed, in this case, that would be 275 (276 is double 138, so that's as close as I could get). I honestly don't think that would be the problem though, because everything still plays at NTSC frequency.
Your VGM sounds really great :) I will be really glad to do an FM version of the demo if you plan to finish it ;)
I think we have a Z80 VGM player with PCM support (but PCM plays at a limited rate as far as I remember), though I am not sure about the technical BPM issue you are talking about. The demo is made to work only on NTSC systems (as the base video is 30 FPS) so it should not be a problem I guess.
dr apocalipsis
11-09-2012, 10:46 AM
What a funny presentation.
One thing I noticed is that you barely make any use of gray shades to reduce aliasing. Probably that would hurt performance, increase artifacts or even break some frames. But if it is possible to play with the contrast before applying the video conversion, I'm sure it would enhance the video IQ so much.
As I understood you are using YouTube as the source video, I tried to find a better version. Unfortunately, the better quality videos I found have embedded subtitles. So I just extracted the sound track:
http://www.4shared.com/file/8GQB0sod/BAD_APPLE.html
Source claims it's the original lossless audio recut to sync with video.
Raijin
11-10-2012, 01:08 AM
Hmm... Well, a quick question just to make sure. I was messing around a bit with it today and 275BPM was a bit too slow compared to the original. I have managed to get it to 276 in the tracker, but I had to bump up the Clock speed to 92Hz!
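The clock values mentioned fit a simple relation, under the assumption (not confirmed for VGM Music Maker) that the tracker advances a whole number of ticks per beat, so that BPM = 60 x clock / ticks per beat. The tick counts below are inferred from the numbers in the posts, not taken from the tracker.

```python
def bpm(clock_hz: float, ticks_per_beat: int) -> float:
    """Tempo produced by a tracker clock with a fixed number of ticks per beat."""
    return clock_hz * 60 / ticks_per_beat


print(bpm(55, 12))   # the 55 Hz custom clock
print(bpm(92, 20))   # the 92 Hz custom clock
print(bpm(60, 4))    # plain 60 Hz with a very short beat
print(bpm(60, 13))   # closest 60 Hz setting near the target
```

Under that assumption 55 Hz gives exactly 275, 92 Hz gives exactly 276, and a stock 60 Hz clock only reaches a multiple of 138 at 4 ticks per beat, which is the 900 BPM case mentioned earlier.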
Now my question is, how does the conversion process work exactly? Do VGM's play at the specified Clock speeds (in my case 92Hz), or is there some sort of converting going on in the process to bring it back to 60Hz, but then act as if I would have tracked this at a high BPM like 900 to get the tempo even? I somehow doubt that though, but I just want to be sure.
If that isn't the case, would playing back a 92Hz file on HW damage the Genesis, or maybe not play at all, and is there a workaround somewhere while implementing the track into a ROM to initialize the DAC as it's supposed to? The latter I am not sure about, but Chilly Willy said it is a common problem, and I haven't done anything like this before, so I am curious about it.
If all seems well, I would probably be interested in finishing the track, though I wouldn't be able to dedicate full time to it. Of course, it would be finished as soon as I can if I do decide to continue.
...and sorry about the slight derail here, but it's definitely not worth starting a new thread over.
VGMs are sample based, and all chip events are quantized to 44100Hz from what I know. There is no "Hz"
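To illustrate what that quantization means for a tracker running at an odd clock: each tracker tick simply becomes a wait of some number of 44100 Hz samples in the VGM stream. A sketch, using the VGM "wait n samples" command (0x61 followed by a 16-bit little-endian count); the function names are hypothetical and the rounding strategy is one reasonable choice, not necessarily what any given tracker does.

```python
VGM_SAMPLE_RATE = 44100
VGM_WAIT_N_SAMPLES = 0x61   # followed by a 16-bit little-endian sample count


def tick_waits(tick_hz: float, ticks: int) -> list[int]:
    """Samples to wait after each tick, rounded so the error never accumulates."""
    waits, emitted = [], 0
    for n in range(1, ticks + 1):
        target = round(n * VGM_SAMPLE_RATE / tick_hz)   # ideal position of tick n
        waits.append(target - emitted)
        emitted = target
    return waits


def wait_command(samples: int) -> bytes:
    """Encode one VGM wait command (valid for 0..65535 samples)."""
    if not 0 <= samples <= 0xFFFF:
        raise ValueError("wait does not fit in one command")
    return bytes([VGM_WAIT_N_SAMPLES, samples & 0xFF, samples >> 8])


print(tick_waits(60, 3))             # 60 Hz ticks land on whole samples
print(sorted(set(tick_waits(92, 92))))  # 92 Hz ticks alternate between two lengths
print(wait_command(735).hex(" "))
```

So a 92 Hz song plays back at the same tempo on any player that honors the waits; the tracker clock itself never reaches the hardware.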
Chilly Willy
11-10-2012, 12:38 PM
When I can contact the author, I usually help them fix the issue. For example, DefleMask used to have the same trouble when he first added VGM support, but I got in contact and he fixed the problem. VGM-MM might be a problem since Shiru has left the scene at the moment.
Raijin
11-10-2012, 06:25 PM
VGMs are sample based, and all chip events are quantized to 44100Hz from what I know. There is no "Hz"
Ok, thanks for that info.
When I can contact the author, I usually help them fix the issue. For example, DefleMask used to have the same trouble when he first added VGM support, but I got in contact and he fixed the problem. VGM-MM might be a problem since Shiru has left the scene at the moment.
Ah, I see. Man, I have no idea how to get in touch with Shiru anymore. He seems to have vanished because I don't see him around anywhere these days, though to be honest, I don't really frequent many forums. As far as fixing that issue goes, it seems like a lost cause. If only there were such a thing as miracles :P
tomaitheous
11-10-2012, 07:04 PM
Ok, thanks for that info.
Ah, I see. Man, I have no idea how to get in touch with Shiru anymore. He seems to have vanished because I don't see him around anywhere these days, though to be honest, I don't really frequent many forums. As far as fixing that issue goes, it seems like a lost cause. If only there were such a thing as miracles :P
He's active. I've seen him around recently on other forums. He's been doing a lot of NES dev stuff lately.
What a funny presentation.
One thing I noticed is that you barely make any use of gray shades to reduce aliasing. Probably that would hurt performance, increase artifacts or even break some frames. But if it is possible to play with the contrast before applying the video conversion, I'm sure it would enhance the video IQ so much.
As I understood you are using YouTube as the source video, I tried to find a better version. Unfortunately, the better quality videos I found have embedded subtitles. So I just extracted the sound track:
http://www.4shared.com/file/8GQB0sod/BAD_APPLE.html
Source claims it's the original lossless audio recut to sync with video.
Thanks for the sample, indeed the quality is really better than mine :)
I replaced the roms in the Dropbox links; again, it sounds awful on real hardware as i haven't fixed the DMA stuff yet...
Hmm... Well, a quick question just to make sure. I was messing around a bit with it today and 275BPM was a bit too slow compared to the original. I have managed to get it to 276 in the tracker, but I had to bump up the Clock speed to 92Hz!
Now my question is, how does the conversion process work exactly? Do VGMs play at the specified Clock speed (in my case 92Hz), or is there some sort of conversion going on in the process to bring it back to 60Hz, so that it acts as if I had tracked this at a high BPM like 900 to get the tempo even? I somehow doubt that though, but I just want to be sure.
If that isn't the case, would playing back a 92Hz file on HW damage the Genesis, or maybe not play at all, and is there a workaround somewhere while implementing the track into a ROM to initiate the DAC as it's supposed to? The latter I am not sure about, but Chilly Willy said it is a common problem, and I haven't done anything like this before, so I am curious about it.
If all seems well, I would probably be interested in finishing the track, though I wouldn't be able to dedicate full time to it. Of course, it would be finished as soon as I can if I do decide to continue.
...and sorry about the slight derail here, but it's definitely not worth starting a new thread over.
I don't have a clue about how VGM drivers handle internal speed and timings. For the DAC enable problem, as you said, i guess we can easily hack it in the rom or the driver ;)
About the music itself, do it only for our pleasure if you want to :) I would be happy to integrate it of course, but do it in your own time and when you want; i still have to work out the DMA issue on the Z80 PCM driver ;)
Chilly Willy
11-11-2012, 06:22 PM
I don't have a clue about how VGM drivers handle internal speed and timings. For the DAC enable problem, as you said, i guess we can easily hack it in the rom or the driver ;)
About the music itself, do it only for our pleasure if you want to :) I would be happy to integrate it of course, but do it in your own time and when you want; i still have to work out the DMA issue on the Z80 PCM driver ;)
On PCs, you ignore everything but the wait commands. Any reasonably modern PC is infinite in speed compared to the timing for the VGM. On a MegaDrive, you have to keep careful track of the cycles spent processing the data, writing registers, etc. For example, here's the wait one tick command on my MD VGM player that's part of the Myth menu:
wait_1:
        addi.l  #173-118,d5
0:
        subi.l  #26,d5          /* 16 for subi, 10 for bpl */
        bpl.b   0b
        bra     read_cmd
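For readers who don't speak 68k, here is a rough Python model of that kind of cycle-counted busy wait. It is only a sketch under assumptions: the 173 appears to be the number of 68000 cycles in one 44100 Hz VGM sample on an NTSC machine, and the 118 is presumably the overhead already spent handling the command; neither reading is confirmed in the post. The names `busy_wait` and `cycles_per_sample` are made up for illustration, and the model ignores that the final, untaken branch is slightly cheaper than a taken one.

```python
# Rough model of the cycle-counted busy wait. Only the 173, 118 and 26 come
# from the post; the clock figure and the reading of 118 are assumptions.
M68K_CLOCK_HZ = 7_670_454   # NTSC 68000 clock: 53.693175 MHz master / 7
VGM_SAMPLE_RATE = 44_100    # VGM timestamps are counted in 44100 Hz samples
LOOP_COST = 26              # subi.l (16 cycles) + taken bpl.b (10 cycles)


def cycles_per_sample():
    """68000 cycles available per VGM sample (just under 174 on NTSC)."""
    return M68K_CLOCK_HZ / VGM_SAMPLE_RATE


def busy_wait(balance, budget=173, overhead=118):
    """Mimic wait_1: credit the budget, then burn LOOP_COST per pass.

    Like the subi/bpl pair, the loop body always runs at least once and
    exits as soon as the balance goes negative. The leftover (negative)
    balance is returned so the next wait can absorb the overshoot.
    """
    balance += budget - overhead
    passes = 0
    while True:
        balance -= LOOP_COST
        passes += 1
        if balance < 0:
            return passes, balance
```

Starting from a zero balance, `busy_wait(0)` spins three times and overshoots by 23 cycles; feeding that leftover into the next call shortens the next wait, which is how a running counter like `d5` can keep the average timing on target even though each loop pass is a coarse 26 cycles.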
Raijin
11-11-2012, 09:35 PM
I talked with ValleyBell yesterday a bit about it, and he said pretty much that there should be no problems, so I will start to work on it more now.
On PCs, you ignore everything but the wait commands. Any reasonably modern PC is infinite in speed compared to the timing for the VGM. On a MegaDrive, you have to keep careful track of the cycles spent processing the data, writing registers, etc. For example, here's the wait one tick command on my MD VGM player that's part of the Myth menu:
wait_1:
        addi.l  #173-118,d5
0:
        subi.l  #26,d5          /* 16 for subi, 10 for bpl */
        bpl.b   0b
        bra     read_cmd
That is 68k code but i guess you can have similar code for Z80 :)
I don't remember where, but i saw that the PCM play command can be fairly high level in the VGM format. I mean you just give a command such as "play PCM #23" and it starts playing the specified PCM sample. Can it be used when you compose for the Mega Drive? Or do you have to deal with "send DAC data" and "wait" commands as in the GYM format? In the second case, it would mean VGM does not bring much more to the MD than GYM already does... and that would explain why some VGM files are that big even with repeated samples... At least it makes the driver simpler to implement.
I talked with ValleyBell yesterday a bit about it, and he said pretty much that there should be no problems, so I will start to work on it more now.
Cool :) VGM music will definitely make the demo more "gennish" :)
morcar
11-12-2012, 10:56 AM
holy shit guys that is impressive. I thought I was watching it from a much later system (Saturn or maybe a PS2)
That goes to show just how great the Megadrive/Genesis was and still is.
Chilly Willy
11-12-2012, 02:59 PM
That is 68k code but i guess you can have similar code for Z80 :)
I don't remember where, but i saw that the PCM play command can be fairly high level in the VGM format. I mean you just give a command such as "play PCM #23" and it starts playing the specified PCM sample. Can it be used when you compose for the Mega Drive? Or do you have to deal with "send DAC data" and "wait" commands as in the GYM format? In the second case, it would mean VGM does not bring much more to the MD than GYM already does... and that would explain why some VGM files are that big even with repeated samples... At least it makes the driver simpler to implement.
On real hardware, you might switch back and forth between channel 6 as FM and as PCM. You HAVE to change the channel 6 settings explicitly for which you want or you'll have trouble (playing FM when set for PCM or vice versa). PC VGM players tend to leave channel 6 set for FM and play the PCM straight instead of sending it to the YM2612 emulated chip as channel 6 PCM. Real hardware cannot do that.
VGM has a play pcm and wait command. I handle the wait similar to the code above after setting the PCM. There will be a difference in the constant due to the overhead of setting the PCM, but the wait itself will be the same code. That was just an example of cycle timing for busy looping on the MD for timing purposes.
On real hardware, you might switch back and forth between channel 6 as FM and as PCM. You HAVE to change the channel 6 settings explicitly for which you want or you'll have trouble (playing FM when set for PCM or vice versa).
I was speaking about that particular case; it's just a hack, not a suitable solution for a real VGM driver :)
But does some music really do that? i mean change the channel 6 configuration to play PCM, then set it back to play FM when there is no PCM?
PC VGM players tend to leave channel 6 set for FM and play the PCM straight instead of sending it to the YM2612 emulated chip as channel 6 PCM. Real hardware cannot do that.
Indeed ! That may be the cause of the problem in the VGM exporter.
VGM has a play pcm and wait command. I handle the wait similar to the code above after setting the PCM. There will be a difference in the constant due to the overhead of setting the PCM, but the wait itself will be the same code. That was just an example of cycle timing for busy looping on the MD for timing purposes.
Ok, i thought that in the specific MD case the play PCM command wasn't used and that "write DAC port" was used instead for each sample byte... which would be a real waste in terms of size.
Good to know the play PCM command can be used actually :)
holy shit guys that is impressive. I thought I was watching it from a much later system (Saturn or maybe a PS2)
That goes to show just how great the Megadrive/Genesis was and still is.
Hehe thanks :) Don't forget that Sega Genesis has blast processing :D
Chilly Willy
11-12-2012, 04:38 PM
I was speaking about that particular case; it's just a hack, not a suitable solution for a real VGM driver :)
But does some music really do that? i mean change the channel 6 configuration to play PCM, then set it back to play FM when there is no PCM?
I don't remember which, but I've run across a couple VGMs that switch between PCM and FM.
Ok, i thought that in the specific MD case the play PCM command wasn't used and that "write DAC port" was used instead for each sample byte... which would be a real waste in terms of size.
Good to know the play PCM command can be used actually :)
There is really only one PCM command - "write PCM from the bank and wait X samples", where X is 0 to 15. Any other usage of the DAC requires using the proper "write FM reg" command.
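To make the command layout concrete, here is a minimal Python sketch of a decoder for a handful of VGM commands, going by the opcode values I recall from the public VGM format description (0x8n for "write DAC from the data bank, then wait n samples", 0x52/0x53 for YM2612 port 0/1 writes, 0xE0 for seeking in the PCM bank, and the usual wait opcodes). It is not taken from any driver in this thread, the `decode` name is invented, and a real player has to handle many more opcodes plus the 0x67 data block that holds the samples.

```python
def decode(data, pos=0):
    """Yield (name, args) for a small, illustrative subset of VGM commands."""
    while pos < len(data):
        cmd = data[pos]
        if cmd == 0x66:                       # end of sound data
            yield ("end", ())
            return
        elif cmd in (0x52, 0x53):             # YM2612 write: port, register, value
            yield ("ym2612_write", (cmd - 0x52, data[pos + 1], data[pos + 2]))
            pos += 3
        elif cmd == 0x61:                     # wait n samples, 16-bit little endian
            yield ("wait", (data[pos + 1] | (data[pos + 2] << 8),))
            pos += 3
        elif cmd == 0x62:                     # wait one 60 Hz frame
            yield ("wait", (735,))
            pos += 1
        elif cmd == 0x63:                     # wait one 50 Hz frame
            yield ("wait", (882,))
            pos += 1
        elif 0x70 <= cmd <= 0x7F:             # short wait: n + 1 samples
            yield ("wait", ((cmd & 0x0F) + 1,))
            pos += 1
        elif 0x80 <= cmd <= 0x8F:             # DAC byte from bank, then wait n
            yield ("dac_from_bank_then_wait", (cmd & 0x0F,))
            pos += 1
        elif cmd == 0xE0:                     # seek to offset in the PCM bank
            yield ("pcm_seek", (int.from_bytes(data[pos + 1:pos + 5], "little"),))
            pos += 5
        else:
            raise ValueError(f"opcode 0x{cmd:02X} not handled in this sketch")


# Example stream: enable the DAC (register 0x2B, bit 7), seek to the start of
# the bank, output two samples, wait one NTSC frame, end.
stream = bytes([0x52, 0x2B, 0x80, 0xE0, 0, 0, 0, 0, 0x80, 0x8F, 0x62, 0x66])
events = list(decode(stream))
```

The first event in the example is the explicit channel 6 DAC enable, which is the register write a file has to contain if it is going to play its PCM on real hardware.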
evildragon
11-12-2012, 04:47 PM
In AudioOverload for the Mac (or Linux), you can toggle channel 6 of a VGM's emulated YM2612, and whether it is playing PCM or FM, that channel checkbox does control it.
evildragon
11-14-2012, 03:05 AM
Now call me crazy, but perhaps the ORIGINAL song that Bad Apple remixes, should be used.
http://www.youtube.com/watch?v=Fn5l02pFu0k
It was in FM after all. ;) But then again, it's not as exciting.
evildragon
11-14-2012, 03:10 AM
Also, for this demo, I had an idea, and don't know if it can be done. I'm not too familiar with Red Book, but the Sega CD CAN read subchannel data, as it can play CD+G.
Could this demo be reworked so it plays CD-Audio for the original demo's music, but uses Red Book's subchannel data to try and stream the video data? I know it won't be fast at all, but maybe it could be pre-buffered in the Sega CD's RAM, and as the audio CD and the video start to play, try and read the subchannel data to fill up the RAM with enough video data to finish?
Of course, this is hypothetical, assuming we even have a burner that can properly burn that sub-channel data.
Chilly Willy
11-14-2012, 02:21 PM
You only have 72 bytes per frame of subchannel data you can use (the other 24 bytes are already dedicated to other uses). 72 bytes every 1/75th of a second is next to nothing... which is why CD+G is mostly lines of text. It's meant for showing lyrics for the music, not videos.
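Putting rough numbers on that, as a back-of-the-envelope Python sketch: the 72 bytes and the 75 frames per second are from the post above, the roughly 6.5 MB of video data is the figure quoted elsewhere in this thread, and treating 1 MB as 2^20 bytes is my assumption.

```python
USABLE_SUBCODE_BYTES = 72            # per CD frame, after the 24 reserved bytes
CD_FRAMES_PER_SECOND = 75
VIDEO_BYTES = int(6.5 * 2**20)       # ~6.5 MB of video data (MB = 2^20 assumed)

subchannel_rate = USABLE_SUBCODE_BYTES * CD_FRAMES_PER_SECOND   # bytes per second
transfer_seconds = VIDEO_BYTES / subchannel_rate

print(f"{subchannel_rate} bytes/s, {transfer_seconds / 60:.0f} minutes to transfer")
```

That works out to 5400 bytes per second, or roughly 21 minutes to move the video data through the subchannel, which is far longer than the song plays.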
Better would be to have the track in data mode and use PCM mixed with the video data... like FMV. ;) :D
evildragon
11-15-2012, 06:23 PM
Then sound quality is limited. ;)
Joe Redifer
11-15-2012, 08:46 PM
The best way would be to have the video play off of cartridge and tell the Sega CD to play a track off of an inserted disc and pray that it keeps sync.
Chilly Willy
11-15-2012, 11:09 PM
The best way would be to have the video play off of cartridge and tell the Sega CD to play a track off of an inserted disc and pray that it keeps sync.
Yeah, those were my thoughts as well. Use a mode 1 cart to play the video while playing music from CD. As to sync, maybe add a second of silence to the start of the track, then have the CD side send a status to the MD side just as that extra second passes. You should be able to sync to within one CD frame (1/75th of a sec).
Raijin
11-21-2012, 08:12 PM
Done.
https://dl.dropbox.com/u/59931286/Bad%20Apple%21%21.vgm
VGM MM lacks the ability to "loop off", and I couldn't do it with VGM tools either, so I guess it has to be done manually or something.
evildragon
11-21-2012, 08:33 PM
I only got to hear a little bit of it, but I like this better than the other MD bad apple demo.
Done.
https://dl.dropbox.com/u/59931286/Bad%20Apple%21%21.vgm
VGM MM lacks the ability to "loop off", and I couldn't do it with VGM tools either, so I guess it has to be done manually or something.
Wow thanks Raijin it's awesome :)
Well now i need to get a good VGM Z80 driver with PCM support :D
Also, can i easily cut the VGM in 2 parts? i will probably make 2 versions of the ROM, one 8 MB and one split in 2x4 MB.
By the way, i managed to get the PCM to play correctly on real hardware: i made a modified version of my ADPCM driver specific to the demo which avoids any rom access during VBlank, so the quality on real hardware is now as good as it can be for 13 kHz 4-bit ADPCM (i don't have enough space for more).
Raijin
11-23-2012, 03:26 PM
I am pretty sure you can split the VGM if you need to. It might need a bit of manual patchwork first though, because VGM MM VGM files are broken, such as the PAL only timer, the DAC not initiating on Hardware, the "loop off" not existing and other things.
Though I have a question. Why would you need to split it into 2 parts? My FM version uses 3 16000Hz drum samples, and all else is FM/PSG, so wouldn't that be smaller than a full song running in the DAC?
Of course, but the complete version of the demo only just fits in 8 MB; the video part already eats 6.5 MB :-/
I will make an 8 MB version, but only some hacked emulators or the Mega Everdrive can run it, so i will also make a classic version split into 2x4 MB roms :)
Raijin
11-23-2012, 05:33 PM
Oh! Ok, now I understand.
I'm finally at a point where i am satisfied with it :)
It plays smoothly and the audio is now fixed on real hardware!
The only flaw is that i had to split the video into 2 x 4 MB roms as the complete rom takes a bit less than 8 MB... which is already not that big compared to the raw size :)
Download part 1 and part 2 :
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_p1.bin
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_p2.bin
The last part remaining was the sound driver: the demo uses DMA intensively, so PCM playback was very distorted on real hardware.
I fixed that by developing a specific Z80 ADPCM driver which accesses ROM only outside the vblank period, and now it works perfectly on real hardware :)
I would like to do an 8 MB version of the rom (without bank access) for the Mega Everdrive, but for some obscure reason i cannot get it to work right now... i don't know whether this is a bug in the hacked emulator which supports that mode or not.
I also plan to do a version with FM music (done by Raijin) instead of the poor quality PCM (due to restricted space), but i need a good VGM driver for that, one which can play PCM at a good rate and ideally access rom only outside vblank... and i don't know if that is really possible :-/
I will look into existing VGM Z80 drivers and play a bit with them :)
(Thanks to the moderator for having deleted the wrong topic)
Joe Redifer
11-24-2012, 05:23 PM
I didn't delete the wrong topic. I merged them. There was no reason for two topics on the same exact thing.
Yeah i know, i made a mistake! I saw you merged them, thanks, as we cannot delete a topic even if it contains only one message :)
Added an 8 MB version of the demo (mainly for Mega Everdrive users); you can see the details on the first page of the topic:
http://www.sega-16.com/forum/showthread.php?19027-Bad-Apple-demo-thread
Joe Redifer
11-25-2012, 07:17 PM
I just tried the 8MB version on my Mega Everdrive and the instructions screen came up but when I pressed Start I got a bunch of static sound and a black screen that said "illegal instruction!" I could enable the fps counter which was at 30 but no other buttons worked.
Thanks for the report :)
As i don't own a Mega Everdrive myself i cannot test it, but i fixed some stuff in the later 4MB version that i have not yet carried over to the 8MB version.
That might be the problem; i will fix that when i am back home.
It would be nice if you could re test then ;)
evildragon
11-26-2012, 11:17 AM
When the music is changed to the VGM version rather than the PCM version, could it be shrunk to 5MB?
Unfortunately no, the PCM only eats about 1.3 MB of ROM, so it can shrink to 6.7 MB but no less :-/
I tried to upload a new version of the 8 MB file, but i am afraid the bug is still there, as it looks like the version i posted previously was already up to date.
I should ask Krizz about large rom support in the Mega Everdrive; it looks like there are some (many?) limitations...
By the way, here is the source code of the demo:
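As a sanity check on those sizes, a small Python sketch. It assumes the sample rate is exactly 13000 Hz, that 1 MB means 2^20 bytes, and that the ADPCM stream has no header or padding; the function name is just for illustration.

```python
def adpcm_bytes(seconds, sample_rate=13_000, bits_per_sample=4):
    """Size in bytes of a raw ADPCM stream of the given length."""
    return seconds * sample_rate * bits_per_sample // 8


MB = 2**20
bytes_per_second = adpcm_bytes(1)                # 13 kHz at 4 bits per sample
pcm_seconds = 1.3 * MB / bytes_per_second        # playing time that fits in 1.3 MB
rom_without_pcm_mb = 8 - 1.3                     # upper bound once the PCM is dropped

print(bytes_per_second, round(pcm_seconds), round(rom_without_pcm_mb, 1))
```

At 6500 bytes per second, 1.3 MB holds roughly three and a half minutes of audio, which is consistent with a full-length song, and removing it from an 8 MB rom leaves at most about 6.7 MB before the FM music and its driver are added back.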
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple_src.7z
Joe Redifer
11-26-2012, 04:38 PM
It would be nice if you could re test then ;)
I'll definitely be happy to!
As i said in my previous post, i uploaded a new version of the rom to the same url:
https://dl.dropbox.com/u/93332624/dev/megadrive/demo/BadApple.bin
But i really doubt it will make any difference. Someone on the spritesmind forum also experiences problems, and even more of them on his system.
What is weird is that some people get it to work on their Mega Everdrive:
http://www.youtube.com/watch?v=hOftlDELhAo
Can it be a firmware version problem ?
Raijin
11-26-2012, 06:28 PM
Unfortunately, I don't know what the issue could be either, and I don't have a Mega Everdrive (or any flash carts actually) to be able to test in any way.
But, I listened to it in an emulator and it seems very slightly desynced at the start, and gradually desyncs more as it goes. I am not sure how it is affected on Hardware, but in that video it seems like it is not desynced, so I am guessing that Hardware is fine.
Joe Redifer
11-26-2012, 07:11 PM
Here's the issue:
It doesn't work on Genesis Model 1, at least not my VA2.
It DOES work great on a model 2! Not sure if anyone knows why this would be or how to fix the issue.
I gotta say, though, this 8MB version is awesome. Definitely the most impressive Bad Apple demo I've seen! Great job!