It isn't much? Echo's sample rate is 10250 Hz, so 1 KB holds about 1/10th of a second (roughly 6 NTSC frames). Are you sure that isn't much?
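To put a number on that, here is a small C sketch of the arithmetic (the function name is made up for illustration, and it assumes one byte per sample at the 10250 Hz rate quoted above):

```c
#include <assert.h>
#include <stdint.h>

/* How long a buffer of `bytes` 8-bit samples lasts at `rate_hz`,
   in microseconds (rounded down). */
static uint32_t buffer_micros(uint32_t bytes, uint32_t rate_hz) {
    assert(rate_hz > 0);
    return (uint32_t)(((uint64_t)bytes * 1000000u) / rate_hz);
}
```

buffer_micros(1024, 10250) comes out at 99902 microseconds, just under a tenth of a second, which is about 6 frames at roughly 60 frames per second. A 256-byte buffer gives 24975 microseconds, about a frame and a half.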
1KB isn't a good size anyways. If you want to keep performance, you probably shouldn't go over 256 bytes, so you don't need to update the high byte of the address (which will eat up some extra cycles).
EDIT: remember that buffering is done to speed up the code, not to avoid ROM access at all costs. Eventually you reach a point where you've buffered so much that the extra speed gain isn't worth the effort.
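On the 256-byte point, a rough C model of why that size is convenient: if the position fits in a single byte, wrapping around costs nothing, much like incrementing only the low byte of the address on a buffer that sits inside one 256-byte page. The struct and function names here are hypothetical, just for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-byte ring buffer with a one-byte write position. */
typedef struct {
    uint8_t data[256];
    uint8_t pos;
} Ring256;

static void ring256_push(Ring256 *r, uint8_t sample) {
    assert(r != NULL);
    r->data[r->pos] = sample;
    r->pos++;   /* a uint8_t wraps from 255 back to 0 by itself */
}
```

With a bigger buffer the position needs 16 bits, so every step has to deal with the high byte as well, and that is where the extra cycles go.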
Of course, I myself use a 256-byte buffer for mixing and it is more than enough, but if you want variable-pitch playback then 1 KB isn't that much (I was talking about the pre-buffer, not just the mixing buffer): if the base sample frequency is 440 Hz and you want to play a 3520 Hz note (3 octaves up), you read the source 8 times faster, so you eat through your 1 KB buffer faster than a fixed-rate 256-byte one.
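A quick way to see this is to count how many output samples a pre-buffer lasts for a given pitch step. The sketch below is only a model: the 8.8 fixed-point step format and the function name are assumptions for illustration, not how any particular driver does it.

```c
#include <assert.h>
#include <stdint.h>

/* Number of output samples produced before a pre-buffer of `bytes`
   source samples runs out, stepping by an 8.8 fixed-point `step`
   (0x0100 = original pitch, 0x0800 = 3 octaves up). */
static uint32_t outputs_until_empty(uint32_t bytes, uint16_t step) {
    assert(step != 0);
    uint32_t pos = 0;            /* 8.8 fixed-point read position */
    uint32_t end = bytes << 8;   /* end of buffer in the same format */
    uint32_t count = 0;
    while (pos < end) {          /* one output sample per iteration */
        pos += step;
        count++;
    }
    return count;
}
```

outputs_until_empty(1024, 0x0800) is 128, while outputs_until_empty(256, 0x0100) is 256: at 3 octaves up the 1 KB pre-buffer runs dry in half the time a 256-byte buffer takes at the original pitch.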
As soon as you start mixing you're screwed anyways. Either you mix two 8-bit samples and get a 9-bit output (and doing that as-is will require you to use slow 16-bit operations), or you scale the values down beforehand so you get two 7-bit samples adding up to 8 bits (which is faster to process).
It really depends anyways. A smaller sample range will make you lose some control over volume, but in practice the shape of the waveform is much more important, so you could do e.g. 6-bit and still get something that sounds decent. In practice the sample rate matters more.
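The two options can be compared with a small C model. It assumes signed samples, and the function names are made up for illustration. It also divides by 2 instead of shifting, which rounds negative values toward zero, so the low bit may differ from what an arithmetic shift would give.

```c
#include <assert.h>
#include <stdint.h>

/* Full precision: the sum of two signed 8-bit samples needs 9 bits,
   so it has to live in a wider type. */
static int16_t mix_wide(int8_t a, int8_t b) {
    return (int16_t)(a + b);
}

/* Pre-scaled: halve each sample to 7 bits first, so the sum always
   fits back into 8 bits. */
static int8_t mix_halved(int8_t a, int8_t b) {
    int sum = a / 2 + b / 2;
    assert(sum >= -128 && sum <= 127);   /* never overflows 8 bits */
    return (int8_t)sum;
}
```

For two full-scale samples, mix_wide(127, 127) is 254, which no longer fits in a byte, while mix_halved(127, 127) is 126: it fits, at the cost of one bit of resolution per channel.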
If you use 8-bit samples you actually lose quality *only* when overflow occurs, and even with 4 channels this doesn't happen often.
And you can still use 8-bit instructions: you just need to test the overflow flag directly and use a simple trick to clamp the sample:
Code:
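LD A, (BC) ; source sample (signed)
ADD A, (HL) ; mix with the sample already in the buffer
JP PO, .ok ; P/V clear = no signed overflow, keep the sum
LD A, $7F ; clamp (LD does not change the carry from ADD)
ADC A, $00 ; A = $7F + carry: $7F on positive overflow, $80 on negative
.ok
LD (HL), A ; write sample in buffer
INC L ; next buffer byte (low address byte only)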
I do need to use signed samples here, which I convert to unsigned right before sample output.
Of course this slows down the mixing quite a bit (you can improve speed by caching constant values in registers), but I don't mind; I really don't like using 6-bit samples, which sound really quiet.
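The clamping trick in the routine above can be sanity-checked with a C model of the flags. This is a sketch, not the actual driver code: the function names are hypothetical, and flipping the top bit is only one common way to convert signed to unsigned, since the post doesn't say how the conversion is done.

```c
#include <assert.h>
#include <stdint.h>

/* Models ADD A,(HL) / JP PO / LD A,$7F / ADC A,0 on signed 8-bit
   samples stored as raw bytes. */
static uint8_t mix_saturate(uint8_t acc, uint8_t src) {
    unsigned sum = (unsigned)acc + src;
    uint8_t result = (uint8_t)sum;
    unsigned carry = sum >> 8;            /* carry flag after ADD */
    assert(carry <= 1u);
    /* Signed overflow (P/V set): both operands share a sign that the
       result does not have. */
    int overflow = (~(acc ^ src) & (acc ^ result) & 0x80) != 0;
    if (overflow)
        result = (uint8_t)(0x7F + carry); /* $7F or $80 */
    return result;
}

/* Assumed conversion before output: flip the top bit. */
static uint8_t to_unsigned(uint8_t s) {
    return (uint8_t)(s ^ 0x80);
}
```

Two positive samples that overflow never produce a carry, so they clamp to $7F. Two negative ones always do, so they clamp to $80. For example, mix_saturate(0x64, 0x64) is 0x7F and mix_saturate(0x9C, 0x9C) is 0x80.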