
Thread: Want to start programming for the Genesis.

  1. #16
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,793
    Rep Power
    50

    Default

    Quote Originally Posted by Sik View Post
    Code:
    void vsync(void) {
       volatile uint16_t* const vdpctrl = (uint16_t*) 0xC00004;
       
       while (*vdpctrl & 0x0008);
       while (!(*vdpctrl & 0x0008));
    }
    Tell me what's wrong with that code.
    It looks fine... as long as you don't exceed -O1. Only certain platforms retain the access order for volatiles above level 1, x86 being the only one I'm familiar with. I know for a fact that SuperH, MIPS, and M68K all allow volatile accesses to be reordered above level 1. A quick Google search on the issue turns up a half-decade-long argument over what SHOULD happen, but in the end you're cautioned not to assume volatiles won't be reordered. Linux hardware drivers have special macros you use to access I/O; on x86 they boil down to a plain volatile access, but on other platforms they may do more than that.


    And you're overestimating GCC. The mere fact that it tried to read from memory and test the value in the register instead of testing the bit off memory directly means the optimizer isn't doing a very good job. And the piece of assembly I posted wouldn't have worked even if it was standard memory instead of hardware.

    EDIT: also, -O3 is labeled as "may generate bloat", not as "may break things". If code works in -O1 but doesn't in -O3 and it's standards compliant, then it's a bug in the compiler. And obviously the volatile qualifier is working here because GCC isn't trying to cache the value in a register.
    I see you're on THAT side of the argument.

    I'm not going to argue about what it SHOULD do since that has been debated for over five years without resolution, I'm just telling you the way it is. On SH, MIPS, and M68K, you cannot use more than -O1 for hardware references, even with volatile. I'm not sure about ARM... haven't done enough programming on it yet.


    EDIT 2: also, I said "similar to" because I used asm68k syntax instead of GAS syntax. The generated code was exactly like that.
    I'm not sure I can believe that without seeing it for myself. While no compiler is bug-free, I think something THAT bad would have been noticed right off. Perhaps you should use my guide to make a nice new 4.5.2 gcc toolchain just to be certain.

    Sorry if I sound argumentative... this is just something I had to deal with a number of times about an issue that people really get worked up about. I agree that making volatiles synchronous is OBVIOUSLY best, but the devs who work on gcc (at least the non-x86 branches) don't agree. I doubt another five years will resolve the issue, so until then, use -O1 for the file in the makefile, or use the new function attribute to set the opt level to 1 to avoid the issue.
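
For reference, the per-function route mentioned above could look roughly like this. It is only a sketch: the `optimize` attribute is a GCC extension (other compilers may ignore it with a warning), GCC's own documentation is cautious about relying on it outside of debugging, and the names `vblank_flag_set`, `VDP_STATUS_VBLANK` and `VDP_CTRL` are made up for the example. The function takes the register address as a parameter so it can also be tried against an ordinary variable on a PC.

```c
#include <stdint.h>

/* Bit 3 of the VDP status word, the bit the vsync() loop polls (0x0008). */
#define VDP_STATUS_VBLANK 0x0008u

/* The VDP control port address used in the thread. */
#define VDP_CTRL ((const volatile uint16_t *) 0xC00004)

/*
 * GCC extension: request -O1 for this one function even when the rest of
 * the file is built at -O2 or -O3. This is an assumption about the build,
 * not something every toolchain honors.
 */
__attribute__((optimize("O1")))
int vblank_flag_set(const volatile uint16_t *status)
{
    /* One volatile read per call; returns 1 if bit 3 is set, else 0. */
    return (*status & VDP_STATUS_VBLANK) != 0;
}
```

On the console the loops would then read `while (vblank_flag_set(VDP_CTRL));` followed by `while (!vblank_flag_set(VDP_CTRL));`.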

  2. #17
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    Quote Originally Posted by Chilly Willy View Post
    It looks fine... as long as you don't exceed -O1.
    If it works in -O1 then it should work in -O3, period. Optimization shouldn't change the program's behavior at all, only its performance. If the program output changes, then the compiler is doing something wrong. For the record, this is the very same reason floating point operations can't be freely optimized (since that'd change the rounding errors).

    Quote Originally Posted by Chilly Willy View Post
    Only certain platforms retain the access order for volatiles above level 1, x86 being the only one I'm familiar with. I know for a fact that SuperH, MIPS, and M68K all allow volatile accesses to be reordered above level 1. A quick Google search on the issue turns up a half-decade-long argument over what SHOULD happen, but in the end you're cautioned not to assume volatiles won't be reordered. Linux hardware drivers have special macros you use to access I/O; on x86 they boil down to a plain volatile access, but on other platforms they may do more than that.
    If we're going to argue that volatile accesses can be reordered, then we shouldn't trust them even with -O1, since -O1 still does some optimizations that could potentially lead to a change like that. Only -O0 should be trusted then.

    For the record, the whole volatile reordering thing is really more an issue of modern processors, since they usually run instructions in a different order from the program code and then there's the cache hardware deciding to flush memory in completely random orders (it may even decide to not flush writes to memory for looooooong periods of time). The 68000 is not such a processor, though.

    Note that the above also means that keeping volatile accesses in order on x86 is completely useless, since the x86 family is probably the worst offender when it comes to internal reordering. The only reason for keeping such a thing is not breaking old PC programs that relied on such accesses.

    Quote Originally Posted by Chilly Willy View Post
    I'm not going to argue about what it SHOULD do since that has been debated for over five years without resolution, I'm just telling you the way it is. On SH, MIPS, and M68K, you cannot use more than -O1 for hardware references, even with volatile. I'm not sure about ARM... haven't done enough programming on it yet.
    This is an issue with the compiler, not with the processor. Remember, GCC doesn't have a dedicated compiler for each processor, it runs the same compiler and uses the RTL to generate assembly instructions. This means that GCC will attempt to apply the same kind of optimizations regardless of processors, and this includes reordering instructions.

    For the record, this probably means that newer versions of GCC suck completely with old processors, since they have been optimized to deal with modern hardware and the requirements are completely different. Bear in mind that the -march option only limits the opcodes GCC can use, it doesn't change the optimization strategy.

    Quote Originally Posted by Chilly Willy View Post
    I doubt another five years will resolve the issue, so until then, use -O1 for the file in the makefile, or use the new function attribute to set the opt level to 1 to avoid the issue.
    I'd say that five years will most likely make the situation worse, not better, since GCC will be even more optimized for modern hardware than for old hardware like the 68000.
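
On the compiler side of the reordering question, one common way to take the guesswork out is to wrap hardware accesses in small helpers with a compiler barrier, similar in spirit to the I/O macros mentioned for Linux drivers. The sketch below assumes GCC or Clang (the empty `asm` with a `"memory"` clobber is an extension), and `COMPILER_BARRIER`, `hw_read16` and `hw_write16` are hypothetical names, not taken from any real SDK. The barrier constrains only the compiler; it emits no instruction and does nothing about reordering inside a CPU, which is not a concern on the 68000 anyway.

```c
#include <stdint.h>

/* GCC/Clang extension: the compiler may not move memory accesses across
 * this point. No machine instruction is generated for it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical helper: read a 16-bit hardware register. */
static inline uint16_t hw_read16(const volatile uint16_t *reg)
{
    uint16_t value = *reg;      /* the volatile access itself */
    COMPILER_BARRIER();         /* later accesses stay after the read */
    return value;
}

/* Hypothetical helper: write a 16-bit hardware register. */
static inline void hw_write16(volatile uint16_t *reg, uint16_t value)
{
    COMPILER_BARRIER();         /* earlier accesses stay before the write */
    *reg = value;
    COMPILER_BARRIER();         /* later accesses stay after the write */
}
```

With helpers like these the first vsync loop becomes `while (hw_read16(vdpctrl) & 0x0008);`, and the ordering no longer depends on how a given GCC version treats plain volatile accesses.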

  3. #18
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,793
    Rep Power
    50

    Default

    I did some tests with the 4.5.2 m68k gcc on the code you posted... here's the exact code tested:

    Code:
    #include <stdint.h>
    
    void vsync(void) {
       volatile uint16_t* const vdpctrl = (uint16_t*) 0xC00004;
       
       while (*vdpctrl & 0x0008);
       while (!(*vdpctrl & 0x0008));
    }
    and here is the code generated for -O1, -O2, and -O3 respectively:

    Code:
    #NO_APP
    	.file	"test.c"
    	.text
    	.align	2
    	.globl	vsync
    	.type	vsync, @function
    vsync:
    	link.w %fp,#0
    .L2:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jne .L2
    .L4:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jeq .L4
    	unlk %fp
    	rts
    	.size	vsync, .-vsync
    	.ident	"GCC: (GNU) 4.5.2"
    Code:
    #NO_APP
    	.file	"test.c"
    	.text
    	.align	2
    	.globl	vsync
    	.type	vsync, @function
    vsync:
    	link.w %fp,#0
    .L2:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jne .L2
    .L4:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jeq .L4
    	unlk %fp
    	rts
    	.size	vsync, .-vsync
    	.ident	"GCC: (GNU) 4.5.2"
    Code:
    #NO_APP
    	.file	"test.c"
    	.text
    	.align	2
    	.globl	vsync
    	.type	vsync, @function
    vsync:
    	link.w %fp,#0
    .L2:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jne .L2
    .L4:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jeq .L4
    	unlk %fp
    	rts
    	.size	vsync, .-vsync
    	.ident	"GCC: (GNU) 4.5.2"
    In this case, the function is so simple that it can't be optimized any more by gcc. It also looks like good code... nothing funky going on. Well, ONE optimization I could see - not using the frame pointer... adding -fomit-frame-pointer to the compile options gives:

    Code:
    #NO_APP
    	.file	"test.c"
    	.text
    	.align	2
    	.globl	vsync
    	.type	vsync, @function
    vsync:
    .L2:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jne .L2
    .L4:
    	move.w 12582916,%d0
    	btst #3,%d0
    	jeq .L4
    	rts
    	.size	vsync, .-vsync
    	.ident	"GCC: (GNU) 4.5.2"

  4. #19
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    Then the GCC build that comes with the Everdrive devkit is buggy. In any case it goes to show the C code wasn't wrong =|

    EDIT: and there's one more optimization that could be done, performing btst directly on memory instead of reading the word into a register. Doing a btst from memory is just a read so the end result should be the same. And yes, it works on real hardware with the VDP.

    EDIT 2: and keeping the pointer in a register so it doesn't have to be fetched every time... Either way it's still valid code, I guess.

  5. #20
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,793
    Rep Power
    50

    Default

    Quote Originally Posted by Sik View Post
    Then the GCC build that comes with the Everdrive devkit is buggy. In any case it goes to show the C code wasn't wrong =|
    Yes, the code was fine.


    EDIT: and there's one more optimization that could be done, performing btst directly on memory instead of reading the word into a register. Doing a btst from memory is just a read so the end result should be the same. And yes, it works on real hardware with the VDP.

    EDIT 2: and keeping the pointer in a register so it doesn't have to be fetched every time... Either way it's still valid code, I guess.
    Which is why hand-tuned assembly is usually better than what even the best optimizing compiler generates. The pointer itself was a constant, so loading it once is clearly okay. You have two address registers free as scratch (a0 and a1). The address register indirect fetch (a0) is faster than absolute long ADDR.l, and takes four fewer bytes to encode, so you want to use that when you can. As to using the btst on the address directly, you just need to remember that it works on a byte only, so in the case of the above, you'd need to btst the address + 1. Because the volatile value was defined as a word, it fetched it before doing the test; I'd bet if it had been defined as a byte, it might have just done the btst on the address directly.
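
The "address + 1" point can be written out in C as below. This is a sketch only: `low_byte_bit_set` is a made-up helper, and whether GCC actually turns it into a btst on memory is the bet made above, not something verified here. The 68000 is big-endian, so the low byte of a 16-bit register (bits 0 to 7) sits at the odd address, one past the word's own address.

```c
#include <stdint.h>

/*
 * Hypothetical helper: test one of bits 0..7 of a big-endian 16-bit
 * register using a byte access to its low byte at word address + 1.
 * 'bit' must be in the range 0..7.
 */
static inline int low_byte_bit_set(const volatile void *word_addr, unsigned bit)
{
    const volatile uint8_t *low = (const volatile uint8_t *) word_addr + 1;
    return (int) ((*low >> bit) & 1u);
}
```

For the status port that would be `low_byte_bit_set((const volatile void *) 0xC00004, 3)`, which reads the byte at 0xC00005. As noted, turning a word access into a byte access is not safe on every piece of hardware.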

  6. #21
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    The address one could have been detected this way:
    • Check if a given constant pointer is used more than once
    • Check if there's a free address register that can be used
    • Store pointer in register if both of the above are met
    To be fair I'm surprised GCC did not figure that one out... Maybe the volatile is getting in the way? What happens if the const is removed? Moreover, what happens if it's explicitly marked as register? (one of the rare situations where you'd actually want to use the register keyword, I guess)

    Quote Originally Posted by Chilly Willy View Post
    As to using the btst on the address directly, you just need to remember that it works on a byte only, so in the case of the above, you'd need to btst the address + 1. Because the volatile value was defined as a word, it fetched it before doing the test; I'd bet if it had been defined as a byte, it might have just done the btst on the address directly.
    Oh, screw that, with some hardware that could have been an issue. Gah.

  7. #22
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,793
    Rep Power
    50

    Default

    Quote Originally Posted by Sik View Post
    The address one could have been detected this way:
    • Check if a given constant pointer is used more than once
    • Check if there's a free address register that can be used
    • Store pointer in register if both of the above are met
    To be fair I'm surprised GCC did not figure that one out... Maybe the volatile is getting in the way? What happens if the const is removed? Moreover, what happens if it's explicitly marked as register? (one of the rare situations where you'd actually want to use the register keyword, I guess)
    Taking out const and/or adding register did nothing to the code. Using -Os gives this:

    Code:
    #NO_APP
    	.file	"test.c"
    	.text
    	.align	2
    	.globl	vsync
    	.type	vsync, @function
    vsync:
    .L2:
    	move.w 12582916,%d0
    	move.w %d0,%ccr
    	jmi .L2
    .L4:
    	move.w 12582916,%d0
    	move.w %d0,%ccr
    	jpl .L4
    	rts
    	.size	vsync, .-vsync
    	.ident	"GCC: (GNU) 4.5.2"
    If you want UNOPTIMIZED code, try this... this is -O0 code:

    Code:
    #NO_APP
    	.file	"test.c"
    	.text
    	.align	2
    	.globl	vsync
    	.type	vsync, @function
    vsync:
    	subq.l #4,%sp
    	move.l #12582916,(%sp)
    	nop
    .L2:
    	move.l (%sp),%d0
    	move.l %d0,%a0
    	move.w (%a0),%d0
    	move.w %d0,%d0
    	and.l #65535,%d0
    	moveq #8,%d1
    	and.l %d1,%d0
    	tst.l %d0
    	jne .L2
    	nop
    .L3:
    	move.l (%sp),%d0
    	move.l %d0,%a0
    	move.w (%a0),%d0
    	move.w %d0,%d0
    	and.l #65535,%d0
    	moveq #8,%d1
    	and.l %d1,%d0
    	tst.l %d0
    	jeq .L3
    	addq.l #4,%sp
    	rts
    	.size	vsync, .-vsync
    	.ident	"GCC: (GNU) 4.5.2"

    Oh, screw that, with some hardware that could have been an issue. Gah.
    Which is why I mentioned it. Then there's the "fun" of the clr instruction... it does a read before writing 0. Why? No one knows, but that's why you usually see move.w #0,HW_ADDR instead of clr.w HW_ADDR.
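
For anyone puzzled by the -Os listing above: it tests bit 3 without a btst or an and. In the 68000 condition code register the flags are C (bit 0), V (bit 1), Z (bit 2), N (bit 3) and X (bit 4), so copying the status word into CCR drops bit 3 of the word straight into the N flag, and jmi/jpl then branch on it. A small C model of that logic, with made-up names (`CCR_N`, `jmi_taken`, `jpl_taken`) purely for illustration:

```c
#include <stdint.h>

/* Bit positions of the 68000 condition codes inside CCR. */
enum {
    CCR_C = 1u << 0,   /* carry    */
    CCR_V = 1u << 1,   /* overflow */
    CCR_Z = 1u << 2,   /* zero     */
    CCR_N = 1u << 3,   /* negative */
    CCR_X = 1u << 4    /* extend   */
};

/* N flag after "move.w value,%ccr": simply bit 3 of the moved word. */
static inline int n_flag_after_move_to_ccr(uint16_t value)
{
    return (value & CCR_N) != 0;
}

/* jmi branches when N is set, jpl when N is clear. */
static inline int jmi_taken(uint16_t value)
{
    return n_flag_after_move_to_ccr(value);
}

static inline int jpl_taken(uint16_t value)
{
    return !n_flag_after_move_to_ccr(value);
}
```

So `jmi .L2` keeps looping while bit 3 of the status word is set and `jpl .L4` while it is clear, which matches the two while loops in the C source. Note that it is bit 3 of the word that matters here, not the word's sign bit.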

  8. #23
    WCPO Agent Sik's Avatar
    Join Date
    Jan 2011
    Posts
    907
    Rep Power
    9

    Default

    I don't feel like making a one-liner, but...
    Quote Originally Posted by Chilly Willy View Post
    Code:
    	move.w %d0,%ccr
    o_O

    EDIT: shouldn't that be move byte instead of move word? CCR is the low byte of SR, and moving a word would mean overwriting the IRQ masks and such.
    Last edited by Sik; 05-27-2011 at 06:54 PM.

  9. #24
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,793
    Rep Power
    50

    Default

    Quote Originally Posted by Sik View Post
    shouldn't that be move byte instead of move word? CCR is the low byte of SR, and moving a word would mean overwriting the IRQ masks and such.
    Oddly enough, move to ccr is a WORD operation. Only the immediate logical ops on ccr are bytes, but immediate bytes are stored in the instruction as a word where the upper byte is ignored. When you move a word to the ccr, the upper byte is ignored. When you move FROM ccr (an instruction added with the 68010; the plain 68000 doesn't have it), it is also a word with the upper byte set to 0.

    Only move to sr sets the upper part of sr, and that instruction has been supervisor state only since the original 68000. Move from sr is the only way to see the upper part of sr; it was allowed in user mode on the 68000 and was made supervisor only in the 68010 to help with user/super protection.
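
To make the "upper byte is ignored" part concrete, here is a tiny C model of what a word move to CCR does to the full status register. It is a sketch with made-up names (`SR_CCR_MASK`, `sr_after_move_to_ccr`), assuming a plain 68000 where only the five flag bits X, N, Z, V and C are implemented in the low byte of SR.

```c
#include <stdint.h>

/* The five implemented condition code bits: X N Z V C. */
#define SR_CCR_MASK 0x001Fu

/*
 * Model of "move.w value,%ccr": the flag bits come from the low byte of
 * the operand, while the system byte of SR (trace bit, supervisor bit,
 * interrupt mask) is left exactly as it was.
 */
static inline uint16_t sr_after_move_to_ccr(uint16_t sr, uint16_t value)
{
    return (uint16_t) ((sr & 0xFF00u) | (value & SR_CCR_MASK));
}
```

For example, with SR at 0x2700 (supervisor mode, interrupt mask 7), moving 0xFFFF to CCR gives 0x271F: every flag is set and the interrupt mask is untouched.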

  10. #25
    Wildside Expert Stef's Avatar
    Join Date
    Aug 2011
    Location
    France
    Posts
    211
    Rep Power
    2

    Default

    Having tested GCC for m68k-elf a lot, I can say that indeed the compiler is not that good at optimization...
    Later versions are even worse, I guess because of optimizations aimed at modern CPU architectures that are totally wrong for older CPUs.
    That's why I am still distributing GCC 3.4.6 with SGDK... which is a pity as it is not able to inline functions correctly :-/
    I never ran into optimization problems with GCC 3.4.6 or GCC 4.1.X whatever the optimization level, but -O1 generally produces the best code for the m68k-elf target.
    If you had problems with your GCC it's probably because you used the (totally) broken SGCC compiler, or XGCC, which is not very reliable either.

    When it comes to ASM versus C, of course ASM can always be faster, but GCC with -O1 produces good enough code for me and we keep readability. I use ASM only when GCC is very inefficient and I need speed.

  11. #26
    Smith's Minister of War Raging in the Streets Kamahl's Avatar
    Join Date
    Jan 2011
    Location
    Portugal
    Age
    23
    Posts
    4,564
    Rep Power
    51

    Default

    You could also try this Stef:
    http://www.compilers.de/vbcc.html
    I'm using it for the Amiga and it's excellent.
    This thread needs more... ENGINEERS

  12. #27
    Wildside Expert Stef's Avatar
    Join Date
    Aug 2011
    Location
    France
    Posts
    211
    Rep Power
    2

    Default

    That looks interesting! I wonder how difficult it would be to integrate the compiler into SGDK. It looks like I have to compile it at least, and I'm not sure how much work the basic C library needs (normally I can avoid it as I reimplemented almost all the basic functions in SGDK).

  13. #28
    Smith's Minister of War Raging in the Streets Kamahl's Avatar
    Join Date
    Jan 2011
    Location
    Portugal
    Age
    23
    Posts
    4,564
    Rep Power
    51

    Default

    Quote Originally Posted by Stef View Post
    That looks interesting! I wonder how difficult it would be to integrate the compiler into SGDK. It looks like I have to compile it at least, and I'm not sure how much work the basic C library needs (normally I can avoid it as I reimplemented almost all the basic functions in SGDK).
    Can't help you with that I'm afraid. Shouldn't be too difficult, it's designed to support embedded systems.
    See: http://www.ibaug.de/vbcc/doc/vbcc_10.html#SEC109

  14. #29
    ESWAT Veteran Chilly Willy's Avatar
    Join Date
    Feb 2009
    Posts
    5,793
    Rep Power
    50

    Default

    Vbcc is nice, but you need to be aware that it cannot be used for commercial programs without specific permission from the author.

  15. #30
    Smith's Minister of War Raging in the Streets Kamahl's Avatar
    Join Date
    Jan 2011
    Location
    Portugal
    Age
    23
    Posts
    4,564
    Rep Power
    51

    Default

    Quote Originally Posted by Chilly Willy View Post
    Vbcc is nice, but you need to be aware that it cannot be used for commercial programs without specific permission from the author.
    As far as I understood, that only applies to using VBCC itself inside some commercial software, not to compiling commercial software with it. As long as SGDK remains open source it's all good, even if someone wants to make Pier Solar 2.
    The license really isn't clear on this though. That just makes it easier to break in case that's what he really means.
