BZ #121: VM startup error: Absurd stack bottom value

Status fields:

creation_ts:2009-02-14 17:34
component:vm
version:0.99.3
rep_platform:arm
op_sys:Linux
bug_status:RESOLVED
resolution:FIXED
reporter:philipstoehrer@gmx.de
Cacao VM doesn't start up; it exits with the following error message:

Absurd stack bottom value
Aborted

The error message appears here in the code:
http://cacao.sourcearchive.com/documentation/0.99.4~20081117/os__dep_8c-source.html
line 01080

My /proc/self/stat looks like this:
1493 (sh) S 1491 1493 1493 34816 1542 4194304 575 1565 0 6 9 23 33 71 21 1 1 0 11040
3518464 257 4294967295 32768 614548 63392720 63390544 1074456324 0 0 2637828 2
3221521656 0 0 17 0 0 0 0 0 0
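
For reference, the value at issue is field 28 (startstack) of this line, as documented in
proc(5). A minimal Python sketch of pulling it out of the output above follows. The helper
name start_stack_from_stat is made up for illustration and is not part of Cacao or Boehm
GC; the wrapped lines are assumed to be one line joined by single spaces.

```python
def start_stack_from_stat(stat_line: str) -> int:
    """Return field 28 (startstack) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces, so
    # split only the text after the last ')' instead of the whole line.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 28 sits at index 25.
    return int(rest[25])


# The /proc/self/stat output quoted in this report, joined into one line.
reported = (
    "1493 (sh) S 1491 1493 1493 34816 1542 4194304 575 1565 0 6 9 23 33 71 21 1 1 0 11040 "
    "3518464 257 4294967295 32768 614548 63392720 63390544 1074456324 0 0 2637828 2 "
    "3221521656 0 0 17 0 0 0 0 0 0"
)

start_stack = start_stack_from_stat(reported)
print(start_stack, hex(start_stack))
```

On this report's data the field is 63392720, i.e. 0x3c74bd0.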

Hardware: Openmoko Neo Freerunner
Distribution: SHR unstable (build from 2009-02-08)

Comment #1 by stefan@complang.tuwien.ac.at on 2009-02-14 18:07:10

Looks like a problem with Boehm GC.

Comment #2 by stefan@complang.tuwien.ac.at on 2009-02-14 18:07:55

Oh well, you already found that out...

Comment #3 by thebohemian@gmx.net on 2009-02-15 13:37:38

I already had cacao running on the Freerunner. However, that was with an older OpenMoko
firmware using kernel 2.6.24. I just installed their newer 2.6.28-based images and now I
am also seeing this problem.

Does anyone have a quick workaround for this, or something I could try out?

I am in a bit of a hurry, as this problem blocks the performance tests on the Freerunner
for my diploma thesis (the JIT cache implementation). So any hint to get it fixed (or
worked around) is appreciated.

Comment #4 by stefan@complang.tuwien.ac.at on 2009-02-15 14:02:22

You should have a look at the latest Boehm GC at
<http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc.tar.gz> and try to integrate
it.

Lest you waste effort, try to build and run the test programs that come with standalone
Boehm GC 7.1 first and check if they also complain about an absurd value.

Comment #5 by michi@complang.tuwien.ac.at on 2009-02-15 14:07:36

Have you already tried simply removing the sanity check from the BoehmGC? I have
compared the values with the ones from our ARM board, and the stack-base value seems to
be quite 'low'. I see no point in having this arbitrary boundary in the BoehmGC code,
although I am not an expert on the BoehmGC code.

Here is the /proc/self/stat of our ARM board:
27948 (cat) R 27941 27948 27941 34816 27948 0 100 0 90 0 1 3 0 0 14 0 0 0 569932656
1388544 89 4294967295 32768 48052 3221224480 3221224184 1074601072 0 0 0 0 0 0 0 17 0

The values for the stack-base are as follows:
BoehmGC boundary: 0x10000000
Our ARM board:    0xBFFFFC20
Your Openmoko:    0x03C74BD0
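
To make the comparison concrete, here is a small Python sketch of the check using the
three values listed above. It assumes the sanity check is a plain lower bound (abort when
the stack bottom is below the boundary); the names GC_MIN_STACK_BOTTOM and
passes_sanity_check are hypothetical and do not appear in the BoehmGC source.

```python
# Boundary as quoted above; the constant name is made up for this sketch.
GC_MIN_STACK_BOTTOM = 0x10000000

# Stack-base values taken from the two /proc/self/stat outputs in this bug.
stack_bases = {
    "Our ARM board": 0xBFFFFC20,
    "Your Openmoko": 0x03C74BD0,
}


def passes_sanity_check(stack_bottom: int, minimum: int = GC_MIN_STACK_BOTTOM) -> bool:
    """Assumed form of the check: accept only values at or above the boundary."""
    return stack_bottom >= minimum


for name, value in stack_bases.items():
    verdict = "ok" if passes_sanity_check(value) else "Absurd stack bottom value"
    print(f"{name}: {value:#010x} -> {verdict}")
```

Under that assumption the ARM board value passes and the Openmoko value is rejected,
which matches the behaviour reported here.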

So as mentioned above, my first suggestion for a quick'n'ugly workaround would be to
simply try to remove the sanity check.

This is the line we are talking about:
http://mips.complang.tuwien.ac.at/hg/cacao/file/f162d02fffb3/src/mm/boehm-gc/os_dep.c#l1080

Comment #6 by thebohemian@gmx.net on 2009-02-15 17:16:36

Disabling the check for the stack base yields a working cacao. However, I know neither
why Boehm performs this check nor why the stack base is so different on the Freerunner
with Linux 2.6.28.

Comment #7 by philipstoehrer@gmx.de on 2009-02-23 16:18:21

I can confirm that this problem is related to Kernel 2.6.28. It doesn't appear when
running a 2.6.24 Linux Kernel (on the same device).

I didn't manage to cross-compile cacao for ARM, so I can't try the "dirty workaround". I
just switched from Cacao to JamVM and this works fine for me.

Comment #8 by michi@complang.tuwien.ac.at on 2009-03-16 10:02:12

I have posted this issue on the Boehm GC mailing list; let's see what they have to say
about it. Maybe someone finds a fix upstream, or we can simply remove the assertion in
our fork of the Boehm GC source tree.

This is the post:
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2009-March/002616.html

Comment #9 by stefan@complang.tuwien.ac.at on 2009-03-20 13:50:40

He patched it:
http://bdwgc.cvs.sourceforge.net/viewvc/bdwgc/bdwgc/os_dep.c?r1=1.35&r2=1.36

Comment #10 by michi@complang.tuwien.ac.at on 2009-03-21 17:10:45

As Stefan pointed out in his last comment, the issue was fixed upstream (thanks go to
Hans Boehm for his quick response). Since I don't want to have some CVS version of
BoehmGC hanging around in the Cacao source tree, I just cherry-picked the one changeset.

This is the changeset:
http://mips.complang.tuwien.ac.at/hg/cacao/rev/68bdfa8ea857

I am closing this bug now, but it would still be a good thing if someone with access to
an Openmoko could verify the fix.

Comment #11 by thebohemian@gmx.net on 2009-03-24 21:26:04

> I am closing this bug now, but it would still be a good thing if someone with
> access to an Openmoko could verify the fix.
It works. I had already used the workaround that was discussed here before it was
applied to the repository.