BZ #78: Fast subtype checking

Status fields:

creation_ts:2008-06-09 21:56
version:default branch
I've started an implementation of fast subtype checking (like Hotspot):

It's x86_64 only at the moment. I won't have time to work on it for at least 2 months,
so I'm making it available here in case somebody wants to pick it up.


- Because of the messy CISC code, I couldn't always find useful instruction mnemonics in
emit.*/codegen.h, so there's the actual machine code in codegen for two instructions.

- The code is x86_64 only. Needs to be ported to all architectures.

- Overflow is allocated via malloc and never freed. Not nice.

- The x86_64 code uses the red zone (below the stack pointer) for the array scanning
code. twisti doesn't like that...

- The fields in _vftbl should be reordered. I think I'm wasting some space to alignment.
They are also not nicely formatted.


We do secondary types (interfaces) the way we've always done. I've also left in the
check for ACC_INTERFACE, so the new display-based subtype code only needs to handle
primary types. So for us, the algorithm looks like this.

if S.display[T.depth] == T: (where T.depth is a constant and encoded
                             in the instruction itself)
  return True
if T.offset != &.display[DISPLAY_SIZE]:
  return False
return array_contains(T, *S.overflow (of length S.overflow_length))

The overflow part is allocated on the heap, and in our case we would not even have to
scan it because it contains just the entries from the display. And those are in the
correct order. So a simple lookup would be sufficient here. However, this test is rare,
so I don't mind wasting a few cycles here in the name of future extendability.


- The actual code for ICMD_CHECKCAST and ICMD_INSTANCEOF is almost identical.

- It's basically an inline implementation of fast_subtype_check in builtin.c.

- Almost all of the code can be left out when super is known and its depth is not too

- I used the red zone because of severe register shortage. Maybe someone will find a
more elegant solution.

Comment #1 by on 2008-06-09 21:58:43

Created an attachment (id=42)
subtype branch export

I cannot commit...

Would like to push this to the official repo.

Comment #2 by on 2008-06-11 23:51:34

Pushed the changeset into the official repo (subtype branch).

Comment #3 by on 2008-08-15 18:51:06

Twisti asked me during my Diplomarbeitsvortrag how much time was spent on tree numbering
and I was not aware how much it really was. I've just measured it (thanks --enable-rt-
timing) and am floored...

It's a x86_64 Fedora 9 system on Pentium D 3.2 GHz, compiled with -O2. Here's the result
for starting up eclipse (7000 class loads):

      969631 usec  23% link: resolve superclass/superinterfaces
       33555 usec   1% link: compute vftbl length
        4184 usec   0% link: handle abstract methods
        3020 usec   0% link: compute interface table
       37099 usec   1% link: fill vftbl
        2250 usec   0% link: set offsets
       59099 usec   1% link: fill interface table
         622 usec   0% link: set finalizer
           0 usec   0% link: resolve exception classes
     3079609 usec  73% link: re-calculate subclass indices
     4243228 usec      total link time

3 seconds just for tree numbering! Wow!

Comment #4 by on 2008-09-15 09:39:05

I've modified the code once again. It is simpler now and can do without additional stack
space. I removed the array scanning and replaced it with a simple index lookup. There is
also an Alpha version now.

Most of the runtime checks could be removed if the patcher generated the code after
loading the class, as we established in the meeting 2 weeks ago. But if we do this, then
we might as well remove the runtime check for class/interface which has been there for
ages, and this was just too tedious for me at the moment.

Comment #5 by on 2008-12-23 12:24:31

Merged into trunk (hg rev 25a769e0af7f).

Attachment id=42

date:2008-06-09 21:58
desc:subtype branch export