This perl5 jitter is super-simple, and modeled after B::CC

The compiled perl5 optree is a linked list in memory in non-execution order,
wide-spread jumps. Additionally the calls are indirect. The jitter properly
aligns the run-time calls in linear linked-list "exec" order, so that the CPU
can prefetch the next instructions, and it inlines some simple ops.
op_next targets (returned by false conditions) are favored over op_other and
other targets.

IT DOES NOT WORK YET!
It does only work yet for non-threaded simple functions! No subs, no maybranch ops

Faster jitted execution path without runops loop, selected with -MJit or later,
when stable with perl -j.

All ops are unrolled in execution order for the CPU cache,
prefetching is the main advantage of this function. The perl5 runloop has
no chance to get cached at all.
For < 5.13 the ASYNC check is only done when necessary.

For now only implemented for x86 and amd64/x86_64 with certain
hardcoded my_perl offsets when threaded.

C pseudocode

x86 not-threaded, PL_op in eax, PL_sig_pending temp in ecx

prolog:
       55                      push   %ebp
       89 e5                   mov    %esp,%ebp
       53                      push   %rbx
call:
       e8 xx xx xx xx          call   $PL_op->op_ppaddr #relative
save_plop:
       a3 xx xx xx xx          mov    %eax,$PL_op

dispatch_getsig:
       8b 0d xx xx xx xx       mov    $PL_sig_pending,%ecx
dispatch:
       85 c9                   test   %ecx,%ecx
       74 06                   je     nextcall
       e8 xx xx xx xx          call   *Perl_despatch_signals #relative
epilog:
       b8 00 00 00 00          mov    $0x0,%eax        # clean PL_op
       5b                      pop    %rbx
       5d                      pop    %ebp
       c3                      ret

If op maybranch (Opcodes-0.04), create call of other op, check PL_op before with after
and branch to label of other op.

Problems
far calls to the pp ops break code prefetching, so we have to inline as much as
possible, similar to B::CC. Easy to jit are only nextstate, enter, and skip null.
The best jitter would be a B::CC to assembler backend, but this is hard to get right.

Porting
I created the asm with cc_main and cc_main_nt, see Makefile for objdump and cc_harness
rules for gcc assembly.

ASM links

http://www.lxhp.in-berlin.de/lhplinks.html
http://blogs.msdn.com/freik/archive/2005/03/17/398200.aspx
http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
http://asm.sourceforge.net//resources.html
http://www.intel.com/design/itanium/manuals/iiasdmanual.htm
http://www.heyrick.co.uk/assembler/qfinder.html

HL jitters

parrot
luajit
psyco / pypy
tracemonkey
ruby
clisp

JIT libs

lightning - c macros only
libjit - c lib
llvm - compiler framework + lib