sljit

What is sljit?

The sljit compiler is a stack-less, platform-independent JIT compiler, though platform-independent assembler may be an even better name. The key design principle of sljit is that it does not try to be smarter than the developer. It follows this principle by giving the developer control over the generated machine code, much like an assembly language does. Unlike assembly languages, however, the sljit LIR (low-level intermediate representation) is CPU independent, which greatly improves portability.

The engine strikes a good balance between performance and maintainability. The LIR code can be compiled to many CPU architectures, and the performance of the generated code is very close to code written in assembly languages. Although sljit does not support higher level features such as automatic register allocation, it can be a code generator backend for other JIT compiler libraries. Developing these intermediate libraries takes far less time, because they only need to support a single backend.

Defining a LIR that provides a wide range of optimization opportunities and can still be translated efficiently to machine code on all CPUs is the biggest challenge of this project. Instruction forms and features that are supported on many (but not necessarily all) architectures are carefully selected, and the LIR is built from them. The remaining architectures emulate these features with low overhead. For example, sljit supports various memory addressing modes and setting status register bits.

This approach is very effective for byte-code interpreters, since their machine independent byte code (middle level representation) typically contains instructions which either can be easily translated to machine code, or are not worth translating at all.
    Interpreter byte-code instruction examples
      pop - pop from stack
        Very easy to implement at the sljit level, since it just decreases the stack pointer by 1.
      add - add
        Fast case for integer addition, and slow case for anything else.
      resolve - resolve an identifier
        Not suitable for the JIT level; just call a native C++ helper.
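
The split between the three cases above can be sketched in plain C. This is only an illustration of the idea, not sljit code: the opcode names, the value layout, the slow_add and resolve_identifier helpers and the lookup of "answer" are all hypothetical. The comments mark which parts a JIT would typically emit inline and which parts it would leave to a native helper call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-code set mirroring the examples above. */
enum opcode { OP_PUSH, OP_POP, OP_ADD, OP_RESOLVE, OP_HALT };
enum tag { TAG_INT, TAG_OTHER };

struct value { enum tag tag; long i; /* i is valid when tag == TAG_INT */ };
struct insn { enum opcode op; long imm; const char *name; };

/* Slow case of add: a JIT would call this as a native helper. */
static struct value slow_add(struct value a, struct value b)
{
    struct value r = { TAG_OTHER, 0 };
    (void)a;
    (void)b;
    return r;
}

/* resolve: too complex to inline, always a helper call (hypothetical table). */
static struct value resolve_identifier(const char *name)
{
    struct value r = { TAG_OTHER, 0 };
    if (strcmp(name, "answer") == 0) {
        r.tag = TAG_INT;
        r.i = 42;
    }
    return r;
}

struct value run(const struct insn *code)
{
    struct value stack[16];
    size_t sp = 0;

    for (;; code++) {
        switch (code->op) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp].tag = TAG_INT;
            stack[sp].i = code->imm;
            sp++;
            break;
        case OP_POP:      /* inline: just decrease the stack pointer by 1 */
            assert(sp > 0);
            sp--;
            break;
        case OP_ADD:      /* inline fast case for integers, helper otherwise */
            assert(sp >= 2);
            sp--;
            if (stack[sp - 1].tag == TAG_INT && stack[sp].tag == TAG_INT)
                stack[sp - 1].i += stack[sp].i;
            else
                stack[sp - 1] = slow_add(stack[sp - 1], stack[sp]);
            break;
        case OP_RESOLVE:  /* never inlined: call the native helper */
            assert(sp < 16);
            stack[sp++] = resolve_identifier(code->name);
            break;
        case OP_HALT:
            assert(sp > 0);
            return stack[sp - 1];
        }
    }
}

/* Usage: computes 3 + resolve("answer"). */
long demo_add(void)
{
    const struct insn prog[] = {
        { OP_PUSH, 3, NULL }, { OP_RESOLVE, 0, "answer" },
        { OP_ADD, 0, NULL }, { OP_HALT, 0, NULL },
    };
    return run(prog).i;
}

/* Usage: pushes 1 and 2, pops once, and returns the remaining top value. */
long demo_pop(void)
{
    const struct insn prog[] = {
        { OP_PUSH, 1, NULL }, { OP_PUSH, 2, NULL },
        { OP_POP, 0, NULL }, { OP_HALT, 0, NULL },
    };
    return run(prog).i;
}
```

In a JIT built on sljit, the bodies of the pop case and the integer branch of add would become a few LIR instructions each, while slow_add and resolve_identifier would stay ordinary compiled functions that the generated code calls.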

Download

https://github.com/zherczeg/sljit/

Supported architectures

    Intel-x86 32
    AMD-x86 64
    ARM 32 (ARM-v5, ARM-v7 and Thumb2 instruction sets)
    ARM 64
    PowerPC 32
    PowerPC 64
    MIPS 32 (III, R1)
    MIPS 64 (III, R1)
    RISC-V 32
    RISC-V 64
    s390x (64)

Pattern matching: JIT-ing regular expressions

    PCRE2-sljit combines the well-known PCRE2 regular expression library with the performance boost provided by sljit, creating a lightning fast, Perl compatible backtracking engine. More about this project can be found in pcre2_jit.html and pcre.html. A comparison of PCRE-sljit with other engines is presented in regex_perf.html. The new scan substring feature is described in scan_substring.html.

What a JIT ...

    ... can do
      decrease the number of executed instructions, which speeds up execution. You can embed constants and constant pointers into the JIT code, so they do not need to be loaded before use (this eliminates several loads). The trade-off is the extra memory consumed by the JIT-ed code. On embedded systems, a large amount of JIT-ed code might decrease the efficiency of the instruction cache.
    ... can't do
      miracles. JIT is a good thing if you know what you are doing.

My practical experiences

    - JIT is a kind of code inlining (a static compiler optimization), and it has basically the same disadvantages.
    - Focus on the most frequently executed parts of your program. Profiling can help. Never compile generic (especially complex) algorithms with JIT code generators. Their C/C++ counterparts usually perform better.

SL-JIT Advantages

    The execution can be continued from any LIR instruction. In other words, jumping into and out of the code is safe.
    Target of (conditional) jump and call instructions can be dynamically modified during the execution of the code.
    Constants can be modified during the execution of the code.
    Fast, non-ABI compliant function call (when JIT code calls other JIT code). Requires only a few machine instructions, and all registers keep their values.
    Move-with-update instructions: the base register is updated before the actual load or store.
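
The semantics of a move with update, the last item above, can be modelled in plain C. The sketch below is not sljit API: load_with_update, sum_with_update and demo_sum are hypothetical names, and a pointer variable stands in for the base register. It only shows the order of operations, update first and load second.

```c
#include <assert.h>
#include <stddef.h>

/* Models a load with update: the base "register" is advanced first,
   then the value is loaded from the updated address. */
long load_with_update(const long **base, ptrdiff_t offset)
{
    *base += offset;   /* update the base register */
    return **base;     /* then load from the new address */
}

/* Walks an array with one combined update-and-load step per element. */
long sum_with_update(const long *data, size_t count)
{
    const long *p = data;
    long sum;
    size_t i;

    assert(count > 0);
    sum = *p;          /* first element: plain load, no update */
    for (i = 1; i < count; i++)
        sum += load_with_update(&p, 1);
    return sum;
}

/* Usage: sums the four elements 1, 2, 3 and 4. */
long demo_sum(void)
{
    const long data[] = { 1, 2, 3, 4 };
    return sum_with_update(data, sizeof data / sizeof data[0]);
}
```

On CPUs that have such addressing modes, the pointer increment and the load inside the loop map to a single machine instruction, which is why exposing this form in the LIR is useful.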

SL-JIT Disadvantages

    Limited (3 machine words) number of arguments for ABI compatible function calls.

More about the project

The source package contains a readme, which describes how to add sljit to an existing project. The details of the sljit LIR (low-level intermediate representation) are found in sljitLir.h, which is the only file you need to know to use sljit.

Help needed

I have limited access to various software tools and hardware, which makes testing difficult. You could help me by trying sljit with various compilers (e.g. ARM RVCT) and various CPUs, especially MIPS, PPC, and SPARC.

Contribution

Please open issues or submit pull requests to https://github.com/zherczeg/sljit/

Last modification: 05.10.2024