Restoring Type Information

Although C0 is a strongly typed language, most of its type information is lost when a program is compiled to bytecode.

Luckily, the compiler keeps some of this information in the comment area of the generated bytecode file.

Inferring Value Type from BIPUSH

For instance, when a value is pushed onto the operand stack with the BIPUSH instruction, the bytecode file contains lines like these:

10 03    # bipush 3           # 3
10 6D    # bipush 109         # 'm'
10 00    # bipush 0           # false

The second column states the BIPUSH instruction and the value to be pushed, while the third column reveals the type of that value.

We can match the third column against the regular expressions below to determine the type:

const int_comment_regex = /(\d+)|(dummy return value)/;
const bool_comment_regex = /(true)|(false)/;
const char_comment_regex = /'.*'/;

The "(dummy return value)" alternative is there because, when a function returns void, the CC0 compiler always pushes a 0 onto the operand stack and returns that dummy value to the caller.
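The three patterns above could be combined into a single classifier, roughly as sketched below. The function name inferBipushType and the order of the checks are assumptions made for illustration, not taken from the original implementation. The char pattern is tested first because a comment such as '3' also contains a digit and would otherwise match the int pattern.

```typescript
// Regular expressions copied from the section above.
const int_comment_regex = /(\d+)|(dummy return value)/;
const bool_comment_regex = /(true)|(false)/;
const char_comment_regex = /'.*'/;

type BipushType = "int" | "bool" | "char";

// Hypothetical helper: classify the third (comment) column of a bipush line.
// Returns undefined when none of the patterns match.
function inferBipushType(comment: string): BipushType | undefined {
    const text = comment.trim();
    // Check char before int: the comment '3' contains a digit as well.
    if (char_comment_regex.test(text)) return "char";
    if (bool_comment_regex.test(text)) return "bool";
    if (int_comment_regex.test(text)) return "int";
    return undefined;
}
```

For the three sample lines above, this sketch yields "int" for 3, "char" for 'm', and "bool" for false.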

Inferring Pointer Type from NEW

When a heap memory block is allocated with the new instruction, the bytecode line always looks like this:

BB 10    # new 16             # alloc(list)

The third column reveals the type of the allocated block, list. The instruction returns a pointer to that block, so the returned value has type list*.

The data type in the comment area can be matched with the regular expression below; the first capture group holds the type of the allocated memory block.

const new_comment_regex = /^alloc\(([a-zA-Z0-9_\-*[\]\s]+)\)/;
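As a minimal sketch of how this pattern might be applied, the hypothetical helper below (inferNewType is an assumed name, not part of the original code) extracts the first capture group and appends * to form the pointer type.

```typescript
// Regular expression copied from the section above.
const new_comment_regex = /^alloc\(([a-zA-Z0-9_\-*[\]\s]+)\)/;

// Hypothetical helper: derive the type of the value returned by `new`
// from the comment column, e.g. "alloc(list)" gives "list*".
function inferNewType(comment: string): string | undefined {
    // The first capture group holds the type of the allocated block.
    const pointee = new_comment_regex.exec(comment.trim())?.[1]?.trim();
    // The instruction returns a pointer to that block.
    return pointee === undefined ? undefined : pointee + "*";
}
```

Because the character class also accepts * and [], a comment such as alloc(int*) would yield int** under this sketch.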

Inferring Pointer Type from NEWARRAY

When an array is allocated on the heap with the newarray instruction, the bytecode line always looks like this:

BC 04    # newarray 4         # alloc_array(int, 5)

In this case, the value returned by the instruction has type int[].

The data type in the comment area can be matched with the regular expression below; the first capture group holds the type of the array elements.

const arr_comment_regex = /^alloc_array\(([a-zA-Z0-9_\-*[\]\s]+),.+\)/;
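The same idea can be sketched for arrays. The helper name inferNewArrayType is an assumption for illustration; it takes the element type from the first capture group and appends [] to form the array type.

```typescript
// Regular expression copied from the section above.
const arr_comment_regex = /^alloc_array\(([a-zA-Z0-9_\-*[\]\s]+),.+\)/;

// Hypothetical helper: derive the type of the value returned by `newarray`
// from the comment column, e.g. "alloc_array(int, 5)" gives "int[]".
function inferNewArrayType(comment: string): string | undefined {
    // The first capture group holds the element type; the length is ignored.
    const elem = arr_comment_regex.exec(comment.trim())?.[1]?.trim();
    return elem === undefined ? undefined : elem + "[]";
}
```

The capture group stops at the first comma because the character class does not include commas, so the length expression after it never leaks into the element type.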

Storing Type Inference Result

The reconstructed information is stored in the comment property of C0Function.

type C0Function = {
    name: string;
    numVars: number;
    numArgs: number;
    varName: string[];
    size: number;
    code: Uint8Array;
    comment: Map<number, CodeComment>;
};

type CodeComment = {
    dataType?: string,  // If command = new/newarray/bipush, the type name of the value will be placed here
    fieldName?: string, // If command = aaddf, the field name will be placed here
    lineNumber: number  // The corresponding line number in .bc0 file
}

The comment property is a hash map from instruction index to CodeComment, an object that stores the line number mapping, the data type, and the struct field name parsed from the raw bc0 file.
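To illustrate how such a map might be filled and queried, here is a small sketch. The indices, line numbers, and field name are made-up sample values, and dataTypeAt is a hypothetical lookup helper rather than part of the original code.

```typescript
type CodeComment = {
    dataType?: string;   // set for new / newarray / bipush
    fieldName?: string;  // set for aaddf
    lineNumber: number;  // corresponding line number in the .bc0 file
};

// Sample entries keyed by instruction index (illustrative values only).
const comment = new Map<number, CodeComment>();
comment.set(0, { dataType: "int", lineNumber: 12 });
comment.set(2, { dataType: "list*", lineNumber: 13 });
comment.set(4, { fieldName: "next", lineNumber: 14 });

// Hypothetical helper: look up the restored type for an instruction, if any.
function dataTypeAt(index: number): string | undefined {
    return comment.get(index)?.dataType;
}
```

Instructions without a recorded type, or without any entry at all, simply yield undefined, so callers can fall back to treating the value as untyped.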
