Type Information Restoring
Although C0 is a strongly typed language, most of its type information is lost when a program is compiled to bytecode.
Fortunately, the compiler preserves some of it in the "comment" area of the bytecode.
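The rules that follow all read the comment area column by column. This is a minimal sketch of separating the columns. It assumes each bc0 instruction line has the shape "hex bytes # instruction # comment". The # separators, the sample lines, and the names splitBc0Line and Bc0Columns are assumptions for illustration.

```typescript
// Sketch only: assumes a bc0 instruction line looks like
//   <hex bytes>  # <mnemonic and operands>  # <source comment>
// The sample inputs used with this function are illustrative, not real CC0 output.

interface Bc0Columns {
  bytes: string;       // first column, e.g. "10 01"
  instruction: string; // second column, e.g. "bipush 1"
  comment: string;     // third column, e.g. "true" ("" if absent)
}

// Hypothetical helper: split on the first two '#' characters only,
// so a '#' inside the comment (such as the char literal '#') survives.
function splitBc0Line(line: string): Bc0Columns | null {
  const first = line.indexOf("#");
  if (first < 0) return null; // no '#': not an instruction line
  const second = line.indexOf("#", first + 1);
  const end = second < 0 ? line.length : second;
  return {
    bytes: line.slice(0, first).trim(),
    instruction: line.slice(first + 1, end).trim(),
    comment: second < 0 ? "" : line.slice(second + 1).trim(),
  };
}
```

Splitting only on the first two # characters is a deliberate choice. It keeps comments that themselves contain # intact.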
Inferring Value Type from BIPUSH
For instance, when a value is pushed onto the stack with the BIPUSH instruction, the bytecode line looks something like this:
The second column records the BIPUSH instruction and the value being pushed, while the third column reveals the value's type.
We can match the third column against a regular expression and determine the type accordingly:
The "(dummy return value)" case exists because, when a function returns void, the CC0 compiler always pushes a 0 onto the operand stack and returns that dummy value to the caller.
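This is a minimal sketch of the matching, assuming the third column has already been separated from the line. Only the "(dummy return value)" text comes from this page. The true/false, quoted-character and integer comment formats are assumptions, as are the type names and the function inferBipushType.

```typescript
// Hypothetical type names; the real project may represent C0 types differently.
type BipushType = "int" | "bool" | "char" | "void";

// Assumed third-column formats (illustrative):
//   "42", "-1"             -> int
//   "true" / "false"       -> bool
//   "'a'", "'\n'"          -> char (optionally backslash-escaped)
//   "(dummy return value)" -> void (CC0 pushes 0 before returning from a void function)
function inferBipushType(comment: string): BipushType | null {
  const c = comment.trim();
  if (/^\(dummy return value\)$/.test(c)) return "void";
  if (/^(true|false)$/.test(c)) return "bool";
  if (/^'(\\.|[^'\\])'$/.test(c)) return "char";
  if (/^-?\d+$/.test(c)) return "int";
  return null; // unrecognized comment: leave the type undetermined
}
```

Returning null for unrecognized comments lets the caller keep the type unknown rather than guess.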
Inferring Pointer Type from NEW
When a block of heap memory is allocated with the new instruction, the bytecode line always looks something like this:
The third column reveals the type of the returned value. The new instruction allocates a block of type list and returns a pointer to it, so the value itself has type list*.
The data type in the comment area can be matched with the regular expression below, where the first capture group is the type of the allocated memory block.
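As a sketch, assume the comment for new has the form alloc(T); this page does not confirm that format. The pointer type is then the first capture group followed by "*". The names inferNewType and NEW_COMMENT are hypothetical.

```typescript
// Assumed third-column format for NEW (illustrative): "alloc(list)".
// Group 1 is the type of the allocated block.
const NEW_COMMENT = /^alloc\((.+)\)$/;

// Returns the pointer type produced by NEW, or null if the comment
// does not look like alloc(...). Note "alloc_array(...)" does not match,
// because "alloc" must be followed directly by "(".
function inferNewType(comment: string): string | null {
  const m = comment.trim().match(NEW_COMMENT);
  if (m === null) return null;
  const blockType = (m[1] ?? "").trim(); // type of the allocated block
  return blockType + "*";                // NEW yields a pointer to it
}
```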
Inferring Pointer Type from NEWARRAY
When an array is allocated on the heap with the newarray instruction, the bytecode line always looks something like this:
In this case, the value returned by the instruction has type int[].
The data type in the comment area can be matched with the regular expression below, where the first capture group is the element type of the array.
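Similarly, assume the newarray comment has the form alloc_array(T, n); this is also an assumption. The element type is then everything before the first comma, since C0 type names contain no commas. The names inferNewArrayType and NEWARRAY_COMMENT are hypothetical.

```typescript
// Assumed third-column format for NEWARRAY (illustrative): "alloc_array(int, n)".
// Group 1 captures everything up to the first comma, i.e. the element type;
// the length expression after it may itself contain commas and is ignored.
const NEWARRAY_COMMENT = /^alloc_array\(([^,]+),/;

function inferNewArrayType(comment: string): string | null {
  const m = comment.trim().match(NEWARRAY_COMMENT);
  if (m === null) return null;
  const elemType = (m[1] ?? "").trim(); // element type of the array
  return elemType + "[]";               // NEWARRAY yields an array of that type
}
```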
Storing Type Inference Results
The reconstructed information is stored in the comment property of C0Function.
The comment property is a hash map from each instruction's index to a CodeComment object, which stores the line number mapping, data type, and struct field name parsed from the raw bc0 file.
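This sketch shows one way such a map could be populated. CodeComment and C0Function here are simplified stand-ins, not the project's actual definitions. Their field names (lineNumber, dataType, fieldName) and the helper recordType are hypothetical.

```typescript
// Hypothetical shapes for illustration; the real classes may differ.
interface CodeComment {
  lineNumber?: number; // source line mapped to this instruction
  dataType?: string;   // inferred type, e.g. "int", "list*", "int[]"
  fieldName?: string;  // struct field name, for field-access instructions
}

class C0Function {
  // Instruction index -> information recovered from the bc0 comment column.
  comment: Map<number, CodeComment> = new Map();
}

// Record an inferred type for one instruction, keeping any fields
// already stored for that index (e.g. its line number mapping).
function recordType(fn: C0Function, index: number, dataType: string): void {
  const existing = fn.comment.get(index) ?? {};
  fn.comment.set(index, { ...existing, dataType });
}
```

Merging into an existing entry means the type, line mapping and field name can be filled in by separate passes over the same instruction.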