2 Comments

Decompiling SCI – part 3 – instruction consumption

Here’s a link to part 1 of this series.

Another fundamental concept we need to introduce is instruction consumption.

The basics

Each SCI instruction (opcode + operands) “consumes” a certain number of values from the stack, or the value of the accumulator or prev registers. Likewise, it may “generate” a result that gets pushed onto the stack, or placed in the accumulator or prev register.

Let’s look at a simple example. Consider the following code:

(= localVar (+ param1 3))

and its corresponding assembly instructions:

lsp param1 
ldi 3 
add 
sal localVar 

 

How do we go about actually constructing the former from that latter?

We can start from the last instruction (sal localVar), which means “store the value of the accumulator in this local variable”. From this, we know we we’ll have an assignment operation, and the value that is being assigned will come from the value in the accumulator.

So going backwards, we see the previous instruction is add. This satisfies our need for an accumulator value, since add places its result in the accumulator. So the result of the add is what will end up being assigned to the local variable above.

Of course, add consumes an accumulator value itself, in addition to popping a value from the stack. So we work backwards again trying to find an instruction that pushes a value onto the stack, and the next instruction that places a value into the accumulator. Luckily, in this case, these are immediately satisfied. ldi places a value in the accumulator (3), and lsp pushes a value onto the stack (the value of param1).

Essentially, we’re building a tree of instructions, where a node represents an instruction and a node’s children represent instructions that generate the data needed for the node’s “consumption”. So in the above example, we’d have:

  • sal local0
    • add
      • ldi 3
      • lsp param1

 

In practice of course, these trees become quite large and nested quite deeply. SCI has some fairly complex instructions that can consume arbitrary amounts of stack values.

The problems

Things aren’t always so clean as in the previous example.

For instance, it is not uncommon for a value in the accumulator to be “consumed” by multiple instructions.

Consider the following assembly from the spotEgo method in script 80 of King’s Quest 6:

	pToa spotEgoScr 
	bnt code_01f6 // Consumes the accumulator, which is spotEgoScr.
	pushi 92 // $92 setScript
	pushi 3
	push // Consumes the accumulator, which is spotEgoScr.
	push0 
	lsp param1 
	lag global2 
	send a 

The value in the accumulator is used twice. This decompiles into the following code, which is a pretty common pattern (branch on something, then use the value that was branched on):

		(if spotEgoScr
			(global2 setScript: spotEgoScr 0 param1)
		)

Speaking of branching, what happens if we encounter a branch or jump instruction while backtracking? Suddenly, we can’t form a good idea of what constitutes the statement we’re trying to build, since the shape of our instruction consumption tree would depend on which branch was taken.

Tying it together with the control flow graph.

Both the control flow graph and the instruction consumption logic work together in determining the final decompiled source code.

The control flow analysis should have removed any stray branch instructions (jmp, bt or bnt) in the sense that they would have turned into semantic control flow graph nodes (if statements, loops, etc…), leaving just non-branching sequences of instructions. So our instruction consumption code doesn’t really need to deal with them anymore.

However, it does need to deal with the problem of not being able to satisfy instruction consumption in the current sequence of instructions. So let’s see how that might work.

Let’s revisit what we have in our control flow graph. Yes, we have all those semantic nodes: if, loop, switch and so on. But we also have “raw code” nodes in the leaves of the graph. That is, the condition node in a loop will contain the raw set of instructions that comprise that condition.

So if we take our example from the previous section, after doing our first pass of instruction consumption, we’ll end up with something like this:

  [If]
    [Condition]
      bnt  [01e5]
        pToa  [01e3]
    [Then]
      send  [01f2]
        pushi  [01e7]
        pushi  [01ea]
        push  [01ec]
          [NeedsAccumulator]
        push0  [01ed]
        lsp  [01ee]
        lag  [01f0]

You’ll see that we were not able to find an accumulator value to satisfy the push instruction. So we tagged it as “needs accumulator”.

On a second pass, we’ll try to work backwards to find the appropriate value. Below, you’ll see that we “cloned” the pToa value from above:

  [If]
    [Condition]
      bnt  [01e5]
        pToa  [01e3]
    [Then]
      send  [01f2]
        pushi  [01e7]
        pushi  [01ea]
        push  [01ec]
          pToa  [01e3]
        push0  [01ed]
        lsp  [01ee]
        lag  [01f0]

 

Working backwards doesn’t just mean returning to the original set of raw instructions and scanning backwards, of course. Then we run into the problem again of branching interfering with a static known path to our current instruction.

Instead, we need to follow the control flow graph hierarchy “up and backwards”. And the way we do that ends up being fairly complex, and changes depending on the type of the node in the control flow graph.

Another point that is important to bring up: Some of the control flow graph nodes themselves need to be treated as though they generate a value for the accumulator. For instance, if-statements, switch-statements and isolated compound conditions are all treated as such. That is, if we can’t find an accumulator-value-producing instruction to satisfy our instruction consumption, and we encounter an if node while navigating “up and backwards”, then we’ll use that if node as an accumulator-value-producer. This is a pattern we see often in source code like this (this would be the equivalent of the ternary operator in most languages):

 

(= someVar
    (if blahblah 5 else 2)
)

 

Of course, then we need to make a call as to whether we can remove the if node from its previous spot in the instruction consumption tree, or we have to leave it in and clone a new one (resulting in possibly incorrect code since we’ll evaluate the if-statement twice). These subtleties and complexities help to illustrate some of the difficulties in implementing an accurate decompiler.

During this phase we also do some analysis to classify loops: are they while-loops, do-loops or infinite loops? We can analyze the structure of the loop to figure this out. For instance, if the latch node has two successors, and the loop head node has only one, we know it’s a do-loop (the loop condition comes at the end).

Summary

At the end of the instruction consumption phase, we have now generated a “consumption node” tree that incorporates both information from the control flow graph, but also more detailed information on how individual code statements are constructed: flat sequences of instructions have been re-organized into hierarchical forms. We’re now ready for the next phase, which involves constructing the abstract syntax tree for the source code.

Below is an example of what the instruction consumption tree might look like. First, the decompiled method:

	(method (constantRocks &tmp temp0)
		(rCliffs cheatCount: 0)
		(= temp0 0)
		(while (< temp0 7)
			((= [newRockStep temp0] (RockStep new:))
				x: (if argc (- 320 [local246 temp0]) else [local246 temp0])
				y: [local262 temp0]
				cel: 3
				rockPointer: temp0
				corner: (if (== temp0 6) 1 else 0)
				init:
				stopUpd:
			)
			(if (< temp0 6) ([newRockStep temp0] addToPic:))
			(++ temp0)
		)
	)

 

And then, the instruction consumption tree that was used to generate this:

  link  [0350]
  send  [0359]
    pushi  [0352]
    push1  [0355]
    push0  [0356]
    class  [0357]
  sat  [035d]
    ldi  [035b]
  [While]
    [LoopBody]
      send  [03b0]
        push1  [0366]
        push1  [0367]
        push  [037a]
          [If]
            [Condition]
              bnt  [036a]
                lap  [0368]
            [Else]
              lali  [0378]
                lat  [0376]
            [Then]
              sub  [0373]
                pushi  [036c]
                lali  [0371]
                  lat  [036f]
              jmp  [0374]
        push0  [037b]
        push1  [037c]
        lsli  [037f]
          lat  [037d]
        pushi  [0382]
        push1  [0384]
        pushi  [0385]
        pushi  [0387]
        push1  [038a]
        push  [038b]
          lat  [037d]
        pushi  [038c]
        push1  [038f]
        push  [039c]
          [If]
            [Condition]
              bnt  [0394]
                eq?  [0393]
                  push  [0390]
                    lat  [037d]
                  ldi  [0391]
            [Else]
              ldi  [039a]
            [Then]
              ldi  [0396]
              jmp  [0398]
        pushi  [039d]
        push0  [039f]
        pushi  [03a0]
        push0  [03a3]
        sali  [03ae]
          push  [03ab]
            send  [03a9]
              pushi  [03a4]
              push0  [03a6]
              class  [03a7]
          lat  [03ac]
      [If]
        [Condition]
          bnt  [03b7]
            lt?  [03b6]
              lst  [03b2]
              ldi  [03b4]
        [Then]
          send  [03c1]
            pushi  [03b9]
            push0  [03bc]
            lali  [03bf]
              lat  [03bd]
      +at  [03c3]
      jmp  [03c5]
    [Condition]
      bnt  [0364]
        lt?  [0363]
          lst  [035f]
          ldi  [0361]
  ret  [03c7]

 

The next post ties everything together and summarizes the final steps.

Advertisements

2 comments on “Decompiling SCI – part 3 – instruction consumption

  1. […] The next step we’ll look at in the decompilation process is Instruction Consumption. […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Space Quest Historian

Adventure game blogs, Let's Plays, live streams, and more

Harebrained Schemes

Developer's blog for IceFall Games

kosmonaut's blog

3d GFX and more

Halogenica

Turn up the rez!

bitsquid: development blog

Developer's blog for IceFall Games

Game Development by Sean

Developer's blog for IceFall Games

Lost Garden

Developer's blog for IceFall Games

Memories

Developer's blog for IceFall Games

Casey's Blog

Developer's blog for IceFall Games

Blog

Developer's blog for IceFall Games

Rendering Evolution

Developer's blog for IceFall Games

Simon schreibt.

Developer's blog for IceFall Games

Dev & Techno-phage

Do Computers Dream of Electric Developper?

- Woolfe -

Developer's blog for IceFall Games

Ferrara Fabio

Game & Application Developer, 3D Animator, Composer.

Clone of Duty: Stonehenge

First Person Shooter coming soon to the XBOX 360

Low Tide Productions

Games and other artsy stuff...

BadCorporateLogo

Just another WordPress.com site

Sipty's Writing

Take a look inside the mind of a game developer.

Jonas Kyratzes

Writer, game designer, filmmaker.

%d bloggers like this: