Hacking devices can/will void your warranty and can turn your expensive consumer electronics into worthless trash if you don't know what you're doing. This blog is for information purposes only, and if you try to hack into your own consumer electronics, you do so at your own risk. The device I'm currently hacking is the Canon SX10 IS camera.

Wednesday, April 14, 2010

MIPS disassembler / reverse engineering tool

I know I haven't posted for a long time, but I have been busy hacking.

I needed a decompiler with source code that I could modify to try to build something similar to what an acquaintance of mine once built, the MIPSX Sourcer. When I was into hacking a certain DVD player, I learned MIPSX which isn't too far different than MIPS but different enough to make the tools incompatible. Doing a disassembler is easy. In fact, objdump comes on the ScreenPlay Pro itself, so just objdump the DvdPlayer and presto, a gazillion lines of MIPS code with no chance of understanding it and not knowing where to start. I needed something that would take any new information I gave it and apply it across the whole file to give me more information back, and I had to start from scratch.

First, I needed an understanding of the binary format of the executable file.
http://www.skyfree.org/linux/references/ELF_Format.pdf

I put together a basic program to open and do a raw dump of each section defined in that ELF_Format.

Once I had that, it was a small matter to make the disassembler, made even easier by the Reverse Engineering Compiler:
http://www.backerstreet.com/rec/rec.htm
In fact REC looked very similar to what I wanted...but it was not able to handle the Iomega firmware and testing with other MIPS code I found it was difficult to get the information together that I wanted. However, the author published a small piece of his code:
http://www.backerstreet.com/rec/dismips.c

This was a nice jewel. It had some stuff wrong but it was a nice start anyway. So I plugged in the functions into the code I wrote and send each instruction from the ELF segments that were listed as executable and poof, instant MIPS code. It didn't match exactly with what I was getting out of objdump, so I did some investigating.

http://www.mips.com/products/architectures/mips32/index.cfm#specifications
You have to register (free) to get the documents, but this is the most accurate documentation on the language, of course. Not always the easiest to digest for a beginner, so here was a good primer I came across:

http://www.eecs.harvard.edu/~ellard/Courses/cs50-asm.pdf

I didn't like the way dismips.c was representing a lot of the information, and some of it was just wrong. So I modified it to resemble the objdump disassembly a little closer. Then I have it analyze the code from top down. It assumes that any 0x3c1cXXXX followed by a 0x279cXXXX (where XXXX is any hexadecimal digit) is the beginning of a new function...and so far I haven't found any exceptions to the rule. Those two values equate to: LUI $gp,0xXXXX and ADDIU $gp,$gp,0xXXXX. These almost always followed a JR $RA instruction (with one instruction after that).

Those familiar with assembly but not familiar with MIPS need to understand that MIPS processors use a pipeline for processing the instructions and depending upon the pipeline, you can have multiple instructions in the pipe being processed at different stages. JR $RA (Jump to address specified by Register RA) changes the current execution address. But JR doesn't get the address loaded into the current instruction pointer (IP) before the next instruction is loaded into the pipeline. So the next instruction will also be executed even though it occurs AFTER the JR.

so you may see something like:

jr $ra
addiu $sp,$sp,0x20

lui $gp,0xfcc
addiu $gp,$gp,0xffffb9a4
addu $gp,$gp,$t9

the actual start of the function is on the LUI and the end of the other function will include adding 32 (hex value 0x20) into the $SP (stack pointer) register.

Ok, I'm getting sidetracked. The links up above contain better information about how MIPS works, but I wanted to explain that part so I could explain how the analysis engine works.

Now, I figured out that register $t9 always contains the address of the function being started. Whenever the code was calling a function it would do a JALR $ra,$t9 (objdump leaves off $ra since it is assumed, but I like having it there). What that does is jumps to register $t9 and stores the current IP in $ra (so that JA can then return to the next line after the JALR + one more line due to pipelining). So I make the assumption that $t9 will always contain the address of the beginning of the function at the beginning.

$a0 - $a3 are registers that are typically used for passing parameters. More can be passed by placing them onto the stack, but most times I only need to know what the first 4 parameters to any given function are anyway. Knowing this means a block of code like this:

lw $a0,0xffff802c($gp)
nop
addiu $a0,$a0,0xffff9460
lw $t9,0x2cd8($gp)
nop
jalr $ra,$t9
nop

is a call to a function with one parameter. But objdump shows this and trying to make sense of it isn't all that helpful.

So I have the program analyze the registers as it decompiles them. If it knows the value, it prints it next to the instruction. LW $a0,0xffff802c($gp) means to take the value of the register $gp and add 0xffff802c and go to that address and fetch the word (LoadWord) at that address. Well, these addresses are all in the .GOT segment in the ELF file (Global Offsets Table) so I have it peek in there to get the address. I also added the opcode and address.

004002e0: 3c1c0fcc lui $gp,0xfcc; gp=0x0fcc0000
004002e4: 279cb920 addiu $gp,$gp,0xffffb920; gp=0xfcbb920
004002e8: 0399e021 addu $gp,$gp,$t9; gp=0x100bbc00
...
0040034c: 8f84802c lw $a0,0xffff802c($gp); a0=0x00af0000
00400350: 00000000 nop
00400354: 24849460 addiu $a0,$a0,0xffff9460; a0=00ae9460
00400358: 8f992cd8 lw $t9,0x2cd8($gp); t9=0x00ad2310
0040035c: 00000000 nop
00400360: 0320f809 jalr $ra,$t9
00400364: 00000000 nop

(the first hex number is the address where the instruction will be loaded, the second hex number is the instruction that is decoded) So now we know it's calling a function with one parameter, 0x00ae9460. That happens to be an address in the .rodata segment (Read Only data segment). So I looked there, found a string terminated with a zero and included this information in. This is what I get:

004002e0: 3c1c0fcc lui $gp,0xfcc; gp=0x0fcc0000
004002e4: 279cb920 addiu $gp,$gp,0xffffb920; gp=0xfcbb920
004002e8: 0399e021 addu $gp,$gp,$t9; gp=0x100bbc00
...
0040034c: 8f84802c lw $a0,0xffff802c($gp); a0=0x00af0000
00400350: 00000000 nop
00400354: 24849460 addiu $a0,$a0,0xffff9460; a0=00ae9460
00400358: 8f992cd8 lw $t9,0x2cd8($gp); t9=0x00ad2310
0040035c: 00000000 nop
00400360: 0320f809 jalr $ra,$t9; unknown("malloc failure\n")
00400364: 00000000 nop

and that unknown function at that address is scattered throughout the code with other strings that I commonly see when I execute DvdPlayer. So I know now that the function at address 0x00ad2310 is probably printf.

So I built a separate file to document the addresses and functions at those addresses. The disassembler can then use those to pull the function names when parsing apart functions, rather than saying "unknown".

From there, I used objdump on the Ellion firmware, compiled with symbols and objdumped it with symbols. Looked at the printf function and sure enough it matched the opcodes for the function I figured was printf in the Iomega firmware. The only exceptions were the addresses and offset values. Went forward and backward from those function and found the matching functions and was able to label those.

With many of the library functions identified, it makes it that much easier to read the source and find functions that match in the Iomega vs. Ellion firmware. Especially helpful was to find the assert command, because it lists what function name it was in as one of the parameters.

I'm using this utility to try to track down what is turning off the flashing light when the DvdPlayer boots. I've identified all of the calls in the main.cpp function, which is similar but not exactly the same as the Ellion's main.cpp. What I'm going to do from this point is knock out the function calls one at a time, searching the binary file and replacing the JALR calls with NOPs. I can more easily find the ones I'm looking for due to the disassembly and using HexEdit I can replace the opcode with 0s to turn it into a NOP.

I've added some other analysis to the utility, which I will blog about when I publish the utility.