Hacking devices can/will void your warranty and can turn your expensive consumer electronics into worthless trash if you don't know what you're doing. This blog is for informational purposes only, and if you try to hack into your own consumer electronics, you do so at your own risk. The device I'm currently hacking is the Canon SX10 IS camera.

Wednesday, April 14, 2010

MIPS disassembler / reverse engineering tool

I know I haven't posted for a long time, but I have been busy hacking.

I needed a decompiler with source code that I could modify, to try to build something similar to what an acquaintance of mine once built: the MIPSX Sourcer. When I was into hacking a certain DVD player, I learned MIPSX, which isn't too different from MIPS but is different enough to make the tools incompatible. Writing a disassembler is easy. In fact, objdump comes on the ScreenPlay Pro itself, so just objdump the DvdPlayer and presto: a gazillion lines of MIPS code, with no chance of understanding it and no idea where to start. I needed something that would take any new information I gave it, apply it across the whole file, and give me more information back, and I had to start from scratch.

First, I needed to understand the binary format of the executable file (ELF):
http://www.skyfree.org/linux/references/ELF_Format.pdf

I put together a basic program to open the file and do a raw dump of each section defined in that ELF format document.
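For readers who want to try the same first step, here is a minimal sketch in Python (not the tool described in this post) that walks the section header table of an ELF file and lists each section, marking the executable ones. It assumes a 32-bit little-endian (mipsel) file and uses the standard ELF32 header and section header layouts from the spec linked above; `read_sections`, `dump` and the `Section` tuple are illustrative names.

```python
import struct
from collections import namedtuple

# One entry per section header (only the fields needed for a dump).
Section = namedtuple("Section", "name type flags addr offset size")

SHF_EXECINSTR = 0x4  # sh_flags bit: section holds executable code


def read_sections(data):
    """Parse the section header table of a 32-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:  # EI_CLASS=ELFCLASS32, EI_DATA=ELFDATA2LSB
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    (_, _, _, _, _, _, e_shoff, _, _, _, _,
     e_shentsize, e_shnum, e_shstrndx) = struct.unpack_from("<16sHHIIIIIHHHHHH", data, 0)
    # Each ELF32 section header is ten 32-bit words.
    raw = [struct.unpack_from("<10I", data, e_shoff + i * e_shentsize)
           for i in range(e_shnum)]
    names_off = raw[e_shstrndx][4]  # sh_offset of the section-name string table
    sections = []
    for sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size, *_ in raw:
        start = names_off + sh_name
        name = data[start:data.index(b"\0", start)].decode("ascii")
        sections.append(Section(name, sh_type, sh_flags, sh_addr, sh_offset, sh_size))
    return sections


def dump(path):
    """Print one line per section, marking the executable ones."""
    with open(path, "rb") as f:
        data = f.read()
    for s in read_sections(data):
        kind = "code" if s.flags & SHF_EXECINSTR else "data"
        print(f"{s.name:<12} addr={s.addr:08x} off={s.offset:08x} size={s.size:08x} {kind}")
```

Calling `dump("DvdPlayer")` on a copy of the binary would print one line per section; the sections flagged "code" are the ones worth feeding to a disassembler.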

Once I had that, it was a small matter to write the disassembler, made even easier by the Reverse Engineering Compiler (REC):
http://www.backerstreet.com/rec/rec.htm
In fact, REC looked very similar to what I wanted, but it couldn't handle the Iomega firmware, and when testing it with other MIPS code I found it difficult to pull together the information I wanted. However, the author published a small piece of his code:
http://www.backerstreet.com/rec/dismips.c

This was a real gem. It got some things wrong, but it was a nice start anyway. So I plugged its functions into the code I'd written, sent it each instruction from the ELF segments marked as executable, and poof, instant MIPS code. It didn't exactly match what I was getting out of objdump, so I did some investigating.
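To illustrate what decoding an instruction word involves, here is a minimal Python sketch that splits a 32-bit word into its MIPS fields (opcode, rs, rt, rd, funct, immediate) and formats the handful of instructions that appear in this post, printing signed offsets as 32-bit hex the way objdump does. It is not dismips.c or its API; `disasm` and the small instruction subset are assumptions for illustration, and anything else falls through to `.word`.

```python
# Standard MIPS register names, indexed by register number.
REGS = ("zero at v0 v1 a0 a1 a2 a3 t0 t1 t2 t3 t4 t5 t6 t7 "
        "s0 s1 s2 s3 s4 s5 s6 s7 t8 t9 k0 k1 gp sp fp ra").split()


def sext16(v):
    """Sign-extend a 16-bit immediate."""
    return v - 0x10000 if v & 0x8000 else v


def disasm(word):
    """Decode a small subset of MIPS32 instructions into objdump-like text."""
    op = word >> 26
    rs = "$" + REGS[(word >> 21) & 0x1F]
    rt = "$" + REGS[(word >> 16) & 0x1F]
    rd = "$" + REGS[(word >> 11) & 0x1F]
    funct = word & 0x3F
    imm = word & 0xFFFF
    simm = sext16(imm) & 0xFFFFFFFF  # shown as 32-bit hex, e.g. 0xffff802c
    if word == 0:
        return "nop"  # encoded as sll $zero,$zero,0
    if op == 0x00:  # SPECIAL: R-type, selected by funct
        if funct == 0x08:
            return f"jr {rs}"
        if funct == 0x09:
            return f"jalr {rd},{rs}"
        if funct == 0x21:
            return f"addu {rd},{rs},{rt}"
    elif op == 0x0F:
        return f"lui {rt},0x{imm:x}"
    elif op == 0x09:
        return f"addiu {rt},{rs},0x{simm:x}"
    elif op == 0x23:
        return f"lw {rt},0x{simm:x}({rs})"
    return f".word 0x{word:08x}"  # not in this sketch's subset
```

For example, `disasm(0x8f84802c)` gives `lw $a0,0xffff802c($gp)`, matching the objdump-style text used later in this post.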

http://www.mips.com/products/architectures/mips32/index.cfm#specifications
You have to register (free) to get the documents, but this is, of course, the most accurate documentation on the instruction set. It isn't always the easiest to digest for a beginner, though, so here is a good primer I came across:

http://www.eecs.harvard.edu/~ellard/Courses/cs50-asm.pdf

I didn't like the way dismips.c represented a lot of the information, and some of it was just wrong, so I modified it to more closely resemble objdump's disassembly. Then I have it analyze the code from the top down. It assumes that any 0x3c1cXXXX followed by a 0x279cXXXX (where each X is a hexadecimal digit) is the beginning of a new function, and so far I haven't found any exceptions to the rule. Those two words decode to LUI $gp,0xXXXX and ADDIU $gp,$gp,0xXXXX. They almost always follow a JR $RA instruction and the one instruction after it.
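The prologue heuristic can be sketched as a simple scan over the code words. This assumes the section bytes are little-endian (mipsel) and that `base` is the load address of the first word; `words_from_bytes` and `find_function_starts` are hypothetical names, not part of the tool.

```python
import struct


def words_from_bytes(code):
    """Split little-endian (mipsel) code bytes into 32-bit words."""
    return list(struct.unpack_from(f"<{len(code) // 4}I", code))


def find_function_starts(words, base):
    """Addresses where lui $gp (0x3c1cXXXX) is immediately followed by
    addiu $gp,$gp (0x279cXXXX), i.e. the prologue heuristic."""
    return [base + 4 * i
            for i in range(len(words) - 1)
            if words[i] >> 16 == 0x3C1C and words[i + 1] >> 16 == 0x279C]
```

Comparing only the upper 16 bits of each word is what makes the XXXX part a wildcard.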

Those familiar with assembly but not with MIPS need to understand that MIPS processors use a pipeline for processing instructions, so several instructions can be in the pipe at different stages at the same time. JR $RA (jump to the address in register $ra) changes the current execution address, but the new address isn't loaded into the program counter (PC, the equivalent of the instruction pointer or IP) before the next instruction has already entered the pipeline. So the next instruction is also executed, even though it comes AFTER the JR. That position is called the branch delay slot.

So you may see something like:

jr $ra
addiu $sp,$sp,0x20

lui $gp,0xfcc
addiu $gp,$gp,0xffffb9a4
addu $gp,$gp,$t9

The actual start of the next function is the LUI. The previous function ends with the instruction in the JR's delay slot, which adds 32 (0x20) to the $sp (stack pointer) register to release its stack frame.
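Because of the delay slot, the last word of a function is the instruction after its `jr $ra`, not the `jr $ra` itself. A rough sketch of that check, assuming function starts have already been found with the prologue heuristic: each function is taken to run up to the next start, and the sketch flags whether its second-to-last word is `jr $ra` (0x03e00008). Functions with trailing data or padding would need more care; `function_ranges` is a hypothetical name.

```python
JR_RA = 0x03E00008  # jr $ra


def function_ranges(words, base, starts):
    """Split code into [start, end) ranges, one per detected function start,
    and report whether each ends with jr $ra followed by its delay slot."""
    bounds = sorted(starts) + [base + 4 * len(words)]
    ranges = []
    for start, end in zip(bounds, bounds[1:]):
        first = (start - base) // 4
        last = (end - base) // 4 - 1  # index of the final word: the delay slot
        ends_with_return = last - 1 >= first and words[last - 1] == JR_RA
        ranges.append((start, end, ends_with_return))
    return ranges
```

A function whose range does not end this way is a hint that the start detection missed or misplaced a boundary.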

OK, I'm getting sidetracked. The links above contain better information about how MIPS works, but I wanted to explain that part first so I could explain how the analysis engine works.

Now, I figured out that register $t9 always contains the address of the function being started. Whenever the code calls a function, it does a JALR $ra,$t9 (objdump leaves off $ra since it is implied, but I like having it there). That jumps to the address in $t9 and stores the return address in $ra, so that a JR $ra in the called function comes back to the instruction after the JALR plus one more (the delay slot). So I assume that, on entry to a function, $t9 always contains the address of the function's first instruction.
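Under that assumption, $gp can be computed for each function from its three-instruction prologue: the LUI supplies the upper 16 bits, the ADDIU adds a sign-extended 16-bit value, and the ADDU adds $t9 (the function's own address). A minimal sketch; the helper names are illustrative, not from the tool.

```python
MASK = 0xFFFFFFFF


def sext16(v):
    """Sign-extend a 16-bit immediate."""
    return v - 0x10000 if v & 0x8000 else v


def gp_from_prologue(lui_word, addiu_word, t9):
    """Value of $gp after: lui $gp,hi / addiu $gp,$gp,lo / addu $gp,$gp,$t9,
    assuming $t9 holds the function's own address on entry."""
    gp = (lui_word & 0xFFFF) << 16                  # lui: upper half
    gp = (gp + sext16(addiu_word & 0xFFFF)) & MASK  # addiu: signed low half
    return (gp + t9) & MASK                         # addu with $t9
```

With the prologue at 0x004002e0 in the listing in this post, `gp_from_prologue(0x3c1c0fcc, 0x279cb920, 0x004002e0)` reproduces the gp=0x100bbc00 value shown there.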

$a0-$a3 are the registers normally used for passing parameters. More can be passed on the stack, but most of the time I only need to know the first four parameters to a function anyway. Knowing this, a block of code like this:

lw $a0,0xffff802c($gp)
nop
addiu $a0,$a0,0xffff9460
lw $t9,0x2cd8($gp)
nop
jalr $ra,$t9
nop

is a call to a function with one parameter. But objdump only shows the raw instructions, and trying to make sense of them that way isn't all that helpful.

So I have the program track the register values as it disassembles the code. If it knows a register's value, it prints it next to the instruction. LW $a0,0xffff802c($gp) means: take the value in register $gp, add 0xffff802c (the 16-bit offset 0x802c sign-extended, so effectively subtract 0x7fd4), and load the word (Load Word) at that address into $a0. These addresses are all in the .got section of the ELF file (Global Offset Table), so I have it peek in there to get the value. I also added the address and opcode to each line.
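A minimal sketch of that kind of register tracking, assuming only four instruction types (lui, addiu, lw, addu) and a caller-supplied `read_word` function that returns the word stored at an address in .got, or None. Anything the sketch doesn't understand, including calls, is ignored, which a real tool couldn't afford since calls clobber registers; `track` and its arguments are hypothetical.

```python
MASK = 0xFFFFFFFF
REGS = ("zero at v0 v1 a0 a1 a2 a3 t0 t1 t2 t3 t4 t5 t6 t7 "
        "s0 s1 s2 s3 s4 s5 s6 s7 t8 t9 k0 k1 gp sp fp ra").split()


def sext16(v):
    """Sign-extend a 16-bit immediate."""
    return v - 0x10000 if v & 0x8000 else v


def track(instrs, regs, read_word):
    """Walk (addr, word) pairs, updating `regs` (register number -> known value)
    and returning (addr, note) pairs such as (0x400354, "a0=0x00ae9460")."""
    notes = []
    for addr, word in instrs:
        op = word >> 26
        rs, rt, rd = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
        imm = sext16(word & 0xFFFF)
        dest, value = None, None
        if op == 0x0F:                                # lui rt,imm
            dest, value = rt, (word & 0xFFFF) << 16
        elif op == 0x09:                              # addiu rt,rs,imm
            dest = rt
            if rs in regs:
                value = (regs[rs] + imm) & MASK
        elif op == 0x23:                              # lw rt,imm(rs): e.g. a .got slot
            dest = rt
            if rs in regs:
                value = read_word((regs[rs] + imm) & MASK)
        elif op == 0x00 and (word & 0x3F) == 0x21:    # addu rd,rs,rt
            dest = rd
            if rs in regs and rt in regs:
                value = (regs[rs] + regs[rt]) & MASK
        note = ""
        if dest:                                      # writes to $zero are discarded
            if value is None:
                regs.pop(dest, None)                  # result unknown: drop stale value
            else:
                regs[dest] = value
                note = f"{REGS[dest]}={value:#010x}"
        notes.append((addr, note))
    return notes
```

Starting from `{25: 0x004002e0}` ($t9 on entry) and a .got dictionary holding 0x00af0000 at 0x100b3c2c and 0x00ad2310 at 0x100be8d8 (the two slots the listing below reads), passing `got.get` as `read_word` yields the gp, a0 and t9 annotations shown in that listing.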

004002e0: 3c1c0fcc lui $gp,0xfcc; gp=0x0fcc0000
004002e4: 279cb920 addiu $gp,$gp,0xffffb920; gp=0x0fcbb920
004002e8: 0399e021 addu $gp,$gp,$t9; gp=0x100bbc00
...
0040034c: 8f84802c lw $a0,0xffff802c($gp); a0=0x00af0000
00400350: 00000000 nop
00400354: 24849460 addiu $a0,$a0,0xffff9460; a0=0x00ae9460
00400358: 8f992cd8 lw $t9,0x2cd8($gp); t9=0x00ad2310
0040035c: 00000000 nop
00400360: 0320f809 jalr $ra,$t9
00400364: 00000000 nop

(The first hex number is the address the instruction will be loaded at; the second is the instruction word being decoded.) So now we know it's calling a function with one parameter, 0x00ae9460. That happens to be an address in the .rodata section (read-only data), so I had the program look there, find the zero-terminated string at that address, and include it in the output. This is what I get:

004002e0: 3c1c0fcc lui $gp,0xfcc; gp=0x0fcc0000
004002e4: 279cb920 addiu $gp,$gp,0xffffb920; gp=0x0fcbb920
004002e8: 0399e021 addu $gp,$gp,$t9; gp=0x100bbc00
...
0040034c: 8f84802c lw $a0,0xffff802c($gp); a0=0x00af0000
00400350: 00000000 nop
00400354: 24849460 addiu $a0,$a0,0xffff9460; a0=0x00ae9460
00400358: 8f992cd8 lw $t9,0x2cd8($gp); t9=0x00ad2310
0040035c: 00000000 nop
00400360: 0320f809 jalr $ra,$t9; unknown("malloc failure\n")
00400364: 00000000 nop

That unknown function at 0x00ad2310 is called throughout the code with other strings that I commonly see when I run DvdPlayer, so I now know it is probably printf.
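The .rodata string lookup can be sketched like this, assuming you already have the raw bytes and load address of the .rodata section (for example from the section table). The printable-ASCII check is an added guard against showing random data as a string, not necessarily what the tool does; `read_cstring` and `annotate_call` are hypothetical names.

```python
def read_cstring(section_bytes, section_addr, vaddr, limit=256):
    """Return the NUL-terminated string at vaddr if it lies in the section
    and looks like text, else None."""
    off = vaddr - section_addr
    if not 0 <= off < len(section_bytes):
        return None
    end = section_bytes.find(b"\0", off, off + limit)
    if end < 0:
        return None
    raw = section_bytes[off:end]
    # Accept printable ASCII plus tab/newline/carriage return only.
    if not all(32 <= b < 127 or b in (9, 10, 13) for b in raw):
        return None
    return raw.decode("ascii")


def annotate_call(name, args):
    """Format a call comment like: unknown("malloc failure\\n")."""
    shown = []
    for a in args:
        if isinstance(a, str):
            shown.append('"' + a.encode("unicode_escape").decode("ascii") + '"')
        else:
            shown.append(f"0x{a:08x}")
    return f"{name}({', '.join(shown)})"
```

Escaping the string before printing keeps the `\n` visible on one line, as in the jalr comment in the listing above.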

So I built a separate file that documents addresses and the functions at those addresses. The disassembler then uses it to fill in function names when parsing functions, rather than printing "unknown".
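The post doesn't describe that file's format, so this sketch assumes a hypothetical one where each line holds a hex address and a function name, with `#` starting a comment. Lookups fall back to "unknown", just as the disassembler output above does; `load_names` and `call_target_name` are illustrative names.

```python
def load_names(text):
    """Parse lines like '0x00ad2310 printf' into {address: name}."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, name = line.split(None, 1)
        names[int(addr, 16)] = name.strip()   # int() accepts an optional 0x prefix
    return names


def call_target_name(names, addr):
    """Name to print for a jalr target, or "unknown" if not yet identified."""
    return names.get(addr, "unknown")
```

Keeping the names in a plain text file means every newly identified function improves the next disassembly run without touching the code.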

From there, I compiled the Ellion firmware with symbols and ran objdump on it. I looked at its printf function, and sure enough, it matched the opcodes of the function I had figured was printf in the Iomega firmware; the only differences were the addresses and offset values. Working forward and backward from that function, I found the matching functions and was able to label those too.
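Matching functions "except for the addresses and offset values" can be automated by masking out those fields before comparing. A crude sketch, assuming that dropping the 16-bit immediate of I-type instructions and the 26-bit target of J/JAL is enough while R-type words are compared whole; real code would need to treat other encodings more carefully, and the names are hypothetical.

```python
def mask_word(word):
    """Drop the fields that change when code is linked at another address."""
    op = word >> 26
    if op in (0x02, 0x03):        # j / jal: 26-bit jump target
        return word & 0xFC000000
    if op != 0x00:                # I-type: 16-bit immediate / offset
        return word & 0xFFFF0000
    return word                   # R-type (SPECIAL): registers + funct only


def same_shape(a, b):
    """True if two instruction sequences differ only in masked fields."""
    return len(a) == len(b) and all(mask_word(x) == mask_word(y) for x, y in zip(a, b))


def find_matches(known, unknown):
    """known: {name: words} from the build with symbols;
    unknown: {address: words} from the stripped binary.
    Returns {address: name} for every same-shaped pair."""
    return {addr: name
            for addr, words in unknown.items()
            for name, ref in known.items()
            if same_shape(words, ref)}
```

Short, common functions can match several candidates, so a hit is a lead to confirm by hand rather than a final label.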

With many of the library functions identified, it is that much easier to read the disassembly and find matching functions in the Iomega and Ellion firmware. Finding the assert function was especially helpful, because one of its parameters is the name of the function it was called from.

I'm using this utility to try to track down what turns off the flashing light when the DvdPlayer boots. I've identified all of the calls in the main function from main.cpp, which is similar, but not identical, to the Ellion's main.cpp. From here, I'm going to knock out the function calls one at a time by finding each JALR call in the binary file and replacing it with a NOP. The disassembly makes the ones I'm looking for much easier to find, and with HexEdit I can overwrite the opcode with 0s, since 0x00000000 is a NOP on MIPS.
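If you'd rather script the patch than do it in HexEdit, here is a minimal sketch. It assumes a 32-bit little-endian ELF, converts the call's virtual address to a file offset through the PT_LOAD program headers, refuses to patch anything that isn't a JALR, and writes 0x00000000. Note that the instruction in the delay slot still executes, and whatever the call would have returned in $v0 is no longer set. The function names are hypothetical.

```python
import struct

PT_LOAD = 1  # loadable segment


def vaddr_to_offset(data, vaddr):
    """Map a virtual address to a file offset using the PT_LOAD program
    headers of a 32-bit little-endian ELF."""
    fields = struct.unpack_from("<16sHHIIIIIHHHHHH", data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        p_type, p_offset, p_vaddr, _, p_filesz, *_ = struct.unpack_from(
            "<8I", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset + (vaddr - p_vaddr)
    raise ValueError(f"0x{vaddr:08x} is not backed by file data")


def nop_out_call(data, vaddr):
    """Return a copy of the ELF image with the JALR at vaddr replaced by a NOP."""
    buf = bytearray(data)
    off = vaddr_to_offset(buf, vaddr)
    (word,) = struct.unpack_from("<I", buf, off)
    if (word >> 26) != 0 or (word & 0x3F) != 0x09:
        raise ValueError(f"0x{vaddr:08x} holds 0x{word:08x}, not a jalr")
    struct.pack_into("<I", buf, off, 0)  # 0x00000000 = nop (sll $zero,$zero,0)
    return bytes(buf)
```

For example, with `data` holding the DvdPlayer file's bytes, `nop_out_call(data, 0x00400360)` would blank the `jalr` from the listing above, and the result can be written to a new file for testing.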

I've added some other analysis to the utility, which I'll blog about when I publish it.

17 comments:

  1. Hello,
    GREAT!! You are the master!
    Thanks for your work.
    If I modify some lines in the decompiled source code of DvdPlayer, how do you then compile it back into the DvdPlayer executable?

    good work
    regards
    Victor

  2. That'll require generating it in a slightly different format. I will need to replace the offsets with labels and remove the addresses/opcodes. Then I need to make it match the .S file format, and then all you'll need is a mipsel cross assembler.

  3. Hello,

    I'm sorry if my English isn't good. I have been following your blog and I have tried some things.
    First, I installed ffmpeg and other utilities on the hard disk. There are two interesting pages: http://emprex-me1.blogspot.com/2008/03/setup-me1-as-bt-downloader.html and http://ipkg.nslu2-linux.org/feeds/unslung/cross/ . The NSLU2 packages install on the disk perfectly; you must also install the packages they depend on, but they run. I ran ffmpeg, but it was killed because it needed a lot of CPU. There are one or two processes that monitor the system and kill any process that uses a lot of CPU.

    Second, I copied the sda1 and sda2 partitions to a VMware disk with dd and created an sda3 partition. Then I tried to run the copy with qemu, because it can emulate MIPS architectures. It nearly booted, but it printed an error involving the kernel modules. My idea is to run a lightweight graphical desktop, lirc, and ffplay or vlc, but the problem is that there's only 64 MB of memory. VLC and ffplay need a graphical desktop to show the content in a window.

    Pages with the information:
    http://www.linux-mips.org/wiki/Qemu
    https://forum.openwrt.org/viewtopic.php?id=13423
    http://people.debian.org/~aurel32/qemu/mips/
    http://www.nasirabed.eu/2008/10/31/minimal-linux/

    Good work!!

  4. Hi there.
    I am doing much the same thing for WDTV Live.
    I have used IDA successfully, along with custom API hooking: LD_PRELOAD-ing a shared object and injecting it into the application lifecycle.
    I have now introduced a JavaScript engine (SpiderMonkey) and created a minimal JavaScript plugin system for it.

    See here for more details:
    http://wdtvforum.com/main/index.php?topic=5577.0

  5. Anon, interesting approach. I've often wondered if qemu would be able to emulate it, but because of the hardware unknowns and my lack of knowledge about qemu, I haven't pursued it.

    Bogdan, IDA is a great tool for anyone who has that kind of money. I'm afraid it's a bit out of my price range for a hobby, which is why I'm taking this route. There's also the utility at http://acade.au7.de/disasmips.htm for disassembling MIPS, but it has no source, so there's no way to modify it after the fact, correct any problems, or get it to generate .s-compatible files. With the tool I am creating, I will be able to share the source so others can use it or modify it to suit their needs as well.

  6. Joman100, I also started on a .NET C# project that can read a mipsel ELF and its references, decompile them all, do relocation (not all types, I'm afraid), perform initializations and start the simulation... It's a Frankenstein project; its only purpose was to study the ELF format, MIPS assembly language and a lot of other related stuff.

  7. Bogdan, that's great. Have you open sourced it, or are you willing to?

  8. Joman100, at the moment I have no time to manage an open source project. If you are willing to receive an archive with all the sources and create an open source project based on it, I can do that. I will probably be able to contribute to it from time to time.

  9. Absolutely. Send me your email address in these comments. I will prevent it from appearing (I moderate all comments) and I'll contact you about the source.

  10. Bogdan, I guess you aren't ready. Let me know when you're ready to open source your utility.

  11. (Just as an FYI for other readers, Bogdan has sent me the code and I'm trying to figure out the best way to open source it)

  12. Hi, have you seen this project: http://sourceforge.net/projects/ketlaer/ ?
    It looks very promising.

    Goodbye

  13. Yep, I'm familiar with the ketlaer project; I joined it when it started but haven't been able to contribute yet. I've been trying to adapt it to work with the 1262. The complication is that ketlaer currently links with version 3.4.4 libraries. The libraries I can get from the Ellion source (the required ones that we don't have source for) are from 2.96 and aren't compatible. But figuring out a way to adapt their work is on the list.

  14. Hi,

    I don't know if you have seen this already, but parts of the source code can be downloaded from Argosy Research Inc.
    The Argosy HV359T media player is essentially a ScreenPlay Pro HD without buttons on the front panel; everything else seems to be the same.
    In the source archive there is some info about the database, I think, but my Chinese isn't very good :D
    Most of the archive is only those parts of the firmware which are distributed under the GNU license, but maybe you can find something useful.

    http://www.argosy.tw/mana_php/prod/file/HV359_GPL.zip

    greetings from germany

    damaltor

  15. Thanks, good going, keep it up. Thanks for posting this; it was what I was looking for. Good blog, I like it, so give me more details about this article.

  16. You can always use the relocation entries to backtrace the function calls. Use the -r option.
