@mamaich: That is fantastic. For myself, I'd usually recommend using the Clang C/C++ parser for this kind of thing; it's much better documented and easier to extend than GCC, and also has a very clear compilation pipeline that makes it easy to pull out the specific layer of the process that you want to use and pipe it into another codebase. TCC (Tiny C Compiler, originally by Fabrice Bellard*) is another possibility, though I don't know if it supports all the calling convention "keywords" used in MSVC sources correctly (which you would obviously need). It's smaller than Clang though, which may make using it for a specialized purpose easier.
As for writing your own DynRec engine, that would be an incredible project - one which I would truly love to see, and which I would strongly recommend that you open-source for collaborative development - but I'm glad to see you've got a good plan for compatibility first. While gaming would be (already is, actually - I love HoMM3 as well) a great achievement, most of the programs that I want to run are simply small utilities that were never open-sourced, or possibly even some that are too much hassle to port. With those, perf is much less of an issue. Maybe I'm in the minority there, though.
Regarding the project itself, self-modifying code or even anything using JIT compilation sounds very tricky (offsets constantly changing due to different instruction sizes, probably needing to use copy-on-write for modifying code and then recompiling the written instructions and fixing the offsets), but it occurs to me that one of the huge improvements it could give is to take advantage of the processor-specific optimizations already present on the Tegra 3 (and any other chips you target). Even with a nominally RISC ISA (like ARM), some instructions will be more expensive than others, and using more efficiently selected instructions may improve speed over straight transliteration of the x86 code.
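To make the offset problem concrete, here's a very rough Python sketch of the bookkeeping I have in mind. To be clear, this is my own illustration and not how win86emu does anything: the function names, the tuple layout, and the byte sizes are all made up. It just shows that once translated instructions differ in size from the originals, every branch target has to go through a guest-to-host offset map, and that map has to be rebuilt whenever the guest code is rewritten.

```python
# Hypothetical sketch: none of these names or sizes come from win86emu.
# Each guest (x86) instruction is described as
#   (guest_offset, host_size_in_bytes, branch_target_or_None)
# where host_size is how many bytes of ARM code it was translated into.

def build_offset_map(instrs):
    """Map every guest instruction offset to its offset in the translated block."""
    mapping = {}
    host_off = 0
    for guest_off, host_size, _target in instrs:
        mapping[guest_off] = host_off
        host_off += host_size
    return mapping, host_off


def resolve_branches(instrs, mapping):
    """Return (host offset of branch, host offset of its target) pairs."""
    return [
        (mapping[guest_off], mapping[target])
        for guest_off, _host_size, target in instrs
        if target is not None
    ]


# Three guest instructions at x86 offsets 0, 2 and 7; the last jumps back to 2.
block = [(0, 4, None), (2, 12, None), (7, 4, 2)]
mapping, total = build_offset_map(block)
fixups = resolve_branches(block, mapping)

# If the guest overwrites the instruction at offset 2 and its translation grows
# to 16 bytes, everything after it moves, so the map and fixups are rebuilt.
patched = [(0, 4, None), (2, 16, None), (7, 4, 2)]
new_mapping, new_total = build_offset_map(patched)
new_fixups = resolve_branches(patched, new_mapping)
```

A real engine would presumably patch only the affected block rather than rebuild everything, but the principle (never trust a guest offset inside translated code) is the same.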
@SixSixSevenSeven: Please don't take the "90 MHz" statement as gospel. For one thing, it's an incredibly rough approximation based largely on numbers pulled from my ass. For another, that was based on "will it run SC?" not "what is the actual emulated speed?" so while it's a reasonable statement that "it will probably be possible to run SC even if not quickly, and the requirements for AoE are about the same", it's probably not accurate to say "the emulated CPU runs at this speed."
Also, CPU clock speeds only tell part of the story. As I mentioned above, some instructions are faster than others, and this largely depends on the CPU being used. The Core iN series of Intel CPUs often runs at a lower clock rate, per core, than most Pentium 4 chips did. However, they are far, far faster than the P4 at real-world operations, because the P4 architecture ("NetBurst") was very poorly optimized for certain operations, such as branching; if it "guessed" wrongly at the outcome of an "if" statement, it had to throw away potentially dozens of clock cycles of work to refill the pipeline with the other branch's instructions. Since the performance of win86emu isn't going to exactly duplicate any Intel or AMD CPU, you can't really make a statement of "it runs at the speed of an X CPU at Y MHz" even if we had a proper way to measure that.
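If you want to see why the clock number alone is misleading, here's a back-of-the-envelope model in Python. It uses the usual textbook simplification (effective cycles per instruction = base CPI + branch fraction x mispredict rate x penalty), and every number in it is invented for illustration; none of them are measurements of a P4, a Core chip, or win86emu.

```python
# Toy model only: all figures below are made-up illustrative values.

def instructions_per_second(clock_hz, base_cpi, branch_fraction,
                            mispredict_rate, penalty_cycles):
    """Estimate throughput from clock rate and average cycles per instruction."""
    # Each mispredicted branch adds penalty_cycles of wasted work on average.
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return clock_hz / cpi


# Hypothetical deep-pipeline chip: higher clock, costly mispredictions.
deep = instructions_per_second(3.0e9, base_cpi=1.0, branch_fraction=0.2,
                               mispredict_rate=0.05, penalty_cycles=20)

# Hypothetical shorter-pipeline chip: lower clock, more work per cycle.
shallow = instructions_per_second(2.4e9, base_cpi=0.5, branch_fraction=0.2,
                                  mispredict_rate=0.05, penalty_cycles=14)

print(f"deep pipeline:    {deep / 1e9:.2f} billion instr/s")
print(f"shallow pipeline: {shallow / 1e9:.2f} billion instr/s")
```

With these made-up inputs the lower-clocked chip comes out well ahead, which is the whole point: a single "MHz" figure for an emulated CPU hides most of what actually decides the speed.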
Speaking of which, though, it may be good to get some additional benchmarking tools to run. 7Z uses a fairly simple set of instructions, and while it's interesting as a data point, it's not an ideal indicator of how fast many other types of program (games, audio decoders, OpenGL emulation in software, any kind of AI, anything heavy on floating-point, anything that makes a bunch of system calls through the win86emu stub DLLs, etc.) will run.