Are you insisting there is a microprocessor on the chip that executes FW even though the implementation is unknown to you?
Although you didn't direct this to me, I'll reply.
The implementation is indeed unknown to us, so we are all guessing. You said your guess is that it's implemented as random logic, just like you did for a QPI interface implementation in the past. I don't agree with this guess, but likewise you don't agree with mine. There wouldn't be any problem if it weren't for the several pages of flame posts that follow. Those make it difficult for others to find real information here.
Therefore I'll bite and give reasons why I don't agree with your guess. Please defend your guess on a technical level, not a personal one, so that the thread stays on topic. I certainly don't want to attack you (or anyone else). If it appears so, it is not intentional. English is not my native language and this can cause all kinds of misunderstandings.
1. All (useful!) standards are made to solve real-world problems, with real-world tools. Often a reference implementation is made (public or not) and the standard is drafted with data observed from the reference implementation. QPI is a high-speed bus with low latency. Response time is defined in clock cycles. There is no way to implement QPI in software anytime soon (except when you don't need to interface to real hardware on the other side, as in a simulator for example). Therefore QPI is clearly meant to be implemented in hardware, just like you did.
The EMMC spec is different. It does mandate timings, but only those of the IO interface itself are specified in clock cycles. All other timings are given in ms and some even in seconds. Those who wrote the spec didn't have a big and fast Verilog FSM in mind. The mandated features are easy to implement in software while still meeting the timing spec. All you need is a peripheral that implements the IO interface. This is very much like the situation with USB, where an IO peripheral implements the "data pump", and all the rest is usually done in software.
Along the same lines, there is another hint apart from the timing: feature diversity. Since QPI is clearly meant to be implemented in hardware, it won't request extra features that can't be implemented well in hardware. EMMC, on the other hand, does require lots of stupid extra stuff that vastly complicates a pure logic implementation, for almost no benefit.
For example, the EMMC spec includes a security feature that is probably not even used in many devices with EMMC. Yet this feature makes it necessary to implement an HMAC with SHA256. Adding those in firmware is easy; it takes just a few extra KB of code. However, a direct hardware implementation suffers a lot from this (mostly unnecessary) burden. It will take lots of extra silicon area (and IP licences) for a feature that is rarely used.
For these and similar reasons I believe that EMMC is meant to be implemented in software, not hardware. That doesn't make it impossible to do it in hardware alone, but it certainly won't be cost effective. The utilization of the various (mandatory) circuits will be too low.
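To illustrate how small the HMAC requirement from point 1 is when done in software, here is a minimal Python sketch. It is not taken from any EMMC firmware or from the spec: the function names, the key and the frame contents are hypothetical placeholders, and a real controller would of course use its own C or assembly routine rather than Python's standard library.

```python
import hashlib
import hmac


def frame_mac(key: bytes, frame: bytes) -> bytes:
    """Return the HMAC-SHA256 tag (32 bytes) of a frame under the given key."""
    return hmac.new(key, frame, hashlib.sha256).digest()


def mac_matches(key: bytes, frame: bytes, received_mac: bytes) -> bool:
    """Check a received tag against the expected one in constant time."""
    return hmac.compare_digest(frame_mac(key, frame), received_mac)


if __name__ == "__main__":
    # Hypothetical key and payload, for illustration only.
    key = bytes(range(32))
    frame = b"example frame payload"
    tag = frame_mac(key, frame)
    print(tag.hex())
    print(mac_matches(key, frame, tag))
```

The whole feature reduces to one hash primitive plus a few lines of glue, which is the kind of thing that costs little in code but a dedicated block in silicon.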
2. Samsung clearly says that their EMMC devices use "firmware", even giving their firmware names and version numbers. For example "VYL00M" 0x25.
3. The official Android kernel patch mentioned earlier contains comments which reveal information. It says "We can patch the firmware in such chips", "the wear leveling code can insert 32 bytes", "Patch the firmware in certain Samsung emmc chips", "fix bug in wear leveling firmware", "patch the firmware in the internal sram", etc.
Clearly there is firmware (code) running in the device.
4. We know two 32-bit words of a valid firmware image, which are contained in the same Android kernel patch. One of them decodes to two valid Thumb instructions. Coincidence?
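Anyone who wants to check point 4 can do so with a few lines of Python. The sketch below makes two assumptions of mine: that the word in question is 0xD20228FF (the value from the patch), and that it sits in memory in little-endian order, so the low halfword comes first. The decoder is deliberately tiny and only knows two 16-bit Thumb encodings (MOV/CMP/ADD/SUB with an 8-bit immediate, and the conditional branch); everything else returns None. The function names are mine.

```python
import struct
from typing import Optional, Tuple

# Thumb "move/compare/add/subtract immediate" opcodes, indexed by bits 12:11.
FORMAT3_OPS = ("mov", "cmp", "add", "sub")
# Condition codes 0..13; 14 and 15 are not conditional branches.
CONDITIONS = ("eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le")


def split_halfwords(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into two halfwords in little-endian memory order."""
    return struct.unpack("<HH", struct.pack("<I", word))


def decode_thumb16(hw: int) -> Optional[str]:
    """Decode a 16-bit Thumb halfword, or return None if it is not covered here."""
    if hw >> 13 == 0b001:                 # 001 op Rd imm8
        op = FORMAT3_OPS[(hw >> 11) & 0x3]
        rd = (hw >> 8) & 0x7
        return f"{op} r{rd}, #{hw & 0xFF}"
    if hw >> 12 == 0b1101:                # 1101 cond soffset8
        cond = (hw >> 8) & 0xF
        if cond >= 14:
            return None
        offset = hw & 0xFF
        if offset & 0x80:                 # sign-extend the 8-bit offset
            offset -= 0x100
        # Target is relative to the branch address: PC reads as address + 4.
        return f"b{CONDITIONS[cond]} .{4 + 2 * offset:+d}"
    return None


if __name__ == "__main__":
    for hw in split_halfwords(0xD20228FF):
        print(f"{hw:#06x}: {decode_thumb16(hw)}")
```

Under those assumptions the two halfwords are 0x28FF and 0xD202, which come out as a compare of r0 against 255 followed by a conditional branch. That is a very plausible instruction pair for a bounds check.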
5. Also from the patch, we learn something about the internal SRAM size. The patch accesses 0x4dd9c and 0x379a4. Therefore the SRAM is certainly at least 0x163f8 bytes big (the distance between the two addresses). Applying the typical rules about how memory is implemented, it is probably at least 0x80000 bytes big. Whoever put that much SRAM into the device was counting on quite a complex firmware.
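The arithmetic behind point 5 can be reproduced as follows. The "typical rules" are written here as two assumptions of mine: the SRAM is mapped starting at address 0, and its size is a power of two. If either does not hold for this chip, the second number changes. The helper names are mine as well.

```python
from typing import Iterable


def min_span(addresses: Iterable[int]) -> int:
    """Distance between the lowest and highest observed address."""
    addrs = list(addresses)
    return max(addrs) - min(addrs)


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    patched = [0x4DD9C, 0x379A4]          # addresses taken from the kernel patch
    print(hex(min_span(patched)))         # hard lower bound on the SRAM size
    # A 32-bit access at the highest address touches 4 bytes.
    size = next_power_of_two(max(patched) + 4)
    print(hex(size), size // 1024, "KB")
```

The first figure is the hard lower bound (0x163f8 bytes), the second is the 0x80000 bytes (512 KB) estimate.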
6. Considering that there is firmware and it runs from SRAM, the only way to maintain your guess of a pure hardware implementation is to use programmable hardware. CPLDs are not complex enough to require such big firmware images, leaving only FPGAs. It's certainly possible to patch FPGA bitstreams, and indeed they "execute from" (or better: are stored in) SRAM cells. However, all FPGA architectures that I know use configuration frames instead of 32-bit data words and 32-bit aligned memory addresses. Those configuration frames are typically longer, and the population of data bits within the frame is very sparse. To me, the value 0xD20228FF (from the Android kernel patch) doesn't look like a typical piece of FPGA configuration. I would expect something like 0x00110044, or 0x40840001.
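To put a number on the "sparse" impression from point 6, one can simply count the set bits in each word. This is only a rough illustration: a single 32-bit word proves nothing statistically, and the two "expected" values are just my made-up examples from above, not real bitstream data.

```python
def bit_density(word: int) -> float:
    """Fraction of set bits in a 32-bit word."""
    return bin(word & 0xFFFFFFFF).count("1") / 32


if __name__ == "__main__":
    samples = {
        "value from the patch": 0xD20228FF,
        "made-up sparse example 1": 0x00110044,
        "made-up sparse example 2": 0x40840001,
    }
    for label, word in samples.items():
        print(f"{word:#010x}  {bit_density(word):.3f}  {label}")
```

The patch value has 15 of 32 bits set, while the sparse examples have 4 of 32 each.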
7. FPGAs are used to get to market quickly. But they are not cost effective in a mass market, because they contain lots of flexibility (=extra area and layers) that is not needed once the functional requirements are frozen. EMMCs are clearly a mass market product. Including an FPGA with a bitstream size of 512 KB is not the way to earn money. A 512 KB bitstream is about equivalent to an XC3S1400A or an XC4VLX15.
Of course we don't have much official information, but what we have seems to support an embedded CPU core with at least 512 KB of SRAM available to it. For the reasons above, I consider that more likely than any alternative.