Here's a not-so-widely-tested fix for the spontaneous reboot that occurs after rooting the SM-G9750 and other Snapdragon S10 models.
tulth located this patch. If you read the description of that patch, it mentions a NULL pointer getting dereferenced in find_get_entry (such a thing tends to cause crashes in your average program, so when this happens in the kernel, it's not surprising that a crash and reset is the response). If you look at tulth's last_kmsg, my last_kmsg and G-ThGraf's last_kmsg from a G9730, you'll notice they all have one thing in common: SHTF at smaps_pte_range+0x29c. What's at that location in those devices' kernels? Why, it's only find_get_entry(vma->vm_file. So yeah, it's the same bug, already known to Google, and it's been fixed in their kernel tree since January. The bug is triggered externally by reading /proc/<pid>/smaps_rollup under certain conditions. You might be able to work around this by disabling programs to get more free RAM, but The Only Way To Fix the Underlying Kernel Bug Is To Fix the Kernel Itself™.
We're probably not going to see a new kernel update until (if?) we get an update for the next major version of Android. We Snapdragon S10* users already have an older kernel compared to Exynos S10 owners (our 4.14.78 vs. their 4.14.85) and it's probably because of that they don't see this bug. So I think the idea of Samsung fixing this is a non-starter. While I did manage to build an SM-G9750 kernel from source (their instructions leave a lot to be desired) with that patch applied, I could not get my phone to boot the result.
I am not a programmer, but I do know just about enough to get the ball rolling and provide the fix that the aforementioned patch makes, in an opcode form that can be applied onto the existing kernel on the phone.
While I've not half-arsed it in the sense I took the easy way out (always having mss->check_shmem_swap set to zero is an easy one-liner workaround; however, freeing of unneeded SHM pages wouldn't happen, eventually causing your phone to crawl to a halt), I am not familiar with assembly language for any platform at all and, as such, I could not find a way to free up enough space in the show_smap function. So I jump
As far as I can see, where I have placed the code isn't referenced by anything else at all in the kernel but I can't be 100% certain on that. Nevertheless, I've been testing this on and off (I've had to manually initiate reboots in between for various reasons) myself for the past seven days or so and I've not noticed any adverse effects.
EDIT: Saying that, I think I'll try and move the code into load_module() when I get time because this kernel can't actually load modules (see below) thus much of the code there is pointless.
I would've liked to have written this as a kernel module, being far easier to maintain, and hooked the relevant smap functions (in a similar vein to flar2's wp_mod and AleksJ's ric_mod) but thanks to the geniuses at Samsung, load_module() will always return early and the compiler accordingly realises it can optimise the function by excising all the code needed to actually load a module - there's no point in keeping unreachable code. Why Samsung bothered turning on mandatory module signing is beyond me because modules will never load! You can see this for yourself: insmod /system/vendor/lib/modules/wil6210.ko will always fail with "Exec format error", and that's a signed module built and shipped by Samsung themselves for their kernel. Anyway.
As long as the kernel version remains the same, it's likely, but not guaranteed, the same patches will work for future software updates from Samsung and all I'll have to do is update the compatibility list. If you try this on any other kernel version, the chances of not being able to boot are very high. The task of maintaining this doesn't enthuse me, but I'll continue to do so out of necessity, for I like having a rooted phone but not one that restarts at the worst of times.
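Since everything here hinges on the kernel version, a quick check before patching might look like the sketch below. It only compares the base version number; SUPPORTED_BASE is an assumption taken from the 4.14.78 figure mentioned above, not a full compatibility list, and the function names and the sample release string in the comment are made up for illustration.

```shell
# Sketch only: SUPPORTED_BASE is an assumption based on the 4.14.78 figure
# quoted in this post. It is not an authoritative compatibility list.
SUPPORTED_BASE="4.14.78"

# Strip any "-suffix" from a release string.
# A (hypothetical) input such as "4.14.78-16509050" becomes "4.14.78".
kernel_base_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeeds (exit status 0) only if the base version equals SUPPORTED_BASE.
is_supported_kernel() {
    [ "$(kernel_base_version "$1")" = "$SUPPORTED_BASE" ]
}

# Possible use on the phone, checking the running kernel's release string:
#   is_supported_kernel "$(uname -r)" || echo "Different kernel version. Don't continue"
```

A matching base version is no guarantee the hex patterns will match; it's only a cheap first filter before dumping and unpacking anything.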
I know people have reported longer uptimes than that on their phone before having a forced restart, but in my case, my phone has AOD enabled, the latest stable Magisk version installed and is running EdXposed. Before this fix, I've never seen an uptime longer than about 16 hours (usually less), regardless of whether the phone was in use or not, as getting multiple restarts in a day tends to have that effect.
As long as you only write to the recovery partition (and that's the only block device that this guide tells you to write to), you should always be able to use Odin to reflash it to reverse this, the process being somewhat similar to flashing Magisk in the first place but with the notable exception of not needing to factory reset anything. The following flashing routine was adapted from Magisk, so my thanks to topjohnwu.
If someone has the bright idea of sharing their already-patched recovery.img because typing
I won't take any responsibility if this damages your phone. Perform the following at your own risk. If you agree, then:
- If you haven't already, root the phone with Magisk. Make sure to keep a copy of the magisk_patched.tar somewhere on your computer so you can reflash it with Odin if something goes wrong here. Always make sure Magisk is installed before modifying the recovery partition yourself. If you have a pending software update, install that with Odin and root that first before doing the following.
- Set up ADB on your phone and computer
- From your computer, adb shell into the phone
- Run Code:
- Run Code:
rm -rf /data/local/tmp/q12kpwrk ; mkdir /data/local/tmp/q12kpwrk && cd /data/local/tmp/q12kpwrk
- Run Code:
mkdir recovery && cd recovery
- Find the recovery partition on your phone by running: Code:
recovery_blk="`readlink -f /dev/block/by-name/recovery`" ; [ -b "$recovery_blk" ] || echo "Eh, something's off here. Don't continue"
- Dump it to a file by running: Code:
dd if="$recovery_blk" of=recovery.img
- Extract the kernel by running: Code:
/data/adb/magisk/magiskboot unpack recovery.img || echo "Stop! Do not continue!"
Otherwise, if all went well with the step above (the message "Kernel is uncompressed or not a supported compressed type!" can be safely disregarded), then note that for any of these patches, if you don't get any matches or get more than one, then do not continue any further. Don't selectively apply any of these patches; it's all or nothing.
- Apply the first patch by running: Code:
/data/adb/magisk/magiskboot hexpatch kernel F7030032895240F9F64F00F9 F7030032FD10F997F64F00F9
- Run Code:
/data/adb/magisk/magiskboot hexpatch kernel 02000014C02E00F9E1630191 02000014ED10F997E1630191
- If you have an SM-G9750/Snapdragon S10+: run Code:
/data/adb/magisk/magiskboot hexpatch kernel F30300AAA1010035F40313AA750640F9890E41F83F7500F103010054AA02098BC10501B0407100D121B83191 F30300AA0D000014895240F9DF420239C0035FD600000000D22E40F94E02008BCE2E00F9C0035FD621B83191
OR if you have an SM-G9700/Snapdragon S10e (thanks to Laikar_ for the recovery.img and testing): run Code:
/data/adb/magisk/magiskboot hexpatch kernel F30300AAA1010035F40313AA750640F9890E41F83F7500F103010054AA02098BA10501D0407100D121B81D91 F30300AA0D000014895240F9DF420239C0035FD600000000D22E40F94E02008BCE2E00F9C0035FD621B81D91
- Have the patched kernel placed into a new recovery image, new-boot.img, by running: Code:
/data/adb/magisk/magiskboot repack recovery.img || echo "Stop! Do not continue!"
- Check to see if new-boot.img isn't somehow larger than the recovery partition itself by running Code:
[ `stat -c '%s' "new-boot.img"` -gt `blockdev --getsize64 "$recovery_blk"` ] && echo "Do not continue!"
- Flash the new recovery image by running Code:
cat new-boot.img /dev/zero >"$recovery_blk" 2>/dev/null
- Run Code:
sync ; sync ; sync ; reboot recovery
If the phone boots again, great! If you're stuck at the Samsung-only logo that fades in and out for many minutes, just restart the phone again whilst holding the recovery button combo to boot into Android with Magisk activated like normal.
You can rm -rf the /data/local/tmp/q12kpwrk folder afterwards to get some space back.
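The "exactly one match or stop" rule from the patching steps can also be checked by hand before running hexpatch. The following is a rough sketch, not something magiskboot provides: count_hex_matches is a made-up helper, and it assumes od, tr and awk are available in your shell (the Busybox module should cover that). It counts byte-aligned occurrences of a hex pattern in a file.

```shell
# Hypothetical helper: count byte-aligned occurrences of a hex pattern in a file.
#   $1 = file to search (e.g. the unpacked "kernel")
#   $2 = hex pattern, upper or lower case (e.g. F7030032895240F9F64F00F9)
count_hex_matches() {
    # od prints lowercase hex, so normalise the pattern to match.
    pat=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # Turn the whole file into one long hex string, then scan it in awk.
    od -An -v -tx1 "$1" | tr -d ' \n' | awk -v pat="$pat" '
    { s = s $0 }
    END {
        n = 0; pos = 1
        while ((i = index(substr(s, pos), pat)) > 0) {
            off = pos + i - 1       # 1-based position in the hex string
            if (off % 2 == 1) n++   # odd position = starts on a byte boundary
            pos = off + 1           # keep scanning just past this hit
        }
        print n
    }'
}

# Possible use, run in the recovery folder after unpacking and before patching:
#   [ "$(count_hex_matches kernel F7030032895240F9F64F00F9)" = "1" ] || echo "Stop! Do not continue!"
```

This will be slow on a full-size kernel image because it works on one enormous string, but it only needs to run once per pattern.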
If your phone keeps restarting, or you automatically get put into semi-bootloader flashing mode, hold the bootloader button combo to get to the blue-background downloading mode and reflash magisk_patched.tar (and HOME_CSC) with Odin. If you didn't keep said file or a Magisk-patched recovery.img you can tar up with 7-Zip and get Odin to flash as AP, you'll need to download the latest firmware for your SM-G9750 with Frija or similar, reflash that and then follow the instructions to root your phone again with Magisk.
If you do get a reboot after applying this, looking at /proc/last_kmsg will indicate if it's something to do with this patch or something else entirely.
Q: Will I have to reapply this if I update Magisk from Magisk Manager with a direct install?
Q: Will I have to reapply this if I update the phone's firmware?
A: Yes, but check the new kernel's version first and see if it's listed in the compatibility section. If not, then you'll need to wait for an update to this fix. And remember to make sure that Magisk is installed first before modifying the recovery partition yourself.
Q: I don't want to wait hours to see if my phone will restart out of the blue. How can I test for this bug?
A: A variation on the steps to reproduce here, you can do this:
su
dd if=/data/media/0/AP_G9750ZHU1ASF1_CL16082828_QB24224470_REV00_user_low_ship_MULTI_CERT_meta_OS9.tar.md5 of=/dev/shm # or any very large file (3-4 GB, /dev/urandom might work). This fills up the allocated space for shared memory
cat /proc/*/smaps_rollup
Q: Do you have any other kernel patches?
A: Just the one, only tested on the SM-G9750, and it seems to not be needed at all - it has no bearing on this specific reboot issue anyway. This one disables one aspect of RKP. Again, I don't think this is actually needed on the S10+, but Magisk still attempts to patch for this issue indiscriminately (probably for the benefit of older devices), although its patch will not apply to our kernel.
/data/adb/magisk/magiskboot hexpatch kernel 1FA50F7143010054491540B93FA50F71E30000544B0940B97FA50F71830000544A1940B95FA10F7168090054 1FA10F71810A0054491540B93FA10F71200A00544B0940B97FA10F71C00900544A1940B95FA10F7161090054
A: No! What I am providing is the compiled form of the patch linked to in the beginning of this thread. If you want to understand what this does in lovely C, just look at that patch. Of course, I have to deal with this on the assembler level, so there is no source per se; just dump all the hex strings into an online disassembler. The first two magiskboot hexpatch invocations each replace one existing instruction with a jump into the new code I add. The third hexpatch invocation adds the code implementing the patch: the original replaced instruction is executed, followed by the code I added to set mss->check_shmem_swap to zero before vma->vm_file is checked for != NULL. The second jump goes to code that makes shmem_swapped get added to mss->swap instead of replacing it.
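If you'd rather not paste things into an online disassembler, the jump words from the first two hexpatch lines can be decoded with plain shell arithmetic. This is a sketch under the assumption that the replaced word is an AArch64 BL (branch with link) instruction, which is what the top six bits of FD10F997 and ED10F997 indicate; decode_bl is a name I made up for this example.

```shell
# Hypothetical helper: decode an AArch64 BL instruction given as 8 hex digits
# in the little-endian byte order used in the hexpatch strings (e.g. FD10F997).
# Prints the signed branch offset in bytes, or fails if the word is not a BL.
decode_bl() {
    b0=$(printf '%s' "$1" | cut -c1-2)
    b1=$(printf '%s' "$1" | cut -c3-4)
    b2=$(printf '%s' "$1" | cut -c5-6)
    b3=$(printf '%s' "$1" | cut -c7-8)
    # Reassemble the 32-bit word, most significant byte first.
    word=$((0x$b3$b2$b1$b0))
    # BL has the bit pattern 100101 (decimal 37) in bits 31..26.
    # The mask keeps this correct in shells with 32-bit arithmetic too.
    [ $(( (word >> 26) & 0x3F )) -eq 37 ] || return 1
    # The low 26 bits are a signed offset counted in 4-byte instructions.
    imm=$((word & 0x3FFFFFF))
    if [ "$imm" -ge $((0x2000000)) ]; then
        imm=$((imm - 0x4000000))
    fi
    echo $((imm * 4))
}

# Possible use:
#   decode_bl FD10F997
#   decode_bl ED10F997
```

Both words decode to negative offsets, meaning the new code sits earlier in the kernel image than the two places that jump to it. The offset is relative to the address of the BL itself, so you still need the location of each patched instruction to work out the absolute target.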
Patches for older kernels:
- Use Magisk Manager to install the Busybox Magisk module. No, this is not optional. You can use a version of Busybox from another source, but note that this is the version I have personally tested all this with. Restart your phone anyway if you already have it installed; you want your phone's running state to be as fresh as possible to avoid the possibility of running into this bug while attempting to fix it.
/data/adb/magisk/magiskboot hexpatch kernel F7030032895240F9F64F00F9 F70300327ED15494F64F00F9
/data/adb/magisk/magiskboot hexpatch kernel 02000014C02E00F9E1630191 020000146ED15494E1630191
printf '\x89\x52\x40\xF9\xDF\x42\x02\x39\xC0\x03\x5F\xD6\x00\x00\x00\x00\xD2\x2E\x40\xF9\x4E\x02\x00\x8B\xCE\x2E\x00\xF9\xC0\x03\x5F\xD6' | busybox dd of=kernel bs=1 seek="$((0x017F9AAC + 20))" conv=notrunc
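Because that last command writes raw bytes at a fixed offset rather than matching a pattern, it may be worth reading the bytes back before repacking. A minimal sketch, assuming dd, od and tr behave as they do in Busybox; read_hex is a hypothetical helper name.

```shell
# Hypothetical helper: print COUNT bytes of FILE starting at byte OFFSET
# as one line of uppercase hex.
#   $1 = file, $2 = offset in bytes, $3 = number of bytes
read_hex() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null |
        od -An -v -tx1 | tr -d ' \n' | tr 'a-f' 'A-F'
    echo
}

# Possible use after the printf | dd step above:
#   read_hex kernel "$((0x017F9AAC + 20))" 32
```

If the write landed where intended, the output should be the same 32 bytes that were passed to printf, written as one hex string starting with 895240F9.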