[SM-G9750] Random root reboot fix (Snapdragon S10 & S10e probably, too)

By qwerty12, Senior Member on 31st July 2019, 02:34 PM
WARNING: This won't work currently for the SM-G9730. I need a recovery.img(.lz4) from the latest firmware.

Here's a not-so-widely-tested fix for the spontaneous reboot that occurs after rooting the SM-G9750 and other Snapdragon S10 models.

tulth located this patch. If you read the description of that patch, it mentions a NULL pointer getting dereferenced in find_get_entry (such a thing tends to cause crashes in your average program, so when this happens in the kernel, it's not surprising that a crash and reset is the response). If you look at tulth's last_kmsg, my last_kmsg and G-ThGraf's last_kmsg from a G9730, you'll notice they all have one thing in common: SHTF at smaps_pte_range+0x29c. What's at that location in those devices' kernels? Why, it's only find_get_entry(vma->vm_file. So yeah, it's the same bug, already known to Google, and it's been fixed in their kernel tree since January. The bug is triggered externally by reading /proc/<pid>/smaps_rollup under certain conditions. You might be able to work around this by disabling programs to get more free RAM, but The Only Way To Fix the Underlying Kernel Bug Is To Fix the Kernel Itself™.

We're probably not going to see a new kernel update until (if?) we get an update for the next major version of Android. We Snapdragon S10* users already have an older kernel compared to Exynos S10 owners (our 4.14.78 vs. their 4.14.85) and it's probably because of that they don't see this bug. So I think the idea of Samsung fixing this is a non-starter. While I did manage to build an SM-G9750 kernel from source (their instructions leave a lot to be desired) with that patch applied, I could not get my phone to boot the result.

I am not a programmer, but I know just enough to get the ball rolling and provide the fix that the aforementioned patch implements, in an opcode form that can be applied to the existing kernel on the phone.
While I've not half-arsed it in the sense I took the easy way out (always having mss->check_shmem_swap set to zero is an easy one-liner workaround; however, freeing of unneeded SHM pages wouldn't happen, eventually causing your phone to crawl to a halt), I am not familiar with assembly language for any platform at all and, as such, I could not find a way to free up enough space in the show_smap function. So I jump quite far out into a chunk of the .text section where it's full of zeroes. I don't know anything about the ELF format to be able to tell you why this section of zeroes exists - I make the probably-wrong assumption it's perhaps a requirement of the ELF format if a linker that's very good at producing optimised code still bothers to output that or it's optimisation by alignment - but it's there and it's a good place to add extra code to on account of, you know, being empty and marked executable.
As far as I can see, where I have placed the code isn't referenced by anything else at all in the kernel but I can't be 100% certain on that. Nevertheless, I've been testing this on and off (I've had to manually initiate reboots in between for various reasons) myself for the past seven days or so and I've not noticed any adverse effects.
EDIT: Saying that, I think I'll try and move the code into load_module() when I get time because this kernel can't actually load modules (see below) thus much of the code there is pointless.
The risk is yours, should you choose to apply this fix.

I would've liked to have written this as a kernel module, being far easier to maintain, and hooked the relevant smap functions (in a similar vein to flar2's wp_mod and AleksJ's ric_mod) but thanks to the geniuses at Samsung, load_module() will always return early and the compiler accordingly realises it can optimise the function by excising all the code needed to actually load a module - there's no point in keeping unreachable code. Why Samsung bothered turning on mandatory module signing is beyond me because modules will never load! You can see this for yourself: insmod /system/vendor/lib/modules/wil6210.ko will always fail with "Exec format error", and that's a signed module built and shipped by Samsung themselves for their kernel. Anyway.

As long as the kernel version remains the same, it's likely, but not guaranteed, the same patches will work for future software updates from Samsung and all I'll have to do is update the compatibility list. If you try this on any other kernel version, the chances of not being able to boot are very high. The task of maintaining this doesn't enthuse me, but I'll continue to do so out of necessity, for I like having a rooted phone but not one that restarts at the worst of times.

[Attached screenshot: Screenshot_20190731-125642_Termux.jpg]
I know people have reported longer uptimes than that on their phone before having a forced restart, but in my case, my phone has AOD enabled, the latest stable Magisk version installed and is running EdXposed. Before this fix, I've never seen an uptime longer than about 16 hours (usually less), regardless of whether the phone was in use or not, as getting multiple restarts in a day tends to have that effect.

As long as you only write to the recovery partition (and that's the only block device that this guide tells you to write to), you should always be able to use Odin to reflash it to reverse this, the process being somewhat similar to flashing Magisk in the first place but with the notable exception of not needing to factory reset anything. The following flashing routine was adapted from Magisk, so my thanks to topjohnwu.
If someone has the bright idea of sharing their already-patched recovery.img because copying and pasting commands is hard, I'll point out the following: anybody flashing such an image should really make sure they're running the same firmware and Magisk version the image was designed for. (And after reading ianmacd's posts, topjohnwu supposedly doesn't like pre-patched images with Magisk being shared. I'll respect that, and so should you.)

I won't take any responsibility if this damages your phone. Perform the following at your own risk. If you agree, then:
  1. If you haven't already, root the phone with Magisk. Make sure to keep a copy of the magisk_patched.tar somewhere on your computer so you can reflash it with Odin if something goes wrong here. Always make sure Magisk is installed before modifying the recovery partition yourself. If you have a pending software update, install that with Odin and root that first before doing the following.
  2. Set up ADB on your phone and computer
  3. From your computer, adb shell into the phone
  4. Run
    Code:
    uname -r
    Only attempt to apply these patches if you get 4.14.78-16509050 back. For an older version, the bottom of this post has previous patches that may or may not apply. Or just update your phone.
  5. Run
    Code:
    su
    and then
    Code:
    rm -rf /data/local/tmp/q12kpwrk ; mkdir /data/local/tmp/q12kpwrk && cd /data/local/tmp/q12kpwrk
  6. Run
    Code:
    mkdir recovery && cd recovery
  7. Find the recovery partition on your phone by running:
    Code:
    recovery_blk="`readlink -f /dev/block/by-name/recovery`" ; [ -b "$recovery_blk" ] || echo "Eh, something's off here. Don't continue"
  8. Dump it to a file by running:
    Code:
    dd if="$recovery_blk" of=recovery.img
  9. Extract the kernel by running:
    Code:
    /data/adb/magisk/magiskboot unpack recovery.img || echo "Stop! Do not continue!"
    If you see the warning message again on a new line, then stop.
    Otherwise, if all went well with the step above (the message "Kernel is uncompressed or not a supported compressed type!" can be safely disregarded), then note that for any of these patches, if you don't get any matches or get more than one, then do not continue any further. Don't selectively apply any of these patches; it's all or nothing.
  10. Apply the first patch by running:
    Code:
    /data/adb/magisk/magiskboot hexpatch kernel F7030032895240F9F64F00F9 F7030032FD10F997F64F00F9
  11. Run
    Code:
    /data/adb/magisk/magiskboot hexpatch kernel 02000014C02E00F9E1630191 02000014ED10F997E1630191
  12. If you have an SM-G9750/Snapdragon S10+: run
    Code:
    /data/adb/magisk/magiskboot hexpatch kernel F30300AAA1010035F40313AA750640F9890E41F83F7500F103010054AA02098BC10501B0407100D121B83191 F30300AA0D000014895240F9DF420239C0035FD600000000D22E40F94E02008BCE2E00F9C0035FD621B83191
    OR if you have an SM-G9730/Snapdragon S10: there is currently no patch. Feel free to send me a recovery.img from the latest firmware and I'll adapt it
    OR if you have an SM-G9700/Snapdragon S10e (thanks to Laikar_ for the recovery.img and testing): run
    Code:
    /data/adb/magisk/magiskboot hexpatch kernel F30300AAA1010035F40313AA750640F9890E41F83F7500F103010054AA02098BA10501D0407100D121B81D91 F30300AA0D000014895240F9DF420239C0035FD600000000D22E40F94E02008BCE2E00F9C0035FD621B81D91
  13. Have the patched kernel placed into a new recovery image, new-boot.img, by running:
    Code:
    /data/adb/magisk/magiskboot repack recovery.img || echo "Stop! Do not continue!"
  14. Check that new-boot.img isn't somehow larger than the recovery partition itself (if "Do not continue!" is printed, stop) by running
    Code:
    [ `stat -c '%s' "new-boot.img"` -gt `blockdev --getsize64 "$recovery_blk"` ] && echo "Do not continue!"
  15. Flash the new recovery image by running
    Code:
    cat new-boot.img /dev/zero >"$recovery_blk" 2>/dev/null
  16. Run
    Code:
    sync ; sync ; sync ; reboot recovery
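
For the "exactly one match" rule in step 9, a rough way to count matches yourself before running each hexpatch is sketched below. This is only an illustration and not part of the tested procedure: count_hex_matches is a made-up helper name, it assumes od, tr and awk are available (for example on a computer, after copying the unpacked kernel file over with adb pull), and it may be slow on a file as large as a kernel.

```shell
# Hypothetical helper, not part of magiskboot: count how many times a hex
# byte pattern occurs in a file, counting only byte-aligned matches.
count_hex_matches() {
    file="$1"
    # od prints lowercase hex, so normalise the pattern to match.
    pat=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    od -An -v -tx1 "$file" | tr -d ' \n' | awk -v pat="$pat" '
        { s = s $0 }
        END {
            n = 0; pos = 1
            while ((i = index(substr(s, pos), pat)) > 0) {
                start = pos + i - 1
                # Two hex digits per byte: an odd 1-based index is a byte boundary.
                if (start % 2 == 1) n++
                pos = start + 1
            }
            print n
        }'
}
```

For example, count_hex_matches kernel F7030032895240F9F64F00F9 should print 1 on an unpatched kernel if the first patch is going to apply cleanly; any other number means stop.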


If the phone boots again, great! If you're stuck at the Samsung-only logo that fades in and out for many minutes, just restart the phone again whilst holding the recovery button combo to boot into Android with Magisk activated like normal.
You can rm -rf the /data/local/tmp/q12kpwrk folder afterwards to get some space back.
If your phone keeps restarting, or you automatically get put into semi-bootloader flashing mode, hold the bootloader button combo to get to the blue-background downloading mode and reflash magisk_patched.tar (and HOME_CSC) with Odin. If you didn't keep said file or a Magisk-patched recovery.img you can tar up with 7-Zip and get Odin to flash as AP, you'll need to download the latest firmware for your SM-G9750 with Frija or similar, reflash that and then follow the instructions to root your phone again with Magisk.

If you do get a reboot after applying this, looking at /proc/last_kmsg will indicate if it's something to do with this patch or something else entirely.
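
A quick way to look for the crash signature described at the top of this post is sketched below. check_last_kmsg is a made-up helper name, and it assumes the log mentions smaps_pte_range or find_get_entry by name, as the last_kmsg files referred to above do. Not finding the signature only tells you this particular bug probably wasn't the cause; it doesn't clear the patch of blame.

```shell
# Hypothetical helper: report whether a kernel log contains the crash
# signature of the smaps bug (defaults to /proc/last_kmsg, needs root there).
check_last_kmsg() {
    log="${1:-/proc/last_kmsg}"
    if grep -q -e 'smaps_pte_range' -e 'find_get_entry' "$log"; then
        echo "smaps crash signature found"
    else
        echo "no smaps crash signature found"
    fi
}
```

Run check_last_kmsg from a root shell on the phone, or pass it the path of a saved copy of the log.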

Q&A:

Q: Will I have to reapply this if I update Magisk from Magisk Manager with a direct install?
A: No.

Q: Will I have to reapply this if I update the phone's firmware?
A: Yes, but check the new kernel's version first and see if it's listed in the compatibility section. If not, then you'll need to wait for an update to this fix. And remember to make sure that Magisk is installed first before modifying the recovery partition yourself.
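
As a small sketch of that version check: the function below maps the output of uname -r to the patch sets listed in this post. patch_set_for_kernel is a made-up name, and the two version strings are simply the ones this post mentions; anything else is treated as unsupported.

```shell
# Hypothetical helper: map a kernel release string to a patch set in this post.
patch_set_for_kernel() {
    case "$1" in
        4.14.78-16509050) echo "current patches" ;;
        4.14.78-16082828) echo "patches for older kernels" ;;
        *) echo "unsupported" ;;
    esac
}
```

On the phone you would call it as patch_set_for_kernel "$(uname -r)" and only carry on if it doesn't print unsupported.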

Q: I don't want to wait hours to see if my phone will restart out of the blue. How can I test for this bug?
A: As a variation on the steps to reproduce here, you can do this:
Code:
su
dd if=/data/media/0/AP_G9750ZHU1ASF1_CL16082828_QB24224470_REV00_user_low_ship_MULTI_CERT_meta_OS9.tar.md5 of=/dev/shm # or any very large file (3-4 GB, /dev/urandom might work). This fills up the allocated space for shared memory
cat /proc/*/smaps_rollup
If your kernel isn't patched, your phone will certainly restart. (Of course, if it doesn't restart, you should probably run reboot recovery anyway, because a full SHM isn't really conducive to a well-running Android session.)

Q: Do you have any other kernel patches?
A: Just the one, only tested on the SM-G9750, and it seems to not be needed at all - it has no bearing on this specific reboot issue anyway. This one disables one aspect of RKP. Again, I don't think this is actually needed on the S10+, but Magisk still attempts to patch for this issue indiscriminately (probably for the benefit of older devices), although its patch will not apply to our kernel.
Code:
/data/adb/magisk/magiskboot hexpatch kernel 1FA50F7143010054491540B93FA50F71E30000544B0940B97FA50F71830000544A1940B95FA10F7168090054 1FA10F71810A0054491540B93FA10F71200A00544B0940B97FA10F71C00900544A1940B95FA10F7161090054
Q: Are you a dirty GPL violator, qwerty12?
A: No! What I am providing is the compiled form of the patch linked to in the beginning of this thread. If you want to understand what this does in lovely C, just look at that patch. Of course, I have to deal with this on the assembler level, so there is no source per se; just dump all the hex strings into an online disassembler. The first two magiskboot hexpatch invocations each replace an existing instruction with a jump into the new code I add. The third hexpatch invocation adds the additional code implementing the patch: the original replaced instruction is executed, along with the code I added to set mss->check_shmem_swap to zero before vma->vm_file is checked for != NULL, and to make shmem_swapped get added to mss->swap instead of replacing it.
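
If you'd rather see those jumps without an online disassembler, here is a minimal sketch that decodes just the branch instructions. decode_branch is a made-up helper name. It relies on the AArch64 encoding where the top six bits are 000101 for B and 100101 for BL, followed by a signed 26-bit offset counted in 4-byte words, and it assumes you pass it the eight hex digits exactly as they appear in the hexpatch strings (little-endian byte order).

```shell
# Hypothetical helper: decode an AArch64 B/BL instruction given as the eight
# hex digits found in a hexpatch string (bytes in little-endian order).
decode_branch() {
    le="$1"
    # Split into the four bytes as stored in the kernel image.
    b0=$(printf '%s' "$le" | cut -c1-2)
    b1=$(printf '%s' "$le" | cut -c3-4)
    b2=$(printf '%s' "$le" | cut -c5-6)
    b3=$(printf '%s' "$le" | cut -c7-8)
    # Reverse them to get the 32-bit instruction word.
    word=$((0x$b3$b2$b1$b0))
    op=$(( (word >> 26) & 0x3F ))
    imm=$(( word & 0x3FFFFFF ))
    # imm26 is signed: values with bit 25 set are negative.
    if [ "$imm" -ge $((0x2000000)) ]; then
        imm=$(( imm - 0x4000000 ))
    fi
    case "$op" in
        5)  echo "b $(( imm * 4 ))" ;;
        37) echo "bl $(( imm * 4 ))" ;;
        *)  echo "not a B/BL instruction" ;;
    esac
}
```

For instance, decode_branch FD10F997 (the four bytes the first hexpatch writes) prints bl -1817612, i.e. a call to code 1817612 bytes before the patched instruction, while decode_branch 895240F9 (the instruction it replaces) prints not a B/BL instruction.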

Patches for older kernels:


4.14.78-16082828:
  • Use Magisk Manager to install the Busybox Magisk module. No, this is not optional. You can use a version of Busybox from another source, but note that this is the version I have personally tested all this with. Restart your phone anyway if you already have it installed; you want your phone's running state to be as fresh as possible to avoid the possibility of running into this bug while attempting to fix it.
  • Code:
    /data/adb/magisk/magiskboot hexpatch kernel F7030032895240F9F64F00F9 F70300327ED15494F64F00F9
  • Code:
    /data/adb/magisk/magiskboot hexpatch kernel 02000014C02E00F9E1630191 020000146ED15494E1630191
  • Code:
    printf '\x89\x52\x40\xF9\xDF\x42\x02\x39\xC0\x03\x5F\xD6\x00\x00\x00\x00\xD2\x2E\x40\xF9\x4E\x02\x00\x8B\xCE\x2E\x00\xF9\xC0\x03\x5F\xD6' | busybox dd of=kernel bs=1 seek="$((0x017F9AAC + 20))" conv=notrunc
    The magiskboot hexpatch equivalent of this was too large, so I settled for writing to a hard coded offset.
 
 
By Vuska, Senior Member on 31st July 2019, 08:09 PM | #2
I have random reboot... will try this patch tomorrow.


Sent from my SM-G9750 using Tapatalk
By Vuska, Senior Member on 1st August 2019, 05:01 AM | #3
Hi... I've already applied your patches. I think it was successful, because I didn't get any errors and the phone booted normally after the last command.

So... do I just have to wait and see if a random reboot appears? *to test*

Thank you... will report in about 3 days
By qwerty12 (OP), Senior Member on 1st August 2019, 01:16 PM | #4
Hi,

Quote:
Originally Posted by Vuska

So... do I just have to wait and see if a random reboot appears? *to test*

You can run the commands under "Q: I don't want to wait hours to see if my phone will restart out of the blue. How can I test for this bug?" in the first post. If your phone restarts automatically when running cat, then your phone is still susceptible to restarting itself during use.

If it doesn't restart, then you need to run reboot recovery yourself immediately, but it means the fix was successfully applied.
By N1ldo, Member on 1st August 2019, 06:40 PM | #5
PS D:\S10+\ADB platform-tools> ./adb devices
List of devices attached
R28M31K3DNZ device

PS D:\S10+\ADB platform-tools> ./adb shell
beyond2q:/ $ su
Permission denied
1|beyond2q:/ $

?????
By Vuska, Senior Member on 2nd August 2019, 05:35 AM | #6
Quote:
Originally Posted by N1ldo

PS D:\S10+\ADB platform-tools> ./adb devices
List of devices attached
R28M31K3DNZ device

PS D:\S10+\ADB platform-tools> ./adb shell
beyond2q:/ $ su
Permission denied
1|beyond2q:/ $

?????

Have you already installed Busybox via Magisk? Also, a pop-up will appear on your device requesting access from the computer. Accept it.

Have you already enabled USB debugging in the developer menu?

Permission denied... strange... it's already rooted, right?

Sent from my SM-G9750 using Tapatalk
By N1ldo, Member on 2nd August 2019, 11:14 AM | #7
Quote:
Originally Posted by Vuska

Have you already installed Busybox via Magisk? Also, a pop-up will appear on your device requesting access from the computer. Accept it.

Have you already enabled USB debugging in the developer menu?

Permission denied... strange... it's already rooted, right?

Sent from my SM-G9750 using Tapatalk



Yes.
As you can see in the screenshots below.
I tried installing another Busybox too.
Attached Thumbnails:
Screenshot_20190802-110715_Settings.jpg
Screenshot_20190802-110533_Magisk Manager.jpg
Screenshot_20190802-110546_Magisk Manager.jpg
Screenshot_20190802-110617_One UI Home.jpg
By qwerty12 (OP), Senior Member on 2nd August 2019, 02:28 PM | #8
Quote:
Originally Posted by N1ldo

beyond2q:/ $ su
Permission denied
1|beyond2q:/ $

?????

Check your Magisk settings to see if you haven't turned off ADB superuser access and your apps list for a denied Shell entry.
By N1ldo, Member on 2nd August 2019, 05:43 PM | #9
Quote:
Originally Posted by qwerty12

Check your Magisk settings to see if you haven't turned off ADB superuser access and your apps list for a denied Shell entry.

Thank you all...
Yes, Shell had been denied root access in Magisk's application list.
By Vuska, Senior Member on 4th August 2019, 10:31 AM | #10
3 days now... I can say it's successfully fixed.
Thank you.

I hope you'll update this too when new firmware arrives,

because I don't understand what some of the code means... I just follow along and copy-paste.



Sent from my SM-G9750 using Tapatalk
By FlatOutRU, Senior Member on 8th August 2019, 02:05 PM | #11
*ASG7 firmware is out