Kernel module compatibility problem
A kernel module is generally not compatible across different kernel releases. Therefore, when a module for the exact kernel release is not found, NoDock tries a loosened kernel release to maximize ROM coverage.
Because kernel releases come in many variants, a fall-back mechanism based on a loosened kernel version has been designed so that the app works on more phones and kernels. When the app has no kernel module for the exact kernel release, it forms a loosened kernel release by removing everything from the first '-' character onward, then looks up whether a module for this loosened release exists. If it exists, the user is shown a dialog asking whether to try it. Otherwise, the app does not allow the user to load anything, to prevent crashing.
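A minimal sketch of the loosening step described above, written as plain userspace C. The function name loosen_release and the sample release strings are hypothetical and only illustrate the "cut at the first '-'" rule; this is not NoDock's actual code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `release` into `out`, dropping everything
 * from the first '-' onward.
 * Returns 1 if a suffix was stripped, 0 if the release had no '-',
 * and -1 if `out` is too small to hold the result.
 */
int loosen_release(const char *release, char *out, size_t out_size)
{
    const char *dash = strchr(release, '-');
    size_t len = dash ? (size_t)(dash - release) : strlen(release);

    if (len + 1 > out_size)
        return -1;

    memcpy(out, release, len);
    out[len] = '\0';
    return dash != NULL;
}
```

With a made-up release such as "2.6.32.9-gabcdef0", the loosened release would be "2.6.32.9"; a release without any '-' is returned unchanged, so the caller can tell that no looser lookup is possible.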
If your phone's kernel version is listed in the loosened kernel release table below, you may still try the app. Feel free to contact me if you want a particular kernel release of your phone to be supported.
Available loosened kernel releases:
When the kernel module fails to load, dmesg shows nothing about the cause; not even a vermagic mismatch is shown. Looking into the kernel source reveals that things work differently when CONFIG_MODVERSIONS is set: a check that looks for the __versions section is performed before the vermagic check. objdump -s -j __versions some.ko shows the content of that section as (crc, name) pairs.
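To make the (crc, name) layout concrete, here is a small userspace sketch that decodes one entry from the raw bytes of a __versions section. It assumes a 32-bit little-endian target where each entry is 64 bytes (a 4-byte CRC followed by a 60-byte NUL-padded symbol name), which matches struct modversion_info in 2.6-era kernels as far as I can tell. The names read_version_entry and struct version_entry are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_SIZE 64u  /* assumed size of one entry on a 32-bit target */
#define NAME_LEN   60u  /* ENTRY_SIZE minus the 4-byte CRC */

/* Hypothetical decoded form of one __versions entry. */
struct version_entry {
    uint32_t crc;
    char name[NAME_LEN + 1];
};

/*
 * Decode entry `index` from the raw section bytes.
 * Returns 0 on success, -1 if the index is past the end of the section.
 */
int read_version_entry(const unsigned char *sec, size_t sec_len,
                       size_t index, struct version_entry *out)
{
    const unsigned char *p;

    if (index >= sec_len / ENTRY_SIZE)
        return -1;

    p = sec + index * ENTRY_SIZE;

    /* The CRC is stored little-endian on ARM. */
    out->crc = (uint32_t)p[0]
             | (uint32_t)p[1] << 8
             | (uint32_t)p[2] << 16
             | (uint32_t)p[3] << 24;

    /* The name is NUL-padded; force termination in case it fills the field. */
    memcpy(out->name, p + 4, NAME_LEN);
    out->name[NAME_LEN] = '\0';
    return 0;
}
```

Comparing the decoded CRCs of a module against those of a module built for the target kernel is one way to see which symbols would fail the MODVERSIONS check.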
The investigation (which is related to the "Milestone Overclock failed to load on Droid 2.2" issue) shows that NoDock versions older than v0.11.2a could never have worked on the Droid's stock kernel, because CONFIG_MODVERSIONS is defined in sholes_defconfig.
Now I understand why there are so many refunds of the app (about 50%). I hope that once this is fixed in the next release, Droid users, or any users who found that NoDock did not work for them previously, will give it another try and enjoy using it.
Thanks to Joshua, who reported this issue and provided remote access so that more experiments could be done on the ChevyNo1 kernel on Droid.
From the kernel modules and platforms tested so far, kernel module incompatibility appears to come from changes in module-specific structures (e.g. struct_module -> module_layout) and changes in the EABI (e.g. __aeabi_unwind_cpp_pr0). Incompatible changes also seem to come with an increment of the third version number (e.g. 2.6.29 -> 2.6.32), whereas 2.6.32 is compatible with 2.6.32.9 and 2.6.32.15.
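The observation about the third version number can be expressed as a small comparison. The sketch below treats two releases as candidates for compatibility when their first three numeric components match; same_base_release and parse_base are hypothetical names, and the rule itself is only the pattern observed above, not a guarantee.

```c
#include <stdio.h>

/*
 * Parse the first three numeric components of a release string.
 * Returns how many components were read (3 when all are present).
 */
static int parse_base(const char *release, int v[3])
{
    return sscanf(release, "%d.%d.%d", &v[0], &v[1], &v[2]);
}

/*
 * Returns 1 when both releases share the same first three numbers
 * (e.g. 2.6.32 and 2.6.32.9), 0 otherwise or when either string
 * has fewer than three numeric components.
 */
int same_base_release(const char *a, const char *b)
{
    int va[3], vb[3];

    if (parse_base(a, va) != 3 || parse_base(b, vb) != 3)
        return 0;

    return va[0] == vb[0] && va[1] == vb[1] && va[2] == vb[2];
}
```

Under this rule 2.6.32 matches 2.6.32.9 and 2.6.32.15, while 2.6.29 does not match 2.6.32.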
Therefore, the loosened kernel version should only need to consider the first 3 numbers. To cater for kernels compiled with MODVERSIONS, all modules should be built with MODVERSIONS too. Finally, the module should define a vermagic field large enough to hold any possible target vermagic, so that the vermagic string can be patched at runtime on the target platform.
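The runtime vermagic patching idea can be sketched as follows: the module is built with an oversized, NUL-padded vermagic field, and the target's vermagic is written over it only if it fits. VERMAGIC_CAPACITY (128 here) and patch_vermagic are hypothetical, and the sketch works on a plain buffer holding just the vermagic value; locating that field inside a real .ko is left out.

```c
#include <stddef.h>
#include <string.h>

#define VERMAGIC_CAPACITY 128  /* hypothetical reserved size, incl. NUL */

/*
 * Overwrite the reserved vermagic field in place with `target`.
 * The field keeps its size, so nothing else in the module moves.
 * Returns 0 on success, -1 if `target` does not fit (field untouched).
 */
int patch_vermagic(char field[VERMAGIC_CAPACITY], const char *target)
{
    size_t len = strlen(target);

    if (len >= VERMAGIC_CAPACITY)
        return -1;

    /* Clear the old padding first so no stale bytes follow the new string. */
    memset(field, 0, VERMAGIC_CAPACITY);
    memcpy(field, target, len);
    return 0;
}
```

Because the field size never changes, the patch is a simple in-place overwrite; a target vermagic that is too long is rejected rather than truncated, since a truncated string would not match anyway.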
This is a provisional conclusion drawn from testing done by users, the kernel source code, and observation of existing patterns. Bear in mind that there is still a risk of crashing.
Recently, a user of a Samsung Fascinate (SCH-I500) had a problem using NoDock. Thanks to his kind cooperation and resourcefulness (he found the Android USB driver for the phone after I failed to download it from the official site) in providing remote access through TeamViewer, dmesg showed that a preempt-related symbol was unresolved.
Aha! The module-related options in Samsung's defconfig are very different from those of Motorola and HTC. Preemption is disabled! Also, with modversions enabled, the CRCs of some kernel functions are different. However, once every use of preempt_* is removed, this mismatch neither prevents the module from loading on the phone nor causes a reboot. Miracle!
When I have time in the future, I should find out what goes into the CRC and why the differences do not affect the execution of the module, to gain better insight into kernel module compatibility across different phones and machines.
A Defy user emailed me that loading the NoDock module caused a reboot. The panic log he obtained (/data/dontpanic/apanic_console) and sent to me shows an Oops 80d. Googling turned up someone saying that this is a section write error, and tracing the kernel source code seems to confirm it. Searching through the memory mapping code in the kernel finally led to arch/arm/mm/mmu.c. A diff of this file against the Milestone's shows that something has been added on the Defy to create a read-only kernel text mapping. My guess is that it protects kernel code from unexpected corruption; the side effect is that it prevents NoDock from working.
Plugging in the create_mapping(...) function from mmu.c makes it possible to alter the section protection flag and reproduce the issue on Milestone. However, the section protection does not seem to be refreshed until the next insmod, which is bad for NoDock.
After some more kernel surfing, I found probe_kernel_write(...). Writing with this function succeeded within the same load. A further experiment that left the pmd unaltered showed that this function can somehow skip the section protection check. Final trial and error pinpointed that the set_fs(KERNEL_DS) call inside probe_kernel_write(...) does the magic!
In NoDock v1.0.5.4, set_fs(KERNEL_DS) is called before writing to kernel text, which solves the kernel panic issue on Defy.
The SCH-I500 worked perfectly with the 2.6.29 kernel, but the recent 2.6.32 release caused a panic when trying to load NoDock. Rebuilding a ko for that particular kernel resolved the issue. It turns out that the Makefile of this phone defines more things than that of Motorola's Droid series. As a result, the module layout is larger (0x162 bytes) than on Milestone (0x144 bytes). The command below was used to check this size:
arm-eabi-objdump -s -j .gnu.linkonce.this_module x.ko
As seen from kernel/module.c:
mod = (void *)sechdrs[modindex].sh_addr;
the module object is taken directly from the this_module section. So any access to it beyond the size of that section will corrupt memory at other locations. And since the sh_size member of this section is not used, a larger this_module should not matter.
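The size check done with objdump above can also be sketched in code. The function below walks the section headers of a 32-bit ELF image held in memory and returns the sh_size of a named section such as .gnu.linkonce.this_module. It assumes a Linux host with <elf.h> and an image whose byte order matches the host; section_size is a hypothetical name, and this is only an illustration, not a replacement for objdump.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return sh_size of the section called `name` in a 32-bit ELF image,
 * or -1 if the image is malformed or the section is not present.
 * Assumes the image byte order equals the host byte order.
 */
long section_size(const unsigned char *img, size_t len, const char *name)
{
    Elf32_Ehdr eh;
    Elf32_Shdr strtab, sh;
    size_t i, name_len = strlen(name);

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    if (eh.e_shentsize != sizeof sh || eh.e_shstrndx >= eh.e_shnum)
        return -1;
    if (eh.e_shoff > len || (size_t)eh.e_shnum * sizeof sh > len - eh.e_shoff)
        return -1;

    /* The section name string table is itself described by a section header. */
    memcpy(&strtab, img + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof sh,
           sizeof sh);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return -1;

    for (i = 0; i < eh.e_shnum; i++) {
        memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_name >= strtab.sh_size)
            continue;
        /* Compare including the NUL, without reading past the table. */
        if (name_len < strtab.sh_size - sh.sh_name &&
            memcmp(img + strtab.sh_offset + sh.sh_name, name,
                   name_len + 1) == 0)
            return (long)sh.sh_size;
    }
    return -1;
}
```

Running this over the bytes of two .ko files built for different phones would give the two this_module sizes to compare, the same numbers read off the objdump output.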
The next experiment will be to load Samsung's ko on Milestone, since I don't have a Samsung phone on which to repeat the panic experiment. This is to test the idea that, apart from the major kernel version, the only thing that matters is the size of this_module.
Unfortunately, Samsung's ko has an unresolved symbol on Milestone: "nodockm: Unknown symbol __gnu_mcount_nc". Perhaps the same Samsung user should be contacted again to perform this experiment. :P
It has been tested that the this_module section can be enlarged by modifying modpost.c. Further experiments have to be done on a real device for a PoC.
The kernel panic is caused by a module layout that is too small. Enlarging it solves the kernel panic issue; however, the exit function is then not found and the module cannot be unloaded.