diskcorruption on 6.12.y #6492

Open
folkertvanheusden opened this issue Nov 25, 2024 · 9 comments

@folkertvanheusden
Contributor

Describe the bug

Revision 521f2ba of branch rpi-6.12.y causes massive disk corruption when the realtime kernel is enabled.

Steps to reproduce the behaviour

Configure kernel for real-time.

Device(s)

Raspberry Pi Zero 2 W, Raspberry Pi 3 Mod. B

System

raspbian 64bit lite

Logs

Not directly available. If this problem is of interest, I can see if I can reproduce it.
Errors I saw were related to directory-nodes being full.

Additional context

No response

@pelwell
Contributor

pelwell commented Nov 25, 2024

Realtime support is new to 6.12:

  1. Have you had a non-corrupting 6.12 realtime kernel before the indicated commit?
  2. Does a non-realtime build of 6.12 work for you?
  3. Have you had a non-corrupting 6.11 realtime (with patches) kernel?
  4. Have you tried realtime kernels before?

@pelwell
Contributor

pelwell commented Nov 25, 2024

  5. On which medium are you seeing corruptions - SD card or some external storage?

@folkertvanheusden
Contributor Author

  1. no, it was the first I tried
  2. seems to fail as well. see log below.
  3. no
  4. no. the RT patches were only merged in 6.12 I believe.
  5. sd cards (I tried 3 different cards, all 32 GB in size)

Both the realtime and non-realtime 6.12 kernel fail:

[  103.850530] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[  103.850566] EXT4-fs error (device mmcblk0p2): htree_dirblock_to_tree:1083: inode #55844: block 2: comm apt: Directory block failed checksum
[  103.850777] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[  103.850791] EXT4-fs error (device mmcblk0p2): htree_dirblock_to_tree:1083: inode #55844: block 2: comm apt: Directory block failed checksum
[  104.173859] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[  104.173892] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[  104.174135] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[  104.174149] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[  104.236603] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[  104.236680] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[  104.237061] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[  104.237075] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[  104.303747] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[  104.303791] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
[  104.309206] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[  104.309240] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
[  104.316966] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[  104.317007] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
[  104.323162] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[  104.323196] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
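The log above repeats the same two messages many times. As a minimal sketch, assuming the dmesg output has been saved as text, the error lines can be grouped by device, reporting function, inode, and process so the distinct failures stand out. The name `summarise_ext4_errors` and the regular expression are illustrative assumptions, not part of any kernel tooling, and the pattern only covers the error format seen in this log.

```python
import re
from collections import Counter

# Matches error lines in the format seen above, for example:
# [  103.850566] EXT4-fs error (device mmcblk0p2): htree_dirblock_to_tree:1083:
#   inode #55844: block 2: comm apt: Directory block failed checksum
# "EXT4-fs warning" lines are deliberately not matched.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+):\d+: inode #(?P<inode>\d+): .*?comm (?P<comm>[^:]+): "
    r"(?P<msg>.*)$"
)


def summarise_ext4_errors(log_text):
    """Count EXT4 error lines per (device, function, inode, comm, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXT4_ERROR.search(line)
        if match:
            key = (
                match["dev"],
                match["func"],
                int(match["inode"]),
                match["comm"],
                match["msg"].strip(),
            )
            counts[key] += 1
    return counts
```

Calling `summarise_ext4_errors(text).most_common()` on the saved log lists the most frequent combinations first. Here every entry would point at the same inode on `mmcblk0p2`.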

@popcornmix
Collaborator

Both the realtime and non-realtime 6.12 kernel fail:

I think you'll need to back off to a known good point, confirm that it is reliable, then update only the kernel.

Start with a clean install of RPiOS Bookworm 64-bit lite.
Update it (sudo apt update && sudo apt full-upgrade).

Confirm sdcard is reliable with no complaints in dmesg when running your normal workloads.

Now update to our build of the 6.12 kernel.
sudo rpi-update next
Reboot and report any sdcard corruption issues.
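After the reboot it helps to confirm which kernel is actually running before judging the result. The sketch below is one way to do that, assuming that realtime builds include the string `PREEMPT_RT` in the `uname -v` version text (the usual behaviour, but worth verifying on your own build). The helper name `describe_kernel` is hypothetical.

```python
import platform
import re


def describe_kernel(release, version):
    """Return (is_6_12, is_rt) for uname-style release and version strings.

    Assumption: RT kernels advertise "PREEMPT_RT" in the version string.
    """
    is_6_12 = re.match(r"6\.12(\D|$)", release) is not None
    is_rt = "PREEMPT_RT" in version
    return is_6_12, is_rt


if __name__ == "__main__":
    info = platform.uname()
    is_6_12, is_rt = describe_kernel(info.release, info.version)
    print(f"release={info.release} 6.12={is_6_12} realtime={is_rt}")
```

Recording this line alongside any dmesg output makes it clear which of the kernels under test produced a given report.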

@folkertvanheusden
Contributor Author

That build of the kernel works fine.
What would be the procedure for getting a kernel with CONFIG_PREEMPT_RT instead?

@popcornmix
Collaborator

You said

Both the realtime and non-realtime 6.12 kernel fail:

So I think the first issue to resolve is if the default 6.12 (non-realtime) kernel has an issue with corrupting the sdcard.
Once we have an answer to that, we can consider enabling RT. But you don't want to be changing too many things at once.

@folkertvanheusden
Contributor Author

folkertvanheusden commented Nov 27, 2024

You said

Both the realtime and non-realtime 6.12 kernel fail:

So I think the first issue to resolve is if the default 6.12 (non-realtime) kernel has an issue with corrupting the sdcard. Once we have an answer to that, we can consider enabling RT. But you don't want to be changing too many things at once.

Yes, but not the one from rpi-update next.
I did a diff of the rpi-update next-version of .config and the one I came up with and see quite a few possible reasons for my version to fail.
So I'm going to try only changing the scheduling parameters in the rpi-update next-version and see if that helps.
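Comparing two kernel `.config` files option by option can be sketched as follows. This is a rough illustration, assuming both files use the standard `CONFIG_X=value` and `# CONFIG_X is not set` line formats. The function names are made up for this example, and the kernel source tree's own `scripts/diffconfig` does a similar job.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; explicitly unset options map to "n"."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(known_good, candidate):
    """Return {option: (known_good_value, candidate_value)} for differences.

    An option missing from one file is reported with None on that side.
    """
    a = parse_config(known_good)
    b = parse_config(candidate)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Starting from the known-good config and applying only the preemption-related entries from this diff keeps the number of changed variables small, which matches the one-change-at-a-time approach suggested earlier in the thread.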

@nbuchwitz
Contributor

@folkertvanheusden were you able to reproduce the errors with a more recent 6.12 version? Haven't seen such errors on PREEMPT RT enabled 6.12 builds yet.

Tested our tree (with some vendor patches, mostly targeting dt for our hardware platform) on CM5 with eMMC, pi500 with the stock rpi sd card and rpi2w. So far none of these has reported any ext4 warnings, even under high I/O load tests. Are there any other warnings or errors in the kernel log? There are still some quirks with rt on rpi (dwc_otg, for example, needs patching or a switch to dwc2).

@folkertvanheusden
Contributor Author

folkertvanheusden commented Jan 3, 2025 via email
