Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel 6.6.y and DAC Allo Piano 2.1+Kali fails init #6527

Open
foonerd opened this issue Dec 10, 2024 · 58 comments
Open

Kernel 6.6.y and DAC Allo Piano 2.1+Kali fails init #6527

foonerd opened this issue Dec 10, 2024 · 58 comments

Comments

@foonerd
Copy link
Contributor

foonerd commented Dec 10, 2024

Describe the bug

It seems that sound/soc/bcm/allo-piano-dac-plus.c is not parsing parameters for alsa use. What I can observe is that the dtoverlay partly fails register the card:

[   19.836848] snd-allo-piano-dac-plus soc:sound: snd_soc_register_card() failed: -517
[   19.882239] snd-allo-piano-dac-plus soc:sound: snd_soc_register_card() failed: -517

However module somewhat loads:
ls -la /sys/module/snd_soc_allo_piano_dac_plus/

drwxr-xr-x   6 root root    0 Dec 10 08:08 .
drwxr-xr-x 150 root root    0 Jan  1  1970 ..
-r--r--r--   1 root root 4096 Dec 10 08:08 coresize
drwxr-xr-x   2 root root    0 Dec 10 08:08 drivers
drwxr-xr-x   2 root root    0 Dec 10 08:08 holders
-r--r--r--   1 root root 4096 Dec 10 08:08 initsize
-r--r--r--   1 root root 4096 Dec 10 08:08 initstate
drwxr-xr-x   2 root root    0 Dec 10 08:08 notes
-r--r--r--   1 root root 4096 Dec 10 08:08 refcnt
drwxr-xr-x   2 root root    0 Dec 10 08:08 sections
-r--r--r--   1 root root 4096 Dec 10 08:08 srcversion
-r--r--r--   1 root root 4096 Dec 10 08:08 taint
--w-------   1 root root 4096 Dec 10 08:08 uevent

Simple test with alsa-info shows that the expected DAC's control parameters are missing:

ALSA Information Script v 0.5.1
--------------------------------

This script visits the following commands/files to collect diagnostic
information about your ALSA installation and sound related hardware.

  dmesg
  lspci
  aplay
  amixer
  alsactl
  rpm, dpkg
  /proc/asound/
  /sys/class/sound/
  ~/.asoundrc (etc.)

See '/usr/sbin/alsa-info --help' for command line options.

cat: /tmp/alsa-info.2lPuin51C4/acpidevicestatus.tmp: No such file or directory
pgrep: pattern that searches for process name longer than 15 characters will result in zero matches
Try `pgrep -f' option to match against the complete command line.
cat: '/sys/module/snd_soc_allo_piano_dac_plus/parameters/*': No such file or directory

In addition, there seems to be a deadlock after module load preventing init level change.

Steps to reproduce the behaviour

Load overlay with config.txt:

# Enable audio (loads snd_bcm2835)
dtparam=audio=off
[all]
dtoverlay=allo-piano-dac-plus-pcm512x-audio

Load manually:

sudo dtoveraly allo-piano-dac-plus-pcm512x-audio

Device (s)

Raspberry Pi 3 Mod. B+, Raspberry Pi 4 Mod. B, Raspberry Pi 400, Raspberry Pi 5, Raspberry Pi CM4, Raspberry Pi CM4 Lite, Other

System

cat /etc/rpi-issue

Raspberry Pi reference 2024-07-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 48efb5fc5485fafdc9de8ad481eb5c09e1182656, stage5

vcgencmd version

Nov 26 2024 12:54:19 
Copyright (c) 2012 Broadcom
version 2ae30f53898ae2f1ba77ff570a92991bedfb0398 (clean) (release) (start)

uname -a

Linux bookworm 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux

Logs

dmesg shows continuously:

[  242.652712] INFO: task alsactl:514 blocked for more than 120 seconds.
[  242.652754]       Tainted: G         C         6.6.62+rpt-rpi-v8 #1 Debian 1:6.6.62-1+rpt1
[  242.652766] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.652773] task:alsactl         state:D stack:0     pid:514   ppid:1      flags:0x00000805
[  242.652800] Call trace:
[  242.652810]  __switch_to+0xe0/0x148
[  242.652841]  __schedule+0x37c/0xd60
[  242.652856]  schedule+0x64/0x108
[  242.652871]  schedule_preempt_disabled+0x2c/0x50
[  242.652893]  rwsem_down_read_slowpath+0x214/0x528
[  242.652915]  down_read+0xac/0xc0
[  242.652931]  snd_soc_card_get_kcontrol+0x34/0xc0 [snd_soc_core]
[  242.653171]  snd_soc_limit_volume+0x2c/0x80 [snd_soc_core]
[  242.653360]  pcm512x_set_reg_master+0x110/0x138 [snd_soc_allo_piano_dac_plus]
[  242.653401]  snd_ctl_elem_write+0xfc/0x208 [snd]
[  242.653508]  snd_ctl_ioctl+0x14c/0x920 [snd]
[  242.653598]  __arm64_sys_ioctl+0xb4/0x100
[  242.653619]  invoke_syscall+0x50/0x128
[  242.653642]  el0_svc_common.constprop.0+0xc8/0xf0
[  242.653662]  do_el0_svc+0x24/0x38
[  242.653681]  el0_svc+0x38/0xd0
[  242.653702]  el0t_64_sync_handler+0x100/0x130
[  242.653723]  el0t_64_sync+0x190/0x198
[  242.653745] INFO: task alsactl:532 blocked for more than 120 seconds.
[  242.653758]       Tainted: G         C         6.6.62+rpt-rpi-v8 #1 Debian 1:6.6.62-1+rpt1
[  242.653768] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.653775] task:alsactl         state:D stack:0     pid:532   ppid:1      flags:0x00000004
[  242.653796] Call trace:
[  242.653803]  __switch_to+0xe0/0x148
[  242.653820]  __schedule+0x37c/0xd60
[  242.653834]  schedule+0x64/0x108
[  242.653849]  schedule_preempt_disabled+0x2c/0x50
[  242.653865]  rwsem_down_write_slowpath+0x32c/0x6d8
[  242.653888]  down_write+0x98/0xa8
[  242.653906]  snd_ctl_release+0x74/0x130 [snd]
[  242.653996]  __fput+0xc4/0x290
[  242.654017]  __fput_sync+0x58/0x70
[  242.654037]  __arm64_sys_close+0x40/0x90
[  242.654056]  invoke_syscall+0x50/0x128
[  242.654076]  el0_svc_common.constprop.0+0x48/0xf0
[  242.654095]  do_el0_svc+0x24/0x38
[  242.654113]  el0_svc+0x38/0xd0
[  242.654133]  el0t_64_sync_handler+0x100/0x130
[  242.654154]  el0t_64_sync+0x190/0x198
[  242.654221] INFO: task wireplumber:989 blocked for more than 120 seconds.
[  242.654233]       Tainted: G         C         6.6.62+rpt-rpi-v8 #1 Debian 1:6.6.62-1+rpt1
[  242.654243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.654251] task:wireplumber     state:D stack:0     pid:989   ppid:961    flags:0x00000804
[  242.654271] Call trace:
[  242.654277]  __switch_to+0xe0/0x148
[  242.654293]  __schedule+0x37c/0xd60
[  242.654308]  schedule+0x64/0x108
[  242.654322]  schedule_preempt_disabled+0x2c/0x50
[  242.654338]  rwsem_down_write_slowpath+0x32c/0x6d8
[  242.654356]  down_write+0x98/0xa8
[  242.654372]  snd_ctl_release+0x74/0x130 [snd]
[  242.654467]  __fput+0xc4/0x290
[  242.654488]  __fput_sync+0x58/0x70
[  242.654507]  __arm64_sys_close+0x40/0x90
[  242.654525]  invoke_syscall+0x50/0x128
[  242.654544]  el0_svc_common.constprop.0+0xc8/0xf0
[  242.654563]  do_el0_svc+0x24/0x38
[  242.654583]  el0_svc+0x38/0xd0
[  242.654603]  el0t_64_sync_handler+0x100/0x130
[  242.654623]  el0t_64_sync+0x190/0x198
[  363.484702] INFO: task alsactl:514 blocked for more than 241 seconds.
[  363.484745]       Tainted: G         C         6.6.62+rpt-rpi-v8 #1 Debian 1:6.6.62-1+rpt1
[  363.484758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Additional context

No response

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

The last working kernel version was 6.1.y.
Checked with a simple overlay allo-piano-dac-pcm512x-audio and a single chip channel loads with a simplified controls. It leads me to believe, that the switch part is not functioning as it is meant to be.

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

517 is EPROBEDEFER, which means "not yet" - the kernel will try again later. It's nice if drivers suppress those error messages as they can be confusing, but otherwise I don['t see anything wrong there. The stall is probably caused by something failing to release a mutex.

Like most of the drivers we host in our kernel, we can't offer any support for them unless they reveal an underlying problem in the bits we do maintain. Our involvement consists of forward-porting the patches to new kernel versions, which is basically just a matter of getting them to compile again when the APIs change.

Allo seems to be long gone - your best hope would be that one of the users out there has some coding experience. Even a relative novice could learn quite a bit about the problem by scattering a few pr_info() calls around the driver.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Thank you for looking into this. Appreciated. Being a "novice" as you mention I can dive into the module, however will need a bit of guidance: mutex. What troubleshooting steps or actions would you suggest?

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

Here's a rough plan of action:

  1. After booting your current kernel, run dmesg -l err,warn and look for any messages that might be relevant.
  2. Once the error has occurred, run that dmesg again, and look for additions. Report your findings.
  3. As you'll be building your own kernel you should use the fastest Pi you have, which looks to be a Pi 5. To avoid the need to set up a cross-compiler, I recommend you use a 64-bit image (to match the 64-bit kernel). Just in case something goes disastrously wrong, you might want to install a fresh image to a spare card - even a 64-bit Lite image would be enough.
  4. Follow the kernel build instructions here: https://www.raspberrypi.com/documentation/computers/linux_kernel.html#building
    You need the "Natively build a kernel" instructions.
  5. Once compilation has completed, follow the installation steps and verify that it boots.
  6. Now we're going trace all the snd_soc_... calls. At the top of the file, after the last #include, add:
#define snd_soc_component_read(c, r) my_snd_soc_component_read(c, r, __LINE__)
#define snd_soc_component_write(c, r, v) my_snd_soc_component_write(c, r, v, __LINE__)
static unsigned int my_snd_soc_component_read(struct snd_soc_component *component,
				      unsigned int reg, int line);
static int my_snd_soc_component_write(struct snd_soc_component *component,
			    unsigned int reg, unsigned int val, int line);
  1. At the end of the file (before MODULE_AUTHOR should be OK), put this:
#undef snd_soc_component_read
#undef snd_soc_component_write

static unsigned int my_snd_soc_component_read(struct snd_soc_component *component,
				      unsigned int reg, int line)
{
	unsigned int val;
	pr_info("< read %x (%d)\n", reg, line);
	val = snd_soc_component_read(component, reg);
	pr_info("> read %x -> %x (%d)\n", reg, val, line);
	return val;
}

static int my_snd_soc_component_write(struct snd_soc_component *component,
			    unsigned int reg, unsigned int val, int line)
{
	int ret;
	pr_info("< write %x = %x (%d)\n", reg, val, line);
	ret = snd_soc_component_write(component, reg, val);
	pr_info("> write %x = %x ret %d (%d)\n", reg, val, ret, line);
	return ret;
}
  1. Rebuild and reinstall the kernel (it should be quick this time since you've only changed one file), and confirm that you are seeing messages (dmesg -l err).
  2. Run it until you get a failure, then dmesg -l err,warn | pastebinit (you may need to sudo apt install pastebinit first).

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Thank you for the steps. This round, the module was built against "stock" Pi kernel - 6.6.62-v7+ #1816 using cross-compiler.

load dtoverlay -> http://paste.debian.net/1339146/
after reboot on alsa test -> http://paste.debian.net/1339151/

I would expect a bit more output at this time, however the test was on Pi 3B+, moving to Pi5 or CM5 to speed-up the native build work.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Stock kernel:
uname -a
Linux foopi 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux

dmesg -l err,warn

No overlays DAC overlay dynamic DAC overlay reboot
https://paste.debian.net/hidden/6fb41fac/ https://paste.debian.net/hidden/e3bfc68c/ https://paste.debian.net/hidden/81078b39/

There are no new messages from test other than task alsactl:67778 blocked for more than xxx seconds.
On shutdown/reboot the alsa-restore.service stop job waits for over 5 min; hard reset.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Kernel from repo:
uname -a
Linux foopi 6.6.63-v8-16k+ #1 SMP PREEMPT Tue Dec 10 13:25:12 GMT 2024 aarch64 GNU/Linux

dmesg -l err,warn

No overlays DAC overlay dynamic DAC overlay reboot
https://paste.debian.net/hidden/55089d41/ https://paste.debian.net/hidden/4e15b3e3/ https://paste.debian.net/hidden/f6ed63d9/

Same behaviour as before.

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

Odd. Can you put another pr_err (any message you like) before the "return 0;" at the end of snd_allo_piano_dac_probe? That should confirm whether the driver initialisation is completing or not.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Kernel with updated module:
uname -a
Linux foopi 6.6.63-v8-16k+ #1 SMP PREEMPT Tue Dec 10 13:25:12 GMT 2024 aarch64 GNU/Linux

dmesg -l err,warn

No overlays DAC overlay dynamic DAC overlay reboot
https://paste.debian.net/hidden/a4c4c1a0/ https://paste.debian.net/hidden/44f61fe7/ https://paste.debian.net/hidden/e8d35d5f/

Same behaviour as before.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Odd. Can you put another pr_err (any message you like) before the "return 0;" at the end of snd_allo_piano_dac_probe? That should confirm whether the driver initialisation is completing or not.

                if ((mute_gpio[0]) && (mute_gpio[1]))
                        snd_allo_piano_gpio_mute(&snd_allo_piano_dac);
                pr_err("DAC issue #6527\n");

                return 0;

Result: https://paste.debian.net/hidden/0aaf8bb3/

Looks like it does.

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

(I'll take your word that there is no change - you don't need to upload the logs each time)
Let's just double check that all the probe calls are failing - add another message at the start of snd_allo_piano_dac_probe, then if we only see it twice followed by two -517s then we know that the instantiation never succeeds.

@6by9
Copy link
Contributor

6by9 commented Dec 10, 2024

sudo cat /sys/kernel/debug/devices_deferred should also tell you if there are still outstanding devices that are deferred. (The dependency will only be listed if the driver has used dev_err_probe though)

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

sudo cat /sys/kernel/debug/devices_deferred should also tell you if there are still outstanding devices that are deferred. (The dependency will only be listed if the driver has used dev_err_probe though)

Nothing, just empty:

foonerd@foopi:~ $ sudo cat /sys/kernel/debug/devices_deferred
foonerd@foopi:~ $ sudo ls -la /sys/kernel/debug/devices_deferred
-r--r--r-- 1 root root 0 Jan  1  1970 /sys/kernel/debug/devices_deferred

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

(The dependency will only be listed if the driver has used dev_err_probe though)

It doesn't.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

add another message at the start of snd_allo_piano_dac_probe, then if we only see it twice followed by two -517s then we know that the instantiation never succeeds.

This is getting interesting - https://paste.debian.net/hidden/168f11aa/

Purely boot only, nothing touched... yet.

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

Well that's a bit of a relief - I was beginning to wonder how you could be getting so far if the probe() had never succeeded.

@6by9
Copy link
Contributor

6by9 commented Dec 10, 2024

(The dependency will only be listed if the driver has used dev_err_probe though)

It doesn't.

The fact it is still deferred would be listed, just not what the dependency was.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Honestly, I'm scratching my head for past two weeks, doubting my sanity with this.
On the serious note - there is a quite clever switch in the code. Perhaps barking up the wrong tree - could the overlay as such need a revamp?

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

It seems unlikely, since it only provides a static configuration - by the time you've exited the probe function, the overlay plays no further part.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

To add to the confusion, when using basic overlay: allo-piano-dac-pcm512x-audio the DAC works with limited functionality. This bypasses all channels separation 2xPCM, lowpass, etc.

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

Are you seeing any of my debug output when you use the controls? If not, which controls are available to you in this subset?

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

I think I made my last remark confusing.
Hardware is Piano 2.1

  1. When using https://github.com/raspberrypi/linux/blob/rpi-6.6.y/sound/soc/bcm/allo-piano-dac.c via overlay https://github.com/raspberrypi/linux/blob/rpi-6.6.y/arch/arm/boot/dts/overlays/allo-piano-dac-pcm512x-audio-overlay.dts there are limited controls I was referring to. Audio is active via one of the pcm512x chip.

  2. However, the DAC should be using https://github.com/raspberrypi/linux/blob/rpi-6.6.y/sound/soc/bcm/allo-piano-dac-plus.c via overlay https://github.com/raspberrypi/linux/blob/rpi-6.6.y/arch/arm/boot/dts/overlays/allo-piano-dac-plus-pcm512x-audio-overlay.dts and this is where debug output has been added.

Will add debug params to 1 and check.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

The 1 seems to be producing events. Not sure how this is going to help though.

Comparing with working DAC on kernel 6.1.y, the node parameters is created under /sys/module/snd_soc_allo_piano_dac_plus/ and this is what alsa is using for hw2. On kernel 6.6.y node is missing.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

A simple test, forcing bogus return 0; from snd_allo_piano_dac_probe and it is clear that all the params are not available to alsa. There is literally nothing in dmesg nor journalctl to follow up on. Any ideas?

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

Hey @pelwell,

This is getting interesting. After adding #include <linux/i2c.h> all of the sudden:

[   12.949688] < read 13d (306)
[   12.949693] > read 13d -> 30 (306)
[   12.949694] < read 13e (308)
[   12.949695] > read 13e -> 30 (308)
[   12.949697] < write 13e = 30 (328)
[   12.950447] > write 13e = 30 ret 0 (328)
[   12.950449] < write 13d = 30 (330)
[   12.951185] > write 13d = 30 ret 0 (330)
[   12.951187] < write 103 = 0 (332)
[   12.951916] > write 103 = 0 ret 0 (332)
[   12.951917] < write 103 = 0 (334)
[   12.952646] > write 103 = 0 ret 0 (334)

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

The output here (https://paste.debian.net/hidden/168f11aa/) showed it reaching the end of snd_allo_piano_dac_probe ("DAC issue #6527"), so why do you need to force it? And what changed between that run and the previous one (that produced much less output), other than the addition of the "DAC probe begin" message?

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

The only differences at this point were:

  1. add another message at the start of snd_allo_piano_dac_probe
  2. switched all power off, coffee break, power back on

@pelwell
Copy link
Contributor

pelwell commented Dec 10, 2024

Your log above shows the output from snd_allo_piano_dual_mode_put, turning dual_mode off. Do any other controls exist/work?

@foonerd
Copy link
Contributor Author

foonerd commented Dec 10, 2024

The amixer is used to set specific hwdevice with params. However, module once loaded renders alsa completely unresponsive and console hungs. No other controls are available. The continuous message kernel: INFO: task alsactl:1683 blocked for more than xxx seconds about 2 min apart.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

Well, the DAC init is stuck here, indefinite loop:

[  484.320360] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.320363] task:alsactl         state:D stack:0     pid:750   ppid:1      flags:0x00400801
[  484.320369] Call trace:
[  484.320372]  __switch_to+0xe0/0x148
[  484.320381]  __schedule+0x31c/0xd60
[  484.320385]  schedule+0x64/0x108
[  484.320389]  schedule_preempt_disabled+0x2c/0x50
[  484.320393]  rwsem_down_write_slowpath+0x23c/0x6e0
[  484.320398]  down_write+0xa0/0xb0
[  484.320402]  snd_ctl_release+0x74/0x138 [snd]
[  484.320427]  __fput+0xc4/0x290
[  484.320432]  __fput_sync+0x58/0x70
[  484.320434]  __arm64_sys_close+0x40/0x90
[  484.320440]  invoke_syscall+0x50/0x128
[  484.320446]  el0_svc_common.constprop.0+0xd0/0xf8
[  484.320450]  do_el0_svc_compat+0x28/0x48
[  484.320455]  el0_svc_compat+0x30/0x88
[  484.320458]  el0t_32_sync_handler+0x98/0x148
[  484.320461]  el0t_32_sync+0x194/0x198

I think without addressing this part, we can't progress further.
Earlier on you mention that this may be related to mutex. Suggestions?

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

In such case I will put my spyglass down, for a bit at least.

@pelwell
Copy link
Contributor

pelwell commented Dec 11, 2024

One thing you can try quickly is removing the 24db_digital_gain parameter which, based on the call stacks, you seem to be using.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

Rebuild module with?

/*
		digital_gain_0db_limit =
			!of_property_read_bool(pdev->dev.of_node,
						"allo,24db_digital_gain");
*/

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

With above code commented out, the same deadlock:

[   64.871893] ============================================
[   64.877220] WARNING: possible recursive locking detected
[   64.882548] 6.6.63-v8-16k+ #2 Tainted: G         C        
[   64.888050] --------------------------------------------
[   64.893377] alsactl/1461 is trying to acquire lock:
[   64.898269] ffff80012d380230 (&card->controls_rwsem){++++}-{3:3}, at: snd_soc_card_get_kcontrol+0x34/0x70 [snd_soc_core]
[   64.909222] 
               but task is already holding lock:
[   64.915072] ffff80012d380230 (&card->controls_rwsem){++++}-{3:3}, at: snd_ctl_elem_write+0x38/0x208 [snd]
[   64.924691] 
               other info that might help us debug this:
[   64.931239]  Possible unsafe locking scenario:

[   64.937177]        CPU0
[   64.939623]        ----
[   64.942070]   lock(&card->controls_rwsem);
[   64.946175]   lock(&card->controls_rwsem);
[   64.950281] 
                *** DEADLOCK ***

[   64.956219]  May be due to missing lock nesting notation

[   64.963029] 1 lock held by alsactl/1461:
[   64.966960]  #0: ffff80012d380230 (&card->controls_rwsem){++++}-{3:3}, at: snd_ctl_elem_write+0x38/0x208 [snd]
[   64.977013] 

@pelwell
Copy link
Contributor

pelwell commented Dec 11, 2024

Oh - it defaults to being enabled. Either restore that code and use "dtoverlay=...,24db_digital_gain=off", or replace the commented out code with digital_gain_0db_limit = false;.

I have a patch I'm just about to push that may solve the problem, but this confirmation would be useful.

pelwell added a commit to pelwell/linux that referenced this issue Dec 11, 2024
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: raspberrypi#6527

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit to pelwell/linux that referenced this issue Dec 11, 2024
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: raspberrypi#6527

Signed-off-by: Phil Elwell <[email protected]>
@pelwell
Copy link
Contributor

pelwell commented Dec 11, 2024

See #6532.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

reverted comment and sudo dtoverlay allo-piano-dac-plus-pcm512x-audio,24db_digital_gain=off

[   50.906150] ============================================
[   50.911476] WARNING: possible recursive locking detected
[   50.916803] 6.6.63-v8-16k+ #2 Tainted: G         C        
[   50.922305] --------------------------------------------
[   50.927632] alsactl/1448 is trying to acquire lock:
[   50.932523] ffff800105d35230 (&card->controls_rwsem){++++}-{3:3}, at: snd_soc_card_get_kcontrol+0x34/0x70 [snd_soc_core]
[   50.943474] 
               but task is already holding lock:
[   50.949324] ffff800105d35230 (&card->controls_rwsem){++++}-{3:3}, at: snd_ctl_elem_write+0x38/0x208 [snd]
[   50.958945] 
               other info that might help us debug this:
[   50.965493]  Possible unsafe locking scenario:

[   50.971430]        CPU0
[   50.973876]        ----
[   50.976322]   lock(&card->controls_rwsem);
[   50.980428]   lock(&card->controls_rwsem);
[   50.984533] 
                *** DEADLOCK ***

[   50.990470]  May be due to missing lock nesting notation

[   50.997280] 1 lock held by alsactl/1448:
[   51.001211]  #0: ffff800105d35230 (&card->controls_rwsem){++++}-{3:3}, at: snd_ctl_elem_write+0x38/0x208 [snd]

with code block:

                digital_gain_0db_limit = false;
//                      !of_property_read_bool(pdev->dev.of_node,
//                                              "allo,24db_digital_gain");

Jackpot!

[   31.685473] DAC probe begin
[   31.685945] snd-allo-piano-dac-plus soc:sound: snd_soc_register_card() failed: -517
[   31.698282] pcm512x 1-004c: supply AVDD not found, using dummy regulator
[   31.698759] pcm512x 1-004c: supply DVDD not found, using dummy regulator
[   31.698814] pcm512x 1-004c: supply CPVDD not found, using dummy regulator
[   31.702763] DAC probe begin
[   31.702918] snd-allo-piano-dac-plus soc:sound: snd_soc_register_card() failed: -517
[   31.703585] pcm512x 1-004d: supply AVDD not found, using dummy regulator
[   31.703793] pcm512x 1-004d: supply DVDD not found, using dummy regulator
[   31.704165] pcm512x 1-004d: supply CPVDD not found, using dummy regulator
[   31.707716] DAC probe begin
[   31.731024] < write 13d = ffffff58 (597)
[   31.732362] > write 13d = ffffff58 ret 0 (597)
[   31.732370] < write 13e = ffffff58 (602)
[   31.733124] > write 13e = ffffff58 ret 0 (602)
[   31.733168] < write 103 = ffffffee (663)
[   31.733926] > write 103 = ffffffee ret 0 (663)
[   31.733929] < write 103 = ffffffee (668)
[   31.734683] > write 103 = ffffffee ret 0 (668)
[   31.780738] pcm512x 1-004c: No SCLK, using BCLK: -2
[   31.782255] pcm512x 1-004d: No SCLK, using BCLK: -2
[   31.784569] Lock acquired in snd_allo_piano_dsp_program
[   31.784579] Lock released in snd_allo_piano_dsp_program
[   31.814537] < read 103 (621)
[   31.814555] > read 103 -> ee (621)
[   31.814566] < read 13d (542)
[   31.814569] > read 13d -> 58 (542)
[   31.814572] < read 13e (549)
[   31.814576] > read 13e -> 58 (549)
[   31.814581] < read 103 (621)
[   31.814584] > read 103 -> ee (621)
[   31.814675] < read 103 (499)
[   31.814679] > read 103 -> ee (499)
[   31.814688] < read 13e (433)
[   31.814691] > read 13e -> 30 (433)
[   31.814694] < read 13d (436)
[   31.814697] > read 13d -> 30 (436)
[   31.814702] < read 103 (499)
[   31.814706] > read 103 -> ee (499)
[   31.819702] pcm512x 1-004c: No SCLK, using BCLK: -2
[   31.819711] pcm512x 1-004d: No SCLK, using BCLK: -2
[   31.819872] Lock acquired in snd_allo_piano_dsp_program
[   31.819878] Lock released in snd_allo_piano_dsp_program
[   31.855278] pcm512x 1-004c: No SCLK, using BCLK: -2
[   31.855294] pcm512x 1-004d: No SCLK, using BCLK: -2
[   31.859934] < read 13d (542)
[   31.859947] > read 13d -> 58 (542)
[   31.859951] < read 13e (549)
[   31.859954] > read 13e -> 58 (549)
[   31.859963] < write 13d = ffffff24 (597)
[   31.860746] > write 13d = ffffff24 ret 0 (597)
[   31.860750] < write 13e = ffffff58 (602)
[   31.861517] > write 13e = ffffff58 ret 0 (602)
[   31.861562] < read 103 (621)
[   31.861567] > read 103 -> ff (621)
[   31.861576] < write 103 = ffffffee (663)
[   31.862339] > write 103 = ffffffee ret 0 (663)
[   31.862344] < write 103 = ffffffee (668)
[   31.863102] > write 103 = ffffffee ret 0 (668)
[   31.863132] < read 13d (542)
[   31.863137] > read 13d -> 24 (542)
[   31.863140] < read 13e (549)
[   31.863143] > read 13e -> 58 (549)
[   31.863149] < write 13d = ffffff24 (597)
[   31.863230] pcm512x 1-004c: No SCLK, using BCLK: -2
[   31.864432] > write 13d = ffffff24 ret 0 (597)
[   31.864439] < write 13e = ffffff24 (602)
[   31.864446] pcm512x 1-004d: No SCLK, using BCLK: -2
[   31.865227] > write 13e = ffffff24 ret 0 (602)
[   31.865254] < read 103 (621)
[   31.865259] > read 103 -> ee (621)
[   31.865266] < write 103 = ffffffee (663)
[   31.866030] > write 103 = ffffffee ret 0 (663)
[   31.866034] < write 103 = ffffffee (668)
[   31.866789] > write 103 = ffffffee ret 0 (668)
[   31.866918] < read 13d (542)
[   31.866923] > read 13d -> 24 (542)
[   31.866926] < read 13e (549)
[   31.866930] > read 13e -> 24 (549)
[   31.866936] < read 103 (621)
[   31.866940] > read 103 -> ee (621)
[   31.866950] < read 13d (542)
[   31.866954] > read 13d -> 24 (542)
[   31.866957] < read 13e (549)
[   31.866960] > read 13e -> 24 (549)
[   31.866966] < read 103 (621)
[   31.866969] > read 103 -> ee (621)
[   31.867063] pcm512x 1-004c: No SCLK, using BCLK: -2
[   31.867069] pcm512x 1-004d: No SCLK, using BCLK: -2

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

See #6532.

Nice!. Do you have a spare thinking head that I can borrow?

@pelwell
Copy link
Contributor

pelwell commented Dec 11, 2024

This one's showing signs of aging.

I'm hoping that with the patch in the pull request that the digital gain limit will still work, but it will have to be verified. I'm not sure if the code was arranged as it was for good reasons or just carelessness. If you wait another half hour you'll be able to install a trial build with sudo rpi-update pulls/6532, or just apply the patch yourself.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

I am looking into future kernels too. The following suggest that references asoc_ are being swapped with snd_soc_

linux/include/sound/soc.h

Lines 1154 to 1157 in 574ea34

/* REMOVE ME */
#define asoc_rtd_to_cpu snd_soc_rtd_to_cpu
#define asoc_rtd_to_codec snd_soc_rtd_to_codec
#define asoc_substream_to_rtd snd_soc_substream_to_rtd

When looking into the module code codes for kernel like 6.13.y it is already replaced there. As a suggestion - wouldn't now be an ideal time to future-proof module a bit?

@pelwell
Copy link
Contributor

pelwell commented Dec 11, 2024

I'll bear that in mind.

@popcornmix
Copy link
Collaborator

git diff rpi-6.6.y rpi-6.12.y sound/soc/bcm

shows the changes needed when dragging downstream patches to newer kernels. It is a pain.
I don't think it's worth updating 6.6 to the newer apis - we'll be switching to 6.12 kernel soon enough,
and all kernels that require the newer apis have already been fixed up.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 11, 2024

Thank you for your above and beyond support. Building module for 6.6.62-v8+ #1816 for community testing.

@foonerd
Copy link
Contributor Author

foonerd commented Dec 12, 2024

Internal testing confirmed that the patch works as expected. There is some extra "chatter" from module logged (-517), however this can be addressed at some point in future.

@foonerd foonerd closed this as completed Dec 12, 2024
pelwell added a commit that referenced this issue Dec 12, 2024
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: #6527

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit to pelwell/linux that referenced this issue Dec 12, 2024
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: raspberrypi#6527

Signed-off-by: Phil Elwell <[email protected]>
@foonerd
Copy link
Contributor Author

foonerd commented Dec 12, 2024

Also, related to this #6534

pelwell added a commit that referenced this issue Dec 17, 2024
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: #6527

Signed-off-by: Phil Elwell <[email protected]>
pelwell added a commit that referenced this issue Dec 17, 2024
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: #6527

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit that referenced this issue Dec 20, 2024
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: #6527

Signed-off-by: Phil Elwell <[email protected]>
@foonerd
Copy link
Contributor Author

foonerd commented Dec 31, 2024

Hey @pelwell,

It seems that we have lost the control over

Should we carry on here, or open a new issue?

Kind Regards,

@pelwell
Copy link
Contributor

pelwell commented Jan 1, 2025

You can leave it here.

In what way have you lost the control? Is it stuck on or off?

The variable digital_gain_0db_limit should default to true but be overridden from Device Tree in the probe function before it is used. It should be easy to add a few pr_err calls to trace the operation.

@pelwell pelwell reopened this Jan 1, 2025
@foonerd
Copy link
Contributor Author

foonerd commented Jan 1, 2025

Thank you for looking into this.
The problem surfaces when setting volume above ca' 25% of threshold, after this level audio become distorted; similar to over-amplification. Hence why I think that this is linked.

If you can spare a bit of time, and take a look into the last week discussions in community thread.

dmesg:

[   16.550860] snd-allo-piano-dac-plus soc:sound: Failed to set master volume limit: -22
[   16.550884] snd-allo-piano-dac-plus soc:sound: Failed to set subwoofer volume limit: -22
[  314.091468] snd-allo-piano-dac-plus soc:sound: snd_soc_register_card() failed: -517
[  314.096541] snd-allo-piano-dac-plus soc:sound: snd_soc_register_card() failed: -517
[  314.116804] snd-allo-piano-dac-plus soc:sound: snd_soc_register_card() failed: -517
[  314.121478] snd-allo-piano-dac-plus soc:sound: Failed to set master volume limit: -22
[  314.121500] snd-allo-piano-dac-plus soc:sound: Failed to set subwoofer volume limit: -22

popcornmix pushed a commit that referenced this issue Jan 2, 2025
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: #6527

Signed-off-by: Phil Elwell <[email protected]>
popcornmix pushed a commit that referenced this issue Jan 2, 2025
Calling snd_soc_limit_volume from within a kcontrol put handler seems
to cause a deadlock as it attempts to claim a write lock that is already
held. Call snd_soc_limit_volume from the main initialisation code
instead, to avoid the recursive locking.

See: #6527

Signed-off-by: Phil Elwell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants