no handling of lost sdr devices leaves TR doing nothing #644

Open
ZeroChaos- opened this issue Mar 2, 2022 · 10 comments

Comments

@ZeroChaos-

Sometimes when using rtl_tcp as a source, a socket error happens and the socket is disconnected. This can easily be replicated by simply killing rtl_tcp. However, when this happens, trunk-recorder doesn't seem to automatically reconnect to the socket, and it sure would be nice if it did.

@robotastic
Owner

Hmm... I think this is all a bit lower layer than Trunk Recorder. All of the RTL stuff is wrapped up in the osmosdr driver. Trunk Recorder doesn't get any of that. If there is some way to easily listen for a disconnect and a function to call for it to reconnect, I could add it into the loop.

@ZeroChaos-
Author

Not sure how easy it is to trap, but a line is printed that simply says "socket error"

If this needs to be a gr-osmosdr bug report feel free to close with no hard feelings.

@theficus

theficus commented Aug 4, 2023

I've moved my trunk-recorder instance onto another machine because my Raspberry Pi 4 couldn't handle the processing duties of my trunked system (I need some 13 recorders and the Pi struggles with more than 8). I've been running rtl_tcp on the Pi and trunk-recorder on a far more powerful machine that can easily handle the number of recorders I need.

Unfortunately, I am running into this exact problem where rtl_tcp just kind of decides to crap out and trunk-recorder just kind of gets stuck. Unlike its built-in capability to terminate when it drops too many control channel messages, it never recovers from this without manual intervention. Once I manually restart trunk-recorder it immediately starts working again.

I'm assuming in this scenario that bits stop flowing from the rtl_tcp source(s). Maybe this is something trunk-recorder could detect and error out on if it persists past a certain threshold? (Like, if it gets fewer than n bytes of data during s seconds then terminate, sort of like how the existing control channel threshold works.) This would allow using the auto-restart script to recover from this class of issue.

@ZeroChaos-
Author

After years of use at this point, I switched to using rtlsdr directly, without rtl_tcp, and found this error is not limited to rtl_tcp at all. TR will happily just do nothing if you literally pull the in-use SDR out of the USB port.

Please detect when the source disappears and do something useful. A useful error message and exiting is likely the minimum I would suggest.

@ZeroChaos- changed the title from "rtl_tcp socket error" to "no handling of lost sdr devices leaves TR doing nothing" on Sep 8, 2024

@robotastic
Owner

Totally game to add something - any thoughts on how to detect this? Maybe there is some way to get a sample count and just check every so often to make sure that number is going up?
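
A minimal sketch of the sample-count idea, written in Python for brevity rather than as Trunk Recorder code. `StallWatchdog` is a hypothetical helper, not an existing class, and it assumes the caller can obtain a running total of samples received from the source; how to read that total from gr-osmosdr is not established in this thread.

```python
import time


class StallWatchdog:
    """Flags a source as stalled when its sample counter stops advancing.

    Hypothetical helper: the caller is assumed to supply a running total
    of samples received from the source on every poll.
    """

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # how long zero progress is tolerated
        self.clock = clock              # injectable so the logic can be tested
        self.last_count = None          # last total that showed progress
        self.last_progress = clock()    # time at which progress was last seen

    def update(self, sample_count):
        """Record the latest total; return True if the source looks stalled."""
        now = self.clock()
        if self.last_count is None or sample_count > self.last_count:
            # First reading, or the counter moved forward: reset the timer.
            self.last_count = sample_count
            self.last_progress = now
            return False
        # Counter has not advanced: stalled once the timeout has elapsed.
        return (now - self.last_progress) >= self.timeout_s
```

Called from a periodic check, a `True` result could trigger an error message and a non-zero exit, so that an existing auto-restart script or service file brings the process back up.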

@ZeroChaos-
Author

I honestly have no idea how to implement it :-). Your idea makes sense: 0 samples is easy to detect and configuration-free. That said, I find it nearly impossible to believe that the various drivers or gr-osmosdr don't provide a notification or something. I noticed "socket error" in the rtl_tcp instances, but when using the SDR directly I see nothing, so maybe there is no notification. Sorry I can't be more helpful.

@Dygear
Contributor

Dygear commented Sep 9, 2024

For what it's worth, this also happens on the USRP when the plug falls out. But I don't really expect for TR to handle this directly. The whole TR instance crashes and then restarts using the .service file. If there is a way to detect when a source goes away, that would be cool tho. If you do find a way to figure out that a source has failed, I would love to see a status message added so a plugin can send a notification out.

@ZeroChaos-
Author

> For what it's worth, this also happens on the USRP when the plug falls out. But I don't really expect for TR to handle this directly. The whole TR instance crashes and then restarts using the .service file. If there is a way to detect when a source goes away, that would be cool tho. If you do find a way to figure out that a source has failed, I would love to see a status message added so a plugin can send a notification out.

Yours crashes and restarts???? Mine doesn't, that's the problem. Tested on rtl_tcp and rtlsdr sources and they just sit there doing nothing and don't crash. I would love it to crash (or exit with a useful error)

@Dygear
Contributor

Dygear commented Sep 9, 2024

Must be the USRP's default to crash. Never thought crashing would be a feature.

@rexgithub2021

rexgithub2021 commented Nov 13, 2024

I stumbled across this thread in my quest for information about something else and thought I would share my crude workarounds: I don't currently use rtl_tcp with TR but I've definitely had the issue described above on my 2 TR implementations with locally attached RTLSDRs too. I've found mainly two failure scenarios:

(1) An RTLSDR disconnects USB but then quickly reconnects with a different device number on the USB bus. TR does not usually crash but it can no longer record anything dependent on that RTL. If it happens to be the RTLSDR used for the Control Channel it's game over and it keeps looping through the CCs with no automatic recovery.

(2) An RTLSDR disconnects USB but does not reconnect on its own. No amount of USB bus resets or other commands seems to bring it back to life. The effect on TR is the same as failure scenario 1 but the workarounds differ.

Workaround for scenario 1: I wrote a no-frills shell script that captures the output of the lsusb command (specifically: "/usr/bin/lsusb | grep RTL") every 60 seconds, compares it to the results from a minute earlier, and if they differ, kills the TR script, which then causes the service loop to restart it. This easily detects when an RTLSDR that was, say, device number 003 a minute ago suddenly becomes device number 007 on the same USB bus ID. I installed the script as a service so it starts automatically at boot time. I've been running this for over a year on two different TR systems and it works great: it covers this failure mode almost 100% of the time, at the cost of losing up to a minute of calls on a busy radio system.

Workaround for scenario 2: I augmented the script referenced above to look for the case where the number of lines returned every 60 seconds by the "/usr/bin/lsusb | grep RTL" command is lower than the number of lines two minutes earlier. (For example, two minutes ago the lsusb command returned 4 lines and now it returns only 3. This usually means an RTLSDR dropped its USB connection but isn't coming back on its own.) When the script finds this:

Plan A: It reboots the entire OS. This works about 85 percent of the time and causes all of the RTLSDRs to become present again after the reboot.

Plan B: Sometimes one or more of the RTLSDRs don't come back after the OS restart, so Plan B is to trigger an external automation that turns off the smart plug that provides power to the TR PC for 2 minutes and then repowers the PC. This picks up another 10 percent of this failure scenario, and when the PC powers back up, all the RTLSDRs show up... usually.

Plan C: In about 5 percent of the cases an RTLSDR disappears and even the power cycle doesn't do the trick. At this point all the USB devices have to be physically unplugged and then reconnected. As long as I'm not travelling, that's annoying but workable for the TR at home. But for the one that's 650 miles away... I have to call in a favor to have someone go put their hands on it.
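
The two lsusb checks described above can be sketched roughly as follows. This is not the original shell script, which isn't shown in the thread; it is a Python approximation, and the names `rtl_lines`, `classify`, and `snapshot` are hypothetical. As a simplification, it compares consecutive snapshots for both scenarios instead of looking back two minutes for the missing-device case.

```python
import subprocess


def rtl_lines(lsusb_output):
    """Return the sorted lsusb lines that mention an RTL device."""
    return sorted(
        line.strip() for line in lsusb_output.splitlines() if "RTL" in line
    )


def classify(previous, current):
    """Compare two snapshots of RTL lines.

    Returns "missing" if fewer devices are listed than before (scenario 2),
    "changed" if the count did not drop but the lines differ, for example
    because a device came back with a new device number (scenario 1),
    and "ok" otherwise.
    """
    if len(current) < len(previous):
        return "missing"
    if current != previous:
        return "changed"
    return "ok"


def snapshot():
    """Run lsusb and keep only the RTL lines (requires lsusb on the PATH)."""
    result = subprocess.run(
        ["lsusb"], capture_output=True, text=True, check=True
    )
    return rtl_lines(result.stdout)
```

A polling loop would call `snapshot()` every 60 seconds and act on the result of `classify`: restart TR on "changed", and escalate (reboot, power cycle) on "missing". Those actions are left out here because they depend on the local setup.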

In my observations, USB devices do this kind of thing all the time regardless of OS, especially when power requirements on a USB bus creep up or if a USB connection gets physically "bumped". Not a problem when you can just reach over, unplug and re-plug your mouse, but less handy when it's a PC in a closet or another part of the world. This isn't really behavior that TR can fix per se. At best, it would be a "detect and try to restart" enhancement I guess.
