Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtail doesn't follow log rotation when running under WSL and docker #531

Open
JJassonn69 opened this issue May 21, 2021 · 4 comments
Open
Labels
bug This is considered a bug and shall get fixed help wanted mtail-Log Tailing Issues related to log polling and tailing

Comments

@JJassonn69
Copy link

I am using prometheus and mtail to gather information from chia logs. It is working fine until the log file size is 20MB and new file log file is created, mtail then receives random data.
I have to recompose the program to fix the issue but the same problem occurs when another log file is created.

I am using docker, VSL and prometheus. I do have mtail installed locally aswell but no idea how to implement it.

any help on how can i remedy it.

image
image

@jaqx0r
Copy link
Contributor

jaqx0r commented May 22, 2021 via email

@pikolos
Copy link

pikolos commented Jun 10, 2021

I am seeing the same issue with the same app. When the logrotate takes place mtail seems to re-ingest the original file, exploding the counters. This may be related on the implementation of the windows subsystem for Linux (WSL) that docker desktop uses to expose the log to the container as I run exectly the same setup under linux and the issue does not occur.

When running mtail directly on the windows host it works ... kind of as there mtail seems to simply prevent the logrotate of taking place in the first place :-)

I'am currently running mtail with the -v=2 and so i can post results when the rotate takes place.

@jaqx0r
Copy link
Contributor

jaqx0r commented Jun 11, 2021

This isn't great at all! I assume that it's because I don't understand windows file semantics, and would appreciate help!

Sounds like we have two problems here, one is the pinning of closed files in windows itself, and the other is the behaviour when running under wsl.

If you all can get me verbose INFO logs per above that might help start the debugging process.

@pikolos
Copy link

pikolos commented Jun 12, 2021

Here is the log of the exact moment the log was rotated:

 [36mmtail         | [0m I0611 17:17:52.171393       1 tail.go:244] already got a logstream on "/var/log/chia/debug.log"
 [36mmtail         | [0m I0611 17:17:52.171178       1 filestream.go:175] &{0xc00008f320}: current seek is 20971559
 [36mmtail         | [0m I0611 17:17:52.171507       1 filestream.go:176] &{0xc00008f320}: new size is 167
 [36mmtail         | [0m I0611 17:17:52.171549       1 filestream.go:181] &{0xc00008f320}: truncate? currentoffset is 20971559 and size is 167
 [36mmtail         | [0m I0611 17:17:52.171663       1 filestream.go:191] &{0xc00008f320}: Seeked to 0
 [36mmtail         | [0m I0611 17:17:52.172390       1 filestream.go:106] &{0xc00008f320}: read 4096 bytes, err is <nil>
 [36mmtail         | [0m I0611 17:17:52.172585       1 filestream.go:110] &{0xc00008f320}: decode and send
 [36mmtail         | [0m I0611 17:17:52.172687       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172744       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172861       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172924       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172941       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172958       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.172989       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173096       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173125       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173164       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173188       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173353       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173410       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173493       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173548       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173584       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173601       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173620       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173649       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173664       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173683       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173738       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173753       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173772       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173807       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173827       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173851       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173905       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173924       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173946       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.173990       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.174839       1 filestream.go:106] &{0xc00008f320}: read 4096 bytes, err is <nil>
 [36mmtail         | [0m I0611 17:17:52.174984       1 filestream.go:110] &{0xc00008f320}: decode and send
 [36mmtail         | [0m I0611 17:17:52.175011       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175075       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175102       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175203       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175335       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175485       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175633       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175659       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175682       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175736       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175843       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.175874       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176022       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176125       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176147       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176191       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176210       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176232       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176274       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176294       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.176316       1 decode.go:37] sendline
 [36mmtail         | [0m I0611 17:17:52.177252       1 filestream.go:106] &{0xc00008f320}: read 4096 bytes, err is <nil>
 [36mmtail         | [0m I0611 17:17:52.177465       1 filestream.go:110] &{0xc00008f320}: decode and send

The rest of the logfile is a repetion 25xsendline, a read 4096 and a decode, seems like mtail went in a loop here for the next 45 gb of logfile. CPU for the mtail container was also at 100%. This went on on until the docker desktop host ran out of memory due to of trying to log 45gb :-)

It could be related to how docker desktop (windows) interact with the windows filesystem trought the WSL subsystem. However another container running promtail that was watching the same file had no issue with the rotate as seen in its log:

promtail  | ts=2021-06-11T17:17:52.0376528Z caller=log.go:124 level=info msg="Re-opening moved/deleted file /var/log/chia/debug.log ..."
promtail  | 2021/06/11 17:17:52 stat of old file returned, this is not expected and may result in unexpected behavior
promtail  | ts=2021-06-11T17:17:52.0400007Z caller=log.go:124 level=info msg="Waiting for /var/log/chia/debug.log to appear..."
promtail  | ts=2021-06-11T17:17:52.2932814Z caller=log.go:124 level=info msg="Successfully reopened /var/log/chia/debug.log"

If we look at the code of promtail of that warning stat of old file returned, this is not expected and may result in unexpected behavior there seems to be something going on with opening the old file at logrotate (line 210).

https://github.com/grafana/tail/blob/master/tail.go

It seems that the for some reason at logrotate the old file is still returned somehow and mtail goes into a loop. This might all be related to the implementation of docker desktop and its interaction with the local filesystem via WSL. WSL seems to use Plan9 for this. I am currently looking into other ways of mounting the log file to the container under windows.

Regardless of the filesystem thing there may be a way to fix how mtail handles this, at least looking at the comments in the code for promtail there is something going on with logrotate and the old file being returned.

@jaqx0r jaqx0r added bug This is considered a bug and shall get fixed help wanted mtail-Log Tailing Issues related to log polling and tailing labels Jun 20, 2021
@jaqx0r jaqx0r changed the title log rotation error mtail mtail doesn't follow log rotation when running under WSL and docker Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This is considered a bug and shall get fixed help wanted mtail-Log Tailing Issues related to log polling and tailing
Projects
None yet
Development

No branches or pull requests

3 participants