Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlocks with version 1.93 #1440

Closed
mdantonio opened this issue Aug 27, 2024 · 12 comments
Closed

Deadlocks with version 1.93 #1440

mdantonio opened this issue Aug 27, 2024 · 12 comments
Labels

Comments

@mdantonio
Copy link

Hey, we are using logrus on a service deployed on kubernetes that produces a pretty big amount of logs (around 4 millions per hour). We never observed similar issues in several months, but today we experienced several deadlocks, without any change on our stack (service was stable and running since many days then it started to completely freeze every 15-20 minutes).
We are using latest version (1.9.3)

	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:321
github.com/sirupsen/logrus.(*Entry).Info(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:304 +0x48 fp=0xc0001cb7c8 sp=0xc0001cb798 pc=0x7dc1e8
github.com/sirupsen/logrus.(*Entry).Log(0xc0059b9b90, 0x4, {0xc0001cb7f8?, 0xc0001cb808?, 0x41535b?})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:233 +0x2d1 fp=0xc0001cb798 sp=0xc0001cb6a0 pc=0x7dba91
github.com/sirupsen/logrus.(*Entry).log(0xc0059b9b90, 0x4, {0xc004dbbc98, 0x11})
	/go/pkg/mod/github.com/sirupsen/[email protected]/logger.go:61
github.com/sirupsen/logrus.(*MutexWrap).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:171 +0x15d fp=0xc0001cb6a0 sp=0xc0001cb650 pc=0x48ad5d
sync.(*Mutex).lockSlow(0x24228f0)
	/usr/local/go/src/runtime/sema.go:77 +0x25 fp=0xc0001cb650 sp=0xc0001cb618 pc=0x47b065
sync.runtime_SemacquireMutex(0x0?, 0x0?, 0xc0059b9c00?)
	/usr/local/go/src/runtime/sema.go:160 +0x225 fp=0xc0001cb618 sp=0xc0001cb5b0 pc=0x459fe5
runtime.semacquire1(0x24228f4, 0x0, 0x3, 0x1, 0x15)
	/usr/local/go/src/runtime/proc.go:408
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0001cb5b0 sp=0xc0001cb590 pc=0x44716e
runtime.gopark(0x3631905e71dab8?, 0x11?, 0x10?, 0x0?, 0xc0001cb600?)
goroutine 848 gp=0xc003eb4000 m=nil [sync.Mutex.Lock, 218 minutes]:

I see similar issues reported in the past (#1201, that is closed, but apparently not resolved).

What is the reason for such deadlocks? Can be caused by some dirty data being logged? Any hint to prevent similar deadlocks?

@ss920386
Copy link

Hi @mdantonio , may I ask more about your "freeze every 15-20 minutes"? Did that mean the freeze (deadlock) was eliminated itself and the service consumed for a while and everything happen again? Or did you restart the service manually to eliminate the deadlock?

@mdantonio
Copy link
Author

Hi @ss920386 the service was not able to recover by itself and a restart of the service was needed

The pattern was something like:

  • service started and working for some time (15-20 minutes on average)
  • suddenly the service stuck, forever
  • by sending sigterm to the service we got the above stack trace
  • by manually restarting the service, it was working again for 15-20 minutes avg, that was stucking again

@ss920386
Copy link

@mdantonio Thanks for your information. I first suspect that I encountered the same issue as yours😂However, I only got long processing time instead of deadlock. Sorry that I still don’t have any idea about the issue…but thanks again for your quick response🙏

Copy link

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Oct 11, 2024
Copy link

This issue was closed because it has been inactive for 14 days since being marked as stale.

@Pure-AdamuKaapan
Copy link

Pure-AdamuKaapan commented Nov 26, 2024

@ss920386 @mdantonio : I have more information on this. We just encountered this issue when upgrading from logrus at commit 79c5ab66aa2ce7d9ff7b3c437ebc22fcc519a967 (v1.8.2?) to 1.9.3 in our repository. It appears specifically to have something to do with re-entrant locks in logging: this is a stack trace we got during our go test run that times out. It passes consistently with 1.8.2 and fails consistently with 1.9.3.

goroutine 22 [sync.Mutex.Lock]:
sync.runtime_SemacquireMutex(0x0?, 0x0?, 0xc0002f39e0?)
        /root/.gvm/gos/go1.23/src/runtime/sema.go:95 +0x25
sync.(*Mutex).lockSlow(0x114e2b0)
        /root/.gvm/gos/go1.23/src/sync/mutex.go:173 +0x15d
sync.(*Mutex).Lock(...)
        /root/.gvm/gos/go1.23/src/sync/mutex.go:92
github.com/sirupsen/logrus.(*MutexWrap).Lock(...)
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/logger.go:61
github.com/sirupsen/logrus.(*Entry).log(0xc00030e770, 0x2, {0xc0001c7580, 0x1b})
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/entry.go:233 +0x2d1
github.com/sirupsen/logrus.(*Entry).Log(0xc00030e770, 0x2, {0xc0000484e8?, 0xb?, 0xc000189500?})
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Error(...)
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/entry.go:333
<our code...>
fmt.(*pp).handleMethods(0xc0002de4e0, 0x416d25?)
        /root/.gvm/gos/go1.23/src/fmt/print.go:673 +0x31f
fmt.(*pp).printArg(0xc0002de4e0, {0xbad7e0, 0xc0001816d0}, 0x76)
        /root/.gvm/gos/go1.23/src/fmt/print.go:756 +0x446
fmt.(*pp).doPrint(0xc0002de4e0, {0xc000048a88?, 0x1, 0x1?})
        /root/.gvm/gos/go1.23/src/fmt/print.go:1208 +0x45
fmt.Sprint({0xc000048a88, 0x1, 0x1})
        /root/.gvm/gos/go1.23/src/fmt/print.go:279 +0x3e
github.com/sirupsen/logrus.(*TextFormatter).appendValue(0xc0001b42a0, 0xc00025b9b0, {0xbad7e0?, 0xc0001816d0?})
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/text_formatter.go:331 +0x65
github.com/sirupsen/logrus.(*TextFormatter).appendKeyValue(0xc0001b42a0, 0xc00025b9b0, {0xbb2fca?, 0x7?}, {0xbad7e0, 0xc0001816d0})
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/text_formatter.go:325 +0x94
github.com/sirupsen/logrus.(*TextFormatter).Format(0xc0001b42a0, 0xc00030e700)
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/text_formatter.go:224 +0xf71
github.com/sirupsen/logrus.(*Entry).write(0xc00030e700)
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/entry.go:289 +0x94
github.com/sirupsen/logrus.(*Entry).log(0xc00030e690, 0x3, {0xc0001a5560, 0x24})
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/entry.go:252 +0x425
github.com/sirupsen/logrus.(*Entry).Log(0xc00030e690, 0x3, {0xc000049b78?, 0x4?, 0xc0001816d0?})
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Warn(...)
        /root/go/src/github.com/pure-px/porx/vendor/github.com/sirupsen/logrus/entry.go:325
<our code...>

@Pure-AdamuKaapan
Copy link

Reopening this bug would be greatly appreciated - due to a chain of dependency conflicts, we can't really just roll back to the old version (at least, not forever).

@Pure-AdamuKaapan
Copy link

Pure-AdamuKaapan commented Nov 26, 2024

I ran a diff between v1.9.3 and v1.8.2 and found the only substantial difference to be this part, which is in the same struct as our stack trace (Entry):

diff --git a/writer.go b/writer.go
index 72e8e3a..074fd4b 100644
--- a/writer.go
+++ b/writer.go
@@ -4,6 +4,7 @@ import (
 	"bufio"
 	"io"
 	"runtime"
+	"strings"
 )
 
 // Writer at INFO level. See WriterLevel for details.
@@ -20,15 +21,18 @@ func (logger *Logger) WriterLevel(level Level) *io.PipeWriter {
 	return NewEntry(logger).WriterLevel(level)
 }
 
+// Writer returns an io.Writer that writes to the logger at the info log level
 func (entry *Entry) Writer() *io.PipeWriter {
 	return entry.WriterLevel(InfoLevel)
 }
 
+// WriterLevel returns an io.Writer that writes to the logger at the given log level
 func (entry *Entry) WriterLevel(level Level) *io.PipeWriter {
 	reader, writer := io.Pipe()
 
 	var printFunc func(args ...interface{})
 
+	// Determine which log function to use based on the specified log level
 	switch level {
 	case TraceLevel:
 		printFunc = entry.Trace
@@ -48,23 +52,51 @@ func (entry *Entry) WriterLevel(level Level) *io.PipeWriter {
 		printFunc = entry.Print
 	}
 
+	// Start a new goroutine to scan the input and write it to the logger using the specified print function.
+	// It splits the input into chunks of up to 64KB to avoid buffer overflows.
 	go entry.writerScanner(reader, printFunc)
+
+	// Set a finalizer function to close the writer when it is garbage collected
 	runtime.SetFinalizer(writer, writerFinalizer)
 
 	return writer
 }
 
+// writerScanner scans the input from the reader and writes it to the logger
 func (entry *Entry) writerScanner(reader *io.PipeReader, printFunc func(args ...interface{})) {
 	scanner := bufio.NewScanner(reader)
+
+	// Set the buffer size to the maximum token size to avoid buffer overflows
+	scanner.Buffer(make([]byte, bufio.MaxScanTokenSize), bufio.MaxScanTokenSize)
+
+	// Define a split function to split the input into chunks of up to 64KB
+	chunkSize := bufio.MaxScanTokenSize // 64KB
+	splitFunc := func(data []byte, atEOF bool) (int, []byte, error) {
+		if len(data) >= chunkSize {
+			return chunkSize, data[:chunkSize], nil
+		}
+
+		return bufio.ScanLines(data, atEOF)
+	}
+
+	// Use the custom split function to split the input
+	scanner.Split(splitFunc)
+
+	// Scan the input and write it to the logger using the specified print function
 	for scanner.Scan() {
-		printFunc(scanner.Text())
+		printFunc(strings.TrimRight(scanner.Text(), "\r\n"))
 	}
+
+	// If there was an error while scanning the input, log an error
 	if err := scanner.Err(); err != nil {
 		entry.Errorf("Error while reading from Writer: %s", err)
 	}
+
+	// Close the reader when we are done
 	reader.Close()
 }
 
+// WriterFinalizer is a finalizer function that closes then given writer when it is garbage collected
 func writerFinalizer(writer *io.PipeWriter) {
 	writer.Close()
 }

@sillydong
Copy link

Also meet this deadlock. Can we reopen this and see if there is any solution? @ss920386

For me the lock is happen in entry.go line#233
After newEntry was dup from an existing entry, the entry's Logger is locked somewhere we don't know, and then when newEntry wants to lock, it is blocked.

Here is a simple test code for this scenario

package playground_test

import (
	"fmt"
	"sync"
	"testing"
	"time"
)

type Logger struct {
	mu MutexWrap
}

type Entry struct {
	Logger *Logger
}

func (e *Entry) Dup() *Entry {
	return &Entry{
		Logger: e.Logger,
	}
}

type MutexWrap struct {
	lock     sync.Mutex
	disabled bool
}

func (mw *MutexWrap) Lock() {
	if !mw.disabled {
		mw.lock.Lock()
	}
}

func (mw *MutexWrap) Unlock() {
	if !mw.disabled {
		mw.lock.Unlock()
	}
}

func (mw *MutexWrap) Disable() {
	mw.disabled = true
}

func TestLock(t *testing.T) {
	logger := &Logger{}
	entry := &Entry{Logger: logger}
	go func() {
		entry.Logger.mu.Lock()
		fmt.Println("locked")
		time.Sleep(time.Minute)
	}()

	time.Sleep(time.Second)
	newEntry := entry.Dup()
	newEntry.Logger.mu.Lock()
	fmt.Println("new locked")
}

@ss920386
Copy link

Hi @Pure-AdamuKaapan @sillydong !
Thank you for tagging me! However, I don’t believe I can re-open this issue since I’m not the original author, and I later realized the issue I encountered might not be the same as this one.
Still, I’m really looking forward to seeing more progress on this topic and truly appreciate all the effort going into resolving it! Please let me know if there’s anything else I might be able to help with.

@Pure-AdamuKaapan
Copy link

Shall we file a new bug then, now that we have a specific repro? (thanks so much @sillydong , beat me to it!)

@sillydong
Copy link

sillydong commented Nov 28, 2024

Here is a new issue tracking this #1448. It's fully reproducible and root cause found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants