Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evalv3: OOM and signal killed (64GB) #3334

Open
uhthomas opened this issue Jul 28, 2024 · 8 comments
Open

evalv3: OOM and signal killed (64GB) #3334

uhthomas opened this issue Jul 28, 2024 · 8 comments
Labels
evaluator evalv3 issues affecting only the evaluator version 3 NeedsInvestigation

Comments

@uhthomas
Copy link
Contributor

uhthomas commented Jul 28, 2024

What version of CUE are you using (cue version)?

$ go run cuelang.org/go/cmd/[email protected] version
cue version v0.10.0-alpha.2.0.20240726151100-e68bd632c29e

go version go1.22.5
      -buildmode exe
       -compiler gc
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.10.0

Does this issue reproduce with the latest stable release?

Yes.

What did you do?

After following this to fix another issue, I thought it would be interesting to try the new evaluator again.

#3330 (comment)

❯ CUE_EXPERIMENT=evalv3 go run cuelang.org/go/cmd/[email protected] export ./k8s/amour/list.cue
signal: killed

uhthomas/automata@5a1a4fe

What did you expect to see?

Successful build.

What did you see instead?

OOM, signal killed.

image

@uhthomas uhthomas added NeedsInvestigation Triage Requires triage/attention labels Jul 28, 2024
uhthomas added a commit to uhthomas/automata that referenced this issue Jul 28, 2024
@uhthomas
Copy link
Contributor Author

uhthomas commented Jul 28, 2024

I wrote this quick script to see if I could find anything obvious, but it doesn't appear that any individual directory causes this. It seems maybe to be a culmination of all of them? There are some pretty large spikes in memory usage.

for dir in */; echo "$dir:" && CUE_EXPERIMENT=evalv3 go run cuelang.org/go/cmd/[email protected] export "./$dir"

image

Some of those spikes are like 10 gigabytes....

@uhthomas
Copy link
Contributor Author

If it helps - the latest stable release without using evalv3 takes at most 2GB for everything all at once (cue export ./k8s/amour/list.cue). Pretty major regression here.

❯ cue version
cue version 0.9.2

go version go1.22.5
      -buildmode pie
       -compiler gc
       -trimpath true
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.9.2

@uhthomas
Copy link
Contributor Author

uhthomas commented Jul 28, 2024

I don't know if this pprof info is helpful.

pprof001

❯ go tool pprof http://localhost:8080/debug/pprof/heap
Fetching profile over HTTP from http://localhost:8080/debug/pprof/heap
Saved profile in /home/thomas/pprof/pprof.cuebin.alloc_objects.alloc_space.inuse_objects.inuse_space.011.pb.gz
File: cuebin
Type: inuse_space
Time: Jul 28, 2024 at 10:41pm (BST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 15.18GB, 97.93% of 15.51GB total
Dropped 187 nodes (cum <= 0.08GB)
Showing top 10 nodes out of 54
      flat  flat%   sum%        cum   cum%
    6.28GB 40.51% 40.51%     6.28GB 40.51%  cuelang.org/go/internal/core/adt.(*overlayContext).allocCC
    3.88GB 25.04% 65.55%     5.95GB 38.36%  cuelang.org/go/internal/core/adt.(*closeContext).getKeyedCC
    2.17GB 13.97% 79.52%     2.17GB 13.97%  cuelang.org/go/internal/core/adt.(*OpContext).newNodeContext
    1.10GB  7.12% 86.64%     1.10GB  7.12%  cuelang.org/go/internal/core/adt.(*closeContext).addDependency
    0.96GB  6.21% 92.85%     5.99GB 38.60%  cuelang.org/go/internal/core/adt.(*closeContext).assignConjunct
    0.29GB  1.86% 94.71%     0.29GB  1.86%  cuelang.org/go/internal/core/adt.(*overlayContext).initCloneCC
    0.18GB  1.16% 95.87%     0.18GB  1.16%  cuelang.org/go/internal/core/adt.(*Vertex).rootCloseContext
    0.12GB   0.8% 96.67%     1.28GB  8.26%  cuelang.org/go/internal/core/adt.(*overlayContext).cloneVertex
    0.10GB  0.66% 97.32%     0.10GB  0.66%  cuelang.org/go/internal/core/adt.(*nodeContext).getArc
    0.09GB  0.61% 97.93%     0.10GB  0.66%  cuelang.org/go/internal/core/adt.(*nodeContext).scheduleTask
(pprof) list allocCC
Total: 15.51GB
ROUTINE ======================== cuelang.org/go/internal/core/adt.(*overlayContext).allocCC in /home/thomas/code/github.com/cue-lang/cue/internal/core/adt/overlay.go
    6.28GB    15.11GB (flat, cum) 97.42% of Total
         .          .    259:func (ctx *overlayContext) allocCC(cc *closeContext) *closeContext {
         .          .    260:   // TODO(perf): if the original is "done", it can no longer be modified and
         .          .    261:   // we can use the original, even if the values will not be correct.
         .          .    262:   if cc.overlay != nil {
         .          .    263:           return cc.overlay
         .          .    264:   }
         .          .    265:
    4.59GB     4.59GB    266:   o := &closeContext{generation: ctx.generation}
         .          .    267:   cc.overlay = o
         .          .    268:
         .          .    269:   if cc.parent != nil {
         .     6.15GB    270:           o.parent = ctx.allocCC(cc.parent)
         .          .    271:   }
         .          .    272:
         .          .    273:   // Copy the conjunct group if it exists.
         .          .    274:   if cc.group != nil {
         .          .    275:           // Copy the group of conjuncts.
    1.67GB     1.67GB    276:           g := make([]Conjunct, len(*cc.group))
         .          .    277:           o.group = (*ConjunctGroup)(&g)
         .          .    278:           for i, c := range *cc.group {
         .     1.50MB    279:                   g[i] = ctx.copyConjunct(c)
         .          .    280:           }
         .          .    281:
         .          .    282:           if o.parent != nil {
         .          .    283:                   // validate invariants
         .          .    284:                   ca := *cc.parent.group
         .          .    285:                   if ca[cc.parentIndex].x != cc.group {
         .          .    286:                           panic("group misaligned")
         .          .    287:                   }
         .          .    288:
         .          .    289:                   (*o.parent.group)[cc.parentIndex].x = o.group
         .          .    290:           }
         .          .    291:   }
         .          .    292:
         .          .    293:   // This must come after allocating the parent so that we can always read
         .          .    294:   // the src vertex from the parent during initialization. This assumes that
         .          .    295:   // src is set in the root closeContext when cloning a vertex.
   21.23MB    21.23MB    296:   ctx.closeContexts = append(ctx.closeContexts, cc)
         .          .    297:
         .          .    298:   // needsCloseInSchedule is used as a boolean. The pointer to the original
         .          .    299:   // closeContext is just used for reporting purposes.
         .          .    300:   if cc.needsCloseInSchedule != nil {
         .   128.57MB    301:           o.needsCloseInSchedule = ctx.allocCC(cc.needsCloseInSchedule)
         .          .    302:   }
         .          .    303:
         .          .    304:   // We only explicitly tag dependencies of type ARC. Notifications that
         .          .    305:   // point within the disjunct overlay will be tagged elsewhere.
         .          .    306:   for _, a := range cc.arcs {
         .          .    307:           if a.kind == ARC {
         .     2.55GB    308:                   ctx.allocCC(a.cc)
         .          .    309:           }
         .          .    310:   }
         .          .    311:
         .          .    312:   return o
         .          .    313:}
(pprof) list getKeyedCC
Total: 15.51GB
ROUTINE ======================== cuelang.org/go/internal/core/adt.(*closeContext).getKeyedCC in /home/thomas/code/github.com/cue-lang/cue/internal/core/adt/fields.go
    3.88GB    10.72GB (flat, cum) 69.15% of Total
         .          .    362:func (cc *closeContext) getKeyedCC(ctx *OpContext, key *closeContext, c CycleInfo, mode ArcType, checkClosed bool) *closeContext {
         .          .    363:   for _, a := range cc.arcs {
         .          .    364:           if a.key == key {
         .          .    365:                   a.cc.updateArcType(mode)
         .          .    366:                   return a.cc
         .          .    367:           }
         .          .    368:   }
         .          .    369:
  360.01MB   360.01MB    370:   group := &ConjunctGroup{}
         .          .    371:
         .          .    372:   if cc.parentConjuncts == cc {
         .          .    373:           panic("parent is self")
         .          .    374:   }
         .          .    375:
         .     5.74GB    376:   parent, pos, _ := cc.parentConjuncts.assignConjunct(ctx, key, Conjunct{
         .          .    377:           CloseInfo: CloseInfo{
         .          .    378:                   FromDef:   cc.isDef,
         .          .    379:                   FromEmbed: cc.isEmbed,
         .          .    380:                   CycleInfo: c,
         .          .    381:           },
         .          .    382:           x: group,
         .          .    383:   }, mode, false, checkClosed)
         .          .    384:
    3.53GB     3.53GB    385:   arc := &closeContext{
         .          .    386:           generation:      cc.generation,
         .          .    387:           parent:          parent,
         .          .    388:           parentConjuncts: parent,
         .          .    389:           parentIndex:     pos,
         .          .    390:
         .          .    391:           src:     key.src,
         .          .    392:           arcType: mode,
         .          .    393:           group:   group,
         .          .    394:
         .          .    395:           isDef:                cc.isDef,
         .          .    396:           isEmbed:              cc.isEmbed,
         .          .    397:           needsCloseInSchedule: cc,
         .          .    398:   }
         .          .    399:
         .          .    400:   arc.parent.incDependent(ctx, PARENT, arc)
         .          .    401:
         .          .    402:   // If the parent, w.r.t. the subfield relation was already processed,
         .          .    403:   // there is no need to register the notification.
         .          .    404:   arc.incDependent(ctx, EVAL, cc) // matched in REF(decrement:nodeDone)
         .          .    405:
         .          .    406:   // A let field never depends on its parent. So it is okay to filter here.
         .          .    407:   if !arc.Label().IsLet() {
         .          .    408:           // prevent a dependency on self.
         .          .    409:           if key.src != cc.src {
         .     1.10GB    410:                   cc.addDependency(ctx, ARC, key, arc, key)
         .          .    411:           }
         .          .    412:   }
         .          .    413:
         .          .    414:   v := key.src
         .          .    415:   if checkClosed && v.Parent != nil && v.Parent.state != nil {

@mvdan mvdan added panic and removed Triage Requires triage/attention labels Jul 28, 2024
@mvdan mvdan removed the panic label Oct 9, 2024
@mvdan
Copy link
Member

mvdan commented Oct 9, 2024

Some updated numbers, as of CUE 655236e:

$ time CUE_EXPERIMENT=evalv3=0 cue export ./k8s/amour/list.cue >/dev/null

real	0m10.683s
user	0m18.254s
sys	0m0.456s
$ time CUE_EXPERIMENT=evalv3=1 cue export ./k8s/amour/list.cue >/dev/null
^C (stopped before it eats all my RAM)

real	0m26.453s
user	1m13.289s
sys	0m14.437s

We see a similar issue with a few other medium to large projects when using the new evaluator. The current work in #2854 and particularly #2853 should help significantly here. I will update this thread once more progress has been made, as we are actively merging improvements into master.

@mvdan
Copy link
Member

mvdan commented Oct 14, 2024

I have finally been able to reduce a bit of CUE which seems to OOM my machine where evalv2 definitely did not: #3509

@mvdan
Copy link
Member

mvdan commented Oct 14, 2024

#3509 is fixed but that doesn't fix your issue. Quickly trying to narrow down its cause, I ran into a related regression, #3511.

@mvdan mvdan added the evalv3 issues affecting only the evaluator version 3 label Dec 10, 2024
@mvdan
Copy link
Member

mvdan commented Dec 12, 2024

Still seems to happen as of 0005c22.

@mvdan
Copy link
Member

mvdan commented Dec 24, 2024

I again reduced part of this project into another performance issue to be fixed: #3633

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
evaluator evalv3 issues affecting only the evaluator version 3 NeedsInvestigation
Projects
None yet
Development

No branches or pull requests

3 participants