Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Exit 57005 StackOverflowException using AzurePowershell@5 #20156

Open
7 tasks
trevors20 opened this issue Jul 16, 2024 · 22 comments
Open
7 tasks

[BUG]: Exit 57005 StackOverflowException using AzurePowershell@5 #20156

trevors20 opened this issue Jul 16, 2024 · 22 comments

Comments

@trevors20
Copy link

New issue checklist

Task name

AzurePowerShell

Task version

5.242.1

Issue Description

We are seeing a lot of Exit 57005 build breaks when using AzurePowerShell@5.
It looks like from the logs there is a StackOverflowException exception that is happening.
Was wondering if you might be able to help debug and fix this and/or help diagnose why this might be happening and provide a workaround or anything like that. Much appreciated.

Environment type (Please select at least one enviroment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operation system

Windows 2022

Relevant log output

2024-07-16T14:22:16.2568594Z VERBOSE: Testing whether type can be loaded: 'Microsoft.VisualStudio.Services.WebApi.VssClientHttpRequestSettings'
2024-07-16T14:22:16.2573359Z VERBOSE: The type was loaded successfully.
2024-07-16T14:22:16.2578674Z VERBOSE: Leaving Get-OMType.
2024-07-16T14:22:16.2793622Z 
2024-07-16T14:22:16.2793956Z Process is terminated due to StackOverflowException.
2024-07-16T14:22:16.2852511Z VERBOSE: AGENT_SKIPCERTVALIDATION (empty)
2024-07-16T14:22:16.2857885Z VERBOSE:  Converted to bool: False
2024-07-16T14:22:16.2863891Z VERBOSE: Constructing HTTP client.
2024-07-16T14:22:16.3287111Z VERBOSE: Leaving Get-VstsVssHttpClient.
2024-07-16T14:22:16.3296507Z VERBOSE: Removing assemlby resolver.
2024-07-16T14:22:16.3347027Z VERBOSE: SYSTEM_PLANID: '[REDACTED]'
2024-07-16T14:22:16.3377086Z VERBOSE: SYSTEM_JOBID: '[REDACTED]'
2024-07-16T14:22:16.3406912Z VERBOSE: SYSTEM_HOSTTYPE: 'build'
2024-07-16T14:22:16.3436924Z VERBOSE: SYSTEM_TEAMPROJECTID: '[REDACTED]'
2024-07-16T14:22:16.7096512Z ##[command]Clear-AzContext -Scope CurrentUser -Force -ErrorAction SilentlyContinue
2024-07-16T14:22:16.8158020Z ##[command]Clear-AzContext -Scope Process
2024-07-16T14:22:16.8497381Z ##[command]Connect-AzAccount 
2024-07-16T14:22:16.8497676Z Name                           Value                                                                                   
2024-07-16T14:22:16.8497908Z ----                           -----                                                                                   
2024-07-16T14:22:16.8498114Z Tenant                         [REDACTED]                                                   
2024-07-16T14:22:16.8498310Z Scope                          Process                                                                                 
2024-07-16T14:22:16.8498491Z Environment                    AzureCloud                                                                              
2024-07-16T14:22:16.8498754Z ApplicationId                  ***                                                    
2024-07-16T14:22:16.8498944Z ServicePrincipal               True                                                                                    
2024-07-16T14:22:16.8499030Z 
2024-07-16T14:22:16.8499074Z 
2024-07-16T14:22:16.8499122Z 
2024-07-16T14:22:16.8635574Z WARNING: The feature related to parameter name 'FederatedToken' is under preview.
2024-07-16T14:22:18.4771754Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 5979.00 MB out of 32767.00 MB, CPU: Usage 7.00%
2024-07-16T14:22:23.7474754Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6086.00 MB out of 32767.00 MB, CPU: Usage 4.00%
2024-07-16T14:22:29.0309961Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6055.00 MB out of 32767.00 MB, CPU: Usage 9.00%
2024-07-16T14:22:34.2999428Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6075.00 MB out of 32767.00 MB, CPU: Usage 5.00%
2024-07-16T14:22:39.5848325Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6110.00 MB out of 32767.00 MB, CPU: Usage 8.00%
2024-07-16T14:22:44.8577747Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6151.00 MB out of 32767.00 MB, CPU: Usage 7.00%
2024-07-16T14:22:47.1743146Z ##[debug]Exit code: 57005

Full task logs with system.debug enabled

 LOGGING GOES OVER 65536 characters!  I will respond to this thread and upload it as a file.  

Repro steps

- task: AzurePowerShell@5
          displayName: DisplayName
          inputs:
            azureSubscription: '[REDACTED]'
            workingDirectory: '$(Build.SourcesDirectory)'
            ScriptPath: 'src\Scripts\script.ps1'
            ScriptArguments: '-AccessToken $(System.AccessToken) -Param2 $(param2) -Param3 $(param3)'
            azurePowerShellVersion: LatestVersion
          condition: succeeded()
@trevors20
Copy link
Author

I'm attaching the
azurepowershell_error.txt
logs as a file to this comment.

@trevors20
Copy link
Author

This is happening 80% of the time. For the 20% of the time, it does work.
I'm uploading an example of logging when the task does actually work to this comment.
azurepowershell_working.txt

Please help. Thanks muchly!

@trevors20
Copy link
Author

trevors20 commented Jul 16, 2024

We have decided to pin to an old version of AzurePowerShell@5 until this is fixed or figured out. That works great.

@v-schhabra
Copy link
Contributor

Hi @trevors20
Thanks for reporting this issue. We are investigating on it.

@huizhonglong
Copy link

huizhonglong commented Jul 17, 2024

It'd heavily impacted our VS builds across pipelines before we pinned to the old version as a workaround to unblock folks.

You can get all the failed build IDs from below query if comment out the last line (| summarize) if that helps:

Go to Log Analytics and run query

let _start = datetime('2024-07-12T00:00:00Z');
let _end = datetime('2024-07-17T00:00:00Z');
customEvents
| extend BuildDefinitionId = tostring(customDimensions.BuildDefinitionId)
| extend BuildDefinitionName = tostring(customDimensions.BuildDefinitionName)
| extend BuildId = tostring(customDimensions.BuildId)
|where timestamp between(_start .. _end)
| where customDimensions.RecordName == "Layout BuildArtifacts" and customDimensions.Message contains "PowerShell exited with code '57005'"
| distinct BuildDefinitionId, BuildDefinitionName, BuildId
| summarize FailedCount = count() by BuildDefinitionId, BuildDefinitionName

<style> </style>
BuildDefinitionId BuildDefinitionName FailedCount
10310 DD-CB-PR 41
10289 DD-CB-TestSignVS 13
11051 DD-CB-EUCU-TestSignVS 1

@fernando-lopezblanco-lrn

Hi. Same. We've started experiencing these stack overflows today, with that AzurePowershell@5 task. Testing now version 4...

@trevors20
Copy link
Author

Thanks @fernando-lopezblanco-lrn . If it helps, we pinned to version 5.241.5 and that has stabilized our builds.

-task: [email protected]

@fernando-lopezblanco-lrn
Copy link

fernando-lopezblanco-lrn commented Jul 18, 2024

Thanks @trevors20 for very quick response. Will try that version later on.

Same issue with AzurePowershell@4 (shown as Version : 4.242.0 in the pipeline logs). Testing now @5.241.5

@fernando-lopezblanco-lrn

@5.241.5 is looking good, but other pipelines with @4 are starting to fail as well

@trevors20
Copy link
Author

@fernando-lopezblanco-lrn , I'm not sure which version to pin to for the AzurePowershell@4. But, you should be able to find that by looking at your older builds that were working before you started using the 4.242.0 version of the task.

At the top of the AzurePowershell@4 step in the build, you should see a header at the top of the logs. Something like this:

Task : Azure PowerShell
Description : Run a PowerShell script within an Azure environment
Version : 5.241.5
Author : Microsoft Corporation
Help : https://aka.ms/azurepowershelltroubleshooting

If you work backwards and find one of your older builds with a previous version than the 4.242.0 version then try pinning the task to the previous version that you find. To give a rough timeframe, I know that the AzurePowershell@5 task was updated a couple of weeks ago which started the instability here. So, you may need to look back a couple of weeks or so. :(

@trevors20
Copy link
Author

@fernando-lopezblanco-lrn , actually, we should be able to see the versions in the source code history. The version is specified in the task.json for the task. In this case for AzurePowershell@4, that would be here: https://github.com/microsoft/azure-pipelines-tasks/commits/master/Tasks/AzurePowerShellV4/task.json

Looking at the history, the change that was checked in a couple of weeks ago, if you look at the previous change it looks like you may want to pin to 4.241.2. Try it and see?

@fernando-lopezblanco-lrn
Copy link

fernando-lopezblanco-lrn commented Jul 19, 2024

@trevors20 yeah, the version 4.242.0 I gave you above, I took it from the header of the logs as you later mentioned.
Testing 4.241.0 now, for 4.241.2 does not look to exist as it's prompting me to install it from the Marketplace

@admachina
Copy link

Curious if anyone has seen this issue recently. We took the pinned version workaround, and I'm looking to unwind it if it's fixed.

@admachina
Copy link

Was able to get this to reproduce on 5.243.7

@admachina
Copy link

This bug seems it may be mitigated also by switching to pwsh mode instead of legacy powershell.
Adding pwsh: true to the pipeline

Having good success with this if you wanted to unpin the version

@daxianji007
Copy link

Also have this issue. Waiting for advices.

@onetocny
Copy link
Contributor

Hi @trevors20,

In order to proceed with investigation, could please share following details?

  1. What kind of ARM service endpoint are you using in AzurePowerShell tasks?
  2. Where is the build agent running?
  3. What is content of script.ps1?
  4. How does the whole pipeline definition looks like?

@v-schhabra
Copy link
Contributor

Hi @trevors20
Could you please share the above details if the issue is occurring?

@rcpd-alt
Copy link

rcpd-alt commented Sep 27, 2024

We're seeing this too, earliest seen build affected is 5.245.1 and still seeing it in 5.245.3

@trevors20
Copy link
Author

@onetocny , @v-schhabra
My apologies. Was on vacay when the questions were sent. Just now looking to see the status and just saw the questions.

  1. This is an ARM using workload identity federation with openid connect
  2. Not quite sure what you are looking for here. Do you mean the azure subscription? Or, the agent pool?
  3. I'll send directly
  4. I'll send directly

@onetocny
Copy link
Contributor

onetocny commented Nov 4, 2024

We have identified the root cause in Az.Accounts module. You can find more details about that in related issue Azure/azure-powershell#26623. In parallel we were able to mitigate the issue on our side. New fixed versions (X.248.Y) of all PowerShell based tasks will be deployed next week.

If there is a need to use latest AzurePowerShell tasks immediately there is a confirmed workaround. You can prepend following task before the failing AzurePowerShell tasks:

- task: PowerShell@2
        inputs:
          targetType: 'inline'
          script: |
            Install-Module Az.Accounts -Force -AllowClobber -ErrorAction Stop -SkipPublisherCheck
            Import-Module Az.Accounts -Global -PassThru -Force
            Update-AzConfig -CheckForUpgrade $false

@henriblMSFT
Copy link

henriblMSFT commented Dec 6, 2024

I'm intermittently hitting stack overflow with version 5.248.3 and looking at the call stack in the linked item I believe the UpgradeNotificationHelper exposes the problem but isn't the root cause. I believe the root cause is that attempting to call a powershell defined AssemblyResolver from a background thread when the assembly to resolve is a resource string will cause a stack overflow. This happens with UpgradeNotificationHelper because it runs in the background but any other background thread may cause this issue.

Unfortunately, I don't have the call stack from my failed run so I am extrapolating here.

If we look at the callstack from the linked Az Powershell issue , the stackoverflow is around assembly resolution. The assembly resolver that's causing the stack overflow is a PowerShell ScriptBlock:

[d:\os\src\onecore\admin\monad\src\engine\lang\Scriptblock.cs @ 774]
unknown!DynamicClass.lambda_method+0x111

Digging further, it gets into the PowerShell internal where it seems to be triggering an error:
:

System_Management_Automation_ni!System.Management.Automation.ErrorCategoryInfo.Ellipsize+0x56 [d:\os\src\onecore\admin\monad\src\engine\ErrorPackage.cs @ 511]
System_Management_Automation_ni!System.Management.Automation.ScriptBlock.GetContextFromTLS+0xc97f60 

Looking at what the GetContextFromTLS does:

internal ExecutionContext GetContextFromTLS()
{
	ExecutionContext executionContextFromTLS = LocalPipeline.GetExecutionContextFromTLS();
	if (executionContextFromTLS == null)
	{
		string original = ToString();
		original = ErrorCategoryInfo.Ellipsize(CultureInfo.CurrentUICulture, original);
		PSInvalidOperationException ex = PSTraceSource.NewInvalidOperationException(ParserStrings.ScriptBlockDelegateInvokedFromWrongThread, original);
		ex.SetErrorId("ScriptBlockDelegateInvokedFromWrongThread");
		throw ex;
	}
	return executionContextFromTLS;
}

It is trying to emit a ScriptBlockDelegateInvokedFromWrongThread error but to do this it needs to get a localized version of the error string. Since the resource assembly isn't loaded, the runtime attempts to load the assembly, which goes through the PowerShell resolver which triggers an error because it is in a background thread which then tries to resolve resource assembly which runs the PowerShell assembly resolver, and we are now in our stack overflow scenario.

Searching azure-pipeline-tasks for cases where an assembly resolver is registered with powershell script we find two instances:

It turns out, that in Azure PowerShell the VstsAzureHelpers module is loaded.

So, to truly fix all the cases where this can happen, VstsAzureHelpers_ would need to be updated to use a c# assembly resolver instead of a powershell script block in VstsAzureHelpers

cc @starkmsu who introduced this change

Call stack for reference

unknown!DynamicClass.lambda_method+0x111
mscorlib_ni!System.AppDomain.OnAssemblyResolveEvent+0xb1 [f:\dd\ndp\clr\src\BCL\system\appdomain.cs @ 3188]
clr!CallDescrWorkerInternal+0x83
clr!CallDescrWorkerWithHandler+0x47
clr!MethodDescCallSite::CallTargetWorker+0xfa
clr!AppDomain::RaiseAssemblyResolveEvent+0x26d9f6
clr!AppDomain::TryResolveAssembly+0x90
clr!AppDomain::PostBindResolveAssembly+0xcc
clr!AppDomain::BindAssemblySpec+0x26aeac
clr!AssemblySpec::LoadDomainAssembly+0x1e6
clr!AssemblyNative::Load+0x3c4
mscorlib_ni!System.Reflection.RuntimeAssembly.InternalGetSatelliteAssembly+0x155 [f:\dd\ndp\clr\src\BCL\system\reflection\assembly.cs @ 2937]
mscorlib_ni!System.Resources.ManifestBasedResourceGroveler.GetSatelliteAssembly+0xf5 [f:\dd\ndp\clr\src\BCL\system\resources\manifestbasedresourcegroveler.cs @ 555]
mscorlib_ni!System.Resources.ManifestBasedResourceGroveler.GrovelForResourceSet+0x1eb [f:\dd\ndp\clr\src\BCL\system\resources\manifestbasedresourcegroveler.cs @ 89]
mscorlib_ni!System.Resources.ResourceManager.InternalGetResourceSet+0x40f [f:\dd\ndp\clr\src\BCL\system\resources\resourcemanager.cs @ 808]
mscorlib_ni!System.Resources.ResourceManager.InternalGetResourceSet+0x2b [f:\dd\ndp\clr\src\BCL\system\resources\resourcemanager.cs @ 752]
mscorlib_ni!System.Resources.ResourceManager.GetString+0x235 [f:\dd\ndp\clr\src\BCL\system\resources\resourcemanager.cs @ 1316]
System_Management_Automation_ni!System.Management.Automation.ErrorCategoryInfo.Ellipsize+0x56 [d:\os\src\onecore\admin\monad\src\engine\ErrorPackage.cs @ 511]
System_Management_Automation_ni!System.Management.Automation.ScriptBlock.GetContextFromTLS+0xc97f60 [d:\os\src\onecore\admin\monad\src\engine\lang\Scriptblock.cs @ 799]
System_Management_Automation_ni!System.Management.Automation.ScriptBlock.InvokeAsDelegateHelper+0x2b [d:\os\src\onecore\admin\monad\src\engine\lang\Scriptblock.cs @ 774]
unknown!DynamicClass.lambda_method+0x111
mscorlib_ni!System.AppDomain.OnAssemblyResolveEvent+0xb1 [f:\dd\ndp\clr\src\BCL\system\appdomain.cs @ 3188]
clr!CallDescrWorkerInternal+0x83
clr!CallDescrWorkerWithHandler+0x47
clr!MethodDescCallSite::CallTargetWorker+0xfa
clr!AppDomain::RaiseAssemblyResolveEvent+0x26d9f6
clr!AppDomain::TryResolveAssembly+0x90
clr!AppDomain::PostBindResolveAssembly+0xcc
clr!AppDomain::BindAssemblySpec+0x26aeac
clr!AssemblySpec::LoadDomainAssembly+0x1e6
clr!AssemblyNative::Load+0x3c4
mscorlib_ni!System.Reflection.RuntimeAssembly.InternalGetSatelliteAssembly+0x155 [f:\dd\ndp\clr\src\BCL\system\reflection\assembly.cs @ 2937]
mscorlib_ni!System.Resources.ManifestBasedResourceGroveler.GetSatelliteAssembly+0xf5 [f:\dd\ndp\clr\src\BCL\system\resources\manifestbasedresourcegroveler.cs @ 555]
mscorlib_ni!System.Resources.ManifestBasedResourceGroveler.GrovelForResourceSet+0x1eb [f:\dd\ndp\clr\src\BCL\system\resources\manifestbasedresourcegroveler.cs @ 89]
mscorlib_ni!System.Resources.ResourceManager.InternalGetResourceSet+0x40f [f:\dd\ndp\clr\src\BCL\system\resources\resourcemanager.cs @ 808]
mscorlib_ni!System.Resources.ResourceManager.InternalGetResourceSet+0x2b [f:\dd\ndp\clr\src\BCL\system\resources\resourcemanager.cs @ 752]
mscorlib_ni!System.Resources.ResourceManager.GetString+0x235 [f:\dd\ndp\clr\src\BCL\system\resources\resourcemanager.cs @ 1316]```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants