We have recently seen flakiness and failures in our CI/CD system that seem related to the auth action when it is used with workload identity federation.
Failures in our tests:
Get "https://sqladmin.googleapis.com/sql/v1beta4/projects/***/instances/us-central1~***/connectSettings?alt=json&prettyPrint=false": oauth2/google: unable to generate access token: Post "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/***:generateAccessToken": oauth2/google: invalid response when retrieving subject token: Get "https://pipelines.actions.githubusercontent.com/wnFNgWBjsU8cogeeTzb2CiO5AuGdnZuICpyvLHwtISiGZGW9qa/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/c74b5a43-cfd0-4a22-8e48-7ecf7f3f3bfa/jobs/a87764f3-d2a4-5991-9b9b-9ec78441f076/idtoken?api-version=2.0&audience=https%!A(MISSING)%!F(MISSING)%!F(MISSING)iam.googleapis.com%!F(MISSING)projects%!F(MISSING)174904406655%!F(MISSING)locations%!F(MISSING)global%!F(MISSING)workloadIdentityPools%!F(MISSING)gh-13a715-cloudsql-proxy%!F(MISSING)providers%!F(MISSING)gh-13a715-cloudsql-proxy"
Flakybot issues on our repo for context:
GoogleCloudPlatform/cloud-sql-proxy#1649
GoogleCloudPlatform/cloud-sql-proxy#1648
Build normally passes without issues.
Flakiness resulting from being unable to generate an access token using the auth credentials.
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: v1 periodic
on:
  schedule:
  - cron: '0 2 * * *'
jobs:
  integration:
    name: integration tests
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [macos-latest, windows-latest, ubuntu-latest]
      fail-fast: false
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - name: Checkout code
        uses: 'actions/[email protected]'
        with:
          ref: v1
      - name: Setup Go
        uses: actions/[email protected]
        with:
          go-version: 1.19
      - id: 'auth'
        name: 'Authenticate to Google Cloud'
        uses: 'google-github-actions/[email protected]'
        with:
          workload_identity_provider: ${{ secrets.PROVIDER_NAME }}
          service_account: ${{ secrets.SERVICE_ACCOUNT }}
          access_token_lifetime: 600s
      - id: 'secrets'
        name: Get secrets
        uses: 'google-github-actions/[email protected]'
        with:
          secrets: |-
            MYSQL_CONNECTION_NAME:${{ secrets.GOOGLE_CLOUD_PROJECT }}/MYSQL_CONNECTION_NAME
            MYSQL_USER:${{ secrets.GOOGLE_CLOUD_PROJECT }}/MYSQL_USER
            MYSQL_PASS:${{ secrets.GOOGLE_CLOUD_PROJECT }}/MYSQL_PASS
            MYSQL_DB:${{ secrets.GOOGLE_CLOUD_PROJECT }}/MYSQL_DB
            POSTGRES_CONNECTION_NAME:${{ secrets.GOOGLE_CLOUD_PROJECT }}/POSTGRES_CONNECTION_NAME
            POSTGRES_USER:${{ secrets.GOOGLE_CLOUD_PROJECT }}/POSTGRES_USER
            POSTGRES_USER_IAM:${{ secrets.GOOGLE_CLOUD_PROJECT }}/POSTGRES_USER_IAM
            POSTGRES_PASS:${{ secrets.GOOGLE_CLOUD_PROJECT }}/POSTGRES_PASS
            POSTGRES_DB:${{ secrets.GOOGLE_CLOUD_PROJECT }}/POSTGRES_DB
            SQLSERVER_CONNECTION_NAME:${{ secrets.GOOGLE_CLOUD_PROJECT }}/SQLSERVER_CONNECTION_NAME
            SQLSERVER_USER:${{ secrets.GOOGLE_CLOUD_PROJECT }}/SQLSERVER_USER
            SQLSERVER_PASS:${{ secrets.GOOGLE_CLOUD_PROJECT }}/SQLSERVER_PASS
            SQLSERVER_DB:${{ secrets.GOOGLE_CLOUD_PROJECT }}/SQLSERVER_DB
      - name: Enable fuse config (Linux)
        if: runner.os == 'Linux'
        run: |
          sudo sed -i 's/#user_allow_other/user_allow_other/g' /etc/fuse.conf
      - name: Run tests
        env:
          GOOGLE_CLOUD_PROJECT: '${{ secrets.GOOGLE_CLOUD_PROJECT }}'
          MYSQL_CONNECTION_NAME: '${{ steps.secrets.outputs.MYSQL_CONNECTION_NAME }}'
          MYSQL_USER: '${{ steps.secrets.outputs.MYSQL_USER }}'
          MYSQL_PASS: '${{ steps.secrets.outputs.MYSQL_PASS }}'
          MYSQL_DB: '${{ steps.secrets.outputs.MYSQL_DB }}'
          POSTGRES_CONNECTION_NAME: '${{ steps.secrets.outputs.POSTGRES_CONNECTION_NAME }}'
          POSTGRES_USER: '${{ steps.secrets.outputs.POSTGRES_USER }}'
          POSTGRES_USER_IAM: '${{ steps.secrets.outputs.POSTGRES_USER_IAM }}'
          POSTGRES_PASS: '${{ steps.secrets.outputs.POSTGRES_PASS }}'
          POSTGRES_DB: '${{ steps.secrets.outputs.POSTGRES_DB }}'
          SQLSERVER_CONNECTION_NAME: '${{ steps.secrets.outputs.SQLSERVER_CONNECTION_NAME }}'
          SQLSERVER_USER: '${{ steps.secrets.outputs.SQLSERVER_USER }}'
          SQLSERVER_PASS: '${{ steps.secrets.outputs.SQLSERVER_PASS }}'
          SQLSERVER_DB: '${{ steps.secrets.outputs.SQLSERVER_DB }}'
          TMPDIR: "/tmp"
          TMP: '${{ runner.temp }}'
        # specifying bash shell ensures a failure in a piped process isn't lost by using `set -eo pipefail`
        shell: bash
        run: |
          go test -race -v ./... | tee test_results.txt
      - name: Convert test output to XML
        if: ${{ github.event_name == 'schedule' && always() }}
        run: |
          go install github.com/jstemmer/go-junit-report/[email protected]
          go-junit-report -in test_results.txt -set-exit-code -out v1periodic_sponge_log.xml
      - name: FlakyBot (Linux)
        # only run flakybot on periodic (schedule) event
        if: ${{ github.event_name == 'schedule' && runner.os == 'Linux' && always() }}
        run: |
          curl https://github.com/googleapis/repo-automation-bots/releases/download/flakybot-1.1.0/flakybot -o flakybot -s -L
          chmod +x ./flakybot
          ./flakybot --repo ${{github.repository}} --commit_hash ${{github.sha}} --build_url https://github.com/${{github.repository}}/actions/runs/${{github.run_id}}
      - name: FlakyBot (Windows)
        # only run flakybot on periodic (schedule) event
        if: ${{ github.event_name == 'schedule' && runner.os == 'Windows' && always() }}
        run: |
          curl https://github.com/googleapis/repo-automation-bots/releases/download/flakybot-1.1.0/flakybot.exe -o flakybot.exe -s -L
          ./flakybot.exe --repo ${{github.repository}} --commit_hash ${{github.sha}} --build_url https://github.com/${{github.repository}}/actions/runs/${{github.run_id}}
      - name: FlakyBot (macOS)
        # only run flakybot on periodic (schedule) event
        if: ${{ github.event_name == 'schedule' && runner.os == 'macOS' && always() }}
        run: |
          curl https://github.com/googleapis/repo-automation-bots/releases/download/flakybot-1.1.0/flakybot-darwin-amd64 -o flakybot -s -L
          chmod +x ./flakybot
          ./flakybot --repo ${{github.repository}} --commit_hash ${{github.sha}} --build_url https://github.com/${{github.repository}}/actions/runs/${{github.run_id}}
https://github.com/GoogleCloudPlatform/cloud-sql-proxy/actions/runs/4149257897/jobs/7178042999#step:7:354
Hi there @jackwotherspoon
Thank you for opening an issue. Our team will triage this as soon as we can. Please take a moment to review the troubleshooting steps, which list common error messages and their resolution steps.
Thank you for opening an issue. I'm seeing a bunch of "!F(MISSING)" in that output. For example, I would expect:
https://pipelines.actions.githubusercontent.com/wnFNgWBjsU8cogeeTzb2CiO5AuGdnZuICpyvLHwtISiGZGW9qa/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/c74b5a43-cfd0-4a22-8e48-7ecf7f3f3bfa/jobs/a87764f3-d2a4-5991-9b9b-9ec78441f076/idtoken?api-version=2.0&audience=https%!A(MISSING)%!F(MISSING)%!F(MISSING)iam.googleapis.com%!F(MISSING)projects%!F(MISSING)174904406655%!F(MISSING)locations%!F(MISSING)global%!F(MISSING)workloadIdentityPools%!F(MISSING)gh-13a715-cloudsql-proxy%!F(MISSING)providers%!F(MISSING)gh-13a715-cloudsql-proxy
to be:
https://pipelines.actions.githubusercontent.com/wnFNgWBjsU8cogeeTzb2CiO5AuGdnZuICpyvLHwtISiGZGW9qa/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/c74b5a43-cfd0-4a22-8e48-7ecf7f3f3bfa/jobs/a87764f3-d2a4-5991-9b9b-9ec78441f076/idtoken?api-version=2.0&audience=https://iam.googleapis.com/projects/174904406655/locations/global/workloadIdentityPools/gh-13a715-cloudsql-proxy/providers/gh-13a715-cloudsql-proxy
It looks like some kind of stripping or variable substitution might be failing. Per the troubleshooting steps, can you please enable debug logging and provide the logs (or a link to the logs)? I can add a few retries, but I want to make sure I understand the problem first. Does reducing the concurrency help at all?
Thanks @jackwotherspoon. Do you know if each test is generating a new auth token through the WIF workflow? I wonder if generating an auth token and injecting it into the process instead of relying on ADC could help. That would mean you only have one auth exchange.
This line sets a token format, but you're not actually generating a token (token_format: 'access_token') or injecting it into the subsequent processes. That means each run does an ADC cycle, which might be why you're getting errors. I would expect the errors to be rate limits or quota though, not connection errors.
I wonder if our action or the nodejs action isn't properly cleaning up connections?
@sethvargo Thanks for the suggestions! I will look at how we can more efficiently generate tokens/creds as we do heavily rely on ADC currently.
This may be why we also see timeout errors in some of our runs:
Error: google-github-actions/get-secretmanager-secrets failed with: failed to access secret "projects/***/secrets/MYSQL_CONNECTION_NAME/versions/latest": request to https://pipelines.actions.githubusercontent.com/umAmnh0OhcfbtGEt7J16Yga6HsgM8dYIhPxbPiOYFLVwMnfbKz/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/833be520-2cef-45cf-837f-a087e0c4b14d/jobs/8b492971-3af8-5c25-5c38-fc3954dced57/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2F*** failed, reason: connect ETIMEDOUT 13.107.42.16:443
Owner Name | google-github-actions |
Repo Name | auth |
Full Name | google-github-actions/auth |
Language | TypeScript |
Created Date | 2021-09-16 |
Updated Date | 2023-03-24 |
Star Count | 573 |
Watcher Count | 16 |
Fork Count | 116 |
Issue Count | 3 |