TensorFlow gfile does not work via Workload Identity Federation

This issue has been tracked since 2022-08-11.

TL;DR

I am trying to switch from authenticating with a long-lived Service Account Key JSON file to Workload Identity Federation in a TensorFlow application.

To test that authentication works correctly I am simply testing the existence of a blob in my buckets:

from google.cloud import storage

print(storage.Client().get_bucket("my-bucket").blob("my_blob").exists())

This works correctly both with a Service Account Key and with Workload Identity Federation.

However, using TensorFlow gfile only works with the old Service Account Key method and fails with Workload Identity Federation:

import tensorflow as tf

print(tf.io.gfile.exists("gs://my-bucket/my_blob"))

Expected behavior

I'd expect both authentication methods to work equally well, given that TensorFlow just uses the credentials from $GOOGLE_APPLICATION_CREDENTIALS.

Observed behavior

TensorFlow GFile only seems to work with service account keys and not with Workload Identity Federation.

Action YAML

name: Test

on:
  push:
    branches:
      - main
  pull_request: {}

jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write

    steps:
      - uses: actions/checkout@<version>
      - uses: actions/setup-python@<version>
        with:
          python-version: 3.9
      - uses: google-github-actions/auth@<version>
        with:
          workload_identity_provider: 'projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider'
          service_account: '<service-account-email>'
      - run: python test_gcs.py

Log output

W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "FAILED_PRECONDITION: Unexpected content of the JSON credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
Traceback (most recent call last):
  File "/home/runner/work/test_gcs.py", line 16, in <module>
    print(tf.io.gfile.exists("gs://my-bucket/my_blob"))
  File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 288, in file_exists_v2
    _pywrap_file_io.FileExists(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 401 with body '{
  "error": {
    "code": 401,
    "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
    "errors": [
      {
        "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
        "domain": "global",
        "reason": "required",
        "locationType": "header",
        "location": "Authorization"
      }
    ]
  }
}
'
	 when reading metadata of gs://my-bucket/my_blob

Additional information

No response

sethvargo wrote this answer on 2022-08-11

Hi @lgeiger

Please open an issue in the tensorflow repository. gfile will need to add support for Workload Identity Federation. Unfortunately there is nothing we can do in this project.
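Until gfile gains that support, one possible stopgap for the existence check in the original report is to route it through google-cloud-storage, which the report shows working with both authentication methods. The sketch below is only an illustration: split_gs_uri and gcs_exists are hypothetical helper names, not part of TensorFlow or of this action, and the helper covers only object existence, not the rest of the gfile API. It uses Client.bucket() rather than get_bucket() so that no extra bucket metadata request is made.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/blob' into (bucket, blob_name).

    Hypothetical helper for illustration; not a TensorFlow or gfile API.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    return bucket, blob_name


def gcs_exists(uri):
    """Check whether a single object exists, using google-cloud-storage.

    Assumes the google-cloud-storage package is installed and that
    Application Default Credentials are available, for example via
    $GOOGLE_APPLICATION_CREDENTIALS.
    """
    # Imported lazily so the URI parsing above works without the package.
    from google.cloud import storage

    bucket, blob_name = split_gs_uri(uri)
    return storage.Client().bucket(bucket).blob(blob_name).exists()
```

With this in place, print(gcs_exists("gs://my-bucket/my_blob")) would stand in for the failing tf.io.gfile.exists call. Anything that TensorFlow itself reads through gs:// paths would still hit the same authentication problem.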

lgeiger wrote this answer on 2022-09-20

For reference, I opened tensorflow/tensorflow#57104 which hasn't seen any response yet.

vpipkt wrote this answer on 2022-11-11

@sethvargo Can you give some more detail on the analysis of how exactly this is a problem that is specific to gfile?

sethvargo wrote this answer on 2022-11-11

Hi @vpipkt TensorFlow GFile needs to be updated to support Workload Identity Federation supplied by Application Default Credentials. If it uses official Google Cloud client libraries under the hood, it probably needs to update to the latest version. More details:

In the past, there were only two ways to authenticate to GCP:

  1. Exported service account key JSON
  2. Machine identity (only for gcloud and workloads on GCP using the metadata server)

About 2 years ago, GCP created Workload Identity Federation, which adds a third authentication mechanism and file format. GFile does not appear to support that format.
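To see which of these file formats a given job is actually handing to TensorFlow, the top-level "type" field of the JSON file can be inspected. The sketch below is a minimal diagnostic, not part of gfile or of this action: exported service account keys carry "type": "service_account", while Workload Identity Federation credential configurations carry "type": "external_account". The function names credential_file_type and describe are made up for illustration.

```python
import json
import os


def credential_file_type(path):
    """Return the top-level "type" field of a Google credentials JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("type")


def describe(cred_type):
    """Map the "type" field to the authentication mechanism it indicates."""
    if cred_type == "service_account":
        return "exported service account key"
    if cred_type == "external_account":
        return "Workload Identity Federation credential configuration"
    return f"other or unknown credential type: {cred_type!r}"


if __name__ == "__main__":
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path is None:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set")
    else:
        print(describe(credential_file_type(path)))
```

Run as a step after the auth action, this should report the Workload Identity Federation format, which would be consistent with the "Unexpected content of the JSON credentials file" warning in the log above.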

More Details About Repo
Owner Name google-github-actions
Repo Name auth
Full Name google-github-actions/auth
Language TypeScript
Created Date 2021-09-16
Updated Date 2023-03-24
Star Count 573
Watcher Count 16
Fork Count 116
Issue Count 3