I am trying to switch a TensorFlow application from authenticating with a long-lived service account key JSON file to Workload Identity Federation.
To verify that authentication works, I simply check whether a blob exists in my bucket:
from google.cloud import storage
print(storage.Client().get_bucket("my-bucket").blob("my_blob").exists())
This works with both a service account key and Workload Identity Federation.
However, TensorFlow's gfile only works with the old service account key method and fails with Workload Identity Federation:
import tensorflow as tf
print(tf.io.gfile.exists("gs://my-bucket/my_blob"))
I'd expect both authentication methods to work equally well, given that TensorFlow simply uses the credentials referenced by $GOOGLE_APPLICATION_CREDENTIALS.
TensorFlow GFile only seems to work with service account keys and not with Workload Identity Federation.
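One way to confirm which credential format the workflow hands to TensorFlow is to inspect the `type` field of the file that `$GOOGLE_APPLICATION_CREDENTIALS` points to. This is a minimal sketch that assumes the file is plain JSON; `credential_type` is a hypothetical helper, not part of any library. A service account key is expected to report `service_account`, while a Workload Identity Federation configuration generated by google-github-actions/auth is expected to report `external_account`.

```python
import json
import os
from typing import Optional


def credential_type(path: Optional[str] = None) -> str:
    """Return the "type" field of an ADC JSON credentials file.

    Hypothetical diagnostic helper. Defaults to the file named by
    $GOOGLE_APPLICATION_CREDENTIALS when no path is given.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    # Report a missing "type" explicitly instead of raising KeyError.
    return info.get("type", "<missing>")
```

Calling `print(credential_type())` in a step that runs after the auth action (for example, at the top of the test script) would show which format is in use.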
name: Test
on:
  push:
    branches:
      - main
  pull_request: {}
jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/[email protected]
      - uses: actions/[email protected]
        with:
          python-version: 3.9
      - uses: google-github-actions/[email protected]
        with:
          workload_identity_provider: 'projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider'
          service_account: '[email protected]'
      - run: python test_gcs.py
W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "FAILED_PRECONDITION: Unexpected content of the JSON credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
Traceback (most recent call last):
  File "/home/runner/work/test_gcs.py", line 16, in <module>
    print(tf.io.gfile.exists("gs://my-bucket/my_blob"))
  File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 288, in file_exists_v2
    _pywrap_file_io.FileExists(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 401 with body '{
  "error": {
    "code": 401,
    "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
    "errors": [
      {
        "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.",
        "domain": "global",
        "reason": "required",
        "locationType": "header",
        "location": "Authorization"
      }
    ]
  }
}
'
when reading metadata of gs://my-bucket/my_blob
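The first log line suggests why gfile sends no token. Judging only from that error message, TensorFlow's C++ auth provider appears to recognize a credentials file by the presence of a `refresh_token` field (user credentials) or a `private_key` field (service account key), and to reject anything else. The function below is a hypothetical Python approximation of that check, written to illustrate the failure mode; it is not TensorFlow's actual code, and the returned labels are made up for this sketch.

```python
from typing import Any, Dict


def tf_style_file_token_source(creds: Dict[str, Any]) -> str:
    """Approximate which token flow a TF-style JSON-file check would pick.

    Hypothetical helper inferred from the log message, not TensorFlow code.
    """
    if "refresh_token" in creds:
        # Older format 1: user credentials from `gcloud auth application-default login`.
        return "user_refresh_token"
    if "private_key" in creds:
        # Older format 2: a downloaded service account key.
        return "service_account_jwt"
    # Neither field is present: this corresponds to the logged
    # FAILED_PRECONDITION "Unexpected content of the JSON credentials file."
    return "unsupported"
```

An `external_account` file contains neither field, so the file-based lookup fails. TensorFlow then tries the GCE metadata server, whose `metadata` hostname does not resolve on a GitHub-hosted runner, so it falls back to an empty token and GCS rejects the request as coming from an "Anonymous caller".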
Hi @lgeiger
Please open an issue in the tensorflow repository. gfile will need to add support for Workload Identity Federation. Unfortunately there is nothing we can do in this project.
For reference, I opened tensorflow/tensorflow#57104 which hasn't seen any response yet.
@sethvargo Can you give more detail on how exactly this problem is specific to gfile?
Hi @vpipkt, TensorFlow GFile needs to be updated to support Workload Identity Federation credentials supplied through Application Default Credentials. If it uses the official Google Cloud client libraries under the hood, it probably needs to update to the latest version. More details:
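In the past, there were only two ways to authenticate to GCP: a JSON credentials file (such as a service account key), or the metadata server available on GCP compute environments such as GCE.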
About 2 years ago, GCP created Workload Identity Federation, which adds a third authentication mechanism and file format. GFile does not appear to support that format.
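To make the third file format concrete: a Workload Identity Federation credential configuration holds no private key. Instead, it tells a client library how to exchange an external OIDC token (here, GitHub's) at Google's Security Token Service for a Google access token, optionally impersonating a service account. This sketch builds such a configuration as a Python dict, assuming the documented `external_account` layout; `external_account_config` is a hypothetical helper, and the service account email and token path used with it are placeholders. The file written by google-github-actions/auth may use a URL-based `credential_source` instead of the file-based one shown here.

```python
from typing import Any, Dict


def external_account_config(provider: str, sa_email: str, token_file: str) -> Dict[str, Any]:
    """Build a file-sourced Workload Identity Federation config (hypothetical helper)."""
    return {
        "type": "external_account",
        # Full resource name of the workload identity pool provider.
        "audience": f"//iam.googleapis.com/{provider}",
        # The external token is an OIDC JWT (for example, GitHub's ID token).
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        # Security Token Service endpoint that performs the token exchange.
        "token_url": "https://sts.googleapis.com/v1/token",
        # Optional: impersonate a service account after the exchange.
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/"
            f"serviceAccounts/{sa_email}:generateAccessToken"
        ),
        # Where the client library reads the external token from.
        "credential_source": {"file": token_file},
    }


cfg = external_account_config(
    "projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider",
    "my-sa@my-project.iam.gserviceaccount.com",  # placeholder email
    "/tmp/github-oidc-token",  # placeholder path
)
```

Because neither `private_key` nor `refresh_token` appears, a loader that only understands the two older formats cannot use this file, while libraries that implement the external account flow (such as google-auth, which google-cloud-storage uses) can.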