How to access Google Cloud Storage Bucket from AI Platform job

bhoeksem

My Google AI Platform / ML Engine training job doesn't seem to have access to the training file I put into a Google Cloud Storage bucket.

Google's AI Platform / ML Engine requires that you store training data files in one of their Cloud Storage buckets. Accessing the file locally from the CLI works fine. However, when I submit a training job (after ensuring the data is in the appropriate location in my Cloud Storage bucket), I get an error that appears to be caused by a lack of access to the bucket's Link URL.

The error comes from trying to read what looks to me like the contents of a web page Google served up saying, in effect, "Hey, you don't have access to this." The logs contain gaia.loginAutoRedirect.start(5000, and a URL with this flag at the end: noautologin=true.

I know permissions between AI Platform and Cloud Storage are a thing, but both are under the same project. The walkthroughs I'm using at the very least imply that no further action is required when both services are under the same project.

I assumed I needed to use the Link URL provided in the bucket's Overview tab. I also tried the gsutil link, but the Python code (from Google's CloudML Samples repo) complained about the gs:// scheme.

I think Google's examples are insufficient here, since their example data is loaded from a public URL rather than a private Cloud Storage bucket.

Ultimately, the error I get is a Python error, but as I said, it is preceded by a pile of INFO logs full of HTML/CSS/JS from Google saying I don't have permission to fetch the file. Those logs only appear because I added a print statement to util.py, right before the read_csv() call on the train file. (So the Python parse error comes from trying to parse HTML as CSV.)
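A rough sketch of that debugging step, for context (an assumed reconstruction, not the exact code; training_file_path is the "Link URL" form, and _CSV_COLUMNS here uses placeholder names rather than the sample's real columns):

    import urllib2  # Python 2.7, matching the traceback below
    import pandas as pd

    training_file_path = 'https://storage.cloud.google.com/<BUCKET_NAME>/data/train.csv'
    _CSV_COLUMNS = ['col_a', 'col_b', 'col_c', 'col_d', 'col_e']  # placeholder column names

    # Debug print: dump the first bytes of whatever the URL returns.
    # It turns out to be the login-redirect HTML, not CSV rows.
    print(urllib2.urlopen(training_file_path).read(500))

    train_df = pd.read_csv(training_file_path, header=0, names=_CSV_COLUMNS,
                           na_values='?')  # raises ParserError on the HTML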

... 
INFO    g("gaia.loginAutoRedirect.stop",function(){var b=n;b.b=!0;b.a&&(clearInterval(b.a),b.a=null)});
INFO    gaia.loginAutoRedirect.start(5000,
INFO    'https:\x2F\x2Faccounts.google.com\x2FServiceLogin?continue=https%3A%2F%2Fstorage.cloud.google.com%2F<BUCKET_NAME>%2Fdata%2F%2Ftrain.csv\x26followup=https%3A%2F%2Fstorage.cloud.google.com%2F<BUCKET_NAME>%2Fdata%2F%2Ftrain.csv\x26service=cds\x26passive=1209600\x26noautologin=true',
ERROR   Command '['python', '-m', u'trainer.task', u'--train-files', u'gs://<BUCKET_NAME>/data/train.csv', u'--eval-files', u'gs://<BUCKET_NAME>/data/test.csv', u'--batch-pct', u'0.2', u'--num-epochs', u'1000', u'--verbosity', u'DEBUG', '--job-dir', u'gs://<BUCKET_NAME>/predictor']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 137, in <module>
    train_and_evaluate(args)
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 80, in train_and_evaluate
    train_x, train_y, eval_x, eval_y = util.load_data()
  File "/root/.local/lib/python2.7/site-packages/trainer/util.py", line 168, in load_data
    train_df = pd.read_csv(training_file_path, header=0, names=_CSV_COLUMNS, na_values='?')
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
ParserError: Error tokenizing data. C error: Expected 5 fields in line 205, saw 961

To get the data, I'm more or less trying to mimic this: https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/tf-keras/trainer/util.py

Various ways I have tried to address the bucket in my copy of util.py:

https://console.cloud.google.com/storage/browser/<BUCKET_NAME>/data (I think this was the "Link URL" back in May)
https://storage.cloud.google.com/<BUCKET_NAME>/data (this is the "Link URL" now, in July)
gs://<BUCKET_NAME>/data (this is the URI, which gives a different error about gs not being a recognized URL type)
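For what it's worth, here is how I understand the three forms (the comments are my assumptions; only the gs:// URI is meant for programmatic access):

    # <BUCKET_NAME> is a placeholder throughout.
    console_url = 'https://console.cloud.google.com/storage/browser/<BUCKET_NAME>/data'  # Cloud Console UI only
    link_url = 'https://storage.cloud.google.com/<BUCKET_NAME>/data/train.csv'           # browser download, cookie-authenticated
    gcs_uri = 'gs://<BUCKET_NAME>/data/train.csv'                                        # for gsutil, tf.gfile, client libraries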

rpasricha

Transferring the answer from a comment above:

Looks like the URL approach requires cookie-based authentication if the object isn't public. Instead of using a URL, I would suggest using tf.gfile with a gs:// path, as is done in the Keras sample. If you need to download the file from GCS in a separate step, you can use the GCS client library.
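A minimal sketch of both suggestions (assuming TensorFlow 1.x for tf.gfile and the google-cloud-storage package; <BUCKET_NAME> and the paths are placeholders from the question):

    import pandas as pd
    import tensorflow as tf
    from google.cloud import storage

    # Option 1: open the gs:// path with tf.gfile and hand the file object
    # directly to pandas, instead of a URL.
    with tf.gfile.GFile('gs://<BUCKET_NAME>/data/train.csv') as f:
        train_df = pd.read_csv(f, header=0, na_values='?')

    # Option 2: download the object with the GCS client library first,
    # then read the local copy.
    client = storage.Client()
    bucket = client.bucket('<BUCKET_NAME>')
    bucket.blob('data/train.csv').download_to_filename('/tmp/train.csv')
    train_df = pd.read_csv('/tmp/train.csv', header=0, na_values='?')

Both approaches authenticate with the job's service account credentials rather than browser cookies, which is why they can reach non-public objects in the same project.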
