Sample code using Google Cloud ML service and Cloud Shell gets errors when retraining

Chuck Finley

I get an error when running the sample code in Cloud Shell. The sample code was published by @SlavenBilac of Google and uses Google Cloud Machine Learning and Cloud Dataflow to train on and classify images.

The code gets stuck at global_step/sec: 0

INFO    2017-02-16 06:28:36 -0600       master-replica-0                Start master session 538be2b71d17c4dc with config: 
ERROR   2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:ps"
ERROR   2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:master/task:0"
INFO    2017-02-16 06:28:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:30:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:32:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:34:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:36:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:38:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:40:39 -0600       master-replica-0                global_step/sec: 0
<keeps repeating until I kill the job>

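Following the answer from @JoshGC of Google to a similar question, yesterday I created a brand-new Google Cloud account (with a new billing account, a new project, and so on), ran the Cloud Shell setup script and the other steps to set up the environment, and then ran the sample code against the sample flowers data. The error still occurs (shown below), so I believe the cause is related neither to the data nor to my account configuration.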

How can I modify the files in GoogleCloudPlatform/cloudml-samples/flowers to avoid this error?

Excerpts:

Running the sample code

cfinley3@wordthree-wordfour-7654321:~/google-cloud-ml/samples/flowers$ ./sample.sh

Your active configuration is: [cloudshell-18758]
Using job id:  flowers_cfinley3_20170216_045347

Preprocessing appears to be OK

python trainer/preprocess.py \
  --input_dict "$DICT_FILE" \
  --input_path "gs://cloud-ml-data/img/flower_photos/train_set.csv" \
  --output_path "${GCS_PATH}/preprocess/train" \
  --cloud

Training starts

gcloud beta ml jobs submit training "$JOB_ID" \
  --module-name trainer.task \
  --package-path trainer \
  --staging-bucket "$BUCKET" \
  --region us-central1 \
  -- \
  --output_path "${GCS_PATH}/training" \
  --eval_data_paths "${GCS_PATH}/preproc/eval*" \
  --train_data_paths "${GCS_PATH}/preproc/train*"
Job [flowers_cfinley3_20170216_045347] submitted successfully.

Training gets stuck at global_step/sec: 0

INFO    2017-02-16 06:24:48 -0600       unknown_task            Validating job requirements...
INFO    2017-02-16 06:24:48 -0600       unknown_task            Job creation request has been successfully validated.
INFO    2017-02-16 06:24:48 -0600       unknown_task            Job flowers_cfinley3_20170216_045347 is queued.
INFO    2017-02-16 06:24:55 -0600       unknown_task            Waiting for job to be provisioned.
INFO    2017-02-16 06:24:55 -0600       unknown_task            Waiting for TensorFlow to start.
INFO    2017-02-16 06:28:27 -0600       master-replica-0                Running task with arguments: --cluster={"master": ["master-9a431abe8e-0:2222"]} --task={"type": "master", "index": 0} --job={
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "package_uris": ["gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz"],
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "python_module": "trainer.task",
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "args": ["--output_path", "gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/training", "--eval_data_paths", "gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/eval*", "--train_data_paths", "gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/train*"],
INFO    2017-02-16 06:28:27 -0600       master-replica-0                  "region": "us-central1"
INFO    2017-02-16 06:28:27 -0600       master-replica-0                } --beta
INFO    2017-02-16 06:28:28 -0600       master-replica-0                Running module trainer.task.
INFO    2017-02-16 06:28:28 -0600       master-replica-0                Running command: gsutil -q cp gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz trainer-0.1.tar.gz
INFO    2017-02-16 06:28:29 -0600       master-replica-0                Installing the package: gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz
INFO    2017-02-16 06:28:29 -0600       master-replica-0                Running command: pip install --user --upgrade --force-reinstall trainer-0.1.tar.gz
INFO    2017-02-16 06:28:29 -0600       master-replica-0                Processing ./trainer-0.1.tar.gz
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Building wheels for collected packages: trainer
INFO    2017-02-16 06:28:30 -0600       master-replica-0                  Running setup.py bdist_wheel for trainer: started
INFO    2017-02-16 06:28:30 -0600       master-replica-0                creating '/tmp/tmpn9HeiIpip-wheel-/trainer-0.1-cp27-none-any.whl' and adding '.' to it
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/model.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/__init__.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/util.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer/preprocess.py'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/DESCRIPTION.rst'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/metadata.json'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/top_level.txt'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/METADATA'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                adding 'trainer-0.1.dist-info/RECORD'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                  Running setup.py bdist_wheel for trainer: finished with status 'done'
INFO    2017-02-16 06:28:30 -0600       master-replica-0                  Stored in directory: /root/.cache/pip/wheels/e8/0c/c7/b77d64796dbbac82503870c4881d606fa27e63942e07c75f0e
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Successfully built trainer
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Installing collected packages: trainer
INFO    2017-02-16 06:28:30 -0600       master-replica-0                Successfully installed trainer-0.1
INFO    2017-02-16 06:28:31 -0600       master-replica-0                Running command: python -m trainer.task --output_path gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/training --eval_data_paths gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/eval* --train_data_paths gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/train*
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Original job data: {u'package_uris': [u'gs://wordthree-wordfour-7654321-ml/flowers_cfinley3_20170216_045347/edafa5c7debed9fc8612af3c0dd33d145e23502e/trainer-0.1.tar.gz'], u'args': [u'--output_path', u'gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/training', u'--eval_data_paths', u'gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/eval*', u'--train_data_paths', u'gs://wordthree-wordfour-7654321-ml/cfinley3/flowers_cfinley3_20170216_045347/preproc/train*'], u'python_module': u'trainer.task', u'region': u'us-central1'}
INFO    2017-02-16 06:28:34 -0600       master-replica-0                setting eval batch size to 100
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Starting master/0
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Initialize GrpcChannelCache for job master -> {0 -> localhost:2222}
INFO    2017-02-16 06:28:34 -0600       master-replica-0                Started server with target: grpc://localhost:2222
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                From /root/.local/lib/python2.7/site-packages/trainer/task.py:211 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Instructions for updating:
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Please switch to tf.summary.merge_all.
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/logging_ops.py:270 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Instructions for updating:
WARNING 2017-02-16 06:28:35 -0600       master-replica-0                Please switch to tf.summary.merge.
INFO    2017-02-16 06:28:36 -0600       master-replica-0                Start master session 538be2b71d17c4dc with config: 
ERROR.  2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:ps"
ERROR.  2017-02-16 06:28:36 -0600       master-replica-0                device_filters: "/job:master/task:0"
INFO    2017-02-16 06:28:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:30:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:32:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:34:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:36:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:38:39 -0600       master-replica-0                global_step/sec: 0
INFO    2017-02-16 06:40:39 -0600       master-replica-0                global_step/sec: 0

Jeremy Levi

See this similar question. Check your input data files to make sure they are not empty. Empty data files can cause this behavior, because TF will wait forever for data.
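
One way to act on this is to list what the training glob actually matches before submitting the job. In the excerpt above, preprocessing writes to `${GCS_PATH}/preprocess/train` while training reads `${GCS_PATH}/preproc/train*`. If that is what really ran, rather than a slip made when pasting the excerpt, the training glob would match nothing, which fits the symptom. The sketch below is only an illustration: the helper names (`parse_gsutil_listing`, `describe_inputs`, `check_pattern`) are hypothetical, and it assumes `gsutil` is on the PATH and that `gsutil ls -l` prints one `size  timestamp  url` line per object.

```python
import subprocess


def parse_gsutil_listing(text):
    """Parse `gsutil ls -l` output into a list of (size_in_bytes, url) pairs."""
    entries = []
    for line in text.splitlines():
        # Object lines are assumed to look like:
        #   "<size>  <timestamp>  gs://bucket/object"
        # The trailing "TOTAL: ..." summary line fails the checks and is skipped.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[2].startswith("gs://"):
            entries.append((int(parts[0]), parts[2].strip()))
    return entries


def describe_inputs(entries):
    """Return a one-line diagnosis for the files matched by an input pattern."""
    if not entries:
        return "no files matched"
    empty = [url for size, url in entries if size == 0]
    if empty:
        return "empty files: " + ", ".join(empty)
    return "ok: %d non-empty file(s)" % len(entries)


def check_pattern(pattern):
    """List a GCS glob with gsutil and diagnose it (hypothetical helper)."""
    proc = subprocess.Popen(["gsutil", "ls", "-l", pattern],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    # If gsutil matches nothing it writes only to stderr, so stdout is empty
    # and the diagnosis is "no files matched".
    return describe_inputs(parse_gsutil_listing(out.decode("utf-8")))
```

Calling `check_pattern` with the same values passed as `--train_data_paths` and `--eval_data_paths` would show whether either pattern matches nothing or matches only zero-byte files.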
