I ran into this error while migrating from TensorFlow 1.5 to TensorFlow 2.0. I want to stress that the model runs correctly under 1.5. The only thing that changed is moving from a generator (with a batch size of 8, BTW) to a tf.Dataset for feeding .fit().
I have gone through many Stack Overflow threads about GPU OOM issues, but most of them deal with really large tensors or large batch sizes, whereas the tensor failing here is a tiny [256,128].
My model looks like this:
def build_model(self):
    self.g_Model = Sequential()
    self.g_Model.add(Embedding(input_dim=self.g_Max_features, output_dim=256, name='X'))
    self.g_Model.add(LSTM(128))
    self.g_Model.add(Dropout(0.5))
    self.g_Model.add(Dense(1, activation='sigmoid'))
    self.g_Model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Summary:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
X (Embedding) (None, None, 256) 256000
_________________________________________________________________
lstm (LSTM) (None, 128) 197120
_________________________________________________________________
dropout (Dropout) (None, 128) 0
_________________________________________________________________
dense (Dense) (None, 1) 129
=================================================================
Total params: 453,249
Trainable params: 453,249
Non-trainable params: 0
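The parameter counts are consistent, by the way; a quick sanity check (which also tells us g_Max_features must be 1000 here):

vocab, emb, units = 1000, 256, 128                        # vocab inferred from 256000 / 256
embedding_params = vocab * emb                            # 256,000
lstm_params = 4 * (emb * units + units * units + units)   # 4 gates -> 197,120
dense_params = units * 1 + 1                              # weights + bias -> 129
assert embedding_params + lstm_params + dense_params == 453249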
My training function looks like this:
def train_model(self):
    if self.g_Model is None:
        self.build_model()
    dataset = self.prepare_the_data()
    self.g_Model.fit(dataset, epochs=2)
And the preparation of the data itself:
@staticmethod
def prepare_the_data():
    lstm_feature_description = {
        'X_input': tf.io.FixedLenFeature(CONFIG.g_keras_lstm_max_document_length, tf.float32),
        'y': tf.io.FixedLenFeature((), tf.int64),
    }

    def _parse_lstm_function(example_proto):
        # Parse the input tf.Example proto using the dictionary above.
        parsed = tf.io.parse_single_example(serialized=example_proto, features=lstm_feature_description)
        return parsed["X_input"], parsed["y"]

    # Start preparing the data.
    dataset = tf.data.TFRecordDataset(CONFIG.g_record_file_lstm)
    dataset = dataset.shuffle(buffer_size=5000)
    dataset = dataset.map(map_func=_parse_lstm_function)
    dataset = dataset.batch(batch_size=1)
    for next_element in dataset:
        tf.print(next_element)
    return dataset
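As a side note: when a tf.data.Dataset is passed to .fit(), batching is controlled entirely by the pipeline (fit()'s batch_size argument must be left unset). This is roughly how the pipeline would look with the generator's original batch size of 8 restored and the debug print loop removed; a sketch reusing the CONFIG names above, not what I actually ran:

dataset = tf.data.TFRecordDataset(CONFIG.g_record_file_lstm)
dataset = dataset.shuffle(buffer_size=5000)
dataset = dataset.map(map_func=_parse_lstm_function)
dataset = dataset.batch(batch_size=8)  # the batch size the old generator used
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)  # overlap input prep with training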
The dataset contains 40 elements. One of them looks like this:
([[0 0 0 ... 1 10 3]], [0])
X_input is a tensorflow.python.framework.ops.EagerTensor of size 24000, and y is the same type but of size 1 (the label).
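Note in passing that an X_input of size 24000 means each example is a 24,000-step sequence for the LSTM, and backprop keeps per-timestep activations around. A back-of-the-envelope estimate (my own reasoning, not from the logs):

timesteps, batch, emb, units = 24000, 1, 256, 128
bytes_per_float = 4
per_step_embedding = emb * batch * bytes_per_float    # 1 KiB per timestep
per_step_state = units * batch * bytes_per_float      # 512 B per timestep
print(timesteps * per_step_embedding / 2**20, "MiB")  # ~23.4 MiB of per-step embeddings alone

The allocator dump below shows 24,001 chunks totalling 23.44MiB in the 1 KiB bin and ~78k chunks in the 512 B bin, which lines up suspiciously well with the 24,000-step sequence length.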
So when I run .fit(), I get the following OOM error (part 1):
2019-11-02 18:42:52.426444: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 128.0KiB (rounded to 131072). Current allocation summary follows.
2019-11-02 18:42:52.428463: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 2753, Chunks in use: 2753. 688.3KiB allocated for chunks. 688.3KiB in use in bin. 10.8KiB client-requested in use in bin.
2019-11-02 18:42:52.428723: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 78217, Chunks in use: 78217. 38.19MiB allocated for chunks. 38.19MiB in use in bin. 38.19MiB client-requested in use in bin.
2019-11-02 18:42:52.428982: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 24001, Chunks in use: 24001. 23.44MiB allocated for chunks. 23.44MiB in use in bin. 23.44MiB client-requested in use in bin.
2019-11-02 18:42:52.429247: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): Total Chunks: 3, Chunks in use: 3. 6.0KiB allocated for chunks. 6.0KiB in use in bin. 6.0KiB client-requested in use in bin.
2019-11-02 18:42:52.429481: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.429704: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.429920: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.430138: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.430359: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): Total Chunks: 10892, Chunks in use: 10892. 680.75MiB allocated for chunks. 680.75MiB in use in bin. 680.75MiB client-requested in use in bin.
2019-11-02 18:42:52.430613: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): Total Chunks: 10894, Chunks in use: 10894. 1.33GiB allocated for chunks. 1.33GiB in use in bin. 1.33GiB client-requested in use in bin.
2019-11-02 18:42:52.430855: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): Total Chunks: 3, Chunks in use: 3. 1022.8KiB allocated for chunks. 1022.8KiB in use in bin. 768.0KiB client-requested in use in bin.
2019-11-02 18:42:52.431091: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): Total Chunks: 3, Chunks in use: 3. 2.00MiB allocated for chunks. 2.00MiB in use in bin. 1.50MiB client-requested in use in bin.
2019-11-02 18:42:52.431323: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431539: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431755: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431970: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.432193: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.432419: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.442986: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443324: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443543: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443767: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 128.0KiB was 128.0KiB, Chunk State:
2019-11-02 18:42:52.443895: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 1048576
2019-11-02 18:42:52.444010: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600000 next 1 of size 1280
2019-11-02 18:42:52.444139: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600500 next 9 of size 256
2019-11-02 18:42:52.444267: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600600 next 13 of size 256
...
Part 2:
2019-11-02 18:44:43.211483: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 525056 totalling 512.8KiB
2019-11-02 18:44:43.211607: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1047808 totalling 1023.3KiB
2019-11-02 18:44:43.211731: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 2.06GiB
2019-11-02 18:44:43.211851: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 2210712576 memory_limit_: 2210712780 available bytes: 204 curr_region_allocation_bytes_: 4294967296
2019-11-02 18:44:43.212060: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 2210712780
InUse: 2210712576
MaxInUse: 2210712576
NumAllocs: 137751
MaxAllocSize: 33554432
2019-11-02 18:44:43.216115: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ****************************************************************************************************
2019-11-02 18:44:43.216331: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at split_op.cc:311 : Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2019-11-02 18:44:43.216642: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node sequential/lstm/while/body/_1/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Reshape_12/_28]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
2019-11-02 18:44:43.223629: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node sequential/lstm/while/body/_1/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
I have already tried the common suggestions from similar threads, without luck. I really do not understand what is going on here: my model is very small and the batch size is only 1. I am using a GTX 1060 3GB, so any help would be greatly appreciated. Thanks!
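For reference, the most common of those suggestions is enabling GPU memory growth; a sketch of the TF 2.0 API, included for completeness (it does not address the root cause here):

import tensorflow as tf

# Standard TF 2.0 way to stop the process from grabbing nearly all GPU memory
# upfront; allocation then grows on demand instead.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)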
You will not believe how silly my mistake was. I only figured it out by luck, after another Q&A posted by @OverLordGoldDragon.
In my imports I had used the following statements:
from tensorflow_core.python.keras.layers import Dense, Dropout, LSTM, Embedding
from tensorflow_core.python.keras.models import Sequential, load_model
from tensorflow_core.python.keras.preprocessing import sequence
Instead, I should have used:
from tensorflow.keras.layers import Dense, Dropout, LSTM, Embedding
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.preprocessing import sequence
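With the corrected imports, a minimal self-contained version of the setup trains without OOM. A sketch with toy data standing in for my TFRecord pipeline (the vocab size of 1000 matches the parameter count above; the sequence length of 100 is arbitrary):

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, LSTM, Embedding
from tensorflow.keras.models import Sequential

model = Sequential([
    Embedding(input_dim=1000, output_dim=256, name='X'),
    LSTM(128),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Toy data in place of the TFRecord pipeline: 40 sequences of length 100.
X = tf.random.uniform((40, 100), maxval=1000, dtype=tf.int32)
y = tf.random.uniform((40,), maxval=2, dtype=tf.int64)
dataset = tf.data.Dataset.from_tensor_slices((X, y)).batch(8)

model.fit(dataset, epochs=2)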
BTW, the latest PyCharm Professional does not offer autocompletion for tf.keras statements, which is what threw me off in the first place. Surprisingly, autocompletion for tf.python.keras works correctly.
More information can be found here: tf.python.keras related issues