tf.keras OOM even on a small LSTM model with a batch size of 1


none32

I've encountered this error while migrating from TensorFlow 1.5 to TensorFlow 2.0. To be specific, this model runs correctly on 1.5. The only thing that changed is the switch from a generator (BTW, the batch size was 8) to a tf.data.Dataset for feeding .fit().

I've looked into a lot of threads on Stack Overflow regarding OOM issues on GPU; however, most of them were about really huge tensors or big batch sizes, while mine is a small [256,128] tensor with a batch size of 1.

Here is my model:

def build_model(self):
    self.g_Model = Sequential()
    self.g_Model.add(Embedding(input_dim=self.g_Max_features, output_dim=256, name='X'))
    self.g_Model.add(LSTM(128))
    self.g_Model.add(Dropout(0.5))
    self.g_Model.add(Dense(1, activation='sigmoid'))
    self.g_Model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Summary:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
X (Embedding)                (None, None, 256)         256000    
_________________________________________________________________
lstm (LSTM)                  (None, 128)               197120    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 129       
=================================================================
Total params: 453,249
Trainable params: 453,249
Non-trainable params: 0

Here is my train function:

def train_model(self):
    if self.g_Model is None:
        self.build_model()

    dataset = self.prepare_the_data()
    self.g_Model.fit(dataset, epochs=2)

And the preparation of the data itself:

@staticmethod
def prepare_the_data():
    lstm_feature_description = {
        'X_input': tf.io.FixedLenFeature(CONFIG.g_keras_lstm_max_document_length, tf.float32),
        'y': tf.io.FixedLenFeature((), tf.int64),
    }

    def _parse_lstm_function(example_proto):
        # Parse the input tf.Example proto using the dictionary above.
        parsed = tf.io.parse_single_example(serialized=example_proto, features=lstm_feature_description)
        return parsed["X_input"], parsed["y"]

    # Start Preparing The Data
    dataset = tf.data.TFRecordDataset(CONFIG.g_record_file_lstm)
    dataset = dataset.shuffle(buffer_size=5000)
    dataset = dataset.map(map_func=_parse_lstm_function)
    dataset = dataset.batch(batch_size=1)

    for next_element in dataset:
        tf.print(next_element)

    return dataset

The dataset contains 40 elements. Here is what one of them looks like:

([[0 0 0 ... 1 10 3]], [0])

X_input is a tensorflow.python.framework.ops.EagerTensor of size 24000, and y is of the same type but of size 1 (just a label).

So, when running .fit() I receive the following OOM error (part 1):

2019-11-02 18:42:52.426444: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 128.0KiB (rounded to 131072).  Current allocation summary follows.
2019-11-02 18:42:52.428463: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256):   Total Chunks: 2753, Chunks in use: 2753. 688.3KiB allocated for chunks. 688.3KiB in use in bin. 10.8KiB client-requested in use in bin.
2019-11-02 18:42:52.428723: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512):   Total Chunks: 78217, Chunks in use: 78217. 38.19MiB allocated for chunks. 38.19MiB in use in bin. 38.19MiB client-requested in use in bin.
2019-11-02 18:42:52.428982: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024):  Total Chunks: 24001, Chunks in use: 24001. 23.44MiB allocated for chunks. 23.44MiB in use in bin. 23.44MiB client-requested in use in bin.
2019-11-02 18:42:52.429247: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048):  Total Chunks: 3, Chunks in use: 3. 6.0KiB allocated for chunks. 6.0KiB in use in bin. 6.0KiB client-requested in use in bin.
2019-11-02 18:42:52.429481: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.429704: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.429920: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.430138: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.430359: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536):     Total Chunks: 10892, Chunks in use: 10892. 680.75MiB allocated for chunks. 680.75MiB in use in bin. 680.75MiB client-requested in use in bin.
2019-11-02 18:42:52.430613: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072):    Total Chunks: 10894, Chunks in use: 10894. 1.33GiB allocated for chunks. 1.33GiB in use in bin. 1.33GiB client-requested in use in bin.
2019-11-02 18:42:52.430855: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144):    Total Chunks: 3, Chunks in use: 3. 1022.8KiB allocated for chunks. 1022.8KiB in use in bin. 768.0KiB client-requested in use in bin.
2019-11-02 18:42:52.431091: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288):    Total Chunks: 3, Chunks in use: 3. 2.00MiB allocated for chunks. 2.00MiB in use in bin. 1.50MiB client-requested in use in bin.
2019-11-02 18:42:52.431323: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431539: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431755: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.431970: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.432193: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.432419: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.442986: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443324: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443543: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-11-02 18:42:52.443767: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 128.0KiB was 128.0KiB, Chunk State: 
2019-11-02 18:42:52.443895: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 1048576
2019-11-02 18:42:52.444010: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600000 next 1 of size 1280
2019-11-02 18:42:52.444139: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600500 next 9 of size 256
2019-11-02 18:42:52.444267: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0000000703600600 next 13 of size 256
...

Part 2:

2019-11-02 18:44:43.211483: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 525056 totalling 512.8KiB
2019-11-02 18:44:43.211607: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 1047808 totalling 1023.3KiB
2019-11-02 18:44:43.211731: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 2.06GiB
2019-11-02 18:44:43.211851: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 2210712576 memory_limit_: 2210712780 available bytes: 204 curr_region_allocation_bytes_: 4294967296
2019-11-02 18:44:43.212060: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                  2210712780
InUse:                  2210712576
MaxInUse:               2210712576
NumAllocs:                  137751
MaxAllocSize:             33554432

2019-11-02 18:44:43.216115: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ****************************************************************************************************
2019-11-02 18:44:43.216331: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at split_op.cc:311 : Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2019-11-02 18:44:43.216642: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node sequential/lstm/while/body/_1/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Reshape_12/_28]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

2019-11-02 18:44:43.223629: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[256,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node sequential/lstm/while/body/_1/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
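As a quick sanity check on the numbers in the log above, the arithmetic can be reproduced in a few lines. This is only a sketch: every constant is copied from the log, and float32 is assumed to take 4 bytes per element. It suggests that the failing request is tiny in itself and that the pool was already filled by thousands of 64 KiB and 128 KiB chunks.

```python
# All constants below are copied from the allocator log; nothing is measured here.
BYTES_PER_FLOAT32 = 4

tensor_bytes = 256 * 128 * BYTES_PER_FLOAT32  # the [256, 128] float tensor that failed
limit_bytes = 2210712780                      # memory_limit_ from the log
in_use_bytes = 2210712576                     # InUse from the log

bin_64k_mib = 10892 * 65536 / 2**20           # chunks in use in Bin (65536)
bin_128k_gib = 10894 * 131072 / 2**30         # chunks in use in Bin (131072)

print(tensor_bytes)                 # 131072 bytes, i.e. the 128.0KiB request
print(limit_bytes - in_use_bytes)   # 204 bytes still available
print(bin_64k_mib)                  # 680.75 (MiB)
print(round(bin_128k_gib, 2))       # 1.33 (GiB)
```

The two large bins together account for roughly 2 GiB of the 2.06 GiB in use, so the allocator ran out while holding many small tensors rather than one big one.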

What I've tried already with no luck:

Enabled GPU memory growth (set_memory_growth set to True)
Moved all the code out of the train function, except building the model and .fit() itself
Lowered the batch size to 1.

I really don't understand what is going on, as my model is pretty small and the batch size is just 1. I'm using a GTX 1060 3GB, so any help is VERY appreciated. Thanks!
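For reference, here is a minimal sketch of how memory growth is usually enabled in TensorFlow 2.0, assuming it runs before any GPU operation has been executed. The helper name enable_memory_growth is hypothetical; the post does not show the exact code that was used.

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand; returns the GPUs found."""
    # Imported inside the function so that defining this helper does not load TensorFlow.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        # Has to happen before the GPU is initialized; otherwise TensorFlow raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus
```

Calling enable_memory_growth() once at program start, before building the model, would be the intended usage. Memory growth only changes when memory is reserved, not how much the model needs, so it cannot help if the model genuinely exceeds the card's capacity.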

none32

You wouldn't believe how stupid my mistake was. I was able to recognize it only by luck, after reading various Q&As posted by @OverLordGoldDragon.

I had used the following import statements:

from tensorflow_core.python.keras.layers import Dense, Dropout, LSTM, Embedding
from tensorflow_core.python.keras.models import Sequential, load_model
from tensorflow_core.python.keras.preprocessing import sequence

Instead, I should've used these:

from tensorflow.keras.layers import Dense, Dropout, LSTM, Embedding
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.preprocessing import sequence
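A mistake like this is easy to miss in a larger code base, so a small check can help. The following is only a sketch using the standard-library ast module; the helper find_private_tf_imports and the prefix list are hypothetical and are not part of TensorFlow.

```python
import ast

# Hypothetical list of package prefixes treated as private TensorFlow internals.
PRIVATE_PREFIXES = ("tensorflow_core", "tensorflow.python")


def find_private_tf_imports(source):
    """Return sorted (line_number, module_name) pairs for private TensorFlow imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # module is None for "from . import x"
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if any(module == p or module.startswith(p + ".") for p in PRIVATE_PREFIXES):
                hits.append((node.lineno, module))
    return sorted(hits)


example = (
    "from tensorflow_core.python.keras.layers import Dense, LSTM\n"
    "from tensorflow.keras.models import Sequential\n"
)
print(find_private_tf_imports(example))
# [(1, 'tensorflow_core.python.keras.layers')]
```

Only the first line of the example is reported, because the second one goes through the public tensorflow.keras path.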

BTW, the latest PyCharm Professional does not provide auto-completion for tf.keras statements, which is what misled me in the first place. Surprisingly, auto-completion for tf.python.keras works correctly.

More info can be found here: Issues with tf.python.keras


Edited on 2021-04-1
