Tensorflow Distribution을 사용하는 Keras 모델은 배치 크기> 1로 실패합니다.

debugcn 에 게시 Dev

과

Keras에서 사용자 지정 손실 함수를 정의하기 위해 tensorflow_probability의 분포를 사용하려고합니다. 좀 더 구체적으로 저는 Mixture Density Network를 구축하려고합니다.
BATCH_SIZE = 1이 (가에 대한 올바른 혼합 분포를 예측하기 위해 배운다 때 내 모델은 장난감 세트에서 작동 y사용 x). 그러나 batch_size> 1이면 "실패"합니다 ( y를 무시하고 모두 에 대해 동일한 분포를 예측 함 x). 이것은 내 문제가 batch_shape 대 sample_shape와 관련이 있다고 생각하게 만듭니다.

재현하려면 :

import random

import keras
from keras import backend as K
from keras.layers import Dense, Activation, LSTM, Input, Concatenate, Reshape, concatenate, Flatten, Lambda
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras.models import Sequential, Model
import tensorflow
import tensorflow_probability as tfp
tfd = tfp.distributions

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# generate toy dataset
random.seed(12902)
n_obs = 20000
x = np.random.uniform(size=(n_obs, 4))
df = pd.DataFrame(x, columns = ['x_{0}'.format(i) for i in np.arange(4)])

# 2 latent classes, with noisy assignment based on x_0, x_1, (x_2 and x_3 are noise)
df['latent_class'] = 0
df.loc[df.x_0 + df.x_1 + np.random.normal(scale=.05, size=n_obs) > 1, 'latent_class'] = 1
df.latent_class.value_counts()

# Latent class will determines which mixture distribution we draw from
d0 = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(
      loc=[-1., 1], scale=[0.1, 0.5])) 
d0_samples = d0.sample(sample_shape=(df.latent_class == 0).sum()).numpy()

d1 = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.5, 0.5]),
    components_distribution=tfd.Normal(
      loc=[-2., 2], scale=[0.2, 0.6]))

d1_samples = d1.sample(sample_shape=(df.latent_class == 1).sum()).numpy()

df.loc[df.latent_class == 0, 'y'] = d0_samples
df.loc[df.latent_class == 1, 'y'] = d1_samples

fig, ax = plt.subplots()
bins = np.linspace(-4, 5, 9*4 + 1)
df.y[df.latent_class == 0].hist(ax=ax, bins=bins, label='Class 0', alpha=.4, density=True)
df.y[df.latent_class == 1].hist(ax=ax, bins=bins, label='Class 1', alpha=.4, density=True)
ax.legend();

# mixture density network
N_COMPONENTS = 2  # number of components in the mixture
input_feature_space = 4

flat_input = Input(shape=(input_feature_space,), 
                   batch_shape=(None, input_feature_space), 
                   name='inputs')
x = Dense(6, activation='relu', 
             kernel_initializer='glorot_uniform',
             bias_initializer='ones')(flat_input)
x = Dense(6, activation='relu',
             kernel_initializer='glorot_uniform',
             bias_initializer='ones')(x)
# 3 params per component: weight, loc, scale
output = Dense(N_COMPONENTS*3,
             kernel_initializer='glorot_uniform',
             bias_initializer='ones')(x)

model = Model(inputs=[flat_input],
              outputs=[output])

문제가 다음 세 가지 기능에 있다고 생각합니다.


def get_mixture_coef(output, num_components):
    """
    Extract mixture params from output
    """
    out_pi = output[:, :num_components]
    out_sigma = output[:, num_components:2*num_components]
    out_mu = output[:, 2*num_components:]
    # use softmax to normalize pi into prob distribution
    max_pi = K.max(out_pi, axis=1, keepdims=True)
    out_pi = out_pi - max_pi
    out_pi = K.exp(out_pi)
    normalize_pi = 1 / K.sum(out_pi, axis=1, keepdims=True)
    out_pi = normalize_pi * out_pi
    # use exp to ensure sigma is pos
    out_sigma = K.exp(out_sigma)
    return out_pi, out_sigma, out_mu

def get_lossfunc(out_pi, out_sigma, out_mu, y):
    d0 = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(
            probs=out_pi),
        components_distribution=tfd.Normal(
          loc=out_mu, scale=out_sigma,
        ),
    )
    # I suspect the problem is here
    return -1 * d0.log_prob(y)

def mdn_loss(num_components):
    def loss(y_true, y_pred):
        out_pi, out_sigma, out_mu = get_mixture_coef(y_pred, num_components)
        return get_lossfunc(out_pi, out_sigma, out_mu, y_true)
    return loss

opt = Adam(lr=.001)
model.compile(
    optimizer=opt,
    loss = mdn_loss(N_COMPONENTS),
)

es = EarlyStopping(monitor='val_loss',
                  min_delta=1e-5,
                  patience=5,
                  verbose=1, mode='auto')

validation = .15
validate_idx = np.random.choice(df.index.values, 
                                size=int(validation * df.shape[0]), 
                                replace=False)
train_idx = [i for i in df.index.values if i not in validate_idx]

x_cols = ['x_0', 'x_1', 'x_2', 'x_3']

model.fit(x=df.loc[train_idx, x_cols].values,
          y=df.loc[train_idx, 'y'].values[:, np.newaxis],
          validation_data=(
            df.loc[validate_idx, x_cols].values,
            df.loc[validate_idx, 'y'].values[:, np.newaxis]), 
          # model works when batch_size = 1
          # model fails when batch_size > 1
          epochs=2, batch_size=1, verbose=1, callbacks=[es])

def sample(output, n_samples, num_components):
    """Sample from a mixture distribution parameterized by
    model output."""
    pi, sigma, mu = get_mixture_coef(output, num_components)    
    d0 = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(
        probs=pi),
    components_distribution=tfd.Normal(
      loc=mu,
      scale=sigma)) 
    return d0.sample(sample_shape=n_samples).numpy()

yhat = model.predict(df.loc[train_idx, x_cols].values)

out_pi, out_sigma, out_mu = get_mixture_coef(yhat, 2)

latent_1_samples = sample(yhat[:1], n_samples=1000, num_components=2)
latent_1_samples = pd.DataFrame({'latent_1_samples': latent_1_samples.ravel()})

fig, ax = plt.subplots()
bins = np.linspace(-4, 5, 9*4 + 1)
latent_1_samples.latent_1_samples.hist(ax=ax, bins=bins, label='Class 1: yHat', alpha=.4, density=True)
df.y[df.latent_class == 0].hist(ax=ax, bins=bins, label='Class 0: True', density=True, histtype='step')
df.y[df.latent_class == 1].hist(ax=ax, bins=bins, label='Class 1: True', density=True, histtype='step')
ax.legend();

미리 감사드립니다!

최신 정보

이 답변에 따라 문제를 해결하는 두 가지 방법을 찾았습니다 . 두 솔루션 모두 Keras가 y_pred와 일치하도록 y를 어색하게 방송한다는 사실을 지적합니다.

def get_lossfunc(out_pi, out_sigma, out_mu, y):
    d0 = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(
            probs=out_pi),
        components_distribution=tfd.Normal(
          loc=out_mu, scale=out_sigma,
        ),
    )
    # this also works: 
    # return -1 * d0.log_prob(tensorflow.transpose(y))
    return -1 * d0.log_prob(y[:, 0])

Tensorflow 전사

커뮤니티의 이익을 위해 Dan이 질문에 지정 했더라도 여기 (답변 섹션)에서 해결 방법을 지정합니다.

모두 y에 대해 동일한 분포를 예측 하고 무시 x하는 문제는 두 가지 방법으로 해결할 수 있습니다.

에 대한 코드 Solution 1는 다음과 같습니다.

def get_lossfunc(out_pi, out_sigma, out_mu, y):
    d0 = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(
            probs=out_pi),
        components_distribution=tfd.Normal(
          loc=out_mu, scale=out_sigma,
        ),
    )
    return -1 * d0.log_prob(tensorflow.transpose(y))

에 대한 코드 Solution 2는 다음과 같습니다.

def get_lossfunc(out_pi, out_sigma, out_mu, y):
    d0 = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(
            probs=out_pi),
        components_distribution=tfd.Normal(
          loc=out_mu, scale=out_sigma,
        ),
    )
    return -1 * d0.log_prob(y[:, 0])

도움이 되었기를 바랍니다. 행복한 학습!

이 기사는 인터넷에서 수집됩니다. 재 인쇄 할 때 출처를 알려주십시오.

침해가 발생한 경우 연락 주시기 바랍니다[email protected] 삭제

에서 수정2021-03-31

몇 마디 만하겠습니다

0리뷰

로그인참여 후 검토

Related 관련 기사

기사

Tensorflow Distribution을 사용하는 Keras 모델은 배치 크기> 1로 실패합니다.

Tensorflow Distribution을 사용하는 Keras 모델은 배치 크기> 1로 실패합니다.

Tensorflow 1.xx에서 .meta 체크 포인트의 모델을 일부로 사용하는 Tensorflow 2.0 모델을 저장하는 방법은 무엇입니까?

Tensorflow Keras Estimator는 기본 모델이 작동하는 동안 회귀 작업에서 실패합니다.

tensorflow v2.x 백엔드가있는 keras에서 tensorflow v1.x 백엔드로 keras 모델을로드하는 방법은 무엇입니까?

keras 모델에 Tensorflow 커스텀 손실을 사용하는 방법은 무엇입니까?

X, y 및 하나의 추가 배열을 생성하는 model.fit에 사용자 지정 데이터 생성기를 tensorflow.keras 모델에 입력하는 방법은 무엇입니까?

저장된 모델을 TensorFlow 또는 Keras로 변환하거나로드하는 방법은 무엇입니까?

배치 크기가 1 인 작은 LSTM 모델에서도 tf.keras OOM

Tensorflow Keras에서 모델 최적화 프로그램을 저장하지 않는 방법은 무엇입니까?

Tensorflow 2.2에서 여러 입력으로 Keras 모델을 훈련하는 방법은 무엇입니까?

Tensorflow ResNet 모델 로딩은 ** ~ 5GB의 RAM **을 사용합니다. 반면 가중치에서 로딩은 ~ 200MB 만 사용합니다.

Numpy`resize`는 -1 구문을 사용할 때 예기치 않은 배열 크기를 제공합니다.

Keras, Tensorflow : 두 개의 다른 모델 출력을 하나로 병합

Keras 모델을로드하는 방법은 무엇입니까?

keras의 다양한 배치 크기에 대한 학습 모델

Tensorflow Federated에서 여러 기능을 사용하여 모델을 빌드하는 방법은 무엇입니까?

tensorflow2 및 keras를 사용하여 다중 GPU에서 모델을 훈련하는 방법은 무엇입니까?

Tensorflow 또는 Keras는 모델 가중치 초기화를 어떻게 처리하며 언제 발생합니까?

tf.data.Dataset 생성기에서 tf.keras 모델을 사용하는 방법은 무엇입니까?

Tensorflow 모델을 저장하고로드하면 Keras 오류가 발생합니다.

TensorFlow 2.x에서 Tensorflow 1.x 저장된 모델을로드하는 방법은 무엇입니까?

다른 모델을 만들기위한 기준선으로 사용하기 위해 keras 모델에서 레이어를 제거하는 방법

pyspark UDF에서 tensorflow.keras 모델을 사용하면 피클 오류가 발생합니다.

Tensorflow에서 다중 GPU 모델 학습을 위해 tf.distribute.MirroredStrategy와 함께 keras.utils.Sequence 데이터 생성기를 사용하는 방법은 무엇입니까?

Keras : 기능 모델이 LSTM에 대해 여러 배치를 허용하도록합니다.

Keras 모델로 훈련 된 Tensorflow 2. *를 .onnx 형식으로 변환하는 방법은 무엇입니까?

사용자 지정 손실 함수로 Keras 모델을 생성하는 함수는 한 번만 작동합니다.

Withou onnx, pytorch 모델을 tensorflow 모델로 수동으로 변환하는 방법은 무엇입니까?

Keras에서 하나의 출력을 가진 모델의 손실로 여러 변수를 사용하는 방법은 무엇입니까?

Tensorflow 2.1을 사용하는 Keras 모델의 커스텀 측정 항목

Tensorflow ResNet 모델 로딩은 ~ 5GB의 RAM 을 사용합니다. 반면 가중치에서 로딩은 ~ 200MB 만 사용합니다.