Convolution with RGB images: what values does the RGB filter hold?

Can

Convolution on a grayscale image is straightforward. You have a filter of shape nxnx1 and convolve it with the input image to extract the desired features.

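I also understand how convolution works on an RGB image: the filter has shape nxnx3. But do all 3 'layers' of the filter hold the same kernel? For example, if the 0th layer holds a given map of values, do layers 1 and 2 hold exactly the same values? I am asking in the context of Convolutional Neural Networks, not conventional image processing. I understand that the weights of each filter are learned and are randomized at the start. Am I right in thinking that each layer will have different randomized values?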

Anton Codes

Do all 3 'layers' of the filter hold the same kernel?

The short answer is no. The longer answer is that there is not one kernel per layer; there is a single kernel that handles all input and output layers at once.

The code walkthrough below shows step by step how to compute each convolution manually. From it you can see that, at a high level, the calculation goes like this:

Take a patch from the batch of images (in your case BatchSize x 3x3x3).
Flatten it to [BatchSize, 27].
Matrix-multiply it by the reshaped kernel, [27, output_filters].
Add the bias, of shape [output_filters].

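All the colors are processed at once by a single matrix multiplication with the kernel matrix. If you think about the kernel matrix, the values used to generate the first filter are in its first column and the values used to generate the second filter are in its second column. So the values really are different and are not reused, but they are not stored or applied separately.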
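To make the column layout concrete, here is a minimal NumPy-only sketch. The random `kernel` and `patch` are hypothetical stand-ins (not the values from the walkthrough), and the (height, width, in_channels, filters) layout is assumed to match the weight shape that Conv2D reports further down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a random kernel laid out as
# (height, width, in_channels, filters) and one 3x3 RGB patch.
kernel = rng.standard_normal((3, 3, 3, 2)).astype(np.float32)
patch = rng.random((3, 3, 3)).astype(np.float32)

flat_kernel = kernel.reshape(-1, 2)   # (27, 2): one column per output filter
flat_patch = patch.reshape(1, -1)     # (1, 27)

# Column j of the flattened matrix is exactly the 3x3x3 block for filter j.
columns_match = all(
    np.array_equal(flat_kernel[:, j], kernel[..., j].ravel()) for j in range(2)
)

# Within one filter, the R and G slices hold independent values.
channels_differ = not np.allclose(kernel[:, :, 0, 0], kernel[:, :, 1, 0])

# The single matrix product equals "multiply each color slice, then sum".
via_matmul = flat_patch @ flat_kernel                     # shape (1, 2)
via_channels = np.array([
    sum(float((patch[:, :, c] * kernel[:, :, c, j]).sum()) for c in range(3))
    for j in range(2)
])
same_result = np.allclose(via_matmul[0], via_channels, atol=1e-4)

print(columns_match, channels_differ, same_result)
```

All three checks should come out true: each filter owns one column of 27 values, the color slices inside a filter differ, and summing per-color products gives the same number as the one matrix multiplication.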

Code walkthrough

import tensorflow as tf
import numpy as np

# Define a 3x3 kernel that after convolution will create an image with 2 filters (channels)
conv_layer = tf.keras.layers.Conv2D(filters=2, kernel_size=3)

# Let's create a random input image
starting_image = np.array( np.random.rand(1,4,4,3), dtype=np.float32)

# and process it
result = conv_layer(starting_image)
weight, bias = conv_layer.get_weights()
print('size of weight', weight.shape)
print('size of bias', bias.shape)

size of weight (3, 3, 3, 2)

size of bias (2,)

# The output of the convolution of the 4x4x3 image input 
# is a 2x2x2 output (because we don't have padding)
result.numpy()

array([[[[-0.34940776, -0.6426925],
         [-0.81834394, -0.16166998]],

        [[-0.37515935, -0.28143463],
         [-0.60084903, -0.5310158]]]], dtype=float32)

# Now let's see how we can recreate this using the weights

# The way convolution is done is to extract a patch
# the size of the kernel (3x3 in this case)
# We will use the first patch, the first three rows and columns and all the colors
patch = starting_image[0,:3,:3,:]
print('patch.shape' , patch.shape)

# Then we flatten the patch
flat_patch = np.reshape( patch, [1,-1] )
print('New shape is', flat_patch.shape)

patch.shape (3, 3, 3)

New shape is (1, 27)

# next we take the weight and reshape it to be [-1,filters]
flat_weight = np.reshape( weight, [-1,2] )
print('flat_weight shape is ',flat_weight.shape)

flat_weight shape is  (27, 2)

# we have the patch of shape [1,27] and the weight of [27,2]
# doing a matrix multiplication of the two shapes [1,27]*[27,2] = a shape of [1,2]
# which is the output we want, 2 filter outputs for this patch
output_for_patch = np.matmul(flat_patch,flat_weight)

# but we haven't added the bias yet, so let's do that
output_for_patch = output_for_patch + bias

# Finally, we can see that our manual calculation matches 
# what Conv2D does exactly for the first patch

output_for_patch

array([[-0.34940773, -0.64269245]], dtype=float32)

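If you compare this with the full convolution above, you can see that it is exactly the first patch (the last digits differ only by float32 rounding).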

array([[[[-0.34940776, -0.6426925],
         [-0.81834394, -0.16166998]],

        [[-0.37515935, -0.28143463],
         [-0.60084903, -0.5310158]]]], dtype=float32)

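This process is repeated for each patch. To optimize this code a bit more, instead of passing only one image patch at a time, [1,27], you can pass [batch_number, 27] patches at once, and the kernel will process them all in one go, returning [batch_number, filter_size].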
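The batched version can be sketched as follows. This uses NumPy only, and the random `image`, `weight` and `bias` arrays are hypothetical stand-ins for `starting_image[0]` and the values returned by `conv_layer.get_weights()`. It is a sketch of the idea, not how TensorFlow implements Conv2D internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for starting_image[0], weight and bias.
image = rng.random((4, 4, 3)).astype(np.float32)
weight = rng.standard_normal((3, 3, 3, 2)).astype(np.float32)
bias = rng.standard_normal(2).astype(np.float32)

k = weight.shape[0]               # kernel size (3)
out_h = image.shape[0] - k + 1    # 2 rows of patches, no padding
out_w = image.shape[1] - k + 1    # 2 columns of patches, no padding

# Collect every patch, flattened, into one [num_patches, 27] matrix.
patches = np.stack([
    image[i:i + k, j:j + k, :].ravel()
    for i in range(out_h)
    for j in range(out_w)
])

# One matrix product handles all patches and both filters at once.
flat_weight = weight.reshape(-1, weight.shape[-1])    # (27, 2)
output = (patches @ flat_weight + bias).reshape(out_h, out_w, -1)

# Cross-check against the one-patch-at-a-time calculation.
reference = np.empty_like(output)
for i in range(out_h):
    for j in range(out_w):
        flat_patch = image[i:i + k, j:j + k, :].reshape(1, -1)
        reference[i, j] = (flat_patch @ flat_weight + bias)[0]

print(patches.shape, output.shape)
print(np.allclose(output, reference, atol=1e-5))
```

For the 4x4x3 input there are four patches, so `patches` has shape (4, 27) and the reshaped result has the same 2x2x2 layout as the Conv2D output for a single image.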

Edited on 2021-04-5