I am trying to understand the entropy of a vector. I first generate a sample of size 1000000 from a normal distribution with mean 130 and standard deviation 20:
kk=normrnd(130,20,1000000,1);
kk=uint8(kk);%did this or else the result was 0
entropy(kk)
Here is the imhist of kk: (histogram figure omitted)

The entropy result is 6.3686.
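For reference, MATLAB's `entropy` is documented to compute `-sum(p.*log2(p))` over the normalized 256-bin `imhist` counts. A minimal Python sketch of the same calculation (the round-and-clip step is an approximation of MATLAB's saturating `uint8` cast; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample as in the MATLAB snippet: normrnd(130, 20, 1000000, 1), cast to uint8.
# MATLAB's uint8() rounds to nearest and saturates at [0, 255].
kk = rng.normal(130, 20, 1_000_000)
kk = np.clip(np.round(kk), 0, 255).astype(np.uint8)

# 256-bin histogram normalized to probabilities (mirrors imhist + entropy)
p = np.bincount(kk, minlength=256) / kk.size
p = p[p > 0]                      # drop empty bins: log2(0) is -inf
H = -np.sum(p * np.log2(p))
print(H)                          # close to the ~6.37 reported above
```

This lands very near 6.37, matching the MATLAB result, since the discretized N(130, 20²) has entropy ≈ 0.5·log2(2πe·400) ≈ 6.37 bits.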
Then, following the same steps, I generated a sample of size 1000 from a normal distribution with the same mean 130 and standard deviation 20, which gives a noisier histogram: (histogram figure omitted)
And the entropy is 6.2779. So it seems the noisier the distribution the smaller the entropy. I calculated the entropies for other sample sizes of a normal distribution with same mean and variance and it changes according to this. But am I right? Is this the right way to compare entropies of histogram distributions?
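The same comparison can be reproduced outside MATLAB by repeating the measurement at several sample sizes. A Python sketch (`hist_entropy` is an ad-hoc helper written for this illustration, not a library function; exact values depend on the seed):

```python
import numpy as np

def hist_entropy(x):
    """Shannon entropy (bits) of the 256-bin histogram of a uint8 vector."""
    p = np.bincount(x, minlength=256) / x.size
    p = p[p > 0]                       # skip empty bins (log2(0) is -inf)
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
H = {}
for n in (1_000_000, 10_000, 1_000, 100):
    # same distribution each time, only the sample size changes
    x = np.clip(np.round(rng.normal(130, 20, n)), 0, 255).astype(np.uint8)
    H[n] = hist_entropy(x)
    print(n, round(H[n], 4))
```

The estimated entropy shrinks as n shrinks, even though the underlying distribution is identical, which is the effect the question observes.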
[EDIT]
After what obchardon said I investigated a bit more. This distribution:
kk1=normrnd(130,75,1000000,1);%entropy=7.6983
gives me a bigger entropy than:
kk2=normrnd(130,20,1000000,1);%entropy=6.3686
but this one's entropy is smaller than that of kk1 and kk2:
kka2=normrnd(130,150,100000,1);%entropy=6.1660
How is this possible?
The entropy estimate is biased for small vectors:
For example:
We generate a 10x1 normally distributed vector:
n = 10
kk=normrnd(130,20,n,1);
kk=uint8(kk);
Now we calculate the entropy:
kk = im2double(kk);
P = hist(kk(:), linspace(0, 1, 256));
P = P(:); P = P(P>0); %drop the bins where P = 0, because log2(0) = -Inf
P = P/n;
E = -sum(P.*log2(P))
So in this example the entropy can never be higher than -sum(n*(1/n)*log2(1/n)) = log2(n) = log2(10) ≈ 3.32 (the worst case, where all n values of kk are different).
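That worst-case figure is just log2(n): with n distinct values, each histogram bin has probability 1/n. A quick check in Python:

```python
import math

n = 10
# Each of the n distinct values gets probability 1/n, so the histogram
# entropy is -sum over n bins of (1/n)*log2(1/n), which equals log2(n).
H_max = -sum((1 / n) * math.log2(1 / n) for _ in range(n))
print(H_max)                 # log2(10) ≈ 3.3219
```

So for n = 10 the estimator is capped at about 3.32 bits no matter how wide the true distribution is, which is why the small-sample entropies above come out low.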
So @TasosPapastylianou is right: the entropy is (only) a function of its variance, but only when n is large enough.