Calculate the mean of values in a Python collections.Counter

chrisinmtown

I'm profiling some numeric time measurements that cluster extremely closely. I would like to obtain mean, standard deviation, etc. Some inputs are large, so I thought I could avoid creating lists of millions of numbers and instead use Python collections.Counter objects as a compact representation.

Example: one of my small inputs yields a collections.Counter whose items are [(48, 4082), (49, 1146)], which means 4,082 occurrences of the value 48 and 1,146 occurrences of the value 49. For this data set I manually calculate the mean to be approximately 48.2192042846.

Of course if I had a simple list of 4,082 + 1,146 = 5,228 integers I would just feed it to numpy.mean().

My question: how can I calculate descriptive statistics from the values in a collections.Counter object just as if I had a list of numbers? Do I have to create the full list or is there a shortcut?

Jakub Wasilewski

While you can offload everything to numpy after expanding the counter into a full list of values, that is slower and uses more memory than necessary. Instead, you can compute each statistic directly from its definition, using the (value, count) pairs.

The mean is just the sum of all numbers divided by their count, so that's very simple:

sum_of_numbers = sum(number * count for number, count in counter.items())
count = sum(counter.values())  # total number of observations
mean = sum_of_numbers / count  # true division; on Python 2, divide by float(count)
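As a minimal, self-contained sketch of the same idea, the snippet below wraps the calculation in a function and runs it on the counter from the question. The names counter_mean and timings are hypothetical, chosen only for this illustration, and the code assumes Python 3 division.

```python
from collections import Counter


def counter_mean(counter):
    """Mean of the values in `counter`, weighting each value by its count.

    `counter_mean` is a hypothetical helper name used for illustration.
    """
    total_count = sum(counter.values())
    if total_count == 0:
        raise ValueError("cannot take the mean of an empty Counter")
    # Each distinct value contributes value * count to the overall sum.
    weighted_sum = sum(value * count for value, count in counter.items())
    return weighted_sum / total_count


# The data set from the question: 4,082 occurrences of 48 and 1,146 of 49.
timings = Counter({48: 4082, 49: 1146})
mean = counter_mean(timings)
print(mean)
```

The printed value should agree with the manually calculated mean in the question, to within floating-point rounding.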

Standard deviation is a bit more complex. It is the square root of the variance, and the (population) variance can in turn be computed as "the mean of the squares minus the square of the mean" of your collection:

import math

# iterate over .items(); iterating over the counter itself yields only the keys
total_squares = sum(number * number * c for number, c in counter.items())
mean_of_squares = total_squares / count
variance = mean_of_squares - mean * mean
std_dev = math.sqrt(variance)
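To check that this shortcut matches the result from the full list, here is a small sketch that compares it with statistics.pstdev from the standard library, applied to the expanded elements. The helper name counter_pstdev and the variable timings are hypothetical. The clamp to zero is an added guard, not part of the formula: with floating-point inputs, "mean of squares minus square of the mean" can come out slightly negative when the true variance is very small relative to the mean.

```python
import math
import statistics
from collections import Counter


def counter_pstdev(counter):
    """Population standard deviation of the values in `counter`.

    `counter_pstdev` is a hypothetical helper name used for illustration.
    """
    total_count = sum(counter.values())
    if total_count == 0:
        raise ValueError("cannot take the standard deviation of an empty Counter")
    mean = sum(v * c for v, c in counter.items()) / total_count
    mean_of_squares = sum(v * v * c for v, c in counter.items()) / total_count
    # Guard against a tiny negative result caused by rounding (an added assumption).
    variance = max(mean_of_squares - mean * mean, 0.0)
    return math.sqrt(variance)


timings = Counter({48: 4082, 49: 1146})
compact = counter_pstdev(timings)

# Reference result: expand the counter into the full list of 5,228 values.
expanded = statistics.pstdev(list(timings.elements()))
print(compact, expanded)
```

Both numbers should agree to many decimal places. Note that this is the population standard deviation; for the sample version you would divide the sum of squared deviations by the count minus one instead.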

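This takes a little more manual work, but it should be much faster when the data contains many repeated values, because the work scales with the number of distinct values rather than with the total number of observations.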
