I want to efficiently calculate two means of values of a HashMap
each time a new key/value pair is inserted.
Suppose we currently have this HashMap<Double, Double>:
3 4
5 6
8 8
1 3
6 8 <- Latest insertion
The latest insertion was the key 6 with value 8.
The first mean consists of all values whose keys are smaller than the inserted key 6.
These are the values 4, 6, 3 of the keys 3, 5, 1, so the mean is (4+6+3)/3 = 4.3...
The second mean is the "opposite": the mean of all values for keys greater than 6.
The only such key is 8, with value 8, so this mean is 8/1 = 8.
Now, a new key/pair gets inserted:
3 4
5 6
6 8
8 8
1 3
4 9 <- Latest insertion
So again, we need to calculate the mean for all values with keys smaller than 4.
These are the values 4, 3 for the keys 3, 1, so the "smaller mean" is now (4+3)/2 = 3.5.
The "greater mean" is now (6+8+8)/3 = 7.3... for the key/value pairs 5/6, 6/8, 8/8.
A naive implementation might be something like this:
import java.util.HashMap;
import java.util.Map;

public class CalculateMapMean {
    private double smallerMean = 0.0;
    private double greaterMean = 0.0;
    private HashMap<Double, Double> someMap = new HashMap<Double, Double>();

    // O(n): every insertion triggers a full scan of the map.
    public void calculateMeans(double latestInsertedKey) {
        double sumGreater = 0;
        double sumSmaller = 0;
        int greaterCount = 0;
        int smallerCount = 0;
        for (Map.Entry<Double, Double> entry : someMap.entrySet()) {
            double key = entry.getKey();
            double value = entry.getValue();
            if (key > latestInsertedKey) {
                sumGreater += value;
                ++greaterCount;
            } else if (key < latestInsertedKey) {
                sumSmaller += value;
                ++smallerCount;
            }
        }
        greaterMean = greaterCount != 0 ? sumGreater / greaterCount : 0.0;
        smallerMean = smallerCount != 0 ? sumSmaller / smallerCount : 0.0;
    }
}
The question is whether the calculation of the means can be dramatically improved with a TreeMap or another data structure, so that one does not have to iterate over all keys on every insertion. Is there an elegant way of reusing former calculations?
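For reference, here is what TreeMap alone buys you: a hypothetical helper (not part of the original code) that uses headMap/tailMap to select the two key ranges. It is more compact, but summing the values still iterates over the range, so it is no faster asymptotically than the HashMap loop above.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: still O(n) per query, because mean() walks the submap.
class TreeMapMeans {
    private final TreeMap<Double, Double> map = new TreeMap<>();

    public void put(double key, double value) {
        map.put(key, value);
    }

    // Mean of all values whose key is strictly smaller than 'key'; 0.0 if none.
    public double smallerMean(double key) {
        return mean(map.headMap(key, false));
    }

    // Mean of all values whose key is strictly greater than 'key'; 0.0 if none.
    public double greaterMean(double key) {
        return mean(map.tailMap(key, false));
    }

    private static double mean(NavigableMap<Double, Double> part) {
        if (part.isEmpty()) return 0.0;
        double sum = 0.0;
        for (double v : part.values()) sum += v;
        return sum / part.size();
    }
}
```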
The only way I can think of to get below O(n) time for every change to the map is by keeping a balanced binary search tree (BBST) with the keys. In every node you keep a few extra fields: the number of key/value pairs and the sum of the values in the subtree rooted at that node.
Rebalancing a BBST after an insert/delete takes O(log n) time, and during that same rebalancing you can update the cached counts and sums, also in O(log n) time (since you perform O(log n) operations that each take O(1) time).
To get the correct means you need to traverse the tree and add the right counts. Let's give a simple example. Suppose I have the following 7 key-value pairs. I hope you can imagine how the corresponding BBST would look.
(3, 5) (4, 3) (7, 1) (8, 4) (11, 3) (12, 1) (13, 3)
In the root, (8, 4), the total count and sum are stored: [7, 20]. In the root of the left subtree, (4, 3), the total count and sum of that subtree are stored: [3, 9]. I draw these extra values now as a function of depth in the tree:
           [ 7, 20 ]
     [ 3, 9 ]    [ 3, 7 ]
[1, 5] [1, 1]  [1, 3] [1, 3]
Suppose I now add a new tuple with key 10. I start traversing the tree at the root. Because 8 < 10, I don't need to traverse the left subtree: all keys in that subtree are smaller than 10, so we can use its cached values [3, 9]. For the right subtree we need to recurse, because some of its keys might be smaller than 10 and some might be larger. At its root, (12, 1), because 12 > 10, we don't have to traverse its right subtree either and can use that subtree's cached [1, 3] directly.
In every layer of the tree, we can ignore one branch and recurse on the other. Therefore, finding the total value and count for keys smaller than the last inserted key and for keys larger than the last inserted key takes O(log n)
time as well.
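A minimal sketch of this idea in Java (assumption: a plain unbalanced BST for brevity; a real implementation would add AVL or red-black rebalancing and update the cached fields during rotations, which is what keeps all operations O(log n)):

```java
// Sketch: BST where each node caches the count and sum of its subtree.
// Assumption: no rebalancing, so worst-case depth is O(n); with an AVL or
// red-black variant the same queries run in O(log n).
class AugmentedBst {
    private static class Node {
        final double key;
        double value;
        Node left, right;
        int count = 1;   // number of key/value pairs in this subtree
        double sum;      // sum of values in this subtree
        Node(double key, double value) { this.key = key; this.value = value; this.sum = value; }
    }

    private Node root;

    public void insert(double key, double value) {
        root = insert(root, key, value);
    }

    private Node insert(Node node, double key, double value) {
        if (node == null) return new Node(key, value);
        if (key < node.key) node.left = insert(node.left, key, value);
        else if (key > node.key) node.right = insert(node.right, key, value);
        else node.value = value; // replace, like HashMap.put
        node.count = 1 + count(node.left) + count(node.right);
        node.sum = node.value + sum(node.left) + sum(node.right);
        return node;
    }

    private static int count(Node n) { return n == null ? 0 : n.count; }
    private static double sum(Node n) { return n == null ? 0.0 : n.sum; }

    // Mean of all values whose key is strictly smaller than 'key'; 0.0 if none.
    // One root-to-leaf walk: at each node, either the node and its whole left
    // subtree qualify (use the cache, go right) or only the left side can.
    public double smallerMean(double key) {
        int cnt = 0;
        double s = 0.0;
        for (Node n = root; n != null; ) {
            if (key > n.key) {
                cnt += 1 + count(n.left);
                s += n.value + sum(n.left);
                n = n.right;
            } else {
                n = n.left;
            }
        }
        return cnt == 0 ? 0.0 : s / cnt;
    }

    // Mean of all values whose key is strictly greater than 'key'; 0.0 if none.
    public double greaterMean(double key) {
        int cnt = 0;
        double s = 0.0;
        for (Node n = root; n != null; ) {
            if (key < n.key) {
                cnt += 1 + count(n.right);
                s += n.value + sum(n.right);
                n = n.left;
            } else {
                n = n.right;
            }
        }
        return cnt == 0 ? 0.0 : s / cnt;
    }
}
```

On the question's example, inserting (3, 4), (5, 6), (8, 8), (1, 3), (6, 8) and querying around key 6 reproduces the two means from the question: (4+6+3)/3 and 8/1.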