Fastest way to Group a large number of Elements in C++

NIZGTR Published at Dev

NIZGTR

I need a way to quickly group a large number of connections, currently 300k, into groups where each group has a max number of elements allowed, currently 14k, and all of the connections in the same group cannot be connected to the same point. Basically, each connection is between two points and I need them grouped into buckets where the connections in a bucket don't share a point. Hopefully that makes sense.

Here's what I have so far, which works but is rather slow:

for (size_t i = 0; i < ConnectionGroups.size(); i++)
{
    auto& group = ConnectionGroups[i];
    if (group.size() < MaxConnectionGroupSize) // Has room for us...
    {
        int validGroupIdx = i;
        for (size_t gIdx = 0; gIdx < group.size(); gIdx++)
        {
            const auto groupConnection = ConnectionsQuickAccess[group[gIdx]];

            // Are we directly connected to one of the Connections in this group by one degree...
            if (Connection.Point1 == groupConnection->Point1 || Connection.Point1 == groupConnection->Point2 ||
                Connection.Point2 == groupConnection->Point1 || Connection.Point2 == groupConnection->Point2)
            {
                validGroupIdx = -1;
                break; // We are, check the next group
            }
        }

        if (validGroupIdx != -1)
        {
            ConnectionGroups[i].push_back(Connection.Slot);
            Connection.Group = i;
            return;
        }
        else
            continue;
    }
}

// All groups are full, create a new group
vector<int> newGroup;
newGroup.push_back(Connection.Slot);
ConnectionGroups.push_back(newGroup);

This code takes 29.68s to go through 300k connections, is there a faster way of doing this? Or maybe a different approach to this?

Thank you!

Dietmar Kühl

It seems the code posted processes one connection, i.e., it is called n times where n is the number of connections. The algorithm is clearly O(n * n): the time it takes to add new connections grows quadratic - something you generally don't want.

Unless memory is the primary constraints, I'd simple store for each group a hash containing all existing end-points and check against that, i.e., something along the lines of

for (std:size_t i(0); i != ConnectionGroups.size(); ++i) {
    if (ConnectionGroups[i].size () < MaxConnectionGroupSize)
        && !ConnectionGroups[i].usesEndPoint(Connection.Point1)
        && !ConnectionGroups[i].usesEndPoint(Connection.Point2)) {
        ConnectionGroups[i].addEndPoint(Connection.Point1);
        ConnectionGroups[i].addEndPoint(Connection.Point2);
        ConnectionGroups[i].addConnection(Connection);
    }
}

Obviously, ConnectionGroups[i] would be a combination of connections and a hash of end-points accessed with the corresponding functions.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2021-02-21

Comments

0 comments

From Dev

Related Related

Article

Fastest way to Group a large number of Elements in C++

Fastest way to Group a large number of Elements in C++

Fastest way to remove huge number of elements from an array in C

Fastest way to sort a large number of arrays in python

Fastest way to find number of elements in a range

C++ : Fastest Way to find number of elements of one array in another array

What is the fastest way to format a large number of date/time objects in Android?

Python fastest way to read a large number of small files into memory?

What is the fastest way to find the exact value of the factorial of a large number in python?

Python fastest way to read a large number of small files into memory?

Fastest way to read very large text file in C#

Fastest way to split a large list of integers into List of strings in C#

Fastest way to process a large file?

Best (fastest) way to find the number most frequently entered in C?

Best (fastest) way to find the number most frequently entered in C?

Fastest way to compute large number of 3x3 dot product

Code chef :ODD , fastest way to calculate the nearest power of 2 for a given large number?

Fastest way to get a large number of nodes from Neo4j using py2neo

Code chef :ODD , fastest way to calculate the nearest power of 2 for a given large number?

What is the fastest way to get a large number of time ranges using Apache Spark?

Fastest way to get elements in viewport

Fastest way to count number of lines?

Fastest way to count number of lines?

What is the fastest compression method for a large number of files?

Fastest way to preload/load large images

Fastest way to match substring from large dict

Fastest way in R to compute the inverse for large matrices

Copy large files in fastest way possible

Fastest way to search string in large text file

MongoDB: fastest way to compare two large sets

Fastest way to write large CSV with Python