Comparing vectors in an array of vectors

user3625380

I'm badly stuck, and none of the Pythonistas I've asked has been able to help.

I'm using vstack to create an array of vectors in a loop like this:

Corr = np.vstack((Corr, S))

I need to compare all of these vectors and remove the repeated ones, so that I end up with an array of unique vectors.

I know that this comparison can be done in lists, but I have not found a way to append full vectors to a list.

This is the result (I've marked unique vectors with unique letters):

Corr = [[ 0.  0.  0.  0. -2.  4.  4.  2.  2.] #a
 [-4. -4. -4. -4.  2.  4.  4.  2.  2.]#b
 [-4.  0.  0.  4. -2.  0.  0. -2.  2.]#c
 [ 0. -4. -4.  0.  2.  0.  0. -2.  2.]#d
 [ 0. -4.  4.  0. -2.  0.  0.  2. -2.]#e
 [-4.  0.  0. -4.  2.  0.  0.  2. -2.]#f
 [-4. -4.  4.  4. -2.  4. -4. -2. -2.]#g
 [ 0.  0.  0.  0.  2.  4. -4. -2. -2.]#h
 [ 0.  4. -4.  0. -2.  0.  0.  2. -2.]#i
 [-4.  0.  0. -4.  2.  0.  0.  2. -2.]#f
 [-4.  4. -4.  4. -2. -4.  4. -2. -2.]#j
 [ 0.  0.  0.  0.  2. -4.  4. -2. -2.]#k
 [ 0.  0.  0.  0. -2. -4. -4.  2.  2.]#l
 [-4.  4.  4. -4.  2. -4. -4.  2.  2.]#m
 [-4.  0.  0.  4. -2.  0.  0. -2.  2.]#n
 [ 0.  4.  4.  0.  2.  0.  0. -2.  2.]#o
 [ 4.  0.  0. -4. -2.  0.  0. -2.  2.]#c
 [ 0. -4. -4.  0.  2.  0.  0. -2.  2.]#d
 [ 0.  0.  0.  0. -2. -4. -4.  2.  2.]#p
 [ 4. -4. -4.  4.  2. -4. -4.  2.  2.]#q
 [ 4. -4.  4. -4. -2. -4.  4. -2. -2.]#r
 [ 0.  0.  0.  0.  2. -4.  4. -2. -2.]#k
 [ 0. -4.  4.  0. -2.  0.  0.  2. -2.]#e
 [ 4.  0.  0.  4.  2.  0.  0.  2. -2.]#s
 [ 4.  4. -4. -4. -2.  4. -4. -2. -2.]#t
 [ 0.  0.  0.  0.  2.  4. -4. -2. -2.]#h
 [ 0.  4. -4.  0. -2.  0.  0.  2. -2.]#i
 [ 4.  0.  0.  4.  2.  0.  0.  2. -2.]#s
 [ 4.  0.  0. -4. -2.  0.  0. -2.  2.]#u
 [ 0.  4.  4.  0.  2.  0.  0. -2.  2.]#o
 [ 0.  0.  0.  0. -2.  4.  4.  2.  2.]]#a

I don't know why vstack is adding a period instead of a comma (in the loops each vector S has a comma when I print it separately!).

I need the end result to be an array of unique vectors (so in this case it'll be vectors a-u, i.e. 21 vectors).

jakevdp

If you convert your vectors to tuples, you can put them in a set which will automatically discard duplicates. For example:

unique_vectors = set(map(tuple, Corr))

array_of_unique_vectors = np.array(list(unique_vectors))
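As a small illustration of the idea above, here is a minimal sketch that runs the set-of-tuples approach on a made-up 4x3 array (the sample values are hypothetical, not taken from the question). It also shows that the periods in the printed output are just how NumPy displays floats (`0.` means 0.0); they are decimal points, not separators replacing commas.

```python
import numpy as np

# Hypothetical sample data: rows 0 and 2 are duplicates.
Corr = np.array([[0., 4., -2.],
                 [-4., 0., 2.],
                 [0., 4., -2.],
                 [4., 4., 2.]])

# Tuples are hashable, so a set keeps exactly one copy of each row.
unique_vectors = set(map(tuple, Corr))
array_of_unique_vectors = np.array(list(unique_vectors))

print(array_of_unique_vectors.shape)  # (3, 3)

# NumPy prints the float 0.0 as "0." and separates elements with spaces.
print(np.array([0.0, 4.0]))  # [0. 4.]
```

Note that a set does not remember insertion order, so the rows of the result may come back in a different order than they appeared in `Corr`.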

Edit: I was curious, so I quickly benchmarked the three proposed solutions here. The results are the same up to the order of the returned elements, and in this test the Pandas drop_duplicates method is the fastest: roughly 4 times faster than the NumPy view approach and roughly 18 times faster than the set-based one.

import numpy as np
import pandas as pd

def unique_set(a):
    # np.vstack expects a sequence (list or tuple), so convert the set first
    return np.vstack(list(set(map(tuple, a))))

def unique_numpy(a):
    a = np.ascontiguousarray(a)
    view = a.view(np.dtype(('void', a.itemsize * a.shape[1])))
    unique = np.unique(view)
    return unique.view(a.dtype).reshape(-1, a.shape[1])

def unique_pandas(a):
    return pd.DataFrame(a).drop_duplicates().values

a = np.random.randint(0, 5, (100000, 5))

%timeit unique_set(a)
10 loops, best of 3: 183 ms per loop

%timeit unique_numpy(a)
10 loops, best of 3: 43.1 ms per loop

%timeit unique_pandas(a)
100 loops, best of 3: 10.3 ms per loop
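Newer NumPy releases (1.13 and later) also accept an `axis` argument in `np.unique`, which removes duplicate rows directly without the void-view trick. The sketch below is an addition and was not part of the benchmark above, so no timing is claimed for it; `unique_rows` and `unique_rows_keep_order` are hypothetical helper names chosen for illustration.

```python
import numpy as np

def unique_rows(a):
    # Distinct rows, returned in lexicographically sorted order.
    return np.unique(a, axis=0)

def unique_rows_keep_order(a):
    # return_index gives the index of the first occurrence of each unique row;
    # sorting those indices restores the order in which rows first appeared.
    _, first_idx = np.unique(a, axis=0, return_index=True)
    return a[np.sort(first_idx)]

# Hypothetical sample data with two repeated rows.
a = np.array([[3, 1], [0, 2], [3, 1], [0, 2], [1, 1]])

print(unique_rows(a))             # rows [0 2], [1 1], [3 1]
print(unique_rows_keep_order(a))  # rows [3 1], [0 2], [1 1]
```

The second helper may be useful when the order in which the vectors were stacked matters, since both the set-based and the sorted `np.unique` results lose that order.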
