I have a dataset that looks like this:
search_terms = ['computer','usb port', 'phone adaptor']
clicks = [3,2,1]
bounce = [0,0,2]
conversion = [4,1,0]
I want to feed it into a k-means model, but I am having trouble transforming the lists into a matrix format that KMeans can ingest. I also want to reduce the dimensions with PCA so the data can be visualized in a 2D plot.
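For reference, scikit-learn estimators such as PCA and KMeans take a 2D array shaped (n_samples, n_features): here, one row per search term and one column per metric. A minimal NumPy sketch of building that layout from the lists above (the name X follows the question's code; np.column_stack is just one convenient way to do it):

```python
import numpy as np

search_terms = ['computer', 'usb port', 'phone adaptor']
clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]

# np.column_stack turns each 1D list into one column, giving one row per search term.
X = np.column_stack([clicks, bounce, conversion])

# Equivalent: build a (features, samples) array from one outer list, then transpose it.
X_t = np.array([clicks, bounce, conversion]).T

print(X.shape)  # (3, 3)
print(X[0])     # features for 'computer': [3 0 4]
```

Either form works; the important part is that the lists are passed to NumPy as a single sequence and that rows end up being samples.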
This is what my code looks like:
X = np.array(clicks, bounce, conversion)
y = np.array(search_terms)
num_clusters = 3
pca = PCA(n_components=2, whiten=True).fit(X)
X_pca = pca.transform(X)
km=KMeans(n_clusters=num_clusters, init='k-means++',n_init=10, verbose=1)
km.fit(X_pca)
print km.labels_[:10]
This is the error I got:
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
Also, once the clustering is done, I would like to see which search terms fall into which cluster, so I'm not sure whether setting y = np.array(search_terms) is correct.
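A y array of search terms is fine as a lookup table: KMeans is unsupervised, so y is never passed to fit. After fitting, km.labels_[i] is the cluster of row i of the input, so it lines up with search_terms[i] as long as the row order is preserved. A sketch of grouping terms by cluster, where the hard-coded labels array is a hypothetical stand-in for km.labels_ (not real output):

```python
from collections import defaultdict

import numpy as np

search_terms = ['computer', 'usb port', 'phone adaptor']
y = np.array(search_terms)

# Hypothetical cluster assignments standing in for km.labels_ (one label per row of X).
labels = np.array([0, 0, 1])

# Option 1: a boolean mask selects the search terms assigned to each cluster.
for k in np.unique(labels):
    print(k, y[labels == k].tolist())

# Option 2: build a dict mapping cluster id -> list of search terms.
clusters = defaultdict(list)
for term, label in zip(search_terms, labels):
    clusters[int(label)].append(term)
print(dict(clusters))
```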
Please advise.
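The main problem is np.array(clicks, bounce, conversion): np.array expects a single sequence (its second positional argument is dtype), so the three lists need to be wrapped in one outer list. That produces one row per feature; transposing it with .T gives one row per search term and one column per feature, which is the (n_samples, n_features) shape scikit-learn expects. The following code should work; let me know if it does not.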
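import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
search_terms = ['computer', 'usb port', 'phone adaptor']
clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]
# Wrap the feature lists in one outer list, then transpose so that
# each row is a search term and each column is a feature: shape (3, 3)
X = np.array([clicks, bounce, conversion]).T
# Kept only to look up the search term for each row; KMeans does not use it
y = np.array(search_terms)
num_clusters = 3
# Project onto 2 principal components so the data can be plotted in 2D
X_pca = PCA(n_components=2, whiten=True).fit_transform(X)
km = KMeans(n_clusters=num_clusters, init='k-means++', n_init=10, verbose=1)
km.fit(X_pca)
# One cluster label per row of X, in the same order as search_terms
print(km.labels_)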
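To get the 2D plot the question asks for, the PCA coordinates can be scattered, coloured by cluster and annotated with each search term. A sketch assuming matplotlib is installed; the Agg backend, the clusters.png filename and random_state=0 are arbitrary choices made here so the script runs without a display and gives repeatable labels:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

search_terms = ['computer', 'usb port', 'phone adaptor']
clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]

# One row per search term, one column per feature.
X = np.array([clicks, bounce, conversion]).T
X_pca = PCA(n_components=2, whiten=True).fit_transform(X)

# fit_predict fits the model and returns labels_ in one call.
km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
labels = km.fit_predict(X_pca)

fig, ax = plt.subplots()
# Colour each point by its cluster label.
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=labels)
# Write each search term next to its point so clusters can be read off the plot.
for term, (px, py) in zip(search_terms, X_pca):
    ax.annotate(term, (px, py))
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
fig.savefig("clusters.png")
```

With only three search terms and three clusters, each term ends up in its own cluster; the plot becomes more informative with a larger dataset or fewer clusters.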