Transforming data for kmeans and PCA

jxn

I have a dataset that looks like this:

search_terms = ['computer','usb port', 'phone adaptor']
clicks = [3,2,1]
bounce = [0,0,2]
conversion = [4,1,0]

I want to feed it into a k-means model, but I am having trouble transforming the lists into a matrix that k-means can ingest. I also want to reduce the dimensions with PCA so that the result can be visualized in a 2D plot.

This is what my code looks like:

X = np.array(clicks, bounce, conversion)
y = np.array(search_terms)
num_clusters = 3

pca = PCA(n_components=2, whiten=True).fit(X)
X_pca = pca.transform(X)

km=KMeans(n_clusters=num_clusters, init='k-means++',n_init=10, verbose=1)
km.fit(X_pca)

print km.labels_[:10]

This is the error I got:

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Also, once the clustering is done, I would like to see which search terms fall into which cluster, so I'm not sure whether setting y = np.array(search_terms) is correct.

Please advise.

Jianxun Li

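The following code should work: pass the three lists to np.array as a single nested list and transpose the result, so that each row is a search term and each column is a feature. Let me know if this is not the case.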

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

search_terms = ['computer', 'usb port', 'phone adaptor']
clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]

# Stack the lists as rows, then transpose: shape (3 terms, 3 features)
X = np.array([clicks, bounce, conversion]).T
# Labels for later lookup; KMeans itself does not use y
y = np.array(search_terms)

num_clusters = 3

# Fit PCA and project onto the first two components in one step
X_pca = PCA(n_components=2, whiten=True).fit_transform(X)

km = KMeans(n_clusters=num_clusters, init='k-means++', n_init=10, verbose=1)
km.fit(X_pca)
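
On the second part of the question (seeing which search terms fall into which cluster): k-means is unsupervised, so y is never passed to the model. The fitted labels_ array follows the row order of the input, so pairing it with the search terms is enough. A minimal sketch under those assumptions is below; the term_to_cluster name and the random_state argument are my additions for illustration and are not in the original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

search_terms = ['computer', 'usb port', 'phone adaptor']
clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]

# One row per search term, one column per feature.
X = np.array([clicks, bounce, conversion]).T
X_pca = PCA(n_components=2, whiten=True).fit_transform(X)

# random_state is an added assumption, used only to make runs repeatable.
km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
km.fit(X_pca)

# labels_ is in the same row order as X, so zip pairs each term with its cluster.
term_to_cluster = {term: int(label)
                   for term, label in zip(search_terms, km.labels_)}
for term, cluster in term_to_cluster.items():
    print(term, '->', cluster)
```

With only three points and three clusters, each term ends up in its own cluster, and the cluster numbers themselves are arbitrary. On a larger dataset the same zip gives the grouping you are after.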
