I used the Gower distance function from this link: https://sourceforge.net/projects/gower-distance-4python/files/. In my data (df), each row is a trade and each column is a feature. Since it contains a lot of categorical data, I used Gower distance to measure "similarity" between rows. I hope this is correct (as below):

    D = gower_distances(df)
    distArray = ssd.squareform(D)
    hierarchal_cluster = scipy.cluster.hierarchy.linkage(distArray, method='ward', metric='euclidean', optimal_ordering=False)

I then plot hierarchal_cluster from above as a dendrogram:

    plt.title('Hierarchical Clustering Dendrogram (truncated)')
    plt.xlabel('sample index or (cluster size)')
    plt.ylabel('distance')
    dendrogram(
        hierarchal_cluster,
        truncate_mode='lastp',  # show only the last p merged clusters
        p=15,                   # number of merged clusters to show
        leaf_rotation=90.,
        leaf_font_size=12.,
        show_contracted=True    # to get a distribution impression in truncated branches
    )

I cannot show the plot, since I do not have enough privilege points, but in the dendrogram I can see separate colors. What is the main discriminator separating them? How can I find this out? How can I use PCA to extract useful features? Do I pass my 'hierarchal_cluster' into a PCA function? Something like the below?

    pca = PCA().fit(hierarchal_cluster.T)
    plt.plot(np.arange(1, len(pca.explained_variance_ratio_) + 1, 1), pca.explained_variance_ratio_.cumsum())

1 Answer

Keep in mind that PCA is meant for continuous data, and you mentioned that many of your features are categorical. From what you have written, it appears that you have mixed data.

A common practice when dealing with mixed data is to separate the continuous and categorical features. Then compute the Euclidean distance between data points on the continuous (numerical) features and the Hamming distance on the categorical features [1]. This lets you measure similarity for the continuous and the categorical features separately.

While you are at it, apply PCA to the continuous variables to extract the important features, and apply Multiple Correspondence Analysis (MCA) to the categorical features. You can then combine the relevant features obtained from both and apply any clustering algorithm. So essentially, I'm suggesting feature selection/feature extraction before clustering.

[1] Huang, Z., 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3), pp. 283-304.
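The workflow suggested above (PCA on the continuous columns, MCA on the categorical columns, then clustering on the combined features) could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy DataFrame and its column names (`price`, `quantity`, `side`, `venue`) are made up, the helper `mca_row_coordinates` is a hypothetical hand-rolled MCA (correspondence analysis of the one-hot indicator matrix) rather than a library function, and the choice of two components per block and three clusters is arbitrary.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def mca_row_coordinates(cat_df, n_components=2):
    """Hypothetical helper: MCA row coordinates via correspondence
    analysis of the one-hot indicator matrix."""
    Z = pd.get_dummies(cat_df.astype(str)).to_numpy(dtype=float)
    P = Z / Z.sum()              # correspondence matrix
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    # Standardized residuals from the independence model r * c^T
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates: D_r^{-1/2} U Sigma
    return (U[:, :n_components] * sigma[:n_components]) / np.sqrt(r)[:, None]


# Made-up stand-in for the trade data (assumption, not the asker's df)
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "price": rng.normal(100, 10, n),
    "quantity": rng.normal(500, 50, n),
    "side": rng.choice(["buy", "sell"], n),
    "venue": rng.choice(["A", "B", "C"], n),
})
num_cols = ["price", "quantity"]
cat_cols = ["side", "venue"]

# PCA on the standardized continuous features
X_num = StandardScaler().fit_transform(df[num_cols])
pca = PCA(n_components=2).fit(X_num)
num_scores = pca.transform(X_num)

# MCA on the categorical features
cat_scores = mca_row_coordinates(df[cat_cols], n_components=2)

# Combine both blocks and cluster; how to scale the two blocks
# relative to each other is a modelling choice left open here
features = np.hstack([num_scores, cat_scores])
Z_link = linkage(features, method="ward")
labels = fcluster(Z_link, t=3, criterion="maxclust")

# Two ways to look for what separates the clusters
loadings = pd.DataFrame(pca.components_.T, index=num_cols, columns=["PC1", "PC2"])
cluster_means = df[num_cols].groupby(labels).mean()
```

Note that PCA is fitted on the feature matrix here, not on the linkage output: the array returned by `linkage` only records which clusters were merged and at what distance, so it carries no feature information. Inspecting `loadings`, `cluster_means`, or a cross-tabulation of `labels` against each categorical column is one possible way to see which features differ most between clusters.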
