How to select most important features? Feature Engineering

Question

asked May 7, 2022 in Education by JackTerrance

I used the function for Gower distance from this link: https://sourceforge.net/projects/gower-distance-4python/files/. My data (df) is such that each row is a trade and each column is a feature. Since it contains a lot of categorical data, I converted the data using Gower distance to measure "similarity". I hope this is correct (as below):

    D = gower_distances(df)
    distArray = ssd.squareform(D)
    hierarchal_cluster = scipy.cluster.hierarchy.linkage(distArray, method='ward', metric='euclidean', optimal_ordering=False)

I then plot hierarchal_cluster from above as a dendrogram:

    plt.title('Hierarchical Clustering Dendrogram (truncated)')
    plt.xlabel('sample index or (cluster size)')
    plt.ylabel('distance')
    dendrogram(
        hierarchal_cluster,
        truncate_mode='lastp',  # show only the last p merged clusters
        p=15,  # show only the last p merged clusters
        leaf_rotation=90.,
        leaf_font_size=12.,
        show_contracted=True  # to get a distribution impression in truncated branches
    )

I cannot show it, since I do not have enough privilege points, but on the dendrogram I can see separate colors. What is the main discriminator separating them? How can I find this out?

How can I use PCA to extract useful features? Do I pass my 'hierarchal_cluster' into a PCA function? Something like the below?

    pca = PCA().fit(hierarchal_cluster.T)
    plt.plot(np.arange(1, len(pca.explained_variance_ratio_) + 1, 1), pca.explained_variance_ratio_.cumsum())

1 Answer

JackTerrance · Answer 1 · 2022-05-07T02:21:26+0000

Note that PCA is designed for continuous (numerical) data, and you mention that many of your features are categorical. From what you have written, it appears that you have mixed data.

A common practice when dealing with mixed data is to separate the continuous and categorical features. Then compute the Euclidean distance between data points on the continuous (numerical) features and the Hamming distance on the categorical features [1]. This lets you measure similarity on the continuous and categorical features separately.

While you are at it, apply PCA to the continuous variables to extract the important features, and apply Multiple Correspondence Analysis (MCA) to the categorical features. You can then combine the extracted features and apply any clustering algorithm. In essence, I am suggesting feature selection/feature extraction before clustering.

[1] Huang, Z., 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3), pp. 283-304.
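As a rough illustration of the split described above, here is a minimal sketch. The toy table and its column names (`price`, `quantity`, `side`, `venue`) are made up for the example and are not from the question. Standardising before PCA is my own assumption, not something stated above. The MCA step is not shown because scikit-learn has no MCA class; it would need a separate package. Note that PCA is fitted on the numerical feature columns, not on the linkage matrix returned by `linkage`.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy "trades" table; column names are invented for illustration.
df = pd.DataFrame({
    "price": [10.0, 12.5, 11.0, 30.0, 28.5, 31.0],
    "quantity": [100, 120, 90, 400, 380, 420],
    "side": ["buy", "buy", "sell", "sell", "sell", "buy"],
    "venue": ["A", "A", "B", "C", "C", "C"],
})

# 1. Separate continuous and categorical columns.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.columns.difference(num_cols)

# 2. Continuous part: standardise (an assumption), then fit PCA.
X_num = StandardScaler().fit_transform(df[num_cols])
pca = PCA(n_components=2).fit(X_num)

# Cumulative share of variance explained by the components.
explained = pca.explained_variance_ratio_.cumsum()

# Loadings: one row per component, one column per original feature.
# Large absolute values show which features drive a component.
loadings = pd.DataFrame(pca.components_, columns=num_cols)

# Euclidean distance between rows on the continuous features.
d_num = squareform(pdist(X_num, metric="euclidean"))

# 3. Categorical part: integer-code each column, then Hamming distance
#    (the fraction of categorical features on which two rows differ).
codes = df[cat_cols].apply(lambda s: s.astype("category").cat.codes).to_numpy()
d_cat = squareform(pdist(codes, metric="hamming"))

print(explained)
print(loadings)
```

Inspecting `loadings` is one way to see which numerical features contribute most to each principal component. How `d_num` and `d_cat` are weighted and combined before clustering is a modelling choice left open here.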