My data looks as follows:
ID  my_val  db_val
a   X       X
a   X       X
a   Y       X
b   X       Y
b   Y       Y
b   Y       Y
c   Z       X
c   X       X
c   Z       X
Expected result:
ID  my_val   db_val  match
a   X:2;Y:1  X       full_match
b   Y:2;X:1  Y       full_match
c   Z:2;X:1  X       partial_match
A full_match is when db_val matches the most abundant my_val; a partial_match is when db_val occurs among the other values but doesn't match the top one.
My current approach consists of grouping by ID, counting values into a separate column, concatenating each value with its count, then aggregating all values into one row per ID.
This is how I aggregate the columns:
from functools import reduce
import pandas as pd

def all_hits_aggregate_df(df, columns=['my_val']):
    grouped = df.groupby('ID')
    l = []
    for c in columns:
        # Count each value per ID, most frequent first
        res = grouped[c].value_counts(ascending=False).to_frame('count_' + c).reset_index(level=1)
        # Build "value:count" strings
        res[c] = res[c].astype(str) + ':' + res['count_' + c].astype(str)
        # Join all "value:count" pairs into one row per ID
        l.append(res.groupby('ID').agg(lambda x: ';'.join(x)))
    return reduce(lambda x, y: pd.merge(x, y, on='ID'), l)
For the comparison phase, I loop through each row, parse the my_val column back into lists, and then compare.
I'm sure the way I do the comparison step is extremely inefficient, but I don't see how to do it before aggregation so I can avoid parsing the generated string later in the process.
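One way to avoid re-parsing the strings is to compute the top value and the set of observed values on the counted (pre-concatenation) data, and classify each ID from those, so the "val:count" string is built purely for display. A sketch of that idea, assuming db_val is constant within each ID and using a hypothetical no_match label for values absent entirely:

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'my_val': ['X', 'X', 'Y', 'X', 'Y', 'Y', 'Z', 'X', 'Z'],
    'db_val': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'X', 'X', 'X'],
})

# Count my_val per ID, then order within each ID by descending count
counts = (df.groupby(['ID', 'my_val']).size()
            .reset_index(name='n')
            .sort_values(['ID', 'n'], ascending=[True, False]))

# Build the "value:count" display string and keep the raw pieces around
counts['pair'] = counts['my_val'] + ':' + counts['n'].astype(str)
agg = counts.groupby('ID').agg(
    my_val=('pair', ';'.join),              # e.g. "X:2;Y:1"
    top=('my_val', 'first'),                # most abundant value (sorted first)
    values=('my_val', lambda s: set(s)),    # all observed values, for membership
)

# One db_val per ID (assumption: it never varies within a group)
agg['db_val'] = df.groupby('ID')['db_val'].first()

def classify(row):
    if row['db_val'] == row['top']:
        return 'full_match'
    if row['db_val'] in row['values']:
        return 'partial_match'
    return 'no_match'   # hypothetical label, not in the original spec

agg['match'] = agg.apply(classify, axis=1)
result = agg[['my_val', 'db_val', 'match']].reset_index()
```

The classification never touches the concatenated string: it works off the intermediate counts, so the expensive parse-back step disappears.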