My data looks as follows:

    ID  my_val  db_val
    a   X       X
    a   X       X
    a   Y       X
    b   X       Y
    b   Y       Y
    b   Y       Y
    c   Z       X
    c   X       X
    c   Z       X

Expected result:

    ID  my_val   db  match
    a   X:2;Y:1  X   full_match
    b   Y:2;X:1  Y   full_match
    c   Z:2;X:1  X   partial_match

A full_match is when db_val matches the most abundant my_val. A partial_match is when db_val is among the other values but doesn't match the top one.

My current approach consists of grouping by ID, counting the values into a separate column, concatenating each value with its count, and then aggregating all values into one row per ID.

This is how I aggregate the columns:

    from functools import reduce
    import pandas as pd

    def all_hits_aggregate_df(df, columns=['my_val']):
        grouped = df.groupby('ID')
        l = []
        for c in columns:
            # Count each value per ID, most frequent first
            res = (grouped[c]
                   .value_counts(ascending=False, normalize=False)
                   .to_frame('count_' + c)
                   .reset_index(level=1))
            # Build "value:count" strings
            res[c] = res[c].astype(str) + ':' + res['count_' + c].astype(str)
            # Join only the formatted string column into one row per ID
            l.append(res.groupby('ID')[[c]].agg(';'.join))
        return reduce(lambda x, y: pd.merge(x, y, on='ID'), l)

For the comparison phase, I loop through each row, parse the my_val column into lists, and then do the comparison.

I am sure that the way I do the comparison step is extremely inefficient, but I am unsure how I would do it before aggregation to avoid having to parse the generated string later in the process.

1 Answer

We can group the DataFrame by ID, count the my_val values with value_counts, and convert the counts to JSON with to_json. With small formatting changes (removing the curly brackets and quotes, and replacing commas with semicolons) this gives the requested format. On the grouped data we also take the first value of db_val (presumably the only one per ID) and calculate the fraction of rows where my_val equals db_val: more than 50% gives full_match, above 0% and up to 50% gives partial_match, and 0% gives no_match. Note that this threshold reproduces the "most abundant" definition only when the top value accounts for more than half of an ID's rows, as it does in the sample data.

    import numpy as np
    import pandas as pd

    # Row-wise flag: does my_val equal db_val?
    df['match'] = df['my_val'] == df['db_val']

    z = (df
         .groupby('ID')
         .agg({'my_val': lambda x: x.value_counts().to_json(),
               'db_val': 'first',
               'match': 'mean'})
        ).reset_index()

    # Turn '{"X":2,"Y":1}' into 'X:2;Y:1' (regex=True is needed in pandas 2.0+)
    z['my_val'] = (z['my_val']
                   .str.replace('[{"}]', '', regex=True)
                   .str.replace(',', ';'))

    z['match'] = np.select(
        [z['match'] > 0.5, z['match'] > 0],
        ['full_match', 'partial_match'],
        'no_match')

    print(z)

Output:

      ID   my_val db_val          match
    0  a  X:2;Y:1      X     full_match
    1  b  Y:2;X:1      Y     full_match
    2  c  Z:2;X:1      X  partial_match
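If the top value may hold less than half of an ID's rows (for example X:2;Y:1;Z:1;W:1), a variant that compares db_val with the most frequent my_val directly may be closer to the definition in the question. The following is only a sketch under some assumptions: the helper name classify is made up for illustration, db_val is assumed to be constant within each ID, and a tie for the top count is treated as a full_match for any of the tied values.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID":     list("aaabbbccc"),
    "my_val": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z"],
    "db_val": ["X", "X", "X", "Y", "Y", "Y", "X", "X", "X"],
})


def classify(group: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: summarise one ID as counts, db_val and label."""
    counts = group["my_val"].value_counts()      # most frequent first
    db = group["db_val"].iloc[0]                 # assumed constant per ID
    top = counts.index[counts == counts.max()]   # all values tied for first
    if db in top:
        label = "full_match"
    elif db in counts.index:
        label = "partial_match"
    else:
        label = "no_match"
    # Build the "value:count" string directly, so nothing has to be parsed later.
    summary = ";".join(f"{val}:{n}" for val, n in counts.items())
    return pd.Series({"my_val": summary, "db_val": db, "match": label})


result = df.groupby("ID")[["my_val", "db_val"]].apply(classify).reset_index()
print(result)
```

On the sample data this labels a and b as full_match and c as partial_match, the same as the threshold version. Because it calls a Python function once per group, it is likely slower than the vectorised mean on large frames, so it trades speed for matching the stated definition.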
