Pandas dataframe CSV reduce disk size

Question

asked May 22, 2022 in Education by JackTerrance

For my university assignment, I have to produce a CSV file with the distances between all the airports of the world. The problem is that my CSV file weighs 151 MB, and I want to reduce it as much as I can.

This is my code:

    # drop all features we don't need
    for attribute in df:
        if attribute not in ('NAME', 'COUNTRY', 'IATA', 'LAT', 'LNG'):
            df = df.drop(attribute, axis=1)

    # create a dictionary of airports, each airport has the following structure:
    # IATA : (NAME, COUNTRY, LAT, LNG)
    airport_dict = {}
    for airport in df.itertuples():
        airport_dict[airport[3]] = (airport[1], airport[2], airport[4], airport[5])

    # From tutorial 4 solution:
    airportcodes = list(airport_dict)
    airportdists = pd.DataFrame()
    for i, airport_code1 in enumerate(airportcodes):
        airport1 = airport_dict[airport_code1]
        dists = []
        for j, airport_code2 in enumerate(airportcodes):
            if j > i:
                airport2 = airport_dict[airport_code2]
                dists.append(distanceBetweenAirports(airport1[2], airport1[3], airport2[2], airport2[3]))
            else:
                # little edit: no need to calculate the distance twice, all duplicates are set to 0 distance
                dists.append(0)
        airportdists[i] = dists
    airportdists.columns = airportcodes
    airportdists.index = airportcodes

    # set all 0 distance values to NaN
    airportdists = airportdists.replace(0, np.nan)
    airportdists.to_csv(r'../Project Data Files-20190322/distances.csv')

I also tried re-indexing it before saving:

    # remove all NaN values
    airportdists = airportdists.stack().reset_index()
    airportdists.columns = ['airport1', 'airport2', 'distance']

but the result is a dataframe with 3 columns and 17 million rows, and a disk size of 419 MB, which is hardly an improvement.

Can you help me shrink the size of my CSV? Thank you!
1 Answer

JackTerrance · Answer 1 · 2022-05-22T19:14:23+0000

I have built a similar application in the past; here is what I would do. It is difficult to shrink the file itself, but if your application only needs, for example, the distances from one airport to all the others, I suggest creating 9541 files, one per airport. Each file holds the distances from that airport to every other airport and is named after the airport. Loading a single file is then very fast.
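A minimal sketch of this one-file-per-airport layout, under several assumptions: the airports are given as a dict mapping IATA code to (LAT, LNG); `haversine_km` is a stand-in for the asker's `distanceBetweenAirports`, whose implementation is not shown; the function names, the column names and the rounding to one decimal place are hypothetical choices, not something from the question. It uses only the standard library `csv` module so it runs without extra dependencies; each resulting file can also be read with `pd.read_csv`.

```python
import csv
import math
from pathlib import Path


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km (assumed stand-in for distanceBetweenAirports)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def write_per_airport_files(airports, out_dir, decimals=1):
    """Write one CSV per airport, named <IATA>.csv, with distances to all others.

    airports: dict of IATA code -> (lat, lng).
    decimals: rounding applied to each distance (an assumption, to keep files small).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for code1, (lat1, lng1) in airports.items():
        with open(out_dir / f"{code1}.csv", "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["airport", "distance_km"])
            for code2, (lat2, lng2) in airports.items():
                if code2 != code1:  # skip the zero distance to itself
                    dist = haversine_km(lat1, lng1, lat2, lng2)
                    writer.writerow([code2, round(dist, decimals)])


def load_distances(code, out_dir):
    """Load only the file for one airport; returns dict of IATA code -> distance."""
    with open(Path(out_dir) / f"{code}.csv", newline="") as fh:
        return {row["airport"]: float(row["distance_km"]) for row in csv.DictReader(fh)}
```

Note that this layout stores every pair twice (once in each airport's file), so the total size on disk does not go down; what it buys is that a lookup for one airport only reads one small file instead of the whole matrix.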