How can I remove emojis that show up as '\x' escape sequences when reading a CSV file with pandas in Python? The CSV file has many emojis in the text and I want to remove them, but the usual emoji-matching regex has no effect on them. Here is an example of the text:
Thx WP for performing key democratic function. Trump wants to live in post truth world where words don't matter. D\xe2\x80\xa6 |\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3|\n ME LA PELAS \n DONALD TRUMP \n|\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf| \n (\\__/) ||\n (\xe2\x80\xa2\xe3\x85\x85\xe2\x80\xa2) ||\n / \xe3\x80\x80 \xe3\x81\xa5
Here is an example of the code that works on normal emojis but not these ones:
import re

text = u'This dog \xe2\x80\x9d \xe2\x80\x9c'
print(text)  # with emoji
emoji_pattern = re.compile(
    "["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+",
    flags=re.UNICODE,
)
print(emoji_pattern.sub(r'', text))  # no emoji
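A quick check, assuming Python 3, suggests why the pattern finds nothing in this literal: each '\xNN' escape becomes one code point in the U+0080 to U+00FF range (the UTF-8 bytes of curly quotes read one byte per character), and none of those fall inside the emoji ranges above. The names `high_code_points` and `repaired` are only illustrative:

```python
text = 'This dog \xe2\x80\x9d \xe2\x80\x9c'

# List the non-ASCII code points actually present in the string.
high_code_points = [hex(ord(c)) for c in text if ord(c) > 127]
print(high_code_points)  # ['0xe2', '0x80', '0x9d', '0xe2', '0x80', '0x9c']

# Reinterpreting those code points as bytes and decoding them as UTF-8
# recovers the characters they were presumably meant to be (curly quotes).
repaired = text.encode('latin-1').decode('utf-8')
print(repaired)
```

The round trip through 'latin-1' only works here because every character in the literal is below U+0100; it is a diagnostic, not a general fix.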
The following character-by-character filter, which keeps only ASCII characters, does work on a string literal:
import unicodedata
from unidecode import unidecode

def deEmojify(inputString):
    returnString = ""
    for character in inputString:
        try:
            character.encode("ascii")
            returnString += character
        except UnicodeEncodeError:
            returnString += ''
    return returnString

print(deEmojify("I'm loving all the trump hate on Twitter right now \xf0\x9f\x99\x8c"))
But when I read the text from a CSV file with pandas, it doesn't work and the emojis are not removed:
import pandas as pd

df = pd.read_csv("Trump834.csv", encoding="utf-8")

import unicodedata
from unidecode import unidecode

def deEmojify(inputString):
    returnString = ""
    for character in inputString:
        try:
            character.encode("ascii")
            returnString += character
        except UnicodeEncodeError:
            returnString += ''
    return returnString

for i in range(df.shape[0]):
    print(df.iloc[i]['Tweet'])
    print(deEmojify(df.iloc[i]['Tweet']))
    print("****************************************")
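One possible explanation, which I have not confirmed, is that the CSV stores the escapes as literal text (a backslash, an 'x', and two hex digits), for example if the tweets were saved as the printed form of bytes objects. All of those characters are plain ASCII, so deEmojify keeps every one of them. A minimal sketch under that assumption, where `strip_escaped_bytes` and `ESCAPED_BYTE` are hypothetical names and the raw strings stand in for values from the 'Tweet' column:

```python
import re

# Matches the literal four-character text: backslash, "x", two hex digits.
ESCAPED_BYTE = re.compile(r"\\x[0-9a-fA-F]{2}")

def strip_escaped_bytes(text):
    # First remove the literal escape sequences, then drop any real
    # non-ASCII characters that may also be present.
    without_escapes = ESCAPED_BYTE.sub("", text)
    return without_escapes.encode("ascii", "ignore").decode("ascii")

# Raw strings keep the backslashes literal, as they would be in such a file.
sample = r"I'm loving all the trump hate on Twitter right now \xf0\x9f\x99\x8c"
print(strip_escaped_bytes(sample))
```

If the assumption holds, the same function could be applied to the whole column with df['Tweet'].astype(str).map(strip_escaped_bytes). Other literal escapes in the data, such as '\n', would be left untouched by this pattern.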