What is the best way to overcome wrong entity recognition with spaCy?

Question


asked Apr 20, 2022 in Education by JackTerrance

I'm testing the following sentence to extract entity values:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; the question does not say which one

s = "Height: 3m, width: 4.0m, others: 3.4 m, 4m, 5 meters, 10 m. Quantity: 6."
sent = nlp(s)
for ent in sent.ents:
    print(ent.text, ent.label_)
```

And got some misleading values:

```
3 CARDINAL
4.0m CARDINAL
3.4 m CARDINAL
4m CARDINAL
5 meters QUANTITY
10 m QUANTITY
6 CARDINAL
```

Namely, 3m is recognized only as the number 3, without its unit m, and 4.0m, 3.4 m and 4m are labeled CARDINAL rather than QUANTITY. This happens in many examples, so I can't rely on this engine when I want to separate measurements in meters from plain quantities. Should I do this manually?
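To see why the unit is sometimes dropped, it can help to look at the tokens the model actually receives. The following is a minimal sketch, assuming spaCy v3 is installed; it uses `spacy.blank("en")`, which applies spaCy's default English tokenizer rules without downloading a model (whether an installed trained pipeline tokenizes identically is an assumption worth checking). The helper `tokens` is hypothetical. spaCy's English suffix rules split a unit such as "m" off a preceding number, so "3m" becomes the two tokens "3" and "m", just like "3 m", and the entity recognizer has to decide separately whether the "m" token belongs to the entity.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components needed.
nlp = spacy.blank("en")


def tokens(text):
    """Return the token texts produced by spaCy's English tokenizer."""
    return [t.text for t in nlp(text)]


if __name__ == "__main__":
    # Inspect how the sentence from the question is split into tokens.
    print(tokens("Height: 3m, width: 4.0m, others: 3.4 m, 4m, 5 meters, 10 m. Quantity: 6."))
```

Because attached and spaced units end up as the same token sequence, a token-level rule of the form "number followed by unit" can cover both spellings.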

1 Answer

JackTerrance · Answer 1 · 2022-04-20T23:26:49+0000

One potential difficulty in your example is that it's not very close to natural language. The pre-trained English models were trained on ~2 million words of general web and news text, so they won't always perform perfectly out of the box on text with a very different structure. While you could update the model with more examples of QUANTITY from your specific texts, I think a rule-based approach might actually be a better and more efficient solution here. The example in this blog post is very close to what you're trying to do:

```python
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("en_core_web_sm")
# A number followed by a weight unit
weights_pattern = [
    {"LIKE_NUM": True},
    {"LOWER": {"IN": ["g", "kg", "grams", "kilograms", "lb", "lbs", "pounds"]}}
]
patterns = [{"label": "QUANTITY", "pattern": weights_pattern}]
ruler = EntityRuler(nlp, patterns=patterns)
# Run the rules before the statistical entity recognizer
nlp.add_pipe(ruler, before="ner")
doc = nlp("U.S. average was 2 lbs.")
print([(ent.text, ent.label_) for ent in doc.ents])
# [('U.S.', 'GPE'), ('2 lbs', 'QUANTITY')]
```

The statistical named entity recognizer respects pre-defined entities and will "predict around" them. So if you add the EntityRuler before it in the pipeline, your custom QUANTITY entities will be assigned first and will be taken into account when the entity recognizer predicts labels for the remaining tokens.

Note that this example uses the spaCy v2.1.x API. In spaCy v3, components are added to the pipeline by name (for example `nlp.add_pipe("entity_ruler", before="ner")`), so the code needs adapting there. You might also want to add more patterns to cover different constructions. For more details and inspiration, check out the documentation on the EntityRuler, on combining models and rules, and on the token match pattern syntax.
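Adapted to the lengths in the question, a minimal sketch for the spaCy v3 API might look like this. It assumes spaCy v3.x; the unit list `LENGTH_UNITS` and the helper `build_nlp` are hypothetical and only illustrate the idea. It keeps the QUANTITY label from the weights example and uses the same "number followed by unit" token pattern, only with length units.

```python
import spacy

# Hypothetical list of length units; extend it for your own data.
LENGTH_UNITS = ["m", "cm", "mm", "km", "meter", "meters", "metre", "metres"]

# A number token followed by a unit token. The English tokenizer splits
# "3m" into "3" + "m", so this pattern matches both "3m" and "3 m".
LENGTH_PATTERN = [{"LIKE_NUM": True}, {"LOWER": {"IN": LENGTH_UNITS}}]
PATTERNS = [{"label": "QUANTITY", "pattern": LENGTH_PATTERN}]


def build_nlp(model=None):
    """Return a pipeline with an entity ruler for lengths (spaCy v3 API).

    With a trained model name such as "en_core_web_sm", the ruler is
    inserted before "ner" so the statistical recognizer works around its
    matches. Without a model, a blank English pipeline is used, so only
    the rules produce entities.
    """
    if model is None:
        nlp = spacy.blank("en")
        ruler = nlp.add_pipe("entity_ruler")
    else:
        nlp = spacy.load(model)
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    ruler.add_patterns(PATTERNS)
    return nlp


if __name__ == "__main__":
    nlp = build_nlp()  # or build_nlp("en_core_web_sm") if that model is installed
    doc = nlp("Height: 3m, width: 4.0m, others: 3.4 m, 4m, 5 meters, 10 m. Quantity: 6.")
    for ent in doc.ents:
        print(ent.text, ent.label_)
```

Testing against the blank pipeline first is a cheap way to check that the patterns match what you expect before combining them with a trained recognizer via `build_nlp("en_core_web_sm")`.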