in Education by
I'm testing such sentence to extract entity values: s = "Height: 3m, width: 4.0m, others: 3.4 m, 4m, 5 meters, 10 m. Quantity: 6." sent = nlp(s) for ent in sent.ents: print(ent.text, ent.label_) And got some misleading values: 3 CARDINAL 4.0m CARDINAL 3.4 m CARDINAL 4m CARDINAL 5 meters QUANTITY 10 m QUANTITY 6 CARDINAL namely, number 3m is not paired with m. This is the case for many examples as I can't rely on this engine when want to separate meters from quantities. Should I do this manually? JavaScript questions and answers, JavaScript questions pdf, JavaScript question bank, JavaScript questions and answers pdf, mcq on JavaScript pdf, JavaScript questions and solutions, JavaScript mcq Test , Interview JavaScript questions, JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)

1 Answer

0 votes
by
One potential difficulty in your example is that it's not very close to natural language. The pre-trained English models were trained on ~2m words of general web and news text, so they're not always going to perform perfect out-of-the-box on text with a very different structure. While you could update the model with more examples of QUANTITY in your specific texts, I think that a rule-based approach might actually be a better and more efficient solution here. The example in this blog post is actually very close to what you're trying to do: import spacy from spacy.pipeline import EntityRuler nlp = spacy.load("en_core_web_sm") weights_pattern = [ {"LIKE_NUM": True}, {"LOWER": {"IN": ["g", "kg", "grams", "kilograms", "lb", "lbs", "pounds"]}} ] patterns = [{"label": "QUANTITY", "pattern": weights_pattern}] ruler = EntityRuler(nlp, patterns=patterns) nlp.add_pipe(ruler, before="ner") doc = nlp("U.S. average was 2 lbs.") print([(ent.text, ent.label_) for ent in doc.ents]) # [('U.S.', 'GPE'), ('2 lbs', 'QUANTITY')] The statistical named entity recognizer respects pre-defined entities and wil "predict around" them. So if you're adding the EntityRuler before it in the pipeline, your custom QUANTITY entities will be assigned first and will be taken into account when the entity recognizer predicts labels for the remaining tokens. Note that this example is using the latest version of spaCy, v2.1.x. You might also want to add more patterns to cover different constructions. For more details and inspiration, check out the documentation on the EntityRuler, combining models and rules and the token match pattern syntax.

Related questions

0 votes
    Hi Everyone I'm Having This Two Columns: Mi_Meteo['Measurement'] = Mi_Meteo['Measurement'].str.rstrip(' ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 20, 2022 in Education by JackTerrance
0 votes
    Hi Everyone I'm Having This Two Columns: Mi_Meteo['Measurement'] = Mi_Meteo['Measurement'].str.rstrip(' ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 20, 2022 in Education by JackTerrance
0 votes
    I'm trying to do the Udacity mini project and I've got the latest version of the SKLearn library ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 24, 2022 in Education by JackTerrance
0 votes
    I'm creating my first program on python. The objective is to get an output of trip cost. In the ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 21, 2022 in Education by JackTerrance
0 votes
    I am working on a homework assignment in which we are creating a class to be used in a program to ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 16, 2022 in Education by JackTerrance
0 votes
    I am trying to fit some (numpy) data into python skLearn modules, but keep getting error messages. When ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Jun 3, 2022 in Education by JackTerrance
0 votes
    I have multiple xlsx files with data in it that i want to import to separate dataframes in Python. ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Jun 3, 2022 in Education by JackTerrance
0 votes
    This question already has answers here: Convert pandas DateTimeIndex to Unix Time? (7 answers) Closed 3 ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked May 26, 2022 in Education by JackTerrance
0 votes
    Let's say I have the following code: from types import coroutine @coroutine def stop(): yield 1 async ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked May 17, 2022 in Education by JackTerrance
0 votes
    I have multiple xlsx files with data in it that i want to import to separate dataframes in Python. ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked May 7, 2022 in Education by JackTerrance
0 votes
    I used the function for gower distance from this link: https://sourceforge.net/projects/gower-distance- ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked May 7, 2022 in Education by JackTerrance
0 votes
    I have two Dataframes : DF1(That i've just resampled): Mi_pollution.head(): Sensor_ID Time_Instant ... Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked May 3, 2022 in Education by JackTerrance
0 votes
    I want to have root mean squared of gradient boosting algorithm but when I want to print it, I ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 26, 2022 in Education by JackTerrance
0 votes
    I want to have root mean squared of gradient boosting algorithm but when I want to print it, I ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 26, 2022 in Education by JackTerrance
0 votes
    I want to get the values of col1 in 3 different columns with separate headers. Date/Time col1 0 ... JavaScript Questions for Interview, JavaScript MCQ (Multiple Choice Questions)...
asked Apr 21, 2022 in Education by JackTerrance
...