SAMPLE DATA: https://docs.google.com/spreadsheets/d/1s6MzBu5lFcc-uUZ9B6CI1YR7P1fDSm4cByFwKt3ckgc/edit?usp=sharing
I have a function that uses textacy to extract source attribution: it returns the speaker, cue and content of each quotation. Some paragraphs in my dataset contain several quotations, but I only need the first one, which is why I put the break in the for loop.
My problem is that some of the original data contain no quotation. I was hoping the function would not only skip those rows but also return something for them. I believe the problem is in the code after the except.
It returns something like this:
But it's supposed to skip the first line, because the first line raises an error, so I'm hoping for it to look like this:
import pandas as pd
import spacy
import textacy
from textacy import extract as ex

# nlp is assumed to be a spaCy pipeline loaded elsewhere, e.g. with spacy.load(...)

def extract_direct(text):
    extracted = pd.DataFrame()
    for i in text:
        try:
            doc = nlp(i)
            a = ex.direct_quotations(doc)
            for item in a:
                mined = {'speaker': item.speaker, 'cue': item.cue, 'content': item.content}
                extracted = extracted.append(mined, ignore_index=True)
                break  # keep only the first quotation of each paragraph
        except ValueError:
            continue  # the row is dropped here, so nothing is returned for it
    contents = news_only['index']
    extracted = pd.concat([extracted, contents], ignore_index=True)
    return extracted

extract_direct(dataframe['Body'])
Answer
I solved the problem this way: I had to append a row in both branches, the try and the except.
# Note: DataFrame.append was removed in pandas 2.0, so this needs an older pandas.
def extract_direct(text):
    extracted = pd.DataFrame()
    for i in text:
        try:
            doc = nlp(i)
            a = ex.direct_quotations(doc)
            for item in a:
                mined = {'speaker': item.speaker, 'cue': item.cue, 'content': item.content}
                extracted = extracted.append(mined, ignore_index=True)
                break  # keep only the first quotation
        except ValueError:
            # No usable quotation: append a placeholder row so the output stays aligned.
            mined = {'speaker': 'None', 'cue': 'None', 'content': 'None'}
            extracted = extracted.append(mined, ignore_index=True)
    return extracted
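One gap may remain in the approach above: if the extraction call yields no quotations and does not raise ValueError, the inner loop body never runs and no row is appended, so the output can still end up shorter than the input. Below is a minimal sketch of one way around that, not the answer's original method. It collects plain dicts in a list (which also avoids DataFrame.append) and uses next() to take the first quotation. The names first_quote_rows, EMPTY_ROW, Quote and fake_get_quotes are hypothetical, and fake_get_quotes is only a stand-in for textacy so that the sketch runs without a spaCy model.

```python
from collections import namedtuple

# Placeholder row used when a text yields no usable quotation.
EMPTY_ROW = {"speaker": None, "cue": None, "content": None}


def first_quote_rows(texts, get_quotes):
    """Return one dict per input text: its first quotation, or a placeholder."""
    rows = []
    for text in texts:
        try:
            # next(..., None) takes the first item without a for/break loop
            # and returns None when the iterable is empty.
            first = next(iter(get_quotes(text)), None)
        except ValueError:
            first = None
        if first is None:
            rows.append(dict(EMPTY_ROW))
        else:
            rows.append(
                {"speaker": first.speaker, "cue": first.cue, "content": first.content}
            )
    return rows


# Hypothetical stand-in for the real extractor (assumption, not textacy itself).
Quote = namedtuple("Quote", ["speaker", "cue", "content"])


def fake_get_quotes(text):
    # Mimic an extractor that rejects unbalanced quotation marks.
    if text.count('"') % 2:
        raise ValueError("unclosed quotation")
    if '"' in text:
        yield Quote("She", "said", text.split('"')[1])


sample_rows = first_quote_rows(
    ['She said "hi".', "No quote here.", 'Broken " quote.'],
    fake_get_quotes,
)
```

With the real libraries, and assuming ex and nlp are set up as in the question, the call would look roughly like first_quote_rows(dataframe['Body'], lambda t: ex.direct_quotations(nlp(t))), and pd.DataFrame(rows, index=dataframe.index) would then turn the list into a frame aligned with the input rows.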