Extracting human names from text data using Python Stanza

I have a dataset containing the text of book title pages (i.e. all the words on each title page; each line of my txt file is a different book). From this I am trying to retrieve each author’s name, taken to be the human name that appears on the title page, and store each name on a separate line in a CSV file. When I run the following code I get a “no author” value for every entry, which is not plausible given the input data. Can someone help me figure out what is going wrong? Thanks, I have been racking my brain over this for the past few days with no results.

Answer

In case anyone has a similar issue: this seems to work, but the results are not entirely satisfactory (several names are missed). I don’t know whether that is because of the code I wrote or because stanza simply misses names once in a while, but I suspect the latter.
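
For readers who want something concrete to start from, here is a minimal sketch of this kind of extraction. It is not the exact code from this answer. It assumes one title page per line in a file called `titlepages.txt` (a hypothetical name), takes the first person-type entity on each line as the author, and falls back to “no author” otherwise. The helper names (`pick_author`, `extract_authors`, `write_authors`, `main`) and the output file name are made up for illustration. The entity labels are an assumption too: the default English NER model tags people as `PERSON`, while some other models use `PER`, and checking the wrong label is one way to end up with “no author” on every row.

```python
import csv

# Assumption: person entities are tagged PERSON (default English model)
# or PER (some other models). Adjust if your model uses different labels.
PERSON_LABELS = {"PERSON", "PER"}


def pick_author(entities, default="no author"):
    """Return the text of the first person-type entity, or `default`.

    `entities` is an iterable of (text, type) pairs, e.g. built from
    ((ent.text, ent.type) for ent in doc.ents) on a Stanza document.
    """
    for text, ent_type in entities:
        if ent_type in PERSON_LABELS:
            return text
    return default


def extract_authors(lines, nlp):
    """Run `nlp` on each non-empty line and pick one author per line."""
    authors = []
    for line in lines:
        line = line.strip()
        if not line:
            # Keep one output row per input row, even for blank lines.
            authors.append("no author")
            continue
        doc = nlp(line)
        authors.append(pick_author((ent.text, ent.type) for ent in doc.ents))
    return authors


def write_authors(authors, path):
    """Write one author per row to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for author in authors:
            writer.writerow([author])


def main():
    # Imported here so the helpers above can be used without Stanza installed.
    import stanza

    stanza.download("en")
    nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")
    with open("titlepages.txt", encoding="utf-8") as f:  # hypothetical input file
        authors = extract_authors(f, nlp)
    write_authors(authors, "authors.csv")  # hypothetical output file
```

Calling `main()` runs the whole thing. Printing `[(ent.text, ent.type) for ent in doc.ents]` for a few lines is a quick way to see which labels the model actually produces before filtering on them.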

One possibility is that stanza misses foreign names, but as far as I know it is not possible to create a single pipeline with multiple languages, e.g. nlp = stanza.Pipeline('en', 'de', 'fr', …).
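
One possible workaround, sketched here under assumptions rather than tested on the real data, is to build a separate pipeline per language and try them in turn on each title page, keeping the first person name any of them finds. The function names (`person_names`, `first_person_across`, `build_pipelines`) and the language list are made up for illustration. The `PERSON`/`PER` label set is also an assumption about which tags the individual models emit.

```python
# Assumption: person entities are tagged PERSON or PER depending on the model.
PERSON_LABELS = {"PERSON", "PER"}


def person_names(doc):
    """Return the texts of all person-type entities in a Stanza document."""
    return [ent.text for ent in doc.ents if ent.type in PERSON_LABELS]


def first_person_across(text, pipelines, default="no author"):
    """Try each pipeline in order and return the first person name found.

    `pipelines` maps a language code to a callable pipeline; dict order
    decides which language is tried first.
    """
    for nlp in pipelines.values():
        names = person_names(nlp(text))
        if names:
            return names[0]
    return default


def build_pipelines(langs=("en", "de", "fr")):
    """Build one Stanza pipeline per language code (downloads models)."""
    # Imported here so the helpers above can be used without Stanza installed.
    import stanza

    pipelines = {}
    for lang in langs:
        stanza.download(lang)
        # Default processors are loaded; restricting them would be faster.
        pipelines[lang] = stanza.Pipeline(lang=lang)
    return pipelines
```

A call would look like `first_person_across(line, build_pipelines())`, with the pipelines built once and reused for every line. Running several models per line is slow, and a model for the wrong language can produce false positives, so this is a trade-off and not a guaranteed improvement.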

User contributions licensed under: CC BY-SA