Extract number of pages from a text column

Question

I have a text column which contains comments like:

6 pages, LaTeX, no figures
19 pages, latex, 4 figures as uuencoded postscript files
Invited Talk at the “VII Marcel Grossman Meeting on General Relativity” – Stanford, July 1994. 14 pages, latex, five figures, which will be available upon re…

Accepted Answer

This works as long as each comment contains only one page count.

    import re

    comments = [
        "6 pages, LaTeX, no figures",
        "112 cucumber",
        "19 pages, latex, 4 figures as uuencoded postscript files",
        "Invited Talk at the ``VII Marcel Grossman Meeting on General Relativity'' - Stanford, July 1994. 14 pages, latex, five figures, which will be available upon request.",
        "15 pp. Phyzzx",
    ]

    def page_num_extract(text: list) -> list:
        out = []
        for line in text:
            # Find the "<digits> pages" or "<digits> pp." fragment, if any.
            pages = re.findall(r"\d* pages|\d* pp\.", line)
            # Keep only the leading digits; str(*pages) is "" when nothing matched.
            pages = re.findall(r"\d*", str(*pages))[0]
            if not pages:
                pages = "NA"
            out.append(pages)
        return out

    page_num_extract(comments)

Output:

    ['6', 'NA', '19', '14', '15']
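If a comment might mention more than one page count, the one-count-per-comment limitation noted above could be relaxed with a capture group. The following is only a sketch under assumptions, not part of the original answer: the names `PAGE_PATTERN` and `extract_all_page_counts` are hypothetical, the last sample comment is made up for illustration, and the pattern assumes page counts are always written as digits followed by "page", "pages", or "pp.".

```python
import re

# Hypothetical pattern: one or more digits, optional whitespace,
# then "page"/"pages" (as a whole word) or "pp.".
PAGE_PATTERN = re.compile(r"(\d+)\s*(?:pages?\b|pp\.)", re.IGNORECASE)

def extract_all_page_counts(comments):
    """Return, for each comment, a list of every page count found (as ints)."""
    return [[int(n) for n in PAGE_PATTERN.findall(c)] for c in comments]

samples = [
    "6 pages, LaTeX, no figures",
    "112 cucumber",
    "15 pp. Phyzzx",
    "10 pages plus appendix of 3 pages",  # invented example with two counts
]
result = extract_all_page_counts(samples)
print(result)
```

A comment with no match yields an empty list rather than "NA", so the caller can decide how to treat missing or multiple values (for example, taking the first entry or the sum).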

Answer