
Renaming scientific paper PDFs from one name pattern to another name pattern

I am trying to automate the renaming of PDFs of scientific papers from one name pattern to another using Python.

The name pattern the PDFs occur in looks like this:

Cresswell, K., Worth, A., & Sheikh, A. (2011). Implementing and adopting electronic health record systems. Clinical governance- an international journal.

i.e. “LastName1, FirstLetterGivenName1., LastName2, FirstLetterGivenName2., […]. (Year). Title. Journal.”

After renaming, this example should look like this:

Cresswell_K_2011_Implementing and adopting

i.e. “LastName1_FirstLetterGivenName1_Year_First3WordsTitle”

Sadly I was unable to apply the solutions to similar problems to this specific one, as I am just starting to code.


Answer

You can use a regular expression, for example:

import re

s = "Cresswell, K., Worth, A., & Sheikh, A. (2011). Implementing and adopting electronic health record systems. Clinical governance- an international journal."

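# Named groups capture the first author's last name, given-name initial,
# the year in parentheses, and the first three words of the title.
p = re.compile(r'(?P<LastName1>[A-Za-z]+),\s+(?P<GivenName1>[A-Za-z]+)\.?,.+\((?P<Year>\d+)\)\.\s+(?P<Title1>\w+)\s(?P<Title2>\w+)\s(?P<Title3>\w+)')
m = p.search(s)
if m is not None:
    d = m.groupdict()
    # Join the pieces as LastName_Initial_Year_First three title words
    result = d['LastName1'] + '_' + d['GivenName1'][0] + '_' + d['Year'] + '_' + d['Title1'] + ' ' + d['Title2'] + ' ' + d['Title3']
    print(result)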

This gives the output:

Cresswell_K_2011_Implementing and adopting
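
The code above only transforms a single string. To rename actual files, something like the following might work. It is a minimal sketch, assuming all PDFs sit in one folder and their file names (without the .pdf extension) follow the pattern from the question. The names new_stem and rename_pdfs and the dry_run parameter are made up for this illustration, and the pattern is a slightly relaxed variant of the one above (it also accepts a single author and titles shorter than three words), not a drop-in copy.

```python
import re
from pathlib import Path

# Assumed input shape: "Last, I., ... (Year). Title words ... Journal."
PATTERN = re.compile(
    r"(?P<last>[A-Za-z'-]+),\s+"      # first author's last name
    r"(?P<initial>[A-Za-z])"          # first letter of the given name
    r".*?\((?P<year>\d{4})\)\.\s+"    # skip any co-authors up to "(Year). "
    r"(?P<title>\w+(?:\s+\w+){0,2})"  # up to three words of the title
)


def new_stem(old_stem):
    """Return the new file name (without extension), or None if no match."""
    m = PATTERN.search(old_stem)
    if m is None:
        return None
    return "{last}_{initial}_{year}_{title}".format(**m.groupdict())


def rename_pdfs(folder, dry_run=True):
    """Rename matching PDFs in folder; return a list of (old, new) names.

    With dry_run=True nothing is renamed, so the plan can be checked first.
    """
    planned = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        stem = new_stem(pdf.stem)
        if stem is None:
            continue  # file name does not follow the assumed pattern
        target = pdf.with_name(stem + ".pdf")
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            pdf.rename(target)
        planned.append((pdf.name, target.name))
    return planned
```

Calling rename_pdfs("papers") (the folder name is a placeholder) returns the planned renames without touching anything. Once the list looks right, rename_pdfs("papers", dry_run=False) performs them. Last names containing spaces or non-ASCII letters, and title words containing hyphens, are not handled by this pattern and would need a broader expression.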

User contributions licensed under: CC BY-SA