I am trying to automate renaming PDFs of scientific papers from one naming pattern to another using Python.
The PDFs are currently named like this:
Cresswell, K., Worth, A., & Sheikh, A. (2011). Implementing and adopting electronic health record systems. Clinical governance- an international journal.
i.e. “LastName1, FirstLetterGivenName1., LastName2, FirstLetterGivenName2., […]. (Year). Title. Journal.”
This example should be renamed to:
Cresswell_K_2011_Implementing and adopting
i.e. “LastName1_FirstLetterGivenName1_Year_First3WordsTitle”
Sadly I was unable to apply the solutions to similar problems to this specific one, as I am just starting to code.
Answer
You can use a regular expression, for example:
import re

s = "Cresswell, K., Worth, A., & Sheikh, A. (2011). Implementing and adopting electronic health record systems. Clinical governance- an international journal."

# Capture the first author's last name and initial, the year in parentheses,
# and the first three words of the title.
p = re.compile(
    r'(?P<LastName1>[A-Za-z]+),\s+(?P<GivenName1>[A-Za-z]+)\.?,.+'
    r'\((?P<Year>\d+)\)\.\s+(?P<Title1>\w+)\s(?P<Title2>\w+)\s(?P<Title3>\w+)'
)
m = p.search(s)
if m is not None:
    d = m.groupdict()
    result = (d['LastName1'] + '_' + d['GivenName1'][0] + '_' + d['Year'] + '_'
              + d['Title1'] + ' ' + d['Title2'] + ' ' + d['Title3'])
    print(result)
This gives the output:
Cresswell_K_2011_Implementing and adopting
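To apply the same idea to actual files, something like the following sketch might work. It is not part of the answer above: the names `PATTERN`, `build_new_stem` and `rename_pdfs` are made up for illustration, and the pattern is a slightly looser variant that assumes a four-digit year and also accepts a single author or a title shorter than three words. It defaults to a dry run that only prints the planned renames.

```python
import re
from pathlib import Path

# Hypothetical, looser variant of the pattern: assumes a four-digit year in
# parentheses and takes the title as everything up to the next period.
PATTERN = re.compile(
    r"^(?P<last>[^,]+),\s*(?P<initial>\w)"   # first author, e.g. "Cresswell, K"
    r".*?\((?P<year>\d{4})\)\.\s*"           # skip co-authors up to "(2011). "
    r"(?P<title>[^.]+)"                      # title, up to the next period
)


def build_new_stem(old_stem, n_words=3):
    """Return the new file name without extension, or None if nothing matches."""
    m = PATTERN.search(old_stem)
    if m is None:
        return None
    title_start = " ".join(m["title"].split()[:n_words])
    return f"{m['last']}_{m['initial']}_{m['year']}_{title_start}"


def rename_pdfs(folder, dry_run=True):
    """Rename matching PDFs in `folder`; with dry_run=True only print the plan."""
    for path in sorted(Path(folder).glob("*.pdf")):
        new_stem = build_new_stem(path.stem)
        if new_stem is None:
            print(f"skipped (no match): {path.name}")
            continue
        target = path.with_name(new_stem + ".pdf")
        if target.exists():
            # Never overwrite an existing file.
            print(f"skipped (target exists): {target.name}")
            continue
        print(f"{path.name} -> {target.name}")
        if not dry_run:
            path.rename(target)
```

Calling `rename_pdfs("papers")` on a folder of your choice (the folder name here is just an example) lists what would be renamed; pass `dry_run=False` once the printed plan looks right.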