I currently have a set of captions stored in a Python list:
print(valid_captions)
-> [' Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino ', ' Chuck Grodin ', ' Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth ', ' Kelly Murro and Tom Murro ', ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton ']
I want to create a function that iterates over each element of the list and builds an adjacency list for each person, so that I can get the unique names of everyone who appears alongside them in the data set. I want to represent this adjacency list as a Python dictionary with each name as the key and the list of names they appear with as the value.
So the function would take a single caption and return a dictionary of the form {name: [other names in caption]} for each name, while removing any titles like Dr or Mayor.
As an example I would like this
[Dr .Ron Iervolino, Trish Iervolino, and Mayor.Russ Middleton]
to return
{'Ron Iervolino': ['Trish Iervolino', 'Russ Middleton'], 'Trish Iervolino': ['Ron Iervolino', 'Russ Middleton'], 'Russ Middleton': ['Ron Iervolino', 'Trish Iervolino']}
If someone appears in a caption by themselves, return {name: []}. So the caption 'Robb Stark' would return {'Robb Stark': []}.
I have a function to remove the titles, but I’m getting the adjacency list all wrong.
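import re

def remove_title(names):
    removed_list = []
    for name in names:
        # Splitting on a title leaves an empty string where the title was.
        altered_name = re.split('Dr |Mayor ', name)
        removed_list += altered_name
    # Remove the empty strings; list.remove raises ValueError once none are left.
    try:
        while True:
            removed_list.remove('')
    except ValueError:
        pass
    return removed_list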
Answer
Here is my solution: a function that takes a single caption and returns a dictionary of the form {name: [other names in caption]} for each name.
The function first cleans the caption with a regular-expression split that removes titles like 'Mayor' and 'Dr' and also strips out 'and' and the separating commas. It then uses strip() to remove any leading or trailing spaces from each name. A try/except block handles the exception raised once there are no more empty strings to remove from the name list, and plain for loops handle the rest of the process.
import re

def format_caption(caption):
    # Split on titles (with an optional dot), the word "and", and commas.
    name_list = re.split(r'\bDr\b\s*\.?|\bMayor\b\s*\.?|\band\b|,', caption)
    name_list = [name.strip() for name in name_list]
    name_dict = {}
    # Drop the empty strings left behind by the split.
    try:
        while True:
            name_list.remove('')
    except ValueError:
        pass
    for name in name_list:
        name_dict.update({name: []})
    # Map each name to every other name in the caption.
    for key, name_list_2 in name_dict.items():
        for name in name_list:
            if name != key:
                name_list_2.append(name)
    return name_dict

The resulting function gives me the captions in the format I was looking for. Note that it takes a single caption string, not a list:

caption = 'Dr .Ron Iervolino, Trish Iervolino, and Mayor.Russ Middleton'
print(format_caption(caption))
>{'Ron Iervolino': ['Trish Iervolino', 'Russ Middleton'], 'Trish Iervolino': ['Ron Iervolino', 'Russ Middleton'], 'Russ Middleton': ['Ron Iervolino', 'Trish Iervolino']}
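The question also asks for one adjacency list covering the whole data set, with unique names only. One way to get there is to merge the per-caption results. The following is a minimal sketch rather than part of the solution above: `split_names` and `build_adjacency` are hypothetical names, the pattern assumes the only titles are Dr and Mayor (optionally followed by a dot), and sets are used so that a name appearing with someone in several captions is recorded once.

```python
import re
from collections import defaultdict

# Assumption: only the titles "Dr" and "Mayor" occur, optionally followed by a dot.
SEPARATORS = re.compile(r'\b(?:Dr|Mayor)\b\s*\.?|\band\b|,')

def split_names(caption):
    """Return the cleaned names found in one caption (hypothetical helper)."""
    return [part.strip() for part in SEPARATORS.split(caption) if part.strip()]

def build_adjacency(captions):
    """Merge all captions into one {name: sorted list of co-appearing names}."""
    neighbours = defaultdict(set)
    for caption in captions:
        names = split_names(caption)
        for name in names:
            # Touching the key also registers people who only ever appear alone.
            neighbours[name].update(n for n in names if n != name)
    return {name: sorted(others) for name, others in neighbours.items()}

valid_captions = [
    ' Kelly Murro and Tom Murro ',
    ' Chuck Grodin ',
    ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton ',
    ' Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino ',
]
adjacency = build_adjacency(valid_captions)
print(adjacency['Chuck Grodin'])  # []
print(adjacency['Trish Iervolino'])
```

Sorting the sets at the end is only there to make the output stable; if the order of names does not matter, the sets could be returned as they are.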