
Do you know how to split a list of strings into different variables?

I am a beginner in Python and have a question that is perhaps simple. I have a file, “file.txt”, which can in principle contain any number n of strings.

> file.txt

John
Rafa
Marta
... 
n

This is loaded into the program with:

with open('/media/names.txt') as f:
    lines = f.read().splitlines()

Now, I load a dataframe from a csv, which has a column (with name “Identifier”) that contains a lot of names.

Registration = pd.read_csv('/media/Registration.csv', 
    sep='\t', header=0)

The goal is to filter the dataframe separately for each of the n strings. For example, here I have done it for the first entry in the list:

names_1 = Registration[Registration['Identifier'].str.contains(lines[1])]
print(names_1)

This keeps only the rows that have “John” as an identifier. However, I am trying to create as many dataframes as there are items in the “file.txt” list:

names_1 = Registration[Registration['Identifier'].str.contains(lines[1])]

names_2 = Registration[Registration['Identifier'].str.contains(lines[2])]

names_3 = Registration[Registration['Identifier'].str.contains(lines[3])]

names_n = Registration[Registration['Identifier'].str.contains(lines[n])]

But I’m a bit stuck and I don’t know how to write this loop. Can someone help me? Thanks!


Answer

Strictly speaking, the answer to your question is that, at module level, variables are stored in a dictionary accessible through the function locals() (there it is the same dictionary as globals()). As a result, it is possible to generate variables in a loop exactly as asked, as long as the loop is not inside a function.

for i, line in enumerate(lines):
    locals()[f'names_{i}'] = Registration[Registration['Identifier'].str.contains(line)]

However, just because you can do it doesn’t mean you should. It is generally not a good idea to generate variables in this manner.
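
One concrete reason for caution is that the locals() trick stops working as soon as the code is moved into a function. The following is a minimal sketch of that behaviour; the function name try_to_create_variable and the value "John" are made up for illustration.

```python
def try_to_create_variable():
    # Inside a function, writing into the dictionary returned by locals()
    # does not create a real local variable.
    locals()["names_0"] = "John"
    try:
        # names_0 is never assigned in this function, so Python looks it up
        # as a global name, which does not exist either.
        return names_0
    except NameError:
        return "names_0 was never created"


result = try_to_create_variable()
print(result)
```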

Just ask yourself: how would you access the nth variable? You are going down a path that will make your data difficult to work with. A better approach is to use a data structure such as a dictionary or a list to keep track of the dataframes.

names = []
for line in lines:
    names.append(Registration[Registration['Identifier'].str.contains(line)])

Note also that the first index is 0, not 1, so with the file shown in the question lines[0] is “John” and lines[1] is “Rafa”.
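
The dictionary alternative mentioned above could look roughly like the sketch below, which keys each filtered dataframe by the name it was filtered on. The sample lines list and the small Registration dataframe are hypothetical stand-ins for the real file and CSV. Passing regex=False is an assumption that the names should be matched as literal substrings, since str.contains treats the pattern as a regular expression by default.

```python
import pandas as pd

# Hypothetical stand-ins for the contents of the names file and the CSV.
lines = ["John", "Rafa", "Marta"]
Registration = pd.DataFrame({
    "Identifier": ["John_01", "Rafa_01", "John_02", "Marta_01"],
})

# One filtered dataframe per name, keyed by the name itself.
# regex=False makes str.contains do a literal substring match.
names = {
    line: Registration[Registration["Identifier"].str.contains(line, regex=False)]
    for line in lines
}

# Look up a result by name instead of by a numbered variable.
print(names["John"])
```

With this layout, names["John"] replaces names_1, and looping over names.items() visits every name together with its dataframe.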
