I am a beginner in Python, and I have a question that is perhaps simple. I have a file, “file.txt”, that can contain an arbitrary number n of strings, one per line:
```
John
Rafa
Marta
...
```
This is loaded into the program with:
```python
with open('/media/names.txt') as f:
    lines = f.read().splitlines()
```
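One pitfall worth checking at this step: if the file contains blank lines, `splitlines()` keeps them as empty strings, and `str.contains('')` matches every row. The sketch below writes a made-up sample file to a temporary directory and compares the plain `splitlines()` result with a hypothetical helper, `read_names`, that keeps only non-empty, stripped lines. The file name and contents are illustrative only.

```python
import os
import tempfile


def read_names(path):
    """Hypothetical helper: return the non-empty lines of a file, stripped of surrounding whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Made-up sample file with a blank line and trailing spaces.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.txt")
    with open(path, "w") as f:
        f.write("John\n\nRafa  \nMarta\n")

    with open(path) as f:
        raw = f.read().splitlines()  # the approach from the question
    lines = read_names(path)         # blank lines dropped, whitespace stripped

print(raw)    # ['John', '', 'Rafa  ', 'Marta']
print(lines)  # ['John', 'Rafa', 'Marta']
```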
Next, I load a dataframe from a CSV file that has a column named “Identifier” containing many names:
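```python
import pandas as pd

# sep='\t' because the file is tab-separated; header=0 takes column names from the first row
Registration = pd.read_csv('/media/Registration.csv', sep='\t', header=0)
```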
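To see how a tab separator (written `sep='\t'`) parses such a file, here is a minimal sketch that reads a small, made-up table from an in-memory string with `io.StringIO` instead of the real `/media/Registration.csv`. The rows and the extra `Course` column are hypothetical. It then filters for a single name in the same way as the question.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for Registration.csv.
data = "Identifier\tCourse\nJohn Smith\tMath\nRafa Lopez\tArt\nJohn Doe\tArt\n"

# sep="\t" splits columns on tab characters; header=0 uses the first row as column names.
Registration = pd.read_csv(io.StringIO(data), sep="\t", header=0)

# Keep only the rows whose Identifier contains "John".
johns = Registration[Registration["Identifier"].str.contains("John")]
print(johns)
```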
The goal is to filter the dataframe separately for each of the n strings. For example, here I have done it for the first name in the list:
```python
names_1 = Registration[Registration['Identifier'].str.contains(lines[1])]
print(names_1)
```
This keeps only the rows whose “Identifier” contains “John”. However, I want to create as many dataframes as there are items in “file.txt”, i.e. n of them:
```python
names_1 = Registration[Registration['Identifier'].str.contains(lines[1])]
names_2 = Registration[Registration['Identifier'].str.contains(lines[2])]
names_3 = Registration[Registration['Identifier'].str.contains(lines[3])]
# ...
names_n = Registration[Registration['Identifier'].str.contains(lines[n])]
```
But I'm stuck and don't know how to write this as a loop. Can someone help me? Thanks!
Answer
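Theoretically speaking, the answer to your question is that variables are stored in dictionaries. At module (top-level) scope, `locals()` returns the same dictionary as `globals()`, so writing keys into it does create variables. Inside a function this does not work: modifying the dictionary returned by `locals()` does not create real local variables. At the top level of a script, you can therefore generate variables in a loop exactly as you asked: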
```python
# Only works at module level; creates names_0, names_1, ... (enumerate starts at 0)
for i, line in enumerate(lines):
    locals()[f'names_{i}'] = Registration[Registration['Identifier'].str.contains(line)]
```
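A small standard-library-only sketch of that scope caveat, using placeholder strings instead of dataframes. At module level, writing into `globals()` (the same dictionary `locals()` returns there) creates names you can use directly; inside a function, writing into `locals()` does not make the name available. The function `make_local_vars` and the name `tmp_0` are hypothetical.

```python
# At module level, globals() is the module namespace, so this creates
# the variables names_0, names_1 and names_2.
for i, name in enumerate(["John", "Rafa", "Marta"]):
    globals()[f"names_{i}"] = name.upper()

print(names_1)  # RAFA


def make_local_vars():
    # Inside a function this only writes into a dictionary;
    # it does not create a local variable called tmp_0.
    locals()["tmp_0"] = "value"
    try:
        return tmp_0  # compiled as a global lookup, and no such global exists
    except NameError:
        return "NameError"


print(make_local_vars())  # NameError
```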
However, just because you can do it doesn't mean you should: it's generally not a good idea to generate variables this way.
Just ask yourself: how would you access the nth variable? You would have to build its name as a string again, which makes your data difficult to work with. A better approach is to store the dataframes in a data structure such as a list or a dictionary, which makes them easy to keep track of:
```python
# names[0] holds the rows for the first name, names[1] for the second, and so on
names = []
for line in lines:
    names.append(Registration[Registration['Identifier'].str.contains(line)])
```
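Here is a minimal sketch of the dictionary variant, keyed by the name itself so that the rows for John are simply `names['John']`. It uses a small made-up DataFrame in place of `Registration.csv`. It also passes `regex=False`, so a name containing regex characters such as `.` or `(` is matched literally, and `na=False`, so missing identifiers count as non-matches instead of putting NaN in the mask. Both are optional choices, not part of the answer above.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `lines` and `Registration`.
lines = ["John", "Rafa", "Marta"]
Registration = pd.DataFrame(
    {"Identifier": ["John Smith", "Rafa Lopez", None, "Marta Ruiz", "John Doe"]}
)

# One filtered DataFrame per name, looked up by the name itself.
names = {
    line: Registration[
        Registration["Identifier"].str.contains(line, regex=False, na=False)
    ]
    for line in lines
}

print(names["John"])
```

To process every group in turn, iterate with `for name, df in names.items():`.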
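Note also that Python indexing starts at 0, not 1: with the file above, `lines[0]` is “John”, so `lines[1]` in the question actually selects “Rafa”, and `names[0]` holds John's rows.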
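If you prefer to keep the question's 1-based numbering (`names_1` for the first name), one option is `enumerate(..., start=1)` with the results stored in a dictionary keyed by number, so that `names[1]` plays the role of `names_1`. This is a sketch with made-up sample data, not the only way to do it.

```python
import pandas as pd

# Hypothetical sample data in place of the real file and CSV.
lines = ["John", "Rafa", "Marta"]
Registration = pd.DataFrame({"Identifier": ["John Smith", "Rafa Lopez", "Marta Ruiz"]})

# Keys 1..n mirror the question's names_1 ... names_n.
names = {
    i: Registration[Registration["Identifier"].str.contains(line)]
    for i, line in enumerate(lines, start=1)
}

print(lines[0])                          # John (index 0 is the first item)
print(names[1]["Identifier"].tolist())   # ['John Smith']
```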