I am still learning python, kindly excuse if the question looks trivial to some.
I have a csv file with following format and I want to extract a small segment of it and write to another csv file:
So, this is what I want to do:
- Just extract the entries under actor_list2 and the corresponding id column and write it to a csv file in following format.
Since the format is not a regular column headers followed by some values, I am not sure how to select starting point based on a cell value in a particular column.e.g. even if we consider actor_list2, then it may have any number of entries under that. Please help me understand if it can be done using pandas dataframe processing capability.
Update: The reason why I would like to automate it is because there can be thousands of such files and it would be impractical to manually get that info to create the final csv file which will essentially have a row for each file.
Advertisement
Answer
As Nour-Allah has pointed out the formatting here is not very regular to say the least. The best you can do if that is the case that your data comes out like this every time is to skip some rows of the file:
import pandas as pd df = pd.read_csv('blabla.csv', skiprows=list(range(17)), nrows=8) df_res = df.loc[:, ['actor_list2', 'ID']]
This should get you the result but given how erratic formatting is, this is no way to automate. What if next time there’s another actor? Or one fewer? Even Nour-Allah’s solution would not help there.
Honestly, you should just get better data.