Skip to content
Advertisement

Using python pandas how can we select very specific rows and associated column

I am still learning python, kindly excuse if the question looks trivial to some.

I have a csv file with following format and I want to extract a small segment of it and write to another csv file: enter image description here

So, this is what I want to do:

  1. Just extract the entries under actor_list2 and the corresponding id column and write it to a csv file in following format.

enter image description here

Since the format is not a regular column headers followed by some values, I am not sure how to select starting point based on a cell value in a particular column.e.g. even if we consider actor_list2, then it may have any number of entries under that. Please help me understand if it can be done using pandas dataframe processing capability.

Update: The reason why I would like to automate it is because there can be thousands of such files and it would be impractical to manually get that info to create the final csv file which will essentially have a row for each file.

Advertisement

Answer

As Nour-Allah has pointed out the formatting here is not very regular to say the least. The best you can do if that is the case that your data comes out like this every time is to skip some rows of the file:

import pandas as pd

df = pd.read_csv('blabla.csv', skiprows=list(range(17)), nrows=8)
df_res = df.loc[:, ['actor_list2', 'ID']]

This should get you the result but given how erratic formatting is, this is no way to automate. What if next time there’s another actor? Or one fewer? Even Nour-Allah’s solution would not help there.

Honestly, you should just get better data.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement