What I have is a basic dataframe that I want to pull two values out of, based on index position. So for this:
| first_column | second_column |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
I want to extract the values in row 1 and row 2 (1 2) out of first_column, then the values in row 2 and row 3 (2 3) out of first_column, and so on until I’ve iterated over the entire column. I ran into an issue with the for loop and am stuck on getting the next index value.
I have code like below:
    import pandas as pd

    data = {'first_column': [1, 2, 3, 4, 5],
            'second_column': [1, 2, 3, 4, 5],
            }

    df = pd.DataFrame(data)

    for index, row in df.iterrows():
        print(index, row['first_column'])  # value1
        print(index + 1, row['first_column'].values(index + 1))  # value2 <-- error in logic here
Ignoring the prints, which will eventually become variables that are returned, how can I improve this to return (1 2), (2 3), (3 4), (4 5), etc.?
Also, is this easier to do with the iteritems() method instead of iterrows()?
Answer
Not sure if this is what you want to achieve:
    temp = (df.assign(second_column=df.second_column.shift(-1))
              .dropna()
              .assign(second_column=lambda df: df.second_column.astype(int))
            )

    [*zip(temp.first_column.array, temp.second_column.array)]

    [(1, 2), (2, 3), (3, 4), (4, 5)]
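Note that the snippet above pairs first_column with a shifted second_column, which gives the requested result only because both columns hold the same values. Here is a minimal sketch of the same shift idea applied to first_column alone, assuming the sample frame from the question (the names `nxt` and `pairs` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'first_column': [1, 2, 3, 4, 5],
                   'second_column': [1, 2, 3, 4, 5]})

# shift(-1) moves every value up one row, so row i holds the value from row i + 1.
# The last row has no successor and becomes NaN, which turns the column into floats.
nxt = df['first_column'].shift(-1)

# Drop the trailing NaN and restore the integer dtype before pairing.
nxt = nxt.dropna().astype(int)

# zip stops at the shorter input, so the last value of first_column
# never starts a pair of its own.
pairs = list(zip(df['first_column'], nxt))
print(pairs)
```

The dropna/astype step is only needed because shifting introduces a NaN; it would also discard any missing values already present in the column, so treat this as a sketch for complete data.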
A simpler solution from @HenryEcker:
list(zip(df['first_column'], df['first_column'].iloc[1:]))
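On the iteritems() question: as far as I know, iteritems() was deprecated in pandas 1.5 and removed in 2.0 in favour of items(), and neither it nor iterrows() is needed for this. A rough sketch that wraps the zip approach in a helper so the pairs can be returned as variables rather than printed (`consecutive_pairs` is a hypothetical name, not a pandas function):

```python
import pandas as pd


def consecutive_pairs(series):
    """Return [(v0, v1), (v1, v2), ...] for neighbouring values of a Series."""
    # iloc[1:] is the same Series without its first element. zip stops when
    # the shorter input is exhausted, so the final value only ever appears
    # as the second item of the last pair.
    return list(zip(series, series.iloc[1:]))


df = pd.DataFrame({'first_column': [1, 2, 3, 4, 5],
                   'second_column': [1, 2, 3, 4, 5]})

pairs = consecutive_pairs(df['first_column'])

# Each pair unpacks into the value1 / value2 variables from the question.
for value1, value2 in pairs:
    print(value1, value2)
```

Because the pairing is positional (iloc), it does not depend on the index labels, so it should behave the same on a frame whose index is not 0, 1, 2, ...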