Having a bit of trouble understanding the documentation
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy dfbreed[‘x’] = dfbreed.apply(testbreed, axis=1) C:/Users/erasmuss/PycharmProjects/Sarah/farmdata.py:38: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
Code is basically to re-arrange and clean some data to make analysis easier. Code in given row-by per each animal, but has repetitions, blanks, and some other sparse values Idea is to basically stack rows into columns and grab the useful data (Weight by date and final BCS) per animal
Initial DF few snippets of the dataframe
Output Format Output DF/csv
import pandas as pd import numpy as np #Function for cleaning up multiple entries of breeds def testbreed(x): if x.first_valid_index() is None: return None else: return x[x.first_valid_index()] #Read Data df1 = pd.read_csv("farmdata.csv") #Drop empty rows df1.dropna(how='all', axis=1, inplace=True) #Copy to extract Weights in DF2 df2 = df1.copy() df2 = df2.drop(['BCS', 'Breed','Age'], axis=1) #Pivot for ID names in DF1 df1 = df1.pivot(index='ID', columns='Date', values=['Breed','Weight', 'BCS']) #Pivot for weights in DF2 df2 = df2.pivot(index='ID', columns='Date', values = 'Weight') #Split out Breeds and BCS into individual dataframes w/Duplicate/missing data for each ID df3 = df1.copy() dfbreed = df3[['Breed']] dfBCS = df3[['BCS']] #Drop empty BCS columns df1.dropna(how='all', axis=1, inplace=True) #Shorten Breed and BCS to single Column by grabbing first value that is real. see function above dfbreed['x'] = dfbreed.apply(testbreed, axis=1) dfBCS['x'] = dfBCS.apply(testbreed, axis=1) #Populate BCS and Breed into new DF df5= pd.DataFrame(data=None) df5['Breed'] = dfbreed['x'] df5['BCS'] = dfBCS['x'] #Join Weights df5 = df5.join(df2) #Write output df5.to_csv(r'.out1.csv')
I want to take the BCS and Breed dataframes which are multi-indexed on the column by Breed or BCS and then by date to take the first non-NaN value in the rows of dates and set that into a column named breed.
I had a lot of trouble getting the columns to pick the first unique values in-situ on the DF I found a work-around with a 2015 answer:
which defined the function at the top. reading through the setting a value on the copy-of a slice makes sense intuitively, but I can’t seem to think of a way to make it work as a direct-replacement or index-based.
Should I be looping through?
Trying from The second answer here I get
dfbreed.loc[:,'Breed'] = dfbreed['Breed'].apply(testbreed, axis=1) dfBCS.loc[:, 'BCS'] = dfBCS.apply['BCS'](testbreed, axis=1)
which returns
ValueError: Must have equal len keys and value when setting with an iterable
I’m thinking this has something to do with the multi-index keys come up as:
MultiIndex([(‘Breed’, ‘1/28/2021’), (‘Breed’, ‘2/12/2021’), (‘Breed’, ‘2/4/2021’), (‘Breed’, ‘3/18/2021’), (‘Breed’, ‘7/30/2021’)], names=[None, ‘Date’]) MultiIndex([(‘BCS’, ‘1/28/2021’), (‘BCS’, ‘2/12/2021’), (‘BCS’, ‘2/4/2021’), (‘BCS’, ‘3/18/2021’), (‘BCS’, ‘7/30/2021’)], names=[None, ‘Date’])
Sorry for the long question(s?) Can anyone help me out?
Thanks.
Advertisement
Answer
You created dfbreed as:
dfbreed = df3[['Breed']]
So it is a view of the original DataFrame (limited to just this one column).
Remember that a view has not any own data buffer, it is only a tool to “view” a fragment of the original DataFrame, with read only access.
When you attempt to perform dfbreed['x'] = dfbreed.apply(...)
, you
actually attempt to violate the read-only access mode.
To avoid this error, create dfbreed as an “independent” DataFrame:
dfbreed = df3[['Breed']].copy()
Now dfbreed has its own data buffer and you are free to change the data.