Pandas “A value is trying to be set on a copy of a slice from a DataFrame”

I'm having a bit of trouble understanding the documentation for this warning:

C:/Users/erasmuss/PycharmProjects/Sarah/farmdata.py:38: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfbreed['x'] = dfbreed.apply(testbreed, axis=1)

The code rearranges and cleans some data to make analysis easier. The data is given row by row for each animal, but it has repetitions, blanks, and some other sparse values. The idea is to stack rows into columns and grab the useful data (weight by date and final BCS) per animal.

Initial DF: a few snippets of the DataFrame

Output format: output DF/CSV
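To make the column layout concrete, here is a small sketch of the pivot step with made-up data. The IDs, dates, breeds and weights are hypothetical; only the column names come from the question. Pivoting several value columns at once yields MultiIndex columns (value name, Date), while pivoting just Weight yields flat date columns.

```python
import numpy as np
import pandas as pd

# Made-up long-format data using the question's column names (values are hypothetical)
long_df = pd.DataFrame({
    "ID": [101, 101, 102, 102],
    "Date": ["1/28/2021", "2/4/2021", "1/28/2021", "2/4/2021"],
    "Breed": ["Angus", np.nan, np.nan, "Hereford"],
    "Weight": [500.0, 510.0, 480.0, 495.0],
    "BCS": [np.nan, 3.0, np.nan, 2.5],
})

# One row per animal; a list of values gives MultiIndex columns (value, Date)
wide = long_df.pivot(index="ID", columns="Date", values=["Breed", "Weight", "BCS"])

# Pivoting a single value column gives flat Date columns instead
weights = long_df.pivot(index="ID", columns="Date", values="Weight")

print(wide.columns)
print(weights)
```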

import pandas as pd
import numpy as np

#Function for cleaning up multiple entries of breeds
def testbreed(x):
    if x.first_valid_index() is None:
        return None
    else:
        return x[x.first_valid_index()]

#Read Data
df1 = pd.read_csv("farmdata.csv")

#Drop empty columns (axis=1 removes columns that are entirely NaN)
df1.dropna(how='all', axis=1, inplace=True)

#Copy to extract Weights in DF2
df2 = df1.copy()
df2 = df2.drop(['BCS', 'Breed','Age'], axis=1)

#Pivot for ID names in DF1
df1 = df1.pivot(index='ID', columns='Date', values=['Breed','Weight', 'BCS'])

#Pivot for weights in DF2
df2 = df2.pivot(index='ID', columns='Date', values = 'Weight')

#Split out Breeds and BCS into individual dataframes w/Duplicate/missing data for each ID
df3 = df1.copy()
dfbreed = df3[['Breed']]
dfBCS = df3[['BCS']]

#Drop empty BCS columns
df1.dropna(how='all', axis=1, inplace=True)

#Shorten Breed and BCS to single Column by grabbing first value that is real. see function above
dfbreed['x'] = dfbreed.apply(testbreed, axis=1)
dfBCS['x'] = dfBCS.apply(testbreed, axis=1)

#Populate BCS and Breed into new DF
df5= pd.DataFrame(data=None)
df5['Breed'] = dfbreed['x']
df5['BCS'] = dfBCS['x']

#Join Weights
df5 = df5.join(df2)

#Write output
df5.to_csv(r'.out1.csv')

I want to take the BCS and Breed DataFrames, whose columns are a MultiIndex (first level Breed or BCS, second level the date), take the first non-NaN value across the date columns in each row, and store it in a column named breed.
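One way to do this without a row-wise apply is to back-fill along the columns and keep the first column. The sketch below compares that with a helper equivalent to the question's testbreed. The frame, its IDs and the helper name first_valid are hypothetical, chosen to mirror the (Breed, Date) MultiIndex columns that the pivot produces.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like dfbreed: MultiIndex columns (Breed, Date)
cols = pd.MultiIndex.from_product(
    [["Breed"], ["1/28/2021", "2/4/2021", "3/18/2021"]], names=[None, "Date"]
)
breed = pd.DataFrame(
    [[np.nan, "Angus", "Angus"],
     ["Hereford", np.nan, np.nan],
     [np.nan, np.nan, np.nan]],
    index=pd.Index([101, 102, 103], name="ID"),
    columns=cols,
)

# Row-wise helper, equivalent to the question's testbreed
def first_valid(row):
    idx = row.first_valid_index()
    return None if idx is None else row[idx]

via_apply = breed.apply(first_valid, axis=1)

# Vectorised alternative: back-fill from later columns, then keep the first column
via_bfill = breed.bfill(axis=1).iloc[:, 0]

print(via_apply)
print(via_bfill)
```

Both versions take the first value in column order. Because the dates are strings, that order is lexicographic ('2/12/2021' sorts before '2/4/2021'), so converting Date with pd.to_datetime before pivoting may be worth considering if "first" should mean "earliest".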

I had a lot of trouble getting the columns to pick up the first valid values in situ on the DataFrame. I found a workaround in a 2015 answer:

2015 Answer

which defined the function at the top. After reading about setting a value on a copy of a slice, the warning makes sense intuitively, but I can't think of a way to make it work as a direct replacement or with index-based assignment.

Should I be looping through?

Trying the approach from the second answer here:

dfbreed.loc[:,'Breed'] = dfbreed['Breed'].apply(testbreed, axis=1)
dfBCS.loc[:, 'BCS'] = dfBCS['BCS'].apply(testbreed, axis=1)

which returns

ValueError: Must have equal len keys and value when setting with an iterable

I'm thinking this has something to do with the MultiIndex column keys, which come up as:

MultiIndex([('Breed', '1/28/2021'),
            ('Breed', '2/12/2021'),
            ('Breed',  '2/4/2021'),
            ('Breed', '3/18/2021'),
            ('Breed', '7/30/2021')],
           names=[None, 'Date'])
MultiIndex([('BCS', '1/28/2021'),
            ('BCS', '2/12/2021'),
            ('BCS',  '2/4/2021'),
            ('BCS', '3/18/2021'),
            ('BCS', '7/30/2021')],
           names=[None, 'Date'])
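The ValueError most likely comes from a shape mismatch rather than from the warning itself: dfbreed['Breed'] drops the top column level and returns a DataFrame with one column per date, so dfbreed.loc[:, 'Breed'] on the left-hand side addresses several columns, while the collapsed result on the right is a single Series with one value per ID. A sketch with made-up IDs and breeds:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the question's dfbreed (made-up IDs and breeds)
cols = pd.MultiIndex.from_product(
    [["Breed"], ["1/28/2021", "2/4/2021"]], names=[None, "Date"]
)
dfbreed = pd.DataFrame(
    [["Angus", np.nan], [np.nan, "Hereford"]],
    index=pd.Index([101, 102], name="ID"),
    columns=cols,
)

# Selecting the top level drops it: one column per date remains,
# so .loc[:, 'Breed'] as an assignment target spans several columns
inner = dfbreed["Breed"]
n_targets = inner.shape[1]

# Collapsing each row yields ONE value per ID (a Series, not a frame)
first = inner.bfill(axis=1).iloc[:, 0]

# Store it under a single, flat column name instead
result = pd.DataFrame({"Breed": first})
print(result)
```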

Sorry for the long question(s?) Can anyone help me out?

Thanks.

Answer

You created dfbreed as:

dfbreed = df3[['Breed']]

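So dfbreed is derived from a slice of df3 (limited to just the Breed columns). Whether such a selection gives you a view that shares df3's data or a separate copy is an implementation detail: a list-of-columns selection like this one actually returns a copy, but pandas remembers that it was taken from df3.

Because of that, pandas cannot tell whether a later assignment to dfbreed was meant to modify df3 as well.

When you perform dfbreed['x'] = dfbreed.apply(...), pandas therefore emits SettingWithCopyWarning. It is a warning, not an error: the new column is still added to dfbreed, but not to df3.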
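A small sketch to observe this, using a flat stand-in for df3 with made-up values. When pandas runs in Copy-on-Write mode (opt-in via pd.options.mode.copy_on_write in pandas 2.x, the default in pandas 3.0) this warning is not emitted at all, so the check compares warning class names rather than importing the class.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df3 (flat columns for brevity)
df3 = pd.DataFrame({"Breed": ["Angus", np.nan], "BCS": [3.0, 2.5]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dfbreed = df3[["Breed"]]          # list selection, remembered as derived from df3
    dfbreed["x"] = dfbreed["Breed"]   # may emit SettingWithCopyWarning

# Compare by name so this also runs where the warning class no longer exists
names = [w.category.__name__ for w in caught]
print("SettingWithCopyWarning" in names)

# Either way, the new column lands on dfbreed only
print("x" in dfbreed.columns, "x" in df3.columns)
```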

To avoid the warning, create dfbreed as an explicit, independent DataFrame:

dfbreed = df3[['Breed']].copy()

Now dfbreed owns its data, so pandas knows that changes to it cannot affect df3, and the warning goes away. The same applies to dfBCS = df3[['BCS']].copy().
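A sketch of the fixed version with a made-up stand-in for df3: with .copy() no SettingWithCopyWarning is recorded, and edits to dfbreed (a new column, a changed cell) leave df3 untouched.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df3
df3 = pd.DataFrame({"Breed": ["Angus", np.nan], "BCS": [3.0, 2.5]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dfbreed = df3[["Breed"]].copy()                      # explicit, independent copy
    dfbreed["x"] = dfbreed["Breed"].fillna("unknown")    # add a column
    dfbreed.loc[0, "Breed"] = "Hereford"                 # change a cell

# Keep only SettingWithCopyWarning entries, matched by class name
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]

print(len(copy_warnings))
print(df3)
print(dfbreed)
```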

Answer