I’m trying to use two columns from an existing dataframe to generate a list of new strings with those values. I found a lot of examples doing something similar, but not the same thing, so I appreciate advice or links elsewhere if this is a repeat question. Thanks in advance!
If I start with a data frame like this:
import pandas as pd
df=pd.DataFrame(data=[["a",1],["b",2],["c",3]], columns=["id1","id2"])
  id1  id2
0   a    1
1   b    2
2   c    3
I want to make a list that looks like new_ids=['a_1','b_2','c_3'], where each value combines the id1 value from a row with the id2 value from the same row (row 0 with row 0, and so on).
I started by making lists from the columns, but can’t figure out how to combine them into a new list. I also tried not using intermediate lists, but couldn’t get that either. Error messages below are accurate to the mock data, but are different from the ones with real data.
#making separate lists version
#this function works
def get_ids(orig_df):
    id1_list=[]
    id2_list=[]
    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)    
idlist1,idlist2=get_ids(df)    
#this is the part that doesn't work
new_id=[]
for i,j in zip(idlist1,idlist2):
   row='_'.join(str(idlist1[i]),str(idlist2[j]))
   new_id.append(row)
#---------------------------------------------------------------------------
#TypeError                                 Traceback (most recent call last)
#<ipython-input-41-6b0c949a1ad5> in <module>
#      1 new_id=[]
#      2 for i,j in zip(idlist1,idlist2):
#----> 3    row='_'.join(str(idlist1[i]),str(idlist2[j]))
#      4    new_id.append(row)
#      5    #return ', '.join(new_id)
#TypeError: list indices must be integers or slices, not str
#skipping making lists (also doesn't work)
newid_list=[]
for i in range(len(df)):
    n1=df['id1'[i].values]
    n2=df['id2'[i].values]
    nid= str(n1)+"_"+str(n2)
    newid_list.append(nid)
#------------------------------------------------------------------------
#AttributeError                            Traceback (most recent call last)
#<ipython-input-44-09983bd890a6> in <module>
#      1 newid_list=[]
#      2 for i in range(len(df)):
#----> 3     n1=df['id1'[i].values]
#      4     n2=df['id2'[i].values]
#      5     nid= str(n1)+"_"+str(n2)
#AttributeError: 'str' object has no attribute 'values'
Answer
(df.id1 + "_" + df.id2.astype(str)).tolist()
Output:
['a_1', 'b_2', 'c_3']
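For comparison, here is a minimal plain-Python sketch that builds the same list by pairing the two columns with zip. It assumes the mock df from the question; the name new_ids is taken from the question, and the f-string is just one way to convert each value to text.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame(data=[["a", 1], ["b", 2], ["c", 3]], columns=["id1", "id2"])

# zip pairs the two columns row by row; the f-string turns each value into text
new_ids = [f"{a}_{b}" for a, b in zip(df["id1"], df["id2"])]
print(new_ids)
```

The vectorized one-liner is usually the better fit for large frames; the comprehension is mainly useful when the per-row formatting gets more involved.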
Your approaches (corrected):
def get_ids(orig_df):
    id1_list=[]
    id2_list=[]
    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)    
idlist1, idlist2=get_ids(df)    
#this part now works: i and j are the values themselves, so join them directly
new_id=[]
for i,j in zip(idlist1,idlist2):
    row='_'.join([str(i),str(j)])
    new_id.append(row)
newid_list=[]
for i in range(len(df)):
    n1=df['id1'][i]
    n2=df['id2'][i]
    nid= str(n1)+"_"+str(n2)
    newid_list.append(nid)
Points:
- In the first approach, when you loop over zip(idlist1,idlist2), i and j are the data values, not indices, so use them directly and convert them to strings.
- join takes a single list as its argument, so build a list from the two values, [str(i),str(j)], and pass it to join.
- In the second approach, you can get each element of a column with df['id1'][i]; you don't need values, which returns all elements of the column as a NumPy array.
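The points above can be checked with a short sketch. It assumes the mock df from the question; the names pairs, joined, error_name, relabeled and by_position are only illustrative. The last part is an extra caution not raised above: df['id1'][i] looks i up as an index label, so it relies on the default 0..n-1 index, while .iloc[i] is always positional.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame(data=[["a", 1], ["b", 2], ["c", 3]], columns=["id1", "id2"])

# zip yields the values themselves, not positions
pairs = list(zip(df["id1"], df["id2"]))

# str.join takes exactly one iterable of strings
joined = ["_".join([str(i), str(j)]) for i, j in pairs]

# Passing two separate arguments to join raises TypeError
try:
    "_".join("a", "1")
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# With a non-default index, positional access needs .iloc
relabeled = df.copy()
relabeled.index = [10, 20, 30]
by_position = [
    f"{relabeled['id1'].iloc[i]}_{relabeled['id2'].iloc[i]}"
    for i in range(len(relabeled))
]
```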
If you want to use values:
(df.id1.values + "_" + df.id2.values.astype(str)).tolist()