
Python: trying to make a new list combining values from other lists

I’m trying to use two columns from an existing dataframe to generate a list of new strings built from those values. I found a lot of examples doing something similar, but not the same thing, so I’d appreciate advice, or links elsewhere if this is a repeat question. Thanks in advance!

If I start with a data frame like this:

import pandas as pd

df=pd.DataFrame(data=[["a",1],["b",2],["c",3]], columns=["id1","id2"])

  id1  id2
0   a    1
1   b    2
2   c    3

I want to make a list that looks like new_ids=['a_1','b_2','c_3'], where each value combines the id1 and id2 values from the same row (row 0 of id1 with row 0 of id2, and so on).

I started by making lists from the columns, but can’t figure out how to combine them into a new list. I also tried not using intermediate lists, but couldn’t get that either. Error messages below are accurate to the mock data, but are different from the ones with real data.

#making separate lists version
#this function works
def get_ids(orig_df):
    id1_list=[]
    id2_list=[]

    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)    

idlist1,idlist2=get_ids(df)    

#this is the part that doesn't work
new_id=[]
for i,j in zip(idlist1,idlist2):
   row='_'.join(str(idlist1[i]),str(idlist2[j]))
   new_id.append(row)

#---------------------------------------------------------------------------
#TypeError                                 Traceback (most recent call last)
#<ipython-input-41-6b0c949a1ad5> in <module>
#      1 new_id=[]
#      2 for i,j in zip(idlist1,idlist2):
#----> 3    row='_'.join(str(idlist1[i]),str(idlist2[j]))
#      4    new_id.append(row)
#      5    #return ', '.join(new_id)

#TypeError: list indices must be integers or slices, not str


#skipping making lists (also doesn't work)
newid_list=[]
for i in range(len(df)):
    n1=df['id1'[i].values]
    n2=df['id2'[i].values]
    nid= str(n1)+"_"+str(n2)
    newid_list.append(nid)

#---------------------------------------------------------------------------
#AttributeError                            Traceback (most recent call last)
#<ipython-input-44-09983bd890a6> in <module>
#      1 newid_list=[]
#      2 for i in range(len(df)):
#----> 3     n1=df['id1'[i].values]
#      4     n2=df['id2'[i].values]
#      5     nid= str(n1)+"_"+str(n2)

#AttributeError: 'str' object has no attribute 'values'


Answer

(df.id1 + "_" + df.id2.astype(str)).tolist()

output:

['a_1', 'b_2', 'c_3']
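The same result can also be sketched with Series.str.cat, which joins string columns with a separator. This is an alternative, not part of the original answer; the frame below simply recreates the mock data from the question.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame(data=[["a", 1], ["b", 2], ["c", 3]], columns=["id1", "id2"])

# id2 holds integers, so convert it to strings before concatenating
new_ids = df["id1"].str.cat(df["id2"].astype(str), sep="_").tolist()
print(new_ids)
```

Like the + version, this stays vectorized, so no explicit Python loop is needed.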

Your approaches (corrected):

def get_ids(orig_df):
    id1_list=[]
    id2_list=[]

    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)    

idlist1, idlist2=get_ids(df)    
#corrected: i and j are the values themselves, so use them directly
new_id=[]
for i,j in zip(idlist1,idlist2):
    row='_'.join([str(i),str(j)])
    new_id.append(row)


newid_list=[]
for i in range(len(df)):
    n1=df['id1'][i]
    n2=df['id2'][i]
    nid= str(n1)+"_"+str(n2)
    newid_list.append(nid)

Points:

  1. In the first approach, zip yields the values themselves, so i and j are data, not indices. Use them directly and convert them to strings.
  2. join takes a single iterable of strings, so build a list from the two values, [str(i),str(j)], and pass that list to join.
  3. In the second approach, the [i] was applied to the string 'id1' rather than to the column. df['id1'][i] gets element i of the column (with the default integer index used here), and you don’t need values, which returns the whole column as a NumPy array.
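The first two points can be sketched in a few lines. This is only an illustration on the question's mock data; the names new_ids and join_error are made up for the example.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame(data=[["a", 1], ["b", 2], ["c", 3]], columns=["id1", "id2"])

# zip pairs up the values themselves ('a' with 1, 'b' with 2, ...),
# so a and b are data, not positions
new_ids = ["_".join([str(a), str(b)]) for a, b in zip(df["id1"], df["id2"])]
print(new_ids)

# str.join accepts exactly one iterable; two separate arguments raise TypeError
try:
    "_".join("a", "1")
except TypeError as exc:
    join_error = str(exc)
    print("join failed:", join_error)
```

Iterating over the columns directly also avoids building the intermediate idlist1 and idlist2 lists.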

If you want to use values:

(df.id1.values + "_" + df.id2.values.astype(str)).tolist()
User contributions licensed under: CC BY-SA