I’m trying to use two columns from an existing dataframe to generate a list of new strings with those values. I found a lot of examples doing something similar, but not the same thing, so I appreciate advice or links elsewhere if this is a repeat question. Thanks in advance!
If I start with a data frame like this:
import pandas as pd

df=pd.DataFrame(data=[["a",1],["b",2],["c",3]], columns=["id1","id2"])

  id1  id2
0   a    1
1   b    2
2   c    3
I want to make a list that looks like new_ids=['a_1','b_2','c_3'], where each value combines the row 0 value of id1 with the row 0 value of id2, and so on for every row.
I started by making lists from the columns, but can’t figure out how to combine them into a new list. I also tried not using intermediate lists, but couldn’t get that either. Error messages below are accurate to the mock data, but are different from the ones with real data.
#making separate lists version
#this function works
def get_ids(orig_df):
    id1_list=[]
    id2_list=[]
    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)

idlist1,idlist2=get_ids(df)

#this is the part that doesn't work
new_id=[]
for i,j in zip(idlist1,idlist2):
    row='_'.join(str(idlist1[i]),str(idlist2[j]))
    new_id.append(row)

#------------------------------------------------------------------------
#AttributeError Traceback (most recent call last)
#<ipython-input-44-09983bd890a6> in <module>
# 1 newid_list=[]
# 2 for i in range(len(df)):
#----> 3 n1=df['id1'[i].values]
# 4 n2=df['id2'[i].values]
# 5 nid= str(n1)+"_"+str(n2)
#AttributeError: 'str' object has no attribute 'values'

#skipping making lists (also doesn't work)
newid_list=[]
for i in range(len(df)):
    n1=df['id1'[i].values]
    n2=df['id2'[i].values]
    nid= str(n1)+"_"+str(n2)
    newid_list.append(nid)

#---------------------------------------------------------------------------
#TypeError Traceback (most recent call last)
#<ipython-input-41-6b0c949a1ad5> in <module>
# 1 new_id=[]
# 2 for i,j in zip(idlist1,idlist2):
#----> 3 row='_'.join(str(idlist1[i]),str(idlist2[j]))
# 4 new_id.append(row)
# 5 #return ', '.join(new_id)
#TypeError: list indices must be integers or slices, not str
Answer
(df.id1 + "_" + df.id2.astype(str)).tolist()
output:
['a_1', 'b_2', 'c_3']
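To see what the one-liner does, it can be split into its three steps. This is only an illustrative breakdown using the question's mock data; the intermediate names id2_as_text and combined are made up for this sketch.

```python
import pandas as pd

# Mock data from the question.
df = pd.DataFrame(data=[["a", 1], ["b", 2], ["c", 3]], columns=["id1", "id2"])

# Step 1: id2 holds integers, so convert it to strings first.
id2_as_text = df["id2"].astype(str)

# Step 2: "+" on Series works row by row, so this builds "a_1", "b_2", "c_3".
combined = df["id1"] + "_" + id2_as_text

# Step 3: turn the resulting Series into a plain Python list.
new_ids = combined.tolist()
print(new_ids)  # ['a_1', 'b_2', 'c_3']
```

Without the astype(str) step, pandas would try to add a string to an integer in each row, which is why the conversion comes first.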
Your approaches (corrected):
def get_ids(orig_df):
    id1_list=[]
    id2_list=[]
    for i in range(len(orig_df)):
        id1_list.append(orig_df['id1'].values[i])
        id2_list.append(orig_df['id2'].values[i])
    return(id1_list,id2_list)

idlist1, idlist2=get_ids(df)

#first approach, fixed: i and j are already the values, so join them directly
new_id=[]
for i,j in zip(idlist1,idlist2):
    row='_'.join([str(i),str(j)])
    new_id.append(row)

#second approach, fixed: select the column first, then the row
newid_list=[]
for i in range(len(df)):
    n1=df['id1'][i]
    n2=df['id2'][i]
    nid= str(n1)+"_"+str(n2)
    newid_list.append(nid)
Points:
- In the first approach, when you loop over the zipped lists, i and j are the data values themselves, not indices, so use them directly and convert them to strings.
- join takes a single list as its argument, so build a list from the two values, [str(i),str(j)], and pass that to join.
- In the second approach, you can get each element of a column with df['id1'][i]. You don't need values, which returns all elements of the column as a NumPy array.
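The first two points can be checked without pandas at all. The sketch below uses plain lists that stand in for idlist1 and idlist2; the names pairs and new_id_short are only for illustration.

```python
# Plain lists standing in for the output of get_ids(df).
idlist1 = ["a", "b", "c"]
idlist2 = [1, 2, 3]

# zip yields the values themselves, not positions.
pairs = list(zip(idlist1, idlist2))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]

new_id = []
for i, j in zip(idlist1, idlist2):
    # str.join takes exactly one iterable of strings, hence the list.
    new_id.append("_".join([str(i), str(j)]))

# The same loop written as a list comprehension with an f-string.
new_id_short = [f"{i}_{j}" for i, j in zip(idlist1, idlist2)]
print(new_id)  # ['a_1', 'b_2', 'c_3']
```

Because i is already "a" on the first pass, the original idlist1[i] was asking for idlist1["a"], which is what produced the "list indices must be integers or slices, not str" error.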
If you want to use values:
(df.id1.values + "_" + df.id2.values.astype(str)).tolist()
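One caveat on the loop version of the second approach: df['id1'][i] looks rows up by index label, so it relies on the default 0, 1, 2 index of the mock data. Since the question mentions that the real data behaves differently, here is a hedged sketch that uses .iloc for positional access instead. The index values 10, 20, 30 are an assumption made up for this example, not something from the question.

```python
import pandas as pd

# Same mock data, but with a hypothetical non-default index (assumption).
df = pd.DataFrame(
    data=[["a", 1], ["b", 2], ["c", 3]],
    columns=["id1", "id2"],
    index=[10, 20, 30],
)

newid_list = []
for i in range(len(df)):
    # .iloc selects by position, so it does not depend on the index labels.
    n1 = df["id1"].iloc[i]
    n2 = df["id2"].iloc[i]
    newid_list.append(str(n1) + "_" + str(n2))

print(newid_list)  # ['a_1', 'b_2', 'c_3']
```

The vectorized one-liner at the top of the answer is not affected by this, because it never indexes individual rows.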