Merging two dataframes in pandas without column names (new to pandas)

Question

Short explanation: If your data has duplicate column names, rename one of the columns when you read the file. If your data contains NaN or similar missing values, remove them. Then merge using the accepted answer below.

Probably a pretty simple question. I have two datasets that I read in using pandas.read_csv(). My data is in two separate CSV files.

Accepted Answer

You should still be able to merge on the columns:

    merged = underlying.merge(options, left_on='0', right_on='0')

This performs an inner merge, so you only get the intersection of both datasets, i.e. the rows where the value in column 0 exists in both. If you want all values, specify an outer merge:

    merged = underlying.merge(options, left_on='0', right_on='0', how='outer')

    In [10]: merged = underlying.merge(options, left_on='0', right_on='0', how='outer')
             merged
    Out[10]:
              0       1_x   1_y         2     3     4      5     6   7      8
    0  20040326  3.579987   NaN       NaN   NaN   NaN    NaN   NaN NaN    NaN
    1  20040329  3.690494   NaN       NaN   NaN   NaN    NaN   NaN NaN    NaN
    2  20040330  3.755247   NaN       NaN   NaN   NaN    NaN   NaN NaN    NaN
    3  20040331  3.719373   NaN       NaN   NaN   NaN    NaN   NaN NaN    NaN
    4  20040401  3.728671   NaN       NaN   NaN   NaN    NaN   NaN NaN    NaN
    5  20130628       NaN  SVXY  20130817  32.5  call  39.22  32.5   0  0.005

              9        10
    0       NaN       NaN
    1       NaN       NaN
    2       NaN       NaN
    3       NaN       NaN
    4       NaN       NaN
    5  0.136986  0.411224

    [6 rows x 12 columns]

You would have to rename or move the columns that clashed (1_x and 1_y above).

It is probably better to rename the columns to something sensible beforehand. When reading the csv you can pass a list of column names:

    df = pd.read_csv('data.csv', names=['Id', 'Price'])
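To make the advice above concrete, here is a minimal, self-contained sketch. The sample rows and the column names (`date`, `price`, `symbol`, `strike`) are made up for illustration and are not the asker's real data. It names the columns at read time, then does an inner and an outer merge. The last part is a side note: if the files are read with `header=None` instead, pandas labels the columns with the integers 0, 1, 2, ..., so the merge key would be the integer `0` rather than the string `'0'`.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two header-less CSV files.
UNDERLYING_CSV = "20040326,3.579987\n20040329,3.690494\n20130628,39.22\n"
OPTIONS_CSV = "20130628,SVXY,32.5\n20130629,SVXY,33.0\n"

# Passing names= labels the columns while reading, so nothing clashes later.
underlying = pd.read_csv(io.StringIO(UNDERLYING_CSV), names=["date", "price"])
options = pd.read_csv(io.StringIO(OPTIONS_CSV), names=["date", "symbol", "strike"])

# Inner merge (the default): only dates present in both frames.
inner = underlying.merge(options, on="date")

# Outer merge: every date from either frame, with NaN where a side is missing.
outer = underlying.merge(options, on="date", how="outer")

# Side note: with header=None the column labels are integers, not strings,
# so the key is the integer 0.
raw_underlying = pd.read_csv(io.StringIO(UNDERLYING_CSV), header=None)
raw_options = pd.read_csv(io.StringIO(OPTIONS_CSV), header=None)
raw = raw_underlying.merge(raw_options, on=0, how="outer")

print(inner)
print(outer)
```

If the missing values produced by the outer merge are not wanted, `outer.dropna()` removes the rows that contain them, which for this sample leaves the same rows as the inner merge.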

Answer