Skip to content
Advertisement

How do I add the counts of two rows where the values in the columns are swapped with respect of the other?

I have a Dataframe as follows:

import pandas as pd
df = pd.DataFrame({'Target': [0 ,1, 2], 
                   'Source': [1, 0, 3],
                    'Count': [1, 1, 1]})

I have to count how many pairs of Sources and Targets there are. (1,0) and (0,1) will be treated as duplicate, hence the count will be 2.

I need to do it several times as I have 79 nodes in total. Any help will be much appreciated.

Advertisement

Answer

import pandas as pd
# instantiate without the 'count' column to start over
In[1]: df = pd.DataFrame({'Target': [0, 1, 2], 
                          'Source': [1, 0, 3]})

Out[1]:      Target  Source
        0    0       1
        1    1       0
        2    2       3

To count pairs regardless of their order is possible by converting to numpy.ndarray and sorting the rows to make them identical:

In[1]: array = df.values
In[2]: array.sort(axis=1)
In[3]: array

Out[3]: array([[0, 1],
               [0, 1],
               [2, 3]])

And then turn it back to a DataFrame to perform .value_counts():

In[1]: df_sorted = pd.DataFrame(array, columns=['value1', 'value2'])
In[2]: df_sorted.value_counts()

Out[2]: value1  value2
        0       1         2
        2       3         1
        dtype: int64
Advertisement