I have a Dataframe as follows:
import pandas as pd
df = pd.DataFrame({'Target': [0 ,1, 2], 
                   'Source': [1, 0, 3],
                    'Count': [1, 1, 1]})
I have to count how many pairs of Sources and Targets there are. (1,0) and (0,1) will be treated as duplicate, hence the count will be 2.
I need to do it several times as I have 79 nodes in total. Any help will be much appreciated.
Advertisement
Answer
import pandas as pd
# instantiate without the 'count' column to start over
In[1]: df = pd.DataFrame({'Target': [0, 1, 2], 
                          'Source': [1, 0, 3]})
Out[1]:      Target  Source
        0    0       1
        1    1       0
        2    2       3
To count pairs regardless of their order is possible by converting to numpy.ndarray and sorting the rows to make them identical:
In[1]: array = df.values
In[2]: array.sort(axis=1)
In[3]: array
Out[3]: array([[0, 1],
               [0, 1],
               [2, 3]])
And then turn it back to a DataFrame to perform .value_counts():
In[1]: df_sorted = pd.DataFrame(array, columns=['value1', 'value2'])
In[2]: df_sorted.value_counts()
Out[2]: value1  value2
        0       1         2
        2       3         1
        dtype: int64