I have a DataFrame as follows:
    import pandas as pd

    df = pd.DataFrame({'Target': [0, 1, 2], 'Source': [1, 0, 3], 'Count': [1, 1, 1]})
I have to count how many pairs of Sources and Targets there are. (1, 0) and (0, 1) should be treated as duplicates, so the count will be 2.
I need to do it several times as I have 79 nodes in total. Any help will be much appreciated.
Answer
    import pandas as pd

    # instantiate without the 'Count' column to start over
    In [1]: df = pd.DataFrame({'Target': [0, 1, 2], 'Source': [1, 0, 3]})

    In [2]: df
    Out[2]:
       Target  Source
    0       0       1
    1       1       0
    2       2       3
Pairs can be counted regardless of their order by converting the DataFrame to a numpy.ndarray and sorting each row, so that (1, 0) and (0, 1) become identical:
    In [1]: array = df.values

    In [2]: array.sort(axis=1)

    In [3]: array
    Out[3]:
    array([[0, 1],
           [0, 1],
           [2, 3]])
Then turn it back into a DataFrame and call .value_counts():
    In [1]: df_sorted = pd.DataFrame(array, columns=['value1', 'value2'])

    In [2]: df_sorted.value_counts()
    Out[2]:
    value1  value2
    0       1         2
    2       3         1
    dtype: int64
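Since the question mentions repeating this several times, the steps above could be wrapped in a small helper. The following is only a sketch: the function name `count_unordered_pairs` and the column labels `low`/`high` are made up for illustration, and it assumes integer node ids and pandas 1.1 or newer (for `DataFrame.value_counts`). It uses `np.sort`, which returns a sorted copy, instead of the in-place `array.sort`, so the original DataFrame is not modified.

```python
import numpy as np
import pandas as pd


def count_unordered_pairs(df, cols=("Source", "Target")):
    """Count rows per unordered pair; (1, 0) and (0, 1) count as the same pair.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Sort the two values within each row so both orderings look identical.
    # np.sort returns a new array, leaving df untouched.
    pairs = np.sort(df[list(cols)].to_numpy(), axis=1)
    # 'low' and 'high' are arbitrary labels for the smaller and larger node id.
    return pd.DataFrame(pairs, columns=["low", "high"]).value_counts()


df = pd.DataFrame({"Target": [0, 1, 2], "Source": [1, 0, 3], "Count": [1, 1, 1]})

counts = count_unordered_pairs(df)  # occurrences of each unordered pair
n_pairs = len(counts)               # number of distinct unordered pairs

print(counts)
print(n_pairs)
```

Here `counts` holds how often each unordered pair occurs, while `n_pairs` is the number of distinct pairs; for the sample data the pair (0, 1) occurs twice and there are two distinct pairs, so either reading of the question gives 2.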