I am trying to implement the so-called “concurrent softmax” function from the paper “Large-Scale Object Detection in the Wild from Imbalanced Multi-Labels”. Below is the definition of the concurrent softmax:

NOTE: I have left the (1 - r_ij) term out for the time being because I don’t think it applies to my problem, given that my training dataset has a