
I get the same output for a classifier algorithm with sklearn and pandas

Problem

I get the same output every time, regardless of the input.

Context

I have a .csv file in which each row lists the IDs of the 5 people on a previously formed team, like this:


My goal with the following code is to be able to input 4 IDs and get a prediction of what the 5th member should be.



Answer

Mainstream statistical machine learning assumes that it’s possible to predict an attribute of an object based on other observed attributes.

In the problem presented here, there are no such attributes. Each row represents a previously observed team, and each column holds the identifier of one team member. An ID is an arbitrary label rather than a measurable property of a person, so it is not clear what a classifier could learn from it or how we would build a model.


There’s an alternate way to frame this problem though: “Which people prefer to work together?” or “What frequent patterns exist in this data?” or “How do we expect each person to rate one another?”

Apriori is an algorithm that helps estimate which objects (here, team members) frequently appear together, and mlxtend provides an implementation:


The output lists itemsets and their support, which is the fraction of observed teams in which all members of the itemset appeared together.


For example, the output shows that user 2 appeared in 80% of the teams, and that users 1, 2, and 4 appeared together in 60% of them.
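Support can also be checked by hand: count the teams that contain every member of the itemset and divide by the total number of teams. The sketch below does this with the standard library only. The `teams` list is hypothetical data, chosen so that it reproduces the two figures quoted above, and `support` is a helper name introduced here for illustration.

```python
# Hypothetical data: each inner list is one previously formed team of 5 IDs.
teams = [
    [1, 2, 3, 4, 5],
    [1, 2, 4, 6, 7],
    [1, 2, 4, 8, 9],
    [2, 3, 5, 6, 10],
    [3, 7, 8, 9, 10],
]


def support(itemset, teams):
    """Fraction of teams that contain every ID in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for team in teams if wanted <= set(team))
    return hits / len(teams)


print(support({2}, teams))        # 0.8 (user 2 is in 4 of 5 teams)
print(support({1, 2, 4}, teams))  # 0.6 (users 1, 2, 4 share 3 of 5 teams)
```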

If we were trying to form groups in the future, we might sample from users who have worked with one another previously, and randomly add or remove people until everyone is on a team.
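One way to apply the same idea to the original goal of picking a fifth member for four given IDs is to rank the remaining people by how often they have shared a team with those four. This is a simple deterministic heuristic swapped in for illustration, not the random sampling described above and not part of mlxtend. The `teams` data and the function names `build_pair_counts` and `suggest_members` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each inner list is one previously formed team of 5 IDs.
teams = [
    [1, 2, 3, 4, 5],
    [1, 2, 4, 6, 7],
    [1, 2, 4, 8, 9],
    [2, 3, 5, 6, 10],
    [3, 7, 8, 9, 10],
]


def build_pair_counts(teams):
    """Count how many teams each unordered pair of IDs has shared."""
    counts = Counter()
    for team in teams:
        for a, b in combinations(sorted(set(team)), 2):
            counts[(a, b)] += 1
    return counts


def suggest_members(partial_team, teams, top_n=3):
    """Rank people outside `partial_team` by total co-occurrence with it."""
    pair_counts = build_pair_counts(teams)
    everyone = {member for team in teams for member in team}
    scores = {}
    for candidate in everyone - set(partial_team):
        scores[candidate] = sum(
            pair_counts[tuple(sorted((candidate, member)))]
            for member in partial_team
        )
    # Highest score first; ties are broken by the smaller ID.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


print(suggest_members([1, 2, 4, 6], teams))
```

Unlike a classifier trained on raw IDs, the ranking changes with the input because it is driven directly by who has worked with whom.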
