I have lists of phrases that I would like to convert into columns of a DataFrame, to be used as inputs for a machine learning model. The code should find the unique phrases across all rows of data, create one column per unique phrase, and show a 1 in a row if the phrase is present and a 0 if it is missing.
The phrases will look like the following:
{"TV", "Internet", "Wireless Internet", "Kitchen", "Free Parking on Premises",
 "Buzzer/Wireless Intercom", "Heating", "Family/Kid Friendly",
 "Washer,Dryer", "Smoke Detector", "Carbon Monoxide Detector",
 "First Aid Kit", "Safety Card", "Fire Extinguisher", "Essentials"
}

{"TV", "Internet", "Wireless Internet", "Air Conditioning", "Kitchen",
 "Pets Allowed", "Pets live on this property", "Dog(s)", "Heating",
 "Family/Kid Friendly", "Washer", "Dryer", "Smoke Detector",
 "Carbon Monoxide Detector", "Fire Extinguisher", "Essentials",
 "Shampoo", "Lock on Bedroom Door", "Hangers", "Hair Dryer", "Iron"
}
Desired output in the dataframe:
Answer
First create the columns for the DataFrame:
set1 = {"TV", "Internet", "Wireless Internet", "Kitchen", "Free Parking on Premises",
        "Buzzer/Wireless Intercom", "Heating", "Family/Kid Friendly",
        "Washer,Dryer", "Smoke Detector", "Carbon Monoxide Detector",
        "First Aid Kit", "Safety Card", "Fire Extinguisher", "Essentials"
        }

set2 = {"TV", "Internet", "Wireless Internet", "Air Conditioning", "Kitchen",
        "Pets Allowed", "Pets live on this property", "Dog(s)", "Heating",
        "Family/Kid Friendly", "Washer", "Dryer", "Smoke Detector",
        "Carbon Monoxide Detector", "Fire Extinguisher", "Essentials",
        "Shampoo", "Lock on Bedroom Door", "Hangers", "Hair Dryer", "Iron"
        }

# Create a list of iterables for later
list_of_sets = [set1, set2]

# Create a list with the "splat" operator, and then create a set from the list
columns = set([*set1, *set2])

# Optionally remove spaces, commas, etc
columns_optional = set([x.replace(" ", "").replace(",", "").replace("/", "") for x in columns])
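One caveat with the optional clean-up: if the column names are cleaned but the phrases in each row are not, the two no longer compare equal. A minimal sketch of one way around this, assuming the same three replacements are applied to both sides; `normalize` and `create_rows_normalized` are hypothetical helper names, not part of the answer's code:

```python
def normalize(phrase):
    """Drop spaces, commas and slashes, mirroring the optional clean-up step."""
    return phrase.replace(" ", "").replace(",", "").replace("/", "")


def create_rows_normalized(list_of_iterables):
    """Clean every phrase first, then build 0/1 rows against the cleaned columns."""
    cleaned = [{normalize(p) for p in iterable} for iterable in list_of_iterables]
    # Union of all cleaned phrases, sorted for a stable column order
    columns = sorted(set().union(*cleaned))
    rows = [{col: int(col in phrases) for col in columns} for phrases in cleaned]
    return columns, rows


# Small illustrative input (not the full amenity lists)
columns, rows = create_rows_normalized(
    [{"TV", "Washer,Dryer"}, {"TV", "Free Parking on Premises"}]
)
print(columns)
print(rows)
```

Note that cleaning can merge phrases that were distinct before (any two names that differ only by spaces, commas or slashes), so it is worth checking the column count before and after.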
Now to create the DataFrame rows:
import pandas as pd

def create_rows(list_of_iterables, columns):
    """Iterate through a list of iterables (i.e. sets of phrases)
    and record, for each column, whether the phrase is present."""
    list_of_df_rows = []
    for iterable in list_of_iterables:
        row_dict = {}
        for col in columns:
            # 1 if the phrase is in this row, otherwise 0
            row_dict[col] = 1 if col in iterable else 0
        list_of_df_rows.append(row_dict)
    return list_of_df_rows

# Create DataFrame rows
rows = create_rows(list_of_sets, columns)

# A set has no defined order (and newer pandas versions may reject it
# for `columns`), so pass a sorted list instead
df = pd.DataFrame(rows, columns=sorted(columns))

# Print a few of the columns
print(df[["Air Conditioning", "TV", "Free Parking on Premises", "Washer,Dryer"]])
#    Air Conditioning  TV  Free Parking on Premises  Washer,Dryer
# 0                 0   1                         1             1
# 1                 1   1                         0             0
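As an alternative to the explicit loops, the same 0/1 table can probably be built with pandas alone. This is a sketch of a different technique (`Series.explode` followed by `pd.get_dummies`), not the answer's method; `one_hot_phrases` is a hypothetical name, and it assumes a pandas version that provides `Series.explode` (0.25 or later):

```python
import pandas as pd


def one_hot_phrases(list_of_sets):
    """Return a DataFrame with one row per set and one 0/1 column per phrase."""
    # One list per row; sorting only makes the intermediate order deterministic
    phrases = pd.Series([sorted(s) for s in list_of_sets])
    # One phrase per line, with the original row index repeated
    exploded = phrases.explode()
    # One indicator column per distinct phrase
    dummies = pd.get_dummies(exploded)
    # Collapse back to one line per original row
    return dummies.groupby(level=0).max().astype(int)


# Small illustrative input (not the full amenity lists)
df = one_hot_phrases([{"TV", "Washer,Dryer"}, {"TV", "Iron"}])
print(df)
```

For larger datasets this avoids the nested Python loops, at the cost of being less explicit about what each step does.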