I have a filenames which I converted into a list. The list has the following elements:
JavaScript
x
8
1
list = ['15253_Variation.JPG',
2
'15253_Variation_Tainted.JPG',
3
'15253_Variation_O2_Saxophone.PNG',
4
'15253_Variation_O2_Saxophone.jpg',
5
'15253_Variation_O2_Saxophone_reference.png',
6
'15253_Variation_Side1.JPG',
7
'15253_Variation_Side2.JPG']
8
My goal is to extract elements from this list and fill out a dataframe, which should look like this:
LINK TO THE GOOGLE SHEETS CONTAINING THE IMAGE ABOVE: https://docs.google.com/spreadsheets/d/1kuX3M4RFCNWtNoE7Hm1ejxWMwF-Cs4p8SsjA3JzdidA/edit?usp=sharing
WHAT I’VE DONE SO FAR is the following code:
JavaScript
1
5
1
Obj = pd.DataFrame(data = list, index = None, columns = ['file'])
2
new_list = []
3
for i in Obj['file']:
4
new_list.append(i.split('_'))
5
But, this one does not leave empty spaces thus not doing what I needed.
Thank you very much in advanced.
Advertisement
Answer
As per number of the comments. It’s a pain because the tokens in the filename are not fully fixed format. Quite a lot of conditional logic
- have defined two additional lists
mi
instruments andoxygen
whatever it is. - first pass is in building
dict
that is pandas standard format - then work through conditional logic after have base data frame
JavaScript
1
46
46
1
# don't name it list - it override python list()!
2
l = ['15253_Variation.JPG',
3
'15253_Variation_Tainted.JPG',
4
'15253_Variation_O2_Saxophone.PNG',
5
'15253_Variation_O2_Saxophone.jpg',
6
'15253_Variation_O2_Saxophone_reference.png',
7
'15253_Variation_Side1.JPG',
8
'15253_Variation_Side2.JPG']
9
10
issues = ["Tainted","Perfect"]
11
mi = ["Saxophone"]
12
oxygen = ["O2"]
13
14
# first pass using dict/list comprehensions
15
df = pd.DataFrame({"filename":{i:f.split(".")[0] for i,f in enumerate(l)},
16
"Number":{i:f.split("_")[0] for i,f in enumerate(l)},
17
"Name":{i:f.split("_")[1].split(".")[0] for i,f in enumerate(l)},
18
"Location2":{},
19
"Issues":{}, "Oxygen":{},"Location":{}, "Musical Instrument":{},
20
"Ref":{},
21
"Extension":{i:f.split(".")[1] for i,f in enumerate(l)}})
22
23
df = df.assign(**{
24
# list of tokens for checking fixed lists against
25
"Tokens":lambda dfa: dfa.apply(lambda s: s["filename"].split("_")[2:], axis=1),
26
"Issues":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s, issues))[0][0]]
27
if np.isin(s, issues).any() else np.nan),
28
"Musical Instrument":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s, mi))[0][0]]
29
if np.isin(s, mi).any() else np.nan),
30
"Oxygen":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s, oxygen))[0][0]]
31
if np.isin(s, oxygen).any() else np.nan),
32
}).assign(**{
33
# let's do tokens again minus ones already placed
34
"Tokens":lambda dfa: dfa.apply(lambda s: [t for t in s["filename"].split("_")[2:]
35
if not(t==s["Issues"]
36
or t==s["Musical Instrument"]
37
or t==s["Oxygen"])], axis=1),
38
"Location2":lambda dfa: dfa.apply(lambda s: s["Tokens"][0] if len(s["Tokens"])>0
39
and "Side" in s["Tokens"][0] else np.nan, axis=1),
40
"Ref":lambda dfa: dfa.apply(lambda s: s["Tokens"][0] if len(s["Tokens"])>0
41
and "Side" not in s["Tokens"][0] else np.nan, axis=1)
42
43
}).drop(columns=["Tokens","filename"])
44
45
print(df.to_string(index=False))
46
output
JavaScript
1
9
1
Number Name Location2 Issues Oxygen Location Musical Instrument Ref Extension
2
15253 Variation NaN NaN NaN NaN NaN NaN JPG
3
15253 Variation NaN Tainted NaN NaN NaN NaN JPG
4
15253 Variation NaN NaN O2 NaN Saxophone NaN PNG
5
15253 Variation NaN NaN O2 NaN Saxophone NaN jpg
6
15253 Variation NaN NaN O2 NaN Saxophone reference png
7
15253 Variation Side1 NaN NaN NaN NaN NaN JPG
8
15253 Variation Side2 NaN NaN NaN NaN NaN JPG
9