I have a list of lists of word groups in Turkish. I want to apply stemming, and I found the turkishnlp package. Although it has some shortcomings, it often returns the right stem. However, when I apply it to the list, I don't want the structure of my list to change, and I want the words it doesn't recognize to stay the same.
For example, I have this list: mylist = [['yolda', 'gelirken', 'kopek', 'gördüm'], ['cok', 'tatlıydı']]
And I wrote this function:
    from trnlp import TrnlpWord

    def tr_stemming(x):
        obj = TrnlpWord()
        obj.setword(x) if isinstance(x, str) else type(x)(map(tr_stemming, x))
        return obj.get_stem if isinstance(x, str) else type(x)(map(tr_stemming, x))
This function returns this list:
    tr_stemming(mylist)
    [['yol', 'gelir', '', 'gör'], ['', 'tatlı']]
However, I want to get this as the output: [['yol', 'gelir', 'kopek', 'gör'], ['cok', 'tatlı']]
How can I update my function? Thank you for your help!
Answer
IIUC, you could modify your function to:
    from trnlp import TrnlpWord

    def tr_stemming(x):
        if isinstance(x, str):
            obj = TrnlpWord()
            obj.setword(x)
            stem = obj.get_stem
            # An empty stem means the word was not recognized: keep the original.
            return stem if stem else x
        elif isinstance(x, list):
            # Recurse into nested lists so the structure is preserved.
            return [tr_stemming(e) for e in x]

    out = tr_stemming(mylist)
Output:

    [['yol', 'gelir', 'kopek', 'gör'], ['cok', 'tatlı']]
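If you want to check the fallback logic without the stemmer installed, here is a minimal sketch of the same idea with the stemmer passed in as a function. The names `map_stems`, `toy_stem`, and `TOY_STEMS` are hypothetical and not part of trnlp; the toy lookup table only stands in for the real stemmer and returns an empty string for unknown words, as in the question's output.

```python
from typing import Callable, Union

Nested = Union[str, list, tuple]


def map_stems(x: Nested, stem_fn: Callable[[str], str]) -> Nested:
    """Apply stem_fn to every string in a nested list/tuple, keeping the shape."""
    if isinstance(x, str):
        stem = stem_fn(x)
        # An empty (or None) result is treated as "unknown word": keep the input.
        return stem if stem else x
    if isinstance(x, (list, tuple)):
        # Rebuild the same container type so the nesting is unchanged.
        return type(x)(map_stems(e, stem_fn) for e in x)
    raise TypeError(f"unsupported element type: {type(x).__name__}")


# Hypothetical stand-in for the real stemmer, used only for this demonstration.
TOY_STEMS = {"yolda": "yol", "gelirken": "gelir", "gördüm": "gör", "tatlıydı": "tatlı"}


def toy_stem(word: str) -> str:
    # Returns "" for words it does not know.
    return TOY_STEMS.get(word, "")


mylist = [["yolda", "gelirken", "kopek", "gördüm"], ["cok", "tatlıydı"]]
result = map_stems(mylist, toy_stem)
print(result)
```

With trnlp, you would pass a small wrapper that calls `setword` and reads `get_stem`, as in the function above, instead of `toy_stem`. Raising `TypeError` for other input types is a design choice of this sketch; the function above silently returns `None` in that case.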