Is there any way to get city names from the following raw strings without heavy iteration?
'nSegelbåt ntttttttttnttttttttntttttttttnttttt tttttntttttttttttnttttttttttttÄlvsborgntttttttt ttttntttttttttttnttttttttttntttttttntttttttnttttttnttttt' 'nSegelbåt ntttttttttnttttttttntttttttttntttt ttttttntttttttttttnttttttttttttÄlvsborgnttt tttttttttntttttttttttnttttttttttntttttttntttttttnttttttnttttt', 'nButiknSegelbåt ntttttttttnttttttttnttttttttt nttttttttttntttttttttttnttttttttttttStockholmntttt ttttttttntttttttttttnttttttttttntttttttnt ttttttnttttttnttttt'
I need to get Älvsborg, Stockholm, etc., that is, the names of the cities and towns. The names will differ, of course.
The function is already heavy with iterations, so built-in or add-on functions/methods are preferable.
It is also possible to get them in the following format:
SegelbåtÄlvsborg ButikSegelbåtStockholm ButikSegelbåtStockholm SegelbåtJönköping SegelbåtGöteborg ButikSegelbåtGöteborg ButikSegelbåtGöteborg SegelbåtSkaraborg SegelbåtStockholm SegelbåtStockholm SegelbåtHalland SegelbåtStockholm ButikSegelbåtHelsingborg SegelbåtStockholm ButikSegelbåtKalmar SegelbåtGöteborg ButikSegelbåtGöteborg ButikSegelbåtÖstergötland ButikSegelbåt ButikSegelbåtGöteborg ButikSegelbåtGöteborg ButikSegelbåtGöteborg ButikSegelbåtGöteborg SegelbåtStockholm ButikSegelbåtHelsingborg SegelbåtKalmar SegelbåtGöteborg
which doesn't make the job any easier.
Thank you!
P.S. I can separate the letters from the whitespace escape characters like this inside a for loop:
letters = ''.join(filter(lambda x: not x.isspace(), place.get_text()))
But after that I still need to separate the city names somehow…
Answer
You can just use str.split:
In [1]: s = 'nSegelbåt ntttttttttnttttttttntttttttttnttttttttttntttttttttttnttttttttttttÄlvsborgntttttttt ... ttttntttttttttttnttttttttttntttttttntttttttnttttttnttttt'
In [2]: s.split()  # when called with no argument it splits on all whitespace
Out[2]: ['Segelbåt', 'Älvsborg']
The city name seems to be the last element:
In [3]: s.split()[-1]
Out[3]: 'Älvsborg'
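To apply the same idea to a whole list of strings, something like the following sketch might work. It assumes the `n` and `t` runs in the question are really newline (`\n`) and tab (`\t`) characters; the sample strings are shortened stand-ins and `last_token` is a hypothetical helper name, not something from the original code.

```python
# Shortened stand-ins for the raw strings in the question, assuming the
# whitespace is made of real newline and tab characters.
raw_strings = [
    "\nSegelbåt \n\t\t\t\n\t\t\tÄlvsborg\n\t\t\t\n\t\t",
    "\nButik\nSegelbåt \n\t\t\t\n\t\t\tStockholm\n\t\t\n\t",
]


def last_token(text):
    """Return the last whitespace-separated token, or None if there is none."""
    tokens = text.split()  # no argument: split on any run of whitespace
    return tokens[-1] if tokens else None


cities = [last_token(s) for s in raw_strings]
print(cities)  # ['Älvsborg', 'Stockholm']
```

Note that this relies on the city always being the last token. For an entry without a city (the question's list contains a bare ButikSegelbåt), the last token would be the category instead, and a city name containing a space would be cut to its final word.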
It looks like you're parsing HTML with BeautifulSoup. You may find it easier to select the proper elements directly instead of parsing what .get_text() produces.