
Filter raw string with escaped symbols, Python

Is there any way to get city names from the following raw strings without heavy iteration?

'nSegelbåt ntttttttttnttttttttntttttttttnttttt
tttttntttttttttttnttttttttttttÄlvsborgntttttttt
ttttntttttttttttnttttttttttntttttttntttttttnttttttnttttt'

'nSegelbåt ntttttttttnttttttttntttttttttntttt
ttttttntttttttttttnttttttttttttÄlvsborgnttt
tttttttttntttttttttttnttttttttttntttttttntttttttnttttttnttttt', 

'nButiknSegelbåt ntttttttttnttttttttnttttttttt
nttttttttttntttttttttttnttttttttttttStockholmntttt
ttttttttntttttttttttnttttttttttntttttttnt
ttttttnttttttnttttt'

I need to get Älvsborg, Stockholm, etc., that is, the names of cities and towns. The names will vary, of course.

The function is already heavy with iterations, so built-in or library functions/methods are preferable.

It is also possible to get them in the following format:

SegelbåtÄlvsborg
ButikSegelbåtStockholm
ButikSegelbåtStockholm
SegelbåtJönköping
SegelbåtGöteborg
ButikSegelbåtGöteborg
ButikSegelbåtGöteborg
SegelbåtSkaraborg
SegelbåtStockholm
SegelbåtStockholm
SegelbåtHalland
SegelbåtStockholm
ButikSegelbåtHelsingborg
SegelbåtStockholm
ButikSegelbåtKalmar
SegelbåtGöteborg
ButikSegelbåtGöteborg
ButikSegelbåtÖstergötland
ButikSegelbåt
ButikSegelbåtGöteborg
ButikSegelbåtGöteborg
ButikSegelbåtGöteborg
ButikSegelbåtGöteborg
SegelbåtStockholm
ButikSegelbåtHelsingborg
SegelbåtKalmar
SegelbåtGöteborg

which doesn’t make the job any easier.

Thank you!

P.S. I can separate the letters from the escaped whitespace characters like this inside a for loop:

# keep every character of the element's text that is not whitespace
letters = ''.join(filter(lambda x: not x.isspace(),
                         place.get_text()))

After that, I still need to separate the city names somehow…


Answer

You can just use str.split:

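In [1]: s = ('\nSegelbåt \n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t'
   ...:      '\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\tÄlvsborg\n\t\t\t\t\t\t\t\t\t\t\t\t'
   ...:      '\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t')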

In [2]: s.split()  # when called with no argument it splits on all whitespace
Out[2]: ['Segelbåt', 'Älvsborg']

The city name seems to be the last element:

In [3]: s.split()[-1]
Out[3]: 'Älvsborg'
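
Applied to a whole batch of strings, the same idea might look like the sketch below. The `extract_city` helper and its `known_labels` parameter are illustrative names, not something from the question. The sketch assumes the city is a single token with no internal spaces, and that rows with no city (such as the bare `ButikSegelbåt` entry in the question) should give `None`. The sample strings are shortened versions of the ones above.

```python
def extract_city(raw, known_labels=("Butik", "Segelbåt")):
    """Return the last whitespace-separated token, or None if it is only a label."""
    tokens = raw.split()  # splits on any run of spaces, tabs and newlines
    if not tokens or tokens[-1] in known_labels:
        return None
    return tokens[-1]


samples = [
    "\nSegelbåt \n\t\t\t\n\t\t\tÄlvsborg\n\t\t\t\n\t\t",
    "\nButik\nSegelbåt \n\t\t\t\n\t\t\tStockholm\n\t\t\t\n\t\t",
    "\nButik\nSegelbåt \n\t\t\t\n\t\t",  # no city present
]

cities = [extract_city(s) for s in samples]
print(cities)  # ['Älvsborg', 'Stockholm', None]
```

A city name made of several words would be cut down to its last word here, so this only fits data shaped like the samples.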

It looks like you’re parsing HTML with BeautifulSoup. You may find it easier to select the proper elements directly instead of parsing what .get_text() produces.
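
As a rough illustration of selecting elements directly, here is a sketch against made-up markup. The question does not show the real page structure, so the `div.place` and `span.city` names are purely hypothetical. `get_text(strip=True)` and `.stripped_strings` are both part of BeautifulSoup and remove the surrounding whitespace for you.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's tags and class names are unknown.
html = """
<div class="place">
    Segelbåt
    <span class="city">
        Älvsborg
    </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
place = soup.find("div", class_="place")

# Option 1: select the element that holds the city and strip its text.
city = place.find("span", class_="city").get_text(strip=True)

# Option 2: iterate over the text fragments with whitespace already stripped.
parts = list(place.stripped_strings)

print(city)   # Älvsborg
print(parts)  # ['Segelbåt', 'Älvsborg']
```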

User contributions licensed under: CC BY-SA