Replace special characters in a string in Python

I am using urllib to get a string of HTML from a website and need to put each word in the HTML document into a list.

Here is the code I have so far. I keep getting an error. I have also copied the error below.

import urllib.request

url = input("Please enter a URL: ")

z=urllib.request.urlopen(url)
z=str(z.read())
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?|`~-=_+", " ")

words = removeSpecialChars.split()

print ("Words list: ", words[0:20])

Here is the error.

Please enter a URL: http://simleyfootball.com
Traceback (most recent call last):
  File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module>
    removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?|`~-=_+", " ")
TypeError: replace() takes at least 2 arguments (1 given)

Answer

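str.replace is the wrong function for what you want to do, and it is also being called incorrectly: it is invoked on the str class rather than on your string z, so the character list is taken as the string to operate on and only one of the two required arguments (old and new) is supplied, hence the TypeError. Even when called correctly, replace substitutes the whole sequence of characters, as one substring, with a single space. You want to replace each character of the set with a space. You can use translate like this: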

removeSpecialChars = z.translate({ord(c): " " for c in "!@#$%^&*()[]{};:,./<>?|`~-=_+"})

The dictionary comprehension builds a mapping from the code point of each special character (ord(c)) to a space. translate() then applies that mapping to z, replacing every occurrence of any of those characters with a space.
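
To see the difference between the two methods without fetching a page, here is a minimal sketch that runs on a short sample string. The names SPECIAL_CHARS, TABLE, split_words and sample are made up for illustration, and str.maketrans is used only as an equivalent way of building the same mapping as the dictionary comprehension above.

```python
# The same character set as in the question.
SPECIAL_CHARS = "!@#$%^&*()[]{};:,./<>?|`~-=_+"

# str.maketrans builds a {code point: code point} table, equivalent to
# {ord(c): " " for c in SPECIAL_CHARS}.
TABLE = str.maketrans(SPECIAL_CHARS, " " * len(SPECIAL_CHARS))


def split_words(text):
    """Replace each special character with a space, then split on whitespace."""
    return text.translate(TABLE).split()


# Hypothetical sample standing in for the downloaded HTML.
sample = "<p>Hello, world! Go-team (2014).</p>"

# replace() searches for the whole 29-character sequence as one substring,
# which does not occur in the sample, so the text comes back unchanged.
unchanged = sample.replace(SPECIAL_CHARS, " ")

words = split_words(sample)
print(words)  # ['p', 'Hello', 'world', 'Go', 'team', '2014', 'p']
```

Note that tag names such as p survive as words, since only the punctuation is removed. Also, in the question's code, str(z.read()) produces the printed form of a bytes object (starting with b' and containing escapes such as \n). Decoding instead, for example z.read().decode("utf-8"), would likely give cleaner input, assuming the page is UTF-8 encoded.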
