I am using urllib to fetch a website's HTML as a string, and I need to put each word in the HTML document into a list.
Here is the code I have so far. I keep getting an error, which I have copied below.
    import urllib.request
    url = input("Please enter a URL: ")
    z=urllib.request.urlopen(url)
    z=str(z.read())
    removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?|`~-=_+", " ")
    words = removeSpecialChars.split()
    print ("Words list: ", words[0:20])
Here is the error.
    Please enter a URL: http://simleyfootball.com
    Traceback (most recent call last):
      File "C:Usersjeremy.KLUGMy DocumentsLiClipse WorkspacePython Project 2Module2.py", line 7, in <module>
        removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?|`~-=_+", " ")
    TypeError: replace() takes at least 2 arguments (1 given)
Answer
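str.replace is the wrong function for what you want to do, quite apart from being called incorrectly. The error occurs because replace is invoked on the str class itself rather than on your string z, so the first argument is taken as the string to operate on and only one real argument is left. Even called correctly, as z.replace(old, new), it would replace occurrences of the whole sequence of special characters with a single space, whereas you want to replace each individual character in the set with a space. You can use translate like this: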
    removeSpecialChars = z.translate({ord(c): " " for c in "!@#$%^&*()[]{};:,./<>?|`~-=_+"})
This creates a mapping which maps every character in your list of special characters to a space, then calls translate() on the string, replacing every single character in the set of special characters with a space.
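As a minimal sketch of the whole approach, the example below builds the mapping once and applies it to a small sample. The names SPECIAL_CHARS, TABLE, and split_words, and the sample bytes in raw, are made up for illustration and stand in for the data returned by z.read(). It also decodes the bytes instead of calling str() on them, since in Python 3 str() on a bytes object produces its repr, including the leading b' and escape sequences. UTF-8 is an assumption here; the real page may use a different encoding.

```python
# Characters to treat as word separators (taken from the question).
SPECIAL_CHARS = "!@#$%^&*()[]{};:,./<>?|`~-=_+"

# translate() expects a mapping from code points (ints) to replacements.
TABLE = {ord(c): " " for c in SPECIAL_CHARS}


def split_words(text):
    """Replace every special character with a space, then split on whitespace."""
    return text.translate(TABLE).split()


# Hypothetical sample standing in for the bytes returned by z.read().
raw = b"<p>Hello, world! Go-team</p>"

# Decode the bytes; str(raw) would instead give the repr "b'<p>Hello, ...'".
# UTF-8 is assumed for this sample.
html = raw.decode("utf-8")

# For contrast: replace() only matches the exact substring ", " as a whole.
replaced = html.replace(", ", " ")

words = split_words(html)
print("Words list: ", words[0:20])
```

Note that tag names such as p end up in the list as well, because only the punctuation around them is removed. Getting just the visible text would need an HTML parser, which is outside what this sketch covers.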