How to change names of scraped images with Python?

I need to download the image of every coin listed on CoinGecko, so I wrote the following code:

import requests
from bs4 import BeautifulSoup
from os.path import basename

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata("https://www.coingecko.com/en")
soup = BeautifulSoup(htmldata, 'html.parser')
for item1 in soup.select('.coin-icon img'):
    link = item1.get('data-src').replace('thumb', 'thumb_2x')
    with open(basename(link), "wb") as f:
        f.write(requests.get(link).content)

However, I need each image saved under the ticker of its coin as shown in that CoinGecko list (bitcoin.png?1547033579 should become BTC.png, ethereum.png?1595348880 should become ETH.png, and so forth). There are over 7000 images to rename, and many of them have quite unique names, so slicing the file name does not work here.

What is the way to do it?

Answer

Looking at the page's HTML, I found that the img tag you are selecting has an alt attribute whose value ends with the ticker in parentheses:

<div class="coin-icon mr-2 center flex-column">
<img class="" alt="bitcoin (BTC)" data-src="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" data-srcset="https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579 2x" src="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" srcset="https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579 2x">
</div>

So we can use that to get the correct name like so:

import requests
from bs4 import BeautifulSoup

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata("https://www.coingecko.com/en")
soup = BeautifulSoup(htmldata, 'html.parser')
for item1 in soup.select('.coin-icon img'):
    link = item1.get('data-src').replace('thumb', 'thumb_2x')
    # alt looks like "bitcoin (BTC)"; keep only the text between the parentheses
    raw_name = item1.get('alt')
    name = raw_name[raw_name.find('(') + 1:-1]
    # Append the extension so the file is saved as BTC.png rather than BTC
    with open(name + ".png", "wb") as f:
        f.write(requests.get(link).content)

We are basically extracting the value between the parentheses using string slicing: find('(') + 1 is the index just after the opening parenthesis, and -1 drops the closing one.
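
If some alt values turn out not to follow the "name (TICKER)" pattern, the slice will quietly produce a wrong name. Below is a minimal sketch of a more defensive variant, assuming the alt text looks like the sample above. The helper names ticker_from_alt and filename_for are hypothetical, not part of any library, and the ".png" fallback is an assumption for URLs without an extension.

```python
import re
from os.path import splitext
from urllib.parse import urlparse

# Matches the last parenthesised group at the end of the alt text, e.g. "(BTC)".
_TICKER_RE = re.compile(r"\(([^()]+)\)\s*$")

def ticker_from_alt(alt):
    """Return the ticker from alt text such as 'bitcoin (BTC)', or None."""
    if not alt:
        return None
    match = _TICKER_RE.search(alt)
    return match.group(1).strip() if match else None

def filename_for(alt, link):
    """Build '<TICKER><ext>', taking the extension from the image URL path."""
    ticker = ticker_from_alt(alt)
    if ticker is None:
        return None
    # urlparse drops the "?1547033579" query string before we read the extension
    ext = splitext(urlparse(link).path)[1] or ".png"  # assumed fallback
    return ticker + ext
```

In the loop, you could call filename_for(item1.get('alt'), link) and skip the image when it returns None, instead of slicing the alt text directly.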

Answer