I want to replace the text in the HTML code with my own

I am using the lxml and BeautifulSoup libraries. My goal is to translate the text of specific tags in an HTML page and replace the original text of those tags with the translated text.

I want to loop over the elements matched by a specific XPath and insert the translated text into each one in turn. The HTML should then be returned with the translated text in place.

from bs4 import BeautifulSoup, NavigableString, Tag
import requests
import time
import pandas as pd
import translators as ts
import json
import numpy as np
import regex
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from lxml import html
import time
import lxml.html



#r=requests.get(input('Enter the URL of your HTML page:\n'))
r=requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup=BeautifulSoup(r.text, 'html.parser')
page=r.content
element = html.fromstring(page)




try:
    articles=[]
    for item in element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]'):  

        texts=item.text_content()
        #texts=texts.split('"',100)
        #articles.append(item.text_content())
        articles.append(texts)
        translated_articles=[]
        for text in articles:
            print(text)
            output=ts.google(text, from_language='en', to_language='ro')
            translated_articles.append(output)
            
            for i,z in zip(translated_articles,soup.find_all('p', attrs={'class':'text_obisnuit'})):
                var=z.string
                var.replace_with(var.replace(var, i))

    
    #print(soup)

except Exception as e:
    print(e)

I am not getting all of the text from this XPath:

element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]')

The output I am getting:

Everything in Kevin Lomax's life changed after he was recruited by the most powerful law firm in the world, "Milton, Chadwick & Waters". Despite the fact that his mother was not agree, he accepted to provide his services of a professional lawyer to this company headed by none other than John Milton, a very powerful man with a very strange personality, which has aroused some suspicion since their first meeting.
If you saw the movie "The Devil's Advocate (1997)", perhaps you remember the end. Milton proposes to Kevin to take over his company, promising that he will have everything in the world, but with a single price - to sell his soul. But Kevin was hiding virtues that Milton did not believe that he has them.
AttributeError: 'NoneType' object has no attribute 'replace_with'

I want to extract the text of every p tag with class="text_obisnuit" using the XPath above, translate it with the translators library, and return the whole HTML with the translated text inside those p tags.

Note:

There should be a loop that inserts the translated text into all of these tags, so that each tag receives its own translated text.

I cannot explain it any better. Could anyone please guide me?

Answer

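Do you need to replace? Can't you simply set the string/content of each tag to the translation?

Also, you are running some unnecessary loops here, and you would need to fix your indentation: the for i,z loop should be two levels up. The AttributeError appears because z.string returns None, which BeautifulSoup does when a tag has more than one child (for example text mixed with nested tags) or no children at all.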
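To make the cause of the error concrete, here is a minimal sketch of how .string behaves. The markup is made up for illustration; only the class name comes from the question.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; only the class name comes from the question.
html_doc = (
    '<p class="text_obisnuit">Plain text only.</p>'
    '<p class="text_obisnuit">Text with <em>inline</em> markup.</p>'
)
soup = BeautifulSoup(html_doc, "html.parser")
plain, mixed = soup.find_all("p", class_="text_obisnuit")

# .string is only set when the tag has exactly one child.
plain_before = plain.string   # the text node of the first <p>
mixed_before = mixed.string   # None: this <p> holds text, <em>, and more text
print(repr(plain_before), repr(mixed_before))

# get_text() (or .text) works in both cases.
print(mixed.get_text())

# Assigning to .string replaces all existing children with one text node.
mixed.string = "Replacement text."
print(mixed)
```

Calling replace_with on the None returned for the second paragraph is what raises the AttributeError, whereas assigning to .string works regardless of what the tag contained.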

Try this:

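import requests
import translators as ts
from bs4 import BeautifulSoup

r = requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup = BeautifulSoup(r.text, 'html.parser')

# Collect every <p class="text_obisnuit"> once, then translate each one in place.
articles = soup.find_all('p', {'class': 'text_obisnuit'})

try:
    for item in articles:
        original_text = item.text
        # ts.google is the call used in the question; whether it is available
        # depends on the installed version of the translators package.
        translated_output = ts.google(original_text, from_language='en', to_language='ro')

        # Assigning to .string replaces all of the tag's children with the new text.
        item.string = translated_output
except Exception as e:
    print(e)

# To see that it was changed
for item in articles:
    print(item)

# The whole document, with the translated paragraphs in place
translated_html = str(soup)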
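One side effect of assigning to .string is that any inline tags inside the paragraph (links, emphasis) are discarded. If those need to survive, a possible variant is to translate each text node separately. The sketch below is only an illustration under assumptions: translate_paragraphs and fake_translate are hypothetical names, and fake_translate just upper-cases the text so the example runs offline. In real use you would pass a function that wraps the translation call.

```python
from bs4 import BeautifulSoup


def fake_translate(text):
    """Hypothetical stand-in for a real translation call; it only upper-cases."""
    return text.upper()


def translate_paragraphs(html_doc, translate, css_class="text_obisnuit"):
    """Translate the text nodes of every <p class=css_class>, keeping inline tags."""
    soup = BeautifulSoup(html_doc, "html.parser")
    for p in soup.find_all("p", class_=css_class):
        # find_all(string=True) returns a list of the text nodes, so replacing
        # nodes while looping over that list is safe.
        for node in p.find_all(string=True):
            if node.strip():  # skip whitespace-only nodes
                node.replace_with(translate(str(node)))
    return str(soup)


# Made-up sample markup for illustration.
sample = (
    '<div><p class="text_obisnuit">Hello <a href="/x">world</a>!</p>'
    '<p>skip me</p></div>'
)
result = translate_paragraphs(sample, fake_translate)
print(result)
```

The trade-off is that the translator sees sentence fragments rather than whole paragraphs, which may lower translation quality, so the simpler .string assignment is probably the better choice when the paragraphs contain plain text only.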

Answer