I am using the lxml and BeautifulSoup libraries. My goal is to translate the text of specific tags within a full HTML page: I want to replace the text of those tags with the translated text.
I want to loop over the elements matched by a specific XPath and insert the translated text into each one in turn, then return the HTML code with the translated version.
from bs4 import BeautifulSoup, NavigableString, Tag
import requests
import time
import pandas as pd
import translators as ts
import json
import numpy as np
import regex
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from lxml import html
import time
import lxml.html

#r=requests.get(input('Enter the URL of your HTML page:\n'))
r=requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup=BeautifulSoup(r.text, 'html.parser')
page=r.content
element = html.fromstring(page)
try:
    articles=[]
    for item in element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]'):
        texts=item.text_content()
        #texts=texts.split('"',100)
        #articles.append(item.text_content())
        articles.append(texts)
        translated_articles=[]
        for text in articles:
            print(text)
            output=ts.google(text, from_language='en', to_language='ro')
            translated_articles.append(output)
            for i,z in zip(translated_articles,soup.find_all('p', attrs={'class':'text_obisnuit'})):
                var=z.string
                var.replace_with(var.replace(var, i))
    #print(soup)
except Exception as e:
    print(e)
I am not getting the whole text from this xpath.
element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]')
The output I am getting:
Everything in Kevin Lomax's life changed after he was recruited by the most powerful law firm in the world, "Milton, Chadwick & Waters". Despite the fact that his mother was not agree, he accepted to provide his services of a professional lawyer to this company headed by none other than John Milton, a very powerful man with a very strange personality, which has aroused some suspicion since their first meeting. If you saw the movie "The Devil's Advocate (1997)", perhaps you remember the end. Milton proposes to Kevin to take over his company, promising that he will have everything in the world, but with a single price - to sell his soul. But Kevin was hiding virtues that Milton did not believe that he has them.
AttributeError: 'NoneType' object has no attribute 'replace_with'
I want to extract the text of every p tag with the attribute class="text_obisnuit" using the XPath above, translate it with the translators library, and get back the whole HTML code with the translated text placed inside those same p tags.
###NOTE:###
The translated text should be inserted with a loop, so that each tag receives its own text after translation.
I cannot explain it any better. Could anyone guide me, please?
Answer
Do you need to replace? Can't you simply set the string/content to the translation?
Also, you are doing some unnecessary loops here. You would also need to fix your indentation: what you want is for the for i,z loop to be 2 levels up.
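As for the AttributeError in the question: in BeautifulSoup, a tag's .string is None whenever the tag has more than one child, for example text mixed with nested tags, so calling replace_with on it fails. That is a likely cause here, assuming some of the paragraphs on the page contain inline markup (I have not checked the page itself). A minimal sketch with made-up HTML:

```python
from bs4 import BeautifulSoup

# Made-up sample markup; only the class name comes from the question.
html_doc = (
    '<p class="text_obisnuit">Plain text only.</p>'
    '<p class="text_obisnuit">Text with <em>nested</em> markup.</p>'
)
soup = BeautifulSoup(html_doc, "html.parser")
simple, mixed = soup.find_all("p", class_="text_obisnuit")

# One text child: .string returns that text node.
simple_string = simple.string

# Several children (text, <em>, text): .string is None,
# so mixed.string.replace_with(...) would raise AttributeError.
mixed_string = mixed.string

# get_text() (also available as .text) joins all nested text instead.
mixed_text = mixed.get_text()

print(repr(simple_string), repr(mixed_string), repr(mixed_text))
```

Reading with get_text() and writing with an assignment to .string avoids the None case, because the assignment replaces whatever the tag contained.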
Try this:
import requests
import translators as ts
from bs4 import BeautifulSoup

r = requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup = BeautifulSoup(r.text, 'html.parser')
try:
    articles = soup.find_all('p', {'class': "text_obisnuit"})
    for item in articles:
        original_text = item.text
        #print(original_text)
        translated_output = ts.google(original_text, from_language='en', to_language='ro')
        print(item)
        # Assigning to .string replaces the whole content of the tag
        item.string = translated_output
except Exception as e:
    print(e)

# To see that it was changed
for item in articles:
    print(item)

translated_html = str(soup)
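If you want to try the loop without hitting the network or the translation service, here is a rough offline sketch of the same idea. The function name translate_paragraphs, the sample HTML and the use of str.upper as a stand-in translator are all my own placeholders, not part of any library; in real use you would pass a function that wraps your translators call.

```python
from bs4 import BeautifulSoup


def translate_paragraphs(html_text, translate, css_class="text_obisnuit"):
    """Return html_text with the text of each matching <p> replaced.

    translate is any callable str -> str (hypothetical hook; a real one
    would wrap the translation call from the answer above).
    """
    soup = BeautifulSoup(html_text, "html.parser")
    for p in soup.find_all("p", class_=css_class):
        original = p.get_text()
        if original.strip():  # skip empty paragraphs
            # Replaces all children of <p>, nested tags included
            p.string = translate(original)
    return str(soup)


# Made-up sample; str.upper stands in for the translator.
sample = (
    '<div><p class="text_obisnuit">Hello <b>world</b></p>'
    '<p class="other">Untouched</p></div>'
)
result = translate_paragraphs(sample, str.upper)
print(result)
```

Note the trade-off: because the whole paragraph is translated as one string, any inline tags inside it (bold, links and so on) are dropped from the output. Paragraphs with other classes are left alone.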