Word count from different URLs in Python

I have the following code, which gives me these columns: Authors, Date, Blog name, Link, and Blog category.

To enhance this, I want to add the word count of the article and of the author's "about" section, as two separate columns.

The columns I am trying to produce are: Authors, Date, Blog name, Link, Blog category, Description count, About count.

Example, for the first article: https://www.bartonassociates.com/blog/updated-can-an-np-do-that-infographic

I want the number of words from “Happy” to “today” as my “description count” (and the same idea for the “about count”).

Tried Solution

I was able to loop over each of the links in the ‘new’ variable and get the required text for “div.cf > p”. However, the result is the text for all of the links together. How can I map each paragraph to its respective link?

Even if I add “txt = p.get_text(strip=True)”, I only get the last article’s author bio and not all of the information.

Answer

“Example: For the 1st article”

You’re not fetching the HTML for that article anywhere.

If I understand correctly, you have a list of links, but you want the content of the articles those links point to.

Therefore, you need to make a new request, and build a new parser, for each of those links.

I suggest you define a function that accepts a link to an “article” you want to parse and returns the data you expect. Then test it explicitly, using the link given in your post and a few other articles.
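A minimal sketch of such a function, assuming requests and BeautifulSoup are installed. The names extract_paragraphs and parse_article are made up for illustration, and the “div.cf > p” selector is simply the one from the question; whether it matches the description, the author bio, or both on the real pages is something you would need to check.

```python
import requests
from bs4 import BeautifulSoup


def extract_paragraphs(html, selector="div.cf > p"):
    """Return the text of every element matching `selector` in `html`.

    The default selector is the one quoted in the question (an assumption
    about the page layout, not something verified against the site).
    """
    soup = BeautifulSoup(html, "html.parser")
    # A " " separator keeps words in nested tags from being glued together.
    return [p.get_text(" ", strip=True) for p in soup.select(selector)]


def parse_article(link):
    """Fetch a single article page and return its paragraph texts."""
    response = requests.get(link, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_paragraphs(response.text)
```

Keeping the parsing in its own function means it can be tested on a saved HTML string without touching the network. Calling parse_article once per link, and storing the result next to that link, is what keeps each article's paragraphs tied to the right row.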

From there, see, for example, “Python – Count number of words in a list strings”.
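As a rough illustration of that counting step, the sketch below splits each string on whitespace and sums the lengths, then keys the result by link. The links and paragraph texts are made-up placeholders, and count_words is a hypothetical helper name.

```python
def count_words(texts):
    """Total number of whitespace-separated words across a list of strings."""
    return sum(len(text.split()) for text in texts)


# Hypothetical data: each link mapped to the paragraphs scraped from it.
paragraphs_by_link = {
    "https://example.com/blog/a": ["Happy New Year to all", "Read more today"],
    "https://example.com/blog/b": ["One short bio"],
}

# One count per link, so each count lines up with its own row.
counts = {link: count_words(paras) for link, paras in paragraphs_by_link.items()}
```

Splitting on whitespace is a simple definition of a “word”; punctuation attached to a word is counted as part of it.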


It is worth mentioning that the site may block you if you make lots of requests (for example, requesting every article in the list in quick succession), and there is little that can be done about that.
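One common courtesy that may reduce the chance of being blocked, though it will not undo a block, is to pause between requests. The sketch below assumes a hypothetical fetch callable (any function that takes a link and returns data), and the one-second default delay is an arbitrary choice.

```python
import time


def fetch_all(links, fetch, delay_seconds=1.0):
    """Call `fetch` on each link in turn, pausing between calls.

    `fetch` is any callable taking a link and returning its data.
    Results are keyed by link so each one stays tied to its article.
    """
    results = {}
    for i, link in enumerate(links):
        if i > 0:
            time.sleep(delay_seconds)  # wait before every request after the first
        results[link] = fetch(link)
    return results
```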

User contributions licensed under: CC BY-SA