Map different column values with website context

I have a dataframe like this:

JavaScript

I want to map the column values to their descriptions from this site: https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/system-stored-procedures-transact-sql?view=sql-server-ver15

For example, the value EXEC sp_droplogin should map to the description found here: https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-droplogin-transact-sql?view=sql-server-ver15

The output would then look like this:

JavaScript

And the same must be done with the other column values.

What is the best way to perform this? With BeautifulSoup?

Can you provide some ideas/direction/code etc?

Answer

You could call a function for each index entry and replace it with the result of a lookup made with requests and BeautifulSoup:

JavaScript

This would change your dataframe as follows:

JavaScript

The function works as follows:

  1. Take a value such as EXEC sp_droplogin and split it on the space. Then take the second part, sp_droplogin, and replace any _ with -, as the URL requires.

  2. Build a suitable URL from that name.

  3. Use requests.get() to obtain the corresponding HTML from the Microsoft site.

  4. Locate the <div class='content'> that holds the description.

  5. Inside that div, locate all the <p> elements and extract the text of each. The fourth entry holds the required text, so return that.
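
The steps above can be sketched roughly as follows. This is an illustration only: the names BASE_URL, build_url and get_description are made up here, the URL pattern is inferred from the sp_droplogin link in the question, and the div class and paragraph position are taken from the steps above, so they may not match the current layout of the Microsoft page.

```python
# URL pattern inferred from the sp_droplogin link in the question (an assumption).
BASE_URL = (
    "https://learn.microsoft.com/en-us/sql/relational-databases/"
    "system-stored-procedures/{name}-transact-sql?view=sql-server-ver15"
)


def build_url(value):
    """Steps 1-2: turn 'EXEC sp_droplogin' into the documentation URL."""
    # Split on whitespace, keep the procedure name, swap _ for -.
    name = value.split()[1].replace("_", "-")
    return BASE_URL.format(name=name)


def get_description(value):
    """Steps 3-5: fetch the page and return the description paragraph."""
    # Third-party packages, assumed to be installed. They are imported here
    # so that build_url() can be tried on its own without them.
    import requests
    from bs4 import BeautifulSoup

    # Step 3: download the HTML.
    response = requests.get(build_url(value), timeout=10)
    response.raise_for_status()

    # Step 4: find the div that holds the description.
    soup = BeautifulSoup(response.text, "html.parser")
    div = soup.find("div", class_="content")

    # Step 5: collect the text of every <p>; the fourth one is the description.
    paragraphs = [p.get_text(strip=True) for p in div.find_all("p")]
    return paragraphs[3]
```

Calling get_description("EXEC sp_droplogin") would then return the description text, provided the page still has the structure described in steps 4 and 5.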

If there are None values, you would need to test for this and return a suitable value:
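
One way to do that, as a minimal sketch: wrap whatever function performs the web lookup and short-circuit on None. The names describe_or_default, lookup and default are illustrative and not part of the original answer.

```python
def describe_or_default(value, lookup, default=""):
    """Return `default` for a missing value, otherwise the looked-up text.

    `lookup` is assumed to be the function that performs the web request.
    """
    # A None entry never reaches lookup(), so no request is made for it.
    if value is None:
        return default
    return lookup(value)
```

If the missing entries show up as NaN rather than None, pd.isna(value) would be the check to use instead.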

For your updated example, I suggest you use a dictionary to keep the results of each request to avoid looking up the same value multiple times.

You can use .applymap() to run the function on every item in the dataframe (pandas 2.1 and later deprecate it in favour of DataFrame.map()).

Lastly, if a value does not start with exec, just return it unchanged (or whatever you prefer).
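
These three suggestions could be combined along the following lines. This is a sketch under assumptions: make_describer and describe are hypothetical names, and lookup stands for whichever function performs the actual request.

```python
def make_describer(lookup):
    """Build a per-cell function that caches lookups in a dictionary."""
    cache = {}  # value -> description, filled in as values are first seen

    def describe(value):
        # Anything that is not a string starting with 'exec' is returned as-is.
        if not isinstance(value, str) or not value.lower().startswith("exec"):
            return value
        # Only call lookup() the first time a given value appears.
        if value not in cache:
            cache[value] = lookup(value)
        return cache[value]

    return describe
```

It could then be applied to every cell with something like df = df.applymap(make_describer(lookup)), so each distinct exec value triggers at most one request.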

User contributions licensed under: CC BY-SA