
Performance tuning: string wordcount in df

I have a df with a column “free text”. I want to count how many characters and words each cell has. Currently, I do it like this: The problem is that it is pretty slow. I thought about using np.where, but I wasn’t sure how. I would appreciate your help here. Answer IIUC: you can try via str.len() an…
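
A minimal sketch of the vectorised approach the answer hints at with str.len(). The sample rows and the output column names `char_count` and `word_count` are assumptions for illustration; only the column name “free text” comes from the question, and the rest of the original answer is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical sample data; only the column name "free text" is from the question.
df = pd.DataFrame({"free text": ["hello world", "one two  three", "", None]})

text = df["free text"]

# Characters per cell, computed by the vectorised string accessor.
# Missing cells stay missing instead of raising an error.
df["char_count"] = text.str.len()

# Words per cell: split on runs of whitespace, then count the pieces.
df["word_count"] = text.str.split().str.len()

print(df)
```

Both calls go through the `.str` accessor rather than a Python-level loop or `apply`, which is usually where the speed-up comes from.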

Group by Issue with Years Pandas

I’m following the answer for this StackOverflow post to group a column of years by decades to make it easier for me to visualize later, but I’m not getting the same results. It seems like when DSM did it, it yielded integers for years, while mine is yielding floats for years. I’ve implemente…
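
A hedged sketch of one way to get integer decades. The excerpt does not show the actual cause of the floats, so this assumes a common one: a missing value in the year column, which forces pandas to store the column as float64. The column names `year` and `value` and the data are hypothetical.

```python
import pandas as pd

# Hypothetical data; the None in "year" makes the column float64.
df = pd.DataFrame({
    "year": [1987, 1991, 1999, 2003, None],
    "value": [1, 2, 3, 4, 5],
})

# Drop rows without a year, then cast to int before computing the decade,
# so that floor division yields integers such as 1980 rather than 1980.0.
valid = df.dropna(subset=["year"]).copy()
valid["decade"] = valid["year"].astype(int) // 10 * 10

# Aggregate per decade; the resulting index is made of integers.
totals = valid.groupby("decade")["value"].sum()
print(totals)
```

If rows with missing years must be kept, pandas' nullable `Int64` dtype is another option.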

How to automatically remove space before punctuation

For example: “This is some text . This is some text” should be “This is some text. This is some text”. We can use replace, replacing ‘ .’ with ‘.’, but it’s not a good approach. Please let me know if you have any other idea which is generalised for any p…
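
One possible generalisation is a regular expression that drops whitespace before any punctuation mark in a chosen set. This is a sketch, not the answer from the original post: the helper name `strip_space_before_punctuation` is hypothetical, and the set of punctuation characters is an assumption that may need adjusting.

```python
import re

# Assumed set of punctuation marks that should attach to the preceding word.
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,;:!?])")


def strip_space_before_punctuation(text: str) -> str:
    """Remove any whitespace that directly precedes a punctuation mark."""
    # The captured punctuation mark is kept; only the whitespace is dropped.
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


print(strip_space_before_punctuation("This is some text . This is some text"))
```

The pattern only touches whitespace immediately followed by one of the listed marks, so ordinary spaces between words are left alone.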

Constants module, ran once

So I’m building a constants module that stores some strings in a dict: Now to improve the module I want to retrieve the constants automatically, to reduce future maintenance. For this I use a method called get_constants() which returns a dict with the constants. So the module will be: Now to improve the p…