
Regex: split on ‘.’ but not in substrings like “J.K. Rowling”

I am looking for names of books and authors in a bunch of texts, like:

JavaScript

Right now I am using the following code to split the text on separators like this:

JavaScript

Even though there are false positives (like ‘that’s it by the way’), my main problem is with author names that get cut apart when they are written with initials, which is pretty common.

I can’t figure out how to allow initials like “J. K. Rowling” (or the same without spaces before or after the dots, like “J.K.Rowling”).
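To illustrate the problem, here is a naive split on every dot. The sample text and the `/\.\s*/` separator are assumptions made for illustration; they are not the question’s original input or code.

```javascript
// Hypothetical sample text (not the original input from the question).
const text = "Harry Potter by J.K. Rowling. It by Stephen King.";

// Naive approach: split on every literal dot plus any whitespace after it,
// then drop empty strings (e.g. the one produced by the final dot).
const naive = text.split(/\.\s*/).filter((s) => s.length > 0);

console.log(naive);
// The initials are treated as sentence ends, so "J.K. Rowling" is cut
// into "J", "K" and "Rowling".
```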


Answer

Change the pattern to the following:

JavaScript

To allow for initials in the author’s name, the pattern needs a few changes. First, match an initial with the character class “[A-Z]”, which matches any upper-case letter, followed by a literal dot and “?” (question mark) to make the dot optional. The dot has to be escaped as “\.”, because a bare “.” matches any character. Next, add an optional space “ ?” after the dot. Finally, repeat this group with “+” so that it matches several initials in a row.
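A minimal sketch of the initials pattern described above, applied to a single author name. Anchoring it with `^`/`$` and treating the surname as one capital letter followed by lower-case letters are assumptions for this example; the answer’s actual pattern is not preserved.

```javascript
// Hypothetical author pattern built from the pieces described above:
//   [A-Z]     one upper-case initial
//   \.?       an optional literal dot (escaped, since a bare "." matches any character)
//   " ?"      an optional space after the dot
//   (?:...)+  one or more initials in a row
// followed by an assumed surname shape: a capital letter plus lower-case letters.
const initialsAuthor = /^(?:[A-Z]\.? ?)+[A-Z][a-z]+$/;

for (const name of ["J. K. Rowling", "J.K. Rowling", "J.K.Rowling", "Stephen King"]) {
  console.log(name, "->", initialsAuthor.test(name));
}
```

Because the dot is optional, a name like “JK Rowling” also matches, while a name without initials such as “Stephen King” does not, since `+` requires at least one initial.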

When I tried your code with my pattern, I got:

JavaScript

It seems to skip the other authors, but it works for authors with initials. Let me know if you want me to figure out how to make it work for names both with and without initials, if that makes sense.
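One possible way to accept names both with and without initials, sketched under the same assumptions as the author pattern above (anchored, capitalised words), is to make the initials group optional with `*` instead of `+` and allow several capitalised words. The pattern name and the accepted name shapes are hypothetical.

```javascript
// Hypothetical variant: zero or more initials (note "*" instead of "+"),
// then one or more capitalised words such as "Stephen King" or "Tolkien".
const anyAuthor = /^(?:[A-Z]\.? ?)*[A-Z][a-z]+(?: [A-Z][a-z]+)*$/;

for (const name of ["J. R. R. Tolkien", "J.K.Rowling", "Stephen King", "the way"]) {
  console.log(name, "->", anyAuthor.test(name));
}
```

A purely lower-case fragment like “the way” is still rejected, which helps with some false positives but not with capitalised ones.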

Here is how I solved the problem; it took a while:

JavaScript

which will give:

JavaScript
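A different technique, sketched here under assumptions, avoids a separate author pattern altogether: split only on dots that do not directly follow a lone capital letter, using a negative lookbehind. The sample text, the `sentenceSplitter` name and the “split on the last ‘ by ’” step are all hypothetical and only illustrate the idea.

```javascript
// Hypothetical sample text (not the original input from the question).
const text =
  "Harry Potter by J.K. Rowling. The Hobbit by J. R. R. Tolkien. It by Stephen King.";

// Split on a dot (plus any whitespace after it) unless the dot directly
// follows a single capital letter that starts a word, i.e. an initial:
//   (?<!\b[A-Z])  negative lookbehind: no word boundary + lone capital before the dot
//   \.\s*         the literal dot and any following whitespace
const sentenceSplitter = /(?<!\b[A-Z])\.\s*/;

// The final dot leaves an empty string at the end, so filter it out.
const sentences = text.split(sentenceSplitter).filter((s) => s.length > 0);

// Assumed follow-up step: separate title and author on the last " by ".
const books = sentences.map((s) => {
  const i = s.lastIndexOf(" by ");
  return i === -1
    ? { title: s, author: null }
    : { title: s.slice(0, i), author: s.slice(i + " by ".length) };
});

console.log(books);
```

The trade-off is that a sentence which genuinely ends in a single capital letter (for example “World War I.”) will not be split. Lookbehind also requires a JavaScript engine with ES2018 regular-expression support.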
User contributions licensed under: CC BY-SA