
Regex "or" does not work: I do not know what is wrong in my pattern

I have the following strings:

JavaScript

I want to have it separated:

JavaScript

I want all numbers, exact matches for na, nan and none (in any mix of upper and lower case), and “” in the first group, like:

JavaScript

This would be wrong:

JavaScript

I want

JavaScript

How do I write a regex that checks for exact matches like ‘none’ without being case sensitive (it should also recognize ‘None’, ‘nOne’, etc.)?

https://regex101.com/r/HvnZ47/3


Answer

What about the following with re.I:

JavaScript

https://regex101.com/r/d4XPPb/3

Explanation:

  • (None|NaN?|[-\d]+)?
    • Either None
    • Or NaN from which the last N is optional (due to ?) so it also matches NA
    • Or digits and dashes one or more times
    • The whole group () is optional due to ? which means it might not be there
  • (.*) Any character to the end
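Put together, the two groups described above could be used from Python roughly as follows. This is only a sketch assembled from the explanation, not necessarily the exact pattern behind the regex101 link; the helper name split_prefix and the sample inputs are made up for illustration.

```python
import re

# Pattern assembled from the two groups explained above; re.I makes
# "None", "nOne", "NAN", "na", ... all match.
PATTERN = re.compile(r"(None|NaN?|[-\d]+)?(.*)", re.I)

def split_prefix(line):
    """Return (prefix, rest); prefix is "" when the optional group is absent."""
    m = PATTERN.match(line)  # always matches, since both groups may be empty
    return (m.group(1) or "", m.group(2))

# Hypothetical sample inputs, not the data from the question
for s in ["123-45Foo", "nOneBar", "NABaz", "Qux"]:
    print(split_prefix(s))
```

Group 1 comes back as None when the optional prefix is missing, which is why the helper maps it to an empty string.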

However, there can still be edge cases. Consider the following:

JavaScript

would be parsed as

JavaScript
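To get a feel for this kind of edge case, here is a small sketch with hypothetical inputs (not the example from the answer). It assumes the pattern is the one described in the explanation and shows that ordinary words starting with "Na" or "None" lose their first letters to group 1.

```python
import re

# Same pattern as described in the explanation, compiled with re.I
PATTERN = re.compile(r"(None|NaN?|[-\d]+)?(.*)", re.I)

def groups_of(line):
    """Return the two captured groups for a line."""
    return PATTERN.match(line).groups()

# Hypothetical inputs: plain words that happen to start like a keyword
for s in ["Nancy", "Nadia", "Nonesuch"]:
    print(groups_of(s))
# ('Nan', 'cy')
# ('Na', 'dia')
# ('None', 'such')
```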

An alternative:

From here we could keep making the regex more complex. However, I think it would be a lot simpler for you to implement custom parsing without regex. Loop over the characters in each line and:

  • If it starts with a digit, parse all digits and dashes into group 1 and the rest into group 2 (i.e. when you hit any other character, switch groups)
  • Take the first 4 characters of the string and, if they are “none” (compared case-insensitively), split them off. At the same time, ensure that the 5th character is upper case: line[:4].lower() == "none" and line[4].isupper()
  • Similar to the above step but for NA and NaN:
    • line[:3].lower() == "nan" and line[3].isupper()
    • line[:2].lower() == "na" and line[2].isupper()

The above should produce more accurate results and should be a lot easier to read.
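The steps listed above might be sketched like this. It is a minimal illustration rather than the answer's own code: the function name split_line is hypothetical, and treating a keyword that makes up the whole line (for example just "nan") as a match is an added assumption, since line[4].isupper() on its own would raise an IndexError in that case.

```python
def split_line(line):
    """Split a line into (prefix, rest) following the steps above."""
    # Step 1: a line starting with a digit puts all leading digits
    # and dashes into group 1 and everything after into group 2.
    if line[:1].isdigit():
        i = 0
        while i < len(line) and (line[i].isdigit() or line[i] == "-"):
            i += 1
        return line[:i], line[i:]

    # Steps 2 and 3: keyword prefixes, compared case-insensitively.
    # Longest keyword first, so "nan" is tried before "na".
    for word in ("none", "nan", "na"):
        n = len(word)
        if line[:n].lower() == word:
            # Assumption: the keyword may also be the entire line;
            # otherwise the next character must be upper case.
            if len(line) == n or line[n].isupper():
                return line[:n], line[n:]

    # No recognised prefix: group 1 stays empty.
    return "", line


# Hypothetical inputs, not the data from the question
for s in ["123-45Foo", "NoneBar", "NAFoo", "Nancy"]:
    print(split_line(s))
```

Because of the upper-case check, a word such as "Nancy" is left whole instead of being split after "Nan".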

Example code:

JavaScript

Data:

JavaScript

Output:

JavaScript
User contributions licensed under: CC BY-SA