Skip to content
Advertisement

Reformatting a string to simulate a json using python regex

What I want to do is essentially reformat a string and make it pass the jsonschema validate function.

Simple enough at face value. However, the tricky part is that the string is being read in from a file and can vary in it’s appearance and formatting.

Example being

JavaScript

OR

JavaScript

Or any possible combination of single quotes, double quotes, no quotes, arrays, strings etc and may contain an apostrophe etc etc.

I want to be liberal enough with the regex rules and treat the input data as unstructured, formatting the code passed at run time as uniform, in order to check it’s quality.

Using python, my approach so far has been to try and iterate over the most common transforms I wish to perform – however, I know that what I want to do is achievable, probably very obviously so and even though I haven’t completed it yet, I thought I’d try and save the headache and ask for some help.

A more thorough example I have is here (Have tried to demo most of the issues I’ve found so far):

JavaScript

Output I want:

JavaScript

I have tried this as my first step – but I’m convinced I’m going about this the hard way. I wanted to try and combine negative lookaheads and lookbehinds, so I could globally swap out all single quotes for double quotes, avoiding instances where a single quote was sandwiched in between 2 letters – but I don’t think I’m smart enough to get it on my own. All help much appreciated.

Thanks

Advertisement

Answer

You have a little issue with the word “isn’t” in considering the string

JavaScript

Now, you can do this about it:

JavaScript

which will produce this string

JavaScript

Not, the most beautiful thing. After this, you simply do the following (you need to import dirtyjson):

JavaScript

which gives you

JavaScript
Advertisement