Convert strings with an unknown number of hex strings embedded in them to strings using regex

So I have a list of strings (content from Snort rules), and I am trying to convert the hex portions of them to UTF-8/ASCII, so I can send the content over netcat.

The method I have now works fine for strings with a single hex byte (e.g. |3A|), but breaks when there’s a run of several hex bytes (e.g. |3A 4B 00 FF|).

My current solution is:

For the strings in strings, this works, but for a string like:

|08 00 00 00 27 C7 CC 6B C2 FD 13 0E|

it breaks.
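
A minimal illustration of the symptom, assuming Python's re module and a hypothetical single-byte pattern (not the question's actual code): a pattern that allows exactly one two-digit hex byte between the pipes finds |3A| but nothing in a longer run.

```python
import re

# Hypothetical single-byte pattern: exactly two hex digits between pipes.
SINGLE_BYTE = re.compile(r"\|([0-9A-Fa-f]{2})\|")


def find_single(s):
    """Return every hex byte that the single-byte pattern captures in s."""
    return SINGLE_BYTE.findall(s)


print(find_single("Hello|3A|World"))                         # ['3A']
print(find_single("|08 00 00 00 27 C7 CC 6B C2 FD 13 0E|"))  # []
```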

I tried changing the regex to:

but that only converts the last hex byte.
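
A likely cause, assuming the changed regex repeated a capturing group: when a capturing group is repeated with + or *, Python's re module keeps only the text matched by the group's last iteration. A short sketch with a hypothetical pattern:

```python
import re

# Hypothetical attempt: repeat a capturing group to cover several bytes.
REPEATED = re.compile(r"\|(?:([0-9A-Fa-f]{2})\s?)+\|")

m = REPEATED.search("|3A 00 FF|")
print(m.group(0))  # |3A 00 FF|  (the whole run matches)
print(m.group(1))  # FF          (but group 1 only keeps the last iteration)
```

Capturing the whole run in one group and splitting it into bytes afterwards avoids this.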

I need this solution to work for strings like Hello|3A|World, |3A 00 FF| and Hello|3A 00|World.

I know the problem is in the regex, but I’m not sure exactly what it is.

Any help would be much appreciated.

Answer

It looks like the text between | symbols is either entirely hex, i.e. it matches (?:[A-Fa-f0-9]{2}\s)*[A-Fa-f0-9]{2}, or not hex at all?

This works:

(The extra parentheses create capturing group 1; you could leave out one pair of parentheses and change your function to act on group(0) instead.)
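
As a minimal sketch, assuming Python's re module: wrap the pattern (?:[A-Fa-f0-9]{2}\s)*[A-Fa-f0-9]{2} in escaped pipes, capture it as group 1, and let re.sub call a conversion function on each match. The names HEX_RUN, hex_run_to_text and decode_content are hypothetical, and the strict UTF-8 decode is an assumption about what the conversion should produce.

```python
import re

# Pipe, then one or more two-digit hex bytes separated by single whitespace
# characters, then pipe. Group 1 holds only the hex digits and spaces.
HEX_RUN = re.compile(r"\|((?:[A-Fa-f0-9]{2}\s)*[A-Fa-f0-9]{2})\|")


def hex_run_to_text(match):
    # Hypothetical stand-in for the question's conversion function.
    # bytes.fromhex() skips the whitespace between byte pairs; the strict
    # UTF-8 decode raises UnicodeDecodeError if the bytes are not valid UTF-8.
    return bytes.fromhex(match.group(1)).decode("utf-8")


def decode_content(s):
    """Replace every |..| hex run in s with its UTF-8 decoded text."""
    return HEX_RUN.sub(hex_run_to_text, s)


print(decode_content("Hello|3A|World"))           # Hello:World
print(repr(decode_content("Hello|3A 00|World")))  # 'Hello:\x00World'
```

With strict decoding, |3A 00 FF| from the question also raises UnicodeDecodeError, because the byte 0xFF never occurs in valid UTF-8.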

But it breaks on your example |08 00 00 00 27 C7 CC 6B C2 FD 13 0E|, because those bytes are not valid UTF-8. The resulting error:

However, a valid UTF-8 encoded multi-byte string like '|74 65 73 74 20 f0 9f 98 80|' works just fine:

Result:

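
Both outcomes come down to what strict UTF-8 decoding accepts. A minimal sketch, assuming Python's bytes.fromhex() and bytes.decode(), that decodes the two byte runs directly without any regex:

```python
raw_bad = bytes.fromhex("08 00 00 00 27 C7 CC 6B C2 FD 13 0E")
raw_good = bytes.fromhex("74 65 73 74 20 f0 9f 98 80")

# Valid UTF-8: F0 9F 98 80 is the four-byte encoding of U+1F600.
text = raw_good.decode("utf-8")
print(text == "test \U0001F600")  # True

# Invalid UTF-8: 0xC7 opens a two-byte sequence, but 0xCC is not a
# continuation byte (continuation bytes lie in the range 0x80 to 0xBF).
bad_index = None
try:
    raw_bad.decode("utf-8")
except UnicodeDecodeError as exc:
    bad_index = exc.start  # index of the first byte the codec rejected

print(bad_index, hex(raw_bad[bad_index]))  # 5 0xc7
```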

If you don’t actually need a printable representation of the data, you could have your function return the bytes object and apply it only to the matching parts, instead of constructing a new string.
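
One way to convert only the matching parts and end up with a single bytes object is to split the input on the hex runs. This sketch uses re.split rather than re.sub, assumes Python, and assumes the literal text should be UTF-8 encoded; content_to_bytes is a hypothetical name. Because the pattern has exactly one capturing group, re.split returns the literal text at even indices and the captured hex runs at odd indices.

```python
import re

# One or more whitespace-separated hex bytes between pipes; group 1 = the hex digits.
HEX_RUN = re.compile(r"\|((?:[A-Fa-f0-9]{2}\s)*[A-Fa-f0-9]{2})\|")


def content_to_bytes(s):
    """Build the raw payload: hex runs become raw bytes, other text is UTF-8 encoded."""
    # With exactly one capturing group, re.split returns
    # [text, hex, text, hex, ..., text].
    parts = HEX_RUN.split(s)
    out = bytearray()
    for i, part in enumerate(parts):
        if i % 2:
            # Odd indices: captured hex runs, converted without any text decoding.
            out += bytes.fromhex(part)
        else:
            # Even indices: literal text between the runs (assumed UTF-8).
            out += part.encode("utf-8")
    return bytes(out)


print(content_to_bytes("|3A 00 FF|"))  # b':\x00\xff'
```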

Based on what @Selcuk was saying, perhaps a result made of byte strings makes more sense. This works on all three types of input:

Result:

There are no encoding issues, because no text encoding is ever applied. (Note that I didn’t change convert_hex much: there’s some encoding juggling in there that you may want to look at; I just got it to work for bytes.)
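
A byte-string version can also stay with re.sub by compiling the pattern from a bytes literal, so that both the input and the replacements are bytes. This is a sketch assuming Python 3 and UTF-8 for the literal text; decode_content_bytes and hex_run_to_bytes are hypothetical names, and this is not necessarily how the answer's own code did it. bytes.fromhex() only accepts str, so the matched ASCII hex digits are decoded first.

```python
import re

# The same hex-run pattern, compiled from a bytes literal so it matches bytes.
HEX_RUN_B = re.compile(rb"\|((?:[A-Fa-f0-9]{2}\s)*[A-Fa-f0-9]{2})\|")


def hex_run_to_bytes(match):
    # match.group(1) is bytes here; bytes.fromhex() wants str, so decode the
    # ASCII hex digits first. The result is raw bytes, never decoded as text.
    return bytes.fromhex(match.group(1).decode("ascii"))


def decode_content_bytes(s):
    """Encode s as UTF-8 (an assumption), then replace each hex run with raw bytes."""
    return HEX_RUN_B.sub(hex_run_to_bytes, s.encode("utf-8"))


for sample in ["Hello|3A|World", "|3A 00 FF|", "Hello|3A 00|World"]:
    print(decode_content_bytes(sample))
```

To pass the result to netcat, writing it with sys.stdout.buffer.write() sends the bytes unchanged (print() would go through a text encoding), and the script's output can then be piped into nc.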

User contributions licensed under: CC BY-SA