Why do I get a stray element with difflib.ndiff?

Minimal working example:

In [3]: a = ('r1', 'r2', 'r11', 'r6', 'r1', 'r2', 'r7', 'r8')

In [4]: b = ('r1', 'r2', 'r1', 'r6', 'r1', 'r2', 'r7', 'r8')

In [5]: list(difflib.ndiff(a, b))
Out[5]: 
['  r1',
 '  r2',
 '- r11',
 '?   -\n',
 '+ r1',
 '  r6',
 '  r1',
 '  r2',
 '  r7',
 '  r8']

Can someone please explain why there’s a newline character as the fourth element in the output list? What can I do to not get that element as ndiff output, but only the rest of the list?

Answer

Because ndiff expects the lines you pass in to end with newline characters, like this:

a = ('r1\n', 'r2\n', 'r11\n', 'r6\n', 'r1\n', 'r2\n', 'r7\n', 'r8\n')
b = ('r1\n', 'r2\n', 'r1\n', 'r6\n', 'r1\n', 'r2\n', 'r7\n', 'r8\n')

In the docs for difflib.Differ.compare, which is what .ndiff() calls under the hood, we see this (emphasis mine):

compare(a, b)

Compare two sequences of lines, and generate the delta (a sequence of lines).

Each sequence must contain individual single-line strings ending with newlines. Such sequences can be obtained from the readlines() method of file-like objects. The delta generated also consists of newline-terminated strings, ready to be printed as-is via the writelines() method of a file-like object.

The output you’re getting makes sense: lines that start with ? highlight what changed within a line. In this case it draws a - under the second 1 in r11 to show you that it was deleted. difflib expects you to use the output like this:

print(''.join(difflib.ndiff(a, b)))

so it needs to end any lines it adds with a newline.
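
If you only want the changed and unchanged lines and not the ? hints, one option is to filter them out afterwards. This is a minimal sketch, not something the difflib docs prescribe; it relies on every ndiff line starting with a two-character code ('  ', '- ', '+ ' or '? '), and the names delta and hints are just for illustration:

```python
import difflib

a = ('r1', 'r2', 'r11', 'r6', 'r1', 'r2', 'r7', 'r8')
b = ('r1', 'r2', 'r1', 'r6', 'r1', 'r2', 'r7', 'r8')

full = list(difflib.ndiff(a, b))

# The "? " lines are intraline hints generated by difflib itself,
# which is why they end with a newline even though the inputs do not.
hints = [line for line in full if line.startswith('? ')]

# Keep everything except the hint lines.
delta = [line for line in full if not line.startswith('? ')]
print(delta)
```

With the inputs from the question, delta keeps '- r11' and '+ r1' next to each other and drops the '?   -\n' element.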

You can add the newlines to your original values with a list comprehension:

a = [line + "\n" for line in a]
b = [line + "\n" for line in b]
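
Putting the pieces together, here is a small end-to-end sketch using the data from the question. The variable names a_lines, b_lines and report are made up for this example:

```python
import difflib

a = ('r1', 'r2', 'r11', 'r6', 'r1', 'r2', 'r7', 'r8')
b = ('r1', 'r2', 'r1', 'r6', 'r1', 'r2', 'r7', 'r8')

# Terminate every input line with a newline, as Differ.compare expects.
a_lines = [line + "\n" for line in a]
b_lines = [line + "\n" for line in b]

# Every delta line now ends with a newline, so joining them
# yields a report that can be printed as-is.
report = ''.join(difflib.ndiff(a_lines, b_lines))
print(report)
```

In the printed report the ? line sits directly under - r11, with its - lined up beneath the character that was removed.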