The algorithm used by PIL v1.1.7 gives ‘washed out’ looking results. When converting the same source data using ffmpeg it looks correct, and mplayer gives identical results to ffmpeg (perhaps they use the same library underneath). This leads me to believe PIL may be stuffing up the colour space conversions. The conversion seems to be sourced in libImaging/ConvertYCbCr.c:
/*  JPEG/JFIF YCbCr conversions

    Y  = R *  0.29900 + G *  0.58700 + B *  0.11400
    Cb = R * -0.16874 + G * -0.33126 + B *  0.50000 + 128
    Cr = R *  0.50000 + G * -0.41869 + B * -0.08131 + 128

    R = Y +                    + (Cr - 128) *  1.40200
    G = Y + (Cb - 128) * -0.34414 + (Cr - 128) * -0.71414
    B = Y + (Cb - 128) *  1.77200
*/
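As a quick sanity check, the forward and inverse matrices in that comment really are inverses of one another, which a few lines of numpy confirm, so a simple typo in the coefficients is unlikely to be the problem:

import numpy as np

# Forward (RGB -> YCbCr) matrix from the comment, +128 offsets omitted
fwd = np.array([[ 0.29900,  0.58700,  0.11400],
                [-0.16874, -0.33126,  0.50000],
                [ 0.50000, -0.41869, -0.08131]])

# Inverse (YCbCr -> RGB) matrix from the comment
inv = np.array([[1.,  0.,       1.40200],
                [1., -0.34414, -0.71414],
                [1.,  1.77200,  0.     ]])

# The product should be (approximately) the 3x3 identity matrix
print(np.round(np.dot(inv, fwd), 4))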
Of course, those formulas are just a comment in PIL’s source; the actual function is implemented with lookup tables rather than matrix multiplication (the static INT16 R_Cr tables etc. are snipped for brevity):
void
ImagingConvertYCbCr2RGB(UINT8* out, const UINT8* in, int pixels)
{
    int x;
    UINT8 a;
    int r, g, b;
    int y, cr, cb;

    for (x = 0; x < pixels; x++, in += 4, out += 4) {
        y = in[0];
        cb = in[1];
        cr = in[2];
        a = in[3];

        r = y + ((           R_Cr[cr]) >> SCALE);
        g = y + ((G_Cb[cb] + G_Cr[cr]) >> SCALE);
        b = y + ((B_Cb[cb]           ) >> SCALE);

        out[0] = (r <= 0) ? 0 : (r >= 255) ? 255 : r;
        out[1] = (g <= 0) ? 0 : (g >= 255) ? 255 : g;
        out[2] = (b <= 0) ? 0 : (b >= 255) ? 255 : b;
        out[3] = a;
    }
}
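To make the fixed-point idiom clearer, here is a rough Python sketch of the same lookup-table technique (the SCALE value is an assumption for illustration, not PIL’s actual constant): the chroma contributions are precomputed for all 256 possible byte values, scaled up by 2**SCALE so they can be stored as integers, then recovered per pixel with a cheap shift instead of a multiplication.

SCALE = 6  # hypothetical precision, for illustration only

# One table entry per possible byte value of Cb or Cr
R_Cr = [round( 1.40200 * (cr - 128) * (1 << SCALE)) for cr in range(256)]
G_Cb = [round(-0.34414 * (cb - 128) * (1 << SCALE)) for cb in range(256)]
G_Cr = [round(-0.71414 * (cr - 128) * (1 << SCALE)) for cr in range(256)]
B_Cb = [round( 1.77200 * (cb - 128) * (1 << SCALE)) for cb in range(256)]

def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range YCbCr pixel to RGB using the tables."""
    clamp = lambda v: 0 if v <= 0 else 255 if v >= 255 else v
    r = y + ( R_Cr[cr]             >> SCALE)
    g = y + ((G_Cb[cb] + G_Cr[cr]) >> SCALE)
    b = y + ( B_Cb[cb]             >> SCALE)
    return clamp(r), clamp(g), clamp(b)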
I have googled, but there seems to be a lot of confusion about the ‘right’ way to do this colour space conversion. So my question is: is the above correct, and if not, what is a better way?
edit: After reading the links provided by Mark Ransom, I discovered that conflicting definitions exist depending on whether you use the full 0-255 range of YCbCr or clamp to the valid studio range.
It seems the PIL version is using the incorrect algorithm for this data, so I’ve rolled my own function for the conversion, which gives correct-looking results (the “SDTV” version). Code included below for future readers to use:
from numpy import dot, ndarray, array

A_SDTV = array([[1.,                 0.,  0.701            ],
                [1., -0.886*0.114/0.587, -0.701*0.299/0.587],
                [1.,  0.886,             0.                ]])
A_SDTV[:,0]  *= 255./219.
A_SDTV[:,1:] *= 255./112.

A_HDTV = array([[1.164,     0.,  1.793],
                [1.164, -0.213, -0.533],
                [1.164,  2.112,  0.   ]])

def yuv2rgb(im, version='SDTV'):
    """
    Convert array-like YUV image to RGB colourspace

    version:
      - 'SDTV': ITU-R BT.601 version (default)
      - 'HDTV': ITU-R BT.709 version
    """
    if not im.dtype == 'uint8':
        raise TypeError('yuv2rgb only implemented for uint8 arrays')

    # clip input to the valid range
    yuv = ndarray(im.shape)  # float64
    yuv[:,:, 0] = im[:,:, 0].clip(16, 235).astype(yuv.dtype) - 16
    yuv[:,:,1:] = im[:,:,1:].clip(16, 240).astype(yuv.dtype) - 128

    if version.upper() == 'SDTV':
        A = A_SDTV
    elif version.upper() == 'HDTV':
        A = A_HDTV
    else:
        raise Exception("Unrecognised version (choose 'SDTV' or 'HDTV')")

    rgb = dot(yuv, A.T)
    result = rgb.clip(0, 255).astype('uint8')

    return result
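For example, converting a small hand-made frame (the YCbCr codes below are what I believe BT.601 studio-range black, white, red and green look like):

import numpy as np

# 2x2 test frame: studio-range black, white, red and green in YCbCr
yuv = np.array([[[ 16, 128, 128], [235, 128, 128]],
                [[ 81,  90, 240], [145,  54,  34]]], dtype=np.uint8)

print(yuv2rgb(yuv))           # BT.601 ('SDTV') conversion, the default
print(yuv2rgb(yuv, 'HDTV'))   # ITU-R BT.709 conversion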
Answer
If you look at Wikipedia’s definitions, you can see that there are two conflicting definitions for YCbCr. The ITU-R BT.601 definition compresses luma into the range 16-235 (and chroma into 16-240) to provide footroom and headroom, while the JPEG version uses the full range 0-255. If you were to decode values in the BT.601 space using the formula for JPEG, the result would definitely look washed out.
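A little arithmetic makes the symptom concrete. Decoding BT.601-range luma with the JPEG formula passes the studio limits straight through instead of stretching them to 0-255 (neutral chroma assumed here, so R = G = B = Y):

# BT.601 studio-range black and white
y_black, y_white = 16, 235

# JPEG-style decode with neutral chroma (Cb = Cr = 128): R = G = B = Y
print(y_black, y_white)                    # 16 235 -> dark grey, light grey

# Correct BT.601 decode expands the range first
print(round((y_black - 16) * 255 / 219))   # 0   (true black)
print(round((y_white - 16) * 255 / 219))   # 255 (true white)

Black comes out as dark grey and white as light grey, so the whole image loses contrast, which is exactly the washed-out look described in the question.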