The algorithm used by PIL v1.1.7 gives ‘washed out’ looking results. When converting the same source data using ffmpeg it looks correct, and mplayer gives identical results to ffmpeg (perhaps they use the same library underneath). This leads me to believe PIL may be stuffing up the colour space conversions. The conversion seems to be sourced in libImaging/ConvertYCbCr.c:
/*  JPEG/JFIF YCbCr conversions

    Y  = R *  0.29900 + G *  0.58700 + B *  0.11400
    Cb = R * -0.16874 + G * -0.33126 + B *  0.50000 + 128
    Cr = R *  0.50000 + G * -0.41869 + B * -0.08131 + 128

    R = Y +                    + (Cr - 128) *  1.40200
    G = Y + (Cb - 128) * -0.34414 + (Cr - 128) * -0.71414
    B = Y + (Cb - 128) *  1.77200
*/
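As a quick sanity check, the forward and inverse matrices in that comment really are inverses of one another, which a few lines of numpy confirm, so a simple typo in the coefficients is unlikely to be the problem:

import numpy as np

# Forward (RGB -> YCbCr) matrix from the comment, +128 offsets omitted
fwd = np.array([[ 0.29900,  0.58700,  0.11400],
                [-0.16874, -0.33126,  0.50000],
                [ 0.50000, -0.41869, -0.08131]])

# Inverse (YCbCr -> RGB) matrix from the comment
inv = np.array([[1.,  0.,       1.40200],
                [1., -0.34414, -0.71414],
                [1.,  1.77200,  0.     ]])

# The product should be (approximately) the 3x3 identity matrix
print(np.round(np.dot(inv, fwd), 4))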
Of course, those formulas are just a comment in PIL’s source; the actual function is implemented with lookup tables rather than matrix multiplication (the static INT16 R_Cr tables etc. are snipped for brevity):
void
ImagingConvertYCbCr2RGB(UINT8* out, const UINT8* in, int pixels)
{
    int x;
    UINT8 a;
    int r, g, b;
    int y, cr, cb;

    for (x = 0; x < pixels; x++, in += 4, out += 4) {
        y = in[0];
        cb = in[1];
        cr = in[2];
        a = in[3];

        r = y + ((           R_Cr[cr]) >> SCALE);
        g = y + ((G_Cb[cb] + G_Cr[cr]) >> SCALE);
        b = y + ((B_Cb[cb]           ) >> SCALE);

        out[0] = (r <= 0) ? 0 : (r >= 255) ? 255 : r;
        out[1] = (g <= 0) ? 0 : (g >= 255) ? 255 : g;
        out[2] = (b <= 0) ? 0 : (b >= 255) ? 255 : b;
        out[3] = a;
    }
}
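To make the fixed-point idiom clearer, here is a rough Python sketch of the same lookup-table technique (the SCALE value is an assumption for illustration, not PIL’s actual constant): the chroma contributions are precomputed for all 256 possible byte values, scaled up by 2**SCALE so they can be stored as integers, then recovered per pixel with a cheap shift instead of a multiplication.

SCALE = 6  # hypothetical precision, for illustration only

# One table entry per possible byte value of Cb or Cr
R_Cr = [round( 1.40200 * (cr - 128) * (1 << SCALE)) for cr in range(256)]
G_Cb = [round(-0.34414 * (cb - 128) * (1 << SCALE)) for cb in range(256)]
G_Cr = [round(-0.71414 * (cr - 128) * (1 << SCALE)) for cr in range(256)]
B_Cb = [round( 1.77200 * (cb - 128) * (1 << SCALE)) for cb in range(256)]

def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range YCbCr pixel to RGB using the tables."""
    clamp = lambda v: 0 if v <= 0 else 255 if v >= 255 else v
    r = y + ( R_Cr[cr]             >> SCALE)
    g = y + ((G_Cb[cb] + G_Cr[cr]) >> SCALE)
    b = y + ( B_Cb[cb]             >> SCALE)
    return clamp(r), clamp(g), clamp(b)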
I have googled, but there seems to be a lot of confusion about the ‘right’ way to do this colour space conversion. So my question is: is the above correct, and if not, what is a better way?
edit: After reading the links provided by Mark Ransom, I discovered that conflicting definitions exist depending on whether you use the full 0-255 range of YCbCr or clamp to the valid studio range.
It seems the PIL version is using the incorrect algorithm for this data, so I’ve rolled my own function for the conversion, which gives correct-looking results (the “SDTV” version). Code included below for future readers to use:
from numpy import dot, ndarray, array

A_SDTV = array([[1.,                 0.,  0.701            ],
                [1., -0.886*0.114/0.587, -0.701*0.299/0.587],
                [1.,  0.886,             0.                ]])
A_SDTV[:,0]  *= 255./219.
A_SDTV[:,1:] *= 255./112.

A_HDTV = array([[1.164,     0.,  1.793],
                [1.164, -0.213, -0.533],
                [1.164,  2.112,  0.   ]])

def yuv2rgb(im, version='SDTV'):
    """
    Convert array-like YUV image to RGB colourspace

    version:
      - 'SDTV': ITU-R BT.601 version (default)
      - 'HDTV': ITU-R BT.709 version
    """
    if not im.dtype == 'uint8':
        raise TypeError('yuv2rgb only implemented for uint8 arrays')

    # clip input to the valid range
    yuv = ndarray(im.shape)  # float64
    yuv[:,:, 0] = im[:,:, 0].clip(16, 235).astype(yuv.dtype) - 16
    yuv[:,:,1:] = im[:,:,1:].clip(16, 240).astype(yuv.dtype) - 128

    if version.upper() == 'SDTV':
        A = A_SDTV
    elif version.upper() == 'HDTV':
        A = A_HDTV
    else:
        raise Exception("Unrecognised version (choose 'SDTV' or 'HDTV')")

    rgb = dot(yuv, A.T)
    result = rgb.clip(0, 255).astype('uint8')

    return result
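For example, converting a small hand-made frame (the YCbCr codes below are what I believe BT.601 studio-range black, white, red and green look like):

import numpy as np

# 2x2 test frame: studio-range black, white, red and green in YCbCr
yuv = np.array([[[ 16, 128, 128], [235, 128, 128]],
                [[ 81,  90, 240], [145,  54,  34]]], dtype=np.uint8)

print(yuv2rgb(yuv))           # BT.601 ('SDTV') conversion, the default
print(yuv2rgb(yuv, 'HDTV'))   # ITU-R BT.709 conversion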
Answer
If you look at Wikipedia’s definitions, you can see that there are two conflicting definitions for YCbCr. The ITU-R BT.601 definition compresses luma into the range 16-235 (and chroma into 16-240) to provide footroom and headroom, while the JPEG version uses the full range 0-255. If you were to decode values in the BT.601 space using the formula for JPEG, the result would definitely look washed out.
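A little arithmetic makes the symptom concrete. Decoding BT.601-range luma with the JPEG formula passes the studio limits straight through instead of stretching them to 0-255 (neutral chroma assumed here, so R = G = B = Y):

# BT.601 studio-range black and white
y_black, y_white = 16, 235

# JPEG-style decode with neutral chroma (Cb = Cr = 128): R = G = B = Y
print(y_black, y_white)                    # 16 235 -> dark grey, light grey

# Correct BT.601 decode expands the range first
print(round((y_black - 16) * 255 / 219))   # 0   (true black)
print(round((y_white - 16) * 255 / 219))   # 255 (true white)

Black comes out as dark grey and white as light grey, so the whole image loses contrast, which is exactly the washed-out look described in the question.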