Sunday, March 13, 2011

Kinect Color - Depth Camera Calibration

Kinect has two cameras, one for capturing a color image and the other for capturing an IR image. Although the IR camera provides real-time depth information, the depth map only tells how far each of the IR camera's pixels is; we do not actually know the depth of the color image's pixels because the two cameras have different characteristics. As we can see in the image below, the pixels in the two images do not match: the locations of the hand and arm are completely different.



If we use the Kinect device for HCI, this does not matter much because depth information alone is enough in most cases. However, if we'd like to use it for 3D scene capture, or want to relate the RGB and depth images, we need to match the color image's pixels to the depth image's. Thus, we need to perform calibration.

Kinect camera calibration is no different from general camera calibration. We just need to capture several images of a chessboard pattern with the IR and RGB cameras. When capturing images from the IR camera, we need to block the IR emitter with something so that the corners can be detected reliably in the chessboard images. If not, the captured images will look like the one below and corner detection will fail.



If the lighting in your environment does not contain enough IR rays, you need a light source that emits IR rays (a halogen lamp?). It is also good to capture the same scenes with the two cameras. The images below were captured from the IR and RGB cameras, respectively.



Once the images are taken, we can calibrate each camera using the OpenCV API, the MATLAB calibration toolbox, or the GML calibration toolbox. After calibration, we obtain the intrinsic camera matrices, K_ir and K_rgb, and the distortion parameters of the two cameras.
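As a rough sketch of this step with OpenCV's Python bindings (the chessboard size, square size, and image folders here are placeholders; replace them with your own setup):

```python
import glob
import numpy as np
import cv2

# Assumed chessboard geometry: 9x6 inner corners, 25 mm squares (adjust to your pattern).
PATTERN = (9, 6)
SQUARE = 0.025

# 3D coordinates of the chessboard corners in the pattern's own coordinate system.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

def calibrate(image_paths):
    """Detect chessboard corners in every image and run single-camera calibration."""
    obj_points, img_points, size = [], [], None
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, PATTERN)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_points.append(objp)
            img_points.append(corners)
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, size, None, None)
    return K, dist

# Hypothetical file layout: one folder of IR captures, one of RGB captures.
K_ir, dist_ir = calibrate(sorted(glob.glob('ir/*.png')))
K_rgb, dist_rgb = calibrate(sorted(glob.glob('rgb/*.png')))
```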



To achieve our goal, we need one more piece of information: the geometric relationship between the two cameras, expressed as a rotation matrix R and a translation vector t. To compute them, capture the same scene containing the chessboard pattern with the two cameras and compute the extrinsic parameters of each camera. From the two sets of extrinsic parameters, the relative transformation can be computed easily.
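Concretely, if the pattern's pose is (R_ir, t_ir) in the IR camera and (R_rgb, t_rgb) in the RGB camera, then R = R_rgb * R_ir^T and t = t_rgb - R * t_ir. A minimal sketch using cv2.solvePnP (the corner arrays and intrinsics are assumed to come from the previous step; Rodrigues conversion is needed because solvePnP returns rotation vectors):

```python
# obj_pts: Nx3 chessboard corner coordinates in the pattern frame (as in the previous sketch).
# corners_ir / corners_rgb: the same corners detected in one IR image and one RGB image.
_, rvec_ir, t_ir = cv2.solvePnP(obj_pts, corners_ir, K_ir, dist_ir)
_, rvec_rgb, t_rgb = cv2.solvePnP(obj_pts, corners_rgb, K_rgb, dist_rgb)

R_ir, _ = cv2.Rodrigues(rvec_ir)    # pattern -> IR camera
R_rgb, _ = cv2.Rodrigues(rvec_rgb)  # pattern -> RGB camera

# Relative transformation mapping points from the IR frame to the RGB frame.
R = R_rgb @ R_ir.T
t = t_rgb - R @ t_ir
```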

Now, we can compute the depth of the color image from the depth map provided by the IR camera. Consider a pixel p_ir in the IR image with measured depth value d. The corresponding 3D point P_ir is obtained by back-projecting p_ir in the IR camera's coordinate system and scaling the ray by d.


P_ir = d * inv(K_ir) * p_ir


P_ir can be transformed to the RGB camera's coordinate system through the relative transformation R and t.


P_rgb = R * P_ir + t


Then, we project P_rgb onto the RGB camera image and obtain a 2D point p_rgb.


p_rgb = K_rgb * P_rgb


Finally, the depth value at location p_rgb in the RGB image is the Z coordinate of P_rgb.


depth of p_rgb = Z coordinate of P_rgb


p_ir : 2D pixel in the IR image
d : Depth value measured at p_ir
P_ir : 3D point in the IR camera's coordinate system
R, t : Relative transformation between the two cameras
P_rgb : 3D point in the RGB camera's coordinate system
p_rgb : The projection of P_rgb onto the RGB image

In the above, conversions to homogeneous coordinates are omitted. When two or more 3D points project to the same 2D location in the RGB image, the closest one is chosen. We can also compute the color values of the depth map's pixels in the same way: p_ir's color is the color at p_rgb.
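Putting the steps together, here is a minimal sketch of the reprojection loop in NumPy. K_ir, K_rgb, R, and t come from the calibration above; depth_ir is assumed to be the IR camera's depth map in metric units, and for simplicity the RGB image is assumed to have the same resolution as the depth map.

```python
import numpy as np

def depth_to_rgb(depth_ir, K_ir, K_rgb, R, t):
    """Reproject the IR depth map into the RGB camera and return an RGB-aligned depth map."""
    h, w = depth_ir.shape
    depth_rgb = np.full((h, w), np.inf)

    K_ir_inv = np.linalg.inv(K_ir)
    for v in range(h):
        for u in range(w):
            d = depth_ir[v, u]
            if d <= 0:                      # no measurement at this pixel
                continue
            # Back-project: P_ir = d * inv(K_ir) * p_ir
            P_ir = d * (K_ir_inv @ np.array([u, v, 1.0]))
            # Transform into the RGB camera frame: P_rgb = R * P_ir + t
            P_rgb = R @ P_ir + t.ravel()
            # Project: p_rgb = K_rgb * P_rgb, then divide by Z
            p = K_rgb @ P_rgb
            u_rgb, v_rgb = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= u_rgb < w and 0 <= v_rgb < h:
                # Z-buffer: keep the closest point when several land on the same pixel.
                depth_rgb[v_rgb, u_rgb] = min(depth_rgb[v_rgb, u_rgb], P_rgb[2])

    depth_rgb[np.isinf(depth_rgb)] = 0      # pixels with no depth information
    return depth_rgb
```

The loop can be vectorized for speed, but writing it pixel by pixel keeps the correspondence to the equations above explicit.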

Here is the resulting depth image of the RGB camera. Since the RGB camera sees a wider region than the IR camera, depth information is not available for all pixels.


If we overlay the RGB image and the computed depth image, we can see that the two match well, whereas they did not before calibration, as shown at the beginning of this post.


Here is a demo video showing the depth map of the RGB image and the color map of the depth image.