From: Steve Joyce
Subject: [Paparazzi-devel] Photogrammetry (was georeferencing video stream)
Date: Thu, 8 Oct 2009 14:17:31 +0200
I've been working with aerial photogrammetry for
some time and have some insight into how EnsoMosaic works. I'll try to
shed some light on the topic:
Orthorectification is a projection of an image onto
the surface with all the distortions from the terrain and the camera removed,
so that the orthophoto is uniform in scale, positioned in a map reference
system, and can be used for measurements or with other geographic data.
With several overlapping images, you can create an orthophoto mosaic, taking
only the most nadir (straight down) view from each photo and "stitching" them
together into a larger photo map.
To orthorectify a raw image you need to
know:
1. The camera intrinsic parameters (focal
length, focal plane geometry, radial and other distortions)
2. The camera extrinsic parameters (translation
(tx,ty,tz), and orientation Omega-phi-kappa)
3. A surface elevation model
#1 is usually measured and well known from a camera
calibration.
For #2, approximate values can come from the
autopilot's GPS and IMU.
#3 comes from an external source or can be derived
from the photos themselves (see block adjustment below).
In computer vision, they usually refer to #2 and #3
as motion and structure respectively.
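To make that split concrete, here is a minimal Python/numpy sketch (my own
toy illustration, not how EnsoMosaic does it) that projects one world point
into an image with the standard pinhole model; all the numbers are made up:

import numpy as np

# Intrinsics (#1): focal length and principal point in pixels (made-up
# values); lens distortion is ignored in this toy example.
f, cx, cy = 1500.0, 640.0, 480.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics (#2): camera 100 m above the origin, looking straight down.
# R rotates world axes into the camera frame.
R = np.diag([1.0, -1.0, -1.0])
C = np.array([0.0, 0.0, 100.0])   # camera position in world coordinates
t = -R @ C                        # projection translation, t = -R C

# A point on the surface; a surface model (#3) would supply its z.
X = np.array([10.0, 5.0, 0.0])

# Pinhole projection: x ~ K (R X + t)
x = K @ (R @ X + t)
u, v = x[0] / x[2], x[1] / x[2]
print(u, v)   # pixel coordinates of the ground point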
You could in theory use the position and attitude
data from the autopilot together with an external DSM to orthorectify
images. This is called direct georeferencing and requires an extremely
accurate IMU.
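As a sketch of what direct georeferencing boils down to, you invert the
projection above by casting a ray from each pixel down to the surface (my
own simplification: flat ground instead of a real DSM, for which you would
intersect the ray with the terrain instead):

import numpy as np

def pixel_to_ground(u, v, K, R, C, ground_z=0.0):
    # Direct georeferencing of one pixel: intersect its viewing ray
    # with a horizontal plane z = ground_z (a flat stand-in for a DSM).
    d = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray in world frame
    s = (ground_z - C[2]) / d[2]                          # scale to the plane
    return C + s * d

# Same toy pose as above: camera 100 m up, looking straight down.
f, cx, cy = 1500.0, 640.0, 480.0
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]], dtype=float)
R = np.diag([1.0, -1.0, -1.0])
C = np.array([0.0, 0.0, 100.0])

print(pixel_to_ground(790.0, 405.0, K, R, C))   # recovers roughly (10, 5, 0)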
What is usually done instead is to take a large
number of tie points (features on the ground that can be located in two or more
images) and solve for all unknown parameters with a process called bundle
(block) adjustment. This is an optimization technique to solve for all of
the motion and structure parameters simultaneously. It is an
iterative process to minimize the total reprojection error from 3D world
coordinates to 2D image coordinates across all images and points. As
an iterative process, it requires initial values and other parameters to ensure
convergence. Block adjustment can even be used to solve for the camera
intrinsic parameters but it is difficult to arrive at a realistic solution if
you have too many unknowns simultaneously.
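For a feel of what the optimization looks like, here is a deliberately tiny
block adjustment in Python with scipy (my own toy, two cameras and four tie
points; real solvers like sba exploit the sparsity of the problem, which this
dense sketch ignores):

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[1500.0, 0, 640], [0, 1500.0, 480], [0, 0, 1]])
pts_true = np.array([[0., 0., 0.], [20., 0., 2.],
                     [0., 20., -1.], [20., 20., 1.]])
cams_true = np.array([[np.pi, 0, 0, 0, 0, 100.],     # rotation vector + t,
                      [np.pi, 0, 0, -10, 0, 100.]])  # both looking down

def project(cam, X):
    # Pinhole projection of all points X with one camera's parameters.
    Xc = Rotation.from_rotvec(cam[:3]).apply(X) + cam[3:]
    x = (K @ Xc.T).T
    return x[:, :2] / x[:, 2:]

uv_obs = [project(c, pts_true) for c in cams_true]   # "measured" tie points

def residuals(params):
    # Total reprojection error over all cameras and tie points.
    cams = params[:12].reshape(2, 6)
    pts = params[12:].reshape(4, 3)
    return np.concatenate([(project(c, pts) - obs).ravel()
                           for c, obs in zip(cams, uv_obs)])

# Initial values, e.g. from the autopilot, deliberately perturbed here.
rng = np.random.default_rng(0)
x0 = np.concatenate([cams_true.ravel(), pts_true.ravel()])
x0 = x0 + rng.normal(scale=0.3, size=x0.shape)

sol = least_squares(residuals, x0)                   # the block adjustment
print("final reprojection RMS:", np.sqrt(np.mean(sol.fun ** 2)))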
After a block adjustment, you have an updated set
of extrinsic parameters and a surface model can be interpolated from
the estimated tiepoint z-coordinates, giving you everything you need to
orthorectify the photos.
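A quick way to see the interpolation step (made-up numbers, scipy's griddata
standing in for proper DSM generation):

import numpy as np
from scipy.interpolate import griddata

# Adjusted tie points: columns are world X, Y and the estimated Z.
tie_xyz = np.array([[0., 0., 12.1], [50., 0., 13.4],
                    [0., 50., 11.8], [50., 50., 14.0], [25., 25., 12.9]])

# Interpolate the z-coordinates onto a regular grid: a crude surface model.
gx, gy = np.meshgrid(np.linspace(0, 50, 11), np.linspace(0, 50, 11))
dsm = griddata(tie_xyz[:, :2], tie_xyz[:, 2], (gx, gy), method='linear')
print(dsm.shape)   # an 11 x 11 elevation grid, ready for orthorectification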
Ground control points are like tie points, but
their position is also known in the world coordinate system and they can be
introduced into the block adjustment.
Prior estimates of parameter accuracy can also
be introduced as Austin Jensen described, but they don't impose hard
constraints on the final solution; rather, they give a weight or confidence
to that measurement in the adjustment process.
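In the least-squares picture this just means dividing each prior's residual
by its stated standard deviation before stacking it with the reprojection
errors; a sketch on top of the toy adjustment above (the sigmas are made up):

def residuals_with_priors(params, cam_prior, sigma_pose):
    # Append weighted pose priors to the reprojection residuals: each prior
    # contributes (estimate - prior) / sigma, so it pulls on the solution in
    # proportion to its confidence rather than constraining it outright.
    cams = params[:12].reshape(2, 6)
    prior_res = ((cams - cam_prior) / sigma_pose).ravel()
    return np.concatenate([residuals(params), prior_res])

# e.g. an IMU good to a few hundredths of a radian and GPS good to a couple
# of metres horizontally, worse vertically (all made-up numbers):
# sigma_pose = np.array([0.05, 0.05, 0.05, 2.0, 2.0, 5.0])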
Block adjustment is a rigorous mathematical
solution to minimize the total error, but there is a fair bit of experience
and black magic required to get it to converge, especially if tiepoints are
automatically generated and contain errors or "blunders" as they are
called.
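One common guard against such blunders, sticking with the toy scipy example
above, is a robust loss that caps the influence of large residuals so that
one bad tie point cannot drag the whole solution with it:

# Errors beyond roughly f_scale pixels count linearly instead of
# quadratically, limiting the pull of blunders on the adjustment.
sol = least_squares(residuals, x0, loss='huber', f_scale=2.0)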
To the question at hand about georeferencing a
video stream, this is a rather difficult problem. It is quite manageable
to take a set of photos with good overlap and do a block adjustment to create a
photomosaic. But in the case of video, you would often like to do
it in real time, updating the solution as each new frame comes in. If
you take the approach of matching frames only against previous frames in the
sequence, I think the solution will quickly diverge if you try to
solve for both motion and structure. On the other hand, if you comb over an
area and try to redo the adjustment for the entire area when each new frame
comes in, the size of the problem grows to the point where it would become
too slow. The option of direct georeferencing is always possible if a
surface model is available and accuracy isn't critical, i.e. use the GPS and IMU
data directly to reproject each frame to the surface.
If you want to investigate these things further
with open-source tools, I would recommend OpenCV http://en.wikipedia.org/wiki/OpenCV for
camera calibration, point matching, projection functions etc. and sba http://www.ics.forth.gr/~lourakis/sba/ for
block adjustment, with the following caveats:
- aircraft attitude pitch-roll-yaw are not the
same as photogrammetric angles omega-phi-kappa (although the latter can be
derived from them).
- GPS position (X-Y-Z) is not the same as the camera
translation (tx, ty, tz) used in most projection functions (although it can
be derived from them; see the sketch after this list).
- OpenCV is a computer vision library and doesn't
use the same conventions and terminology as in aerial
photogrammetry.
- sba requires a pretty good background in
calculus and algebra just to understand what it is supposed to do (quaternion
what?).
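To illustrate the two convention caveats, here is the bookkeeping in Python.
Treat it as a template only: the exact axis conventions depend on your
autopilot and your projection code, and the nadir camera mount here is my
assumption:

import numpy as np
from scipy.spatial.transform import Rotation

# Aircraft attitude to a world-to-camera rotation: body attitude (yaw, pitch,
# roll as intrinsic z-y-x Euler angles) composed with a fixed mounting
# rotation, here assumed to be a nadir-pointing camera.
yaw, pitch, roll = np.radians([45.0, 2.0, -1.0])      # made-up values
R_world_to_body = Rotation.from_euler('ZYX', [yaw, pitch, roll]).inv()
R_body_to_cam = Rotation.from_rotvec([np.pi, 0, 0])   # nadir mount (assumed)
R = (R_body_to_cam * R_world_to_body).as_matrix()

# GPS position to the camera translation used by x ~ K (R X + t): the
# antenna position C in map coordinates is not t itself; t = -R C.
C = np.array([1000.0, 2000.0, 150.0])                 # easting, northing, alt
t = -R @ C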
Then again you could just buy an INPHO licence for
about 50,000 Euro, which does a good job.
The numerous programs and plugins that do "panorama
stitching" are related but not quite the same. They do use the same camera
intrinsics, extrinsics, and block adjustment with tie points, but have the
advantage that the panorama camera position is usually fixed, resulting in far
fewer parameters (in fact the Photo Tourism project also uses sba
for structure from motion). They also don't really need to bother
about the accuracy of the structure part as long as the seams between photos are
good.
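If you just want to see that machinery run, OpenCV ships the panorama variant
as a black box (the file names here are placeholders):

import cv2

imgs = [cv2.imread(p) for p in ["img1.jpg", "img2.jpg", "img3.jpg"]]
stitcher = cv2.Stitcher_create()        # bundles matching, adjustment, blending
status, pano = stitcher.stitch(imgs)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", pano)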
For an idea of how this theory can be used in
practice, check out http://www.germatics.com/pages_eng/uav_sampleprojectt_eng.html (flown
with a paparazzi tiny13!).
Regards,
Steve