From: Steve Joyce
Subject: [Paparazzi-devel] Photogrammetry (was georeferencing video stream)
Date: Thu, 8 Oct 2009 14:17:31 +0200
I've been working with aerial photogrammetry for
some time and have some insight into how EnsoMosaic works. I'll try to
shed some light on the topic:
Orthorectification is a projection of an image onto
the surface with all the distortions from the terrain and the camera removed,
so that the orthophoto is uniform in scale, positioned in a map reference
system, and can be used for measurements or with other geographic data.
With several overlapping images, you can create an orthophoto mosaic, taking
only the most nadir (straight down) view from each photo and "stitching" them
together into a larger photo map.
To orthorectify a raw image you need to
know:
1. The camera intrinsic parameters (focal
length, focal plane geometry, radial and other distortions)
2. The camera extrinsic parameters (translation
(tx,ty,tz), and orientation Omega-phi-kappa)
3. A surface elevation model
#1 is usually measured and well known from a camera
calibration.
For #2, approximate values can come from the
autopilot's GPS and IMU.
#3 comes from an external source or can be derived
from the photos themselves (see block adjustment below).
In computer vision, they usually refer to #2 and #3
as motion and structure respectively.
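To make that split concrete, here is a minimal Python/numpy sketch (my own
toy illustration, not how EnsoMosaic does it) that projects one world point
into an image with the standard pinhole model; all the numbers are made up:

import numpy as np

# Intrinsics (#1): focal length and principal point in pixels (made-up
# values); lens distortion is ignored in this toy example.
f, cx, cy = 1500.0, 640.0, 480.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics (#2): camera 100 m above the origin, looking straight down.
# R rotates world axes into the camera frame.
R = np.diag([1.0, -1.0, -1.0])
C = np.array([0.0, 0.0, 100.0])   # camera position in world coordinates
t = -R @ C                        # projection translation, t = -R C

# A point on the surface; a surface model (#3) would supply its z.
X = np.array([10.0, 5.0, 0.0])

# Pinhole projection: x ~ K (R X + t)
x = K @ (R @ X + t)
u, v = x[0] / x[2], x[1] / x[2]
print(u, v)   # pixel coordinates of the ground point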
You could in theory use the position and attitude
data from the autopilot together with an external DSM to orthorectify
images. This is called direct georeferencing and requires an extremely
accurate IMU.
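As a sketch of what direct georeferencing boils down to, you invert the
projection above by casting a ray from each pixel down to the surface (my
own simplification: flat ground instead of a real DSM, for which you would
intersect the ray with the terrain instead):

import numpy as np

def pixel_to_ground(u, v, K, R, C, ground_z=0.0):
    # Direct georeferencing of one pixel: intersect its viewing ray
    # with a horizontal plane z = ground_z (a flat stand-in for a DSM).
    d = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray in world frame
    s = (ground_z - C[2]) / d[2]                          # scale to the plane
    return C + s * d

# Same toy pose as above: camera 100 m up, looking straight down.
f, cx, cy = 1500.0, 640.0, 480.0
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]], dtype=float)
R = np.diag([1.0, -1.0, -1.0])
C = np.array([0.0, 0.0, 100.0])

print(pixel_to_ground(790.0, 405.0, K, R, C))   # recovers roughly (10, 5, 0)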
What is usually done instead is to take a large
number of tie points (features on the ground that can be located in two or more
images) and solve for all unknown parameters with a process called bundle
(block) adjustment. This is an optimization technique to solve for all of
the motion and structure parameters simultaneously. It is an
iterative process to minimize the total reprojection error from 3D world
coordinates to 2D image coordinates across all images and points. As
an iterative process, it requires initial values and other parameters to ensure
convergence. Block adjustment can even be used to solve for the camera
intrinsic parameters but it is difficult to arrive at a realistic solution if
you have too many unknowns simultaneously.
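For a feel of what the optimization looks like, here is a deliberately tiny
block adjustment in Python with scipy (my own toy, two cameras and four tie
points; real solvers like sba exploit the sparsity of the problem, which this
dense sketch ignores):

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[1500.0, 0, 640], [0, 1500.0, 480], [0, 0, 1]])
pts_true = np.array([[0., 0., 0.], [20., 0., 2.],
                     [0., 20., -1.], [20., 20., 1.]])
cams_true = np.array([[np.pi, 0, 0, 0, 0, 100.],     # rotation vector + t,
                      [np.pi, 0, 0, -10, 0, 100.]])  # both looking down

def project(cam, X):
    # Pinhole projection of all points X with one camera's parameters.
    Xc = Rotation.from_rotvec(cam[:3]).apply(X) + cam[3:]
    x = (K @ Xc.T).T
    return x[:, :2] / x[:, 2:]

uv_obs = [project(c, pts_true) for c in cams_true]   # "measured" tie points

def residuals(params):
    # Total reprojection error over all cameras and tie points.
    cams = params[:12].reshape(2, 6)
    pts = params[12:].reshape(4, 3)
    return np.concatenate([(project(c, pts) - obs).ravel()
                           for c, obs in zip(cams, uv_obs)])

# Initial values, e.g. from the autopilot, deliberately perturbed here.
rng = np.random.default_rng(0)
x0 = np.concatenate([cams_true.ravel(), pts_true.ravel()])
x0 = x0 + rng.normal(scale=0.3, size=x0.shape)

sol = least_squares(residuals, x0)                   # the block adjustment
print("final reprojection RMS:", np.sqrt(np.mean(sol.fun ** 2)))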
After a block adjustment, you have an updated set
of extrinsic parameters and a surface model can be interpolated from
the estimated tiepoint z-coordinates, giving you everything you need to
orthorectify the photos.
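A quick way to see the interpolation step (made-up numbers, scipy's griddata
standing in for proper DSM generation):

import numpy as np
from scipy.interpolate import griddata

# Adjusted tie points: columns are world X, Y and the estimated Z.
tie_xyz = np.array([[0., 0., 12.1], [50., 0., 13.4],
                    [0., 50., 11.8], [50., 50., 14.0], [25., 25., 12.9]])

# Interpolate the z-coordinates onto a regular grid: a crude surface model.
gx, gy = np.meshgrid(np.linspace(0, 50, 11), np.linspace(0, 50, 11))
dsm = griddata(tie_xyz[:, :2], tie_xyz[:, 2], (gx, gy), method='linear')
print(dsm.shape)   # an 11 x 11 elevation grid, ready for orthorectification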
Ground control points are like tie points, but
their position is also known in the world coordinate system and they can be
introduced into the block adjustment.
Prior estimates of parameter accuracy can also
be introduced as Austin Jensen described, but they don't impose hard
constraints on the final solution; rather, they give a weight or confidence
to that measurement in the adjustment process.
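In the least-squares picture this just means dividing each prior's residual
by its stated standard deviation before stacking it with the reprojection
errors; a sketch on top of the toy adjustment above (the sigmas are made up):

def residuals_with_priors(params, cam_prior, sigma_pose):
    # Append weighted pose priors to the reprojection residuals: each prior
    # contributes (estimate - prior) / sigma, so it pulls on the solution in
    # proportion to its confidence rather than constraining it outright.
    cams = params[:12].reshape(2, 6)
    prior_res = ((cams - cam_prior) / sigma_pose).ravel()
    return np.concatenate([residuals(params), prior_res])

# e.g. an IMU good to a few hundredths of a radian and GPS good to a couple
# of metres horizontally, worse vertically (all made-up numbers):
# sigma_pose = np.array([0.05, 0.05, 0.05, 2.0, 2.0, 5.0])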
Block adjustment is a rigorous mathematical
solution to minimize the total error, but there is a fair bit of experience
and black magic required to get it to converge, especially if tiepoints are
automatically generated and contain errors or "blunders" as they are
called.
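One common guard against such blunders, sticking with the toy scipy example
above, is a robust loss that caps the influence of large residuals so that
one bad tie point cannot drag the whole solution with it:

# Errors beyond roughly f_scale pixels count linearly instead of
# quadratically, limiting the pull of blunders on the adjustment.
sol = least_squares(residuals, x0, loss='huber', f_scale=2.0)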
To the question at hand about georeferencing a
video stream, this is a rather difficult problem. It is quite manageable
to take a set of photos with good overlap and do a block adjustment to create a
photomosaic. But in the case of video, you would often like to do
it in real time, updating the solution as each new frame comes in. If
you take the approach of matching frames only against previous frames in the
sequence, I think the solution will quickly diverge if you try to
solve for both motion and structure. On the other hand, if you comb over an
area and try to redo the adjustment for the entire area when each new frame
comes in, the size of the problem grows to the point where it would become
too slow. The option of direct georeferencing is always possible if a
surface model is available and accuracy isn't critical, i.e. use the GPS and IMU
data directly to reproject each frame to the surface.
If you want to investigate these things further
with open-source tools, I would recommend OpenCV http://en.wikipedia.org/wiki/OpenCV for
camera calibration, point matching, projection functions etc. and sba http://www.ics.forth.gr/~lourakis/sba/ for
block adjustment, with the following caveats:
- aircraft attitude pitch-roll-yaw are not the
same as photogrammetric angles omega-phi-kappa (although the latter can be
derived from them).
- GPS position (X-Y-Z) is not the same as the camera
translation (tx, ty, tz) used in most projection functions (although it can
be derived from them; see the sketch after this list).
- OpenCV is a computer vision library and doesn't
use the same conventions and terminology as in aerial
photogrammetry.
- sba requires a pretty good background in
calculus and algebra just to understand what it is supposed to do (quaternion
what?).
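To illustrate the two convention caveats, here is the bookkeeping in Python.
Treat it as a template only: the exact axis conventions depend on your
autopilot and your projection code, and the nadir camera mount here is my
assumption:

import numpy as np
from scipy.spatial.transform import Rotation

# Aircraft attitude to a world-to-camera rotation: body attitude (yaw, pitch,
# roll as intrinsic z-y-x Euler angles) composed with a fixed mounting
# rotation, here assumed to be a nadir-pointing camera.
yaw, pitch, roll = np.radians([45.0, 2.0, -1.0])      # made-up values
R_world_to_body = Rotation.from_euler('ZYX', [yaw, pitch, roll]).inv()
R_body_to_cam = Rotation.from_rotvec([np.pi, 0, 0])   # nadir mount (assumed)
R = (R_body_to_cam * R_world_to_body).as_matrix()

# GPS position to the camera translation used by x ~ K (R X + t): the
# antenna position C in map coordinates is not t itself; t = -R C.
C = np.array([1000.0, 2000.0, 150.0])                 # easting, northing, alt
t = -R @ C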
Then again you could just buy an INPHO licence for
about 50,000 Euro, which does a good job.
The numerous programs and plugins that do "panorama
stitching" are related but not quite the same. They do use the same camera
intrinsics, extrinsics, and block adjustment with tie points, but have the
advantage that the panorama camera position is usually fixed, resulting in far
fewer parameters (in fact the Photo Tourism project also uses sba
for structure from motion). They also don't really need to bother
about the accuracy of the structure part as long as the seams between photos are
good.
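If you just want to see that machinery run, OpenCV ships the panorama variant
as a black box (the file names here are placeholders):

import cv2

imgs = [cv2.imread(p) for p in ["img1.jpg", "img2.jpg", "img3.jpg"]]
stitcher = cv2.Stitcher_create()        # bundles matching, adjustment, blending
status, pano = stitcher.stitch(imgs)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", pano)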
For an idea of how this theory can be used in
practice, check out http://www.germatics.com/pages_eng/uav_sampleprojectt_eng.html (flown
with a paparazzi tiny13!).
Regards,
Steve