Imposing Semi-Local Geometric Constraints for Accurate Correspondences Selection in Structure from Motion: A Game-Theoretic Perspective

International Journal of Computer Vision

Abstract

Most Structure from Motion pipelines are based on the iterative refinement of an initial batch of feature correspondences. Typically, a set of match candidates is first selected based on photometric similarity; an initial estimate of the camera intrinsic and extrinsic parameters is then computed by minimizing the reprojection error. Finally, outliers in the initial correspondences are filtered by enforcing some global geometric property, such as the epipolar constraint. Many approaches have been proposed in the literature for each of these three steps, but almost invariably they separate the first inlier selection step, which is based only on local image properties, from the enforcement of global geometric consistency. Unfortunately, these two steps are not independent: outliers can lead to inaccurate parameter estimation or even prevent convergence. This explains the well-known sensitivity of all filtering approaches to the number of outliers, especially in the presence of structured noise, which can arise, for example, when the images contain several repeated patterns. In this paper we introduce a novel stereo correspondence selection scheme that casts the problem into a Game-Theoretic framework in order to guide the inlier selection towards a consistent subset of correspondences. This is done by enforcing geometric constraints that do not depend on full knowledge of the motion parameters but rather on some semi-local property that can be estimated from the local appearance of the image features. The practical effectiveness of the proposed approach is confirmed by an extensive set of experiments and comparisons with state-of-the-art techniques.
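
The abstract does not say which game dynamics or payoff function the authors use, so the following is only an illustrative sketch of the general idea. It assumes discrete-time replicator dynamics, a standard tool in evolutionary game theory, and a hand-made pairwise compatibility matrix. The names `replicator_dynamics`, `select_inliers`, `threshold_ratio`, and all numeric values are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: inlier selection as an evolutionary game.
# Assumption: payoff[i][j] >= 0 scores how geometrically compatible
# candidate correspondences i and j are (values below are made up).

def replicator_dynamics(payoff, iterations=200, tol=1e-9):
    """Evolve a population over candidates with discrete replicator dynamics."""
    n = len(payoff)
    x = [1.0 / n] * n  # start from the uniform mixed strategy
    for _ in range(iterations):
        # Expected payoff of each candidate against the current population.
        fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
        average = sum(x[i] * fitness[i] for i in range(n))
        if average <= 0.0:
            break  # no candidate earns any payoff; nothing to update
        new_x = [x[i] * fitness[i] / average for i in range(n)]
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x


def select_inliers(payoff, threshold_ratio=0.5):
    """Keep candidates whose final share is a large fraction of the peak share."""
    x = replicator_dynamics(payoff)
    peak = max(x)
    return [i for i, share in enumerate(x) if share >= threshold_ratio * peak]


# Candidates 0, 1, 2 are mutually consistent; candidate 3 agrees with nobody.
payoff = [
    [0.0, 0.9, 0.9, 0.1],
    [0.9, 0.0, 0.9, 0.1],
    [0.9, 0.9, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
print(select_inliers(payoff))
```

In this toy setting the share of the incompatible candidate shrinks at every iteration, so only the mutually consistent group survives the threshold. A real pipeline would have to derive the payoff entries from the semi-local geometric property mentioned in the abstract rather than from fixed numbers.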

Author information

Corresponding author

Correspondence to Andrea Albarelli.

About this article

Cite this article

Albarelli, A., Rodolà, E. & Torsello, A. Imposing Semi-Local Geometric Constraints for Accurate Correspondences Selection in Structure from Motion: A Game-Theoretic Perspective. Int J Comput Vis 97, 36–53 (2012). https://doi.org/10.1007/s11263-011-0432-4

  • DOI: https://doi.org/10.1007/s11263-011-0432-4
