
Multi-View 3D Reconstruction Based on Capsule Convolution Network

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2023, Issue 01
Page:
46-55,92
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Multi-View 3D Reconstruction Based on Capsule Convolution Network
Author(s):
Hu Ronglin, Fu Haozhi, He Xuqin, Zhang Xinxin, Lu Wenhao
(Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an 223003, China)
Keywords:
feature extraction network; 3D capsule network; dilated convolution; group convolution; multi-view stereo matching
PACS:
TP391
DOI:
10.3969/j.issn.1672-1292.2023.01.007
Abstract:
To explore how deep neural networks affect reconstruction quality, this paper proposes Caps-MVSNet, a multi-view 3D reconstruction model based on a capsule convolutional network. Caps-MVSNet comprises five stages: feature extraction, cost volume construction, cost volume regularization, depth map regression, and depth map refinement. The paper focuses on two components: the FENet-T feature extraction network, used in the feature extraction stage, and the 3D-CapsCNN network, used in the cost volume regularization stage. FENet-T uses an efficient block count ratio, large-scale dilated convolutions, and group convolutions to improve feature extraction efficiency. 3D-CapsCNN regularizes the cost volume with 3D capsule networks, which have a stronger spatial representation ability than convolutional neural networks. Caps-MVSNet was evaluated on the DTU dataset. The results show that, compared with previous mainstream reconstruction methods (Colmap, Tola, Camp, Gipuma, Furu, SurfaceNet), the proposed model achieves the best integrity among current reconstruction methods and significantly improves accuracy and completeness. Compared with the MVSNet baseline, the accuracy, completeness, and overall scores of the proposed model improve by 3.3%, 4.9%, and 8.2%, respectively, and the number of parameters is reduced by 3.3%.
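
The efficiency argument for group and dilated convolutions mentioned in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function names, channel counts, kernel size, group count, and dilation rates are illustrative assumptions, and the actual FENet-T configuration is not stated in this record. It only shows the general effect of the two techniques: grouping divides the weight count by the number of groups, and dilation widens the area a kernel covers without adding weights.

```python
def conv2d_params(c_in, c_out, k, groups=1, bias=False):
    """Weight count of a 2D convolution with a square k x k kernel.

    Hypothetical helper for illustration only. With `groups` groups, each
    output channel sees only c_in / groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def effective_kernel(k, dilation):
    """Spatial extent covered by a k-tap kernel at the given dilation rate."""
    return dilation * (k - 1) + 1


# Assumed example sizes: 64 -> 64 channels, 3 x 3 kernel, 8 groups.
standard = conv2d_params(64, 64, 3)
grouped = conv2d_params(64, 64, 3, groups=8)
print("standard conv weights:", standard)
print("grouped conv weights: ", grouped)
print("reduction factor:     ", standard // grouped)

# A 3-tap kernel at dilation rates 1, 2 and 4 (weights stay at 3 per axis).
print("effective extents:    ", [effective_kernel(3, d) for d in (1, 2, 4)])
```

With these assumed sizes, the grouped layer needs one eighth of the weights of the standard layer, while a dilation rate of 4 lets a 3-tap kernel span 9 positions per axis.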

References:

[1]SEITZ S M,CURLESS B,DIEBEL J,et al. A comparison and evaluation of multi-view stereo reconstruction algorithms[C]//2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York,NY,USA:IEEE,2006,1:519-528.
[2]HARTLEY R,ZISSERMAN A. Multiple view geometry in computer vision[M]. London:Cambridge University Press,2003.
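[3]HAN W J,SUN X H,JI G L,et al. Fusion algorithm for multispectral and panchromatic remote sensing images based on convolutional neural networks[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(3):123-130.
[4]YANG H J,WANG R P,WANG Z Y,et al. 3D phenotype reconstruction of crop fruits based on multi-view images[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(2):92-103.
[5]ZHANG T A,YUN T,XUE L F,et al. 3D reconstruction algorithm for main tree branches based on skeleton extraction[J]. Journal of Nanjing Normal University(Engineering and Technology Edition),2014,14(4):51-57.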
[6]WU Z,SONG S,KHOSLA A,et al. 3D ShapeNets:A deep representation for volumetric shapes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston,Massachusetts,USA,2015:1912-1920.
[7]TULSIANI S,ZHOU T,EFROS A A,et al. Multi-view supervision for single-view reconstruction via differentiable ray consistency[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,Hawaii,USA,2017:2626-2634.
[8]LI K,PHAM T,ZHAN H,et al. Efficient dense point cloud object reconstruction using deformation vector fields[C]//Proceedings of the European Conference on Computer Vision. Munich,Germany,2018:497-513.
[9]SMITH E,FUJIMOTO S,MEGER D. Multi-view silhouette and depth decomposition for high resolution 3D object representation[J]. Advances in Neural Information Processing Systems,2018,31.
[10]ZHU Q,MIN C,WEI Z,et al. Deep learning for multi-view stereo via plane sweep:a survey[J]. arXiv Preprint arXiv:2106.15328,2021.
[11]YAO Y,LUO Z,LI S,et al. Mvsnet:Depth inference for unstructured multi-view stereo[C]//Proceedings of the European Conference on Computer Vision(ECCV). Munich,Germany,2018.
[12]YU Z,GAO S. Fast-Mvsnet:Sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[13]SABOUR S,FROSST N,HINTON G E. Dynamic routing between capsules[C]//31st Conference on Advances in Neural Information Processing Systems. Long Beach,CA,USA,2017,30.
[14]HINTON G E,SABOUR S,FROSST N. Matrix capsules with EM routing[C]//International conference on learning representations. Vancouver,Canada,2018.
[15]SCHONBERGER J L,FRAHM J M. Structure-from-motion revisited[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,Nevada,USA,2016.
[16]BARNES C,SHECHTMAN E,FINKELSTEIN A,et al. PatchMatch:a randomized correspondence algorithm for structural image editing[J]. ACM Transactions on Graphics,2009,28(3):24.
[17]ULLMAN S. The interpretation of structure from motion[J]. Proceedings of the Royal Society of London. Series B. Biological Sciences,1979,203(1153):405-426.
[18]MOULON P,MONASSE P,PERROT R,et al. Openmvg:open multiple view geometry[C]//International Workshop on Reproducible Research in Pattern Recognition. Cancún,Mexico,2016:60-74.
[19]GOESELE M,SNAVELY N,CURLESS B,et al. Multi-view stereo for community photo collections[C]//2007 IEEE 11th International Conference on Computer Vision. Rio de Janeiro,Brazil,2007.
[20]HAN X,LEUNG T,JIA Y,et al. Matchnet:Unifying feature and metric learning for patch-based matching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston,MA,USA,2015.
[21]KENDALL A,MARTIROSYAN H,DASGUPTA S,et al. End-to-end learning of geometry and context for deep stereo regression[C]//Proceedings of the IEEE International Conference on Computer Vision. Chengdu,China,2017.
[22]CLARK R,WANG S,WEN H,et al. Vinet:Visual-inertial odometry as a sequence-to-sequence learning problem[C]//Proceedings of the AAAI Conference on Artificial Intelligence. San Francisco,USA,2017.
[23]JI M,GALL J,ZHENG H,et al. Surfacenet:An end-to-end 3D neural network for multiview stereopsis[C]//Proceedings of the IEEE International Conference on Computer Vision. Chengdu,China,2017.
[24]HUANG P H,MATZEN K,KOPF J,et al. Deepmvs:Learning multi-view stereopsis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA,2018.
[25]YAO Y,LUO Z,LI S,et al. Recurrent mvsnet for high-resolution multi-view stereo depth inference[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA,2019.
[26]XUE Y,CHEN J,WAN W,et al. Mvscrf:Learning multi-view stereo with conditional random fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul,South Korea,2019.
[27]GU X,FAN Z,ZHU S,et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[28]YANG J,MAO W,ALVAREZ J M,et al. Cost volume pyramid based depth inference for multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[29]CHEN R,HAN S,XU J,et al. Point-based multi-view stereo network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul,South Korea,2019.
[30]TARG S,ALMEIDA D,LYMAN K. Resnet in resnet:Generalizing residual architectures[J]. arXiv Preprint arXiv:1603.08029,2016.
[31]SIMONYAN K,ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv Preprint arXiv:1409.1556,2014.
[32]RONNEBERGER O,FISCHER P,BROX T. U-Net:Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich,Germany,2015.
[33]LIU Z,MAO H,WU C Y,et al. A ConvNet for the 2020s[J]. arXiv Preprint arXiv:2201.03545,2022.
[34]LIU Z,LIN Y,CAO Y,et al. Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal,Canada,2021.
[35]YAN J F,WEI Z Z,YI H W,et al. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking[C]//European Conference on Computer Vision. Springer,Cham,2020:674-689.
[36]CHENG S,XU Z,ZHU S,et al. Deep stereo using adaptive thin volume representation with uncertainty awareness[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[37]TRAN M,VO-HO V K,LE N T H. 3DConvCaps:3DUnet with convolutional capsule encoder for medical image segmentation[J]. arXiv Preprint arXiv:2205.09299,2022.
[38]ZHAO Y,BIRDAL T,DENG H,et al. 3D point capsule networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA,2019.
[39]WANG X,GIRSHICK R,GUPTA A,et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA,2018.
[40]HOWARD A G,ZHU M L,CHEN B,et al. MobileNets:Efficient convolutional neural networks for mobile vision applications[J]. arXiv Preprint arXiv:1704.04861,2017.
[41]GALLUP D,FRAHM J M,MORDOHAI P,et al. Real-time plane-sweeping stereo with multiple sweeping directions[C]//2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis,Minnesota,USA,2007.
[42]XU N,PRICE B,COHEN S,et al. Deep image matting[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA,2017.
[43]JENSEN R,DAHL A,VOGIATZIS G,et al. Large scale multi-view stereopsis evaluation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus,OH,USA,2014.
[44]TOLA E,STRECHA C,FUA P. Efficient large-scale multi-view stereo for ultra high-resolution image sets[J]. Machine Vision and Applications,2012,23(5):903-920.
[45]CAMPBELL N D F,VOGIATZIS G,HERNÁNDEZ C,et al. Using multiple hypotheses to improve depth-maps for multi-view stereo[C]//European Conference on Computer Vision. Marseille,France,2008:766-779.
[46]GALLIANI S,LASINGER K,SCHINDLER K. Gipuma:Massively parallel multi-view stereo reconstruction[J]. Publikationen der Deutschen Gesellschaft für Photogrammetrie,Fernerkundung und Geoinformation,2016,25:361-369.
[47]FURUKAWA Y,PONCE J. Accurate,dense,and robust multiview stereopsis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2009,32(8):1362-1376.

Memo

Memo:
-
Last Update: 2023-03-15