[1]胡荣林,付浩志,何旭琴,等.基于胶囊卷积网络的多视图三维重建[J].南京师范大学学报(工程技术版),2023,23(01):46-55,92.[doi:10.3969/j.issn.1672-1292.2023.01.007]
 Hu Ronglin,Fu Haozhi,He Xuqin,et al.Multi-View 3D Reconstruction Based on Capsule Convolution Network[J].Journal of Nanjing Normal University(Engineering and Technology),2023,23(01):46-55,92.[doi:10.3969/j.issn.1672-1292.2023.01.007]

Multi-View 3D Reconstruction Based on Capsule Convolution Network

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Vol. 23
Issue:
No. 1, 2023
Pages:
46-55, 92
Section:
Computer Science and Technology
Publication Date:
2023-03-15

Article Info

Title:
Multi-View 3D Reconstruction Based on Capsule Convolution Network
Article ID:
1672-1292(2023)01-0046-10
Authors:
胡荣林, 付浩志, 何旭琴, 张新新, 陆文豪
(Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an 223003, Jiangsu, China)
Author(s):
Hu Ronglin, Fu Haozhi, He Xuqin, Zhang Xinxin, Lu Wenhao
(Faculty of Computer and Software Engineering,Huaiyin Institute of Technology,Huai'an 223003,China)
Keywords (Chinese):
特征提取网络; 3D胶囊网络; 空洞卷积; 分组卷积; 多视立体匹配
Keywords:
feature extraction network; 3D capsule network; dilated convolution; group convolution; multi-view stereo matching
CLC Number:
TP391
DOI:
10.3969/j.issn.1672-1292.2023.01.007
Document Code:
A
Abstract:
From the perspective of how deep neural networks affect reconstruction quality, this paper proposes Caps-MVSNet, a multi-view 3D reconstruction model based on a capsule convolutional network, comprising five stages: feature extraction, cost volume construction, cost volume regularization, depth map regression and depth map refinement. The FENet-T feature extraction network and the 3D-CapsCNN network are proposed and applied to the feature extraction and cost volume regularization stages of the model, respectively. FENet-T improves feature extraction efficiency through an efficient block count ratio, large-scale dilated convolutions and group convolutions. 3D-CapsCNN regularizes the cost volume with a 3D capsule network, which has stronger spatial representation ability than a conventional convolutional neural network. Evaluated on the DTU dataset, Caps-MVSNet achieves the best completeness among the mainstream reconstruction methods compared, and clearly improves accuracy and overall score. Compared with the baseline MVSNet, the model improves accuracy, overall score and completeness by 3.3%, 4.9% and 8.2% respectively, while reducing the parameter count by 3.3%.
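The abstract credits group convolution, among other techniques, for FENet-T's efficiency. As an illustrative sketch only (not the paper's code), the parameter accounting below shows why splitting channels into independent groups shrinks a convolution layer's weight count by the group factor; the layer sizes are hypothetical:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k 2-D convolution layer (bias ignored).

    Each of the `groups` groups maps c_in/groups input channels to
    c_out/groups output channels, so the total shrinks by `groups`.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

standard = conv_params(64, 128, 3)            # 73728 weights
grouped = conv_params(64, 128, 3, groups=4)   # 18432 weights, 4x fewer
```

The trade-off is that groups do not exchange information with each other, which is why grouped designs typically interleave pointwise or channel-shuffling layers.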
Abstract:
By exploring the influence of deep neural networks on reconstruction quality, this paper proposes Caps-MVSNet, a multi-view 3D reconstruction model based on a capsule convolutional network. Caps-MVSNet comprises five stages: feature extraction, cost volume construction, cost volume regularization, depth map regression and depth map refinement. The paper focuses on the FENet-T feature extraction network and the 3D-CapsCNN network, which are used in the feature extraction and cost volume regularization stages of the model, respectively. FENet-T uses an efficient block count ratio, large-scale dilated convolutions and group convolutions to improve the network's feature extraction efficiency. 3D-CapsCNN regularizes the cost volume with 3D capsule networks, which offer stronger spatial representation than convolutional neural networks. Caps-MVSNet was evaluated on the DTU dataset. The results show that, compared with previous mainstream reconstruction methods (Colmap, Tola, Camp, Gipuma, Furu, SurfaceNet), the proposed model achieves the best completeness and significantly improves accuracy and overall score. Furthermore, compared with the MVSNet baseline, accuracy, overall score and completeness improve by 3.3%, 4.9% and 8.2% respectively, and the number of parameters is reduced by 3.3%.
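The other efficiency ingredient named above, dilated convolution, enlarges the receptive field without adding weights. A minimal 1-D sketch in pure Python (illustrative only; the paper applies 2-D dilated convolutions inside FENet-T) makes the mechanism concrete:

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1-D valid convolution with a dilation factor.

    With dilation d, a kernel of size k covers a receptive field of
    d * (k - 1) + 1 input samples while keeping only k parameters.
    """
    k = len(kernel)
    span = dilation * (k - 1) + 1  # effective receptive field
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for j in range(k):
            acc += kernel[j] * signal[start + j * dilation]
        out.append(acc)
    return out

# A 3-tap sum kernel with dilation 2 sees 5 samples per output:
# each output is x[i] + x[i+2] + x[i+4].
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)  # [9.0, 12.0]
```

Stacking layers with increasing dilation rates grows the receptive field exponentially, which is the usual motivation for "large-scale" dilation in feature extractors.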

参考文献/References:

[1]SEITZ S M,CURLESS B,DIEBEL J,et al. A comparison and evaluation of multi-view stereo reconstruction algorithms[C]//2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York,NY,USA:IEEE,2006,1:519-528.
[2]HARTLEY R,ZISSERMAN A. Multiple view geometry in computer vision[M]. London:Cambridge University Press,2003.
[3]HAN W J,SUN X H,JI G L,et al. Multispectral and panchromatic remote sensing image fusion algorithm based on convolutional neural network[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(3):123-130. (in Chinese)
[4]YANG H J,WANG R P,WANG Z Y,et al. 3D phenotype reconstruction of crop fruits based on multi-view images[J]. Journal of Nanjing Normal University(Natural Science Edition),2021,44(2):92-103. (in Chinese)
[5]ZHANG T A,YUN T,XUE L F,et al. 3D reconstruction algorithm of tree trunks and main branches based on skeleton extraction[J]. Journal of Nanjing Normal University(Engineering and Technology Edition),2014,14(4):51-57. (in Chinese)
[6]WU Z,SONG S,KHOSLA A,et al. 3D shapeNets:A deep representation for volumetric shapes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston,Massachusetts,USA,2015:1912-1920.
[7]TULSIANI S,ZHOU T,EFROS A A,et al. Multi-view supervision for single-view reconstruction via differentiable ray consistency[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,Hawaii,USA,2017:2626-2634.
[8]LI K,PHAM T,ZHAN H,et al. Efficient dense point cloud object reconstruction using deformation vector fields[C]//Proceedings of the European Conference on Computer Vision. Munich,Germany,2018:497-513.
[9]SMITH E,FUJIMOTO S,MEGER D. Multi-view silhouette and depth decomposition for high resolution 3D object representation[J]. Advances in Neural Information Processing Systems,2018,31.
[10]ZHU Q,MIN C,WEI Z,et al. Deep learning for multi-view stereo via plane sweep:a survey[J]. arXiv Preprint arXiv:2106.15328,2021.
[11]YAO Y,LUO Z,LI S,et al. Mvsnet:Depth inference for unstructured multi-view stereo[C]//Proceedings of the European Conference on Computer Vision(ECCV). Munich,Germany,2018.
[12]YU Z,GAO S. Fast-Mvsnet:Sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[13]SABOUR S,FROSST N,HINTON G E. Dynamic routing between capsules[C]//31st Conference on Advances in Neural Information Processing Systems. Long Beach,CA,USA,2017,30.
[14]HINTON G E,SABOUR S,FROSST N. Matrix capsules with EM routing[C]//International conference on learning representations. Vancouver,Canada,2018.
[15]SCHONBERGER J L,FRAHM J M. Structure-from-motion revisited[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,Nevada,USA,2016.
[16]BARNES C,SHECHTMAN E,FINKELSTEIN A,et al. PatchMatch:a randomized correspondence algorithm for structural image editing[J]. ACM Transactions on Graphics,2009,28(3):24.
[17]ULLMAN S. The interpretation of structure from motion[J]. Proceedings of the Royal Society of London. Series B. Biological Sciences,1979,203(1153):405-426.
[18]MOULON P,MONASSE P,PERROT R,et al. Openmvg:open multiple view geometry[C]//International Workshop on Reproducible Research in Pattern Recognition. Cancún,Mexico,2016:60-74.
[19]GOESELE M,SNAVELY N,CURLESS B,et al. Multi-view stereo for community photo collections[C]//2007 IEEE 11th International Conference on Computer Vision. Rio de Janeiro,Brazil,2007.
[20]HAN X,LEUNG T,JIA Y,et al. Matchnet:Unifying feature and metric learning for patch-based matching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston,MA,USA,2015.
[21]KENDALL A,MARTIROSYAN H,DASGUPTA S,et al. End-to-end learning of geometry and context for deep stereo regression[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice,Italy,2017.
[22]CLARK R,WANG S,WEN H,et al. Vinet:Visual-inertial odometry as a sequence-to-sequence learning problem[C]//Proceedings of the AAAI Conference on Artificial Intelligence. San Francisco,USA,2017.
[23]JI M,GALL J,ZHENG H,et al. Surfacenet:An end-to-end 3D neural network for multiview stereopsis[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice,Italy,2017.
[24]HUANG P H,MATZEN K,KOPF J,et al. Deepmvs:Learning multi-view stereopsis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA,2018.
[25]YAO Y,LUO Z,LI S,et al. Recurrent mvsnet for high-resolution multi-view stereo depth inference[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA,2019.
[26]XUE Y,CHEN J,WAN W,et al. Mvscrf:Learning multi-view stereo with conditional random fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul,South Korea,2019.
[27]GU X,FAN Z,ZHU S,et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[28]YANG J,MAO W,ALVAREZ J M,et al. Cost volume pyramid based depth inference for multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[29]CHEN R,HAN S,XU J,et al. Point-based multi-view stereo network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul,South Korea,2019.
[30]TARG S,ALMEIDA D,LYMAN K. Resnet in resnet:Generalizing residual architectures[J]. arXiv Preprint arXiv:1603.08029,2016.
[31]SIMONYAN K,ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv Preprint arXiv:1409.1556,2014.
[32]RONNEBERGER O,FISCHER P,BROX T. U-Net:Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich,Germany,2015.
[33]LIU Z,MAO H,WU C Y,et al. A ConvNet for the 2020s[J]. arXiv Preprint arXiv:2201.03545,2022.
[34]LIU Z,LIN Y,CAO Y,et al. Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal,Canada,2021.
[35]YAN J F,WEI Z Z,YI H W,et al. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking[C]//European Conference on Computer Vision. Springer,Cham,2020:674-689.
[36]CHENG S,XU Z,ZHU S,et al. Deep stereo using adaptive thin volume representation with uncertainty awareness[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[37]TRAN M,VO-HO V K,LE N T H. 3DConvCaps:3DUnet with convolutional capsule encoder for medical image segmentation[J]. arXiv Preprint arXiv:2205.09299,2022.
[38]ZHAO Y,BIRDAL T,DENG H,et al. 3D point capsule networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA,2019.
[39]WANG X,GIRSHICK R,GUPTA A,et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA,2018.
[40]HOWARD A G,ZHU M L,CHEN B,et al. MobileNets:Efficient convolutional neural networks for mobile vision applications[J]. arXiv Preprint arXiv:1704.04861,2017.
[41]GALLUP D,FRAHM J M,MORDOHAI P,et al. Real-time plane-sweeping stereo with multiple sweeping directions[C]//2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis,Minnesota,USA,2007.
[42]XU N,PRICE B,COHEN S,et al. Deep image matting[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA,2017.
[43]JENSEN R,DAHL A,VOGIATZIS G,et al. Large scale multi-view stereopsis evaluation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus,OH,USA,2014.
[44]TOLA E,STRECHA C,FUA P. Efficient large-scale multi-view stereo for ultra high-resolution image sets[J]. Machine Vision and Applications,2012,23(5):903-920.
[45]CAMPBELL N D F,VOGIATZIS G,HERNÁNDEZ C,et al. Using multiple hypotheses to improve depth-maps for multi-view stereo[C]//European Conference on Computer Vision. Marseille,France,2008:766-779.
[46]GALLIANI S,LASINGER K,SCHINDLER K. Gipuma:Massively parallel multi-view stereo reconstruction[J]. Publikationen der Deutschen Gesellschaft für Photogrammetrie,Fernerkundung und Geoinformation,2016,25:361-369.
[47]FURUKAWA Y,PONCE J. Accurate,dense,and robust multiview stereopsis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2009,32(8):1362-1376.

Memo

Received: 2022-09-15.
Funding: Jiangsu Province Postgraduate Practice Innovation Program (SJCX22-1676).
Corresponding author: Hu Ronglin, Ph.D., associate professor; research interest: human-computer interaction. E-mail: huronglin@hyit.edu.cn
Last Update: 2023-03-15