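Intelligent Parsing of Engineering CAD Drawings Driven by Multimodal Large Models: Technical Framework, Key Methods, and Research Progress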

ISSN:3041-0630(P)

EISSN:3041-0606(O)

Language: Chinese

Author
Jiao Yankai (焦彦凯)
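Abstract
Traditional interpretation of engineering CAD drawings relies on manual reading, which limits both efficiency and accuracy. This paper reviews research progress, in China and internationally, on multimodal large models for CAD drawing parsing and proposes a four-layer technical framework: visual perception, primitive recognition, semantic reasoning, and knowledge alignment. It discusses methods for primitive detection, GD&T annotation parsing, and layer relationship inference. Across three application scenarios (architecture, mechanical engineering, and electric power), it compares the methods on metrics such as annotation extraction accuracy and semantic consistency. The results indicate that large-model methods outperform traditional methods in symbol recognition and semantic association, while the scarcity of domain data and the trustworthiness of model reasoning remain open problems.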
Keywords
Multimodal large models; vision-language models; engineering CAD drawings; intelligent parsing; retrieval-augmented generation