A comparative analysis of CodeBERT and CodeLlama models: Architecture, functionality and application in software coding tasks

Oleksandr Deineha; Olena Arshava; Iryna Zhovtonizhko

Journal: Vol. 30, No. 4, 2025

Pages: 128–142

DOI: https://doi.org/10.62660/bcstu/4.2025.128

Received 30.05.2025

Revised 19.10.2025

Accepted 15.12.2025

Abstract

The relevance of this study stems from the need to compare the large language models CodeBERT and CodeLlama, which are widely used to automate code generation and analysis in order to improve the efficiency and quality of software. The aim of the study was a comprehensive comparison of the architectural and functional characteristics of CodeBERT and CodeLlama. Interpretive, comparative, systemic and structural-categorical analyses were used to examine the models' architectures, tasks and relevance. The models were compared across key parameters: model architecture (the RoBERTa encoder architecture in CodeBERT versus the Llama 2 decoder architecture in CodeLlama), the scale and sources of training data, the range of supported tasks, performance on reference benchmarks, strengths and limitations, typical application areas, and availability and licensing terms. The results showed that differences in architecture and training data substantially affect the models' effectiveness on different task types and determine their practical capabilities and limitations. Particular attention was paid to deploying the models in practical scenarios, taking hardware resources and licensing policy into account. CodeLlama was found to require considerably more computational resources to run effectively, whereas CodeBERT is easier to deploy on standard hardware. The licensing terms of CodeLlama were also found to be more restrictive, which may complicate its use in commercial projects, unlike CodeBERT with its open licence. It was concluded that the models serve largely complementary functions: CodeBERT is an effective tool for code understanding tasks, whereas CodeLlama shows strong results in generation tasks. The conclusions outline the challenges and prospects for next-generation models with multitask and multimodal capabilities. The practical value of the study lies in helping developers and researchers choose the optimal tool with technical and licensing aspects in mind.
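The architectural contrast named in the abstract (an encoder in CodeBERT versus a decoder in CodeLlama) can be illustrated with the attention masks each family typically uses. The following is a minimal sketch and not code from either model: the function names `bidirectional_mask`, `causal_mask` and `visible_context` are hypothetical and introduced only for illustration, and the token list is a made-up example. It assumes the standard convention that an encoder lets every token attend to the whole sequence, while a decoder lets a token attend only to itself and earlier positions.

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every token may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: token i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def visible_context(tokens: List[str], mask: List[List[int]], position: int) -> List[str]:
    """Return the tokens visible from `position` under the given mask."""
    return [tok for tok, allowed in zip(tokens, mask[position]) if allowed]


if __name__ == "__main__":
    # Hypothetical token sequence for a small function header.
    tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
    pos = 3  # the token "a"

    # Encoder view: the full sequence, left and right context.
    print(visible_context(tokens, bidirectional_mask(len(tokens)), pos))
    # Decoder view: only the prefix up to and including position 3.
    print(visible_context(tokens, causal_mask(len(tokens)), pos))
```

Under these assumptions, access to both left and right context suits representation tasks such as code search or classification, while the prefix-only view is what allows token-by-token generation. This is consistent with the complementary roles described in the abstract.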

Keywords

large language models; transformer architecture; decoder architecture; system and functional analysis; optimisation model; statistical analysis; natural language processing

References

Bai, X., Huang, S., Wei, C., & Wang, R. (2025). Collaboration between intelligent agents and large language models: A novel approach for enhancing code generation capability. Expert Systems with Applications, 269, article number 126357. doi: 10.1016/j.eswa.2024.126357.
Bhandari, G., Gavric, N., & Shalaginov, A. (2025). Generating vulnerability security fixes with code language models. Information and Software Technology, 185, article number 107786. doi: 10.1016/j.infsof.2025.107786.
Budzynskyi, O.V. (2025). Method of detecting vulnerabilities and automated response in corporate database protection systems. Modern Information Security, 2(62), 180-186. doi: 10.31673/2409-7292.2025.029259.
Çaylı, O. (2024). AI-enhanced cybersecurity vulnerability-based prevention, defense, and mitigation using generative AI. Orclever Proceedings of Research and Development, 5(1), 655-667. doi: 10.56038/oprd.v5i1.616.
d’Aloisio, G., Traini, L., Sarro, F., & Di Marco, A. (2025). On the compression of language models for code: An empirical study on codeBERT. In 2025 IEEE international conference on software analysis, evolution and reengineering (SANER) (pp. 12-23). Montreal: IEEE. doi: 10.1109/SANER64311.2025.00010.
de-Fitero-Dominguez, D., Garcia-Lopez, E., Garcia-Cabot, A., & Martinez-Herraiz, J.-J. (2024). Enhanced automated code vulnerability repair using large language models. Engineering Applications of Artificial Intelligence, 138(A), article number 109291. doi: 10.1016/j.engappai.2024.109291.
Deineha, O., Donets, V., & Zholtkevych, G. (2024). The approach development of data extraction from lambda terms. Eastern-European Journal of Enterprise Technologies, 3(2(129)), 42-54. doi: 10.15587/1729-4061.2024.298991.
Gain, B., Bandyopadhyay, D., Mukherjee, S., Sahoo, A., Dana, S., Kodeswaran, P., Sen, S., Ekbal, A., & Garg, D. (2025). Transforming code understanding: Clustering-based retrieval for improved summarization in domain specific languages. In O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B. Di Eugenio, S. Schockaert, K. Darwish & A. Agarwal (Eds.), Proceedings of the 31st international conference on computational linguistics: Industry track (pp. 546-560). Abu Dhabi: Association for Computational Linguistics.
Gao, Z., Wang, H., Wang, Y., & Zhang, C. (2024). Virtual compiler is all you need for assembly code search. In Proceedings of the 62nd annual meeting of the association for computational linguistics (pp. 3040-3051). Bangkok: Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long.167.
Ghaemi, H., Alizadehsani, Z., Shahraki, A., & Corchado, J.M. (2024). Transformers in source code generation: A comprehensive survey. Journal of Systems Architecture, 153, article number 103193. doi: 10.1016/j.sysarc.2024.103193.
Gurjar, A., Camp, L.J., Ringenberg, T., Ma, X., & Chaora, A. (2023). Can large language models detect PII in code? SSRN. doi: 10.2139/ssrn.4619112.
Hodovychenko, M.A., & Kurinko, D.D. (2025). Analysis of existing approaches to automated refactoring of object-oriented software systems. Herald of Advanced Information Technology, 8(2), 179-196. doi: 10.15276/hait.08.2025.11.
Huang, K., Zhang, J., Bao, X., Wang, X., & Liu, Y. (2025). Comprehensive fine-tuning large language models of code for automated program repair. IEEE Transactions on Software Engineering, 51(4), 904-928. doi: 10.1109/tse.2025.3532759.
Li, J., Tao, C., Li, J., Li, G., Jin, Z., Zhang, H., Fang, Z., & Liu, F. (2023). Large language model-aware in-context learning for code generation. ACM Transactions on Software Engineering and Methodology, 34(7), article number 190. doi: 10.1145/3715908.
Luo, W., Keung, J., Yang, B., Ye, H., Goues, C.L., Bissyandé, T.F., Tian, H., & Le, X.B.D. (2025). When fine-tuning LLMs meets data privacy: An empirical study of federated learning in LLM-based program repair. ACM Transactions on Software Engineering and Methodology. doi: 10.1145/3733599.
Mohamed, K., Yousef, M., Medhat, W., Mohamed, E.H., Khoriba, G., & Arafa, T. (2024). Hands-on analysis of using large language models for the auto evaluation of programming assignments. Information Systems, 128, article number 102473. doi: 10.1016/j.is.2024.102473.
Panebianco, F., Isgro, A., Longari, S., Zanero, S., & Carminati, M. (2025). Guessing as a service: Large language models are not yet ready for vulnerability detection. In Proceedings of the joint national conference on cybersecurity (ITASEC & SERICS 2025) (pp. 1-17). Bologna: Security and Rights in CyberSpace Foundation.
Qin, Z., Wu, Y., & Han, L. (2025). CLNX: Bridging code and natural language for C/C++ vulnerability-contributing commits identification. In Proceedings of the AAAI conference on artificial intelligence (pp. 25047-25055). Washington: AAAI Press. doi: 10.1609/aaai.v39i23.34689.
Raihan, N., Newman, C., & Zampieri, M. (2024). Code LLMs: A taxonomy-based survey. In 2024 IEEE international conference on Big Data (BigData) (pp. 5402-5411). Washington: IEEE. doi: 10.1109/BigData62323.2024.10826108.
Saberi, I., Esmaeili, A., Fard, F., & Chen, F. (2025). AdvFusion: Adapter-based knowledge transfer for code summarization on code language models. In 2025 IEEE international conference on software analysis, evolution and reengineering (SANER) (pp. 563-574). Montreal: IEEE. doi: 10.1109/SANER64311.2025.00059.
Shi, J., Yang, Z., Kang, H.J., Xu, B., He, J., & Lo, D. (2024). Greening large language models of code. In ICSE-SEIS’24: Proceedings of the 46th international conference on software engineering: Software engineering in society (pp. 142-153). New York: Association for Computing Machinery. doi: 10.1145/3639475.3640097.
Shimmi, S., Rahman, A., Gadde, M., Okhravi, H., & Rahimi, M. (2024). VulSim: Leveraging similarity of multi-dimensional neighbor embeddings for vulnerability detection. In 33rd USENIX security symposium (USENIX Security 24) (pp. 1777-1794). Philadelphia: Curran Associates, Inc.
Siavvas, M., Kalouptsoglou, I., Gelenbe, E., Kehagias, D., & Tzovaras, D. (2024). Transforming the field of vulnerability prediction: Are large language models the key? In 2024 32nd international conference on modeling, analysis and simulation of computer and telecommunication systems (MASCOTS) (pp. 1-6). Krakow: IEEE. doi: 10.1109/MASCOTS64422.2024.10786575.
Singhal, A., Ghosh, R., Mundra, R., Dadlani, H., & Dutta, D. (2025). Code2JSON: Can a zero-shot LLM extract code features for code RAG? In ICLR 2025 third workshop on deep learning for code (pp. 1-23). Singapore: International Conference on Learning Representations.
Su, Z., Xu, X., Huang, Z., Zhang, Z., Ye, Y., Huang, J., & Zhang, X. (2024). Codeart: Better code models by attention regularization when symbols are lacking. Proceedings of the ACM on Software Engineering, 1, 562-585. doi: 10.1145/3643752.
Taghavi Far, S.M., & Feyzi, F. (2025). Large language models for software vulnerability detection: A guide for researchers on models, methods, techniques, datasets, and metrics. International Journal of Information Security, 24, article number 78. doi: 10.1007/s10207-025-00992-7.
Tehrani, A., Bhattacharjee, A., Chen, L., Ahmed, N.K., Yazdanbakhsh, A., & Jannesari, A. (2024). CodeRosetta: Pushing the boundaries of unsupervised code translation for parallel programming. In 38th conference on neural information processing systems (pp. 100965-100999). Vancouver: Neural Information Processing Systems Foundation, Inc.
Weyssow, M. (2024). Aligning language models to code: Exploring efficient, temporal, and preference alignment for code generation. (PhD dissertation, University of Montreal, Montreal, Canada).
Xiang, B., & Shao, Y. (2024). SUMLLAMA: Efficient contrastive representations and fine-tuned adapters for bug report summarization. IEEE Access, 12, 78562-78571. doi: 10.1109/access.2024.3397326.
Yong, C., Defeng, H., Chao, X., Nannan, C., & Jianbo, L. (2025). Smart contract generation model based on code annotation and AST-LSTM tuning. Journal of Supercomputing, 81, article number 731. doi: 10.1007/s11227-025-07186-x.
Zhang, Y., Kang, H., & Wang, Q. (2025). MMFDetect: Webshell evasion detect method based on multimodal feature fusion. Electronics, 14(3), article number 416. doi: 10.3390/electronics14030416.
Zheng, Z., Ning, K., Wang, Y., Zhang, J., Zheng, D., Ye, M., & Chen, J. (2023). A survey of large language models for code: Evolution, benchmarking, and future trends. ArXiv. doi: 10.48550/arXiv.2311.10372.
Zhong, M., Lyu, F., Wang, L., Geng, H., Qiu, L., Cui, H., & Feng, X. (2024). ComBack: A versatile dataset for enhancing compiler backend development efficiency. In 38th conference on neural information processing systems (pp. 112310-112328). Vancouver: Neural Information Processing Systems Foundation, Inc.
Zhou, Z., Li, M., Yu, H., Fan, G., Yang, P., & Huang, Z. (2024). Learning to generate structured code summaries from hybrid code context. IEEE Transactions on Software Engineering, 50(10), 2512-2528. doi: 10.1109/tse.2024.3439562.

Cite

Deineha, O., Arshava, O., & Zhovtonizhko, I. (2025). A comparative analysis of CodeBERT and CodeLlama models: Architecture, functionality and application in software coding tasks. Bulletin of Cherkasy State Technological University, 30(4), 128-142. https://doi.org/10.62660/bcstu/4.2025.128
