Continuous feedback loops: Online fine-tuning of LLMs with user signals
Abstract
The rapid growth in the use of real-time language models requires mechanisms for dynamically adapting them to changes in queries, terminology, and user expectations. The study aimed to investigate approaches to continuous, feedback-based retraining of large language models. To this end, it applied theoretical and structural-functional modelling of the adaptation architecture, an experimental implementation of the language model retraining cycle that processes and classifies different types of feedback, and a quantitative evaluation of the results using automatic and user metrics. The results demonstrated the effectiveness of a continuous online learning architecture that keeps the language model relevant and stable in real time. The study found that implicit feedback is 4-10 times more common than explicit feedback, but that explicit feedback yields a larger gain in answer accuracy. The proposed system successfully integrated different types of user signals, providing dynamic generation of training examples and hybrid retraining while maintaining the quality and consistency of the results. The Python software cycle for adaptive retraining of the language model processed and filtered user signals to form a high-quality buffer of training pairs. After 500 retraining steps on 52,912 query-response pairs, the model improved: the loss decreased from 3.82 to 3.15, and the fine-tuning process remained stable with no signs of overfitting.
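The step of processing and filtering user signals into a buffer of training pairs can be illustrated with a minimal sketch. It is not the study's implementation: the class and function names, the weights for explicit and implicit feedback, and the acceptance threshold are all hypothetical values chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass

# All names, weights, and thresholds below are illustrative assumptions,
# not values taken from the study.
EXPLICIT_WEIGHT = 1.0  # e.g. ratings, thumbs up/down
IMPLICIT_WEIGHT = 0.3  # e.g. copy events, rephrased queries
MIN_SCORE = 0.25       # signals scoring below this are discarded


@dataclass(frozen=True)
class FeedbackEvent:
    query: str
    response: str
    kind: str      # "explicit" or "implicit"
    signal: float  # raw signal in [-1.0, 1.0]; positive means approval


def weighted_score(event: FeedbackEvent) -> float:
    """Scale the raw signal by how much its feedback type is trusted."""
    weight = EXPLICIT_WEIGHT if event.kind == "explicit" else IMPLICIT_WEIGHT
    return weight * event.signal


class TrainingBuffer:
    """Bounded buffer of (query, response) pairs that passed filtering."""

    def __init__(self, max_size: int = 1000) -> None:
        self._pairs = deque(maxlen=max_size)  # oldest pairs are evicted first
        self._seen = set()                    # normalized keys for deduplication

    def add(self, event: FeedbackEvent) -> bool:
        """Return True if the event was accepted as a training pair."""
        key = (event.query.strip().lower(), event.response.strip().lower())
        if not key[0] or not key[1]:
            return False  # empty query or response
        if key in self._seen or weighted_score(event) < MIN_SCORE:
            return False  # duplicate or too weak a signal
        self._seen.add(key)
        self._pairs.append((event.query, event.response))
        return True

    def pairs(self) -> list:
        return list(self._pairs)
```

With these assumed weights, a strongly positive implicit signal (0.3 × 1.0) is accepted, while a moderate one (0.3 × 0.5) is dropped. This is one simple way to let the more frequent implicit feedback contribute without outweighing explicit feedback.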
The results of retraining showed a moderate improvement in answer quality after adaptation: lexical similarity measured by Recall-Oriented Understudy for Gisting Evaluation (ROUGE) was 0.102, n-gram precision measured by Bilingual Evaluation Understudy (BLEU) was 0.006, and subjective user satisfaction increased to 0.24, while the model remained stable, with an average cosine similarity of 0.396. The proposed approach improves the quality and relevance of real-time language model responses while maintaining their stability, and can be used in production systems to improve user experience.
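For orientation, two of the metric families named above can be sketched in plain Python. These are simplified variants under stated assumptions: whitespace tokenization, ROUGE-1 recall only, and cosine similarity over word-count vectors. The study may have used different ROUGE variants and embedding-based cosine similarity, and the function names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def rouge1_recall(reference: str, candidate: str) -> float:
    """Share of reference unigrams that also appear in the candidate."""
    ref = Counter(tokenize(reference))
    cand = Counter(tokenize(candidate))
    if not ref:
        return 0.0
    overlap = sum((ref & cand).values())  # clipped unigram matches
    return overlap / sum(ref.values())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

One plausible use, stated here as an assumption rather than the study's procedure, is to compute `rouge1_recall` against reference answers for quality and `cosine_similarity` between responses generated before and after adaptation as a stability check.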
Keywords
adaptive relearning; generative transformers; dynamic model adaptation; Python implementation; hybrid learning; quality assessment metrics; language model stability