Document Type
Article
Publication Date
8-28-2025
Abstract
Recent advances in large language models and retrieval-augmented generation, a method that enhances language models by integrating retrieved external documents, have created opportunities to deploy AI in secure, offline environments. This study explores the feasibility of using locally hosted, open-weight large language models with integrated retrieval-augmented generation capabilities on CPU-only hardware for tasks such as question answering and summarization. The evaluation reflects typical constraints in environments like government offices, where internet access and GPU acceleration may be restricted. Four models were tested using LocalGPT, a privacy-focused retrieval-augmented generation framework, on two consumer-grade systems: a laptop and a workstation. A technical project management textbook served as the source material. Performance was assessed using BERTScore and METEOR metrics, along with latency and response timing. All models demonstrated strong performance in direct question answering, providing accurate responses despite limited computational resources. However, summarization tasks showed greater variability, with models sometimes producing vague or incomplete outputs. The analysis also showed that quantization and hardware differences affected response time more than output quality, a tradeoff that should be considered in potential use cases. This study does not aim to rank models but instead highlights practical considerations in deploying large language models locally. The findings suggest that secure, CPU-only deployments are viable for structured tasks like factual retrieval, although limitations remain for more generative applications such as summarization. This feasibility-focused evaluation provides guidance for organizations seeking to use local large language models under privacy and resource constraints and lays the groundwork for future research in secure, offline AI systems.
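The abstract describes CPU-only inference with quantized open-weight models, evaluated with BERTScore and METEOR. The following is a minimal Python sketch of how such a run might look, not the authors' pipeline: the GGUF model file, thread count, prompt, and reference answer are illustrative assumptions, and LocalGPT's retrieval step is omitted.

# A minimal sketch, assuming llama-cpp-python, bert-score, and nltk are installed.
# The model file name, prompt, reference text, and n_threads are placeholders.
from llama_cpp import Llama
from bert_score import score as bert_score
from nltk.translate.meteor_score import meteor_score
import nltk

nltk.download("wordnet", quiet=True)  # METEOR relies on WordNet for synonym matching

# Load a 4-bit quantized model; llama.cpp runs on the CPU by default (no GPU needed).
llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096, n_threads=8)

question = "What is a work breakdown structure?"
reference = "A work breakdown structure decomposes the project scope into deliverables."

out = llm(f"Q: {question}\nA:", max_tokens=128, stop=["Q:"])
answer = out["choices"][0]["text"].strip()

# BERTScore: contextual-embedding similarity (precision, recall, F1 tensors).
_, _, f1 = bert_score([answer], [reference], lang="en")
# METEOR: unigram matching with stemming and synonymy; expects tokenized inputs.
meteor = meteor_score([reference.split()], answer.split())

print(f"BERTScore F1: {f1.item():.3f}  METEOR: {meteor:.3f}")

In a setup like this, the quantization level (e.g., 4-bit vs. 8-bit) and the thread count mainly shift response time rather than answer quality, which is the tradeoff the abstract highlights.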
Source Publication
Information (eISSN 2078-2489)
Recommended Citation
Tyndall, E.; Wagner, T.; Gayheart, C.; Some, A.; Langhals, B. Feasibility Evaluation of Secure Offline Large Language Models with Retrieval-Augmented Generation for CPU-Only Inference. Information 2025, 16, 744. https://doi.org/10.3390/info16090744
Comments
© 2025 by the authors. Licensee MDPI, Basel, Switzerland.
This article is published by MDPI, licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Sourced from the published version of record cited above.