Document Type
Article
Publication Date
8-28-2025
Abstract
Recent advances in large language models and retrieval-augmented generation, a method that enhances language models by integrating retrieved external documents, have created opportunities to deploy AI in secure, offline environments. This study explores the feasibility of using locally hosted, open-weight large language models with integrated retrieval-augmented generation capabilities on CPU-only hardware for tasks such as question answering and summarization. The evaluation reflects typical constraints in environments like government offices, where internet access and GPU acceleration may be restricted. Four models were tested using LocalGPT, a privacy-focused retrieval-augmented generation framework, on two consumer-grade systems: a laptop and a workstation. A technical project management textbook served as the source material. Performance was assessed using BERTScore and METEOR metrics, along with latency and response timing. All models demonstrated strong performance in direct question answering, providing accurate responses despite limited computational resources. However, summarization tasks showed greater variability, with models sometimes producing vague or incomplete outputs. The analysis also showed that quantization and hardware differences affected response time more than output quality, a tradeoff that should be considered in potential use cases. This study does not aim to rank models but instead highlights practical considerations in deploying large language models locally. The findings suggest that secure, CPU-only deployments are viable for structured tasks like factual retrieval, although limitations remain for more generative applications such as summarization. This feasibility-focused evaluation provides guidance for organizations seeking to use local large language models under privacy and resource constraints and lays the groundwork for future research in secure, offline AI systems.
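The abstract describes CPU-only inference with quantized open-weight models, evaluated with BERTScore and METEOR. The following is a minimal Python sketch of how such a run might look, not the authors' pipeline: the GGUF model file, thread count, prompt, and reference answer are illustrative assumptions, and LocalGPT's retrieval step is omitted.

# A minimal sketch, assuming llama-cpp-python, bert-score, and nltk are installed.
# The model file name, prompt, reference text, and n_threads are placeholders.
from llama_cpp import Llama
from bert_score import score as bert_score
from nltk.translate.meteor_score import meteor_score
import nltk

nltk.download("wordnet", quiet=True)  # METEOR relies on WordNet for synonym matching

# Load a 4-bit quantized model; llama.cpp runs on the CPU by default (no GPU needed).
llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096, n_threads=8)

question = "What is a work breakdown structure?"
reference = "A work breakdown structure decomposes the project scope into deliverables."

out = llm(f"Q: {question}\nA:", max_tokens=128, stop=["Q:"])
answer = out["choices"][0]["text"].strip()

# BERTScore: contextual-embedding similarity (precision, recall, F1 tensors).
_, _, f1 = bert_score([answer], [reference], lang="en")
# METEOR: unigram matching with stemming and synonymy; expects tokenized inputs.
meteor = meteor_score([reference.split()], answer.split())

print(f"BERTScore F1: {f1.item():.3f}  METEOR: {meteor:.3f}")

In a setup like this, the quantization level (e.g., 4-bit vs. 8-bit) and the thread count mainly shift response time rather than answer quality, which is the tradeoff the abstract highlights.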
Source Publication
Information (eISSN 2078-2489)
Recommended Citation
Tyndall, E.; Wagner, T.; Gayheart, C.; Some, A.; Langhals, B. Feasibility Evaluation of Secure Offline Large Language Models with Retrieval-Augmented Generation for CPU-Only Inference. Information 2025, 16, 744. https://doi.org/10.3390/info16090744
Comments
© 2025 by the authors. Licensee MDPI, Basel, Switzerland.
This article is published by MDPI, licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Sourced from the published version of record cited above.