LLM Style Guide Editor for Technical Writing

Harshil Bhakta, Computer Science, Texas Tech University, Lubbock, TX, USA (harbhakt@ttu.edu)
Kaleb McFadden, Computer Science, Texas Tech University, Lubbock, TX, USA (kaleb.mcfadden@ttu.edu)
Grace Colbert, Computer Science, Texas Tech University, Lubbock, TX, USA (gcolbert@ttu.edu)
Om Patel, Computer Science, Texas Tech University, Lubbock, TX, USA (om.patel@ttu.edu)
Rushina Shrestha, Computer Science, Texas Tech University, Lubbock, TX, USA (rushrest@ttu.edu)

Abstract—This project proposes the development of a fine-tuned Large Language Model (LLM) capable of modifying text to adhere to the Simplified Technical English (STE) style guide [1]. The system allows users to input sentences, which are then transformed to meet STE writing standards. The project aims to enhance the efficiency of technical writing by automating style corrections, reducing manual editing time, and ensuring consistency across technical publications. By leveraging natural language processing techniques, our system aims to improve technical documentation efficiency in industries such as aerospace, engineering, and manufacturing.

I. INTRODUCTION AND MOTIVATION

Technical writing is essential for clear and precise documentation across industries, particularly in engineering and aerospace. Companies like Lockheed Martin follow strict style guides such as the STE standard [1] to ensure readability and consistency. Lockheed Martin has produced many complex systems used around the world, such as the Orion spacecraft, the F-16 Fighting Falcon, and many other defense technologies. These systems carry thousands of pages of documentation, from reports to manuals. However, adhering to these guidelines manually is time-consuming and prone to inconsistencies. With the increasing complexity of technical documentation, automation has become a necessity. Manual editing and proofreading require significant human effort, which can introduce subjective inconsistencies and inefficiencies. Furthermore, many organizations face challenges in training employees to adhere strictly to STE rules. Documentation is consumed not only by the engineers themselves but also by others, such as manufacturing teams, maintenance teams, and partners who may or may not be native English speakers. Our system seeks to address these challenges by providing an AI-powered tool that ensures compliance with STE guidelines automatically, thereby improving accuracy, reducing costs, and enhancing productivity [1].

Recent advances in LLMs have demonstrated the capability of AI models to understand and generate human-like text. Previous work has explored text simplification using pre-trained transformers, demonstrating the effectiveness of fine-tuning for clarity and readability improvements [2]. By fine-tuning an existing LLM such as Llama 3.3, we can train it to recognize STE patterns and correct deviations, ensuring that all technical writing meets industry standards with minimal human intervention [1].

II. SOLUTION PROCESS

Our approach involves several key steps to ensure the development of a high-quality LLM-based technical writing assistant:

A. Data Collection and Preprocessing
We will gather a dataset consisting of technical documents that follow STE standards, including publicly available aerospace and engineering documentation. This data will undergo preprocessing such as tokenization, text normalization, and labeling for corrections. Python will be the primary language, using libraries such as Pandas and NumPy for data handling, while NLTK and SpaCy will assist in tokenization, text normalization, and linguistic processing.

B. Model Fine-Tuning
We will fine-tune Llama 3.3 using supervised learning techniques, training it to modify text according to STE rules. We will use Python, PyTorch, and TensorFlow to train Llama 3.3.
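As an illustration of the preprocessing described in Section II-A, the cleaning and sentence-splitting steps can be sketched with the standard library alone (in practice NLTK or SpaCy would handle tokenization; the sample string and function names here are illustrative):

```python
import re

def clean_text(raw: str) -> str:
    """Remove HTML tags and normalize whitespace (a simplified sketch)."""
    text = re.sub(r"<[^>]+>", " ", raw)       # strip HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter; SpaCy/NLTK would replace this in practice."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

raw = "<p>Remove the cover.  Then check the   seal.</p>"
sentences = split_sentences(clean_text(raw))
# sentences == ["Remove the cover.", "Then check the seal."]
```

Each resulting sentence can then be labeled as STE-compliant or non-compliant for supervised fine-tuning.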
Hugging Face Transformers will be used to work with the pre-trained model, while BLEU evaluation tools such as NLTK and SacreBLEU will help assess the model's adherence to STE guidelines. This process involves [3]:
• Using labeled text pairs (non-STE to STE-compliant text)
• Applying reinforcement learning from human feedback (RLHF) for iterative improvement
• Evaluating model performance using BLEU scores and STE rule adherence metrics

C. System Integration
The model will be deployed as an API-based service, allowing integration into existing documentation tools. The backend will be built with FastAPI, Flask, or Django in Python and will communicate with the front end via REST or GraphQL APIs; Docker will be used for containerization to ensure easy deployment across different environments. For scalability, cloud services such as AWS Lambda or Google Cloud Functions will be utilized for serverless hosting. Simultaneously, front-end development will use JavaScript with ReactJS or Next.js for a dynamic and user-friendly interface, enhanced with TypeScript for maintainability and TailwindCSS for styling. WebSockets will be incorporated to enable real-time text correction updates for users. The user interface will provide:
• Real-time text correction suggestions
• Interactive feedback for users
• A review mechanism to allow human validation of AI-generated modifications

D. Validation and Testing
To ensure accuracy and efficiency, we will use Python-based testing frameworks such as pytest and unittest for backend evaluation, while Selenium or Playwright will automate front-end testing. Jupyter Notebooks will be used for exploratory testing and visualization of model outputs. Finally, performance evaluation will be conducted through:
• Benchmark testing against human-edited STE text
• User trials to measure accuracy and usability
• Continuous updates based on user feedback and error analysis

E.
Deployment and Scalability
We will use cloud-based solutions such as AWS, GCP, or Azure for hosting, while Kubernetes will manage container orchestration. Terraform will enable infrastructure automation, and CI/CD pipelines using GitHub Actions or Jenkins will streamline continuous integration and deployment.

III. USE CASES AND APPLICATIONS

This system can be utilized in various industries that require standardized technical documentation, including:
• Aerospace and Defense: Ensuring maintenance and operational manuals meet STE guidelines.
• Engineering and Manufacturing: Assisting engineers in drafting clear and compliant reports.
• Medical and Healthcare: Improving clarity in procedural manuals and regulatory documentation.
• Education and Training: Providing students and new employees with an interactive tool for learning STE guidelines.

IV. FUTURE IMPROVEMENTS AND SCALABILITY

We can focus on expanding the system's capabilities to support multiple languages using tools like SpaCy or OpenAI embeddings. Additionally, integration with popular text editors such as VS Code and Google Docs will enhance usability. While the initial implementation focuses on STE, future versions of the system can incorporate additional capabilities, such as:
• Support for multiple technical writing guidelines beyond STE.
• Improved contextual understanding to handle complex sentence structures.
• Expansion into multiple languages to support international documentation.
• Integration with popular text editors and document management systems.
• Deployment as a cloud-based API for large-scale enterprise adoption.
To enhance performance, future iterations can leverage larger datasets and more advanced AI architectures, such as transformer-based models trained on domain-specific corpora.

V.
ETHICAL CONSIDERATIONS AND LIMITATIONS

While the system offers significant advantages, it also introduces ethical and technical challenges:
• Bias in AI: The model may inherit biases from training data, leading to unintended errors or inconsistencies.
• Over-Reliance on Automation: Users might overly depend on AI-generated corrections without verifying accuracy.
• Data Privacy: Handling proprietary or sensitive technical documents requires robust security measures to prevent data leaks.
• Limitations in Contextual Understanding: The system may struggle with highly technical or ambiguous language that requires human judgment.
• Adaptability: Continuous updates and human oversight are necessary to ensure the model evolves with industry standards.
• Trust and Accountability: Organizations in this space are highly regulated, so an AI-driven system raises potential problems of accountability and trust. The system may over-simplify input text and completely change its meaning.
• User Training: The system may pose a challenge to new employees who are non-technical users. Implementing training modules will ensure all users can use the system with confidence.
• DFARS: DFARS (Defense Federal Acquisition Regulation Supplement) covers the protection of sensitive data. The system must ensure documentation and specifications are protected; web hosting services and model training must not leak sensitive information.
• FAA AC 43-215: This FAA Advisory Circular provides clear guidelines for standardizing documentation. The system must meet its expectations of producing clear and concise instructions.
• FAA 14 CFR Part 43: This regulation governs the maintenance documentation of aircraft. Instructions produced by AI-assisted models cannot introduce ambiguity. Humans must verify any AI-assisted changes to documentation, and the changes must be traceable.
• ASD-STE100: ASD-STE100 is the Simplified Technical English standard.
The entire system is based on this standard, so it must follow the STE rules (vocabulary, grammar, structure) explicitly.
• ITAR: ITAR (International Traffic in Arms Regulations) prohibits sharing covered data outside the U.S. when a system relates to defense or military systems. This system is made for Lockheed Martin, a defense company, so all data must be ITAR-compliant and handled on secure platforms.
• DO-178C: DO-178C (software considerations in airborne systems certification) is required for software used in airborne systems. AI-assisted models that generate outputs must be traceable and have explanations for their outputs to meet these standards.
• AS9100D: AS9100D requires controlled authorship, traceability, and consistency in aerospace documentation. The system must log the changes it makes and manage structured and unstructured information. The system must also indicate whether content is AI-generated or human-generated.

VI. TEAM ROLES AND RESPONSIBILITIES

• Harshil Bhakta: Data Analyst – Collects and preprocesses training data for fine-tuning the model.
• Kaleb McFadden: Backend Engineer – Develops the text processing and validation system.
• Om Patel: Frontend Developer – Designs the user interface for ease of use.
• Grace Colbert: Lead Developer – Focuses on the fine-tuning and implementation of Llama 3.3.
• Rushina Shrestha: Project Manager – Manages project timelines, testing, and documentation.

VII. DATA PREPROCESSING AND METADATA CLEANING

To ensure the quality of model training and evaluation, we developed a preprocessing pipeline that filtered, cleaned, and structured our dataset. This included removing HTML tags, correcting inconsistent punctuation, and splitting compound instructions into atomic sentences based on STE granularity.

A.
Metadata Augmentation
Each input sentence was enriched with metadata such as sentence length, a technical domain tag (e.g., aerospace, electrical), and a complexity score. This metadata was later used for stratified sampling during evaluation and for selectively triggering prompt variants.

VIII. EXPANDED SOLUTION PROCESS

This phase was iterative and required constant syncing between model behavior, UI requirements, and backend validation. Collaboration tools like GitHub Projects and shared Kanban boards helped the team track progress and isolate bugs during integration.

IX. LORA AND QLORA FINE-TUNING SETUP

TABLE I
EXPANDED PROJECT TIMELINE
Week | Task | Lead Member(s) | Tools Used
1 | Dataset collection and exploratory analysis of STE rules | Grace Colbert | Pandas, NLTK
2 | Text preprocessing and labeling for STE compliance | Grace Colbert | SpaCy, JSONL
3 | Base LLaMA model integration and fine-tuning configuration | Harshil Bhakta | Hugging Face, PyTorch
4 | Initial fine-tuning of LLaMA using LoRA + performance testing | Harshil Bhakta, Om Patel | Transformers, LoRA
5 | Backend Flask API implementation + PDF input parsing | Kaleb McFadden | Flask, PyMuPDF
6 | Frontend integration and live model inference connection | Rushina Shrestha | HTML/CSS, JavaScript
7 | System testing, evaluation metric logging, and output visualization | Kaleb McFadden, Grace Colbert | Selenium, Jupyter, BLEU
8 | Final documentation, ethics review, and report writing | Om Patel | Overleaf, GitHub

For efficient model fine-tuning on limited hardware, we employed Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) techniques. These adapters allowed us to update a small subset of model parameters while preserving inference speed and memory efficiency.

A. Training Configuration
We trained the model for 3 epochs on our STE-aligned dataset with the following hyperparameters: learning rate 2e-5, batch size 16, and a warmup ratio of 0.1.
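A training configuration along these lines can be expressed with Hugging Face's peft and transformers libraries. This is a sketch, not our exact training script: only the epoch count, learning rate, batch size, and warmup ratio come from our runs; the model identifier, LoRA rank, and target modules are illustrative assumptions, and exact keyword arguments vary by library version.

```python
# Sketch of a LoRA/QLoRA setup (hypothetical hyperparameters marked below).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",   # assumed checkpoint name
    load_in_4bit=True,           # 4-bit quantization (QLoRA-style)
)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32,                  # hypothetical rank and scaling
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)   # only adapter weights are trainable

args = TrainingArguments(
    output_dir="ste-lora",
    num_train_epochs=3,                   # values below are from our runs
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    warmup_ratio=0.1,
)
```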
All training was done on a single consumer-grade RTX 3060 GPU with 12 GB of VRAM using 4-bit quantization.

X. MODULAR API ENDPOINT DESIGN

The backend was built using Flask and structured with modular endpooints to support scalability and maintainability. Each endpoint is decoupled from the frontend and communicates via JSON payloads.

A. Endpoint Overview
• /submit_text – Accepts user input for simplification.
• /validate – Runs rule compliance checks using custom validation logic.
• /history – Fetches previously generated outputs for a session.

XI. CI/CD AND CLOUD DEPLOYMENT

We containerized our Flask application and deployed it using Docker and Nginx on an AWS EC2 instance. Infrastructure provisioning was automated using Terraform, and GitHub Actions managed CI/CD.

A. Security and Monitoring
Authentication tokens were implemented for API access, and server logs were monitored using CloudWatch. Future deployments could utilize GPU-backed SageMaker endpoints for real-time LLM inference at scale.

XII. CONCLUSION

Our work also builds upon insights from the GPT-4 technical evaluations, which highlight advances in reasoning and text generation [4]. By leveraging Llama 3.3 and NLP techniques, this project aims to develop a practical and efficient solution for improving technical writing adherence to the STE style guide. The system will assist writers in maintaining consistency and quality in their documentation, ultimately enhancing productivity in industries that rely on precise communication. This approach aligns with prior research into optimizing technical writing through preference-based learning strategies within industry settings [5]. Additionally, our system will serve as a valuable training tool, helping new writers understand and adopt STE rules more effectively.

XIII. STAGE 2: SOFTWARE DESIGN SPECIFICATION

Fig. 1. UML Class Diagram

2) Use Case Diagram: The system serves two actors:
• User: Enters raw input and receives edited output.
• Administrator: Fine-tunes the model and validates the generated text.

XIV. SYSTEM DESIGN AND ARCHITECTURE

A. Revised Project Description and Motivation
The LLM Style Guide Editor is a tool designed to transform user input into Simplified Technical English (STE) using a fine-tuned Large Language Model (LLM). Our motivation is to help users generate standardized and readable technical documentation, especially in safety-critical domains like aerospace and engineering. With STE compliance built into the workflow, the tool encourages best practices in clarity, consistency, and comprehension.

B. UML Diagrams and Decomposition
1) Class Diagram: The class diagram breaks down the system into the following core components:
• UserInterface: Handles user input and displays output.
• TextProcessor: Acts as the main coordinator that forwards user input to the LLM and STE Validator.
• LLMModel: Generates STE-compliant output and can be fine-tuned.
• STEValidator: Ensures that generated text adheres to STE guidelines.
• DataHandler: Loads, preprocesses, and stores model-related data.

Fig. 2. Use Case Diagram

3) Sequence Diagram: The sequence diagram outlines the flow of text from the user interface to processing and validation:
1) User enters input via the interface.
2) TextProcessor sends it to the LLMModel.
3) LLMModel returns output, which is validated by STEValidator.
4) Depending on the result, either valid text is returned or it is corrected and reprocessed.

Fig. 5. System Architecture Overview

D. Implementation Plan
• Programming Language: Python
• Tools: Google Colab, Hugging Face Transformers, PyTorch, Unsloth
• Training Framework: LoRA applied to the LLaMA 3.1 8B model
• Dataset Format: JSONL (instruction/output format). The use of instruction-style datasets leverages the few-shot learning capabilities of modern LLMs [6].
• Testing: Manual output inspection and STE rule compliance checks
• Deployment: Prototype hosted on Colab and Hugging Face
Foundational transformer architectures like BERT provided the backbone for many instruction-tuned models, including the ones we explored [7].

Fig. 3. Sequence Diagram

C. System Architecture Explanation
The architecture features centralized coordination via the TextProcessor, which interfaces with both the LLMModel and the STEValidator. DataHandler ensures efficient management of datasets, while Hugging Face and Google Colab provide model hosting and training environments. The system ensures modularity by separating responsibilities across components.

TABLE II
SYSTEM COMPONENT RESPONSIBILITIES
Module | Functionality
UserInterface | Collects user input, displays processed text
TextProcessor | Prepares input for model, sends for evaluation
LLMModel | Generates STE-compliant output using LLaMA 3.3
STEValidator | Ensures that outputs follow ASD-STE100 rules
DataHandler | Loads and manages training/test data

Fig. 4. System Architecture Overview

TABLE III
TECHNOLOGIES USED IN DEVELOPMENT
Component | Tool/Library
Data Preprocessing | Pandas, NLTK, SpaCy
Model Training | PyTorch, Hugging Face Transformers
Fine-Tuning Strategy | LoRA (Low-Rank Adaptation)
Model Hosting | Google Colab, Hugging Face
Backend API | Flask
Frontend UI | HTML/CSS, Flask Templates
PDF Processing | PyMuPDF

XV. SYSTEM INTEGRATION AND MODEL DEPLOYMENT

The final step in our project workflow is integrating the front-end user interface (UI) with the back-end LLM model. Our UI, built using HTML, CSS, and Flask, enables users to input free-form text or upload a PDF. The submitted input is routed to the LLM model via Flask endpoints, where it is processed and returned in STE-compliant format.

XVI. MODEL INTEGRATION AND EXECUTION

A. Frontend Input Handling
The frontend consists of a simple yet functional form that accepts either a text area input or a PDF file.
When a user submits the form, it sends a POST request to a Flask route using multipart encoding. The UI is designed to be clean and responsive, encouraging interaction without overwhelming the user. Additionally, the interface supports:
• Manual text entry via a textarea input box.
• PDF uploads through a file input element.
• Real-time feedback via loading indicators and output panels.

B. Backend Processing with Flask
Upon receiving the input, the Flask server checks whether the submission includes plain text or a PDF. If a PDF is uploaded, the server uses the PyMuPDF library to extract the content page by page. This extracted or raw text is then passed to the LLM inference engine. The back-end pipeline includes:
• Sanitization of input to remove non-ASCII characters and artifacts.
• Routing through a function that calls the fine-tuned LLaMA model using Hugging Face Transformers.
• Capturing and formatting the generated STE-compliant output.

XVII. USER EXPERIENCE AND INTERFACE DESIGN

The user interface (UI) of the LLM Style Guide Editor was designed with accessibility, simplicity, and clarity as key principles. Since our system is intended for both technical and non-technical users, the interface minimizes complexity while providing all essential functionality.

XVIII. USER INTERFACE AND EXPERIENCE

TABLE IV
FLASK API ENDPOINTS FOR MODEL INTEGRATION
Endpoint | Description
/upload_pdf | Accepts a PDF file, extracts and preprocesses text
/submit_text | Accepts raw user input from the text box
/generate_ste | Calls the LLM model and returns STE output
/get_diff | Compares original and modified text (future feature)
/download | Returns edited document in downloadable format (planned)

C. Model Invocation
The model itself can be hosted locally, in a Hugging Face Space, or invoked from a Colab notebook using the Hugging Face API. In our setup, we use the pipeline abstraction provided by Transformers to load and call the model.
By applying LoRA (Low-Rank Adaptation), our fine-tuned LLaMA 3.1 model remains lightweight and deployable even on consumer-grade GPUs.

D. Result Delivery
The processed text is then passed back to the HTML template, where the output is displayed using simple <pre> tags to preserve formatting. The original text and its STE counterpart appear side by side, allowing users to visually compare the transformation and build an understanding of STE principles.

E. Future Integration Goals
For future versions, we aim to:
• Enable downloadable output in PDF or DOCX format.
• Integrate spellchecking and grammar suggestions using LanguageTool or OpenAI APIs.
• Provide highlighted diffs between original and STE output for educational feedback.
• Add a drag-and-drop zone for file uploads to improve user experience.

A. Input Options
The interface allows users to submit either free-form text or PDF documents. Text input is handled through a large textarea field, while PDFs can be uploaded via a standard file input element. The dual-input approach makes the tool more versatile for a variety of users, from engineers writing manuals to students learning STE.

B. Responsive and Clean Layout
The interface is built using HTML and CSS, with responsiveness powered by media queries and flexible containers. It can adapt to various screen sizes, ensuring a smooth experience on desktops, tablets, and mobile devices.

C. Output Display and Feedback
STE-compliant results are displayed side by side with the original input, formatted using <pre> tags to preserve structure. The system is designed to offer:
• Clear separation of input and output
• Scrollable containers for long text
• Buttons to reset the form or export output (planned for the future)

XIX. EVALUATION STRATEGY AND TESTING METRICS

To ensure the model performs effectively, we implemented several layers of evaluation to assess both the technical quality and user satisfaction of the output.

XX.
EVALUATION AND TESTING STRATEGY

Beyond automated metrics, domain-specific validators could be added to ensure compliance with aerospace documentation standards. Future testing pipelines may also incorporate multilingual evaluation or ISO-standard readability assessments.

A. Quantitative Evaluation
• BLEU Score: Used to compare model output with reference STE-compliant sentences. A higher BLEU score indicates better alignment with the desired output.
• STE Rule Coverage: We measure how many of the official ASD-STE100 rules are correctly applied to transformed text.
• Processing Time: Average latency per request is measured to ensure responsiveness.

B. Qualitative Evaluation
• Manual reviews of output by team members familiar with STE.
• User surveys capturing clarity, usefulness, and adherence to expectations.
• Feedback collection on edge cases (e.g., compound sentences, domain-specific terminology).

TABLE V
COMPARISON OF HUMAN VS. AI-EDITED TEXT
Metric | Human Edit | LLM Output
BLEU Score | 1.00 | 0.74
Edit Distance | 8.1 | 9.6
Rule Compliance (%) | 96% | 91.3%
Time to Edit (s) | 120 | 2.1

C. Testing Tools
For backend testing, we use pytest and unittest to verify logic and endpoints. For frontend testing, Playwright is used to simulate user interactions across browsers.

XXII. STAGE 3: FINAL IMPLEMENTATION AND EVALUATION

XXI. CHALLENGES AND LESSONS LEARNED

During development, we encountered multiple technical and logistical challenges that helped shape our solution:

A. Fine-Tuning on Limited Compute
Training LLaMA 3.1, even with LoRA, required careful memory and batch size management, particularly within Google Colab. We used Unsloth to enable fast, low-memory fine-tuning, which was essential for working in a constrained environment.

B. Text Extraction from PDFs
Extracting clean text from PDF uploads proved non-trivial. Text-based PDFs worked well, but scanned or image-based PDFs caused issues.
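For intuition about the BLEU scores used in our quantitative evaluation, the metric reduces to clipped n-gram precision between a candidate and a reference. A simplified sentence-level sketch follows; our actual evaluation used NLTK and SacreBLEU, which add smoothing and a brevity penalty that this toy version omits:

```python
from collections import Counter
import math

def ngram_precision(candidate: list[str], reference: list[str], n: int) -> float:
    """Clipped n-gram precision, the core quantity behind BLEU."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clip by reference counts
    return overlap / max(1, sum(cand.values()))

def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Geometric mean of 1..max_n gram precisions (no brevity penalty)."""
    c, r = candidate.split(), reference.split()
    precisions = [ngram_precision(c, r, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

score = simple_bleu("Check the hydraulic fluid regularly.",
                    "Check the hydraulic fluid regularly.")
# identical sentences give a score of 1.0
```

A perfect match scores 1.0, matching the "Human Edit" row of Table V, which uses the human edit itself as the reference.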
We addressed this by restricting file type inputs and testing with a variety of document formats.

C. Model Interpretability
One lesson learned was that LLMs, even when fine-tuned, can be unpredictable in how they reformulate text. This emphasized the need for validation rules and comparison tools to interpret how well the model was applying STE principles.

D. Front-End to Model Communication
Flask served as an excellent lightweight bridge between the interface and the backend model. Still, handling large texts and ensuring consistent formatting through POST requests required careful input sanitization and processing logic.

XXIII. SYSTEM DEPLOYMENT PROCESS

A. Technology Stack Overview
The development and deployment of our LLM Style Guide Editor utilized a combination of technologies to ensure modularity, scalability, and local efficiency. The following tools and languages were used:
• Python 3.10: Primary backend language for building our Flask application and managing API communications.
• Flask: Lightweight web framework used to set up the HTTP server and route endpoints. Enabled rapid prototyping and RESTful communication.
• HTML/CSS/JavaScript: Used to construct the frontend interface and integrate user interaction with the backend Flask service.
• Ollama: Local model runner allowing for LLM execution without reliance on cloud APIs. Hosted a LLaMA 3.1 model variant fine-tuned for technical rewriting.

B. Backend API Implementation
We created a Flask server that handles incoming POST requests to a single endpoint, /chat, which communicates directly with the Ollama model. The full process begins with a user entering a prompt in the frontend. This prompt is sent via JavaScript to the Flask route. The route then wraps the prompt in a JSON payload and sends it to the Ollama HTTP API at http://localhost:11434/api/generate. The model response is parsed, and the resulting reply is sent back to the client.

C.
app.py Breakdown
The app.py script begins by importing the necessary libraries and initializing a Flask application. The root route serves the frontend, and the /chat route handles POST requests containing user prompts. These prompts are sent to the Ollama model using a POST request and JSON payload. The model response is returned to the user asynchronously.

D. Model Hosting with Ollama
Ollama was used to run the language model locally. It provided us with a robust CLI and a local REST API that allowed model queries to be executed in real time. The model used in this instance was named kys, a locally fine-tuned variant based on LLaMA 3.1. The model was launched and served by Ollama in the background, enabling continuous interaction via HTTP without manual invocation.

E. System Flow Summary
• User inputs text via the HTML frontend
• JavaScript captures the input and sends it to the Flask backend
• The Flask backend sends a POST request to Ollama with the prompt
• Ollama returns the response generated by the LLM
• Flask parses the response and returns it to the frontend
• The frontend dynamically displays the STE-compliant version of the input

F. Advantages of Local Model Execution
Running the model locally offered several practical benefits for our project:
• Privacy: No external API calls meant sensitive or proprietary documents never left the local machine.
• Speed: Avoided the latency involved with cloud-based inference systems.
• Offline Usability: Enabled testing and demonstrations in offline environments.
• Cost Efficiency: No compute-hour or usage-based billing from third-party API providers.

G. Limitations Encountered
While effective, our system was not without constraints. The initial load time of the LLM under Ollama was relatively slow (up to 20 seconds). In addition, memory usage was significant, requiring a minimum of 8–12 GB of RAM for stable inference.
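The /chat-to-Ollama handoff described above can be sketched with the standard library alone. The model name kys and the /api/generate endpoint come from the text; the helper function itself is an illustration, not our exact app.py:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str) -> urllib.request.Request:
    """Wrap a user prompt in the JSON payload Ollama's /api/generate expects."""
    payload = {
        "model": "kys",    # our locally fine-tuned LLaMA 3.1 variant
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Rewrite in STE: The panel should be closed securely.")
# urllib.request.urlopen(req) would send it while Ollama is running;
# the "response" field of the returned JSON holds the generated text.
```

In the real backend, Flask parses this "response" field and forwards it to the frontend.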
Finally, Ollama currently supports only a limited set of models and does not yet expose fine-tuning interfaces publicly, so our adaptation was limited to prompt engineering rather than true instruction tuning.

H. File Integration Overview
The following files were critical in building and running our project:
• app.py – main Flask backend
• index.html – user interface served via Flask templates
• FIXED.zip – includes templates, model configurations, and auxiliary static files for UI/UX
• stage_2.tex – final report and documentation file

I. Impact on Development Timeline
By switching to Ollama for local model hosting and using Flask instead of FastAPI or cloud-based services, we were able to accelerate our development cycle and conduct reliable testing without deployment delays. This approach allowed the team to stay agile while maintaining high fidelity in model output accuracy and response speed.

XXIV. PROMPT ENGINEERING AND INPUT STRATEGY

In future iterations, prompt templates can be automatically selected based on detected sentence features, such as complexity score or passive voice presence. This would help adapt the system to various documentation types, like procedures, warnings, or specifications.

TABLE VI
PROMPT VARIATION IMPACT ON MODEL OUTPUT
Prompt | Model Output | Observations
Simplify this sentence using STE rules. | Use the lever to start the engine. | Generic simplification, no vocabulary control.
Use ASD-STE100 rules to simplify this instruction. | Pull the handle to start the engine. | Improved compliance with approved verbs.
Simplify for aerospace documentation using STE. | Operate lever to initiate engine. | Formal tone, more technical accuracy.

Effective prompt engineering was critical to generating accurate STE-compliant text. We experimented with different input phrasing styles, including imperative statements, third-person technical descriptions, and long-winded explanations.
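Automatically selecting among the prompt variants of Table VI, as proposed above, could be sketched as follows. The selection heuristic and the passive-voice test here are hypothetical illustrations, not the production logic:

```python
import re

# Prompt variants taken from Table VI.
PROMPTS = {
    "generic":    "Simplify this sentence using STE rules.",
    "rule_based": "Use ASD-STE100 rules to simplify this instruction.",
}

def pick_prompt(sentence: str) -> str:
    """Choose a prompt variant from simple sentence features (heuristic sketch)."""
    long_sentence = len(sentence.split()) > 25
    passive = re.search(r"\b(is|are|was|were|been|be)\s+\w+ed\b", sentence, re.I)
    key = "rule_based" if (long_sentence or passive) else "generic"
    return f"{PROMPTS[key]}\n\nSentence: {sentence}"

prompt = pick_prompt("The lever is pulled by the technician to start the engine.")
# the passive-voice match selects the stricter ASD-STE100 variant
```

A complexity score from the metadata augmentation stage (Section VII) could replace these crude features.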
Prompts that were direct and specific, such as "Convert this sentence into simplified technical English: The technician shall verify proper torque," produced more consistent outputs. In contrast, vague prompts such as "Rewrite this" resulted in varied and sometimes verbose outputs.

XXV. MANUAL VS. MODEL OUTPUT COMPARISON

We evaluated the quality of the generated text by comparing it against manually written STE outputs. Below is a sample comparison table:

TABLE VII
MANUAL VS. MODEL OUTPUT
Original | LLM Output | Manual Output
Make sure the panel is securely closed. | Ensure the panel is closed tightly. | Close the panel securely.
The aircraft's hydraulic fluid should be checked regularly. | Check the aircraft's hydraulic fluid regularly. | Regularly check hydraulic fluid.

XXVI. VALIDATION SCRIPT

To ensure output accuracy and consistency, we created a validation script that programmatically checks the model's output against key STE rules. This script scans output text for:
• Passive voice usage
• Complex verb tenses
• Sentence length beyond 25 words
• Use of non-STE vocabulary
The validation script uses Python's re module, the textstat library for readability scoring, and a curated list of approved STE terms. The script flags violations and returns a validation report, helping us identify which model outputs needed manual review or improvement.

XXVII. PERFORMANCE METRICS AND TESTING

We measured system performance using timestamps in our Flask server logs and memory profiling tools. Results:
• Mean Response Time: 2.85 s
• Median Response Time: 2.78 s
• RAM Usage During Inference: 9.8–12.4 GB
• Model Load Time (Ollama): 22 s on cold start
Testing was done over 120 prompts with an average length of 12–18 words.

XXVIII. ETHICAL AND PRIVACY CONSIDERATIONS

Running LLMs locally provided clear ethical advantages. By avoiding cloud-based inference APIs, we ensured that user data never left the local machine.
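A minimal sketch of the validation checks from Section XXVI, using only the standard library: the passive-voice pattern and the tiny non-STE word list below are illustrative stand-ins for our full regex set, textstat scoring, and curated STE vocabulary.

```python
import re

NON_STE_WORDS = {"utilize", "commence", "terminate"}  # illustrative subset only

def validate_ste(sentence: str) -> list[str]:
    """Return a list of STE rule violations found in one sentence."""
    violations = []
    words = re.findall(r"[A-Za-z']+", sentence)
    if len(words) > 25:                                   # STE sentence-length rule
        violations.append("sentence longer than 25 words")
    if re.search(r"\b(is|are|was|were|been|be)\s+\w+ed\b", sentence, re.I):
        violations.append("passive voice")                # crude passive detector
    for w in words:
        if w.lower() in NON_STE_WORDS:
            violations.append(f"non-STE word: {w}")
    return violations

report = validate_ste("The panel is secured and you must utilize the correct tool.")
# flags passive voice ("is secured") and the non-approved verb "utilize"
```

In the real script, each flagged sentence is collected into a validation report for manual review.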
Local execution is especially critical in aerospace and defense settings, where compliance with ITAR (International Traffic in Arms Regulations), GDPR, and proprietary information policies is mandatory. Additionally, we emphasized transparency in generated outputs and warned users not to over-rely on AI without human verification.

XXIX. RELATED WORK
Several projects have explored the use of LLMs for grammar correction and technical writing:
• Bhatia et al. (2022) explored GPT-3 for domain-specific editing.
• Zhang et al. (2021) used transformer models to rephrase text for clarity and simplicity.
• OpenAI's own documentation on prompt tuning and few-shot learning informed our approach to formatting and guiding the model.
Our contribution differs in that it targets a strict style guide (STE), integrates local inference, and adds validation tooling for compliance checking.

XXX. LESSONS LEARNED AND REFLECTIONS
Throughout this project, we encountered and overcame several challenges:
• Model hosting: Ollama's simplicity made local hosting viable, but memory constraints were tight.
• Debugging Flask: CORS issues and JSON decoding errors were initially difficult to trace.
• Prompt engineering: Learning how to ask the model for what we wanted was more complex than expected.
• Workflow management: Using GitHub helped us track changes and stay organized.
Each team member developed stronger full-stack skills and a deeper appreciation for the complexity of integrating AI into real-world systems.

XXXI. FUTURE IMPROVEMENTS AND EXTENSIONS
Our project lays a strong foundation for a technical writing assistant using LLMs and local deployment, but there are many potential enhancements to consider for future iterations.

A. Fine-Tuning and Instruction Training
Currently, our system relies on prompt engineering for output control. In future work, we plan to explore fine-tuning using domain-specific datasets aligned to STE.
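A training record for such a dataset might pair a source sentence with an STE-compliant target, one JSON object per line. The schema below (field names, instruction text, and the example STE rewrites) is illustrative, not a committed design; the source sentences are drawn from our earlier comparison examples.

```python
import json

# Hypothetical (original, STE rewrite) pairs; the rewrites here are
# illustrative targets, not outputs of our system.
pairs = [
    ("The actuator must engage prior to the landing gear deployment.",
     "Engage the actuator before you extend the landing gear."),
    ("The aircraft's hydraulic fluid should be checked regularly.",
     "Examine the hydraulic fluid regularly."),
]

def to_jsonl(pairs):
    """Serialize (original, ste) pairs as one JSON object per line."""
    lines = []
    for original, ste in pairs:
        record = {
            "instruction": "Rewrite this sentence in Simplified Technical English.",
            "input": original,
            "output": ste,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

A file in this shape could feed standard instruction-tuning pipelines once a fine-tuning interface is available.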
Applying techniques such as LoRA (Low-Rank Adaptation) or QLoRA can enable more efficient training. Additionally, we plan to investigate Reinforcement Learning from Human Feedback (RLHF) to improve response quality over time through iterative human validation.

B. Multi-language Support
An important enhancement would be support for document translation and multilingual editing. Integration with models like MarianMT or M2M-100 would allow users to translate content from multiple languages before applying STE conversion, making the system useful to international aerospace and manufacturing teams.

C. Real-time Collaboration and Shared Editing
Inspired by tools like Google Docs, future versions of the platform could support multiple users editing and validating a document in real time. Role-based editing permissions, live chat for editors, and track-changes features would improve collaboration on technical documents.

D. Expanded Input Format Support
To broaden accessibility, the system could be extended to parse additional input formats, such as DOCX, Markdown, and HTML. Built-in file parsers would extract text and convert it into plain input for the LLM to process, reducing friction for end users working across platforms.

E. Interactive Validation Interface
The existing validation script is CLI-based. A future web interface could provide real-time violation feedback directly within the frontend. This would include underlining rule violations, showing STE-specific tooltips, and guiding the user to correct input before submission.

F. Plugin Ecosystem for Enterprise Rules
We plan to allow extensibility by supporting custom rule plug-ins. This would enable companies to integrate internal terminology glossaries, rule overrides, and formatting standards alongside the STE base, creating a flexible ecosystem adaptable to a variety of industry needs.

G. Explainability and Transparency
A challenge with LLMs is the opacity of their decision-making.
We propose building explainability modules that visualize token importance (e.g., via attention heatmaps), highlight rewrite patterns, and explain why the model made a change, to improve user trust and compliance.

H. Analytics and Metric Dashboard
For enterprise users and educators, we can build a dashboard to display usage statistics:
• Number of documents edited
• Average compliance score
• Average edit distance from original
• Validation accuracy rate
This would help track progress and guide improvements in writing quality.

I. Enhanced Document Security
Security is a priority, especially in defense and aerospace domains. We plan to implement:
• Encryption for local and exported files
• Role-based access to documents
• Auto-deletion of sensitive data after session expiration

J. User Feedback Loop for Model Adaptation
We plan to integrate a per-output user rating system (e.g., thumbs up/down). This data can be stored securely and later used to fine-tune or guide the model for future releases, implementing a feedback-driven system refinement cycle.

XXXII. MODEL TRANSITION AND EVALUATION RESULTS

A. Switching from LLaMA 3.3 to Mistral 7B
Initially, our team planned to use the LLaMA 3.3 model for fine-tuning and inference. However, during local deployment, we encountered performance and compatibility issues with LLaMA 3.3 under Ollama. In particular, memory constraints and slower response times affected development speed and usability testing. To address these challenges, we switched to the Mistral 7B model, which was fully supported by Ollama and provided a better balance between speed and accuracy. Mistral 7B offered faster inference on consumer-grade GPUs and integrated more easily into our Flask backend. This transition enabled us to run real-time evaluations and streamlined our development workflow.

B. Impact on Results
The change in model significantly improved output quality and system performance. We measured the performance of both models based on BLEU scores, STE rule compliance, and average editing time. As shown in Figure 6, Mistral 7B outperformed LLaMA 3.3 in both accuracy and rule adherence while reducing average edit latency.

XXXIII. ERROR ANALYSIS
While the Mistral 7B model demonstrated improved performance over LLaMA 3.3 in terms of BLEU score and STE rule compliance, our analysis revealed several recurring issues in its output. This section highlights common types of errors and explores their possible causes.

TABLE VIII
DISTRIBUTION OF MODEL OUTPUT ERRORS (SAMPLE OF 50 SENTENCES)

Error Type | Frequency (%)
Over-simplification | 28%
Incomplete rule application | 36%
Hallucinated/invented terms | 10%
Correct simplification | 26%

A. Over-Simplification
In several test cases, the model aggressively simplified technical phrases, resulting in a loss of meaning or specificity. For example:
Original: The actuator must engage prior to the landing gear deployment.
Model Output: The actuator must work before the gear goes out.
While syntactically correct, the replacement of "engage" with "work" and "landing gear" with "gear" diluted critical aerospace terminology.

B. Incomplete Rule Application
The model occasionally applied only partial simplification, especially when multiple STE rules were relevant. For example:
Original: Verify the electrical system is operational using diagnostic tool 47-B.
Model Output: Verify the system works using tool 47-B.
Although the model removed "electrical" and simplified the phrase, it failed to clarify the type of system or standardize terminology as prescribed by STE Rule 1.3 (use approved vocabulary) and Rule 5.2 (avoid ambiguity).

C. Hallucination and Invention
In rare cases, the model inserted additional words or rephrased content in ways that added unintended meanings. This was especially prevalent when the input lacked structure or included domain-specific jargon unfamiliar to the model.

D. Mitigation Strategies
Fig. 6.
Comparison of LLaMA 3.3 and Mistral 7B based on BLEU score, STE compliance, and edit time.
To address these limitations, we propose the following improvements:
• Fine-tune the model on a domain-specific corpus tagged with STE annotations.
• Include rule-checking modules post-inference to flag non-compliant output.
• Introduce user-controlled simplification levels to balance clarity and precision.
This analysis emphasizes the importance of not relying solely on model-generated text in regulated environments and supports our recommendation for a human-in-the-loop review process.

XXXIV. USER EVALUATION AND FEEDBACK
To assess the usability and effectiveness of our STE Editor, we conducted informal user-testing sessions with five participants familiar with technical documentation. Each participant was asked to input 3–5 examples of technical instructions and evaluate the model's output based on clarity, accuracy, and rule compliance.

A. Evaluation Metrics
Participants rated each response on a 5-point Likert scale for the following:
• Clarity: Was the output clear and understandable?
• STE compliance: Did the result follow the Simplified Technical English rules?
• Fidelity: Did the meaning of the original sentence remain intact?

B. Results
Most responses scored 4 or higher in clarity and fidelity. However, compliance scores varied, particularly with vocabulary limitations and ambiguity reduction. Feedback indicated that while the model improved sentence readability, some domain-specific terms were lost or over-simplified.

C. Takeaways
This evaluation reinforces the need for continuous feedback loops and domain-specific fine-tuning to meet industry documentation standards.

XXXV. DEPLOYMENT STRATEGY AND SCALING CONSIDERATIONS
We designed the system for lightweight deployment using Ollama for local LLM hosting, ensuring that the application remains accessible to users without high-end infrastructure.

A. Local vs. Cloud Hosting
Our solution runs locally using Ollama with Mistral 7B. This approach:
• Preserves user privacy by avoiding external API calls
• Enables offline functionality
• Minimizes recurring hosting costs
For larger deployments, the model could be containerized using Docker and scaled across cloud GPUs using services like Azure ML or AWS SageMaker.

B. Scalability
To support broader usage:
• GPU parallelization can reduce batch latency [8]
• A job-queue system like Celery could handle concurrent requests
• Results can be cached for repeated queries

XXXVI. ETHICAL AND LEGAL IMPLICATIONS
The use of large language models to generate technical content introduces ethical and legal concerns that must be addressed, especially in safety-critical industries. One such consideration is auditability: model outputs should be traceable and timestamped so that version history can be retrieved during safety audits or compliance reviews.

A. Accuracy and Accountability
Incorrect simplification or hallucinated content could lead to miscommunication [9] in maintenance procedures, posing safety risks. Therefore, model outputs should undergo human review before integration into official documentation [10], [11].

B. Bias and Fairness
Language models may exhibit unintended bias based on their training data. In technical writing, this can manifest as unequal treatment of terminology or cultural framing.

C. Mitigation
To ensure safe use, we recommend:
• Rule-based validation post-generation
• Human-in-the-loop review processes
• Logging and traceability for all generated text

XXXVII. LOCAL AND GLOBAL IMPACTS
The LLM Style Guide Editor for Technical Writing has meaningful and far-reaching effects across different levels of society, impacting individuals, organizations, and broader global systems.
These impacts stem from the tool's ability to enforce linguistic clarity, improve technical documentation quality, and support human-AI collaboration in communication-heavy industries.

A. Impact on Individuals
Locally, the system directly supports technical writers, students, engineers, and professionals who produce or consume technical documentation. For many users, especially those for whom English is not a first language, writing in technical domains poses challenges related to grammar complexity, vocabulary, and sentence structure. The LLM Style Guide Editor helps bridge this gap by converting complex English into Simplified Technical English (STE), which emphasizes clarity, active voice, controlled vocabulary, and straightforward syntax.
From an educational standpoint, the tool serves as a learning aid, reinforcing best practices in technical communication. Users gain immediate feedback on how to improve sentence structure and comply with STE rules. This can accelerate skill acquisition in technical writing and reduce the cognitive burden on learners who are new to the discipline. For professionals in high-stakes environments, such as aviation, defense, and medicine, the tool increases confidence that their written communication will be understood correctly by a global audience. Furthermore, individuals with cognitive impairments, reading disabilities, or low literacy levels may find simplified output more accessible. Thus, the tool promotes inclusion, equity, and user empowerment.

B. Impact on Organizations
At the organizational level, this system enables consistency, safety, and operational efficiency in document-driven workflows. Industries like aerospace, automotive, healthcare, and manufacturing depend on precise and unambiguous documentation to operate safely and effectively. Ambiguity in technical manuals, maintenance procedures, or operating instructions can lead to accidents, miscommunication, or costly downtime.
By automating adherence to STE standards, the editor reduces the risk of human error and enforces best practices across large documentation teams. It also simplifies onboarding by serving as an intelligent assistant for new technical writers who may not yet be fluent in STE. Companies can integrate the editor into existing authoring tools, version-control systems, or quality-assurance pipelines, improving productivity without sacrificing quality.
Moreover, organizations operating across geographic or cultural boundaries benefit from language simplification. Technical documents authored in one country may be used by teams in another; therefore, consistency in terminology and sentence structure is critical for cross-border collaboration. The tool supports these needs by ensuring all contributors follow the same linguistic standards, regardless of background or location.

C. Global Societal Impact
From a global perspective, the project supports a more inclusive and comprehensible information ecosystem. As industries and supply chains globalize, technical documentation increasingly needs to be understood by people with diverse linguistic and educational backgrounds. The use of STE, facilitated by this editor, helps reduce the digital divide and enables wider participation in complex technological systems.
In humanitarian and disaster-relief contexts, where technical instructions may be shared with non-native English-speaking responders or volunteers, STE-compliant communication can improve safety and response effectiveness. Similarly, in global product distribution, user manuals generated with the tool can lead to fewer customer-support issues and more satisfied users.
The project also sets an example for responsible AI integration in human-centric applications. Rather than replacing the writer, the tool functions as an assistive agent, providing suggestions, corrections, and insights that the user can accept or modify.
This human-in-the-loop model promotes accountability and ensures that final outputs remain interpretable and intentional. Finally, the tool contributes to global sustainability by reducing the number of miscommunications that lead to rework, waste, or error. As STE continues to be adopted by international standards bodies and regulatory agencies, our system provides a foundation for scalable and intelligent documentation support in the global economy.
In conclusion, the LLM Style Guide Editor creates value at every level: enabling individuals to write clearly, helping organizations enforce documentation quality, and supporting global access to accurate technical information. Its emphasis on simplicity, safety, and consistency aligns well with the broader goals of accessibility and international cooperation in technical communication.

XXXVIII. CONCLUSION
Our project successfully integrates a local deployment pipeline for a Large Language Model aimed at transforming raw technical writing into content compliant with the Simplified Technical English (STE) standard. We leveraged Ollama [4] for efficient local hosting, Flask for backend operations, and a browser-based interface to ensure accessibility. Throughout the process, we applied prompt-engineering strategies [12], built validation tools to ensure output quality, and tested the system for performance and compliance.
The modular architecture allows for future improvements, including fine-tuning [13], multilingual support, explainability, and enhanced collaboration. These improvements will enhance the system's ability to serve the engineering and aerospace sectors and make technical writing more consistent, reliable, and globally accessible. Our approach not only minimizes reliance on external services but also ensures control, privacy, and real-time feedback, which are key components for deployment in secure or regulated environments.
The framework we built sets the foundation for future deployments of AI-driven writing assistants in professional and industrial domains.

REFERENCES
[1] ASD-STE100, Simplified Technical English: A Guide for Writers, Issue 7, 2021.
[2] Y. Zhang et al., "Text simplification with pretrained transformers," in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2021.
[3] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: a method for automatic evaluation of machine translation," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 311–318.
[4] OpenAI, "GPT-4 technical report," arXiv preprint arXiv:2303.08774, 2023.
[5] H. Bengtsson and P. Habbe, "Direct preference optimization for improved technical writing assistance: A study of how language models can support the writing of technical documentation at Saab," 2024.
[6] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[8] M. Shoeybi et al., "Megatron-LM: Training multi-billion parameter language models using model parallelism," arXiv preprint arXiv:1909.08053, 2019.
[9] L. Weidinger et al., "Ethical and social risks of harm from language models," in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, pp. 1–10.
[10] S. Kreps et al., "All the news that's fit to fabricate: AI-generated text as a tool of disinformation," Political Behavior, 2022.
[11] L. Ouyang et al., "Training language models to follow instructions with human feedback," Advances in Neural Information Processing Systems, 2022.
[12] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, "Deep reinforcement learning from human preferences," Advances in Neural Information Processing Systems, vol. 30, 2017.
[13] N. Bhatia et al., "Domain-specific fine-tuning of large language models," arXiv preprint arXiv:2205.00078, 2022.