PEFT Fine-Tuning with 1.58-Bit Quantization for a Quantum Computing Research Agent Chatbot: Architecture, Mathematical Foundations, and Practical Implementation

Rasikannan.L, Mohamed Israk.B, Vijay.K, Vinoth Kumar.G, Sabtharishi.G

DOI: https://doi.org/10.55041/ijcope.v2i4.942

Volume 02, Issue 04

Published on: April 2026

Department of Computer Science and Engineering

Government College of Engineering Srirangam Trichy Tamil Nadu India

Abstract

Quantum computing is one of the most technically dense frontiers in modern science, with specialized vocabulary spanning quantum gates, superposition, entanglement, variational quantum algorithms, and error-correction codes. Providing accurate, real-time, research-grade assistance in this domain requires a large language model (LLM) that is both highly capable and deployable on resource-constrained hardware. This paper presents the design and deployment of a Quantum Computing Research Agent Chatbot built by applying parameter-efficient fine-tuning (PEFT) to a 1.58-bit quantized LLM (BitNet b1.58). We extend the mathematical frameworks of Low-Rank Adaptation (LoRA), Adapter Layers, and Prefix Tuning to quantum-domain fine-tuning, incorporating a bespoke quantum-terminology corpus and a Retrieval-Augmented Generation (RAG) layer backed by a curated quantum literature index. We derive the rank–accuracy trade-off for quantum NLP tasks, show that LoRA at rank r = 16 achieves an F1 score of 88.7 on quantum question-answering benchmarks while using only 0.13% of the full model's parameters, and demonstrate end-to-end inference at 4.2 tokens/second on a single NVIDIA T4 GPU. The proposed agent pipeline integrates with academic literature APIs (arXiv, Semantic Scholar) and supports multi-turn research dialogues, citation generation, and mathematical expression rendering.

Keywords— Quantum Computing Chatbot, PEFT, BitNet 1.58-bit Quantization, Low-Rank Adaptation, Research Agent, Retrieval-Augmented Generation, Ternary Weight Matrices, Quantum NLP
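The two quantities named in the abstract, ternary ("1.58-bit") weights and the small trainable fraction of a rank-r LoRA update, can be illustrated with a minimal sketch. It assumes the absmean rounding scheme described for BitNet b1.58 by Ma et al.; whether the authors use exactly this variant is not stated on this page. The function names, the toy matrix, and the 4096 × 4096 layer size are hypothetical, chosen only for illustration, and the per-matrix fraction computed here is not meant to reproduce the 0.13% whole-model figure, which depends on which layers receive adapters.

```python
import math


def absmean_ternary(weights, eps=1e-8):
    """Quantize a 2-D list of floats to {-1, 0, +1} with absmean scaling.

    Assumed scheme: divide by the mean absolute weight, round to the
    nearest integer, then clip to [-1, 1].
    """
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat) + eps
    quantized = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return quantized, scale


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of one LoRA update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_in + d_out)


# A ternary weight carries log2(3) bits of information, hence "1.58-bit".
bits_per_weight = math.log2(3)

# Toy weight matrix (hypothetical values).
W = [[0.9, -0.05, -1.4],
     [0.02, 0.6, -0.7]]
W_q, gamma = absmean_ternary(W)

# Fraction of one hypothetical 4096 x 4096 projection trained at rank 16.
d = 4096
fraction = lora_param_count(d, d, 16) / (d * d)

print(f"bits per ternary weight: {bits_per_weight:.3f}")
print(f"ternary weights: {W_q}, scale: {gamma:.4f}")
print(f"LoRA fraction of one {d}x{d} matrix at r=16: {fraction:.4%}")
```

In this sketch only the ternary matrix and the single scale per matrix would be stored for the frozen base weights, while the LoRA factors remain in higher precision and are the only parameters updated during fine-tuning.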

How to Cite this Paper

Rasikannan.L, Israk.B, M., Vijay.K, Kumar.G, V., & Sabtharishi.G (2026). PEFT Fine-Tuning with 1.58-Bit Quantization for a Quantum Computing Research Agent Chatbot: Architecture, Mathematical Foundations, and Practical Implementation. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.942

Rasikannan.L, et al. "PEFT Fine-Tuning with 1.58-Bit Quantization for a Quantum Computing Research Agent Chatbot: Architecture, Mathematical Foundations, and Practical Implementation." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026. https://doi.org/10.55041/ijcope.v2i4.942.

Rasikannan.L, Mohamed Israk.B, Vijay.K, Vinoth Kumar.G, and Sabtharishi.G. "PEFT Fine-Tuning with 1.58-Bit Quantization for a Quantum Computing Research Agent Chatbot: Architecture, Mathematical Foundations, and Practical Implementation." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.942.

References

[1] S. Ma et al., "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits," arXiv:2402.17764, 2024.

[2] E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," Proc. ICLR, 2022.

[3] N. Houlsby et al., "Parameter-Efficient Transfer Learning for NLP," Proc. ICML, 2019.

[4] X. L. Li and P. Liang, "Prefix-Tuning: Optimizing Continuous Prompts for Generation," Proc. ACL, 2021.

[5] T. Dettmers et al., "QLoRA: Efficient Finetuning of Quantized LLMs," Proc. NeurIPS, 2023.

[6] A. Aghajanyan et al., "Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning," Proc. ACL, 2021.

[7] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000.

[8] J. Preskill, "Quantum Computing in the NISQ Era and Beyond," Quantum, vol. 2, p. 79, 2018.

[9] P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," Proc. NeurIPS, 2020.

[10] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, 2013.

[11] Y. Bengio et al., "Estimating or Propagating Gradients Through Stochastic Neurons," arXiv:1308.3432, 2013.

[12] Z. Zhang et al., "Quantization-Aware Training for Natural Language Understanding," Proc. EMNLP, 2022.

Ethical Compliance & Review Process

• All submissions are screened with plagiarism-detection software.
• Review follows the journal's editorial policy.
• Authors retain copyright.
• Peer Review Type: Double-Blind Peer Review
• Published on: May 01 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
