Published on: June 2026
INTELLIGENT FRAMEWORK FOR AUTOMATED STRUCTURED DATA GENERATION FROM UNSTRUCTURED TEXT, IMAGES, AND PDF DOCUMENTS
TUSHAR T S
Naveen Kumar B
Abstract
The rapid growth of digital information has led to the generation of large amounts of unstructured data in formats such as text documents, images, and PDF files. Since this type of data does not follow a fixed structure, extracting useful information from it remains a significant challenge for conventional data processing systems. This paper presents a unified framework for transforming unstructured content into structured and machine-readable data. The proposed approach combines Natural Language Processing (NLP) and Optical Character Recognition (OCR) techniques to handle information from textual documents, scanned records, images, and PDF files. NLP methods are employed to identify important entities, keywords, and contextual information from textual content, while OCR technology is used to extract text embedded within images and document files. The extracted information undergoes preprocessing, cleaning, and normalization before being organized into structured formats such as JSON and CSV. The developed framework improves data accessibility, simplifies information management, and reduces the effort required for manual data extraction. The solution can support various applications, including document management systems, business intelligence, healthcare record processing, and information retrieval platforms. Experimental evaluation indicates that the framework provides an efficient and reliable method for converting heterogeneous unstructured data into structured datasets suitable for analysis and decision-making.
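The cleaning, normalization, and export stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes text has already been extracted (the OCR stage is omitted), and it substitutes simple regular expressions for the NLP entity recognition step. The field names (`email`, `date`, `amount`) and the helper functions are hypothetical, chosen only for illustration.

```python
import csv
import io
import json
import re
import unicodedata

# Hypothetical patterns standing in for an NLP entity recognizer (assumption).
ENTITY_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def clean_text(raw: str) -> str:
    """Normalize Unicode forms and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    return re.sub(r"\s+", " ", text).strip()


def extract_record(source: str, raw: str) -> dict:
    """Build one structured record; fields with no match are set to None."""
    text = clean_text(raw)
    record = {"source": source}
    for name, pattern in ENTITY_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(0) if match else None
    return record


def to_json(records: list) -> str:
    """Serialize records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records: list) -> str:
    """Serialize records as CSV with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source", *ENTITY_PATTERNS])
    writer.writeheader()
    writer.writerows(records)  # None values are written as empty cells
    return buffer.getvalue()


# Illustrative input; in a full pipeline this text would come from OCR or a PDF parser.
documents = {
    "invoice.txt": "Invoice  dated 2026-06-10\nTotal due: $125.50\nContact: billing@example.com",
    "note.txt": "Meeting moved to 2026-07-01.",
}
records = [extract_record(name, body) for name, body in documents.items()]
print(to_json(records))
print(to_csv(records))
```

Keeping every record on the same set of keys, with `None` for missing fields, lets the same list feed both the JSON and the CSV writer without per-format special cases.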
How to Cite this Paper
S, T. T. (2026). Intelligent Framework for Automated Structured Data Generation from Unstructured Text, Images, and PDF Documents. International Journal of Creative and Open Research in Engineering and Management, 02(6). https://doi.org/10.55041/ijcope.v2i6.119
S, TUSHAR. "Intelligent Framework for Automated Structured Data Generation from Unstructured Text, Images, and PDF Documents." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 6, 2026. https://doi.org/10.55041/ijcope.v2i6.119.
S, TUSHAR. "Intelligent Framework for Automated Structured Data Generation from Unstructured Text, Images, and PDF Documents." International Journal of Creative and Open Research in Engineering and Management 02, no. 6 (2026). https://doi.org/10.55041/ijcope.v2i6.119.
Ethical Compliance & Review Process
- All submissions are screened under plagiarism detection.
- Review follows editorial policy.
- Authors retain copyright.
- Peer Review Type: Double-Blind Peer Review
- Published on: Jun 10 2026
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.

