IJCOPE Journal


International Journal of Creative and Open Research in Engineering and Management

A Peer-Reviewed, Open-Access International Journal Supporting Multidisciplinary Research, Digital Publishing Standards, DOI Registration, and Academic Indexing.
Journal Information
ISSN: 3108-1754 (Online)
Crossref DOI: Available
ISO Certification: 9001:2015
Publication Fee: 599/- INR
Compliance: UGC Journal Norms
License: CC BY 4.0
Peer Review: Double Blind
Volume 02, Issue 04

Published on: April 2026

THE DOCUMENT SIMILARITY & DEDUPLICATION TOOL

B Saritha, Pogula Sandhya, P Vamshi Goud, Banothu Narahari, Chittoju Koushik Avinash

Department of CSE (Data Science), ACE Engineering College, Hyderabad, Telangana, India

Article Status

Plagiarism Passed, Peer Reviewed, Open Access


Abstract

The Document Similarity and Deduplication Tool addresses the problem of redundant data in digital storage. More documents are created every day, and duplicate or near-duplicate files consume storage space unnecessarily and make the data harder to manage. The tool applies Natural Language Processing techniques to preprocess the text so that documents can be compared and analyzed accurately. It then measures how similar the documents are to one another using Cosine Similarity, the Jaccard Index, and Euclidean Distance. When the tool finds duplicate documents, it lets the user decide what to do with them: merge them, archive them, or delete them. This makes the data cleaner and uses storage space more efficiently. The tool has a web interface built with HTML, CSS, and JavaScript that lets the user upload files, view their similarity, and manage duplicates in one place.
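The three similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the raw term-frequency vectors, the function names, and the 0.8 cosine threshold for flagging near-duplicates are all assumptions made for illustration, and the paper may use a different preprocessing pipeline or weighting such as TF-IDF.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_index(a, b):
    """Shared vocabulary size divided by combined vocabulary size."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def euclidean_distance(a, b):
    """Straight-line distance between term-frequency vectors (0.0 = identical)."""
    terms = a.keys() | b.keys()
    # Counter returns 0 for terms that a document does not contain.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))


def find_near_duplicates(docs, threshold=0.8):
    """Return name pairs whose cosine similarity meets the threshold.

    `docs` maps a document name to its raw text. The 0.8 default is a
    hypothetical cut-off, not a value taken from the paper.
    """
    vectors = {name: Counter(tokenize(text)) for name, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(vectors), 2):
        if cosine_similarity(vectors[x], vectors[y]) >= threshold:
            pairs.append((x, y))
    return pairs
```

Cosine Similarity and the Jaccard Index grow as documents become more alike, while Euclidean Distance shrinks, so a tool that combines them has to treat the distance as an inverse signal. The pairs returned by `find_near_duplicates` would be the candidates offered to the user for merging, archiving, or deletion.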

How to Cite this Paper

Saritha, B., Sandhya, P., Goud, P. V., Narahari, B., & Avinash, C. K. (2026). The Document Similarity & Deduplication Tool. International Journal of Creative and Open Research in Engineering and Management, 02(04). https://doi.org/10.55041/ijcope.v2i4.163

Saritha, B, et al. "The Document Similarity & Deduplication Tool." International Journal of Creative and Open Research in Engineering and Management, vol. 02, no. 04, 2026. https://doi.org/10.55041/ijcope.v2i4.163.

Saritha, B, Pogula Sandhya, P Goud, Banothu Narahari, and Chittoju Avinash. "The Document Similarity & Deduplication Tool." International Journal of Creative and Open Research in Engineering and Management 02, no. 04 (2026). https://doi.org/10.55041/ijcope.v2i4.163.



Ethical Compliance & Review Process

  • All submissions are screened under plagiarism detection.
  • Review follows editorial policy.
  • Authors retain copyright.
  • Peer Review Type: Double-Blind Peer Review
  • Published on: Apr 08 2026

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to share and adapt this work for non-commercial purposes with proper attribution.
