Evaluating String-Based Similarity Algorithms for Duplicate Record Identification in Databases

Mohammed Nasser Salem; Mohammed Fadhl Abdullah

doi:10.20428/jst.v31i2.3566

Published 2026-02-14

Mohammed Nasser Salem

Department, Faculty of Engineering, Aden University, Aden, Yemen

Mohammed Fadhl Abdullah

Department, Faculty of Engineering and Computation, University of Science and Technology, Aden, Yemen

Abstract

Duplicate records degrade the accuracy, reliability, and efficiency of database systems. This paper presents a comparative study of two widely used string-based similarity algorithms: Simil and Jaro–Winkler. Both algorithms measure textual similarity between records in order to identify entries that refer to the same real-world entity. The findings show that Simil is more effective for multi-word fields such as names and addresses, while Jaro–Winkler performs better on short words and typographical errors. The study highlights the strengths and limitations of both algorithms and provides practical guidance for their use in database duplicate detection systems. It is intended to help researchers and system developers handle redundant data and improve overall data quality.
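
As a rough illustration of the two measures compared above, the following Python sketch scores a pair of strings with each. It is not the authors' implementation. The `simil` function assumes the commonly described form of Simil (recursively matching the longest common substring, in the style of Ratcliff/Obershelp), and `jaro_winkler` assumes the conventional prefix scale of 0.1 and a prefix cap of 4 characters. All function and parameter names are hypothetical.

```python
def _common_chars(a: str, b: str) -> int:
    """Count characters matched by recursively taking the longest common substring."""
    if not a or not b:
        return 0
    best_len = best_a = best_b = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_a, best_b = i - best_len, j - best_len
        prev = cur
    if best_len == 0:
        return 0
    # Recurse on the text to the left and to the right of the match.
    left = _common_chars(a[:best_a], b[:best_b])
    right = _common_chars(a[best_a + best_len:], b[best_b + best_len:])
    return best_len + left + right


def simil(a: str, b: str) -> float:
    """Assumed Simil score: 2 * matched characters / total length."""
    if not a and not b:
        return 1.0
    return 2.0 * _common_chars(a, b) / (len(a) + len(b))


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matches within a sliding window, penalised by transpositions."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    hit1 = [False] * len(s1)
    hit2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len(s1)):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed scale 0.1, cap 4)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


if __name__ == "__main__":
    for left, right in [("MARTHA", "MARHTA"), ("12 High Street", "12 High St.")]:
        print(left, "|", right, "|", round(simil(left, right), 4),
              "|", round(jaro_winkler(left, right), 4))
```

In a duplicate detection setting, scores like these would typically be compared against a threshold chosen per field; the sketch leaves that choice open because the abstract does not state one.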

Keywords

Database, Duplicate Records, Jaro-Winkler, Simil algorithm, Data cleansing.

Issue

Vol. 31 No. 2 (2026)

Section

Articles

How to Cite

[1]

Salem, M.N. and Abdullah, M.F. 2026. Evaluating String-Based Similarity Algorithms for Duplicate Record Identification in Databases. Journal of Science and Technology. 31, 2 (Feb. 2026). DOI:https://doi.org/10.20428/jst.v31i2.3566.

Submission to first decision: 5 days
Review time: 30 days
Submission to acceptance: 40 days
Acceptance to publication: 10 days
