
Mohammed Nasser Salem, Mohammed Fadhl Abdullah

Abstract

Duplicate records degrade the accuracy, reliability, and efficiency of database systems. This paper presents a comparative study of two widely used string-based similarity algorithms: Simil and Jaro–Winkler. Both algorithms measure textual similarity between records in order to identify entries that refer to the same real-world entity. The findings show that Simil is more effective for multi-word fields such as names and addresses, while Jaro–Winkler performs better on short words and typographical errors. The study highlights the strengths and limitations of both algorithms and provides practical guidance for their use in database duplicate detection systems. It is intended to help researchers and system developers handle redundant data and improve overall data quality.
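
As an illustration of the two measures compared here, the following minimal Python sketch scores a pair of strings with each. It is not the authors' implementation. It assumes that Simil follows the Ratcliff/Obershelp longest-common-substring scheme it is commonly described with, and approximates it with the standard-library `difflib.SequenceMatcher`. The Jaro–Winkler part uses the textbook definition with the conventional prefix scale of 0.1 and a maximum prefix length of 4. The function names `simil_ratio`, `jaro`, and `jaro_winkler` are hypothetical.

```python
from difflib import SequenceMatcher


def simil_ratio(a: str, b: str) -> float:
    """Assumed stand-in for Simil: 2 * matched characters / total length."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(x != y for x, y in zip(seq1, seq2)) // 2
    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (up to max_prefix chars)."""
    base = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1, s2):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)
```

For example, `jaro_winkler("MARTHA", "MARHTA")` is about 0.961, reflecting a single transposition after a shared three-letter prefix, while `simil_ratio("abcd", "bcde")` is 0.75 because three of the four characters in each string form a common block.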


Keywords

Database, Duplicate Records, Jaro-Winkler, Simil algorithm, Data cleansing.

Section
Articles
How to Cite
[1]
Salem, M.N. and Abdullah, M.F. trans. 2026. Evaluating String-Based Similarity Algorithms for Duplicate Record Identification in Databases. Journal of Science and Technology. 31, 2 (Feb. 2026). DOI:https://doi.org/10.20428/jst.v31i2.3566.
