A Comparative Analysis of Python Text Matching Libraries: A Multilingual Evaluation of Capabilities, Performance and Resource Utilization

Nagwa Elmobark

doi:10.55151/ijeedu.v7i1.188

Published: Apr 15, 2025

Keywords:

Difflib, FuzzyWuzzy, Jellyfish, Levenshtein, Natural Language Processing (NLP), RapidFuzz

Nagwa Elmobark

Department of Computer Science, University of Mansoura, Dakahlia Governorate 11432, Egypt

Abstract

Python text-matching libraries have become essential tools in data cleaning and natural language processing; however, their performance, accuracy, and resource efficiency in multilingual scenarios have not been thoroughly examined. This study evaluates five major libraries (FuzzyWuzzy, RapidFuzz, Difflib, Levenshtein, and Jellyfish) using a dataset of 50,000 test cases in English, Spanish, French, German, and Italian. We introduce controlled variations in text complexity, error types, and string lengths to measure processing speed, matching accuracy, and resource consumption. The experimental results reveal significant performance differences among the libraries. RapidFuzz processes text 40% faster than the other libraries while maintaining efficient memory usage, although its performance varies with language and error type. Levenshtein achieves higher accuracy when handling non-Latin characters, while FuzzyWuzzy performs consistently well across different text lengths. Difflib, despite being built into Python, runs slower and consumes more resources. Jellyfish specializes in phonetic matching but struggles with long text inputs. Memory usage ranges from 20 to 200 MB for identical workloads, revealing substantial efficiency differences. These findings enable developers to select the most suitable library for their specific needs and computational constraints. Our study introduces a standardized evaluation framework and a multilingual benchmarking dataset, enabling researchers to compare text-matching methods more effectively. By identifying key performance trade-offs, we provide a practical guide for optimizing text-matching efficiency in real-world applications. This research contributes to the broader field of natural language processing by offering data-driven insights and a structured methodology for evaluating text similarity techniques.
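
As a rough illustration of the kind of measurement described in the abstract, the sketch below times two string scorers and records their peak memory on a handful of string pairs. It is not the study's evaluation framework: the pairs in PAIRS, the benchmark helper, and the pure-Python levenshtein_ratio are hypothetical stand-ins introduced only for this example, and only the standard library is used (difflib, time, tracemalloc). A third-party scorer could presumably be passed to benchmark in the same way, but that is an assumption rather than something shown here.

```python
import difflib
import time
import tracemalloc

# Hypothetical sample pairs (not taken from the study's dataset), one per language.
PAIRS = [
    ("language", "langauge"),   # English, swapped letters
    ("corazón", "corazon"),     # Spanish, missing accent
    ("français", "francais"),   # French, missing cedilla
    ("Straße", "Strasse"),      # German, sharp s written as "ss"
    ("città", "citta"),         # Italian, missing accent
]


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via the Wagner-Fischer recurrence, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1], normalised by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def difflib_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] from the standard library's SequenceMatcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def benchmark(scorer, pairs, repeats=3):
    """Return best wall-clock time, peak traced memory, and scores."""
    # Timing pass: tracemalloc is off here because tracing slows execution.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        scores = [scorer(a, b) for a, b in pairs]
        best = min(best, time.perf_counter() - start)
    # Memory pass: tracks Python-level allocations only.
    tracemalloc.start()
    scores = [scorer(a, b) for a, b in pairs]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds": best, "peak_bytes": peak, "scores": scores}


if __name__ == "__main__":
    for name, scorer in [("difflib", difflib_ratio),
                         ("levenshtein (pure Python)", levenshtein_ratio)]:
        result = benchmark(scorer, PAIRS)
        print(f"{name}: {result['seconds']:.6f} s, "
              f"{result['peak_bytes']} bytes peak")
```

Timing and memory are measured in separate passes because tracemalloc adds overhead that would distort the timings. Note also that tracemalloc does not see memory allocated inside C extensions that bypass Python's allocator, so a process-level measure would likely be needed to compare compiled libraries fairly.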

How to Cite

N. Elmobark, “A Comparative Analysis of Python Text Matching Libraries: A Multilingual Evaluation of Capabilities, Performance and Resource Utilization”, Int. J. Environ. Eng. Educ., vol. 7, no. 1, pp. 48–60, Apr. 2025.

Issue

Vol. 7 No. 1 (2025)

Section

Research Article

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
