[Ada17]Jason Adams. Ruby port of uealite stemmer. 2017. URL:
[AmonME12]Iván Amón, Francisco Moreno, and Jaime Echeverri. Algoritmo fonético para detección de cadenas de texto duplicadas en el idioma español. Revista Ingenierías Universidad de Medellín, 11(20):127–138, June 2012. URL:
[Axe09]Pål Axelsson. Sfinxbis. Technical Report, Swedish Alliance for Middleware Infrastructure, April 2009. URL:
[BCP02]Ilaria Bartolini, Paolo Ciaccia, and Marco Patella. String matching with metric trees using an approximate distance. In Alberto H. F. Laender and Arlindo L. Oliveira, editors, SPIRE 2002: String Processing and Information Retrieval, 271–283. Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. URL:, doi:10.1007/3-540-45735-6_24.
[BM08]Alexander Beider and Stephen P. Morse. Beider-morse phonetic matching: an alternative to soundex with fewer false hits. International Review of Jewish Genealogy, Summer 2008. URL:
[BBL81]Gérard Bouchard, Patrick Brard, and Yolande Lavoie. Fonem: un code de transcription phonétique pour la reconstitution automatique des familles saguenayennes. Population, 1981. URL:, doi:10.2307/1532326.
[Boy98]Carolyn B. Boyce. Information on the refined soundex algorithm. November 1998. URL:
[Boy11]Leonid Boytsov. Indexing methods for approximate dictionary searching: comparative analysis. Journal of Experimental Algorithmics, 16:1.1:1.1–1.1:1.91, May 2011. doi:10.1145/1963190.1963191.
[BW94]Michael Burrows and David J. Wheeler. A block sorting lossless data compression algorithm. SRC Research Report 124, Digital Equipment Corporation, Palo Alto, May 1994. URL:
[Cau99]Jörg Caumanns. A fast and simple stemming algorithm for german words. Technical Report, Free University of Berlin, 1999. URL:
[Chr11]Peter Christen. Febrl (freely extensible biomedical record linkage) – December 2011. URL:
[Chu]Richard Churchill. URL:
[CV05]Rudi Cilibrasi and Paul Michael Béla Vitanyi. Clustering by compression. IEEE Transactions on Information Theory, 51(4):1523–1545, April 2005. URL:, doi:10.1109/TIT.2005.844059.
[CislakG17]Aleksander Cisłak and Szymon Grabowski. Lightweight fingerprints for fast approximate keyword matching using bitwise operations. CoRR, 2017. URL:, arXiv:1711.08475.
[Cod18a]Rosetta Code. Longest common subsequence. 2018. URL:
[Cod18b]Rosetta Code. Run-length encoding. 2018. URL:
[C+69]Jay L. Cunningham and others. A study of the organization and search of bibliographic holdings in on-line computer systems: phase i. Technical Report, University of California, Berkeley, Institute of Library Research, March 1969. URL:
[Dal05]Andrew Dalke. Arithmetic coder (python recipe). 2005. URL:
[Dam64]Fred J. Damerau. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176, March 1964. doi:10.1145/363958.363994.
[Dav62]Leon Davidson. Retrieval of misspelled names in an airlines passenger record system. Communications of the ACM, 5(3):169–171, March 1962. doi:10.1145/366862.366913.
[dcm4che]dcm4che. DICOM toolkit & library: URL:
[Dic45]Lee R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945. URL:, doi:10.2307/1932409.
[Dol70]James L. Dolby. An algorithm for variable-length proper-name compression. Journal of Library Automation, 3(4):257–275, 1970. URL:, doi:10.6017/ital.v3i4.5259.
[EJMS76]Honey S. Elovitz, Rodney W. Johnson, Astrid McHugh, and John E. Shore. Automatic translation of english text to phonetics by means of letter-to-sound rules. NRL Report 7948, document AD/A021 929, Naval Research Laboratory, Washington, D.C., 1976.
[FurnrohrRvR02]Michael Fürnrohr, Birgit Rimmelspacher, and Tilman von Roncador. Zusammenführung von datenbeständen ohne numerische identifikatoren: ein verfahren im rahmen der testuntersuchungen zu einem registergestützten zensus. Bayern in Zahlen, 2002(7):308–321, 2002. URL:
[Gad90]T. N. Gadd. Phonix: the algorithm. Program, 24(4):363–366, 1990. doi:10.1108/eb047069.
[Gar15]Lars Marius Garshol. Norphone comparator. 2015. URL:
[GM88]Georg Wilde and Carsten Meyer. Nicht wörtlich genommen, ‘schreibweisentolerante’ suchroutine in dbase implementiert. c’t Magazin für Computer Technik, pages 126–131, October 1988.
[Gil97]Leicester E. Gill. Ox-link: the oxford medical record linkage system. In Record Linkage Techniques. Washington, D.C., March 1997. Federal Committee on Statistical Methodology, Office of Management and Budget. URL:
[Got82]Osamu Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, 1982. URL:, doi:10.1016/0022-2836(82)90398-9.
[Gro91]Aaron D. Gross. Getty synoname: the development of software for personal name pattern matching. In Intelligent Text and Image Handling - Volume 2, RIAO ‘91, 754–763. Paris, France, 1991. LE CENTRE DE HAUTES ETUDES INTERNATIONALES D’INFORMATIQUE DOCUMENTAIRE. URL:
[HH00]Martin Haase and Kai Heitmann. Die erweiterte kölner phonetik. 2000.
[Ham50]R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, April 1950. URL:, doi:10.1002/j.1538-7305.1950.tb00463.x.
[Har91]Donna Harman. How effective is stemming? Journal of the American Society for Information Science, 42(1):7–15, 1991. URL:, doi:10.1002/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P.
[Hen76]Louis Henry. Projet de transcription phonétique des noms de famille. Annales de Démographie Historique, 1976:201–214, 1976. URL:
[HBD76]Theodore Hershberg, Alan Burstein, and Robert Dockhorn. Record linkage. Historical Methods Newsletter, 9(2–3):137–163, 1976. doi:10.1080/00182494.1976.10112639.
[HBD79]Theodore Hershberg, Alan Burstein, and Robert Dockhorn. Verkettung von daten: record linkage am beispiel des philadelphia social history project. In Wilhelm Heinz Schröder, editor, Moderne Stadtgeschichte, volume 8, pages 35–73. Klett-Cotta, 1979. URL:
[HM02]David Holmes and M. Catherine McCabe. Improving precision and recall for soundex retrieval. In Proceedings. International Conference on Information Technology: Coding and Computing, 22–26. April 2002. URL:, doi:10.1109/ITCC.2002.1000354.
[Hoo02]David Hood. Cavesystem: phonetic matching algorithm. Technical Report CTP060902, University of Otago, Dunedin, New Zealand, September 2002. URL:
[Hoo04]David Hood. Caverphone revisited. Technical Report CTP150804, University of Otago, Dunedin, New Zealand, December 2004. URL:
[Jac01]Paul Jaccard. Distribution de la flore alpine dans le bassin des dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, 37:241–272, 1901. URL:
[Jar89]Matthew A. Jaro. Advances in record linkage methodology as applied to the 1985 census of tampa florida. Journal of the American Statistical Association, 84(406):414–420, 1989. doi:10.1080/01621459.1989.10478785.
[JS05]Marie-Claire Jenkins and Dan Smith. Conservative stemming for search and indexing. Technical Report, University of East Anglia, Norwich, UK, 2005. URL:
[JBG13]Sergio Jimenez, Claudio Becerra, and Alexander Gelbukh. SOFTCARDINALITY-CORE: improving text overlap with distributional measures for semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task, 194–201. Atlanta, GA, June 2013. Association for Computational Linguistics. URL:
[Knu98]Donald E. Knuth. The Art of Computer Programming: Volume 3, Sorting and Searching, page 394. Addison-Wesley, 1998.
[Kollar]Maroš Kollár. Text::phonetic::phonix. URL:
[KV17]Keerthi Koneru and Cihan Varol. Privacy preserving record linkage using metasoundex algorithm. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 443–447. December 2017. URL:, doi:10.1109/ICMLA.2017.0-121.
[Kuh95]Michael Kuhn. Metaphone searches. November 1995. URL:
[LR96]Andrew J. Lait and Brian Randell. An assessment of name matching algorithms. Technical Report, University of Newcastle upon Tyne, Newcastle upon Tyne, UK, 1996. URL:
[Lan13]Joerg Lang. Inner workings of the german analyzer in lucene. November 2013. URL:
[Lev66]Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, February 1966. URL:
[Lov68]Julie Beth Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1–2):22–31, June 1968. URL:
[LA77]Billy T. Lynch and William L. Arends. Selection of a surname coding procedure for the srs record linkage system. Technical Report, Statistical Reporting Service, US Department of Agriculture, Washington, D.C., February 1977. URL:
[LegareLC72]Jacques Légaré, Yolande Lavoie, and Hubert Charbonneau. The early canadian population: problems in automatic record linkage. Canadian Historical Review, 53(4):427–442, December 1972. doi:10.3138/CHR-053-04-03.
[Mar15]Daniel Marcelino. Soundexbr: soundex (phonetic) algorithm for Brazilian portuguese. July 2015. URL:
[Mic99]Jörg Michael. Doppelgänger gesucht – ein programm für die kontextsensitive phonetische stringumwandlung. c’t Magazin für Computer Technik, page 252, 1999. URL:
[Mic07]Jörg Michael. Phonet.c. August 2007. URL:
[Min10]Hermann Minkowski. Geometrie der Zahlen. B. G. Teubner, Leipzig, 1910. URL:
[Mok97]Gary Mokotoff. Soundexing and genealogy. 1997. URL:
[ME96]Alvaro E. Monge and Charles P. Elkan. The field matching problem: algorithms and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD‘96, 267–270. AAAI Press, 1996. URL:
[MKTM77]Gwendolyn B. Moore, John L. Kuhns, Jeffrey L. Trefftzs, and Christine A. Montgomery. Accessing Individual Records from Personal Data Files Using Non-Unique Identifiers. Number 500-2 in Special Publication. National Bureau of Standards, Washington, D.C., February 1977. URL:
[MLM12]Alejandro Mosquera, Elena Lloret, and Paloma Moreda. Towards facilitating the accessibility of web 2.0 texts through text normalisation. In Proceedings of the LREC workshop: Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey, 9–14. 2012. URL:
[NW70]Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970. URL:, doi:10.1016/0022-2836(70)90057-4.
[Och57]Akira Ochiai. Zoogeographical studies on the soleoid fishes found in Japan and its neighbouring regions-ii. Bulletin of the Japanese Society of Scientific Fisheries, 22(9):526–530, 1957. URL:, doi:10.2331/suisan.22.526.
[Ope12]OpenRefine. Clustering in depth. 2012. URL:
[Ots36]Yanosuke Otsuka. The faunal character of the Japanese pleistocene marine mollusca, as evidence of the climate having become colder during the pleistocene in Japan. Bulletin of the Biogeographical Society of Japan, 6(16):165–170, 1936.
[Pai90]Chris D. Paice. Another stemmer. In ACM SIGIR Forum, volume 24, 56–61. Fall 1990. URL:, doi:10.1145/101306.101310.
[PK14]Vimal P. Parmar and CK Kumbharana. Study existing various phonetic algorithms and designing and development of a working model for the new developed algorithm and comparison by implementing it with existing algorithm(s). International Journal of Computer Applications, 98(19):45–49, 2014. doi:10.5120/17295-7795.
[Pfe00]Ulrich Pfeifer. Wait 1.8 - soundex.c. 2000. URL:
[Phi90a]Lawrence Philips. Hanging on the metaphone. Computer Language Magazine, 7(12):39–44, December 1990.
[Phi90b]Lawrence Philips. Metaphone. December 1990. URL:
[Phi00]Lawrence Philips. The double metaphone search algorithm. C/C++ Users Journal, 18(6):38–43, June 2000.
[Pli18]Guillaume Plique. Talisman. 2018. URL:
[PZ84]Joseph J. Pollock and Antonio Zamora. Automatic spelling correction in scientific and scholarly text. Communications of the ACM, 27(4):358–368, April 1984. URL:, doi:10.1145/358027.358048.
[Por80]Martin F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, July 1980. URL:, doi:10.1108/eb046814.
[Por02]Martin F. Porter. The english (porter2) stemming algorithm. September 2002. URL:
[Pos69]Hans Joachim Postel. Die kölner phonetik: ein verfahren zur identifizierung von personennamen auf der grundlage der gestaltanalyse. IBM-Nachrichten, 19:925–931, 1969.
[Pra15]Jörg Prante. Elasticsearch – 2015. URL:
[RM88]John W. Ratcliff and David E. Metzener. Pattern matching: the gestalt approach. Dr. Dobb's Journal, 1988. URL:
[Rep13]Dominic John Repici. Understanding classic soundex algorithms. 2013. URL:
[RU09]Nicholas Ring and Alexandra L. Uitdenbogerd. Finding ‘lucy in disguise’: the misheard lyric matching problem. In Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, and Tetsuya Sakai, editors, Information Retrieval Technology, 157–167. Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. doi:10.1007/978-3-642-04769-5_14.
[RC67]A. H. Robinson and Colin Cherry. Results of a prototype television bandwidth compression scheme. In Proceedings of the IEEE, volume 55, 356–364. IEEE, 1967. doi:10.1109/PROC.1967.5493.
[Ruk18]Dorothea Rukasz. Pprl – privacy preserving record linkage. 2018. URL:
[Rus18]Robert C. Russell. Index. 1918. URL:
[Sav05]Jacques Savoy. IR multilingual resources at unine. 2005. URL:
[SGRW96]Robyn Schinke, Mark Greengrass, Alexander M. Robertson, and Peter Willett. A stemming algorithm for latin text databases. Journal of Documentation, 52(2):172–187, 1996. doi:10.1108/eb026966.
[SBB04]Rainer Schnell, Tobias Bachteler, and Stefan Bender. A toolbox for record linkage. Austrian Journal of Statistics, 33(1-2):125–133, 2004. URL:
[SA10]Boumedyen A. N. Shannaq and Victor V. Alexandrov. Using product similarity for adding business. Global Journal of Computer Science and Technology, 10(12):2–8, October 2010. URL:
[Sim49]Edward H. Simpson. Measurement of diversity. Nature, 163:688, April 1949. URL:, doi:10.1038/163688a0.
[Sjoo09]Allan Sjöö. Swamisfinxbix. 2009. URL:
[SW81]Temple F. Smith and Michael S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981. URL:, doi:10.1016/0022-2836(81)90087-5.
[Son11]Wayne Song. Typo-distance. 2011. URL:
[Ste14]Kevin L. Stern. 2014. URL:
[Szy34]Dezydery Szymkiewicz. Une contribution statistique à la géographie floristique. Acta Societatis Botanicorum Poloniae, 11(3):249–265, 1934. URL:, doi:10.5586/asbp.1934.012.
[Sorensen48]Thorvald Sørensen. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on danish commons. Kongelige Danske Videnskabernes Selskab, 5(4):1–34, 1948. URL:,%20Thorvald.pdf.
[Taf70]Robert L. Taft. Name Search Techniques. Special report (New York State Identification and Intelligence System). Bureau of Systems Development, New York State Identification and Intelligence System, 1970.
[Tan58]T. T. Tanimoto. An elementary mathematical theory of classification and prediction. Technical Report, IBM, 1958.
[Tic]Ticki. Eudex: a blazingly fast phonetic reduction/hashing algorithm. URL:
[Tic16]Ticki. The eudex algorithm. December 2016. URL:
[Tve77]Amos Tversky. Features of similarity. Psychological Review, 84(4):327–352, 1977. URL:, doi:10.1037/0033-295x.84.4.327.
[VB12]Cihan Varol and Coskun Bayrak. Hybrid matching algorithm for personal names. Journal of Data and Information Quality, 3(4):8:1–8:18, September 2012. doi:10.1145/2348828.2348830.
[WF74]Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1):168–173, January 1974. doi:10.1145/321796.321811.
[Wik18]Wikibooks. Algorithm implementation/strings/longest common substring. 2018. URL:
[Wil05]Martin Wilz. Aspekte der kodierung phonetischer Ähnlichkeiten in deutschen eigennamen. Master’s thesis, Universität zu Köln, Köln, 2005. URL:
[Win90]William E. Winkler. String comparator metrics and enhanced decision rules in the fellegi-sunter model of record linkage. Technical Report, U.S. Bureau of the Census, Statistical Research Division, Washington, D.C., 1990. URL:
[WMJL94]William E. Winkler, George McLaughlin, Matthew A. Jaro, and Maureen Lynch. Strcmp95.c. January 1994. URL:
[Zac14]Siderite Zackwehdex. Super fast and accurate string distance algorithm: sift4. 2014. URL:
[Zed15]Jesper Zedlitz. Phonet4java 2015. URL:
[ZD96]Justin Zobel and Philip Dart. Phonetic string matching: lessons from information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘96, 166–172. New York, NY, USA, 1996. ACM. doi:10.1145/243199.243258.
[delPAngelesBailonM16]María del Pilar Angeles and Noemi Bailón-Miguel. Performance of spanish encoding functions during record linkage. In DATA ANALYTICS 2016: The Fifth International Conference on Data Analysis, 1–7. 2016. URL:
[delPAngelesEGGM15]María del Pilar Angeles, Adrián Espino-Gamez, and Jonathan Gil-Moncada. Comparison of a modified spanish phonetic, soundex, and phonex coding functions during data matching process. In 2015 International Conference on Informatics, Electronics Vision (ICIEV), 1–5. June 2015. URL:, doi:10.1109/ICIEV.2015.7334028.
[IBMCorporation73]IBM Corporation. Alpha Search Inquiry System, General Information Manual. White Plains, NY, 1973.
[JPGTrust91]The J. Paul Getty Trust. Synoname. 1991. URL:
[UnitedStates97]United States. Using the Census Soundex. Number 55 in General Information Leaflet. National Archives and Records Administration, Washington, D.C., 1997. URL:
[UnitedStates07]United States. Soundex system: the soundex indexing system. 2007. URL:
[vonRethS77]Hans-Peter von Reth and Hans-Jörg Schek. Eine zugriffsmethode für die phonetische Ähnlichkeitssuche. Technical Report 77.03.002, IBM Deutschland GmbH., 1977.
[65]Владимир И. Левенштейн. Двоичные коды с исправлением выпадений, вставок и замещений символов. Доклады Академии Наук СССР, 163(4):845–848, 1965. URL: