Indices

AZvanGemund07

Rui Abreu, Peter Zoeteweij, and Arjan J. C. van Gemund. An evaluation of similarity coefficients for software fault localization. In 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06). 2007. doi:10.1109/PRDC.2006.18.

Ada17

Jason Adams. Ruby port of UEA-Lite stemmer. 2017. URL: https://github.com/ealdent/uea-stemmer.

Ain73

William A. Ainsworth. A system for converting text into speech. IEEE Transactions on Audio and Electroacoustics, AU-21(3):288–290, June 1973. doi:10.1109/TAU.1973.1162452.

AmonME12

Iván Amón, Francisco Moreno, and Jaime Echeverri. Algoritmo fonético para detección de cadenas de texto duplicadas en el idioma español. Revista Ingenierías Universidad de Medellín, 11(20):127–138, June 2012. URL: http://www.scielo.org.co/scielo.php?pid=S1692-33242012000100011&script=sci_abstract&tlng=es.

And73

Michael R. Anderberg. Cluster Analysis for Applications. Academic Press, New York, 1973. doi:10.1016/C2013-0-06161-0.

AM04

Marti J. Anderson and Russell B. Millar. Spatial variation and effects of habitat on temperate reef fish assemblages in northeastern New Zealand. Journal of Experimental Marine Biology and Ecology, 305:191–221, 2004. doi:10.1016/j.jembe.2003.12.011.

AndresM04

A. Martín Andrés and P. Femia Marzo. Delta: a new measure of agreement between two raters. British Journal of Mathematical and Statistical Psychology, 57(1):1–20, May 2004. doi:10.1348/000711004849268.

AC77

Brian Austin and Rita R. Colwell. Evaluation of some coefficients for use in numerical taxonomy of microorganisms. International Journal of Systematic Bacteriology, 27(3):204–210, July 1977. doi:10.1099/00207713-27-3-204.

Axe09

Pål Axelsson. SfinxBis. Technical Report, Swedish Alliance for Middleware Infrastructure, April 2009. URL: http://www.swami.se/download/18.248ad5af12aa8136533800091/SfinxBis.pdf.

BUB76

Cesare Baroni-Urbani and Mauro W. Buser. Similarity of binary data. Systematic Biology, 25(3):251–259, September 1976. doi:10.2307/2412493.

BCP02

Ilaria Bartolini, Paolo Ciaccia, and Marco Patella. String matching with metric trees using an approximate distance. In Alberto H. F. Laender and Arlindo L. Oliveira, editors, SPIRE 2002: String Processing and Information Retrieval, 271–283. Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. URL: http://www-db.disi.unibo.it/research/papers/SPIRE02.pdf, doi:10.1007/3-540-45735-6_24.

BB95

Vladimir Batagelj and Matevž Bren. Comparing resemblance measures. Journal of Classification, 12(1):73–90, March 1995. doi:10.1007/BF01202268.

Bau89

Forrest B. Baulieu. A classification of presence/absence based dissimilarity coefficients. Journal of Classification, 6(1):233–246, 1989. doi:10.1007/BF01908601.

Bau97

Forrest B. Baulieu. Two variant axiom systems for presence/absence based dissimilarity coefficients. Journal of Classification, 14(1):159–170, 1997. doi:10.1007/s003579900009.

BM08

Alexander Beider and Stephen P. Morse. Beider-Morse phonetic matching: an alternative to Soundex with fewer false hits. International Review of Jewish Genealogy, Summer 2008. URL: https://stevemorse.org/phonetics/bmpm.htm.

Ben01

Rodolfo Benini. Principii di Demografia. Number 29 in Manuali Barbera di Scienze Giuridiche Sociali e Politiche. G. Barbera, Firenze, 1901. URL: http://www.archive.org/stream/principiididemo00benigoog.

BAG54

E. M. Bennett, R. Alpert, and A. C. Goldstein. Communications through limited-response questioning. Public Opinion Quarterly, 18(3):303–308, 1954. doi:10.1086/266520.

Bha46

Anil Kumar Bhattacharyya. On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics (1933-1960), 7(4):401–406, July 1946. doi:10.2307/25047882.

BP80

Gerard Bouchard and Christian Pouyez. Name variations and computerized record linkage. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 13(2):119–125, 1980. doi:10.1080/01615440.1980.10594037.

BBL81

Gérard Bouchard, Patrick Brard, and Yolande Lavoie. Fonem: un code de transcription phonétique pour la reconstitution automatique des familles saguenayennes. Population, 1981. URL: http://www.persee.fr/doc/pop_0032-4663_1981_num_36_6_17248, doi:10.2307/1532326.

Boy98

Carolyn B. Boyce. Information on the Refined Soundex algorithm. November 1998. URL: https://web.archive.org/web/20010513121003/http://www.bluepoof.com:80/Soundex/info2.html.

Boy11

Leonid Boytsov. Indexing methods for approximate dictionary searching: comparative analysis. Journal of Experimental Algorithmics, 16:1.1:1.1–1.1:1.91, May 2011. doi:10.1145/1963190.1963191.

Bra51

George W. Brainerd. The place of chronological ordering in archaeological analysis. American Antiquity, 16(4):301–313, April 1951. doi:10.2307/276979.

BB32

Josias Braun-Blanquet. Plant Sociology: The Study of Plant Communities. McGraw-Hill Book Company, New York, 1932. URL: https://archive.org/details/plantsociologyst00brau.

BC57

J. Roger Bray and John T. Curtis. An ordination of upland forest communities of southern Wisconsin. Ecological Monographs, 27(4):325–349, February 1957. URL: http://cescos.fau.edu/gawliklab/papers/BrayJRandJTCurtis1957.pdf, doi:10.2307/1942268.

Bro97

Andrei Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences: Proceedings, Positano, Amalfitan Coast, Salerno, Italy, June 11-13, 1997, 21–29. 1997. doi:10.1109/SEQUEN.1997.666900.

BW94

Michael Burrows and David J. Wheeler. A block sorting lossless data compression algorithm. SRC Research Report 124, Digital Equipment Corporation, Palo Alto, May 1994. URL: http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.html.

CBW97

Yong Cao, Anthony W. Bark, and W. Peter Williams. Similarity measure bias in river benthic aufwuchs community analysis. Water Environment Research, 69(1):95–106, 1997. doi:10.2175/106143097x125227.

Cau99

Jörg Caumanns. A fast and simple stemming algorithm for German words. Technical Report, Free University of Berlin, 1999. URL: https://refubium.fu-berlin.de/bitstream/handle/fub188/18405/tr-b-99-16.pdf.

Cha08

Sung-Hyuk Cha. Taxonomy of nominal type histogram distance measures. In Proceedings of the American Conference on Applied Mathematics (MATH '08). 2008. URL: http://www.wseas.us/e-library/conferences/2008/harvard/math/49-577-887.pdf.

CTY06

Sung-Hyuk Cha, Charles C. Tappert, and Sungsoo Yoon. Enhancing binary feature vector similarity measures. Journal of Pattern Recognition Research, 1(1):63–77, 2006. doi:10.13176/11.20.

CCCS04

Anne Chao, Robin L. Chazdon, Robert K. Colwell, and Tsung-Jen Shen. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters, 8(2):148–159, 2004. doi:10.1111/j.1461-0248.2004.00707.x.

CCT10

Seung-Seok Choi, Sung-Hyuk Cha, and Charles C. Tappert. A survey of binary similarity and distance measures. Systemics, Cybernetics and Informatics, 8(1):43–48, 2010.

Chr06

Peter Christen. A comparison of personal name matching: techniques and practical issues. Technical Report TR-CS-06-02, Australian National University, Canberra, Australia, 2006. URL: https://openresearch-repository.anu.edu.au/bitstream/1885/44521/3/TR-CS-06-02.pdf.

Chr11

Peter Christen. Febrl (freely extensible biomedical record linkage) – encode.py. December 2011. URL: https://sourceforge.net/projects/febrl/.

CGHH91

Kenneth Church, William Gale, Patrick Hanks, and Donald Hindle. Using statistics in lexical analysis. In Lexical Acquisition: Exploiting On-Line Resources to Build up a Lexicon, pages 115–164. Lawrence Erlbaum, Hillsdale, NJ, 1991.

Chu

Richard Churchill. UEAstem.java. URL: http://lemur.cmp.uea.ac.uk/Research/stemmer/UEAstem.java.

CV05

Rudi Cilibrasi and Paul Michael Béla Vitányi. Clustering by compression. IEEE Transactions on Information Theory, 51(4):1523–1545, April 2005. URL: https://ieeexplore.ieee.org/document/1412045, doi:10.1109/TIT.2005.844059.

CislakG17

Aleksander Cisłak and Szymon Grabowski. Lightweight fingerprints for fast approximate keyword matching using bitwise operations. CoRR, 2017. URL: http://arxiv.org/abs/1711.08475.

Cla52

Philip J. Clark. An extension of the coefficient of divergence for use with multiple characters. Copeia, 1952(2):61–64, June 1952. doi:10.2307/1438532.

Cle76

Paul W. Clement. A formula for computing inter-observer agreement. Psychological Reports, 39(1):257–258, 1976. doi:10.2466/pr0.1976.39.1.257.

Cod18a

Rosetta Code. Longest common subsequence. 2018. URL: http://rosettacode.org/wiki/Longest_common_subsequence#Dynamic_Programming_6.

Cod18b

Rosetta Code. Run-length encoding. 2018. URL: https://rosettacode.org/wiki/Run-length_encoding#Python.

Coh11

Adam Cohen. FuzzyWuzzy: fuzzy string matching in Python. July 2011. URL: https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/.

Coh60

Jacob Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960. doi:10.1177/001316446002000104.

CRF03

William W. Cohen, Pradeep Ravikumar, and Stephen E. Fienberg. A comparison of string distance metrics for name-matching tasks. In IIWEB'03 Proceedings of the 2003 International Conference on Information Integration on the Web, 73–78. 2003. URL: http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf.

CRFR03

William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg, and Kathryn Rivard. SecondString. 2003. URL: https://github.com/TeamCohen/secondstring.

Col49

Lamont C. Cole. The measurement of interspecific association. Ecology, 30(4):411–424, 1949. doi:10.2307/1932444.

CT12

Viviana Consonni and Roberto Todeschini. New similarity coefficients for binary data. MATCH Communications in Mathematical and in Computer Chemistry, 68:581–592, 2012.

Cor03

Graham Cormode. Sequence Distance Embeddings. PhD thesis, The University of Warwick, 2003. URL: http://wrap.warwick.ac.uk/61310/7/WRAP_THESIS_Cormode_2003.pdf.

CPSV00

Graham Cormode, Mike Paterson, Süleyman Cenk Sahinalp, and Uzi Vishkin. Communication complexity of document exchange. In SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, 197–200. 2000.

Cor73

IBM Corporation. Alpha Search Inquiry System, General Information Manual. White Plains, NY, 1973.

Cor17

IBM Corporation. IBM SPSS Statistics Algorithms. IBM Corporation, 25th edition, 2017. URL: ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/subscription/en/client/Manuals/IBM_SPSS_Statistics_Algorithms.pdf.

Cov96

Michael A. Covington. An algorithm to align words for historical comparison. Computational Linguistics, 22(4):481–496, December 1996.

Cro51

Lee J. Cronbach. Coefficient alpha and the internal structure of tests. Psychometrika, 16(3):297–334, September 1951. doi:10.1007/BF02310555.

C+69

Jay L. Cunningham and others. A study of the organization and search of bibliographic holdings in on-line computer systems: Phase I. Technical Report, University of California, Berkeley, Institute of Library Research, March 1969. URL: https://files.eric.ed.gov/fulltext/ED029679.pdf.

Cze09

Jan Czekanowski. Zur Differentialdiagnose der Neandertalgruppe. Korrespondenz-Blatt der Deutschen Gesellschaft für Anthropologie, Ethnologie und Urgeschichte, 40:44–47, 1909.

DLP99

Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. Similarity-based models of word cooccurrence probabilities. Machine Learning, 34(1–3):43–69, February 1999. doi:10.1023/A:1007537716579.

Dal05

Andrew Dalke. Arithmetic coder (python recipe). 2005. URL: http://code.activestate.com/recipes/306626/.

DLZ05

Valentin Dallmeier, Christian Lindig, and Andreas Zeller. Lightweight defect localization for Java. In ECOOP'05 Proceedings of the 19th European conference on Object-Oriented Programming. 2005. URL: https://www.st.cs.uni-saarland.de/papers/dlz2004/dlz2004.pdf, doi:10.1007/11531142_23.

Dam64

Fred J. Damerau. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176, March 1964. doi:10.1145/363958.363994.

Dav62

Leon Davidson. Retrieval of misspelled names in an airlines passenger record system. Communications of the ACM, 5(3):169–171, March 1962. doi:10.1145/366862.366913.

dcm4che

dcm4che. DICOM toolkit & library: Phonem.java. URL: https://github.com/dcm4che/dcm4che/blob/master/dcm4che-soundex/src/main/java/org/dcm4che3/soundex/Phonem.java.

Den65

Sally F. Dennis. The construction of a thesaurus automatically from a sample of text. In Mary Elizabeth Stevens, Vincent E. Giuliano, and Laurence B. Heilprin, editors, Statistical Association Methods for Mechanized Documentation: Symposium Proceedings, number 269 in National Bureau of Standards Miscellaneous Publication, 61–148. Washington, D.C., December 1965. United States Department of Commerce. URL: https://archive.org/details/statisticalassoc269stev.

DD16

Michel Marie Deza and Elena Deza. Encyclopedia of Distances. Springer-Verlag, Berlin, 4th edition, 2016.

Dic45

Lee R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945. URL: https://www.jstor.org/stable/1932409, doi:10.2307/1932409.

Dig83

P. G. N. Digby. Approximating the tetrachoric correlation coefficient. Biometrics, 39(3):753–757, September 1983. doi:10.2307/2531104.

Dol70

James L. Dolby. An algorithm for variable-length proper-name compression. Journal of Library Automation, 3(4):257–275, 1970. URL: https://ejournals.bc.edu/ojs/index.php/ital/article/download/5259/4734, doi:10.6017/ital.v3i4.5259.

Doo84

Myrick H. Doolittle. The verification of predictions. The American Meteorological Journal, 2:327–329, 1884. URL: https://books.google.com/books?id=2f0wAQAAMAAJ&pg=PA327.

DHC+08

Sean S. Downey, Brian Hallmark, Murray P. Cox, Peter Norquest, and J. Stephen Lansing. Computational feature-sensitive reconstruction of language relationships: developing the ALINE distance for comparative historical linguistic reconstruction. Journal of Quantitative Linguistics, 15(4):340–369, November 2008. doi:10.1080/09296170802326681.

DK32

Harold E. Driver and Alfred L. Kroeber. Quantitative expression of cultural relationships. University of California Publications in American Archaeology and Ethnology, 31(4):211–256, 1932. URL: http://digitalassets.lib.berkeley.edu/anthpubs/ucb/text/ucp031-005.pdf.

Dun93

Ted Dunning. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74, 1993. URL: http://www.aclweb.org/anthology/J93-1003.

EH88

Andrzej Ehrenfeucht and David Haussler. A new distance metric on strings computable in linear time. Discrete Applied Mathematics, 20(3):191–203, 1988. doi:10.1016/0166-218X(88)90076-5.

Eid14

Horst Eidenberger. Categorization and Machine Learning: The Modeling of Human Understanding in Computers. atpress, 2014.

Ell56

Heinz Ellenberg. Grundlagen der Vegetationsgliederung. Teil 1. Aufgaben und Methoden der Vegetationskunde. Verlag Eugen Ulmer, Stuttgart, 1956.

EJMS76

Honey S. Elovitz, Rodney W. Johnson, Astrid McHugh, and John E. Shore. Automatic translation of English text to phonetics by means of letter-to-sound rules. NRL Report 7948, document AD/A021 929, Naval Research Laboratory, Washington, D.C., 1976.

Eri97

Klas Erikson. Approximate Swedish name matching - survey and test of different algorithms. Nada report TRITA-NA-E9721, KTH, Royal Institute of Technology, Stockholm, Sweden, 1997. URL: ftp://ftp.nada.kth.se/pub/documents/Theory/Viggo-Kann/NADA-E9721.pdf.

Eyr38

Henri Eyraud. Les principes de la mesure des corrélations. Annales de l'Université de Lyon, III Series, Section A, 1:30–47, 1938.

Fag57

Edward W. Fager. Determination and analysis of recurrent groups. Ecology, 38(4):586–595, October 1957. doi:10.2307/1943124.

FM63

Edward W. Fager and John A. McGowan. Zooplankton species groups in the North Pacific. Science, 140(3566):453–460, 1963. doi:10.1126/science.140.3566.453.

Fai83

Daniel P. Faith. Asymmetric binary similarity measures. Oecologia, 57(3):287–290, March 1983. doi:10.1007/BF00377169.

Fle75

Joseph L. Fleiss. Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 31(3):651–659, 1975. doi:10.2307/2529549.

FLP03

Joseph L. Fleiss, Bruce Levin, and Myunghee Cho Paik. Statistical Methods for Rates and Proportions. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, 3rd edition, 2003.

For07

Stephen A. Forbes. On the local distribution of certain Illinois fishes: an essay in statistical ecology. Bulletin of the Illinois State Laboratory of Natural History, 7:273–303, 1907.

For25

Stephen A. Forbes. Method of determining and measuring the associative relations of species. Science, 61(1585):518–524, 1925.

FK66

Earl G. Fossum and Gilbert Kaskey. Optimization and standardization of information retrieval language and systems. Technical Report, Directorate of Information Sciences, Air Force Office of Scientific Research, Office of Aerospace Research, United States Air Force, Washington, D.C., 1966. URL: https://archive.org/details/DTIC_AD0630797.

FM83

E. B. Fowlkes and Colin L. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569, 1983. doi:10.1080/01621459.1983.10478008.

FurnrohrRvR02

Michael Fürnrohr, Birgit Rimmelspacher, and Tilman von Roncador. Zusammenführung von Datenbeständen ohne numerische Identifikatoren: ein Verfahren im Rahmen der Testuntersuchungen zu einem registergestützten Zensus. Bayern in Zahlen, 2002(7):308–321, 2002. URL: https://www.statistik.bayern.de/medien/statistik/zensus/zusammenf__hrung_von_datenbest__nden_ohne_numerische_identifikatoren.pdf.

Gad90

T. N. Gadd. Phonix: the algorithm. Program, 24(4):363–366, 1990. doi:10.1108/eb047069.

Gar15

Lars Marius Garshol. Norphone comparator. 2015. URL: https://github.com/larsga/Duke/blob/master/duke-core/src/main/java/no/priv/garshol/duke/comparators/NorphoneComparator.java.

GM88

Georg Wilde and Carsten Meyer. Nicht wörtlich genommen, 'Schreibweisentolerante' Suchroutine in dBASE implementiert. c't Magazin für Computer Technik, pages 126–131, October 1988.

GW66

N. Gilbert and Terry C. E. Wells. Analysis of quadrat data. Journal of Ecology, 54(3):675–685, November 1966. doi:10.2307/2257810.

Gil84

Grove K. Gilbert. Finley's tornado predictions. American Meteorological Journal, 1:166–172, 1884.

Gil97

Leicester E. Gill. Ox-Link: the Oxford medical record linkage system. In Record Linkage Techniques. Washington, D.C., March 1997. Federal Committee on Statistical Methodology, Office of Management and Budget. URL: https://pdfs.semanticscholar.org/fff7/02a3322e05c282a84064ee085e589ef74584.pdf.

Gin12

Corrado Gini. Variabilità e mutabilità. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche. C. Cuppini, Bologna, 1912.

Gin15

Corrado Gini. Nuovi contributi alla teoria delle relazioni statistiche. Atti del Reale Istituto Veneto di Scienze, Lettere ed Arti, Series 8, 74(2):1903–1942, 1915.

Gle20

Henry Allan Gleason. Some applications of the quadrat method. Bulletin of the Torrey Botanical Club, 47(1):21–33, January 1920. doi:10.2307/2480223.

Goo67

David W. Goodall. The distribution of the matching coefficient. Biometrics, 23(4):647–656, December 1967. doi:10.2307/2528419.

GK54

Leo A. Goodman and William H. Kruskal. Measures of association for cross classifications I. Journal of the American Statistical Association, 49(268):732–764, 1954. doi:10.2307/2281536.

GK59

Leo A. Goodman and William H. Kruskal. Measures of association for cross classifications II: further discussion and references. Journal of the American Statistical Association, 54(285):123–163, March 1959. doi:10.2307/2282143.

Got82

Osamu Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, 1982. URL: http://www.sciencedirect.com/science/article/pii/0022283682903989, doi:10.1016/0022-2836(82)90398-9.

Gow71

John C. Gower. A general coefficient of similarities and some of its properties. Biometrics, 27(4):857–871, December 1971. doi:10.2307/2528823.

GL86

John C. Gower and Pierre Legendre. Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1):5–48, February 1986. doi:10.1007/BF01896809.

GIJ+01

Luis Gravano, Panagiotis G. Ipeirotis, H. V. Jagadish, Nick Koudas, S. Muthukrishnan, and Divesh Srivastava. Approximate string joins in a database (almost) for free. In Proceedings of the 27th VLDB Conference, Roma, Italy, 2001. 2001.

Gro91

Aaron D. Gross. Getty Synoname: the development of software for personal name pattern matching. In Intelligent Text and Image Handling - Volume 2, RIAO '91, 754–763. Paris, France, 1991. Le Centre de Hautes Etudes Internationales d'Informatique Documentaire. URL: http://dl.acm.org/citation.cfm?id=3171004.3171021.

Guirk

J. P. Guilford. Fundamental Statistics in Psychology and Education. McGraw-Hill Book Company, New York, New York. URL: https://archive.org/details/in.ernet.dli.2015.228996.

Gut76

Gloria J. A. Guth. Surname spellings and computerized record linkage. Historical Methods Newsletter, 10(1):10–19, 1976. doi:10.1080/00182494.1976.10112645.

Gut41

Louis Guttman. An outline of the statistical theory of prediction. In Paul Horst, editor, The Prediction of Personal Adjustment, number 48, pages 253–311. Social Science Research Council, 1941. URL: https://babel.hathitrust.org/cgi/pt?id=uc1.b4579784;view=1up;seq=271.

Gwe08

Kilem Li Gwet. Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1):29–48, 2008. doi:10.1348/000711006X126600.

HH00

Martin Haase and Kai Heitmann. Die erweiterte Kölner Phonetik. 2000.

Ham61

Ulrich Hamann. Merkmalbestand und Verwandtschaftsbeziehungen der Farinosae: ein Beitrag zum System der Monokotyledonen. Willdenowia, 2:639–768, 1961.

Ham50

R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, April 1950. URL: https://ieeexplore.ieee.org/document/6772729/, doi:10.1002/j.1538-7305.1950.tb00463.x.

Har91

Donna Harman. How effective is stemming? Journal of the American Society for Information Science, 42(1):7–15, 1991. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.9828&rep=rep1&type=pdf, doi:10.1002/(SICI)1097-4571(199101)42:1%3C7::AID-ASI2%3E3.0.CO;2-P.

HL78

Francis C. Harris and Benjamin B. Lahey. A method for combining occurrence and nonoccurrence interobserver agreement scores. Journal of Applied Behavior Analysis, 11(4):523–527, 1978. doi:10.1901/jaba.1978.11-523.

Has14

Ahmad Basheer Hassanat. Dimensionality invariant similarity measure. Journal of American Science, 10(8):221–226, 2014. URL: https://arxiv.org/abs/1409.0923.

HD73

Robert P. Hawkins and Victor A. Dotson. Reliability scores that delude: an Alice in Wonderland trip through the misleading characteristics of inter-observer agreement scores in interval recording. Technical Report, Western Michigan University, 1973. URL: https://eric.ed.gov/?id=ED094277.

Hel09

Ernst Hellinger. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal Für Die Reine Und Angewandte Mathematik, 1909(136):210–271, 1909. doi:10.1515/crll.1909.136.210.

HH77

Robert A. Henderson and Malcolm L. Heron. A probabilistic method of paleobiogeographic analysis. Lethaia, 10(1):1–15, 1977. doi:10.1111/j.1502-3931.1977.tb00584.x.

Hen76

Louis Henry. Projet de transcription phonétique des noms de famille. Annales de Démographie Historique, 1976:201–214, 1976. URL: https://www.persee.fr/doc/adh_0066-2062_1976_num_1976_1_1313.

HBD76

Theodore Hershberg, Alan Burstein, and Robert Dockhorn. Record linkage. Historical Methods Newsletter, 9(2–3):137–163, 1976. doi:10.1080/00182494.1976.10112639.

HBD79

Theodore Hershberg, Alan Burstein, and Robert Dockhorn. Verkettung von Daten: Record Linkage am Beispiel des Philadelphia Social History Project. In Wilhelm Heinz Schröder, editor, Moderne Stadtgeschichte, volume 8, pages 35–73. Klett-Cotta, 1979. URL: https://www.ssoar.info/ssoar/handle/document/32782.

HM02

David Holmes and M. Catherine McCabe. Improving precision and recall for soundex retrieval. In Proceedings. International Conference on Information Technology: Coding and Computing, 22–26. April 2002. URL: https://ieeexplore.ieee.org/document/1000354/, doi:10.1109/ITCC.2002.1000354.

Hoo02

David Hood. Caverphone: phonetic matching algorithm. Technical Report CTP060902, University of Otago, Dunedin, New Zealand, September 2002. URL: https://caversham.otago.ac.nz/files/working/ctp060902.pdf.

Hoo04

David Hood. Caverphone revisited. Technical Report CTP150804, University of Otago, Dunedin, New Zealand, December 2004. URL: https://caversham.otago.ac.nz/files/working/ctp150804.pdf.

Hor66

Henry S. Horn. Measurement of "overlap" in comparative ecological studies. The American Naturalist, 100(914):419–424, September 1966. doi:10.2307/2459242.

Hubalek08

Zdenek Hubálek. Coefficients of association and similarity, based on binary (presence-absence) data: an evaluation. Biological Reviews, 57(4):669–689, February 2008. doi:10.1111/j.1469-185X.1982.tb00376.x.

Hur69

Stuart H. Hurlbert. A coefficient of interspecific association. Ecology, 50(1):1–9, January 1969. doi:10.2307/1934657.

Jac01

Paul Jaccard. Distribution de la flore alpine dans le bassin des dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, 37:241–272, 1901. URL: https://core.ac.uk/download/pdf/20654241.pdf.

Jar89

Matthew A. Jaro. Advances in record linkage methodology as applied to the 1985 census of Tampa, Florida. Journal of the American Statistical Association, 84(406):414–420, 1989. doi:10.1080/01621459.1989.10478785.

JS05

Marie-Claire Jenkins and Dan Smith. Conservative stemming for search and indexing. Technical Report, University of East Anglia, Norwich, UK, 2005. URL: http://lemur.cmp.uea.ac.uk/Research/stemmer/stemmer25feb.pdf.

JBG13

Sergio Jimenez, Claudio Becerra, and Alexander Gelbukh. SOFTCARDINALITY-CORE: improving text overlap with distributional measures for semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task, 194–201. Atlanta, GA, June 2013. Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/S13-1028.

Joh67

Stephen C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, September 1967. doi:10.1007/BF02289588.

JH05

James A. Jones and Mary Jean Harrold. Empirical evaluation of the tarantula automatic fault-localization technique. In ASE '05 Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering, 273–282. New York, November 2005. ACM. doi:10.1145/1101908.1101949.

Kem05

Sebastian Kempken. Bewertung historischer und regionaler Schreibvarianten mit Hilfe von Abstandsmaßen. Master's thesis, Universität Duisburg-Essen, December 2005. URL: https://duepublico.uni-duisburg-essen.de/servlets/DerivateServlet/Derivate-17252/BewertungSchreibvarianten.pdf.

Ken38

Maurice G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, June 1938. doi:10.2307/2332226.

KF77

Ronald N. Kent and Sharon L. Foster. Direct observational procedure: methodological issues in naturalistic settings. In Anthony R. Ciminero, Karen S. Calhoun, and Henry E. Adams, editors, Handbook of Behavioral Assessment, chapter 9, pages 279–328. John Wiley & Sons, New York, 1977. URL: https://archive.org/details/handbookofbehavi00cimi.

Knu98

Donald E. Knuth. The Art of Computer Programming: Volume 3, Sorting and Searching, page 394. Addison-Wesley, 1998.

Kollar

Maroš Kollár. Text::Phonetic::Phonix. URL: https://github.com/maros/Text-Phonetic/blob/master/lib/Text/Phonetic/Phonix.pm.

Kon00

Grzegorz Kondrak. A new algorithm for the alignment of phonetic sequences. In NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference. 2000. doi:10.0000/dl.acm.org/974343.

Kon02

Grzegorz Kondrak. Algorithms for Language Reconstruction. PhD thesis, University of Toronto, 2002. URL: https://webdocs.cs.ualberta.ca/~kondrak/papers/thesis.pdf.

KD03

Grzegorz Kondrak and Bonnie J. Dorr. A similarity-based approach and evaluation methodology for reduction of drug name confusion. Technical Report, University of Maryland, Institute for Advanced Computer Studies, 2003. URL: https://apps.dtic.mil/dtic/tr/fulltext/u2/a452242.pdf.

KV17

Keerthi Koneru and Cihan Varol. Privacy preserving record linkage using MetaSoundex algorithm. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 443–447. December 2017. URL: https://ieeexplore.ieee.org/document/8260671/, doi:10.1109/ICMLA.2017.0-121.

KR37

G. Frederic Kuder and Marion Webster Richardson. The theory of the estimation of test reliability. Psychometrika, 2(3):151–160, September 1937. doi:10.1007/bf02288391.

Kuh95

Michael Kuhn. Metaphone searches. November 1995. URL: http://aspell.net/metaphone/metaphone-kuhn.txt.

Kuh64

John L. Kuhns. The continuum of coefficients of association. In Mary Elizabeth Stevens, Vincent E. Giuliano, and Laurence B. Heilprin, editors, Statistical Association Methods for Mechanized Documentation, number 269 in National Bureau of Standards Miscellaneous Publication, 33–40. 1964.

Kul15

Maciej Kula. Simple minhash implementation in python. June 2015. URL: https://maciejkula.github.io/2015/06/01/simple-minhash-implementation-in-python/.

Kulczynski27

Stanisław Kulczyński. Die Pflanzenassoziationen der Pieninen. Bulletin International de l'Académie Polonaise des Sciences et des Lettres, Classe des Sciences Mathématiques et Naturelles, B (Sciences Naturelles), pages 57–203, 1927.

Koppen70

Wladimir Köppen. Die Aufeinanderfolge der periodischen Witterungserscheinungen nach den Grundsätzen der Wahrscheinlichkeitsrechnung. In Repertorium für Meteorologie, volume 2, pages 189–238. Akademiia Nauk, 1870. URL: https://books.google.com/books?id=1ww0AQAAMAAJ&pg=RA1-PA187#v=onepage&q&f=false.

LR96

Andrew J. Lait and Brian Randell. An assessment of name matching algorithms. Technical Report, University of Newcastle upon Tyne, Newcastle upon Tyne, UK, 1996. URL: http://homepages.cs.ncl.ac.uk/brian.randell/Genealogy/NameMatching.pdf.

LW66

Godfrey N. Lance and William T. Williams. Computer programs for hierarchical polythetic classification ("similarity analysis"). Computer Journal, 1966. doi:10.1093/comjnl/9.1.60.

LW67a

Godfrey N. Lance and William T. Williams. A general theory of classificatory sorting strategies. II. Clustering systems. Computer Journal, 10(3):271–277, January 1967. URL: https://academic.oup.com/comjnl/article-pdf/10/3/271/1333425/100271.pdf, doi:10.1093/comjnl/10.3.271.

LW67b

Godfrey N. Lance and William T. Williams. Mixed-data classificatory programs I. Agglomerative systems. Australian Computer Journal, 1:15–20, 1967.

Lan13

Joerg Lang. Inner workings of the German analyzer in Lucene. November 2013. URL: http://www.evelix.ch/unternehmen/Blog/evelix/2013/11/11/inner-workings-of-the-german-analyzer-in-lucene.

LL98

Pierre Legendre and Louis Legendre. Numerical Ecology. Number 20 in Developments in Environmental Modelling. Elsevier, Amsterdam, 2nd edition, 1998.

Lev65

Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR, 163(4):845–848, 1965. URL: http://mi.mathnet.ru/dan31411.

Lev66

Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, February 1966. URL: https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf.

Lin04

Chin-Yew Lin. ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out. 2004. URL: http://aclweb.org/anthology/W04-1013.

LSShaweTaylor+02

Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2:419–444, 2002. doi:10.1162/153244302760200687.

Lov68

Julie Beth Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1–2):22–31, June 1968. URL: http://www.mt-archive.info/MT-1968-Lovins.pdf.

LA77

Billy T. Lynch and William L. Arends. Selection of a surname coding procedure for the SRS record linkage system. Technical Report, Statistical Reporting Service, US Department of Agriculture, Washington, D.C., February 1977. URL: https://naldc.nal.usda.gov/download/27833/PDF.

LegareLC72

Jacques Légaré, Yolande Lavoie, and Hubert Charbonneau. The early Canadian population: problems in automatic record linkage. Canadian Historical Review, 53(4):427–442, December 1972. doi:10.3138/CHR-053-04-03.

Mar15

Daniel Marcelino. SoundexBR: Soundex (phonetic) algorithm for Brazilian Portuguese. July 2015. URL: https://github.com/danielmarcelino/SoundexBR.

Mat75

Brian W. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2):442–451, 1975.

Mat55

Kameo Matusita. Decision rules, based on the distance, for problems of fit, two samples, and estimation. The Annals of Mathematical Statistics, 26(4):631–640, December 1955. doi:10.2307/2236376.

MP68

A. E. Maxwell and A. E. G. Pilliner. Deriving coefficients of reliability and agreement for ratings. The British Journal of Mathematical and Statistical Psychology, 21(1):105–116, May 1968. doi:10.1111/j.2044-8317.1968.tb00401.x.

McC64

Bayard H. McConnaughey. The determination and analysis of plankton communities. Lembaga Penelitian Laut, pages 1–40, 1964.

Mic99

Jörg Michael. Doppelgänger gesucht – ein Programm für die kontextsensitive phonetische Stringumwandlung. c't Magazin für Computer Technik, page 252, 1999. URL: http://www.heise.de/ct/ftp/99/25/252/.

Mic07

Jörg Michael. Phonet.c. August 2007. URL: ftp://ftp.heise.de/pub/ct/listings/phonet.zip.

Mic20

Ellis L. Michael. Marine ecology and the coefficient of association: a plea in behalf of quantitative biology. The Journal of Ecology, 8(1):54–59, 1920. doi:10.2307/2255213.

Min10

Hermann Minkowski. Geometrie der Zahlen. B. G. Teubner, Leipzig, 1910. URL: https://archive.org/stream/geometriederzahl00minkrich.

Mok97

Gary Mokotoff. Soundexing and genealogy. 1997. URL: http://www.avotaynu.com/soundex.htm.

ME96

Alvaro E. Monge and Charles P. Elkan. The field matching problem: algorithms and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, 267–270. AAAI Press, 1996. URL: http://dl.acm.org/citation.cfm?id=3001460.3001516.

MKTM77

Gwendolyn B. Moore, John L. Kuhns, Jeffrey L. Trefftzs, and Christine A. Montgomery. Accessing Individual Records from Personal Data Files Using Non-Unique Identifiers. Number 500-2 in Special Publication. National Bureau of Standards, Washington, D.C., February 1977. URL: https://archive.org/details/accessingindivid00moor.

MYCappe08

Erwan Moreau, François Yvon, and Olivier Cappé. Robust similarity measures for named entities matching. In COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, 593–600. August 2008.

Mor59

Masaaki Morisita. Measuring of interspecific association and similarity between communities. In Memoirs of the Faculty of Science, volume 3 of Series E (Biology), pages 65–80. Kyushu University, 1959.

Mor12

James F. Morris. A Quantitative Method for Vetting "Dark Network" Intelligence Sources for Social Network Analysis. PhD thesis, Air Force Institute of Technology, 2012. URL: https://apps.dtic.mil/dtic/tr/fulltext/u2/a561702.pdf.

MLM12

Alejandro Mosquera, Elena Lloret, and Paloma Moreda. Towards facilitating the accessibility of web 2.0 texts through text normalisation. In Proceedings of the LREC workshop: Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey, 9–14. 2012. URL: http://www.taln.upf.edu/pages/nlp4ita/pdfs/mosquera-nlp4ita2012.pdf.

MDobrzanskiZ50

J. Motyka, B. Dobrzański, and S. Zawadzki. Wstępne badania nad łąkami południowo-wschodniej Lubelszczyzny (preliminary studies on meadows in the south-east of the province of Lublin). Annales Universitatis Mariae Curie-Skłodowska, Sectio E, 5(13):367–447, 1950.

Mou62

M. D. Mountford. An index of similarity and its application to classificatory problems. In P. W. Murphy, editor, Progress in Soil Zoology: Papers from a Colloquium on Research Methods Organized by the Soil Zoology Committee of the International Society of Soil Science, 43–50. London, July 1962. Butterworths. URL: https://openlibrary.org/books/OL5908681M/Progress_in_soil_zoology.

Moz36

Alan Mozley. The statistical analysis of the distribution of pond molluscs in western Canada. The American Naturalist, 1936. doi:10.1086/280660.

NMM11

Rashid Naseem, Onaiza Maqbool, and Siraj Muhammad. Improved similarity measures for software clustering. In Proceedings of the Euromicro Conference on Software Maintenance and Reengineering, CSMR. March 2011. doi:10.1109/CSMR.2011.9.

NW70

Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970. URL: http://www.sciencedirect.com/science/article/pii/0022283670900574, doi:10.1016/0022-2836(70)90057-4.

Och57

Akira Ochiai. Zoogeographical studies on the soleoid fishes found in Japan and its neighbouring regions-II. Bulletin of the Japanese Society of Scientific Fisheries, 22(9):526–530, 1957. URL: https://www.jstage.jst.go.jp/article/suisan1932/22/9/22_9_526/_pdf/-char/en, doi:10.2331/suisan.22.526.

oC13

Library of Congress. Classification and Shelflisting Manual. Library of Congress, 2013. URL: https://www.loc.gov/aba/publications/FreeCSM/freecsm.html.

Ope12

OpenRefine. Clustering in depth. 2012. URL: https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth.

Orloci67

Laszlo Orlóci. An agglomerative method for classification of plant communities. The Journal of Ecology, 55(1):193–206, March 1967. doi:10.2307/2257725.

Ots36

Yanosuke Otsuka. The faunal character of the Japanese Pleistocene marine mollusca, as evidence of the climate having become colder during the Pleistocene in Japan. Bulletin of the Biogeographical Society of Japan, 6(16):165–170, 1936.

Ozb15

Hakan Ozbay. Ozbay metric. 2015. URL: https://github.com/hakanozbay/ozbay-metric.

Pai90

Chris D. Paice. Another stemmer. In ACM SIGIR Forum, volume 24, 56–61. Fall 1990. URL: https://dl.acm.org/citation.cfm?id=101310, doi:10.1145/101306.101310.

PRWZ02

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, 311–318. 2002. URL: https://www.aclweb.org/anthology/P02-1040.pdf.

PK14

Vimal P. Parmar and C. K. Kumbharana. Study existing various phonetic algorithms and designing and development of a working model for the new developed algorithm and comparison by implementing it with existing algorithm(s). International Journal of Computer Applications, 98(19):45–49, 2014. doi:10.5120/17295-7795.

Pas06

Rebecca Passonneau. Measuring agreement on set-valued items (MASI) for semantic and pragmatic annotation. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06), 831–836. May 2006.

Pea00

Karl Pearson. Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society, 195 A:1–47, 1900. doi:10.1098/rsta.1900.0022.

PH13

Karl Pearson and David Heron. On theories of association. Biometrika, 9(1/2):159–315, 1913. doi:10.2307/2331805.

Pec10

Pavel Pecina. Lexical association measures and collocation extraction. Language Resources & Evaluation, 44(1/2):137–158, 2010. doi:10.2307/40666353.

Pei84

Charles S. Peirce. The numerical measure of the success of predictions. Science, 4(93):453–454, 1884. doi:10.1126/science.ns-4.93.453-a.

Pen52

Lionel S. Penrose. Distance, size and shape. Annals of Eugenics, 17(1):337–343, January 1952. doi:10.1111/j.1469-1809.1952.tb02527.x.

Pfe00

Ulrich Pfeifer. Wait 1.8 - soundex.c. 2000. URL: https://fastapi.metacpan.org/source/ULPFR/WAIT-1.800/soundex.c.

Phi90a

Lawrence Philips. Hanging on the metaphone. Computer Language, 7(12):39–44, December 1990.

Phi90b

Lawrence Philips. Metaphone. December 1990. URL: http://aspell.net/metaphone/metaphone.basic.

Phi00

Lawrence Philips. The double metaphone search algorithm. C/C++ Users Journal, 18(6):38–43, June 2000.

Pli18

Guillaume Plique. Talisman. 2018. URL: https://github.com/Yomguithereal/talisman.

PZ84

Joseph J. Pollock and Antonio Zamora. Automatic spelling correction in scientific and scholarly text. Communications of the ACM, 27(4):358–368, April 1984. URL: http://dl.acm.org/citation.cfm?id=358048, doi:10.1145/358027.358048.

Por80

Martin F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, July 1980. URL: http://snowball.tartarus.org/algorithms/porter/stemmer.html, doi:10.1108/eb046814.

Por02

Martin F. Porter. The English (Porter2) stemming algorithm. September 2002. URL: http://snowball.tartarus.org/algorithms/english/stemmer.html.

Pos69

Hans Joachim Postel. Die Kölner Phonetik: ein Verfahren zur Identifizierung von Personennamen auf der Grundlage der Gestaltanalyse. IBM-Nachrichten, 19:925–931, 1969.

Pra15

Jörg Prante. Elasticsearch – HaasePhonetik.java. 2015. URL: https://github.com/elastic/elasticsearch/blob/master/plugins/analysis-phonetic/src/main/java/org/elasticsearch/index/analysis/phonetic/HaasePhonetik.java.

Rruvzivcka58

M. Růžička. Anwendung mathematisch-statistischer Methoden in der Geobotanik (synthetische Bearbeitung von Aufnahmen). Biologia, Bratislava, 13:647–661, 1958.

RTS+01

Dragomir Radev, Simone Teufel, Horacio Saggion, Wai Lam, John Blitzer, Arda Çelebi, Hong Qi, Elliott Drabek, and Danyu Liu. Evaluation of text summarization in a cross-lingual information retrieval framework. Technical Report, Johns Hopkins, 2001. URL: https://pdfs.semanticscholar.org/44a1/df62a1c815fc84aa42788283655a38c85550.pdf.

Ran71

William M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, December 1971. doi:10.2307/2284239.

RM88

John W. Ratcliff and David E. Metzener. Pattern matching: the gestalt approach. Dr. Dobb's Journal, 1988. URL: http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970.

RC79

David M. Raup and Rex E. Crick. Measurement of faunal similarity in paleontology. Journal of Paleontology, 53(5):1213–1227, September 1979. doi:10.2307/1304099.

RaissouliLC09

Mustapha Raïssouli, Fatima Leazizi, and Mohamed Chergui. Arithmetic-geometric-harmonic mean of three positive operators. Journal of Inequalities in Pure and Applied Mathematics, 2009. URL: http://www.emis.de/journals/JIPAM/images/014\_08\_JIPAM/014\_08.pdf.

Ree14

Tony Rees. Taxamatch, an algorithm for near ('fuzzy') matching on scientific names in taxonomic databases. PLoS ONE, 9(9):1–27, September 2014. doi:10.1371/journal.pone.0107510.

RB13

Tony Rees and Barbara Boehmer. The MDLD (modified Damerau-Levenshtein distance) algorithm. November 2013. URL: https://confluence.csiro.au/public/taxamatch/the-mdld-modified-damerau-levenshtein-distance-algorithm.

Rep13

Dominic John Repici. Understanding classic Soundex algorithms. 2013. URL: http://creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm#SoundExAndCensus.

RU09

Nicholas Ring and Alexandra L. Uitdenbogerd. Finding 'Lucy in Disguise': the misheard lyric matching problem. In Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, and Tetsuya Sakai, editors, Information Retrieval Technology, 157–167. Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. doi:10.1007/978-3-642-04769-5_14.

Rob86

David W. Roberts. Ordination on the basis of fuzzy set theory. Vegetatio, 66(3):123–131, 1986. doi:10.1007/BF00039905.

RC67

A. H. Robinson and Colin Cherry. Results of a prototype television bandwidth compression scheme. In Proceedings of the IEEE, volume 55, 356–364. IEEE, 1967. doi:10.1109/PROC.1967.5493.

Rob51

W. S. Robinson. A method for chronologically ordering archaeological deposits. American Antiquity, 16(4):293–301, April 1951. doi:10.2307/276978.

RT60

David J. Rogers and Taffee T. Tanimoto. A computer program for classifying plants. Science, 132(3434):1115–1118, October 1960. doi:10.1126/science.132.3434.1115.

RG66

Eugene Rogot and Irving D. Goldberg. A proposed index for measuring agreement in test-retest studies. Journal of Chronic Diseases, 1966. doi:10.1016/0021-9681(66)90032-4.

RY05

Gong Ruibin and Chan Kai Yun. An adaptive model for phonetic string search. In Knowledge-Based Intelligent Information and Engineering Systems, 9th International Conference, KES 2005 Melbourne, Australia, September 14-16, 2005 Proceedings, Part III, volume 3683 of Lecture Notes in Artificial Intelligence, 915–921. 2005.

Ruk18

Dorothea Rukasz. PPRL – privacy preserving record linkage. 2018. URL: https://github.com/cran/PPRL.

RHJF14

Daniel E. Russ, Kwan-Yuet Ho, Calvin A. Johnson, and Melissa C. Friesen. Computer-based coding of occupation codes for epidemiological analysis. In 2014 IEEE 27th International Symposium on Computer-Based Medical Systems, 347–350. 2014. doi:10.1109/CBMS.2014.79.

RR40

Paul F. Russell and T. Ramachandra Rao. On habitat and association of species of anopheline larvae in south-eastern Madras. Journal of the Malaria Institute of India, 3(1):153–178, 1940.

Rus18

Robert C. Russell. Index. 1918. URL: https://patentimages.storage.googleapis.com/31/35/a1/f697a3ab85ced6/US1261167.pdf.

Sav05

Jacques Savoy. IR multilingual resources at UniNE. 2005. URL: http://members.unine.ch/jacques.savoy/clef/.

Schurer07

Kevin Schürer. Creating a nationally representative individual and household sample for Great Britain, 1851 to 1901 - the Victorian Panel Study (VPS). Historical Social Research / Historische Sozialforschung, 32(2):211–331, 2007. doi:10.2307/20762213.

SGRW96

Robyn Schinke, Mark Greengrass, Alexander M. Robertson, and Peter Willett. A stemming algorithm for Latin text databases. Journal of Documentation, 52(2):172–187, 1996. doi:10.1108/eb026966.

SBB04

Rainer Schnell, Tobias Bachteler, and Stefan Bender. A toolbox for record linkage. Austrian Journal of Statistics, 33(1-2):125–133, 2004. URL: https://pdfs.semanticscholar.org/2353/21c24ed0401cd05d7752c2c8a8da5b7a4dc0.pdf.

Sco55

William A. Scott. Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly, 19(3):321–325, 1955. doi:10.1086/266577.

Sei93

Heinz-Jürgen Seiffert. Problem 887. Nieuw Archief voor Wiskunde, 11(4):176, 1993.

Seq18

SequentiX. Distance measures. 2018. URL: https://www.sequentix.de/gelquest/help/distance_measures.htm.

SA10

Boumedyen A. N. Shannaq and Victor V. Alexandrov. Using product similarity for adding business. Global Journal of Computer Science and Technology, 10(12):2–8, October 2010. URL: https://www.sial.iias.spb.su/files/386-386-1-PB.pdf.

SS07

Dana Shapira and James A. Storer. Edit distance with move operations. Journal of Discrete Algorithms, 5(2):380–392, June 2007. doi:10.1016/j.jda.2005.01.010.

Shi93

Guang R. Shi. Multivariate data analysis in palaeoecology and palaeobiogeography—a review. Palaeogeography, Palaeoclimatology, Palaeoecology, 105(3-4):199–234, 1993. doi:10.1016/0031-0182(93)90084-v.

SGGomezAP14

Grigori Sidorov, Alexander Gelbukh, Helena Gómez-Adorno, and David Pinto. Soft similarity and soft cosine measure: similarity of features in vector space model. Computación y Sistemas, 2014. URL: http://www.scielo.org.mx/pdf/cys/v18n3/v18n3a7.pdf, doi:10.13053/CyS-18-3-2043.

Sim49

Edward H. Simpson. Measurement of diversity. Nature, 163:688, April 1949. URL: https://www.nature.com/articles/163688a0, doi:10.1038/163688a0.

Sjoo09

Allan Sjöö. swamiSfinxBis. 2009. URL: http://www.swami.se/download/18.248ad5af12aa8136533800093/swamiSfinxBis.java.txt.

SW81

Temple F. Smith and Michael S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981. URL: http://www.sciencedirect.com/science/article/pii/0022283681900875, doi:10.1016/0022-2836(81)90087-5.

SD02

Chakkrit Snae and Bernard Diaz. An interface for mining genealogical nominal data using the concept of linkage and a hybrid name matching algorithm. Journal of 3D-Forum Society, 16(1):142–147, 2002. URL: https://web.archive.org/web/20050329140715/www.csc.liv.ac.uk/~chakkrit/Publications/hc2001_Journal.pdf.

SM58

Robert R. Sokal and Charles D. Michener. A statistical method for evaluating systematic relationships. The University of Kansas Science Bulletin, 38, part 2(22):1409–1438, March 1958. URL: https://archive.org/details/cbarchive_133648_astatisticalmethodforevaluatin1902.

SS63

Robert R. Sokal and Peter H. A. Sneath. Principles of Numerical Taxonomy. W. H. Freeman and Company, San Francisco, 1963.

Son11

Wayne Song. Typo-distance. 2011. URL: https://github.com/wsong/Typo-Distance.

Sor58

Theodor Sorgenfrei. Molluscan Assemblages from the Marine Middle Miocene of South Jutland and Their Environments. Number 79 in 2. Danmarks Geologiske Undersøgelse, 1–503, 1958.

Sta97

United States. Using the Census Soundex. Number 55 in General Information Leaflet. National Archives and Records Administration, Washington, D.C., 1997. URL: https://hdl.handle.net/2027/pur1.32754067050041.

Sta07

United States. Soundex system: the soundex indexing system. 2007. URL: https://www.archives.gov/research/census/soundex.html.

Ste34

J. F. Steffensen. On certain measures of dependence between statistical variables. Biometrika, 26(1/2):251–255, May 1934. doi:10.2307/2332058.

SLaclavik15

Sam Steingold and Michal Laclavík. An information theoretic metric for multi-class categorization. Technical Report, Magnetic Media Online, 2015. URL: https://github.com/Magnetic/proficiency-metric/blob/master/paper/predeval.pdf.

Ste14

Kevin L. Stern. DamerauLevenshteinAlgorithm.java. 2014. URL: https://github.com/KevinStern/software-and-algorithms/blob/master/src/main/java/blogspot/software_and_algorithms/stern_library/string/DamerauLevenshteinAlgorithm.java.

Sti61

H. Edmund Stiles. The association factor in information retrieval. Journal of the ACM, 8(2):271–279, April 1961. doi:10.1145/321062.321074.

SSK05

Giorgos Stoilos, Giorgos Stamou, and Stefanos Kollias. A string metric for ontology alignment. In ISWC'05 Proceedings of the 4th international conference on The Semantic Web, 624–637. Galway, Ireland, November 2005. doi:10.1007/11574620_45.

Stu53

A. Stuart. The estimation and comparison of strengths of association in contingency tables. Biometrika, 40(1/2):105–110, June 1953. doi:10.2307/2333101.

Szy34

Dezydery Szymkiewicz. Une contribution statistique à la géographie floristique. Acta Societatis Botanicorum Poloniae, 11(3):249–265, 1934. URL: https://pbsociety.org.pl/journals/index.php/asbp/article/download/asbp.1934.012/6710, doi:10.5586/asbp.1934.012.

Sorensen48

Thorvald Sørensen. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Kongelige Danske Videnskabernes Selskab, 5(4):1–34, 1948. URL: http://www.royalacademy.dk/Publications/High/295_S%C3%B8rensen,%20Thorvald.pdf.

Taf70

Robert L. Taft. Name Search Techniques. Special report (New York State Identification and Intelligence System). Bureau of Systems Development, New York State Identification and Intelligence System, 1970.

Tan58

T. T. Tanimoto. An elementary mathematical theory of classification and prediction. Technical Report, IBM, 1958.

Tar60

Kazimierz Tarwid. Szacowanie zbieznosci nisz ekologicznych gatunkow droga oceny prawdopodobienstwa spotykania sie ich w polowach. Ekologia Polska, Seria B, pages 115–130, 1960.

Tic84

Walter F. Tichy. The string-to-string correction problem with block moves. ACM Transactions on Computer Systems, 2(4):309–321, November 1984. doi:10.1145/357401.357404.

Tic

Ticki. Eudex: a blazingly fast phonetic reduction/hashing algorithm. URL: https://github.com/ticki/eudex.

Tic16

Ticki. The eudex algorithm. December 2016. URL: http://ticki.github.io/blog/the-eudex-algorithm/.

Tul97

Rodham E. Tulloss. Assessment of similarity indices for undesirable properties and a new tripartite similarity index based on cost functions. In Mary E. Palm and Ignacio H. Chapela, editors, Mycology in Sustainable Development: Expanding Concepts, Vanishing Borders, pages 122–143. Parkway Publishers, Inc., Boone, NC, 1997.

TCLM88

W. A. Turner, G. Charton, F. Laville, and B. Michelet. Packaging information for peer review: new co-word analysis techniques. In Handbook of Quantitative Studies of Science and Technology. North-Holland, 1988.

Tve77

Amos Tversky. Features of similarity. Psychological Review, 84(4):327–352, 1977. URL: http://www.cogsci.ucsd.edu/~coulson/203/tversky-features.pdf, doi:10.1037/0033-295x.84.4.327.

Ukk92

Esko Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, 92(1):191–211, 1992. doi:10.1016/0304-3975(92)90143-4.

Uph77

William B. Upholt. Estimation of DNA sequence divergence from comparison of restriction endonuclease digests. Nucleic Acids Research, 4(5):1257–1265, January 1977. doi:10.1093/nar/4.5.1257.

VB12

Cihan Varol and Coskun Bayrak. Hybrid matching algorithm for personal names. Journal of Data and Information Quality, 3(4):8:1–8:18, September 2012. doi:10.1145/2348828.2348830.

WF74

Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1):168–173, January 1974. doi:10.1145/321796.321811.

War08

Matthijs J. Warrens. Similarity Coefficients for Binary Data: Properties of Coefficients, Coefficient Matrices, Multi-way Metrics and Multivariate Coefficients. PhD thesis, Universiteit Leiden, Leiden, June 2008. URL: https://openaccess.leidenuniv.nl/bitstream/handle/1887/12987/Full_thesis.pdf.

Whid.

Simon White. How to strike a match. Web, n.d. The oldest version on the Internet Archive was archived in 2004. URL: http://www.catalysoft.com/articles/StrikeAMatch.html.

Whi52

R. H. Whittaker. A study of summer foliage insect communities in the great smoky mountains. Ecological Monographs, 22(1):1–44, January 1952. doi:10.2307/1948527.

Whi82

Robert H. Whittaker. Ordination of Plant Communities. Volume 5 of Handbook of Vegetation Science. Springer Netherlands, 1982.

Wik18

Wikibooks. Algorithm implementation/strings/longest common substring. 2018. URL: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python.

Wil05

Martin Wilz. Aspekte der Kodierung phonetischer Ähnlichkeiten in deutschen Eigennamen. Master's thesis, Universität zu Köln, Köln, 2005. URL: http://ifl.phil-fak.uni-koeln.de/sites/linguistik/Phonetik/import/Phonetik_Files/Allgemeine_Dateien/Martin_Wilz.pdf.

Win90

William E. Winkler. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. Technical Report, U.S. Bureau of the Census, Statistical Research Division, Washington, D.C., 1990. URL: https://files.eric.ed.gov/fulltext/ED325505.pdf.

WMJL94

William E. Winkler, George McLaughlin, Matthew A. Jaro, and Maureen Lynch. Strcmp95.c. January 1994. URL: https://web.archive.org/web/20110629121242/http://www.census.gov/geo/msb/stand/strcmp.c.

Xia13

Hua Xiang. Similarity-based Virtual Screening: Effect of the Choice of Similarity Measure. PhD thesis, The University of Sheffield, 2013. URL: http://etheses.whiterose.ac.uk/5662/1/Thesis_Final.pdf.

YJH+16

Ruiyu Yang, Yuxiang Jiang, Matthew W. Hahn, Elizabeth A. Houseworth, and Predrag Radivojac. New metrics for learning and inference on sets, ontologies, and functions. March 2016. URL: https://arxiv.org/abs/1603.06846v1.

Yat34

Frank Yates. Contingency tables involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society, 1(2):217–235, 1934. doi:10.2307/2983604.

You50

William John Youden. Index for rating diagnostic tests. Cancer, 3(1):32–35, 1950. doi:10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.

YB07

Li Yujian and Liu Bo. A normalized levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1091–1095, 2007. doi:10.1109/TPAMI.2007.1078.

Yul12

G. Udny Yule. On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 1912. doi:10.2307/2340126.

YK68

G. Udny Yule and Maurice G. Kendall. An Introduction to the Theory of Statistics. Griffin, London, 14th edition, 1968.

Zac14

Siderite Zackwehdex. Super fast and accurate string distance algorithm: sift4. 2014. URL: https://siderite.blogspot.com/2014/11/super-fast-and-accurate-string-distance.html.

Zed15

Jesper Zedlitz. Phonet4java phonet.java. 2015. URL: https://github.com/jze/phonet4java/blob/master/src/main/java/de/zedlitz/phonet4java/Phonet.java.

ZD96

Justin Zobel and Philip Dart. Phonetic string matching: lessons from information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '96, 166–172. New York, NY, USA, 1996. ACM. doi:10.1145/243199.243258.

delHigueraMico08

Colin de la Higuera and Luisa Micó. A contextual normalised edit distance. In First International Workshop on Similarity Search and Applications (SISAP 2008). 2008. doi:10.1109/SISAP.2008.17.

delPAngelesBailonM16

María del Pilar Angeles and Noemi Bailón-Miguel. Performance of Spanish encoding functions during record linkage. In DATA ANALYTICS 2016: The Fifth International Conference on Data Analysis, 1–7. 2016. URL: https://core.ac.uk/download/pdf/55855695.pdf#page=14.

delPAngelesEGGM15

María del Pilar Angeles, Adrián Espino-Gamez, and Jonathan Gil-Moncada. Comparison of a modified Spanish phonetic, Soundex, and Phonex coding functions during data matching process. In 2015 International Conference on Informatics, Electronics & Vision (ICIEV), 1–5. June 2015. URL: https://www.researchgate.net/publication/285589803_Comparison_of_a_Modified_Spanish_Phonetic_Soundex_and_Phonex_coding_functions_during_data_matching_process, doi:10.1109/ICIEV.2015.7334028.

JPGTrust91

The J. Paul Getty Trust. Synoname. 1991. URL: http://www.cs.cmu.edu/Groups/AI/areas/nlp/misc/synoname/synoname.zip.

vandMaarel69

Eddy van der Maarel. On the use of ordination model in phytosociology. Vegetatio Acta Geobotanica, 19(1–6):21–46, January 1969.

vonRethS77

Hans-Peter von Reth and Hans-Jörg Schek. Eine Zugriffsmethode für die phonetische Ähnlichkeitssuche. Technical Report 77.03.002, IBM Deutschland GmbH., 1977.