abydos.phonetic package

abydos.phonetic.

The phonetic module implements phonetic algorithms including:

Robert C. Russell’s Index

American Soundex

Refined Soundex

Daitch-Mokotoff Soundex

Kölner Phonetik

NYSIIS

Match Rating Algorithm

Metaphone

Double Metaphone

Caverphone

Alpha Search Inquiry System

Fuzzy Soundex

Phonex

Phonem

Phonix

SfinxBis

phonet

Standardized Phonetic Frequency Code

Statistics Canada

Lein

Roger Root

Oxford Name Compression Algorithm (ONCA)

Eudex phonetic hash

Haase Phonetik

Reth-Schek Phonetik

FONEM

Parmar-Kumbharana

Davidson’s Consonant Code

SoundD

PSHP Soundex/Viewex Coding

an early version of Henry Code

Norphone

Dolby Code

Phonetic Spanish

Spanish Metaphone

MetaSoundex

SoundexBR

NRL English-to-phoneme

Beider-Morse Phonetic Matching

abydos.phonetic.russell_index(word)

Return the Russell Index (integer output) of a word.

This follows Robert C. Russell’s Index algorithm, as described in [Rus18].

Parameters:	word (str) – the word to transform
Returns:	the Russell Index value
Return type:	int

>>> russell_index('Christopher')
3813428
>>> russell_index('Niall')
715
>>> russell_index('Smith')
3614
>>> russell_index('Schmidt')
3614

abydos.phonetic.russell_index_num_to_alpha(num)

Convert the Russell Index integer to an alphabetic string.

This follows Robert C. Russell’s Index algorithm, as described in [Rus18].

Parameters:	num (int) – a Russell Index integer value
Returns:	the Russell Index as an alphabetic string
Return type:	str

>>> russell_index_num_to_alpha(3813428)
'CRACDBR'
>>> russell_index_num_to_alpha(715)
'NAL'
>>> russell_index_num_to_alpha(3614)
'CMAD'
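The digit-to-letter correspondence can be read off the examples above (1 to A, 2 to B, 3 to C, 4 to D, 5 to L, 6 to M, 7 to N, 8 to R). The following is a minimal sketch built only on that inferred table, not the library's implementation. The helper name num_to_alpha_sketch is hypothetical, and dropping digits outside 1-8 is an assumption.

```python
# Digit-to-letter table inferred from the documented examples
# (3813428 -> 'CRACDBR', 715 -> 'NAL', 3614 -> 'CMAD').
_DIGIT_TO_LETTER = str.maketrans("12345678", "ABCDLMNR")


def num_to_alpha_sketch(num):
    """Translate a Russell Index integer into letters (illustrative only)."""
    digits = str(num)
    # Assumption: a digit outside 1-8 carries no letter and is dropped.
    kept = "".join(d for d in digits if d in "12345678")
    return kept.translate(_DIGIT_TO_LETTER)


print(num_to_alpha_sketch(3813428))  # CRACDBR
print(num_to_alpha_sketch(715))      # NAL
```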

abydos.phonetic.russell_index_alpha(word)

Return the Russell Index (alphabetic output) for the word.

This follows Robert C. Russell’s Index algorithm, as described in [Rus18].

Parameters:	word (str) – the word to transform
Returns:	the Russell Index value as an alphabetic string
Return type:	str

>>> russell_index_alpha('Christopher')
'CRACDBR'
>>> russell_index_alpha('Niall')
'NAL'
>>> russell_index_alpha('Smith')
'CMAD'
>>> russell_index_alpha('Schmidt')
'CMAD'
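Because 'Smith' and 'Schmidt' receive the same code, a phonetic code can serve as a blocking key that groups candidate matches. The sketch below is one possible way to do this. The helper group_by_code is hypothetical, and a lookup table of the documented outputs stands in for the encoder so that the block runs without the library installed.

```python
from collections import defaultdict


def group_by_code(names, encode):
    """Group names under the code returned by any encoder callable."""
    groups = defaultdict(list)
    for name in names:
        groups[encode(name)].append(name)
    return dict(groups)


# Stand-in encoder: the alphabetic Russell Index values documented above.
documented = {
    "Christopher": "CRACDBR",
    "Niall": "NAL",
    "Smith": "CMAD",
    "Schmidt": "CMAD",
}

blocks = group_by_code(documented, documented.__getitem__)
print(blocks["CMAD"])  # ['Smith', 'Schmidt']
```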

abydos.phonetic.soundex(word, max_length=4, var='American', reverse=False, zero_pad=True)

Return the Soundex code for a word.

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 4)
	var (str) – the variant of the algorithm to employ (defaults to 'American'):
		'American' follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
		'special' follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
		'Census' follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
	reverse (bool) – reverse the word before computing the selected Soundex (defaults to False); this results in “Reverse Soundex”, which is useful for blocking in cases where the initial elements may be in error
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the Soundex value
Return type:	str

>>> soundex("Christopher")
'C623'
>>> soundex("Niall")
'N400'
>>> soundex('Smith')
'S530'
>>> soundex('Schmidt')
'S530'

>>> soundex('Christopher', max_length=-1)
'C623160000000000000000000000000000000000000000000000000000000000'
>>> soundex('Christopher', max_length=-1, zero_pad=False)
'C62316'

>>> soundex('Christopher', reverse=True)
'R132'

>>> soundex('Ashcroft')
'A261'
>>> soundex('Asicroft')
'A226'
>>> soundex('Ashcroft', var='special')
'A226'
>>> soundex('Asicroft', var='special')
'A226'
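The effect of the h/w rule, of reversal, and of zero padding can be illustrated with a simplified, self-contained Soundex. This is a sketch under stated assumptions rather than the library's code: the name soundex_sketch and the hw_as_vowels flag are hypothetical, max_length is assumed to be a positive integer, and the 'Census' variant is not covered.

```python
# Letter groups of American Soundex; vowels, H, W and Y carry no digit.
_GROUPS = ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R")
_CODES = {ch: str(i) for i, group in enumerate(_GROUPS, start=1) for ch in group}


def soundex_sketch(word, max_length=4, reverse=False, zero_pad=True,
                   hw_as_vowels=False):
    """Simplified Soundex; assumes max_length is a positive integer."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if reverse:
        letters.reverse()
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW" and not hw_as_vowels:
            continue  # H and W are skipped and do not separate equal digits
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit is coded again
    code = code[:max_length]
    return code.ljust(max_length, "0") if zero_pad else code


print(soundex_sketch("Ashcroft"))                     # A261
print(soundex_sketch("Ashcroft", hw_as_vowels=True))  # A226
print(soundex_sketch("Christopher", reverse=True))    # R132
```

With hw_as_vowels=True the sketch reproduces the 'special' results shown above for 'Ashcroft' and 'Asicroft'.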

abydos.phonetic.refined_soundex(word, max_length=-1, zero_pad=False, retain_vowels=False)

Return the Refined Soundex code for a word.

This is Soundex, but with more character classes. It was defined at [Boy98].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to unlimited)
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
	retain_vowels (bool) – retain vowels (as 0) in the resulting code
Returns:	the Refined Soundex value
Return type:	str

>>> refined_soundex('Christopher')
'C393619'
>>> refined_soundex('Niall')
'N87'
>>> refined_soundex('Smith')
'S386'
>>> refined_soundex('Schmidt')
'S386'

abydos.phonetic.dm_soundex(word, max_length=6, zero_pad=True)

Return the Daitch-Mokotoff Soundex code for a word.

Based on Daitch-Mokotoff Soundex [Mok97], this returns values of a word as a set. A collection is necessary since there can be multiple values for a single word.

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 6; must be between 6 and 64)
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the Daitch-Mokotoff Soundex value(s)
Return type:	set

>>> sorted(dm_soundex('Christopher'))
['494379', '594379']
>>> dm_soundex('Niall')
{'680000'}
>>> dm_soundex('Smith')
{'463000'}
>>> dm_soundex('Schmidt')
{'463000'}

>>> sorted(dm_soundex('The quick brown fox', max_length=20,
... zero_pad=False))
['35457976754', '3557976754']
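Since a word can yield several codes, two words are usually treated as matching when their code sets overlap. A minimal sketch of that comparison follows. The helper share_code is hypothetical, and the sets are the documented outputs above, not live calls.

```python
def share_code(codes_a, codes_b):
    """Return True if two collections of codes have a code in common."""
    return not set(codes_a).isdisjoint(codes_b)


# Documented outputs for three of the example names.
christopher = {"494379", "594379"}
smith = {"463000"}
schmidt = {"463000"}

print(share_code(smith, schmidt))      # True
print(share_code(christopher, smith))  # False
```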

abydos.phonetic.fuzzy_soundex(word, max_length=5, zero_pad=True)

Return the Fuzzy Soundex code for a word.

Fuzzy Soundex is an algorithm derived from Soundex, defined in [HM02].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 5)
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the Fuzzy Soundex value
Return type:	str

>>> fuzzy_soundex('Christopher')
'K6931'
>>> fuzzy_soundex('Niall')
'N4000'
>>> fuzzy_soundex('Smith')
'S5300'

abydos.phonetic.lein(word, max_length=4, zero_pad=True)

Return the Lein code for a word.

This is Lein name coding, described in [MKTM77].

Parameters:	word (str) – the word to transform
	max_length (int) – the maximum length (default 4) of the code to return
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the Lein code
Return type:	str

>>> lein('Christopher')
'C351'
>>> lein('Niall')
'N300'
>>> lein('Smith')
'S210'
>>> lein('Schmidt')
'S521'

abydos.phonetic.phonex(word, max_length=4, zero_pad=True)

Return the Phonex code for a word.

Phonex is an algorithm derived from Soundex, defined in [LR96].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 4)
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the Phonex value
Return type:	str

>>> phonex('Christopher')
'C623'
>>> phonex('Niall')
'N400'
>>> phonex('Schmidt')
'S253'
>>> phonex('Smith')
'S530'

abydos.phonetic.phonix(word, max_length=4, zero_pad=True)

Return the Phonix code for a word.

Phonix is a Soundex-like algorithm defined in [Gad90].

This implementation is based on [Pfe00], [Chr11], and [Kollar].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 4)
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the Phonix value
Return type:	str

>>> phonix('Christopher')
'K683'
>>> phonix('Niall')
'N400'
>>> phonix('Smith')
'S530'
>>> phonix('Schmidt')
'S530'

abydos.phonetic.pshp_soundex_first(fname, max_length=4, german=False)

Calculate the PSHP Soundex/Viewex Coding of a first name.

This coding is based on [HBD76].

Reference was also made to the German version of the same: [HBD79].

A separate function, pshp_soundex_last(), is used for last names.

Parameters:	fname (str) – the first name to encode
	max_length (int) – the length of the code returned (defaults to 4)
	german (bool) – set to True if the name is German (different rules apply)
Returns:	the PSHP Soundex/Viewex Coding
Return type:	str

>>> pshp_soundex_first('Smith')
'S530'
>>> pshp_soundex_first('Waters')
'W352'
>>> pshp_soundex_first('James')
'J700'
>>> pshp_soundex_first('Schmidt')
'S500'
>>> pshp_soundex_first('Ashcroft')
'A220'
>>> pshp_soundex_first('John')
'J500'
>>> pshp_soundex_first('Colin')
'K400'
>>> pshp_soundex_first('Niall')
'N400'
>>> pshp_soundex_first('Sally')
'S400'
>>> pshp_soundex_first('Jane')
'J500'

abydos.phonetic.pshp_soundex_last(lname, max_length=4, german=False)

Calculate the PSHP Soundex/Viewex Coding of a last name.

This coding is based on [HBD76].

Reference was also made to the German version of the same: [HBD79].

A separate function, pshp_soundex_first(), is used for first names.

Parameters:	lname (str) – the last name to encode
	max_length (int) – the length of the code returned (defaults to 4)
	german (bool) – set to True if the name is German (different rules apply)
Returns:	the PSHP Soundex/Viewex Coding
Return type:	str

>>> pshp_soundex_last('Smith')
'S530'
>>> pshp_soundex_last('Waters')
'W350'
>>> pshp_soundex_last('James')
'J500'
>>> pshp_soundex_last('Schmidt')
'S530'
>>> pshp_soundex_last('Ashcroft')
'A225'

abydos.phonetic.nysiis(word, max_length=6, modified=False)

Return the NYSIIS code for a word.

The New York State Identification and Intelligence System algorithm is defined in [Taf70].

The modified version of this algorithm is described in Appendix B of [LA77].

Parameters:	word (str) – the word to transform
	max_length (int) – the maximum length (default 6) of the code to return
	modified (bool) – indicates whether to use USDA modified NYSIIS
Returns:	the NYSIIS value
Return type:	str

>>> nysiis('Christopher')
'CRASTA'
>>> nysiis('Niall')
'NAL'
>>> nysiis('Smith')
'SNAT'
>>> nysiis('Schmidt')
'SNAD'

>>> nysiis('Christopher', max_length=-1)
'CRASTAFAR'

>>> nysiis('Christopher', max_length=8, modified=True)
'CRASTAFA'
>>> nysiis('Niall', max_length=8, modified=True)
'NAL'
>>> nysiis('Smith', max_length=8, modified=True)
'SNAT'
>>> nysiis('Schmidt', max_length=8, modified=True)
'SNAD'

abydos.phonetic.mra(word)

Return the MRA personal numeric identifier (PNI) for a word.

A description of the Western Airlines Surname Match Rating Algorithm can be found on page 18 of [MKTM77].

Parameters:	word (str) – the word to transform
Returns:	the MRA PNI
Return type:	str

>>> mra('Christopher')
'CHRPHR'
>>> mra('Niall')
'NL'
>>> mra('Smith')
'SMTH'
>>> mra('Schmidt')
'SCHMDT'

abydos.phonetic.caverphone(word, version=2)

Return the Caverphone code for a word.

A description of version 1 of the algorithm can be found in [Hoo02].

A description of version 2 of the algorithm can be found in [Hoo04].

Parameters:	word (str) – the word to transform
	version (int) – the version of Caverphone to employ for encoding (defaults to 2)
Returns:	the Caverphone value
Return type:	str

>>> caverphone('Christopher')
'KRSTFA1111'
>>> caverphone('Niall')
'NA11111111'
>>> caverphone('Smith')
'SMT1111111'
>>> caverphone('Schmidt')
'SKMT111111'

>>> caverphone('Christopher', 1)
'KRSTF1'
>>> caverphone('Niall', 1)
'N11111'
>>> caverphone('Smith', 1)
'SMT111'
>>> caverphone('Schmidt', 1)
'SKMT11'

abydos.phonetic.alpha_sis(word, max_length=14)

Return the IBM Alpha Search Inquiry System code for a word.

The Alpha Search Inquiry System code is defined in [IBMCorporation73]. This implementation is based on the description in [MKTM77].

A collection is necessary since there can be multiple values for a single word, and it must be ordered because the first value is the primary coding.

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 14)
Returns:	the Alpha SIS value
Return type:	tuple

>>> alpha_sis('Christopher')
('06401840000000', '07040184000000', '04018400000000')
>>> alpha_sis('Niall')
('02500000000000',)
>>> alpha_sis('Smith')
('03100000000000',)
>>> alpha_sis('Schmidt')
('06310000000000',)
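Because the first element is the primary coding, callers will often separate it from the alternates. The following sketch shows one way to do so. The helper primary_and_alternates is hypothetical, it assumes a non-empty tuple, and the input is the documented output for 'Christopher'.

```python
def primary_and_alternates(codes):
    """Split an ordered code tuple into (primary, alternates)."""
    # Assumption: codes holds at least one element.
    primary, *alternates = codes
    return primary, tuple(alternates)


christopher = ("06401840000000", "07040184000000", "04018400000000")
primary, alternates = primary_and_alternates(christopher)
print(primary)          # 06401840000000
print(len(alternates))  # 2
```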

abydos.phonetic.davidson(lname, fname='.', omit_fname=False)

Return Davidson’s Consonant Code.

This is based on the name compression system described in [Dav62].

[Dol70] identifies this as having been the name compression algorithm used by SABRE.

Parameters:	lname (str) – last name (or word) to be encoded
	fname (str) – first name (optional), of which the first character is included in the code
	omit_fname (bool) – set to True to completely omit the first character of the first name
Returns:	Davidson’s Consonant Code
Return type:	str

>>> davidson('Gough')
'G   .'
>>> davidson('pneuma')
'PNM .'
>>> davidson('knight')
'KNGT.'
>>> davidson('trice')
'TRC .'
>>> davidson('judge')
'JDG .'
>>> davidson('Smith', 'James')
'SMT J'
>>> davidson('Wasserman', 'Tabitha')
'WSRMT'

abydos.phonetic.dolby(word, max_length=-1, keep_vowels=False, vowel_char='*')

Return the Dolby Code of a name.

This follows “A Spelling Equivalent Abbreviation Algorithm For Personal Names” from [Dol70] and [C+69].

Parameters:	word (str) – the word to encode
	max_length (int) – maximum length of the returned Dolby code; a value greater than 0 also activates the fixed-length code mode
	keep_vowels (bool) – if True, retains all vowel markers
	vowel_char (str) – the vowel marker character (defaults to *)
Returns:	the Dolby Code
Return type:	str

>>> dolby('Hansen')
'H*NSN'
>>> dolby('Larsen')
'L*RSN'
>>> dolby('Aagaard')
'*GR'
>>> dolby('Braaten')
'BR*DN'
>>> dolby('Sandvik')
'S*NVK'
>>> dolby('Hansen', max_length=6)
'H*NS*N'
>>> dolby('Larsen', max_length=6)
'L*RS*N'
>>> dolby('Aagaard', max_length=6)
'*G*R  '
>>> dolby('Braaten', max_length=6)
'BR*D*N'
>>> dolby('Sandvik', max_length=6)
'S*NF*K'

>>> dolby('Smith')
'SM*D'
>>> dolby('Waters')
'W*DRS'
>>> dolby('James')
'J*MS'
>>> dolby('Schmidt')
'SM*D'
>>> dolby('Ashcroft')
'*SKRFD'
>>> dolby('Smith', max_length=6)
'SM*D  '
>>> dolby('Waters', max_length=6)
'W*D*RS'
>>> dolby('James', max_length=6)
'J*M*S '
>>> dolby('Schmidt', max_length=6)
'SM*D  '
>>> dolby('Ashcroft', max_length=6)
'*SKRFD'

abydos.phonetic.spfc(word)

Return the Standardized Phonetic Frequency Code (SPFC) of a word.

Standardized Phonetic Frequency Code is roughly Soundex-like. This implementation is based on pages 19-21 of [MKTM77].

Parameters:	word (str) – the word to transform
Returns:	the SPFC value
Return type:	str

>>> spfc('Christopher Smith')
'01160'
>>> spfc('Christopher Schmidt')
'01160'
>>> spfc('Niall Smith')
'01660'
>>> spfc('Niall Schmidt')
'01660'

>>> spfc('L.Smith')
'01960'
>>> spfc('R.Miller')
'65490'

>>> spfc(('L', 'Smith'))
'01960'
>>> spfc(('R', 'Miller'))
'65490'

abydos.phonetic.roger_root(word, max_length=5, zero_pad=True)

Return the Roger Root code for a word.

This is Roger Root name coding, described in [MKTM77].

Parameters:	word (str) – the word to transform
	max_length (int) – the maximum length (default 5) of the code to return
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the Roger Root code
Return type:	str

>>> roger_root('Christopher')
'06401'
>>> roger_root('Niall')
'02500'
>>> roger_root('Smith')
'00310'
>>> roger_root('Schmidt')
'06310'

abydos.phonetic.statistics_canada(word, max_length=4)

Return the Statistics Canada code for a word.

The original description of this algorithm could not be located, and may only have been specified in an unpublished TR. The coding does not appear to be in use by Statistics Canada any longer. In its place, this is an implementation of the “Census modified Statistics Canada name coding procedure”.

The modified version of this algorithm is described in Appendix B of [MKTM77].

Parameters:	word (str) – the word to transform
	max_length (int) – the maximum length (default 4) of the code to return
Returns:	the Statistics Canada name code value
Return type:	str

>>> statistics_canada('Christopher')
'CHRS'
>>> statistics_canada('Niall')
'NL'
>>> statistics_canada('Smith')
'SMTH'
>>> statistics_canada('Schmidt')
'SCHM'

abydos.phonetic.sound_d(word, max_length=4)

Return the SoundD code.

SoundD is defined in [VB12].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 4)
Returns:	the SoundD code
Return type:	str

>>> sound_d('Gough')
'2000'
>>> sound_d('pneuma')
'5500'
>>> sound_d('knight')
'5300'
>>> sound_d('trice')
'3620'
>>> sound_d('judge')
'2200'

abydos.phonetic.parmar_kumbharana(word)

Return the Parmar-Kumbharana encoding of a word.

This is based on the phonetic algorithm proposed in [PK14].

Parameters:	word (str) – the word to transform
Returns:	the Parmar-Kumbharana encoding
Return type:	str

>>> parmar_kumbharana('Gough')
'GF'
>>> parmar_kumbharana('pneuma')
'NM'
>>> parmar_kumbharana('knight')
'NT'
>>> parmar_kumbharana('trice')
'TRS'
>>> parmar_kumbharana('judge')
'JJ'

abydos.phonetic.metaphone(word, max_length=-1)

Return the Metaphone code for a word.

Based on Lawrence Philips’ Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].

Parameters:	word (str) – the word to transform
	max_length (int) – the maximum length of the returned Metaphone code (defaults to -1, which is treated as 64; in Philips’ original implementation this was 4)
Returns:	the Metaphone value
Return type:	str

>>> metaphone('Christopher')
'KRSTFR'
>>> metaphone('Niall')
'NL'
>>> metaphone('Smith')
'SM0'
>>> metaphone('Schmidt')
'SKMTT'

abydos.phonetic.double_metaphone(word, max_length=-1)

Return the Double Metaphone code for a word.

Based on Lawrence Philips’ (Visual) C++ code from 1999 [Phi00].

Parameters:	word (str) – the word to transform
	max_length (int) – the maximum length of the returned Double Metaphone codes (defaults to -1, which is treated as 64; in Philips’ original implementation this was 4)
Returns:	the Double Metaphone value(s)
Return type:	tuple

>>> double_metaphone('Christopher')
('KRSTFR', '')
>>> double_metaphone('Niall')
('NL', '')
>>> double_metaphone('Smith')
('SM0', 'XMT')
>>> double_metaphone('Schmidt')
('XMT', 'SMT')
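The result is a (primary, secondary) pair in which the secondary code may be empty. A common way to compare two words is to test whether any non-empty code is shared. The sketch below assumes that convention; the helper dm_match is hypothetical and the tuples are the documented outputs above.

```python
def dm_match(codes_a, codes_b):
    """Return True if two code pairs share any non-empty code."""
    # Empty strings mean "no secondary code" and must not count as a match.
    keys_a = {code for code in codes_a if code}
    keys_b = {code for code in codes_b if code}
    return bool(keys_a & keys_b)


smith = ("SM0", "XMT")
schmidt = ("XMT", "SMT")
christopher = ("KRSTFR", "")
niall = ("NL", "")

print(dm_match(smith, schmidt))      # True
print(dm_match(christopher, niall))  # False
```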

abydos.phonetic.eudex(word, max_length=8)

Return the eudex phonetic hash of a word.

This implementation of eudex phonetic hashing is based on the specification (not the reference implementation) at [Tic].

Further details can be found at [Tic16].

Parameters:	word (str) – the word to transform
	max_length (int) – the length in bytes of the code returned (default 8, giving a 64-bit hash)
Returns:	the eudex hash
Return type:	int

>>> eudex('Colin')
432345564238053650
>>> eudex('Christopher')
433648490138894409
>>> eudex('Niall')
648518346341351840
>>> eudex('Smith')
720575940412906756
>>> eudex('Schmidt')
720589151732307997
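Integer hashes of this kind are typically compared bit by bit. As a rough sketch, the number of differing bits (the Hamming distance) can serve as a dissimilarity score. Whether an unweighted bit count is the most suitable measure for eudex is an assumption here, and hamming_distance is a hypothetical helper, not part of the module.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Documented hashes from the examples above.
smith = 720575940412906756
schmidt = 720589151732307997
niall = 648518346341351840

print(hamming_distance(smith, schmidt))
print(hamming_distance(smith, niall))
```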

abydos.phonetic.bmpm(word, language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False)

Return the Beider-Morse Phonetic Matching encoding(s) of a term.

The Beider-Morse Phonetic Matching algorithm is described in [BM08]. The reference implementation is licensed under GPLv3.

Parameters:	word (str) – the word to transform
	language_arg (str) – the language of the term; supported values include: 'any', 'arabic', 'cyrillic', 'czech', 'dutch', 'english', 'french', 'german', 'greek', 'greeklatin', 'hebrew', 'hungarian', 'italian', 'latvian', 'polish', 'portuguese', 'romanian', 'russian', 'spanish', 'turkish'
	name_mode (str) – the name mode of the algorithm:
		'gen' – general (default)
		'ash' – Ashkenazi
		'sep' – Sephardic
	match_mode (str) – matching mode: 'approx' or 'exact'
	concat (bool) – concatenation mode
	filter_langs (bool) – filter out incompatible languages
Returns:	the BMPM value(s), as a single space-separated string
Return type:	str

>>> bmpm('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir xristopi xritopir xritopi xristofi xritofir xritofi tzristopir
tzristofir zristopir zristopi zritopir zritopi zristofir zristofi zritofir
zritofi'
>>> bmpm('Niall')
'nial niol'
>>> bmpm('Smith')
'zmit'
>>> bmpm('Schmidt')
'zmit stzmit'

>>> bmpm('Christopher', language_arg='German')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir'
>>> bmpm('Christopher', language_arg='English')
'tzristofir tzrQstofir tzristafir tzrQstafir xristofir xrQstofir xristafir
xrQstafir'
>>> bmpm('Christopher', language_arg='German', name_mode='ash')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir'

>>> bmpm('Christopher', language_arg='German', match_mode='exact')
'xriStopher xriStofer xristopher xristofer'
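As the examples show, the alternatives come back as one space-separated string. To compare two terms, the string can be split into a set and intersected. This is a minimal sketch: bmpm_overlap is a hypothetical helper and the inputs are the documented outputs above.

```python
def bmpm_overlap(encoded_a, encoded_b):
    """Return the encodings common to two space-separated BMPM results."""
    return set(encoded_a.split()) & set(encoded_b.split())


smith = "zmit"
schmidt = "zmit stzmit"
niall = "nial niol"

print(bmpm_overlap(smith, schmidt))  # {'zmit'}
print(bmpm_overlap(smith, niall))    # set()
```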

abydos.phonetic.nrl(word)

Return the Naval Research Laboratory phonetic encoding of a word.

This is defined by [EJMS76].

Parameters:	word (str) – the word to transform
Returns:	the NRL phonetic encoding
Return type:	str

>>> nrl('the')
'DHAX'
>>> nrl('round')
'rAWnd'
>>> nrl('quick')
'kwIHk'
>>> nrl('eaten')
'IYtEHn'
>>> nrl('Smith')
'smIHTH'
>>> nrl('Larsen')
'lAArsEHn'

abydos.phonetic.metasoundex(word, lang='en')

Return the MetaSoundex code for a word.

This is based on [KV17]. Only English (‘en’) and Spanish (‘es’) languages are supported, as in the original.

Parameters:	word (str) – the word to transform
	lang (str) – either ‘en’ for English or ‘es’ for Spanish
Returns:	the MetaSoundex code
Return type:	str

>>> metasoundex('Smith')
'4500'
>>> metasoundex('Waters')
'7362'
>>> metasoundex('James')
'1520'
>>> metasoundex('Schmidt')
'4530'
>>> metasoundex('Ashcroft')
'0261'
>>> metasoundex('Perez', lang='es')
'094'
>>> metasoundex('Martinez', lang='es')
'69364'
>>> metasoundex('Gutierrez', lang='es')
'83994'
>>> metasoundex('Santiago', lang='es')
'4638'
>>> metasoundex('Nicolás', lang='es')
'6754'

abydos.phonetic.onca(word, max_length=4, zero_pad=True)

Return the Oxford Name Compression Algorithm (ONCA) code for a word.

This is the Oxford Name Compression Algorithm, based on [Gil97].

I can find no complete description of the “anglicised version of the NYSIIS method” identified as the first step in this algorithm, so this is likely not a precisely correct implementation, in that it employs the standard NYSIIS algorithm.

Parameters:	word (str) – the word to transform
	max_length (int) – the maximum length (default 4) of the code to return
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the ONCA code
Return type:	str

>>> onca('Christopher')
'C623'
>>> onca('Niall')
'N400'
>>> onca('Smith')
'S530'
>>> onca('Schmidt')
'S530'

abydos.phonetic.fonem(word)

Return the FONEM code of a word.

FONEM is a phonetic algorithm designed for French (particularly surnames in Saguenay, Canada), defined in [BBL81].

Guillaume Plique’s JavaScript implementation [Pli18] at https://github.com/Yomguithereal/talisman/blob/master/src/phonetics/french/fonem.js was also consulted for this implementation.

Parameters:	word (str) – the word to transform
Returns:	the FONEM code
Return type:	str

>>> fonem('Marchand')
'MARCHEN'
>>> fonem('Beaulieu')
'BOLIEU'
>>> fonem('Beaumont')
'BOMON'
>>> fonem('Legrand')
'LEGREN'
>>> fonem('Pelletier')
'PELETIER'

abydos.phonetic.henry_early(word, max_length=3)

Calculate the early version of the Henry code for a word.

The early version of Henry coding is given in [LegareLC72]. This is different from the later version defined in [Hen76].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 3)
Returns:	the early Henry code
Return type:	str

>>> henry_early('Marchand')
'MRC'
>>> henry_early('Beaulieu')
'BL'
>>> henry_early('Beaumont')
'BM'
>>> henry_early('Legrand')
'LGR'
>>> henry_early('Pelletier')
'PLT'

abydos.phonetic.koelner_phonetik(word)

Return the Kölner Phonetik (numeric output) code for a word.

Based on the algorithm defined by [Pos69].

While the output code is numeric, it is still a str because 0s can lead the code.

Parameters:	word (str) – the word to transform
Returns:	the Kölner Phonetik value as a numeric string
Return type:	str

>>> koelner_phonetik('Christopher')
'478237'
>>> koelner_phonetik('Niall')
'65'
>>> koelner_phonetik('Smith')
'862'
>>> koelner_phonetik('Schmidt')
'862'
>>> koelner_phonetik('Müller')
'657'
>>> koelner_phonetik('Zimmermann')
'86766'
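The reason for returning a str rather than an int can be shown with a short round trip: converting a code with a leading 0 to an integer and back loses that digit. The code '0657' below is a hypothetical value chosen for illustration, not a documented output, and survives_int_round_trip is a hypothetical helper.

```python
def survives_int_round_trip(code):
    """Return True if a numeric code is unchanged after str -> int -> str."""
    return str(int(code)) == code


print(survives_int_round_trip("657"))   # True
print(survives_int_round_trip("0657"))  # False: the leading 0 is lost
```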

abydos.phonetic.koelner_phonetik_num_to_alpha(num)

Convert a Kölner Phonetik code from numeric to alphabetic.

Parameters:	num (str) – a numeric Kölner Phonetik representation (can be a str or an int)
Returns:	an alphabetic representation of the same word
Return type:	str

>>> koelner_phonetik_num_to_alpha('862')
'SNT'
>>> koelner_phonetik_num_to_alpha('657')
'NLR'
>>> koelner_phonetik_num_to_alpha('86766')
'SNRNN'

abydos.phonetic.koelner_phonetik_alpha(word)

Return the Kölner Phonetik (alphabetic output) code for a word.

Parameters:	word (str) – the word to transform
Returns:	the Kölner Phonetik value as an alphabetic string
Return type:	str

>>> koelner_phonetik_alpha('Smith')
'SNT'
>>> koelner_phonetik_alpha('Schmidt')
'SNT'
>>> koelner_phonetik_alpha('Müller')
'NLR'
>>> koelner_phonetik_alpha('Zimmermann')
'SNRNN'

abydos.phonetic.haase_phonetik(word, primary_only=False)

Return the Haase Phonetik (numeric output) code for a word.

Based on the algorithm described at [Pra15].

Based on the original [HH00].

While the output code is numeric, it is nevertheless a str.

Parameters:	word (str) – the word to transform
	primary_only (bool) – if True, only the primary code is returned
Returns:	the Haase Phonetik code(s), each as a numeric string
Return type:	tuple

>>> haase_phonetik('Joachim')
('9496',)
>>> haase_phonetik('Christoph')
('4798293', '8798293')
>>> haase_phonetik('Jörg')
('974',)
>>> haase_phonetik('Smith')
('8692',)
>>> haase_phonetik('Schmidt')
('8692', '4692')

abydos.phonetic.reth_schek_phonetik(word)

Return Reth-Schek Phonetik code for a word.

This algorithm is proposed in [vonRethS77].

Since a copy of that document could not be obtained, this implementation is based on what could be gleaned from the implementations published by the German Record Linkage Center (www.record-linkage.de):

Privacy-preserving Record Linkage (PPRL) (in R) [Ruk18]
Merge ToolBox (in Java) [SBB04]

Rules that are unclear:

Should ‘C’ become ‘G’ or ‘Z’? (PPRL has both, ‘Z’ rule blocked)
Should ‘CC’ become ‘G’? (PPRL has blocked ‘CK’ that may be typo)
Should ‘TUI’ -> ‘ZUI’ rule exist? (PPRL has the rule, but I can’t think of a German word with ‘-tui-’ in it.)
Should we really change ‘SCH’ -> ‘CH’ and then ‘CH’ -> ‘SCH’?

Parameters:	word (str) – the word to transform
Returns:	the Reth-Schek Phonetik code
Return type:	str

>>> reth_schek_phonetik('Joachim')
'JOAGHIM'
>>> reth_schek_phonetik('Christoph')
'GHRISDOF'
>>> reth_schek_phonetik('Jörg')
'JOERG'
>>> reth_schek_phonetik('Smith')
'SMID'
>>> reth_schek_phonetik('Schmidt')
'SCHMID'

abydos.phonetic.phonem(word)

Return the Phonem code for a word.

Phonem is defined in [GM88].

This version is based on the Perl implementation documented at [Wil05]. It includes some enhancements presented in the Java port at [dcm4che].

Phonem is intended chiefly for German names/words.

Parameters:	word (str) – the word to transform
Returns:	the Phonem value
Return type:	str

>>> phonem('Christopher')
'CRYSDOVR'
>>> phonem('Niall')
'NYAL'
>>> phonem('Smith')
'SMYD'
>>> phonem('Schmidt')
'CMYD'

abydos.phonetic.phonet(word, mode=1, lang='de')

Return the phonet code for a word.

phonet (“Hannoveraner Phonetik”) was developed by Jörg Michael and documented in [Mic99].

This is a port of Jesper Zedlitz’s code, which is licensed LGPL [Zed15].

That is, in turn, based on Michael’s C code, which is also licensed LGPL [Mic07].

Parameters:	word (str) – the word to transform
	mode (int) – the phonet variant to employ (1 or 2)
	lang (str) – ‘de’ (default) for German, ‘none’ for no language
Returns:	the phonet value
Return type:	str

>>> phonet('Christopher')
'KRISTOFA'
>>> phonet('Niall')
'NIAL'
>>> phonet('Smith')
'SMIT'
>>> phonet('Schmidt')
'SHMIT'

>>> phonet('Christopher', mode=2)
'KRIZTUFA'
>>> phonet('Niall', mode=2)
'NIAL'
>>> phonet('Smith', mode=2)
'ZNIT'
>>> phonet('Schmidt', mode=2)
'ZNIT'

>>> phonet('Christopher', lang='none')
'CHRISTOPHER'
>>> phonet('Niall', lang='none')
'NIAL'
>>> phonet('Smith', lang='none')
'SMITH'
>>> phonet('Schmidt', lang='none')
'SCHMIDT'

abydos.phonetic.soundex_br(word, max_length=4, zero_pad=True)

Return the SoundexBR encoding of a word.

This is based on [Mar15].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 4)
	zero_pad (bool) – pad the end of the return value with 0s to achieve a max_length string
Returns:	the SoundexBR code
Return type:	str

>>> soundex_br('Oliveira')
'O416'
>>> soundex_br('Almeida')
'A453'
>>> soundex_br('Barbosa')
'B612'
>>> soundex_br('Araújo')
'A620'
>>> soundex_br('Gonçalves')
'G524'
>>> soundex_br('Goncalves')
'G524'

abydos.phonetic.phonetic_spanish(word, max_length=-1)

Return the PhoneticSpanish coding of word.

This follows the coding described in [AmonME12] and [delPAngelesEGGM15].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to unlimited)
Returns:	the PhoneticSpanish code
Return type:	str

>>> phonetic_spanish('Perez')
'094'
>>> phonetic_spanish('Martinez')
'69364'
>>> phonetic_spanish('Gutierrez')
'83994'
>>> phonetic_spanish('Santiago')
'4638'
>>> phonetic_spanish('Nicolás')
'6454'

abydos.phonetic.spanish_metaphone(word, max_length=6, modified=False)

Return the Spanish Metaphone of a word.

This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at https://github.com/amsqr/Spanish-Metaphone and discussed in [MLM12].

Modified version based on [delPAngelesBailonM16].

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to 6)
	modified (bool) – set to True to use del Pilar Angeles & Bailón-Miguel’s modified version of the algorithm
Returns:	the Spanish Metaphone code
Return type:	str

>>> spanish_metaphone('Perez')
'PRZ'
>>> spanish_metaphone('Martinez')
'MRTNZ'
>>> spanish_metaphone('Gutierrez')
'GTRRZ'
>>> spanish_metaphone('Santiago')
'SNTG'
>>> spanish_metaphone('Nicolás')
'NKLS'

abydos.phonetic.sfinxbis(word, max_length=-1)

Return the SfinxBis code for a word.

SfinxBis is a Soundex-like algorithm defined in [Axe09].

This implementation follows the reference implementation: [Sjoo09].

SfinxBis is intended chiefly for Swedish names.

Parameters:	word (str) – the word to transform
	max_length (int) – the length of the code returned (defaults to unlimited)
Returns:	the SfinxBis value
Return type:	tuple

>>> sfinxbis('Christopher')
('K68376',)
>>> sfinxbis('Niall')
('N4',)
>>> sfinxbis('Smith')
('S53',)
>>> sfinxbis('Schmidt')
('S53',)

>>> sfinxbis('Johansson')
('J585',)
>>> sfinxbis('Sjöberg')
('#162',)

abydos.phonetic.norphone(word)

Return the Norphone code.

The reference implementation by Lars Marius Garshol is available in [Gar15].

Norphone was designed for Norwegian, but this implementation has been extended to support Swedish vowels as well. This function incorporates the “not implemented” rules from the above file’s rule set.

Parameters:	word (str) – the word to transform
Returns:	the Norphone code
Return type:	str

>>> norphone('Hansen')
'HNSN'
>>> norphone('Larsen')
'LRSN'
>>> norphone('Aagaard')
'ÅKRT'
>>> norphone('Braaten')
'BRTN'
>>> norphone('Sandvik')
'SNVK'