abydos.phonetic package


The phonetic package includes classes for phonetic algorithms, including:

There are also language-specific phonetic algorithms for German:

For French:

For Spanish:

For Swedish:

For Norwegian:

For Brazilian Portuguese:

And there are some hybrid phonetic algorithms that employ multiple underlying phonetic algorithms:

  • Oxford Name Compression Algorithm (ONCA) (ONCA)
  • MetaSoundex (MetaSoundex)

Each class has an encode method that returns the phonetically encoded string. Classes for which encode returns a numeric value generally also have an encode_alpha method that returns an alphabetic version of the phonetic encoding, as demonstrated below:

>>> from abydos.phonetic import RussellIndex
>>> rus = RussellIndex()
>>> rus.encode('Abramson')
128637
>>> rus.encode_alpha('Abramson')
'ABRMCN'

class abydos.phonetic.RussellIndex

Bases: abydos.phonetic._phonetic._Phonetic

Russell Index.

This follows Robert C. Russell's Index algorithm, as described in [Rus18].

encode(word)

Return the Russell Index (integer output) of a word.

Parameters: word (str) -- The word to transform
Returns: The Russell Index value
Return type: int

Examples

>>> pe = RussellIndex()
>>> pe.encode('Christopher')
3813428
>>> pe.encode('Niall')
715
>>> pe.encode('Smith')
3614
>>> pe.encode('Schmidt')
3614
encode_alpha(word)

Return the Russell Index (alphabetic output) for the word.

This follows Robert C. Russell's Index algorithm, as described in [Rus18].

Parameters: word (str) -- The word to transform
Returns: The Russell Index value as an alphabetic string
Return type: str

Examples

>>> pe = RussellIndex()
>>> pe.encode_alpha('Christopher')
'CRACDBR'
>>> pe.encode_alpha('Niall')
'NAL'
>>> pe.encode_alpha('Smith')
'CMAD'
>>> pe.encode_alpha('Schmidt')
'CMAD'
abydos.phonetic.russell_index(word)

Return the Russell Index (integer output) of a word.

This is a wrapper for RussellIndex.encode().

Parameters: word (str) -- The word to transform
Returns: The Russell Index value
Return type: int

Examples

>>> russell_index('Christopher')
3813428
>>> russell_index('Niall')
715
>>> russell_index('Smith')
3614
>>> russell_index('Schmidt')
3614
abydos.phonetic.russell_index_num_to_alpha(num)

Convert the Russell Index integer to an alphabetic string.

This is a wrapper for RussellIndex._to_alpha().

Parameters: num (int) -- A Russell Index integer value
Returns: The Russell Index as an alphabetic string
Return type: str

Examples

>>> russell_index_num_to_alpha(3813428)
'CRACDBR'
>>> russell_index_num_to_alpha(715)
'NAL'
>>> russell_index_num_to_alpha(3614)
'CMAD'
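The digit-to-letter correspondence behind this conversion can be read off the examples above: 1 -> A, 2 -> B, 3 -> C, 4 -> D, 5 -> L, 6 -> M, 7 -> N, 8 -> R. The following is a minimal sketch of that mapping, not the library's own code; the helper name `russell_num_to_alpha_sketch` is hypothetical, and silently dropping any digit outside 1-8 is an assumption.

```python
# Digit-to-letter table inferred from the documented examples
# (3813428 -> 'CRACDBR', 715 -> 'NAL', 3614 -> 'CMAD').
RUSSELL_DIGITS_TO_ALPHA = str.maketrans('12345678', 'ABCDLMNR')


def russell_num_to_alpha_sketch(num):
    """Map each Russell Index digit to its letter (hypothetical helper)."""
    # Assumption: digits outside 1-8 carry no letter and are dropped.
    digits = ''.join(d for d in str(num) if d in '12345678')
    return digits.translate(RUSSELL_DIGITS_TO_ALPHA)


print(russell_num_to_alpha_sketch(3813428))  # CRACDBR
```

Because the mapping is one letter per digit, the alphabetic form always has the same length as the integer form.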
abydos.phonetic.russell_index_alpha(word)

Return the Russell Index (alphabetic output) for the word.

This is a wrapper for RussellIndex.encode_alpha().

Parameters: word (str) -- The word to transform
Returns: The Russell Index value as an alphabetic string
Return type: str

Examples

>>> russell_index_alpha('Christopher')
'CRACDBR'
>>> russell_index_alpha('Niall')
'NAL'
>>> russell_index_alpha('Smith')
'CMAD'
>>> russell_index_alpha('Schmidt')
'CMAD'
class abydos.phonetic.Soundex

Bases: abydos.phonetic._phonetic._Phonetic

Soundex.

Three variants of Soundex are implemented:

  • 'American' follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
  • 'special' follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
  • 'Census' follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
encode(word, max_length=4, var='American', reverse=False, zero_pad=True)

Return the Soundex code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • var (str) --

    The variant of the algorithm to employ (defaults to American):

    • American follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
    • special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
    • Census follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
  • reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False); this results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Soundex value

Return type:

str

Examples

>>> pe = Soundex()
>>> pe.encode("Christopher")
'C623'
>>> pe.encode("Niall")
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
>>> pe.encode('Christopher', max_length=-1)
'C623160000000000000000000000000000000000000000000000000000000000'
>>> pe.encode('Christopher', max_length=-1, zero_pad=False)
'C62316'
>>> pe.encode('Christopher', reverse=True)
'R132'
>>> pe.encode('Ashcroft')
'A261'
>>> pe.encode('Asicroft')
'A226'
>>> pe.encode('Ashcroft', var='special')
'A226'
>>> pe.encode('Asicroft', var='special')
'A226'
abydos.phonetic.soundex(word, max_length=4, var='American', reverse=False, zero_pad=True)

Return the Soundex code for a word.

This is a wrapper for Soundex.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • var (str) --

    The variant of the algorithm to employ (defaults to American):

    • American follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
    • special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
    • Census follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
  • reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False); this results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Soundex value

Return type:

str

Examples

>>> soundex("Christopher")
'C623'
>>> soundex("Niall")
'N400'
>>> soundex('Smith')
'S530'
>>> soundex('Schmidt')
'S530'
>>> soundex('Christopher', max_length=-1)
'C623160000000000000000000000000000000000000000000000000000000000'
>>> soundex('Christopher', max_length=-1, zero_pad=False)
'C62316'
>>> soundex('Christopher', reverse=True)
'R132'
>>> soundex('Ashcroft')
'A261'
>>> soundex('Asicroft')
'A226'
>>> soundex('Ashcroft', var='special')
'A226'
>>> soundex('Asicroft', var='special')
'A226'
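To make the difference between the 'American' and 'special' variants concrete, here is a simplified sketch of the coding loop. It assumes the standard Soundex digit classes (B, F, P, V = 1; C, G, J, K, Q, S, X, Z = 2; D, T = 3; L = 4; M, N = 5; R = 6), which this page does not list. `soundex_sketch` is a hypothetical name; it does not implement the 'Census' variant, and with max_length=-1 it neither truncates nor pads, unlike the library's padded output shown above.

```python
# Standard Soundex digit classes (assumed; not listed on this page).
SOUNDEX_CODES = {}
for letters, digit in (('BFPV', '1'), ('CGJKQSXZ', '2'), ('DT', '3'),
                       ('L', '4'), ('MN', '5'), ('R', '6')):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex_sketch(word, max_length=4, var='American', reverse=False,
                   zero_pad=True):
    """Simplified Soundex for the 'American' and 'special' variants."""
    letters = [c for c in word.upper() if 'A' <= c <= 'Z']
    if reverse:
        letters.reverse()  # "Reverse Soundex": code the word back to front
    if not letters:
        return ''
    first = letters[0]
    digits = []
    # Start from the first letter's code so an identically coded
    # second letter (e.g. the C in "Schmidt") is not coded again.
    prev = SOUNDEX_CODES.get(first, '')
    for letter in letters[1:]:
        if var == 'American' and letter in 'HW':
            continue  # American: H and W are skipped and do not separate codes
        code = SOUNDEX_CODES.get(letter, '')  # vowels, Y (and H, W in 'special') are uncoded
        if code and code != prev:
            digits.append(code)
        prev = code  # an uncoded letter resets prev, so it acts as a separator
    result = first + ''.join(digits)
    if max_length > 0:
        if zero_pad:
            result += '0' * max_length
        result = result[:max_length]
    return result


print(soundex_sketch('Ashcroft'))                 # A261
print(soundex_sketch('Ashcroft', var='special'))  # A226
```

The only difference between the two variants in this sketch is whether H and W are skipped (American) or treated like vowels that separate identically coded consonants ('special').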
class abydos.phonetic.RefinedSoundex

Bases: abydos.phonetic._phonetic._Phonetic

Refined Soundex.

This is Soundex, but with more character classes. It was defined at [Boy98].

encode(word, max_length=-1, zero_pad=False, retain_vowels=False)

Return the Refined Soundex code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to unlimited)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
  • retain_vowels (bool) -- Retain vowels (as 0) in the resulting code
Returns:

The Refined Soundex value

Return type:

str

Examples

>>> pe = RefinedSoundex()
>>> pe.encode('Christopher')
'C393619'
>>> pe.encode('Niall')
'N87'
>>> pe.encode('Smith')
'S386'
>>> pe.encode('Schmidt')
'S386'
abydos.phonetic.refined_soundex(word, max_length=-1, zero_pad=False, retain_vowels=False)

Return the Refined Soundex code for a word.

This is a wrapper for RefinedSoundex.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to unlimited)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
  • retain_vowels (bool) -- Retain vowels (as 0) in the resulting code
Returns:

The Refined Soundex value

Return type:

str

Examples

>>> refined_soundex('Christopher')
'C393619'
>>> refined_soundex('Niall')
'N87'
>>> refined_soundex('Smith')
'S386'
>>> refined_soundex('Schmidt')
'S386'
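A minimal sketch of the idea, assuming the commonly published refined digit table (B, P = 1; F, V = 2; C, K, S = 3; G, J = 4; Q, X, Z = 5; D, T = 6; L = 7; M, N = 8; R = 9; every other letter = 0). As the examples above suggest ('C393619' begins with C's own digit 3), the first letter is kept and also coded. `refined_soundex_sketch` is a hypothetical name and ignores max_length and zero_pad.

```python
# Refined Soundex digit classes (assumed table; not listed on this page).
REFINED_CODES = {}
for letters, digit in (('BP', '1'), ('FV', '2'), ('CKS', '3'), ('GJ', '4'),
                       ('QXZ', '5'), ('DT', '6'), ('L', '7'), ('MN', '8'),
                       ('R', '9')):
    for letter in letters:
        REFINED_CODES[letter] = digit


def refined_soundex_sketch(word, retain_vowels=False):
    """Hypothetical Refined Soundex coder (no max_length/zero_pad handling)."""
    letters = [c for c in word.upper() if 'A' <= c <= 'Z']
    if not letters:
        return ''
    # Code every letter, the first included; unlisted letters become 0.
    codes = [REFINED_CODES.get(c, '0') for c in letters]
    # Collapse runs of the same digit.
    collapsed = [d for i, d in enumerate(codes) if i == 0 or d != codes[i - 1]]
    if not retain_vowels:
        collapsed = [d for d in collapsed if d != '0']
    return letters[0] + ''.join(collapsed)


print(refined_soundex_sketch('Christopher'))  # C393619
```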
class abydos.phonetic.DaitchMokotoff

Bases: abydos.phonetic._phonetic._Phonetic

Daitch-Mokotoff Soundex.

Based on Daitch-Mokotoff Soundex [Mok97], this returns values of a word as a set. A collection is necessary since there can be multiple values for a single word.

encode(word, max_length=6, zero_pad=True)

Return the Daitch-Mokotoff Soundex code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Daitch-Mokotoff Soundex value

Return type:

set

Examples

>>> pe = DaitchMokotoff()
>>> sorted(pe.encode('Christopher'))
['494379', '594379']
>>> pe.encode('Niall')
{'680000'}
>>> pe.encode('Smith')
{'463000'}
>>> pe.encode('Schmidt')
{'463000'}
>>> sorted(pe.encode('The quick brown fox', max_length=20,
... zero_pad=False))
['35457976754', '3557976754']
abydos.phonetic.dm_soundex(word, max_length=6, zero_pad=True)

Return the Daitch-Mokotoff Soundex code for a word.

This is a wrapper for DaitchMokotoff.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Daitch-Mokotoff Soundex value

Return type:

set

Examples

>>> sorted(dm_soundex('Christopher'))
['494379', '594379']
>>> dm_soundex('Niall')
{'680000'}
>>> dm_soundex('Smith')
{'463000'}
>>> dm_soundex('Schmidt')
{'463000'}
>>> sorted(dm_soundex('The quick brown fox', max_length=20,
... zero_pad=False))
['35457976754', '3557976754']
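Because a single word can yield several Daitch-Mokotoff codes, one plausible way to treat two names as a match (an assumption, not a rule stated on this page) is to check whether their code sets share any value. The helper `codes_overlap` is hypothetical, and the code sets are copied from the examples above.

```python
def codes_overlap(codes_a, codes_b):
    """Return True if two collections of codes share at least one value."""
    return not set(codes_a).isdisjoint(codes_b)


# Code sets copied from the examples above.
christopher = {'494379', '594379'}
smith = {'463000'}
schmidt = {'463000'}

print(codes_overlap(smith, schmidt))      # True
print(codes_overlap(christopher, smith))  # False
```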
class abydos.phonetic.FuzzySoundex

Bases: abydos.phonetic._phonetic._Phonetic

Fuzzy Soundex.

Fuzzy Soundex is an algorithm derived from Soundex, defined in [HM02].

encode(word, max_length=5, zero_pad=True)

Return the Fuzzy Soundex code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 5)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Fuzzy Soundex value

Return type:

str

Examples

>>> pe = FuzzySoundex()
>>> pe.encode('Christopher')
'K6931'
>>> pe.encode('Niall')
'N4000'
>>> pe.encode('Smith')
'S5300'
abydos.phonetic.fuzzy_soundex(word, max_length=5, zero_pad=True)

Return the Fuzzy Soundex code for a word.

This is a wrapper for FuzzySoundex.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 5)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Fuzzy Soundex value

Return type:

str

Examples

>>> fuzzy_soundex('Christopher')
'K6931'
>>> fuzzy_soundex('Niall')
'N4000'
>>> fuzzy_soundex('Smith')
'S5300'
class abydos.phonetic.Lein

Bases: abydos.phonetic._phonetic._Phonetic

Lein code.

This is Lein name coding, described in [MKTM77].

encode(word, max_length=4, zero_pad=True)

Return the Lein code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Lein code

Return type:

str

Examples

>>> pe = Lein()
>>> pe.encode('Christopher')
'C351'
>>> pe.encode('Niall')
'N300'
>>> pe.encode('Smith')
'S210'
>>> pe.encode('Schmidt')
'S521'
abydos.phonetic.lein(word, max_length=4, zero_pad=True)

Return the Lein code for a word.

This is a wrapper for Lein.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Lein code

Return type:

str

Examples

>>> lein('Christopher')
'C351'
>>> lein('Niall')
'N300'
>>> lein('Smith')
'S210'
>>> lein('Schmidt')
'S521'
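The Lein rules are not reproduced on this page. The sketch below assumes the following rules, chosen because they reproduce the four examples above: keep the first letter; delete vowels and H, W, Y from the rest; drop consecutive repeated letters; then code D, T = 1; M, N = 2; L, R = 3; B, F, P, V = 4; C, J, K, G, Q, S, X, Z = 5. `lein_sketch` is a hypothetical name.

```python
# Assumed Lein digit classes for the letters that survive the deletion step.
LEIN_CODES = str.maketrans('DTMNLRBFPVCJKGQSXZ', '112233444455555555')


def lein_sketch(word, max_length=4, zero_pad=True):
    """Hypothetical Lein coder following the assumed rules above."""
    letters = ''.join(c for c in word.upper() if 'A' <= c <= 'Z')
    if not letters:
        return ''
    code = letters[0]  # the first letter is kept as a letter
    # Delete vowels and H, W, Y from the remaining letters.
    rest = ''.join(c for c in letters[1:] if c not in 'AEIOUHWY')
    # Drop consecutive repeated letters.
    deduped = ''.join(c for i, c in enumerate(rest) if i == 0 or c != rest[i - 1])
    code += deduped.translate(LEIN_CODES)
    if zero_pad:
        code += '0' * max_length
    return code[:max_length]


print(lein_sketch('Schmidt'))  # S521
```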
class abydos.phonetic.Phonex

Bases: abydos.phonetic._phonetic._Phonetic

Phonex code.

Phonex is an algorithm derived from Soundex, defined in [LR96].

encode(word, max_length=4, zero_pad=True)

Return the Phonex code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Phonex value

Return type:

str

Examples

>>> pe = Phonex()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Schmidt')
'S253'
>>> pe.encode('Smith')
'S530'
abydos.phonetic.phonex(word, max_length=4, zero_pad=True)

Return the Phonex code for a word.

This is a wrapper for Phonex.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Phonex value

Return type:

str

Examples

>>> phonex('Christopher')
'C623'
>>> phonex('Niall')
'N400'
>>> phonex('Schmidt')
'S253'
>>> phonex('Smith')
'S530'
class abydos.phonetic.Phonix

Bases: abydos.phonetic._phonetic._Phonetic

Phonix code.

Phonix is a Soundex-like algorithm defined in [Gad90].

This implementation is based on:

  • [Pfe00]
  • [Chr11]
  • [Kollar]

encode(word, max_length=4, zero_pad=True)

Return the Phonix code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Phonix value

Return type:

str

Examples

>>> pe = Phonix()
>>> pe.encode('Christopher')
'K683'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
abydos.phonetic.phonix(word, max_length=4, zero_pad=True)

Return the Phonix code for a word.

This is a wrapper for Phonix.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Phonix value

Return type:

str

Examples

>>> phonix('Christopher')
'K683'
>>> phonix('Niall')
'N400'
>>> phonix('Smith')
'S530'
>>> phonix('Schmidt')
'S530'
class abydos.phonetic.PSHPSoundexFirst

Bases: abydos.phonetic._phonetic._Phonetic

PSHP Soundex/Viewex Coding of a first name.

This coding is based on [HBD76].

Reference was also made to the German version of the same: [HBD79].

A separate class, PSHPSoundexLast, is used for last names.

encode(fname, max_length=4, german=False)

Calculate the PSHP Soundex/Viewex Coding of a first name.

Parameters:
  • fname (str) -- The first name to encode
  • max_length (int) -- The length of the code returned (defaults to 4)
  • german (bool) -- Set to True if the name is German (different rules apply)
Returns:

The PSHP Soundex/Viewex Coding

Return type:

str

Examples

>>> pe = PSHPSoundexFirst()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W352'
>>> pe.encode('James')
'J700'
>>> pe.encode('Schmidt')
'S500'
>>> pe.encode('Ashcroft')
'A220'
>>> pe.encode('John')
'J500'
>>> pe.encode('Colin')
'K400'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Sally')
'S400'
>>> pe.encode('Jane')
'J500'
abydos.phonetic.pshp_soundex_first(fname, max_length=4, german=False)

Calculate the PSHP Soundex/Viewex Coding of a first name.

This is a wrapper for PSHPSoundexFirst.encode().

Parameters:
  • fname (str) -- The first name to encode
  • max_length (int) -- The length of the code returned (defaults to 4)
  • german (bool) -- Set to True if the name is German (different rules apply)
Returns:

The PSHP Soundex/Viewex Coding

Return type:

str

Examples

>>> pshp_soundex_first('Smith')
'S530'
>>> pshp_soundex_first('Waters')
'W352'
>>> pshp_soundex_first('James')
'J700'
>>> pshp_soundex_first('Schmidt')
'S500'
>>> pshp_soundex_first('Ashcroft')
'A220'
>>> pshp_soundex_first('John')
'J500'
>>> pshp_soundex_first('Colin')
'K400'
>>> pshp_soundex_first('Niall')
'N400'
>>> pshp_soundex_first('Sally')
'S400'
>>> pshp_soundex_first('Jane')
'J500'
class abydos.phonetic.PSHPSoundexLast

Bases: abydos.phonetic._phonetic._Phonetic

PSHP Soundex/Viewex Coding of a last name.

This coding is based on [HBD76].

Reference was also made to the German version of the same: [HBD79].

A separate class, PSHPSoundexFirst, is used for first names.

encode(lname, max_length=4, german=False)

Calculate the PSHP Soundex/Viewex Coding of a last name.

Parameters:
  • lname (str) -- The last name to encode
  • max_length (int) -- The length of the code returned (defaults to 4)
  • german (bool) -- Set to True if the name is German (different rules apply)
Returns:

The PSHP Soundex/Viewex Coding

Return type:

str

Examples

>>> pe = PSHPSoundexLast()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W350'
>>> pe.encode('James')
'J500'
>>> pe.encode('Schmidt')
'S530'
>>> pe.encode('Ashcroft')
'A225'
abydos.phonetic.pshp_soundex_last(lname, max_length=4, german=False)

Calculate the PSHP Soundex/Viewex Coding of a last name.

This is a wrapper for PSHPSoundexLast.encode().

Parameters:
  • lname (str) -- The last name to encode
  • max_length (int) -- The length of the code returned (defaults to 4)
  • german (bool) -- Set to True if the name is German (different rules apply)
Returns:

The PSHP Soundex/Viewex Coding

Return type:

str

Examples

>>> pshp_soundex_last('Smith')
'S530'
>>> pshp_soundex_last('Waters')
'W350'
>>> pshp_soundex_last('James')
'J500'
>>> pshp_soundex_last('Schmidt')
'S530'
>>> pshp_soundex_last('Ashcroft')
'A225'
class abydos.phonetic.NYSIIS

Bases: abydos.phonetic._phonetic._Phonetic

NYSIIS Code.

The New York State Identification and Intelligence System algorithm is defined in [Taf70].

The modified version of this algorithm is described in Appendix B of [LA77].

encode(word, max_length=6, modified=False)

Return the NYSIIS code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 6) of the code to return
  • modified (bool) -- Indicates whether to use USDA modified NYSIIS
Returns:

The NYSIIS value

Return type:

str

Examples

>>> pe = NYSIIS()
>>> pe.encode('Christopher')
'CRASTA'
>>> pe.encode('Niall')
'NAL'
>>> pe.encode('Smith')
'SNAT'
>>> pe.encode('Schmidt')
'SNAD'
>>> pe.encode('Christopher', max_length=-1)
'CRASTAFAR'
>>> pe.encode('Christopher', max_length=8, modified=True)
'CRASTAFA'
>>> pe.encode('Niall', max_length=8, modified=True)
'NAL'
>>> pe.encode('Smith', max_length=8, modified=True)
'SNAT'
>>> pe.encode('Schmidt', max_length=8, modified=True)
'SNAD'
abydos.phonetic.nysiis(word, max_length=6, modified=False)

Return the NYSIIS code for a word.

This is a wrapper for NYSIIS.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 6) of the code to return
  • modified (bool) -- Indicates whether to use USDA modified NYSIIS
Returns:

The NYSIIS value

Return type:

str

Examples

>>> nysiis('Christopher')
'CRASTA'
>>> nysiis('Niall')
'NAL'
>>> nysiis('Smith')
'SNAT'
>>> nysiis('Schmidt')
'SNAD'
>>> nysiis('Christopher', max_length=-1)
'CRASTAFAR'
>>> nysiis('Christopher', max_length=8, modified=True)
'CRASTAFA'
>>> nysiis('Niall', max_length=8, modified=True)
'NAL'
>>> nysiis('Smith', max_length=8, modified=True)
'SNAT'
>>> nysiis('Schmidt', max_length=8, modified=True)
'SNAD'
class abydos.phonetic.MRA

Bases: abydos.phonetic._phonetic._Phonetic

Western Airlines Surname Match Rating Algorithm.

A description of the Western Airlines Surname Match Rating Algorithm can be found on page 18 of [MKTM77].

encode(word)

Return the MRA personal numeric identifier (PNI) for a word.

Parameters: word (str) -- The word to transform
Returns: The MRA PNI
Return type: str

Examples

>>> pe = MRA()
>>> pe.encode('Christopher')
'CHRPHR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHMDT'
abydos.phonetic.mra(word)

Return the MRA personal numeric identifier (PNI) for a word.

This is a wrapper for MRA.encode().

Parameters: word (str) -- The word to transform
Returns: The MRA PNI
Return type: str

Examples

>>> mra('Christopher')
'CHRPHR'
>>> mra('Niall')
'NL'
>>> mra('Smith')
'SMTH'
>>> mra('Schmidt')
'SCHMDT'
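The MRA encoding step can be sketched as follows. The rules are an assumption (this page only cites page 18 of [MKTM77]), chosen because they reproduce the four examples above: delete vowels other than a leading one, collapse doubled letters, and for codes longer than six letters keep only the first three and last three. `mra_sketch` is a hypothetical name, and it covers encoding only, not the match-rating comparison between two codes.

```python
def mra_sketch(word):
    """Hypothetical MRA encoder following the assumed rules above."""
    word = ''.join(c for c in word.upper() if 'A' <= c <= 'Z')
    if not word:
        return ''
    # Delete vowels, except when the word begins with one.
    word = word[0] + ''.join(c for c in word[1:] if c not in 'AEIOU')
    # Collapse doubled letters.
    word = ''.join(c for i, c in enumerate(word) if i == 0 or c != word[i - 1])
    # Long codes keep only the first three and last three letters.
    if len(word) > 6:
        word = word[:3] + word[-3:]
    return word


print(mra_sketch('Christopher'))  # CHRPHR
```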
class abydos.phonetic.Caverphone

Bases: abydos.phonetic._phonetic._Phonetic

Caverphone.

A description of version 1 of the algorithm can be found in [Hoo02].

A description of version 2 of the algorithm can be found in [Hoo04].

encode(word, version=2)

Return the Caverphone code for a word.

Parameters:
  • word (str) -- The word to transform
  • version (int) -- The version of Caverphone to employ for encoding (defaults to 2)
Returns:

The Caverphone value

Return type:

str

Examples

>>> pe = Caverphone()
>>> pe.encode('Christopher')
'KRSTFA1111'
>>> pe.encode('Niall')
'NA11111111'
>>> pe.encode('Smith')
'SMT1111111'
>>> pe.encode('Schmidt')
'SKMT111111'
>>> pe.encode('Christopher', 1)
'KRSTF1'
>>> pe.encode('Niall', 1)
'N11111'
>>> pe.encode('Smith', 1)
'SMT111'
>>> pe.encode('Schmidt', 1)
'SKMT11'
abydos.phonetic.caverphone(word, version=2)

Return the Caverphone code for a word.

This is a wrapper for Caverphone.encode().

Parameters:
  • word (str) -- The word to transform
  • version (int) -- The version of Caverphone to employ for encoding (defaults to 2)
Returns:

The Caverphone value

Return type:

str

Examples

>>> caverphone('Christopher')
'KRSTFA1111'
>>> caverphone('Niall')
'NA11111111'
>>> caverphone('Smith')
'SMT1111111'
>>> caverphone('Schmidt')
'SKMT111111'
>>> caverphone('Christopher', 1)
'KRSTF1'
>>> caverphone('Niall', 1)
'N11111'
>>> caverphone('Smith', 1)
'SMT111'
>>> caverphone('Schmidt', 1)
'SKMT11'
class abydos.phonetic.AlphaSIS

Bases: abydos.phonetic._phonetic._Phonetic

Alpha-SIS.

The Alpha Search Inquiry System code is defined in [IBMCorporation73]. This implementation is based on the description in [MKTM77].

encode(word, max_length=14)

Return the IBM Alpha Search Inquiry System code for a word.

A collection is returned because a single word can have multiple codes, and the collection is ordered (a tuple) because the first value is the primary coding.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 14)
Returns:

The Alpha-SIS value

Return type:

tuple

Examples

>>> pe = AlphaSIS()
>>> pe.encode('Christopher')
('06401840000000', '07040184000000', '04018400000000')
>>> pe.encode('Niall')
('02500000000000',)
>>> pe.encode('Smith')
('03100000000000',)
>>> pe.encode('Schmidt')
('06310000000000',)
abydos.phonetic.alpha_sis(word, max_length=14)

Return the IBM Alpha Search Inquiry System code for a word.

This is a wrapper for AlphaSIS.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 14)
Returns:

The Alpha-SIS value

Return type:

tuple

Examples

>>> alpha_sis('Christopher')
('06401840000000', '07040184000000', '04018400000000')
>>> alpha_sis('Niall')
('02500000000000',)
>>> alpha_sis('Smith')
('03100000000000',)
>>> alpha_sis('Schmidt')
('06310000000000',)
class abydos.phonetic.Davidson

Bases: abydos.phonetic._phonetic._Phonetic

Davidson Consonant Code.

This is based on the name compression system described in [Dav62].

[Dol70] identifies this as having been the name compression algorithm used by SABRE.

encode(lname, fname='.', omit_fname=False)

Return Davidson's Consonant Code.

Parameters:
  • lname (str) -- Last name (or word) to be encoded
  • fname (str) -- First name (optional), of which the first character is included in the code.
  • omit_fname (bool) -- Set to True to completely omit the first character of the first name
Returns:

Davidson's Consonant Code

Return type:

str

Example

>>> pe = Davidson()
>>> pe.encode('Gough')
'G   .'
>>> pe.encode('pneuma')
'PNM .'
>>> pe.encode('knight')
'KNGT.'
>>> pe.encode('trice')
'TRC .'
>>> pe.encode('judge')
'JDG .'
>>> pe.encode('Smith', 'James')
'SMT J'
>>> pe.encode('Wasserman', 'Tabitha')
'WSRMT'
abydos.phonetic.davidson(lname, fname='.', omit_fname=False)

Return Davidson's Consonant Code.

This is a wrapper for Davidson.encode().

Parameters:
  • lname (str) -- Last name (or word) to be encoded
  • fname (str) -- First name (optional), of which the first character is included in the code.
  • omit_fname (bool) -- Set to True to completely omit the first character of the first name
Returns:

Davidson's Consonant Code

Return type:

str

Example

>>> davidson('Gough')
'G   .'
>>> davidson('pneuma')
'PNM .'
>>> davidson('knight')
'KNGT.'
>>> davidson('trice')
'TRC .'
>>> davidson('judge')
'JDG .'
>>> davidson('Smith', 'James')
'SMT J'
>>> davidson('Wasserman', 'Tabitha')
'WSRMT'
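The structure of the code in the examples above (a four-character consonant field padded with spaces, followed by the first character of fname) can be sketched as below. The consonant rules are an assumption chosen to reproduce those examples: keep the first letter, delete vowels and H, W, Y from the rest, then collapse consecutive repeated letters (so 'Gough' gives 'GG', then 'G'). `davidson_sketch` is a hypothetical name.

```python
def davidson_sketch(lname, fname='.', omit_fname=False):
    """Hypothetical Davidson Consonant Code following the assumed rules above."""
    lname = lname.upper()
    # Keep the first letter; drop vowels, H, W, Y and spaces from the rest.
    code = lname[:1] + ''.join(c for c in lname[1:] if c not in 'AEIOUHWY ')
    # Collapse consecutive repeats, including a repeat of the first letter.
    code = ''.join(c for i, c in enumerate(code) if i == 0 or c != code[i - 1])
    # Fixed four-character consonant field, padded with spaces.
    code = code[:4].ljust(4)
    if not omit_fname:
        code += fname[:1].upper()
    return code


print(repr(davidson_sketch('Smith', 'James')))  # 'SMT J'
```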
class abydos.phonetic.Dolby

Bases: abydos.phonetic._phonetic._Phonetic

Dolby Code.

This follows "A Spelling Equivalent Abbreviation Algorithm For Personal Names" from [Dol70] and [C+69].

encode(word, max_length=-1, keep_vowels=False, vowel_char='*')

Return the Dolby Code of a name.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
  • keep_vowels (bool) -- If True, retains all vowel markers
  • vowel_char (str) -- The vowel marker character (defaults to *)
Returns:

The Dolby Code

Return type:

str

Examples

>>> pe = Dolby()
>>> pe.encode('Hansen')
'H*NSN'
>>> pe.encode('Larsen')
'L*RSN'
>>> pe.encode('Aagaard')
'*GR'
>>> pe.encode('Braaten')
'BR*DN'
>>> pe.encode('Sandvik')
'S*NVK'
>>> pe.encode('Hansen', max_length=6)
'H*NS*N'
>>> pe.encode('Larsen', max_length=6)
'L*RS*N'
>>> pe.encode('Aagaard', max_length=6)
'*G*R  '
>>> pe.encode('Braaten', max_length=6)
'BR*D*N'
>>> pe.encode('Sandvik', max_length=6)
'S*NF*K'
>>> pe.encode('Smith')
'SM*D'
>>> pe.encode('Waters')
'W*DRS'
>>> pe.encode('James')
'J*MS'
>>> pe.encode('Schmidt')
'SM*D'
>>> pe.encode('Ashcroft')
'*SKRFD'
>>> pe.encode('Smith', max_length=6)
'SM*D  '
>>> pe.encode('Waters', max_length=6)
'W*D*RS'
>>> pe.encode('James', max_length=6)
'J*M*S '
>>> pe.encode('Schmidt', max_length=6)
'SM*D  '
>>> pe.encode('Ashcroft', max_length=6)
'*SKRFD'
abydos.phonetic.dolby(word, max_length=-1, keep_vowels=False, vowel_char='*')

Return the Dolby Code of a name.

This is a wrapper for Dolby.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
  • keep_vowels (bool) -- If True, retains all vowel markers
  • vowel_char (str) -- The vowel marker character (defaults to *)
Returns:

The Dolby Code

Return type:

str

Examples

>>> dolby('Hansen')
'H*NSN'
>>> dolby('Larsen')
'L*RSN'
>>> dolby('Aagaard')
'*GR'
>>> dolby('Braaten')
'BR*DN'
>>> dolby('Sandvik')
'S*NVK'
>>> dolby('Hansen', max_length=6)
'H*NS*N'
>>> dolby('Larsen', max_length=6)
'L*RS*N'
>>> dolby('Aagaard', max_length=6)
'*G*R  '
>>> dolby('Braaten', max_length=6)
'BR*D*N'
>>> dolby('Sandvik', max_length=6)
'S*NF*K'
>>> dolby('Smith')
'SM*D'
>>> dolby('Waters')
'W*DRS'
>>> dolby('James')
'J*MS'
>>> dolby('Schmidt')
'SM*D'
>>> dolby('Ashcroft')
'*SKRFD'
>>> dolby('Smith', max_length=6)
'SM*D  '
>>> dolby('Waters', max_length=6)
'W*D*RS'
>>> dolby('James', max_length=6)
'J*M*S '
>>> dolby('Schmidt', max_length=6)
'SM*D  '
>>> dolby('Ashcroft', max_length=6)
'*SKRFD'
class abydos.phonetic.SPFC

Bases: abydos.phonetic._phonetic._Phonetic

Standardized Phonetic Frequency Code (SPFC).

Standardized Phonetic Frequency Code is roughly Soundex-like. This implementation is based on pages 19-21 of [MKTM77].

encode(word)

Return the Standardized Phonetic Frequency Code (SPFC) of a word.

Parameters: word (str or tuple) -- The name to transform: a string with a space or period dividing the first and last names, or a tuple/list consisting of the first and last names
Returns: The SPFC value
Return type: str
Raises: AttributeError -- Word attribute must be a string with a space or period dividing the first and last names or a tuple/list consisting of the first and last names

Examples

>>> pe = SPFC()
>>> pe.encode('Christopher Smith')
'01160'
>>> pe.encode('Christopher Schmidt')
'01160'
>>> pe.encode('Niall Smith')
'01660'
>>> pe.encode('Niall Schmidt')
'01660'
>>> pe.encode('L.Smith')
'01960'
>>> pe.encode('R.Miller')
'65490'
>>> pe.encode(('L', 'Smith'))
'01960'
>>> pe.encode(('R', 'Miller'))
'65490'
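abydos.phonetic.spfc(word)

Return the Standardized Phonetic Frequency Code (SPFC) of a word.

This is a wrapper for SPFC.encode().

Parameters: word (str or tuple) -- The name to transform: a string with a space or period dividing the first and last names, or a tuple/list consisting of the first and last names
Returns: The SPFC value
Return type: str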

Examples

>>> spfc('Christopher Smith')
'01160'
>>> spfc('Christopher Schmidt')
'01160'
>>> spfc('Niall Smith')
'01660'
>>> spfc('Niall Schmidt')
'01660'
>>> spfc('L.Smith')
'01960'
>>> spfc('R.Miller')
'65490'
>>> spfc(('L', 'Smith'))
'01960'
>>> spfc(('R', 'Miller'))
'65490'
class abydos.phonetic.RogerRoot

Bases: abydos.phonetic._phonetic._Phonetic

Roger Root code.

This is Roger Root name coding, described in [MKTM77].

encode(word, max_length=5, zero_pad=True)

Return the Roger Root code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 5) of the code to return
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Roger Root code

Return type:

str

Examples

>>> pe = RogerRoot()
>>> pe.encode('Christopher')
'06401'
>>> pe.encode('Niall')
'02500'
>>> pe.encode('Smith')
'00310'
>>> pe.encode('Schmidt')
'06310'
abydos.phonetic.roger_root(word, max_length=5, zero_pad=True)

Return the Roger Root code for a word.

This is a wrapper for RogerRoot.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 5) of the code to return
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The Roger Root code

Return type:

str

Examples

>>> roger_root('Christopher')
'06401'
>>> roger_root('Niall')
'02500'
>>> roger_root('Smith')
'00310'
>>> roger_root('Schmidt')
'06310'
class abydos.phonetic.StatisticsCanada

Bases: abydos.phonetic._phonetic._Phonetic

Statistics Canada code.

The original description of this algorithm could not be located, and may only have been specified in an unpublished technical report. The coding does not appear to be in use by Statistics Canada any longer. In its place, this is an implementation of the "Census modified Statistics Canada name coding procedure".

The modified version of this algorithm is described in Appendix B of [MKTM77].

encode(word, max_length=4)

Return the Statistics Canada code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 4) of the code to return
Returns:

The Statistics Canada name code value

Return type:

str

Examples

>>> pe = StatisticsCanada()
>>> pe.encode('Christopher')
'CHRS'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHM'
abydos.phonetic.statistics_canada(word, max_length=4)[source]

Return the Statistics Canada code for a word.

This is a wrapper for StatisticsCanada.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 4) of the code to return
Returns:

The Statistics Canada name code value

Return type:

str

Examples

>>> statistics_canada('Christopher')
'CHRS'
>>> statistics_canada('Niall')
'NL'
>>> statistics_canada('Smith')
'SMTH'
>>> statistics_canada('Schmidt')
'SCHM'
class abydos.phonetic.SoundD[source]

Bases: abydos.phonetic._phonetic._Phonetic

SoundD code.

SoundD is defined in [VB12].

encode(word, max_length=4)[source]

Return the SoundD code.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
Returns:

The SoundD code

Return type:

str

Examples

>>> pe = SoundD()
>>> pe.encode('Gough')
'2000'
>>> pe.encode('pneuma')
'5500'
>>> pe.encode('knight')
'5300'
>>> pe.encode('trice')
'3620'
>>> pe.encode('judge')
'2200'
abydos.phonetic.sound_d(word, max_length=4)[source]

Return the SoundD code.

This is a wrapper for SoundD.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
Returns:

The SoundD code

Return type:

str

Examples

>>> sound_d('Gough')
'2000'
>>> sound_d('pneuma')
'5500'
>>> sound_d('knight')
'5300'
>>> sound_d('trice')
'3620'
>>> sound_d('judge')
'2200'
class abydos.phonetic.ParmarKumbharana[source]

Bases: abydos.phonetic._phonetic._Phonetic

Parmar-Kumbharana code.

This is based on the phonetic algorithm proposed in [PK14].

encode(word)[source]

Return the Parmar-Kumbharana encoding of a word.

Parameters:word (str) -- The word to transform
Returns:The Parmar-Kumbharana encoding
Return type:str

Examples

>>> pe = ParmarKumbharana()
>>> pe.encode('Gough')
'GF'
>>> pe.encode('pneuma')
'NM'
>>> pe.encode('knight')
'NT'
>>> pe.encode('trice')
'TRS'
>>> pe.encode('judge')
'JJ'
abydos.phonetic.parmar_kumbharana(word)[source]

Return the Parmar-Kumbharana encoding of a word.

This is a wrapper for ParmarKumbharana.encode().

Parameters:word (str) -- The word to transform
Returns:The Parmar-Kumbharana encoding
Return type:str

Examples

>>> parmar_kumbharana('Gough')
'GF'
>>> parmar_kumbharana('pneuma')
'NM'
>>> parmar_kumbharana('knight')
'NT'
>>> parmar_kumbharana('trice')
'TRS'
>>> parmar_kumbharana('judge')
'JJ'
class abydos.phonetic.Metaphone[source]

Bases: abydos.phonetic._phonetic._Phonetic

Metaphone.

Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].

encode(word, max_length=-1)[source]

Return the Metaphone code for a word.

Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length of the returned Metaphone code (the default, -1, is treated as 64; in Philips' original implementation this was 4)
Returns:

The Metaphone value

Return type:

str

Examples

>>> pe = Metaphone()
>>> pe.encode('Christopher')
'KRSTFR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SM0'
>>> pe.encode('Schmidt')
'SKMTT'
abydos.phonetic.metaphone(word, max_length=-1)[source]

Return the Metaphone code for a word.

This is a wrapper for Metaphone.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length of the returned Metaphone code (the default, -1, is treated as 64; in Philips' original implementation this was 4)
Returns:

The Metaphone value

Return type:

str

Examples

>>> metaphone('Christopher')
'KRSTFR'
>>> metaphone('Niall')
'NL'
>>> metaphone('Smith')
'SM0'
>>> metaphone('Schmidt')
'SKMTT'
class abydos.phonetic.DoubleMetaphone[source]

Bases: abydos.phonetic._phonetic._Phonetic

Double Metaphone.

Based on Lawrence Philips' (Visual) C++ code from 1999 [Phi00].

encode(word, max_length=-1)[source]

Return the Double Metaphone code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length of the returned Double Metaphone codes (defaults to unlimited, but in Philips' original implementation this was 4)
Returns:

The Double Metaphone value(s)

Return type:

tuple

Examples

>>> pe = DoubleMetaphone()
>>> pe.encode('Christopher')
('KRSTFR', '')
>>> pe.encode('Niall')
('NL', '')
>>> pe.encode('Smith')
('SM0', 'XMT')
>>> pe.encode('Schmidt')
('XMT', 'SMT')
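Because encode returns a primary and an alternate code, with '' standing in for a missing alternate, two names are commonly treated as matching when they share any non-empty code. A minimal sketch with a hypothetical helper that works on tuples like those above (it takes the tuples directly rather than calling abydos):

```python
from typing import Tuple


def double_metaphone_match(codes_a: Tuple[str, str], codes_b: Tuple[str, str]) -> bool:
    """Hypothetical helper: True if the two code tuples share a non-empty code."""
    # '' marks a missing alternate code and must not count as a match.
    set_a = {code for code in codes_a if code}
    set_b = {code for code in codes_b if code}
    return not set_a.isdisjoint(set_b)


# Smith and Schmidt share the code 'XMT'.
print(double_metaphone_match(('SM0', 'XMT'), ('XMT', 'SMT')))  # True
print(double_metaphone_match(('KRSTFR', ''), ('NL', '')))      # False
```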
abydos.phonetic.double_metaphone(word, max_length=-1)[source]

Return the Double Metaphone code for a word.

This is a wrapper for DoubleMetaphone.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length of the returned Double Metaphone codes (defaults to unlimited, but in Philips' original implementation this was 4)
Returns:

The Double Metaphone value(s)

Return type:

tuple

Examples

>>> double_metaphone('Christopher')
('KRSTFR', '')
>>> double_metaphone('Niall')
('NL', '')
>>> double_metaphone('Smith')
('SM0', 'XMT')
>>> double_metaphone('Schmidt')
('XMT', 'SMT')
class abydos.phonetic.Eudex[source]

Bases: abydos.phonetic._phonetic._Phonetic

Eudex hash.

This implementation of eudex phonetic hashing is based on the specification (not the reference implementation) at [Tic].

Further details can be found at [Tic16].

encode(word, max_length=8)[source]

Return the eudex phonetic hash of a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned, in 8-bit segments (default 8, i.e. a 64-bit hash)
Returns:

The eudex hash

Return type:

int

Examples

>>> pe = Eudex()
>>> pe.encode('Colin')
432345564238053650
>>> pe.encode('Christopher')
433648490138894409
>>> pe.encode('Niall')
648518346341351840
>>> pe.encode('Smith')
720575940412906756
>>> pe.encode('Schmidt')
720589151732307997
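Eudex hashes are meant to be compared bitwise: similar-sounding words should yield hashes that differ in few bits. The sketch below uses two hypothetical helpers: one splits a hash into its 8-bit segments, the other computes a plain, unweighted Hamming distance between two hashes. Weighting differences in earlier segments more heavily is a common refinement that is not shown here.

```python
from typing import List


def eudex_segments(hash_value: int, n_segments: int = 8) -> List[int]:
    """Hypothetical helper: split a hash into 8-bit segments, most significant first."""
    return [(hash_value >> (8 * (n_segments - 1 - i))) & 0xFF
            for i in range(n_segments)]


def eudex_hamming(hash_a: int, hash_b: int) -> int:
    """Hypothetical helper: number of bits in which two hashes differ."""
    return bin(hash_a ^ hash_b).count('1')


smith, schmidt = 720575940412906756, 720589151732307997
print(eudex_segments(smith))          # [10, 0, 0, 0, 2, 1, 29, 4]
print(eudex_hamming(smith, schmidt))  # 8
```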
abydos.phonetic.eudex(word, max_length=8)[source]

Return the eudex phonetic hash of a word.

This is a wrapper for Eudex.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned, in 8-bit segments (default 8, i.e. a 64-bit hash)
Returns:

The eudex hash

Return type:

int

Examples

>>> eudex('Colin')
432345564238053650
>>> eudex('Christopher')
433648490138894409
>>> eudex('Niall')
648518346341351840
>>> eudex('Smith')
720575940412906756
>>> eudex('Schmidt')
720589151732307997
class abydos.phonetic.BeiderMorse[source]

Bases: abydos.phonetic._phonetic._Phonetic

Beider-Morse Phonetic Matching.

The Beider-Morse Phonetic Matching algorithm is described in [BM08]. The reference implementation is licensed under GPLv3.

encode(word, language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False)[source]

Return the Beider-Morse Phonetic Matching encoding(s) of a term.

Parameters:
  • word (str) -- The word to transform
  • language_arg (str or int) --

    The language of the term; supported values include:

    • any
    • arabic
    • cyrillic
    • czech
    • dutch
    • english
    • french
    • german
    • greek
    • greeklatin
    • hebrew
    • hungarian
    • italian
    • latvian
    • polish
    • portuguese
    • romanian
    • russian
    • spanish
    • turkish
  • name_mode (str) --

    The name mode of the algorithm:

    • gen -- general (default)
    • ash -- Ashkenazi
    • sep -- Sephardic
  • match_mode (str) -- Matching mode: approx or exact
  • concat (bool) -- Concatenation mode
  • filter_langs (bool) -- Filter out incompatible languages
Returns:

The Beider-Morse phonetic value(s), as a space-separated string

Return type:

str

Raises:

ValueError -- Unknown language

Examples

>>> pe = BeiderMorse()
>>> pe.encode('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir
xristofir xristYfir xristopi xritopir xritopi xristofi xritofir
xritofi tzristopir tzristofir zristopir zristopi zritopir zritopi
zristofir zristofi zritofir zritofi'
>>> pe.encode('Niall')
'nial niol'
>>> pe.encode('Smith')
'zmit'
>>> pe.encode('Schmidt')
'zmit stzmit'
>>> pe.encode('Christopher', language_arg='German')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir
xristofir xristYfir'
>>> pe.encode('Christopher', language_arg='English')
'tzristofir tzrQstofir tzristafir tzrQstafir xristofir xrQstofir
xristafir xrQstafir'
>>> pe.encode('Christopher', language_arg='German', name_mode='ash')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir
xristofir xristYfir'
>>> pe.encode('Christopher', language_arg='German', match_mode='exact')
'xriStopher xriStofer xristopher xristofer'
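Since the encoding is a space-separated list of alternative phonetic forms, one simple way to compare two names is to check whether their sets of forms intersect. A hypothetical sketch that operates on encodings like those shown above, without calling abydos:

```python
def bmpm_shared_forms(encoding_a: str, encoding_b: str) -> set:
    """Hypothetical helper: phonetic forms common to two Beider-Morse encodings."""
    # Each encoding is a whitespace-separated list of alternatives.
    return set(encoding_a.split()) & set(encoding_b.split())


# Smith ('zmit') and Schmidt ('zmit stzmit') share the form 'zmit'.
print(bmpm_shared_forms('zmit', 'zmit stzmit'))     # {'zmit'}
print(bool(bmpm_shared_forms('nial niol', 'zmit')))  # False
```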
abydos.phonetic.bmpm(word, language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False)[source]

Return the Beider-Morse Phonetic Matching encoding(s) of a term.

This is a wrapper for BeiderMorse.encode().

Parameters:
  • word (str) -- The word to transform
  • language_arg (str or int) --

    The language of the term; supported values include:

    • any
    • arabic
    • cyrillic
    • czech
    • dutch
    • english
    • french
    • german
    • greek
    • greeklatin
    • hebrew
    • hungarian
    • italian
    • latvian
    • polish
    • portuguese
    • romanian
    • russian
    • spanish
    • turkish
  • name_mode (str) --

    The name mode of the algorithm:

    • gen -- general (default)
    • ash -- Ashkenazi
    • sep -- Sephardic
  • match_mode (str) -- Matching mode: approx or exact
  • concat (bool) -- Concatenation mode
  • filter_langs (bool) -- Filter out incompatible languages
Returns:

The Beider-Morse phonetic value(s), as a space-separated string

Return type:

str

Examples

>>> bmpm('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir xristopi xritopir xritopi xristofi xritofir xritofi
tzristopir tzristofir zristopir zristopi zritopir zritopi zristofir
zristofi zritofir zritofi'
>>> bmpm('Niall')
'nial niol'
>>> bmpm('Smith')
'zmit'
>>> bmpm('Schmidt')
'zmit stzmit'
>>> bmpm('Christopher', language_arg='German')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir'
>>> bmpm('Christopher', language_arg='English')
'tzristofir tzrQstofir tzristafir tzrQstafir xristofir xrQstofir
xristafir xrQstafir'
>>> bmpm('Christopher', language_arg='German', name_mode='ash')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir'
>>> bmpm('Christopher', language_arg='German', match_mode='exact')
'xriStopher xriStofer xristopher xristofer'
class abydos.phonetic.NRL[source]

Bases: abydos.phonetic._phonetic._Phonetic

Naval Research Laboratory English-to-phoneme encoder.

This is defined by [EJMS76].

encode(word)[source]

Return the Naval Research Laboratory phonetic encoding of a word.

Parameters:word (str) -- The word to transform
Returns:The NRL phonetic encoding
Return type:str

Examples

>>> pe = NRL()
>>> pe.encode('the')
'DHAX'
>>> pe.encode('round')
'rAWnd'
>>> pe.encode('quick')
'kwIHk'
>>> pe.encode('eaten')
'IYtEHn'
>>> pe.encode('Smith')
'smIHTH'
>>> pe.encode('Larsen')
'lAArsEHn'
abydos.phonetic.nrl(word)[source]

Return the Naval Research Laboratory phonetic encoding of a word.

This is a wrapper for NRL.encode().

Parameters:word (str) -- The word to transform
Returns:The NRL phonetic encoding
Return type:str

Examples

>>> nrl('the')
'DHAX'
>>> nrl('round')
'rAWnd'
>>> nrl('quick')
'kwIHk'
>>> nrl('eaten')
'IYtEHn'
>>> nrl('Smith')
'smIHTH'
>>> nrl('Larsen')
'lAArsEHn'
class abydos.phonetic.MetaSoundex[source]

Bases: abydos.phonetic._phonetic._Phonetic

MetaSoundex.

This is based on [KV17]. Only English ('en') and Spanish ('es') languages are supported, as in the original.

encode(word, lang='en')[source]

Return the MetaSoundex code for a word.

Parameters:
  • word (str) -- The word to transform
  • lang (str) -- Either en for English or es for Spanish
Returns:

The MetaSoundex code

Return type:

str

Examples

>>> pe = MetaSoundex()
>>> pe.encode('Smith')
'4500'
>>> pe.encode('Waters')
'7362'
>>> pe.encode('James')
'1520'
>>> pe.encode('Schmidt')
'4530'
>>> pe.encode('Ashcroft')
'0261'
>>> pe.encode('Perez', lang='es')
'094'
>>> pe.encode('Martinez', lang='es')
'69364'
>>> pe.encode('Gutierrez', lang='es')
'83994'
>>> pe.encode('Santiago', lang='es')
'4638'
>>> pe.encode('Nicolás', lang='es')
'6754'
abydos.phonetic.metasoundex(word, lang='en')[source]

Return the MetaSoundex code for a word.

This is a wrapper for MetaSoundex.encode().

Parameters:
  • word (str) -- The word to transform
  • lang (str) -- Either en for English or es for Spanish
Returns:

The MetaSoundex code

Return type:

str

Examples

>>> metasoundex('Smith')
'4500'
>>> metasoundex('Waters')
'7362'
>>> metasoundex('James')
'1520'
>>> metasoundex('Schmidt')
'4530'
>>> metasoundex('Ashcroft')
'0261'
>>> metasoundex('Perez', lang='es')
'094'
>>> metasoundex('Martinez', lang='es')
'69364'
>>> metasoundex('Gutierrez', lang='es')
'83994'
>>> metasoundex('Santiago', lang='es')
'4638'
>>> metasoundex('Nicolás', lang='es')
'6754'
class abydos.phonetic.ONCA[source]

Bases: abydos.phonetic._phonetic._Phonetic

Oxford Name Compression Algorithm (ONCA).

This is the Oxford Name Compression Algorithm, based on [Gil97].

I can find no complete description of the "anglicised version of the NYSIIS method" identified as the first step in this algorithm, so this implementation, which employs the standard NYSIIS algorithm in its place, is likely not precisely correct.
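ONCA's second stage is a Soundex coding of the NYSIIS output (an assumption here, following the usual description of [Gil97]). The sketch below implements only that stage, using American Soundex rules in which H and W do not separate letters with equal codes. The NYSIIS step is omitted, so this is an illustration of the Soundex stage, not abydos's ONCA.

```python
def soundex_sketch(word: str, max_length: int = 4, zero_pad: bool = True) -> str:
    """Hypothetical sketch of American Soundex."""
    groups = (('1', 'BFPV'), ('2', 'CGJKQSXZ'), ('3', 'DT'),
              ('4', 'L'), ('5', 'MN'), ('6', 'R'))
    codes = {letter: digit for digit, letters in groups for letter in letters}
    word = ''.join(c for c in word.upper() if 'A' <= c <= 'Z')
    if not word:
        return ''
    result = [word[0]]             # the first letter is kept as a letter
    prev = codes.get(word[0], '')  # ...but its code still suppresses repeats
    for ch in word[1:]:
        if ch in 'HW':
            continue               # H and W do not separate equal codes
        digit = codes.get(ch, '')  # vowels (and Y) map to ''
        if digit and digit != prev:
            result.append(digit)
        prev = digit               # a vowel resets prev, so repeats are kept
    code = ''.join(result)[:max_length]
    return code.ljust(max_length, '0') if zero_pad else code


for name in ('Christopher', 'Niall', 'Smith', 'Schmidt'):
    print(name, soundex_sketch(name))
```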

encode(word, max_length=4, zero_pad=True)[source]

Return the Oxford Name Compression Algorithm (ONCA) code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 4) of the code to return
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The ONCA code

Return type:

str

Examples

>>> pe = ONCA()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
abydos.phonetic.onca(word, max_length=4, zero_pad=True)[source]

Return the Oxford Name Compression Algorithm (ONCA) code for a word.

This is a wrapper for ONCA.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The maximum length (default 4) of the code to return
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The ONCA code

Return type:

str

Examples

>>> onca('Christopher')
'C623'
>>> onca('Niall')
'N400'
>>> onca('Smith')
'S530'
>>> onca('Schmidt')
'S530'
class abydos.phonetic.FONEM[source]

Bases: abydos.phonetic._phonetic._Phonetic

FONEM.

FONEM is a phonetic algorithm designed for French (particularly surnames in Saguenay, Canada), defined in [BBL81].

Guillaume Plique's JavaScript implementation [Pli18] at https://github.com/Yomguithereal/talisman/blob/master/src/phonetics/french/fonem.js was also consulted for this implementation.

encode(word)[source]

Return the FONEM code of a word.

Parameters:word (str) -- The word to transform
Returns:The FONEM code
Return type:str

Examples

>>> pe = FONEM()
>>> pe.encode('Marchand')
'MARCHEN'
>>> pe.encode('Beaulieu')
'BOLIEU'
>>> pe.encode('Beaumont')
'BOMON'
>>> pe.encode('Legrand')
'LEGREN'
>>> pe.encode('Pelletier')
'PELETIER'
abydos.phonetic.fonem(word)[source]

Return the FONEM code of a word.

This is a wrapper for FONEM.encode().

Parameters:word (str) -- The word to transform
Returns:The FONEM code
Return type:str

Examples

>>> fonem('Marchand')
'MARCHEN'
>>> fonem('Beaulieu')
'BOLIEU'
>>> fonem('Beaumont')
'BOMON'
>>> fonem('Legrand')
'LEGREN'
>>> fonem('Pelletier')
'PELETIER'
class abydos.phonetic.HenryEarly[source]

Bases: abydos.phonetic._phonetic._Phonetic

Henry code, early version.

The early version of Henry coding is given in [LegareLC72]. This is different from the later version defined in [Hen76].

encode(word, max_length=3)[source]

Calculate the early version of the Henry code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 3)
Returns:

The early Henry code

Return type:

str

Examples

>>> pe = HenryEarly()
>>> pe.encode('Marchand')
'MRC'
>>> pe.encode('Beaulieu')
'BL'
>>> pe.encode('Beaumont')
'BM'
>>> pe.encode('Legrand')
'LGR'
>>> pe.encode('Pelletier')
'PLT'
abydos.phonetic.henry_early(word, max_length=3)[source]

Calculate the early version of the Henry code for a word.

This is a wrapper for HenryEarly.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 3)
Returns:

The early Henry code

Return type:

str

Examples

>>> henry_early('Marchand')
'MRC'
>>> henry_early('Beaulieu')
'BL'
>>> henry_early('Beaumont')
'BM'
>>> henry_early('Legrand')
'LGR'
>>> henry_early('Pelletier')
'PLT'
class abydos.phonetic.Koelner[source]

Bases: abydos.phonetic._phonetic._Phonetic

Kölner Phonetik.

Based on the algorithm defined by [Pos69].
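A compact sketch of the Kölner Phonetik rules as they are commonly stated: code each letter according to its neighbours, skip H, collapse adjacent equal digits, then delete every 0 except a leading one. Folding Ä, Ö and Ü to A, O and U is an assumption, and the sketch may not handle every edge case a full implementation covers. It is not abydos's code, although it reproduces the examples for this class.

```python
def koelner_sketch(word: str) -> str:
    """Hypothetical sketch of Kölner Phonetik (numeric output)."""
    word = word.upper()
    # Fold umlauts; str.upper() already turns 'ß' into 'SS'.
    for src, dst in (('Ä', 'A'), ('Ö', 'O'), ('Ü', 'U')):
        word = word.replace(src, dst)
    word = ''.join(ch for ch in word if 'A' <= ch <= 'Z')

    digits = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ''
        nxt = word[i + 1] if i + 1 < len(word) else ''
        if ch in {'A', 'E', 'I', 'J', 'O', 'U', 'Y'}:
            digits.append('0')
        elif ch == 'H':
            continue  # H is not coded
        elif ch == 'B' or (ch == 'P' and nxt != 'H'):
            digits.append('1')
        elif ch in {'D', 'T'}:
            digits.append('8' if nxt in {'C', 'S', 'Z'} else '2')
        elif ch in {'F', 'V', 'W'} or ch == 'P':  # P reaches here only before H
            digits.append('3')
        elif ch in {'G', 'K', 'Q'}:
            digits.append('4')
        elif ch == 'C':
            if i == 0:
                hard = nxt in {'A', 'H', 'K', 'L', 'O', 'Q', 'R', 'U', 'X'}
            else:
                hard = (prev not in {'S', 'Z'}
                        and nxt in {'A', 'H', 'K', 'O', 'Q', 'U', 'X'})
            digits.append('4' if hard else '8')
        elif ch == 'X':
            digits.append('8' if prev in {'C', 'K', 'Q'} else '48')
        elif ch == 'L':
            digits.append('5')
        elif ch in {'M', 'N'}:
            digits.append('6')
        elif ch == 'R':
            digits.append('7')
        elif ch in {'S', 'Z'}:
            digits.append('8')

    # Collapse runs of the same digit.
    collapsed = []
    for d in ''.join(digits):
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # Remove every '0' except one at the very start.
    return ''.join(collapsed[:1] + [d for d in collapsed[1:] if d != '0'])


print(koelner_sketch('Müller-Lüdenscheidt'))  # 65752682
```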

encode(word)[source]

Return the Kölner Phonetik (numeric output) code for a word.

While the output code is numeric, it is returned as a str because it may begin with 0.

Parameters:word (str) -- The word to transform
Returns:The Kölner Phonetik value as a numeric string
Return type:str

Examples

>>> pe = Koelner()
>>> pe.encode('Christopher')
'478237'
>>> pe.encode('Niall')
'65'
>>> pe.encode('Smith')
'862'
>>> pe.encode('Schmidt')
'862'
>>> pe.encode('Müller')
'657'
>>> pe.encode('Zimmermann')
'86766'
encode_alpha(word)[source]

Return the Kölner Phonetik (alphabetic output) code for a word.

Parameters:word (str) -- The word to transform
Returns:The Kölner Phonetik value as an alphabetic string
Return type:str

Examples

>>> pe = Koelner()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Müller')
'NLR'
>>> pe.encode_alpha('Zimmermann')
'SNRNN'
abydos.phonetic.koelner_phonetik(word)[source]

Return the Kölner Phonetik (numeric output) code for a word.

This is a wrapper for Koelner.encode().

Parameters:word (str) -- The word to transform
Returns:The Kölner Phonetik value as a numeric string
Return type:str

Examples

>>> koelner_phonetik('Christopher')
'478237'
>>> koelner_phonetik('Niall')
'65'
>>> koelner_phonetik('Smith')
'862'
>>> koelner_phonetik('Schmidt')
'862'
>>> koelner_phonetik('Müller')
'657'
>>> koelner_phonetik('Zimmermann')
'86766'
abydos.phonetic.koelner_phonetik_num_to_alpha(num)[source]

Convert a Kölner Phonetik code from numeric to alphabetic.

This is a wrapper for Koelner._to_alpha().

Parameters:num (str or int) -- A numeric Kölner Phonetik representation
Returns:An alphabetic representation of the same word
Return type:str

Examples

>>> koelner_phonetik_num_to_alpha('862')
'SNT'
>>> koelner_phonetik_num_to_alpha('657')
'NLR'
>>> koelner_phonetik_num_to_alpha('86766')
'SNRNN'
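The conversion can be sketched as a digit-to-letter translation table. The table 0→A, 1→P, 2→T, 3→F, 4→K, 5→L, 6→N, 7→R, 8→S used below is an assumption (one representative letter per Kölner digit class) that agrees with the examples above; it is not quoted from abydos's source, and the function name is hypothetical.

```python
# Assumed table: one representative letter for each Kölner digit class.
_DIGIT_TO_LETTER = str.maketrans('012345678', 'APTFKLNRS')


def koelner_num_to_alpha_sketch(num) -> str:
    """Hypothetical sketch: convert a numeric Kölner code (str or int) to letters."""
    # An int input cannot carry a leading 0, so pass a str when one matters.
    return str(num).translate(_DIGIT_TO_LETTER)


print(koelner_num_to_alpha_sketch('86766'))  # SNRNN
```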
abydos.phonetic.koelner_phonetik_alpha(word)[source]

Return the Kölner Phonetik (alphabetic output) code for a word.

This is a wrapper for Koelner.encode_alpha().

Parameters:word (str) -- The word to transform
Returns:The Kölner Phonetik value as an alphabetic string
Return type:str

Examples

>>> koelner_phonetik_alpha('Smith')
'SNT'
>>> koelner_phonetik_alpha('Schmidt')
'SNT'
>>> koelner_phonetik_alpha('Müller')
'NLR'
>>> koelner_phonetik_alpha('Zimmermann')
'SNRNN'
class abydos.phonetic.Haase[source]

Bases: abydos.phonetic._phonetic._Phonetic

Haase Phonetik.

Based on the algorithm described at [Pra15] and on the original [HH00].

encode(word, primary_only=False)[source]

Return the Haase Phonetik (numeric output) code for a word.

While each output code is numeric, it is nevertheless a str; the codes are returned in a tuple.

Parameters:
  • word (str) -- The word to transform
  • primary_only (bool) -- If True, only the primary code is returned
Returns:

The Haase Phonetik value(s), each as a numeric string

Return type:

tuple

Examples

>>> pe = Haase()
>>> pe.encode('Joachim')
('9496',)
>>> pe.encode('Christoph')
('4798293', '8798293')
>>> pe.encode('Jörg')
('974',)
>>> pe.encode('Smith')
('8692',)
>>> pe.encode('Schmidt')
('8692', '4692')
abydos.phonetic.haase_phonetik(word, primary_only=False)[source]

Return the Haase Phonetik (numeric output) code for a word.

This is a wrapper for Haase.encode().

Parameters:
  • word (str) -- The word to transform
  • primary_only (bool) -- If True, only the primary code is returned
Returns:

The Haase Phonetik value(s), each as a numeric string

Return type:

tuple

Examples

>>> haase_phonetik('Joachim')
('9496',)
>>> haase_phonetik('Christoph')
('4798293', '8798293')
>>> haase_phonetik('Jörg')
('974',)
>>> haase_phonetik('Smith')
('8692',)
>>> haase_phonetik('Schmidt')
('8692', '4692')
class abydos.phonetic.RethSchek[source]

Bases: abydos.phonetic._phonetic._Phonetic

Reth-Schek Phonetik.

This algorithm is proposed in [vonRethS77].

Since I couldn't secure a copy of that document (maybe I'll look for it next time I'm in Germany), this implementation is based on what I could glean from the implementations published by the German Record Linkage Center (www.record-linkage.de):

  • Privacy-preserving Record Linkage (PPRL) (in R) [Ruk18]
  • Merge ToolBox (in Java) [SBB04]

Rules that are unclear:

  • Should 'C' become 'G' or 'Z'? (PPRL has both, 'Z' rule blocked)
  • Should 'CC' become 'G'? (PPRL has a blocked 'CK' rule that may be a typo)
  • Should the 'TUI' -> 'ZUI' rule exist? (PPRL has the rule, but I can't think of a German word with '-tui-' in it.)
  • Should we really change 'SCH' -> 'CH' and then 'CH' -> 'SCH'?

encode(word)[source]

Return Reth-Schek Phonetik code for a word.

Parameters:word (str) -- The word to transform
Returns:The Reth-Schek Phonetik code
Return type:str

Examples

>>> pe = RethSchek()
>>> pe.encode('Joachim')
'JOAGHIM'
>>> pe.encode('Christoph')
'GHRISDOF'
>>> pe.encode('Jörg')
'JOERG'
>>> pe.encode('Smith')
'SMID'
>>> pe.encode('Schmidt')
'SCHMID'
abydos.phonetic.reth_schek_phonetik(word)[source]

Return Reth-Schek Phonetik code for a word.

This is a wrapper for RethSchek.encode().

Parameters:word (str) -- The word to transform
Returns:The Reth-Schek Phonetik code
Return type:str

Examples

>>> reth_schek_phonetik('Joachim')
'JOAGHIM'
>>> reth_schek_phonetik('Christoph')
'GHRISDOF'
>>> reth_schek_phonetik('Jörg')
'JOERG'
>>> reth_schek_phonetik('Smith')
'SMID'
>>> reth_schek_phonetik('Schmidt')
'SCHMID'
class abydos.phonetic.Phonem[source]

Bases: abydos.phonetic._phonetic._Phonetic

Phonem.

Phonem is defined in [GM88].

This version is based on the Perl implementation documented at [Wil05]. It includes some enhancements presented in the Java port at [dcm4che].

Phonem is intended chiefly for German names/words.

encode(word)[source]

Return the Phonem code for a word.

Parameters:word (str) -- The word to transform
Returns:The Phonem value
Return type:str

Examples

>>> pe = Phonem()
>>> pe.encode('Christopher')
'CRYSDOVR'
>>> pe.encode('Niall')
'NYAL'
>>> pe.encode('Smith')
'SMYD'
>>> pe.encode('Schmidt')
'CMYD'
abydos.phonetic.phonem(word)[source]

Return the Phonem code for a word.

This is a wrapper for Phonem.encode().

Parameters:word (str) -- The word to transform
Returns:The Phonem value
Return type:str

Examples

>>> phonem('Christopher')
'CRYSDOVR'
>>> phonem('Niall')
'NYAL'
>>> phonem('Smith')
'SMYD'
>>> phonem('Schmidt')
'CMYD'
class abydos.phonetic.Phonet[source]

Bases: abydos.phonetic._phonetic._Phonetic

Phonet code.

phonet ("Hannoveraner Phonetik") was developed by Jörg Michael and documented in [Mic99].

This is a port of Jesper Zedlitz's code, which is licensed LGPL [Zed15].

That is, in turn, based on Michael's C code, which is also licensed LGPL [Mic07].

encode(word, mode=1, lang='de')[source]

Return the phonet code for a word.

Parameters:
  • word (str) -- The word to transform
  • mode (int) -- The phonet variant to employ (1 or 2)
  • lang (str) -- de (default) for German, none for no language
Returns:

The phonet value

Return type:

str

Examples

>>> pe = Phonet()
>>> pe.encode('Christopher')
'KRISTOFA'
>>> pe.encode('Niall')
'NIAL'
>>> pe.encode('Smith')
'SMIT'
>>> pe.encode('Schmidt')
'SHMIT'
>>> pe.encode('Christopher', mode=2)
'KRIZTUFA'
>>> pe.encode('Niall', mode=2)
'NIAL'
>>> pe.encode('Smith', mode=2)
'ZNIT'
>>> pe.encode('Schmidt', mode=2)
'ZNIT'
>>> pe.encode('Christopher', lang='none')
'CHRISTOPHER'
>>> pe.encode('Niall', lang='none')
'NIAL'
>>> pe.encode('Smith', lang='none')
'SMITH'
>>> pe.encode('Schmidt', lang='none')
'SCHMIDT'
abydos.phonetic.phonet(word, mode=1, lang='de')[source]

Return the phonet code for a word.

This is a wrapper for Phonet.encode().

Parameters:
  • word (str) -- The word to transform
  • mode (int) -- The phonet variant to employ (1 or 2)
  • lang (str) -- de (default) for German, none for no language
Returns:

The phonet value

Return type:

str

Examples

>>> phonet('Christopher')
'KRISTOFA'
>>> phonet('Niall')
'NIAL'
>>> phonet('Smith')
'SMIT'
>>> phonet('Schmidt')
'SHMIT'
>>> phonet('Christopher', mode=2)
'KRIZTUFA'
>>> phonet('Niall', mode=2)
'NIAL'
>>> phonet('Smith', mode=2)
'ZNIT'
>>> phonet('Schmidt', mode=2)
'ZNIT'
>>> phonet('Christopher', lang='none')
'CHRISTOPHER'
>>> phonet('Niall', lang='none')
'NIAL'
>>> phonet('Smith', lang='none')
'SMITH'
>>> phonet('Schmidt', lang='none')
'SCHMIDT'
class abydos.phonetic.SoundexBR[source]

Bases: abydos.phonetic._phonetic._Phonetic

SoundexBR.

This is based on [Mar15].

encode(word, max_length=4, zero_pad=True)[source]

Return the SoundexBR encoding of a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The SoundexBR code

Return type:

str

Examples

>>> pe = SoundexBR()
>>> pe.encode('Oliveira')
'O416'
>>> pe.encode('Almeida')
'A453'
>>> pe.encode('Barbosa')
'B612'
>>> pe.encode('Araújo')
'A620'
>>> pe.encode('Gonçalves')
'G524'
>>> pe.encode('Goncalves')
'G524'
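The identical codes for Gonçalves and Goncalves suggest that diacritics are folded before coding. One common way to do this in Python is NFKD decomposition followed by dropping combining marks; the helper below is a hypothetical sketch of that technique, and whether SoundexBR folds accents in exactly this way is an assumption.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper: strip accents by decomposing and dropping combining marks."""
    # NFKD splits 'ç' into 'c' followed by a combining cedilla (U+0327).
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


print(fold_diacritics('Gonçalves'))  # Goncalves
print(fold_diacritics('Araújo'))     # Araujo
```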
abydos.phonetic.soundex_br(word, max_length=4, zero_pad=True)[source]

Return the SoundexBR encoding of a word.

This is a wrapper for SoundexBR.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 4)
  • zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
Returns:

The SoundexBR code

Return type:

str

Examples

>>> soundex_br('Oliveira')
'O416'
>>> soundex_br('Almeida')
'A453'
>>> soundex_br('Barbosa')
'B612'
>>> soundex_br('Araújo')
'A620'
>>> soundex_br('Gonçalves')
'G524'
>>> soundex_br('Goncalves')
'G524'
class abydos.phonetic.PhoneticSpanish[source]

Bases: abydos.phonetic._phonetic._Phonetic

PhoneticSpanish.

This follows the coding described in [AmonME12] and [delPAngelesEGGM15].

encode(word, max_length=-1)[source]

Return the PhoneticSpanish coding of word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to unlimited)
Returns:

The PhoneticSpanish code

Return type:

str

Examples

>>> pe = PhoneticSpanish()
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6454'
abydos.phonetic.phonetic_spanish(word, max_length=-1)[source]

Return the PhoneticSpanish coding of word.

This is a wrapper for PhoneticSpanish.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to unlimited)
Returns:

The PhoneticSpanish code

Return type:

str

Examples

>>> phonetic_spanish('Perez')
'094'
>>> phonetic_spanish('Martinez')
'69364'
>>> phonetic_spanish('Gutierrez')
'83994'
>>> phonetic_spanish('Santiago')
'4638'
>>> phonetic_spanish('Nicolás')
'6454'
class abydos.phonetic.SpanishMetaphone[source]

Bases: abydos.phonetic._phonetic._Phonetic

Spanish Metaphone.

This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at https://github.com/amsqr/Spanish-Metaphone and discussed in [MLM12].

Modified version based on [delPAngelesBailonM16].

encode(word, max_length=6, modified=False)[source]

Return the Spanish Metaphone of a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 6)
  • modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm
Returns:

The Spanish Metaphone code

Return type:

str

Examples

>>> pe = SpanishMetaphone()
>>> pe.encode('Perez')
'PRZ'
>>> pe.encode('Martinez')
'MRTNZ'
>>> pe.encode('Gutierrez')
'GTRRZ'
>>> pe.encode('Santiago')
'SNTG'
>>> pe.encode('Nicolás')
'NKLS'
abydos.phonetic.spanish_metaphone(word, max_length=6, modified=False)[source]

Return the Spanish Metaphone of a word.

This is a wrapper for SpanishMetaphone.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to 6)
  • modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm
Returns:

The Spanish Metaphone code

Return type:

str

Examples

>>> spanish_metaphone('Perez')
'PRZ'
>>> spanish_metaphone('Martinez')
'MRTNZ'
>>> spanish_metaphone('Gutierrez')
'GTRRZ'
>>> spanish_metaphone('Santiago')
'SNTG'
>>> spanish_metaphone('Nicolás')
'NKLS'
class abydos.phonetic.SfinxBis[source]

Bases: abydos.phonetic._phonetic._Phonetic

SfinxBis code.

SfinxBis is a Soundex-like algorithm defined in [Axe09].

This implementation follows the reference implementation: [Sjoo09].

SfinxBis is intended chiefly for Swedish names.

encode(word, max_length=-1)[source]

Return the SfinxBis code for a word.

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to unlimited)
Returns:

The SfinxBis value

Return type:

tuple

Examples

>>> pe = SfinxBis()
>>> pe.encode('Christopher')
('K68376',)
>>> pe.encode('Niall')
('N4',)
>>> pe.encode('Smith')
('S53',)
>>> pe.encode('Schmidt')
('S53',)
>>> pe.encode('Johansson')
('J585',)
>>> pe.encode('Sjöberg')
('#162',)
abydos.phonetic.sfinxbis(word, max_length=-1)[source]

Return the SfinxBis code for a word.

This is a wrapper for SfinxBis.encode().

Parameters:
  • word (str) -- The word to transform
  • max_length (int) -- The length of the code returned (defaults to unlimited)
Returns:

The SfinxBis value

Return type:

tuple

Examples

>>> sfinxbis('Christopher')
('K68376',)
>>> sfinxbis('Niall')
('N4',)
>>> sfinxbis('Smith')
('S53',)
>>> sfinxbis('Schmidt')
('S53',)
>>> sfinxbis('Johansson')
('J585',)
>>> sfinxbis('Sjöberg')
('#162',)
class abydos.phonetic.Norphone[source]

Bases: abydos.phonetic._phonetic._Phonetic

Norphone.

The reference implementation by Lars Marius Garshol is available in [Gar15].

Norphone was designed for Norwegian, but this implementation has been extended to support Swedish vowels as well. It also incorporates the rules marked "not implemented" in the reference implementation's rule set.

encode(word)[source]

Return the Norphone code.

Parameters:word (str) -- The word to transform
Returns:The Norphone code
Return type:str

Examples

>>> pe = Norphone()
>>> pe.encode('Hansen')
'HNSN'
>>> pe.encode('Larsen')
'LRSN'
>>> pe.encode('Aagaard')
'ÅKRT'
>>> pe.encode('Braaten')
'BRTN'
>>> pe.encode('Sandvik')
'SNVK'
abydos.phonetic.norphone(word)[source]

Return the Norphone code.

This is a wrapper for Norphone.encode().

Parameters:word (str) -- The word to transform
Returns:The Norphone code
Return type:str

Examples

>>> norphone('Hansen')
'HNSN'
>>> norphone('Larsen')
'LRSN'
>>> norphone('Aagaard')
'ÅKRT'
>>> norphone('Braaten')
'BRTN'
>>> norphone('Sandvik')
'SNVK'