abydos.phonetic package¶

abydos.phonetic.

The phonetic package includes classes for phonetic algorithms, including:

Robert C. Russell's Index (RussellIndex)

American Soundex (Soundex)

Refined Soundex (RefinedSoundex)

Daitch-Mokotoff Soundex (DaitchMokotoff)

NYSIIS (NYSIIS)

Match Rating Algorithm (phonetic.MRA)

Metaphone (Metaphone)

Double Metaphone (DoubleMetaphone)

Caverphone (Caverphone)

Alpha Search Inquiry System (AlphaSIS)

Fuzzy Soundex (FuzzySoundex)

Phonex (Phonex)

Phonem (Phonem)

Phonix (Phonix)

Standardized Phonetic Frequency Code (SPFC)

Statistics Canada (StatisticsCanada)

LEIN (LEIN)

Roger Root (RogerRoot)

Eudex phonetic hash (phonetic.Eudex)

Parmar-Kumbharana (ParmarKumbharana)

Davidson's Consonant Code (Davidson)

SoundD (SoundD)

PSHP Soundex/Viewex Coding (PSHPSoundexFirst and PSHPSoundexLast)

Dolby Code (Dolby)

NRL English-to-phoneme (NRL)

Beider-Morse Phonetic Matching (BeiderMorse)

There are also language-specific phonetic algorithms for German:

Kölner Phonetik (Koelner)

phonet (Phonet)

Haase Phonetik (Haase)

Reth-Schek Phonetik (RethSchek)

For French:

FONEM (FONEM)

an early version of Henry Code (HenryEarly)

For Spanish:

Phonetic Spanish (PhoneticSpanish)

Spanish Metaphone (SpanishMetaphone)

For Swedish:

SfinxBis (SfinxBis)

Wåhlin (Waahlin)

For Norwegian:

Norphone (Norphone)

For Brazilian Portuguese:

SoundexBR (SoundexBR)

And there are some hybrid phonetic algorithms that employ multiple underlying phonetic algorithms:

Oxford Name Compression Algorithm (ONCA) (ONCA)

MetaSoundex (MetaSoundex)

Each class has an encode method to return the phonetically encoded string. Classes for which encode returns a numeric value generally have an encode_alpha method that returns an alphabetic version of the phonetic encoding, as demonstrated below:

>>> from abydos.phonetic import RussellIndex
>>> rus = RussellIndex()
>>> rus.encode('Abramson')
128637
>>> rus.encode_alpha('Abramson')
'ABRMCN'
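One common use of such codes is blocking: names that share a code are grouped as candidate matches. The sketch below does not call abydos. It reuses Russell Index values copied from the RussellIndex.encode examples in this documentation, and `group_by_code` is a hypothetical helper introduced only for illustration.

```python
from collections import defaultdict

# Russell Index values copied from the RussellIndex.encode examples in this
# documentation; they are not computed here.
RUSSELL_CODES = {
    'Christopher': 3813428,
    'Niall': 715,
    'Smith': 3614,
    'Schmidt': 3614,
}


def group_by_code(codes):
    """Group names that share a phonetic code (hypothetical helper)."""
    blocks = defaultdict(list)
    for name, code in codes.items():
        blocks[code].append(name)
    return dict(blocks)


blocks = group_by_code(RUSSELL_CODES)
print(blocks[3614])  # ['Smith', 'Schmidt']
```

Names that land in the same block would then be compared with a finer-grained measure.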

class abydos.phonetic._Phonetic[source]¶

Bases: object

Abstract Phonetic class.

New in version 0.3.6.

_delete_consecutive_repeats(word)[source]¶

Delete consecutive repeated characters in a word.

Parameters: word (str) -- The word to transform
Returns: Word with consecutive repeating characters collapsed to a single instance
Return type: str

Examples

>>> pe = _Phonetic()
>>> pe._delete_consecutive_repeats('REDDEE')
'REDE'
>>> pe._delete_consecutive_repeats('AEIOU')
'AEIOU'
>>> pe._delete_consecutive_repeats('AAACCCTTTGGG')
'ACTG'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
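The collapsing behaviour shown in the examples can be reproduced with itertools.groupby. This is a minimal standalone sketch, not the library's own code, and the function name is chosen here for illustration.

```python
from itertools import groupby


def delete_consecutive_repeats(word):
    """Collapse each run of identical characters to a single character."""
    # groupby yields one (key, run) pair per run of equal characters,
    # so keeping only the keys drops the repeats.
    return ''.join(char for char, _run in groupby(word))


print(delete_consecutive_repeats('REDDEE'))        # REDE
print(delete_consecutive_repeats('AAACCCTTTGGG'))  # ACTG
```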

_lc_set = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}¶

_lc_v_set = {'a', 'e', 'i', 'o', 'u'}¶

_lc_vy_set = {'a', 'e', 'i', 'o', 'u', 'y'}¶

_uc_set = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'}¶

_uc_v_set = {'A', 'E', 'I', 'O', 'U'}¶

_uc_vy_set = {'A', 'E', 'I', 'O', 'U', 'Y'}¶

encode(word)[source]¶

Encode phonetically.

Parameters: word (str) -- The word to transform

New in version 0.3.6.

encode_alpha(word)[source]¶

Encode phonetically using alphabetic characters.

Parameters: word (str) -- The word to transform
Returns: The word transformed
Return type: str

New in version 0.3.6.

class abydos.phonetic.RussellIndex[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Russell Index.

This follows Robert C. Russell's Index algorithm, as described in [Rus18].

New in version 0.3.6.

_num_set = {'1', '2', '3', '4', '5', '6', '7', '8'}¶

_num_trans = {49: 'A', 50: 'B', 51: 'C', 52: 'D', 53: 'L', 54: 'M', 55: 'N', 56: 'R'}¶

_to_alpha(num)[source]¶

Convert the Russell Index integer to an alphabetic string.

This follows Robert C. Russell's Index algorithm, as described in [Rus18].

Parameters: num (int) -- A Russell Index integer value
Returns: The Russell Index as an alphabetic string
Return type: str

Examples

>>> pe = RussellIndex()
>>> pe._to_alpha(3813428)
'CRACDBR'
>>> pe._to_alpha(715)
'NAL'
>>> pe._to_alpha(3614)
'CMAD'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
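The conversion amounts to a digit-by-digit lookup in the _num_trans table shown above. A minimal sketch under that assumption, using a standalone function rather than the class method:

```python
# Digit-to-letter table equivalent to the _num_trans attribute above
# ('1' -> 'A', '2' -> 'B', ..., '8' -> 'R').
NUM_TRANS = str.maketrans('12345678', 'ABCDLMNR')


def to_alpha(num):
    """Map each digit of a Russell Index integer to its letter."""
    return str(num).translate(NUM_TRANS)


print(to_alpha(3813428))  # CRACDBR
print(to_alpha(715))      # NAL
```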

_trans = {65: '1', 66: '2', 67: '3', 68: '4', 69: '1', 70: '2', 71: '3', 73: '1', 75: '3', 76: '5', 77: '6', 78: '7', 79: '1', 80: '2', 81: '3', 82: '8', 83: '3', 84: '4', 85: '1', 86: '2', 88: '3', 89: '1', 90: '3'}¶

_uc_set = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Y', 'Z'}¶

encode(word)[source]¶

Return the Russell Index (integer output) of a word.

Parameters: word (str) -- The word to transform
Returns: The Russell Index value
Return type: int

Examples

>>> pe = RussellIndex()
>>> pe.encode('Christopher')
3813428
>>> pe.encode('Niall')
715
>>> pe.encode('Smith')
3614
>>> pe.encode('Schmidt')
3614

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
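The following sketch reproduces the example values using the _trans table above. The preprocessing steps (dropping GH, stripping trailing S and Z, keeping only the first vowel code) are assumptions about the algorithm rather than a copy of the library's code, and returning None for an empty result is a choice made here.

```python
from itertools import groupby

# Letter-to-digit table matching the _trans attribute above; H, J and W have
# no entry and are therefore dropped.
TRANS = {
    'A': '1', 'B': '2', 'C': '3', 'D': '4', 'E': '1', 'F': '2', 'G': '3',
    'I': '1', 'K': '3', 'L': '5', 'M': '6', 'N': '7', 'O': '1', 'P': '2',
    'Q': '3', 'R': '8', 'S': '3', 'T': '4', 'U': '1', 'V': '2', 'X': '3',
    'Y': '1', 'Z': '3',
}


def russell_index_sketch(word):
    """Return a Russell Index integer for word, or None if nothing codes."""
    # Assumed preprocessing: discard GH and any trailing S or Z.
    word = word.upper().replace('GH', '').rstrip('SZ')
    digits = ''.join(TRANS[c] for c in word if c in TRANS)
    # Keep the first vowel code ('1') and drop every later one.
    first = digits.find('1') + 1
    if first:
        digits = digits[:first] + digits[first:].replace('1', '')
    # Collapse runs of identical digits.
    digits = ''.join(d for d, _run in groupby(digits))
    return int(digits) if digits else None


print(russell_index_sketch('Christopher'))  # 3813428
print(russell_index_sketch('Schmidt'))      # 3614
```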

encode_alpha(word)[source]¶

Return the Russell Index (alphabetic output) for the word.

This follows Robert C. Russell's Index algorithm, as described in [Rus18].

Parameters: word (str) -- The word to transform
Returns: The Russell Index value as an alphabetic string
Return type: str

Examples

>>> pe = RussellIndex()
>>> pe.encode_alpha('Christopher')
'CRACDBR'
>>> pe.encode_alpha('Niall')
'NAL'
>>> pe.encode_alpha('Smith')
'CMAD'
>>> pe.encode_alpha('Schmidt')
'CMAD'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.russell_index(word)[source]¶

Return the Russell Index (integer output) of a word.

This is a wrapper for RussellIndex.encode().

Parameters: word (str) -- The word to transform
Returns: The Russell Index value
Return type: int

Examples

>>> russell_index('Christopher')
3813428
>>> russell_index('Niall')
715
>>> russell_index('Smith')
3614
>>> russell_index('Schmidt')
3614

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RussellIndex.encode method instead.

abydos.phonetic.russell_index_num_to_alpha(num)[source]¶

Convert the Russell Index integer to an alphabetic string.

This is a wrapper for RussellIndex._to_alpha().

Parameters: num (int) -- A Russell Index integer value
Returns: The Russell Index as an alphabetic string
Return type: str

Examples

>>> russell_index_num_to_alpha(3813428)
'CRACDBR'
>>> russell_index_num_to_alpha(715)
'NAL'
>>> russell_index_num_to_alpha(3614)
'CMAD'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RussellIndex._to_alpha method instead.

abydos.phonetic.russell_index_alpha(word)[source]¶

Return the Russell Index (alphabetic output) for the word.

This is a wrapper for RussellIndex.encode_alpha().

Parameters: word (str) -- The word to transform
Returns: The Russell Index value as an alphabetic string
Return type: str

Examples

>>> russell_index_alpha('Christopher')
'CRACDBR'
>>> russell_index_alpha('Niall')
'NAL'
>>> russell_index_alpha('Smith')
'CMAD'
>>> russell_index_alpha('Schmidt')
'CMAD'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RussellIndex.encode_alpha method instead.

class abydos.phonetic.Soundex(max_length=4, var='American', reverse=False, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Soundex.

Three variants of Soundex are implemented:

'American' follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
'special' follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
'Census' follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names

New in version 0.3.6.

Initialize Soundex instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 4)
var (str) --
The variant of the algorithm to employ (defaults to American):
- American follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
- special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
- Census follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False); this results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.

_alphabetic = {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 57: 'H'}¶

_trans = {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '9', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '2', 84: '3', 85: '0', 86: '1', 87: '9', 88: '2', 89: '0', 90: '2'}¶

encode(word)[source]¶

Return the Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The Soundex value
Return type: str

Examples

>>> pe = Soundex()
>>> pe.encode("Christopher")
'C623'
>>> pe.encode("Niall")
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'

>>> Soundex(max_length=-1).encode('Christopher')
'C623160000000000000000000000000000000000000000000000000000000000'
>>> Soundex(max_length=-1, zero_pad=False).encode('Christopher')
'C62316'

>>> Soundex(reverse=True).encode('Christopher')
'R132'

>>> pe.encode('Ashcroft')
'A261'
>>> pe.encode('Asicroft')
'A226'

>>> pe_special = Soundex(var='special')
>>> pe_special.encode('Ashcroft')
'A226'
>>> pe_special.encode('Asicroft')
'A226'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
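The examples above can be reproduced with a short standalone sketch built on the _trans table. It covers only the 'American' and 'special' variants, assumes a positive max_length and plain A-Z input, and is not the library's implementation; the 'Census' variant is not attempted.

```python
from itertools import groupby

# Letter codes equivalent to the _trans attribute above: vowels and Y map to
# '0', H and W map to '9'.
TRANS = str.maketrans(
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
    '01230129022455012623019202',
)


def soundex_sketch(word, max_length=4, var='American', reverse=False,
                   zero_pad=True):
    """Return a Soundex code ('American' or 'special' variant only)."""
    word = ''.join(c for c in word.upper() if 'A' <= c <= 'Z')
    if reverse:
        word = word[::-1]
    if not word:
        return '0' * max_length if zero_pad else ''
    codes = word.translate(TRANS)
    # 'special' treats H and W like vowels; 'American' removes them so that
    # equal codes on either side of an H or W collapse together.
    codes = codes.replace('9', '0' if var == 'special' else '')
    codes = ''.join(d for d, _run in groupby(codes))
    # Keep the first letter as a letter. Its own code is dropped, unless it
    # was an H or W whose code has already been removed.
    if word[0] in 'HW' and var != 'special':
        codes = word[0] + codes
    else:
        codes = word[0] + codes[1:]
    codes = codes.replace('0', '')
    if zero_pad:
        codes += '0' * max_length
    return codes[:max_length]


print(soundex_sketch('Ashcroft'))                 # A261
print(soundex_sketch('Ashcroft', var='special'))  # A226
```

The Ashcroft pair shows the difference between the variants: in 'American' the H is transparent, so S and C collapse into a single 2; in 'special' the H separates them.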

encode_alpha(word)[source]¶

Return the alphabetic Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Soundex value
Return type: str

Examples

>>> pe = Soundex()
>>> pe.encode_alpha("Christopher")
'CRKT'
>>> pe.encode_alpha("Niall")
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'

New in version 0.4.0.
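Judging from the examples, the alphabetic form keeps the leading letter, drops the zero padding, and maps each remaining digit through the _alphabetic table above. A minimal sketch under that assumption, taking an already computed numeric code as input:

```python
# Digit-to-letter table equivalent to the _alphabetic attribute above.
ALPHABETIC = str.maketrans('01234569', 'APKTLNRH')


def soundex_to_alpha(code):
    """Convert a numeric Soundex code such as 'C623' to letters."""
    code = code.rstrip('0')  # assumed: trailing zero padding is dropped
    return code[:1] + code[1:].translate(ALPHABETIC)


print(soundex_to_alpha('C623'))  # CRKT
print(soundex_to_alpha('N400'))  # NL
```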

abydos.phonetic.soundex(word, max_length=4, var='American', reverse=False, zero_pad=True)[source]¶

Return the Soundex code for a word.

This is a wrapper for Soundex.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
var (str) --
The variant of the algorithm to employ (defaults to American):
- American follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
- special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
- Census follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False); this results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The Soundex value

Return type

str

Examples

>>> soundex("Christopher")
'C623'
>>> soundex("Niall")
'N400'
>>> soundex('Smith')
'S530'
>>> soundex('Schmidt')
'S530'

>>> soundex('Christopher', max_length=-1)
'C623160000000000000000000000000000000000000000000000000000000000'
>>> soundex('Christopher', max_length=-1, zero_pad=False)
'C62316'

>>> soundex('Christopher', reverse=True)
'R132'

>>> soundex('Ashcroft')
'A261'
>>> soundex('Asicroft')
'A226'
>>> soundex('Ashcroft', var='special')
'A226'
>>> soundex('Asicroft', var='special')
'A226'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Soundex.encode method instead.

class abydos.phonetic.RefinedSoundex(max_length=-1, zero_pad=False, retain_vowels=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Refined Soundex.

This is Soundex, but with more character classes. It was defined at [Boy98].

New in version 0.3.6.

Initialize RefinedSoundex instance.

Parameters

max_length (int) -- The length of the code returned (defaults to unlimited)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
retain_vowels (bool) -- Retain vowels (as 0) in the resulting code

New in version 0.4.0.

_alphabetic = {49: 'P', 50: 'F', 51: 'K', 52: 'G', 53: 'Z', 54: 'T', 55: 'L', 56: 'N', 57: 'R'}¶

_trans = {65: '0', 66: '1', 67: '3', 68: '6', 69: '0', 70: '2', 71: '4', 72: '0', 73: '0', 74: '4', 75: '3', 76: '7', 77: '8', 78: '8', 79: '0', 80: '1', 81: '5', 82: '9', 83: '3', 84: '6', 85: '0', 86: '2', 87: '0', 88: '5', 89: '0', 90: '5'}¶

encode(word)[source]¶

Return the Refined Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The Refined Soundex value
Return type: str

Examples

>>> pe = RefinedSoundex()
>>> pe.encode('Christopher')
'C93619'
>>> pe.encode('Niall')
'N7'
>>> pe.encode('Smith')
'S86'
>>> pe.encode('Schmidt')
'S386'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
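A standalone sketch that reproduces the examples above from the _trans table. It assumes plain A-Z input and is an illustration of the technique, not the library's code.

```python
from itertools import groupby

# Letter codes equivalent to the _trans attribute above; vowels, H, W and Y
# all map to '0'.
TRANS = str.maketrans(
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
    '01360240043788015936020505',
)


def refined_soundex_sketch(word, max_length=-1, zero_pad=False,
                           retain_vowels=False):
    """Return a Refined Soundex code for word."""
    word = ''.join(c for c in word.upper() if 'A' <= c <= 'Z')
    # The first letter stays a letter; the rest is translated to digits.
    code = word[:1] + word[1:].translate(TRANS)
    # Collapse runs of identical codes before removing the zeros, so that a
    # vowel between two equal consonant codes keeps them separate.
    code = ''.join(ch for ch, _run in groupby(code))
    if not retain_vowels:
        code = code.replace('0', '')
    if max_length > 0:
        if zero_pad:
            code += '0' * max_length
        code = code[:max_length]
    return code


print(refined_soundex_sketch('Christopher'))  # C93619
print(refined_soundex_sketch('Schmidt'))      # S386
```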

encode_alpha(word)[source]¶

Return the alphabetic Refined Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Refined Soundex value
Return type: str

Examples

>>> pe = RefinedSoundex()
>>> pe.encode_alpha('Christopher')
'CRKTPR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'

New in version 0.4.0.

abydos.phonetic.refined_soundex(word, max_length=-1, zero_pad=False, retain_vowels=False)[source]¶

Return the Refined Soundex code for a word.

This is a wrapper for RefinedSoundex.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to unlimited)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
retain_vowels (bool) -- Retain vowels (as 0) in the resulting code

Returns

The Refined Soundex value

Return type

str

Examples

>>> refined_soundex('Christopher')
'C93619'
>>> refined_soundex('Niall')
'N7'
>>> refined_soundex('Smith')
'S86'
>>> refined_soundex('Schmidt')
'S386'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RefinedSoundex.encode method instead.

class abydos.phonetic.DaitchMokotoff(max_length=6, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Daitch-Mokotoff Soundex.

Based on Daitch-Mokotoff Soundex [Mok97], this returns the codes for a word as a set. A collection is necessary since a single word can have multiple codes.

New in version 0.3.6.
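Because the result is a set, two names are usually treated as a match when their code sets share at least one code. The sketch below uses code sets copied from the DaitchMokotoff.encode examples in this section; `codes_match` is a hypothetical helper, not part of abydos.

```python
# Code sets copied from the DaitchMokotoff.encode examples in this section.
DM_CODES = {
    'Christopher': {'494379', '594379'},
    'Niall': {'680000'},
    'Smith': {'463000'},
    'Schmidt': {'463000'},
}


def codes_match(codes_a, codes_b):
    """Return True if the two code sets share at least one code."""
    return not codes_a.isdisjoint(codes_b)


print(codes_match(DM_CODES['Smith'], DM_CODES['Schmidt']))      # True
print(codes_match(DM_CODES['Christopher'], DM_CODES['Niall']))  # False
```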

Initialize DaitchMokotoff instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.

_alphabetic = {48: 'A', 49: 'Y', 50: 'ﬆ', 51: 'T', 52: 'S', 53: 'K', 54: 'N', 55: 'P', 56: 'L', 57: 'R'}¶

_alphabetic_non_initials = {48: ' ', 49: 'A', 50: ' ', 51: 'T', 52: 'S', 53: 'K', 54: 'N', 55: 'P', 56: 'L', 57: 'R'}¶

_dms_order = {'A': ('AI', 'AJ', 'AU', 'AY', 'A'), 'B': ('B',), 'C': ('CHS', 'CSZ', 'CZS', 'CH', 'CK', 'CS', 'CZ', 'C'), 'D': ('DRS', 'DRZ', 'DSH', 'DSZ', 'DZH', 'DZS', 'DS', 'DT', 'DZ', 'D'), 'E': ('EI', 'EJ', 'EU', 'EY', 'E'), 'F': ('FB', 'F'), 'G': ('G',), 'H': ('H',), 'I': ('IA', 'IE', 'IO', 'IU', 'I'), 'J': ('J',), 'K': ('KH', 'KS', 'K'), 'L': ('L',), 'M': ('MN', 'M'), 'N': ('NM', 'N'), 'O': ('OI', 'OJ', 'OY', 'O'), 'P': ('PF', 'PH', 'P'), 'Q': ('Q',), 'R': ('RS', 'RZ', 'R'), 'S': ('SCHTSCH', 'SCHTCH', 'SCHTSH', 'SHTCH', 'SHTSH', 'STSCH', 'SCHD', 'SCHT', 'SHCH', 'STCH', 'STRS', 'STRZ', 'STSH', 'SZCS', 'SZCZ', 'SCH', 'SHD', 'SHT', 'SZD', 'SZT', 'SC', 'SD', 'SH', 'ST', 'SZ', 'S'), 'T': ('TTSCH', 'TSCH', 'TTCH', 'TTSZ', 'TCH', 'THS', 'TRS', 'TRZ', 'TSH', 'TSZ', 'TTS', 'TTZ', 'TZS', 'TC', 'TH', 'TS', 'TZ', 'T'), 'U': ('UE', 'UI', 'UJ', 'UY', 'U'), 'V': ('V',), 'W': ('W',), 'X': ('X',), 'Y': ('Y',), 'Z': ('ZHDZH', 'ZDZH', 'ZSCH', 'ZDZ', 'ZHD', 'ZSH', 'ZD', 'ZH', 'ZS', 'Z')}¶

_dms_table = {'A': (0, '_', '_'), 'AI': (0, 1, '_'), 'AJ': (0, 1, '_'), 'AU': (0, 7, '_'), 'AY': (0, 1, '_'), 'B': (7, 7, 7), 'C': ((5, 4), (5, 4), (5, 4)), 'CH': ((5, 4), (5, 4), (5, 4)), 'CHS': (5, 54, 54), 'CK': ((5, 45), (5, 45), (5, 45)), 'CS': (4, 4, 4), 'CSZ': (4, 4, 4), 'CZ': (4, 4, 4), 'CZS': (4, 4, 4), 'D': (3, 3, 3), 'DRS': (4, 4, 4), 'DRZ': (4, 4, 4), 'DS': (4, 4, 4), 'DSH': (4, 4, 4), 'DSZ': (4, 4, 4), 'DT': (3, 3, 3), 'DZ': (4, 4, 4), 'DZH': (4, 4, 4), 'DZS': (4, 4, 4), 'E': (0, '_', '_'), 'EI': (0, 1, '_'), 'EJ': (0, 1, '_'), 'EU': (1, 1, '_'), 'EY': (0, 1, '_'), 'F': (7, 7, 7), 'FB': (7, 7, 7), 'G': (5, 5, 5), 'H': (5, 5, '_'), 'I': (0, '_', '_'), 'IA': (1, '_', '_'), 'IE': (1, '_', '_'), 'IO': (1, '_', '_'), 'IU': (1, '_', '_'), 'J': ((1, 4), ('_', 4), ('_', 4)), 'K': (5, 5, 5), 'KH': (5, 5, 5), 'KS': (5, 54, 54), 'L': (8, 8, 8), 'M': (6, 6, 6), 'MN': ('6_6', '6_6', '6_6'), 'N': (6, 6, 6), 'NM': ('6_6', '6_6', '6_6'), 'O': (0, '_', '_'), 'OI': (0, 1, '_'), 'OJ': (0, 1, '_'), 'OY': (0, 1, '_'), 'P': (7, 7, 7), 'PF': (7, 7, 7), 'PH': (7, 7, 7), 'Q': (5, 5, 5), 'R': (9, 9, 9), 'RS': ((94, 4), (94, 4), (94, 4)), 'RZ': ((94, 4), (94, 4), (94, 4)), 'S': (4, 4, 4), 'SC': (2, 4, 4), 'SCH': (4, 4, 4), 'SCHD': (2, 43, 43), 'SCHT': (2, 43, 43), 'SCHTCH': (2, 4, 4), 'SCHTSCH': (2, 4, 4), 'SCHTSH': (2, 4, 4), 'SD': (2, 43, 43), 'SH': (4, 4, 4), 'SHCH': (2, 4, 4), 'SHD': (2, 43, 43), 'SHT': (2, 43, 43), 'SHTCH': (2, 4, 4), 'SHTSH': (2, 4, 4), 'ST': (2, 43, 43), 'STCH': (2, 4, 4), 'STRS': (2, 4, 4), 'STRZ': (2, 4, 4), 'STSCH': (2, 4, 4), 'STSH': (2, 4, 4), 'SZ': (4, 4, 4), 'SZCS': (2, 4, 4), 'SZCZ': (2, 4, 4), 'SZD': (2, 43, 43), 'SZT': (2, 43, 43), 'T': (3, 3, 3), 'TC': (4, 4, 4), 'TCH': (4, 4, 4), 'TH': (3, 3, 3), 'THS': (4, 4, 4), 'TRS': (4, 4, 4), 'TRZ': (4, 4, 4), 'TS': (4, 4, 4), 'TSCH': (4, 4, 4), 'TSH': (4, 4, 4), 'TSZ': (4, 4, 4), 'TTCH': (4, 4, 4), 'TTS': (4, 4, 4), 'TTSCH': (4, 4, 4), 'TTSZ': (4, 4, 4), 'TTZ': (4, 4, 4), 'TZ': (4, 4, 4), 'TZS': (4, 4, 4), 'U': (0, '_', '_'), 'UE': (0, '_', '_'), 'UI': (0, 1, '_'), 'UJ': (0, 1, '_'), 'UY': (0, 1, '_'), 'V': (7, 7, 7), 'W': (7, 7, 7), 'X': (5, 54, 54), 'Y': (1, '_', '_'), 'Z': (4, 4, 4), 'ZD': (2, 43, 43), 'ZDZ': (2, 4, 4), 'ZDZH': (2, 4, 4), 'ZH': (4, 4, 4), 'ZHD': (2, 43, 43), 'ZHDZH': (2, 4, 4), 'ZS': (4, 4, 4), 'ZSCH': (4, 4, 4), 'ZSH': (4, 4, 4)}¶

_uc_v_set = {'A', 'E', 'I', 'J', 'O', 'U', 'Y'}¶

encode(word)[source]¶

Return the Daitch-Mokotoff Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The Daitch-Mokotoff Soundex value
Return type: set of str

Examples

>>> pe = DaitchMokotoff()
>>> sorted(pe.encode('Christopher'))
['494379', '594379']
>>> pe.encode('Niall')
{'680000'}
>>> pe.encode('Smith')
{'463000'}
>>> pe.encode('Schmidt')
{'463000'}

>>> sorted(DaitchMokotoff(max_length=20,
... zero_pad=False).encode('The quick brown fox'))
['35457976754', '3557976754']

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Daitch-Mokotoff Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Daitch-Mokotoff Soundex value
Return type: set of str

Examples

>>> pe = DaitchMokotoff()
>>> sorted(pe.encode_alpha('Christopher'))
['KRSTPR', 'SRSTPR']
>>> pe.encode_alpha('Niall')
{'NL'}
>>> pe.encode_alpha('Smith')
{'SNT'}
>>> pe.encode_alpha('Schmidt')
{'SNT'}

>>> sorted(DaitchMokotoff(max_length=20,
... zero_pad=False).encode_alpha('The quick brown fox'))
['TKKPRPNPKS', 'TKSKPRPNPKS']

New in version 0.4.0.

abydos.phonetic.dm_soundex(word, max_length=6, zero_pad=True)[source]¶

Return the Daitch-Mokotoff Soundex code for a word.

This is a wrapper for DaitchMokotoff.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The Daitch-Mokotoff Soundex value

Return type

set of str

Examples

>>> sorted(dm_soundex('Christopher'))
['494379', '594379']
>>> dm_soundex('Niall')
{'680000'}
>>> dm_soundex('Smith')
{'463000'}
>>> dm_soundex('Schmidt')
{'463000'}

>>> sorted(dm_soundex('The quick brown fox', max_length=20,
... zero_pad=False))
['35457976754', '3557976754']

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the DaitchMokotoff.encode method instead.

class abydos.phonetic.FuzzySoundex(max_length=5, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Fuzzy Soundex.

Fuzzy Soundex is an algorithm derived from Soundex, defined in [HM02].

New in version 0.3.6.

Initialize FuzzySoundex instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.

_alphabetic = {48: 'A', 49: 'P', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 55: 'K', 57: 'S'}¶

_trans = {65: '0', 66: '1', 67: '9', 68: '3', 69: '0', 70: '1', 71: '7', 72: '-', 73: '0', 74: '7', 75: '7', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '7', 82: '6', 83: '9', 84: '3', 85: '0', 86: '1', 87: '-', 88: '7', 89: '-', 90: '9'}¶

encode(word)[source]¶

Return the Fuzzy Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The Fuzzy Soundex value
Return type: str

Examples

>>> pe = FuzzySoundex()
>>> pe.encode('Christopher')
'K6931'
>>> pe.encode('Niall')
'N4000'
>>> pe.encode('Smith')
'S5300'
>>> pe.encode('Schmidt')
'S5300'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Fuzzy Soundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Fuzzy Soundex value
Return type: str

Examples

>>> pe = FuzzySoundex()
>>> pe.encode_alpha('Christopher')
'KRSTP'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'

New in version 0.4.0.

abydos.phonetic.fuzzy_soundex(word, max_length=5, zero_pad=True)[source]¶

Return the Fuzzy Soundex code for a word.

This is a wrapper for FuzzySoundex.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The Fuzzy Soundex value

Return type

str

Examples

>>> fuzzy_soundex('Christopher')
'K6931'
>>> fuzzy_soundex('Niall')
'N4000'
>>> fuzzy_soundex('Smith')
'S5300'
>>> fuzzy_soundex('Schmidt')
'S5300'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the FuzzySoundex.encode method instead.

class abydos.phonetic.LEIN(max_length=4, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

LEIN code.

This is Michigan LEIN (Law Enforcement Information Network) name coding, described in [MKTM77].

New in version 0.3.6.

Initialize LEIN instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.

_alphabetic = {49: 'T', 50: 'N', 51: 'L', 52: 'P', 53: 'K'}¶

_del_trans = {32: None, 65: None, 69: None, 72: None, 73: None, 79: None, 85: None, 87: None, 89: None}¶

_trans = {66: '4', 67: '5', 68: '1', 70: '4', 71: '5', 74: '5', 75: '5', 76: '3', 77: '2', 78: '2', 80: '4', 81: '5', 82: '3', 83: '5', 84: '1', 86: '4', 88: '5', 90: '5'}¶

encode(word)[source]¶

Return the LEIN code for a word.

Parameters: word (str) -- The word to transform
Returns: The LEIN code
Return type: str

Examples

>>> pe = LEIN()
>>> pe.encode('Christopher')
'C351'
>>> pe.encode('Niall')
'N300'
>>> pe.encode('Smith')
'S210'
>>> pe.encode('Schmidt')
'S521'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
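The _del_trans and _trans tables above suggest a four-step procedure: keep the first letter, delete vowels and H, W, Y from the rest, collapse repeated letters, then map the remaining letters to digits. The sketch below follows that reading and reproduces the example values. It is an illustration under those assumptions, not the library's code.

```python
from itertools import groupby

# Deletion table equivalent to _del_trans above (vowels, H, W, Y and space).
DELETE = str.maketrans('', '', 'AEHIOUWY ')
# Letter-to-digit table equivalent to _trans above.
TRANS = str.maketrans('BCDFGJKLMNPQRSTVXZ', '451455532245351455')


def lein_sketch(word, max_length=4, zero_pad=True):
    """Return a LEIN code for word."""
    word = ''.join(c for c in word.upper() if 'A' <= c <= 'Z')
    code = word[:1]                     # keep the first letter as is
    rest = word[1:].translate(DELETE)   # delete vowels, H, W and Y
    # Collapse repeated letters (not repeated digits): D and T both map to
    # '1', yet 'DT' still yields '11'.
    rest = ''.join(ch for ch, _run in groupby(rest))
    code += rest.translate(TRANS)
    if zero_pad:
        code += '0' * max_length
    return code[:max_length]


print(lein_sketch('Christopher'))  # C351
print(lein_sketch('Schmidt'))      # S521
```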

encode_alpha(word)[source]¶

Return the alphabetic LEIN code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic LEIN code
Return type: str

Examples

>>> pe = LEIN()
>>> pe.encode_alpha('Christopher')
'CLKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'

New in version 0.4.0.

abydos.phonetic.lein(word, max_length=4, zero_pad=True)[source]¶

Return the LEIN code for a word.

This is a wrapper for LEIN.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The LEIN code

Return type

str

Examples

>>> lein('Christopher')
'C351'
>>> lein('Niall')
'N300'
>>> lein('Smith')
'S210'
>>> lein('Schmidt')
'S521'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the LEIN.encode method instead.

class abydos.phonetic.Phonex(max_length=4, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Phonex code.

Phonex is an algorithm derived from Soundex, defined in [LR96].

New in version 0.3.6.

Initialize Phonex instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.

_alphabetic = {49: 'P', 50: 'S', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}¶

encode(word)[source]¶

Return the Phonex code for a word.

Parameters: word (str) -- The word to transform
Returns: The Phonex value
Return type: str

Examples

>>> pe = Phonex()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Schmidt')
'S253'
>>> pe.encode('Smith')
'S530'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Phonex code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Phonex value
Return type: str

Examples

>>> pe = Phonex()
>>> pe.encode_alpha('Christopher')
'CRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SSNT'

New in version 0.4.0.

abydos.phonetic.phonex(word, max_length=4, zero_pad=True)[source]¶

Return the Phonex code for a word.

This is a wrapper for Phonex.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The Phonex value

Return type

str

Examples

>>> phonex('Christopher')
'C623'
>>> phonex('Niall')
'N400'
>>> phonex('Schmidt')
'S253'
>>> phonex('Smith')
'S530'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonex.encode method instead.

class abydos.phonetic.Phonix(max_length=4, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Phonix code.

Phonix is a Soundex-like algorithm defined in [Gad90].

This implementation is based on [Pfe00], [Chr11], and [Kollar].

New in version 0.3.6.

Initialize Phonix instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.3.6.

_alphabetic = {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 55: 'F', 56: 'S'}¶

_substitutions = None¶

_trans = {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '7', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '8', 84: '3', 85: '0', 86: '7', 87: '0', 88: '8', 89: '0', 90: '8'}¶

_uc_c_set = None¶

encode(word)[source]¶

Return the Phonix code for a word.

Parameters: word (str) -- The word to transform
Returns: The Phonix value
Return type: str

Examples

>>> pe = Phonix()
>>> pe.encode('Christopher')
'K683'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Phonix code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Phonix value
Return type: str

Examples

>>> pe = Phonix()
>>> pe.encode_alpha('Christopher')
'KRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'

New in version 0.4.0.

abydos.phonetic.phonix(word, max_length=4, zero_pad=True)[source]¶

Return the Phonix code for a word.

This is a wrapper for Phonix.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The Phonix value

Return type

str

Examples

>>> phonix('Christopher')
'K683'
>>> phonix('Niall')
'N400'
>>> phonix('Smith')
'S530'
>>> phonix('Schmidt')
'S530'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonix.encode method instead.

class abydos.phonetic.PSHPSoundexFirst(max_length=4, german=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

PSHP Soundex/Viewex Coding of a first name.

This coding is based on [HBD76].

Reference was also made to the German version of the same: [HBD79].

A separate class, PSHPSoundexLast, is used for last names.

New in version 0.3.6.

Initialize PSHPSoundexFirst instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)

New in version 0.4.0.

_alphabetic = {49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N'}¶

_trans = {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '5', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶

encode(fname)[source]¶

Calculate the PSHP Soundex/Viewex Coding of a first name.

Parameters: fname (str) -- The first name to encode
Returns: The PSHP Soundex/Viewex Coding
Return type: str

Examples

>>> pe = PSHPSoundexFirst()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W352'
>>> pe.encode('James')
'J700'
>>> pe.encode('Schmidt')
'S500'
>>> pe.encode('Ashcroft')
'A220'
>>> pe.encode('John')
'J500'
>>> pe.encode('Colin')
'K400'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Sally')
'S400'
>>> pe.encode('Jane')
'J500'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(fname)[source]¶

Calculate the alphabetic PSHP Soundex/Viewex Coding of a first name.

Parameters: fname (str) -- The first name to encode
Returns: The alphabetic PSHP Soundex/Viewex Coding
Return type: str

Examples

>>> pe = PSHPSoundexFirst()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTNK'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SN'
>>> pe.encode_alpha('Ashcroft')
'AKK'
>>> pe.encode_alpha('John')
'JN'
>>> pe.encode_alpha('Colin')
'KL'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Sally')
'SL'
>>> pe.encode_alpha('Jane')
'JN'

New in version 0.4.0.

abydos.phonetic.pshp_soundex_first(fname, max_length=4, german=False)[source]¶

Calculate the PSHP Soundex/Viewex Coding of a first name.

This is a wrapper for PSHPSoundexFirst.encode().

Parameters

fname (str) -- The first name to encode
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)

Returns

The PSHP Soundex/Viewex Coding

Return type

str

Examples

>>> pshp_soundex_first('Smith')
'S530'
>>> pshp_soundex_first('Waters')
'W352'
>>> pshp_soundex_first('James')
'J700'
>>> pshp_soundex_first('Schmidt')
'S500'
>>> pshp_soundex_first('Ashcroft')
'A220'
>>> pshp_soundex_first('John')
'J500'
>>> pshp_soundex_first('Colin')
'K400'
>>> pshp_soundex_first('Niall')
'N400'
>>> pshp_soundex_first('Sally')
'S400'
>>> pshp_soundex_first('Jane')
'J500'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PSHPSoundexFirst.encode method instead.
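
The wrapper-plus-class arrangement described here follows a common pattern. The sketch below is purely illustrative: ToyEncoder and toy_encode are hypothetical stand-ins, and the use of the standard-library warnings module is an assumption rather than a description of how abydos issues its deprecation notices.

```python
import warnings


class ToyEncoder:
    """Hypothetical stand-in for a phonetic class with an encode method."""

    def __init__(self, max_length=4):
        self.max_length = max_length

    def encode(self, word):
        # Placeholder "encoding": uppercase and truncate.
        return word.upper()[: self.max_length]


def toy_encode(word, max_length=4):
    """Deprecated functional wrapper around ToyEncoder.encode (hypothetical)."""
    warnings.warn(
        'toy_encode is deprecated; use ToyEncoder.encode instead',
        DeprecationWarning,
        stacklevel=2,
    )
    return ToyEncoder(max_length).encode(word)


# Preferred style: build the encoder once, then reuse it for many names.
encoder = ToyEncoder(max_length=4)
print([encoder.encode(name) for name in ('Smith', 'Waters')])
```

Constructing the class once keeps the parameters in one place, whereas the wrapper has to be given them on every call.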

class abydos.phonetic.PSHPSoundexLast(max_length=4, german=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

PSHP Soundex/Viewex Coding of a last name.

This coding is based on [HBD76].

Reference was also made to the German version of the same: [HBD79].

A separate class, PSHPSoundexFirst, is used for first names.

New in version 0.3.6.

Initialize PSHPSoundexLast instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)

New in version 0.4.0.

_alphabetic = {49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N'}¶

_trans = {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '5', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶

encode(lname)[source]¶

Calculate the PSHP Soundex/Viewex Coding of a last name.

Parameters: lname (str) -- The last name to encode
Returns: The PSHP Soundex/Viewex Coding
Return type: str

Examples

>>> pe = PSHPSoundexLast()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W350'
>>> pe.encode('James')
'J500'
>>> pe.encode('Schmidt')
'S530'
>>> pe.encode('Ashcroft')
'A225'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(lname)[source]¶

Calculate the alphabetic PSHP Soundex/Viewex Coding of a last name.

Parameters: lname (str) -- The last name to encode
Returns: The alphabetic PSHP Soundex/Viewex Coding
Return type: str

Examples

>>> pe = PSHPSoundexLast()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTN'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKKN'

New in version 0.4.0.

abydos.phonetic.pshp_soundex_last(lname, max_length=4, german=False)[source]¶

Calculate the PSHP Soundex/Viewex Coding of a last name.

This is a wrapper for PSHPSoundexLast.encode().

Parameters

lname (str) -- The last name to encode
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)

Returns

The PSHP Soundex/Viewex Coding

Return type

str

Examples

>>> pshp_soundex_last('Smith')
'S530'
>>> pshp_soundex_last('Waters')
'W350'
>>> pshp_soundex_last('James')
'J500'
>>> pshp_soundex_last('Schmidt')
'S530'
>>> pshp_soundex_last('Ashcroft')
'A225'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PSHPSoundexLast.encode method instead.

class abydos.phonetic.NYSIIS(max_length=6, modified=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

NYSIIS Code.

The New York State Identification and Intelligence System algorithm is defined in [Taf70].

The modified version of this algorithm is described in Appendix B of [LA77].

New in version 0.3.6.

Initialize NYSIIS instance.

Parameters

max_length (int) -- The maximum length (default 6) of the code to return
modified (bool) -- Indicates whether to use USDA modified NYSIIS

New in version 0.4.0.

encode(word)[source]¶

Return the NYSIIS code for a word.

Parameters: word (str) -- The word to transform
Returns: The NYSIIS value
Return type: str

Examples

>>> pe = NYSIIS()
>>> pe.encode('Christopher')
'CRASTA'
>>> pe.encode('Niall')
'NAL'
>>> pe.encode('Smith')
'SNAT'
>>> pe.encode('Schmidt')
'SNAD'

>>> NYSIIS(max_length=-1).encode('Christopher')
'CRASTAFAR'

>>> pe_8m = NYSIIS(max_length=8, modified=True)
>>> pe_8m.encode('Christopher')
'CRASTAFA'
>>> pe_8m.encode('Niall')
'NAL'
>>> pe_8m.encode('Smith')
'SNAT'
>>> pe_8m.encode('Schmidt')
'SNAD'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.nysiis(word, max_length=6, modified=False)[source]¶

Return the NYSIIS code for a word.

This is a wrapper for NYSIIS.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The maximum length (default 6) of the code to return
modified (bool) -- Indicates whether to use USDA modified NYSIIS

Returns

The NYSIIS value

Return type

str

Examples

>>> nysiis('Christopher')
'CRASTA'
>>> nysiis('Niall')
'NAL'
>>> nysiis('Smith')
'SNAT'
>>> nysiis('Schmidt')
'SNAD'

>>> nysiis('Christopher', max_length=-1)
'CRASTAFAR'

>>> nysiis('Christopher', max_length=8, modified=True)
'CRASTAFA'
>>> nysiis('Niall', max_length=8, modified=True)
'NAL'
>>> nysiis('Smith', max_length=8, modified=True)
'SNAT'
>>> nysiis('Schmidt', max_length=8, modified=True)
'SNAD'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the NYSIIS.encode method instead.

class abydos.phonetic.MRA[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Western Airlines Surname Match Rating Algorithm.

A description of the Western Airlines Surname Match Rating Algorithm can be found on page 18 of [MKTM77].

New in version 0.3.6.

encode(word)[source]¶

Return the MRA personal numeric identifier (PNI) for a word.

Parameters: word (str) -- The word to transform
Returns: The MRA PNI
Return type: str

Examples

>>> pe = MRA()
>>> pe.encode('Christopher')
'CHRPHR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHMDT'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
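
The commonly described steps of the Match Rating Algorithm encoding are short enough to sketch. This is an illustration under stated assumptions, not the abydos source: it treats only A, E, I, O and U as vowels, and the function name mra_codex_sketch is hypothetical. It does reproduce the four documented examples.

```python
from itertools import groupby


def mra_codex_sketch(word):
    """Approximate the MRA encoding steps (illustrative only)."""
    word = word.upper()
    # Step 1: drop vowels, except one that starts the word.
    word = word[:1] + ''.join(c for c in word[1:] if c not in 'AEIOU')
    # Step 2: collapse runs of a repeated letter to a single letter.
    word = ''.join(letter for letter, _ in groupby(word))
    # Step 3: if longer than six letters, keep the first three and last three.
    if len(word) > 6:
        word = word[:3] + word[-3:]
    return word


for name in ('Christopher', 'Niall', 'Smith', 'Schmidt'):
    print(name, mra_codex_sketch(name))
```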

abydos.phonetic.mra(word)[source]¶

Return the MRA personal numeric identifier (PNI) for a word.

This is a wrapper for MRA.encode().

Parameters: word (str) -- The word to transform
Returns: The MRA PNI
Return type: str

Examples

>>> mra('Christopher')
'CHRPHR'
>>> mra('Niall')
'NL'
>>> mra('Smith')
'SMTH'
>>> mra('Schmidt')
'SCHMDT'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the MRA.encode method instead.

class abydos.phonetic.Caverphone(version=2)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Caverphone.

A description of version 1 of the algorithm can be found in [Hoo02].

A description of version 2 of the algorithm can be found in [Hoo04].

New in version 0.3.6.

Initialize Caverphone instance.

Parameters: version (int) -- The version of Caverphone to employ for encoding (defaults to 2)

New in version 0.4.0.

encode(word)[source]¶

Return the Caverphone code for a word.

Parameters: word (str) -- The word to transform
Returns: The Caverphone value
Return type: str

Examples

>>> pe = Caverphone()
>>> pe.encode('Christopher')
'KRSTFA1111'
>>> pe.encode('Niall')
'NA11111111'
>>> pe.encode('Smith')
'SMT1111111'
>>> pe.encode('Schmidt')
'SKMT111111'

>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode('Christopher')
'KRSTF1'
>>> pe_1.encode('Niall')
'N11111'
>>> pe_1.encode('Smith')
'SMT111'
>>> pe_1.encode('Schmidt')
'SKMT11'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Caverphone code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Caverphone value
Return type: str

Examples

>>> pe = Caverphone()
>>> pe.encode_alpha('Christopher')
'KRSTFA'
>>> pe.encode_alpha('Niall')
'NA'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'SKMT'

>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode_alpha('Christopher')
'KRSTF'
>>> pe_1.encode_alpha('Niall')
'N'
>>> pe_1.encode_alpha('Smith')
'SMT'
>>> pe_1.encode_alpha('Schmidt')
'SKMT'

New in version 0.4.0.
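
Judging by the documented examples, the code returned by encode is padded with '1' to a fixed length (10 characters for version 2, 6 for version 1), and encode_alpha returns the same code without that padding. The sketch below shows only this padding step, with hypothetical helper names; it is an inference from the examples rather than the library's code.

```python
def pad_code(core, length):
    """Right-pad a code with '1' to a fixed length (hypothetical helper)."""
    return core.ljust(length, '1')[:length]


def strip_padding(code):
    """Remove trailing '1' padding to recover the alphabetic part."""
    return code.rstrip('1')


print(pad_code('KRSTFA', 10))       # KRSTFA1111 (version 2 length)
print(pad_code('KRSTF', 6))         # KRSTF1 (version 1 length)
print(strip_padding('NA11111111'))  # NA
```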

abydos.phonetic.caverphone(word, version=2)[source]¶

Return the Caverphone code for a word.

This is a wrapper for Caverphone.encode().

Parameters

word (str) -- The word to transform
version (int) -- The version of Caverphone to employ for encoding (defaults to 2)

Returns

The Caverphone value

Return type

str

Examples

>>> caverphone('Christopher')
'KRSTFA1111'
>>> caverphone('Niall')
'NA11111111'
>>> caverphone('Smith')
'SMT1111111'
>>> caverphone('Schmidt')
'SKMT111111'

>>> caverphone('Christopher', 1)
'KRSTF1'
>>> caverphone('Niall', 1)
'N11111'
>>> caverphone('Smith', 1)
'SMT111'
>>> caverphone('Schmidt', 1)
'SKMT11'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Caverphone.encode method instead.

class abydos.phonetic.AlphaSIS(max_length=14)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Alpha-SIS.

The Alpha Search Inquiry System code is defined in [IBMCorporation73]. This implementation is based on the description in [MKTM77].

New in version 0.3.6.

Initialize AlphaSIS instance.

Parameters: max_length (int) -- The length of the code returned (defaults to 14)

New in version 0.4.0.

_alpha_sis_basic = {'B': '9', 'C': ('7', '6'), 'CE': '0', 'CH': ('6', '70', '0'), 'CI': '0', 'CK': ('7', '6'), 'CY': '0', 'CZ': ('70', '6', '0'), 'D': '1', 'DG': '7', 'DS': ('0', '10'), 'DZ': ('0', '10'), 'F': '8', 'G': '7', 'J': '6', 'K': ('7', '6'), 'L': '5', 'M': '3', 'N': '2', 'P': '9', 'PH': '8', 'Q': '7', 'R': '4', 'S': '0', 'SCH': '6', 'SH': '6', 'T': '1', 'TS': ('0', '10'), 'TZ': ('0', '10'), 'V': '8', 'X': '7', 'Z': '0'}¶

_alpha_sis_basic_order = ('SCH', 'CZ', 'CH', 'CK', 'DS', 'DZ', 'TS', 'TZ', 'CI', 'CY', 'CE', 'SH', 'DG', 'PH', 'C', 'K', 'Z', 'S', 'D', 'T', 'N', 'M', 'R', 'L', 'J', 'C', 'G', 'K', 'Q', 'X', 'F', 'V', 'B', 'P')¶

_alpha_sis_initials = {'A': '1', 'E': '1', 'GF': '08', 'GM': '03', 'GN': '02', 'H': '2', 'I': '1', 'J': '3', 'KN': '02', 'O': '1', 'PF': '08', 'PN': '02', 'PS': '00', 'U': '1', 'W': '4', 'WR': '04', 'Y': '5'}¶

_alpha_sis_initials_order = ('GF', 'GM', 'GN', 'KN', 'PF', 'PN', 'PS', 'WR', 'A', 'E', 'H', 'I', 'J', 'O', 'U', 'W', 'Y')¶

_alphabetic_initials = {48: ' ', 49: 'A', 50: 'H', 51: 'J', 52: 'W', 53: 'Y'}¶

_alphabetic_non_initials = {48: 'S', 49: 'T', 50: 'N', 51: 'M', 52: 'R', 53: 'L', 54: 'J', 55: 'K', 56: 'F', 57: 'P'}¶

encode(word)[source]¶

Return the IBM Alpha Search Inquiry System code for a word.

The return type is a tuple because a single word can have multiple codes. The tuple is ordered: the first value is the primary coding.

Parameters: word (str) -- The word to transform
Returns: The Alpha-SIS value
Return type: tuple

Examples

>>> pe = AlphaSIS()
>>> pe.encode('Christopher')
('06401840000000', '07040184000000', '04018400000000')
>>> pe.encode('Niall')
('02500000000000',)
>>> pe.encode('Smith')
('03100000000000',)
>>> pe.encode('Schmidt')
('06310000000000',)

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Alpha-SIS code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Alpha-SIS value
Return type: tuple

Examples

>>> pe = AlphaSIS()
>>> pe.encode_alpha('Christopher')
('JRSTFR', 'KSRSTFR', 'RSTFR')
>>> pe.encode_alpha('Niall')
('NL',)
>>> pe.encode_alpha('Smith')
('MT',)
>>> pe.encode_alpha('Schmidt')
('JMT',)

New in version 0.4.0.
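
The two alphabetic tables documented for this class suggest how a numeric code maps to letters: the first digit uses the initials table and the remaining digits use the non-initials table. The sketch below is an assumption-laden illustration, not the library's implementation. It assumes trailing zeros are padding, and alpha_sis_to_letters is a hypothetical helper. It also shows how the primary coding can be separated from the alternates.

```python
# Tables copied from the documented class attributes (keys are code points).
_ALPHABETIC_INITIALS = {48: ' ', 49: 'A', 50: 'H', 51: 'J', 52: 'W', 53: 'Y'}
_ALPHABETIC_NON_INITIALS = {
    48: 'S', 49: 'T', 50: 'N', 51: 'M', 52: 'R',
    53: 'L', 54: 'J', 55: 'K', 56: 'F', 57: 'P',
}


def alpha_sis_to_letters(code):
    """Convert one numeric Alpha-SIS code to letters (hypothetical helper)."""
    # Assumption: trailing zeros are padding.
    code = code.rstrip('0')
    first = code[:1].translate(_ALPHABETIC_INITIALS)
    rest = code[1:].translate(_ALPHABETIC_NON_INITIALS)
    # A leading '0' maps to a space, which is dropped here.
    return (first + rest).strip()


codes = ('06401840000000', '07040184000000', '04018400000000')
primary, *alternates = codes  # the first value is the primary coding
print(alpha_sis_to_letters(primary))                   # JRSTFR
print([alpha_sis_to_letters(c) for c in alternates])   # ['KSRSTFR', 'RSTFR']
```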

abydos.phonetic.alpha_sis(word, max_length=14)[source]¶

Return the IBM Alpha Search Inquiry System code for a word.

This is a wrapper for AlphaSIS.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 14)

Returns

The Alpha-SIS value

Return type

tuple

Examples

>>> alpha_sis('Christopher')
('06401840000000', '07040184000000', '04018400000000')
>>> alpha_sis('Niall')
('02500000000000',)
>>> alpha_sis('Smith')
('03100000000000',)
>>> alpha_sis('Schmidt')
('06310000000000',)

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the AlphaSIS.encode method instead.

class abydos.phonetic.Davidson(omit_fname=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Davidson Consonant Code.

This is based on the name compression system described in [Dav62].

[Dol70] identifies this as having been the name compression algorithm used by SABRE.

New in version 0.3.6.

Initialize Davidson instance.

Parameters: omit_fname (bool) -- Set to True to completely omit the first character of the first name

New in version 0.4.0.

_trans = {65: '', 69: '', 72: '', 73: '', 79: '', 85: '', 87: '', 89: ''}¶

encode(lname, fname='.')[source]¶

Return Davidson's Consonant Code.

Parameters

lname (str) -- Last name (or word) to be encoded
fname (str) -- First name (optional), of which the first character is included in the code.

Returns

Davidson's Consonant Code

Return type

str

Examples

>>> pe = Davidson()
>>> pe.encode('Gough')
'G   .'
>>> pe.encode('pneuma')
'PNM .'
>>> pe.encode('knight')
'KNGT.'
>>> pe.encode('trice')
'TRC .'
>>> pe.encode('judge')
'JDG .'
>>> pe.encode('Smith', 'James')
'SMT J'
>>> pe.encode('Wasserman', 'Tabitha')
'WSRMT'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
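
The documented _trans table and the examples above are consistent with a short procedure: keep the first letter, delete A, E, H, I, O, U, W and Y from the rest, collapse repeated letters, fit the result to four characters, and append the first initial. The following sketch works under those assumptions. The function name davidson_sketch is hypothetical and the step order is inferred from the examples, not taken from the library source.

```python
from itertools import groupby

# Deletion table copied from the documented _trans attribute:
# the code points of A, E, H, I, O, U, W and Y all map to ''.
_TRANS = {65: '', 69: '', 72: '', 73: '', 79: '', 85: '', 87: '', 89: ''}


def davidson_sketch(lname, fname='.', omit_fname=False):
    """Approximate Davidson's Consonant Code (illustrative only)."""
    lname = lname.upper()
    # Keep the first letter; delete vowels, H, W and Y from the rest.
    code = lname[:1] + lname[1:].translate(_TRANS)
    # Collapse runs of a repeated letter.
    code = ''.join(letter for letter, _ in groupby(code))
    # Truncate or space-pad to four characters.
    code = code[:4].ljust(4)
    if not omit_fname:
        code += fname[:1].upper()
    return code


print(repr(davidson_sketch('Gough')))                 # 'G   .'
print(repr(davidson_sketch('Wasserman', 'Tabitha')))  # 'WSRMT'
```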

abydos.phonetic.davidson(lname, fname='.', omit_fname=False)[source]¶

Return Davidson's Consonant Code.

This is a wrapper for Davidson.encode().

Parameters

lname (str) -- Last name (or word) to be encoded
fname (str) -- First name (optional), of which the first character is included in the code.
omit_fname (bool) -- Set to True to completely omit the first character of the first name

Returns

Davidson's Consonant Code

Return type

str

Examples

>>> davidson('Gough')
'G   .'
>>> davidson('pneuma')
'PNM .'
>>> davidson('knight')
'KNGT.'
>>> davidson('trice')
'TRC .'
>>> davidson('judge')
'JDG .'
>>> davidson('Smith', 'James')
'SMT J'
>>> davidson('Wasserman', 'Tabitha')
'WSRMT'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Davidson.encode method instead.

class abydos.phonetic.Dolby(max_length=-1, keep_vowels=False, vowel_char='*')[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Dolby Code.

This follows "A Spelling Equivalent Abbreviation Algorithm For Personal Names" from [Dol70] and [C+69].

New in version 0.3.6.

Initialize Dolby instance.

Parameters

max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
keep_vowels (bool) -- If True, retains all vowel markers
vowel_char (str) -- The vowel marker character (defaults to *)

New in version 0.4.0.

encode(word)[source]¶

Return the Dolby Code of a name.

Parameters: word (str) -- The word to transform
Returns: The Dolby Code
Return type: str

Examples

>>> pe = Dolby()
>>> pe.encode('Hansen')
'H*NSN'
>>> pe.encode('Larsen')
'L*RSN'
>>> pe.encode('Aagaard')
'*GR'
>>> pe.encode('Braaten')
'BR*DN'
>>> pe.encode('Sandvik')
'S*NVK'

>>> pe_6 = Dolby(max_length=6)
>>> pe_6.encode('Hansen')
'H*NS*N'
>>> pe_6.encode('Larsen')
'L*RS*N'
>>> pe_6.encode('Aagaard')
'*G*R  '
>>> pe_6.encode('Braaten')
'BR*D*N'
>>> pe_6.encode('Sandvik')
'S*NF*K'

>>> pe.encode('Smith')
'SM*D'
>>> pe.encode('Waters')
'W*DRS'
>>> pe.encode('James')
'J*MS'
>>> pe.encode('Schmidt')
'SM*D'
>>> pe.encode('Ashcroft')
'*SKRFD'

>>> pe_6.encode('Smith')
'SM*D  '
>>> pe_6.encode('Waters')
'W*D*RS'
>>> pe_6.encode('James')
'J*M*S '
>>> pe_6.encode('Schmidt')
'SM*D  '
>>> pe_6.encode('Ashcroft')
'*SKRFD'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Dolby Code of a name.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Dolby Code
Return type: str

Examples

>>> pe = Dolby()
>>> pe.encode_alpha('Hansen')
'HANSN'
>>> pe.encode_alpha('Larsen')
'LARSN'
>>> pe.encode_alpha('Aagaard')
'AGR'
>>> pe.encode_alpha('Braaten')
'BRADN'
>>> pe.encode_alpha('Sandvik')
'SANVK'

New in version 0.4.0.

abydos.phonetic.dolby(word, max_length=-1, keep_vowels=False, vowel_char='*')[source]¶

Return the Dolby Code of a name.

This is a wrapper for Dolby.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
keep_vowels (bool) -- If True, retains all vowel markers
vowel_char (str) -- The vowel marker character (defaults to *)

Returns

The Dolby Code

Return type

str

Examples

>>> dolby('Hansen')
'H*NSN'
>>> dolby('Larsen')
'L*RSN'
>>> dolby('Aagaard')
'*GR'
>>> dolby('Braaten')
'BR*DN'
>>> dolby('Sandvik')
'S*NVK'
>>> dolby('Hansen', max_length=6)
'H*NS*N'
>>> dolby('Larsen', max_length=6)
'L*RS*N'
>>> dolby('Aagaard', max_length=6)
'*G*R  '
>>> dolby('Braaten', max_length=6)
'BR*D*N'
>>> dolby('Sandvik', max_length=6)
'S*NF*K'

>>> dolby('Smith')
'SM*D'
>>> dolby('Waters')
'W*DRS'
>>> dolby('James')
'J*MS'
>>> dolby('Schmidt')
'SM*D'
>>> dolby('Ashcroft')
'*SKRFD'
>>> dolby('Smith', max_length=6)
'SM*D  '
>>> dolby('Waters', max_length=6)
'W*D*RS'
>>> dolby('James', max_length=6)
'J*M*S '
>>> dolby('Schmidt', max_length=6)
'SM*D  '
>>> dolby('Ashcroft', max_length=6)
'*SKRFD'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Dolby.encode method instead.

class abydos.phonetic.SPFC[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Standardized Phonetic Frequency Code (SPFC).

Standardized Phonetic Frequency Code is roughly Soundex-like. This implementation is based on pages 19-21 of [MKTM77].

New in version 0.3.6.

_pf1 = {65: '3', 66: '3', 67: '1', 68: '5', 69: '6', 70: '2', 71: '7', 72: '5', 73: '5', 74: '7', 75: '1', 76: '4', 77: '6', 78: '6', 79: '4', 80: '2', 81: '1', 82: '4', 83: '0', 84: '7', 85: '2', 86: '1', 87: '2', 88: '6', 90: '0'}¶

_pf1_alphabetic = {48: 'S', 49: 'C', 50: 'F', 51: 'A', 52: 'L', 53: 'D', 54: 'E', 55: 'G'}¶

_pf2 = {65: '3', 66: '3', 67: '1', 68: '5', 69: '9', 70: '2', 71: '7', 72: '5', 73: '5', 74: '7', 75: '1', 76: '9', 77: '6', 78: '6', 79: '4', 80: '2', 81: '1', 82: '4', 83: '0', 84: '7', 85: '8', 86: '8', 87: '8', 88: '2', 90: '0'}¶

_pf2_alphabetic = {48: 'S', 49: 'C', 50: 'F', 51: 'A', 52: 'O', 53: 'D', 54: 'M', 55: 'G', 56: 'U', 57: 'E'}¶

_pf3 = {65: '7', 66: '0', 67: '0', 68: '1', 69: '7', 70: '2', 71: '3', 72: '7', 73: '7', 74: '3', 75: '0', 76: '2', 77: '4', 78: '4', 79: '7', 80: '2', 81: '0', 82: '5', 83: '6', 84: '1', 85: '7', 86: '0', 87: '7', 88: '3', 89: '7', 90: '6'}¶

_pf3_alphabetic = {48: 'B', 49: 'D', 50: 'F', 51: 'G', 52: 'M', 53: 'R', 54: 'S', 55: 'Z'}¶

_substitutions = (('DK', 'K'), ('DT', 'T'), ('SC', 'S'), ('KN', 'N'), ('MN', 'N'))¶

encode(word)[source]¶

Return the Standardized Phonetic Frequency Code (SPFC) of a word.

Parameters: word (str) -- The word to transform
Returns: The SPFC value
Return type: str
Raises: AttributeError -- The word argument must be a string with a space or period dividing the first and last names, or a tuple/list consisting of the first and last names

Examples

>>> pe = SPFC()
>>> pe.encode('Christopher Smith')
'01160'
>>> pe.encode('Christopher Schmidt')
'01160'
>>> pe.encode('Niall Smith')
'01660'
>>> pe.encode('Niall Schmidt')
'01660'

>>> pe.encode('L.Smith')
'01960'
>>> pe.encode('R.Miller')
'65490'

>>> pe.encode(('L', 'Smith'))
'01960'
>>> pe.encode(('R', 'Miller'))
'65490'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
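
The accepted input forms (a string split by a space or a period, or a two-item tuple/list) can be illustrated with a small parsing helper. This is a sketch only: split_spfc_name is a hypothetical name, and the order in which separators are tried is an assumption.

```python
def split_spfc_name(word):
    """Split SPFC input into (first, last) names (hypothetical helper)."""
    if isinstance(word, str):
        # Try a period first ('L.Smith'), then a space ('Niall Smith').
        for sep in ('.', ' '):
            parts = word.split(sep, 1)
            if len(parts) == 2:
                return tuple(parts)
    elif isinstance(word, (tuple, list)) and len(word) == 2:
        return tuple(word)
    raise AttributeError(
        'word must be a string with a space or period dividing the first '
        'and last names, or a tuple/list of the first and last names'
    )


print(split_spfc_name('L.Smith'))            # ('L', 'Smith')
print(split_spfc_name('Christopher Smith'))  # ('Christopher', 'Smith')
print(split_spfc_name(['R', 'Miller']))      # ('R', 'Miller')
```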

encode_alpha(word)[source]¶

Return the alphabetic SPFC of a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic SPFC value
Return type: str

Examples

>>> pe = SPFC()
>>> pe.encode_alpha('Christopher Smith')
'SDCMS'
>>> pe.encode_alpha('Christopher Schmidt')
'SDCMS'
>>> pe.encode_alpha('Niall Smith')
'SDMMS'
>>> pe.encode_alpha('Niall Schmidt')
'SDMMS'

>>> pe.encode_alpha('L.Smith')
'SDEMS'
>>> pe.encode_alpha('R.Miller')
'EROES'

>>> pe.encode_alpha(('L', 'Smith'))
'SDEMS'
>>> pe.encode_alpha(('R', 'Miller'))
'EROES'

New in version 0.4.0.

abydos.phonetic.spfc(word)[source]¶

Return the Standardized Phonetic Frequency Code (SPFC) of a word.

This is a wrapper for SPFC.encode().

Parameters: word (str) -- The word to transform
Returns: The SPFC value
Return type: str

Examples

>>> spfc('Christopher Smith')
'01160'
>>> spfc('Christopher Schmidt')
'01160'
>>> spfc('Niall Smith')
'01660'
>>> spfc('Niall Schmidt')
'01660'

>>> spfc('L.Smith')
'01960'
>>> spfc('R.Miller')
'65490'

>>> spfc(('L', 'Smith'))
'01960'
>>> spfc(('R', 'Miller'))
'65490'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SPFC.encode method instead.

class abydos.phonetic.RogerRoot(max_length=5, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Roger Root code.

This is Roger Root name coding, described in [MKTM77].

New in version 0.3.6.

Initialize RogerRoot instance.

Parameters

max_length (int) -- The maximum length (default 5) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.

_alphabetic = {48: 'S', 49: 'T', 50: 'N', 51: 'M', 52: 'R', 53: 'L', 54: 'J', 55: 'K', 56: 'F', 57: 'P'}¶

_alphabetic_initial = {48: ' ', 49: 'A', 50: 'H', 51: 'J', 52: 'W', 53: 'Y'}¶

_init_patterns = {1: {'A': '1', 'B': '09', 'C': '07', 'D': '01', 'E': '1', 'F': '08', 'G': '07', 'H': '2', 'I': '1', 'J': '3', 'K': '07', 'L': '05', 'M': '03', 'N': '02', 'O': '1', 'P': '09', 'Q': '07', 'R': '04', 'S': '0*0', 'T': '01', 'U': '1', 'V': '08', 'W': '4', 'X': '07', 'Y': '5', 'Z': '0*0'}, 2: {'CE': '0*0', 'CH': '06', 'CI': '0*0', 'CY': '0*0', 'DG': '07', 'GF': '08', 'GM': '03', 'GN': '02', 'KN': '02', 'PF': '08', 'PH': '08', 'PN': '02', 'SH': '06', 'TS': '0*0', 'WR': '04'}, 3: {'SCH': '06', 'TSH': '06'}, 4: {'TSCH': '06'}}¶

_med_patterns = {1: {'A': '*', 'B': '9', 'C': '7', 'D': '1', 'E': '*', 'F': '8', 'G': '7', 'H': '*', 'I': '*', 'J': '6', 'K': '7', 'L': '5', 'M': '3', 'N': '2', 'O': '*', 'P': '9', 'Q': '7', 'R': '4', 'S': '0', 'T': '1', 'U': '*', 'V': '8', 'W': '*', 'X': '7', 'Y': '*', 'Z': '0'}, 2: {'CE': '0', 'CH': '6', 'CI': '0', 'CY': '0', 'DG': '7', 'PH': '8', 'SH': '6', 'TS': '0'}, 3: {'SCH': '6', 'TSH': '6'}, 4: {'TSCH': '6'}}¶

encode(word)[source]¶

Return the Roger Root code for a word.

Parameters: word (str) -- The word to transform
Returns: The Roger Root code
Return type: str

Examples

>>> pe = RogerRoot()
>>> pe.encode('Christopher')
'06401'
>>> pe.encode('Niall')
'02500'
>>> pe.encode('Smith')
'00310'
>>> pe.encode('Schmidt')
'06310'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
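
The documented _init_patterns and _med_patterns tables are keyed by pattern length, which suggests a longest-match-first lookup. The sketch below is a reconstruction under that assumption and is not the library source. The names roger_root_sketch and longest_match are hypothetical, and the post-processing steps (collapsing repeats, dropping '*', zero padding) are inferred from the documented examples, which it reproduces.

```python
from itertools import groupby

# Pattern tables copied from the documented _init_patterns and _med_patterns
# attributes, keyed by pattern length. '*' marks a sound that is dropped later.
INIT_PATTERNS = {
    1: {'A': '1', 'B': '09', 'C': '07', 'D': '01', 'E': '1', 'F': '08',
        'G': '07', 'H': '2', 'I': '1', 'J': '3', 'K': '07', 'L': '05',
        'M': '03', 'N': '02', 'O': '1', 'P': '09', 'Q': '07', 'R': '04',
        'S': '0*0', 'T': '01', 'U': '1', 'V': '08', 'W': '4', 'X': '07',
        'Y': '5', 'Z': '0*0'},
    2: {'CE': '0*0', 'CH': '06', 'CI': '0*0', 'CY': '0*0', 'DG': '07',
        'GF': '08', 'GM': '03', 'GN': '02', 'KN': '02', 'PF': '08',
        'PH': '08', 'PN': '02', 'SH': '06', 'TS': '0*0', 'WR': '04'},
    3: {'SCH': '06', 'TSH': '06'},
    4: {'TSCH': '06'},
}
MED_PATTERNS = {
    1: {'A': '*', 'B': '9', 'C': '7', 'D': '1', 'E': '*', 'F': '8',
        'G': '7', 'H': '*', 'I': '*', 'J': '6', 'K': '7', 'L': '5',
        'M': '3', 'N': '2', 'O': '*', 'P': '9', 'Q': '7', 'R': '4',
        'S': '0', 'T': '1', 'U': '*', 'V': '8', 'W': '*', 'X': '7',
        'Y': '*', 'Z': '0'},
    2: {'CE': '0', 'CH': '6', 'CI': '0', 'CY': '0', 'DG': '7', 'PH': '8',
        'SH': '6', 'TS': '0'},
    3: {'SCH': '6', 'TSH': '6'},
    4: {'TSCH': '6'},
}


def longest_match(patterns, word, pos):
    """Return (code, length) for the longest pattern starting at pos."""
    for size in range(4, 0, -1):
        chunk = word[pos:pos + size]
        if chunk in patterns[size]:
            return patterns[size][chunk], size
    return '', 1  # not reached for A-Z input; skips the character otherwise


def roger_root_sketch(word, max_length=5, zero_pad=True):
    """Approximate the Roger Root code (illustrative only)."""
    word = ''.join(c for c in word.upper() if 'A' <= c <= 'Z')
    code, pos = longest_match(INIT_PATTERNS, word, 0) if word else ('', 0)
    while pos < len(word):
        digits, size = longest_match(MED_PATTERNS, word, pos)
        code += digits
        pos += size
    # Collapse repeated symbols, then drop the '*' placeholders.
    code = ''.join(ch for ch, _ in groupby(code)).replace('*', '')
    if zero_pad:
        code = code.ljust(max_length, '0')
    return code[:max_length]


for name in ('Christopher', 'Niall', 'Smith', 'Schmidt'):
    print(name, roger_root_sketch(name))
```

Trying four-letter patterns before shorter ones ensures that, for example, SCH is coded as a unit rather than as S followed by CH.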

encode_alpha(word)[source]¶

Return the alphabetic Roger Root code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Roger Root code
Return type: str

Examples

>>> pe = RogerRoot()
>>> pe.encode_alpha('Christopher')
'JRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'JMT'

New in version 0.4.0.

abydos.phonetic.roger_root(word, max_length=5, zero_pad=True)[source]¶

Return the Roger Root code for a word.

This is a wrapper for RogerRoot.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The maximum length (default 5) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The Roger Root code

Return type

str

Examples

>>> roger_root('Christopher')
'06401'
>>> roger_root('Niall')
'02500'
>>> roger_root('Smith')
'00310'
>>> roger_root('Schmidt')
'06310'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RogerRoot.encode method instead.

class abydos.phonetic.StatisticsCanada(max_length=4)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Statistics Canada code.

The original description of this algorithm could not be located, and may only have been specified in an unpublished technical report. The coding does not appear to be in use by Statistics Canada any longer. In its place, this is an implementation of the "Census modified Statistics Canada name coding procedure".

The modified version of this algorithm is described in Appendix B of [MKTM77].

New in version 0.3.6.

Initialize StatisticsCanada instance.

Parameters: max_length (int) -- The maximum length (default 4) of the code to return

New in version 0.4.0.

encode(word)[source]¶

Return the Statistics Canada code for a word.

Parameters: word (str) -- The word to transform
Returns: The Statistics Canada name code value
Return type: str

Examples

>>> pe = StatisticsCanada()
>>> pe.encode('Christopher')
'CHRS'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHM'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.statistics_canada(word, max_length=4)[source]¶

Return the Statistics Canada code for a word.

This is a wrapper for StatisticsCanada.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The maximum length (default 4) of the code to return

Returns

The Statistics Canada name code value

Return type

str

Examples

>>> statistics_canada('Christopher')
'CHRS'
>>> statistics_canada('Niall')
'NL'
>>> statistics_canada('Smith')
'SMTH'
>>> statistics_canada('Schmidt')
'SCHM'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the StatisticsCanada.encode method instead.

class abydos.phonetic.SoundD(max_length=4)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

SoundD code.

SoundD is defined in [VB12].

New in version 0.3.6.

Initialize SoundD instance.

Parameters: max_length (int) -- The length of the code returned (defaults to 4)

New in version 0.4.0.

_alphabetic = {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}¶

_trans = {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶

encode(word)[source]¶

Return the SoundD code.

Parameters: word (str) -- The word to transform
Returns: The SoundD code
Return type: str

Examples

>>> pe = SoundD()
>>> pe.encode('Gough')
'2000'
>>> pe.encode('pneuma')
'5500'
>>> pe.encode('knight')
'5300'
>>> pe.encode('trice')
'3620'
>>> pe.encode('judge')
'2200'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic SoundD code.

Parameters: word (str) -- The word to transform
Returns: The alphabetic SoundD code
Return type: str

Examples

>>> pe = SoundD()
>>> pe.encode_alpha('Gough')
'K'
>>> pe.encode_alpha('pneuma')
'NN'
>>> pe.encode_alpha('knight')
'NT'
>>> pe.encode_alpha('trice')
'TRK'
>>> pe.encode_alpha('judge')
'KK'

New in version 0.4.0.

abydos.phonetic.sound_d(word, max_length=4)[source]¶

Return the SoundD code.

This is a wrapper for SoundD.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)

Returns

The SoundD code

Return type

str

Examples

>>> sound_d('Gough')
'2000'
>>> sound_d('pneuma')
'5500'
>>> sound_d('knight')
'5300'
>>> sound_d('trice')
'3620'
>>> sound_d('judge')
'2200'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SoundD.encode method instead.

class abydos.phonetic.ParmarKumbharana[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Parmar-Kumbharana code.

This is based on the phonetic algorithm proposed in [PK14].

New in version 0.3.6.

_del_trans = {65: '', 69: '', 73: '', 79: '', 85: '', 89: ''}¶

_rules = {2: {'CE': 'S', 'CI': 'S', 'CK': 'K', 'CY': 'S', 'GE': 'J', 'GI': 'J', 'GN': 'N', 'GY': 'J', 'KN': 'N', 'PN': 'N', 'SH': 'S', 'WR': 'R'}, 3: {'DGE': 'J', 'GHT': 'T', 'OUL': 'U'}, 4: {'OUGH': 'F'}}¶

encode(word)[source]¶

Return the Parmar-Kumbharana encoding of a word.

Parameters: word (str) -- The word to transform
Returns: The Parmar-Kumbharana encoding
Return type: str

Examples

>>> pe = ParmarKumbharana()
>>> pe.encode('Gough')
'GF'
>>> pe.encode('pneuma')
'NM'
>>> pe.encode('knight')
'NT'
>>> pe.encode('trice')
'TRS'
>>> pe.encode('judge')
'JJ'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
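
The documented _rules table (grouped by pattern length) and _del_trans table are enough to sketch the encoding. The following is an illustration under assumptions, not the library source: it assumes repeated letters are collapsed first, that longer patterns are tried before shorter ones at each position, and that vowels after the first letter are deleted last. The name parmar_kumbharana_sketch is hypothetical. It reproduces the documented examples.

```python
from itertools import groupby

# Tables copied from the documented _rules and _del_trans attributes.
RULES = {
    2: {'CE': 'S', 'CI': 'S', 'CK': 'K', 'CY': 'S', 'GE': 'J', 'GI': 'J',
        'GN': 'N', 'GY': 'J', 'KN': 'N', 'PN': 'N', 'SH': 'S', 'WR': 'R'},
    3: {'DGE': 'J', 'GHT': 'T', 'OUL': 'U'},
    4: {'OUGH': 'F'},
}
DEL_TRANS = {65: '', 69: '', 73: '', 79: '', 85: '', 89: ''}


def parmar_kumbharana_sketch(word):
    """Approximate the Parmar-Kumbharana encoding (illustrative only)."""
    # Assumption: collapse runs of a repeated letter before applying rules.
    word = ''.join(ch for ch, _ in groupby(word.upper()))
    i = 0
    while i < len(word):
        for size in (4, 3, 2):  # longest pattern first
            repl = RULES[size].get(word[i:i + size])
            if repl is not None:
                word = word[:i] + repl + word[i + size:]
                i += len(repl)
                break
        else:
            i += 1  # no rule matched at this position
    # Delete A, E, I, O, U and Y everywhere except the first position.
    return word[:1] + word[1:].translate(DEL_TRANS)


for name in ('Gough', 'pneuma', 'knight', 'trice', 'judge'):
    print(name, parmar_kumbharana_sketch(name))
```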

abydos.phonetic.parmar_kumbharana(word)[source]¶

Return the Parmar-Kumbharana encoding of a word.

This is a wrapper for ParmarKumbharana.encode().

Parameters: word (str) -- The word to transform
Returns: The Parmar-Kumbharana encoding
Return type: str

Examples

>>> parmar_kumbharana('Gough')
'GF'
>>> parmar_kumbharana('pneuma')
'NM'
>>> parmar_kumbharana('knight')
'NT'
>>> parmar_kumbharana('trice')
'TRS'
>>> parmar_kumbharana('judge')
'JJ'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the ParmarKumbharana.encode method instead.

class abydos.phonetic.Metaphone(max_length=-1)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Metaphone.

Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].

New in version 0.3.6.

Initialize Metaphone instance.

Parameters: max_length (int) -- The maximum length of the returned Metaphone code (the default value of -1 is treated as 64, but in Philips' original implementation this was 4)

New in version 0.4.0.

_frontv = {'E', 'I', 'Y'}¶

_varson = {'C', 'G', 'P', 'S', 'T'}¶

encode(word)[source]¶

Return the Metaphone code for a word.

Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].

Parameters: word (str) -- The word to transform
Returns: The Metaphone value
Return type: str

Examples

>>> pe = Metaphone()
>>> pe.encode('Christopher')
'KRSTFR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SM0'
>>> pe.encode('Schmidt')
'SKMTT'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.metaphone(word, max_length=-1)[source]¶

Return the Metaphone code for a word.

This is a wrapper for Metaphone.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The maximum length of the returned Metaphone code (the default value of -1 is treated as 64, but in Philips' original implementation this was 4)

Returns

The Metaphone value

Return type

str

Examples

>>> metaphone('Christopher')
'KRSTFR'
>>> metaphone('Niall')
'NL'
>>> metaphone('Smith')
'SM0'
>>> metaphone('Schmidt')
'SKMTT'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Metaphone.encode method instead.

class abydos.phonetic.DoubleMetaphone(max_length=-1)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Double Metaphone.

Based on Lawrence Philips' (Visual) C++ code from 1999 [Phi00].

New in version 0.3.6.

Initialize DoubleMetaphone instance.

Parameters: max_length (int) -- The maximum length of the returned Double Metaphone codes (defaults to unlimited, but in Philips' original implementation this was 4)

New in version 0.4.0.

encode(word)[source]¶

Return the Double Metaphone code for a word.

Parameters: word (str) -- The word to transform
Returns: The Double Metaphone value(s)
Return type: tuple

Examples

>>> pe = DoubleMetaphone()
>>> pe.encode('Christopher')
('KRSTFR', '')
>>> pe.encode('Niall')
('NL', '')
>>> pe.encode('Smith')
('SM0', 'XMT')
>>> pe.encode('Schmidt')
('XMT', 'SMT')

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
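
Because encode returns a tuple of a primary and a secondary code, one common way to use the result is to treat two names as a possible match when their tuples share any non-empty code. The sketch below illustrates that idea using the literal values from the documented examples. The helper share_code is hypothetical and is not part of abydos.

```python
def share_code(codes_a, codes_b):
    """Return True if two code tuples share any non-empty code."""
    return bool({c for c in codes_a if c} & {c for c in codes_b if c})


# Literal values taken from the documented examples.
smith = ('SM0', 'XMT')
schmidt = ('XMT', 'SMT')
niall = ('NL', '')
christopher = ('KRSTFR', '')

print(share_code(smith, schmidt))      # True: both contain 'XMT'
print(share_code(niall, christopher))  # False: empty secondary codes are ignored
```

Filtering out the empty string matters here, since otherwise any two names without a secondary code would appear to match.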

encode_alpha(word)[source]¶

Return the alphabetic Double Metaphone code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Double Metaphone value(s)
Return type: tuple

Examples

>>> pe = DoubleMetaphone()
>>> pe.encode_alpha('Christopher')
('KRSTFR', '')
>>> pe.encode_alpha('Niall')
('NL', '')
>>> pe.encode_alpha('Smith')
('SMÞ', 'XMT')
>>> pe.encode_alpha('Schmidt')
('XMT', 'SMT')

New in version 0.4.0.

abydos.phonetic.double_metaphone(word, max_length=-1)[source]¶

Return the Double Metaphone code for a word.

This is a wrapper for DoubleMetaphone.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The maximum length of the returned Double Metaphone codes (defaults to unlimited, but in Philips' original implementation this was 4)

Returns

The Double Metaphone value(s)

Return type

tuple

Examples

>>> double_metaphone('Christopher')
('KRSTFR', '')
>>> double_metaphone('Niall')
('NL', '')
>>> double_metaphone('Smith')
('SM0', 'XMT')
>>> double_metaphone('Schmidt')
('XMT', 'SMT')

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the DoubleMetaphone.encode method instead.

class abydos.phonetic.Eudex(max_length=8)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Eudex hash.

This implementation of eudex phonetic hashing is based on the specification (not the reference implementation) at [Tic].

Further details can be found at [Tic16].

New in version 0.3.6.

Initialize Eudex instance.

Parameters: max_length (int) -- The length in bytes of the code returned (default 8, giving a 64-bit hash)

New in version 0.4.0.

_initial_phones = {'a': 132, 'b': 36, 'c': 6, 'd': 12, 'e': 216, 'f': 34, 'g': 4, 'h': 2, 'i': 248, 'j': 3, 'k': 5, 'l': 80, 'm': 1, 'n': 9, 'o': 148, 'p': 37, 'q': 84, 'r': 81, 's': 10, 't': 14, 'u': 224, 'v': 35, 'w': 0, 'x': 66, 'y': 228, 'z': 74, 'ß': 11, 'à': 133, 'á': 133, 'â': 128, 'ã': 134, 'ä': 166, 'å': 194, 'æ': 167, 'ç': 84, 'è': 217, 'é': 217, 'ê': 217, 'ë': 198, 'ì': 249, 'í': 249, 'î': 249, 'ï': 249, 'ð': 11, 'ñ': 11, 'ò': 149, 'ó': 149, 'ô': 149, 'õ': 149, 'ö': 220, '÷': 255, 'ø': 221, 'ù': 225, 'ú': 225, 'û': 225, 'ü': 229, 'ý': 229, 'þ': 11, 'ÿ': 229}¶

_trailing_phones = {'a': 0, 'b': 72, 'c': 12, 'd': 24, 'e': 0, 'f': 68, 'g': 8, 'h': 4, 'i': 1, 'j': 5, 'k': 9, 'l': 160, 'm': 2, 'n': 18, 'o': 0, 'p': 73, 'q': 168, 'r': 161, 's': 20, 't': 29, 'u': 1, 'v': 69, 'w': 0, 'x': 132, 'y': 1, 'z': 148, 'ß': 21, 'à': 0, 'á': 0, 'â': 0, 'ã': 0, 'ä': 0, 'å': 1, 'æ': 0, 'ç': 149, 'è': 1, 'é': 1, 'ê': 1, 'ë': 1, 'ì': 1, 'í': 1, 'î': 1, 'ï': 1, 'ð': 21, 'ñ': 23, 'ò': 0, 'ó': 0, 'ô': 0, 'õ': 0, 'ö': 1, '÷': 255, 'ø': 1, 'ù': 1, 'ú': 1, 'û': 1, 'ü': 1, 'ý': 1, 'þ': 21, 'ÿ': 1}¶

encode(word)[source]¶

Return the eudex phonetic hash of a word.

Parameters: word (str) -- The word to transform
Returns: The eudex hash
Return type: int

Examples

>>> pe = Eudex()
>>> pe.encode('Colin')
432345564238053650
>>> pe.encode('Christopher')
433648490138894409
>>> pe.encode('Niall')
648518346341351840
>>> pe.encode('Smith')
720575940412906756
>>> pe.encode('Schmidt')
720589151732307997

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
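
The following is a minimal sketch of how per-letter values from the two tables above could be packed into a single integer. It is a reconstruction under stated assumptions, not the library's code: the first letter is looked up in the initial table, the remaining letters in the trailing table, adjacent values that differ only in their lowest bit are collapsed, and the result is packed one byte per value. The function name eudex_sketch and the reduced tables are hypothetical; only the table entries themselves are copied from above.

```python
# Subsets of the _initial_phones and _trailing_phones tables shown above.
INITIAL = {'c': 6, 'n': 9, 's': 10}
TRAILING = {'a': 0, 'c': 12, 'd': 24, 'h': 4, 'i': 1, 'l': 160,
            'm': 2, 'n': 18, 'o': 0, 't': 29}


def eudex_sketch(word, max_length=8):
    """Pack per-letter phone values into one integer (assumed procedure)."""
    letters = word.lower()
    # First letter uses the initial table, the rest use the trailing table.
    # Letters missing from these reduced tables raise KeyError, and an
    # empty word raises IndexError; the sketch does not handle either.
    values = [INITIAL[letters[0]]] + [TRAILING[ch] for ch in letters[1:]]

    # Collapse neighbours whose values are equal once the lowest bit is dropped.
    condensed = [values[0]]
    for prev, cur in zip(values, values[1:]):
        if cur >> 1 != prev >> 1:
            condensed.append(cur)

    # Keep the initial value in the top byte and right-align the rest.
    padding = [0] * max(0, max_length - len(condensed))
    packed = [condensed[0]] + padding + condensed[1:max_length]

    result = 0
    for value in packed:
        result = (result << 8) | value
    return result


print(eudex_sketch('Colin'))  # 432345564238053650
print(eudex_sketch('Smith'))  # 720575940412906756
```

Under these assumptions the sketch reproduces the documented values for Colin, Niall, Smith, and Schmidt. Each of max_length values occupies one byte of the result.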

abydos.phonetic.eudex(word, max_length=8)[source]¶

Return the eudex phonetic hash of a word.

This is a wrapper for Eudex.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length in bytes of the code returned (default 8, giving a 64-bit hash)

Returns

The eudex hash

Return type

int

Examples

>>> eudex('Colin')
432345564238053650
>>> eudex('Christopher')
433648490138894409
>>> eudex('Niall')
648518346341351840
>>> eudex('Smith')
720575940412906756
>>> eudex('Schmidt')
720589151732307997

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Eudex.encode method instead.

class abydos.phonetic.BeiderMorse(language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Beider-Morse Phonetic Matching.

The Beider-Morse Phonetic Matching algorithm is described in [BM08]. The reference implementation is licensed under GPLv3.

New in version 0.3.6.

Initialize BeiderMorse instance.

Parameters

language_arg (str or int) --
The language of the term; supported values include:
- any
- arabic
- cyrillic
- czech
- dutch
- english
- french
- german
- greek
- greeklatin
- hebrew
- hungarian
- italian
- latvian
- polish
- portuguese
- romanian
- russian
- spanish
- turkish
name_mode (str) --
The name mode of the algorithm:
- gen -- general (default)
- ash -- Ashkenazi
- sep -- Sephardic
match_mode (str) -- Matching mode: approx or exact
concat (bool) -- Concatenation mode
filter_langs (bool) -- Filter out incompatible languages

New in version 0.4.0.

_apply_final_rules(phonetic, final_rules, language_arg, strip)[source]¶

Apply a set of final rules to the phonetic encoding.

Parameters

phonetic (str) -- The term to which to apply the final rules
final_rules (tuple) -- The set of final phonetic transform regexps
language_arg (int) -- An integer representing the target language of the phonetic encoding
strip (bool) -- Flag to indicate whether to normalize the language attributes

Returns

A Beider-Morse phonetic code

Return type

str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_apply_rule_if_compat(phonetic, target, language_arg)[source]¶

Apply a phonetic regex if compatible.

This tests for compatible language rules. To do so, it applies the rule, expands the results, and detects alternatives with incompatible attributes. It then drops each alternative that has incompatible attributes and keeps those that are compatible.

If there are no compatible alternatives left, it returns False; otherwise it returns the compatible alternatives.

Parameters

phonetic (str) -- The Beider-Morse phonetic encoding (so far)
target (str) -- A proposed addition to the phonetic encoding
language_arg (int) -- An integer representing the target language of the phonetic encoding

Returns

A candidate encoding

Return type

str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_expand_alternates(phonetic)[source]¶

Expand phonetic alternates separated by |s.

Parameters: phonetic (str) -- A Beider-Morse phonetic encoding
Returns: A Beider-Morse phonetic code
Return type: str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
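
As an illustration of what expanding alternates means, the sketch below turns a string with parenthesized, |-separated groups into every combination. The input format ('a(b|c)d'), the list return value, and the name expand_alternates_sketch are assumptions made for this example; the actual method returns a str and also has to deal with bracketed language attributes, which are ignored here.

```python
import itertools
import re


def expand_alternates_sketch(phonetic):
    """Expand '(x|y)' groups into every combination (assumed input format)."""
    # Split into literal chunks and parenthesized groups, keeping the groups.
    parts = re.split(r'(\([^()]*\))', phonetic)
    choices = []
    for part in parts:
        if part.startswith('(') and part.endswith(')'):
            # A group contributes one option per |-separated alternative.
            choices.append(part[1:-1].split('|'))
        else:
            # A literal chunk contributes exactly one option.
            choices.append([part])
    return [''.join(combo) for combo in itertools.product(*choices)]


print(expand_alternates_sketch('a(b|c)d'))  # ['abd', 'acd']
```

Nested parentheses are not handled by this sketch.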

_language(name, name_mode)[source]¶

Return the best guess language ID for the word and language choices.

Parameters

name (str) -- The term to guess the language of
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)

Returns

Language ID

Return type

int

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_language_index_from_code(code, name_mode)[source]¶

Return the index value for a language code.

This returns l_any if more than one code is specified or the code is out of bounds.

Parameters

code (int) -- The language code to interpret
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)

Returns

Language code index

Return type

int

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_normalize_lang_attrs(text, strip)[source]¶

Remove embedded bracketed attributes.

This (potentially) bitwise-ANDs the bracketed attributes together and appends the result to the end. It is applied to a single alternative at a time, not to a parenthesized list: it removes all embedded bracketed attributes, ANDs them together, and places the combined attribute at the end. However, if strip is true, it can also remove embedded bracketed attributes from a parenthesized list.

Parameters

text (str) -- A Beider-Morse phonetic encoding (in progress)
strip (bool) -- Remove the bracketed attributes (and throw away)

Returns

A Beider-Morse phonetic code

Return type

str

Raises

ValueError -- No closing square bracket

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
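
The behaviour described above can be sketched as follows. This is a minimal illustration that assumes the attributes are decimal integers in square brackets, such as 'a[3]b[6]'; the name normalize_lang_attrs_sketch is hypothetical, and edge cases the real method may treat specially (for example a combined attribute of 0) are not modelled.

```python
def normalize_lang_attrs_sketch(text, strip):
    """Move bracketed numeric attributes to the end, AND-ed together."""
    combined = -1  # all bits set, so the first AND keeps the attribute as-is
    found = False
    while '[' in text:
        start = text.find('[')
        end = text.find(']', start)
        if end == -1:
            raise ValueError('No closing square bracket')
        combined &= int(text[start + 1:end])
        found = True
        # Cut the bracketed attribute out of the text.
        text = text[:start] + text[end + 1:]
    if strip or not found:
        return text
    return '{}[{}]'.format(text, combined)


print(normalize_lang_attrs_sketch('a[3]b[6]', False))  # ab[2]
print(normalize_lang_attrs_sketch('a[3]b[6]', True))   # ab
```

Here 3 & 6 is 2, so only the language bit shared by both attributes survives.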

_phonetic(term, name_mode, rules, final_rules1, final_rules2, language_arg=0, concat=False)[source]¶

Return the Beider-Morse encoding(s) of a term.

Parameters

term (str) -- The term to encode via Beider-Morse
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)
rules (tuple) -- The set of initial phonetic transform regexps
final_rules1 (tuple) -- The common set of final phonetic transform regexps
final_rules2 (tuple) -- The specific set of final phonetic transform regexps
language_arg (int) -- The language of the term
concat (bool) -- A flag to indicate concatenation

Returns

A Beider-Morse phonetic code

Return type

str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_phonetic_number(phonetic)[source]¶

Remove bracketed text from the end of a string.

Parameters: phonetic (str) -- A Beider-Morse phonetic encoding
Returns: A Beider-Morse phonetic code
Return type: str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_phonetic_numbers(phonetic)[source]¶

Prepare & join phonetic numbers.

Split the phonetic value on '-', run each part through _pnums_with_leading_space, and join the results with ' '.

Parameters: phonetic (str) -- A Beider-Morse phonetic encoding
Returns: A Beider-Morse phonetic code
Return type: str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_pnums_with_leading_space(phonetic)[source]¶

Join prefixes & suffixes in cases of alternate phonetic values.

Parameters: phonetic (str) -- A Beider-Morse phonetic encoding
Returns: A Beider-Morse phonetic code
Return type: str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_redo_language(term, name_mode, rules, final_rules1, final_rules2, concat)[source]¶

Reassess the language of the terms and call the phonetic encoder.

Uses a split multi-word term.

Parameters

term (str) -- The term to encode via Beider-Morse
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)
rules (tuple) -- The set of initial phonetic transform regexps
final_rules1 (tuple) -- The common set of final phonetic transform regexps
final_rules2 (tuple) -- The specific set of final phonetic transform regexps
concat (bool) -- A flag to indicate concatenation

Returns

A Beider-Morse phonetic code

Return type

str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

_remove_dupes(phonetic)[source]¶

Remove duplicates from a phonetic encoding list.

Parameters: phonetic (str) -- A Beider-Morse phonetic encoding
Returns: A Beider-Morse phonetic code
Return type: str

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode(word)[source]¶

Return the Beider-Morse Phonetic Matching encoding(s) of a term.

Parameters: word (str) -- The word to transform
Returns: The Beider-Morse phonetic value(s)
Return type: str
Raises: ValueError -- Unknown language

Examples

>>> pe = BeiderMorse()
>>> pe.encode('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir
xristofir xristYfir xristopi xritopir xritopi xristofi xritofir
xritofi tzristopir tzristofir zristopir zristopi zritopir zritopi
zristofir zristofi zritofir zritofi'
>>> pe.encode('Niall')
'nial niol'
>>> pe.encode('Smith')
'zmit'
>>> pe.encode('Schmidt')
'zmit stzmit'

>>> BeiderMorse(language_arg='German').encode('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir
xristofir xristYfir'
>>> BeiderMorse(language_arg='English').encode('Christopher')
'tzristofir tzrQstofir tzristafir tzrQstafir xristofir xrQstofir
xristafir xrQstafir'
>>> BeiderMorse(language_arg='German',
... name_mode='ash').encode('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir
xristofir xristYfir'

>>> BeiderMorse(language_arg='German',
... match_mode='exact').encode('Christopher')
'xriStopher xriStofer xristopher xristofer'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.bmpm(word, language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False)[source]¶

Return the Beider-Morse Phonetic Matching encoding(s) of a term.

This is a wrapper for BeiderMorse.encode().

Parameters

word (str) -- The word to transform
language_arg (str or int) --
The language of the term; supported values include:
- any
- arabic
- cyrillic
- czech
- dutch
- english
- french
- german
- greek
- greeklatin
- hebrew
- hungarian
- italian
- latvian
- polish
- portuguese
- romanian
- russian
- spanish
- turkish
name_mode (str) --
The name mode of the algorithm:
- gen -- general (default)
- ash -- Ashkenazi
- sep -- Sephardic
match_mode (str) -- Matching mode: approx or exact
concat (bool) -- Concatenation mode
filter_langs (bool) -- Filter out incompatible languages

Returns

The Beider-Morse phonetic value(s)

Return type

str

Examples

>>> bmpm('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir xristopi xritopir xritopi xristofi xritofir xritofi
tzristopir tzristofir zristopir zristopi zritopir zritopi zristofir
zristofi zritofir zritofi'
>>> bmpm('Niall')
'nial niol'
>>> bmpm('Smith')
'zmit'
>>> bmpm('Schmidt')
'zmit stzmit'

>>> bmpm('Christopher', language_arg='German')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir'
>>> bmpm('Christopher', language_arg='English')
'tzristofir tzrQstofir tzristafir tzrQstafir xristofir xrQstofir
xristafir xrQstafir'
>>> bmpm('Christopher', language_arg='German', name_mode='ash')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir
xristYfir'

>>> bmpm('Christopher', language_arg='German', match_mode='exact')
'xriStopher xriStofer xristopher xristofer'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the BeiderMorse.encode method instead.

class abydos.phonetic.NRL[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Naval Research Laboratory English-to-phoneme encoder.

This is defined by [EJMS76].

New in version 0.3.6.

_rules = {' ': (('', ' ', '', ' '), ('', '-', '', ''), ('.', "'S", '', 'z'), ('#:.E', "'S", '', 'z'), ('#', "'S", '', 'z'), ('', "'", '', ''), ('', ',', '', ' '), ('', '.', '', ' '), ('', '?', '', ' '), ('', '!', '', ' ')), 'A': (('', 'A', ' ', 'AX'), (' ', 'ARE', ' ', 'AAr'), (' ', 'AR', 'O', 'AXr'), ('', 'AR', '#', 'EHr'), ('^', 'AS', '#', 'EYs'), ('', 'A', 'WA', 'AX'), ('', 'AW', '', 'AO'), (' :', 'ANY', '', 'EHnIY'), ('', 'A', '^+#', 'EY'), ('#:', 'ALLY', '', 'AXlIY'), (' ', 'AL', '#', 'AXl'), ('', 'AGAIN', '', 'AXgEHn'), ('#:', 'AG', 'E', 'IHj'), ('', 'A', '^+:#', 'AE'), (' :', 'A', '^+ ', 'EY'), ('', 'A', '^%', 'EY'), (' ', 'ARR', '', 'AXr'), ('', 'ARR', '', 'AEr'), (' :', 'AR', ' ', 'AAr'), ('', 'AR', ' ', 'ER'), ('', 'AR', '', 'AAr'), ('', 'AIR', '', 'EHr'), ('', 'AI', '', 'EY'), ('', 'AY', '', 'EY'), ('', 'AU', '', 'AO'), ('#:', 'AL', ' ', 'AXl'), ('#:', 'ALS', ' ', 'AXlz'), ('', 'ALK', '', 'AOk'), ('', 'AL', '^', 'AOl'), (' :', 'ABLE', '', 'EYbAXl'), ('', 'ABLE', '', 'AXbAXl'), ('', 'ANG', '+', 'EYnj'), ('', 'A', '', 'AE')), 'B': ((' ', 'BE', '^#', 'bIH'), ('', 'BEING', '', 'bIYIHNG'), (' ', 'BOTH', ' ', 'bOWTH'), (' ', 'BUS', '#', 'bIHz'), ('', 'BUIL', '', 'bIHl'), ('', 'B', '', 'b')), 'C': ((' ', 'CH', '^', 'k'), ('^E', 'CH', '', 'k'), ('', 'CH', '', 'CH'), (' S', 'CI', '#', 'sAY'), ('', 'CI', 'A', 'SH'), ('', 'CI', 'O', 'SH'), ('', 'CI', 'EN', 'SH'), ('', 'C', '+', 's'), ('', 'CK', '', 'k'), ('', 'COM', '%', 'kAHm'), ('', 'C', '', 'k')), 'D': (('#:', 'DED', ' ', 'dIHd'), ('.E', 'D', ' ', 'd'), ('#:^E', 'D', ' ', 't'), (' ', 'DE', '^#', 'dIH'), (' ', 'DO', ' ', 'dUW'), (' ', 'DOES', '', 'dAHz'), (' ', 'DOING', '', 'dUWIHNG'), (' ', 'DOW', '', 'dAW'), ('', 'DU', 'A', 'jUW'), ('', 'D', '', 'd')), 'E': (('#:', 'E', ' ', ''), ("':^", 'E', ' ', ''), (' :', 'E', ' ', 'IY'), ('#', 'ED', ' ', 'd'), ('#:', 'E', 'D ', ''), ('', 'EV', 'ER', 'EHv'), ('', 'E', '^%', 'IY'), ('', 'ERI', '#', 'IYrIY'), ('', 'ERI', '', 'EHrIH'), ('#:', 'ER', '#', 'ER'), ('', 'ER', '#', 
'EHr'), ('', 'ER', '', 'ER'), (' ', 'EVEN', '', 'IYvEHn'), ('#:', 'E', 'W', ''), ('T', 'EW', '', 'UW'), ('S', 'EW', '', 'UW'), ('R', 'EW', '', 'UW'), ('D', 'EW', '', 'UW'), ('L', 'EW', '', 'UW'), ('Z', 'EW', '', 'UW'), ('N', 'EW', '', 'UW'), ('J', 'EW', '', 'UW'), ('TH', 'EW', '', 'UW'), ('CH', 'EW', '', 'UW'), ('SH', 'EW', '', 'UW'), ('', 'EW', '', 'yUW'), ('', 'E', 'O', 'IY'), ('#:S', 'ES', ' ', 'IHz'), ('#:C', 'ES', ' ', 'IHz'), ('#:G', 'ES', ' ', 'IHz'), ('#:Z', 'ES', ' ', 'IHz'), ('#:X', 'ES', ' ', 'IHz'), ('#:J', 'ES', ' ', 'IHz'), ('#:CH', 'ES', ' ', 'IHz'), ('#:SH', 'ES', ' ', 'IHz'), ('#:', 'E', 'S ', ''), ('#:', 'ELY', ' ', 'lIY'), ('#:', 'EMENT', '', 'mEHnt'), ('', 'EFUL', '', 'fUHl'), ('', 'EE', '', 'IY'), ('', 'EARN', '', 'ERn'), (' ', 'EAR', '^', 'ER'), ('', 'EAD', '', 'EHd'), ('#:', 'EA', ' ', 'IYAX'), ('', 'EA', 'SU', 'EH'), ('', 'EA', '', 'IY'), ('', 'EIGH', '', 'EY'), ('', 'EI', '', 'IY'), (' ', 'EYE', '', 'AY'), ('', 'EY', '', 'IY'), ('', 'EU', '', 'yUW'), ('', 'E', '', 'EH')), 'F': (('', 'FUL', '', 'fUHl'), ('', 'F', '', 'f')), 'G': (('', 'GIV', '', 'gIHv'), (' ', 'G', 'I^', 'g'), ('', 'GE', 'T', 'gEH'), ('SU', 'GGES', '', 'gjEHs'), ('', 'GG', '', 'g'), (' B#', 'G', '', 'g'), ('', 'G', '+', 'j'), ('', 'GREAT', '', 'grEYt'), ('#', 'GH', '', ''), ('', 'G', '', 'g')), 'H': ((' ', 'HAV', '', 'hAEv'), (' ', 'HERE', '', 'hIYr'), (' ', 'HOUR', '', 'AWER'), ('', 'HOW', '', 'hAW'), ('', 'H', '#', 'h'), ('', 'H', '', '')), 'I': ((' ', 'IN', '', 'IHn'), (' ', 'I', ' ', 'AY'), ('', 'IN', 'D', 'AYn'), ('', 'IER', '', 'IYER'), ('#:R', 'IED', '', 'IYd'), ('', 'IED', ' ', 'AYd'), ('', 'IEN', '', 'IYEHn'), ('', 'IE', 'T', 'AYEH'), (' :', 'I', '%', 'AY'), ('', 'I', '%', 'IY'), ('', 'IE', '', 'IY'), ('', 'I', '^+:#', 'IH'), ('', 'IR', '#', 'AYr'), ('', 'IZ', '%', 'AYz'), ('', 'IS', '%', 'AYz'), ('', 'I', 'D%', 'AY'), ('+^', 'I', '^+', 'IH'), ('', 'I', 'T%', 'AY'), ('#:^', 'I', '^+', 'IH'), ('', 'I', '^+', 'AY'), ('', 'IR', '', 'ER'), ('', 'IGH', '', 'AY'), ('', 
'ILD', '', 'AYld'), ('', 'IGN', ' ', 'AYn'), ('', 'IGN', '^', 'AYn'), ('', 'IGN', '%', 'AYn'), ('', 'IQUE', '', 'IYk'), ('', 'I', '', 'IH')), 'J': (('', 'J', '', 'j'),), 'K': ((' ', 'K', 'N', ''), ('', 'K', '', 'k')), 'L': (('', 'LO', 'C#', 'lOW'), ('L', 'L', '', ''), ('#:^', 'L', '%', 'AXl'), ('', 'LEAD', '', 'lIYd'), ('', 'L', '', 'l')), 'M': (('', 'MOV', '', 'mUWv'), ('', 'M', '', 'm')), 'N': (('E', 'NG', '+', 'nj'), ('', 'NG', 'R', 'NGg'), ('', 'NG', '#', 'NGg'), ('', 'NGL', '%', 'NGgAXl'), ('', 'NG', '', 'NG'), ('', 'NK', '', 'NGk'), (' ', 'NOW', ' ', 'nAW'), ('', 'N', '', 'n')), 'O': (('', 'OF', ' ', 'AXv'), ('', 'OROUGH', '', 'EROW'), ('#:', 'OR', ' ', 'ER'), ('#:', 'ORS', ' ', 'ERz'), ('', 'OR', '', 'AOr'), (' ', 'ONE', '', 'wAHn'), ('', 'OW', '', 'OW'), (' ', 'OVER', '', 'OWvER'), ('', 'OV', '', 'AHv'), ('', 'O', '^%', 'OW'), ('', 'O', '^EN', 'OW'), ('', 'O', '^I#', 'OW'), ('', 'OL', 'D', 'OWl'), ('', 'OUGHT', '', 'AOt'), ('', 'OUGH', '', 'AHf'), (' ', 'OU', '', 'AW'), ('H', 'OU', 'S#', 'AW'), ('', 'OUS', '', 'AXs'), ('', 'OUR', '', 'AOr'), ('', 'OULD', '', 'UHd'), ('^', 'OU', '^L', 'AH'), ('', 'OUP', '', 'UWp'), ('', 'OU', '', 'AW'), ('', 'OY', '', 'OY'), ('', 'OING', '', 'OWIHNG'), ('', 'OI', '', 'OY'), ('', 'OOR', '', 'AOr'), ('', 'OOK', '', 'UHk'), ('', 'OOD', '', 'UHd'), ('', 'OO', '', 'UW'), ('', 'O', 'E', 'OW'), ('', 'O', ' ', 'OW'), ('', 'OA', '', 'OW'), (' ', 'ONLY', '', 'OWnlIY'), (' ', 'ONCE', '', 'wAHns'), ('', "ON'T", '', 'OWnt'), ('C', 'O', 'N', 'AA'), ('', 'O', 'NG', 'AO'), (' :^', 'O', 'N', 'AH'), ('I', 'ON', '', 'AXn'), ('#:', 'ON', ' ', 'AXn'), ('#^', 'ON', '', 'AXn'), ('', 'O', 'ST ', 'OW'), ('', 'OF', '^', 'AOf'), ('', 'OTHER', '', 'AHDHER'), ('', 'OSS', ' ', 'AOs'), ('#:^', 'OM', '', 'AHm'), ('', 'O', '', 'AA')), 'P': (('', 'PH', '', 'f'), ('', 'PEOP', '', 'pIYp'), ('', 'POW', '', 'pAW'), ('', 'PUT', ' ', 'pUHt'), ('', 'P', '', 'p')), 'Q': (('', 'QUAR', '', 'kwAOr'), ('', 'QU', '', 'kw'), ('', 'Q', '', 'k')), 'R': ((' ', 'RE', '^#', 
'rIY'), ('', 'R', '', 'r')), 'S': (('', 'SH', '', 'SH'), ('#', 'SION', '', 'ZHAXn'), ('', 'SOME', '', 'sAHm'), ('#', 'SUR', '#', 'ZHER'), ('', 'SUR', '#', 'SHER'), ('#', 'SU', '#', 'ZHUW'), ('#', 'SSU', '#', 'SHUW'), ('#', 'SED', ' ', 'zd'), ('#', 'S', '#', 'z'), ('', 'SAID', '', 'sEHd'), ('^', 'SION', '', 'SHAXn'), ('', 'S', 'S', ''), ('.', 'S', ' ', 'z'), ('#:.E', 'S', ' ', 'z'), ('#:^##', 'S', ' ', 'z'), ('#:^#', 'S', ' ', 's'), ('U', 'S', ' ', 's'), (' :#', 'S', ' ', 'z'), (' ', 'SCH', '', 'sk'), ('', 'S', 'C+', ''), ('#', 'SM', '', 'zm'), ('#', 'SN', "'", 'zAXn'), ('', 'S', '', 's')), 'T': ((' ', 'THE', ' ', 'DHAX'), ('', 'TO', ' ', 'tUW'), ('', 'THAT', ' ', 'DHAEt'), (' ', 'THIS', ' ', 'DHIHs'), (' ', 'THEY', '', 'DHEY'), (' ', 'THERE', '', 'DHEHr'), ('', 'THER', '', 'DHER'), ('', 'THEIR', '', 'DHEHr'), (' ', 'THAN', ' ', 'DHAEn'), (' ', 'THEM', ' ', 'DHEHm'), ('', 'THESE', ' ', 'DHIYz'), (' ', 'THEN', '', 'DHEHn'), ('', 'THROUGH', '', 'THrUW'), ('', 'THOSE', '', 'DHOWz'), ('', 'THOUGH', ' ', 'DHOW'), (' ', 'THUS', '', 'DHAHs'), ('', 'TH', '', 'TH'), ('#:', 'TED', ' ', 'tIHd'), ('S', 'TI', '#N', 'CH'), ('', 'TI', 'O', 'SH'), ('', 'TI', 'A', 'SH'), ('', 'TIEN', '', 'SHAXn'), ('', 'TUR', '#', 'CHER'), ('', 'TU', 'A', 'CHUW'), (' ', 'TWO', '', 'tUW'), ('', 'T', '', 't')), 'U': ((' ', 'UN', 'I', 'yUWn'), (' ', 'UN', '', 'AHn'), (' ', 'UPON', '', 'AXpAOn'), ('T', 'UR', '#', 'UHr'), ('S', 'UR', '#', 'UHr'), ('R', 'UR', '#', 'UHr'), ('D', 'UR', '#', 'UHr'), ('L', 'UR', '#', 'UHr'), ('Z', 'UR', '#', 'UHr'), ('N', 'UR', '#', 'UHr'), ('J', 'UR', '#', 'UHr'), ('TH', 'UR', '#', 'UHr'), ('CH', 'UR', '#', 'UHr'), ('SH', 'UR', '#', 'UHr'), ('', 'UR', '#', 'yUHr'), ('', 'UR', '', 'ER'), ('', 'U', '^ ', 'AH'), ('', 'U', '^^', 'AH'), ('', 'UY', '', 'AY'), (' G', 'U', '#', ''), ('G', 'U', '%', ''), ('G', 'U', '#', 'w'), ('#N', 'U', '', 'yUW'), ('T', 'U', '', 'UW'), ('S', 'U', '', 'UW'), ('R', 'U', '', 'UW'), ('D', 'U', '', 'UW'), ('L', 'U', '', 'UW'), ('Z', 'U', '', 'UW'), 
('N', 'U', '', 'UW'), ('J', 'U', '', 'UW'), ('TH', 'U', '', 'UW'), ('CH', 'U', '', 'UW'), ('SH', 'U', '', 'UW'), ('', 'U', '', 'yUW')), 'V': (('', 'VIEW', '', 'vyUW'), ('', 'V', '', 'v')), 'W': ((' ', 'WERE', '', 'wER'), ('', 'WA', 'S', 'wAA'), ('', 'WA', 'T', 'wAA'), ('', 'WHERE', '', 'WHEHr'), ('', 'WHAT', '', 'WHAAt'), ('', 'WHOL', '', 'hOWl'), ('', 'WHO', '', 'hUW'), ('', 'WH', '', 'WH'), ('', 'WAR', '', 'wAOr'), ('', 'WOR', '^', 'wER'), ('', 'WR', '', 'r'), ('', 'W', '', 'w')), 'X': (('', 'X', '', 'ks'),), 'Y': (('', 'YOUNG', '', 'yAHNG'), (' ', 'YOU', '', 'yUW'), (' ', 'YES', '', 'yEHs'), (' ', 'Y', '', 'y'), ('#:^', 'Y', ' ', 'IY'), ('#:^', 'Y', 'I', 'IY'), (' :', 'Y', ' ', 'AY'), (' :', 'Y', '#', 'AY'), (' :', 'Y', '^+:#', 'IH'), (' :', 'Y', '^#', 'AY'), ('', 'Y', '', 'IH')), 'Z': (('', 'Z', '', 'z'),)}¶
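
Each entry in _rules appears to be a 4-tuple of (left context, letters to match, right context, phonemes), and the rules for a letter are tried in order until one fits. The sketch below illustrates that first-match lookup for a handful of rules copied from the table. It is a simplification under stated assumptions: only rules whose contexts are literal text are supported, whereas context strings containing symbols such as #, ^, or + stand for character classes in the full rule set and are left out. RULES and encode_literal are hypothetical names.

```python
# A few rules copied from _rules above; all have literal (or empty) contexts.
RULES = {
    'T': (
        (' ', 'THE', ' ', 'DHAX'),
        (' ', 'TWO', '', 'tUW'),
        ('', 'T', '', 't'),
    ),
    'K': (
        (' ', 'K', 'N', ''),
        ('', 'K', '', 'k'),
    ),
    'N': (('', 'N', '', 'n'),),
}


def encode_literal(word):
    """Apply the first matching rule at each position (literal contexts only)."""
    text = ' ' + word.upper() + ' '  # spaces mark the word boundaries
    pos = 1
    phonemes = []
    while pos < len(text) - 1:
        for left, match, right, output in RULES.get(text[pos], ()):
            end = pos + len(match)
            if (text.startswith(match, pos)
                    and text[:pos].endswith(left)
                    and text.startswith(right, end)):
                phonemes.append(output)
                pos = end
                break
        else:
            pos += 1  # no rule for this letter in the reduced table: skip it
    return ''.join(phonemes)


print(encode_literal('the'))  # DHAX
print(encode_literal('two'))  # tUW
```

Rule order matters: the specific ' THE ' rule must be tried before the catch-all rule for 'T', otherwise the word would be encoded letter by letter.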

encode(word)[source]¶

Return the Naval Research Laboratory phonetic encoding of a word.

Parameters: word (str) -- The word to transform
Returns: The NRL phonetic encoding
Return type: str

Examples

>>> pe = NRL()
>>> pe.encode('the')
'DHAX'
>>> pe.encode('round')
'rAWnd'
>>> pe.encode('quick')
'kwIHk'
>>> pe.encode('eaten')
'IYtEHn'
>>> pe.encode('Smith')
'smIHTH'
>>> pe.encode('Larsen')
'lAArsEHn'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.nrl(word)[source]¶

Return the Naval Research Laboratory phonetic encoding of a word.

This is a wrapper for NRL.encode().

Parameters: word (str) -- The word to transform
Returns: The NRL phonetic encoding
Return type: str

Examples

>>> nrl('the')
'DHAX'
>>> nrl('round')
'rAWnd'
>>> nrl('quick')
'kwIHk'
>>> nrl('eaten')
'IYtEHn'
>>> nrl('Smith')
'smIHTH'
>>> nrl('Larsen')
'lAArsEHn'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the NRL.encode method instead.

class abydos.phonetic.MetaSoundex(lang='en')[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

MetaSoundex.

This is based on [KV17]. Only English ('en') and Spanish ('es') languages are supported, as in the original.

New in version 0.3.6.

Initialize MetaSoundex instance.

Parameters: lang (str) -- Either en for English or es for Spanish

New in version 0.4.0.

_trans = {65: '0', 66: '7', 67: '4', 68: '3', 69: '0', 70: '7', 71: '5', 72: '5', 73: '0', 74: '1', 75: '5', 76: '8', 77: '6', 78: '6', 79: '0', 80: '7', 81: '5', 82: '9', 83: '4', 84: '3', 85: '0', 86: '7', 87: '7', 88: '5', 89: '1', 90: '4'}¶
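
The keys of _trans are the code points of 'A' through 'Z', which is the form that str.translate expects. The snippet below shows only that lookup mechanism, using a few entries copied from the table; to_digits is a hypothetical helper, and its output is not a MetaSoundex code, since the encoder performs further steps before and after this mapping.

```python
# Entries copied from the _trans table above (keys are code points).
TRANS = {72: '5', 73: '0', 77: '6', 83: '4', 84: '3'}  # H, I, M, S, T


def to_digits(text, table=TRANS):
    """Map each uppercase letter to its digit via str.translate."""
    # Characters without an entry in the table are left unchanged.
    return text.upper().translate(table)


print(to_digits('Smith'))  # 46035
```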

encode(word)[source]¶

Return the MetaSoundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The MetaSoundex code
Return type: str

Examples

>>> pe = MetaSoundex()
>>> pe.encode('Smith')
'4500'
>>> pe.encode('Waters')
'7362'
>>> pe.encode('James')
'1520'
>>> pe.encode('Schmidt')
'4530'
>>> pe.encode('Ashcroft')
'0261'

>>> pe = MetaSoundex(lang='es')
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6754'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic MetaSoundex code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic MetaSoundex code
Return type: str

Examples

>>> pe = MetaSoundex()
>>> pe.encode_alpha('Smith')
'SN'
>>> pe.encode_alpha('Waters')
'WTRK'
>>> pe.encode_alpha('James')
'JNK'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKRP'

>>> pe = MetaSoundex(lang='es')
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NKLS'

New in version 0.4.0.

abydos.phonetic.metasoundex(word, lang='en')[source]¶

Return the MetaSoundex code for a word.

This is a wrapper for MetaSoundex.encode().

Parameters

word (str) -- The word to transform
lang (str) -- Either en for English or es for Spanish

Returns

The MetaSoundex code

Return type

str

Examples

>>> metasoundex('Smith')
'4500'
>>> metasoundex('Waters')
'7362'
>>> metasoundex('James')
'1520'
>>> metasoundex('Schmidt')
'4530'
>>> metasoundex('Ashcroft')
'0261'
>>> metasoundex('Perez', lang='es')
'094'
>>> metasoundex('Martinez', lang='es')
'69364'
>>> metasoundex('Gutierrez', lang='es')
'83994'
>>> metasoundex('Santiago', lang='es')
'4638'
>>> metasoundex('Nicolás', lang='es')
'6754'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the MetaSoundex.encode method instead.

class abydos.phonetic.ONCA(max_length=4, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Oxford Name Compression Algorithm (ONCA).

This is the Oxford Name Compression Algorithm, based on [Gil97].

I can find no complete description of the "anglicised version of the NYSIIS method" identified as the first step in this algorithm, so this is likely not a precisely correct implementation, in that it employs the standard NYSIIS algorithm.

New in version 0.3.6.

Initialize ONCA instance.

Parameters

max_length (int) -- The maximum length (default 4) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.
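
The effect of the two parameters can be sketched as below. This is an assumption about how a raw code is fitted to its final length (truncate, then optionally right-pad with '0'), not the library's implementation; fit_code is a hypothetical helper.

```python
def fit_code(code, max_length=4, zero_pad=True):
    """Truncate to max_length and optionally right-pad with '0' (assumed)."""
    code = code[:max_length]
    if zero_pad:
        code = code.ljust(max_length, '0')
    return code


print(fit_code('N4'))                  # N400
print(fit_code('N4', zero_pad=False))  # N4
```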

encode(word)[source]¶

Return the Oxford Name Compression Algorithm (ONCA) code for a word.

Parameters: word (str) -- The word to transform
Returns: The ONCA code
Return type: str

Examples

>>> pe = ONCA()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic ONCA code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic ONCA code
Return type: str

Examples

>>> pe = ONCA()
>>> pe.encode_alpha('Christopher')
'CRKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'

New in version 0.4.0.

abydos.phonetic.onca(word, max_length=4, zero_pad=True)[source]¶

Return the Oxford Name Compression Algorithm (ONCA) code for a word.

This is a wrapper for ONCA.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The maximum length (default 4) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The ONCA code

Return type

str

Examples

>>> onca('Christopher')
'C623'
>>> onca('Niall')
'N400'
>>> onca('Smith')
'S530'
>>> onca('Schmidt')
'S530'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the ONCA.encode method instead.

class abydos.phonetic.FONEM[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

FONEM.

FONEM is a phonetic algorithm designed for French (particularly surnames in Saguenay, Canada), defined in [BBL81].

Guillaume Plique's JavaScript implementation [Pli18] at https://github.com/Yomguithereal/talisman/blob/master/src/phonetics/french/fonem.js was also consulted for this implementation.

New in version 0.3.6.

_rule_order = ('V-14', 'C-28', 'C-28a', 'C-28b', 'C-28bb', 'C-28c', 'C-28d', 'C-12', 'C-8', 'C-9', 'C-10', 'C-16', 'C-17', 'C-2', 'C-3', 'C-7', 'V-2,5', 'V-3,4', 'V-6', 'V-1', 'C-14', 'C-31,33', 'C-30,32', 'C-11', 'V-15', 'V-17', 'V-18', 'V-7', 'V-8', 'V-9', 'V-10', 'V-11', 'V-12', 'V-13', 'V-16', 'V-19', 'V-20', 'C-1', 'C-4', 'C-5', 'C-6', 'C-13', 'C-15', 'C-18', 'C-19', 'C-20', 'C-21', 'C-22', 'C-23', 'C-24', 'C-25', 'C-26', 'C-27', 'C-29', 'V-14', 'C-28', 'C-28a', 'C-28b', 'C-28bb', 'C-28c', 'C-28d', 'C-34', 'C-35')¶

_rule_table = {'C-1': ('BV', 'V'), 'C-10': (re.compile('G(?=[EIY])'), 'J'), 'C-11': (re.compile('GA(?=I?[MN])'), 'G#'), 'C-12': (re.compile('GE(O|AU)'), 'JO'), 'C-13': (re.compile('GNI(?=[AEIOUY])'), 'GN'), 'C-14': (re.compile('(?<![PCS])H'), ''), 'C-15': ('JEA', 'JA'), 'C-16': (re.compile('^MAC(?=[BCDFGHJKLMNPQRSTVWXZ])'), 'MA#'), 'C-17': (re.compile('^MC'), 'MA#'), 'C-18': ('PH', 'F'), 'C-19': ('QU', 'K'), 'C-2': (re.compile('(?<=[AEIOUY])C(?=[EIY])'), 'SS'), 'C-20': (re.compile('^SC(?=[EIY])'), 'S'), 'C-21': (re.compile('(?<=.)SC(?=[EIY])'), 'SS'), 'C-22': (re.compile('(?<=.)SC(?=[AOU])'), 'SK'), 'C-23': ('SH', 'CH'), 'C-24': (re.compile('TIA$'), 'SSIA'), 'C-25': (re.compile('(?<=[AIOUY])W'), ''), 'C-26': (re.compile('X[CSZ]'), 'X'), 'C-27': (re.compile('(?<=[AEIOUY])Z|(?<=[BCDFGHJKLMNPQRSTVWXZ])Z(?=[BCDFGHJKLMNPQRSTVWXZ])'), 'S'), 'C-28': (re.compile('([BDFGHJKMNPQRTVWXZ])\\1'), '\\1'), 'C-28a': (re.compile('CC(?=[BCDFGHJKLMNPQRSTVWXZ]|$)'), 'C'), 'C-28b': (re.compile('((?<=[BCDFGHJKLMNPQRSTVWXZ])|^)SS'), 'S'), 'C-28bb': (re.compile('SS(?=[BCDFGHJKLMNPQRSTVWXZ]|$)'), 'S'), 'C-28c': (re.compile('((?<=[^I])|^)LL'), 'L'), 'C-28d': (re.compile('ILE$'), 'ILLE'), 'C-29': (re.compile('(ILS|[CS]H|[MN]P|R[CFKLNSX])$|([BCDFGHJKLMNPQRSTVWXZ])[BCDFGHJKLMNPQRSTVWXZ]$'), <function _get_parts>), 'C-3': (re.compile('(?<=[BDFGHJKLMNPQRSTVWZ])C(?=[EIY])'), 'S'), 'C-30,32': (re.compile('^(SA?INT?|SEI[NM]|CINQ?|ST)(?!E)-?'), 'ST-'), 'C-31,33': (re.compile('^(SAINTE|STE)-?'), 'STE-'), 'C-34': ('G#', 'GA'), 'C-35': ('MA#', 'MAC'), 'C-4': (re.compile('^C(?=[EIY])'), 'S'), 'C-5': (re.compile('^C(?=[OUA])'), 'K'), 'C-6': (re.compile('(?<=[AEIOUY])C$'), 'K'), 'C-7': (re.compile('C(?=[BDFGJKLMNPQRSTVWXZ])'), 'K'), 'C-8': (re.compile('CC(?=[AOU])'), 'K'), 'C-9': (re.compile('CC(?=[EIY])'), 'X'), 'V-1': (re.compile('E?AU'), 'O'), 'V-10': ('Y', 'I'), 'V-11': (re.compile('(?<=[AEIOUY])I(?=[AEIOUY])'), 'Y'), 'V-12': (re.compile('(?<=[AEIOUY])ILL'), 'Y'), 'V-13': 
(re.compile('OU(?=[AEOU]|I(?!LL))'), 'W'), 'V-14': (re.compile('([AEIOUY])(?=\\1)'), ''), 'V-15': (re.compile('[AE]M(?=[BCDFGHJKLMPQRSTVWXZ])(?!$)'), 'EN'), 'V-16': (re.compile('OM(?=[BCDFGHJKLMPQRSTVWXZ])'), 'ON'), 'V-17': (re.compile('AN(?=[BCDFGHJKLMNPQRSTVWXZ])'), 'EN'), 'V-18': (re.compile('(AI[MN]|EIN)(?=[BCDFGHJKLMNPQRSTVWXZ]|$)'), 'IN'), 'V-19': (re.compile('B(O|U|OU)RNE?$'), 'BURN'), 'V-2,5': (re.compile('(E?AU|O)L[TX]$'), 'O'), 'V-20': (re.compile('(^IM|(?<=[BCDFGHJKLMNPQRSTVWXZ])IM(?=[BCDFGHJKLMPQRSTVWXZ]))'), 'IN'), 'V-3,4': (re.compile('E?AU[TX]$'), 'O'), 'V-6': (re.compile('E?AUL?D$'), 'O'), 'V-7': (re.compile('(?<!G)AY$'), 'E'), 'V-8': (re.compile('EUX$'), 'EU'), 'V-9': (re.compile('EY(?=$|[BCDFGHJKLMNPQRSTVWXZ])'), 'E')}¶

_uc_set = {'-', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'}¶

encode(word)[source]¶

Return the FONEM code of a word.

Parameters: word (str) -- The word to transform
Returns: The FONEM code
Return type: str

Examples

>>> pe = FONEM()
>>> pe.encode('Marchand')
'MARCHEN'
>>> pe.encode('Beaulieu')
'BOLIEU'
>>> pe.encode('Beaumont')
'BOMON'
>>> pe.encode('Legrand')
'LEGREN'
>>> pe.encode('Pelletier')
'PELETIER'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
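Each entry in _rule_table pairs a pattern with a replacement, where the pattern is either a plain string or a compiled regular expression. The following is a minimal sketch of how a single rule of either kind might be applied. The helper name apply_rule is hypothetical, and the order in which the full encoder applies its rules is not shown here.

```python
import re

# Three entries copied from _rule_table; the helper below is hypothetical.
RULES = {
    'C-18': ('PH', 'F'),
    'V-1': (re.compile('E?AU'), 'O'),
    'V-10': ('Y', 'I'),
}


def apply_rule(word, rule):
    """Apply one (pattern, replacement) pair to an uppercase word."""
    pattern, replacement = rule
    if isinstance(pattern, str):
        # Plain-string rules are literal substring replacements.
        return word.replace(pattern, replacement)
    # Compiled rules are applied as regular-expression substitutions.
    return pattern.sub(replacement, word)


print(apply_rule('BEAULIEU', RULES['V-1']))
print(apply_rule('PHILIPPE', RULES['C-18']))
```

Applying only rule V-1 to 'BEAULIEU' already yields the documented code 'BOLIEU'; other names need several rules in sequence.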

abydos.phonetic.fonem(word)[source]¶

Return the FONEM code of a word.

This is a wrapper for FONEM.encode().

Parameters: word (str) -- The word to transform
Returns: The FONEM code
Return type: str

Examples

>>> fonem('Marchand')
'MARCHEN'
>>> fonem('Beaulieu')
'BOLIEU'
>>> fonem('Beaumont')
'BOMON'
>>> fonem('Legrand')
'LEGREN'
>>> fonem('Pelletier')
'PELETIER'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the FONEM.encode method instead.
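The deprecated module-level functions in this package are described as thin wrappers around the corresponding class methods. The sketch below illustrates that general pattern with a stand-in class. UpperEncoder and upper_encode are hypothetical names, and the warning text is an assumption rather than the library's actual message.

```python
import warnings


class UpperEncoder:
    """Hypothetical stand-in for an encoder class such as FONEM."""

    def encode(self, word):
        # Placeholder transformation; a real encoder applies phonetic rules.
        return word.upper()


def upper_encode(word):
    """Hypothetical deprecated wrapper that delegates to UpperEncoder.encode()."""
    warnings.warn(
        'upper_encode() is deprecated; use UpperEncoder.encode() instead.',
        DeprecationWarning,
        stacklevel=2,
    )
    return UpperEncoder().encode(word)
```

New code would instantiate the class once and call its encode method directly, which also avoids constructing a fresh encoder on every call.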

class abydos.phonetic.HenryEarly(max_length=3)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Henry code, early version.

The early version of Henry coding is given in [LegareLC72]. This is different from the later version defined in [Hen76].

New in version 0.3.6.

Initialize HenryEarly instance.

Parameters: max_length (int) -- The length of the code returned (defaults to 3)

New in version 0.4.0.

_diph = {'AI': 'E', 'AU': 'O', 'AY': 'E', 'EI': 'E', 'EU': 'U', 'OI': 'O', 'OU': 'O'}¶

_simple = {'W': 'V', 'X': 'S', 'Z': 'S'}¶

_uc_c_set = {'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Z'}¶

encode(word)[source]¶

Calculate the early version of the Henry code for a word.

Parameters: word (str) -- The word to transform
Returns: The early Henry code
Return type: str

Examples

>>> pe = HenryEarly()
>>> pe.encode('Marchand')
'MRC'
>>> pe.encode('Beaulieu')
'BL'
>>> pe.encode('Beaumont')
'BM'
>>> pe.encode('Legrand')
'LGR'
>>> pe.encode('Pelletier')
'PLT'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.henry_early(word, max_length=3)[source]¶

Calculate the early version of the Henry code for a word.

This is a wrapper for HenryEarly.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 3)

Returns

The early Henry code

Return type

str

Examples

>>> henry_early('Marchand')
'MRC'
>>> henry_early('Beaulieu')
'BL'
>>> henry_early('Beaumont')
'BM'
>>> henry_early('Legrand')
'LGR'
>>> henry_early('Pelletier')
'PLT'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the HenryEarly.encode method instead.

class abydos.phonetic.Koelner[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Kölner Phonetik.

Based on the algorithm defined by [Pos69].

New in version 0.3.6.

_num_set = {'0', '1', '2', '3', '4', '5', '6', '7', '8'}¶

_num_trans = {48: 'A', 49: 'P', 50: 'T', 51: 'F', 52: 'K', 53: 'L', 54: 'N', 55: 'R', 56: 'S'}¶

_to_alpha(num)[source]¶

Convert a Kölner Phonetik code from numeric to alphabetic.

Parameters: num (str or int) -- A numeric Kölner Phonetik representation
Returns: An alphabetic representation of the same word
Return type: str

Examples

>>> pe = Koelner()
>>> pe._to_alpha('862')
'SNT'
>>> pe._to_alpha('657')
'NLR'
>>> pe._to_alpha('86766')
'SNRNN'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
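The _num_trans table above maps the code points of the digits 0 through 8 to letters, which suggests a simple digit-by-digit translation. A minimal sketch of that conversion, assuming it amounts to str.translate over the table (the function name to_alpha is hypothetical):

```python
# Copied from the _num_trans attribute: ord('0')..ord('8') -> letter.
NUM_TRANS = {48: 'A', 49: 'P', 50: 'T', 51: 'F', 52: 'K',
             53: 'L', 54: 'N', 55: 'R', 56: 'S'}


def to_alpha(num):
    """Translate a numeric code (str or int) to letters, digit by digit."""
    # str() lets the sketch accept either a str or an int argument.
    return str(num).translate(NUM_TRANS)


print(to_alpha('862'))
print(to_alpha(657))
```

Passing an int would drop any leading zero of a code, so a str argument is the safer choice.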

_uc_v_set = {'A', 'E', 'I', 'J', 'O', 'U', 'Y'}¶

encode(word)[source]¶

Return the Kölner Phonetik (numeric output) code for a word.

While the output code is numeric, it is still a str because 0s can lead the code.

Parameters: word (str) -- The word to transform
Returns: The Kölner Phonetik value as a numeric string
Return type: str

Examples

>>> pe = Koelner()
>>> pe.encode('Christopher')
'478237'
>>> pe.encode('Niall')
'65'
>>> pe.encode('Smith')
'862'
>>> pe.encode('Schmidt')
'862'
>>> pe.encode('Müller')
'657'
>>> pe.encode('Zimmermann')
'86766'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the Kölner Phonetik (alphabetic output) code for a word.

Parameters: word (str) -- The word to transform
Returns: The Kölner Phonetik value as an alphabetic string
Return type: str

Examples

>>> pe = Koelner()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Müller')
'NLR'
>>> pe.encode_alpha('Zimmermann')
'SNRNN'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.koelner_phonetik(word)[source]¶

Return the Kölner Phonetik (numeric output) code for a word.

This is a wrapper for Koelner.encode().

Parameters: word (str) -- The word to transform
Returns: The Kölner Phonetik value as a numeric string
Return type: str

Examples

>>> koelner_phonetik('Christopher')
'478237'
>>> koelner_phonetik('Niall')
'65'
>>> koelner_phonetik('Smith')
'862'
>>> koelner_phonetik('Schmidt')
'862'
>>> koelner_phonetik('Müller')
'657'
>>> koelner_phonetik('Zimmermann')
'86766'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Koelner.encode method instead.

abydos.phonetic.koelner_phonetik_num_to_alpha(num)[source]¶

Convert a Kölner Phonetik code from numeric to alphabetic.

This is a wrapper for Koelner._to_alpha().

Parameters: num (str or int) -- A numeric Kölner Phonetik representation
Returns: An alphabetic representation of the same word
Return type: str

Examples

>>> koelner_phonetik_num_to_alpha('862')
'SNT'
>>> koelner_phonetik_num_to_alpha('657')
'NLR'
>>> koelner_phonetik_num_to_alpha('86766')
'SNRNN'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Koelner._to_alpha method instead.

abydos.phonetic.koelner_phonetik_alpha(word)[source]¶

Return the Kölner Phonetik (alphabetic output) code for a word.

This is a wrapper for Koelner.encode_alpha().

Parameters: word (str) -- The word to transform
Returns: The Kölner Phonetik value as an alphabetic string
Return type: str

Examples

>>> koelner_phonetik_alpha('Smith')
'SNT'
>>> koelner_phonetik_alpha('Schmidt')
'SNT'
>>> koelner_phonetik_alpha('Müller')
'NLR'
>>> koelner_phonetik_alpha('Zimmermann')
'SNRNN'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Koelner.encode_alpha method instead.

class abydos.phonetic.Haase(primary_only=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Haase Phonetik.

Based on the algorithm described at [Pra15] and on the original [HH00].

New in version 0.3.6.

Initialize Haase instance.

Parameters: primary_only (bool) -- If True, only the primary code is returned

New in version 0.4.0.

_alphabetic = {49: 'P', 50: 'T', 51: 'F', 52: 'K', 53: 'L', 54: 'N', 55: 'R', 56: 'S', 57: 'A'}¶

_uc_v_set = {'A', 'E', 'I', 'J', 'O', 'U', 'Y'}¶

encode(word)[source]¶

Return the Haase Phonetik (numeric output) code for a word.

Each output code is numeric but is nevertheless a str; the codes are returned together in a tuple.

Parameters: word (str) -- The word to transform
Returns: The Haase Phonetik values as a tuple of numeric strings
Return type: tuple

Examples

>>> pe = Haase()
>>> pe.encode('Joachim')
('9496',)
>>> pe.encode('Christoph')
('4798293', '8798293')
>>> pe.encode('Jörg')
('974',)
>>> pe.encode('Smith')
('8692',)
>>> pe.encode('Schmidt')
('8692', '4692')

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic Haase Phonetik code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Haase Phonetik values as a tuple of strings
Return type: tuple

Examples

>>> pe = Haase()
>>> pe.encode_alpha('Joachim')
('AKAN',)
>>> pe.encode_alpha('Christoph')
('KRASTAF', 'SRASTAF')
>>> pe.encode_alpha('Jörg')
('ARK',)
>>> pe.encode_alpha('Smith')
('SNAT',)
>>> pe.encode_alpha('Schmidt')
('SNAT', 'KNAT')

New in version 0.4.0.
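The alphabetic codes in the examples above correspond digit by digit to the numeric codes returned by encode, using the _alphabetic table. A minimal sketch of that relationship follows. The function name codes_to_alpha is hypothetical, and this is not necessarily how the method is implemented.

```python
# Copied from the _alphabetic attribute: ord('1')..ord('9') -> letter.
ALPHABETIC = {49: 'P', 50: 'T', 51: 'F', 52: 'K', 53: 'L',
              54: 'N', 55: 'R', 56: 'S', 57: 'A'}


def codes_to_alpha(codes):
    """Translate each numeric code in a tuple to its alphabetic form."""
    return tuple(code.translate(ALPHABETIC) for code in codes)


print(codes_to_alpha(('4798293', '8798293')))
```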

abydos.phonetic.haase_phonetik(word, primary_only=False)[source]¶

Return the Haase Phonetik code for a word.

This is a wrapper for Haase.encode().

Parameters

word (str) -- The word to transform
primary_only (bool) -- If True, only the primary code is returned

Returns

The Haase Phonetik values as a tuple of numeric strings

Return type

tuple

Examples

>>> haase_phonetik('Joachim')
('9496',)
>>> haase_phonetik('Christoph')
('4798293', '8798293')
>>> haase_phonetik('Jörg')
('974',)
>>> haase_phonetik('Smith')
('8692',)
>>> haase_phonetik('Schmidt')
('8692', '4692')

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Haase.encode method instead.

class abydos.phonetic.RethSchek[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Reth-Schek Phonetik.

This algorithm is proposed in [vonRethS77].

Since I couldn't secure a copy of that document (maybe I'll look for it next time I'm in Germany), this implementation is based on what I could glean from the implementations published by the German Record Linkage Center (www.record-linkage.de):

Privacy-preserving Record Linkage (PPRL) (in R) [Ruk18]
Merge ToolBox (in Java) [SBB04]

Rules that are unclear:

Should 'C' become 'G' or 'Z'? (PPRL has both, 'Z' rule blocked)
Should 'CC' become 'G'? (PPRL has a blocked 'CK' rule that may be a typo.)
Should 'TUI' -> 'ZUI' rule exist? (PPRL has rule, but I can't think of a German word with '-tui-' in it.)
Should we really change 'SCH' -> 'CH' and then 'CH' -> 'SCH'?

New in version 0.3.6.

_replacements = {1: {'C': 'G', 'K': 'G', 'P': 'B', 'T': 'D', 'V': 'F', 'W': 'F', 'Y': 'I'}, 2: {'AA': 'A', 'AE': 'E', 'AH': 'A', 'AY': 'AI', 'BB': 'B', 'BP': 'B', 'CC': 'C', 'CK': 'G', 'DD': 'D', 'DT': 'D', 'EE': 'E', 'EH': 'E', 'EI': 'AI', 'EU': 'OI', 'EY': 'AI', 'FF': 'F', 'GG': 'G', 'GK': 'G', 'GS': 'X', 'IE': 'I', 'IH': 'I', 'KG': 'G', 'KK': 'K', 'KS': 'X', 'KW': 'QU', 'LL': 'L', 'MM': 'M', 'NN': 'N', 'OH': 'O', 'OO': 'O', 'PB': 'B', 'PH': 'F', 'PP': 'B', 'RR': 'R', 'SS': 'S', 'SZ': 'S', 'TH': 'D', 'TT': 'D', 'TZ': 'Z', 'UH': 'U'}, 3: {'AEH': 'E', 'AEU': 'OI', 'CHS': 'X', 'CKS': 'X', 'IEH': 'I', 'OEH': 'OE', 'SCH': 'CH', 'TIU': 'TIO', 'UEH': 'UE', 'ZIO': 'TIO', 'ZIU': 'TIO'}}¶

encode(word)[source]¶

Return Reth-Schek Phonetik code for a word.

Parameters: word (str) -- The word to transform
Returns: The Reth-Schek Phonetik code
Return type: str

Examples

>>> pe = RethSchek()
>>> pe.encode('Joachim')
'JOAGHIM'
>>> pe.encode('Christoph')
'GHRISDOF'
>>> pe.encode('Jörg')
'JOERG'
>>> pe.encode('Smith')
'SMID'
>>> pe.encode('Schmidt')
'SCHMID'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
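The _replacements table is keyed by substring length, which suggests a left-to-right scan that tries the longest match first, followed by the 'CH' -> 'SCH' step mentioned in the class notes. The sketch below is one plausible reading that reproduces the ASCII examples above. It is not taken from the library's source, the name reth_schek_sketch is hypothetical, and umlaut handling and any word-final rules are omitted.

```python
# Copied from the _replacements attribute, keyed by substring length.
REPLACEMENTS = {
    1: {'C': 'G', 'K': 'G', 'P': 'B', 'T': 'D', 'V': 'F', 'W': 'F', 'Y': 'I'},
    2: {'AA': 'A', 'AE': 'E', 'AH': 'A', 'AY': 'AI', 'BB': 'B', 'BP': 'B',
        'CC': 'C', 'CK': 'G', 'DD': 'D', 'DT': 'D', 'EE': 'E', 'EH': 'E',
        'EI': 'AI', 'EU': 'OI', 'EY': 'AI', 'FF': 'F', 'GG': 'G', 'GK': 'G',
        'GS': 'X', 'IE': 'I', 'IH': 'I', 'KG': 'G', 'KK': 'K', 'KS': 'X',
        'KW': 'QU', 'LL': 'L', 'MM': 'M', 'NN': 'N', 'OH': 'O', 'OO': 'O',
        'PB': 'B', 'PH': 'F', 'PP': 'B', 'RR': 'R', 'SS': 'S', 'SZ': 'S',
        'TH': 'D', 'TT': 'D', 'TZ': 'Z', 'UH': 'U'},
    3: {'AEH': 'E', 'AEU': 'OI', 'CHS': 'X', 'CKS': 'X', 'IEH': 'I',
        'OEH': 'OE', 'SCH': 'CH', 'TIU': 'TIO', 'UEH': 'UE', 'ZIO': 'TIO',
        'ZIU': 'TIO'},
}


def reth_schek_sketch(word):
    """Apply the replacement table longest-match-first, then 'CH' -> 'SCH'."""
    word = word.upper()
    pos = 0
    while pos < len(word):
        # Try 3-, then 2-, then 1-character substrings at this position.
        for size in (3, 2, 1):
            chunk = word[pos:pos + size]
            if chunk in REPLACEMENTS[size]:
                word = word[:pos] + REPLACEMENTS[size][chunk] + word[pos + size:]
                break
        # Advance one character whether or not a rule matched.
        pos += 1
    return word.replace('CH', 'SCH')


print(reth_schek_sketch('Schmidt'))
```

Under this reading, 'SCH' is first reduced to 'CH' by the three-character table and then restored by the final step, which is why 'Schmidt' keeps its initial 'SCH'.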

abydos.phonetic.reth_schek_phonetik(word)[source]¶

Return Reth-Schek Phonetik code for a word.

This is a wrapper for RethSchek.encode().

Parameters: word (str) -- The word to transform
Returns: The Reth-Schek Phonetik code
Return type: str

Examples

>>> reth_schek_phonetik('Joachim')
'JOAGHIM'
>>> reth_schek_phonetik('Christoph')
'GHRISDOF'
>>> reth_schek_phonetik('Jörg')
'JOERG'
>>> reth_schek_phonetik('Smith')
'SMID'
>>> reth_schek_phonetik('Schmidt')
'SCHMID'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RethSchek.encode method instead.

class abydos.phonetic.Phonem[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Phonem.

Phonem is defined in [GM88].

This version is based on the Perl implementation documented at [Wil05]. It includes some enhancements presented in the Java port at [dcm4che].

Phonem is intended chiefly for German names/words.

New in version 0.3.6.

_substitutions = (('SC', 'C'), ('SZ', 'C'), ('CZ', 'C'), ('TZ', 'C'), ('TS', 'C'), ('KS', 'X'), ('PF', 'V'), ('QU', 'KW'), ('PH', 'V'), ('UE', 'Y'), ('AE', 'E'), ('OE', 'Ö'), ('EI', 'AY'), ('EY', 'AY'), ('EU', 'OY'), ('AU', 'A§'), ('OU', '§'))¶

_trans = {70: 'V', 71: 'C', 73: 'Y', 74: 'Y', 75: 'C', 80: 'B', 81: 'C', 84: 'D', 87: 'V', 90: 'C', 167: 'U', 192: 'A', 193: 'A', 194: 'A', 195: 'A', 196: 'E', 197: 'A', 198: 'E', 199: 'C', 200: 'E', 201: 'E', 202: 'E', 203: 'E', 204: 'Y', 205: 'Y', 206: 'Y', 207: 'Y', 209: 'N', 210: 'O', 211: 'O', 212: 'O', 213: 'O', 216: 'Ö', 217: 'U', 218: 'U', 219: 'U', 220: 'Y', 221: 'Y', 223: 'S'}¶

_uc_set = {'A', 'B', 'C', 'D', 'L', 'M', 'N', 'O', 'R', 'S', 'U', 'V', 'W', 'X', 'Y', 'Ö'}¶

encode(word)[source]¶

Return the Phonem code for a word.

Parameters

word (str) -- The word to transform

Returns

The Phonem value

Return type

str

Examples

>>> pe = Phonem()
>>> pe.encode('Christopher')
'CRYSDOVR'
>>> pe.encode('Niall')
'NYAL'
>>> pe.encode('Smith')
'SMYD'
>>> pe.encode('Schmidt')
'CMYD'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
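The three class attributes above suggest a pipeline of ordered substring substitutions, a per-character translation, and a final filter to the allowed alphabet. The sketch below is one plausible arrangement that reproduces the documented examples. The name phonem_sketch, the collapsing of repeated characters, and the exact order of the steps are assumptions, not confirmed details of the library.

```python
from itertools import groupby

# Copied from the _substitutions, _trans, and _uc_set attributes.
SUBSTITUTIONS = (
    ('SC', 'C'), ('SZ', 'C'), ('CZ', 'C'), ('TZ', 'C'), ('TS', 'C'),
    ('KS', 'X'), ('PF', 'V'), ('QU', 'KW'), ('PH', 'V'), ('UE', 'Y'),
    ('AE', 'E'), ('OE', 'Ö'), ('EI', 'AY'), ('EY', 'AY'), ('EU', 'OY'),
    ('AU', 'A§'), ('OU', '§'),
)
TRANS = {
    70: 'V', 71: 'C', 73: 'Y', 74: 'Y', 75: 'C', 80: 'B', 81: 'C', 84: 'D',
    87: 'V', 90: 'C', 167: 'U', 192: 'A', 193: 'A', 194: 'A', 195: 'A',
    196: 'E', 197: 'A', 198: 'E', 199: 'C', 200: 'E', 201: 'E', 202: 'E',
    203: 'E', 204: 'Y', 205: 'Y', 206: 'Y', 207: 'Y', 209: 'N', 210: 'O',
    211: 'O', 212: 'O', 213: 'O', 216: 'Ö', 217: 'U', 218: 'U', 219: 'U',
    220: 'Y', 221: 'Y', 223: 'S',
}
UC_SET = set('ABCDLMNORSUVWXYÖ')


def phonem_sketch(word):
    """Substitute, translate, collapse repeats, and filter (assumed order)."""
    word = word.upper()
    # Ordered multi-character substitutions.
    for old, new in SUBSTITUTIONS:
        word = word.replace(old, new)
    # Single-character translation by code point.
    word = word.translate(TRANS)
    # Assumption: runs of the same character collapse to one.
    collapsed = ''.join(ch for ch, _ in groupby(word))
    # Keep only characters in the allowed alphabet.
    return ''.join(ch for ch in collapsed if ch in UC_SET)


print(phonem_sketch('Schmidt'))
```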

abydos.phonetic.phonem(word)[source]¶

Return the Phonem code for a word.

This is a wrapper for Phonem.encode().

Parameters: word (str) -- The word to transform
Returns: The Phonem value
Return type: str

Examples

>>> phonem('Christopher')
'CRYSDOVR'
>>> phonem('Niall')
'NYAL'
>>> phonem('Smith')
'SMYD'
>>> phonem('Schmidt')
'CMYD'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonem.encode method instead.

class abydos.phonetic.Phonet(mode=1, lang='de')[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Phonet code.

phonet ("Hannoveraner Phonetik") was developed by Jörg Michael and documented in [Mic99].

This is a port of Jesper Zedlitz's code, which is licensed LGPL [Zed15].

That is, in turn, based on Michael's C code, which is also licensed LGPL [Mic07].

New in version 0.3.6.

Initialize Phonet instance.

Parameters

mode (int) -- The phonet variant to employ (1 or 2)
lang (str) -- de (default) for German, none for no language

New in version 0.4.0.
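The _rules_german attribute of this class is a flat tuple that appears to hold rules in groups of three: a search pattern followed by one replacement for each of the two variants selected by mode. The sketch below regroups a short excerpt into triples. The helper name group_rules is hypothetical, and reading None as "no replacement for that variant" is an assumption based on the layout of the table.

```python
# A short excerpt in the same flat layout as _rules_german.
FLAT_RULES = (
    'ÄE', 'E', 'E',
    'ÄU<', 'EU', 'EU',
    'Ü', None, 'I',
)


def group_rules(flat):
    """Regroup a flat rule tuple into (pattern, first, second) triples."""
    if len(flat) % 3:
        raise ValueError('rule tuple length must be a multiple of 3')
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


for pattern, first, second in group_rules(FLAT_RULES):
    print(pattern, first, second)
```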

_rules_german = ('´', ' ', ' ', '"', ' ', ' ', '`$', '', '', "'", ' ', ' ', ',', ' ', ' ', ';', ' ', ' ', '-', ' ', ' ', ' ', ' ', ' ', '.', '.', '.', ':', '.', '.', 'ÄE', 'E', 'E', 'ÄU<', 'EU', 'EU', 'ÄV(AEOU)-<', 'EW', None, 'Ä$', 'Ä', None, 'Ä<', None, 'E', 'Ä', 'E', None, 'ÖE', 'Ö', 'Ö', 'ÖU', 'Ö', 'Ö', 'ÖVER--<', 'ÖW', None, 'ÖV(AOU)-', 'ÖW', None, 'ÜBEL(GNRW)-^^', 'ÜBL ', 'IBL ', 'ÜBER^^', 'ÜBA', 'IBA', 'ÜE', 'Ü', 'I', 'ÜVER--<', 'ÜW', None, 'ÜV(AOU)-', 'ÜW', None, 'Ü', None, 'I', 'ßCH<', None, 'Z', 'ß<', 'S', 'Z', 'À<', 'A', 'A', 'Á<', 'A', 'A', 'Â<', 'A', 'A', 'Ã<', 'A', 'A', 'Å<', 'A', 'A', 'ÆER-', 'E', 'E', 'ÆU<', 'EU', 'EU', 'ÆV(AEOU)-<', 'EW', None, 'Æ$', 'Ä', None, 'Æ<', None, 'E', 'Æ', 'E', None, 'Ç', 'Z', 'Z', 'ÐÐ-', '', '', 'Ð', 'DI', 'TI', 'È<', 'E', 'E', 'É<', 'E', 'E', 'Ê<', 'E', 'E', 'Ë', 'E', 'E', 'Ì<', 'I', 'I', 'Í<', 'I', 'I', 'Î<', 'I', 'I', 'Ï', 'I', 'I', 'ÑÑ-', '', '', 'Ñ', 'NI', 'NI', 'Ò<', 'O', 'U', 'Ó<', 'O', 'U', 'Ô<', 'O', 'U', 'Õ<', 'O', 'U', 'Œ<', 'Ö', 'Ö', 'Ø(IJY)-<', 'E', 'E', 'Ø<', 'Ö', 'Ö', 'Š', 'SH', 'Z', 'Þ', 'T', 'T', 'Ù<', 'U', 'U', 'Ú<', 'U', 'U', 'Û<', 'U', 'U', 'Ý<', 'I', 'I', 'Ÿ<', 'I', 'I', 'ABELLE$', 'ABL', 'ABL', 'ABELL$', 'ABL', 'ABL', 'ABIENNE$', 'ABIN', 'ABIN', 'ACHME---^', 'ACH', 'AK', 'ACEY$', 'AZI', 'AZI', 'ADV', 'ATW', None, 'AEGL-', 'EK', None, 'AEU<', 'EU', 'EU', 'AE2', 'E', 'E', 'AFTRAUBEN------', 'AFT ', 'AFT ', 'AGL-1', 'AK', None, 'AGNI-^', 'AKN', 'AKN', 'AGNIE-', 'ANI', 'ANI', 'AGN(AEOU)-$', 'ANI', 'ANI', 'AH(AIOÖUÜY)-', 'AH', None, 'AIA2', 'AIA', 'AIA', 'AIE$', 'E', 'E', 'AILL(EOU)-', 'ALI', 'ALI', 'AINE$', 'EN', 'EN', 'AIRE$', 'ER', 'ER', 'AIR-', 'E', 'E', 'AISE$', 'ES', 'EZ', 'AISSANCE$', 'ESANS', 'EZANZ', 'AISSE$', 'ES', 'EZ', 'AIX$', 'EX', 'EX', 'AJ(AÄEÈÉÊIOÖUÜ)--', 'A', 'A', 'AKTIE', 'AXIE', 'AXIE', 'AKTUEL', 'AKTUEL', None, 'ALOI^', 'ALOI', 'ALUI', 'ALOY^', 'ALOI', 'ALUI', 'AMATEU(RS)-', 'AMATÖ', 'ANATÖ', 'ANCH(OEI)-', 'ANSH', 'ANZ', 'ANDERGEGANG----', 'ANDA GE', 'ANTA KE', 'ANDERGEHE----', 'ANDA 
', 'ANTA ', 'ANDERGESETZ----', 'ANDA GE', 'ANTA KE', 'ANDERGING----', 'ANDA ', 'ANTA ', 'ANDERSETZ(ET)-----', 'ANDA ', 'ANTA ', 'ANDERZUGEHE----', 'ANDA ZU ', 'ANTA ZU ', 'ANDERZUSETZE-----', 'ANDA ZU ', 'ANTA ZU ', 'ANER(BKO)---^^', 'AN', None, 'ANHAND---^$', 'AN H', 'AN ', 'ANH(AÄEIOÖUÜY)--^^', 'AN', None, 'ANIELLE$', 'ANIEL', 'ANIL', 'ANIEL', 'ANIEL', None, 'ANSTELLE----^$', 'AN ST', 'AN ZT', 'ANTI^^', 'ANTI', 'ANTI', 'ANVER^^', 'ANFA', 'ANFA', 'ATIA$', 'ATIA', 'ATIA', 'ATIA(NS)--', 'ATI', 'ATI', 'ATI(AÄOÖUÜ)-', 'AZI', 'AZI', 'AUAU--', '', '', 'AUERE$', 'AUERE', None, 'AUERE(NS)-$', 'AUERE', None, 'AUERE(AIOUY)--', 'AUER', None, 'AUER(AÄIOÖUÜY)-', 'AUER', None, 'AUER<', 'AUA', 'AUA', 'AUF^^', 'AUF', 'AUF', 'AULT$', 'O', 'U', 'AUR(BCDFGKLMNQSTVWZ)-', 'AUA', 'AUA', 'AUR$', 'AUA', 'AUA', 'AUSSE$', 'OS', 'UZ', 'AUS(ST)-^', 'AUS', 'AUS', 'AUS^^', 'AUS', 'AUS', 'AUTOFAHR----', 'AUTO ', 'AUTU ', 'AUTO^^', 'AUTO', 'AUTU', 'AUX(IY)-', 'AUX', 'AUX', 'AUX', 'O', 'U', 'AU', 'AU', 'AU', 'AVER--<', 'AW', None, 'AVIER$', 'AWIE', 'AFIE', 'AV(EÈÉÊI)-^', 'AW', None, 'AV(AOU)-', 'AW', None, 'AYRE$', 'EIRE', 'EIRE', 'AYRE(NS)-$', 'EIRE', 'EIRE', 'AYRE(AIOUY)--', 'EIR', 'EIR', 'AYR(AÄIOÖUÜY)-', 'EIR', 'EIR', 'AYR<', 'EIA', 'EIA', 'AYER--<', 'EI', 'EI', 'AY(AÄEIOÖUÜY)--', 'A', 'A', 'AË', 'E', 'E', 'A(IJY)<', 'EI', 'EI', 'BABY^$', 'BEBI', 'BEBI', 'BAB(IY)^', 'BEBI', 'BEBI', 'BEAU^$', 'BO', None, 'BEA(BCMNRU)-^', 'BEA', 'BEA', 'BEAT(AEIMORU)-^', 'BEAT', 'BEAT', 'BEE$', 'BI', 'BI', 'BEIGE^$', 'BESH', 'BEZ', 'BENOIT--', 'BENO', 'BENU', 'BER(DT)-', 'BER', None, 'BERN(DT)-', 'BERN', None, 'BE(LMNRST)-^', 'BE', 'BE', 'BETTE$', 'BET', 'BET', 'BEVOR^$', 'BEFOR', None, 'BIC$', 'BIZ', 'BIZ', 'BOWL(EI)-', 'BOL', 'BUL', 'BP(AÄEÈÉÊIÌÍÎOÖRUÜY)-', 'B', 'B', 'BRINGEND-----^', 'BRI', 'BRI', 'BRINGEND-----', ' BRI', ' BRI', 'BROW(NS)-', 'BRAU', 'BRAU', 'BUDGET7', 'BÜGE', 'BIKE', 'BUFFET7', 'BÜFE', 'BIFE', 'BYLLE$', 'BILE', 'BILE', 'BYLL$', 'BIL', 'BIL', 'BYPA--^', 'BEI', 'BEI', 'BYTE<', 'BEIT', 'BEIT', 
'BY9^', 'BÜ', None, 'B(SßZ)$', 'BS', None, 'CACH(EI)-^', 'KESH', 'KEZ', 'CAE--', 'Z', 'Z', 'CA(IY)$', 'ZEI', 'ZEI', 'CE(EIJUY)--', 'Z', 'Z', 'CENT<', 'ZENT', 'ZENT', 'CERST(EI)----^', 'KE', 'KE', 'CER$', 'ZA', 'ZA', 'CE3', 'ZE', 'ZE', "CH'S$", 'X', 'X', 'CH´S$', 'X', 'X', 'CHAO(ST)-', 'KAO', 'KAU', 'CHAMPIO-^', 'SHEMPI', 'ZENBI', 'CHAR(AI)-^', 'KAR', 'KAR', 'CHAU(CDFSVWXZ)-', 'SHO', 'ZU', 'CHÄ(CF)-', 'SHE', 'ZE', 'CHE(CF)-', 'SHE', 'ZE', 'CHEM-^', 'KE', 'KE', 'CHEQUE<', 'SHEK', 'ZEK', 'CHI(CFGPVW)-', 'SHI', 'ZI', 'CH(AEUY)-<^', 'SH', 'Z', 'CHK-', '', '', 'CHO(CKPS)-^', 'SHO', 'ZU', 'CHRIS-', 'KRI', None, 'CHRO-', 'KR', None, 'CH(LOR)-<^', 'K', 'K', 'CHST-', 'X', 'X', 'CH(SßXZ)3', 'X', 'X', 'CHTNI-3', 'CHN', 'KN', 'CH^', 'K', 'K', 'CH', 'CH', 'K', 'CIC$', 'ZIZ', 'ZIZ', 'CIENCEFICT----', 'EIENS ', 'EIENZ ', 'CIENCE$', 'EIENS', 'EIENZ', 'CIER$', 'ZIE', 'ZIE', 'CYB-^', 'ZEI', 'ZEI', 'CY9^', 'ZÜ', 'ZI', 'C(IJY)-<3', 'Z', 'Z', 'CLOWN-', 'KLAU', 'KLAU', 'CCH', 'Z', 'Z', 'CCE-', 'X', 'X', 'C(CK)-', '', '', 'CLAUDET---', 'KLO', 'KLU', 'CLAUDINE^$', 'KLODIN', 'KLUTIN', 'COACH', 'KOSH', 'KUZ', 'COLE$', 'KOL', 'KUL', 'COUCH', 'KAUSH', 'KAUZ', 'COW', 'KAU', 'KAU', 'CQUES$', 'K', 'K', 'CQUE', 'K', 'K', 'CRASH--9', 'KRE', 'KRE', 'CREAT-^', 'KREA', 'KREA', 'CST', 'XT', 'XT', 'CS<^', 'Z', 'Z', 'C(SßX)', 'X', 'X', "CT'S$", 'X', 'X', 'CT(SßXZ)', 'X', 'X', 'CZ<', 'Z', 'Z', 'C(ÈÉÊÌÍÎÝ)3', 'Z', 'Z', 'C.^', 'C.', 'C.', 'CÄ-', 'Z', 'Z', 'CÜ$', 'ZÜ', 'ZI', "C'S$", 'X', 'X', 'C<', 'K', 'K', 'DAHER^$', 'DAHER', None, 'DARAUFFOLGE-----', 'DARAUF ', 'TARAUF ', 'DAVO(NR)-^$', 'DAFO', 'TAFU', 'DD(SZ)--<', '', '', 'DD9', 'D', None, 'DEPOT7', 'DEPO', 'TEBU', 'DESIGN', 'DISEIN', 'TIZEIN', 'DE(LMNRST)-3^', 'DE', 'TE', 'DETTE$', 'DET', 'TET', 'DH$', 'T', None, 'DIC$', 'DIZ', 'TIZ', 'DIDR-^', 'DIT', None, 'DIEDR-^', 'DIT', None, 'DJ(AEIOU)-^', 'I', 'I', 'DMITR-^', 'DIMIT', 'TINIT', 'DRY9^', 'DRÜ', None, 'DT-', '', '', 'DUIS-^', 'DÜ', 'TI', 'DURCH^^', 'DURCH', 'TURK', 'DVA$', 'TWA', None, 'DY9^', 'DÜ', 
None, 'DYS$', 'DIS', None, 'DS(CH)--<', 'T', 'T', 'DST', 'ZT', 'ZT', 'DZS(CH)--', 'T', 'T', 'D(SßZ)', 'Z', 'Z', 'D(AÄEIOÖRUÜY)-', 'D', None, 'D(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'D', None, "D'H^", 'D', 'T', 'D´H^', 'D', 'T', 'D`H^', 'D', 'T', "D'S3$", 'Z', 'Z', 'D´S3$', 'Z', 'Z', 'D^', 'D', None, 'D', 'T', 'T', 'EAULT$', 'O', 'U', 'EAUX$', 'O', 'U', 'EAU', 'O', 'U', 'EAV', 'IW', 'IF', 'EAS3$', 'EAS', None, 'EA(AÄEIOÖÜY)-3', 'EA', 'EA', 'EA3$', 'EA', 'EA', 'EA3', 'I', 'I', 'EBENSO^$', 'EBNSO', 'EBNZU', 'EBENSO^^', 'EBNSO ', 'EBNZU ', 'EBEN^^', 'EBN', 'EBN', 'EE9', 'E', 'E', 'EGL-1', 'EK', None, 'EHE(IUY)--1', 'EH', None, 'EHUNG---1', 'E', None, 'EH(AÄIOÖUÜY)-1', 'EH', None, 'EIEI--', '', '', 'EIERE^$', 'EIERE', None, 'EIERE$', 'EIERE', None, 'EIERE(NS)-$', 'EIERE', None, 'EIERE(AIOUY)--', 'EIER', None, 'EIER(AÄIOÖUÜY)-', 'EIER', None, 'EIER<', 'EIA', None, 'EIGL-1', 'EIK', None, 'EIGH$', 'EI', 'EI', 'EIH--', 'E', 'E', 'EILLE$', 'EI', 'EI', 'EIR(BCDFGKLMNQSTVWZ)-', 'EIA', 'EIA', 'EIR$', 'EIA', 'EIA', 'EITRAUBEN------', 'EIT ', 'EIT ', 'EI', 'EI', 'EI', 'EJ$', 'EI', 'EI', 'ELIZ^', 'ELIS', None, 'ELZ^', 'ELS', None, 'EL-^', 'E', 'E', 'ELANG----1', 'E', 'E', 'EL(DKL)--1', 'E', 'E', 'EL(MNT)--1$', 'E', 'E', 'ELYNE$', 'ELINE', 'ELINE', 'ELYN$', 'ELIN', 'ELIN', 'EL(AÄEÈÉÊIÌÍÎOÖUÜY)-1', 'EL', 'EL', 'EL-1', 'L', 'L', 'EM-^', None, 'E', 'EM(DFKMPQT)--1', None, 'E', 'EM(AÄEÈÉÊIÌÍÎOÖUÜY)--1', None, 'E', 'EM-1', None, 'N', 'ENGAG-^', 'ANGA', 'ANKA', 'EN-^', 'E', 'E', 'ENTUEL', 'ENTUEL', None, 'EN(CDGKQSTZ)--1', 'E', 'E', 'EN(AÄEÈÉÊIÌÍÎNOÖUÜY)-1', 'EN', 'EN', 'EN-1', '', '', 'ERH(AÄEIOÖUÜ)-^', 'ERH', 'ER', 'ER-^', 'E', 'E', 'ERREGEND-----', ' ER', ' ER', 'ERT1$', 'AT', None, 'ER(DGLKMNRQTZß)-1', 'ER', None, 'ER(AÄEÈÉÊIÌÍÎOÖUÜY)-1', 'ER', 'A', 'ER1$', 'A', 'A', 'ER<1', 'A', 'A', 'ETAT7', 'ETA', 'ETA', 'ETI(AÄOÖÜU)-', 'EZI', 'EZI', 'EUERE$', 'EUERE', None, 'EUERE(NS)-$', 'EUERE', None, 'EUERE(AIOUY)--', 'EUER', None, 'EUER(AÄIOÖUÜY)-', 'EUER', None, 'EUER<', 'EUA', None, 'EUEU--', '', '', 
'EUILLE$', 'Ö', 'Ö', 'EUR$', 'ÖR', 'ÖR', 'EUX', 'Ö', 'Ö', 'EUSZ$', 'EUS', None, 'EUTZ$', 'EUS', None, 'EUYS$', 'EUS', 'EUZ', 'EUZ$', 'EUS', None, 'EU', 'EU', 'EU', 'EVER--<1', 'EW', None, 'EV(ÄOÖUÜ)-1', 'EW', None, 'EYER<', 'EIA', 'EIA', 'EY<', 'EI', 'EI', 'FACETTE', 'FASET', 'FAZET', 'FANS--^$', 'FE', 'FE', 'FAN-^$', 'FE', 'FE', 'FAULT-', 'FOL', 'FUL', 'FEE(DL)-', 'FI', 'FI', 'FEHLER', 'FELA', 'FELA', 'FE(LMNRST)-3^', 'FE', 'FE', 'FOERDERN---^', 'FÖRD', 'FÖRT', 'FOERDERN---', ' FÖRD', ' FÖRT', 'FOND7', 'FON', 'FUN', 'FRAIN$', 'FRA', 'FRA', 'FRISEU(RS)-', 'FRISÖ', 'FRIZÖ', 'FY9^', 'FÜ', None, 'FÖRDERN---^', 'FÖRD', 'FÖRT', 'FÖRDERN---', ' FÖRD', ' FÖRT', 'GAGS^$', 'GEX', 'KEX', 'GAG^$', 'GEK', 'KEK', 'GD', 'KT', 'KT', 'GEGEN^^', 'GEGN', 'KEKN', 'GEGENGEKOM-----', 'GEGN ', 'KEKN ', 'GEGENGESET-----', 'GEGN ', 'KEKN ', 'GEGENKOMME-----', 'GEGN ', 'KEKN ', 'GEGENZUKOM---', 'GEGN ZU ', 'KEKN ZU ', 'GENDETWAS-----$', 'GENT ', 'KENT ', 'GENRE', 'IORE', 'IURE', 'GE(LMNRST)-3^', 'GE', 'KE', 'GER(DKT)-', 'GER', None, 'GETTE$', 'GET', 'KET', 'GGF.', 'GF.', None, 'GG-', '', '', 'GH', 'G', None, 'GI(AOU)-^', 'I', 'I', 'GION-3', 'KIO', 'KIU', 'G(CK)-', '', '', 'GJ(AEIOU)-^', 'I', 'I', 'GMBH^$', 'GMBH', 'GMBH', 'GNAC$', 'NIAK', 'NIAK', 'GNON$', 'NION', 'NIUN', 'GN$', 'N', 'N', 'GONCAL-^', 'GONZA', 'KUNZA', 'GRY9^', 'GRÜ', None, 'G(SßXZ)-<', 'K', 'K', 'GUCK-', 'KU', 'KU', 'GUISEP-^', 'IUSE', 'IUZE', 'GUI-^', 'G', 'K', 'GUTAUSSEH------^', 'GUT ', 'KUT ', 'GUTGEHEND------^', 'GUT ', 'KUT ', 'GY9^', 'GÜ', None, 'G(AÄEILOÖRUÜY)-', 'G', None, 'G(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'G', None, "G'S$", 'X', 'X', 'G´S$', 'X', 'X', 'G^', 'G', None, 'G', 'K', 'K', 'HA(HIUY)--1', 'H', None, 'HANDVOL---^', 'HANT ', 'ANT ', 'HANNOVE-^', 'HANOF', None, 'HAVEN7$', 'HAFN', None, 'HEAD-', 'HE', 'E', 'HELIEGEN------', 'E ', 'E ', 'HESTEHEN------', 'E ', 'E ', 'HE(LMNRST)-3^', 'HE', 'E', 'HE(LMN)-1', 'E', 'E', 'HEUR1$', 'ÖR', 'ÖR', 'HE(HIUY)--1', 'H', None, 'HIH(AÄEIOÖUÜY)-1', 'IH', None, 'HLH(AÄEIOÖUÜY)-1', 'LH', 
None, 'HMH(AÄEIOÖUÜY)-1', 'MH', None, 'HNH(AÄEIOÖUÜY)-1', 'NH', None, 'HOBBY9^', 'HOBI', None, 'HOCHBEGAB-----^', 'HOCH ', 'UK ', 'HOCHTALEN-----^', 'HOCH ', 'UK ', 'HOCHZUFRI-----^', 'HOCH ', 'UK ', 'HO(HIY)--1', 'H', None, 'HRH(AÄEIOÖUÜY)-1', 'RH', None, 'HUH(AÄEIOÖUÜY)-1', 'UH', None, 'HUIS^^', 'HÜS', 'IZ', 'HUIS$', 'ÜS', 'IZ', 'HUI--1', 'H', None, 'HYGIEN^', 'HÜKIEN', None, 'HY9^', 'HÜ', None, 'HY(BDGMNPST)-', 'Ü', None, 'H.^', None, 'H.', 'HÄU--1', 'H', None, 'H^', 'H', '', 'H', '', '', 'ICHELL---', 'ISH', 'IZ', 'ICHI$', 'ISHI', 'IZI', 'IEC$', 'IZ', 'IZ', 'IEDENSTELLE------', 'IDN ', 'ITN ', 'IEI-3', '', '', 'IELL3', 'IEL', 'IEL', 'IENNE$', 'IN', 'IN', 'IERRE$', 'IER', 'IER', 'IERZULAN---', 'IR ZU ', 'IR ZU ', 'IETTE$', 'IT', 'IT', 'IEU', 'IÖ', 'IÖ', 'IE<4', 'I', 'I', 'IGL-1', 'IK', None, 'IGHT3$', 'EIT', 'EIT', 'IGNI(EO)-', 'INI', 'INI', 'IGN(AEOU)-$', 'INI', 'INI', 'IHER(DGLKRT)--1', 'IHE', None, 'IHE(IUY)--', 'IH', None, 'IH(AIOÖUÜY)-', 'IH', None, 'IJ(AOU)-', 'I', 'I', 'IJ$', 'I', 'I', 'IJ<', 'EI', 'EI', 'IKOLE$', 'IKOL', 'IKUL', 'ILLAN(STZ)--4', 'ILIA', 'ILIA', 'ILLAR(DT)--4', 'ILIA', 'ILIA', 'IMSTAN----^', 'IM ', 'IN ', 'INDELERREGE------', 'INDL ', 'INTL ', 'INFRAGE-----^$', 'IN ', 'IN ', 'INTERN(AOU)-^', 'INTAN', 'INTAN', 'INVER-', 'INWE', 'INFE', 'ITI(AÄIOÖUÜ)-', 'IZI', 'IZI', 'IUSZ$', 'IUS', None, 'IUTZ$', 'IUS', None, 'IUZ$', 'IUS', None, 'IVER--<', 'IW', None, 'IVIER$', 'IWIE', 'IFIE', 'IV(ÄOÖUÜ)-', 'IW', None, 'IV<3', 'IW', None, 'IY2', 'I', None, 'I(ÈÉÊ)<4', 'I', 'I', 'JAVIE---<^', 'ZA', 'ZA', 'JEANS^$', 'JINS', 'INZ', 'JEANNE^$', 'IAN', 'IAN', 'JEAN-^', 'IA', 'IA', 'JER-^', 'IE', 'IE', 'JE(LMNST)-', 'IE', 'IE', 'JI^', 'JI', None, 'JOR(GK)^$', 'IÖRK', 'IÖRK', 'J', 'I', 'I', 'KC(ÄEIJ)-', 'X', 'X', 'KD', 'KT', None, 'KE(LMNRST)-3^', 'KE', 'KE', 'KG(AÄEILOÖRUÜY)-', 'K', None, 'KH<^', 'K', 'K', 'KIC$', 'KIZ', 'KIZ', 'KLE(LMNRST)-3^', 'KLE', 'KLE', 'KOTELE-^', 'KOTL', 'KUTL', 'KREAT-^', 'KREA', 'KREA', 'KRÜS(TZ)--^', 'KRI', None, 'KRYS(TZ)--^', 'KRI', 
None, 'KRY9^', 'KRÜ', None, 'KSCH---', 'K', 'K', 'KSH--', 'K', 'K', 'K(SßXZ)7', 'X', 'X', "KT'S$", 'X', 'X', 'KTI(AIOU)-3', 'XI', 'XI', 'KT(SßXZ)', 'X', 'X', 'KY9^', 'KÜ', None, "K'S$", 'X', 'X', 'K´S$', 'X', 'X', 'LANGES$', ' LANGES', ' LANKEZ', 'LANGE$', ' LANGE', ' LANKE', 'LANG$', ' LANK', ' LANK', 'LARVE-', 'LARF', 'LARF', 'LD(SßZ)$', 'LS', 'LZ', "LD'S$", 'LS', 'LZ', 'LD´S$', 'LS', 'LZ', 'LEAND-^', 'LEAN', 'LEAN', 'LEERSTEHE-----^', 'LER ', 'LER ', 'LEICHBLEIB-----', 'LEICH ', 'LEIK ', 'LEICHLAUTE-----', 'LEICH ', 'LEIK ', 'LEIDERREGE------', 'LEIT ', 'LEIT ', 'LEIDGEPR----^', 'LEIT ', 'LEIT ', 'LEINSTEHE-----', 'LEIN ', 'LEIN ', 'LEL-', 'LE', 'LE', 'LE(MNRST)-3^', 'LE', 'LE', 'LETTE$', 'LET', 'LET', 'LFGNAG-', 'LFGAN', 'LFKAN', 'LICHERWEIS----', 'LICHA ', 'LIKA ', 'LIC$', 'LIZ', 'LIZ', 'LIVE^$', 'LEIF', 'LEIF', 'LT(SßZ)$', 'LS', 'LZ', "LT'S$", 'LS', 'LZ', 'LT´S$', 'LS', 'LZ', 'LUI(GS)--', 'LU', 'LU', 'LV(AIO)-', 'LW', None, 'LY9^', 'LÜ', None, 'LSTS$', 'LS', 'LZ', 'LZ(BDFGKLMNPQRSTVWX)-', 'LS', None, 'L(SßZ)$', 'LS', None, 'MAIR-<', 'MEI', 'NEI', 'MANAG-', 'MENE', 'NENE', 'MANUEL', 'MANUEL', None, 'MASSEU(RS)-', 'MASÖ', 'NAZÖ', 'MATCH', 'MESH', 'NEZ', 'MAURICE', 'MORIS', 'NURIZ', 'MBH^$', 'MBH', 'MBH', 'MB(ßZ)$', 'MS', None, 'MB(SßTZ)-', 'M', 'N', 'MCG9^', 'MAK', 'NAK', 'MC9^', 'MAK', 'NAK', 'MEMOIR-^', 'MEMOA', 'NENUA', 'MERHAVEN$', 'MAHAFN', None, 'ME(LMNRST)-3^', 'ME', 'NE', 'MEN(STZ)--3', 'ME', None, 'MEN$', 'MEN', None, 'MIGUEL-', 'MIGE', 'NIKE', 'MIKE^$', 'MEIK', 'NEIK', 'MITHILFE----^$', 'MIT H', 'NIT ', 'MN$', 'M', None, 'MN', 'N', 'N', 'MPJUTE-', 'MPUT', 'NBUT', 'MP(ßZ)$', 'MS', None, 'MP(SßTZ)-', 'M', 'N', 'MP(BDJLMNPQVW)-', 'MB', 'NB', 'MY9^', 'MÜ', None, 'M(ßZ)$', 'MS', None, 'M´G7^', 'MAK', 'NAK', "M'G7^", 'MAK', 'NAK', 'M´^', 'MAK', 'NAK', "M'^", 'MAK', 'NAK', 'M', None, 'N', 'NACH^^', 'NACH', 'NAK', 'NADINE', 'NADIN', 'NATIN', 'NAIV--', 'NA', 'NA', 'NAISE$', 'NESE', 'NEZE', 'NAUGENOMM------', 'NAU ', 'NAU ', 'NAUSOGUT$', 'NAUSO GUT', 'NAUZU 
KUT', 'NCH$', 'NSH', 'NZ', 'NCOISE$', 'SOA', 'ZUA', 'NCOIS$', 'SOA', 'ZUA', 'NDAR$', 'NDA', 'NTA', 'NDERINGEN------', 'NDE ', 'NTE ', 'NDRO(CDKTZ)-', 'NTRO', None, 'ND(BFGJLMNPQVW)-', 'NT', None, 'ND(SßZ)$', 'NS', 'NZ', "ND'S$", 'NS', 'NZ', 'ND´S$', 'NS', 'NZ', 'NEBEN^^', 'NEBN', 'NEBN', 'NENGELERN------', 'NEN ', 'NEN ', 'NENLERN(ET)---', 'NEN LE', 'NEN LE', 'NENZULERNE---', 'NEN ZU LE', 'NEN ZU LE', 'NE(LMNRST)-3^', 'NE', 'NE', 'NEN-3', 'NE', 'NE', 'NETTE$', 'NET', 'NET', 'NGU^^', 'NU', 'NU', 'NG(BDFJLMNPQRTVW)-', 'NK', 'NK', 'NH(AUO)-$', 'NI', 'NI', 'NICHTSAHNEN-----', 'NIX ', 'NIX ', 'NICHTSSAGE----', 'NIX ', 'NIX ', 'NICHTS^^', 'NIX', 'NIX', 'NICHT^^', 'NICHT', 'NIKT', 'NINE$', 'NIN', 'NIN', 'NON^^', 'NON', 'NUN', 'NOTLEIDE-----^', 'NOT ', 'NUT ', 'NOT^^', 'NOT', 'NUT', 'NTI(AIOU)-3', 'NZI', 'NZI', 'NTIEL--3', 'NZI', 'NZI', 'NT(SßZ)$', 'NS', 'NZ', "NT'S$", 'NS', 'NZ', 'NT´S$', 'NS', 'NZ', 'NYLON', 'NEILON', 'NEILUN', 'NY9^', 'NÜ', None, 'NSTZUNEH---', 'NST ZU ', 'NZT ZU ', 'NSZ-', 'NS', None, 'NSTS$', 'NS', 'NZ', 'NZ(BDFGKLMNPQRSTVWX)-', 'NS', None, 'N(SßZ)$', 'NS', None, 'OBERE-', 'OBER', None, 'OBER^^', 'OBA', 'UBA', 'OEU2', 'Ö', 'Ö', 'OE<2', 'Ö', 'Ö', 'OGL-', 'OK', None, 'OGNIE-', 'ONI', 'UNI', 'OGN(AEOU)-$', 'ONI', 'UNI', 'OH(AIOÖUÜY)-', 'OH', None, 'OIE$', 'Ö', 'Ö', 'OIRE$', 'OA', 'UA', 'OIR$', 'OA', 'UA', 'OIX', 'OA', 'UA', 'OI<3', 'EU', 'EU', 'OKAY^$', 'OKE', 'UKE', 'OLYN$', 'OLIN', 'ULIN', 'OO(DLMZ)-', 'U', None, 'OO$', 'U', None, 'OO-', '', '', 'ORGINAL-----', 'ORI', 'URI', 'OTI(AÄOÖUÜ)-', 'OZI', 'UZI', 'OUI^', 'WI', 'FI', 'OUILLE$', 'ULIE', 'ULIE', 'OU(DT)-^', 'AU', 'AU', 'OUSE$', 'AUS', 'AUZ', 'OUT-', 'AU', 'AU', 'OU', 'U', 'U', 'O(FV)$', 'AU', 'AU', 'OVER--<', 'OW', None, 'OV(AOU)-', 'OW', None, 'OW$', 'AU', 'AU', 'OWS$', 'OS', 'UZ', 'OJ(AÄEIOÖUÜ)--', 'O', 'U', 'OYER', 'OIA', None, 'OY(AÄEIOÖUÜ)--', 'O', 'U', 'O(JY)<', 'EU', 'EU', 'OZ$', 'OS', None, 'O´^', 'O', 'U', "O'^", 'O', 'U', 'O', None, 'U', 'PATIEN--^', 'PAZI', 'PAZI', 'PENSIO-^', 'PANSI', 
'PANZI', 'PE(LMNRST)-3^', 'PE', 'PE', 'PFER-^', 'FE', 'FE', 'P(FH)<', 'F', 'F', 'PIC^$', 'PIK', 'PIK', 'PIC$', 'PIZ', 'PIZ', 'PIPELINE', 'PEIBLEIN', 'PEIBLEIN', 'POLYP-', 'POLÜ', None, 'POLY^^', 'POLI', 'PULI', 'PORTRAIT7', 'PORTRE', 'PURTRE', 'POWER7', 'PAUA', 'PAUA', 'PP(FH)--<', 'B', 'B', 'PP-', '', '', 'PRODUZ-^', 'PRODU', 'BRUTU', 'PRODUZI--', ' PRODU', ' BRUTU', 'PRIX^$', 'PRI', 'PRI', 'PS-^^', 'P', None, 'P(SßZ)^', None, 'Z', 'P(SßZ)$', 'BS', None, 'PT-^', '', '', 'PTI(AÄOÖUÜ)-3', 'BZI', 'BZI', 'PY9^', 'PÜ', None, 'P(AÄEIOÖRUÜY)-', 'P', 'P', 'P(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'P', None, 'P.^', None, 'P.', 'P^', 'P', None, 'P', 'B', 'B', 'QI-', 'Z', 'Z', 'QUARANT--', 'KARA', 'KARA', 'QUE(LMNRST)-3', 'KWE', 'KFE', 'QUE$', 'K', 'K', 'QUI(NS)$', 'KI', 'KI', 'QUIZ7', 'KWIS', None, 'Q(UV)7', 'KW', 'KF', 'Q<', 'K', 'K', 'RADFAHR----', 'RAT ', 'RAT ', 'RAEFTEZEHRE-----', 'REFTE ', 'REFTE ', 'RCH', 'RCH', 'RK', 'REA(DU)---3^', 'R', None, 'REBSERZEUG------', 'REBS ', 'REBZ ', 'RECHERCH^', 'RESHASH', 'REZAZ', 'RECYCL--', 'RIZEI', 'RIZEI', 'RE(ALST)-3^', 'RE', None, 'REE$', 'RI', 'RI', 'RER$', 'RA', 'RA', 'RE(MNR)-4', 'RE', 'RE', 'RETTE$', 'RET', 'RET', 'REUZ$', 'REUZ', None, 'REW$', 'RU', 'RU', 'RH<^', 'R', 'R', 'RJA(MN)--', 'RI', 'RI', 'ROWD-^', 'RAU', 'RAU', 'RTEMONNAIE-', 'RTMON', 'RTNUN', 'RTI(AÄOÖUÜ)-3', 'RZI', 'RZI', 'RTIEL--3', 'RZI', 'RZI', 'RV(AEOU)-3', 'RW', None, 'RY(KN)-$', 'RI', 'RI', 'RY9^', 'RÜ', None, 'RÄFTEZEHRE-----', 'REFTE ', 'REFTE ', 'SAISO-^', 'SES', 'ZEZ', 'SAFE^$', 'SEIF', 'ZEIF', 'SAUCE-^', 'SOS', 'ZUZ', 'SCHLAGGEBEN-----<', 'SHLAK ', 'ZLAK ', 'SCHSCH---7', '', '', 'SCHTSCH', 'SH', 'Z', 'SC(HZ)<', 'SH', 'Z', 'SC', 'SK', 'ZK', 'SELBSTST--7^^', 'SELB', 'ZELB', 'SELBST7^^', 'SELBST', 'ZELBZT', 'SERVICE7^', 'SÖRWIS', 'ZÖRFIZ', 'SERVI-^', 'SERW', None, 'SE(LMNRST)-3^', 'SE', 'ZE', 'SETTE$', 'SET', 'ZET', 'SHP-^', 'S', 'Z', 'SHST', 'SHT', 'ZT', 'SHTSH', 'SH', 'Z', 'SHT', 'ST', 'Z', 'SHY9^', 'SHÜ', None, 'SH^^', 'SH', None, 'SH3', 'SH', 'Z', 'SICHERGEGAN-----^', 
'SICHA ', 'ZIKA ', 'SICHERGEHE----^', 'SICHA ', 'ZIKA ', 'SICHERGESTEL------^', 'SICHA ', 'ZIKA ', 'SICHERSTELL-----^', 'SICHA ', 'ZIKA ', 'SICHERZU(GS)--^', 'SICHA ZU ', 'ZIKA ZU ', 'SIEGLI-^', 'SIKL', 'ZIKL', 'SIGLI-^', 'SIKL', 'ZIKL', 'SIGHT', 'SEIT', 'ZEIT', 'SIGN', 'SEIN', 'ZEIN', 'SKI(NPZ)-', 'SKI', 'ZKI', 'SKI<^', 'SHI', 'ZI', 'SODASS^$', 'SO DAS', 'ZU TAZ', 'SODAß^$', 'SO DAS', 'ZU TAZ', 'SOGENAN--^', 'SO GEN', 'ZU KEN', 'SOUND-', 'SAUN', 'ZAUN', 'STAATS^^', 'STAZ', 'ZTAZ', 'STADT^^', 'STAT', 'ZTAT', 'STANDE$', ' STANDE', ' ZTANTE', 'START^^', 'START', 'ZTART', 'STAURANT7', 'STORAN', 'ZTURAN', 'STEAK-', 'STE', 'ZTE', 'STEPHEN-^$', 'STEW', None, 'STERN', 'STERN', None, 'STRAF^^', 'STRAF', 'ZTRAF', "ST'S$", 'Z', 'Z', 'ST´S$', 'Z', 'Z', 'STST--', '', '', 'STS(ACEÈÉÊHIÌÍÎOUÄÜÖ)--', 'ST', 'ZT', 'ST(SZ)', 'Z', 'Z', 'SPAREN---^', 'SPA', 'ZPA', 'SPAREND----', ' SPA', ' ZPA', 'S(PTW)-^^', 'S', None, 'SP', 'SP', None, 'STYN(AE)-$', 'STIN', 'ZTIN', 'ST', 'ST', 'ZT', 'SUITE<', 'SIUT', 'ZIUT', 'SUKE--$', 'S', 'Z', 'SURF(EI)-', 'SÖRF', 'ZÖRF', 'SV(AEÈÉÊIÌÍÎOU)-<^', 'SW', None, 'SYB(IY)--^', 'SIB', None, 'SYL(KVW)--^', 'SI', None, 'SY9^', 'SÜ', None, 'SZE(NPT)-^', 'ZE', 'ZE', 'SZI(ELN)-^', 'ZI', 'ZI', 'SZCZ<', 'SH', 'Z', 'SZT<', 'ST', 'ZT', 'SZ<3', 'SH', 'Z', 'SÜL(KVW)--^', 'SI', None, 'S', None, 'Z', 'TCH', 'SH', 'Z', 'TD(AÄEIOÖRUÜY)-', 'T', None, 'TD(ÀÁÂÃÅÈÉÊËÌÍÎÏÒÓÔÕØÙÚÛÝŸ)-', 'T', None, 'TEAT-^', 'TEA', 'TEA', 'TERRAI7^', 'TERA', 'TERA', 'TE(LMNRST)-3^', 'TE', 'TE', 'TH<', 'T', 'T', 'TICHT-', 'TIK', 'TIK', 'TICH$', 'TIK', 'TIK', 'TIC$', 'TIZ', 'TIZ', 'TIGGESTELL-------', 'TIK ', 'TIK ', 'TIGSTELL-----', 'TIK ', 'TIK ', 'TOAS-^', 'TO', 'TU', 'TOILET-', 'TOLE', 'TULE', 'TOIN-', 'TOA', 'TUA', 'TRAECHTI-^', 'TRECHT', 'TREKT', 'TRAECHTIG--', ' TRECHT', ' TREKT', 'TRAINI-', 'TREN', 'TREN', 'TRÄCHTI-^', 'TRECHT', 'TREKT', 'TRÄCHTIG--', ' TRECHT', ' TREKT', 'TSCH', 'SH', 'Z', 'TSH', 'SH', 'Z', 'TST', 'ZT', 'ZT', 'T(Sß)', 'Z', 'Z', 'TT(SZ)--<', '', '', 'TT9', 'T', 'T', 'TV^$', 
'TV', 'TV', 'TX(AEIOU)-3', 'SH', 'Z', 'TY9^', 'TÜ', None, 'TZ-', '', '', "T'S3$", 'Z', 'Z', 'T´S3$', 'Z', 'Z', 'UEBEL(GNRW)-^^', 'ÜBL ', 'IBL ', 'UEBER^^', 'ÜBA', 'IBA', 'UE2', 'Ü', 'I', 'UGL-', 'UK', None, 'UH(AOÖUÜY)-', 'UH', None, 'UIE$', 'Ü', 'I', 'UM^^', 'UM', 'UN', 'UNTERE--3', 'UNTE', 'UNTE', 'UNTER^^', 'UNTA', 'UNTA', 'UNVER^^', 'UNFA', 'UNFA', 'UN^^', 'UN', 'UN', 'UTI(AÄOÖUÜ)-', 'UZI', 'UZI', 'UVE-4', 'UW', None, 'UY2', 'UI', None, 'UZZ', 'AS', 'AZ', 'VACL-^', 'WAZ', 'FAZ', 'VAC$', 'WAZ', 'FAZ', 'VAN DEN ^', 'FANDN', 'FANTN', 'VANES-^', 'WANE', None, 'VATRO-', 'WATR', None, 'VA(DHJNT)--^', 'F', None, 'VEDD-^', 'FE', 'FE', 'VE(BEHIU)--^', 'F', None, 'VEL(BDLMNT)-^', 'FEL', None, 'VENTZ-^', 'FEN', None, 'VEN(NRSZ)-^', 'FEN', None, 'VER(AB)-^$', 'WER', None, 'VERBAL^$', 'WERBAL', None, 'VERBAL(EINS)-^', 'WERBAL', None, 'VERTEBR--', 'WERTE', None, 'VEREIN-----', 'F', None, 'VEREN(AEIOU)-^', 'WEREN', None, 'VERIFI', 'WERIFI', None, 'VERON(AEIOU)-^', 'WERON', None, 'VERSEN^', 'FERSN', 'FAZN', 'VERSIERT--^', 'WERSI', None, 'VERSIO--^', 'WERS', None, 'VERSUS', 'WERSUS', None, 'VERTI(GK)-', 'WERTI', None, 'VER^^', 'FER', 'FA', 'VERSPRECHE-------', ' FER', ' FA', 'VER$', 'WA', None, 'VER', 'FA', 'FA', 'VET(HT)-^', 'FET', 'FET', 'VETTE$', 'WET', 'FET', 'VE^', 'WE', None, 'VIC$', 'WIZ', 'FIZ', 'VIELSAGE----', 'FIL ', 'FIL ', 'VIEL', 'FIL', 'FIL', 'VIEW', 'WIU', 'FIU', 'VILL(AE)-', 'WIL', None, 'VIS(ACEIKUVWZ)-<^', 'WIS', None, 'VI(ELS)--^', 'F', None, 'VILLON--', 'WILI', 'FILI', 'VIZE^^', 'FIZE', 'FIZE', 'VLIE--^', 'FL', None, 'VL(AEIOU)--', 'W', None, 'VOKA-^', 'WOK', None, 'VOL(ATUVW)--^', 'WO', None, 'VOR^^', 'FOR', 'FUR', 'VR(AEIOU)--', 'W', None, 'VV9', 'W', None, 'VY9^', 'WÜ', 'FI', 'V(ÜY)-', 'W', None, 'V(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'W', None, 'V(AEIJLRU)-<', 'W', None, 'V.^', 'V.', None, 'V<', 'F', 'F', 'WEITERENTWI-----^', 'WEITA ', 'FEITA ', 'WEITREICH-----^', 'WEIT ', 'FEIT ', 'WEITVER^', 'WEIT FER', 'FEIT FA', 'WE(LMNRST)-3^', 'WE', 'FE', 'WER(DST)-', 'WER', None, 
'WIC$', 'WIZ', 'FIZ', 'WIEDERU--', 'WIDE', 'FITE', 'WIEDER^$', 'WIDA', 'FITA', 'WIEDER^^', 'WIDA ', 'FITA ', 'WIEVIEL', 'WI FIL', 'FI FIL', 'WISUEL', 'WISUEL', None, 'WR-^', 'W', None, 'WY9^', 'WÜ', 'FI', 'W(BDFGJKLMNPQRSTZ)-', 'F', None, 'W$', 'F', None, 'W', None, 'F', 'X<^', 'Z', 'Z', 'XHAVEN$', 'XAFN', None, 'X(CSZ)', 'X', 'X', 'XTS(CH)--', 'XT', 'XT', 'XT(SZ)', 'Z', 'Z', 'YE(LMNRST)-3^', 'IE', 'IE', 'YE-3', 'I', 'I', 'YOR(GK)^$', 'IÖRK', 'IÖRK', 'Y(AOU)-<7', 'I', 'I', 'Y(BKLMNPRSTX)-1', 'Ü', None, 'YVES^$', 'IF', 'IF', 'YVONNE^$', 'IWON', 'IFUN', 'Y.^', 'Y.', None, 'Y', 'I', 'I', 'ZC(AOU)-', 'SK', 'ZK', 'ZE(LMNRST)-3^', 'ZE', 'ZE', 'ZIEJ$', 'ZI', 'ZI', 'ZIGERJA(HR)-3', 'ZIGA IA', 'ZIKA IA', 'ZL(AEIOU)-', 'SL', None, 'ZS(CHT)--', '', '', 'ZS', 'SH', 'Z', 'ZUERST', 'ZUERST', 'ZUERST', 'ZUGRUNDE^$', 'ZU GRUNDE', 'ZU KRUNTE', 'ZUGRUNDE', 'ZU GRUNDE ', 'ZU KRUNTE ', 'ZUGUNSTEN', 'ZU GUNSTN', 'ZU KUNZTN', 'ZUHAUSE-', 'ZU HAUS', 'ZU AUZ', 'ZULASTEN^$', 'ZU LASTN', 'ZU LAZTN', 'ZURUECK^^', 'ZURÜK', 'ZURIK', 'ZURZEIT', 'ZUR ZEIT', 'ZUR ZEIT', 'ZURÜCK^^', 'ZURÜK', 'ZURIK', 'ZUSTANDE', 'ZU STANDE', 'ZU ZTANTE', 'ZUTAGE', 'ZU TAGE', 'ZU TAKE', 'ZUVER^^', 'ZUFA', 'ZUFA', 'ZUVIEL', 'ZU FIL', 'ZU FIL', 'ZUWENIG', 'ZU WENIK', 'ZU FENIK', 'ZY9^', 'ZÜ', None, 'ZYK3$', 'ZIK', None, 'Z(VW)7^', 'SW', None, None, None, None)¶

_rules_no_lang = ('´', ' ', ' ', '"', ' ', ' ', '`$', '', '', "'", ' ', ' ', ',', ',', ',', ';', ',', ',', '-', ' ', ' ', ' ', ' ', ' ', '.', '.', '.', ':', '.', '.', 'Ä', 'AE', 'AE', 'Ö', 'OE', 'OE', 'Ü', 'UE', 'UE', 'ß', 'S', 'S', 'À', 'A', 'A', 'Á', 'A', 'A', 'Â', 'A', 'A', 'Ã', 'A', 'A', 'Å', 'A', 'A', 'Æ', 'AE', 'AE', 'Ç', 'C', 'C', 'Ð', 'DJ', 'DJ', 'È', 'E', 'E', 'É', 'E', 'E', 'Ê', 'E', 'E', 'Ë', 'E', 'E', 'Ì', 'I', 'I', 'Í', 'I', 'I', 'Î', 'I', 'I', 'Ï', 'I', 'I', 'Ñ', 'NH', 'NH', 'Ò', 'O', 'O', 'Ó', 'O', 'O', 'Ô', 'O', 'O', 'Õ', 'O', 'O', 'Œ', 'OE', 'OE', 'Ø', 'OE', 'OE', 'Š', 'SH', 'SH', 'Þ', 'TH', 'TH', 'Ù', 'U', 'U', 'Ú', 'U', 'U', 'Û', 'U', 'U', 'Ý', 'Y', 'Y', 'Ÿ', 'Y', 'Y', 'MC^', 'MAC', 'MAC', 'MC^', 'MAC', 'MAC', 'M´^', 'MAC', 'MAC', "M'^", 'MAC', 'MAC', 'O´^', 'O', 'O', "O'^", 'O', 'O', 'VAN DEN ^', 'VANDEN', 'VANDEN', None, None, None)¶
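Both rule tuples are flat sequences that read as consecutive triples and end with a (None, None, None) sentinel. Judging from the values, each triple appears to hold a pattern, its mode 1 replacement, and its mode 2 replacement; that reading is an assumption, not something the listing states. A minimal sketch of walking such a tuple, using a short excerpt and a hypothetical helper named iter_rules:

```python
# Excerpt of _rules_no_lang as listed above, including the trailing sentinel.
RULES_EXCERPT = (
    'Ä', 'AE', 'AE',
    'ß', 'S', 'S',
    "O'^", 'O', 'O',
    None, None, None,
)


def iter_rules(flat_rules):
    """Yield (pattern, first, second) triples until the None sentinel."""
    for i in range(0, len(flat_rules) - 2, 3):
        pattern, first, second = flat_rules[i:i + 3]
        if pattern is None:
            # A None pattern marks the end of the rule list.
            break
        yield pattern, first, second


for pattern, first, second in iter_rules(RULES_EXCERPT):
    print(pattern, first, second)
```

Only the pattern slot is checked for None here, because individual replacement slots in the German rule table can themselves be None.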

_upper_trans = {97: 'A', 98: 'B', 99: 'C', 100: 'D', 101: 'E', 102: 'F', 103: 'G', 104: 'H', 105: 'I', 106: 'J', 107: 'K', 108: 'L', 109: 'M', 110: 'N', 111: 'O', 112: 'P', 113: 'Q', 114: 'R', 115: 'S', 116: 'T', 117: 'U', 118: 'V', 119: 'W', 120: 'X', 121: 'Y', 122: 'Z', 223: 'ß', 224: 'À', 225: 'Á', 226: 'Â', 227: 'Ã', 228: 'Ä', 229: 'Å', 230: 'Æ', 231: 'Ç', 232: 'È', 233: 'É', 234: 'Ê', 235: 'Ë', 236: 'Ì', 237: 'Í', 238: 'Î', 239: 'Ï', 240: 'Ð', 241: 'Ñ', 242: 'Ò', 243: 'Ó', 244: 'Ô', 245: 'Õ', 246: 'Ö', 248: 'Ø', 249: 'Ù', 250: 'Ú', 251: 'Û', 252: 'Ü', 253: 'Ý', 254: 'Þ', 255: 'Ÿ', 339: 'Œ', 353: 'Š'}¶
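_upper_trans is keyed by code point, which is the form str.translate expects. One visible difference from str.upper() is that ß (223) maps to itself instead of expanding to SS. A small sketch using an excerpt of the table (the name UPPER_EXCERPT is illustrative only):

```python
# ASCII a-z plus a few of the non-ASCII entries from the table above.
UPPER_EXCERPT = {cp: chr(cp - 32) for cp in range(97, 123)}
UPPER_EXCERPT.update({223: 'ß', 228: 'Ä', 246: 'Ö', 252: 'Ü'})

word = 'straße'
print(word.translate(UPPER_EXCERPT))  # ß is kept as a single character
print(word.upper())                   # Python's default expands ß to SS
```

Keeping ß as one character presumably lets the rule patterns that contain ß match directly, though the listing itself does not say so.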

encode(word)[source]¶

Return the phonet code for a word.

Parameters: word (str) -- The word to transform
Returns: The phonet value
Return type: str

Examples

>>> pe = Phonet()
>>> pe.encode('Christopher')
'KRISTOFA'
>>> pe.encode('Niall')
'NIAL'
>>> pe.encode('Smith')
'SMIT'
>>> pe.encode('Schmidt')
'SHMIT'

>>> pe2 = Phonet(mode=2)
>>> pe2.encode('Christopher')
'KRIZTUFA'
>>> pe2.encode('Niall')
'NIAL'
>>> pe2.encode('Smith')
'ZNIT'
>>> pe2.encode('Schmidt')
'ZNIT'

>>> pe_none = Phonet(lang='none')
>>> pe_none.encode('Christopher')
'CHRISTOPHER'
>>> pe_none.encode('Niall')
'NIAL'
>>> pe_none.encode('Smith')
'SMITH'
>>> pe_none.encode('Schmidt')
'SCHMIDT'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.phonet(word, mode=1, lang='de')[source]¶

Return the phonet code for a word.

This is a wrapper for Phonet.encode().

Parameters

word (str) -- The word to transform
mode (int) -- The phonet variant to employ (1 or 2)
lang (str) -- de (default) for German, none for no language

Returns

The phonet value

Return type

str

Examples

>>> phonet('Christopher')
'KRISTOFA'
>>> phonet('Niall')
'NIAL'
>>> phonet('Smith')
'SMIT'
>>> phonet('Schmidt')
'SHMIT'

>>> phonet('Christopher', mode=2)
'KRIZTUFA'
>>> phonet('Niall', mode=2)
'NIAL'
>>> phonet('Smith', mode=2)
'ZNIT'
>>> phonet('Schmidt', mode=2)
'ZNIT'

>>> phonet('Christopher', lang='none')
'CHRISTOPHER'
>>> phonet('Niall', lang='none')
'NIAL'
>>> phonet('Smith', lang='none')
'SMITH'
>>> phonet('Schmidt', lang='none')
'SCHMIDT'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonet.encode method instead.

class abydos.phonetic.SoundexBR(max_length=4, zero_pad=True)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

SoundexBR.

This is based on [Mar15].

New in version 0.3.6.

Initialize SoundexBR instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

New in version 0.4.0.

_alphabetic = {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}¶

_trans = {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶

encode(word)[source]¶

Return the SoundexBR encoding of a word.

Parameters: word (str) -- The word to transform
Returns: The SoundexBR code
Return type: str

Examples

>>> pe = SoundexBR()
>>> pe.encode('Oliveira')
'O416'
>>> pe.encode('Almeida')
'A453'
>>> pe.encode('Barbosa')
'B612'
>>> pe.encode('Araújo')
'A620'
>>> pe.encode('Gonçalves')
'G524'
>>> pe.encode('Goncalves')
'G524'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
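The _trans table above is the familiar Soundex letter-to-digit mapping. The following is a rough sketch, not the library's implementation: it assumes plain Soundex steps (strip diacritics, map letters to digits, collapse adjacent repeats, drop zeros, pad) and ignores any special first-letter handling described in [Mar15]. The function name soundex_br_sketch is hypothetical. Under those assumptions it reproduces the example codes shown above.

```python
import unicodedata

# Letter-to-digit table, equivalent to the _trans listing above (A-Z in order).
TRANS = {ord(letter): digit for letter, digit in
         zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', '01230120022455012623010202')}


def soundex_br_sketch(word, max_length=4, zero_pad=True):
    """Approximate a SoundexBR code using plain Soundex steps (assumed)."""
    # Decompose accented letters, then keep only A-Z.
    word = unicodedata.normalize('NFKD', word.upper())
    word = ''.join(c for c in word if 'A' <= c <= 'Z')
    if not word:
        return ''
    digits = word.translate(TRANS)
    # Collapse runs of the same digit into a single digit.
    collapsed = digits[0]
    for digit in digits[1:]:
        if digit != collapsed[-1]:
            collapsed += digit
    # Keep the first letter, then the non-zero digits that follow it.
    code = word[0] + collapsed[1:].replace('0', '')
    if zero_pad:
        code += '0' * max_length
    return code[:max_length]


for name in ('Oliveira', 'Araújo', 'Gonçalves'):
    print(name, soundex_br_sketch(name))
```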

encode_alpha(word)[source]¶

Return the alphabetic SoundexBR encoding of a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic SoundexBR code
Return type: str

Examples

>>> pe = SoundexBR()
>>> pe.encode_alpha('Oliveira')
'OLPR'
>>> pe.encode_alpha('Almeida')
'ALNT'
>>> pe.encode_alpha('Barbosa')
'BRPK'
>>> pe.encode_alpha('Araújo')
'ARK'
>>> pe.encode_alpha('Gonçalves')
'GNKL'
>>> pe.encode_alpha('Goncalves')
'GNKL'

New in version 0.4.0.
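The alphabetic examples above are consistent with taking the numeric code, dropping its trailing zero padding, and mapping the remaining digits through _alphabetic. The sketch below assumes exactly that and nothing more; the helper name to_alpha is hypothetical and the real method may differ in detail.

```python
# Digit-to-letter table copied from the _alphabetic listing above.
ALPHABETIC = {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}


def to_alpha(code):
    """Convert a numeric SoundexBR code such as 'O416' to letters (assumed steps)."""
    code = code.rstrip('0')  # drop zero padding, e.g. 'A620' -> 'A62'
    # The leading letter is kept; only the digits after it are translated.
    return code[:1] + code[1:].translate(ALPHABETIC)


print(to_alpha('O416'))
print(to_alpha('A620'))
```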

abydos.phonetic.soundex_br(word, max_length=4, zero_pad=True)[source]¶

Return the SoundexBR encoding of a word.

This is a wrapper for SoundexBR.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string

Returns

The SoundexBR code

Return type

str

Examples

>>> soundex_br('Oliveira')
'O416'
>>> soundex_br('Almeida')
'A453'
>>> soundex_br('Barbosa')
'B612'
>>> soundex_br('Araújo')
'A620'
>>> soundex_br('Gonçalves')
'G524'
>>> soundex_br('Goncalves')
'G524'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SoundexBR.encode method instead.

class abydos.phonetic.PhoneticSpanish(max_length=-1)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

PhoneticSpanish.

This follows the coding described in [AmonME12] and [delPAngelesEGGM15].

New in version 0.3.6.

Initialize PhoneticSpanish instance.

Parameters: max_length (int) -- The length of the code returned (defaults to unlimited)

New in version 0.4.0.

_alphabetic = {48: 'P', 49: 'B', 50: 'F', 51: 'T', 52: 'S', 53: 'L', 54: 'N', 55: 'K', 56: 'G', 57: 'R'}¶

_trans = {66: '1', 67: '4', 68: '3', 70: '2', 71: '8', 72: '2', 74: '8', 75: '7', 76: '5', 77: '6', 78: '6', 80: '0', 81: '7', 82: '9', 83: '4', 84: '3', 86: '1', 88: '4', 89: '5', 90: '4'}¶

_uc_set = {'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'X', 'Y', 'Z'}¶

encode(word)[source]¶

Return the PhoneticSpanish coding of word.

Parameters: word (str) -- The word to transform
Returns: The PhoneticSpanish code
Return type: str

Examples

>>> pe = PhoneticSpanish()
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6454'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic PhoneticSpanish coding of word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic PhoneticSpanish code
Return type: str

Examples

>>> pe = PhoneticSpanish()
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NSLS'

New in version 0.4.0.
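The numeric and alphabetic examples above can be reproduced from the listed tables alone. This is a minimal sketch under stated assumptions: it uppercases and decomposes the word, keeps only the letters that have an entry in _trans (the same letters as _uc_set), and translates them. Any further steps in the real implementation, including max_length handling, are not modelled, and the function names are hypothetical.

```python
import unicodedata

# Tables copied from the _trans and _alphabetic listings above.
TRANS = {66: '1', 67: '4', 68: '3', 70: '2', 71: '8', 72: '2', 74: '8',
         75: '7', 76: '5', 77: '6', 78: '6', 80: '0', 81: '7', 82: '9',
         83: '4', 84: '3', 86: '1', 88: '4', 89: '5', 90: '4'}
ALPHABETIC = {48: 'P', 49: 'B', 50: 'F', 51: 'T', 52: 'S',
              53: 'L', 54: 'N', 55: 'K', 56: 'G', 57: 'R'}


def phonetic_spanish_sketch(word):
    """Keep the coded consonants and map each one to its digit."""
    word = unicodedata.normalize('NFKD', word.upper())
    kept = ''.join(c for c in word if ord(c) in TRANS)
    return kept.translate(TRANS)


def phonetic_spanish_alpha_sketch(word):
    """Map each digit of the numeric code to its representative letter."""
    return phonetic_spanish_sketch(word).translate(ALPHABETIC)


print(phonetic_spanish_sketch('Martinez'))
print(phonetic_spanish_alpha_sketch('Martinez'))
```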

abydos.phonetic.phonetic_spanish(word, max_length=-1)[source]¶

Return the PhoneticSpanish coding of word.

This is a wrapper for PhoneticSpanish.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to unlimited)

Returns

The PhoneticSpanish code

Return type

str

Examples

>>> phonetic_spanish('Perez')
'094'
>>> phonetic_spanish('Martinez')
'69364'
>>> phonetic_spanish('Gutierrez')
'83994'
>>> phonetic_spanish('Santiago')
'4638'
>>> phonetic_spanish('Nicolás')
'6454'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PhoneticSpanish.encode method instead.

class abydos.phonetic.SpanishMetaphone(max_length=6, modified=False)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Spanish Metaphone.

This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at https://github.com/amsqr/Spanish-Metaphone and discussed in [MLM12].

Modified version based on [delPAngelesBailonM16].

New in version 0.3.6.

Initialize SpanishMetaphone instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 6)
modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm

New in version 0.4.0.

encode(word)[source]¶

Return the Spanish Metaphone of a word.

Parameters: word (str) -- The word to transform
Returns: The Spanish Metaphone code
Return type: str

Examples

>>> pe = SpanishMetaphone()
>>> pe.encode('Perez')
'PRZ'
>>> pe.encode('Martinez')
'MRTNZ'
>>> pe.encode('Gutierrez')
'GTRRZ'
>>> pe.encode('Santiago')
'SNTG'
>>> pe.encode('Nicolás')
'NKLS'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.spanish_metaphone(word, max_length=6, modified=False)[source]¶

Return the Spanish Metaphone of a word.

This is a wrapper for SpanishMetaphone.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 6)
modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm

Returns

The Spanish Metaphone code

Return type

str

Examples

>>> spanish_metaphone('Perez')
'PRZ'
>>> spanish_metaphone('Martinez')
'MRTNZ'
>>> spanish_metaphone('Gutierrez')
'GTRRZ'
>>> spanish_metaphone('Santiago')
'SNTG'
>>> spanish_metaphone('Nicolás')
'NKLS'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SpanishMetaphone.encode method instead.

class abydos.phonetic.SfinxBis(max_length=-1)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

SfinxBis code.

SfinxBis is a Soundex-like algorithm defined in [Axe09].

This implementation follows the reference implementation: [Sjoo09].

SfinxBis is intended chiefly for Swedish names.

New in version 0.3.6.

Initialize SfinxBis instance.

Parameters: max_length (int) -- The length of the code returned (defaults to unlimited)

New in version 0.4.0.

_adelstitler = (' DE LA ', ' DE LAS ', ' DE LOS ', ' VAN DE ', ' VAN DEN ', ' VAN DER ', ' VON DEM ', ' VON DER ', ' AF ', ' AV ', ' DA ', ' DE ', ' DEL ', ' DEN ', ' DES ', ' DI ', ' DO ', ' DON ', ' DOS ', ' DU ', ' E ', ' IN ', ' LA ', ' LE ', ' MAC ', ' MC ', ' VAN ', ' VON ', ' Y ', ' S:T ')¶

_alphabetic = {35: 'Š', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 55: 'F', 56: 'S', 57: 'A'}¶

_harde_vokaler = {'A', 'O', 'U', 'Å'}¶

_mjuka_vokaler = {'E', 'I', 'Y', 'Ä', 'Ö'}¶

_substitutions = {87: 'V', 90: 'S', 192: 'A', 193: 'A', 194: 'A', 195: 'A', 198: 'Ä', 199: 'C', 200: 'E', 201: 'E', 202: 'E', 203: 'E', 204: 'I', 205: 'I', 206: 'I', 207: 'I', 209: 'N', 210: 'O', 211: 'O', 212: 'O', 213: 'O', 216: 'Ö', 217: 'U', 218: 'U', 219: 'U', 220: 'Y', 221: 'Y'}¶

_trans = {65: '9', 66: '1', 67: '2', 68: '3', 69: '9', 70: '7', 71: '2', 72: '9', 73: '9', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '9', 80: '1', 81: '2', 82: '6', 83: '8', 84: '3', 85: '9', 86: '7', 89: '9', 90: '8', 196: '9', 197: '9', 214: '9'}¶

_uc_c_set = {'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Z'}¶

_uc_set = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'Ä', 'Å', 'Ö'}¶

encode(word)[source]¶

Return the SfinxBis code for a word.

Parameters: word (str) -- The word to transform
Returns: The SfinxBis value
Return type: tuple

Examples

>>> pe = SfinxBis()
>>> pe.encode('Christopher')
('K68376',)
>>> pe.encode('Niall')
('N4',)
>>> pe.encode('Smith')
('S53',)
>>> pe.encode('Schmidt')
('S53',)

>>> pe.encode('Johansson')
('J585',)
>>> pe.encode('Sjöberg')
('#162',)

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

encode_alpha(word)[source]¶

Return the alphabetic SfinxBis code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic SfinxBis value
Return type: tuple

Examples

>>> pe = SfinxBis()
>>> pe.encode_alpha('Christopher')
('KRSTFR',)
>>> pe.encode_alpha('Niall')
('NL',)
>>> pe.encode_alpha('Smith')
('SNT',)
>>> pe.encode_alpha('Schmidt')
('SNT',)

>>> pe.encode_alpha('Johansson')
('JNSN',)
>>> pe.encode_alpha('Sjöberg')
('ŠPRK',)

New in version 0.4.0.
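Because encode returns a tuple of codes, an alphabetic form has to be produced per element. The example pairs above are consistent with translating each code through _alphabetic, where '#' (code point 35) becomes 'Š' and characters without an entry, such as the leading letter, pass through unchanged. A sketch under that assumption, with a hypothetical helper name:

```python
# Table copied from the _alphabetic listing above.
ALPHABETIC = {35: 'Š', 49: 'P', 50: 'K', 51: 'T', 52: 'L',
              53: 'N', 54: 'R', 55: 'F', 56: 'S', 57: 'A'}


def codes_to_alpha(codes):
    """Translate every numeric SfinxBis code in a tuple to its letter form."""
    return tuple(code.translate(ALPHABETIC) for code in codes)


print(codes_to_alpha(('K68376',)))
print(codes_to_alpha(('#162',)))
```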

abydos.phonetic.sfinxbis(word, max_length=-1)[source]¶

Return the SfinxBis code for a word.

This is a wrapper for SfinxBis.encode().

Parameters

word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to unlimited)

Returns

The SfinxBis value

Return type

tuple

Examples

>>> sfinxbis('Christopher')
('K68376',)
>>> sfinxbis('Niall')
('N4',)
>>> sfinxbis('Smith')
('S53',)
>>> sfinxbis('Schmidt')
('S53',)

>>> sfinxbis('Johansson')
('J585',)
>>> sfinxbis('Sjöberg')
('#162',)

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SfinxBis.encode method instead.

class abydos.phonetic.Waahlin(encoder=None)[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Wåhlin code.

Wåhlin's first-letter coding is based on the description in [Eri97].

New in version 0.3.6.

Initialize Waahlin instance.

Parameters: encoder (_Phonetic) -- An initialized phonetic algorithm object

New in version 0.4.0.

_encode_next(word)[source]¶

_transforms = {1: {'Q': 'K', 'W': 'V', 'Z': 'S', 'Ä': 'E'}, 2: {'AE': 'E', 'CH': 'K', 'DJ': 'J', 'GJ': 'J', 'HJ': 'J', 'HR': 'R', 'HV': 'V', 'HW': 'V', 'KJ': '+', 'LJ': 'J', 'PH': 'F', 'QU': 'KV', 'SJ': '*', 'TJ': '+'}, 3: {'SCH': '*', 'SKJ': '*', 'STJ': '*'}}¶
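The _transforms table is keyed by prefix length (1, 2, or 3 characters). A natural way to apply such a table is longest-prefix-first lookup; the sketch below assumes that order and only shows how the first symbol of an uppercased word could be chosen. It is not the library's code, the name encode_first is hypothetical, and whatever processing is applied to the remainder of the word is not modelled.

```python
# Table copied from the _transforms listing above.
TRANSFORMS = {
    1: {'Q': 'K', 'W': 'V', 'Z': 'S', 'Ä': 'E'},
    2: {'AE': 'E', 'CH': 'K', 'DJ': 'J', 'GJ': 'J', 'HJ': 'J', 'HR': 'R',
        'HV': 'V', 'HW': 'V', 'KJ': '+', 'LJ': 'J', 'PH': 'F', 'QU': 'KV',
        'SJ': '*', 'TJ': '+'},
    3: {'SCH': '*', 'SKJ': '*', 'STJ': '*'},
}


def encode_first(word):
    """Return (symbol, remainder) using the longest matching prefix."""
    for length in (3, 2, 1):
        prefix = word[:length]
        if prefix in TRANSFORMS[length]:
            return TRANSFORMS[length][prefix], word[length:]
    # No rule applies: the first character stands for itself.
    return word[:1], word[1:]


print(encode_first('SCHMIDT'))
print(encode_first('CHRISTOPHER'))
```

Trying the three-character rules first matters: 'SCH' must be matched as a unit before the shorter 'CH' or single-letter rules get a chance.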

encode(word, alphabetic=False)[source]¶

Return the Wåhlin code for a word.

Parameters

word (str) -- The word to transform
alphabetic (bool) -- If True, the encoder will apply its alphabetic form (.encode_alpha rather than .encode)

Returns

The Wåhlin code value

Return type

str

Examples

>>> pe = Waahlin()
>>> pe.encode('Christopher')
'KRISTOFER'
>>> pe.encode('Niall')
'NJALL'
>>> pe.encode('Smith')
'SMITH'
>>> pe.encode('Schmidt')
'*MIDT'

New in version 0.4.0.

encode_alpha(word)[source]¶

Return the alphabetic Wåhlin code for a word.

Parameters: word (str) -- The word to transform
Returns: The alphabetic Wåhlin code value
Return type: str

Examples

>>> pe = Waahlin()
>>> pe.encode_alpha('Christopher')
'KRISTOFER'
>>> pe.encode_alpha('Niall')
'NJALL'
>>> pe.encode_alpha('Smith')
'SMITH'
>>> pe.encode_alpha('Schmidt')
'ŠMIDT'

New in version 0.4.0.

class abydos.phonetic.Norphone[source]¶

Bases: abydos.phonetic._phonetic._Phonetic

Norphone.

The reference implementation by Lars Marius Garshol is available in [Gar15].

Norphone was designed for Norwegian, but this implementation has been extended to support Swedish vowels as well. It also incorporates the rules marked "not implemented" in the reference implementation's rule set.

New in version 0.3.6.

_replacements = {1: {'D': 'T', 'G': 'K', 'W': 'V', 'X': 'KS', 'Z': 'S'}, 2: {'CH': 'K', 'CK': 'K', 'GH': 'K', 'GJ': 'J', 'HG': 'K', 'HJ': 'J', 'HL': 'L', 'HR': 'R', 'KI': 'X', 'KJ': 'X', 'LD': 'L', 'ND': 'N', 'PH': 'F', 'SJ': 'X', 'TH': 'T'}, 3: {'KEI': 'X', 'SKJ': 'X'}, 4: {'SKEI': 'X'}}¶

_uc_v_set = {'A', 'E', 'I', 'O', 'U', 'Y', 'Ä', 'Å', 'Æ', 'Ö', 'Ø'}¶

encode(word)[source]¶

Return the Norphone code for a word.

Parameters: word (str) -- The word to transform
Returns: The Norphone code
Return type: str

Examples

>>> pe = Norphone()
>>> pe.encode('Hansen')
'HNSN'
>>> pe.encode('Larsen')
'LRSN'
>>> pe.encode('Aagaard')
'ÅKRT'
>>> pe.encode('Braaten')
'BRTN'
>>> pe.encode('Sandvik')
'SNVK'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.phonetic.norphone(word)[source]¶

Return the Norphone code for a word.

This is a wrapper for Norphone.encode().

Parameters: word (str) -- The word to transform
Returns: The Norphone code
Return type: str

Examples

>>> norphone('Hansen')
'HNSN'
>>> norphone('Larsen')
'LRSN'
>>> norphone('Aagaard')
'ÅKRT'
>>> norphone('Braaten')
'BRTN'
>>> norphone('Sandvik')
'SNVK'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Norphone.encode method instead.