abydos.phonetic package
The phonetic package includes classes for phonetic algorithms, including:
- Robert C. Russell's Index (RussellIndex)
- American Soundex (Soundex)
- Refined Soundex (RefinedSoundex)
- Daitch-Mokotoff Soundex (DaitchMokotoff)
- NYSIIS (NYSIIS)
- Match Rating Algorithm (phonetic.MRA)
- Metaphone (Metaphone)
- Double Metaphone (DoubleMetaphone)
- Caverphone (Caverphone)
- Alpha Search Inquiry System (AlphaSIS)
- Fuzzy Soundex (FuzzySoundex)
- Phonex (Phonex)
- Phonem (Phonem)
- Phonix (Phonix)
- Standardized Phonetic Frequency Code (SPFC)
- Statistics Canada (StatisticsCanada)
- LEIN (LEIN)
- Roger Root (RogerRoot)
- Eudex phonetic hash (phonetic.Eudex)
- Parmar-Kumbharana (ParmarKumbharana)
- Davidson's Consonant Code (Davidson)
- SoundD (SoundD)
- PSHP Soundex/Viewex Coding (PSHPSoundexFirst and PSHPSoundexLast)
- Dolby Code (Dolby)
- NRL English-to-phoneme (NRL)
- Beider-Morse Phonetic Matching (BeiderMorse)
There are also language-specific phonetic algorithms for German:
For French:
- FONEM (FONEM)
- an early version of Henry Code (HenryEarly)
For Spanish:
- Phonetic Spanish (PhoneticSpanish)
- Spanish Metaphone (SpanishMetaphone)
For Swedish:
For Norwegian:
- Norphone (Norphone)
For Brazilian Portuguese:
- SoundexBR (SoundexBR)
And there are some hybrid phonetic algorithms that employ multiple underlying phonetic algorithms:
- Oxford Name Compression Algorithm (ONCA) (ONCA)
- MetaSoundex (MetaSoundex)
Each class has an encode method that returns the phonetically encoded string. Classes for which encode returns a numeric value generally have an encode_alpha method that returns an alphabetic version of the phonetic encoding, as demonstrated below:
>>> rus = RussellIndex()
>>> rus.encode('Abramson')
128637
>>> rus.encode_alpha('Abramson')
'ABRMCN'
-
class abydos.phonetic._Phonetic
Bases: object
Abstract Phonetic class.
New in version 0.3.6.
-
_delete_consecutive_repeats(word)
Delete consecutive repeated characters in a word.
- Parameters
word (str) -- The word to transform
- Returns
Word with consecutive repeating characters collapsed to a single instance
- Return type
str
Examples
>>> pe = _Phonetic()
>>> pe._delete_consecutive_repeats('REDDEE')
'REDE'
>>> pe._delete_consecutive_repeats('AEIOU')
'AEIOU'
>>> pe._delete_consecutive_repeats('AAACCCTTTGGG')
'ACTG'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
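The collapsing step described for _delete_consecutive_repeats can be sketched in plain Python. This is an illustrative stand-in built on itertools.groupby, not the library's own code; the name collapse_repeats is hypothetical.

```python
from itertools import groupby


def collapse_repeats(word):
    """Collapse each run of identical adjacent characters to one character."""
    # groupby yields one (key, run) pair per run of equal adjacent items,
    # so joining the keys keeps exactly one character per run.
    return ''.join(char for char, _run in groupby(word))


print(collapse_repeats('REDDEE'))        # REDE
print(collapse_repeats('AAACCCTTTGGG'))  # ACTG
```

Only adjacent repeats are merged, so a letter that reappears later in the word (such as the second E in 'REDDEE') is kept.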
-
_lc_set
= {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'}¶
-
_lc_v_set
= {'a', 'e', 'i', 'o', 'u'}¶
-
_lc_vy_set
= {'a', 'e', 'i', 'o', 'u', 'y'}¶
-
_uc_set
= {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'}¶
-
_uc_v_set
= {'A', 'E', 'I', 'O', 'U'}¶
-
_uc_vy_set
= {'A', 'E', 'I', 'O', 'U', 'Y'}¶
-
-
class abydos.phonetic.RussellIndex
Bases: abydos.phonetic._phonetic._Phonetic
Russell Index.
This follows Robert C. Russell's Index algorithm, as described in [Rus18].
New in version 0.3.6.
-
_num_set
= {'1', '2', '3', '4', '5', '6', '7', '8'}¶
-
_num_trans
= {49: 'A', 50: 'B', 51: 'C', 52: 'D', 53: 'L', 54: 'M', 55: 'N', 56: 'R'}¶
-
_to_alpha(num)
Convert the Russell Index integer to an alphabetic string.
This follows Robert C. Russell's Index algorithm, as described in [Rus18].
- Parameters
num (int) -- A Russell Index integer value
- Returns
The Russell Index as an alphabetic string
- Return type
str
Examples
>>> pe = RussellIndex()
>>> pe._to_alpha(3813428)
'CRACDBR'
>>> pe._to_alpha(715)
'NAL'
>>> pe._to_alpha(3614)
'CMAD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
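The digit-to-letter conversion can be illustrated with the _num_trans table listed for this class. The sketch below is a standalone approximation; the function name russell_num_to_alpha is hypothetical, and dropping digits outside 1-8 is an assumption.

```python
# Digit-to-letter table copied from the _num_trans attribute listed above
# (the keys there are code points: 49 is '1', 56 is '8').
NUM_TRANS = {'1': 'A', '2': 'B', '3': 'C', '4': 'D',
             '5': 'L', '6': 'M', '7': 'N', '8': 'R'}


def russell_num_to_alpha(num):
    """Map each digit 1-8 of a Russell Index integer to its letter."""
    # Assumption: digits with no table entry (0 and 9) are simply dropped.
    return ''.join(NUM_TRANS[d] for d in str(num) if d in NUM_TRANS)


print(russell_num_to_alpha(3813428))  # CRACDBR
```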
-
_trans
= {65: '1', 66: '2', 67: '3', 68: '4', 69: '1', 70: '2', 71: '3', 73: '1', 75: '3', 76: '5', 77: '6', 78: '7', 79: '1', 80: '2', 81: '3', 82: '8', 83: '3', 84: '4', 85: '1', 86: '2', 88: '3', 89: '1', 90: '3'}¶
-
_uc_set
= {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Y', 'Z'}¶
-
encode(word)
Return the Russell Index (integer output) of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value
- Return type
int
Examples
>>> pe = RussellIndex()
>>> pe.encode('Christopher')
3813428
>>> pe.encode('Niall')
715
>>> pe.encode('Smith')
3614
>>> pe.encode('Schmidt')
3614
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
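A rough sketch of how the integer code could be produced from the _trans table above. The preprocessing rules (dropping 'GH', stripping trailing S or Z, keeping only the first vowel code) are assumptions that this reference does not spell out; the function name is hypothetical, and the sketch is only checked against the documented examples.

```python
from itertools import groupby

# Letter-to-digit table copied from the _trans attribute listed above.
# H, J, and W have no entry, so they are dropped.
TRANS = {
    'A': '1', 'B': '2', 'C': '3', 'D': '4', 'E': '1', 'F': '2', 'G': '3',
    'I': '1', 'K': '3', 'L': '5', 'M': '6', 'N': '7', 'O': '1', 'P': '2',
    'Q': '3', 'R': '8', 'S': '3', 'T': '4', 'U': '1', 'V': '2', 'X': '3',
    'Y': '1', 'Z': '3',
}


def russell_index_sketch(word):
    """Approximate the Russell Index of a word (None if nothing is coded)."""
    word = word.upper()
    # Assumed rule: discard 'GH' and any trailing S or Z.
    word = word.replace('GH', '').rstrip('SZ')
    digits = ''.join(TRANS[c] for c in word if c in TRANS)
    # Assumed rule: keep the first vowel code ('1') and drop later ones.
    first_one = digits.find('1') + 1
    if first_one:
        digits = digits[:first_one] + digits[first_one:].replace('1', '')
    # Collapse runs of identical digits to a single digit.
    digits = ''.join(d for d, _run in groupby(digits))
    return int(digits) if digits else None


print(russell_index_sketch('Christopher'))  # 3813428
```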
-
encode_alpha(word)
Return the Russell Index (alphabetic output) for the word.
This follows Robert C. Russell's Index algorithm, as described in [Rus18].
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value as an alphabetic string
- Return type
str
Examples
>>> pe = RussellIndex()
>>> pe.encode_alpha('Christopher')
'CRACDBR'
>>> pe.encode_alpha('Niall')
'NAL'
>>> pe.encode_alpha('Smith')
'CMAD'
>>> pe.encode_alpha('Schmidt')
'CMAD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
-
abydos.phonetic.russell_index(word)
Return the Russell Index (integer output) of a word.
This is a wrapper for RussellIndex.encode().
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value
- Return type
int
Examples
>>> russell_index('Christopher')
3813428
>>> russell_index('Niall')
715
>>> russell_index('Smith')
3614
>>> russell_index('Schmidt')
3614
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RussellIndex.encode method instead.
-
abydos.phonetic.russell_index_num_to_alpha(num)
Convert the Russell Index integer to an alphabetic string.
This is a wrapper for RussellIndex._to_alpha().
- Parameters
num (int) -- A Russell Index integer value
- Returns
The Russell Index as an alphabetic string
- Return type
str
Examples
>>> russell_index_num_to_alpha(3813428)
'CRACDBR'
>>> russell_index_num_to_alpha(715)
'NAL'
>>> russell_index_num_to_alpha(3614)
'CMAD'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RussellIndex._to_alpha method instead.
-
abydos.phonetic.russell_index_alpha(word)
Return the Russell Index (alphabetic output) for the word.
This is a wrapper for RussellIndex.encode_alpha().
- Parameters
word (str) -- The word to transform
- Returns
The Russell Index value as an alphabetic string
- Return type
str
Examples
>>> russell_index_alpha('Christopher')
'CRACDBR'
>>> russell_index_alpha('Niall')
'NAL'
>>> russell_index_alpha('Smith')
'CMAD'
>>> russell_index_alpha('Schmidt')
'CMAD'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RussellIndex.encode_alpha method instead.
-
class abydos.phonetic.Soundex(max_length=4, var='American', reverse=False, zero_pad=True)
Bases: abydos.phonetic._phonetic._Phonetic
Soundex.
Three variants of Soundex are implemented:
- 'American' follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
- 'special' follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
- 'Census' follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
New in version 0.3.6.
Initialize Soundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
var (str) -- The variant of the algorithm to employ (defaults to American):
- American follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
- special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
- Census follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False); This results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
_alphabetic
= {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 57: 'H'}¶
-
_trans
= {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '9', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '2', 84: '3', 85: '0', 86: '1', 87: '9', 88: '2', 89: '0', 90: '2'}¶
-
encode(word)
Return the Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Soundex value
- Return type
str
Examples
>>> pe = Soundex()
>>> pe.encode("Christopher")
'C623'
>>> pe.encode("Niall")
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
>>> Soundex(max_length=-1).encode('Christopher')
'C623160000000000000000000000000000000000000000000000000000000000'
>>> Soundex(max_length=-1, zero_pad=False).encode('Christopher')
'C62316'
>>> Soundex(reverse=True).encode('Christopher')
'R132'
>>> pe.encode('Ashcroft')
'A261'
>>> pe.encode('Asicroft')
'A226'
>>> pe_special = Soundex(var='special')
>>> pe_special.encode('Ashcroft')
'A226'
>>> pe_special.encode('Asicroft')
'A226'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
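The difference between the 'American' and 'special' variants can be illustrated with a compact sketch built on the _trans table above, where H and W carry the marker code 9. This is an approximation under stated assumptions, not the library's implementation: the name soundex_sketch and the special flag are hypothetical, max_length is assumed positive, and the 'Census' variant is not covered.

```python
from itertools import groupby

# Letter codes copied from the _trans attribute listed above, A through Z;
# '0' marks vowels (and Y), '9' marks H and W.
CODES = dict(zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', '01230129022455012623019202'))


def soundex_sketch(word, max_length=4, special=False, zero_pad=True):
    """Approximate American (or 'special') Soundex for a word."""
    letters = [c for c in word.upper() if c in CODES]
    if not letters:
        return '0' * max_length if zero_pad else ''
    digits = ''.join(CODES[c] for c in letters)
    # 'special' treats H and W like vowels, so they separate equal codes;
    # otherwise they vanish and the equal codes around them merge.
    digits = digits.replace('9', '0' if special else '')
    digits = ''.join(d for d, _run in groupby(digits))
    # Keep the first letter itself. Its own code is dropped, unless the
    # letter was an H or W whose code has already been removed.
    if letters[0] in 'HW' and not special:
        code = letters[0] + digits
    else:
        code = letters[0] + digits[1:]
    code = code.replace('0', '')
    if zero_pad:
        code += '0' * max_length
    return code[:max_length]


print(soundex_sketch('Ashcroft'))                # A261
print(soundex_sketch('Ashcroft', special=True))  # A226
```

Reversing the input before the call (for example soundex_sketch(word[::-1])) gives the "Reverse Soundex" behaviour described for the reverse parameter.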
-
encode_alpha(word)
Return the alphabetic Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Soundex value
- Return type
str
Examples
>>> pe = Soundex()
>>> pe.encode_alpha("Christopher")
'CRKT'
>>> pe.encode_alpha("Niall")
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
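The alphabetic form appears to follow from the numeric code and the _alphabetic table above. The sketch below assumes that padding zeros are stripped first and that the leading letter is kept unchanged; the function name is hypothetical.

```python
# Digit-to-letter table copied from the Soundex _alphabetic attribute above
# (the keys there are code points: 48 is '0', 57 is '9').
ALPHABETIC = {'0': 'A', '1': 'P', '2': 'K', '3': 'T',
              '4': 'L', '5': 'N', '6': 'R', '9': 'H'}


def soundex_code_to_alpha(code):
    """Turn a numeric Soundex code such as 'C623' into letters."""
    # Assumption: trailing zeros are padding and carry no sound.
    code = code.rstrip('0')
    # The first character is already a letter; map the remaining digits.
    return code[:1] + ''.join(ALPHABETIC.get(d, d) for d in code[1:])


print(soundex_code_to_alpha('C623'))  # CRKT
```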
-
abydos.phonetic.soundex(word, max_length=4, var='American', reverse=False, zero_pad=True)
Return the Soundex code for a word.
This is a wrapper for Soundex.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
var (str) -- The variant of the algorithm to employ (defaults to American):
- American follows the American Soundex algorithm, as described at [UnitedStates07] and in [Knu98]; this is also called Miracode
- special follows the rules from the 1880-1910 US Census retrospective re-analysis, in which h & w are not treated as blocking consonants but as vowels. Cf. [Rep13].
- Census follows the rules laid out in GIL 55 [UnitedStates97] by the US Census, including coding prefixed and unprefixed versions of some names
reverse (bool) -- Reverse the word before computing the selected Soundex (defaults to False); This results in "Reverse Soundex", which is useful for blocking in cases where the initial elements may be in error.
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The Soundex value
- Return type
str
Examples
>>> soundex("Christopher")
'C623'
>>> soundex("Niall")
'N400'
>>> soundex('Smith')
'S530'
>>> soundex('Schmidt')
'S530'
>>> soundex('Christopher', max_length=-1)
'C623160000000000000000000000000000000000000000000000000000000000'
>>> soundex('Christopher', max_length=-1, zero_pad=False)
'C62316'
>>> soundex('Christopher', reverse=True)
'R132'
>>> soundex('Ashcroft')
'A261'
>>> soundex('Asicroft')
'A226'
>>> soundex('Ashcroft', var='special')
'A226'
>>> soundex('Asicroft', var='special')
'A226'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Soundex.encode method instead.
-
class abydos.phonetic.RefinedSoundex(max_length=-1, zero_pad=False, retain_vowels=False)
Bases: abydos.phonetic._phonetic._Phonetic
Refined Soundex.
This is Soundex, but with more character classes. It was defined at [Boy98].
New in version 0.3.6.
Initialize RefinedSoundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
retain_vowels (bool) -- Retain vowels (as 0) in the resulting code
New in version 0.4.0.
-
_alphabetic
= {49: 'P', 50: 'F', 51: 'K', 52: 'G', 53: 'Z', 54: 'T', 55: 'L', 56: 'N', 57: 'R'}¶
-
_trans
= {65: '0', 66: '1', 67: '3', 68: '6', 69: '0', 70: '2', 71: '4', 72: '0', 73: '0', 74: '4', 75: '3', 76: '7', 77: '8', 78: '8', 79: '0', 80: '1', 81: '5', 82: '9', 83: '3', 84: '6', 85: '0', 86: '2', 87: '0', 88: '5', 89: '0', 90: '5'}¶
-
encode(word)
Return the Refined Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Refined Soundex value
- Return type
str
Examples
>>> pe = RefinedSoundex()
>>> pe.encode('Christopher')
'C93619'
>>> pe.encode('Niall')
'N7'
>>> pe.encode('Smith')
'S86'
>>> pe.encode('Schmidt')
'S386'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
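A sketch of Refined Soundex using the _trans table above, which has more character classes than plain Soundex. The step order (keep the first letter, code the rest, collapse repeats, then drop vowel zeros) is an assumption inferred from the documented outputs; max_length and zero_pad are omitted, and the function name is hypothetical.

```python
from itertools import groupby

# Letter codes copied from the RefinedSoundex _trans attribute above, A to Z.
REFINED = dict(zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', '01360240043788015936020505'))


def refined_soundex_sketch(word, retain_vowels=False):
    """Approximate the Refined Soundex code of a word (unlimited length)."""
    letters = [c for c in word.upper() if c in REFINED]
    if not letters:
        return ''
    # Assumption: the first letter is kept as is and only the rest is coded.
    digits = ''.join(REFINED[c] for c in letters[1:])
    # Collapse runs of identical codes.
    digits = ''.join(d for d, _run in groupby(digits))
    if not retain_vowels:
        digits = digits.replace('0', '')
    return letters[0] + digits


print(refined_soundex_sketch('Schmidt'))  # S386
```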
-
encode_alpha(word)
Return the alphabetic Refined Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Refined Soundex value
- Return type
str
Examples
>>> pe = RefinedSoundex()
>>> pe.encode_alpha('Christopher')
'CRKTPR'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'
New in version 0.4.0.
-
abydos.phonetic.refined_soundex(word, max_length=-1, zero_pad=False, retain_vowels=False)
Return the Refined Soundex code for a word.
This is a wrapper for RefinedSoundex.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to unlimited)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
retain_vowels (bool) -- Retain vowels (as 0) in the resulting code
- Returns
The Refined Soundex value
- Return type
str
Examples
>>> refined_soundex('Christopher')
'C93619'
>>> refined_soundex('Niall')
'N7'
>>> refined_soundex('Smith')
'S86'
>>> refined_soundex('Schmidt')
'S386'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RefinedSoundex.encode method instead.
-
class abydos.phonetic.DaitchMokotoff(max_length=6, zero_pad=True)
Bases: abydos.phonetic._phonetic._Phonetic
Daitch-Mokotoff Soundex.
Based on Daitch-Mokotoff Soundex [Mok97], this returns values of a word as a set. A collection is necessary since there can be multiple values for a single word.
New in version 0.3.6.
Initialize DaitchMokotoff instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
_alphabetic
= {48: 'A', 49: 'Y', 50: 'st', 51: 'T', 52: 'S', 53: 'K', 54: 'N', 55: 'P', 56: 'L', 57: 'R'}¶
-
_alphabetic_non_initials
= {48: ' ', 49: 'A', 50: ' ', 51: 'T', 52: 'S', 53: 'K', 54: 'N', 55: 'P', 56: 'L', 57: 'R'}¶
-
_dms_order
= {'A': ('AI', 'AJ', 'AU', 'AY', 'A'), 'B': ('B',), 'C': ('CHS', 'CSZ', 'CZS', 'CH', 'CK', 'CS', 'CZ', 'C'), 'D': ('DRS', 'DRZ', 'DSH', 'DSZ', 'DZH', 'DZS', 'DS', 'DT', 'DZ', 'D'), 'E': ('EI', 'EJ', 'EU', 'EY', 'E'), 'F': ('FB', 'F'), 'G': ('G',), 'H': ('H',), 'I': ('IA', 'IE', 'IO', 'IU', 'I'), 'J': ('J',), 'K': ('KH', 'KS', 'K'), 'L': ('L',), 'M': ('MN', 'M'), 'N': ('NM', 'N'), 'O': ('OI', 'OJ', 'OY', 'O'), 'P': ('PF', 'PH', 'P'), 'Q': ('Q',), 'R': ('RS', 'RZ', 'R'), 'S': ('SCHTSCH', 'SCHTCH', 'SCHTSH', 'SHTCH', 'SHTSH', 'STSCH', 'SCHD', 'SCHT', 'SHCH', 'STCH', 'STRS', 'STRZ', 'STSH', 'SZCS', 'SZCZ', 'SCH', 'SHD', 'SHT', 'SZD', 'SZT', 'SC', 'SD', 'SH', 'ST', 'SZ', 'S'), 'T': ('TTSCH', 'TSCH', 'TTCH', 'TTSZ', 'TCH', 'THS', 'TRS', 'TRZ', 'TSH', 'TSZ', 'TTS', 'TTZ', 'TZS', 'TC', 'TH', 'TS', 'TZ', 'T'), 'U': ('UE', 'UI', 'UJ', 'UY', 'U'), 'V': ('V',), 'W': ('W',), 'X': ('X',), 'Y': ('Y',), 'Z': ('ZHDZH', 'ZDZH', 'ZSCH', 'ZDZ', 'ZHD', 'ZSH', 'ZD', 'ZH', 'ZS', 'Z')}¶
-
_dms_table
= {'A': (0, '_', '_'), 'AI': (0, 1, '_'), 'AJ': (0, 1, '_'), 'AU': (0, 7, '_'), 'AY': (0, 1, '_'), 'B': (7, 7, 7), 'C': ((5, 4), (5, 4), (5, 4)), 'CH': ((5, 4), (5, 4), (5, 4)), 'CHS': (5, 54, 54), 'CK': ((5, 45), (5, 45), (5, 45)), 'CS': (4, 4, 4), 'CSZ': (4, 4, 4), 'CZ': (4, 4, 4), 'CZS': (4, 4, 4), 'D': (3, 3, 3), 'DRS': (4, 4, 4), 'DRZ': (4, 4, 4), 'DS': (4, 4, 4), 'DSH': (4, 4, 4), 'DSZ': (4, 4, 4), 'DT': (3, 3, 3), 'DZ': (4, 4, 4), 'DZH': (4, 4, 4), 'DZS': (4, 4, 4), 'E': (0, '_', '_'), 'EI': (0, 1, '_'), 'EJ': (0, 1, '_'), 'EU': (1, 1, '_'), 'EY': (0, 1, '_'), 'F': (7, 7, 7), 'FB': (7, 7, 7), 'G': (5, 5, 5), 'H': (5, 5, '_'), 'I': (0, '_', '_'), 'IA': (1, '_', '_'), 'IE': (1, '_', '_'), 'IO': (1, '_', '_'), 'IU': (1, '_', '_'), 'J': ((1, 4), ('_', 4), ('_', 4)), 'K': (5, 5, 5), 'KH': (5, 5, 5), 'KS': (5, 54, 54), 'L': (8, 8, 8), 'M': (6, 6, 6), 'MN': ('6_6', '6_6', '6_6'), 'N': (6, 6, 6), 'NM': ('6_6', '6_6', '6_6'), 'O': (0, '_', '_'), 'OI': (0, 1, '_'), 'OJ': (0, 1, '_'), 'OY': (0, 1, '_'), 'P': (7, 7, 7), 'PF': (7, 7, 7), 'PH': (7, 7, 7), 'Q': (5, 5, 5), 'R': (9, 9, 9), 'RS': ((94, 4), (94, 4), (94, 4)), 'RZ': ((94, 4), (94, 4), (94, 4)), 'S': (4, 4, 4), 'SC': (2, 4, 4), 'SCH': (4, 4, 4), 'SCHD': (2, 43, 43), 'SCHT': (2, 43, 43), 'SCHTCH': (2, 4, 4), 'SCHTSCH': (2, 4, 4), 'SCHTSH': (2, 4, 4), 'SD': (2, 43, 43), 'SH': (4, 4, 4), 'SHCH': (2, 4, 4), 'SHD': (2, 43, 43), 'SHT': (2, 43, 43), 'SHTCH': (2, 4, 4), 'SHTSH': (2, 4, 4), 'ST': (2, 43, 43), 'STCH': (2, 4, 4), 'STRS': (2, 4, 4), 'STRZ': (2, 4, 4), 'STSCH': (2, 4, 4), 'STSH': (2, 4, 4), 'SZ': (4, 4, 4), 'SZCS': (2, 4, 4), 'SZCZ': (2, 4, 4), 'SZD': (2, 43, 43), 'SZT': (2, 43, 43), 'T': (3, 3, 3), 'TC': (4, 4, 4), 'TCH': (4, 4, 4), 'TH': (3, 3, 3), 'THS': (4, 4, 4), 'TRS': (4, 4, 4), 'TRZ': (4, 4, 4), 'TS': (4, 4, 4), 'TSCH': (4, 4, 4), 'TSH': (4, 4, 4), 'TSZ': (4, 4, 4), 'TTCH': (4, 4, 4), 'TTS': (4, 4, 4), 'TTSCH': (4, 4, 4), 'TTSZ': (4, 4, 4), 'TTZ': (4, 4, 4), 'TZ': (4, 4, 4), 'TZS': (4, 4, 4), 'U': 
(0, '_', '_'), 'UE': (0, '_', '_'), 'UI': (0, 1, '_'), 'UJ': (0, 1, '_'), 'UY': (0, 1, '_'), 'V': (7, 7, 7), 'W': (7, 7, 7), 'X': (5, 54, 54), 'Y': (1, '_', '_'), 'Z': (4, 4, 4), 'ZD': (2, 43, 43), 'ZDZ': (2, 4, 4), 'ZDZH': (2, 4, 4), 'ZH': (4, 4, 4), 'ZHD': (2, 43, 43), 'ZHDZH': (2, 4, 4), 'ZS': (4, 4, 4), 'ZSCH': (4, 4, 4), 'ZSH': (4, 4, 4)}¶
-
_uc_v_set
= {'A', 'E', 'I', 'J', 'O', 'U', 'Y'}¶
-
encode(word)
Return the Daitch-Mokotoff Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Daitch-Mokotoff Soundex value
- Return type
set
Examples
>>> pe = DaitchMokotoff()
>>> sorted(pe.encode('Christopher'))
['494379', '594379']
>>> pe.encode('Niall')
{'680000'}
>>> pe.encode('Smith')
{'463000'}
>>> pe.encode('Schmidt')
{'463000'}
>>> sorted(DaitchMokotoff(max_length=20,
... zero_pad=False).encode('The quick brown fox'))
['35457976754', '3557976754']
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha(word)
Return the alphabetic Daitch-Mokotoff Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Daitch-Mokotoff Soundex value
- Return type
set
Examples
>>> pe = DaitchMokotoff()
>>> sorted(pe.encode_alpha('Christopher'))
['KRSTPR', 'SRSTPR']
>>> pe.encode_alpha('Niall')
{'NL'}
>>> pe.encode_alpha('Smith')
{'SNT'}
>>> pe.encode_alpha('Schmidt')
{'SNT'}
>>> sorted(DaitchMokotoff(max_length=20,
... zero_pad=False).encode_alpha('The quick brown fox'))
['TKKPRPNPKS', 'TKSKPRPNPKS']
New in version 0.4.0.
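Because the numeric result is a set of codes, the alphabetic form is also a set. The sketch below shows how each code might be mapped with the _alphabetic table (for the first digit) and the _alphabetic_non_initials table (for the rest). Stripping padding zeros first is an assumption, and the function name is hypothetical.

```python
# Tables copied from the _alphabetic and _alphabetic_non_initials attributes
# listed above (the keys there are code points: 48 is '0', 57 is '9').
INITIAL = {'0': 'A', '1': 'Y', '2': 'st', '3': 'T', '4': 'S',
           '5': 'K', '6': 'N', '7': 'P', '8': 'L', '9': 'R'}
NON_INITIAL = {'0': ' ', '1': 'A', '2': ' ', '3': 'T', '4': 'S',
               '5': 'K', '6': 'N', '7': 'P', '8': 'L', '9': 'R'}


def dm_codes_to_alpha(codes):
    """Map a set of numeric Daitch-Mokotoff codes to alphabetic codes."""
    result = set()
    for code in codes:
        # Assumption: trailing zeros are padding and are removed first.
        code = code.rstrip('0')
        head = INITIAL.get(code[:1], '')
        tail = ''.join(NON_INITIAL[d] for d in code[1:])
        result.add(head + tail)
    return result


print(sorted(dm_codes_to_alpha({'494379', '594379'})))  # ['KRSTPR', 'SRSTPR']
```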
-
abydos.phonetic.dm_soundex(word, max_length=6, zero_pad=True)
Return the Daitch-Mokotoff Soundex code for a word.
This is a wrapper for DaitchMokotoff.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 6; must be between 6 and 64)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The Daitch-Mokotoff Soundex value
- Return type
set
Examples
>>> sorted(dm_soundex('Christopher'))
['494379', '594379']
>>> dm_soundex('Niall')
{'680000'}
>>> dm_soundex('Smith')
{'463000'}
>>> dm_soundex('Schmidt')
{'463000'}
>>> sorted(dm_soundex('The quick brown fox', max_length=20,
... zero_pad=False))
['35457976754', '3557976754']
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the DaitchMokotoff.encode method instead.
-
class abydos.phonetic.FuzzySoundex(max_length=5, zero_pad=True)
Bases: abydos.phonetic._phonetic._Phonetic
Fuzzy Soundex.
Fuzzy Soundex is an algorithm derived from Soundex, defined in [HM02].
New in version 0.3.6.
Initialize FuzzySoundex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
_alphabetic
= {48: 'A', 49: 'P', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 55: 'K', 57: 'S'}¶
-
_trans
= {65: '0', 66: '1', 67: '9', 68: '3', 69: '0', 70: '1', 71: '7', 72: '-', 73: '0', 74: '7', 75: '7', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '7', 82: '6', 83: '9', 84: '3', 85: '0', 86: '1', 87: '-', 88: '7', 89: '-', 90: '9'}¶
-
encode(word)
Return the Fuzzy Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Fuzzy Soundex value
- Return type
str
Examples
>>> pe = FuzzySoundex()
>>> pe.encode('Christopher')
'K6931'
>>> pe.encode('Niall')
'N4000'
>>> pe.encode('Smith')
'S5300'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha(word)
Return the alphabetic Fuzzy Soundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Fuzzy Soundex value
- Return type
str
Examples
>>> pe = FuzzySoundex()
>>> pe.encode_alpha('Christopher')
'KRSTP'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
-
abydos.phonetic.fuzzy_soundex(word, max_length=5, zero_pad=True)
Return the Fuzzy Soundex code for a word.
This is a wrapper for FuzzySoundex.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 5)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The Fuzzy Soundex value
- Return type
str
Examples
>>> fuzzy_soundex('Christopher')
'K6931'
>>> fuzzy_soundex('Niall')
'N4000'
>>> fuzzy_soundex('Smith')
'S5300'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the FuzzySoundex.encode method instead.
-
class abydos.phonetic.LEIN(max_length=4, zero_pad=True)
Bases: abydos.phonetic._phonetic._Phonetic
LEIN code.
This is Michigan LEIN (Law Enforcement Information Network) name coding, described in [MKTM77].
New in version 0.3.6.
Initialize LEIN instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
_alphabetic
= {49: 'T', 50: 'N', 51: 'L', 52: 'P', 53: 'K'}¶
-
_del_trans
= {32: None, 65: None, 69: None, 72: None, 73: None, 79: None, 85: None, 87: None, 89: None}¶
-
_trans
= {66: '4', 67: '5', 68: '1', 70: '4', 71: '5', 74: '5', 75: '5', 76: '3', 77: '2', 78: '2', 80: '4', 81: '5', 82: '3', 83: '5', 84: '1', 86: '4', 88: '5', 90: '5'}¶
-
encode(word)
Return the LEIN code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The LEIN code
- Return type
str
Examples
>>> pe = LEIN()
>>> pe.encode('Christopher')
'C351'
>>> pe.encode('Niall')
'N300'
>>> pe.encode('Smith')
'S210'
>>> pe.encode('Schmidt')
'S521'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
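The _del_trans and _trans tables above suggest a four-step procedure: keep the first letter, delete vowels and H, W, Y from the rest, collapse repeated letters, then translate to digits. The sketch below follows that reading; the step order is an assumption inferred from the documented outputs, and the function name is hypothetical.

```python
from itertools import groupby

# Characters deleted after the first letter (from the _del_trans attribute).
DROPPED = set('AEHIOUWY ')
# Consonant codes copied from the LEIN _trans attribute listed above.
LEIN_CODES = {
    'B': '4', 'C': '5', 'D': '1', 'F': '4', 'G': '5', 'J': '5',
    'K': '5', 'L': '3', 'M': '2', 'N': '2', 'P': '4', 'Q': '5',
    'R': '3', 'S': '5', 'T': '1', 'V': '4', 'X': '5', 'Z': '5',
}


def lein_sketch(word, max_length=4, zero_pad=True):
    """Approximate the Michigan LEIN name code of a word."""
    letters = [c for c in word.upper() if 'A' <= c <= 'Z']
    if not letters:
        return '0' * max_length if zero_pad else ''
    first, rest = letters[0], letters[1:]
    # Delete vowels and H, W, Y, then collapse repeated letters.
    rest = [c for c in rest if c not in DROPPED]
    rest = [c for c, _run in groupby(rest)]
    code = first + ''.join(LEIN_CODES[c] for c in rest)
    if zero_pad:
        code += '0' * max_length
    return code[:max_length]


print(lein_sketch('Schmidt'))  # S521
```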
-
encode_alpha(word)
Return the alphabetic LEIN code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic LEIN code
- Return type
str
Examples
>>> pe = LEIN()
>>> pe.encode_alpha('Christopher')
'CLKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SKNT'
New in version 0.4.0.
-
abydos.phonetic.lein(word, max_length=4, zero_pad=True)
Return the LEIN code for a word.
This is a wrapper for LEIN.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The LEIN code
- Return type
str
Examples
>>> lein('Christopher')
'C351'
>>> lein('Niall')
'N300'
>>> lein('Smith')
'S210'
>>> lein('Schmidt')
'S521'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the LEIN.encode method instead.
-
class abydos.phonetic.Phonex(max_length=4, zero_pad=True)
Bases: abydos.phonetic._phonetic._Phonetic
Phonex code.
Phonex is an algorithm derived from Soundex, defined in [LR96].
New in version 0.3.6.
Initialize Phonex instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
_alphabetic
= {49: 'P', 50: 'S', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}¶
-
encode(word)
Return the Phonex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonex value
- Return type
str
Examples
>>> pe = Phonex()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Schmidt')
'S253'
>>> pe.encode('Smith')
'S530'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha(word)
Return the alphabetic Phonex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Phonex value
- Return type
str
Examples
>>> pe = Phonex()
>>> pe.encode_alpha('Christopher')
'CRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SSNT'
New in version 0.4.0.
-
abydos.phonetic.phonex(word, max_length=4, zero_pad=True)
Return the Phonex code for a word.
This is a wrapper for Phonex.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The Phonex value
- Return type
str
Examples
>>> phonex('Christopher')
'C623'
>>> phonex('Niall')
'N400'
>>> phonex('Schmidt')
'S253'
>>> phonex('Smith')
'S530'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonex.encode method instead.
-
class abydos.phonetic.Phonix(max_length=4, zero_pad=True)
Bases: abydos.phonetic._phonetic._Phonetic
Phonix code.
Phonix is a Soundex-like algorithm defined in [Gad90].
This implementation is based on:
- [Pfe00]
- [Chr11]
- [Kollar]
New in version 0.3.6.
Initialize Phonix instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.3.6.
-
_alphabetic
= {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 55: 'F', 56: 'S'}¶
-
_substitutions
= None¶
-
_trans
= {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '7', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '8', 84: '3', 85: '0', 86: '7', 87: '0', 88: '8', 89: '0', 90: '8'}¶
-
_uc_c_set
= None¶
-
encode(word)
Return the Phonix code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonix value
- Return type
str
Examples
>>> pe = Phonix()
>>> pe.encode('Christopher')
'K683'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha(word)
Return the alphabetic Phonix code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Phonix value
- Return type
str
Examples
>>> pe = Phonix()
>>> pe.encode_alpha('Christopher')
'KRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
-
abydos.phonetic.phonix(word, max_length=4, zero_pad=True)
Return the Phonix code for a word.
This is a wrapper for Phonix.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The Phonix value
- Return type
str
Examples
>>> phonix('Christopher')
'K683'
>>> phonix('Niall')
'N400'
>>> phonix('Smith')
'S530'
>>> phonix('Schmidt')
'S530'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonix.encode method instead.
-
class abydos.phonetic.PSHPSoundexFirst(max_length=4, german=False)
Bases: abydos.phonetic._phonetic._Phonetic
PSHP Soundex/Viewex Coding of a first name.
This coding is based on [HBD76].
Reference was also made to the German version of the same: [HBD79].
A separate class, PSHPSoundexLast, is used for last names.
New in version 0.3.6.
Initialize PSHPSoundexFirst instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
New in version 0.4.0.
-
_alphabetic
= {49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N'}¶
-
_trans
= {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '5', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶
-
encode(fname)
Calculate the PSHP Soundex/Viewex Coding of a first name.
- Parameters
fname (str) -- The first name to encode
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexFirst()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W352'
>>> pe.encode('James')
'J700'
>>> pe.encode('Schmidt')
'S500'
>>> pe.encode('Ashcroft')
'A220'
>>> pe.encode('John')
'J500'
>>> pe.encode('Colin')
'K400'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Sally')
'S400'
>>> pe.encode('Jane')
'J500'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha(fname)
Calculate the alphabetic PSHP Soundex/Viewex Coding of a first name.
- Parameters
fname (str) -- The first name to encode
- Returns
The alphabetic PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexFirst()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTNK'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SN'
>>> pe.encode_alpha('Ashcroft')
'AKK'
>>> pe.encode_alpha('John')
'JN'
>>> pe.encode_alpha('Colin')
'KL'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Sally')
'SL'
>>> pe.encode_alpha('Jane')
'JN'
New in version 0.4.0.
-
abydos.phonetic.pshp_soundex_first(fname, max_length=4, german=False)
Calculate the PSHP Soundex/Viewex Coding of a first name.
This is a wrapper for PSHPSoundexFirst.encode().
- Parameters
fname (str) -- The first name to encode
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pshp_soundex_first('Smith')
'S530'
>>> pshp_soundex_first('Waters')
'W352'
>>> pshp_soundex_first('James')
'J700'
>>> pshp_soundex_first('Schmidt')
'S500'
>>> pshp_soundex_first('Ashcroft')
'A220'
>>> pshp_soundex_first('John')
'J500'
>>> pshp_soundex_first('Colin')
'K400'
>>> pshp_soundex_first('Niall')
'N400'
>>> pshp_soundex_first('Sally')
'S400'
>>> pshp_soundex_first('Jane')
'J500'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PSHPSoundexFirst.encode method instead.
-
class
abydos.phonetic.
PSHPSoundexLast
(max_length=4, german=False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
PSHP Soundex/Viewex Coding of a last name.
This coding is based on [HBD76].
Reference was also made to the German version of the same: [HBD79].
A separate class, PSHPSoundexFirst, is used for first names.
New in version 0.3.6.
Initialize PSHPSoundexLast instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
New in version 0.4.0.
-
_alphabetic
= {49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N'}¶
-
_trans
= {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '5', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶
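The integer keys in `_trans` are Unicode code points (65 is 'A', 90 is 'Z'), which is the form `str.translate` expects. A minimal sketch of how such a table can be built and applied follows. The names `LETTERS`, `DIGITS` and `raw_digits` are illustrative and not part of abydos, and the real `encode` method applies further rules (prefix handling, repeat removal, padding) that are not shown here.

```python
import string

# Digits assigned to A..Z, read off the _trans table above.
LETTERS = string.ascii_uppercase
DIGITS = '01230120022455012523010202'

# str.translate expects a mapping keyed by code point.
trans = dict(zip(map(ord, LETTERS), DIGITS))


def raw_digits(name):
    """Map each letter of name to its digit, with no other processing."""
    return name.upper().translate(trans)


print(raw_digits('Smith'))  # S->2, M->5, I->0, T->3, H->0
```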
-
encode
(lname)[source]¶ Calculate the PSHP Soundex/Viewex Coding of a last name.
- Parameters
lname (str) -- The last name to encode
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexLast()
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Waters')
'W350'
>>> pe.encode('James')
'J500'
>>> pe.encode('Schmidt')
'S530'
>>> pe.encode('Ashcroft')
'A225'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(lname)[source]¶ Calculate the alphabetic PSHP Soundex/Viewex Coding of a last name.
- Parameters
lname (str) -- The last name to encode
- Returns
The PSHP alphabetic Soundex/Viewex Coding
- Return type
str
Examples
>>> pe = PSHPSoundexLast()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Waters')
'WTN'
>>> pe.encode_alpha('James')
'JN'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKKN'
New in version 0.4.0.
-
abydos.phonetic.
pshp_soundex_last
(lname, max_length=4, german=False)[source]¶ Calculate the PSHP Soundex/Viewex Coding of a last name.
This is a wrapper for PSHPSoundexLast.encode().
- Parameters
lname (str) -- The last name to encode
max_length (int) -- The length of the code returned (defaults to 4)
german (bool) -- Set to True if the name is German (different rules apply)
- Returns
The PSHP Soundex/Viewex Coding
- Return type
str
Examples
>>> pshp_soundex_last('Smith')
'S530'
>>> pshp_soundex_last('Waters')
'W350'
>>> pshp_soundex_last('James')
'J500'
>>> pshp_soundex_last('Schmidt')
'S530'
>>> pshp_soundex_last('Ashcroft')
'A225'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PSHPSoundexLast.encode method instead.
-
class
abydos.phonetic.
NYSIIS
(max_length=6, modified=False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
NYSIIS Code.
The New York State Identification and Intelligence System algorithm is defined in [Taf70].
The modified version of this algorithm is described in Appendix B of [LA77].
New in version 0.3.6.
Initialize NYSIIS instance.
- Parameters
max_length (int) -- The maximum length (default 6) of the code to return
modified (bool) -- Indicates whether to use USDA modified NYSIIS
New in version 0.4.0.
-
encode
(word)[source]¶ Return the NYSIIS code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The NYSIIS value
- Return type
str
Examples
>>> pe = NYSIIS()
>>> pe.encode('Christopher')
'CRASTA'
>>> pe.encode('Niall')
'NAL'
>>> pe.encode('Smith')
'SNAT'
>>> pe.encode('Schmidt')
'SNAD'

>>> NYSIIS(max_length=-1).encode('Christopher')
'CRASTAFAR'

>>> pe_8m = NYSIIS(max_length=8, modified=True)
>>> pe_8m.encode('Christopher')
'CRASTAFA'
>>> pe_8m.encode('Niall')
'NAL'
>>> pe_8m.encode('Smith')
'SNAT'
>>> pe_8m.encode('Schmidt')
'SNAD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.phonetic.
nysiis
(word, max_length=6, modified=False)[source]¶ Return the NYSIIS code for a word.
This is a wrapper for NYSIIS.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The maximum length (default 6) of the code to return
modified (bool) -- Indicates whether to use USDA modified NYSIIS
- Returns
The NYSIIS value
- Return type
str
Examples
>>> nysiis('Christopher')
'CRASTA'
>>> nysiis('Niall')
'NAL'
>>> nysiis('Smith')
'SNAT'
>>> nysiis('Schmidt')
'SNAD'

>>> nysiis('Christopher', max_length=-1)
'CRASTAFAR'

>>> nysiis('Christopher', max_length=8, modified=True)
'CRASTAFA'
>>> nysiis('Niall', max_length=8, modified=True)
'NAL'
>>> nysiis('Smith', max_length=8, modified=True)
'SNAT'
>>> nysiis('Schmidt', max_length=8, modified=True)
'SNAD'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the NYSIIS.encode method instead.
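Each deprecated function forwards its configuration arguments to the class constructor and its input to `encode`. A rough sketch of that pattern is given here, using a hypothetical stand-in class (`ToyEncoder`) rather than the real NYSIIS rules, and assuming that a `max_length` of -1 means "no truncation", as the `max_length=-1` example suggests. How abydos itself issues the deprecation notice is not shown in this document, so the `warnings.warn` call is an assumption.

```python
import warnings


class ToyEncoder:
    """Hypothetical stand-in for a phonetic class; not the NYSIIS algorithm."""

    def __init__(self, max_length=6):
        self._max_length = max_length

    def encode(self, word):
        code = word.upper()  # placeholder for the real encoding rules
        # Assumption: -1 (or any non-positive value) disables truncation.
        if self._max_length > 0:
            code = code[:self._max_length]
        return code


def toy_encode(word, max_length=6):
    """Deprecated functional wrapper around ToyEncoder.encode()."""
    warnings.warn(
        'toy_encode is deprecated; use ToyEncoder.encode instead',
        DeprecationWarning,
        stacklevel=2,
    )
    return ToyEncoder(max_length).encode(word)


# Preferred style: configure once, then encode many words.
pe = ToyEncoder(max_length=-1)
print(pe.encode('Christopher'))
```

The class-based form keeps the options in one place, which is convenient when the same settings are applied to many names.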
-
class
abydos.phonetic.
MRA
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Western Airlines Surname Match Rating Algorithm.
A description of the Western Airlines Surname Match Rating Algorithm can be found on page 18 of [MKTM77].
New in version 0.3.6.
-
encode
(word)[source]¶ Return the MRA personal numeric identifier (PNI) for a word.
- Parameters
word (str) -- The word to transform
- Returns
The MRA PNI
- Return type
str
Examples
>>> pe = MRA()
>>> pe.encode('Christopher')
'CHRPHR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHMDT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
-
abydos.phonetic.
mra
(word)[source]¶ Return the MRA personal numeric identifier (PNI) for a word.
This is a wrapper for MRA.encode().
- Parameters
word (str) -- The word to transform
- Returns
The MRA PNI
- Return type
str
Examples
>>> mra('Christopher')
'CHRPHR'
>>> mra('Niall')
'NL'
>>> mra('Smith')
'SMTH'
>>> mra('Schmidt')
'SCHMDT'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the MRA.encode method instead.
-
class
abydos.phonetic.
Caverphone
(version=2)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Caverphone.
A description of version 1 of the algorithm can be found in [Hoo02].
A description of version 2 of the algorithm can be found in [Hoo04].
New in version 0.3.6.
Initialize Caverphone instance.
- Parameters
version (int) -- The version of Caverphone to employ for encoding (defaults to 2)
New in version 0.4.0.
-
encode
(word)[source]¶ Return the Caverphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Caverphone value
- Return type
str
Examples
>>> pe = Caverphone()
>>> pe.encode('Christopher')
'KRSTFA1111'
>>> pe.encode('Niall')
'NA11111111'
>>> pe.encode('Smith')
'SMT1111111'
>>> pe.encode('Schmidt')
'SKMT111111'

>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode('Christopher')
'KRSTF1'
>>> pe_1.encode('Niall')
'N11111'
>>> pe_1.encode('Smith')
'SMT111'
>>> pe_1.encode('Schmidt')
'SKMT11'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic Caverphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Caverphone value
- Return type
str
Examples
>>> pe = Caverphone()
>>> pe.encode_alpha('Christopher')
'KRSTFA'
>>> pe.encode_alpha('Niall')
'NA'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'SKMT'

>>> pe_1 = Caverphone(version=1)
>>> pe_1.encode_alpha('Christopher')
'KRSTF'
>>> pe_1.encode_alpha('Niall')
'N'
>>> pe_1.encode_alpha('Smith')
'SMT'
>>> pe_1.encode_alpha('Schmidt')
'SKMT'
New in version 0.4.0.
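In the examples, version 2 codes are ten characters long and version 1 codes are six, with '1' used as filler, and the alphabetic form is the same code without the filler. A minimal sketch of that relationship follows. The helper names and the `core` argument (the code before padding) are assumptions for illustration, and the rules that produce `core` are not shown.

```python
# Code lengths observed in the examples: 6 for version 1, 10 for version 2.
CODE_LENGTH = {1: 6, 2: 10}


def pad_code(core, version=2):
    """Pad core with '1' filler and cut it to the fixed code length."""
    length = CODE_LENGTH[version]
    return (core + '1' * length)[:length]


def strip_filler(code):
    """Recover the alphabetic form by dropping the trailing '1' filler."""
    return code.rstrip('1')


print(pad_code('KRSTFA'))          # KRSTFA1111
print(pad_code('N', version=1))    # N11111
print(strip_filler('SKMT111111'))  # SKMT
```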
-
abydos.phonetic.
caverphone
(word, version=2)[source]¶ Return the Caverphone code for a word.
This is a wrapper for Caverphone.encode().
- Parameters
word (str) -- The word to transform
version (int) -- The version of Caverphone to employ for encoding (defaults to 2)
- Returns
The Caverphone value
- Return type
str
Examples
>>> caverphone('Christopher')
'KRSTFA1111'
>>> caverphone('Niall')
'NA11111111'
>>> caverphone('Smith')
'SMT1111111'
>>> caverphone('Schmidt')
'SKMT111111'

>>> caverphone('Christopher', 1)
'KRSTF1'
>>> caverphone('Niall', 1)
'N11111'
>>> caverphone('Smith', 1)
'SMT111'
>>> caverphone('Schmidt', 1)
'SKMT11'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Caverphone.encode method instead.
-
class
abydos.phonetic.
AlphaSIS
(max_length=14)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Alpha-SIS.
The Alpha Search Inquiry System code is defined in [IBMCorporation73]. This implementation is based on the description in [MKTM77].
New in version 0.3.6.
Initialize AlphaSIS instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 14)
New in version 0.4.0.
-
_alpha_sis_basic
= {'B': '9', 'C': ('7', '6'), 'CE': '0', 'CH': ('6', '70', '0'), 'CI': '0', 'CK': ('7', '6'), 'CY': '0', 'CZ': ('70', '6', '0'), 'D': '1', 'DG': '7', 'DS': ('0', '10'), 'DZ': ('0', '10'), 'F': '8', 'G': '7', 'J': '6', 'K': ('7', '6'), 'L': '5', 'M': '3', 'N': '2', 'P': '9', 'PH': '8', 'Q': '7', 'R': '4', 'S': '0', 'SCH': '6', 'SH': '6', 'T': '1', 'TS': ('0', '10'), 'TZ': ('0', '10'), 'V': '8', 'X': '7', 'Z': '0'}¶
-
_alpha_sis_basic_order
= ('SCH', 'CZ', 'CH', 'CK', 'DS', 'DZ', 'TS', 'TZ', 'CI', 'CY', 'CE', 'SH', 'DG', 'PH', 'C', 'K', 'Z', 'S', 'D', 'T', 'N', 'M', 'R', 'L', 'J', 'C', 'G', 'K', 'Q', 'X', 'F', 'V', 'B', 'P')¶
-
_alpha_sis_initials
= {'A': '1', 'E': '1', 'GF': '08', 'GM': '03', 'GN': '02', 'H': '2', 'I': '1', 'J': '3', 'KN': '02', 'O': '1', 'PF': '08', 'PN': '02', 'PS': '00', 'U': '1', 'W': '4', 'WR': '04', 'Y': '5'}¶
-
_alpha_sis_initials_order
= ('GF', 'GM', 'GN', 'KN', 'PF', 'PN', 'PS', 'WR', 'A', 'E', 'H', 'I', 'J', 'O', 'U', 'W', 'Y')¶
-
_alphabetic_initials
= {48: ' ', 49: 'A', 50: 'H', 51: 'J', 52: 'W', 53: 'Y'}¶
-
_alphabetic_non_initials
= {48: 'S', 49: 'T', 50: 'N', 51: 'M', 52: 'R', 53: 'L', 54: 'J', 55: 'K', 56: 'F', 57: 'P'}¶
-
encode
(word)[source]¶ Return the IBM Alpha Search Inquiry System code for a word.
A collection is necessary as the return type since there can be multiple values for a single word. But the collection must be ordered since the first value is the primary coding.
- Parameters
word (str) -- The word to transform
- Returns
The Alpha-SIS value
- Return type
tuple
Examples
>>> pe = AlphaSIS()
>>> pe.encode('Christopher')
('06401840000000', '07040184000000', '04018400000000')
>>> pe.encode('Niall')
('02500000000000',)
>>> pe.encode('Smith')
('03100000000000',)
>>> pe.encode('Schmidt')
('06310000000000',)
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic Alpha-SIS code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Alpha-SIS value
- Return type
tuple
Examples
>>> pe = AlphaSIS()
>>> pe.encode_alpha('Christopher')
('JRSTFR', 'KSRSTFR', 'RSTFR')
>>> pe.encode_alpha('Niall')
('NL',)
>>> pe.encode_alpha('Smith')
('MT',)
>>> pe.encode_alpha('Schmidt')
('JMT',)
New in version 0.4.0.
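The two `_alphabetic_*` tables suggest how a numeric code maps to its alphabetic form: the first digit is read with the initials table and the remaining digits with the non-initials table. The sketch below reproduces the documented examples under that assumption, after first dropping the trailing zero padding. `to_alpha` is an illustrative helper, not the library's method.

```python
# Tables copied from _alphabetic_initials and _alphabetic_non_initials.
ALPHA_INITIALS = {48: ' ', 49: 'A', 50: 'H', 51: 'J', 52: 'W', 53: 'Y'}
ALPHA_NON_INITIALS = {
    48: 'S', 49: 'T', 50: 'N', 51: 'M', 52: 'R',
    53: 'L', 54: 'J', 55: 'K', 56: 'F', 57: 'P',
}


def to_alpha(code):
    """Convert one numeric Alpha-SIS code to letters (assumed procedure)."""
    code = code.rstrip('0')  # drop the zero padding
    first = code[:1].translate(ALPHA_INITIALS)
    rest = code[1:].translate(ALPHA_NON_INITIALS)
    return (first + rest).strip()  # a leading '0' becomes a blank


codes = ('06401840000000', '07040184000000', '04018400000000')
print(tuple(to_alpha(c) for c in codes))
# ('JRSTFR', 'KSRSTFR', 'RSTFR')
```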
-
abydos.phonetic.
alpha_sis
(word, max_length=14)[source]¶ Return the IBM Alpha Search Inquiry System code for a word.
This is a wrapper for AlphaSIS.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 14)
- Returns
The Alpha-SIS value
- Return type
tuple
Examples
>>> alpha_sis('Christopher')
('06401840000000', '07040184000000', '04018400000000')
>>> alpha_sis('Niall')
('02500000000000',)
>>> alpha_sis('Smith')
('03100000000000',)
>>> alpha_sis('Schmidt')
('06310000000000',)
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the AlphaSIS.encode method instead.
-
class
abydos.phonetic.
Davidson
(omit_fname=False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Davidson Consonant Code.
This is based on the name compression system described in [Dav62].
[Dol70] identifies this as having been the name compression algorithm used by SABRE.
New in version 0.3.6.
Initialize Davidson instance.
- Parameters
omit_fname (bool) -- Set to True to completely omit the first character of the first name
New in version 0.4.0.
-
_trans
= {65: '', 69: '', 72: '', 73: '', 79: '', 85: '', 87: '', 89: ''}¶
-
encode
(lname, fname='.')[source]¶ Return Davidson's Consonant Code.
- Parameters
lname (str) -- Last name (or word) to be encoded
fname (str) -- First name (optional), of which the first character is included in the code.
- Returns
Davidson's Consonant Code
- Return type
str
Example
>>> pe = Davidson()
>>> pe.encode('Gough')
'G   .'
>>> pe.encode('pneuma')
'PNM .'
>>> pe.encode('knight')
'KNGT.'
>>> pe.encode('trice')
'TRC .'
>>> pe.encode('judge')
'JDG .'
>>> pe.encode('Smith', 'James')
'SMT J'
>>> pe.encode('Wasserman', 'Tabitha')
'WSRMT'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
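The `_trans` table maps A, E, H, I, O, U, W and Y to the empty string, which deletes them. The following minimal sketch is consistent with the documented examples, assuming that the first letter is always kept, that consecutive repeats are collapsed, and that the surname part is padded or cut to four characters before the first-name initial is appended. `davidson_sketch` is an illustrative name, and the library's exact steps may differ.

```python
from itertools import groupby

# Same content as _trans: code points of A, E, H, I, O, U, W, Y map to ''.
DELETE = {ord(c): '' for c in 'AEHIOUWY'}


def davidson_sketch(lname, fname='.', omit_fname=False):
    """Approximate Davidson's Consonant Code (assumed procedure)."""
    lname = lname.upper()
    # Keep the first letter; delete vowels, H, W and Y from the rest.
    code = lname[:1] + lname[1:].translate(DELETE)
    # Collapse runs of the same letter (e.g. 'SS' -> 'S').
    code = ''.join(letter for letter, _ in groupby(code))
    # Fixed width of four characters, space-padded.
    code = code[:4].ljust(4)
    if not omit_fname:
        code += fname[:1].upper()
    return code


print(davidson_sketch('Wasserman', 'Tabitha'))  # WSRMT
print(davidson_sketch('knight'))                # KNGT.
```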
-
abydos.phonetic.
davidson
(lname, fname='.', omit_fname=False)[source]¶ Return Davidson's Consonant Code.
This is a wrapper for Davidson.encode().
- Parameters
lname (str) -- Last name (or word) to be encoded
fname (str) -- First name (optional), of which the first character is included in the code.
omit_fname (bool) -- Set to True to completely omit the first character of the first name
- Returns
Davidson's Consonant Code
- Return type
str
Example
>>> davidson('Gough')
'G   .'
>>> davidson('pneuma')
'PNM .'
>>> davidson('knight')
'KNGT.'
>>> davidson('trice')
'TRC .'
>>> davidson('judge')
'JDG .'
>>> davidson('Smith', 'James')
'SMT J'
>>> davidson('Wasserman', 'Tabitha')
'WSRMT'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Davidson.encode method instead.
-
class
abydos.phonetic.
Dolby
(max_length=-1, keep_vowels=False, vowel_char='*')[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Dolby Code.
This follows "A Spelling Equivalent Abbreviation Algorithm For Personal Names" from [Dol70] and [C+69].
New in version 0.3.6.
Initialize Dolby instance.
- Parameters
max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
keep_vowels (bool) -- If True, retains all vowel markers
vowel_char (str) -- The vowel marker character (defaults to *)
New in version 0.4.0.
-
encode
(word)[source]¶ Return the Dolby Code of a name.
- Parameters
word (str) -- The word to transform
- Returns
The Dolby Code
- Return type
str
Examples
>>> pe = Dolby()
>>> pe.encode('Hansen')
'H*NSN'
>>> pe.encode('Larsen')
'L*RSN'
>>> pe.encode('Aagaard')
'*GR'
>>> pe.encode('Braaten')
'BR*DN'
>>> pe.encode('Sandvik')
'S*NVK'

>>> pe_6 = Dolby(max_length=6)
>>> pe_6.encode('Hansen')
'H*NS*N'
>>> pe_6.encode('Larsen')
'L*RS*N'
>>> pe_6.encode('Aagaard')
'*G*R  '
>>> pe_6.encode('Braaten')
'BR*D*N'
>>> pe_6.encode('Sandvik')
'S*NF*K'

>>> pe.encode('Smith')
'SM*D'
>>> pe.encode('Waters')
'W*DRS'
>>> pe.encode('James')
'J*MS'
>>> pe.encode('Schmidt')
'SM*D'
>>> pe.encode('Ashcroft')
'*SKRFD'

>>> pe_6.encode('Smith')
'SM*D  '
>>> pe_6.encode('Waters')
'W*D*RS'
>>> pe_6.encode('James')
'J*M*S '
>>> pe_6.encode('Schmidt')
'SM*D  '
>>> pe_6.encode('Ashcroft')
'*SKRFD'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic Dolby Code of a name.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Dolby Code
- Return type
str
Examples
>>> pe = Dolby()
>>> pe.encode_alpha('Hansen')
'HANSN'
>>> pe.encode_alpha('Larsen')
'LARSN'
>>> pe.encode_alpha('Aagaard')
'AGR'
>>> pe.encode_alpha('Braaten')
'BRADN'
>>> pe.encode_alpha('Sandvik')
'SANVK'
New in version 0.4.0.
-
abydos.phonetic.
dolby
(word, max_length=-1, keep_vowels=False, vowel_char='*')[source]¶ Return the Dolby Code of a name.
This is a wrapper for Dolby.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- Maximum length of the returned Dolby code -- this also activates the fixed-length code mode if it is greater than 0
keep_vowels (bool) -- If True, retains all vowel markers
vowel_char (str) -- The vowel marker character (defaults to *)
- Returns
The Dolby Code
- Return type
str
Examples
>>> dolby('Hansen')
'H*NSN'
>>> dolby('Larsen')
'L*RSN'
>>> dolby('Aagaard')
'*GR'
>>> dolby('Braaten')
'BR*DN'
>>> dolby('Sandvik')
'S*NVK'
>>> dolby('Hansen', max_length=6)
'H*NS*N'
>>> dolby('Larsen', max_length=6)
'L*RS*N'
>>> dolby('Aagaard', max_length=6)
'*G*R  '
>>> dolby('Braaten', max_length=6)
'BR*D*N'
>>> dolby('Sandvik', max_length=6)
'S*NF*K'

>>> dolby('Smith')
'SM*D'
>>> dolby('Waters')
'W*DRS'
>>> dolby('James')
'J*MS'
>>> dolby('Schmidt')
'SM*D'
>>> dolby('Ashcroft')
'*SKRFD'
>>> dolby('Smith', max_length=6)
'SM*D  '
>>> dolby('Waters', max_length=6)
'W*D*RS'
>>> dolby('James', max_length=6)
'J*M*S '
>>> dolby('Schmidt', max_length=6)
'SM*D  '
>>> dolby('Ashcroft', max_length=6)
'*SKRFD'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Dolby.encode method instead.
-
class
abydos.phonetic.
SPFC
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Standardized Phonetic Frequency Code (SPFC).
Standardized Phonetic Frequency Code is roughly Soundex-like. This implementation is based on pages 19-21 of [MKTM77].
New in version 0.3.6.
-
_pf1
= {65: '3', 66: '3', 67: '1', 68: '5', 69: '6', 70: '2', 71: '7', 72: '5', 73: '5', 74: '7', 75: '1', 76: '4', 77: '6', 78: '6', 79: '4', 80: '2', 81: '1', 82: '4', 83: '0', 84: '7', 85: '2', 86: '1', 87: '2', 88: '6', 90: '0'}¶
-
_pf1_alphabetic
= {48: 'S', 49: 'C', 50: 'F', 51: 'A', 52: 'L', 53: 'D', 54: 'E', 55: 'G'}¶
-
_pf2
= {65: '3', 66: '3', 67: '1', 68: '5', 69: '9', 70: '2', 71: '7', 72: '5', 73: '5', 74: '7', 75: '1', 76: '9', 77: '6', 78: '6', 79: '4', 80: '2', 81: '1', 82: '4', 83: '0', 84: '7', 85: '8', 86: '8', 87: '8', 88: '2', 90: '0'}¶
-
_pf2_alphabetic
= {48: 'S', 49: 'C', 50: 'F', 51: 'A', 52: 'O', 53: 'D', 54: 'M', 55: 'G', 56: 'U', 57: 'E'}¶
-
_pf3
= {65: '7', 66: '0', 67: '0', 68: '1', 69: '7', 70: '2', 71: '3', 72: '7', 73: '7', 74: '3', 75: '0', 76: '2', 77: '4', 78: '4', 79: '7', 80: '2', 81: '0', 82: '5', 83: '6', 84: '1', 85: '7', 86: '0', 87: '7', 88: '3', 89: '7', 90: '6'}¶
-
_pf3_alphabetic
= {48: 'B', 49: 'D', 50: 'F', 51: 'G', 52: 'M', 53: 'R', 54: 'S', 55: 'Z'}¶
-
_substitutions
= (('DK', 'K'), ('DT', 'T'), ('SC', 'S'), ('KN', 'N'), ('MN', 'N'))¶
-
encode
(word)[source]¶ Return the Standardized Phonetic Frequency Code (SPFC) of a word.
- Parameters
word (str) -- The word to transform
- Returns
The SPFC value
- Return type
str
- Raises
AttributeError -- Word attribute must be a string with a space or period dividing the first and last names or a tuple/list consisting of the first and last names
Examples
>>> pe = SPFC()
>>> pe.encode('Christopher Smith')
'01160'
>>> pe.encode('Christopher Schmidt')
'01160'
>>> pe.encode('Niall Smith')
'01660'
>>> pe.encode('Niall Schmidt')
'01660'

>>> pe.encode('L.Smith')
'01960'
>>> pe.encode('R.Miller')
'65490'

>>> pe.encode(('L', 'Smith'))
'01960'
>>> pe.encode(('R', 'Miller'))
'65490'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
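`encode` accepts the name either as one string, with a space or period between first and last name, or as a tuple or list of the two names. One way such input could be normalized is sketched below. `split_name` is a hypothetical helper, and the choice to split on the first period or else the last space is an assumption rather than documented behavior.

```python
def split_name(word):
    """Return (first_name, last_name) from the accepted input forms."""
    if isinstance(word, str):
        first, sep, last = word.partition('.')
        if not sep:
            # No period: fall back to the last space.
            first, sep, last = word.rpartition(' ')
        if not sep:
            raise AttributeError(
                'expected a space or period between first and last names'
            )
        return first, last
    if isinstance(word, (tuple, list)) and len(word) == 2:
        return word[0], word[1]
    raise AttributeError('expected a string or a (first, last) pair')


print(split_name('Christopher Smith'))  # ('Christopher', 'Smith')
print(split_name('L.Smith'))            # ('L', 'Smith')
print(split_name(('R', 'Miller')))      # ('R', 'Miller')
```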
-
encode_alpha
(word)[source]¶ Return the alphabetic SPFC of a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SPFC value
- Return type
str
Examples
>>> pe = SPFC()
>>> pe.encode_alpha('Christopher Smith')
'SDCMS'
>>> pe.encode_alpha('Christopher Schmidt')
'SDCMS'
>>> pe.encode_alpha('Niall Smith')
'SDMMS'
>>> pe.encode_alpha('Niall Schmidt')
'SDMMS'

>>> pe.encode_alpha('L.Smith')
'SDEMS'
>>> pe.encode_alpha('R.Miller')
'EROES'

>>> pe.encode_alpha(('L', 'Smith'))
'SDEMS'
>>> pe.encode_alpha(('R', 'Miller'))
'EROES'
New in version 0.4.0.
-
-
abydos.phonetic.
spfc
(word)[source]¶ Return the Standardized Phonetic Frequency Code (SPFC) of a word.
This is a wrapper for SPFC.encode().
- Parameters
word (str) -- The word to transform
- Returns
The SPFC value
- Return type
str
Examples
>>> spfc('Christopher Smith')
'01160'
>>> spfc('Christopher Schmidt')
'01160'
>>> spfc('Niall Smith')
'01660'
>>> spfc('Niall Schmidt')
'01660'

>>> spfc('L.Smith')
'01960'
>>> spfc('R.Miller')
'65490'

>>> spfc(('L', 'Smith'))
'01960'
>>> spfc(('R', 'Miller'))
'65490'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SPFC.encode method instead.
-
class
abydos.phonetic.
RogerRoot
(max_length=5, zero_pad=True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Roger Root code.
This is Roger Root name coding, described in [MKTM77].
New in version 0.3.6.
Initialize RogerRoot instance.
- Parameters
max_length (int) -- The maximum length (default 5) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
-
_alphabetic
= {48: 'S', 49: 'T', 50: 'N', 51: 'M', 52: 'R', 53: 'L', 54: 'J', 55: 'K', 56: 'F', 57: 'P'}¶
-
_alphabetic_initial
= {48: ' ', 49: 'A', 50: 'H', 51: 'J', 52: 'W', 53: 'Y'}¶
-
_init_patterns
= {1: {'A': '1', 'B': '09', 'C': '07', 'D': '01', 'E': '1', 'F': '08', 'G': '07', 'H': '2', 'I': '1', 'J': '3', 'K': '07', 'L': '05', 'M': '03', 'N': '02', 'O': '1', 'P': '09', 'Q': '07', 'R': '04', 'S': '0*0', 'T': '01', 'U': '1', 'V': '08', 'W': '4', 'X': '07', 'Y': '5', 'Z': '0*0'}, 2: {'CE': '0*0', 'CH': '06', 'CI': '0*0', 'CY': '0*0', 'DG': '07', 'GF': '08', 'GM': '03', 'GN': '02', 'KN': '02', 'PF': '08', 'PH': '08', 'PN': '02', 'SH': '06', 'TS': '0*0', 'WR': '04'}, 3: {'SCH': '06', 'TSH': '06'}, 4: {'TSCH': '06'}}¶
-
_med_patterns
= {1: {'A': '*', 'B': '9', 'C': '7', 'D': '1', 'E': '*', 'F': '8', 'G': '7', 'H': '*', 'I': '*', 'J': '6', 'K': '7', 'L': '5', 'M': '3', 'N': '2', 'O': '*', 'P': '9', 'Q': '7', 'R': '4', 'S': '0', 'T': '1', 'U': '*', 'V': '8', 'W': '*', 'X': '7', 'Y': '*', 'Z': '0'}, 2: {'CE': '0', 'CH': '6', 'CI': '0', 'CY': '0', 'DG': '7', 'PH': '8', 'SH': '6', 'TS': '0'}, 3: {'SCH': '6', 'TSH': '6'}, 4: {'TSCH': '6'}}¶
-
encode
(word)[source]¶ Return the Roger Root code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Roger Root code
- Return type
str
Examples
>>> pe = RogerRoot()
>>> pe.encode('Christopher')
'06401'
>>> pe.encode('Niall')
'02500'
>>> pe.encode('Smith')
'00310'
>>> pe.encode('Schmidt')
'06310'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
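The `max_length` and `zero_pad` parameters control only the final shaping of the code. A minimal sketch of that last step follows, assuming the digits have already been produced. `shape_code` and the raw digit strings passed to it are illustrative, not part of the library.

```python
def shape_code(digits, max_length=5, zero_pad=True):
    """Pad digits with '0' (optionally) and cut to at most max_length."""
    if zero_pad:
        digits += '0' * max_length
    return digits[:max_length]


print(shape_code('025'))                  # 02500
print(shape_code('025', zero_pad=False))  # 025
print(shape_code('0640184'))              # 06401
```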
-
encode_alpha
(word)[source]¶ Return the alphabetic Roger Root code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Roger Root code
- Return type
str
Examples
>>> pe = RogerRoot()
>>> pe.encode_alpha('Christopher')
'JRST'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SMT'
>>> pe.encode_alpha('Schmidt')
'JMT'
New in version 0.4.0.
-
abydos.phonetic.
roger_root
(word, max_length=5, zero_pad=True)[source]¶ Return the Roger Root code for a word.
This is a wrapper for RogerRoot.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The maximum length (default 5) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The Roger Root code
- Return type
str
Examples
>>> roger_root('Christopher')
'06401'
>>> roger_root('Niall')
'02500'
>>> roger_root('Smith')
'00310'
>>> roger_root('Schmidt')
'06310'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RogerRoot.encode method instead.
-
class
abydos.phonetic.
StatisticsCanada
(max_length=4)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Statistics Canada code.
The original description of this algorithm could not be located, and may only have been specified in an unpublished technical report. The coding does not appear to be in use by Statistics Canada any longer. In its place, this is an implementation of the "Census modified Statistics Canada name coding procedure".
The modified version of this algorithm is described in Appendix B of [MKTM77].
New in version 0.3.6.
Initialize StatisticsCanada instance.
- Parameters
max_length (int) -- The maximum length (default 4) of the code to return
New in version 0.4.0.
-
encode
(word)[source]¶ Return the Statistics Canada code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Statistics Canada name code value
- Return type
str
Examples
>>> pe = StatisticsCanada()
>>> pe.encode('Christopher')
'CHRS'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SMTH'
>>> pe.encode('Schmidt')
'SCHM'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.phonetic.
statistics_canada
(word, max_length=4)[source]¶ Return the Statistics Canada code for a word.
This is a wrapper for StatisticsCanada.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The maximum length (default 4) of the code to return
- Returns
The Statistics Canada name code value
- Return type
str
Examples
>>> statistics_canada('Christopher')
'CHRS'
>>> statistics_canada('Niall')
'NL'
>>> statistics_canada('Smith')
'SMTH'
>>> statistics_canada('Schmidt')
'SCHM'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the StatisticsCanada.encode method instead.
-
class
abydos.phonetic.
SoundD
(max_length=4)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
SoundD code.
SoundD is defined in [VB12].
New in version 0.3.6.
Initialize SoundD instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
New in version 0.4.0.
-
_alphabetic
= {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}¶
-
_trans
= {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶
-
encode
(word)[source]¶ Return the SoundD code.
- Parameters
word (str) -- The word to transform
- Returns
The SoundD code
- Return type
str
Examples
>>> pe = SoundD()
>>> pe.encode('Gough')
'2000'
>>> pe.encode('pneuma')
'5500'
>>> pe.encode('knight')
'5300'
>>> pe.encode('trice')
'3620'
>>> pe.encode('judge')
'2200'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic SoundD code.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SoundD code
- Return type
str
Examples
>>> pe = SoundD()
>>> pe.encode_alpha('Gough')
'K'
>>> pe.encode_alpha('pneuma')
'NN'
>>> pe.encode_alpha('knight')
'NT'
>>> pe.encode_alpha('trice')
'TRK'
>>> pe.encode_alpha('judge')
'KK'
New in version 0.4.0.
-
abydos.phonetic.
sound_d
(word, max_length=4)[source]¶ Return the SoundD code.
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
- Returns
The SoundD code
- Return type
str
Examples
>>> sound_d('Gough')
'2000'
>>> sound_d('pneuma')
'5500'
>>> sound_d('knight')
'5300'
>>> sound_d('trice')
'3620'
>>> sound_d('judge')
'2200'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SoundD.encode method instead.
-
class
abydos.phonetic.
ParmarKumbharana
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Parmar-Kumbharana code.
This is based on the phonetic algorithm proposed in [PK14].
New in version 0.3.6.
-
_del_trans
= {65: '', 69: '', 73: '', 79: '', 85: '', 89: ''}¶
-
_rules
= {2: {'CE': 'S', 'CI': 'S', 'CK': 'K', 'CY': 'S', 'GE': 'J', 'GI': 'J', 'GN': 'N', 'GY': 'J', 'KN': 'N', 'PN': 'N', 'SH': 'S', 'WR': 'R'}, 3: {'DGE': 'J', 'GHT': 'T', 'OUL': 'U'}, 4: {'OUGH': 'F'}}¶
-
encode
(word)[source]¶ Return the Parmar-Kumbharana encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Parmar-Kumbharana encoding
- Return type
str
Examples
>>> pe = ParmarKumbharana()
>>> pe.encode('Gough')
'GF'
>>> pe.encode('pneuma')
'NM'
>>> pe.encode('knight')
'NT'
>>> pe.encode('trice')
'TRS'
>>> pe.encode('judge')
'JJ'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
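The `_rules` table is keyed by pattern length, which points to a longest-match-first rewrite, and `_del_trans` then removes vowels and Y. The sketch below is consistent with the documented examples under the assumptions that repeated letters are collapsed first and that the first letter is never deleted. `parmar_kumbharana_sketch` is an illustrative name, and the published algorithm may include steps not shown.

```python
from itertools import groupby

# Tables copied from _rules and _del_trans.
RULES = {
    4: {'OUGH': 'F'},
    3: {'DGE': 'J', 'GHT': 'T', 'OUL': 'U'},
    2: {'CE': 'S', 'CI': 'S', 'CK': 'K', 'CY': 'S', 'GE': 'J', 'GI': 'J',
        'GN': 'N', 'GY': 'J', 'KN': 'N', 'PN': 'N', 'SH': 'S', 'WR': 'R'},
}
DELETE_VOWELS = {ord(c): '' for c in 'AEIOUY'}


def parmar_kumbharana_sketch(word):
    """Approximate the encoding: longest-match rewrite, then drop vowels."""
    # Collapse consecutive repeated letters.
    word = ''.join(ch for ch, _ in groupby(word.upper()))
    i = 0
    while i < len(word):
        for length in (4, 3, 2):  # try the longest pattern first
            chunk = word[i:i + length]
            if chunk in RULES[length]:
                repl = RULES[length][chunk]
                word = word[:i] + repl + word[i + length:]
                i += len(repl)
                break
        else:
            i += 1  # no rule matched at this position
    # Keep the first letter; delete vowels and Y from the rest.
    return word[:1] + word[1:].translate(DELETE_VOWELS)


print(parmar_kumbharana_sketch('knight'))  # NT
```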
-
-
abydos.phonetic.
parmar_kumbharana
(word)[source]¶ Return the Parmar-Kumbharana encoding of a word.
This is a wrapper for ParmarKumbharana.encode().
- Parameters
word (str) -- The word to transform
- Returns
The Parmar-Kumbharana encoding
- Return type
str
Examples
>>> parmar_kumbharana('Gough')
'GF'
>>> parmar_kumbharana('pneuma')
'NM'
>>> parmar_kumbharana('knight')
'NT'
>>> parmar_kumbharana('trice')
'TRS'
>>> parmar_kumbharana('judge')
'JJ'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the ParmarKumbharana.encode method instead.
-
class
abydos.phonetic.
Metaphone
(max_length=-1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Metaphone.
Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].
New in version 0.3.6.
Initialize Metaphone instance.
- Parameters
max_length (int) -- The maximum length of the returned Metaphone code (defaults to 64, but in Philips' original implementation this was 4)
New in version 0.4.0.
-
_frontv
= {'E', 'I', 'Y'}¶
-
_varson
= {'C', 'G', 'P', 'S', 'T'}¶
-
encode
(word)[source]¶ Return the Metaphone code for a word.
Based on Lawrence Philips' Pick BASIC code from 1990 [Phi90b], as described in [Phi90a]. This incorporates some corrections to the above code, particularly some of those suggested by Michael Kuhn in [Kuh95].
- Parameters
word (str) -- The word to transform
- Returns
The Metaphone value
- Return type
str
Examples
>>> pe = Metaphone()
>>> pe.encode('Christopher')
'KRSTFR'
>>> pe.encode('Niall')
'NL'
>>> pe.encode('Smith')
'SM0'
>>> pe.encode('Schmidt')
'SKMTT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.phonetic.
metaphone
(word, max_length=-1)[source]¶ Return the Metaphone code for a word.
This is a wrapper for Metaphone.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The maximum length of the returned Metaphone code (defaults to 64, but in Philips' original implementation this was 4)
- Returns
The Metaphone value
- Return type
str
Examples
>>> metaphone('Christopher')
'KRSTFR'
>>> metaphone('Niall')
'NL'
>>> metaphone('Smith')
'SM0'
>>> metaphone('Schmidt')
'SKMTT'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Metaphone.encode method instead.
-
class
abydos.phonetic.
DoubleMetaphone
(max_length=-1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Double Metaphone.
Based on Lawrence Philips' (Visual) C++ code from 1999 [Phi00].
New in version 0.3.6.
Initialize DoubleMetaphone instance.
- Parameters
max_length (int) -- The maximum length of the returned Double Metaphone codes (defaults to unlimited, but in Philips' original implementation this was 4)
New in version 0.4.0.
-
encode
(word)[source]¶ Return the Double Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Double Metaphone value(s)
- Return type
tuple
Examples
>>> pe = DoubleMetaphone()
>>> pe.encode('Christopher')
('KRSTFR', '')
>>> pe.encode('Niall')
('NL', '')
>>> pe.encode('Smith')
('SM0', 'XMT')
>>> pe.encode('Schmidt')
('XMT', 'SMT')
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic Double Metaphone code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Double Metaphone value(s)
- Return type
tuple
Examples
>>> pe = DoubleMetaphone()
>>> pe.encode_alpha('Christopher')
('KRSTFR', '')
>>> pe.encode_alpha('Niall')
('NL', '')
>>> pe.encode_alpha('Smith')
('SMÞ', 'XMT')
>>> pe.encode_alpha('Schmidt')
('XMT', 'SMT')
New in version 0.4.0.
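Because `encode` returns a (primary, secondary) pair, two names can be compared by checking whether any of their non-empty codes coincide. The sketch below works on the tuples shown in the examples. `codes_overlap` and `to_alpha` are illustrative helpers, and the replacement of '0' (the 'th' sound) with 'Þ' is inferred from the `encode_alpha` examples rather than taken from the library source.

```python
def codes_overlap(codes_a, codes_b):
    """Return True if two code tuples share any non-empty code."""
    shared = set(codes_a) & set(codes_b)
    shared.discard('')  # an empty secondary code is not a match
    return bool(shared)


def to_alpha(codes):
    """Replace the '0' used for the 'th' sound with the letter thorn."""
    return tuple(code.replace('0', 'Þ') for code in codes)


smith = ('SM0', 'XMT')
schmidt = ('XMT', 'SMT')
niall = ('NL', '')
christopher = ('KRSTFR', '')

print(codes_overlap(smith, schmidt))      # True, both contain 'XMT'
print(codes_overlap(niall, christopher))  # False, only '' is shared
print(to_alpha(smith))                    # ('SMÞ', 'XMT')
```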
-
abydos.phonetic.
double_metaphone
(word, max_length=-1)[source]¶ Return the Double Metaphone code for a word.
This is a wrapper for DoubleMetaphone.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The maximum length of the returned Double Metaphone codes (defaults to unlimited, but in Philips' original implementation this was 4)
- Returns
The Double Metaphone value(s)
- Return type
tuple
Examples
>>> double_metaphone('Christopher')
('KRSTFR', '')
>>> double_metaphone('Niall')
('NL', '')
>>> double_metaphone('Smith')
('SM0', 'XMT')
>>> double_metaphone('Schmidt')
('XMT', 'SMT')
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the DoubleMetaphone.encode method instead.
-
class
abydos.phonetic.
Eudex
(max_length=8)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Eudex hash.
This implementation of eudex phonetic hashing is based on the specification (not the reference implementation) at [Tic].
Further details can be found at [Tic16].
New in version 0.3.6.
Initialize Eudex instance.
- Parameters
max_length (int) -- The length of the code returned, counted in 8-bit values (default 8, i.e. a hash of up to 64 bits)
New in version 0.4.0.
-
_initial_phones
= {'a': 132, 'b': 36, 'c': 6, 'd': 12, 'e': 216, 'f': 34, 'g': 4, 'h': 2, 'i': 248, 'j': 3, 'k': 5, 'l': 80, 'm': 1, 'n': 9, 'o': 148, 'p': 37, 'q': 84, 'r': 81, 's': 10, 't': 14, 'u': 224, 'v': 35, 'w': 0, 'x': 66, 'y': 228, 'z': 74, 'ß': 11, 'à': 133, 'á': 133, 'â': 128, 'ã': 134, 'ä': 166, 'å': 194, 'æ': 167, 'ç': 84, 'è': 217, 'é': 217, 'ê': 217, 'ë': 198, 'ì': 249, 'í': 249, 'î': 249, 'ï': 249, 'ð': 11, 'ñ': 11, 'ò': 149, 'ó': 149, 'ô': 149, 'õ': 149, 'ö': 220, '÷': 255, 'ø': 221, 'ù': 225, 'ú': 225, 'û': 225, 'ü': 229, 'ý': 229, 'þ': 11, 'ÿ': 229}¶
-
_trailing_phones
= {'a': 0, 'b': 72, 'c': 12, 'd': 24, 'e': 0, 'f': 68, 'g': 8, 'h': 4, 'i': 1, 'j': 5, 'k': 9, 'l': 160, 'm': 2, 'n': 18, 'o': 0, 'p': 73, 'q': 168, 'r': 161, 's': 20, 't': 29, 'u': 1, 'v': 69, 'w': 0, 'x': 132, 'y': 1, 'z': 148, 'ß': 21, 'à': 0, 'á': 0, 'â': 0, 'ã': 0, 'ä': 0, 'å': 1, 'æ': 0, 'ç': 149, 'è': 1, 'é': 1, 'ê': 1, 'ë': 1, 'ì': 1, 'í': 1, 'î': 1, 'ï': 1, 'ð': 21, 'ñ': 23, 'ò': 0, 'ó': 0, 'ô': 0, 'õ': 0, 'ö': 1, '÷': 255, 'ø': 1, 'ù': 1, 'ú': 1, 'û': 1, 'ü': 1, 'ý': 1, 'þ': 21, 'ÿ': 1}¶
-
encode
(word)[source]¶ Return the eudex phonetic hash of a word.
- Parameters
word (str) -- The word to transform
- Returns
The eudex hash
- Return type
int
Examples
>>> pe = Eudex()
>>> pe.encode('Colin')
432345564238053650
>>> pe.encode('Christopher')
433648490138894409
>>> pe.encode('Niall')
648518346341351840
>>> pe.encode('Smith')
720575940412906756
>>> pe.encode('Schmidt')
720589151732307997
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
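The two tables above suggest how the hash is assembled. The following is a minimal sketch, not the library's code. It assumes that the first letter is looked up in _initial_phones and the remaining letters in _trailing_phones, that a value is dropped when it equals its predecessor once the lowest bit is ignored, and that the surviving values are packed one byte each into an integer, with zero padding placed after the first value. Only the table entries needed for the example words are copied, and the name eudex_sketch is hypothetical.

```python
# Subsets of the _initial_phones and _trailing_phones tables shown above.
INITIAL = {'c': 6, 'n': 9, 's': 10}
TRAILING = {
    'a': 0, 'c': 12, 'd': 24, 'h': 4, 'i': 1,
    'l': 160, 'm': 2, 'n': 18, 'o': 0, 't': 29,
}


def eudex_sketch(word, max_length=8):
    """Pack per-letter values into an integer, one byte per value."""
    word = word.lower()
    values = [INITIAL[word[0]]] + [TRAILING[ch] for ch in word[1:]]

    # Drop a value when it matches its predecessor with the lowest bit ignored.
    condensed = [values[0]]
    for prev, cur in zip(values, values[1:]):
        if cur >> 1 != prev >> 1:
            condensed.append(cur)

    # The first value leads, zero padding follows it, the rest fill the low bytes.
    padded = (
        [condensed[0]]
        + [0] * max(0, max_length - len(condensed))
        + condensed[1:max_length]
    )

    result = 0
    for value in padded:
        result = (result << 8) | value
    return result


print(eudex_sketch('Colin'))  # 432345564238053650
print(eudex_sketch('Niall'))  # 648518346341351840
```

Under these assumptions the sketch reproduces the documented hashes for Colin, Niall, Smith, and Schmidt; words that need other table entries would require the full tables.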
-
abydos.phonetic.
eudex
(word, max_length=8)[source]¶ Return the eudex phonetic hash of a word.
This is a wrapper for Eudex.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned, counted in 8-bit values (default 8, i.e. a hash of up to 64 bits)
- Returns
The eudex hash
- Return type
int
Examples
>>> eudex('Colin')
432345564238053650
>>> eudex('Christopher')
433648490138894409
>>> eudex('Niall')
648518346341351840
>>> eudex('Smith')
720575940412906756
>>> eudex('Schmidt')
720589151732307997
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Eudex.encode method instead.
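Because the eudex result is an integer, two hashes can be compared bit by bit. As a sketch only (the helper name differing_bits is hypothetical and is not part of this package), the number of differing bits between hashes listed in the examples above can be counted with an XOR:

```python
def differing_bits(hash_a, hash_b):
    """Count the bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count('1')


# Hash values copied from the examples above.
smith = 720575940412906756
schmidt = 720589151732307997
colin = 432345564238053650

print(differing_bits(smith, schmidt))  # 8
print(differing_bits(smith, colin))    # 12
```

A smaller count means the two hashes agree in more bit positions; how to weight individual bits is left open here.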
-
class
abydos.phonetic.
BeiderMorse
(language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Beider-Morse Phonetic Matching.
The Beider-Morse Phonetic Matching algorithm is described in [BM08]. The reference implementation is licensed under GPLv3.
New in version 0.3.6.
Initialize BeiderMorse instance.
- Parameters
language_arg (str or int) --
The language of the term; supported values include:
any
arabic
cyrillic
czech
dutch
english
french
german
greek
greeklatin
hebrew
hungarian
italian
latvian
polish
portuguese
romanian
russian
spanish
turkish
name_mode (str) --
The name mode of the algorithm:
gen -- general (default)
ash -- Ashkenazi
sep -- Sephardic
match_mode (str) -- Matching mode: approx or exact
concat (bool) -- Concatenation mode
filter_langs (bool) -- Filter out incompatible languages
New in version 0.4.0.
-
_apply_final_rules
(phonetic, final_rules, language_arg, strip)[source]¶ Apply a set of final rules to the phonetic encoding.
- Parameters
phonetic (str) -- The term to which to apply the final rules
final_rules (tuple) -- The set of final phonetic transform regexps
language_arg (int) -- An integer representing the target language of the phonetic encoding
strip (bool) -- Flag to indicate whether to normalize the language attributes
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_apply_rule_if_compat
(phonetic, target, language_arg)[source]¶ Apply a phonetic regex if compatible.
Tests for compatible language rules. To do so, apply the rule, expand the results, and detect alternatives with incompatible attributes. Then drop each alternative that has incompatible attributes and keep those that are compatible. If there are no compatible alternatives left, return False; otherwise, return the compatible alternatives.
- Parameters
phonetic (str) -- The Beider-Morse phonetic encoding (so far)
target (str) -- A proposed addition to the phonetic encoding
language_arg (int) -- An integer representing the target language of the phonetic encoding
- Returns
A candidate encoding
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_expand_alternates
(phonetic)[source]¶ Expand phonetic alternates separated by |s.
- Parameters
phonetic (str) -- A Beider-Morse phonetic encoding
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
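To illustrate what expanding alternates means, here is a minimal sketch. It assumes alternates are written as parenthesized, pipe-separated groups without nesting (for example 'xr(Q|i)st'), and it returns a list of plain strings. This is an illustration of the idea, not the library's implementation, and the function name is hypothetical.

```python
import itertools
import re


def expand_alternates_sketch(phonetic):
    """Return every string obtainable by picking one option per group."""
    # re.split with a capturing group keeps the "(a|b)" groups in the result.
    parts = re.split(r'(\([^()]*\))', phonetic)
    choices = []
    for part in parts:
        if part.startswith('(') and part.endswith(')'):
            choices.append(part[1:-1].split('|'))
        else:
            choices.append([part])
    return [''.join(combo) for combo in itertools.product(*choices)]


print(expand_alternates_sketch('xr(Q|i)stop(i|Y)r'))
```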
-
_language
(name, name_mode)[source]¶ Return the best guess language ID for the word and language choices.
- Parameters
name (str) -- The term to guess the language of
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)
- Returns
Language ID
- Return type
int
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_language_index_from_code
(code, name_mode)[source]¶ Return the index value for a language code.
This returns l_any if more than one code is specified or the code is out of bounds.
- Parameters
code (int) -- The language code to interpret
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)
- Returns
Language code index
- Return type
int
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_normalize_lang_attrs
(text, strip)[source]¶ Remove embedded bracketed attributes.
This removes all embedded bracketed attributes, bitwise-ANDs them together, and appends the result to the end. It is applied to a single alternative at a time, not to a parenthesized list. However, if strip is true, it can also remove embedded bracketed attributes from a parenthesized list.
- Parameters
text (str) -- A Beider-Morse phonetic encoding (in progress)
strip (bool) -- Remove the bracketed attributes (and throw away)
- Returns
A Beider-Morse phonetic code
- Return type
str
- Raises
ValueError -- No closing square bracket
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
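The behavior described above can be sketched as follows. This is a simplified illustration that assumes each attribute is a decimal integer in square brackets (for example 'ab[6]c[3]'); the function name is hypothetical and any special cases the library handles are omitted.

```python
def normalize_lang_attrs_sketch(text, strip):
    """Collapse bracketed integer attributes into one trailing attribute."""
    combined = -1  # all bits set, so the first AND keeps the attribute as-is
    while '[' in text:
        start = text.find('[')
        end = text.find(']', start)
        if end == -1:
            raise ValueError('No closing square bracket')
        combined &= int(text[start + 1:end])
        # Cut the bracketed attribute out of the text.
        text = text[:start] + text[end + 1:]
    if strip or combined == -1:
        # Either attributes are discarded, or there were none to begin with.
        return text
    return text + '[' + str(combined) + ']'


print(normalize_lang_attrs_sketch('ab[6]c[3]', False))  # abc[2]
print(normalize_lang_attrs_sketch('ab[6]c[3]', True))   # abc
```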
-
_phonetic
(term, name_mode, rules, final_rules1, final_rules2, language_arg=0, concat=False)[source]¶ Return the Beider-Morse encoding(s) of a term.
- Parameters
term (str) -- The term to encode via Beider-Morse
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)
rules (tuple) -- The set of initial phonetic transform regexps
final_rules1 (tuple) -- The common set of final phonetic transform regexps
final_rules2 (tuple) -- The specific set of final phonetic transform regexps
language_arg (int) -- The language of the term
concat (bool) -- A flag to indicate concatenation
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_phonetic_number
(phonetic)[source]¶ Remove bracketed text from the end of a string.
- Parameters
phonetic (str) -- A Beider-Morse phonetic encoding
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_phonetic_numbers
(phonetic)[source]¶ Prepare & join phonetic numbers.
Split the phonetic value on '-', run each part through _pnums_with_leading_space, and join the results with ' '.
- Parameters
phonetic (str) -- A Beider-Morse phonetic encoding
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_pnums_with_leading_space
(phonetic)[source]¶ Join prefixes & suffixes in cases of alternate phonetic values.
- Parameters
phonetic (str) -- A Beider-Morse phonetic encoding
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_redo_language
(term, name_mode, rules, final_rules1, final_rules2, concat)[source]¶ Reassess the language of the terms and call the phonetic encoder.
Uses a split multi-word term.
- Parameters
term (str) -- The term to encode via Beider-Morse
name_mode (str) -- The name mode of the algorithm: gen (default), ash (Ashkenazi), or sep (Sephardic)
rules (tuple) -- The set of initial phonetic transform regexps
final_rules1 (tuple) -- The common set of final phonetic transform regexps
final_rules2 (tuple) -- The specific set of final phonetic transform regexps
concat (bool) -- A flag to indicate concatenation
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
_remove_dupes
(phonetic)[source]¶ Remove duplicates from a phonetic encoding list.
- Parameters
phonetic (str) -- A Beider-Morse phonetic encoding
- Returns
A Beider-Morse phonetic code
- Return type
str
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode
(word)[source]¶ Return the Beider-Morse Phonetic Matching encoding(s) of a term.
- Parameters
word (str) -- The word to transform
- Returns
The Beider-Morse phonetic value(s)
- Return type
str
- Raises
ValueError -- Unknown language
Examples
>>> pe = BeiderMorse()
>>> pe.encode('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir xristYfir xristopi xritopir xritopi xristofi xritofir xritofi tzristopir tzristofir zristopir zristopi zritopir zritopi zristofir zristofi zritofir zritofi'
>>> pe.encode('Niall')
'nial niol'
>>> pe.encode('Smith')
'zmit'
>>> pe.encode('Schmidt')
'zmit stzmit'

>>> BeiderMorse(language_arg='German').encode('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir xristYfir'
>>> BeiderMorse(language_arg='English').encode('Christopher')
'tzristofir tzrQstofir tzristafir tzrQstafir xristofir xrQstofir xristafir xrQstafir'
>>> BeiderMorse(language_arg='German',
... name_mode='ash').encode('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir xristYfir'

>>> BeiderMorse(language_arg='German',
... match_mode='exact').encode('Christopher')
'xriStopher xriStofer xristopher xristofer'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.phonetic.
bmpm
(word, language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False)[source]¶ Return the Beider-Morse Phonetic Matching encoding(s) of a term.
This is a wrapper for BeiderMorse.encode().
- Parameters
word (str) -- The word to transform
language_arg (str) --
The language of the term; supported values include:
any
arabic
cyrillic
czech
dutch
english
french
german
greek
greeklatin
hebrew
hungarian
italian
latvian
polish
portuguese
romanian
russian
spanish
turkish
name_mode (str) --
The name mode of the algorithm:
gen -- general (default)
ash -- Ashkenazi
sep -- Sephardic
match_mode (str) -- Matching mode: approx or exact
concat (bool) -- Concatenation mode
filter_langs (bool) -- Filter out incompatible languages
- Returns
The Beider-Morse phonetic value(s)
- Return type
str
Examples
>>> bmpm('Christopher')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir xristYfir xristopi xritopir xritopi xristofi xritofir xritofi tzristopir tzristofir zristopir zristopi zritopir zritopi zristofir zristofi zritofir zritofi'
>>> bmpm('Niall')
'nial niol'
>>> bmpm('Smith')
'zmit'
>>> bmpm('Schmidt')
'zmit stzmit'

>>> bmpm('Christopher', language_arg='German')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir xristYfir'
>>> bmpm('Christopher', language_arg='English')
'tzristofir tzrQstofir tzristafir tzrQstafir xristofir xrQstofir xristafir xrQstafir'
>>> bmpm('Christopher', language_arg='German', name_mode='ash')
'xrQstopir xrQstYpir xristopir xristYpir xrQstofir xrQstYfir xristofir xristYfir'

>>> bmpm('Christopher', language_arg='German', match_mode='exact')
'xriStopher xriStofer xristopher xristofer'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the BeiderMorse.encode method instead.
-
class
abydos.phonetic.
NRL
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Naval Research Laboratory English-to-phoneme encoder.
This is defined by [EJMS76].
New in version 0.3.6.
-
_rules
= {' ': (('', ' ', '', ' '), ('', '-', '', ''), ('.', "'S", '', 'z'), ('#:.E', "'S", '', 'z'), ('#', "'S", '', 'z'), ('', "'", '', ''), ('', ',', '', ' '), ('', '.', '', ' '), ('', '?', '', ' '), ('', '!', '', ' ')), 'A': (('', 'A', ' ', 'AX'), (' ', 'ARE', ' ', 'AAr'), (' ', 'AR', 'O', 'AXr'), ('', 'AR', '#', 'EHr'), ('^', 'AS', '#', 'EYs'), ('', 'A', 'WA', 'AX'), ('', 'AW', '', 'AO'), (' :', 'ANY', '', 'EHnIY'), ('', 'A', '^+#', 'EY'), ('#:', 'ALLY', '', 'AXlIY'), (' ', 'AL', '#', 'AXl'), ('', 'AGAIN', '', 'AXgEHn'), ('#:', 'AG', 'E', 'IHj'), ('', 'A', '^+:#', 'AE'), (' :', 'A', '^+ ', 'EY'), ('', 'A', '^%', 'EY'), (' ', 'ARR', '', 'AXr'), ('', 'ARR', '', 'AEr'), (' :', 'AR', ' ', 'AAr'), ('', 'AR', ' ', 'ER'), ('', 'AR', '', 'AAr'), ('', 'AIR', '', 'EHr'), ('', 'AI', '', 'EY'), ('', 'AY', '', 'EY'), ('', 'AU', '', 'AO'), ('#:', 'AL', ' ', 'AXl'), ('#:', 'ALS', ' ', 'AXlz'), ('', 'ALK', '', 'AOk'), ('', 'AL', '^', 'AOl'), (' :', 'ABLE', '', 'EYbAXl'), ('', 'ABLE', '', 'AXbAXl'), ('', 'ANG', '+', 'EYnj'), ('', 'A', '', 'AE')), 'B': ((' ', 'BE', '^#', 'bIH'), ('', 'BEING', '', 'bIYIHNG'), (' ', 'BOTH', ' ', 'bOWTH'), (' ', 'BUS', '#', 'bIHz'), ('', 'BUIL', '', 'bIHl'), ('', 'B', '', 'b')), 'C': ((' ', 'CH', '^', 'k'), ('^E', 'CH', '', 'k'), ('', 'CH', '', 'CH'), (' S', 'CI', '#', 'sAY'), ('', 'CI', 'A', 'SH'), ('', 'CI', 'O', 'SH'), ('', 'CI', 'EN', 'SH'), ('', 'C', '+', 's'), ('', 'CK', '', 'k'), ('', 'COM', '%', 'kAHm'), ('', 'C', '', 'k')), 'D': (('#:', 'DED', ' ', 'dIHd'), ('.E', 'D', ' ', 'd'), ('#:^E', 'D', ' ', 't'), (' ', 'DE', '^#', 'dIH'), (' ', 'DO', ' ', 'dUW'), (' ', 'DOES', '', 'dAHz'), (' ', 'DOING', '', 'dUWIHNG'), (' ', 'DOW', '', 'dAW'), ('', 'DU', 'A', 'jUW'), ('', 'D', '', 'd')), 'E': (('#:', 'E', ' ', ''), ("':^", 'E', ' ', ''), (' :', 'E', ' ', 'IY'), ('#', 'ED', ' ', 'd'), ('#:', 'E', 'D ', ''), ('', 'EV', 'ER', 'EHv'), ('', 'E', '^%', 'IY'), ('', 'ERI', '#', 'IYrIY'), ('', 'ERI', '', 'EHrIH'), ('#:', 'ER', '#', 'ER'), ('', 'ER', '#', 'EHr'), 
('', 'ER', '', 'ER'), (' ', 'EVEN', '', 'IYvEHn'), ('#:', 'E', 'W', ''), ('T', 'EW', '', 'UW'), ('S', 'EW', '', 'UW'), ('R', 'EW', '', 'UW'), ('D', 'EW', '', 'UW'), ('L', 'EW', '', 'UW'), ('Z', 'EW', '', 'UW'), ('N', 'EW', '', 'UW'), ('J', 'EW', '', 'UW'), ('TH', 'EW', '', 'UW'), ('CH', 'EW', '', 'UW'), ('SH', 'EW', '', 'UW'), ('', 'EW', '', 'yUW'), ('', 'E', 'O', 'IY'), ('#:S', 'ES', ' ', 'IHz'), ('#:C', 'ES', ' ', 'IHz'), ('#:G', 'ES', ' ', 'IHz'), ('#:Z', 'ES', ' ', 'IHz'), ('#:X', 'ES', ' ', 'IHz'), ('#:J', 'ES', ' ', 'IHz'), ('#:CH', 'ES', ' ', 'IHz'), ('#:SH', 'ES', ' ', 'IHz'), ('#:', 'E', 'S ', ''), ('#:', 'ELY', ' ', 'lIY'), ('#:', 'EMENT', '', 'mEHnt'), ('', 'EFUL', '', 'fUHl'), ('', 'EE', '', 'IY'), ('', 'EARN', '', 'ERn'), (' ', 'EAR', '^', 'ER'), ('', 'EAD', '', 'EHd'), ('#:', 'EA', ' ', 'IYAX'), ('', 'EA', 'SU', 'EH'), ('', 'EA', '', 'IY'), ('', 'EIGH', '', 'EY'), ('', 'EI', '', 'IY'), (' ', 'EYE', '', 'AY'), ('', 'EY', '', 'IY'), ('', 'EU', '', 'yUW'), ('', 'E', '', 'EH')), 'F': (('', 'FUL', '', 'fUHl'), ('', 'F', '', 'f')), 'G': (('', 'GIV', '', 'gIHv'), (' ', 'G', 'I^', 'g'), ('', 'GE', 'T', 'gEH'), ('SU', 'GGES', '', 'gjEHs'), ('', 'GG', '', 'g'), (' B#', 'G', '', 'g'), ('', 'G', '+', 'j'), ('', 'GREAT', '', 'grEYt'), ('#', 'GH', '', ''), ('', 'G', '', 'g')), 'H': ((' ', 'HAV', '', 'hAEv'), (' ', 'HERE', '', 'hIYr'), (' ', 'HOUR', '', 'AWER'), ('', 'HOW', '', 'hAW'), ('', 'H', '#', 'h'), ('', 'H', '', '')), 'I': ((' ', 'IN', '', 'IHn'), (' ', 'I', ' ', 'AY'), ('', 'IN', 'D', 'AYn'), ('', 'IER', '', 'IYER'), ('#:R', 'IED', '', 'IYd'), ('', 'IED', ' ', 'AYd'), ('', 'IEN', '', 'IYEHn'), ('', 'IE', 'T', 'AYEH'), (' :', 'I', '%', 'AY'), ('', 'I', '%', 'IY'), ('', 'IE', '', 'IY'), ('', 'I', '^+:#', 'IH'), ('', 'IR', '#', 'AYr'), ('', 'IZ', '%', 'AYz'), ('', 'IS', '%', 'AYz'), ('', 'I', 'D%', 'AY'), ('+^', 'I', '^+', 'IH'), ('', 'I', 'T%', 'AY'), ('#:^', 'I', '^+', 'IH'), ('', 'I', '^+', 'AY'), ('', 'IR', '', 'ER'), ('', 'IGH', '', 'AY'), ('', 'ILD', '', 
'AYld'), ('', 'IGN', ' ', 'AYn'), ('', 'IGN', '^', 'AYn'), ('', 'IGN', '%', 'AYn'), ('', 'IQUE', '', 'IYk'), ('', 'I', '', 'IH')), 'J': (('', 'J', '', 'j'),), 'K': ((' ', 'K', 'N', ''), ('', 'K', '', 'k')), 'L': (('', 'LO', 'C#', 'lOW'), ('L', 'L', '', ''), ('#:^', 'L', '%', 'AXl'), ('', 'LEAD', '', 'lIYd'), ('', 'L', '', 'l')), 'M': (('', 'MOV', '', 'mUWv'), ('', 'M', '', 'm')), 'N': (('E', 'NG', '+', 'nj'), ('', 'NG', 'R', 'NGg'), ('', 'NG', '#', 'NGg'), ('', 'NGL', '%', 'NGgAXl'), ('', 'NG', '', 'NG'), ('', 'NK', '', 'NGk'), (' ', 'NOW', ' ', 'nAW'), ('', 'N', '', 'n')), 'O': (('', 'OF', ' ', 'AXv'), ('', 'OROUGH', '', 'EROW'), ('#:', 'OR', ' ', 'ER'), ('#:', 'ORS', ' ', 'ERz'), ('', 'OR', '', 'AOr'), (' ', 'ONE', '', 'wAHn'), ('', 'OW', '', 'OW'), (' ', 'OVER', '', 'OWvER'), ('', 'OV', '', 'AHv'), ('', 'O', '^%', 'OW'), ('', 'O', '^EN', 'OW'), ('', 'O', '^I#', 'OW'), ('', 'OL', 'D', 'OWl'), ('', 'OUGHT', '', 'AOt'), ('', 'OUGH', '', 'AHf'), (' ', 'OU', '', 'AW'), ('H', 'OU', 'S#', 'AW'), ('', 'OUS', '', 'AXs'), ('', 'OUR', '', 'AOr'), ('', 'OULD', '', 'UHd'), ('^', 'OU', '^L', 'AH'), ('', 'OUP', '', 'UWp'), ('', 'OU', '', 'AW'), ('', 'OY', '', 'OY'), ('', 'OING', '', 'OWIHNG'), ('', 'OI', '', 'OY'), ('', 'OOR', '', 'AOr'), ('', 'OOK', '', 'UHk'), ('', 'OOD', '', 'UHd'), ('', 'OO', '', 'UW'), ('', 'O', 'E', 'OW'), ('', 'O', ' ', 'OW'), ('', 'OA', '', 'OW'), (' ', 'ONLY', '', 'OWnlIY'), (' ', 'ONCE', '', 'wAHns'), ('', "ON'T", '', 'OWnt'), ('C', 'O', 'N', 'AA'), ('', 'O', 'NG', 'AO'), (' :^', 'O', 'N', 'AH'), ('I', 'ON', '', 'AXn'), ('#:', 'ON', ' ', 'AXn'), ('#^', 'ON', '', 'AXn'), ('', 'O', 'ST ', 'OW'), ('', 'OF', '^', 'AOf'), ('', 'OTHER', '', 'AHDHER'), ('', 'OSS', ' ', 'AOs'), ('#:^', 'OM', '', 'AHm'), ('', 'O', '', 'AA')), 'P': (('', 'PH', '', 'f'), ('', 'PEOP', '', 'pIYp'), ('', 'POW', '', 'pAW'), ('', 'PUT', ' ', 'pUHt'), ('', 'P', '', 'p')), 'Q': (('', 'QUAR', '', 'kwAOr'), ('', 'QU', '', 'kw'), ('', 'Q', '', 'k')), 'R': ((' ', 'RE', '^#', 'rIY'), ('', 
'R', '', 'r')), 'S': (('', 'SH', '', 'SH'), ('#', 'SION', '', 'ZHAXn'), ('', 'SOME', '', 'sAHm'), ('#', 'SUR', '#', 'ZHER'), ('', 'SUR', '#', 'SHER'), ('#', 'SU', '#', 'ZHUW'), ('#', 'SSU', '#', 'SHUW'), ('#', 'SED', ' ', 'zd'), ('#', 'S', '#', 'z'), ('', 'SAID', '', 'sEHd'), ('^', 'SION', '', 'SHAXn'), ('', 'S', 'S', ''), ('.', 'S', ' ', 'z'), ('#:.E', 'S', ' ', 'z'), ('#:^##', 'S', ' ', 'z'), ('#:^#', 'S', ' ', 's'), ('U', 'S', ' ', 's'), (' :#', 'S', ' ', 'z'), (' ', 'SCH', '', 'sk'), ('', 'S', 'C+', ''), ('#', 'SM', '', 'zm'), ('#', 'SN', "'", 'zAXn'), ('', 'S', '', 's')), 'T': ((' ', 'THE', ' ', 'DHAX'), ('', 'TO', ' ', 'tUW'), ('', 'THAT', ' ', 'DHAEt'), (' ', 'THIS', ' ', 'DHIHs'), (' ', 'THEY', '', 'DHEY'), (' ', 'THERE', '', 'DHEHr'), ('', 'THER', '', 'DHER'), ('', 'THEIR', '', 'DHEHr'), (' ', 'THAN', ' ', 'DHAEn'), (' ', 'THEM', ' ', 'DHEHm'), ('', 'THESE', ' ', 'DHIYz'), (' ', 'THEN', '', 'DHEHn'), ('', 'THROUGH', '', 'THrUW'), ('', 'THOSE', '', 'DHOWz'), ('', 'THOUGH', ' ', 'DHOW'), (' ', 'THUS', '', 'DHAHs'), ('', 'TH', '', 'TH'), ('#:', 'TED', ' ', 'tIHd'), ('S', 'TI', '#N', 'CH'), ('', 'TI', 'O', 'SH'), ('', 'TI', 'A', 'SH'), ('', 'TIEN', '', 'SHAXn'), ('', 'TUR', '#', 'CHER'), ('', 'TU', 'A', 'CHUW'), (' ', 'TWO', '', 'tUW'), ('', 'T', '', 't')), 'U': ((' ', 'UN', 'I', 'yUWn'), (' ', 'UN', '', 'AHn'), (' ', 'UPON', '', 'AXpAOn'), ('T', 'UR', '#', 'UHr'), ('S', 'UR', '#', 'UHr'), ('R', 'UR', '#', 'UHr'), ('D', 'UR', '#', 'UHr'), ('L', 'UR', '#', 'UHr'), ('Z', 'UR', '#', 'UHr'), ('N', 'UR', '#', 'UHr'), ('J', 'UR', '#', 'UHr'), ('TH', 'UR', '#', 'UHr'), ('CH', 'UR', '#', 'UHr'), ('SH', 'UR', '#', 'UHr'), ('', 'UR', '#', 'yUHr'), ('', 'UR', '', 'ER'), ('', 'U', '^ ', 'AH'), ('', 'U', '^^', 'AH'), ('', 'UY', '', 'AY'), (' G', 'U', '#', ''), ('G', 'U', '%', ''), ('G', 'U', '#', 'w'), ('#N', 'U', '', 'yUW'), ('T', 'U', '', 'UW'), ('S', 'U', '', 'UW'), ('R', 'U', '', 'UW'), ('D', 'U', '', 'UW'), ('L', 'U', '', 'UW'), ('Z', 'U', '', 'UW'), ('N', 'U', '', 
'UW'), ('J', 'U', '', 'UW'), ('TH', 'U', '', 'UW'), ('CH', 'U', '', 'UW'), ('SH', 'U', '', 'UW'), ('', 'U', '', 'yUW')), 'V': (('', 'VIEW', '', 'vyUW'), ('', 'V', '', 'v')), 'W': ((' ', 'WERE', '', 'wER'), ('', 'WA', 'S', 'wAA'), ('', 'WA', 'T', 'wAA'), ('', 'WHERE', '', 'WHEHr'), ('', 'WHAT', '', 'WHAAt'), ('', 'WHOL', '', 'hOWl'), ('', 'WHO', '', 'hUW'), ('', 'WH', '', 'WH'), ('', 'WAR', '', 'wAOr'), ('', 'WOR', '^', 'wER'), ('', 'WR', '', 'r'), ('', 'W', '', 'w')), 'X': (('', 'X', '', 'ks'),), 'Y': (('', 'YOUNG', '', 'yAHNG'), (' ', 'YOU', '', 'yUW'), (' ', 'YES', '', 'yEHs'), (' ', 'Y', '', 'y'), ('#:^', 'Y', ' ', 'IY'), ('#:^', 'Y', 'I', 'IY'), (' :', 'Y', ' ', 'AY'), (' :', 'Y', '#', 'AY'), (' :', 'Y', '^+:#', 'IH'), (' :', 'Y', '^#', 'AY'), ('', 'Y', '', 'IH')), 'Z': (('', 'Z', '', 'z'),)}¶
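Each entry in _rules is a 4-tuple of (left context, matched text, right context, phonemes), grouped by the first letter of the matched text and listed in priority order. The sketch below shows only the first-match-wins lookup and deliberately ignores every rule that has a non-empty context, so it is not the NRL algorithm itself. The rule subset is copied from the table above; the function name is hypothetical.

```python
# Context-free rules copied from the _rules table for three letters.
RULES = {
    'C': (('', 'CK', '', 'k'), ('', 'C', '', 'k')),
    'I': (('', 'I', '', 'IH'),),
    'Q': (('', 'QUAR', '', 'kwAOr'), ('', 'QU', '', 'kw'), ('', 'Q', '', 'k')),
}


def encode_context_free(word):
    """Apply the first matching context-free rule at each position."""
    word = word.upper()
    pos = 0
    out = []
    while pos < len(word):
        for left, match, right, phonemes in RULES.get(word[pos], ()):
            if left or right:
                continue  # context-dependent rules are out of scope here
            if word.startswith(match, pos):
                out.append(phonemes)
                pos += len(match)
                break
        else:
            pos += 1  # no rule for this letter in the subset: skip it
    return ''.join(out)


print(encode_context_free('quick'))  # kwIHk
```

Because longer matches such as 'QUAR' are listed before 'QU' and 'Q', the ordering inside each tuple decides which rule fires.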
-
encode
(word)[source]¶ Return the Naval Research Laboratory phonetic encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The NRL phonetic encoding
- Return type
str
Examples
>>> pe = NRL()
>>> pe.encode('the')
'DHAX'
>>> pe.encode('round')
'rAWnd'
>>> pe.encode('quick')
'kwIHk'
>>> pe.encode('eaten')
'IYtEHn'
>>> pe.encode('Smith')
'smIHTH'
>>> pe.encode('Larsen')
'lAArsEHn'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
-
abydos.phonetic.
nrl
(word)[source]¶ Return the Naval Research Laboratory phonetic encoding of a word.
This is a wrapper for NRL.encode().
- Parameters
word (str) -- The word to transform
- Returns
The NRL phonetic encoding
- Return type
str
Examples
>>> nrl('the')
'DHAX'
>>> nrl('round')
'rAWnd'
>>> nrl('quick')
'kwIHk'
>>> nrl('eaten')
'IYtEHn'
>>> nrl('Smith')
'smIHTH'
>>> nrl('Larsen')
'lAArsEHn'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the NRL.encode method instead.
-
class
abydos.phonetic.
MetaSoundex
(lang='en')[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
MetaSoundex.
This is based on [KV17]. Only English ('en') and Spanish ('es') languages are supported, as in the original.
New in version 0.3.6.
Initialize MetaSoundex instance.
- Parameters
lang (str) -- Either en for English or es for Spanish
New in version 0.4.0.
-
_trans
= {65: '0', 66: '7', 67: '4', 68: '3', 69: '0', 70: '7', 71: '5', 72: '5', 73: '0', 74: '1', 75: '5', 76: '8', 77: '6', 78: '6', 79: '0', 80: '7', 81: '5', 82: '9', 83: '4', 84: '3', 85: '0', 86: '7', 87: '7', 88: '5', 89: '1', 90: '4'}¶
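The keys of _trans are the code points of the letters A to Z, which is the form str.translate expects. The sketch below shows how such a table can turn the leading letter of a Soundex-style code into a digit. That the table is applied to the first character only is an assumption inferred from the documented English outputs, and the function name is hypothetical.

```python
# Copy of the _trans table shown above (code point -> digit).
TRANS = {
    65: '0', 66: '7', 67: '4', 68: '3', 69: '0', 70: '7', 71: '5',
    72: '5', 73: '0', 74: '1', 75: '5', 76: '8', 77: '6', 78: '6',
    79: '0', 80: '7', 81: '5', 82: '9', 83: '4', 84: '3', 85: '0',
    86: '7', 87: '7', 88: '5', 89: '1', 90: '4',
}


def leading_letter_to_digit(code):
    """Translate only the first character of a letter-plus-digits code."""
    return code[:1].translate(TRANS) + code[1:]


print(leading_letter_to_digit('S500'))  # 4500
print(leading_letter_to_digit('W362'))  # 7362
```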
-
encode
(word)[source]¶ Return the MetaSoundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The MetaSoundex code
- Return type
str
Examples
>>> pe = MetaSoundex()
>>> pe.encode('Smith')
'4500'
>>> pe.encode('Waters')
'7362'
>>> pe.encode('James')
'1520'
>>> pe.encode('Schmidt')
'4530'
>>> pe.encode('Ashcroft')
'0261'

>>> pe = MetaSoundex(lang='es')
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6754'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic MetaSoundex code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic MetaSoundex code
- Return type
str
Examples
>>> pe = MetaSoundex()
>>> pe.encode_alpha('Smith')
'SN'
>>> pe.encode_alpha('Waters')
'WTRK'
>>> pe.encode_alpha('James')
'JNK'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Ashcroft')
'AKRP'

>>> pe = MetaSoundex(lang='es')
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NKLS'
New in version 0.4.0.
-
abydos.phonetic.
metasoundex
(word, lang='en')[source]¶ Return the MetaSoundex code for a word.
This is a wrapper for MetaSoundex.encode().
- Parameters
word (str) -- The word to transform
lang (str) -- Either en for English or es for Spanish
- Returns
The MetaSoundex code
- Return type
str
Examples
>>> metasoundex('Smith')
'4500'
>>> metasoundex('Waters')
'7362'
>>> metasoundex('James')
'1520'
>>> metasoundex('Schmidt')
'4530'
>>> metasoundex('Ashcroft')
'0261'
>>> metasoundex('Perez', lang='es')
'094'
>>> metasoundex('Martinez', lang='es')
'69364'
>>> metasoundex('Gutierrez', lang='es')
'83994'
>>> metasoundex('Santiago', lang='es')
'4638'
>>> metasoundex('Nicolás', lang='es')
'6754'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the MetaSoundex.encode method instead.
-
class
abydos.phonetic.
ONCA
(max_length=4, zero_pad=True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Oxford Name Compression Algorithm (ONCA).
This is the Oxford Name Compression Algorithm, based on [Gil97].
I can find no complete description of the "anglicised version of the NYSIIS method" identified as the first step in this algorithm, so this is likely not a precisely correct implementation, in that it employs the standard NYSIIS algorithm.
New in version 0.3.6.
Initialize ONCA instance.
- Parameters
max_length (int) -- The maximum length (default 4) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
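As a rough sketch of how the two parameters described above interact, the helper below pads a code with zeros and then trims it to max_length. It assumes that codes longer than max_length are truncated; the helper name fit_code is hypothetical and it is not taken from the library.

```python
def fit_code(code, max_length=4, zero_pad=True):
    """Pad a code with trailing zeros if requested, then cap its length."""
    if zero_pad:
        code += '0' * max_length
    return code[:max_length]


print(fit_code('N4'))                  # N400
print(fit_code('N4', zero_pad=False))  # N4
```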
-
encode
(word)[source]¶ Return the Oxford Name Compression Algorithm (ONCA) code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The ONCA code
- Return type
str
Examples
>>> pe = ONCA()
>>> pe.encode('Christopher')
'C623'
>>> pe.encode('Niall')
'N400'
>>> pe.encode('Smith')
'S530'
>>> pe.encode('Schmidt')
'S530'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic ONCA code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic ONCA code
- Return type
str
Examples
>>> pe = ONCA()
>>> pe.encode_alpha('Christopher')
'CRKT'
>>> pe.encode_alpha('Niall')
'NL'
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
New in version 0.4.0.
-
abydos.phonetic.
onca
(word, max_length=4, zero_pad=True)[source]¶ Return the Oxford Name Compression Algorithm (ONCA) code for a word.
This is a wrapper for ONCA.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The maximum length (default 4) of the code to return
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The ONCA code
- Return type
str
Examples
>>> onca('Christopher')
'C623'
>>> onca('Niall')
'N400'
>>> onca('Smith')
'S530'
>>> onca('Schmidt')
'S530'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the ONCA.encode method instead.
-
class
abydos.phonetic.
FONEM
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
FONEM.
FONEM is a phonetic algorithm designed for French (particularly surnames in Saguenay, Canada), defined in [BBL81].
Guillaume Plique's Javascript implementation [Pli18] at https://github.com/Yomguithereal/talisman/blob/master/src/phonetics/french/fonem.js was also consulted for this implementation.
New in version 0.3.6.
-
_rule_order
= ('V-14', 'C-28', 'C-28a', 'C-28b', 'C-28bb', 'C-28c', 'C-28d', 'C-12', 'C-8', 'C-9', 'C-10', 'C-16', 'C-17', 'C-2', 'C-3', 'C-7', 'V-2,5', 'V-3,4', 'V-6', 'V-1', 'C-14', 'C-31,33', 'C-30,32', 'C-11', 'V-15', 'V-17', 'V-18', 'V-7', 'V-8', 'V-9', 'V-10', 'V-11', 'V-12', 'V-13', 'V-16', 'V-19', 'V-20', 'C-1', 'C-4', 'C-5', 'C-6', 'C-13', 'C-15', 'C-18', 'C-19', 'C-20', 'C-21', 'C-22', 'C-23', 'C-24', 'C-25', 'C-26', 'C-27', 'C-29', 'V-14', 'C-28', 'C-28a', 'C-28b', 'C-28bb', 'C-28c', 'C-28d', 'C-34', 'C-35')¶
-
_rule_table
= {'C-1': ('BV', 'V'), 'C-10': (re.compile('G(?=[EIY])'), 'J'), 'C-11': (re.compile('GA(?=I?[MN])'), 'G#'), 'C-12': (re.compile('GE(O|AU)'), 'JO'), 'C-13': (re.compile('GNI(?=[AEIOUY])'), 'GN'), 'C-14': (re.compile('(?<![PCS])H'), ''), 'C-15': ('JEA', 'JA'), 'C-16': (re.compile('^MAC(?=[BCDFGHJKLMNPQRSTVWXZ])'), 'MA#'), 'C-17': (re.compile('^MC'), 'MA#'), 'C-18': ('PH', 'F'), 'C-19': ('QU', 'K'), 'C-2': (re.compile('(?<=[AEIOUY])C(?=[EIY])'), 'SS'), 'C-20': (re.compile('^SC(?=[EIY])'), 'S'), 'C-21': (re.compile('(?<=.)SC(?=[EIY])'), 'SS'), 'C-22': (re.compile('(?<=.)SC(?=[AOU])'), 'SK'), 'C-23': ('SH', 'CH'), 'C-24': (re.compile('TIA$'), 'SSIA'), 'C-25': (re.compile('(?<=[AIOUY])W'), ''), 'C-26': (re.compile('X[CSZ]'), 'X'), 'C-27': (re.compile('(?<=[AEIOUY])Z|(?<=[BCDFGHJKLMNPQRSTVWXZ])Z(?=[BCDFGHJKLMNPQRSTVWXZ])'), 'S'), 'C-28': (re.compile('([BDFGHJKMNPQRTVWXZ])\\1'), '\\1'), 'C-28a': (re.compile('CC(?=[BCDFGHJKLMNPQRSTVWXZ]|$)'), 'C'), 'C-28b': (re.compile('((?<=[BCDFGHJKLMNPQRSTVWXZ])|^)SS'), 'S'), 'C-28bb': (re.compile('SS(?=[BCDFGHJKLMNPQRSTVWXZ]|$)'), 'S'), 'C-28c': (re.compile('((?<=[^I])|^)LL'), 'L'), 'C-28d': (re.compile('ILE$'), 'ILLE'), 'C-29': (re.compile('(ILS|[CS]H|[MN]P|R[CFKLNSX])$|([BCDFGHJKLMNPQRSTVWXZ])[BCDFGHJKLMNPQRSTVWXZ]$'), <function _get_parts>), 'C-3': (re.compile('(?<=[BDFGHJKLMNPQRSTVWZ])C(?=[EIY])'), 'S'), 'C-30,32': (re.compile('^(SA?INT?|SEI[NM]|CINQ?|ST)(?!E)-?'), 'ST-'), 'C-31,33': (re.compile('^(SAINTE|STE)-?'), 'STE-'), 'C-34': ('G#', 'GA'), 'C-35': ('MA#', 'MAC'), 'C-4': (re.compile('^C(?=[EIY])'), 'S'), 'C-5': (re.compile('^C(?=[OUA])'), 'K'), 'C-6': (re.compile('(?<=[AEIOUY])C$'), 'K'), 'C-7': (re.compile('C(?=[BDFGJKLMNPQRSTVWXZ])'), 'K'), 'C-8': (re.compile('CC(?=[AOU])'), 'K'), 'C-9': (re.compile('CC(?=[EIY])'), 'X'), 'V-1': (re.compile('E?AU'), 'O'), 'V-10': ('Y', 'I'), 'V-11': (re.compile('(?<=[AEIOUY])I(?=[AEIOUY])'), 'Y'), 'V-12': (re.compile('(?<=[AEIOUY])ILL'), 'Y'), 'V-13': (re.compile('OU(?=[AEOU]|I(?!LL))'), 'W'), 
'V-14': (re.compile('([AEIOUY])(?=\\1)'), ''), 'V-15': (re.compile('[AE]M(?=[BCDFGHJKLMPQRSTVWXZ])(?!$)'), 'EN'), 'V-16': (re.compile('OM(?=[BCDFGHJKLMPQRSTVWXZ])'), 'ON'), 'V-17': (re.compile('AN(?=[BCDFGHJKLMNPQRSTVWXZ])'), 'EN'), 'V-18': (re.compile('(AI[MN]|EIN)(?=[BCDFGHJKLMNPQRSTVWXZ]|$)'), 'IN'), 'V-19': (re.compile('B(O|U|OU)RNE?$'), 'BURN'), 'V-2,5': (re.compile('(E?AU|O)L[TX]$'), 'O'), 'V-20': (re.compile('(^IM|(?<=[BCDFGHJKLMNPQRSTVWXZ])IM(?=[BCDFGHJKLMPQRSTVWXZ]))'), 'IN'), 'V-3,4': (re.compile('E?AU[TX]$'), 'O'), 'V-6': (re.compile('E?AUL?D$'), 'O'), 'V-7': (re.compile('(?<!G)AY$'), 'E'), 'V-8': (re.compile('EUX$'), 'EU'), 'V-9': (re.compile('EY(?=$|[BCDFGHJKLMNPQRSTVWXZ])'), 'E')}¶
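The _rule_order tuple names the rules to run and _rule_table maps each name to a (pattern, replacement) pair, where the pattern is either a plain string or a compiled regular expression. A minimal sketch of applying such a table in order follows. It uses only four rules copied from the table above, so it does not produce full FONEM codes, and the function name is hypothetical.

```python
import re

# A small subset of the rule table shown above.
RULE_TABLE = {
    'C-28': (re.compile(r'([BDFGHJKMNPQRTVWXZ])\1'), r'\1'),
    'V-1': (re.compile('E?AU'), 'O'),
    'C-18': ('PH', 'F'),
    'C-19': ('QU', 'K'),
}
# Same relative order as these rules have in _rule_order.
RULE_ORDER = ('C-28', 'V-1', 'C-18', 'C-19')


def apply_rules(word):
    """Run each rule once, in order, over the upper-cased word."""
    word = word.upper()
    for name in RULE_ORDER:
        pattern, replacement = RULE_TABLE[name]
        if isinstance(pattern, str):
            word = word.replace(pattern, replacement)
        else:
            word = pattern.sub(replacement, word)
    return word


print(apply_rules('Beaulieu'))  # BOLIEU
print(apply_rules('Philippe'))  # FILIPE
```

Keeping the order separate from the table lets a rule name appear more than once in the sequence, as several do in _rule_order.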
-
_uc_set
= {'-', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'}¶
-
encode
(word)[source]¶ Return the FONEM code of a word.
- Parameters
word (str) -- The word to transform
- Returns
The FONEM code
- Return type
str
Examples
>>> pe = FONEM()
>>> pe.encode('Marchand')
'MARCHEN'
>>> pe.encode('Beaulieu')
'BOLIEU'
>>> pe.encode('Beaumont')
'BOMON'
>>> pe.encode('Legrand')
'LEGREN'
>>> pe.encode('Pelletier')
'PELETIER'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
-
abydos.phonetic.
fonem
(word)[source]¶ Return the FONEM code of a word.
This is a wrapper for FONEM.encode().
- Parameters
word (str) -- The word to transform
- Returns
The FONEM code
- Return type
str
Examples
>>> fonem('Marchand')
'MARCHEN'
>>> fonem('Beaulieu')
'BOLIEU'
>>> fonem('Beaumont')
'BOMON'
>>> fonem('Legrand')
'LEGREN'
>>> fonem('Pelletier')
'PELETIER'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the FONEM.encode method instead.
-
class
abydos.phonetic.
HenryEarly
(max_length=3)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Henry code, early version.
The early version of Henry coding is given in [LegareLC72]. This is different from the later version defined in [Hen76].
New in version 0.3.6.
Initialize HenryEarly instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 3)
New in version 0.4.0.
-
_diph
= {'AI': 'E', 'AU': 'O', 'AY': 'E', 'EI': 'E', 'EU': 'U', 'OI': 'O', 'OU': 'O'}¶
-
_simple
= {'W': 'V', 'X': 'S', 'Z': 'S'}¶
-
_uc_c_set
= {'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Z'}¶
-
encode
(word)[source]¶ Calculate the early version of the Henry code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The early Henry code
- Return type
str
Examples
>>> pe = HenryEarly()
>>> pe.encode('Marchand')
'MRC'
>>> pe.encode('Beaulieu')
'BL'
>>> pe.encode('Beaumont')
'BM'
>>> pe.encode('Legrand')
'LGR'
>>> pe.encode('Pelletier')
'PLT'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.phonetic.
henry_early
(word, max_length=3)[source]¶ Calculate the early version of the Henry code for a word.
This is a wrapper for HenryEarly.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 3)
- Returns
The early Henry code
- Return type
str
Examples
>>> henry_early('Marchand')
'MRC'
>>> henry_early('Beaulieu')
'BL'
>>> henry_early('Beaumont')
'BM'
>>> henry_early('Legrand')
'LGR'
>>> henry_early('Pelletier')
'PLT'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the HenryEarly.encode method instead.
-
class
abydos.phonetic.
Koelner
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Kölner Phonetik.
Based on the algorithm defined by [Pos69].
New in version 0.3.6.
-
_num_set
= {'0', '1', '2', '3', '4', '5', '6', '7', '8'}¶
-
_num_trans
= {48: 'A', 49: 'P', 50: 'T', 51: 'F', 52: 'K', 53: 'L', 54: 'N', 55: 'R', 56: 'S'}¶
-
_to_alpha
(num)[source]¶ Convert a Kölner Phonetik code from numeric to alphabetic.
- Parameters
num (str or int) -- A numeric Kölner Phonetik representation
- Returns
An alphabetic representation of the same word
- Return type
str
Examples
>>> pe = Koelner()
>>> pe._to_alpha('862')
'SNT'
>>> pe._to_alpha('657')
'NLR'
>>> pe._to_alpha('86766')
'SNRNN'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
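The `_num_trans` table shown for this class maps the code points of the digits '0' to '8' onto letters, which is the shape `str.translate` expects. A minimal sketch of the numeric-to-alphabetic conversion under that assumption follows. `to_alpha` here is a standalone illustration, not the library's `_to_alpha`, and it does no input filtering.

```python
# Translation table copied from Koelner._num_trans: code points of the
# digits '0'..'8' mapped to their alphabetic stand-ins.
NUM_TRANS = {48: 'A', 49: 'P', 50: 'T', 51: 'F', 52: 'K',
             53: 'L', 54: 'N', 55: 'R', 56: 'S'}


def to_alpha(num):
    """Map a numeric Kölner Phonetik code to letters (illustrative sketch)."""
    # str() accepts both documented input types, str and int.
    return str(num).translate(NUM_TRANS)


print(to_alpha('862'))   # SNT
print(to_alpha('657'))   # NLR
print(to_alpha(86766))   # SNRNN
```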
-
_uc_v_set
= {'A', 'E', 'I', 'J', 'O', 'U', 'Y'}¶
-
encode
(word)[source]¶ Return the Kölner Phonetik (numeric output) code for a word.
While the output code is numeric, it is still a str because 0s can lead the code.
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as a numeric string
- Return type
str
Example
>>> pe = Koelner()
>>> pe.encode('Christopher')
'478237'
>>> pe.encode('Niall')
'65'
>>> pe.encode('Smith')
'862'
>>> pe.encode('Schmidt')
'862'
>>> pe.encode('Müller')
'657'
>>> pe.encode('Zimmermann')
'86766'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
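The note that the numeric code is kept as a str because 0s can lead it can be illustrated without the library. The code value below is hypothetical and chosen only to show what an integer round trip would lose.

```python
# Hypothetical numeric code with a leading zero (illustration only, not
# the documented output for any particular word).
code = '0127'

as_int = int(code)
print(as_int)                # 127
print(str(as_int) == code)   # False: the leading zero is gone

# Restoring the zero requires knowing the original length, which an int
# does not carry.
print(str(as_int).zfill(len(code)) == code)  # True
```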
-
encode_alpha
(word)[source]¶ Return the Kölner Phonetik (alphabetic output) code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as an alphabetic string
- Return type
str
Examples
>>> pe = Koelner()
>>> pe.encode_alpha('Smith')
'SNT'
>>> pe.encode_alpha('Schmidt')
'SNT'
>>> pe.encode_alpha('Müller')
'NLR'
>>> pe.encode_alpha('Zimmermann')
'SNRNN'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
-
abydos.phonetic.
koelner_phonetik
(word)[source]¶ Return the Kölner Phonetik (numeric output) code for a word.
This is a wrapper for Koelner.encode().
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as a numeric string
- Return type
str
Example
>>> koelner_phonetik('Christopher')
'478237'
>>> koelner_phonetik('Niall')
'65'
>>> koelner_phonetik('Smith')
'862'
>>> koelner_phonetik('Schmidt')
'862'
>>> koelner_phonetik('Müller')
'657'
>>> koelner_phonetik('Zimmermann')
'86766'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Koelner.encode method instead.
-
abydos.phonetic.
koelner_phonetik_num_to_alpha
(num)[source]¶ Convert a Kölner Phonetik code from numeric to alphabetic.
This is a wrapper for Koelner._to_alpha().
- Parameters
num (str or int) -- A numeric Kölner Phonetik representation
- Returns
An alphabetic representation of the same word
- Return type
str
Examples
>>> koelner_phonetik_num_to_alpha('862')
'SNT'
>>> koelner_phonetik_num_to_alpha('657')
'NLR'
>>> koelner_phonetik_num_to_alpha('86766')
'SNRNN'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Koelner._to_alpha method instead.
-
abydos.phonetic.
koelner_phonetik_alpha
(word)[source]¶ Return the Kölner Phonetik (alphabetic output) code for a word.
This is a wrapper for Koelner.encode_alpha().
- Parameters
word (str) -- The word to transform
- Returns
The Kölner Phonetik value as an alphabetic string
- Return type
str
Examples
>>> koelner_phonetik_alpha('Smith')
'SNT'
>>> koelner_phonetik_alpha('Schmidt')
'SNT'
>>> koelner_phonetik_alpha('Müller')
'NLR'
>>> koelner_phonetik_alpha('Zimmermann')
'SNRNN'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Koelner.encode_alpha method instead.
-
class
abydos.phonetic.
Haase
(primary_only=False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Haase Phonetik.
Based on the algorithm described at [Pra15].
Based on the original [HH00].
New in version 0.3.6.
Initialize Haase instance.
- Parameters
primary_only (bool) -- If True, only the primary code is returned
New in version 0.4.0.
-
_alphabetic
= {49: 'P', 50: 'T', 51: 'F', 52: 'K', 53: 'L', 54: 'N', 55: 'R', 56: 'S', 57: 'A'}¶
-
_uc_v_set
= {'A', 'E', 'I', 'J', 'O', 'U', 'Y'}¶
-
encode
(word)[source]¶ Return the Haase Phonetik (numeric output) code for a word.
While the output code is numeric, it is nevertheless a str.
- Parameters
word (str) -- The word to transform
- Returns
The Haase Phonetik value(s) as a tuple of numeric strings
- Return type
tuple
Examples
>>> pe = Haase()
>>> pe.encode('Joachim')
('9496',)
>>> pe.encode('Christoph')
('4798293', '8798293')
>>> pe.encode('Jörg')
('974',)
>>> pe.encode('Smith')
('8692',)
>>> pe.encode('Schmidt')
('8692', '4692')
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic Haase Phonetik code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Haase Phonetik value
- Return type
tuple
Examples
>>> pe = Haase()
>>> pe.encode_alpha('Joachim')
('AKAN',)
>>> pe.encode_alpha('Christoph')
('KRASTAF', 'SRASTAF')
>>> pe.encode_alpha('Jörg')
('ARK',)
>>> pe.encode_alpha('Smith')
('SNAT',)
>>> pe.encode_alpha('Schmidt')
('SNAT', 'KNAT')
New in version 0.4.0.
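The documented numeric and alphabetic outputs line up digit for digit with the `_alphabetic` table, so the alphabetic tuple can be pictured as a per-code `str.translate`. The sketch below assumes exactly that relationship. `codes_to_alpha` is a hypothetical helper and is not taken from the library source.

```python
# Table copied from Haase._alphabetic: code points of '1'..'9' to letters.
ALPHABETIC = {49: 'P', 50: 'T', 51: 'F', 52: 'K', 53: 'L',
              54: 'N', 55: 'R', 56: 'S', 57: 'A'}


def codes_to_alpha(codes):
    """Translate each numeric code in a tuple to letters (hypothetical)."""
    return tuple(code.translate(ALPHABETIC) for code in codes)


print(codes_to_alpha(('9496',)))                # ('AKAN',)
print(codes_to_alpha(('4798293', '8798293')))   # ('KRASTAF', 'SRASTAF')
print(codes_to_alpha(('8692', '4692')))         # ('SNAT', 'KNAT')
```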
-
abydos.phonetic.
haase_phonetik
(word, primary_only=False)[source]¶ Return the Haase Phonetik code for a word.
This is a wrapper for Haase.encode().
- Parameters
word (str) -- The word to transform
primary_only (bool) -- If True, only the primary code is returned
- Returns
The Haase Phonetik value(s) as a tuple of numeric strings
- Return type
tuple
Examples
>>> haase_phonetik('Joachim')
('9496',)
>>> haase_phonetik('Christoph')
('4798293', '8798293')
>>> haase_phonetik('Jörg')
('974',)
>>> haase_phonetik('Smith')
('8692',)
>>> haase_phonetik('Schmidt')
('8692', '4692')
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Haase.encode method instead.
-
class
abydos.phonetic.
RethSchek
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Reth-Schek Phonetik.
This algorithm is proposed in [vonRethS77].
Since I couldn't secure a copy of that document (maybe I'll look for it next time I'm in Germany), this implementation is based on what I could glean from the implementations published by the German Record Linkage Center (www.record-linkage.de):
Rules that are unclear:
Should 'C' become 'G' or 'Z'? (PPRL has both, 'Z' rule blocked)
Should 'CC' become 'G'? (PPRL has blocked 'CK', which may be a typo)
Should 'TUI' -> 'ZUI' rule exist? (PPRL has rule, but I can't think of a German word with '-tui-' in it.)
Should we really change 'SCH' -> 'CH' and then 'CH' -> 'SCH'?
New in version 0.3.6.
-
_replacements
= {1: {'C': 'G', 'K': 'G', 'P': 'B', 'T': 'D', 'V': 'F', 'W': 'F', 'Y': 'I'}, 2: {'AA': 'A', 'AE': 'E', 'AH': 'A', 'AY': 'AI', 'BB': 'B', 'BP': 'B', 'CC': 'C', 'CK': 'G', 'DD': 'D', 'DT': 'D', 'EE': 'E', 'EH': 'E', 'EI': 'AI', 'EU': 'OI', 'EY': 'AI', 'FF': 'F', 'GG': 'G', 'GK': 'G', 'GS': 'X', 'IE': 'I', 'IH': 'I', 'KG': 'G', 'KK': 'K', 'KS': 'X', 'KW': 'QU', 'LL': 'L', 'MM': 'M', 'NN': 'N', 'OH': 'O', 'OO': 'O', 'PB': 'B', 'PH': 'F', 'PP': 'B', 'RR': 'R', 'SS': 'S', 'SZ': 'S', 'TH': 'D', 'TT': 'D', 'TZ': 'Z', 'UH': 'U'}, 3: {'AEH': 'E', 'AEU': 'OI', 'CHS': 'X', 'CKS': 'X', 'IEH': 'I', 'OEH': 'OE', 'SCH': 'CH', 'TIU': 'TIO', 'UEH': 'UE', 'ZIO': 'TIO', 'ZIU': 'TIO'}}¶
-
encode
(word)[source]¶ Return Reth-Schek Phonetik code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Reth-Schek Phonetik code
- Return type
str
Examples
>>> pe = RethSchek()
>>> pe.encode('Joachim')
'JOAGHIM'
>>> pe.encode('Christoph')
'GHRISDOF'
>>> pe.encode('Jörg')
'JOERG'
>>> pe.encode('Smith')
'SMID'
>>> pe.encode('Schmidt')
'SCHMID'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
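The `_replacements` table is keyed by substring length (3, 2, 1), which suggests a left-to-right scan that tries the longest match first. The sketch below works under that assumption and adds the final 'CH' -> 'SCH' step mentioned in the class notes. The scan order, the umlaut expansion (inferred from 'Jörg' -> 'JOERG'), and the absence of any further post-processing are all assumptions, so this should be read as an illustration and not as a copy of `RethSchek.encode`.

```python
# Replacement tables copied from RethSchek._replacements, keyed by the
# length of the substring they match.
REPLACEMENTS = {
    1: {'C': 'G', 'K': 'G', 'P': 'B', 'T': 'D', 'V': 'F', 'W': 'F',
        'Y': 'I'},
    2: {'AA': 'A', 'AE': 'E', 'AH': 'A', 'AY': 'AI', 'BB': 'B', 'BP': 'B',
        'CC': 'C', 'CK': 'G', 'DD': 'D', 'DT': 'D', 'EE': 'E', 'EH': 'E',
        'EI': 'AI', 'EU': 'OI', 'EY': 'AI', 'FF': 'F', 'GG': 'G',
        'GK': 'G', 'GS': 'X', 'IE': 'I', 'IH': 'I', 'KG': 'G', 'KK': 'K',
        'KS': 'X', 'KW': 'QU', 'LL': 'L', 'MM': 'M', 'NN': 'N', 'OH': 'O',
        'OO': 'O', 'PB': 'B', 'PH': 'F', 'PP': 'B', 'RR': 'R', 'SS': 'S',
        'SZ': 'S', 'TH': 'D', 'TT': 'D', 'TZ': 'Z', 'UH': 'U'},
    3: {'AEH': 'E', 'AEU': 'OI', 'CHS': 'X', 'CKS': 'X', 'IEH': 'I',
        'OEH': 'OE', 'SCH': 'CH', 'TIU': 'TIO', 'UEH': 'UE', 'ZIO': 'TIO',
        'ZIU': 'TIO'},
}


def reth_schek_sketch(word):
    """Longest-match-first table lookup (assumed scan order)."""
    word = word.upper()
    # Assumed umlaut expansion, inferred from 'Jörg' -> 'JOERG'.
    for src, tar in (('Ä', 'AE'), ('Ö', 'OE'), ('Ü', 'UE')):
        word = word.replace(src, tar)

    pos = 0
    while pos < len(word):
        # Try three-, then two-, then one-character matches at this position.
        for num in (3, 2, 1):
            chunk = word[pos:pos + num]
            if chunk in REPLACEMENTS[num]:
                word = word[:pos] + REPLACEMENTS[num][chunk] + word[pos + num:]
                break
        pos += 1

    # The class notes describe turning 'CH' (back) into 'SCH' at the end.
    return word.replace('CH', 'SCH')


for name in ('Joachim', 'Christoph', 'Jörg', 'Smith', 'Schmidt'):
    print(name, reth_schek_sketch(name))
```

Under these assumptions the sketch reproduces the five documented outputs. Names that depend on rules not visible in the table may differ from the library.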
-
abydos.phonetic.
reth_schek_phonetik
(word)[source]¶ Return Reth-Schek Phonetik code for a word.
This is a wrapper for RethSchek.encode().
- Parameters
word (str) -- The word to transform
- Returns
The Reth-Schek Phonetik code
- Return type
str
Examples
>>> reth_schek_phonetik('Joachim')
'JOAGHIM'
>>> reth_schek_phonetik('Christoph')
'GHRISDOF'
>>> reth_schek_phonetik('Jörg')
'JOERG'
>>> reth_schek_phonetik('Smith')
'SMID'
>>> reth_schek_phonetik('Schmidt')
'SCHMID'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RethSchek.encode method instead.
-
class
abydos.phonetic.
Phonem
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Phonem.
Phonem is defined in [GM88].
This version is based on the Perl implementation documented at [Wil05]. It includes some enhancements presented in the Java port at [dcm4che].
Phonem is intended chiefly for German names/words.
New in version 0.3.6.
-
_substitutions
= (('SC', 'C'), ('SZ', 'C'), ('CZ', 'C'), ('TZ', 'C'), ('TS', 'C'), ('KS', 'X'), ('PF', 'V'), ('QU', 'KW'), ('PH', 'V'), ('UE', 'Y'), ('AE', 'E'), ('OE', 'Ö'), ('EI', 'AY'), ('EY', 'AY'), ('EU', 'OY'), ('AU', 'A§'), ('OU', '§'))¶
-
_trans
= {70: 'V', 71: 'C', 73: 'Y', 74: 'Y', 75: 'C', 80: 'B', 81: 'C', 84: 'D', 87: 'V', 90: 'C', 167: 'U', 192: 'A', 193: 'A', 194: 'A', 195: 'A', 196: 'E', 197: 'A', 198: 'E', 199: 'C', 200: 'E', 201: 'E', 202: 'E', 203: 'E', 204: 'Y', 205: 'Y', 206: 'Y', 207: 'Y', 209: 'N', 210: 'O', 211: 'O', 212: 'O', 213: 'O', 216: 'Ö', 217: 'U', 218: 'U', 219: 'U', 220: 'Y', 221: 'Y', 223: 'S'}¶
-
_uc_set
= {'A', 'B', 'C', 'D', 'L', 'M', 'N', 'O', 'R', 'S', 'U', 'V', 'W', 'X', 'Y', 'Ö'}¶
-
encode
(word)[source]¶ Return the Phonem code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The Phonem value
- Return type
str
Examples
>>> pe = Phonem()
>>> pe.encode('Christopher')
'CRYSDOVR'
>>> pe.encode('Niall')
'NYAL'
>>> pe.encode('Smith')
'SMYD'
>>> pe.encode('Schmidt')
'CMYD'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
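The three class attributes above (`_substitutions`, `_trans`, `_uc_set`) suggest a pipeline of ordered digraph substitutions, a per-character translation, and a final filter to the allowed alphabet. The sketch below wires them together in that order and collapses repeated characters in between. The step ordering and the use of `itertools.groupby` for collapsing repeats are assumptions, and `phonem_sketch` is a hypothetical name, not the library's method.

```python
from itertools import groupby
from unicodedata import normalize

# Tables copied from Phonem._substitutions, Phonem._trans and Phonem._uc_set.
SUBSTITUTIONS = (
    ('SC', 'C'), ('SZ', 'C'), ('CZ', 'C'), ('TZ', 'C'), ('TS', 'C'),
    ('KS', 'X'), ('PF', 'V'), ('QU', 'KW'), ('PH', 'V'), ('UE', 'Y'),
    ('AE', 'E'), ('OE', 'Ö'), ('EI', 'AY'), ('EY', 'AY'), ('EU', 'OY'),
    ('AU', 'A§'), ('OU', '§'),
)
TRANS = {
    70: 'V', 71: 'C', 73: 'Y', 74: 'Y', 75: 'C', 80: 'B', 81: 'C',
    84: 'D', 87: 'V', 90: 'C', 167: 'U', 192: 'A', 193: 'A', 194: 'A',
    195: 'A', 196: 'E', 197: 'A', 198: 'E', 199: 'C', 200: 'E', 201: 'E',
    202: 'E', 203: 'E', 204: 'Y', 205: 'Y', 206: 'Y', 207: 'Y', 209: 'N',
    210: 'O', 211: 'O', 212: 'O', 213: 'O', 216: 'Ö', 217: 'U', 218: 'U',
    219: 'U', 220: 'Y', 221: 'Y', 223: 'S',
}
UC_SET = set('ABCDLMNORSUVWXYÖ')


def phonem_sketch(word):
    """Substitute, translate, collapse repeats, then filter (assumed order)."""
    word = normalize('NFC', word.upper())
    for src, tar in SUBSTITUTIONS:
        word = word.replace(src, tar)
    word = word.translate(TRANS)
    # Collapse runs of the same character to a single occurrence.
    deduped = ''.join(ch for ch, _ in groupby(word))
    # Keep only characters that belong to the output alphabet.
    return ''.join(ch for ch in deduped if ch in UC_SET)


for name in ('Christopher', 'Niall', 'Smith', 'Schmidt'):
    print(name, phonem_sketch(name))
```

With the tables as listed, this ordering reproduces the four documented outputs ('CRYSDOVR', 'NYAL', 'SMYD', 'CMYD').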
-
-
abydos.phonetic.
phonem
(word)[source]¶ Return the Phonem code for a word.
This is a wrapper for Phonem.encode().
- Parameters
word (str) -- The word to transform
- Returns
The Phonem value
- Return type
str
Examples
>>> phonem('Christopher')
'CRYSDOVR'
>>> phonem('Niall')
'NYAL'
>>> phonem('Smith')
'SMYD'
>>> phonem('Schmidt')
'CMYD'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonem.encode method instead.
-
class
abydos.phonetic.
Phonet
(mode=1, lang='de')[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Phonet code.
phonet ("Hannoveraner Phonetik") was developed by Jörg Michael and documented in [Mic99].
This is a port of Jesper Zedlitz's code, which is licensed LGPL [Zed15].
That is, in turn, based on Michael's C code, which is also licensed LGPL [Mic07].
New in version 0.3.6.
Initialize Phonet instance.
- Parameters
mode (int) -- The phonet variant to employ (1 or 2)
lang (str) -- de (default) for German, none for no language
New in version 0.4.0.
-
_rules_german
= ('´', ' ', ' ', '"', ' ', ' ', '`$', '', '', "'", ' ', ' ', ',', ' ', ' ', ';', ' ', ' ', '-', ' ', ' ', ' ', ' ', ' ', '.', '.', '.', ':', '.', '.', 'ÄE', 'E', 'E', 'ÄU<', 'EU', 'EU', 'ÄV(AEOU)-<', 'EW', None, 'Ä$', 'Ä', None, 'Ä<', None, 'E', 'Ä', 'E', None, 'ÖE', 'Ö', 'Ö', 'ÖU', 'Ö', 'Ö', 'ÖVER--<', 'ÖW', None, 'ÖV(AOU)-', 'ÖW', None, 'ÜBEL(GNRW)-^^', 'ÜBL ', 'IBL ', 'ÜBER^^', 'ÜBA', 'IBA', 'ÜE', 'Ü', 'I', 'ÜVER--<', 'ÜW', None, 'ÜV(AOU)-', 'ÜW', None, 'Ü', None, 'I', 'ßCH<', None, 'Z', 'ß<', 'S', 'Z', 'À<', 'A', 'A', 'Á<', 'A', 'A', 'Â<', 'A', 'A', 'Ã<', 'A', 'A', 'Å<', 'A', 'A', 'ÆER-', 'E', 'E', 'ÆU<', 'EU', 'EU', 'ÆV(AEOU)-<', 'EW', None, 'Æ$', 'Ä', None, 'Æ<', None, 'E', 'Æ', 'E', None, 'Ç', 'Z', 'Z', 'ÐÐ-', '', '', 'Ð', 'DI', 'TI', 'È<', 'E', 'E', 'É<', 'E', 'E', 'Ê<', 'E', 'E', 'Ë', 'E', 'E', 'Ì<', 'I', 'I', 'Í<', 'I', 'I', 'Î<', 'I', 'I', 'Ï', 'I', 'I', 'ÑÑ-', '', '', 'Ñ', 'NI', 'NI', 'Ò<', 'O', 'U', 'Ó<', 'O', 'U', 'Ô<', 'O', 'U', 'Õ<', 'O', 'U', 'Œ<', 'Ö', 'Ö', 'Ø(IJY)-<', 'E', 'E', 'Ø<', 'Ö', 'Ö', 'Š', 'SH', 'Z', 'Þ', 'T', 'T', 'Ù<', 'U', 'U', 'Ú<', 'U', 'U', 'Û<', 'U', 'U', 'Ý<', 'I', 'I', 'Ÿ<', 'I', 'I', 'ABELLE$', 'ABL', 'ABL', 'ABELL$', 'ABL', 'ABL', 'ABIENNE$', 'ABIN', 'ABIN', 'ACHME---^', 'ACH', 'AK', 'ACEY$', 'AZI', 'AZI', 'ADV', 'ATW', None, 'AEGL-', 'EK', None, 'AEU<', 'EU', 'EU', 'AE2', 'E', 'E', 'AFTRAUBEN------', 'AFT ', 'AFT ', 'AGL-1', 'AK', None, 'AGNI-^', 'AKN', 'AKN', 'AGNIE-', 'ANI', 'ANI', 'AGN(AEOU)-$', 'ANI', 'ANI', 'AH(AIOÖUÜY)-', 'AH', None, 'AIA2', 'AIA', 'AIA', 'AIE$', 'E', 'E', 'AILL(EOU)-', 'ALI', 'ALI', 'AINE$', 'EN', 'EN', 'AIRE$', 'ER', 'ER', 'AIR-', 'E', 'E', 'AISE$', 'ES', 'EZ', 'AISSANCE$', 'ESANS', 'EZANZ', 'AISSE$', 'ES', 'EZ', 'AIX$', 'EX', 'EX', 'AJ(AÄEÈÉÊIOÖUÜ)--', 'A', 'A', 'AKTIE', 'AXIE', 'AXIE', 'AKTUEL', 'AKTUEL', None, 'ALOI^', 'ALOI', 'ALUI', 'ALOY^', 'ALOI', 'ALUI', 'AMATEU(RS)-', 'AMATÖ', 'ANATÖ', 'ANCH(OEI)-', 'ANSH', 'ANZ', 'ANDERGEGANG----', 'ANDA GE', 'ANTA KE', 'ANDERGEHE----', 'ANDA ', 'ANTA ', 
'ANDERGESETZ----', 'ANDA GE', 'ANTA KE', 'ANDERGING----', 'ANDA ', 'ANTA ', 'ANDERSETZ(ET)-----', 'ANDA ', 'ANTA ', 'ANDERZUGEHE----', 'ANDA ZU ', 'ANTA ZU ', 'ANDERZUSETZE-----', 'ANDA ZU ', 'ANTA ZU ', 'ANER(BKO)---^^', 'AN', None, 'ANHAND---^$', 'AN H', 'AN ', 'ANH(AÄEIOÖUÜY)--^^', 'AN', None, 'ANIELLE$', 'ANIEL', 'ANIL', 'ANIEL', 'ANIEL', None, 'ANSTELLE----^$', 'AN ST', 'AN ZT', 'ANTI^^', 'ANTI', 'ANTI', 'ANVER^^', 'ANFA', 'ANFA', 'ATIA$', 'ATIA', 'ATIA', 'ATIA(NS)--', 'ATI', 'ATI', 'ATI(AÄOÖUÜ)-', 'AZI', 'AZI', 'AUAU--', '', '', 'AUERE$', 'AUERE', None, 'AUERE(NS)-$', 'AUERE', None, 'AUERE(AIOUY)--', 'AUER', None, 'AUER(AÄIOÖUÜY)-', 'AUER', None, 'AUER<', 'AUA', 'AUA', 'AUF^^', 'AUF', 'AUF', 'AULT$', 'O', 'U', 'AUR(BCDFGKLMNQSTVWZ)-', 'AUA', 'AUA', 'AUR$', 'AUA', 'AUA', 'AUSSE$', 'OS', 'UZ', 'AUS(ST)-^', 'AUS', 'AUS', 'AUS^^', 'AUS', 'AUS', 'AUTOFAHR----', 'AUTO ', 'AUTU ', 'AUTO^^', 'AUTO', 'AUTU', 'AUX(IY)-', 'AUX', 'AUX', 'AUX', 'O', 'U', 'AU', 'AU', 'AU', 'AVER--<', 'AW', None, 'AVIER$', 'AWIE', 'AFIE', 'AV(EÈÉÊI)-^', 'AW', None, 'AV(AOU)-', 'AW', None, 'AYRE$', 'EIRE', 'EIRE', 'AYRE(NS)-$', 'EIRE', 'EIRE', 'AYRE(AIOUY)--', 'EIR', 'EIR', 'AYR(AÄIOÖUÜY)-', 'EIR', 'EIR', 'AYR<', 'EIA', 'EIA', 'AYER--<', 'EI', 'EI', 'AY(AÄEIOÖUÜY)--', 'A', 'A', 'AË', 'E', 'E', 'A(IJY)<', 'EI', 'EI', 'BABY^$', 'BEBI', 'BEBI', 'BAB(IY)^', 'BEBI', 'BEBI', 'BEAU^$', 'BO', None, 'BEA(BCMNRU)-^', 'BEA', 'BEA', 'BEAT(AEIMORU)-^', 'BEAT', 'BEAT', 'BEE$', 'BI', 'BI', 'BEIGE^$', 'BESH', 'BEZ', 'BENOIT--', 'BENO', 'BENU', 'BER(DT)-', 'BER', None, 'BERN(DT)-', 'BERN', None, 'BE(LMNRST)-^', 'BE', 'BE', 'BETTE$', 'BET', 'BET', 'BEVOR^$', 'BEFOR', None, 'BIC$', 'BIZ', 'BIZ', 'BOWL(EI)-', 'BOL', 'BUL', 'BP(AÄEÈÉÊIÌÍÎOÖRUÜY)-', 'B', 'B', 'BRINGEND-----^', 'BRI', 'BRI', 'BRINGEND-----', ' BRI', ' BRI', 'BROW(NS)-', 'BRAU', 'BRAU', 'BUDGET7', 'BÜGE', 'BIKE', 'BUFFET7', 'BÜFE', 'BIFE', 'BYLLE$', 'BILE', 'BILE', 'BYLL$', 'BIL', 'BIL', 'BYPA--^', 'BEI', 'BEI', 'BYTE<', 'BEIT', 'BEIT', 'BY9^', 
'BÜ', None, 'B(SßZ)$', 'BS', None, 'CACH(EI)-^', 'KESH', 'KEZ', 'CAE--', 'Z', 'Z', 'CA(IY)$', 'ZEI', 'ZEI', 'CE(EIJUY)--', 'Z', 'Z', 'CENT<', 'ZENT', 'ZENT', 'CERST(EI)----^', 'KE', 'KE', 'CER$', 'ZA', 'ZA', 'CE3', 'ZE', 'ZE', "CH'S$", 'X', 'X', 'CH´S$', 'X', 'X', 'CHAO(ST)-', 'KAO', 'KAU', 'CHAMPIO-^', 'SHEMPI', 'ZENBI', 'CHAR(AI)-^', 'KAR', 'KAR', 'CHAU(CDFSVWXZ)-', 'SHO', 'ZU', 'CHÄ(CF)-', 'SHE', 'ZE', 'CHE(CF)-', 'SHE', 'ZE', 'CHEM-^', 'KE', 'KE', 'CHEQUE<', 'SHEK', 'ZEK', 'CHI(CFGPVW)-', 'SHI', 'ZI', 'CH(AEUY)-<^', 'SH', 'Z', 'CHK-', '', '', 'CHO(CKPS)-^', 'SHO', 'ZU', 'CHRIS-', 'KRI', None, 'CHRO-', 'KR', None, 'CH(LOR)-<^', 'K', 'K', 'CHST-', 'X', 'X', 'CH(SßXZ)3', 'X', 'X', 'CHTNI-3', 'CHN', 'KN', 'CH^', 'K', 'K', 'CH', 'CH', 'K', 'CIC$', 'ZIZ', 'ZIZ', 'CIENCEFICT----', 'EIENS ', 'EIENZ ', 'CIENCE$', 'EIENS', 'EIENZ', 'CIER$', 'ZIE', 'ZIE', 'CYB-^', 'ZEI', 'ZEI', 'CY9^', 'ZÜ', 'ZI', 'C(IJY)-<3', 'Z', 'Z', 'CLOWN-', 'KLAU', 'KLAU', 'CCH', 'Z', 'Z', 'CCE-', 'X', 'X', 'C(CK)-', '', '', 'CLAUDET---', 'KLO', 'KLU', 'CLAUDINE^$', 'KLODIN', 'KLUTIN', 'COACH', 'KOSH', 'KUZ', 'COLE$', 'KOL', 'KUL', 'COUCH', 'KAUSH', 'KAUZ', 'COW', 'KAU', 'KAU', 'CQUES$', 'K', 'K', 'CQUE', 'K', 'K', 'CRASH--9', 'KRE', 'KRE', 'CREAT-^', 'KREA', 'KREA', 'CST', 'XT', 'XT', 'CS<^', 'Z', 'Z', 'C(SßX)', 'X', 'X', "CT'S$", 'X', 'X', 'CT(SßXZ)', 'X', 'X', 'CZ<', 'Z', 'Z', 'C(ÈÉÊÌÍÎÝ)3', 'Z', 'Z', 'C.^', 'C.', 'C.', 'CÄ-', 'Z', 'Z', 'CÜ$', 'ZÜ', 'ZI', "C'S$", 'X', 'X', 'C<', 'K', 'K', 'DAHER^$', 'DAHER', None, 'DARAUFFOLGE-----', 'DARAUF ', 'TARAUF ', 'DAVO(NR)-^$', 'DAFO', 'TAFU', 'DD(SZ)--<', '', '', 'DD9', 'D', None, 'DEPOT7', 'DEPO', 'TEBU', 'DESIGN', 'DISEIN', 'TIZEIN', 'DE(LMNRST)-3^', 'DE', 'TE', 'DETTE$', 'DET', 'TET', 'DH$', 'T', None, 'DIC$', 'DIZ', 'TIZ', 'DIDR-^', 'DIT', None, 'DIEDR-^', 'DIT', None, 'DJ(AEIOU)-^', 'I', 'I', 'DMITR-^', 'DIMIT', 'TINIT', 'DRY9^', 'DRÜ', None, 'DT-', '', '', 'DUIS-^', 'DÜ', 'TI', 'DURCH^^', 'DURCH', 'TURK', 'DVA$', 'TWA', None, 'DY9^', 'DÜ', None, 
'DYS$', 'DIS', None, 'DS(CH)--<', 'T', 'T', 'DST', 'ZT', 'ZT', 'DZS(CH)--', 'T', 'T', 'D(SßZ)', 'Z', 'Z', 'D(AÄEIOÖRUÜY)-', 'D', None, 'D(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'D', None, "D'H^", 'D', 'T', 'D´H^', 'D', 'T', 'D`H^', 'D', 'T', "D'S3$", 'Z', 'Z', 'D´S3$', 'Z', 'Z', 'D^', 'D', None, 'D', 'T', 'T', 'EAULT$', 'O', 'U', 'EAUX$', 'O', 'U', 'EAU', 'O', 'U', 'EAV', 'IW', 'IF', 'EAS3$', 'EAS', None, 'EA(AÄEIOÖÜY)-3', 'EA', 'EA', 'EA3$', 'EA', 'EA', 'EA3', 'I', 'I', 'EBENSO^$', 'EBNSO', 'EBNZU', 'EBENSO^^', 'EBNSO ', 'EBNZU ', 'EBEN^^', 'EBN', 'EBN', 'EE9', 'E', 'E', 'EGL-1', 'EK', None, 'EHE(IUY)--1', 'EH', None, 'EHUNG---1', 'E', None, 'EH(AÄIOÖUÜY)-1', 'EH', None, 'EIEI--', '', '', 'EIERE^$', 'EIERE', None, 'EIERE$', 'EIERE', None, 'EIERE(NS)-$', 'EIERE', None, 'EIERE(AIOUY)--', 'EIER', None, 'EIER(AÄIOÖUÜY)-', 'EIER', None, 'EIER<', 'EIA', None, 'EIGL-1', 'EIK', None, 'EIGH$', 'EI', 'EI', 'EIH--', 'E', 'E', 'EILLE$', 'EI', 'EI', 'EIR(BCDFGKLMNQSTVWZ)-', 'EIA', 'EIA', 'EIR$', 'EIA', 'EIA', 'EITRAUBEN------', 'EIT ', 'EIT ', 'EI', 'EI', 'EI', 'EJ$', 'EI', 'EI', 'ELIZ^', 'ELIS', None, 'ELZ^', 'ELS', None, 'EL-^', 'E', 'E', 'ELANG----1', 'E', 'E', 'EL(DKL)--1', 'E', 'E', 'EL(MNT)--1$', 'E', 'E', 'ELYNE$', 'ELINE', 'ELINE', 'ELYN$', 'ELIN', 'ELIN', 'EL(AÄEÈÉÊIÌÍÎOÖUÜY)-1', 'EL', 'EL', 'EL-1', 'L', 'L', 'EM-^', None, 'E', 'EM(DFKMPQT)--1', None, 'E', 'EM(AÄEÈÉÊIÌÍÎOÖUÜY)--1', None, 'E', 'EM-1', None, 'N', 'ENGAG-^', 'ANGA', 'ANKA', 'EN-^', 'E', 'E', 'ENTUEL', 'ENTUEL', None, 'EN(CDGKQSTZ)--1', 'E', 'E', 'EN(AÄEÈÉÊIÌÍÎNOÖUÜY)-1', 'EN', 'EN', 'EN-1', '', '', 'ERH(AÄEIOÖUÜ)-^', 'ERH', 'ER', 'ER-^', 'E', 'E', 'ERREGEND-----', ' ER', ' ER', 'ERT1$', 'AT', None, 'ER(DGLKMNRQTZß)-1', 'ER', None, 'ER(AÄEÈÉÊIÌÍÎOÖUÜY)-1', 'ER', 'A', 'ER1$', 'A', 'A', 'ER<1', 'A', 'A', 'ETAT7', 'ETA', 'ETA', 'ETI(AÄOÖÜU)-', 'EZI', 'EZI', 'EUERE$', 'EUERE', None, 'EUERE(NS)-$', 'EUERE', None, 'EUERE(AIOUY)--', 'EUER', None, 'EUER(AÄIOÖUÜY)-', 'EUER', None, 'EUER<', 'EUA', None, 'EUEU--', '', '', 'EUILLE$', 
'Ö', 'Ö', 'EUR$', 'ÖR', 'ÖR', 'EUX', 'Ö', 'Ö', 'EUSZ$', 'EUS', None, 'EUTZ$', 'EUS', None, 'EUYS$', 'EUS', 'EUZ', 'EUZ$', 'EUS', None, 'EU', 'EU', 'EU', 'EVER--<1', 'EW', None, 'EV(ÄOÖUÜ)-1', 'EW', None, 'EYER<', 'EIA', 'EIA', 'EY<', 'EI', 'EI', 'FACETTE', 'FASET', 'FAZET', 'FANS--^$', 'FE', 'FE', 'FAN-^$', 'FE', 'FE', 'FAULT-', 'FOL', 'FUL', 'FEE(DL)-', 'FI', 'FI', 'FEHLER', 'FELA', 'FELA', 'FE(LMNRST)-3^', 'FE', 'FE', 'FOERDERN---^', 'FÖRD', 'FÖRT', 'FOERDERN---', ' FÖRD', ' FÖRT', 'FOND7', 'FON', 'FUN', 'FRAIN$', 'FRA', 'FRA', 'FRISEU(RS)-', 'FRISÖ', 'FRIZÖ', 'FY9^', 'FÜ', None, 'FÖRDERN---^', 'FÖRD', 'FÖRT', 'FÖRDERN---', ' FÖRD', ' FÖRT', 'GAGS^$', 'GEX', 'KEX', 'GAG^$', 'GEK', 'KEK', 'GD', 'KT', 'KT', 'GEGEN^^', 'GEGN', 'KEKN', 'GEGENGEKOM-----', 'GEGN ', 'KEKN ', 'GEGENGESET-----', 'GEGN ', 'KEKN ', 'GEGENKOMME-----', 'GEGN ', 'KEKN ', 'GEGENZUKOM---', 'GEGN ZU ', 'KEKN ZU ', 'GENDETWAS-----$', 'GENT ', 'KENT ', 'GENRE', 'IORE', 'IURE', 'GE(LMNRST)-3^', 'GE', 'KE', 'GER(DKT)-', 'GER', None, 'GETTE$', 'GET', 'KET', 'GGF.', 'GF.', None, 'GG-', '', '', 'GH', 'G', None, 'GI(AOU)-^', 'I', 'I', 'GION-3', 'KIO', 'KIU', 'G(CK)-', '', '', 'GJ(AEIOU)-^', 'I', 'I', 'GMBH^$', 'GMBH', 'GMBH', 'GNAC$', 'NIAK', 'NIAK', 'GNON$', 'NION', 'NIUN', 'GN$', 'N', 'N', 'GONCAL-^', 'GONZA', 'KUNZA', 'GRY9^', 'GRÜ', None, 'G(SßXZ)-<', 'K', 'K', 'GUCK-', 'KU', 'KU', 'GUISEP-^', 'IUSE', 'IUZE', 'GUI-^', 'G', 'K', 'GUTAUSSEH------^', 'GUT ', 'KUT ', 'GUTGEHEND------^', 'GUT ', 'KUT ', 'GY9^', 'GÜ', None, 'G(AÄEILOÖRUÜY)-', 'G', None, 'G(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'G', None, "G'S$", 'X', 'X', 'G´S$', 'X', 'X', 'G^', 'G', None, 'G', 'K', 'K', 'HA(HIUY)--1', 'H', None, 'HANDVOL---^', 'HANT ', 'ANT ', 'HANNOVE-^', 'HANOF', None, 'HAVEN7$', 'HAFN', None, 'HEAD-', 'HE', 'E', 'HELIEGEN------', 'E ', 'E ', 'HESTEHEN------', 'E ', 'E ', 'HE(LMNRST)-3^', 'HE', 'E', 'HE(LMN)-1', 'E', 'E', 'HEUR1$', 'ÖR', 'ÖR', 'HE(HIUY)--1', 'H', None, 'HIH(AÄEIOÖUÜY)-1', 'IH', None, 'HLH(AÄEIOÖUÜY)-1', 'LH', None, 
'HMH(AÄEIOÖUÜY)-1', 'MH', None, 'HNH(AÄEIOÖUÜY)-1', 'NH', None, 'HOBBY9^', 'HOBI', None, 'HOCHBEGAB-----^', 'HOCH ', 'UK ', 'HOCHTALEN-----^', 'HOCH ', 'UK ', 'HOCHZUFRI-----^', 'HOCH ', 'UK ', 'HO(HIY)--1', 'H', None, 'HRH(AÄEIOÖUÜY)-1', 'RH', None, 'HUH(AÄEIOÖUÜY)-1', 'UH', None, 'HUIS^^', 'HÜS', 'IZ', 'HUIS$', 'ÜS', 'IZ', 'HUI--1', 'H', None, 'HYGIEN^', 'HÜKIEN', None, 'HY9^', 'HÜ', None, 'HY(BDGMNPST)-', 'Ü', None, 'H.^', None, 'H.', 'HÄU--1', 'H', None, 'H^', 'H', '', 'H', '', '', 'ICHELL---', 'ISH', 'IZ', 'ICHI$', 'ISHI', 'IZI', 'IEC$', 'IZ', 'IZ', 'IEDENSTELLE------', 'IDN ', 'ITN ', 'IEI-3', '', '', 'IELL3', 'IEL', 'IEL', 'IENNE$', 'IN', 'IN', 'IERRE$', 'IER', 'IER', 'IERZULAN---', 'IR ZU ', 'IR ZU ', 'IETTE$', 'IT', 'IT', 'IEU', 'IÖ', 'IÖ', 'IE<4', 'I', 'I', 'IGL-1', 'IK', None, 'IGHT3$', 'EIT', 'EIT', 'IGNI(EO)-', 'INI', 'INI', 'IGN(AEOU)-$', 'INI', 'INI', 'IHER(DGLKRT)--1', 'IHE', None, 'IHE(IUY)--', 'IH', None, 'IH(AIOÖUÜY)-', 'IH', None, 'IJ(AOU)-', 'I', 'I', 'IJ$', 'I', 'I', 'IJ<', 'EI', 'EI', 'IKOLE$', 'IKOL', 'IKUL', 'ILLAN(STZ)--4', 'ILIA', 'ILIA', 'ILLAR(DT)--4', 'ILIA', 'ILIA', 'IMSTAN----^', 'IM ', 'IN ', 'INDELERREGE------', 'INDL ', 'INTL ', 'INFRAGE-----^$', 'IN ', 'IN ', 'INTERN(AOU)-^', 'INTAN', 'INTAN', 'INVER-', 'INWE', 'INFE', 'ITI(AÄIOÖUÜ)-', 'IZI', 'IZI', 'IUSZ$', 'IUS', None, 'IUTZ$', 'IUS', None, 'IUZ$', 'IUS', None, 'IVER--<', 'IW', None, 'IVIER$', 'IWIE', 'IFIE', 'IV(ÄOÖUÜ)-', 'IW', None, 'IV<3', 'IW', None, 'IY2', 'I', None, 'I(ÈÉÊ)<4', 'I', 'I', 'JAVIE---<^', 'ZA', 'ZA', 'JEANS^$', 'JINS', 'INZ', 'JEANNE^$', 'IAN', 'IAN', 'JEAN-^', 'IA', 'IA', 'JER-^', 'IE', 'IE', 'JE(LMNST)-', 'IE', 'IE', 'JI^', 'JI', None, 'JOR(GK)^$', 'IÖRK', 'IÖRK', 'J', 'I', 'I', 'KC(ÄEIJ)-', 'X', 'X', 'KD', 'KT', None, 'KE(LMNRST)-3^', 'KE', 'KE', 'KG(AÄEILOÖRUÜY)-', 'K', None, 'KH<^', 'K', 'K', 'KIC$', 'KIZ', 'KIZ', 'KLE(LMNRST)-3^', 'KLE', 'KLE', 'KOTELE-^', 'KOTL', 'KUTL', 'KREAT-^', 'KREA', 'KREA', 'KRÜS(TZ)--^', 'KRI', None, 'KRYS(TZ)--^', 'KRI', None, 
'KRY9^', 'KRÜ', None, 'KSCH---', 'K', 'K', 'KSH--', 'K', 'K', 'K(SßXZ)7', 'X', 'X', "KT'S$", 'X', 'X', 'KTI(AIOU)-3', 'XI', 'XI', 'KT(SßXZ)', 'X', 'X', 'KY9^', 'KÜ', None, "K'S$", 'X', 'X', 'K´S$', 'X', 'X', 'LANGES$', ' LANGES', ' LANKEZ', 'LANGE$', ' LANGE', ' LANKE', 'LANG$', ' LANK', ' LANK', 'LARVE-', 'LARF', 'LARF', 'LD(SßZ)$', 'LS', 'LZ', "LD'S$", 'LS', 'LZ', 'LD´S$', 'LS', 'LZ', 'LEAND-^', 'LEAN', 'LEAN', 'LEERSTEHE-----^', 'LER ', 'LER ', 'LEICHBLEIB-----', 'LEICH ', 'LEIK ', 'LEICHLAUTE-----', 'LEICH ', 'LEIK ', 'LEIDERREGE------', 'LEIT ', 'LEIT ', 'LEIDGEPR----^', 'LEIT ', 'LEIT ', 'LEINSTEHE-----', 'LEIN ', 'LEIN ', 'LEL-', 'LE', 'LE', 'LE(MNRST)-3^', 'LE', 'LE', 'LETTE$', 'LET', 'LET', 'LFGNAG-', 'LFGAN', 'LFKAN', 'LICHERWEIS----', 'LICHA ', 'LIKA ', 'LIC$', 'LIZ', 'LIZ', 'LIVE^$', 'LEIF', 'LEIF', 'LT(SßZ)$', 'LS', 'LZ', "LT'S$", 'LS', 'LZ', 'LT´S$', 'LS', 'LZ', 'LUI(GS)--', 'LU', 'LU', 'LV(AIO)-', 'LW', None, 'LY9^', 'LÜ', None, 'LSTS$', 'LS', 'LZ', 'LZ(BDFGKLMNPQRSTVWX)-', 'LS', None, 'L(SßZ)$', 'LS', None, 'MAIR-<', 'MEI', 'NEI', 'MANAG-', 'MENE', 'NENE', 'MANUEL', 'MANUEL', None, 'MASSEU(RS)-', 'MASÖ', 'NAZÖ', 'MATCH', 'MESH', 'NEZ', 'MAURICE', 'MORIS', 'NURIZ', 'MBH^$', 'MBH', 'MBH', 'MB(ßZ)$', 'MS', None, 'MB(SßTZ)-', 'M', 'N', 'MCG9^', 'MAK', 'NAK', 'MC9^', 'MAK', 'NAK', 'MEMOIR-^', 'MEMOA', 'NENUA', 'MERHAVEN$', 'MAHAFN', None, 'ME(LMNRST)-3^', 'ME', 'NE', 'MEN(STZ)--3', 'ME', None, 'MEN$', 'MEN', None, 'MIGUEL-', 'MIGE', 'NIKE', 'MIKE^$', 'MEIK', 'NEIK', 'MITHILFE----^$', 'MIT H', 'NIT ', 'MN$', 'M', None, 'MN', 'N', 'N', 'MPJUTE-', 'MPUT', 'NBUT', 'MP(ßZ)$', 'MS', None, 'MP(SßTZ)-', 'M', 'N', 'MP(BDJLMNPQVW)-', 'MB', 'NB', 'MY9^', 'MÜ', None, 'M(ßZ)$', 'MS', None, 'M´G7^', 'MAK', 'NAK', "M'G7^", 'MAK', 'NAK', 'M´^', 'MAK', 'NAK', "M'^", 'MAK', 'NAK', 'M', None, 'N', 'NACH^^', 'NACH', 'NAK', 'NADINE', 'NADIN', 'NATIN', 'NAIV--', 'NA', 'NA', 'NAISE$', 'NESE', 'NEZE', 'NAUGENOMM------', 'NAU ', 'NAU ', 'NAUSOGUT$', 'NAUSO GUT', 'NAUZU KUT', 
'NCH$', 'NSH', 'NZ', 'NCOISE$', 'SOA', 'ZUA', 'NCOIS$', 'SOA', 'ZUA', 'NDAR$', 'NDA', 'NTA', 'NDERINGEN------', 'NDE ', 'NTE ', 'NDRO(CDKTZ)-', 'NTRO', None, 'ND(BFGJLMNPQVW)-', 'NT', None, 'ND(SßZ)$', 'NS', 'NZ', "ND'S$", 'NS', 'NZ', 'ND´S$', 'NS', 'NZ', 'NEBEN^^', 'NEBN', 'NEBN', 'NENGELERN------', 'NEN ', 'NEN ', 'NENLERN(ET)---', 'NEN LE', 'NEN LE', 'NENZULERNE---', 'NEN ZU LE', 'NEN ZU LE', 'NE(LMNRST)-3^', 'NE', 'NE', 'NEN-3', 'NE', 'NE', 'NETTE$', 'NET', 'NET', 'NGU^^', 'NU', 'NU', 'NG(BDFJLMNPQRTVW)-', 'NK', 'NK', 'NH(AUO)-$', 'NI', 'NI', 'NICHTSAHNEN-----', 'NIX ', 'NIX ', 'NICHTSSAGE----', 'NIX ', 'NIX ', 'NICHTS^^', 'NIX', 'NIX', 'NICHT^^', 'NICHT', 'NIKT', 'NINE$', 'NIN', 'NIN', 'NON^^', 'NON', 'NUN', 'NOTLEIDE-----^', 'NOT ', 'NUT ', 'NOT^^', 'NOT', 'NUT', 'NTI(AIOU)-3', 'NZI', 'NZI', 'NTIEL--3', 'NZI', 'NZI', 'NT(SßZ)$', 'NS', 'NZ', "NT'S$", 'NS', 'NZ', 'NT´S$', 'NS', 'NZ', 'NYLON', 'NEILON', 'NEILUN', 'NY9^', 'NÜ', None, 'NSTZUNEH---', 'NST ZU ', 'NZT ZU ', 'NSZ-', 'NS', None, 'NSTS$', 'NS', 'NZ', 'NZ(BDFGKLMNPQRSTVWX)-', 'NS', None, 'N(SßZ)$', 'NS', None, 'OBERE-', 'OBER', None, 'OBER^^', 'OBA', 'UBA', 'OEU2', 'Ö', 'Ö', 'OE<2', 'Ö', 'Ö', 'OGL-', 'OK', None, 'OGNIE-', 'ONI', 'UNI', 'OGN(AEOU)-$', 'ONI', 'UNI', 'OH(AIOÖUÜY)-', 'OH', None, 'OIE$', 'Ö', 'Ö', 'OIRE$', 'OA', 'UA', 'OIR$', 'OA', 'UA', 'OIX', 'OA', 'UA', 'OI<3', 'EU', 'EU', 'OKAY^$', 'OKE', 'UKE', 'OLYN$', 'OLIN', 'ULIN', 'OO(DLMZ)-', 'U', None, 'OO$', 'U', None, 'OO-', '', '', 'ORGINAL-----', 'ORI', 'URI', 'OTI(AÄOÖUÜ)-', 'OZI', 'UZI', 'OUI^', 'WI', 'FI', 'OUILLE$', 'ULIE', 'ULIE', 'OU(DT)-^', 'AU', 'AU', 'OUSE$', 'AUS', 'AUZ', 'OUT-', 'AU', 'AU', 'OU', 'U', 'U', 'O(FV)$', 'AU', 'AU', 'OVER--<', 'OW', None, 'OV(AOU)-', 'OW', None, 'OW$', 'AU', 'AU', 'OWS$', 'OS', 'UZ', 'OJ(AÄEIOÖUÜ)--', 'O', 'U', 'OYER', 'OIA', None, 'OY(AÄEIOÖUÜ)--', 'O', 'U', 'O(JY)<', 'EU', 'EU', 'OZ$', 'OS', None, 'O´^', 'O', 'U', "O'^", 'O', 'U', 'O', None, 'U', 'PATIEN--^', 'PAZI', 'PAZI', 'PENSIO-^', 'PANSI', 
'PANZI', 'PE(LMNRST)-3^', 'PE', 'PE', 'PFER-^', 'FE', 'FE', 'P(FH)<', 'F', 'F', 'PIC^$', 'PIK', 'PIK', 'PIC$', 'PIZ', 'PIZ', 'PIPELINE', 'PEIBLEIN', 'PEIBLEIN', 'POLYP-', 'POLÜ', None, 'POLY^^', 'POLI', 'PULI', 'PORTRAIT7', 'PORTRE', 'PURTRE', 'POWER7', 'PAUA', 'PAUA', 'PP(FH)--<', 'B', 'B', 'PP-', '', '', 'PRODUZ-^', 'PRODU', 'BRUTU', 'PRODUZI--', ' PRODU', ' BRUTU', 'PRIX^$', 'PRI', 'PRI', 'PS-^^', 'P', None, 'P(SßZ)^', None, 'Z', 'P(SßZ)$', 'BS', None, 'PT-^', '', '', 'PTI(AÄOÖUÜ)-3', 'BZI', 'BZI', 'PY9^', 'PÜ', None, 'P(AÄEIOÖRUÜY)-', 'P', 'P', 'P(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'P', None, 'P.^', None, 'P.', 'P^', 'P', None, 'P', 'B', 'B', 'QI-', 'Z', 'Z', 'QUARANT--', 'KARA', 'KARA', 'QUE(LMNRST)-3', 'KWE', 'KFE', 'QUE$', 'K', 'K', 'QUI(NS)$', 'KI', 'KI', 'QUIZ7', 'KWIS', None, 'Q(UV)7', 'KW', 'KF', 'Q<', 'K', 'K', 'RADFAHR----', 'RAT ', 'RAT ', 'RAEFTEZEHRE-----', 'REFTE ', 'REFTE ', 'RCH', 'RCH', 'RK', 'REA(DU)---3^', 'R', None, 'REBSERZEUG------', 'REBS ', 'REBZ ', 'RECHERCH^', 'RESHASH', 'REZAZ', 'RECYCL--', 'RIZEI', 'RIZEI', 'RE(ALST)-3^', 'RE', None, 'REE$', 'RI', 'RI', 'RER$', 'RA', 'RA', 'RE(MNR)-4', 'RE', 'RE', 'RETTE$', 'RET', 'RET', 'REUZ$', 'REUZ', None, 'REW$', 'RU', 'RU', 'RH<^', 'R', 'R', 'RJA(MN)--', 'RI', 'RI', 'ROWD-^', 'RAU', 'RAU', 'RTEMONNAIE-', 'RTMON', 'RTNUN', 'RTI(AÄOÖUÜ)-3', 'RZI', 'RZI', 'RTIEL--3', 'RZI', 'RZI', 'RV(AEOU)-3', 'RW', None, 'RY(KN)-$', 'RI', 'RI', 'RY9^', 'RÜ', None, 'RÄFTEZEHRE-----', 'REFTE ', 'REFTE ', 'SAISO-^', 'SES', 'ZEZ', 'SAFE^$', 'SEIF', 'ZEIF', 'SAUCE-^', 'SOS', 'ZUZ', 'SCHLAGGEBEN-----<', 'SHLAK ', 'ZLAK ', 'SCHSCH---7', '', '', 'SCHTSCH', 'SH', 'Z', 'SC(HZ)<', 'SH', 'Z', 'SC', 'SK', 'ZK', 'SELBSTST--7^^', 'SELB', 'ZELB', 'SELBST7^^', 'SELBST', 'ZELBZT', 'SERVICE7^', 'SÖRWIS', 'ZÖRFIZ', 'SERVI-^', 'SERW', None, 'SE(LMNRST)-3^', 'SE', 'ZE', 'SETTE$', 'SET', 'ZET', 'SHP-^', 'S', 'Z', 'SHST', 'SHT', 'ZT', 'SHTSH', 'SH', 'Z', 'SHT', 'ST', 'Z', 'SHY9^', 'SHÜ', None, 'SH^^', 'SH', None, 'SH3', 'SH', 'Z', 'SICHERGEGAN-----^', 
'SICHA ', 'ZIKA ', 'SICHERGEHE----^', 'SICHA ', 'ZIKA ', 'SICHERGESTEL------^', 'SICHA ', 'ZIKA ', 'SICHERSTELL-----^', 'SICHA ', 'ZIKA ', 'SICHERZU(GS)--^', 'SICHA ZU ', 'ZIKA ZU ', 'SIEGLI-^', 'SIKL', 'ZIKL', 'SIGLI-^', 'SIKL', 'ZIKL', 'SIGHT', 'SEIT', 'ZEIT', 'SIGN', 'SEIN', 'ZEIN', 'SKI(NPZ)-', 'SKI', 'ZKI', 'SKI<^', 'SHI', 'ZI', 'SODASS^$', 'SO DAS', 'ZU TAZ', 'SODAß^$', 'SO DAS', 'ZU TAZ', 'SOGENAN--^', 'SO GEN', 'ZU KEN', 'SOUND-', 'SAUN', 'ZAUN', 'STAATS^^', 'STAZ', 'ZTAZ', 'STADT^^', 'STAT', 'ZTAT', 'STANDE$', ' STANDE', ' ZTANTE', 'START^^', 'START', 'ZTART', 'STAURANT7', 'STORAN', 'ZTURAN', 'STEAK-', 'STE', 'ZTE', 'STEPHEN-^$', 'STEW', None, 'STERN', 'STERN', None, 'STRAF^^', 'STRAF', 'ZTRAF', "ST'S$", 'Z', 'Z', 'ST´S$', 'Z', 'Z', 'STST--', '', '', 'STS(ACEÈÉÊHIÌÍÎOUÄÜÖ)--', 'ST', 'ZT', 'ST(SZ)', 'Z', 'Z', 'SPAREN---^', 'SPA', 'ZPA', 'SPAREND----', ' SPA', ' ZPA', 'S(PTW)-^^', 'S', None, 'SP', 'SP', None, 'STYN(AE)-$', 'STIN', 'ZTIN', 'ST', 'ST', 'ZT', 'SUITE<', 'SIUT', 'ZIUT', 'SUKE--$', 'S', 'Z', 'SURF(EI)-', 'SÖRF', 'ZÖRF', 'SV(AEÈÉÊIÌÍÎOU)-<^', 'SW', None, 'SYB(IY)--^', 'SIB', None, 'SYL(KVW)--^', 'SI', None, 'SY9^', 'SÜ', None, 'SZE(NPT)-^', 'ZE', 'ZE', 'SZI(ELN)-^', 'ZI', 'ZI', 'SZCZ<', 'SH', 'Z', 'SZT<', 'ST', 'ZT', 'SZ<3', 'SH', 'Z', 'SÜL(KVW)--^', 'SI', None, 'S', None, 'Z', 'TCH', 'SH', 'Z', 'TD(AÄEIOÖRUÜY)-', 'T', None, 'TD(ÀÁÂÃÅÈÉÊËÌÍÎÏÒÓÔÕØÙÚÛÝŸ)-', 'T', None, 'TEAT-^', 'TEA', 'TEA', 'TERRAI7^', 'TERA', 'TERA', 'TE(LMNRST)-3^', 'TE', 'TE', 'TH<', 'T', 'T', 'TICHT-', 'TIK', 'TIK', 'TICH$', 'TIK', 'TIK', 'TIC$', 'TIZ', 'TIZ', 'TIGGESTELL-------', 'TIK ', 'TIK ', 'TIGSTELL-----', 'TIK ', 'TIK ', 'TOAS-^', 'TO', 'TU', 'TOILET-', 'TOLE', 'TULE', 'TOIN-', 'TOA', 'TUA', 'TRAECHTI-^', 'TRECHT', 'TREKT', 'TRAECHTIG--', ' TRECHT', ' TREKT', 'TRAINI-', 'TREN', 'TREN', 'TRÄCHTI-^', 'TRECHT', 'TREKT', 'TRÄCHTIG--', ' TRECHT', ' TREKT', 'TSCH', 'SH', 'Z', 'TSH', 'SH', 'Z', 'TST', 'ZT', 'ZT', 'T(Sß)', 'Z', 'Z', 'TT(SZ)--<', '', '', 'TT9', 'T', 'T', 'TV^$', 
'TV', 'TV', 'TX(AEIOU)-3', 'SH', 'Z', 'TY9^', 'TÜ', None, 'TZ-', '', '', "T'S3$", 'Z', 'Z', 'T´S3$', 'Z', 'Z', 'UEBEL(GNRW)-^^', 'ÜBL ', 'IBL ', 'UEBER^^', 'ÜBA', 'IBA', 'UE2', 'Ü', 'I', 'UGL-', 'UK', None, 'UH(AOÖUÜY)-', 'UH', None, 'UIE$', 'Ü', 'I', 'UM^^', 'UM', 'UN', 'UNTERE--3', 'UNTE', 'UNTE', 'UNTER^^', 'UNTA', 'UNTA', 'UNVER^^', 'UNFA', 'UNFA', 'UN^^', 'UN', 'UN', 'UTI(AÄOÖUÜ)-', 'UZI', 'UZI', 'UVE-4', 'UW', None, 'UY2', 'UI', None, 'UZZ', 'AS', 'AZ', 'VACL-^', 'WAZ', 'FAZ', 'VAC$', 'WAZ', 'FAZ', 'VAN DEN ^', 'FANDN', 'FANTN', 'VANES-^', 'WANE', None, 'VATRO-', 'WATR', None, 'VA(DHJNT)--^', 'F', None, 'VEDD-^', 'FE', 'FE', 'VE(BEHIU)--^', 'F', None, 'VEL(BDLMNT)-^', 'FEL', None, 'VENTZ-^', 'FEN', None, 'VEN(NRSZ)-^', 'FEN', None, 'VER(AB)-^$', 'WER', None, 'VERBAL^$', 'WERBAL', None, 'VERBAL(EINS)-^', 'WERBAL', None, 'VERTEBR--', 'WERTE', None, 'VEREIN-----', 'F', None, 'VEREN(AEIOU)-^', 'WEREN', None, 'VERIFI', 'WERIFI', None, 'VERON(AEIOU)-^', 'WERON', None, 'VERSEN^', 'FERSN', 'FAZN', 'VERSIERT--^', 'WERSI', None, 'VERSIO--^', 'WERS', None, 'VERSUS', 'WERSUS', None, 'VERTI(GK)-', 'WERTI', None, 'VER^^', 'FER', 'FA', 'VERSPRECHE-------', ' FER', ' FA', 'VER$', 'WA', None, 'VER', 'FA', 'FA', 'VET(HT)-^', 'FET', 'FET', 'VETTE$', 'WET', 'FET', 'VE^', 'WE', None, 'VIC$', 'WIZ', 'FIZ', 'VIELSAGE----', 'FIL ', 'FIL ', 'VIEL', 'FIL', 'FIL', 'VIEW', 'WIU', 'FIU', 'VILL(AE)-', 'WIL', None, 'VIS(ACEIKUVWZ)-<^', 'WIS', None, 'VI(ELS)--^', 'F', None, 'VILLON--', 'WILI', 'FILI', 'VIZE^^', 'FIZE', 'FIZE', 'VLIE--^', 'FL', None, 'VL(AEIOU)--', 'W', None, 'VOKA-^', 'WOK', None, 'VOL(ATUVW)--^', 'WO', None, 'VOR^^', 'FOR', 'FUR', 'VR(AEIOU)--', 'W', None, 'VV9', 'W', None, 'VY9^', 'WÜ', 'FI', 'V(ÜY)-', 'W', None, 'V(ÀÁÂÃÅÈÉÊÌÍÎÙÚÛ)-', 'W', None, 'V(AEIJLRU)-<', 'W', None, 'V.^', 'V.', None, 'V<', 'F', 'F', 'WEITERENTWI-----^', 'WEITA ', 'FEITA ', 'WEITREICH-----^', 'WEIT ', 'FEIT ', 'WEITVER^', 'WEIT FER', 'FEIT FA', 'WE(LMNRST)-3^', 'WE', 'FE', 'WER(DST)-', 'WER', None, 
'WIC$', 'WIZ', 'FIZ', 'WIEDERU--', 'WIDE', 'FITE', 'WIEDER^$', 'WIDA', 'FITA', 'WIEDER^^', 'WIDA ', 'FITA ', 'WIEVIEL', 'WI FIL', 'FI FIL', 'WISUEL', 'WISUEL', None, 'WR-^', 'W', None, 'WY9^', 'WÜ', 'FI', 'W(BDFGJKLMNPQRSTZ)-', 'F', None, 'W$', 'F', None, 'W', None, 'F', 'X<^', 'Z', 'Z', 'XHAVEN$', 'XAFN', None, 'X(CSZ)', 'X', 'X', 'XTS(CH)--', 'XT', 'XT', 'XT(SZ)', 'Z', 'Z', 'YE(LMNRST)-3^', 'IE', 'IE', 'YE-3', 'I', 'I', 'YOR(GK)^$', 'IÖRK', 'IÖRK', 'Y(AOU)-<7', 'I', 'I', 'Y(BKLMNPRSTX)-1', 'Ü', None, 'YVES^$', 'IF', 'IF', 'YVONNE^$', 'IWON', 'IFUN', 'Y.^', 'Y.', None, 'Y', 'I', 'I', 'ZC(AOU)-', 'SK', 'ZK', 'ZE(LMNRST)-3^', 'ZE', 'ZE', 'ZIEJ$', 'ZI', 'ZI', 'ZIGERJA(HR)-3', 'ZIGA IA', 'ZIKA IA', 'ZL(AEIOU)-', 'SL', None, 'ZS(CHT)--', '', '', 'ZS', 'SH', 'Z', 'ZUERST', 'ZUERST', 'ZUERST', 'ZUGRUNDE^$', 'ZU GRUNDE', 'ZU KRUNTE', 'ZUGRUNDE', 'ZU GRUNDE ', 'ZU KRUNTE ', 'ZUGUNSTEN', 'ZU GUNSTN', 'ZU KUNZTN', 'ZUHAUSE-', 'ZU HAUS', 'ZU AUZ', 'ZULASTEN^$', 'ZU LASTN', 'ZU LAZTN', 'ZURUECK^^', 'ZURÜK', 'ZURIK', 'ZURZEIT', 'ZUR ZEIT', 'ZUR ZEIT', 'ZURÜCK^^', 'ZURÜK', 'ZURIK', 'ZUSTANDE', 'ZU STANDE', 'ZU ZTANTE', 'ZUTAGE', 'ZU TAGE', 'ZU TAKE', 'ZUVER^^', 'ZUFA', 'ZUFA', 'ZUVIEL', 'ZU FIL', 'ZU FIL', 'ZUWENIG', 'ZU WENIK', 'ZU FENIK', 'ZY9^', 'ZÜ', None, 'ZYK3$', 'ZIK', None, 'Z(VW)7^', 'SW', None, None, None, None)¶
-
_rules_no_lang
= ('´', ' ', ' ', '"', ' ', ' ', '`$', '', '', "'", ' ', ' ', ',', ',', ',', ';', ',', ',', '-', ' ', ' ', ' ', ' ', ' ', '.', '.', '.', ':', '.', '.', 'Ä', 'AE', 'AE', 'Ö', 'OE', 'OE', 'Ü', 'UE', 'UE', 'ß', 'S', 'S', 'À', 'A', 'A', 'Á', 'A', 'A', 'Â', 'A', 'A', 'Ã', 'A', 'A', 'Å', 'A', 'A', 'Æ', 'AE', 'AE', 'Ç', 'C', 'C', 'Ð', 'DJ', 'DJ', 'È', 'E', 'E', 'É', 'E', 'E', 'Ê', 'E', 'E', 'Ë', 'E', 'E', 'Ì', 'I', 'I', 'Í', 'I', 'I', 'Î', 'I', 'I', 'Ï', 'I', 'I', 'Ñ', 'NH', 'NH', 'Ò', 'O', 'O', 'Ó', 'O', 'O', 'Ô', 'O', 'O', 'Õ', 'O', 'O', 'Œ', 'OE', 'OE', 'Ø', 'OE', 'OE', 'Š', 'SH', 'SH', 'Þ', 'TH', 'TH', 'Ù', 'U', 'U', 'Ú', 'U', 'U', 'Û', 'U', 'U', 'Ý', 'Y', 'Y', 'Ÿ', 'Y', 'Y', 'MC^', 'MAC', 'MAC', 'MC^', 'MAC', 'MAC', 'M´^', 'MAC', 'MAC', "M'^", 'MAC', 'MAC', 'O´^', 'O', 'O', "O'^", 'O', 'O', 'VAN DEN ^', 'VANDEN', 'VANDEN', None, None, None)¶
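The rule tables above are flat tuples. A minimal sketch of walking one, assuming the entries are laid out as triples (pattern, mode 1 replacement, mode 2 replacement) and that a triple of None values marks the end of the table. That layout is inferred from the data shown here, and iter_rules is a hypothetical helper, not part of abydos:

```python
# A few entries copied from _rules_no_lang, plus the trailing None triple.
RULES = ('Ä', 'AE', 'AE', 'ß', 'S', 'S', 'MC^', 'MAC', 'MAC', None, None, None)


def iter_rules(flat):
    """Yield (pattern, mode1, mode2) triples from a flat rule tuple."""
    for i in range(0, len(flat), 3):
        pattern, mode1, mode2 = flat[i:i + 3]
        if pattern is None:  # assumed end-of-table sentinel
            break
        yield pattern, mode1, mode2


table = list(iter_rules(RULES))
for pattern, mode1, mode2 in table:
    print(pattern, mode1, mode2)
```

The sketch only groups the entries; it does not interpret the control characters inside a pattern (such as ^, $, - or <).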
-
_upper_trans
= {97: 'A', 98: 'B', 99: 'C', 100: 'D', 101: 'E', 102: 'F', 103: 'G', 104: 'H', 105: 'I', 106: 'J', 107: 'K', 108: 'L', 109: 'M', 110: 'N', 111: 'O', 112: 'P', 113: 'Q', 114: 'R', 115: 'S', 116: 'T', 117: 'U', 118: 'V', 119: 'W', 120: 'X', 121: 'Y', 122: 'Z', 223: 'ß', 224: 'À', 225: 'Á', 226: 'Â', 227: 'Ã', 228: 'Ä', 229: 'Å', 230: 'Æ', 231: 'Ç', 232: 'È', 233: 'É', 234: 'Ê', 235: 'Ë', 236: 'Ì', 237: 'Í', 238: 'Î', 239: 'Ï', 240: 'Ð', 241: 'Ñ', 242: 'Ò', 243: 'Ó', 244: 'Ô', 245: 'Õ', 246: 'Ö', 248: 'Ø', 249: 'Ù', 250: 'Ú', 251: 'Û', 252: 'Ü', 253: 'Ý', 254: 'Þ', 255: 'Ÿ', 339: 'Œ', 353: 'Š'}¶
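A table of this shape (code point to replacement string) can be passed to str.translate. The sketch below rebuilds only a subset of the entries above, and whether Phonet applies the table in exactly this way is an assumption. It shows one visible difference from the built-in str.upper(): the table maps ß to itself instead of expanding it to SS, so rules whose patterns contain ß (for example 'SODAß^$') could still match:

```python
# Subset of _upper_trans: a-z mapped to A-Z, plus a few non-ASCII entries.
upper_trans = {cp: chr(cp - 32) for cp in range(ord('a'), ord('z') + 1)}
upper_trans.update({223: 'ß', 228: 'Ä', 246: 'Ö', 252: 'Ü', 255: 'Ÿ'})

word = 'straße'
translated = word.translate(upper_trans)  # table-based uppercasing
builtin = word.upper()                    # built-in uppercasing

print(translated)  # STRAßE
print(builtin)     # STRASSE
```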
-
encode
(word)[source]¶ Return the phonet code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The phonet value
- Return type
str
Examples
>>> pe = Phonet()
>>> pe.encode('Christopher')
'KRISTOFA'
>>> pe.encode('Niall')
'NIAL'
>>> pe.encode('Smith')
'SMIT'
>>> pe.encode('Schmidt')
'SHMIT'
>>> pe2 = Phonet(mode=2)
>>> pe2.encode('Christopher')
'KRIZTUFA'
>>> pe2.encode('Niall')
'NIAL'
>>> pe2.encode('Smith')
'ZNIT'
>>> pe2.encode('Schmidt')
'ZNIT'
>>> pe_none = Phonet(lang='none')
>>> pe_none.encode('Christopher')
'CHRISTOPHER'
>>> pe_none.encode('Niall')
'NIAL'
>>> pe_none.encode('Smith')
'SMITH'
>>> pe_none.encode('Schmidt')
'SCHMIDT'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.phonetic.
phonet
(word, mode=1, lang='de')[source]¶ Return the phonet code for a word.
This is a wrapper for Phonet.encode().
- Parameters
word (str) -- The word to transform
mode (int) -- The phonet variant to employ (1 or 2)
lang (str) -- de (default) for German, none for no language
- Returns
The phonet value
- Return type
str
Examples
>>> phonet('Christopher')
'KRISTOFA'
>>> phonet('Niall')
'NIAL'
>>> phonet('Smith')
'SMIT'
>>> phonet('Schmidt')
'SHMIT'
>>> phonet('Christopher', mode=2)
'KRIZTUFA'
>>> phonet('Niall', mode=2)
'NIAL'
>>> phonet('Smith', mode=2)
'ZNIT'
>>> phonet('Schmidt', mode=2)
'ZNIT'
>>> phonet('Christopher', lang='none')
'CHRISTOPHER'
>>> phonet('Niall', lang='none')
'NIAL'
>>> phonet('Smith', lang='none')
'SMITH'
>>> phonet('Schmidt', lang='none')
'SCHMIDT'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonet.encode method instead.
-
class
abydos.phonetic.
SoundexBR
(max_length=4, zero_pad=True)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
SoundexBR.
This is based on [Mar15].
New in version 0.3.6.
Initialize SoundexBR instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
New in version 0.4.0.
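The interaction of max_length and zero_pad can be sketched as follows. fit_length is a hypothetical helper, and the assumption that an unpadded code is still truncated to max_length is not confirmed by the text above:

```python
def fit_length(code, max_length=4, zero_pad=True):
    """Hypothetical helper: pad with '0' if requested, then cut to max_length."""
    if zero_pad:
        code += '0' * max_length  # guarantee at least max_length characters
    return code[:max_length]


padded = fit_length('A62')                    # short code, padded
unpadded = fit_length('A62', zero_pad=False)  # short code, left as is
truncated = fit_length('G52412')              # long code, cut to 4

print(padded, unpadded, truncated)  # A620 A62 G524
```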
-
_alphabetic
= {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}¶
-
_trans
= {65: '0', 66: '1', 67: '2', 68: '3', 69: '0', 70: '1', 71: '2', 72: '0', 73: '0', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '0', 80: '1', 81: '2', 82: '6', 83: '2', 84: '3', 85: '0', 86: '1', 87: '0', 88: '2', 89: '0', 90: '2'}¶
-
encode
(word)[source]¶ Return the SoundexBR encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The SoundexBR code
- Return type
str
Examples
>>> pe = SoundexBR()
>>> pe.encode('Oliveira')
'O416'
>>> pe.encode('Almeida')
'A453'
>>> pe.encode('Barbosa')
'B612'
>>> pe.encode('Araújo')
'A620'
>>> pe.encode('Gonçalves')
'G524'
>>> pe.encode('Goncalves')
'G524'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic SoundexBR encoding of a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SoundexBR code
- Return type
str
Examples
>>> pe = SoundexBR()
>>> pe.encode_alpha('Oliveira')
'OLPR'
>>> pe.encode_alpha('Almeida')
'ALNT'
>>> pe.encode_alpha('Barbosa')
'BRPK'
>>> pe.encode_alpha('Araújo')
'ARK'
>>> pe.encode_alpha('Gonçalves')
'GNKL'
>>> pe.encode_alpha('Goncalves')
'GNKL'
New in version 0.4.0.
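The documented numeric and alphabetic codes are related through the _alphabetic table. A minimal sketch, assuming the alphabetic form is obtained by dropping the trailing zero padding and translating the remaining digits. to_alpha is a hypothetical helper, not the library's method:

```python
# Table copied from SoundexBR._alphabetic: ord(digit) -> letter.
ALPHABETIC = {48: 'A', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R'}


def to_alpha(code):
    """Hypothetical helper: turn a numeric SoundexBR code into letters."""
    # Drop the zero padding, then map digits; letters pass through unchanged.
    return code.rstrip('0').translate(ALPHABETIC)


pairs = {c: to_alpha(c) for c in ('O416', 'A453', 'B612', 'A620', 'G524')}
for numeric, alpha in pairs.items():
    print(numeric, alpha)
```

Under these assumptions the sketch reproduces the pairs in the examples above, including 'A620' becoming 'ARK'.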
-
abydos.phonetic.
soundex_br
(word, max_length=4, zero_pad=True)[source]¶ Return the SoundexBR encoding of a word.
This is a wrapper for SoundexBR.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 4)
zero_pad (bool) -- Pad the end of the return value with 0s to achieve a max_length string
- Returns
The SoundexBR code
- Return type
str
Examples
>>> soundex_br('Oliveira')
'O416'
>>> soundex_br('Almeida')
'A453'
>>> soundex_br('Barbosa')
'B612'
>>> soundex_br('Araújo')
'A620'
>>> soundex_br('Gonçalves')
'G524'
>>> soundex_br('Goncalves')
'G524'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SoundexBR.encode method instead.
-
class
abydos.phonetic.
PhoneticSpanish
(max_length=-1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
PhoneticSpanish.
This follows the coding described in [AmonME12] and [delPAngelesEGGM15].
New in version 0.3.6.
Initialize PhoneticSpanish instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
New in version 0.4.0.
-
_alphabetic
= {48: 'P', 49: 'B', 50: 'F', 51: 'T', 52: 'S', 53: 'L', 54: 'N', 55: 'K', 56: 'G', 57: 'R'}¶
-
_trans
= {66: '1', 67: '4', 68: '3', 70: '2', 71: '8', 72: '2', 74: '8', 75: '7', 76: '5', 77: '6', 78: '6', 80: '0', 81: '7', 82: '9', 83: '4', 84: '3', 86: '1', 88: '4', 89: '5', 90: '4'}¶
-
_uc_set
= {'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'X', 'Y', 'Z'}¶
-
encode
(word)[source]¶ Return the PhoneticSpanish coding of word.
- Parameters
word (str) -- The word to transform
- Returns
The PhoneticSpanish code
- Return type
str
Examples
>>> pe = PhoneticSpanish()
>>> pe.encode('Perez')
'094'
>>> pe.encode('Martinez')
'69364'
>>> pe.encode('Gutierrez')
'83994'
>>> pe.encode('Santiago')
'4638'
>>> pe.encode('Nicolás')
'6454'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic PhoneticSpanish coding of word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic PhoneticSpanish code
- Return type
str
Examples
>>> pe = PhoneticSpanish()
>>> pe.encode_alpha('Perez')
'PRS'
>>> pe.encode_alpha('Martinez')
'NRTNS'
>>> pe.encode_alpha('Gutierrez')
'GTRRS'
>>> pe.encode_alpha('Santiago')
'SNTG'
>>> pe.encode_alpha('Nicolás')
'NSLS'
New in version 0.4.0.
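The _trans and _alphabetic tables are enough to reproduce the documented examples by hand. The following is a rough re-creation, not the library's code: the function names are hypothetical, the handling of a doubled L is an assumption, and max_length is ignored:

```python
import unicodedata

# Tables copied from the PhoneticSpanish attributes above.
TRANS = {66: '1', 67: '4', 68: '3', 70: '2', 71: '8', 72: '2', 74: '8',
         75: '7', 76: '5', 77: '6', 78: '6', 80: '0', 81: '7', 82: '9',
         83: '4', 84: '3', 86: '1', 88: '4', 89: '5', 90: '4'}
ALPHABETIC = {48: 'P', 49: 'B', 50: 'F', 51: 'T', 52: 'S', 53: 'L',
              54: 'N', 55: 'K', 56: 'G', 57: 'R'}
UC_SET = {chr(cp) for cp in TRANS}  # the coded consonants, same as _uc_set


def sketch_encode(word):
    """Hypothetical re-creation of the numeric coding."""
    # Decompose accents (Á -> A + combining mark), then keep coded consonants.
    word = unicodedata.normalize('NFKD', word.upper())
    word = ''.join(ch for ch in word if ch in UC_SET)
    word = word.replace('LL', 'L')  # assumption: a doubled L counts once
    return word.translate(TRANS)


def sketch_encode_alpha(word):
    """Map each digit of the numeric code to its letter."""
    return sketch_encode(word).translate(ALPHABETIC)


print(sketch_encode('Nicolás'), sketch_encode_alpha('Nicolás'))  # 6454 NSLS
```

Vowels and W are not in the table, so they are dropped before translation, which is why 'Perez' reduces to three digits.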
-
abydos.phonetic.
phonetic_spanish
(word, max_length=-1)[source]¶ Return the PhoneticSpanish coding of word.
This is a wrapper for PhoneticSpanish.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to unlimited)
- Returns
The PhoneticSpanish code
- Return type
str
Examples
>>> phonetic_spanish('Perez')
'094'
>>> phonetic_spanish('Martinez')
'69364'
>>> phonetic_spanish('Gutierrez')
'83994'
>>> phonetic_spanish('Santiago')
'4638'
>>> phonetic_spanish('Nicolás')
'6454'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PhoneticSpanish.encode method instead.
-
class
abydos.phonetic.
SpanishMetaphone
(max_length=6, modified=False)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Spanish Metaphone.
This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at https://github.com/amsqr/Spanish-Metaphone and discussed in [MLM12].
The modified version is based on [delPAngelesBailonM16].
New in version 0.3.6.
Initialize SpanishMetaphone instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to 6)
modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm
New in version 0.4.0.
-
encode
(word)[source]¶ Return the Spanish Metaphone of a word.
- Parameters
word (str) -- The word to transform
- Returns
The Spanish Metaphone code
- Return type
str
Examples
>>> pe = SpanishMetaphone()
>>> pe.encode('Perez')
'PRZ'
>>> pe.encode('Martinez')
'MRTNZ'
>>> pe.encode('Gutierrez')
'GTRRZ'
>>> pe.encode('Santiago')
'SNTG'
>>> pe.encode('Nicolás')
'NKLS'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.phonetic.
spanish_metaphone
(word, max_length=6, modified=False)[source]¶ Return the Spanish Metaphone of a word.
This is a wrapper for SpanishMetaphone.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to 6)
modified (bool) -- Set to True to use del Pilar Angeles & Bailón-Miguel's modified version of the algorithm
- Returns
The Spanish Metaphone code
- Return type
str
Examples
>>> spanish_metaphone('Perez')
'PRZ'
>>> spanish_metaphone('Martinez')
'MRTNZ'
>>> spanish_metaphone('Gutierrez')
'GTRRZ'
>>> spanish_metaphone('Santiago')
'SNTG'
>>> spanish_metaphone('Nicolás')
'NKLS'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SpanishMetaphone.encode method instead.
-
class
abydos.phonetic.
SfinxBis
(max_length=-1)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
SfinxBis code.
SfinxBis is a Soundex-like algorithm defined in [Axe09].
This implementation follows the reference implementation: [Sjoo09].
SfinxBis is intended chiefly for Swedish names.
New in version 0.3.6.
Initialize SfinxBis instance.
- Parameters
max_length (int) -- The length of the code returned (defaults to unlimited)
New in version 0.4.0.
-
_adelstitler
= (' DE LA ', ' DE LAS ', ' DE LOS ', ' VAN DE ', ' VAN DEN ', ' VAN DER ', ' VON DEM ', ' VON DER ', ' AF ', ' AV ', ' DA ', ' DE ', ' DEL ', ' DEN ', ' DES ', ' DI ', ' DO ', ' DON ', ' DOS ', ' DU ', ' E ', ' IN ', ' LA ', ' LE ', ' MAC ', ' MC ', ' VAN ', ' VON ', ' Y ', ' S:T ')¶
-
_alphabetic
= {35: 'Š', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N', 54: 'R', 55: 'F', 56: 'S', 57: 'A'}¶
-
_harde_vokaler
= {'A', 'O', 'U', 'Å'}¶
-
_mjuka_vokaler
= {'E', 'I', 'Y', 'Ä', 'Ö'}¶
-
_substitutions
= {87: 'V', 90: 'S', 192: 'A', 193: 'A', 194: 'A', 195: 'A', 198: 'Ä', 199: 'C', 200: 'E', 201: 'E', 202: 'E', 203: 'E', 204: 'I', 205: 'I', 206: 'I', 207: 'I', 209: 'N', 210: 'O', 211: 'O', 212: 'O', 213: 'O', 216: 'Ö', 217: 'U', 218: 'U', 219: 'U', 220: 'Y', 221: 'Y'}¶
-
_trans
= {65: '9', 66: '1', 67: '2', 68: '3', 69: '9', 70: '7', 71: '2', 72: '9', 73: '9', 74: '2', 75: '2', 76: '4', 77: '5', 78: '5', 79: '9', 80: '1', 81: '2', 82: '6', 83: '8', 84: '3', 85: '9', 86: '7', 89: '9', 90: '8', 196: '9', 197: '9', 214: '9'}¶
-
_uc_c_set
= {'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Z'}¶
-
_uc_set
= {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'Ä', 'Å', 'Ö'}¶
-
encode
(word)[source]¶ Return the SfinxBis code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The SfinxBis value
- Return type
tuple
Examples
>>> pe = SfinxBis()
>>> pe.encode('Christopher')
('K68376',)
>>> pe.encode('Niall')
('N4',)
>>> pe.encode('Smith')
('S53',)
>>> pe.encode('Schmidt')
('S53',)
>>> pe.encode('Johansson')
('J585',)
>>> pe.encode('Sjöberg')
('#162',)
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode_alpha
(word)[source]¶ Return the alphabetic SfinxBis code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic SfinxBis value
- Return type
tuple
Examples
>>> pe = SfinxBis()
>>> pe.encode_alpha('Christopher')
('KRSTFR',)
>>> pe.encode_alpha('Niall')
('NL',)
>>> pe.encode_alpha('Smith')
('SNT',)
>>> pe.encode_alpha('Schmidt')
('SNT',)
>>> pe.encode_alpha('Johansson')
('JNSN',)
>>> pe.encode_alpha('Sjöberg')
('ŠPRK',)
New in version 0.4.0.
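Because both methods return a tuple, any post-processing has to handle every element. A minimal sketch of how the numeric codes in the examples line up with the alphabetic ones through the _alphabetic table. to_alpha is a hypothetical helper, and the assumption is that the translation is a plain character-for-character mapping:

```python
# Table copied from SfinxBis._alphabetic: '#' and the digits 1-9.
ALPHABETIC = {35: 'Š', 49: 'P', 50: 'K', 51: 'T', 52: 'L', 53: 'N',
              54: 'R', 55: 'F', 56: 'S', 57: 'A'}


def to_alpha(codes):
    """Hypothetical helper: translate each code in a SfinxBis tuple."""
    return tuple(code.translate(ALPHABETIC) for code in codes)


single = to_alpha(('K68376',))
sj = to_alpha(('#162',))

print(single)  # ('KRSTFR',)
print(sj)      # ('ŠPRK',)
```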
-
abydos.phonetic.
sfinxbis
(word, max_length=-1)[source]¶ Return the SfinxBis code for a word.
This is a wrapper for SfinxBis.encode().
- Parameters
word (str) -- The word to transform
max_length (int) -- The length of the code returned (defaults to unlimited)
- Returns
The SfinxBis value
- Return type
tuple
Examples
>>> sfinxbis('Christopher')
('K68376',)
>>> sfinxbis('Niall')
('N4',)
>>> sfinxbis('Smith')
('S53',)
>>> sfinxbis('Schmidt')
('S53',)
>>> sfinxbis('Johansson')
('J585',)
>>> sfinxbis('Sjöberg')
('#162',)
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SfinxBis.encode method instead.
-
class
abydos.phonetic.
Waahlin
(encoder=None)[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Wåhlin code.
Wåhlin's first-letter coding is based on the description in [Eri97].
New in version 0.3.6.
Initialize Waahlin instance.
- Parameters
encoder (_Phonetic) -- An initialized phonetic algorithm object
New in version 0.4.0.
-
_transforms
= {1: {'Q': 'K', 'W': 'V', 'Z': 'S', 'Ä': 'E'}, 2: {'AE': 'E', 'CH': 'K', 'DJ': 'J', 'GJ': 'J', 'HJ': 'J', 'HR': 'R', 'HV': 'V', 'HW': 'V', 'KJ': '+', 'LJ': 'J', 'PH': 'F', 'QU': 'KV', 'SJ': '*', 'TJ': '+'}, 3: {'SCH': '*', 'SKJ': '*', 'STJ': '*'}}¶
-
encode
(word, alphabetic=False)[source]¶ Return the Wåhlin code for a word.
- Parameters
word (str) -- The word to transform
alphabetic (bool) -- If True, the encoder will apply its alphabetic form (.encode_alpha rather than .encode)
- Returns
The Wåhlin code value
- Return type
str
Examples
>>> pe = Waahlin()
>>> pe.encode('Christopher')
'KRISTOFER'
>>> pe.encode('Niall')
'NJALL'
>>> pe.encode('Smith')
'SMITH'
>>> pe.encode('Schmidt')
'*MIDT'
New in version 0.4.0.
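The _transforms table is keyed by prefix length, which suggests a longest-match-first lookup on the opening letters of a word. A sketch of that idea follows. The lookup order and the encode_start helper are assumptions for illustration, and the sketch covers only the first-letter step, not whatever the optional encoder does with the remainder:

```python
# Table copied from Waahlin._transforms: prefix length -> {prefix: code}.
TRANSFORMS = {
    1: {'Q': 'K', 'W': 'V', 'Z': 'S', 'Ä': 'E'},
    2: {'AE': 'E', 'CH': 'K', 'DJ': 'J', 'GJ': 'J', 'HJ': 'J', 'HR': 'R',
        'HV': 'V', 'HW': 'V', 'KJ': '+', 'LJ': 'J', 'PH': 'F', 'QU': 'KV',
        'SJ': '*', 'TJ': '+'},
    3: {'SCH': '*', 'SKJ': '*', 'STJ': '*'},
}


def encode_start(word):
    """Hypothetical helper: return (code for the opening letters, remainder)."""
    word = word.upper()
    for length in (3, 2, 1):  # try the longest prefix first
        prefix = word[:length]
        if prefix in TRANSFORMS[length]:
            return TRANSFORMS[length][prefix], word[length:]
    return word[:1], word[1:]  # no rule applies: keep the first letter


print(encode_start('Schmidt'))  # ('*', 'MIDT')
print(encode_start('Smith'))    # ('S', 'MITH')
```

Joining the two parts gives '*MIDT' and 'SMITH', in line with the corresponding examples above.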
-
encode_alpha
(word)[source]¶ Return the alphabetic Wåhlin code for a word.
- Parameters
word (str) -- The word to transform
- Returns
The alphabetic Wåhlin code value
- Return type
str
Examples
>>> pe = Waahlin()
>>> pe.encode_alpha('Christopher')
'KRISTOFER'
>>> pe.encode_alpha('Niall')
'NJALL'
>>> pe.encode_alpha('Smith')
'SMITH'
>>> pe.encode_alpha('Schmidt')
'ŠMIDT'
New in version 0.4.0.
-
class
abydos.phonetic.
Norphone
[source]¶ Bases:
abydos.phonetic._phonetic._Phonetic
Norphone.
The reference implementation by Lars Marius Garshol is available in [Gar15].
Norphone was designed for Norwegian, but this implementation has been extended to support Swedish vowels as well. It also incorporates the "not implemented" rules from the reference implementation's rule set.
New in version 0.3.6.
-
_replacements
= {1: {'D': 'T', 'G': 'K', 'W': 'V', 'X': 'KS', 'Z': 'S'}, 2: {'CH': 'K', 'CK': 'K', 'GH': 'K', 'GJ': 'J', 'HG': 'K', 'HJ': 'J', 'HL': 'L', 'HR': 'R', 'KI': 'X', 'KJ': 'X', 'LD': 'L', 'ND': 'N', 'PH': 'F', 'SJ': 'X', 'TH': 'T'}, 3: {'KEI': 'X', 'SKJ': 'X'}, 4: {'SKEI': 'X'}}¶
-
_uc_v_set
= {'A', 'E', 'I', 'O', 'U', 'Y', 'Ä', 'Å', 'Æ', 'Ö', 'Ø'}¶
-
encode
(word)[source]¶ Return the Norphone code.
- Parameters
word (str) -- The word to transform
- Returns
The Norphone code
- Return type
str
Examples
>>> pe = Norphone()
>>> pe.encode('Hansen')
'HNSN'
>>> pe.encode('Larsen')
'LRSN'
>>> pe.encode('Aagaard')
'ÅKRT'
>>> pe.encode('Braaten')
'BRTN'
>>> pe.encode('Sandvik')
'SNVK'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
-
abydos.phonetic.
norphone
(word)[source]¶ Return the Norphone code.
This is a wrapper for Norphone.encode().
- Parameters
word (str) -- The word to transform
- Returns
The Norphone code
- Return type
str
Examples
>>> norphone('Hansen')
'HNSN'
>>> norphone('Larsen')
'LRSN'
>>> norphone('Aagaard')
'ÅKRT'
>>> norphone('Braaten')
'BRTN'
>>> norphone('Sandvik')
'SNVK'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Norphone.encode method instead.