abydos.fingerprint package

The fingerprint package implements string fingerprints such as:

  • Basic fingerprinters originating in OpenRefine <http://openrefine.org>:

    • String (String)
    • Phonetic, which applies a phonetic algorithm and returns the string fingerprint of the result (Phonetic)
    • QGram, which applies Q-gram tokenization and returns the string fingerprint of the result (QGram)
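  • Fingerprints developed by Pollock & Zamora:

    • Skeleton key (SkeletonKey)
    • Omission key (OmissionKey)
  • Fingerprints developed by Cisłak & Grabowski:

    • Occurrence (Occurrence)
    • Occurrence halved (OccurrenceHalved)
    • Count (Count)
    • Position (Position)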
  • The Synoname toolcode (SynonameToolcode)

Each fingerprint class has a fingerprint method that takes a string and returns the string's fingerprint:

>>> from abydos.fingerprint import SkeletonKey
>>> sk = SkeletonKey()
>>> sk.fingerprint('orange')
'ORNGAE'
>>> sk.fingerprint('strange')
'STRNGAE'

class abydos.fingerprint.String

Bases: abydos.fingerprint._fingerprint._Fingerprint

String Fingerprint.

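The string fingerprint of a phrase is a string consisting of all of the unique words in the phrase, alphabetized & concatenated with intervening joiners. As the example below shows, words are lowercased and punctuation is removed before sorting. This fingerprint is described at [Ope12].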

fingerprint(phrase, joiner=' ')

Return string fingerprint.

Parameters:
  • phrase (str) -- The string from which to calculate the fingerprint
  • joiner (str) -- The string that will be placed between each word
Returns:

The fingerprint of the phrase

Return type:

str

Example

>>> sf = String()
>>> sf.fingerprint('The quick brown fox jumped over the lazy dog.')
'brown dog fox jumped lazy over quick the'
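The steps described for String can be approximated in plain Python. This is a minimal sketch, not the library's implementation: the name sketch_str_fingerprint is hypothetical, and the normalization (lowercasing and dropping every character that is not a letter, digit, or whitespace) is an assumption inferred from the example above.

```python
def sketch_str_fingerprint(phrase, joiner=' '):
    """Approximate a string fingerprint (illustrative, not abydos code)."""
    # Assumed normalization: lowercase, then keep letters, digits, whitespace.
    cleaned = ''.join(c for c in phrase.lower() if c.isalnum() or c.isspace())
    # Deduplicate the words, sort them, and join them with the joiner.
    return joiner.join(sorted(set(cleaned.split())))


print(sketch_str_fingerprint('The quick brown fox jumped over the lazy dog.'))
# prints: brown dog fox jumped lazy over quick the
```

Because the words are deduplicated and sorted, phrases that differ only in word order, case, or punctuation map to the same fingerprint.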
abydos.fingerprint.str_fingerprint(phrase, joiner=' ')

Return string fingerprint.

This is a wrapper for String.fingerprint().

Parameters:
  • phrase (str) -- The string from which to calculate the fingerprint
  • joiner (str) -- The string that will be placed between each word
Returns:

The fingerprint of the phrase

Return type:

str

Example

>>> str_fingerprint('The quick brown fox jumped over the lazy dog.')
'brown dog fox jumped lazy over quick the'
class abydos.fingerprint.QGram

Bases: abydos.fingerprint._fingerprint._Fingerprint

Q-Gram Fingerprint.

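A q-gram fingerprint is a string consisting of all of the unique q-grams in a string, alphabetized & concatenated. As the examples below show, the string is lowercased and stripped of spaces and punctuation before the q-grams are extracted. This fingerprint is described at [Ope12].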

fingerprint(phrase, qval=2, start_stop='', joiner='')

Return Q-Gram fingerprint.

Parameters:
  • phrase (str) -- The string from which to calculate the q-gram fingerprint
  • qval (int) -- The length of each q-gram (by default 2)
  • start_stop (str) -- The start & stop symbol(s) to concatenate on either end of the phrase, as defined in tokenizer.QGrams
  • joiner (str) -- The string that will be placed between each q-gram
Returns:

The q-gram fingerprint of the phrase

Return type:

str

Examples

>>> qf = QGram()
>>> qf.fingerprint('The quick brown fox jumped over the lazy dog.')
'azbrckdoedeleqerfoheicjukblampnfogovowoxpequrortthuiumvewnxjydzy'
>>> qf.fingerprint('Christopher')
'cherhehrisopphristto'
>>> qf.fingerprint('Niall')
'aliallni'
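The q-gram fingerprint can be approximated in a few lines. This is a hedged sketch rather than the library's code: sketch_qgram_fingerprint is a hypothetical name, the normalization is inferred from the examples above, and the start_stop parameter is deliberately left out.

```python
def sketch_qgram_fingerprint(phrase, qval=2, joiner=''):
    """Approximate a q-gram fingerprint (illustrative, not abydos code)."""
    # Assumed normalization: lowercase and drop everything that is not
    # a letter or digit (including spaces).
    cleaned = ''.join(c for c in phrase.lower() if c.isalnum())
    # Collect each distinct substring of length qval.
    qgrams = {cleaned[i:i + qval] for i in range(len(cleaned) - qval + 1)}
    # Sort the q-grams and concatenate them.
    return joiner.join(sorted(qgrams))


print(sketch_qgram_fingerprint('Christopher'))  # prints: cherhehrisopphristto
print(sketch_qgram_fingerprint('Niall'))        # prints: aliallni
```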
abydos.fingerprint.qgram_fingerprint(phrase, qval=2, start_stop='', joiner='')

Return Q-Gram fingerprint.

This is a wrapper for QGram.fingerprint().

Parameters:
  • phrase (str) -- The string from which to calculate the q-gram fingerprint
  • qval (int) -- The length of each q-gram (by default 2)
  • start_stop (str) -- The start & stop symbol(s) to concatenate on either end of the phrase, as defined in tokenizer.QGrams
  • joiner (str) -- The string that will be placed between each q-gram
Returns:

The q-gram fingerprint of the phrase

Return type:

str

Examples

>>> qgram_fingerprint('The quick brown fox jumped over the lazy dog.')
'azbrckdoedeleqerfoheicjukblampnfogovowoxpequrortthuiumvewnxjydzy'
>>> qgram_fingerprint('Christopher')
'cherhehrisopphristto'
>>> qgram_fingerprint('Niall')
'aliallni'
class abydos.fingerprint.Phonetic

Bases: abydos.fingerprint._string.String

Phonetic Fingerprint.

A phonetic fingerprint is identical to a standard string fingerprint, as implemented in String, but performs the fingerprinting function after converting the string to its phonetic form, as determined by some phonetic algorithm. This fingerprint is described at [Ope12].

fingerprint(phrase, phonetic_algorithm=<function double_metaphone>, joiner=' ', *args, **kwargs)

Return the phonetic fingerprint of a phrase.

Parameters:
  • phrase (str) -- The string from which to calculate the phonetic fingerprint
  • phonetic_algorithm (function) -- A phonetic algorithm that takes a string and returns a string (presumably a phonetic representation of the original string). By default, this function uses double_metaphone().
  • joiner (str) -- The string that will be placed between each word
  • *args -- Variable length argument list
  • **kwargs -- Arbitrary keyword arguments
Returns:

The phonetic fingerprint of the phrase

Return type:

str

Examples

>>> pf = Phonetic()
>>> pf.fingerprint('The quick brown fox jumped over the lazy dog.')
'0 afr fks jmpt kk ls prn tk'
>>> from abydos.phonetic import soundex
>>> pf.fingerprint('The quick brown fox jumped over the lazy dog.',
... phonetic_algorithm=soundex)
'b650 d200 f200 j513 l200 o160 q200 t000'
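The idea of encoding each word phonetically and then taking a string fingerprint of the codes can be sketched without any phonetic library. Everything here is an assumption for illustration: toy_phonetic is a hypothetical stand-in (it is not Double Metaphone or Soundex), and sketch_phonetic_fingerprint is not the library's implementation.

```python
def toy_phonetic(word):
    """Hypothetical stand-in for a real phonetic algorithm."""
    word = ''.join(c for c in word.lower() if c.isalpha())
    # Keep the first letter and drop every later vowel.
    return word[:1] + ''.join(c for c in word[1:] if c not in 'aeiou')


def sketch_phonetic_fingerprint(phrase, phonetic_algorithm=toy_phonetic,
                                joiner=' '):
    """Encode each word, then deduplicate, sort, and join the codes."""
    codes = {phonetic_algorithm(word) for word in phrase.split()}
    codes.discard('')  # ignore words that encode to nothing
    return joiner.join(sorted(codes))


print(sketch_phonetic_fingerprint('Jon John jon'))  # prints: jhn jn
```

Any callable that maps a string to a string can be passed as phonetic_algorithm, which mirrors how the documented parameter is described.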
abydos.fingerprint.phonetic_fingerprint(phrase, phonetic_algorithm=<function double_metaphone>, joiner=' ', *args, **kwargs)

Return the phonetic fingerprint of a phrase.

This is a wrapper for Phonetic.fingerprint().

Parameters:
  • phrase (str) -- The string from which to calculate the phonetic fingerprint
  • phonetic_algorithm (function) -- A phonetic algorithm that takes a string and returns a string (presumably a phonetic representation of the original string). By default, this function uses double_metaphone().
  • joiner (str) -- The string that will be placed between each word
  • *args -- Variable length argument list
  • **kwargs -- Arbitrary keyword arguments
Returns:

The phonetic fingerprint of the phrase

Return type:

str

Examples

>>> phonetic_fingerprint('The quick brown fox jumped over the lazy dog.')
'0 afr fks jmpt kk ls prn tk'
>>> from abydos.phonetic import soundex
>>> phonetic_fingerprint('The quick brown fox jumped over the lazy dog.',
... phonetic_algorithm=soundex)
'b650 d200 f200 j513 l200 o160 q200 t000'
class abydos.fingerprint.OmissionKey

Bases: abydos.fingerprint._fingerprint._Fingerprint

Omission Key.

The omission key of a word is defined in [PZ84].

fingerprint(word)

Return the omission key.

Parameters:
  • word (str) -- The word to transform into its omission key
Returns:

The omission key

Return type:

str

Examples

>>> ok = OmissionKey()
>>> ok.fingerprint('The quick brown fox jumped over the lazy dog.')
'JKQXZVWYBFMGPDHCLNTREUIOA'
>>> ok.fingerprint('Christopher')
'PHCTSRIOE'
>>> ok.fingerprint('Niall')
'LNIA'
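The examples above are consistent with a two-step construction: emit the consonants present in the word in a fixed order, then append the word's vowels in order of first appearance. The sketch below follows that reading. The consonant order in CONSONANT_ORDER is an assumption that agrees with the documented outputs; the names are hypothetical, and this is not the library's code.

```python
# Assumed fixed consonant order; consistent with the documented examples.
CONSONANT_ORDER = 'JKQXZVWYBFMGPDHCLNTSR'


def sketch_omission_key(word):
    """Approximate an omission key (illustrative, not abydos code)."""
    # Uppercase and keep only the letters A-Z.
    letters = [c for c in word.upper() if 'A' <= c <= 'Z']
    present = set(letters)
    # Consonants come first, in the fixed order, each at most once.
    key = [c for c in CONSONANT_ORDER if c in present]
    # Vowels follow, in order of first appearance, each at most once.
    for c in letters:
        if c not in CONSONANT_ORDER and c not in key:
            key.append(c)
    return ''.join(key)


print(sketch_omission_key('Christopher'))  # prints: PHCTSRIOE
print(sketch_omission_key('Niall'))        # prints: LNIA
```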
abydos.fingerprint.omission_key(word)

Return the omission key.

This is a wrapper for OmissionKey.fingerprint().

Parameters:
  • word (str) -- The word to transform into its omission key
Returns:

The omission key

Return type:

str

Examples

>>> omission_key('The quick brown fox jumped over the lazy dog.')
'JKQXZVWYBFMGPDHCLNTREUIOA'
>>> omission_key('Christopher')
'PHCTSRIOE'
>>> omission_key('Niall')
'LNIA'
class abydos.fingerprint.SkeletonKey

Bases: abydos.fingerprint._fingerprint._Fingerprint

Skeleton Key.

The skeleton key of a word is defined in [PZ84].

fingerprint(word)

Return the skeleton key.

Parameters:
  • word (str) -- The word to transform into its skeleton key
Returns:

The skeleton key

Return type:

str

Examples

>>> sk = SkeletonKey()
>>> sk.fingerprint('The quick brown fox jumped over the lazy dog.')
'THQCKBRWNFXJMPDVLZYGEUIOA'
>>> sk.fingerprint('Christopher')
'CHRSTPIOE'
>>> sk.fingerprint('Niall')
'NLIA'
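The examples above are consistent with this construction: the first letter, then the remaining distinct consonants in order of appearance, then the distinct vowels in order of appearance. The sketch below follows that reading. It is an illustration under that assumption, with hypothetical names, and not the library's implementation.

```python
VOWELS = set('AEIOU')  # Y is treated as a consonant in this sketch


def sketch_skeleton_key(word):
    """Approximate a skeleton key (illustrative, not abydos code)."""
    # Uppercase and keep only the letters A-Z.
    letters = [c for c in word.upper() if 'A' <= c <= 'Z']
    if not letters:
        return ''
    first = letters[0]
    consonants, vowels = [], []
    for c in letters[1:]:
        if c == first:
            continue  # the first letter is never repeated
        bucket = vowels if c in VOWELS else consonants
        if c not in bucket:
            bucket.append(c)
    return first + ''.join(consonants) + ''.join(vowels)


print(sketch_skeleton_key('orange'))       # prints: ORNGAE
print(sketch_skeleton_key('Christopher'))  # prints: CHRSTPIOE
```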
abydos.fingerprint.skeleton_key(word)

Return the skeleton key.

This is a wrapper for SkeletonKey.fingerprint().

Parameters:
  • word (str) -- The word to transform into its skeleton key
Returns:

The skeleton key

Return type:

str

Examples

>>> skeleton_key('The quick brown fox jumped over the lazy dog.')
'THQCKBRWNFXJMPDVLZYGEUIOA'
>>> skeleton_key('Christopher')
'CHRSTPIOE'
>>> skeleton_key('Niall')
'NLIA'
class abydos.fingerprint.Occurrence

Bases: abydos.fingerprint._fingerprint._Fingerprint

Occurrence Fingerprint.

Based on the occurrence fingerprint from [CislakG17].

fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))

Return the occurrence fingerprint.

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
Returns:

The occurrence fingerprint

Return type:

int

Examples

>>> of = Occurrence()
>>> bin(of.fingerprint('hat'))
'0b110000100000000'
>>> bin(of.fingerprint('niall'))
'0b10110000100000'
>>> bin(of.fingerprint('colin'))
'0b1110000110000'
>>> bin(of.fingerprint('atcg'))
'0b110000000010000'
>>> bin(of.fingerprint('entreatment'))
'0b1110010010000100'
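The outputs above are consistent with a simple bit layout: one bit per entry of most_common, most significant bit first, set when the letter occurs anywhere in the word. The sketch below implements that reading. It is an assumption-based illustration with hypothetical names, not the library's code.

```python
MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def sketch_occurrence_fingerprint(word, n_bits=16, most_common=MOST_COMMON):
    """Approximate an occurrence fingerprint (illustrative only)."""
    letters = set(word)
    fingerprint = 0
    # The i-th most common letter controls the i-th bit from the top.
    for index, letter in enumerate(most_common[:n_bits]):
        if letter in letters:
            fingerprint |= 1 << (n_bits - 1 - index)
    return fingerprint


print(bin(sketch_occurrence_fingerprint('hat')))    # prints: 0b110000100000000
print(bin(sketch_occurrence_fingerprint('niall')))  # prints: 0b10110000100000
```

Leading zero bits are not shown by bin(), which is why the printed strings are shorter than 16 digits.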
abydos.fingerprint.occurrence_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))

Return the occurrence fingerprint.

This is a wrapper for Occurrence.fingerprint().

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
Returns:

The occurrence fingerprint

Return type:

int

Examples

>>> bin(occurrence_fingerprint('hat'))
'0b110000100000000'
>>> bin(occurrence_fingerprint('niall'))
'0b10110000100000'
>>> bin(occurrence_fingerprint('colin'))
'0b1110000110000'
>>> bin(occurrence_fingerprint('atcg'))
'0b110000000010000'
>>> bin(occurrence_fingerprint('entreatment'))
'0b1110010010000100'
class abydos.fingerprint.OccurrenceHalved

Bases: abydos.fingerprint._fingerprint._Fingerprint

Occurrence Halved Fingerprint.

Based on the occurrence halved fingerprint from [CislakG17].

fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))

Return the occurrence halved fingerprint.

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
Returns:

The occurrence halved fingerprint

Return type:

int

Examples

>>> ohf = OccurrenceHalved()
>>> bin(ohf.fingerprint('hat'))
'0b1010000000010'
>>> bin(ohf.fingerprint('niall'))
'0b10010100000'
>>> bin(ohf.fingerprint('colin'))
'0b1001010000'
>>> bin(ohf.fingerprint('atcg'))
'0b10100000000000'
>>> bin(ohf.fingerprint('entreatment'))
'0b1111010000110000'
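The outputs above are consistent with splitting the word into two halves and spending two bits per letter: the first bit records occurrence in the first half, the second bit occurrence in the second half. The sketch below follows that reading. It assumes an even n_bits and at least n_bits // 2 entries in most_common; the names are hypothetical, and this is not the library's code.

```python
MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def sketch_occurrence_halved_fingerprint(word, n_bits=16,
                                         most_common=MOST_COMMON):
    """Approximate an occurrence halved fingerprint (illustrative only)."""
    half = len(word) // 2  # for odd lengths the second half is longer
    first_half, second_half = set(word[:half]), set(word[half:])
    fingerprint = 0
    # Two bits per letter, so only n_bits // 2 letters fit.
    for letter in most_common[:n_bits // 2]:
        fingerprint <<= 2
        if letter in first_half:
            fingerprint |= 0b10
        if letter in second_half:
            fingerprint |= 0b01
    return fingerprint


print(bin(sketch_occurrence_halved_fingerprint('hat')))  # prints: 0b1010000000010
```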
abydos.fingerprint.occurrence_halved_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))

Return the occurrence halved fingerprint.

This is a wrapper for OccurrenceHalved.fingerprint().

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
Returns:

The occurrence halved fingerprint

Return type:

int

Examples

>>> bin(occurrence_halved_fingerprint('hat'))
'0b1010000000010'
>>> bin(occurrence_halved_fingerprint('niall'))
'0b10010100000'
>>> bin(occurrence_halved_fingerprint('colin'))
'0b1001010000'
>>> bin(occurrence_halved_fingerprint('atcg'))
'0b10100000000000'
>>> bin(occurrence_halved_fingerprint('entreatment'))
'0b1111010000110000'
class abydos.fingerprint.Count

Bases: abydos.fingerprint._fingerprint._Fingerprint

Count Fingerprint.

Based on the count fingerprint from [CislakG17].

fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))

Return the count fingerprint.

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
Returns:

The count fingerprint

Return type:

int

Examples

>>> cf = Count()
>>> bin(cf.fingerprint('hat'))
'0b1010000000001'
>>> bin(cf.fingerprint('niall'))
'0b10001010000'
>>> bin(cf.fingerprint('colin'))
'0b101010000'
>>> bin(cf.fingerprint('atcg'))
'0b1010000000000'
>>> bin(cf.fingerprint('entreatment'))
'0b1111010000100000'
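The outputs above are consistent with storing a two-bit count for each of the first n_bits // 2 entries of most_common. The sketch below implements that reading. It assumes an even n_bits and enough entries in most_common; the names are hypothetical, and this is not the library's code.

```python
from collections import Counter

MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def sketch_count_fingerprint(word, n_bits=16, most_common=MOST_COMMON):
    """Approximate a count fingerprint (illustrative only)."""
    counts = Counter(word)
    fingerprint = 0
    for letter in most_common[:n_bits // 2]:
        # Keep only the low two bits of the count, so 4 wraps to 0.
        fingerprint = (fingerprint << 2) | (counts[letter] & 0b11)
    return fingerprint


print(bin(sketch_count_fingerprint('entreatment')))  # prints: 0b1111010000100000
```

In 'entreatment' the letters e and t each occur three times (binary 11), a once (01), and n twice (10), which accounts for the set bits in the printed value.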
abydos.fingerprint.count_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))

Return the count fingerprint.

This is a wrapper for Count.fingerprint().

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
Returns:

The count fingerprint

Return type:

int

Examples

>>> bin(count_fingerprint('hat'))
'0b1010000000001'
>>> bin(count_fingerprint('niall'))
'0b10001010000'
>>> bin(count_fingerprint('colin'))
'0b101010000'
>>> bin(count_fingerprint('atcg'))
'0b1010000000000'
>>> bin(count_fingerprint('entreatment'))
'0b1111010000100000'
class abydos.fingerprint.Position

Bases: abydos.fingerprint._fingerprint._Fingerprint

Position Fingerprint.

Based on the position fingerprint from [CislakG17].

fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'), bits_per_letter=3)

Return the position fingerprint.

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
  • bits_per_letter (int) -- The number of bits used to encode each letter's position
Returns:

The position fingerprint

Return type:

int

Examples

>>> pf = Position()
>>> bin(pf.fingerprint('hat'))
'0b1110100011111111'
>>> bin(pf.fingerprint('niall'))
'0b1111110101110010'
>>> bin(pf.fingerprint('colin'))
'0b1111111110010111'
>>> bin(pf.fingerprint('atcg'))
'0b1110010001111111'
>>> bin(pf.fingerprint('entreatment'))
'0b101011111111'
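The outputs above are consistent with storing, for each entry of most_common in turn, the zero-based index of the letter's first occurrence in bits_per_letter bits, with an all-ones field for absent letters and a truncated field when fewer bits remain. The sketch below implements that reading. The names are hypothetical, and this is an illustration rather than the library's code.

```python
MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def sketch_position_fingerprint(word, n_bits=16, most_common=MOST_COMMON,
                                bits_per_letter=3):
    """Approximate a position fingerprint (illustrative only)."""
    max_pos = 2 ** bits_per_letter - 1
    # First index of each letter, capped at the largest storable value.
    first_pos = {}
    for pos, letter in enumerate(word):
        if letter not in first_pos:
            first_pos[letter] = min(pos, max_pos)
    fingerprint = 0
    remaining = n_bits
    for letter in most_common:
        if remaining <= 0:
            break
        # The last field may be narrower than bits_per_letter.
        width = min(bits_per_letter, remaining)
        value = min(first_pos.get(letter, max_pos), 2 ** width - 1)
        fingerprint = (fingerprint << width) | value
        remaining -= width
    # If most_common runs out early, pad the low bits with ones.
    return (fingerprint << remaining) | (2 ** remaining - 1)


print(bin(sketch_position_fingerprint('hat')))  # prints: 0b1110100011111111
```

With the defaults, five letters get a full three-bit field and the sixth (n) gets the single remaining bit.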
abydos.fingerprint.position_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'), bits_per_letter=3)

Return the position fingerprint.

This is a wrapper for Position.fingerprint().

Parameters:
  • word (str) -- The word to fingerprint
  • n_bits (int) -- Number of bits in the fingerprint returned
  • most_common (list) -- The most common tokens in the target language, ordered by frequency
  • bits_per_letter (int) -- The number of bits used to encode each letter's position
Returns:

The position fingerprint

Return type:

int

Examples

>>> bin(position_fingerprint('hat'))
'0b1110100011111111'
>>> bin(position_fingerprint('niall'))
'0b1111110101110010'
>>> bin(position_fingerprint('colin'))
'0b1111111110010111'
>>> bin(position_fingerprint('atcg'))
'0b1110010001111111'
>>> bin(position_fingerprint('entreatment'))
'0b101011111111'
class abydos.fingerprint.SynonameToolcode

Bases: abydos.fingerprint._fingerprint._Fingerprint

Synoname Toolcode.

Cf. [JPGTrust91][Gro91].

fingerprint(lname, fname='', qual='', normalize=0)

Build the Synoname toolcode.

Parameters:
  • lname (str) -- Last name
  • fname (str) -- First name (can be blank)
  • qual (str) -- Qualifier
  • normalize (int) -- Normalization mode (0, 1, or 2)
Returns:

The transformed names and the synoname toolcode

Return type:

tuple

Examples

>>> st = SynonameToolcode()
>>> st.fingerprint('hat')
('hat', '', '0000000003$$h')
>>> st.fingerprint('niall')
('niall', '', '0000000005$$n')
>>> st.fingerprint('colin')
('colin', '', '0000000005$$c')
>>> st.fingerprint('atcg')
('atcg', '', '0000000004$$a')
>>> st.fingerprint('entreatment')
('entreatment', '', '0000000011$$e')
>>> st.fingerprint('Ste.-Marie', 'Count John II', normalize=2)
('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
>>> st.fingerprint('Michelangelo IV', '', 'Workshop of')
('michelangelo iv', '', '3000550015$055b$mi')
abydos.fingerprint.synoname_toolcode(lname, fname='', qual='', normalize=0)

Build the Synoname toolcode.

This is a wrapper for SynonameToolcode.fingerprint().

Parameters:
  • lname (str) -- Last name
  • fname (str) -- First name (can be blank)
  • qual (str) -- Qualifier
  • normalize (int) -- Normalization mode (0, 1, or 2)
Returns:

The transformed names and the synoname toolcode

Return type:

tuple

Examples

>>> synoname_toolcode('hat')
('hat', '', '0000000003$$h')
>>> synoname_toolcode('niall')
('niall', '', '0000000005$$n')
>>> synoname_toolcode('colin')
('colin', '', '0000000005$$c')
>>> synoname_toolcode('atcg')
('atcg', '', '0000000004$$a')
>>> synoname_toolcode('entreatment')
('entreatment', '', '0000000011$$e')
>>> synoname_toolcode('Ste.-Marie', 'Count John II', normalize=2)
('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
>>> synoname_toolcode('Michelangelo IV', '', 'Workshop of')
('michelangelo iv', '', '3000550015$055b$mi')