abydos.fingerprint package

abydos.fingerprint.

The fingerprint package implements string fingerprints such as:

  • Basic fingerprinters originating in OpenRefine <http://openrefine.org>:

    • String (String)

    • Phonetic, which applies a phonetic algorithm and returns the string fingerprint of the result (Phonetic)

    • QGram, which applies Q-gram tokenization and returns the string fingerprint of the result (QGram)

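  • Fingerprints developed by Pollock & Zamora:

    • Skeleton key (SkeletonKey)

    • Omission key (OmissionKey)

  • Fingerprints developed by Cisłak & Grabowski:

    • Occurrence (Occurrence)

    • Occurrence halved (OccurrenceHalved)

    • Count (Count)

    • Position (Position)

  • The Synoname toolcode (SynonameToolcode)

  • Taft's codings:

    • Consonant coding (Consonant)

    • Extract letter list (Extract)

    • Extract - position & frequency (ExtractPositionFrequency)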

  • L.A. County Sheriff's System (LACSS)

  • Library of Congress Cutter table encoding (LCCutter)

  • Burrows-Wheeler transform (BWTF) and run-length encoded Burrows-Wheeler transform (BWTRLEF)

Each fingerprint class has a fingerprint method that takes a string and returns the string's fingerprint:

>>> from abydos.fingerprint import SkeletonKey
>>> sk = SkeletonKey()
>>> sk.fingerprint('orange')
'ORNGAE'
>>> sk.fingerprint('strange')
'STRNGAE'

class abydos.fingerprint.String(joiner=' ')[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

String Fingerprint.

The fingerprint of a string is a string consisting of all of the unique words in a string, alphabetized & concatenated with intervening joiners. This fingerprint is described at [Ope12].

New in version 0.3.6.

Initialize String instance.

Parameters

joiner (str) -- The string that will be placed between each word

New in version 0.4.0.

fingerprint(phrase)[source]

Return string fingerprint.

Parameters

phrase (str) -- The string from which to calculate the fingerprint

Returns

The fingerprint of the phrase

Return type

str

Example

>>> sf = String()
>>> sf.fingerprint('The quick brown fox jumped over the lazy dog.')
'brown dog fox jumped lazy over quick the'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
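
The string fingerprint described above can be sketched in plain Python without the library. This is only an illustration: the helper name string_fingerprint_sketch and the normalization steps (NFKD normalization, lowercasing, dropping characters that are neither alphanumeric nor whitespace) are assumptions chosen to reproduce the documented example, not a statement of how the String class is implemented.

```python
import unicodedata


def string_fingerprint_sketch(phrase, joiner=' '):
    """Illustrative string fingerprint (hypothetical helper, not abydos API)."""
    # Normalize, trim, and lowercase the input.
    phrase = unicodedata.normalize('NFKD', phrase.strip().lower())
    # Keep only alphanumeric characters and whitespace.
    phrase = ''.join(c for c in phrase if c.isalnum() or c.isspace())
    # Unique words, alphabetized, joined with the joiner.
    return joiner.join(sorted(set(phrase.split())))


print(string_fingerprint_sketch('The quick brown fox jumped over the lazy dog.'))
```

Because the words are deduplicated and sorted, word order and repeated words in the input do not affect the result.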

abydos.fingerprint.str_fingerprint(phrase, joiner=' ')[source]

Return string fingerprint.

This is a wrapper for String.fingerprint().

Parameters
  • phrase (str) -- The string from which to calculate the fingerprint

  • joiner (str) -- The string that will be placed between each word

Returns

The fingerprint of the phrase

Return type

str

Example

>>> str_fingerprint('The quick brown fox jumped over the lazy dog.')
'brown dog fox jumped lazy over quick the'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the String.fingerprint method instead.

class abydos.fingerprint.QGram(qval=2, start_stop='', joiner='', skip=0)[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Q-Gram Fingerprint.

A q-gram fingerprint is a string consisting of all of the unique q-grams in a string, alphabetized & concatenated. This fingerprint is described at [Ope12].

New in version 0.3.6.

Initialize Q-Gram fingerprinter.

Parameters
  • qval (int) -- The length of each q-gram (by default 2)

  • start_stop (str) -- The start & stop symbol(s) to concatenate on either end of the phrase, as defined in tokenizer.QGrams

  • joiner (str) -- The string that will be placed between each word

  • skip (int or Iterable) -- The number of characters to skip, can be an integer, range object, or list

New in version 0.4.0.

fingerprint(phrase)[source]

Return Q-Gram fingerprint.

Parameters

phrase (str) -- The string from which to calculate the q-gram fingerprint

Returns

The q-gram fingerprint of the phrase

Return type

str

Examples

>>> qf = QGram()
>>> qf.fingerprint('The quick brown fox jumped over the lazy dog.')
'azbrckdoedeleqerfoheicjukblampnfogovowoxpequrortthuiumvewnxjydzy'
>>> qf.fingerprint('Christopher')
'cherhehrisopphristto'
>>> qf.fingerprint('Niall')
'aliallni'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
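
As a rough sketch of the q-gram fingerprint described above, the following plain-Python function reproduces the documented examples for the default settings. The name qgram_fingerprint_sketch is hypothetical, the normalization (lowercasing and dropping non-alphanumeric characters) is an assumption, and the start_stop and skip options are deliberately left out.

```python
import unicodedata


def qgram_fingerprint_sketch(phrase, qval=2, joiner=''):
    """Illustrative q-gram fingerprint (hypothetical helper, not abydos API)."""
    # Normalize, lowercase, and drop everything except letters and digits.
    phrase = unicodedata.normalize('NFKD', phrase.strip().lower())
    phrase = ''.join(c for c in phrase if c.isalnum())
    # Collect every substring of length qval.
    qgrams = {phrase[i:i + qval] for i in range(len(phrase) - qval + 1)}
    # Unique q-grams, alphabetized, concatenated.
    return joiner.join(sorted(qgrams))


print(qgram_fingerprint_sketch('Christopher'))
```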

abydos.fingerprint.qgram_fingerprint(phrase, qval=2, start_stop='', joiner='')[source]

Return Q-Gram fingerprint.

This is a wrapper for QGram.fingerprint().

Parameters
  • phrase (str) -- The string from which to calculate the q-gram fingerprint

  • qval (int) -- The length of each q-gram (by default 2)

  • start_stop (str) -- The start & stop symbol(s) to concatenate on either end of the phrase, as defined in tokenizer.QGrams

  • joiner (str) -- The string that will be placed between each word

Returns

The q-gram fingerprint of the phrase

Return type

str

Examples

>>> qgram_fingerprint('The quick brown fox jumped over the lazy dog.')
'azbrckdoedeleqerfoheicjukblampnfogovowoxpequrortthuiumvewnxjydzy'
>>> qgram_fingerprint('Christopher')
'cherhehrisopphristto'
>>> qgram_fingerprint('Niall')
'aliallni'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the QGram.fingerprint method instead.

class abydos.fingerprint.Phonetic(phonetic_algorithm=None, joiner=' ')[source]

Bases: abydos.fingerprint._string.String

Phonetic Fingerprint.

A phonetic fingerprint is identical to a standard string fingerprint, as implemented in String, but performs the fingerprinting function after converting the string to its phonetic form, as determined by some phonetic algorithm. This fingerprint is described at [Ope12].

New in version 0.3.6.

Initialize Phonetic instance.

Parameters
  • phonetic_algorithm (function) -- A phonetic algorithm that takes a string and returns a string (presumably a phonetic representation of the original string). By default, this function uses double_metaphone().

  • joiner (str) -- The string that will be placed between each word

New in version 0.4.0.

fingerprint(phrase)[source]

Return the phonetic fingerprint of a phrase.

Parameters

phrase (str) -- The string from which to calculate the phonetic fingerprint

Returns

The phonetic fingerprint of the phrase

Return type

str

Examples

>>> pf = Phonetic()
>>> pf.fingerprint('The quick brown fox jumped over the lazy dog.')
'0 afr fks jmpt kk ls prn tk'
>>> from abydos.phonetic import Soundex
>>> pf = Phonetic(Soundex())
>>> pf.fingerprint('The quick brown fox jumped over the lazy dog.')
'b650 d200 f200 j513 l200 o160 q200 t000'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
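
The two-stage idea behind the phonetic fingerprint (encode each word, then take the string fingerprint of the result) can be sketched as follows. Everything here is an assumption made for illustration: the helper names are hypothetical, and initial_and_length is a toy stand-in for a real phonetic algorithm such as Double Metaphone or Soundex.

```python
import unicodedata


def string_fingerprint_sketch(phrase, joiner=' '):
    """Illustrative string fingerprint (hypothetical helper, not abydos API)."""
    phrase = unicodedata.normalize('NFKD', phrase.strip().lower())
    phrase = ''.join(c for c in phrase if c.isalnum() or c.isspace())
    return joiner.join(sorted(set(phrase.split())))


def phonetic_fingerprint_sketch(phrase, algorithm, joiner=' '):
    """Encode each word with `algorithm`, then fingerprint the encoded phrase."""
    encoded = ' '.join(algorithm(word) for word in phrase.split())
    return string_fingerprint_sketch(encoded, joiner)


def initial_and_length(word):
    """Toy stand-in for a phonetic algorithm: first character plus word length."""
    return word[0] + str(len(word))


print(phonetic_fingerprint_sketch('The quick brown fox', initial_and_length))
```

Any callable that maps a string to a string can be passed in place of the toy encoder.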

abydos.fingerprint.phonetic_fingerprint(phrase, phonetic_algorithm=<function double_metaphone>, joiner=' ', *args, **kwargs)[source]

Return the phonetic fingerprint of a phrase.

This is a wrapper for Phonetic.fingerprint().

Parameters
  • phrase (str) -- The string from which to calculate the phonetic fingerprint

  • phonetic_algorithm (function) -- A phonetic algorithm that takes a string and returns a string (presumably a phonetic representation of the original string). By default, this function uses double_metaphone().

  • joiner (str) -- The string that will be placed between each word

  • *args -- Variable length argument list

  • **kwargs -- Arbitrary keyword arguments

Returns

The phonetic fingerprint of the phrase

Return type

str

Examples

>>> phonetic_fingerprint('The quick brown fox jumped over the lazy dog.')
'0 afr fks jmpt kk ls prn tk'
>>> from abydos.phonetic import soundex
>>> phonetic_fingerprint('The quick brown fox jumped over the lazy dog.',
... phonetic_algorithm=soundex)
'b650 d200 f200 j513 l200 o160 q200 t000'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonetic.fingerprint method instead.

class abydos.fingerprint.OmissionKey[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Omission Key.

The omission key of a word is defined in [PZ84].

New in version 0.3.6.

fingerprint(word)[source]

Return the omission key.

Parameters

word (str) -- The word to transform into its omission key

Returns

The omission key

Return type

str

Examples

>>> ok = OmissionKey()
>>> ok.fingerprint('The quick brown fox jumped over the lazy dog.')
'JKQXZVWYBFMGPDHCLNTREUIOA'
>>> ok.fingerprint('Christopher')
'PHCTSRIOE'
>>> ok.fingerprint('Niall')
'LNIA'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
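
A minimal sketch of an omission key, under stated assumptions: the consonants present in the word are emitted in a fixed order (the constant CONSONANT_ORDER below is recalled from Pollock & Zamora's description and should be checked against [PZ84]), followed by the remaining letters in order of first appearance. The function name is hypothetical; the sketch is meant only to reproduce the examples above.

```python
import unicodedata
from string import ascii_uppercase

# Assumed fixed consonant ordering; verify against [PZ84] before relying on it.
CONSONANT_ORDER = 'JKQXZVWYBFMGPDHCLNTSR'


def omission_key_sketch(word):
    """Illustrative omission key (hypothetical helper, not abydos API)."""
    word = unicodedata.normalize('NFKD', word.upper())
    letters = [c for c in word if c in ascii_uppercase]
    present = set(letters)
    # Consonants in the fixed order, each at most once.
    key = [c for c in CONSONANT_ORDER if c in present]
    # Then the remaining letters (vowels) in order of first appearance.
    for c in letters:
        if c not in CONSONANT_ORDER and c not in key:
            key.append(c)
    return ''.join(key)


print(omission_key_sketch('Christopher'))
```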

abydos.fingerprint.omission_key(word)[source]

Return the omission key.

This is a wrapper for OmissionKey.fingerprint().

Parameters

word (str) -- The word to transform into its omission key

Returns

The omission key

Return type

str

Examples

>>> omission_key('The quick brown fox jumped over the lazy dog.')
'JKQXZVWYBFMGPDHCLNTREUIOA'
>>> omission_key('Christopher')
'PHCTSRIOE'
>>> omission_key('Niall')
'LNIA'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the OmissionKey.fingerprint method instead.

class abydos.fingerprint.SkeletonKey[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Skeleton Key.

The skeleton key of a word is defined in [PZ84].

New in version 0.3.6.

fingerprint(word)[source]

Return the skeleton key.

Parameters

word (str) -- The word to transform into its skeleton key

Returns

The skeleton key

Return type

str

Examples

>>> sk = SkeletonKey()
>>> sk.fingerprint('The quick brown fox jumped over the lazy dog.')
'THQCKBRWNFXJMPDVLZYGEUIOA'
>>> sk.fingerprint('Christopher')
'CHRSTPIOE'
>>> sk.fingerprint('Niall')
'NLIA'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
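
A skeleton key can be sketched in a few lines, assuming the usual description of the technique: the first letter, then the remaining unique consonants in order of appearance, then the unique vowels in order of appearance. The function name is hypothetical and treating only A, E, I, O, and U as vowels is an assumption; the sketch is checked only against the examples above.

```python
import unicodedata
from string import ascii_uppercase


def skeleton_key_sketch(word):
    """Illustrative skeleton key (hypothetical helper, not abydos API)."""
    word = unicodedata.normalize('NFKD', word.upper())
    letters = [c for c in word if c in ascii_uppercase]
    if not letters:
        return ''
    first = letters[0]
    consonants, vowels = [], []
    for c in letters[1:]:
        if c == first:
            continue  # the first letter is never repeated
        bucket = vowels if c in 'AEIOU' else consonants
        if c not in bucket:
            bucket.append(c)
    # First letter, then unique consonants, then unique vowels.
    return first + ''.join(consonants) + ''.join(vowels)


print(skeleton_key_sketch('Christopher'))
```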

abydos.fingerprint.skeleton_key(word)[source]

Return the skeleton key.

This is a wrapper for SkeletonKey.fingerprint().

Parameters

word (str) -- The word to transform into its skeleton key

Returns

The skeleton key

Return type

str

Examples

>>> skeleton_key('The quick brown fox jumped over the lazy dog.')
'THQCKBRWNFXJMPDVLZYGEUIOA'
>>> skeleton_key('Christopher')
'CHRSTPIOE'
>>> skeleton_key('Niall')
'NLIA'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SkeletonKey.fingerprint method instead.

class abydos.fingerprint.Occurrence(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Occurrence Fingerprint.

Based on the occurrence fingerprint from [CislakG17].

New in version 0.3.6.

Initialize Occurrence instance.

Parameters
  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

New in version 0.4.0.

fingerprint(word)[source]

Return the occurrence fingerprint.

Parameters

word (str) -- The word to fingerprint

Returns

The occurrence fingerprint

Return type

int

Examples

>>> of = Occurrence()
>>> bin(of.fingerprint('hat'))
'0b110000100000000'
>>> bin(of.fingerprint('niall'))
'0b10110000100000'
>>> bin(of.fingerprint('colin'))
'0b1110000110000'
>>> bin(of.fingerprint('atcg'))
'0b110000000010000'
>>> bin(of.fingerprint('entreatment'))
'0b1110010010000100'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
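
The documented outputs are consistent with a simple bit layout: one bit per letter of most_common, most frequent letter in the highest bit, set when the letter occurs in the word. The sketch below follows that reading; the function name is hypothetical, and the padding step for n_bits larger than the letter list is an assumption.

```python
MOST_COMMON = tuple('etaoinshrdlcumwf')  # same letters as the documented default


def occurrence_sketch(word, n_bits=16, most_common=MOST_COMMON):
    """Illustrative occurrence fingerprint (hypothetical helper, not abydos API)."""
    letters = set(word)
    used = most_common[:n_bits]
    fingerprint = 0
    for letter in used:
        # Shift in one bit per letter: 1 if present, 0 if absent.
        fingerprint = (fingerprint << 1) | int(letter in letters)
    # Assumed behavior: pad with zero bits if fewer letters than n_bits.
    return fingerprint << (n_bits - len(used))


print(bin(occurrence_sketch('hat')))
```

For 'hat', the bits for t, a, and h are set, which gives the same value as the first example above.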

abydos.fingerprint.occurrence_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))[source]

Return the occurrence fingerprint.

This is a wrapper for Occurrence.fingerprint().

Parameters
  • word (str) -- The word to fingerprint

  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

Returns

The occurrence fingerprint

Return type

int

Examples

>>> bin(occurrence_fingerprint('hat'))
'0b110000100000000'
>>> bin(occurrence_fingerprint('niall'))
'0b10110000100000'
>>> bin(occurrence_fingerprint('colin'))
'0b1110000110000'
>>> bin(occurrence_fingerprint('atcg'))
'0b110000000010000'
>>> bin(occurrence_fingerprint('entreatment'))
'0b1110010010000100'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Occurrence.fingerprint method instead.

class abydos.fingerprint.OccurrenceHalved(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Occurrence Halved Fingerprint.

Based on the occurrence halved fingerprint from [CislakG17].

New in version 0.3.6.

Initialize OccurrenceHalved instance.

Parameters
  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

New in version 0.4.0.

fingerprint(word)[source]

Return the occurrence halved fingerprint.

Based on the occurrence halved fingerprint from [CislakG17].

Parameters

word (str) -- The word to fingerprint

Returns

The occurrence halved fingerprint

Return type

int

Examples

>>> ohf = OccurrenceHalved()
>>> bin(ohf.fingerprint('hat'))
'0b1010000000010'
>>> bin(ohf.fingerprint('niall'))
'0b10010100000'
>>> bin(ohf.fingerprint('colin'))
'0b1001010000'
>>> bin(ohf.fingerprint('atcg'))
'0b10100000000000'
>>> bin(ohf.fingerprint('entreatment'))
'0b1111010000110000'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.fingerprint.occurrence_halved_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))[source]

Return the occurrence halved fingerprint.

This is a wrapper for OccurrenceHalved.fingerprint().

Parameters
  • word (str) -- The word to fingerprint

  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

Returns

The occurrence halved fingerprint

Return type

int

Examples

>>> bin(occurrence_halved_fingerprint('hat'))
'0b1010000000010'
>>> bin(occurrence_halved_fingerprint('niall'))
'0b10010100000'
>>> bin(occurrence_halved_fingerprint('colin'))
'0b1001010000'
>>> bin(occurrence_halved_fingerprint('atcg'))
'0b10100000000000'
>>> bin(occurrence_halved_fingerprint('entreatment'))
'0b1111010000110000'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the OccurrenceHalved.fingerprint method instead.

class abydos.fingerprint.Count(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Count Fingerprint.

Based on the count fingerprint from [CislakG17].

New in version 0.3.6.

Initialize Count instance.

Parameters
  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

New in version 0.4.0.

fingerprint(word)[source]

Return the count fingerprint.

Parameters

word (str) -- The word to fingerprint

Returns

The count fingerprint

Return type

int

Examples

>>> cf = Count()
>>> bin(cf.fingerprint('hat'))
'0b1010000000001'
>>> bin(cf.fingerprint('niall'))
'0b10001010000'
>>> bin(cf.fingerprint('colin'))
'0b101010000'
>>> bin(cf.fingerprint('atcg'))
'0b1010000000000'
>>> bin(cf.fingerprint('entreatment'))
'0b1111010000100000'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
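
The documented outputs match a layout with two bits per letter, covering the first n_bits / 2 letters of most_common, each field holding the number of times the letter occurs. The sketch below follows that reading. The function name is hypothetical, and clamping counts above 3 to 3 is an assumption; the library may treat overflow differently.

```python
from collections import Counter

MOST_COMMON = tuple('etaoinshrdlcumwf')  # same letters as the documented default


def count_sketch(word, n_bits=16, most_common=MOST_COMMON):
    """Illustrative count fingerprint (hypothetical helper, not abydos API)."""
    counts = Counter(word)
    fingerprint = 0
    for letter in most_common[:n_bits // 2]:
        # Two bits per letter; counts above 3 are clamped (an assumption).
        fingerprint = (fingerprint << 2) | min(counts[letter], 3)
    return fingerprint


print(bin(count_sketch('entreatment')))
```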

abydos.fingerprint.count_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))[source]

Return the count fingerprint.

This is a wrapper for Count.fingerprint().

Parameters
  • word (str) -- The word to fingerprint

  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

Returns

The count fingerprint

Return type

int

Examples

>>> bin(count_fingerprint('hat'))
'0b1010000000001'
>>> bin(count_fingerprint('niall'))
'0b10001010000'
>>> bin(count_fingerprint('colin'))
'0b101010000'
>>> bin(count_fingerprint('atcg'))
'0b1010000000000'
>>> bin(count_fingerprint('entreatment'))
'0b1111010000100000'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Count.fingerprint method instead.

class abydos.fingerprint.Position(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'), bits_per_letter=3)[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Position Fingerprint.

Based on the position fingerprint from [CislakG17].

New in version 0.3.6.

Initialize Position instance.

Parameters
  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

  • bits_per_letter (int) -- The bits to assign for letter position

New in version 0.4.0.

fingerprint(word)[source]

Return the position fingerprint.

Parameters

word (str) -- The word to fingerprint

Returns

The position fingerprint

Return type

int

Examples

>>> pf = Position()
>>> bin(pf.fingerprint('hat'))
'0b1110100011111111'
>>> bin(pf.fingerprint('niall'))
'0b1111110101110010'
>>> bin(pf.fingerprint('colin'))
'0b1111111110010111'
>>> bin(pf.fingerprint('atcg'))
'0b1110010001111111'
>>> bin(pf.fingerprint('entreatment'))
'0b101011111111'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.fingerprint.position_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'), bits_per_letter=3)[source]

Return the position fingerprint.

This is a wrapper for Position.fingerprint().

Parameters
  • word (str) -- The word to fingerprint

  • n_bits (int) -- Number of bits in the fingerprint returned

  • most_common (list) -- The most common tokens in the target language, ordered by frequency

  • bits_per_letter (int) -- The bits to assign for letter position

Returns

The position fingerprint

Return type

int

Examples

>>> bin(position_fingerprint('hat'))
'0b1110100011111111'
>>> bin(position_fingerprint('niall'))
'0b1111110101110010'
>>> bin(position_fingerprint('colin'))
'0b1111111110010111'
>>> bin(position_fingerprint('atcg'))
'0b1110010001111111'
>>> bin(position_fingerprint('entreatment'))
'0b101011111111'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Position.fingerprint method instead.

class abydos.fingerprint.SynonameToolcode[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Synoname Toolcode.

Cf. [JPGTrust91][Gro91].

New in version 0.3.6.

fingerprint(lname, fname='', qual='', normalize=0)[source]

Build the Synoname toolcode.

Parameters
  • lname (str) -- Last name

  • fname (str) -- First name (can be blank)

  • qual (str) -- Qualifier

  • normalize (int) -- Normalization mode (0, 1, or 2)

Returns

The transformed names and the synoname toolcode

Return type

tuple

Examples

>>> st = SynonameToolcode()
>>> st.fingerprint('hat')
('hat', '', '0000000003$$h')
>>> st.fingerprint('niall')
('niall', '', '0000000005$$n')
>>> st.fingerprint('colin')
('colin', '', '0000000005$$c')
>>> st.fingerprint('atcg')
('atcg', '', '0000000004$$a')
>>> st.fingerprint('entreatment')
('entreatment', '', '0000000011$$e')
>>> st.fingerprint('Ste.-Marie', 'Count John II', normalize=2)
('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
>>> st.fingerprint('Michelangelo IV', '', 'Workshop of')
('michelangelo iv', '', '3000550015$055b$mi')

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.fingerprint.synoname_toolcode(lname, fname='', qual='', normalize=0)[source]

Build the Synoname toolcode.

This is a wrapper for SynonameToolcode.fingerprint().

Parameters
  • lname (str) -- Last name

  • fname (str) -- First name (can be blank)

  • qual (str) -- Qualifier

  • normalize (int) -- Normalization mode (0, 1, or 2)

Returns

The transformed names and the synoname toolcode

Return type

tuple

Examples

>>> synoname_toolcode('hat')
('hat', '', '0000000003$$h')
>>> synoname_toolcode('niall')
('niall', '', '0000000005$$n')
>>> synoname_toolcode('colin')
('colin', '', '0000000005$$c')
>>> synoname_toolcode('atcg')
('atcg', '', '0000000004$$a')
>>> synoname_toolcode('entreatment')
('entreatment', '', '0000000011$$e')
>>> synoname_toolcode('Ste.-Marie', 'Count John II', normalize=2)
('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
>>> synoname_toolcode('Michelangelo IV', '', 'Workshop of')
('michelangelo iv', '', '3000550015$055b$mi')

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SynonameToolcode.fingerprint method instead.

class abydos.fingerprint.Consonant(variant=1, doubles=True, vowels=None)[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Consonant Coding Fingerprint.

Based on the consonant coding from [Taf70], variants 1, 2, 3, 1-D, 2-D, and 3-D.

New in version 0.4.1.

Initialize Consonant instance.

Parameters
  • variant (int) --

    Selects between Taft's 3 variants, which assign to the vowel set one of:

    1. A, E, I, O, & U

    2. A, E, I, O, U, W, & Y

    3. A, E, I, O, U, W, H, & Y

  • doubles (bool) -- If set to False, multiple consonants in a row are conflated to a single instance.

  • vowels (list, set, or str) -- Setting vowels to a non-None value overrides the variant setting and defines the set of letters to be removed from the input.

New in version 0.4.1.

fingerprint(word)[source]

Return the consonant coding.

Parameters

word (str) -- The word to fingerprint

Returns

The consonant coding

Return type

str

Examples

>>> cf = Consonant()
>>> cf.fingerprint('hat')
'HT'
>>> cf.fingerprint('niall')
'NLL'
>>> cf.fingerprint('colin')
'CLN'
>>> cf.fingerprint('atcg')
'ATCG'
>>> cf.fingerprint('entreatment')
'ENTRTMNT'

New in version 0.4.1.
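
The variant, doubles, and vowels parameters can be illustrated with a small sketch. It assumes, based on the examples above, that the first letter is always kept and that vowels are removed only from the rest of the word. The function name is hypothetical, and collapsing repeated letters before vowel removal when doubles is False is an assumption.

```python
import re

# Vowel sets for Taft's three variants, as listed in the parameter description.
VARIANT_VOWELS = {1: 'AEIOU', 2: 'AEIOUWY', 3: 'AEIOUWHY'}


def consonant_sketch(word, variant=1, doubles=True, vowels=None):
    """Illustrative consonant coding (hypothetical helper, not abydos API)."""
    if vowels is not None:
        removable = {v.upper() for v in vowels}  # explicit set overrides variant
    else:
        removable = set(VARIANT_VOWELS[variant])
    word = word.upper()
    if not doubles:
        # Collapse runs of the same letter to a single instance.
        word = re.sub(r'(.)\1+', r'\1', word)
    # Keep the first letter, drop the selected letters from the remainder.
    return word[:1] + ''.join(c for c in word[1:] if c not in removable)


print(consonant_sketch('entreatment'))
print(consonant_sketch('niall', doubles=False))
```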

class abydos.fingerprint.Extract(letter_list=1)[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Extract Letter List fingerprint.

Based on the extract letter list coding from [Taf70], for lists 1, 2, 3, & 4.

New in version 0.4.1.

Initialize Extract instance.

Parameters

letter_list (int or iterable) -- If an integer (1-4) is supplied, Taft's specified letter lists are used. If an iterable is supplied, its values will be used as the list of letters to remove (in order).

New in version 0.4.1.

fingerprint(word)[source]

Return the extract letter list coding.

Parameters

word (str) -- The word to fingerprint

Returns

The extract letter list coding

Return type

str

Examples

>>> fp = Extract()
>>> fp.fingerprint('hat')
'HAT'
>>> fp.fingerprint('niall')
'NILL'
>>> fp.fingerprint('colin')
'CLIN'
>>> fp.fingerprint('atcg')
'ATCG'
>>> fp.fingerprint('entreatment')
'NRMN'

New in version 0.4.1.

class abydos.fingerprint.ExtractPositionFrequency[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Extract - Position & Frequency fingerprint.

Based on the extract - position & frequency coding from [Taf70].

New in version 0.4.1.

fingerprint(word)[source]

Return the extract - position & frequency coding.

Parameters

word (str) -- The word to fingerprint

Returns

The extract - position & frequency coding

Return type

str

Examples

>>> fp = ExtractPositionFrequency()
>>> fp.fingerprint('hat')
'HAT'
>>> fp.fingerprint('niall')
'NILL'
>>> fp.fingerprint('colin')
'COLN'
>>> fp.fingerprint('atcg')
'ATCG'
>>> fp.fingerprint('entreatment')
'NMNT'

New in version 0.4.1.

class abydos.fingerprint.LACSS[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

L.A. County Sheriff's System fingerprint.

Based on the description from [Taf70].

New in version 0.4.1.

fingerprint(word)[source]

Return the LACSS coding.

Parameters

word (str) -- The word to fingerprint

Returns

The L.A. County Sheriff's System fingerprint

Return type

str

Examples

>>> cf = LACSS()
>>> cf.fingerprint('hat')
'4911211'
>>> cf.fingerprint('niall')
'6488374'
>>> cf.fingerprint('colin')
'3015957'
>>> cf.fingerprint('atcg')
'1772371'
>>> cf.fingerprint('entreatment')
'3882324'

New in version 0.4.1.

class abydos.fingerprint.LCCutter(max_length=64)[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Library of Congress Cutter table encoding.

This is based on the Library of Congress Cutter table encoding scheme, as described at https://www.loc.gov/aba/pcc/053/table.html [oC13]. Handling for numerals is not included.

New in version 0.4.1.

Initialize LCCutter instance.

Parameters

max_length (int) -- The length of the code returned (defaults to 64)

New in version 0.4.1.

fingerprint(word)[source]

Return the Library of Congress Cutter table encoding of a word.

Parameters

word (str) -- The word to fingerprint

Returns

The Library of Congress Cutter table encoding

Return type

str

Examples

>>> cf = LCCutter()
>>> cf.fingerprint('hat')
'H38'
>>> cf.fingerprint('niall')
'N5355'
>>> cf.fingerprint('colin')
'C6556'
>>> cf.fingerprint('atcg')
'A834'
>>> cf.fingerprint('entreatment')
'E5874386468'

New in version 0.4.1.

class abydos.fingerprint.BWTF(terminator='\x00')[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Burrows-Wheeler transform fingerprint.

This is a wrapper of the BWT class in abydos.compression, which provides the same interface as other descendants of _Fingerprint.

New in version 0.4.1.

Initialize BWTF instance.

Parameters

terminator (str) -- A character added to signal the end of the string

New in version 0.4.1.

fingerprint(word)[source]

Return the Burrows-Wheeler transform of a word.

Parameters

word (str) -- The word to fingerprint

Returns

The Burrows-Wheeler transform of a word

Return type

str

Examples

>>> fp = BWTF()
>>> fp.fingerprint('hat')
'th\x00a'
>>> fp.fingerprint('niall')
'linla\x00'
>>> fp.fingerprint('colin')
'n\x00loic'
>>> fp.fingerprint('atcg')
'g\x00tca'
>>> fp.fingerprint('entreatment')
'term\x00teetnan'

New in version 0.4.1.
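
For readers unfamiliar with the transform, here is a naive sketch of a Burrows-Wheeler transform that reproduces the examples above. It is not the BWT class from abydos.compression; the function name is hypothetical, and sorting all rotations is the textbook approach, which is quadratic in memory and suitable only for short strings.

```python
def bwt_sketch(word, terminator='\0'):
    """Naive Burrows-Wheeler transform (hypothetical helper, not abydos API)."""
    text = word + terminator
    # Build every rotation of the terminated string and sort them.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table.
    return ''.join(rotation[-1] for rotation in rotations)


print(repr(bwt_sketch('hat')))
```

The terminator sorts before every letter, which is what makes the transform reversible.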

class abydos.fingerprint.BWTRLEF(terminator='\x00')[source]

Bases: abydos.fingerprint._fingerprint._Fingerprint

Burrows-Wheeler transform plus run-length encoding fingerprint.

This is a wrapper of the BWT and RLE classes in abydos.compression, which provides the same interface as other descendants of _Fingerprint.

New in version 0.4.1.

Initialize BWTRLEF instance.

Parameters

terminator (str) -- A character added to signal the end of the string

New in version 0.4.1.

fingerprint(word)[source]

Return the run-length encoded Burrows-Wheeler transform of a word.

Parameters

word (str) -- The word to fingerprint

Returns

The run-length encoded Burrows-Wheeler transform of a word

Return type

str

Examples

>>> fp = BWTRLEF()
>>> fp.fingerprint('hat')
'th\x00a'
>>> fp.fingerprint('niall')
'linla\x00'
>>> fp.fingerprint('colin')
'n\x00loic'
>>> fp.fingerprint('atcg')
'g\x00tca'
>>> fp.fingerprint('entreatment')
'term\x00teetnan'

New in version 0.4.1.