abydos.fingerprint package
abydos.fingerprint.
The fingerprint package implements string fingerprints such as:
- Basic fingerprinters originating in OpenRefine <http://openrefine.org>:
  - String fingerprint (String)
  - Q-gram fingerprint (QGram)
  - Phonetic fingerprint (Phonetic)
- Fingerprints developed by Pollock & Zamora:
  - Skeleton key (SkeletonKey)
  - Omission key (OmissionKey)
- Fingerprints developed by Cisłak & Grabowski:
  - Occurrence (Occurrence)
  - Occurrence halved (OccurrenceHalved)
  - Count (Count)
  - Position (Position)
- The Synoname toolcode (SynonameToolcode)
- Taft's codings:
  - Consonant coding (Consonant)
  - Extract - letter list (Extract)
  - Extract - position & frequency (ExtractPositionFrequency)
- L.A. County Sheriff's System (LACSS)
- Library of Congress Cutter table encoding (LCCutter)
- Burrows-Wheeler transform (BWTF) and run-length encoded Burrows-Wheeler transform (BWTRLEF)
Each fingerprint class has a fingerprint method that takes a string and returns the string's fingerprint:
>>> sk = SkeletonKey()
>>> sk.fingerprint('orange')
'ORNGAE'
>>> sk.fingerprint('strange')
'STRNGAE'
-
class abydos.fingerprint.String(joiner=' ')
Bases: abydos.fingerprint._fingerprint._Fingerprint
String Fingerprint.
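The fingerprint of a string is a string consisting of all of the unique words in the string, alphabetized and concatenated with intervening joiners. The phrase is lowercased and stripped of punctuation first, so that, for example, 'The' and 'the' count as the same word. This fingerprint is described at [Ope12].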
New in version 0.3.6.
Initialize String instance.
- Parameters
joiner (str) -- The string that will be placed between each word
New in version 0.4.0.
-
fingerprint(phrase)
Return string fingerprint.
- Parameters
phrase (str) -- The string from which to calculate the fingerprint
- Returns
The fingerprint of the phrase
- Return type
str
Example
>>> sf = String()
>>> sf.fingerprint('The quick brown fox jumped over the lazy dog.')
'brown dog fox jumped lazy over quick the'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
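The following is a minimal sketch of the string fingerprint in plain Python, written only to reproduce the example above; it is not the abydos source. The helper name `string_fingerprint_sketch` is hypothetical, and the NFKD normalization step (which lets accented letters collapse to their base letters) is an assumption.

```python
import unicodedata


def string_fingerprint_sketch(phrase, joiner=' '):
    """Hypothetical re-creation of the String fingerprint (not abydos code)."""
    # Assumption: lowercase, then decompose accented characters (NFKD).
    phrase = unicodedata.normalize('NFKD', phrase.strip().lower())
    # Keep only letters, digits, and whitespace; punctuation and the
    # combining accent marks produced by NFKD are dropped.
    phrase = ''.join(c for c in phrase if c.isalnum() or c.isspace())
    # Keep each word once, sort alphabetically, and join with the joiner.
    return joiner.join(sorted(set(phrase.split())))


print(string_fingerprint_sketch('The quick brown fox jumped over the lazy dog.'))
```

Because word order, case, punctuation, and repetition are all discarded, variants such as 'the dog' and 'Dog, the' receive the same key, which is what makes this fingerprint useful for grouping near-duplicate strings.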
-
abydos.fingerprint.str_fingerprint(phrase, joiner=' ')
Return string fingerprint.
This is a wrapper for String.fingerprint().
- Parameters
phrase (str) -- The string from which to calculate the fingerprint
joiner (str) -- The string that will be placed between each word
- Returns
The fingerprint of the phrase
- Return type
str
Example
>>> str_fingerprint('The quick brown fox jumped over the lazy dog.')
'brown dog fox jumped lazy over quick the'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the String.fingerprint method instead.
-
class abydos.fingerprint.QGram(qval=2, start_stop='', joiner='', skip=0)
Bases: abydos.fingerprint._fingerprint._Fingerprint
Q-Gram Fingerprint.
A q-gram fingerprint is a string consisting of all of the unique q-grams in a string, alphabetized & concatenated. This fingerprint is described at [Ope12].
New in version 0.3.6.
Initialize Q-Gram fingerprinter.
- Parameters
qval (int) -- The length of each q-gram (by default 2)
start_stop (str) -- The start & stop symbol(s) to concatenate on either end of the phrase, as defined in tokenizer.QGrams
joiner (str) -- The string that will be placed between each q-gram
skip (int or Iterable) -- The number of characters to skip; this can be an integer, a range object, or a list
New in version 0.4.0.
-
fingerprint(phrase)
Return Q-Gram fingerprint.
- Parameters
phrase (str) -- The string from which to calculate the q-gram fingerprint
- Returns
The q-gram fingerprint of the phrase
- Return type
str
Examples
>>> qf = QGram()
>>> qf.fingerprint('The quick brown fox jumped over the lazy dog.')
'azbrckdoedeleqerfoheicjukblampnfogovowoxpequrortthuiumvewnxjydzy'
>>> qf.fingerprint('Christopher')
'cherhehrisopphristto'
>>> qf.fingerprint('Niall')
'aliallni'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
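As a rough illustration, the q-gram fingerprint with the default settings can be sketched as below. This is a hypothetical helper, not the abydos implementation: it assumes that spaces and punctuation are removed before tokenizing (the first example above contains bigrams such as 'eq' and 'kb' that span word boundaries, which is consistent with this), and it does not model the start_stop or skip parameters.

```python
def qgram_fingerprint_sketch(phrase, qval=2, joiner=''):
    """Hypothetical re-creation of the QGram fingerprint (not abydos code)."""
    # Assumption: lowercase and keep only letters and digits, so spaces and
    # punctuation vanish and q-grams may span former word boundaries.
    text = ''.join(c for c in phrase.lower() if c.isalnum())
    # Collect every contiguous substring of length qval (empty if too short).
    grams = {text[i:i + qval] for i in range(len(text) - qval + 1)}
    # Sort the unique q-grams and concatenate them with the joiner.
    return joiner.join(sorted(grams))


print(qgram_fingerprint_sketch('Niall'))
print(qgram_fingerprint_sketch('Niall', joiner=' '))
```

Passing a non-empty joiner makes the individual q-grams visible, e.g. 'al ia ll ni' for 'Niall'.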
-
abydos.fingerprint.qgram_fingerprint(phrase, qval=2, start_stop='', joiner='')
Return Q-Gram fingerprint.
This is a wrapper for QGram.fingerprint().
- Parameters
phrase (str) -- The string from which to calculate the q-gram fingerprint
qval (int) -- The length of each q-gram (by default 2)
start_stop (str) -- The start & stop symbol(s) to concatenate on either end of the phrase, as defined in tokenizer.QGrams
joiner (str) -- The string that will be placed between each q-gram
- Returns
The q-gram fingerprint of the phrase
- Return type
str
Examples
>>> qgram_fingerprint('The quick brown fox jumped over the lazy dog.')
'azbrckdoedeleqerfoheicjukblampnfogovowoxpequrortthuiumvewnxjydzy'
>>> qgram_fingerprint('Christopher')
'cherhehrisopphristto'
>>> qgram_fingerprint('Niall')
'aliallni'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the QGram.fingerprint method instead.
-
class abydos.fingerprint.Phonetic(phonetic_algorithm=None, joiner=' ')
Bases: abydos.fingerprint._string.String
Phonetic Fingerprint.
A phonetic fingerprint is identical to a standard string fingerprint, as implemented in String, except that each word of the phrase is first converted to its phonetic form, as determined by some phonetic algorithm, before the string fingerprint is computed. This fingerprint is described at [Ope12].
New in version 0.3.6.
Initialize Phonetic instance.
- Parameters
phonetic_algorithm (function) -- A phonetic algorithm that takes a string and returns a string (presumably a phonetic representation of the original string). By default, double_metaphone() is used.
joiner (str) -- The string that will be placed between each word
New in version 0.4.0.
-
fingerprint(phrase)
Return the phonetic fingerprint of a phrase.
- Parameters
phrase (str) -- The string from which to calculate the phonetic fingerprint
- Returns
The phonetic fingerprint of the phrase
- Return type
str
Examples
>>> pf = Phonetic()
>>> pf.fingerprint('The quick brown fox jumped over the lazy dog.')
'0 afr fks jmpt kk ls prn tk'
>>> from abydos.phonetic import Soundex
>>> pf = Phonetic(Soundex())
>>> pf.fingerprint('The quick brown fox jumped over the lazy dog.')
'b650 d200 f200 j513 l200 o160 q200 t000'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.fingerprint.phonetic_fingerprint(phrase, phonetic_algorithm=<function double_metaphone>, joiner=' ', *args, **kwargs)
Return the phonetic fingerprint of a phrase.
This is a wrapper for Phonetic.fingerprint().
- Parameters
phrase (str) -- The string from which to calculate the phonetic fingerprint
phonetic_algorithm (function) -- A phonetic algorithm that takes a string and returns a string (presumably a phonetic representation of the original string). By default, this function uses double_metaphone().
joiner (str) -- The string that will be placed between each word
*args -- Variable length argument list
**kwargs -- Arbitrary keyword arguments
- Returns
The phonetic fingerprint of the phrase
- Return type
str
Examples
>>> phonetic_fingerprint('The quick brown fox jumped over the lazy dog.')
'0 afr fks jmpt kk ls prn tk'
>>> from abydos.phonetic import soundex
>>> phonetic_fingerprint('The quick brown fox jumped over the lazy dog.',
... phonetic_algorithm=soundex)
'b650 d200 f200 j513 l200 o160 q200 t000'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Phonetic.fingerprint method instead.
-
class abydos.fingerprint.OmissionKey
Bases: abydos.fingerprint._fingerprint._Fingerprint
Omission Key.
The omission key of a word is defined in [PZ84].
New in version 0.3.6.
-
fingerprint(word)
Return the omission key.
- Parameters
word (str) -- The word to transform into its omission key
- Returns
The omission key
- Return type
str
Examples
>>> ok = OmissionKey()
>>> ok.fingerprint('The quick brown fox jumped over the lazy dog.')
'JKQXZVWYBFMGPDHCLNTREUIOA'
>>> ok.fingerprint('Christopher')
'PHCTSRIOE'
>>> ok.fingerprint('Niall')
'LNIA'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
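The examples above are consistent with the following construction: the word's unique consonants in a fixed order, followed by its unique vowels in order of first appearance. The sketch below is a hypothetical helper, not the abydos source. Its consonant order JKQXZVWYBFMGPDHCLNTSR is read off the examples (the pangram supplies every consonant except S, and 'Christopher' places S between T and R), and keeping only the letters A-Z is an assumption.

```python
import string

# Consonant order, as recovered from the documented examples.
OMISSION_CONSONANTS = 'JKQXZVWYBFMGPDHCLNTSR'


def omission_key_sketch(word):
    """Hypothetical re-creation of the omission key (not abydos code)."""
    # Uppercase and keep only the letters A-Z.
    letters = [c for c in word.upper() if c in string.ascii_uppercase]
    # Unique consonants, emitted in the fixed order above.
    key = ''.join(c for c in OMISSION_CONSONANTS if c in letters)
    # Then the remaining letters (the vowels) in order of first appearance.
    for c in letters:
        if c not in OMISSION_CONSONANTS and c not in key:
            key += c
    return key


print(omission_key_sketch('Christopher'))
```

Pollock & Zamora's motivation for this ordering is that misspellings tend to omit some consonants more often than others, so the most reliably present letters appear in the key's leading positions.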
-
abydos.fingerprint.omission_key(word)
Return the omission key.
This is a wrapper for OmissionKey.fingerprint().
- Parameters
word (str) -- The word to transform into its omission key
- Returns
The omission key
- Return type
str
Examples
>>> omission_key('The quick brown fox jumped over the lazy dog.')
'JKQXZVWYBFMGPDHCLNTREUIOA'
>>> omission_key('Christopher')
'PHCTSRIOE'
>>> omission_key('Niall')
'LNIA'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the OmissionKey.fingerprint method instead.
-
class abydos.fingerprint.SkeletonKey
Bases: abydos.fingerprint._fingerprint._Fingerprint
Skeleton Key.
The skeleton key of a word is defined in [PZ84].
New in version 0.3.6.
-
fingerprint(word)
Return the skeleton key.
- Parameters
word (str) -- The word to transform into its skeleton key
- Returns
The skeleton key
- Return type
str
Examples
>>> sk = SkeletonKey()
>>> sk.fingerprint('The quick brown fox jumped over the lazy dog.')
'THQCKBRWNFXJMPDVLZYGEUIOA'
>>> sk.fingerprint('Christopher')
'CHRSTPIOE'
>>> sk.fingerprint('Niall')
'NLIA'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
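The skeleton keys shown in this document, including 'ORNGAE' and 'STRNGAE' from the package overview, can be reproduced by keeping the first letter, then the remaining unique consonants, then the unique vowels, each in order of first appearance. The sketch below is a hypothetical helper written under that reading, not the abydos implementation; treating only A, E, I, O, and U as vowels and discarding non-letters are assumptions.

```python
import string

VOWELS = 'AEIOU'  # assumption: Y and W are treated as consonants


def skeleton_key_sketch(word):
    """Hypothetical re-creation of the skeleton key (not abydos code)."""
    letters = ''.join(c for c in word.upper() if c in string.ascii_uppercase)
    first = letters[:1]
    consonants, vowels = '', ''
    for c in letters[1:]:
        if c == first:
            continue  # the first letter is not repeated later in the key
        if c in VOWELS:
            if c not in vowels:
                vowels += c
        elif c not in consonants:
            consonants += c
    # First letter, then unique consonants, then unique vowels.
    return first + consonants + vowels


print(skeleton_key_sketch('orange'), skeleton_key_sketch('strange'))
```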
-
abydos.fingerprint.skeleton_key(word)
Return the skeleton key.
This is a wrapper for SkeletonKey.fingerprint().
- Parameters
word (str) -- The word to transform into its skeleton key
- Returns
The skeleton key
- Return type
str
Examples
>>> skeleton_key('The quick brown fox jumped over the lazy dog.')
'THQCKBRWNFXJMPDVLZYGEUIOA'
>>> skeleton_key('Christopher')
'CHRSTPIOE'
>>> skeleton_key('Niall')
'NLIA'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SkeletonKey.fingerprint method instead.
-
class abydos.fingerprint.Occurrence(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))
Bases: abydos.fingerprint._fingerprint._Fingerprint
Occurrence Fingerprint.
Based on the occurrence fingerprint from [CislakG17].
New in version 0.3.6.
Initialize Occurrence instance.
- Parameters
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
New in version 0.4.0.
-
fingerprint(word)
Return the occurrence fingerprint.
- Parameters
word (str) -- The word to fingerprint
- Returns
The occurrence fingerprint
- Return type
int
Examples
>>> of = Occurrence()
>>> bin(of.fingerprint('hat'))
'0b110000100000000'
>>> bin(of.fingerprint('niall'))
'0b10110000100000'
>>> bin(of.fingerprint('colin'))
'0b1110000110000'
>>> bin(of.fingerprint('atcg'))
'0b110000000010000'
>>> bin(of.fingerprint('entreatment'))
'0b1110010010000100'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
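Reading the binary outputs above left to right against most_common suggests a simple layout: one bit per common letter, set when the letter occurs anywhere in the word, with the most frequent letter in the highest of the n_bits positions. The sketch below is a hypothetical helper built on that inferred layout, not the abydos source; it reproduces the documented default outputs.

```python
MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def occurrence_sketch(word, n_bits=16, most_common=MOST_COMMON):
    """Hypothetical re-creation of the occurrence fingerprint (not abydos code)."""
    present = set(word)
    fingerprint = 0
    # The i-th most common letter owns bit (n_bits - 1 - i), so the most
    # frequent letter lands in the highest bit; extra letters are ignored.
    for i, letter in enumerate(most_common[:n_bits]):
        if letter in present:
            fingerprint |= 1 << (n_bits - 1 - i)
    return fingerprint


print(bin(occurrence_sketch('hat')))
```

Since 'hat' lacks 'e', the highest bit is zero and bin() prints only 15 digits; the fingerprint is still a 16-bit value.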
-
abydos.fingerprint.occurrence_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))
Return the occurrence fingerprint.
This is a wrapper for Occurrence.fingerprint().
- Parameters
word (str) -- The word to fingerprint
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
- Returns
The occurrence fingerprint
- Return type
int
Examples
>>> bin(occurrence_fingerprint('hat'))
'0b110000100000000'
>>> bin(occurrence_fingerprint('niall'))
'0b10110000100000'
>>> bin(occurrence_fingerprint('colin'))
'0b1110000110000'
>>> bin(occurrence_fingerprint('atcg'))
'0b110000000010000'
>>> bin(occurrence_fingerprint('entreatment'))
'0b1110010010000100'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Occurrence.fingerprint method instead.
-
class abydos.fingerprint.OccurrenceHalved(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))
Bases: abydos.fingerprint._fingerprint._Fingerprint
Occurrence Halved Fingerprint.
Based on the occurrence halved fingerprint from [CislakG17].
New in version 0.3.6.
Initialize OccurrenceHalved instance.
- Parameters
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
New in version 0.4.0.
-
fingerprint(word)
Return the occurrence halved fingerprint.
- Parameters
word (str) -- The word to fingerprint
- Returns
The occurrence halved fingerprint
- Return type
int
Examples
>>> ohf = OccurrenceHalved()
>>> bin(ohf.fingerprint('hat'))
'0b1010000000010'
>>> bin(ohf.fingerprint('niall'))
'0b10010100000'
>>> bin(ohf.fingerprint('colin'))
'0b1001010000'
>>> bin(ohf.fingerprint('atcg'))
'0b10100000000000'
>>> bin(ohf.fingerprint('entreatment'))
'0b1111010000110000'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
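The halved variant splits the word at len(word) // 2 and spends two bits per common letter: one for its presence in the first half and one for its presence in the second half. The sketch below is a hypothetical helper whose split point and bit order were chosen to reproduce the five documented outputs. It is not the abydos source and may differ from it for non-default n_bits or most_common values.

```python
MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def occurrence_halved_sketch(word, n_bits=16, most_common=MOST_COMMON):
    """Hypothetical occurrence halved fingerprint (not abydos code)."""
    if n_bits % 2:
        n_bits += 1  # assumption: round odd widths up to whole bit pairs
    half = len(word) // 2
    first, second = set(word[:half]), set(word[half:])
    fingerprint, remaining = 0, n_bits
    for letter in most_common:
        if remaining == 0:
            break
        # High bit of the pair: letter occurs in the first half;
        # low bit: letter occurs in the second half.
        pair = (int(letter in first) << 1) | int(letter in second)
        fingerprint = (fingerprint << 2) | pair
        remaining -= 2
    # Left-align the result if most_common runs out before the bits do.
    return fingerprint << remaining


print(bin(occurrence_halved_sketch('entreatment')))
```

With the defaults, only the first eight letters of most_common ('e' through 'h') fit into the 16 bits.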
-
abydos.fingerprint.occurrence_halved_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))
Return the occurrence halved fingerprint.
This is a wrapper for OccurrenceHalved.fingerprint().
- Parameters
word (str) -- The word to fingerprint
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
- Returns
The occurrence halved fingerprint
- Return type
int
Examples
>>> bin(occurrence_halved_fingerprint('hat'))
'0b1010000000010'
>>> bin(occurrence_halved_fingerprint('niall'))
'0b10010100000'
>>> bin(occurrence_halved_fingerprint('colin'))
'0b1001010000'
>>> bin(occurrence_halved_fingerprint('atcg'))
'0b10100000000000'
>>> bin(occurrence_halved_fingerprint('entreatment'))
'0b1111010000110000'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the OccurrenceHalved.fingerprint method instead.
-
class abydos.fingerprint.Count(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))
Bases: abydos.fingerprint._fingerprint._Fingerprint
Count Fingerprint.
Based on the count fingerprint from [CislakG17].
New in version 0.3.6.
Initialize Count instance.
- Parameters
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
New in version 0.4.0.
-
fingerprint(word)
Return the count fingerprint.
- Parameters
word (str) -- The word to fingerprint
- Returns
The count fingerprint
- Return type
int
Examples
>>> cf = Count()
>>> bin(cf.fingerprint('hat'))
'0b1010000000001'
>>> bin(cf.fingerprint('niall'))
'0b10001010000'
>>> bin(cf.fingerprint('colin'))
'0b101010000'
>>> bin(cf.fingerprint('atcg'))
'0b1010000000000'
>>> bin(cf.fingerprint('entreatment'))
'0b1111010000100000'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
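The count fingerprint appears to store a 2-bit occurrence count for each common letter, most frequent letter first; for example, 'entreatment' begins with 11 11 because 'e' and 't' each occur three times. The sketch below is a hypothetical helper that reproduces the documented outputs. How the library treats counts above 3 is not shown above, so capping them at 3 is an assumption made here.

```python
from collections import Counter

MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def count_sketch(word, n_bits=16, most_common=MOST_COMMON):
    """Hypothetical re-creation of the count fingerprint (not abydos code)."""
    if n_bits % 2:
        n_bits += 1  # assumption: round odd widths up to whole 2-bit fields
    counts = Counter(word)
    fingerprint, remaining = 0, n_bits
    for letter in most_common:
        if remaining == 0:
            break
        # Assumption: counts above 3 saturate at 3 so they fit in two bits.
        fingerprint = (fingerprint << 2) | min(counts[letter], 3)
        remaining -= 2
    # Left-align the result if most_common runs out before the bits do.
    return fingerprint << remaining


print(bin(count_sketch('entreatment')))
```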
-
abydos.fingerprint.count_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'))
Return the count fingerprint.
This is a wrapper for Count.fingerprint().
- Parameters
word (str) -- The word to fingerprint
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
- Returns
The count fingerprint
- Return type
int
Examples
>>> bin(count_fingerprint('hat'))
'0b1010000000001'
>>> bin(count_fingerprint('niall'))
'0b10001010000'
>>> bin(count_fingerprint('colin'))
'0b101010000'
>>> bin(count_fingerprint('atcg'))
'0b1010000000000'
>>> bin(count_fingerprint('entreatment'))
'0b1111010000100000'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Count.fingerprint method instead.
-
class abydos.fingerprint.Position(n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'), bits_per_letter=3)
Bases: abydos.fingerprint._fingerprint._Fingerprint
Position Fingerprint.
Based on the position fingerprint from [CislakG17].
New in version 0.3.6.
Initialize Position instance.
- Parameters
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
bits_per_letter (int) -- The bits to assign for letter position
New in version 0.4.0.
-
fingerprint(word)
Return the position fingerprint.
- Parameters
word (str) -- The word to fingerprint
- Returns
The position fingerprint
- Return type
int
Examples
>>> pf = Position()
>>> bin(pf.fingerprint('hat'))
'0b1110100011111111'
>>> bin(pf.fingerprint('niall'))
'0b1111110101110010'
>>> bin(pf.fingerprint('colin'))
'0b1111111110010111'
>>> bin(pf.fingerprint('atcg'))
'0b1110010001111111'
>>> bin(pf.fingerprint('entreatment'))
'0b101011111111'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
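The position fingerprint can be read as a sequence of bits_per_letter-wide fields, one per common letter, each holding the position of that letter's first occurrence, or all ones when the letter is absent. For example, 'hat' starts with 111 010 001 because 'e' is absent, 't' is at index 2, and 'a' is at index 1. The sketch below is a hypothetical helper built on this reading, not the abydos source. Clamping large positions to the field maximum and padding leftover bits with ones are assumptions that match the documented outputs.

```python
MOST_COMMON = ('e', 't', 'a', 'o', 'i', 'n', 's', 'h',
               'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f')


def position_sketch(word, n_bits=16, most_common=MOST_COMMON, bits_per_letter=3):
    """Hypothetical re-creation of the position fingerprint (not abydos code)."""
    absent = 2 ** bits_per_letter - 1  # an all-ones field marks a missing letter
    first_pos = {}
    for pos, letter in enumerate(word):
        if letter in most_common and letter not in first_pos:
            # Positions too large for the field are clamped to its maximum.
            first_pos[letter] = min(pos, absent)
    fingerprint, remaining = 0, n_bits
    for letter in most_common:
        if remaining == 0:
            break
        # The last field may be truncated to the bits that are left.
        width = min(bits_per_letter, remaining)
        value = min(first_pos.get(letter, absent), 2 ** width - 1)
        fingerprint = (fingerprint << width) | value
        remaining -= width
    # Pad any unused low bits with ones.
    return (fingerprint << remaining) | (2 ** remaining - 1)


print(bin(position_sketch('hat')))
```

With the defaults, five 3-bit fields (e, t, a, o, i) fill 15 bits, and the 16th bit is a truncated 1-bit field for 'n'.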
-
abydos.fingerprint.position_fingerprint(word, n_bits=16, most_common=('e', 't', 'a', 'o', 'i', 'n', 's', 'h', 'r', 'd', 'l', 'c', 'u', 'm', 'w', 'f'), bits_per_letter=3)
Return the position fingerprint.
This is a wrapper for Position.fingerprint().
- Parameters
word (str) -- The word to fingerprint
n_bits (int) -- Number of bits in the fingerprint returned
most_common (list) -- The most common tokens in the target language, ordered by frequency
bits_per_letter (int) -- The bits to assign for letter position
- Returns
The position fingerprint
- Return type
int
Examples
>>> bin(position_fingerprint('hat'))
'0b1110100011111111'
>>> bin(position_fingerprint('niall'))
'0b1111110101110010'
>>> bin(position_fingerprint('colin'))
'0b1111111110010111'
>>> bin(position_fingerprint('atcg'))
'0b1110010001111111'
>>> bin(position_fingerprint('entreatment'))
'0b101011111111'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Position.fingerprint method instead.
-
class abydos.fingerprint.SynonameToolcode
Bases: abydos.fingerprint._fingerprint._Fingerprint
Synoname Toolcode.
Cf. [JPGTrust91] and [Gro91].
New in version 0.3.6.
-
fingerprint(lname, fname='', qual='', normalize=0)
Build the Synoname toolcode.
- Parameters
lname (str) -- Last name
fname (str) -- First name (can be blank)
qual (str) -- Qualifier
normalize (int) -- Normalization mode (0, 1, or 2)
- Returns
The transformed names and the Synoname toolcode
- Return type
tuple
Examples
>>> st = SynonameToolcode()
>>> st.fingerprint('hat')
('hat', '', '0000000003$$h')
>>> st.fingerprint('niall')
('niall', '', '0000000005$$n')
>>> st.fingerprint('colin')
('colin', '', '0000000005$$c')
>>> st.fingerprint('atcg')
('atcg', '', '0000000004$$a')
>>> st.fingerprint('entreatment')
('entreatment', '', '0000000011$$e')
>>> st.fingerprint('Ste.-Marie', 'Count John II', normalize=2)
('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
>>> st.fingerprint('Michelangelo IV', '', 'Workshop of')
('michelangelo iv', '', '3000550015$055b$mi')
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.fingerprint.synoname_toolcode(lname, fname='', qual='', normalize=0)
Build the Synoname toolcode.
This is a wrapper for SynonameToolcode.fingerprint().
- Parameters
lname (str) -- Last name
fname (str) -- First name (can be blank)
qual (str) -- Qualifier
normalize (int) -- Normalization mode (0, 1, or 2)
- Returns
The transformed names and the Synoname toolcode
- Return type
tuple
Examples
>>> synoname_toolcode('hat')
('hat', '', '0000000003$$h')
>>> synoname_toolcode('niall')
('niall', '', '0000000005$$n')
>>> synoname_toolcode('colin')
('colin', '', '0000000005$$c')
>>> synoname_toolcode('atcg')
('atcg', '', '0000000004$$a')
>>> synoname_toolcode('entreatment')
('entreatment', '', '0000000011$$e')
>>> synoname_toolcode('Ste.-Marie', 'Count John II', normalize=2)
('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
>>> synoname_toolcode('Michelangelo IV', '', 'Workshop of')
('michelangelo iv', '', '3000550015$055b$mi')
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SynonameToolcode.fingerprint method instead.
-
class abydos.fingerprint.Consonant(variant=1, doubles=True, vowels=None)
Bases: abydos.fingerprint._fingerprint._Fingerprint
Consonant Coding Fingerprint.
Based on the consonant coding from [Taf70], variants 1, 2, 3, 1-D, 2-D, and 3-D.
New in version 0.4.1.
Initialize Consonant instance.
- Parameters
variant (int) -- Selects between Taft's three variants, which assign to the vowel set one of:
  1. A, E, I, O, & U
  2. A, E, I, O, U, W, & Y
  3. A, E, I, O, U, W, H, & Y
doubles (bool) -- If set to False, runs of a repeated consonant are conflated to a single instance.
vowels (list, set, or str) -- Setting vowels to a non-None value overrides the variant setting and defines the set of letters to be removed from the input.
New in version 0.4.1.
-
fingerprint(word)
Return the consonant coding.
- Parameters
word (str) -- The word to fingerprint
- Returns
The consonant coding
- Return type
str
Examples
>>> cf = Consonant()
>>> cf.fingerprint('hat')
'HT'
>>> cf.fingerprint('niall')
'NLL'
>>> cf.fingerprint('colin')
'CLN'
>>> cf.fingerprint('atcg')
'ATCG'
>>> cf.fingerprint('entreatment')
'ENTRTMNT'
New in version 0.4.1.
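The default consonant coding shown above can be reproduced by uppercasing the word, keeping its first letter even when that letter is a vowel ('atcg' becomes 'ATCG'), and then dropping every vowel after it. The sketch below is a hypothetical helper, not the abydos implementation. Its three vowel sets are copied from the variant list in the constructor documentation, and it does not model the doubles option, so doubled consonants are always kept, as with the default doubles=True.

```python
# Vowel sets for Taft's variants 1-3, as listed in the parameter description.
VARIANT_VOWELS = {
    1: set('AEIOU'),
    2: set('AEIOUWY'),
    3: set('AEIOUWHY'),
}


def consonant_sketch(word, variant=1):
    """Hypothetical consonant coding sketch (not abydos code); doubles kept."""
    vowels = VARIANT_VOWELS[variant]
    word = word.upper()
    # Keep the first letter even when it is a vowel, then drop the vowels.
    return word[:1] + ''.join(c for c in word[1:] if c not in vowels)


print(consonant_sketch('entreatment'))
```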
-
class abydos.fingerprint.Extract(letter_list=1)
Bases: abydos.fingerprint._fingerprint._Fingerprint
Extract Letter List fingerprint.
Based on the extract letter list coding from [Taf70], for lists 1, 2, 3, & 4.
New in version 0.4.1.
Initialize Extract instance.
- Parameters
letter_list (int or iterable) -- If an integer (1-4) is supplied, Taft's specified letter lists are used. If an iterable is supplied, its values will be used as the list of letters to remove (in order).
New in version 0.4.1.
-
fingerprint(word)
Return the extract letter list coding.
- Parameters
word (str) -- The word to fingerprint
- Returns
The extract letter list coding
- Return type
str
Examples
>>> fp = Extract()
>>> fp.fingerprint('hat')
'HAT'
>>> fp.fingerprint('niall')
'NILL'
>>> fp.fingerprint('colin')
'CLIN'
>>> fp.fingerprint('atcg')
'ATCG'
>>> fp.fingerprint('entreatment')
'NRMN'
New in version 0.4.1.
-
class abydos.fingerprint.ExtractPositionFrequency
Bases: abydos.fingerprint._fingerprint._Fingerprint
Extract - Position & Frequency fingerprint.
Based on the extract - position & frequency coding from [Taf70].
New in version 0.4.1.
-
fingerprint(word)
Return the extract - position & frequency coding.
- Parameters
word (str) -- The word to fingerprint
- Returns
The extract - position & frequency coding
- Return type
str
Examples
>>> fp = ExtractPositionFrequency()
>>> fp.fingerprint('hat')
'HAT'
>>> fp.fingerprint('niall')
'NILL'
>>> fp.fingerprint('colin')
'COLN'
>>> fp.fingerprint('atcg')
'ATCG'
>>> fp.fingerprint('entreatment')
'NMNT'
New in version 0.4.1.
-
class abydos.fingerprint.LACSS
Bases: abydos.fingerprint._fingerprint._Fingerprint
L.A. County Sheriff's System fingerprint.
Based on the description from [Taf70].
New in version 0.4.1.
-
fingerprint(word)
Return the LACSS coding.
- Parameters
word (str) -- The word to fingerprint
- Returns
The L.A. County Sheriff's System fingerprint
- Return type
str
Examples
>>> cf = LACSS()
>>> cf.fingerprint('hat')
'4911211'
>>> cf.fingerprint('niall')
'6488374'
>>> cf.fingerprint('colin')
'3015957'
>>> cf.fingerprint('atcg')
'1772371'
>>> cf.fingerprint('entreatment')
'3882324'
New in version 0.4.1.
-
class abydos.fingerprint.LCCutter(max_length=64)
Bases: abydos.fingerprint._fingerprint._Fingerprint
Library of Congress Cutter table encoding.
This is based on the Library of Congress Cutter table encoding scheme, as described at https://www.loc.gov/aba/pcc/053/table.html [oC13]. Handling for numerals is not included.
New in version 0.4.1.
Initialize LCCutter instance.
- Parameters
max_length (int) -- The maximum length of the code returned (defaults to 64)
New in version 0.4.1.
-
fingerprint(word)
Return the Library of Congress Cutter table encoding of a word.
- Parameters
word (str) -- The word to fingerprint
- Returns
The Library of Congress Cutter table encoding
- Return type
str
Examples
>>> cf = LCCutter()
>>> cf.fingerprint('hat')
'H38'
>>> cf.fingerprint('niall')
'N5355'
>>> cf.fingerprint('colin')
'C6556'
>>> cf.fingerprint('atcg')
'A834'
>>> cf.fingerprint('entreatment')
'E5874386468'
New in version 0.4.1.
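The LC Cutter table assigns a digit to the second letter based on whether the word starts with a vowel, with S, with Qu, or with another consonant, and assigns a digit to each later letter through an "expansion" row. The sketch below is a hypothetical encoder, not the abydos source. It covers only the initial-vowel, other-initial-consonant, and expansion rows of the table cited above, raises ValueError for initial S and Q, and assumes that a letter missing from a row takes the digit of the nearest preceding listed letter. Under those assumptions, it reproduces the five documented examples.

```python
import string
from bisect import bisect_right

# Second letter after an initial vowel: b d l-m n p r s-t u-y -> 2..9.
AFTER_VOWEL = (('b', 2), ('d', 3), ('l', 4), ('n', 5),
               ('p', 6), ('r', 7), ('s', 8), ('u', 9))
# Second letter after an initial consonant other than S or Q: a e i o r u y -> 3..9.
AFTER_CONSONANT = (('a', 3), ('e', 4), ('i', 5), ('o', 6),
                   ('r', 7), ('u', 8), ('y', 9))
# Expansion for every later letter: a-d e-h i-l m-o p-s t-v w-z -> 3..9.
EXPANSION = (('a', 3), ('e', 4), ('i', 5), ('m', 6),
             ('p', 7), ('t', 8), ('w', 9))


def _digit(table, letter):
    """Digit of the last entry whose start letter is <= letter.

    Assumption: unlisted letters inherit the preceding entry's digit, and
    letters before the first entry take the first entry's digit.
    """
    starts = [start for start, _ in table]
    index = max(bisect_right(starts, letter) - 1, 0)
    return str(table[index][1])


def cutter_sketch(word):
    """Hypothetical LC Cutter encoder for a subset of the table."""
    letters = ''.join(c for c in word.lower() if c in string.ascii_lowercase)
    if not letters:
        return ''
    first = letters[0]
    if first in 'sq':
        raise ValueError('initial S and Q rules are not covered by this sketch')
    code = first.upper()
    if len(letters) > 1:
        table = AFTER_VOWEL if first in 'aeiou' else AFTER_CONSONANT
        code += _digit(table, letters[1])
    code += ''.join(_digit(EXPANSION, c) for c in letters[2:])
    return code


print(cutter_sketch('hat'), cutter_sketch('atcg'))
```

Numerals are ignored here as well, mirroring the note that the class does not handle them, and the max_length cap is not modeled.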
-
class abydos.fingerprint.BWTF(terminator='\x00')
Bases: abydos.fingerprint._fingerprint._Fingerprint
Burrows-Wheeler transform fingerprint.
This class wraps the BWT class in abydos.compression so that it provides the same interface as other descendants of _Fingerprint.
New in version 0.4.1.
Initialize BWTF instance.
- Parameters
terminator (str) -- A character added to signal the end of the string
New in version 0.4.1.
-
fingerprint(word)
Return the Burrows-Wheeler transform of a word.
- Parameters
word (str) -- The word to fingerprint
- Returns
The Burrows-Wheeler transform of a word
- Return type
str
Examples
>>> fp = BWTF()
>>> fp.fingerprint('hat')
'th\x00a'
>>> fp.fingerprint('niall')
'linla\x00'
>>> fp.fingerprint('colin')
'n\x00loic'
>>> fp.fingerprint('atcg')
'g\x00tca'
>>> fp.fingerprint('entreatment')
'term\x00teetnan'
New in version 0.4.1.
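To show what the transform computes, here is a naive Burrows-Wheeler transform: append the terminator, sort all cyclic rotations, and read off the last character of each. This hypothetical helper is for illustration only and is not how abydos.compression implements BWT. It takes O(n² log n) time and assumes the terminator sorts before every other character, which holds for '\x00'. It reproduces the examples above.

```python
def bwt_sketch(word, terminator='\x00'):
    """Naive Burrows-Wheeler transform (hypothetical helper, not abydos code)."""
    text = word + terminator
    # Sort all cyclic rotations of the terminated text.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return ''.join(rotation[-1] for rotation in rotations)


print(repr(bwt_sketch('hat')))
```

The output has the same letters as the input plus one terminator, merely reordered. For this reason, the transform is usually paired with run-length encoding, as in BWTRLEF, because the sorting tends to group equal characters into runs.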
-
class abydos.fingerprint.BWTRLEF(terminator='\x00')
Bases: abydos.fingerprint._fingerprint._Fingerprint
Burrows-Wheeler transform plus run-length encoding fingerprint.
This class wraps the BWT and RLE classes in abydos.compression so that it provides the same interface as other descendants of _Fingerprint.
New in version 0.4.1.
Initialize BWTRLEF instance.
- Parameters
terminator (str) -- A character added to signal the end of the string
New in version 0.4.1.
-
fingerprint(word)
Return the run-length encoded Burrows-Wheeler transform of a word.
- Parameters
word (str) -- The word to fingerprint
- Returns
The run-length encoded Burrows-Wheeler transform of a word
- Return type
str
Examples
>>> fp = BWTRLEF()
>>> fp.fingerprint('hat')
'th\x00a'
>>> fp.fingerprint('niall')
'linla\x00'
>>> fp.fingerprint('colin')
'n\x00loic'
>>> fp.fingerprint('atcg')
'g\x00tca'
>>> fp.fingerprint('entreatment')
'term\x00teetnan'
New in version 0.4.1.