abydos.tokenizer package
abydos.tokenizer.
The tokenizer package includes tokenizers such as:
- the Q-Gram class
class abydos.tokenizer.QGrams(term, qval=2, start_stop='$#', skip=0)
Bases: collections.Counter
A q-gram class, which functions like a bag/multiset.
The q-grams of a term are defined here as all of its sequences of q characters. Q-grams are also known as k-grams and n-grams, but the term n-gram more typically refers to sequences of whitespace-delimited words in a string, whereas q-gram refers to sequences of characters in a word or string.
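The bag/multiset behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the abydos implementation: the function name `qgram_bag` is hypothetical, the `skip` parameter is ignored, and the padding rule (q - 1 copies of the start and stop symbols) is an assumption chosen because it is consistent with the counts shown in this page's examples.

```python
from collections import Counter


def qgram_bag(term, qval=2, start_stop='$#'):
    """Return a Counter (bag/multiset) of the character q-grams of term.

    Hypothetical helper for illustration only; not part of abydos.
    """
    if qval < 1:
        raise ValueError('qval must be at least 1')
    # Assumption: pad with qval - 1 copies of the start and stop symbols.
    if start_stop and qval > 1:
        term = start_stop[0] * (qval - 1) + term + start_stop[-1] * (qval - 1)
    # Slide a window of width qval over the (possibly padded) term.
    return Counter(term[i:i + qval] for i in range(len(term) - qval + 1))


bag = qgram_bag('AATTATAT')
total = sum(bag.values())  # total number of q-grams, repeats included
```

Because the result is a `Counter`, repeated q-grams accumulate a multiplicity instead of being collapsed, which is what makes it a bag and not a set: here `bag['AT']` is 3 and `bag['TA']` is 2.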
count()
Return q-grams count.
Returns: the total count of q-grams in a QGrams object
Return type: int

>>> qg = QGrams('AATTATAT')
>>> qg.count()
9

>>> qg = QGrams('AATTATAT', qval=1, start_stop='')
>>> qg.count()
8

>>> qg = QGrams('AATTATAT', qval=3, start_stop='')
>>> qg.count()
6
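The distinction between a total count and a count of distinct q-grams can be illustrated with a plain `collections.Counter`. The sketch below does not call abydos; it assumes no start/stop padding (as with `start_stop=''`) and simply rebuilds the trigram bag for the last example by hand.

```python
from collections import Counter

term = 'AATTATAT'
q = 3

# Unpadded character trigrams: AAT, ATT, TTA, TAT, ATA, TAT
trigrams = Counter(term[i:i + q] for i in range(len(term) - q + 1))

total = sum(trigrams.values())  # every occurrence counts
distinct = len(trigrams)        # each distinct q-gram counts once
```

Without padding, a term of length n yields n - q + 1 q-grams in total, which gives 8 - 3 + 1 = 6 here; only 5 of them are distinct because `TAT` occurs twice.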
ordered_list = []
term = ''
term_ss = ''