abydos.tokenizer package

The tokenizer package includes tokenizers such as:
  • the q-gram class (QGrams)

class abydos.tokenizer.QGrams(term, qval=2, start_stop='$#', skip=0)

Bases: collections.Counter

A q-gram class, which functions like a bag/multiset.

A q-gram is defined here as any sequence of q consecutive characters. Q-grams are also known as k-grams and n-grams, but the term n-gram more typically refers to sequences of whitespace-delimited words in a string, whereas q-gram refers to sequences of characters in a word or string.
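
As a rough illustration of the definition above, character q-grams can be gathered into a collections.Counter with a sliding window. This is a minimal sketch, not the abydos implementation: the helper name char_qgrams is hypothetical, and the padding rule (q-1 copies of the start symbol and q-1 copies of the stop symbol) is an assumption chosen for illustration.

```python
from collections import Counter


def char_qgrams(term, qval=2, start_stop='$#'):
    """Hypothetical helper: return a Counter of character q-grams in term."""
    # Nothing to extract if q is invalid or the term is shorter than q.
    if qval < 1 or len(term) < qval:
        return Counter()
    if start_stop and qval > 1:
        # Assumed padding: q-1 start symbols in front, q-1 stop symbols behind.
        term = start_stop[0] * (qval - 1) + term + start_stop[-1] * (qval - 1)
    # Slide a window of width q across the (possibly padded) term.
    return Counter(term[i:i + qval] for i in range(len(term) - qval + 1))


bag = char_qgrams('AATTATAT')
print(bag['AT'])           # 3
print(sum(bag.values()))   # 9
```

Because the result is a Counter, repeated q-grams such as 'AT' are kept with their multiplicity, which is what makes the structure behave like a bag/multiset rather than a set.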

count()

Return the total q-gram count.

Returns: the total count of q-grams in a QGrams object
Return type: int

>>> qg = QGrams('AATTATAT')
>>> qg.count()
9
>>> qg = QGrams('AATTATAT', qval=1, start_stop='')
>>> qg.count()
8
>>> qg = QGrams('AATTATAT', qval=3, start_stop='')
>>> qg.count()
6
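
The three results above are consistent with simple window arithmetic: a string of length n contains n - q + 1 windows of width q. The sketch below assumes that a non-empty start_stop with q > 1 pads the term with q-1 symbols on each end, which the first example suggests; expected_qgram_count is a hypothetical helper, not part of abydos.

```python
def expected_qgram_count(length, qval=2, padded=True):
    """Hypothetical helper: number of q-gram windows for a term of given length."""
    # No q-grams if q is invalid or the term is shorter than q.
    if qval < 1 or length < qval:
        return 0
    if padded and qval > 1:
        # Assumed padding: q-1 symbols added to each end of the term.
        length += 2 * (qval - 1)
    return length - qval + 1


print(expected_qgram_count(8))                         # 9
print(expected_qgram_count(8, qval=1, padded=False))   # 8
print(expected_qgram_count(8, qval=3, padded=False))   # 6
```
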
ordered_list = []
term = ''
term_ss = ''