abydos.stemmer module
The stemmer module defines word stemmers, including:
- the Lovins stemmer
- the Porter and Porter2 (Snowball English) stemmers
- Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
- the CLEF German, German plus, and Swedish stemmers
- the Caumanns German stemmer
- the UEA-Lite stemmer
- the Paice-Husk stemmer
- the Schinke Latin stemmer
- the S stemmer
- abydos.stemmer.caumanns(word)
Return Caumanns German stem.
Jörg Caumanns’ stemmer is described in his article in [Cau99].
This implementation is based on the GermanStemFilter described at [Lan13].
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> caumanns('lesen')
'les'
>>> caumanns('graues')
'grau'
>>> caumanns('buchstabieren')
'buchstabier'
- abydos.stemmer.clef_german(word)
Return CLEF German stem.
The CLEF German stemmer is defined at [Sav05].
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> clef_german('lesen')
'lese'
>>> clef_german('graues')
'grau'
>>> clef_german('buchstabieren')
'buchstabier'
- abydos.stemmer.clef_german_plus(word)
Return ‘CLEF German stemmer plus’ stem.
The CLEF German stemmer plus is defined at [Sav05].
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> clef_german_plus('lesen')
'les'
>>> clef_german_plus('graues')
'grau'
>>> clef_german_plus('buchstabieren')
'buchstabi'
- abydos.stemmer.clef_swedish(word)
Return CLEF Swedish stem.
The CLEF Swedish stemmer is defined at [Sav05].
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> clef_swedish('undervisa')
'undervis'
>>> clef_swedish('suspension')
'suspensio'
>>> clef_swedish('visshet')
'viss'
- abydos.stemmer.lovins(word)
Return Lovins stem.
The Lovins stemmer is described in Julie Beth Lovins’s article [Lov68].
Parameters: word (str) – the word to stem
Returns: word stem
Return type: str
>>> lovins('reading')
'read'
>>> lovins('suspension')
'suspens'
>>> lovins('elusiveness')
'elus'
- abydos.stemmer.paice_husk(word)
Return Paice-Husk stem.
This is an implementation of the Paice-Husk stemmer, also known as the Lancaster stemmer, developed by Chris Paice with the assistance of Gareth Husk.
It is based on the algorithm’s description in [Pai90].
Parameters: word (str) – the word to stem
Returns: the stemmed word
Return type: str
>>> paice_husk('assumption')
'assum'
>>> paice_husk('verifiable')
'ver'
>>> paice_husk('fancies')
'fant'
>>> paice_husk('fanciful')
'fancy'
>>> paice_husk('torment')
'tor'
- abydos.stemmer.porter(word, early_english=False)
Return Porter stem.
The Porter stemmer is described in [Por80].
Parameters: - word (str) – the word to calculate the stem of
- early_english (bool) – set to True to remove -est and -eth (2nd and 3rd person singular verbal agreement suffixes)
Returns: word stem
Return type: str
>>> porter('reading')
'read'
>>> porter('suspension')
'suspens'
>>> porter('elusiveness')
'elus'
>>> porter('eateth', early_english=True)
'eat'
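The effect of early_english can be pictured as one extra suffix rule. The following is a minimal sketch of that idea only, not the library's implementation: the helper name strip_early_english and the requirement that the remaining stem contain a vowel are assumptions made for illustration, and a full Porter stemmer would apply such a rule inside its larger rule sequence.

```python
VOWELS = set("aeiouy")


def strip_early_english(word):
    """Remove a trailing -eth or -est if the remaining stem has a vowel.

    Hypothetical helper for illustration; it is not part of abydos.
    """
    for suffix in ("eth", "est"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Assumed guard: leave short words such as "best" intact.
            if any(ch in VOWELS for ch in stem):
                return stem
    return word


print(strip_early_english("eateth"))   # eat
print(strip_early_english("knowest"))  # know
print(strip_early_english("best"))     # best
```

The vowel guard is one plausible way to avoid stripping words whose final -est is not a suffix; the actual condition used by a given implementation may differ.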
- abydos.stemmer.porter2(word, early_english=False)
Return the Porter2 (Snowball English) stem.
The Porter2 (Snowball English) stemmer is defined in [Por02].
Parameters: - word (str) – the word to calculate the stem of
- early_english (bool) – set to True to remove -est and -eth (2nd and 3rd person singular verbal agreement suffixes)
Returns: word stem
Return type: str
>>> porter2('reading')
'read'
>>> porter2('suspension')
'suspens'
>>> porter2('elusiveness')
'elus'
>>> porter2('eateth', early_english=True)
'eat'
- abydos.stemmer.s_stemmer(word)
Return the S-stemmed form of a word.
The S stemmer is defined in [Har91].
Parameters: word (str) – the word to stem
Returns: the stemmed word
Return type: str
>>> s_stemmer('summaries')
'summary'
>>> s_stemmer('summary')
'summary'
>>> s_stemmer('towers')
'tower'
>>> s_stemmer('reading')
'reading'
>>> s_stemmer('census')
'census'
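The documented results are consistent with the three plural-removal rules usually attributed to the S stemmer. The sketch below encodes those rules as they are commonly summarized; the function name s_stem_sketch is hypothetical, the code is not taken from abydos, and its behaviour on edge cases may differ from s_stemmer.

```python
def s_stem_sketch(word):
    """Apply three plural-removal rules in order (illustrative sketch)."""
    lowered = word.lower()
    # Rule 1: -ies becomes -y, unless the ending is -aies or -eies.
    if lowered.endswith("ies") and not lowered.endswith(("aies", "eies")):
        return word[:-3] + "y"
    # Rule 2: -es becomes -e, unless the ending is -aes, -ees, or -oes.
    if lowered.endswith("es") and not lowered.endswith(("aes", "ees", "oes")):
        return word[:-1]
    # Rule 3: drop a final -s, unless the word ends in -us or -ss.
    if lowered.endswith("s") and not lowered.endswith(("us", "ss")):
        return word[:-1]
    return word


for w in ("summaries", "summary", "towers", "reading", "census"):
    print(w, "->", s_stem_sketch(w))
```

The exclusions in each rule are what keep words such as "census" unchanged while still reducing "towers" to "tower".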
- abydos.stemmer.sb_danish(word)
Return Snowball Danish stem.
The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> sb_danish('underviser')
'undervis'
>>> sb_danish('suspension')
'suspension'
>>> sb_danish('sikkerhed')
'sikker'
- abydos.stemmer.sb_dutch(word)
Return Snowball Dutch stem.
The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> sb_dutch('lezen')
'lez'
>>> sb_dutch('opschorting')
'opschort'
>>> sb_dutch('ongrijpbaarheid')
'ongrijp'
- abydos.stemmer.sb_german(word, alternate_vowels=False)
Return Snowball German stem.
The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html
Parameters: - word (str) – the word to calculate the stem of
- alternate_vowels (bool) – composes ae as ä, oe as ö, and ue as ü before running the algorithm
Returns: word stem
Return type: str
>>> sb_german('lesen')
'les'
>>> sb_german('graues')
'grau'
>>> sb_german('buchstabieren')
'buchstabi'
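The alternate_vowels option describes a pre-processing step that runs before stemming. As a rough sketch of that step alone, assuming plain left-to-right string replacement, it could look like the following; compose_umlauts is a hypothetical name and is not part of abydos.

```python
def compose_umlauts(word):
    """Rewrite ae/oe/ue digraphs as ä/ö/ü (illustrative helper only)."""
    for digraph, umlaut in (("ae", "ä"), ("oe", "ö"), ("ue", "ü")):
        word = word.replace(digraph, umlaut)
    return word


print(compose_umlauts("maedchen"))  # mädchen
print(compose_umlauts("schoen"))    # schön
print(compose_umlauts("fuer"))      # für
```

Blind replacement like this also rewrites digraphs that are not transliterated umlauts (for example the "ue" in "quelle"), so a real implementation would likely need to guard such cases.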
- abydos.stemmer.sb_norwegian(word)
Return Snowball Norwegian stem.
The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> sb_norwegian('lese')
'les'
>>> sb_norwegian('suspensjon')
'suspensjon'
>>> sb_norwegian('sikkerhet')
'sikker'
- abydos.stemmer.sb_swedish(word)
Return Snowball Swedish stem.
The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html
Parameters: word (str) – the word to calculate the stem of
Returns: word stem
Return type: str
>>> sb_swedish('undervisa')
'undervis'
>>> sb_swedish('suspension')
'suspension'
>>> sb_swedish('visshet')
'viss'
- abydos.stemmer.schinke(word)
Return the stem of a word according to the Schinke stemmer.
The Schinke stemmer is defined in [SGRW96].
Parameters: word (str) – the word to stem
Returns: a dict of the noun- and verb-stemmed word
Return type: dict
>>> schinke('atque')
{'n': 'atque', 'v': 'atque'}
>>> schinke('census')
{'n': 'cens', 'v': 'censu'}
>>> schinke('virum')
{'n': 'uir', 'v': 'uiru'}
>>> schinke('populusque')
{'n': 'popul', 'v': 'populu'}
>>> schinke('senatus')
{'n': 'senat', 'v': 'senatu'}
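Because schinke returns both a noun stem and a verb stem, the caller has to choose one. A small sketch of that selection is shown here; pick_stem is a hypothetical helper, and the dict literal simply copies the documented result for 'census' rather than calling the library.

```python
def pick_stem(stems, part_of_speech):
    """Select the noun ('n') or verb ('v') stem from a Schinke-style dict."""
    key = {"noun": "n", "verb": "v"}[part_of_speech]
    return stems[key]


# Return value documented for schinke('census'), copied as a literal.
census = {"n": "cens", "v": "censu"}

print(pick_stem(census, "noun"))  # cens
print(pick_stem(census, "verb"))  # censu
```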
- abydos.stemmer.uealite(word, max_word_length=20, max_acro_length=8, return_rule_no=False, var=None)
Return UEA-Lite stem.
The UEA-Lite stemmer is discussed in [JS05].
This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams’ Ruby port.
Java version: [Chu]
Perl version: [JS05]
Ruby version: [Ada17]
Parameters: - word (str) – the word to calculate the stem of
- max_word_length (int) – the maximum word length allowed
- max_acro_length (int) – the maximum acronym length allowed
- return_rule_no (bool) – if True, returns the stem along with the rule number
- var (str) – variant to use (set to ‘Adams’ to use Jason Adams’ rules, or ‘Perl’ to use the original Perl set of rules)
Returns: word stem
Return type: str or (str, int)
>>> uealite('readings')
'read'
>>> uealite('insulted')
'insult'
>>> uealite('cussed')
'cuss'
>>> uealite('fancies')
'fancy'
>>> uealite('eroded')
'erode'