abydos.stemmer package
abydos.stemmer.
The stemmer package collects stemmer classes for a number of languages, including:
English stemmers:
- Lovins' (Lovins)
- Paice-Husk (PaiceHusk)
- UEA-Lite (UEALite)
- S-stemmer (SStemmer)
- Porter (Porter)
- Porter2, i.e. Snowball English (Porter2)
German stemmers:
- Caumanns' (Caumanns)
- CLEF German (CLEFGerman)
- CLEF German Plus (CLEFGermanPlus)
- Snowball German (SnowballGerman)
Swedish stemmers:
- CLEF Swedish (CLEFSwedish)
- Snowball Swedish (SnowballSwedish)
Latin stemmer:
- Schinke (Schinke)
Danish stemmer:
- Snowball Danish (SnowballDanish)
Dutch stemmer:
- Snowball Dutch (SnowballDutch)
Norwegian stemmer:
- Snowball Norwegian (SnowballNorwegian)
Each stemmer has a stem method, which takes a word and returns its stemmed form:
>>> stmr = Porter()
>>> stmr.stem('democracy')
'democraci'
>>> stmr.stem('trusted')
'trust'
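Because every stemmer exposes the same stem method, calling code can be written against that interface alone. The following is a minimal sketch of that idea; SuffixStripper is a hypothetical toy class used only so the example runs without abydos, and stem_tokens and HasStem are illustrative names, not part of the package.

```python
from typing import Iterable, List, Protocol


class HasStem(Protocol):
    """Anything exposing stem(word) -> str, as the stemmers here do."""

    def stem(self, word: str) -> str: ...


class SuffixStripper:
    """Hypothetical toy stemmer; NOT one of the abydos algorithms."""

    def __init__(self, suffixes: Iterable[str]) -> None:
        # Try longer suffixes first so a short suffix cannot shadow a long one.
        self._suffixes = sorted(suffixes, key=len, reverse=True)

    def stem(self, word: str) -> str:
        for suffix in self._suffixes:
            # Keep at least two characters of the word as the stem.
            if word.endswith(suffix) and len(word) - len(suffix) >= 2:
                return word[: -len(suffix)]
        return word


def stem_tokens(stemmer: HasStem, tokens: Iterable[str]) -> List[str]:
    """Stem each token with any object that provides a stem method."""
    return [stemmer.stem(token) for token in tokens]


toy = SuffixStripper(['ing', 'ed', 's'])
print(stem_tokens(toy, ['trusted', 'reading', 'towers', 'is']))
# ['trust', 'read', 'tower', 'is']
```

Under this assumption, any of the stemmer classes listed above could be passed to stem_tokens in place of the toy class.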
class abydos.stemmer.Lovins
Bases: abydos.stemmer._stemmer._Stemmer
Lovins stemmer.
The Lovins stemmer is described in Julie Beth Lovins's article [Lov68].
abydos.stemmer.lovins(word)
Return Lovins stem.
This is a wrapper for Lovins.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> lovins('reading')
'read'
>>> lovins('suspension')
'suspens'
>>> lovins('elusiveness')
'elus'
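The module-level functions on this page are each described only as a wrapper for the corresponding stem() method. A minimal sketch of that pattern follows, assuming the wrapper simply creates an instance and delegates to it; ToyStemmer and toy_stem are hypothetical names and the rule inside is a placeholder, not an abydos algorithm.

```python
class ToyStemmer:
    """Hypothetical stand-in for a stemmer class; not an abydos class."""

    def stem(self, word: str) -> str:
        # Placeholder rule for illustration only: strip one trailing 's'.
        return word[:-1] if word.endswith('s') else word


def toy_stem(word: str) -> str:
    """Functional wrapper: build the stemmer, then delegate to stem()."""
    return ToyStemmer().stem(word)


print(toy_stem('towers'))  # tower
```

When stemming many words, instantiating the class once and reusing it avoids building a new object per call.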
class abydos.stemmer.PaiceHusk
Bases: abydos.stemmer._stemmer._Stemmer
Paice-Husk stemmer.
Implementation of the Paice-Husk Stemmer, also known as the Lancaster Stemmer, developed by Chris Paice, with the assistance of Gareth Husk.
This is based on the algorithm's description in [Pai90].
stem(word)
Return Paice-Husk stem.
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> stmr = PaiceHusk()
>>> stmr.stem('assumption')
'assum'
>>> stmr.stem('verifiable')
'ver'
>>> stmr.stem('fancies')
'fant'
>>> stmr.stem('fanciful')
'fancy'
>>> stmr.stem('torment')
'tor'
abydos.stemmer.paice_husk(word)
Return Paice-Husk stem.
This is a wrapper for PaiceHusk.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> paice_husk('assumption')
'assum'
>>> paice_husk('verifiable')
'ver'
>>> paice_husk('fancies')
'fant'
>>> paice_husk('fanciful')
'fancy'
>>> paice_husk('torment')
'tor'
class abydos.stemmer.UEALite
Bases: abydos.stemmer._stemmer._Stemmer
UEA-Lite stemmer.
The UEA-Lite stemmer is discussed in [JS05].
This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams' Ruby port.
Java version: [Chu]
Perl version: [JS05]
Ruby version: [Ada17]
stem(word, max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')
Return UEA-Lite stem.
Parameters:
- word (str) -- The word to stem
- max_word_length (int) -- The maximum word length allowed
- max_acro_length (int) -- The maximum acronym length allowed
- return_rule_no (bool) -- If True, returns the stem along with the rule number
- var (str) -- Variant rules to use:
  - 'Adams' to use Jason Adams' rules
  - 'Perl' to use the original Perl rules
Returns: Word stem
Return type: str or (str, int)
Examples
>>> stmr = UEALite()
>>> stmr.stem('readings')
'read'
>>> stmr.stem('insulted')
'insult'
>>> stmr.stem('cussed')
'cuss'
>>> stmr.stem('fancies')
'fancy'
>>> stmr.stem('eroded')
'erode'
abydos.stemmer.uealite(word, max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')
Return UEA-Lite stem.
This is a wrapper for UEALite.stem().
Parameters:
- word (str) -- The word to stem
- max_word_length (int) -- The maximum word length allowed
- max_acro_length (int) -- The maximum acronym length allowed
- return_rule_no (bool) -- If True, returns the stem along with the rule number
- var (str) -- Variant rules to use:
  - 'Adams' to use Jason Adams' rules
  - 'Perl' to use the original Perl rules
Returns: Word stem
Return type: str or (str, int)
Examples
>>> uealite('readings')
'read'
>>> uealite('insulted')
'insult'
>>> uealite('cussed')
'cuss'
>>> uealite('fancies')
'fancy'
>>> uealite('eroded')
'erode'
class abydos.stemmer.SStemmer
Bases: abydos.stemmer._stemmer._Stemmer
S-stemmer.
The S stemmer is defined in [Har91].
stem(word)
Return the S-stemmed form of a word.
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> stmr = SStemmer()
>>> stmr.stem('summaries')
'summary'
>>> stmr.stem('summary')
'summary'
>>> stmr.stem('towers')
'tower'
>>> stmr.stem('reading')
'reading'
>>> stmr.stem('census')
'census'
abydos.stemmer.s_stemmer(word)
Return the S-stemmed form of a word.
This is a wrapper for SStemmer.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> s_stemmer('summaries')
'summary'
>>> s_stemmer('summary')
'summary'
>>> s_stemmer('towers')
'tower'
>>> s_stemmer('reading')
'reading'
>>> s_stemmer('census')
'census'
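The behaviour shown in the S-stemmer examples can be reproduced with three ordered suffix rules. The sketch below is an independent illustration of those rules as commonly stated for the S stemmer, not the abydos source; s_stem_sketch is a hypothetical name, and falling through to the next rule when an exception blocks an earlier one is an assumption.

```python
def s_stem_sketch(word: str) -> str:
    """Apply the first matching rule of the three S-stemmer rules."""
    lowered = word.lower()
    # Rule 1: -ies -> -y, unless the ending is -eies or -aies.
    if lowered.endswith('ies') and not lowered.endswith(('eies', 'aies')):
        return word[:-3] + 'y'
    # Rule 2: -es -> -e, unless the ending is -aes, -ees, or -oes.
    if lowered.endswith('es') and not lowered.endswith(('aes', 'ees', 'oes')):
        return word[:-1]
    # Rule 3: drop a final -s, unless the ending is -us or -ss.
    if lowered.endswith('s') and not lowered.endswith(('us', 'ss')):
        return word[:-1]
    # No rule matched: the word is returned unchanged.
    return word


for w in ['summaries', 'summary', 'towers', 'reading', 'census']:
    print(w, '->', s_stem_sketch(w))
```

On the five words used in the examples above, this sketch yields summary, summary, tower, reading, and census.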
class abydos.stemmer.Caumanns
Bases: abydos.stemmer._stemmer._Stemmer
Caumanns stemmer.
Jörg Caumanns' stemmer is described in his article [Cau99].
This implementation is based on the GermanStemFilter described at [Lan13].
abydos.stemmer.caumanns(word)
Return Caumanns German stem.
This is a wrapper for Caumanns.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> caumanns('lesen')
'les'
>>> caumanns('graues')
'grau'
>>> caumanns('buchstabieren')
'buchstabier'
class abydos.stemmer.Schinke
Bases: abydos.stemmer._stemmer._Stemmer
Schinke stemmer.
This is defined in [SGRW96].
stem(word)
Return the stem of a word according to the Schinke stemmer.
Parameters: word (str) -- The word to stem
Returns: Word stems, keyed by 'n' (noun stem) and 'v' (verb stem)
Return type: dict
Examples
>>> stmr = Schinke()
>>> stmr.stem('atque')
{'n': 'atque', 'v': 'atque'}
>>> stmr.stem('census')
{'n': 'cens', 'v': 'censu'}
>>> stmr.stem('virum')
{'n': 'uir', 'v': 'uiru'}
>>> stmr.stem('populusque')
{'n': 'popul', 'v': 'populu'}
>>> stmr.stem('senatus')
{'n': 'senat', 'v': 'senatu'}
abydos.stemmer.schinke(word)
Return the stem of a word according to the Schinke stemmer.
This is a wrapper for Schinke.stem().
Parameters: word (str) -- The word to stem
Returns: Word stems, keyed by 'n' (noun stem) and 'v' (verb stem)
Return type: dict
Examples
>>> schinke('atque')
{'n': 'atque', 'v': 'atque'}
>>> schinke('census')
{'n': 'cens', 'v': 'censu'}
>>> schinke('virum')
{'n': 'uir', 'v': 'uiru'}
>>> schinke('populusque')
{'n': 'popul', 'v': 'populu'}
>>> schinke('senatus')
{'n': 'senat', 'v': 'senatu'}
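Unlike the other stemmers on this page, the Schinke examples return a dict holding a noun stem ('n') and a verb stem ('v') rather than a single string. Code that handles several stemmers may want to treat both shapes uniformly. A minimal sketch, with index_terms as a hypothetical helper and the sample values copied from the examples above so that abydos is not required:

```python
from typing import Dict, Set, Union

# A stemmer result is either one stem or a dict of stems keyed by word class.
StemResult = Union[str, Dict[str, str]]


def index_terms(result: StemResult) -> Set[str]:
    """Collect the distinct stems contained in a stemmer result."""
    if isinstance(result, dict):
        return set(result.values())
    return {result}


print(sorted(index_terms({'n': 'cens', 'v': 'censu'})))   # ['cens', 'censu']
print(sorted(index_terms({'n': 'atque', 'v': 'atque'})))  # ['atque']
print(sorted(index_terms('trust')))                       # ['trust']
```

Returning a set collapses the case where the noun and verb stems coincide, as they do for 'atque'.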
class abydos.stemmer.Porter
Bases: abydos.stemmer._stemmer._Stemmer
Porter stemmer.
The Porter stemmer is described in [Por80].
stem(word, early_english=False)
Return Porter stem.
Parameters:
- word (str) -- The word to stem
- early_english (bool) -- Set to True to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
Returns: Word stem
Return type: str
Examples
>>> stmr = Porter()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr.stem('eateth', early_english=True)
'eat'
abydos.stemmer.porter(word, early_english=False)
Return Porter stem.
This is a wrapper for Porter.stem().
Parameters:
- word (str) -- The word to stem
- early_english (bool) -- Set to True to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
Returns: Word stem
Return type: str
Examples
>>> porter('reading')
'read'
>>> porter('suspension')
'suspens'
>>> porter('elusiveness')
'elus'
>>> porter('eateth', early_english=True)
'eat'
class abydos.stemmer.Porter2
Bases: abydos.stemmer._snowball._Snowball
Porter2 (Snowball English) stemmer.
The Porter2 (Snowball English) stemmer is defined in [Por02].
stem(word, early_english=False)
Return the Porter2 (Snowball English) stem.
Parameters:
- word (str) -- The word to stem
- early_english (bool) -- Set to True to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
Returns: Word stem
Return type: str
Examples
>>> stmr = Porter2()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr.stem('eateth', early_english=True)
'eat'
abydos.stemmer.porter2(word, early_english=False)
Return the Porter2 (Snowball English) stem.
This is a wrapper for Porter2.stem().
Parameters:
- word (str) -- The word to stem
- early_english (bool) -- Set to True to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
Returns: Word stem
Return type: str
Examples
>>> porter2('reading')
'read'
>>> porter2('suspension')
'suspens'
>>> porter2('elusiveness')
'elus'
>>> porter2('eateth', early_english=True)
'eat'
class abydos.stemmer.SnowballDanish
Bases: abydos.stemmer._snowball._Snowball
Snowball Danish stemmer.
The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html
abydos.stemmer.sb_danish(word)
Return Snowball Danish stem.
This is a wrapper for SnowballDanish.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> sb_danish('underviser')
'undervis'
>>> sb_danish('suspension')
'suspension'
>>> sb_danish('sikkerhed')
'sikker'
class abydos.stemmer.SnowballDutch
Bases: abydos.stemmer._snowball._Snowball
Snowball Dutch stemmer.
The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html
abydos.stemmer.sb_dutch(word)
Return Snowball Dutch stem.
This is a wrapper for SnowballDutch.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> sb_dutch('lezen')
'lez'
>>> sb_dutch('opschorting')
'opschort'
>>> sb_dutch('ongrijpbaarheid')
'ongrijp'
class abydos.stemmer.SnowballGerman
Bases: abydos.stemmer._snowball._Snowball
Snowball German stemmer.
The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html
stem(word, alternate_vowels=False)
Return Snowball German stem.
Parameters:
- word (str) -- The word to stem
- alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm
Returns: Word stem
Return type: str
Examples
>>> stmr = SnowballGerman()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'
abydos.stemmer.sb_german(word, alternate_vowels=False)
Return Snowball German stem.
This is a wrapper for SnowballGerman.stem().
Parameters:
- word (str) -- The word to stem
- alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm
Returns: Word stem
Return type: str
Examples
>>> sb_german('lesen')
'les'
>>> sb_german('graues')
'grau'
>>> sb_german('buchstabieren')
'buchstabi'
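The alternate_vowels option is described above only as composing ae, oe, and ue into ä, ö, and ü before the algorithm runs. A minimal sketch of such a pre-processing step follows. compose_umlauts is a hypothetical helper, not part of abydos; leaving ue untouched after q is an assumption borrowed from the Snowball description of its German variant, and the sketch assumes lowercase input.

```python
import re


def compose_umlauts(word: str) -> str:
    """Rewrite the digraphs ae, oe, ue as ä, ö, ü (lowercase input assumed)."""
    word = word.replace('ae', 'ä').replace('oe', 'ö')
    # Assumption: leave 'ue' alone after 'q', so 'quelle' stays unchanged.
    return re.sub(r'(?<!q)ue', 'ü', word)


for w in ['maedchen', 'schoen', 'fuer', 'quelle']:
    print(w, '->', compose_umlauts(w))
# maedchen -> mädchen
# schoen -> schön
# fuer -> für
# quelle -> quelle
```

A plain substring replacement like this also rewrites genuine vowel sequences (for instance the ue in 'aktuell'), which is presumably why the option is off by default.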
class abydos.stemmer.SnowballNorwegian
Bases: abydos.stemmer._snowball._Snowball
Snowball Norwegian stemmer.
The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html
abydos.stemmer.sb_norwegian(word)
Return Snowball Norwegian stem.
This is a wrapper for SnowballNorwegian.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> sb_norwegian('lese')
'les'
>>> sb_norwegian('suspensjon')
'suspensjon'
>>> sb_norwegian('sikkerhet')
'sikker'
class abydos.stemmer.SnowballSwedish
Bases: abydos.stemmer._snowball._Snowball
Snowball Swedish stemmer.
The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html
abydos.stemmer.sb_swedish(word)
Return Snowball Swedish stem.
This is a wrapper for SnowballSwedish.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> sb_swedish('undervisa')
'undervis'
>>> sb_swedish('suspension')
'suspension'
>>> sb_swedish('visshet')
'viss'
class abydos.stemmer.CLEFGerman
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer.
The CLEF German stemmer is defined at [Sav05].
abydos.stemmer.clef_german(word)
Return CLEF German stem.
This is a wrapper for CLEFGerman.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> clef_german('lesen')
'lese'
>>> clef_german('graues')
'grau'
>>> clef_german('buchstabieren')
'buchstabier'
class abydos.stemmer.CLEFGermanPlus
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer plus.
The CLEF German stemmer plus is defined at [Sav05].
abydos.stemmer.clef_german_plus(word)
Return 'CLEF German stemmer plus' stem.
This is a wrapper for CLEFGermanPlus.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> clef_german_plus('lesen')
'les'
>>> clef_german_plus('graues')
'grau'
>>> clef_german_plus('buchstabieren')
'buchstabi'
class abydos.stemmer.CLEFSwedish
Bases: abydos.stemmer._stemmer._Stemmer
CLEF Swedish stemmer.
The CLEF Swedish stemmer is defined at [Sav05].
abydos.stemmer.clef_swedish(word)
Return CLEF Swedish stem.
This is a wrapper for CLEFSwedish.stem().
Parameters: word (str) -- The word to stem
Returns: Word stem
Return type: str
Examples
>>> clef_swedish('undervisa')
'undervis'
>>> clef_swedish('suspension')
'suspensio'
>>> clef_swedish('visshet')
'viss'