abydos.stemmer package
The stemmer package collects stemmer classes for a number of languages, including:
English stemmers:
- Lovins (Lovins)
- Porter (Porter)
- Porter2, i.e. Snowball English (Porter2)
- UEA-Lite (UEALite)
- Paice-Husk (PaiceHusk)
- S-stemmer (SStemmer)
German stemmers:
- Caumanns' (Caumanns)
- CLEF German (CLEFGerman)
- CLEF German Plus (CLEFGermanPlus)
- Snowball German (SnowballGerman)
Swedish stemmers:
- CLEF Swedish (CLEFSwedish)
- Snowball Swedish (SnowballSwedish)
Latin stemmer:
- Schinke (Schinke)
Danish stemmer:
- Snowball Danish (SnowballDanish)
Dutch stemmer:
- Snowball Dutch (SnowballDutch)
Norwegian stemmer:
- Snowball Norwegian (SnowballNorwegian)
Each stemmer has a stem method, which takes a word and returns its stemmed form:
>>> from abydos.stemmer import Porter
>>> stmr = Porter()
>>> stmr.stem('democracy')
'democraci'
>>> stmr.stem('trusted')
'trust'
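Because every stemmer class exposes the same stem method, code that consumes stems can be written against that method alone. The following minimal sketch groups tokens by stem; group_by_stem, SupportsStem, and ToyPluralStemmer are hypothetical names introduced for illustration, and the toy class only stands in for a real stemmer instance such as Porter().

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Protocol


class SupportsStem(Protocol):
    """Anything with a stem(word) -> str method (hypothetical protocol)."""

    def stem(self, word: str) -> str: ...


class ToyPluralStemmer:
    """Hypothetical stand-in, not an abydos class: drops a final 's'."""

    def stem(self, word: str) -> str:
        return word[:-1] if word.endswith('s') and len(word) > 3 else word


def group_by_stem(stemmer: SupportsStem, tokens: Iterable[str]) -> Dict[str, List[str]]:
    """Map each stem to the tokens that reduce to it, in input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        groups[stemmer.stem(token)].append(token)
    return dict(groups)


print(group_by_stem(ToyPluralStemmer(), ['tower', 'towers', 'bus']))
# {'tower': ['tower', 'towers'], 'bus': ['bus']}
```

Passing Porter() in place of the toy class should work the same way, since only stem is called; stemmers whose stem method returns something other than a str (Schinke returns a dict) cannot be used as dictionary keys like this.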
class abydos.stemmer.Lovins
Bases: abydos.stemmer._stemmer._Stemmer
Lovins stemmer.
The Lovins stemmer is described in Julie Beth Lovins's article [Lov68].
New in version 0.3.6.
Initialize the stemmer.
New in version 0.3.6.
stem(word)
Return Lovins stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Lovins()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
New in version 0.2.0.
Changed in version 0.3.6: Encapsulated in class
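Lovins's algorithm removes the longest ending from a fixed list (294 endings in the original article), provided at least two letters remain, and then applies recoding rules to the result. The minimal sketch below shows only the longest-match step. ENDINGS is a tiny hypothetical subset of Lovins's list, and both the per-ending context conditions and the recoding step are omitted, so this is an illustration rather than a substitute for Lovins.stem.

```python
# Hypothetical subset of Lovins's ending list; the real list is much longer
# and each ending carries a context condition that is ignored here.
ENDINGS = ('iveness', 'ness', 'ion', 'ing', 'ive', 'es', 's')
MIN_STEM = 2  # Lovins requires at least two letters to remain


def strip_longest_ending(word: str) -> str:
    """Remove the longest listed ending that leaves at least MIN_STEM letters."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)]
    return word
```

For reading, suspension, and elusiveness this yields read, suspens, and elus, the same as the documented Lovins.stem examples; for most other words the omitted conditions and recoding rules will make the outputs differ.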
abydos.stemmer.lovins(word)
Return Lovins stem.
This is a wrapper for Lovins.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> lovins('reading')
'read'
>>> lovins('suspension')
'suspens'
>>> lovins('elusiveness')
'elus'
New in version 0.2.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Lovins.stem method instead.
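Module-level functions such as lovins are thin, deprecated wrappers around the corresponding class's stem method. The sketch below illustrates that wrapper pattern in general terms, assuming a plain DeprecationWarning: MyStemmer, my_stemmer, and deprecated_wrapper are hypothetical names, and the warning text is modeled on the deprecation notice above rather than taken from abydos's source.

```python
import functools
import warnings


class MyStemmer:
    """Hypothetical class-based stemmer (stands in for e.g. Lovins)."""

    def stem(self, word: str) -> str:
        return word[:-3] if word.endswith('ing') else word


def deprecated_wrapper(cls, removed_in: str):
    """Build a function that warns, then delegates to cls().stem(word)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(word: str) -> str:
            warnings.warn(
                f'{func.__name__} is deprecated and will be removed in '
                f'{removed_in}. Use the {cls.__name__}.stem method instead.',
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return cls().stem(word)

        return wrapper

    return decorator


@deprecated_wrapper(MyStemmer, removed_in='0.6.0')
def my_stemmer(word: str) -> str:
    """Return the MyStemmer stem (deprecated wrapper)."""
```

Calling the class method directly avoids the warning and lets you construct the stemmer once and reuse it across many words.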
class abydos.stemmer.PaiceHusk
Bases: abydos.stemmer._stemmer._Stemmer
Paice-Husk stemmer.
Implementation of the Paice-Husk stemmer, also known as the Lancaster stemmer, developed by Chris Paice with the assistance of Gareth Husk.
This is based on the algorithm's description in [Pai90].
New in version 0.3.6.
stem(word)
Return Paice-Husk stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = PaiceHusk()
>>> stmr.stem('assumption')
'assum'
>>> stmr.stem('verifiable')
'ver'
>>> stmr.stem('fancies')
'fant'
>>> stmr.stem('fanciful')
'fancy'
>>> stmr.stem('torment')
'tor'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
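Paice-Husk applies its rule table iteratively, and before accepting a rule it checks that the resulting stem is still acceptable. As the Lancaster stemmer is commonly described, a stem is acceptable if the word starts with a vowel and at least two letters remain, or starts with a consonant and at least three letters remain, at least one of which is a vowel or y. The sketch below encodes only that check, not the rule table itself; the function name and the vowel set are assumptions, and it is not abydos's code.

```python
VOWELS = frozenset('aeiou')


def acceptable_stem(stem: str) -> bool:
    """Paice-Husk-style acceptability check, as commonly described (sketch)."""
    if not stem:
        return False
    if stem[0] in VOWELS:
        # Vowel-initial: at least two letters must remain (owed -> ow).
        return len(stem) >= 2
    # Consonant-initial: at least three letters, including a vowel or 'y'
    # (saying -> say is allowed, but string -> str is not).
    return len(stem) >= 3 and any(ch in VOWELS or ch == 'y' for ch in stem)
```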
abydos.stemmer.paice_husk(word)
Return Paice-Husk stem.
This is a wrapper for PaiceHusk.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> paice_husk('assumption')
'assum'
>>> paice_husk('verifiable')
'ver'
>>> paice_husk('fancies')
'fant'
>>> paice_husk('fanciful')
'fancy'
>>> paice_husk('torment')
'tor'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PaiceHusk.stem method instead.
class abydos.stemmer.UEALite(max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')
Bases: abydos.stemmer._stemmer._Stemmer
UEA-Lite stemmer.
The UEA-Lite stemmer is discussed in [JS05].
This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams' Ruby port.
Java version: [Chu]
Perl version: [JS05]
Ruby version: [Ada17]
New in version 0.3.6.
Initialize UEALite instance.
- Parameters
max_word_length (int) -- The maximum word length allowed
max_acro_length (int) -- The maximum acronym length allowed
return_rule_no (bool) -- If True, returns the stem along with the rule number
var (str) -- Variant rules to use:
  standard: to use the original (Java-version) rules
  Adams: to use Jason Adams' rules
  Perl: to use the original Perl rules
New in version 0.4.0.
stem(word)
Return UEA-Lite stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str or (str, int)
Examples
>>> stmr = UEALite()
>>> stmr.stem('readings')
'read'
>>> stmr.stem('insulted')
'insult'
>>> stmr.stem('cussed')
'cuss'
>>> stmr.stem('fancies')
'fancy'
>>> stmr.stem('eroded')
'erode'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
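UEA-Lite tries a sequence of numbered suffix rules and applies the first one that matches; with return_rule_no=True the number of that rule is returned alongside the stem, which is why the return type is listed as str or (str, int). The sketch below shows only that control flow. The three rules and their numbers are hypothetical placeholders, not UEA-Lite's actual rule table, and the real rules carry further conditions (word and acronym length limits, variant-specific rules) that are omitted here.

```python
from typing import Tuple, Union

# Hypothetical (suffix, replacement, rule number) triples, tried in order.
RULES = (
    ('ings', '', 1),
    ('ies', 'y', 2),
    ('ed', '', 3),
)
NO_RULE = 0  # hypothetical number reported when nothing matched


def ordered_rule_stem(word: str, return_rule_no: bool = False) -> Union[str, Tuple[str, int]]:
    """Apply the first matching rule; optionally report which one fired."""
    stem, rule_no = word, NO_RULE
    for suffix, replacement, number in RULES:
        if word.endswith(suffix):
            stem, rule_no = word[: -len(suffix)] + replacement, number
            break
    return (stem, rule_no) if return_rule_no else stem
```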
abydos.stemmer.uealite(word, max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')
Return UEA-Lite stem.
This is a wrapper for UEALite.stem().
- Parameters
word (str) -- The word to stem
max_word_length (int) -- The maximum word length allowed
max_acro_length (int) -- The maximum acronym length allowed
return_rule_no (bool) -- If True, returns the stem along with the rule number
var (str) -- Variant rules to use:
  standard: to use the original (Java-version) rules
  Adams: to use Jason Adams' rules
  Perl: to use the original Perl rules
- Returns
Word stem
- Return type
str or (str, int)
Examples
>>> uealite('readings')
'read'
>>> uealite('insulted')
'insult'
>>> uealite('cussed')
'cuss'
>>> uealite('fancies')
'fancy'
>>> uealite('eroded')
'erode'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the UEALite.stem method instead.
class abydos.stemmer.SStemmer
Bases: abydos.stemmer._stemmer._Stemmer
S-stemmer.
The S-stemmer is defined in [Har91].
New in version 0.3.6.
stem(word)
Return the S-stemmed form of a word.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SStemmer()
>>> stmr.stem('summaries')
'summary'
>>> stmr.stem('summary')
'summary'
>>> stmr.stem('towers')
'tower'
>>> stmr.stem('reading')
'reading'
>>> stmr.stem('census')
'census'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
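The S-stemmer applies at most one of three rules, checked in order: a final -ies becomes -y unless preceded by e or a; a final -es becomes -e unless preceded by a, e, or o; and a final -s is dropped unless preceded by u or s. The following is a minimal sketch of those rules as Harman describes them, assuming lowercase input; the function name is hypothetical and this is not abydos's implementation.

```python
def s_stem(word: str) -> str:
    """Apply the first matching S-stemmer rule (lowercase input assumed)."""
    # Rule 1: -ies -> -y, except after 'e' or 'a' (i.e. not -eies, -aies).
    if word.endswith('ies') and word[-4:-3] not in ('e', 'a'):
        return word[:-3] + 'y'
    # Rule 2: -es -> -e, except after 'a', 'e', or 'o'.
    if word.endswith('es') and word[-3:-2] not in ('a', 'e', 'o'):
        return word[:-1]
    # Rule 3: drop a final -s, except after 'u' or 's'.
    if word.endswith('s') and word[-2:-1] not in ('u', 's'):
        return word[:-1]
    return word
```

On the documented examples this gives summary, summary, tower, reading, and census, matching SStemmer.stem.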
abydos.stemmer.s_stemmer(word)
Return the S-stemmed form of a word.
This is a wrapper for SStemmer.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> s_stemmer('summaries')
'summary'
>>> s_stemmer('summary')
'summary'
>>> s_stemmer('towers')
'tower'
>>> s_stemmer('reading')
'reading'
>>> s_stemmer('census')
'census'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SStemmer.stem method instead.
class abydos.stemmer.Caumanns
Bases: abydos.stemmer._stemmer._Stemmer
Caumanns stemmer.
Jörg Caumanns' stemmer is described in his article in [Cau99].
This implementation is based on the GermanStemFilter described at [Lan13].
New in version 0.3.6.
stem(word)
Return Caumanns German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Caumanns()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'
New in version 0.2.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.caumanns(word)
Return Caumanns German stem.
This is a wrapper for Caumanns.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> caumanns('lesen')
'les'
>>> caumanns('graues')
'grau'
>>> caumanns('buchstabieren')
'buchstabier'
New in version 0.2.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Caumanns.stem method instead.
class abydos.stemmer.Schinke
Bases: abydos.stemmer._stemmer._Stemmer
Schinke stemmer.
This is defined in [SGRW96].
New in version 0.3.6.
stem(word)
Return the stem of a word according to the Schinke stemmer.
- Parameters
word (str) -- The word to stem
- Returns
Noun and verb stems, as a dict keyed 'n' and 'v'
- Return type
dict
Examples
>>> stmr = Schinke()
>>> stmr.stem('atque')
{'n': 'atque', 'v': 'atque'}
>>> stmr.stem('census')
{'n': 'cens', 'v': 'censu'}
>>> stmr.stem('virum')
{'n': 'uir', 'v': 'uiru'}
>>> stmr.stem('populusque')
{'n': 'popul', 'v': 'populu'}
>>> stmr.stem('senatus')
{'n': 'senat', 'v': 'senatu'}
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
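Schinke et al.'s Latin stemmer begins by normalizing spelling (j to i and v to u, which is why virum appears as uir in the examples above) and by handling the enclitic -que: words on a fixed list, such as atque, are returned unchanged as both noun and verb stems, while -que is otherwise removed before the separate noun and verb suffix rules run. The sketch covers only those first steps; QUE_WORDS is a small illustrative subset of the real exception list, and schinke_preprocess is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

# Small illustrative subset of the -que words that are kept whole;
# the published exception list is longer.
QUE_WORDS = frozenset({'atque', 'quoque', 'neque', 'itaque', 'denique'})


def schinke_preprocess(word: str) -> Tuple[str, Optional[Dict[str, str]]]:
    """Normalize j/v and handle -que; return (word, final result or None)."""
    word = word.lower().replace('j', 'i').replace('v', 'u')
    if word.endswith('que'):
        if word in QUE_WORDS:
            # Listed -que words are their own noun and verb stems.
            return word, {'n': word, 'v': word}
        word = word[:-3]
    return word, None
```

A form such as populus (from populusque) would then pass through the noun and verb suffix tables, which this sketch does not include.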
abydos.stemmer.schinke(word)
Return the stem of a word according to the Schinke stemmer.
This is a wrapper for Schinke.stem().
- Parameters
word (str) -- The word to stem
- Returns
Noun and verb stems, as a dict keyed 'n' and 'v'
- Return type
dict
Examples
>>> schinke('atque')
{'n': 'atque', 'v': 'atque'}
>>> schinke('census')
{'n': 'cens', 'v': 'censu'}
>>> schinke('virum')
{'n': 'uir', 'v': 'uiru'}
>>> schinke('populusque')
{'n': 'popul', 'v': 'populu'}
>>> schinke('senatus')
{'n': 'senat', 'v': 'senatu'}
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Schinke.stem method instead.
class abydos.stemmer.Porter(early_english=False)
Bases: abydos.stemmer._stemmer._Stemmer
Porter stemmer.
The Porter stemmer is described in [Por80].
New in version 0.3.6.
Initialize Porter instance.
- Parameters
early_english (bool) -- Set to True to remove -est and -eth (the 2nd and 3rd person singular verbal agreement suffixes)
New in version 0.4.0.
stem(word)
Return Porter stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Porter()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter(early_english=True)
>>> stmr.stem('eateth')
'eat'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
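Most of Porter's rules are conditioned on the measure m of the candidate stem: writing C for a run of consonants and V for a run of vowels, every word has the form [C](VC)^m[V], and m counts the VC pairs. A letter is a consonant unless it is a, e, i, o, or u, or a y preceded by a consonant. The sketch below computes m directly from those definitions in [Por80]; the function names are illustrative and this is not abydos's implementation.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: y counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in 'aeiou':
        return False
    if ch == 'y':
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m in [C](VC)^m[V], i.e. the number of vowel-to-consonant transitions."""
    m = 0
    previous_is_vowel = False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and previous_is_vowel:
            m += 1
        previous_is_vowel = not consonant
    return m
```

For example, measure('tree') is 0, measure('trouble') is 1, and measure('troubles') is 2, matching Porter's own examples.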
abydos.stemmer.porter(word, early_english=False)
Return Porter stem.
This is a wrapper for Porter.stem().
- Parameters
word (str) -- The word to stem
early_english (bool) -- Set to True to remove -est and -eth (the 2nd and 3rd person singular verbal agreement suffixes)
- Returns
Word stem
- Return type
str
Examples
>>> porter('reading')
'read'
>>> porter('suspension')
'suspens'
>>> porter('elusiveness')
'elus'
>>> porter('eateth', early_english=True)
'eat'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Porter.stem method instead.
class abydos.stemmer.Porter2(early_english=False)
Bases: abydos.stemmer._snowball._Snowball
Porter2 (Snowball English) stemmer.
The Porter2 (Snowball English) stemmer is defined in [Por02].
New in version 0.3.6.
Initialize Porter2 instance.
- Parameters
early_english (bool) -- Set to True to remove -est and -eth (the 2nd and 3rd person singular verbal agreement suffixes)
New in version 0.4.0.
stem(word)
Return the Porter2 (Snowball English) stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Porter2()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter2(early_english=True)
>>> stmr.stem('eateth')
'eat'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
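The Snowball stemmers in this package (Porter2 and the Danish, Dutch, German, Norwegian, and Swedish stemmers) confine most suffix removals to a region R1 and, in some languages, a second region R2. In the standard definition, R1 is the part of the word after the first non-vowel that follows a vowel (empty if there is none), and R2 is the same region computed again inside R1. The minimal sketch below computes both start indices. Its min_r1 parameter models the adjustment several non-English Snowball stemmers make so that at least three letters precede R1; the function names are illustrative, vowel sets differ by language, and Porter2's special-case prefixes (such as gener and commun) are not handled.

```python
from typing import Tuple

ENGLISH_VOWELS = frozenset('aeiouy')  # Porter2's vowel set


def _region_start(word: str, vowels: frozenset, start: int) -> int:
    """Index just after the first non-vowel that follows a vowel at or after start."""
    for i in range(start + 1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return i + 1
    return len(word)


def regions(word: str, vowels: frozenset = ENGLISH_VOWELS, min_r1: int = 0) -> Tuple[int, int]:
    """Return (r1, r2) start indices; word[r1:] is R1 and word[r2:] is R2."""
    r1 = _region_start(word, vowels, 0)
    r2 = _region_start(word, vowels, r1)  # R2 uses the unadjusted R1
    # Some Snowball stemmers require at least min_r1 letters before R1.
    r1 = min(max(r1, min_r1), len(word))
    return r1, r2
```

With the default English vowel set, regions('beautiful') gives R1 = iful and R2 = ul, the worked example from the Snowball documentation.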
abydos.stemmer.porter2(word, early_english=False)
Return the Porter2 (Snowball English) stem.
This is a wrapper for Porter2.stem().
- Parameters
word (str) -- The word to stem
early_english (bool) -- Set to True to remove -est and -eth (the 2nd and 3rd person singular verbal agreement suffixes)
- Returns
Word stem
- Return type
str
Examples
>>> porter2('reading')
'read'
>>> porter2('suspension')
'suspens'
>>> porter2('elusiveness')
'elus'
>>> porter2('eateth', early_english=True)
'eat'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Porter2.stem method instead.
class abydos.stemmer.SnowballDanish
Bases: abydos.stemmer._snowball._Snowball
Snowball Danish stemmer.
The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Danish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballDanish()
>>> stmr.stem('underviser')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('sikkerhed')
'sikker'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_danish(word)
Return Snowball Danish stem.
This is a wrapper for SnowballDanish.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_danish('underviser')
'undervis'
>>> sb_danish('suspension')
'suspension'
>>> sb_danish('sikkerhed')
'sikker'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballDanish.stem method instead.
class abydos.stemmer.SnowballDutch
Bases: abydos.stemmer._snowball._Snowball
Snowball Dutch stemmer.
The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Dutch stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballDutch()
>>> stmr.stem('lezen')
'lez'
>>> stmr.stem('opschorting')
'opschort'
>>> stmr.stem('ongrijpbaarheid')
'ongrijp'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_dutch(word)
Return Snowball Dutch stem.
This is a wrapper for SnowballDutch.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_dutch('lezen')
'lez'
>>> sb_dutch('opschorting')
'opschort'
>>> sb_dutch('ongrijpbaarheid')
'ongrijp'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballDutch.stem method instead.
class abydos.stemmer.SnowballGerman(alternate_vowels=False)
Bases: abydos.stemmer._snowball._Snowball
Snowball German stemmer.
The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html
New in version 0.3.6.
Initialize SnowballGerman instance.
- Parameters
alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm
New in version 0.4.0.
stem(word)
Return Snowball German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballGerman()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
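With alternate_vowels=True, SnowballGerman first composes the two-letter spellings ae, oe, and ue into ä, ö, and ü before running the algorithm. The sketch below shows that preprocessing step on its own. It follows the Snowball description of the German variant, which leaves ue unchanged after q (as in quelle); whether abydos applies exactly that exception is an assumption here, and compose_umlauts is a hypothetical name.

```python
import re

# ae -> ä, oe -> ö, and ue -> ü except directly after q (as in "quelle").
_DIGRAPHS = re.compile(r'ae|oe|(?<!q)ue')
_COMPOSED = {'ae': 'ä', 'oe': 'ö', 'ue': 'ü'}


def compose_umlauts(word: str) -> str:
    """Rewrite ae/oe/ue digraphs as umlauted vowels (lowercase input assumed)."""
    return _DIGRAPHS.sub(lambda m: _COMPOSED[m.group(0)], word)
```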
abydos.stemmer.sb_german(word, alternate_vowels=False)
Return Snowball German stem.
This is a wrapper for SnowballGerman.stem().
- Parameters
word (str) -- The word to stem
alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm
- Returns
Word stem
- Return type
str
Examples
>>> sb_german('lesen')
'les'
>>> sb_german('graues')
'grau'
>>> sb_german('buchstabieren')
'buchstabi'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballGerman.stem method instead.
class abydos.stemmer.SnowballNorwegian
Bases: abydos.stemmer._snowball._Snowball
Snowball Norwegian stemmer.
The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Norwegian stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballNorwegian()
>>> stmr.stem('lese')
'les'
>>> stmr.stem('suspensjon')
'suspensjon'
>>> stmr.stem('sikkerhet')
'sikker'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_norwegian(word)
Return Snowball Norwegian stem.
This is a wrapper for SnowballNorwegian.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_norwegian('lese')
'les'
>>> sb_norwegian('suspensjon')
'suspensjon'
>>> sb_norwegian('sikkerhet')
'sikker'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballNorwegian.stem method instead.
class abydos.stemmer.SnowballSwedish
Bases: abydos.stemmer._snowball._Snowball
Snowball Swedish stemmer.
The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Swedish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('visshet')
'viss'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_swedish(word)
Return Snowball Swedish stem.
This is a wrapper for SnowballSwedish.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_swedish('undervisa')
'undervis'
>>> sb_swedish('suspension')
'suspension'
>>> sb_swedish('visshet')
'viss'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballSwedish.stem method instead.
class abydos.stemmer.CLEFGerman
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer.
The CLEF German stemmer is defined in [Sav05].
New in version 0.3.6.
stem(word)
Return CLEF German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFGerman()
>>> stmr.stem('lesen')
'lese'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
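Savoy's CLEF German stemmer is a light stemmer: it folds umlauts to plain vowels and removes at most one plural-like ending, chosen by word length. The sketch below is one reading of those rules that reproduces the documented examples (lesen to lese, graues to grau, buchstabieren to buchstabier); the exact length thresholds and suffix sets are assumptions based on Savoy's description rather than copied from abydos, and clef_german_sketch is a hypothetical name.

```python
UMLAUTS = str.maketrans('äöü', 'aou')


def clef_german_sketch(word: str) -> str:
    """Light German stemming sketch: fold umlauts, strip one plural ending."""
    word = word.lower().translate(UMLAUTS)
    if len(word) > 6 and word.endswith('nen'):
        return word[:-3]
    if len(word) > 5 and word[-2:] in ('en', 'se', 'es', 'er'):
        return word[:-2]
    if len(word) > 4 and word[-1] in ('e', 'n', 'r', 's'):
        return word[:-1]
    return word
```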
abydos.stemmer.clef_german(word)
Return CLEF German stem.
This is a wrapper for CLEFGerman.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> clef_german('lesen')
'lese'
>>> clef_german('graues')
'grau'
>>> clef_german('buchstabieren')
'buchstabier'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFGerman.stem method instead.
class abydos.stemmer.CLEFGermanPlus
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer plus.
The CLEF German stemmer plus is defined in [Sav05].
New in version 0.3.6.
stem(word)
Return 'CLEF German stemmer plus' stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFGermanPlus()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.clef_german_plus(word)
Return 'CLEF German stemmer plus' stem.
This is a wrapper for CLEFGermanPlus.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> clef_german_plus('lesen')
'les'
>>> clef_german_plus('graues')
'grau'
>>> clef_german_plus('buchstabieren')
'buchstabi'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFGermanPlus.stem method instead.
class abydos.stemmer.CLEFSwedish
Bases: abydos.stemmer._stemmer._Stemmer
CLEF Swedish stemmer.
The CLEF Swedish stemmer is defined in [Sav05].
New in version 0.3.6.
stem(word)
Return CLEF Swedish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspensio'
>>> stmr.stem('visshet')
'viss'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.clef_swedish(word)
Return CLEF Swedish stem.
This is a wrapper for CLEFSwedish.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> clef_swedish('undervisa')
'undervis'
>>> clef_swedish('suspension')
'suspensio'
>>> clef_swedish('visshet')
'viss'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFSwedish.stem method instead.