abydos.stemmer package
abydos.stemmer.
The stemmer package collects stemmer classes for a number of languages including:
English stemmers:
- Lovins (Lovins)
- Paice-Husk (PaiceHusk)
- UEA-Lite (UEALite)
- S-stemmer (SStemmer)
- Porter (Porter)
- Porter2, i.e. Snowball English (Porter2)
German stemmers:
- Caumanns' (Caumanns)
- CLEF German (CLEFGerman)
- CLEF German Plus (CLEFGermanPlus)
- Snowball German (SnowballGerman)
Swedish stemmers:
- CLEF Swedish (CLEFSwedish)
- Snowball Swedish (SnowballSwedish)
Latin stemmer:
- Schinke (Schinke)
Danish stemmer:
- Snowball Danish (SnowballDanish)
Dutch stemmer:
- Snowball Dutch (SnowballDutch)
Norwegian stemmer:
- Snowball Norwegian (SnowballNorwegian)
Each stemmer has a stem method, which takes a word and returns its stemmed
form:
>>> from abydos.stemmer import Porter
>>> stmr = Porter()
>>> stmr.stem('democracy')
'democraci'
>>> stmr.stem('trusted')
'trust'
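Because every stemmer exposes the same stem method, calling code can depend on that one method rather than on a particular class. The block below is a minimal sketch of that idea and is not part of abydos: HasStem, SuffixStripper, and stem_all are hypothetical names, and SuffixStripper is only a toy stand-in for a real stemmer class.

```python
from typing import Iterable, List, Protocol


class HasStem(Protocol):
    """Anything with a stem(word) -> str method (hypothetical protocol)."""

    def stem(self, word: str) -> str:
        ...


class SuffixStripper:
    """Toy stand-in for a real stemmer class; strips one fixed suffix."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def stem(self, word: str) -> str:
        # Remove the suffix only when something would remain afterwards
        if word.endswith(self.suffix) and len(word) > len(self.suffix):
            return word[: -len(self.suffix)]
        return word


def stem_all(stemmer: HasStem, words: Iterable[str]) -> List[str]:
    """Apply one stemmer object to every word, preserving order."""
    return [stemmer.stem(word) for word in words]
```

Any object with a compatible stem method, such as an instance of one of the classes documented here, could be passed to stem_all in place of the toy class.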
class abydos.stemmer.Lovins
Bases: abydos.stemmer._stemmer._Stemmer
Lovins stemmer.
The Lovins stemmer is described in Julie Beth Lovins's article [Lov68].
New in version 0.3.6.
Initialize the stemmer.
New in version 0.3.6.
stem(word)
Return Lovins stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Lovins()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
New in version 0.2.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.lovins(word)
Return Lovins stem.
This is a wrapper for Lovins.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> lovins('reading')
'read'
>>> lovins('suspension')
'suspens'
>>> lovins('elusiveness')
'elus'
New in version 0.2.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Lovins.stem method instead.
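The function-style wrappers on this page all follow the same pattern: build the class, call its stem method, and signal that the function form is deprecated. How abydos implements this internally is not shown here; the following is one common way to write such a wrapper, using hypothetical names (ToyStemmer, toy_stemmer) and a toy stemming rule.

```python
import warnings


class ToyStemmer:
    """Hypothetical stemmer class standing in for a real one."""

    def stem(self, word: str) -> str:
        # Toy rule for illustration only: drop a final 's'
        return word[:-1] if word.endswith('s') else word


def toy_stemmer(word: str) -> str:
    """Deprecated function-style wrapper around ToyStemmer.stem()."""
    warnings.warn(
        'toy_stemmer() is deprecated; use ToyStemmer().stem() instead',
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return ToyStemmer().stem(word)
```

Migrating away from a wrapper of this kind means creating the stemmer object once and calling its stem method directly.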
class abydos.stemmer.PaiceHusk
Bases: abydos.stemmer._stemmer._Stemmer
Paice-Husk stemmer.
Implementation of the Paice-Husk Stemmer, also known as the Lancaster Stemmer, developed by Chris Paice, with the assistance of Gareth Husk.
This is based on the algorithm's description in [Pai90].
New in version 0.3.6.
stem(word)
Return Paice-Husk stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = PaiceHusk()
>>> stmr.stem('assumption')
'assum'
>>> stmr.stem('verifiable')
'ver'
>>> stmr.stem('fancies')
'fant'
>>> stmr.stem('fanciful')
'fancy'
>>> stmr.stem('torment')
'tor'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.paice_husk(word)
Return Paice-Husk stem.
This is a wrapper for PaiceHusk.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> paice_husk('assumption')
'assum'
>>> paice_husk('verifiable')
'ver'
>>> paice_husk('fancies')
'fant'
>>> paice_husk('fanciful')
'fancy'
>>> paice_husk('torment')
'tor'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PaiceHusk.stem method instead.
class abydos.stemmer.UEALite(max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')
Bases: abydos.stemmer._stemmer._Stemmer
UEA-Lite stemmer.
The UEA-Lite stemmer is discussed in [JS05].
This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams' Ruby port.
Java version: [Chu]
Perl version: [JS05]
Ruby version: [Ada17]
New in version 0.3.6.
Initialize UEALite instance.
- Parameters
max_word_length (int) -- The maximum word length allowed
max_acro_length (int) -- The maximum acronym length allowed
return_rule_no (bool) -- If True, returns the stem along with rule number
var (str) --
Variant rules to use:
- standard to use the original (Java-version) rules
- Adams to use Jason Adams' rules
- Perl to use the original Perl rules
New in version 0.4.0.
stem(word)
Return UEA-Lite stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str or (str, int)
Examples
>>> stmr = UEALite()
>>> stmr.stem('readings')
'read'
>>> stmr.stem('insulted')
'insult'
>>> stmr.stem('cussed')
'cuss'
>>> stmr.stem('fancies')
'fancy'
>>> stmr.stem('eroded')
'erode'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
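Since the return type is either str or (str, int) depending on return_rule_no, calling code that may see both shapes can normalize them first. The helper below is a hypothetical sketch (split_result is not an abydos function) that relies only on the two return shapes documented above.

```python
from typing import Optional, Tuple, Union

# The two documented result shapes: a bare stem, or (stem, rule number)
StemResult = Union[str, Tuple[str, int]]


def split_result(result: StemResult) -> Tuple[str, Optional[int]]:
    """Normalize a stem result to a (stem, rule_number_or_None) pair."""
    if isinstance(result, tuple):
        stem, rule_no = result
        return stem, rule_no
    # A bare string carries no rule number
    return result, None
```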
abydos.stemmer.uealite(word, max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')
Return UEA-Lite stem.
This is a wrapper for UEALite.stem().
- Parameters
word (str) -- The word to stem
max_word_length (int) -- The maximum word length allowed
max_acro_length (int) -- The maximum acronym length allowed
return_rule_no (bool) -- If True, returns the stem along with rule number
var (str) --
Variant rules to use:
- standard to use the original (Java-version) rules
- Adams to use Jason Adams' rules
- Perl to use the original Perl rules
- Returns
Word stem
- Return type
str or (str, int)
Examples
>>> uealite('readings')
'read'
>>> uealite('insulted')
'insult'
>>> uealite('cussed')
'cuss'
>>> uealite('fancies')
'fancy'
>>> uealite('eroded')
'erode'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the UEALite.stem method instead.
class abydos.stemmer.SStemmer
Bases: abydos.stemmer._stemmer._Stemmer
S-stemmer.
The S stemmer is defined in [Har91].
New in version 0.3.6.
stem(word)
Return the S-stemmed form of a word.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SStemmer()
>>> stmr.stem('summaries')
'summary'
>>> stmr.stem('summary')
'summary'
>>> stmr.stem('towers')
'tower'
>>> stmr.stem('reading')
'reading'
>>> stmr.stem('census')
'census'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
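The S-stemmer only conflates simple English plurals, which is why 'reading' and 'census' pass through unchanged in the examples above. The block below is a minimal standalone sketch of the three suffix rules as they are commonly summarized from [Har91]. The function name s_stem is hypothetical, and the treatment of edge cases (case handling, falling through to a later rule when an exception blocks an earlier one) is an assumption that may differ from SStemmer.

```python
def s_stem(word: str) -> str:
    """Sketch of S-stemmer plural removal (assumed rule set, see lead-in)."""
    lowered = word.lower()
    # Rule 1: -ies -> -y, unless preceded by 'e' or 'a'
    if lowered.endswith('ies') and lowered[-4:-3] not in ('e', 'a'):
        return word[:-3] + 'y'
    # Rule 2: -es -> -e, unless preceded by 'a', 'e', or 'o'
    if lowered.endswith('es') and lowered[-3:-2] not in ('a', 'e', 'o'):
        return word[:-1]
    # Rule 3: drop a final -s, unless preceded by 'u' or 's'
    if lowered.endswith('s') and lowered[-2:-1] not in ('u', 's'):
        return word[:-1]
    # No rule applies: return the word unchanged
    return word
```

Under these rules the sketch reproduces the five documented examples: the 'us' ending protects 'census', and 'reading' has no final s to remove.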
abydos.stemmer.s_stemmer(word)
Return the S-stemmed form of a word.
This is a wrapper for SStemmer.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> s_stemmer('summaries')
'summary'
>>> s_stemmer('summary')
'summary'
>>> s_stemmer('towers')
'tower'
>>> s_stemmer('reading')
'reading'
>>> s_stemmer('census')
'census'
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SStemmer.stem method instead.
class abydos.stemmer.Caumanns
Bases: abydos.stemmer._stemmer._Stemmer
Caumanns stemmer.
Jörg Caumanns' stemmer is described in his article in [Cau99].
This implementation is based on the GermanStemFilter described at [Lan13].
New in version 0.3.6.
stem(word)
Return Caumanns German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Caumanns()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'
New in version 0.2.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.caumanns(word)
Return Caumanns German stem.
This is a wrapper for Caumanns.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> caumanns('lesen')
'les'
>>> caumanns('graues')
'grau'
>>> caumanns('buchstabieren')
'buchstabier'
New in version 0.2.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Caumanns.stem method instead.
class abydos.stemmer.Schinke
Bases: abydos.stemmer._stemmer._Stemmer
Schinke stemmer.
This is defined in [SGRW96].
New in version 0.3.6.
stem(word)
Return the stem of a word according to the Schinke stemmer.
- Parameters
word (str) -- The word to stem
- Returns
Noun ('n') and verb ('v') stems of the word
- Return type
dict
Examples
>>> stmr = Schinke()
>>> stmr.stem('atque')
{'n': 'atque', 'v': 'atque'}
>>> stmr.stem('census')
{'n': 'cens', 'v': 'censu'}
>>> stmr.stem('virum')
{'n': 'uir', 'v': 'uiru'}
>>> stmr.stem('populusque')
{'n': 'popul', 'v': 'populu'}
>>> stmr.stem('senatus')
{'n': 'senat', 'v': 'senatu'}
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
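The leading u in the stems for 'virum' comes from the orthographic normalization that the Schinke algorithm performs before any suffix is removed: j is treated as i and v as u. The block below sketches that first step only. The name normalize_latin is hypothetical, lowercasing is an assumption of this sketch, and the later steps (handling of -que and the noun and verb suffix lists) are omitted.

```python
def normalize_latin(word: str) -> str:
    """Lowercase a word and map j -> i and v -> u (normalization only)."""
    return word.lower().replace('j', 'i').replace('v', 'u')
```

For 'virum' this step gives 'uirum', and the suffix rules of the full algorithm then yield the noun and verb stems shown in the examples.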
abydos.stemmer.schinke(word)
Return the stem of a word according to the Schinke stemmer.
This is a wrapper for Schinke.stem().
- Parameters
word (str) -- The word to stem
- Returns
Noun ('n') and verb ('v') stems of the word
- Return type
dict
Examples
>>> schinke('atque')
{'n': 'atque', 'v': 'atque'}
>>> schinke('census')
{'n': 'cens', 'v': 'censu'}
>>> schinke('virum')
{'n': 'uir', 'v': 'uiru'}
>>> schinke('populusque')
{'n': 'popul', 'v': 'populu'}
>>> schinke('senatus')
{'n': 'senat', 'v': 'senatu'}
New in version 0.3.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Schinke.stem method instead.
class abydos.stemmer.Porter(early_english=False)
Bases: abydos.stemmer._stemmer._Stemmer
Porter stemmer.
The Porter stemmer is described in [Por80].
New in version 0.3.6.
Initialize Porter instance.
- Parameters
early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
New in version 0.4.0.
stem(word)
Return Porter stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Porter()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter(early_english=True)
>>> stmr.stem('eateth')
'eat'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.porter(word, early_english=False)
Return Porter stem.
This is a wrapper for Porter.stem().
- Parameters
word (str) -- The word to stem
early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
- Returns
Word stem
- Return type
str
Examples
>>> porter('reading')
'read'
>>> porter('suspension')
'suspens'
>>> porter('elusiveness')
'elus'
>>> porter('eateth', early_english=True)
'eat'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Porter.stem method instead.
class abydos.stemmer.Porter2(early_english=False)
Bases: abydos.stemmer._snowball._Snowball
Porter2 (Snowball English) stemmer.
The Porter2 (Snowball English) stemmer is defined in [Por02].
New in version 0.3.6.
Initialize Porter2 instance.
- Parameters
early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
New in version 0.4.0.
stem(word)
Return the Porter2 (Snowball English) stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Porter2()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter2(early_english=True)
>>> stmr.stem('eateth')
'eat'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.porter2(word, early_english=False)
Return the Porter2 (Snowball English) stem.
This is a wrapper for Porter2.stem().
- Parameters
word (str) -- The word to stem
early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
- Returns
Word stem
- Return type
str
Examples
>>> porter2('reading')
'read'
>>> porter2('suspension')
'suspens'
>>> porter2('elusiveness')
'elus'
>>> porter2('eateth', early_english=True)
'eat'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Porter2.stem method instead.
class abydos.stemmer.SnowballDanish
Bases: abydos.stemmer._snowball._Snowball
Snowball Danish stemmer.
The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Danish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballDanish()
>>> stmr.stem('underviser')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('sikkerhed')
'sikker'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_danish(word)
Return Snowball Danish stem.
This is a wrapper for SnowballDanish.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_danish('underviser')
'undervis'
>>> sb_danish('suspension')
'suspension'
>>> sb_danish('sikkerhed')
'sikker'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballDanish.stem method instead.
class abydos.stemmer.SnowballDutch
Bases: abydos.stemmer._snowball._Snowball
Snowball Dutch stemmer.
The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Dutch stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballDutch()
>>> stmr.stem('lezen')
'lez'
>>> stmr.stem('opschorting')
'opschort'
>>> stmr.stem('ongrijpbaarheid')
'ongrijp'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_dutch(word)
Return Snowball Dutch stem.
This is a wrapper for SnowballDutch.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_dutch('lezen')
'lez'
>>> sb_dutch('opschorting')
'opschort'
>>> sb_dutch('ongrijpbaarheid')
'ongrijp'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballDutch.stem method instead.
class abydos.stemmer.SnowballGerman(alternate_vowels=False)
Bases: abydos.stemmer._snowball._Snowball
Snowball German stemmer.
The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html
New in version 0.3.6.
Initialize SnowballGerman instance.
- Parameters
alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm
New in version 0.4.0.
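The alternate_vowels option describes a preprocessing pass that turns the ASCII digraphs ae, oe, and ue into ä, ö, and ü before stemming. The block below is a naive sketch of such a pass, written with plain string replacement. The name compose_umlauts is hypothetical and this is not taken from the abydos source.

```python
def compose_umlauts(word: str) -> str:
    """Replace the digraphs ae/oe/ue with the umlauts they transcribe."""
    for digraph, umlaut in (('ae', 'ä'), ('oe', 'ö'), ('ue', 'ü')):
        word = word.replace(digraph, umlaut)
    return word
```

A blanket replacement like this also rewrites digraphs that are not transcribed umlauts (for instance the ue in 'quelle'), so a real implementation may need to guard such cases.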
stem(word)
Return Snowball German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballGerman()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_german(word, alternate_vowels=False)
Return Snowball German stem.
This is a wrapper for SnowballGerman.stem().
- Parameters
word (str) -- The word to stem
alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm
- Returns
Word stem
- Return type
str
Examples
>>> sb_german('lesen')
'les'
>>> sb_german('graues')
'grau'
>>> sb_german('buchstabieren')
'buchstabi'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballGerman.stem method instead.
class abydos.stemmer.SnowballNorwegian
Bases: abydos.stemmer._snowball._Snowball
Snowball Norwegian stemmer.
The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Norwegian stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballNorwegian()
>>> stmr.stem('lese')
'les'
>>> stmr.stem('suspensjon')
'suspensjon'
>>> stmr.stem('sikkerhet')
'sikker'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_norwegian(word)
Return Snowball Norwegian stem.
This is a wrapper for SnowballNorwegian.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_norwegian('lese')
'les'
>>> sb_norwegian('suspensjon')
'suspensjon'
>>> sb_norwegian('sikkerhet')
'sikker'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballNorwegian.stem method instead.
class abydos.stemmer.SnowballSwedish
Bases: abydos.stemmer._snowball._Snowball
Snowball Swedish stemmer.
The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html
New in version 0.3.6.
stem(word)
Return Snowball Swedish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('visshet')
'viss'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.sb_swedish(word)
Return Snowball Swedish stem.
This is a wrapper for SnowballSwedish.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> sb_swedish('undervisa')
'undervis'
>>> sb_swedish('suspension')
'suspension'
>>> sb_swedish('visshet')
'viss'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballSwedish.stem method instead.
class abydos.stemmer.CLEFGerman
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer.
The CLEF German stemmer is defined at [Sav05].
New in version 0.3.6.
stem(word)
Return CLEF German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFGerman()
>>> stmr.stem('lesen')
'lese'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.clef_german(word)
Return CLEF German stem.
This is a wrapper for CLEFGerman.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> clef_german('lesen')
'lese'
>>> clef_german('graues')
'grau'
>>> clef_german('buchstabieren')
'buchstabier'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFGerman.stem method instead.
class abydos.stemmer.CLEFGermanPlus
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer plus.
The CLEF German stemmer plus is defined at [Sav05].
New in version 0.3.6.
stem(word)
Return 'CLEF German stemmer plus' stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFGermanPlus()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.clef_german_plus(word)
Return 'CLEF German stemmer plus' stem.
This is a wrapper for CLEFGermanPlus.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> clef_german_plus('lesen')
'les'
>>> clef_german_plus('graues')
'grau'
>>> clef_german_plus('buchstabieren')
'buchstabi'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFGermanPlus.stem method instead.
class abydos.stemmer.CLEFSwedish
Bases: abydos.stemmer._stemmer._Stemmer
CLEF Swedish stemmer.
The CLEF Swedish stemmer is defined at [Sav05].
New in version 0.3.6.
stem(word)
Return CLEF Swedish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspensio'
>>> stmr.stem('visshet')
'viss'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
abydos.stemmer.clef_swedish(word)
Return CLEF Swedish stem.
This is a wrapper for CLEFSwedish.stem().
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> clef_swedish('undervisa')
'undervis'
>>> clef_swedish('suspension')
'suspensio'
>>> clef_swedish('visshet')
'viss'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFSwedish.stem method instead.