abydos.stemmer package

abydos.stemmer.

The stemmer module defines word stemmers including:

  • the Lovins stemmer
  • the Porter and Porter2 (Snowball English) stemmers
  • Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
  • the CLEF German, German plus, and Swedish stemmers
  • the Caumanns German stemmer
  • the UEA-Lite stemmer
  • the Paice-Husk stemmer
  • the Schinke Latin stemmer
  • the S stemmer
abydos.stemmer.lovins(word)

Return Lovins stem.

The Lovins stemmer is described in Julie Beth Lovins’s article [Lov68].

Parameters:word (str) – the word to stem
Returns:word stem
Return type:str
>>> lovins('reading')
'read'
>>> lovins('suspension')
'suspens'
>>> lovins('elusiveness')
'elus'
abydos.stemmer.paice_husk(word)

Return Paice-Husk stem.

Implementation of the Paice-Husk stemmer, also known as the Lancaster stemmer, developed by Chris Paice with the assistance of Gareth Husk.

This is based on the algorithm’s description in [Pai90].

Parameters:word (str) – the word to stem
Returns:the stemmed word
Return type:str
>>> paice_husk('assumption')
'assum'
>>> paice_husk('verifiable')
'ver'
>>> paice_husk('fancies')
'fant'
>>> paice_husk('fanciful')
'fancy'
>>> paice_husk('torment')
'tor'
abydos.stemmer.uealite(word, max_word_length=20, max_acro_length=8, return_rule_no=False, var=None)

Return UEA-Lite stem.

The UEA-Lite stemmer is discussed in [JS05].

This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams’ Ruby port.

  • Java version: [Chu]
  • Perl version: [JS05]
  • Ruby version: [Ada17]

Parameters:
  • word (str) – the word to calculate the stem of
  • max_word_length (int) – the maximum word length allowed
  • max_acro_length (int) – the maximum acronym length allowed
  • return_rule_no (bool) – if True, returns the stem along with the rule number
  • var (str) – variant to use (set to ‘Adams’ to use Jason Adams’ rules, or ‘Perl’ to use the original Perl set of rules)
Returns:word stem
Return type:str or (str, int)
>>> uealite('readings')
'read'
>>> uealite('insulted')
'insult'
>>> uealite('cussed')
'cuss'
>>> uealite('fancies')
'fancy'
>>> uealite('eroded')
'erode'
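Because the documented return type is either str or (str, int), depending on return_rule_no, calling code may want to normalize the result. The following is a minimal sketch of one way to do that; the helper name unpack_stem is hypothetical and not part of abydos, and the rule number used in the example is a placeholder rather than a value produced by the stemmer.

```python
def unpack_stem(result):
    """Split a stemmer result into (stem, rule_no).

    rule_no is None when the stemmer returned a bare string.
    """
    if isinstance(result, tuple):
        stem, rule_no = result
        return stem, rule_no
    return result, None


# 'read' is the documented stem of 'readings'; the rule number 0 below is a
# made-up placeholder, not a value taken from the stemmer.
plain = unpack_stem('read')
with_rule = unpack_stem(('read', 0))
```

With this shape, downstream code can always unpack two values regardless of how return_rule_no was set.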
abydos.stemmer.s_stemmer(word)

Return the S-stemmed form of a word.

The S stemmer is defined in [Har91].

Parameters:word (str) – the word to stem
Returns:the stemmed word
Return type:str
>>> s_stemmer('summaries')
'summary'
>>> s_stemmer('summary')
'summary'
>>> s_stemmer('towers')
'tower'
>>> s_stemmer('reading')
'reading'
>>> s_stemmer('census')
'census'
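The examples above are consistent with the three plural-stripping rules usually attributed to the S stemmer. The following is a minimal sketch of those rules, not the library's implementation: the function name s_stem_sketch is hypothetical, and letting a word that hits a rule's exception fall through to the next rule is an assumption of this sketch.

```python
def s_stem_sketch(word):
    """Apply three S-stemmer style rules to an English word (sketch only)."""
    lowered = word.lower()
    # Rule 1: -ies -> -y, unless the word ends in -eies or -aies.
    if lowered.endswith('ies') and not lowered.endswith(('eies', 'aies')):
        return word[:-3] + 'y'
    # Rule 2: -es -> -e, unless the word ends in -aes, -ees, or -oes.
    if lowered.endswith('es') and not lowered.endswith(('aes', 'ees', 'oes')):
        return word[:-1]
    # Rule 3: drop a final -s, unless the word ends in -us or -ss.
    if lowered.endswith('s') and not lowered.endswith(('us', 'ss')):
        return word[:-1]
    # No rule applies: return the word unchanged.
    return word


stems = [s_stem_sketch(w) for w in ('summaries', 'towers', 'census')]
```

Under these rules 'census' is left alone because of the -us exception, and 'reading' is untouched because it does not end in -s.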
abydos.stemmer.caumanns(word)

Return Caumanns German stem.

Jörg Caumanns’ stemmer is described in his article [Cau99].

This implementation is based on the GermanStemFilter described at [Lan13].

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> caumanns('lesen')
'les'
>>> caumanns('graues')
'grau'
>>> caumanns('buchstabieren')
'buchstabier'
abydos.stemmer.schinke(word)

Return the stem of a word according to the Schinke stemmer.

This is defined in [SGRW96].

Parameters:word (str) – the word to stem
Returns:a dict holding the noun stem under key 'n' and the verb stem under key 'v'
Return type:dict
>>> schinke('atque')
{'n': 'atque', 'v': 'atque'}
>>> schinke('census')
{'n': 'cens', 'v': 'censu'}
>>> schinke('virum')
{'n': 'uir', 'v': 'uiru'}
>>> schinke('populusque')
{'n': 'popul', 'v': 'populu'}
>>> schinke('senatus')
{'n': 'senat', 'v': 'senatu'}
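Since the result is a dict rather than a single string, callers need to choose which stem to use. The following is a minimal sketch of selecting one stem per word; the helper name select_stem is hypothetical, and the input dicts are copied from the documented outputs above rather than computed here.

```python
def select_stem(stems, part_of_speech='n'):
    """Pick the noun ('n') or verb ('v') stem from a Schinke-style dict."""
    if part_of_speech not in ('n', 'v'):
        raise ValueError("part_of_speech must be 'n' or 'v'")
    return stems[part_of_speech]


# Results copied from the documented examples for schinke().
documented = {
    'census': {'n': 'cens', 'v': 'censu'},
    'senatus': {'n': 'senat', 'v': 'senatu'},
}

# Treat every word as a noun and keep only the noun stem.
noun_stems = {word: select_stem(stems, 'n') for word, stems in documented.items()}
```

Which key to prefer depends on the application; without part-of-speech information, keeping both stems is the safer choice.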
abydos.stemmer.porter(word, early_english=False)

Return Porter stem.

The Porter stemmer is described in [Por80].

Parameters:
  • word (str) – the word to calculate the stem of
  • early_english (bool) – set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
Returns:word stem
Return type:str
>>> porter('reading')
'read'
>>> porter('suspension')
'suspens'
>>> porter('elusiveness')
'elus'
>>> porter('eateth', early_english=True)
'eat'
abydos.stemmer.porter2(word, early_english=False)

Return the Porter2 (Snowball English) stem.

The Porter2 (Snowball English) stemmer is defined in [Por02].

Parameters:
  • word (str) – the word to calculate the stem of
  • early_english (bool) – set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)
Returns:word stem
Return type:str
>>> porter2('reading')
'read'
>>> porter2('suspension')
'suspens'
>>> porter2('elusiveness')
'elus'
>>> porter2('eateth', early_english=True)
'eat'
abydos.stemmer.sb_danish(word)

Return Snowball Danish stem.

The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> sb_danish('underviser')
'undervis'
>>> sb_danish('suspension')
'suspension'
>>> sb_danish('sikkerhed')
'sikker'
abydos.stemmer.sb_dutch(word)

Return Snowball Dutch stem.

The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> sb_dutch('lezen')
'lez'
>>> sb_dutch('opschorting')
'opschort'
>>> sb_dutch('ongrijpbaarheid')
'ongrijp'
abydos.stemmer.sb_german(word, alternate_vowels=False)

Return Snowball German stem.

The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html

Parameters:
  • word (str) – the word to calculate the stem of
  • alternate_vowels (bool) – composes ae as ä, oe as ö, and ue as ü before running the algorithm
Returns:word stem
Return type:str
>>> sb_german('lesen')
'les'
>>> sb_german('graues')
'grau'
>>> sb_german('buchstabieren')
'buchstabi'
abydos.stemmer.sb_norwegian(word)

Return Snowball Norwegian stem.

The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> sb_norwegian('lese')
'les'
>>> sb_norwegian('suspensjon')
'suspensjon'
>>> sb_norwegian('sikkerhet')
'sikker'
abydos.stemmer.sb_swedish(word)

Return Snowball Swedish stem.

The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> sb_swedish('undervisa')
'undervis'
>>> sb_swedish('suspension')
'suspension'
>>> sb_swedish('visshet')
'viss'
abydos.stemmer.clef_german(word)

Return CLEF German stem.

The CLEF German stemmer is defined in [Sav05].

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> clef_german('lesen')
'lese'
>>> clef_german('graues')
'grau'
>>> clef_german('buchstabieren')
'buchstabier'
abydos.stemmer.clef_german_plus(word)

Return ‘CLEF German stemmer plus’ stem.

The CLEF German stemmer plus is defined in [Sav05].

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> clef_german_plus('lesen')
'les'
>>> clef_german_plus('graues')
'grau'
>>> clef_german_plus('buchstabieren')
'buchstabi'
abydos.stemmer.clef_swedish(word)

Return CLEF Swedish stem.

The CLEF Swedish stemmer is defined in [Sav05].

Parameters:word (str) – the word to calculate the stem of
Returns:word stem
Return type:str
>>> clef_swedish('undervisa')
'undervis'
>>> clef_swedish('suspension')
'suspensio'
>>> clef_swedish('visshet')
'viss'