abydos.stemmer package

The stemmer package collects stemmer classes for a number of languages.

Each stemmer has a stem method, which takes a word and returns its stemmed form:

>>> stmr = Porter()
>>> stmr.stem('democracy')
'democraci'
>>> stmr.stem('trusted')
'trust'
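Because every class exposes the same stem method, calling code can treat stemmers interchangeably. The following is a minimal sketch of that pattern. The names HasStem, SuffixStripper, and stem_all are hypothetical and invented for illustration; none of them is part of abydos.

```python
from typing import Iterable, List, Protocol


class HasStem(Protocol):
    """Anything with a stem(word) -> str method (hypothetical protocol)."""

    def stem(self, word: str) -> str: ...


class SuffixStripper:
    """Toy stand-in stemmer, NOT an abydos class: drops one fixed suffix."""

    def __init__(self, suffix: str) -> None:
        self._suffix = suffix

    def stem(self, word: str) -> str:
        # Remove the configured suffix if present; otherwise leave the word.
        if self._suffix and word.endswith(self._suffix):
            return word[: -len(self._suffix)]
        return word


def stem_all(stemmer: HasStem, words: Iterable[str]) -> List[str]:
    """Apply any stemmer-like object to each word in turn."""
    return [stemmer.stem(word) for word in words]
```

With abydos installed, an instance such as Porter() could presumably be passed to stem_all in place of the toy class, since only the stem method is used.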

class abydos.stemmer.Lovins[source]

Bases: abydos.stemmer._stemmer._Stemmer

Lovins stemmer.

The Lovins stemmer is described in Julie Beth Lovins's article [Lov68].

New in version 0.3.6.

Initialize the stemmer.

New in version 0.3.6.

stem(word)[source]

Return Lovins stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = Lovins()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'

New in version 0.2.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.lovins(word)[source]

Return Lovins stem.

This is a wrapper for Lovins.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> lovins('reading')
'read'
>>> lovins('suspension')
'suspens'
>>> lovins('elusiveness')
'elus'

New in version 0.2.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Lovins.stem method instead.
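The function-style wrappers in this package all follow the same shape: build the class, call its stem method, and return the result. A minimal sketch of how such a deprecated wrapper might be structured is shown here. ToyStemmer and toy_stem are hypothetical names, the stemming rule is a placeholder rather than the Lovins algorithm, and whether abydos emits a DeprecationWarning in this way is an assumption.

```python
import warnings


class ToyStemmer:
    """Hypothetical stand-in for a stemmer class; not the Lovins algorithm."""

    def stem(self, word: str) -> str:
        # Placeholder rule for illustration only: strip a trailing 'ing'.
        return word[:-3] if word.endswith('ing') else word


def toy_stem(word: str) -> str:
    """Deprecated function-style wrapper around ToyStemmer.stem()."""
    warnings.warn(
        'toy_stem is deprecated; use ToyStemmer.stem instead',
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return ToyStemmer().stem(word)
```

Migrating away from a wrapper of this kind means replacing each call to the function with a call to stem on a class instance, which can be created once and reused.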

class abydos.stemmer.PaiceHusk[source]

Bases: abydos.stemmer._stemmer._Stemmer

Paice-Husk stemmer.

Implementation of the Paice-Husk Stemmer, also known as the Lancaster Stemmer, developed by Chris Paice, with the assistance of Gareth Husk.

This is based on the algorithm's description in [Pai90].

New in version 0.3.6.

stem(word)[source]

Return Paice-Husk stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = PaiceHusk()
>>> stmr.stem('assumption')
'assum'
>>> stmr.stem('verifiable')
'ver'
>>> stmr.stem('fancies')
'fant'
>>> stmr.stem('fanciful')
'fancy'
>>> stmr.stem('torment')
'tor'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.paice_husk(word)[source]

Return Paice-Husk stem.

This is a wrapper for PaiceHusk.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> paice_husk('assumption')
'assum'
>>> paice_husk('verifiable')
'ver'
>>> paice_husk('fancies')
'fant'
>>> paice_husk('fanciful')
'fancy'
>>> paice_husk('torment')
'tor'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the PaiceHusk.stem method instead.

class abydos.stemmer.UEALite(max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')[source]

Bases: abydos.stemmer._stemmer._Stemmer

UEA-Lite stemmer.

The UEA-Lite stemmer is discussed in [JS05].

This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams' Ruby port.

Java version: [Chu]; Perl version: [JS05]; Ruby version: [Ada17]

New in version 0.3.6.

Initialize UEALite instance.

Parameters
  • max_word_length (int) -- The maximum word length allowed

  • max_acro_length (int) -- The maximum acronym length allowed

  • return_rule_no (bool) -- If True, returns the stem along with rule number

  • var (str) --

    Variant rules to use:

    • standard to use the original (Java-version) rules

    • Adams to use Jason Adams' rules

    • Perl to use the original Perl rules

New in version 0.4.0.

stem(word)[source]

Return UEA-Lite stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str or (str, int)

Examples

>>> stmr = UEALite()
>>> stmr.stem('readings')
'read'
>>> stmr.stem('insulted')
'insult'
>>> stmr.stem('cussed')
'cuss'
>>> stmr.stem('fancies')
'fancy'
>>> stmr.stem('eroded')
'erode'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class
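Since the return type is str or (str, int) depending on return_rule_no, calling code may want to normalize the two shapes. A minimal sketch follows; split_result and the StemResult alias are hypothetical helpers, not part of abydos, and the tuple order (stem first, rule number second) is taken from the return type given above.

```python
from typing import Optional, Tuple, Union

# Either a bare stem, or a (stem, rule number) pair.
StemResult = Union[str, Tuple[str, int]]


def split_result(result: StemResult) -> Tuple[str, Optional[int]]:
    """Normalize a 'str or (str, int)' result to (stem, rule_no_or_None)."""
    if isinstance(result, tuple):
        stem, rule_no = result
        return stem, rule_no
    # A bare string carries no rule number.
    return result, None
```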

abydos.stemmer.uealite(word, max_word_length=20, max_acro_length=8, return_rule_no=False, var='standard')[source]

Return UEA-Lite stem.

This is a wrapper for UEALite.stem().

Parameters
  • word (str) -- The word to stem

  • max_word_length (int) -- The maximum word length allowed

  • max_acro_length (int) -- The maximum acronym length allowed

  • return_rule_no (bool) -- If True, returns the stem along with rule number

  • var (str) --

    Variant rules to use:

    • standard to use the original (Java-version) rules

    • Adams to use Jason Adams' rules

    • Perl to use the original Perl rules

Returns

Word stem

Return type

str or (str, int)

Examples

>>> uealite('readings')
'read'
>>> uealite('insulted')
'insult'
>>> uealite('cussed')
'cuss'
>>> uealite('fancies')
'fancy'
>>> uealite('eroded')
'erode'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the UEALite.stem method instead.

class abydos.stemmer.SStemmer[source]

Bases: abydos.stemmer._stemmer._Stemmer

S-stemmer.

The S stemmer is defined in [Har91].

New in version 0.3.6.

stem(word)[source]

Return the S-stemmed form of a word.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = SStemmer()
>>> stmr.stem('summaries')
'summary'
>>> stmr.stem('summary')
'summary'
>>> stmr.stem('towers')
'tower'
>>> stmr.stem('reading')
'reading'
>>> stmr.stem('census')
'census'

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class
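The examples above are consistent with the three suffix rules usually attributed to the S stemmer. The following is a minimal sketch of those rules as commonly summarized, not the abydos source; s_stem_sketch is a hypothetical name, and lowercase input is assumed when a 'y' is substituted.

```python
def s_stem_sketch(word: str) -> str:
    """Apply three S-stemmer-style rules; only the first matching rule fires."""
    lowered = word.lower()
    # Rule 1: '-ies' -> '-y', unless the ending is '-eies' or '-aies'.
    if lowered.endswith('ies') and not lowered.endswith(('eies', 'aies')):
        return word[:-3] + 'y'
    # Rule 2: '-es' -> '-e', unless the ending is '-aes', '-ees', or '-oes'.
    if lowered.endswith('es') and not lowered.endswith(('aes', 'ees', 'oes')):
        return word[:-1]
    # Rule 3: drop a final '-s', unless the ending is '-us' or '-ss'.
    if lowered.endswith('s') and not lowered.endswith(('us', 'ss')):
        return word[:-1]
    # No rule applies: return the word unchanged.
    return word
```

The '-us' exception in the third rule is what leaves a word such as 'census' untouched.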

abydos.stemmer.s_stemmer(word)[source]

Return the S-stemmed form of a word.

This is a wrapper for SStemmer.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> s_stemmer('summaries')
'summary'
>>> s_stemmer('summary')
'summary'
>>> s_stemmer('towers')
'tower'
>>> s_stemmer('reading')
'reading'
>>> s_stemmer('census')
'census'

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SStemmer.stem method instead.

class abydos.stemmer.Caumanns[source]

Bases: abydos.stemmer._stemmer._Stemmer

Caumanns stemmer.

Jörg Caumanns' stemmer is described in his article [Cau99].

This implementation is based on the GermanStemFilter described at [Lan13].

New in version 0.3.6.

stem(word)[source]

Return Caumanns German stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = Caumanns()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'

New in version 0.2.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.caumanns(word)[source]

Return Caumanns German stem.

This is a wrapper for Caumanns.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> caumanns('lesen')
'les'
>>> caumanns('graues')
'grau'
>>> caumanns('buchstabieren')
'buchstabier'

New in version 0.2.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Caumanns.stem method instead.

class abydos.stemmer.Schinke[source]

Bases: abydos.stemmer._stemmer._Stemmer

Schinke stemmer.

This is defined in [SGRW96].

New in version 0.3.6.

stem(word)[source]

Return the stem of a word according to the Schinke stemmer.

Parameters

word (str) -- The word to stem

Returns

Word stem, as a dictionary with a noun ('n') form and a verb ('v') form

Return type

dict

Examples

>>> stmr = Schinke()
>>> stmr.stem('atque')
{'n': 'atque', 'v': 'atque'}
>>> stmr.stem('census')
{'n': 'cens', 'v': 'censu'}
>>> stmr.stem('virum')
{'n': 'uir', 'v': 'uiru'}
>>> stmr.stem('populusque')
{'n': 'popul', 'v': 'populu'}
>>> stmr.stem('senatus')
{'n': 'senat', 'v': 'senatu'}

New in version 0.3.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.schinke(word)[source]

Return the stem of a word according to the Schinke stemmer.

This is a wrapper for Schinke.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem, as a dictionary with a noun ('n') form and a verb ('v') form

Return type

dict

Examples

>>> schinke('atque')
{'n': 'atque', 'v': 'atque'}
>>> schinke('census')
{'n': 'cens', 'v': 'censu'}
>>> schinke('virum')
{'n': 'uir', 'v': 'uiru'}
>>> schinke('populusque')
{'n': 'popul', 'v': 'populu'}
>>> schinke('senatus')
{'n': 'senat', 'v': 'senatu'}

New in version 0.3.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Schinke.stem method instead.

class abydos.stemmer.Porter(early_english=False)[source]

Bases: abydos.stemmer._stemmer._Stemmer

Porter stemmer.

The Porter stemmer is described in [Por80].

New in version 0.3.6.

Initialize Porter instance.

Parameters

early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)

New in version 0.4.0.

stem(word)[source]

Return Porter stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = Porter()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter(early_english=True)
>>> stmr.stem('eateth')
'eat'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.porter(word, early_english=False)[source]

Return Porter stem.

This is a wrapper for Porter.stem().

Parameters
  • word (str) -- The word to stem

  • early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)

Returns

Word stem

Return type

str

Examples

>>> porter('reading')
'read'
>>> porter('suspension')
'suspens'
>>> porter('elusiveness')
'elus'
>>> porter('eateth', early_english=True)
'eat'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Porter.stem method instead.

class abydos.stemmer.Porter2(early_english=False)[source]

Bases: abydos.stemmer._snowball._Snowball

Porter2 (Snowball English) stemmer.

The Porter2 (Snowball English) stemmer is defined in [Por02].

New in version 0.3.6.

Initialize Porter2 instance.

Parameters

early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)

New in version 0.4.0.

stem(word)[source]

Return the Porter2 (Snowball English) stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = Porter2()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
>>> stmr = Porter2(early_english=True)
>>> stmr.stem('eateth')
'eat'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.porter2(word, early_english=False)[source]

Return the Porter2 (Snowball English) stem.

This is a wrapper for Porter2.stem().

Parameters
  • word (str) -- The word to stem

  • early_english (bool) -- Set to True in order to remove -est & -eth (2nd & 3rd person singular verbal agreement suffixes)

Returns

Word stem

Return type

str

Examples

>>> porter2('reading')
'read'
>>> porter2('suspension')
'suspens'
>>> porter2('elusiveness')
'elus'
>>> porter2('eateth', early_english=True)
'eat'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Porter2.stem method instead.

class abydos.stemmer.SnowballDanish[source]

Bases: abydos.stemmer._snowball._Snowball

Snowball Danish stemmer.

The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html

New in version 0.3.6.

stem(word)[source]

Return Snowball Danish stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = SnowballDanish()
>>> stmr.stem('underviser')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('sikkerhed')
'sikker'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.sb_danish(word)[source]

Return Snowball Danish stem.

This is a wrapper for SnowballDanish.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> sb_danish('underviser')
'undervis'
>>> sb_danish('suspension')
'suspension'
>>> sb_danish('sikkerhed')
'sikker'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballDanish.stem method instead.

class abydos.stemmer.SnowballDutch[source]

Bases: abydos.stemmer._snowball._Snowball

Snowball Dutch stemmer.

The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html

New in version 0.3.6.

stem(word)[source]

Return Snowball Dutch stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = SnowballDutch()
>>> stmr.stem('lezen')
'lez'
>>> stmr.stem('opschorting')
'opschort'
>>> stmr.stem('ongrijpbaarheid')
'ongrijp'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.sb_dutch(word)[source]

Return Snowball Dutch stem.

This is a wrapper for SnowballDutch.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> sb_dutch('lezen')
'lez'
>>> sb_dutch('opschorting')
'opschort'
>>> sb_dutch('ongrijpbaarheid')
'ongrijp'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballDutch.stem method instead.

class abydos.stemmer.SnowballGerman(alternate_vowels=False)[source]

Bases: abydos.stemmer._snowball._Snowball

Snowball German stemmer.

The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html

New in version 0.3.6.

Initialize SnowballGerman instance.

Parameters

alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm

New in version 0.4.0.
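The digraph composition described for alternate_vowels can be pictured as a simple preprocessing pass. The sketch below is a naive illustration only; compose_umlauts is a hypothetical name, it handles lowercase digraphs only, and it is not the abydos implementation.

```python
def compose_umlauts(word: str) -> str:
    """Rewrite the digraphs ae, oe, ue as ä, ö, ü (naive, lowercase only)."""
    for digraph, umlaut in (('ae', 'ä'), ('oe', 'ö'), ('ue', 'ü')):
        # Replace every occurrence of the digraph, left to right.
        word = word.replace(digraph, umlaut)
    return word
```

A blanket replacement like this also rewrites the 'ue' in a word such as 'quelle', so a real implementation would probably need to guard such cases; how abydos handles them is not stated in this reference.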

stem(word)[source]

Return Snowball German stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = SnowballGerman()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.sb_german(word, alternate_vowels=False)[source]

Return Snowball German stem.

This is a wrapper for SnowballGerman.stem().

Parameters
  • word (str) -- The word to stem

  • alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm

Returns

Word stem

Return type

str

Examples

>>> sb_german('lesen')
'les'
>>> sb_german('graues')
'grau'
>>> sb_german('buchstabieren')
'buchstabi'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballGerman.stem method instead.

class abydos.stemmer.SnowballNorwegian[source]

Bases: abydos.stemmer._snowball._Snowball

Snowball Norwegian stemmer.

The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html

New in version 0.3.6.

stem(word)[source]

Return Snowball Norwegian stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = SnowballNorwegian()
>>> stmr.stem('lese')
'les'
>>> stmr.stem('suspensjon')
'suspensjon'
>>> stmr.stem('sikkerhet')
'sikker'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.sb_norwegian(word)[source]

Return Snowball Norwegian stem.

This is a wrapper for SnowballNorwegian.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> sb_norwegian('lese')
'les'
>>> sb_norwegian('suspensjon')
'suspensjon'
>>> sb_norwegian('sikkerhet')
'sikker'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballNorwegian.stem method instead.

class abydos.stemmer.SnowballSwedish[source]

Bases: abydos.stemmer._snowball._Snowball

Snowball Swedish stemmer.

The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html

New in version 0.3.6.

stem(word)[source]

Return Snowball Swedish stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = SnowballSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('visshet')
'viss'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.sb_swedish(word)[source]

Return Snowball Swedish stem.

This is a wrapper for SnowballSwedish.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> sb_swedish('undervisa')
'undervis'
>>> sb_swedish('suspension')
'suspension'
>>> sb_swedish('visshet')
'viss'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the SnowballSwedish.stem method instead.

class abydos.stemmer.CLEFGerman[source]

Bases: abydos.stemmer._stemmer._Stemmer

CLEF German stemmer.

The CLEF German stemmer is defined in [Sav05].

New in version 0.3.6.

stem(word)[source]

Return CLEF German stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = CLEFGerman()
>>> stmr.stem('lesen')
'lese'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.clef_german(word)[source]

Return CLEF German stem.

This is a wrapper for CLEFGerman.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> clef_german('lesen')
'lese'
>>> clef_german('graues')
'grau'
>>> clef_german('buchstabieren')
'buchstabier'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFGerman.stem method instead.

class abydos.stemmer.CLEFGermanPlus[source]

Bases: abydos.stemmer._stemmer._Stemmer

CLEF German stemmer plus.

The CLEF German stemmer plus is defined in [Sav05].

New in version 0.3.6.

stem(word)[source]

Return 'CLEF German stemmer plus' stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = CLEFGermanPlus()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.clef_german_plus(word)[source]

Return 'CLEF German stemmer plus' stem.

This is a wrapper for CLEFGermanPlus.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> clef_german_plus('lesen')
'les'
>>> clef_german_plus('graues')
'grau'
>>> clef_german_plus('buchstabieren')
'buchstabi'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFGermanPlus.stem method instead.

class abydos.stemmer.CLEFSwedish[source]

Bases: abydos.stemmer._stemmer._Stemmer

CLEF Swedish stemmer.

The CLEF Swedish stemmer is defined in [Sav05].

New in version 0.3.6.

stem(word)[source]

Return CLEF Swedish stem.

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> stmr = CLEFSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspensio'
>>> stmr.stem('visshet')
'viss'

New in version 0.1.0.

Changed in version 0.3.6: Encapsulated in class

abydos.stemmer.clef_swedish(word)[source]

Return CLEF Swedish stem.

This is a wrapper for CLEFSwedish.stem().

Parameters

word (str) -- The word to stem

Returns

Word stem

Return type

str

Examples

>>> clef_swedish('undervisa')
'undervis'
>>> clef_swedish('suspension')
'suspensio'
>>> clef_swedish('visshet')
'viss'

New in version 0.1.0.

Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the CLEFSwedish.stem method instead.