Introduction¶
Abydos¶
Abydos is a library of phonetic algorithms, string distance measures & metrics, stemmers, and string fingerprinters including:
- Phonetic algorithms
- Robert C. Russell’s Index
- American Soundex
- Refined Soundex
- Daitch-Mokotoff Soundex
- Kölner Phonetik
- NYSIIS
- Match Rating Algorithm
- Metaphone
- Double Metaphone
- Caverphone
- Alpha Search Inquiry System
- Fuzzy Soundex
- Phonex
- Phonem
- Phonix
- SfinxBis
- phonet
- Standardized Phonetic Frequency Code
- Statistics Canada
- Lein
- Roger Root
- Oxford Name Compression Algorithm (ONCA)
- Eudex phonetic hash
- Haase Phonetik
- Reth-Schek Phonetik
- FONEM
- Parmar-Kumbharana
- Davidson’s Consonant Code
- SoundD
- PSHP Soundex/Viewex Coding
- an early version of Henry Code
- Norphone
- Dolby Code
- Phonetic Spanish
- Spanish Metaphone
- MetaSoundex
- SoundexBR
- NRL English-to-phoneme
- Beider-Morse Phonetic Matching
- String distance metrics
- Levenshtein distance
- Optimal String Alignment distance
- Levenshtein-Damerau distance
- Hamming distance
- Tversky index
- Sørensen–Dice coefficient & distance
- Jaccard similarity coefficient & distance
- overlap similarity & distance
- Tanimoto coefficient & distance
- Minkowski distance & similarity
- Manhattan distance & similarity
- Euclidean distance & similarity
- Chebyshev distance
- cosine similarity & distance
- Jaro distance
- Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
- Longest common substring
- Ratcliff-Obershelp similarity & distance
- Match Rating Algorithm similarity
- Normalized Compression Distance (NCD) & similarity
- Monge-Elkan similarity & distance
- Matrix similarity
- Needleman-Wunsch score
- Smither-Waterman score
- Gotoh score
- Length similarity
- Prefix, Suffix, and Identity similarity & distance
- Modified Language-Independent Product Name Search (MLIPNS) similarity & distance
- Bag distance
- Editex distance
- Eudex distances
- Sift4 distance
- Baystat distance & similarity
- Typo distance
- Indel distance
- Synoname
- Stemmers
- the Lovins stemmer
- the Porter and Porter2 (Snowball English) stemmers
- Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
- CLEF German, German plus, and Swedish stemmers
- Caumann’s German stemmer
- UEA-Lite Stemmer
- Paice-Husk Stemmer
- Schinke Latin stemmer
- S stemmer
- String Fingerprints
- string fingerprint
- q-gram fingerprint
- phonetic fingerprint
- Pollock & Zomora’s skeleton key
- Pollock & Zomora’s omission key
- Cisłak & Grabowski’s occurrence fingerprint
- Cisłak & Grabowski’s occurrence halved fingerprint
- Cisłak & Grabowski’s count fingerprint
- Cisłak & Grabowski’s position fingerprint
- Synoname Toolcode
Installation¶
Required libraries:
- Numpy
- Six
Recommended libraries:
- PylibLZMA (Python 2 only–for LZMA compression string distance metric)
To install Abydos (master) from Github source:
git clone https://github.com/chrislit/abydos.git --recursive
cd abydos
python setup install
If your default python command calls Python 2.7 but you want to install for Python 3, you may instead need to call:
python3 setup install
To install Abydos (latest release) from PyPI using pip:
pip install abydos
To install from conda-forge:
conda install abydos
It should run on Python 2.7 and Python 3.3-3.7.
Testing & Contributing¶
To run the whole test-suite just call tox:
tox
The tox setup has the following environments: py27, py36, doctest, py27-regression, py36-regression, pylint, pycodestyle, flake8, doc8, badges, docs, py27-fuzz, & py36-fuzz. So if only want to generate documentation (in HTML, EPUB, & PDF formats), just call:
tox -e docs
In order to only run & generate Flake8 reports, call:
tox -e flake8
Contributions such as bug reports, PRs, suggestions, desired new features, etc. are welcome through the Github Issues & Pull requests.
License¶
Abydos is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/gpl.txt>.