abydos.compression package¶
abydos.compression.
The compression package defines compression and compression-related functions for use within Abydos, including implementations of the following:
Arithmetic
for arithmetic coding
BWT
for Burrows-Wheeler Transform
RLE
for Run-Length Encoding
Each class exposes encode
and decode
methods for performing and
reversing its encoding. For example, the Burrows-Wheeler Transform can be
performed by creating a BWT
object and then calling
BWT.encode()
on a string:
>>> bwt = BWT()
>>> bwt.encode('^BANANA')
'ANNB^AA\x00'
-
class
abydos.compression.
Arithmetic
(text=None)[source]¶ Bases:
object
Arithmetic Coder.
This is based on Andrew Dalke's public domain implementation [Dal05]. It has been ported to use the fractions.Fraction class.
New in version 0.3.6.
Initialize arithmetic coder object.
- Parameters
text (str) -- The training text
New in version 0.3.6.
-
decode
(longval, nbits)[source]¶ Decode the number to a string using the given statistics.
- Parameters
longval (int) -- The first part of an encoded tuple from encode
nbits (int) -- The second part of an encoded tuple from encode
- Returns
The arithmetically decoded text
- Return type
str
Example
>>> ac = Arithmetic('the quick brown fox jumped over the lazy dog') >>> ac.decode(16720586181, 34) 'align'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode
(text)[source]¶ Encode a text using arithmetic coding.
Text and the 0-order probability statistics -> longval, nbits
The encoded number is Fraction(longval, 2**nbits)
- Parameters
text (str) -- A string to encode
- Returns
The arithmetically coded text
- Return type
tuple
Example
>>> ac = Arithmetic('the quick brown fox jumped over the lazy dog') >>> ac.encode('align') (16720586181, 34)
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
get_probs
()[source]¶ Return the probs dictionary.
- Returns
The dictionary of probabilities
- Return type
dict
New in version 0.3.6.
-
set_probs
(probs)[source]¶ Set the probs dictionary.
- Parameters
probs (dict) -- The dictionary of probabilities
New in version 0.3.6.
-
train
(text)[source]¶ Generate a probability dict from the provided text.
Text to 0-order probability statistics as a dict
- Parameters
text (str) -- The text data over which to calculate probability statistics. This must not contain the NUL (0x00) character because that is used to indicate the end of data.
Example
>>> ac = Arithmetic() >>> ac.train('the quick brown fox jumped over the lazy dog') >>> ac.get_probs() {' ': (Fraction(0, 1), Fraction(8, 45)), 'o': (Fraction(8, 45), Fraction(4, 15)), 'e': (Fraction(4, 15), Fraction(16, 45)), 'u': (Fraction(16, 45), Fraction(2, 5)), 't': (Fraction(2, 5), Fraction(4, 9)), 'r': (Fraction(4, 9), Fraction(22, 45)), 'h': (Fraction(22, 45), Fraction(8, 15)), 'd': (Fraction(8, 15), Fraction(26, 45)), 'z': (Fraction(26, 45), Fraction(3, 5)), 'y': (Fraction(3, 5), Fraction(28, 45)), 'x': (Fraction(28, 45), Fraction(29, 45)), 'w': (Fraction(29, 45), Fraction(2, 3)), 'v': (Fraction(2, 3), Fraction(31, 45)), 'q': (Fraction(31, 45), Fraction(32, 45)), 'p': (Fraction(32, 45), Fraction(11, 15)), 'n': (Fraction(11, 15), Fraction(34, 45)), 'm': (Fraction(34, 45), Fraction(7, 9)), 'l': (Fraction(7, 9), Fraction(4, 5)), 'k': (Fraction(4, 5), Fraction(37, 45)), 'j': (Fraction(37, 45), Fraction(38, 45)), 'i': (Fraction(38, 45), Fraction(13, 15)), 'g': (Fraction(13, 15), Fraction(8, 9)), 'f': (Fraction(8, 9), Fraction(41, 45)), 'c': (Fraction(41, 45), Fraction(14, 15)), 'b': (Fraction(14, 15), Fraction(43, 45)), 'a': (Fraction(43, 45), Fraction(44, 45)), '\x00': (Fraction(44, 45), Fraction(1, 1))}
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.compression.
ac_decode
(longval, nbits, probs)[source]¶ Decode the number to a string using the given statistics.
This is a wrapper for
Arithmetic.decode()
.- Parameters
longval (int) -- The first part of an encoded tuple from ac_encode
nbits (int) -- The second part of an encoded tuple from ac_encode
probs (dict) -- A probability statistics dictionary generated by
Arithmetic.train()
- Returns
The arithmetically decoded text
- Return type
str
Example
>>> pr = ac_train('the quick brown fox jumped over the lazy dog') >>> ac_decode(16720586181, 34, pr) 'align'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Arithmetic.decode method instead.
-
abydos.compression.
ac_encode
(text, probs)[source]¶ Encode a text using arithmetic coding with the provided probabilities.
This is a wrapper for
Arithmetic.encode()
.- Parameters
text (str) -- A string to encode
probs (dict) -- A probability statistics dictionary generated by
Arithmetic.train()
- Returns
The arithmetically coded text
- Return type
tuple
Example
>>> pr = ac_train('the quick brown fox jumped over the lazy dog') >>> ac_encode('align', pr) (16720586181, 34)
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Arithmetic.encode method instead.
-
abydos.compression.
ac_train
(text)[source]¶ Generate a probability dict from the provided text.
This is a wrapper for
Arithmetic.train()
.- Parameters
text (str) -- The text data over which to calculate probability statistics. This must not contain the NUL (0x00) character because that's used to indicate the end of data.
- Returns
A probability dict
- Return type
dict
Example
>>> ac_train('the quick brown fox jumped over the lazy dog') {' ': (Fraction(0, 1), Fraction(8, 45)), 'o': (Fraction(8, 45), Fraction(4, 15)), 'e': (Fraction(4, 15), Fraction(16, 45)), 'u': (Fraction(16, 45), Fraction(2, 5)), 't': (Fraction(2, 5), Fraction(4, 9)), 'r': (Fraction(4, 9), Fraction(22, 45)), 'h': (Fraction(22, 45), Fraction(8, 15)), 'd': (Fraction(8, 15), Fraction(26, 45)), 'z': (Fraction(26, 45), Fraction(3, 5)), 'y': (Fraction(3, 5), Fraction(28, 45)), 'x': (Fraction(28, 45), Fraction(29, 45)), 'w': (Fraction(29, 45), Fraction(2, 3)), 'v': (Fraction(2, 3), Fraction(31, 45)), 'q': (Fraction(31, 45), Fraction(32, 45)), 'p': (Fraction(32, 45), Fraction(11, 15)), 'n': (Fraction(11, 15), Fraction(34, 45)), 'm': (Fraction(34, 45), Fraction(7, 9)), 'l': (Fraction(7, 9), Fraction(4, 5)), 'k': (Fraction(4, 5), Fraction(37, 45)), 'j': (Fraction(37, 45), Fraction(38, 45)), 'i': (Fraction(38, 45), Fraction(13, 15)), 'g': (Fraction(13, 15), Fraction(8, 9)), 'f': (Fraction(8, 9), Fraction(41, 45)), 'c': (Fraction(41, 45), Fraction(14, 15)), 'b': (Fraction(14, 15), Fraction(43, 45)), 'a': (Fraction(43, 45), Fraction(44, 45)), '\x00': (Fraction(44, 45), Fraction(1, 1))}
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the Arithmetic.train method instead.
-
class
abydos.compression.
BWT
(terminator='x00')[source]¶ Bases:
object
Burrows-Wheeler Transform.
The Burrows-Wheeler transform is an attempt at placing similar characters together to improve compression. Cf. [BW94].
New in version 0.3.6.
Initialize BWT instance.
- Parameters
terminator (str) -- A character added to signal the end of the string
New in version 0.4.0.
-
decode
(code)[source]¶ Return a word decoded from BWT form.
- Parameters
code (str) -- The word to transform from BWT form
terminator (str) -- A character added to signal the end of the string
- Returns
Word decoded by BWT
- Return type
str
- Raises
ValueError -- Specified terminator absent from code.
Examples
>>> bwt = BWT() >>> bwt.decode('n\x00ilag') 'align' >>> bwt.decode('annb\x00aa') 'banana'
>>> bwt = BWT('@') >>> bwt.decode('annb@aa') 'banana'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode
(word)[source]¶ Return the Burrows-Wheeler transformed form of a word.
- Parameters
word (str) -- The word to transform using BWT
- Returns
Word encoded by BWT
- Return type
str
- Raises
ValueError -- Specified terminator absent from code.
Examples
>>> bwt = BWT() >>> bwt.encode('align') 'n\x00ilag' >>> bwt.encode('banana') 'annb\x00aa'
>>> bwt = BWT('@') >>> bwt.encode('banana') 'annb@aa'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
abydos.compression.
bwt_decode
(code, terminator='\x00')[source]¶ Return a word decoded from BWT form.
This is a wrapper for
BWT.decode()
.- Parameters
code (str) -- The word to transform from BWT form
terminator (str) -- A character added to signal the end of the string
- Returns
Word decoded by BWT
- Return type
str
Examples
>>> bwt_decode('n\x00ilag') 'align' >>> bwt_decode('annb\x00aa') 'banana' >>> bwt_decode('annb@aa', '@') 'banana'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the BWT.decode method instead.
-
abydos.compression.
bwt_encode
(word, terminator='\x00')[source]¶ Return the Burrows-Wheeler transformed form of a word.
This is a wrapper for
BWT.encode()
.- Parameters
word (str) -- The word to transform using BWT
terminator (str) -- A character added to signal the end of the string
- Returns
Word encoded by BWT
- Return type
str
Examples
>>> bwt_encode('align') 'n\x00ilag' >>> bwt_encode('banana') 'annb\x00aa' >>> bwt_encode('banana', '@') 'annb@aa'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the BWT.encode method instead.
-
class
abydos.compression.
RLE
[source]¶ Bases:
object
Run-Length Encoding.
Cf. [RC67].
Based on http://rosettacode.org/wiki/Run-length_encoding#Python [Cod18b]. This is licensed GFDL 1.2.
Digits 0-9 cannot be in text.
New in version 0.3.6.
-
decode
(text)[source]¶ Perform decoding of run-length-encoding (RLE).
- Parameters
text (str) -- A text string to decode
- Returns
Word decoded by RLE
- Return type
str
Examples
>>> rle = RLE() >>> bwt = BWT() >>> bwt.decode(rle.decode('n\x00ilag')) 'align' >>> rle.decode('align') 'align'
>>> bwt.decode(rle.decode('annb\x00aa')) 'banana' >>> rle.decode('banana') 'banana'
>>> bwt.decode(rle.decode('ab\x00abbab5a')) 'aaabaabababa' >>> rle.decode('3abaabababa') 'aaabaabababa'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
encode
(text)[source]¶ Perform encoding of run-length-encoding (RLE).
- Parameters
text (str) -- A text string to encode
- Returns
Word decoded by RLE
- Return type
str
Examples
>>> rle = RLE() >>> bwt = BWT() >>> rle.encode(bwt.encode('align')) 'n\x00ilag' >>> rle.encode('align') 'align'
>>> rle.encode(bwt.encode('banana')) 'annb\x00aa' >>> rle.encode('banana') 'banana'
>>> rle.encode(bwt.encode('aaabaabababa')) 'ab\x00abbab5a' >>> rle.encode('aaabaabababa') '3abaabababa'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
-
-
abydos.compression.
rle_decode
(text, use_bwt=True)[source]¶ Perform decoding of run-length-encoding (RLE).
This is a wrapper for
RLE.decode()
.- Parameters
text (str) -- A text string to decode
use_bwt (bool) -- Indicates whether to perform BWT decoding after RLE decoding
- Returns
Word decoded by RLE
- Return type
str
Examples
>>> rle_decode('n\x00ilag') 'align' >>> rle_decode('align', use_bwt=False) 'align'
>>> rle_decode('annb\x00aa') 'banana' >>> rle_decode('banana', use_bwt=False) 'banana'
>>> rle_decode('ab\x00abbab5a') 'aaabaabababa' >>> rle_decode('3abaabababa', False) 'aaabaabababa'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RLE.decode method instead.
-
abydos.compression.
rle_encode
(text, use_bwt=True)[source]¶ Perform encoding of run-length-encoding (RLE).
This is a wrapper for
RLE.encode()
.- Parameters
text (str) -- A text string to encode
use_bwt (bool) -- Indicates whether to perform BWT encoding before RLE encoding
- Returns
Word decoded by RLE
- Return type
str
Examples
>>> rle_encode('align') 'n\x00ilag' >>> rle_encode('align', use_bwt=False) 'align'
>>> rle_encode('banana') 'annb\x00aa' >>> rle_encode('banana', use_bwt=False) 'banana'
>>> rle_encode('aaabaabababa') 'ab\x00abbab5a' >>> rle_encode('aaabaabababa', False) '3abaabababa'
New in version 0.1.0.
Deprecated since version 0.4.0: This will be removed in 0.6.0. Use the RLE.encode method instead.