markovchain.text package

Submodules

markovchain.text.formatter module

class markovchain.text.formatter.Formatter(case=CharCase.TITLE, replace=None, end_chars='.?!', default_end='.')[source]

Bases: markovchain.text.formatter.FormatterBase

Default formatter.

case

Character case.

Type

markovchain.text.util.CharCase

replace

List of regular expressions to replace.

Type

list of (_sre.SRE_Pattern, str, int)

end_chars

Sentence ending characters.

Type

str

default_end

Default sentence ending character.

Type

None or str

Examples (inherited from markovchain.util.SaveLoad; they illustrate the class-group save/load mechanism)

>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
DEFAULT_REPLACE = [('\\s+', ' '), ('\\s*([^\\w\\s]+)\\s*', '\\1'), ('([,.?!])(\\w)', '\\1 \\2'), ('([\\w,.?!])([[({<])', '\\1 \\2'), ('([])}>])(\\w)', '\\1 \\2'), ('(\\w)([-+*]+)(\\w)', '\\1 \\2 \\3')]
__call__(string)[source]

Format a string.

Parameters

string (str) – String to format.

Returns

Formatted string.

Return type

str

__init__(case=CharCase.TITLE, replace=None, end_chars='.?!', default_end='.')[source]

Formatter constructor.

Parameters
  • case (int or str or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.TITLE).

  • end_chars (str, optional) – Sentence ending characters (default: ‘.?!’).

  • default_end (None or str, optional) – Default sentence ending character (default: ‘.’).

  • replace (list of ((str, str) or (str, str, str)), optional) – List of regular expressions to replace (default: DEFAULT_REPLACE).
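The sketch below shows one plausible way the DEFAULT_REPLACE rules, TITLE case and default_end could combine when a string is formatted. It assumes the order "apply replacements, strip, capitalize, append an ending character", which the source does not spell out. format_text is a hypothetical stand-in, not Formatter.__call__. The patterns match DEFAULT_REPLACE, except that the literal '[' and ']' inside character classes are escaped to avoid Python's nested-set FutureWarning.

```python
import re

# DEFAULT_REPLACE from the Formatter attribute above, written as raw strings.
DEFAULT_REPLACE = [
    (r'\s+', ' '),                        # collapse runs of whitespace
    (r'\s*([^\w\s]+)\s*', r'\1'),         # drop spaces around punctuation
    (r'([,.?!])(\w)', r'\1 \2'),          # space after , . ? ! before a word
    (r'([\w,.?!])([\[({<])', r'\1 \2'),   # space before an opening bracket
    (r'([\])}>])(\w)', r'\1 \2'),         # space after a closing bracket
    (r'(\w)([-+*]+)(\w)', r'\1 \2 \3'),   # spaces around - + * between words
]


def format_text(string, replace=DEFAULT_REPLACE, end_chars='.?!', default_end='.'):
    """Hypothetical formatter sketch; the step order is an assumption."""
    for pattern, repl in replace:
        string = re.sub(pattern, repl, string)
    string = string.strip()
    # TITLE case: upper-case the first character and lower-case the rest.
    string = string.capitalize()
    # Append the default ending if the sentence lacks one.
    if default_end is not None and (not string or string[-1] not in end_chars):
        string += default_end
    return string


print(format_text('hello ,  world ! how are you'))  # Hello, world! how are you.
```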

save()[source]

Convert an object to JSON.

Returns

JSON data.

Return type

dict

class markovchain.text.formatter.FormatterBase[source]

Bases: markovchain.util.SaveLoad

Text formatter base class.

classes

Class group.

Type

dict

Examples

>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
abstract __call__(string)[source]

Format a string.

Parameters

string (str) – String to format.

Returns

Formatted string.

Return type

str

classes = {'Formatter': <class 'markovchain.text.formatter.Formatter'>, 'Noop': <class 'markovchain.text.formatter.Noop'>}
class markovchain.text.formatter.Noop[source]

Bases: markovchain.text.formatter.FormatterBase

No-op formatter.

classes

Class group.

Type

dict

Examples

>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
__call__(string)[source]

Format a string.

Parameters

string (str) – String to format.

Returns

Formatted string.

Return type

str
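The class-group pattern used by FormatterBase (the classes registry together with save/load) can be sketched as follows. This is a minimal re-creation that follows the SaveLoad examples above. It is not the code of markovchain.util.SaveLoad, and the add_class/load bodies are assumptions. Noop returns its input unchanged.

```python
from abc import ABC, abstractmethod


class FormatterBase(ABC):
    classes = {}  # class group: class name -> class

    @classmethod
    def add_class(cls, *classes):
        for klass in classes:
            cls.classes[klass.__name__] = klass

    def save(self):
        # Record the concrete class name so load() can rebuild the right type.
        return {'__class__': type(self).__name__}

    @classmethod
    def load(cls, data):
        data = dict(data)
        klass = cls.classes[data.pop('__class__')]
        return klass(**data)

    @abstractmethod
    def __call__(self, string):
        """Format a string."""


class Noop(FormatterBase):
    def __call__(self, string):
        return string  # no-op: return the input unchanged


FormatterBase.add_class(Noop)
restored = FormatterBase.load(Noop().save())
print(type(restored).__name__, repr(restored(' text ')))  # Noop ' text '
```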

markovchain.text.markov module

class markovchain.text.markov.MarkovText(scanner=None, parser=None, storage=None, formatter=None, rank=None)[source]

Bases: markovchain.base.Markov

Markov text generator class.

DEFAULT_SCANNER

Default scanner class.

Type

type

DEFAULT_PARSER

Default parser class.

Type

type

DEFAULT_STORAGE

Default storage class.

Type

type

scanner
Type

markovchain.scanner.Scanner

parser
Type

markovchain.parser.ParserBase

storage
Type

markovchain.storage.Storage

DEFAULT_FORMATTER

alias of markovchain.text.formatter.Formatter

DEFAULT_PARSER

alias of markovchain.parser.Parser

DEFAULT_RANK

alias of markovchain.text.rank.Const

DEFAULT_SCANNER

alias of markovchain.text.scanner.RegExpScanner

__call__(max_length=None, state_size=None, reply_to=None, reply_mode=ReplyMode.END, dataset='')[source]

Generate text.

Parameters
  • max_length (int or None, optional) – Maximum sentence length (default: None).

  • state_size (int, optional) – State size (default: parser.state_sizes[0]).

  • reply_to (str or None, optional) – Input string (default: None).

  • reply_mode (markovchain.text.util.ReplyMode, optional) – Reply mode (default: markovchain.text.util.ReplyMode.END).

  • dataset (str, optional) – Dataset key prefix (default: ‘’).

Return type

str
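The following is a generic word-level Markov chain. It shows how state_size (the number of previous tokens that form a state) and max_length (a cap on the number of generated tokens) influence generation. The helpers train and generate, and the END marker, are hypothetical. They do not reflect how MarkovText stores links or handles datasets.

```python
import random
from collections import Counter, defaultdict

END = None  # hypothetical end-of-sentence marker (stands in for Scanner.END)


def train(sentences, state_size=1):
    """Count transitions from each state (tuple of previous tokens) to the next token."""
    links = defaultdict(Counter)
    for tokens in sentences:
        state = ('',) * state_size  # empty padding marks the sentence start
        for token in list(tokens) + [END]:
            links[state][token] += 1
            state = state[1:] + (token,)
    return links


def generate(links, state_size=1, max_length=None, rng=random):
    """Walk the chain from the start state, picking successors weighted by count."""
    state = ('',) * state_size
    words = []
    while max_length is None or len(words) < max_length:
        successors = links.get(state)
        if not successors:
            break
        tokens, weights = zip(*successors.items())
        token = rng.choices(tokens, weights=weights)[0]
        if token is END:
            break
        words.append(token)
        state = state[1:] + (token,)
    return ' '.join(words)


links = train([['a', 'b', 'c']])
print(generate(links), '|', generate(links, max_length=2))  # a b c | a b
```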

__init__(scanner=None, parser=None, storage=None, formatter=None, rank=None)[source]

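Markov chain generator base class constructor.

Parameters
  • scanner (markovchain.scanner.Scanner or None, optional) – Scanner (default: None).

  • parser (markovchain.parser.ParserBase or None, optional) – Parser (default: None).

  • storage (markovchain.storage.Storage or None, optional) – Storage (default: None).

  • formatter (markovchain.text.formatter.FormatterBase or None, optional) – Text formatter (default: None).

  • rank (markovchain.text.rank.Rank or None, optional) – Text rank (default: None).
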
data(data, part=False, dataset='')[source]

Parse data and update links.

Parameters
  • data (str) – Text to parse.

  • part (bool, optional) – True if data is partial (default: False).

  • dataset (str, optional) – Dataset key prefix (default: ‘’).

format(parts)[source]

Format generated text.

Parameters

parts (iterable of str) – Text parts.

generate_cont(max_length, state_size, reply_to, backward, dataset)[source]

Generate texts from start/end.

Parameters
  • max_length (int or None) – Maximum sentence length.

  • state_size (int) – State size.

  • reply_to (str or None) – Input string.

  • backward (bool) – True to generate text start.

  • dataset (str) – Dataset key prefix.

Returns

Generated texts.

Return type

generator of str

generate_replies(max_length, state_size, reply_to, dataset)[source]

Generate replies.

Parameters
  • max_length (int or None) – Maximum sentence length.

  • state_size (int) – State size.

  • reply_to (str) – Input string.

  • dataset (str) – Dataset key prefix.

Returns

Generated texts.

Return type

generator of str

get_cont_state(string, backward=False)[source]

Get initial states from input string.

Parameters
  • string (str or None) – Input string.

  • backward (bool, optional) – True if the states are used to generate the text start (default: False).

Return type

tuple of str

get_reply_states(string, dataset)[source]

Get initial states from input string.

Parameters
  • string (str) – Input string.

  • dataset (str) – Dataset key.

Return type

list of list of str

get_settings_json()[source]

Convert generator settings to JSON.

Returns

JSON data.

Return type

dict

markovchain.text.rank module

class markovchain.text.rank.Const(**_)[source]

Bases: markovchain.text.rank.Rank

Constant text rank.

size
Type

int

remove
Type

float

debug

If True, enable debug output.

Type

bool

Examples

>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
__init__(**_)[source]
rank(string)[source]

Rank a string.

Parameters

string (str) – String to rank.

Return type

float

class markovchain.text.rank.Rank(size=10, remove=0.5)[source]

Bases: markovchain.util.SaveLoad

Base text rank class.

size
Type

int

remove
Type

float

debug

If True, enable debug output.

Type

bool

Examples

>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
__call__(strings)[source]

Filter strings by rank.

Parameters

strings (iterable of str) – Strings to filter.

Returns

Filtered list.

Return type

list of str

__init__(size=10, remove=0.5)[source]
classes = {'Const': <class 'markovchain.text.rank.Const'>, 'Test': <class 'markovchain.text.rank.Test'>}
abstract rank(string)[source]

Rank a string.

Parameters

string (str) – String to rank.

Return type

float

save()[source]

Convert an object to JSON.

Returns

JSON data.

Return type

dict
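Here is a hedged sketch of "filter strings by rank" for Rank.__call__. The source does not define size or remove. This sketch assumes that size is the number of candidate strings taken from the (possibly infinite) input iterable, and that remove is the fraction of lowest-ranked candidates that are dropped. LengthRank is a hypothetical ranker used only for illustration. Const gives every string the same score, so the order is preserved.

```python
from abc import ABC, abstractmethod
from itertools import islice


class Rank(ABC):
    def __init__(self, size=10, remove=0.5):
        self.size = size      # assumed: number of candidates to consider
        self.remove = remove  # assumed: fraction of lowest-ranked candidates to drop

    @abstractmethod
    def rank(self, string):
        """Return a score; higher is better."""

    def __call__(self, strings):
        candidates = list(islice(strings, self.size))  # safe for infinite generators
        # sorted() is stable (also with reverse=True), so ties keep input order.
        ranked = sorted(candidates, key=self.rank, reverse=True)
        keep = len(ranked) - int(len(ranked) * self.remove)
        return ranked[:keep]


class Const(Rank):
    def rank(self, string):
        return 0.0  # every string ranks the same


class LengthRank(Rank):  # hypothetical ranker: longer strings rank higher
    def rank(self, string):
        return float(len(string))


print(LengthRank(size=4, remove=0.5)(['a', 'bbb', 'cc', 'dddd', 'eeeee']))  # ['dddd', 'bbb']
```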

class markovchain.text.rank.Test(size, remove)[source]

Bases: markovchain.text.rank.Rank

Base text rank class.

size
Type

int

remove
Type

float

debug

If True, enable debug output.

Type

bool

Examples

>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
__call__(strings)[source]

Filter strings by rank.

Parameters

strings (iterable of str) – Strings to filter.

Returns

Filtered list.

Return type

list of str

__init__(size, remove)[source]
features(string)[source]
log(res, features, string)[source]
rank(string)[source]

Rank a string.

Parameters

string (str) – String to rank.

Return type

float

markovchain.text.scanner module

class markovchain.text.scanner.CharScanner(end_chars='.?!', default_end='.', case=CharCase.LOWER)[source]

Bases: markovchain.text.scanner.TextScanner

Character scanner.

case

Character case.

Type

markovchain.text.util.CharCase

end_chars

Sentence ending characters.

Type

str

default_end

Default sentence ending character.

Type

str

start

True if current sentence is started.

Type

bool

end

True if current sentence is ended.

Type

bool

Examples

>>> scan = CharScanner(case=CharCase.PRESERVE)
>>> list(scan('Word'))
['W', 'o', 'r', 'd', '.', Scanner.END]
>>> list(scan('Word', True))
['W', 'o', 'r', 'd']
>>> list(scan(''))
['.', Scanner.END]
__init__(end_chars='.?!', default_end='.', case=CharCase.LOWER)[source]

Character scanner constructor.

Parameters
  • case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).

  • end_chars (str, optional) – Sentence ending characters (default: ‘.?!’).

  • default_end (str, optional) – Default sentence ending character (default: ‘.’).

reset()[source]

Reset scanner state.

save()[source]

Convert to JSON.

Returns

JSON data.

Return type

dict

scan(data, part)[source]

Scan a string.

Parameters
  • data (str) – String to scan.

  • part (bool) – True if data is partial.

Returns

Token generator.

Return type

generator of (str or markovchain.scanner.Scanner.END)
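The following character scanner sketch reproduces the documented examples. It yields each character, closes an unterminated sentence with default_end, and emits an END marker. It also keeps start/end state across partial calls. CharScannerSketch and the END object are hypothetical. The whitespace handling between sentences is an assumption, and case conversion is omitted.

```python
END = object()  # hypothetical stand-in for markovchain.scanner.Scanner.END


class CharScannerSketch:
    def __init__(self, end_chars='.?!', default_end='.'):
        self.end_chars = end_chars
        self.default_end = default_end
        self.reset()

    def reset(self):
        self.start = False  # True once the current sentence has a character
        self.end = False    # True right after sentence-ending punctuation

    def __call__(self, data, part=False):
        for ch in data:
            if self.end and ch.isspace():
                continue  # whitespace after the ending punctuation
            if self.end and ch not in self.end_chars:
                yield END  # ch starts a new sentence
                self.reset()
            if not self.start and ch.isspace():
                continue  # skip leading whitespace
            self.start = True
            if ch in self.end_chars:
                self.end = True
            yield ch
        if not part:
            if not self.end:
                yield self.default_end  # close an unterminated sentence
            yield END
            self.reset()


scan = CharScannerSketch()
print(list(scan('Hi! Yo')))  # ['H', 'i', '!', <END>, 'Y', 'o', '.', <END>]
```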

class markovchain.text.scanner.RegExpScanner(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=CharCase.LOWER)[source]

Bases: markovchain.text.scanner.TextScanner

Regular expression scanner.

DEFAULT_EXPR

Default regular expression.

Type

_sre.SRE_Pattern

case

Character case.

Type

markovchain.text.util.CharCase

expr

Regular expression.

Type

_sre.SRE_Pattern

default_end

Default sentence ending string.

Type

str

end

True if current sentence is ended.

Type

bool

Examples

>>> scan = RegExpScanner(case=CharCase.PRESERVE)
>>> list(scan('Word word. word'))
['Word', 'word', '.', Scanner.END, 'word', '.', Scanner.END]
>>> list(scan('word', True))
['word']
>>> list(scan(''))
['.', Scanner.END]
DEFAULT_EXPR = re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))')
__init__(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=CharCase.LOWER)[source]

Regular expression scanner constructor.

Parameters
  • case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).

  • expr (str or _sre.SRE_Pattern, optional) – Regular expression (default: markovchain.text.scanner.RegExpScanner.DEFAULT_EXPR). It should have the groups ‘end’ (sentence-ending punctuation) and ‘word’ (words and other punctuation).

  • default_end (str, optional) – Default sentence ending string (default: ‘.’).

static get_group(match, group)[source]

Get a group from a regular expression match object if it exists.

Parameters
  • match (_sre.SRE_Match) – Regular expression match object.

  • group (str or int) – Group name or index.

Return type

str or None

static get_regexp(x)[source]

Compile a regular expression if necessary.

Parameters

x (str or _sre.SRE_Pattern) – Regular expression.

Returns

Compiled regular expression.

Return type

_sre.SRE_Pattern

reset()[source]

Reset scanner state.

save()[source]

Convert the scanner to JSON.

Returns

JSON data.

Return type

dict

scan(data, part)[source]

Scan a string.

Parameters
  • data (str) – String to scan.

  • part (bool) – True if data is partial.

Returns

Token generator.

Return type

generator of (str or markovchain.scanner.Scanner.END)
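This sketch shows how a scanner could use DEFAULT_EXPR and a get_group-style helper. For each match of the 'end' group it yields the punctuation and an END marker. For each match of the 'word' group it yields the word. When the input is not partial and the last sentence is unterminated, it appends default_end. RegExpScannerSketch and END are hypothetical, and case conversion is omitted.

```python
import re

END = object()  # hypothetical stand-in for markovchain.scanner.Scanner.END
DEFAULT_EXPR = re.compile(r'(?:(?P<end>[.!?]+)|(?P<word>(?:[^\w\s]+|\w+)))')


def get_group(match, group):
    """Return the group's text, or None if it is undefined or did not match."""
    try:
        return match.group(group)
    except IndexError:  # raised for an unknown group name or index
        return None


class RegExpScannerSketch:
    def __init__(self, expr=DEFAULT_EXPR, default_end='.'):
        self.expr = re.compile(expr) if isinstance(expr, str) else expr
        self.default_end = default_end
        self.reset()

    def reset(self):
        self.end = False  # True if the current sentence is ended

    def __call__(self, data, part=False):
        for match in self.expr.finditer(data):
            end = get_group(match, 'end')
            if end is not None:
                yield end
                yield END
                self.end = True
                continue
            word = get_group(match, 'word')
            if word is not None:
                self.end = False
                yield word
        if not part:
            if not self.end:
                yield self.default_end  # close an unterminated sentence
                yield END
            self.reset()


print(list(RegExpScannerSketch()('a, b?! c', True)))  # ['a', ',', 'b', '?!', <END>, 'c']
```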

class markovchain.text.scanner.TextScanner(case=CharCase.LOWER)[source]

Bases: markovchain.scanner.Scanner

Text scanner base class.

case

Character case.

Type

markovchain.text.util.CharCase

Examples (inherited from markovchain.scanner.Scanner)

>>> scan = Scanner(lambda data: data.split())
>>> scan('a b c')
['a', 'b', 'c']
__call__(data, part=False)[source]

Scan a string.

Parameters
  • data (str) – String to scan.

  • part (bool, optional) – True if data is partial (default: False).

Returns

Token generator.

Return type

generator of (str or markovchain.scanner.Scanner.END)

__init__(case=CharCase.LOWER)[source]

Text scanner constructor.

Parameters

case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).

save()[source]

Convert an object to JSON.

Returns

JSON data.

Return type

dict

abstract scan(data, part)[source]

Scan a string.

Parameters
  • data (str) – String to scan.

  • part (bool) – True if data is partial.

Returns

Token generator.

Return type

generator of (str or markovchain.scanner.Scanner.END)

markovchain.text.util module

class markovchain.text.util.CharCase(value)[source]

Bases: enum.IntEnum

Character case.

LOWER = 3
PRESERVE = 0
TITLE = 1
UPPER = 2
convert(string)[source]

Return a copy of string converted to case.

Parameters

string (str) – String to convert.

Return type

str

Examples

>>> CharCase.LOWER.convert('sTr InG')
'str ing'
>>> CharCase.UPPER.convert('sTr InG')
'STR ING'
>>> CharCase.TITLE.convert('sTr InG')
'Str ing'
>>> CharCase.PRESERVE.convert('sTr InG')
'sTr InG'
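Below is a minimal re-creation of CharCase.convert that matches the examples above. TITLE behaves like Python's str.capitalize, which agrees with both this doctest and the capitalize example ('worD WORD WoRd' becomes 'Word word word'). The constructors accept case as "int or str or CharCase", so a hypothetical to_char_case helper shows one way such a value could be normalized. That helper is an assumption, not part of markovchain.

```python
from enum import IntEnum


class CharCase(IntEnum):
    PRESERVE = 0
    TITLE = 1
    UPPER = 2
    LOWER = 3

    def convert(self, string):
        if self is CharCase.TITLE:
            return string.capitalize()  # first character upper, rest lower
        if self is CharCase.UPPER:
            return string.upper()
        if self is CharCase.LOWER:
            return string.lower()
        return string  # PRESERVE


def to_char_case(value):
    """Hypothetical helper: accept a CharCase, an int value, or a member name."""
    if isinstance(value, str):
        return CharCase[value.upper()]
    return CharCase(value)


print(to_char_case('title').convert('sTr InG'))  # Str ing
```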
class markovchain.text.util.ReFlags(value)[source]

Bases: enum.IntEnum

Custom regexp flags.

O
Type

int

OVERLAP

Replace overlapping occurrences of pattern.

Type

int

O = 1
OVERLAP = 1
class markovchain.text.util.ReplyMode(value)[source]

Bases: enum.IntEnum

Text reply mode.

END = 0
REPLY = 2
START = 1
markovchain.text.util.capitalize(string)[source]

Capitalize a sentence.

Parameters

string (str) – String to capitalize.

Returns

Capitalized string.

Return type

str

Examples

>>> capitalize('worD WORD WoRd')
'Word word word'
markovchain.text.util.get_words(string)[source]

Find all words in a string.

Parameters

string (str) – String to search.

Return type

list of str

Examples

>>> get_words('  ..?!word  ,  (Word)..  word')
['word', 'Word', 'word']
markovchain.text.util.ispunct(string)[source]

Return True if the string is not empty and all of its characters are punctuation.

Parameters

string (str) – String to check.

Return type

bool

Examples

>>> ispunct('.,?')
True
>>> ispunct('.x.')
False
>>> ispunct('. ')
False
>>> ispunct('')
False
markovchain.text.util.lstrip_ws_and_chars(string, chars)[source]

Remove leading whitespace and characters from a string.

Parameters
  • string (str) – String to strip.

  • chars (str) – Characters to remove.

Returns

Stripped string.

Return type

str

Examples

>>> lstrip_ws_and_chars(' \t.\n , .x. ', '.,?!')
'x. '
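The sketches below reproduce the documented examples of get_words, ispunct and lstrip_ws_and_chars. Each one rests on an assumption that the source does not confirm. A "word" is taken to be a run of \w characters. "Punctuation" is taken to mean Unicode category P*, so symbols such as '+' (category Sm) are not treated as punctuation. Whitespace is taken to be ASCII whitespace.

```python
import re
import unicodedata
from string import whitespace


def get_words(string):
    # Assumption: a word is a maximal run of \w characters.
    return re.findall(r'\w+', string)


def ispunct(string):
    # Assumption: punctuation means Unicode categories Pc, Pd, Ps, Pe, Pi, Pf, Po.
    return bool(string) and all(unicodedata.category(ch).startswith('P') for ch in string)


def lstrip_ws_and_chars(string, chars):
    # str.lstrip removes leading characters that belong to the given set.
    return string.lstrip(whitespace + chars)


print(get_words('  ..?!word  ,  (Word)..  word'))  # ['word', 'Word', 'word']
```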
markovchain.text.util.re_flags(flags, custom=<enum 'ReFlags'>)[source]

Parse regexp flag string.

Parameters
  • flags (str) – Flag string.

  • custom (IntEnum, optional) – Custom flag enum (default: markovchain.text.util.ReFlags).

Returns

(flags for re.compile, custom flags)

Return type

(int, int)

Raises

ValueError

markovchain.text.util.re_flags_str(flags, custom_flags)[source]

Convert regexp flags to string.

Parameters
  • flags (int) – Flags.

  • custom_flags (int) – Custom flags.

Returns

Flag string.

Return type

str
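A hedged sketch of a flag string round trip between re_flags and re_flags_str. The letter convention is an assumption, since the source does not document it. Lower-case letters map to single-letter attributes of the standard re module. Single-letter names of the custom enum (here ReFlags.O, an alias of OVERLAP) map to custom flags. Any other character raises ValueError.

```python
import re
from enum import IntEnum


class ReFlags(IntEnum):
    OVERLAP = 1
    O = 1  # alias of OVERLAP, as in the documented enum


# Assumed letter mapping for standard flags (not taken from markovchain).
RE_LETTERS = {'a': re.A, 'i': re.I, 'm': re.M, 's': re.S, 'x': re.X}


def re_flags(flags, custom=ReFlags):
    re_result = 0
    custom_result = 0
    for ch in flags:
        if ch in RE_LETTERS:
            re_result |= RE_LETTERS[ch]
        elif custom is not None and ch in custom.__members__:
            custom_result |= custom[ch]  # lookup by member name, aliases included
        else:
            raise ValueError('unknown regexp flag: %r' % ch)
    return int(re_result), int(custom_result)


def re_flags_str(flags, custom_flags):
    letters = [ch for ch, flag in RE_LETTERS.items() if flags & flag]
    letters += [name for name, member in ReFlags.__members__.items()
                if len(name) == 1 and custom_flags & member]
    return ''.join(letters)


print(re_flags('iO'), re_flags_str(*re_flags('mi')))  # (2, 1) im
```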

markovchain.text.util.re_sub(pattern, repl, string, count=0, flags=0, custom_flags=0)[source]

Replace regular expression.

Parameters
  • pattern (str or _sre.SRE_Pattern) – Regular expression (pattern string or compiled pattern).

  • repl (str or function) – Replacement.

  • string (str) – Input string.

  • count (int, optional) – Maximum number of pattern occurrences to replace (default: 0).

  • flags (int, optional) – Flags (default: 0).

  • custom_flags (int, optional) – Custom flags (default: 0).
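ReFlags.OVERLAP is described as "replace overlapping occurrences of pattern". A single re.sub pass cannot do this, because each match consumes the characters it covers. For example, (\w)([-+*]+)(\w) rewrites only the first operator in 'a-b-c'. The sketch below assumes that OVERLAP is implemented by repeating substitution passes until nothing changes. The max_passes guard is a hypothetical addition that prevents endless loops. This is not markovchain's implementation.

```python
import re

OVERLAP = 1  # value of ReFlags.OVERLAP in the documented enum


def re_sub(pattern, repl, string, count=0, flags=0, custom_flags=0, max_passes=100):
    if isinstance(pattern, str):
        pattern = re.compile(pattern, flags)
    if not custom_flags & OVERLAP:
        return pattern.sub(repl, string, count=count)
    total = 0
    for _ in range(max_passes):  # hypothetical guard against endless rewriting
        remaining = 0 if count == 0 else count - total  # 0 means "replace all"
        string, n = pattern.subn(repl, string, count=remaining)
        total += n
        if n == 0 or (count and total >= count):
            break
    return string


print(re_sub(r'(\w)([-+*]+)(\w)', r'\1 \2 \3', 'a-b-c', custom_flags=OVERLAP))  # a - b - c
```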

Module contents