markovchain.text package

Submodules

markovchain.text.formatter module

class markovchain.text.formatter.Formatter(case=<CharCase.TITLE: 1>, replace=None, end_chars='.?!', default_end='.')

Bases: markovchain.text.formatter.FormatterBase

Default formatter.

case

markovchain.text.util.CharCase – Character case.

replace

list of (_sre.SRE_Pattern, str, int) – List of regular expressions to replace.

end_chars

str – Sentence ending characters.

default_end

None or str – Default sentence ending character.

Examples

>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
DEFAULT_REPLACE = [('\\s+', ' '), ('\\s*([^\\w\\s]+)\\s*', '\\1'), ('([,.?!])(\\w)', '\\1 \\2'), ('([\\w,.?!])([[({<])', '\\1 \\2'), ('([])}>])(\\w)', '\\1 \\2'), ('(\\w)([-+*]+)(\\w)', '\\1 \\2 \\3')]
__call__(string)
__init__(case=<CharCase.TITLE: 1>, replace=None, end_chars='.?!', default_end='.')

Formatter constructor.

Parameters:
  • case (int or str or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.TITLE).
  • end_chars (str, optional) – Sentence ending characters (default: ‘.?!’).
  • default_end (None or str, optional) – Default sentence ending character (default: ‘.’).
  • replace (list of ((str, str) or (str, str, str)), optional) – List of regular expressions to replace (default: DEFAULT_REPLACE).
save()

Convert an object to JSON.

Returns: JSON data.
Return type: dict
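The default formatting steps can be sketched as follows. This is a hypothetical re-implementation for illustration, not the library's code. It assumes that the DEFAULT_REPLACE rules are applied in order with re.sub, that CharCase.TITLE behaves like str.capitalize (as the CharCase.convert examples suggest), and that default_end is appended only when the text does not already end with one of end_chars. The `[` and `]` inside two character classes are escaped to avoid Python's "possible nested set" warning; they still match the same characters.

```python
import re

# Rules copied from Formatter.DEFAULT_REPLACE; brackets inside the
# character classes are escaped, which matches the same characters.
DEFAULT_REPLACE = [
    (r'\s+', ' '),                       # collapse runs of whitespace
    (r'\s*([^\w\s]+)\s*', r'\1'),        # drop spaces around punctuation
    (r'([,.?!])(\w)', r'\1 \2'),         # space after , . ? !
    (r'([\w,.?!])([\[({<])', r'\1 \2'),  # space before opening brackets
    (r'([\])}>])(\w)', r'\1 \2'),        # space after closing brackets
    (r'(\w)([-+*]+)(\w)', r'\1 \2 \3'),  # spaces around - + *
]


def format_text(text, end_chars='.?!', default_end='.'):
    """Hypothetical sketch of Formatter.__call__ with default settings."""
    for pattern, repl in DEFAULT_REPLACE:
        text = re.sub(pattern, repl, text)
    text = text.strip().capitalize()  # assumed equivalent of CharCase.TITLE
    if default_end is not None and text and text[-1] not in end_chars:
        text += default_end
    return text


print(format_text('hello  ,world(test)foo'))  # Hello, world (test) foo.
```

Text that already ends with one of end_chars, such as 'done!', keeps its own ending and only has its case adjusted.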
class markovchain.text.formatter.FormatterBase

Bases: markovchain.util.SaveLoad

Text formatter base class.

classes

dict – Class group.

__call__(string)

Format a string.

Parameters: string (str) – String to format.
Returns: Formatted string.
Return type: str
classes = {'Formatter': <class 'markovchain.text.formatter.Formatter'>, 'Noop': <class 'markovchain.text.formatter.Noop'>}
class markovchain.text.formatter.Noop

Bases: markovchain.text.formatter.FormatterBase

No-op formatter.

classes

dict – Class group.

__call__(string)
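FormatterBase.classes is a class group: subclasses such as Formatter and Noop are registered by name so that saved JSON can be turned back into the right formatter. A minimal sketch of that pattern, assuming (as the SaveLoad example in the Formatter section suggests) that save() stores the class name under '__class__' and that load() looks it up in the group's classes dict. SaveLoad, FormatterBase and Noop below are local stand-ins, not the library's classes.

```python
class SaveLoad:
    """Hypothetical minimal stand-in for markovchain.util.SaveLoad."""
    classes = {}

    @classmethod
    def add_class(cls, *classes):
        # Register classes by name in this group's own `classes` dict.
        for klass in classes:
            cls.classes[klass.__name__] = klass

    @classmethod
    def load(cls, data):
        # Look up the concrete class by its saved name; pass the rest as kwargs.
        data = dict(data)
        klass = cls.classes[data.pop('__class__')]
        return klass(**data)

    def save(self):
        return {'__class__': self.__class__.__name__}


class FormatterBase(SaveLoad):
    classes = {}  # separate registry for the formatter group

    def __call__(self, string):
        raise NotImplementedError


class Noop(FormatterBase):
    def __call__(self, string):
        return string  # return the input unchanged


FormatterBase.add_class(Noop)
formatter = FormatterBase.load(Noop().save())
print(type(formatter).__name__, formatter('as is'))  # Noop as is
```

Giving each group its own classes dict keeps formatter names from colliding with, for example, rank class names.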

markovchain.text.markov module

class markovchain.text.markov.MarkovText(scanner=None, parser=None, storage=None, formatter=None, rank=None)

Bases: markovchain.base.Markov

Markov text generator class.

DEFAULT_SCANNER

type – Default scanner class.

DEFAULT_PARSER

type – Default parser class.

DEFAULT_STORAGE

type – Default storage class.

scanner

markovchain.scanner.Scanner

parser

markovchain.parser.ParserBase

storage

markovchain.storage.Storage

DEFAULT_FORMATTER

alias of Formatter

DEFAULT_PARSER

alias of Parser

DEFAULT_RANK

alias of Const

DEFAULT_SCANNER

alias of RegExpScanner

__call__(max_length=None, state_size=None, reply_to=None, reply_mode=<ReplyMode.END: 0>, dataset='')

Generate text.

Parameters:
  • max_length (int or None, optional) – Maximum sentence length (default: None).
  • state_size (int, optional) – State size (default: parser.state_sizes[0]).
  • reply_to (str or None, optional) – Input string (default: None).
  • reply_mode (markovchain.text.util.ReplyMode, optional) – Reply mode (default: markovchain.text.util.ReplyMode.END).
  • dataset (str, optional) – Dataset key prefix (default: ‘’).
Returns:

Generated text.

Return type:

str

__init__(scanner=None, parser=None, storage=None, formatter=None, rank=None)
data(data, part=False, dataset='')

Parse data and update links.

Parameters:
  • data (str) – Text to parse.
  • part (bool, optional) – True if data is partial (default: False).
  • dataset (str, optional) – Dataset key prefix (default: ‘’).
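Conceptually, data() records which token follows each state and __call__ walks those links until the sentence ends or max_length is reached. The toy below shows that idea with plain dictionaries. build_links, generate and END are hypothetical names; the real class delegates these steps to its scanner, parser and storage objects instead.

```python
import random
from collections import defaultdict

END = None  # hypothetical end-of-sentence marker, in the role of Scanner.END


def build_links(sentences, state_size=1):
    """Record, for each state (tuple of words), the words observed after it."""
    links = defaultdict(list)
    for sentence in sentences:
        state = ('',) * state_size  # padded start state; state_size must be >= 1
        for word in sentence.split():
            links[state].append(word)
            state = state[1:] + (word,)
        links[state].append(END)  # the last state leads to the sentence end
    return links


def generate(links, state_size=1, max_length=None, rng=random):
    """Walk the links from the start state until END or max_length words."""
    state = ('',) * state_size
    words = []
    while max_length is None or len(words) < max_length:
        word = rng.choice(links[state])
        if word is END:
            break
        words.append(word)
        state = state[1:] + (word,)
    return ' '.join(words)


links = build_links(['the cat sat', 'the dog sat'])
print(generate(links, rng=random.Random(0)))  # "the cat sat" or "the dog sat"
```

A larger state_size makes each step depend on more preceding words, so output stays closer to the training text.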
format(parts)

Format generated text.

Parameters: parts (iterable of str) – Text parts.
generate_cont(max_length, state_size, reply_to, backward, dataset)

Generate texts from start/end.

Parameters:
  • max_length (int or None) – Maximum sentence length.
  • state_size (int) – State size.
  • reply_to (str or None) – Input string.
  • backward (bool) – True to generate text start.
  • dataset (str) – Dataset key prefix.
Returns:

Generated texts.

Return type:

generator of str

generate_replies(max_length, state_size, reply_to, dataset)

Generate replies.

Parameters:
  • max_length (int or None) – Maximum sentence length.
  • state_size (int) – State size.
  • reply_to (str) – Input string.
  • dataset (str) – Dataset key prefix.
Returns:

Generated texts.

Return type:

generator of str
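generate_cont and generate_replies build text around existing input, with backward=True producing the start of a text. A hypothetical sketch of that bidirectional idea, assuming separate forward and backward transition tables with state size 1: starting from a seed word, walk backward to a sentence start and forward to a sentence end. The function names are illustrative; this is not how the library stores or walks its links.

```python
import random
from collections import defaultdict


def build_bidirectional_links(sentences):
    """Forward and backward successor lists (state size 1); '' marks a boundary."""
    forward, backward = defaultdict(list), defaultdict(list)
    for sentence in sentences:
        words = [''] + sentence.split() + ['']
        for prev, nxt in zip(words, words[1:]):
            forward[prev].append(nxt)
            backward[nxt].append(prev)
    return forward, backward


def walk(links, word, rng, max_length=50):
    """Follow links from `word` until a sentence boundary ('') is reached."""
    out = []
    while len(out) < max_length:
        word = rng.choice(links[word])
        if word == '':
            break
        out.append(word)
    return out


def reply_through(seed, forward, backward, rng=random):
    """Generate the text start backward from `seed`, then the text end forward."""
    head = walk(backward, seed, rng)[::-1]  # restore reading order
    tail = walk(forward, seed, rng)
    return ' '.join(head + [seed] + tail)


forward, backward = build_bidirectional_links(['the cat sat', 'a dog ran'])
print(reply_through('cat', forward, backward))  # the cat sat
```

The seed must occur in the training text; otherwise its link list is empty and random.choice raises IndexError.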

get_cont_state(string, backward=False)

Get initial states from input string.

Parameters:
  • string (str or None) – Input string.
  • backward (bool) – True to generate text start.
Returns:

Initial state.

Return type:

tuple of str

get_reply_states(string, dataset)

Get initial states from input string.

Parameters:
  • string (str) – Input string.
  • dataset (str) – Dataset key.
Returns:

Initial states.

Return type:

list of list of str

get_settings_json()

Convert generator settings to JSON.

Returns: JSON data.
Return type: dict

markovchain.text.rank module

class markovchain.text.rank.Const(**_)

Bases: markovchain.text.rank.Rank

Constant text rank.

size

int

remove

float

debug

bool – If True, enable debug output.

__init__(**_)
rank(string)

Rank a string.

Parameters: string (str) – String to rank.
Returns: String rank.
Return type: float
class markovchain.text.rank.Rank(size=10, remove=0.5)

Bases: markovchain.util.SaveLoad

Base text rank class.

size

int

remove

float

debug

bool – If True, enable debug output.

__call__(strings)

Filter strings by rank.

Parameters: strings (iterable of str) – Strings to filter.
Returns: Filtered list.
Return type: list of str
__init__(size=10, remove=0.5)
classes = {'Const': <class 'markovchain.text.rank.Const'>, 'Test': <class 'markovchain.text.rank.Test'>}
rank(string)

Rank a string.

Parameters: string (str) – String to rank.
Returns: String rank.
Return type: float
save()

Convert an object to JSON.

Returns: JSON data.
Return type: dict
class markovchain.text.rank.Test(size, remove)

Bases: markovchain.text.rank.Rank

Base text rank class.

size

int

remove

float

debug

bool – If True, enable debug output.

__call__(strings)
__init__(size, remove)
features(string)
log(res, features, string)
rank(string)

Rank a string.

Parameters: string (str) – String to rank.
Returns: String rank.
Return type: float

markovchain.text.scanner module

class markovchain.text.scanner.CharScanner(end_chars='.?!', default_end='.', case=<CharCase.LOWER: 3>)

Bases: markovchain.text.scanner.TextScanner

Character scanner.

case

markovchain.text.util.CharCase – Character case.

end_chars

str – Sentence ending characters.

default_end

str – Default sentence ending character.

start

bool – True if current sentence is started.

end

bool – True if current sentence is ended.

Examples

>>> scan = CharScanner(case=CharCase.PRESERVE)
>>> list(scan('Word'))
['W', 'o', 'r', 'd', '.', Scanner.END]
>>> list(scan('Word', True))
['W', 'o', 'r', 'd']
>>> list(scan(''))
['.', Scanner.END]
__init__(end_chars='.?!', default_end='.', case=<CharCase.LOWER: 3>)

Character scanner constructor.

Parameters:
  • case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
  • end_chars (str, optional) – Sentence ending characters (default: ‘.?!’).
  • default_end (str, optional) – Default sentence ending character (default: ‘.’).
reset()

Reset scanner state.

save()

Convert to JSON.

Returns: JSON data.
Return type: dict
scan(data, part)

Scan a string.

Parameters:
  • data (str) – String to scan.
  • part (bool) – True if data is partial.
Returns:

Token generator.

Return type:

generator of (str or markovchain.scanner.Scanner.END)

class markovchain.text.scanner.RegExpScanner(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=<CharCase.LOWER: 3>)

Bases: markovchain.text.scanner.TextScanner

Regular expression scanner.

DEFAULT_EXPR

_sre.SRE_Pattern – Default regular expression.

case

markovchain.text.util.CharCase – Character case.

expr

_sre.SRE_Pattern – Regular expression.

default_end

str – Default sentence ending string.

end

bool – True if current sentence is ended.

Examples

>>> scan = RegExpScanner(case=CharCase.PRESERVE)
>>> list(scan('Word word. word'))
['Word', 'word', '.', Scanner.END, 'word', '.', Scanner.END]
>>> list(scan('word', True))
['word']
>>> list(scan(''))
['.', Scanner.END]
DEFAULT_EXPR = re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))')
__init__(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=<CharCase.LOWER: 3>)

Regular expression scanner constructor.

Parameters:
  • case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
  • expr (str or _sre.SRE_Pattern, optional) – Regular expression (default: markovchain.text.scanner.RegExpScanner.DEFAULT_EXPR). It should define the groups ‘end’ (sentence-ending punctuation) and ‘word’ (words and other punctuation).
  • default_end (str, optional) – Default sentence ending string (default: ‘.’).
static get_group(match, group)

Get a group from a regular expression match object if it exists.

Parameters:
  • match (_sre.SRE_Match) – Regular expression match object.
  • group (str or int) – Group name or index.
Returns:

Group value, or None.

Return type:

str or None

static get_regexp(x)

Compile a regular expression if necessary.

Parameters: x (str or _sre.SRE_Pattern) – Regular expression.
Returns: Compiled regular expression.
Return type: _sre.SRE_Pattern
reset()

Reset scanner state.

save()

Convert the scanner to JSON.

Returns: JSON data.
Return type: dict
scan(data, part)

Scan a string.

Parameters:
  • data (str) – String to scan.
  • part (bool) – True if data is partial.
Returns:

Token generator.

Return type:

generator of (str or markovchain.scanner.Scanner.END)
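The scanning loop can be sketched with the documented DEFAULT_EXPR: each match is either a sentence ending (group 'end'), which is followed by an END marker, or a word or punctuation token (group 'word'), and a complete (non-partial) input that does not finish a sentence gets default_end appended. This is a hypothetical, stateless sketch with case conversion omitted; END is a local stand-in for markovchain.scanner.Scanner.END, and get_regexp/get_group only mirror the documented descriptions.

```python
import re

END = object()  # hypothetical stand-in for markovchain.scanner.Scanner.END
DEFAULT_EXPR = re.compile(r'(?:(?P<end>[.!?]+)|(?P<word>(?:[^\w\s]+|\w+)))')


def get_regexp(x):
    """Compile a regular expression if necessary."""
    return re.compile(x) if isinstance(x, str) else x


def get_group(match, group):
    """Return a group's value, or None if the pattern has no such group."""
    try:
        return match.group(group)
    except IndexError:  # raised for unknown group names
        return None


def regexp_scan(data, part=False, expr=DEFAULT_EXPR, default_end='.'):
    """Split text into words, sentence endings and END markers."""
    tokens, ended = [], False
    for match in get_regexp(expr).finditer(data):
        end = get_group(match, 'end')
        if end is not None:
            tokens += [end, END]
            ended = True
        else:
            tokens.append(get_group(match, 'word'))
            ended = False
    if not part and not ended:
        tokens += [default_end, END]  # close an unterminated sentence
    return tokens
```

For 'Word word. word' this yields 'Word', 'word', '.', END, 'word', '.', END, the same sequence as the RegExpScanner example with case preserved.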

class markovchain.text.scanner.TextScanner(case=<CharCase.LOWER: 3>)

Bases: markovchain.scanner.Scanner

Text scanner base class.

case

markovchain.text.util.CharCase – Character case.

Examples

>>> scan = Scanner(lambda data: data.split())
>>> scan('a b c')
['a', 'b', 'c']
__call__(data, part=False)

Scan a string.

Parameters:
  • data (str) – String to scan.
  • part (bool, optional) – True if data is partial (default: False).
Returns:

Token generator.

Return type:

generator of (str or markovchain.scanner.Scanner.END)

__init__(case=<CharCase.LOWER: 3>)

Text scanner constructor.

Parameters: case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
save()

Convert an object to JSON.

Returns: JSON data.
Return type: dict
scan(data, part)

Scan a string.

Parameters:
  • data (str) – String to scan.
  • part (bool) – True if data is partial.
Returns:

Token generator.

Return type:

generator of (str or markovchain.scanner.Scanner.END)

markovchain.text.util module

class markovchain.text.util.CharCase

Bases: enum.IntEnum

Character case.

LOWER = 3
PRESERVE = 0
TITLE = 1
UPPER = 2
convert(string)

Return a copy of the string converted to this case.

Parameters: string (str) – String to convert.
Returns: Converted string.
Return type: str

Examples

>>> CharCase.LOWER.convert('sTr InG')
'str ing'
>>> CharCase.UPPER.convert('sTr InG')
'STR ING'
>>> CharCase.TITLE.convert('sTr InG')
'Str ing'
>>> CharCase.PRESERVE.convert('sTr InG')
'sTr InG'
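The convert examples are consistent with mapping each member onto a built-in str method. A hypothetical re-creation under that assumption (the library's actual implementation may differ):

```python
from enum import IntEnum


class CharCase(IntEnum):
    """Hypothetical re-creation of markovchain.text.util.CharCase."""
    PRESERVE = 0
    TITLE = 1
    UPPER = 2
    LOWER = 3

    def convert(self, string):
        # Map each member to the matching str method; PRESERVE returns the input.
        if self == CharCase.TITLE:
            return string.capitalize()
        if self == CharCase.UPPER:
            return string.upper()
        if self == CharCase.LOWER:
            return string.lower()
        return string


print(CharCase.TITLE.convert('sTr InG'))  # Str ing
```

TITLE corresponds to str.capitalize rather than str.title: str.title would give 'Str Ing' for the example input.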
class markovchain.text.util.ReFlags

Bases: enum.IntEnum

Custom regexp flags.

O

int – Short name for OVERLAP.

OVERLAP

int – Replace overlapping occurrences of pattern.

O = 1
OVERLAP = 1
class markovchain.text.util.ReplyMode

Bases: enum.IntEnum

Text reply mode.

END = 0
REPLY = 2
START = 1
markovchain.text.util.capitalize(string)

Capitalize a sentence.

Parameters: string (str) – String to capitalize.
Returns: Capitalized string.
Return type: str

Examples

>>> capitalize('worD WORD WoRd')
'Word word word'
markovchain.text.util.get_words(string)

Find all words in a string.

Parameters: string (str) – String to search.
Returns: Words found in the string.
Return type: list of str

Examples

>>> get_words('  ..?!word  ,  (Word)..  word')
['word', 'Word', 'word']
markovchain.text.util.ispunct(string)

Return True if the string is non-empty and all of its characters are punctuation.

Parameters: string (str) – String to check.
Returns: True if the string is non-empty and consists only of punctuation, False otherwise.
Return type: bool

Examples

>>> ispunct('.,?')
True
>>> ispunct('.x.')
False
>>> ispunct('. ')
False
>>> ispunct('')
False
markovchain.text.util.lstrip_ws_and_chars(string, chars)

Remove leading whitespace and characters from a string.

Parameters:
  • string (str) – String to strip.
  • chars (str) – Characters to remove.
Returns:

Stripped string.

Return type:

str

Examples

>>> lstrip_ws_and_chars(' \t.\n , .x. ', '.,?!')
'x. '
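The string helpers above can be approximated with the standard library. This sketch reproduces the documented examples, but it is an assumption about the behaviour, not the library's code; in particular, ispunct here only recognises ASCII punctuation (string.punctuation).

```python
import re
import string as string_module  # renamed: the helpers take a `string` argument


def capitalize(string):
    """Upper-case the first character and lower-case the rest."""
    return string.capitalize()


def get_words(string):
    """Return runs of word characters, dropping punctuation and whitespace."""
    return re.findall(r'\w+', string)


def ispunct(string):
    """True for a non-empty string made only of ASCII punctuation."""
    return bool(string) and all(ch in string_module.punctuation for ch in string)


def lstrip_ws_and_chars(string, chars):
    """Strip leading whitespace and any of `chars`, in any order."""
    return string.lstrip(string_module.whitespace + chars)


print(lstrip_ws_and_chars(' \t.\n , .x. ', '.,?!'))  # x. (trailing space kept)
```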
markovchain.text.util.re_flags(flags, custom=<enum 'ReFlags'>)

Parse a regexp flag string.

Parameters:
  • flags (str) – Flag string.
  • custom (IntEnum, optional) – Custom flag enum (default: markovchain.text.util.ReFlags).
Returns:

(flags for re.compile, custom flags)

Return type:

(int, int)

Raises:

ValueError

markovchain.text.util.re_flags_str(flags, custom_flags)

Convert regexp flags to string.

Parameters:
  • flags (int) – Flags.
  • custom_flags (int) – Custom flags.
Returns:

Flag string.

Return type:

str

markovchain.text.util.re_sub(pattern, repl, string, count=0, flags=0, custom_flags=0)

Replace occurrences of a regular expression in a string.

Parameters:
  • pattern (str or _sre.SRE_Pattern) – Regular expression (string or compiled pattern).
  • repl (str or function) – Replacement.
  • string (str) – Input string.
  • count (int, optional) – Maximum number of pattern occurrences to replace (default: 0).
  • flags (int, optional) – Flags (default: 0).
  • custom_flags (int, optional) – Custom flags (default: 0).
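ReFlags.OVERLAP asks re_sub to replace overlapping occurrences, which a single re.sub pass skips because matching resumes after the end of the previous match. One way to get that effect, shown as a hypothetical sketch rather than the library's implementation, is to repeat the substitution until the string stops changing:

```python
import re


def sub_overlapping(pattern, repl, string, max_passes=100):
    """Re-apply a substitution until the string stops changing.

    One pass of re.sub skips matches that overlap an earlier match;
    repeating the pass lets those overlapping occurrences be replaced too.
    """
    for _ in range(max_passes):  # guard against replacements that never settle
        new = re.sub(pattern, repl, string)
        if new == string:
            break
        string = new
    return string


rule = (r'(\w)([-+*]+)(\w)', r'\1 \2 \3')  # last rule of Formatter.DEFAULT_REPLACE
print(re.sub(*rule, 'a+b+c'))            # a + b+c
print(sub_overlapping(*rule, 'a+b+c'))   # a + b + c
```

The max_passes limit is a design choice for the sketch: a rule whose replacement always creates a new match would otherwise loop forever.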

Module contents