markovchain.text package¶
Submodules¶
markovchain.text.formatter module¶
- class markovchain.text.formatter.Formatter(case=CharCase.TITLE, replace=None, end_chars='.?!', default_end='.')[source]¶
Bases:
markovchain.text.formatter.FormatterBase
Default formatter.
- case¶
Character case.
- replace¶
List of regular expressions to replace.
- Type
list of (_sre.SRE_Pattern, str, int)
- end_chars¶
Sentence ending characters.
- Type
str
- default_end¶
Default sentence ending character.
- Type
None or str
Examples
>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
- DEFAULT_REPLACE = [('\\s+', ' '), ('\\s*([^\\w\\s]+)\\s*', '\\1'), ('([,.?!])(\\w)', '\\1 \\2'), ('([\\w,.?!])([[({<])', '\\1 \\2'), ('([])}>])(\\w)', '\\1 \\2'), ('(\\w)([-+*]+)(\\w)', '\\1 \\2 \\3')]¶
- __call__(string)[source]¶
Format a string.
- Parameters
string (str) – String to format.
- Returns
Formatted string.
- Return type
str
- __init__(case=CharCase.TITLE, replace=None, end_chars='.?!', default_end='.')[source]¶
Formatter constructor.
- Parameters
case (int or str or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.TITLE).
end_chars (str, optional) – Sentence ending characters (default: ‘.?!’).
default_end (None or str, optional) – Default sentence ending character (default: ‘.’).
replace (list of ((str, str) or (str, str, str)), optional) – List of regular expressions to replace (default: DEFAULT_REPLACE).
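As a rough illustration of how an ordered list of (pattern, replacement) pairs such as DEFAULT_REPLACE can tidy the spacing around punctuation, the sketch below applies each pair in turn with `re.sub`. It is not the library's implementation: `apply_replacements` and `RULES` are hypothetical names, case conversion and sentence-ending handling are left out, and the opening bracket in the fourth pattern is escaped to avoid Python's nested-set warning (the character class is otherwise the same).

```python
import re

# Patterns transcribed from DEFAULT_REPLACE above. The "[" inside the
# fourth character class is escaped to avoid Python's nested-set warning.
RULES = [
    (r'\s+', ' '),
    (r'\s*([^\w\s]+)\s*', r'\1'),
    (r'([,.?!])(\w)', r'\1 \2'),
    (r'([\w,.?!])([\[({<])', r'\1 \2'),
    (r'([])}>])(\w)', r'\1 \2'),
    (r'(\w)([-+*]+)(\w)', r'\1 \2 \3'),
]


def apply_replacements(text, rules=RULES):
    """Apply each (pattern, replacement) pair in order (hypothetical helper)."""
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text


print(apply_replacements('hello ,world'))  # hello, world
print(apply_replacements('a  b(c)d'))      # a b (c) d
```

The order matters in this sketch: the second rule strips all whitespace around punctuation, and the later rules put single spaces back where they are wanted.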
- class markovchain.text.formatter.FormatterBase[source]¶
Bases:
markovchain.util.SaveLoad
Text formatter base class.
- classes¶
- Type
dict
- Class group.
- abstract __call__(string)[source]¶
Format a string.
- Parameters
string (str) – String to format.
- Returns
Formatted string.
- Return type
str
- classes = {'Formatter': <class 'markovchain.text.formatter.Formatter'>, 'Noop': <class 'markovchain.text.formatter.Noop'>}¶
- class markovchain.text.formatter.Noop[source]¶
Bases:
markovchain.text.formatter.FormatterBase
No-op formatter.
- classes¶
- Type
dict
- Class group.
markovchain.text.markov module¶
- class markovchain.text.markov.MarkovText(scanner=None, parser=None, storage=None, formatter=None, rank=None)[source]¶
Bases:
markovchain.base.Markov
Markov text generator class.
- DEFAULT_SCANNER¶
Default scanner class.
- Type
type
- DEFAULT_PARSER¶
Default parser class.
- Type
type
- DEFAULT_STORAGE¶
Default storage class.
- Type
type
- scanner¶
- parser¶
- storage¶
- Type
markovchain.storage.Storage
- DEFAULT_FORMATTER¶
alias of
markovchain.text.formatter.Formatter
- DEFAULT_PARSER¶
alias of
markovchain.parser.Parser
- DEFAULT_RANK¶
alias of
markovchain.text.rank.Const
- DEFAULT_SCANNER¶
- __call__(max_length=None, state_size=None, reply_to=None, reply_mode=ReplyMode.END, dataset='')[source]¶
Generate text.
- Parameters
max_length (int or None, optional) – Maximum sentence length (default: None).
state_size (int, optional) – State size (default: parser.state_sizes[0]).
reply_to (str or None, optional) – Input string (default: None).
reply_mode (markovchain.text.util.ReplyMode, optional) – Reply mode (default: markovchain.text.util.ReplyMode.END).
dataset (str, optional) – Dataset key prefix (default: ‘’).
- Return type
str
- __init__(scanner=None, parser=None, storage=None, formatter=None, rank=None)[source]¶
Markov chain generator base class constructor.
- Parameters
scanner (dict or markovchain.scanner.Scanner, optional) – Scanner (default: DEFAULT_SCANNER()).
parser (dict or markovchain.parser.ParserBase, optional) – Parser (default: DEFAULT_PARSER()).
storage (markovchain.storage.Storage, optional) – Storage (default: DEFAULT_STORAGE()).
- data(data, part=False, dataset='')[source]¶
Parse data and update links.
- Parameters
data (str) – Text to parse.
part (bool, optional) – True if data is partial (default: False).
dataset (str, optional) – Dataset key prefix (default: ‘’).
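To make the roles of `state_size` and `max_length` concrete, here is a toy word-level Markov chain written from scratch. It is only a sketch under stated assumptions and does not use or mirror the markovchain internals: `END`, `build_links` and `generate` are hypothetical names, the token list stands in for scanner output, and storage, datasets, replies and formatting are all omitted.

```python
import random
from collections import defaultdict

END = object()  # hypothetical end-of-sentence marker


def build_links(tokens, state_size):
    """Map each state (tuple of preceding tokens) to the tokens that followed it."""
    links = defaultdict(list)
    start = ('',) * state_size
    state = start
    for token in tokens:
        links[state].append(token)
        # After a sentence end, begin again from the empty start state.
        state = start if token is END else state[1:] + (token,)
    return links


def generate(links, state_size, max_length=None, rng=random):
    """Walk the links from the start state until END or max_length tokens."""
    state = ('',) * state_size
    words = []
    while max_length is None or len(words) < max_length:
        choices = links.get(state)
        if not choices:
            break
        token = rng.choice(choices)
        if token is END:
            break
        words.append(token)
        state = state[1:] + (token,)
    return words


tokens = ['the', 'cat', 'sat', '.', END]
links = build_links(tokens, state_size=2)
print(generate(links, state_size=2))                # ['the', 'cat', 'sat', '.']
print(generate(links, state_size=2, max_length=2))  # ['the', 'cat']
```

With a single training sentence every state has exactly one successor, so the output is deterministic; with more data `rng.choice` picks among the observed continuations.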
- generate_cont(max_length, state_size, reply_to, backward, dataset)[source]¶
Generate texts from start/end.
- Parameters
max_length (int or None) – Maximum sentence length.
state_size (int) – State size.
reply_to (str or None) – Input string.
backward (bool) – True to generate text start.
dataset (str) – Dataset key prefix.
- Returns
Generated texts.
- Return type
generator of str
- generate_replies(max_length, state_size, reply_to, dataset)[source]¶
Generate replies.
- Parameters
max_length (int or None) – Maximum sentence length.
state_size (int) – State size.
reply_to (str) – Input string.
dataset (str) – Dataset key prefix.
- Returns
Generated texts.
- Return type
generator of str
- get_cont_state(string, backward=False)[source]¶
Get initial states from input string.
- Parameters
string (str or None) –
backward (bool) –
- Return type
tuple of str
markovchain.text.rank module¶
- class markovchain.text.rank.Const(**_)[source]¶
Bases:
markovchain.text.rank.Rank
Constant text rank.
- size¶
- Type
int
- remove¶
- Type
float
- debug¶
If True, enable debug output.
- Type
bool
- class markovchain.text.rank.Rank(size=10, remove=0.5)[source]¶
Bases:
markovchain.util.SaveLoad
Base text rank class.
- size¶
- Type
int
- remove¶
- Type
float
- debug¶
If True, enable debug output.
- Type
bool
- __call__(strings)[source]¶
Filter strings by rank.
- Parameters
strings (iterable of str) – Strings to filter.
- Returns
Filtered list.
- Return type
list of str
- classes = {'Const': <class 'markovchain.text.rank.Const'>, 'Test': <class 'markovchain.text.rank.Test'>}¶
- class markovchain.text.rank.Test(size, remove)[source]¶
Bases:
markovchain.text.rank.Rank
Base text rank class.
- size¶
- Type
int
- remove¶
- Type
float
- debug¶
If True, enable debug output.
- Type
bool
markovchain.text.scanner module¶
- class markovchain.text.scanner.CharScanner(end_chars='.?!', default_end='.', case=CharCase.LOWER)[source]¶
Bases:
markovchain.text.scanner.TextScanner
Character scanner.
- case¶
Character case.
- end_chars¶
Sentence ending characters.
- Type
str
- default_end¶
Default sentence ending character.
- Type
str
- start¶
True if current sentence is started.
- Type
bool
- end¶
True if current sentence is ended.
- Type
bool
Examples
>>> scan = CharScanner()
>>> list(scan('Word'))
['W', 'o', 'r', 'd', '.', Scanner.END]
>>> list(scan('Word', True))
['W', 'o', 'r', 'd']
>>> list(scan(''))
['.', Scanner.END]
- __init__(end_chars='.?!', default_end='.', case=CharCase.LOWER)[source]¶
Character scanner constructor.
- Parameters
case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
end_chars (str, optional) – Sentence ending characters (default: ‘.?!’).
default_end (str, optional) – Default sentence ending character (default: ‘.’).
- scan(data, part)[source]¶
Scan a string.
- Parameters
data (str) – String to scan.
part (bool) – True if data is partial.
- Returns
Token generator.
- Return type
generator of (str or markovchain.scanner.Scanner.END)
- class markovchain.text.scanner.RegExpScanner(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=CharCase.LOWER)[source]¶
Bases:
markovchain.text.scanner.TextScanner
Regular expression scanner.
- DEFAULT_EXPR¶
Default regular expression.
- Type
_sre.SRE_Pattern
- case¶
Character case.
- expr¶
Regular expression.
- Type
_sre.SRE_Pattern
- default_end¶
Default sentence ending string.
- Type
str
- end¶
True if current sentence is ended.
- Type
bool
Examples
>>> scan = RegExpScanner(lambda data: data.split())
>>> list(scan('Word word. word'))
['Word', 'word', '.', Scanner.END, 'word', '.', Scanner.END]
>>> list(scan('word', True))
['word']
>>> list(scan(''))
['.', Scanner.END]
- DEFAULT_EXPR = re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))')¶
- __init__(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=CharCase.LOWER)[source]¶
Regular expression scanner constructor.
- Parameters
case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
expr (str or _sre.SRE_Pattern, optional) – Regular expression (default: markovchain.text.scanner.RegExpScanner.DEFAULT_EXPR). It should have groups ‘end’ (sentence ending punctuation) and ‘word’ (words / other punctuation).
default_end (str, optional) – Default sentence ending string (default: ‘.’).
- static get_group(match, group)[source]¶
Get a group from a regular expression match object if it exists.
- Parameters
match (_sre.SRE_Match) – Regular expression match object.
group (str or int) – Group name or index.
- Return type
str or None
- static get_regexp(x)[source]¶
Compile a regular expression if necessary.
- Parameters
x (str or _sre.SRE_Pattern) – Regular expression.
- Returns
Compiled regular expression.
- Return type
_sre.SRE_Pattern
- scan(data, part)[source]¶
Scan a string.
- Parameters
data (str) – String to scan.
part (bool) – True if data is partial.
- Returns
Token generator.
- Return type
generator of (str or markovchain.scanner.Scanner.END)
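The ‘end’ and ‘word’ groups of DEFAULT_EXPR suggest a simple scanning loop, sketched below with only the standard `re` module. This is an assumption-laden illustration rather than the library's code: `regexp_scan` and `END` are hypothetical names (`END` stands in for markovchain.scanner.Scanner.END), case conversion is skipped, and no state is carried between calls, whereas the real scanner tracks an `end` attribute.

```python
import re

# Same pattern as DEFAULT_EXPR above.
EXPR = re.compile(r'(?:(?P<end>[.!?]+)|(?P<word>(?:[^\w\s]+|\w+)))')
END = object()  # hypothetical stand-in for Scanner.END


def regexp_scan(data, part=False, default_end='.'):
    """Yield word tokens, and END after each sentence-ending token."""
    ended = False
    for match in EXPR.finditer(data):
        end = match.group('end')
        if end is not None:
            yield end
            yield END
            ended = True
        else:
            yield match.group('word')
            ended = False
    # Close an unfinished sentence unless more data is expected.
    if not part and not ended:
        yield default_end
        yield END


tokens = list(regexp_scan('Word word. word'))
print(['END' if t is END else t for t in tokens])
# ['Word', 'word', '.', 'END', 'word', '.', 'END']
```

Under these assumptions the sketch reproduces the token sequences shown in the documented examples, including the default ending appended to an empty input.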
- class markovchain.text.scanner.TextScanner(case=CharCase.LOWER)[source]¶
Bases:
markovchain.scanner.Scanner
Text scanner base class.
- case¶
- Character case.
Examples
>>> scan = Scanner(lambda data: data.split())
>>> scan('a b c')
['a', 'b', 'c']
- __call__(data, part=False)[source]¶
Scan a string.
- Parameters
data (str) – String to scan.
part (bool, optional) – True if data is partial (default: False).
- Returns
Token generator.
- Return type
generator of (str or markovchain.scanner.Scanner.END)
- __init__(case=CharCase.LOWER)[source]¶
Text scanner constructor.
- Parameters
case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
- abstract scan(data, part)[source]¶
Scan a string.
- Parameters
data (str) – String to scan.
part (bool) – True if data is partial.
- Returns
Token generator.
- Return type
generator of (str or markovchain.scanner.Scanner.END)
markovchain.text.util module¶
- class markovchain.text.util.CharCase(value)[source]¶
Bases:
enum.IntEnum
Character case.
- LOWER = 3¶
- PRESERVE = 0¶
- TITLE = 1¶
- UPPER = 2¶
- convert(string)[source]¶
Return a copy of string converted to case.
- Parameters
string (str) –
- Return type
str
Examples
>>> CharCase.LOWER.convert('sTr InG')
'str ing'
>>> CharCase.UPPER.convert('sTr InG')
'STR ING'
>>> CharCase.TITLE.convert('sTr InG')
'Str ing'
>>> CharCase.PRESERVE.convert('sTr InG')
'sTr InG'
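Several constructors above accept `case` as an int, a str, or a CharCase. The sketch below rebuilds a stand-in enum with the listed member values and shows one way such a value could be normalised. It is an assumption, not the library's code: the stand-in `convert` uses built-in string methods, and `to_case` is a hypothetical helper that treats a str as a case-insensitive member name.

```python
from enum import IntEnum


class CharCase(IntEnum):
    """Stand-in enum with the member values listed above."""
    PRESERVE = 0
    TITLE = 1
    UPPER = 2
    LOWER = 3

    def convert(self, string):
        """Return a copy of string converted to this case."""
        if self is CharCase.LOWER:
            return string.lower()
        if self is CharCase.UPPER:
            return string.upper()
        if self is CharCase.TITLE:
            return string.capitalize()
        return string


def to_case(value):
    """Coerce an int, a member name, or a CharCase (hypothetical helper)."""
    if isinstance(value, str):
        return CharCase[value.upper()]
    return CharCase(value)


print(to_case('title').convert('sTr InG'))  # Str ing
print(to_case(3).convert('sTr InG'))        # str ing
```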
- class markovchain.text.util.ReFlags(value)[source]¶
Bases:
enum.IntEnum
Custom regexp flags.
- O¶
- Type
int
- OVERLAP¶
Replace overlapping occurrences of pattern.
- Type
int
- O = 1¶
- OVERLAP = 1¶
- class markovchain.text.util.ReplyMode(value)[source]¶
Bases:
enum.IntEnum
Text reply mode.
- END = 0¶
- REPLY = 2¶
- START = 1¶
- markovchain.text.util.capitalize(string)[source]¶
Capitalize a sentence.
- Parameters
string (str) – String to capitalize.
- Returns
Capitalized string.
- Return type
str
Examples
>>> capitalize('worD WORD WoRd')
'Word word word'
- markovchain.text.util.get_words(string)[source]¶
Find all words in a string.
- Parameters
string (str) –
- Return type
list of str
Examples
>>> get_words(' ..?!word , (Word).. word')
['word', 'Word', 'word']
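One plausible way to get this behaviour is a `\w+` search, sketched here with the standard `re` module. `find_words` is a hypothetical name, and the library's own pattern may differ, for example in how it treats digits or underscores.

```python
import re


def find_words(string):
    """Return every run of word characters in string (hypothetical helper)."""
    return re.findall(r'\w+', string)


print(find_words(' ..?!word , (Word).. word'))  # ['word', 'Word', 'word']
```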
- markovchain.text.util.ispunct(string)[source]¶
Return True if all characters in a string are punctuation and it is not empty.
- Parameters
string (str) –
- Return type
bool
Examples
>>> ispunct('.,?')
True
>>> ispunct('.x.')
False
>>> ispunct('. ')
False
>>> ispunct('')
False
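A check with the same results on these inputs can be sketched as follows. `is_punct` is a hypothetical name, and the sketch assumes ASCII punctuation as defined by `string.punctuation`; the library may classify characters differently.

```python
import string


def is_punct(text):
    """True if text is non-empty and made only of ASCII punctuation."""
    return bool(text) and all(ch in string.punctuation for ch in text)


print(is_punct('.,?'), is_punct('.x.'), is_punct('. '), is_punct(''))
# True False False False
```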
- markovchain.text.util.lstrip_ws_and_chars(string, chars)[source]¶
Remove leading whitespace and characters from a string.
- Parameters
string (str) – String to strip.
chars (str) – Characters to remove.
- Returns
Stripped string.
- Return type
str
Examples
>>> lstrip_ws_and_chars(' \t.\n , .x. ', '.,?!')
'x. '
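The stripping rule can be sketched as a single left-to-right scan that stops at the first character that is neither whitespace nor listed in `chars`. `lstrip_sketch` is a hypothetical name and not the library's implementation.

```python
def lstrip_sketch(string, chars):
    """Drop leading whitespace and any leading characters found in chars."""
    index = 0
    while index < len(string) and (string[index].isspace() or string[index] in chars):
        index += 1
    return string[index:]


print(repr(lstrip_sketch(' \t.\n , .x. ', '.,?!')))  # 'x. '
```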
- markovchain.text.util.re_flags(flags, custom=<enum 'ReFlags'>)[source]¶
Parse regexp flag string.
- Parameters
flags (str) – Flag string.
custom (IntEnum, optional) – Custom flag enum (default: ReFlags).
- Returns
(flags for re.compile, custom flags)
- Return type
(int, int)
- Raises
ValueError –
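The flag-string format is not spelled out here, so the following sketch assumes one letter per flag, mirroring the one-letter names in the `re` module and the `O` alias of ReFlags. `parse_flags`, `STANDARD` and the stand-in `ReFlags` enum are hypothetical; only the (standard flags, custom flags) return shape and the ValueError come from the description above.

```python
import re
from enum import IntEnum


class ReFlags(IntEnum):
    """Stand-in for the custom flag enum documented above."""
    OVERLAP = 1
    O = 1  # alias with the same value as OVERLAP


# Assumed one-letter spellings, mirroring the short names in the re module.
STANDARD = {'a': re.A, 'i': re.I, 'l': re.L, 'm': re.M,
            's': re.S, 'u': re.U, 'x': re.X}


def parse_flags(flags, custom=ReFlags):
    """Split a flag string into (re.compile flags, custom flags)."""
    std, extra = 0, 0
    for letter in flags:
        if letter in STANDARD:
            std |= int(STANDARD[letter])
        elif custom is not None and letter.upper() in custom.__members__:
            extra |= int(custom[letter.upper()])
        else:
            raise ValueError('unknown flag: %r' % letter)
    return std, extra


print(parse_flags('im') == (int(re.I | re.M), 0))  # True
print(parse_flags('o'))                            # (0, 1)
```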
- markovchain.text.util.re_flags_str(flags, custom_flags)[source]¶
Convert regexp flags to string.
- Parameters
flags (int) – Flags.
custom_flags (int) – Custom flags.
- Returns
Flag string.
- Return type
str
- markovchain.text.util.re_sub(pattern, repl, string, count=0, flags=0, custom_flags=0)[source]¶
Replace regular expression.
- Parameters
pattern (str or _sre.SRE_Pattern) – Regular expression (compiled or as a string).
repl (str or function) – Replacement.
string (str) – Input string.
count (int) – Maximum number of pattern occurrences.
flags (int) – Flags.
custom_flags (int) – Custom flags.
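How the OVERLAP custom flag is implemented is not stated here. One possible reading of "replace overlapping occurrences" is to repeat the substitution until nothing changes, which the sketch below does with plain `re.sub`. `sub_overlapping` and its `max_passes` guard are hypothetical, and the pattern is the last DEFAULT_REPLACE rule from markovchain.text.formatter.Formatter.

```python
import re


def sub_overlapping(pattern, repl, string, max_passes=100):
    """Re-run re.sub until the string stops changing (hypothetical helper)."""
    for _ in range(max_passes):
        result = re.sub(pattern, repl, string)
        if result == string:
            break
        string = result
    return string


pattern = r'(\w)([-+*]+)(\w)'
print(re.sub(pattern, r'\1 \2 \3', 'a-b-c'))           # a - b-c
print(sub_overlapping(pattern, r'\1 \2 \3', 'a-b-c'))  # a - b - c
```

A single `re.sub` pass misses the second operator because the `b` was already consumed by the first match; the repeated pass picks it up. The `max_passes` limit guards against replacements that keep producing new matches.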