markovchain.text package¶
Submodules¶
markovchain.text.formatter module¶
- class markovchain.text.formatter.Formatter(case=CharCase.TITLE, replace=None, end_chars='.?!', default_end='.')[source]¶
Bases: markovchain.text.formatter.FormatterBase
Default formatter.
- case¶
Character case.
- replace¶
List of regular expressions to replace.
- Type
list of (_sre.SRE_Pattern, str, int)
- end_chars¶
Sentence ending characters.
- Type
str
- default_end¶
Default sentence ending character.
- Type
None or str
Examples
>>> class SaveLoadGroup(SaveLoad):
...     classes = {}
...
>>> class SaveLoadObject(SaveLoadGroup):
...     def __init__(self, attr=None):
...         self.attr = attr
...     def save(self):
...         data = super().save()
...         data['attr'] = self.attr
...         return data
...
>>> SaveLoadGroup.add_class(SaveLoadObject)
>>> SaveLoadGroup.classes
{'SaveLoadObject': <class '__main__.SaveLoadObject'>}
>>> obj = SaveLoadObject(0)
>>> data = obj.save()
>>> data
{'attr': 0, '__class__': 'SaveLoadObject'}
>>> obj2 = SaveLoadGroup.load(data)
>>> type(obj2)
<class '__main__.SaveLoadObject'>
>>> obj2.attr
0
- DEFAULT_REPLACE = [('\\s+', ' '), ('\\s*([^\\w\\s]+)\\s*', '\\1'), ('([,.?!])(\\w)', '\\1 \\2'), ('([\\w,.?!])([[({<])', '\\1 \\2'), ('([])}>])(\\w)', '\\1 \\2'), ('(\\w)([-+*]+)(\\w)', '\\1 \\2 \\3')]¶
- __call__(string)[source]¶
Format a string.
- Parameters
string (str) – String to format.
- Returns
Formatted string.
- Return type
str
- __init__(case=CharCase.TITLE, replace=None, end_chars='.?!', default_end='.')[source]¶
Formatter constructor.
- Parameters
case (int or str or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.TITLE).
end_chars (str, optional) – Sentence ending characters (default: '.?!').
default_end (None or str, optional) – Default sentence ending character (default: '.').
replace (list of ((str, str) or (str, str, str)), optional) – List of regular expressions to replace (default: DEFAULT_REPLACE).
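To make the role of the replacement list more concrete, here is a minimal sketch that applies the first three pairs of DEFAULT_REPLACE to a string with the standard `re` module. It assumes the formatter applies the pairs in list order; `RULES` and `apply_rules` are hypothetical names for illustration, not part of markovchain, and case conversion and sentence endings are not covered.

```python
import re

# First three (pattern, replacement) pairs copied from DEFAULT_REPLACE.
RULES = [
    (r'\s+', ' '),                 # collapse runs of whitespace
    (r'\s*([^\w\s]+)\s*', r'\1'),  # drop whitespace around punctuation
    (r'([,.?!])(\w)', r'\1 \2'),   # put one space back after , . ? !
]


def apply_rules(string, rules=RULES):
    """Apply each rule in order with re.sub (assumed behaviour, not library code)."""
    for pattern, repl in rules:
        string = re.sub(pattern, repl, string)
    return string


print(apply_rules('hello ,world !how   are you'))
# prints: hello, world! how are you
```

The order matters in this sketch: the second rule removes all whitespace around punctuation, and the third rule then restores a single space after the listed punctuation marks.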
- class markovchain.text.formatter.FormatterBase[source]¶
Bases: markovchain.util.SaveLoad
Text formatter base class.
- classes¶
Class group.
- Type
dict
- abstract __call__(string)[source]¶
Format a string.
- Parameters
string (str) – String to format.
- Returns
Formatted string.
- Return type
str
- classes = {'Formatter': <class 'markovchain.text.formatter.Formatter'>, 'Noop': <class 'markovchain.text.formatter.Noop'>}¶
- class markovchain.text.formatter.Noop[source]¶
Bases: markovchain.text.formatter.FormatterBase
No-op formatter.
- classes¶
Class group.
- Type
dict
markovchain.text.markov module¶
- class markovchain.text.markov.MarkovText(scanner=None, parser=None, storage=None, formatter=None, rank=None)[source]¶
Bases: markovchain.base.Markov
Markov text generator class.
- DEFAULT_SCANNER¶
Default scanner class.
- Type
type
- DEFAULT_PARSER¶
Default parser class.
- Type
type
- DEFAULT_STORAGE¶
Default storage class.
- Type
type
- scanner¶
- parser¶
- storage¶
- Type
markovchain.storage.Storage
- DEFAULT_FORMATTER¶
alias of
markovchain.text.formatter.Formatter
- DEFAULT_PARSER¶
alias of
markovchain.parser.Parser
- DEFAULT_RANK¶
alias of
markovchain.text.rank.Const
- DEFAULT_SCANNER¶
- __call__(max_length=None, state_size=None, reply_to=None, reply_mode=ReplyMode.END, dataset='')[source]¶
Generate text.
- Parameters
max_length (int or None, optional) – Maximum sentence length (default: None).
state_size (int, optional) – State size (default: parser.state_sizes[0]).
reply_to (str or None, optional) – Input string (default: None).
reply_mode (markovchain.text.util.ReplyMode, optional) – Reply mode (default: markovchain.text.util.ReplyMode.END).
dataset (str, optional) – Dataset key prefix (default: '').
- Return type
str
- __init__(scanner=None, parser=None, storage=None, formatter=None, rank=None)[source]¶
Markov chain generator base class constructor.
- Parameters
scanner (dict or markovchain.scanner.Scanner, optional) – Scanner (default: DEFAULT_SCANNER()).
parser (dict or markovchain.parser.ParserBase, optional) – Parser (default: DEFAULT_PARSER()).
storage (markovchain.storage.Storage, optional) – Storage (default: DEFAULT_STORAGE()).
- data(data, part=False, dataset='')[source]¶
Parse data and update links.
- Parameters
data (str) – Text to parse.
part (bool, optional) – True if data is partial (default: False).
dataset (str, optional) – Dataset key prefix (default: '').
- generate_cont(max_length, state_size, reply_to, backward, dataset)[source]¶
Generate texts from start/end.
- Parameters
max_length (int or None) – Maximum sentence length.
state_size (int) – State size.
reply_to (str or None) – Input string.
backward (bool) – True to generate text start.
dataset (str) – Dataset key prefix.
- Returns
Generated texts.
- Return type
generator of str
- generate_replies(max_length, state_size, reply_to, dataset)[source]¶
Generate replies.
- Parameters
max_length (int or None) – Maximum sentence length.
state_size (int) – State size.
reply_to (str) – Input string.
dataset (str) – Dataset key prefix.
- Returns
Generated texts.
- Return type
generator of str
- get_cont_state(string, backward=False)[source]¶
Get initial states from input string.
- Parameters
string (str or None)
backward (bool)
- Return type
tuple of str
markovchain.text.rank module¶
- class markovchain.text.rank.Const(**_)[source]¶
Bases: markovchain.text.rank.Rank
Constant text rank.
- size¶
- Type
int
- remove¶
- Type
float
- debug¶
If True, enable debug output.
- Type
bool
- class markovchain.text.rank.Rank(size=10, remove=0.5)[source]¶
Bases: markovchain.util.SaveLoad
Base text rank class.
- size¶
- Type
int
- remove¶
- Type
float
- debug¶
If True, enable debug output.
- Type
bool
- __call__(strings)[source]¶
Filter strings by rank.
- Parameters
strings (iterable of str) – Strings to filter.
- Returns
Filtered list.
- Return type
list of str
- classes = {'Const': <class 'markovchain.text.rank.Const'>, 'Test': <class 'markovchain.text.rank.Test'>}¶
- class markovchain.text.rank.Test(size, remove)[source]¶
Bases: markovchain.text.rank.Rank
Base text rank class.
- size¶
- Type
int
- remove¶
- Type
float
- debug¶
If True, enable debug output.
- Type
bool
markovchain.text.scanner module¶
- class markovchain.text.scanner.CharScanner(end_chars='.?!', default_end='.', case=CharCase.LOWER)[source]¶
Bases: markovchain.text.scanner.TextScanner
Character scanner.
- case¶
Character case.
- end_chars¶
Sentence ending characters.
- Type
str
- default_end¶
Default sentence ending character.
- Type
str
- start¶
True if current sentence is started.
- Type
bool
- end¶
True if current sentence is ended.
- Type
bool
Examples
>>> scan = CharScanner()
>>> list(scan('Word'))
['W', 'o', 'r', 'd', '.', Scanner.END]
>>> list(scan('Word', True))
['W', 'o', 'r', 'd']
>>> list(scan(''))
['.', Scanner.END]
- __init__(end_chars='.?!', default_end='.', case=CharCase.LOWER)[source]¶
Character scanner constructor.
- Parameters
case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
end_chars (str, optional) – Sentence ending characters (default: '.?!').
default_end (str, optional) – Default sentence ending character (default: '.').
- scan(data, part)[source]¶
Scan a string.
- Parameters
data (str) – String to scan.
part (bool) – True if data is partial.
- Returns
Token generator.
- Return type
generator of (str or markovchain.scanner.Scanner.END)
- class markovchain.text.scanner.RegExpScanner(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=CharCase.LOWER)[source]¶
Bases: markovchain.text.scanner.TextScanner
Regular expression scanner.
- DEFAULT_EXPR¶
Default regular expression.
- Type
_sre.SRE_Pattern
- case¶
Character case.
- expr¶
Regular expression.
- Type
_sre.SRE_Pattern
- default_end¶
Default sentence ending string.
- Type
str
- end¶
True if current sentence is ended.
- Type
bool
Examples
>>> scan = RegExpScanner()
>>> list(scan('Word word. word'))
['Word', 'word', '.', Scanner.END, 'word', '.', Scanner.END]
>>> list(scan('word', True))
['word']
>>> list(scan(''))
['.', Scanner.END]
- DEFAULT_EXPR = re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))')¶
- __init__(expr=re.compile('(?:(?P<end>[.!?]+)|(?P<word>(?:[^\\w\\s]+|\\w+)))'), default_end='.', case=CharCase.LOWER)[source]¶
Regular expression scanner constructor.
- Parameters
case (
strorintormarkovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).expr (
stror_sre.SRE_Pattern, optional) – Regular expression (default:markovchain.scanner.RegExpScanner.DEFAULT_EXPR). It should have groups ‘end’ (sentence ending punctuation) and ‘word’ (words / other punctuation).default_end (
str, optional) – Default sentence ending string (default: ‘.’).
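The two named groups that the expression is expected to provide can be illustrated with the standard `re` module alone. The sketch below copies the DEFAULT_EXPR pattern and reports which group matched each token; `tokenize` is a hypothetical helper, and the scanner's own handling of case, sentence state and Scanner.END is not reproduced here.

```python
import re

# Pattern copied from DEFAULT_EXPR.
EXPR = re.compile(r'(?:(?P<end>[.!?]+)|(?P<word>(?:[^\w\s]+|\w+)))')


def tokenize(text):
    """Return (group name, matched text) pairs for every match in text."""
    # Match.lastgroup is the name of the capturing group that matched:
    # 'end' for sentence-ending punctuation, 'word' for everything else.
    return [(m.lastgroup, m.group()) for m in EXPR.finditer(text)]


for name, token in tokenize('Word word. word, word?!'):
    print(name, repr(token))
```

A custom expression passed as expr would presumably need to keep both group names, since the parameter description above refers to them explicitly.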
- static get_group(match, group)[source]¶
Get a group from a regular expression match object if it exists.
- Parameters
match (
_sre.SRE_Match) – Regular expression match object.group (
strorint) – Group name or index.
- Return type
strorNone
- static get_regexp(x)[source]¶
Compile a regular expression if necessary.
- Parameters
x (
stror_sre.SRE_Pattern) – Regular expression.- Returns
Compiled regular expression.
- Return type
_sre.SRE_Pattern
- scan(data, part)[source]¶
Scan a string.
- Parameters
data (
str) – String to scan.part (
bool) –Trueif data is partial.
- Returns
Token generator.
- Return type
generatorof (strormarkovchain.scanner.Scanner.END)
- class markovchain.text.scanner.TextScanner(case=CharCase.LOWER)[source]¶
Bases: markovchain.scanner.Scanner
Text scanner base class.
- case¶
Character case.
Examples
>>> scan = Scanner(lambda data: data.split())
>>> scan('a b c')
['a', 'b', 'c']
- __call__(data, part=False)[source]¶
Scan a string.
- Parameters
data (
str) – String to scan.part (
bool, optional) – True if data is partial (default:False).
- Returns
Token generator.
- Return type
generatorof (strormarkovchain.scanner.Scanner.END)
- __init__(case=CharCase.LOWER)[source]¶
Text scanner constructor.
- Parameters
case (str or int or markovchain.text.util.CharCase, optional) – Character case (default: markovchain.text.util.CharCase.LOWER).
- abstract scan(data, part)[source]¶
Scan a string.
- Parameters
data (str) – String to scan.
part (bool) – True if data is partial.
- Returns
Token generator.
- Return type
generator of (str or markovchain.scanner.Scanner.END)
markovchain.text.util module¶
- class markovchain.text.util.CharCase(value)[source]¶
Bases: enum.IntEnum
Character case.
- LOWER = 3¶
- PRESERVE = 0¶
- TITLE = 1¶
- UPPER = 2¶
- convert(string)[source]¶
Return a copy of string converted to case.
- Parameters
string (str)
- Return type
str
Examples
>>> CharCase.LOWER.convert('sTr InG')
'str ing'
>>> CharCase.UPPER.convert('sTr InG')
'STR ING'
>>> CharCase.TITLE.convert('sTr InG')
'Str ing'
>>> CharCase.PRESERVE.convert('sTr InG')
'sTr InG'
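The documented results can be reproduced with built-in string methods. The following stand-in enum uses the same member values as CharCase; `CharCaseSketch` is a hypothetical name, and the mapping of TITLE to `str.capitalize` is an assumption inferred from the example output, not taken from the library source.

```python
from enum import IntEnum


class CharCaseSketch(IntEnum):
    """Stand-in for CharCase with the same member values (hypothetical)."""
    PRESERVE = 0
    TITLE = 1
    UPPER = 2
    LOWER = 3

    def convert(self, string):
        """Return a copy of string converted to this case."""
        if self is CharCaseSketch.LOWER:
            return string.lower()
        if self is CharCaseSketch.UPPER:
            return string.upper()
        if self is CharCaseSketch.TITLE:
            # str.capitalize upper-cases the first character, lower-cases the rest
            return string.capitalize()
        return string  # PRESERVE leaves the string untouched


print(CharCaseSketch.TITLE.convert('sTr InG'))   # prints: Str ing
print(CharCaseSketch(3) is CharCaseSketch['LOWER'])  # prints: True
```

Because the members form an IntEnum, a member can be looked up either by value or by name, which fits the int or str or CharCase types accepted by the case parameters elsewhere on this page.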
- class markovchain.text.util.ReFlags(value)[source]¶
Bases: enum.IntEnum
Custom regexp flags.
- O¶
- Type
int
- OVERLAP¶
Replace overlapping occurrences of pattern.
- Type
int
- O = 1¶
- OVERLAP = 1¶
- class markovchain.text.util.ReplyMode(value)[source]¶
Bases: enum.IntEnum
Text reply mode.
- END = 0¶
- REPLY = 2¶
- START = 1¶
- markovchain.text.util.capitalize(string)[source]¶
Capitalize a sentence.
- Parameters
string (str) – String to capitalize.
- Returns
Capitalized string.
- Return type
str
Examples
>>> capitalize('worD WORD WoRd')
'Word word word'
- markovchain.text.util.get_words(string)[source]¶
Find all words in a string.
- Parameters
string (str)
- Return type
list of str
Examples
>>> get_words(' ..?!word , (Word).. word')
['word', 'Word', 'word']
- markovchain.text.util.ispunct(string)[source]¶
Return True if all characters in a string are punctuation and it is not empty.
- Parameters
string (str)
- Return type
bool
Examples
>>> ispunct('.,?')
True
>>> ispunct('.x.')
False
>>> ispunct('. ')
False
>>> ispunct('')
False
- markovchain.text.util.lstrip_ws_and_chars(string, chars)[source]¶
Remove leading whitespace and characters from a string.
- Parameters
string (str) – String to strip.
chars (str) – Characters to remove.
- Returns
Stripped string.
- Return type
str
Examples
>>> lstrip_ws_and_chars(' \t.\n , .x. ', '.,?!')
'x. '
- markovchain.text.util.re_flags(flags, custom=<enum 'ReFlags'>)[source]¶
Parse regexp flag string.
- Parameters
flags (str) – Flag string.
custom (IntEnum, optional) – Custom flag enum (default: ReFlags).
- Returns
(flags for re.compile, custom flags)
- Return type
(int, int)
- Raises
ValueError
- markovchain.text.util.re_flags_str(flags, custom_flags)[source]¶
Convert regexp flags to string.
- Parameters
flags (int) – Flags.
custom_flags (int) – Custom flags.
- Returns
Flag string.
- Return type
str
- markovchain.text.util.re_sub(pattern, repl, string, count=0, flags=0, custom_flags=0)[source]¶
Replace regular expression.
- Parameters
pattern (str or _sre.SRE_Pattern) – Compiled regular expression.
repl (str or function) – Replacement.
string (str) – Input string.
count (int) – Maximum number of pattern occurrences.
flags (int) – Flags.
custom_flags (int) – Custom flags.
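The need for an overlapping mode such as ReFlags.OVERLAP shows up with patterns whose matches share characters. The sketch below illustrates the problem with plain `re.sub` and one possible workaround, repeating the substitution until the string stops changing. `sub_until_stable` and `max_passes` are hypothetical names, and this is not claimed to be how re_sub itself handles overlapping occurrences.

```python
import re

# Last pattern from Formatter.DEFAULT_REPLACE: spaces around - + * between words.
PATTERN = r'(\w)([-+*]+)(\w)'
REPL = r'\1 \2 \3'


def sub_until_stable(pattern, repl, string, max_passes=10):
    """Repeat re.sub until the result stops changing (illustrative only)."""
    for _ in range(max_passes):  # bounded, in case a replacement keeps matching
        new_string = re.sub(pattern, repl, string)
        if new_string == string:
            break
        string = new_string
    return string


# A single pass consumes 'b' in the first match, so '-c' is left untouched.
print(re.sub(PATTERN, REPL, 'a-b-c'))            # prints: a - b-c
print(sub_until_stable(PATTERN, REPL, 'a-b-c'))  # prints: a - b - c
```

The pass limit is a design choice of this sketch: without it, a replacement that always produces a new match would loop forever.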