Advanced usage
This module exposes the two classes used to generate random plausible text:
- loremipsum.generator.Sample, which provides the API to extract, load, and dump all the needed information from a sample.
- loremipsum.generator.Generator, which provides the API to actually generate the text, using a sample.
class Generator(sample=None)
Generates random strings of plausible text.
Markov chains are used to generate the random text based on the analysis of a sample text. In the analysis, only paragraph, sentence and word lengths, and some basic punctuation matter – the actual words are ignored. A provided list of words is then used to generate the random text, so that it will have a similar distribution of paragraph, sentence and word lengths.
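The idea can be illustrated with a small, self-contained sketch. It is not the library's implementation: the helper names (`build_chain`, `generate_lengths`, `words_for_lengths`) and the choice of a first-order chain over word lengths only are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word length to the word lengths observed right after it."""
    lengths = [len(word) for word in text.split()]
    chain = defaultdict(list)
    for current, following in zip(lengths, lengths[1:]):
        chain[current].append(following)
    return lengths[0], dict(chain)


def generate_lengths(start, chain, amount, rng):
    """Walk the chain to produce `amount` word lengths."""
    result = [start]
    while len(result) < amount:
        # Fall back to the start state when a length has no recorded successor.
        candidates = chain.get(result[-1]) or [start]
        result.append(rng.choice(candidates))
    return result[:amount]


def words_for_lengths(lengths, lexicon, rng):
    """Pick a lexicon word for each requested length (closest length wins)."""
    words = []
    for length in lengths:
        best = min(abs(len(word) - length) for word in lexicon)
        pool = [word for word in lexicon if abs(len(word) - length) == best]
        words.append(rng.choice(pool))
    return words


rng = random.Random(0)  # seeded so repeated runs give the same text
sample_text = "the quick brown fox jumps over the lazy dog"
lexicon = ["lorem", "ipsum", "dolor", "sit", "amet", "est"]

start, chain = build_chain(sample_text)
lengths = generate_lengths(start, chain, 6, rng)
print(" ".join(words_for_lengths(lengths, lexicon, rng)))
```

The words of the sample never appear in the output; only their lengths steer which lexicon words are chosen.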
The attributes of this class should be considered ‘read-only’. Even if you can access the internal state of the generator, you don’t want to mess with it: we are all grown adults.
default(*args, **kwds)
Context manager. Yields a Generator with altered defaults.
The purpose of this method is to allow several Generator methods to be called with a predefined set of arguments.
>>> from loremipsum import generator
>>> g = generator.Generator(...)
>>> with g.default(sentence_sigma=0.9, sentence_mean=0.9) as short:
...     sentences = short.generate_sentences(3)
...     paragraphs = short.generate_paragraphs(5, incipit=True)
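How a context manager of this kind might work is sketched below with a hypothetical `TinyGenerator` stand-in. This is not the library's code; in particular, yielding a shallow copy rather than temporarily modifying the original object is an assumption.

```python
import contextlib
import copy


class TinyGenerator(object):
    """Hypothetical stand-in that only stores sentence statistics."""

    def __init__(self, sentence_mean=10.0, sentence_sigma=3.0):
        self.sentence_mean = sentence_mean
        self.sentence_sigma = sentence_sigma

    @contextlib.contextmanager
    def default(self, **overrides):
        # Work on a shallow copy so the original generator is left untouched.
        altered = copy.copy(self)
        for name, value in overrides.items():
            if not hasattr(altered, name):
                raise TypeError("unknown default: %s" % name)
            setattr(altered, name, value)
        yield altered


g = TinyGenerator()
with g.default(sentence_mean=0.9, sentence_sigma=0.9) as short:
    print(short.sentence_mean, g.sentence_mean)
```

Inside the `with` block, `short` carries the overridden values while `g` keeps its own.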
generate_paragraph(**args)
Generates a single paragraph, of random length.
Also accepts the same arguments as generate_sentence().
Parameters:
- paragraph_len (int) – The length of the paragraph in sentences. Takes precedence over paragraph_mean and paragraph_sigma.
- paragraph_mean (float) – Override the paragraph mean value.
- paragraph_sigma (float) – Override the paragraph sigma value.
Returns: A tuple containing number of sentences, number of words, and the paragraph text.
Return type: tuple(int, int, str or unicode)
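The precedence rule between an explicit length and the mean/sigma pair can be sketched as follows. `choose_length` is a hypothetical helper, and drawing from a Gaussian distribution is an assumption suggested by the parameter names, not something the documentation above states.

```python
import random


def choose_length(rng, length=None, mean=5.0, sigma=2.0):
    """Return an explicit length if given, else draw one from a Gaussian."""
    if length is not None:
        # An explicit length takes precedence over mean and sigma.
        return length
    # Round to an integer and never return fewer than one unit.
    return max(1, int(round(rng.gauss(mean, sigma))))


rng = random.Random(42)
print(choose_length(rng, length=3, mean=50.0))  # the explicit length is used
print(choose_length(rng, mean=4.0, sigma=0.0))  # a zero sigma yields the mean
```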
generate_paragraphs(amount, **args)
Generator method that yields paragraphs, of random length.
Also accepts the same arguments as generate_paragraph().
Parameters: amount (int) – The number of paragraphs to generate.
Returns: A generator yielding the specified number of tuples, as per generate_paragraph().
Return type: generator
generate_sentence(**args)
Generates a single sentence, of random length.
Parameters:
- incipit (bool) – If True, then the text will begin with the sample text incipit sentence.
- sentence_len (int) – The length of the sentence in words. Takes precedence over sentence_mean and sentence_sigma.
- sentence_mean (float) – Override the sentence mean value.
- sentence_sigma (float) – Override the sentence sigma value.
Returns: A tuple containing sentence length and sentence text.
Return type: tuple(int, str or unicode)
generate_sentences(amount, **args)
Generator method that yields sentences, of random length.
Also accepts the same arguments as generate_sentence().
Parameters: amount (int) – The number of sentences to generate.
Returns: A generator yielding the specified number of tuples, as per generate_sentence().
Return type: generator
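Because these methods return generators, nothing is produced until the result is iterated, and the result can be consumed only once. The sketch below shows the consumption pattern with a hypothetical `fake_sentences` stand-in that yields tuples of the same shape; it does not call the library.

```python
def fake_sentences(amount):
    """Hypothetical stand-in yielding (word_count, text) tuples lazily."""
    for index in range(amount):
        text = "Lorem ipsum number %d." % index
        yield len(text.split()), text


# Nothing is produced until the generator is iterated.
sentences = fake_sentences(3)
for word_count, text in sentences:
    print(word_count, text)

# A generator is exhausted after one pass; build a list to reuse the results.
texts = [text for _, text in fake_sentences(3)]
```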
generate_word(length=None)
Selects a random word from the lexicon.
Parameters: length (int) – The length of the generated word.
Return type: str or unicode or None
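A length-restricted selection could look like the sketch below. `pick_word` is a hypothetical helper, and returning None when no word of the requested length exists is an assumption inferred from the documented return type, not confirmed behaviour.

```python
import random


def pick_word(lexicon, length=None, rng=random):
    """Pick a random word, optionally restricted to an exact length."""
    if length is None:
        candidates = list(lexicon)
    else:
        candidates = [word for word in lexicon if len(word) == length]
    # Assumption: signal "no suitable word" with None instead of raising.
    return rng.choice(candidates) if candidates else None


lexicon = ["lorem", "ipsum", "sit", "amet"]
print(pick_word(lexicon, length=3))
print(pick_word(lexicon, length=12))
```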
generate_words(amount, length=None)
Creates a generator of the specified amount of words.
Words are randomly selected from the lexicon. Also accepts the length argument as per generate_word().
Parameters: amount (int) – The number of words to be generated.
Return type: generator
class Sample(**args)
The sample that generated sentences are based on.
Sentences are generated so that they will have a similar distribution of word, sentence and paragraph lengths and punctuation.
Sample text should be a string consisting of a number of paragraphs, each separated by empty lines. Each paragraph should consist of a number of sentences, separated by any character listed as sentence delimiters. Sentences consist of words separated by white spaces. Words may be followed by any character listed as word delimiters.
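The expected structure of a sample text can be made concrete with a short parsing sketch. `split_sample` and its default delimiter strings are hypothetical and only illustrate the paragraph, sentence, and word rules described above; the library's own analysis may differ.

```python
import re


def split_sample(text, sentence_delimiters=".?!", word_delimiters=",;:"):
    """Split text into paragraphs, sentences, and bare words."""
    paragraphs = []
    # Sentences end at any of the sentence delimiter characters.
    pattern = "[" + re.escape(sentence_delimiters) + "]"
    # Paragraphs are separated by one or more empty (or blank) lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        sentences = []
        for chunk in re.split(pattern, block):
            # Words are whitespace separated; trailing delimiters are dropped.
            words = [word.rstrip(word_delimiters) for word in chunk.split()]
            words = [word for word in words if word]
            if words:
                sentences.append(words)
        if sentences:
            paragraphs.append(sentences)
    return paragraphs


sample_text = "Lorem ipsum, dolor sit amet. Consectetur elit!\n\nSed do eiusmod."
for paragraph in split_sample(sample_text):
    print(paragraph)
```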
Sample instances behave like read-only dictionaries and can be hashed.
Parameters:
- frozen (tuple) – An immutable representation of the sample information. This argument takes precedence over sample, text, lexicon, word_delimiters or sentence_delimiters.
- sample (dict) – A dictionary of the sample information. This argument takes precedence over text, lexicon, word_delimiters or sentence_delimiters.
- text (str) – A string containing the sample text. Sample text must contain one or more empty-line delimited paragraphs. Each paragraph must contain one or more sentences, delimited by any char included in the sentence_delimiters param.
- lexicon (str) – A list of strings to be used as words.
- word_delimiters (str) – A string of chars used as word delimiters.
- sentence_delimiters (str) – A string of chars used as sentence delimiters.
Raises:
- TypeError – If neither frozen nor sample is provided and any of text, lexicon, word_delimiters or sentence_delimiters is missing.
- ValueError – If an internal Sample could not be successfully created out of the supplied arguments.
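What "read-only dictionary that can be hashed" means in practice is sketched below with a hypothetical `FrozenMapping` class written for Python 3. It is not the library's Sample; it only shows one common way to get dictionary-style reads, no item assignment, and a stable hash.

```python
from collections.abc import Mapping


class FrozenMapping(Mapping):
    """Hypothetical read-only, hashable mapping (not the library's Sample)."""

    def __init__(self, **items):
        # Keep an immutable tuple of sorted pairs as the basis for hashing.
        self._frozen = tuple(sorted(items.items()))
        self._data = dict(self._frozen)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Hashing works as long as every stored value is itself hashable.
        return hash(self._frozen)


info = FrozenMapping(word_delimiters=",;:", sentence_delimiters=".?!")
print(info["word_delimiters"], len({info: "usable as a dictionary key"}))
```

Since `Mapping` defines no `__setitem__`, an attempt such as `info["x"] = 1` raises TypeError.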
classmethod cooked(class_, text, lexicon, word_delimiters, sentence_delimiters)
Returns a Sample instance based on arguments.
See Sample.row() for more information.
>>> with open('sample.txt', 'r') as txt:
...     text = txt.read().decode('UTF-8')
...
>>> with open('lexicon.txt', 'r') as txt:
...     lexicon = txt.read().decode('UTF-8')
...
>>> with open('word_delimiters.txt', 'r') as txt:
...     w_delimiters = txt.read().decode('UTF-8')
...
>>> with open('sentence_delimiters.txt', 'r') as txt:
...     s_delimiters = txt.read().decode('UTF-8')
...
>>> sample = Sample.cooked(text, lexicon, w_delimiters, s_delimiters)
Also, you can do:
>>> type(other)
<class 'loremipsum.generator.Sample'>
>>> sample = Sample.cooked(*other.row())
copy()
Returns a dict representation of itself.
Return type: dict
dump(url, **args)
Dumps a sample to a URL.
classmethod duplicated(class_, sample)
Returns a Sample instance based on a mapping.
Parameters: sample – Can be a dict or a Sample.
See Sample.frozen() for more information.
>>> type(other)
<class 'loremipsum.generator.Sample'>
>>> sample = Sample.duplicated(other.copy())
frozen()
Returns a frozen representation of itself.
Return type: tuple of tuples
classmethod load(class_, url, **args)
Loads a sample from a URL.
static remove(url, **args)
Removes a dumped sample from a URL.
row()
Returns the row components of a sample.
Returns: text, lexicon, word_delimiters, sentence_delimiters
Return type: tuple
classmethod thawed(class_, frozen)
Returns a Sample instance based on the frozen sample.
Parameters: frozen – A frozen representation of a sample: a tuple of tuples.
See Sample.frozen() for more information.
>>> type(other)
<class 'loremipsum.generator.Sample'>
>>> sample = Sample.thawed(other.frozen())
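The freeze and thaw round trip can be illustrated on a plain dictionary. The `freeze` and `thaw` helpers below are hypothetical, and the sorted tuple of (key, value) pairs is an assumed layout; the actual contents of the tuple returned by Sample.frozen() are not specified here.

```python
def freeze(sample):
    """Turn a flat dict into a hashable tuple of (key, value) tuples."""
    return tuple(sorted(sample.items()))


def thaw(frozen):
    """Rebuild the dict from its frozen representation."""
    return dict(frozen)


original = {
    "text": "Lorem ipsum.",
    "word_delimiters": ",",
    "sentence_delimiters": ".",
}
frozen = freeze(original)

# The frozen form is hashable, so it can be cached or stored in a set.
seen = {frozen}
restored = thaw(frozen)
print(restored == original, frozen in seen)
```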