Advanced usage

This module exposes the two classes used to generate random plausible text:

  • The loremipsum.generator.Sample class, which provides the API to extract, load, and dump all the information needed from a sample.
  • The loremipsum.generator.Generator class, which provides the API to actually generate the text, using a sample.
class Generator(sample=None)

Generates random strings of plausible text.

Markov chains are used to generate the random text based on the analysis of a sample text. In the analysis, only paragraph, sentence and word lengths, and some basic punctuation matter – the actual words are ignored. A provided list of words is then used to generate the random text, so that it will have a similar distribution of paragraph, sentence and word lengths.
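To make the idea concrete, here is a minimal sketch of a Markov chain over word lengths. It is not the library's implementation: the function names (build_chain, generate_lengths, words_for), the choice of a first-order chain, and the tiny lexicon are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word length to the list of lengths that followed it."""
    chain = defaultdict(list)
    lengths = [len(w) for w in words]
    for current, following in zip(lengths, lengths[1:]):
        chain[current].append(following)
    return dict(chain)


def generate_lengths(chain, start, amount, rng):
    """Walk the chain, yielding `amount` word lengths."""
    current = start
    for _ in range(amount):
        yield current
        # Fall back to the start length when a state has no successor.
        current = rng.choice(chain.get(current, [start]))


def words_for(lengths, lexicon, rng):
    """Pick a lexicon word for each length; the sample's words are not reused."""
    by_length = defaultdict(list)
    for word in lexicon:
        by_length[len(word)].append(word)
    # Assumes the lexicon contains a word for every requested length.
    return [rng.choice(by_length[n]) for n in lengths]


sample_words = "the cat sat on the mat and the dog ran".split()
lexicon = ["ab", "xy", "foo", "bar", "baz"]
rng = random.Random(0)
chain = build_chain(sample_words)
lengths = list(generate_lengths(chain, 3, 6, rng))
text = " ".join(words_for(lengths, lexicon, rng))
```

Only the lengths of the sample words feed the chain; the output words come entirely from the lexicon, which mirrors the separation described above.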

The attributes of this class should be considered ‘read-only’. Even if you can access the internal state of the generator, you don’t want to mess with it: we are all grown adults.

default(*args, **kwds)

Context manager. Yields a Generator with altered defaults.

This method lets you call several Generator methods with a predefined set of arguments.

>>> from loremipsum import generator
>>> g = generator.Generator(...)
>>> with g.default(sentence_sigma=0.9, sentence_mean=0.9) as short:
...     sentences = short.generate_sentences(3)
...     paragraphs = short.generate_paragraphs(5, incipit=True)
...
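How a context manager can yield an object with altered defaults is sketched below. TinyGenerator is a hypothetical stand-in, not the loremipsum Generator, and copying the instance is only one possible way to keep the original defaults untouched.

```python
import copy
from contextlib import contextmanager


class TinyGenerator(object):
    """Hypothetical stand-in, not the loremipsum Generator."""

    def __init__(self, sentence_mean=10.0, sentence_sigma=3.0):
        self.sentence_mean = sentence_mean
        self.sentence_sigma = sentence_sigma

    @contextmanager
    def default(self, **overrides):
        # Work on a shallow copy so the original keeps its own defaults.
        altered = copy.copy(self)
        for name, value in overrides.items():
            if not hasattr(altered, name):
                raise TypeError("unknown default: %s" % name)
            setattr(altered, name, value)
        yield altered


g = TinyGenerator()
with g.default(sentence_mean=4.0) as short:
    inside = (short.sentence_mean, short.sentence_sigma)
outside = (g.sentence_mean, g.sentence_sigma)
```

In this sketch the override is visible only on the object bound by the with statement; g itself is unchanged afterwards.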
generate_paragraph(**args)

Generates a single paragraph, of random length.

Also accepts the same arguments as generate_sentence().

Parameters:
  • paragraph_len (int) – The length of the paragraph in sentences. Takes precedence over paragraph_mean and paragraph_sigma.
  • paragraph_mean (float) – Override the paragraph mean value.
  • paragraph_sigma (float) – Override the paragraph sigma value.
Returns:

A tuple containing number of sentences, number of words, and the paragraph text.

Return type:

tuple(int, int, str or unicode)
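The precedence of paragraph_len over paragraph_mean and paragraph_sigma can be sketched as follows. The helper name choose_length is hypothetical, and the Gaussian draw, rounding, and clamping to 1 are assumptions about how a mean and a sigma might be turned into a length, not a description of the library's code.

```python
import random


def choose_length(length=None, mean=5.0, sigma=2.0, rng=random):
    """Return the explicit length if given, else draw one around the mean."""
    if length is not None:
        return length
    # Assumption: a Gaussian draw, rounded and clamped to at least 1.
    return max(1, int(round(rng.gauss(mean, sigma))))


rng = random.Random(42)
# An explicit length wins, whatever the mean and sigma are.
fixed = choose_length(length=3, mean=50.0, sigma=10.0, rng=rng)
# Without it, lengths vary around the mean.
drawn = [choose_length(mean=5.0, sigma=2.0, rng=rng) for _ in range(100)]
```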

generate_paragraphs(amount, **args)

Generator method that yields paragraphs, of random length.

Also accepts the same arguments as generate_paragraph().

Parameters:amount (int) – The amount of paragraphs to generate.
Returns:A generator of the specified amount of tuples, as per generate_paragraph().
Return type:generator
generate_sentence(**args)

Generates a single sentence, of random length.

Parameters:
  • incipit (bool) – If True, then the text will begin with the sample text incipit sentence.
  • sentence_len (int) – The length of the sentence in words. Takes precedence over sentence_mean and sentence_sigma.
  • sentence_mean (float) – Override the sentence mean value.
  • sentence_sigma (float) – Override the sentence sigma value.
Returns:

A tuple containing sentence length and sentence text.

Return type:

tuple(int, str or unicode)

generate_sentences(amount, **args)

Generator method that yields sentences, of random length.

Also accepts the same arguments as generate_sentence().

Parameters:amount (int) – The amount of sentences to generate.
Returns:A generator of the specified amount of tuples, as per generate_sentence().
Return type:generator
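Because the return value is a generator, the tuples are produced lazily and can be consumed only once. The sketch below illustrates this with fake_generate_sentences, a hypothetical stand-in that yields (word count, text) tuples in the shape documented for generate_sentence(); it does not call the library.

```python
def fake_generate_sentences(amount):
    """Hypothetical stand-in yielding (word_count, text) tuples lazily."""
    for index in range(amount):
        words = ["lorem"] * (index + 1)
        yield len(words), " ".join(words).capitalize() + "."


sentences = fake_generate_sentences(3)
first = next(sentences)    # nothing is produced until it is requested
rest = list(sentences)     # the remaining two tuples
text = " ".join(sentence for _, sentence in [first] + rest)
leftover = list(sentences)  # the generator is now exhausted
```

Wrap the call in list() if the results need to be traversed more than once.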
generate_word(length=None)

Selects a random word from the lexicon.

Parameters:length (int) – The length of the generated word.
Return type:str or unicode or None
generate_words(amount, length=None)

Creates a generator of the specified amount of words.

Words are randomly selected from the lexicon. Also accepts length argument as per generate_word().

Parameters:amount (int) – the amount of words to be generated
Return type:generator
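Length-constrained selection from a lexicon can be sketched as below. The names pick_word and pick_words are hypothetical, and returning None when no word has the requested length is an assumption suggested by the documented return type, not confirmed library behaviour.

```python
import random


def pick_word(lexicon, length=None, rng=random):
    """Return a random word, optionally of an exact length, or None."""
    candidates = [w for w in lexicon if length is None or len(w) == length]
    return rng.choice(candidates) if candidates else None


def pick_words(lexicon, amount, length=None, rng=random):
    """Lazily yield `amount` words chosen as in pick_word()."""
    for _ in range(amount):
        yield pick_word(lexicon, length, rng)


lexicon = ["a", "ab", "cd", "abc"]
rng = random.Random(1)
two_letters = list(pick_words(lexicon, 4, length=2, rng=rng))
missing = pick_word(lexicon, length=9, rng=rng)
```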
class Sample(**args)

The sample that generated sentences are based on.

Sentences are generated so that they will have a similar distribution of word, sentence and paragraph lengths and punctuation.

Sample text should be a string consisting of a number of paragraphs, each separated by empty lines. Each paragraph should consist of a number of sentences, separated by any character listed as a sentence delimiter. Sentences consist of words separated by whitespace. Words may be followed by any character listed as a word delimiter.
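The structure described above can be illustrated with a small parser. This is a sketch only: split_sample and its default delimiter strings are assumptions, and the library may analyse the text differently.

```python
import re


def split_sample(text, sentence_delimiters=".?!", word_delimiters=",;:"):
    """Split text into paragraphs of sentences of words (delimiters stripped)."""
    pattern = "[" + re.escape(sentence_delimiters) + "]"
    paragraphs = []
    # Paragraphs are separated by one or more empty lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        sentences = []
        for chunk in re.split(pattern, block):
            words = [w.strip(word_delimiters) for w in chunk.split()]
            words = [w for w in words if w]
            if words:
                sentences.append(words)
        if sentences:
            paragraphs.append(sentences)
    return paragraphs


text = "One two, three. Four five!\n\nSix; seven eight?"
parsed = split_sample(text)
sentence_lengths = [len(s) for p in parsed for s in p]
```

The resulting nested lists give the paragraph, sentence and word lengths that a length-based analysis needs.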

Sample instances behave like read-only dictionaries and can be hashed.
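What "read-only dictionary that can be hashed" means in practice is sketched below with FrozenMapping, a hypothetical class built on the standard Mapping abstract base class. It is not the Sample class and makes no claim about how Sample is implemented.

```python
try:
    from collections.abc import Mapping  # Python 3
except ImportError:
    from collections import Mapping      # Python 2


class FrozenMapping(Mapping):
    """Hypothetical read-only, hashable mapping (not the Sample class)."""

    def __init__(self, **items):
        self._items = dict(items)

    def __getitem__(self, key):
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __hash__(self):
        # Assumes all values are themselves hashable.
        return hash(frozenset(self._items.items()))


m = FrozenMapping(word_delimiters=",;:", sentence_delimiters=".?!")
cache = {m: "usable as a dictionary key"}
try:
    m["word_delimiters"] = "|"   # no __setitem__, so this fails
    writable = True
except TypeError:
    writable = False
```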

Parameters:
  • frozen (tuple) – An immutable representation of the sample information. This argument takes precedence over sample, text, lexicon, word_delimiters or sentence_delimiters.
  • sample (dict) – A dictionary of the sample information. This argument takes precedence over text, lexicon, word_delimiters or sentence_delimiters.
  • text (str) – A string containing the sample text. Sample text must contain one or more empty-line delimited paragraphs. Each paragraph must contain one or more sentences, delimited by any char included in the sentence_delimiters param.
  • lexicon (str) – A list of strings to be used as words.
  • word_delimiters (str) – A string of chars used as word delimiters.
  • sentence_delimiters (str) – A string of chars used as sentence delimiters.
Raises:
  • TypeError – If neither frozen nor sample is provided and any of text, lexicon, word_delimiters or sentence_delimiters is missing.
  • ValueError – If an internal Sample could not be created successfully from the supplied arguments.
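The argument precedence and the TypeError condition listed above can be sketched as a small dispatch function. resolve_source is a hypothetical helper written for illustration; it only reports which source would be used and does not build a sample.

```python
REQUIRED = ("text", "lexicon", "word_delimiters", "sentence_delimiters")


def resolve_source(frozen=None, sample=None, **parts):
    """Report which argument group wins: 'frozen', 'sample' or 'parts'."""
    if frozen is not None:
        return "frozen"
    if sample is not None:
        return "sample"
    missing = [name for name in REQUIRED if name not in parts]
    if missing:
        raise TypeError("missing arguments: " + ", ".join(missing))
    return "parts"


first = resolve_source(frozen=(), sample={}, text="ignored")
second = resolve_source(sample={}, text="ignored")
third = resolve_source(text="t", lexicon="l",
                       word_delimiters=",", sentence_delimiters=".")
```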
classmethod cooked(class_, text, lexicon, word_delimiters, sentence_delimiters)

Returns a Sample instance based on arguments.

See Sample.row() for more information.

>>> with open('sample.txt', 'rb') as txt:
...     text = txt.read().decode('UTF-8')
...
>>> with open('lexicon.txt', 'rb') as txt:
...     lexicon = txt.read().decode('UTF-8')
...
>>> with open('word_delimiters.txt', 'rb') as txt:
...     w_delimiters = txt.read().decode('UTF-8')
...
>>> with open('sentence_delimiters.txt', 'rb') as txt:
...     s_delimiters = txt.read().decode('UTF-8')
...
>>> sample = Sample.cooked(text, lexicon, w_delimiters, s_delimiters)

Also, you can do:

>>> type(other)
<class 'loremipsum.generator.Sample'>
>>> sample = Sample.cooked(*other.row())
>>>
copy()

Returns a dict representation of itself.

Return type:dict
dump(url, **args)

Dumps a sample to a URL.

classmethod duplicated(class_, sample)

Returns a Sample instance based on a mapping.

Parameters:sample – Can be a dict or a Sample.

See Sample.copy() for more information.

>>> type(other)
<class 'loremipsum.generator.Sample'>
>>> sample = Sample.duplicated(other.copy())
>>>
frozen()

Returns a frozen representation of itself.

Return type:tuple of tuples
classmethod load(class_, url, **args)

Loads a sample from a URL.

static remove(url, **args)

Removes a dumped sample from a URL.

row()

Returns the raw components of a sample.

Returns:text, lexicon, word_delimiters, sentence_delimiters
Return type:tuple
classmethod thawed(class_, frozen)

Returns a Sample instance based on the frozen sample.

Parameters:frozen – A frozen representation of a sample: a tuple of tuples.

See Sample.frozen() for more information.

>>> type(other)
<class 'loremipsum.generator.Sample'>
>>> sample = Sample.thawed(other.frozen())
>>>
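The idea of a frozen "tuple of tuples" representation and its round trip can be sketched as follows. The freeze and thaw helpers and the flat dictionary are assumptions for illustration; the actual layout of the tuple returned by Sample.frozen() is not documented here.

```python
def freeze(mapping):
    """Turn a flat dict into a sorted tuple of (key, value) tuples."""
    return tuple(sorted(mapping.items()))


def thaw(frozen):
    """Rebuild the dict from its frozen tuple-of-tuples form."""
    return dict(frozen)


info = {"word_delimiters": ",;:", "sentence_delimiters": ".?!"}
frozen = freeze(info)
restored = thaw(frozen)
# Unlike the dict, the frozen form is hashable and can key a cache.
cache = {frozen: "cached result"}
```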