Module html2text

html2text: Turn HTML into equivalent Markdown-structured text.

There is some specific mtconverter code at the end to define the HTML-to-text transformation.

Copyright (C) 2004-2008 Aaron Swartz. GNU GPL 3. Copyright (C) 2008 Logilab S.A.


Version: 2.38

Author: Aaron Swartz (me@aaronsw.com)

Copyright: (C) 2004-2008 Aaron Swartz. GNU GPL 3.

Classes
html_to_formatted_text
Transforms HTML to formatted plain text.
Functions
 
name2cp(k)

charref(name)

entityref(c)

replaceEntities(s)

unescape(s)

fixattrs(attrs)

onlywhite(line)
Return true if the line consists only of whitespace characters.

optwrap(text)
Wrap all paragraphs in the provided text.

hn(tag)

wrapwrite(text)

html2text_file(html, out=wrapwrite, baseurl='', encoding='utf8')

html2text(html, baseurl='', encoding='utf8')
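
As a rough illustration of what the onlywhite and optwrap summaries describe, the sketch below wraps each paragraph to BODY_WIDTH columns using the standard textwrap module. The helper names is_only_whitespace and wrap_paragraphs are hypothetical stand-ins, and the module's actual code may handle list items, indentation, and blank lines differently.

```python
import textwrap

BODY_WIDTH = 78  # same value as the module-level variable documented on this page


def is_only_whitespace(line):
    """Hypothetical stand-in for onlywhite: True for empty or all-whitespace lines."""
    return line.strip() == ""


def wrap_paragraphs(text, width=BODY_WIDTH):
    """Hypothetical stand-in for optwrap: wrap each blank-line-separated paragraph."""
    if not width:
        return text  # assumption: a width of 0 disables wrapping
    wrapped = []
    for para in text.split("\n\n"):
        if is_only_whitespace(para):
            continue  # skip paragraphs that carry no visible text
        wrapped.append("\n".join(textwrap.wrap(para, width)))
    return "\n\n".join(wrapped)


sample = "word " * 40 + "\n\n   \n\nshort paragraph"
result = wrap_paragraphs(sample)
```

Every line of result is at most BODY_WIDTH characters long, and the whitespace-only paragraph in the middle of sample is dropped.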
Variables
  __contributors__ = ["Martin 'Joey' Schulze", "Ricardo Reyes", ...
  UNICODE_SNOB = 0
  LINKS_EACH_PARAGRAPH = 0
  BODY_WIDTH = 78
  SKIP_INTERNAL_LINKS = False
  unifiable = {'rsquo': "'", 'lsquo': "'", 'rdquo': '"', 'ldquo'...
  unifiable_n = {}
  r_unescape = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")
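
To show what the r_unescape pattern captures, the short sketch below runs it over a sample string. The pattern is copied from the listing above; the sample text and the variable names sample and found are illustrative only.

```python
import re

# Pattern copied from the r_unescape variable listed above.
r_unescape = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")

# Named, decimal, and hexadecimal references, plus a bare "&" that must not match.
sample = "caf&eacute; &#169; &#xA9; &amp; AT&T"

# group(1) holds the reference without the leading "&" and trailing ";".
found = [m.group(1) for m in r_unescape.finditer(sample)]
print(found)  # ['eacute', '#169', '#xA9', 'amp']
```

The bare ampersand in "AT&T" is not matched because the pattern requires a terminating semicolon.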
Variables Details

__contributors__

Value:
["Martin 'Joey' Schulze", "Ricardo Reyes", "Kevin Jay North"]

unifiable

Value:
{'rsquo': "'", 'lsquo': "'", 'rdquo': '"', 'ldquo': '"', 'copy': '(C)',
 'mdash': '--', 'nbsp': ' ', 'rarr': '->', 'larr': '<-', 'middot': '*',
 'ndash': '-', 'oelig': 'oe', 'aelig': 'ae',
 'agrave': 'a', 'aacute': 'a', 'acirc': 'a', 'atilde': 'a', 'auml': 'a',
 'aring': 'a',
 'egrave': 'e', 'eacute': 'e', 'ecirc': 'e', 'euml': 'e',
 'igrave': 'i', 'iacute': 'i', 'icirc': 'i', 'iuml': 'i',
 'ograve': 'o', 'oacute': 'o', 'ocirc': 'o', 'otilde': 'o', 'ouml': 'o',
 'ugrave': 'u', 'uacute': 'u', 'ucirc': 'u', 'uuml': 'u'}
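
A table like unifiable can be combined with the r_unescape pattern to replace character references with plain-ASCII approximations. The sketch below is one possible way to do this in Python 3, not the module's own code: replace_reference, to_plain, and unifiable_by_codepoint are hypothetical names, only a subset of the table is used, and the handling of unknown or malformed references is an assumption.

```python
import re
from html.entities import name2codepoint  # Python 3 standard library

r_unescape = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")

# Small subset of the unifiable table shown above.
unifiable = {'rsquo': "'", 'ldquo': '"', 'rdquo': '"', 'mdash': '--',
             'copy': '(C)', 'eacute': 'e', 'rarr': '->'}

# Assumption: the same replacements keyed by code point, for numeric references.
unifiable_by_codepoint = {name2codepoint[k]: v for k, v in unifiable.items()}


def replace_reference(match):
    """Hypothetical callback: turn one matched reference into plain text."""
    ref = match.group(1)
    if ref.startswith("#"):
        digits = ref[1:]
        try:
            code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
            return unifiable_by_codepoint.get(code, chr(code))
        except (ValueError, OverflowError):
            return match.group(0)  # malformed number: leave the text untouched
    if ref in unifiable:
        return unifiable[ref]
    if ref in name2codepoint:
        return chr(name2codepoint[ref])
    return match.group(0)  # unknown name: leave the text untouched


def to_plain(text):
    """Replace every character reference in text using replace_reference."""
    return r_unescape.sub(replace_reference, text)


print(to_plain("caf&eacute; &mdash; &#169; 2008 &rarr; &lt;b&gt; &bogus;"))
# cafe -- (C) 2008 -> <b> &bogus;
```

Keying a second table by code point lets numeric references such as &#169; receive the same ASCII replacement as their named form.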