This page contains examples of common sed-like operations (taken from Sed One-Liners Explained).
Double-space a file.
>>> stream_lines('foo.txt') | filt(lambda l: l + '\n') | to_stream(sys.stdout)
Double-space a file that already has blank lines in it.
>>> stream_lines('foo.txt') | filt(lambda l: l + '\n', pre = lambda l: l.strip()) | to_stream(sys.stdout)
Triple-space a file.
>>> stream_lines('foo.txt') | filt(lambda l: l + '\n\n') | to_stream(sys.stdout)
Undo double-spacing.
>>> stream_lines('foo.txt') | enumerate_() | \
... consec_group(lambda (n, _): int(n / 2), lambda d: select_inds(1) | nth(0)) | \
... to_stream(sys.stdout)
Note
The first line,
>>> stream_lines('foo.txt') | enumerate_() | \
transforms the lines of the file into tuples whose first item is a running index.
The second line,
... consec_group(lambda (n, _): int(n / 2), lambda d: select_inds(1) | nth(0)) | \
uses the dagpype.consec_group() filter. The key determining each consecutive group is the integer part of the running index divided by 2 (hence every two consecutive lines form a group); the pipe for a consecutive group selects the element at index 1 of each tuple (the line itself), then pipes the result to a terminal stage that keeps only the first element it receives.
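The grouping idea in the note above can be sketched in plain Python with itertools.groupby. This is only an illustration of the technique, not dagpype code; the function name undo_double_spacing is hypothetical.

```python
from itertools import groupby

def undo_double_spacing(lines):
    """Keep the first line of every consecutive pair of lines."""
    indexed = enumerate(lines)
    # Consecutive items land in the same group when index // 2 is equal.
    for _, pair in groupby(indexed, key=lambda item: item[0] // 2):
        # Take the first (index, line) tuple of the pair and keep its line.
        yield next(pair)[1]

print(list(undo_double_spacing(['a', '', 'b', '', 'c', ''])))
```

As in the pipeline, the blank second line of each pair is simply dropped.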
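Insert a blank line above every line that matches “regex”.
>>> stream_lines('foo.txt') | \
... filt(lambda l: '\nregex' if l == 'regex' else l) | to_stream(sys.stdout)
Insert a blank line below every line that matches “regex”.
>>> stream_lines('foo.txt') | filt(lambda l: 'regex\n' if l == 'regex' else l) | to_stream(sys.stdout)
Insert a blank line above and below every line that matches “regex”.
>>> stream_lines('foo.txt') | filt(lambda l: '\nregex\n' if l == 'regex' else l) | to_stream(sys.stdout)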
Number each line of a file (left-aligned numbers).
>>> stream_lines('foo.txt') | enumerate_() | \
... filt(lambda (n, l): '{:<5} {}'.format(n, l)) | \
... to_stream(sys.stdout)
Number each line of a file (right-aligned numbers).
>>> stream_lines('foo.txt') | enumerate_() | \
... filt(lambda (n, l): '{:>5} {}'.format(n, l)) | \
... to_stream(sys.stdout)
Number each non-empty line of a file.
>>> source((int(l.strip() != ''), l.strip()) for l in open('foo.txt')) | \
... (select_inds(0) | cum_sum()) + select_inds(1) | \
... filt(lambda (n, l): '{} {}'.format(n, l) if l else '') | \
... to_stream('tmp.txt')
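The running count over non-empty lines can also be written as a plain Python generator. This is a sketch of the same idea without dagpype; the name number_nonempty is hypothetical.

```python
def number_nonempty(lines):
    """Prefix each non-empty line with a running count; keep blank lines blank."""
    count = 0
    for line in lines:
        text = line.strip()
        if text:
            # Only non-empty lines advance the counter.
            count += 1
            yield '{} {}'.format(count, text)
        else:
            yield ''

print(list(number_nonempty(['a\n', '\n', 'b\n'])))
```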
Count the number of lines in a file.
>>> print stream_lines('foo.txt') | count()
Delete leading whitespace (tabs and spaces) from each line.
>>> stream_lines('foo.txt') | filt(lambda l: l.lstrip()) | to_stream(sys.stdout)
Delete trailing whitespace (tabs and spaces) from each line.
>>> stream_lines('foo.txt') | filt(lambda l: l.rstrip()) | to_stream(sys.stdout)
Delete both leading and trailing whitespace from each line.
>>> stream_lines('foo.txt') | filt(lambda l: l.strip()) | to_stream(sys.stdout)
Insert five blank spaces at the beginning of each line.
>>> stream_lines('foo.txt') | filt(lambda l: 5 * ' ' + l) | to_stream(sys.stdout)
Align lines right on a 79-column width.
>>> stream_lines('foo.txt') | filt(lambda l: '{:>79}'.format(l)) | to_stream(sys.stdout)
Center lines on a 79-column width.
>>> stream_lines('foo.txt') | filt(lambda l: '{:^79}'.format(l)) | to_stream(sys.stdout)
Substitute (find and replace) the first occurrence of “foo” with “bar” on each line.
>>> stream_lines('foo.txt') | filt(lambda l: l.replace('foo', 'bar', 1)) | to_stream(sys.stdout)
Substitute (find and replace) the fourth occurrence of “foo” with “bar” on each line.
>>> stream_lines('foo.txt') | \
... filt(lambda l: 'foo'.join(l.split('foo', 4)[: 4]) + 'bar' + ''.join(l.split('foo', 4)[4: ]) \
... if len(l.split('foo')) > 4 else l) | \
... to_stream(sys.stdout)
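The split-and-rejoin trick is easier to follow as a standalone helper. The sketch below is plain Python, and replace_nth is a hypothetical name; it relies on the fact that a line with n occurrences of the substring splits into n + 1 parts.

```python
def replace_nth(line, old, new, n):
    """Replace only the n-th (1-based) occurrence of old in line."""
    parts = line.split(old, n)
    # n occurrences yield n + 1 parts; fewer parts means no n-th occurrence.
    if len(parts) <= n:
        return line
    # Rejoin the text before the n-th occurrence, then append the replacement
    # and the untouched remainder.
    return old.join(parts[:n]) + new + parts[n]

print(replace_nth('foo foo foo foo foo', 'foo', 'bar', 4))
```

A line with fewer than n occurrences is returned unchanged.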
Substitute (find and replace) all occurrences of “foo” with “bar” on each line.
>>> stream_lines('foo.txt') | \
... filt(lambda l: l.replace('foo', 'bar')) | \
... to_stream(sys.stdout)
Substitute (find and replace) the first occurrence of a repeated occurrence of “foo” with “bar”.
>>> stream_lines('foo.txt') | \
... filt(lambda l: l.replace('foo', 'bar', 1)) | \
... to_stream(sys.stdout)
Substitute (find and replace) only the last occurrence of “foo” with “bar”.
>>> stream_lines('foo.txt') | \
... filt(lambda l: 'bar'.join(l.rsplit('foo', 1))) | \
... to_stream(sys.stdout)
Substitute all occurrences of “foo” with “bar” on all lines that contain “baz”.
>>> stream_lines('foo.txt') | \
... filt(lambda l: l.replace('foo', 'bar') if 'baz' in l else l) | \
... to_stream(sys.stdout)
Substitute all occurrences of “foo” with “bar” on all lines that DO NOT contain “baz”.
>>> stream_lines('foo.txt') | \
... filt(lambda l: l.replace('foo', 'bar') if 'baz' not in l else l) | \
... to_stream(sys.stdout)
Change text “scarlet”, “ruby” or “puce” to “red”.
>>> stream_lines('foo.txt') | \
... filt(lambda l: l.replace('scarlet', 'red').replace('ruby', 'red').replace('puce', 'red')) | \
... to_stream(sys.stdout)
Join pairs of lines side-by-side (emulates “paste” Unix command).
>>> stream_lines('foo.txt') | enumerate_() | \
... consec_group(
... lambda (n, _): int(n / 2),
... lambda d: select_inds(1) | (to_list() | sink(lambda l: '\t'.join(l)))) | \
... to_stream(sys.stdout)
Note
See Item 4 for an explanation of the dagpype.consec_group() filter use here.
Append a line to the next if it ends with a backslash “\”.
>>> c = [0]
>>> stream_lines('foo.txt') | \
... consec_group(
... lambda l: (c[0], c.__setitem__(0, c[0] + int(len(l) == 0 or l[-1] != '\\'))),
... lambda d: filt(lambda l: l[: -1] if len(l) > 0 and l[-1] == '\\' else l) | sum_()) | \
... to_stream(sys.stdout)
Note
The first line,
>>> c = [0]
initializes c to a list with a single element (0).
Lines 3-5,
... consec_group(
... lambda l: (c[0], c.__setitem__(0, c[0] + int(len(l) == 0 or l[-1] != '\\'))),
... lambda d: filt(lambda l: l[: -1] if len(l) > 0 and l[-1] == '\\' else l) | sum_()) | \
group each line ending in \ with the line that follows it, using the dagpype.consec_group() filter. The key determining each consecutive group is a counter incremented after each line not ending in \ (see Stupid lambda tricks); the pipe for a consecutive group strips the trailing \ character and joins the lines.
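The same continuation-joining logic can be sketched as a plain Python generator that buffers lines until one does not end in a backslash. This is an illustration only; join_continuations is a hypothetical name and not part of dagpype.

```python
def join_continuations(lines):
    """Join each line ending in a backslash with the line(s) that follow it."""
    pending = []
    for line in lines:
        if line.endswith('\\'):
            # Drop the trailing backslash and wait for the next line.
            pending.append(line[:-1])
        else:
            pending.append(line)
            yield ''.join(pending)
            pending = []
    # A dangling continuation at end of input is emitted as-is.
    if pending:
        yield ''.join(pending)

print(list(join_continuations(['a \\', 'b', 'c'])))
```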
Append a line to the previous if it starts with an equal sign “=”.
>>> c = [0]
>>> stream_lines('foo.txt') | \
... consec_group(
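... lambda l: (c.__setitem__(0, c[0] + int(len(l) == 0 or l[0] != '=')), c[0]),
... lambda d: filt(lambda l: l[1 :] if len(l) > 0 and l[0] == '=' else l) | sum_()) | \
... to_stream(sys.stdout)
Note
See Item 39 for an explanation. Here the counter is incremented before the key is read, so a line starting with = gets the same key as the line before it.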
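For comparison, appending to the previous line can be sketched in plain Python by holding back one line at a time. The helper name join_equals is hypothetical, and this sketch keeps a leading “=” on the very first line, since there is no previous line to append to; that choice is an assumption.

```python
def join_equals(lines):
    """Append each line starting with '=' to the line before it."""
    current = None
    for line in lines:
        if line.startswith('=') and current is not None:
            # Strip the '=' and glue the rest onto the held-back line.
            current += line[1:]
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current

print(list(join_equals(['a', '=b', 'c'])))
```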
Add a blank line after every five lines.
>>> stream_lines('foo.txt') | enumerate_(1) | \
... filt(lambda (n, l): l if n % 5 else (l + '\n')) | to_stream(sys.stdout)
Note
The first line,
>>> stream_lines('foo.txt') | enumerate_(1) | \
transforms the lines of the file into tuples whose first item is a running index starting from 1.
Print the first 10 lines of a file.
>>> print stream_lines('foo.txt') | slice_(0, 10) | to_list()
Print the first line of a file.
>>> print stream_lines('foo.txt') | nth(0)
Print the last 10 lines of a file.
>>> print stream_lines('foo.txt') | tail(10) | to_list()
Print the last 2 lines of a file.
>>> print stream_lines('foo.txt') | tail(2) | to_list()
Print the last line of a file.
>>> print stream_lines('foo.txt') | nth(-1)
Print next-to-the-last line of a file.
>>> print stream_lines('foo.txt') | nth(-2)
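Taking items from the end of a stream without loading the whole file can be sketched in plain Python with a bounded collections.deque. The names last_n and nth_from_end are hypothetical and are not dagpype functions.

```python
from collections import deque

def last_n(lines, n):
    """Return the last n items of an iterable as a list."""
    # A deque with maxlen=n silently discards older items as new ones arrive.
    return list(deque(lines, maxlen=n))

def nth_from_end(lines, k):
    """Return the k-th item from the end (k=1 is the last item)."""
    window = deque(lines, maxlen=k)
    if len(window) < k:
        raise IndexError('fewer than {} lines'.format(k))
    return window[0]

print(last_n(range(20), 3))
print(nth_from_end('abcde', 2))
```

Only the last n (or k) lines are held in memory at any time.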
Print only the lines that match a regular expression (emulates “grep”).
>>> stream_lines('foo.txt') | grep(re.compile(r'foo(.+?)bar')) | to_stream(sys.stdout)
Print only the lines that do not match a regular expression.
>>> r = re.compile(r'foo(.+?)bar')
>>> stream_lines('foo.txt') | filt(pre = lambda l: r.match(l) is None) | to_stream(sys.stdout)
Print the line immediately before regexp, but not the line containing the regexp.
>>> r = re.compile(r'foo(.+?)bar')
>>> stream_lines('foo.txt') | to(lambda l: r.match(l)) | (nth(-2) | to_stream(sys.stdout))
Print the line immediately after regexp, but not the line containing the regexp.
>>> r = re.compile(r'foo(.+?)bar')
>>> stream_lines('foo.txt') | from_(lambda l: r.match(l)) | (nth(1) | to_stream(sys.stdout))
Print one line before and after regexp. Also print the line matching regexp and its line number. (emulates “grep -A1 -B1”).
>>> r = re.compile(r'foo(.+?)bar')
>>> stream_lines('foo.txt') | \
... (to(lambda l: r.match(l)) | (nth(-2))) + \
... (from_(lambda l: r.match(l)) | (nth(1))) + \
... (enumerate_() | filt(pre = lambda (n, l): r.match(l)) | nth(0))
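A plain Python sketch of the same lookup, for the first matching line only, is shown below. It assumes that a triple of (line before, line after, (index, matching line)) is the desired result; first_match_with_context is a hypothetical name.

```python
import re

def first_match_with_context(lines, pattern):
    """Return (before, after, (index, line)) for the first matching line."""
    regex = re.compile(pattern)
    before = None
    numbered = enumerate(lines)
    for n, line in numbered:
        if regex.match(line):
            # Peek at the following line, if there is one.
            after = next(numbered, (None, None))[1]
            return before, after, (n, line)
        before = line
    return None

print(first_match_with_context(['x', 'foo1bar', 'y'], r'foo(.+?)bar'))
```

The line before or after is None when the match is on the first or last line.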
Grep for “AAA” and “BBB” and “CCC” in any order.
>>> stream_lines('foo.txt') | \
... filt(pre = lambda l: 'AAA' in l and 'BBB' in l and 'CCC' in l) | to_stream(sys.stdout)
Grep for “AAA” and “BBB” and “CCC” in that order.
>>> stream_lines('foo.txt') | \
... grep(re.compile(r'(.*)AAA(.*)BBB(.*)CCC(.*)')) | to_stream(sys.stdout)
Grep for “AAA” or “BBB”, or “CCC”.
>>> stream_lines('foo.txt') | \
... filt(pre = lambda l: 'AAA' in l or 'BBB' in l or 'CCC' in l) | to_stream(sys.stdout)
Print a paragraph that contains “AAA”. (Paragraphs are separated by blank lines).
>>> stream_lines('foo.txt') | \
... consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
... filt(lambda ls: '\n'.join(ls) + '\n', pre = lambda ls: sum(['AAA' in l for l in ls]) > 0) | \
... to_stream(sys.stdout)
Note
The second line,
... consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
groups paragraphs together using the dagpype.consec_group() filter. The key determining each consecutive group is whether a line is empty (hence all lines in a paragraph get a False key, whereas separating lines get a True key); the pipe for a consecutive group places the lines in a list.
The third line,
... filt(lambda ls: '\n'.join(ls) + '\n', pre = lambda ls: sum(['AAA' in l for l in ls]) > 0) | \
filters lists: the precondition is that ‘AAA’ appears somewhere in the list, and the transformation function joins the lines using a newline.
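The paragraph grouping can also be sketched with itertools.groupby in plain Python. This is an illustration of the technique rather than dagpype code; paragraphs_containing is a hypothetical name.

```python
from itertools import groupby

def paragraphs_containing(lines, word):
    """Yield each blank-line-separated paragraph that contains word."""
    # groupby yields runs of consecutive lines sharing the same key;
    # is_blank is True for the separator lines between paragraphs.
    for is_blank, group in groupby(lines, key=lambda l: l.strip() == ''):
        if is_blank:
            continue
        para = list(group)
        if any(word in l for l in para):
            yield '\n'.join(para) + '\n'

text = ['a', 'AAA b', '', 'c', '', 'd AAA']
print(list(paragraphs_containing(text, 'AAA')))
```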
Print a paragraph if it contains “AAA” and “BBB” and “CCC” in any order.
>>> stream_lines('foo.txt') | \
... consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
... filt(
... lambda ls: '\n'.join(ls) + '\n',
... pre = lambda ls: all(any(w in l for l in ls) for w in ('AAA', 'BBB', 'CCC'))) | \
... to_stream(sys.stdout)
Note
See Item 58 for an explanation.
Print a paragraph if it contains “AAA” or “BBB” or “CCC”.
>>> stream_lines('foo.txt') | \
... consec_group(lambda l: l.strip() == '', lambda is_para: to_list()) | \
... filt(
... lambda ls: '\n'.join(ls) + '\n',
... pre = lambda ls: sum(['AAA' in l or 'BBB' in l or 'CCC' in l for l in ls]) > 0) | \
... to_stream(sys.stdout)
Note
See Item 58 for an explanation.
Print only the lines that are 65 characters in length or more.
>>> print stream_lines('foo.txt') | filt(pre = lambda l: len(l) >= 65) | to_list()
Print only the lines that are less than 65 chars.
>>> print stream_lines('foo.txt') | filt(pre = lambda l: len(l) < 65) | to_list()
Beginning at line 3, print every 7th line.
>>> print stream_lines('foo.txt') | slice_(3, None, 7) | to_list()
Delete duplicate, consecutive lines from a file.
>>> stream_lines('foo.txt') | \
... consec_group(lambda l: l, lambda l: nth(0)) | to_stream(sys.stdout)
Delete duplicate, nonconsecutive lines from a file.
>>> stream_lines('foo.txt') | \
... group(lambda l: l, lambda l: nth(0)) | to_stream(sys.stdout)
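Both kinds of de-duplication have short plain Python equivalents, sketched here for comparison. The function names are hypothetical; the second one keeps lines in first-seen order, which is an assumption about the desired output rather than a statement about dagpype's group().

```python
from itertools import groupby

def drop_consecutive_duplicates(lines):
    """Collapse each run of identical adjacent lines into one line."""
    return [key for key, _ in groupby(lines)]

def drop_all_duplicates(lines):
    """Keep only the first occurrence of each line, in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result

data = ['a', 'a', 'b', 'a', 'b', 'b']
print(drop_consecutive_duplicates(data))
print(drop_all_duplicates(data))
```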
Delete all lines except duplicate consecutive lines (emulates “uniq -d”).
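No pipeline is given for this item. As a hedged plain Python sketch (using itertools.groupby rather than dagpype, with the hypothetical name consecutive_duplicates_only), one representative of each run of repeated adjacent lines could be kept like this:

```python
from itertools import groupby

def consecutive_duplicates_only(lines):
    """Yield one copy of each line that appears at least twice in a row."""
    for key, run in groupby(lines):
        # Count the run length without building a list.
        if sum(1 for _ in run) > 1:
            yield key

print(list(consecutive_duplicates_only(['a', 'a', 'b', 'c', 'c', 'c', 'a'])))
```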
Delete the first 10 lines of a file.
>>> stream_lines('foo.txt') | slice_(10, None) | to_stream(sys.stdout)
Delete the last line of a file.
>>> stream_lines('foo.txt') | skip(-1) | to_stream(sys.stdout)
Delete the last 2 lines of a file.
>>> stream_lines('foo.txt') | skip(-2) | to_stream(sys.stdout)
Delete the last 10 lines of a file.
>>> stream_lines('foo.txt') | skip(-10) | to_stream(sys.stdout)
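Dropping the last n lines of a stream requires holding n lines back, because the end of the input is not known in advance. A plain Python sketch of that buffering idea follows; drop_last is a hypothetical name, not a dagpype function.

```python
from collections import deque

def drop_last(lines, n):
    """Yield every item except the final n."""
    buffer = deque()
    for line in lines:
        buffer.append(line)
        # Once more than n items are buffered, the oldest one is
        # guaranteed not to be among the last n.
        if len(buffer) > n:
            yield buffer.popleft()

print(list(drop_last(range(5), 2)))
```

Inputs with n or fewer lines produce no output.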
Delete every 8th line.
>>> stream_lines('foo.txt') | enumerate_(1) | \
... filt(lambda (n, l): l, pre = lambda (n, l): n % 8) | to_stream(sys.stdout)
Delete lines that match regular expression pattern.
>>> r = re.compile(r'foo(.+?)bar')
>>> stream_lines('foo.txt') | filt(pre = lambda l: r.match(l) is None) | to_stream(sys.stdout)
Delete all blank lines in a file (emulates “grep ‘.’”).
>>> stream_lines('foo.txt') | filt(pre = lambda l: l.strip()) | to_stream(sys.stdout)