COMP(2041|9044) 24T1 — Assignment 2: Eddy
Assignment 2: Eddy
Aims
This assignment aims to give you:
· practice in Python programming generally.
· a clear and concrete understanding of sed's core semantics.
Introduction
Your task in this assignment is to write a Python program eddy.py which implements the Eddy editing commands described below. Eddy editing commands are a simple subset of the important tool Sed, which you met earlier in the course.
Sed is a very complex program that has many commands.
Eddy contains only a few of the most important Sed commands.
There are also some simplifying assumptions below, which make your task easier.
You must implement Eddy in Python only. The Permitted Languages section below has more information.
NOTE:
The more challenging parts of this assignment may require some research. You may need to explore documentation and other information for Python, Sed, and regexes. Searching for this type of information is a very useful skill to practice.
Reference implementation
Many aspects of this assignment are not fully specified in this document;
instead, you must match the behaviour of the reference implementation: 2041 eddy
Provision of a reference implementation is a common method to provide or define an operational specification, and it's something you will likely need to do after you leave UNSW.
Discovering and matching the reference implementation's behaviour is deliberately part of the assignment, and will take some thought.
If you discover what you believe to be a bug in the reference implementation, report it in the class forum.
Andrew and Dylan may fix the bug, or indicate that you do not need to match the reference implementation's behaviour in this case.
Eddy Commands
Subset 0
In subset 0 eddy.py will always be given a single Eddy command as a command-line argument. The Eddy command will be one of 'q', 'p', 'd', or 's' (see below).
The only other command-line argument possible in subset 0 is the -n option. Input files will not be specified in subset 0.
For subset 0 eddy.py need only read from standard input.
Subset 0: q - quit command
The Eddy q command causes eddy.py to exit, for example:
$ seq 1 5 | 2041 eddy '3q'
1
2
3
$ seq 9 20 | 2041 eddy '3q'
9
10
11
$ seq 10 15 | 2041 eddy '/.1/q'
10
11
$ seq 500 600 | 2041 eddy '/^.+5$/q'
500
501
502
503
504
505
$ seq 100 1000 | 2041 eddy '/1{3}/q'
100
101
...
Eddy commands are applied to input lines as they are read. The q command means eddy.py may not read all input.
For example, the command yes prints an "infinite" number of lines containing (by default) "y".
$ yes | 2041 eddy '3q'
y
y
y
This means eddy.py cannot read all input first, e.g. into a list, before applying commands.
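The streaming requirement above can be sketched as follows. This is a minimal illustration only: quit_at_line is a hypothetical helper name, it handles just a numeric address for q, and it is not taken from the reference implementation.

```python
import itertools


def quit_at_line(lines, quit_line):
    """Yield lines one at a time, stopping after line number quit_line.

    lines may be any iterable, including an endless one, because
    nothing is read beyond the line that triggers the quit.
    """
    for number, line in enumerate(lines, start=1):
        yield line  # the line that triggers q is still printed
        if number == quit_line:
            return


# An endless source standing in for the output of the yes command.
endless = itertools.repeat("y\n")
first_three = list(quit_at_line(endless, 3))
```

In eddy.py the same loop could iterate over sys.stdin directly, since a Python file object yields lines one at a time without waiting for the end of input.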
Subset 0: p - print command
The Eddy p command prints the input line, for example:
$ seq 1 5 | 2041 eddy '2p'
1
2
2
3
4
5
$ seq 7 11 | 2041 eddy '4p'
7
8
9
10
10
11
$ seq 65 85 | 2041 eddy '/^7/p'
65
66
67
68
69
70
...
Subset 0: d - delete command
The Eddy d command deletes the input line, for example:
$ seq 1 5 | 2041 eddy '4d'
1
2
3
5
$ seq 1 100 | 2041 eddy '/.{2}/d'
1
2
3
4
5
6
7
8
9
$ seq 11 20 | 2041 eddy '/[2468]/d'
11
13
15
17
19
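The p and d examples above are consistent with a loop that prints every surviving line by default, prints a selected line one extra time for p, and skips a selected line for d. A minimal sketch under that assumption follows; selects and run are hypothetical names, and an address is represented as None, an int, or a regex string purely for illustration.

```python
import re


def selects(address, number, line):
    """Return True if the address selects this line.

    address is None (every line), an int line number, or a regex string.
    """
    if address is None:
        return True
    if isinstance(address, int):
        return number == address
    return re.search(address, line) is not None


def run(lines, command, address=None):
    """Apply a single 'p' or 'd' command, yielding the output lines."""
    for number, line in enumerate(lines, start=1):
        selected = selects(address, number, line)
        if command == "d" and selected:
            continue  # a deleted line is never printed
        if command == "p" and selected:
            yield line  # the explicit print from p
        yield line  # the default print of every surviving line
```

For instance, list(run(["1", "2", "3", "4", "5"], "p", 2)) gives the same six lines as the seq 1 5 example for '2p'.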
Subset 0: s - substitute command
The Eddy s command replaces the first match of the specified regex on the input line with the given replacement, for example:
...
zzz3
zzz4
zzz5
zzz6
zzz7
zzz8
zzz9
20
$ seq 100 111 | 2041 eddy 's/11/zzz/'
100
101
102
103
104
105
106
107
108
109
zzz0
zzz1
The substitute command can optionally be followed by the modifier character g, which replaces every match on the line instead of only the first, for example:
$ echo Hello Andrew | 2041 eddy 's/e//'
Hllo Andrew
$ echo Hello Andrew | 2041 eddy 's/e//g'
Hllo Andrw
g is the only permitted modifier character.
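The difference between s with and without g can be sketched with Python's re.sub, whose count argument limits the number of replacements. The function name substitute is hypothetical. Passing the replacement through a function makes re.sub insert it literally; whether the reference implementation gives any characters in the replacement a special meaning is not covered here and would need to be checked against 2041 eddy.

```python
import re


def substitute(line, regex, replacement, replace_all=False):
    """Replace the first match of regex in line, or every match if replace_all."""
    # count=1 stops after the first match; count=0 means no limit.
    count = 0 if replace_all else 1
    # A function replacement is inserted as-is, with no escape processing.
    return re.sub(regex, lambda match: replacement, line, count=count)
```

With this sketch, substitute("Hello Andrew", "e", "") removes only the first e, while substitute("Hello Andrew", "e", "", replace_all=True) removes both.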
Like the other commands, the substitute command can be given an address selecting the lines it applies to:
...
53
54
99
56
57
58
59
60
$ seq 100 111 | 2041 eddy '/1.1/s/1/-/g'
100
-0-
102
103
104
105
106
107
108
109
110
---
Subset 0: -n command line option
The Eddy -n command line option stops input lines being printed by default.
$ seq 1 5 | 2041 eddy -n '3p'
3
$ seq 2 3 20 | 2041 eddy -n '/^1/p'
11
14
17
The -n command line option is only useful in conjunction with the p command, but can still be used with the other commands.
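One way to picture -n is as a flag that switches off the default print while leaving the explicit print from p untouched. The sketch below assumes exactly that; print_selected and its quiet parameter are hypothetical names, and the address is passed as a predicate only to keep the example short.

```python
import re


def print_selected(lines, selected, quiet=False):
    """Yield the output of a p command.

    selected(number, line) decides whether the address matches;
    quiet=True models the -n option.
    """
    for number, line in enumerate(lines, start=1):
        if selected(number, line):
            yield line  # explicit output from p, unaffected by -n
        if not quiet:
            yield line  # default output, suppressed by -n


# Mirrors: seq 2 3 20 | 2041 eddy -n '/^1/p'
numbers = [str(n) for n in range(2, 21, 3)]
matched = list(print_selected(
    numbers, lambda number, line: re.search("^1", line) is not None, quiet=True))
```

With quiet left as False, the same call would print every line once and each matching line twice.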
Subset 0: Addresses
All Eddy commands in subset 0 can optionally be preceded by an address specifying the line(s) they apply to.
In subset 0, this address can either be a line number or a regex. The line number must be a positive integer.
The regex must be delimited with slash / characters.
Subset 0: Regexes
In subset 0, you can assume backslashes \, semicolons ; and commas , do not appear in address or substitution regexes.
In subset 0, regexes are delimited with slash / characters, so you can assume slashes do not appear in regexes.
In subset 0 and all other subsets, you can assume the regex is correct. You do not have to check for errors in the regex. In subset 0 and all other subsets, you can assume the regex is a POSIX-compatible extended regular expression.
In subset 0 and all other subsets, you can assume the regex is compatible with Python.
In other words, the regex can be used directly as a Python regular expression, for example passed to re.search, and will have the same meaning.
Note, if testing regular expressions with sed, you need to specify sed -E for extended regular expressions to work.
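The subset 0 address forms and regex assumptions above suggest a simple way to separate an address from the command that follows it. The sketch below is one possible approach, not the reference implementation's: parse_address is a hypothetical name, and it ignores whitespace handling, which this section does not specify.

```python
import re


def parse_address(text):
    """Split a subset 0 Eddy command into (address, rest).

    address is None, an int line number, or a regex string.
    """
    if text.startswith("/"):
        # Subset 0 regexes contain no slashes, so the next slash closes it.
        end = text.index("/", 1)
        return text[1:end], text[end + 1:]
    digits = re.match(r"[0-9]+", text)
    if digits:
        return int(digits.group()), text[digits.end():]
    return None, text


# The extracted regex is used unchanged as a Python regular expression.
address, rest = parse_address("/^.+5$/q")
matches_505 = re.search(address, "505") is not None
```

Because the regex is assumed to be valid and Python-compatible, no escaping or translation step is needed before calling re.search.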