COMP2017/COMP9017 Assignment 2
Assignment due: 24 April 2020, 11:59 PM AEST (Week 8 Friday)
This assignment is worth 10% of your final assessment
1 Introduction
Version control systems are crucial in any software project. In this assignment, you will design
and implement the storage method, as well as some functions for Simple Version Control (SVC),
a (very) simplified system derived from the Git version control system.
2 Description of SVC
Projects that are to be placed under SVC must be initialised with svc_init. This allows the
system to create necessary data structures containing information about the state of the project.
Projects are comprised of files, and SVC computes a hash of each file to figure out if a change
has occurred. Only files which are explicitly added to the version control system are tracked.
Commits are like a snapshot in time of the state of the project. In SVC, they contain details
about which files have been added, removed, or modified, and sufficient information to restore a
file to this state. The currently active commit is often referred to as the HEAD. Each commit has
a “commit id” which uniquely identifies a given commit. Commit ids in SVC are represented in
hexadecimal numbers and are exactly 6 characters long.
cb410d Initial commit
+ README.md
ec8856 Added tests
+ tests/test1
+ tests/test2
ce9bd8 Updated readme
/ README.md
Figure 2.1: Examples of commits
Branches are useful to keep working copies of the project while working on new features or fixing
bugs. When a project is created, a master branch is created by default. For example, this branch
may be used to keep release versions of projects, and a branch features/something_special
might contain additional code with a new feature that is not yet complete and would not be
suitable for releasing.
A branch can be “checked out” to indicate that we want to switch our project to a specific
branch and work from there. For example, after we create the features/something_special
branch, it should be checked out before working on that feature.
Merging is the process of integrating changes from one branch into another. Once a new feature
is complete, or a bug is fixed and testing is complete, we may wish to merge these changes back
into the master branch to release. Note that merges can happen between any two branches, not
necessarily with the master branch.
1
COMP2017/COMP9017 Assignment 2
cb410d
master
HEAD
(a) After the initial commit
cb410d ec8856
master
HEAD
(b) After another commit to the master branch
cb410d ec8856
master
feature
HEAD
(c) New branch created
cb410d ec8856
master
feature
HEAD
(d) After checking out the feature branch
cb410d ec8856 90ae30 b96919
master
feature
HEAD
(e) Two commits to the feature branch
cb410d ec8856
90ae30 b96919
ce9bd8
master
feature
HEAD
(f) master branch checked out and one commit made
cb410d ec8856
90ae30 b96919
ce9bd8 e8ca9d0
master
feature
HEAD
(g) feature branch merged into master
Figure 2.2: Examples of branches and merging
Sometimes, we may want to reset to a specific commit. In this case, the files are reverted to the
state they were in at that commit, and any new commits continue from that commit. This may
result in some commits being detached from the rest of the SVC system. For example, if the
master branch is reset to ec8856 and a new commit is made, ce9bd8 would not be reachable.
2 of 11
COMP2017/COMP9017 Assignment 2
3 Functions to Implement
void *svc_init ();
This function will only be called once, before any other functions are called. In this function,
you may choose to perform preparation steps. If you require some data structure to help with
other operations, return a pointer to the area of memory containing this structure. In all further
function calls, this will be passed in as void *helper.
void cleanup(void *helper);
This function will only be called once, after all other functions have been called. Ensure all
dynamically allocated memory has been freed when this function returns.
int hash_file(void *helper , char *file_path);
Given the path to a file, compute and return the hash value using the algorithm described in
Section 3.1. If file_path is NULL, return -1. If no file exists at the given path, return -2. This
function should work even for files that are not being tracked.
char *svc_commit(void *helper , char *message);
Create a commit with the message given, and all changes to the files being tracked. This should
return the commit id, which can be calculated by implementing the algorithm described in Sec-
tion 3.2. If there are no changes since the last commit, or message is NULL, return NULL.
void *get_commit(void *helper , char *commit_id);
Given a commit_id, return a pointer to the area of memory you stored this commit. If a commit
with the given id does not exist, or commit_id is NULL, this function should return NULL. Note
only the NULL return values for this function will be tested. The commit you return here will be
passed to some of the other functions to implement.
char ** get_prev_commits(void *helper , void *commit , int *n_prev);
Given a pointer to a commit, return a dynamically allocated array. Each element in the array
should point to the id of a parent commit. The number of parent commits should be stored in
the area of memory pointed to by n_prev. If n_prev is NULL, return NULL. If commit is NULL,
or it is the very first commit, this function should set the contents of n_prev to 0 and return
NULL. Note: only the allocated array will be freed for you by the tester.
void print_commit(void *helper , char *commit_id);
Given a commit_id, print the details of the commit as detailed below. If no commit with this
id exists, or commit_id is NULL, you should print Invalid commit id
For a valid commit, you should print the commit id, the branch it is on, a list of the added,
removed and changed files in any order, and a list of currently tracked files with their hash
values, also in any order. Hashes are left padded with spaces to be exactly 10 characters wide.
3 of 11
COMP2017/COMP9017 Assignment 2
commit id [branch name]: commit message
+ added file(s)
- removed file(s)
/ changed file(s) [previous hash --> new hash]
Tracked files (number of tracked files):
[hash] file name
int svc_branch(void *helper , char *branch_name);
Create a new branch with the given name. In this SVC, valid branch names are restricted to
those that contain only alphanumeric characters, underscores, slashes and dashes: a-z, A-Z,
0-9, _, /, -. If the given branch name is invalid or NULL, return -1. If the branch name
already exists, return -2. If there are uncommitted changes, return -3. If the branching is
successful, return 0. Note: creating a branch does not check it out.
int svc_checkout(void *helper , char *branch_name);
Make this branch the active one. If branch_name is NULL or no such branch exists, return -1. If
there are uncommitted changes, return -2 and do not make this the active branch. Otherwise,
return 0 and make it the active branch. Note in SVC, the branch is not created if it does not exist.
char ** list_branches(void *helper , int *n_branches);
Print all the branches in the order they were created. In addition, return a dynamically allocated
array of the branch names in the same order, and store the number of branches in the memory
area pointed to by n_branches. If n_branches is NULL, return NULL and do not print anything.
Note: only the allocated array will be freed for you by the tester.
int svc_add(void *helper , char *file_name);
This is a notification that a file with the name file_name should be added to version control.
If file_name is NULL, return -1 and do not add it to version control. If a file with this name is
already under version control, return -2. If this file does not exist, return -3. Otherwise, add
the file to the SVC system and return the file’s hash value.
int svc_rm(void *helper , char *file_name);
This is a notification that a file with the name file_name should be removed from the version
control system. If file_name is NULL, return -1. If the file with the given name is not being
tracked, return -2. Otherwise, remove the file from SVC and return its last known hash value
(from adding or committing).
int svc_reset(void *helper , char *commit_id);
Reset the current branch to the commit with the id given, discarding any uncommitted changes.
If commit_id is NULL, return -1. If no commit with the given id exists, return -2. It is guaran-
teed that if a commit with this id exists, there will be one simple path from the HEAD of the
current branch. That is, all commits from HEAD to the commit will have exactly one previous
commit. Reset the branch to this commit and return 0. Note that this function means that
some commits may be detached from the rest of the SVC system.
4 of 11
COMP2017/COMP9017 Assignment 2
char *svc_merge(void *helper , char *branch_name , resolution *
resolutions , int n_resolutions);
This function will be called to merge the branch with the name branch_name into the current
branch. If branch_name is NULL, print Invalid branch name and return NULL. If no such
branch exists, print Branch not found and return NULL. If the given name is the currently
checked out branch, print Cannot merge a branch with itself and return NULL. If there are
uncommitted changes, print Changes must be committed and return NULL. In all other cases,
the merge procedure begins. Note that the way branches are merged in SVC is different to Git.
To merge two branches together, all tracked files in both branches are used. If there are conflict-
ing files, it will appear in the resolutions array. Each resolution struct contains the name
of the conflicting file, and a path to a resolution file. This file contains the contents that the file
should contain after the merge. However, if the path given is NULL, the file should be deleted.
A commit with the message Merged branch [branch_name] replacing [branch_name] with
branch_name is created with the necessary changes for the current branch to reflect changes
made in the other branch. The previous commits order should be the current branch’s HEAD and
then the other branch’s HEAD. The function should then print the message Merge successful
and return the new commit id.
3.1 File Hash Algorithm
Below is the pseudocode to determine the hash value of a file. Note: this is not the same
algorithm used in real world version control systems.
function file_hash(file_name):
file_contents = read(file_name)
file_length = num_bytes(file_contents)
hash = 0
for unsigned byte in file_name:
hash = (hash + byte) % 1000
for unsigned byte in file_contents:
hash = (hash + byte) % 2000000000
return hash
3.2 Commit ID Algorithm
Below is the pseudocode to determine the commit id of a file. Note: this is not the same
algorithm used in real world version control systems.
function get_commit_id(commit):
id = 0
for unsigned byte in commit.message:
id = (id + byte) % 1000
for change in commit.changes in increasing alphabetical order of file_name:
if change is addition, id = id + 376591
if change is deletion, id = id + 85973
if change is modification, id = id + 9573681
for unsigned byte in change.file_name:
id = (id * (byte % 37)) % 15485863 + 1
return id as hexadecimal string
5 of 11
COMP2017/COMP9017 Assignment 2
4 Examples
4.1 File Hashing
File Name File Contents Hash Value
Tests/diff.txt Empty file (0 bytes) 385
Tests/diff.txt This is some text in a file\n 2832
sample.txt Hello, world\n 1178
sample.txt Hello, world!\n 1211
Tests/test1.in 5 3 2\n 564
hello.py print("Hello")\n 2027
Taking the last example, hello.py corresponds to ASCII values [104, 101, 108, 108, 111, 46,
112, 121]. Following the algorithm gives a hash of 811 % 1000 = 811. The contents of the file
correspond to [112, 114, 105, 110, 116, 40, 34, 72, 101, 108, 108, 111, 34, 41, 10] giving (811 +
1216) % 2000000000 = 2027.
4.2 SVC Example 1
void *helper = svc_init();
Return value: helper
hash_file(helper, "hello.py");
Return value: 2027 (from example above)
hash_file(helper, "fake.c");
Return value: -2 (non-existent file)
svc_commit(helper, "No changes");
Return value: NULL
svc_add(helper, "hello.py");
Return value: 2027
svc_add(helper, "Tests/test1.in");
Return value: 564 (from example above)
svc_add(helper, "Tests/test1.in");
Return value: -2
svc_commit(helper, "Initial commit");
Return value: "74cde7"
The ASCII values for the commit message Initial commit are [73, 110, 105, 116, 105, 97, 108,
32, 99, 111, 109, 109, 105, 116]. After the first stage, this gives an id of 1395 % 1000 = 395. The
two changes are additions of hello.py [104, 101, 108, 108, 111, 46, 112, 121] and Tests/test1
.in [84, 101, 115, 116, 115, 47, 116, 101, 115, 116, 49, 46, 105, 110]. Following the algorithm,
the id is then 7654887. Converting this to a 6 character hexadecimal string gives 74cde7.
6 of 11
COMP2017/COMP9017 Assignment 2
void *commit = get_commit(helper, "74cde7");
Return value: Pointer to area of memory containing the commit created above
int n_prev;
char **prev_commits = get_prev_commits(helper, commit, n_prev);
Return value: NULL
Afterwards, n_prev = 0
print_commit(helper, "74cde7");
Output:
74cde7 [master]: Initial commit
+ hello.py
+ Tests/test1.in
Tracked files (2):
[ 2027] hello.py
[ 564] Tests/test1.in
int n;
char **branches = list_branches(helper, n);
Output: master
Return value: Array with pointer to area of memory containing the string master
Afterwards, n = 1
4.3 Commit ID
In the above example, the commit ID for the valid commit “Initial commit” is calculated by:
1. The commit message has ascii values [73, 110, 105, 116, 105, 97, 108, 32, 99, 111, 109, 109,
105, 116] so after the first step, the ID is 1395 % 1000 = 395
2. There are two changes, both of which are additions
3. The first change adds 376591 to the ID, and then for each byte, the calculation in Section
3.2 is followed, giving 111
4. The second change also adds 376591 to the ID, and the above is repeated giving 7654887
5. Converting this to a 6 digit hexadecimal number gives 74cde7
7 of 11
COMP2017/COMP9017 Assignment 2
4.4 SVC Example 2
Starting from a blank project (no files), we create two files:
COMP2017/svc.h has the contents
#ifndef svc_h\n
#define svc_h\n
void *svc_init(void);\n
#endif\n
COMP2017/svc.c has the contents
#include "svc.h"\n
void *svc_init(void) {\n
// TODO: implement\n
}\n
The hashes of the two files are 5007 and 5217 (all ‘tabs’ are four spaces and new line characters
have been explicitly shown above).
svc_add(helper, "COMP2017/svc.h");
Return value: 5007
svc_add(helper, "COMP2017/svc.c");
Return value: 5217
svc_commit(helper, "Initial commit");
Return value: "7b3e30"
svc_branch(helper, "random_branch");
Return value: 0
svc_checkout(helper, "random_branch");
Return value: 0
Next, the file COMP2017/svc.c is changed to have the following contents
#include "svc.h"\n
void *svc_init(void) {\n
return NULL;\n
}\n
which has the hash 4798.
svc_rm(helper, "COMP2017/svc.h");
Return value: 5007
svc_commit(helper, "Implemented svc_init");
Return value: "73eacd"
8 of 11
COMP2017/COMP9017 Assignment 2
You realise you accidentally deleted COMP2017/svc.h and want to revert to the initial commit
svc_reset(helper, "7b3e30");
Return value: 0
Then, the file COMP2017/svc.c is changed again to have the contents shown above.
svc_commit(helper, "Implemented svc_init");
Return value: "24829b"
void *commit = get_commit(helper, "24829b");
Return value: Pointer to area of memory containing the commit created above
int n_prev;
char **prev_commits = get_prev_commits(helper, commit, n_prev);
Return value: Pointer to an array of length one, containing "7b3e30"
Afterwards, n_prev = 1
svc_checkout(helper, "master");
Return value: 0
The test framework creates a file resolutions/svc.c with the contents
#include "svc.h"\n
void *svc_init(void) {\n
return NULL;\n
}\n
The following code is then executed to perform a merge:
// Resolution(s) are created by the test framework
resolution *resolutions = malloc(sizeof(resolution));
resolutions [0]. file_name = "COMP2017/svc.c";
resolutions [0]. resolved_file = "resolutions/svc.c";
// Call to merge function
svc_merge(helper , "random_branch", resolutions , 1);
// The test framework will free the memory
free(resolutions);
Return value: "48eac3"
The commit message is Merged branch random_branch. The conflicts array indicates that only
the change is to be kept, following the merge rules described above. This means that from the
perspective of the master branch, the only change to be committed is this modification.
int n_prev;
void *commit = get_commit(helper, "48eac3");
char **prev_commits = get_prev_commits(helper, commit, n_prev);
Return value: Pointer to an array of length two, containing "7b3e30" and "24829b" in that
order. Afterwards, n_prev = 2
9 of 11
COMP2017/COMP9017 Assignment 2
5 Notes and Hints
• In SVC, branches are not deleted. After merging a branch, it remains in the list of branches.
• When a reset to a commit occurs, don’t forget to undo all types of changes (addition,
deletion, modification) to the files too!
• Useful tools and functions: address sanitizer, valgrind, strdup
• The hashing algorithm may produce the same hash, even if the contents of the file have
changed. Test cases are designed with this in mind and it is guaranteed this will not occur.
• Error codes should be checked in order described in the function descriptions.
• A file may be deleted from the file system without svc_rm being called Your code should
detect that this has happened and remove the file from SVC accordingly. Note, however,
that the function svc_rm should not delete the file from the file system!
• You should write your own test cases and draw out diagrams to help visualise the problem.
• The maximum file path is 260 characters long (excluding NULL terminator)
• Branch names are at most 50 characters long (excluding NULL terminator)
• All other strings can be of any length
• The two examples given above correspond to the example test cases on Ed
5.1 Staging
In version control systems, there is a concept of “staging”. Adding/removing/changing files are
kept in the “staging” phase until a call to commit actually occurs. This means that multiple
changes can occur in the staging phase that should only be represented as a single change when
a commit occurs. For example, modifying a file twice should appear only as a single change in
the commit. Adding a file and then removing it before committing will mean neither of these
appear in the commit’s changes. Think about all the possible combinations of file operations
that could occur and what the effective change actually is (hint: there are 5 pairs you should
consider).
5.2 Resolution Struct
The resolution struct is provided in the scaffold code and is also provided below for reference.
Do not modify this struct
typedef struct resolution {
char *file_name;
char *resolved_file;
} resolution;
10 of 11
COMP2017/COMP9017 Assignment 2
6 Submission
You must submit your code via git to Ed for automatic marking. It must not produce any
compilation errors, and it must free any dynamically allocated memory unless otherwise specified
in Section 3.
Your code will be compiled with the following options:
gcc -O0 -std=gnu11 -lm -Wextra -Wall -Werror -g -fsanitize=address
Warning: Any attempts to deceive or disrupt the marking system will result in an immedi-
ate zero for the entire assignment. Negative marks can be assigned if you do not follow the
assignment problem description or if your code is unnecessarily or deliberately obfuscated.
7 Mark Breakdown
This assignment is worth 10% of your final mark. There are 10 points available (1% each):
Component Points Description
Correctness 6 Proportion of test cases passed on Ed
Performance 2 How much memory and time does your program take?
Code structure and style 2 Manual marking by your tutor. Marks depend on
style, layout and readability of your code
Note on performance: memory and time usage are each worth 1 point and will be proportionately
allocated based on other students’ submissions. The memory usage is based on the maximum
memory allocated at any given time in the execution of your code. You should consider different
data structures and algorithms before implementing your solution.
8 Academic Declaration
By submitting this assignment you declare the following:
I declare that I have read and understood the University of Sydney Student Plagiarism: Coursework Policy
and Procedure, and except where specifically acknowledged, the work contained in this assignment/project
is my own work, and has not been copied from other sources or been previously submitted for award or
assessment.
I understand that failure to comply with the Student Plagiarism: Coursework Policy and Procedure
can lead to severe penalties as outlined under Chapter 8 of the University of Sydney By-Law 1999 (as
amended). These penalties may be imposed in cases where any significant portion of my submitted work
has been copied without proper acknowledgment from other sources, including published works, the In-
ternet, existing programs, the work of other students, or work previously submitted for other awards or
assessments.
I realise that I may be asked to identify those portions of the work contributed by me and required to
demonstrate my knowledge of the relevant material by answering oral questions or by undertaking sup-
plementary work, either written or in the laboratory, in order to arrive at the final assessment mark.
I acknowledge that the School of Computer Science, in assessing this assignment, may reproduce it en-
tirely, may provide a copy to another member of faculty, and/or communicate a copy of this assignment
to a plagiarism checking service or in-house computer program, and that a copy of the assignment may
be maintained by the service or the School of Computer Science for the purpose of future plagiarism
checking.