Computer Science & Software Engineering C Programming (CITS1210) - 2nd project 2008
Submission deadline: 12noon, Friday 31st October 2008
See also details of the tests used in the marking process.
Background: In recent years, hard disk sizes and densities have increased dramatically, while their costs are currently approaching just 15 cents per gigabyte. As a consequence, we store files on our computers' disks in very different ways and, because we typically have a huge amount of free space available, we end up having multiple copies of many files on our disks. This presents few problems until we eventually run out of disk space, or need to back up our files to another (smaller) disk, perhaps over a network with limited bandwidth. At that time we'd like to locate all duplicate files, and make only one copy of each at our backup destination.
Aim: The aim of this project is to design and develop a useful utility program, named duplicates, to locate and report on duplicate files in, and below, a named directory. Your implementation of duplicates will be invoked with zero or more valid command-line options, and one directory name.
With no command-line options (i.e. only a directory name is provided) duplicates will simply list 4 things (with just one integer per line):
Files and directories (other than the "starting" directory indicated on the command-line) which cannot be read should be silently ignored (no error messages should be printed).
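The "silently ignore" rule can be sketched as a recursive directory walk. This is only a minimal illustration, not the sample solution: the helper name count_files, the fixed 4096-byte path buffer, and the choice of lstat (so that symbolic links are neither followed nor counted) are all assumptions made for this example. The caller is assumed to check the "starting" directory separately, since that one is excluded from the rule.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count the regular files in, and below, dirname.
 * Anything that cannot be opened or examined is skipped without a message.
 */
long count_files(const char *dirname)
{
    DIR *dir = opendir(dirname);
    if (dir == NULL)
        return 0;                       /* unreadable directory: ignored */

    long count = 0;
    struct dirent *entry;

    while ((entry = readdir(dir)) != NULL) {
        /* Skip the entries for the directory itself and its parent. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char path[4096];                /* assumed to be long enough */
        int n = snprintf(path, sizeof path, "%s/%s", dirname, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                   /* pathname too long: ignored */

        struct stat info;
        if (lstat(path, &info) != 0)
            continue;                   /* cannot be examined: ignored */

        if (S_ISDIR(info.st_mode))
            count += count_files(path); /* descend into the subdirectory */
        else if (S_ISREG(info.st_mode))
            count++;
    }
    closedir(dir);
    return count;
}
```

A call such as count_files(argv[1]) would then return the number of regular files found; a real solution would record each pathname rather than just counting it.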
An explanation of each of the command-line options follows.
Support for the command-line option marked with a chili
Detecting duplicate files: To detect duplicate files we'll employ a cryptographic checksum function named SHA2 (pronounced 'shar-2'). SHA2 examines the contents of a file and produces a fixed-length summary of its contents. Cryptographic checksum functions are designed by mathematicians and those developing encryption and security software.
Here is an implementation of the function - strSHA2.c
Two or more files are considered identical if their cryptographic checksums are identical. To date, no two (different) files have ever been found with identical cryptographic checksums! For this project, we'll use a string to store this representation, and two files will be considered identical if their SHA2 string representations are identical. The function strSHA2, with the following prototype:
char *strSHA2(char *filename);
will be provided for this project (it is not a standard C99 function). If strSHA2 can read the indicated file, it will return a dynamically allocated string holding the SHA2 string representation of the file's contents. If the indicated file cannot be read, strSHA2 will return NULL. Note that you do not have to understand the SHA2 function or its implementation for this project.
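The comparison described above might be sketched as follows. The names digests_match, same_contents and checksum_fn are hypothetical and introduced only for this illustration. The checksum function is passed in as a parameter so that the sketch compiles without strSHA2.c; it is assumed to behave as described for strSHA2 (a dynamically allocated string, or NULL for an unreadable file).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Pointer to a function with the same prototype as strSHA2. */
typedef char *(*checksum_fn)(char *filename);

/* Two checksum strings match only if both exist and are equal. */
bool digests_match(const char *sum1, const char *sum2)
{
    return sum1 != NULL && sum2 != NULL && strcmp(sum1, sum2) == 0;
}

/*
 * Returns true only if both files could be read and their checksum
 * strings are identical. The strings are dynamically allocated by the
 * checksum function, so they are released here once compared.
 */
bool same_contents(checksum_fn checksum, char *file1, char *file2)
{
    char *sum1 = checksum(file1);
    char *sum2 = checksum(file2);
    bool same  = digests_match(sum1, sum2);

    free(sum1);                 /* free(NULL) is a harmless no-op */
    free(sum2);
    return same;
}
```

With the provided function linked in, a call might look like same_contents(strSHA2, "a.txt", "b.txt"). Recomputing checksums for every pair of files would be slow for large directories, so a real solution would more likely compute each file's checksum once and keep it.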
Getting started: A sample solution, named /cslinux/examples/CITS1210/WWW/project2/duplicates-sample, will be provided.
NOTE: your solution's output must be IDENTICAL to that of the sample solution. There is no required sequence of steps to undertake the project, and no sequence of steps will guarantee success. However, the following sequence is strongly recommended (and this is how the sample solution was built). It is assumed (considered essential for success!) that each step:
The sample solution uses the following C99 and Unix system functions. It is strongly recommended that you first read the online documentation for each of these:
and understand how to use the provided strSHA2 function. The suggested steps:
Advanced tasks: If you would like a much greater challenge, you may like to attempt an advanced version of this project.
Those who undertake and complete significant parts of the "Advanced tasks" will have the opportunity to "recover" any marks lost (deducted) in the first part of the project. There are 3 additional tasks in the advanced section:
The first advanced task simply permits multiple directories to be searched. For example, if four directory names are provided, then all files in all four directories should be considered. The second additional task requires you to identify the duplicate files and to then store only one instance of each. The Unix link system call (see man 2 link) provides this facility for us, by creating hard-links between two or more files. The actual file contents will be stored only once, and multiple filenames will refer to that single copy. WARNING: until you are very confident that your duplicates program is working correctly, you are strongly advised NOT to use your duplicates program with the -m option to minimize the storage of important files and directories!
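One cautious way to replace a duplicate with a hard-link is sketched below. It is an illustration under stated assumptions, not the sample solution's method: the helper names is_same_file and replace_with_link, and the ".linktmp" temporary suffix, are hypothetical. The new link is first created under a temporary name and then renamed over the duplicate, so the duplicate is not lost if link fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if both names already refer to the same file on disk. */
bool is_same_file(const char *name1, const char *name2)
{
    struct stat s1, s2;

    if (stat(name1, &s1) != 0 || stat(name2, &s2) != 0)
        return false;
    return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}

/*
 * Make duplicate a hard-link to original, so the contents are stored once.
 * Returns 0 on success and -1 on failure.
 */
int replace_with_link(const char *original, const char *duplicate)
{
    if (is_same_file(original, duplicate))
        return 0;                           /* already a single copy */

    static const char suffix[] = ".linktmp";    /* hypothetical suffix */
    char *temp = malloc(strlen(duplicate) + sizeof suffix);
    if (temp == NULL)
        return -1;
    strcpy(temp, duplicate);
    strcat(temp, suffix);

    int result = -1;
    if (link(original, temp) == 0) {        /* second name for original */
        if (rename(temp, duplicate) == 0)   /* replaces the duplicate */
            result = 0;
        else
            unlink(temp);                   /* tidy up on failure */
    }
    free(temp);
    return result;
}
```

Hard-links only work between files on the same filesystem, so link will fail (and the duplicate will be left untouched) if the two files live on different disks.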
Assessment: This project is worth 25% of your final mark for CITS1210: C Programming. It will be marked out of 25. You do not need to complete the tasks detailed in the "Advanced tasks" section to receive full marks (100%) for the project.
As always, you are expected to show professionalism in your approach by making appropriate use of the features of the C99 language, following the principles of good program design, and adhering to the programming conventions outlined in this unit. Those who undertake and complete significant parts of the "Advanced tasks" will have the opportunity to "recover" any marks lost (deducted) in the first part of the project. As a rough estimate, completing the "Advanced tasks" of the project may allow a student to recover up to 10 (out of 25) lost marks. Note that a score of >100% is not possible. During the marking, attention will obviously be given to the correctness and readability of your solution.
Marks will be allocated to each function according to:
As a rough estimate, marks will be awarded as follows: 15 (out of 25) marks to program correctness (i.e. how well your functions match the specification of the questions asked), and 10 (out of 25) marks to coding style (i.e. how well you have written the functions in terms of clarity and use of the features of the C99 language). Part marks may be awarded for questions, so if you are not able to get a function working correctly, you should still submit what you have done. However, you will be significantly penalised if your submitted program does not compile successfully (e.g. if it contains any compilation errors).
Working together: This project may be completed in small teams of two students; you may choose to work individually, but you may not work in a team of three. The motivation for this is to develop communication skills amongst students, and to enable you to attempt a project considered of greater difficulty than would normally be reasonable for the time available. No allowance will be made in the marking if you choose to undertake the project individually.
You are expected to have read and understood the CSSE Policy on Plagiarism. In accordance with this policy, you may discuss with other teams the general principles required to understand this project, but the work you submit must be the sole result of your team's members.
Submission: You must submit your project's files using cssubmit. No other method of submission will be accepted. If working in a team of two, only one student needs to submit the work using cssubmit. cssubmit will give you a receipt of your submission. You should print and retain this receipt in case of any dispute.
Please be fully aware of CSSE's Penalties for late submission. Submitting the wrong file(s) may result in a mark of zero for your project. Note also that the cssubmit facility does not archive submissions and will simply overwrite any previous submission with your latest submission.
You should submit a Makefile and all C header (.h) and code (.c) files that you wish to be considered for assessment.
Any submitted C file must begin with the following lines:
/*
   CITS1210 2nd Project 2008
   Name(s):             your name(s)
   Student number(s):   your student number(s)
 */
Clarifications: Please post requests for clarification about any aspect of the project to help1210 so that all students may remain equally informed. Clarifications will be added to the project clarifications webpage.
Good luck!