본문 바로가기

U of T: Computer Science/First Year: CSC108

CSC108 Assignment 2

CSC108/A08H Assignment 2

Introduction

In this assignment, you will gain some appreciation for error correction and detection in data. The assignment will also give you practice in a number of areas:

  • Working with strings, functions, methods, and simple lists
  • Testing

Before you start, spend at least one hour per person reading through the entire assignment.

Part 1: Lists, Strings, Files

In this part, you will practice writing functions that manipulate lists, strings, and files, and call list, string, and file methods. It is a warm-up for the next part.

Task: Define the following functions, and place them in a module called warmup.py. As always, make very, very sure that your functions are named exactly as you see them below - we recommend that you copy and paste. Include a docstring for every function.

Function Name Description
whereis(str, str) Given a string to be searched and a string to search for (in that order), return the index (position) of the first occurrence of the second string in the first string (as an int). If you need to, you may assume that the second string contains a only single character. For example, whereis("house","u") should return the index 2. You may also assume that the second string is always present in the first.
evens(list) Given a list, return a new list consisting of those members of the given list whose index is even.
reverse(list) Given a list, return a new list that is the same as the original list except that its members are in reverse order. You must make sure that the original list is not itself reversed.
cat(file) Given an open text file, read the contents of the file line by line and write each line to the screen.
double_my_digits(str) Given a string of digits, return a string that represents the number obtained by doubling each digit in the input string. For example, double_my_digits("123456") should return the string "24681012".
fix_capitalization(list) Given a list of strings, return a new list that contains all of the strings from the original list with their capitalization fixed. A string is considered to be capitalized correctly if it consists of only lowercase letters. An exception is made for the first letter in the string. If the first letter of a string is capitalized, it should remain capitalized. If it is lower-case, it should remain in lower-case. For example, fix_capitalization(["CAnaDa", "cAt","pythoN"] should return the list ["Canada", "cat", "python"]. Any non-letters should be left as they are.

Part 2: Error detection

Zillions of electrons representing information of one sort or another fly around the internet every second. Sometimes, due to faulty switches or bad electrical connections, some electrons end up in the wrong place, and information gets lost or corrupted. Even without the help of the internet and bad connections, data gets corrupted - for example by people making simple mistakes when entering data into computers. This, too, is a huge source of errors in data.

A vital part of computing is software that detects many of these errors. How does it do that? How can software detect when someone types (or a machine sends via the internet) a '3' instead of a '2', or that two digits are interchanged when some long string of data is entered? One way is to structure the data so that some parts of the data depend on other parts, and have software check the dependence. If the dependence is broken, an error has occurred.

Using a checksum is one of the most common ways to do this:

  • The sender computes a value, called a checksum, from the values it is sending.
  • The sender transmits the checksum to the receiver.
  • The receiver also computes the checksum from the received values.
  • The receiver compares the trasmitted checksum value with the computed one.

If the checksums are the same, chances are pretty good that there's been no error; if they are not, then one of the values (perhaps the checksum itself) was not transmitted correctly.

The same technique can be used when there is no actual transmission and we just want to ensure that the data entered is valid. When the data (an OHIP number or similar) is first generated, an extra value is added to the end of it, which is a checksum: it is computed from the rest of the data. When the number is then entered into a computer, a suitable program can check the checksum against the data and detect any corruption.

Although it's called checksum, it does not simply stand for the sum of the data which is checked. That would not be a very good safeguard against certain types of data entry errors. (Can you think of an example?) A checksum is computed using a bit more calculation, so that there is a good chance that simple entry mistakes change the checksum and can be detected.

In the following sections, you will write functions that calculate and check such checksums. You may assume that the arguments to your functions have the proper form - they have the length specified below and and have no characters other than those specified.

Place the functions you write for the ISBN and SIN sections, below, in a file called verify.py; place the test functions in the files indicated.

Important: In the functions you write for these sections, the code is expected to do similar things to each digit in a number. You are expected to write code that uses loops rather than writing individual statements for each digit.

ISBN:

Task:

  • Write a function called verify_isbn(isbn) that takes a string parameter and returns True if and only if the argument is a valid ISBN.
  • Write a good set of nose test cases for verify_isbn and place them in a file called test_isbn.py.
  • Write a function called verify_isbn_file(f) that takes as a parameter an open file f containing one ISBN per line and prints every invalid ISBN.

What is an ISBN, and how do you verify them? ISBN stands for International Standard Book Number, which is a number assigned to every published book. This number almost always appears on the back cover. Mark Guzdial's textbook Introduction to Computing and Programming in Python has this ISBN: 0-13-117655-2.

According to Wikipedia, "the two most common errors that occur when handling an ISBN (e.g., typing it in or writing it down) are an altered digit or transposition of adjacent digits." To guard against such mistakes, the last digit is a checksum for the other digits. Here is how the last digit is computed:

  1. There are 9 digits (other than the checksum digit). Multiply each of those 9 digits by a number in the range 10 down to 2, and add the results. With the ISBN 0-13-117655-2, then, we have
    0*10 + 1*9 + 3*8 + 1*7 + 1*6 + 7*5 + 6*4 + 5*3 + 5*2 = n
  2. Calculate n modulo 11.
  3. Subtract the result of the previous step from 11.
  4. Calculate that result modulo 11. (Yes, take the modulo a second time.)
  5. This last result is the last digit, if it is a single digit. If it is 10, replace it with the letter "X".

If you follow the calculations on the ISBN above, you'll see that the last step yields 2, which is the last digit of this particular ISBN.

SIN:

Task:

  • Write a function called verify_sin(sin) that takes a string as a parameter and returns True if and only if the argument is a valid Social Insurance Number (SIN).
  • Write a good set of nose test cases for verify_sin and place them in a file called test_sin.py

Various government programs in Canada use Social Insurance Numbers. For example, Revenue Canada uses Social Insurance Numbers for tax reporting purposes. Every resident of Canada has a SIN. The first digit indicates the province in which it was registered, and the last digit is again a checksum, computed using a different algorithm. Here is an example SIN: 046-454-286.

The last digit of a SIN is a checksum for the other digits. It is calculated as follows from the first eight digits:

  1. Extract every other digit (every digit with an even index) and add them up. In the example SIN, this gives 0 + 6 + 5 + 2 = 13.
  2. Multiply each of the four remaining digits by 2. In the example SIN, these digits are 4, 4, 4 and 8, so this gives 8, 8, 8 and 16. Now add all the digits: 8 + 8 + 8 + 1 + 6 = 31.
  3. Add the two sums. In the example SIN, this gives 13 + 31 = 44.
  4. Subtract this total from the nearest greater multiple of 10: this will be the last (checksum) digit. In the example SIN, this gives 50 - 44 = 6. So the SIN 046-454-286 is valid.

nose tests

Recall that nose is a module that can evaluate functions called test_whatever (any arbitrary name after "test_"). Each such function contains one or more assert statements, which contain an expression and a string. The expression is the test: if the expression evaluates to True, the test passes; if not, the test fails. The string is used to document the test: if the test fails, nose prints out this string.

The function in nose that runs the test functions is called runmodule(); we call it at the end of our test file, after defining all our test functions.

It is a good idea to place each assert in a separate test_... function, since nose stops evaluating the function as soon as one of the asserts fails. If one or more asserts that come later would also fail, we never find out: they don't get executed.

So: your test_something.py that you hand in should contain several functions starting with "test_". Each such function should contain an assert statement as described above. The expression in the assert statement should be the value returned by your verify_something function when it is called with an "interesting" argument. This value is a boolean, and so the expression will be True or False. (Remember that the not operator can be applied to a bool - this can be helpful sometimes.)

What kind of tests, and how many should you write? There is no set answer. Think of it this way: you need to convince yourself - and your boss (instructor) - that your algorithm correctly detects all good and bad checksum digits. That doesn't mean you have to write a lot of tests.

It means you have to write good ones. Your algorithms do a number of things: they "walk through" strings and/or lists, they add, multiply, subtract, take the modulus of numbers. Some of these operations do unusual things with some operands. (It so happens you don't have to divide - think of what dividing by 0 could do...) Write tests that make sure your algorithms do the right thing. Think "What happens if ... ?".

In lecture, we emphasized that "interesting" values should be tested. It's up to you to think about your algorithm, and figure out what could be "interesting" in this case. (If you are working in pairs, one way to go about this could be to have one of you write a function and the other of you - without looking at the function code, just based on the explanation in the handout - the test for that function! This is a productive way to work as partners)

As stated above, you may assume that all ISBN and SIN codes that your functions validate have the proper form. That is, they are the right length and have no characters other than the permitted ones (digits, -, and X for ISBN). What you cannot assume is that the checksum digit is correct, nor that your algorithm is right. Your tests should be designed so as to convince yourself (and us) that your code does the right thing.

What to Hand In

Hand in the following files:

  • warmup.py
  • verify.py
  • test_isbn.py
  • test_sin.py
Student's Solution
Teacher's Solution

'U of T: Computer Science > First Year: CSC108' 카테고리의 다른 글

CSC108 Project  (0) 2007.12.16
CSC108 Assignment 4  (0) 2007.12.16
CSC108 Assignment 3  (0) 2007.12.16
CSC108 Assignment 1  (0) 2007.12.16
CSC 108 Course 소개.  (0) 2007.12.16