In this problem, you'll implement a few simple functions for dealing with strings. You need not perform any error checking in any of the functions for this problem.
Write a function is_palindrome, which takes a string as its only argument and returns a Boolean. Your function should return True if the argument is a palindrome, and False otherwise. For the purposes of this problem, you may assume that the input is a string and will consist only of alphanumeric characters (i.e., the letters, either upper or lower case, and the digits 0 through 9) and spaces. Your function should ignore spaces and capitalization in assessing whether or not a string is a palindrome, so that 'tacocat' and 'T A C O cat' are both considered palindromes.
def is_palindrome(inStr):
    # Remove spaces and set string to lower case
    inStr = ''.join(inStr.lower().split(' '))
    # Initialize indices at the start and end of the string
    i = 0
    j = len(inStr) - 1
    # Move the two indices toward each other, comparing characters as we go
    while i < j:
        if inStr[i] != inStr[j]:
            return False
        i += 1
        j -= 1
    return True
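As a cross-check on the two-index loop above, the same test can be written by comparing the cleaned string with its reverse. This is only a sketch: the name is_palindrome_slice is hypothetical and not part of the assignment, and it builds a reversed copy of the string rather than comparing in place.

```python
def is_palindrome_slice(in_str):
    """Return True if in_str reads the same in both directions, ignoring spaces and case."""
    # Normalize: lower-case everything and drop spaces
    cleaned = in_str.lower().replace(' ', '')
    # A palindrome equals its own reverse
    return cleaned == cleaned[::-1]


print(is_palindrome_slice('tacocat'))      # True
print(is_palindrome_slice('T A C O cat'))  # True
print(is_palindrome_slice('taco dog'))     # False
```

Both formulations agree on the empty string, which is treated as a palindrome.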
Write a function is_abecedarian, which takes a single argument in the form of a string and returns True if the argument is abecedarian (that is, if its letters appear in alphabetical order) and False otherwise. Here you may assume that the input consists only of alphabetic characters and spaces. Your function should ignore spaces, so that the string 'abcd efgh xyz' is considered abecedarian.
def is_abecedarian(inStr):
    # Remove spaces and set string to lower case
    inStr = ''.join(inStr.lower().split(' '))
    # Loop through each pair of adjacent letters in the string
    n = len(inStr)
    for i in range(n - 1):
        currLetter = inStr[i]
        nextLetter = inStr[i + 1]
        # Check that the current letter does not come after the following letter alphabetically
        if currLetter > nextLetter:
            return False
    return True
Write a function remove_vowels that takes a string as its only argument and returns that string with all the vowels removed. For the purposes of this question, the vowels are the letters "a e i o u". Thus, remove_vowels('cat') should return 'ct', remove_vowels('audio') should return 'd', etc. Take care that your function correctly handles a string of all vowels. Hint: there is a particularly elegant solution to this problem that makes use of the accumulator pattern we saw in lecture and the fact that Python strings implement the addition operation as string concatenation.
vowels = 'aeiou'
def remove_vowels(inStr):
    # Keep every character that is not a vowel. Only vowels are dropped:
    # spaces and the case of the remaining characters are preserved.
    # Upper-case vowels are treated as vowels too.
    return ''.join(x for x in inStr if x.lower() not in vowels)
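The accumulator pattern mentioned in the hint can be sketched as follows: start from an empty string and concatenate each non-vowel character onto it. The name remove_vowels_accumulate is hypothetical, and treating upper-case vowels as vowels is an assumption rather than something the problem statement specifies.

```python
VOWELS = 'aeiou'


def remove_vowels_accumulate(in_str):
    """Return in_str with vowels removed, built up one character at a time."""
    result = ''  # the accumulator starts out empty
    for ch in in_str:
        # Assumption: upper-case vowels count as vowels too
        if ch.lower() not in VOWELS:
            result += ch  # string addition is concatenation
    return result


print(remove_vowels_accumulate('cat'))    # ct
print(remove_vowels_accumulate('audio'))  # d
print(repr(remove_vowels_accumulate('aeiou')))  # ''
```

A string of all vowels needs no special case: the accumulator is never extended, so the empty string is returned.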
In this problem, you'll implement a few very simple list operations.
def list_reverse(inList):
    # Reject anything that is not a list
    if not isinstance(inList, list):
        raise TypeError('Input should be a list!')
    # Slicing with a step of -1 returns a reversed copy
    return inList[::-1]
# list_reverse([1, 2, 3, 4])
Write a function is_sorted that takes a sequence seq as its only argument and returns True if the sequence is sorted in ascending order and returns False otherwise. You may assume that seq is, in fact, a Python sequence. Your function should require a single traversal of the list, and inefficient solutions (i.e., ones that require more than one traversal) will not receive full credit. You may assume that all elements in the input sequence seq support the comparison operations (==, <, >, >=, etc.), so that there is no need for error checking. (Indeed, if you try to make the comparison, say, 1 < 'cat', Python will raise an error for you, anyway.) Note: this problem illustrates a particularly useful aspect of Python's dynamic typing. It is possible to write this function while being agnostic as to the type of the input variable. It requires only that seq supports indexing and that the elements in seq support the comparison operations.
def is_sorted(seq):
    n = len(seq)
    # Single traversal: compare each element with its successor
    for i in range(n - 1):
        if seq[i] > seq[i + 1]:
            return False
    return True
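To illustrate the note on dynamic typing, the sketch below restates the index-based loop and calls it on a list, a tuple, and a string. Nothing in the function mentions a type, so it works for any input that supports len, indexing, and comparison of its elements.

```python
def is_sorted(seq):
    """Return True if seq is in ascending order (equal neighbours are allowed)."""
    for i in range(len(seq) - 1):
        if seq[i] > seq[i + 1]:
            return False
    return True


print(is_sorted([1, 2, 2, 5]))     # True  (list of ints)
print(is_sorted((0.5, 1.5, 1.0)))  # False (tuple of floats)
print(is_sorted('abcxyz'))         # True  (string, compared character by character)
print(is_sorted([]))               # True  (nothing to compare)
```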
Write a function binary_search that takes two arguments, a list of integers t (which is guaranteed to be sorted in ascending order) and an integer elmt, and returns True if elmt appears in list t and False otherwise. Of course, you could do this with the in operator, but that will be slow when the list is long, for reasons that we discussed in class. Instead, you should use binary search: To look for elmt, first look at the "middle" element of the list t. If it's a match, return True. If it isn't a match, compare elmt against the "middle" element, and recurse, searching the first or second half of the list depending on whether elmt is bigger or smaller than the middle element. Hint: be careful of the base cases: What should you do when t is empty, length 1, length 2, etc.? Note: your solution must actually make use of binary search to receive credit, and your solution must not use any built-in sorting or searching functions. Note: we could, if we wanted, use the function is_sorted that we wrote above to do error checking here, but there is a good reason not to do so. This reason will become clear when we make our brief foray into the topic of runtime analysis later in the semester.
def binary_search(t, elmt):
    # Base cases: an empty list contains nothing, and a one-element list
    # needs only a direct comparison
    if len(t) == 0:
        return False
    if len(t) == 1:
        return t[0] == elmt
    # Lists of length 2 or more are handled by the general case below
    m = len(t) // 2
    if t[m] == elmt:
        return True
    elif t[m] < elmt:
        # elmt can only be in the part after the middle element
        return binary_search(t[(m + 1):], elmt)
    # Otherwise elmt can only be in the part before the middle element
    return binary_search(t[:m], elmt)
# binary_search([1, 4, 5, 6, 7, 8], 1)
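One caveat about the recursive solution: each slice such as t[:m] copies part of the list, so the copying alone takes time proportional to the length of t. A variant that tracks a pair of indices instead of slicing avoids those copies. The sketch below uses a loop rather than recursion, which departs from the wording of the problem, and the name binary_search_indices is hypothetical.

```python
def binary_search_indices(t, elmt):
    """Return True if elmt occurs in the ascending-sorted list t."""
    lo, hi = 0, len(t) - 1  # the search window is t[lo..hi], inclusive
    while lo <= hi:
        mid = (lo + hi) // 2
        if t[mid] == elmt:
            return True
        elif t[mid] < elmt:
            lo = mid + 1  # discard the lower half, including mid
        else:
            hi = mid - 1  # discard the upper half, including mid
    # The window is empty, so elmt is not present
    return False


print(binary_search_indices([1, 4, 5, 6, 7, 8], 1))  # True
print(binary_search_indices([1, 4, 5, 6, 7, 8], 9))  # False
print(binary_search_indices([], 3))                  # False
```

The empty list needs no special case here, because the window starts out empty and the loop body never runs.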
In this problem, you'll implement some very simple counting operations that are common in fields like biostatistics and natural language processing. You need not perform any error checking in the functions for this problem.
Write a function char_hist that takes a string as its argument and returns a dictionary whose keys are characters and values are the number of times each character appeared in the input. So, for example, given the string "gattaca", your function should return a dictionary with key-value pairs $(g, 1), (a, 3), (t, 2), (c, 1)$. Your function should count all characters in the input (including spaces, tabs, numbers, etc). The dictionary returned by your function should have as its keys all and only the characters that appeared in the input (i.e., you don't need to have a bunch of keys with value 0!). Your function should count capital and lower-case letters as the same, and key on the lower-case version of the character, so that $G$ and $g$ are both counted as the same character, and the corresponding key in the dictionary is $g$.
def char_hist(inStr):
    charDict = {}
    # Count upper- and lower-case letters as the same character
    inStr = inStr.lower()
    for char in inStr:
        # get() returns 0 for a character that has not been seen yet
        charDict[char] = charDict.get(char, 0) + 1
    return charDict
# char_hist("gattaca")
A bigram is a pair of adjacent characters. For example, the bigrams of the string 'mississippi', in order, are 'mi', 'is', 'ss', 'si', 'is', 'ss', 'si', 'ip', 'pp', 'pi'.
Write a function bigram_hist that takes a string as its argument and returns a dictionary whose keys are 2-tuples of characters and values are the number of times that pair of characters appeared in the string. So, for example, when called on the string 'mississippi', your function should return a dictionary with keys 'mi', 'is', 'ss', 'si', 'ip', 'pp', 'pi'. If the bigram 'ab' occurred four times in the input, then your function should return a dictionary that includes the key-value pair ((a, b), 4). Your function should handle all characters (alphanumerics, spaces, punctuation, etc). So, for example, the string 'cat, dog' includes the bigrams 't,', ', ' and ' d'. As in the previous subproblem, the dictionary produced by your function should only include pairs that actually appeared in the input, so that the absence of a given key implies that the corresponding two-character string did not appear in the input. Also as in the previous subproblem, you should count upper- and lower-case letters as the same, so that 'GA' and 'ga' both count for the same tuple, $(g, a)$.
def bigram_hist(inStr):
    bigramDict = {}
    # Count upper- and lower-case letters as the same character
    inStr = inStr.lower()
    n = len(inStr)
    for i in range(n - 1):
        # Each key is the 2-tuple of a character and its successor
        key = (inStr[i], inStr[i + 1])
        # get() returns 0 for a bigram that has not been seen yet
        bigramDict[key] = bigramDict.get(key, 0) + 1
    return bigramDict
# bigram_hist('Cat, Dog')
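The pairing of each character with its successor can also be expressed with zip, which stops at the shorter of its two inputs and so yields exactly the adjacent pairs. The sketch below is one way to write this, with the hypothetical name bigram_hist_zip; it is shown on the 'mississippi' example from the problem statement.

```python
def bigram_hist_zip(in_str):
    """Count adjacent character pairs in in_str, ignoring case."""
    s = in_str.lower()
    counts = {}
    # zip(s, s[1:]) pairs s[0] with s[1], s[1] with s[2], and so on
    for pair in zip(s, s[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts


hist = bigram_hist_zip('mississippi')
print(len(hist))          # 7 distinct bigrams
print(hist[('s', 's')])   # 2
print(('t', ',') in bigram_hist_zip('Cat, Dog'))  # True
```

Strings of length 0 or 1 contain no bigrams, and the loop then produces an empty dictionary without any special handling.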
In this problem, we'll see how we can use tuples to represent vectors. Later in the semester, we'll see the Python numpy and scipy packages, which provide objects specifically meant to enable matrix and vector operations, but for now tuples are all we have. So, for this problem we will represent a $d$-dimensional vector by a length-$d$ tuple of floats.
Write a function vec_scalar_mult, which takes two arguments: a tuple of numbers (floats and/or integers) t and a number (float or integer) s and returns a tuple of the same length as t, with its entries equal to the entries of t multiplied by s. That is, vec_scalar_mult implements multiplication of a vector by a scalar. Your function should check to make sure that the types of the input are appropriate (e.g., that s is a float or integer), and raise a TypeError with a suitable error message if the types are incorrect. However, your function should gracefully handle the case where the input s is an integer rather than a float, or the case where some or all of the entries of the input tuple are integers rather than floats. Hint: you may find it useful for this subproblem and the next few that follow it to implement a function that checks whether or not a given tuple is a "valid" vector (i.e., checks if a variable is a tuple and checks that its entries are all floats and/or integers).
def isValidTuple(t):
    # A valid vector is a tuple whose entries are all ints and/or floats
    if not isinstance(t, tuple):
        return False
    return all(isinstance(elmt, (int, float)) for elmt in t)
def vec_scalar_mult(t, s):
    if not isValidTuple(t):
        raise TypeError('t must be a tuple whose entries are ints and/or floats')
    # The scalar must also be a number, as the problem statement requires
    if not isinstance(s, (int, float)):
        raise TypeError('s must be an int or a float')
    return tuple(s * x for x in t)
# t = (1, 4, 4.5, 1)
# print(vec_scalar_mult(t, 2))
Write a function vec_inner_product which takes two "vectors" (i.e., tuples of floats and/or ints) as its inputs and outputs a float corresponding to the inner product of these two vectors. Recall that the inner product of vectors $x,y\in\mathbb{R}^{d}$ is given by $\sum_{j=1}^{d}x_{j}y_{j}$. Your function should check whether or not the two inputs are of the correct type (i.e., both tuples), and raise a TypeError if not. Your function should also check whether or not the two inputs agree in their dimension (i.e., length, so that the inner product is well-defined), and raise a ValueError if not.
def vec_inner_product(x, y):
    if not (isValidTuple(x) and isValidTuple(y)):
        raise TypeError('Inputs must be tuples whose entries are ints and/or floats')
    if len(x) != len(y):
        raise ValueError('Dimension mismatch!')
    # Sum the products of corresponding entries; float() ensures a float result
    # even when every entry is an int
    return float(sum(x[i] * y[i] for i in range(len(x))))
# x = (1, 2.3)
# y = (2, 4)
# vec_inner_product(x, y)
Suppose that a matrix with $m$ rows and $n$ columns is stored in a variable my_mx. Then my_mx will be a length-$m$ tuple of $n$-tuples, so that the $i$-th row of the matrix is given (as a vector) by the $i$-th entry of tuple my_mx. Write a function check_valid_mx that takes a single argument and returns a Boolean, which is True if the given argument is a tuple that validly represents a matrix as described above, and returns False otherwise. A valid matrix will be a tuple of tuples such that each inner tuple contains only numbers (floats and/or integers) and all inner tuples have the same length.
def isValidTupleAndCol(t, nCol):
    # A valid row is a tuple of exactly nCol ints and/or floats
    if not isinstance(t, tuple):
        return False
    if not all(isinstance(elmt, (int, float)) for elmt in t):
        return False
    return len(t) == nCol
def check_valid_mx(my_mx):
    # First check if my_mx is a tuple
    if not isinstance(my_mx, tuple):
        return False
    # Guard the lookup of my_mx[0]: an empty tuple, or a first entry that is not
    # a tuple, is treated as invalid (an assumption) instead of raising an error
    if len(my_mx) == 0 or not isinstance(my_mx[0], tuple):
        return False
    # Get the length of the first tuple and check all other tuples have length equal to this
    nCol = len(my_mx[0])
    # Iterate through each tuple in my_mx and check conditions
    for t in my_mx:
        if not isValidTupleAndCol(t, nCol):
            return False
    return True
# my_mx = ( (1, 2), (2, 3), (1, 2))
# check_valid_mx(my_mx)
Write a function mx_vec_mult that takes a matrix (i.e., tuple of tuples) and a vector (i.e., a tuple) as its arguments, and returns a vector (i.e., a tuple of numbers) that is the result of multiplying the given vector by the given matrix. Again, if you are not familiar with matrix-vector multiplication, refer to Wikipedia or any linear algebra textbook. Your function should check that all the supplied arguments are reasonable (e.g., using your function check_valid_mx), and raise an appropriate error if not. Hint: you may find it useful to make use of the inner-product function that you defined previously.
def mx_vec_mult(my_mx, x):
    # Check first we have a valid matrix
    if not check_valid_mx(my_mx):
        raise TypeError('my_mx must be a tuple of equal-length tuples of numbers')
    outVec = []
    for row in my_mx:
        # Take the inner product of each row of my_mx with the vector x.
        # Note: vec_inner_product already does all the error checking on x
        # and will raise the appropriate errors
        outVec.append(vec_inner_product(row, x))
    return tuple(outVec)
# my_mx = ( (7, 1.1, 6), (2, 2.2, 5), (1, 3.3, 4))
# x = (1, 2.3, 5.5)
# mx_vec_mult(my_mx, x)
In the previous problem, you implemented matrix and vector operations using tuples to represent vectors. In many applications, it is common to have vectors of dimension in the thousands or millions, but in which only a small fraction of the entries are nonzero. Such vectors are called sparse vectors, and if we tried to represent them as tuples, we would be using thousands of entries just to store zeros, which would quickly get out of hand if we needed to store hundreds or thousands of such vectors.
A reasonable solution is to instead represent a sparse vector (or matrix) by only storing its non-zero entries, with (index, value) pairs. We will take this approach in this problem, and represent vectors as dictionaries with positive integer keys (so we index into our vectors starting from 1, just like in MATLAB and R). A valid sparse vector will be a dictionary that has the properties that (1) all its indices are positive integers, and (2) all its values are floats.
Write a function is_valid_sparse_vector that takes one argument, and returns True if and only if the input is a valid sparse vector, and returns False otherwise. Note: your function should not assume that the input is a dictionary.
def is_valid_sparse_vector(vec):
    # First check if vec is a dictionary
    if not isinstance(vec, dict):
        return False
    # Check all keys are positive integers
    if not all(isinstance(key, int) and key > 0 for key in vec.keys()):
        return False
    # Check all values are floats
    return all(isinstance(val, float) for val in vec.values())
# vec = {1: 1.2, 5: 0.1}
# is_valid_sparse_vector(vec)
Write a function sparse_inner_product that takes two "sparse vectors" as its inputs, and returns a float that is the value of the inner product of the vectors that the inputs represent. Your function should raise an appropriate error in the event that either of the inputs is not a valid sparse vector. Note: This may be your first foray into algorithm design, so here's something I'd like you to think about: there are several distinct ways to perform this inner product operation, depending on how one chooses to iterate over the entries of the two dictionaries. For this specific problem, it doesn't much matter which you choose, but there is an important point that you should consider: if the indices of our vectors were sorted, there would be an especially fast way to perform this operation that would require that we look at each entry of the two vectors at most once. Unfortunately, there is no guarantee that dictionary keys are stored in sorted order, so we can't take advantage of this fact, but we'll come back to it. You do not need to write anything about this, but please give it some thought.
def is_valid_sparse_vector_error(vec):
    # First check if vec is a dictionary
    if not isinstance(vec, dict):
        raise TypeError('A sparse vector must be a dictionary')
    # Check all keys are positive integers
    if not all(isinstance(key, int) and key > 0 for key in vec.keys()):
        raise ValueError('All indices must be positive integers')
    # Check all values are floats
    if not all(isinstance(val, float) for val in vec.values()):
        raise ValueError('All values must be floats')
def sparse_inner_product(x, y):
    # Check if inputs are valid sparse vectors and raise appropriate errors if not
    is_valid_sparse_vector_error(x)
    is_valid_sparse_vector_error(y)
    # Start from 0.0 so that a float is returned even when no indices are shared
    innerProd = 0.0
    # Only indices present in both x and y contribute to the inner product
    for key in x:
        if key in y:
            innerProd += x[key] * y[key]
    return innerProd
# x = {1: 2.0, 3: 4.4, 4: 6.4, 6: 8.2}
# y = {1: 2.1, 3: 4.2, 4: 6.4, 7: 8.2}
# y = [1,2]  # an invalid input, which triggers the TypeError
# sparse_inner_product(x, y)
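The "especially fast way" alluded to in the note can be sketched as a merge over two index-sorted lists. This assumes a different representation from the one used in the problem: each vector is a list of (index, value) pairs sorted by index. The function name sorted_sparse_inner_product and that representation are illustrative assumptions, not part of the assignment.

```python
def sorted_sparse_inner_product(x_items, y_items):
    """Inner product of two sparse vectors given as index-sorted (index, value) lists."""
    i, j = 0, 0
    total = 0.0
    # Advance through both lists in step; each entry is visited at most once
    while i < len(x_items) and j < len(y_items):
        x_idx, x_val = x_items[i]
        y_idx, y_val = y_items[j]
        if x_idx == y_idx:
            total += x_val * y_val  # shared index: contributes to the sum
            i += 1
            j += 1
        elif x_idx < y_idx:
            i += 1  # x's index cannot appear later in y, so skip it
        else:
            j += 1  # y's index cannot appear later in x, so skip it
    return total


x = {1: 2.0, 3: 4.4, 4: 6.4, 6: 8.2}
y = {1: 2.1, 3: 4.2, 4: 6.4, 7: 8.2}
# sorted() is used here only to produce index-sorted input from the dictionaries
print(sorted_sparse_inner_product(sorted(x.items()), sorted(y.items())))
```

Note that sorting the dictionary items has its own cost, so the single-pass advantage applies only when the vectors are already stored in sorted order.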
In this problem, you'll do a bit more with tuples.
Note that the built-in functions min and max take any (positive) number of arguments, but that sum does not behave similarly. Write a function called my_sum that takes any number of numeric (ints and floats) arguments, and returns the sum of its arguments. Your function should correctly handle the case of zero arguments. You need not perform any error checking for this function. Reminder: by convention, an empty sum is taken to be 0.
def my_sum(*args):
    # args is a tuple holding all positional arguments; the sum of an
    # empty tuple is 0, so the zero-argument case is handled
    return sum(args)
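For comparison, here is a sketch of the same function written with an explicit accumulator, together with a demonstration of the difference in behavior between max and sum that motivates the problem. The name my_sum_accumulate is hypothetical.

```python
def my_sum_accumulate(*args):
    """Return the sum of all positional arguments; 0 if there are none."""
    total = 0  # the empty sum is 0 by convention
    for value in args:
        total += value
    return total


print(my_sum_accumulate())         # 0
print(my_sum_accumulate(1, 2, 3))  # 6
print(max(1, 2, 3))                # 3: max accepts separate arguments

# The built-in sum expects a single iterable (plus an optional start value),
# so passing the numbers as separate arguments raises a TypeError
try:
    sum(1, 2, 3)
except TypeError as err:
    print('sum(1, 2, 3) raised TypeError:', err)
```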
Write a function reverse_tuple that takes a tuple as its only argument and returns a tuple that is the reverse of the input. That is, the output should have as its first entry the last entry of the input, the second entry of the output should be the second-to-last entry of the input, and so on. You need not perform any error checking for this function.
def reverse_tuple(t):
    # Slicing with a step of -1 returns a reversed copy
    return t[::-1]
# reverse_tuple((1,2,3))
Write a function rotate_tuple that takes two arguments: a tuple and an integer, in that order. Letting $n$ be the integer given in the input, your function should return a tuple of the same length as the input tuple, but with its entries "rotated" by $n$. If $n$ is positive, this should mean to "push forward" all the entries of the input tuple by $n$ entries, with entries that "go off the end" of the tuple being wrapped around to the beginning, so that the $i$-th entry of the input tuple becomes the $(i + n)$-th entry of the output, wrapping around to the beginning of the tuple if this index goes off the end. If $n$ is negative, then this corresponds to rotating the entries in the other direction, with entries of the input tuple being "pushed backward". Your function should perform error checking to ensure that the inputs are of appropriate types. If the user supplies a non-integer, print a message to alert the user that the input was not as expected, and try to recover by casting it to an integer. Hint: a try/except statement will likely be useful here.
def rotate_tuple(t, n):
    if not isinstance(n, int):
        print("Warning: n should be an int, n was cast to int!")
        n = int(n)
    if not isinstance(t, tuple):
        raise TypeError('Input should be a tuple!')
    # An empty tuple has nothing to rotate (and avoids a modulo by zero)
    if len(t) == 0:
        return t
    # Reduce n modulo the length so that rotations larger than the tuple wrap around;
    # in Python the result is non-negative even when n is negative
    k = n % len(t)
    # The last k entries move to the front, the rest are pushed forward
    return t[-k:] + t[:-k]
# n = -2.5
# t = (1, 2, 3, 4, 5)
# rotate_tuple(t, n)
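The hint suggests guarding the cast to an integer (in Python, with a try/except statement). A minimal sketch of that idea follows. The name rotate_tuple_safe, the wording of the messages, and the choice to re-raise a failed cast as a TypeError are all assumptions made for illustration.

```python
def rotate_tuple_safe(t, n):
    """Rotate the entries of tuple t forward by n positions, wrapping around."""
    if not isinstance(t, tuple):
        raise TypeError('First argument should be a tuple!')
    if not isinstance(n, int):
        print('Warning: expected an integer rotation; attempting to cast to int.')
        try:
            n = int(n)
        except (TypeError, ValueError, OverflowError):
            # Assumption: a value that cannot be cast is reported as a TypeError
            raise TypeError('Rotation amount could not be converted to an integer!')
    if len(t) == 0:
        return t  # nothing to rotate, and avoids a modulo by zero
    k = n % len(t)  # non-negative in Python, even for negative n
    # The last k entries wrap around to the front
    return t[-k:] + t[:-k]


t = (1, 2, 3, 4, 5)
print(rotate_tuple_safe(t, 1))     # (5, 1, 2, 3, 4)
print(rotate_tuple_safe(t, 7))     # (4, 5, 1, 2, 3)
print(rotate_tuple_safe(t, -2.5))  # warning, then (3, 4, 5, 1, 2)
```

When k is 0, the expression t[-k:] + t[:-k] evaluates to t + (), so a rotation by a multiple of the length returns the entries unchanged.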