LIST COMPREHENSIONS

In mathematics, comprehension notation can be used to construct new sets from existing sets. For example:

$ \{x^2 \mid x \in \{1 \ldots 5\}\} $

This denotes the set {1,4,9,16,25} of all numbers $x^2$ such that x is an element of the set {1...5}.

In Haskell, a similar comprehension notation can be used to construct new lists from existing lists:

In [8]:
[x^2 | x <- [1..5]]
[1,4,9,16,25]
  • The expression x $\leftarrow$ [1..5] is called a generator, as it states how to generate values for x.
  • Comprehensions can have multiple generators, separated by commas. For example:
In [9]:
[(x,y) | x <- [1,2,3], y <- [4,5]]
[(1,4),(1,5),(2,4),(2,5),(3,4),(3,5)]
  • Changing the order of the generators changes the order of the elements in the final list:
In [10]:
[(x,y) | y <- [4,5], x <- [1,2,3]]
[(1,4),(2,4),(3,4),(1,5),(2,5),(3,5)]
  • Multiple generators are like nested loops: later generators act as more deeply nested loops, so their variables change value more frequently.

  • For example:

In [11]:
[(x, y) | y <- [4,5], x <- [1,2,3]]
[(1,4),(2,4),(3,4),(1,5),(2,5),(3,5)]

Here x $\leftarrow$ [1,2,3] is the last generator, so the x component of each pair changes most frequently.
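
The nested-loop reading can be made concrete with a small sketch. The names pairsComp and pairsNested are illustrative and not part of the notebook; the second definition spells out the nesting with the Prelude functions concatMap and map.

```haskell
-- Comprehension with y as the outer generator and x as the inner one.
pairsComp :: [(Int, Int)]
pairsComp = [(x, y) | y <- [4,5], x <- [1,2,3]]

-- The same list written as explicit nesting: the outer "loop" runs over y,
-- and for each y the inner "loop" runs over every x.
pairsNested :: [(Int, Int)]
pairsNested = concatMap (\y -> map (\x -> (x, y)) [1,2,3]) [4,5]

main :: IO ()
main = do
  print pairsComp                  -- [(1,4),(2,4),(3,4),(1,5),(2,5),(3,5)]
  print (pairsComp == pairsNested) -- True
```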

Dependent Generators

Later generators can depend on the variables that are introduced by earlier generators.

In [12]:
[(x, y) | x <- [1..3], y <- [x..3]]
[(1,1),(1,2),(1,3),(2,2),(2,3),(3,3)]

This is the list [(1,1),(1,2),(1,3),(2,2),(2,3),(3,3)] of all pairs of numbers (x,y) such that x and y are elements of the list [1..3] and $y \geq x$.
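
As a sketch, the same comprehension can be generalised from 3 to any upper bound n. The name orderedPairs is hypothetical and is introduced only for illustration.

```haskell
-- Hypothetical helper: all pairs (x, y) with 1 <= x <= y <= n.
-- The second generator depends on x, the variable bound by the first.
orderedPairs :: Int -> [(Int, Int)]
orderedPairs n = [(x, y) | x <- [1..n], y <- [x..n]]

main :: IO ()
main = do
  print (orderedPairs 3)           -- [(1,1),(1,2),(1,3),(2,2),(2,3),(3,3)]
  print (length (orderedPairs 4))  -- 10, since 4 + 3 + 2 + 1 = 10
```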

Using a dependent generator we can define the library function that concatenates a list of lists:

In [13]:
concat :: [[a]] -> [a]
concat xss = [x | xs <- xss, x <- xs]

concat [[1,2,3], [4,5], [6]]
[1,2,3,4,5,6]

Similarly, we can define the library function length as follows:

In [14]:
length :: [a] -> Int
length xs = sum [1 | _ <- xs]

length [1,2,3,4,5,6]
6
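
In a notebook cell or in GHCi, the definitions above simply shadow the Prelude versions. In a compiled source file, uses of concat and length would clash with the Prelude names, so one possible arrangement (a sketch, assuming a standalone Main module) is to hide the library versions explicitly:

```haskell
-- Hide the library versions so that the names below are unambiguous.
import Prelude hiding (concat, length)

concat :: [[a]] -> [a]
concat xss = [x | xs <- xss, x <- xs]

-- Each element contributes a 1; the wildcard _ ignores the element itself.
length :: [a] -> Int
length xs = sum [1 | _ <- xs]

main :: IO ()
main = do
  print (concat [[1,2,3], [4,5], [6 :: Int]])  -- [1,2,3,4,5,6]
  print (length "abcde")                       -- 5
```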

GUARDS

List comprehensions can use guards to restrict the values produced by earlier generators:

In [15]:
[x | x <- [1..10], even x]
[2,4,6,8,10]

This is the list [2,4,6,8,10] of all numbers x such that x is an element of the list [1..10] and x is even.

Using a guard we can define a function that maps a positive integer to its list of factors:

In [16]:
factors :: Int -> [Int]
factors n = [x | x <- [1..n], n `mod` x == 0]

factors 15
factors 7
[1,3,5,15]
[1,7]

A positive integer greater than 1 is prime if its only factors are 1 and itself. Hence, using factors we can define a function that decides if a number is prime:

In [17]:
prime :: Int -> Bool
prime n = factors n == [1,n]
In [18]:
prime 7
True
In [19]:
prime 15
False

Using a guard we can now define a function that returns the list of all primes up to a given limit:

In [20]:
primes :: Int -> [Int]
primes n = [x | x <- [1..n], prime x]

-- Starting the generator at 2 gives the same result:
--   primes n = [x | x <- [2..n], prime x]
-- because prime 1 is False (factors 1 is [1], not [1,1]),
-- so the guard already filters out 1.
In [21]:
primes 40
[2,3,5,7,11,13,17,19,23,29,31,37]
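
The remark about starting the generator at 2 can be checked with a short sketch. The name primesFrom2 is hypothetical; the other definitions are repeated from the cells above so that the block is self-contained.

```haskell
factors :: Int -> [Int]
factors n = [x | x <- [1..n], n `mod` x == 0]

prime :: Int -> Bool
prime n = factors n == [1, n]

primes :: Int -> [Int]
primes n = [x | x <- [1..n], prime x]

-- Hypothetical variant whose generator starts at 2 instead of 1.
primesFrom2 :: Int -> [Int]
primesFrom2 n = [x | x <- [2..n], prime x]

main :: IO ()
main = do
  print (prime 1)                      -- False: factors 1 is [1], not [1,1]
  print (primes 40 == primesFrom2 40)  -- True
```

Because Haskell evaluates lazily, prime can return False as soon as the comparison with [1,n] fails, without producing the remaining factors.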

As a final example of guards, consider a list of (key,value) pairs whose keys support the == operator. The following function finds all the values associated with a given key in such a list:

In [22]:
find :: Eq a => a -> [(a,b)] -> [b]
find k t = [v | (k',v) <- t, k == k']

find 'b' [('a',3), ('b',2),('c',5),('b',4)]
[2,4]
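
For comparison, the sketch below runs find next to the Prelude function lookup, which returns only the first match wrapped in a Maybe. The example list is the one used above; the local name table is illustrative.

```haskell
find :: Eq a => a -> [(a, b)] -> [b]
find k t = [v | (k', v) <- t, k == k']

main :: IO ()
main = do
  let table = [('a',3), ('b',2), ('c',5), ('b',4)] :: [(Char, Int)]
  print (find 'b' table)    -- [2,4]   every value stored under 'b'
  print (find 'z' table)    -- []      a missing key gives the empty list
  print (lookup 'b' table)  -- Just 2  lookup stops at the first match
```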

THE zip FUNCTION

A useful library function is zip, which maps two lists to a list of pairs of their corresponding elements.

zip :: [a] -> [b] -> [(a,b)]
In [23]:
zip ['a', 'b', 'c'] [1,2,3,4]
[('a',1),('b',2),('c',3)]

Using zip we can define a function that returns the list of all pairs of adjacent elements from a list:

In [24]:
pairs :: [a] -> [(a,a)]
pairs xs = zip xs (tail xs)
In [25]:
pairs [1,2,3,4,5]
[(1,2),(2,3),(3,4),(4,5)]
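
It may be worth checking the edge cases of pairs. The sketch below relies on the Prelude zip inspecting its first argument before its second, so tail is never evaluated when the list is empty.

```haskell
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

main :: IO ()
main = do
  print (pairs ([] :: [Int]))  -- []  zip sees the empty first list and stops
  print (pairs [1 :: Int])     -- []  a single element has no neighbour
  print (pairs "abc")          -- [('a','b'),('b','c')]
```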

Using pairs we can define a function that decides if the elements in a list are sorted:

In [26]:
sorted :: Ord a => [a] -> Bool
sorted xs = and [x <= y | (x,y) <- pairs xs]
In [27]:
sorted [1,2,3,4]
True
In [28]:
sorted [1,2,45,3,7]
False

Using zip we can define a function that returns the list of all positions of a value in a list:

In [29]:
positions :: Eq a => a -> [a] -> [Int]
positions x xs =
    [i | (x', i) <- zip xs [0..], x == x']
In [30]:
positions 0 [1,0,0,1,0,1,1,0]
positions False [False,True,False,False,True,True]
[1,2,4,7]
[0,2,3]
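
Since zip xs [0..] is itself a list of (element, index) pairs, positions can also be expressed with the find function defined earlier. This is a sketch of that alternative; the name positionsViaFind is hypothetical.

```haskell
find :: Eq a => a -> [(a, b)] -> [b]
find k t = [v | (k', v) <- t, k == k']

positions :: Eq a => a -> [a] -> [Int]
positions x xs = [i | (x', i) <- zip xs [0..], x == x']

-- Hypothetical alternative: treat each element as a key and its index as the value.
positionsViaFind :: Eq a => a -> [a] -> [Int]
positionsViaFind x xs = find x (zip xs [0..])

main :: IO ()
main = do
  print (positionsViaFind 0 [1,0,0,1,0,1,1,0 :: Int])                -- [1,2,4,7]
  print (positionsViaFind False [False,True,False,False,True,True])  -- [0,2,3]
  print (positions 'l' "haskell" == positionsViaFind 'l' "haskell")  -- True
```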

STRING COMPREHENSIONS

A string is a sequence of characters enclosed in double quotes. Internally, however, strings are represented as lists of characters.

"abc" :: String

means ['a','b','c'] :: [Char]

Because strings are just special kinds of lists, any polymorphic function that operates on lists can also be applied to strings. For example:

In [31]:
length "abcde"
5
In [32]:
take 3 "abcde"
"abc"
In [33]:
zip "abc" [1,2,3,4]
[('a',1),('b',2),('c',3)]

Similarly, list comprehensions can also be used to define functions on strings, such as counting how many times a character occurs in a string:

In [34]:
count :: Char -> String -> Int
count x xs = length [x' | x' <- xs, x' == x]
In [35]:
count 's' "Mississippi"
4

or counting how many lowercase letters there are in a string:

In [36]:
--import Data.Char

lowers :: String -> Int
lowers xs = length [x | x <- xs, x >='a' && x <= 'z']
--lowers xs = length [x | x <- xs, isAsciiLower x]

lowers "Haskell"
6

THE CAESAR CIPHER

The Caesar cipher encodes a string using a "shift" factor: each letter is replaced by the letter that many places further along the alphabet, wrapping around at the end. For example, the string "haskell is fun" encoded with a shift factor of 3 becomes "kdvnhoo lv ixq".

Encoding and Decoding

In [37]:
import Data.Char

For simplicity, we will encode only lowercase letters and leave every other character unchanged.

In [38]:
let2int :: Char -> Int
let2int c = ord c - ord 'a'

int2let :: Int -> Char
int2let n = chr (ord 'a' + n)

let2int 'a'
let2int 'n'
int2let 0
int2let 13
0
13
'a'
'n'

Using the above functions, we now develop a shift function that takes a shift value and a character and returns the character that results from shifting it.

In [39]:
shift :: Int -> Char -> Char
shift n c 
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c
  
shift 3 'h'
shift 3 'H'
shift 3 'y'
shift (-3) 'b'
'k'
'H'
'b'
'y'
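
One detail of shift may be worth noting: isLower from Data.Char also accepts lowercase letters outside 'a'..'z', and for those let2int falls outside the range 0..25. The sketch below contrasts shift with a hypothetical variant, shiftAscii, that uses isAsciiLower instead; it is an illustration, not the notebook's definition.

```haskell
import Data.Char (chr, isAsciiLower, isLower, ord)

let2int :: Char -> Int
let2int c = ord c - ord 'a'

int2let :: Int -> Char
int2let n = chr (ord 'a' + n)

-- The shift function from the text, which tests letters with isLower.
shift :: Int -> Char -> Char
shift n c
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c

-- Hypothetical variant: only the ASCII letters 'a'..'z' are shifted.
shiftAscii :: Int -> Char -> Char
shiftAscii n c
  | isAsciiLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise      = c

main :: IO ()
main = do
  -- '\233' is the letter e with an acute accent (U+00E9).
  print (shift 3 '\233')       -- 'j'     an unrelated ASCII letter
  print (shiftAscii 3 '\233')  -- '\233'  left unchanged
  -- Shifting by 3 and then by -3 restores every letter 'a'..'z'.
  print (and [shiftAscii (-3) (shiftAscii 3 c) == c | c <- ['a'..'z']])  -- True
```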

Using the shift function and a list comprehension, we can now define the encode function:

In [40]:
encode :: Int -> String -> String
encode n xs = [shift n x | x <- xs]

Note that we do not need a separate function to decode because providing a negative shift factor will automatically decode!
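
If a named decoder is still convenient, it can be sketched as a thin wrapper around encode. The name decode is an addition for illustration; the other definitions are repeated so that the block is self-contained.

```haskell
import Data.Char (chr, isLower, ord)

let2int :: Char -> Int
let2int c = ord c - ord 'a'

int2let :: Int -> Char
int2let n = chr (ord 'a' + n)

shift :: Int -> Char -> Char
shift n c
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c

encode :: Int -> String -> String
encode n xs = [shift n x | x <- xs]

-- Hypothetical wrapper: decoding is encoding with the negated shift factor.
decode :: Int -> String -> String
decode n = encode (negate n)

main :: IO ()
main = do
  putStrLn (encode 3 "haskell is fun")             -- kdvnhoo lv ixq
  putStrLn (decode 3 (encode 3 "haskell is fun"))  -- haskell is fun
```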

In [41]:
encode 3 "haskell is fun"
"kdvnhoo lv ixq"
In [42]:
encode (-3) "kdvnhoo lv ixq"
"haskell is fun"

Frequency Tables

The key to cracking the Caesar cipher is the observation that some letters are used more frequently than others in English text. By analyzing a large corpus of English text we can get the following frequency distribution of the letters a-z:

In [43]:
table :: [Float]
table = [8.1, 1.5, 2.8, 4.2, 12.7, 2.2, 2.0, 6.1, 7.0,
         0.2, 0.8, 4.0, 2.4, 6.7,  7.5, 1.9, 0.1, 6.0,
         6.3, 9.0, 2.8, 1.0, 2.4,  0.2, 2.0, 0.1]
In [44]:
percent :: Int -> Int -> Float
percent n m = (fromIntegral n /fromIntegral m) * 100

percent 5 15
33.333336

Using percent within a list comprehension, along with lowers and count defined previously, we can now define a function called freqs that takes a string and returns, for each letter of the alphabet, its percentage of the lowercase letters in the string.

In [45]:
freqs :: String -> [Float]
freqs xs = [percent (count x xs) n | x <- ['a'..'z']]
        where n = lowers xs
        
freqs "aaabbbbccc"
[30.000002,40.0,30.000002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
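
Two properties of freqs can be checked with a short sketch: the percentages add up to roughly 100 (up to Float rounding), and a string with no lowercase letters produces NaN entries, because percent then divides zero by zero.

```haskell
percent :: Int -> Int -> Float
percent n m = (fromIntegral n / fromIntegral m) * 100

count :: Char -> String -> Int
count x xs = length [x' | x' <- xs, x' == x]

lowers :: String -> Int
lowers xs = length [x | x <- xs, x >= 'a' && x <= 'z']

freqs :: String -> [Float]
freqs xs = [percent (count x xs) n | x <- ['a'..'z']]
  where n = lowers xs

main :: IO ()
main = do
  print (length (freqs "aaabbbbccc"))                  -- 26, one entry per letter
  print (abs (sum (freqs "aaabbbbccc") - 100) < 0.01)  -- True
  print (all isNaN (freqs "1234"))                     -- True: 0 / 0 is NaN
```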

Cracking the cipher

A standard method for comparing a list of "observed" frequencies os with a list of "expected" frequencies es is the chi-square statistic.

$$\sum_{i=0}^{n-1} \frac{(os_i - es_i)^2}{es_i}$$

The details of the chi-square statistic are not important here. The smaller the chi-square value, the better the match between os and es.

In [47]:
-- chi-square statistic
chisqr :: [Float] -> [Float] -> Float
chisqr os es = sum [((o-e)^2)/e | (o,e) <- zip os es]
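
A small worked example may help. With observed values [30, 70] and expected values [50, 50], each term is (20^2)/50 = 8, so the statistic is 16; identical lists give 0. The numbers are made up for illustration.

```haskell
-- chi-square statistic, as defined in the text
chisqr :: [Float] -> [Float] -> Float
chisqr os es = sum [((o - e)^2) / e | (o, e) <- zip os es]

main :: IO ()
main = do
  print (chisqr [30, 70] [50, 50])  -- 16.0
  print (chisqr [50, 50] [50, 50])  -- 0.0  a perfect match
```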
In [49]:
-- rotate left by n positions
rotate :: Int -> [a] -> [a]
rotate n xs = drop n xs ++ take n xs

rotate 3 [1,2,3,4,5]
[4,5,1,2,3]
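
Note that rotate only wraps around when n lies between 0 and the length of the list, which is always the case for the shifts 0..25 on a 26-element table. For other values of n, a variant that first reduces n modulo the list length could be sketched as follows; the name rotateMod is hypothetical.

```haskell
-- rotate left by n positions, as defined in the text
rotate :: Int -> [a] -> [a]
rotate n xs = drop n xs ++ take n xs

-- Hypothetical variant: reduce n modulo the list length first,
-- so that large or negative n also wrap around.
rotateMod :: Int -> [a] -> [a]
rotateMod _ [] = []
rotateMod n xs = rotate (n `mod` length xs) xs

main :: IO ()
main = do
  print (rotate 7 [1,2,3,4,5 :: Int])        -- [1,2,3,4,5]  no wrap-around
  print (rotateMod 7 [1,2,3,4,5 :: Int])     -- [3,4,5,1,2]
  print (rotateMod (-1) [1,2,3,4,5 :: Int])  -- [5,1,2,3,4]
```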

Now suppose that we are given an encoded string, but not the shift factor. We can determine the shift factor by producing the frequency table of the encoded string, calculating the chi-square statistic for each possible shift (factor=0..25) and using the position of the minimum chi-square value as the shift factor!

In [58]:
table' = freqs "kdvnhoo lv ixq"
[chisqr (rotate n table') table | n <- [0..25]]
[1408.8524,640.0218,612.3969,202.42024,1439.9456,4247.318,650.9992,1164.7708,972.1826,993.1813,497.46844,1488.8606,2296.3413,1407.4161,1491.524,3033.984,659.5394,2836.3345,984.7049,809.6876,1310.4423,850.64154,2908.0313,954.4321,5313.5776,626.4024]

The shift factor is the index at which the chisqr value is smallest. Here the minimum, 202.42024, is at index 3, so factor = 3.
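
Locating the index of the smallest value can be sketched in isolation. The helper argmin below is hypothetical; it pairs each value with its index and takes the minimum pair, so ties are resolved in favour of the earliest index. The sample values are the first five entries of the list above.

```haskell
-- Hypothetical helper: index of the smallest element of a non-empty list.
-- Pairs compare on the value first and on the index second.
argmin :: Ord a => [a] -> Int
argmin xs = snd (minimum (zip xs [0..]))

main :: IO ()
main =
  print (argmin [1408.8524, 640.0218, 612.3969, 202.42024, 1439.9456 :: Float])  -- 3
```

Like minimum itself, this sketch fails on an empty list.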

In [50]:
crack :: String -> String
crack xs = encode (-factor) xs
  where
    factor = head (positions (minimum chitab) chitab)
    chitab = [chisqr (rotate n table') table | n <- [0..25]]
    table' = freqs xs
In [51]:
crack "kdvnhoo lv ixq"
"haskell is fun"
In [56]:
crack (encode 6 "list comprehensions are useful")
"list comprehensions are useful"
In [61]:
crack (encode 5 "The lakers beat the warriors by 2 points")
"The lakers beat the warriors by 2 points"

For short strings, or strings with unusual letter distributions, this method may not find the correct shift.
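
One way to work around this, sketched below under the assumption that a reader can pick out the plausible text by eye, is to rank all 26 decodings by their chi-square score instead of keeping only the best one. The helper candidates is hypothetical; the remaining definitions are repeated from the text so that the block is self-contained.

```haskell
import Data.Char (chr, isLower, ord)
import Data.List (sortOn)

-- Expected letter frequencies for 'a'..'z', copied from the text.
table :: [Float]
table = [8.1, 1.5, 2.8, 4.2, 12.7, 2.2, 2.0, 6.1, 7.0,
         0.2, 0.8, 4.0, 2.4, 6.7,  7.5, 1.9, 0.1, 6.0,
         6.3, 9.0, 2.8, 1.0, 2.4,  0.2, 2.0, 0.1]

let2int :: Char -> Int
let2int c = ord c - ord 'a'

int2let :: Int -> Char
int2let n = chr (ord 'a' + n)

shift :: Int -> Char -> Char
shift n c
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c

encode :: Int -> String -> String
encode n xs = [shift n x | x <- xs]

percent :: Int -> Int -> Float
percent n m = (fromIntegral n / fromIntegral m) * 100

count :: Char -> String -> Int
count x xs = length [x' | x' <- xs, x' == x]

lowers :: String -> Int
lowers xs = length [x | x <- xs, x >= 'a' && x <= 'z']

freqs :: String -> [Float]
freqs xs = [percent (count x xs) n | x <- ['a'..'z']]
  where n = lowers xs

chisqr :: [Float] -> [Float] -> Float
chisqr os es = sum [((o - e)^2) / e | (o, e) <- zip os es]

rotate :: Int -> [a] -> [a]
rotate n xs = drop n xs ++ take n xs

-- Hypothetical helper: every shift factor paired with the decoding it
-- produces, ordered from the best (smallest) chi-square score to the worst.
candidates :: String -> [(Int, String)]
candidates xs = [(n, encode (-n) xs) | (_, n) <- sortOn fst scored]
  where
    table' = freqs xs
    scored = [(chisqr (rotate n table') table, n) | n <- [0..25]]

main :: IO ()
main = mapM_ print (take 5 (candidates (encode 3 "haskell")))
```

The first entry of candidates corresponds to the shift with the smallest chi-square value; the later entries show the alternatives that a single best guess discards.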

In [59]:
crack (encode 3 "haskell")
"piasmtt"
In [60]:
crack (encode 3 "boxing wizards jump quickly")
"wjsdib rduvmyn ephk lpdxfgt"