In mathematics, comprehension notation can be used to construct new sets from old sets.
$ \{x^2 \mid x \in \{1, \ldots, 5\}\} $
This denotes the set {1,4,9,16,25} of all numbers $x^2$ such that $x$ is an element of the set {1...5}.
In Haskell, a similar comprehension notation can be used to construct new lists from old lists:
[x^2 | x <- [1..5]]
[(x,y) | x <- [1,2,3], y <- [4,5]]
[(x,y) | y <- [4,5], x <- [1,2,3]]
Multiple generators are like nested loops, with later generators as more deeply nested loops whose variables change value more frequently.
For example:
[(x, y) | y <- [4,5], x <- [1,2,3]]
Here x $\leftarrow$ [1,2,3] is the last generator, so the x component of each pair changes most frequently.
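The nested-loop reading can be made concrete by writing the same list without a comprehension. The sketch below is one possible translation using concatMap and map; the names pairsComp and pairsNested are illustrative and do not come from the text.

```haskell
-- Comprehension with y as the outer generator and x as the inner one.
pairsComp :: [(Int, Int)]
pairsComp = [(x, y) | y <- [4, 5], x <- [1, 2, 3]]

-- The same list written as explicit nesting: the outer "loop" runs
-- over y and the inner "loop" over x, so x changes most frequently.
pairsNested :: [(Int, Int)]
pairsNested = concatMap (\y -> map (\x -> (x, y)) [1, 2, 3]) [4, 5]
```

Both definitions evaluate to [(1,4),(2,4),(3,4),(1,5),(2,5),(3,5)].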
Later generators can depend on the variables that are introduced by earlier generators.
[(x, y) | x <- [1..3], y <- [x..3]]
This produces the list [(1,1),(1,2),(1,3),(2,2),(2,3),(3,3)] of all pairs of numbers (x,y) such that x and y are elements of the list [1..3] and $y \geq x$.
Using a dependent generator we can define the library function that concatenates a list of lists:
concat :: [[a]] -> [a]
concat xss = [x | xs <- xss, x <- xs]
concat [[1,2,3], [4,5], [6]]
Similarly, we can define the library function length as follows:
length :: [a] -> Int
length xs = sum [1 | _ <- xs]
length [1,2,3,4,5,6]
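Because concat and length are already exported by the Prelude, placing these definitions in a source file and then calling them leads to an ambiguity error. A minimal sketch, assuming the definitions live in their own module, hides the Prelude versions first:

```haskell
-- Hide the library versions so that our own definitions are used.
import Prelude hiding (concat, length)

-- Concatenate a list of lists with a dependent generator.
concat :: [[a]] -> [a]
concat xss = [x | xs <- xss, x <- xs]

-- Count elements by replacing each one with 1 and summing.
length :: [a] -> Int
length xs = sum [1 | _ <- xs]
```

With these definitions, concat [[1,2,3], [4,5], [6]] gives [1,2,3,4,5,6] and length [1,2,3,4,5,6] gives 6.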
List comprehensions can use guards to restrict the values produced by earlier generators:
[x | x <- [1..10], even x]
This produces the list [2,4,6,8,10] of all numbers x such that x is an element of the list [1..10] and x is even.
Using a guard we can define a function that maps a positive integer to its list of factors:
factors :: Int -> [Int]
factors n = [x | x <- [1..n], n `mod` x == 0]
factors 15
factors 7
A positive integer is prime if its only factors are 1 and itself. Hence, using factors we can define a function that decides if a number is prime:
prime :: Int -> Bool
prime n = factors n == [1,n]
prime 7
prime 15
Using a guard we can now define a function that returns the list of all primes up to a given limit:
primes :: Int -> [Int]
primes n = [x | x <- [1..n], prime x]
-- Starting the generator at 2 gives the same result:
--   primes n = [x | x <- [2..n], prime x]
-- because prime 1 is False (factors 1 is [1], not [1,1]).
primes 40
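To check the claim about starting at 2, the sketch below puts the two versions side by side. The names primesFrom1 and primesFrom2 are illustrative only; factors and prime are repeated from above so that the block loads on its own.

```haskell
factors :: Int -> [Int]
factors n = [x | x <- [1 .. n], n `mod` x == 0]

-- List equality is lazy: for 15 the comparison stops at the factor 3,
-- so the remaining factors are never computed.
prime :: Int -> Bool
prime n = factors n == [1, n]

-- Version from the text, starting at 1.
primesFrom1 :: Int -> [Int]
primesFrom1 n = [x | x <- [1 .. n], prime x]

-- Variant starting at 2; prime 1 is False, so nothing is lost.
primesFrom2 :: Int -> [Int]
primesFrom2 n = [x | x <- [2 .. n], prime x]
```

Both primesFrom1 40 and primesFrom2 40 give [2,3,5,7,11,13,17,19,23,29,31,37].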
As a final example of guards, consider a list of (key,value) pairs, where the keys support the == operator (that is, the key type is an instance of Eq). The following function finds all the values associated with a given key in the list of key-value pairs:
find :: Eq a => a -> [(a,b)] -> [b]
find k t = [v | (k',v) <- t, k == k']
find 'b' [('a',3), ('b',2),('c',5),('b',4)]
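As a quick check of find, the sketch below applies it to the table from the example and compares it with the Prelude function lookup, which returns only the first match wrapped in a Maybe. The names table1, hits, misses and firstHit are illustrative.

```haskell
find :: Eq a => a -> [(a, b)] -> [b]
find k t = [v | (k', v) <- t, k == k']

table1 :: [(Char, Int)]
table1 = [('a', 3), ('b', 2), ('c', 5), ('b', 4)]

-- Every value stored under 'b', in list order.
hits :: [Int]
hits = find 'b' table1          -- [2,4]

-- A key that does not occur yields the empty list.
misses :: [Int]
misses = find 'z' table1        -- []

-- lookup stops at the first matching key.
firstHit :: Maybe Int
firstHit = lookup 'b' table1    -- Just 2
```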
A useful library function is zip, which maps two lists to a list of pairs of their corresponding elements.
zip :: [a] -> [b] -> [(a,b)]
zip ['a', 'b', 'c'] [1,2,3,4]
Using zip we can define a function that returns the list of all pairs of adjacent elements from a list:
pairs :: [a] -> [(a,a)]
pairs xs = zip xs (tail xs)
pairs [1,2,3,4,5]
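The function tail is partial: it fails on the empty list. In pairs this is harmless, because zip returns [] for an empty first argument without inspecting the second, but newer GHC releases may still warn about the use of tail. A possible variant, offered only as a sketch, uses drop 1 instead; the name pairs' is illustrative.

```haskell
-- Same result as pairs, but drop 1 is total: drop 1 [] is [].
pairs' :: [a] -> [(a, a)]
pairs' xs = zip xs (drop 1 xs)
```

For example, pairs' [1,2,3,4,5] gives [(1,2),(2,3),(3,4),(4,5)], and pairs' applied to an empty or one-element list gives [].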
Using pairs we can define a function that decides if the elements in a list are sorted:
sorted :: Ord a => [a] -> Bool
sorted xs = and [x <= y | (x,y) <- pairs xs]
sorted [1,2,3,4]
sorted [1,2,45,3,7]
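The first call returns True and the second returns False. To see why, it can help to list the adjacent pairs that break the ordering. The helper unsortedPairs below is a hypothetical addition, not a library function.

```haskell
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (tail xs)

sorted :: Ord a => [a] -> Bool
sorted xs = and [x <= y | (x, y) <- pairs xs]

-- Hypothetical helper: the adjacent pairs that are out of order.
-- A list is sorted exactly when this list is empty.
unsortedPairs :: Ord a => [a] -> [(a, a)]
unsortedPairs xs = [(x, y) | (x, y) <- pairs xs, x > y]
```

Here unsortedPairs [1,2,45,3,7] gives [(45,3)], the single pair that makes sorted return False.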
Using zip we can define a function that returns the list of all positions of a value in a list:
positions :: Eq a => a -> [a] -> [Int]
positions x xs =
  [i | (x', i) <- zip xs [0..], x == x']
positions 0 [1,0,0,1,0,1,1,0]
positions False [False,True,False,False,True,True]
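Note that positions zips against the infinite list [0..]; this works because zip stops at the end of the shorter list and the indices are produced lazily. The sketch below repeats the definition and adds a hypothetical helper, firstPosition, that returns only the first index, if there is one.

```haskell
positions :: Eq a => a -> [a] -> [Int]
positions x xs = [i | (x', i) <- zip xs [0 ..], x == x']

-- Hypothetical helper: the position of the first occurrence, if any.
firstPosition :: Eq a => a -> [a] -> Maybe Int
firstPosition x xs = case positions x xs of
  []      -> Nothing
  (i : _) -> Just i
```

The two calls above give [1,2,4,7] and [0,2,3] respectively.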
A string is a sequence of characters enclosed in double quotes. Internally, however, strings are represented as lists of characters.
"abc" :: String
means ['a','b','c'] :: [Char]
Because strings are just special kinds of lists, any polymorphic function that operates on lists can also be applied to strings. For example:
length "abcde"
take 3 "abcde"
zip "abc" [1,2,3,4]
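The following sketch records what these expressions evaluate to and checks that a string literal really is a list of characters. The names on the left are illustrative.

```haskell
-- A string literal is shorthand for a list of characters.
sameList :: Bool
sameList = "abc" == ['a', 'b', 'c']      -- True

-- Ordinary list functions work on strings unchanged.
strLength :: Int
strLength = length "abcde"               -- 5

prefix :: String
prefix = take 3 "abcde"                  -- "abc"

zipped :: [(Char, Int)]
zipped = zip "abc" [1, 2, 3, 4]          -- [('a',1),('b',2),('c',3)]
```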
Similarly, list comprehensions can also be used to define functions on strings, such as counting how many times a character occurs in a string:
count :: Char -> String -> Int
count x xs = length [x' | x' <- xs, x' == x]
count 's' "Mississippi"
or counting how many lowercase letters there are in a string:
lowers :: String -> Int
lowers xs = length [x | x <- xs, x >= 'a' && x <= 'z']
-- Alternative using isAsciiLower (requires import Data.Char):
-- lowers xs = length [x | x <- xs, isAsciiLower x]
lowers "Haskell"
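Here count 's' "Mississippi" gives 4 and lowers "Haskell" gives 6. The choice of predicate matters once the input contains non-ASCII letters: isAsciiLower accepts only 'a' to 'z', while isLower also accepts other Unicode lowercase letters. The sketch below contrasts the two; the names lowersAscii and lowersUnicode are illustrative.

```haskell
import Data.Char (isAsciiLower, isLower)

count :: Char -> String -> Int
count x xs = length [x' | x' <- xs, x' == x]

-- Counts only the letters 'a' to 'z', like lowers in the text.
lowersAscii :: String -> Int
lowersAscii xs = length [x | x <- xs, isAsciiLower x]

-- Counts every Unicode lowercase letter.
lowersUnicode :: String -> Int
lowersUnicode xs = length [x | x <- xs, isLower x]
```

For the string "caf\233" (the last character is e with an acute accent), lowersAscii gives 3 and lowersUnicode gives 4.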
The Caesar cipher encodes a string by shifting each letter a fixed number of places along the alphabet, wrapping around at the end; that number is the "shift" factor. For example, the string "haskell is fun" encoded with a shift factor of 3 becomes "kdvnhoo lv ixq".
import Data.Char
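For simplicity we will encode only lowercase letters and leave every other character unchanged. We start with two functions that convert between the lowercase letters 'a'..'z' and the integers 0..25: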
let2int :: Char -> Int
let2int c = ord c - ord 'a'
int2let :: Int -> Char
int2let n = chr (ord 'a' + n)
let2int 'a'
let2int 'n'
int2let 0
int2let 13
Using the above functions, we now develop a shift function that takes a shift value and a character and returns the new character after shifting it:
shift :: Int -> Char -> Char
shift n c
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c
shift 3 'h'
shift 3 'H'
shift 3 'y'
shift (-3) 'b'
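The four calls above give 'k', 'H', 'b' and 'y'. The wrap-around relies on mod returning a result in 0..25 even for a negative left operand: for shift (-3) 'b' we get (1 - 3) `mod` 26 = 24, which is 'y'. The sketch below contrasts this with rem, which keeps the sign of its left operand; shiftRem is a hypothetical variant written only to show the difference.

```haskell
import Data.Char (chr, isLower, ord)

let2int :: Char -> Int
let2int c = ord c - ord 'a'

int2let :: Int -> Char
int2let n = chr (ord 'a' + n)

shift :: Int -> Char -> Char
shift n c
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c

-- Hypothetical variant using rem: (-2) `rem` 26 is -2, so a negative
-- shift can leave the range 'a'..'z'.
shiftRem :: Int -> Char -> Char
shiftRem n c
  | isLower c = int2let ((let2int c + n) `rem` 26)
  | otherwise = c
```

With these definitions shift (-3) 'b' is 'y', whereas shiftRem (-3) 'b' is '_', which is not a letter.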
Using the shift function and a list comprehension, we can now define the encode function:
encode :: Int -> String -> String
encode n xs = [shift n x | x <- xs]
Note that we do not need a separate function to decode because providing a negative shift factor will automatically decode!
encode 3 "haskell is fun"
encode (-3) "kdvnhoo lv ixq"
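If a named decoder reads better at call sites, it can be written as a thin wrapper around encode. The function decode below is a hypothetical convenience, not something the text defines; the supporting definitions are repeated so that the block loads on its own.

```haskell
import Data.Char (chr, isLower, ord)

let2int :: Char -> Int
let2int c = ord c - ord 'a'

int2let :: Int -> Char
int2let n = chr (ord 'a' + n)

shift :: Int -> Char -> Char
shift n c
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c

encode :: Int -> String -> String
encode n xs = [shift n x | x <- xs]

-- Hypothetical wrapper: decoding is encoding with the negated shift.
decode :: Int -> String -> String
decode n = encode (-n)
```

For example, encode 3 "haskell is fun" gives "kdvnhoo lv ixq", and decode 3 applied to that result gives back "haskell is fun".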
The key to cracking Caesar's cipher is the observation that some letters are used more frequently than others in English text. By analyzing a large corpus of English text we can get the following frequency distribution of letters a-z:
table :: [Float]
table = [8.1, 1.5, 2.8, 4.2, 12.7, 2.2, 2.0, 6.1, 7.0,
         0.2, 0.8, 4.0, 2.4, 6.7, 7.5, 1.9, 0.1, 6.0,
         6.3, 9.0, 2.8, 1.0, 2.4, 0.2, 2.0, 0.1]
percent :: Int -> Int -> Float
percent n m = (fromIntegral n / fromIntegral m) * 100
percent 5 15
Using percent within a list comprehension, along with lowers and count defined previously, we can now define a function freqs that takes a string and returns the percentage of occurrences of each letter of the alphabet in the string:
freqs :: String -> [Float]
freqs xs = [percent (count x xs) n | x <- ['a'..'z']]
  where n = lowers xs
freqs "aaabbbbccc"
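For "aaabbbbccc" the first three entries are approximately 30, 40 and 30 (up to Float rounding) and the remaining 23 are 0. One edge case is worth knowing: if the string contains no lowercase letters, n is 0 and every entry becomes 0/0, which is NaN for Float. The sketch below repeats the definitions and adds a guarded variant; freqsSafe is a hypothetical name, and returning all zeros is only one possible choice.

```haskell
count :: Char -> String -> Int
count x xs = length [x' | x' <- xs, x' == x]

lowers :: String -> Int
lowers xs = length [x | x <- xs, x >= 'a' && x <= 'z']

percent :: Int -> Int -> Float
percent n m = (fromIntegral n / fromIntegral m) * 100

freqs :: String -> [Float]
freqs xs = [percent (count x xs) n | x <- ['a' .. 'z']]
  where
    n = lowers xs

-- Hypothetical variant: return all zeros when there are no lowercase
-- letters, instead of dividing by zero.
freqsSafe :: String -> [Float]
freqsSafe xs
  | lowers xs == 0 = replicate 26 0
  | otherwise      = freqs xs
```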
A standard method for comparing a list of "observed" frequencies os with a list of "expected" frequencies es is the chi-square statistic. The details of the chi-square statistic are not important here. The smaller the value it produces, the better the match between os and es.
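For reference, the statistic computed by the chisqr function in this section can be written as follows, where $n$ is the length of the two lists and $os_i$, $es_i$ are their elements at index $i$:

$$ \chi^2 = \sum_{i=0}^{n-1} \frac{(os_i - es_i)^2}{es_i} $$

Each term is non-negative and is zero exactly when the observed value equals the expected one, which is why a smaller total means a better match.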
-- chi-square statistic
chisqr :: [Float] -> [Float] -> Float
chisqr os es = sum [((o-e)^2)/e | (o,e) <- zip os es]
-- rotate left by n positions
rotate :: Int -> [a] -> [a]
rotate n xs = drop n xs ++ take n xs
rotate 3 [1,2,3,4,5]
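The call above gives [4,5,1,2,3]. Note that rotate does not reduce n modulo the length of the list: it is intended for 0 <= n <= length xs, and outside that range drop and take simply return the list unchanged. The example names below are illustrative.

```haskell
-- Rotate left by n positions.
rotate :: Int -> [a] -> [a]
rotate n xs = drop n xs ++ take n xs

-- The first three elements move to the end.
example1 :: [Int]
example1 = rotate 3 [1, 2, 3, 4, 5]   -- [4,5,1,2,3]

-- Rotating by the full length gives the original list back.
example2 :: [Int]
example2 = rotate 5 [1, 2, 3, 4, 5]   -- [1,2,3,4,5]

-- n larger than the length: drop yields [], take yields the whole list.
example3 :: [Int]
example3 = rotate 7 [1, 2, 3]         -- [1,2,3]
```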
Now suppose that we are given an encoded string, but not the shift factor. We can determine the shift factor by producing the frequency table of the encoded string, calculating the chi-square statistic for each possible rotation of this table (n = 0..25) against the table of expected frequencies, and using the position of the minimum chi-square value as the shift factor!
table' = freqs "kdvnhoo lv ixq"
[chisqr (rotate n table') table | n <- [0..25]]
The shift factor is the index at which the chisqr value is smallest. For this example, factor = 3.
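Finding the index of the smallest value can be done with positions and minimum from earlier in these notes. The sketch below shows that approach next to a possible alternative that pairs each value with its index; the names indexOfMin and indexOfMin' are illustrative, and both functions fail on an empty list.

```haskell
positions :: Eq a => a -> [a] -> [Int]
positions x xs = [i | (x', i) <- zip xs [0 ..], x == x']

-- Find the minimum, then look up its first position.
indexOfMin :: Ord a => [a] -> Int
indexOfMin xs = head (positions (minimum xs) xs)

-- Hypothetical alternative: take the minimum (value, index) pair;
-- ties on the value are broken by the smaller index.
indexOfMin' :: Ord a => [a] -> Int
indexOfMin' xs = snd (minimum (zip xs [0 ..]))
```

For example, both functions return 1 for the list [5.0, 2.5, 9.0, 2.5].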
crack :: String -> String
crack xs = encode (-factor) xs
  where
    factor = head (positions (minimum chitab) chitab)
    chitab = [chisqr (rotate n table') table | n <- [0..25]]
    table' = freqs xs
crack "kdvnhoo lv ixq"
crack (encode 6 "list comprehensions are useful")
crack (encode 5 "The lakers beat the warriors by 2 points")
For short strings, or strings with an unusual letter distribution, this method may not work:
crack (encode 3 "haskell")
crack (encode 3 "boxing wizards jump quickly")
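When frequency analysis fails on inputs like these, one possible fallback, which is not part of the method described above, is to list all 26 candidate decodings and pick the readable one by eye. The function candidates below is a hypothetical helper; the supporting definitions are repeated so that the block loads on its own.

```haskell
import Data.Char (chr, isLower, ord)

let2int :: Char -> Int
let2int c = ord c - ord 'a'

int2let :: Int -> Char
int2let n = chr (ord 'a' + n)

shift :: Int -> Char -> Char
shift n c
  | isLower c = int2let ((let2int c + n) `mod` 26)
  | otherwise = c

encode :: Int -> String -> String
encode n xs = [shift n x | x <- xs]

-- Hypothetical helper: every shift factor paired with the text that
-- results from undoing it.
candidates :: String -> [(Int, String)]
candidates xs = [(n, encode (-n) xs) | n <- [0 .. 25]]
```

For example, lookup 3 (candidates (encode 3 "haskell")) gives Just "haskell".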