posted Jul 28, 2009 1:30 PM by Wei Hu
[
updated Oct 6, 2009 9:05 PM
]
This slides compares the efficiency of various approaches in implementing a compiler. SYB is very slow: it works by runtime type introspection; Its traversals don’t do any pruning, so it’ll look at every Char of every String to see if it’s a Name. A talk with video and slides. An excellent tutorial. Comparison of libraries. |
posted May 25, 2009 3:14 PM by Wei Hu
http://stackoverflow.com/questions/900585/why-are-functions-in-ocaml-f-not-recursive-by-default/904715#904715
One crucial reason for the explicit use of rec is to do with Hindley-Milner type inference, which underlies all staticly typed functional programming languages (albeit changed and extended in various ways). If you have a definition let f x = x, you'd expect it to have type 'a -> 'a and to be applicable on different 'a types at different points. But equally, if you write let g x = (x + 1) + ..., you'd expectx to be treated as an int in the rest of the body of g. The way that Hindley-Milner inference deals with this distinction is through an explicit generalisation step. At certain points when processing your program, the type system stops and says "ok, the types of these definitions will be generalised at this point, so that when someone uses them, any free type variables in their type will be freshly instantiated, and thus won't interfere with any other uses of this definition." It turns out that the sensible place to do this generalisation is after checking a mutually recursive set of functions. Any earlier, and you'll generalise too much, leading to situations where types could actually collide. Any later, and you'll generalise too little, making definitions that can't be used with multiple type instantiations. So, given that the type checker needs to know about which sets of definitions are mutually recursive, what can it do? One possibility is to simply do a dependency analysis on all the definitions in a scope, and reorder them into the smallest possible groups. Haskell actually does this, but in languages like F# (and OCaml and SML) which have unrestricted side-effects, this is a bad idea because it might reorder the side-effects too. So instead it asks the user to explicitly mark which definitions are mutually recursive, and thus by extension where generalisation should occur. |
posted Apr 1, 2009 8:56 AM by Wei Hu
I never thought n+k patterns was any special. For example, aren't the two definitions equivalent? fac1 0 = 1 fac1 (n+1) = (n+1) * fac1 n
fac2 0 = 1 fac2 n = n * fac2 (n-1)
In fact, they aren't. GHCi infers fac1's type as (Integral t) => t -> t, and fac2's as (Num t) => t -> t. Why's that? The answer is in the Haskell report.
Matching an n+k pattern (where n is a variable and k is a positive integer literal) against a value v succeeds if x >= k, resulting in the binding of n to x - k, and fails otherwise. An n+k pattern can only be matched against a value in the class Integral.
case v of { x+k -> e; _ -> e' } = if v >= k then (\x -> e) (v-k) else e' where k is a numeric literal
That's why fac3 terminates while fac4 diverges:
fac3 (n+1) = (n+1) * fac3 n fac3 0 = 1
fac4 n = n * fac4 (n-1) fac4 0 = 1 And that's also why ghci warns about fac1's and fac3's pattern matches are non-exhaustive.
As the Haskell report points out, many people feel that n+k patterns should not be used. These patterns may be removed or changed in future versions of Haskell . |
posted Mar 8, 2009 10:47 PM by Wei Hu
[
updated May 27, 2009 9:19 AM
]
UPDATE: Since ghc-6.10.3 switched to Haskeline, the problem has been solved forever.
I have long hated the behavior of ghci in Emacs since ghc's transition from readline to editline. I googled for this problem before but found no solution. This problem has bugged me enough that I decided to take a look at haskell-mode's source code, thinking haskell-mode is doing something funny. Then I ended up reading the source code of comint and the Emacs Lisp Manual. Finally, it turns out that the primitive process functions are problematic with ghci.
Executing these two lines (start-file-process "p1" "foo" "ghci") (process-send-string "p1" "3\n") will display Prelude> 3^M 3^M in the buffer foo.
In contrast, executing (start-file-process "p2" "bar" "bash") (process-send-string "p2" "uname\n") simply displays $Linux , without echoing the input "uname".
So, either Emacs or libedit is being dumb (or is trying to be clever). Being unable to progress further, I tried google again. This time Google turns up a helpful result. To quote Judah Jacobson,
The problem is that eshell tells subprocesses that they're running in a terminal (e.g., when queried via hIsTerminalDevice), but always echos user input itself regardless of the tty's ECHO attribute. This confuses libedit, which assumes that if it's connected to a terminal then it can turn off echoing in order to run its own rich line editor.
In "M-x shell", you can work around this issue by starting ghci with, for example, "cat | ghci" to prevent ghci and libedit from trying to run as if in a terminal. Probably something similar could be done for haskell-mode; I'm not familiar enough with emacs to fix it myself.
The workaround given by Conal Elliott is as follows
Given this info, there's a fairly easy emacs haskell-mode work-around. I made a shell script "ghci-no-tty" in my ~/bin that contains
# So ghci+readline won't echo input cat | /usr/local/bin/ghci $*
and used "M-x customize-group" with the "haskell" group to set the "Haskell Program Name" variable to "/home/conal/bin/ghci-no-tty" (must be full path).
After reading the manual more closely, I found yet a better solution: (add-hook 'haskell-mode-hook '(lambda () (setq process-connection-type nil))) |
posted Mar 7, 2009 4:16 PM by Wei Hu
[
updated Mar 14, 2009 9:44 PM
]
Useful ResourcesRuntime Support for Multicore Haskell, Control.Parallel, Control.Parallel.Strategies, Real World Haskell.
My Random Notes1. Internals of parpar stores its first argument as spark in the spark pool, and then continues by evaluating its second argument.
2. seq and strictnessIt's easy to see that seq is strict in its first argument, but it's harder to understand that seq is also strict in its second argument. The wikibook explains why even id is strict, although id itself doesn't look inside its argument. When we say "Does f x force x?" what we really mean is "Given that we're forcing f x, does x get forced as a result?". To look at this in another way, we can imagine forcing the argument of id in id's body without changing the semantics. Because if we get to a point that forces (id x), we'll be forcing x anyway; if we're not forcing (id x), the code that forces x in id's body doesn't get executed.
3. pseq and strictness-- See the code and comments in GHC.Conc pseq x y = x `seq` lazy y
-- See the code and comments in GHC.Base lazy x = x
pseq is only strict in its first argument, as the compiler sees it. Why does this definition make pseq lazy in its second argument? As I understand it, lazy is a magical name that fools the strictness analysis through a hardcoded compiler hack. What's more, lazy is re-exported in GHC.Exts for us to take advantage of.
4. Why pseqDue to their strictness properties, pseq has an order-of-evaluation guarantee, while seq has no guarantee.
5. Algorithms + Strategies = ParallelismNon-strict evaluation can get in the way. Rather than coding up ad-hoc strict evaluating code, we can generalize the idea of evaluation strategies. Evaluation strategies essentially means two things: how deep to evaluate the data structure, and whether to evaluate in parallel or in sequence. The depth is controlled by instances of the type class NFData that build on the basic strategies r0 (lazy) and rwhnf (strict). A related module called DeepSeq has also been proposed to deeply evaluate data structures, but it's not standardized and should be avoided. The parallelism is controlled by strategy combinators that build on par (parallel) and pseq (sequential).
The interesting result is that we can now write code as normal, and then evaluate the "return value" using the strategy of our choice.
When reading the source code of Control.Parallel.Strategies, it's very important to note that seq is redefined as pseq! |
posted Jan 27, 2009 2:16 PM by Wei Hu
[
updated Nov 10, 2009 7:53 PM
]
There are a handful of games developed in Haskell. Here I'm collecting those that can be used as learning tools.
Paratrooper and Plot 4. The programs may be a little too imperative and not so elegant, but they come with very detailed writeup. The games themselves are not very exciting. They depend on the HGL library, whose interface has changed. Basically, you need to change the line > import GraphicsUtils to > import Graphics.HGL.
|
posted Jan 18, 2009 5:28 PM by Wei Hu
[
updated Jan 18, 2009 5:43 PM
]
Learned this from reading the doc of Control.Moand.Cont.
A little background. The function runCont accepts a Cont monad and the monad's continuation. In the 2nd example, the continuation is represented by the do block, which should be the first argument of runCont. One solution I knew of, is: (flip runCont) id $ do ...
Here we see another arguably better solution: (`runCont` id) $ do ...
which translates to (do ...) `runCont` id
and then runCont (do ...) id
Cool! |
posted Dec 13, 2008 11:49 PM by Wei Hu
The interesting paper is published in JFP 08.
The naive sieve (the first example on this page, at the bottom of which is a link to a collection to prime number programs by the paper author), although beautiful, is not a genuine sieve. A genuine sieve crosses off all the multiples of a prime number. The next remaining number is then found to be a prime. However, in the naive sieve, every number is tested against all the primes that are less than it. This is the most inefficient algorithm disguised as a sieve!
The genuine sieve is O(n loglogn), and the unfaithful sieve is O(n2/(logn)2).
How can we fix it? Classic imperative implementations use an array and find primes up to some fixed limit. We want a purely functional program that produce an infinite list efficiently. We perform "crossing off" in a lazy way. For each prime p that is discovered so far, there is an iterator holding the next multiple to cross off. A heap-based priority queue (Section 3.1) is a proper data structure for this problem.
It's apparent that we can cut down the work by skipping even numbers. Generalizing this idea further, a prime wheel can be used for optimization.
Finally, a list-based implementation by Richard Bird is given in Section 6. |
posted Dec 8, 2008 11:16 PM by Wei Hu
这个问题的重点在于只能一次遍历树。
Question设一棵树的类型为: data Tree t = Leaf t | Node t (Tree t ) (Tree t ) deriving (Show) 写一函数 avgTree :: (Fractional t) => Tree t -> Tree t,将树中所有节点中的值 (即类型为 t 的那些值) 全部替换为它们的均值。常见的方法是遍历这棵树取均值、再遍历一次 mirror 这棵树以构造新树,这里的要求是只遍历一次。
AnsweravgTree :: (Fractional t) => Tree t -> Tree t avgTree t = let (finalTree, sum, count) = go t ans = sum / count go (Leaf x) = (Leaf ans, x, 1) go (Node x l r) = (Node ans l' r', x+suml+sumr, 1+nl+nr) where (l', suml, nl) = go l (r', sumr, nr) = go r in finalTree
下面 这个问题更难,起初我只能想到使用imperative style进行destructive update的方法,但不够purely functional。最后想到可以使用一个辅助列表来记录当前访问过的节点,并将待访问的子节点推迟到以后计算。这样一来就把destructive update的需求消除了,毕竟我们只需做一次update(是不是所有只需做一次update的问题都可以转换为lazy evaluation来达到purely functional)?借助一个辅助数据结构来保存数据,也是Haskell中memoization的常用技巧。
Question考慮這樣的 internally labelled binary tree: data Tree a = Node a (Tree a) (Tree a) | Null
目標是寫一個函數 bfl :: Tree a -> [b] -> Tree (a, b) bfl t xs = ?
其中 xs 是無限長的 stream。函數 bfl 會依照先寬度(不是先深度)的順序把 t 的節點一個個用 xs 標記。例如 t = Node 'a' (Node 'b' (Node 'c' Null Null) (Node 'd' Null Null)) (Node 'e' (Node 'f' Null Null) (Node 'g' Null Null))
那麼 bfl t [0..] = Node ('a',0) (Node ('b',1) (Node ('c',3) Null Null) (Node ('d',4) Null Null)) (Node ('e',2) (Node ('f',5) Null Null) (Node ('g',6) Null Null))
Answerdata Tree a = Node a (Tree a) (Tree a) | Null deriving Show
bfl :: Tree a -> [b] -> Tree (a, b) bfl Null _ = Null bfl t stream = head ansList where ansList = go 0 [t] [] go _ [] nodes = nodes go count q nodes = let (Node x l r, q') = deque q q'' = enque r $ enque l q' tag = stream !! count distance = length q node = Node (x, tag) l' r' lcount = count + distance rcount = case l of Null -> lcount _ -> lcount + 1 l' = case l of Null -> Null _ -> ansList !! lcount r' = case r of Null -> Null _ -> ansList !! rcount in go (count + 1) q'' $ nodes ++ [node]
enque Null = id enque x = (x:)
deque q = (last q, init q)
test = bfl t stream where t = Node 'a' (Node 'b' (Node 'c' Null Null) (Node 'd' Null Null)) (Node 'e' (Node 'f' Null Null) (Node 'g' Null Null)) stream = [0..]
|
posted Dec 1, 2008 9:12 PM by Wei Hu
[
updated May 10, 2009 12:50 PM
]
I just read the great tutorial on parallel and concurrent programming by Simon Peyton Jones et al. It covers three forms of parallelism: semi-explicit parallelism (par), explicit concurrency using threads (MVar or STM), and data parallelism. The first two can be called task parallelism. The tutorial even talks about the tricky details of Weak Head Normal Form, evaluation order, and the subtle difference between pseq and seq. It is the best tutorial by far besides Real World Haskell.
DPH [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] is particularly interesting, as it seems like THE Holy Grail. It's still a hot research area and thus hasn't been stabilized yet. It seems that Debian's version of ghc-6.10 is missing the dph package??
Compared with threads, DP can achieve better performance on distributed-memory architecture. In DP, the parallelism comes implicitly from parallel arrays, without using forkIO or par. The idea looks pretty similar to Google's MapReduce.
Flat DP vs. Nested DP (NDP). Flat DP is bad in that all the parallelism must come from a single parallel loop, making composing parallel functions or data structures impossible. NDP doesn't have this restriction. NDP is implemented by transforming the code to flat DP. The transformation is called vectorization.
Parallel arrays are strict. |
|