|
- academics - After the weakly amusing downers of the previous days, it's time for some pick-me-ups:
(Image sources: redcafe.net and others)
Neuraltic

I suppose it's time to try this neural network shtick in earnest, motivated by the fact that the guy who ended up winning the Mitosis Detection contest has won quite a few similar ones as well. I mean, if a particular technique seems especially suited to a particular class of problem, it only makes sense to adopt it (perhaps unless one actually invented the other technique); otherwise, it would be a little like sticking with a screwdriver when confronted with a nail, just because the screwdriver was the first tool one picked out of the toolbox, or because some smug rival you don't like used the hammer before you.

It might be noted that SVMs were probably the most popular classification scheme in the conference contests because easy-to-use libraries are common (have I mentioned that if a researcher wants to popularize a new method, assuming that it is actually good, by far the best bet is probably to release it as a standalone multiplatform program with many examples?). Still, simply being "in" is no guarantee of performance - sparse representation, for example, seems unsuited for this class of problem (but don't quote me... well, as if there's any chance of that).

Basically, if facing a classification problem, this is my opinion on how it should be handled:
A comparison with SVMs has been attempted here, where it was noted that neural networks were "(relatively) easier to implement from scratch". Well, the time has come to put my money where my mouth is, so here's a pure Javascript implementation of a general fully-connected network.

A short FAQ:

Click "Build" to, well, build the network. The "2,3,1" in the adjacent textbox describes a network with 2 input nodes, 3 nodes in the second (hidden) layer, and 1 node in the last (output) layer. Try "2,3,2,1", for example. Input nodes are in green, output nodes in red, and intermediate hidden nodes in grey.

Each edge connecting nodes has an associated weight, shown next to the dot on the edge; while I have tried to spread the numbers out as much as possible, overlaps are inevitable with more nodes/edges due to the limited space.

A freshly-built network doesn't usually do anything useful, so we have to train it by giving it data. Note the textbox "0,0,0-0,1,1-1,0,1-1,1,0". Each set of inputs/desired outputs is separated by a dash, so what this actually says is:

inputs 0,0 should give output 0
inputs 0,1 should give output 1
inputs 1,0 should give output 1
inputs 1,1 should give output 0

which is the XOR function (known to be unlearnable by single-layer perceptrons, i.e. without the hidden layer).
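For concreteness, here is one way a training string in that format could be pulled apart before feeding it to the network. This is only an illustrative sketch - the function name and structure are mine, not the widget's actual code - and it assumes the first values of each dash-separated set are the inputs (two of them, for a "2,3,1" network) and the rest are the desired outputs:

function parseTrainingData(str, numInputs) {
    // "0,0,0-0,1,1-1,0,1-1,1,0" -> four input/target pairs
    return str.split("-").map(function (set) {
        var values = set.split(",").map(Number);
        return { input: values.slice(0, numInputs), target: values.slice(numInputs) };
    });
}

// parseTrainingData("0,0,0-0,1,1-1,0,1-1,1,0", 2)
// => [ {input: [0,0], target: [0]}, {input: [0,1], target: [1]},
//      {input: [1,0], target: [1]}, {input: [1,1], target: [0]} ]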
[N.B. Being a rudimentary implementation, there is minimal input checking, so don't be surprised if it blows up when given input in the wrong format]

Clicking the "Train" button trains the network such that it hopefully produces the desired output when presented with the given input. Note the "10000" and "0.05" values. "10000" means that the network will be trained for 10000 epochs, or "iterations using all the training data" (four sets, or vectors, in this case), while "0.05" is the learning rate, or how much the edge values can be adjusted in each epoch (more later). Note that the edge values are changed by the process, and the error after every 1000 epochs is displayed in the textbox below the network.

Finally, "Test" runs the given test data (default "1,1") through the trained network, which should give the desired output "0".

So how does this magic work? A more complete explanation would come from Wikipedia, or this project (which was perhaps the best code-cum-theory all-in-one I could find), but I'll try to make it very quick.

How (not why) a neural network operates once it is trained is actually very simple. Note that training only ever affects the weights of the edges connecting the nodes. Everything else - the structure, the number of nodes, etc. - is untouched. Using a trained network is simply a matter of passing inputs up, layer by layer. For example, the leftmost hidden node (layer two) has two edge connections to the first layer; say their weights are 0.45346 and 0.46975 respectively. Then, if both inputs are 1, the value of the hidden node is just F(0.45346 + 0.46975), i.e. F() applied to the weighted sum of the inputs, where F() is some squashing/activation function that ensures the output remains within a certain (small) range regardless of the inputs. In this example, the hyperbolic tangent is used, so all input values above 3 get squashed to (very nearly) 1. The same goes for all the other nodes and... that's it. Yes, that's it. This is the forward propagation part.

The only really tricky part, which can probably be described as the heart of the network, is how to learn the edge connection values in the first place, which is backward propagation. The basic idea is actually not that far-fetched - simply pass your inputs through the current state of the network, discover how different the output is from the desired output and in which direction, then adjust the weights slightly in the "correct" direction, again layer-by-layer but backwards this time, hopefully producing a smaller error in the next training epoch. This is repeated until the error is small enough. There is no actual guarantee that the true global optimum will be reached (especially with a ton of weights to determine), but results don't lie (if achieved). A historical note is that weights were once adjusted only once per epoch (batch learning) instead of after each training vector in the epoch (incremental learning), but batch learning has been convincingly argued to be inferior.

Do note that in this implementation, which has no bias nodes, the output when all input nodes are zero must necessarily be zero (as any weight multiplied by zero is still zero). Come to think of it, Herr Ahm might see an analogy between this and his hierarchical message propagation distortion theory...

That's it for basic neural networks.
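Since neural networks really are "(relatively) easier to implement from scratch", here is a bare-bones sketch along the lines described above - no bias nodes, tanh squashing, incremental (per-vector) weight updates - trained on the default XOR data. To be clear, this is an independent illustration with names of my own choosing, not the code behind the widget:

function buildNetwork(sizes) {              // e.g. [2, 3, 1]
    var weights = [];                       // weights[l][i][j]: node i in layer l -> node j in layer l+1
    for (var l = 0; l < sizes.length - 1; l++) {
        weights[l] = [];
        for (var i = 0; i < sizes[l]; i++) {
            weights[l][i] = [];
            for (var j = 0; j < sizes[l + 1]; j++) {
                weights[l][i][j] = Math.random() - 0.5;    // small random starting weights
            }
        }
    }
    return { sizes: sizes, weights: weights };
}

// Forward propagation: pass the inputs up layer by layer,
// squashing each weighted sum with the hyperbolic tangent.
function forward(net, input) {
    var acts = [input.slice()];
    for (var l = 0; l < net.weights.length; l++) {
        var next = [];
        for (var j = 0; j < net.sizes[l + 1]; j++) {
            var sum = 0;
            for (var i = 0; i < net.sizes[l]; i++) {
                sum += net.weights[l][i][j] * acts[l][i];
            }
            next[j] = Math.tanh(sum);
        }
        acts.push(next);
    }
    return acts;                            // acts[acts.length - 1] is the network's output
}

// Backward propagation for a single input/target pair: measure how far off the
// output is (and in which direction), then nudge every weight slightly that way.
function trainOne(net, input, target, rate) {
    var acts = forward(net, input);
    var L = acts.length - 1;
    var deltas = [];
    deltas[L] = [];
    for (var j = 0; j < net.sizes[L]; j++) {              // output layer error signal
        var out = acts[L][j];
        deltas[L][j] = (target[j] - out) * (1 - out * out);    // tanh'(z) = 1 - tanh(z)^2
    }
    for (var l = L - 1; l >= 1; l--) {                    // hidden layers, backwards
        deltas[l] = [];
        for (var i = 0; i < net.sizes[l]; i++) {
            var err = 0;
            for (var j = 0; j < net.sizes[l + 1]; j++) {
                err += net.weights[l][i][j] * deltas[l + 1][j];
            }
            deltas[l][i] = err * (1 - acts[l][i] * acts[l][i]);
        }
    }
    for (var l = 0; l < net.weights.length; l++) {        // adjust weights in the "correct" direction
        for (var i = 0; i < net.sizes[l]; i++) {
            for (var j = 0; j < net.sizes[l + 1]; j++) {
                net.weights[l][i][j] += rate * deltas[l + 1][j] * acts[l][i];
            }
        }
    }
}

// XOR, as in the default "0,0,0-0,1,1-1,0,1-1,1,0"
var data = [[[0, 0], [0]], [[0, 1], [1]], [[1, 0], [1]], [[1, 1], [0]]];
var net = buildNetwork([2, 3, 1]);
for (var epoch = 0; epoch < 10000; epoch++) {             // 10000 epochs, learning rate 0.05
    for (var k = 0; k < data.length; k++) {
        trainOne(net, data[k][0], data[k][1], 0.05);
    }
}
var out = forward(net, [1, 1]);
// out[out.length - 1][0] should now be close to the desired 0

Given the random starting weights, a particular run can still get stuck away from the global optimum, as noted above; rebuilding and retraining (or tweaking the learning rate) is the crude but usual remedy.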
Convolutional neural networks follow the same basics, but share weights between edges and are no longer fully-connected, i.e. not all nodes in the previous layer connect to each node in the next layer - a way of reducing complexity in a clever and biologically-inspired manner (there's a short sketch of this further down). Not too ungraspable either. There have been proposed modifications, but those are details. However, even after this simplification (for the input, not for the programmer), such networks were still not too popular, as they took an awfully long time to train - it was mentioned that the contest winner's network would have taken a full year to train on a CPU, which is where the GPU comes in. Note that the GPU's power at number-crunching has already been acknowledged in our Bitcoin introduction last year.

Can't Run From The GPU

I'll be frank: learning a new programming paradigm can be a pain, especially under pressure, but in this case it might well be unavoidable. The future of computing is, as colourfully illustrated here, more tractors instead of faster tractors, due to heating constraints. Remember the giddy days when manufacturers boasted of ever-faster processor speeds? I started out with a Pentium 166, upgraded to a 350, then lost track. Today, it's dual, quad, octa-cores.

Now, coding to take advantage of a GPU's capabilities is quite different from the CPU-based programming most of us (common code monkeys, anyway) are accustomed to. The GPU can do a lot of things simultaneously - it's just that this comes with quite a few restrictions. It's up to the programmer to structure his algorithm logic such that the things done together are actually parallelizable, and unfortunately, as of now, he probably also has to take care of all the muck like shuffling memory between the CPU and GPU and whatnot. As for the framework, the choice is largely between OpenCL and CUDA, with OpenCL (in theory) supported by many more vendors, and CUDA being Nvidia-specific (but still slightly more popular). Discussion with a lab senior has persuaded me to kick off with OpenCL for its portability, even though there appear to be more and better tutorials for CUDA. And yes, where there's speed, there's money.

While we're still on new (okay, currently less-popular) paradigms, good cases have been made for shifting from imperative to functional programming, it being hard but good. It might be a long wait before this percolates to industry, but Haskell isn't actually that tough, and *gasp* can actually do things! Who would have thought? Oh, and it's always fun to rag on Java - see this gem, including sage observations like "Advocating Object-Oriented Programming is like advocating Pants-Oriented Clothing".

For those who just want to try GPU-based CNNs out (without worrying about integration into existing software), Theano and cuda-convnet are good bets, and there are a couple of other projects out there for those interested in sample code. It should be remembered that a GPU implementation doesn't actually improve a CNN, though - it just speeds up the training a lot, which can be important in practice if not in theory.
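To tie the weight-sharing idea back to something concrete, here is a matching sketch of the forward pass through a single convolutional layer (one input channel, one filter, "valid" positions only, no bias), in the same style as the fully-connected code above. The same small kernel of weights is reused at every position of the image, rather than every input pixel having its own weight to every node in the next layer; and whether this runs on a CPU or a GPU, the arithmetic is identical - the GPU simply does many of these multiply-adds at once. Again, the names here are mine, not taken from Theano or cuda-convnet:

function convolveForward(image, kernel) {
    var kh = kernel.length, kw = kernel[0].length;        // kernel (shared weights) dimensions
    var oh = image.length - kh + 1;                       // output height ("valid" convolution)
    var ow = image[0].length - kw + 1;                    // output width
    var output = [];
    for (var y = 0; y < oh; y++) {
        output[y] = [];
        for (var x = 0; x < ow; x++) {
            var sum = 0;
            for (var i = 0; i < kh; i++) {                // same kernel reused at every (y, x)
                for (var j = 0; j < kw; j++) {
                    sum += kernel[i][j] * image[y + i][x + j];
                }
            }
            output[y][x] = Math.tanh(sum);                // same squashing function as before
        }
    }
    return output;
}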
Next: Financials