Sunday, July 21, 2013 - 16:48 SGT
Posted By: Gilbert

Call To Hams


Less confusing than humans



HPC

Thought I'd attend the High Performance Computing Technologies in Finances conference on Tuesday, as it overlapped somewhat with my current work (and was a chance to relive a bit of last November), and this is what I came away with:


Some Day One Quotes

(Thanks to the first speaker)

"Analyst productivity is important"

"Needles in a haystack vs. relationships between the straws" (Spurious correlations, perhaps the number one confounding factor with so much data)

"To know an elephant, you need all the monks"

"Analysts hate mathematical formulas"

"You need the right tool for the job"


Scalable Graph Analytics

The first (replacement) speaker started off by remarking that it's all connected, with reference to the Euro Crisis, and offered one possible definition of Big Data: data that exceeds infrastructural capabilities. The suggestion was that we might examine data as a network of connections at scale, instead of as large, fragmented datasets (Rumsfeld's unknown unknowns came in, appropriately enough).

In particular, typical compliance environments were said to have well-established policies and procedures, but a vast array of disconnected resources, rigid data schemas requiring manual integration, and few resources for unconstrained discovery. Thus, as transaction volumes increase and alert volumes rise in tandem, more analysts have to be hired, which isn't ideal (seems like productivity is eating higher-end jobs too).

Various applications that benefit from examining all (or at least more) of the relationships between data were discussed, including cybersecurity, which recalls the example of terrorist identification in Superfreakonomics (hopefully, they haven't picked up a copy). On the revenue side, customer insight/relationship discovery was touted, to produce cross- and up-selling opportunities.

Graphs were held up as an ideal structure for discovery, coping with dynamic data sources (simple data model, support for multiple types, mixes with schema), increasing volumes (low redundancy, compact data format, easy expansion on the fly) and the need for greater flexibility (supporting unique analytic techniques such as clustering, community detection, path analysis, etc.). This was contrasted with traditional relational models, which require prior knowledge and don't work well under distributed computing when the problem can't be broken into chunks.
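To make "path analysis" a little more concrete, here is a minimal Python sketch of relationship discovery on a toy graph: entities are nodes, relationships are labelled edges, and a breadth-first search returns the shortest chain linking two entities. All entity names and relationship labels are hypothetical, and this says nothing about how any commercial graph product works internally.

```python
from collections import deque

# Hypothetical toy data: (entity, relationship, entity) triples, invented for illustration.
EDGES = [
    ("Applicant Co", "supplies", "BigRetailer"),
    ("Applicant Co", "banks_with", "Bank A"),
    ("BigRetailer", "banks_with", "Bank B"),
    ("Director X", "sits_on_board_of", "Applicant Co"),
    ("Director X", "sits_on_board_of", "Shell Co"),
    ("Shell Co", "wires_money_to", "Offshore Ltd"),
]

def build_graph(edges):
    """Adjacency list; each edge is stored both ways, with '~' marking the reverse direction."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append(("~" + rel, src))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search: shortest list of (from, relation, to) hops, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}          # node -> (previous node, relation used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in parent:
        return None
    hops, node = [], goal
    while parent[node] is not None:  # walk back from the goal to the start
        prev, rel = parent[node]
        hops.append((prev, rel, node))
        node = prev
    return hops[::-1]
```

For instance, `find_path(build_graph(EDGES), "Applicant Co", "Offshore Ltd")` returns the three hops Applicant Co, via a shared director, to Shell Co and on to Offshore Ltd, a link that no single table in the toy data states directly.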

Introducing... Urika (not Eureqa). Thinking back, I should have realised that they were there to sell stuff (at least in passing). Anyhow, this presentation ended with the distinction between discovery and search, with the example of a bank loan evaluation, which would likely turn out very differently if it were realised that the applicant had only one client (e.g. Walmart) than if the applicant were assumed to track the industry average.


Driving Industrial Innovation

Intel's turn to hawk their Xeon Phi, which was claimed to outdo equivalent-tier GPUs. It appears to cost quite a bit more too, though, so it may just be a case of getting what one pays for.


Credit Risk Applications

Finally, a guy who's not advertising a product! He's from the NUS Risk Management Institute, which has set up a non-profit Credit Research Initiative using purely public data (e.g. Bloomberg) on some 35000 companies in 106 countries. Being academic, it is intended to serve the public interest better than corporate outfits such as Moody's (more on which soon), which have been known to be tardy in downgrading their paying client-targets.

On the technical side, the forward intensity model (said to be analogous to the forward interest rate model) has its dependencies statistically estimated using quasi-maximum likelihood estimation (QMLE, which appears in A.I. too). This entails multiplying matrices of up to hundreds of thousands of rows, which is where the NVIDIA GPUs enter the picture.
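Why matrix products dominate is easier to see in a stripped-down example. The sketch below is not the RMI's forward intensity model: it is a toy default-intensity model of my own choosing, assuming each firm-period's default intensity is exp(x·β), with β fitted by ordinary maximum likelihood via Newton's method (numpy assumed available). The expensive steps are the products involving the covariate matrix `X`, which grow with the number of rows and are the sort of work that maps well onto GPUs.

```python
import numpy as np

def fit_intensity(X, events, exposure, n_iter=25):
    """Maximum-likelihood fit of a toy exponential-intensity (Poisson) default model.

    X        : (n, k) covariates, one row per firm-period; column 0 assumed to be all ones
    events   : (n,) number of defaults observed in each firm-period
    exposure : (n,) length of each period (e.g. in years)
    """
    beta = np.zeros(X.shape[1])
    # Start the intercept at the pooled default rate so Newton begins nearby.
    beta[0] = np.log(events.sum() / exposure.sum())
    for _ in range(n_iter):
        expected = np.exp(X @ beta) * exposure      # expected defaults per row (big mat-vec)
        score = X.T @ (events - expected)            # gradient of the log-likelihood
        info = X.T @ (X * expected[:, None])         # X^T diag(expected) X, i.e. minus the Hessian
        beta = beta + np.linalg.solve(info, score)   # Newton-Raphson step
    return beta
```

On synthetic data generated with a known β (say an intercept of -3 and a slope of 0.5 on one standard-normal covariate, over 20000 rows), the fit recovers values close to those; a quasi-likelihood approach would keep the same machinery but interpret the estimates under a possibly misspecified model.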

The speedups achieved over the CPU were from five hours to 30 minutes and from one day to five hours on two different tasks (around 10x and 5x respectively), which concurs with what I have generally managed to achieve for my own purposes.

Now, how do I put it? An increase in speed by a factor of ten-plus is certainly nothing to sniff at, but when computations still take days, it remains slightly disappointing. Apparently, raw teraflops aren't the best indicator of practical speed here, as the more portable OpenCL can trail CUDA significantly if not well optimized, which often negates the portability advantage and makes one question whether one wants to become a code-tuner.

Anyhow, they're planning to do credit stress testing in the future, which means lots of Monte Carlo simulations, said to be highly parallelizable but memory-intensive and compute-lite. So it seems that the remark during Wednesday's steamboat, that throwing data into algorithms instead of working out analytical solutions is the way forward, might indeed be right.
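For a feel of why such Monte Carlo work parallelises so easily, here is a minimal sketch using a textbook one-factor Gaussian (Vasicek-style) portfolio model. The model choice and all parameter values are my assumptions, not necessarily what the RMI plans to use. Each path draws one common "economy" shock plus independent obligor shocks, so paths never depend on each other and could be farmed out to as many cores as one has.

```python
import math
import random
from statistics import NormalDist

def simulate_default_fractions(pd, rho, n_obligors, n_paths, seed=0):
    """One-factor Gaussian model: obligor i defaults on a path if
    sqrt(rho) * Z + sqrt(1 - rho) * eps_i < inverse-normal(pd),
    where Z is shared by all obligors on that path and eps_i is idiosyncratic."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(pd)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    fractions = []
    for _ in range(n_paths):                 # paths are independent: trivially parallel
        z = rng.gauss(0.0, 1.0)              # common shock for this scenario
        defaults = sum(1 for _ in range(n_obligors)
                       if a * z + b * rng.gauss(0.0, 1.0) < threshold)
        fractions.append(defaults / n_obligors)
    return fractions

def quantile(values, q):
    """Simple empirical quantile (no interpolation)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]
```

The average default fraction across paths stays near `pd`, but with correlation `rho` > 0 the tail (say `quantile(fractions, 0.99)`) is far worse than the average, which is the sort of number a stress test cares about.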


Not as glam, but at least winnable (on which more later)
(Source: wikimedia.org)



IBM Platform Symphony

Another product. Multi-user, low latency, high throughput, large scale, cost efficient shared services, heterogeneous and open, more scalable, enhanced MapReduce processing framework - there, that should about cover it. A couple of people in the audience know Haskell!

The high point of this presentation was the "anatomy of an investment bank" slide, which divided operations into three layers, with inputs being orders, forex, equities and various datafeeds.

The top layer consists of FPGA-based real-time programs, variously known as algorithmic trading/high-frequency finance/blackbox/robo-trading, involved in arbitrage, trend-following, complex event processing and protocol conversion.

Next down is near-real-time ops, divided into data-intensive and compute-intensive halves. On the data side, there's anti-money laundering, customer relationship management, credit scoring and fraud detection, while on the compute-intensive side, there's exotics/derivative pricing, real-time market risk simulation and forex. Straddling the data and compute sides are counterparty risk and incremental modelling (these distinctions seem a little arbitrary to me).

Below that are the batch jobs, which are not quite as urgent. Again, there's the data/compute division: on the data side are mining of unstructured data, regulatory reporting, sentiment analysis and ETL (whatever acronym that is). Compute includes model backtesting, scenario generation, new product modelling and sensitivity analysis. Deeper counterparty modelling lies in between. All this is backed by plenty of servers, and a custom file system.


Beyond HFT: FPGAs for Analytics

Back to academia. The professor began by introducing his mantra of "How to do Monte Carlo in five microseconds", and continued by comparing GPUs, which are known for high throughput and are thus used for number-crunching over days, with FPGAs, whose low latency has found them applications in high-frequency trading, data routing, etc.

He further suggested that low latency implies high throughput but not vice versa, which I thought might not strictly apply, but let slide. In any case, he went on to show how, with quite a bit of spatial recursion (unrolling), he managed to squeeze out some 8000 paths in four microseconds, fulfilling his promise.
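The headline figure works out as follows. Alongside it is Little's law, a standard queueing identity that I am bringing in myself (it was not from the talk), relating throughput X, work in flight N and latency R; the 1 ms latency below is a hypothetical figure for a GPU pipeline, chosen only for illustration.

```latex
\[
\frac{8000\ \text{paths}}{4\ \mu\text{s}}
  = \frac{8\times 10^{3}}{4\times 10^{-6}\ \text{s}}
  = 2\times 10^{9}\ \text{paths/s}
\]
\[
X = \frac{N}{R}
\qquad\text{so, for } R = 1\ \text{ms:}\qquad
N = X R = (2\times 10^{9}\ \text{s}^{-1})(10^{-3}\ \text{s}) = 2\times 10^{6}\ \text{paths in flight}
\]
```

Read this way, low latency with even modest parallelism gives high throughput almost for free, while the same throughput can be bought at high latency by keeping enough work in flight; that hypothetical 1 ms pipeline would match the FPGA's rate yet still deliver any single answer 250 times too late for a four-microsecond budget.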

Now, he did cover the obvious question of why five microseconds in particular: it is the (small) window in which the FPGA can act before the GPU has had time to respond. Well, this probably has some niche applications, though perhaps fewer if the authorities do the sensible thing and clamp down on hair-trigger trading.


One More Word

"For every Pinochet or Lee Kuan Yew, there were tens of Mobutus..."

- Democracy in Retreat, Joshua Kurlantzick


While popping into the cooperative to get a ring file, I spotted a tome titled Democracy in Retreat (from Yale) on the shelves and, bearing in mind last week's coverage, decided to stand around as inconspicuously as I could manage and peruse it for maybe twenty minutes.

The major thrust I got was that the global drive towards democracy had stalled (notably, the Egyptian example of entrenched military influence played out as foreshadowed). Basically, the middle class who have tended to be democracy's biggest supporters have in many newly "democratic" countries not seen the promised increase in wealth, while experiencing instability instead, and therefore elect to trade some liberty for some prosperity.

Notably, such transitioning nations (e.g. Indonesia) often saw a short-term rise in graft, which the author attributes firstly to corruption becoming decentralised, but also to such incidents finally getting reported in the press (ours still hailing questionable stats). Thus, the big question appears to be whether they can get over this hump instead of succumbing to such luminaries as Kim and Obiang, who didn't quite innovate with the declaring-himself-a-god part, but at least stopped short of executions to the strain of Those Were The Days.

Continuing, it appears that the Great Authoritarian Success Story outlier of China, as touted in the book, might not be all it's cracked up to be. One academic makes a convincing case that even the already-slowed 7.5% GDP growth figure is a fabrication, pointing out that only two out of thirty provinces self-reported GDP growth rates lower than the national average, which is hard to square with the national figure being, roughly, an output-weighted average of the provincial ones. Coincidentally, provincial administrators are rewarded for GDP growth (sound familiar?); apparently, their Prime Minister attempted to cut through the crap by monitoring electricity usage as a proxy, only for those numbers to be fudged too.


Don't bother, nobody's home
(Source: wnd.com)


Returning to the local scene, our scholar-ministers have thankfully become more frank about intentions, even as they throw around tidbits such as minor lapses in public sector procurement (I wonder why) and bonus clawbacks. Tellingly, this comes as a major credit rating agency expressed the obvious - that our inflated housing prices and expanding household debt levels are Not Good, to which the MAS responded by stating that the banks were healthy, revealing their priority concern.

With true structural reform nowhere on the cards, the inadequacies of the CPF system may be beginning to show up: while other countries have been slammed for mismanaging and underfunding their pension kitties, it should be remembered that for many citizens here, their retirement fund is dominated by a single asset class, their flat. However, as things stand, holding fast would simply delay the day of reckoning as high prices are passed on to the next generation, hence the bandage of accelerated immigration (surprising fact of the week: there are more people in Singapore now than there were on the entire Earth ten thousand years ago).

It's not that I have anything personally against the incumbents, but it is looking as if they are attempting to cling to A+ representation on the back of B+ results. Will they return to their roots and start to lean a little left in earnest? Stay tuned!


Mr. Robo: *rushing in, clutching newspaper* It's an outrage! How dare the responsible local news media say this!

*waves article under noses*

Asia's longest-serving prime minister... is seeking to build a political dynasty, analysts say, with... his son... running in... upcoming general elections... At the same time... other senior members of his ruling... People's... Party... also have their sons contesting the election.

The premier's... sons are senior in the military... hierarchy and their rise has been described by the local press as "meteoric".

"The children from a number of (our party's) senior figures are really talented and educated abroad. They will be the great future leaders of this country. Every young candidate from our party's senior figures holds at least a master's degree from abroad."

- Party member (whose son is running)



Mr. Ham: *sighs* Dear Mr. Robo, I do not think that that piece says what you have taken it to say.

Mr. Robo: Is that right? *reads article more carefully* Oops. Forget I said anything. I'll be taking my leave then.

*slinks off*


Running On Empty

January had me hoping that the sprint sensation of the decade wouldn't be caught up in doping allegations, and it seems that Bolt has remained beyond reproach. Not so his Jamaican teammates, unfortunately, who have become the latest to fail drug tests. A Yahoo! article soon pointed out that most of the other eight men known to have cleared 9.8s had been caught "supplementing" their performances, but in his defence, he could simply be an out-and-out freak (in a good way), being near a full head taller than prototypical compact racers.

He had better be, or track will take a long time to recover.


Clearing Contemplations

Heisenberg, Gödel, and Chomsky walk into a bar. Heisenberg turns to the other two and says, "Clearly this is a joke, but how can we figure out if it's funny or not?" Gödel replies, "We can't know that because we're inside the joke." Chomsky says, "Of course it's funny. You're just telling it wrong."

- From Reddit


I've been holding off on putting too many of these down for too long (probably unjustifiably, given their (lack of) fresh insight), so here they are:

We might begin with Olbers' paradox, particularly its probably least famous assumption: that the distribution of stars is homogeneous. Drop it, and it can easily be shown that an infinite number of stars can produce about as dark a night sky as one desires, through the simple convergence of a geometric series.
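A toy illustration of the geometric-series point, in Python: stars are grouped into shells, each star's contribution falls off as 1/r², and luminosities and factors of 4π are dropped. The "thinning" arrangement (shell k at radius 2^k holding only 2^k stars) is a made-up example of an inhomogeneous, increasingly sparse distribution, not a model of the actual universe.

```python
def total_flux(n_shells, stars_in_shell, radius):
    """Sum inverse-square contributions shell by shell (constants dropped)."""
    return sum(stars_in_shell(k) / radius(k) ** 2 for k in range(n_shells))

def homogeneous(n_shells):
    # Uniform density: a unit-thickness shell at radius r holds ~ r^2 stars,
    # so every shell contributes the same amount and the total grows without bound.
    return total_flux(n_shells, lambda k: (k + 1) ** 2, lambda k: k + 1)

def thinning(n_shells):
    # Hypothetical sparse arrangement: shell k at radius 2^k holds only 2^k stars,
    # so shell k contributes 2^k / 4^k = (1/2)^k, a geometric series summing to 2.
    return total_flux(n_shells, lambda k: 2 ** k, lambda k: 2 ** k)
```

Here `homogeneous(n)` returns exactly `n`, while `thinning(10)` is already 1.998046875 and adding shells never pushes it past 2.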

A second, likely crackpot thought is that if nature is fundamentally discrete, then beyond a certain distance, the assumption of "additive brightness" might not apply. As an illustration, suppose a point source could emit photons only at a finite (but huge) set of angles, e.g. both 17.000...000 and 17.000...001 degrees are possible, but no value (such as 17.000...0005) between them.

At "close" interstellar distances, any such discreteness would not matter, and modelling expected observations with continuous distributions would work (near-)perfectly. However, when "far enough", such discreteness would imply that photons from some (effectively point-object) stars would never reach Earth at all, rather than each contributing the infinitesimal amount that the paradox relies on (details depending on parameters, but definitely kicking in at some point). Of course, since the Universe is to the best of our knowledge not infinitely old and moreover expanding, all this is probably moot.
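The discreteness idea can be played with in a one-dimensional toy, assuming (purely hypothetically) that a star may only emit along angles that are integer multiples of some tiny step `delta`. A detector at distance d subtends an angular window of roughly size/d; counting how many allowed rays fall inside that window shows the switch from "about width/delta rays, as a continuous model expects" to "one whole ray or none at all, depending on alignment" once the window shrinks below `delta`.

```python
import math

def rays_hitting(offset, width, delta):
    """Count allowed emission angles k*delta (k an integer) inside [offset, offset + width).

    offset : angle, relative to the star, at which the detector's window starts
    width  : angular width the detector subtends (roughly size / distance)
    delta  : hypothetical smallest step between allowed emission angles
    """
    first = math.ceil(offset / delta)                # smallest k with k*delta >= offset
    last = math.ceil((offset + width) / delta) - 1   # largest k with k*delta < offset + width
    return max(0, last - first + 1)
```

With `delta = 1.0`, a window of width 10.0 catches 10 rays wherever it sits, while a window of width 0.5 catches none at offset 0.25 and one whole ray at offset 0.75, rather than the "half a ray" a continuous model would assign.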

This little example suggests, then, that infinities can get complex, and that corralling them into any concrete number at all is some achievement. It would also imply that all knowledge tends to zero, if one assumes that there is no bound on the minimal length of mathematical proofs alone; that wall is already being bumped against, and even for my own experiments, the sheer number of variables involved makes me despair of ever "proving" anything with any generality.

[N.B. There appear (seldom-stated?) relations between classification and (data) compression]
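One well-known instance of the classification/compression link is classification by normalized compression distance (NCD): two strings count as "close" if compressing them together costs little more than compressing either alone. A minimal Python sketch, using the standard `zlib` module as the compressor and made-up toy examples:

```python
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for very similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)

def classify(sample: bytes, labelled):
    """Nearest-neighbour classification: the label of the closest example under NCD."""
    return min(labelled, key=lambda pair: ncd(sample, pair[0]))[1]

# Toy labelled examples, invented for illustration.
LABELLED = [
    (b"the cat sat on the mat. " * 40, "cats"),
    (b"0123456789 9876543210 " * 40, "digits"),
]
```

The compressor does all the "learning" here: a sample that shares many substrings with one example compresses well alongside it, so `classify` picks that example's label without any explicit features.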

Here, Cantor must come in, for demonstrating that infinities come in different (possibly discrete) "sizes". He did eventually go mad, as did Gödel, so it might be that there are some things men were never meant to understand; too many titans have found their greatness only in sorrow.
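Cantor's diagonal argument is short enough to write down as code. Below, an "enumeration" is any function taking n to the n-th infinite 0/1 sequence (itself a function from positions to bits), and the diagonal sequence flips the n-th bit of the n-th sequence, so it cannot equal any sequence in the enumeration. The binary-expansion enumeration used for illustration is my own choice.

```python
from typing import Callable

BitSequence = Callable[[int], int]          # position -> bit
Enumeration = Callable[[int], BitSequence]  # index -> sequence

def diagonal(enum: Enumeration) -> BitSequence:
    """A sequence that differs from enum(n) at position n, for every n."""
    return lambda n: 1 - enum(n)(n)

def binary_expansion(n: int) -> BitSequence:
    """Example enumeration: the n-th sequence is the binary expansion of n,
    least significant bit first, padded with zeros forever."""
    return lambda i: (n >> i) & 1

missing = diagonal(binary_expansion)
```

Every sequence this particular enumeration produces is eventually all zeros, and `missing` turns out to be all ones (the first eight values of `missing` are all 1), which indeed appears nowhere in the list. The same construction defeats any enumeration one cares to supply, which is why the infinite 0/1 sequences outnumber the natural numbers.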

It seems apropos to conclude this section with the following riddle, which a contest problem reminded me of (and which should have some finance applications as well):

Alice and Bob play a game with the following rules:
  • Alice picks a probability p, 0 <= p < 0.5
  • Bob takes any finite number of counters B.
  • Alice takes any finite number of counters A.

These happen in sequence, so Bob chooses B knowing p, and Alice chooses A knowing p and B.

A series of rounds is then played. Each round, either Bob gives Alice a counter (probability p) or Alice gives Bob a counter (probability 1-p). The game terminates when one player is out of counters, and that player is the loser.

Whom does this game favor? Analyze and discuss probabilities.
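For readers who want to check their answer numerically (so, a mild spoiler), here is a Python sketch: a closed form from the classical gambler's ruin problem for the chance that Bob runs out first, plus a plain simulation of the rounds exactly as described, to cross-check it. The rewriting in terms of s = p/(1-p) is mine, chosen so that huge counter counts do not overflow; a and b are Alice's and Bob's counters, assumed to be at least 1.

```python
import random

def alice_win_prob(p, a, b):
    """Chance that Bob runs out first (gambler's ruin), with Alice starting on a counters,
    Bob on b, and Alice gaining a counter each round with probability p < 0.5.
    Uses s = p / (1 - p) < 1 so that only powers of a number below 1 appear."""
    s = p / (1.0 - p)
    return s ** b * (1.0 - s ** a) / (1.0 - s ** (a + b))

def simulate(p, a, b, trials=20000, seed=1):
    """Play the game directly and return the fraction of games Alice wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        alice, bob = a, b
        while alice > 0 and bob > 0:
            if rng.random() < p:      # Bob gives Alice a counter
                alice, bob = alice + 1, bob - 1
            else:                     # Alice gives Bob a counter
                alice, bob = alice - 1, bob + 1
        wins += bob == 0
    return wins / trials
```

For example, `alice_win_prob(0.4, 1, 1)` is 0.4, since a single round decides it. For fixed p and b, letting a grow pushes the result up towards `s ** b` but never past it, which is worth keeping in mind when considering which b Bob should pick.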


Final exercise: Critique this stand.


By Definition

On a slight tangent, mysticism has played its part in scientific discovery: Cantor identified an Absolute Infinite with God, and one should not neglect Ramanujan, Newton and many others (some of whom might more staidly term it intuition). Therefore, I thought it worthwhile to attempt an analogous scheme for this entity, for reference alongside measures such as Dawkins' Spectrum, since it is hard to argue for or against something without being clear what that something is, or at least, is not.

Some Proposed Levels of an Almighty

  • G0 - We don't know everything

    [N.B. Perhaps the most minimalist stand possible. There is something, but what properties does it possess? Can it meaningfully be stated to have an identity, be countable, or even to exist in the way we understand? Who knows? Even hardcore atheists, if reasonable, would be hard-pressed to refute this, other than by claiming that it misrepresents the usual concept of God. Mystical traditions]

    • G0.1 - Something existed, and it created the universe and the laws of Nature, but does not intervene in the universe's operation

      [Deism (atheists to some theists)]

    • G0.2 - G0.1, but with some expected behaviours/customs

      [Probably most practising deists]

    • G0.3 - G0.2, further identifying this power with Nature and related spirits

      [Some paganism]

  • G1 - A Being(s) exists that is omniscient and omnipotent (and therefore obviously self-aware)

    • G1.1 - G1, and is concerned about human affairs in general

  • G2 - A Being(s) exists that is omniscient, omnipotent, and further demands certain specific practices to be followed, exactly as communicated by certain extraordinary people, that are not to be questioned

    • G2.1 - G2, threatens future consequences if not obeyed, but this offer is open to all

    • G2.2 - G2, but an exclusive club, usually along tribal lines


It can be noted that many classical arguments for a Supreme Being at best address G0.X, when what is being defended is at G2.X, but of course the more fundamental definitions are less attractive in usage, being impersonal and unattributable.

But let it not be said that they're not moving with the times.


Ham Lam


But I'm still in shape!


It's therapeutic to spend time with the hamsters. You pick up small things over time, such as how Mr. Ham will not accept food deposited by Mr. Fish from his cheek pouch in a corner if handed to him, but will take fresh rations. Or how Mr. Fish vibrates when held firmly with head against forefinger, and rubbed on the right side. Or how both of them get cranky if not hand-fed once in a while, but no longer accept sunflower seeds when served them. They've got personality.

Mr. Ham: I do hope so. And appreciate your experiments not being... invasive. Oh, and could you fix all of us up with another television?

Mr. Fish: Mr. Ham! Thanks for fixing our sudden hair loss woes at a discount!

Me: That reminds me, I was going to ask you what you were doing, sneaking about with a razor and a chloroform rag.

Mr. Ham: *ignores comment* It was no trouble at all.

Me: Fine, while you're here, Mr. Fish, could you come to Mr. Koala here? That's great, now please insert these earbuds, and hold this letter.

Mr. Ham: Are you setting him up?

Me: No, it's 'cause I passed the Ham Radio Examination!


You paid forty bucks and mugged, just for this throwaway one-liner?
(Radio original source: flickr.com)


Me: It was worth it.

Mr. Robo: How about working on your theham/hamsis, then?

Me: Now that's pushing it too far. Well, it was something to do; I hadn't sat for an exam for a couple of years. The venue was quaint too, on the second level of an old shophouse. Anyhow, I relied on iterating through online tests till I got a feel for the questions, which, coupled with the classic figure-it-out-as-we-go-along skills, proved sufficient.

I still wouldn't put equipment together without a guide, though.


Parting Shot (I Liked The Sound Of It)

"Stat rosa pristina nomine, nomina nuda tenemus"

- The Name of the Rose, Umberto Eco








4 comments


anonymous said...

One night he's rummaging under the stairs of the family house where he still lives and finds a trunk containing his dad's old ham radio.


July 22, 2013 - 08:41 SGT     

gilbert said...

Well, it could have been worse.


July 22, 2013 - 22:42 SGT     

anonymous said...

The father in 1969 is named Frank Sullivan (Dennis Quaid). He is a firefighter, and he dies heroically while trying to save a life in a warehouse fire. The son in 1999 is named John Sullivan (Jim Caviezel), and he has broken with three generations of family tradition to become a policeman instead of a fireman.

The father and the son can speak to each other across a gap of 30 years, through the ham radio.


July 24, 2013 - 10:42 SGT     

gilbert said...

Yup, I Googled that, and noted that it could have been worse because in a movie scenario, it was about as likely to have been a dead body in the trunk.


July 24, 2013 - 11:51 SGT     


Copyright © 2006-2025 GLYS. All Rights Reserved.