Sunday, Feb 18, 2018 - 00:20 SGT
Posted By: Gilbert

Doublespeak

Not sure why we still bother with the local CNY movies - blatant product placement aside, it's basically one public service announcement skit after another... with obligatory anthropomorphic God(dess) of Fortune (remember last year?). As our subreddit has it, the best bits came right at the end, and mainly because it was, well, finishing up.


Babel Nowadays


Tamil too often gets the short end of the stick here
(Though there's more than enough to go around)
(Source: todayonline.com)


I vaguely recall being fascinated by AltaVista Babel Fish a long time ago (it appears, sadly, to have completely vanished, a by-product of the misfortune of being palmed off on Yahoo!) [Edit Feb 20: never neglect the obvious: babelfish.com] - type in a phrase, and it'd spit a translation back out, just like that! Back-translation readily exposed its many limitations, but I was easily impressed back then.

The torch has - as with so many online services - been passed to the behemoth that is Google for some time now, and they have to their credit not been resting on their laurels. They've transitioned from more straightforward statistical methods* (shout-out to our old n-gram based input system here), to a deep neural network-based system. As a recent presentation shows, they've not been alone in pursuing this direction.
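(Since the old n-gram system got a shout-out - that end of the spectrum is simple enough to fit in a few lines. A toy next-word predictor follows; the corpus is illustrative and there's no smoothing, unlike anything production-grade:)

```python
# Toy bigram model: predict the next word from the previous one by
# maximum-likelihood counts. Corpus is illustrative, no smoothing.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1  # count how often `nxt` follows `prev`

def predict(prev: str) -> str:
    # most frequent continuation observed after `prev`
    return bigrams[prev].most_common(1)[0][0]

print(predict("the"))  # -> 'cat' (seen twice, vs 'mat' once)
```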

You'd not be wrong if you guessed what this entails - gobs of data - but there's admittedly further technique involved. Google's 2016 paper on their Neural Machine Translation (NMT) System (open-source version available) describes the use of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) with all the optional extras, in an encoder-decoder setup as with Deepfakes. This has lately been upgraded to support direct multilingual translation (i.e. without going through English, as is traditional), by simply including the target language as an input token, leaving all else untouched.
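In the spirit of a sketch rather than a spec, the bare encoder-decoder shape looks something like this in PyTorch - a toy, far from Google's production stack (which layers on attention, residual connections and much else); all names and sizes here are made up:

```python
# Minimal encoder-decoder sketch (PyTorch); hyperparameters are placeholders.
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        # For the multilingual trick, `src` simply gets a target-language
        # token prepended, e.g. ["<2de>", "hello", "world"] - the rest of
        # the architecture is left untouched.
        _, state = self.encoder(self.embed(src))           # compress source
        dec_out, _ = self.decoder(self.embed(tgt), state)  # condition on it
        return self.out(dec_out)                           # target-vocab logits
```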

[*N.B. A small diversion on the history of machine translation here. Statistical methods were, probably understandably, viewed with heavy suspicion when they first emerged in the Eighties; Mallaby's More Money Than God recounts the opposition faced by Brown and Mercer, when they first applied the Expectation-Maximization (EM) algo to the task. Jelinek, who later employed the duo at IBM, would counter with his famous retort that "every time I fire a linguist, my system's performance improves"; also, Brown & Mercer would later jump to Renaissance Technologies, where they made like a bazillion bucks, so I suppose they had the last laugh too.]
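For the curious, the core of what made statistical translation tick - EM over word alignments, in its simplest form (IBM Model 1) - fits in remarkably few lines. A toy sketch with an illustrative three-pair corpus, nothing like the IBM-scale original:

```python
# Toy EM loop for IBM Model 1 word alignment; corpus is illustrative.
from collections import defaultdict

pairs = [("the house".split(), "das haus".split()),
         ("the book".split(), "das buch".split()),
         ("a book".split(), "ein buch".split())]

# t[f][e]: probability that English word e generates foreign word f,
# started uniform; real systems initialize over the whole vocabulary
t = defaultdict(lambda: defaultdict(lambda: 0.25))

for _ in range(10):                       # EM iterations
    count = defaultdict(lambda: defaultdict(float))
    for eng, frn in pairs:
        for f in frn:                     # E-step: expected alignment counts
            z = sum(t[f][e] for e in eng)
            for e in eng:
                count[e][f] += t[f][e] / z
    for e in count:                       # M-step: renormalize into probabilities
        total = sum(count[e].values())
        for f in count[e]:
            t[f][e] = count[e][f] / total

print(max(t["haus"], key=t["haus"].get))  # converges to 'house'
```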

But a little pulling back of the curtain here. Although official announcements might give the vibe that machine translation is a solved problem, slightly more involved inspection reveals that NMT is not quite up to actual human translators yet... and by some distance. Douglas Hofstadter provides an analysis with just such a conclusion in The Atlantic [Hacker News commentary] which covers most of the bases. Borrowing his German example (translations only):

Hofstadter

After the defeat, many professors with Pan-Germanistic leanings, who by that time constituted the majority of the faculty, considered it pretty much their duty to protect the institutions of higher learning from "undesirables." The most likely to be dismissed were young scholars who had not yet earned the right to teach university classes. As for female scholars, well, they had no place in the system at all; nothing was clearer than that.


Google NMT

After the lost war, many German-National professors, meanwhile the majority in the faculty, saw themselves as their duty to keep the universities from the "odd"; Young scientists were most vulnerable before their habilitation. And scientists did not question anyway; There were few of them.



Obviously (and unlike the example kindly supplied in Google's research blog), the nuance is well off at a minimum, before going into actual errors in conveying meaning. Hofstadter explains the context behind choices such as "undesirables" rather than "odd", and "Pan-Germanistic" instead of "German-National", but the most interesting miss here was perhaps the feminine suffix "-in", which completely threw the final sentence off. Skipping to the Chinese example, which I can independently corroborate, the translation of "他仍兼管研究生" as "He still holds the post of graduate student." (output slightly altered from Hofstadter's version; it seems Google has been doing some updating) is indeed egregiously wrong - "He still supervises [his] graduate students" would be the right translation, as Hofstadter notes, although dropping a single character (yielding "他仍兼研究生") would bail NMT out.

Which returns us to the fundamental complaint about the ongoing A.I. boom - there's scant actual intelligence, as understood in the popular sense, involved. Some of the issues Hofstadter highlighted are perhaps to be expected, if the NMT implementation operates on a sentence level, as is suggested by the paper (in which evaluation was performed on isolated single sentences). As such, even the simplest agreement and reconciliation of terms between sentences would be absent! In this light, it is perhaps a wonder that paragraph-length texts are even comprehensible.
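To make that limitation concrete - a contrived sketch, where `nmt` is a hypothetical stand-in and not any real API:

```python
# Contrived illustration of the sentence-level limitation: the "model"
# below is a pure function of one sentence, so nothing - pronoun
# referents, terminology choices - can carry over between calls.
def nmt(sentence: str) -> str:
    # hypothetical stand-in for the trained network; the point is the
    # signature: one sentence in, one sentence out, no document state
    return f"<translation of: {sentence}>"

def translate_document(text: str) -> str:
    # agreement across sentences is structurally impossible here
    return " ".join(nmt(s) for s in text.split(". "))

print(translate_document("Die Forscherin kam an. Sie begann sofort."))
```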

Despite there clearly being a lot left to be done, I do disagree with Hofstadter on one point. In his wrap-up, he hopes that true translation, artistic translation, by machines, will not be possible soon. But why? Should not the wisdom of the world be available to all, regardless of what tongue they were born with, and despite what gods might fear? Why should Westerners have to wait to read Jin Yong, for example, when Gu Long borrowed unstintingly from James Bond? There are at least two objections I can muster, the first of which is the impossibility of perfect translation - wordplay, for one, carries badly. And then there are the near-ineffables, like sonder...

The second would be the impact on minor languages. It is plausible that multilingualism would become rare, in a world where everyone has their own private translator - why agonize over learning Mandarin (as many local students may question), when one can just translate to it on demand? But then again, it's a trade-off really; if this opens communication between all and sundry - given how much woe lack of mutual intelligibility has historically caused - the loss could be worth it...

My personal suspicion here would be a future closing of the circle - while connectionism is currently king, the potential of re-integrating symbolic and domain knowledge remains largely untapped. This may become more apparent when the shortcomings of "just pump in more data!" become harder to brush aside as saturation approaches, since it's unlikely - to me at least - that current architectures are the be-all and end-all here.


The Age-Old Debate

Guy pays, or split the bill? Our resident alpha male CS prof has the answer, which has as usual sparked lively discourse amongst local netizens (as with his thoughts on the labour situation, which seem to echo the latest policy).

Anyway, my two satoshis, for the record:
  1. I do agree: guy pays. However, the bigger issue is probably whether this even becomes a point of contention. It's not a problem if the lady wants to pay her share, but it's a problem if such a minor thing manages to drive a wedge (then again, I may just be conditioned - my hamsters never pay)

  2. A related observation: while academia leans left in general, the hard sciences (and math) do seem to be more conservative than the humanities. The balance in the force has got to be maintained, I suppose


