Not sure why we still bother with the local CNY movies - blatant product placement aside, it's basically one public service announcement skit after another... with the obligatory anthropomorphic God(dess) of Fortune (remember last year?). As our subreddit has it, the best bits came right at the end, and mainly because it was, well, finishing up.

Babel Nowadays

[Image: Tamil too often gets the short end of the stick here (though there's more than enough to go around) (Source: todayonline.com)]

I vaguely recall being fascinated by AltaVista Babel Fish a long time ago. The torch has - as with so many online services - been passed to the behemoth that is Google for some time now, and they have, to their credit, not been resting on their laurels. They've transitioned from more straightforward statistical methods* (shout-out to our old n-gram based input system here) to a deep neural network-based system. As a recent presentation shows, they've not been alone in pursuing this direction.

You'd not be wrong if you guessed what this entails - gobs of data - but there's admittedly further technique involved. Google's 2016 paper on their Neural Machine Translation (NMT) system (open-source version available) describes the usage of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs), with all the optional extras, in an encoder-decoder setup as with Deepfakes. This has lately been upgraded to support direct multilingual translation (i.e. without going through English, as is traditional), by simply including the target language as an input token and leaving all else untouched.

[*N.B. A small diversion on the history of machine translation here. Statistical methods were, probably understandably, viewed with heavy suspicion when they first emerged in the Eighties; Mallaby's More Money Than God recounts the opposition faced by Brown and Mercer when they first applied the Expectation-Maximization (EM) algo to the task. Jelinek, who later employed the duo at IBM, would counter with his famous retort that "every time I fire a linguist, my system's performance improves"; also, Brown & Mercer would later jump to Renaissance Technologies, where they made like a bazillion bucks, so I suppose they had the last laugh too.]
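Digression aside, here's a rough feel for what that target-language-token trick amounts to. The following is a toy sketch of my own in PyTorch (made-up joint vocabulary, single-layer LSTMs, no attention or wordpieces) rather than Google's actual GNMT stack, but the architectural point survives: nothing changes beyond prepending a token like <2de> to the source.

```python
# Toy sketch of the multilingual NMT idea described above: an LSTM
# encoder-decoder where the *only* multilingual machinery is a target-language
# token (e.g. "<2de>", "<2ta>") prepended to the source sentence. Everything
# here (vocabulary, sizes, shapes) is illustrative, not Google's actual GNMT
# system, which adds attention, deep stacked layers, wordpiece segmentation, etc.
import torch
import torch.nn as nn

# Hypothetical joint vocabulary shared by all languages, plus the special
# "translate-to-X" tokens that steer the decoder.
VOCAB = ["<pad>", "<s>", "</s>", "<2de>", "<2ta>", "hello", "world", "hallo", "welt"]
stoi = {w: i for i, w in enumerate(VOCAB)}

class TinyNMT(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the source (which already starts with a <2xx> token).
        _, state = self.encoder(self.embed(src_ids))
        # Decode conditioned on the encoder's final state.
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # logits over the shared vocabulary

def encode_source(words, target_lang="<2de>"):
    # The whole "multilingual" trick: prepend the target-language token.
    return torch.tensor([[stoi[target_lang]] + [stoi[w] for w in words]])

model = TinyNMT(len(VOCAB))
src = encode_source(["hello", "world"], target_lang="<2de>")
tgt = torch.tensor([[stoi["<s>"], stoi["hallo"], stoi["welt"]]])
logits = model(src, tgt)
print(logits.shape)  # (1, 3, vocab_size) - next-token scores at each decoder step
```

The appeal, as noted above, is exactly that the model and training pipeline stay untouched; all the multilingual routing lives in the data.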
But a little pulling back of the curtain here. Although official announcements might give the vibe that machine translation is a solved problem, slightly more involved inspection reveals that NMT is not quite up to actual human translators yet... and by some distance. Douglas Hofstadter provides an analysis with just such a conclusion in The Atlantic [Hacker News commentary], which covers most of the bases. Borrowing his German example (translations only):

Obviously (and unlike the example kindly supplied in Google's research blog), the nuance is well off at a minimum, before going into actual errors in conveying meaning. Hofstadter explains the context behind choices such as "undesirables" rather than "odd", and "Pan-Germanistic" instead of "German-National", but the most interesting miss here was perhaps the feminine suffix "-in", which completely threw the final sentence off.

Skipping to the Chinese example, which I can independently corroborate, the translation of "他仍兼管研究生" as "He still holds the post of graduate student." (output slightly altered from Hofstadter's version; it seems Google has been doing some updating) is indeed egregiously wrong - "He still supervises [his] graduate students" would be the right translation, as Hofstadter notes - although it can be noted that changing a single character (to "他仍兼是研究生") would bail NMT out.

Which returns us to the fundamental complaint about the ongoing A.I. boom - there's scant actual intelligence, as understood in the popular sense, involved. Some of the issues Hofstadter highlighted are perhaps to be expected if the NMT implementation operates on a sentence level, as is suggested by the paper (in which evaluation was performed on isolated single sentences). As such, even the simplest agreement and reconciliation of terms between sentences would be absent! In this light, it is perhaps a wonder that paragraph-length texts are even comprehensible.
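To make that sentence-isolation point a little more concrete, here's a small illustrative sketch - the translate() function below is a hypothetical stand-in for whatever MT backend one fancies, not a real API - showing what context each call actually gets to see when a paragraph is fed through one sentence at a time.

```python
# A sketch of why sentence-level translation loses cross-sentence agreement.
# `translate()` is a hypothetical placeholder for an MT backend; the point is
# only what context each call receives, not the translation itself.
import re

def translate(text: str, target: str = "de") -> str:
    # Placeholder: imagine this calls an NMT model.
    return f"[{target}] {text}"

paragraph = (
    "The historian finished her thesis. "
    "She defended it last spring. "
    "It was later published as a book."
)

# Sentence-by-sentence: each call sees one sentence in isolation, so the
# pronouns "She"/"It" arrive with no antecedent, and gender/referent agreement
# in the target language can only be guessed.
sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
isolated = [translate(s) for s in sentences]

# Feeding the whole paragraph at least exposes the antecedents to the model -
# though, per the paper mentioned above, evaluation was done on single sentences.
joined = translate(paragraph)

print(isolated)
print(joined)
```

Each isolated call receives "She" and "It" with no antecedent in sight; only the whole-paragraph call even has the information needed to reconcile terms across sentences.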
Despite there clearly being a lot left to be done, I do disagree with Hofstadter on one point. In his wrap-up, he hopes that true translation, artistic translation, by machines will not be possible soon. But why? Should not the wisdom of the world be available to all, regardless of what tongue they were born with, and despite what gods might fear? Why should Westerners have to wait to read Jin Yong, for example, when Gu Long borrowed unstintingly from James Bond?

There are at least two objections I can muster, the first of which is the impossibility of perfect translation - wordplay, for one, carries badly. And then there are the near-ineffables, like sonder... The second would be the impact on minor languages. It is plausible that multilingualism would become rare in a world where everyone has their own private translator - why agonize over learning Mandarin (as many local students may question), when one can just translate to it on demand? But then again, it's a trade-off really; if this opens communication between all and sundry - given how much woe lack of mutual intelligibility has historically caused - the loss could be worth it...

My personal suspicion here would be that of a future closing of the circle - while connectionism is currently king, the potential of re-integrating symbolic and domain knowledge has yet to catch up. This may perhaps become more apparent when the shortcomings of "just pump in more data!" become harder to brush aside as saturation approaches, since it's unlikely - to me at least - that current architectures are the end-all on this.

The Age-Old Debate

Guy pays, or split the bill? Our resident alpha male CS prof has the answer, which has as usual sparked lively discourse amongst local netizens (as with his thoughts on the labour situation, which seem to echo the latest policy). Anyway, my two satoshis, for the record:

Next: Annual Ritual
Copyright © 2006-2025 GLYS. All Rights Reserved.