bert's blog v1.21
As my main line of research has gotten a little bogged down, I resolved to clear my mind by applying it to a new challenge, inspired by how #%*@! frustrating entering even relatively short sentences on a smartphone virtual keyboard can be.

A few first measurements: the HTC One V's screen is approximately 4.8cm by 8cm, but with only about 6.8cm of the length available for icons; this works out to a maximum of 2.04 square cm for each of the sixteen icon slots, which is not all that much to begin with (whip out a ruler and take a look). As unappealing as that is, the size of individual keys on the default virtual keyboard is far worse - just 0.4cm by 0.8cm each, or just 0.32 square cm.

Now, let us face it, mashing keys that tiny with any sort of speed and accuracy, and without any tactile feedback, is a right chore; whatever else one might say about traditional handphones, their raised keys at least made multitasking input possible, which has moreover brought some measure of fame to our nation. Sure, it still couldn't quite match up to a real keyboard (41 seconds for 160 characters [or 32 words, using the standard five-characters-per-word conversion] makes it only about a third as fast as the best typists), but it could be done. Such feats are no longer possible... or are they?

Fast or correct; choose one (Source: damnyouautocorrect.com)

Given the ubiquity of smartphones nowadays (around a billion exist either now or in a few years, depending on who you listen to), any way to input data more quickly (and comprehensibly) on them must have enormous potential market upside, both for the creator of the method and the platforms that adopt it - and these miserable 0.32 square centimetre keys are the best we have? Examining the currently-available options:
For the following explorations, Mr. Robo has gathered some resources. The full bigram statistics come from the supplementary material to "Case-sensitive letter and bigram frequency counts from large-scale English corpora" (Jones, 2004; maybe free to access someday?), Usenet column. The statistics on the ten thousand most popular English words (making up 80.59% of all word occurrences) were gathered from the Project Gutenberg list on Wiktionary, and while slightly dated should be good enough for our demonstrations.

While we have messed about with bigrams here before, a complete overview has yet to be presented. This deficiency shall be made up now:

All bigram frequencies

Each row represents the bigrams starting with the same letter; the darker the green, the more often the second letter follows the first. Note "ju" and "qu", of especial interest to Scrabble players, as well as the columns corresponding to vowels.

Notably, bigram statistics inferred from the Gutenberg corpus have far less detail than Jones': a full 221 of the 676 possible bigrams are not represented at all in the 10000 words, though it should be said that a similar proportion of Jones' bigrams have negligible counts (<0.005%), with "qz" taking the prize for rarest bigram. This discrepancy may underline the need for a truly comprehensive corpus in production apps. It should also be noted that the highly popular "th" occurs more than twice as often in practice - 4.56% in Gutenberg and 3.01% in Jones' Usenet count - as an unweighted tally over the 10000 words would suggest (1.52%). This might be attributable to the unweighted tally not taking into account the relative frequencies of the words themselves.

So might this information be exploited for autocorrection? Quite possibly. We can cast input as a classical noisy channel model, where the user's actual intention may not be captured properly by the touchscreen.
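As a minimal sketch of how such counts might be derived - with a made-up four-word list standing in for the real Gutenberg data - note that weighting each bigram by its word's frequency is exactly what separates the 4.56% figure from the unweighted 1.52% one:

```javascript
// Hypothetical stand-in for the Gutenberg word-frequency list;
// the real data has 10000 entries.
const wordCounts = { the: 330, this: 60, that: 110, try: 5 };

function bigramFrequencies(counts) {
  const totals = {};
  let grand = 0;
  for (const [word, n] of Object.entries(counts)) {
    for (let i = 0; i + 1 < word.length; i++) {
      const bg = word.slice(i, i + 2);
      totals[bg] = (totals[bg] || 0) + n; // weight by word frequency
      grand += n;
    }
  }
  const freqs = {};
  for (const bg of Object.keys(totals)) freqs[bg] = totals[bg] / grand;
  return freqs;
}

const f = bigramFrequencies(wordCounts);
console.log(f["th"]); // "th" dominates this toy list, as in the real corpora
```

Dropping the `n` weighting (counting each word once) reproduces the unweighted small-corpus behaviour described above.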
While actual readings are more complex, involving irregular shapes, we simplify each touch to produce a circular margin of error, with the screen then randomly picking any point within the circle. To reflect rapid typing being less precise, Mr. Robo and I have further set this margin of error to diminish over time to some minimum in the simulation to follow.

But first, some intro music

Reverse Hamgineering

But how then is correction of text achieved? While we do not pretend to replicate current professional implementations, we believe that some basics are sufficient for a workable demo, for which only two kinds of probabilities need be considered:
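The touch model just described can be sketched as follows; the starting radius, floor, and decay rate here are invented for illustration, not the demo's actual values:

```javascript
// Each touch is displaced to a uniformly random point within a
// circular margin of error centred on the intended position.
function noisyTouch(x, y, radius) {
  // sqrt on the radius gives a uniform spread over the disc,
  // rather than clustering samples near the centre
  const r = radius * Math.sqrt(Math.random());
  const theta = 2 * Math.PI * Math.random();
  return { x: x + r * Math.cos(theta), y: y + r * Math.sin(theta) };
}

// The margin diminishes with each keystroke down to some minimum
// (all three parameters are assumptions, in pixels).
function marginForKeystroke(k, start = 20, min = 6, decay = 0.9) {
  return Math.max(min, start * Math.pow(decay, k));
}
```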
When the first letter is entered, we have no a priori reason to suspect it is incorrect, since single letters all have non-negligible probabilities of appearing - the user might well intend to begin his word with "y", yes, y not? However, once the second letter is entered, we have a much better idea of what is going on. "yh" happens to be seen very infrequently - about 0.002% of the time, to be exact. We might then try to find more plausible alternatives, and "th", as it happens, has a far greater 3.01% probability.

But is "th" in fact an acceptable substitution? Well, "t" is right next to "y" on the keyboard, and we may assign a small probability, say 10%, that a received "y" was meant as a "t", against a 70% probability of its actually being "y". With this, the joint probability of "th" becomes 0.1*0.0301=0.00301, which is still much bigger than the 0.7*0.00002=0.000014 of "yh". This idea can be extended to longer strings by basic dynamic programming, and as a finishing step, the most probable candidates obtained after this can be polished up with a dictionary, since bigrams represent only extremely local information.

The live JavaScript demo follows - to use it, center the green circle over the intended letter and click, and errors simulating the fat finger effect will be introduced automatically. The demo is, of course, limited - for one, it uses only a static and slightly archaic 10000-word dictionary, so good luck trying to correct for words like "processor". Also, it was designed only to check against words of the same length as the input, instead of predicting. For example, the input "th" gives the alternatives "ty" and "ti", which while reasonable guesses are probably inferior to offering "the" and "this". Incorporating such predictions is not that difficult - just one more layer of probabilities - and is left as an exercise for the reader.

Blindly Hammering Away

Now, this is old news, and I gather the autocorrect in some word processors may do something like this.
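A toy version of the dynamic-programming extension might look like this; the adjacency map, the 70%/10% confusion probabilities, and the four-entry bigram table are invented stand-ins for the full Jones data, and we do not claim this mirrors the demo's internals:

```javascript
// Partial, assumed QWERTY adjacency - a real app would cover all keys.
const neighbours = { y: ["t", "u", "h"], h: ["g", "j", "y"], t: ["r", "y"] };
const P_HIT = 0.7, P_SLIP = 0.1; // invented confusion probabilities
const bigram = { th: 0.0301, yh: 0.00002, ty: 0.0007, tj: 0.00005 };

// Which letters could a received touch have meant, and how likely?
function candidates(ch) {
  const c = { [ch]: P_HIT };
  for (const n of neighbours[ch] || []) c[n] = P_SLIP;
  return c;
}

// Viterbi-style DP: track the most probable intended string ending
// in each candidate letter, combining touch and bigram probabilities.
function correct(received) {
  let paths = {};
  for (const [l, p] of Object.entries(candidates(received[0])))
    paths[l] = { p, s: l };
  for (let i = 1; i < received.length; i++) {
    const next = {};
    for (const [l2, pTouch] of Object.entries(candidates(received[i]))) {
      for (const { p, s } of Object.values(paths)) {
        const pBg = bigram[s[s.length - 1] + l2] || 1e-7; // unseen-bigram floor
        const joint = p * pTouch * pBg;
        if (!next[l2] || joint > next[l2].p) next[l2] = { p: joint, s: s + l2 };
      }
    }
    paths = next;
  }
  return Object.values(paths).sort((a, b) => b.p - a.p)[0].s;
}

console.log(correct("yh")); // → "th", as in the worked example above
```

A dictionary pass over the few most probable survivors would then supply the global information that bigrams alone cannot.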
What about some of the more exciting newer inventions? Take blind typing: how can a user just press at approximate positions, perhaps not even on any keys, and get his word correctly predicted? While we have no idea how the makers of BlindType and the like really do it in their apps, we can offer at least one suggestion.

Note that even when a user is typing without looking, assuming he has memorized the QWERTY keyboard layout (not too unlikely for most), there are still certain invariants that can be exploited. Say he touches some roughly central point - then, if we assume he is blind-typing, we cannot guess what letter that represents with any confidence. However, say that he next presses a point at a roughly four o'clock position, down and to the right of the original point. This immediately tells us that the second point is probably not on the top row, and also that the first point is probably not on the bottom row. As the user continues marking more points, assuming that his internal vision of the keyboard is roughly correct, we can thereby get an increasingly better idea of what he is trying to input.

For example, consider the word "this" again. It has four letters, and therefore three moves: "t" to "h", "h" to "i" and "i" to "s". Each of these moves can be represented by an angle, or bearing, for which we follow the standard mathematical convention of zero degrees due east, going counter-clockwise. Then, the bearings are 311°, 47° and 197°. It is easy to see that every word has a corresponding bearing signature; then, assuming that the user has touched the screen the correct number of times (as many as there are letters), it remains only to match the signature he has generated against a dictionary of precomputed signatures. There are of course many minor enhancements and considerations, such as having near-enough touches assumed to be the same letter, but this is the gist of it.

The algorithm may be tested with the prototype above.
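Computing a bearing signature is straightforward; the sketch below assumes a unit-pitch QWERTY grid with guessed row staggers, so its exact angles differ somewhat from the demo's (which depend on the real key geometry), though the quadrants agree:

```javascript
// Assumed key layout: unit spacing, with invented per-row offsets.
const ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"];
const OFFSET = [0, 0.25, 0.75];
const pos = {};
ROWS.forEach((row, r) =>
  [...row].forEach((ch, c) => { pos[ch] = { x: c + OFFSET[r], y: r }; }));

// One bearing per move between consecutive letters: 0° due east,
// counter-clockwise, with screen-down negated into mathematical down.
function signature(word) {
  const bearings = [];
  for (let i = 0; i + 1 < word.length; i++) {
    const a = pos[word[i]], b = pos[word[i + 1]];
    const deg = Math.atan2(-(b.y - a.y), b.x - a.x) * 180 / Math.PI;
    bearings.push((deg + 360) % 360);
  }
  return bearings;
}

console.log(signature("this").map(Math.round)); // down-right, up-right, down-left
```

Matching is then a nearest-neighbour search over precomputed signatures for every dictionary word of the right length, with angular distance taken modulo 360°.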
The three closest matches will be displayed once two or more clicks (touches) are made. The efficacy of typing blind can be tested by clicking the "Do it Blind!" button, which removes the keyboard display. You should note that it is not necessary to click anywhere near the actual keys for good predictions to be made - as long as the angles are right, large movements are not required.

Clearly, this method has its limitations too. For one, many words (particularly shorter ones) may share very similar signatures, in particular those with letters all on the same row - "try", for instance, is currently mispredicted as "has", since we consider only the angles, and not the relative distances, between touches. This should however be a fairly trivial fix. Have a try?

A bigger issue for blind typing is that the top prediction must be very accurate indeed, since the user is assumed to continue typing new words without stopping to correct them. Therefore, probability considerations at the word and sentence level should be a given in actual products.

Finally, it can be observed that this idea can be adapted to swipe-type input; the problem is complicated by there not being distinct touches to indicate the number of letters, but this is mitigated by being able to rely on actual key positions once more. Some letters can then be distinguished by a change in direction of the swiping motion, and the trick would be to decide which letters to keep when they are passed over in a straight line - for example, "pit" and "pot" would trace out exactly the same path, so again additional context has to be used to distinguish them.

Mr. Robo has informed me that he has a couple of promising new methods for keyboard input, which however will take a bit more time to come to fruition. More to look forward to for the weekend!

Next: Least Publishable Unit
Copyright © 2006-2025 GLYS. All Rights Reserved. |