Thursday, July 26, 2012 - 02:44 SGT
Posted By: Gilbert

Double Tap


I'd Tap It

The first of Mr. Robo's deliberations towards reliable blind texting is: if it ain't broke, don't fix it (and don't let an existing tech giant buy it out)! Millions are happily texting one-thumbed with multi-tap and T9, and may well wish to keep to the familiar instead of putting their faith in minuscule keys. Some developers have apparently recognized this too - an iPhone multi-tap app was released barely a month ago, while various Android apps also offer the old-school keypad as an option.

Clearly, a direct port is easily accomplished; however, as mentioned, the smartphone interface departs significantly from old cellphones in having no tactile feedback - for blind multi-tapping, then, the app would have to infer the intended keys from the relative movement of the thumb.

We might expect this process to be imperfect without physical keys as guidance, and ambiguous input may then be the norm - for example, when a user moves down and to the right after pressing the "2" key (which corresponds to the letters a, b and c in multi-tap), his actual intention might be to reach any of "4", "5", "8" or "9". One approach would then be to dynamically estimate the position and size of the user's imagined keypad from his input, as the creators of BlindType did, but before that it should be asked whether this is reliable - or even necessary.

A little background now: users of the T9 system will be well aware that certain words require the same input pattern, and are thus textonyms. Wikipedia conveniently supplies the example pair "home"/"good", which both arise from the sequence 4663. This is a quite unavoidable consequence of the setup, which might be resolved in several ways, some of which will be discussed later.

Mr. Robo's proposal is then to adapt multi-tap, for two good reasons - firstly, T9 and the like are patented, and secondly, while maintaining directionality on a featureless touchscreen while typing blind may be iffy, entering the correct number of taps, even without looking, can be relied upon much more confidently. Recall the number of taps needed for each letter in the traditional multi-tap system:

Taps | Letters                | Total Frequency*
   1 | a, d, g, j, m, p, t, w | 30.562%
   2 | b, e, h, k, n, q, u, x | 30.790%
   3 | c, f, i, l, o, r, v, y | 32.568%
   4 | s, z                   |  6.407%

[* From Wikipedia - adds up to 100.327% (shrugs)]

Therefore, about 2.15 taps are required per letter, averaged over typical English text. Note that this alphabetical arrangement is likely not the most efficient for true multi-tap entry, which would be faster if the eight most popular letters were assigned single taps, the next eight most popular double taps, and so on (in the spirit of Huffman coding) - the average number of taps per letter would then fall to about 1.47, and the separation of the most popular letters might moreover aid predictive algorithms (slightly).
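To make the arithmetic concrete, here is a back-of-envelope sketch in Python; the letter frequencies are approximate ones from Wikipedia's letter-frequency table, so the exact averages shift a hair with the table used:

    # Expected taps per letter under the two assignments discussed above.
    FREQ = {  # approximate English letter frequencies, in percent
        'e': 12.70, 't': 9.06, 'a': 8.17, 'o': 7.51, 'i': 6.97, 'n': 6.75,
        's': 6.33, 'h': 6.09, 'r': 5.99, 'd': 4.25, 'l': 4.03, 'c': 2.78,
        'u': 2.76, 'm': 2.41, 'w': 2.36, 'f': 2.23, 'g': 2.02, 'y': 1.97,
        'p': 1.93, 'b': 1.49, 'v': 0.98, 'k': 0.77, 'j': 0.15, 'x': 0.15,
        'q': 0.10, 'z': 0.07,
    }

    def average_taps(groups):
        """Expected taps per letter, given groups[i] = letters needing i+1 taps."""
        total = sum(FREQ.values())
        return sum((i + 1) * FREQ[c] for i, g in enumerate(groups) for c in g) / total

    # Traditional alphabetical multi-tap assignment:
    alphabetical = ["adgjmptw", "behknqux", "cfilorvy", "sz"]
    print(round(average_taps(alphabetical), 2))        # ~2.15

    # Huffman-spirited assignment: the 8 most frequent letters get one tap, etc.
    by_freq = sorted(FREQ, key=FREQ.get, reverse=True)
    frequency_ordered = [by_freq[:8], by_freq[8:16], by_freq[16:24], by_freq[24:]]
    print(round(average_taps(frequency_ordered), 2))   # ~1.46, close to the 1.47 above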

While this might suggest that multi-tap input would then take about twice as long as standard QWERTY smartphone input, which assigns one key, and therefore one tap, to each individual letter, Mr. Robo posits that this may not be the case. For one, he suspects that there will be time savings from not having to maneuver fingers over tiny keys, or alternatively backspace furiously after mistakes made in haste (for more theory behind this, see Fitts's law).

We now come back to the problem of ambiguous direction, and Mr. Robo has a quite outrageous suggestion - why not completely disregard the direction moved, and concentrate solely on the number of taps? Therefore, for example, "home" would be input as tap-tap-[move]-tap-tap-tap-[move]-tap-[move]-tap-tap-[swipe], where [move] denotes moving the finger off the screen for some minimal distance in any direction (possibly corresponding to the keypad, if that helps the user), and [swipe] denotes swiping the finger on the screen, again for some minimal distance in any direction.
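As a sketch (in Python, with the standard keypad layout), the encoding takes only a few lines; note that only the tap counts survive - the key identities are thrown away entirely:

    # Direction-free encoding: number of taps per letter, [move] between
    # letters, [swipe] to end the word.
    KEYS = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]
    TAPS = {c: key.index(c) + 1 for key in KEYS for c in key}  # taps per letter

    def encode(word):
        """Render a word as the proposed tap/move/swipe gesture string."""
        parts = ["-".join(["tap"] * TAPS[c]) for c in word.lower()]
        return "-[move]-".join(parts) + "-[swipe]"

    print(encode("home"))
    # tap-tap-[move]-tap-tap-tap-[move]-tap-[move]-tap-tap-[swipe]
    print([TAPS[c] for c in "home"], [TAPS[c] for c in "kite"])  # [2, 3, 1, 2] twice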


Taking even the dashes out of Morse!
(Original source: flickr.com; Font: fontspace.com)


Obviously, this would then mean that a multi-tap sequence becomes a one-to-many mapping, from a one-to-one. "kite", among other words, would have the same sequence as "home". It is easily seen that there are in practice only slightly more than 3^n unique sequences available to cater for all words of length n; for n=4, that would be barely more than eighty (3^4 = 81, plus the few sequences involving the four-tap letters s and z).

First, the bad news. There are clearly far more than eighty four-letter words (Mr. Ham can rattle off that many impolite ones without thinking) - common dictionaries tend to list at least around 2000 of them, though this is thankfully still far less than the over 450000 possible theoretically.

Next, the slightly better news. In general, language models follow Zipf's Law, which states that the frequency of a word will be approximately inversely proportional to its rank, i.e. the second most common word will only be seen about half as often as the most common one, and even the tenth most common word will be seen only 10% as often; if this law holds, then even with 2000 possible four-letter words, a mere 34 of them will comprise half of all observations, a magnitude bearing far more resemblance to the number of available sequences.
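The 34 can be checked directly: under an ideal Zipf distribution over N words, the frequency at rank r is proportional to 1/r, so we just accumulate mass until passing the halfway mark. A quick sketch:

    # How many top-ranked words cover `target` of all observations,
    # under an ideal Zipf distribution over n_words words?
    def zipf_coverage(n_words, target=0.5):
        total = sum(1.0 / r for r in range(1, n_words + 1))
        running = 0.0
        for rank in range(1, n_words + 1):
            running += 1.0 / rank / total
            if running >= target:
                return rank

    print(zipf_coverage(2000))    # 34, as quoted above
    print(zipf_coverage(64000))   # 190, matching the nine-letter estimate below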

The situation improves further with more letters - while the most common word length in dictionaries seems to be about nine, there would be about 20000 unique sequences to describe words of that length, whereas even generously assuming 64000 possible nine-letter words would imply that just 190 of them would comprise half of all observations - a tiny fraction of the available space! [N.B. As a rule, the longer the word, the easier it is for a predictive algorithm to fill in, despite it appearing more impressive at first to guess a huge word than a short one - watch out for this in promo videos]


Word count stats from the Normal and Reverse English Word List


So, the major challenge remains with recognizing shorter words correctly, and unfortunately (but fortunately for texters), short words are quite common - the average length of an English word in actual usage is about five (but it depends). Apparently, we need more data - can we get it?

Happily, yes, and as it happens all links lead to Google. Back in 2006, they released a one trillion word corpus, following the dictum that data is king, which completely dwarfs the Gutenberg (~10k words) and Jones (from ~100 million words) corpora we have tried. Put into perspective, it's as if each word in Jones was multiplied by each word in Gutenberg.

Among other yummy data, it has a complete list of bigram (all 676) and trigram (all 17576) frequencies, with even the least common bigram ("jq" here) appearing nearly three million times! The differences from Jones are minor, as expected - 94% of the bigrams differ by less than 50% in relative frequency - though the maximum disagreement is over 900% (for "qi"), and "in" (2.02%, from 2.36% in Jones) overtakes "th" (2.00%, from 3.02%) as the most popular bigram.


Mr. Ham isn't too big on spellcheck
(Original source: wikimedia.org)


Mr. Ham: *rushing in* No! No trigrams!

Me: *sighs* What is it this time, Mr. Ham?

Mr. Ham: Have you heard what they are saying about us? "The hamster is greedy as well as cowardly; it has five skills but all are inferior". Well I never! Esquire Pants will take the progenitors of such libel for every last penny they've got! Where is the justified outrage when you need it?

Me: Ignore him.

Mr. Ham: And did you know they borrowed the whole thing from Morse code? If you take the first eight trigrams you get "owkugrds", an anagram of "gud works", clearly an inside joke, as my lecturer told us in HIT!

Mr. Robo: I'm impressed! You were at the Hamster Institute of Technology?

Mr. Ham: What? No, no, it's HIT101, a course where we learnt how to solve problems by hitting them, again and again, until they are rectified. One of my favourites, got an A in it.

Mr. Robo: ...You're right. I should have ignored him.



So back to the tapping. A single tap before a space almost certainly means "a". Two taps is more complicated, with "b", "k", "u" and to a lesser extent "e" all used in texting shorthand, while three taps has two main challengers, "c" and "i". On to two-letter words, with the most popular 333333 words of the Trillion Word corpus (henceforth, 333333-T) as the dictionary:

Seq (first digit = taps for the first letter, second digit = taps for the second) [share of all two-letter inputs]: top five words, individual % (cumulative %)

"11" [5.55%]: at 50.92% (50.92%), pm 13.55% (64.47%), am 12.92% (77.39%), ad 1.77% (79.16%), pa 1.40% (80.56%)
"12" [5.53%]: an 34.16% (34.16%), we 31.29% (65.45%), me 12.75% (78.20%), de 7.08% (85.28%), tx 1.36% (86.64%)
"13" [19.70%]: to 76.60% (76.60%), my 6.69% (83.28%), do 6.00% (89.28%), go 2.66% (91.94%), tv 1.01% (92.95%)
"14" [3.05%]: as 91.59% (91.59%), ms 2.79% (94.37%), az 2.00% (96.37%), ds 0.88% (97.25%), ps 0.83% (98.08%)
"21" [2.14%]: up 48.27% (48.27%), et 6.06% (54.33%), hp 3.82% (58.15%), eg 3.76% (61.90%), ed 3.01% (64.91%)
"22" [4.99%]: be 59.78% (59.78%), he 21.00% (80.78%), uk 5.80% (86.58%), en 2.31% (88.90%), un 1.21% (90.11%)
"23" [6.14%]: by 67.84% (67.84%), no 18.98% (86.81%), ny 1.54% (88.35%), hi 1.37% (89.72%), el 1.20% (90.92%)
"24" [1.69%]: us 90.58% (90.58%), es 2.30% (92.87%), ks 1.30% (94.17%), nz 1.14% (95.31%), ns 0.98% (96.29%)
"31" [5.12%]: it 68.32% (68.32%), cd 4.64% (72.95%), la 3.84% (76.79%), ca 3.17% (79.95%), id 2.91% (82.87%)
"32" [16.44%]: in 64.06% (64.06%), on 28.37% (92.43%), re 3.26% (95.69%), oh 0.61% (96.31%), ok 0.52% (96.83%)
"33" [22.10%]: of 73.99% (73.99%), or 14.57% (88.56%), if 6.38% (94.94%), ii 0.75% (95.69%), co 0.64% (96.34%)
"34" [6.09%]: is 96.09% (96.09%), os 0.91% (97.00%), vs 0.76% (97.76%), oz 0.61% (98.36%), rs 0.57% (98.94%)
"41" [0.31%]: st 41.23% (41.23%), sa 14.96% (56.19%), sd 11.55% (67.74%), sp 9.36% (77.10%), sw 7.11% (84.22%)
"42" [0.15%]: se 33.20% (33.20%), su 13.79% (46.98%), sb 11.28% (58.26%), sh 10.50% (68.76%), sk 6.22% (74.98%)
"43" [0.97%]: so 84.69% (84.69%), sc 5.14% (89.83%), sf 2.45% (92.28%), si 2.42% (94.71%), sr 2.07% (96.78%)
"44" [0.04%]: ss 75.69% (75.69%), sz 13.86% (89.55%), zz 7.45% (97.00%), zs 3.00% (100.00%)


For each tap sequence, the top five corresponding words are shown, with individual probabilities, and cumulative probabilities in brackets. It can be seen that for almost all tap sequences of length two, the most probable translation is at least twice as likely as the second most probable one. The glaring outlier is "12", which is almost as likely to mean "we" as "an", with "me" quite plausible too. Further, just the top five options will suffice for prediction at least 65% of the time, with "21" the most ambiguous in this respect.

This applies also to the distribution of the tap sequences themselves, with three of them, "33" (of, or, if...), "13" (to, my, do, go...) and "32" (in, on, re, oh, ok...) making up more than half of all tap sequences of length two (though one might suspect that a specialized texting dictionary would put "ok" at higher than just 0.52%). As a crude measure of serviceability, the top option will be correct 66.06% of the time on average across sequences, with a standard deviation of 19.49 percentage points.
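The same bookkeeping extends to any word length; here is a sketch of how the table above (and the per-length statistics below) can be generated, assuming a dict `freq` mapping each word in the 333333-T list to its occurrence count - the names are illustrative, not the original script:

    # Group a word-frequency list by tap sequence; print each sequence's
    # share and its top candidates, as in the table above.
    from collections import defaultdict

    def tap_sequence(word):
        return tuple(TAPS[c] for c in word.lower())  # TAPS from the earlier sketch

    def sequence_table(freq, length, top=5):
        by_seq = defaultdict(list)
        for word, count in freq.items():
            if len(word) == length and word.isalpha():
                by_seq[tap_sequence(word)].append((count, word))
        grand = sum(c for words in by_seq.values() for c, _ in words)
        for seq in sorted(by_seq):
            words = sorted(by_seq[seq], reverse=True)
            cell = sum(c for c, _ in words)
            entries = ", ".join("%s %.2f%%" % (w, 100.0 * c / cell)
                                for c, w in words[:top])
            print("".join(map(str, seq)), "[%.2f%%]" % (100.0 * cell / grand), entries)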

Length | Mean   | Stddev | Words | Seqs  | Ratio
     2 | 66.06% |  19.49 |   676 |    16 |  42.25
     3 | 47.75% |  24.28 | 12977 |    64 | 202.77
     4 | 48.78% |  22.34 | 31140 |   256 | 121.64
     5 | 57.20% |  25.30 | 39933 |   943 |  42.35
     6 | 64.83% |  25.71 | 49040 |  3028 |  16.20
     7 | 72.99% |  24.95 | 50117 |  8286 |   6.05
     8 | 82.71% |  21.75 | 44551 | 23299 |   2.61
     9 | 91.49% |  16.24 | 35447 | 23299 |   1.52
    10 | 96.32% |  11.06 | 26100 | 21920 |   1.19

(Mean/Stddev: how often the top candidate is correct; Words: dictionary words of that length; Seqs: distinct tap sequences used; Ratio: words per sequence.)


As foreshadowed, prediction becomes surer the more letters there are - with three letters, over 12000 words have to squeeze into just 64 sequences, a ratio of over 200 words per sequence, while for ten letters, the 26100 words fit into 21920 sequences, or 1.19 words to a sequence; many of the residual clashes come from suffix pairs that happen to share a tap sequence (e.g. "-ion" and "-ive" are both 3-3-2, so "creation" and "creative" collide).

Unfortunately, we suspect that many text messages are, indeed, short and made of short words, so many more inferences have to be made, and/or a custom dictionary slowly built up. For example, the sequence "2121" has a raw 49.1% probability of being "hand" from the corpus, and just 1.71% of being "haha", though in the texting context, "haha" would probably be the far better bet as a first word.

There are a couple of ways out of this. The first would be to simply revert to using directionality information, by keeping a running estimate of the user's mental image of the keypad: start from a standard grid-based model, then re-estimate the key positions as input comes in, weighting each update by how accurate the model has proven online. This is the approach taken by Findlater and Wobbrock in their recent 2012 paper, though that is targeted more towards two-handed, normal-sized virtual keyboards.
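In its simplest flavour, such an estimate could be nothing more than a per-key centroid nudged toward each confirmed touch - the following is a minimal sketch of the general idea, not of Findlater and Wobbrock's actual (much richer) models:

    # Adapt estimated key centres to the user's imagined keypad:
    # once a touch has been resolved to a key (by the language model,
    # or the user accepting a suggestion), nudge that key toward it.
    ALPHA = 0.2  # learning rate: how quickly the layout adapts

    def update_key_centre(centres, key, touch, alpha=ALPHA):
        """Exponential moving average of the key's estimated centre."""
        cx, cy = centres[key]
        tx, ty = touch
        centres[key] = ((1 - alpha) * cx + alpha * tx,
                        (1 - alpha) * cy + alpha * ty)

    def nearest_key(centres, touch):
        """Resolve a touch to the key whose estimated centre is closest."""
        return min(centres, key=lambda k: (centres[k][0] - touch[0]) ** 2
                                          + (centres[k][1] - touch[1]) ** 2)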

The other would be to use more corpus information. Consider the two phrases "I bet him at" and "I met him at", whose tap sequences differ only in the count for the first letter of the second word ("bet" being 2-2-1, "met" 1-2-1) - it seems that the second should be the right interpretation nearly all of the time, but how can this be substantiated? Well, we might note that the former returns only about 700 thousand results on Google (which might be thought of as the biggest corpus on Earth), while the latter returns 11.3 million.

This does however require a reliable, fast Internet connection (or incredible advances in storage) to be feasible even with the number of phrasal candidates winnowed down with various algorithms, and therefore might be a little ahead of its time.
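Mechanically the idea is trivial - the hard parts lie entirely in the winnowing and the lookups. In the sketch below, `phrase_count` is a stand-in for whatever counting oracle is available (a web search API, a local phrase table); it names no real service:

    # Rank candidate expansions of an ambiguous tap sequence by how often
    # the whole phrase occurs in some large corpus.
    def best_phrase(candidates, phrase_count):
        counts = {p: phrase_count(p) for p in candidates}
        return max(counts, key=counts.get)

    # e.g. best_phrase(["I bet him at", "I met him at"], phrase_count)
    # would pick "I met him at", given counts like 0.7 million vs 11.3 million.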

One possible adaptation would then be randomized sampling, where the probabilities are approximated using word pairs within some distance. For example, if "win" were to appear soon after the phrase, we might think "bet" more likely, in the hope of learning from enough weak hypotheses (think Adaboost). Explorations along this line will have to wait, though, as Mr. Robo moves on to his next, more promising work, born after some fruitful discussion with yours truly.


One Move Ahead

One observation made is that currently popular virtual keyboards incorporate probability information, but tend to correct after the fact, as demonstrated last week. In a way, they might be thought of as adjusting the size of the keys behind-the-scenes if the noise modelling is not discrete, which Gunawardana et al. explain in a 2010 paper.

Now, obscuring the input mechanism goes against the WYSIWYG principle and may lead to frustrated users trying to second-guess the system, which some are better at than others. The obvious-as-dirt extension is to actually reflect the increased size of keys visually, which many authors have indeed suggested over the years under different names - BigKey, CATKey, FloodKey and SpreadKey, among others, whether by prior customization or active prediction.

However, we believe that we have some novel contributions - let us get this out of the way first. Unlike BigKey, there is no fixed bound on the number of suggestions, and there is no occlusion. Unlike FloodKey, adjustments are made in a regimented manner instead of utilizing a less-structured Voronoi diagram, thereby retaining a cleaner interface and restricting the user's visual search to one fixed dimension (the rows), a constraint which SpreadKey also does not obey.

The basic idea remains that over the course of typing, letters can be predicted, and as a consequence precious space on small touchscreen keyboards can be safely reallocated from unlikely keys to likelier ones. How well can predictions be made? Well, in the spirit of the tap sequence analysis above...


Mr. Ham: Wait! Here's a napkin, do your calculations there.

Mr. Robo: Huh?

Mr. Ham: I tell you, this is a must - nobody ever accomplished anything great on letter-sized paper in all of the movies and television shows that I've watched! Look, at the minimum, use the back of this envelope, I carry a stack of them around nowadays just in case.


It's slightly used, but I'm sure there's enough room left
(Source: flickr.com)


Mr. Robo: Get thee away from me!

How large should the keys ideally become? A relevant study back in 2006 suggested that keys about 0.92cm on each side would be optimal. Touchscreen research goes much further back, though, and a review by Sears et al. back in 1992 reports a consensus of about 2.2cm a side for two-handed input, and unsurprisingly notes a falloff in accuracy and a slowdown in typing speed with decreasing key size.

More recently, Apple's iOS guidelines suggest a minimum of 44 pixels square for each item, which translates to about 0.71cm - noticeably larger than our 0.4cm-wide Android keys. Personally, I almost always miss to the left or right only, and would further suspect that this width is dangerously close to where accuracy falls off into unusable territory, but fuller empirical tests will have to wait. Either way, the conclusion seems inescapable - bigger equals good.
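For the record, the conversion (assuming the 163 pixels-per-inch of the non-retina iPhones of the day - the exact figure shifts with the assumed density):

    PPI = 163                        # assumed non-retina iPhone pixel density
    print(44.0 / PPI * 2.54)         # ~0.69 cm a side, in the region of the 0.71cm quoted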


Mr. Robo: *looks around* Is Mr. Ham gone?

Me: Looks like it.

Mr. Robo: Okay, continuing...


The newly-available trigram data further allows us to get an idea of how greatly a letter can affect the probability of the second letter after it. It can first be noted that the bigram count is always greater than the combined count of trigrams with that bigram as prefix, indicating that there are unaccounted-for trigrams involving punctuation - this does seem more common with the less-popular bigrams, though, with "hz" not followed by a letter about 57% of the time, for instance.

We can quantify the value added by noting that in a pure bigram model, the probability of a three-character string xyz has to be estimated by chaining bigram statistics - crudely, as P(xy)*P(yz) - whereas with trigram data, P(xyz) is given directly. These values can be starkly different - while "tr" (0.40%) and "rt" (0.47%) are fairly common bigrams, their combination "trt" occurs only 0.0002% of the time in reality, ten times less than the naive product suggests.

Put another way, in a pure bigram model, our expectation of receiving a "t" following "tr" is exactly that of receiving a "t" after an "r", which is P(rt)/[P(ra)+P(rb)+...+P(rz)]=0.070, i.e. more likely than the average letter. However, with a trigram model, this is refined to P(trt)/[P(tra)+...+P(trz)]=0.00044, which is far more in keeping with reality.
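In code, the refinement is just a change of conditioning context; `trigram_count` is assumed to map a three-letter string to its corpus count:

    # Next-letter probability conditioned on the previous two letters,
    # computed from raw trigram counts.
    from string import ascii_lowercase

    def p_next(trigram_count, context, letter):
        """P(letter | last two letters == context), from trigram counts."""
        total = sum(trigram_count.get(context + c, 0) for c in ascii_lowercase)
        return trigram_count.get(context + letter, 0) / total if total else 0.0

    # With the Trillion Word counts, p_next(counts, "tr", "t") comes out
    # around 0.00044, versus the ~0.070 bigram-only estimate quoted above.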

There is, of course, nothing to stop us from generating quadgrams, quingrams, etc. from the corpus, excepting storage space concerns, since single-character lookup time should be constant. As a precomputed optimized table would require just one byte per n-gram (probabilities stored at about 0.4% resolution in that byte; the memory address computed on the fly), a full quadgram table would take up just 0.5MB and a quingram table 12MB, which should still be within acceptable download sizes even for mobile apps.
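The table layout being alluded to, as a sketch - the address is computed in base 26, so a lookup is a single array access with no hashing:

    # Flat n-gram table: one byte per n-gram, address computed on the fly.
    def ngram_index(ngram):
        """Mixed-radix (base-26) address of an all-letter n-gram."""
        idx = 0
        for c in ngram:
            idx = idx * 26 + (ord(c) - ord('a'))
        return idx

    def quantize(prob):
        """Store a probability in one byte, at roughly 1/255 (~0.4%) resolution."""
        return min(255, round(prob * 255))

    table = bytearray(26 ** 4)                  # full quadgram table: just under 0.5MB
    table[ngram_index("tion")] = quantize(0.02) # illustrative value only
    print(table[ngram_index("tion")] / 255)     # recover the approximate probability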

More quantification, using n-gram data inferred from the 333333-T corpus (excluding words with non-letter characters), and used to predict the same; note that within a word list there are necessarily more (n-1)-gram occurrences than n-gram occurrences for every n, with some 2.16 million bigram occurrences but only about 1.17 million quingram occurrences:



As is expected, with more information about the preceding letters, the confidence of guessing the next letter increases assuming a fixed lexicon. Knowing only the immediately preceding letter (a bigram model) gives only about 20% confidence on average for predicting the current letter, but knowing four preceding letters (a quingram model) raises this confidence to over 70%!

Additionally, even using just a bigram model, the most likely 13 letters account for 93.63% of usage, which strongly suggests that we can increase the display size of around half the letters, and be quite confident of helping the user, and this rises to 97.04% with a trigram model. The reality is for once probably even more promising, since the 333333-T corpus appears not to have been winnowed for misspellings.

A minor obstacle is that the keyboard rows are not utilized equally - the top row is used about 46% (333333-T) to 52% (Concise Oxford) of the time, the middle row some 35%, and the bottom row just 15% to 19% - though this is partially explained by the different number of letters on each row. Still, crowding should not be a huge issue, as the most probable predicted letters seldom cluster: with the trigram model, all top five predictions land on the same row only about 0.6% of the time, and there is an approximately 77% chance that at most three of them share a row.

Now, the nasty question - what about those rare occasions where the user actually wants to input an unlikely letter (e.g. some proper names, codewords)? There are quite a few ways of resolving this, but we shall reserve what we think is the best one for now, and present another idea: for the cases where the desired key is shrunk, we can use a drag-selection system, where the user drags his finger across the row and gets the secondary keys highlighted as he goes; the highlighted key is then selected by lifting the finger over it, which should seldom be needed.

But enough of the yakking, behold, the prototype:



[N.B. An earlier attempt with keys resizing more aggressively was abandoned, since additional effort was required to search for keys which may have moved some distance from their standard position]

As can be seen, the keys have become far larger, with the most likely primary letters in white; each of these letters occupies a space at least 50% wider than the default, often much more, making them easier to hit. The less likely letters are in grey, and as mentioned can be selected by dragging and releasing a finger (mouse here) over them.

The current procedure is as follows: using quadgram information (including spaces, computed from scratch from the 333333-T corpus), the probability of each possible next letter is retrieved. Then, from most to least likely, each letter is assigned as primary if and only if the primary letters thus far do not together comprise over 99% of the probability, and the addition of this letter would not form a run of three adjacent primary letters (since the middle letter of such a run would have no space to grow).
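In sketch form the rule reads roughly as follows - `probs` comes from the quadgram model, and the names are illustrative rather than from the prototype's actual source:

    # Greedy primary-letter assignment: most likely letters first, capped at
    # 99% cumulative probability, never three adjacent primaries in a row.
    ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

    def makes_three_adjacent(primaries, letter):
        """Would promoting `letter` create three primary letters in a row?"""
        for row in ROWS:
            if letter in row:
                i = row.index(letter)
                trial = primaries | {letter}
                for start in range(max(0, i - 2), min(i, len(row) - 3) + 1):
                    if all(row[start + k] in trial for k in range(3)):
                        return True
        return False

    def assign_primaries(probs, cap=0.99):
        primaries, covered = set(), 0.0
        for letter in sorted(probs, key=probs.get, reverse=True):
            if covered > cap:
                break
            if not makes_three_adjacent(primaries, letter):
                primaries.add(letter)
                covered += probs[letter]
        return primaries

The no-three-in-a-row check is what guarantees every primary letter room to widen into at least one neighbouring slot.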


Me: Well, it's not perfect, but might get some love; even the top keyboard apps tend to have about 10% of users rating them as one-star, can't please everyone. About the name...

Mr. Robo: Oh, Mr. Ham has already taken care of that.

Me: What? Why?

Mr. Robo: Well, he was so enthusiastic about his suggestion - positively grinning from ear to ear - and it sounded rather appropriate, "reflecting the underlying technology", as he said...

Me: I'm prepared for the worst.

Mr. Ham: *pops up* Presenting...



Me: No, just no.

Mr. Ham: It's too late, I've printed all the publicity material!

Mr. Robo: Actually, is it wise to expose the idea just like this?

Me: Well, as is so often heard, ideas are cheap, execution is everything; one might be surprised to discover how often an app is panned simply because it doesn't provide pretty skins. In any case, I'm not particularly short on ideas.

Would that these days of dread, bringing me ever closer to what I am extremely unwilling to see happen, be over, such that my darkness might lift - and yet I would have time stop instead. Ah, what will be, will be. Back to other research.


