CAPTCHAs are little reading tests that distinguish humans from machines at a computer interface. A word is displayed in a slightly noisy form and the user has to type it in, which is easy for humans but hard for machines. The authors of reCAPTCHA wanted to attach a useful task to this test. They say:
"... in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort?"

So instead of picking random words for the test, they pick word-sized fragments from large public digitization efforts (currently the one led by the Internet Archive) and submit them to users. That is, reCAPTCHA words are words that appear in a paper document of interest, are not yet in digital form, and could not be read by machines. By solving a reCAPTCHA you are helping a useful project.

So where is the catch? It took me a moment of thought, but it's kind of obvious: if these are fragments from real digitization projects, by definition we don't know the solution. And if we don't know the solution, we can't use them as CAPTCHAs! So I went back to the reCAPTCHA web site, and what they do is present two words in random order: one with a known but useless solution and one with an unknown but useful solution. To use the website, you have to complete both.

Therefore no spare cycles are being used at all: the effort for the user has been doubled. If reCAPTCHA were the standard, another 150,000 hours a day would be spent solving the second word. That's fine by me in exchange for a free service, but it's not as efficient as advertised, not even close.
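To make the two-word scheme and the doubling argument concrete, here is a minimal Python sketch. It is not reCAPTCHA's actual code: the function names, the `"scan-17"` fragment id and the assumption that the second word takes as long to type as the first are all mine, for illustration only. The point is that only the known word can gate access, while the answer to the unknown word can only be stored as an unverified guess.

```python
import random

def make_challenge(control_word, unknown_id, rng=random):
    """Return the two items in random order, plus the slot of the control word.

    control_word: a word whose solution is known (hypothetical input).
    unknown_id:   an identifier for a scanned fragment with no known solution.
    """
    slots = [("control", control_word), ("unknown", unknown_id)]
    rng.shuffle(slots)
    control_index = 0 if slots[0][0] == "control" else 1
    return slots, control_index

def grade(control_word, control_index, typed):
    """Grade the two strings the user typed, given in displayed order.

    Access depends only on the control word. The other answer cannot be
    checked, so it is returned as a guess (or None if the user failed).
    """
    passed = typed[control_index].strip().lower() == control_word.lower()
    guess = typed[1 - control_index] if passed else None
    return passed, guess

def hours_with_two_words(hours_one_word):
    """Daily effort if every test shows two words instead of one,
    assuming both words take the same time to solve."""
    return 2 * hours_one_word
```

Under that equal-time assumption, `hours_with_two_words(150_000)` gives 300,000 hours a day: the original 150,000 are still spent on the control word, and the useful work comes entirely from the added half.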