Are you familiar with the “completely automated public
Turing test to tell computers and humans apart”? More than likely you are,
though you may only know it by its acronym:
CAPTCHA. CAPTCHA is a program that protects websites against nonhuman
users such as bots. The term was coined back in 2000 by researchers at Carnegie
Mellon University and IBM. The program by definition requires little to no
human intervention to maintain, and so it has become one of the most widely used
forms of verification and security on the internet thanks to its reliability and
low cost. Since its inception, CAPTCHA has been applauded by some as innovative
and criticized by others as a tedious, often impossible task.
The most common forms of CAPTCHA
require the user to type the letters or digits shown in a distorted
image. Extra layers of security can be
added to the distorted image with background colors or graphics.
Text-based CAPTCHAs today are designed around three separate
characteristics: invariant recognition, segmentation, and parsing, so that
humans can solve them consistently. Briefly, invariant recognition refers
to handling variation in the shapes and sizes of
letters, something a human can do almost effortlessly while a computer must
learn each variation. Segmentation allows the CAPTCHA to separate letters from
one another when they are crowded together with no spaces in between.
Finally, the CAPTCHA design must take into account the context of the form.
For example, users must be able to recognize “nn” as two “n”s. CAPTCHAs that
utilize all three components are the most difficult to solve and the most
successful at stopping nonhuman users.
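To make those ideas a little more concrete, here is a minimal sketch of how a distorted-text challenge might be generated, assuming the Pillow imaging library is installed. The jitter, crowding, and clutter parameters below are illustrative choices, not taken from any real CAPTCHA service.

```python
# Sketch of a distorted-text challenge (illustrative parameters only).
import random
import string

from PIL import Image, ImageDraw, ImageFont, ImageFilter


def make_captcha(length=5, size=(200, 70)):
    """Render random characters with per-letter jitter, crowding, and noise."""
    text = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))
    image = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()  # swap in a TrueType font for larger glyphs

    # Crowd the letters together (segmentation resistance) and vary their
    # vertical position (invariant-recognition resistance).
    x = 10
    for ch in text:
        y = random.randint(5, 30)
        draw.text((x, y), ch, fill="black", font=font)
        x += random.randint(15, 25)

    # Background clutter: random gray lines over the text.
    for _ in range(8):
        draw.line(
            [(random.randint(0, size[0]), random.randint(0, size[1])),
             (random.randint(0, size[0]), random.randint(0, size[1]))],
            fill="gray",
        )

    # Light smoothing blurs sharp edges that automated solvers key on.
    image = image.filter(ImageFilter.SMOOTH)
    return text, image


if __name__ == "__main__":
    answer, img = make_captcha()
    img.save("captcha.png")
    print("Expected answer:", answer)
```

The server keeps the returned answer and only accepts the form submission if the user types it back correctly.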
One variant, reCAPTCHA,
has been building on the success of the original program. In fact, reCAPTCHA
was founded by one of the developers of the original CAPTCHA program, Luis von
Ahn. reCAPTCHA often presents two or more words to the user, further protecting
the website from bots. Websites like Twitter, Facebook, Craigslist, and
TicketMaster all use the reCAPTCHA service. Every successful reCAPTCHA entry
also feeds another use of the program: its text digitization and machine
learning datasets. reCAPTCHA has been able to digitize
and archive more than 13 million New York Times articles dating back to 1851.
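For the sites listed above, the check itself is typically a small server-side call. Below is a hedged sketch using Python's requests library and Google's siteverify endpoint; the endpoint and field names reflect the current reCAPTCHA API, which may differ from the version those sites first deployed, and YOUR_SECRET_KEY is a placeholder.

```python
# Sketch of server-side reCAPTCHA verification (current siteverify API).
import requests

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def is_human(captcha_response, secret="YOUR_SECRET_KEY"):
    """Ask Google whether the token submitted with the form came from a solved challenge."""
    reply = requests.post(
        VERIFY_URL,
        data={"secret": secret, "response": captcha_response},
        timeout=5,
    )
    # Google returns JSON; "success" is true only for a valid, unexpired token.
    return reply.json().get("success", False)
```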
Google Inc. acquired reCAPTCHA in
2009 and has spent the past five years refining its algorithm. It came as a shock
this week when Google researchers published a paper stating that they had developed
an algorithm that can solve Google's very own CAPTCHAs, and that it
can do so with 99.8% accuracy. The algorithm was originally created to
automatically read the hard-to-decipher signs and house numbers
photographed by Google's Street View cameras, and it handles those two
tasks with 90% and 96% success rates respectively. So with software
able to accurately dissect and analyze both multi-layered
CAPTCHAs and hard-to-read photographs, the security program seems doomed.
Fortunately, Google believes that their newfound research will help to further
protect CAPTCHAs rather than expose them to vulnerability. Vinay Shet,
reCAPTCHA’s product manager, wrote in a blog post this week that “This shows
that the act of typing in the answer to a distorted image should not be the
only factor when it comes to determining a human versus a machine.” He
continued by saying that the reCAPTCHA service is more secure than ever before
thanks to the researchers' findings. In closing, CAPTCHAs will more than likely be
here for the foreseeable future. However, to remain relevant, their
protections must continue to evolve to stay ahead of bots and their analysis
software. Hopefully reCAPTCHA makes good on its promise of simplifying
the entire process for the user while still keeping websites protected.
Source: Vinay Shet, Product Manager of reCAPTCHA, reCAPTCHA Blog