Monday, April 21, 2014

Cracking CAPTCHA

Are you familiar with the “Completely Automated Public Turing test to tell Computers and Humans Apart”? More than likely you are; you have just heard its acronym before: CAPTCHA. A CAPTCHA is a program that protects websites against nonhuman users such as bots. The term was coined back in 2000 by researchers at Carnegie Mellon University and IBM. By definition the program requires little to no human intervention to maintain, and its reliability and low cost have made it one of the most widely used forms of verification and security on the internet. Since its inception, CAPTCHA has been applauded by some as innovative and criticized by others as a tedious, often impossible task.
The most common forms of CAPTCHA require the user to type the letters or digits shown in a distorted image. Extra layers of security can be added to the distorted image with background colors or graphics. Text-based CAPTCHAs today are designed so that solving them consistently requires three separate abilities: invariant recognition, segmentation, and parsing. Briefly, invariant recognition is the ability to recognize letters despite variations in their shapes and sizes, something a human can do almost without limit while a computer must learn each variation. Segmentation is the ability to separate letters from one another when they are crowded together with no spaces in between. Finally, parsing means taking the context of the whole image into account. For example, users must be able to recognize “nn” as two “n”s. CAPTCHAs that draw on all three abilities are the most difficult to solve and the most successful at stopping nonhuman users.
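To see why crowded letters are such a headache for software, here is a minimal Python sketch of a naive segmentation step. It is only an illustration, not how any real CAPTCHA solver or Google's algorithm works: the function name `segment_columns` and the tiny hand-made "images" (rows of text where `#` stands for ink) are made up for this example. The idea is to split the image wherever a completely blank column appears.

```python
def segment_columns(rows):
    """Split a binary image into (start, end) column ranges, one per blob.

    `rows` is a list of equal-length strings where '#' marks ink.
    This is a toy column-projection approach, assumed for illustration.
    """
    width = len(rows[0])
    # A column "has ink" if any row has a '#' in that column.
    ink = [any(row[x] == "#" for row in rows) for x in range(width)]

    segments = []
    start = None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a new blob begins here
        elif not has_ink and start is not None:
            segments.append((start, x))    # a blank column ends the blob
            start = None
    if start is not None:
        segments.append((start, width))    # blob runs to the right edge
    return segments


# Two glyphs separated by blank columns: easy to split.
SPACED = [
    "#.#..##.",
    "###..#.#",
    "#.#..##.",
]

# The same two glyphs pushed together: no blank column survives.
TOUCHING = [
    "#.###.",
    "####.#",
    "#.###.",
]

print(segment_columns(SPACED))    # two separate column ranges
print(segment_columns(TOUCHING))  # a single range covering both glyphs
```

With space between the letters the sketch finds two blobs, but once the letters touch it sees only one, which is exactly the situation a crowded CAPTCHA creates on purpose.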
One form of CAPTCHA, called reCAPTCHA, has been building on the success of its predecessors. In fact, reCAPTCHA was founded by one of the developers of the original CAPTCHA program, Luis von Ahn. reCAPTCHA often presents two or more words to the user, further protecting the website from bots. Websites like Twitter, Facebook, Craigslist, and TicketMaster all use the reCAPTCHA service. Every successful reCAPTCHA entry also serves a second purpose: it contributes to text digitization and to machine learning datasets. reCAPTCHA has been able to digitize and archive more than 13 million articles for the New York Times, with articles dating back to 1851.
Google Inc. acquired reCAPTCHA in 2009 and has been perfecting its algorithm for five years. It came as a shock this week when Google researchers published a paper stating that they had developed an algorithm that solves Google’s very own CAPTCHAs. Further, they stated that it could do so with 99.8% accuracy. The new algorithm was created to help automatically analyze the hard-to-read signs and house numbers photographed by Google’s Street View cameras. Amazingly, it handles those two analyses with 90% and 96% success rates, respectively. With technology now able to accurately dissect and analyze multi-layered CAPTCHAs and hard-to-read photographs, the security program seems doomed. Fortunately, Google believes that its newfound research will help to further protect CAPTCHAs rather than expose them to vulnerability. Vinay Shet, reCAPTCHA’s product manager, wrote in a blog post this week that “This shows that the act of typing in the answer to a distorted image should not be the only factor when it comes to determining a human versus a machine.” He went on to say that the reCAPTCHA service is more secure than ever before because of the researchers’ findings.
In closing, CAPTCHAs will more than likely be here for the foreseeable future. However, to stay relevant, their protections must continue to evolve to stay ahead of bots and their analysis software. Now, hopefully, reCAPTCHA makes good on its promise of simplifying the entire process for the user while still keeping the website protected.

Source List:
Vinay Shet, Product Manager of reCAPTCHA Blog
Google develops computer vision accurate enough to solve its own CAPTCHAs
Google's reCAPTCHA
