
Péter Gyöngyösi - developer
It did not come as a surprise: you could read about captcha cracking, new methods and new results throughout the year (for example here, here, here and here). But with the recent release of the most detailed, thoroughly annotated analysis of such an attack so far, it is time to review the topic.
Let's start with what a captcha is: the name stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." The Turing test refers to Alan Turing, the British mathematician and cryptanalyst who greatly contributed to the victory of the Allies as a key player in breaking Enigma. The captcha builds on his idea: there are tasks that a human can solve easily within seconds but that are virtually impossible for machines, so they can be used to tell a machine from a real human.
It took half a century for his idea to be used not only by researchers of artificial intelligence, but by half of the world. Most commonly it is used on webmail sites to prevent automated sign-ups that could create hundreds of accounts and send spam by the billion. Usually the test takes the form of a few distorted letters that the user must type -- algorithms are notoriously bad at this task, but owing to how our minds operate, we solve it effortlessly. There are also funny solutions: I have seen websites ask what day it is today in my country, which of three pictures shows a cat, or which one shows the prettier lady. The security of these solutions comes mostly from their uniqueness, but for a computer, answering such questions is still far from trivial.
The vulnerability of these and similar tests is that the number of questions is limited. A typical recognize-the-letters captcha implementation uses short words instead of random letters, because the human brain recognizes words like "puppy" more easily than a string like "ahs3Eina". An average implementation uses a dictionary of 5,000-10,000 words: if the prize is tempting, the attackers can and will rebuild this dictionary -- with some human input, or even simply by brute force.
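To make the size difference concrete, here is a rough sketch that compares the number of possible answers in the two cases. The 10,000-word figure is the upper end of the dictionary size mentioned above; the 62-character alphabet (upper case, lower case, digits) and the length of 8 are assumptions modeled on the "ahs3Eina" example, not properties of any particular captcha.

```python
import math

# Upper end of the dictionary size quoted in the text.
DICTIONARY_SIZE = 10_000
# Assumed alphabet for random strings: a-z, A-Z, 0-9.
ALPHABET_SIZE = 26 + 26 + 10
# Assumed length, matching the "ahs3Eina" example.
RANDOM_LENGTH = 8


def answer_space_bits(possible_answers: int) -> float:
    """Size of the answer space expressed in bits (log2 of the count)."""
    return math.log2(possible_answers)


dictionary_bits = answer_space_bits(DICTIONARY_SIZE)
random_bits = answer_space_bits(ALPHABET_SIZE ** RANDOM_LENGTH)

# Chance that a single blind guess hits the right dictionary word.
blind_guess_chance = 1 / DICTIONARY_SIZE

print(f"dictionary word: about {dictionary_bits:.1f} bits")
print(f"random string:   about {random_bits:.1f} bits")
print(f"blind guess against the dictionary: {blind_guess_chance:.4%}")
```

Under these assumptions the dictionary gives an attacker an answer space of roughly 13 bits, against roughly 48 bits for the random string, which is why a word list that is friendly to humans is also friendly to an attacker who has collected it.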
Another way is to create ever better algorithms and use ever cheaper computing capacity to solve a problem that we originally thought was difficult and expensive to crack. Since captchas first appeared, character-recognition technology (OCR) has improved tremendously: earlier it was sufficient to skew the letters a bit to the right or left, and now on the more secure pages even we humans are scratching our heads trying to make out the text among all the skewing, rotation, and strikethroughs that are supposed to prevent automatic recognition. Self-learning algorithms (for example neural networks) can also help -- after some training they might find the few parameters required to decide whether there is a cat in the picture.
But the third and scariest approach is to delegate the problem: the attacker does not even try to use machines to solve it, but simply removes it from its original environment and asks another human to answer the question. This has thousands of variations, ranging from games (mostly with adult content) that require the answer before granting access to the next level, to actual "jobs" that pay money for recognizing the letters. And what is scary about that? It negates the entire concept: in such cases the result of the Turing test is obviously positive, because the answers come from real humans. The attack succeeds because the problem-solving is centralized and the cost of an answer is reduced to a rate that is economically acceptable for the attacker.
It is possible to avoid the first two types of attack with some good ideas and careful design and implementation. Although cheap computing capacity makes this more and more difficult, we can still find tasks that the human brain solves much more effectively than a computer, and enlarging the dictionary or rotating between different tasks is just a question of technology. But fighting centralized human recognition needs a different approach. Two interests conflict here and cannot be reconciled: on the one hand, the protection must not become a tiring obstacle for the average user; on the other hand, we do not want the security questions to be answerable by the thousands with a little organizing.
To solve the problem, we have to change the concept. If we ask the user to prove that he is a human only seldom (ideally, only once during the user's "online life"), we can ask for something more. Something that only humans have, and that is not worth forging in large numbers. At this level, we can move beyond logical problems and questions to answer -- think rather of something from the physical world. Like returning a code received in a text message or via snail-mail: for the attacker, creating a new mailing address or phone number is usually not worth a fake account. But the service providers cannot create such security systems on their own, because it would be too expensive for them, and the users would not invest so much energy into every online service they use. The solution could be to manage the identities centrally, and we can be optimistic about that, because - as we have commented - OpenID is right around the corner.
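The "return a code received in a text message" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the class name OneTimeCodes, the six-digit code length, the three-attempt limit and the in-memory storage are all hypothetical choices, and actually delivering the code by SMS or post is left out entirely.

```python
import hmac
import secrets


class OneTimeCodes:
    """Hypothetical in-memory store of codes sent to a phone number or address."""

    def __init__(self, digits: int = 6, max_attempts: int = 3) -> None:
        self.digits = digits              # assumed code length
        self.max_attempts = max_attempts  # assumed limit on wrong guesses
        self._pending: dict[str, list] = {}  # recipient -> [code, attempts left]

    def issue(self, recipient: str) -> str:
        """Create a fresh random code; a real service would send it out-of-band."""
        code = "".join(secrets.choice("0123456789") for _ in range(self.digits))
        self._pending[recipient] = [code, self.max_attempts]
        return code

    def verify(self, recipient: str, submitted: str) -> bool:
        """Accept the code at most once; discard it after too many wrong guesses."""
        entry = self._pending.get(recipient)
        if entry is None:
            return False
        code, attempts_left = entry
        # Constant-time comparison of the two byte strings.
        if hmac.compare_digest(code.encode(), submitted.encode()):
            del self._pending[recipient]
            return True
        attempts_left -= 1
        if attempts_left <= 0:
            del self._pending[recipient]
        else:
            entry[1] = attempts_left
        return False


codes = OneTimeCodes()
sent = codes.issue("+36 1 555 0100")           # made-up phone number
print(codes.verify("+36 1 555 0100", sent))    # the right code is accepted
print(codes.verify("+36 1 555 0100", sent))    # a code cannot be reused
```

The code itself is not what protects the service here. The protection comes from the cost of obtaining many distinct phone numbers or mailing addresses, which is exactly the point of moving the proof into the physical world.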
The analysis published by Websense is a must-read for everyone interested in security: you can see in detail - including logs, screenshots, network dumps - how they track an attack against Google's captcha. The attackers have combined multiple methods: they have created a well-designed and professional system that reaches a success rate of about 20% using algorithms, database building and human labor.
The ball is rolling: fake accounts are being created by the thousands even now. And how will Google, which is methodically hiring the greatest minds of the world, respond? Its answer will probably be among the most exciting security ideas of the spring.