Originally published by Yair Mizrahi: Re-ReBreakCaptcha: Bypassing Google’s ReCaptcha v2 Again
TL;DR: A logic vulnerability dubbed ReBreakCaptcha, which lets you easily bypass Google’s ReCaptcha v2 anywhere on the web, still works 5 years later.
ReCaptcha Overview
Many of us know of ReCaptcha, Google’s Human Recognition Program.
There are two versions of it: v2 and v3.
v3 is not our focus in this post, as it has no user interaction at all and only returns a score, without a CAPTCHA challenge.
v2 has two types: the “I’m not a robot” checkbox and the invisible reCAPTCHA badge.
We’ll focus on the first type, as it has all the challenges.
(https://developers.google.com/recaptcha/docs/versions)
There are two types of ReCaptcha v2 challenges:
Image Challenge – The challenge contains a description and an image which consists of 16 sub-images. The user is requested to select those sub-images that best match the given description.
Audio Challenge – The challenge contains an audio recording; the user is requested to enter the words they hear.
Re-ReBreakCaptcha solves ReCaptcha v2 audio challenges using Google’s own services!
Therefore, we need a reliable way to get an audio challenge every time.
When clicking the “I’m not a robot” checkbox of ReCaptcha v2, we are often presented with an image challenge.
To get an audio challenge, we need to click the audio challenge button.
We are then presented with an audio challenge that can be easily bypassed.
Sometimes, instead of an audio challenge, an error message is presented, because Google has automation detection.
We’ll try our best to avoid it, and to bypass it when it does appear.
A simple cooldown of a few minutes should suffice.
In February 2019, The Verge posted an article about CAPTCHAs:
https://www.theverge.com/2019/2/1/18205610/google-captcha-ai-robot-human-difficult-artificial-intelligence
It argues that CAPTCHAs are getting harder and harder for humans to solve, while algorithms are getting better at solving them. It seems Google is part of the problem itself.
Also, thanks to Josh for indirectly mentioning ReBreakCaptcha!
2017 ReCaptcha Bypass
Back in 2017, I posted ReBreakCaptcha, a method that bypasses Google’s ReCaptcha v2 with a 93% success rate.
See the post here: https://east-ee.com/2017/02/28/rebreakcaptcha-breaking-googles-recaptcha-v2-using-google/
Re-ReBreakCaptcha works in three stages:
- Audio Challenge – Getting the correct challenge type.
- Recognition – Converting the challenge audio and sending it to Google’s Speech Recognition API.
- Verification – Verifying the Speech Recognition result and bypassing the ReCaptcha.
The previous post prompted Google to respond quickly, and heavy measures were taken to block it in the short term.
It’s been 5 years, so I decided to revisit this project and check it out.
As of the time of posting (28/02/2022), I have confirmed that this vulnerability still works, with some minor changes to the code, at a 98% success rate: better than the original!
Backstory
A few days after I published the original post, it got a lot of traffic and made headlines.
It was brought to Google’s attention that the PoC was live on GitHub, so they took action.
After only a few audio solves, they replaced the easy-to-solve audio challenges (4-5 digits) with a much harder variant.
Those harder-to-solve audio challenges were longer (10-12 digits).
They also contained background noise so bad that they were sometimes impossible to solve manually.
On 3/3/2017, I declared the PoC non-operational.
I started experimenting with splitting the audio into individual digits using the silence between them, but the success rates were lower (less than the original 97%), so I decided to leave it at that.
Over the following years, many researchers based their work on mine (one team even wrote their thesis on this concept).
Some made small tweaks; others introduced new mechanisms.
Thank you, I’m honored.
These include:
10/2017 https://www.reddit.com/r/netsec/comments/78nbmu/code_release_defeating_googles_recaptcha_with/ (after this publication, Google switched the audio challenges from spoken digits to spoken phrases)
12/2018 https://www.reddit.com/r/netsec/comments/ab94o0/code_release_uncaptcha2_defeating_googles/
08/2019 https://www.digitalwhisper.co.il/files/Zines/0x6D/DW109-3-reCAPTCHA.pdf
05/2020 https://www.reddit.com/r/netsec/comments/gpcic1/bypassing_captcha_with_visuallyimpaired_robots/
01/2021 https://www.reddit.com/r/netsec/comments/kp7p79/breaking_the_google_audio_recaptcha_with_googles/
So what has changed?
I figured enough time had passed, and decided to take another jab at it.
Google has added some noise at the beginning and end of the audio challenge, but it does not seem to prevent this bypass technique (even without splitting the words!).
I made a few small tweaks to the original PoC:
- Updated to Python 3 (specifically 3.7+).
- Updated to support Selenium 4.
- Little tweaks to the logic to circumvent ReCaptcha automation detection.
Below is the link to the updated PoC, a fully-automated bypass of ReCaptcha v2:
https://github.com/eastee/re-rebreakcaptcha
A video showing the PoC attempting 100 audio challenges with a 98% success rate (at 30x speed):