How reCAPTCHA tracks you - the dark side of reCAPTCHA

What do you mean reCAPTCHA tracks me?

You want the short and (not so) sweet of it? Google looks at your browsing history, mouse movements, etc. It analyses this behavioural data on its servers when you click that "I'm not a robot" button.

Why? Great question!

TL;DR - Only humans buy shoes on Amazon, right?

This is the "additional level" of "anti-robot technology" that Google reCAPTCHA uses. It uses this data to determine whether your browser history is that of a robot or a human. So if you've just been on Twitter arguing with an AI, then went on Amazon to buy a pair of shoes and logged into your online banking account, chances are Google won't even bother showing you a pre-school challenge like "click each image that contains a female lion".

Is it safe?

TL;DR - Not really. Just let me click on lions!

I suppose this is quite a trivial question. If you're reading this right now, chances are you're on a computer or smartphone with access to the internet... so arguably, safety is never guaranteed. But as a rule of thumb, you shouldn't trust anyone or anything (bots & AI) to kindly trawl through your browsing data for the "greater good" of a website owner who wants to limit traffic from bots.

Google supposedly uses this information to make the browsing experience more difficult for robots. reCAPTCHA feeds it into a so-called "advanced risk analysis system" that evaluates each request, and the result of that evaluation determines the difficulty of the CAPTCHA puzzle that is returned to the user.

In other words, if Google doesn't give you a puzzle to fill out and goes as far as basically ticking the box for you - you should be worried. They know what you just bought on undisclosed-website.com...

reCAPTCHA v3

Google are aware of the legal risk of outright blocking users from accessing services, so reCAPTCHA v3 contains no user-facing UI. Google merely makes a suggestion in the form of a user score, so the responsibility to delay or block access, and the legal liability that comes with it, falls on websites.
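To make the "score, not a verdict" idea concrete, here is a minimal Python sketch of what a website's backend might do with a v3 token. It assumes Google's documented siteverify endpoint and its JSON response fields (success and score, where the score runs from 0.0 to 1.0). The function names, the 0.5 threshold and the three outcomes are my own illustrative assumptions, not something Google or any particular site prescribes.

```python
import json
import urllib.parse
import urllib.request

# Google's documented verification endpoint for reCAPTCHA tokens.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret: str, token: str) -> dict:
    """POST the site secret and the user's token to Google, return the parsed JSON."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=5) as resp:
        return json.load(resp)


def decide_access(result: dict, threshold: float = 0.5) -> str:
    """Hypothetical site-side policy: Google supplies the score, the site picks the action.

    The threshold and the "allow" / "challenge" / "block" outcomes are assumptions
    made for this sketch.
    """
    if not result.get("success"):
        return "block"  # token missing, expired or invalid
    score = result.get("score", 0.0)  # treat a missing score as the lowest score
    if score >= threshold:
        return "allow"
    return "challenge"  # e.g. ask for email confirmation instead of blocking outright


# Example with a hand-written response, so no network call is needed:
print(decide_access({"success": True, "score": 0.3}))
```

Notice that the decision lives entirely in decide_access, which is the website's code. That is exactly the point made above: Google only hands over a number, and whatever happens to the user next is on the website.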

reCAPTCHA v3 supersedes v2 because v3 presents a broader opportunity for Google to collect data, and to do so with reduced legal risk.

Since reCAPTCHA v3 scripts must be loaded on every page of a site, you must send Google your browsing history and detailed data about how you interact with sites in order to access basic services on the internet, such as paying your bills, or accessing healthcare services.

Needless to say, the kind of data that is collected by reCAPTCHA v3 is extremely sensitive. Those requests contain data about your motor skills, health issues, and your interests and desires based on how you interact with content. Everything about you that can be inferred or extracted from a website visit is collected and sent to Google. Not to mention that reCAPTCHA seems to be selectively blocking the audio challenge, discriminating against people with visual disabilities (Dessant, 2019).

So what can I do?

TL;DR - Pester website owners/developers!

According to tech statistics website BuiltWith, more than 2 million websites are using reCAPTCHA v3. Overall, at least 14 million websites use reCAPTCHA, including 37.8% of the top 10,000 sites (BuiltWith, 2022).

In a lot of cases, if you refuse to transmit personal data to Google, websites that use the reCAPTCHA plugin will hinder or block your access.

A lot of this issue lies with the individual websites that use this Google service. It's great for the average developer because Google have spent a lot of time carefully creating the API and documenting it to a high degree to ensure ease of use, which makes Google's CAPTCHA services an attractive choice.

That being said, Google have developed a technology (yet again) that works in their corporate favour to scrape as much data as they can.

Aside from pestering website owners, users can reduce how much they are tracked by applying strict cookie and tracking policies in their web browsers, including blocking cross-site tracking (Eastwood, 2020).

Takeaways

Developers should consider all options available to them when designing websites and applications. The easiest and cheapest (or free) option is not always the best. Sure, the setup might be incredibly easy and the functionality might be great, but at what cost? If you are not paying for the product, then you probably are the product!

There are plenty of alternatives available (AlternativeTo, 2022), which deliver the same functionality without giving your users' data to Google's machine.
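For a taste of what a Google-free check can look like, here is a deliberately simple Python sketch of a honeypot field combined with a minimum fill time. This is a swapped-in, generic technique and not one of the products listed on AlternativeTo. The field name "website", the two-second threshold and the function name are all hypothetical, and a check like this will not stop a determined bot.

```python
import time
from typing import Mapping, Optional

HONEYPOT_FIELD = "website"  # hypothetical form field, hidden from humans with CSS
MIN_FILL_SECONDS = 2.0      # assumed threshold: humans rarely submit faster than this


def looks_like_bot(
    form: Mapping[str, str],
    rendered_at: float,
    now: Optional[float] = None,
) -> bool:
    """Flag a submission using only data the user sent to this site.

    rendered_at is the Unix time at which the server rendered the form.
    """
    if now is None:
        now = time.time()
    # A human never sees the hidden field, so any value in it is suspicious.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Submissions that arrive almost instantly are likely automated.
    return (now - rendered_at) < MIN_FILL_SECONDS


# Example: hidden field left empty, form submitted ten seconds after rendering.
print(looks_like_bot({"email": "me@example.com", "website": ""}, 100.0, now=110.0))
```

Nothing here leaves your own server, which is the trade-off in a nutshell: a cruder filter, but no browsing data handed to a third party.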

References / Useful Links

BuiltWith, 2022. ReCAPTCHA Usage Statistics [viewed 21 October 2022]. Available from: https://trends.builtwith.com/widgets/reCAPTCHA

Eastwood, G., 2020. How to Disable Cross-Site Tracking On Your Internet Browser - Fast and easy steps to stop third-party sites from tracking your activity for advertising [viewed 21 October 2022]. Available from: https://blogs.chapman.edu/information-systems/2020/12/01/how-to-disable-cross-site-tracking-on-your-internet-browser/

AlternativeTo, 2022. Free reCAPTCHA Alternatives | AlternativeTo [viewed 21 October 2022]. Available from: https://alternativeto.net/software/recaptcha/?license=free

Dessant, 2019. Working Draft Feedback: reCAPTCHA selectively blocks audio challenge · Issue #28 · w3c/captcha-accessibility [viewed 21 October 2022]. Available from: https://github.com/w3c/captcha-accessibility/issues/28