
Reverse Engineering Cloudflare's IUAM JS Challenge

Cloudflare IUAM Challenge Page.

I love web scraping. It's awesome! But one thing I don't love is getting blocked by Cloudflare's JavaScript challenge page. It forces me to use a headless browser, which (if you didn't know) is extremely inefficient. I've always wondered if there was a way to avoid having to use hacky methods. I had some free time on my hands, so I decided to give it a go. Here's what I found out.

Phase 0

Now the first thing I asked myself was: how should I get started? The challenge only appears on websites which are protected by Cloudflare and in IUAM (I'm Under Attack Mode), so I took one of my domains and a simple web server, set up Cloudflare and started reverse engineering. There's no time to open devtools on the page, as you instantly get redirected to the real website. So I instead used Chrome's view-source:, which gave me the raw HTML of the page. Two things caught my attention:

The first script tag at the top of the page, containing the configuration (named _cf_chl_opt) of the JavaScript challenge.
Another script tag, at the bottom of the page, which injects an invisible image and the script for phase 1.

Phase 1

Here's where the real fun begins. Immediately we can see that the script has been obfuscated. Further inspection reveals that all strings are being fetched from a big array. We can also see that most operations (calling, concatenating, math, etc...) have been replaced with functions. One last thing is that, wherever possible, block statements ({ a++; return b; }) have been replaced with sequence expressions (a++, b). I spent a few hours coding a small deobfuscator using esprima which undoes these transformations. Now that we have readable code (after some refactoring), we can start looking.
I'll skip the boring part and get straight to explaining what it does.

_cf_chl_enter
-- Check if already running
-- Cookies enabled Check
-- Delete cookie cf_chl_ + ar.cvId
-- Set cookie cf_chl_prog to 's'
-- Setup _cf_chl_ctx
-- Run each function in n
---- bootstrap()
------ if #jc-content exists, prefix = 'jc', otherwise prefix = 'cf'
------ Create div with 
       id: {prefix}-challenge-state
       display: none
       add to #challenge-form
------ Exit if _cf_chl_opt.chlApivId
------ Setup the spinner dots
------ Write fact
---- browserCheck()
------ Exit if internet explorer, or missing support for borderImage or transform
---- cachedCheck()
------ Check if the page was cached by using _cf_chl_opt.crq.t and exit if it was
---- locationCheck()
------ Check if location.href matches _cf_chl_opt.crq.ru
---- loadingSetup()
------ Show the 3 bubbles and verifying text
------ Display the fact
---- ??
------ 25% chance of adding some error text
---- setupChallenge()
------ Creates a span with
       id: trk_jschal_js
------ Hooks document.querySelector for an unknown reason
---- evHook()
------ Listens to keydown, pointermove, pointerover, touchstart, mousemove, and click
       Increments _cf_chl_ctx.ie every time one of the events is fired
       Detaches every event once _cf_chl_ctx.ie reaches 25
-- Set cookie cf_chl_prog to 'e'
-- Construct url
   /cdn-cgi/challenge-platform/{ar.cFPWv ? 'h/{ar.cFPWv}/' : ''}/flow/ov1/{EXTHU (can only be found in the code)}/{_cf_chl_opt.cRay}/{_cf_chl_opt.cHash}
-- Create request to that url after 100ms
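
The URL construction step can be sketched in Python as below. This is only an illustration: the function and parameter names (build_flow_url, build_id, fpwv) are hypothetical stand-ins for EXTHU and ar.cFPWv, and the sketch assumes path segments are joined with single slashes, whereas the template as written above would produce a doubled slash.

```python
from typing import Optional


def build_flow_url(c_ray: str, c_hash: str, build_id: str,
                   fpwv: Optional[str] = None) -> str:
    """Assemble the challenge flow path from its segments.

    build_id stands in for the EXTHU value that only exists in the script,
    fpwv for ar.cFPWv. Both names are hypothetical.
    """
    segments = ["cdn-cgi", "challenge-platform"]
    if fpwv:
        # Optional 'h/{ar.cFPWv}' part, only present when the value is set.
        segments += ["h", fpwv]
    segments += ["flow", "ov1", build_id, c_ray, c_hash]
    # Assumption: single slashes between segments.
    return "/" + "/".join(segments)
```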

sendRequest
-- Creates a post request to specified URL
-- Sets CF-Challenge header to _cf_chl_opt.crq.cHash
-- Sets body to
   v_{_cf_chl_opt.crq.cRay}={compress(JSON.stringify(_cf_chl_ctx))}
-- Sends the request
-- Decrypts and evaluates the response body
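
As a rough Python sketch of the request that sendRequest builds: the header and body layout follow the steps above, while the function name build_challenge_request and the compress callback are hypothetical. The compact json.dumps separators are an approximation of JSON.stringify, and whether the compressed value needs any further escaping is not something the steps above settle.

```python
import json
from typing import Callable, Dict, Tuple


def build_challenge_request(c_ray: str, c_hash: str, ctx: dict,
                            compress: Callable[[str], str]
                            ) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for the challenge POST request.

    compress is whatever implements the compression step; it is passed in
    so this sketch stays independent of that implementation.
    """
    # JSON.stringify emits no whitespace, so use compact separators here.
    payload = json.dumps(ctx, separators=(",", ":"), ensure_ascii=False)
    headers = {"CF-Challenge": c_hash}
    body = f"v_{c_ray}={compress(payload)}"
    return headers, body
```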

decrypt
-- Create a variable with value 32 (key)
-- XOR the value with each character code of {_cf_chl_opt.cRay}_0
-- Turn the base64 ciphertext into a string
-- For each character in the string, perform this operation
   (((charCode & 255) - key - index % 65535 + 65535) % 255)
-- Turn the numbers into characters using String.fromCharCode

Now almost none of this is important, except for the part where a request is sent. If you have already looked in a debugging HTTP proxy while loading the page, you should have noticed these requests being made with cryptic base64 in the request and response bodies. That's because the challenge context gets compressed before it is sent, and the response body arrives encrypted and has to be decrypted.

The compression algorithm

At first I thought it was a function for encrypting their messages before firing them off to the server, as it accepted an argument for a key. Compression algorithms don't use keys, right? But after multiple hours of struggling to understand, I got it. It's a variation of LZW, but not one I have ever seen before. Here's how it works:

AABABBABBAABA -> rWSfG4iOQ

00 10000010  110 000 01000010 011 1110 1010 1110 0110 1100 0100000
/     A       A   /    B       AB  BA   B    BA   AB   A     2

Dictionary:
A 3
AA 4
B 5
AB 6
BA 7
ABB 8
BAB 9
BB 10
BAA 11
ABA 12

Before I get into the actual compression, I need to tell you about the particular way things are encoded. The binary isn't encoded in bytes, but rather in packs of 6 bits (the second argument of the function). Each 6 bit number is transformed into the character at that index in the "key" string, which is also why the key string for the example usage has a length of 64 (2 ^ 6).
But that isn't all. When writing a number from the dictionary, the length varies depending on how much you have written so far. The first 2 times something gets written, the length will be 2 bits. The next 4 times, it will be 3 bits long. The next 8 times, it will be 4. Etc... Each section is also written in reverse (110 -> 011 -> 3).
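
A small Python sketch of these two encoding details, under stated assumptions: the helper names are hypothetical, the standard Base64 alphabet is used as a stand-in for the real key string (which isn't shown here), and padding the last pack with zeros is inferred from the trailing zeros in the example.

```python
# Stand-in key string: the standard Base64 alphabet (an assumption; the
# challenge script uses its own 64-character key).
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def code_bits(value: int, width: int) -> str:
    """Write value in width bits, least significant bit first."""
    return "".join(str((value >> i) & 1) for i in range(width))


def pack_symbols(bits: str, bits_per_symbol: int = 6,
                 key: str = BASE64_ALPHABET) -> str:
    """Split the bit string into packs and map each one to a key character."""
    # Assumption: an incomplete final pack is padded with zeros.
    bits += "0" * (-len(bits) % bits_per_symbol)
    return "".join(
        key[int(bits[i:i + bits_per_symbol], 2)]
        for i in range(0, len(bits), bits_per_symbol)
    )
```

Here code_bits(3, 3) gives "110", matching the reversed section in the example. Packing the example's bit string with this stand-in alphabet gives "ILBCfVzYg" rather than "rWSfG4iOQ", because the output characters depend entirely on the key string.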

Now for the actual compression. When a new character is met, it is first checked if it fits in one byte (charCode < 256). If that's the case, a 0 is written, followed by one byte containing the character code. If not, a 1 is written, followed by 2 bytes for the character code. Like regular LZW, a dictionary is built. This one is built a little differently. You can look above and try to find the pattern. One thing to note is that the dictionary starts at 3, since 0 is reserved for one byte characters, 1 for 2 byte characters, and 2 as a way to mark the end of the message.

Phase 2

Phase 2 is where all the browser fingerprinting happens. There's quite a lot, and most of it is hard coded. Each fingerprinting affects the _cf_chl_ctx object, with _cf_chl_ctx.chC used as an index. The challenges include:

JSFuck match challenge
Loading an image and taking its height and width
Making a request and reading its status code
Collecting browser plugins (navigator.plugins)
Etc...
If you would like to find out all the steps, feel free to take a look at the code.

Phase 3

As described before, the response body of the request gets decrypted and instantly executed. This phase is where the Proof of Work code lies. Math.ceil(navigator.hardwareConcurrency * 0.75 || 6) workers will be spun up, each trying to get the first few letters of a SHA256 hash to match up. How many letters need to match is decided by the difficulty variable, which is hardcoded. The string that gets hashed is built like so:
e={time}&d={difficulty}&n={nonce}{iv}
where

time is the unix timestamp (in milliseconds) at which the worker got created
difficulty is the number of characters which need to match up with the target hash
nonce is a random number from 0 to 9999
iv is a hardcoded string
Once the matching hash is found, the challenge context is modified and a new POST request is made, whose response contains the code for the next phase, which is then evaluated.
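
The Proof of Work described above could be sketched in Python roughly as follows. Several things here are assumptions rather than facts from the script: the function names, the idea that the comparison is done on the hex digest, and the simple loop over every nonce from 0 to 9999 in place of random picks spread over several workers.

```python
import hashlib
import math
from typing import Optional


def worker_count(hardware_concurrency: Optional[int]) -> int:
    """Mirror Math.ceil(navigator.hardwareConcurrency * 0.75 || 6)."""
    if not hardware_concurrency:
        return 6
    return math.ceil(hardware_concurrency * 0.75)


def pow_input(time_ms: int, difficulty: int, nonce: int, iv: str) -> str:
    """Build the string that gets hashed."""
    return f"e={time_ms}&d={difficulty}&n={nonce}{iv}"


def find_nonce(target: str, time_ms: int, difficulty: int,
               iv: str) -> Optional[int]:
    """Return a nonce whose hash shares its first `difficulty` hex
    characters with target, or None if no nonce in range works."""
    prefix = target[:difficulty]
    for nonce in range(10000):
        data = pow_input(time_ms, difficulty, nonce, iv).encode()
        # Assumption: the match is checked against the hex digest.
        if hashlib.sha256(data).hexdigest().startswith(prefix):
            return nonce
    return None
```

If find_nonce returns None, a caller would presumably retry with a fresh timestamp, which is what spinning up a new worker amounts to.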