Circumventing JS Obfuscation Techniques

An example of code obfuscation.

When making a secure script, obfuscation is a necessary measure to keep people from reverse engineering your code. Automated programs will turn your program into an unreadable mess which will make it way harder for any unwanted person to understand what it is doing. Now, obfuscation doesn't mean it is impossible to turn the program back into its original form. This is what we are going to attempt in this blog post with the code behind Arkose Labs' FunCaptcha.

Analysis

The first step to deobfuscation is to take a look at the obfuscated an find out which modifications have been done. Below is a snippet of code from the obfuscated source.

        return  h(aY, bV(f_a_gn.ap) + 'ucture', p),
        h(aY, bV(f_a_gn.aq) + bV(f_a_gn.ar) + 'f', q),
        aY;
    }
    function h(aX, aY, aZ, b0) {
        var bW = bU, b1;
        b0 !== undefined ? b1 = aZ['bind'](aX, b0) : b1 = aZ[bW(f_a_go.a)](aX),
        Object[bW(f_a_go.b) + 'erty'](aX, aY, {
            'value': b1,
            'configurable': !![]
        });
    }
    function j() {
        var f_a_gp = {
            a: 0x3ea,
            b: 0x43a,
            c: 0x4c2,
            d: 0x43a,
            e: 0x431
        }
        aX['push']({
            'key': 'DNT',
            'value': this[bX(f_a_gq.a)]()
        })

In this snippet alone, multiple modifications can be noticed

JavaScript's sequence operator is used to chain instructions at a return statement.
At the start of most functions, a variable is defined prefixed with f_a_ with multiple letters as keys which each hold a specific hexadecimal value. These values are then used in a function which resolves a string from a long array of strings.
Long strings have been split up and get put together at runtime. These strings can get resolved or can hardcoded loosely in the code.
True and false get replaced with !![] and ![] respectively. This is also a way of saving some network bandwidth as they need less characters than true and less.
Simple property accessors which could use dot notation use bracket notation with a string instead. It's a simple inconvenience.

Coding

It's now time for creating a deobfuscator. The easiest way of doing this is by turning the code into an AST, modifying stuff, and then turning it back into code. In JavaScript, libraries exist to do this. Namely esprima, estraverse and escodegen.

Basics

const esprima = require('esprima')
const estraverse = require('estraverse')
const escodegen = require('escodegen')
const fs = require("fs")
let parsed = esprima.parse(fs.readFileSync('./funcaptcha_api.js', 'utf8'));

First thing to do is to require the libraries and load up our code into esprima. This will return an AST (or abstract syntax tree) which can then be used for easy modification.

![] and !![]

// Replace ![] with false and !false with true
estraverse.replace(parsed, {
    leave: function (node) {
        if (node.type === 'UnaryExpression'
         && node.operator === '!'
         && node.argument.type == 'ArrayExpression'
         && node.argument.elements.length == 0
        ) {
            return {
                type: 'Literal',
                value: false,
                raw: 'false'
            }
        } else if (node.type === 'UnaryExpression'
                && node.operator === '!'
                && node.argument.type == 'Literal'
                && node.argument.value == false
        ) {
            return {
                type: 'Literal',
                value: true,
                raw: 'true'
            }
        }
    }
})

The first and most simple thing to do is replace ![] and !![] with their appropriate boolean counterparts. This can be done by using estraverse, and checking if we are using the not operator on an empty array. If so, we replace it with false as an array is truthy. If !![] is used, it will be replaced with !false. This can also be replaced by checking for a not operator on a literal false, which is then replaced with true. First step done!

object["property"]

estraverse.traverse(parsed, {
    leave: function (node) {
        if(node.type == "MemberExpression"
        && node.computed == true && node.property.type == "Literal"
        && typeof node.property.value == 'string'
        ) {
            if(node.property.value.match(/^[a-zA-Z][a-zA-Z0-9]*$/)) {
                node.computed = false;
                node.property = {
                    type: "Identifier",
                    name: node.property.value
                }
            }
        }
    }
})

Another simple fix. By traversing the AST and finding any MemberExpressions which use bracket notation (node.computed) and use a string as the property which also matches the regex (dot notation can only be used if the property name consists of letters and numbers and the first character isn't a number), we can object["property"] with a more enjoyable object.property dot notation.

"a" + "b"

estraverse.replace(parsed, {
    leave: function (node) {
        if (node.type === "BinaryExpression"
         && node.operator === "+"
         && node.left.type == "Literal"
         && node.right.type == "Literal"
         && typeof node.left.value == "string"
         && typeof node.right.value == "string"
        ) {
            return {
                type: "Literal",
                value: node.left.value + node.right.value,
                raw: JSON.stringify(node.left.value + node.right.value)
            }
        }
    }
})

Just look for the + operator being used on 2 strings and stitch them together to turn "a" + "b" into "ab". The reason for JSON.stringify being used here is to fix any possible string escape problems.

f_a_ir

let fadb = {}
let fdb = ["f_a_d"]
let constants = require("./constants.json")
estraverse.replace(parsed, {
    enter: function (node) {
        if(node.type == "VariableDeclarator"
        && node.id.name.startsWith("f_a_")
        && node.init.type == "ObjectExpression"
        ) {
            fadb[node.id.name] = Object.fromEntries(node.init.properties.map(x => [x.key.name, x.value.value]))
        } else if (node.type == "VariableDeclarator"
                && node.init.type == "Identifier"
                && fdb.includes(node.init.name)
        ) {
           fdb.push(node.id.name)
        }
    },
    leave: function (node) {
        if (node.type == "MemberExpression"
         && node.computed == false
         && node.property.type == "Identifier"
         && fadb[node.object.name]
         && fadb[node.object.name][node.property.name]
        ) {
            return {
                type: "Literal",
                value: fadb[node.object.name][node.property.name],
                raw: fadb[node.object.name][node.property.name].toString()
            }
        } else if(node.type == "CallExpression"
               && node.callee.type == "Identifier"
               && fdb.includes(node.callee.name)
               && node.arguments.length == 1
               && node.arguments[0].type == "Literal"
               && typeof node.arguments[0].value == "number"
        ) {
            return {
                type: "Literal",
                value: constants[node.arguments[0].value - 0x12e],
                raw: constants[node.arguments[0].value - 0x12e].toString()
            }
        }
    }
})

Now here's where things get a bit more complicated. We need to turn something like this[bX(f_a_gp.a)]() into this.value(). It turns out that all objects starting with f_a_ are unique in variable name. We can traverse the code for anytime an object with starting with f_a_ gets created, to then build a database of values. Sometimes the function which fetches the strings gets redefined under a new name, which again is unique, so a list of those names can also be created. We then replace any member expressions which uses an f_a_ variable with its corresponding number. And finally, any call to the string fetching function with a number as argument is replaced with the matching string. Done!

Once you run your code through your deobfuscator, it'll be way easier for you to read. But, you will most likely have missed some other techniques or broken parts of the code by deobfuscating. You can fix and add more deobfuscation techniques to perfect your deobfuscator up until you can read the code without any issues. Here's where I will wrap up this blog post. Here is the final deobfuscator as well as deobfuscated code.