Make the impossible possible

Root Cause Analysis - CVE-2023-32439 Type Confusion in Webkit

Sunjoo Park @grigoritchy

2 weeks ago in 21 June 2023, Apple have updated new security patch for Webkit(CVE-2023-32439) in iOS 16.5 and Apple suspected this bug is used in the wild activly.

Overview

Impact: Processing maliciously crafted web content may lead to arbitrary code execution. Apple is aware of a report that this issue may have been actively exploited.
Description: A type confusion issue was addressed with improved checks.
WebKit Bugzilla: 256567
CVE-2023-32439: an anonymous researcher

Affected Version: Safari 16.5 and below versions

Issue Report: https://bugs.webkit.org/show_bug.cgi?id=256567

Patch Commit Log: https://github.com/WebKit/WebKit/commit/52fe95e5805c735cc1fa4d6200fcaa1912efbfea

Proof Of Concept

const arr = [0];

function jitter() {
        for (let _ in arr) { // create EnumeratorNextUpdateIndexAndMode DFG node.
                return 0 in arr; // create HasIndexedProperty DFG node.
        }
}

function trigger() {
        let a = jitter(); // a == true
        for(let i = 0; i < 0x100000; i++) { 
                jitter(); // making hot jitter function until it generates FTL bytecode.
        }
        let b = jitter(); // b == 0

        if (a != b) {
            print("a != b");
        }
}

trigger();

There was a testcase script available with patch commit log. This testcase made me easier to write PoC thankfully.

Detail

When JavaScriptCore execute for…in javascript syntax, JavascriptCore lexer parser create ForInNode that generate several low level interpreter opcodes. To handle ForInNode AST node as expectation of javascript behavior, it first generate op_get_property_enumerator opcode to create enumerator object and then this enumerator is used for op_enumerator_next opcode. op_enumerator_next is generated within for loop, which for loop is also opcode too. This loop continues until property name that is a result of op_enumerator_next is not exist. op_enumerator_next argument index is fixed as 0, which is used as a start index of enumerator.

// /Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp
void ByteCodeParser::parseBlock(unsigned limit)
{
    ...
        case op_get_property_enumerator: {
            auto bytecode = currentInstruction->as<OpGetPropertyEnumerator>();
            set(bytecode.m_dst, addToGraph(GetPropertyEnumerator, get(bytecode.m_base)));
            NEXT_OPCODE(op_get_property_enumerator);
        }

        case op_enumerator_next: {
            auto bytecode = currentInstruction->as<OpEnumeratorNext>();
            auto& metadata = bytecode.metadata(codeBlock);
            ArrayMode arrayMode = getArrayMode(metadata.m_arrayProfile, Array::Read);
            Node* base = get(bytecode.m_base);
            Node* index = get(bytecode.m_index);
            Node* enumerator = get(bytecode.m_enumerator);
            Node* mode = get(bytecode.m_mode);

            auto seenModes = OptionSet<JSPropertyNameEnumerator::Flag>::fromRaw(metadata.m_enumeratorMetadata);

            if (!seenModes)
                addToGraph(ForceOSRExit);

            addVarArgChild(base);
            addVarArgChild(index);
            addVarArgChild(mode);
            addVarArgChild(enumerator);
            addVarArgChild(nullptr); // storage for IndexedMode only.
            Node* updatedIndexAndMode = addToGraph(Node::VarArg, EnumeratorNextUpdateIndexAndMode, OpInfo(arrayMode.asWord()), OpInfo(seenModes));

            Node* updatedMode = addToGraph(EnumeratorNextExtractMode, updatedIndexAndMode);
            set(bytecode.m_mode, updatedMode);

            Node* updatedIndex = addToGraph(EnumeratorNextExtractIndex, updatedIndexAndMode);
            set(bytecode.m_index, updatedIndex);

            set(bytecode.m_propertyName, addToGraph(EnumeratorNextUpdatePropertyName, OpInfo(), OpInfo(seenModes), updatedIndex, updatedMode, enumerator));

            NEXT_OPCODE(op_enumerator_next);
        }

So for…in javascript syntax is essentially generate two opcode op_get_property_enumerator and op_enumerator_next. When the function executed many times so tierup counter touched threshold of DFG compilation, these are later turn into GetPropertyEnumerator and EnumeratorNextUpdateIndexAndMode, EnumeratorNextExtractMode, EnumeratorNextExtractIndex, EnumeratorNextUpdatePropertyName DFG node sequentially. EnumeratorNextExtractMode DFG node is converted to JSConstant with mode of enumerator that is JSPropertyNameEnumerator::Flag while DFG constant folding phase is running. It becuase DFGAI set constant value with jsNumber and marks flag of try constant folding as true if JSPropertyNameEnumerator::Flag is JSPropertyNameEnumerator::IndexedMode or JSPropertyNameEnumerator::OwnStructureMode.

// /Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp

void ByteCodeParser::parseBlock(unsigned limit)
{
    ...
        case op_in_by_val: {
            auto bytecode = currentInstruction->as<OpInByVal>();
            Node* base = get(bytecode.m_base);
            Node* property = get(bytecode.m_property);
            bool compiledAsInById = false;

            if (!m_inlineStackTop->m_exitProfile.hasExitSite(m_currentIndex, BadIdent)
                && !m_inlineStackTop->m_exitProfile.hasExitSite(m_currentIndex, BadType)
                && !m_inlineStackTop->m_exitProfile.hasExitSite(m_currentIndex, BadConstantValue)) {

                InByStatus status = InByStatus::computeFor(
                    m_inlineStackTop->m_profiledBlock, m_inlineStackTop->m_baselineMap,
                    m_icContextStack, currentCodeOrigin());

                if (CacheableIdentifier identifier = status.singleIdentifier()) {
                    UniquedStringImpl* uid = identifier.uid();
                    m_graph.identifiers().ensure(uid);
                    if (identifier.isCell()) {
                        FrozenValue* frozen = m_graph.freezeStrong(identifier.cell());
                        if (identifier.isSymbolCell())
                            addToGraph(CheckIsConstant, OpInfo(frozen), property);
                        else
                            addToGraph(CheckIdent, OpInfo(uid), property);
                    } else
                        addToGraph(CheckIdent, OpInfo(uid), property);

                    handleInById(bytecode.m_dst, base, identifier, status);
                    compiledAsInById = true;
                }
            }

            if (!compiledAsInById) {
                ArrayMode arrayMode = getArrayMode(bytecode.metadata(codeBlock).m_arrayProfile, Array::Read);
                set(bytecode.m_dst, addToGraph(InByVal, OpInfo(arrayMode.asWord()), base, property));
            }
            NEXT_OPCODE(op_in_by_val);
        }

In case of in operator, lexer parser create InNode, which AST node generate op_in_by_val low level interpreter opcode, then which node will be converted to InByVal DFG node when it compiles to DFG. InByVal node is converted to HasIndexedProperty node in the DFG fixup phase if base of InByVal is speculated as object type and property is speculated as integer type. this speculated type is measured while executing prediction injection phase and DFG bytecode generator.

// /Source/JavaScriptCore/dfg/DFGClobberize.h
template<typename ReadFunctor, typename WriteFunctor, typename DefFunctor, typename ClobberTopFunctor>
void clobberize(Graph& graph, Node* node, const ReadFunctor& read, const WriteFunctor& write, const DefFunctor& def, const ClobberTopFunctor& clobberTopFunctor)
{
    ...
    case EnumeratorNextUpdateIndexAndMode:
    case HasIndexedProperty: {
        if (node->op() == EnumeratorNextUpdateIndexAndMode) {
            if (node->enumeratorMetadata() == JSPropertyNameEnumerator::OwnStructureMode && graph.varArgChild(node, 0).useKind() == CellUse) {
                read(JSObject_butterfly);
                read(NamedProperties);
                read(JSCell_structureID);
                return;
            }

            if (node->enumeratorMetadata() != JSPropertyNameEnumerator::IndexedMode) {
                clobberTop();
                return;
            }
        }

        read(JSObject_butterfly);
        ArrayMode mode = node->arrayMode();
        switch (mode.type()) {
        case Array::ForceExit: {
            write(SideState);
            return;
        }
        case Array::Int32: {
            if (mode.isInBounds()) {
                read(Butterfly_publicLength);
                read(IndexedInt32Properties);
                def(HeapLocation(HasIndexedPropertyLoc, IndexedInt32Properties, graph.varArgChild(node, 0), graph.varArgChild(node, 1)), LazyNode(node));
                return;
            }
            break;
        }
        ....

The noticeable patch is in the clobberize of DFGClobberize function. clobberize of DFGClobberize function checks whether node can eliminate or clobber as an other node to enhance performance of DFG execution. EnumeratorNextUpdateIndexAndMode and HasIndexedProperty DFG node use same HasIndexedPropertyLoc value at LocationKind of HeapLocation before the patch applied. Then that HeapLocation is defined as key by def method that is forwarded by clobberize argument. There is only one place that is defined def method is using it - common subexpression elimination phase.

As we can guess this behavior by looking the phase name of CSE(Common Subexpression Elimination), The purpose of CSE is elimination to optimize same kind of operation. CSE have two type elimination - Local, Global. Local is only searching for target nodes in the same Block. Global is searching for target nodes in the different Blocks that are dominated by current Block of target node. LCSE(Local CSE) phase is running in the DFG tier and GCSE(Global CSE) phase is running in the FTL tier. If it can make that EnumeratorNextUpdateIndexAndMode and HasIndexedProperty DFG node use same HeapLocation, GCSE phase decide to that these node are same side of effect so it is replaced with one node. All of HasIndexedProperty node that using it as children will be replaced with EnumeratorNextUpdateIndexAndMode node as a result in the GCSE phase. DFGAI set speculatedType of EnumeratorNextUpdateIndexAndMode node to SpecBytecodeNumber and speculatedType of HasIndexedProperty node to SpecBoolean. So it can lead to type confusion that makes DFG thinking that node has SpecBoolean type is now SpecBytecodeNumber type if it can make same HeapLocation for two node.

Below list is condition to replace HasIndexedProperty node with EnumeratorNextUpdateIndexAndMode,

  • The code has to tiered up to FTL. Because LCSE can’t replace nodes since each nodes contain in different block.
  • for…in should use JSPropertyNameEnumerator::IndexedMode.
  • for…in and in operator should use same array.
  • Array type should be one of these type - Array::Int32, Array::Double, Array::Contiguous.
  • Array::Speculation should be InBounds or InBoundSaneChain.
  • in operator should use index 0. Beacuse EnumeratorNextUpdateIndexAndMode use fixed index 0 that is way of ForInNode handling.

DFG bytecodes before beginning global common subexpression elimination phase

     1 27: Block #1 (bc#38): (OSR target)
...
  7  1 27:   D@92:< 1:->        GetButterfly(Cell:D@29, Storage|PureInt, R:JSObject_butterfly, Exits, bc#40, ExitValid)
  8  1 27:   D@48:< 1:->        EnumeratorNextUpdateIndexAndMode(Cell:D@29, KnownInt32:D@32, KnownInt32:D@32, KnownCell:D@37, Check:Untyped:D@92, JS|VarArgs|UseAsOther, BoolInt32|NonBoolInt32|BytecodeDouble|DoubleImpureNaN|Int52Any, Int32+OriginalCopyOnWriteArray+InBoundsSaneChain+AsIs+Read, enumeratorModes = 1, R:Butterfly_publicLength,JSObject_butterfly,IndexedInt32Properties, Exits, bc#40, ExitValid)
...
 20  1 27:   D@58:< 1:->        CompareEqPtr(Check:Untyped:D@53, Boolean|UseAsOther, Bool, <0x62d0000051f0, string, [1], [8 0x603000000d14]>, Exits, bc#47, ExitValid)
 21  1 27:   D@59:<!0:->        Branch(KnownBoolean:D@58, MustGen, T:#4/w:1.000000, F:#2/w:1.000000, W:SideState, bc#47, ExitValid)

     2 27: Block #2 (bc#51):
...
 12  2 27:   D@94:< 1:->        GetButterfly(Cell:D@29, Storage|PureInt, R:JSObject_butterfly, Exits, bc#69, ExitValid)
 13  2 27:   D@81:< 1:->        GetArrayLength(Check:KnownCell:D@29, Check:Untyped:D@94, Int32|PureInt, Int32, Int32+OriginalCopyOnWriteArray+InBoundsSaneChain+AsIs+Read, R:Butterfly_publicLength, Exits, bc#69, ExitValid)
 14  2 27:   D@84:<!0:->        CheckInBounds(Int32:D@32, Check:KnownInt32:D@81, JS|MustGen|PureInt, Int32, Exits, bc#69, ExitValid)
 15  2 27:   D@73:< 1:->        HasIndexedProperty(Object:D@29, Int32:D@32, Check:Untyped:D@94, Check:Untyped:D@84, Boolean|VarArgs|PureInt, Bool, Int32+OriginalCopyOnWriteArray+InBoundsSaneChain+AsIs+Read, R:Butterfly_publicLength,JSObject_butterfly,IndexedInt32Properties, Exits, bc#69, ExitValid)
...
 19  2 27:   D@76:<!0:->        Return(Check:Untyped:D@73, MustGen, W:SideState, Exits, bc#74, ExitValid)

DFG bytecodes after running global common subexpression elimination phase

     1 28: Block #1 (bc#38): (OSR target)
...
  7  1 28:   D@92:< 1:->        GetButterfly(Cell:D@29, Storage|PureInt, R:JSObject_butterfly, Exits, bc#40, ExitValid)
  8  1 28:   D@48:< 1:->        EnumeratorNextUpdateIndexAndMode(Cell:D@29, KnownInt32:D@32, KnownInt32:D@32, KnownCell:D@37, Check:Untyped:D@92, JS|VarArgs|UseAsOther, BoolInt32|NonBoolInt32|BytecodeDouble|DoubleImpureNaN|Int52Any, Int32+OriginalCopyOnWriteArray+InBoundsSaneChain+AsIs+Read, enumeratorModes = 1, R:Butterfly_publicLength,JSObject_butterfly,IndexedInt32Properties, Exits, bc#40, ExitValid)
...
 20  1 28:   D@58:< 1:->        CompareEqPtr(Check:Untyped:D@53, Boolean|UseAsOther, Bool, <0x62d0000051f0, string, [1], [8 0x603000000d14]>, Exits, bc#47, ExitValid)
 21  1 28:   D@59:<!0:->        Branch(KnownBoolean:D@58, MustGen, T:#4/w:1.000000, F:#2/w:1.000000, W:SideState, bc#47, ExitValid)

     2 28: Block #2 (bc#51):
...
 13  2 28:   D@81:< 1:->        GetArrayLength(Check:KnownCell:D@29, Check:Untyped:D@92, Int32|PureInt, Int32, Int32+OriginalCopyOnWriteArray+InBoundsSaneChain+AsIs+Read, R:Butterfly_publicLength, Exits, bc#69, ExitValid)
 14  2 28:   D@84:<!0:->        CheckInBounds(Int32:D@32, Check:KnownInt32:D@81, JS|MustGen|PureInt, Int32, Exits, bc#69, ExitValid)
 15  2 28:   D@73:<!0:->        CheckVarargs(MustGen|VarArgs, Bool, bc#69, ExitValid)
...
 19  2 28:   D@76:<!0:->        Return(Check:Untyped:D@48, MustGen, W:SideState, Exits, bc#74, ExitValid)

The DFG node at D@73 HasIndexedProperty is removed and added new CheckVarargs DFG node since this node has children nodes. This is a DFG behavior to create no-op for this removed DFG node. At D@76 Return DFG node has different children - D@73 —> D@48. D@73 is HasIndexedProperty DFG node and D@48 is EnumeratorNextUpdateIndexAndMode DFG node. The result of D@48 EnumeratorNextUpdateIndexAndMode DFG node value is current index of enumerator, which is integer value. And then after GCSE, DFG now treat this value as boolean type. Because it was a boolean type before the GCSE. That was return value of HasIndexedProperty DFG node.

Conclude

We’ve explored Root Cause Analysis of CVE-2023-32439 WebKit vulnerability, which is used in the wild. It means with this boolean-number type confusion vulnerability, an attacker could make aar/aaw primitive gracely. To make this primitive, further research is required for how to abuse this boolean speculated type to create side effect in the DFG and FTL. Apple addressed this issue by patching WebKit to use different HeapLocation for EnumeratorNextUpdateIndexAndMode and HasIndexedProperty DFG node by introducing new HeapLocation EnumeratorNextUpdateIndexAndModeLoc.