Windows Filtering Platform internals - Reverse Engineering the callout mechanism


Intro

The Windows Filtering Platform (WFP) is a framework designed for host-based network traffic filtering, replacing the older NDIS and TDI filtering capabilities. WFP exposes both user-mode (UM) and kernel-mode (KM) APIs, offering the ability to block, permit or audit network traffic based on conditions or deep packet inspection (through “callouts”). As you might have guessed, WFP is leveraged by the Windows Firewall and by the network filters of security products, and with a tiny bit of creativity it can be used for offensive purposes, such as in rootkits. This blogpost dives into how WFP callouts are managed by the kernel, and uses that knowledge to suggest ways to evade components that leverage WFP.

The provided sources

Before we jump right into it, a word about the blogpost’s repo: WFPEnumUM and WFPEnumDriver can be used to enumerate all registered callouts on the system, including their actual addresses (to use them, just load the driver and run the client). WFPCalloutDriver is a PoC callout driver; I mainly used it for debugging, but you can have a look at it to see the registration process.

Starting with the terminology

There are four terms used heavily across the WFP documentation:

  • Layers - Categorize the type of network traffic to be evaluated. Each layer is identified by a GUID and represents a location in the network processing stack. For example, you can attach to the layer FWPM_LAYER_INBOUND_TRANSPORT_V4 to filter packets just after their transport header has been parsed by the network stack, but before any additional transport layer processing takes place.

  • Filters - Constructed from conditions (source port, IP, etc.) and an action (permit, block, callout unknown, callout inspection or callout terminating). When the action is a callout and the filter’s conditions match, the filter engine calls the callout’s registered driver callback, giving it the opportunity to inspect the packet’s content. A callout may return permit, block or continue. If the action is callout terminating, it may return only permit or block; if it’s callout inspection, it should only return continue; and for callout unknown, the callout may or may not act as terminating depending on the result of the classification, with no guarantees.

  • Sublayers - A way to logically group filters. Say you filter TCP traffic and want to implement different filters for different ranges of ports: you can create a separate sublayer for each range of ports.

  • Shims - A kernel component responsible for initiating the classification process, that is, applying the correct filters to the packet and enforcing the resulting action. The shim is called by tcpip.sys for each layer a packet arrives at in the network stack:

shimcallstack
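The return-value rules from the Filters bullet above can be written down as a small check. This is only a sketch of the rules as described in this post: the enum and function names are made up for illustration and are not the WFP types (the real constants are FWP_ACTION_CALLOUT_TERMINATING and friends).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the three callout action types of a filter. */
typedef enum {
    FILTER_CALLOUT_TERMINATING,
    FILTER_CALLOUT_INSPECTION,
    FILTER_CALLOUT_UNKNOWN
} filter_action_t;

/* Illustrative stand-ins for what a classify callback may return. */
typedef enum { RESULT_PERMIT, RESULT_BLOCK, RESULT_CONTINUE } callout_result_t;

/* Returns true if `result` is a return value the rules above allow
 * for a filter whose action type is `action`. */
static bool callout_result_allowed(filter_action_t action, callout_result_t result)
{
    switch (action) {
    case FILTER_CALLOUT_TERMINATING:   /* must make a decision */
        return result == RESULT_PERMIT || result == RESULT_BLOCK;
    case FILTER_CALLOUT_INSPECTION:    /* may only look, never decide */
        return result == RESULT_CONTINUE;
    case FILTER_CALLOUT_UNKNOWN:       /* no guarantees either way */
        return true;
    }
    return false;
}
```

For example, an inspection callout returning block would be rejected by this check, while an unknown callout may return anything.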

More terminology!

Filter arbitration is the logic implemented in WFP to decide the relations between different filters operating on the same layers, and essentially how it all works together. What I mean is, some ordering must be applied when processing filters. So let’s get familiar with a few more WFP terms:

  • Weight - Each filter has an associated weight value which defines the filter’s priority within a sublayer. Each sublayer has its own weight to define its priority within a layer. The shim processes an incoming packet by traversing sublayers from the one with the highest weight to the one with the lowest. A final decision is made only after all sublayers have been evaluated, which allows multiple filters to match the same packet.

  • Filter arbitration - Refers to the process of constructing the list of matching filters ordered by weight and evaluating them until either a filter returns permit or block, or the list of filters is exhausted. That is, of course, per sublayer.

  • Policy - As I said, within a layer all sublayers will be traversed regardless of whether a sublayer has already reached a definitive action (e.g. block, permit…). What if one sublayer returns permit and another returns block? The final decision is based on a well-defined policy:

    • Actions are evaluated from high priority sublayers to lower priority sublayers.
    • A block decision overrides a permit decision.
    • A block decision is final. The packet will be discarded.
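The weight ordering and the policy above can be sketched as a toy model. Everything here (the types, the function names, the idea of a precomputed `matches` flag) is an assumption made for illustration; it mirrors the rules as described in this post, not how the filter engine is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { ACT_CONTINUE, ACT_PERMIT, ACT_BLOCK } action_t;

typedef struct {
    unsigned weight;   /* priority inside its sublayer */
    int      matches;  /* 1 if the filter's conditions match the packet */
    action_t action;   /* result produced when the filter matches */
} filter_t;

typedef struct {
    unsigned  weight;  /* priority inside the layer */
    filter_t *filters;
    size_t    count;
} sublayer_t;

static int filter_weight_desc(const void *a, const void *b)
{
    unsigned wa = ((const filter_t *)a)->weight, wb = ((const filter_t *)b)->weight;
    return (wa < wb) - (wa > wb);   /* higher weight sorts first */
}

static int sublayer_weight_desc(const void *a, const void *b)
{
    unsigned wa = ((const sublayer_t *)a)->weight, wb = ((const sublayer_t *)b)->weight;
    return (wa < wb) - (wa > wb);
}

/* Per-sublayer arbitration: highest weight first, stop at the first
 * matching filter that returns permit or block. */
static action_t arbitrate_sublayer(sublayer_t *sl)
{
    qsort(sl->filters, sl->count, sizeof *sl->filters, filter_weight_desc);
    for (size_t i = 0; i < sl->count; i++) {
        if (sl->filters[i].matches && sl->filters[i].action != ACT_CONTINUE)
            return sl->filters[i].action;
    }
    return ACT_CONTINUE;            /* list exhausted without a decision */
}

/* Layer policy: every sublayer is visited; block overrides permit and sticks. */
static action_t classify_layer(sublayer_t *sublayers, size_t count)
{
    action_t result = ACT_CONTINUE;
    qsort(sublayers, count, sizeof *sublayers, sublayer_weight_desc);
    for (size_t i = 0; i < count; i++) {
        action_t a = arbitrate_sublayer(&sublayers[i]);
        if (a == ACT_BLOCK)
            result = ACT_BLOCK;
        else if (a == ACT_PERMIT && result != ACT_BLOCK)
            result = ACT_PERMIT;
    }
    return result;
}

/* A high-weight sublayer permits, a low-weight one blocks: block wins. */
static action_t demo_permit_then_block(void)
{
    filter_t high[] = { { 10, 1, ACT_PERMIT } };
    filter_t low[]  = { { 5, 0, ACT_PERMIT }, { 1, 1, ACT_BLOCK } };
    sublayer_t layer[] = { { 1, low, 2 }, { 100, high, 1 } };
    return classify_layer(layer, 2);
}
```

What the model returns when no filter matches at all (ACT_CONTINUE here) is a placeholder; the post does not cover the default outcome.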

Understanding how callouts are managed internally

The more complex and significant packet inspection logic is implemented by callouts. For those interested in offensive security, the ability to enumerate the registered callouts on the system, including their actual addresses (not offered by the WFP API), can be useful for evading them. For anyone else? Just a fun exercise!

Registration

A driver registers a callout with the filter engine using FwpsCalloutRegister, with a structure describing the callout to be registered.

typedef struct FWPS_CALLOUT0_ {
  GUID                                calloutKey;
  UINT32                              flags;
  FWPS_CALLOUT_CLASSIFY_FN0           classifyFn;
  FWPS_CALLOUT_NOTIFY_FN0             notifyFn;
  FWPS_CALLOUT_FLOW_DELETE_NOTIFY_FN0 flowDeleteFn;
} FWPS_CALLOUT0;
  • classifyFn - The callback function where the filtering logic is implemented.
  • notifyFn - Called whenever a filter that references the callout is added or removed.

Another honorable mention is a flag named FWP_CALLOUT_FLAG_CONDITIONAL_ON_FLOW. As MSDN says:
    "A callout driver can specify this flag when registering a callout that will be added at a layer that supports data flows. If this flag is specified, the filter engine calls the callout driver's classifyFn0 callout function only if there is a context associated with the data flow. A callout driver associates a context with a data flow by calling the FwpsFlowAssociateContext0 function."

Remember this one, we will come back to it later : )

Associating the callout with a filter and a layer

First, a driver must add the callout to a layer by calling FwpmCalloutAdd. After the callout has been added, a driver must create a filter that references the callout by calling FwpmFilterAdd.

  • The latter can be done from Usermode. Generally speaking, a callout is registered with a GUID, and identified internally by the filter engine with a corresponding ID. An example callout driver to demonstrate the callout registration process is provided in the sources.

Registration - This time internally

Taking a look at FwpsCalloutRegister you will observe the following sequence of calls: fwpkclnt!FwpsCalloutRegister<X> -> fwpkclnt!FwppCalloutRegister -> NETIO!KfdAddCalloutEntry -> NETIO!FeAddCalloutEntry. The reversed version of NETIO!FeAddCalloutEntry:

__int64 __fastcall FeAddCalloutEntry(
        int a1,
        __int64 ClassifyFunction,
        __int64 NotifyFn,
        __int64 FlowDeleteFn,
        int Flags,
        char a6,
        unsigned int CalloutId,
        __int64 DeviceObject)
{
  __int64 v12; // rcx
  __int64 CalloutEntry; // rdi
  char v14; // bp
  __int64 CalloutEntryPtr; // rbx
  __int64 v16; // rax

  CalloutEntry = WfpAllocateCalloutEntry(CalloutId);
  if ( CalloutEntry )
    goto LABEL_17;
  v14 = 1;
  CalloutEntryPtr = *(_QWORD *)(gWfpGlobal + 0x198) + 0x50i64 * CalloutId;
  if ( !*(_DWORD *)(CalloutEntryPtr + 4) && !*(_DWORD *)(CalloutEntryPtr + 8) )
  {
LABEL_6:
    if ( !CalloutEntry )
      goto LABEL_7;
LABEL_17:
    WfpReportError(CalloutEntry, "FeAddCalloutEntry");
    return CalloutEntry;
  }
  v16 = WfpReportSysErrorAsNtStatus(v12, "IsCalloutEntryAvailable", 0x40000000i64, 1i64);
  CalloutEntry = v16;
  if ( v16 )
  {
    WfpReportError(v16, "IsCalloutEntryAvailable");
    goto LABEL_6;
  }
LABEL_7:
  memset(CalloutEntryPtr, 0i64, 0x50i64);
  *(_DWORD *)CalloutEntryPtr = a1;
  *(_DWORD *)(CalloutEntryPtr + 4) = 1;
  if ( a1 == 3 )
    *(_QWORD *)(CalloutEntryPtr + 40) = ClassifyFunction;
  else
    *(_QWORD *)(CalloutEntryPtr + 16) = ClassifyFunction;
  *(_DWORD *)(CalloutEntryPtr + 48) = Flags;
  *(_BYTE *)(CalloutEntryPtr + 73) = a6;
  *(_QWORD *)(CalloutEntryPtr + 24) = NotifyFn;
  *(_QWORD *)(CalloutEntryPtr + 32) = FlowDeleteFn;
  *(_BYTE *)(CalloutEntryPtr + 72) = 0;
  *(_WORD *)(CalloutEntryPtr + 74) = 0;
  *(_DWORD *)(CalloutEntryPtr + 76) = 0;
  if ( DeviceObject )
  {
    ObfReferenceObject(DeviceObject);
    *(_QWORD *)(CalloutEntryPtr + 64) = DeviceObject;
  }
  if ( !dword_1C007D018 || !(unsigned __int8)tlgKeywordOn(&dword_1C007D018, 2i64) )
    v14 = 0;
  if ( v14 )
    WfpCalloutDiagTraceCalloutAddOrRegister(CalloutId, CalloutEntryPtr);
  return CalloutEntry;
}

We can see that our callout and its associated information are stored in an array of callout entries, where each entry is 0x50 bytes and is indexed by the callout id: entry = *(NETIO!gWfpGlobal + 0x198) + (CalloutId * 0x50). At offset 0x10 we can find our registered ClassifyFunction callback (in the listing above it is stored at offset 0x28 instead when the first argument, a1, equals 3).
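To make the offsets easier to follow, here is a sketch of what the entry could look like as a C struct. The field names are my own guesses based purely on the decompiled listing above (fields the listing does not explain are named unknown_*), and pointers are modelled as 64-bit integers so the layout can be checked on any machine. The layout is build dependent, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one callout entry, inferred from FeAddCalloutEntry. */
typedef struct callout_entry {
    uint32_t type;             /* +0x00: first argument (a1) */
    uint32_t in_use;           /* +0x04: set to 1 on registration */
    uint32_t unknown_08;       /* +0x08: also tested by the availability check */
    uint32_t pad_0c;           /* +0x0C */
    uint64_t classify_fn;      /* +0x10: ClassifyFunction when type != 3 */
    uint64_t notify_fn;        /* +0x18 */
    uint64_t flow_delete_fn;   /* +0x20 */
    uint64_t classify_fn_alt;  /* +0x28: ClassifyFunction when type == 3 */
    uint32_t flags;            /* +0x30 */
    uint32_t pad_34;           /* +0x34 */
    uint64_t unknown_38;       /* +0x38: not touched by the listing */
    uint64_t device_object;    /* +0x40: referenced DeviceObject, if any */
    uint8_t  unknown_48;       /* +0x48: zeroed */
    uint8_t  unknown_49;       /* +0x49: sixth argument (a6) */
    uint16_t unknown_4a;       /* +0x4A: zeroed */
    uint32_t unknown_4c;       /* +0x4C: zeroed */
} callout_entry_t;

_Static_assert(sizeof(callout_entry_t) == 0x50, "entry must be 0x50 bytes");
_Static_assert(offsetof(callout_entry_t, classify_fn) == 0x10, "classify at 0x10");
_Static_assert(offsetof(callout_entry_t, classify_fn_alt) == 0x28, "alt classify at 0x28");
_Static_assert(offsetof(callout_entry_t, flags) == 0x30, "flags at 0x30");
_Static_assert(offsetof(callout_entry_t, device_object) == 0x40, "device object at 0x40");

/* Address of the entry for `callout_id`, given the array base pointer
 * (the value read from gWfpGlobal + 0x198). */
static uint64_t callout_entry_address(uint64_t table_base, uint32_t callout_id)
{
    return table_base + (uint64_t)callout_id * sizeof(callout_entry_t);
}

/* Pick the classify pointer the same way the listing stores it. */
static uint64_t callout_classify(const callout_entry_t *entry)
{
    return entry->type == 3 ? entry->classify_fn_alt : entry->classify_fn;
}
```

With a base of 0x1000, callout id 2 lands at 0x1000 + 2 * 0x50 = 0x10A0.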

Looking around for additional references to the said array I found a function named NETIO!FeInitCalloutTable:

FeInitCalloutTable

The initial size of the array pointed to by gWfpGlobal + 0x198 is 0x14000 bytes, which is enough for 0x400 (1024) entries of 0x50 bytes each. On registration, the array can be expanded as required: a larger buffer is allocated, the existing registration data is copied over and the original allocation is freed. In addition, gWfpGlobal + 0x190 holds the max callout id in the array.
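The allocate, copy and free pattern described above looks roughly like the following user-mode model. The struct, the function names and especially the growth policy (growing to exactly the requested id) are assumptions for the sake of the example; the real policy lives in NETIO and is not shown in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ENTRY_SIZE    0x50u
#define INITIAL_BYTES 0x14000u

typedef struct {
    uint8_t *entries;   /* models the pointer stored at gWfpGlobal + 0x198 */
    uint32_t capacity;  /* models the value stored at gWfpGlobal + 0x190 */
} callout_table_t;

/* Allocate the initial, zeroed table: 0x14000 / 0x50 = 1024 entries. */
static int table_init(callout_table_t *t)
{
    t->entries = calloc(1, INITIAL_BYTES);
    t->capacity = t->entries ? INITIAL_BYTES / ENTRY_SIZE : 0;
    return t->entries != NULL;
}

/* Make sure `callout_id` fits: allocate a bigger table, copy the
 * existing registrations, then free the original allocation. */
static int table_reserve(callout_table_t *t, uint32_t callout_id)
{
    if (callout_id < t->capacity)
        return 1;
    uint32_t new_capacity = callout_id + 1;   /* assumed growth policy */
    uint8_t *grown = calloc(new_capacity, ENTRY_SIZE);
    if (!grown)
        return 0;
    if (t->entries)
        memcpy(grown, t->entries, (size_t)t->capacity * ENTRY_SIZE);
    free(t->entries);
    t->entries = grown;
    t->capacity = new_capacity;
    return 1;
}
```

One consequence worth keeping in mind: since the table can be reallocated, a cached pointer to the array may go stale, so the base should be re-read from gWfpGlobal before each walk.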

FeGetWfpGlobalPtr

NETIO exports a function that returns the address of gWfpGlobal:

getwfpglobalptr

By now, we have enough knowledge to:

  • find the address of NETIO!gWfpGlobal (sig scan from UM or FeGetWfpGlobalPtr if you can load a driver)
  • read offsets 0x198 and 0x190 to get the array pointer and the maximum number of entries
  • traverse all entries, the address stored at offset 0x10 from each entry is the classify callout : )
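The traversal step can be sketched against a snapshot of the table that has already been read into a local buffer (how you read kernel memory is out of scope here). The offsets come from the listing above; treating a non-zero DWORD at offset 0x4 as "in use" is my assumption based on the availability check, and the sketch ignores the a1 == 3 case where the callback sits at offset 0x28.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_SIZE      0x50u
#define OFF_IN_USE      0x04u   /* assumed "entry is registered" marker */
#define OFF_CLASSIFY_FN 0x10u

/* Unaligned-safe reads from a raw byte snapshot. */
static uint32_t read_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static uint64_t read_u64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Walk `entry_count` entries of a table snapshot and store the classify
 * pointer of every in-use entry into `out`. Returns how many were stored. */
static size_t collect_classify_fns(const uint8_t *table, uint32_t entry_count,
                                   uint64_t *out, size_t out_capacity)
{
    size_t found = 0;
    for (uint32_t id = 0; id < entry_count && found < out_capacity; id++) {
        const uint8_t *entry = table + (size_t)id * ENTRY_SIZE;
        if (read_u32(entry + OFF_IN_USE) == 0)
            continue;                       /* free slot, skip it */
        out[found++] = read_u64(entry + OFF_CLASSIFY_FN);
    }
    return found;
}
```

The resulting addresses can then be resolved against the loaded module list to see which driver owns each callout.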

Whilst this is a valid option, and it has actually been used in the wild by Lazarus's FudModule rootkit, we can take a more reliable approach.

NETIO!KfdGetRefCallout

There’s a function called GetCalloutEntry in NETIO:

GetCalloutEntry

Even better! There’s an undocumented export named NETIO!KfdGetRefCallout which essentially wraps GetCalloutEntry (KfdGetRefCallout -> FeGetRefCallout -> GetCalloutEntry). It can be used to get a pointer to the callout entry structure associated with a specific callout id, without hardcoding the gWfpGlobal offsets!

KfdGetREF

  • Note: every call to KfdGetRefCallout takes a reference, so we have to call NETIO!KfdDeRefCallout once for each call.

FwpmCalloutEnum usermode API

Putting it all together, we can find all registered callout ids on the system with the FwpmCalloutEnum0 API from usermode.

The provided source WFPEnumDriver exposes an IOCTL that takes a callout id and returns its corresponding CalloutEntry address, ClassifyFunction callback address and NotifyFunction callback address.

WFPEnum is the usermode client that issues that IOCTL for each callout id enumerated by FwpmCalloutEnum and displays all available information about each callout:

CalloutsOutput

Silencing callouts - some general ideas

So, let’s say you want to hide your traffic from an AV / AC product that uses a WFP network filter to scan traffic on a layer you are communicating on. What can you do about it?

Hook the callout

Assuming you can load a driver, hooking callouts can be a solution. Prefix your traffic with a certain magic number, and inspect the data in your hooked classify callout: if it carries your magic, return continue (which will call the next filters for your packet, if any, skipping the AV / AC one); if it doesn’t, just call the original callout.
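The hook logic itself is simple enough to model in plain C. This is a user-mode toy, not driver code: the callback signature, the verdict enum and the magic value are all invented for the example (a real classifyFn receives WFP structures and reports its decision through an output parameter rather than a return value).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { VERDICT_CONTINUE, VERDICT_PERMIT, VERDICT_BLOCK } verdict_t;

/* Simplified, hypothetical classify signature. */
typedef verdict_t (*classify_fn_t)(const uint8_t *payload, size_t length);

/* Arbitrary example marker that prefixes "our" traffic. */
static const uint8_t MAGIC[4] = { 0xDE, 0xC0, 0xAD, 0x0B };

/* Saved pointer to the callout we replaced. */
static classify_fn_t g_original_classify;

/* Stand-in for the security product's callout: blocks everything it sees. */
static verdict_t product_classify(const uint8_t *payload, size_t length)
{
    (void)payload;
    (void)length;
    return VERDICT_BLOCK;
}

/* Tagged traffic skips the original callout; everything else is forwarded. */
static verdict_t hooked_classify(const uint8_t *payload, size_t length)
{
    if (length >= sizeof MAGIC && memcmp(payload, MAGIC, sizeof MAGIC) == 0)
        return VERDICT_CONTINUE;
    return g_original_classify(payload, length);
}

/* Swap the pointer held in a (simulated) callout entry slot. */
static void install_hook(classify_fn_t *slot)
{
    g_original_classify = *slot;
    *slot = hooked_classify;
}
```

Forwarding untagged traffic to the original callback keeps the product's behaviour unchanged for everything except the marked packets, which is what makes the hook hard to notice from the outside.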

You’d also have to maintain a rundown reference for pending operations to avoid unloading your driver prematurely. (Generally, WFP handles this for the registered driver by calling ObReferenceObject on CalloutEntry->DeviceObject and dereferencing it when its callout returns; IopCheckUnloadDriver will not unload a driver as long as it has a referenced device object…)

Nulling the entry

But what if you don’t have a driver? An alternative would be to null the entire callout entry of the target callout you want to silence. Of course, one side effect is that the callout will never be called. Another (major) side effect may arise if the targeted filter’s action type is anything but callout inspection. Quoting MSDN:

  • “A callout and filters that specify the callout for the filter’s action can be added to the filter engine before a callout driver registers the callout with the filter engine. In this situation, filters with an action type of FWP_ACTION_CALLOUT_TERMINATING or FWP_ACTION_CALLOUT_UNKNOWN are treated as FWP_ACTION_BLOCK, and filters with an action type of FWP_ACTION_CALLOUT_INSPECTION are ignored until the callout is registered with the filter engine.”

It’s worth noting that a filter can have the FWPM_FILTER_FLAG_PERMIT_IF_CALLOUT_UNREGISTERED flag set, but as long as it does not, and the filter action type is callout terminating or unknown, nulling the entry will be equivalent to a block decision from the sublayer ):
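The combinations above boil down to a small decision table. The sketch below encodes the MSDN quote plus my reading of the flag's name (permit instead of block when the callout is unregistered); the enums and the function are illustrative and are not WFP definitions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ACTION_CALLOUT_TERMINATING,
    ACTION_CALLOUT_INSPECTION,
    ACTION_CALLOUT_UNKNOWN
} callout_action_t;

typedef enum { EFFECT_IGNORED, EFFECT_PERMIT, EFFECT_BLOCK } effect_t;

/* How a filter behaves while its callout is not registered (or, per this
 * post, after its callout entry has been nulled). `permit_if_unregistered`
 * models FWPM_FILTER_FLAG_PERMIT_IF_CALLOUT_UNREGISTERED being set. */
static effect_t effect_when_unregistered(callout_action_t action,
                                         bool permit_if_unregistered)
{
    switch (action) {
    case ACTION_CALLOUT_INSPECTION:
        return EFFECT_IGNORED;      /* filter is simply skipped */
    case ACTION_CALLOUT_TERMINATING:
    case ACTION_CALLOUT_UNKNOWN:
        return permit_if_unregistered ? EFFECT_PERMIT : EFFECT_BLOCK;
    }
    return EFFECT_BLOCK;            /* be conservative on bad input */
}
```

So before nulling an entry it is worth checking the action type and flags of every filter that references the callout, since only the inspection case is silent.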

So how can we overcome such a case? In theory, one approach would be to manipulate the filter structure in memory and change the action type to callout inspection, making WFP ignore our silenced filter and callout. I haven’t implemented it myself, but in case you want to, you may find the export NETIO!KfdFindFilterById useful. Here’s the partly reversed prototype and code:

image

Under the hood, filters are organised in a hash table (gWfpGlobal + 0x180, build dependent) where the hash index is calculated based on the layer and filter id as shown below:

image

NETIO!FeDefaultClassifyCallback

There is another alternative, which can be used as part of a data-only attack. Have a look at the following:

image

ClassifyDefault

The default filter engine classify callout will almost always return permit, so we can replace the EDR / AV / AC classify callout with it, avoiding the unwanted side effect of nulling a callout terminating / unknown entry and causing legitimate traffic to be blocked! The only side effect is that traffic that would originally have been blocked by the AV / EDR will now be permitted. Whether that’s good enough opsec-wise is up to you and your operation. The address of gFeCallout can be easily found via pattern scanning; adding an offset to it gives us FeDefaultClassifyCallback.

But what about the legitimate usage of FeDefaultClassifyCallback? It seems to be used at least in FeDeleteCalloutEntry, likely to mark the callout as invalid while it is in the process of being freed. The function resets the is_enabled flag, waits for the entry’s refcount to drop to 0 (in case other threads are interacting with the object), then continues to delete the callout. So it does make sense that if the flag is 0, any function trying to get the callout object retrieves a “default/blank” one instead.

Enabling a callout entry flag

Remember that FWP_CALLOUT_FLAG_CONDITIONAL_ON_FLOW flag? You could intentionally flip it (enable it) in the callout entry, so that the callout is skipped for any traffic without an associated data flow context (see the MSDN description quoted earlier). This is not foolproof: some callouts associate a data flow context by design, and those callouts will still be triggered.
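A toy model of the dispatch decision shows why flipping the flag silences some callouts but not others. The struct and function are hypothetical, and the flag value 0x1 is an assumption to be checked against the WDK headers; in the reversed entry layout the flags DWORD sits at offset 0x30 (48).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of FWP_CALLOUT_FLAG_CONDITIONAL_ON_FLOW; verify in the WDK. */
#define CALLOUT_FLAG_CONDITIONAL_ON_FLOW 0x00000001u

/* Minimal stand-in for a callout entry: only the flags matter here. */
typedef struct {
    uint32_t flags;
} callout_t;

/* Models the rule from the MSDN quote: with the flag set, the classify
 * callback runs only if the data flow has an associated context. */
static bool should_invoke_classify(const callout_t *callout, const void *flow_context)
{
    if ((callout->flags & CALLOUT_FLAG_CONDITIONAL_ON_FLOW) && flow_context == NULL)
        return false;
    return true;
}

/* The data-only tweak: turn the flag on in an existing entry. */
static void enable_conditional_on_flow(callout_t *callout)
{
    callout->flags |= CALLOUT_FLAG_CONDITIONAL_ON_FLOW;
}
```

A callout whose driver already calls FwpsFlowAssociateContext0 for its flows keeps firing after the flip, which is the limitation mentioned above.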
