Xori Disassembler

Xori is an automation-ready disassembly and static analysis library that consumes shellcode or PE binaries and provides triage analysis data.


Original abstract from Black Hat USA 2018:

In a world of high-volume malware and limited researchers, we need a dramatic improvement in our ability to process and analyze new and old malware at scale. Unfortunately, what is currently available to the community is incredibly cost-prohibitive or does not rise to the challenge. As malware authors and distributors share code and prepackaged toolkits, the white hat community is dominated by solutions aimed at profit as opposed to augmenting capabilities available to the broader community. With that in mind, we are introducing our library for malware disassembly called Xori as an open source project. Xori is focused on helping reverse engineers analyze binaries, optimizing for time and effort spent per sample.

Xori is an automation-ready disassembly and static analysis library that consumes shellcode or PE binaries and provides triage analysis data. This Rust library emulates the stack, register states, and reference tables to identify suspicious functionality for manual analysis. Xori extracts structured data from binaries to use in machine learning and data science pipelines.

We will go over the pain points of conventional open source disassemblers that Xori solves, examples of identifying suspicious functionality, and some of the interesting things we’ve done with the library. We invite everyone in the community to use it, contribute, and help make it an increasingly valuable tool in this arms race.

Architectures:

  • i386
  • x86-64

File Formats:

  • PE, PE+
  • Plain shellcode

Current Features:

  • Outputs JSON of the 1) disassembly, 2) functions, and 3) imports.
  • Manages Image and Stack memory.
  • Two modes:
    • Light Emulation - meant to enumerate all paths (emulates registers, the stack, and some instructions).
    • Full Emulation - only follows the code’s path (slow performance).
  • Simulated TEB & PEB structures.
  • Evaluates functions based on DLL exports.
  • Displays strings based on referenced memory locations.
  • Uses FLIRT style signatures (Fast Library Identification and Recognition Technology).
  • Allows you to use your own exports for simulating the PEB.
  • Will detect padding after a non-returning call.
  • Will try to identify function references from offsets.
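
The difference between the two modes listed above can be illustrated with a minimal sketch. This is not Xori's API or its instruction model: the `Insn` enum, `sample`, `light_paths`, and `full_path` are hypothetical names over a toy four-instruction machine with a single register, chosen only to show "enumerate all paths" versus "follow the code's path".

```rust
use std::collections::BTreeSet;

/// Hypothetical toy instruction set (an assumption, not Xori's model).
#[derive(Clone, Copy)]
enum Insn {
    MovImm(i64),      // reg = immediate
    JmpIfZero(usize), // jump to target index if reg == 0, else fall through
    Nop,
    Ret,
}

/// A small sample program; the comments give each instruction's index.
fn sample() -> Vec<Insn> {
    vec![
        Insn::MovImm(1),    // 0
        Insn::JmpIfZero(4), // 1
        Insn::Nop,          // 2
        Insn::Ret,          // 3
        Insn::Nop,          // 4
        Insn::Ret,          // 5
    ]
}

/// "Light" style: visit both sides of every conditional branch,
/// ignoring register values, so every reachable index is enumerated.
fn light_paths(code: &[Insn]) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut work = vec![0usize];
    while let Some(pc) = work.pop() {
        // Skip out-of-range targets and indices already visited.
        if pc >= code.len() || !seen.insert(pc) {
            continue;
        }
        match code[pc] {
            Insn::Ret => {}
            Insn::JmpIfZero(target) => {
                work.push(target);
                work.push(pc + 1);
            }
            Insn::MovImm(_) | Insn::Nop => work.push(pc + 1),
        }
    }
    seen
}

/// "Full" style: track the register and follow only the branch
/// that is actually taken. A step cap guards against endless loops.
fn full_path(code: &[Insn]) -> Vec<usize> {
    let mut reg = 0i64;
    let mut pc = 0usize;
    let mut trace = Vec::new();
    while pc < code.len() && trace.len() < 1000 {
        trace.push(pc);
        match code[pc] {
            Insn::MovImm(value) => {
                reg = value;
                pc += 1;
            }
            Insn::JmpIfZero(target) => {
                pc = if reg == 0 { target } else { pc + 1 };
            }
            Insn::Nop => pc += 1,
            Insn::Ret => break,
        }
    }
    trace
}

fn main() {
    let prog = sample();
    println!("light: {:?}", light_paths(&prog));
    println!("full:  {:?}", full_path(&prog));
}
```

On the sample program, the light traversal reaches all six indices, including the branch target at index 4 that is never taken at run time, while the full traversal records only indices 0 through 3. That is the trade-off the two modes describe: broader coverage versus fidelity to the actual execution path.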

What it doesn't do yet:

  • The engine is not interactive.
  • Does not dump strings.
  • Does not process non-executable sections.
  • TEB and PEB are not enabled for non-PE files.
  • Only some x86 instructions are emulated, not all.
  • No patching or assembling.
  • No plugins or scripting.