UPK Format

UPK is the resource packing format for Unreal Engine based games. It has many versions. Besides the variation between versions, game vendors can add their own extensions to the format, which makes the package format harder to understand. I have spent some time figuring out the UPK format used by Blade and Soul (BNS), and here is a summary of what I have learned.

First, as Gildor (developer of umodel, a UPK extracting tool) pointed out here, UPK is much more than an archive format. It is an object packing format in which the objects inside reference each other. It also has an import table and an export table, which allow it to reference objects stored in another UPK file. In this respect it has more in common with the PE or ELF formats. Moreover, inside a UPK, data is grouped into serialized objects in a predefined format. So, to unlock a UPK and extract information from it, both the UPK file structure and the object serialization format have to be figured out.

The UPK file format is described very clearly in Eliot's blog (Eliot is the developer of UE Explorer, another nice tool to have in this business). Here are the links in the right order of reading:

For the BNS game, the file version is 573. Inside the package, most plain-text strings are collected in the name table. When an object inside the UPK needs a string, it stores the index of that string in the name table. For each entry in the export table, the class object is the object that represents the class of that entry (this seems weird, but everything is an object in OOP; I guess the class object contains meta information about the class). The super object applies when the entry is itself a class that is a subclass of another class; the super object then contains the data for the superclass. Also worth noting is the hierarchy information in the export table, represented by the outer object (I prefer the name parent object), which indicates the object in which this object is referenced. Similar logic applies to entries in the import table, except that indices are replaced with names.

An object inside a UPK is referenced by its index in the export table or the import table. Following the standard Unreal Engine 3 convention, 0 (zero) represents the null object, a positive number refers to an entry in the export table, and a negative number refers to an entry in the import table (negated). Indices for both tables are one-based.
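The reference encoding can be sketched in a few lines of C. This is only an illustration, assuming the usual Unreal Engine 3 convention in which positive values select the export table and negative values select the import table; the names object_ref and decode_object_ref are my own and do not come from any tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Which table a decoded reference points into. */
enum ref_kind { REF_NULL, REF_EXPORT, REF_IMPORT };

/* Hypothetical helper type: a decoded object reference. */
struct object_ref {
    enum ref_kind kind;
    size_t index;   /* zero-based position in the selected table */
};

/* Decode a signed, one-based object reference.
 * Assumption: positive = export table, negative = import table, 0 = null. */
static struct object_ref decode_object_ref(int32_t raw)
{
    struct object_ref ref = { REF_NULL, 0 };
    if (raw > 0) {
        ref.kind = REF_EXPORT;
        ref.index = (size_t)raw - 1;
    } else if (raw < 0) {
        ref.kind = REF_IMPORT;
        /* widen before negating so INT32_MIN does not overflow */
        ref.index = (size_t)(-(int64_t)raw) - 1;
    }
    return ref;
}
```

For example, a raw value of 3 decodes to the third export entry (zero-based index 2), and -1 decodes to the first import entry.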

Each export table entry points to a serialized object stored inside the UPK file. SerialOffset gives the file offset at which the object begins, and SerialSize gives the total number of bytes it occupies in the file. In BNS, the object size and file offset recorded in the export table entries are encrypted (naively though, as if no one could figure it out). Gildor shows the algorithm to decrypt them here. The encryption algorithm can be obtained by working backwards.

Up to this point, one is able to extract serialized objects from a UPK. The serialized objects are stored in a format which is quite intuitive. Actually, I spent more time looking for information about the serialization than I did figuring out the format myself using a hex editor. I call the data that the file offset points to serial_object_file, and define the rest from there using (extended) C structure syntax. BNS, a PC game, uses little-endian byte order, which seems fair.
struct serial_object_file {
    int32          sn;           // a serial number inside the file; does not seem useful
    serial_object  object;
};
struct serial_object {
    property  property[N];       // N is the number of properties the object has
    int32     end;               // points to the name "None" in the name table
};
struct property {
    int32  name;                 // name of the property, index in the name table (zero-based)
    int32  name_flag;            // not sure, usually zero; maybe the upper half of a 64-bit int for the name
    int32  class;                // class of the property, index in the name table
    int32  class_flag;           // not sure, usually zero
    int32  size;                 // size of the data content
    int32  size_flag;            // not sure, usually zero
    uint8  data[size];           // data of this property
};
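To show how the structures above translate into reading code, here is a rough sketch that walks the property list of one extracted object and counts its properties. It follows the layout exactly as I described it, including the BoolProperty quirk from the list of layouts (size is 0 but an int32 still follows). The function names and the none_index and bool_class_index parameters (name table indices of "None" and "BoolProperty") are made up for this example, and the caller is assumed to have looked them up in the name table already.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian int32 from an unaligned byte buffer. */
static int32_t read_i32le(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* The fields of struct property that the walker needs. */
struct property_header {
    int32_t name;        /* name table index of the property name  */
    int32_t class_name;  /* name table index of the property class */
    int32_t size;        /* declared size of the data that follows */
};

#define PROPERTY_HEADER_BYTES 24  /* six int32 fields before data[] */

/* Read one header at buf[pos]. Returns 1 on success, 0 when the "None"
 * terminator is found, -1 if the buffer is too short or size is negative. */
static int read_property_header(const uint8_t *buf, size_t len, size_t pos,
                                int32_t none_index,
                                struct property_header *out)
{
    if (pos > len || len - pos < 4)
        return -1;
    out->name = read_i32le(buf + pos);
    if (out->name == none_index)
        return 0;
    if (len - pos < PROPERTY_HEADER_BYTES)
        return -1;
    out->class_name = read_i32le(buf + pos + 8);
    out->size = read_i32le(buf + pos + 16);
    return out->size < 0 ? -1 : 1;
}

/* Count the properties of a serial_object_file held in buf.
 * Returns the count, or -1 if the data does not parse cleanly. */
static int count_properties(const uint8_t *buf, size_t len,
                            int32_t none_index, int32_t bool_class_index)
{
    size_t pos = 4;  /* skip the leading sn field */
    struct property_header h;
    int count = 0;
    int rc;

    while ((rc = read_property_header(buf, len, pos, none_index, &h)) == 1) {
        size_t data_start = pos + PROPERTY_HEADER_BYTES;
        /* Assumption from the layout list: BoolProperty declares size 0
         * but is still followed by a 4-byte value. */
        size_t data_bytes = (h.class_name == bool_class_index)
                                ? 4 : (size_t)h.size;
        if (data_bytes > len - data_start)
            return -1;
        pos = data_start + data_bytes;
        count++;
    }
    return rc == 0 ? count : -1;
}
```

A real reader would decode each data section according to its class name instead of skipping over it, but the skipping loop is already enough to check whether the header layout holds for a given object.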
Depending on the class name of the property, the data section has different layouts. Below are some frequently seen ones.
BoolProperty:      size=0 (weird)    int32
IntProperty:       size=4            int32
FloatProperty:     size=4            float
NameProperty:      size=8            int32     index in name table
ByteProperty:      size=8            int32     index in name table
ObjectProperty:    size=4            int32     object reference
ArrayProperty:     size=?
        struct array {
                int32              n_element;             // number of elements in the array
                serial_object      objects[n_element];    // if size/n_element >= 8
                int32/float/other  object[n_element];     // otherwise
        };
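As a small illustration of decoding these data sections, the sketch below reads a FloatProperty value and splits an ArrayProperty payload into its element count and an average element size. The helper names are invented for this example, and the array helper assumes that size covers the n_element field as well as the elements, which I have not verified beyond the files I looked at.

```c
#include <stdint.h>
#include <string.h>

/* Read a little-endian int32 from an unaligned byte buffer. */
static int32_t read_i32le(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Decode a FloatProperty payload: a 4-byte little-endian IEEE 754 single. */
static float read_f32le(const uint8_t *p)
{
    uint32_t bits = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits without aliasing issues */
    return f;
}

/* Split an ArrayProperty payload into n_element and bytes per element.
 * Assumption: size includes the 4-byte n_element field.
 * Returns 0 on success, -1 if the payload is malformed. */
static int array_layout(const uint8_t *data, int32_t size,
                        int32_t *n_element, int32_t *bytes_per_element)
{
    if (size < 4)
        return -1;
    *n_element = read_i32le(data);
    if (*n_element < 0)
        return -1;
    *bytes_per_element = (*n_element == 0) ? 0 : (size - 4) / *n_element;
    return 0;
}
```

Going by the rule of thumb in the list above, a result of 4 bytes per element suggests plain int32 or float values, while 8 or more suggests nested serial_object entries that need the property walker again.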
Some objects, e.g. AnimSequence objects, have additional data appended after the property list. Of course, the object's properties should provide enough information about what the attached data is. For example, AnimSequence objects have the real animation data appended after the serialized object. The serialized AnimSequence object contains an index table for extracting each individual animation track from the data that follows.

After I figured all this out, I found that UE Explorer does a very good job with both the UPK file format and the object serialization format. However, I could not find Eliot's documentation of the object serialization format, and UE Explorer does not support BNS yet because of the encryption. Moreover, my goal is to modify the package, so I need more insight into the data structures than I would get from just using the tools others have made available.