PAK (Metroid Prime 3)

Revision as of 00:40, 18 February 2015 by Aruki (→ Compression)

See PAK (File Format) for the other revisions of this format.

The .pak format in Metroid Prime 3 and Donkey Kong Country Returns is a new version of the pak format with some slight changes. The most notable change is that resource IDs have been extended from 32 bits to 64 bits. The new format also aligns data to 64-byte boundaries rather than 32-byte ones. The other changes are mainly organizational; the format still holds largely the same data as older versions.

Note that the Metroid Prime 3 E3 prototype uses a slightly modified version of the Metroid Prime pak format, not this one. See the PAK (Metroid Prime) page for more info on that revision of the format.

This file format is almost completely documented.
Research still needs to be done on how to order files to optimize loads. Also, the very first part of the header is mostly unknown.

Format

The header is split into a number of smaller chunks that are each padded to the next 64-byte offset.
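The padding rule above can be expressed as a small helper. This is a minimal sketch; the names `align_up` and `padding_needed` are illustrative and do not come from the game or any existing tool.

```python
PAK_ALIGNMENT = 64  # chunk alignment used by this revision of the format


def align_up(offset: int, alignment: int = PAK_ALIGNMENT) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def padding_needed(offset: int, alignment: int = PAK_ALIGNMENT) -> int:
    """Number of pad bytes required to reach the next aligned offset."""
    return align_up(offset, alignment) - offset
```

For example, a chunk whose fields end at 0x18 is padded out to 0x40, and one that starts at 0x40 and ends at 0x5C is padded out to 0x80.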

Header

Offset Size Description
0x0 4 Unknown. Always 2.
0x4 4 Header size
0x8 16 MD5 hash of the entire pak after the first 64 bytes.
0x18 End of header; pad to 64 bytes
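As a rough illustration of the table above, the header could be read and its hash checked as follows. This is a sketch under assumptions: the values are taken to be big-endian (the page does not state byte order; the Wii is a big-endian platform), and `PakHeader`, `parse_pak_header` and `md5_matches` are hypothetical names.

```python
import hashlib
import struct
from typing import NamedTuple

HEADER_CHUNK_SIZE = 0x40  # 0x18 bytes of fields, padded to 64 bytes


class PakHeader(NamedTuple):
    unknown: int      # always 2 according to the table above
    header_size: int
    md5: bytes        # 16-byte digest


def parse_pak_header(data: bytes) -> PakHeader:
    # ">II16s": two 32-bit values then 16 raw bytes; big-endian is an assumption.
    unknown, header_size, md5 = struct.unpack_from(">II16s", data, 0)
    return PakHeader(unknown, header_size, md5)


def md5_matches(data: bytes, header: PakHeader) -> bool:
    # The hash covers everything after the first 64 bytes of the pak.
    return hashlib.md5(data[HEADER_CHUNK_SIZE:]).digest() == header.md5
```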

Table of Contents

The next part of the file, starting at 0x40, is a brief list of each section in the pak and its size. Although it uses a count value, this section always lists the same data in the same order.

Offset Size Description
0x0 4 Section count; always 3
0x4 4 "STRG" fourCC
0x8 4 Named Resources section size
0xC 4 "RSHD" fourCC
0x10 4 Resource Table section size
0x14 4 "DATA" fourCC
0x18 4 File data section size
0x1C End of ToC; pad to 64 bytes
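The table of contents can be read generically by honouring the count value, even though it is always 3 in practice. The sketch below assumes big-endian values; `parse_toc` is an illustrative name.

```python
import struct
from typing import Dict

TOC_OFFSET = 0x40  # the ToC follows the 64-byte header chunk


def parse_toc(data: bytes, offset: int = TOC_OFFSET) -> Dict[str, int]:
    """Return a mapping of section fourCC to section size, in file order."""
    (count,) = struct.unpack_from(">I", data, offset)  # big-endian assumed
    sections: Dict[str, int] = {}
    pos = offset + 4
    for _ in range(count):
        fourcc, size = struct.unpack_from(">4sI", data, pos)
        sections[fourcc.decode("ascii")] = size
        pos += 8
    return sections
```

A well-formed pak would be expected to yield the keys STRG, RSHD and DATA, in that order.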

Named Resources

This section, always starting at 0x80, lists named resources. These are the files that the game has direct access to; every other file in the pak will be a dependency of one of these.

It starts with a 32-bit count, followed by the table itself; each entry of the table is structured as follows:

Type Size Description
string - Name; zero-terminated
fourCC 4 Resource type
u64 8 Resource ID
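Because the name is zero-terminated, entries are variable-length and have to be walked one at a time. A minimal sketch, assuming big-endian values and ASCII names (neither is stated on this page); `NamedResource` and `parse_named_resources` are hypothetical names.

```python
import struct
from typing import List, NamedTuple


class NamedResource(NamedTuple):
    name: str
    res_type: str
    res_id: int  # 64-bit resource ID


def parse_named_resources(data: bytes, offset: int = 0x80) -> List[NamedResource]:
    (count,) = struct.unpack_from(">I", data, offset)  # big-endian assumed
    pos = offset + 4
    entries = []
    for _ in range(count):
        end = data.index(b"\x00", pos)          # locate the string terminator
        name = data[pos:end].decode("ascii")    # ASCII is an assumption
        pos = end + 1
        fourcc, res_id = struct.unpack_from(">4sQ", data, pos)
        pos += 12                               # 4-byte fourCC + 8-byte ID
        entries.append(NamedResource(name, fourcc.decode("ascii"), res_id))
    return entries
```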

Resource Table

The final table contains a list of every resource in the pak. Following a 32-bit count value, each resource entry is structured as follows:

Offset Size Description
0x0 4 Compression flag; this will either be 0 or 1, with 1 denoting a compressed file.
0x4 4 Resource type fourCC.
0x8 8 Resource ID
0x10 4 Size (note: always a multiple of 64; the end of the file is padded with 0xFF if necessary)
0x14 4 Offset (relative to the start of the DATA section)
0x18 End of entry
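Each entry is a fixed 0x18 bytes, so the table can be indexed directly. The following sketch assumes big-endian values; the caller supplies the table offset and the DATA section offset, since this example does not derive them. All names here are illustrative.

```python
import struct
from typing import List, NamedTuple

ENTRY_FORMAT = ">I4sQII"                    # flag, fourCC, ID, size, offset
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 0x18 bytes


class ResourceEntry(NamedTuple):
    compressed: bool
    res_type: str
    res_id: int
    size: int
    offset: int  # relative to the start of the DATA section


def parse_resource_table(data: bytes, table_offset: int) -> List[ResourceEntry]:
    (count,) = struct.unpack_from(">I", data, table_offset)
    entries = []
    for i in range(count):
        flag, fourcc, res_id, size, off = struct.unpack_from(
            ENTRY_FORMAT, data, table_offset + 4 + i * ENTRY_SIZE)
        entries.append(ResourceEntry(flag == 1, fourcc.decode("ascii"),
                                     res_id, size, off))
    return entries


def read_resource(data: bytes, entry: ResourceEntry, data_section_offset: int) -> bytes:
    """Slice out the raw (possibly compressed, 0xFF-padded) resource bytes."""
    start = data_section_offset + entry.offset
    return data[start:start + entry.size]
```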

Compression

Some files within paks are compressed; in Prime 3, files are compressed with segmented LZO1X, just like Prime 2. In DKCR, on the other hand, files are compressed as a single zlib stream, like Prime 1. In either game, every compressed file is formatted the same way, starting with some metadata detailing the structure of the compressed data.

Offset Size Description
0x0 4 "CMPD" fourCC
0x4 4 Compressed block count

Each compressed block is structured as follows. Note that the first byte of each size field appears to be a separate, unknown value; only the three bytes that follow it make up the actual size.

For each block, if the compressed and decompressed sizes match, that indicates the block is uncompressed.

Offset Size Description
0x0 1 Unknown
0x1 3 Compressed size
0x4 1 Unknown
0x5 3 Decompressed size
0x8 End of entry
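The block metadata can be decoded by reading each field as a 32-bit word and splitting off the top byte. This sketch assumes big-endian values and that the block entries immediately follow the block count; it does not decompress anything. `BlockInfo` and `parse_cmpd_header` are hypothetical names.

```python
import struct
from typing import List, NamedTuple


class BlockInfo(NamedTuple):
    unknown_a: int          # first byte of the compressed-size field
    compressed_size: int    # 24-bit value
    unknown_b: int          # first byte of the decompressed-size field
    decompressed_size: int  # 24-bit value

    @property
    def is_stored(self) -> bool:
        # Matching sizes indicate the block is not actually compressed.
        return self.compressed_size == self.decompressed_size


def parse_cmpd_header(data: bytes) -> List[BlockInfo]:
    magic, count = struct.unpack_from(">4sI", data, 0)
    if magic != b"CMPD":
        raise ValueError("not a CMPD stream")
    blocks = []
    for i in range(count):
        word_a, word_b = struct.unpack_from(">II", data, 8 + i * 8)
        blocks.append(BlockInfo(word_a >> 24, word_a & 0xFFFFFF,
                                word_b >> 24, word_b & 0xFFFFFF))
    return blocks
```

Blocks for which `is_stored` is true can be copied through as-is; the others would go through LZO1X (Prime 3) or zlib (DKCR).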

Files are left uncompressed if the compressed data is larger than the uncompressed data. Barring that, the following formats are always compressed:

The following formats are compressed when their uncompressed size is at least 0x80 bytes:

File Order

File order matters significantly: it can make the difference between a pak that loads quickly and one that takes 30+ seconds on every door. While more research is required to figure out exactly how files should be ordered to optimize loading, the game generally clusters together resources that are used together, and a file's dependencies generally appear directly before the file itself. The game also often duplicates assets that are used multiple times within the same pak; there might be 10 to 11 copies of the same model file, which allows the game to find and load that file more easily when it needs it.
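The two observations above (dependencies placed directly before the file that uses them, and shared assets duplicated) can be illustrated with a post-order traversal. This is only a sketch of that observed pattern, not the ordering the original tools use, which remains unresearched; the function name and the `deps` mapping are hypothetical.

```python
from typing import Dict, FrozenSet, List


def order_with_dependencies(roots: List[str],
                            deps: Dict[str, List[str]],
                            duplicate_shared: bool = True) -> List[str]:
    """Emit each resource after its dependencies, optionally repeating shared ones."""
    ordered: List[str] = []
    emitted = set()

    def visit(res: str, stack: FrozenSet[str]) -> None:
        if res in stack:  # guard against dependency cycles
            return
        if not duplicate_shared and res in emitted:
            return
        for dep in deps.get(res, []):
            visit(dep, stack | {res})
        ordered.append(res)
        emitted.add(res)

    for root in roots:
        visit(root, frozenset())
    return ordered
```

With `duplicate_shared=True`, a model used by two areas is written once before each of them, at the cost of a larger pak.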