PAK (Metroid Prime)

See PAK (File Format) for the other revisions of this format.

The .pak format in Metroid Prime and Metroid Prime 2 is a fairly simple packfile format; files are stored with a 32-bit file ID, and can optionally be compressed with zlib (Metroid Prime) or LZO1X-999 (Metroid Prime 2). The assets in a pak are split into two groups: named resources and dependencies. In general, only named resources are accessed directly by the game; the rest are dependencies of the named resources, and are accessed indirectly in the process of parsing those files.

Note that the Metroid Prime 3 E3 prototype uses this pak format as well, but has 64-bit file IDs. This is the only difference; the version number is still the same, so there isn't an easy way to check for this variation of the format.

Header
The pak starts with a short 8-byte header:

Named Resources
The named resource table lists files that the game has direct access to. On world paks, this will generally only contain the MLVL file; in other paks, this table is usually quite a bit larger. Note that in non-world paks, the names are hardcoded and are how the game knows where to find the files; if you repack, you need to make sure you keep the names the same.

This section of the file begins with a 32-bit named resource count, followed by a table.

Resource Table
This could be considered the main table of contents of the pak. It begins with a 32-bit resource count, followed by one entry per file; each entry is 0x14 bytes long.
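As a rough illustration of walking this table, the sketch below reads the 32-bit count and slices out one raw 0x14-byte record per file. The function name is hypothetical, and big-endian byte order is an assumption (it matches the GameCube's native byte order). The field layout inside each entry is not described here, so entries are left as uninterpreted bytes.

```python
import struct

ENTRY_SIZE = 0x14  # per-entry size given above


def read_resource_entries(data, offset):
    """Return (entries, end_offset) for a resource table starting at offset.

    Each entry is returned as a raw 0x14-byte slice; no fields are decoded.
    Big-endian byte order is an assumption.
    """
    (count,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    end = start + count * ENTRY_SIZE
    if end > len(data):
        raise ValueError("resource table runs past the end of the data")
    entries = [data[i:i + ENTRY_SIZE] for i in range(start, end, ENTRY_SIZE)]
    return entries, end
```

The returned end offset is simply where the table stops, which is convenient when the caller wants to keep reading whatever follows it.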

Compression
Compressed files begin with their 32-bit decompressed size, followed by the compressed file data. Metroid Prime uses zlib, which is easily recognized from zlib's 0x78DA header at the start of the compressed data. Each file is compressed as a single zlib stream.
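A minimal sketch of handling a Metroid Prime compressed file might look like the following. The function names are hypothetical and big-endian byte order is an assumption. Compressing at zlib level 9 is also only a choice made here because it yields the 0x78DA header mentioned above; it is not confirmed to be what the original tools used.

```python
import struct
import zlib


def decompress_prime1(blob):
    """Decompress a file stored as: 32-bit decompressed size + one zlib stream."""
    (size,) = struct.unpack_from(">I", blob, 0)  # big-endian is an assumption
    # decompressobj tolerates any trailing bytes after the end of the stream.
    out = zlib.decompressobj().decompress(blob[4:])
    if len(out) != size:
        raise ValueError("decompressed size does not match the size prefix")
    return out


def compress_prime1(raw):
    """Inverse of decompress_prime1; level 9 produces the 0x78DA header."""
    return struct.pack(">I", len(raw)) + zlib.compress(raw, 9)
```

Checking the decompressed length against the size prefix is a cheap way to catch a file that was wrongly flagged as compressed.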

Metroid Prime 2 uses segmented LZO1X-999, which is slightly more complicated. Compressed files are split into multiple segments of compressed data; each segment is 0x4000 bytes when decompressed (except for the last one, which may be shorter) and should be compressed and decompressed separately. Each segment begins with a 16-bit size value, followed by its compressed data.
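The segment walk can be sketched as below. Python's standard library has no LZO codec, so the actual decompressor is passed in as a hypothetical callback, `decompress_segment(data, expected_size)`, which would be supplied by whatever LZO binding is available. Big-endian byte order and reading the 16-bit value as an unsigned compressed length are assumptions.

```python
import struct

SEGMENT_SIZE = 0x4000  # decompressed size of every segment except the last


def decompress_segmented(blob, decompress_segment):
    """Decompress a segmented file using a caller-supplied segment codec.

    decompress_segment(data, expected_size) is a hypothetical callback that
    must return the decompressed bytes for one segment.
    """
    (total,) = struct.unpack_from(">I", blob, 0)  # 32-bit decompressed size
    pos = 4
    out = bytearray()
    while len(out) < total:
        (seg_len,) = struct.unpack_from(">H", blob, pos)  # assumed unsigned
        pos += 2
        expected = min(SEGMENT_SIZE, total - len(out))
        chunk = decompress_segment(blob[pos:pos + seg_len], expected)
        if len(chunk) != expected:
            raise ValueError("segment decompressed to an unexpected size")
        out += chunk
        pos += seg_len
    return bytes(out)
```

Keeping the codec as a parameter also makes the framing logic easy to exercise with a stand-in codec before a real LZO implementation is wired in.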

Only certain formats are compressed. The following formats are always compressed:


 * TXTR
 * CMDL
 * CSKR
 * ANCS
 * ANIM
 * FONT

The following formats are compressed when their uncompressed size is at least 0x400 bytes (0x40 bytes in the MP3 prototype):


 * PART
 * ELSC
 * SWHC
 * WPSC
 * DPSC
 * CRSC

Additionally, in Prime 2 any file can be left uncompressed if its compressed form is larger than the uncompressed one. This is not the case in Prime 1 (although custom repacking implementations should probably do this regardless).
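For a repacker, the rules above can be condensed into a small decision function. This is only a sketch: the function and parameter names are hypothetical, with `min_size` standing in for the 0x400 (or 0x40) threshold and `skip_if_larger` for the Prime 2 behaviour of leaving a file uncompressed when compression does not help.

```python
ALWAYS_COMPRESSED = frozenset({"TXTR", "CMDL", "CSKR", "ANCS", "ANIM", "FONT"})
SIZE_DEPENDENT = frozenset({"PART", "ELSC", "SWHC", "WPSC", "DPSC", "CRSC"})


def should_compress(fourcc, raw_size, compressed_size=None,
                    min_size=0x400, skip_if_larger=False):
    """Decide whether an asset of the given type should be stored compressed."""
    if fourcc in ALWAYS_COMPRESSED:
        wanted = True
    elif fourcc in SIZE_DEPENDENT:
        wanted = raw_size >= min_size
    else:
        wanted = False
    # Optional fallback: store the file raw when compression made it larger.
    if (wanted and skip_if_larger and compressed_size is not None
            and compressed_size > raw_size):
        return False
    return wanted
```

For example, `should_compress("PART", 0x3FF)` is false, while `should_compress("PART", 0x40, min_size=0x40)` is true, matching the lower threshold noted for the MP3 prototype.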

Optimizing Load Times
When building new paks, it's extremely important to optimize asset order so as to minimize the distance the game has to seek on the disc to load any given asset. A poorly optimized pak can easily have 30+ second load times on small rooms. Three main things are done to keep load times fast.


 * In world paks, assets are often duplicated so they appear multiple times within the pak; if an asset is used by a lot of different areas this can help reduce seek distance, but has the downside of bloating the total file size of the pak. To ensure the best balance between the two, asset duplication was likely flagged on a per-area basis, as large areas tend to have a lot of duplicate resources, while smaller ones that load quickly anyway don't.
 * Assets are generally grouped by load order - things that are used together are adjacent in the pak, and an asset's dependencies typically appear immediately preceding the asset itself. This grouping helps ensure that even when assets aren't duplicated the game will still be able to do just one large seek and then load a bunch of assets at once, instead of having to do a separate seek for each asset.
 * Once you have assets stored in the pak in the right order, you need to make sure the game actually reads them in that order! The exact order that assets are loaded in is specified by the dependency list in the MLVL file; as such, the MLVL list should group assets in the same way and in the same order, and should at least roughly match the order in which files are positioned in the pak.
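One simple way to get the "dependencies immediately precede the asset" grouping described above is a post-order walk of the dependency graph. The sketch below is not Retro's actual algorithm; the function name and the `deps` mapping (asset to list of direct dependencies) are hypothetical, and it places each asset only once rather than duplicating shared assets.

```python
def layout_order(roots, deps):
    """Return assets ordered so that dependencies precede their dependents.

    roots: assets to place, in the order they should be loaded.
    deps:  hypothetical mapping of asset -> list of direct dependencies.
    """
    order = []
    seen = set()

    def visit(asset):
        if asset in seen:
            return  # already placed; this sketch does not duplicate assets
        seen.add(asset)
        for dep in deps.get(asset, ()):
            visit(dep)
        order.append(asset)  # placed only after all of its dependencies

    for root in roots:
        visit(root)
    return order
```

The same list could then serve both as the pak layout and as the order written to the MLVL dependency list, so the two stay in step.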

For every asset loaded, the game checks every copy of the asset in the pak and loads whichever duplicate is the shortest distance from the last read position (the end of the last asset that was loaded). When loading an area, this is the order the game loads assets in:
 * First the game goes down the area dependency list in the MLVL file and loads every file in the list for every active layer. The game follows the exact order specified by the list.
 * After loading all area dependencies, the area itself is loaded.
 * Finally, the game loads assets used by the new area that are missing from the list. This primarily means SCAN files (in MP1), the skybox model, and assets used by PlayerActor animsets (this avoids loading PlayerActor assets for suits that the player doesn't have). If anything else is missing from the list, it will still be loaded, but this step will cause the game to hang until the load is finished.
 * Note that any assets that are already in memory will be skipped.
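The duplicate selection rule described above can be used to estimate how well a candidate layout performs. The following sketch picks the nearest copy for each asset and sums the seek distances over a load order. All names are hypothetical, and measuring distance to the start of each copy is an assumption.

```python
def pick_nearest_copy(copies, last_read_end):
    """copies is a list of (offset, size); return the copy nearest last_read_end."""
    return min(copies, key=lambda copy: abs(copy[0] - last_read_end))


def total_seek_distance(load_order, locations, start=0):
    """Sum the seek distances for a load order.

    locations: mapping of asset ID -> list of (offset, size) copies in the pak.
    """
    pos = start
    total = 0
    loaded = set()
    for asset_id in load_order:
        if asset_id in loaded:
            continue  # assets already in memory are skipped
        offset, size = pick_nearest_copy(locations[asset_id], pos)
        total += abs(offset - pos)
        pos = offset + size  # the next seek starts at the end of this asset
        loaded.add(asset_id)
    return total
```

Comparing this total across different layouts gives a rough way to judge whether duplicating an asset is worth the extra file size.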

Note: There are some discrepancies between the order assets appear in the pak and the actual load order specified by the MLVL file. More research may be needed to fully explain all the quirks behind Retro's pak optimization.

Tools

 * PakTool by Parax