PAK (Metroid Prime): Difference between revisions

From Retro Modding Wiki
Edited by Aruki

Latest revision as of 01:31, 13 April 2019

See PAK (File Format) for the other revisions of this format.

The .pak format in Metroid Prime and Metroid Prime 2 is a fairly simple packfile format: files are stored with a 32-bit file ID and can optionally be compressed with zlib (Metroid Prime) or LZO1X-999 (Metroid Prime 2). The assets in a pak fall into two groups: named resources and dependencies. In general, the game accesses only named resources directly; the rest are dependencies of the named resources and are accessed indirectly while those files are parsed.

Note that the Metroid Prime 3 E3 prototype uses this pak format as well, but has 64-bit file IDs. This is the only difference; the version number is still the same, so there isn't an easy way to check for this variation of the format.

Format

Header

The pak starts with a short 8-byte header:

Offset | Type  | Name                 | Notes
0x0    | int16 | Version Number Major | Always 3
0x2    | int16 | Version Number Minor | Always 5
0x4    | int32 | Unused               | Always 0
0x8    | End of header
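The following is a minimal sketch of reading the header. It assumes all fields are big-endian, which is the GameCube's native byte order. The function name is hypothetical.

```python
import struct


def read_pak_header(data: bytes) -> tuple[int, int]:
    """Parse the 8-byte pak header and return (major, minor).

    Assumes big-endian fields (GameCube byte order).
    """
    if len(data) < 8:
        raise ValueError("data is too short to contain a pak header")
    # >HHI = two big-endian uint16 version fields, then the unused uint32
    major, minor, _unused = struct.unpack_from(">HHI", data, 0)
    if (major, minor) != (3, 5):
        raise ValueError(f"unexpected pak version {major}.{minor}")
    return major, minor
```

The MP3 E3 prototype uses the same 3.5 version number, so passing this check does not tell you whether the file IDs are 32-bit or 64-bit.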

Named Resources

The named resource table lists the files that the game accesses directly. In world paks, this table generally contains only the MLVL file; in other paks, it is usually much larger. In non-world paks, the names are hardcoded and are how the game finds the files, so if you repack, you must keep the names unchanged.

This section begins with a 32-bit named resource count, followed by one entry per named resource:

Offset            | Type   | Count       | Name        | Notes
0x0               | FourCC | 1           | Asset Type  | Indicates the type of asset (texture, model, area, etc.) and usually doubles as the asset's cooked file extension.
0x4               | int32  | 1           | Asset ID    | Unique identifier for this asset.
0x8               | int32  | 1           | Name Length | Length of the name string.
0xC               | char   | Name Length | Name String | Name of the asset. Not zero-terminated. This name usually corresponds to a hardcoded string that the game uses to look up the asset, so it generally can't be changed.
0xC + Name Length | End of entry
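Below is a sketch of parsing the named resource table. It assumes big-endian fields and ASCII names, and all names in it are hypothetical. The version number cannot distinguish the MP3 E3 prototype, so its 64-bit IDs are handled by an explicit `id_size` parameter that the caller must choose.

```python
import struct
from typing import NamedTuple


class NamedResource(NamedTuple):
    asset_type: str  # FourCC, e.g. "MLVL"
    asset_id: int
    name: str


def read_named_resources(data: bytes, pos: int, id_size: int = 4) -> tuple[list[NamedResource], int]:
    """Parse the named resource table starting at `pos` (0x8, right after the header).

    Returns the entries and the offset just past the table.
    id_size: 4 for Prime 1/2, 8 for the MP3 E3 prototype (caller's choice).
    """
    id_fmt = ">I" if id_size == 4 else ">Q"
    (count,) = struct.unpack_from(">I", data, pos)
    pos += 4
    entries = []
    for _ in range(count):
        asset_type = data[pos:pos + 4].decode("ascii")
        pos += 4
        (asset_id,) = struct.unpack_from(id_fmt, data, pos)
        pos += id_size
        (name_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        # The name is not zero-terminated; its length comes from the field above.
        name = data[pos:pos + name_len].decode("ascii")
        pos += name_len
        entries.append(NamedResource(asset_type, asset_id, name))
    return entries, pos
```

In a world pak, the resulting list would typically hold a single MLVL entry.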

Resource Table

This is effectively the pak's main table of contents. It begins with a 32-bit resource count, followed by one entry per asset; each entry is 0x14 bytes long.

Offset | Type   | Name             | Notes
0x0    | int32  | Compression Flag | Either 0 or 1; 1 denotes that the asset is compressed.
0x4    | FourCC | Asset Type       | Indicates the type of asset (texture, model, area, etc.) and usually doubles as the asset's cooked file extension.
0x8    | int32  | Asset ID         | Unique identifier for this asset. Other assets use this ID to reference this one.
0xC    | int32  | Size             | Size of the asset data in the pak. Always a multiple of 32; the end of the asset is padded with 0xFF.
0x10   | int32  | Offset           | Absolute offset of the asset data within the pak.
0x14   | End of entry
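Here is a similar sketch for the resource table. It makes the same assumptions as above: big-endian fields, hypothetical names, and an explicit `id_size` for the 64-bit prototype variant. `pos` is the offset of the resource count, which directly follows the named resource table.

```python
import struct
from typing import NamedTuple


class ResourceEntry(NamedTuple):
    compressed: bool
    asset_type: str
    asset_id: int
    size: int    # size in the pak; a multiple of 32, including 0xFF padding
    offset: int  # absolute offset within the pak


def read_resource_table(data: bytes, pos: int, id_size: int = 4) -> list[ResourceEntry]:
    id_fmt = ">I" if id_size == 4 else ">Q"
    (count,) = struct.unpack_from(">I", data, pos)
    pos += 4
    entries = []
    for _ in range(count):
        (flag,) = struct.unpack_from(">I", data, pos)
        asset_type = data[pos + 4:pos + 8].decode("ascii")
        (asset_id,) = struct.unpack_from(id_fmt, data, pos + 8)
        size, offset = struct.unpack_from(">II", data, pos + 8 + id_size)
        entries.append(ResourceEntry(flag == 1, asset_type, asset_id, size, offset))
        pos += 16 + id_size  # 0x14 bytes per entry with 32-bit IDs
    return entries


def asset_bytes(data: bytes, entry: ResourceEntry) -> bytes:
    """Slice an asset's raw (possibly compressed, 0xFF-padded) data out of the pak."""
    return data[entry.offset:entry.offset + entry.size]
```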

Compression

Compressed files begin with their 32-bit decompressed size, followed by the compressed file data. Metroid Prime uses zlib, which is easily recognized from zlib's 0x78DA header at the start of the compressed data. Each file is compressed as a single zlib stream.
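The following sketch decompresses a Metroid Prime asset using Python's standard `zlib` module. It uses a decompression object so that the 0xFF padding after the end of the zlib stream is set aside rather than treated as input. The function name is hypothetical, and it assumes a big-endian size prefix.

```python
import struct
import zlib


def decompress_mp1(asset: bytes) -> bytes:
    """Decompress one Metroid Prime (zlib) asset sliced from the pak.

    `asset` may still carry the 0xFF padding that rounds its size up to 32.
    """
    (expected_size,) = struct.unpack_from(">I", asset, 0)
    stream = zlib.decompressobj()
    # The zlib stream marks its own end; trailing padding lands in
    # stream.unused_data instead of being decoded.
    out = stream.decompress(asset[4:])
    if not stream.eof or len(out) != expected_size:
        raise ValueError("truncated or corrupt zlib stream")
    return out
```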

Metroid Prime 2 uses segmented LZO1X-999, which is slightly more complicated. Its compressed files are split into multiple segments of compressed data. Each segment decompresses to 0x4000 bytes (except the last one, which holds the remainder) and should be compressed and decompressed separately. Each segment begins with a 16-bit size value, followed by its compressed data.
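Below is a sketch of walking the segments. It assumes that every segment is LZO-compressed, that the 16-bit value is the segment's compressed length, and that fields are big-endian. Python's standard library has no LZO1X decoder, so `lzo_decompress` is a hypothetical hook that you would supply from a third-party binding.

```python
import struct
from typing import Callable, Iterator

SEGMENT_SIZE = 0x4000  # decompressed size of every segment except the last


def iter_segments(asset: bytes) -> Iterator[bytes]:
    """Yield the compressed bytes of each segment of an MP2 compressed asset."""
    (remaining,) = struct.unpack_from(">I", asset, 0)
    pos = 4
    # The segment count follows from the decompressed size, so trailing
    # 0xFF padding is never mistaken for another segment header.
    while remaining > 0:
        (seg_len,) = struct.unpack_from(">H", asset, pos)
        pos += 2
        yield asset[pos:pos + seg_len]
        pos += seg_len
        remaining -= min(SEGMENT_SIZE, remaining)


def decompress_mp2(asset: bytes, lzo_decompress: Callable[[bytes, int], bytes]) -> bytes:
    """Decompress each segment separately and concatenate the results.

    lzo_decompress(data, out_size) is a hypothetical LZO1X decoder hook.
    """
    (total,) = struct.unpack_from(">I", asset, 0)
    out = bytearray()
    for segment in iter_segments(asset):
        expected = min(SEGMENT_SIZE, total - len(out))
        chunk = lzo_decompress(segment, expected)
        if len(chunk) != expected:
            raise ValueError("segment decompressed to the wrong size")
        out += chunk
    return bytes(out)
```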

Only certain formats are compressed. The following formats are always compressed:

The following formats are compressed when their uncompressed size is at least 0x400 bytes (0x40 bytes in the MP3 prototype):

Additionally, in Prime 2 any file can be left uncompressed if compressing it would make it larger. This is not the case in Prime 1, although custom repacking implementations should probably do this regardless, because compression gives no benefit in this scenario.
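Here is a minimal repacking sketch for MP1-style (zlib) assets that follows this recommendation. The `compressible` argument stands in for the per-format rules listed above and is decided by the caller. Compression level 9 is an assumption chosen because it produces the 0x78DA header mentioned earlier. The function name is hypothetical.

```python
import struct
import zlib


def pack_asset_mp1(raw: bytes, compressible: bool) -> tuple[bool, bytes]:
    """Return (compression_flag, padded_data) for one asset in a rebuilt pak."""
    data, flag = raw, False
    if compressible:
        # Compressed layout: 32-bit decompressed size, then one zlib stream.
        candidate = struct.pack(">I", len(raw)) + zlib.compress(raw, 9)
        if len(candidate) < len(raw):  # keep it only if it actually saves space
            data, flag = candidate, True
    padding = (-len(data)) % 32  # pak sizes are multiples of 32, padded with 0xFF
    return flag, data + b"\xff" * padding
```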

Optimizing Load Times

When rebuilding paks, it's extremely important to optimize the asset order so that the game seeks as little as possible on the disc when loading any given asset. A poorly optimized pak can easily cause load times of 30+ seconds, even for small rooms. There are three main techniques for keeping load times fast:

  • In world paks, assets are often duplicated so that they appear multiple times within the pak. If an asset is used by many different areas, duplication can reduce seek distance, at the cost of a larger pak. To balance the two, asset duplication was flagged on a per-area basis: larger areas take longer to load and benefit from the faster load speed, whereas small rooms tend to load quickly even without duplicates. This flag only existed in Retro's raw files and isn't present in the final game's cooked data. You can recreate it by analyzing the original pak and noting which rooms have duplicate assets (bearing in mind that the assets used by a room always precede that room's MREA asset). If you only have assets that were already extracted from a pak, you can't recover the flag.
  • Assets are generally grouped by load order: assets that are used together are adjacent in the pak, and an asset's dependencies typically appear immediately before the asset itself. This grouping means that even when assets aren't duplicated, the game can do one large seek and then read a run of assets, instead of seeking separately for each asset.
  • Once the assets are stored in the right order, you need to make sure the game actually reads them in that order. The exact load order is specified by the dependency list in the MLVL file, so the MLVL list should group and order assets the same way and should at least roughly match the order of the files in the pak.
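The two techniques above (dependency-first grouping and per-area duplication) can be sketched as follows. Everything here is hypothetical: the dependency map `deps`, the set of flagged areas, and the function names. Retro's exact ordering rules are not fully understood, so this approximates the described layout rather than reproducing it. `recover_duplicate_flags` applies the analysis described in the first bullet. It walks assets in pak order and flags any area whose run of assets (which precedes its MREA) repeats an asset seen earlier.

```python
from typing import Callable


def layout_world(areas: list, deps: dict, duplicate_areas: set) -> tuple[list, dict]:
    """Lay out a world pak so that each asset's dependencies come right before it.

    areas: MREA IDs in world order; deps: asset ID -> direct dependency IDs.
    Returns the pak order plus a per-area list in the same order, which can
    seed that area's MLVL dependency list so load order matches the layout.
    """
    pak_order: list = []
    written: set = set()
    area_lists: dict = {}
    for mrea in areas:
        area_list: list = []
        visited: set = set()

        def visit(asset) -> None:
            # Post-order DFS: dependencies are appended before the asset itself.
            if asset in visited:
                return
            visited.add(asset)
            for dep in deps.get(asset, ()):
                visit(dep)
            area_list.append(asset)

        for dep in deps.get(mrea, ()):
            visit(dep)
        area_lists[mrea] = area_list
        for asset in area_list:
            # Flagged areas get their own copies; other areas reuse earlier ones.
            if mrea in duplicate_areas or asset not in written:
                pak_order.append(asset)
                written.add(asset)
        pak_order.append(mrea)  # a room's assets precede its MREA
        written.add(mrea)
    return pak_order, area_lists


def recover_duplicate_flags(pak_order: list, is_area: Callable[[object], bool]) -> set:
    """Flag each area whose run of assets repeats an asset seen earlier in the pak."""
    flagged: set = set()
    seen: set = set()
    run_has_duplicate = False
    for asset in pak_order:
        if is_area(asset):
            if run_has_duplicate:
                flagged.add(asset)
            run_has_duplicate = False
        elif asset in seen:
            run_has_duplicate = True
        seen.add(asset)
    return flagged
```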

For every asset it loads, the game checks every copy of that asset in the pak and loads whichever duplicate is closest to the last read position (the end of the last asset that was loaded). When loading an area, the game loads assets in this order:

  • First the game goes down the area dependency list in the MLVL file and loads every file in the list for every active layer. The game follows the exact order specified by the list.
  • After loading all area dependencies, the area itself is loaded.
  • Finally, the game loads assets that the new area uses but that are missing from the list. This primarily means SCAN files (in MP1), the skybox model, and assets used by PlayerActor animsets (this avoids loading PlayerActor assets for suits that the player doesn't have). Any other assets missing from the list are also loaded here. Crucially, unlike in the previous steps, assets that still aren't in memory at this point are loaded synchronously, which means the game hangs until loading is complete. You therefore want as few assets as possible left to load at this point. (This hang is visible even in the base game: there is a small freeze towards the end of loading Artifact Temple.)
  • Note that any assets that are already in memory will be skipped.
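The load sequence above can be simulated to estimate how well a layout performs. This sketch makes several assumptions. "Distance" is taken to be the gap between the last read position and the start of a copy. `dependency_list` stands for the concatenated MLVL lists of all active layers. All names are hypothetical. The function returns the total seek distance and the assets that would be loaded synchronously.

```python
def simulate_area_load(
    copies: dict,
    dependency_list: list,
    area_id,
    late_assets: list,
    in_memory: frozenset = frozenset(),
    start_pos: int = 0,
) -> tuple[int, list]:
    """copies: asset ID -> list of (offset, size) for every copy in the pak.

    late_assets: assets the area uses that are missing from the list
    (e.g. SCANs, the skybox, PlayerActor animset assets).
    """
    loaded = set(in_memory)
    pos = start_pos
    total_seek = 0

    def load(asset) -> None:
        nonlocal pos, total_seek
        if asset in loaded:
            return  # already in memory: skipped
        # Assumed metric: distance from the last read position to a copy's start.
        offset, size = min(copies[asset], key=lambda c: abs(c[0] - pos))
        total_seek += abs(offset - pos)
        pos = offset + size  # last read position = end of this asset
        loaded.add(asset)

    for asset in dependency_list:  # step 1: MLVL list, in order
        load(asset)
    load(area_id)  # step 2: the area itself
    # Step 3: anything still missing here is loaded synchronously (a hang).
    synchronous = list(dict.fromkeys(a for a in late_assets if a not in loaded))
    for asset in late_assets:
        load(asset)
    return total_seek, synchronous
```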

Note: There are some discrepancies between the order in which assets appear in the pak and the actual load order specified by the MLVL file. More research may be needed to fully explain the quirks of Retro's pak optimization.

Tools