Proposal for a new image format (container)
In addition to several MIP-levels, the image file contains a matrix of fixed-size sub-images for each MIP-level (e.g. 1024x1024px).
Each sub-image has a set of coordinates/indexes attached to it so that the original image can easily be reconstructed by stitching the sub-images together.
The MIP-based file format will have to be altered slightly: Instead of a simple JPG-compressed MIP-level, a tiled image would have to be stored.
Since every MIP-level is only half the size of its previous level, the number of tiles will be reduced by a factor of 4 in each step.
The file header now consisting of a simple offset map would have to be extended to store the intra-MIP offsets for each tile in addition to the MIP-level offset.
Why all the effort?
if the user zooms in on a large image (several megapixels), decoding of the next higher res MIP-level takes a lot of time.
Current screens only have resolutions of 2-3 megapixels, meaning the full image is never visible, so decoding the whole image is a waste of memory and time. A simple grid-based visibility test would quickly yield which tiles are actually visible and which are hidden from view. Only the (partially) visible ones would have to send a load request to the image loader thread, greatly reducing decoding time and memory consumption.
The overhead
instead of reading just one offset and jumping directly to the data, the image loader thread will have to read two offsets.
The first offset value is necessary to jump to the MIP-level internal offset table describing the offset for each image tile.
The second offset value is necessary to jump from the MIP-level offset table directly to the tile data.
Each data request now consists of two offset lookups instead of just one, but considering the significant reduction in the amount of data actually read from the file, the overhead is neglectable.