[Feature Request] Add support for parent chd files

This issue has been tracked since 2020-01-18.

Recently I came across the ability to generate chd files with a parent specified, where data present in the parent is referenced rather than being duplicated in the child. For most platforms, the main usage would be in keeping multiple revisions or applying minor hacks/translations. For PS1 though, there's a bit of an extra use case in handling the massive number of multi-disc games the system offers. I tested the process out on Legend of Dragoon for the sake of curiosity, using disc 1 as the parent for the others.

222M Legend of Dragoon, The (USA) (Disc 1).chd
252M Legend of Dragoon, The (USA) (Disc 2).chd
164M Legend of Dragoon, The (USA) (Disc 2) (child).chd
299M Legend of Dragoon, The (USA) (Disc 3).chd
212M Legend of Dragoon, The (USA) (Disc 3) (child).chd
354M Legend of Dragoon, The (USA) (Disc 4).chd
266M Legend of Dragoon, The (USA) (Disc 4) (child).chd

Grand total space saved on this particular example was 263MB.
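
The 263MB figure is the sum of (standalone size - child size) over discs 2 to 4. A minimal sketch of that arithmetic follows, using the sizes listed above; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes in MB, as listed above for one disc. */
struct disc_sizes {
    int standalone_mb; /* regular CHD */
    int child_mb;      /* CHD created with disc 1 as parent */
};

/* Sum of (standalone - child) over all child discs. */
int total_saved_mb(const struct disc_sizes *discs, size_t count)
{
    int saved = 0;

    assert(discs != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        saved += discs[i].standalone_mb - discs[i].child_mb;
    return saved;
}
```

For the three child discs above that works out as 88 + 87 + 88.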

So far as I can tell, the parent is only referenced by its chd sha1, so there is a distinct complication in identifying the parent. That's the biggest issue I can see with implementation.

Syntax for generation:
chdman createcd -i "disc.cue" -op "parent.chd" -o "disc.chd"

bslenul wrote this answer on 2020-01-18

Never heard of that parent/child CHD before, this is pretty cool!

Quick test with Final Fantasy VII:

  • Disc 1 (parent): 460MB
  • Disc 2: 248MB (clone) instead of 443MB with normal conversion
  • Disc 3: 175MB (clone) instead of 400MB

So the whole game is ~883MB with parent/child, instead of ~1.3GB with regular CHD, that's really nice :D

Sanaki wrote this answer on 2020-01-19

Another decent one, 382MB saved:

336M Tales of Destiny II (USA) (Disc 1).chd
356M Tales of Destiny II (USA) (Disc 2).chd
165M Tales of Destiny II (USA) (Disc 2) (child).chd
393M Tales of Destiny II (USA) (Disc 3).chd
202M Tales of Destiny II (USA) (Disc 3) (child).chd

i30817 wrote this answer on 2020-01-20

This is lovely but it's probably not a good idea to change the names of the discs from the redump names unless you want the scanner to pretend they don't exist (unless the scanner treats chd special and uses a header name or something).

I'm also curious how this would work with both this stacking and a hack on top, for both games. My guess is not well, but i may be surprised.

Sanaki wrote this answer on 2020-01-20

The scanner doesn't care about filenames at all, nor does the core. The only time that matters is when a ToC type is used, such as a cue sheet. Unlike those, chd stores that information within the file as metadata. That said, I only named them child to keep them distinct for testing.

Example CHD info for Darkstone (many tracks)

chdman - MAME Compressed Hunks of Data (CHD) manager 0.217 (mame0217-423-g05f5625366)
Input file:   Darkstone (USA).chd
File Version: 5
Logical size: 754,972,992 bytes
Hunk Size:    19,584 bytes
Total Hunks:  38,551
Unit Size:    2,448 bytes
Total Units:  308,404
Compression:  cdlz (CD LZMA), cdzl (CD Deflate), cdfl (CD FLAC)
CHD size:     393,044,462 bytes
Ratio:        52.1%
SHA1:         0e8d09008e3be03229e9c7aa4d747225b32286f4
Data SHA1:    d731966609d90e2c99fc646486a2cbd9a26c1c38
Metadata:     Tag='CHT2'  Index=0  Length=93 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:159088 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=1  Length=90 bytes
              TRACK:2 TYPE:AUDIO SUBTYPE:NONE FRAMES:5989 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=2  Length=90 bytes
              TRACK:3 TYPE:AUDIO SUBTYPE:NONE FRAMES:5070 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=3  Length=90 bytes
              TRACK:4 TYPE:AUDIO SUBTYPE:NONE FRAMES:5077 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=4  Length=90 bytes
              TRACK:5 TYPE:AUDIO SUBTYPE:NONE FRAMES:4738 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=5  Length=91 bytes
              TRACK:6 TYPE:AUDIO SUBTYPE:NONE FRAMES:10426 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=6  Length=91 bytes
              TRACK:7 TYPE:AUDIO SUBTYPE:NONE FRAMES:10025 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=7  Length=91 bytes
              TRACK:8 TYPE:AUDIO SUBTYPE:NONE FRAMES:10556 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=8  Length=90 bytes
              TRACK:9 TYPE:AUDIO SUBTYPE:NONE FRAMES:9826 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=9  Length=92 bytes
              TRACK:10 TYPE:AUDIO SUBTYPE:NONE FRAMES:10742 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=10  Length=91 bytes
              TRACK:11 TYPE:AUDIO SUBTYPE:NONE FRAMES:6700 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=11  Length=91 bytes
              TRACK:12 TYPE:AUDIO SUBTYPE:NONE FRAMES:5519 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=12  Length=91 bytes
              TRACK:13 TYPE:AUDIO SUBTYPE:NONE FRAMES:5356 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=13  Length=91 bytes
              TRACK:14 TYPE:AUDIO SUBTYPE:NONE FRAMES:4718 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=14  Length=91 bytes
              TRACK:15 TYPE:AUDIO SUBTYPE:NONE FRAMES:5879 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=15  Length=91 bytes
              TRACK:16 TYPE:AUDIO SUBTYPE:NONE FRAMES:6129 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=16  Length=92 bytes
              TRACK:17 TYPE:AUDIO SUBTYPE:NONE FRAMES:14273 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=17  Length=91 bytes
              TRACK:18 TYPE:AUDIO SUBTYPE:NONE FRAMES:5553 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=18  Length=92 bytes
              TRACK:19 TYPE:AUDIO SUBTYPE:NONE FRAMES:14371 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.
Metadata:     Tag='CHT2'  Index=19  Length=91 bytes
              TRACK:20 TYPE:AUDIO SUBTYPE:NONE FRAMES:8333 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
       341     0.9%  Copy from self                          
    16,940    43.9%  CD LZMA                                 
     3,983    10.3%  CD Deflate                              
    17,287    44.8%  CD FLAC

As for hacks, it depends on what your intention is and what the hack is. If you want to keep both the hacked version and the unhacked version and the hack itself isn't invasive (that is, it only modifies a few files rather than -everything-), your best bet would be to use the originals as the parents for the hacks. Disc 1 clean as parent to disc 1 hack, disc 2 clean as parent to disc 2 hack, etc. You'd end up with hacked chds potentially under 1MB that way. If you aren't keeping the originals, you'd just use the same method I did, with disc 1 as parent, and see what you get. I'd expect similar results between hacked and unhacked, since hacks likely would apply the same edits to the same shared files on each disc.

EDIT: To be clear, PS1 scanning in retroarch is handled by checking the disc serial number, not a checksum or filename.

i30817 wrote this answer on 2020-01-21

While the serial part is true, it doesn't hold if you use the 'manual' scanner, which is one of the only ways (with the right dat) to get hacked game names into playlists without them being misidentified as the original game. A serial is a flawed primary key, even more than a filename imo, though obviously the only sure check is a checksum, stored or not.

I haven't tried that yet, mostly because of laziness and being unsure whether the dat format in the libretro-database repo is compatible (it's not xml), but i did try the MAME split set dat (which isn't usually supported) and it created the playlist (albeit it only gets images because the split set main game archive happens to have the same filename as the merged set, which wouldn't be the case for hacks). The main problem with it is that i have to filter out the bios and 'devices' zips first (preferably from the dat) because those are useless to show and the RA MAME core doesn't support keeping those in another dir where the scanner won't sniff around (actually i should try the merged dat, since it doesn't have the bios zips and the 'manual' scan will ignore crc mismatches).

Also i meant recursing the parent-child relationship into a small tree. Ie:

disc1 is parent of
--disc2

but

disc1 is also parent of
--disc1 hack

and

disc2 is also parent of
--disc2 hack (so to read some bytes, disc2 hack might potentially read up to 3 files)

Quite a mess of an idea, and to the naive observer the files would be indistinguishable from 'two separate games in the same dir', so they'd break stuff by moving/renaming them, but hey, lusers won't be using this.

Sanaki wrote this answer on 2020-01-22

I don't really work with hacks often on disc-based systems. Other than a translation I don't believe I've used any on PS1. If manual scanning is the way you go... well, you do you. Generally I'd opt for the desktop menu or manual playlist editing instead, but that's me.

As for the recursing bit, I checked earlier (must not have gotten it written in here), you can only designate one file as a parent, and using a child as a parent will only work with data present in that specific file. If disc 1 has chunks a and b, disc 2 has a and c, and disc 3 has a and d, setting 1 as parent to 2 and 2 as parent to 3 would save nothing on 3.
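
The a/b/c/d example can be written out as a toy model. This is only a sketch of the behaviour described above (a child can reference data physically present in its parent file and nothing further up the chain), not how chdman works internally; every name in it is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each bit is one distinct chunk of disc data. */
typedef uint32_t chunk_set;

enum { CHUNK_A = 1 << 0, CHUNK_B = 1 << 1, CHUNK_C = 1 << 2, CHUNK_D = 1 << 3 };

/* Chunks a child must store itself: everything it needs that is not
 * physically present in the parent FILE (grandparents are not consulted). */
chunk_set stored_in_child(chunk_set child_needs, chunk_set parent_file_stores)
{
    return child_needs & ~parent_file_stores;
}

/* Number of chunks in a set. */
int chunk_count(chunk_set set)
{
    int count = 0;

    while (set != 0) {
        count += (int)(set & 1u);
        set >>= 1;
    }
    return count;
}

/* The example from the post: disc 1 = {a, b}, disc 2 = {a, c}, disc 3 = {a, d}. */
void check_disc3_example(void)
{
    chunk_set disc1_file = CHUNK_A | CHUNK_B;                              /* standalone parent */
    chunk_set disc2_file = stored_in_child(CHUNK_A | CHUNK_C, disc1_file); /* stores only c */
    chunk_set disc3_needs = CHUNK_A | CHUNK_D;

    /* Chained under disc 2's child file: a is not stored there, so nothing is saved. */
    assert(chunk_count(stored_in_child(disc3_needs, disc2_file)) == 2);
    /* Directly under disc 1: a is shared, so only d is stored. */
    assert(chunk_count(stored_in_child(disc3_needs, disc1_file)) == 1);
}
```

Under this model, pointing every later disc at disc 1 is the arrangement that keeps the shared chunk reusable.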

i30817 wrote this answer on 2020-01-29

Opened this issue, which is where this probably should have been opened, because libretro is only a consumer of libchdr (if i understand it correctly).

rtissera/libchdr#13

Sanaki wrote this answer on 2020-01-29

My understanding (possibly flawed) is that libchdr supports it, but that it needs to be fed both files, which isn't currently set up.

That interpretation is based on this comment: mamedev/mame#2634 (comment)

i30817 wrote this answer on 2020-01-29

Uh, that's a weird way of doing it, i'd expect that the parent file would be part of the format. I suppose doing it that way allows renames or different paths to the files; but i can't say i like it.

Meh, i'll wait for the dev to close that issue then. It could still be useful if the dev decides to run with the tree idea.

If it's supposed to be something given by a dat, making RA support it is likely to be 'difficult' (ie: not going to happen) because it wouldn't be a standard dat (from a dumping group) and people would want this to make freeform compression.

There is an alternative: CHD has user metadata headers that could maybe be used to bake in the info (the parent file and its path, preferably as something portable, ie: 'forbidden characters' replaced at runtime, only relative paths allowed, stored as an array without OS path separator, notify if the file is not found and refuse to continue loading, etc) in the files as a post-processing step/tool.

i30817 wrote this answer on 2020-01-29

Though in my opinion, you're better off asking for that feature (storing a 'portable' path to the parent as an alternative to dats in the chd) in chdman, if you can manage to convince them.

An extra tool would be annoying and an excuse for people not to adopt your new 'unsupported' idea. And even then, it likely won't be supported.

spoiler: i actually did a tool that stores checksums of roms and checksums of patched roms (even if they're softpatches) in the rom file itself as extended attributes. I proposed using those exattr in the scanner as an easy way to solve the 'softpatch id problem' and the caching of the crc calculation, was asked if 'extended attributes work on windows', and when i told them no was completely ignored by twinaphex before they implemented a 'serial id mechanism' that ruined all hack ids. Similar things happened with another idea to separate the launcher file from the 'id' file, which is the root of several problems RA has with specialized core launcher files (though that was more complicated, being also a way for the user to take total control of the scanner). Meh.

There is really no point in proposing detailed features/ideas to RA itself, if you're not willing to dive into C and do the work yourself for twinaphex to put in one of his videos later. I keep making the same mistake though.

Sanaki wrote this answer on 2020-01-29

The chd file format as a whole is a MAME feature. It was designed to work with MAME's setup. The dat being the source of the parent name is only relevant for MAME. The only way this would ever enter the purview of retroarch itself is if the various core developers agreed that playlist entries should directly reference the parent in the entry, similar to how history entries store subsystem entries. Barring that happening, it has nothing to do with retroarch, it's a core issue only.

I did consider the metadata option, but that goes back to restricting file names, which isn't optimal. The best solutions I've come up with are as follows:

  • Scan directory for chd matching sha1 of parent. This wouldn't involve actually hashing the files, just having libchdr check the internally stored sha1 value on every chd in the same directory.
    • Pros: filename-agnostic, no metadata to futz with, least possible complications encountered by less technically adept users
    • Cons: Large directories could be slow to scan
  • Parent index file containing known parent SHA1s (stored in system or saves most likely)
    • Pros: Removes the need for filesystem rescans on each game load, would fall back to scan if listed parent isn't found in the listed location. Filename-agnostic due to fallback.
    • Cons: Still will scan once for each newly loaded child, requires an external file stored for saving these results.
  • Store children in a folder named the same as the parent
    • Pros: Easy to rename if needed, no scanning woes, easy for the average user
    • Cons: Not completely filename-agnostic, will be annoying for some people needing subdirectories in their pristine collections.
  • Metadata entry identifying the parent
    • Pros: No scanning required
    • Cons: Demands exact filename and path (relative), average user won't be able to change this field, some cores/emus have been known to choke on unknown metadata entries
  • Subsystem load
    • Pros: Directly provides the necessary files based on user input, no scanning needed
    • Cons: No easy way to add these entries to playlists, may not play nice with retroachievements file hashing
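
The first option above could look roughly like the sketch below. It assumes the stored SHA1 of every chd in the directory has already been read from its header (not computed by hashing) and that the child's Parent SHA1 is known; `chd_candidate` and `find_parent_path` are hypothetical names, not libchdr API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* One chd found in the directory, with the SHA1 stored in its header. */
struct chd_candidate {
    const char *path;
    uint8_t sha1[SHA1_LEN];
};

/* Return the path of the candidate whose stored SHA1 equals the child's
 * Parent SHA1, or NULL if no candidate matches. */
const char *find_parent_path(const struct chd_candidate *candidates, size_t count,
                             const uint8_t parent_sha1[SHA1_LEN])
{
    assert(parent_sha1 != NULL);
    assert(candidates != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (memcmp(candidates[i].sha1, parent_sha1, SHA1_LEN) == 0)
            return candidates[i].path;
    }
    return NULL;
}
```

The second option would cache the result of this lookup in an index file and only fall back to the scan when the cached path no longer matches.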

I understand you have gripes with the way these projects are run, but this isn't the place for that, and it doesn't help the problem at hand. This issue is only for hashing out how to handle -this- issue.

i30817 wrote this answer on 2020-01-29

You're right on all counts, i overstepped. And didn't know about the internal parent sha1 too! I thought the parent sha1 wouldn't be stored if the path wasn't, glad to see it's not the case.

Sanaki wrote this answer on 2020-01-29

If you check the expandable info readout for Darkstone in #587 (comment), the SHA1 listed there (not the Data SHA1) is the same one stored as the Parent SHA1 in the child. Using that instead of directly hashing would massively speed up a scan. Without that, pretty sure it wouldn't even be an option worth consideration.

i30817 wrote this answer on 2020-01-29

From your alternatives, i prefer the first, but only because i put one game per dir.

However, the question becomes more interesting if you think about game hacks. If you think of that, you probably want to put them in a separate subdir. But that's no good! You'd have to put the 'main' game in the subdir and the 'hacks' in the parent, which is unnatural and doesn't work anyway if there is more than one.

So i'm thinking that the second or third are more 'natural'. And if this idea gets implemented by other cores, you probably want something everyone can agree on and live with as a 'standard' to not confuse the users further.

Anything i missed to make alternative 1 possible for game hacks in subdirs? Is checking 'current dir or parent' sane or insane?

Sanaki wrote this answer on 2020-01-29

The second is basically an extension of the first. I'm sure there's a way to make it reasonable to use a custom directory structure without needing to manually edit the index file, but I'll need to think on that. I overlooked that desire, to be honest.

EDIT: Oh, the subdirectory thing I mentioned, the idea was perhaps written a bit obtusely. File structure (as conceived) would be thus:

.
├── PS1
│   ├── parent.chd
│   ├── parent
│   │   ├── child1.chd
│   │   ├── child2.chd

i30817 wrote this answer on 2020-01-29

That structure also isn't very friendly to multiple unrelated children. Imagine two different hacks with different readmes. Of course you can do it with yet another subdir tree, but i don't like it (that tool also keeps a specialized version file to know when to update hacks, so i need a dir per hack).

Also this is actually a reason to make this part of the scanner. The scanner would treat chd files specially and build a file (as in alternative 2, with the file in the saves dir, if you don't want it as part of the playlist) once, with all of the sha1s and paths, and then any core could read that file for the paths, in whatever configuration they were, without either user intervention or multiple path crawling.

Isn't the playlist json now, since another dev worked on them anyway?

Of course that comes with its own problems, namely many people avoid the scanner for various reasons, and now there is even the 'manual'/dat scan mode that would also have to be handled, though that doesn't sound that problematic in comparison to 'people avoid the scanner and thus have broken chds' because it's a simple question of calling the function there.

Sanaki wrote this answer on 2020-01-29

Yeah, retroarch playlists are json. Unfortunately, this feature has fairly limited appeal for other systems, so for now I feel like this needs to stay in core until such a time as it's proven expanding to a retroarch-supported feature would be beneficial and not out of scope for the project's goals. I'd love to find out I'm wrong, and that there's a way to implement it without breaking the current standard. When I said scan in the bulleted options above, my thought was for the core to handle that scanning directly, not retroarch.

There's always the option of adding parent handling to the core's options, or as a subsystem maybe (rather than selecting exact files, you provide it an entire directory to scan for children and their requested parents). That would handle the custom directory structure option, since it would recurse subdirectories. We'd need to hear from the core's devs on the matter to know if that solution is viable though.

i30817 wrote this answer on 2020-01-29

Won't be limited for long imo. CHD support will continue to spread just because it's the MAME format. It's also part of the arcade set, fwiw (not much since retroarch still crashes with hard drive chds on a scan last time i checked, so people have to move those MAME subdirs out of the scan dir - RA keeps trying to extract the 'serial' from the 'binary track' on those files - or use the manual scanner for MAME).

edit: also i think flycast supports chd, from some update twinaphex did about supporting MAME dreamcast/NAOMI images, bad idea that it is while the bug above isn't fixed.

negativeExponent wrote this answer on 2020-01-29

Why would an issue about scanning CHD hinder features like adding the ability to play using CHD files? Sounds like a hate post to me, since the two issues are unrelated...

i30817 wrote this answer on 2020-01-29

Because this chd feature needs to find the parent chd. It's related because the best way (as far as i can tell - you might disagree and if so, write it out) to find that file is to store the internal sha1 of each chd scanned 'somewhere' that the interested cores can find. When a core gets a chd that needs a parent (the chd records the parent's internal sha1 checksum), it would open that file and find the matching sha1 -> file mapping. If either of those files doesn't exist it should show an error ofc.

This would allow any files to be in any scanned directory, instead of depending on 'conventions' (that have the problems mentioned in the other posts) and it would prevent the need for the cores to crawl the filesystem and only crawl once during the scan.

I agree that a way to force this without the scanner should exist though.

rz5 wrote this answer on 2020-01-29

I'm trying to work on this feature and I am stepping through the CHD open functions to understand what's going on. I'm seeing some problems re

@Zapeth @rtissera - Pinging you both hoping you could provide clarifications on this.

See the following:

/* if we need a parent, make sure we have one */
if (parent == NULL && (newchd->header.flags & CHDFLAGS_HAS_PARENT))
    EARLY_EXIT(err = CHDERR_REQUIRES_PARENT);

/* extract the common data */
header->flags = get_bigendian_uint32(&rawheader[16]);
header->compression[0] = get_bigendian_uint32(&rawheader[20]);

In these snippets, libchdr assumes all CHD header versions have a 32-bit wide flags field at byte offset 16. This seems to be wrong for V5, based on these code comments:

V5 header:
[ 0] char tag[8]; // 'MComprHD'
[ 8] uint32_t length; // length of header (including tag and length fields)
[ 12] uint32_t version; // drive format version
[ 16] uint32_t compressors[4];// which custom compressors are used?
[ 32] uint64_t logicalbytes; // logical size of the data (in bytes)
[ 40] uint64_t mapoffset; // offset to the map
[ 48] uint64_t metaoffset; // offset to the first blob of metadata
[ 56] uint32_t hunkbytes; // number of bytes per hunk (512k maximum)
[ 60] uint32_t unitbytes; // number of bytes per unit within each hunk
[ 64] uint8_t rawsha1[20]; // raw data SHA1
[ 84] uint8_t sha1[20]; // combined raw+meta SHA1
[104] uint8_t parentsha1[20];// combined raw+meta SHA1 of parent
[124] (V5 header length)
If parentsha1 != 0, we have a parent (no need for flags)
If compressors[0] == 0, we are uncompressed (including maps)

For the purposes of this feature request, I would need to change how chd.c detects if a file needs a parent, something like this:

chd_error chd_open_file(...)
{
(...)

    /* if we need a parent, make sure we have one */
    if (parent == NULL)
    {
        /* V4 and below: the header carries a flags field */
        if (newchd->header.version < 5 && (newchd->header.flags & CHDFLAGS_HAS_PARENT))
            EARLY_EXIT(err = CHDERR_REQUIRES_PARENT);

        /* V5: no flags field; a non-zero parentsha1 means a parent is required
           (nullsha1 is assumed to be an all-zero buffer of the same size) */
        else if (newchd->header.version >= 5 && memcmp(nullsha1, newchd->header.parentsha1, sizeof(newchd->header.parentsha1)) != 0)
            EARLY_EXIT(err = CHDERR_REQUIRES_PARENT);
    }
    
(...)
}

After this is addressed, I would run into another problem. This libretro core accepts 1 file as the content. If the file is a CHD clone, the way I'd search for its parent file is to first get the clone's header info.
But I think the libchdr API doesn't allow me to open a clone file without specifying its parent first...
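
One possible workaround, sketched here under the assumption that reading the raw file directly is acceptable, is to pull the parent SHA1 out of the first 124 bytes using the V5 layout quoted above, without going through libchdr's open call. The function and macro names below are made up for illustration and are not libchdr API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets taken from the V5 header comment quoted above. */
#define CHD_V5_HEADER_LEN  124
#define CHD_V5_VERSION_OFF  12
#define CHD_V5_PARENT_OFF  104
#define CHD_SHA1_LEN        20

/* Read a big-endian 32-bit value. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 if the buffer holds a V5 header with a non-zero parentsha1
 * (copied to parent_sha1_out), 0 if it is a V5 header without a parent,
 * and -1 if the buffer is not a V5 header at all. */
int chd_v5_parent_sha1(const uint8_t *hdr, size_t len,
                       uint8_t parent_sha1_out[CHD_SHA1_LEN])
{
    static const uint8_t zero_sha1[CHD_SHA1_LEN] = {0};

    assert(hdr != NULL && parent_sha1_out != NULL);

    if (len < CHD_V5_HEADER_LEN || memcmp(hdr, "MComprHD", 8) != 0)
        return -1;
    if (read_be32(hdr + CHD_V5_VERSION_OFF) != 5)
        return -1;
    if (memcmp(hdr + CHD_V5_PARENT_OFF, zero_sha1, CHD_SHA1_LEN) == 0)
        return 0;

    memcpy(parent_sha1_out, hdr + CHD_V5_PARENT_OFF, CHD_SHA1_LEN);
    return 1;
}
```

A caller would read the first 124 bytes of the clone into a buffer, call this, and then compare the result against the sha1 field (offset 84 in the same layout) of each candidate parent.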

rtissera wrote this answer on 2020-01-30

Had a quick look at this.
Internally, it should be fairly easy to add parent / child support to libchdr.
It does need a couple of fixes and implementing missing stuff, but totally doable.

rz5 wrote this answer on 2020-01-31

Thanks for the feedback. I thought libchdr was ready for parent/clone usage, so I worked on the easy fixes/behavior. But even after successfully opening a clone - i.e. correctly opening the parent chd and passing its handle while opening the clone - the emulator just hung on the Sony logo.

So if there was doubt about libchdr being able to handle it, that settled it for me.

The work I did is here: https://github.com/libretro/beetle-psx-libretro/commits/r5/chd-clone-support

Sanaki wrote this answer on 2021-03-21

Clone support is now included in libchdr. In theory it should now be fully possible to implement.

More Details About Repo
Owner Name libretro
Repo Name beetle-psx-libretro
Full Name libretro/beetle-psx-libretro
Language C
Created Date 2014-12-03
Updated Date 2022-04-16
Star Count 258
Watcher Count 42
Fork Count 125
Issue Count 241
