mold does not support object files emitted by dmd (D compiler)

This issue has been tracked since 2021-11-03.

DMD has a pretty custom backend, so it's probably doing weird things.

Repro:

wget https://s3.us-west-2.amazonaws.com/downloads.dlang.org/releases/2021/dmd.2.098.0.linux.tar.xz
tar xf dmd.2.098.0.linux.tar.xz
echo 'module test; import std; void main() { writefln!"Hello World"; }' > hello.d
# adjust for mold dir
# WARNING: This step will OOM!
PATH=$HOME/mold:$PATH dmd2/linux/bin64/dmd hello.d -v
./hello

These are the steps I took trying to get mold to work:

  1. Mold OOMs.

Seems to be caused by having a mergeable section with sh_entsize=0? Since I don't know anything about ELF, let's speculatively set entsize to 1 and move on for now.

  2. bad relocation in libphobos2.a

This is D's standard library. Stub out the error and move on.

  3. bad symbol value in libphobos2.a

Stub out the error and move on...

Then it creates a binary, but the binary prints garbage instead of "Hello World."

Oddity: printf("Hello World\n"); works. Something about having the string in a different object file than the function...?

... Actually, I'm having a hard time making this issue appear in any other way than "string passed at compiletime to a template function from the stdlib". So something odd happens there.

Anyway, at least the OOM should probably be fixed regardless?

rui314 wrote this answer on 2021-11-04

Confirmed. Looks like there's a bug in mold. Thank you for reporting! I'll investigate and fix it.

rui314 wrote this answer on 2021-11-04

It turns out this is not a bug in mold. There are in fact several issues in the DMD-generated object file.

First, sh_entsize=0 violates the ELF specification. Here is what is happening: DMD creates a .rodata.str1.1 section to hold string literals. The section has the SHF_MERGE and SHF_STRINGS flags, indicating that it contains null-terminated strings. The linker is expected to split the section contents into null-terminated strings and merge them by content (so that if identical string literals appear in two different object files, they are merged into a single string in the final output file). The sh_entsize field of such a section should hold the size of a character in that section, i.e. 1 for regular C-style strings, 2 for UTF-16 strings (u"" strings in C terminology), and 4 for UTF-32. My understanding is that 0 is just invalid. In this case, it should have been 1.
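To make the split-and-merge step concrete, here is a minimal Python sketch of the idea. It is an illustration only, not mold's implementation; the function names `split_strings` and `merge_sections` are hypothetical. It also shows why sh_entsize=0 is awkward: a splitter that steps by sh_entsize cannot advance when the value is 0, so the sketch rejects it up front.

```python
def split_strings(data: bytes, entsize: int) -> list[bytes]:
    """Split a string section into null-terminated pieces (terminator kept)."""
    if entsize <= 0:
        # Stepping by 0 would never advance, so treat it as invalid input.
        raise ValueError("sh_entsize must be positive for a string section")
    terminator = b"\0" * entsize
    pieces = []
    pos = 0
    while pos < len(data):
        end = pos
        # Walk forward one character (entsize bytes) at a time.
        while data[end:end + entsize] != terminator:
            end += entsize
            if end >= len(data):
                raise ValueError("string is not null-terminated")
        pieces.append(data[pos:end + entsize])
        pos = end + entsize
    return pieces


def merge_sections(sections: list[bytes], entsize: int = 1):
    """Merge identical strings from several input sections into one output."""
    offsets: dict[bytes, int] = {}  # piece -> offset in the merged output
    out = bytearray()
    for data in sections:
        for piece in split_strings(data, entsize):
            if piece not in offsets:
                offsets[piece] = len(out)
                out += piece
    return bytes(out), offsets


# Two object files that both contain the literal "baz".
merged, offsets = merge_sections([b"foobar\0baz\0", b"baz\0qux\0"])
print(merged)   # b'foobar\x00baz\x00qux\x00'
print(offsets)  # "baz" is stored once, at offset 7
```

The second copy of "baz" is dropped, which is the deduplication the SHF_MERGE and SHF_STRINGS flags are asking for.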

Second, there are actually invalid relocations in the DMD-generated object file. Since the contents of .rodata.str1.1 are split into multiple null-terminated strings and merged with string literals from other object files, the relative positions of strings in the section are not significant. Let's assume that a .rodata.str1.1 section contains foobar\0baz\0. You can't generate a relocation at offset 2 of that section to mean "5 bytes before string baz", because such a relocation will simply be interpreted as pointing to the 3rd byte of string foobar. However, DMD generates lots of such out-of-bounds relocations.
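The effect can be sketched in a few lines of Python. This is a simplified model, not mold's code: the helper `locate` and the output addresses in `out_addr` are made up for illustration. A linker that merges strings resolves a section offset to "piece plus offset within the piece", so an offset that falls inside foobar follows foobar wherever it ends up, regardless of where baz lands.

```python
def locate(pieces: list[bytes], offset: int) -> tuple[int, int]:
    """Map a section offset to (piece index, offset within that piece)."""
    start = 0
    for index, piece in enumerate(pieces):
        if start <= offset < start + len(piece):
            return index, offset - start
        start += len(piece)
    raise ValueError("offset outside section")


# Input section contents, already split into null-terminated strings.
pieces = [b"foobar\0", b"baz\0"]

# Hypothetical output addresses after merging; the pieces are no longer
# adjacent and their order is not preserved.
out_addr = {b"foobar\0": 100, b"baz\0": 40}


def resolve(offset: int) -> int:
    index, within = locate(pieces, offset)
    return out_addr[pieces[index]] + within


print(resolve(7))  # 40: start of "baz"
print(resolve(2))  # 102: inside "foobar", unrelated to where "baz" went
```

In the input section, offset 2 sits before baz. After merging, the resolved address is after baz, so any code that relied on the distance between the two is broken.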

So, why does the DMD-generated object file work with other linkers? It's because other linkers ignore the SHF_MERGE and SHF_STRINGS flags when the section has an invalid sh_entsize value. As a result, .rodata.str1.1 is treated as a regular, non-splittable section. Since relative positions in a regular section are significant, the out-of-bounds relocations happen to work. However, that is obviously not what DMD is trying to achieve.

I'll implement a workaround in mold so that it behaves the same as other linkers.
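As a rough sketch of what "behave the same as other linkers" could look like, assuming the behavior described above (the flags are ignored when sh_entsize is 0): the function name below is hypothetical and this is not the actual mold patch. The flag values are the standard ELF ones.

```python
# Standard ELF section flag values.
SHF_ALLOC = 0x2
SHF_MERGE = 0x10
SHF_STRINGS = 0x20


def is_mergeable_string_section(sh_flags: int, sh_entsize: int) -> bool:
    """Decide whether to split and merge a section as strings.

    A section with sh_entsize == 0 is treated as a regular, non-splittable
    section even if SHF_MERGE and SHF_STRINGS are set.
    """
    wants_merge = bool(sh_flags & SHF_MERGE) and bool(sh_flags & SHF_STRINGS)
    return wants_merge and sh_entsize != 0


flags = SHF_ALLOC | SHF_MERGE | SHF_STRINGS
print(is_mergeable_string_section(flags, 1))  # True: well-formed string section
print(is_mergeable_string_section(flags, 0))  # False: the DMD case, kept as-is
```

Falling back to a regular section keeps the relative positions inside .rodata.str1.1 intact, which is why the out-of-bounds relocations keep working, at the cost of not deduplicating those strings.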

Do you mind if I ask you to report the above problem to DMD? Currently, DMD is emitting bogus .rodata.str1.1 sections, which prevent a linker from merging identical string literals.

rui314 wrote this answer on 2021-11-04

I submitted a workaround in the above commit. Can you git pull and try again? Thanks!

FeepingCreature wrote this answer on 2021-11-04

I'll try at work tomorrow, thanks!

FeepingCreature wrote this answer on 2021-11-05

https://issues.dlang.org/show_bug.cgi?id=22483 Filed, thanks for the support.

The patched mold still creates binaries that fail when I run it against larger tests, but I've run out of free time on this, so I'll come back to it later.

rui314 wrote this answer on 2021-11-05

If you observed the issue with an open source project, I can build it myself to try to reproduce the issue for you.

FeepingCreature wrote this answer on 2021-11-05

Hm, it doesn't seem to happen for totally trivial projects, but here's a repro:

wget https://s3.us-west-2.amazonaws.com/downloads.dlang.org/releases/2021/dmd.2.098.0.linux.tar.xz
tar xf dmd.2.098.0.linux.tar.xz
# generic serialization lib
git clone https://github.com/funkwerk/serialized.git
cd serialized
PATH=$PWD/../dmd2/linux/bin64:$PATH dub test
# observe test runner working.
# try again with mold...
rm -rf build .obj
# adjust for mold dir
PATH=$PWD/../mold:$PWD/../dmd2/linux/bin64:$PATH dub test
# observe segfault.

Sorry I don't have time for a better reduction rn.

rui314 wrote this answer on 2021-12-25

@FeepingCreature Thank you for sharing the instructions to reproduce the issue. The test indeed crashed on my machine. I fixed it in the above patch.

rui314 wrote this answer on 2021-12-28

My previous fix caused a regression, so I reverted it. I'll land another patch to fix this issue.

