What is an ELF Export?
This is an article on how one might define what an ELF export is and, by way of relation, what an ELF import is. To avoid confusion and consternation: it is only indirectly about ELF globally visible symbols and other ELF-specific mechanisms, although I believe the two are frequently conflated. The latter are specific to the ELF binary format, whereas an "ELF export" is what I mean by the ELF format's expression of an exported symbol.
Multiple binary format containers express the concept of an exported symbol in different ways, but I think they are all approximating, in their own unique ways, two principles which I sketch later on, along with the following characterization.
Put simply, an export is a datum, such as an integer or a string, or a sequence of instructions which execute in some predefined and semantically coherent manner - a routine, or a function - which is available for public use. An exported symbol is therefore a name pointing to either an integer or a string, or a function, and made available for use. Calling anything else an export seems to me a failing of the binary format, typically because we have repurposed its existing mechanisms because they failed us in expressing a new (or older) concept.
The C concept of extern essentially implements this notion at the source level.
Similarly, an import is the dual or inverse of an export, and will have a similar characterization.
And it is this characterization of an "export" that I have in mind (and I'm pretty sure you do too) when I ask "What is an ELF export?"
The casus belli of this article however is that I can't seem to find anything on the internet explicitly detailing or logically defining what makes something an ELF export or not (focusing on the ELF part). This answer says that if a symbol is not SHN_UNDEF then it's "defined and exported". That's the best I've seen, but it's not correct.
Normally you just see people telling you to use nm or readelf to get the imports or exports, which isn't an explanation, but a means (in the age of cargo culting, however, this distinction might not mean much).
According to this somewhat official looking ELF specification:
Global symbols are visible to all object files being combined. One file’s definition of a global symbol will satisfy another file’s undefined reference to the same global symbol. [1-18]
So there you have it. Except I didn't ask what a global symbol was; I asked what an ELF export was. And as it turns out, even if we accept globalness, or non-SHN_UNDEF-ness, as a definition, it's a little more complicated than that, as most things are.
If you want to go down the hole, then read on; otherwise just skip to the conclusion, where you'll find what I consider a robust, logical definition of ELF imports and exports, as I characterized them above.
ELF Exports
Take a symbol like GLIBC_2.2.5 (yes, it's a real symbol; take a peek inside the _DYNAMIC array in /usr/lib/ld-linux-x86-64.so.2): it's listed as a GLOBAL OBJECT with an ABS shndx, and no address (this might actually make sense).
But if "all defined global symbols are exports", then GLIBC_2.2.5 must be an exported symbol from ld-linux-x86-64.so.2, right?
If you answered yes, then you'd be wrong, because it's definitely not defined in ld-linux-x86-64.so.2, but rather in libc.so.6. So then our definition isn't quite right, because strictly speaking, GLIBC_2.2.5 isn't undefined (that would be shndx = 0), so it's a little more complicated than just "anything not undefined is exported and defined".
So it must be an exported symbol in libc.so.6. Well, it occurs there exactly as it does in ld-linux-x86-64.so.2, so who knows what to make of that.
Of course someone will perhaps lampoon me at this point, saying it's a way of versioning the function calls using VERNEED or VERDAUX or something silly like that in the _DYNAMIC array, etc. memcpy, for example, is like this. In my setup, as of this writing, my libc.so.6 defines two memcpys, one for each of two different versions of libc.
Which is fine; but it seems clear that GLIBC_2.2.5 isn't really an exported symbol in the sense you or I typically mean, but has been kind of grafted on with the available ELF #defines that we call types, and repurposed for those uses. Which, as I said, is fine.
And I say it's not in the sense you or I typically mean because I'm pretty sure if I monkey-patch a binary right now and import GLIBC_2.2.5
, weird stuff's going to happen. More on this particular point a little later.
This is perhaps a corner case; more importantly, we should be able to write a program that takes a binary and outputs the exports for that binary. In other words, the "services" it provides for all to use.
After all, some program has to be doing this (it's actually called ld-linux-x86-64.so.2
, which is the default dynamic linker on (my) linux).
So in this export-printing program, we have to tell it exactly what an ELF export is: and now we can say that an ELF export is every symbol which is GLOBAL
, defined, and doesn't have an empty (0x0
) address offset.
Well, that also doesn't work; we actually need to count WEAK
symbols as exported symbols, like vsnprintf
and a host of other symbols in libc.so.6
.
Unfortunately, adding this to our criterion won't work either; compile a crappy binary that increments 0xdeadbeef
or something. Now run something like readelf --syms <your crappy binary>
, and you'll see a bunch of... defined GLOBAL
and WEAK
symbols with address offsets that are non-zero.
Since I'm pretty sure _DYNAMIC
isn't exported by your binary (or _edata
, or the various other nlist symbols for use in debugging), otherwise the (linux) world might explode, this means our definition still isn't right. (Actually, this might be a future exercise, seeing what happens when a symbol named _DYNAMIC
is exported)
Which brings us to our final revision: an ELF symbol is exported iff it is (GLOBAL or WEAK), is defined, has a non-zero address, and has an entry in a SYMTAB table in the _DYNAMIC array.
So there you have it.
Unfortunately, as has been noted here, many programs mess this up.
If I run nm -g /usr/lib/libz.so.1.2.8 on Arch Linux, which as of this writing strips libz, nm returns no symbols. But I assure you, there are plenty of symbols exported by libz. If you have the ability to install a package like elfutils, readelf --dyn-syms /usr/lib/libz.so.1.2.8 will show you that.
Unfortunately, as the article above mentions, section headers are also optional in ELF binaries, and you can sstrip a binary or dynamic lib to really confuse some (looks like most?) of these programs. For me, sstripping (a copy of) libz and running the same command from above does not produce any symbols... ಠ_ಠ
While there is a lot of good information in the section headers and the symbol table/nlist if it's available, from a practical perspective (I'd also say logical/philosophical), an exported symbol is at least that which is visible to the dynamic linker; therefore, if a program which purportedly outputs a program's exports relies entirely on section headers, and not the _DYNAMIC array, then it's not really telling us what the exports are, because the dynamic linker only cares about what's in _DYNAMIC.
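To make the "ask _DYNAMIC, not the section headers" point concrete, here is a minimal sketch in Python. It assumes a little-endian ELF64 file; the function name dynamic_entries is mine, and the numeric constants are the usual values from elf.h. It walks the program headers, finds the PT_DYNAMIC segment, and collects its tag/value pairs without ever touching e_shoff, so it should keep working on an sstripped file.

```python
import struct

# Little-endian ELF64 layouts (sizes: 64, 56 and 16 bytes respectively).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr
DYN = struct.Struct("<qQ")                 # Elf64_Dyn: d_tag, d_val

PT_DYNAMIC = 2
DT_NULL, DT_STRTAB, DT_SYMTAB = 0, 5, 6

def dynamic_entries(image: bytes) -> dict:
    """Return {d_tag: d_val} from PT_DYNAMIC, never consulting section headers.

    Tags that may repeat (DT_NEEDED, for instance) keep only their last value,
    which is good enough for locating DT_SYMTAB and DT_STRTAB.
    """
    hdr = EHDR.unpack_from(image, 0)
    ident, e_phoff, e_phentsize, e_phnum = hdr[0], hdr[5], hdr[9], hdr[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    for i in range(e_phnum):
        fields = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = fields[0], fields[2], fields[5]
        if p_type != PT_DYNAMIC:
            continue
        entries = {}
        for off in range(p_offset, p_offset + p_filesz, DYN.size):
            d_tag, d_val = DYN.unpack_from(image, off)
            if d_tag == DT_NULL:  # DT_NULL terminates the array
                break
            entries[d_tag] = d_val
        return entries
    return {}  # no PT_DYNAMIC segment: a static binary, most likely
```

Something like dynamic_entries(open(path, "rb").read()).get(DT_SYMTAB) then gives the address of the dynamic symbol table. Note that this value is a virtual address, so a real tool still has to translate it to a file offset through the PT_LOAD segments before reading any symbols.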
ELF Imports
So, I'm going to say something controversial about what an (any) export is, and in doing so, introduce the first recursive principle of imports and exports:
(1) an export is something that can be imported
I think there's something deeply mystical, mutually recursive, and satisfying about that, and I like it --- because now we need to start thinking about what an import is. You might think that an ELF import is just every undefined symbol, like the stack overflow answer suggested.
That's wrong too, unfortunately, at least technically speaking. The first entry of every symbol table, if the binary conforms to the ELF specification, must be the "null" entry; as a result, there always exists at least one undefined, LOCAL symbol, with no name.
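The null entry is easy to see by decoding one by hand. The following is a small sketch, assuming the little-endian Elf64_Sym layout; decode_sym and naive_is_import are names I made up for illustration. It shows that an all-zero entry passes the naive "undefined means imported" test.

```python
import struct
from collections import namedtuple

# Elf64_Sym, little-endian: st_name (u32), st_info (u8), st_other (u8),
# st_shndx (u16), st_value (u64), st_size (u64); 24 bytes in total.
ELF64_SYM = struct.Struct("<IBBHQQ")
Sym = namedtuple("Sym", "st_name bind type st_shndx st_value st_size")

STB_LOCAL, STB_GLOBAL, STB_WEAK = 0, 1, 2
STT_NOTYPE = 0
SHN_UNDEF = 0

def decode_sym(raw: bytes) -> Sym:
    st_name, st_info, _st_other, st_shndx, st_value, st_size = ELF64_SYM.unpack(raw)
    # st_info packs the binding in the high nibble and the type in the low one.
    return Sym(st_name, st_info >> 4, st_info & 0xF, st_shndx, st_value, st_size)

def naive_is_import(sym: Sym) -> bool:
    """The "every undefined symbol is an import" definition."""
    return sym.st_shndx == SHN_UNDEF

# The mandatory first entry of a symbol table is 24 zero bytes.
null_entry = decode_sym(bytes(ELF64_SYM.size))
print(null_entry.bind == STB_LOCAL, null_entry.st_name)  # True 0
print(naive_is_import(null_entry))                       # True
```

So the naive definition counts a nameless LOCAL symbol as an import, which is one reason to constrain on st_name and binding as well.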
Of course one might argue it's still an import, which just sounds like they're arguing it is an undefined symbol; and similarly, for exports, that they're simply globally visible symbols.
But I think the philosophical crux of my point is that an import isn't just an undefined symbol and an export isn't just a globally visible symbol - they're subject to (or should be) structured, semantic constraints, and this characterization is something that mach-o binaries really nailed, so much so that it's a pleasure analyzing them. See this long-winded analysis on mach-o binaries for how exports are very structured, constrained things (they only occur in the export trie); and how imports are very structured, constrained things (they only occur in the imports FSA).
To illustrate with a real example, one which is currently vexing me (maybe someone can send a nasty (or nice) email telling me what's going on here): occasionally, you find undefined NOTYPE WEAK symbols in the _DYNAMIC array.
As of this writing, in libc.so.6
, _dl_starting_up
is such a symbol. The _dl
indicates it comes from the dynamic loader's API, which ld-linux-x86-64.so.2
provides.
This is the API-qua-exports of linux's dynamic linker:
Address Offset | Symbol Name (Size) |
---|---|
8ce0 | _dl_rtld_di_serinfo (535) |
fa00 | _dl_debug_state (2) |
113d0 | _dl_mcount (590) |
11d30 | _dl_tls_setup (165) |
11de0 | _dl_get_tls_static_info (21) |
11ea0 | _dl_allocate_tls_init (621) |
12110 | _dl_allocate_tls (49) |
12150 | _dl_deallocate_tls (132) |
124a0 | __tls_get_addr (63) |
128a0 | _dl_make_stack_executable (136) |
13310 | _dl_find_dso_for_object (177) |
16950 | __libc_memalign (248) |
16a50 | malloc (13) |
16a60 | calloc (59) |
16aa0 | free (46) |
16c10 | realloc (138) |
21c40 | _rtld_global_ro (296) |
21d78 | _dl_argv (8) |
21de0 | __libc_stack_end (8) |
21de8 | __libc_enable_secure (4) |
22000 | _rtld_global (3968) |
223100 | _r_debug (40) |
Unfortunately, _dl_starting_up
is nowhere to be seen. Lucky for us, libc.so.6
is executable - yea, you heard that right, we can run the libc dynamic library. And if we can run it, then with some esoteric dynamic linking environment variable knowledge, we can force all those stupid lazily bound symbols to be more like ML and not Haskell, and bind on startup.
To wit, we can do something like:
$ LD_BIND_NOW=1 LD_DEBUG=all /usr/lib/libc.so.6 2>&1 >/dev/null | grep -C 2 _dl_starting_up
22011: symbol=optind; lookup in file=/usr/lib/libc.so.6 [0]
22011: binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `optind' [GLIBC_2.2.5]
22011: symbol=_dl_starting_up; lookup in file=/usr/lib/libc.so.6 [0]
22011: symbol=_dl_starting_up; lookup in file=/usr/lib/ld-linux-x86-64.so.2 [0]
22011: symbol=stdout; lookup in file=/usr/lib/libc.so.6 [0]
22011: binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `stdout' [GLIBC_2.2.5]
Unfortunately, what that tells us is that the symbol _dl_starting_up
was not bound, because it wasn't found in ld-linux-x86-64.so.2
, because it's not exported by that library.
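Eyeballing LD_DEBUG output gets old quickly, so here is a rough sketch of how one might pull the "looked up but never bound" symbols out of such a log. The regular expressions are my guess at the line shapes, based only on the excerpt above; other glibc versions may format these lines differently, and unbound_symbols is a made-up name.

```python
import re

# Line shapes taken from the excerpt above; treat them as assumptions.
LOOKUP = re.compile(r"symbol=([^;\s]+);\s+lookup in file=(\S+)")
BINDING = re.compile(r"binding file \S+ \[\d+\] to \S+ \[\d+\]: \w+ symbol `([^']+)'")

def unbound_symbols(log: str) -> dict:
    """Map each symbol that was looked up but never bound to the files searched."""
    searched, bound = {}, set()
    for line in log.splitlines():
        if m := LOOKUP.search(line):
            searched.setdefault(m.group(1), []).append(m.group(2))
        elif m := BINDING.search(line):
            bound.add(m.group(1))
    return {sym: files for sym, files in searched.items() if sym not in bound}

SAMPLE = """\
22011: symbol=optind; lookup in file=/usr/lib/libc.so.6 [0]
22011: binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `optind' [GLIBC_2.2.5]
22011: symbol=_dl_starting_up; lookup in file=/usr/lib/libc.so.6 [0]
22011: symbol=_dl_starting_up; lookup in file=/usr/lib/ld-linux-x86-64.so.2 [0]
22011: symbol=stdout; lookup in file=/usr/lib/libc.so.6 [0]
22011: binding file /usr/lib/libc.so.6 [0] to /usr/lib/libc.so.6 [0]: normal symbol `stdout' [GLIBC_2.2.5]
"""

print(unbound_symbols(SAMPLE))
# {'_dl_starting_up': ['/usr/lib/libc.so.6', '/usr/lib/ld-linux-x86-64.so.2']}
```

Feed it the whole log rather than a grep -C window, otherwise a binding line that falls outside the window will make a perfectly well-bound symbol look orphaned.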
And this is ok, because _dl_starting_up
is a weak symbol. But then we need to ask ourselves: is it really an imported symbol if it doesn't have a type, and isn't exported by any library? I'm sure it has some function (likely another repurposing hack because the binary format failed us), and I really am looking forward to an email explaining this bit of low-level arcana, but I think it illustrates my second recursive principle:
(2) All imported symbols must be exported from somewhere
To put this a bit more formally: for all x, if x is an imported symbol, then there exists a y, such that y is a library and y exports x.
As stated, this is only a necessary condition for imported symbolhood, and not a sufficient one (if we remember our formal logic; doubtful?). It may turn out to be a biconditional; I haven't decided yet.
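Principle (2) is simple enough to check mechanically once you have the name sets in hand. A toy sketch follows: orphan_imports is a hypothetical helper, the export set is a handful of names from the table above, and the import set is purely illustrative rather than a real readout of libc.so.6.

```python
def orphan_imports(imports, exports_by_library):
    """Return the would-be imports that no known library exports,
    i.e. the symbols that violate principle (2)."""
    exported = set()
    for names in exports_by_library.values():
        exported |= set(names)
    return set(imports) - exported

# A few of the dynamic linker's exports, taken from the table above.
exports_by_library = {
    "ld-linux-x86-64.so.2": {"_dl_argv", "__tls_get_addr", "_r_debug", "malloc"},
}
# Illustrative candidate imports only.
candidates = {"__tls_get_addr", "_dl_starting_up"}

print(orphan_imports(candidates, exports_by_library))  # {'_dl_starting_up'}
```

The obvious catch is the quantifier: "there exists a library" ranges over every library you could possibly load, so in practice a check like this is only as good as the set of libraries you bothered to scan.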
Conclusion
To conclude, given the richness of ELF and GNU/Linux visibility conditions, definedness, and symbol types, being an export, in my opinion, is fuzzier than I'd like, especially compared to other binary container formats (e.g., mach-o). But I think it is possible to give a set of logical constraints which approximate the characterization of an "exported symbol" I gave at the start.
As such, I embarked on a philosophical discussion of what it is to be an ELF export, and ended up with two "recursive" principles of import and exporthood; pragmatic C hackers and reverse engineers, if they've read this far, have probably gone apoplectic. Sorry about that.
And unfortunately, performing a depth-first search of the internet only turned up dubious explanations or program suggestions for determining ELF exports and imports, whereas I wanted a logical definition.
If you are like me, or you just need to write a program that prints what I'll coin right now as the semantically tenable logical exports (and imports) of an ELF binary, then I present to you, in all their pseudo-logic glory:
The Logical ELF Import and Export Definitions 
A symbol x is an ELF export iff:
x.st_value ≠ 0x0 ∧ x.st_shndx ≠ SHN_UNDEF ∧ (x ∈ SYMTAB ∧ SYMTAB ∈ _DYNAMIC) ∧ (x.st_bind = GLOBAL ∨ x.st_bind = WEAK) ∧ (x.st_type = IFUNC ∨ x.st_type = FUNC ∨ x.st_type = OBJECT)
A symbol x is an ELF import iff:
x.st_value = 0x0 ∧ x.st_shndx = SHN_UNDEF ∧ x.st_name ≠ 0 ∧ (x ∈ SYMTAB ∧ SYMTAB ∈ _DYNAMIC) ∧ (x.st_bind = GLOBAL ∨ x.st_bind = WEAK) ∧ (x.st_type = IFUNC ∨ x.st_type = FUNC ∨ x.st_type = OBJECT)
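For those who prefer their logic executable, the two definitions translate almost word for word into predicates. This is a sketch rather than a finished tool: the Sym record and its in_dynsym flag (standing in for "x ∈ SYMTAB ∧ SYMTAB ∈ _DYNAMIC") are my own simplifications, the constants are the usual elf.h values, and the strict_type switch is an addition of mine for anyone who finds the type clause too strict.

```python
from collections import namedtuple

# in_dynsym is a hypothetical flag for "x ∈ SYMTAB ∧ SYMTAB ∈ _DYNAMIC".
Sym = namedtuple("Sym", "st_name st_bind st_type st_shndx st_value in_dynsym")

SHN_UNDEF, SHN_ABS = 0, 0xFFF1
STB_LOCAL, STB_GLOBAL, STB_WEAK = 0, 1, 2
STT_NOTYPE, STT_OBJECT, STT_FUNC, STT_GNU_IFUNC = 0, 1, 2, 10
LINKABLE_TYPES = {STT_OBJECT, STT_FUNC, STT_GNU_IFUNC}

def _linkable(x, strict_type):
    # The clauses shared by both definitions: dynamic table, binding, type.
    return (x.in_dynsym
            and x.st_bind in (STB_GLOBAL, STB_WEAK)
            and (not strict_type or x.st_type in LINKABLE_TYPES))

def is_export(x, strict_type=True):
    return x.st_value != 0 and x.st_shndx != SHN_UNDEF and _linkable(x, strict_type)

def is_import(x, strict_type=True):
    return (x.st_value == 0 and x.st_shndx == SHN_UNDEF and x.st_name != 0
            and _linkable(x, strict_type))

# Made-up entries shaped like the cases discussed in this article.
weak_func = Sym(1, STB_WEAK, STT_FUNC, 12, 0x1000, True)            # vsnprintf-like
version_tag = Sym(2, STB_GLOBAL, STT_OBJECT, SHN_ABS, 0, True)      # GLIBC_2.2.5-like
weak_notype = Sym(3, STB_WEAK, STT_NOTYPE, SHN_UNDEF, 0, True)      # _dl_starting_up-like
null_entry = Sym(0, STB_LOCAL, STT_NOTYPE, SHN_UNDEF, 0, True)

print(is_export(weak_func), is_import(weak_func))                   # True False
print(is_export(version_tag), is_import(version_tag))               # False False
print(is_import(weak_notype), is_import(weak_notype, False))        # False True
print(is_export(null_entry), is_import(null_entry))                 # False False
```

Under these definitions the version tag and the null entry are neither imports nor exports, which is the behavior I was after.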
Appendix
A
ELF Type Conditions in Logical Import and Export Definitions
So the final condition in the logical ELF import and export definitions is (x.st_type = IFUNC ∨ x.st_type = FUNC ∨ x.st_type = OBJECT )
, which might be too strict for some.
This is a reasonable concern; for example, many binaries import _Jv_RegisterClasses
which has NOTYPE
, GLOBAL
, an undefined shndx
and a zero address. This symbol even has exporting libraries: /usr/lib/libgcj_bc.so
and /usr/lib/libgcj.so.15.0.0
.
If it weren't for the NOTYPE
it would qualify as an import, and have the bonus of satisfying the recursive principles.
Interestingly, unlike something like __gmon_start__, which is exported nowhere, this symbol (contrary to what its st_type says in the importing binaries) actually is a function, and in the exporting libraries it satisfies the logical definition of an ELF export: it is a FUNC, it's GLOBAL, it's defined, and it has an address.
Disassembling gives the added bonus of coherent instruction sequences. E.g., for me, at offset 0x1320
in /usr/lib/libgcj_bc.so
we have:
.text
subq $8, %rsp
xorl %eax, %eax
callq -395
This could be an oversight/bug in the compiler, and its st_type really should be FUNC - it's hard to say.
But in the case of something like __gmon_start__, which is used by /usr/bin/gprof (but isn't exported by that binary), the case for its not really being an import is more reasonable - so different NOTYPE GLOBAL undefined zero-address symbols seem to have different behavior.
As such, for practical reasons, it might be prudent to let the last clause in the logical ELF import and export definition be optional, for unusual cases like this (depending on your use case).
In the end, you're free to do whatever you want. Just don't hurt anybody!
B
_dl_starting_up
For the curious, we can actually find the symbol reference in the rtld.c
source code, beginning at line 110:
#ifndef HAVE_INLINED_SYSCALLS
/* Set nonzero during loading and initialization of executable and
libraries, cleared before the executable's entry point runs. This
must not be initialized to nonzero, because the unused dynamic
linker loaded in for libc.so's "ld.so.1" dep will provide the
definition seen by libc.so's initializer; that value must be zero,
and will be since that dynamic linker's _dl_start and dl_main will
never be called. */
int _dl_starting_up = 0;
rtld_hidden_def (_dl_starting_up)
#endif
According to this obtuse piece of documentation, I should be able to inspect _dl_starting_up
in a debugger; if I mess around with this or figure out what's going on, I'll post back here.
Or again, someone well versed in this weirdness can enlighten me through email, and I'll put their answer here.