What is an ELF Export?
This is an article on how one might define what an ELF export is and, by extension, what an ELF import is. To avoid confusion and consternation: it is only indirectly about ELF globally visible symbols and other ELF-specific mechanisms, although I believe the two are frequently conflated. Global visibility is a mechanism specific to the ELF binary format, whereas by "ELF export" I mean the ELF format's expression of an exported symbol in general.
Binary container formats express the concept of an exported symbol in different ways, but I think they are all approximating, each in its own way, the characterization below, along with two principles which I sketch later on.
Put simply, an export is a datum, such as an integer or a string, or a sequence of instructions that executes in some predefined and semantically coherent manner (a routine, or a function), which is available for public use. An exported symbol is therefore a name referring to such a datum or function, made available for use. Calling anything else an export seems to me a failing of the binary format, typically because we have repurposed its existing mechanisms after they failed to express a new (or older) concept.
The C concept of extern essentially implements this notion at the source level.
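At the source level, the distinction can be sketched in C roughly as follows. This is a minimal sketch assuming a GCC- or Clang-style toolchain that builds the file as a shared library with default symbol visibility (e.g. -shared -fPIC); every name in it is hypothetical.

```c
/* External linkage: each of these is emitted as a GLOBAL symbol. Built into a
   shared library with default visibility, it would be exported through the
   library's dynamic symbol table. */
int exported_counter = 0;

int exported_increment(void)
{
    return ++exported_counter;
}

/* Internal linkage: at most a LOCAL symbol in .symtab, never a dynamic export. */
static int private_double(int x)
{
    return 2 * x;
}

int exported_double(int x)
{
    return private_double(x);
}

/* A declaration only: it emits nothing by itself. Once some code references
   it, it becomes an undefined symbol (an import) that something else must
   define. */
extern int imported_value;
```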
Similarly, an import is the dual or inverse of an export, and will have a similar characterization.
And it is this characterization of an "export" that I have in mind (and I'm pretty sure you do too) when I ask "What is an ELF export?"
The casus belli of this article, however, is that I can't seem to find anything on the internet explicitly detailing or logically defining what makes something an ELF export (focusing on the ELF part). This Stack Overflow answer says that if a symbol is not SHN_UNDEF, then it's "defined and exported". That's the best I've seen, but it's not correct.
Normally you just see people telling you to use readelf to get the imports or exports, which isn't an explanation but a means (in the age of cargo culting, however, this distinction might not mean much).
According to this somewhat official-looking ELF specification:
Global symbols are visible to all object files being combined. One file’s definition of a global symbol will satisfy another file’s undefined reference to the same global symbol. [1-18]
So there you have it. Except I didn't ask what a global symbol was; I asked what an ELF export was. And as it turns out, even if we accept globalness, or non-SHN_UNDEF-ness, as a definition, it's a little more complicated than that, as most things are.
If you want to go down the rabbit hole, read on; otherwise, just skip to the conclusion, where you'll find what I consider a robust, logical definition of ELF imports and exports, as I characterized them above.
Take a symbol like GLIBC_2.2.5 (yes, it's a real symbol; take a peek inside the _DYNAMIC array in /usr/lib/ld-linux-x86-64.so.2): it's listed as an OBJECT with an ABS shndx and no address (which might actually make sense).
But if "all defined global symbols are exports", then GLIBC_2.2.5 must be an exported symbol from that library, right? If you answered yes, then you'd be wrong, because it's definitely not defined in ld-linux-x86-64.so.2, but rather in libc.so.6. So our definition isn't quite right: strictly speaking, GLIBC_2.2.5 isn't undefined (that would mean shndx = 0), so it's a little more complicated than just "anything not undefined is exported and defined".
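To make the counterexample concrete, here is a minimal sketch of the naive rule as a C predicate over the Elf64_Sym structure from glibc's <elf.h>, applied to a hand-built symbol shaped like the GLIBC_2.2.5 entry described above (GLOBAL OBJECT, SHN_ABS, value 0). The symbol is constructed by hand for illustration rather than read from a real file, and both names are hypothetical.

```c
#include <elf.h>
#include <stdbool.h>

/* The naive rule: any symbol whose section index isn't SHN_UNDEF is
   "defined and exported". */
static bool naive_is_export(const Elf64_Sym *s)
{
    return s->st_shndx != SHN_UNDEF;
}

/* Hand-built symbol shaped like the GLIBC_2.2.5 entry: GLOBAL OBJECT,
   section index SHN_ABS, value 0, size 0. The st_name offset is a
   placeholder; no string table is involved here. */
static const Elf64_Sym glibc_2_2_5_like = {
    .st_name  = 1,
    .st_info  = ELF64_ST_INFO(STB_GLOBAL, STT_OBJECT),
    .st_other = STV_DEFAULT,
    .st_shndx = SHN_ABS,
    .st_value = 0,
    .st_size  = 0,
};
```

The naive rule accepts this symbol, even though there is nothing at its (zero) address to use.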
So it must be an exported symbol in libc.so.6? Well, it occurs there exactly as it does in ld-linux-x86-64.so.2, so who knows what to make of that.
Of course, someone will perhaps lampoon me at this point, saying it's a way of versioning function calls using VERDAUX or something silly like that in the _DYNAMIC array, etc.
memcpy, for example, is like this. In my setup, as of this writing, my libc.so.6 defines two memcpy symbols, each tied to a different version of libc.
That's fine, but it seems clear that GLIBC_2.2.5 isn't really an exported symbol in the sense you or I typically mean; rather, it has been grafted on using the available ELF #defines that we call types, and repurposed for versioning. Which, as I said, is fine.
And I say it's not an export in the sense you or I typically mean because I'm pretty sure that if I monkey-patched a binary right now to import GLIBC_2.2.5, weird stuff would happen. More on this particular point a little later.
This is perhaps a corner case, but importantly, we should be able to write a program that takes a binary and outputs its exports; in other words, the "services" it provides for all to use.
After all, some program has to be doing this (it's actually ld-linux-x86-64.so.2, the default dynamic linker on (my) Linux).
So in this export-printing program, we have to tell it exactly what an ELF export is, and now we can say that an ELF export is every symbol that is GLOBAL, defined, and doesn't have an empty (0x0) address offset.
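A minimal sketch of this first revision as a C predicate, again over glibc's <elf.h> Elf64_Sym (the function name is hypothetical). It now rejects the GLIBC_2.2.5-shaped symbol, but, as the next paragraph points out, it also rejects a defined WEAK function.

```c
#include <elf.h>
#include <stdbool.h>

/* First revision: GLOBAL, defined, and a non-zero (non-empty) address. */
static bool export_v1(const Elf64_Sym *s)
{
    return ELF64_ST_BIND(s->st_info) == STB_GLOBAL &&
           s->st_shndx != SHN_UNDEF &&
           s->st_value != 0;
}
```

For example, a GLOBAL FUNC at 0x1000 in section 12 passes, while the same function bound WEAK does not (the section index and addresses are made up).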
Well, that also doesn't work; we actually need to count WEAK symbols as exported symbols, like vsnprintf and a host of other symbols in
Unfortunately, adding this to our criterion won't work either: compile a crappy binary that increments 0xdeadbeef or something, then run something like readelf --syms <your crappy binary>, and you'll see a bunch of... defined WEAK symbols with non-zero address offsets.
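Sketching the second revision the same way shows why it still falls short: nothing in an Elf64_Sym records whether the entry came from the static symbol table (.symtab) or the dynamic one, so a defined, non-zero WEAK symbol that only lives in an executable's .symtab passes just as a defined WEAK function would. The predicate name, section index, and address below are hypothetical.

```c
#include <elf.h>
#include <stdbool.h>

/* Second revision: GLOBAL or WEAK, defined, non-zero address. */
static bool export_v2(const Elf64_Sym *s)
{
    unsigned char bind = ELF64_ST_BIND(s->st_info);
    return (bind == STB_GLOBAL || bind == STB_WEAK) &&
           s->st_shndx != SHN_UNDEF &&
           s->st_value != 0;
}

/* Hypothetical WEAK NOTYPE symbol from an executable's static symbol table,
   defined in some data section at a non-zero address. The struct carries no
   record of which table it came from, so export_v2 accepts it. */
static const Elf64_Sym weak_symtab_only = {
    .st_info  = ELF64_ST_INFO(STB_WEAK, STT_NOTYPE),
    .st_shndx = 25,          /* made-up section index */
    .st_value = 0x4010,      /* made-up address */
};
```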
Since I'm pretty sure _DYNAMIC isn't exported by your binary (nor _edata, nor the various other nlist symbols meant for debugging), or else the (Linux) world might explode, our definition still isn't right. (Actually, seeing what happens when a symbol named _DYNAMIC is exported might make a good future exercise.)
Which brings us to our final revision: an ELF symbol is exported iff it is (
WEAK) and is defined and has a non-zero address and has an entry in a
SYMTAB table in the
So there you have it.
Unfortunately, as has been noted here, many programs mess this up.
If I run nm -g /usr/lib/libz.so.1.2.8 on Arch Linux, which as of this writing ships libz stripped, nm returns no symbols. But I assure you, there are plenty of symbols exported by libz. If you have the ability to install a package like
readelf --dyn-syms /usr/lib/libz.so.1.2.8 will show you that.
Unfortunately, as the article above mentions, section headers are also optional in ELF binaries, and you can sstrip a binary or dynamic library to really confuse some (most, it looks like?) of these programs. For me, sstripping (a copy of) libz and running the same command from above doesn't produce any symbols... ಠ_ಠ
While there is a lot of good information in the section headers and the symbol table/nlist if it's available, from a practical perspective (I'd also say a logical/philosophical one), an exported symbol is at least that which is visible to the dynamic linker. Therefore, if a program that purportedly outputs a program's exports relies entirely on section headers, and not the _DYNAMIC array, then it's not really telling us what the exports are, because the dynamic linker only cares about what's in
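A program can reach the dynamic symbol table without ever touching section headers: find the PT_DYNAMIC program header, read DT_SYMTAB from the _DYNAMIC entries, and map link-time addresses back to file offsets through the PT_LOAD segments. The sketch below only counts the dynamic symbols, under stated assumptions: a 64-bit ELF file in host byte order, glibc's <elf.h>, trusted input (almost no bounds checking), and my own choice of recovering the symbol count from DT_HASH or, failing that, by walking DT_GNU_HASH, since _DYNAMIC has no tag holding it directly. All function names are hypothetical.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Map a link-time virtual address to a file offset through the PT_LOAD
   segments. Returns 0 on success, -1 if no loaded segment covers vaddr. */
static int vaddr_to_off(const unsigned char *img, Elf64_Addr vaddr, size_t *off)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    const Elf64_Phdr *ph = (const Elf64_Phdr *)(img + eh->e_phoff);
    for (int i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type == PT_LOAD && vaddr >= ph[i].p_vaddr &&
            vaddr < ph[i].p_vaddr + ph[i].p_filesz) {
            *off = (size_t)(vaddr - ph[i].p_vaddr + ph[i].p_offset);
            return 0;
        }
    }
    return -1;
}

/* DT_HASH records the symbol count as nchain; with only DT_GNU_HASH we walk
   to the end of the last hash chain. */
static size_t dynsym_count(const unsigned char *img, Elf64_Addr hash, Elf64_Addr gnu_hash)
{
    size_t off;
    if (hash && vaddr_to_off(img, hash, &off) == 0)
        return ((const uint32_t *)(img + off))[1];     /* [nbucket, nchain, ...] */
    if (!gnu_hash || vaddr_to_off(img, gnu_hash, &off) != 0)
        return 0;
    const uint32_t *h = (const uint32_t *)(img + off);
    uint32_t nbuckets = h[0], symoffset = h[1], bloom_words = h[2];
    const uint32_t *buckets = h + 4 + 2 * bloom_words; /* bloom words are 64-bit */
    const uint32_t *chain = buckets + nbuckets;
    uint32_t last = 0;
    for (uint32_t b = 0; b < nbuckets; b++)
        if (buckets[b] > last)
            last = buckets[b];
    if (last < symoffset)                              /* nothing is hashed */
        return symoffset;
    while (!(chain[last - symoffset] & 1))             /* low bit marks chain end */
        last++;
    return (size_t)last + 1;
}

static unsigned char *read_file(const char *path, long *size)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    unsigned char *buf = NULL;
    *size = -1;
    if (fseek(f, 0, SEEK_END) == 0 && (*size = ftell(f)) > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)*size);
        if (buf && fread(buf, 1, (size_t)*size, f) != (size_t)*size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}

/* Number of entries in a 64-bit ELF file's dynamic symbol table, found via
   program headers and _DYNAMIC only; section headers are never read.
   Returns -1 if the file can't be read or has no usable PT_DYNAMIC. */
static long count_dynsyms_in_file(const char *path)
{
    long size;
    unsigned char *img = read_file(path, &size);
    if (!img)
        return -1;
    long result = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if ((size_t)size >= sizeof *eh && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64) {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(img + eh->e_phoff);
        for (int i = 0; i < eh->e_phnum; i++) {
            if (ph[i].p_type != PT_DYNAMIC)
                continue;
            Elf64_Addr symtab = 0, hash = 0, gnu_hash = 0;
            for (const Elf64_Dyn *d = (const Elf64_Dyn *)(img + ph[i].p_offset);
                 d->d_tag != DT_NULL; d++) {
                if (d->d_tag == DT_SYMTAB)   symtab = d->d_un.d_ptr;
                if (d->d_tag == DT_HASH)     hash = d->d_un.d_ptr;
                if (d->d_tag == DT_GNU_HASH) gnu_hash = d->d_un.d_ptr;
            }
            if (symtab)
                result = (long)dynsym_count(img, hash, gnu_hash);
        }
    }
    free(img);
    return result;
}
```

Since sstrip removes the section header table but leaves the program headers and _DYNAMIC in place, a walker built this way should report the same count for a library before and after sstripping.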
So, I'm going to say something controversial about what an (any) export is, and in doing so, introduce the first recursive principle of imports and exports:
(1) an export is something that can be imported
I think there's something deeply mystical, mutually recursive, and satisfying about that, and I like it, because now we need to start thinking about what an import is. You might think that an ELF import is just every undefined symbol, as the Stack Overflow answer suggested.
That's wrong too, unfortunately, at least technically speaking. The first entry of every symbol table, if the binary is ELF ABI conforming, must be the "null" entry; as a result, there always exists at least one undefined, LOCAL symbol with no name.
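The null entry is easy to check against the naive "every undefined symbol is an import" rule. A minimal sketch over glibc's <elf.h> (the predicate name is hypothetical): an all-zero Elf64_Sym is exactly what the ELF specification requires at symbol index 0, meaning no name, value 0, size 0, LOCAL binding, NOTYPE, and SHN_UNDEF.

```c
#include <elf.h>
#include <stdbool.h>

/* Naive import rule: every undefined symbol is an import. */
static bool naive_is_import(const Elf64_Sym *s)
{
    return s->st_shndx == SHN_UNDEF;
}

/* Symbol table entry 0: every field is zero. */
static const Elf64_Sym null_entry = {0};
```

The naive rule counts this nameless LOCAL entry as an import.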
Of course, one might argue it's still an import, but that just sounds like arguing it's an undefined symbol; and similarly, for exports, that they're simply globally visible symbols.
But I think the philosophical crux of my point is that an import isn't just an undefined symbol and an export isn't just a globally visible symbol: they're (or should be) subject to structured, semantic constraints. This characterization is something that Mach-O binaries really nailed, so much so that it's a pleasure analyzing them. See this long-winded analysis on Mach-O binaries for how exports are very structured, constrained things (they only occur in the export trie), and how imports are very structured, constrained things (they only occur in the imports FSA).
To illustrate using a real example, one that is currently vexing me (maybe someone can send a nasty (or nice) email telling me what's going on here): occasionally, you find undefined
WEAK symbols in the
As of this writing, in
_dl_starting_up is such a symbol. The
_dl indicates it comes from the dynamic loader's API, which
This is the API-qua-exports of Linux's dynamic linker:
[Table with columns Address Offset and Symbol Name (Size); the rows were not preserved.]
_dl_starting_up is nowhere to be seen. Lucky for us, libc.so.6 is executable (yes, you heard that right: we can run the libc dynamic library). And if we can run it, then with some esoteric knowledge of dynamic-linking environment variables, we can force all those stupid lazily bound symbols to behave more like ML and less like Haskell, and bind at startup.
To wit, we can do something like:
$ LD_BIND_NOW=1 LD_DEBUG=all /usr/lib/libc.so.6 2>&1 >/dev/null | grep -C 2 _dl_starting_up
22011: symbol=optind; lookup in file=/usr/lib/libc.so.6
22011: binding file /usr/lib/libc.so.6 to /usr/lib/libc.so.6 : normal symbol `optind' [GLIBC_2.2.5]
22011: symbol=_dl_starting_up; lookup in file=/usr/lib/libc.so.6
22011: symbol=_dl_starting_up; lookup in file=/usr/lib/ld-linux-x86-64.so.2
22011: symbol=stdout; lookup in file=/usr/lib/libc.so.6
22011: binding file /usr/lib/libc.so.6 to /usr/lib/libc.so.6 : normal symbol `stdout' [GLIBC_2.2.5]
Unfortunately, what that tells us is that the symbol _dl_starting_up was not bound: it wasn't found in ld-linux-x86-64.so.2, because it's not exported by that library. And this is OK, because _dl_starting_up is a weak symbol. But then we need to ask ourselves: is it really an imported symbol if it doesn't have a type and isn't exported by any library? I'm sure it has some function (likely another repurposing hack because the binary format failed us), and I really am looking forward to an email explaining this bit of low-level arcana, but I think it illustrates my second recursive principle:
(2) All imported symbols must be exported from somewhere
To put this a bit more formally: for all x, if x is an imported symbol, then there exists a y, such that y is a library and y exports x.
This may be a biconditional; I haven't decided yet. (If we remember our formal logic (doubtful?), as stated this is only a necessary condition for imported symbolhood, not a sufficient one.)
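For those who do remember their formal logic, principle (2) can be written in first-order notation as below, followed by the undecided biconditional reading. Import, Library, and Exports are informal predicate names introduced here for illustration, not anything ELF defines.

```latex
% Principle (2): a necessary condition for imported symbolhood
\forall x \, \bigl( \mathrm{Import}(x) \rightarrow \exists y \, ( \mathrm{Library}(y) \land \mathrm{Exports}(y, x) ) \bigr)

% The candidate biconditional, which would also make the condition sufficient
\forall x \, \bigl( \mathrm{Import}(x) \leftrightarrow \exists y \, ( \mathrm{Library}(y) \land \mathrm{Exports}(y, x) ) \bigr)
```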
To conclude, given the richness of ELF and GNU/Linux visibility conditions, definedness, and symbol types, being an export is, in my opinion, fuzzier than I'd like, especially compared to other binary container formats (e.g., Mach-O). But I think it is possible to give a set of logical constraints that approximate the characterization of an "exported symbol" I gave at the start.
As such, I embarked on a philosophical discussion of what it is to be an ELF export, and ended up with two "recursive" principles of import and exporthood; pragmatic C hackers and reverse engineers, if they've read this far, have probably gone apoplectic. Sorry about that.
And unfortunately, performing a depth-first search of the internet only turned up dubious explanations or program suggestions for determining ELF exports and imports, whereas I wanted a logical definition.
If you're like me, or you just need to write a program that prints what I'll coin right now as the semantically tenable logical exports (and imports) of an ELF binary, then I present to you, in all their pseudo-logical glory:
The Logical ELF Import and Export Definitions
A symbol x is an ELF export iff:
x.st_value ≠ 0x0 ∧ x.st_shndx ≠ SHN_UNDEF ∧ (x ∈ SYMTAB ∧ SYMTAB ∈ _DYNAMIC) ∧ (x.st_bind = GLOBAL ∨ x.st_bind = WEAK) ∧ (x.st_type = IFUNC ∨ x.st_type = FUNC ∨ x.st_type = OBJECT)
A symbol x is an ELF import iff:
x.st_value = 0x0 ∧ x.st_shndx = SHN_UNDEF ∧ x.st_name ≠ 0 ∧ (x ∈ SYMTAB ∧ SYMTAB ∈ _DYNAMIC) ∧ (x.st_bind = GLOBAL ∨ x.st_bind = WEAK) ∧ (x.st_type = IFUNC ∨ x.st_type = FUNC ∨ x.st_type = OBJECT)
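The two definitions translate fairly directly into C predicates. A minimal sketch over glibc's <elf.h>, with its assumptions spelled out: st_bind and st_type from the pseudo-logic are read through ELF64_ST_BIND and ELF64_ST_TYPE, IFUNC is glibc's STT_GNU_IFUNC, and the "x ∈ SYMTAB ∧ SYMTAB ∈ _DYNAMIC" clause is left to the caller, who must pass only entries from the dynamic symbol table (DT_SYMTAB). The require_type flag and the make_sym helper are hypothetical additions; passing false drops the final type clause, as discussed in the next section.

```c
#include <elf.h>
#include <stdbool.h>

/* Final clause of both definitions: IFUNC, FUNC or OBJECT. */
static bool has_code_or_data_type(const Elf64_Sym *s)
{
    unsigned char t = ELF64_ST_TYPE(s->st_info);
    return t == STT_GNU_IFUNC || t == STT_FUNC || t == STT_OBJECT;
}

static bool is_global_or_weak(const Elf64_Sym *s)
{
    unsigned char b = ELF64_ST_BIND(s->st_info);
    return b == STB_GLOBAL || b == STB_WEAK;
}

/* Caller's responsibility: s comes from the dynamic symbol table found via
   _DYNAMIC (DT_SYMTAB). require_type == false drops the type clause. */
static bool is_elf_export(const Elf64_Sym *s, bool require_type)
{
    return s->st_value != 0 && s->st_shndx != SHN_UNDEF && is_global_or_weak(s) &&
           (!require_type || has_code_or_data_type(s));
}

static bool is_elf_import(const Elf64_Sym *s, bool require_type)
{
    return s->st_value == 0 && s->st_shndx == SHN_UNDEF && s->st_name != 0 &&
           is_global_or_weak(s) && (!require_type || has_code_or_data_type(s));
}

/* Hypothetical helper for building symbols by hand (e.g. in tests). */
static Elf64_Sym make_sym(Elf64_Word name, unsigned char bind, unsigned char type,
                          Elf64_Section shndx, Elf64_Addr value)
{
    Elf64_Sym s = {0};
    s.st_name = name;
    s.st_info = ELF64_ST_INFO(bind, type);
    s.st_shndx = shndx;
    s.st_value = value;
    return s;
}
```

For example, an undefined GLOBAL FUNC with value 0 counts as an import, the all-zero null entry counts as neither, the GLIBC_2.2.5-shaped ABS symbol is not an export, and an undefined GLOBAL NOTYPE symbol (the _Jv_RegisterClasses case) only counts as an import with require_type set to false.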
ELF Type Conditions in Logical Import and Export Definitions
So the final condition in the logical ELF import and export definitions is (x.st_type = IFUNC ∨ x.st_type = FUNC ∨ x.st_type = OBJECT), which might be too strict for some.
This is a reasonable concern; for example, many binaries import _Jv_RegisterClasses, which is GLOBAL, has an undefined shndx, and has a zero address. This symbol even has exporting libraries:
If it weren't for its NOTYPE, it would qualify as an import, with the bonus of satisfying both recursive principles.
Interestingly, unlike something like __gmon_start__, which is exported nowhere, this symbol (despite what its st_type says to the contrary) actually is a function, one that satisfies the logical definition of an ELF export: it is GLOBAL, is defined, and has an address.
Disassembling gives the added bonus of coherent instruction sequences. E.g., for me, at offset
/usr/lib/libgcj_bc.so we have:
.text
subq $8, %rsp
xorl %eax, %eax
callq -395
This could be an oversight or bug in the compiler, and its st_type really should be FUNC; it's hard to say.
But in the case of something like __gmon_start__, which is used by /usr/bin/gprof (but isn't exported by that binary), the case for its not really being an import is more reasonable. So different GLOBAL, undefined, zero-address symbols seem to behave differently.
As such, for practical reasons, it might be prudent to make the last clause of the logical ELF import and export definitions optional for unusual cases like these (depending on your use case).
In the end, you're free to do whatever you want. Just don't hurt anybody!
For the curious, we can actually find the _dl_starting_up definition in the rtld.c source code, beginning at line 110:
#ifndef HAVE_INLINED_SYSCALLS
/* Set nonzero during loading and initialization of executable and
   libraries, cleared before the executable's entry point runs. This
   must not be initialized to nonzero, because the unused dynamic
   linker loaded in for libc.so's "ld.so.1" dep will provide the
   definition seen by libc.so's initializer; that value must be zero,
   and will be since that dynamic linker's _dl_start and dl_main will
   never be called. */
int _dl_starting_up = 0;
rtld_hidden_def (_dl_starting_up)
#endif
According to this obtuse piece of documentation, I should be able to inspect _dl_starting_up in a debugger; if I mess around with this or figure out what's going on, I'll post back here.
Or again, someone well versed in this weirdness can enlighten me through email, and I'll put their answer here.