
pahole shows data structures from debug data in ELF binaries

pahole is part of the dwarves utilities package and can read structure information from the DWARF and CTF debug data in your binary executables. It is useful for highlighting holes left by padding and for showing how a data structure falls across cache lines. This matters most for data structures that live in RAM and are accessed by your main bottlenecks. It comes in particularly handy when switching to 64-bit machines, where pointers grow and layouts shift. It should be used at a very, very late stage of optimization 😉
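To make "holes" concrete before touching pahole, here is a minimal sketch. The structs Loose and Tight and the helper functions are made up for illustration and are not part of pahole. It assumes a typical ABI where a pointer is aligned to its own size, so the exact numbers differ between 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct: a pointer squeezed between two single bytes. */
struct Loose {
  char  tag;   /* 1 byte, followed by padding up to the pointer's alignment */
  void *ptr;   /* 4 bytes on 32-bit, 8 bytes on typical 64-bit targets */
  char  flag;  /* 1 byte, followed by tail padding */
};

/* Same members, widest first: only tail padding remains. */
struct Tight {
  void *ptr;
  char  tag;
  char  flag;
};

/* Bytes of each struct that hold no member data. */
size_t loose_padding(void) { return sizeof(struct Loose) - (sizeof(void *) + 2); }
size_t tight_padding(void) { return sizeof(struct Tight) - (sizeof(void *) + 2); }

/* Print both layouts; the sizes depend on the target ABI. */
void print_padding(void) {
  assert(sizeof(struct Tight) <= sizeof(struct Loose));
  printf("Loose: size %zu, padding %zu\n", sizeof(struct Loose), loose_padding());
  printf("Tight: size %zu, padding %zu\n", sizeof(struct Tight), tight_padding());
}
```

Calling print_padding() from main on a 32-bit and then a 64-bit build should show the padding growing with the pointer size, which is the kind of discrepancy pahole reports for you.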

Installation

Ubuntu makes it very easy:

sudo apt-get install dwarves

Example

Let’s write a simple test file:

#include <stdint.h>

struct S {
  struct S*    a;
  struct S*    b;
  uint8_t      c;
  double       d;
  int32_t      e;
  int32_t      f;
  int32_t      g;
  uint16_t     h;
  uint16_t     i;
  float        j;
  int16_t      k;
  int8_t       l;
  double       m;
  int32_t      n;
  double       o;
  int32_t      p;
  double       q;
};

int main(int argc, char ** argv)
{
  struct S s;
  return 0;
}

and compile it with debug information enabled (-g):

g++ -g test.cc

This will produce an executable called a.out. Now we simply run pahole with the newly created executable as an argument.

pahole a.out

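The output is fairly informative. Each member is followed by its offset and size in bytes. This run comes from a 32-bit build, which is why the pointers are 4 bytes wide: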

/tmp> pahole a.out
struct S {
	class S *                  a;                    /*     0     4 */
	class S *                  b;                    /*     4     4 */
	uint8_t                    c;                    /*     8     1 */

	/* XXX 3 bytes hole, try to pack */

	double                     d;                    /*    12     8 */
	int32_t                    e;                    /*    20     4 */
	int32_t                    f;                    /*    24     4 */
	int32_t                    g;                    /*    28     4 */
	uint16_t                   h;                    /*    32     2 */
	uint16_t                   i;                    /*    34     2 */
	float                      j;                    /*    36     4 */
	int16_t                    k;                    /*    40     2 */
	int8_t                     l;                    /*    42     1 */

	/* XXX 1 byte hole, try to pack */

	double                     m;                    /*    44     8 */
	int32_t                    n;                    /*    52     4 */
	double                     o;                    /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	int32_t                    p;                    /*    64     4 */
	double                     q;                    /*    68     8 */

	/* size: 76, cachelines: 2 */
	/* sum members: 72, holes: 2, sum holes: 4 */
	/* last cacheline: 12 bytes */
};	/* definitions: 1 */
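One way to act on the "try to pack" hints is to reorder the members from widest to narrowest. The sketch below is an assumption about how you might do that, not something pahole printed. S_sorted and the helper functions are hypothetical names, and the zero-padding check assumes a common x86, x86-64 or ARM ABI with 8-byte doubles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reordering of struct S: same members, widest first. */
struct S_sorted {
  struct S_sorted *a;
  struct S_sorted *b;
  double   d;
  double   m;
  double   o;
  double   q;
  int32_t  e;
  int32_t  f;
  int32_t  g;
  int32_t  n;
  int32_t  p;
  float    j;
  uint16_t h;
  uint16_t i;
  int16_t  k;
  uint8_t  c;
  int8_t   l;
};

/* Sum of the member sizes, i.e. what pahole calls "sum members". */
size_t s_sorted_member_bytes(void) {
  return 2 * sizeof(struct S_sorted *) + 4 * sizeof(double) +
         5 * sizeof(int32_t) + sizeof(float) + 2 * sizeof(uint16_t) +
         sizeof(int16_t) + sizeof(uint8_t) + sizeof(int8_t);
}

/* Bytes lost to holes and tail padding. */
size_t s_sorted_padding(void) {
  return sizeof(struct S_sorted) - s_sorted_member_bytes();
}

/* Assumption: holds on common ABIs, not guaranteed by the C standard. */
void check_no_holes(void) {
  assert(s_sorted_padding() == 0);
}
```

On the 32-bit build shown above, this layout should come to 72 bytes instead of 76. Recompile with g++ -g and rerun pahole on the result to confirm that both holes are gone.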