AFL Instrumentation (Part 1): Overview and Normal-Mode Instrumentation

————

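I. Instrumentation Basics

What is instrumentation? When AFL compiles a target, afl-gcc or another instrumentation module inserts stub code at predetermined locations. These stubs can be thought of as probes (though, unlike breakpoints, they do not pause execution). During fuzzing, AFL relies on them to explore paths, run tests, and measure coverage.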

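Before going into the details, recall the four stages of compilation:

Preprocessing: the preprocessor (cpp) rewrites the original C program according to the directives that begin with #. In a hello world program, for example, the #include directive tells the preprocessor to read the contents of stdio.h and insert them into the program, producing hello.i.
Compilation: the compiler (cc1) translates hello.i into the assembly file hello.s.
Assembly: the assembler (as) translates the assembly instructions in hello.s into machine instructions and packages them as a relocatable object file, hello.o, in ELF format.
Linking: the hello world program calls printf, but our source file does not contain its code; it lives in the C standard library. The linker (ld) resolves the address of printf so that the program can call it correctly at run time.

gcc itself is a wrapper around these tools: it first calls cpp to preprocess the code, hands the preprocessed file to cc1 to produce an assembly file, has the assembler as turn that assembly file into machine code (an object file), and finally has the linker ld link the object files into an executable.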

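Some terminology:

tuple: a pairing of a source basic block with a destination basic block is called a tuple
ASAN and MSAN: ASAN detects memory errors such as use-after-free, double-free, buffer overflows, and underflows; detecting reads of uninitialized memory requires MSAN

fork server:

Once the target has been compiled, fuzzing is started with afl-fuzz. Roughly, afl-fuzz keeps mutating the input seed files, feeds each mutated input to the target, and checks whether it causes a crash. Fuzzing therefore involves a very large number of fork-and-execute cycles on the target.
To make this more efficient, AFL implements a fork server. The basic idea: after the target process starts, the target itself runs a fork server; the fuzzer does not fork children directly but talks to the fork server, which performs the fork and lets the child continue executing the target. The main benefit of this design is that execve() no longer has to be called for every run, which saves the repeated work of loading the target binary and its libraries, resolving symbol addresses, and so on.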

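AFL instrumentation modules
- afl-as.h, afl-as.c, afl-gcc.c: normal instrumentation mode; instruments from source, and the compiler can be gcc or clang
- llvm_mode: LLVM instrumentation mode; instruments from source, and the compiler is clang
- qemu_mode: QEMU instrumentation mode; instruments binaries
Normal mode and LLVM mode are for targets whose source code is available. Compared with the assembly-level instrumentation of normal mode, the compiler-level instrumentation of LLVM mode benefits from more optimizations and generally performs better. Targets available only as binaries need QEMU mode, which is the slowest of the three.
Goals of instrumentation:
- Record tuple information while the target executes, which requires a stub in every basic block
- Perform the necessary initialization and maintain a fork server (which forks and then continues executing the target)

In essence, AFL is not a standalone compiler or assembler but a wrapper: it adds the instrumentation, and the actual build is still done by gcc.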

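II. AFL's gcc: afl-gcc

1. Overview

afl-gcc is a wrapper around GCC or clang; it simply adds -B /usr/local/lib/afl -g -O3 automatically.

The main job of afl-gcc is to get instrumentation inserted at key points in the code. The instrumentation works at the assembly level and records key information such as the execution path, giving feedback on how the program runs.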

2. Source code

Key variables:

static u8*  as_path;                /* Path to the AFL 'as' wrapper      */
static u8** cc_params;              /* Parameters passed to the real CC  */
static u32  cc_par_cnt = 1;         /* Param count, including argv0      */
static u8   be_quiet,               /* Quiet mode                        */
            clang_mode;             /* Invoked as afl-clang*?            */

Main logic:

find_as(argv[0]);
edit_params(argc, argv);
execvp(cc_params[0], (char**)cc_params);

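find_as(argv[0])
- Purpose: locate the assembler wrapper afl-as
edit_params(argc, argv)
- Purpose: process the compiler arguments passed in and store the final arguments in the cc_params[] array
Finally, execvp(cc_params[0], (char**)cc_params) is called to run the real compiler (gcc or clang) with those arguments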

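3. Summary

afl-gcc.c builds the afl-gcc binary, which is a wrapper around gcc rather than a standalone, modified gcc. It sets the compiler's search path (via -B) so that the compiler picks up the assembler and linker in that directory first, namely afl-as. The actual instrumentation therefore happens at assembly time.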
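To make the wrapper idea concrete, here is a minimal sketch in C of how such a wrapper could assemble an argument vector and hand control to the real compiler. It is not AFL's edit_params(): the helper names build_cc_params and run_real_compiler, and the fixed choice of gcc as the real compiler, are assumptions made for illustration. The real function also handles clang, environment variables, and hardening flags.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of AFL): build the argv for the real
   compiler from the wrapper's own argv. The caller frees the array. */
char** build_cc_params(int argc, char** argv, const char* as_dir) {

  assert(argc >= 1 && argv != NULL && as_dir != NULL);

  /* Slots: compiler name + (argc - 1) user args + "-B" + dir + "-g" + "-O3"
     + terminating NULL = argc + 5. */
  char** params = calloc((size_t)argc + 5, sizeof(char*));
  int    n = 0;

  if (!params) return NULL;

  params[n++] = "gcc";                        /* assumed real compiler    */

  for (int i = 1; i < argc; i++)
    params[n++] = argv[i];                    /* pass user args through   */

  params[n++] = "-B";                         /* look for 'as' in as_dir  */
  params[n++] = (char*)as_dir;
  params[n++] = "-g";
  params[n++] = "-O3";
  params[n]   = NULL;

  return params;

}

/* Replace the current process with the real compiler. Only returns if
   execvp() itself fails. */
int run_real_compiler(char** params) {
  execvp(params[0], params);
  perror("execvp");
  return -1;
}
```

Because the -B directory contains afl-as under the name as, the real compiler ends up running the AFL assembler wrapper without any further changes to the build.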

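III. AFL's normal instrumentation: afl-as.c

1. Overview

afl-as is a wrapper around GNU as. Its sole purpose is to preprocess the assembly files generated by GCC/clang and inject the instrumentation code provided by afl-as.h.

The toolchain invokes it automatically when a program is compiled with afl-gcc / afl-clang. The wrapper is not intended for instrumenting hand-written code in .s files or asm blocks.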

2. Source code

Key variables:

static u8** as_params;          /* Parameters passed to the real 'as'   */

static u8*  input_file;         /* Originally specified input file      */
static u8*  modified_file;      /* Instrumented file for the real 'as'  */

static u8   be_quiet,           /* Quiet mode (no stderr output)        */
            clang_mode,         /* Running in clang mode?               */
            pass_thru,          /* Just pass data through?              */
            just_version,       /* Just show version?                   */
            sanitizer;          /* Using ASAN / MSAN                    */

static u32  inst_ratio = 100,   /* Instrumentation probability (%)      */
            as_par_cnt = 1;     /* Number of params to 'as'             */

The main function:

/* Main entry point */

int main(int argc, char** argv) {

  s32 pid;
  u32 rand_seed;
  int status;
  u8* inst_ratio_str = getenv("AFL_INST_RATIO");     // instrumentation ratio (0-100)

  struct timeval tv;
  struct timezone tz;

  clang_mode = !!getenv(CLANG_ENV_VAR);

  if (isatty(2) && !getenv("AFL_QUIET")) {

    SAYF(cCYA "afl-as " cBRI VERSION cRST " by <lcamtuf@google.com>\n");
 
  } else be_quiet = 1;

  if (argc < 2) {

    SAYF("\n"
         "This is a helper application for afl-fuzz. It is a wrapper around GNU 'as',\n"
         "executed by the toolchain whenever using afl-gcc or afl-clang. You probably\n"
         "don't want to run this program directly.\n\n"

         "Rarely, when dealing with extremely complex projects, it may be advisable to\n"
         "set AFL_INST_RATIO to a value less than 100 in order to reduce the odds of\n"
         "instrumenting every discovered branch.\n\n");
    exit(1);

  }

  gettimeofday(&tv, &tz);                          // get the current time
  rand_seed = tv.tv_sec ^ tv.tv_usec ^ getpid();   // derive the random seed from the current time and the PID
  srandom(rand_seed);
  edit_params(argc, argv);
  if (inst_ratio_str) {

    if (sscanf(inst_ratio_str, "%u", &inst_ratio) != 1 || inst_ratio > 100) 
      FATAL("Bad value of AFL_INST_RATIO (must be between 0 and 100)");

  }

  if (getenv(AS_LOOP_ENV_VAR))
    FATAL("Endless loop when calling 'as' (remove '.' from your PATH)");

  setenv(AS_LOOP_ENV_VAR, "1", 1);

  /* When compiling with ASAN, we don't have a particularly elegant way to skip
     ASAN-specific branches. But we can probabilistically compensate for
     that... */

  if (getenv("AFL_USE_ASAN") || getenv("AFL_USE_MSAN")) {
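    // With ASAN or MSAN enabled, AFL cannot tell the sanitizer-specific branches apart and would
    // insert many useless stubs, so the instrumentation probability is simply divided by 3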
    sanitizer = 1;
    inst_ratio /= 3;
  }

  if (!just_version) add_instrumentation();        // perform the instrumentation

  if (!(pid = fork())) {
    execvp(as_params[0], (char**)as_params);
    FATAL("Oops, failed to execute '%s' - check your PATH", as_params[0]);
  }

  if (pid < 0) PFATAL("fork() failed");
  if (waitpid(pid, &status, 0) <= 0) PFATAL("waitpid() failed");
  if (!getenv("AFL_KEEP_ASSEMBLY")) unlink(modified_file);
  exit(WEXITSTATUS(status));

}
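The ratio handling in main() can be isolated into a small function to see its effect. This is a sketch under assumptions: the helper names effective_inst_ratio and should_instrument are hypothetical, and an error is reported with -1 where main() would call FATAL().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of AFL): compute the effective
   instrumentation ratio the way main() does, returning -1 on bad input. */
int effective_inst_ratio(const char* ratio_str, int use_sanitizer) {

  unsigned int ratio = 100;                    /* default: instrument all   */

  if (ratio_str) {
    if (sscanf(ratio_str, "%u", &ratio) != 1 || ratio > 100)
      return -1;                               /* main() calls FATAL() here */
  }

  if (use_sanitizer) ratio /= 3;               /* compensate for ASAN/MSAN  */

  return (int)ratio;

}

/* Decide whether one candidate location gets a stub. roll stands in for
   R(100), a random value in [0, 100). */
int should_instrument(unsigned int roll, int ratio) {
  assert(roll < 100);
  return (int)roll < ratio;
}
```

With the default ratio of 100 every roll passes the check; with a sanitizer enabled the ratio drops to 33 because of the integer division, so roughly one candidate in three is instrumented.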

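Main steps:

Read the instrumentation ratio from the AFL_INST_RATIO environment variable. It controls the probability of instrumenting each branch and ranges from 0 to 100 (percent). At 0, only transitions between function entry points are instrumented, not individual branches inside functions.
gettimeofday(&tv, &tz): obtains the current time (and time zone). The time is used to seed the random number generator, srandom(), with rand_seed = tv.tv_sec ^ tv.tv_usec ^ getpid(); the random numbers generated later serve as the IDs (keys) that identify each instrumented location.
edit_params(argc, argv): processes the arguments

add_instrumentation performs the instrumentation: it processes the input file, generates modified_file, and inserts instrumentation in all the appropriate places. (Note: afl-as only instruments the code section.) Its steps are:

Open input_file and modified_file, with the usual read/write checks

Enter a while loop that reads input_file line by line; the stub is deferred past labels, macros, and comments, and ad-hoc asm blocks and Intel-syntax blocks are skipped. When is a stub inserted?

First location: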

if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
        instrument_next && line[0] == '\t' && isalpha(line[1])) {

      fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
              R(MAP_SIZE));

      instrument_next = 0;
      ins_lines++;

    }

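These conditions depend on variables set later in the function:

instr_ok: whether the current line is inside the code (.text) section

instrument_next: whether a basic block starts here; afl-as needs to place a stub in every basic block

A basic-block label contains a colon and begins with a dot, followed by letters and digits, which is what the check relies on (branch labels such as ^.L0: or ^.LBB0_0:)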

    if (strstr(line, ":")) {

      if (line[0] == '.') {

        /* Apple: .L<num> / .LBB<num> */

        if ((isdigit(line[2]) || (clang_mode && !strncmp(line + 1, "LBB", 3)))
            && R(100) < inst_ratio) {

          if (!skip_next_label) instrument_next = 1; else skip_next_label = 0;

        }

      } else {

        /* Function label (always instrumented, deferred mode). */

        instrument_next = 1;

      }

    }

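If the line contains a colon but does not start with a dot, it is a function label (^func:), and instrument_next is set to 1 directly

The other skip_* variables mark blocks that should not be instrumented

Second location: conditional branches, each of which starts a basic block that needs a stub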

/* Conditional branch instruction (jnz, etc). We append the instrumentation
       right after the branch (to instrument the not-taken path) and at the
       branch destination label (handled later on). */

    if (line[0] == '\t') {

      if (line[1] == 'j' && line[2] != 'm' && R(100) < inst_ratio) {

        fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
                R(MAP_SIZE));

        ins_lines++;

      }

      continue;

    }

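The patterns AFL looks for when instrumenting are ^main:, ^.L0:, ^.LBB0_0:, and ^\tjnz foo (function entry points such as main, GCC and clang branch labels, and conditional jumps). These usually mark changes in control flow, so AFL places its stubs there. The complete, annotated source of add_instrumentation follows: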

/* Process input file, generate modified_file. Insert instrumentation in all
   the appropriate places. */

static void add_instrumentation(void) {

  static u8 line[MAX_LINE];

  FILE* inf;
  FILE* outf;
  s32 outfd;
  u32 ins_lines = 0;

  u8  instr_ok = 0, skip_csect = 0, skip_next_label = 0,
      skip_intel = 0, skip_app = 0, instrument_next = 0;

#ifdef __APPLE__

  u8* colon_pos;

#endif /* __APPLE__ */

  if (input_file) {                                              // check that input_file and modified_file can be read/written

    inf = fopen(input_file, "r");
    if (!inf) PFATAL("Unable to read '%s'", input_file);

  } else inf = stdin;

  outfd = open(modified_file, O_WRONLY | O_EXCL | O_CREAT, 0600);

  if (outfd < 0) PFATAL("Unable to write to '%s'", modified_file);

  outf = fdopen(outfd, "w");

  if (!outf) PFATAL("fdopen() failed");  

  while (fgets(line, MAX_LINE, inf)) {                           // read input_file line by line and process each line

    /* In some cases, we want to defer writing the instrumentation trampoline
       until after all the labels, macros, comments, etc. If we're in this
       mode, and if the line starts with a tab followed by a character, dump
       the trampoline now. */

    if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
        instrument_next && line[0] == '\t' && isalpha(line[1])) {
      /* instr_ok: we are inside the code section
         instrument_next: a basic block starts here */

      fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,     // emit the trampoline
              R(MAP_SIZE));

      instrument_next = 0;
      ins_lines++;

    }

    /* Output the actual line, call it a day in pass-thru mode. */

    fputs(line, outf);

    if (pass_thru) continue;

    /* All right, this is where the actual fun begins. For one, we only want to
       instrument the .text section. So, let's keep track of that in processed
       files - and let's set instr_ok accordingly. */

    if (line[0] == '\t' && line[1] == '.') {
      // The line starts with "\t.", so it is an assembler directive; the checks below match the section declarations in the .s file
      /* OpenBSD puts jump tables directly inline with the code, which is
         a bit annoying. They use a specific format of p2align directives
         around them, so we use that as a signal. */

      if (!clang_mode && instr_ok && !strncmp(line + 2, "p2align ", 8) &&
          isdigit(line[10]) && line[11] == '\n') skip_next_label = 1;

      if (!strncmp(line + 2, "text\n", 5) ||
          !strncmp(line + 2, "section\t.text", 13) ||
          !strncmp(line + 2, "section\t__TEXT,__text", 21) ||
          !strncmp(line + 2, "section __TEXT,__text", 21)) {
        instr_ok = 1;    // match: we are in the code section
        continue; 
      }

      if (!strncmp(line + 2, "section\t", 8) ||
          !strncmp(line + 2, "section ", 8) ||
          !strncmp(line + 2, "bss\n", 4) ||
          !strncmp(line + 2, "data\n", 5)) {
        instr_ok = 0;
        continue;
      }

    }

    /* Detect off-flavor assembly (rare, happens in gdb). When this is
       encountered, we set skip_csect until the opposite directive is
       seen, and we do not instrument. */

    if (strstr(line, ".code")) {

      if (strstr(line, ".code32")) skip_csect = use_64bit;
      if (strstr(line, ".code64")) skip_csect = !use_64bit;

    }

    /* Detect syntax changes, as could happen with hand-written assembly.
       Skip Intel blocks, resume instrumentation when back to AT&T. */

    if (strstr(line, ".intel_syntax")) skip_intel = 1;
    if (strstr(line, ".att_syntax")) skip_intel = 0;

    /* Detect and skip ad-hoc __asm__ blocks, likewise skipping them. */

    if (line[0] == '#' || line[1] == '#') {

      if (strstr(line, "#APP")) skip_app = 1;
      if (strstr(line, "#NO_APP")) skip_app = 0;

    }

    /* If we're in the right mood for instrumenting, check for function
       names or conditional labels. This is a bit messy, but in essence,
       we want to catch:

         ^main:      - function entry point (always instrumented)
         ^.L0:       - GCC branch label
         ^.LBB0_0:   - clang branch label (but only in clang mode)
         ^\tjnz foo  - conditional branches

       ...but not:

         ^# BB#0:    - clang comments
         ^ # BB#0:   - ditto
         ^.Ltmp0:    - clang non-branch labels
         ^.LC0       - GCC non-branch labels
         ^.LBB0_0:   - ditto (when in GCC mode)
         ^\tjmp foo  - non-conditional jumps

       Additionally, clang and GCC on MacOS X follow a different convention
       with no leading dots on labels, hence the weird maze of #ifdefs
       later on.

     */

    if (skip_intel || skip_app || skip_csect || !instr_ok ||
        line[0] == '#' || line[0] == ' ') continue;

    /* Conditional branch instruction (jnz, etc). We append the instrumentation
       right after the branch (to instrument the not-taken path) and at the
       branch destination label (handled later on). */

    if (line[0] == '\t') {

      if (line[1] == 'j' && line[2] != 'm' && R(100) < inst_ratio) {
        // conditional jump: instrument when the random number is below the instrumentation ratio
        fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
                R(MAP_SIZE));

        ins_lines++;

      }

      continue;

    }

    /* Label of some sort. This may be a branch destination, but we need to
       tread carefully and account for several different formatting
       conventions. */

#ifdef __APPLE__

    /* Apple: L<whatever><digit>: */

    if ((colon_pos = strstr(line, ":"))) {

      if (line[0] == 'L' && isdigit(*(colon_pos - 1))) {

#else

    /* Everybody else: .L<whatever>: */

    if (strstr(line, ":")) {

      if (line[0] == '.') {

#endif /* __APPLE__ */

        /* .L0: or LBB0_0: style jump destination */

#ifdef __APPLE__

        /* Apple: L<num> / LBB<num> */

        if ((isdigit(line[1]) || (clang_mode && !strncmp(line, "LBB", 3)))
            && R(100) < inst_ratio) {

#else

        /* Apple: .L<num> / .LBB<num> */

        if ((isdigit(line[2]) || (clang_mode && !strncmp(line + 1, "LBB", 3)))
            && R(100) < inst_ratio) {

#endif /* __APPLE__ */

          /* An optimization is possible here by adding the code only if the
             label is mentioned in the code in contexts other than call / jmp.
             That said, this complicates the code by requiring two-pass
             processing (messy with stdin), and results in a speed gain
             typically under 10%, because compilers are generally pretty good
             about not generating spurious intra-function jumps.

             We use deferred output chiefly to avoid disrupting
             .Lfunc_begin0-style exception handling calculations (a problem on
             MacOS X). */

          if (!skip_next_label) instrument_next = 1; else skip_next_label = 0;

        }

      } else {

        /* Function label (always instrumented, deferred mode). */

        instrument_next = 1;
    
      }

    }

  }

  if (ins_lines)
    fputs(use_64bit ? main_payload_64 : main_payload_32, outf);

  if (input_file) fclose(inf);
  fclose(outf);

  if (!be_quiet) {

    if (!ins_lines) WARNF("No instrumentation targets found%s.",
                          pass_thru ? " (pass-thru mode)" : "");
    else OKF("Instrumented %u locations (%s-bit, %s mode, ratio %u%%).",
             ins_lines, use_64bit ? "64" : "32",
             getenv("AFL_HARDEN") ? "hardened" : 
             (sanitizer ? "ASAN/MSAN" : "non-hardened"),
             inst_ratio);
 
  }

}
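The label and branch checks in add_instrumentation can be condensed into a single classification function, which makes the matching rules easier to test in isolation. This is a simplified sketch and not AFL code: the enum and the name classify_line are hypothetical, only the non-Apple rules are modeled, and section tracking, the skip_* flags, and the random inst_ratio check are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of one assembly line (names are ours). */
enum line_kind {
  LINE_OTHER,          /* nothing to instrument                     */
  LINE_COND_JUMP,      /* "\tjnz foo": stub goes right after it     */
  LINE_BRANCH_LABEL,   /* ".L0:" / ".LBB0_0:": deferred stub        */
  LINE_FUNC_LABEL      /* "main:": always instrumented, deferred    */
};

enum line_kind classify_line(const char* line, int clang_mode) {

  assert(line != NULL);

  /* Comments and lines starting with a space are never instrumented. */
  if (line[0] == '#' || line[0] == ' ') return LINE_OTHER;

  /* Instructions start with a tab; any j* except jmp is conditional. */
  if (line[0] == '\t') {
    if (line[1] == 'j' && line[2] != 'm') return LINE_COND_JUMP;
    return LINE_OTHER;
  }

  /* Anything else of interest is a label and must contain a colon. */
  if (!strchr(line, ':')) return LINE_OTHER;

  if (line[0] == '.') {
    /* .L<digit> is a GCC branch label; .LBB only counts in clang mode. */
    if (isdigit((unsigned char)line[2]) ||
        (clang_mode && !strncmp(line + 1, "LBB", 3)))
      return LINE_BRANCH_LABEL;
    return LINE_OTHER;
  }

  return LINE_FUNC_LABEL;

}
```

For example, classify_line(".LBB0_1:\n", 1) yields LINE_BRANCH_LABEL, while the same line with clang_mode set to 0 yields LINE_OTHER, matching the "but only in clang mode" remark in the source comment.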

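3. instrumentation trampoline and main_payload

"Trampoline" is meant literally, so an "instrumentation trampoline" is a small springboard of code. The English term conveys its role better than a translation; it can simply be understood as the stub code.

trampoline_fmt_64/32: as seen above, AFL inserts trampoline_fmt_64 into the file on 64-bit targets and trampoline_fmt_32 on 32-bit targets. Their definitions in afl-as.h are: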

static const u8* trampoline_fmt_32 =

  "\n"
  "/* --- AFL TRAMPOLINE (32-BIT) --- */\n"
  "\n"
  ".align 4\n"
  "\n"
  "leal -16(%%esp), %%esp\n"
  "movl %%edi,  0(%%esp)\n"
  "movl %%edx,  4(%%esp)\n"
  "movl %%ecx,  8(%%esp)\n"
  "movl %%eax, 12(%%esp)\n"
  "movl $0x%08x, %%ecx\n"
  "call __afl_maybe_log\n"
  "movl 12(%%esp), %%eax\n"
  "movl  8(%%esp), %%ecx\n"
  "movl  4(%%esp), %%edx\n"
  "movl  0(%%esp), %%edi\n"
  "leal 16(%%esp), %%esp\n"
  "\n"
  "/* --- END --- */\n"
  "\n";

static const u8* trampoline_fmt_64 =

  "\n"
  "/* --- AFL TRAMPOLINE (64-BIT) --- */\n"
  "\n"
  ".align 4\n"
  "\n"
  "leaq -(128+24)(%%rsp), %%rsp\n"
  "movq %%rdx,  0(%%rsp)\n"
  "movq %%rcx,  8(%%rsp)\n"
  "movq %%rax, 16(%%rsp)\n"
  "movq $0x%08x, %%rcx\n"
  "call __afl_maybe_log\n"
  "movq 16(%%rsp), %%rax\n"
  "movq  8(%%rsp), %%rcx\n"
  "movq  0(%%rsp), %%rdx\n"
  "leaq (128+24)(%%rsp), %%rsp\n"
  "\n"
  "/* --- END --- */\n"
  "\n";

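In other words, the 64-bit trampoline:

Saves the rdx, rcx, and rax registers
Sets rcx to the value that fprintf() formats into the trampoline, the random ID (key) that identifies this stub
Calls __afl_maybe_log
Restores the registers

The core of the instrumentation is the __afl_maybe_log function (its assembly lives in main_payload_32/64). Its steps and structure are described below.

The symbols involved are:
__afl_area_ptr: address of the shared memory region;
__afl_prev_loc: the previous instrumented location (its ID is a random value from R(MAP_SIZE));
__afl_fork_pid: PID of the child process created by fork;
__afl_temp: scratch buffer;
__afl_setup_failure: flag; if set, the routine returns immediately;
__afl_global_area_ptr: global pointer to the shared memory region.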
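The way the random ID ends up inside the trampoline is ordinary printf-style formatting: "%%" in the format string produces a literal % for the register names, and "%08x" receives the ID. A minimal sketch, assuming a shortened stand-in format string and hypothetical helper names (render_trampoline, random_location_id), and using rand() where AFL's R() macro uses random():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAP_SIZE (1 << 16)            /* AFL's default map size (64 kB)   */
#define R(x)     (rand() % (x))       /* AFL's R() is built on random()   */

/* Shortened stand-in for trampoline_fmt_64: only the two lines that
   involve the ID. */
static const char* demo_trampoline_fmt =
  "movq $0x%08x, %%rcx\n"
  "call __afl_maybe_log\n";

/* Hypothetical helper: render one trampoline for a given location ID. */
int render_trampoline(char* buf, size_t len, unsigned int id) {
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, demo_trampoline_fmt, id);
}

/* Pick a random ID in [0, MAP_SIZE), as afl-as does once per stub while
   it rewrites the assembly file. */
unsigned int random_location_id(void) {
  return (unsigned int)R(MAP_SIZE);
}
```

Since the ID is chosen when the assembly file is rewritten, it is a constant in the final binary; nothing random happens at run time.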

__afl_maybe_log
__afl_maybe_log:
lahf
seto %al
/* Check if SHM region is already mapped. */
movl __afl_area_ptr, %edx
testl %edx, %edx
je __afl_setup
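Notes (this excerpt is the 32-bit variant; the 64-bit payload performs the same check with movq __afl_area_ptr(%rip), %rdx and testq %rdx, %rdx):
- lahf (load status flags into AH) copies the low eight bits of the EFLAGS register into AH. The flags copied are the sign flag (SF), zero flag (ZF), auxiliary carry flag (AF), parity flag (PF), and carry flag (CF), so the instruction is a convenient way to save a copy of the flags
- seto sets AL to 1 if the overflow flag (OF) is set, saving the one flag that lahf does not capture
- Check whether __afl_area_ptr is NULL (that is, whether shared memory has been set up):
  - If it is NULL, jump to __afl_setup to set it up
  - If it is not NULL, continue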
__afl_setup

__afl_setup:
  /* Do not retry setup if we had previous failures. */
  cmpb $0, __afl_setup_failure(%rip)
  jne __afl_return
  /* Check out if we have a global pointer on file. */
#ifndef __APPLE__
  movq  __afl_global_area_ptr@GOTPCREL(%rip), %rdx
  movq  (%rdx), %rdx
#else
  movq  __afl_global_area_ptr(%rip), %rdx
#endif /* !^__APPLE__ */
  testq %rdx, %rdx
  je    __afl_setup_first
  movq %rdx, __afl_area_ptr(%rip)
  jmp  __afl_store

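Its main job is to initialize __afl_area_ptr, which only happens when the first stub is reached. In detail:

If __afl_setup_failure is nonzero, jump straight to __afl_return
If __afl_setup_failure is 0, check whether the global pointer __afl_global_area_ptr is NULL
- If it is NULL, jump to __afl_setup_first to continue the setup
- If it is not NULL, copy __afl_global_area_ptr into __afl_area_ptr and jump to __afl_store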
__afl_setup_first

__afl_setup_first:

  /* Save everything that is not yet saved and that may be touched by
     getenv() and several other libcalls we'll be relying on. */

  leaq -352(%rsp), %rsp

  movq %rax,   0(%rsp)
  movq %rcx,   8(%rsp)
  movq %rdi,  16(%rsp)
  movq %rsi,  32(%rsp)
  movq %r8,   40(%rsp)
  movq %r9,   48(%rsp)
  movq %r10,  56(%rsp)
  movq %r11,  64(%rsp)

  movq %xmm0,  96(%rsp)
  movq %xmm1,  112(%rsp)
  movq %xmm2,  128(%rsp)
  movq %xmm3,  144(%rsp)
  movq %xmm4,  160(%rsp)
  movq %xmm5,  176(%rsp)
  movq %xmm6,  192(%rsp)
  movq %xmm7,  208(%rsp)
  movq %xmm8,  224(%rsp)
  movq %xmm9,  240(%rsp)
  movq %xmm10, 256(%rsp)
  movq %xmm11, 272(%rsp)
  movq %xmm12, 288(%rsp)
  movq %xmm13, 304(%rsp)
  movq %xmm14, 320(%rsp)
  movq %xmm15, 336(%rsp)

  /* Map SHM, jumping to __afl_setup_abort if something goes wrong. */

  /* The 64-bit ABI requires 16-byte stack alignment. We'll keep the
     original stack ptr in the callee-saved r12. */

  pushq %r12
  movq  %rsp, %r12
  subq  $16, %rsp
  andq  $0xfffffffffffffff0, %rsp

  leaq .AFL_SHM_ENV(%rip), %rdi
  CALL_L64("getenv")

  testq %rax, %rax
  je    __afl_setup_abort

  movq  %rax, %rdi
  CALL_L64("atoi")

  xorq %rdx, %rdx   /* shmat flags    */
  xorq %rsi, %rsi   /* requested addr */
  movq %rax, %rdi   /* SHM ID         */
  CALL_L64("shmat")

  cmpq $-1, %rax
  je   __afl_setup_abort

  /* Store the address of the SHM region. */

  movq %rax, %rdx
  movq %rax, __afl_area_ptr(%rip)

#ifdef __APPLE__
  movq %rax, __afl_global_area_ptr(%rip)
#else
  movq __afl_global_area_ptr@GOTPCREL(%rip), %rdx
  movq %rax, (%rdx)
#endif /* ^__APPLE__ */
  movq %rax, %rdx

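In detail:

First save the registers that have not been saved yet and that libc calls may clobber, including the xmm registers
Align rsp
Read the environment variable __AFL_SHM_ID, which holds the ID of the shared memory segment:
- If it cannot be read, jump to __afl_setup_abort
- If it can, call shmat to attach the shared memory; on failure, jump to __afl_setup_abort
Store the shared memory address returned by shmat in both __afl_area_ptr and __afl_global_area_ptr
Finally, fall through to __afl_forkserver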
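The getenv / atoi / shmat sequence reads more easily in C. The following is a sketch of what the assembly does, not code taken from AFL: the function name attach_coverage_map is hypothetical, and it returns NULL in the cases where the assembly jumps to __afl_setup_abort.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_ENV_VAR "__AFL_SHM_ID"    /* variable holding the SHM ID */

/* Hypothetical C rendering of the setup sequence. Returns the mapped
   region, or NULL if the variable is missing or the attach fails. */
unsigned char* attach_coverage_map(const char* env_name) {

  assert(env_name != NULL);

  const char* id_str = getenv(env_name);
  if (!id_str) return NULL;                   /* not started by afl-fuzz */

  int   shm_id = atoi(id_str);
  void* addr   = shmat(shm_id, NULL, 0);      /* addr = NULL, flags = 0  */

  if (addr == (void*)-1) return NULL;         /* attach failed           */

  return (unsigned char*)addr;

}
```

The fuzzer creates the segment and exports its ID before starting the target, so the instrumented program only ever attaches to an existing region. When the variable is absent, for example when the binary is run by hand, no map is attached and the program simply runs uninstrumented.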

__afl_forkserver

__afl_forkserver:

  /* Enter the fork server mode to avoid the overhead of execve() calls. We
     push rdx (area ptr) twice to keep stack alignment neat. */

  pushq %rdx
  pushq %rdx

  /* Phone home and tell the parent that we're OK. (Note that signals with
     no SA_RESTART will mess it up). If this fails, assume that the fd is
     closed because we were execve()d from an instrumented binary, or because
     the parent doesn't want to use the fork server. */

  movq $4, %rdx               /* length    */
  leaq __afl_temp(%rip), %rsi /* data      */
  movq $" STRINGIFY((FORKSRV_FD + 1)) ", %rdi       /* file desc */
  CALL_L64("write")

  cmpq $4, %rax
  jne  __afl_fork_resume

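Main function: write the 4 bytes in __afl_temp to file descriptor FORKSRV_FD + 1 (that is, 198 + 1 = 199, the status pipe) to tell the fuzzer that the fork server has started. If the write succeeds, execution continues into __afl_fork_wait_loop; otherwise it jumps to __afl_fork_resume and the program runs without the fork server.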

__afl_fork_wait_loop

__afl_fork_wait_loop:

  /* Wait for parent by reading from the pipe. Abort if read fails. */

  movq $4, %rdx               /* length    */
  leaq __afl_temp(%rip), %rsi /* data      */
  movq $" STRINGIFY(FORKSRV_FD) ", %rdi             /* file desc */
  CALL_L64("read")
  cmpq $4, %rax
  jne  __afl_die

  /* Once woken up, create a clone of our process. This is an excellent use
     case for syscall(__NR_clone, 0, CLONE_PARENT), but glibc boneheadedly
     caches getpid() results and offers no way to update the value, breaking
     abort(), raise(), and a bunch of other things :-( */

  CALL_L64("fork")
  cmpq $0, %rax
  jl   __afl_die
  je   __afl_fork_resume

  /* In parent process: write PID to pipe, then wait for child. */

  movl %eax, __afl_fork_pid(%rip)

  movq $4, %rdx                   /* length    */
  leaq __afl_fork_pid(%rip), %rsi /* data      */
  movq $" STRINGIFY((FORKSRV_FD + 1)) ", %rdi             /* file desc */
  CALL_L64("write")

  movq $0, %rdx                   /* no flags  */
  leaq __afl_temp(%rip), %rsi     /* status    */
  movq __afl_fork_pid(%rip), %rdi /* PID       */
  CALL_L64("waitpid")
  cmpq $0, %rax
  jle  __afl_die

  /* Relay wait status to pipe, then loop back. */

  movq $4, %rdx               /* length    */
  leaq __afl_temp(%rip), %rsi /* data      */
  movq $" STRINGIFY((FORKSRV_FD + 1)) ", %rdi         /* file desc */
  CALL_L64("write")

  jmp  __afl_fork_wait_loop

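Wait for a command from the fuzzer on the control pipe and read it into __afl_temp:
- If the read fails, jump to __afl_die and end the loop
- If it succeeds, continue
fork a child process; the child runs __afl_fork_resume
Store the child's PID in __afl_fork_pid and write it to the status pipe for the fuzzer
Wait for the child to finish, then write its exit status to the status pipe to inform the fuzzer
Start the next round of __afl_fork_wait_loop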
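The same protocol can be written as a short C loop. This is a sketch under assumptions rather than AFL's implementation: the descriptors are passed in as parameters instead of being fixed at 198 and 199, the callback target (with demo_target as a hypothetical example) stands in for "resume executing the instrumented program", and a failed initial write simply returns instead of jumping to __afl_fork_resume.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the rest of the target program. */
int demo_target(void) { return 7; }

/* Sketch of the fork server loop. ctl_fd plays the role of FORKSRV_FD and
   st_fd that of FORKSRV_FD + 1. Returns once the control pipe is closed. */
void fork_server_loop(int ctl_fd, int st_fd, int (*target)(void)) {

  uint32_t tmp = 0;
  int32_t  child_pid;
  int      status;

  assert(target != NULL);

  /* Phone home: tell the fuzzer that the fork server is up. */
  if (write(st_fd, &tmp, 4) != 4) return;

  for (;;) {

    /* Block until the fuzzer asks for another run. */
    if (read(ctl_fd, &tmp, 4) != 4) return;

    child_pid = fork();
    if (child_pid < 0) _exit(1);

    if (child_pid == 0) {
      /* Child: drop the pipes and run the target (__afl_fork_resume). */
      close(ctl_fd);
      close(st_fd);
      _exit(target());
    }

    /* Parent: report the PID, wait, then relay the wait status. */
    if (write(st_fd, &child_pid, 4) != 4) _exit(1);
    if (waitpid(child_pid, &status, 0) <= 0) _exit(1);
    if (write(st_fd, &status, 4) != 4) _exit(1);

  }

}
```

Each run costs one fork() and three 4-byte pipe messages; the expensive execve(), dynamic linking, and library initialization happen only once, before the loop starts.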

__afl_fork_resume

__afl_fork_resume:

  /* In child process: close fds, resume execution. */

  movq $" STRINGIFY(FORKSRV_FD) ", %rdi
  CALL_L64("close")

  movq $" STRINGIFY((FORKSRV_FD + 1)) ", %rdi
  CALL_L64("close")

  popq %rdx
  popq %rdx

  movq %r12, %rsp
  popq %r12

  movq  0(%rsp), %rax
  movq  8(%rsp), %rcx
  movq 16(%rsp), %rdi
  movq 32(%rsp), %rsi
  movq 40(%rsp), %r8
  movq 48(%rsp), %r9
  movq 56(%rsp), %r10
  movq 64(%rsp), %r11

  movq  96(%rsp), %xmm0
  movq 112(%rsp), %xmm1
  movq 128(%rsp), %xmm2
  movq 144(%rsp), %xmm3
  movq 160(%rsp), %xmm4
  movq 176(%rsp), %xmm5
  movq 192(%rsp), %xmm6
  movq 208(%rsp), %xmm7
  movq 224(%rsp), %xmm8
  movq 240(%rsp), %xmm9
  movq 256(%rsp), %xmm10
  movq 272(%rsp), %xmm11
  movq 288(%rsp), %xmm12
  movq 304(%rsp), %xmm13
  movq 320(%rsp), %xmm14
  movq 336(%rsp), %xmm15

  leaq 352(%rsp), %rsp

  jmp  __afl_store

Close the fork server pipe descriptors in the child
Restore the child's register state
jmp to __afl_store

__afl_store

__afl_store:

  /* Calculate and store hit for the code location specified in rcx. */

#ifndef COVERAGE_ONLY
  xorq __afl_prev_loc(%rip), %rcx
  xorq %rcx, __afl_prev_loc(%rip)
  shrq $1, __afl_prev_loc(%rip)
#endif /* ^!COVERAGE_ONLY */

#ifdef SKIP_COUNTS
  orb  $1, (%rdx, %rcx, 1)
#else
  incb (%rdx, %rcx, 1)
#endif /* ^SKIP_COUNTS */

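In IDA's decompiled view of this code, a4 in the first XOR is the argument passed to __afl_maybe_log (the random ID of the current stub), and _afl_prev_loc holds the value left by the previous stub (its random ID shifted right by one). The first XOR leaves cur ^ prev in rcx, which is the index into the shared memory map; the second XOR stores the current ID back into _afl_prev_loc, which is then shifted right by one to become the new _afl_prev_loc. Finally, the byte of shared memory at that index is incremented by one.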
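The three instructions in __afl_store amount to the following C, shown here as a minimal sketch: trace_bits and prev_loc are local stand-ins for the shared memory region and __afl_prev_loc, and log_edge is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE (1 << 16)              /* AFL's default map size       */

static uint8_t  trace_bits[MAP_SIZE];   /* stands in for the SHM region */
static uint32_t prev_loc;               /* stands in for __afl_prev_loc */

/* Record the edge (previous block -> current block). cur_loc is the
   random ID baked into the trampoline. */
void log_edge(uint32_t cur_loc) {

  assert(cur_loc < MAP_SIZE);

  trace_bits[cur_loc ^ prev_loc]++;     /* hit count for this tuple      */
  prev_loc = cur_loc >> 1;              /* keeps A->B distinct from B->A */

}
```

Without the shift, the edges A->B and B->A would hash to the same cell (A ^ B in both cases), and a tight loop A->A would always land in cell 0. Shifting the stored value by one bit keeps these cases apart.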

References

http://lcamtuf.coredump.cx/afl/

https://github.com/google/AFL

https://blog.csdn.net/further_eye/article/details/120842471

https://bbs.pediy.com/thread-269536.htm