Notes on pointer handling that I put together earlier for a friend who is new to C.

Example 1

int v = 0;
int *p = &v;

Resource management in C usually takes real care. When the management strategy is inconsistent or unclear, problems come very easily (double free, memory leak, stack/heap corruption, dangling pointer, ...).

First, be clear about whether the memory a pointer points to is on the stack or on the heap.

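In this example p points to v. Both v and p are on the stack (they are local variables):

  • Neither needs to be freed manually.
  • The lifetime of both is limited to the enclosing { }.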
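
To make the stack case a bit more concrete, here is a minimal sketch. The function names increment and count_twice are made up for illustration; the point is only that &v can be handed to a callee while v is still in scope, and that nothing has to be freed.

```c
#include <assert.h>
#include <stddef.h>

/* Adds one to the int that p points to. */
static void increment(int *p)
{
    assert(p != NULL);   /* assumed precondition: the caller passes a valid address */
    *p += 1;
}

/* v lives on the stack of count_twice(); &v stays valid until the function returns. */
int count_twice(void)
{
    int v = 0;
    increment(&v);
    increment(&v);
    return v;            /* return the value, not &v: v is gone after the closing brace */
}
```

Calling count_twice() yields 2. Returning &v instead would hand the caller a dangling pointer, because v stops existing when the function's { } ends.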

Example 2

int *p = malloc(sizeof(int));
*p = 0;
free(p);

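A pointer to an int on the heap.

  • p is on the stack.
  • The int it points to is on the heap.
  • p itself does not need to be freed manually.
  • The int on the heap needs free().
  • free(p) means: read the value of p from the stack (an address) and pass it to free(), which then marks that block as available again in the heap allocator's bookkeeping.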
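
The split between "the pointer variable" and "the block it points to" is easiest to see when the block outlives the function that allocated it. A small sketch, with the hypothetical helper names int_new and int_take:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a heap-allocated int holding value, or NULL if malloc() fails.
 * The caller owns the block and must release it exactly once. */
int *int_new(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    *p = value;
    return p;   /* the local variable p disappears here; the heap block stays */
}

/* Reads the heap int and releases the block; p must come from int_new(). */
int int_take(int *p)
{
    assert(p != NULL);
    int value = *p;
    free(p);    /* passes the address stored in p to the allocator */
    return value;
}
```

Here int_take(int_new(7)) would return 7 (after checking that int_new() did not return NULL). Who frees the block is a convention between the two functions, which is exactly the kind of management strategy that has to be kept consistent.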

Example 3

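/* Case 1: the address of a stack variable */
int v = 0;
int *p = &v;
free(p);

/* Case 2: an address that is offset from the start of the block */
int *q = malloc(sizeof(int) * 2);
q ++;
free(q);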

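The address passed to free() must be one that the heap allocator has on record. Passing an address on the stack (the first case), or an address that is not the start of the block but has an offset added to it (the second case), is undefined behavior. The allocator cannot find such an address in its records, and with glibc the program typically aborts at runtime with an error like this: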

$ gcc -o test test.c && ./test
free(): invalid pointer
Aborted (core dumped)
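
One way to stay out of trouble is to keep the address returned by malloc() untouched and walk the block with a second pointer. The sketch below assumes a made-up function sum_sequence and an arbitrary cap MAX_ITEMS:

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_ITEMS 1000   /* hypothetical cap, keeps the sketch free of overflow concerns */

/* Fills a heap array with 0..n-1, sums it with a moving cursor, and frees it.
 * Returns the sum, or -1 if n is out of range or the allocation fails. */
long sum_sequence(size_t n)
{
    if (n == 0 || n > MAX_ITEMS)
        return -1;

    int *base = malloc(n * sizeof *base);   /* base is never modified */
    if (base == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        base[i] = (int)i;

    long sum = 0;
    for (int *cur = base; cur != base + n; cur++)   /* only the cursor moves */
        sum += *cur;

    free(base);   /* exactly the address malloc() returned */
    return sum;
}
```

For n = 4 the array holds 0, 1, 2, 3, so the function returns 6.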

Example 4

int *p = malloc(sizeof(int) * 2);
free(p);
*p = 0;

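Once the memory behind a pointer has been released, if the pointer variable itself will stay alive for a while (for example, a struct has a pointer member, only the resource that member points to has been released, and the struct as a whole will keep being used), remember to reset it to NULL so that a dangling pointer does not cause bugs.

In the example above, after free(p) the memory block p points to has been returned (but its address is still stored in p). We then write to that address (*p = 0). This is undefined behavior, but because the virtual address is still mapped to physical memory, the program does not necessarily crash (it depends on the operating system and the libc implementation); this example simply runs to a normal exit. So when that memory block is later handed out for some other use and we mistakenly write to the same address again, the result is a bug that is very hard to track down.

Usually address 0 (NULL), or rather the memory page starting at 0, is deliberately left unmapped. That way, as soon as you write through a NULL pointer you get SIGSEGV (segmentation fault), and the problem is much easier to find. With the following change the example above crashes in an obvious way: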

int *p = malloc(sizeof(int) * 2);
free(p);
p = NULL;
*p = 0;

Output

$ gcc -ggdb -o test test.c && ./test
Segmentation fault (core dumped)
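
For the struct-member case mentioned above, the reset to NULL fits naturally into a small release function. This is only a sketch; struct buffer, buffer_init and buffer_release are names invented for the example.

```c
#include <stdlib.h>

struct buffer {
    char  *data;   /* heap block owned by the struct, or NULL */
    size_t len;
};

/* Allocates len zeroed bytes; returns 0 on success, -1 on failure. */
int buffer_init(struct buffer *b, size_t len)
{
    /* Ask for at least 1 byte so that a NULL result always means failure. */
    b->data = calloc(len ? len : 1, 1);
    if (b->data == NULL) {
        b->len = 0;
        return -1;
    }
    b->len = len;
    return 0;
}

/* Releases the block but leaves the struct usable: data is reset to NULL. */
void buffer_release(struct buffer *b)
{
    free(b->data);
    b->data = NULL;   /* no stale address left behind */
    b->len = 0;
}
```

Because free(NULL) is defined to do nothing, calling buffer_release() twice on the same struct is harmless, whereas a second free() of a stale address would be a double free.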

Example 5

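Memory allocation can fail (at runtime). A spot where things can go wrong has to be faced head on; otherwise, when it does happen at runtime you typically get SIGSEGV (malloc() returned NULL, and writing through that pointer crashes).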

int *p = malloc(sizeof(int));
if(!p) {
    fprintf(stderr, "failed to ...\n");
    return;
}

*p = 0;
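
When the allocation happens inside a helper, the failure also has to reach the caller somehow. One common convention, sketched here with the hypothetical names struct point and point_create, is to return a status code and hand the pointer back through an out-parameter:

```c
#include <stdio.h>
#include <stdlib.h>

struct point { int x; int y; };

/* Stores a new heap-allocated point in *out.
 * Returns 0 on success, -1 if the allocation failed (*out is then NULL). */
int point_create(int x, int y, struct point **out)
{
    struct point *p = malloc(sizeof *p);
    if (p == NULL) {
        fprintf(stderr, "point_create: out of memory\n");
        *out = NULL;
        return -1;
    }
    p->x = x;
    p->y = y;
    *out = p;
    return 0;
}
```

The caller checks the return value before touching the pointer and calls free() on it when done. Returning NULL directly works just as well; what matters is that every caller checks.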

Example 6

float v = 1.0;
int *p = (int *) &v;
printf("%f (%zu), 0x%x (%zu)\n", v, sizeof(v), (unsigned) *p, sizeof(*p));

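A pointer can be thought of as "a point of view on a memory block", or "the angle from which the contents of a memory block are interpreted". So the same memory block, pointed to by pointers of different types, "may" read back as wildly different content. Output of the example above: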

$ gcc -ggdb -o test test.c && ./test
1.000000 (4), 0x3f800000 (4)

Because the value 1.0 is stored in memory in IEEE-754 format, looking at it from the int * point of view gives 0x3f800000.
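
One caveat: reading a float object through an int * breaks the C standard's aliasing rules, so an optimizing compiler is allowed to do surprising things with it. If you actually need the bit pattern, copying the bytes with memcpy() is the usual portable route. A minimal sketch, assuming float is a 32-bit IEEE-754 type as on the machine above (the function name float_bits is made up):

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide, as on the platform in the text. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Returns the bit pattern of f without reading it through an int pointer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copies the 4 bytes, no value conversion */
    return bits;
}
```

On an IEEE-754 platform float_bits(1.0f) gives 0x3f800000, the same value as in the output above.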

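By the way, a cast between scalar types (int, float, ...) is not the same thing as a cast between pointer types.

In C, implicit conversions between scalar types are allowed and may change the value (for example, double => float loses precision, and int64 => int32 throws away 4 bytes of information entirely). A pointer cast, on the other hand, does not change the address or the contents of the memory block being pointed to (at least on mainstream platforms); the only thing that differs is the type the compiler uses when checking and interpreting it. This point is very important: OO in C is built on exactly this property.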
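
As an illustration of the "OO in C" remark, here is one common way it is done: the base struct is placed as the first member of the derived struct, so a pointer to either one holds the same address and can be cast back and forth. All names below (struct shape, struct rect, rect_init, shape_area) are invented for the sketch.

```c
/* "Base class": every shape starts with this header. */
struct shape {
    double (*area)(const struct shape *self);
};

/* "Derived class": struct shape is the FIRST member, so a pointer to a
 * struct rect and a pointer to its base member hold the same address. */
struct rect {
    struct shape base;
    double w, h;
};

static double rect_area(const struct shape *self)
{
    /* Cast back: same address, now viewed as a struct rect. */
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

void rect_init(struct rect *r, double w, double h)
{
    r->base.area = rect_area;
    r->w = w;
    r->h = h;
}

/* Works on any shape without knowing its concrete type. */
double shape_area(const struct shape *s)
{
    return s->area(s);
}
```

After rect_init(&r, 2.0, 3.0), calling shape_area((struct shape *)&r) returns 6.0. The cast changes nothing in memory; it only changes which struct layout the compiler applies to that address.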

Example 6b

int *p = malloc(sizeof(char));
*p = 0;

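Pay attention to the size when allocating memory; this is another kind of problem that is hard to track down. In the example above, for efficiency libc usually does not ask the kernel for just 1 byte but for memory in units of pages (for example 4 KiB, which you can check with getconf PAGESIZE). So although we malloc()'d 1 byte and then wrote 4 bytes, nothing crashes (a segmentation fault only happens when the virtual address you access has not been mapped to physical memory by the kernel: the CPU raises a trap, the kernel catches it and delivers SIGSEGV to your process). The write is still a heap buffer overflow; it just goes unnoticed.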
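
A habit that removes this particular mismatch is to let the pointer itself supply the element size, by writing sizeof *p instead of naming a type. A small sketch (ints_alloc is a made-up helper):

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for count ints.
 * Returns NULL on failure, for count == 0, or if the byte size would overflow. */
int *ints_alloc(size_t count)
{
    int *p;
    if (count == 0 || count > SIZE_MAX / sizeof *p)
        return NULL;
    p = malloc(count * sizeof *p);   /* sizeof *p is sizeof(int) because p is an int * */
    return p;
}
```

If the type of p is changed later, the size expression follows automatically, so the allocation cannot silently end up smaller than what is written through the pointer.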

Example 7

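#include <stdio.h>

int x;                  /* global: initialized to 0 */

int func()
{
    static int y;       /* static: initialized to 0, lives until the program ends */
    return y ++;
}

void func2()
{
    int a;              /* stack variable: not initialized */
    printf("%d\n", a);  /* prints whatever happens to be there (undefined behavior) */
    a = 10;
}

int main()
{
    /* the order in which the two func() calls are evaluated is unspecified */
    printf("%d, %d, %d\n", x, func(), func());
    func2();
    func2();
}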

Output

$ gcc -ggdb -o test test.c && ./test
0, 1, 0
22034
10

For efficiency, C does not initialize most variables; the exceptions are global variables and static variables.

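In this example x is a global variable and y is a static variable (note the static keyword: y is not placed on the stack, and its life span, like that of a global variable, lasts until the program ends). Both are initialized to 0, so the first time func() is called it returns 0, and the second time it returns 1. (The output shows 1 before 0 because C does not specify the order in which function arguments are evaluated; in this run the rightmost func() was called first.)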
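
If the order of two such calls matters, putting each call in its own statement makes it well defined. A minimal sketch with the made-up names next_id and two_ids:

```c
/* Hands out 0, 1, 2, ... on successive calls. */
int next_id(void)
{
    static int id;      /* initialized to 0 once, keeps its value between calls */
    return id++;
}

/* Stores the results of two calls in a well-defined order. */
void two_ids(int *first, int *second)
{
    *first  = next_id();   /* this statement completes ... */
    *second = next_id();   /* ... before this one starts */
}
```

On the first use, two_ids() always yields 0 and then 1, regardless of compiler or optimization level.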

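The variable a is placed on the stack. The contents of the stack change as stack variables are accessed and as functions are entered and left, and since variables on the stack are not initialized, they can hold any garbage. In the example above, the first call to func2() printed 22034, and the second call printed the value that was assigned to a during the first call (10), because the second call happened to reuse the same stack slot. Neither result is guaranteed, so initializing stack variables before use is very important.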

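Also, because allocating a memory block from the heap happens at runtime, the C compiler does not initialize its contents either, so it has to be initialized before use as well (how depends on the situation).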
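
A few common ways to do that initialization are sketched below. struct counters and the function names are made up for the example; which variant fits depends on the situation.

```c
#include <stdlib.h>
#include <string.h>

struct counters { int hits; int misses; };

/* Stack: "= {0}" sets every member to zero at the point of definition. */
int stack_counters_total(void)
{
    struct counters c = {0};
    return c.hits + c.misses;
}

/* Heap: calloc() returns zero-filled memory, unlike malloc(). */
struct counters *counters_new(void)
{
    return calloc(1, sizeof(struct counters));
}

/* Heap, explicit: malloc() followed by memset() when calloc() does not fit. */
struct counters *counters_new_explicit(void)
{
    struct counters *c = malloc(sizeof *c);
    if (c != NULL)
        memset(c, 0, sizeof *c);
    return c;
}
```

Both heap variants return NULL on allocation failure, and the caller is responsible for free(). Zero-filling gives 0 for integer members; for pointer or floating-point members, assigning each field explicitly is the more portable choice.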

Tool 1

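With valgrind you can detect memory access problems such as leaks, buffer overflows, and use of uninitialized values. The upside is that your program does not have to be linked against an extra library and no extra machine code is generated into it; the downside is that the program runs considerably slower.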

For example

#include <stdlib.h>
#include <stdio.h>

int main()
{
        char *p = malloc(1);
        p[1] = 0;

        char *p2 = malloc(1);
        free(p2);
        p2[0] = 0;

        int i;
        printf("%d\n", i);

        char a[1];
        a[1] = 0;
}

Running it under valgrind

$ valgrind --leak-check=full ./test
==29793== Memcheck, a memory error detector
==29793== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==29793== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==29793== Command: ./test
==29793== 
==29793== Invalid write of size 1
==29793==    at 0x108767: main (test.c:8)
==29793==  Address 0x4a4d041 is 0 bytes after a block of size 1 alloc'd
==29793==    at 0x483874F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==29793==    by 0x10875A: main (test.c:7)
==29793== 
==29793== Invalid write of size 1
==29793==    at 0x108788: main (test.c:12)
==29793==  Address 0x4a4d090 is 0 bytes inside a block of size 1 free'd
==29793==    at 0x483997B: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==29793==    by 0x108783: main (test.c:11)
==29793==  Block was alloc'd at
==29793==    at 0x483874F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==29793==    by 0x108773: main (test.c:10)
==29793== 
==29793== Conditional jump or move depends on uninitialised value(s)
==29793==    at 0x48D6E40: __vfprintf_internal (vfprintf-internal.c:1644)
==29793==    by 0x48C18D7: printf (printf.c:33)
==29793==    by 0x1087A0: main (test.c:15)
==29793== 
==29793== Use of uninitialised value of size 8
==29793==    at 0x48BB32E: _itoa_word (_itoa.c:179)
==29793==    by 0x48D69EF: __vfprintf_internal (vfprintf-internal.c:1644)
==29793==    by 0x48C18D7: printf (printf.c:33)
==29793==    by 0x1087A0: main (test.c:15)
==29793== 
==29793== Conditional jump or move depends on uninitialised value(s)
==29793==    at 0x48BB339: _itoa_word (_itoa.c:179)
==29793==    by 0x48D69EF: __vfprintf_internal (vfprintf-internal.c:1644)
==29793==    by 0x48C18D7: printf (printf.c:33)
==29793==    by 0x1087A0: main (test.c:15)
==29793== 
==29793== Conditional jump or move depends on uninitialised value(s)
==29793==    at 0x48D748B: __vfprintf_internal (vfprintf-internal.c:1644)
==29793==    by 0x48C18D7: printf (printf.c:33)
==29793==    by 0x1087A0: main (test.c:15)
==29793== 
==29793== Conditional jump or move depends on uninitialised value(s)
==29793==    at 0x48D6B5A: __vfprintf_internal (vfprintf-internal.c:1644)
==29793==    by 0x48C18D7: printf (printf.c:33)
==29793==    by 0x1087A0: main (test.c:15)
==29793== 
0
==29793== 
==29793== HEAP SUMMARY:
==29793==     in use at exit: 1 bytes in 1 blocks
==29793==   total heap usage: 3 allocs, 2 frees, 1,026 bytes allocated
==29793== 
==29793== 1 bytes in 1 blocks are definitely lost in loss record 1 of 1
==29793==    at 0x483874F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==29793==    by 0x10875A: main (test.c:7)
==29793== 
==29793== LEAK SUMMARY:
==29793==    definitely lost: 1 bytes in 1 blocks
==29793==    indirectly lost: 0 bytes in 0 blocks
==29793==      possibly lost: 0 bytes in 0 blocks
==29793==    still reachable: 0 bytes in 0 blocks
==29793==         suppressed: 0 bytes in 0 blocks
==29793== 
==29793== For counts of detected and suppressed errors, rerun with: -v
==29793== Use --track-origins=yes to see where uninitialised values come from
==29793== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)

Tool 2

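To improve safety, modern compilers have, over years of development, gained extra checks and runtime diagnostics aimed at the characteristics of C (the examples below use the same source as Tool 1).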

Catching problems at compile time

$ gcc -g -o test test.c -fsanitize=leak -fsanitize=address -O2 -Wall && ./test
test.c: In function ‘main’:
test.c:17:7: warning: variable ‘a’ set but not used [-Wunused-but-set-variable]
  char a[1];
       ^
test.c:15:2: warning: ‘i’ is used uninitialized in this function [-Wuninitialized]
  printf("%d\n", i);
  ^~~~~~~~~~~~~~~~~
test.c:18:3: warning: array subscript is above array bounds [-Warray-bounds]
  a[1] = 0;
  ~^~~

gcc provides a family of -fsanitize switches that can be used to detect problems at runtime. This one, for example, detects memory leaks:

$ gcc -g -o test test.c -fsanitize=leak && ./test
0

=================================================================
==2416==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1 byte(s) in 1 object(s) allocated from:
    #0 0x7f6905fc0acb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x560c8c6bb7ea in main /home/derekdai/test.c:7
    #2 0x7f6905dedb6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)

SUMMARY: LeakSanitizer: 1 byte(s) leaked in 1 allocation(s).

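Adding this one detects other problems with memory operations (it works by surrounding each allocated memory block with poisoned red zones and checking every access against shadow memory; an access that lands in a red zone means you went out of range):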

$ gcc -g -o test test.c -fsanitize=leak -fsanitize=address && ./test
=================================================================
==9544==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000011 at pc 0x555a893bcb9d bp 0x7ffdadf90d00 sp 0x7ffdadf90cf0
WRITE of size 1 at 0x602000000011 thread T0
    #0 0x555a893bcb9c in main /home/derekdai/test.c:8
    #1 0x7ff49bb5ab6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)
    #2 0x555a893bc9e9 in _start (/home/derekdai/test+0x9e9)

0x602000000011 is located 0 bytes to the right of 1-byte region [0x602000000010,0x602000000011)
allocated by thread T0 here:
    #0 0x7ff49bdfdb50 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb50)
    #1 0x555a893bcb56 in main /home/derekdai/test.c:7
    #2 0x7ff49bb5ab6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/derekdai/test.c:8 in main
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[01]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==9544==ABORTING

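Because a heap buffer overflow is a fairly serious problem (it is easily exploited by an attacker), the sanitizer stops the program on the spot. (On Linux, a process that calls stdlib's abort() receives SIGABRT, whose default action is a core dump, but saving core dumps has to be enabled beforehand, otherwise no core file is kept; ASan can be told to call abort() with ASAN_OPTIONS=abort_on_error=1. With a core file and the matching executable, gdb can restore the state of the process at the moment it died.)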
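
Enabling core dumps is usually done from the shell, but a process can also raise its own limit. The sketch below assumes a POSIX system and uses a made-up function name, enable_core_dumps. Whether a core file is actually written still depends on system settings outside the process (on Linux, for instance, the kernel's core pattern).

```c
#include <sys/resource.h>

/* Raises the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;   /* the soft limit may be raised up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Called early in main(), this removes one common reason for a missing core file. If the hard limit itself is 0, it has to be raised by whoever starts the process.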

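The sanitizers also work together with optimization options; the difference is that the reserved areas before and after a memory block are somewhat smaller. In the -O2 run below, the first error reported is the stack overflow on a rather than the heap write.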

$ gcc -o test test.c -fsanitize=leak -fsanitize=address -O2 && ./test
0
=================================================================
==18965==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffd8eb6831 at pc 0x55ad105daa44 bp 0x7fffd8eb6800 sp 0x7fffd8eb67f0
WRITE of size 1 at 0x7fffd8eb6831 thread T0
    #0 0x55ad105daa43 in main (/home/derekdai/test+0xa43)
    #1 0x7f306df51b6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)
    #2 0x55ad105daad9 in _start (/home/derekdai/test+0xad9)

Address 0x7fffd8eb6831 is located in stack of thread T0 at offset 33 in frame
    #0 0x55ad105da97f in main (/home/derekdai/test+0x97f)

  This frame has 1 object(s):
    [32, 33) 'a' <== Memory access at offset 33 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/derekdai/test+0xa43) in main
Shadow bytes around the buggy address:
  0x10007b1cecb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1cecc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1cecd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1cece0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1cecf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10007b1ced00: 00 00 f1 f1 f1 f1[01]f2 f2 f2 00 00 00 00 00 00
  0x10007b1ced10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1ced20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1ced30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1ced40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007b1ced50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18965==ABORTING

This one performs runtime checks for undefined behaviour in C:

$ gcc -g -o test test.c -fsanitize=undefined -O2 && ./test
test.c:8:7: runtime error: store to address 0x558f5e938e71 with insufficient space for an object of type 'char'
0x558f5e938e71: note: pointer points here
 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  81 f1 00 00 00
              ^ 
0
test.c:18:3: runtime error: index 1 out of bounds for type 'char [1]'
test.c:18:7: runtime error: store to address 0x7ffec3ed2888 with insufficient space for an object of type 'char'
0x7ffec3ed2888: note: pointer points here
 8f 55 00 00  00 e4 c4 9c 3c da 03 23  80 29 ed c3 fe 7f 00 00  00 00 00 00 00 00 00 00  90 4a 7d 5d
              ^

I recommend keeping the 3 sanitizers above turned on for both debug builds and release builds.

See also

Tool 3

libc itself has also been hardened somewhat, so that some of the less well designed functions in the C standard library become a bit safer.

All it takes is adding this when building the code: `-D

Reference

Tool 4

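Modern compilers support -O (optimization) and -g (debugging) at the same time. With earlier versions, combining the two could leave you able to debug but with the debugger stepping inaccurately; newer versions have far fewer problems of this kind.

If a release build carries debug information, then when a problem occurs in the field after release, getting hold of the core file lets you reconstruct the situation at the time of the failure quite well.

However, shipping a release build with debug information inside the object file increases its size, which may make users less willing to update.

Fortunately, the toolchain can split the debug information out into a separate ELF file. First, let's look at what state our object file is in: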

$ ls -l test
-rwxr-xr-x 1 user user 12640  9月  6 00:25 test
$ file test
test: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=3f781ca58146ef32b371ed5a8bdea0c731298a4c, with debug_info, not stripped

The size is 12640 bytes, with debug_info, not stripped: the object file carries quite a bit of extra information.

First, extract the debug info:

$ objcopy --only-keep-debug test test.debug
$ ls -l test.debug
-rwxr-xr-x 1 user user 8880  9月  6 00:26 test.debug
$ file test.debug
test.debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, for GNU/Linux 3.2.0, BuildID[sha1]=3f781ca58146ef32b371ed5a8bdea0c731298a4c, with debug_info, not stripped

Then remove the unneeded information from the object file that is going to be released:

$ strip --strip-unneeded test
$ ls -l test
-rwxr-xr-x 1 user user 6440  9月  6 00:29 test
$ file test
test: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=3f781ca58146ef32b371ed5a8bdea0c731298a4c, stripped

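Now the object file to be released has shrunk to 6440 bytes, and file no longer reports with debug_info. All we need to do is keep track of the debug information that corresponds to this object file, and it can be used for debugging later. It can be identified by the BuildID, which newer toolchains add: object files produced from the same source in the same environment (toolchain version, toolchain flags, library versions, hardware, and so on) get the same BuildID.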

Other references

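Debian takes security seriously, and in recent years it has been enabling hardening mechanisms wherever possible when building packages. If what you are developing is a PAM module, and it is exposed to the network, then besides stability you may also need to think about not getting broken into. Their approach is worth a look (it costs next to nothing, just a few extra switches at build time):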
https://wiki.debian.org/Hardening