CLK2015 Quick Notes - How To Be A Filesystem Developer

  • Why be a filesystem developer
    • Big data (GFS)
    • flash storage
    • containers (btrfs)
    • ...
  • Why be a btrfs developer
    • bleeding edge
      • feature rich
      • already have
        • CoW, compression, deduplication, RAID
      • Under active development
        • inband deduplication
        • subpage sector size support
        • separate qgroup accounting for metadata
      • ...
  • Why be a btrfs developer
    • next gen filesystem
      • btrfs is considered a next-generation filesystem
      • some projects are already using btrfs
        • systemd, Docker, openSUSE, Facebook
      • lead on latest tech
        • Oracle Database with ZFS deduplication
  • How to be a FS developer (roadmap)
    1. normal user, bug report and QA
    2. understanding on-disk format
    3. btrfs-progs developer
    4. kernel btrfs developer
  • recommended: use a rolling-release distribution to test the latest filesystem code
  • normal user: bug report and QA
    • bug report
      • with detailed info
      • kernel/btrfs-progs version
      • kernel backtrace if needed
      • reproducer if reproducible
    • QA: needs a little more skill
      • compile latest kernel and btrfs-progs from source
      • git & bisect
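
The "git & bisect" skill above comes down to a binary search over commit history. A minimal Python sketch of that search, assuming a linear history; the commit names and the `is_bad` test callback are hypothetical and stand in for "build this kernel and run the reproducer":

```python
from typing import Callable, List, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # regression is at mid or earlier
        else:
            good = mid     # regression is after mid
    return commits[bad]


# Hypothetical history: the regression was introduced in "c5".
history = [f"c{i}" for i in range(10)]
tested: List[str] = []


def is_bad(commit: str) -> bool:
    tested.append(commit)          # remember how many builds were needed
    return int(commit[1:]) >= 5


culprit = first_bad_commit(history, is_bad)
```

With real repositories, `git bisect start`, `git bisect good`, `git bisect bad` and `git bisect run` drive the same search, so only a handful of builds are needed even across many commits.
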
  • QA
    • performance test
      • PTS
      • sysbench
      • fio
    • function test
      • fstests
      • LTP
  • Understanding on-disk format
    • on-disk data is static
    • no C code involved
    • good existing tools to examine it: dumpe2fs, btrfs-debug-tree
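
Because on-disk data is static, it can be inspected with a few lines of script before touching any C. A small sketch that checks for the btrfs magic string in the primary superblock; the offsets below follow the commonly documented layout (superblock at 64 KiB, magic 0x40 bytes into it), and the demo images are fake files created only for illustration:

```python
import os
import tempfile

SUPERBLOCK_OFFSET = 0x10000   # primary btrfs superblock starts at 64 KiB
MAGIC_OFFSET = 0x40           # magic follows csum, fsid, bytenr and flags
BTRFS_MAGIC = b"_BHRfS_M"


def looks_like_btrfs(path: str) -> bool:
    """Check whether the file or block device at `path` carries the btrfs magic."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET + MAGIC_OFFSET)
        return f.read(len(BTRFS_MAGIC)) == BTRFS_MAGIC


# Self-contained demo on fake images (not real filesystems).
with tempfile.NamedTemporaryFile(delete=False) as img:
    img.write(b"\0" * (SUPERBLOCK_OFFSET + MAGIC_OFFSET))
    img.write(BTRFS_MAGIC)
    fake_image = img.name

with tempfile.NamedTemporaryFile(delete=False) as img:
    img.write(b"\0" * 4096)
    empty_image = img.name

is_btrfs = looks_like_btrfs(fake_image)
is_empty_btrfs = looks_like_btrfs(empty_image)
os.unlink(fake_image)
os.unlink(empty_image)
```

Pointing `looks_like_btrfs` at an image produced by `mkfs.btrfs` is a reasonable first exercise before reading full tree dumps.
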
  • btrfs
    • btrfs stands for b-tree fs
    • all metadata is stored in a b-tree
    • node
      • records pointers to its child leaves/nodes
    • leaf
      • records detailed info with its index key
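
The node/leaf split above can be sketched as a toy lookup. This is a simplified model, not the btrfs implementation: it assumes keys are `(objectid, type, offset)` tuples compared in that order, stores item payloads as plain strings, and the sample tree contents are made up for illustration:

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

Key = Tuple[int, int, int]  # (objectid, type, offset), compared in that order


@dataclass
class Leaf:
    keys: List[Key] = field(default_factory=list)
    items: List[str] = field(default_factory=list)   # payloads, simplified to strings


@dataclass
class Node:
    keys: List[Key] = field(default_factory=list)    # first key of each child
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def search(root: Union[Node, Leaf], key: Key) -> Optional[str]:
    """Walk from the root down to a leaf and return the item stored under `key`."""
    block = root
    while isinstance(block, Node):
        # Nodes only hold pointers: follow the last child whose first key <= key.
        slot = max(bisect_right(block.keys, key) - 1, 0)
        block = block.children[slot]
    # Leaves hold the actual items, sorted by key.
    slot = bisect_right(block.keys, key) - 1
    if slot >= 0 and block.keys[slot] == key:
        return block.items[slot]
    return None


# Illustrative type values and tree contents (assumptions for the demo).
INODE_ITEM, EXTENT_DATA = 1, 108
left = Leaf([(256, INODE_ITEM, 0), (257, INODE_ITEM, 0)], ["inode 256", "inode 257"])
right = Leaf([(257, EXTENT_DATA, 0), (258, INODE_ITEM, 0)], ["extent of 257", "inode 258"])
root = Node([left.keys[0], right.keys[0]], [left, right])

found = search(root, (257, EXTENT_DATA, 0))
missing = search(root, (300, INODE_ITEM, 0))
```

Sorting by the whole key keeps all items of one object next to each other, which is why an inode and its extents tend to show up side by side in a tree dump.
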
  • practice with btrfs-debug-tree
    • shows almost every detail of the btrfs b-tree
    • debug-tree -> do some operation -> debug-tree
      • don't forget to call 'sync' before debug-tree
      • to see how btrfs records files and dirs
      • if careful enough, you can also see how btrfs does CoW
      • "fs tree" should be the easiest starting point
    • reference
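
The dump, operate, dump workflow above can be scripted so that only the changed lines are left to read. A rough sketch: `dump_tree` assumes `btrfs-debug-tree` is installed and the script runs with enough privileges to read the device, and the sample strings in the demo are simplified placeholders, not real tool output:

```python
import difflib
import subprocess
from typing import List


def dump_tree(device: str) -> str:
    """Run btrfs-debug-tree on `device` and return its text output."""
    subprocess.run(["sync"], check=True)   # flush pending changes to disk first
    result = subprocess.run(["btrfs-debug-tree", device],
                            check=True, capture_output=True, text=True)
    return result.stdout


def diff_dumps(before: str, after: str) -> List[str]:
    """Return only the added/removed lines between two dumps."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(),
                                "before", "after", lineterm="", n=0)
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


# Demo on placeholder text; with a real device this would be:
#   before = dump_tree(dev); <create a file>; after = dump_tree(dev)
before = "item 0 key (256 INODE_ITEM 0)\n"
after = "item 0 key (256 INODE_ITEM 0)\nitem 1 key (257 INODE_ITEM 0)\n"
changes = diff_dumps(before, after)
```

Keeping each operation small (one file created, one file appended to) keeps the diff short enough to follow by eye.
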
  • btrfs-progs developer
    • why start with btrfs-progs?
      • single-threaded
      • direct metadata operation: no extra infrastructure, can use what you learned in the previous step
      • quick review: special thanks to David Sterba
    • needed skills
      • C
      • GDB
      • Understanding btrfs b-tree
  • development directions
    • btrfs-debug-tree enhancement
      • easiest one
      • can refer to existing code quite easily
      • helps you understand the b-tree
    • btrfsck enhancement
      • more challenging
      • somewhat complicated data structures
      • may fix your own problem
    • btrfs-convert debug
      • most complicated
      • needs to refer to kernel code
      • not
  • btrfs kernel developer
    • the hardest part, a lot of challenges
      • extra kernel facilities
      • kernel trace/debugging
      • concurrency
      • old, badly commented code
      • ...
    • needs much more time to test
      • just make it run, without panic/BUG_ON/warning
      • function test
      • performance test
    • but also a huge sense of accomplishment when a patch is merged
  • challenges (kernel facilities)
    • modern filesystems also implement quite a lot of optimizations
    • delayed allocation
      • at buffered write time, only an early check is done; no space is allocated
    • page cache
      • this unwritten data is kept in the page cache by MM
      • the fs needs to keep the page cache up to date across many operations (fallocate, truncate, unlink, ...)
    • tons of minor features
      • Direct I/O
      • fsync
      • ...
    • solution?
      • read the funny code
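
Delayed allocation and its interaction with the page cache, as described above, can be pictured with a toy model. Everything here (class name, block accounting, one block per page) is a simplification invented for illustration and does not mirror the kernel's data structures:

```python
import errno


class ToyDelallocFS:
    """Toy model: space is reserved at write time and allocated at writeback."""

    def __init__(self, total_blocks: int) -> None:
        self.free_blocks = total_blocks
        self.reserved = 0        # blocks promised to dirty pages, not yet placed
        self.page_cache = {}     # (inode, page index) -> dirty data
        self.on_disk = {}        # (inode, page index) -> block number
        self._next_block = 0

    def buffered_write(self, inode: int, index: int, data: bytes) -> None:
        key = (inode, index)
        if key not in self.page_cache:
            # Early check only: fail now rather than at writeback time.
            if self.reserved >= self.free_blocks:
                raise OSError(errno.ENOSPC, "no space left")
            self.reserved += 1
        self.page_cache[key] = data   # data lives only in memory for now

    def writeback(self) -> None:
        for key in sorted(self.page_cache):
            if key in self.on_disk:
                self.free_blocks += 1             # old copy is released
            self.on_disk[key] = self._next_block  # real allocation happens here
            self._next_block += 1
            self.free_blocks -= 1
            self.reserved -= 1
        self.page_cache.clear()

    def truncate(self, inode: int) -> None:
        # Dropping dirty pages must also give back their reservation.
        for key in [k for k in self.page_cache if k[0] == inode]:
            del self.page_cache[key]
            self.reserved -= 1


fs = ToyDelallocFS(total_blocks=2)
fs.buffered_write(1, 0, b"a")
fs.buffered_write(1, 1, b"b")
allocated_before = len(fs.on_disk)   # nothing placed on disk yet

try:
    fs.buffered_write(2, 0, b"c")    # both blocks are already reserved
    enospc_hit = False
except OSError:
    enospc_hit = True

fs.truncate(1)                       # frees the reservations of inode 1
fs.buffered_write(2, 0, b"c")
fs.writeback()
```

The `truncate` path shows why operations such as fallocate, truncate and unlink are a source of bugs: each one has to keep the cache and the space accounting in step.
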
  • challenges (kernel trace/debugging)
    • hard to debug compared to user-space programs
      • recompiling takes a lot of time
      • kernel panic is hard to capture
      • hard to set breakpoint/watchpoint
      • ...
    • solutions
      • use ccache/distcc and only recompile the given module
      • use kdump to capture crash
      • use a VM with gdb to set kernel breakpoints/watchpoints
        • or old-fashioned pr_info()
  • challenges (concurrency)
    • kernel is designed for performance, not education
    • concurrency is everywhere: tons of locks, mutexes, workqueues, wait_event
    • lockdep is the best solution
      • needs to be enabled in the kernel config
      • it's runtime detection, needs tests to trigger it
      • with quite good output explaining how it will cause deadlock
      • but not perfect: only detects spinlock/rwlock/mutex and so on, no support for wait_event
    • echo w > /proc/sysrq-trigger
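
The idea behind lockdep's runtime detection can be sketched as a lock-order graph: remember which locks were taken while holding which others, and warn when a new acquisition would close a cycle. This is a toy illustration under that assumption, not the kernel's algorithm; the class and the lock names "A" and "B" are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyLockdep:
    """Record lock ordering and flag inversions, even if no deadlock happened yet."""

    def __init__(self) -> None:
        # after[X] = locks that were acquired while X was held
        self.after: Dict[str, Set[str]] = defaultdict(set)
        self.held: List[str] = []
        self.warnings: List[str] = []

    def _reachable(self, src: str, dst: str) -> bool:
        """Depth-first search: is there a recorded ordering path src -> dst?"""
        stack, seen = [src], set()
        while stack:
            cur = stack.pop()
            if cur == dst:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.after[cur])
        return False

    def acquire(self, lock: str) -> None:
        for holder in self.held:
            # If `lock` already leads to `holder`, taking it now closes a cycle.
            if self._reachable(lock, holder):
                self.warnings.append(
                    f"possible deadlock: {holder} -> {lock} inverts earlier order")
            self.after[holder].add(lock)
        self.held.append(lock)

    def release(self, lock: str) -> None:
        self.held.remove(lock)


dep = ToyLockdep()

# Code path 1 takes A, then B.
dep.acquire("A"); dep.acquire("B")
dep.release("B"); dep.release("A")

# Code path 2 takes B, then A: no deadlock occurred, but the order is inverted.
dep.acquire("B"); dep.acquire("A")
dep.release("A"); dep.release("B")
```

The warning only appears because both code paths were actually exercised, which matches the note that tests are needed to trigger the detection. Waiting primitives such as wait_event take no lock in this model, so they stay invisible to it.
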
  • kernel doc => LWN => Google => RTFC (fs) => RTFC (facility)
    • RTFC - Read The Funny Code
  • future plans
    • quota reserve space framework rework
    • inband de-duplication
    • btrfs-convert rework
    • RAID 5/6 readahead

Q&A:

  • ECC RAM?
  • practicality of dedup
  • block size