Observed limiting application scalability with Multi process Cobol application on ZFS.
[ Keyword : ZFS, dmu_zfetch_find, dmu_zfetch, dmu zfetch, Solaris 11, Cobol, high SYS]
Recently we have seen this issue where cobol application was
not scaling on ZFS due to default file-level prefetch mechanism in ZFS. Significant
system time was spent in file system locking and “prstat –mL” showed application
thread is spending 100% CPU in SYS.
$prstat –mL 5
PID
USERNAME USR SYS TRP TFL DFL LCK SLP LAT
VCX ICX SCL SIG PROCESS/LWPID
10523
cobol 0.3 100 0.0 0.0 0.0 0.0 0.0
0.0 0
22 7K 0 runcobol/1
13050
cobol 0.3 100 0.0 0.0 0.0 0.0 0.0
0.0 2
29 8K 0 runcobol/1
13387
cobol 0.7 99 0.0 0.0 0.0 0.0 0.0 0.0 5 26
18K 0 runcobol/1
10654
cobol 0.6 99 0.0 0.0 0.0 0.0 0.0 0.0 3 26
18K 0 runcobol/1
11444
cobol 1.0 99 0.0 0.0 0.0 0.0 0.0 0.0 6 27
26K 0 runcobol/1
13504
cobol 1.2 99 0.0 0.0 0.0 0.0 0.0 0.0 7 33
31K 0 runcobol/1
10155
cobol 1.1 99 0.0 0.0 0.0 0.0 0.0 0.0 9 29
29K 0 runcobol/1
12271
cobol 0.5 99 0.0 0.0 0.0 0.0 0.0 0.0 9 26
16K 0 runcobol/1
9907
cobol 1.0 99 0.0 0.0 0.0 0.0 0.0 0.0 15 26
26K 0 runcobol/1
9684
cobol 0.7 99 0.0 0.0 0.0 0.0 0.0 0.0 8 24
21K 0 runcobol/1
and lockstat showed “dmu_zfetch_find” caller is the main consumer.
# lockstat –C 10
Adaptive mutex spin: 144772 events in 10.051
seconds (14404 events/sec)
Count indv cuml rcnt nsec Lock Caller
-------------------------------------------------------------------------------
54159
37% 37% 0.00 2000931 0xc4064c958f98 dmu_zfetch_find+0x2fc
53930
37% 75% 0.00 1307159 0xc4064c958f98 dmu_zfetch_find+0x3d8
5824
4% 79% 0.00 4712 cpu_info_template_lock
read_kstat_data+0x174
3792
3% 81% 0.00 2498 kstat_chain_lock kstat_rele+0xc
1822
1% 83% 0.00 72808 0xc4046a36ee58 taskq_thread_wait+0x40
1657
1% 84% 0.00 124276 0xc4046a36ee58 taskq_thread+0x408
1632
1% 85% 0.00 1485 0xc404656960d8 rrw_enter_read+0x4
1515 1%
86% 0.00 5833
kstat_chain_lock kstat_hold+0xc
Looked around a bit and the Solaris documentation says about this
drawback with zfs_prefetch options http://docs.oracle.com/cd/E26502_01/html/E29022/chapterzfs-4.html
Default it will be enabled,
# echo zfs_prefetch_disable/D |mdb -k
zfs_prefetch_disable:
zfs_prefetch_disable: 0
Then we tried disabling
it. Immediate after disabling this, Application started scaling normally.
We can set this value Dynamically with “ # echo zfs_prefetch_disable/W0t1 |mdb –kw” Or by adding “set
zfs:zfs_prefetch_disable=1” to /etc/system and reboot the server.
Ref: