commit 9380e9f0a39bf24d4f05df739c9251eaf4757ded
Author: Richard Yao
Date:   Wed Feb 5 17:15:35 2014 -0500

    Revert changes to zbookmark_t

    Commit 1421c89142376bfd41e4de22ed7c7846b9e41f95 added a field to
    zbookmark_t that unintentionally caused a disk format change. This
    negatively affected backward compatibility and platform portability.
    Therefore, this field is being removed. The function that field
    permitted is left unimplemented until a later patch that will
    reimplement the field in a way that does not affect the disk format.

    Signed-off-by: Richard Yao

commit 881f45c6a8f44486f76c4713ecef0d533d6601e8
Author: Ralf Ertzinger
Date:   Sun Jan 19 15:36:49 2014 +0100

    Add systemd unit files for ZFS startup

    This adds systemd unit files replacing the functionality offered by
    the SysV init script found in etc/init.d.

    It has been developed and tested on Fedora 19, Fedora 20 and
    openSuSE 13.1.

    Four unit files and one target are offered.

    zfs-import-cache.service:
      Import pools from /etc/zfs/zpool.cache. This unit will wait for
      udev to settle.

    zfs-import-scan.service:
      Import pools by scanning /dev/disk/by-id for zvols. This unit will
      only run if /etc/zfs/zpool.cache is not present. This unit will
      wait for udev to settle.

    zfs-mount.service:
      Mount ZFS native filesystems. It contains a dependency to be
      loaded before local-fs.target.

    zfs-share.service:
      Share NFS/SMB filesystems. This unit contains a dependency that
      will cause it to be restarted whenever the smb or nfs-server unit
      is restarted, restoring the shares added.

    zfs.target:
      This target pulls in the other units in order to start ZFS. It's
      the only unit that can be enabled/disabled, all other services are
      static and pulled in by dependencies. It will honour zfs=off and
      zfs=no options on the kernel command line.

    Signed-off-by: Brian Behlendorf
    Closes #2108

commit c5cb66addcb947ae8843c40c6db134ccd821adb7
Author: Brian Behlendorf
Date:   Fri Jan 31 16:35:53 2014 -0800

    Fix corrupted l2_asize in arcstats

    Commit e0b0ca9 accidentally corrupted the l2_asize displayed in
    arcstats. This was caused by changing the l2arc_buf_hdr.b_asize
    member from an int to uint32_t type. There are places in the code
    where this field is cast to a uint64_t resulting in the b_hits
    member being treated as part of b_asize.

    To resolve the issue the type has been changed to a uint64_t, and
    the b_hits member is placed after the enum to prevent the size of
    the structure from increasing.

    This is a good example of exactly why it's a bad idea to use
    ambiguous types (int) in these structures.

    Signed-off-by: DHE
    Signed-off-by: Brian Behlendorf
    Closes #1990

commit 2e7b7657cdb9ad02c0e0fcf6c8b2bb1c58d1273a
Author: Matthew Ahrens
Date:   Sat Feb 1 02:52:11 2014 +1100

    4188 assertion failed in dmu_tx_hold_free(): dn_datablkshift != 0

    Reviewed by: George Wilson
    Reviewed by: Christopher Siden
    Approved by: Garrett D'Amore

    References:
      https://www.illumos.org/issues/4188
      illumos/illumos-gate@bb411a08b05466bfe0c7095b6373bbc1587e259a

    Ported-by: Chris Dunlop
    Signed-off-by: Brian Behlendorf
    Closes #2091

commit 8b4646494c23fc17c7cc5f7a857e27c463540098
Author: Matthew Ahrens
Date:   Fri Jan 24 09:54:37 2014 -0600

    Illumos 4504 traverse_visitbp: visit group before user

    4504 traverse_visitbp: visit DMU_GROUPUSED_OBJECT before
         DMU_USERUSED_OBJECT

    Reviewed by: Christopher Siden
    Reviewed by: George Wilson

    References:
      https://illumos.org/issues/4504
      http://code.delphix.com/illumos-4504
      http://svnweb.freebsd.org/base?view=revision&revision=260812

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Tim Chase
    Closes #2079

commit 6d111134c0d1eb9b179eb9fddf26a31d5d45ae22
Author: Tim Chase
Date:   Sat Jan 18 13:00:53 2014 -0600

    Implement relatime.

    Add the "relatime" property.
    When set to "on", a file's atime will only be updated if the
    existing atime is at least a day old or if the existing ctime or
    mtime has been updated since the last access. This behavior is
    compatible with the Linux "relatime" mount option.

    Signed-off-by: Tim Chase
    Signed-off-by: Brian Behlendorf
    Closes #2064
    Closes #1917

commit 2278381ce2a820afe76dd9650298858d7037a01b
Author: Patrik Greco
Date:   Fri Jan 24 19:19:34 2014 +0100

    Fix error message in zpios

    The chunksize must always be strictly smaller than the regionsize.

    Signed-off-by: Andrew Uselton
    Signed-off-by: Brian Behlendorf
    Closes #2072

commit 01b738f457f2a406fb6b4b264fb7a947b9b9989b
Author: Cyril Plisko
Date:   Wed Jan 15 11:26:12 2014 +0200

    Call gethrtime() only once per new txg creation

    When transitioning the current open TXG into QUIESCE state and
    opening a new one txg_quiesce() calls gethrtime():
      - to mark the birth time of the new TXG
      - to record the SPA txg history kstat
      - implicitly inside spa_txg_history_add()

    These timestamps are practically the same, so the first one can be
    used instead of the other two. The only visible difference is that
    inside spa_txg_history_add() the time spent in kmem_zalloc() will
    be counted towards the opened TXG.

    Since at this point the new TXG already exists (tx->tx_open_txg
    has been already incremented) it is actually a correct accounting.
    In any case this extra work is only happening when spa_txg_history
    kstat is activated (i.e. zfs_txg_history > 0) and doesn't affect
    the normal processing in any way.

    Signed-off-by: Cyril Plisko
    Issue #2075

commit 478d64fdaeb89c8f029f3dd1969447317eedaa6e
Author: Igor Lvovsky
Date:   Thu Jan 16 11:41:27 2014 +0200

    Add additional state TXG_STATE_WAIT_FOR_SYNC for txg.

    In several cases when digging into kstats we can find two txgs in
    SYNC state, e.g.

    txg      birth            state nreserved nread     nwritten   ...
    985452   258127184872561  C     0         373948416 2376272384 ...
    985453   258129016180616  C     0         378173440 28793344   ...
    985454   258129016271523  S     0         0         0          ...
    985455   258130864245986  S     0         0         0          ...
    985456   258130867458851  O     0         0         0          ...

    However only the first txg (985454) is really syncing at this
    moment. The other one (985455) marked as SYNCED is actually in a
    post-QUIESCED state and waiting to start sync. So, the new
    TXG_STATE_WAIT_FOR_SYNC state between TXG_STATE_QUIESCED and
    TXG_STATE_SYNCED was added to reveal this situation.

    txg      birth            state nreserved nread     nwritten   ...
    1086896  235261068743969  C     0         163577856 8437248    ...
    1086897  235262870830801  C     0         280625152 822594048  ...
    1086898  235264172219064  S     0         0         0          ...
    1086899  235264936134407  W     0         0         0          ...
    1086900  235264936296156  O     0         0         0          ...

    Signed-off-by: Igor Lvovsky
    Signed-off-by: Brian Behlendorf
    Issue #2075

commit 93292b308178cb885e1b11ca1a270c36f5b08a23
Author: Shen Yan
Date:   Wed Jan 22 12:44:35 2014 +0800

    Use enum type(zfetch_dirn_t) instead

    Fix code with zfetch_dirn_t, which is more readable and clear.

    Signed-off-by: Brian Behlendorf
    Closes #2068

commit 4461aa6118fa55dc83f5d75c6d428767c3634fba
Author: Tim Chase
Date:   Sat Jan 18 10:46:43 2014 -0600

    Allow chown/chgrp when no ACL SAs exist.

    From the comment in the commit:

    Some ZFS implementations (ZEVO) create neither a ZNODE_ACL nor a
    DACL_ACES SA in which case ENOENT is returned from
    zfs_acl_node_read() when the SA can't be located. Allow chown/chgrp
    to succeed in these cases rather than returning an error that makes
    no sense in the context of the caller.

    Signed-off-by: Tim Chase
    Signed-off-by: Brian Behlendorf
    Issue zfs-osx/zfs#86
    Closes #1911
    Closes #2029

commit 04aa2de8f788654dda15e0b598fc874915b0fc06
Author: Ned Bass
Date:   Wed Jan 15 13:52:57 2014 -0800

    vdev_file_io_start() to use taskq_dispatch(TQ_PUSHPAGE)

    The vdev_file_io_start() function may be processing a zio that the
    txg_sync thread is waiting on. In this case it is not safe to
    perform memory allocations that may generate new I/O since this
    could cause a deadlock. To avoid this, call taskq_dispatch() with
    TQ_PUSHPAGE instead of TQ_SLEEP.

    Signed-off-by: Ned Bass
    Signed-off-by: Brian Behlendorf
    Issue #1928

commit 3566d5c7c3cb415a53218251fc0247da55dfde46
Author: Brian Behlendorf
Date:   Fri Jan 17 11:21:48 2014 -0800

    Remove incorrect use of EXTRA_DIST for man pages

    Setting the 'dist_' prefix is the correct way to instruct Automake
    to include these files in the distribution. The EXTRA_DIST variable
    is reserved for files which are not covered by the automatic rules.

    http://www.gnu.org/software/automake/manual/automake.html#Basics

    Signed-off-by: Brian Behlendorf

commit 09d0b30fd1ba08a95e86909d2e1abb2997b0a871
Author: Ned Bass
Date:   Mon Jan 13 13:32:41 2014 -0800

    vdev_id: support per-channel slot mappings

    The vdev_id udev helper currently applies slot renumbering rules to
    every channel (JBOD) in the system. This is too inflexible for
    systems with non-homogeneous storage topologies.

    The "slot" keyword now takes an optional third parameter which names
    a channel to which the mapping will apply. If the third parameter is
    omitted then the rule applies to all channels. The first-specified
    rule that can match a slot takes precedence. Therefore a
    channel-specific rule for a given slot should generally appear
    before a generic rule for the same slot number. In this way a custom
    slot mapping can be applied to a particular channel and a default
    mapping applied to the rest.

    Signed-off-by: Ned Bass
    Signed-off-by: Brian Behlendorf
    Closes #2056

commit 35d3e32274ff05d9b080ea0a77ade1f9c9d7bafc
Author: Brian Behlendorf
Date:   Mon Jan 13 14:27:33 2014 -0800

    Use long holds in zvol_set_volsize()

    Under Linux the zvol_set_volsize() function was originally written
    to use dmu_objset_hold()/dmu_objset_rele(). Subsequently, the
    dmu_objset_own()/dmu_objset_disown() interfaces were added but the
    ZVOL code wasn't updated to take advantage of them. This was never
    an issue but after the dsl_pool_config changes the code now takes
    the config lock twice.
    The cleanest solution is to shift to using dmu_objset_own() which
    takes a long hold on the dataset and does not hold the dsl pool
    lock. This patch also slightly restructures the existing code such
    that it more closely resembles the upstream Illumos code.

    Signed-off-by: Ned Bass
    Signed-off-by: Brian Behlendorf
    Closes #2039

commit 0f62f3f9abc4bfa0bcafee9bfa3d55e91dcb371d
Author: Brian Behlendorf
Date:   Tue Jan 14 09:39:13 2014 -0800

    Disable GCCs aggressive loop optimization

    GCC >= 4.8's aggressive loop optimization breaks some of the
    iterators over the dn_blkptr[] pseudo-array in dnode_phys. Since
    dn_blkptr[] is defined as a single-element array, GCC believes an
    iterator can only access index 0 and will unroll the loop into a
    single iteration.

    One way to resolve the issue would be to cast the array to a pointer
    and fix all the iterators that might break. The only loop where it
    is known to cause a problem is this loop in
    dmu_objset_write_ready():

        for (i = 0; i < dnp->dn_nblkptr; i++)
                bp->blk_fill += dnp->dn_blkptr[i].blk_fill;

    In the common case where dn_nblkptr is 3, the loop is only executed
    a single time and "i" is equal to 1 following the loop.

    The specific breakage caused by this problem is that the blk_fill of
    root block pointers wouldn't be set properly when more than one
    blkptr is in use (when no indirect blocks are needed).

    The simple reproducing sequence is:

        zpool create tank /tank.img
        zdb -ddddd tank 0

    Notice that "fill=31", however, there are two L0 indirect blocks
    with "F=31" and "F=5". The fill count should be 36 rather than 31.
    This problem causes an assert to be hit in a simple "zdb tank" when
    built with --enable-debug.

    However, this approach was not taken because we need to be
    absolutely sure we catch all instances of this unwanted
    optimization. Therefore, the build system has been updated to detect
    if GCC supports the aggressive loop optimization.
    If it does, the optimization will be explicitly disabled using the
    -fno-aggressive-loop-optimizations option.

    Original-fix-by: Tim Chase
    Signed-off-by: Tim Chase
    Signed-off-by: Brian Behlendorf
    Closes #2010
    Closes #2051

commit cbe8e6198cb167f34adc30c6993032a4f4491397
Author: Richard Yao
Date:   Sat Jan 11 16:07:27 2014 -0500

    Properly link zpool command to libblkid

    31fc19399e597e3391f19f1392ab120f1de0d5f2 incorrectly removed
    $(LIBBLKID) from cmd/zpool/Makefile.am. This meant that the
    toolchain was not given -lblkid, which resulted in the following
    build failure on Ubuntu 13.10:

    /usr/bin/ld: zpool_vdev.o: undefined reference to symbol 'blkid_put_cache@@BLKID_1.0'
    /lib/x86_64-linux-gnu/libblkid.so.1: error adding symbols: DSO missing from command line
    collect2: error: ld returned 1 exit status

    That commit reworked various Makefile.am to follow best practices,
    so we reintroduce $(LIBBLKID) in a manner consistent with that,
    rather than explicitly reverting the change.

    Reproduction of this issue was done on a Gentoo Linux system by
    executing the following commands:

    zfs create -o mountpoint=/mnt/ubuntu-13.10 rpool/ROOT/ubuntu-13.10
    debootstrap --variant=buildd --arch amd64 saucy /mnt/ubuntu-13.10 http://archive.ubuntu.com/ubuntu/
    mount -o bind /dev /mnt/ubuntu-13.10/dev/
    mount -o bind /proc/ /mnt/ubuntu-13.10/proc/
    mount -o bind /sys/ /mnt/ubuntu-13.10/sys/
    cp /etc/resolv.conf /mnt/ubuntu-13.10/etc/
    (cd /mnt/ubuntu-13.10/root/ && git clone git://github.com/zfsonlinux/zfs.git)
    chroot /mnt/ubuntu-13.10/
    apt-get install git autoconf libtool zlib1g-dev uuid-dev libblkid-dev
    #apt-get install alien fakeroot vim
    cd /root/zfs
    ./autogen.sh
    ./configure --with-config=user --prefix=/usr
    make

    That will create an Ubuntu 13.10 chroot, fetch the sources and build
    test.
    At this point, cmd/zpool/Makefile.am was modified and the following
    commands were run to verify that the build issue was resolved:

    git clean -xdf
    ./autogen.sh
    ./configure --with-config=user --prefix=/usr
    make

    Although it is not shown here, the absence of libblkid-dev enables
    ZFS to build successfully without the patch. This could explain how
    this escaped detection until recently. A test without libblkid-dev
    was done to verify that the patch did not cause a regression in the
    absence of libblkid:

    apt-get remove libblkid-dev
    git clean -xdf
    ./autogen.sh
    ./configure --with-config=user --prefix=/usr
    make

    Additionally, the commands themselves were tested against my live
    system from within the chroot to ensure basic functionality. My live
    system had corresponding kernel modules already installed and basic
    commands such as `zpool list` and `zfs list` worked without
    incident.

    Lastly, this patch was also build tested on Gentoo Linux, where it
    caused no problems.

    At time of writing, these steps can be used to reproduce these
    results on any modern Linux system that has debootstrap installed.
    On Gentoo, installing debootstrap can be done with
    `emerge dev-util/debootstrap`. The current ZFSOnLinux HEAD revision
    as of writing is fd23720ae14dca926800ae70e6c8f4b4f82efc08. Once this
    is fixed in HEAD, either that revision or another before this fix
    and after 31fc19399e597e3391f19f1392ab120f1de0d5f2 will be needed to
    reproduce this issue.

    Lastly, it remains to be seen why the toolchains on the systems
    performing regression tests did not catch this. This is not a
    ZFS-specific issue, but it is something that we will want to explore
    in the future.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Closes #2038

commit 741304503a28fc51a6c0a14a0f3c1c88cc825979
Author: Brian Behlendorf
Date:   Mon Jan 13 13:02:59 2014 -0800

    Prevent duplicate mnttab cache entries

    Under Linux it's possible to mount the same filesystem multiple
    times in the namespace.
    This can be done either with bind mounts or simply with multiple
    mount points. Unfortunately, the mnttab cache code is implemented
    using an AVL tree which does not support duplicate entries. To avoid
    this issue this patch updates the code to check for a duplicate
    entry before adding a new one.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Michael Martin
    Closes #2041

commit fd23720ae14dca926800ae70e6c8f4b4f82efc08
Author: Brian Behlendorf
Date:   Wed Jan 8 10:25:42 2014 -0800

    Drain iput taskq outside z_teardown_lock

    It's unsafe to drain the iput taskq while holding the
    z_teardown_lock as a writer. This is because when the last reference
    on an inode is dropped it may still have pages which need to be
    written to disk. This will be done through zpl_writepages which will
    acquire the z_teardown_lock as a reader in ZFS_ENTER. Therefore, if
    we're holding the lock as a writer in zfs_sb_teardown the unmount
    will deadlock.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Chris Dunlop
    Closes #1988

commit 4fcc43790c872139a2e318ebe4100e8404f841c0
Author: Brian Behlendorf
Date:   Wed Jan 8 00:24:30 2014 +0100

    Force LZ4_FORCE_SW_BITCOUNT for Sparc

    This change was proposed for Sparc but it's not clear to me why
    it's required. Proper support exists in the lz4 code to detect the
    endianness and the required builtins are available for gcc. Still
    I'm including the patch because it will only impact Sparc and it
    may resolve a case which hasn't occurred to me.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Signed-off-by: marku89
    Issue #1700

commit b585bc4afaf37b744acba6be87f5909b4564b845
Author: Brian Behlendorf
Date:   Wed Jan 8 00:17:24 2014 +0100

    Fix zfs_getattr_fast types

    On Sparc sp->blksize will be a 64-bit value which is then cast
    incorrectly to a 32-bit value. For big endian systems this results
    in an incorrect value for sp->blksize. To resolve the problem local
    variables of the correct size are used and then assigned to
    sp->blksize.
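
The width mismatch described above can be illustrated in user space. The following is a minimal sketch, not the code from this commit: it assumes the mismatch takes the form of a 32-bit store landing in the first four bytes of a 64-bit field, and it simulates both byte orders with explicit byte arrays so it behaves the same on any host. The function names (`store_u32_at_start`, `load_u64`, `mismatched_store`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write a 32-bit value into the first four bytes of an 8-byte field,
 * in the requested byte order. This models (as an assumption) a 32-bit
 * store made through a pointer that really refers to a 64-bit field.
 */
static void
store_u32_at_start(uint8_t field[8], uint32_t v, int big_endian)
{
	for (size_t i = 0; i < 4; i++) {
		unsigned shift = big_endian ?
		    8 * (3 - (unsigned)i) : 8 * (unsigned)i;
		field[i] = (uint8_t)(v >> shift);
	}
}

/* Read all eight bytes of the field back as a 64-bit value. */
static uint64_t
load_u64(const uint8_t field[8], int big_endian)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++) {
		unsigned shift = big_endian ?
		    8 * (7 - (unsigned)i) : 8 * (unsigned)i;
		v |= (uint64_t)field[i] << shift;
	}
	return (v);
}

/*
 * Hypothetical helper: the 64-bit value observed after a 32-bit store
 * into a zeroed 64-bit field.
 */
uint64_t
mismatched_store(uint32_t value, int big_endian)
{
	uint8_t field[8] = { 0 };

	store_u32_at_start(field, value, big_endian);
	return (load_u64(field, big_endian));
}
```

With a little endian layout the stored value reads back unchanged, which is consistent with the problem going unnoticed there. With a big endian layout the same store ends up in the high half of the field, so a block size of 4096 would read back as 4096 shifted left by 32 bits. Using a local variable of the matching width and assigning it afterwards, as the commit describes, avoids depending on byte order at all.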

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Signed-off-by: marku89
    Issue #1700

commit aa0218d6a12814fac50b287214f9f3b0b99e11b1
Author: Brian Behlendorf
Date:   Tue Jan 7 23:24:37 2014 +0100

    Fix nvlist 'Bus Error' for Sparc

    The mis-aligned memory accesses in nvpair_native_embedded() and
    nvpair_native_embedded_array() will cause a 'Bus Error' for
    architectures such as Sparc which are not fully byte addressable.
    To avoid this issue care is taken to avoid dereferencing the
    potentially mis-aligned packed nvlist_t.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Signed-off-by: marku89
    Issue #1700

commit 7f89ae6ba0f4e3c1b3e62272bbaa8228afdb020d
Author: Brian Behlendorf
Date:   Tue Jan 7 23:16:46 2014 +0100

    Use local variable to read zp->z_mode

    When accessing the zp->z_mode through the SA bulk interface we
    expect that 64-bits are available to hold the result. However, on
    32-bit platforms mode_t will only be 32-bits so we cannot pass it to
    SA_ADD_BULK_ATTR(). Instead a local uint64_t variable must be used
    and the result assigned to zp->z_mode.

    This went unnoticed on 32-bit little endian platforms because the
    bytes happen to end up in the correct 32-bits. But on big endian
    platforms like Sparc the zp->z_mode will always end up set to zero.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Signed-off-by: marku89
    Issue #1700

commit d7ec8d4fd9b704f6bc1220e6a79472ad9b3af0c8
Author: Brian Behlendorf
Date:   Tue Jan 7 23:14:33 2014 +0100

    Define the needed ISA types for Sparc

    Add the minimum required ISA types to support the Sparc
    architecture.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Signed-off-by: marku89
    Issue #1700

commit ecf3d9b8e63e5659269e15db527380c65780f71a
Author: John Layman
Date:   Tue Nov 19 16:34:46 2013 -0500

    Add ddt, ddt_entry, and l2arc_hdr caches

    Back the allocations for ddt tables+entries and l2arc headers with
    kmem caches.
    This will reduce the cost of allocating these commonly used
    structures and allow for greater visibility of them through the
    /proc/spl/kmem/slab interface.

    Signed-off-by: John Layman
    Signed-off-by: Brian Behlendorf
    Closes #1893

commit 4dad7d91e24875f077e26808fec900224e97dcb2
Author: Brian Behlendorf
Date:   Tue Jan 7 09:31:38 2014 -0800

    Remove unconditional sharetab update

    Removes the unconditional sharetab update when running any zfs
    command. This means the sharetab might become out of date if users
    are manually adding/removing shares with exportfs. But we shouldn't
    punish all callers to zfs in order to handle that unlikely case. In
    the unlikely event we observe issues because of this it can always
    be added back to just the share/unshare call paths where we need an
    up to date sharetab.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Turbo Fredriksson
    Signed-off-by: Chris Dunlop
    Issue #845

commit e07306687d0862e8d43b5a0e32003748dedcfa3b
Author: Brian Behlendorf
Date:   Tue Jan 7 09:21:20 2014 -0800

    Enable /etc/mtab cache to improve performance

    Re-enable the /etc/mtab cache to prevent the zfs command from having
    to repeatedly open and read from the /etc/mtab file. Instead an AVL
    tree of the mounted filesystems is created and used to vastly speed
    up lookups. This means that if non-zfs filesystems are mounted
    concurrently the 'zfs mount' will not immediately detect them. In
    practice that will rarely happen and even if it does the absolute
    worst case would be a failed mount.

    This was originally disabled out of an abundance of paranoia.

    NOTE: There may still be some parts of the code which do not consult
    the mtab cache. They should be updated to check the mtab cache as
    they are discovered to be a problem.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Turbo Fredriksson
    Signed-off-by: Chris Dunlop
    Issue #845

commit 8c091798f26e7c1e6fd105e90065ebe12d97dfc2
Author: Turbo Fredriksson
Date:   Tue Dec 24 16:18:00 2013 +0000

    Add UNSHARING of filesystems and EXPORTING pools

    As a 'stop' action ensure the filesystem is unshared before it is
    unmounted, just in case. Additionally, export the pool so it may be
    cleanly imported by a different host.

    Signed-off-by: Turbo Fredriksson
    Signed-off-by: Brian Behlendorf
    Closes #2003

commit fb8e608d9dacf2f6703da8c853f6086e4dd79824
Author: Tim Chase
Date:   Mon Dec 23 14:06:34 2013 -0600

    Fix the creation of ZPOOL_HIST_CMD pool history entries.

    Move the libzfs_fini() after the zpool_log_history() call so the
    ZPOOL_HIST_CMD entry can get written.

    Fix the handling of saved_poolname in zfsdev_ioctl() which was
    broken as part of the stack-reduction work in
    a16878805388c4d96cb8a294de965071d138a47b.

    Since ZoL destroys the TSD data in which the previously successful
    ioctl()'s pool name is stored following every vop, the
    ZFS_IOC_LOG_HISTORY ioctl has a very important restriction: it can
    only successfully write a log entry following a successful ioctl()
    if no intervening vops have been performed. Some of the zfs
    subcommands do perform intervening vops and need to do the logging
    themselves. At the moment, the "create" and "clone" subcommands have
    been modified appropriately.

    Signed-off-by: Tim Chase
    Signed-off-by: Brian Behlendorf
    Closes #1998

commit 5d862cb0d9a4b6dcc97a88fa0d5a7a717566e5ab
Author: Tim Chase
Date:   Thu Dec 19 00:30:56 2013 -0600

    Properly handle updates of variably-sized SA entries.

    During the update process in sa_modify_attrs(), the sizes of
    existing variably-sized SA entries are obtained from sa_lengths[].
    The case where a variably-sized SA was being replaced neglected to
    increment the index into sa_lengths[], so subsequent variable-length
    SAs would be rewritten with the wrong length.
    This patch adds the missing increment operation so all
    variably-sized SA entries are stored with their correct lengths.

    Previously, a size-changing update of a variably-sized SA that
    occurred when there were other variably-sized SAs in the bonus
    buffer would cause the subsequent SAs to be corrupted. The most
    common case in which this would occur is when a mode change caused
    the ZPL_DACL_ACES entry to change size when a ZPL_DXATTR (SA xattr)
    entry already existed.

    The following sequence would have caused a failure when xattr=sa was
    in force and would corrupt the bonus buffer:

        open(filename, O_WRONLY | O_CREAT, 0600);
        ...
        lsetxattr(filename, ...);   /* create xattr SA */
        chmod(filename, 0650);      /* enlarges the ACL */

    Signed-off-by: Chris Dunlop
    Signed-off-by: Ned Bass
    Signed-off-by: Brian Behlendorf
    Closes #1978

commit ac0340970c8f548a97f3c3c1e9c6fc7b60efd824
Author: Brian Behlendorf
Date:   Thu Dec 19 14:30:11 2013 -0800

    Register correct handlers for nvlist_{dup,pack,unpack}

    This change is related to commit 81eaf15 which ensured the correct
    allocation handlers were installed for nvlist_alloc(). The nvlist
    functions nvlist_dup(), nvlist_pack(), and nvlist_unpack() suffer
    from the same issue and have been updated accordingly.

    Signed-off-by: Ned Bass
    Signed-off-by: Brian Behlendorf
    Issue #1937

commit 11b9ec23b98eefe1e7bde0033dc8285f94cb0b90
Author: Matthew Thode
Date:   Thu Dec 19 00:24:14 2013 -0600

    Add full SELinux support

    Four new dataset properties have been added to support SELinux. They
    are 'context', 'fscontext', 'defcontext' and 'rootcontext' which map
    directly to the context options described in mount(8). When one of
    these properties is set to something other than 'none', that string
    will be passed verbatim as a mount option for the given context when
    the filesystem is mounted.

    For example, if you wanted the rootcontext for a filesystem to be
    set to 'system_u:object_r:fs_t' you would set the property as
    follows:

        $ zfs set rootcontext="system_u:object_r:fs_t" storage-pool/media

    This will ensure the filesystem is automatically mounted with that
    rootcontext. It is equivalent to manually specifying the rootcontext
    with the -o option like this:

        $ zfs mount -o rootcontext=system_u:object_r:fs_t storage-pool/media

    By default all four contexts are set to 'none'. Further information
    on SELinux contexts is detailed in the mount(8) and selinux(8) man
    pages.

    Signed-off-by: Matthew Thode
    Signed-off-by: Brian Behlendorf
    Signed-off-by: Richard Yao
    Closes #1504

commit d1d7e2689db9e03f11c069ebc9f1ba12829e5dac
Author: Michael Kjorling
Date:   Fri Nov 1 20:26:11 2013 +0100

    cstyle: Resolve C style issues

    The vast majority of these changes are in Linux specific code. They
    are the result of not having an automated style checker to validate
    the code when it was originally written. Others were caused when the
    common code was slightly adjusted for Linux.

    This patch contains no functional changes. It only refreshes the
    code to conform to the style guide.

    Everyone submitting patches for inclusion upstream should now run
    'make checkstyle' and resolve any warnings prior to opening a pull
    request. The automated builders have been updated to fail a build if
    'make checkstyle' detects an issue.

    Signed-off-by: Brian Behlendorf
    Closes #1821

commit 8ffef572ed2ba97e0c2d6a8aa2240012e611dc6f
Author: Brian Behlendorf
Date:   Tue Dec 17 13:30:44 2013 -0800

    cstyle: Allow spaces in all comments

    Update the cstyle.pl script to allow pictures in all comments not
    just header comments. Recent changes from Illumos such as d3cc8b1
    have relocated various pictures in the standard block comments to
    make the code more readable.

    Signed-off-by: Brian Behlendorf
    Issue #1821

commit 351a26ddc0a1ec85886fc961612f05686cce82e6
Author: Brian Behlendorf
Date:   Tue Dec 17 16:11:57 2013 -0800

    cstyle: Exclude several files from 'make checkstyle'

    The zfs_config.h header and *.mod.c files are both products of the
    build process. They must be excluded from the style check because
    they are not part of the pristine source.

    Signed-off-by: Brian Behlendorf
    Issue #1821

commit 2820bc49c5b7d63aa3941b8e173005f17dd0cee4
Author: John Wren Kennedy
Date:   Wed Dec 18 15:09:45 2013 -0800

    Illumos #4208

    4208 Typo in zfs_main.c: "posxiuser"

    Reviewed by: Sonu Pillai
    Reviewed by: Will Guyette
    Reviewed by: Eric Diven
    Reviewed by: Christopher Siden
    Approved by: Richard Lowe

    References:
      https://www.illumos.org/issues/4208
      illumos/illumos-gate@f38cb554a534c6df738be3f4d23327e69888e634

    Ported-by: Brian Behlendorf
    Closes #1986

commit fd8febbd1e6ff3d3eec6b9d395ab65400769da19
Author: Turbo Fredriksson
Date:   Tue Dec 17 21:53:52 2013 +0000

    Add zfs_send_corrupt_data module option

    Tuning setting to ignore read/checksum errors when sending data.

    Signed-off-by: Turbo Fredriksson
    Signed-off-by: Brian Behlendorf
    Closes #1982
    Issue #1897

commit 4788a01dbd11b8fd22e0ff95a197a753778e04ca
Author: Aaron Fineman
Date:   Wed Dec 18 02:33:40 2013 +0000

    Cause zfs.spec to place dracut files properly

    This is an extension of commit ffb2111.
    As the fedora conditional has been added, this allows centos/rhel-6
    to fall back to the proper directory (/usr/share/dracut).

    Signed-off-by: Brian Behlendorf
    Closes #1984

commit a5f3665168946318c98ed5407b9314d400bd6dde
Author: renelson
Date:   Tue Dec 17 10:44:23 2013 -0800

    Handle acl flags from util-linux mount command

    Add acl, noacl and posixacl to option_map, avoiding the ENOENT error
    case when mount from util-linux-2.24 execs mount.zfs with any of
    those flags.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: renelson
    Issue #1968

commit 758d35520b7e15fd6db2e8c8f45294a9cf0514cb
Author: renelson
Date:   Tue Dec 17 10:38:28 2013 -0800

    Fix grammar in parse_options() error message

    A minor grammar error was corrected in the parse_options() error
    handling for the ENOENT case.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: renelson
    Issue #1968

commit 7dc71949f2f013a7bf744230d60770893ce23a6a
Author: Chunwei Chen
Date:   Tue Dec 17 10:18:25 2013 -0800

    Fix z_sync_cnt decrement in zfs_close

    The comment in zfs_close states that "Under Linux the zfs_close()
    hook is not symmetric with zfs_open()". This is not true.
    zfs_open/zfs_close is associated with every successful struct file
    creation/deletion, which should always be balanced.

    Here is an example of what's wrong:

    Process A               B
            open(O_SYNC)
            z_sync_cnt = 1
                            open(O_SYNC)
                            z_sync_cnt = 2
            close()
            z_sync_cnt = 0

    So z_sync_cnt is 0 even if B still has the file with O_SYNC.

    Also moves the generic_file_open call before zfs_open to ensure that
    in the case generic_file_open fails z_sync_cnt is not incremented.
    This is safe because generic_file_open has no side effects.

    Signed-off-by: Chunwei Chen
    Signed-off-by: Brian Behlendorf
    Issue #1962

commit c2d439dffd4c404d39e82e5b174a338515080f26
Author: Brian Behlendorf
Date:   Fri Dec 13 11:29:06 2013 -0800

    Silence e2fsck warning in zconfig.sh

    When running zconfig.sh, tests 7 and 8 cause the following warning
    to be printed to the console.
    It's caused because we're snapshotting a mounted ext2 filesystem
    which is not in a 'clean' state. This is to be expected since we
    have no guarantees about the on-disk consistency of the filesystem.

        EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended

    To silence the warning and preserve the intent of these test cases
    they have been updated to unmount the filesystem prior to
    snapshotting them. This ensures the ext2 filesystem is in a
    consistent state when the snapshot is taken.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Closes #1972

commit ce37ebd2ebcbc8ec6bbaa56cd22e6e807b6d36f3
Author: Brian Behlendorf
Date:   Thu Dec 12 13:04:40 2013 -0800

    cstyle: zvol.c

    Update zvol.c to conform to the style guidelines, verified by
    running cstyle.pl on the source file. This patch contains no
    functional changes.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Signed-off-by: Tim Chase
    Issue #1821

commit d17eab9ce0437d99e165c8d9758a9d0e2c029bdf
Author: Brian Behlendorf
Date:   Thu Dec 12 14:55:19 2013 -0800

    Update zfs(8) Snapshots section

    The Snapshots section of the zfs(8) man page is incorrect and should
    have been updated as part of #1312. Snapshots of volumes can be
    accessed independently and their visibility is determined by the
    'snapdev=hidden|visible' property. This is analogous to the existing
    'snapdir=hidden|visible' property.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Signed-off-by: Tim Chase
    Closes #1921

commit 2e0358cbcab49f7be18762e8cb51e642188709e7
Author: Brian Behlendorf
Date:   Fri Dec 13 14:49:33 2013 -0800

    Sync /dev/zfs ioctl ordering

    In order to minimize any future disruption caused by the addition
    and removal of /dev/zfs ioctls this patch makes the following
    changes.

    1) Sync ZoL's ioctl ordering such that it matches Illumos. For
       historic reasons the ZFS_IOC_DESTROY_SNAPS and
       ZFS_IOC_POOL_REGUID ioctls were out of order.

    2) Move Linux and FreeBSD specific ioctls in to their own reserved
       ranges.
       This allows us to preserve the existing ordering when new ioctls
       are added by either Illumos or FreeBSD. When an ioctl is no
       longer needed it should be retired in place.

    This change alters the ZFS user/kernel ABI so make sure you rebuild
    both your user and kernel modules. However, it should allow for a
    much stabler interface going forward.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Ned Bass
    Closes #1973

commit ba6a24026c6eb910188c24b5c921fb793d3c998e
Author: Brian Behlendorf
Date:   Fri Dec 6 14:20:22 2013 -0800

    Remove ZFC_IOC_*_MINOR ioctl()s

    Early versions of ZFS coordinated the creation and destruction of
    device minors from userspace. This was inherently racy and in late
    2009 these ioctl()s were removed leaving everything up to the
    kernel. This significantly simplified the code.

    However, we never picked up these changes in ZoL since we'd already
    significantly adjusted this code for Linux. This patch aims to
    rectify that by finally removing ZFC_IOC_*_MINOR ioctl()s and moving
    all the functionality down in to the kernel. Since this cleanup will
    change the kernel/user ABI it's being done in the same tag as the
    previous libzfs_core ABI changes. This will minimize, but not
    eliminate, the disruption to end users.

    Once merged ZoL, Illumos, and FreeBSD will basically be back in sync
    in regards to handling ZVOLs in the common code. While each platform
    must have its own custom zvol.c implementation the interfaces
    provided are consistent.

    NOTES:

    1) This patch introduces one subtle change in behavior which could
       not be easily avoided. Prior to this change callers of
       'zfs create -V ...' were guaranteed that upon exit the /dev/zvol/
       block device link would be created or an error returned. That's
       no longer the case. The utilities will no longer block waiting
       for the symlink to be created. Callers are now responsible for
       blocking, this is why a 'udev_wait' call was added to the 'label'
       function in scripts/common.sh.
2) The read-only behavior of a ZVOL now depends solely on whether the ZVOL_RDONLY bit is set in zv->zv_flags. The redundant policy setting in the gendisk structure was removed. This both simplifies the code and allows us to safely leverage set_disk_ro() to issue a KOBJ_CHANGE uevent. See the comment in the code for further details on this. 3) Because __zvol_create_minor() and zvol_alloc() may now be called in a sync task they must use KM_PUSHPAGE. References: illumos/illumos-gate@681d9761e8516a7dc5ab6589e2dfe717777e1123 Signed-off-by: Brian Behlendorf Signed-off-by: Ned Bass Signed-off-by: Tim Chase Closes #1969 commit dda12da9f1ec714af0e468aa03c24f402961f135 Author: George Wilson Date: Thu Dec 12 10:19:54 2013 -0800 Illumos #4121 vdev_label_init read only 4121 vdev_label_init should treat request as succeeded when pool is read only Reviewed by: Christopher Siden Reviewed by: Matthew Ahrens Reviewed by: Saso Kiselkov Approved by: Richard Lowe References: https://www.illumos.org/issues/4121 illumos/illumos-gate@973c78e94bf9634782164382c9e291bf81161fa5 Ported-by: Brian Behlendorf Closes #1863 commit 84b0aac5fdab6daf8c4179dfba4abeb47e0d8b8e Author: Tim Chase Date: Tue Dec 10 16:36:42 2013 -0600 Fix atime handling. Previously, the atime-modifying vnops called ZFS_ACCESSTIME_STAMP() followed by zfs_inode_update() to update the atime. However, since atimes are cached in the znode for delayed writing, the zfs_inode_update() function would effectively ignore the cached atime by reading it from the SA. This commit moves the updating of the atime in the inode into zfs_tstamp_update_setup() which is called by the ZFS_ACCESSTIME_STAMP() macro and eliminates the call to zfs_inode_update() in the atime-modifying vnops. It's possible the same thing could have been done directly in zfs_inode_update() but I wasn't sure that it was safe in all cases where it is called.
The effect is that atime handling is as if "strictatime" were selected, even if the filesystem is mounted with "relatime". Signed-off-by: Brian Behlendorf Issue #1949 commit 5cb65efe2c3d4aaa77a5881be364c443c859bbc8 Author: Shen Yan Date: Tue Dec 10 14:58:53 2013 +0800 Fix zstream_t incorrect type The DMU zfetch code organizes streams with lists not avl trees. An avl_node_t was mistakenly used for a list_node_t in the zstream_t type. This is incorrect (but harmless) and went unnoticed because: 1) The list functions explicitly cast the value preventing a warning, 2) sizeof(avl_node_t) >= sizeof(list_node_t) so no overrun occurs, and 3) The calculated offset is the same regardless of the type. Signed-off-by: Brian Behlendorf Closes #1946 commit be5db977eaffd11ae52ddcbb0b64b53ec000082a Author: david.chen Date: Mon Dec 9 15:55:01 2013 +0800 Remove MAX when initializing arc_c_max The MAX when initializing arc_c_max doesn't make any sense because it hasn't been set anywhere before. Though, arc_c_max should be implicitly set to zero when initializing arc_stats, so the MAX doesn't make any difference. The MAX was mistakenly left in place when the Illumos default values were changed for Linux. Signed-off-by: david.chen Signed-off-by: Brian Behlendorf Closes #1941 commit 383efa5743ecf05e11b859e2dcc0133ceab8b458 Author: Simon Guest Date: Mon Dec 9 17:20:20 2013 +1300 Fix multipath bug in vdev_id caused by inconsistent field numbering The bug is caused by multipath output like this: 35000c50056bd77a7 dm-15 HP,MB3000FCWDH size=2.7T features='0' hwhandler='0' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 2:0:16:0 sdq 65:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 4:0:52:0 sdfp 130:176 active undef running Note that the pipe symbols mean that the field numbering is different between the sdq and sdfp lines. The fix edits out the pipe symbols.
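The field-numbering problem can be illustrated with a minimal Python sketch. This is not the vdev_id script itself (which is a shell script); it only assumes that lines are split on whitespace and that the device name is wanted. The helper name device_field and the fixed field index are hypothetical, and the leading whitespace of the second sample line is a guess at the original layout.

```python
# Two path lines modelled on the multipath output quoted above.
LINES = [
    "| `- 2:0:16:0 sdq 65:0 active undef running",
    "  `- 4:0:52:0 sdfp 130:176 active undef running",
]


def device_field(line, strip_pipes):
    """Return the third whitespace-separated field of a path line.

    With strip_pipes=True the '|' tree-drawing characters are blanked
    out first, so the device name lands in the same field on every line.
    """
    if strip_pipes:
        line = line.replace("|", " ")
    return line.split()[2]


# Without stripping, the leading '|' shifts the fields on the first line.
print([device_field(l, strip_pipes=False) for l in LINES])
# With stripping, both lines yield the device name.
print([device_field(l, strip_pipes=True) for l in LINES])
```

Blanking the pipe characters, rather than special-casing each line shape, keeps a single field index valid for every path line.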
Signed-off-by: Ned Bass Signed-off-by: Brian Behlendorf Closes #1692 commit b6e335bfc489c08bb92a8667e71fa7f69e87d960 Author: Ned Bass Date: Fri Dec 6 15:56:22 2013 -0800 Revert "Use directory xattrs for symlinks" This reverts commit 6a7c0ccca44ad02c476a111d8f7911fc8b12fff7. A proper fix for Issue #1648 was landed under Issue #1890, so this is no longer needed. Signed-off-by: Ned Bass Signed-off-by: Brian Behlendorf Closes #1648 commit 472e7c60853af099fbdf9d52162fd39818884f4f Author: James Pan Date: Fri Dec 6 14:16:40 2013 -0800 sa_find_sizes() may compute wrong SA header size Under the right conditions sa_find_sizes() will compute an incorrect size of the system attribute (SA) header. This causes a failed assertion when the SA_HDR_SIZE_MATCH_LAYOUT() test returns false, and may lead to corruption of SA data. The bug presents itself when there are more than two variable-length SAs of just the right size to fit in the bonus buffer of a dnode. The existing logic fails to account for the SA header space needed to store the sizes of all the variable-length SAs. A reproducer was possible on Linux by setting the xattr=sa dataset property and storing xattrs on symbolic links (Issue #1648). Note the corrupt link target name: $ zfs set xattr=sa tank/fish $ cd /tank/fish $ ln -fs 12345678901234567 link $ setfattr -n trusted.0000000000000000000 -v 0x000000000000000000000000 -h link $ setfattr -n trusted.1111111111111111111 -v 0x000000000000000000000000 -h link $ ls -l link lrwxrwxrwx 1 root root 17 Dec 6 15:40 link -> 90123456701234567 Commit 6a7c0ccca44ad02c476a111d8f7911fc8b12fff7 worked around this bug by forcing xattr's on symlinks to be stored in directory format. This change implements a proper fix, so the workaround can now be reverted. The reference link below contains a reproducer for FreeBSD. 
References: http://lists.open-zfs.org/pipermail/developer/2013-November/000306.html Ported-by: Ned Bass Signed-off-by: Brian Behlendorf Closes #1890 commit c1ab64d3931cbab45fbb197588cb27cb6fd10c33 Author: Turbo Fredriksson Date: Thu Dec 5 11:37:25 2013 +0000 Update init script to allow verbose mounts Allow verbose mounts to make it easier to monitor progress when mounting a large number of filesystems. This functionality is disabled by default. Signed-off-by: Brian Behlendorf Closes #1929 commit fc220e9ea536ea7a5bcdd231c8ae36e8fef18cfa Author: Turbo Fredriksson Date: Thu Dec 5 11:36:58 2013 +0000 Update init script to allow /dev/disk/by-id import Many people prefer to use by-id at import time instead of using the cache file. This can be a much better solution than the cache file in some environments so we're adding some infrastructure to allow it. This functionality is disabled by default. Signed-off-by: Brian Behlendorf Closes #1929 commit 90ee9ed32faa31174c673165c876b92272a93c72 Author: Brian Behlendorf Date: Wed Dec 4 13:50:34 2013 -0800 Fix 'zfs diff' shares error When creating a dataset with ZoL a zsb->z_shares_dir ZAP object will not be created because shares are unimplemented. Instead ZoL just sets zsb->z_shares_dir to zero to indicate there are no shares. However, if you import a pool which was created with a different ZFS implementation then the shares ZAP object may exist. Code was added to handle this case but it clearly wasn't sufficiently tested with other ZFS pools. There was a bug in the zpl_shares_getattr() function which passed the wrong inode to zfs_getattr_fast() for the case where a shares ZAP object does exist. This causes an EIO to be returned to stat64() which in turn causes 'zfs diff' to fail. The fix is to pass the correct inode after a successful zfs_zget(). Additionally, only put away the references if we were able to get one.
Signed-off-by: Brian Behlendorf Signed-off-by: Graham Booker Signed-off-by: timemaster67 Closes #1426 Closes #481 commit 99e349db92008ee61dad5a612056cf0fdecb3896 Author: Brian Behlendorf Date: Wed Dec 4 10:32:08 2013 -0800 Add module versioning Use the standard Linux MODULE_VERSION macro to expose the installed zavl, znvpair, zunicode, zcommon, zfs, and zpios module versions. This will also automatically add a checksum of the .c files and headers in "srcversion". See: /sys/module/zavl/version /sys/module/zavl/srcversion /sys/module/znvpair/version /sys/module/znvpair/srcversion /sys/module/zunicode/version /sys/module/zunicode/srcversion /sys/module/zcommon/version /sys/module/zcommon/srcversion /sys/module/zfs/version /sys/module/zfs/srcversion /sys/module/zpios/version /sys/module/zpios/srcversion Signed-off-by: Brian Behlendorf Closes #1923 commit e8b96c6007bf97cdf34869c1ffbd0ce753873a3d Author: Matthew Ahrens Date: Wed Aug 28 20:01:20 2013 -0700 Illumos #4045 write throttle & i/o scheduler performance work 4045 zfs write throttle & i/o scheduler performance work 1. The ZFS i/o scheduler (vdev_queue.c) now divides i/os into 5 classes: sync read, sync write, async read, async write, and scrub/resilver. The scheduler issues a number of concurrent i/os from each class to the device. Once a class has been selected, an i/o is selected from this class using either an elevator algorithm (async, scrub classes) or FIFO (sync classes). The number of concurrent async write i/os is tuned dynamically based on i/o load, to achieve good sync i/o latency when there is not a high load of writes, and good write throughput when there is. See the block comment in vdev_queue.c (reproduced below) for more details. 2. The write throttle (dsl_pool_tempreserve_space() and txg_constrain_throughput()) is rewritten to produce much more consistent delays when under constant load.
The new write throttle is based on the amount of dirty data, rather than guesses about future performance of the system. When there is a lot of dirty data, each transaction (e.g. write() syscall) will be delayed by the same small amount. This eliminates the "brick wall of wait" that the old write throttle could hit, causing all transactions to wait several seconds until the next txg opens. One of the keys to the new write throttle is decrementing the amount of dirty data as i/o completes, rather than at the end of spa_sync(). Note that the write throttle is only applied once the i/o scheduler is issuing the maximum number of outstanding async writes. See the block comments in dsl_pool.c and above dmu_tx_delay() (reproduced below) for more details. This diff has several other effects, including: * the commonly-tuned global variable zfs_vdev_max_pending has been removed; use per-class zfs_vdev_*_max_active values or zfs_vdev_max_active instead. * the size of each txg (meaning the amount of dirty data written, and thus the time it takes to write out) is now controlled differently. There is no longer an explicit time goal; the primary determinant is amount of dirty data. Systems that are under light or medium load will now often see that a txg is always syncing, but the impact to performance (e.g. read latency) is minimal. Tune zfs_dirty_data_max and zfs_dirty_data_sync to control this. * zio_taskq_batch_pct = 75 -- Only use 75% of all CPUs for compression, checksum, etc. This improves latency by not allowing these CPU-intensive tasks to consume all CPU (on machines with at least 4 CPU's; the percentage is rounded up). --matt APPENDIX: problems with the current i/o scheduler The current ZFS i/o scheduler (vdev_queue.c) is deadline based. The problem with this is that if there are always i/os pending, then certain classes of i/os can see very long delays. 
For example, if there are always synchronous reads outstanding, then no async writes will be serviced until they become "past due". One symptom of this situation is that each pass of the txg sync takes at least several seconds (typically 3 seconds). If many i/os become "past due" (their deadline is in the past), then we must service all of these overdue i/os before any new i/os. This happens when we enqueue a batch of async writes for the txg sync, with deadlines 2.5 seconds in the future. If we can't complete all the i/os in 2.5 seconds (e.g. because there were always reads pending), then these i/os will become past due. Now we must service all the "async" writes (which could be hundreds of megabytes) before we service any reads, introducing considerable latency to synchronous i/os (reads or ZIL writes). Notes on porting to ZFS on Linux: - zio_t gained new members io_physdone and io_phys_children. Because object caches in the Linux port call the constructor only once at allocation time, objects may contain residual data when retrieved from the cache. Therefore zio_create() was updated to zero out the two new fields. - vdev_mirror_pending() relied on the depth of the per-vdev pending queue (vq->vq_pending_tree) to select the least-busy leaf vdev to read from. This tree has been replaced by vq->vq_active_tree which is now used for the same purpose. - vdev_queue_init() used the value of zfs_vdev_max_pending to determine the number of vdev I/O buffers to pre-allocate. That global no longer exists, so we instead use the sum of the *_max_active values for each of the five I/O classes described above. - The Illumos implementation of dmu_tx_delay() delays a transaction by sleeping in condition variable embedded in the thread (curthread->t_delay_cv). We do not have an equivalent CV to use in Linux, so this change replaced the delay logic with a wrapper called zfs_sleep_until(). 
This wrapper could be adopted upstream and in other downstream ports to abstract away operating system-specific delay logic. - These tunables are added as module parameters, and descriptions added to the zfs-module-parameters.5 man page. spa_asize_inflation zfs_deadman_synctime_ms zfs_vdev_max_active zfs_vdev_async_write_active_min_dirty_percent zfs_vdev_async_write_active_max_dirty_percent zfs_vdev_async_read_max_active zfs_vdev_async_read_min_active zfs_vdev_async_write_max_active zfs_vdev_async_write_min_active zfs_vdev_scrub_max_active zfs_vdev_scrub_min_active zfs_vdev_sync_read_max_active zfs_vdev_sync_read_min_active zfs_vdev_sync_write_max_active zfs_vdev_sync_write_min_active zfs_dirty_data_max_percent zfs_delay_min_dirty_percent zfs_dirty_data_max_max_percent zfs_dirty_data_max zfs_dirty_data_max_max zfs_dirty_data_sync zfs_delay_scale The latter four have type unsigned long, whereas they are uint64_t in Illumos. This accommodates Linux's module_param() supported types, but means they may overflow on 32-bit architectures. The values zfs_dirty_data_max and zfs_dirty_data_max_max are the most likely to overflow on 32-bit systems, since they express physical RAM sizes in bytes. In fact, Illumos initializes zfs_dirty_data_max_max to 2^32 which does overflow. To resolve that, this port instead initializes it in arc_init() to 25% of physical RAM, and adds the tunable zfs_dirty_data_max_max_percent to override that percentage. While this solution doesn't completely avoid the overflow issue, it should be a reasonable default for most systems, and the minority of affected systems can work around the issue by overriding the defaults. - Fixed reversed logic in comment above zfs_delay_scale declaration. - Clarified comments in vdev_queue.c regarding when per-queue minimums take effect. - Replaced dmu_tx_write_limit in the dmu_tx kstat file with dmu_tx_dirty_delay and dmu_tx_dirty_over_max. 
The first counts how many times a transaction has been delayed because the pool dirty data has exceeded zfs_delay_min_dirty_percent. The latter counts how many times the pool dirty data has exceeded zfs_dirty_data_max (which we expect to never happen). - The original patch would have regressed the bug fixed in zfsonlinux/zfs@c418410, which prevented users from setting the zfs_vdev_aggregation_limit tuning larger than SPA_MAXBLOCKSIZE. A similar fix is added to vdev_queue_aggregate(). - In vdev_queue_io_to_issue(), dynamically allocate 'zio_t search' on the heap instead of the stack. In Linux we can't afford such large structures on the stack. Reviewed by: George Wilson Reviewed by: Adam Leventhal Reviewed by: Christopher Siden Reviewed by: Ned Bass Reviewed by: Brendan Gregg Approved by: Robert Mustacchi References: http://www.illumos.org/issues/4045 illumos/illumos-gate@69962b5647e4a8b9b14998733b765925381b727e Ported-by: Ned Bass Signed-off-by: Brian Behlendorf Closes #1913 commit 384f8a09f8423d951bb81d9ca945e588de14f95f Author: Matthew Ahrens Date: Fri Nov 22 15:13:18 2013 -0800 Illumos #4347 ZPL can use dmu_tx_assign(TXG_WAIT) Fix a lock contention issue by allowing threads not holding ZPL locks to block when waiting to assign a transaction. Porting Notes: zfs_putpage() still uses TXG_NOWAIT, unlike the upstream version. This case may be a contention point just like zfs_write(), however it is not safe to block here since it may be called during memory reclaim. 
Reviewed by: George Wilson Reviewed by: Adam Leventhal Reviewed by: Dan McDonald Reviewed by: Boris Protopopov Approved by: Dan McDonald References: https://www.illumos.org/issues/4347 illumos/illumos-gate@e722410c49fe67cbf0f639cbcc288bd6cbcf7dd1 Ported-by: Ned Bass Signed-off-by: Brian Behlendorf commit 729210564a5325e190fc4fba22bf17bacf957ace Author: Richard Yao Date: Mon Nov 25 12:21:21 2013 -0500 Properly ignore bdi_setup_and_register return value This broke compilation against Linux 3.13 and GCC 4.7.3. Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Closes #1906 commit 2e40f094109c2b345447351f07b0b525f44988d2 Author: Brian Behlendorf Date: Mon Dec 2 10:26:21 2013 -0800 Remove incorrect ASSERT in zfs_sb_teardown() As part of zfs_sb_teardown() there is an assertion that all inodes which are part of the zsb->z_all_znodes list have at least one reference on them. This is always true for the standard unmount case but there are two other cases where it is not strictly true. * zfs_ioc_rollback() - This is the most common case and it results from the fact that we aren't unmounting the filesystem. During a normal unmount the MS_ACTIVE flag will be cleared on the super block causing iput_final() to evict the inode when its reference count drops to zero. However, during a rollback MS_ACTIVE remains set since we're rolling back a live filesystem and need to preserve the existing super block. This allows inodes with a zero reference count to stay in the cache thereby violating the assertion. * destroy_inode() / zfs_sb_teardown() - There exists a small race between dropping the last reference on an inode and removing it from the zsb->z_all_znodes list. This is unlikely to occur but could also trigger the assertion which is incorrect. The inode may safely have a zero reference count in this case. Since allowing a zero reference count on the inode is expected and safe for both of these cases the simplest thing to do is remove the ASSERT. 
This code is only enabled for debug builds so removing this entirely is a very safe change. Signed-off-by: Brian Behlendorf Signed-off-by: Chris Dunlop Signed-off-by: Tim Chase Closes #1417 Closes #1536 commit c8c8d1e7e5dd156ac0c268895edcd9e552a3adea Author: Richard Yao Date: Wed Nov 13 09:30:21 2013 -0500 Drive database update Added: Adata S396 (obtained from drive_id) Apple MacBookAir3,1 SSD (obtained from drive_id) Apple MacBookPro10,1 SSD (obtained from drive_id) Intel 510 (obtained from drive_id) Intel 710 (obtained from drive_id) Intel DC S3500 (obtained from drive_id) Netapp LUN (obtained from illumos user's sd.conf) OCZ Agility 3 (obtained from drive_id) OCZ Vertex (obtained from drive_id) Samsung PM800 (obtained from drive_id) Sandisk U100 (obtained from drive_id) Sun Comstar (obtained from illumos user's sd.conf) Notes: 1. The entries for the Intel DC S3500 were extrapolated from the 800GB model's entry, which is "ATA INTEL SSDSC2BB80". 2. The entries for the Intel 710 were extrapolated from the 120GB model's entry, which is "ATA INTEL SSDSA2BZ12". 3. The entries for the Intel 510 were extrapolated from the 250GB model's entry, which is "ATA INTEL SSDSC2MH25". 4. The entries for the Apple MacBookPro10,1 SSD were extrapolated from the 512GB model's entry, which is "ATA APPLE SSD SM512E". Google searches suggest that this is a rebadged Samsung 830. 5. The entries for the Apple MacBookAir3,1 SSD were extrapolated from the 128GB model's entry, which is "ATA APPLE SSD TS128C". Google searches suggest that this is a rebadged Kingston SSDNow V+ 100 (based on Toshiba). 6. Sun Comstar is an iSCSI Target, so we cannot tell what the correct sector size is through this method. We list it only for reference purposes, but it is commented out. Similarly, it is not clear what the right thing to do for Netapp is, so we comment it out.
Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Closes #1907 commit f707635fa5a0a687f243a9b0976d7296955744d9 Author: Tim Chase Date: Wed Nov 20 07:56:56 2013 -0600 Some nvlist allocations in hold processing need to use KM_PUSHPAGE. This should hopefully catch the rest of the allocations in the user hold/release processing that were missed by commit 65c67ea86e9f112177f1ad32de8e780f10798a64. Signed-off-by: Brian Behlendorf Closes #1852 Closes #1855 commit 119a394ab0eee137a5198ad3fffab45fb11ef108 Author: Etienne Dechamps Date: Sun Nov 10 15:00:11 2013 +0000 Only commit the ZIL once in zpl_writepages() (msync() case). Currently, using msync() results in the following code path: sys_msync -> zpl_fsync -> filemap_write_and_wait_range -> zpl_writepages -> write_cache_pages -> zpl_putpage In such a code path, zil_commit() is called as part of zpl_putpage(). This means that for each page, the write is handed to the DMU, the ZIL is committed, and only then do we move on to the next page. As one might imagine, this results in atrocious performance where there is a large number of pages to write: instead of committing a batch of N writes, we do N commits containing one page each. In some extreme cases this can result in msync() being ~700 times slower than it should be, as well as very inefficient use of ZIL resources. This patch fixes this issue by making sure that the requested writes are batched and then committed only once. Unfortunately, the implementation is somewhat non-trivial because there is no way to run write_cache_pages in SYNC mode (so that we get all pages) without making it wait on the writeback tag for each page. The solution implemented here is composed of two parts: - I added a new callback system to the ZIL, which allows the caller to be notified when its ITX gets written to stable storage. 
One nice thing is that the callback is called not only in zil_commit() but in zil_sync() as well, which means that the caller doesn't have to care whether the write ended up in the ZIL or the DMU: it will get notified as soon as it's safe, period. This is an improvement over dmu_tx_callback_register() that was used previously, which only supports DMU writes. The rationale for this change is to allow zpl_putpage() to be notified when a ZIL commit is completed without having to block on zil_commit() itself. - zpl_writepages() now calls write_cache_pages in non-SYNC mode, which will prevent (1) write_cache_pages from blocking, and (2) zpl_putpage from issuing ZIL commits. zpl_writepages() will issue the commit itself instead of relying on zpl_putpage() to do it, thus nicely batching the writes. Note, however, that we still have to call write_cache_pages() again in SYNC mode because there is an edge case documented in the implementation of write_cache_pages() whereas it will not give us all dirty pages when running in non-SYNC mode. Thus we need to run it at least once in SYNC mode to make sure we honor persistency guarantees. This only happens when the pages are modified at the same time msync() is running, which should be rare. In most cases there won't be any additional pages and this second call will do nothing. Note that this change also fixes a bug related to #907 whereas calling msync() on pages that were already handed over to the DMU in a previous writepages() call would make msync() block until the next TXG sync instead of returning as soon as the ZIL commit is complete. The new callback system fixes that problem. 
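The difference between committing once per page and committing once per batch can be sketched with a toy model in Python. This is an illustration only, not the kernel code: ToyIntentLog, log_write, sync_per_page, and sync_batched are hypothetical names, and the "commit" here just counts calls and fires callbacks.

```python
class ToyIntentLog:
    """Toy stand-in for an intent log; not the real ZIL API."""

    def __init__(self):
        self.pending = []  # (page, callback) pairs not yet on stable storage
        self.commits = 0   # number of commit operations issued
        self.stable = []   # pages whose "now stable" callback has fired

    def log_write(self, page, on_stable):
        """Queue a write and remember whom to notify once it is stable."""
        self.pending.append((page, on_stable))

    def commit(self):
        """Flush everything queued so far and notify the callers."""
        self.commits += 1
        for page, on_stable in self.pending:
            on_stable(page)
        self.pending.clear()


def sync_per_page(log, pages):
    """Old behaviour in this model: one commit for every page written."""
    for page in pages:
        log.log_write(page, log.stable.append)
        log.commit()


def sync_batched(log, pages):
    """New behaviour in this model: queue all pages, then commit once."""
    for page in pages:
        log.log_write(page, log.stable.append)
    log.commit()


pages = list(range(8))
old, new = ToyIntentLog(), ToyIntentLog()
sync_per_page(old, pages)
sync_batched(new, pages)
print(old.commits, new.commits)
```

In both variants every page ends up marked stable; only the number of commits differs, which is the cost the patch is described as removing.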
Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Closes #1849 Closes #907 commit 14cecbb159a45c37f84948c758db1009dda49c62 Author: Trey Dockendorf Date: Fri Nov 15 13:36:24 2013 -0600 Change zfs-dkms requirement Version 2.2.0.3-20 of dkms in the EPEL/Fedora repositories added the necessary patches to support ZoL. Therefore, the zfs-dkms requirement on dkms is set to match that version or higher. This allows us to drop the custom dkms build in the ZoL EPEL/Fedora repositories. References: https://bugzilla.redhat.com/show_bug.cgi?id=1023598 Signed-off-by: Brian Behlendorf Closes #1873 commit 54d5378faea6861156fe94b4cd8d817836ed0242 Author: Yuri Pankov Date: Tue Nov 19 16:41:37 2013 +0100 Illumos #2583 2583 Add -p (parsable) option to zfs list References: https://www.illumos.org/issues/2583 illumos/illumos-gate@43d68d68c1ce08fb35026bebfb141af422e7082e Ported-by: Gregor Kopka Signed-off-by: Brian Behlendorf Closes: #937 commit e3dc14b86182a82d99faaa5979846750d937160e Author: Brian Behlendorf Date: Fri Nov 15 09:59:09 2013 -0800 Add I/O Read/Write Accounting Because ZFS bypasses the page cache we don't inherit per-task I/O accounting for free. However, the Linux kernel does provide helper functions which allow us to perform our own accounting. These are most commonly used for direct IO which also bypasses the page cache, but they can be used for the common read/write call paths as well. Signed-off-by: Pavel Snajdr Signed-off-by: Brian Behlendorf Closes #313 Closes #1275 commit 29714574fa17291d8427f9a45b109292166d5551 Author: Turbo Fredriksson Date: Sat Nov 16 06:52:54 2013 +0000 Document ZFS module parameters. This is a first draft of a zfs-module-parameters(5) man page. I have just extracted each parameter name and its description with modinfo, then checked the source for its type and default value. This will need more work, preferably by someone who actually knows these values and what to use them for.
Signed-off-by: Brian Behlendorf Closes #1856 commit 539defc873dd1b53d7fc483947e56cbfaeebeee8 Author: Maximilian Mehnert Date: Sun Nov 17 11:47:50 2013 +0100 Add missing libzfs_core to Makefiles On some platforms symbols provided by libzfs_core and used by libzfs were not available to the linker. To avoid this issue libzfs_core has been added to the list of required libraries when building utilities which depend on libzfs. This should have been handled properly by libtool and it's still not entirely clear why it wasn't on all platforms. Signed-off-by: Brian Behlendorf Closes #1841 commit e5bacf2109943b813e77185cffb7db07747a6d9c Author: Steven Hartland Date: Wed Nov 20 00:48:28 2013 +1100 Illumos #4322 4322 ZFS deadlock on dp_config_rwlock Reviewed by: Matthew Ahrens Reviewed by: Ilya Usvyatsky Approved by: Dan McDonald References: https://www.illumos.org/issues/4322 illumos/illumos-gate@c50d56f667f119d78fa3d94d6bef2c298ba556f6 Ported by: Chris Dunlop Signed-off-by: Brian Behlendorf Closes #1886 commit fd2366300025ef7b836d80d629bd505c10693b4f Author: DHE Date: Tue Nov 19 19:00:43 2013 -0500 Fix typos in commit b83e3e48c9b183a80dd00eb6c7431a1cbc7d89c9 There's a missing semicolon and equals sign in the first hunk of this commit in config/kernel-bdi.m4. This results in the test always failing. The effects were noticed when rrdtool, a tool which modifies files by mmap() and msync(), would have data never get saved to disk in spite of the files working while the mounted filesystem remains mounted. Signed-off-by: DHE Signed-off-by: Brian Behlendorf Signed-off-by: Richard Yao Closes #1889 commit 64ad2b26e24ae9f70d3a41c786144552c2e6ac12 Author: Brian Behlendorf Date: Thu Nov 14 14:22:52 2013 -0800 Remove the slog restriction on bootfs pools Under Linux this restriction does not apply because we have access to all the required devices. 
Signed-off-by: Brian Behlendorf Closes #1631 commit 28967367c9e1e97bbd9745da21e26650b508f6f8 Author: Cyril Plisko Date: Mon Aug 26 09:04:38 2013 +0300 Tighten zfs dependency on zfs-kmod Make zfs depend on the same version of zfs-kmod, rather than on same or better. When a yum repository contains a number of versions, dependency resolution breaks when trying to install a non-latest version. Signed-off-by: Cyril Plisko Signed-off-by: Brian Behlendorf Closes #1677 commit 227bc96951c020a6ea16dbb244901d65d5ee4ba1 Author: Matthew Thode Date: Wed Nov 6 15:54:54 2013 -0600 Fixes (extends) support for selinux xattrs to more inode types Properly initialize SELinux xattrs for all inode types. The initial implementation accidentally only did this for files. Signed-off-by: Brian Behlendorf Closes #1832 commit a16878805388c4d96cb8a294de965071d138a47b Author: Brian Behlendorf Date: Wed Nov 13 11:05:17 2013 -0800 Reduce stack for traverse_visitbp() recursion During pool import stack overflows may still occur due to the potentially deep recursion of traverse_visitbp(). This is most likely to occur when additional layers are added to the block device stack such as DM multipath. To minimize the stack usage for this call path the following changes were made: 1) Added the keyword 'noinline' to the vdev_*_map_alloc() functions to prevent them from being inlined by gcc. This reduced the stack usage of vdev_raidz_io_start() from 208 to 128 bytes, and vdev_mirror_io_start() from 144 to 128 bytes. 2) The 'saved_poolname' character array in zfsdev_ioctl() was moved from the stack to the heap. This reduced the stack usage of zfsdev_ioctl() from 368 to 112 bytes. 3) The major saving came from slimming down traverse_visitbp() from 224 to 144 bytes. Since this function is called recursively the 80 bytes saved per invocation adds up. The following changes were made: a) The 'hard' local variable was replaced by a TD_HARD() macro. b) The 'pd' local variable was replaced by 'td->td_pfd' references.
c) The zbookmark_t was moved to the heap. This does cost us an additional memory allocation per recursion but that cost should still be minimal. The cost could be further reduced by adding a dedicated zbookmark_t slab cache. d) The variable declarations in 'if (BP_GET_LEVEL()) { }' were restructured to use the minimum amount of stack. This includes removing the 'cbp' local variable. Overall for the offending use case roughly 1584 bytes of total stack space has been saved. This is enough to avoid overflowing the stack on stock kernels with 8k stacks. See #1778 for additional details. Signed-off-by: Brian Behlendorf Signed-off-by: Ned Bass Closes #1778 commit 65c67ea86e9f112177f1ad32de8e780f10798a64 Author: Tim Chase Date: Sun Nov 10 09:00:54 2013 -0600 Some nvlist allocations in hold processing need to use KM_PUSHPAGE. Commit 95fd54a1c5b93bb2aa3e7dffc28c784b1e21a8bb restructured the hold/release processing and moved some of the work into the sync task. A number of nvlist allocations now need to use KM_PUSHPAGE. Signed-off-by: Brian Behlendorf Closes #1852 Closes #1855 commit 2008e9209f2ec37321ec06de4988c5c7f9a015b8 Author: Tim Chase Date: Sat Nov 9 19:22:06 2013 -0600 Fix rollback of mounted filesystem regression The Illumos #3875 patch reverted a part of ZoL's 7b3e34b which added special-case error handling for zfs_rezget(). The error handling dealt with the case in which an all-ones object number ended up being passed to dnode_hold() and causing an EINVAL to be returned from zfs_rezget(). Signed-off-by: Brian Behlendorf Closes #1859 Closes #1861 commit 09d672d331377e5764bc94b3362c35481ae96a52 Author: Matthew Thode Date: Fri Nov 8 15:53:54 2013 -0600 Python 3 fixes Future proofing for compatibility with newer versions of Python.
Signed-off-by: Matthew Thode Signed-off-by: Brian Behlendorf Issue #1838 commit 23bc1f91fc5694699750be6343070e0d16fbe4ea Author: Matthew Thode Date: Fri Nov 8 15:52:06 2013 -0600 pep8 code readability changes Update the code to follow the pep8 style guide. References: http://www.python.org/dev/peps/pep-0008/ Signed-off-by: Matthew Thode Signed-off-by: Brian Behlendorf Issue #1838 commit 7a4f54688ee9503c5bf9fcd7a88c4b4b33c36572 Author: Bassu Date: Sat Nov 9 02:16:38 2013 +0500 Explain 'zfs list -t snap -o name -s name' speedup Commit 0cee240 from FreeBSD dramatically speeds up 'zfs list' performance assuming you're only interested in the dataset names. This optimization should be mentioned in the man page to allow end users to take advantage of it. Signed-off-by: Brian Behlendorf Closes #1847 commit 760ec997dfde8cf7dcbe1f367456423668e0cf76 Author: Matthew Thode Date: Wed Nov 6 16:56:50 2013 -0600 Updating init scripts to have more robust grepping The previous pattern could accidentally match on things like 'real_root=ZFS=node02-zp00/ROOT/rootfs' due to the 'ZFS=no' substring. Signed-off-by: Matthew Thode Signed-off-by: Brian Behlendorf Closes #1837 commit fd4f76160cb34539f875781fe7f2dea4b937ace5 Author: Tim Chase Date: Wed Nov 6 23:55:18 2013 -0600 Handle concurrent snapshot automounts failing due to EBUSY. In the current snapshot automount implementation, it is possible for multiple mounts to be attempted concurrently. Only one of the mounts will succeed and the others will fail. The failed mounts will cause an EREMOTE to be propagated back to the application. This commit works around the problem by adding a new exit status, MOUNT_BUSY to the mount.zfs program which is used when the underlying mount(2) call returns EBUSY. The zfs code detects this condition and treats it as if the mount had succeeded.
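The workaround can be sketched as two small mappings, shown here in Python purely as an illustration. The numeric exit-status values are placeholders chosen for the sketch and should not be read as the values used by mount.zfs; helper_exit_status and automount_result are hypothetical names, not functions from the ZFS source.

```python
import errno

# Placeholder exit statuses for the mount helper (values are assumptions).
MOUNT_SUCCESS = 0x00
MOUNT_SYSERR = 0x02
MOUNT_BUSY = 0x80


def helper_exit_status(mount_errno):
    """Map the errno of the underlying mount(2) call to an exit status."""
    if mount_errno == 0:
        return MOUNT_SUCCESS
    if mount_errno == errno.EBUSY:
        # Another concurrent automount most likely won the race.
        return MOUNT_BUSY
    return MOUNT_SYSERR


def automount_result(exit_status):
    """Treat a busy mount point like success; other failures give EREMOTE."""
    if exit_status in (MOUNT_SUCCESS, MOUNT_BUSY):
        return 0
    return errno.EREMOTE


for err in (0, errno.EBUSY, errno.ENOENT):
    print(err, automount_result(helper_exit_status(err)))
```

Giving EBUSY its own exit status lets the caller tell "someone else already mounted it" apart from a genuine failure without parsing any helper output.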
Signed-off-by: Brian Behlendorf Closes #1819 commit b1d13a60d12a7df0f2e1bed6405529790213a6cb Author: Tim Chase Date: Thu Nov 7 22:45:39 2013 -0600 Document the dedupditto pool property. Signed-off-by: Brian Behlendorf Closes #1839 commit b695c34ea4ca3037cfbc0fe7a9283334b761abc1 Author: Massimo Maggi Date: Sun Nov 3 00:40:26 2013 +0100 Honor CONFIG_FS_POSIX_ACL kernel option The required Posix ACL interfaces are only available for kernels with CONFIG_FS_POSIX_ACL defined. Therefore, only enable Posix ACL support for these kernels. All major distribution kernels enable CONFIG_FS_POSIX_ACL by default. If your kernel does not support Posix ACLs the following warning will be printed at ZFS module load time. "ZFS: Posix ACLs disabled by kernel" Signed-off-by: Massimo Maggi Signed-off-by: Brian Behlendorf Closes #1825 commit 78e2739d3c9e433c92cd1623a510edb2c83a97d9 Author: Matthew Ahrens Date: Mon Aug 12 12:53:33 2013 -0400 26126 panic system rather than corrupting pool if we hit bug 26100 References: delphix/delphix-os@931c8aaab74b6412933d299890894262e2ef8380 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Closes #1650 commit 2517c8ee08ef21ba112c00a94070302cdca04a58 Author: Brian Behlendorf Date: Tue Nov 5 10:32:39 2013 -0800 Switch allocations from KM_SLEEP to KM_PUSHPAGE A couple of kmem_alloc() allocations were using KM_SLEEP in the sync thread context. These were accidentally introduced by the recent set of Illumos patches. The solution is to switch to KM_PUSHPAGE. 
dsl_dataset_promote_sync() -> promote_hold() -> snaplist_make() ->kmem_alloc(sizeof (*snap), KM_SLEEP); dsl_dataset_user_hold_sync() -> dsl_onexit_hold_cleanup() -> kmem_alloc(sizeof (*ca), KM_SLEEP) Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 1ca546b33888b8f4c7e737faf8f038732926fd6e Author: Saso Kiselkov Date: Mon Oct 14 18:29:45 2013 -0400 Illumos #3995 3995 Memory leak of compressed buffers in l2arc_write_done References: https://illumos.org/issues/3995 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Closes #1688 Issue #1775 commit 43a696ed38cae25ec2d7b6466ab4a99eb86df7bd Author: George Wilson Date: Fri Oct 4 14:13:23 2013 -0800 Illumos #4168, #4169, #4170 4168 ztest assertion failure in dbuf_undirty 4169 verbatim import causes zdb to segfault 4170 zhack leaves pool in ACTIVE state Reviewed by: Adam Leventhal Reviewed by: Eric Schrock Reviewed by: Matthew Ahrens Approved by: Dan McDonald References: https://www.illumos.org/issues/4168 https://www.illumos.org/issues/4169 https://www.illumos.org/issues/4170 illumos/illumos-gate@7fdd916c474ea52896c671bbe7b56ba34a1ca132 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 92bc214c2e00bd4a430eac1629f1bcf2fc590d51 Author: Matthew Ahrens Date: Fri Aug 30 01:19:35 2013 -0800 Illumos #4082 4082 zfs receive gets EFBIG from dmu_tx_hold_free() Reviewed by: Eric Schrock Reviewed by: Christopher Siden Reviewed by: George Wilson Approved by: Richard Lowe References: https://www.illumos.org/issues/4082 illumos/illumos-gate@5253393b09789ec67bec153b866d7285a1cf1645 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit ac72fac3eaa569902cad88053167f7d74e7fe7e4 Author: George Wilson Date: Thu Aug 29 10:56:49 2013 -0800 Illumos #3954, #4080, #4081 3954 metaslabs continue to load even after hitting zfs_mg_alloc_failure limit 4080 zpool clear fails to clear pool 4081 need zfs_mg_noalloc_threshold Reviewed by: Adam Leventhal Reviewed by: Matthew Ahrens 
Approved by: Richard Lowe References: https://www.illumos.org/issues/3954 https://www.illumos.org/issues/4080 https://www.illumos.org/issues/4081 illumos/illumos-gate@22e30981d82a0b6dc89253596ededafae8655e00 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit a169a625a6d57ae0a92147cfde0da69235b2d4f1 Author: Matthew Ahrens Date: Thu Aug 22 09:51:47 2013 -0800 Illumos #4046 4046 dsl_dataset_t ds_dir->dd_lock is highly contended Reviewed by: Eric Schrock Reviewed by: George Wilson Approved by: Garrett D'Amore References: https://www.illumos.org/issues/4046 illumos/illumos-gate@b62969f868a827f0823a084bc0af9c7d8b76c659 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. This commit removed dsl_dataset_namelen in Illumos, but that appears to have been removed from ZFSOnLinux in an earlier commit. commit 8ce0af07bb3227c152d32e74683d1fdc1869246c Author: Marcel Telka Date: Thu Aug 15 22:33:42 2013 -0400 Illumos #4061 4061 libzfs: memory leak in iter_dependents_cb() Reviewed by: Jeffry Molanus Reviewed by: Boris Protopopov Reviewed by: Andy Stormont Reviewed by: Matthew Ahrens Approved by: Dan McDonald References: https://www.illumos.org/issues/4061 illumos/illumos-gate@2fbdf8dbf01ec1c85fcd3827cdf9e9f5f46c4c8a Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit b663a23d36d805dd5e9d1b4663dbf5966944002d Author: Matthew Ahrens Date: Tue Aug 20 20:11:52 2013 -0800 Illumos #4047 4047 panic from dbuf_free_range() from dmu_free_object() while doing zfs receive Reviewed by: Adam Leventhal Reviewed by: George Wilson Approved by: Dan McDonald References: https://www.illumos.org/issues/4047 illumos/illumos-gate@713d6c208802cfbb806329ec0d154b641b80c355 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. The exported symbol dmu_free_object() was renamed to dmu_free_long_object() in Illumos. 
commit 46ba1e59d3ae7e374c7a98f15f4bef21ee3fcded Author: Matthew Ahrens Date: Wed Aug 14 11:42:31 2013 -0800 Illumos #3996 3996 want a libzfs_core API to rollback to latest snapshot Reviewed by: Christopher Siden Reviewed by: Adam Leventhal Reviewed by: George Wilson Reviewed by: Andy Stormont Approved by: Richard Lowe References: https://www.illumos.org/issues/3996 illumos/illumos-gate@a7027df17fad220a20367b9d1eb251bc6300d203 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 5d1f7fb647e8923d154901ef3e19676e7bf3d345 Author: George Wilson Date: Wed Aug 7 12:16:22 2013 -0800 Illumos #3956, #3957, #3958, #3959, #3960, #3961, #3962 3956 ::vdev -r should work with pipelines 3957 ztest should update the cachefile before killing itself 3958 multiple scans can lead to partial resilvering 3959 ddt entries are not always resilvered 3960 dsl_scan can skip over dedup-ed blocks if physical birth != logical birth 3961 freed gang blocks are not resilvered and can cause pool to suspend 3962 ztest should print out zfs debug buffer before exiting Reviewed by: Matthew Ahrens Reviewed by: Adam Leventhal Approved by: Richard Lowe References: https://www.illumos.org/issues/3956 https://www.illumos.org/issues/3957 https://www.illumos.org/issues/3958 https://www.illumos.org/issues/3959 https://www.illumos.org/issues/3960 https://www.illumos.org/issues/3961 https://www.illumos.org/issues/3962 illumos/illumos-gate@b4952e17e8858d3225793b28788278de9fe6038d Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Porting notes: 1. zfs_dbgmsg_print() is only used in userland. Since we do not have mdb on Linux, it does not make sense to make it available in the kernel. This means that a build failure will occur if any future kernel patch depends on it. However, that is unlikely given that this functionality was added to support zdb. 2. zfs_dbgmsg_print() is only invoked for -VVV or greater log levels. 
This preserves the existing behavior of minimal noise when running with -V, and -VV. 3. In vdev_config_generate() the call to nvlist_alloc() was not changed to fnvlist_alloc() because we must pass KM_PUSHPAGE in the txg_sync context. commit 621dd7bb2c970838bcf2226ac365c517af7a4bb1 Author: George Wilson Date: Wed Aug 7 10:24:34 2013 -0800 Illumos #3949, #3950, #3952, #3953 3949 ztest fault injection should avoid resilvering devices 3950 ztest: deadman fires when we're doing a scan 3951 ztest hang when running dedup test 3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together Reviewed by: Matthew Ahrens Reviewed by: Adam Leventhal Approved by: Richard Lowe References: https://www.illumos.org/issues/3949 https://www.illumos.org/issues/3950 https://www.illumos.org/issues/3951 https://www.illumos.org/issues/3952 illumos/illumos-gate@2c1e2b44148432fb7a509dd216a99299b6740250 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. The deadman thread was removed from ztest during the original port because it depended on Solaris thr_create() interface. This functionality should be reintroduced using the more portable pthreads. 
commit 383fc4a9970ede483dc4bd7579f1c62942d1312f Author: Matthew Ahrens Date: Wed Aug 7 10:32:46 2013 -0800 Illumos #3955 3955 ztest failure: assertion refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite Reviewed by: Adam Leventhal Reviewed by: Dan Kimmel Reviewed by: George Wilson Approved by: Richard Lowe References: https://www.illumos.org/issues/3955 illumos/illumos-gate@be9000cc677e0a8d04e5be45c61d7370fc8c7b54 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 9554185d90a9f833c023c1bb8bc35779b8fd1b10 Author: Steven Hartland Date: Tue Aug 6 09:50:40 2013 -0800 Illumos #3973 3973 zfs_ioc_rename alters passed in zc->zc_name Reviewed by: Matthew Ahrens Reviewed by: George Wilson Approved by: Christopher Siden References: https://www.illumos.org/issues/3973 illumos/illumos-gate@a0c1127b147dc6a0372b141deb8c0c2b0195b8ea Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 6389d42205f56083b7658b2c67f117a244f13e52 Author: Steven Hartland Date: Mon Jul 29 11:36:31 2013 -0800 Illumos #3909 3909 "zfs send -D" does not work Reviewed by: Matthew Ahrens Approved by: Christopher Siden References: https://www.illumos.org/issues/3909 illumos/illumos-gate@36f7455d36b60be70d7aae5959fa19e71954678e Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit ea97f8ce35a8a515610e52b7e4744549f9c510f4 Author: Matthew Ahrens Date: Mon Jul 29 10:58:53 2013 -0800 Illumos #3834 3834 incremental replication of 'holey' file systems is slow Reviewed by: Adam Leventhal Reviewed by: George Wilson Approved by: Richard Lowe References: https://www.illumos.org/issues/3834 illumos/illumos-gate@ca48f36f20f6098ceb19d5b084b6b3d4b8eca9fa Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 2883cad5b747b5e4e2164fbe3236451d5b43f333 Author: Matthew Ahrens Date: Wed Jul 3 08:13:38 2013 -0800 Illumos #3836 3836 zio_free() can be processed immediately in the common case Reviewed by: George Wilson Reviewed by: Adam 
Leventhal Approved by: Dan McDonald References: https://www.illumos.org/issues/3836 illumos/illumos-gate@9cb154a3c9f170904dce9bad5bd5a7d256b922a4 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 498877baf5038b32c1531e5ec96b435023200f4d Author: Matthew Ahrens Date: Thu May 16 14:18:06 2013 -0700 Illumos #3112, #3113, #3114 3112 ztest does not honor ZFS_DEBUG 3113 ztest should use watchpoints to protect frozen arc bufs 3114 some leaked nvlists in zfsdev_ioctl Reviewed by: Adam Leventhal Reviewed by: Matt Amdur Reviewed by: George Wilson Reviewed by: Christopher Siden Approved by: Eric Schrock References: https://www.illumos.org/issues/3112 https://www.illumos.org/issues/3113 https://www.illumos.org/issues/3114 illumos/illumos-gate@cd1c8b85eb30b568e9816221430c479ace7a559d The /proc/self/cmd watchpoint interface is specific to Solaris. Therefore, the #3113 implementation was reworked to use the more portable mprotect(2) system call. When the pages are watched they are marked read-only for protection. Any write to the protected address range immediately triggers a SIGSEGV. The pages are marked writable again when they are unwatched. Ported-by: Brian Behlendorf Issue #1489 commit 03c6040bee6c87a9413b7da41d9f580f79a8ab62 Author: George Wilson Date: Fri May 10 12:47:54 2013 -0700 Illumos #3236 3236 zio nop-write Reviewed by: Matt Ahrens Reviewed by: Adam Leventhal Reviewed by: Christopher Siden Approved by: Garrett D'Amore References: illumos/illumos-gate@80901aea8e78a2c20751f61f01bebd1d5b5c2ba5 https://www.illumos.org/issues/3236 Porting Notes 1. This patch is being merged despite an increased instance of https://www.illumos.org/issues/3113 being triggered by ztest.
Ported-by: Brian Behlendorf Issue #1489 commit 831baf06efb3023ddee7ed41800d3b44521bf2ee Author: Keith M Wesolowski Date: Sat Jul 27 10:50:07 2013 -0700 Illumos #3875 3875 panic in zfs_root() after failed rollback Reviewed by: Jerry Jelinek Reviewed by: Matthew Ahrens Approved by: Gordon Ross References: https://www.illumos.org/issues/3875 illumos/illumos-gate@91948b51b8e978ddc88a36b2bc3ae83c20cdc9aa Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 19580676295b4e271da63dce145bb17c3731d069 Author: Matthew Ahrens Date: Mon Jul 29 10:55:16 2013 -0800 Illumos #3888 3888 zfs recv -F should destroy any snapshots created since the incremental source Reviewed by: George Wilson Reviewed by: Adam Leventhal Reviewed by: Peng Dai Approved by: Richard Lowe References: https://www.illumos.org/issues/3888 illumos/illumos-gate@34f2f8cf94052481c81be2e134b94a00b501bf21 Porting notes: 1. Commit 1fde1e37208c2f56c72c70a06676676f04b65998 wrapped a declaration in dsl_dataset_modified_since_lastsnap in ASSERTV(). The ASSERTV() and local variable have been removed to avoid an unused variable warning. 
Signed-off-by: Brian Behlendorf Ported-by: Richard Yao Issue #1775 commit 96c2e961938d4018ddb393fa60e004d8a91a58e9 Author: Keith M Wesolowski Date: Sat Jul 27 10:51:50 2013 -0700 Illumos #3894 3894 zfs should not allow snapshot of inconsistent dataset Reviewed by: Matthew Ahrens Approved by: Gordon Ross References: https://www.illumos.org/issues/3894 illumos/illumos-gate@ca48f36f20f6098ceb19d5b084b6b3d4b8eca9fa Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 1a077756e8ba946a55f999fa1cb5f0e7dcb9aa81 Author: Matthew Ahrens Date: Thu Jun 20 14:43:17 2013 -0800 Illumos #3829 3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl Reviewed by: Matt Amdur Reviewed by: Christopher Siden Approved by: Richard Lowe References: https://www.illumos.org/issues/3829 illumos/illumos-gate@bb6e70758d0c30c09f148026d6e686e21cfc8d18 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 34ffbed88c949bc4c8b52691e548db16a6e6816a Author: Steven Hartland Date: Tue Jun 18 22:36:40 2013 -0800 Illumos #3818 3818 zpool status -x should report pools with removed l2arc devices Reviewed by: Saso Kiselkov Reviewed by: George Wilson Approved by: Christopher Siden References: https://www.illumos.org/issues/3818 illumos/illumos-gate@7f2416ef64fb43dab18d9b36c0da64bea37c0df3 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 95fd54a1c5b93bb2aa3e7dffc28c784b1e21a8bb Author: Steven Hartland Date: Sat May 25 02:06:23 2013 +0000 Illumos #3740 3740 Poor ZFS send / receive performance due to snapshot hold / release processing Reviewed by: Matthew Ahrens Approved by: Christopher Siden References: https://www.illumos.org/issues/3740 illumos/illumos-gate@a7a845e4bf22fd1b2a284729ccd95c7370a0438c Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. 
13fe019870c8779bf2f5b3ff731b512cf89133ef introduced a merge conflict in dsl_dataset_user_release_tmp where some variables were moved outside of the preprocessor directive. 2. dea9dfefdd747534b3846845629d2200f0616dad made the previous merge conflict worse by switching KM_SLEEP to KM_PUSHPAGE. This is notable because this commit refactors the code, adding a new KM_SLEEP allocation. It is not clear to me whether this should be converted to KM_PUSHPAGE. 3. We had a merge conflict in libzfs_sendrecv.c because of copyright notices. 4. Several small C99 compatibility fixes were made. commit 7bc7f25040e68d6094a6c46fc300a3c4d66d2970 Author: Will Andrews Date: Tue Jun 11 09:13:47 2013 -0800 Illumos #3745, #3811 3745 zpool create should treat -O mountpoint and -m the same 3811 zpool create -o altroot=/xyz -O mountpoint=/mnt ignores the mountpoint option Reviewed by: Matthew Ahrens Approved by: Christopher Siden References: https://www.illumos.org/issues/3745 https://www.illumos.org/issues/3811 illumos/illumos-gate@8b713775314bbbf24edd503b4869342d8711ce95 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit d09f25dc66774959499a89bf3680d09c6e541ce8 Author: Will Andrews Date: Tue Jun 11 09:13:43 2013 -0800 Illumos #3744 3744 zfs shouldn't ignore errors unmounting snapshots Reviewed by: Matthew Ahrens Approved by: Christopher Siden References: https://www.illumos.org/issues/3744 illumos/illumos-gate@fc7a6e3fefc649cb65c8e2a35d194781445008b0 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. There is no clear way to distinguish between a failure when we tried to unmount the snapdir of a zvol (which does not exist) and the failure when we try to unmount a snapdir of a dataset, so the changes to zfs_unmount_snap() were dropped in favor of an altered Linux function that unconditionally returns 0.
commit 3a84951d7dfb5357509a1ed1699f80b71f87982a Author: Will Andrews Date: Tue Jun 11 09:13:38 2013 -0800 Illumos #3743 3743 zfs needs a refcount audit Reviewed by: Matthew Ahrens Reviewed by: Eric Schrock Reviewed by: George Wilson Approved by: Christopher Siden References: https://www.illumos.org/issues/3743 illumos/illumos-gate@b287be1ba86043996f49b1cc34c80cc620f9b841 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit d3cc8b152edc608fa4b73d4cb5354356da6b451c Author: Will Andrews Date: Tue Jun 11 09:12:34 2013 -0800 Illumos #3742 3742 zfs comments need cleaner, more consistent style Reviewed by: Matthew Ahrens Reviewed by: George Wilson Reviewed by: Eric Schrock Approved by: Christopher Siden References: https://www.illumos.org/issues/3742 illumos/illumos-gate@f7170741490edba9d1d9c697c177c887172bc741 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. The change to zfs_vfsops.c was dropped because it involves zfs_mount_label_policy, which does not exist in the Linux port. 
commit e49f1e20a09181d03382d64afdc4b7a12a5dfdf1 Author: Will Andrews Date: Tue Jun 11 09:12:34 2013 -0800 Illumos #3741 3741 zfs needs better comments Reviewed by: Matthew Ahrens Reviewed by: Eric Schrock Approved by: Christopher Siden References: https://www.illumos.org/issues/3741 illumos/illumos-gate@3e30c24aeefdee1631958ecf17f18da671781956 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit b1118acbb16ec347f6a3eb091d9b7097d12b8d54 Author: Martin Matuska Date: Thu May 23 13:07:25 2013 -0400 Illumos #3699, #3739 3699 zfs hold or release of a non-existent snapshot does not output error 3739 cannot set zfs quota or reservation on pool version < 22 Reviewed by: Matthew Ahrens Reviewed by: Eric Schrock Approved by: Dan McDonald References: https://www.illumos.org/issues/3699 https://www.illumos.org/issues/3739 illumos/illumos-gate@013023d4ed2f6d0cf75380ec686a4aac392b4e43 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 63fd3c6cfd264cab94dc186fe8cceecac8bc0d50 Author: Adam Leventhal Date: Wed Aug 28 16:05:48 2013 -0700 Illumos #3582, #3584 3582 zfs_delay() should support a variable resolution 3584 DTrace sdt probes for ZFS txg states Reviewed by: Matthew Ahrens Reviewed by: George Wilson Reviewed by: Christopher Siden Reviewed by: Dan McDonald Reviewed by: Richard Elling Approved by: Garrett D'Amore References: https://www.illumos.org/issues/3582 illumos/illumos-gate@0689f76 Ported by: Ned Bass Signed-off-by: Brian Behlendorf Issue #1775 commit c1fabe7961b100a7dfd77cddba1650d9a6580dc0 Author: Mark Shellenbaum Date: Wed Aug 18 13:59:31 2010 -0600 6977619 NULL pointer dereference in sa_handle_get_from_db() References: illumos/illumos-gate@44bffe012cad6481c82ad67bacd6b40bd29def2b Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit c0ebc844c78cd40c086dd145dc129b73f17af21b Author: Mark Shellenbaum Date: Mon Apr 5 19:59:44 2010 -0600 6939941 problem with moving files in zfs References:
illumos/illumos-gate@d39ee142a97a7c58f60f7b52c62409f2ff64b234 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. This commit was so old that only two lines applied to the modern code base. commit 2696dfafd9ebce5e3aa227c630b13f2c5b26bce9 Author: George Wilson Date: Tue Apr 23 09:31:42 2013 -0800 Illumos #3642, #3643 3642 dsl_scan_active() should not issue I/O to determine if async destroying is active 3643 txg_delay should not hold the tc_lock Reviewed by: Matthew Ahrens Reviewed by: Adam Leventhal Approved by: Gordon Ross References: https://www.illumos.org/issues/3642 https://www.illumos.org/issues/3643 illumos/illumos-gate@4a92375985c37d61406d66cd2b10ee642eb1f5e7 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting Notes: 1. The alignment assumptions for the tx_cpu structure assume that a kmutex_t is 8 bytes. This isn't true under Linux but tc_pad[] was adjusted anyway for consistency since this structure was never carefully aligned in ZoL. If careful alignment does impact performance significantly this should be reworked to be portable. 
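The alignment concern in porting note 1 can be sketched as follows. This is an illustrative layout only, not the actual tx_cpu definition: the member list, the 64-byte CACHE_LINE value, and the names tx_cpu_fields, tx_cpu_sketch_t and PAD_ROUNDUP are assumptions, and pthread_mutex_t stands in for kmutex_t. Deriving the pad from sizeof avoids hard-coding the size of the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Round x up to the next multiple of align (align must be nonzero). */
#define PAD_ROUNDUP(x, align) ((((x) + (align) - 1) / (align)) * (align))

/* Stand-in for the real members; pthread_mutex_t replaces kmutex_t. */
struct tx_cpu_fields {
    pthread_mutex_t tc_lock;
    uint64_t        tc_count[4];
};

/*
 * Overlaying the fields with a byte array sized to a whole number of
 * cache lines pads the structure without assuming the mutex size.
 */
typedef union tx_cpu_sketch {
    struct tx_cpu_fields tc;
    char tc_pad[PAD_ROUNDUP(sizeof (struct tx_cpu_fields), CACHE_LINE)];
} tx_cpu_sketch_t;

/* Fails at compile time if the padding math is ever wrong. */
_Static_assert(sizeof (tx_cpu_sketch_t) % CACHE_LINE == 0,
    "tx_cpu_sketch_t must occupy whole cache lines");
```

Sizing the union this way only keeps array elements from sharing a cache line if the array itself starts on a cache-line boundary, so the allocation would also need to be aligned.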
commit 7ec09286b761ee1fb85178ff55daaf8f74d935be Author: Matthew Ahrens Date: Wed Apr 10 13:54:56 2013 -0800 Illumos #3645, #3692 3645 dmu_send_impl: possibility of pool hold leak 3692 Panic on zfs receive of a recursive deduplicated stream Reviewed by: Adam Leventhal Reviewed by: Christopher Siden Reviewed by: Dan McDonald Approved by: Richard Lowe References: https://www.illumos.org/issues/3645 https://www.illumos.org/issues/3692 illumos/illumos-gate@de8d9cff565e928d0ace86f3ea0e2b15094d61df Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1792 Issue #1775 commit 2e528b49f8a0f8f2f51536a00fdf3ea9343bf302 Author: Matthew Ahrens Date: Fri Mar 8 10:41:28 2013 -0800 Illumos #3598 3598 want to dtrace when errors are generated in zfs Reviewed by: Dan Kimmel Reviewed by: Adam Leventhal Reviewed by: Christopher Siden Approved by: Garrett D'Amore References: https://www.illumos.org/issues/3598 illumos/illumos-gate@be6fd75a69ae679453d9cda5bff3326111e6d1ca Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Porting notes: 1. include/sys/zfs_context.h has been modified to render some new macros inert until dtrace is available on Linux. 2. Linux-specific changes have been adapted to use SET_ERROR(). 3. I'm NOT happy about this change. It does nothing but ugly up the code under Linux. Unfortunately we need to take it to avoid more merge conflicts in the future.
-Brian commit 7011fb6004b2227ff9e89894ed69ab83d36c1696 Author: Yuri Pankov Date: Wed Mar 6 17:57:09 2013 -0800 Illumos #3517 3517 importing pool with autoreplace=on and "hole" vdevs crashes syseventd Reviewed by: Albert Lee Reviewed by: Jeffry Molanus Reviewed by: George Wilson Approved by: Christopher Siden References: https://www.illumos.org/issues/3517 illumos/illumos-gate@efb4a871d8fd510a833bdca610528dde5ed69e42 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit d1fada1e6d953e32de4080bd366df17c640de191 Author: Matthew Ahrens Date: Fri Jul 5 15:37:16 2013 -0400 Illumos #3603, #3604: bobj improvements 3603 panic from bpobj_enqueue_subobj() 3604 zdb should print bpobjs more verbosely 3871 GCC 4.5.3 does not like issue 3604 patch Reviewed by: Henrik Mattson Reviewed by: Adam Leventhal Reviewed by: Christopher Siden Reviewed by: George Wilson Reviewed by: Garrett D'Amore Approved by: Dan McDonald References: https://www.illumos.org/issues/3603 https://www.illumos.org/issues/3604 https://www.illumos.org/issues/3871 illumos/illumos-gate@d04756377ddd1cf28ebcf652541094e17b03c889 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 Note that the patch from Illumos issue 3871 is not accepted into Illumos at the time of this writing. It is something that I wrote when porting this. Documentation is in the Illumos issue. 
commit 24a64651b4163d47b1187821152d762e9a263d5a Author: Matthew Ahrens Date: Fri Feb 22 01:23:09 2013 -0800 Illumos #3588 3588 provide zfs properties for logical (uncompressed) space used and referenced Reviewed by: Adam Leventhal Reviewed by: George Wilson Reviewed by: Dan McDonald Reviewed by: Richard Elling Approved by: Richard Lowe References: https://www.illumos.org/issues/3588 illumos/illumos-gate@77372cb0f35e8d3615ca2e16044f033397e88e21 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf commit c2e42f9d53bec422abb71efade2c004383345038 Author: George Wilson Date: Wed Feb 20 13:30:36 2013 -0800 Illumos #3578, #3579 3578 transferring the freed map to the defer map should be constant time 3579 ztest trips assertion in metaslab_weight() Reviewed by: Matthew Ahrens Reviewed by: Dan Kimmel Reviewed by: Adam Leventhal Reviewed by: Christopher Siden Reviewed by: Richard Elling Approved by: Dan McDonald References: https://www.illumos.org/issues/3578 https://www.illumos.org/issues/3579 illumos/illumos-gate@9eb57f7f3fbb970d4b9b89dcd5ecf543fe2414d5 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf commit 23c0a1333c09f353ec872fb9eca2d36f6214cedc Author: George Wilson Date: Sun Feb 17 12:00:54 2013 -0800 Illumos #3561, #3116 3561 arc_meta_limit should be exposed via kstats 3116 zpool reguid may log negative guids to internal SPA history Reviewed by: Matthew Ahrens Reviewed by: Adam Leventhal Reviewed by: Christopher Siden Reviewed by: Gordon Ross Approved by: Garrett D'Amore References: https://www.illumos.org/issues/3561 https://www.illumos.org/issues/3116 illumos/illumos-gate@20128a0826f9c53167caa9215c12f08beee48e30 Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Porting Notes: 1. The spa change was accidentally included in the libzfs_core merge. 2. "Add missing arcstats" (1834f2d8b715d25bafbb0e4a099994f45c3211ae) already implemented these kstats a few years ago. 
commit 330847ff36146a427a48e79a9733dda3828284e8 Author: Matthew Ahrens Date: Mon Aug 26 17:09:29 2013 -0700 Illumos #3537 3537 want pool io kstats Reviewed by: George Wilson Reviewed by: Adam Leventhal Reviewed by: Eric Schrock Reviewed by: Sašo Kiselkov Reviewed by: Garrett D'Amore Reviewed by: Brendan Gregg Approved by: Gordon Ross References: http://www.illumos.org/issues/3537 illumos/illumos-gate@c3a6601 Ported by: Cyril Plisko Signed-off-by: Brian Behlendorf Porting Notes: 1. The patch was restructured to take advantage of the existing spa statistics infrastructure. To accomplish this the kstat was moved in to spa->io_stats and the init/destroy code moved to spa_stats.c. 2. The I/O kstat was simply named which conflicted with the pool directory we had already created. Therefore it was renamed to /io 3. An update handler was added to allow the kstat to be zeroed. commit a117a6d66e5cf1e9d4f173bccc786a169e9a8e04 Author: George Wilson Date: Sun Feb 10 22:21:05 2013 -0800 Illumos #3522 3522 zfs module should not allow uninitialized variables Reviewed by: Sebastien Roy Reviewed by: Adam Leventhal Reviewed by: Matthew Ahrens Approved by: Garrett D'Amore References: https://www.illumos.org/issues/3522 illumos/illumos-gate@d5285cae913f4e01ffa0e6693a6d8ef1fbea30ba Ported-by: Richard Yao Signed-off-by: Brian Behlendorf Porting notes: 1. ZFSOnLinux had already addressed many of these issues because of its use of -Wall. However, the manner in which they were addressed differed. The illumos fixes replace the ones previously made in ZFSOnLinux to reduce code differences. 2. Part of the upstream patch made a small change to arc.c that might address zfsonlinux/zfs#1334. 3. The initialization of aclsize in zfs_log_create() differs because vsecp is a NULL pointer on ZFSOnLinux. 4. The changes to zfs_register_callbacks() were dropped because it has diverged and needs to be resynced.
commit a35beedfb3f25596b4ec9122742c1337083118f5 Author: Brian Behlendorf Date: Wed Oct 30 11:19:53 2013 -0700 Add cstyle.pl utility and cstyle.1 man page Cstyle is the C source style checker used by Illumos. Since the original ZFS source was written using these style guidelines they must also be followed by ZoL for consistency. The checker has been added to the scripts directory and may be run on a per file basis. New patches should be careful to avoid introducing new style warnings. Additionally, the 'checkstyle' target has been added to the top level Makefile and can be used to check the entire source tree. While ZoL has historically attempted to follow the SunOS style guide the lack of a rigorous style checker has allowed various warnings to be introduced. Currently there are 2211 reported style violations and we want to gradually eliminate these from the tree. Note the cstyle.1 man page is provided under man/man1/cstyle.1 but since it is a developer utility it is not installed along with the other man pages. Signed-off-by: Brian Behlendorf commit 495b25a91a8f29aeec9e2965752a1fc9b9569583 Author: Richard Yao Date: Tue Oct 8 22:37:38 2013 -0400 Add missing code to zfs_debug.{c,h} This is required to make Illumos 3962 merge. Signed-off-by: Richard Yao commit 632a242e8352f0a4684f41286a288689f97e504b Author: Richard Yao Date: Mon Oct 7 06:53:58 2013 -0400 Add missing copyright notices from Illumos This resolves merge conflicts when merging Illumos #3588 and Illumos #4047. Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 20f04f08aa5032f1e958ba38654d9ed833b6b636 Author: Richard Yao Date: Tue Oct 8 17:59:42 2013 -0400 Fix incorrect usage of strdup() in zfs_unmount_snap() Modifying the length of a string returned by strdup() is incorrect because strfree() is allowed to use strlen() to determine which slab cache was used to do the allocation.
Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 8c8417933f11d2bda734056f34f5d7c982acbcec Author: Richard Yao Date: Mon Oct 7 07:30:22 2013 -0400 Fix order of function calls in zio_free_sync() The resolution of a merge conflict when merging Illumos #3464 caused us to invert the order of a couple of function calls in zio_free_sync() versus what they are in Illumos. Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 9cac042cfeccb2d3ecc5a96c0c2ba9afe631338b Author: Richard Yao Date: Mon Sep 2 00:22:30 2013 -0400 Reintroduce uio_prefaultpages() This was accidentally removed by overzealous commenting. Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #1775 commit 023699cd62eb033ebed5e5fae4e13acaba4c5461 Author: Massimo Maggi Date: Mon Oct 28 09:22:15 2013 -0700 Posix ACL Support This change adds support for Posix ACLs by storing them as an xattr which is common practice for many Linux file systems. Since the Posix ACL is stored as an xattr it will not overwrite any existing ZFS/NFSv4 ACLs which may have been set. The Posix ACL will also be non-functional on other platforms although it may be visible as an xattr if that platform understands SA based xattrs. By default Posix ACLs are disabled but they may be enabled with the new 'acltype=noacl|posixacl' property. Set the property to 'posixacl' to enable them. If ZFS/NFSv4 ACL support is ever added an appropriate acltype will be added. This change passes the POSIX Test Suite cleanly with the exception of xacl/00.t test 45 which is incorrect for Linux (Ext4 fails too). http://www.tuxera.com/community/posix-test-suite/ Signed-off-by: Massimo Maggi Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Closes #170 commit 7c2448a33ee71be1671c158a167559d1320ff839 Author: Brian Behlendorf Date: Mon Oct 28 11:57:15 2013 -0700 Improve xattr property documentation Extend the xattr property section of zfs(8) such that it covers both styles of supported xattr.
A short discussion of the benefits and drawbacks of each type is presented to allow users to make an informed choice. Signed-off-by: Massimo Maggi Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #170 commit fc9e0530c9b1be00c122f88d9e4c8c329f2d5d26 Author: Brian Behlendorf Date: Mon Oct 28 09:07:00 2013 -0700 Prevent xattr remove from creating xattr directory Attempting to remove an xattr from a file which does not contain any directory based xattrs would result in the xattr directory being created. This behavior is non-optimal because it results in write operations to the pool in addition to the expected error being returned. To prevent this the CREATE_XATTR_DIR flag is only passed in zpl_xattr_set_dir() when setting a non-NULL xattr value. In addition, zpl_xattr_set() is updated similarly such that it will return immediately if passed an xattr name which doesn't exist and a NULL value. Signed-off-by: Massimo Maggi Signed-off-by: Richard Yao Signed-off-by: Brian Behlendorf Issue #170 commit 37fd6e00a699aff3fea24199497e9484cd218a84 Author: Prakash Surya Date: Mon Aug 26 09:23:09 2013 -0700 Add script to fix file names in upstream patches Added a simple sed script to do a search and replace on the Illumos ZFS file names and replace them with the ZFS on Linux equivalent. Example usage: # Replace Illumos paths with Linux paths $ ./scripts/zfs2zol-patch.sed arc.c.patch > arc.c.patch.linux # Ensure the script worked as expected $ diff arc.c.patch arc.c.patch.linux # Apply the patch using Linux paths $ patch -p1 < arc.c.patch.linux Signed-off-by: Richard Yao Signed-off-by: Prakash Surya Signed-off-by: Brian Behlendorf Closes #1679 commit c12e3a594a49ed10b7870d950c1f336f78f136cb Author: Richard Yao Date: Wed Oct 2 11:22:53 2013 -0400 Restructure zfs_readdir() to fix regressions This does the following: 1. It creates a uint8_t type value, which is initialized to DT_DIR on dot directories and ZFS_DIRENT_TYPE(zap.za_first_integer) otherwise.
       This resolves a regression where we return uninitialized values as the directory entry type on dot directories. This was accidentally introduced by commit 8170d281263e52ff33d7fba93ab625196844df36.

    2. It restructures zfs_readdir() code to use `uint64_t offset` like Illumos instead of `loff_t *pos`. This resolves a regression where negative ZAP cursors were treated as if they were dot directories.

    3. It restructures the function to more closely match the structure of zfs_readdir() on Illumos and removes the unused variable outcount, which was only used on Illumos.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Closes #1750

commit d65e73810938e5619b72591d3438063b00949e77
Author: Ralf Ertzinger
Date:   Wed Oct 23 10:50:48 2013 +0200

    Add -p switch to "zpool get"

    This works the same as the -p switch to "zfs get", displaying full resolution values for appropriate attributes.

    Signed-off-by: Brian Behlendorf
    Closes #1813

commit 8b921f667afc86c452242be0b6d3b257472ebe76
Author: Ralf Ertzinger
Date:   Wed Oct 23 10:33:33 2013 +0200

    Introduce zpool_get_prop_literal interface

    This change introduces zpool_get_prop_literal. It's an expanded version of zpool_get_prop taking one additional boolean parameter. With this parameter set to B_FALSE it will behave identically to zpool_get_prop. Setting it to B_TRUE will return full precision numbers for the following properties:

        ZPOOL_PROP_SIZE
        ZPOOL_PROP_ALLOCATED
        ZPOOL_PROP_FREE
        ZPOOL_PROP_FREEING
        ZPOOL_PROP_EXPANDSZ
        ZPOOL_PROP_ASHIFT

    Also introduced is a wrapper function for zpool_get_prop making it use zpool_get_prop_literal in the background.

    Signed-off-by: Brian Behlendorf
    Issue #1813

commit 157c9b6981ab6203550e8857144ac49e1e867fb7
Author: Steven Hartland
Date:   Thu Oct 24 00:45:45 2013 +0100

    Corrected "zfs list -t " syntax in man page and in command help.
    Signed-off-by: Brian Behlendorf
    Closes #1805

commit 8eaf9f3543aa6843aa276010768cce8c0626e2d8
Merge: 11cb9d7 d738d34
Author: Brian Behlendorf
Date:   Fri Oct 25 15:22:34 2013 -0700

    Merge branch 'kstat'

    This branch updates several of the zfs kstats to take advantage of the improved raw kstat functionality. In addition, two new kstats and a script called dbufstat.py are introduced.

    Updated+New Kstats
    * dbufs - Stats for all dbufs in the dbuf_hash
    * /txgs - Stats for the last N txgs synced to disk
    * /reads - Stats for the last N reads issued by the ARC
    * /dmu_tx_assign - Histogram of tx assign times

    Signed-off-by: Brian Behlendorf

commit d738d34da5b25b5e5daef966c29386468fd16263
Author: Brian Behlendorf
Date:   Fri Oct 25 13:58:45 2013 -0700

    Add dbufstat.py command

    The dbufstat.py command was added to provide a convenient way to determine what ZFS is caching. The script consumes the raw /proc/spl/kstat/zfs/dbufs kstat data and consolidates it into a more human readable form. This was designed primarily as a tool to aid developers but it may also be useful for advanced users who want more visibility into what the ARC is caching.

    When run without options dbufstat.py will default to showing a list of all objects with at least one buffer present in the cache. The total cache space consumed by that object will be printed on the right along with the object type. Similar to the arcstat.py command, the -x option may be used to display additional fields.

    Two other modes of operation are also supported by dbufstat.py and the expectation is that additional display modes may be added as needed. The -t option will summarize the total number of bytes cached for each object type, and the -b option will show every dbuf currently cached.

    The script was designed to be consistent with arcstat.py and includes most of the same options and functionality.
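
The per-object and per-type views described above amount to grouping cached buffers and summing their sizes. A minimal sketch of that aggregation, assuming each dbuf has already been parsed into a record; the field names `object`, `type`, and `size` and the sample values are hypothetical and are not the actual columns of the dbufs kstat:

```python
from collections import defaultdict


def bytes_by_type(dbufs):
    """Sum cached bytes per object type (the kind of summary -t is described as giving)."""
    totals = defaultdict(int)
    for d in dbufs:
        totals[d["type"]] += d["size"]
    return dict(totals)


def bytes_by_object(dbufs):
    """Sum cached bytes per (object, type) pair (the kind of list the default mode shows)."""
    totals = {}
    for d in dbufs:
        key = (d["object"], d["type"])
        totals[key] = totals.get(key, 0) + d["size"]
    return totals


# Hypothetical, already-parsed records; real kstat rows carry many more fields.
sample = [
    {"object": 7, "type": "ZFS plain file", "size": 131072},
    {"object": 7, "type": "ZFS plain file", "size": 131072},
    {"object": 9, "type": "ZFS directory", "size": 16384},
]

print(bytes_by_type(sample))
print(bytes_by_object(sample))
```

Parsing the raw kstat text is deliberately left out, since its exact layout is not given here.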
    Signed-off-by: Prakash Surya
    Signed-off-by: Brian Behlendorf

commit e0b0ca983d6897bcddf05af2c0e5d01ff66f90db
Author: Brian Behlendorf
Date:   Wed Oct 2 17:11:19 2013 -0700

    Add visibility in to cached dbufs

    Currently there is no mechanism to inspect which dbufs are being cached by the system. There are some coarse counters in arcstats but they only give a rough idea of what's being cached. This patch aims to improve the current situation by adding a new dbufs kstat.

    When read this new kstat will walk all cached dbufs linked in to the dbuf_hash. For each dbuf it will dump detailed information about the buffer. It will also dump additional information about the referenced arc buffer and its related dnode. This provides a more complete view in to exactly what is being cached.

    With this generic infrastructure in place utilities can be written to post-process the data to understand exactly how the caching is working. For example, the data could be processed to show a list of all cached dnodes and how much space they're consuming. Or a similar list could be generated based on dnode type. Many other ways to interpret the data exist based on what kinds of questions you're trying to answer.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Prakash Surya

commit 2d37239a28b8b2ddc0e8312093f8d8810c6351fa
Author: Brian Behlendorf
Date:   Wed Oct 2 11:43:52 2013 -0700

    Add visibility in to dmu_tx_assign times

    This change adds a new kstat to gain some visibility into the amount of time spent in each call to dmu_tx_assign. A histogram is exported via the new dmu_tx_assign file. The histogram records how often a call to dmu_tx_assign took a given range of time to complete.
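
A histogram of this kind can be sketched in userspace as follows. This is only an illustration: the power-of-two bucket boundaries, the nanosecond unit, and the names `bucket_index` and `histogram` are assumptions, since the message above says only that calls are counted per interval range.

```python
def bucket_index(ns):
    """Index i of the power-of-two bucket [2**i, 2**(i+1)) that contains ns."""
    return max(ns, 1).bit_length() - 1


def histogram(durations_ns, nbuckets=32):
    """Count how many durations fall into each bucket; the last bucket is open-ended."""
    counts = [0] * nbuckets
    for ns in durations_ns:
        counts[min(bucket_index(ns), nbuckets - 1)] += 1
    return counts


# Hypothetical call durations in nanoseconds.
counts = histogram([1, 2, 3, 1000], nbuckets=12)
for i, n in enumerate(counts):
    if n:
        print(f"[{2**i}, {2**(i + 1)}) ns: {n}")
```

Power-of-two buckets keep the table short while still separating fast calls from slow outliers, which is why they are a common choice for latency histograms.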
    Signed-off-by: Prakash Surya
    Signed-off-by: Brian Behlendorf

commit 0b1401ee911c5a0c0bdb7a8e6ad36840cea3af24
Author: Brian Behlendorf
Date:   Tue Oct 1 09:50:50 2013 -0700

    Add visibility in to txg sync behavior

    This change is an attempt to add visibility in to how txgs are being formed on a system, in real time. To do this, a list was added to the in memory SPA data structure for a pool, with each element on the list corresponding to a txg. These entries are then exported through the kstat interface, which can then be interpreted in userspace.

    For each txg, the following information is exported:

    * Unique txg number (uint64_t)
    * The time the txg was born (hrtime_t) (*not* wall clock time; relative to the other entries on the list)
    * The current txg state ((O)pen/(Q)uiescing/(S)yncing/(C)ommitted)
    * The number of reserved bytes for the txg (uint64_t)
    * The number of bytes read during the txg (uint64_t)
    * The number of bytes written during the txg (uint64_t)
    * The number of read operations during the txg (uint64_t)
    * The number of write operations during the txg (uint64_t)
    * The time the txg was closed (hrtime_t)
    * The time the txg was quiesced (hrtime_t)
    * The time the txg was synced (hrtime_t)

    Note that the raw kstat now stores relative hrtimes for the open, quiesce, and sync times. Those relative times are used to calculate how long each state took, and these deltas are printed by the output handlers.

    Signed-off-by: Brian Behlendorf

commit 1421c89142376bfd41e4de22ed7c7846b9e41f95
Author: Prakash Surya
Date:   Fri Sep 6 16:09:05 2013 -0700

    Add visibility in to arc_read

    This change is an attempt to add visibility into the arc_read calls occurring on a system, in real time. To do this, a list was added to the in memory SPA data structure for a pool, with each element on the list corresponding to a call to arc_read. These entries are then exported through the kstat interface, which can then be interpreted in userspace.

    For each arc_read call, the following information is exported:

    * A unique identifier (uint64_t)
    * The time the entry was added to the list (hrtime_t) (*not* wall clock time; relative to the other entries on the list)
    * The objset ID (uint64_t)
    * The object number (uint64_t)
    * The indirection level (uint64_t)
    * The block ID (uint64_t)
    * The name of the function originating the arc_read call (char[24])
    * The arc_flags from the arc_read call (uint32_t)
    * The PID of the reading thread (pid_t)
    * The command or name of thread originating read (char[16])

    From this exported information one can see, in real time, exactly what is being read, what function is generating the read, and whether or not the read was found to be already cached.

    There is still some work to be done, but this should serve as a good starting point. Specifically, dbuf_read's are not accounted for in the currently exported information. Thus, a follow up patch should probably be added to export these calls that never call into arc_read (they only hit the dbuf hash table). In addition, it might be nice to create a utility similar to "arcstat.py" to digest the exported information and display it in a more readable format. Or perhaps, log the information and allow for it to be "replayed" at a later time.

    Signed-off-by: Prakash Surya
    Signed-off-by: Brian Behlendorf

commit 76463d4026e0fa4b3d7b96acd58cb5fb79c49af7
Author: Brian Behlendorf
Date:   Mon Sep 30 11:51:20 2013 -0700

    Revert "Add txgs- kstat file"

    This reverts commit e95853a331529a6cb96fdf10476c53441e59f4e1.

commit 98ab38d1096079d82247350f526f0d7268956fb5
Author: Brian Behlendorf
Date:   Mon Sep 30 11:45:58 2013 -0700

    Revert "Add new kstat for monitoring time in dmu_tx_assign"

    This reverts commit 92334b14ec378b1693573b52c09816bbade9cf3e.
    Signed-off-by: Brian Behlendorf

commit 11cb9d773f48830cf3ff718861c070a8937c6a03
Author: Brian Behlendorf
Date:   Fri Oct 11 14:24:18 2013 -0700

    Increase default udev wait time

    When creating a new pool, or adding/replacing a disk in an existing pool, partition tables will be automatically created on the devices. Under normal circumstances it will take less than a second for udev to create the expected device files under /dev/. However, it has been observed that if the system is doing heavy IO concurrently udev may take far longer. If you also throw in some cheap dodgy hardware it may take even longer.

    To prevent zpool commands from failing due to this the default wait time for udev is being increased to 30 seconds. This will have no impact on normal usage; the increased timeout should only be noticed if your udev rules are incorrectly configured.

    Signed-off-by: Brian Behlendorf
    Closes #1646

commit b3c49d3df82466646bde9beebce7bbf0b3c41853
Author: Richard Yao
Date:   Sat Oct 5 17:55:24 2013 -0400

    Linux 3.11 compat: Rename LZ4 symbols

    Linus Torvalds merged LZ4 into Linux 3.11. This causes a conflict whenever CONFIG_LZ4_DECOMPRESS=y or CONFIG_LZ4_COMPRESS=y is set in the kernel's .config. We rename the symbols to avoid the conflict.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Closes #1789

commit 2e2ddc30b47c174d95c2eb491452a7587e3e129f
Author: Tim Chase
Date:   Sun Oct 13 11:36:15 2013 -0500

    Dedup-related documentation additions for zpool and zdb.

    Document the "-D" and "-T" options and the optional interval and count for "zpool status". Also for zpool's man page, use a consistent order for the various "-T" options to match the program's help output.

    Document the effect of additional "-D" options for zdb.
    Signed-off-by: Brian Behlendorf
    Closes #1786

commit fbcb768c8fd1f32653f46ed4a8a9ceafe139087b
Author: Tim Chase
Date:   Sat Oct 12 17:33:28 2013 -0500

    Add missing dsl pool configuration lock

    The semantics introduced by the restructured sync task of illumos 3464 require this lock when calling dmu_snapshot_list_next(). The pool is locked/unlocked for each iteration to reduce the chance of long-running locks.

    This was accidentally missed when doing the original port because ZoL's control directory code is Linux-specific and is in a different file than in illumos.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Closes #1785

commit 7a6144076166944655d86f1449be8566d1a3c71a
Author: George Wilson
Date:   Thu Feb 21 13:58:29 2013 -0800

    Illumos #3552

    3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread (fix race condition)

    References:
      https://www.illumos.org/issues/3552
      illumos/illumos-gate@03f8c366886542ed249a15d755ae78ea4e775d9d

    Ported-by: Richard Yao
    Signed-off-by: Brian Behlendorf

    Porting notes:

    This fixes an upstream regression that was introduced in commit zfsonlinux/zfs@e51be06697762215dc3b679f8668987034a5a048, which ported the Illumos 3552 changes. This fix was added to upstream rather quickly, but at the time of the port, no one spotted it and the race was rare enough that it passed our regression tests. I discovered this when comparing our metaslab.c to the illumos metaslab.c.

    Without this change it is possible for metaslab_group_alloc() to consume a large amount of cpu time.
    Since this occurs under a mutex in an rcu critical section the kernel will log this to the console as a self-detected cpu stall as follows:

        INFO: rcu_sched self-detected stall on CPU { 0} (t=60000 jiffies g=11431890 c=11431889 q=18271)

    Closes #1687
    Closes #1720
    Closes #1731
    Closes #1747

commit a6ce1eae54ca048ae7e7dfdcad05c5565a129226
Author: Richard Yao
Date:   Thu Sep 26 13:44:10 2013 -0400

    Fix libzfs_core changes to follow GNU libtool guidelines

    The GNU libtool documentation states to start with a version of 0:0:0, rather than 1:1:0. Illumos uses the name libzfs_core.so.1, so to be consistent, we should go with 1:0:0.

    http://www.gnu.org/software/libtool/manual/libtool.html#Updating-version-info

    The GNU libtool documentation also provides guidance on how the version information should be incremented. Doing this does a SONAME bump of the libzfs and libzpool libraries. This is particularly important on Gentoo because a SONAME bump enables portage to retain the older libraries until any packages that link to them are rebuilt. The main example of this is GRUB2's grub2-mkconfig, which will break unless it is rebuilt against the new libraries.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Issue #1751

commit 31fc19399e597e3391f19f1392ab120f1de0d5f2
Author: Richard Yao
Date:   Thu Sep 26 13:42:41 2013 -0400

    Generate libraries with correct DT_NEEDED entries

    Libraries that depend on other libraries should list them in ELF's DT_NEEDED field so that programs linking to them do not need to specify those libraries unless they depend on them as well. This is not the case in the current code and the consequence is that anything that needs a library must know its dependencies. This is fragile and caused GRUB2's configure script to break when a dependency was added on libblkid in libzfs.

    This resolves that problem by using LIBADD/LDADD to specify libraries in Makefile.am instead of LDFLAGS.
    This ensures that proper DT_NEEDED entries are generated and prevents GRUB2's configure script from breaking in the presence of a libblkid dependency.

    This also removes unneeded dependencies from various files.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Issue #1751

commit 1db7b9be75a225cedb3b7a60028ca5695e5b8346
Author: Richard Yao
Date:   Wed Aug 28 16:17:47 2013 -0400

    Fix libblkid support

    libblkid support is dormant because the autotools check is broken and libblkid identifies ZFS vdevs as "zfs_member", not "zfs". We fix that with a few changes:

    First, we fix the libblkid autotools check to do a few things:

    1. Make a 64MB file, which is the minimum size ZFS permits.
    2. Make 4 fake uberblock entries to make libblkid's check succeed.
    3. Return 0 upon success to make autotools use the success case.
    4. Include stdlib.h to avoid implicit declaration of free().
    5. Check for "zfs_member", not "zfs".
    6. Make --with-blkid disable autotools check (avoids Gentoo sandbox violation).
    7. Pass '-lblkid' correctly using LIBS not LDFLAGS.

    Second, we change the libblkid support to scan for "zfs_member", not "zfs".

    This makes --with-blkid work on Gentoo.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Issue #1751

commit 65ee05acd773beafd03bfedf96a092dd08cb2739
Author: Neil Stockbridge
Date:   Wed Oct 9 18:58:30 2013 +1300

    Update detach section of zpool(8)

    The detach section of the zpool(8) man page now suggests the offline command. Using offline may be more appropriate for certain situations.

    Signed-off-by: Brian Behlendorf
    Closes #1776

commit 40a806df259c0b826b8e962579dff64e8dfbf0d7
Author: Ned Bass
Date:   Mon Sep 30 16:29:37 2013 -0700

    Export symbols dsl_pool_config_{enter,exit}

    These are needed by consumers (i.e. Lustre) who wish to use the dsl_prop_register() interface to register callbacks when pool properties of interest change. This interface requires that the DSL pool configuration lock is held when called.
    Signed-off-by: Brian Behlendorf
    Closes #1762

commit 222b94805903dfa6879565ab9b1c8e3b0d70cbdf
Author: Brian Behlendorf
Date:   Wed Oct 2 10:00:04 2013 -0700

    Fix memory leak false positive in log_internal()

    When building the spl with --enable-debug-kmem-tracking a memory leak is detected in log_internal(). This happens to be a false positive because the memory was freed using strfree() instead of kmem_free(). All kmem_alloc()'s must be released with kmem_free() to ensure correct accounting.

        SPL: kmem leaked 135/5641311 bytes
        address          size  data             func:line
        ffff8800cba7cd80 135   ZZZZZZZZZZZZZZZZ log_internal:456

    Signed-off-by: Brian Behlendorf

commit 3549721c9e1f737fb7ba83d1fd52f396fd16889c
Author: Richard Yao
Date:   Thu Sep 5 15:23:24 2013 -0400

    Update drive database

    Add Corsair Force GS drive (obtained from drive_id)
    Add Kingston HyperX 3K (obtained from drive_id)
    Add OCZ Vertex 4 drive (obtained from drive_id)
    Add Samsung SM843T enterprise drive (obtained from drive_id)
    Add entries for additional sizes of Intel 320/330/335/520 series
    Add Crucial C400 (obtained from Illumos user's sd.conf)
    Add Toshiba SSD (obtained from Illumos user's sd.conf)
    Add Samsung's first SLC SSD (obtained from drive_id)
    Add OCZ Core Series (obtained from drive_id)
    Add Intel DC S3700 (obtained from drive_id)

    Notes:

    1. The drive identifier obtained for the Samsung SM843T was MZ7WD480. The rest were extrapolated. The additional entries were checked with Google to verify that such drives exist in the wild.
    2. The additional entries for Intel drives were extrapolated from existing entries. The additional entries were checked with Google to verify that such drives exist in the wild.
    3. The "ATA C400-MTFDDAC512M" and "ATA TOSHIBA THNSNH51" entries are from the sd.conf of gcbirzan on freenode. Additional entries were extrapolated from them and checked with Google.
    4. I obtained the Samsung MCCOE64G entry from an actual drive. The Samsung MCCOE32G entry was extrapolated from it and checked with Google.
    5. I obtained the SSDSC2BA10 from a 100GB Intel DC S3700 drive and extrapolated the entries for the additional models.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Closes #1752

commit 36342b13d9973a8c4e83f7c702545494aa5b80b4
Author: Brian Behlendorf
Date:   Wed Sep 25 09:33:00 2013 -0700

    Export additional dsl_prop_* symbols

    The recent sync task restructuring in 13fe019 introduced several new symbols which should be exported for use by consumers such as Lustre.

    Signed-off-by: Brian Behlendorf

commit 8769db396691978a48abee1d1855709d7b01d4d0
Author: Tim Chase
Date:   Fri Sep 20 09:30:04 2013 -0500

    Allocate the ioctl "output" nvlist with KM_PUSHPAGE.

    Some ZFS errors such as certain snapshot failures can occur in the sync task context. Because they may require additional memory allocations, the initial nvlist must be allocated with KM_PUSHPAGE.

    Signed-off-by: Brian Behlendorf
    Issue #1746
    Issue #1737

commit c5322236eccc7c5e1d23983c78928ad566685e7c
Author: Tim Chase
Date:   Sat Sep 14 22:09:09 2013 -0500

    Fix several new KM_SLEEP warnings

    A handful of allocations now occur in the sync path and need to use KM_PUSHPAGE. These were introduced by commit 13fe019.

    Signed-off-by: Brian Behlendorf
    Issue #1746
    Issue #1737

commit cbfa294de4937ae1af5845e9f765a3dc188cbcef
Author: Brian Behlendorf
Date:   Wed Sep 25 09:29:30 2013 -0700

    Fix spa_deadman() TQ_SLEEP warning

    The spa_deadman() and spa_sync() functions can both be run in the spa_sync context and therefore should use TQ_PUSHPAGE instead of TQ_SLEEP.

    Signed-off-by: Brian Behlendorf
    Closes #1734
    Closes #1749

commit f9f3f1ef983e987a2e09a49c3684405561fed634
Author: GregorKopka
Date:   Thu Sep 19 16:42:17 2013 +0200

    Removing unneeded mutex for reading vq_pending_tree size

    Locking mutex &vq->vq_lock in vdev_mirror_pending is unneeded:

    * no data is modified
    * only vq_pending_tree is read
    * in case garbage is returned (eg.
      vq_pending_tree being updated while the read is made) the worst case would be that a single read could be queued on a mirror side which is busier than thought

    The benefit of this change is streamlining of the code path since it is taken for *every* mirror member on *every* read.

    Signed-off-by: Brian Behlendorf
    Closes #1739

commit 77831e17385ba822fe70436d862c0e14df5d67b2
Author: Kohsuke Kawaguchi
Date:   Wed Sep 25 15:14:47 2013 -0700

    Reduce the stack usage of dsl_dataset_remove_clones_key

    dsl_dataset_remove_clones_key does recursion, so if the recursion goes deep it can overrun the Linux kernel stack size of 8KB. I have seen this happen in an actual deployment, and subsequently confirmed it by running a test workload on a custom-built kernel that uses a 32KB stack.

    See the following stack trace as an example of the case where it would have run over the 8KB stack kernel:

            Depth    Size   Location    (42 entries)
            -----    ----   --------
      0)    11192      72   __kmalloc+0x2e/0x240
      1)    11120     144   kmem_alloc_debug+0x20e/0x500
      2)    10976      72   dbuf_hold_impl+0x4a/0xa0
      3)    10904     120   dbuf_prefetch+0xd3/0x280
      4)    10784      80   dmu_zfetch_dofetch.isra.5+0x10f/0x180
      5)    10704     240   dmu_zfetch+0x5f7/0x10e0
      6)    10464     168   dbuf_read+0x71e/0x8f0
      7)    10296     104   dnode_hold_impl+0x1ee/0x620
      8)    10192      16   dnode_hold+0x19/0x20
      9)    10176      88   dmu_buf_hold+0x42/0x1b0
     10)    10088     144   zap_lockdir+0x48/0x730
     11)     9944     128   zap_cursor_retrieve+0x1c4/0x2f0
     12)     9816     392   dsl_dataset_remove_clones_key.isra.14+0xab/0x190
     13)     9424     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     14)     9032     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     15)     8640     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     16)     8248     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     17)     7856     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     18)     7464     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     19)     7072     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     20)     6680     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     21)     6288     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     22)     5896     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     23)     5504     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     24)     5112     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     25)     4720     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     26)     4328     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     27)     3936     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     28)     3544     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     29)     3152     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     30)     2760     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     31)     2368     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     32)     1976     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     33)     1584     392   dsl_dataset_remove_clones_key.isra.14+0x10c/0x190
     34)     1192     232   dsl_dataset_destroy_sync+0x311/0xf60
     35)      960      72   dsl_sync_task_group_sync+0x12f/0x230
     36)      888     168   dsl_pool_sync+0x48b/0x5c0
     37)      720     184   spa_sync+0x417/0xb00
     38)      536     184   txg_sync_thread+0x325/0x5b0
     39)      352      48   thread_generic_wrapper+0x7a/0x90
     40)      304     128   kthread+0xc0/0xd0
     41)      176     176   ret_from_fork+0x7c/0xb0

    This change reduces the stack usage in dsl_dataset_remove_clones_key by allocating structures on the heap, not on the stack. This is not a fundamental fix, as one can create an arbitrarily large data set that runs over any fixed size stack, but this will make the problem far less likely.

    Signed-off-by: Brian Behlendorf
    Signed-off-by: Kohsuke Kawaguchi
    Closes #1726

commit 34d5a5fd03210d9efdd5966070df1f71c0dbef96
Author: Brian Behlendorf
Date:   Fri Sep 13 13:20:15 2013 -0700

    Fix zpl_mknod() return values

    The zpl_mknod() function was incorrectly negating its return value. This doesn't cause any problems in the success case, but it does prevent us from returning the correct error code for a failure. The implementation of this function is now consistent with all the other zpl_* functions.
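
The effect of a wrongly negated return value can be illustrated outside the kernel. A minimal sketch, assuming that the inner routine reports failure as a positive errno, that the Linux-facing wrapper is expected to return it negated exactly once, and that the bug was an extra negation; none of this is spelled out in the message above, and `inner_create`, `mknod_buggy`, and `mknod_fixed` are hypothetical stand-ins rather than the real functions:

```python
import errno


def inner_create(fail):
    """Hypothetical inner routine: 0 on success, positive errno on failure."""
    return errno.EEXIST if fail else 0


def mknod_buggy(fail):
    error = -inner_create(fail)  # converted to the negative convention here...
    return -error                # ...then negated again, so callers see +EEXIST


def mknod_fixed(fail):
    error = -inner_create(fail)  # negate exactly once
    assert error <= 0            # wrapper contract: zero or a negative errno
    return error


print(mknod_buggy(False), mknod_fixed(False))  # success looks identical
print(mknod_buggy(True), mknod_fixed(True))    # failure differs in sign
```

Because zero is its own negation, the success path hides the mistake, which matches the observation that only the failure case returned the wrong code.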
    Signed-off-by: Brian Behlendorf
    Closes #1717

commit 17897ce2c88476f6fb7413f05e183694cb7482ef
Author: Brian Behlendorf
Date:   Fri Sep 13 13:10:36 2013 -0700

    Fix uninitialized variables

    When compiling on an ARM device using gcc 4.7.3 several variables in the zfs_obj_to_path_impl() function were flagged as uninitialized. To resolve the warnings, explicitly initialize them to zero.

    Signed-off-by: Brian Behlendorf
    Closes #1716

commit b83e3e48c9b183a80dd00eb6c7431a1cbc7d89c9
Author: Richard Yao
Date:   Tue Sep 10 15:13:44 2013 -0400

    Stop runtime pointer modifications in autotools checks

    c38367c73f592ca9729ba0d5e70b5e3bc67e0745 was meant to eliminate runtime function pointer modifications in autotools checks because they were prone to false negatives on kernels hardened by the PaX project. Unfortunately, I missed the xattr_handler and super_block->s_bdi autotools checks. Recent changes to PaX constified xattr_handler->get/set, which led me to discover this oversight.

    Signed-off-by: Richard Yao
    Signed-off-by: Brian Behlendorf
    Closes #1433

commit 4cf652e5d4becca29df8c961daaa68f9c9c81245
Author: Tim Chase
Date:   Wed Sep 11 11:47:43 2013 -0700

    Fix dmu_objset_find_dp() KM_SLEEP warning

    After the restructuring in 13fe019 the 'zfs rename' command will result in a KM_SLEEP being called in the sync context. This may deadlock due to reclaim so it was changed to KM_PUSHPAGE.
    Signed-off-by: Brian Behlendorf
    Closes #1711

commit 13fe019870c8779bf2f5b3ff731b512cf89133ef
Author: Matthew Ahrens
Date:   Wed Sep 4 07:00:57 2013 -0500

    Illumos #3464

    3464 zfs synctask code needs restructuring

    Reviewed by: Dan Kimmel
    Reviewed by: Adam Leventhal
    Reviewed by: George Wilson
    Reviewed by: Christopher Siden
    Approved by: Garrett D'Amore

    References:
      https://www.illumos.org/issues/3464
      illumos/illumos-gate@3b2aab18808792cbd248a12f1edf139b89833c13

    Ported-by: Tim Chase
    Signed-off-by: Brian Behlendorf
    Closes #1495

commit 6f1ffb06655008c9b519108ed29fbf03acd6e5de
Author: Matthew Ahrens
Date:   Wed Aug 28 06:45:09 2013 -0500

    Illumos #2882, #2883, #2900

    2882 implement libzfs_core
    2883 changing "canmount" property to "on" should not always remount dataset
    2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once

    Reviewed by: George Wilson
    Reviewed by: Chris Siden
    Reviewed by: Garrett D'Amore
    Reviewed by: Bill Pijewski
    Reviewed by: Dan Kruchinin
    Approved by: Eric Schrock

    References:
      https://www.illumos.org/issues/2882
      https://www.illumos.org/issues/2883
      https://www.illumos.org/issues/2900
      illumos/illumos-gate@4445fffbbb1ea25fd0e9ea68b9380dd7a6709025

    Ported-by: Tim Chase
    Signed-off-by: Brian Behlendorf
    Closes #1293

    Porting notes:

    WARNING: This patch changes the user/kernel ABI. That means that the zfs/zpool utilities built from master are NOT compatible with the 0.6.2 kernel modules. Ensure you load the matching kernel modules from master after updating the utilities. Otherwise the zfs/zpool commands will be unable to interact with your pool and you will see errors similar to the following:

        $ zpool list
        failed to read pool configuration: bad address
        no pools available

        $ zfs list
        no datasets available

    Add zvol minor device creation to the new zfs_snapshot_nvl function.

    Remove the logging of the "release" operation in dsl_dataset_user_release_sync().
    The logging caused a null dereference because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the logging functions try to get the ds name via the dsl_dataset_name() function. I've got no idea why this particular code would have worked in Illumos. This code has subsequently been completely reworked in Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring).

    Squash some "may be used uninitialized" warnings/errors.

    Fix some printf format warnings for %lld and %llu.

    Apply a few spa_writeable() changes that were made to Illumos in illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and 3115 fixes.

    Add a missing call to fnvlist_free(nvl) in log_internal() that was added in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time (zfsonlinux/zfs@9e11c73) because it depended on future work.