sbuild (Debian sbuild) 0.78.1 (09 February 2019) on gcc131.bak.milne.osuosl.org

+==============================================================================+
| wiggle 1.1-1 (arm64)                        Mon, 09 Mar 2020 16:04:35 +0000 |
+==============================================================================+

Package: wiggle
Version: 1.1-1
Source Version: 1.1-1
Distribution: unstable
Machine Architecture: amd64
Host Architecture: arm64
Build Architecture: amd64
Build Profiles: cross nocheck
Build Type: any

I: NOTICE: Log filtering will replace 'var/run/schroot/mount/unstable-amd64-sbuild-96a4a8fd-a1df-4f3a-bd07-21de1e0ec7dd' with '<>'
I: NOTICE: Log filtering will replace 'build/wiggle-uQJWu7/resolver-GzDfQI' with '<>'

+------------------------------------------------------------------------------+
| Update chroot                                                                |
+------------------------------------------------------------------------------+

Get:1 http://debian.oregonstate.edu/debian unstable InRelease [142 kB]
Get:2 http://debian.oregonstate.edu/debian unstable/main Sources.diff/Index [27.9 kB]
Get:3 http://debian.oregonstate.edu/debian unstable/main amd64 Packages.diff/Index [27.9 kB]
Get:4 http://debian.oregonstate.edu/debian unstable/main Sources 2020-03-09-0813.36.pdiff [3541 B]
Get:5 http://debian.oregonstate.edu/debian unstable/main Sources 2020-03-09-1419.12.pdiff [14.8 kB]
Get:6 http://debian.oregonstate.edu/debian unstable/main amd64 Packages 2020-03-09-0813.36.pdiff [8466 B]
Get:7 http://debian.oregonstate.edu/debian unstable/main amd64 Packages 2020-03-09-1419.12.pdiff [14.3 kB]
Get:8 http://debian.oregonstate.edu/debian unstable/main arm64 Packages [8076 kB]
Fetched 8315 kB in 1s (5563 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
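The header records Host Architecture arm64, Build Architecture amd64, and the cross and nocheck build profiles. The actual command line is not part of the log; an invocation consistent with the header would look roughly like the following (a hypothetical sketch — the distribution, flag spelling, and target name are assumptions, not taken from the log):

```shell
# Hypothetical sbuild invocation matching the log header above:
# build on amd64, target (host) arm64, skip the test suite via "nocheck".
cmd='sbuild --dist=unstable --host=arm64 --build=amd64 --profiles=cross,nocheck wiggle_1.1-1'
echo "$cmd"
```

The `--host`/`--build` pair is what makes sbuild pull in `crossbuild-essential-arm64` and the `:arm64` library dependencies seen later in the log.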
Calculating upgrade...
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

+------------------------------------------------------------------------------+
| Fetch source files                                                           |
+------------------------------------------------------------------------------+

Check APT
---------

Checking available source versions...

Download source files with APT
------------------------------

Reading package lists...
NOTICE: 'wiggle' packaging is maintained in the 'Git' version control system at:
https://salsa.debian.org/debian/wiggle.git
Please use:
git clone https://salsa.debian.org/debian/wiggle.git
to retrieve the latest (possibly unreleased) updates to the package.
Need to get 855 kB of source archives.
Get:1 http://debian.oregonstate.edu/debian unstable/main wiggle 1.1-1 (dsc) [1839 B]
Get:2 http://debian.oregonstate.edu/debian unstable/main wiggle 1.1-1 (tar) [847 kB]
Get:3 http://debian.oregonstate.edu/debian unstable/main wiggle 1.1-1 (diff) [6840 B]
Fetched 855 kB in 0s (11.1 MB/s)
Download complete and in download only mode
I: NOTICE: Log filtering will replace 'build/wiggle-uQJWu7/wiggle-1.1' with '<>'
I: NOTICE: Log filtering will replace 'build/wiggle-uQJWu7' with '<>'

+------------------------------------------------------------------------------+
| Install package build dependencies                                           |
+------------------------------------------------------------------------------+

Setup apt archive
-----------------

Merged Build-Depends: debhelper (>= 12), groff, libncurses-dev, time, libc-dev, libstdc++-dev, build-essential:amd64, fakeroot:amd64, crossbuild-essential-arm64:amd64, libc-dev:arm64, libstdc++-dev:arm64
Filtered Build-Depends: debhelper (>= 12), groff, libncurses-dev, time, libc-dev, libstdc++-dev, build-essential:amd64, fakeroot:amd64, crossbuild-essential-arm64:amd64, libc-dev:arm64, libstdc++-dev:arm64
dpkg-deb: building package 'sbuild-build-depends-main-dummy' in '/<>/apt_archive/sbuild-build-depends-main-dummy.deb'.
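The Merged Build-Depends line mixes plain names, version constraints, and multiarch architecture qualifiers (e.g. `libc-dev:arm64` vs. `build-essential:amd64`), which is how sbuild distinguishes host-architecture from build-architecture dependencies in a cross build. A minimal sketch of how such a field can be split into (name, architecture, constraint) triples — an illustrative parser, not dpkg's actual implementation, and it ignores alternatives (`|`) and profile restrictions:

```python
import re

# One entry: "name", optional ":arch" qualifier, optional " (>= version)".
DEP_RE = re.compile(r'^(?P<name>[a-z0-9][a-z0-9+.-]*)'
                    r'(?::(?P<arch>[a-z0-9-]+))?'
                    r'(?:\s*\((?P<rel>[<>=!]+)\s*(?P<ver>[^)]+)\))?$')

def parse_build_depends(field: str):
    """Split a simple Build-Depends value into (name, arch, constraint) triples."""
    deps = []
    for raw in field.split(','):
        m = DEP_RE.match(raw.strip())
        if not m:
            raise ValueError(f'unparsed dependency: {raw!r}')
        constraint = (m['rel'], m['ver']) if m['rel'] else None
        deps.append((m['name'], m['arch'], constraint))
    return deps

merged = ('debhelper (>= 12), groff, libncurses-dev, time, libc-dev, '
          'libstdc++-dev, build-essential:amd64, fakeroot:amd64, '
          'crossbuild-essential-arm64:amd64, libc-dev:arm64, libstdc++-dev:arm64')
for name, arch, constraint in parse_build_depends(merged):
    print(name, arch, constraint)
```

Entries with no `:arch` qualifier (like `libncurses-dev`) default to the host architecture during a cross build, which is why the resolver below installs `libncurses-dev:arm64`.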
Ign:1 copy:/<>/apt_archive ./ InRelease
Get:2 copy:/<>/apt_archive ./ Release [957 B]
Ign:3 copy:/<>/apt_archive ./ Release.gpg
Get:4 copy:/<>/apt_archive ./ Sources [415 B]
Get:5 copy:/<>/apt_archive ./ Packages [502 B]
Fetched 1874 B in 0s (72.1 kB/s)
Reading package lists...

Install main build dependencies (apt-based resolver)
----------------------------------------------------

Installing build dependencies
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  autoconf automake autopoint autotools-dev binutils-aarch64-linux-gnu
  bsdmainutils build-essential cpp-9-aarch64-linux-gnu cpp-aarch64-linux-gnu
  cross-config crossbuild-essential-arm64 debhelper dh-autoreconf
  dh-strip-nondeterminism dpkg-cross dwz file g++ g++-9
  g++-9-aarch64-linux-gnu g++-aarch64-linux-gnu gcc-10-base:arm64
  gcc-10-cross-base gcc-9-aarch64-linux-gnu gcc-9-aarch64-linux-gnu-base
  gcc-9-base:arm64 gcc-9-cross-base gcc-aarch64-linux-gnu gettext gettext-base
  groff groff-base intltool-debian libarchive-zip-perl libasan5:arm64
  libasan5-arm64-cross libatomic1:arm64 libatomic1-arm64-cross libbsd0
  libc6:arm64 libc6-arm64-cross libc6-dev libc6-dev:arm64
  libc6-dev-arm64-cross libconfig-auto-perl libconfig-inifiles-perl libcroco3
  libcrypt-dev libcrypt-dev:arm64 libcrypt1:arm64 libdebhelper-perl
  libdebian-dpkgcross-perl libelf1 libfile-homedir-perl
  libfile-stripnondeterminism-perl libfile-which-perl libgcc-9-dev:arm64
  libgcc-9-dev-arm64-cross libgcc-s1:arm64 libgcc-s1-arm64-cross libglib2.0-0
  libgomp1:arm64 libgomp1-arm64-cross libice6 libicu63 libio-string-perl
  libitm1:arm64 libitm1-arm64-cross liblocale-gettext-perl liblsan0:arm64
  liblsan0-arm64-cross libmagic-mgc libmagic1 libncurses-dev:arm64
  libncurses6:arm64 libncursesw6:arm64 libpipeline1 libsigsegv2 libsm6
  libstdc++-9-dev libstdc++-9-dev:arm64 libstdc++-9-dev-arm64-cross
  libstdc++6:arm64 libstdc++6-arm64-cross libsub-override-perl
  libtinfo6:arm64 libtool libtsan0:arm64 libtsan0-arm64-cross libubsan1:arm64
  libubsan1-arm64-cross libuchardet0 libx11-6 libx11-data libxau6 libxaw7
  libxcb1 libxdmcp6 libxext6 libxml-libxml-perl libxml-namespacesupport-perl
  libxml-sax-base-perl libxml-sax-perl libxml-simple-perl libxml2 libxmu6
  libxpm4 libxt6 libyaml-perl linux-libc-dev:arm64 linux-libc-dev-arm64-cross
  m4 man-db po-debconf sensible-utils time ucf x11-common
Suggested packages:
  autoconf-archive gnu-standards autoconf-doc binutils-doc
  wamerican | wordlist whois vacation gcc-9-locales cpp-doc dh-make
  binutils-multiarch g++-multilib g++-9-multilib gcc-9-doc manpages-dev flex
  bison gdb-aarch64-linux-gnu gcc-doc gettext-doc libasprintf-dev
  libgettextpo-dev glibc-doc:arm64 libc-l10n:arm64 locales:arm64 glibc-doc
  manpages-dev:arm64 ncurses-doc:arm64 libstdc++-9-doc libstdc++-9-doc:arm64
  libtool-doc gfortran | fortran95-compiler gcj-jdk libyaml-shell-perl m4-doc
  apparmor less www-browser libmail-box-perl
Recommended packages:
  curl | wget | lynx ghostscript imagemagick libpaper1 netpbm psutils
  libidn2-0:arm64 libarchive-cpio-perl libglib2.0-data shared-mime-info
  xdg-user-dirs libgpm2:arm64 libltdl-dev libwww-perl libxml-sax-expat-perl
  libyaml-libyaml-perl | libyaml-syck-perl libmail-sendmail-perl
The following NEW packages will be installed:
  autoconf automake autopoint autotools-dev binutils-aarch64-linux-gnu
  bsdmainutils build-essential cpp-9-aarch64-linux-gnu cpp-aarch64-linux-gnu
  cross-config crossbuild-essential-arm64 debhelper dh-autoreconf
  dh-strip-nondeterminism dpkg-cross dwz file g++ g++-9
  g++-9-aarch64-linux-gnu g++-aarch64-linux-gnu gcc-10-base:arm64
  gcc-10-cross-base gcc-9-aarch64-linux-gnu gcc-9-aarch64-linux-gnu-base
  gcc-9-base:arm64 gcc-9-cross-base gcc-aarch64-linux-gnu gettext gettext-base
  groff groff-base intltool-debian libarchive-zip-perl libasan5:arm64
  libasan5-arm64-cross libatomic1:arm64 libatomic1-arm64-cross libbsd0
  libc6:arm64 libc6-arm64-cross libc6-dev libc6-dev:arm64
  libc6-dev-arm64-cross libconfig-auto-perl libconfig-inifiles-perl libcroco3
  libcrypt-dev libcrypt-dev:arm64 libcrypt1:arm64 libdebhelper-perl
  libdebian-dpkgcross-perl libelf1 libfile-homedir-perl
  libfile-stripnondeterminism-perl libfile-which-perl libgcc-9-dev:arm64
  libgcc-9-dev-arm64-cross libgcc-s1:arm64 libgcc-s1-arm64-cross libglib2.0-0
  libgomp1:arm64 libgomp1-arm64-cross libice6 libicu63 libio-string-perl
  libitm1:arm64 libitm1-arm64-cross liblocale-gettext-perl liblsan0:arm64
  liblsan0-arm64-cross libmagic-mgc libmagic1 libncurses-dev:arm64
  libncurses6:arm64 libncursesw6:arm64 libpipeline1 libsigsegv2 libsm6
  libstdc++-9-dev libstdc++-9-dev:arm64 libstdc++-9-dev-arm64-cross
  libstdc++6:arm64 libstdc++6-arm64-cross libsub-override-perl
  libtinfo6:arm64 libtool libtsan0:arm64 libtsan0-arm64-cross libubsan1:arm64
  libubsan1-arm64-cross libuchardet0 libx11-6 libx11-data libxau6 libxaw7
  libxcb1 libxdmcp6 libxext6 libxml-libxml-perl libxml-namespacesupport-perl
  libxml-sax-base-perl libxml-sax-perl libxml-simple-perl libxml2 libxmu6
  libxpm4 libxt6 libyaml-perl linux-libc-dev:arm64 linux-libc-dev-arm64-cross
  m4 man-db po-debconf sbuild-build-depends-main-dummy:arm64 sensible-utils
  time ucf x11-common
0 upgraded, 119 newly installed, 0 to remove and 0 not upgraded.
Need to get 85.4 MB of archives.
After this operation, 353 MB of additional disk space will be used.
Get:1 copy:/<>/apt_archive ./ sbuild-build-depends-main-dummy 0.invalid.0 [928 B]
Get:2 http://debian.oregonstate.edu/debian unstable/main amd64 libbsd0 amd64 0.10.0-1 [107 kB]
Get:3 http://debian.oregonstate.edu/debian unstable/main amd64 bsdmainutils amd64 11.1.2+b1 [191 kB]
Get:4 http://debian.oregonstate.edu/debian unstable/main amd64 libuchardet0 amd64 0.0.6-3 [64.9 kB]
Get:5 http://debian.oregonstate.edu/debian unstable/main amd64 groff-base amd64 1.22.4-4 [919 kB]
Get:6 http://debian.oregonstate.edu/debian unstable/main amd64 libpipeline1 amd64 1.5.2-2 [33.9 kB]
Get:7 http://debian.oregonstate.edu/debian unstable/main amd64 man-db amd64 2.9.1-1 [1308 kB]
Get:8 http://debian.oregonstate.edu/debian unstable/main amd64 liblocale-gettext-perl amd64 1.07-4 [18.8 kB]
Get:9 http://debian.oregonstate.edu/debian unstable/main arm64 gcc-10-base arm64 10-20200304-1 [195 kB]
Get:10 http://debian.oregonstate.edu/debian unstable/main arm64 gcc-9-base arm64 9.2.1-31 [196 kB]
Get:11 http://debian.oregonstate.edu/debian unstable/main amd64 sensible-utils all 0.0.12+nmu1 [16.0 kB]
Get:12 http://debian.oregonstate.edu/debian unstable/main amd64 libmagic-mgc amd64 1:5.38-4 [262 kB]
Get:13 http://debian.oregonstate.edu/debian unstable/main amd64 libmagic1 amd64 1:5.38-4 [120 kB]
Get:14 http://debian.oregonstate.edu/debian unstable/main amd64 file amd64 1:5.38-4 [67.9 kB]
Get:15 http://debian.oregonstate.edu/debian unstable/main amd64 gettext-base amd64 0.19.8.1-10 [123 kB]
Get:16 http://debian.oregonstate.edu/debian unstable/main amd64 ucf all 3.0038+nmu1 [69.0 kB]
Get:17 http://debian.oregonstate.edu/debian unstable/main amd64 libsigsegv2 amd64 2.12-2 [32.8 kB]
Get:18 http://debian.oregonstate.edu/debian unstable/main amd64 m4 amd64 1.4.18-4 [203 kB]
Get:19 http://debian.oregonstate.edu/debian unstable/main amd64 autoconf all 2.69-11.1 [341 kB]
Get:20 http://debian.oregonstate.edu/debian unstable/main amd64 autotools-dev all 20180224.1 [77.0 kB]
Get:21 http://debian.oregonstate.edu/debian unstable/main amd64 automake all 1:1.16.1-4 [771 kB]
Get:22 http://debian.oregonstate.edu/debian unstable/main amd64 autopoint all 0.19.8.1-10 [435 kB]
Get:23 http://debian.oregonstate.edu/debian unstable/main amd64 binutils-aarch64-linux-gnu amd64 2.34-4 [2786 kB]
Get:24 http://debian.oregonstate.edu/debian unstable/main amd64 libcrypt-dev amd64 1:4.4.15-1 [104 kB]
Get:25 http://debian.oregonstate.edu/debian unstable/main amd64 libc6-dev amd64 2.29-10 [2638 kB]
Get:26 http://debian.oregonstate.edu/debian unstable/main amd64 libstdc++-9-dev amd64 9.2.1-31 [1698 kB]
Get:27 http://debian.oregonstate.edu/debian unstable/main amd64 g++-9 amd64 9.2.1-31 [10.7 MB]
Get:28 http://debian.oregonstate.edu/debian unstable/main amd64 g++ amd64 4:9.2.1-3.1 [1644 B]
Get:29 http://debian.oregonstate.edu/debian unstable/main amd64 build-essential amd64 12.8 [7640 B]
Get:30 http://debian.oregonstate.edu/debian unstable/main amd64 gcc-9-aarch64-linux-gnu-base amd64 9.2.1-28cross1 [195 kB]
Get:31 http://debian.oregonstate.edu/debian unstable/main amd64 cpp-9-aarch64-linux-gnu amd64 9.2.1-28cross1 [6542 kB]
Get:32 http://debian.oregonstate.edu/debian unstable/main amd64 cpp-aarch64-linux-gnu amd64 4:9.2.1-3.1 [16.7 kB]
Get:33 http://debian.oregonstate.edu/debian unstable/main amd64 cross-config all 2.6.15-3 [39.9 kB]
Get:34 http://debian.oregonstate.edu/debian unstable/main amd64 gcc-9-cross-base all 9.2.1-28cross1 [191 kB]
Get:35 http://debian.oregonstate.edu/debian unstable/main amd64 gcc-10-cross-base all 10-20200211-1cross1 [191 kB]
Get:36 http://debian.oregonstate.edu/debian unstable/main amd64 libc6-arm64-cross all 2.29-9cross1 [1242 kB]
Get:37 http://debian.oregonstate.edu/debian unstable/main amd64 libgcc-s1-arm64-cross all 10-20200211-1cross1 [34.6 kB]
Get:38 http://debian.oregonstate.edu/debian unstable/main amd64 libgomp1-arm64-cross all 10-20200211-1cross1 [87.2 kB]
Get:39 http://debian.oregonstate.edu/debian unstable/main amd64 libitm1-arm64-cross all 10-20200211-1cross1 [22.7 kB]
Get:40 http://debian.oregonstate.edu/debian unstable/main amd64 libatomic1-arm64-cross all 10-20200211-1cross1 [8540 B]
Get:41 http://debian.oregonstate.edu/debian unstable/main amd64 libasan5-arm64-cross all 9.2.1-28cross1 [347 kB]
Get:42 http://debian.oregonstate.edu/debian unstable/main amd64 liblsan0-arm64-cross all 10-20200211-1cross1 [125 kB]
Get:43 http://debian.oregonstate.edu/debian unstable/main amd64 libtsan0-arm64-cross all 10-20200211-1cross1 [285 kB]
Get:44 http://debian.oregonstate.edu/debian unstable/main amd64 libstdc++6-arm64-cross all 10-20200211-1cross1 [409 kB]
Get:45 http://debian.oregonstate.edu/debian unstable/main amd64 libubsan1-arm64-cross all 10-20200211-1cross1 [122 kB]
Get:46 http://debian.oregonstate.edu/debian unstable/main amd64 libgcc-9-dev-arm64-cross all 9.2.1-28cross1 [886 kB]
Get:47 http://debian.oregonstate.edu/debian unstable/main amd64 gcc-9-aarch64-linux-gnu amd64 9.2.1-28cross1 [6914 kB]
Get:48 http://debian.oregonstate.edu/debian unstable/main amd64 gcc-aarch64-linux-gnu amd64 4:9.2.1-3.1 [1460 B]
Get:49 http://debian.oregonstate.edu/debian unstable/main amd64 linux-libc-dev-arm64-cross all 5.4.8-1cross1 [1126 kB]
Get:50 http://debian.oregonstate.edu/debian unstable/main amd64 libc6-dev-arm64-cross all 2.29-9cross1 [2263 kB]
Get:51 http://debian.oregonstate.edu/debian unstable/main amd64 libstdc++-9-dev-arm64-cross all 9.2.1-28cross1 [1636 kB]
Get:52 http://debian.oregonstate.edu/debian unstable/main amd64 g++-9-aarch64-linux-gnu amd64 9.2.1-28cross1 [7124 kB]
Get:53 http://debian.oregonstate.edu/debian unstable/main amd64 g++-aarch64-linux-gnu amd64 4:9.2.1-3.1 [1180 B]
Get:54 http://debian.oregonstate.edu/debian unstable/main amd64 libconfig-inifiles-perl all 3.000002-1 [52.0 kB]
Get:55 http://debian.oregonstate.edu/debian unstable/main amd64 libio-string-perl all 1.08-3 [12.3 kB]
Get:56 http://debian.oregonstate.edu/debian unstable/main amd64 libicu63 amd64 63.2-2 [8301 kB]
Get:57 http://debian.oregonstate.edu/debian unstable/main amd64 libxml2 amd64 2.9.10+dfsg-4 [709 kB]
Get:58 http://debian.oregonstate.edu/debian unstable/main amd64 libxml-namespacesupport-perl all 1.12-1 [14.8 kB]
Get:59 http://debian.oregonstate.edu/debian unstable/main amd64 libxml-sax-base-perl all 1.09-1 [20.4 kB]
Get:60 http://debian.oregonstate.edu/debian unstable/main amd64 libxml-sax-perl all 1.02+dfsg-1 [59.0 kB]
Get:61 http://debian.oregonstate.edu/debian unstable/main amd64 libxml-libxml-perl amd64 2.0134+dfsg-2 [343 kB]
Get:62 http://debian.oregonstate.edu/debian unstable/main amd64 libxml-simple-perl all 2.25-1 [72.0 kB]
Get:63 http://debian.oregonstate.edu/debian unstable/main amd64 libyaml-perl all 1.30-1 [67.7 kB]
Get:64 http://debian.oregonstate.edu/debian unstable/main amd64 libconfig-auto-perl all 0.44-1 [19.5 kB]
Get:65 http://debian.oregonstate.edu/debian unstable/main amd64 libfile-which-perl all 1.23-1 [16.6 kB]
Get:66 http://debian.oregonstate.edu/debian unstable/main amd64 libfile-homedir-perl all 1.004-1 [42.7 kB]
Get:67 http://debian.oregonstate.edu/debian unstable/main amd64 libdebian-dpkgcross-perl all 2.6.15-3 [38.7 kB]
Get:68 http://debian.oregonstate.edu/debian unstable/main amd64 dpkg-cross all 2.6.15-3 [49.3 kB]
Get:69 http://debian.oregonstate.edu/debian unstable/main amd64 crossbuild-essential-arm64 all 12.8 [6644 B]
Get:70 http://debian.oregonstate.edu/debian unstable/main amd64 libtool all 2.4.6-14 [513 kB]
Get:71 http://debian.oregonstate.edu/debian unstable/main amd64 dh-autoreconf all 19 [16.9 kB]
Get:72 http://debian.oregonstate.edu/debian unstable/main amd64 libdebhelper-perl all 12.9 [183 kB]
Get:73 http://debian.oregonstate.edu/debian unstable/main amd64 libarchive-zip-perl all 1.67-2 [104 kB]
Get:74 http://debian.oregonstate.edu/debian unstable/main amd64 libsub-override-perl all 0.09-2 [10.2 kB]
Get:75 http://debian.oregonstate.edu/debian unstable/main amd64 libfile-stripnondeterminism-perl all 1.6.3-2 [23.7 kB]
Get:76 http://debian.oregonstate.edu/debian unstable/main amd64 dh-strip-nondeterminism all 1.6.3-2 [14.7 kB]
Get:77 http://debian.oregonstate.edu/debian unstable/main amd64 libelf1 amd64 0.176-1.1 [161 kB]
Get:78 http://debian.oregonstate.edu/debian unstable/main amd64 dwz amd64 0.13-5 [151 kB]
Get:79 http://debian.oregonstate.edu/debian unstable/main amd64 libglib2.0-0 amd64 2.62.5-1 [1320 kB]
Get:80 http://debian.oregonstate.edu/debian unstable/main amd64 libcroco3 amd64 0.6.13-1 [146 kB]
Get:81 http://debian.oregonstate.edu/debian unstable/main amd64 gettext amd64 0.19.8.1-10 [1303 kB]
Get:82 http://debian.oregonstate.edu/debian unstable/main amd64 intltool-debian all 0.35.0+20060710.5 [26.8 kB]
Get:83 http://debian.oregonstate.edu/debian unstable/main amd64 po-debconf all 1.0.21 [248 kB]
Get:84 http://debian.oregonstate.edu/debian unstable/main amd64 debhelper all 12.9 [994 kB]
Get:85 http://debian.oregonstate.edu/debian unstable/main amd64 libxau6 amd64 1:1.0.8-1+b2 [19.9 kB]
Get:86 http://debian.oregonstate.edu/debian unstable/main amd64 libxdmcp6 amd64 1:1.1.2-3 [26.3 kB]
Get:87 http://debian.oregonstate.edu/debian unstable/main amd64 libxcb1 amd64 1.13.1-5 [137 kB]
Get:88 http://debian.oregonstate.edu/debian unstable/main amd64 libx11-data all 2:1.6.9-2 [298 kB]
Get:89 http://debian.oregonstate.edu/debian unstable/main amd64 libx11-6 amd64 2:1.6.9-2 [759 kB]
Get:90 http://debian.oregonstate.edu/debian unstable/main amd64 libxext6 amd64 2:1.3.3-1+b2 [52.5 kB]
Get:91 http://debian.oregonstate.edu/debian unstable/main amd64 x11-common all 1:7.7+20 [252 kB]
Get:92 http://debian.oregonstate.edu/debian unstable/main amd64 libice6 amd64 2:1.0.9-2 [58.7 kB]
Get:93 http://debian.oregonstate.edu/debian unstable/main amd64 libsm6 amd64 2:1.2.3-1 [35.1 kB]
Get:94 http://debian.oregonstate.edu/debian unstable/main amd64 libxt6 amd64 1:1.1.5-1+b3 [190 kB]
Get:95 http://debian.oregonstate.edu/debian unstable/main amd64 libxmu6 amd64 2:1.1.2-2+b3 [60.8 kB]
Get:96 http://debian.oregonstate.edu/debian unstable/main amd64 libxpm4 amd64 1:3.5.12-1 [49.1 kB]
Get:97 http://debian.oregonstate.edu/debian unstable/main amd64 libxaw7 amd64 2:1.0.13-1+b2 [201 kB]
Get:98 http://debian.oregonstate.edu/debian unstable/main amd64 groff amd64 1.22.4-4 [3973 kB]
Get:99 http://debian.oregonstate.edu/debian unstable/main arm64 libgcc-s1 arm64 10-20200304-1 [34.7 kB]
Get:100 http://debian.oregonstate.edu/debian unstable/main arm64 libcrypt1 arm64 1:4.4.15-1 [90.4 kB]
Get:101 http://debian.oregonstate.edu/debian unstable/main arm64 libc6 arm64 2.29-10 [2456 kB]
Get:102 http://debian.oregonstate.edu/debian unstable/main arm64 libasan5 arm64 9.2.1-31 [354 kB]
Get:103 http://debian.oregonstate.edu/debian unstable/main arm64 libatomic1 arm64 10-20200304-1 [8856 B]
Get:104 http://debian.oregonstate.edu/debian unstable/main arm64 linux-libc-dev arm64 5.4.19-1 [1055 kB]
Get:105 http://debian.oregonstate.edu/debian unstable/main arm64 libcrypt-dev arm64 1:4.4.15-1 [111 kB]
Get:106 http://debian.oregonstate.edu/debian unstable/main arm64 libc6-dev arm64 2.29-10 [2274 kB]
Get:107 http://debian.oregonstate.edu/debian unstable/main arm64 libgomp1 arm64 10-20200304-1 [90.0 kB]
Get:108 http://debian.oregonstate.edu/debian unstable/main arm64 libitm1 arm64 10-20200304-1 [23.4 kB]
Get:109 http://debian.oregonstate.edu/debian unstable/main arm64 liblsan0 arm64 10-20200304-1 [126 kB]
Get:110 http://debian.oregonstate.edu/debian unstable/main arm64 libtsan0 arm64 10-20200304-1 [292 kB]
Get:111 http://debian.oregonstate.edu/debian unstable/main arm64 libstdc++6 arm64 10-20200304-1 [448 kB]
Get:112 http://debian.oregonstate.edu/debian unstable/main arm64 libubsan1 arm64 10-20200304-1 [123 kB]
Get:113 http://debian.oregonstate.edu/debian unstable/main arm64 libgcc-9-dev arm64 9.2.1-31 [887 kB]
Get:114 http://debian.oregonstate.edu/debian unstable/main arm64 libtinfo6 arm64 6.2-1 [329 kB]
Get:115 http://debian.oregonstate.edu/debian unstable/main arm64 libncurses6 arm64 6.2-1 [92.6 kB]
Get:116 http://debian.oregonstate.edu/debian unstable/main arm64 libncursesw6 arm64 6.2-1 [121 kB]
Get:117 http://debian.oregonstate.edu/debian unstable/main arm64 libncurses-dev arm64 6.2-1 [327 kB]
Get:118 http://debian.oregonstate.edu/debian unstable/main arm64 libstdc++-9-dev arm64 9.2.1-31 [1658 kB]
Get:119 http://debian.oregonstate.edu/debian unstable/main amd64 time amd64 1.7-25.1+b1 [31.6 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 85.4 MB in 1s (111 MB/s)
Selecting previously unselected package libbsd0:amd64.
(Reading database ... 12835 files and directories currently installed.)
Preparing to unpack .../000-libbsd0_0.10.0-1_amd64.deb ...
Unpacking libbsd0:amd64 (0.10.0-1) ...
Selecting previously unselected package bsdmainutils.
Preparing to unpack .../001-bsdmainutils_11.1.2+b1_amd64.deb ...
Unpacking bsdmainutils (11.1.2+b1) ...
Selecting previously unselected package libuchardet0:amd64.
Preparing to unpack .../002-libuchardet0_0.0.6-3_amd64.deb ...
Unpacking libuchardet0:amd64 (0.0.6-3) ...
Selecting previously unselected package groff-base.
Preparing to unpack .../003-groff-base_1.22.4-4_amd64.deb ...
Unpacking groff-base (1.22.4-4) ...
Selecting previously unselected package libpipeline1:amd64.
Preparing to unpack .../004-libpipeline1_1.5.2-2_amd64.deb ...
Unpacking libpipeline1:amd64 (1.5.2-2) ...
Selecting previously unselected package man-db.
Preparing to unpack .../005-man-db_2.9.1-1_amd64.deb ...
Unpacking man-db (2.9.1-1) ...
Selecting previously unselected package liblocale-gettext-perl.
Preparing to unpack .../006-liblocale-gettext-perl_1.07-4_amd64.deb ...
Unpacking liblocale-gettext-perl (1.07-4) ...
Selecting previously unselected package gcc-10-base:arm64.
Preparing to unpack .../007-gcc-10-base_10-20200304-1_arm64.deb ...
Unpacking gcc-10-base:arm64 (10-20200304-1) ...
Selecting previously unselected package gcc-9-base:arm64.
Preparing to unpack .../008-gcc-9-base_9.2.1-31_arm64.deb ...
Unpacking gcc-9-base:arm64 (9.2.1-31) ...
Selecting previously unselected package sensible-utils.
Preparing to unpack .../009-sensible-utils_0.0.12+nmu1_all.deb ...
Unpacking sensible-utils (0.0.12+nmu1) ...
Selecting previously unselected package libmagic-mgc.
Preparing to unpack .../010-libmagic-mgc_1%3a5.38-4_amd64.deb ...
Unpacking libmagic-mgc (1:5.38-4) ...
Selecting previously unselected package libmagic1:amd64.
Preparing to unpack .../011-libmagic1_1%3a5.38-4_amd64.deb ...
Unpacking libmagic1:amd64 (1:5.38-4) ...
Selecting previously unselected package file.
Preparing to unpack .../012-file_1%3a5.38-4_amd64.deb ...
Unpacking file (1:5.38-4) ...
Selecting previously unselected package gettext-base.
Preparing to unpack .../013-gettext-base_0.19.8.1-10_amd64.deb ...
Unpacking gettext-base (0.19.8.1-10) ...
Selecting previously unselected package ucf.
Preparing to unpack .../014-ucf_3.0038+nmu1_all.deb ...
Moving old data out of the way
Unpacking ucf (3.0038+nmu1) ...
Selecting previously unselected package libsigsegv2:amd64.
Preparing to unpack .../015-libsigsegv2_2.12-2_amd64.deb ...
Unpacking libsigsegv2:amd64 (2.12-2) ...
Selecting previously unselected package m4.
Preparing to unpack .../016-m4_1.4.18-4_amd64.deb ...
Unpacking m4 (1.4.18-4) ...
Selecting previously unselected package autoconf.
Preparing to unpack .../017-autoconf_2.69-11.1_all.deb ...
Unpacking autoconf (2.69-11.1) ...
Selecting previously unselected package autotools-dev.
Preparing to unpack .../018-autotools-dev_20180224.1_all.deb ...
Unpacking autotools-dev (20180224.1) ...
Selecting previously unselected package automake.
Preparing to unpack .../019-automake_1%3a1.16.1-4_all.deb ...
Unpacking automake (1:1.16.1-4) ...
Selecting previously unselected package autopoint.
Preparing to unpack .../020-autopoint_0.19.8.1-10_all.deb ...
Unpacking autopoint (0.19.8.1-10) ...
Selecting previously unselected package binutils-aarch64-linux-gnu.
Preparing to unpack .../021-binutils-aarch64-linux-gnu_2.34-4_amd64.deb ...
Unpacking binutils-aarch64-linux-gnu (2.34-4) ...
Selecting previously unselected package libcrypt-dev:amd64.
Preparing to unpack .../022-libcrypt-dev_1%3a4.4.15-1_amd64.deb ...
Unpacking libcrypt-dev:amd64 (1:4.4.15-1) ...
Selecting previously unselected package libc6-dev:amd64.
Preparing to unpack .../023-libc6-dev_2.29-10_amd64.deb ...
Unpacking libc6-dev:amd64 (2.29-10) ...
Selecting previously unselected package libstdc++-9-dev:amd64.
Preparing to unpack .../024-libstdc++-9-dev_9.2.1-31_amd64.deb ...
Unpacking libstdc++-9-dev:amd64 (9.2.1-31) ...
Selecting previously unselected package g++-9.
Preparing to unpack .../025-g++-9_9.2.1-31_amd64.deb ...
Unpacking g++-9 (9.2.1-31) ...
Selecting previously unselected package g++.
Preparing to unpack .../026-g++_4%3a9.2.1-3.1_amd64.deb ...
Unpacking g++ (4:9.2.1-3.1) ...
Selecting previously unselected package build-essential.
Preparing to unpack .../027-build-essential_12.8_amd64.deb ...
Unpacking build-essential (12.8) ...
Selecting previously unselected package gcc-9-aarch64-linux-gnu-base:amd64.
Preparing to unpack .../028-gcc-9-aarch64-linux-gnu-base_9.2.1-28cross1_amd64.deb ...
Unpacking gcc-9-aarch64-linux-gnu-base:amd64 (9.2.1-28cross1) ...
Selecting previously unselected package cpp-9-aarch64-linux-gnu.
Preparing to unpack .../029-cpp-9-aarch64-linux-gnu_9.2.1-28cross1_amd64.deb ...
Unpacking cpp-9-aarch64-linux-gnu (9.2.1-28cross1) ...
Selecting previously unselected package cpp-aarch64-linux-gnu.
Preparing to unpack .../030-cpp-aarch64-linux-gnu_4%3a9.2.1-3.1_amd64.deb ...
Unpacking cpp-aarch64-linux-gnu (4:9.2.1-3.1) ...
Selecting previously unselected package cross-config.
Preparing to unpack .../031-cross-config_2.6.15-3_all.deb ...
Unpacking cross-config (2.6.15-3) ...
Selecting previously unselected package gcc-9-cross-base.
Preparing to unpack .../032-gcc-9-cross-base_9.2.1-28cross1_all.deb ...
Unpacking gcc-9-cross-base (9.2.1-28cross1) ...
Selecting previously unselected package gcc-10-cross-base.
Preparing to unpack .../033-gcc-10-cross-base_10-20200211-1cross1_all.deb ...
Unpacking gcc-10-cross-base (10-20200211-1cross1) ...
Selecting previously unselected package libc6-arm64-cross.
Preparing to unpack .../034-libc6-arm64-cross_2.29-9cross1_all.deb ...
Unpacking libc6-arm64-cross (2.29-9cross1) ...
Selecting previously unselected package libgcc-s1-arm64-cross.
Preparing to unpack .../035-libgcc-s1-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking libgcc-s1-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libgomp1-arm64-cross.
Preparing to unpack .../036-libgomp1-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking libgomp1-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libitm1-arm64-cross.
Preparing to unpack .../037-libitm1-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking libitm1-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libatomic1-arm64-cross.
Preparing to unpack .../038-libatomic1-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking libatomic1-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libasan5-arm64-cross.
Preparing to unpack .../039-libasan5-arm64-cross_9.2.1-28cross1_all.deb ...
Unpacking libasan5-arm64-cross (9.2.1-28cross1) ...
Selecting previously unselected package liblsan0-arm64-cross.
Preparing to unpack .../040-liblsan0-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking liblsan0-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libtsan0-arm64-cross.
Preparing to unpack .../041-libtsan0-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking libtsan0-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libstdc++6-arm64-cross.
Preparing to unpack .../042-libstdc++6-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking libstdc++6-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libubsan1-arm64-cross.
Preparing to unpack .../043-libubsan1-arm64-cross_10-20200211-1cross1_all.deb ...
Unpacking libubsan1-arm64-cross (10-20200211-1cross1) ...
Selecting previously unselected package libgcc-9-dev-arm64-cross.
Preparing to unpack .../044-libgcc-9-dev-arm64-cross_9.2.1-28cross1_all.deb ...
Unpacking libgcc-9-dev-arm64-cross (9.2.1-28cross1) ...
Selecting previously unselected package gcc-9-aarch64-linux-gnu.
Preparing to unpack .../045-gcc-9-aarch64-linux-gnu_9.2.1-28cross1_amd64.deb ...
Unpacking gcc-9-aarch64-linux-gnu (9.2.1-28cross1) ...
Selecting previously unselected package gcc-aarch64-linux-gnu.
Preparing to unpack .../046-gcc-aarch64-linux-gnu_4%3a9.2.1-3.1_amd64.deb ...
Unpacking gcc-aarch64-linux-gnu (4:9.2.1-3.1) ...
Selecting previously unselected package linux-libc-dev-arm64-cross.
Preparing to unpack .../047-linux-libc-dev-arm64-cross_5.4.8-1cross1_all.deb ...
Unpacking linux-libc-dev-arm64-cross (5.4.8-1cross1) ...
Selecting previously unselected package libc6-dev-arm64-cross.
Preparing to unpack .../048-libc6-dev-arm64-cross_2.29-9cross1_all.deb ...
Unpacking libc6-dev-arm64-cross (2.29-9cross1) ...
Selecting previously unselected package libstdc++-9-dev-arm64-cross.
Preparing to unpack .../049-libstdc++-9-dev-arm64-cross_9.2.1-28cross1_all.deb ...
Unpacking libstdc++-9-dev-arm64-cross (9.2.1-28cross1) ...
Selecting previously unselected package g++-9-aarch64-linux-gnu.
Preparing to unpack .../050-g++-9-aarch64-linux-gnu_9.2.1-28cross1_amd64.deb ...
Unpacking g++-9-aarch64-linux-gnu (9.2.1-28cross1) ...
Selecting previously unselected package g++-aarch64-linux-gnu.
Preparing to unpack .../051-g++-aarch64-linux-gnu_4%3a9.2.1-3.1_amd64.deb ...
Unpacking g++-aarch64-linux-gnu (4:9.2.1-3.1) ...
Selecting previously unselected package libconfig-inifiles-perl.
Preparing to unpack .../052-libconfig-inifiles-perl_3.000002-1_all.deb ...
Unpacking libconfig-inifiles-perl (3.000002-1) ...
Selecting previously unselected package libio-string-perl.
Preparing to unpack .../053-libio-string-perl_1.08-3_all.deb ...
Unpacking libio-string-perl (1.08-3) ...
Selecting previously unselected package libicu63:amd64.
Preparing to unpack .../054-libicu63_63.2-2_amd64.deb ...
Unpacking libicu63:amd64 (63.2-2) ...
Selecting previously unselected package libxml2:amd64.
Preparing to unpack .../055-libxml2_2.9.10+dfsg-4_amd64.deb ...
Unpacking libxml2:amd64 (2.9.10+dfsg-4) ...
Selecting previously unselected package libxml-namespacesupport-perl.
Preparing to unpack .../056-libxml-namespacesupport-perl_1.12-1_all.deb ...
Unpacking libxml-namespacesupport-perl (1.12-1) ...
Selecting previously unselected package libxml-sax-base-perl.
Preparing to unpack .../057-libxml-sax-base-perl_1.09-1_all.deb ...
Unpacking libxml-sax-base-perl (1.09-1) ...
Selecting previously unselected package libxml-sax-perl.
Preparing to unpack .../058-libxml-sax-perl_1.02+dfsg-1_all.deb ...
Unpacking libxml-sax-perl (1.02+dfsg-1) ...
Selecting previously unselected package libxml-libxml-perl.
Preparing to unpack .../059-libxml-libxml-perl_2.0134+dfsg-2_amd64.deb ...
Unpacking libxml-libxml-perl (2.0134+dfsg-2) ...
Selecting previously unselected package libxml-simple-perl.
Preparing to unpack .../060-libxml-simple-perl_2.25-1_all.deb ...
Unpacking libxml-simple-perl (2.25-1) ...
Selecting previously unselected package libyaml-perl.
Preparing to unpack .../061-libyaml-perl_1.30-1_all.deb ...
Unpacking libyaml-perl (1.30-1) ...
Selecting previously unselected package libconfig-auto-perl.
Preparing to unpack .../062-libconfig-auto-perl_0.44-1_all.deb ...
Unpacking libconfig-auto-perl (0.44-1) ...
Selecting previously unselected package libfile-which-perl.
Preparing to unpack .../063-libfile-which-perl_1.23-1_all.deb ...
Unpacking libfile-which-perl (1.23-1) ...
Selecting previously unselected package libfile-homedir-perl.
Preparing to unpack .../064-libfile-homedir-perl_1.004-1_all.deb ...
Unpacking libfile-homedir-perl (1.004-1) ...
Selecting previously unselected package libdebian-dpkgcross-perl.
Preparing to unpack .../065-libdebian-dpkgcross-perl_2.6.15-3_all.deb ...
Unpacking libdebian-dpkgcross-perl (2.6.15-3) ...
Selecting previously unselected package dpkg-cross.
Preparing to unpack .../066-dpkg-cross_2.6.15-3_all.deb ...
Unpacking dpkg-cross (2.6.15-3) ...
Selecting previously unselected package crossbuild-essential-arm64.
Preparing to unpack .../067-crossbuild-essential-arm64_12.8_all.deb ...
Unpacking crossbuild-essential-arm64 (12.8) ...
Selecting previously unselected package libtool.
Preparing to unpack .../068-libtool_2.4.6-14_all.deb ...
Unpacking libtool (2.4.6-14) ...
Selecting previously unselected package dh-autoreconf.
Preparing to unpack .../069-dh-autoreconf_19_all.deb ...
Unpacking dh-autoreconf (19) ...
Selecting previously unselected package libdebhelper-perl.
Preparing to unpack .../070-libdebhelper-perl_12.9_all.deb ...
Unpacking libdebhelper-perl (12.9) ...
Selecting previously unselected package libarchive-zip-perl.
Preparing to unpack .../071-libarchive-zip-perl_1.67-2_all.deb ...
Unpacking libarchive-zip-perl (1.67-2) ...
Selecting previously unselected package libsub-override-perl.
Preparing to unpack .../072-libsub-override-perl_0.09-2_all.deb ...
Unpacking libsub-override-perl (0.09-2) ...
Selecting previously unselected package libfile-stripnondeterminism-perl.
Preparing to unpack .../073-libfile-stripnondeterminism-perl_1.6.3-2_all.deb ...
Unpacking libfile-stripnondeterminism-perl (1.6.3-2) ...
Selecting previously unselected package dh-strip-nondeterminism.
Preparing to unpack .../074-dh-strip-nondeterminism_1.6.3-2_all.deb ...
Unpacking dh-strip-nondeterminism (1.6.3-2) ...
Selecting previously unselected package libelf1:amd64.
Preparing to unpack .../075-libelf1_0.176-1.1_amd64.deb ...
Unpacking libelf1:amd64 (0.176-1.1) ...
Selecting previously unselected package dwz.
Preparing to unpack .../076-dwz_0.13-5_amd64.deb ...
Unpacking dwz (0.13-5) ...
Selecting previously unselected package libglib2.0-0:amd64.
Preparing to unpack .../077-libglib2.0-0_2.62.5-1_amd64.deb ...
Unpacking libglib2.0-0:amd64 (2.62.5-1) ...
Selecting previously unselected package libcroco3:amd64.
Preparing to unpack .../078-libcroco3_0.6.13-1_amd64.deb ...
Unpacking libcroco3:amd64 (0.6.13-1) ...
Selecting previously unselected package gettext.
Preparing to unpack .../079-gettext_0.19.8.1-10_amd64.deb ...
Unpacking gettext (0.19.8.1-10) ...
Selecting previously unselected package intltool-debian.
Preparing to unpack .../080-intltool-debian_0.35.0+20060710.5_all.deb ...
Unpacking intltool-debian (0.35.0+20060710.5) ...
Selecting previously unselected package po-debconf.
Preparing to unpack .../081-po-debconf_1.0.21_all.deb ...
Unpacking po-debconf (1.0.21) ...
Selecting previously unselected package debhelper.
Preparing to unpack .../082-debhelper_12.9_all.deb ...
Unpacking debhelper (12.9) ...
Selecting previously unselected package libxau6:amd64.
Preparing to unpack .../083-libxau6_1%3a1.0.8-1+b2_amd64.deb ...
Unpacking libxau6:amd64 (1:1.0.8-1+b2) ...
Selecting previously unselected package libxdmcp6:amd64.
Preparing to unpack .../084-libxdmcp6_1%3a1.1.2-3_amd64.deb ...
Unpacking libxdmcp6:amd64 (1:1.1.2-3) ...
Selecting previously unselected package libxcb1:amd64.
Preparing to unpack .../085-libxcb1_1.13.1-5_amd64.deb ...
Unpacking libxcb1:amd64 (1.13.1-5) ...
Selecting previously unselected package libx11-data.
Preparing to unpack .../086-libx11-data_2%3a1.6.9-2_all.deb ...
Unpacking libx11-data (2:1.6.9-2) ...
Selecting previously unselected package libx11-6:amd64.
Preparing to unpack .../087-libx11-6_2%3a1.6.9-2_amd64.deb ...
Unpacking libx11-6:amd64 (2:1.6.9-2) ... Selecting previously unselected package libxext6:amd64. Preparing to unpack .../088-libxext6_2%3a1.3.3-1+b2_amd64.deb ... Unpacking libxext6:amd64 (2:1.3.3-1+b2) ... Selecting previously unselected package x11-common. Preparing to unpack .../089-x11-common_1%3a7.7+20_all.deb ... Unpacking x11-common (1:7.7+20) ... Selecting previously unselected package libice6:amd64. Preparing to unpack .../090-libice6_2%3a1.0.9-2_amd64.deb ... Unpacking libice6:amd64 (2:1.0.9-2) ... Selecting previously unselected package libsm6:amd64. Preparing to unpack .../091-libsm6_2%3a1.2.3-1_amd64.deb ... Unpacking libsm6:amd64 (2:1.2.3-1) ... Selecting previously unselected package libxt6:amd64. Preparing to unpack .../092-libxt6_1%3a1.1.5-1+b3_amd64.deb ... Unpacking libxt6:amd64 (1:1.1.5-1+b3) ... Selecting previously unselected package libxmu6:amd64. Preparing to unpack .../093-libxmu6_2%3a1.1.2-2+b3_amd64.deb ... Unpacking libxmu6:amd64 (2:1.1.2-2+b3) ... Selecting previously unselected package libxpm4:amd64. Preparing to unpack .../094-libxpm4_1%3a3.5.12-1_amd64.deb ... Unpacking libxpm4:amd64 (1:3.5.12-1) ... Selecting previously unselected package libxaw7:amd64. Preparing to unpack .../095-libxaw7_2%3a1.0.13-1+b2_amd64.deb ... Unpacking libxaw7:amd64 (2:1.0.13-1+b2) ... Selecting previously unselected package groff. Preparing to unpack .../096-groff_1.22.4-4_amd64.deb ... Unpacking groff (1.22.4-4) ... Selecting previously unselected package libgcc-s1:arm64. Preparing to unpack .../097-libgcc-s1_10-20200304-1_arm64.deb ... Unpacking libgcc-s1:arm64 (10-20200304-1) ... Selecting previously unselected package libcrypt1:arm64. Preparing to unpack .../098-libcrypt1_1%3a4.4.15-1_arm64.deb ... Unpacking libcrypt1:arm64 (1:4.4.15-1) ... Selecting previously unselected package libc6:arm64. Preparing to unpack .../099-libc6_2.29-10_arm64.deb ... Unpacking libc6:arm64 (2.29-10) ... Selecting previously unselected package libasan5:arm64. 
Preparing to unpack .../100-libasan5_9.2.1-31_arm64.deb ... Unpacking libasan5:arm64 (9.2.1-31) ... Selecting previously unselected package libatomic1:arm64. Preparing to unpack .../101-libatomic1_10-20200304-1_arm64.deb ... Unpacking libatomic1:arm64 (10-20200304-1) ... Selecting previously unselected package linux-libc-dev:arm64. Preparing to unpack .../102-linux-libc-dev_5.4.19-1_arm64.deb ... Unpacking linux-libc-dev:arm64 (5.4.19-1) ... Selecting previously unselected package libcrypt-dev:arm64. Preparing to unpack .../103-libcrypt-dev_1%3a4.4.15-1_arm64.deb ... Unpacking libcrypt-dev:arm64 (1:4.4.15-1) ... Selecting previously unselected package libc6-dev:arm64. Preparing to unpack .../104-libc6-dev_2.29-10_arm64.deb ... Unpacking libc6-dev:arm64 (2.29-10) ... Selecting previously unselected package libgomp1:arm64. Preparing to unpack .../105-libgomp1_10-20200304-1_arm64.deb ... Unpacking libgomp1:arm64 (10-20200304-1) ... Selecting previously unselected package libitm1:arm64. Preparing to unpack .../106-libitm1_10-20200304-1_arm64.deb ... Unpacking libitm1:arm64 (10-20200304-1) ... Selecting previously unselected package liblsan0:arm64. Preparing to unpack .../107-liblsan0_10-20200304-1_arm64.deb ... Unpacking liblsan0:arm64 (10-20200304-1) ... Selecting previously unselected package libtsan0:arm64. Preparing to unpack .../108-libtsan0_10-20200304-1_arm64.deb ... Unpacking libtsan0:arm64 (10-20200304-1) ... Selecting previously unselected package libstdc++6:arm64. Preparing to unpack .../109-libstdc++6_10-20200304-1_arm64.deb ... Unpacking libstdc++6:arm64 (10-20200304-1) ... Selecting previously unselected package libubsan1:arm64. Preparing to unpack .../110-libubsan1_10-20200304-1_arm64.deb ... Unpacking libubsan1:arm64 (10-20200304-1) ... Selecting previously unselected package libgcc-9-dev:arm64. Preparing to unpack .../111-libgcc-9-dev_9.2.1-31_arm64.deb ... Unpacking libgcc-9-dev:arm64 (9.2.1-31) ... 
Selecting previously unselected package libtinfo6:arm64. Preparing to unpack .../112-libtinfo6_6.2-1_arm64.deb ... Unpacking libtinfo6:arm64 (6.2-1) ... Selecting previously unselected package libncurses6:arm64. Preparing to unpack .../113-libncurses6_6.2-1_arm64.deb ... Unpacking libncurses6:arm64 (6.2-1) ... Selecting previously unselected package libncursesw6:arm64. Preparing to unpack .../114-libncursesw6_6.2-1_arm64.deb ... Unpacking libncursesw6:arm64 (6.2-1) ... Selecting previously unselected package libncurses-dev:arm64. Preparing to unpack .../115-libncurses-dev_6.2-1_arm64.deb ... Unpacking libncurses-dev:arm64 (6.2-1) ... Selecting previously unselected package libstdc++-9-dev:arm64. Preparing to unpack .../116-libstdc++-9-dev_9.2.1-31_arm64.deb ... Unpacking libstdc++-9-dev:arm64 (9.2.1-31) ... Selecting previously unselected package time. Preparing to unpack .../117-time_1.7-25.1+b1_amd64.deb ... Unpacking time (1.7-25.1+b1) ... Selecting previously unselected package sbuild-build-depends-main-dummy:arm64. Preparing to unpack .../118-sbuild-build-depends-main-dummy_0.invalid.0_arm64.deb ... Unpacking sbuild-build-depends-main-dummy:arm64 (0.invalid.0) ... Setting up libconfig-inifiles-perl (3.000002-1) ... Setting up libpipeline1:amd64 (1.5.2-2) ... Setting up libfile-which-perl (1.23-1) ... Setting up libxau6:amd64 (1:1.0.8-1+b2) ... Setting up time (1.7-25.1+b1) ... Setting up libmagic-mgc (1:5.38-4) ... Setting up libarchive-zip-perl (1.67-2) ... Setting up libglib2.0-0:amd64 (2.62.5-1) ... No schema files found: doing nothing. Setting up gcc-9-aarch64-linux-gnu-base:amd64 (9.2.1-28cross1) ... Setting up libdebhelper-perl (12.9) ... Setting up x11-common (1:7.7+20) ... update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults invoke-rc.d: could not determine current runlevel All runlevel operations denied by policy invoke-rc.d: policy-rc.d denied execution of start. Setting up libmagic1:amd64 (1:5.38-4) ... 
Setting up linux-libc-dev:arm64 (5.4.19-1) ... Setting up libxml-namespacesupport-perl (1.12-1) ... Setting up gettext-base (0.19.8.1-10) ... Setting up binutils-aarch64-linux-gnu (2.34-4) ... Setting up cpp-9-aarch64-linux-gnu (9.2.1-28cross1) ... Setting up file (1:5.38-4) ... Setting up libyaml-perl (1.30-1) ... Setting up libicu63:amd64 (63.2-2) ... Setting up libxml-sax-base-perl (1.09-1) ... Setting up libio-string-perl (1.08-3) ... Setting up gcc-10-base:arm64 (10-20200304-1) ... Setting up cpp-aarch64-linux-gnu (4:9.2.1-3.1) ... Setting up autotools-dev (20180224.1) ... Setting up cross-config (2.6.15-3) ... Setting up libx11-data (2:1.6.9-2) ... Setting up libsigsegv2:amd64 (2.12-2) ... Setting up libc6-arm64-cross (2.29-9cross1) ... Setting up autopoint (0.19.8.1-10) ... Setting up gcc-9-cross-base (9.2.1-28cross1) ... Setting up gcc-10-cross-base (10-20200211-1cross1) ... Setting up linux-libc-dev-arm64-cross (5.4.8-1cross1) ... Setting up sensible-utils (0.0.12+nmu1) ... Setting up libcrypt-dev:amd64 (1:4.4.15-1) ... Setting up libuchardet0:amd64 (0.0.6-3) ... Setting up libsub-override-perl (0.09-2) ... Setting up libc6-dev:amd64 (2.29-10) ... Setting up libfile-homedir-perl (1.004-1) ... Setting up libbsd0:amd64 (0.10.0-1) ... Setting up libelf1:amd64 (0.176-1.1) ... Setting up libxml2:amd64 (2.9.10+dfsg-4) ... Setting up liblocale-gettext-perl (1.07-4) ... Setting up gcc-9-base:arm64 (9.2.1-31) ... Setting up libgcc-s1-arm64-cross (10-20200211-1cross1) ... Setting up libatomic1-arm64-cross (10-20200211-1cross1) ... Setting up libfile-stripnondeterminism-perl (1.6.3-2) ... Setting up liblsan0-arm64-cross (10-20200211-1cross1) ... Setting up libice6:amd64 (2:1.0.9-2) ... Setting up libgomp1-arm64-cross (10-20200211-1cross1) ... Setting up libxdmcp6:amd64 (1:1.1.2-3) ... Setting up libxcb1:amd64 (1.13.1-5) ... Setting up libstdc++-9-dev:amd64 (9.2.1-31) ... Setting up libtool (2.4.6-14) ... Setting up libtsan0-arm64-cross (10-20200211-1cross1) ... 
Setting up m4 (1.4.18-4) ... Setting up libc6-dev-arm64-cross (2.29-9cross1) ... Setting up libasan5-arm64-cross (9.2.1-28cross1) ... Setting up libstdc++6-arm64-cross (10-20200211-1cross1) ... Setting up bsdmainutils (11.1.2+b1) ... update-alternatives: using /usr/bin/bsd-write to provide /usr/bin/write (write) in auto mode update-alternatives: using /usr/bin/bsd-from to provide /usr/bin/from (from) in auto mode Setting up libcroco3:amd64 (0.6.13-1) ... Setting up libitm1-arm64-cross (10-20200211-1cross1) ... Setting up ucf (3.0038+nmu1) ... Setting up g++-9 (9.2.1-31) ... Setting up autoconf (2.69-11.1) ... Setting up dh-strip-nondeterminism (1.6.3-2) ... Setting up g++ (4:9.2.1-3.1) ... update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode Setting up dwz (0.13-5) ... Setting up groff-base (1.22.4-4) ... Setting up build-essential (12.8) ... Setting up libx11-6:amd64 (2:1.6.9-2) ... Setting up libsm6:amd64 (2:1.2.3-1) ... Setting up automake (1:1.16.1-4) ... update-alternatives: using /usr/bin/automake-1.16 to provide /usr/bin/automake (automake) in auto mode Setting up gettext (0.19.8.1-10) ... Setting up libxpm4:amd64 (1:3.5.12-1) ... Setting up libubsan1-arm64-cross (10-20200211-1cross1) ... Setting up libxext6:amd64 (2:1.3.3-1+b2) ... Setting up libgcc-9-dev-arm64-cross (9.2.1-28cross1) ... Setting up man-db (2.9.1-1) ... Not building database; man-db/auto-update is not 'true'. Setting up libxml-sax-perl (1.02+dfsg-1) ... update-perl-sax-parsers: Registering Perl SAX parser XML::SAX::PurePerl with priority 10... update-perl-sax-parsers: Updating overall Perl SAX parser modules info file... Creating config file /etc/perl/XML/SAX/ParserDetails.ini with new version Setting up intltool-debian (0.35.0+20060710.5) ... Setting up gcc-9-aarch64-linux-gnu (9.2.1-28cross1) ... Setting up libxt6:amd64 (1:1.1.5-1+b3) ... Setting up libxml-libxml-perl (2.0134+dfsg-2) ... 
update-perl-sax-parsers: Registering Perl SAX parser XML::LibXML::SAX::Parser with priority 50... update-perl-sax-parsers: Registering Perl SAX parser XML::LibXML::SAX with priority 50... update-perl-sax-parsers: Updating overall Perl SAX parser modules info file... Replacing config file /etc/perl/XML/SAX/ParserDetails.ini with new version Setting up libstdc++-9-dev-arm64-cross (9.2.1-28cross1) ... Setting up gcc-aarch64-linux-gnu (4:9.2.1-3.1) ... Setting up libxmu6:amd64 (2:1.1.2-2+b3) ... Setting up po-debconf (1.0.21) ... Setting up libxaw7:amd64 (2:1.0.13-1+b2) ... Setting up groff (1.22.4-4) ... Setting up g++-9-aarch64-linux-gnu (9.2.1-28cross1) ... Setting up libxml-simple-perl (2.25-1) ... Setting up g++-aarch64-linux-gnu (4:9.2.1-3.1) ... Setting up libconfig-auto-perl (0.44-1) ... Setting up libdebian-dpkgcross-perl (2.6.15-3) ... Setting up dpkg-cross (2.6.15-3) ... Setting up crossbuild-essential-arm64 (12.8) ... Setting up libcrypt1:arm64 (1:4.4.15-1) ... Setting up libgcc-s1:arm64 (10-20200304-1) ... Setting up dh-autoreconf (19) ... Setting up libc6:arm64 (2.29-10) ... Setting up libcrypt-dev:arm64 (1:4.4.15-1) ... Setting up libc6-dev:arm64 (2.29-10) ... Setting up libstdc++6:arm64 (10-20200304-1) ... Setting up liblsan0:arm64 (10-20200304-1) ... Setting up libitm1:arm64 (10-20200304-1) ... Setting up libtinfo6:arm64 (6.2-1) ... Setting up libtsan0:arm64 (10-20200304-1) ... Setting up debhelper (12.9) ... Setting up libgomp1:arm64 (10-20200304-1) ... Setting up libasan5:arm64 (9.2.1-31) ... Setting up libncurses6:arm64 (6.2-1) ... Setting up libatomic1:arm64 (10-20200304-1) ... Setting up libncursesw6:arm64 (6.2-1) ... Setting up libubsan1:arm64 (10-20200304-1) ... Setting up libncurses-dev:arm64 (6.2-1) ... Setting up libgcc-9-dev:arm64 (9.2.1-31) ... Setting up libstdc++-9-dev:arm64 (9.2.1-31) ... Setting up sbuild-build-depends-main-dummy:arm64 (0.invalid.0) ... Processing triggers for libc-bin (2.29-10) ... 
+------------------------------------------------------------------------------+ | Check architectures | +------------------------------------------------------------------------------+ Arch check ok (arm64 included in any) +------------------------------------------------------------------------------+ | Build environment | +------------------------------------------------------------------------------+ Kernel: Linux 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) amd64 (x86_64) Toolchain package versions: binutils_2.34-4 dpkg-dev_1.19.7 g++-9_9.2.1-31 gcc-8_8.4.0-1 gcc-9_9.2.1-31 libc6-dev_2.29-10 libstdc++-9-dev_9.2.1-31 libstdc++-9-dev-arm64-cross_9.2.1-28cross1 libstdc++6_10-20200304-1 libstdc++6-arm64-cross_10-20200211-1cross1 linux-libc-dev_5.4.19-1 Package versions: adduser_3.118 apt_2.0.0 autoconf_2.69-11.1 automake_1:1.16.1-4 autopoint_0.19.8.1-10 autotools-dev_20180224.1 base-files_11 base-passwd_3.5.47 bash_5.0-6 binutils_2.34-4 binutils-aarch64-linux-gnu_2.34-4 binutils-common_2.34-4 binutils-x86-64-linux-gnu_2.34-4 bsdmainutils_11.1.2+b1 bsdutils_1:2.34-0.1 build-essential_12.8 bzip2_1.0.8-2 coreutils_8.30-3+b1 cpp_4:9.2.1-3.1 cpp-8_8.4.0-1 cpp-9_9.2.1-31 cpp-9-aarch64-linux-gnu_9.2.1-28cross1 cpp-aarch64-linux-gnu_4:9.2.1-3.1 cross-config_2.6.15-3 crossbuild-essential-arm64_12.8 dash_0.5.10.2-6 debconf_1.5.73 debhelper_12.9 debian-archive-keyring_2019.1 debianutils_4.9.1 dh-autoreconf_19 dh-strip-nondeterminism_1.6.3-2 diffutils_1:3.7-3 dpkg_1.19.7 dpkg-cross_2.6.15-3 dpkg-dev_1.19.7 dwz_0.13-5 e2fsprogs_1.45.5-2 fakeroot_1.24-1 fdisk_2.34-0.1 file_1:5.38-4 findutils_4.7.0-1 g++_4:9.2.1-3.1 g++-9_9.2.1-31 g++-9-aarch64-linux-gnu_9.2.1-28cross1 g++-aarch64-linux-gnu_4:9.2.1-3.1 gcc_4:9.2.1-3.1 gcc-10-base_10-20200304-1 gcc-10-cross-base_10-20200211-1cross1 gcc-8_8.4.0-1 gcc-8-base_8.4.0-1 gcc-9_9.2.1-31 gcc-9-aarch64-linux-gnu_9.2.1-28cross1 gcc-9-aarch64-linux-gnu-base_9.2.1-28cross1 gcc-9-base_9.2.1-31 gcc-9-cross-base_9.2.1-28cross1 
gcc-aarch64-linux-gnu_4:9.2.1-3.1 gettext_0.19.8.1-10 gettext-base_0.19.8.1-10 gpgv_2.2.19-2 grep_3.4-1 groff_1.22.4-4 groff-base_1.22.4-4 gzip_1.10-1 hostname_3.23 init-system-helpers_1.57 intltool-debian_0.35.0+20060710.5 libacl1_2.2.53-6 libapt-pkg5.0_1.8.4 libapt-pkg6.0_2.0.0 libarchive-zip-perl_1.67-2 libasan5_9.2.1-31 libasan5-arm64-cross_9.2.1-28cross1 libatomic1_10-20200304-1 libatomic1-arm64-cross_10-20200211-1cross1 libattr1_1:2.4.48-5 libaudit-common_1:2.8.5-2 libaudit1_1:2.8.5-2+b1 libbinutils_2.34-4 libblkid1_2.34-0.1 libbsd0_0.10.0-1 libbz2-1.0_1.0.8-2 libc-bin_2.29-10 libc-dev-bin_2.29-10 libc6_2.29-10 libc6-arm64-cross_2.29-9cross1 libc6-dev_2.29-10 libc6-dev-arm64-cross_2.29-9cross1 libcap-ng0_0.7.9-2.1+b2 libcc1-0_10-20200304-1 libcom-err2_1.45.5-2 libconfig-auto-perl_0.44-1 libconfig-inifiles-perl_3.000002-1 libcroco3_0.6.13-1 libcrypt-dev_1:4.4.15-1 libcrypt1_1:4.4.15-1 libctf-nobfd0_2.34-4 libctf0_2.34-4 libdb5.3_5.3.28+dfsg1-0.6 libdebconfclient0_0.251 libdebhelper-perl_12.9 libdebian-dpkgcross-perl_2.6.15-3 libdpkg-perl_1.19.7 libelf1_0.176-1.1 libext2fs2_1.45.5-2 libfakeroot_1.24-1 libfdisk1_2.34-0.1 libffi6_3.2.1-9 libffi7_3.3-3 libfile-homedir-perl_1.004-1 libfile-stripnondeterminism-perl_1.6.3-2 libfile-which-perl_1.23-1 libgcc-8-dev_8.4.0-1 libgcc-9-dev_9.2.1-31 libgcc-9-dev-arm64-cross_9.2.1-28cross1 libgcc-s1_10-20200304-1 libgcc-s1-arm64-cross_10-20200211-1cross1 libgcc1_1:10-20200304-1 libgcrypt20_1.8.5-5 libgdbm-compat4_1.18.1-5 libgdbm6_1.18.1-5 libglib2.0-0_2.62.5-1 libgmp10_2:6.2.0+dfsg-4 libgnutls30_3.6.12-2 libgomp1_10-20200304-1 libgomp1-arm64-cross_10-20200211-1cross1 libgpg-error0_1.37-1 libhogweed4_3.5.1+really3.4.1-1 libhogweed5_3.5.1+really3.5.1-2 libice6_2:1.0.9-2 libicu63_63.2-2 libidn2-0_2.3.0-1 libio-string-perl_1.08-3 libisl19_0.20-2 libisl22_0.22.1-1 libitm1_10-20200304-1 libitm1-arm64-cross_10-20200211-1cross1 liblocale-gettext-perl_1.07-4 liblsan0_10-20200304-1 liblsan0-arm64-cross_10-20200211-1cross1 
liblz4-1_1.9.2-2 liblzma5_5.2.4-1+b1 libmagic-mgc_1:5.38-4 libmagic1_1:5.38-4 libmount1_2.34-0.1 libmpc3_1.1.0-1 libmpfr6_4.0.2-1 libmpx2_8.4.0-1 libncurses-dev_6.2-1 libncurses6_6.2-1 libncursesw6_6.2-1 libnettle6_3.5.1+really3.4.1-1 libnettle7_3.5.1+really3.5.1-2 libp11-kit0_0.23.20-1 libpam-modules_1.3.1-5 libpam-modules-bin_1.3.1-5 libpam-runtime_1.3.1-5 libpam0g_1.3.1-5 libpcre2-8-0_10.34-7 libpcre3_2:8.39-12+b1 libperl5.28_5.28.1-6 libperl5.30_5.30.0-9 libpipeline1_1.5.2-2 libquadmath0_10-20200304-1 libseccomp2_2.4.2-2 libselinux1_3.0-1+b1 libsemanage-common_3.0-1 libsemanage1_3.0-1+b1 libsepol1_3.0-1 libsigsegv2_2.12-2 libsm6_2:1.2.3-1 libsmartcols1_2.34-0.1 libss2_1.45.5-2 libstdc++-9-dev_9.2.1-31 libstdc++-9-dev-arm64-cross_9.2.1-28cross1 libstdc++6_10-20200304-1 libstdc++6-arm64-cross_10-20200211-1cross1 libsub-override-perl_0.09-2 libsystemd0_244.3-1 libtasn1-6_4.16.0-2 libtinfo6_6.2-1 libtool_2.4.6-14 libtsan0_10-20200304-1 libtsan0-arm64-cross_10-20200211-1cross1 libubsan1_10-20200304-1 libubsan1-arm64-cross_10-20200211-1cross1 libuchardet0_0.0.6-3 libudev1_244.3-1 libunistring2_0.9.10-2 libuuid1_2.34-0.1 libx11-6_2:1.6.9-2 libx11-data_2:1.6.9-2 libxau6_1:1.0.8-1+b2 libxaw7_2:1.0.13-1+b2 libxcb1_1.13.1-5 libxdmcp6_1:1.1.2-3 libxext6_2:1.3.3-1+b2 libxml-libxml-perl_2.0134+dfsg-2 libxml-namespacesupport-perl_1.12-1 libxml-sax-base-perl_1.09-1 libxml-sax-perl_1.02+dfsg-1 libxml-simple-perl_2.25-1 libxml2_2.9.10+dfsg-4 libxmu6_2:1.1.2-2+b3 libxpm4_1:3.5.12-1 libxt6_1:1.1.5-1+b3 libyaml-perl_1.30-1 libzstd1_1.4.4+dfsg-3 linux-libc-dev_5.4.19-1 linux-libc-dev-arm64-cross_5.4.8-1cross1 login_1:4.8.1-1 logsave_1.45.5-2 lsb-base_11.1.0 m4_1.4.18-4 make_4.2.1-1.2 man-db_2.9.1-1 mawk_1.3.4.20200120-2 mount_2.34-0.1 ncurses-base_6.2-1 ncurses-bin_6.2-1 passwd_1:4.8.1-1 patch_2.7.6-6 perl_5.30.0-9 perl-base_5.30.0-9 perl-modules-5.28_5.28.1-6 perl-modules-5.30_5.30.0-9 po-debconf_1.0.21 sbuild-build-depends-main-dummy_0.invalid.0 sed_4.7-1 
sensible-utils_0.0.12+nmu1 sysvinit-utils_2.96-2.1 tar_1.30+dfsg-6+b1 time_1.7-25.1+b1 tzdata_2019c-3 ucf_3.0038+nmu1 util-linux_2.34-0.1 x11-common_1:7.7+20 xz-utils_5.2.4-1+b1 zlib1g_1:1.2.11.dfsg-2 +------------------------------------------------------------------------------+ | Build | +------------------------------------------------------------------------------+ Unpack source ------------- -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Format: 3.0 (quilt) Source: wiggle Binary: wiggle Architecture: any Version: 1.1-1 Maintainer: Debian QA Group Homepage: https://neil.brown.name/wiggle Standards-Version: 4.3.0 Vcs-Browser: https://salsa.debian.org/debian/wiggle Vcs-Git: https://salsa.debian.org/debian/wiggle.git Build-Depends: debhelper (>= 12), groff, libncurses-dev, time Package-List: wiggle deb vcs optional arch=any Checksums-Sha1: e54338c2955677263c3075013099d0b9c573498f 846772 wiggle_1.1.orig.tar.gz 8156a321810f26130b35319230dea2c35577f225 6840 wiggle_1.1-1.debian.tar.xz Checksums-Sha256: 3da3cf6a456dd1415d2644e345f9831eb2912c6fa8dfa5d63d9bf49d744abff3 846772 wiggle_1.1.orig.tar.gz 49b9f5efc494917f6b71d3f4c6951a1c73894b696af69198518019b18e9e00bd 6840 wiggle_1.1-1.debian.tar.xz Files: 0a76d5ed008094da05ac15abe89c1641 846772 wiggle_1.1.orig.tar.gz ce8b3417d03e592b7abe3c58b38dc510 6840 wiggle_1.1-1.debian.tar.xz -----BEGIN PGP SIGNATURE----- iQJHBAEBCgAxFiEEhnHVzDbtdH7ktKj4SBLY3qgmEeYFAlxLjD0THGthY3Rpb25A ZGViaWFuLm9yZwAKCRBIEtjeqCYR5lsfD/9Flo7Heux8pKCw7gFQE98ebkJFL42C nJ9pxsUHM5yU1UyO0kfJNU4dJ/6RFc4rloqauams4vgaL2vxI31a6GRRfCm5PIVQ Q4an/1s7bUxMsp4WZACzEMvHjUeTABLeCzMekHop2JOzryGmErQqOsD1PZHq+E1A MnPbhpnbYrcN87jnOelqi+BnnWUpZcqmiunD8efdtq9tSlMXIfLEsnem8xASnfnE akcXxzjT1Hq2L2jaMmDZd4Xh1GT8i6i3q/ZRuK41SimaHMSVS8rFk9JVUQNNUDXH g6I3agFmIbS3UrQeQDrbgl76FBd8JPdyKRRpBg9+99iXzv5+thYOrRUE85KUdIAf muW8EUzRRIP7lyCS9BUxXfLP55tS09wuePkUcksdAb+WPZFfOIplnBOdfP7+rNGu A+SCimsWQTrISAmIUNySp7Ss91CKXQhAda69Hy4PPbkdeCpZD0XallvR/fZlLjHi 
1hGJ4zAld/S3WAKuphr9CQndUYXOpcdC+fQc1w7iZH4wr+5H9WKvtpeI67Z7K6rQ 9lTlKpJVuGmSqDtWmvE97ycvr/OzmzSciLzkAqPgeYwSBe+yUx5tu8f18zHPrbN7 w2EpYdBq0RaM7wyiaa+mP6aPed4w7Y6eb60hfi0VfwaXFdNIo6Hxi9ccDWR/jCup ZsUusQur/oMkkg== =Bu66 -----END PGP SIGNATURE----- gpgv: unknown type of key resource 'trustedkeys.kbx' gpgv: keyblock resource '/sbuild-nonexistent/.gnupg/trustedkeys.kbx': General error gpgv: Signature made Fri Jan 25 22:22:53 2019 UTC gpgv: using RSA key 8671D5CC36ED747EE4B4A8F84812D8DEA82611E6 gpgv: issuer "kaction@debian.org" gpgv: Can't check signature: No public key dpkg-source: warning: failed to verify signature on ./wiggle_1.1-1.dsc dpkg-source: info: extracting wiggle in /<> dpkg-source: info: unpacking wiggle_1.1.orig.tar.gz dpkg-source: info: unpacking wiggle_1.1-1.debian.tar.xz dpkg-source: info: using patch list from debian/patches/series dpkg-source: info: applying typos.patch dpkg-source: info: applying gcc8-format-security.patch Check disk space ---------------- Sufficient free space for build User Environment ---------------- APT_CONFIG=/var/lib/sbuild/apt.conf CONFIG_SITE=/etc/dpkg-cross/cross-config.arm64 DEB_BUILD_OPTIONS=nocheck HOME=/sbuild-nonexistent LANG=en_US.UTF-8 LC_ALL=C.UTF-8 LOGNAME=helmut PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games SCHROOT_ALIAS_NAME=unstable-amd64-sbuild SCHROOT_CHROOT_NAME=unstable-amd64-sbuild SCHROOT_COMMAND=env SCHROOT_GID=1003 SCHROOT_GROUP=helmut SCHROOT_SESSION_ID=unstable-amd64-sbuild-96a4a8fd-a1df-4f3a-bd07-21de1e0ec7dd SCHROOT_UID=1003 SCHROOT_USER=helmut SHELL=/bin/sh USER=helmut dpkg-buildpackage ----------------- Command: dpkg-buildpackage -aarm64 -Pcross,nocheck -us -uc -B -rfakeroot --jobs-try=1 dpkg-buildpackage: info: source package wiggle dpkg-buildpackage: info: source version 1.1-1 dpkg-buildpackage: info: source distribution unstable dpkg-buildpackage: info: source changed by Carlos Maddela dpkg-architecture: warning: specified GNU system type aarch64-linux-gnu does not 
match CC system type x86_64-linux-gnu, try setting a correct CC environment variable dpkg-source --before-build . dpkg-buildpackage: info: host architecture arm64 debian/rules clean dh clean dh_auto_clean make -j1 clean make[1]: Entering directory '/<>' rm -f *.o ccan/hash/*.o *.man wiggle .version* demo.patch version find . -name core -o -name '*.tmp*' -o -name .tmp -o -name .time | xargs rm -f make[1]: Leaving directory '/<>' dh_clean rm -f debian/debhelper-build-stamp rm -rf debian/.debhelper/ rm -f -- debian/wiggle.substvars debian/files rm -fr -- debian/wiggle/ debian/tmp/ find . \( \( \ \( -path .\*/.git -o -path .\*/.svn -o -path .\*/.bzr -o -path .\*/.hg -o -path .\*/CVS -o -path .\*/.pc -o -path .\*/_darcs \) -prune -o -type f -a \ \( -name '#*#' -o -name '.*~' -o -name '*~' -o -name DEADJOE \ -o -name '*.orig' -o -name '*.rej' -o -name '*.bak' \ -o -name '.*.orig' -o -name .*.rej -o -name '.SUMS' \ -o -name TAGS -o \( -path '*/.deps/*' -a -name '*.P' \) \ \) -exec rm -f {} + \) -o \ \( -type d -a -name autom4te.cache -prune -exec rm -rf {} + \) \) debian/rules binary-arch dh binary-arch dh_update_autotools_config -a dh_autoreconf -a dh_auto_configure -a debian/rules override_dh_auto_build make[1]: Entering directory '/<>' dh_auto_build -- CFLAGS="-g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed" make -j1 "INSTALL=install --strip-program=true" PKG_CONFIG=aarch64-linux-gnu-pkg-config CXX=aarch64-linux-gnu-g\+\+ CC=aarch64-linux-gnu-gcc "CFLAGS=-g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed" make[2]: Entering directory '/<>' aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. 
-Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o wiggle.o wiggle.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o load.o load.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o parse.o parse.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o split.o split.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o split.o split.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o extract.o extract.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o diff.o diff.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. 
-Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o bestmatch.o bestmatch.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o ReadMe.o ReadMe.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o merge2.o merge2.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o vpatch.o vpatch.c aarch64-linux-gnu-gcc -g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I. 
-Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -Wdate-time -D_FORTIFY_SOURCE=2 -c -o ccan/hash/hash.o ccan/hash/hash.c aarch64-linux-gnu-gcc -Wl,-z,relro -Wl,-z,now -Wl,--as-needed wiggle.o load.o parse.o split.o extract.o diff.o bestmatch.o ReadMe.o merge2.o vpatch.o ccan/hash/hash.o -lncurses -o wiggle nroff -man wiggle.1 > wiggle.man ./dotest /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.037555894 +0000 @@ -1,7 +0,0 @@ -<<<<<<< found -c -||||||| expected -a -======= -b ->>>>>>> replacement ./simple/trivial-conflict/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- Wmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.054240953 +0000 @@ -1,15 +0,0 @@ -This is a test for show-merge -mode of wiggle where there -are other extraneous sections of text -that should not cause an extra wiggle - - -<<<<<<< found -Here is a line -||||||| expected -Here was a line -======= -Here will be a line ->>>>>>> replacement - -There is nothing else. 
./simple/show-wiggle-3/Wmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- Wmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.071091927 +0000 @@ -1,13 +0,0 @@ -Openning line - -<<<<<<< found -content line with content -||||||| expected -content line content -======= -middle line content -&&&&&&& resolution -middle line with content ->>>>>>> replacement - -closing line ./simple/show-wiggle-2/Wmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- Wmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.089689144 +0000 @@ -1,14 +0,0 @@ - -<<<<<<< found -This is one line of the file -||||||| expected -This is 1 line of the file -======= -This is 1 line of the document -&&&&&&& resolution -This is one line of the document ->>>>>>> replacement - -I think this is another line - -So is this ./simple/show-wiggle-1/Wmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- wmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.108298046 +0000 @@ -1,9 +0,0 @@ -This -is -the -current -version -of -the -file<<<---.|||=== that has changed--->>> - ./simple/multiple-add/wmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.126716924 +0000 @@ -1,15 +0,0 @@ -This -is -the -current -version -of -the -<<<<<<< found -file. -||||||| expected -file -======= -file that has changed ->>>>>>> replacement - ./simple/multiple-add/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- lmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.145127773 +0000 @@ -1,15 +0,0 @@ -This -is -the -current -version -of -the -<<<<<<< found -file. 
-||||||| expected
-file
-=======
-file that has changed
->>>>>>> replacement
-
./simple/multiple-add/lmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.163519339 +0000
@@ -1,2 +0,0 @@
-First line
-last line
./simple/multideletes/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- lmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.182023945 +0000
@@ -1,2 +0,0 @@
-First line
-last line
./simple/multideletes/lmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- wmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.200476080 +0000
@@ -1,4 +0,0 @@
-this is a file
-with the word
-<<<---two|||to===too--->>> which was
-misspelt
./simple/conflictmixed/wmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.218999602 +0000
@@ -1,10 +0,0 @@
-this is a file
-with the word
-<<<<<<< found
-two which is
-||||||| expected
-to which is
-=======
-too which was
->>>>>>> replacement
-misspelt
./simple/conflictmixed/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- lmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.237394358 +0000
@@ -1,10 +0,0 @@
-this is a file
-with the word
-<<<<<<< found
-two which is
-||||||| expected
-to which is
-=======
-too which was
->>>>>>> replacement
-misspelt
./simple/conflictmixed/lmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- ldiff 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.255841431 +0000
@@ -1,6 +0,0 @@
-@@ -1,4 +1,4 @@
- this is a file
- with the word
--two which is
-+to which is
- misspelt
./simple/conflictmixed/ldiff FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- diff 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.274373838 +0000
@@ -1,5 +0,0 @@
-@@ -1,4 +1,4 @@
- this is a file
- with the word
-|<<<--two-->>><<<++to++>>> which is
- misspelt
./simple/conflictmixed/diff FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- wmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.292754484 +0000
@@ -1,4 +0,0 @@
-this is a file
-with the word
-<<<---two|||to===too--->>> which is
-misspelt
./simple/conflict/wmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.311080919 +0000
@@ -1,10 +0,0 @@
-this is a file
-with the word
-<<<<<<< found
-two which is
-||||||| expected
-to which is
-=======
-too which is
->>>>>>> replacement
-misspelt
./simple/conflict/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- ldiff 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.329440928 +0000
@@ -1,6 +0,0 @@
-@@ -1,4 +1,4 @@
- this is a file
- with the word
--two which is
-+to which is
- misspelt
./simple/conflict/ldiff FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- diff 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.347691479 +0000
@@ -1,5 +0,0 @@
-@@ -1,4 +1,4 @@
- this is a file
- with the word
-|<<<--two-->>><<<++to++>>> which is
- misspelt
./simple/conflict/diff FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.366165107 +0000
@@ -1,5 +0,0 @@
-here
-is
-the
-inaugural
-file
./simple/changeafteradd/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.384537189 +0000
@@ -1,5 +0,0 @@
-This is a longish line that might be split up
-and this is
-a broken line
-that might be
-catenated
./simple/brokenlines/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- diff 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.402977552 +0000
@@ -1,7 +0,0 @@
-@@ -1,5 +1,3 @@
-|This is a long line that <<<--might-->>><<<++has++>>> <<<--be -->>><<<++been
-|++>>>broken
-|and this is<<<--
-|-->>><<<++ ++>>>a broken line<<<--
-|-->>><<<++ ++>>>that <<<--might-->>><<<++will++>>> be<<<--
-|-->>><<<++ ++>>>joined
./simple/brokenlines/diff FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.421322576 +0000
@@ -1,4 +0,0 @@
-this is a
-line of text
-that was added
-to the file
./simple/bothadd/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- lmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.439640266 +0000
@@ -1,4 +0,0 @@
-this is a
-line of text
-that was added
-to the file
./simple/bothadd/lmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.458019628 +0000
@@ -1,20 +0,0 @@
-
-This is a base file
-some changes are going to happen to it
-but it has
-several lines
-so that alll
-the changes
-don't h...
-I don't know waht I am saying.
-This lion will have some modifications made.
-but this one wont
-stuf stuf stuff
-thing thing
-xxxxx
-that is all
-except
-for
-this
-last
-bit
./simple/base/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- ldiff 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.476375451 +0000
@@ -1,25 +0,0 @@
-@@ -1,20 +1,21 @@
--
- This is a base file
- some changes are going to happen to it
- but it has
-+had
- several lines
- so that alll
- the changes
- don't h...
--I don't know waht I am saying.
--This lion will have some changes made.
-+I don't know what I am saying.
-+This line will have some changes made.
- but this one wont
- stuf stuf stuff
- thing thing
- xxxxx
- that is all
- except
- for
- this
- last
- bit
-+x
./simple/base/ldiff FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- diff 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.494691204 +0000
@@ -1,23 +0,0 @@
-@@ -1,20 +1,21 @@
--
- This is a base file
- some changes are going to happen to it
- but it has
-+had
- several lines
- so that alll
- the changes
- don't h...
-|I don't know <<<--waht-->>><<<++what++>>> I am saying.
-|This <<<--lion-->>><<<++line++>>> will have some changes made.
- but this one wont
- stuf stuf stuff
- thing thing
- xxxxx
- that is all
- except
- for
- this
- last
- bit
-+x
./simple/base/diff FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.512882718 +0000
@@ -1,3 +0,0 @@
-This is the
-current version of the file
-which has already had the word 'current' updated.
./simple/already-applied/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- wmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.531310966 +0000
@@ -1,11 +0,0 @@
-<<<---1|||a===A--->>>
-<<<---2|||b===B--->>>
-<<<---3|||c===C--->>>
-<<<---4|||d===D--->>>
-<<<---5|||e===E--->>>
-<<<---6|||f===F--->>>
-<<<---7|||g===G--->>>
-<<<---8|||h===H--->>>
-<<<---9|||i===I--->>>
-<<<---0|||j===J--->>>
-yes
./simple/all-different/wmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.549878187 +0000
@@ -1,35 +0,0 @@
-<<<<<<< found
-1
-2
-3
-4
-5
-6
-7
-8
-9
-0
-||||||| expected
-a
-b
-c
-d
-e
-f
-g
-h
-i
-j
-=======
-A
-B
-C
-D
-E
-F
-G
-H
-I
-J
->>>>>>> replacement
-yes
./simple/all-different/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- lmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.568391844 +0000
@@ -1,35 +0,0 @@
-<<<<<<< found
-1
-2
-3
-4
-5
-6
-7
-8
-9
-0
-||||||| expected
-a
-b
-c
-d
-e
-f
-g
-h
-i
-j
-=======
-A
-B
-C
-D
-E
-F
-G
-H
-I
-J
->>>>>>> replacement
-yes
./simple/all-different/lmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- wmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.586806035 +0000
@@ -1,10 +0,0 @@
-<<<---1|||a===A--->>>
-<<<---2|||b===B--->>>
-<<<---3|||c===C--->>>
-<<<---4|||d===D--->>>
-<<<---5|||e===E--->>>
-<<<---6|||f===F--->>>
-<<<---7|||g===G--->>>
-<<<---8|||h===H--->>>
-<<<---9|||i===I--->>>
-<<<---0|||j===J--->>>
./simple/all-different-2/wmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.605281894 +0000
@@ -1,34 +0,0 @@
-<<<<<<< found
-1
-2
-3
-4
-5
-6
-7
-8
-9
-0
-||||||| expected
-a
-b
-c
-d
-e
-f
-g
-h
-i
-j
-=======
-A
-B
-C
-D
-E
-F
-G
-H
-I
-J
->>>>>>> replacement
./simple/all-different-2/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- lmerge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.623741331 +0000
@@ -1,34 +0,0 @@
-<<<<<<< found
-1
-2
-3
-4
-5
-6
-7
-8
-9
-0
-||||||| expected
-a
-b
-c
-d
-e
-f
-g
-h
-i
-j
-=======
-A
-B
-C
-D
-E
-F
-G
-H
-I
-J
->>>>>>> replacement
./simple/all-different-2/lmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:10.642336383 +0000
@@ -1,1789 +0,0 @@
-/*
- * mdadm - manage Linux "md" devices aka RAID arrays.
- *
- * Copyright (C) 2001-2012 Neil Brown
- *
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * Author: Neil Brown
- * Email:
- *
- * Additions for bitmap and write-behind RAID options, Copyright (C) 2003-2004,
- * Paul Clements, SteelEye Technology, Inc.
- */
-
-#include "mdadm.h"
-#include "md_p.h"
-#include
-
-
-static int scan_assemble(struct supertype *ss,
- struct context *c,
- struct mddev_ident *ident);
-static int misc_scan(char devmode, struct context *c);
-static int stop_scan(int verbose);
-static int misc_list(struct mddev_dev *devlist,
- struct mddev_ident *ident,
- struct supertype *ss, struct context *c);
-
-
-int main(int argc, char *argv[])
-{
- int mode = 0;
- int opt;
- int option_index;
- int rv;
- int i;
-
- unsigned long long array_size = 0;
- unsigned long long data_offset = INVALID_SECTORS;
- struct mddev_ident ident;
- char *configfile = NULL;
- int devmode = 0;
- int bitmap_fd = -1;
- struct mddev_dev *devlist = NULL;
- struct mddev_dev **devlistend = & devlist;
- struct mddev_dev *dv;
- int devs_found = 0;
- char *symlinks = NULL;
- int grow_continue = 0;
- /* autof indicates whether and how to create device node.
- * bottom 3 bits are style. Rest (when shifted) are number of parts
- * 0 - unset
- * 1 - don't create (no)
- * 2 - if is_standard, then create (yes)
- * 3 - create as 'md' - reject is_standard mdp (md)
- * 4 - create as 'mdp' - reject is_standard md (mdp)
- * 5 - default to md if not is_standard (md in config file)
- * 6 - default to mdp if not is_standard (part, or mdp in config file)
- */
- struct context c = {
- .require_homehost = 1,
- };
- struct shape s = {
- .level = UnSet,
- .layout = UnSet,
- .bitmap_chunk = UnSet,
- };
-
- char sys_hostname[256];
- char *mailaddr = NULL;
- char *program = NULL;
- int increments = 20;
- int daemonise = 0;
- char *pidfile = NULL;
- int oneshot = 0;
- int spare_sharing = 1;
- struct supertype *ss = NULL;
- int writemostly = 0;
- char *shortopt = short_options;
- int dosyslog = 0;
- int rebuild_map = 0;
- char *remove_path = NULL;
- char *udev_filename = NULL;
-
- int print_help = 0;
- FILE *outf;
-
- int mdfd = -1;
-
- srandom(time(0) ^ getpid());
-
- ident.uuid_set=0;
- ident.level = UnSet;
- ident.raid_disks = UnSet;
- ident.super_minor= UnSet;
- ident.devices=0;
- ident.spare_group = NULL;
- ident.autof = 0;
- ident.st = NULL;
- ident.bitmap_fd = -1;
- ident.bitmap_file = NULL;
- ident.name[0] = 0;
- ident.container = NULL;
- ident.member = NULL;
-
- /*
- * set first char of argv[0] to @. This is used by
- * systemd to signal that the task was launched from
- * initrd/initramfs and should be preserved during shutdown
- */
- argv[0][0] = '@';
-
- while ((option_index = -1) ,
- (opt=getopt_long(argc, argv,
- shortopt, long_options,
- &option_index)) != -1) {
- int newmode = mode;
- /* firstly, some mode-independent options */
- switch(opt) {
- case HelpOptions:
- print_help = 2;
- continue;
- case 'h':
- print_help = 1;
- continue;
-
- case 'V':
- fputs(Version, stderr);
- exit(0);
-
- case 'v': c.verbose++;
- continue;
-
- case 'q': c.verbose--;
- continue;
-
- case 'b':
- if (mode == ASSEMBLE || mode == BUILD || mode == CREATE
- || mode == GROW || mode == INCREMENTAL
- || mode == MANAGE)
- break; /* b means bitmap */
- case Brief:
- c.brief = 1;
- continue;
-
- case 'Y': c.export++;
- continue;
-
- case HomeHost:
- if (strcasecmp(optarg, "") == 0)
- c.require_homehost = 0;
- else
- c.homehost = optarg;
- continue;
-
- case Prefer:
- if (c.prefer)
- free(c.prefer);
- if (asprintf(&c.prefer, "/%s/", optarg) <= 0)
- c.prefer = NULL;
- continue;
-
- case ':':
- case '?':
- fputs(Usage, stderr);
- exit(2);
- }
- /* second, figure out the mode.
- * Some options force the mode. Others
- * set the mode if it isn't already
- */
-
- switch(opt) {
- case ManageOpt:
- newmode = MANAGE;
- shortopt = short_bitmap_options;
- break;
- case 'a':
- case Add:
- case 'r':
- case Remove:
- case Replace:
- case With:
- case 'f':
- case Fail:
- case ReAdd: /* re-add */
- if (!mode) {
- newmode = MANAGE;
- shortopt = short_bitmap_options;
- }
- break;
-
- case 'A': newmode = ASSEMBLE;
- shortopt = short_bitmap_auto_options;
- break;
- case 'B': newmode = BUILD;
- shortopt = short_bitmap_auto_options;
- break;
- case 'C': newmode = CREATE;
- shortopt = short_bitmap_auto_options;
- break;
- case 'F': newmode = MONITOR;
- break;
- case 'G': newmode = GROW;
- shortopt = short_bitmap_options;
- break;
- case 'I': newmode = INCREMENTAL;
- shortopt = short_bitmap_auto_options;
- break;
- case AutoDetect:
- newmode = AUTODETECT;
- break;
-
- case MiscOpt:
- case 'D':
- case 'E':
- case 'X':
- case 'Q':
- case ExamineBB:
- newmode = MISC;
- break;
-
- case 'R':
- case 'S':
- case 'o':
- case 'w':
- case 'W':
- case WaitOpt:
- case Waitclean:
- case DetailPlatform:
- case KillSubarray:
- case UpdateSubarray:
- case UdevRules:
- case KillOpt:
- if (!mode)
- newmode = MISC;
- break;
-
- case NoSharing:
- newmode = MONITOR;
- break;
- }
- if (mode && newmode == mode) {
- /* everybody happy ! */
- } else if (mode && newmode != mode) {
- /* not allowed.. */
- pr_err("");
- if (option_index >= 0)
- fprintf(stderr, "--%s", long_options[option_index].name);
- else
- fprintf(stderr, "-%c", opt);
- fprintf(stderr, " would set mdadm mode to \"%s\", but it is already set to \"%s\".\n",
- map_num(modes, newmode),
- map_num(modes, mode));
- exit(2);
- } else if (!mode && newmode) {
- mode = newmode;
- if (mode == MISC && devs_found) {
- pr_err("No action given for %s in --misc mode\n",
- devlist->devname);
- fprintf(stderr," Action options must come before device names\n");
- exit(2);
- }
- } else {
- /* special case of -c --help */
- if ((opt == 'c' || opt == ConfigFile) &&
- ( strncmp(optarg, "--h", 3)==0 ||
- strncmp(optarg, "-h", 2)==0)) {
- fputs(Help_config, stdout);
- exit(0);
- }
-
- /* If first option is a device, don't force the mode yet */
- if (opt == 1) {
- if (devs_found == 0) {
- dv = xmalloc(sizeof(*dv));
- dv->devname = optarg;
- dv->disposition = devmode;
- dv->writemostly = writemostly;
- dv->used = 0;
- dv->next = NULL;
- *devlistend = dv;
- devlistend = &dv->next;
-
- devs_found++;
- continue;
- }
- /* No mode yet, and this is the second device ... */
- pr_err("An option must be given to set the mode before a second device\n"
- " (%s) is listed\n", optarg);
- exit(2);
- }
- if (option_index >= 0)
- pr_err("--%s", long_options[option_index].name);
- else
- pr_err("-%c", opt);
- fprintf(stderr, " does not set the mode, and so cannot be the first option.\n");
- exit(2);
- }
-
- /* if we just set the mode, then done */
- switch(opt) {
- case ManageOpt:
- case MiscOpt:
- case 'A':
- case 'B':
- case 'C':
- case 'F':
- case 'G':
- case 'I':
- case AutoDetect:
- continue;
- }
- if (opt == 1) {
- /* an undecorated option - must be a device name.
- */
-
- if (devs_found > 0 && devmode == DetailPlatform) {
- pr_err("controller may only be specified once. %s ignored\n",
- optarg);
- continue;
- }
-
- if (devs_found > 0 && mode == MANAGE && !devmode) {
- pr_err("Must give one of -a/-r/-f"
- " for subsequent devices at %s\n", optarg);
- exit(2);
- }
- if (devs_found > 0 && mode == GROW && !devmode) {
- pr_err("Must give -a/--add for"
- " devices to add: %s\n", optarg);
- exit(2);
- }
- dv = xmalloc(sizeof(*dv));
- dv->devname = optarg;
- dv->disposition = devmode;
- dv->writemostly = writemostly;
- dv->used = 0;
- dv->next = NULL;
- *devlistend = dv;
- devlistend = &dv->next;
-
- devs_found++;
- continue;
- }
-
- /* We've got a mode, and opt is now something else which
- * could depend on the mode */
-#define O(a,b) ((a<<16)|b)
- switch (O(mode,opt)) {
- case O(GROW,'c'):
- case O(GROW,ChunkSize):
- case O(CREATE,'c'):
- case O(CREATE,ChunkSize):
- case O(BUILD,'c'): /* chunk or rounding */
- case O(BUILD,ChunkSize): /* chunk or rounding */
- if (s.chunk) {
- pr_err("chunk/rounding may only be specified once. "
- "Second value is %s.\n", optarg);
- exit(2);
- }
- s.chunk = parse_size(optarg);
- if (s.chunk == INVALID_SECTORS ||
- s.chunk < 8 || (s.chunk&1)) {
- pr_err("invalid chunk/rounding value: %s\n",
- optarg);
- exit(2);
- }
- /* Convert sectors to K */
- s.chunk /= 2;
- continue;
-
- case O(INCREMENTAL, 'e'):
- case O(CREATE,'e'):
- case O(ASSEMBLE,'e'):
- case O(MISC,'e'): /* set metadata (superblock) information */
- if (ss) {
- pr_err("metadata information already given\n");
- exit(2);
- }
- for(i=0; !ss && superlist[i]; i++)
- ss = superlist[i]->match_metadata_desc(optarg);
-
- if (!ss) {
- pr_err("unrecognised metadata identifier: %s\n", optarg);
- exit(2);
- }
- continue;
-
- case O(MANAGE,'W'):
- case O(MANAGE,WriteMostly):
- case O(BUILD,'W'):
- case O(BUILD,WriteMostly):
- case O(CREATE,'W'):
- case O(CREATE,WriteMostly):
- /* set write-mostly for following devices */
- writemostly = 1;
- continue;
-
- case O(MANAGE,'w'):
- /* clear write-mostly for following devices */
- writemostly = 2;
- continue;
-
-
- case O(GROW,'z'):
- case O(CREATE,'z'):
- case O(BUILD,'z'): /* size */
- if (s.size > 0) {
- pr_err("size may only be specified once. "
- "Second value is %s.\n", optarg);
- exit(2);
- }
- if (strcmp(optarg, "max")==0)
- s.size = MAX_SIZE;
- else {
- s.size = parse_size(optarg);
- if (s.size == INVALID_SECTORS ||
- s.size < 8) {
- pr_err("invalid size: %s\n",
- optarg);
- exit(2);
- }
- /* convert sectors to K */
- s.size /= 2;
- }
- continue;
-
- case O(GROW,'Z'): /* array size */
- if (array_size > 0) {
- pr_err("array-size may only be specified once. "
- "Second value is %s.\n", optarg);
- exit(2);
- }
- if (strcmp(optarg, "max") == 0)
- array_size = MAX_SIZE;
- else {
- array_size = parse_size(optarg);
- if (array_size == 0 ||
- array_size == INVALID_SECTORS) {
- pr_err("invalid array size: %s\n",
- optarg);
- exit(2);
- }
- }
- continue;
-
- case O(CREATE,DataOffset):
- case O(GROW,DataOffset):
- if (data_offset != INVALID_SECTORS) {
- fprintf(stderr, Name ": data-offset may only be specified one. "
- "Second value is %s.\n", optarg);
- exit(2);
- }
- if (mode == CREATE &&
- strcmp(optarg, "variable") == 0)
- data_offset = VARIABLE_OFFSET;
- else
- data_offset = parse_size(optarg);
- if (data_offset == INVALID_SECTORS) {
- fprintf(stderr, Name ": invalid data-offset: %s\n",
- optarg);
- exit(2);
- }
- continue;
-
- case O(GROW,'l'):
- case O(CREATE,'l'):
- case O(BUILD,'l'): /* set raid level*/
- if (s.level != UnSet) {
- pr_err("raid level may only be set once. "
- "Second value is %s.\n", optarg);
- exit(2);
- }
- s.level = map_name(pers, optarg);
- if (s.level == UnSet) {
- pr_err("invalid raid level: %s\n",
- optarg);
- exit(2);
- }
- if (s.level != 0 && s.level != LEVEL_LINEAR && s.level != 1 &&
- s.level != LEVEL_MULTIPATH && s.level != LEVEL_FAULTY &&
- s.level != 10 &&
- mode == BUILD) {
- pr_err("Raid level %s not permitted with --build.\n",
- optarg);
- exit(2);
- }
- if (s.sparedisks > 0 && s.level < 1 && s.level >= -1) {
- pr_err("raid level %s is incompatible with spare-devices setting.\n",
- optarg);
- exit(2);
- }
- ident.level = s.level;
- continue;
-
- case O(GROW, 'p'): /* new layout */
- case O(GROW, Layout):
- if (s.layout_str) {
- pr_err("layout may only be sent once. "
- "Second value was %s\n", optarg);
- exit(2);
- }
- s.layout_str = optarg;
- /* 'Grow' will parse the value */
- continue;
-
- case O(CREATE,'p'): /* raid5 layout */
- case O(CREATE,Layout):
- case O(BUILD,'p'): /* faulty layout */
- case O(BUILD,Layout):
- if (s.layout != UnSet) {
- pr_err("layout may only be sent once. "
- "Second value was %s\n", optarg);
- exit(2);
- }
- switch(s.level) {
- default:
- pr_err("layout not meaningful for %s arrays.\n",
- map_num(pers, s.level));
- exit(2);
- case UnSet:
- pr_err("raid level must be given before layout.\n");
- exit(2);
-
- case 5:
- s.layout = map_name(r5layout, optarg);
- if (s.layout==UnSet) {
- pr_err("layout %s not understood for raid5.\n",
- optarg);
- exit(2);
- }
- break;
- case 6:
- s.layout = map_name(r6layout, optarg);
- if (s.layout==UnSet) {
- pr_err("layout %s not understood for raid6.\n",
- optarg);
- exit(2);
- }
- break;
-
- case 10:
- s.layout = parse_layout_10(optarg);
- if (s.layout < 0) {
- pr_err("layout for raid10 must be 'nNN', 'oNN' or 'fNN' where NN is a number, not %s\n", optarg);
- exit(2);
- }
- break;
- case LEVEL_FAULTY:
- /* Faulty
- * modeNNN
- */
- s.layout = parse_layout_faulty(optarg);
- if (s.layout == -1) {
- pr_err("layout %s not understood for faulty.\n",
- optarg);
- exit(2);
- }
- break;
- }
- continue;
-
- case O(CREATE,AssumeClean):
- case O(BUILD,AssumeClean): /* assume clean */
- case O(GROW,AssumeClean):
- s.assume_clean = 1;
- continue;
-
- case O(GROW,'n'):
- case O(CREATE,'n'):
- case O(BUILD,'n'): /* number of raid disks */
- if (s.raiddisks) {
- pr_err("raid-devices set twice: %d and %s\n",
- s.raiddisks, optarg);
- exit(2);
- }
- s.raiddisks = parse_num(optarg);
- if (s.raiddisks <= 0) {
- pr_err("invalid number of raid devices: %s\n",
- optarg);
- exit(2);
- }
- ident.raid_disks = s.raiddisks;
- continue;
-
- case O(CREATE,'x'): /* number of spare (eXtra) disks */
- if (s.sparedisks) {
- pr_err("spare-devices set twice: %d and %s\n",
- s.sparedisks, optarg);
- exit(2);
- }
- if (s.level != UnSet && s.level <= 0 && s.level >= -1) {
- pr_err("spare-devices setting is incompatible with raid level %d\n",
- s.level);
- exit(2);
- }
- s.sparedisks = parse_num(optarg);
- if (s.sparedisks < 0) {
- pr_err("invalid number of spare-devices: %s\n",
- optarg);
- exit(2);
- }
- continue;
-
- case O(CREATE,'a'):
- case O(CREATE,Auto):
- case O(BUILD,'a'):
- case O(BUILD,Auto):
- case O(INCREMENTAL,'a'):
- case O(INCREMENTAL,Auto):
- case O(ASSEMBLE,'a'):
- case O(ASSEMBLE,Auto): /* auto-creation of device node */
- c.autof = parse_auto(optarg, "--auto flag", 0);
- continue;
-
- case O(CREATE,Symlinks):
- case O(BUILD,Symlinks):
- case O(ASSEMBLE,Symlinks): /* auto creation of symlinks in /dev to /dev/md */
- symlinks = optarg;
- continue;
-
- case O(BUILD,'f'): /* force honouring '-n 1' */
- case O(BUILD,Force): /* force honouring '-n 1' */
- case O(GROW,'f'): /* ditto */
- case O(GROW,Force): /* ditto */
- case O(CREATE,'f'): /* force honouring of device list */
- case O(CREATE,Force): /* force honouring of device list */
- case O(ASSEMBLE,'f'): /* force assembly */
- case O(ASSEMBLE,Force): /* force assembly */
- case O(MISC,'f'): /* force zero */
- case O(MISC,Force): /* force zero */
- case O(MANAGE,Force): /* add device which is too large */
- c.force=1;
- continue;
- /* now for the Assemble options */
- case O(ASSEMBLE, FreezeReshape): /* Freeze reshape during
- * initrd phase */
- case O(INCREMENTAL, FreezeReshape):
- c.freeze_reshape = 1;
- continue;
- case O(CREATE,'u'): /* uuid of array */
- case O(ASSEMBLE,'u'): /* uuid of array */
- if (ident.uuid_set) {
- pr_err("uuid cannot be set twice. "
- "Second value %s.\n", optarg);
- exit(2);
- }
- if (parse_uuid(optarg, ident.uuid))
- ident.uuid_set = 1;
- else {
- pr_err("Bad uuid: %s\n", optarg);
- exit(2);
- }
- continue;
-
- case O(CREATE,'N'):
- case O(ASSEMBLE,'N'):
- case O(MISC,'N'):
- if (ident.name[0]) {
- pr_err("name cannot be set twice. "
- "Second value %s.\n", optarg);
- exit(2);
- }
- if (mode == MISC && !c.subarray) {
- pr_err("-N/--name only valid with --update-subarray in misc mode\n");
- exit(2);
- }
- if (strlen(optarg) > 32) {
- pr_err("name '%s' is too long, 32 chars max.\n",
- optarg);
- exit(2);
- }
- strcpy(ident.name, optarg);
- continue;
-
- case O(ASSEMBLE,'m'): /* super-minor for array */
- case O(ASSEMBLE,SuperMinor):
- if (ident.super_minor != UnSet) {
- pr_err("super-minor cannot be set twice. "
- "Second value: %s.\n", optarg);
- exit(2);
- }
- if (strcmp(optarg, "dev")==0)
- ident.super_minor = -2;
- else {
- ident.super_minor = parse_num(optarg);
- if (ident.super_minor < 0) {
- pr_err("Bad super-minor number: %s.\n", optarg);
- exit(2);
- }
- }
- continue;
-
- case O(ASSEMBLE,'o'):
- case O(MANAGE,'o'):
- case O(CREATE,'o'):
- c.readonly = 1;
- continue;
-
- case O(ASSEMBLE,'U'): /* update the superblock */
- case O(MISC,'U'):
- if (c.update) {
- pr_err("Can only update one aspect"
- " of superblock, both %s and %s given.\n",
- c.update, optarg);
- exit(2);
- }
- if (mode == MISC && !c.subarray) {
- pr_err("Only subarrays can be"
- " updated in misc mode\n");
- exit(2);
- }
- c.update = optarg;
- if (strcmp(c.update, "sparc2.2")==0)
- continue;
- if (strcmp(c.update, "super-minor") == 0)
- continue;
- if (strcmp(c.update, "summaries")==0)
- continue;
- if (strcmp(c.update, "resync")==0)
- continue;
- if (strcmp(c.update, "uuid")==0)
- continue;
- if (strcmp(c.update, "name")==0)
- continue;
- if (strcmp(c.update, "homehost")==0)
- continue;
- if (strcmp(c.update, "devicesize")==0)
- continue;
- if (strcmp(c.update, "no-bitmap")==0)
- continue;
- if (strcmp(c.update, "bbl") == 0)
- continue;
- if (strcmp(c.update, "no-bbl") == 0)
- continue;
- if (strcmp(c.update, "byteorder")==0) {
- if (ss) {
- pr_err("must not set metadata"
- " type with --update=byteorder.\n");
- exit(2);
- }
- for(i=0; !ss && superlist[i]; i++)
- ss = superlist[i]->match_metadata_desc(
- "0.swap");
- if (!ss) {
- pr_err("INTERNAL ERROR"
- " cannot find 0.swap\n");
- exit(2);
- }
-
- continue;
- }
- if (strcmp(c.update,"?") == 0 ||
- strcmp(c.update, "help") == 0) {
- outf = stdout;
- fprintf(outf, Name ": ");
- } else {
- outf = stderr;
- fprintf(outf,
- Name ": '--update=%s' is invalid. ",
- c.update);
- }
- fprintf(outf, "Valid --update options are:\n"
- " 'sparc2.2', 'super-minor', 'uuid', 'name', 'resync',\n"
- " 'summaries', 'homehost', 'byteorder', 'devicesize',\n"
- " 'no-bitmap'\n");
- exit(outf == stdout ? 0 : 2);
-
- case O(MANAGE,'U'):
- /* update=devicesize is allowed with --re-add */
- if (devmode != 'A') {
- pr_err("--update in Manage mode only"
- " allowed with --re-add.\n");
- exit(1);
- }
- if (c.update) {
- pr_err("Can only update one aspect"
- " of superblock, both %s and %s given.\n",
- c.update, optarg);
- exit(2);
- }
- c.update = optarg;
- if (strcmp(c.update, "devicesize") != 0 &&
- strcmp(c.update, "bbl") != 0 &&
- strcmp(c.update, "no-bbl") != 0) {
- pr_err("only 'devicesize', 'bbl' and 'no-bbl' can be"
- " updated with --re-add\n");
- exit(2);
- }
- continue;
-
- case O(INCREMENTAL,NoDegraded):
- pr_err("--no-degraded is deprecated in Incremental mode\n");
- case O(ASSEMBLE,NoDegraded): /* --no-degraded */
- c.runstop = -1; /* --stop isn't allowed for --assemble,
- * so we overload slightly */
- continue;
-
- case O(ASSEMBLE,'c'):
- case O(ASSEMBLE,ConfigFile):
- case O(INCREMENTAL, 'c'):
- case O(INCREMENTAL, ConfigFile):
- case O(MISC, 'c'):
- case O(MISC, ConfigFile):
- case O(MONITOR,'c'):
- case O(MONITOR,ConfigFile):
- if (configfile) {
- pr_err("configfile cannot be set twice. "
- "Second value is %s.\n", optarg);
- exit(2);
- }
- configfile = optarg;
- set_conffile(configfile);
- /* FIXME possibly check that config file exists. Even parse it */
- continue;
- case O(ASSEMBLE,'s'): /* scan */
- case O(MISC,'s'):
- case O(MONITOR,'s'):
- case O(INCREMENTAL,'s'):
- c.scan = 1;
- continue;
-
- case O(MONITOR,'m'): /* mail address */
- case O(MONITOR,EMail):
- if (mailaddr)
- pr_err("only specify one mailaddress. %s ignored.\n",
- optarg);
- else
- mailaddr = optarg;
- continue;
-
- case O(MONITOR,'p'): /* alert program */
- case O(MONITOR,ProgramOpt): /* alert program */
- if (program)
- pr_err("only specify one alter program. %s ignored.\n",
- optarg);
- else
- program = optarg;
- continue;
-
- case O(MONITOR,'r'): /* rebuild increments */
- case O(MONITOR,Increment):
- increments = atoi(optarg);
- if (increments > 99 || increments < 1) {
- pr_err("please specify positive integer between 1 and 99 as rebuild increments.\n");
- exit(2);
- }
- continue;
-
- case O(MONITOR,'d'): /* delay in seconds */
- case O(GROW, 'd'):
- case O(BUILD,'d'): /* delay for bitmap updates */
- case O(CREATE,'d'):
- if (c.delay)
- pr_err("only specify delay once. %s ignored.\n",
- optarg);
- else {
- c.delay = parse_num(optarg);
- if (c.delay < 1) {
- pr_err("invalid delay: %s\n",
- optarg);
- exit(2);
- }
- }
- continue;
- case O(MONITOR,'f'): /* daemonise */
- case O(MONITOR,Fork):
- daemonise = 1;
- continue;
- case O(MONITOR,'i'): /* pid */
- if (pidfile)
- pr_err("only specify one pid file. %s ignored.\n",
- optarg);
- else
- pidfile = optarg;
- continue;
- case O(MONITOR,'1'): /* oneshot */
- oneshot = 1;
- spare_sharing = 0;
- continue;
- case O(MONITOR,'t'): /* test */
- c.test = 1;
- continue;
- case O(MONITOR,'y'): /* log messages to syslog */
- openlog("mdadm", LOG_PID, SYSLOG_FACILITY);
- dosyslog = 1;
- continue;
- case O(MONITOR, NoSharing):
- spare_sharing = 0;
- continue;
-
- /* now the general management options. Some are applicable
- * to other modes. None have arguments.
- */
- case O(GROW,'a'):
- case O(GROW,Add):
- case O(MANAGE,'a'):
- case O(MANAGE,Add): /* add a drive */
- devmode = 'a';
- continue;
- case O(MANAGE,ReAdd):
- devmode = 'A';
- continue;
- case O(MANAGE,'r'): /* remove a drive */
- case O(MANAGE,Remove):
- devmode = 'r';
- continue;
- case O(MANAGE,'f'): /* set faulty */
- case O(MANAGE,Fail):
- case O(INCREMENTAL,'f'):
- case O(INCREMENTAL,Remove):
- case O(INCREMENTAL,Fail): /* r for incremental is taken, use f
- * even though we will both fail and
- * remove the device */
- devmode = 'f';
- continue;
- case O(MANAGE,Replace):
- /* Mark these devices for replacement */
- devmode = 'R';
- continue;
- case O(MANAGE,With):
- /* These are the replacements to use */
- if (devmode != 'R') {
- pr_err("--with must follow --replace\n");
- exit(2);
- }
- devmode = 'W';
- continue;
- case O(INCREMENTAL,'R'):
- case O(MANAGE,'R'):
- case O(ASSEMBLE,'R'):
- case O(BUILD,'R'):
- case O(CREATE,'R'): /* Run the array */
- if (c.runstop < 0) {
- pr_err("Cannot both Stop and Run an array\n");
- exit(2);
- }
- c.runstop = 1;
- continue;
- case O(MANAGE,'S'):
- if (c.runstop > 0) {
- pr_err("Cannot both Run and Stop an array\n");
- exit(2);
- }
- c.runstop = -1;
- continue;
- case O(MANAGE,'t'):
- c.test = 1;
- continue;
-
- case O(MISC,'Q'):
- case O(MISC,'D'):
- case O(MISC,'E'):
- case O(MISC,KillOpt):
- case O(MISC,'R'):
- case O(MISC,'S'):
- case O(MISC,'X'):
- case O(MISC, ExamineBB):
- case O(MISC,'o'):
- case O(MISC,'w'):
- case O(MISC,'W'):
- case O(MISC, WaitOpt):
- case O(MISC, Waitclean):
- case O(MISC, DetailPlatform):
- case O(MISC, KillSubarray):
- case O(MISC, UpdateSubarray):
- if (opt == KillSubarray || opt == UpdateSubarray) {
- if (c.subarray) {
- pr_err("subarray can only"
- " be specified once\n");
- exit(2);
- }
- c.subarray = optarg;
- }
- if (devmode && devmode != opt &&
- (devmode == 'E' || (opt == 'E' && devmode != 'Q'))) {
- pr_err("--examine/-E cannot be given with ");
- if (devmode == 'E') {
- if (option_index >= 0)
- fprintf(stderr, "--%s\n",
- long_options[option_index].name);
- else
- fprintf(stderr, "-%c\n", opt);
- } else if (isalpha(devmode))
- fprintf(stderr, "-%c\n", devmode);
- else
- fprintf(stderr, "previous option\n");
- exit(2);
- }
- devmode = opt;
- continue;
- case O(MISC, UdevRules):
- if (devmode && devmode != opt) {
- pr_err("--udev-rules must"
- " be the only option.\n");
- } else {
- if (udev_filename)
- pr_err("only specify one udev "
- "rule filename. %s ignored.\n",
- optarg);
- else
- udev_filename = optarg;
- }
- devmode = opt;
- continue;
- case O(MISC,'t'):
- c.test = 1;
- continue;
-
- case O(MISC, Sparc22):
- if (devmode != 'E') {
- pr_err("--sparc2.2 only allowed with --examine\n");
- exit(2);
- }
- c.SparcAdjust = 1;
- continue;
-
- case O(ASSEMBLE,'b'): /* here we simply set the bitmap file */
- case O(ASSEMBLE,Bitmap):
- if (!optarg) {
- pr_err("bitmap file needed with -b in --assemble mode\n");
- exit(2);
- }
- if (strcmp(optarg, "internal")==0) {
- pr_err("there is no need to specify --bitmap when assembling arrays with internal bitmaps\n");
- continue;
- }
- bitmap_fd = open(optarg, O_RDWR);
- if (!*optarg || bitmap_fd < 0) {
- pr_err("cannot open bitmap file %s: %s\n", optarg, strerror(errno));
- exit(2);
- }
- ident.bitmap_fd = bitmap_fd; /* for Assemble */
- continue;
-
- case O(ASSEMBLE, BackupFile):
- case O(GROW, BackupFile):
- /* Specify a file into which grow might place a backup,
- * or from which assemble might recover a backup
- */
- if (c.backup_file) {
- pr_err("backup file already specified, rejecting %s\n", optarg);
- exit(2);
- }
- c.backup_file = optarg;
- continue;
-
- case O(GROW, Continue):
- /* Continue interrupted grow
- */
- grow_continue = 1;
- continue;
- case O(ASSEMBLE, InvalidBackup):
- /* Acknowledge that the backupfile is invalid, but ask
- * to continue anyway
- */
- c.invalid_backup = 1;
- continue;
-
- case O(BUILD,'b'):
- case O(BUILD,Bitmap):
- case O(CREATE,'b'):
- case O(CREATE,Bitmap): /* here we create the bitmap */
- if (strcmp(optarg, "none") == 0) {
- pr_err("'--bitmap none' only"
- " supported for --grow\n");
- exit(2);
- }
- /* FALL THROUGH */
- case O(GROW,'b'):
- case O(GROW,Bitmap):
- if (strcmp(optarg, "internal")== 0 ||
- strcmp(optarg, "none")== 0 ||
- strchr(optarg, '/') != NULL) {
- s.bitmap_file = optarg;
- continue;
- }
- /* probable typo */
- pr_err("bitmap file must contain a '/', or be 'internal', or 'none'\n"
- " not '%s'\n", optarg);
- exit(2);
-
- case O(GROW,BitmapChunk):
- case O(BUILD,BitmapChunk):
- case O(CREATE,BitmapChunk): /* bitmap chunksize */
- s.bitmap_chunk = parse_size(optarg);
- if (s.bitmap_chunk == 0 ||
- s.bitmap_chunk == INVALID_SECTORS ||
- s.bitmap_chunk & (s.bitmap_chunk - 1)) {
- pr_err("invalid bitmap chunksize: %s\n",
- optarg);
- exit(2);
- }
- s.bitmap_chunk = s.bitmap_chunk * 512;
- continue;
-
- case O(GROW, WriteBehind):
- case O(BUILD, WriteBehind):
- case O(CREATE, WriteBehind): /* write-behind mode */
- s.write_behind = DEFAULT_MAX_WRITE_BEHIND;
- if (optarg) {
- s.write_behind = parse_num(optarg);
- if (s.write_behind < 0 ||
- s.write_behind > 16383) {
- pr_err("Invalid value for maximum outstanding write-behind writes: %s.\n\tMust be between 0 and 16383.\n", optarg);
- exit(2);
- }
- }
- continue;
-
- case O(INCREMENTAL, 'r'):
- case O(INCREMENTAL, RebuildMapOpt):
- rebuild_map = 1;
- continue;
- case O(INCREMENTAL, IncrementalPath):
- remove_path = optarg;
- continue;
- }
- /* We have now processed all the valid options.
Anything else is - * an error - */ - if (option_index > 0) - pr_err("option --%s not valid in %s mode\n", - long_options[option_index].name, - map_num(modes, mode)); - else - pr_err("option -%c not valid in %s mode\n", - opt, map_num(modes, mode)); - exit(2); - - } - - if (print_help) { - char *help_text; - if (print_help == 2) - help_text = OptionHelp; - else - help_text = mode_help[mode]; - if (help_text == NULL) - help_text = Help; - fputs(help_text,stdout); - exit(0); - } - - if (!mode && devs_found) { - mode = MISC; - devmode = 'Q'; - if (devlist->disposition == 0) - devlist->disposition = devmode; - } - if (!mode) { - fputs(Usage, stderr); - exit(2); - } - - if (symlinks) { - struct createinfo *ci = conf_get_create_info(); - - if (strcasecmp(symlinks, "yes") == 0) - ci->symlinks = 1; - else if (strcasecmp(symlinks, "no") == 0) - ci->symlinks = 0; - else { - pr_err("option --symlinks must be 'no' or 'yes'\n"); - exit(2); - } - } - /* Ok, got the option parsing out of the way - * hopefully it's mostly right but there might be some stuff - * missing - * - * That is mostly checked in the per-mode stuff but... - * - * For @,B,C and A without -s, the first device listed must be - * an md device. We check that here and open it. - */ - - if (mode == MANAGE || mode == BUILD || mode == CREATE - || mode == GROW - || (mode == ASSEMBLE && !
c.scan)) { - if (devs_found < 1) { - pr_err("an md device must be given in this mode\n"); - exit(2); - } - if ((int)ident.super_minor == -2 && c.autof) { - pr_err("--super-minor=dev is incompatible with --auto\n"); - exit(2); - } - if (mode == MANAGE || mode == GROW) { - mdfd = open_mddev(devlist->devname, 1); - if (mdfd < 0) - exit(1); - } else - /* non-existent device is OK */ - mdfd = open_mddev(devlist->devname, 0); - if (mdfd == -2) { - pr_err("device %s exists but is not an " - "md array.\n", devlist->devname); - exit(1); - } - if ((int)ident.super_minor == -2) { - struct stat stb; - if (mdfd < 0) { - pr_err("--super-minor=dev given, and " - "listed device %s doesn't exist.\n", - devlist->devname); - exit(1); - } - fstat(mdfd, &stb); - ident.super_minor = minor(stb.st_rdev); - } - if (mdfd >= 0 && mode != MANAGE && mode != GROW) { - /* We don't really want this open yet, we just might - * have wanted to check some things - */ - close(mdfd); - mdfd = -1; - } - } - - if (s.raiddisks) { - if (s.raiddisks == 1 && !c.force && s.level != LEVEL_FAULTY) { - pr_err("'1' is an unusual number of drives for an array, so it is probably\n" - " a mistake. 
If you really mean it you will need to specify --force before\n" - " setting the number of drives.\n"); - exit(2); - } - } - - if (c.homehost == NULL) - c.homehost = conf_get_homehost(&c.require_homehost); - if (c.homehost == NULL || strcasecmp(c.homehost, "")==0) { - if (gethostname(sys_hostname, sizeof(sys_hostname)) == 0) { - sys_hostname[sizeof(sys_hostname)-1] = 0; - c.homehost = sys_hostname; - } - } - if (c.homehost && (!c.homehost[0] || strcasecmp(c.homehost, "") == 0)) { - c.homehost = NULL; - c.require_homehost = 0; - } - - if ((mode == MISC && devmode == 'E') - || (mode == MONITOR && spare_sharing == 0)) - /* Anyone may try this */; - else if (geteuid() != 0) { - pr_err("must be super-user to perform this action\n"); - exit(1); - } - - ident.autof = c.autof; - - if (c.scan && c.verbose < 2) - /* --scan implied --brief unless -vv */ - c.brief = 1; - - rv = 0; - switch(mode) { - case MANAGE: - /* readonly, add/remove, readwrite, runstop */ - if (c.readonly > 0) - rv = Manage_ro(devlist->devname, mdfd, c.readonly); - if (!rv && devs_found>1) - rv = Manage_subdevs(devlist->devname, mdfd, - devlist->next, c.verbose, c.test, - c.update, c.force); - if (!rv && c.readonly < 0) - rv = Manage_ro(devlist->devname, mdfd, c.readonly); - if (!rv && c.runstop) - rv = Manage_runstop(devlist->devname, mdfd, c.runstop, c.verbose, 0); - break; - case ASSEMBLE: - if (devs_found == 1 && ident.uuid_set == 0 && - ident.super_minor == UnSet && ident.name[0] == 0 && !c.scan ) { - /* Only a device has been given, so get details from config file */ - struct mddev_ident *array_ident = conf_get_ident(devlist->devname); - if (array_ident == NULL) { - pr_err("%s not identified in config file.\n", - devlist->devname); - rv |= 1; - if (mdfd >= 0) - close(mdfd); - } else { - if (array_ident->autof == 0) - array_ident->autof = c.autof; - rv |= Assemble(ss, devlist->devname, array_ident, - NULL, &c); - } - } else if (!c.scan) - rv = Assemble(ss, devlist->devname, &ident, - devlist->next, 
&c); - else if (devs_found > 0) { - if (c.update && devs_found > 1) { - pr_err("can only update a single array at a time\n"); - exit(1); - } - if (c.backup_file && devs_found > 1) { - pr_err("can only assemble a single array when providing a backup file.\n"); - exit(1); - } - for (dv = devlist ; dv ; dv=dv->next) { - struct mddev_ident *array_ident = conf_get_ident(dv->devname); - if (array_ident == NULL) { - pr_err("%s not identified in config file.\n", - dv->devname); - rv |= 1; - continue; - } - if (array_ident->autof == 0) - array_ident->autof = c.autof; - rv |= Assemble(ss, dv->devname, array_ident, - NULL, &c); - } - } else { - if (c.update) { - pr_err("--update not meaningful with a --scan assembly.\n"); - exit(1); - } - if (c.backup_file) { - pr_err("--backup_file not meaningful with a --scan assembly.\n"); - exit(1); - } - rv = scan_assemble(ss, &c, &ident); - } - - break; - case BUILD: - if (c.delay == 0) - c.delay = DEFAULT_BITMAP_DELAY; - if (s.write_behind && !s.bitmap_file) { - pr_err("write-behind mode requires a bitmap.\n"); - rv = 1; - break; - } - if (s.raiddisks == 0) { - pr_err("no raid-devices specified.\n"); - rv = 1; - break; - } - - if (s.bitmap_file) { - if (strcmp(s.bitmap_file, "internal")==0) { - pr_err("'internal' bitmaps not supported with --build\n"); - rv |= 1; - break; - } - } - rv = Build(devlist->devname, devlist->next, &s, &c); - break; - case CREATE: - if (c.delay == 0) - c.delay = DEFAULT_BITMAP_DELAY; - if (s.write_behind && !s.bitmap_file) { - pr_err("write-behind mode requires a bitmap.\n"); - rv = 1; - break; - } - if (s.raiddisks == 0) { - pr_err("no raid-devices specified.\n"); - rv = 1; - break; - } - - rv = Create(ss, devlist->devname, - ident.name, ident.uuid_set ? 
ident.uuid : NULL, - devs_found-1, devlist->next, - &s, &c, data_offset); - break; - case MISC: - if (devmode == 'E') { - if (devlist == NULL && !c.scan) { - pr_err("No devices to examine\n"); - exit(2); - } - if (devlist == NULL) - devlist = conf_get_devs(); - if (devlist == NULL) { - pr_err("No devices listed in %s\n", configfile?configfile:DefaultConfFile); - exit(1); - } - rv = Examine(devlist, &c, ss); - } else if (devmode == DetailPlatform) { - rv = Detail_Platform(ss ? ss->ss : NULL, ss ? c.scan : 1, - c.verbose, c.export, - devlist ? devlist->devname : NULL); - } else if (devlist == NULL) { - if (devmode == 'S' && c.scan) - rv = stop_scan(c.verbose); - else if ((devmode == 'D' || devmode == Waitclean) && c.scan) - rv = misc_scan(devmode, &c); - else if (devmode == UdevRules) - rv = Write_rules(udev_filename); - else { - pr_err("No devices given.\n"); - exit(2); - } - } else - rv = misc_list(devlist, &ident, ss, &c); - break; - case MONITOR: - if (!devlist && !c.scan) { - pr_err("Cannot monitor: need --scan or at least one device\n"); - rv = 1; - break; - } - if (pidfile && !daemonise) { - pr_err("Cannot write a pid file when not in daemon mode\n"); - rv = 1; - break; - } - if (c.delay == 0) { - if (get_linux_version() > 2006016) - /* mdstat responds to poll */ - c.delay = 1000; - else - c.delay = 60; - } - if (c.delay == 0) - c.delay = 60; - rv= Monitor(devlist, mailaddr, program, - &c, daemonise, oneshot, - dosyslog, pidfile, increments, - spare_sharing); - break; - - case GROW: - if (array_size > 0) { - /* always impose array size first, independent of - * anything else - * Do not allow level or raid_disks changes at the - * same time as that can be irreversibly destructive.
- */ - struct mdinfo sra; - int err; - if (s.raiddisks || s.level != UnSet) { - pr_err("cannot change array size in same operation " - "as changing raiddisks or level.\n" - " Change size first, then check that data is still intact.\n"); - rv = 1; - break; - } - sysfs_init(&sra, mdfd, 0); - if (array_size == MAX_SIZE) - err = sysfs_set_str(&sra, NULL, "array_size", "default"); - else - err = sysfs_set_num(&sra, NULL, "array_size", array_size / 2); - if (err < 0) { - if (errno == E2BIG) - pr_err("--array-size setting" - " is too large.\n"); - else - pr_err("current kernel does" - " not support setting --array-size\n"); - rv = 1; - break; - } - } - if (devs_found > 1 && s.raiddisks == 0) { - /* must be '-a'. */ - if (s.size > 0 || s.chunk || s.layout_str != NULL || s.bitmap_file) { - pr_err("--add cannot be used with " - "other geometry changes in --grow mode\n"); - rv = 1; - break; - } - for (dv=devlist->next; dv ; dv=dv->next) { - rv = Grow_Add_device(devlist->devname, mdfd, - dv->devname); - if (rv) - break; - } - } else if (s.bitmap_file) { - if (s.size > 0 || s.raiddisks || s.chunk || - s.layout_str != NULL || devs_found > 1) { - pr_err("--bitmap changes cannot be " - "used with other geometry changes " - "in --grow mode\n"); - rv = 1; - break; - } - if (c.delay == 0) - c.delay = DEFAULT_BITMAP_DELAY; - rv = Grow_addbitmap(devlist->devname, mdfd, &c, &s); - } else if (grow_continue) - rv = Grow_continue_command(devlist->devname, - mdfd, c.backup_file, - c.verbose); - else if (s.size > 0 || s.raiddisks || s.layout_str != NULL - || s.chunk != 0 || s.level != UnSet) { - rv = Grow_reshape(devlist->devname, mdfd, - devlist->next, - data_offset, &c, &s); - } else if (array_size == 0) - pr_err("no changes to --grow\n"); - break; - case INCREMENTAL: - if (rebuild_map) { - RebuildMap(); - } - if (c.scan) { - if (c.runstop <= 0) { - pr_err("--incremental --scan meaningless without --run.\n"); - break; - } - if (devmode == 'f') { - pr_err("--incremental --scan --fail not 
supported.\n"); - break; - } - rv = IncrementalScan(c.verbose); - } - if (!devlist) { - if (!rebuild_map && !c.scan) { - pr_err("--incremental requires a device.\n"); - rv = 1; - } - break; - } - if (devlist->next) { - pr_err("--incremental can only handle one device.\n"); - rv = 1; - break; - } - if (devmode == 'f') - rv = IncrementalRemove(devlist->devname, remove_path, - c.verbose); - else - rv = Incremental(devlist->devname, &c, ss); - break; - case AUTODETECT: - autodetect(); - break; - } - exit(rv); -} - -static int scan_assemble(struct supertype *ss, - struct context *c, - struct mddev_ident *ident) -{ - struct mddev_ident *a, *array_list = conf_get_ident(NULL); - struct mddev_dev *devlist = conf_get_devs(); - struct map_ent *map = NULL; - int cnt = 0; - int rv = 0; - int failures, successes; - - if (conf_verify_devnames(array_list)) { - pr_err("Duplicate MD device names in " - "conf file were found.\n"); - return 1; - } - if (devlist == NULL) { - pr_err("No devices listed in conf file were found.\n"); - return 1; - } - for (a = array_list; a ; a = a->next) { - a->assembled = 0; - if (a->autof == 0) - a->autof = c->autof; - } - if (map_lock(&map)) - pr_err("%s: failed to get " - "exclusive lock on mapfile\n", - __func__); - do { - failures = 0; - successes = 0; - rv = 0; - for (a = array_list; a ; a = a->next) { - int r; - if (a->assembled) - continue; - if (a->devname && - strcasecmp(a->devname, "") == 0) - continue; - - r = Assemble(ss, a->devname, - a, NULL, c); - if (r == 0) { - a->assembled = 1; - successes++; - } else - failures++; - rv |= r; - cnt++; - } - } while (failures && successes); - if (c->homehost && cnt == 0) { - /* Maybe we can auto-assemble something. 
- * Repeatedly call Assemble in auto-assemble mode - * until it fails - */ - int rv2; - int acnt; - ident->autof = c->autof; - do { - struct mddev_dev *devlist = conf_get_devs(); - acnt = 0; - do { - rv2 = Assemble(ss, NULL, - ident, - devlist, c); - if (rv2==0) { - cnt++; - acnt++; - } - } while (rv2!=2); - /* In case there are stacked devices, we need to go around again */ - } while (acnt); - if (cnt == 0 && rv == 0) { - pr_err("No arrays found in config file or automatically\n"); - rv = 1; - } else if (cnt) - rv = 0; - } else if (cnt == 0 && rv == 0) { - pr_err("No arrays found in config file\n"); - rv = 1; - } - map_unlock(&map); - return rv; -} - -static int misc_scan(char devmode, struct context *c) -{ - /* apply --detail or --wait-clean to - * all devices in /proc/mdstat - */ - struct mdstat_ent *ms = mdstat_read(0, 1); - struct mdstat_ent *e; - struct map_ent *map = NULL; - int members; - int rv = 0; - - for (members = 0; members <= 1; members++) { - for (e=ms ; e ; e=e->next) { - char *name; - struct map_ent *me; - int member = e->metadata_version && - strncmp(e->metadata_version, - "external:/", 10) == 0; - if (members != member) - continue; - me = map_by_devnum(&map, e->devnum); - if (me && me->path - && strcmp(me->path, "/unknown") != 0) - name = me->path; - else - name = get_md_name(e->devnum); - - if (!name) { - pr_err("cannot find device file for %s\n", - e->dev); - continue; - } - if (devmode == 'D') - rv |= Detail(name, c); - else - rv |= WaitClean(name, -1, c->verbose); - put_md_name(name); - } - } - free_mdstat(ms); - return rv; -} - -static int stop_scan(int verbose) -{ - /* apply --stop to all devices in /proc/mdstat */ - /* Due to possible stacking of devices, repeat until - * nothing more can be stopped - */ - int progress=1, err; - int last = 0; - int rv = 0; - do { - struct mdstat_ent *ms = mdstat_read(0, 0); - struct mdstat_ent *e; - - if (!progress) last = 1; - progress = 0; err = 0; - for (e=ms ; e ; e=e->next) { - char *name =
get_md_name(e->devnum); - int mdfd; - - if (!name) { - pr_err("cannot find device file for %s\n", - e->dev); - continue; - } - mdfd = open_mddev(name, 1); - if (mdfd >= 0) { - if (Manage_runstop(name, mdfd, -1, verbose, !last)) - err = 1; - else - progress = 1; - close(mdfd); - } - - put_md_name(name); - } - free_mdstat(ms); - } while (!last && err); - if (err) - rv |= 1; - return rv; -} - -static int misc_list(struct mddev_dev *devlist, - struct mddev_ident *ident, - struct supertype *ss, struct context *c) -{ - struct mddev_dev *dv; - int rv = 0; - - for (dv=devlist ; dv; dv=dv->next) { - int mdfd; - - switch(dv->disposition) { - case 'D': - rv |= Detail(dv->devname, c); - continue; - case KillOpt: /* Zero superblock */ - if (ss) - rv |= Kill(dv->devname, ss, c->force, c->verbose,0); - else { - int v = c->verbose; - do { - rv |= Kill(dv->devname, NULL, c->force, v, 0); - v = -1; - } while (rv == 0); - rv &= ~2; - } - continue; - case 'Q': - rv |= Query(dv->devname); continue; - case 'X': - rv |= ExamineBitmap(dv->devname, c->brief, ss); continue; - case ExamineBB: - rv |= ExamineBadblocks(dv->devname, c->brief, ss); continue; - case 'W': - case WaitOpt: - rv |= Wait(dv->devname); continue; - case Waitclean: - rv |= WaitClean(dv->devname, -1, c->verbose); continue; - case KillSubarray: - rv |= Kill_subarray(dv->devname, c->subarray, c->verbose); - continue; - case UpdateSubarray: - if (c->update == NULL) { - pr_err("-U/--update must be specified with --update-subarray\n"); - rv |= 1; - continue; - } - rv |= Update_subarray(dv->devname, c->subarray, - c->update, ident, c->verbose); - continue; - } - mdfd = open_mddev(dv->devname, 1); - if (mdfd>=0) { - switch(dv->disposition) { - case 'R': - rv |= Manage_runstop(dv->devname, mdfd, 1, c->verbose, 0); break; - case 'S': - rv |= Manage_runstop(dv->devname, mdfd, -1, c->verbose, 0); break; - case 'o': - rv |= Manage_ro(dv->devname, mdfd, 1); break; - case 'w': - rv |= Manage_ro(dv->devname, mdfd, -1); break; - } - 
close(mdfd); - } else - rv |= 1; - } - return rv; -} ./mdadm/offroot/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.661942050 +0000 @@ -1,1528 +0,0 @@ -/* - * linux/net/sunrpc/svcsock.c - * - * These are the RPC server socket internals. - * - * The server scheduling algorithm does not always distribute the load - * evenly when servicing a single client. May need to modify the - * svc_sock_enqueue procedure... - * - * TCP support is largely untested and may be a little slow. The problem - * is that we currently do two separate recvfrom's, one for the 4-byte - * record length, and the second for the actual record. This could possibly - * be improved by always reading a minimum size of around 100 bytes and - * tucking any superfluous bytes away in a temporary store. Still, that - * leaves write requests out in the rain. An alternative may be to peek at - * the first skb in the queue, and if it matches the next TCP sequence - * number, to extract the record marker. Yuck. - * - * Copyright (C) 1995, 1996 Olaf Kirch <okir@monad.swb.de> - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include - -/* SMP locking strategy: - * - * svc_serv->sv_lock protects most stuff for that service. - * - * Some flags can be set to certain values at any time - * providing that certain rules are followed: - * - * SK_BUSY can be set to 0 at any time. - * svc_sock_enqueue must be called afterwards - * SK_CONN, SK_DATA, can be set or cleared at any time. - * after a set, svc_sock_enqueue must be called. - * after a clear, the socket must be read/accepted - * if this succeeds, it must be set again. - * SK_CLOSE can be set at any time. It is never cleared.
- * - */ - -#define RPCDBG_FACILITY RPCDBG_SVCSOCK - - -static struct svc_sock *svc_setup_socket(struct svc_serv *, struct socket *, - int *errp, int pmap_reg); -static void svc_udp_data_ready(struct sock *, int); -static int svc_udp_recvfrom(struct svc_rqst *); -static int svc_udp_sendto(struct svc_rqst *); - -static struct svc_deferred_req *svc_deferred_dequeue(struct svc_sock *svsk); -static int svc_deferred_recv(struct svc_rqst *rqstp); -static struct cache_deferred_req *svc_defer(struct cache_req *req); - -/* - * Queue up an idle server thread. Must have serv->sv_lock held. - * Note: this is really a stack rather than a queue, so that we only - * use as many different threads as we need, and the rest don't pollute - * the cache. - */ -static inline void -svc_serv_enqueue(struct svc_serv *serv, struct svc_rqst *rqstp) -{ - list_add(&rqstp->rq_list, &serv->sv_threads); -} - -/* - * Dequeue an nfsd thread. Must have serv->sv_lock held. - */ -static inline void -svc_serv_dequeue(struct svc_serv *serv, struct svc_rqst *rqstp) -{ - list_del(&rqstp->rq_list); -} - -/* - * Release an skbuff after use - */ -static inline void -svc_release_skb(struct svc_rqst *rqstp) -{ - struct sk_buff *skb = rqstp->rq_skbuff; - struct svc_deferred_req *dr = rqstp->rq_deferred; - - if (skb) { - rqstp->rq_skbuff = NULL; - - dprintk("svc: service %p, releasing skb %p\n", rqstp, skb); - skb_free_datagram(rqstp->rq_sock->sk_sk, skb); - } - if (dr) { - rqstp->rq_deferred = NULL; - kfree(dr); - } -} - -/* - * Queue up a socket with data pending. If there are idle nfsd - * processes, wake 'em up.
- * - */ -static void -svc_sock_enqueue(struct svc_sock *svsk) -{ - struct svc_serv *serv = svsk->sk_server; - struct svc_rqst *rqstp; - - if (!(svsk->sk_flags & - ( (1<<SK_CONN)|(1<<SK_DATA)|(1<<SK_CLOSE)|(1<<SK_DEFERRED)) )) - return; - - spin_lock_bh(&serv->sv_lock); - - if (!list_empty(&serv->sv_threads) && - !list_empty(&serv->sv_sockets)) - printk(KERN_ERR - "svc_sock_enqueue: threads and sockets both waiting??\n"); - - if (test_bit(SK_DEAD, &svsk->sk_flags)) { - /* Don't enqueue dead sockets */ - dprintk("svc: socket %p is dead, not enqueued\n", svsk->sk_sk); - goto out_unlock; - } - - if (test_bit(SK_BUSY, &svsk->sk_flags)) { - /* Don't enqueue socket while daemon is receiving */ - dprintk("svc: socket %p busy, not enqueued\n", svsk->sk_sk); - goto out_unlock; - } - - if (((svsk->sk_reserved + serv->sv_bufsz)*2 - > sock_wspace(svsk->sk_sk)) - && !test_bit(SK_CLOSE, &svsk->sk_flags) - && !test_bit(SK_CONN, &svsk->sk_flags)) { - /* Don't enqueue while not enough space for reply */ - dprintk("svc: socket %p no space, %d*2 > %ld, not enqueued\n", - svsk->sk_sk, svsk->sk_reserved+serv->sv_bufsz, - sock_wspace(svsk->sk_sk)); - goto out_unlock; - } - - /* Mark socket as busy. It will remain in this state until the - * server has processed all pending data and put the socket back - * on the idle list. - */ - set_bit(SK_BUSY, &svsk->sk_flags); - - if (!list_empty(&serv->sv_threads)) { - rqstp = list_entry(serv->sv_threads.next, - struct svc_rqst, - rq_list); - dprintk("svc: socket %p served by daemon %p\n", - svsk->sk_sk, rqstp); - svc_serv_dequeue(serv, rqstp); - if (rqstp->rq_sock) - printk(KERN_ERR - "svc_sock_enqueue: server %p, rq_sock=%p!\n", - rqstp, rqstp->rq_sock); - rqstp->rq_sock = svsk; - svsk->sk_inuse++; - rqstp->rq_reserved = serv->sv_bufsz; - svsk->sk_reserved += rqstp->rq_reserved; - wake_up(&rqstp->rq_wait); - } else { - dprintk("svc: socket %p put into queue\n", svsk->sk_sk); - list_add_tail(&svsk->sk_ready, &serv->sv_sockets); - } - -out_unlock: - spin_unlock_bh(&serv->sv_lock); -} - -/* - * Dequeue the first socket.
Must be called with the serv->sv_lock held. - */ -static inline struct svc_sock * -svc_sock_dequeue(struct svc_serv *serv) -{ - struct svc_sock *svsk; - - if (list_empty(&serv->sv_sockets)) - return NULL; - - svsk = list_entry(serv->sv_sockets.next, - struct svc_sock, sk_ready); - list_del_init(&svsk->sk_ready); - - dprintk("svc: socket %p dequeued, inuse=%d\n", - svsk->sk_sk, svsk->sk_inuse); - - return svsk; -} - -/* - * Having read something from a socket, check whether it - * needs to be re-enqueued. - * Note: SK_DATA only gets cleared when a read-attempt finds - * no (or insufficient) data. - */ -static inline void -svc_sock_received(struct svc_sock *svsk) -{ - clear_bit(SK_BUSY, &svsk->sk_flags); - svc_sock_enqueue(svsk); -} - - -/** - * svc_reserve - change the space reserved for the reply to a request. - * @rqstp: The request in question - * @space: new max space to reserve - * - * Each request reserves some space on the output queue of the socket - * to make sure the reply fits. This function reduces that reserved - * space to be the amount of space used already, plus @space. - * - */ -void svc_reserve(struct svc_rqst *rqstp, int space) -{ - space += rqstp->rq_res.head[0].iov_len; - - if (space < rqstp->rq_reserved) { - struct svc_sock *svsk = rqstp->rq_sock; - spin_lock_bh(&svsk->sk_server->sv_lock); - svsk->sk_reserved -= (rqstp->rq_reserved - space); - rqstp->rq_reserved = space; - spin_unlock_bh(&svsk->sk_server->sv_lock); - - svc_sock_enqueue(svsk); - } -} - -/* - * Release a socket after use. 
- */ -static inline void -svc_sock_put(struct svc_sock *svsk) -{ - struct svc_serv *serv = svsk->sk_server; - - spin_lock_bh(&serv->sv_lock); - if (!--(svsk->sk_inuse) && test_bit(SK_DEAD, &svsk->sk_flags)) { - spin_unlock_bh(&serv->sv_lock); - dprintk("svc: releasing dead socket\n"); - sock_release(svsk->sk_sock); - kfree(svsk); - } - else - spin_unlock_bh(&serv->sv_lock); -} - -static void -svc_sock_release(struct svc_rqst *rqstp) -{ - struct svc_sock *svsk = rqstp->rq_sock; - - svc_release_skb(rqstp); - - svc_free_allpages(rqstp); - rqstp->rq_res.page_len = 0; - rqstp->rq_res.page_base = 0; - - - /* Reset response buffer and release - * the reservation. - * But first, check that enough space was reserved - * for the reply, otherwise we have a bug! - */ - if ((rqstp->rq_res.len) > rqstp->rq_reserved) - printk(KERN_ERR "RPC request reserved %d but used %d\n", - rqstp->rq_reserved, - rqstp->rq_res.len); - - rqstp->rq_res.head[0].iov_len = 0; - svc_reserve(rqstp, 0); - rqstp->rq_sock = NULL; - - svc_sock_put(svsk); -} - -/* - * External function to wake up a server waiting for data - */ -void -svc_wake_up(struct svc_serv *serv) -{ - struct svc_rqst *rqstp; - - spin_lock_bh(&serv->sv_lock); - if (!list_empty(&serv->sv_threads)) { - rqstp = list_entry(serv->sv_threads.next, - struct svc_rqst, - rq_list); - dprintk("svc: daemon %p woken up.\n", rqstp); - /* - svc_serv_dequeue(serv, rqstp); - rqstp->rq_sock = NULL; - */ - wake_up(&rqstp->rq_wait); - } - spin_unlock_bh(&serv->sv_lock); -} - -/* - * Generic sendto routine - */ -static int -svc_sendto(struct svc_rqst *rqstp, struct xdr_buf *xdr) -{ - struct svc_sock *svsk = rqstp->rq_sock; - struct socket *sock = svsk->sk_sock; - int slen; - int len = 0; - int result; - int size; - struct page **ppage = xdr->pages; - size_t base = xdr->page_base; - unsigned int pglen = xdr->page_len; - unsigned int flags = MSG_MORE; - - slen = xdr->len; - - /* Grab svsk->sk_sem to serialize outgoing data. 
*/ - down(&svsk->sk_sem); - - if (rqstp->rq_prot == IPPROTO_UDP) { - /* set the destination */ - struct msghdr msg; - msg.msg_name = &rqstp->rq_addr; - msg.msg_namelen = sizeof(rqstp->rq_addr); - msg.msg_iov = NULL; - msg.msg_iovlen = 0; - msg.msg_control = NULL; - msg.msg_controllen = 0; - msg.msg_flags = MSG_MORE; - - if (sock_sendmsg(sock, &msg, 0) < 0) - goto out; - } - - /* send head */ - if (slen == xdr->head[0].iov_len) - flags = 0; - len = sock->ops->sendpage(sock, rqstp->rq_respages[0], 0, xdr->head[0].iov_len, flags); - if (len != xdr->head[0].iov_len) - goto out; - slen -= xdr->head[0].iov_len; - if (slen == 0) - goto out; - - /* send page data */ - size = PAGE_SIZE - base < pglen ? PAGE_SIZE - base : pglen; - while (pglen > 0) { - if (slen == size) - flags = 0; - result = sock->ops->sendpage(sock, *ppage, base, size, flags); - if (result > 0) - len += result; - if (result != size) - goto out; - slen -= size; - pglen -= size; - size = PAGE_SIZE < pglen ? PAGE_SIZE : pglen; - base = 0; - ppage++; - } - /* send tail */ - if (xdr->tail[0].iov_len) { - /* The tail *will* be in respages[0]; */ - result = sock->ops->sendpage(sock, rqstp->rq_respages[rqstp->rq_restailpage], - ((unsigned long)xdr->tail[0].iov_base)& (PAGE_SIZE-1), - xdr->tail[0].iov_len, 0); - - if (result > 0) - len += result; - } -out: - up(&svsk->sk_sem); - - dprintk("svc: socket %p sendto([%p %Zu... ], %d) = %d (addr %x)\n", - rqstp->rq_sock, xdr->head[0].iov_base, xdr->head[0].iov_len, xdr->len, len, - rqstp->rq_addr.sin_addr.s_addr); - - return len; -} - -/* - * Check input queue length - */ -static int -svc_recv_available(struct svc_sock *svsk) -{ - mm_segment_t oldfs; - struct socket *sock = svsk->sk_sock; - int avail, err; - - oldfs = get_fs(); set_fs(KERNEL_DS); - err = sock->ops->ioctl(sock, TIOCINQ, (unsigned long) &avail); - set_fs(oldfs); - - return (err >= 0)? avail : err; -} - -/* - * Generic recvfrom routine. 
- */ -static int -svc_recvfrom(struct svc_rqst *rqstp, struct iovec *iov, int nr, int buflen) -{ - mm_segment_t oldfs; - struct msghdr msg; - struct socket *sock; - int len, alen; - - rqstp->rq_addrlen = sizeof(rqstp->rq_addr); - sock = rqstp->rq_sock->sk_sock; - - msg.msg_name = &rqstp->rq_addr; - msg.msg_namelen = sizeof(rqstp->rq_addr); - msg.msg_iov = iov; - msg.msg_iovlen = nr; - msg.msg_control = NULL; - msg.msg_controllen = 0; - - msg.msg_flags = MSG_DONTWAIT; - - oldfs = get_fs(); set_fs(KERNEL_DS); - len = sock_recvmsg(sock, &msg, buflen, MSG_DONTWAIT); - set_fs(oldfs); - - /* sock_recvmsg doesn't fill in the name/namelen, so we must.. - * possibly we should cache this in the svc_sock structure - * at accept time. FIXME - */ - alen = sizeof(rqstp->rq_addr); - sock->ops->getname(sock, (struct sockaddr *)&rqstp->rq_addr, &alen, 1); - - dprintk("svc: socket %p recvfrom(%p, %Zu) = %d\n", - rqstp->rq_sock, iov[0].iov_base, iov[0].iov_len, len); - - return len; -} - -/* - * Set socket snd and rcv buffer lengths - */ -static inline void -svc_sock_setbufsize(struct socket *sock, unsigned int snd, unsigned int rcv) -{ -#if 0 - mm_segment_t oldfs; - oldfs = get_fs(); set_fs(KERNEL_DS); - sock_setsockopt(sock, SOL_SOCKET, SO_SNDBUF, - (char*)&snd, sizeof(snd)); - sock_setsockopt(sock, SOL_SOCKET, SO_RCVBUF, - (char*)&rcv, sizeof(rcv)); -#else - /* sock_setsockopt limits use to sysctl_?mem_max, - * which isn't acceptable. Until that is made conditional - * on not having CAP_SYS_RESOURCE or similar, we go direct... - * DaveM said I could! - */ - lock_sock(sock->sk); - sock->sk->sndbuf = snd * 2; - sock->sk->rcvbuf = rcv * 2; - sock->sk->userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK; - release_sock(sock->sk); -#endif -} -/* - * INET callback when data has been received on the socket. 
- */ -static void -svc_udp_data_ready(struct sock *sk, int count) -{ - struct svc_sock *svsk = (struct svc_sock *)(sk->user_data); - - if (!svsk) - goto out; - dprintk("svc: socket %p(inet %p), count=%d, busy=%d\n", - svsk, sk, count, test_bit(SK_BUSY, &svsk->sk_flags)); - set_bit(SK_DATA, &svsk->sk_flags); - svc_sock_enqueue(svsk); - out: - if (sk->sleep && waitqueue_active(sk->sleep)) - wake_up_interruptible(sk->sleep); -} - -/* - * INET callback when space is newly available on the socket. - */ -static void -svc_write_space(struct sock *sk) -{ - struct svc_sock *svsk = (struct svc_sock *)(sk->user_data); - - if (svsk) { - dprintk("svc: socket %p(inet %p), write_space busy=%d\n", - svsk, sk, test_bit(SK_BUSY, &svsk->sk_flags)); - svc_sock_enqueue(svsk); - } - - if (sk->sleep && waitqueue_active(sk->sleep)) { - printk(KERN_WARNING "RPC svc_write_space: some sleeping on %p\n", - svsk); - wake_up_interruptible(sk->sleep); - } -} - -/* - * Receive a datagram from a UDP socket. - */ -extern int -csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb); - -static int -svc_udp_recvfrom(struct svc_rqst *rqstp) -{ - struct svc_sock *svsk = rqstp->rq_sock; - struct svc_serv *serv = svsk->sk_server; - struct sk_buff *skb; - int err, len; - - if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags)) - /* udp sockets need large rcvbuf as all pending - * requests are still in that buffer. sndbuf must - * also be large enough that there is enough space - * for one reply per thread. 
-         */
-        svc_sock_setbufsize(svsk->sk_sock,
-                (serv->sv_nrthreads+3) * serv->sv_bufsz,
-                (serv->sv_nrthreads+3) * serv->sv_bufsz);
-
-    if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk)))
-        return svc_deferred_recv(rqstp);
-
-    clear_bit(SK_DATA, &svsk->sk_flags);
-    while ((skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err)) == NULL) {
-        svc_sock_received(svsk);
-        if (err == -EAGAIN)
-            return err;
-        /* possibly an icmp error */
-        dprintk("svc: recvfrom returned error %d\n", -err);
-    }
-    set_bit(SK_DATA, &svsk->sk_flags); /* there may be more data... */
-
-    len = skb->len - sizeof(struct udphdr);
-    rqstp->rq_arg.len = len;
-
-    rqstp->rq_prot = IPPROTO_UDP;
-
-    /* Get sender address */
-    rqstp->rq_addr.sin_family = AF_INET;
-    rqstp->rq_addr.sin_port = skb->h.uh->source;
-    rqstp->rq_addr.sin_addr.s_addr = skb->nh.iph->saddr;
-
-    if (skb_is_nonlinear(skb)) {
-        /* we have to copy */
-        local_bh_disable();
-        if (csum_partial_copy_to_xdr(&rqstp->rq_arg, skb)) {
-            local_bh_enable();
-            /* checksum error */
-            skb_free_datagram(svsk->sk_sk, skb);
-            svc_sock_received(svsk);
-            return 0;
-        }
-        local_bh_enable();
-        skb_free_datagram(svsk->sk_sk, skb);
-    } else {
-        /* we can use it in-place */
-        rqstp->rq_arg.head[0].iov_base = skb->data + sizeof(struct udphdr);
-        rqstp->rq_arg.head[0].iov_len = len;
-        if (skb->ip_summed != CHECKSUM_UNNECESSARY) {
-            if ((unsigned short)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum))) {
-                skb_free_datagram(svsk->sk_sk, skb);
-                svc_sock_received(svsk);
-                return 0;
-            }
-            skb->ip_summed = CHECKSUM_UNNECESSARY;
-        }
-        rqstp->rq_skbuff = skb;
-    }
-
-    rqstp->rq_arg.page_base = 0;
-    if (len <= rqstp->rq_arg.head[0].iov_len) {
-        rqstp->rq_arg.head[0].iov_len = len;
-        rqstp->rq_arg.page_len = 0;
-    } else {
-        rqstp->rq_arg.page_len = len - rqstp->rq_arg.head[0].iov_len;
-        rqstp->rq_argused += (rqstp->rq_arg.page_len + PAGE_SIZE - 1)/ PAGE_SIZE;
-    }
-
-    if (serv->sv_stats)
-        serv->sv_stats->netudpcnt++;
-
-    /* One down, maybe more to go... */
-    svsk->sk_sk->stamp = skb->stamp;
-    svc_sock_received(svsk);
-
-    return len;
-}
-
-static int
-svc_udp_sendto(struct svc_rqst *rqstp)
-{
-    int error;
-
-    error = svc_sendto(rqstp, &rqstp->rq_res);
-    if (error == -ECONNREFUSED)
-        /* ICMP error on earlier request. */
-        error = svc_sendto(rqstp, &rqstp->rq_res);
-
-    return error;
-}
-
-static void
-svc_udp_init(struct svc_sock *svsk)
-{
-    svsk->sk_sk->data_ready = svc_udp_data_ready;
-    svsk->sk_sk->write_space = svc_write_space;
-    svsk->sk_recvfrom = svc_udp_recvfrom;
-    svsk->sk_sendto = svc_udp_sendto;
-
-    /* initialise setting must have enough space to
-     * receive and respond to one request.
-     * svc_udp_recvfrom will re-adjust if necessary
-     */
-    svc_sock_setbufsize(svsk->sk_sock,
-            3 * svsk->sk_server->sv_bufsz,
-            3 * svsk->sk_server->sv_bufsz);
-
-    set_bit(SK_DATA, &svsk->sk_flags); /* might have come in before data_ready set up */
-    set_bit(SK_CHNGBUF, &svsk->sk_flags);
-}
-
-/*
- * A data_ready event on a listening socket means there's a connection
- * pending. Do not use state_change as a substitute for it.
- */
-static void
-svc_tcp_listen_data_ready(struct sock *sk, int count_unused)
-{
-    struct svc_sock *svsk;
-
-    dprintk("svc: socket %p TCP (listen) state change %d\n",
-        sk, sk->state);
-
-    if (sk->state != TCP_ESTABLISHED) {
-        /* Aborted connection, SYN_RECV or whatever... */
-        goto out;
-    }
-    if (!(svsk = (struct svc_sock *) sk->user_data)) {
-        printk("svc: socket %p: no user data\n", sk);
-        goto out;
-    }
-    set_bit(SK_CONN, &svsk->sk_flags);
-    svc_sock_enqueue(svsk);
- out:
-    if (sk->sleep && waitqueue_active(sk->sleep))
-        wake_up_interruptible_all(sk->sleep);
-}
-
-/*
- * A state change on a connected socket means it's dying or dead.
- */
-static void
-svc_tcp_state_change(struct sock *sk)
-{
-    struct svc_sock *svsk;
-
-    dprintk("svc: socket %p TCP (connected) state change %d (svsk %p)\n",
-        sk, sk->state, sk->user_data);
-
-    if (!(svsk = (struct svc_sock *) sk->user_data)) {
-        printk("svc: socket %p: no user data\n", sk);
-        goto out;
-    }
-    set_bit(SK_CLOSE, &svsk->sk_flags);
-    svc_sock_enqueue(svsk);
- out:
-    if (sk->sleep && waitqueue_active(sk->sleep))
-        wake_up_interruptible_all(sk->sleep);
-}
-
-static void
-svc_tcp_data_ready(struct sock *sk, int count)
-{
-    struct svc_sock * svsk;
-
-    dprintk("svc: socket %p TCP data ready (svsk %p)\n",
-        sk, sk->user_data);
-    if (!(svsk = (struct svc_sock *)(sk->user_data)))
-        goto out;
-    set_bit(SK_DATA, &svsk->sk_flags);
-    svc_sock_enqueue(svsk);
- out:
-    if (sk->sleep && waitqueue_active(sk->sleep))
-        wake_up_interruptible(sk->sleep);
-}
-
-/*
- * Accept a TCP connection
- */
-static void
-svc_tcp_accept(struct svc_sock *svsk)
-{
-    struct sockaddr_in sin;
-    struct svc_serv *serv = svsk->sk_server;
-    struct socket *sock = svsk->sk_sock;
-    struct socket *newsock;
-    struct proto_ops *ops;
-    struct svc_sock *newsvsk;
-    int err, slen;
-
-    dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
-    if (!sock)
-        return;
-
-    if (!(newsock = sock_alloc())) {
-        printk(KERN_WARNING "%s: no more sockets!\n", serv->sv_name);
-        return;
-    }
-    dprintk("svc: tcp_accept %p allocated\n", newsock);
-
-    newsock->type = sock->type;
-    newsock->ops = ops = sock->ops;
-
-    clear_bit(SK_CONN, &svsk->sk_flags);
-    if ((err = ops->accept(sock, newsock, O_NONBLOCK)) < 0) {
-        if (err != -EAGAIN && net_ratelimit())
-            printk(KERN_WARNING "%s: accept failed (err %d)!\n",
-                serv->sv_name, -err);
-        goto failed;        /* aborted connection or whatever */
-    }
-    set_bit(SK_CONN, &svsk->sk_flags);
-    svc_sock_enqueue(svsk);
-
-    slen = sizeof(sin);
-    err = ops->getname(newsock, (struct sockaddr *) &sin, &slen, 1);
-    if (err < 0) {
-        if (net_ratelimit())
-            printk(KERN_WARNING "%s: peername failed (err %d)!\n",
-                serv->sv_name, -err);
-        goto failed;        /* aborted connection or whatever */
-    }
-
-    /* Ideally, we would want to reject connections from unauthorized
-     * hosts here, but when we get encription, the IP of the host won't
-     * tell us anything. For now just warn about unpriv connections.
-     */
-    if (ntohs(sin.sin_port) >= 1024) {
-        dprintk(KERN_WARNING
-            "%s: connect from unprivileged port: %u.%u.%u.%u:%d\n",
-            serv->sv_name,
-            NIPQUAD(sin.sin_addr.s_addr), ntohs(sin.sin_port));
-    }
-
-    dprintk("%s: connect from %u.%u.%u.%u:%04x\n", serv->sv_name,
-        NIPQUAD(sin.sin_addr.s_addr), ntohs(sin.sin_port));
-
-    /* make sure that a write doesn't block forever when
-     * low on memory
-     */
-    newsock->sk->sndtimeo = HZ*30;
-
-    if (!(newsvsk = svc_setup_socket(serv, newsock, &err, 0)))
-        goto failed;
-
-
-    /* make sure that we don't have too many active connections.
-     * If we have, something must be dropped.
-     * We randomly choose between newest and oldest (in terms
-     * of recent activity) and drop it.
-     */
-    if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*5) {
-        struct svc_sock *svsk = NULL;
-        spin_lock_bh(&serv->sv_lock);
-        if (!list_empty(&serv->sv_tempsocks)) {
-            if (net_random()&1)
-                svsk = list_entry(serv->sv_tempsocks.prev,
-                        struct svc_sock,
-                        sk_list);
-            else
-                svsk = list_entry(serv->sv_tempsocks.next,
-                        struct svc_sock,
-                        sk_list);
-            set_bit(SK_CLOSE, &svsk->sk_flags);
-            svsk->sk_inuse ++;
-        }
-        spin_unlock_bh(&serv->sv_lock);
-
-        if (svsk) {
-            svc_sock_enqueue(svsk);
-            svc_sock_put(svsk);
-        }
-
-    }
-
-    if (serv->sv_stats)
-        serv->sv_stats->nettcpconn++;
-
-    return;
-
-failed:
-    sock_release(newsock);
-    return;
-}
-
-/*
- * Receive data from a TCP socket.
- */
-static int
-svc_tcp_recvfrom(struct svc_rqst *rqstp)
-{
-    struct svc_sock *svsk = rqstp->rq_sock;
-    struct svc_serv *serv = svsk->sk_server;
-    int len;
-    struct iovec vec[RPCSVC_MAXPAGES];
-    int pnum, vlen;
-
-    dprintk("svc: tcp_recv %p data %d conn %d close %d\n",
-        svsk, test_bit(SK_DATA, &svsk->sk_flags),
-        test_bit(SK_CONN, &svsk->sk_flags),
-        test_bit(SK_CLOSE, &svsk->sk_flags));
-
-    if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk)))
-        return svc_deferred_recv(rqstp);
-
-    if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
-        svc_delete_socket(svsk);
-        return 0;
-    }
-
-    if (test_bit(SK_CONN, &svsk->sk_flags)) {
-        svc_tcp_accept(svsk);
-        svc_sock_received(svsk);
-        return 0;
-    }
-
-    if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags))
-        /* sndbuf needs to have room for one request
-         * per thread, otherwise we can stall even when the
-         * network isn't a bottleneck.
-         * rcvbuf just needs to be able to hold a few requests.
-         * Normally they will be removed from the queue
-         * as soon a a complete request arrives.
-         */
-        svc_sock_setbufsize(svsk->sk_sock,
-                (serv->sv_nrthreads+3) * serv->sv_bufsz,
-                3 * serv->sv_bufsz);
-
-    clear_bit(SK_DATA, &svsk->sk_flags);
-
-    /* Receive data. If we haven't got the record length yet, get
-     * the next four bytes. Otherwise try to gobble up as much as
-     * possible up to the complete record length.
- */
-    if (svsk->sk_tcplen < 4) {
-        unsigned long want = 4 - svsk->sk_tcplen;
-        struct iovec iov;
-
-        iov.iov_base = ((char *) &svsk->sk_reclen) + svsk->sk_tcplen;
-        iov.iov_len = want;
-        if ((len = svc_recvfrom(rqstp, &iov, 1, want)) < 0)
-            goto error;
-        svsk->sk_tcplen += len;
-
-        if (len < want) {
-            dprintk("svc: short recvfrom while reading record length (%d of %d)\n",
-                len, want);
-            svc_sock_received(svsk);
-            return -EAGAIN; /* record header not complete */
-        }
-
-        svsk->sk_reclen = ntohl(svsk->sk_reclen);
-        if (!(svsk->sk_reclen & 0x80000000)) {
-            /* FIXME: technically, a record can be fragmented,
-             * and non-terminal fragments will not have the top
-             * bit set in the fragment length header.
-             * But apparently no known nfs clients send fragmented
-             * records. */
-            printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (non-terminal)\n",
-                (unsigned long) svsk->sk_reclen);
-            goto err_delete;
-        }
-        svsk->sk_reclen &= 0x7fffffff;
-        dprintk("svc: TCP record, %d bytes\n", svsk->sk_reclen);
-        if (svsk->sk_reclen > serv->sv_bufsz) {
-            printk(KERN_NOTICE "RPC: bad TCP reclen 0x%08lx (large)\n",
-                (unsigned long) svsk->sk_reclen);
-            goto err_delete;
-        }
-    }
-
-    /* Check whether enough data is available */
-    len = svc_recv_available(svsk);
-    if (len < 0)
-        goto error;
-
-    if (len < svsk->sk_reclen) {
-        dprintk("svc: incomplete TCP record (%d of %d)\n",
-            len, svsk->sk_reclen);
-        svc_sock_received(svsk);
-        return -EAGAIN; /* record not complete */
-    }
-    len = svsk->sk_reclen;
-    set_bit(SK_DATA, &svsk->sk_flags);
-
-    vec[0] = rqstp->rq_arg.head[0];
-    vlen = PAGE_SIZE;
-    pnum = 1;
-    while (vlen < len) {
-        vec[pnum].iov_base = page_address(rqstp->rq_argpages[rqstp->rq_argused++]);
-        vec[pnum].iov_len = PAGE_SIZE;
-        pnum++;
-        vlen += PAGE_SIZE;
-    }
-
-    /* Now receive data */
-    len = svc_recvfrom(rqstp, vec, pnum, len);
-    if (len < 0)
-        goto error;
-
-    dprintk("svc: TCP complete record (%d bytes)\n", len);
-    rqstp->rq_arg.len = len;
-    rqstp->rq_arg.page_base = 0;
-    if (len <= rqstp->rq_arg.head[0].iov_len) {
-        rqstp->rq_arg.head[0].iov_len = len;
-        rqstp->rq_arg.page_len = 0;
-    } else {
-        rqstp->rq_arg.page_len = len - rqstp->rq_arg.head[0].iov_len;
-    }
-
-    rqstp->rq_skbuff = 0;
-    rqstp->rq_prot = IPPROTO_TCP;
-
-    /* Reset TCP read info */
-    svsk->sk_reclen = 0;
-    svsk->sk_tcplen = 0;
-
-    svc_sock_received(svsk);
-    if (serv->sv_stats)
-        serv->sv_stats->nettcpcnt++;
-
-    return len;
-
- err_delete:
-    svc_delete_socket(svsk);
-    return -EAGAIN;
-
- error:
-    if (len == -EAGAIN) {
-        dprintk("RPC: TCP recvfrom got EAGAIN\n");
-        svc_sock_received(svsk);
-    } else {
-        printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
-            svsk->sk_server->sv_name, -len);
-        svc_sock_received(svsk);
-    }
-
-    return len;
-}
-
-/*
- * Send out data on TCP socket.
- */
-static int
-svc_tcp_sendto(struct svc_rqst *rqstp)
-{
-    struct xdr_buf *xbufp = &rqstp->rq_res;
-    int sent;
-    u32 reclen;
-
-    /* Set up the first element of the reply iovec.
-     * Any other iovecs that may be in use have been taken
-     * care of by the server implementation itself.
- */
-    reclen = htonl(0x80000000|((xbufp->len ) - 4));
-    memcpy(xbufp->head[0].iov_base, &reclen, 4);
-
-    sent = svc_sendto(rqstp, &rqstp->rq_res);
-    if (sent != xbufp->len) {
-        printk(KERN_NOTICE "rpc-srv/tcp: %s: %s %d when sending %d bytes - shutting down socket\n",
-            rqstp->rq_sock->sk_server->sv_name,
-            (sent<0)?"got error":"sent only",
-            sent, xbufp->len);
-        svc_delete_socket(rqstp->rq_sock);
-        sent = -EAGAIN;
-    }
-    return sent;
-}
-
-static void
-svc_tcp_init(struct svc_sock *svsk)
-{
-    struct sock *sk = svsk->sk_sk;
-    struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
-
-    svsk->sk_recvfrom = svc_tcp_recvfrom;
-    svsk->sk_sendto = svc_tcp_sendto;
-
-    if (sk->state == TCP_LISTEN) {
-        dprintk("setting up TCP socket for listening\n");
-        sk->data_ready = svc_tcp_listen_data_ready;
-        set_bit(SK_CONN, &svsk->sk_flags);
-    } else {
-        dprintk("setting up TCP socket for reading\n");
-        sk->state_change = svc_tcp_state_change;
-        sk->data_ready = svc_tcp_data_ready;
-        sk->write_space = svc_write_space;
-
-        svsk->sk_reclen = 0;
-<<<<<<< found
-        svsk->sk_tcplen = 0;
-
-        /* initialise setting must have enough space to
-         * receive and respond to one request.
-         * svc_tcp_recvfrom will re-adjust if necessary
-||||||| expected
-        svsk->sk_tcplen = 0;
-
-        /* initialise setting must have enough space to
-         * receive and respond to one request.
-         * svc_tcp_recvfrom will re-adjust if necessary
-=======
-        svsk->sk_tcplen = 0;
-
-        tp->nonagle = 1;        /* disable Nagle's algorithm */
-
-        /* initialise setting must have enough space to
-         * receive and respond to one request.
-         * svc_tcp_recvfrom will re-adjust if necessary
->>>>>>> replacement
-         */
-        svc_sock_setbufsize(svsk->sk_sock,
-                3 * svsk->sk_server->sv_bufsz,
-                3 * svsk->sk_server->sv_bufsz);
-
-        set_bit(SK_CHNGBUF, &svsk->sk_flags);
-        set_bit(SK_DATA, &svsk->sk_flags);
-    }
-}
-
-void
-svc_sock_update_bufs(struct svc_serv *serv)
-{
-    /*
-     * The number of server threads has changed. Update
-     * rcvbuf and sndbuf accordingly on all sockets
-     */
-    struct list_head *le;
-
-    spin_lock_bh(&serv->sv_lock);
-    list_for_each(le, &serv->sv_permsocks) {
-        struct svc_sock *svsk =
-            list_entry(le, struct svc_sock, sk_list);
-        set_bit(SK_CHNGBUF, &svsk->sk_flags);
-    }
-    list_for_each(le, &serv->sv_tempsocks) {
-        struct svc_sock *svsk =
-            list_entry(le, struct svc_sock, sk_list);
-        set_bit(SK_CHNGBUF, &svsk->sk_flags);
-    }
-    spin_unlock_bh(&serv->sv_lock);
-}
-
-/*
- * Receive the next request on any socket.
- */
-int
-svc_recv(struct svc_serv *serv, struct svc_rqst *rqstp, long timeout)
-{
-    struct svc_sock *svsk =NULL;
-    int len;
-    int pages;
-    struct xdr_buf *arg;
-    DECLARE_WAITQUEUE(wait, current);
-
-    dprintk("svc: server %p waiting for data (to = %ld)\n",
-        rqstp, timeout);
-
-    if (rqstp->rq_sock)
-        printk(KERN_ERR
-            "svc_recv: service %p, socket not NULL!\n",
-            rqstp);
-    if (waitqueue_active(&rqstp->rq_wait))
-        printk(KERN_ERR
-            "svc_recv: service %p, wait queue active!\n",
-            rqstp);
-
-    /* Initialize the buffers */
-    /* first reclaim pages that were moved to response list */
-    svc_pushback_allpages(rqstp);
-
-    /* now allocate needed pages. If we get a failure, sleep briefly */
-    pages = 2 + (serv->sv_bufsz + PAGE_SIZE -1) / PAGE_SIZE;
-    while (rqstp->rq_arghi < pages) {
-        struct page *p = alloc_page(GFP_KERNEL);
-        if (!p) {
-            set_current_state(TASK_UNINTERRUPTIBLE);
-            schedule_timeout(HZ/2);
-            current->state = TASK_RUNNING;
-            continue;
-        }
-        rqstp->rq_argpages[rqstp->rq_arghi++] = p;
-    }
-
-    /* Make arg->head point to first page and arg->pages point to rest */
-    arg = &rqstp->rq_arg;
-    arg->head[0].iov_base = page_address(rqstp->rq_argpages[0]);
-    arg->head[0].iov_len = PAGE_SIZE;
-    rqstp->rq_argused = 1;
-    arg->pages = rqstp->rq_argpages + 1;
-    arg->page_base = 0;
-    /* save at least one page for response */
-    arg->page_len = (pages-2)*PAGE_SIZE;
-    arg->len = (pages-1)*PAGE_SIZE;
-    arg->tail[0].iov_len = 0;
-
-    if (signalled())
-        return -EINTR;
-
-    spin_lock_bh(&serv->sv_lock);
-    if (!list_empty(&serv->sv_tempsocks)) {
-        svsk = list_entry(serv->sv_tempsocks.next,
-                struct svc_sock, sk_list);
-        /* apparently the "standard" is that clients close
-         * idle connections after 5 minutes, servers after
-         * 6 minutes
-         * http://www.connectathon.org/talks96/nfstcp.pdf
-         */
-        if (get_seconds() - svsk->sk_lastrecv < 6*60
-            || test_bit(SK_BUSY, &svsk->sk_flags))
-            svsk = NULL;
-    }
-    if (svsk) {
-        set_bit(SK_BUSY, &svsk->sk_flags);
-        set_bit(SK_CLOSE, &svsk->sk_flags);
-        rqstp->rq_sock = svsk;
-        svsk->sk_inuse++;
-    } else if ((svsk = svc_sock_dequeue(serv)) != NULL) {
-        rqstp->rq_sock = svsk;
-        svsk->sk_inuse++;
-        rqstp->rq_reserved = serv->sv_bufsz;
-        svsk->sk_reserved += rqstp->rq_reserved;
-    } else {
-        /* No data pending. Go to sleep */
-        svc_serv_enqueue(serv, rqstp);
-
-        /*
-         * We have to be able to interrupt this wait
-         * to bring down the daemons ...
- */
-        set_current_state(TASK_INTERRUPTIBLE);
-        add_wait_queue(&rqstp->rq_wait, &wait);
-        spin_unlock_bh(&serv->sv_lock);
-
-        schedule_timeout(timeout);
-
-        spin_lock_bh(&serv->sv_lock);
-        remove_wait_queue(&rqstp->rq_wait, &wait);
-
-        if (!(svsk = rqstp->rq_sock)) {
-            svc_serv_dequeue(serv, rqstp);
-            spin_unlock_bh(&serv->sv_lock);
-            dprintk("svc: server %p, no data yet\n", rqstp);
-            return signalled()? -EINTR : -EAGAIN;
-        }
-    }
-    spin_unlock_bh(&serv->sv_lock);
-
-    dprintk("svc: server %p, socket %p, inuse=%d\n",
-        rqstp, svsk, svsk->sk_inuse);
-    len = svsk->sk_recvfrom(rqstp);
-    dprintk("svc: got len=%d\n", len);
-
-    /* No data, incomplete (TCP) read, or accept() */
-    if (len == 0 || len == -EAGAIN) {
-        svc_sock_release(rqstp);
-        return -EAGAIN;
-    }
-    svsk->sk_lastrecv = get_seconds();
-    if (test_bit(SK_TEMP, &svsk->sk_flags)) {
-        /* push active sockets to end of list */
-        spin_lock_bh(&serv->sv_lock);
-        if (!list_empty(&svsk->sk_list))
-            list_move_tail(&svsk->sk_list, &serv->sv_tempsocks);
-        spin_unlock_bh(&serv->sv_lock);
-    }
-
-    rqstp->rq_secure = ntohs(rqstp->rq_addr.sin_port) < 1024;
-    rqstp->rq_userset = 0;
-    rqstp->rq_chandle.defer = svc_defer;
-
-    if (serv->sv_stats)
-        serv->sv_stats->netcnt++;
-    return len;
-}
-
-/*
- * Drop request
- */
-void
-svc_drop(struct svc_rqst *rqstp)
-{
-    dprintk("svc: socket %p dropped request\n", rqstp->rq_sock);
-    svc_sock_release(rqstp);
-}
-
-/*
- * Return reply to client.
- */
-int
-svc_send(struct svc_rqst *rqstp)
-{
-    struct svc_sock *svsk;
-    int len;
-    struct xdr_buf *xb;
-
-    if ((svsk = rqstp->rq_sock) == NULL) {
-        printk(KERN_WARNING "NULL socket pointer in %s:%d\n",
-            __FILE__, __LINE__);
-        return -EFAULT;
-    }
-
-    /* release the receive skb before sending the reply */
-    svc_release_skb(rqstp);
-
-    /* calculate over-all length */
-    xb = & rqstp->rq_res;
-    xb->len = xb->head[0].iov_len +
-        xb->page_len +
-        xb->tail[0].iov_len;
-
-    len = svsk->sk_sendto(rqstp);
-    svc_sock_release(rqstp);
-
-    if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
-        return 0;
-    return len;
-}
-
-/*
- * Initialize socket for RPC use and create svc_sock struct
- * XXX: May want to setsockopt SO_SNDBUF and SO_RCVBUF.
- */
-static struct svc_sock *
-svc_setup_socket(struct svc_serv *serv, struct socket *sock,
-        int *errp, int pmap_register)
-{
-    struct svc_sock *svsk;
-    struct sock *inet;
-
-    dprintk("svc: svc_setup_socket %p\n", sock);
-    if (!(svsk = kmalloc(sizeof(*svsk), GFP_KERNEL))) {
-        *errp = -ENOMEM;
-        return NULL;
-    }
-    memset(svsk, 0, sizeof(*svsk));
-
-    inet = sock->sk;
-
-    /* Register socket with portmapper */
-    if (*errp >= 0 && pmap_register)
-        *errp = svc_register(serv, inet->protocol,
-                ntohs(inet_sk(inet)->sport));
-
-    if (*errp < 0) {
-        kfree(svsk);
-        return NULL;
-    }
-
-    set_bit(SK_BUSY, &svsk->sk_flags);
-    inet->user_data = svsk;
-    svsk->sk_sock = sock;
-    svsk->sk_sk = inet;
-    svsk->sk_ostate = inet->state_change;
-    svsk->sk_odata = inet->data_ready;
-    svsk->sk_owspace = inet->write_space;
-    svsk->sk_server = serv;
-    svsk->sk_lastrecv = get_seconds();
-    INIT_LIST_HEAD(&svsk->sk_deferred);
-    INIT_LIST_HEAD(&svsk->sk_ready);
-    sema_init(&svsk->sk_sem, 1);
-
-    /* Initialize the socket */
-    if (sock->type == SOCK_DGRAM)
-        svc_udp_init(svsk);
-    else
-        svc_tcp_init(svsk);
-
-    spin_lock_bh(&serv->sv_lock);
-    if (!pmap_register) {
-        set_bit(SK_TEMP, &svsk->sk_flags);
-        list_add(&svsk->sk_list, &serv->sv_tempsocks);
-        serv->sv_tmpcnt++;
-    } else {
-        clear_bit(SK_TEMP, &svsk->sk_flags);
-        list_add(&svsk->sk_list, &serv->sv_permsocks);
-    }
-    spin_unlock_bh(&serv->sv_lock);
-
-    dprintk("svc: svc_setup_socket created %p (inet %p)\n",
-        svsk, svsk->sk_sk);
-
-    clear_bit(SK_BUSY, &svsk->sk_flags);
-    svc_sock_enqueue(svsk);
-    return svsk;
-}
-
-/*
- * Create socket for RPC service.
- */
-static int
-svc_create_socket(struct svc_serv *serv, int protocol, struct sockaddr_in *sin)
-{
-    struct svc_sock *svsk;
-    struct socket *sock;
-    int error;
-    int type;
-
-    dprintk("svc: svc_create_socket(%s, %d, %u.%u.%u.%u:%d)\n",
-        serv->sv_program->pg_name, protocol,
-        NIPQUAD(sin->sin_addr.s_addr),
-        ntohs(sin->sin_port));
-
-    if (protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
-        printk(KERN_WARNING "svc: only UDP and TCP "
-            "sockets supported\n");
-        return -EINVAL;
-    }
-    type = (protocol == IPPROTO_UDP)? SOCK_DGRAM : SOCK_STREAM;
-
-    if ((error = sock_create(PF_INET, type, protocol, &sock)) < 0)
-        return error;
-
-    if (sin != NULL) {
-        sock->sk->reuse = 1; /* allow address reuse */
-        error = sock->ops->bind(sock, (struct sockaddr *) sin,
-                sizeof(*sin));
-        if (error < 0)
-            goto bummer;
-    }
-
-    if (protocol == IPPROTO_TCP) {
-        if ((error = sock->ops->listen(sock, 64)) < 0)
-            goto bummer;
-    }
-
-    if ((svsk = svc_setup_socket(serv, sock, &error, 1)) != NULL)
-        return 0;
-
-bummer:
-    dprintk("svc: svc_create_socket error = %d\n", -error);
-    sock_release(sock);
-    return error;
-}
-
-/*
- * Remove a dead socket
- */
-void
-svc_delete_socket(struct svc_sock *svsk)
-{
-    struct svc_serv *serv;
-    struct sock *sk;
-
-    dprintk("svc: svc_delete_socket(%p)\n", svsk);
-
-    serv = svsk->sk_server;
-    sk = svsk->sk_sk;
-
-    sk->state_change = svsk->sk_ostate;
-    sk->data_ready = svsk->sk_odata;
-    sk->write_space = svsk->sk_owspace;
-
-    spin_lock_bh(&serv->sv_lock);
-
-    list_del_init(&svsk->sk_list);
-    list_del_init(&svsk->sk_ready);
-    if (!test_and_set_bit(SK_DEAD, &svsk->sk_flags))
-        if (test_bit(SK_TEMP, &svsk->sk_flags))
-            serv->sv_tmpcnt--;
-
-    if (!svsk->sk_inuse) {
-        spin_unlock_bh(&serv->sv_lock);
-        sock_release(svsk->sk_sock);
-        kfree(svsk);
-    } else {
-        spin_unlock_bh(&serv->sv_lock);
-        dprintk(KERN_NOTICE "svc: server socket destroy delayed\n");
-        /* svsk->sk_server = NULL; */
-    }
-}
-
-/*
- * Make a socket for nfsd and lockd
- */
-int
-svc_makesock(struct svc_serv *serv, int protocol, unsigned short port)
-{
-    struct sockaddr_in sin;
-
-    dprintk("svc: creating socket proto = %d\n", protocol);
-    sin.sin_family = AF_INET;
-    sin.sin_addr.s_addr = INADDR_ANY;
-    sin.sin_port = htons(port);
-    return svc_create_socket(serv, protocol, &sin);
-}
-
-/*
- * Handle defer and revisit of requests
- */
-
-static void svc_revisit(struct cache_deferred_req *dreq, int too_many)
-{
-    struct svc_deferred_req *dr = container_of(dreq, struct svc_deferred_req, handle);
-    struct svc_serv *serv = dr->serv;
-    struct svc_sock *svsk;
-
-    if (too_many) {
-        svc_sock_put(dr->svsk);
-        kfree(dr);
-        return;
-    }
-    dprintk("revisit queued\n");
-    svsk = dr->svsk;
-    dr->svsk = NULL;
-    spin_lock(&serv->sv_lock);
-    list_add(&dr->handle.recent, &svsk->sk_deferred);
-    spin_unlock(&serv->sv_lock);
-    set_bit(SK_DEFERRED, &svsk->sk_flags);
-    svc_sock_enqueue(svsk);
-    svc_sock_put(svsk);
-}
-
-static struct cache_deferred_req *
-svc_defer(struct cache_req *req)
-{
-    struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
-    int size = sizeof(struct svc_deferred_req) + (rqstp->rq_arg.len);
-    struct svc_deferred_req *dr;
-
-    if (rqstp->rq_arg.page_len)
-        return NULL; /* if more than a page, give up FIXME */
-    if (rqstp->rq_deferred) {
-        dr = rqstp->rq_deferred;
-        rqstp->rq_deferred = NULL;
-    } else {
-        int skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
-        /* FIXME maybe discard if size too large */
-        dr = kmalloc(size, GFP_KERNEL);
-        if (dr == NULL)
-            return NULL;
-
-        dr->serv = rqstp->rq_server;
-        dr->prot = rqstp->rq_prot;
-        dr->addr = rqstp->rq_addr;
-        dr->argslen = rqstp->rq_arg.len >> 2;
-        memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
-    }
-    spin_lock(&rqstp->rq_server->sv_lock);
-    rqstp->rq_sock->sk_inuse++;
-    dr->svsk = rqstp->rq_sock;
-    spin_unlock(&rqstp->rq_server->sv_lock);
-
-    dr->handle.revisit = svc_revisit;
-    return &dr->handle;
-}
-
-/*
- * recv data from a deferred request into an active one
- */
-static int svc_deferred_recv(struct svc_rqst *rqstp)
-{
-    struct svc_deferred_req *dr = rqstp->rq_deferred;
-
-    rqstp->rq_arg.head[0].iov_base = dr->args;
-    rqstp->rq_arg.head[0].iov_len = dr->argslen<<2;
-    rqstp->rq_arg.page_len = 0;
-    rqstp->rq_arg.len = dr->argslen<<2;
-    rqstp->rq_prot = dr->prot;
-    rqstp->rq_addr = dr->addr;
-    return dr->argslen<<2;
-}
-
-
-static struct svc_deferred_req *svc_deferred_dequeue(struct svc_sock *svsk)
-{
-    struct svc_deferred_req *dr = NULL;
-    struct svc_serv *serv = svsk->sk_server;
-
-    if (!test_bit(SK_DEFERRED, &svsk->sk_flags))
-        return NULL;
-    spin_lock(&serv->sv_lock);
-    clear_bit(SK_DEFERRED, &svsk->sk_flags);
-    if (!list_empty(&svsk->sk_deferred)) {
-        dr = list_entry(svsk->sk_deferred.next,
-                struct svc_deferred_req,
-                handle.recent);
-        list_del_init(&dr->handle.recent);
-        set_bit(SK_DEFERRED, &svsk->sk_flags);
-    }
-    spin_unlock(&serv->sv_lock);
-    svc_sock_received(svsk);
-    return dr;
-}
./linux/rpc_tcp_nonagle/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- wmerge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:10.672207211 +0000
@@ -1 +0,0 @@
-<<<--- clear_bit(BH_Uptodate, &||| clear_buffer_uptodate(=== dev--->>>-><<<---->b_state|||===flags = 0--->>>;
./linux/raid5line/wmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:10.683421542 +0000
@@ -1,7 +0,0 @@
-<<<<<<< found
-    clear_bit(BH_Uptodate, &sh->bh_cache[i]->b_state);
-||||||| expected
-    clear_buffer_uptodate(sh->bh_cache[i]);
-=======
-    dev->flags = 0;
->>>>>>> replacement
./linux/raid5line/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- lmerge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:10.694021671 +0000
@@ -1,7 +0,0 @@
-    clear_bit(BH_Uptodate, &sh->bh_cache[i]->b_state);
-<<<<<<< found
-||||||| expected
-    clear_buffer_uptodate(sh->bh_cache[i]);
-=======
-    dev->flags = 0;
->>>>>>> replacement
./linux/raid5line/lmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:10.704757847 +0000
@@ -1,36 +0,0 @@
-static void raid5_build_block (struct stripe_head *sh, int i)
-{
-    raid5_conf_t *conf = sh->raid_conf;
-    struct r5dev *dev = &sh->dev[i];
-
-    bio_init(&dev->req);
-    dev->req.bi_io_vec = &dev->vec;
-    dev->req.bi_vcnt++;
-    dev->vec.bv_page = dev->page;
-    dev->vec.bv_len = STRIPE_SIZE;
-    dev->vec.bv_offset = 0;
-
-<<<<<<< found
-    bh->b_dev = conf->disks[i].dev;
-||||||| expected
-    bh->b_dev = conf->disks[i].dev;
-    /* FIXME - later we will need bdev here */
-=======
-    dev->req.bi_bdev = conf->disks[i].bdev;
-    dev->req.bi_sector = sh->sector;
->>>>>>> replacement
-    dev->req.bi_private = sh;
-
-    dev->flags = 0;
-    if (i != sh->pd_idx)
-<<<<<<< found
-    bh->b_size = sh->size;
-    bh->b_list = BUF_LOCKED;
-    return bh;
-||||||| expected
-    bh->b_size = sh->size;
-    return bh;
-=======
-    dev->sector = compute_blocknr(sh, i);
->>>>>>> replacement
-}
./linux/raid5build/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:10.715410197 +0000
@@ -1,2244 +0,0 @@
-/*
- * raid5.c : Multiple Devices driver for Linux
- *     Copyright (C) 1996, 1997 Ingo Molnar, Miguel de Icaza, Gadi Oxman
- *     Copyright (C) 1999, 2000 Ingo Molnar
- *
- * RAID-5 management functions.
- * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * You should have received a copy of the GNU General Public License - * (for example /usr/src/linux/COPYING); if not, write to the Free - * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - - -#include -#include -#include -#include -#include -#include -#include - -/* - * Stripe cache - */ - -#define NR_STRIPES 256 -#define STRIPE_SIZE PAGE_SIZE -#define STRIPE_SECTORS (STRIPE_SIZE>>9) -#define IO_THRESHOLD 1 -#define HASH_PAGES 1 -#define HASH_PAGES_ORDER 0 -#define NR_HASH (HASH_PAGES * PAGE_SIZE / sizeof(struct stripe_head *)) -#define HASH_MASK (NR_HASH - 1) -#define stripe_hash(conf, sect) ((conf)->stripe_hashtbl[((sect) / STRIPE_SECTORS) & HASH_MASK]) - -/* - * The following can be used to debug the driver - */ -#define RAID5_DEBUG 0 -#define RAID5_PARANOIA 1 -#if RAID5_PARANOIA && CONFIG_SMP -# define CHECK_DEVLOCK() if (!spin_is_locked(&conf->device_lock)) BUG() -#else -# define CHECK_DEVLOCK() -#endif - -#if RAID5_DEBUG -#define PRINTK(x...) printk(x) -#define inline -#define __inline__ -#else -#define PRINTK(x...) 
do { } while (0) -#endif - -static void print_raid5_conf (raid5_conf_t *conf); - -static inline void __release_stripe(raid5_conf_t *conf, struct stripe_head *sh) -{ - if (atomic_dec_and_test(&sh->count)) { - if (!list_empty(&sh->lru)) - BUG(); - if (atomic_read(&conf->active_stripes)==0) - BUG(); - if (test_bit(STRIPE_HANDLE, &sh->state)) { - if (test_bit(STRIPE_DELAYED, &sh->state)) - list_add_tail(&sh->lru, &conf->delayed_list); - else - list_add_tail(&sh->lru, &conf->handle_list); - md_wakeup_thread(conf->thread); - } else { - if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) { - atomic_dec(&conf->preread_active_stripes); - if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD) - md_wakeup_thread(conf->thread); - } - list_add_tail(&sh->lru, &conf->inactive_list); - atomic_dec(&conf->active_stripes); - if (!conf->inactive_blocked || - atomic_read(&conf->active_stripes) < (NR_STRIPES*3/4)) - wake_up(&conf->wait_for_stripe); - } - } -} -static void release_stripe(struct stripe_head *sh) -{ - raid5_conf_t *conf = sh->raid_conf; - unsigned long flags; - - spin_lock_irqsave(&conf->device_lock, flags); - __release_stripe(conf, sh); - spin_unlock_irqrestore(&conf->device_lock, flags); -} - -static void remove_hash(struct stripe_head *sh) -{ - PRINTK("remove_hash(), stripe %lu\n", sh->sector); - - if (sh->hash_pprev) { - if (sh->hash_next) - sh->hash_next->hash_pprev = sh->hash_pprev; - *sh->hash_pprev = sh->hash_next; - sh->hash_pprev = NULL; - } -} - -static __inline__ void insert_hash(raid5_conf_t *conf, struct stripe_head *sh) -{ - struct stripe_head **shp = &stripe_hash(conf, sh->sector); - - PRINTK("insert_hash(), stripe %lu\n",sh->sector); - - CHECK_DEVLOCK(); - if ((sh->hash_next = *shp) != NULL) - (*shp)->hash_pprev = &sh->hash_next; - *shp = sh; - sh->hash_pprev = shp; -} - - -/* find an idle stripe, make sure it is unhashed, and return it. 
*/ -static struct stripe_head *get_free_stripe(raid5_conf_t *conf) -{ - struct stripe_head *sh = NULL; - struct list_head *first; - - CHECK_DEVLOCK(); - if (list_empty(&conf->inactive_list)) - goto out; - first = conf->inactive_list.next; - sh = list_entry(first, struct stripe_head, lru); - list_del_init(first); - remove_hash(sh); - atomic_inc(&conf->active_stripes); -out: - return sh; -} - -static void shrink_buffers(struct stripe_head *sh, int num) -{ - struct page *p; - int i; - - for (i=0; idev[i].page; - if (!p) - continue; - sh->dev[i].page = NULL; - page_cache_release(p); - } -} - -static int grow_buffers(struct stripe_head *sh, int num) -{ - int i; - - for (i=0; ib_wait); - if ((page = alloc_page(priority))) - bh->b_data = page_address(page); - else { -||||||| expected - if (!bh) - return 1; - memset(bh, 0, sizeof (struct buffer_head)); - if ((page = alloc_page(priority))) - bh->b_data = page_address(page); - else { -======= - if (!(page = alloc_page(GFP_KERNEL))) { ->>>>>>> replacement - return 1; - } - sh->dev[i].page = page; - } - return 0; -} - -static void raid5_build_block (struct stripe_head *sh, int i); - -static inline void init_stripe(struct stripe_head *sh, unsigned long sector, int pd_idx) -{ - raid5_conf_t *conf = sh->raid_conf; - int disks = conf->raid_disks, i; - - if (atomic_read(&sh->count) != 0) - BUG(); - if (test_bit(STRIPE_HANDLE, &sh->state)) - BUG(); - - CHECK_DEVLOCK(); - PRINTK("init_stripe called, stripe %lu\n", sh->sector); - - remove_hash(sh); - - sh->sector = sector; - sh->pd_idx = pd_idx; - sh->state = 0; - - for (i=disks; i--; ) { - struct r5dev *dev = &sh->dev[i]; - - if (dev->toread || dev->towrite || dev->written || - test_bit(R5_LOCKED, &dev->flags)) { - printk("sector=%lx i=%d %p %p %p %d\n", - sh->sector, i, dev->toread, - dev->towrite, dev->written, - test_bit(R5_LOCKED, &dev->flags)); - BUG(); - } -<<<<<<< found - clear_bit(BH_Uptodate, &sh->bh_cache[i]->b_state); -||||||| expected - 
clear_buffer_uptodate(sh->bh_cache[i]); -======= - dev->flags = 0; ->>>>>>> replacement - raid5_build_block(sh, i); - } - insert_hash(conf, sh); -} - -static struct stripe_head *__find_stripe(raid5_conf_t *conf, unsigned long sector) -{ - struct stripe_head *sh; - - CHECK_DEVLOCK(); - PRINTK("__find_stripe, sector %lu\n", sector); - for (sh = stripe_hash(conf, sector); sh; sh = sh->hash_next) - if (sh->sector == sector) - return sh; - PRINTK("__stripe %lu not in cache\n", sector); - return NULL; -} - -static struct stripe_head *get_active_stripe(raid5_conf_t *conf, unsigned long sector, - int pd_idx, int noblock) -{ - struct stripe_head *sh; - - PRINTK("get_stripe, sector %lu\n", sector); - - spin_lock_irq(&conf->device_lock); - - do { - sh = __find_stripe(conf, sector); - if (!sh) { - if (!conf->inactive_blocked) - sh = get_free_stripe(conf); - if (noblock && sh == NULL) - break; - if (!sh) { - conf->inactive_blocked = 1; - wait_event_lock_irq(conf->wait_for_stripe, - !list_empty(&conf->inactive_list) && - (atomic_read(&conf->active_stripes) < (NR_STRIPES *3/4) - || !conf->inactive_blocked), - conf->device_lock); - conf->inactive_blocked = 0; - } else - init_stripe(sh, sector, pd_idx); - } else { - if (atomic_read(&sh->count)) { - if (!list_empty(&sh->lru)) - BUG(); - } else { - if (!test_bit(STRIPE_HANDLE, &sh->state)) - atomic_inc(&conf->active_stripes); - if (list_empty(&sh->lru)) - BUG(); - list_del_init(&sh->lru); - } - } - } while (sh == NULL); - - if (sh) - atomic_inc(&sh->count); - - spin_unlock_irq(&conf->device_lock); - return sh; -} - -static int grow_stripes(raid5_conf_t *conf, int num) -{ - struct stripe_head *sh; - kmem_cache_t *sc; - int devs = conf->raid_disks; - - sprintf(conf->cache_name, "md/raid5-%d", conf->mddev->__minor); - - sc = kmem_cache_create(conf->cache_name, - sizeof(struct stripe_head)+(devs-1)*sizeof(struct r5dev), - 0, 0, NULL, NULL); - if (!sc) - return 1; - conf->slab_cache = sc; - while (num--) { - sh = kmem_cache_alloc(sc, 
GFP_KERNEL); - if (!sh) - return 1; - memset(sh, 0, sizeof(*sh) + (devs-1)*sizeof(struct r5dev)); - sh->raid_conf = conf; - sh->lock = SPIN_LOCK_UNLOCKED; - - if (grow_buffers(sh, conf->raid_disks)) { - shrink_buffers(sh, conf->raid_disks); - kmem_cache_free(sc, sh); - return 1; - } - /* we just created an active stripe so... */ - atomic_set(&sh->count, 1); - atomic_inc(&conf->active_stripes); - INIT_LIST_HEAD(&sh->lru); - release_stripe(sh); - } - return 0; -} - -static void shrink_stripes(raid5_conf_t *conf) -{ - struct stripe_head *sh; - - while (1) { - spin_lock_irq(&conf->device_lock); - sh = get_free_stripe(conf); - spin_unlock_irq(&conf->device_lock); - if (!sh) - break; - if (atomic_read(&sh->count)) - BUG(); - shrink_buffers(sh, conf->raid_disks); - kmem_cache_free(conf->slab_cache, sh); - atomic_dec(&conf->active_stripes); - } - kmem_cache_destroy(conf->slab_cache); - conf->slab_cache = NULL; -} - -static void raid5_end_read_request (struct bio * bi) -{ - struct stripe_head *sh = bi->bi_private; - raid5_conf_t *conf = sh->raid_conf; - int disks = conf->raid_disks, i; - int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags); - - for (i=0 ; i<disks; i++) - if (bi == &sh->dev[i].req) - break; - - PRINTK("end_read_request %lu/%d, count: %d, uptodate %d.\n", sh->sector, i, atomic_read(&sh->count), uptodate); - if (i == disks) { - BUG(); - return; - } - - if (uptodate) { -#if 0 - struct bio *bio; - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - /* we can return a buffer if we bypassed the cache or - * if the top buffer is not in highmem. 
If there are - * multiple buffers, leave the extra work to - * handle_stripe - */ - buffer = sh->bh_read[i]; - if (buffer && - (!PageHighMem(buffer->b_page) - || buffer->b_page == bh->b_page ) - ) { - sh->bh_read[i] = buffer->b_reqnext; - buffer->b_reqnext = NULL; - } else - buffer = NULL; - spin_unlock_irqrestore(&conf->device_lock, flags); - if (sh->bh_page[i]==bh->b_page) - set_bit(BH_Uptodate, &bh->b_state); - if (buffer) { - if (buffer->b_page != bh->b_page) - memcpy(buffer->b_data, bh->b_data, bh->b_size); - buffer->b_end_io(buffer, 1); - } -#else - set_bit(R5_UPTODATE, &sh->dev[i].flags); -#endif -<<<<<<< found - } else { - md_error(conf->mddev, bh->b_dev); - clear_bit(BH_Uptodate, &bh->b_state); -||||||| expected - } else { - md_error(conf->mddev, bh->b_bdev); - clear_buffer_uptodate(bh); -======= - } else { - md_error(conf->mddev, bi->bi_bdev); - clear_bit(R5_UPTODATE, &sh->dev[i].flags); ->>>>>>> replacement - } -#if 0 - /* must restore b_page before unlocking buffer... */ - if (sh->bh_page[i] != bh->b_page) { - bh->b_page = sh->bh_page[i]; - bh->b_data = page_address(bh->b_page); - clear_bit(BH_Uptodate, &bh->b_state); - } -#endif -<<<<<<< found - clear_bit(BH_Lock, &bh->b_state); -||||||| expected - clear_buffer_locked(bh); -======= - clear_bit(R5_LOCKED, &sh->dev[i].flags); ->>>>>>> replacement - set_bit(STRIPE_HANDLE, &sh->state); - release_stripe(sh); -} - -static void raid5_end_write_request (struct bio *bi) -{ - struct stripe_head *sh = bi->bi_private; - raid5_conf_t *conf = sh->raid_conf; - int disks = conf->raid_disks, i; - unsigned long flags; - int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags); - - for (i=0 ; i<disks; i++) - if (bi == &sh->dev[i].req) - break; - - PRINTK("end_write_request %lu/%d, count %d, uptodate: %d.\n", sh->sector, i, atomic_read(&sh->count), uptodate); - if (i == disks) { - BUG(); - return; - } - - spin_lock_irqsave(&conf->device_lock, flags); - if (!uptodate) -<<<<<<< found - md_error(conf->mddev, bh->b_dev); - clear_bit(BH_Lock, &bh->b_state); 
-||||||| expected - md_error(conf->mddev, bh->b_bdev); - clear_buffer_locked(bh); -======= - md_error(conf->mddev, bi->bi_bdev); - - clear_bit(R5_LOCKED, &sh->dev[i].flags); ->>>>>>> replacement - set_bit(STRIPE_HANDLE, &sh->state); - __release_stripe(conf, sh); - spin_unlock_irqrestore(&conf->device_lock, flags); -} - - -static unsigned long compute_blocknr(struct stripe_head *sh, int i); - -static void raid5_build_block (struct stripe_head *sh, int i) -{ - raid5_conf_t *conf = sh->raid_conf; - struct r5dev *dev = &sh->dev[i]; - - bio_init(&dev->req); - dev->req.bi_io_vec = &dev->vec; - dev->req.bi_vcnt++; - dev->vec.bv_page = dev->page; - dev->vec.bv_len = STRIPE_SIZE; - dev->vec.bv_offset = 0; - -<<<<<<< found - bh->b_dev = conf->disks[i].dev; -||||||| expected - bh->b_dev = conf->disks[i].dev; - /* FIXME - later we will need bdev here */ -======= - dev->req.bi_bdev = conf->disks[i].bdev; - dev->req.bi_sector = sh->sector; ->>>>>>> replacement - dev->req.bi_private = sh; - - dev->flags = 0; - if (i != sh->pd_idx) -<<<<<<< found - bh->b_size = sh->size; - bh->b_list = BUF_LOCKED; - return bh; -||||||| expected - bh->b_size = sh->size; - return bh; -======= - dev->sector = compute_blocknr(sh, i); ->>>>>>> replacement -} - -static int error (mddev_t *mddev, kdev_t dev) -{ - raid5_conf_t *conf = (raid5_conf_t *) mddev->private; - mdp_super_t *sb = mddev->sb; - struct disk_info *disk; - int i; - - PRINTK("raid5: error called\n"); - - for (i = 0, disk = conf->disks; i < conf->raid_disks; i++, disk++) { - if (disk->dev == dev) { - if (disk->operational) { - disk->operational = 0; - mark_disk_faulty(sb->disks+disk->number); - mark_disk_nonsync(sb->disks+disk->number); - mark_disk_inactive(sb->disks+disk->number); - sb->active_disks--; - sb->working_disks--; - sb->failed_disks++; - mddev->sb_dirty = 1; - conf->working_disks--; - conf->failed_disks++; - md_wakeup_thread(conf->thread); - printk (KERN_ALERT - "raid5: Disk failure on %s, disabling device." 
- " Operation continuing on %d devices\n", - partition_name (dev), conf->working_disks); - } - return 0; - } - } - /* - * handle errors in spares (during reconstruction) - */ - if (conf->spare) { - disk = conf->spare; - if (disk->dev == dev) { - printk (KERN_ALERT - "raid5: Disk failure on spare %s\n", - partition_name (dev)); - if (!conf->spare->operational) { - /* probably a SET_DISK_FAULTY ioctl */ - return -EIO; - } - disk->operational = 0; - disk->write_only = 0; - conf->spare = NULL; - mark_disk_faulty(sb->disks+disk->number); - mark_disk_nonsync(sb->disks+disk->number); - mark_disk_inactive(sb->disks+disk->number); - sb->spare_disks--; - sb->working_disks--; - sb->failed_disks++; - - mddev->sb_dirty = 1; - md_wakeup_thread(conf->thread); - - return 0; - } - } - MD_BUG(); - return -EIO; -} - -/* - * Input: a 'big' sector number, - * Output: index of the data and parity disk, and the sector # in them. - */ -static unsigned long raid5_compute_sector(sector_t r_sector, unsigned int raid_disks, - unsigned int data_disks, unsigned int * dd_idx, - unsigned int * pd_idx, raid5_conf_t *conf) -{ - sector_t stripe; - unsigned long chunk_number; - unsigned int chunk_offset; - sector_t new_sector; - int sectors_per_chunk = conf->chunk_size >> 9; - - /* First compute the information on this sector */ - - /* - * Compute the chunk number and the sector offset inside the chunk - */ - chunk_number = r_sector / sectors_per_chunk; - chunk_offset = r_sector % sectors_per_chunk; - - /* - * Compute the stripe number - */ - stripe = chunk_number / data_disks; - - /* - * Compute the data disk and parity disk indexes inside the stripe - */ - *dd_idx = chunk_number % data_disks; - - /* - * Select the parity disk based on the user selected algorithm. 
- */ - if (conf->level == 4) - *pd_idx = data_disks; - else switch (conf->algorithm) { - case ALGORITHM_LEFT_ASYMMETRIC: - *pd_idx = data_disks - stripe % raid_disks; - if (*dd_idx >= *pd_idx) - (*dd_idx)++; - break; - case ALGORITHM_RIGHT_ASYMMETRIC: - *pd_idx = stripe % raid_disks; - if (*dd_idx >= *pd_idx) - (*dd_idx)++; - break; - case ALGORITHM_LEFT_SYMMETRIC: - *pd_idx = data_disks - stripe % raid_disks; - *dd_idx = (*pd_idx + 1 + *dd_idx) % raid_disks; - break; - case ALGORITHM_RIGHT_SYMMETRIC: - *pd_idx = stripe % raid_disks; - *dd_idx = (*pd_idx + 1 + *dd_idx) % raid_disks; - break; - default: - printk ("raid5: unsupported algorithm %d\n", conf->algorithm); - } - - /* - * Finally, compute the new sector number - */ - new_sector = stripe * sectors_per_chunk + chunk_offset; - return new_sector; -} - - -static sector_t compute_blocknr(struct stripe_head *sh, int i) -{ - raid5_conf_t *conf = sh->raid_conf; - int raid_disks = conf->raid_disks, data_disks = raid_disks - 1; - sector_t new_sector = sh->sector, check; - int sectors_per_chunk = conf->chunk_size >> 9; - sector_t stripe = new_sector / sectors_per_chunk; - int chunk_offset = new_sector % sectors_per_chunk; - int chunk_number, dummy1, dummy2, dd_idx = i; - sector_t r_sector; - - switch (conf->algorithm) { - case ALGORITHM_LEFT_ASYMMETRIC: - case ALGORITHM_RIGHT_ASYMMETRIC: - if (i > sh->pd_idx) - i--; - break; - case ALGORITHM_LEFT_SYMMETRIC: - case ALGORITHM_RIGHT_SYMMETRIC: - if (i < sh->pd_idx) - i += raid_disks; - i -= (sh->pd_idx + 1); - break; - default: - printk ("raid5: unsupported algorithm %d\n", conf->algorithm); - } - - chunk_number = stripe * data_disks + i; - r_sector = chunk_number * sectors_per_chunk + chunk_offset; - - check = raid5_compute_sector (r_sector, raid_disks, data_disks, &dummy1, &dummy2, conf); - if (check != sh->sector || dummy1 != dd_idx || dummy2 != sh->pd_idx) { - printk("compute_blocknr: map not correct\n"); - return 0; - } - return r_sector; -} - - - -/* - * Copy data 
between a page in the stripe cache, and one or more bion - * The page could align with the middle of the bio, or there could be - * several bion, each with several bio_vecs, which cover part of the page - * Multiple bion are linked together on bi_next. There may be extras - * at the end of this list. We ignore them. - */ -static void copy_data(int frombio, struct bio *bio, - struct page *page, - sector_t sector) -{ - char *pa = page_address(page); - struct bio_vec *bvl; - int i; - - for (;bio && bio->bi_sector < sector+STRIPE_SECTORS; - bio = bio->bi_next) { - int page_offset; - if (bio->bi_sector >= sector) - page_offset = (signed)(bio->bi_sector - sector) * 512; - else - page_offset = (signed)(sector - bio->bi_sector) * -512; - bio_for_each_segment(bvl, bio, i) { - char *ba = __bio_kmap(bio, i); - int len = bio_iovec_idx(bio,i)->bv_len; - int clen; - int b_offset = 0; - - if (page_offset < 0) { - b_offset = -page_offset; - page_offset += b_offset; - len -= b_offset; - } - - if (len > 0 && page_offset + len > STRIPE_SIZE) - clen = STRIPE_SIZE - page_offset; - else clen = len; - - if (len > 0) { - if (frombio) - memcpy(pa+page_offset, ba+b_offset, clen); - else - memcpy(ba+b_offset, pa+page_offset, clen); - } - __bio_kunmap(bio, i); - page_offset += len; - } - } -} - -#define check_xor() do { \ - if (count == MAX_XOR_BLOCKS) { \ - xor_block(count, STRIPE_SIZE, ptr); \ - count = 1; \ - } \ - } while(0) - - -static void compute_block(struct stripe_head *sh, int dd_idx) -{ - raid5_conf_t *conf = sh->raid_conf; - int i, count, disks = conf->raid_disks; - void *ptr[MAX_XOR_BLOCKS], *p; - - PRINTK("compute_block, stripe %lu, idx %d\n", sh->sector, dd_idx); - - ptr[0] = page_address(sh->dev[dd_idx].page); - memset(ptr[0], 0, STRIPE_SIZE); - count = 1; - for (i = disks ; i--; ) { - if (i == dd_idx) - continue; - p = page_address(sh->dev[i].page); - if (test_bit(R5_UPTODATE, &sh->dev[i].flags)) - ptr[count++] = p; - else - printk("compute_block() %d, stripe %lu, %d not 
present\n", dd_idx, sh->sector, i); - - check_xor(); - } - if (count != 1) -<<<<<<< found - xor_block(count, bh_ptr); - set_bit(BH_Uptodate, &sh->bh_cache[dd_idx]->b_state); -||||||| expected - xor_block(count, bh_ptr); - set_buffer_uptodate(sh->bh_cache[dd_idx]); -======= - xor_block(count, STRIPE_SIZE, ptr); - set_bit(R5_UPTODATE, &sh->dev[i].flags); ->>>>>>> replacement -} - -static void compute_parity(struct stripe_head *sh, int method) -{ - raid5_conf_t *conf = sh->raid_conf; - int i, pd_idx = sh->pd_idx, disks = conf->raid_disks, count; - void *ptr[MAX_XOR_BLOCKS]; - struct bio *chosen[MD_SB_DISKS]; - - PRINTK("compute_parity, stripe %lu, method %d\n", sh->sector, method); - memset(chosen, 0, sizeof(chosen)); - - count = 1; - ptr[0] = page_address(sh->dev[pd_idx].page); - switch(method) { - case READ_MODIFY_WRITE: - if (!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags)) - BUG(); - for (i=disks ; i-- ;) { - if (i==pd_idx) - continue; - if (sh->dev[i].towrite && - test_bit(R5_UPTODATE, &sh->dev[i].flags)) { - ptr[count++] = page_address(sh->dev[i].page); - chosen[i] = sh->dev[i].towrite; - sh->dev[i].towrite = NULL; - if (sh->dev[i].written) BUG(); - sh->dev[i].written = chosen[i]; - check_xor(); - } - } - break; - case RECONSTRUCT_WRITE: - memset(ptr[0], 0, STRIPE_SIZE); - for (i= disks; i-- ;) - if (i!=pd_idx && sh->dev[i].towrite) { - chosen[i] = sh->dev[i].towrite; - sh->dev[i].towrite = NULL; - if (sh->dev[i].written) BUG(); - sh->dev[i].written = chosen[i]; - } - break; - case CHECK_PARITY: - break; - } - if (count>1) { - xor_block(count, STRIPE_SIZE, ptr); - count = 1; - } - - for (i = disks; i--;) - if (chosen[i]) { - sector_t sector = sh->dev[i].sector; - copy_data(1, chosen[i], sh->dev[i].page, sector); - -<<<<<<< found - memcpy(bh->b_data, - bdata,sh->size); - bh_kunmap(chosen[i]); - set_bit(BH_Lock, &bh->b_state); - mark_buffer_uptodate(bh, 1); -||||||| expected - memcpy(bh->b_data, - bdata,sh->size); - bh_kunmap(chosen[i]); - set_buffer_locked(bh); - 
set_buffer_uptodate(bh); -======= - set_bit(R5_LOCKED, &sh->dev[i].flags); - set_bit(R5_UPTODATE, &sh->dev[i].flags); ->>>>>>> replacement - } - - switch(method) { - case RECONSTRUCT_WRITE: - case CHECK_PARITY: - for (i=disks; i--;) - if (i != pd_idx) { - ptr[count++] = page_address(sh->dev[i].page); - check_xor(); - } - break; - case READ_MODIFY_WRITE: - for (i = disks; i--;) - if (chosen[i]) { - ptr[count++] = page_address(sh->dev[i].page); - check_xor(); - } - } - if (count != 1) - xor_block(count, STRIPE_SIZE, ptr); - - if (method != CHECK_PARITY) { -<<<<<<< found - mark_buffer_uptodate(sh->bh_cache[pd_idx], 1); - set_bit(BH_Lock, &sh->bh_cache[pd_idx]->b_state); - } else - mark_buffer_uptodate(sh->bh_cache[pd_idx], 0); -||||||| expected - set_buffer_uptodate(sh->bh_cache[pd_idx]); - set_buffer_locked(sh->bh_cache[pd_idx]); - } else - clear_buffer_uptodate(sh->bh_cache[pd_idx]); -======= - set_bit(R5_UPTODATE, &sh->dev[pd_idx].flags); - set_bit(R5_LOCKED, &sh->dev[pd_idx].flags); - } else - clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags); ->>>>>>> replacement -} - -/* - * Each stripe/dev can have one or more bion attached. - * toread/towrite point to the first in a chain. - * The bi_next chain must be in order. 
- */ -static void add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite) -{ - struct bio **bip; - raid5_conf_t *conf = sh->raid_conf; - - PRINTK("adding bh b#%lu to stripe s#%lu\n", bi->bi_sector, sh->sector); - - - spin_lock(&sh->lock); - spin_lock_irq(&conf->device_lock); - if (forwrite) - bip = &sh->dev[dd_idx].towrite; - else - bip = &sh->dev[dd_idx].toread; - while (*bip && (*bip)->bi_sector < bi->bi_sector) - bip = & (*bip)->bi_next; -/* FIXME do I need to worry about overlapping bion */ - if (*bip && bi->bi_next && (*bip) != bi->bi_next) - BUG(); - if (*bip) - bi->bi_next = *bip; - *bip = bi; - bi->bi_phys_segments ++; - spin_unlock_irq(&conf->device_lock); - spin_unlock(&sh->lock); - - if (forwrite) { - /* check if page is coverred */ - sector_t sector = sh->dev[dd_idx].sector; - for (bi=sh->dev[dd_idx].towrite; - sector < sh->dev[dd_idx].sector + STRIPE_SECTORS && - bi && bi->bi_sector <= sector; - bi = bi->bi_next) { - if (bi->bi_sector + (bi->bi_size>>9) >= sector) - sector = bi->bi_sector + (bi->bi_size>>9); - } - if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS) - set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags); - } - - PRINTK("added bi b#%lu to stripe s#%lu, disk %d.\n", bi->bi_sector, sh->sector, dd_idx); -} - - -/* - * handle_stripe - do things to a stripe. - * - * We lock the stripe and then examine the state of various bits - * to see what needs to be done. - * Possible results: - * return some read request which now have data - * return some write requests which are safely on disc - * schedule a read on some buffers - * schedule a write of some buffers - * return confirmation of parity correctness - * - * Parity calculations are done inside the stripe lock - * buffers are taken off read_list or write_list, and bh_cache buffers - * get BH_Lock set before the stripe lock is released. 
- * - */ - -static void handle_stripe(struct stripe_head *sh) -{ - raid5_conf_t *conf = sh->raid_conf; - int disks = conf->raid_disks; - struct bio *return_bi= NULL; - struct bio *bi; - int action[MD_SB_DISKS]; - int i; - int syncing; - int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0; - int failed_num=0; - struct r5dev *dev; - - PRINTK("handling stripe %ld, cnt=%d, pd_idx=%d\n", sh->sector, atomic_read(&sh->count), sh->pd_idx); - memset(action, 0, sizeof(action)); - - spin_lock(&sh->lock); - clear_bit(STRIPE_HANDLE, &sh->state); - clear_bit(STRIPE_DELAYED, &sh->state); - - syncing = test_bit(STRIPE_SYNCING, &sh->state); - /* Now to look around and see what can be done */ - - for (i=disks; i--; ) { - dev = &sh->dev[i]; - PRINTK("check %d: state 0x%lx read %p write %p written %p\n", i, - dev->flags, dev->toread, dev->towrite, dev->written); - /* maybe we can reply to a read */ - if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread) { - struct bio *rbi, *rbi2; - PRINTK("Return read for disc %d\n", i); - spin_lock_irq(&conf->device_lock); - rbi = dev->toread; - dev->toread = NULL; - spin_unlock_irq(&conf->device_lock); - while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) { - copy_data(0, rbi, dev->page, dev->sector); - rbi2 = rbi->bi_next; - spin_lock_irq(&conf->device_lock); - if (--rbi->bi_phys_segments == 0) { - rbi->bi_next = return_bi; - return_bi = rbi; - } - spin_unlock_irq(&conf->device_lock); - rbi = rbi2; - } - } - - /* now count some things */ - if (test_bit(R5_LOCKED, &dev->flags)) locked++; - if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++; - - - if (dev->toread) to_read++; - if (dev->towrite) to_write++; - if (dev->written) written++; - if (!conf->disks[i].operational) { - failed++; - failed_num = i; - } - } - PRINTK("locked=%d uptodate=%d to_read=%d to_write=%d failed=%d failed_num=%d\n", - locked, uptodate, to_read, to_write, failed, failed_num); - /* check if the array has lost two devices and, if so, some requests 
might - * need to be failed - */ - if (failed > 1 && to_read+to_write) { - spin_lock_irq(&conf->device_lock); - for (i=disks; i--; ) { - /* fail all writes first */ - bi = sh->dev[i].towrite; - sh->dev[i].towrite = NULL; - if (bi) to_write--; - - while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){ - struct bio *nextbi = bi->bi_next; - clear_bit(BIO_UPTODATE, &bi->bi_flags); - if (--bi->bi_phys_segments == 0) { - bi->bi_next = return_bi; - return_bi = bi; - } - bi = nextbi; - } - /* fail any reads if this device is non-operational */ - if (!conf->disks[i].operational) { - bi = sh->dev[i].toread; - sh->dev[i].toread = NULL; - if (bi) to_read--; - while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){ - struct bio *nextbi = bi->bi_next; - clear_bit(BIO_UPTODATE, &bi->bi_flags); - if (--bi->bi_phys_segments == 0) { - bi->bi_next = return_bi; - return_bi = bi; - } - bi = nextbi; - } - } - } - spin_unlock_irq(&conf->device_lock); - } - if (failed > 1 && syncing) { - md_done_sync(conf->mddev, STRIPE_SECTORS,0); - clear_bit(STRIPE_SYNCING, &sh->state); - syncing = 0; - } - - /* might be able to return some write requests if the parity block - * is safe, or on a failed drive - */ - dev = &sh->dev[sh->pd_idx]; - if ( written && - ( (conf->disks[sh->pd_idx].operational && !test_bit(R5_LOCKED, &dev->flags) && - test_bit(R5_UPTODATE, &dev->flags)) - || (failed == 1 && failed_num == sh->pd_idx)) - ) { - /* any written block on an uptodate or failed drive can be returned */ - for (i=disks; i--; ) - if (sh->dev[i].written) { - dev = &sh->dev[i]; - if (!conf->disks[sh->pd_idx].operational || - (!test_bit(R5_LOCKED, &dev->flags) && test_bit(R5_UPTODATE, &dev->flags)) ) { - /* maybe we can return some write requests */ - struct bio *wbi, *wbi2; - PRINTK("Return write for disc %d\n", i); - wbi = dev->written; - dev->written = NULL; - while (wbi && wbi->bi_sector < dev->sector + STRIPE_SECTORS) { - wbi2 = wbi->bi_next; - if (--wbi->bi_phys_segments == 0) { - 
wbi->bi_next = return_bi; - return_bi = wbi; - } - wbi = wbi2; - } - } - } - } - - /* Now we might consider reading some blocks, either to check/generate - * parity, or to satisfy requests - */ - if (to_read || (syncing && (uptodate+failed < disks))) { - for (i=disks; i--;) { - dev = &sh->dev[i]; - if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) && - (dev->toread || syncing || (failed && sh->dev[failed_num].toread))) { - /* we would like to get this block, possibly - * by computing it, but we might not be able to - */ - if (uptodate == disks-1) { - PRINTK("Computing block %d\n", i); - compute_block(sh, i); - uptodate++; - } else if (conf->disks[i].operational) { -<<<<<<< found - set_bit(BH_Lock, &bh->b_state); - action[i] = READ+1; -||||||| expected - set_buffer_locked(bh); - action[i] = READ+1; -======= - set_bit(R5_LOCKED, &dev->flags); - action[i] = READ+1; -#if 0 ->>>>>>> replacement - /* if I am just reading this block and we don't have - a failed drive, or any pending writes then sidestep the cache */ - if (sh->bh_read[i] && !sh->bh_read[i]->b_reqnext && - ! 
syncing && !failed && !to_write) { - sh->bh_cache[i]->b_page = sh->bh_read[i]->b_page; - sh->bh_cache[i]->b_data = sh->bh_read[i]->b_data; - } -#endif - locked++; - PRINTK("Reading block %d (sync=%d)\n", i, syncing); - if (syncing) - md_sync_acct(conf->disks[i].dev, STRIPE_SECTORS); - } - } - } - set_bit(STRIPE_HANDLE, &sh->state); - } - - /* now to consider writing and what else, if anything should be read */ - if (to_write) { - int rmw=0, rcw=0; - for (i=disks ; i--;) { - /* would I have to read this buffer for read_modify_write */ - dev = &sh->dev[i]; - if ((dev->towrite || i == sh->pd_idx) && - (!test_bit(R5_LOCKED, &dev->flags) -#if 0 -|| sh->bh_page[i]!=bh->b_page -#endif - ) && - !test_bit(R5_UPTODATE, &dev->flags)) { - if (conf->disks[i].operational -/* && !(conf->resync_parity && i == sh->pd_idx) */ - ) - rmw++; - else rmw += 2*disks; /* cannot read it */ - } - /* Would I have to read this buffer for reconstruct_write */ - if (!test_bit(R5_OVERWRITE, &dev->flags) && i != sh->pd_idx && - (!test_bit(R5_LOCKED, &dev->flags) -#if 0 -|| sh->bh_page[i] != bh->b_page -#endif - ) && - !test_bit(R5_UPTODATE, &dev->flags)) { - if (conf->disks[i].operational) rcw++; - else rcw += 2*disks; - } - } - PRINTK("for sector %ld, rmw=%d rcw=%d\n", sh->sector, rmw, rcw); - set_bit(STRIPE_HANDLE, &sh->state); - if (rmw < rcw && rmw > 0) - /* prefer read-modify-write, but need to get some data */ - for (i=disks; i--;) { - dev = &sh->dev[i]; - if ((dev->towrite || i == sh->pd_idx) && - !test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) && - conf->disks[i].operational) { - if (test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) - { - PRINTK("Read_old block %d for r-m-w\n", i); -<<<<<<< found - set_bit(BH_Lock, &bh->b_state); -||||||| expected - set_buffer_locked(bh); -======= - set_bit(R5_LOCKED, &dev->flags); ->>>>>>> replacement - action[i] = READ+1; - locked++; - } else { - set_bit(STRIPE_DELAYED, &sh->state); - set_bit(STRIPE_HANDLE, &sh->state); - } - } - } - 
if (rcw <= rmw && rcw > 0) - /* want reconstruct write, but need to get some data */ - for (i=disks; i--;) { - dev = &sh->dev[i]; - if (!test_bit(R5_OVERWRITE, &dev->flags) && i != sh->pd_idx && - !test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) && - conf->disks[i].operational) { - if (test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) - { - PRINTK("Read_old block %d for Reconstruct\n", i); -<<<<<<< found - set_bit(BH_Lock, &bh->b_state); -||||||| expected - set_buffer_locked(bh); -======= - set_bit(R5_LOCKED, &dev->flags); ->>>>>>> replacement - action[i] = READ+1; - locked++; - } else { - set_bit(STRIPE_DELAYED, &sh->state); - set_bit(STRIPE_HANDLE, &sh->state); - } - } - } - /* now if nothing is locked, and if we have enough data, we can start a write request */ - if (locked == 0 && (rcw == 0 ||rmw == 0)) { - PRINTK("Computing parity...\n"); - compute_parity(sh, rcw==0 ? RECONSTRUCT_WRITE : READ_MODIFY_WRITE); - /* now every locked buffer is ready to be written */ - for (i=disks; i--;) - if (test_bit(R5_LOCKED, &sh->dev[i].flags)) { - PRINTK("Writing block %d\n", i); - locked++; - action[i] = WRITE+1; - if (!conf->disks[i].operational - || (i==sh->pd_idx && failed == 0)) - set_bit(STRIPE_INSYNC, &sh->state); - } - if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) { - atomic_dec(&conf->preread_active_stripes); - if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD) - md_wakeup_thread(conf->thread); - } - } - } - - /* maybe we need to check and possibly fix the parity for this stripe - * Any reads will already have been scheduled, so we just see if enough data - * is available - */ - if (syncing && locked == 0 && - !test_bit(STRIPE_INSYNC, &sh->state) && failed <= 1) { - set_bit(STRIPE_HANDLE, &sh->state); - if (failed == 0) { - char *pagea; - if (uptodate != disks) - BUG(); - compute_parity(sh, CHECK_PARITY); - uptodate--; - pagea = page_address(sh->dev[sh->pd_idx].page); - if ((*(u32*)pagea) == 0 && - !memcmp(pagea, pagea+4, 
STRIPE_SIZE-4)) { - /* parity is correct (on disc, not in buffer any more) */ - set_bit(STRIPE_INSYNC, &sh->state); - } - } - if (!test_bit(STRIPE_INSYNC, &sh->state)) { - struct disk_info *spare; - if (failed==0) - failed_num = sh->pd_idx; - /* should be able to compute the missing block and write it to spare */ - if (!test_bit(R5_UPTODATE, &sh->dev[failed_num].flags)) { - if (uptodate+1 != disks) - BUG(); - compute_block(sh, failed_num); - uptodate++; - } - if (uptodate != disks) - BUG(); -<<<<<<< found - bh = sh->bh_cache[failed_num]; - set_bit(BH_Lock, &bh->b_state); -||||||| expected - bh = sh->bh_cache[failed_num]; - set_buffer_locked(bh); -======= - dev = &sh->dev[failed_num]; - set_bit(R5_LOCKED, &dev->flags); ->>>>>>> replacement - action[failed_num] = WRITE+1; - locked++; - set_bit(STRIPE_INSYNC, &sh->state); - if (conf->disks[failed_num].operational) - md_sync_acct(conf->disks[failed_num].dev, STRIPE_SECTORS); - else if ((spare=conf->spare)) - md_sync_acct(spare->dev, STRIPE_SECTORS); - - } - } - if (syncing && locked == 0 && test_bit(STRIPE_INSYNC, &sh->state)) { - md_done_sync(conf->mddev, STRIPE_SECTORS,1); - clear_bit(STRIPE_SYNCING, &sh->state); - } - - spin_unlock(&sh->lock); - - while ((bi=return_bi)) { - return_bi = bi->bi_next; - bi->bi_next = NULL; - bi->bi_end_io(bi); - } - for (i=disks; i-- ;) - if (action[i]) { - struct bio *bi = &sh->dev[i].req; - struct disk_info *spare = conf->spare; - int skip = 0; - if (action[i] == READ+1) - bi->bi_end_io = raid5_end_read_request; - else - bi->bi_end_io = raid5_end_write_request; - if (conf->disks[i].operational) - bi->bi_bdev = conf->disks[i].bdev; - else if (spare && action[i] == WRITE+1) - bi->bi_bdev = spare->bdev; - else skip=1; - if (!skip) { - PRINTK("for %ld schedule op %d on disc %d\n", sh->sector, action[i]-1, i); - atomic_inc(&sh->count); - bi->bi_sector = sh->sector; - if (action[i] == READ+1) - bi->bi_rw = 0; - else - bi->bi_rw = 1; - bi->bi_flags = 0; - bi->bi_vcnt = 1; - bi->bi_idx = 0; 
-		bi->bi_io_vec = &sh->dev[i].vec;
-		bi->bi_size = STRIPE_SIZE;
-		bi->bi_next = NULL;
-		generic_make_request(bi);
-	} else {
-		PRINTK("skip op %d on disc %d for sector %ld\n", action[i]-1, i, sh->sector);
-<<<<<<< found
-		clear_bit(BH_Lock, &bh->b_state);
-||||||| expected
-		clear_buffer_locked(bh);
-=======
-		clear_bit(R5_LOCKED, &dev->flags);
->>>>>>> replacement
-		set_bit(STRIPE_HANDLE, &sh->state);
-	}
-	}
-}
-
-static inline void raid5_activate_delayed(raid5_conf_t *conf)
-{
-	if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD) {
-		while (!list_empty(&conf->delayed_list)) {
-			struct list_head *l = conf->delayed_list.next;
-			struct stripe_head *sh;
-			sh = list_entry(l, struct stripe_head, lru);
-			list_del_init(l);
-			clear_bit(STRIPE_DELAYED, &sh->state);
-			if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
-				atomic_inc(&conf->preread_active_stripes);
-			list_add_tail(&sh->lru, &conf->handle_list);
-		}
-	}
-}
-static void raid5_unplug_device(void *data)
-{
-	raid5_conf_t *conf = (raid5_conf_t *)data;
-	unsigned long flags;
-
-	spin_lock_irqsave(&conf->device_lock, flags);
-
-	raid5_activate_delayed(conf);
-
-	conf->plugged = 0;
-	md_wakeup_thread(conf->thread);
-
-	spin_unlock_irqrestore(&conf->device_lock, flags);
-}
-
-static inline void raid5_plug_device(raid5_conf_t *conf)
-{
-	spin_lock_irq(&conf->device_lock);
-	if (list_empty(&conf->delayed_list))
-		if (!conf->plugged) {
-			conf->plugged = 1;
-			queue_task(&conf->plug_tq, &tq_disk);
-		}
-	spin_unlock_irq(&conf->device_lock);
-}
-
-static int make_request (mddev_t *mddev, int rw, struct bio * bi)
-{
-	raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
-	const unsigned int raid_disks = conf->raid_disks;
-	const unsigned int data_disks = raid_disks - 1;
-	unsigned int dd_idx, pd_idx;
-	sector_t new_sector;
-	sector_t logical_sector, last_sector;
-	int read_ahead = 0;
-
-	struct stripe_head *sh;
-
-	if (rw == READA) {
-		rw = READ;
-		read_ahead=1;
-	}
-
-	logical_sector = bi->bi_sector & ~(STRIPE_SECTORS-1);
-	last_sector = bi->bi_sector + (bi->bi_size>>9);
-
-	bi->bi_next = NULL;
-	set_bit(BIO_UPTODATE, &bi->bi_flags); /* will be cleared if error detected */
-	bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
-	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
-
-		new_sector = raid5_compute_sector(logical_sector,
-			raid_disks, data_disks, &dd_idx, &pd_idx, conf);
-
-		PRINTK("raid5: make_request, sector %ul logical %ul\n",
-			new_sector, logical_sector);
-
-		sh = get_active_stripe(conf, new_sector, pd_idx, read_ahead);
-		if (sh) {
-
-			add_stripe_bio(sh, bi, dd_idx, rw);
-
-			raid5_plug_device(conf);
-			handle_stripe(sh);
-			release_stripe(sh);
-		}
-<<<<<<< found
-	} else
-		bh->b_end_io(bh, test_bit(BH_Uptodate, &bh->b_state));
-||||||| expected
-	} else
-		bh->b_end_io(bh, buffer_uptodate(bh));
-=======
-	}
-	spin_lock_irq(&conf->device_lock);
-	if (--bi->bi_phys_segments == 0)
-		bi->bi_end_io(bi);
-	spin_unlock_irq(&conf->device_lock);
->>>>>>> replacement
-	return 0;
-}
-
-<<<<<<< found
-/*
- * Determine correct block size for this device.
- */
-unsigned int device_bsize (kdev_t dev)
-{
-	unsigned int i, correct_size;
-
-	correct_size = BLOCK_SIZE;
-	if (blksize_size[MAJOR(dev)]) {
-		i = blksize_size[MAJOR(dev)][MINOR(dev)];
-		if (i)
-			correct_size = i;
-	}
-
-	return correct_size;
-}
-
-||||||| expected
-=======
-/* FIXME go_faster isn't used */
->>>>>>> replacement
-static int sync_request (mddev_t *mddev, sector_t sector_nr, int go_faster)
-{
-	raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
-	struct stripe_head *sh;
-	int sectors_per_chunk = conf->chunk_size >> 9;
-	unsigned long stripe = sector_nr/sectors_per_chunk;
-	int chunk_offset = sector_nr % sectors_per_chunk;
-	int dd_idx, pd_idx;
-	unsigned long first_sector;
-	int raid_disks = conf->raid_disks;
-	int data_disks = raid_disks-1;
-
-	first_sector = raid5_compute_sector(stripe*data_disks*sectors_per_chunk
-		+ chunk_offset, raid_disks, data_disks, &dd_idx, &pd_idx, conf);
-	sh = get_active_stripe(conf, sector_nr, pd_idx, 0);
-	spin_lock(&sh->lock);
-	set_bit(STRIPE_SYNCING, &sh->state);
-	clear_bit(STRIPE_INSYNC, &sh->state);
-	spin_unlock(&sh->lock);
-
-	handle_stripe(sh);
-	release_stripe(sh);
-
-	return STRIPE_SECTORS;
-}
-
-/*
- * This is our raid5 kernel thread.
- *
- * We scan the hash table for stripes which can be handled now.
- * During the scan, completed stripes are saved for us by the interrupt
- * handler, so that they will not have to wait for our next wakeup.
- */
-static void raid5d (void *data)
-{
-	struct stripe_head *sh;
-	raid5_conf_t *conf = data;
-	mddev_t *mddev = conf->mddev;
-	int handled;
-
-	PRINTK("+++ raid5d active\n");
-
-	handled = 0;
-
-	if (mddev->sb_dirty)
-		md_update_sb(mddev);
-	spin_lock_irq(&conf->device_lock);
-	while (1) {
-		struct list_head *first;
-
-		if (list_empty(&conf->handle_list) &&
-		    atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD &&
-		    !conf->plugged &&
-		    !list_empty(&conf->delayed_list))
-			raid5_activate_delayed(conf);
-
-		if (list_empty(&conf->handle_list))
-			break;
-
-		first = conf->handle_list.next;
-		sh = list_entry(first, struct stripe_head, lru);
-
-		list_del_init(first);
-		atomic_inc(&sh->count);
-		if (atomic_read(&sh->count)!= 1)
-			BUG();
-		spin_unlock_irq(&conf->device_lock);
-
-		handled++;
-		handle_stripe(sh);
-		release_stripe(sh);
-
-		spin_lock_irq(&conf->device_lock);
-	}
-	PRINTK("%d stripes handled\n", handled);
-
-	spin_unlock_irq(&conf->device_lock);
-
-	PRINTK("--- raid5d inactive\n");
-}
-
-/*
- * Private kernel thread for parity reconstruction after an unclean
- * shutdown. Reconstruction on spare drives in case of a failed drive
- * is done by the generic mdsyncd.
- */
-static void raid5syncd (void *data)
-{
-	raid5_conf_t *conf = data;
-	mddev_t *mddev = conf->mddev;
-
-	if (!conf->resync_parity)
-		return;
-	if (conf->resync_parity == 2)
-		return;
-	down(&mddev->recovery_sem);
-	if (md_do_sync(mddev,NULL)) {
-		up(&mddev->recovery_sem);
-		printk("raid5: resync aborted!\n");
-		return;
-	}
-	conf->resync_parity = 0;
-	up(&mddev->recovery_sem);
-	printk("raid5: resync finished.\n");
-}
-
-static int run (mddev_t *mddev)
-{
-	raid5_conf_t *conf;
-	int i, j, raid_disk, memory;
-	mdp_super_t *sb = mddev->sb;
-	mdp_disk_t *desc;
-	mdk_rdev_t *rdev;
-	struct disk_info *disk;
-	struct list_head *tmp;
-	int start_recovery = 0;
-
-	MOD_INC_USE_COUNT;
-
-	if (sb->level != 5 && sb->level != 4) {
-		printk("raid5: md%d: raid level not set to 4/5 (%d)\n", mdidx(mddev), sb->level);
-		MOD_DEC_USE_COUNT;
-		return -EIO;
-	}
-
-	mddev->private = kmalloc (sizeof (raid5_conf_t), GFP_KERNEL);
-	if ((conf = mddev->private) == NULL)
-		goto abort;
-	memset (conf, 0, sizeof (*conf));
-	conf->mddev = mddev;
-
-	if ((conf->stripe_hashtbl = (struct stripe_head **) __get_free_pages(GFP_ATOMIC, HASH_PAGES_ORDER)) == NULL)
-		goto abort;
-	memset(conf->stripe_hashtbl, 0, HASH_PAGES * PAGE_SIZE);
-
-	conf->device_lock = SPIN_LOCK_UNLOCKED;
-	init_waitqueue_head(&conf->wait_for_stripe);
-	INIT_LIST_HEAD(&conf->handle_list);
-	INIT_LIST_HEAD(&conf->delayed_list);
-	INIT_LIST_HEAD(&conf->inactive_list);
-	atomic_set(&conf->active_stripes, 0);
-	atomic_set(&conf->preread_active_stripes, 0);
-
-	conf->plugged = 0;
-	conf->plug_tq.sync = 0;
-	conf->plug_tq.routine = &raid5_unplug_device;
-	conf->plug_tq.data = conf;
-
-	PRINTK("raid5: run(md%d) called.\n", mdidx(mddev));
-
-	ITERATE_RDEV(mddev,rdev,tmp) {
-		/*
-		 * This is important -- we are using the descriptor on
-		 * the disk only to get a pointer to the descriptor on
-		 * the main superblock, which might be more recent.
-		 */
-		desc = sb->disks + rdev->desc_nr;
-		raid_disk = desc->raid_disk;
-		disk = conf->disks + raid_disk;
-
-		if (disk_faulty(desc)) {
-			printk(KERN_ERR "raid5: disabled device %s (errors detected)\n", partition_name(rdev->dev));
-			if (!rdev->faulty) {
-				MD_BUG();
-				goto abort;
-			}
-			disk->number = desc->number;
-			disk->raid_disk = raid_disk;
-			disk->dev = rdev->dev;
-			disk->bdev = rdev->bdev;
-
-			disk->operational = 0;
-			disk->write_only = 0;
-			disk->spare = 0;
-			disk->used_slot = 1;
-			continue;
-		}
-		if (disk_active(desc)) {
-			if (!disk_sync(desc)) {
-				printk(KERN_ERR "raid5: disabled device %s (not in sync)\n", partition_name(rdev->dev));
-				MD_BUG();
-				goto abort;
-			}
-			if (raid_disk > sb->raid_disks) {
-				printk(KERN_ERR "raid5: disabled device %s (inconsistent descriptor)\n", partition_name(rdev->dev));
-				continue;
-			}
-			if (disk->operational) {
-				printk(KERN_ERR "raid5: disabled device %s (device %d already operational)\n", partition_name(rdev->dev), raid_disk);
-				continue;
-			}
-			printk(KERN_INFO "raid5: device %s operational as raid disk %d\n", partition_name(rdev->dev), raid_disk);
-
-			disk->number = desc->number;
-			disk->raid_disk = raid_disk;
-			disk->dev = rdev->dev;
-			disk->bdev = rdev->bdev;
-			disk->operational = 1;
-			disk->used_slot = 1;
-
-			conf->working_disks++;
-		} else {
-			/*
-			 * Must be a spare disk ..
-			 */
-			printk(KERN_INFO "raid5: spare disk %s\n", partition_name(rdev->dev));
-			disk->number = desc->number;
-			disk->raid_disk = raid_disk;
-			disk->dev = rdev->dev;
-			disk->bdev = rdev->bdev;
-
-			disk->operational = 0;
-			disk->write_only = 0;
-			disk->spare = 1;
-			disk->used_slot = 1;
-		}
-	}
-
-	for (i = 0; i < MD_SB_DISKS; i++) {
-		desc = sb->disks + i;
-		raid_disk = desc->raid_disk;
-		disk = conf->disks + raid_disk;
-
-		if (disk_faulty(desc) && (raid_disk < sb->raid_disks) &&
-		    !conf->disks[raid_disk].used_slot) {
-
-			disk->number = desc->number;
-			disk->raid_disk = raid_disk;
-			disk->dev = MKDEV(0,0);
-			disk->bdev = NULL;
-
-			disk->operational = 0;
-			disk->write_only = 0;
-			disk->spare = 0;
-			disk->used_slot = 1;
-		}
-	}
-
-	conf->raid_disks = sb->raid_disks;
-	/*
-	 * 0 for a fully functional array, 1 for a degraded array.
-	 */
-	conf->failed_disks = conf->raid_disks - conf->working_disks;
-	conf->mddev = mddev;
-	conf->chunk_size = sb->chunk_size;
-	conf->level = sb->level;
-	conf->algorithm = sb->layout;
-	conf->max_nr_stripes = NR_STRIPES;
-
-#if 0
-	for (i = 0; i < conf->raid_disks; i++) {
-		if (!conf->disks[i].used_slot) {
-			MD_BUG();
-			goto abort;
-		}
-	}
-#endif
-	if (!conf->chunk_size || conf->chunk_size % 4) {
-		printk(KERN_ERR "raid5: invalid chunk size %d for md%d\n", conf->chunk_size, mdidx(mddev));
-		goto abort;
-	}
-	if (conf->algorithm > ALGORITHM_RIGHT_SYMMETRIC) {
-		printk(KERN_ERR "raid5: unsupported parity algorithm %d for md%d\n", conf->algorithm, mdidx(mddev));
-		goto abort;
-	}
-	if (conf->failed_disks > 1) {
-		printk(KERN_ERR "raid5: not enough operational devices for md%d (%d/%d failed)\n", mdidx(mddev), conf->failed_disks, conf->raid_disks);
-		goto abort;
-	}
-
-	if (conf->working_disks != sb->raid_disks) {
-		printk(KERN_ALERT "raid5: md%d, not all disks are operational -- trying to recover array\n", mdidx(mddev));
-		start_recovery = 1;
-	}
-
-	{
-		const char * name = "raid5d";
-
-		conf->thread = md_register_thread(raid5d, conf, name);
-		if (!conf->thread) {
-			printk(KERN_ERR "raid5: couldn't allocate thread for md%d\n", mdidx(mddev));
-			goto abort;
-		}
-	}
-
-	memory = conf->max_nr_stripes * (sizeof(struct stripe_head) +
-		 conf->raid_disks * ((sizeof(struct buffer_head) + PAGE_SIZE))) / 1024;
-	if (grow_stripes(conf, conf->max_nr_stripes)) {
-		printk(KERN_ERR "raid5: couldn't allocate %dkB for buffers\n", memory);
-		shrink_stripes(conf);
-		goto abort;
-	} else
-		printk(KERN_INFO "raid5: allocated %dkB for md%d\n", memory, mdidx(mddev));
-
-	/*
-	 * Regenerate the "device is in sync with the raid set" bit for
-	 * each device.
-	 */
-	for (i = 0; i < MD_SB_DISKS ; i++) {
-		mark_disk_nonsync(sb->disks + i);
-		for (j = 0; j < sb->raid_disks; j++) {
-			if (!conf->disks[j].operational)
-				continue;
-			if (sb->disks[i].number == conf->disks[j].number)
-				mark_disk_sync(sb->disks + i);
-		}
-	}
-	sb->active_disks = conf->working_disks;
-
-	if (sb->active_disks == sb->raid_disks)
-		printk("raid5: raid level %d set md%d active with %d out of %d devices, algorithm %d\n", conf->level, mdidx(mddev), sb->active_disks, sb->raid_disks, conf->algorithm);
-	else
-		printk(KERN_ALERT "raid5: raid level %d set md%d active with %d out of %d devices, algorithm %d\n", conf->level, mdidx(mddev), sb->active_disks, sb->raid_disks, conf->algorithm);
-
-	if (!start_recovery && !(sb->state & (1 << MD_SB_CLEAN))) {
-		const char * name = "raid5syncd";
-
-		conf->resync_thread = md_register_thread(raid5syncd, conf,name);
-		if (!conf->resync_thread) {
-			printk(KERN_ERR "raid5: couldn't allocate thread for md%d\n", mdidx(mddev));
-			goto abort;
-		}
-
-		printk("raid5: raid set md%d not clean; reconstructing parity\n", mdidx(mddev));
-		conf->resync_parity = 1;
-		md_wakeup_thread(conf->resync_thread);
-	}
-
-	print_raid5_conf(conf);
-	if (start_recovery)
-		md_recover_arrays();
-	print_raid5_conf(conf);
-
-	/* Ok, everything is just fine now */
-	return (0);
-abort:
-	if (conf) {
-		print_raid5_conf(conf);
-		if (conf->stripe_hashtbl)
-			free_pages((unsigned long) conf->stripe_hashtbl,
-				   HASH_PAGES_ORDER);
-		kfree(conf);
-	}
-	mddev->private = NULL;
-	printk(KERN_ALERT "raid5: failed to run raid set md%d\n", mdidx(mddev));
-	MOD_DEC_USE_COUNT;
-	return -EIO;
-}
-
-static int stop_resync (mddev_t *mddev)
-{
-	raid5_conf_t *conf = mddev_to_conf(mddev);
-	mdk_thread_t *thread = conf->resync_thread;
-
-	if (thread) {
-		if (conf->resync_parity) {
-			conf->resync_parity = 2;
-			md_interrupt_thread(thread);
-			printk(KERN_INFO "raid5: parity resync was not fully finished, restarting next time.\n");
-			return 1;
-		}
-		return 0;
-	}
-	return 0;
-}
-
-static int restart_resync (mddev_t *mddev)
-{
-	raid5_conf_t *conf = mddev_to_conf(mddev);
-
-	if (conf->resync_parity) {
-		if (!conf->resync_thread) {
-			MD_BUG();
-			return 0;
-		}
-		printk("raid5: waking up raid5resync.\n");
-		conf->resync_parity = 1;
-		md_wakeup_thread(conf->resync_thread);
-		return 1;
-	} else
-		printk("raid5: no restart-resync needed.\n");
-	return 0;
-}
-
-
-static int stop (mddev_t *mddev)
-{
-	raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
-
-	if (conf->resync_thread)
-		md_unregister_thread(conf->resync_thread);
-	md_unregister_thread(conf->thread);
-	shrink_stripes(conf);
-	free_pages((unsigned long) conf->stripe_hashtbl, HASH_PAGES_ORDER);
-	kfree(conf);
-	mddev->private = NULL;
-	MOD_DEC_USE_COUNT;
-	return 0;
-}
-
-#if RAID5_DEBUG
-static void print_sh (struct stripe_head *sh)
-{
-	int i;
-
-	printk("sh %lu, pd_idx %d, state %ld.\n", sh->sector, sh->pd_idx, sh->state);
-	printk("sh %lu, count %d.\n", sh->sector, atomic_read(&sh->count));
-	printk("sh %lu, ", sh->sector);
-	for (i = 0; i < sh->raid_conf->raid_disks; i++) {
-		printk("(cache%d: %p %ld) ", i, sh->dev[i].page, sh->dev[i].flags);
-	}
-	printk("\n");
-}
-
-static void printall (raid5_conf_t *conf)
-{
-	struct stripe_head *sh;
-	int i;
-
-	spin_lock_irq(&conf->device_lock);
-	for (i = 0; i < NR_HASH; i++) {
-		sh = conf->stripe_hashtbl[i];
-		for (; sh; sh = sh->hash_next) {
-			if (sh->raid_conf != conf)
-				continue;
-			print_sh(sh);
-		}
-	}
-	spin_unlock_irq(&conf->device_lock);
-
-	PRINTK("--- raid5d inactive\n");
-}
-#endif
-
-static void status (struct seq_file *seq, mddev_t *mddev)
-{
-	raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
-	mdp_super_t *sb = mddev->sb;
-	int i;
-
-	seq_printf (seq, " level %d, %dk chunk, algorithm %d", sb->level, sb->chunk_size >> 10, sb->layout);
-	seq_printf (seq, " [%d/%d] [", conf->raid_disks, conf->working_disks);
-	for (i = 0; i < conf->raid_disks; i++)
-		seq_printf (seq, "%s", conf->disks[i].operational ? "U" : "_");
-	seq_printf (seq, "]");
-#if RAID5_DEBUG
-#define D(x) \
-	seq_printf (seq, "<"#x":%d>", atomic_read(&conf->x))
-	printall(conf);
-#endif
-
-}
-
-static void print_raid5_conf (raid5_conf_t *conf)
-{
-	int i;
-	struct disk_info *tmp;
-
-	printk("RAID5 conf printout:\n");
-	if (!conf) {
-		printk("(conf==NULL)\n");
-		return;
-	}
-	printk(" --- rd:%d wd:%d fd:%d\n", conf->raid_disks,
-		 conf->working_disks, conf->failed_disks);
-
-#if RAID5_DEBUG
-	for (i = 0; i < MD_SB_DISKS; i++) {
-#else
-	for (i = 0; i < conf->working_disks+conf->failed_disks; i++) {
-#endif
-		tmp = conf->disks + i;
-		printk(" disk %d, s:%d, o:%d, n:%d rd:%d us:%d dev:%s\n",
-			i, tmp->spare,tmp->operational,
-			tmp->number,tmp->raid_disk,tmp->used_slot,
-			partition_name(tmp->dev));
-	}
-}
-
-static int diskop(mddev_t *mddev, mdp_disk_t **d, int state)
-{
-	int err = 0;
-	int i, failed_disk=-1, spare_disk=-1, removed_disk=-1, added_disk=-1;
-	raid5_conf_t *conf = mddev->private;
-	struct disk_info *tmp, *sdisk, *fdisk, *rdisk, *adisk;
-	mdp_super_t *sb = mddev->sb;
-	mdp_disk_t *failed_desc, *spare_desc, *added_desc;
-	mdk_rdev_t *spare_rdev, *failed_rdev;
-
-	print_raid5_conf(conf);
-	spin_lock_irq(&conf->device_lock);
-	/*
-	 * find the disk ...
-	 */
-	switch (state) {
-
-	case DISKOP_SPARE_ACTIVE:
-
-		/*
-		 * Find the failed disk within the RAID5 configuration ...
-		 * (this can only be in the first conf->raid_disks part)
-		 */
-		for (i = 0; i < conf->raid_disks; i++) {
-			tmp = conf->disks + i;
-			if ((!tmp->operational && !tmp->spare) ||
-			    !tmp->used_slot) {
-				failed_disk = i;
-				break;
-			}
-		}
-		/*
-		 * When we activate a spare disk we _must_ have a disk in
-		 * the lower (active) part of the array to replace.
-		 */
-		if ((failed_disk == -1) || (failed_disk >= conf->raid_disks)) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-		/* fall through */
-
-	case DISKOP_SPARE_WRITE:
-	case DISKOP_SPARE_INACTIVE:
-
-		/*
-		 * Find the spare disk ... (can only be in the 'high'
-		 * area of the array)
-		 */
-		for (i = conf->raid_disks; i < MD_SB_DISKS; i++) {
-			tmp = conf->disks + i;
-			if (tmp->spare && tmp->number == (*d)->number) {
-				spare_disk = i;
-				break;
-			}
-		}
-		if (spare_disk == -1) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-		break;
-
-	case DISKOP_HOT_REMOVE_DISK:
-
-		for (i = 0; i < MD_SB_DISKS; i++) {
-			tmp = conf->disks + i;
-			if (tmp->used_slot && (tmp->number == (*d)->number)) {
-				if (tmp->operational) {
-					err = -EBUSY;
-					goto abort;
-				}
-				removed_disk = i;
-				break;
-			}
-		}
-		if (removed_disk == -1) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-		break;
-
-	case DISKOP_HOT_ADD_DISK:
-
-		for (i = conf->raid_disks; i < MD_SB_DISKS; i++) {
-			tmp = conf->disks + i;
-			if (!tmp->used_slot) {
-				added_disk = i;
-				break;
-			}
-		}
-		if (added_disk == -1) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-		break;
-	}
-
-	switch (state) {
-	/*
-	 * Switch the spare disk to write-only mode:
-	 */
-	case DISKOP_SPARE_WRITE:
-		if (conf->spare) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-		sdisk = conf->disks + spare_disk;
-		sdisk->operational = 1;
-		sdisk->write_only = 1;
-		conf->spare = sdisk;
-		break;
-	/*
-	 * Deactivate a spare disk:
-	 */
-	case DISKOP_SPARE_INACTIVE:
-		sdisk = conf->disks + spare_disk;
-		sdisk->operational = 0;
-		sdisk->write_only = 0;
-		/*
-		 * Was the spare being resynced?
-		 */
-		if (conf->spare == sdisk)
-			conf->spare = NULL;
-		break;
-	/*
-	 * Activate (mark read-write) the (now sync) spare disk,
-	 * which means we switch it's 'raid position' (->raid_disk)
-	 * with the failed disk. (only the first 'conf->raid_disks'
-	 * slots are used for 'real' disks and we must preserve this
-	 * property)
-	 */
-	case DISKOP_SPARE_ACTIVE:
-		if (!conf->spare) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-		sdisk = conf->disks + spare_disk;
-		fdisk = conf->disks + failed_disk;
-
-		spare_desc = &sb->disks[sdisk->number];
-		failed_desc = &sb->disks[fdisk->number];
-
-		if (spare_desc != *d) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-
-		if (spare_desc->raid_disk != sdisk->raid_disk) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-
-		if (sdisk->raid_disk != spare_disk) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-
-		if (failed_desc->raid_disk != fdisk->raid_disk) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-
-		if (fdisk->raid_disk != failed_disk) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-
-		/*
-		 * do the switch finally
-		 */
-		spare_rdev = find_rdev_nr(mddev, spare_desc->number);
-		failed_rdev = find_rdev_nr(mddev, failed_desc->number);
-
-		/* There must be a spare_rdev, but there may not be a
-		 * failed_rdev. That slot might be empty...
-		 */
-		spare_rdev->desc_nr = failed_desc->number;
-		if (failed_rdev)
-			failed_rdev->desc_nr = spare_desc->number;
-
-		xchg_values(*spare_desc, *failed_desc);
-		xchg_values(*fdisk, *sdisk);
-
-		/*
-		 * (careful, 'failed' and 'spare' are switched from now on)
-		 *
-		 * we want to preserve linear numbering and we want to
-		 * give the proper raid_disk number to the now activated
-		 * disk. (this means we switch back these values)
-		 */
-
-		xchg_values(spare_desc->raid_disk, failed_desc->raid_disk);
-		xchg_values(sdisk->raid_disk, fdisk->raid_disk);
-		xchg_values(spare_desc->number, failed_desc->number);
-		xchg_values(sdisk->number, fdisk->number);
-
-		*d = failed_desc;
-
-		if (sdisk->dev == MKDEV(0,0))
-			sdisk->used_slot = 0;
-
-		/*
-		 * this really activates the spare.
-		 */
-		fdisk->spare = 0;
-		fdisk->write_only = 0;
-
-		/*
-		 * if we activate a spare, we definitely replace a
-		 * non-operational disk slot in the 'low' area of
-		 * the disk array.
-		 */
-		conf->failed_disks--;
-		conf->working_disks++;
-		conf->spare = NULL;
-
-		break;
-
-	case DISKOP_HOT_REMOVE_DISK:
-		rdisk = conf->disks + removed_disk;
-
-		if (rdisk->spare && (removed_disk < conf->raid_disks)) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-		rdisk->dev = MKDEV(0,0);
-		rdisk->bdev = NULL;
-		rdisk->used_slot = 0;
-
-		break;
-
-	case DISKOP_HOT_ADD_DISK:
-		adisk = conf->disks + added_disk;
-		added_desc = *d;
-
-		if (added_disk != added_desc->number) {
-			MD_BUG();
-			err = 1;
-			goto abort;
-		}
-
-		adisk->number = added_desc->number;
-		adisk->raid_disk = added_desc->raid_disk;
-		adisk->dev = MKDEV(added_desc->major,added_desc->minor);
-		/* it will be held open by rdev */
-		adisk->bdev = bdget(kdev_t_to_nr(adisk->dev));
-
-		adisk->operational = 0;
-		adisk->write_only = 0;
-		adisk->spare = 1;
-		adisk->used_slot = 1;
-
-
-		break;
-
-	default:
-		MD_BUG();
-		err = 1;
-		goto abort;
-	}
-abort:
-	spin_unlock_irq(&conf->device_lock);
-	print_raid5_conf(conf);
-	return err;
-}
-
-static mdk_personality_t raid5_personality=
-{
-	name: "raid5",
-	make_request: make_request,
-	run: run,
-	stop: stop,
-	status: status,
-	error_handler: error,
-	diskop: diskop,
-	stop_resync: stop_resync,
-	restart_resync: restart_resync,
-	sync_request: sync_request
-};
-
-static int __init raid5_init (void)
-{
-	return register_md_personality (RAID5, &raid5_personality);
-}
-
-static void raid5_exit (void)
-{
-	unregister_md_personality (RAID5);
-}
-
-module_init(raid5_init);
-module_exit(raid5_exit);
-MODULE_LICENSE("GPL");
./linux/raid5/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:10.726930991 +0000
@@ -1,4642 +0,0 @@
-/*
- * raid10.c : Multiple Devices driver for Linux
- *
- * Copyright (C) 2000-2004 Neil Brown
- *
- * RAID-10 support for md.
- *
- * Base on code in raid1.c. See raid1.c for further copyright information.
- *
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2, or (at your option)
- * any later version.
- *
- * You should have received a copy of the GNU General Public License
- * (for example /usr/src/linux/COPYING); if not, write to the Free
- * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include "md.h"
-#include "raid10.h"
-#include "raid0.h"
-#include "bitmap.h"
-
-/*
- * RAID10 provides a combination of RAID0 and RAID1 functionality.
- * The layout of data is defined by
- *    chunk_size
- *    raid_disks
- *    near_copies (stored in low byte of layout)
- *    far_copies (stored in second byte of layout)
- *    far_offset (stored in bit 16 of layout )
- *
- * The data to be stored is divided into chunks using chunksize.
- * Each device is divided into far_copies sections.
- * In each section, chunks are laid out in a style similar to raid0, but
- * near_copies copies of each chunk is stored (each on a different drive).
- * The starting device for each section is offset near_copies from the starting
- * device of the previous section.
- * Thus they are (near_copies*far_copies) of each chunk, and each is on a different
- * drive.
- * near_copies and far_copies must be at least one, and their product is at most
- * raid_disks.
- *
- * If far_offset is true, then the far_copies are handled a bit differently.
- * The copies are still in different stripes, but instead of be very far apart
- * on disk, there are adjacent stripes.
- */
-
-/*
- * Number of guaranteed r10bios in case of extreme VM load:
- */
-#define NR_RAID10_BIOS 256
-
-/* when we get a read error on a read-only array, we redirect to another
- * device without failing the first device, or trying to over-write to
- * correct the read error. To keep track of bad blocks on a per-bio
- * level, we store IO_BLOCKED in the appropriate 'bios' pointer
- */
-#define IO_BLOCKED ((struct bio *)1)
-/* When we successfully write to a known bad-block, we need to remove the
- * bad-block marking which must be done from process context. So we record
- * the success by setting devs[n].bio to IO_MADE_GOOD
- */
-#define IO_MADE_GOOD ((struct bio *)2)
-
-#define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
-
-/* When there are this many requests queued to be written by
- * the raid10 thread, we become 'congested' to provide back-pressure
- * for writeback.
- */
-static int max_queued_requests = 1024;
-
-static void allow_barrier(struct r10conf *conf);
-static void lower_barrier(struct r10conf *conf);
-static int enough(struct r10conf *conf, int ignore);
-static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
-				int *skipped);
-static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio);
-static void end_reshape_write(struct bio *bio, int error);
-static void end_reshape(struct r10conf *conf);
-
-static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
-{
-	struct r10conf *conf = data;
-	int size = offsetof(struct r10bio, devs[conf->copies]);
-
-	/* allocate a r10bio with room for raid_disks entries in the
-	 * bios array */
-	return kzalloc(size, gfp_flags);
-}
-
-static void r10bio_pool_free(void *r10_bio, void *data)
-{
-	kfree(r10_bio);
-}
-
-/* Maximum size of each resync request */
-#define RESYNC_BLOCK_SIZE (64*1024)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
-/* amount of memory to reserve for resync requests */
-#define RESYNC_WINDOW (1024*1024)
-/* maximum number of concurrent requests, memory permitting */
-#define RESYNC_DEPTH (32*1024*1024/RESYNC_BLOCK_SIZE)
-
-/*
- * When performing a resync, we need to read and compare, so
- * we need as many pages are there are copies.
- * When performing a recovery, we need 2 bios, one for read,
- * one for write (we recover only one drive per r10buf)
- *
- */
-static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
-{
-	struct r10conf *conf = data;
-	struct page *page;
-	struct r10bio *r10_bio;
-	struct bio *bio;
-	int i, j;
-	int nalloc;
-
-	r10_bio = r10bio_pool_alloc(gfp_flags, conf);
-	if (!r10_bio)
-		return NULL;
-
-	if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
-	    test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
-		nalloc = conf->copies; /* resync */
-	else
-		nalloc = 2; /* recovery */
-
-	/*
-	 * Allocate bios.
-	 */
-	for (j = nalloc ; j-- ; ) {
-		bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
-		if (!bio)
-			goto out_free_bio;
-		r10_bio->devs[j].bio = bio;
-		if (!conf->have_replacement)
-			continue;
-		bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
-		if (!bio)
-			goto out_free_bio;
-		r10_bio->devs[j].repl_bio = bio;
-	}
-	/*
-	 * Allocate RESYNC_PAGES data pages and attach them
-	 * where needed.
-	 */
-	for (j = 0 ; j < nalloc; j++) {
-		struct bio *rbio = r10_bio->devs[j].repl_bio;
-		bio = r10_bio->devs[j].bio;
-		for (i = 0; i < RESYNC_PAGES; i++) {
-			if (j > 0 && !test_bit(MD_RECOVERY_SYNC,
-					       &conf->mddev->recovery)) {
-				/* we can share bv_page's during recovery
-				 * and reshape */
-				struct bio *rbio = r10_bio->devs[0].bio;
-				page = rbio->bi_io_vec[i].bv_page;
-				get_page(page);
-			} else
-				page = alloc_page(gfp_flags);
-			if (unlikely(!page))
-				goto out_free_pages;
-
-			bio->bi_io_vec[i].bv_page = page;
-			if (rbio)
-				rbio->bi_io_vec[i].bv_page = page;
-		}
-	}
-
-	return r10_bio;
-
-out_free_pages:
-	for ( ; i > 0 ; i--)
-		safe_put_page(bio->bi_io_vec[i-1].bv_page);
-	while (j--)
-		for (i = 0; i < RESYNC_PAGES ; i++)
-			safe_put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
-	j = 0;
-out_free_bio:
-	for ( ; j < nalloc; j++) {
-		if (r10_bio->devs[j].bio)
-			bio_put(r10_bio->devs[j].bio);
-		if (r10_bio->devs[j].repl_bio)
-			bio_put(r10_bio->devs[j].repl_bio);
-	}
-	r10bio_pool_free(r10_bio, conf);
-	return NULL;
-}
-
-static void r10buf_pool_free(void *__r10_bio, void *data)
-{
-	int i;
-	struct r10conf *conf = data;
-	struct r10bio *r10bio = __r10_bio;
-	int j;
-
-	for (j=0; j < conf->copies; j++) {
-		struct bio *bio = r10bio->devs[j].bio;
-		if (bio) {
-			for (i = 0; i < RESYNC_PAGES; i++) {
-				safe_put_page(bio->bi_io_vec[i].bv_page);
-				bio->bi_io_vec[i].bv_page = NULL;
-			}
-			bio_put(bio);
-		}
-		bio = r10bio->devs[j].repl_bio;
-		if (bio)
-			bio_put(bio);
-	}
-	r10bio_pool_free(r10bio, conf);
-}
-
-static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
-{
-	int i;
-
-	for (i = 0; i < conf->copies; i++) {
-		struct bio **bio = & r10_bio->devs[i].bio;
-		if (!BIO_SPECIAL(*bio))
-			bio_put(*bio);
-		*bio = NULL;
-		bio = &r10_bio->devs[i].repl_bio;
-		if (r10_bio->read_slot < 0 && !BIO_SPECIAL(*bio))
-			bio_put(*bio);
-		*bio = NULL;
-	}
-}
-
-static void free_r10bio(struct r10bio *r10_bio)
-{
-	struct r10conf *conf = r10_bio->mddev->private;
-
-	put_all_bios(conf, r10_bio);
-	mempool_free(r10_bio, conf->r10bio_pool);
-}
-
-static void put_buf(struct r10bio *r10_bio)
-{
-	struct r10conf *conf = r10_bio->mddev->private;
-
-	mempool_free(r10_bio, conf->r10buf_pool);
-
-	lower_barrier(conf);
-}
-
-static void reschedule_retry(struct r10bio *r10_bio)
-{
-	unsigned long flags;
-	struct mddev *mddev = r10_bio->mddev;
-	struct r10conf *conf = mddev->private;
-
-	spin_lock_irqsave(&conf->device_lock, flags);
-	list_add(&r10_bio->retry_list, &conf->retry_list);
-	conf->nr_queued ++;
-	spin_unlock_irqrestore(&conf->device_lock, flags);
-
-	/* wake up frozen array... */
-	wake_up(&conf->wait_barrier);
-
-	md_wakeup_thread(mddev->thread);
-}
-
-/*
- * raid_end_bio_io() is called when we have finished servicing a mirrored
- * operation and are ready to return a success/failure code to the buffer
- * cache layer.
- */
-static void raid_end_bio_io(struct r10bio *r10_bio)
-{
-	struct bio *bio = r10_bio->master_bio;
-	int done;
-	struct r10conf *conf = r10_bio->mddev->private;
-
-	if (bio->bi_phys_segments) {
-		unsigned long flags;
-		spin_lock_irqsave(&conf->device_lock, flags);
-		bio->bi_phys_segments--;
-		done = (bio->bi_phys_segments == 0);
-		spin_unlock_irqrestore(&conf->device_lock, flags);
-	} else
-		done = 1;
-	if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
-	if (done) {
-		bio_endio(bio, 0);
-		/*
-		 * Wake up any possible resync thread that waits for the device
-		 * to go idle.
-		 */
-		allow_barrier(conf);
-	}
-	free_r10bio(r10_bio);
-}
-
-/*
- * Update disk head position estimator based on IRQ completion info.
- */
-static inline void update_head_pos(int slot, struct r10bio *r10_bio)
-{
-	struct r10conf *conf = r10_bio->mddev->private;
-
-	conf->mirrors[r10_bio->devs[slot].devnum].head_position =
-		r10_bio->devs[slot].addr + (r10_bio->sectors);
-}
-
-/*
- * Find the disk number which triggered given bio
- */
-static int find_bio_disk(struct r10conf *conf, struct r10bio *r10_bio,
-			 struct bio *bio, int *slotp, int *replp)
-{
-	int slot;
-	int repl = 0;
-
-	for (slot = 0; slot < conf->copies; slot++) {
-		if (r10_bio->devs[slot].bio == bio)
-			break;
-		if (r10_bio->devs[slot].repl_bio == bio) {
-			repl = 1;
-			break;
-		}
-	}
-
-	BUG_ON(slot == conf->copies);
-	update_head_pos(slot, r10_bio);
-
-	if (slotp)
-		*slotp = slot;
-	if (replp)
-		*replp = repl;
-	return r10_bio->devs[slot].devnum;
-}
-
-static void raid10_end_read_request(struct bio *bio, int error)
-{
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct r10bio *r10_bio = bio->bi_private;
-	int slot, dev;
-	struct md_rdev *rdev;
-	struct r10conf *conf = r10_bio->mddev->private;
-
-
-	slot = r10_bio->read_slot;
-	dev = r10_bio->devs[slot].devnum;
-	rdev = r10_bio->devs[slot].rdev;
-	/*
-	 * this branch is our 'one mirror IO has finished' event handler:
-	 */
-	update_head_pos(slot, r10_bio);
-
-	if (uptodate) {
-		/*
-		 * Set R10BIO_Uptodate in our master bio, so that
-		 * we will return a good error code to the higher
-		 * levels even if IO on some other mirrored buffer fails.
-		 *
-		 * The 'master' represents the composite IO operation to
-		 * user-side. So if something waits for IO, then it will
-		 * wait for the 'master' bio.
-		 */
-		set_bit(R10BIO_Uptodate, &r10_bio->state);
-	} else {
-		/* If all other devices that store this block have
-		 * failed, we want to return the error upwards rather
-		 * than fail the last device. Here we redefine
-		 * "uptodate" to mean "Don't want to retry"
-		 */
-		unsigned long flags;
-		spin_lock_irqsave(&conf->device_lock, flags);
-		if (!enough(conf, rdev->raid_disk))
-			uptodate = 1;
-		spin_unlock_irqrestore(&conf->device_lock, flags);
-	}
-	if (uptodate) {
-		raid_end_bio_io(r10_bio);
-		rdev_dec_pending(rdev, conf->mddev);
-	} else {
-		/*
-		 * oops, read error - keep the refcount on the rdev
-		 */
-		char b[BDEVNAME_SIZE];
-		printk_ratelimited(KERN_ERR
-				   "md/raid10:%s: %s: rescheduling sector %llu\n",
-				   mdname(conf->mddev),
-				   bdevname(rdev->bdev, b),
-				   (unsigned long long)r10_bio->sector);
-		set_bit(R10BIO_ReadError, &r10_bio->state);
-		reschedule_retry(r10_bio);
-	}
-}
-
-static void close_write(struct r10bio *r10_bio)
-{
-	/* clear the bitmap if all writes complete successfully */
-	bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector,
-			r10_bio->sectors,
-			!test_bit(R10BIO_Degraded, &r10_bio->state),
-			0);
-	md_write_end(r10_bio->mddev);
-}
-
-static void one_write_done(struct r10bio *r10_bio)
-{
-	if (atomic_dec_and_test(&r10_bio->remaining)) {
-		if (test_bit(R10BIO_WriteError, &r10_bio->state))
-			reschedule_retry(r10_bio);
-		else {
-			close_write(r10_bio);
-			if (test_bit(R10BIO_MadeGood, &r10_bio->state))
-				reschedule_retry(r10_bio);
-			else
-				raid_end_bio_io(r10_bio);
-		}
-	}
-}
-
-static void raid10_end_write_request(struct bio *bio, int error)
-{
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct r10bio *r10_bio = bio->bi_private;
-	int dev;
-	int dec_rdev = 1;
-	struct r10conf *conf = r10_bio->mddev->private;
-	int slot, repl;
-	struct md_rdev *rdev = NULL;
-
-	dev = find_bio_disk(conf, r10_bio, bio, &slot, &repl);
-
-	if (repl)
-		rdev = conf->mirrors[dev].replacement;
-	if (!rdev) {
-		smp_rmb();
-		repl = 0;
-		rdev = conf->mirrors[dev].rdev;
-	}
-	/*
-	 * this branch is our 'one mirror IO has finished' event handler:
-	 */
-	if (!uptodate) {
-		if (repl)
-			/* Never record new bad blocks to replacement,
-			 * just fail it.
-			 */
-			md_error(rdev->mddev, rdev);
-		else {
-			set_bit(WriteErrorSeen, &rdev->flags);
-			if (!test_and_set_bit(WantReplacement, &rdev->flags))
-				set_bit(MD_RECOVERY_NEEDED,
-					&rdev->mddev->recovery);
-			set_bit(R10BIO_WriteError, &r10_bio->state);
-			dec_rdev = 0;
-		}
-	} else {
-		/*
-		 * Set R10BIO_Uptodate in our master bio, so that
-		 * we will return a good error code for to the higher
-		 * levels even if IO on some other mirrored buffer fails.
-		 *
-		 * The 'master' represents the composite IO operation to
-		 * user-side. So if something waits for IO, then it will
-		 * wait for the 'master' bio.
-		 */
-		sector_t first_bad;
-		int bad_sectors;
-
-		set_bit(R10BIO_Uptodate, &r10_bio->state);
-
-		/* Maybe we can clear some bad blocks. */
-		if (is_badblock(rdev,
-				r10_bio->devs[slot].addr,
-				r10_bio->sectors,
-				&first_bad, &bad_sectors)) {
-			bio_put(bio);
-			if (repl)
-				r10_bio->devs[slot].repl_bio = IO_MADE_GOOD;
-			else
-				r10_bio->devs[slot].bio = IO_MADE_GOOD;
-			dec_rdev = 0;
-			set_bit(R10BIO_MadeGood, &r10_bio->state);
-		}
-	}
-
-	/*
-	 *
-	 * Let's see if all mirrored write operations have finished
-	 * already.
-	 */
-	one_write_done(r10_bio);
-	if (dec_rdev)
-		rdev_dec_pending(conf->mirrors[dev].rdev, conf->mddev);
-}
-
-/*
- * RAID10 layout manager
- * As well as the chunksize and raid_disks count, there are two
- * parameters: near_copies and far_copies.
- * near_copies * far_copies must be <= raid_disks.
- * Normally one of these will be 1.
- * If both are 1, we get raid0.
- * If near_copies == raid_disks, we get raid1.
- *
- * Chunks are laid out in raid0 style with near_copies copies of the
- * first chunk, followed by near_copies copies of the next chunk and
- * so on.
- * If far_copies > 1, then after 1/far_copies of the array has been assigned
- * as described above, we start again with a device offset of near_copies.
- * So we effectively have another copy of the whole array further down all
- * the drives, but with blocks on different drives.
- * With this layout, and block is never stored twice on the one device.
- *
- * raid10_find_phys finds the sector offset of a given virtual sector
- * on each device that it is on.
- *
- * raid10_find_virt does the reverse mapping, from a device and a
- * sector offset to a virtual address
- */
-
-static void __raid10_find_phys(struct geom *geo, struct r10bio *r10bio)
-{
-	int n,f;
-	sector_t sector;
-	sector_t chunk;
-	sector_t stripe;
-	int dev;
-	int slot = 0;
-
-	/* now calculate first sector/dev */
-	chunk = r10bio->sector >> geo->chunk_shift;
-	sector = r10bio->sector & geo->chunk_mask;
-
-	chunk *= geo->near_copies;
-	stripe = chunk;
-	dev = sector_div(stripe, geo->raid_disks);
-	if (geo->far_offset)
-		stripe *= geo->far_copies;
-
-	sector += stripe << geo->chunk_shift;
-
-	/* and calculate all the others */
-	for (n = 0; n < geo->near_copies; n++) {
-		int d = dev;
-		sector_t s = sector;
-		r10bio->devs[slot].addr = sector;
-		r10bio->devs[slot].devnum = d;
-		slot++;
-
-		for (f = 1; f < geo->far_copies; f++) {
-			d += geo->near_copies;
-			if (d >= geo->raid_disks)
-				d -= geo->raid_disks;
-			s += geo->stride;
-			r10bio->devs[slot].devnum = d;
-			r10bio->devs[slot].addr = s;
-			slot++;
-		}
-		dev++;
-		if (dev >= geo->raid_disks) {
-			dev = 0;
-			sector += (geo->chunk_mask + 1);
-		}
-	}
-}
-
-static void raid10_find_phys(struct r10conf *conf, struct r10bio *r10bio)
-{
-	struct geom *geo = &conf->geo;
-
-	if (conf->reshape_progress != MaxSector &&
-	    ((r10bio->sector >= conf->reshape_progress) !=
-	     conf->mddev->reshape_backwards)) {
-		set_bit(R10BIO_Previous, &r10bio->state);
-		geo = &conf->prev;
-	} else
-		clear_bit(R10BIO_Previous, &r10bio->state);
-
-	__raid10_find_phys(geo, r10bio);
-}
-
-static sector_t raid10_find_virt(struct r10conf *conf, sector_t sector, int dev)
-{
-	sector_t offset, chunk, vchunk;
-	/* Never use conf->prev as this is only called during resync
-	 * or recovery, so reshape isn't happening
-	 */
-	struct geom *geo = &conf->geo;
-
-	offset = sector &
geo->chunk_mask; - if (geo->far_offset) { - int fc; - chunk = sector >> geo->chunk_shift; - fc = sector_div(chunk, geo->far_copies); - dev -= fc * geo->near_copies; - if (dev < 0) - dev += geo->raid_disks; - } else { - while (sector >= geo->stride) { - sector -= geo->stride; - if (dev < geo->near_copies) - dev += geo->raid_disks - geo->near_copies; - else - dev -= geo->near_copies; - } - chunk = sector >> geo->chunk_shift; - } - vchunk = chunk * geo->raid_disks + dev; - sector_div(vchunk, geo->near_copies); - return (vchunk << geo->chunk_shift) + offset; -} - -/** - * raid10_mergeable_bvec -- tell bio layer if a two requests can be merged - * @q: request queue - * @bvm: properties of new bio - * @biovec: the request that could be merged to it. - * - * Return amount of bytes we can accept at this offset - * This requires checking for end-of-chunk if near_copies != raid_disks, - * and for subordinate merge_bvec_fns if merge_check_needed. - */ -static int raid10_mergeable_bvec(struct request_queue *q, - struct bvec_merge_data *bvm, - struct bio_vec *biovec) -{ - struct mddev *mddev = q->queuedata; - struct r10conf *conf = mddev->private; - sector_t sector = bvm->bi_sector + get_start_sect(bvm->bi_bdev); - int max; - unsigned int chunk_sectors; - unsigned int bio_sectors = bvm->bi_size >> 9; - struct geom *geo = &conf->geo; - - chunk_sectors = (conf->geo.chunk_mask & conf->prev.chunk_mask) + 1; - if (conf->reshape_progress != MaxSector && - ((sector >= conf->reshape_progress) != - conf->mddev->reshape_backwards)) - geo = &conf->prev; - - if (geo->near_copies < geo->raid_disks) { - max = (chunk_sectors - ((sector & (chunk_sectors - 1)) - + bio_sectors)) << 9; - if (max < 0) - /* bio_add cannot handle a negative return */ - max = 0; - if (max <= biovec->bv_len && bio_sectors == 0) - return biovec->bv_len; - } else - max = biovec->bv_len; - - if (mddev->merge_check_needed) { - struct { - struct r10bio r10_bio; - struct r10dev devs[conf->copies]; - } on_stack; - struct 
r10bio *r10_bio = &on_stack.r10_bio; - int s; - if (conf->reshape_progress != MaxSector) { - /* Cannot give any guidance during reshape */ - if (max <= biovec->bv_len && bio_sectors == 0) - return biovec->bv_len; - return 0; - } - r10_bio->sector = sector; - raid10_find_phys(conf, r10_bio); - rcu_read_lock(); - for (s = 0; s < conf->copies; s++) { - int disk = r10_bio->devs[s].devnum; - struct md_rdev *rdev = rcu_dereference( - conf->mirrors[disk].rdev); - if (rdev && !test_bit(Faulty, &rdev->flags)) { - struct request_queue *q = - bdev_get_queue(rdev->bdev); - if (q->merge_bvec_fn) { - bvm->bi_sector = r10_bio->devs[s].addr - + rdev->data_offset; - bvm->bi_bdev = rdev->bdev; - max = min(max, q->merge_bvec_fn( - q, bvm, biovec)); - } - } - rdev = rcu_dereference(conf->mirrors[disk].replacement); - if (rdev && !test_bit(Faulty, &rdev->flags)) { - struct request_queue *q = - bdev_get_queue(rdev->bdev); - if (q->merge_bvec_fn) { - bvm->bi_sector = r10_bio->devs[s].addr - + rdev->data_offset; - bvm->bi_bdev = rdev->bdev; - max = min(max, q->merge_bvec_fn( - q, bvm, biovec)); - } - } - } - rcu_read_unlock(); - } - return max; -} - -/* - * This routine returns the disk from which the requested read should - * be done. There is a per-array 'next expected sequential IO' sector - * number - if this matches on the next IO then we use the last disk. - * There is also a per-disk 'last know head position' sector that is - * maintained from IRQ contexts, both the normal and the resync IO - * completion handlers update this position correctly. If there is no - * perfect sequential match then we pick the disk whose head is closest. - * - * If there are 2 mirrors in the same 2 devices, performance degrades - * because position is mirror, not device based. - * - * The rdev for the device selected will have nr_pending incremented. - */ - -/* - * FIXME: possibly should rethink readbalancing and do it differently - * depending on near_copies / far_copies geometry. 
- */ -static struct md_rdev *read_balance(struct r10conf *conf, - struct r10bio *r10_bio, - int *max_sectors) -{ - const sector_t this_sector = r10_bio->sector; - int disk, slot; - int sectors = r10_bio->sectors; - int best_good_sectors; - sector_t new_distance, best_dist; - struct md_rdev *best_rdev, *rdev = NULL; - int do_balance; - int best_slot; - struct geom *geo = &conf->geo; - - raid10_find_phys(conf, r10_bio); - rcu_read_lock(); -retry: - sectors = r10_bio->sectors; - best_slot = -1; - best_rdev = NULL; - best_dist = MaxSector; - best_good_sectors = 0; - do_balance = 1; - /* - * Check if we can balance. We can balance on the whole - * device if no resync is going on (recovery is ok), or below - * the resync window. We take the first readable disk when - * above the resync window. - */ - if (conf->mddev->recovery_cp < MaxSector - && (this_sector + sectors >= conf->next_resync)) - do_balance = 0; - - for (slot = 0; slot < conf->copies ; slot++) { - sector_t first_bad; - int bad_sectors; - sector_t dev_sector; - - if (r10_bio->devs[slot].bio == IO_BLOCKED) - continue; - disk = r10_bio->devs[slot].devnum; - rdev = rcu_dereference(conf->mirrors[disk].replacement); - if (rdev == NULL || test_bit(Faulty, &rdev->flags) || - test_bit(Unmerged, &rdev->flags) || - r10_bio->devs[slot].addr + sectors > rdev->recovery_offset) - rdev = rcu_dereference(conf->mirrors[disk].rdev); - if (rdev == NULL || - test_bit(Faulty, &rdev->flags) || - test_bit(Unmerged, &rdev->flags)) - continue; - if (!test_bit(In_sync, &rdev->flags) && - r10_bio->devs[slot].addr + sectors > rdev->recovery_offset) - continue; - - dev_sector = r10_bio->devs[slot].addr; - if (is_badblock(rdev, dev_sector, sectors, - &first_bad, &bad_sectors)) { - if (best_dist < MaxSector) - /* Already have a better slot */ - continue; - if (first_bad <= dev_sector) { - /* Cannot read here. If this is the - * 'primary' device, then we must not read - * beyond 'bad_sectors' from another device. 
- */ - bad_sectors -= (dev_sector - first_bad); - if (!do_balance && sectors > bad_sectors) - sectors = bad_sectors; - if (best_good_sectors > sectors) - best_good_sectors = sectors; - } else { - sector_t good_sectors = - first_bad - dev_sector; - if (good_sectors > best_good_sectors) { - best_good_sectors = good_sectors; - best_slot = slot; - best_rdev = rdev; - } - if (!do_balance) - /* Must read from here */ - break; - } - continue; - } else - best_good_sectors = sectors; - - if (!do_balance) - break; - - /* This optimisation is debatable, and completely destroys - * sequential read speed for 'far copies' arrays. So only - * keep it for 'near' arrays, and review those later. - */ - if (geo->near_copies > 1 && !atomic_read(&rdev->nr_pending)) - break; - - /* for far > 1 always use the lowest address */ - if (geo->far_copies > 1) - new_distance = r10_bio->devs[slot].addr; - else - new_distance = abs(r10_bio->devs[slot].addr - - conf->mirrors[disk].head_position); - if (new_distance < best_dist) { - best_dist = new_distance; - best_slot = slot; - best_rdev = rdev; - } - } - if (slot >= conf->copies) { - slot = best_slot; - rdev = best_rdev; - } - - if (slot >= 0) { - atomic_inc(&rdev->nr_pending); - if (test_bit(Faulty, &rdev->flags)) { - /* Cannot risk returning a device that failed - * before we inc'ed nr_pending - */ - rdev_dec_pending(rdev, conf->mddev); - goto retry; - } - r10_bio->read_slot = slot; - } else - rdev = NULL; - rcu_read_unlock(); - *max_sectors = best_good_sectors; - - return rdev; -} - -int md_raid10_congested(struct mddev *mddev, int bits) -{ - struct r10conf *conf = mddev->private; - int i, ret = 0; - - if ((bits & (1 << BDI_async_congested)) && - conf->pending_count >= max_queued_requests) - return 1; - - rcu_read_lock(); - for (i = 0; - (i < conf->geo.raid_disks || i < conf->prev.raid_disks) - && ret == 0; - i++) { - struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); - if (rdev && !test_bit(Faulty, &rdev->flags)) { - struct 
request_queue *q = bdev_get_queue(rdev->bdev); - - ret |= bdi_congested(&q->backing_dev_info, bits); - } - } - rcu_read_unlock(); - return ret; -} -EXPORT_SYMBOL_GPL(md_raid10_congested); - -static int raid10_congested(void *data, int bits) -{ - struct mddev *mddev = data; - - return mddev_congested(mddev, bits) || - md_raid10_congested(mddev, bits); -} - -static void flush_pending_writes(struct r10conf *conf) -{ - /* Any writes that have been queued but are awaiting - * bitmap updates get flushed here. - */ - spin_lock_irq(&conf->device_lock); - - if (conf->pending_bio_list.head) { - struct bio *bio; - bio = bio_list_get(&conf->pending_bio_list); - conf->pending_count = 0; - spin_unlock_irq(&conf->device_lock); - /* flush any pending bitmap writes to disk - * before proceeding w/ I/O */ - bitmap_unplug(conf->mddev->bitmap); - wake_up(&conf->wait_barrier); - - while (bio) { /* submit pending writes */ - struct bio *next = bio->bi_next; - bio->bi_next = NULL; - generic_make_request(bio); - bio = next; - } - } else - spin_unlock_irq(&conf->device_lock); -} - -/* Barriers.... - * Sometimes we need to suspend IO while we do something else, - * either some resync/recovery, or reconfigure the array. - * To do this we raise a 'barrier'. - * The 'barrier' is a counter that can be raised multiple times - * to count how many activities are happening which preclude - * normal IO. - * We can only raise the barrier if there is no pending IO. - * i.e. if nr_pending == 0. - * We choose only to raise the barrier if no-one is waiting for the - * barrier to go down. This means that as soon as an IO request - * is ready, no other operations which require a barrier will start - * until the IO request has had a chance. - * - * So: regular IO calls 'wait_barrier'. When that returns there - * is no backgroup IO happening, It must arrange to call - * allow_barrier when it has finished its IO. - * backgroup IO calls must call raise_barrier. 
Once that returns - * there is no normal IO happeing. It must arrange to call - * lower_barrier when the particular background IO completes. - */ - -static void raise_barrier(struct r10conf *conf, int force) -{ - BUG_ON(force && !conf->barrier); - spin_lock_irq(&conf->resync_lock); - - /* Wait until no block IO is waiting (unless 'force') */ - wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting, - conf->resync_lock, ); - - /* block any new IO from starting */ - conf->barrier++; - - /* Now wait for all pending IO to complete */ - wait_event_lock_irq(conf->wait_barrier, - !conf->nr_pending && conf->barrier < RESYNC_DEPTH, - conf->resync_lock, ); - - spin_unlock_irq(&conf->resync_lock); -} - -static void lower_barrier(struct r10conf *conf) -{ - unsigned long flags; - spin_lock_irqsave(&conf->resync_lock, flags); - conf->barrier--; - spin_unlock_irqrestore(&conf->resync_lock, flags); - wake_up(&conf->wait_barrier); -} - -static void wait_barrier(struct r10conf *conf) -{ - spin_lock_irq(&conf->resync_lock); - if (conf->barrier) { - conf->nr_waiting++; - /* Wait for the barrier to drop. - * However if there are already pending - * requests (preventing the barrier from - * rising completely), and the - * pre-process bio queue isn't empty, - * then don't wait, as we need to empty - * that queue to get the nr_pending - * count down. - */ - wait_event_lock_irq(conf->wait_barrier, - !conf->barrier || - (conf->nr_pending && - current->bio_list && - !bio_list_empty(current->bio_list)), - conf->resync_lock, - ); - conf->nr_waiting--; - } - conf->nr_pending++; - spin_unlock_irq(&conf->resync_lock); -} - -static void allow_barrier(struct r10conf *conf) -{ - unsigned long flags; - spin_lock_irqsave(&conf->resync_lock, flags); - conf->nr_pending--; - spin_unlock_irqrestore(&conf->resync_lock, flags); - wake_up(&conf->wait_barrier); -} - -static void freeze_array(struct r10conf *conf) -{ - /* stop syncio and normal IO and wait for everything to - * go quiet. 
- * We increment barrier and nr_waiting, and then - * wait until nr_pending match nr_queued+1 - * This is called in the context of one normal IO request - * that has failed. Thus any sync request that might be pending - * will be blocked by nr_pending, and we need to wait for - * pending IO requests to complete or be queued for re-try. - * Thus the number queued (nr_queued) plus this request (1) - * must match the number of pending IOs (nr_pending) before - * we continue. - */ - spin_lock_irq(&conf->resync_lock); - conf->barrier++; - conf->nr_waiting++; - wait_event_lock_irq(conf->wait_barrier, - conf->nr_pending == conf->nr_queued+1, - conf->resync_lock, - flush_pending_writes(conf)); - - spin_unlock_irq(&conf->resync_lock); -} - -static void unfreeze_array(struct r10conf *conf) -{ - /* reverse the effect of the freeze */ - spin_lock_irq(&conf->resync_lock); - conf->barrier--; - conf->nr_waiting--; - wake_up(&conf->wait_barrier); - spin_unlock_irq(&conf->resync_lock); -} - -static sector_t choose_data_offset(struct r10bio *r10_bio, - struct md_rdev *rdev) -{ - if (!test_bit(MD_RECOVERY_RESHAPE, &rdev->mddev->recovery) || - test_bit(R10BIO_Previous, &r10_bio->state)) - return rdev->data_offset; - else - return rdev->new_data_offset; -} - -static void make_request(struct mddev *mddev, struct bio * bio) -{ - struct r10conf *conf = mddev->private; - struct r10bio *r10_bio; - struct bio *read_bio; - int i; - sector_t chunk_mask = (conf->geo.chunk_mask & conf->prev.chunk_mask); - int chunk_sects = chunk_mask + 1; - const int rw = bio_data_dir(bio); - const unsigned long do_sync = (bio->bi_rw & REQ_SYNC); - const unsigned long do_fua = (bio->bi_rw & REQ_FUA); - unsigned long flags; - struct md_rdev *blocked_rdev; - int sectors_handled; - int max_sectors; - int sectors; - - if (unlikely(bio->bi_rw & REQ_FLUSH)) { - md_flush_request(mddev, bio); - return; - } - - /* If this request crosses a chunk boundary, we need to - * split it. 
This will only happen for 1 PAGE (or less) requests. - */ - if (unlikely((bio->bi_sector & chunk_mask) + (bio->bi_size >> 9) - > chunk_sects - && (conf->geo.near_copies < conf->geo.raid_disks - || conf->prev.near_copies < conf->prev.raid_disks))) { - struct bio_pair *bp; - /* Sanity check -- queue functions should prevent this happening */ - if (bio->bi_vcnt != 1 || - bio->bi_idx != 0) - goto bad_map; - /* This is a one page bio that upper layers - * refuse to split for us, so we need to split it. - */ - bp = bio_split(bio, - chunk_sects - (bio->bi_sector & (chunk_sects - 1)) ); - - /* Each of these 'make_request' calls will call 'wait_barrier'. - * If the first succeeds but the second blocks due to the resync - * thread raising the barrier, we will deadlock because the - * IO to the underlying device will be queued in generic_make_request - * and will never complete, so will never reduce nr_pending. - * So increment nr_waiting here so no new raise_barriers will - * succeed, and so the second wait_barrier cannot block. - */ - spin_lock_irq(&conf->resync_lock); - conf->nr_waiting++; - spin_unlock_irq(&conf->resync_lock); - - make_request(mddev, &bp->bio1); - make_request(mddev, &bp->bio2); - - spin_lock_irq(&conf->resync_lock); - conf->nr_waiting--; - wake_up(&conf->wait_barrier); - spin_unlock_irq(&conf->resync_lock); - - bio_pair_release(bp); - return; - bad_map: - printk("md/raid10:%s: make_request bug: can't convert block across chunks" - " or bigger than %dk %llu %d\n", mdname(mddev), chunk_sects/2, - (unsigned long long)bio->bi_sector, bio->bi_size >> 10); - - bio_io_error(bio); - return; - } - - md_write_start(mddev, bio); - - /* - * Register the new request and wait if the reconstruction - * thread has put up a bar for new requests. - * Continue immediately if no resync is active currently. 
- */ - wait_barrier(conf); - - sectors = bio->bi_size >> 9; - while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && - bio->bi_sector < conf->reshape_progress && - bio->bi_sector + sectors > conf->reshape_progress) { - /* IO spans the reshape position. Need to wait for - * reshape to pass - */ - allow_barrier(conf); - wait_event(conf->wait_barrier, - conf->reshape_progress <= bio->bi_sector || - conf->reshape_progress >= bio->bi_sector + sectors); - wait_barrier(conf); - } - if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && - bio_data_dir(bio) == WRITE && - (mddev->reshape_backwards - ? (bio->bi_sector < conf->reshape_safe && - bio->bi_sector + sectors > conf->reshape_progress) - : (bio->bi_sector + sectors > conf->reshape_safe && - bio->bi_sector < conf->reshape_progress))) { - /* Need to update reshape_position in metadata */ - mddev->reshape_position = conf->reshape_progress; - set_bit(MD_CHANGE_DEVS, &mddev->flags); - set_bit(MD_CHANGE_PENDING, &mddev->flags); - md_wakeup_thread(mddev->thread); - wait_event(mddev->sb_wait, - !test_bit(MD_CHANGE_PENDING, &mddev->flags)); - - conf->reshape_safe = mddev->reshape_position; - } - - r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO); - - r10_bio->master_bio = bio; - r10_bio->sectors = sectors; - - r10_bio->mddev = mddev; - r10_bio->sector = bio->bi_sector; - r10_bio->state = 0; - - /* We might need to issue multiple reads to different - * devices if there are bad blocks around, so we keep - * track of the number of reads in bio->bi_phys_segments. - * If this is 0, there is only one r10_bio and no locking - * will be needed when the request completes. If it is - * non-zero, then it is the number of not-completed requests. 
- */ - bio->bi_phys_segments = 0; - clear_bit(BIO_SEG_VALID, &bio->bi_flags); - - if (rw == READ) { - /* - * read balancing logic: - */ - struct md_rdev *rdev; - int slot; - -read_again: - rdev = read_balance(conf, r10_bio, &max_sectors); - if (!rdev) { - raid_end_bio_io(r10_bio); - return; - } - slot = r10_bio->read_slot; - - read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev); - md_trim_bio(read_bio, r10_bio->sector - bio->bi_sector, - max_sectors); - - r10_bio->devs[slot].bio = read_bio; - r10_bio->devs[slot].rdev = rdev; - - read_bio->bi_sector = r10_bio->devs[slot].addr + - choose_data_offset(r10_bio, rdev); - read_bio->bi_bdev = rdev->bdev; - read_bio->bi_end_io = raid10_end_read_request; - read_bio->bi_rw = READ | do_sync; - read_bio->bi_private = r10_bio; - - if (max_sectors < r10_bio->sectors) { - /* Could not read all from this device, so we will - * need another r10_bio. - */ - sectors_handled = (r10_bio->sectors + max_sectors - - bio->bi_sector); - r10_bio->sectors = max_sectors; - spin_lock_irq(&conf->device_lock); - if (bio->bi_phys_segments == 0) - bio->bi_phys_segments = 2; - else - bio->bi_phys_segments++; - spin_unlock(&conf->device_lock); - /* Cannot call generic_make_request directly - * as that will be queued in __generic_make_request - * and subsequent mempool_alloc might block - * waiting for it. so hand bio over to raid10d. 
- */ - reschedule_retry(r10_bio); - - r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO); - - r10_bio->master_bio = bio; - r10_bio->sectors = ((bio->bi_size >> 9) - - sectors_handled); - r10_bio->state = 0; - r10_bio->mddev = mddev; - r10_bio->sector = bio->bi_sector + sectors_handled; - goto read_again; - } else - generic_make_request(read_bio); - return; - } - - /* - * WRITE: - */ - if (conf->pending_count >= max_queued_requests) { - md_wakeup_thread(mddev->thread); - wait_event(conf->wait_barrier, - conf->pending_count < max_queued_requests); - } - /* first select target devices under rcu_lock and - * inc refcount on their rdev. Record them by setting - * bios[x] to bio - * If there are known/acknowledged bad blocks on any device - * on which we have seen a write error, we want to avoid - * writing to those blocks. This potentially requires several - * writes to write around the bad blocks. Each set of writes - * gets its own r10_bio with a set of bios attached. The number - * of r10_bios is recored in bio->bi_phys_segments just as with - * the read case. 
- */ - - r10_bio->read_slot = -1; /* make sure repl_bio gets freed */ - raid10_find_phys(conf, r10_bio); -retry_write: - blocked_rdev = NULL; - rcu_read_lock(); - max_sectors = r10_bio->sectors; - - for (i = 0; i < conf->copies; i++) { - int d = r10_bio->devs[i].devnum; - struct md_rdev *rdev = rcu_dereference(conf->mirrors[d].rdev); - struct md_rdev *rrdev = rcu_dereference( - conf->mirrors[d].replacement); - if (rdev == rrdev) - rrdev = NULL; - if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) { - atomic_inc(&rdev->nr_pending); - blocked_rdev = rdev; - break; - } - if (rrdev && unlikely(test_bit(Blocked, &rrdev->flags))) { - atomic_inc(&rrdev->nr_pending); - blocked_rdev = rrdev; - break; - } - if (rdev && (test_bit(Faulty, &rdev->flags) - || test_bit(Unmerged, &rdev->flags))) - rdev = NULL; - if (rrdev && (test_bit(Faulty, &rrdev->flags) - || test_bit(Unmerged, &rrdev->flags))) - rrdev = NULL; - - r10_bio->devs[i].bio = NULL; - r10_bio->devs[i].repl_bio = NULL; - - if (!rdev && !rrdev) { - set_bit(R10BIO_Degraded, &r10_bio->state); - continue; - } - if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) { - sector_t first_bad; - sector_t dev_sector = r10_bio->devs[i].addr; - int bad_sectors; - int is_bad; - - is_bad = is_badblock(rdev, dev_sector, - max_sectors, - &first_bad, &bad_sectors); - if (is_bad < 0) { - /* Mustn't write here until the bad block - * is acknowledged - */ - atomic_inc(&rdev->nr_pending); - set_bit(BlockedBadBlocks, &rdev->flags); - blocked_rdev = rdev; - break; - } - if (is_bad && first_bad <= dev_sector) { - /* Cannot write here at all */ - bad_sectors -= (dev_sector - first_bad); - if (bad_sectors < max_sectors) - /* Mustn't write more than bad_sectors - * to other devices yet - */ - max_sectors = bad_sectors; - /* We don't set R10BIO_Degraded as that - * only applies if the disk is missing, - * so it might be re-added, and we want to - * know to recover this chunk. 
- * In this case the device is here, and the - * fact that this chunk is not in-sync is - * recorded in the bad block log. - */ - continue; - } - if (is_bad) { - int good_sectors = first_bad - dev_sector; - if (good_sectors < max_sectors) - max_sectors = good_sectors; - } - } - if (rdev) { - r10_bio->devs[i].bio = bio; - atomic_inc(&rdev->nr_pending); - } - if (rrdev) { - r10_bio->devs[i].repl_bio = bio; - atomic_inc(&rrdev->nr_pending); - } - } - rcu_read_unlock(); - - if (unlikely(blocked_rdev)) { - /* Have to wait for this device to get unblocked, then retry */ - int j; - int d; - - for (j = 0; j < i; j++) { - if (r10_bio->devs[j].bio) { - d = r10_bio->devs[j].devnum; - rdev_dec_pending(conf->mirrors[d].rdev, mddev); - } - if (r10_bio->devs[j].repl_bio) { - struct md_rdev *rdev; - d = r10_bio->devs[j].devnum; - rdev = conf->mirrors[d].replacement; - if (!rdev) { - /* Race with remove_disk */ - smp_mb(); - rdev = conf->mirrors[d].rdev; - } - rdev_dec_pending(rdev, mddev); - } - } - allow_barrier(conf); - md_wait_for_blocked_rdev(blocked_rdev, mddev); - wait_barrier(conf); - goto retry_write; - } - - if (max_sectors < r10_bio->sectors) { - /* We are splitting this into multiple parts, so - * we need to prepare for allocating another r10_bio. 
- */ - r10_bio->sectors = max_sectors; - spin_lock_irq(&conf->device_lock); - if (bio->bi_phys_segments == 0) - bio->bi_phys_segments = 2; - else - bio->bi_phys_segments++; - spin_unlock_irq(&conf->device_lock); - } - sectors_handled = r10_bio->sector + max_sectors - bio->bi_sector; - - atomic_set(&r10_bio->remaining, 1); - bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0); - - for (i = 0; i < conf->copies; i++) { - struct bio *mbio; - int d = r10_bio->devs[i].devnum; - if (r10_bio->devs[i].bio) { - struct md_rdev *rdev = conf->mirrors[d].rdev; - mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); - md_trim_bio(mbio, r10_bio->sector - bio->bi_sector, - max_sectors); - r10_bio->devs[i].bio = mbio; - - mbio->bi_sector = (r10_bio->devs[i].addr+ - choose_data_offset(r10_bio, - rdev)); - mbio->bi_bdev = rdev->bdev; - mbio->bi_end_io = raid10_end_write_request; - mbio->bi_rw = WRITE | do_sync | do_fua; - mbio->bi_private = r10_bio; - -<<<<<<< found - atomic_inc(&r10_bio->remaining); - spin_lock_irqsave(&conf->device_lock, flags); -||||||| expected - atomic_inc(&r10_bio->remaining); - - cb = blk_check_plugged(raid10_unplug, mddev, sizeof(*plug)); - if (cb) - plug = container_of(cb, struct raid10_plug_cb, cb); - else - plug = NULL; - spin_lock_irqsave(&conf->device_lock, flags); -======= - atomic_inc(&r10_bio->remaining); - - cb = blk_check_plugged(raid10_unplug, mddev, - sizeof(*plug)); - if (cb) - plug = container_of(cb, struct raid10_plug_cb, - cb); - else - plug = NULL; - spin_lock_irqsave(&conf->device_lock, flags); ->>>>>>> replacement -<<<<<<< found - bio_list_add(&conf->pending_bio_list, mbio); - conf->pending_count++; - spin_unlock_irqrestore(&conf->device_lock, flags); -||||||| expected - if (plug) { - bio_list_add(&plug->pending, mbio); - plug->pending_cnt++; - } else { - bio_list_add(&conf->pending_bio_list, mbio); - conf->pending_count++; - } - spin_unlock_irqrestore(&conf->device_lock, flags); -======= - if (plug) { - 
bio_list_add(&plug->pending, mbio); - plug->pending_cnt++; - } else { - bio_list_add(&conf->pending_bio_list, mbio); - conf->pending_count++; - } - spin_unlock_irqrestore(&conf->device_lock, flags); ->>>>>>> replacement - if (!mddev_check_plugged(mddev)) - md_wakeup_thread(mddev->thread); - } - - if (r10_bio->devs[i].repl_bio) { - struct md_rdev *rdev = conf->mirrors[d].replacement; - if (rdev == NULL) { - /* Replacement just got moved to main 'rdev' */ - smp_mb(); - rdev = conf->mirrors[d].rdev; - } - mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); - md_trim_bio(mbio, r10_bio->sector - bio->bi_sector, - max_sectors); - r10_bio->devs[i].repl_bio = mbio; - - mbio->bi_sector = (r10_bio->devs[i].addr + - choose_data_offset( - r10_bio, rdev)); - mbio->bi_bdev = rdev->bdev; - mbio->bi_end_io = raid10_end_write_request; - mbio->bi_rw = WRITE | do_sync | do_fua; - mbio->bi_private = r10_bio; - - atomic_inc(&r10_bio->remaining); - spin_lock_irqsave(&conf->device_lock, flags); - bio_list_add(&conf->pending_bio_list, mbio); - conf->pending_count++; - spin_unlock_irqrestore(&conf->device_lock, flags); - if (!mddev_check_plugged(mddev)) - md_wakeup_thread(mddev->thread); - } - } - - /* Don't remove the bias on 'remaining' (one_write_done) until - * after checking if we need to go around again. - */ - - if (sectors_handled < (bio->bi_size >> 9)) { - one_write_done(r10_bio); - /* We need another r10_bio. It has already been counted - * in bio->bi_phys_segments. 
- */ - r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO); - - r10_bio->master_bio = bio; - r10_bio->sectors = (bio->bi_size >> 9) - sectors_handled; - - r10_bio->mddev = mddev; - r10_bio->sector = bio->bi_sector + sectors_handled; - r10_bio->state = 0; - goto retry_write; - } - one_write_done(r10_bio); - - /* In case raid10d snuck in to freeze_array */ - wake_up(&conf->wait_barrier); -} - -static void status(struct seq_file *seq, struct mddev *mddev) -{ - struct r10conf *conf = mddev->private; - int i; - - if (conf->geo.near_copies < conf->geo.raid_disks) - seq_printf(seq, " %dK chunks", mddev->chunk_sectors / 2); - if (conf->geo.near_copies > 1) - seq_printf(seq, " %d near-copies", conf->geo.near_copies); - if (conf->geo.far_copies > 1) { - if (conf->geo.far_offset) - seq_printf(seq, " %d offset-copies", conf->geo.far_copies); - else - seq_printf(seq, " %d far-copies", conf->geo.far_copies); - } - seq_printf(seq, " [%d/%d] [", conf->geo.raid_disks, - conf->geo.raid_disks - mddev->degraded); - for (i = 0; i < conf->geo.raid_disks; i++) - seq_printf(seq, "%s", - conf->mirrors[i].rdev && - test_bit(In_sync, &conf->mirrors[i].rdev->flags) ? "U" : "_"); - seq_printf(seq, "]"); -} - -/* check if there are enough drives for - * every block to appear on atleast one. - * Don't consider the device numbered 'ignore' - * as we might be about to remove it. 
- */ -static int _enough(struct r10conf *conf, struct geom *geo, int ignore) -{ - int first = 0; - - do { - int n = conf->copies; - int cnt = 0; - int this = first; - while (n--) { - if (conf->mirrors[this].rdev && - this != ignore) - cnt++; - this = (this+1) % geo->raid_disks; - } - if (cnt == 0) - return 0; - first = (first + geo->near_copies) % geo->raid_disks; - } while (first != 0); - return 1; -} - -static int enough(struct r10conf *conf, int ignore) -{ - return _enough(conf, &conf->geo, ignore) && - _enough(conf, &conf->prev, ignore); -} - -static void error(struct mddev *mddev, struct md_rdev *rdev) -{ - char b[BDEVNAME_SIZE]; - struct r10conf *conf = mddev->private; - - /* - * If it is not operational, then we have already marked it as dead - * else if it is the last working disks, ignore the error, let the - * next level up know. - * else mark the drive as failed - */ - if (test_bit(In_sync, &rdev->flags) - && !enough(conf, rdev->raid_disk)) - /* - * Don't fail the drive, just return an IO error. - */ - return; - if (test_and_clear_bit(In_sync, &rdev->flags)) { - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - mddev->degraded++; - spin_unlock_irqrestore(&conf->device_lock, flags); - /* - * if recovery is running, make sure it aborts. 
- */ - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - } - set_bit(Blocked, &rdev->flags); - set_bit(Faulty, &rdev->flags); - set_bit(MD_CHANGE_DEVS, &mddev->flags); - printk(KERN_ALERT - "md/raid10:%s: Disk failure on %s, disabling device.\n" - "md/raid10:%s: Operation continuing on %d devices.\n", - mdname(mddev), bdevname(rdev->bdev, b), - mdname(mddev), conf->geo.raid_disks - mddev->degraded); -} - -static void print_conf(struct r10conf *conf) -{ - int i; - struct raid10_info *tmp; - - printk(KERN_DEBUG "RAID10 conf printout:\n"); - if (!conf) { - printk(KERN_DEBUG "(!conf)\n"); - return; - } - printk(KERN_DEBUG " --- wd:%d rd:%d\n", conf->geo.raid_disks - conf->mddev->degraded, - conf->geo.raid_disks); - - for (i = 0; i < conf->geo.raid_disks; i++) { - char b[BDEVNAME_SIZE]; - tmp = conf->mirrors + i; - if (tmp->rdev) - printk(KERN_DEBUG " disk %d, wo:%d, o:%d, dev:%s\n", - i, !test_bit(In_sync, &tmp->rdev->flags), - !test_bit(Faulty, &tmp->rdev->flags), - bdevname(tmp->rdev->bdev,b)); - } -} - -static void close_sync(struct r10conf *conf) -{ - wait_barrier(conf); - allow_barrier(conf); - - mempool_destroy(conf->r10buf_pool); - conf->r10buf_pool = NULL; -} - -static int raid10_spare_active(struct mddev *mddev) -{ - int i; - struct r10conf *conf = mddev->private; - struct raid10_info *tmp; - int count = 0; - unsigned long flags; - - /* - * Find all non-in_sync disks within the RAID10 configuration - * and mark them in_sync - */ - for (i = 0; i < conf->geo.raid_disks; i++) { - tmp = conf->mirrors + i; - if (tmp->replacement - && tmp->replacement->recovery_offset == MaxSector - && !test_bit(Faulty, &tmp->replacement->flags) - && !test_and_set_bit(In_sync, &tmp->replacement->flags)) { - /* Replacement has just become active */ - if (!tmp->rdev - || !test_and_clear_bit(In_sync, &tmp->rdev->flags)) - count++; - if (tmp->rdev) { - /* Replaced device not technically faulty, - * but we need to be sure it gets removed - * and never re-added. 
- */ - set_bit(Faulty, &tmp->rdev->flags); - sysfs_notify_dirent_safe( - tmp->rdev->sysfs_state); - } - sysfs_notify_dirent_safe(tmp->replacement->sysfs_state); - } else if (tmp->rdev - && !test_bit(Faulty, &tmp->rdev->flags) - && !test_and_set_bit(In_sync, &tmp->rdev->flags)) { - count++; - sysfs_notify_dirent(tmp->rdev->sysfs_state); - } - } - spin_lock_irqsave(&conf->device_lock, flags); - mddev->degraded -= count; - spin_unlock_irqrestore(&conf->device_lock, flags); - - print_conf(conf); - return count; -} - - -static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev) -{ - struct r10conf *conf = mddev->private; - int err = -EEXIST; - int mirror; - int first = 0; - int last = conf->geo.raid_disks - 1; - struct request_queue *q = bdev_get_queue(rdev->bdev); - - if (mddev->recovery_cp < MaxSector) - /* only hot-add to in-sync arrays, as recovery is - * very different from resync - */ - return -EBUSY; - if (rdev->saved_raid_disk < 0 && !_enough(conf, &conf->prev, -1)) - return -EINVAL; - - if (rdev->raid_disk >= 0) - first = last = rdev->raid_disk; - - if (q->merge_bvec_fn) { - set_bit(Unmerged, &rdev->flags); - mddev->merge_check_needed = 1; - } - - if (rdev->saved_raid_disk >= first && - conf->mirrors[rdev->saved_raid_disk].rdev == NULL) - mirror = rdev->saved_raid_disk; - else - mirror = first; - for ( ; mirror <= last ; mirror++) { - struct raid10_info *p = &conf->mirrors[mirror]; - if (p->recovery_disabled == mddev->recovery_disabled) - continue; - if (p->rdev) { - if (!test_bit(WantReplacement, &p->rdev->flags) || - p->replacement != NULL) - continue; - clear_bit(In_sync, &rdev->flags); - set_bit(Replacement, &rdev->flags); - rdev->raid_disk = mirror; - err = 0; - disk_stack_limits(mddev->gendisk, rdev->bdev, - rdev->data_offset << 9); - conf->fullsync = 1; - rcu_assign_pointer(p->replacement, rdev); - break; - } - - disk_stack_limits(mddev->gendisk, rdev->bdev, - rdev->data_offset << 9); - - p->head_position = 0; - p->recovery_disabled = 
mddev->recovery_disabled - 1; - rdev->raid_disk = mirror; - err = 0; - if (rdev->saved_raid_disk != mirror) - conf->fullsync = 1; - rcu_assign_pointer(p->rdev, rdev); - break; - } - if (err == 0 && test_bit(Unmerged, &rdev->flags)) { - /* Some requests might not have seen this new - * merge_bvec_fn. We must wait for them to complete - * before merging the device fully. - * First we make sure any code which has tested - * our function has submitted the request, then - * we wait for all outstanding requests to complete. - */ - synchronize_sched(); - raise_barrier(conf, 0); - lower_barrier(conf); - clear_bit(Unmerged, &rdev->flags); - } - md_integrity_add_rdev(rdev, mddev); - print_conf(conf); - return err; -} - -static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev) -{ - struct r10conf *conf = mddev->private; - int err = 0; - int number = rdev->raid_disk; - struct md_rdev **rdevp; - struct raid10_info *p = conf->mirrors + number; - - print_conf(conf); - if (rdev == p->rdev) - rdevp = &p->rdev; - else if (rdev == p->replacement) - rdevp = &p->replacement; - else - return 0; - - if (test_bit(In_sync, &rdev->flags) || - atomic_read(&rdev->nr_pending)) { - err = -EBUSY; - goto abort; - } - /* Only remove faulty devices if recovery - * is not possible. - */ - if (!test_bit(Faulty, &rdev->flags) && - mddev->recovery_disabled != p->recovery_disabled && - (!p->replacement || p->replacement == rdev) && - number < conf->geo.raid_disks && - enough(conf, -1)) { - err = -EBUSY; - goto abort; - } - *rdevp = NULL; - synchronize_rcu(); - if (atomic_read(&rdev->nr_pending)) { - /* lost the race, try later */ - err = -EBUSY; - *rdevp = rdev; - goto abort; - } else if (p->replacement) { - /* We must have just cleared 'rdev' */ - p->rdev = p->replacement; - clear_bit(Replacement, &p->replacement->flags); - smp_mb(); /* Make sure other CPUs may see both as identical - * but will never see neither -- if they are careful. 
- */ - p->replacement = NULL; - clear_bit(WantReplacement, &rdev->flags); - } else - /* We might have just removed the Replacement as faulty - * Clear the flag just in case - */ - clear_bit(WantReplacement, &rdev->flags); - - err = md_integrity_register(mddev); - -abort: - - print_conf(conf); - return err; -} - - -static void end_sync_read(struct bio *bio, int error) -{ - struct r10bio *r10_bio = bio->bi_private; - struct r10conf *conf = r10_bio->mddev->private; - int d; - - if (bio == r10_bio->master_bio) { - /* this is a reshape read */ - d = r10_bio->read_slot; /* really the read dev */ - } else - d = find_bio_disk(conf, r10_bio, bio, NULL, NULL); - - if (test_bit(BIO_UPTODATE, &bio->bi_flags)) - set_bit(R10BIO_Uptodate, &r10_bio->state); - else - /* The write handler will notice the lack of - * R10BIO_Uptodate and record any errors etc - */ - atomic_add(r10_bio->sectors, - &conf->mirrors[d].rdev->corrected_errors); - - /* for reconstruct, we always reschedule after a read. - * for resync, only after all reads - */ - rdev_dec_pending(conf->mirrors[d].rdev, conf->mddev); - if (test_bit(R10BIO_IsRecover, &r10_bio->state) || - atomic_dec_and_test(&r10_bio->remaining)) { - /* we have read all the blocks, - * do the comparison in process context in raid10d - */ - reschedule_retry(r10_bio); - } -} - -static void end_sync_request(struct r10bio *r10_bio) -{ - struct mddev *mddev = r10_bio->mddev; - - while (atomic_dec_and_test(&r10_bio->remaining)) { - if (r10_bio->master_bio == NULL) { - /* the primary of several recovery bios */ - sector_t s = r10_bio->sectors; - if (test_bit(R10BIO_MadeGood, &r10_bio->state) || - test_bit(R10BIO_WriteError, &r10_bio->state)) - reschedule_retry(r10_bio); - else - put_buf(r10_bio); - md_done_sync(mddev, s, 1); - break; - } else { - struct r10bio *r10_bio2 = (struct r10bio *)r10_bio->master_bio; - if (test_bit(R10BIO_MadeGood, &r10_bio->state) || - test_bit(R10BIO_WriteError, &r10_bio->state)) - reschedule_retry(r10_bio); - else -
put_buf(r10_bio); - r10_bio = r10_bio2; - } - } -} - -static void end_sync_write(struct bio *bio, int error) -{ - int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); - struct r10bio *r10_bio = bio->bi_private; - struct mddev *mddev = r10_bio->mddev; - struct r10conf *conf = mddev->private; - int d; - sector_t first_bad; - int bad_sectors; - int slot; - int repl; - struct md_rdev *rdev = NULL; - - d = find_bio_disk(conf, r10_bio, bio, &slot, &repl); - if (repl) - rdev = conf->mirrors[d].replacement; - else - rdev = conf->mirrors[d].rdev; - - if (!uptodate) { - if (repl) - md_error(mddev, rdev); - else { - set_bit(WriteErrorSeen, &rdev->flags); - if (!test_and_set_bit(WantReplacement, &rdev->flags)) - set_bit(MD_RECOVERY_NEEDED, - &rdev->mddev->recovery); - set_bit(R10BIO_WriteError, &r10_bio->state); - } - } else if (is_badblock(rdev, - r10_bio->devs[slot].addr, - r10_bio->sectors, - &first_bad, &bad_sectors)) - set_bit(R10BIO_MadeGood, &r10_bio->state); - - rdev_dec_pending(rdev, mddev); - - end_sync_request(r10_bio); -} - -/* - * Note: sync and recovery are handled very differently for raid10. - * This code is for resync. - * For resync, we read through virtual addresses and read all blocks. - * If there is any error, we schedule a write. The lowest numbered - * drive is authoritative. - * However requests come for physical addresses, so we need to map. - * For every physical address there are raid_disks/copies virtual addresses, - * which is always at least one, but is not necessarily an integer. - * This means that a physical address can span multiple chunks, so we may - * have to submit multiple io requests for a single sync request.
- */ -/* - * We check if all blocks are in-sync and only write to blocks that - * aren't in sync - */ -static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) -{ - struct r10conf *conf = mddev->private; - int i, first; - struct bio *tbio, *fbio; - int vcnt; - - atomic_set(&r10_bio->remaining, 1); - - /* find the first device with a block */ - for (i=0; i<conf->copies; i++) - if (test_bit(BIO_UPTODATE, &r10_bio->devs[i].bio->bi_flags)) - break; - - if (i == conf->copies) - goto done; - - first = i; - fbio = r10_bio->devs[i].bio; - - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9); - /* now find blocks with errors */ - for (i=0 ; i < conf->copies ; i++) { - int j, d; - - tbio = r10_bio->devs[i].bio; - - if (tbio->bi_end_io != end_sync_read) - continue; - if (i == first) - continue; - if (test_bit(BIO_UPTODATE, &r10_bio->devs[i].bio->bi_flags)) { - /* We know that the bi_io_vec layout is the same for - * both 'first' and 'i', so we just compare them. - * All vec entries are PAGE_SIZE; - */ - for (j = 0; j < vcnt; j++) - if (memcmp(page_address(fbio->bi_io_vec[j].bv_page), - page_address(tbio->bi_io_vec[j].bv_page), - fbio->bi_io_vec[j].bv_len)) - break; - if (j == vcnt) - continue; - mddev->resync_mismatches += r10_bio->sectors; - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) - /* Don't fix anything. */ - continue; - } - /* Ok, we need to write this bio, either to correct an - * inconsistency or to correct an unreadable block.
- * First we need to fixup bv_offset, bv_len and - * bi_vecs, as the read request might have corrupted these - */ - tbio->bi_vcnt = vcnt; - tbio->bi_size = r10_bio->sectors << 9; - tbio->bi_idx = 0; - tbio->bi_phys_segments = 0; - tbio->bi_flags &= ~(BIO_POOL_MASK - 1); - tbio->bi_flags |= 1 << BIO_UPTODATE; - tbio->bi_next = NULL; - tbio->bi_rw = WRITE; - tbio->bi_private = r10_bio; - tbio->bi_sector = r10_bio->devs[i].addr; - - for (j=0; j < vcnt ; j++) { - tbio->bi_io_vec[j].bv_offset = 0; - tbio->bi_io_vec[j].bv_len = PAGE_SIZE; - - memcpy(page_address(tbio->bi_io_vec[j].bv_page), - page_address(fbio->bi_io_vec[j].bv_page), - PAGE_SIZE); - } - tbio->bi_end_io = end_sync_write; - - d = r10_bio->devs[i].devnum; - atomic_inc(&conf->mirrors[d].rdev->nr_pending); - atomic_inc(&r10_bio->remaining); - md_sync_acct(conf->mirrors[d].rdev->bdev, tbio->bi_size >> 9); - - tbio->bi_sector += conf->mirrors[d].rdev->data_offset; - tbio->bi_bdev = conf->mirrors[d].rdev->bdev; - generic_make_request(tbio); - } - - /* Now write out to any replacement devices - * that are active - */ - for (i = 0; i < conf->copies; i++) { - int j, d; - - tbio = r10_bio->devs[i].repl_bio; - if (!tbio || !tbio->bi_end_io) - continue; - if (r10_bio->devs[i].bio->bi_end_io != end_sync_write - && r10_bio->devs[i].bio != fbio) - for (j = 0; j < vcnt; j++) - memcpy(page_address(tbio->bi_io_vec[j].bv_page), - page_address(fbio->bi_io_vec[j].bv_page), - PAGE_SIZE); - d = r10_bio->devs[i].devnum; - atomic_inc(&r10_bio->remaining); - md_sync_acct(conf->mirrors[d].replacement->bdev, - tbio->bi_size >> 9); - generic_make_request(tbio); - } - -done: - if (atomic_dec_and_test(&r10_bio->remaining)) { - md_done_sync(mddev, r10_bio->sectors, 1); - put_buf(r10_bio); - } -} - -/* - * Now for the recovery code. - * Recovery happens across physical sectors. - * We recover all non-in_sync drives by finding the virtual address of - * each, and then choose a working drive that also has that virt address.
- * There is a separate r10_bio for each non-in_sync drive. - * Only the first two slots are in use. The first for reading, - * The second for writing. - * - */ -static void fix_recovery_read_error(struct r10bio *r10_bio) -{ - /* We got a read error during recovery. - * We repeat the read in smaller page-sized sections. - * If a read succeeds, write it to the new device or record - * a bad block if we cannot. - * If a read fails, record a bad block on both old and - * new devices. - */ - struct mddev *mddev = r10_bio->mddev; - struct r10conf *conf = mddev->private; - struct bio *bio = r10_bio->devs[0].bio; - sector_t sect = 0; - int sectors = r10_bio->sectors; - int idx = 0; - int dr = r10_bio->devs[0].devnum; - int dw = r10_bio->devs[1].devnum; - - while (sectors) { - int s = sectors; - struct md_rdev *rdev; - sector_t addr; - int ok; - - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - - rdev = conf->mirrors[dr].rdev; - addr = r10_bio->devs[0].addr + sect; - ok = sync_page_io(rdev, - addr, - s << 9, - bio->bi_io_vec[idx].bv_page, - READ, false); - if (ok) { - rdev = conf->mirrors[dw].rdev; - addr = r10_bio->devs[1].addr + sect; - ok = sync_page_io(rdev, - addr, - s << 9, - bio->bi_io_vec[idx].bv_page, - WRITE, false); - if (!ok) { - set_bit(WriteErrorSeen, &rdev->flags); - if (!test_and_set_bit(WantReplacement, - &rdev->flags)) - set_bit(MD_RECOVERY_NEEDED, - &rdev->mddev->recovery); - } - } - if (!ok) { - /* We don't worry if we cannot set a bad block - - * it really is bad so there is no loss in not - * recording it yet - */ - rdev_set_badblocks(rdev, addr, s, 0); - - if (rdev != conf->mirrors[dw].rdev) { - /* need bad block on destination too */ - struct md_rdev *rdev2 = conf->mirrors[dw].rdev; - addr = r10_bio->devs[1].addr + sect; - ok = rdev_set_badblocks(rdev2, addr, s, 0); - if (!ok) { - /* just abort the recovery */ - printk(KERN_NOTICE - "md/raid10:%s: recovery aborted" - " due to read error\n", - mdname(mddev)); - - conf->mirrors[dw].recovery_disabled -
= mddev->recovery_disabled; - set_bit(MD_RECOVERY_INTR, - &mddev->recovery); - break; - } - } - } - - sectors -= s; - sect += s; - idx++; - } -} - -static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio) -{ - struct r10conf *conf = mddev->private; - int d; - struct bio *wbio, *wbio2; - - if (!test_bit(R10BIO_Uptodate, &r10_bio->state)) { - fix_recovery_read_error(r10_bio); - end_sync_request(r10_bio); - return; - } - - /* - * share the pages with the first bio - * and submit the write request - */ - d = r10_bio->devs[1].devnum; - wbio = r10_bio->devs[1].bio; - wbio2 = r10_bio->devs[1].repl_bio; - if (wbio->bi_end_io) { - atomic_inc(&conf->mirrors[d].rdev->nr_pending); - md_sync_acct(conf->mirrors[d].rdev->bdev, wbio->bi_size >> 9); - generic_make_request(wbio); - } - if (wbio2 && wbio2->bi_end_io) { - atomic_inc(&conf->mirrors[d].replacement->nr_pending); - md_sync_acct(conf->mirrors[d].replacement->bdev, - wbio2->bi_size >> 9); - generic_make_request(wbio2); - } -} - - -/* - * Used by fix_read_error() to decay the per rdev read_errors. - * We halve the read error count for every hour that has elapsed - * since the last recorded read error. - * - */ -static void check_decay_read_errors(struct mddev *mddev, struct md_rdev *rdev) -{ - struct timespec cur_time_mon; - unsigned long hours_since_last; - unsigned int read_errors = atomic_read(&rdev->read_errors); - - ktime_get_ts(&cur_time_mon); - - if (rdev->last_read_error.tv_sec == 0 && - rdev->last_read_error.tv_nsec == 0) { - /* first time we've seen a read error */ - rdev->last_read_error = cur_time_mon; - return; - } - - hours_since_last = (cur_time_mon.tv_sec - - rdev->last_read_error.tv_sec) / 3600; - - rdev->last_read_error = cur_time_mon; - - /* - * if hours_since_last is > the number of bits in read_errors - * just set read errors to 0. We do this to avoid - * overflowing the shift of read_errors by hours_since_last. 
- */ - if (hours_since_last >= 8 * sizeof(read_errors)) - atomic_set(&rdev->read_errors, 0); - else - atomic_set(&rdev->read_errors, read_errors >> hours_since_last); -} - -static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector, - int sectors, struct page *page, int rw) -{ - sector_t first_bad; - int bad_sectors; - - if (is_badblock(rdev, sector, sectors, &first_bad, &bad_sectors) - && (rw == READ || test_bit(WriteErrorSeen, &rdev->flags))) - return -1; - if (sync_page_io(rdev, sector, sectors << 9, page, rw, false)) - /* success */ - return 1; - if (rw == WRITE) { - set_bit(WriteErrorSeen, &rdev->flags); - if (!test_and_set_bit(WantReplacement, &rdev->flags)) - set_bit(MD_RECOVERY_NEEDED, - &rdev->mddev->recovery); - } - /* need to record an error - either for the block or the device */ - if (!rdev_set_badblocks(rdev, sector, sectors, 0)) - md_error(rdev->mddev, rdev); - return 0; -} - -/* - * This is a kernel thread which: - * - * 1. Retries failed read operations on working mirrors. - * 2. Updates the raid superblock when problems are encountered. - * 3. Performs writes following reads for array synchronising. - */ - -static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10bio *r10_bio) -{ - int sect = 0; /* Offset from r10_bio->sector */ - int sectors = r10_bio->sectors; - struct md_rdev *rdev; - int max_read_errors = atomic_read(&mddev->max_corr_read_errors); - int d = r10_bio->devs[r10_bio->read_slot].devnum; - - /* still own a reference to this rdev, so it cannot - * have been cleared recently.
- */ - rdev = conf->mirrors[d].rdev; - - if (test_bit(Faulty, &rdev->flags)) - /* drive has already been failed, just ignore any - more fix_read_error() attempts */ - return; - - check_decay_read_errors(mddev, rdev); - atomic_inc(&rdev->read_errors); - if (atomic_read(&rdev->read_errors) > max_read_errors) { - char b[BDEVNAME_SIZE]; - bdevname(rdev->bdev, b); - - printk(KERN_NOTICE - "md/raid10:%s: %s: Raid device exceeded " - "read_error threshold [cur %d:max %d]\n", - mdname(mddev), b, - atomic_read(&rdev->read_errors), max_read_errors); - printk(KERN_NOTICE - "md/raid10:%s: %s: Failing raid device\n", - mdname(mddev), b); - md_error(mddev, conf->mirrors[d].rdev); - r10_bio->devs[r10_bio->read_slot].bio = IO_BLOCKED; - return; - } - - while(sectors) { - int s = sectors; - int sl = r10_bio->read_slot; - int success = 0; - int start; - - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - - rcu_read_lock(); - do { - sector_t first_bad; - int bad_sectors; - - d = r10_bio->devs[sl].devnum; - rdev = rcu_dereference(conf->mirrors[d].rdev); - if (rdev && - !test_bit(Unmerged, &rdev->flags) && - test_bit(In_sync, &rdev->flags) && - is_badblock(rdev, r10_bio->devs[sl].addr + sect, s, - &first_bad, &bad_sectors) == 0) { - atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); - success = sync_page_io(rdev, - r10_bio->devs[sl].addr + - sect, - s<<9, - conf->tmppage, READ, false); - rdev_dec_pending(rdev, mddev); - rcu_read_lock(); - if (success) - break; - } - sl++; - if (sl == conf->copies) - sl = 0; - } while (!success && sl != r10_bio->read_slot); - rcu_read_unlock(); - - if (!success) { - /* Cannot read from anywhere, just mark the block - * as bad on the first device to discourage future - * reads. 
- */ - int dn = r10_bio->devs[r10_bio->read_slot].devnum; - rdev = conf->mirrors[dn].rdev; - - if (!rdev_set_badblocks( - rdev, - r10_bio->devs[r10_bio->read_slot].addr - + sect, - s, 0)) { - md_error(mddev, rdev); - r10_bio->devs[r10_bio->read_slot].bio - = IO_BLOCKED; - } - break; - } - - start = sl; - /* write it back and re-read */ - rcu_read_lock(); - while (sl != r10_bio->read_slot) { - char b[BDEVNAME_SIZE]; - - if (sl==0) - sl = conf->copies; - sl--; - d = r10_bio->devs[sl].devnum; - rdev = rcu_dereference(conf->mirrors[d].rdev); - if (!rdev || - test_bit(Unmerged, &rdev->flags) || - !test_bit(In_sync, &rdev->flags)) - continue; - - atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); - if (r10_sync_page_io(rdev, - r10_bio->devs[sl].addr + - sect, - s, conf->tmppage, WRITE) - == 0) { - /* Well, this device is dead */ - printk(KERN_NOTICE - "md/raid10:%s: read correction " - "write failed" - " (%d sectors at %llu on %s)\n", - mdname(mddev), s, - (unsigned long long)( - sect + - choose_data_offset(r10_bio, - rdev)), - bdevname(rdev->bdev, b)); - printk(KERN_NOTICE "md/raid10:%s: %s: failing " - "drive\n", - mdname(mddev), - bdevname(rdev->bdev, b)); - } - rdev_dec_pending(rdev, mddev); - rcu_read_lock(); - } - sl = start; - while (sl != r10_bio->read_slot) { - char b[BDEVNAME_SIZE]; - - if (sl==0) - sl = conf->copies; - sl--; - d = r10_bio->devs[sl].devnum; - rdev = rcu_dereference(conf->mirrors[d].rdev); - if (!rdev || - !test_bit(In_sync, &rdev->flags)) - continue; - - atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); - switch (r10_sync_page_io(rdev, - r10_bio->devs[sl].addr + - sect, - s, conf->tmppage, - READ)) { - case 0: - /* Well, this device is dead */ - printk(KERN_NOTICE - "md/raid10:%s: unable to read back " - "corrected sectors" - " (%d sectors at %llu on %s)\n", - mdname(mddev), s, - (unsigned long long)( - sect + - choose_data_offset(r10_bio, rdev)), - bdevname(rdev->bdev, b)); - printk(KERN_NOTICE "md/raid10:%s: %s: failing " - "drive\n", - 
mdname(mddev), - bdevname(rdev->bdev, b)); - break; - case 1: - printk(KERN_INFO - "md/raid10:%s: read error corrected" - " (%d sectors at %llu on %s)\n", - mdname(mddev), s, - (unsigned long long)( - sect + - choose_data_offset(r10_bio, rdev)), - bdevname(rdev->bdev, b)); - atomic_add(s, &rdev->corrected_errors); - } - - rdev_dec_pending(rdev, mddev); - rcu_read_lock(); - } - rcu_read_unlock(); - - sectors -= s; - sect += s; - } -} - -static void bi_complete(struct bio *bio, int error) -{ - complete((struct completion *)bio->bi_private); -} - -static int submit_bio_wait(int rw, struct bio *bio) -{ - struct completion event; - rw |= REQ_SYNC; - - init_completion(&event); - bio->bi_private = &event; - bio->bi_end_io = bi_complete; - submit_bio(rw, bio); - wait_for_completion(&event); - - return test_bit(BIO_UPTODATE, &bio->bi_flags); -} - -static int narrow_write_error(struct r10bio *r10_bio, int i) -{ - struct bio *bio = r10_bio->master_bio; - struct mddev *mddev = r10_bio->mddev; - struct r10conf *conf = mddev->private; - struct md_rdev *rdev = conf->mirrors[r10_bio->devs[i].devnum].rdev; - /* bio has the data to be written to slot 'i' where - * we just recently had a write error. - * We repeatedly clone the bio and trim down to one block, - * then try the write. Where the write fails we record - * a bad block. - * It is conceivable that the bio doesn't exactly align with - * blocks. We must handle this. - * - * We currently own a reference to the rdev. 
- */ - - int block_sectors; - sector_t sector; - int sectors; - int sect_to_write = r10_bio->sectors; - int ok = 1; - - if (rdev->badblocks.shift < 0) - return 0; - - block_sectors = 1 << rdev->badblocks.shift; - sector = r10_bio->sector; - sectors = ((r10_bio->sector + block_sectors) - & ~(sector_t)(block_sectors - 1)) - - sector; - - while (sect_to_write) { - struct bio *wbio; - if (sectors > sect_to_write) - sectors = sect_to_write; - /* Write at 'sector' for 'sectors' */ - wbio = bio_clone_mddev(bio, GFP_NOIO, mddev); - md_trim_bio(wbio, sector - bio->bi_sector, sectors); - wbio->bi_sector = (r10_bio->devs[i].addr+ - choose_data_offset(r10_bio, rdev) + - (sector - r10_bio->sector)); - wbio->bi_bdev = rdev->bdev; - if (submit_bio_wait(WRITE, wbio) == 0) - /* Failure! */ - ok = rdev_set_badblocks(rdev, sector, - sectors, 0) - && ok; - - bio_put(wbio); - sect_to_write -= sectors; - sector += sectors; - sectors = block_sectors; - } - return ok; -} - -static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio) -{ - int slot = r10_bio->read_slot; - struct bio *bio; - struct r10conf *conf = mddev->private; - struct md_rdev *rdev = r10_bio->devs[slot].rdev; - char b[BDEVNAME_SIZE]; - unsigned long do_sync; - int max_sectors; - - /* we got a read error. Maybe the drive is bad. Maybe just - * the block and we can fix it. - * We freeze all other IO, and try reading the block from - * other devices. When we find one, we re-write - * and check if that fixes the read error. - * This is all done synchronously while the array is - * frozen.
- */ - bio = r10_bio->devs[slot].bio; - bdevname(bio->bi_bdev, b); - bio_put(bio); - r10_bio->devs[slot].bio = NULL; - - if (mddev->ro == 0) { - freeze_array(conf); - fix_read_error(conf, mddev, r10_bio); - unfreeze_array(conf); - } else - r10_bio->devs[slot].bio = IO_BLOCKED; - - rdev_dec_pending(rdev, mddev); - -read_more: - rdev = read_balance(conf, r10_bio, &max_sectors); - if (rdev == NULL) { - printk(KERN_ALERT "md/raid10:%s: %s: unrecoverable I/O" - " read error for block %llu\n", - mdname(mddev), b, - (unsigned long long)r10_bio->sector); - raid_end_bio_io(r10_bio); - return; - } - - do_sync = (r10_bio->master_bio->bi_rw & REQ_SYNC); - slot = r10_bio->read_slot; - printk_ratelimited( - KERN_ERR - "md/raid10:%s: %s: redirecting " - "sector %llu to another mirror\n", - mdname(mddev), - bdevname(rdev->bdev, b), - (unsigned long long)r10_bio->sector); - bio = bio_clone_mddev(r10_bio->master_bio, - GFP_NOIO, mddev); - md_trim_bio(bio, - r10_bio->sector - bio->bi_sector, - max_sectors); - r10_bio->devs[slot].bio = bio; - r10_bio->devs[slot].rdev = rdev; - bio->bi_sector = r10_bio->devs[slot].addr - + choose_data_offset(r10_bio, rdev); - bio->bi_bdev = rdev->bdev; - bio->bi_rw = READ | do_sync; - bio->bi_private = r10_bio; - bio->bi_end_io = raid10_end_read_request; - if (max_sectors < r10_bio->sectors) { - /* Drat - have to split this up more */ - struct bio *mbio = r10_bio->master_bio; - int sectors_handled = - r10_bio->sector + max_sectors - - mbio->bi_sector; - r10_bio->sectors = max_sectors; - spin_lock_irq(&conf->device_lock); - if (mbio->bi_phys_segments == 0) - mbio->bi_phys_segments = 2; - else - mbio->bi_phys_segments++; - spin_unlock_irq(&conf->device_lock); - generic_make_request(bio); - - r10_bio = mempool_alloc(conf->r10bio_pool, - GFP_NOIO); - r10_bio->master_bio = mbio; - r10_bio->sectors = (mbio->bi_size >> 9) - - sectors_handled; - r10_bio->state = 0; - set_bit(R10BIO_ReadError, - &r10_bio->state); - r10_bio->mddev = mddev; - r10_bio->sector = 
mbio->bi_sector - + sectors_handled; - - goto read_more; - } else - generic_make_request(bio); -} - -static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio) -{ - /* Some sort of write request has finished and it - * succeeded in writing where we thought there was a - * bad block. So forget the bad block. - * Or possibly it failed and we need to record - * a bad block. - */ - int m; - struct md_rdev *rdev; - - if (test_bit(R10BIO_IsSync, &r10_bio->state) || - test_bit(R10BIO_IsRecover, &r10_bio->state)) { - for (m = 0; m < conf->copies; m++) { - int dev = r10_bio->devs[m].devnum; - rdev = conf->mirrors[dev].rdev; - if (r10_bio->devs[m].bio == NULL) - continue; - if (test_bit(BIO_UPTODATE, - &r10_bio->devs[m].bio->bi_flags)) { - rdev_clear_badblocks( - rdev, - r10_bio->devs[m].addr, - r10_bio->sectors, 0); - } else { - if (!rdev_set_badblocks( - rdev, - r10_bio->devs[m].addr, - r10_bio->sectors, 0)) - md_error(conf->mddev, rdev); - } - rdev = conf->mirrors[dev].replacement; - if (r10_bio->devs[m].repl_bio == NULL) - continue; - if (test_bit(BIO_UPTODATE, - &r10_bio->devs[m].repl_bio->bi_flags)) { - rdev_clear_badblocks( - rdev, - r10_bio->devs[m].addr, - r10_bio->sectors, 0); - } else { - if (!rdev_set_badblocks( - rdev, - r10_bio->devs[m].addr, - r10_bio->sectors, 0)) - md_error(conf->mddev, rdev); - } - } - put_buf(r10_bio); - } else { - for (m = 0; m < conf->copies; m++) { - int dev = r10_bio->devs[m].devnum; - struct bio *bio = r10_bio->devs[m].bio; - rdev = conf->mirrors[dev].rdev; - if (bio == IO_MADE_GOOD) { - rdev_clear_badblocks( - rdev, - r10_bio->devs[m].addr, - r10_bio->sectors, 0); - rdev_dec_pending(rdev, conf->mddev); - } else if (bio != NULL && - !test_bit(BIO_UPTODATE, &bio->bi_flags)) { - if (!narrow_write_error(r10_bio, m)) { - md_error(conf->mddev, rdev); - set_bit(R10BIO_Degraded, - &r10_bio->state); - } - rdev_dec_pending(rdev, conf->mddev); - } - bio = r10_bio->devs[m].repl_bio; - rdev = conf->mirrors[dev].replacement;
- if (rdev && bio == IO_MADE_GOOD) { - rdev_clear_badblocks( - rdev, - r10_bio->devs[m].addr, - r10_bio->sectors, 0); - rdev_dec_pending(rdev, conf->mddev); - } - } - if (test_bit(R10BIO_WriteError, - &r10_bio->state)) - close_write(r10_bio); - raid_end_bio_io(r10_bio); - } -} - -static void raid10d(struct mddev *mddev) -{ - struct r10bio *r10_bio; - unsigned long flags; - struct r10conf *conf = mddev->private; - struct list_head *head = &conf->retry_list; - struct blk_plug plug; - - md_check_recovery(mddev); - - blk_start_plug(&plug); - for (;;) { - - flush_pending_writes(conf); - - spin_lock_irqsave(&conf->device_lock, flags); - if (list_empty(head)) { - spin_unlock_irqrestore(&conf->device_lock, flags); - break; - } - r10_bio = list_entry(head->prev, struct r10bio, retry_list); - list_del(head->prev); - conf->nr_queued--; - spin_unlock_irqrestore(&conf->device_lock, flags); - - mddev = r10_bio->mddev; - conf = mddev->private; - if (test_bit(R10BIO_MadeGood, &r10_bio->state) || - test_bit(R10BIO_WriteError, &r10_bio->state)) - handle_write_completed(conf, r10_bio); - else if (test_bit(R10BIO_IsReshape, &r10_bio->state)) - reshape_request_write(mddev, r10_bio); - else if (test_bit(R10BIO_IsSync, &r10_bio->state)) - sync_request_write(mddev, r10_bio); - else if (test_bit(R10BIO_IsRecover, &r10_bio->state)) - recovery_request_write(mddev, r10_bio); - else if (test_bit(R10BIO_ReadError, &r10_bio->state)) - handle_read_error(mddev, r10_bio); - else { - /* just a partial read to be scheduled from a - * separate context - */ - int slot = r10_bio->read_slot; - generic_make_request(r10_bio->devs[slot].bio); - } - - cond_resched(); - if (mddev->flags & ~(1<<MD_CHANGE_PENDING)) - md_check_recovery(mddev); - } - blk_finish_plug(&plug); -} - -static int init_resync(struct r10conf *conf) -{ - int buffs; - int i; - - buffs = RESYNC_WINDOW / RESYNC_BLOCK_SIZE; - BUG_ON(conf->r10buf_pool); - conf->have_replacement = 0; - for (i = 0; i < conf->geo.raid_disks; i++) - if (conf->mirrors[i].replacement) - conf->have_replacement = 1; - conf->r10buf_pool = mempool_create(buffs, r10buf_pool_alloc, r10buf_pool_free, conf); - if (!conf->r10buf_pool) - return -ENOMEM; - conf->next_resync = 0; - return
0; -} - -/* - * perform a "sync" on one "block" - * - * We need to make sure that no normal I/O request - particularly write - * requests - conflict with active sync requests. - * - * This is achieved by tracking pending requests and a 'barrier' concept - * that can be installed to exclude normal IO requests. - * - * Resync and recovery are handled very differently. - * We differentiate by looking at MD_RECOVERY_SYNC in mddev->recovery. - * - * For resync, we iterate over virtual addresses, read all copies, - * and update if there are differences. If only one copy is live, - * skip it. - * For recovery, we iterate over physical addresses, read a good - * value for each non-in_sync drive, and over-write. - * - * So, for recovery we may have several outstanding complex requests for a - * given address, one for each out-of-sync device. We model this by allocating - * a number of r10_bio structures, one for each out-of-sync device. - * As we set up these structures, we collect all bio's together into a list - * which we then process collectively to add pages, and then process again - * to pass to generic_make_request. - * - * The r10_bio structures are linked using a borrowed master_bio pointer. - * This link is counted in ->remaining. When the r10_bio that points to NULL - * has its remaining count decremented to 0, the whole complex operation - * is complete.
- * - */ - -static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, - int *skipped, int go_faster) -{ - struct r10conf *conf = mddev->private; - struct r10bio *r10_bio; - struct bio *biolist = NULL, *bio; - sector_t max_sector, nr_sectors; - int i; - int max_sync; - sector_t sync_blocks; - sector_t sectors_skipped = 0; - int chunks_skipped = 0; - sector_t chunk_mask = conf->geo.chunk_mask; - - if (!conf->r10buf_pool) - if (init_resync(conf)) - return 0; - - skipped: - max_sector = mddev->dev_sectors; - if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) || - test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) - max_sector = mddev->resync_max_sectors; - if (sector_nr >= max_sector) { - /* If we aborted, we need to abort the - * sync on the 'current' bitmap chucks (there can - * be several when recovering multiple devices). - * as we may have started syncing it but not finished. - * We can find the current address in - * mddev->curr_resync, but for recovery, - * we need to convert that to several - * virtual addresses. - */ - if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) { - end_reshape(conf); - return 0; - } - - if (mddev->curr_resync < max_sector) { /* aborted */ - if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) - bitmap_end_sync(mddev->bitmap, mddev->curr_resync, - &sync_blocks, 1); - else for (i = 0; i < conf->geo.raid_disks; i++) { - sector_t sect = - raid10_find_virt(conf, mddev->curr_resync, i); - bitmap_end_sync(mddev->bitmap, sect, - &sync_blocks, 1); - } - } else { - /* completed sync */ - if ((!mddev->bitmap || conf->fullsync) - && conf->have_replacement - && test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { - /* Completed a full sync so the replacements - * are now fully recovered. 
- */ - for (i = 0; i < conf->geo.raid_disks; i++) - if (conf->mirrors[i].replacement) - conf->mirrors[i].replacement - ->recovery_offset - = MaxSector; - } - conf->fullsync = 0; - } - bitmap_close_sync(mddev->bitmap); - close_sync(conf); - *skipped = 1; - return sectors_skipped; - } - - if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) - return reshape_request(mddev, sector_nr, skipped); - - if (chunks_skipped >= conf->geo.raid_disks) { - /* if there has been nothing to do on any drive, - * then there is nothing to do at all.. - */ - *skipped = 1; - return (max_sector - sector_nr) + sectors_skipped; - } - - if (max_sector > mddev->resync_max) - max_sector = mddev->resync_max; /* Don't do IO beyond here */ - - /* make sure whole request will fit in a chunk - if chunks - * are meaningful - */ - if (conf->geo.near_copies < conf->geo.raid_disks && - max_sector > (sector_nr | chunk_mask)) - max_sector = (sector_nr | chunk_mask) + 1; - /* - * If there is non-resync activity waiting for us then - * put in a delay to throttle resync. - */ - if (!go_faster && conf->nr_waiting) - msleep_interruptible(1000); - - /* Again, very different code for resync and recovery. - * Both must result in an r10bio with a list of bios that - * have bi_end_io, bi_sector, bi_bdev set, - * and bi_private set to the r10bio. - * For recovery, we may actually create several r10bios - * with 2 bios in each, that correspond to the bios in the main one. - * In this case, the subordinate r10bios link back through a - * borrowed master_bio pointer, and the counter in the master - * includes a ref from each subordinate. - */ - /* First, we decide what to do and set ->bi_end_io - * To end_sync_read if we want to read, and - * end_sync_write if we will want to write. - */ - - max_sync = RESYNC_PAGES << (PAGE_SHIFT-9); - if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { - /* recovery... 
the complicated one */
-		int j;
-		r10_bio = NULL;
-
-		for (i = 0 ; i < conf->geo.raid_disks; i++) {
-			int still_degraded;
-			struct r10bio *rb2;
-			sector_t sect;
-			int must_sync;
-			int any_working;
-			struct raid10_info *mirror = &conf->mirrors[i];
-
-			if ((mirror->rdev == NULL ||
-			     test_bit(In_sync, &mirror->rdev->flags))
-			    &&
-			    (mirror->replacement == NULL ||
-			     test_bit(Faulty,
-				      &mirror->replacement->flags)))
-				continue;
-
-			still_degraded = 0;
-			/* want to reconstruct this device */
-			rb2 = r10_bio;
-			sect = raid10_find_virt(conf, sector_nr, i);
-			if (sect >= mddev->resync_max_sectors) {
-				/* last stripe is not complete - don't
-				 * try to recover this sector.
-				 */
-				continue;
-			}
-			/* Unless we are doing a full sync, or a replacement
-			 * we only need to recover the block if it is set in
-			 * the bitmap
-			 */
-			must_sync = bitmap_start_sync(mddev->bitmap, sect,
-						      &sync_blocks, 1);
-			if (sync_blocks < max_sync)
-				max_sync = sync_blocks;
-			if (!must_sync &&
-			    mirror->replacement == NULL &&
-			    !conf->fullsync) {
-				/* yep, skip the sync_blocks here, but don't assume
-				 * that there will never be anything to do here
-				 */
-				chunks_skipped = -1;
-				continue;
-			}
-
-			r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
-			raise_barrier(conf, rb2 != NULL);
-			atomic_set(&r10_bio->remaining, 0);
-
-			r10_bio->master_bio = (struct bio*)rb2;
-			if (rb2)
-				atomic_inc(&rb2->remaining);
-			r10_bio->mddev = mddev;
-			set_bit(R10BIO_IsRecover, &r10_bio->state);
-			r10_bio->sector = sect;
-
-			raid10_find_phys(conf, r10_bio);
-
-			/* Need to check if the array will still be
-			 * degraded
-			 */
-			for (j = 0; j < conf->geo.raid_disks; j++)
-				if (conf->mirrors[j].rdev == NULL ||
-				    test_bit(Faulty, &conf->mirrors[j].rdev->flags)) {
-					still_degraded = 1;
-					break;
-				}
-
-			must_sync = bitmap_start_sync(mddev->bitmap, sect,
-						      &sync_blocks, still_degraded);
-
-			any_working = 0;
-			for (j=0; j<conf->copies;j++) {
-				int k;
-				int d = r10_bio->devs[j].devnum;
-				sector_t from_addr, to_addr;
-				struct md_rdev *rdev;
-
sector_t sector, first_bad;
-				int bad_sectors;
-				if (!conf->mirrors[d].rdev ||
-				    !test_bit(In_sync, &conf->mirrors[d].rdev->flags))
-					continue;
-				/* This is where we read from */
-				any_working = 1;
-				rdev = conf->mirrors[d].rdev;
-				sector = r10_bio->devs[j].addr;
-
-				if (is_badblock(rdev, sector, max_sync,
-						&first_bad, &bad_sectors)) {
-					if (first_bad > sector)
-						max_sync = first_bad - sector;
-					else {
-						bad_sectors -= (sector
-								- first_bad);
-						if (max_sync > bad_sectors)
-							max_sync = bad_sectors;
-						continue;
-					}
-				}
-				bio = r10_bio->devs[0].bio;
-				bio->bi_next = biolist;
-				biolist = bio;
-				bio->bi_private = r10_bio;
-				bio->bi_end_io = end_sync_read;
-				bio->bi_rw = READ;
-				from_addr = r10_bio->devs[j].addr;
-				bio->bi_sector = from_addr + rdev->data_offset;
-				bio->bi_bdev = rdev->bdev;
-				atomic_inc(&rdev->nr_pending);
-				/* and we write to 'i' (if not in_sync) */
-
-				for (k=0; k<conf->copies; k++)
-					if (r10_bio->devs[k].devnum == i)
-						break;
-				BUG_ON(k == conf->copies);
-				to_addr = r10_bio->devs[k].addr;
-				r10_bio->devs[0].devnum = d;
-				r10_bio->devs[0].addr = from_addr;
-				r10_bio->devs[1].devnum = i;
-				r10_bio->devs[1].addr = to_addr;
-
-				rdev = mirror->rdev;
-				if (!test_bit(In_sync, &rdev->flags)) {
-					bio = r10_bio->devs[1].bio;
-					bio->bi_next = biolist;
-					biolist = bio;
-					bio->bi_private = r10_bio;
-					bio->bi_end_io = end_sync_write;
-					bio->bi_rw = WRITE;
-					bio->bi_sector = to_addr
-						+ rdev->data_offset;
-					bio->bi_bdev = rdev->bdev;
-					atomic_inc(&r10_bio->remaining);
-				} else
-					r10_bio->devs[1].bio->bi_end_io = NULL;
-
-				/* and maybe write to replacement */
-				bio = r10_bio->devs[1].repl_bio;
-				if (bio)
-					bio->bi_end_io = NULL;
-				rdev = mirror->replacement;
-				/* Note: if rdev != NULL, then bio
-				 * cannot be NULL as r10buf_pool_alloc will
-				 * have allocated it.
-				 * So the second test here is pointless.
-				 * But it keeps semantic-checkers happy, and
-				 * this comment keeps human reviewers
-				 * happy.
- */ - if (rdev == NULL || bio == NULL || - test_bit(Faulty, &rdev->flags)) - break; - bio->bi_next = biolist; - biolist = bio; - bio->bi_private = r10_bio; - bio->bi_end_io = end_sync_write; - bio->bi_rw = WRITE; - bio->bi_sector = to_addr + rdev->data_offset; - bio->bi_bdev = rdev->bdev; - atomic_inc(&r10_bio->remaining); - break; - } - if (j == conf->copies) { - /* Cannot recover, so abort the recovery or - * record a bad block */ - put_buf(r10_bio); - if (rb2) - atomic_dec(&rb2->remaining); - r10_bio = rb2; - if (any_working) { - /* problem is that there are bad blocks - * on other device(s) - */ - int k; - for (k = 0; k < conf->copies; k++) - if (r10_bio->devs[k].devnum == i) - break; - if (!test_bit(In_sync, - &mirror->rdev->flags) - && !rdev_set_badblocks( - mirror->rdev, - r10_bio->devs[k].addr, - max_sync, 0)) - any_working = 0; - if (mirror->replacement && - !rdev_set_badblocks( - mirror->replacement, - r10_bio->devs[k].addr, - max_sync, 0)) - any_working = 0; - } - if (!any_working) { - if (!test_and_set_bit(MD_RECOVERY_INTR, - &mddev->recovery)) - printk(KERN_INFO "md/raid10:%s: insufficient " - "working devices for recovery.\n", - mdname(mddev)); - mirror->recovery_disabled - = mddev->recovery_disabled; - } - break; - } - } - if (biolist == NULL) { - while (r10_bio) { - struct r10bio *rb2 = r10_bio; - r10_bio = (struct r10bio*) rb2->master_bio; - rb2->master_bio = NULL; - put_buf(rb2); - } - goto giveup; - } - } else { - /* resync. 
Schedule a read for every block at this virt offset */ - int count = 0; - - bitmap_cond_end_sync(mddev->bitmap, sector_nr); - - if (!bitmap_start_sync(mddev->bitmap, sector_nr, - &sync_blocks, mddev->degraded) && - !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, - &mddev->recovery)) { - /* We can skip this block */ - *skipped = 1; - return sync_blocks + sectors_skipped; - } - if (sync_blocks < max_sync) - max_sync = sync_blocks; - r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO); - - r10_bio->mddev = mddev; - atomic_set(&r10_bio->remaining, 0); - raise_barrier(conf, 0); - conf->next_resync = sector_nr; - - r10_bio->master_bio = NULL; - r10_bio->sector = sector_nr; - set_bit(R10BIO_IsSync, &r10_bio->state); - raid10_find_phys(conf, r10_bio); - r10_bio->sectors = (sector_nr | chunk_mask) - sector_nr + 1; - - for (i = 0; i < conf->copies; i++) { - int d = r10_bio->devs[i].devnum; - sector_t first_bad, sector; - int bad_sectors; - - if (r10_bio->devs[i].repl_bio) - r10_bio->devs[i].repl_bio->bi_end_io = NULL; - - bio = r10_bio->devs[i].bio; - bio->bi_end_io = NULL; - clear_bit(BIO_UPTODATE, &bio->bi_flags); - if (conf->mirrors[d].rdev == NULL || - test_bit(Faulty, &conf->mirrors[d].rdev->flags)) - continue; - sector = r10_bio->devs[i].addr; - if (is_badblock(conf->mirrors[d].rdev, - sector, max_sync, - &first_bad, &bad_sectors)) { - if (first_bad > sector) - max_sync = first_bad - sector; - else { - bad_sectors -= (sector - first_bad); - if (max_sync > bad_sectors) - max_sync = bad_sectors; - continue; - } - } - atomic_inc(&conf->mirrors[d].rdev->nr_pending); - atomic_inc(&r10_bio->remaining); - bio->bi_next = biolist; - biolist = bio; - bio->bi_private = r10_bio; - bio->bi_end_io = end_sync_read; - bio->bi_rw = READ; - bio->bi_sector = sector + - conf->mirrors[d].rdev->data_offset; - bio->bi_bdev = conf->mirrors[d].rdev->bdev; - count++; - - if (conf->mirrors[d].replacement == NULL || - test_bit(Faulty, - &conf->mirrors[d].replacement->flags)) - continue; - - 
/* Need to set up for writing to the replacement */
-			bio = r10_bio->devs[i].repl_bio;
-			clear_bit(BIO_UPTODATE, &bio->bi_flags);
-
-			sector = r10_bio->devs[i].addr;
-			atomic_inc(&conf->mirrors[d].rdev->nr_pending);
-			bio->bi_next = biolist;
-			biolist = bio;
-			bio->bi_private = r10_bio;
-			bio->bi_end_io = end_sync_write;
-			bio->bi_rw = WRITE;
-			bio->bi_sector = sector +
-				conf->mirrors[d].replacement->data_offset;
-			bio->bi_bdev = conf->mirrors[d].replacement->bdev;
-			count++;
-		}
-
-		if (count < 2) {
-			for (i=0; i<conf->copies; i++) {
-				int d = r10_bio->devs[i].devnum;
-				if (r10_bio->devs[i].bio->bi_end_io)
-					rdev_dec_pending(conf->mirrors[d].rdev,
-							 mddev);
-				if (r10_bio->devs[i].repl_bio &&
-				    r10_bio->devs[i].repl_bio->bi_end_io)
-					rdev_dec_pending(
-						conf->mirrors[d].replacement,
-						mddev);
-			}
-			put_buf(r10_bio);
-			biolist = NULL;
-			goto giveup;
-		}
-	}
-
-	for (bio = biolist; bio ; bio=bio->bi_next) {
-
-		bio->bi_flags &= ~(BIO_POOL_MASK - 1);
-		if (bio->bi_end_io)
-			bio->bi_flags |= 1 << BIO_UPTODATE;
-		bio->bi_vcnt = 0;
-		bio->bi_idx = 0;
-		bio->bi_phys_segments = 0;
-		bio->bi_size = 0;
-	}
-
-	nr_sectors = 0;
-	if (sector_nr + max_sync < max_sector)
-		max_sector = sector_nr + max_sync;
-	do {
-		struct page *page;
-		int len = PAGE_SIZE;
-		if (sector_nr + (len>>9) > max_sector)
-			len = (max_sector - sector_nr) << 9;
-		if (len == 0)
-			break;
-		for (bio= biolist ; bio ; bio=bio->bi_next) {
-			struct bio *bio2;
-			page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
-			if (bio_add_page(bio, page, len, 0))
-				continue;
-
-			/* stop here */
-			bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
-			for (bio2 = biolist;
-			     bio2 && bio2 != bio;
-			     bio2 = bio2->bi_next) {
-				/* remove last page from this bio */
-				bio2->bi_vcnt--;
-				bio2->bi_size -= len;
-				bio2->bi_flags &= ~(1<< BIO_SEG_VALID);
-			}
-			goto bio_full;
-		}
-		nr_sectors += len>>9;
-		sector_nr += len>>9;
-	} while (biolist->bi_vcnt < RESYNC_PAGES);
- bio_full:
-	r10_bio->sectors = nr_sectors;
-
-	while (biolist) {
-		bio = biolist;
-
biolist = biolist->bi_next; - - bio->bi_next = NULL; - r10_bio = bio->bi_private; - r10_bio->sectors = nr_sectors; - - if (bio->bi_end_io == end_sync_read) { - md_sync_acct(bio->bi_bdev, nr_sectors); - generic_make_request(bio); - } - } - - if (sectors_skipped) - /* pretend they weren't skipped, it makes - * no important difference in this case - */ - md_done_sync(mddev, sectors_skipped, 1); - - return sectors_skipped + nr_sectors; - giveup: - /* There is nowhere to write, so all non-sync - * drives must be failed or in resync, all drives - * have a bad block, so try the next chunk... - */ - if (sector_nr + max_sync < max_sector) - max_sector = sector_nr + max_sync; - - sectors_skipped += (max_sector - sector_nr); - chunks_skipped ++; - sector_nr = max_sector; - goto skipped; -} - -static sector_t -raid10_size(struct mddev *mddev, sector_t sectors, int raid_disks) -{ - sector_t size; - struct r10conf *conf = mddev->private; - - if (!raid_disks) - raid_disks = min(conf->geo.raid_disks, - conf->prev.raid_disks); - if (!sectors) - sectors = conf->dev_sectors; - - size = sectors >> conf->geo.chunk_shift; - sector_div(size, conf->geo.far_copies); - size = size * raid_disks; - sector_div(size, conf->geo.near_copies); - - return size << conf->geo.chunk_shift; -} - -static void calc_sectors(struct r10conf *conf, sector_t size) -{ - /* Calculate the number of sectors-per-device that will - * actually be used, and set conf->dev_sectors and - * conf->stride - */ - - size = size >> conf->geo.chunk_shift; - sector_div(size, conf->geo.far_copies); - size = size * conf->geo.raid_disks; - sector_div(size, conf->geo.near_copies); - /* 'size' is now the number of chunks in the array */ - /* calculate "used chunks per device" */ - size = size * conf->copies; - - /* We need to round up when dividing by raid_disks to - * get the stride size. 
- */ - size = DIV_ROUND_UP_SECTOR_T(size, conf->geo.raid_disks); - - conf->dev_sectors = size << conf->geo.chunk_shift; - - if (conf->geo.far_offset) - conf->geo.stride = 1 << conf->geo.chunk_shift; - else { - sector_div(size, conf->geo.far_copies); - conf->geo.stride = size << conf->geo.chunk_shift; - } -} - -enum geo_type {geo_new, geo_old, geo_start}; -static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new) -{ - int nc, fc, fo; - int layout, chunk, disks; - switch (new) { - case geo_old: - layout = mddev->layout; - chunk = mddev->chunk_sectors; - disks = mddev->raid_disks - mddev->delta_disks; - break; - case geo_new: - layout = mddev->new_layout; - chunk = mddev->new_chunk_sectors; - disks = mddev->raid_disks; - break; - default: /* avoid 'may be unused' warnings */ - case geo_start: /* new when starting reshape - raid_disks not - * updated yet. */ - layout = mddev->new_layout; - chunk = mddev->new_chunk_sectors; - disks = mddev->raid_disks + mddev->delta_disks; - break; - } - if (layout >> 17) - return -1; - if (chunk < (PAGE_SIZE >> 9) || - !is_power_of_2(chunk)) - return -2; - nc = layout & 255; - fc = (layout >> 8) & 255; - fo = layout & (1<<16); - geo->raid_disks = disks; - geo->near_copies = nc; - geo->far_copies = fc; - geo->far_offset = fo; - geo->chunk_mask = chunk - 1; - geo->chunk_shift = ffz(~chunk); - return nc*fc; -} - -static struct r10conf *setup_conf(struct mddev *mddev) -{ - struct r10conf *conf = NULL; - int err = -EINVAL; - struct geom geo; - int copies; - - copies = setup_geo(&geo, mddev, geo_new); - - if (copies == -2) { - printk(KERN_ERR "md/raid10:%s: chunk size must be " - "at least PAGE_SIZE(%ld) and be a power of 2.\n", - mdname(mddev), PAGE_SIZE); - goto out; - } - - if (copies < 2 || copies > mddev->raid_disks) { - printk(KERN_ERR "md/raid10:%s: unsupported raid10 layout: 0x%8x\n", - mdname(mddev), mddev->new_layout); - goto out; - } - - err = -ENOMEM; - conf = kzalloc(sizeof(struct r10conf), GFP_KERNEL); - if 
(!conf) - goto out; - - /* FIXME calc properly */ - conf->mirrors = kzalloc(sizeof(struct raid10_info)*(mddev->raid_disks + - max(0,mddev->delta_disks)), - GFP_KERNEL); - if (!conf->mirrors) - goto out; - - conf->tmppage = alloc_page(GFP_KERNEL); - if (!conf->tmppage) - goto out; - - conf->geo = geo; - conf->copies = copies; - conf->r10bio_pool = mempool_create(NR_RAID10_BIOS, r10bio_pool_alloc, - r10bio_pool_free, conf); - if (!conf->r10bio_pool) - goto out; - - calc_sectors(conf, mddev->dev_sectors); - if (mddev->reshape_position == MaxSector) { - conf->prev = conf->geo; - conf->reshape_progress = MaxSector; - } else { - if (setup_geo(&conf->prev, mddev, geo_old) != conf->copies) { - err = -EINVAL; - goto out; - } - conf->reshape_progress = mddev->reshape_position; - if (conf->prev.far_offset) - conf->prev.stride = 1 << conf->prev.chunk_shift; - else - /* far_copies must be 1 */ - conf->prev.stride = conf->dev_sectors; - } - spin_lock_init(&conf->device_lock); - INIT_LIST_HEAD(&conf->retry_list); - - spin_lock_init(&conf->resync_lock); - init_waitqueue_head(&conf->wait_barrier); - - conf->thread = md_register_thread(raid10d, mddev, "raid10"); - if (!conf->thread) - goto out; - - conf->mddev = mddev; - return conf; - - out: - if (err == -ENOMEM) - printk(KERN_ERR "md/raid10:%s: couldn't allocate memory.\n", - mdname(mddev)); - if (conf) { - if (conf->r10bio_pool) - mempool_destroy(conf->r10bio_pool); - kfree(conf->mirrors); - safe_put_page(conf->tmppage); - kfree(conf); - } - return ERR_PTR(err); -} - -static int run(struct mddev *mddev) -{ - struct r10conf *conf; - int i, disk_idx, chunk_size; - struct raid10_info *disk; - struct md_rdev *rdev; - sector_t size; - sector_t min_offset_diff = 0; - int first = 1; - - if (mddev->private == NULL) { - conf = setup_conf(mddev); - if (IS_ERR(conf)) - return PTR_ERR(conf); - mddev->private = conf; - } - conf = mddev->private; - if (!conf) - goto out; - - mddev->thread = conf->thread; - conf->thread = NULL; - - chunk_size = 
mddev->chunk_sectors << 9; - if (mddev->queue) { - blk_queue_io_min(mddev->queue, chunk_size); - if (conf->geo.raid_disks % conf->geo.near_copies) - blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks); - else - blk_queue_io_opt(mddev->queue, chunk_size * - (conf->geo.raid_disks / conf->geo.near_copies)); - } - - rdev_for_each(rdev, mddev) { - long long diff; - struct request_queue *q; - - disk_idx = rdev->raid_disk; - if (disk_idx < 0) - continue; - if (disk_idx >= conf->geo.raid_disks && - disk_idx >= conf->prev.raid_disks) - continue; - disk = conf->mirrors + disk_idx; - - if (test_bit(Replacement, &rdev->flags)) { - if (disk->replacement) - goto out_free_conf; - disk->replacement = rdev; - } else { - if (disk->rdev) - goto out_free_conf; - disk->rdev = rdev; - } - q = bdev_get_queue(rdev->bdev); - if (q->merge_bvec_fn) - mddev->merge_check_needed = 1; - diff = (rdev->new_data_offset - rdev->data_offset); - if (!mddev->reshape_backwards) - diff = -diff; - if (diff < 0) - diff = 0; - if (first || diff < min_offset_diff) - min_offset_diff = diff; - - if (mddev->gendisk) - disk_stack_limits(mddev->gendisk, rdev->bdev, - rdev->data_offset << 9); - - disk->head_position = 0; - } - - /* need to check that every block has at least one working mirror */ - if (!enough(conf, -1)) { - printk(KERN_ERR "md/raid10:%s: not enough operational mirrors.\n", - mdname(mddev)); - goto out_free_conf; - } - - if (conf->reshape_progress != MaxSector) { - /* must ensure that shape change is supported */ - if (conf->geo.far_copies != 1 && - conf->geo.far_offset == 0) - goto out_free_conf; - if (conf->prev.far_copies != 1 && - conf->geo.far_offset == 0) - goto out_free_conf; - } - - mddev->degraded = 0; - for (i = 0; - i < conf->geo.raid_disks - || i < conf->prev.raid_disks; - i++) { - - disk = conf->mirrors + i; - - if (!disk->rdev && disk->replacement) { - /* The replacement is all we have - use it */ - disk->rdev = disk->replacement; - disk->replacement = NULL; - 
clear_bit(Replacement, &disk->rdev->flags); - } - - if (!disk->rdev || - !test_bit(In_sync, &disk->rdev->flags)) { - disk->head_position = 0; - mddev->degraded++; - if (disk->rdev) - conf->fullsync = 1; - } - disk->recovery_disabled = mddev->recovery_disabled - 1; - } - - if (mddev->recovery_cp != MaxSector) - printk(KERN_NOTICE "md/raid10:%s: not clean" - " -- starting background reconstruction\n", - mdname(mddev)); - printk(KERN_INFO - "md/raid10:%s: active with %d out of %d devices\n", - mdname(mddev), conf->geo.raid_disks - mddev->degraded, - conf->geo.raid_disks); - /* - * Ok, everything is just fine now - */ - mddev->dev_sectors = conf->dev_sectors; - size = raid10_size(mddev, 0, 0); - md_set_array_sectors(mddev, size); - mddev->resync_max_sectors = size; - - if (mddev->queue) { - int stripe = conf->geo.raid_disks * - ((mddev->chunk_sectors << 9) / PAGE_SIZE); - mddev->queue->backing_dev_info.congested_fn = raid10_congested; - mddev->queue->backing_dev_info.congested_data = mddev; - - /* Calculate max read-ahead size. - * We need to readahead at least twice a whole stripe.... - * maybe... 
- */ - stripe /= conf->geo.near_copies; - if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe) - mddev->queue->backing_dev_info.ra_pages = 2 * stripe; - blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec); - } - - - if (md_integrity_register(mddev)) - goto out_free_conf; - - if (conf->reshape_progress != MaxSector) { - unsigned long before_length, after_length; - - before_length = ((1 << conf->prev.chunk_shift) * - conf->prev.far_copies); - after_length = ((1 << conf->geo.chunk_shift) * - conf->geo.far_copies); - - if (max(before_length, after_length) > min_offset_diff) { - /* This cannot work */ - printk("md/raid10: offset difference not enough to continue reshape\n"); - goto out_free_conf; - } - conf->offset_diff = min_offset_diff; - - conf->reshape_safe = conf->reshape_progress; - clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); - clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); - set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - mddev->sync_thread = md_register_thread(md_do_sync, mddev, - "reshape"); - } - - return 0; - -out_free_conf: - md_unregister_thread(&mddev->thread); - if (conf->r10bio_pool) - mempool_destroy(conf->r10bio_pool); - safe_put_page(conf->tmppage); - kfree(conf->mirrors); - kfree(conf); - mddev->private = NULL; -out: - return -EIO; -} - -static int stop(struct mddev *mddev) -{ - struct r10conf *conf = mddev->private; - - raise_barrier(conf, 0); - lower_barrier(conf); - - md_unregister_thread(&mddev->thread); - if (mddev->queue) - /* the unplug fn references 'conf'*/ - blk_sync_queue(mddev->queue); - - if (conf->r10bio_pool) - mempool_destroy(conf->r10bio_pool); - kfree(conf->mirrors); - kfree(conf); - mddev->private = NULL; - return 0; -} - -static void raid10_quiesce(struct mddev *mddev, int state) -{ - struct r10conf *conf = mddev->private; - - switch(state) { - case 1: - raise_barrier(conf, 0); - break; - case 0: - lower_barrier(conf); - break; - } -} - -static int 
raid10_resize(struct mddev *mddev, sector_t sectors) -{ - /* Resize of 'far' arrays is not supported. - * For 'near' and 'offset' arrays we can set the - * number of sectors used to be an appropriate multiple - * of the chunk size. - * For 'offset', this is far_copies*chunksize. - * For 'near' the multiplier is the LCM of - * near_copies and raid_disks. - * So if far_copies > 1 && !far_offset, fail. - * Else find LCM(raid_disks, near_copy)*far_copies and - * multiply by chunk_size. Then round to this number. - * This is mostly done by raid10_size() - */ - struct r10conf *conf = mddev->private; - sector_t oldsize, size; - - if (mddev->reshape_position != MaxSector) - return -EBUSY; - - if (conf->geo.far_copies > 1 && !conf->geo.far_offset) - return -EINVAL; - - oldsize = raid10_size(mddev, 0, 0); - size = raid10_size(mddev, sectors, 0); - if (mddev->external_size && - mddev->array_sectors > size) - return -EINVAL; - if (mddev->bitmap) { - int ret = bitmap_resize(mddev->bitmap, size, 0, 0); - if (ret) - return ret; - } - md_set_array_sectors(mddev, size); - set_capacity(mddev->gendisk, mddev->array_sectors); - revalidate_disk(mddev->gendisk); - if (sectors > mddev->dev_sectors && - mddev->recovery_cp > oldsize) { - mddev->recovery_cp = oldsize; - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - } - calc_sectors(conf, sectors); - mddev->dev_sectors = conf->dev_sectors; - mddev->resync_max_sectors = size; - return 0; -} - -static void *raid10_takeover_raid0(struct mddev *mddev) -{ - struct md_rdev *rdev; - struct r10conf *conf; - - if (mddev->degraded > 0) { - printk(KERN_ERR "md/raid10:%s: Error: degraded raid0!\n", - mdname(mddev)); - return ERR_PTR(-EINVAL); - } - - /* Set new parameters */ - mddev->new_level = 10; - /* new layout: far_copies = 1, near_copies = 2 */ - mddev->new_layout = (1<<8) + 2; - mddev->new_chunk_sectors = mddev->chunk_sectors; - mddev->delta_disks = mddev->raid_disks; - mddev->raid_disks *= 2; - /* make sure it will be not marked as dirty */ - 
mddev->recovery_cp = MaxSector; - - conf = setup_conf(mddev); - if (!IS_ERR(conf)) { - rdev_for_each(rdev, mddev) - if (rdev->raid_disk >= 0) - rdev->new_raid_disk = rdev->raid_disk * 2; - conf->barrier = 1; - } - - return conf; -} - -static void *raid10_takeover(struct mddev *mddev) -{ - struct r0conf *raid0_conf; - - /* raid10 can take over: - * raid0 - providing it has only two drives - */ - if (mddev->level == 0) { - /* for raid0 takeover only one zone is supported */ - raid0_conf = mddev->private; - if (raid0_conf->nr_strip_zones > 1) { - printk(KERN_ERR "md/raid10:%s: cannot takeover raid 0" - " with more than one zone.\n", - mdname(mddev)); - return ERR_PTR(-EINVAL); - } - return raid10_takeover_raid0(mddev); - } - return ERR_PTR(-EINVAL); -} - -static int raid10_check_reshape(struct mddev *mddev) -{ - /* Called when there is a request to change - * - layout (to ->new_layout) - * - chunk size (to ->new_chunk_sectors) - * - raid_disks (by delta_disks) - * or when trying to restart a reshape that was ongoing. - * - * We need to validate the request and possibly allocate - * space if that might be an issue later. - * - * Currently we reject any reshape of a 'far' mode array, - * allow chunk size to change if new is generally acceptable, - * allow raid_disks to increase, and allow - * a switch between 'near' mode and 'offset' mode. 
- */ - struct r10conf *conf = mddev->private; - struct geom geo; - - if (conf->geo.far_copies != 1 && !conf->geo.far_offset) - return -EINVAL; - - if (setup_geo(&geo, mddev, geo_start) != conf->copies) - /* mustn't change number of copies */ - return -EINVAL; - if (geo.far_copies > 1 && !geo.far_offset) - /* Cannot switch to 'far' mode */ - return -EINVAL; - - if (mddev->array_sectors & geo.chunk_mask) - /* not factor of array size */ - return -EINVAL; - - if (!enough(conf, -1)) - return -EINVAL; - - kfree(conf->mirrors_new); - conf->mirrors_new = NULL; - if (mddev->delta_disks > 0) { - /* allocate new 'mirrors' list */ - conf->mirrors_new = kzalloc( - sizeof(struct raid10_info) - *(mddev->raid_disks + - mddev->delta_disks), - GFP_KERNEL); - if (!conf->mirrors_new) - return -ENOMEM; - } - return 0; -} - -/* - * Need to check if array has failed when deciding whether to: - * - start an array - * - remove non-faulty devices - * - add a spare - * - allow a reshape - * This determination is simple when no reshape is happening. - * However if there is a reshape, we need to carefully check - * both the before and after sections. - * This is because some failed devices may only affect one - * of the two sections, and some non-in_sync devices may - * be insync in the section most affected by failed devices. - */ -static int calc_degraded(struct r10conf *conf) -{ - int degraded, degraded2; - int i; - - rcu_read_lock(); - degraded = 0; - /* 'prev' section first */ - for (i = 0; i < conf->prev.raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); - if (!rdev || test_bit(Faulty, &rdev->flags)) - degraded++; - else if (!test_bit(In_sync, &rdev->flags)) - /* When we can reduce the number of devices in - * an array, this might not contribute to - * 'degraded'. It does now. 
- */ - degraded++; - } - rcu_read_unlock(); - if (conf->geo.raid_disks == conf->prev.raid_disks) - return degraded; - rcu_read_lock(); - degraded2 = 0; - for (i = 0; i < conf->geo.raid_disks; i++) { - struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); - if (!rdev || test_bit(Faulty, &rdev->flags)) - degraded2++; - else if (!test_bit(In_sync, &rdev->flags)) { - /* If reshape is increasing the number of devices, - * this section has already been recovered, so - * it doesn't contribute to degraded. - * else it does. - */ - if (conf->geo.raid_disks <= conf->prev.raid_disks) - degraded2++; - } - } - rcu_read_unlock(); - if (degraded2 > degraded) - return degraded2; - return degraded; -} - -static int raid10_start_reshape(struct mddev *mddev) -{ - /* A 'reshape' has been requested. This commits - * the various 'new' fields and sets MD_RECOVER_RESHAPE - * This also checks if there are enough spares and adds them - * to the array. - * We currently require enough spares to make the final - * array non-degraded. We also require that the difference - * between old and new data_offset - on each device - is - * enough that we never risk over-writing. 
- */ - - unsigned long before_length, after_length; - sector_t min_offset_diff = 0; - int first = 1; - struct geom new; - struct r10conf *conf = mddev->private; - struct md_rdev *rdev; - int spares = 0; - int ret; - - if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) - return -EBUSY; - - if (setup_geo(&new, mddev, geo_start) != conf->copies) - return -EINVAL; - - before_length = ((1 << conf->prev.chunk_shift) * - conf->prev.far_copies); - after_length = ((1 << conf->geo.chunk_shift) * - conf->geo.far_copies); - - rdev_for_each(rdev, mddev) { - if (!test_bit(In_sync, &rdev->flags) - && !test_bit(Faulty, &rdev->flags)) - spares++; - if (rdev->raid_disk >= 0) { - long long diff = (rdev->new_data_offset - - rdev->data_offset); - if (!mddev->reshape_backwards) - diff = -diff; - if (diff < 0) - diff = 0; - if (first || diff < min_offset_diff) - min_offset_diff = diff; - } - } - - if (max(before_length, after_length) > min_offset_diff) - return -EINVAL; - - if (spares < mddev->delta_disks) - return -EINVAL; - - conf->offset_diff = min_offset_diff; - spin_lock_irq(&conf->device_lock); - if (conf->mirrors_new) { - memcpy(conf->mirrors_new, conf->mirrors, - sizeof(struct raid10_info)*conf->prev.raid_disks); - smp_mb(); - kfree(conf->mirrors_old); /* FIXME and elsewhere */ - conf->mirrors_old = conf->mirrors; - conf->mirrors = conf->mirrors_new; - conf->mirrors_new = NULL; - } - setup_geo(&conf->geo, mddev, geo_start); - smp_mb(); - if (mddev->reshape_backwards) { - sector_t size = raid10_size(mddev, 0, 0); - if (size < mddev->array_sectors) { - spin_unlock_irq(&conf->device_lock); - printk(KERN_ERR "md/raid10:%s: array size must be reduce before number of disks\n", - mdname(mddev)); - return -EINVAL; - } - mddev->resync_max_sectors = size; - conf->reshape_progress = size; - } else - conf->reshape_progress = 0; - spin_unlock_irq(&conf->device_lock); - - if (mddev->delta_disks && mddev->bitmap) { - ret = bitmap_resize(mddev->bitmap, - raid10_size(mddev, 0, - 
conf->geo.raid_disks), - 0, 0); - if (ret) - goto abort; - } - if (mddev->delta_disks > 0) { - rdev_for_each(rdev, mddev) - if (rdev->raid_disk < 0 && - !test_bit(Faulty, &rdev->flags)) { - if (raid10_add_disk(mddev, rdev) == 0) { - if (rdev->raid_disk >= - conf->prev.raid_disks) - set_bit(In_sync, &rdev->flags); - else - rdev->recovery_offset = 0; - - if (sysfs_link_rdev(mddev, rdev)) - /* Failure here is OK */; - } - } else if (rdev->raid_disk >= conf->prev.raid_disks - && !test_bit(Faulty, &rdev->flags)) { - /* This is a spare that was manually added */ - set_bit(In_sync, &rdev->flags); - } - } - /* When a reshape changes the number of devices, - * ->degraded is measured against the larger of the - * pre and post numbers. - */ - spin_lock_irq(&conf->device_lock); - mddev->degraded = calc_degraded(conf); - spin_unlock_irq(&conf->device_lock); - mddev->raid_disks = conf->geo.raid_disks; - mddev->reshape_position = conf->reshape_progress; - set_bit(MD_CHANGE_DEVS, &mddev->flags); - - clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); - clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); - set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - - mddev->sync_thread = md_register_thread(md_do_sync, mddev, - "reshape"); - if (!mddev->sync_thread) { - ret = -EAGAIN; - goto abort; - } - conf->reshape_checkpoint = jiffies; - md_wakeup_thread(mddev->sync_thread); - md_new_event(mddev); - return 0; - -abort: - mddev->recovery = 0; - spin_lock_irq(&conf->device_lock); - conf->geo = conf->prev; - mddev->raid_disks = conf->geo.raid_disks; - rdev_for_each(rdev, mddev) - rdev->new_data_offset = rdev->data_offset; - smp_wmb(); - conf->reshape_progress = MaxSector; - mddev->reshape_position = MaxSector; - spin_unlock_irq(&conf->device_lock); - return ret; -} - -/* Calculate the last device-address that could contain - * any block from the chunk that includes the array-address 's' - * and report the next address. - * i.e. 
the address returned will be chunk-aligned and after - * any data that is in the chunk containing 's'. - */ -static sector_t last_dev_address(sector_t s, struct geom *geo) -{ - s = (s | geo->chunk_mask) + 1; - s >>= geo->chunk_shift; - s *= geo->near_copies; - s = DIV_ROUND_UP_SECTOR_T(s, geo->raid_disks); - s *= geo->far_copies; - s <<= geo->chunk_shift; - return s; -} - -/* Calculate the first device-address that could contain - * any block from the chunk that includes the array-address 's'. - * This too will be the start of a chunk - */ -static sector_t first_dev_address(sector_t s, struct geom *geo) -{ - s >>= geo->chunk_shift; - s *= geo->near_copies; - sector_div(s, geo->raid_disks); - s *= geo->far_copies; - s <<= geo->chunk_shift; - return s; -} - -static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, - int *skipped) -{ - /* We simply copy at most one chunk (smallest of old and new) - * at a time, possibly less if that exceeds RESYNC_PAGES, - * or we hit a bad block or something. - * This might mean we pause for normal IO in the middle of - * a chunk, but that is not a problem was mddev->reshape_position - * can record any location. - * - * If we will want to write to a location that isn't - * yet recorded as 'safe' (i.e. in metadata on disk) then - * we need to flush all reshape requests and update the metadata. - * - * When reshaping forwards (e.g. to more devices), we interpret - * 'safe' as the earliest block which might not have been copied - * down yet. We divide this by previous stripe size and multiply - * by previous stripe length to get lowest device offset that we - * cannot write to yet. - * We interpret 'sector_nr' as an address that we want to write to. - * From this we use last_device_address() to find where we might - * write to, and first_device_address on the 'safe' position. - * If this 'next' write position is after the 'safe' position, - * we must update the metadata to increase the 'safe' position. 
- * - * When reshaping backwards, we round in the opposite direction - * and perform the reverse test: next write position must not be - * less than current safe position. - * - * In all this the minimum difference in data offsets - * (conf->offset_diff - always positive) allows a bit of slack, - * so next can be after 'safe', but not by more than offset_disk - * - * We need to prepare all the bios here before we start any IO - * to ensure the size we choose is acceptable to all devices. - * The means one for each copy for write-out and an extra one for - * read-in. - * We store the read-in bio in ->master_bio and the others in - * ->devs[x].bio and ->devs[x].repl_bio. - */ - struct r10conf *conf = mddev->private; - struct r10bio *r10_bio; - sector_t next, safe, last; - int max_sectors; - int nr_sectors; - int s; - struct md_rdev *rdev; - int need_flush = 0; - struct bio *blist; - struct bio *bio, *read_bio; - int sectors_done = 0; - - if (sector_nr == 0) { - /* If restarting in the middle, skip the initial sectors */ - if (mddev->reshape_backwards && - conf->reshape_progress < raid10_size(mddev, 0, 0)) { - sector_nr = (raid10_size(mddev, 0, 0) - - conf->reshape_progress); - } else if (!mddev->reshape_backwards && - conf->reshape_progress > 0) - sector_nr = conf->reshape_progress; - if (sector_nr) { - mddev->curr_resync_completed = sector_nr; - sysfs_notify(&mddev->kobj, NULL, "sync_completed"); - *skipped = 1; - return sector_nr; - } - } - - /* We don't use sector_nr to track where we are up to - * as that doesn't work well for ->reshape_backwards. - * So just use ->reshape_progress. 
- */ - if (mddev->reshape_backwards) { - /* 'next' is the earliest device address that we might - * write to for this chunk in the new layout - */ - next = first_dev_address(conf->reshape_progress - 1, - &conf->geo); - - /* 'safe' is the last device address that we might read from - * in the old layout after a restart - */ - safe = last_dev_address(conf->reshape_safe - 1, - &conf->prev); - - if (next + conf->offset_diff < safe) - need_flush = 1; - - last = conf->reshape_progress - 1; - sector_nr = last & ~(sector_t)(conf->geo.chunk_mask - & conf->prev.chunk_mask); - if (sector_nr + RESYNC_BLOCK_SIZE/512 < last) - sector_nr = last + 1 - RESYNC_BLOCK_SIZE/512; - } else { - /* 'next' is after the last device address that we - * might write to for this chunk in the new layout - */ - next = last_dev_address(conf->reshape_progress, &conf->geo); - - /* 'safe' is the earliest device address that we might - * read from in the old layout after a restart - */ - safe = first_dev_address(conf->reshape_safe, &conf->prev); - - /* Need to update metadata if 'next' might be beyond 'safe' - * as that would possibly corrupt data - */ - if (next > safe + conf->offset_diff) - need_flush = 1; - - sector_nr = conf->reshape_progress; - last = sector_nr | (conf->geo.chunk_mask - & conf->prev.chunk_mask); - - if (sector_nr + RESYNC_BLOCK_SIZE/512 <= last) - last = sector_nr + RESYNC_BLOCK_SIZE/512 - 1; - } - - if (need_flush || - time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) { - /* Need to update reshape_position in metadata */ - wait_barrier(conf); - mddev->reshape_position = conf->reshape_progress; - if (mddev->reshape_backwards) - mddev->curr_resync_completed = raid10_size(mddev, 0, 0) - - conf->reshape_progress; - else - mddev->curr_resync_completed = conf->reshape_progress; - conf->reshape_checkpoint = jiffies; - set_bit(MD_CHANGE_DEVS, &mddev->flags); - md_wakeup_thread(mddev->thread); - wait_event(mddev->sb_wait, mddev->flags == 0 || - kthread_should_stop()); - 
conf->reshape_safe = mddev->reshape_position; - allow_barrier(conf); - } - -read_more: - /* Now schedule reads for blocks from sector_nr to last */ - r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO); - raise_barrier(conf, sectors_done != 0); - atomic_set(&r10_bio->remaining, 0); - r10_bio->mddev = mddev; - r10_bio->sector = sector_nr; - set_bit(R10BIO_IsReshape, &r10_bio->state); - r10_bio->sectors = last - sector_nr + 1; - rdev = read_balance(conf, r10_bio, &max_sectors); - BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state)); - - if (!rdev) { - /* Cannot read from here, so need to record bad blocks - * on all the target devices. - */ - // FIXME - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - return sectors_done; - } - - read_bio = bio_alloc_mddev(GFP_KERNEL, RESYNC_PAGES, mddev); - - read_bio->bi_bdev = rdev->bdev; - read_bio->bi_sector = (r10_bio->devs[r10_bio->read_slot].addr - + rdev->data_offset); - read_bio->bi_private = r10_bio; - read_bio->bi_end_io = end_sync_read; - read_bio->bi_rw = READ; - read_bio->bi_flags &= ~(BIO_POOL_MASK - 1); - read_bio->bi_flags |= 1 << BIO_UPTODATE; - read_bio->bi_vcnt = 0; - read_bio->bi_idx = 0; - read_bio->bi_size = 0; - r10_bio->master_bio = read_bio; - r10_bio->read_slot = r10_bio->devs[r10_bio->read_slot].devnum; - - /* Now find the locations in the new layout */ - __raid10_find_phys(&conf->geo, r10_bio); - - blist = read_bio; - read_bio->bi_next = NULL; - - for (s = 0; s < conf->copies*2; s++) { - struct bio *b; - int d = r10_bio->devs[s/2].devnum; - struct md_rdev *rdev2; - if (s&1) { - rdev2 = conf->mirrors[d].replacement; - b = r10_bio->devs[s/2].repl_bio; - } else { - rdev2 = conf->mirrors[d].rdev; - b = r10_bio->devs[s/2].bio; - } - if (!rdev2 || test_bit(Faulty, &rdev2->flags)) - continue; - b->bi_bdev = rdev2->bdev; - b->bi_sector = r10_bio->devs[s/2].addr + rdev2->new_data_offset; - b->bi_private = r10_bio; - b->bi_end_io = end_reshape_write; - b->bi_rw = WRITE; - b->bi_flags &= ~(BIO_POOL_MASK - 1); - 
b->bi_flags |= 1 << BIO_UPTODATE; - b->bi_next = blist; - b->bi_vcnt = 0; - b->bi_idx = 0; - b->bi_size = 0; - blist = b; - } - - /* Now add as many pages as possible to all of these bios. */ - - nr_sectors = 0; - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) { - struct page *page = r10_bio->devs[0].bio->bi_io_vec[s/(PAGE_SIZE>>9)].bv_page; - int len = (max_sectors - s) << 9; - if (len > PAGE_SIZE) - len = PAGE_SIZE; - for (bio = blist; bio ; bio = bio->bi_next) { - struct bio *bio2; - if (bio_add_page(bio, page, len, 0)) - continue; - - /* Didn't fit, must stop */ - for (bio2 = blist; - bio2 && bio2 != bio; - bio2 = bio2->bi_next) { - /* Remove last page from this bio */ - bio2->bi_vcnt--; - bio2->bi_size -= len; - bio2->bi_flags &= ~(1<<BIO_SEG_VALID); - } - goto bio_full; - } - sector_nr += len >> 9; - nr_sectors += len >> 9; - } -bio_full: - r10_bio->sectors = nr_sectors; - - /* Now submit the read */ - md_sync_acct(read_bio->bi_bdev, r10_bio->sectors); - atomic_inc(&r10_bio->remaining); - read_bio->bi_next = NULL; - generic_make_request(read_bio); - sector_nr += nr_sectors; - sectors_done += nr_sectors; - if (sector_nr <= last) - goto read_more; - - /* Now that we have done the whole section we can - * update reshape_progress - */ - if (mddev->reshape_backwards) - conf->reshape_progress -= sectors_done; - else - conf->reshape_progress += sectors_done; - - return sectors_done; -} - -static void end_reshape_request(struct r10bio *r10_bio); -static int handle_reshape_read_error(struct mddev *mddev, - struct r10bio *r10_bio); -static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio) -{ - /* Reshape read completed. Hopefully we have a block - * to write out. - * If we got a read error then we do sync 1-page reads from - * elsewhere until we find the data - or give up. 
- */ - struct r10conf *conf = mddev->private; - int s; - - if (!test_bit(R10BIO_Uptodate, &r10_bio->state)) - if (handle_reshape_read_error(mddev, r10_bio) < 0) { - /* Reshape has been aborted */ - md_done_sync(mddev, r10_bio->sectors, 0); - return; - } - - /* We definitely have the data in the pages, schedule the - * writes. - */ - atomic_set(&r10_bio->remaining, 1); - for (s = 0; s < conf->copies*2; s++) { - struct bio *b; - int d = r10_bio->devs[s/2].devnum; - struct md_rdev *rdev; - if (s&1) { - rdev = conf->mirrors[d].replacement; - b = r10_bio->devs[s/2].repl_bio; - } else { - rdev = conf->mirrors[d].rdev; - b = r10_bio->devs[s/2].bio; - } - if (!rdev || test_bit(Faulty, &rdev->flags)) - continue; - atomic_inc(&rdev->nr_pending); - md_sync_acct(b->bi_bdev, r10_bio->sectors); - atomic_inc(&r10_bio->remaining); - b->bi_next = NULL; - generic_make_request(b); - } - end_reshape_request(r10_bio); -} - -static void end_reshape(struct r10conf *conf) -{ - if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) - return; - - spin_lock_irq(&conf->device_lock); - conf->prev = conf->geo; - md_finish_reshape(conf->mddev); - smp_wmb(); - conf->reshape_progress = MaxSector; - spin_unlock_irq(&conf->device_lock); - - /* read-ahead size must cover two whole stripes, which is - * 2 * (datadisks) * chunksize where 'n' is the number of raid devices - */ - if (conf->mddev->queue) { - int stripe = conf->geo.raid_disks * - ((conf->mddev->chunk_sectors << 9) / PAGE_SIZE); - stripe /= conf->geo.near_copies; - if (conf->mddev->queue->backing_dev_info.ra_pages < 2 * stripe) - conf->mddev->queue->backing_dev_info.ra_pages = 2 * stripe; - } - conf->fullsync = 0; -} - - -static int handle_reshape_read_error(struct mddev *mddev, - struct r10bio *r10_bio) -{ - /* Use sync reads to get the blocks from somewhere else */ - int sectors = r10_bio->sectors; - struct r10conf *conf = mddev->private; - struct { - struct r10bio r10_bio; - struct r10dev devs[conf->copies]; - } on_stack; - struct 
r10bio *r10b = &on_stack.r10_bio; - int slot = 0; - int idx = 0; - struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec; - - r10b->sector = r10_bio->sector; - __raid10_find_phys(&conf->prev, r10b); - - while (sectors) { - int s = sectors; - int success = 0; - int first_slot = slot; - - if (s > (PAGE_SIZE >> 9)) - s = PAGE_SIZE >> 9; - - while (!success) { - int d = r10b->devs[slot].devnum; - struct md_rdev *rdev = conf->mirrors[d].rdev; - sector_t addr; - if (rdev == NULL || - test_bit(Faulty, &rdev->flags) || - !test_bit(In_sync, &rdev->flags)) - goto failed; - - addr = r10b->devs[slot].addr + idx * PAGE_SIZE; - success = sync_page_io(rdev, - addr, - s << 9, - bvec[idx].bv_page, - READ, false); - if (success) - break; - failed: - slot++; - if (slot >= conf->copies) - slot = 0; - if (slot == first_slot) - break; - } - if (!success) { - /* couldn't read this block, must give up */ - set_bit(MD_RECOVERY_INTR, - &mddev->recovery); - return -EIO; - } - sectors -= s; - idx++; - } - return 0; -} - -static void end_reshape_write(struct bio *bio, int error) -{ - int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); - struct r10bio *r10_bio = bio->bi_private; - struct mddev *mddev = r10_bio->mddev; - struct r10conf *conf = mddev->private; - int d; - int slot; - int repl; - struct md_rdev *rdev = NULL; - - d = find_bio_disk(conf, r10_bio, bio, &slot, &repl); - if (repl) - rdev = conf->mirrors[d].replacement; - if (!rdev) { - smp_mb(); - rdev = conf->mirrors[d].rdev; - } - - if (!uptodate) { - /* FIXME should record badblock */ - md_error(mddev, rdev); - } - - rdev_dec_pending(rdev, mddev); - end_reshape_request(r10_bio); -} - -static void end_reshape_request(struct r10bio *r10_bio) -{ - if (!atomic_dec_and_test(&r10_bio->remaining)) - return; - md_done_sync(r10_bio->mddev, r10_bio->sectors, 1); - bio_put(r10_bio->master_bio); - put_buf(r10_bio); -} - -static void raid10_finish_reshape(struct mddev *mddev) -{ - struct r10conf *conf = mddev->private; - - if 
(test_bit(MD_RECOVERY_INTR, &mddev->recovery)) - return; - - if (mddev->delta_disks > 0) { - sector_t size = raid10_size(mddev, 0, 0); - md_set_array_sectors(mddev, size); - if (mddev->recovery_cp > mddev->resync_max_sectors) { - mddev->recovery_cp = mddev->resync_max_sectors; - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - } - mddev->resync_max_sectors = size; - set_capacity(mddev->gendisk, mddev->array_sectors); - revalidate_disk(mddev->gendisk); - } else { - int d; - for (d = conf->geo.raid_disks ; - d < conf->geo.raid_disks - mddev->delta_disks; - d++) { - struct md_rdev *rdev = conf->mirrors[d].rdev; - if (rdev) - clear_bit(In_sync, &rdev->flags); - rdev = conf->mirrors[d].replacement; - if (rdev) - clear_bit(In_sync, &rdev->flags); - } - } - mddev->layout = mddev->new_layout; - mddev->chunk_sectors = 1 << conf->geo.chunk_shift; - mddev->reshape_position = MaxSector; - mddev->delta_disks = 0; - mddev->reshape_backwards = 0; -} - -static struct md_personality raid10_personality = -{ - .name = "raid10", - .level = 10, - .owner = THIS_MODULE, - .make_request = make_request, - .run = run, - .stop = stop, - .status = status, - .error_handler = error, - .hot_add_disk = raid10_add_disk, - .hot_remove_disk= raid10_remove_disk, - .spare_active = raid10_spare_active, - .sync_request = sync_request, - .quiesce = raid10_quiesce, - .size = raid10_size, - .resize = raid10_resize, - .takeover = raid10_takeover, - .check_reshape = raid10_check_reshape, - .start_reshape = raid10_start_reshape, - .finish_reshape = raid10_finish_reshape, -}; - -static int __init raid_init(void) -{ - return register_md_personality(&raid10_personality); -} - -static void raid_exit(void) -{ - unregister_md_personality(&raid10_personality); -} - -module_init(raid_init); -module_exit(raid_exit); -MODULE_LICENSE("GPL"); -MODULE_DESCRIPTION("RAID10 (striped mirror) personality for MD"); -MODULE_ALIAS("md-personality-9"); /* RAID10 */ -MODULE_ALIAS("md-raid10"); -MODULE_ALIAS("md-level-10"); - 
-module_param(max_queued_requests, int, S_IRUGO|S_IWUSR); ./linux/raid10-race/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.822951772 +0000 @@ -1,2329 +0,0 @@ -/* - * raid1.c : Multiple Devices driver for Linux - * - * Copyright (C) 1999, 2000, 2001 Ingo Molnar, Red Hat - * - * Copyright (C) 1996, 1997, 1998 Ingo Molnar, Miguel de Icaza, Gadi Oxman - * - * RAID-1 management functions. - * - * Better read-balancing code written by Mika Kuoppala <miku@iki.fi>, 2000 - * - * Fixes to reconstruction by Jakob Østergaard" <jakob@ostenfeld.dk> - * Various fixes by Neil Brown <neilb@cse.unsw.edu.au> - * - * Changes by Peter T. Breuer <ptb@it.uc3m.es> 31/1/2003 to support - * bitmapped intelligence in resync: - * - * - bitmap marked during normal i/o - * - bitmap used to skip nondirty blocks during sync - * - * Additions to bitmap code, (C) 2003-2004 Paul Clements, SteelEye Technology: - * - persistent bitmap code - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * You should have received a copy of the GNU General Public License - * (for example /usr/src/linux/COPYING); if not, write to the Free - * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include <linux/slab.h> -#include <linux/delay.h> -#include <linux/blkdev.h> -#include <linux/seq_file.h> -#include "md.h" -#include "raid1.h" -#include "bitmap.h" - -#define DEBUG 0 -#if DEBUG -#define PRINTK(x...) printk(x) -#else -#define PRINTK(x...) 
-#endif - -/* - * Number of guaranteed r1bios in case of extreme VM load: - */ -#define NR_RAID1_BIOS 256 - - -static void allow_barrier(conf_t *conf); -static void lower_barrier(conf_t *conf); - -static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data) -{ - struct pool_info *pi = data; - int size = offsetof(r1bio_t, bios[pi->raid_disks]); - - /* allocate a r1bio with room for raid_disks entries in the bios array */ - return kzalloc(size, gfp_flags); -} - -static void r1bio_pool_free(void *r1_bio, void *data) -{ - kfree(r1_bio); -} - -#define RESYNC_BLOCK_SIZE (64*1024) -//#define RESYNC_BLOCK_SIZE PAGE_SIZE -#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE) -#define RESYNC_WINDOW (2048*1024) - -static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data) -{ - struct pool_info *pi = data; - struct page *page; - r1bio_t *r1_bio; - struct bio *bio; - int i, j; - - r1_bio = r1bio_pool_alloc(gfp_flags, pi); - if (!r1_bio) - return NULL; - - /* - * Allocate bios : 1 for reading, n-1 for writing - */ - for (j = pi->raid_disks ; j-- ; ) { - bio = bio_kmalloc(gfp_flags, RESYNC_PAGES); - if (!bio) - goto out_free_bio; - r1_bio->bios[j] = bio; - } - /* - * Allocate RESYNC_PAGES data pages and attach them to - * the first bio. - * If this is a user-requested check/repair, allocate - * RESYNC_PAGES for each bio. 
- */ - if (test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) - j = pi->raid_disks; - else - j = 1; - while(j--) { - bio = r1_bio->bios[j]; - for (i = 0; i < RESYNC_PAGES; i++) { - page = alloc_page(gfp_flags); - if (unlikely(!page)) - goto out_free_pages; - - bio->bi_io_vec[i].bv_page = page; - bio->bi_vcnt = i+1; - } - } - /* If not user-requests, copy the page pointers to all bios */ - if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) { - for (i=0; i<RESYNC_PAGES ; i++) - for (j=1; j<pi->raid_disks; j++) - r1_bio->bios[j]->bi_io_vec[i].bv_page = - r1_bio->bios[0]->bi_io_vec[i].bv_page; - } - - r1_bio->master_bio = NULL; - - return r1_bio; - -out_free_pages: - for (j=0 ; j < pi->raid_disks; j++) - for (i=0; i < r1_bio->bios[j]->bi_vcnt ; i++) - put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page); - j = -1; -out_free_bio: - while ( ++j < pi->raid_disks ) - bio_put(r1_bio->bios[j]); - r1bio_pool_free(r1_bio, data); - return NULL; -} - -static void r1buf_pool_free(void *__r1_bio, void *data) -{ - struct pool_info *pi = data; - int i,j; - r1bio_t *r1bio = __r1_bio; - - for (i = 0; i < RESYNC_PAGES; i++) - for (j = pi->raid_disks; j-- ;) { - if (j == 0 || - r1bio->bios[j]->bi_io_vec[i].bv_page != - r1bio->bios[0]->bi_io_vec[i].bv_page) - safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page); - } - for (i=0 ; i < pi->raid_disks; i++) - bio_put(r1bio->bios[i]); - - r1bio_pool_free(r1bio, data); -} - -static void put_all_bios(conf_t *conf, r1bio_t *r1_bio) -{ - int i; - - for (i = 0; i < conf->raid_disks; i++) { - struct bio **bio = r1_bio->bios + i; - if (*bio && *bio != IO_BLOCKED) - bio_put(*bio); - *bio = NULL; - } -} - -static void free_r1bio(r1bio_t *r1_bio) -{ - conf_t *conf = r1_bio->mddev->private; - - /* - * Wake up any possible resync thread that waits for the device - * to go idle. 
- */ - allow_barrier(conf); - - put_all_bios(conf, r1_bio); - mempool_free(r1_bio, conf->r1bio_pool); -} - -static void put_buf(r1bio_t *r1_bio) -{ - conf_t *conf = r1_bio->mddev->private; - int i; - - for (i=0; i<conf->raid_disks; i++) { - struct bio *bio = r1_bio->bios[i]; - if (bio->bi_end_io) - rdev_dec_pending(conf->mirrors[i].rdev, r1_bio->mddev); - } - - mempool_free(r1_bio, conf->r1buf_pool); - - lower_barrier(conf); -} - -static void reschedule_retry(r1bio_t *r1_bio) -{ - unsigned long flags; - mddev_t *mddev = r1_bio->mddev; - conf_t *conf = mddev->private; - - spin_lock_irqsave(&conf->device_lock, flags); - list_add(&r1_bio->retry_list, &conf->retry_list); - conf->nr_queued ++; - spin_unlock_irqrestore(&conf->device_lock, flags); - - wake_up(&conf->wait_barrier); - md_wakeup_thread(mddev->thread); -} - -/* - * raid_end_bio_io() is called when we have finished servicing a mirrored - * operation and are ready to return a success/failure code to the buffer - * cache layer. - */ -static void raid_end_bio_io(r1bio_t *r1_bio) -{ - struct bio *bio = r1_bio->master_bio; - - /* if nobody has done the final endio yet, do it now */ - if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) { - PRINTK(KERN_DEBUG "raid1: sync end %s on sectors %llu-%llu\n", - (bio_data_dir(bio) == WRITE) ? "write" : "read", - (unsigned long long) bio->bi_sector, - (unsigned long long) bio->bi_sector + - (bio->bi_size >> 9) - 1); - - bio_endio(bio, - test_bit(R1BIO_Uptodate, &r1_bio->state) ? 0 : -EIO); - } - free_r1bio(r1_bio); -} - -/* - * Update disk head position estimator based on IRQ completion info. 
- */ -static inline void update_head_pos(int disk, r1bio_t *r1_bio) -{ - conf_t *conf = r1_bio->mddev->private; - - conf->mirrors[disk].head_position = - r1_bio->sector + (r1_bio->sectors); -} - -static void raid1_end_read_request(struct bio *bio, int error) -{ - int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); - r1bio_t *r1_bio = bio->bi_private; - int mirror; - conf_t *conf = r1_bio->mddev->private; - - mirror = r1_bio->read_disk; - /* - * this branch is our 'one mirror IO has finished' event handler: - */ - update_head_pos(mirror, r1_bio); - - if (uptodate) - set_bit(R1BIO_Uptodate, &r1_bio->state); - else { - /* If all other devices have failed, we want to return - * the error upwards rather than fail the last device. - * Here we redefine "uptodate" to mean "Don't want to retry" - */ - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - if (r1_bio->mddev->degraded == conf->raid_disks || - (r1_bio->mddev->degraded == conf->raid_disks-1 && - !test_bit(Faulty, &conf->mirrors[mirror].rdev->flags))) - uptodate = 1; - spin_unlock_irqrestore(&conf->device_lock, flags); - } - - if (uptodate) - raid_end_bio_io(r1_bio); - else { - /* - * oops, read error: - */ - char b[BDEVNAME_SIZE]; - if (printk_ratelimit()) - printk(KERN_ERR "md/raid1:%s: %s: rescheduling sector %llu\n", - mdname(conf->mddev), - bdevname(conf->mirrors[mirror].rdev->bdev,b), (unsigned long long)r1_bio->sector); - reschedule_retry(r1_bio); - } - - rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev); -} - -static void r1_bio_write_done(r1bio_t *r1_bio) -{ - if (atomic_dec_and_test(&r1_bio->remaining)) - { - /* it really is the end of this request */ - if (test_bit(R1BIO_BehindIO, &r1_bio->state)) { - /* free extra copy of the data pages */ - int i = r1_bio->behind_page_count; - while (i--) - safe_put_page(r1_bio->behind_pages[i]); - kfree(r1_bio->behind_pages); - r1_bio->behind_pages = NULL; - } - /* clear the bitmap if all writes complete successfully */ - 
bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector, - r1_bio->sectors, - !test_bit(R1BIO_Degraded, &r1_bio->state), - test_bit(R1BIO_BehindIO, &r1_bio->state)); - md_write_end(r1_bio->mddev); - raid_end_bio_io(r1_bio); - } -} - -static void raid1_end_write_request(struct bio *bio, int error) -{ - int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); - r1bio_t *r1_bio = bio->bi_private; - int mirror, behind = test_bit(R1BIO_BehindIO, &r1_bio->state); - conf_t *conf = r1_bio->mddev->private; - struct bio *to_put = NULL; - - - for (mirror = 0; mirror < conf->raid_disks; mirror++) - if (r1_bio->bios[mirror] == bio) - break; - - /* - * 'one mirror IO has finished' event handler: - */ - r1_bio->bios[mirror] = NULL; - to_put = bio; - if (!uptodate) { - md_error(r1_bio->mddev, conf->mirrors[mirror].rdev); - /* an I/O failed, we can't clear the bitmap */ - set_bit(R1BIO_Degraded, &r1_bio->state); - } else - /* - * Set R1BIO_Uptodate in our master bio, so that we - * will return a good error code for to the higher - * levels even if IO on some other mirrored buffer - * fails. - * - * The 'master' represents the composite IO operation - * to user-side. So if something waits for IO, then it - * will wait for the 'master' bio. - */ - set_bit(R1BIO_Uptodate, &r1_bio->state); - - update_head_pos(mirror, r1_bio); - - if (behind) { - if (test_bit(WriteMostly, &conf->mirrors[mirror].rdev->flags)) - atomic_dec(&r1_bio->behind_remaining); - - /* - * In behind mode, we ACK the master bio once the I/O - * has safely reached all non-writemostly - * disks. 
Setting the Returned bit ensures that this - * gets done only once -- we don't ever want to return - * -EIO here, instead we'll wait - */ - if (atomic_read(&r1_bio->behind_remaining) >= (atomic_read(&r1_bio->remaining)-1) && - test_bit(R1BIO_Uptodate, &r1_bio->state)) { - /* Maybe we can return now */ - if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) { - struct bio *mbio = r1_bio->master_bio; - PRINTK(KERN_DEBUG "raid1: behind end write sectors %llu-%llu\n", - (unsigned long long) mbio->bi_sector, - (unsigned long long) mbio->bi_sector + - (mbio->bi_size >> 9) - 1); - bio_endio(mbio, 0); - } - } - } - rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev); - - /* - * Let's see if all mirrored write operations have finished - * already. - */ - r1_bio_write_done(r1_bio); - - if (to_put) - bio_put(to_put); -} - - -/* - * This routine returns the disk from which the requested read should - * be done. There is a per-array 'next expected sequential IO' sector - * number - if this matches on the next IO then we use the last disk. - * There is also a per-disk 'last know head position' sector that is - * maintained from IRQ contexts, both the normal and the resync IO - * completion handlers update this position correctly. If there is no - * perfect sequential match then we pick the disk whose head is closest. - * - * If there are 2 mirrors in the same 2 devices, performance degrades - * because position is mirror, not device based. - * - * The rdev for the device selected will have nr_pending incremented. - */ -static int read_balance(conf_t *conf, r1bio_t *r1_bio) -{ - const sector_t this_sector = r1_bio->sector; - const int sectors = r1_bio->sectors; - int start_disk; - int best_disk; - int i; - sector_t best_dist; - mdk_rdev_t *rdev; - int choose_first; - - rcu_read_lock(); - /* - * Check if we can balance. We can balance on the whole - * device if no resync is going on, or below the resync window. 
- * We take the first readable disk when above the resync window. - */ - retry: - best_disk = -1; - best_dist = MaxSector; - if (conf->mddev->recovery_cp < MaxSector && - (this_sector + sectors >= conf->next_resync)) { - choose_first = 1; - start_disk = 0; - } else { - choose_first = 0; - start_disk = conf->last_used; - } - - for (i = 0 ; i < conf->raid_disks ; i++) { - sector_t dist; - int disk = start_disk + i; - if (disk >= conf->raid_disks) - disk -= conf->raid_disks; - - rdev = rcu_dereference(conf->mirrors[disk].rdev); - if (r1_bio->bios[disk] == IO_BLOCKED - || rdev == NULL - || test_bit(Faulty, &rdev->flags)) - continue; - if (!test_bit(In_sync, &rdev->flags) && - rdev->recovery_offset < this_sector + sectors) - continue; - if (test_bit(WriteMostly, &rdev->flags)) { - /* Don't balance among write-mostly, just - * use the first as a last resort */ - if (best_disk < 0) - best_disk = disk; - continue; - } - /* This is a reasonable device to use. It might - * even be best. - */ - dist = abs(this_sector - conf->mirrors[disk].head_position); - if (choose_first - /* Don't change to another disk for sequential reads */ - || conf->next_seq_sect == this_sector - || dist == 0 - /* If device is idle, use it */ - || atomic_read(&rdev->nr_pending) == 0) { - best_disk = disk; - break; - } - if (dist < best_dist) { - best_dist = dist; - best_disk = disk; - } - } - - if (best_disk >= 0) { - rdev = rcu_dereference(conf->mirrors[best_disk].rdev); - if (!rdev) - goto retry; - atomic_inc(&rdev->nr_pending); - if (test_bit(Faulty, &rdev->flags)) { - /* cannot risk returning a device that failed - * before we inc'ed nr_pending - */ - rdev_dec_pending(rdev, conf->mddev); - goto retry; - } - conf->next_seq_sect = this_sector + sectors; - conf->last_used = best_disk; - } - rcu_read_unlock(); - - return best_disk; -} - -int md_raid1_congested(mddev_t *mddev, int bits) -{ - conf_t *conf = mddev->private; - int i, ret = 0; - - rcu_read_lock(); - for (i = 0; i < mddev->raid_disks; i++) 
{ - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); - if (rdev && !test_bit(Faulty, &rdev->flags)) { - struct request_queue *q = bdev_get_queue(rdev->bdev); - - BUG_ON(!q); - - /* Note the '|| 1' - when read_balance prefers - * non-congested targets, it can be removed - */ - if ((bits & (1<<BDI_async_congested)) || 1) - ret |= bdi_congested(&q->backing_dev_info, bits); - else - ret &= bdi_congested(&q->backing_dev_info, bits); - } - } - rcu_read_unlock(); - return ret; -} -EXPORT_SYMBOL_GPL(md_raid1_congested); - -static int max_queued = INT_MAX; -static int raid1_congested(void *data, int bits) -{ - mddev_t *mddev = data; - - return mddev_congested(mddev, bits) || - md_raid1_congested(mddev, bits); -} - -static void flush_pending_writes(conf_t *conf) -{ - /* Any writes that have been queued but are awaiting - * bitmap updates get flushed here. - */ - spin_lock_irq(&conf->device_lock); - - if (conf->pending_bio_list.head) { - struct bio *bio; - bio = bio_list_get(&conf->pending_bio_list); - conf->pending_count = 0; - spin_unlock_irq(&conf->device_lock); - wake_up(&conf->wait_barrier); - /* flush any pending bitmap writes to - * disk before proceeding w/ I/O */ - bitmap_unplug(conf->mddev->bitmap); - - while (bio) { /* submit pending writes */ - struct bio *next = bio->bi_next; - bio->bi_next = NULL; - generic_make_request(bio); - bio = next; - } - } else - spin_unlock_irq(&conf->device_lock); -} - -/* Barriers.... - * Sometimes we need to suspend IO while we do something else, - * either some resync/recovery, or reconfigure the array. - * To do this we raise a 'barrier'. - * The 'barrier' is a counter that can be raised multiple times - * to count how many activities are happening which preclude - * normal IO. - * We can only raise the barrier if there is no pending IO. - * i.e. if nr_pending == 0. - * We choose only to raise the barrier if no-one is waiting for the - * barrier to go down. 
This means that as soon as an IO request - * is ready, no other operations which require a barrier will start - * until the IO request has had a chance. - * - * So: regular IO calls 'wait_barrier'. When that returns there - * is no background IO happening. It must arrange to call - * allow_barrier when it has finished its IO. - * background IO calls must call raise_barrier. Once that returns - * there is no normal IO happening. It must arrange to call - * lower_barrier when the particular background IO completes. - */ -#define RESYNC_DEPTH 32 - -static void raise_barrier(conf_t *conf) -{ - spin_lock_irq(&conf->resync_lock); - - /* Wait until no block IO is waiting */ - wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting, - conf->resync_lock, ); - - /* block any new IO from starting */ - conf->barrier++; - - /* Now wait for all pending IO to complete */ - wait_event_lock_irq(conf->wait_barrier, - !conf->nr_pending && conf->barrier < RESYNC_DEPTH, - conf->resync_lock, ); - - spin_unlock_irq(&conf->resync_lock); -} - -static void lower_barrier(conf_t *conf) -{ - unsigned long flags; - BUG_ON(conf->barrier <= 0); - spin_lock_irqsave(&conf->resync_lock, flags); - conf->barrier--; - spin_unlock_irqrestore(&conf->resync_lock, flags); - wake_up(&conf->wait_barrier); -} - -static void wait_barrier(conf_t *conf) -{ - spin_lock_irq(&conf->resync_lock); - if (conf->barrier) { - conf->nr_waiting++; - wait_event_lock_irq(conf->wait_barrier, !conf->barrier, - conf->resync_lock, - ); - conf->nr_waiting--; - } - conf->nr_pending++; - spin_unlock_irq(&conf->resync_lock); -} - -static void allow_barrier(conf_t *conf) -{ - unsigned long flags; - spin_lock_irqsave(&conf->resync_lock, flags); - conf->nr_pending--; - spin_unlock_irqrestore(&conf->resync_lock, flags); - wake_up(&conf->wait_barrier); -} - -static void freeze_array(conf_t *conf) -{ - /* stop syncio and normal IO and wait for everything to - * go quiet.
- * We increment barrier and nr_waiting, and then - * wait until nr_pending matches nr_queued+1 - * This is called in the context of one normal IO request - * that has failed. Thus any sync request that might be pending - * will be blocked by nr_pending, and we need to wait for - * pending IO requests to complete or be queued for re-try. - * Thus the number queued (nr_queued) plus this request (1) - * must match the number of pending IOs (nr_pending) before - * we continue. - */ - spin_lock_irq(&conf->resync_lock); - conf->barrier++; - conf->nr_waiting++; - wait_event_lock_irq(conf->wait_barrier, - conf->nr_pending == conf->nr_queued+1, - conf->resync_lock, - flush_pending_writes(conf)); - spin_unlock_irq(&conf->resync_lock); -} -static void unfreeze_array(conf_t *conf) -{ - /* reverse the effect of the freeze */ - spin_lock_irq(&conf->resync_lock); - conf->barrier--; - conf->nr_waiting--; - wake_up(&conf->wait_barrier); - spin_unlock_irq(&conf->resync_lock); -} - - -/* duplicate the data pages for behind I/O - */ -static void alloc_behind_pages(struct bio *bio, r1bio_t *r1_bio) -{ - int i; - struct bio_vec *bvec; - struct page **pages = kzalloc(bio->bi_vcnt * sizeof(struct page*), - GFP_NOIO); - if (unlikely(!pages)) - return; - - bio_for_each_segment(bvec, bio, i) { - pages[i] = alloc_page(GFP_NOIO); - if (unlikely(!pages[i])) - goto do_sync_io; - memcpy(kmap(pages[i]) + bvec->bv_offset, - kmap(bvec->bv_page) + bvec->bv_offset, bvec->bv_len); - kunmap(pages[i]); - kunmap(bvec->bv_page); - } - r1_bio->behind_pages = pages; - r1_bio->behind_page_count = bio->bi_vcnt; - set_bit(R1BIO_BehindIO, &r1_bio->state); - return; - -do_sync_io: - for (i = 0; i < bio->bi_vcnt; i++) - if (pages[i]) - put_page(pages[i]); - kfree(pages); - PRINTK("%dB behind alloc failed, doing sync I/O\n", bio->bi_size); -} - -static int make_request(mddev_t *mddev, struct bio * bio) -{ - conf_t *conf = mddev->private; - mirror_info_t *mirror; - r1bio_t *r1_bio; - struct bio *read_bio; - int i,
targets = 0, disks; - struct bitmap *bitmap; - unsigned long flags; -<<<<<<< found -||||||| expected - struct bio_list bl; - struct page **behind_pages = NULL; -======= - struct bio_list bl; - int bl_count; - struct page **behind_pages = NULL; ->>>>>>> replacement - const int rw = bio_data_dir(bio); - const unsigned long do_sync = (bio->bi_rw & REQ_SYNC); - const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA)); - mdk_rdev_t *blocked_rdev; - int plugged; - - /* - * Register the new request and wait if the reconstruction - * thread has put up a bar for new requests. - * Continue immediately if no resync is active currently. - */ - - md_write_start(mddev, bio); /* wait on superblock update early */ - - if (bio_data_dir(bio) == WRITE && - bio->bi_sector + bio->bi_size/512 > mddev->suspend_lo && - bio->bi_sector < mddev->suspend_hi) { - /* As the suspend_* range is controlled by - * userspace, we want an interruptible - * wait. - */ - DEFINE_WAIT(w); - for (;;) { - flush_signals(current); - prepare_to_wait(&conf->wait_barrier, - &w, TASK_INTERRUPTIBLE); - if (bio->bi_sector + bio->bi_size/512 <= mddev->suspend_lo || - bio->bi_sector >= mddev->suspend_hi) - break; - schedule(); - } - finish_wait(&conf->wait_barrier, &w); - } - - wait_barrier(conf); - - bitmap = mddev->bitmap; - - /* - * make_request() can abort the operation when READA is being - * used and no empty request is available. 
- * - */ - r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO); - - r1_bio->master_bio = bio; - r1_bio->sectors = bio->bi_size >> 9; - r1_bio->state = 0; - r1_bio->mddev = mddev; - r1_bio->sector = bio->bi_sector; - - if (rw == READ) { - /* - * read balancing logic: - */ - int rdisk = read_balance(conf, r1_bio); - - if (rdisk < 0) { - /* couldn't find anywhere to read from */ - raid_end_bio_io(r1_bio); - return 0; - } - mirror = conf->mirrors + rdisk; - - if (test_bit(WriteMostly, &mirror->rdev->flags) && - bitmap) { - /* Reading from a write-mostly device must - * take care not to over-take any writes - * that are 'behind' - */ - wait_event(bitmap->behind_wait, - atomic_read(&bitmap->behind_writes) == 0); - } - r1_bio->read_disk = rdisk; - - read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev); - - r1_bio->bios[rdisk] = read_bio; - - read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset; - read_bio->bi_bdev = mirror->rdev->bdev; - read_bio->bi_end_io = raid1_end_read_request; - read_bio->bi_rw = READ | do_sync; - read_bio->bi_private = r1_bio; - - generic_make_request(read_bio); - return 0; - } - - /* - * WRITE: - */ - if (conf->pending_count >= max_queued) { - md_wakeup_thread(mddev->thread); - wait_event(conf->wait_barrier, - conf->pending_count < max_queued); - } - /* first select target devices under spinlock and - * inc refcount on their rdev. 
Record them by setting - * bios[x] to bio - */ - plugged = mddev_check_plugged(mddev); - - disks = conf->raid_disks; - retry_write: - blocked_rdev = NULL; - rcu_read_lock(); - for (i = 0; i < disks; i++) { - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); - if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) { - atomic_inc(&rdev->nr_pending); - blocked_rdev = rdev; - break; - } - if (rdev && !test_bit(Faulty, &rdev->flags)) { - atomic_inc(&rdev->nr_pending); - if (test_bit(Faulty, &rdev->flags)) { - rdev_dec_pending(rdev, mddev); - r1_bio->bios[i] = NULL; - } else { - r1_bio->bios[i] = bio; - targets++; - } - } else - r1_bio->bios[i] = NULL; - } - rcu_read_unlock(); - - if (unlikely(blocked_rdev)) { - /* Wait for this device to become unblocked */ - int j; - - for (j = 0; j < i; j++) - if (r1_bio->bios[j]) - rdev_dec_pending(conf->mirrors[j].rdev, mddev); - - allow_barrier(conf); - md_wait_for_blocked_rdev(blocked_rdev, mddev); - wait_barrier(conf); - goto retry_write; - } - - BUG_ON(targets == 0); /* we never fail the last device */ - - if (targets < conf->raid_disks) { - /* array is degraded, we will not clear the bitmap - * on I/O completion (see raid1_end_write_request) */ - set_bit(R1BIO_Degraded, &r1_bio->state); - } - - /* do behind I/O ? 
- * Not if there are too many, or cannot allocate memory, - * or a reader on WriteMostly is waiting for behind writes - * to flush */ - if (bitmap && - (atomic_read(&bitmap->behind_writes) - < mddev->bitmap_info.max_write_behind) && - !waitqueue_active(&bitmap->behind_wait)) - alloc_behind_pages(bio, r1_bio); - - atomic_set(&r1_bio->remaining, 1); - atomic_set(&r1_bio->behind_remaining, 0); - - bitmap_startwrite(bitmap, bio->bi_sector, r1_bio->sectors, - test_bit(R1BIO_BehindIO, &r1_bio->state)); - bl_count = 0; - for (i = 0; i < disks; i++) { - struct bio *mbio; - if (!r1_bio->bios[i]) - continue; - - mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); - r1_bio->bios[i] = mbio; - - mbio->bi_sector = r1_bio->sector + conf->mirrors[i].rdev->data_offset; - mbio->bi_bdev = conf->mirrors[i].rdev->bdev; - mbio->bi_end_io = raid1_end_write_request; - mbio->bi_rw = WRITE | do_flush_fua | do_sync; - mbio->bi_private = r1_bio; - - if (r1_bio->behind_pages) { - struct bio_vec *bvec; - int j; - - /* Yes, I really want the '__' version so that - * we clear any unused pointer in the io_vec, rather - * than leave them unchanged. 
This is important - * because when we come to free the pages, we won't - * know the original bi_idx, so we just free - * them all - */ - __bio_for_each_segment(bvec, mbio, j, 0) - bvec->bv_page = r1_bio->behind_pages[j]; - if (test_bit(WriteMostly, &conf->mirrors[i].rdev->flags)) - atomic_inc(&r1_bio->behind_remaining); -<<<<<<< found - } - -||||||| expected - bio_list_add(&bl, mbio); - } - kfree(behind_pages); /* the behind pages are attached to the bios now */ - -======= - bio_list_add(&bl, mbio); - bl_count++; - } - kfree(behind_pages); /* the behind pages are attached to the bios now */ - ->>>>>>> replacement - atomic_inc(&r1_bio->remaining); -<<<<<<< found - spin_lock_irqsave(&conf->device_lock, flags); - bio_list_add(&conf->pending_bio_list, mbio); - spin_unlock_irqrestore(&conf->device_lock, flags); - } - r1_bio_write_done(r1_bio); -||||||| expected - spin_lock_irqsave(&conf->device_lock, flags); - bio_list_merge(&conf->pending_bio_list, &bl); - bio_list_init(&bl); - - blk_plug_device(mddev->queue); -======= - spin_lock_irqsave(&conf->device_lock, flags); - bio_list_merge(&conf->pending_bio_list, &bl); - conf->pending_count += bl_count; - bio_list_init(&bl); - - blk_plug_device(mddev->queue); ->>>>>>> replacement - - /* In case raid1d snuck in to freeze_array */ - wake_up(&conf->wait_barrier); - - if (do_sync || !bitmap || !plugged) - md_wakeup_thread(mddev->thread); - - return 0; -} - -static void status(struct seq_file *seq, mddev_t *mddev) -{ - conf_t *conf = mddev->private; - int i; - - seq_printf(seq, " [%d/%d] [", conf->raid_disks, - conf->raid_disks - mddev->degraded); - rcu_read_lock(); - for (i = 0; i < conf->raid_disks; i++) { - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); - seq_printf(seq, "%s", - rdev && test_bit(In_sync, &rdev->flags) ? 
"U" : "_"); - } - rcu_read_unlock(); - seq_printf(seq, "]"); -} - - -static void error(mddev_t *mddev, mdk_rdev_t *rdev) -{ - char b[BDEVNAME_SIZE]; - conf_t *conf = mddev->private; - - /* - * If it is not operational, then we have already marked it as dead - * else if it is the last working disks, ignore the error, let the - * next level up know. - * else mark the drive as failed - */ - if (test_bit(In_sync, &rdev->flags) - && (conf->raid_disks - mddev->degraded) == 1) { - /* - * Don't fail the drive, act as though we were just a - * normal single drive. - * However don't try a recovery from this drive as - * it is very likely to fail. - */ - mddev->recovery_disabled = 1; - return; - } - if (test_and_clear_bit(In_sync, &rdev->flags)) { - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - mddev->degraded++; - set_bit(Faulty, &rdev->flags); - spin_unlock_irqrestore(&conf->device_lock, flags); - /* - * if recovery is running, make sure it aborts. - */ - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - } else - set_bit(Faulty, &rdev->flags); - set_bit(MD_CHANGE_DEVS, &mddev->flags); - printk(KERN_ALERT - "md/raid1:%s: Disk failure on %s, disabling device.\n" - "md/raid1:%s: Operation continuing on %d devices.\n", - mdname(mddev), bdevname(rdev->bdev, b), - mdname(mddev), conf->raid_disks - mddev->degraded); -} - -static void print_conf(conf_t *conf) -{ - int i; - - printk(KERN_DEBUG "RAID1 conf printout:\n"); - if (!conf) { - printk(KERN_DEBUG "(!conf)\n"); - return; - } - printk(KERN_DEBUG " --- wd:%d rd:%d\n", conf->raid_disks - conf->mddev->degraded, - conf->raid_disks); - - if ((bits & (1 << BDI_async_congested)) && - conf->pending_count >= max_queued) - return 1; - - rcu_read_lock(); - for (i = 0; i < conf->raid_disks; i++) { - char b[BDEVNAME_SIZE]; - mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev); - if (rdev) - printk(KERN_DEBUG " disk %d, wo:%d, o:%d, dev:%s\n", - i, !test_bit(In_sync, &rdev->flags), - !test_bit(Faulty, 
&rdev->flags), - bdevname(rdev->bdev,b)); - } - rcu_read_unlock(); -} - -static void close_sync(conf_t *conf) -{ - wait_barrier(conf); - allow_barrier(conf); - - mempool_destroy(conf->r1buf_pool); - conf->r1buf_pool = NULL; -} - -static int raid1_spare_active(mddev_t *mddev) -{ - int i; - conf_t *conf = mddev->private; - int count = 0; - unsigned long flags; - - /* - * Find all failed disks within the RAID1 configuration - * and mark them readable. - * Called under mddev lock, so rcu protection not needed. - */ - for (i = 0; i < conf->raid_disks; i++) { - mdk_rdev_t *rdev = conf->mirrors[i].rdev; - if (rdev - && !test_bit(Faulty, &rdev->flags) - && !test_and_set_bit(In_sync, &rdev->flags)) { - count++; - sysfs_notify_dirent(rdev->sysfs_state); - } - } - spin_lock_irqsave(&conf->device_lock, flags); - mddev->degraded -= count; - spin_unlock_irqrestore(&conf->device_lock, flags); - - print_conf(conf); - return count; -} - - -static int raid1_add_disk(mddev_t *mddev, mdk_rdev_t *rdev) -{ - conf_t *conf = mddev->private; - int err = -EEXIST; - int mirror = 0; - mirror_info_t *p; - int first = 0; - int last = mddev->raid_disks - 1; - - if (rdev->raid_disk >= 0) - first = last = rdev->raid_disk; - - for (mirror = first; mirror <= last; mirror++) - if ( !(p=conf->mirrors+mirror)->rdev) { - - disk_stack_limits(mddev->gendisk, rdev->bdev, - rdev->data_offset << 9); - /* as we don't honour merge_bvec_fn, we must - * never risk violating it, so limit - * ->max_segments to one lying with a single - * page, as a one page request is never in - * violation. 
- */ - if (rdev->bdev->bd_disk->queue->merge_bvec_fn) { - blk_queue_max_segments(mddev->queue, 1); - blk_queue_segment_boundary(mddev->queue, - PAGE_CACHE_SIZE - 1); - } - - p->head_position = 0; - rdev->raid_disk = mirror; - err = 0; - /* As all devices are equivalent, we don't need a full recovery - * if this was recently any drive of the array - */ - if (rdev->saved_raid_disk < 0) - conf->fullsync = 1; - rcu_assign_pointer(p->rdev, rdev); - break; - } - md_integrity_add_rdev(rdev, mddev); - print_conf(conf); - return err; -} - -static int raid1_remove_disk(mddev_t *mddev, int number) -{ - conf_t *conf = mddev->private; - int err = 0; - mdk_rdev_t *rdev; - mirror_info_t *p = conf->mirrors+ number; - - print_conf(conf); - rdev = p->rdev; - if (rdev) { - if (test_bit(In_sync, &rdev->flags) || - atomic_read(&rdev->nr_pending)) { - err = -EBUSY; - goto abort; - } - /* Only remove non-faulty devices if recovery - * is not possible. - */ - if (!test_bit(Faulty, &rdev->flags) && - !mddev->recovery_disabled && - mddev->degraded < conf->raid_disks) { - err = -EBUSY; - goto abort; - } - p->rdev = NULL; - synchronize_rcu(); - if (atomic_read(&rdev->nr_pending)) { - /* lost the race, try later */ - err = -EBUSY; - p->rdev = rdev; - goto abort; - } - err = md_integrity_register(mddev); - } -abort: - - print_conf(conf); - return err; -} - - -static void end_sync_read(struct bio *bio, int error) -{ - r1bio_t *r1_bio = bio->bi_private; - int i; - - for (i=r1_bio->mddev->raid_disks; i--; ) - if (r1_bio->bios[i] == bio) - break; - BUG_ON(i < 0); - update_head_pos(i, r1_bio); - /* - * we have read a block, now it needs to be re-written, - * or re-read if the read failed. 
- * We don't do much here, just schedule handling by raid1d - */ - if (test_bit(BIO_UPTODATE, &bio->bi_flags)) - set_bit(R1BIO_Uptodate, &r1_bio->state); - - if (atomic_dec_and_test(&r1_bio->remaining)) - reschedule_retry(r1_bio); -} - -static void end_sync_write(struct bio *bio, int error) -{ - int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); - r1bio_t *r1_bio = bio->bi_private; - mddev_t *mddev = r1_bio->mddev; - conf_t *conf = mddev->private; - int i; - int mirror=0; - - for (i = 0; i < conf->raid_disks; i++) - if (r1_bio->bios[i] == bio) { - mirror = i; - break; - } - if (!uptodate) { - sector_t sync_blocks = 0; - sector_t s = r1_bio->sector; - long sectors_to_go = r1_bio->sectors; - /* make sure these bits don't get cleared. */ - do { - bitmap_end_sync(mddev->bitmap, s, - &sync_blocks, 1); - s += sync_blocks; - sectors_to_go -= sync_blocks; - } while (sectors_to_go > 0); - md_error(mddev, conf->mirrors[mirror].rdev); - } - - update_head_pos(mirror, r1_bio); - - if (atomic_dec_and_test(&r1_bio->remaining)) { - sector_t s = r1_bio->sectors; - put_buf(r1_bio); - md_done_sync(mddev, s, uptodate); - } -} - -static int fix_sync_read_error(r1bio_t *r1_bio) -{ - /* Try some synchronous reads of other devices to get - * good data, much like with normal read errors. Only - * read into the pages we already have so we don't - * need to re-issue the read request. - * We don't need to freeze the array, because being in an - * active sync request, there is no normal IO, and - * no overlapping syncs.
- */ - mddev_t *mddev = r1_bio->mddev; - conf_t *conf = mddev->private; - struct bio *bio = r1_bio->bios[r1_bio->read_disk]; - sector_t sect = r1_bio->sector; - int sectors = r1_bio->sectors; - int idx = 0; - - while(sectors) { - int s = sectors; - int d = r1_bio->read_disk; - int success = 0; - mdk_rdev_t *rdev; - int start; - - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - do { - if (r1_bio->bios[d]->bi_end_io == end_sync_read) { - /* No rcu protection needed here devices - * can only be removed when no resync is - * active, and resync is currently active - */ - rdev = conf->mirrors[d].rdev; - if (sync_page_io(rdev, - sect, - s<<9, - bio->bi_io_vec[idx].bv_page, - READ, false)) { - success = 1; - break; - } - } - d++; - if (d == conf->raid_disks) - d = 0; - } while (!success && d != r1_bio->read_disk); - - if (!success) { - char b[BDEVNAME_SIZE]; - /* Cannot read from anywhere, array is toast */ - md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev); - printk(KERN_ALERT "md/raid1:%s: %s: unrecoverable I/O read error" - " for block %llu\n", - mdname(mddev), - bdevname(bio->bi_bdev, b), - (unsigned long long)r1_bio->sector); - md_done_sync(mddev, r1_bio->sectors, 0); - put_buf(r1_bio); - return 0; - } - - start = d; - /* write it back and re-read */ - while (d != r1_bio->read_disk) { - if (d == 0) - d = conf->raid_disks; - d--; - if (r1_bio->bios[d]->bi_end_io != end_sync_read) - continue; - rdev = conf->mirrors[d].rdev; - if (sync_page_io(rdev, - sect, - s<<9, - bio->bi_io_vec[idx].bv_page, - WRITE, false) == 0) { - r1_bio->bios[d]->bi_end_io = NULL; - rdev_dec_pending(rdev, mddev); - md_error(mddev, rdev); - } else - atomic_add(s, &rdev->corrected_errors); - } - d = start; - while (d != r1_bio->read_disk) { - if (d == 0) - d = conf->raid_disks; - d--; - if (r1_bio->bios[d]->bi_end_io != end_sync_read) - continue; - rdev = conf->mirrors[d].rdev; - if (sync_page_io(rdev, - sect, - s<<9, - bio->bi_io_vec[idx].bv_page, - READ, false) == 0) - md_error(mddev, 
rdev); - } - sectors -= s; - sect += s; - idx ++; - } - set_bit(R1BIO_Uptodate, &r1_bio->state); - set_bit(BIO_UPTODATE, &bio->bi_flags); - return 1; -} - -static int process_checks(r1bio_t *r1_bio) -{ - /* We have read all readable devices. If we haven't - * got the block, then there is no hope left. - * If we have, then we want to do a comparison - * and skip the write if everything is the same. - * If any blocks failed to read, then we need to - * attempt an over-write - */ - mddev_t *mddev = r1_bio->mddev; - conf_t *conf = mddev->private; - int primary; - int i; - - for (primary = 0; primary < conf->raid_disks; primary++) - if (r1_bio->bios[primary]->bi_end_io == end_sync_read && - test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) { - r1_bio->bios[primary]->bi_end_io = NULL; - rdev_dec_pending(conf->mirrors[primary].rdev, mddev); - break; - } - r1_bio->read_disk = primary; - for (i = 0; i < conf->raid_disks; i++) { - int j; - int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9); - struct bio *pbio = r1_bio->bios[primary]; - struct bio *sbio = r1_bio->bios[i]; - int size; - - if (r1_bio->bios[i]->bi_end_io != end_sync_read) - continue; - - if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) { - for (j = vcnt; j-- ; ) { - struct page *p, *s; - p = pbio->bi_io_vec[j].bv_page; - s = sbio->bi_io_vec[j].bv_page; - if (memcmp(page_address(p), - page_address(s), - PAGE_SIZE)) - break; - } - } else - j = 0; - if (j >= 0) - mddev->resync_mismatches += r1_bio->sectors; - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery) - && test_bit(BIO_UPTODATE, &sbio->bi_flags))) { - /* No need to write to this device. 
*/ - sbio->bi_end_io = NULL; - rdev_dec_pending(conf->mirrors[i].rdev, mddev); - continue; - } - /* fixup the bio for reuse */ - sbio->bi_vcnt = vcnt; - sbio->bi_size = r1_bio->sectors << 9; - sbio->bi_idx = 0; - sbio->bi_phys_segments = 0; - sbio->bi_flags &= ~(BIO_POOL_MASK - 1); - sbio->bi_flags |= 1 << BIO_UPTODATE; - sbio->bi_next = NULL; - sbio->bi_sector = r1_bio->sector + - conf->mirrors[i].rdev->data_offset; - sbio->bi_bdev = conf->mirrors[i].rdev->bdev; - size = sbio->bi_size; - for (j = 0; j < vcnt ; j++) { - struct bio_vec *bi; - bi = &sbio->bi_io_vec[j]; - bi->bv_offset = 0; - if (size > PAGE_SIZE) - bi->bv_len = PAGE_SIZE; - else - bi->bv_len = size; - size -= PAGE_SIZE; - memcpy(page_address(bi->bv_page), - page_address(pbio->bi_io_vec[j].bv_page), - PAGE_SIZE); - } - } - return 0; -} - -static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio) -{ - conf_t *conf = mddev->private; - int i; - int disks = conf->raid_disks; - struct bio *bio, *wbio; - - bio = r1_bio->bios[r1_bio->read_disk]; - - if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) - /* ouch - failed to read all of that. */ - if (!fix_sync_read_error(r1_bio)) - return; - - if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) - if (process_checks(r1_bio) < 0) - return; - /* - * schedule writes - */ - atomic_set(&r1_bio->remaining, 1); - for (i = 0; i < disks ; i++) { - wbio = r1_bio->bios[i]; - if (wbio->bi_end_io == NULL || - (wbio->bi_end_io == end_sync_read && - (i == r1_bio->read_disk || - !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)))) - continue; - - wbio->bi_rw = WRITE; - wbio->bi_end_io = end_sync_write; - atomic_inc(&r1_bio->remaining); - md_sync_acct(conf->mirrors[i].rdev->bdev, wbio->bi_size >> 9); - - generic_make_request(wbio); - } - - if (atomic_dec_and_test(&r1_bio->remaining)) { - /* if we're here, all write(s) have completed, so clean up */ - md_done_sync(mddev, r1_bio->sectors, 1); - put_buf(r1_bio); - } -} - -/* - * This is a kernel thread which: - * - * 1. 
Retries failed read operations on working mirrors. - * 2. Updates the raid superblock when problems are encountered. - * 3. Performs writes following reads for array synchronising. - */ - -static void fix_read_error(conf_t *conf, int read_disk, - sector_t sect, int sectors) -{ - mddev_t *mddev = conf->mddev; - while(sectors) { - int s = sectors; - int d = read_disk; - int success = 0; - int start; - mdk_rdev_t *rdev; - - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - - do { - /* Note: no rcu protection needed here - * as this is synchronous in the raid1d thread - * which is the thread that might remove - * a device. If raid1d ever becomes multi-threaded.... - */ - rdev = conf->mirrors[d].rdev; - if (rdev && - test_bit(In_sync, &rdev->flags) && - sync_page_io(rdev, sect, s<<9, - conf->tmppage, READ, false)) - success = 1; - else { - d++; - if (d == conf->raid_disks) - d = 0; - } - } while (!success && d != read_disk); - - if (!success) { - /* Cannot read from anywhere -- bye bye array */ - md_error(mddev, conf->mirrors[read_disk].rdev); - break; - } - /* write it back and re-read */ - start = d; - while (d != read_disk) { - if (d==0) - d = conf->raid_disks; - d--; - rdev = conf->mirrors[d].rdev; - if (rdev && - test_bit(In_sync, &rdev->flags)) { - if (sync_page_io(rdev, sect, s<<9, - conf->tmppage, WRITE, false) - == 0) - /* Well, this device is dead */ - md_error(mddev, rdev); - } - } - d = start; - while (d != read_disk) { - char b[BDEVNAME_SIZE]; - if (d==0) - d = conf->raid_disks; - d--; - rdev = conf->mirrors[d].rdev; - if (rdev && - test_bit(In_sync, &rdev->flags)) { - if (sync_page_io(rdev, sect, s<<9, - conf->tmppage, READ, false) - == 0) - /* Well, this device is dead */ - md_error(mddev, rdev); - else { - atomic_add(s, &rdev->corrected_errors); - printk(KERN_INFO - "md/raid1:%s: read error corrected " - "(%d sectors at %llu on %s)\n", - mdname(mddev), s, - (unsigned long long)(sect + - rdev->data_offset), - bdevname(rdev->bdev, b)); - } - } - } - sectors -= s; -
sect += s; - } -} - -static void raid1d(mddev_t *mddev) -{ - r1bio_t *r1_bio; - struct bio *bio; - unsigned long flags; - conf_t *conf = mddev->private; - struct list_head *head = &conf->retry_list; - mdk_rdev_t *rdev; - struct blk_plug plug; - - md_check_recovery(mddev); - - blk_start_plug(&plug); - for (;;) { - char b[BDEVNAME_SIZE]; - - if (atomic_read(&mddev->plug_cnt) == 0) - flush_pending_writes(conf); - - spin_lock_irqsave(&conf->device_lock, flags); - if (list_empty(head)) { - spin_unlock_irqrestore(&conf->device_lock, flags); - break; - } - r1_bio = list_entry(head->prev, r1bio_t, retry_list); - list_del(head->prev); - conf->nr_queued--; - spin_unlock_irqrestore(&conf->device_lock, flags); - - mddev = r1_bio->mddev; - conf = mddev->private; - if (test_bit(R1BIO_IsSync, &r1_bio->state)) - sync_request_write(mddev, r1_bio); - else { - int disk; - - /* we got a read error. Maybe the drive is bad. Maybe just - * the block and we can fix it. - * We freeze all other IO, and try reading the block from - * other devices. When we find one, we re-write it - * and check whether that fixes the read error. - * This is all done synchronously while the array is - * frozen - */ - if (mddev->ro == 0) { - freeze_array(conf); - fix_read_error(conf, r1_bio->read_disk, - r1_bio->sector, - r1_bio->sectors); - unfreeze_array(conf); - } else - md_error(mddev, - conf->mirrors[r1_bio->read_disk].rdev); - - bio = r1_bio->bios[r1_bio->read_disk]; - if ((disk=read_balance(conf, r1_bio)) == -1) { - printk(KERN_ALERT "md/raid1:%s: %s: unrecoverable I/O" - " read error for block %llu\n", - mdname(mddev), - bdevname(bio->bi_bdev,b), - (unsigned long long)r1_bio->sector); - raid_end_bio_io(r1_bio); - } else { - const unsigned long do_sync = r1_bio->master_bio->bi_rw & REQ_SYNC; - r1_bio->bios[r1_bio->read_disk] = - mddev->ro ?
IO_BLOCKED : NULL; - r1_bio->read_disk = disk; - bio_put(bio); - bio = bio_clone_mddev(r1_bio->master_bio, - GFP_NOIO, mddev); - r1_bio->bios[r1_bio->read_disk] = bio; - rdev = conf->mirrors[disk].rdev; - if (printk_ratelimit()) - printk(KERN_ERR "md/raid1:%s: redirecting sector %llu to" - " other mirror: %s\n", - mdname(mddev), - (unsigned long long)r1_bio->sector, - bdevname(rdev->bdev,b)); - bio->bi_sector = r1_bio->sector + rdev->data_offset; - bio->bi_bdev = rdev->bdev; - bio->bi_end_io = raid1_end_read_request; - bio->bi_rw = READ | do_sync; - bio->bi_private = r1_bio; - generic_make_request(bio); - } - } - cond_resched(); - } - blk_finish_plug(&plug); -} - - -static int init_resync(conf_t *conf) -{ - int buffs; - - buffs = RESYNC_WINDOW / RESYNC_BLOCK_SIZE; - BUG_ON(conf->r1buf_pool); - conf->r1buf_pool = mempool_create(buffs, r1buf_pool_alloc, r1buf_pool_free, - conf->poolinfo); - if (!conf->r1buf_pool) - return -ENOMEM; - conf->next_resync = 0; - return 0; -} - -/* - * perform a "sync" on one "block" - * - * We need to make sure that no normal I/O request - particularly write - * requests - conflict with active sync requests. - * - * This is achieved by tracking pending requests and a 'barrier' concept - * that can be installed to exclude normal IO requests. - */ - -static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, int go_faster) -{ - conf_t *conf = mddev->private; - r1bio_t *r1_bio; - struct bio *bio; - sector_t max_sector, nr_sectors; - int disk = -1; - int i; - int wonly = -1; - int write_targets = 0, read_targets = 0; - sector_t sync_blocks; - int still_degraded = 0; - - if (!conf->r1buf_pool) - if (init_resync(conf)) - return 0; - - max_sector = mddev->dev_sectors; - if (sector_nr >= max_sector) { - /* If we aborted, we need to abort the - * sync on the 'current' bitmap chunk (there will - * only be one in raid1 resync. 
- * We can find the current address in mddev->curr_resync - */ - if (mddev->curr_resync < max_sector) /* aborted */ - bitmap_end_sync(mddev->bitmap, mddev->curr_resync, - &sync_blocks, 1); - else /* completed sync */ - conf->fullsync = 0; - - bitmap_close_sync(mddev->bitmap); - close_sync(conf); - return 0; - } - - if (mddev->bitmap == NULL && - mddev->recovery_cp == MaxSector && - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) && - conf->fullsync == 0) { - *skipped = 1; - return max_sector - sector_nr; - } - /* before building a request, check if we can skip these blocks.. - * This call to bitmap_start_sync doesn't actually record anything - */ - if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) && - !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) { - /* We can skip this block, and probably several more */ - *skipped = 1; - return sync_blocks; - } - /* - * If there is non-resync activity waiting for a turn, - * and resync is going fast enough, - * then let it through before starting on this new sync request. - */ - if (!go_faster && conf->nr_waiting) - msleep_interruptible(1000); - - bitmap_cond_end_sync(mddev->bitmap, sector_nr); - r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO); - raise_barrier(conf); - - conf->next_resync = sector_nr; - - rcu_read_lock(); - /* - * If we get a correctable read error during resync or recovery, - * we might want to read from a different device. So we - * flag all drives that could conceivably be read from for READ, - * and any others (which will be non-In_sync devices) for WRITE. - * If a read fails, we try reading from something else for which READ - * is OK.
- */ - - r1_bio->mddev = mddev; - r1_bio->sector = sector_nr; - r1_bio->state = 0; - set_bit(R1BIO_IsSync, &r1_bio->state); - - for (i=0; i < conf->raid_disks; i++) { - mdk_rdev_t *rdev; - bio = r1_bio->bios[i]; - - /* take from bio_init */ - bio->bi_next = NULL; - bio->bi_flags &= ~(BIO_POOL_MASK-1); - bio->bi_flags |= 1 << BIO_UPTODATE; - bio->bi_comp_cpu = -1; - bio->bi_rw = READ; - bio->bi_vcnt = 0; - bio->bi_idx = 0; - bio->bi_phys_segments = 0; - bio->bi_size = 0; - bio->bi_end_io = NULL; - bio->bi_private = NULL; - - rdev = rcu_dereference(conf->mirrors[i].rdev); - if (rdev == NULL || - test_bit(Faulty, &rdev->flags)) { - still_degraded = 1; - continue; - } else if (!test_bit(In_sync, &rdev->flags)) { - bio->bi_rw = WRITE; - bio->bi_end_io = end_sync_write; - write_targets ++; - } else { - /* may need to read from here */ - bio->bi_rw = READ; - bio->bi_end_io = end_sync_read; - if (test_bit(WriteMostly, &rdev->flags)) { - if (wonly < 0) - wonly = i; - } else { - if (disk < 0) - disk = i; - } - read_targets++; - } - atomic_inc(&rdev->nr_pending); - bio->bi_sector = sector_nr + rdev->data_offset; - bio->bi_bdev = rdev->bdev; - bio->bi_private = r1_bio; - } - rcu_read_unlock(); - if (disk < 0) - disk = wonly; - r1_bio->read_disk = disk; - - if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && read_targets > 0) - /* extra read targets are also write targets */ - write_targets += read_targets-1; - - if (write_targets == 0 || read_targets == 0) { - /* There is nowhere to write, so all non-sync - * drives must be failed - so we are finished - */ - sector_t rv = max_sector - sector_nr; - *skipped = 1; - put_buf(r1_bio); - return rv; - } - - if (max_sector > mddev->resync_max) - max_sector = mddev->resync_max; /* Don't do IO beyond here */ - nr_sectors = 0; - sync_blocks = 0; - do { - struct page *page; - int len = PAGE_SIZE; - if (sector_nr + (len>>9) > max_sector) - len = (max_sector - sector_nr) << 9; - if (len == 0) - break; - if (sync_blocks == 0) { - if 
(!bitmap_start_sync(mddev->bitmap, sector_nr, - &sync_blocks, still_degraded) && - !conf->fullsync && - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) - break; - BUG_ON(sync_blocks < (PAGE_SIZE>>9)); - if ((len >> 9) > sync_blocks) - len = sync_blocks<<9; - } - - for (i=0 ; i < conf->raid_disks; i++) { - bio = r1_bio->bios[i]; - if (bio->bi_end_io) { - page = bio->bi_io_vec[bio->bi_vcnt].bv_page; - if (bio_add_page(bio, page, len, 0) == 0) { - /* stop here */ - bio->bi_io_vec[bio->bi_vcnt].bv_page = page; - while (i > 0) { - i--; - bio = r1_bio->bios[i]; - if (bio->bi_end_io==NULL) - continue; - /* remove last page from this bio */ - bio->bi_vcnt--; - bio->bi_size -= len; - bio->bi_flags &= ~(1<< BIO_SEG_VALID); - } - goto bio_full; - } - } - } - nr_sectors += len>>9; - sector_nr += len>>9; - sync_blocks -= (len>>9); - } while (r1_bio->bios[disk]->bi_vcnt < RESYNC_PAGES); - bio_full: - r1_bio->sectors = nr_sectors; - - /* For a user-requested sync, we read all readable devices and do a - * compare - */ - if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) { - atomic_set(&r1_bio->remaining, read_targets); - for (i=0; i < conf->raid_disks; i++) { - bio = r1_bio->bios[i]; - if (bio->bi_end_io == end_sync_read) { - md_sync_acct(bio->bi_bdev, nr_sectors); - generic_make_request(bio); - } - } - } else { - atomic_set(&r1_bio->remaining, 1); - bio = r1_bio->bios[r1_bio->read_disk]; - md_sync_acct(bio->bi_bdev, nr_sectors); - generic_make_request(bio); - - } - return nr_sectors; -} - -static sector_t raid1_size(mddev_t *mddev, sector_t sectors, int raid_disks) -{ - if (sectors) - return sectors; - - return mddev->dev_sectors; -} - -static conf_t *setup_conf(mddev_t *mddev) -{ - conf_t *conf; - int i; - mirror_info_t *disk; - mdk_rdev_t *rdev; - int err = -ENOMEM; - - conf = kzalloc(sizeof(conf_t), GFP_KERNEL); - if (!conf) - goto abort; - - conf->mirrors = kzalloc(sizeof(struct mirror_info)*mddev->raid_disks, - GFP_KERNEL); - if (!conf->mirrors) - goto abort; - - 
conf->tmppage = alloc_page(GFP_KERNEL); - if (!conf->tmppage) - goto abort; - - conf->poolinfo = kzalloc(sizeof(*conf->poolinfo), GFP_KERNEL); - if (!conf->poolinfo) - goto abort; - conf->poolinfo->raid_disks = mddev->raid_disks; - conf->r1bio_pool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc, - r1bio_pool_free, - conf->poolinfo); - if (!conf->r1bio_pool) - goto abort; - - conf->poolinfo->mddev = mddev; - - spin_lock_init(&conf->device_lock); - list_for_each_entry(rdev, &mddev->disks, same_set) { - int disk_idx = rdev->raid_disk; - if (disk_idx >= mddev->raid_disks - || disk_idx < 0) - continue; - disk = conf->mirrors + disk_idx; - - disk->rdev = rdev; - - disk->head_position = 0; - } - conf->raid_disks = mddev->raid_disks; - conf->mddev = mddev; - INIT_LIST_HEAD(&conf->retry_list); - - spin_lock_init(&conf->resync_lock); - init_waitqueue_head(&conf->wait_barrier); - - bio_list_init(&conf->pending_bio_list); -<<<<<<< found - - conf->last_used = -1; - for (i = 0; i < conf->raid_disks; i++) { -||||||| expected - bio_list_init(&conf->flushing_bio_list); - - -======= - conf->pending_count = 0; - bio_list_init(&conf->flushing_bio_list); - - ->>>>>>> replacement - - disk = conf->mirrors + i; - - if (!disk->rdev || - !test_bit(In_sync, &disk->rdev->flags)) { - disk->head_position = 0; - if (disk->rdev) - conf->fullsync = 1; - } else if (conf->last_used < 0) - /* - * The first working device is used as a - * starting point to read balancing. 
- */ - conf->last_used = i; - } - - err = -EIO; - if (conf->last_used < 0) { - printk(KERN_ERR "md/raid1:%s: no operational mirrors\n", - mdname(mddev)); - goto abort; - } - err = -ENOMEM; - conf->thread = md_register_thread(raid1d, mddev, NULL); - if (!conf->thread) { - printk(KERN_ERR - "md/raid1:%s: couldn't allocate thread\n", - mdname(mddev)); - goto abort; - } - - return conf; - - abort: - if (conf) { - if (conf->r1bio_pool) - mempool_destroy(conf->r1bio_pool); - kfree(conf->mirrors); - safe_put_page(conf->tmppage); - kfree(conf->poolinfo); - kfree(conf); - } - return ERR_PTR(err); -} - -static int run(mddev_t *mddev) -{ - conf_t *conf; - int i; - mdk_rdev_t *rdev; - - if (mddev->level != 1) { - printk(KERN_ERR "md/raid1:%s: raid level not set to mirroring (%d)\n", - mdname(mddev), mddev->level); - return -EIO; - } - if (mddev->reshape_position != MaxSector) { - printk(KERN_ERR "md/raid1:%s: reshape_position set but not supported\n", - mdname(mddev)); - return -EIO; - } - /* - * copy the already verified devices into our private RAID1 - * bookkeeping area. [whatever we allocate in run(), - * should be freed in stop()] - */ - if (mddev->private == NULL) - conf = setup_conf(mddev); - else - conf = mddev->private; - - if (IS_ERR(conf)) - return PTR_ERR(conf); - - list_for_each_entry(rdev, &mddev->disks, same_set) { - if (!mddev->gendisk) - continue; - disk_stack_limits(mddev->gendisk, rdev->bdev, - rdev->data_offset << 9); - /* as we don't honour merge_bvec_fn, we must never risk - * violating it, so limit ->max_segments to 1 lying within - * a single page, as a one page request is never in violation. 
- */ - if (rdev->bdev->bd_disk->queue->merge_bvec_fn) { - blk_queue_max_segments(mddev->queue, 1); - blk_queue_segment_boundary(mddev->queue, - PAGE_CACHE_SIZE - 1); - } - } - - mddev->degraded = 0; - for (i=0; i < conf->raid_disks; i++) - if (conf->mirrors[i].rdev == NULL || - !test_bit(In_sync, &conf->mirrors[i].rdev->flags) || - test_bit(Faulty, &conf->mirrors[i].rdev->flags)) - mddev->degraded++; - - if (conf->raid_disks - mddev->degraded == 1) - mddev->recovery_cp = MaxSector; - - if (mddev->recovery_cp != MaxSector) - printk(KERN_NOTICE "md/raid1:%s: not clean" - " -- starting background reconstruction\n", - mdname(mddev)); - printk(KERN_INFO - "md/raid1:%s: active with %d out of %d mirrors\n", - mdname(mddev), mddev->raid_disks - mddev->degraded, - mddev->raid_disks); - - /* - * Ok, everything is just fine now - */ - mddev->thread = conf->thread; - conf->thread = NULL; - mddev->private = conf; - - md_set_array_sectors(mddev, raid1_size(mddev, 0, 0)); - - if (mddev->queue) { - mddev->queue->backing_dev_info.congested_fn = raid1_congested; - mddev->queue->backing_dev_info.congested_data = mddev; - } - return md_integrity_register(mddev); -} - -static int stop(mddev_t *mddev) -{ - conf_t *conf = mddev->private; - struct bitmap *bitmap = mddev->bitmap; - - /* wait for behind writes to complete */ - if (bitmap && atomic_read(&bitmap->behind_writes) > 0) { - printk(KERN_INFO "md/raid1:%s: behind writes in progress - waiting to stop.\n", - mdname(mddev)); - /* need to kick something here to make sure I/O goes? 
*/ - wait_event(bitmap->behind_wait, - atomic_read(&bitmap->behind_writes) == 0); - } - - raise_barrier(conf); - lower_barrier(conf); - - md_unregister_thread(mddev->thread); - mddev->thread = NULL; - if (conf->r1bio_pool) - mempool_destroy(conf->r1bio_pool); - kfree(conf->mirrors); - kfree(conf->poolinfo); - kfree(conf); - mddev->private = NULL; - return 0; -} - -static int raid1_resize(mddev_t *mddev, sector_t sectors) -{ - /* no resync is happening, and there is enough space - * on all devices, so we can resize. - * We need to make sure resync covers any new space. - * If the array is shrinking we should possibly wait until - * any io in the removed space completes, but it hardly seems - * worth it. - */ - md_set_array_sectors(mddev, raid1_size(mddev, sectors, 0)); - if (mddev->array_sectors > raid1_size(mddev, sectors, 0)) - return -EINVAL; - set_capacity(mddev->gendisk, mddev->array_sectors); - revalidate_disk(mddev->gendisk); - if (sectors > mddev->dev_sectors && - mddev->recovery_cp > mddev->dev_sectors) { - mddev->recovery_cp = mddev->dev_sectors; - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - } - mddev->dev_sectors = sectors; - mddev->resync_max_sectors = sectors; - return 0; -} - -static int raid1_reshape(mddev_t *mddev) -{ - /* We need to: - * 1/ resize the r1bio_pool - * 2/ resize conf->mirrors - * - * We allocate a new r1bio_pool if we can. - * Then raise a device barrier and wait until all IO stops. - * Then resize conf->mirrors and swap in the new r1bio pool. - * - * At the same time, we "pack" the devices so that all the missing - * devices have the higher raid_disk numbers. 
- */ - mempool_t *newpool, *oldpool; - struct pool_info *newpoolinfo; - mirror_info_t *newmirrors; - conf_t *conf = mddev->private; - int cnt, raid_disks; - unsigned long flags; - int d, d2, err; - - /* Cannot change chunk_size, layout, or level */ - if (mddev->chunk_sectors != mddev->new_chunk_sectors || - mddev->layout != mddev->new_layout || - mddev->level != mddev->new_level) { - mddev->new_chunk_sectors = mddev->chunk_sectors; - mddev->new_layout = mddev->layout; - mddev->new_level = mddev->level; - return -EINVAL; - } - - err = md_allow_write(mddev); - if (err) - return err; - - raid_disks = mddev->raid_disks + mddev->delta_disks; - - if (raid_disks < conf->raid_disks) { - cnt=0; - for (d= 0; d < conf->raid_disks; d++) - if (conf->mirrors[d].rdev) - cnt++; - if (cnt > raid_disks) - return -EBUSY; - } - - newpoolinfo = kmalloc(sizeof(*newpoolinfo), GFP_KERNEL); - if (!newpoolinfo) - return -ENOMEM; - newpoolinfo->mddev = mddev; - newpoolinfo->raid_disks = raid_disks; - - newpool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc, - r1bio_pool_free, newpoolinfo); - if (!newpool) { - kfree(newpoolinfo); - return -ENOMEM; - } - newmirrors = kzalloc(sizeof(struct mirror_info) * raid_disks, GFP_KERNEL); - if (!newmirrors) { - kfree(newpoolinfo); - mempool_destroy(newpool); - return -ENOMEM; - } - - raise_barrier(conf); - - /* ok, everything is stopped */ - oldpool = conf->r1bio_pool; - conf->r1bio_pool = newpool; - - for (d = d2 = 0; d < conf->raid_disks; d++) { - mdk_rdev_t *rdev = conf->mirrors[d].rdev; - if (rdev && rdev->raid_disk != d2) { - char nm[20]; - sprintf(nm, "rd%d", rdev->raid_disk); - sysfs_remove_link(&mddev->kobj, nm); - rdev->raid_disk = d2; - sprintf(nm, "rd%d", rdev->raid_disk); - sysfs_remove_link(&mddev->kobj, nm); - if (sysfs_create_link(&mddev->kobj, - &rdev->kobj, nm)) - printk(KERN_WARNING - "md/raid1:%s: cannot register " - "%s\n", - mdname(mddev), nm); - } - if (rdev) - newmirrors[d2++].rdev = rdev; - } - kfree(conf->mirrors); - 
conf->mirrors = newmirrors; - kfree(conf->poolinfo); - conf->poolinfo = newpoolinfo; - - spin_lock_irqsave(&conf->device_lock, flags); - mddev->degraded += (raid_disks - conf->raid_disks); - spin_unlock_irqrestore(&conf->device_lock, flags); - conf->raid_disks = mddev->raid_disks = raid_disks; - mddev->delta_disks = 0; - - conf->last_used = 0; /* just make sure it is in-range */ - lower_barrier(conf); - - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); - - mempool_destroy(oldpool); - return 0; -} - -static void raid1_quiesce(mddev_t *mddev, int state) -{ - conf_t *conf = mddev->private; - - switch(state) { - case 2: /* wake for suspend */ - wake_up(&conf->wait_barrier); - break; - case 1: - raise_barrier(conf); - break; - case 0: - lower_barrier(conf); - break; - } -} - -static void *raid1_takeover(mddev_t *mddev) -{ - /* raid1 can take over: - * raid5 with 2 devices, any layout or chunk size - */ - if (mddev->level == 5 && mddev->raid_disks == 2) { - conf_t *conf; - mddev->new_level = 1; - mddev->new_layout = 0; - mddev->new_chunk_sectors = 0; - conf = setup_conf(mddev); - if (!IS_ERR(conf)) - conf->barrier = 1; - return conf; - } - return ERR_PTR(-EINVAL); -} - -static struct mdk_personality raid1_personality = -{ - .name = "raid1", - .level = 1, - .owner = THIS_MODULE, - .make_request = make_request, - .run = run, - .stop = stop, - .status = status, - .error_handler = error, - .hot_add_disk = raid1_add_disk, - .hot_remove_disk= raid1_remove_disk, - .spare_active = raid1_spare_active, - .sync_request = sync_request, - .resize = raid1_resize, - .size = raid1_size, - .check_reshape = raid1_reshape, - .quiesce = raid1_quiesce, - .takeover = raid1_takeover, -}; - -static int __init raid_init(void) -{ - return register_md_personality(&raid1_personality); -} - -static void raid_exit(void) -{ - unregister_md_personality(&raid1_personality); -} - -module_init(raid_init); -module_exit(raid_exit); -MODULE_LICENSE("GPL"); 
-MODULE_DESCRIPTION("RAID1 (mirroring) personality for MD"); -MODULE_ALIAS("md-personality-3"); /* RAID1 */ -MODULE_ALIAS("md-raid1"); -MODULE_ALIAS("md-level-1"); - -module_param(max_queued, int, S_IRUGO|S_IWUSR); ./linux/raid1-A/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.887616493 +0000 @@ -1,270 +0,0 @@ -/* - * linux/include/linux/nfsd/nfsd.h - * - * Hodge-podge collection of knfsd-related stuff. - * I will sort this out later. - * - * Copyright (C) 1995-1997 Olaf Kirch - */ - -#ifndef LINUX_NFSD_NFSD_H -#define LINUX_NFSD_NFSD_H - -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -/* - * nfsd version - */ -#define NFSD_VERSION "0.5" -#define NFSD_SUPPORTED_MINOR_VERSION 0 - -#ifdef __KERNEL__ -/* - * Special flags for nfsd_permission. These must be different from MAY_READ, - * MAY_WRITE, and MAY_EXEC. - */ -#define MAY_NOP 0 -#define MAY_SATTR 8 -#define MAY_TRUNC 16 -#define MAY_LOCK 32 -#define MAY_OWNER_OVERRIDE 64 -#define MAY_LOCAL_ACCESS 128 /* IRIX doing local access check on device special file*/ -#if (MAY_SATTR | MAY_TRUNC | MAY_LOCK | MAY_OWNER_OVERRIDE | MAY_LOCAL_ACCESS) & (MAY_READ | MAY_WRITE | MAY_EXEC) -# error "please use a different value for MAY_SATTR or MAY_TRUNC or MAY_LOCK or MAY_LOCAL_ACCESS or MAY_OWNER_OVERRIDE." -#endif -#define MAY_CREATE (MAY_EXEC|MAY_WRITE) -#define MAY_REMOVE (MAY_EXEC|MAY_WRITE|MAY_TRUNC) - -/* - * Callback function for readdir - */ -struct readdir_cd { - int err; /* 0, nfserr, or nfserr_eof */ -}; -typedef int (*encode_dent_fn)(struct readdir_cd *, const char *, - int, loff_t, ino_t, unsigned int); -typedef int (*nfsd_dirop_t)(struct inode *, struct dentry *, int, int); - -extern struct svc_program nfsd_program; -extern struct svc_version nfsd_version2, nfsd_version3, - nfsd_version4; - -/* - * Function prototypes. 
- */ -int nfsd_svc(unsigned short port, int nrservs); -int nfsd_dispatch(struct svc_rqst *rqstp, u32 *statp); - -/* nfsd/vfs.c */ -int fh_lock_parent(struct svc_fh *, struct dentry *); -int nfsd_racache_init(int); -void nfsd_racache_shutdown(void); -int nfsd_lookup(struct svc_rqst *, struct svc_fh *, - const char *, int, struct svc_fh *); -int nfsd_setattr(struct svc_rqst *, struct svc_fh *, - struct iattr *, int, time_t); -int nfsd_create(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, - int type, dev_t rdev, struct svc_fh *res); -#ifdef CONFIG_NFSD_V3 -int nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); -int nfsd_create_v3(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, - struct svc_fh *res, int createmode, - u32 *verifier, int *truncp); -int nfsd_commit(struct svc_rqst *, struct svc_fh *, - off_t, unsigned long); -#endif /* CONFIG_NFSD_V3 */ -int nfsd_open(struct svc_rqst *, struct svc_fh *, int, - int, struct file *); -void nfsd_close(struct file *); -int nfsd_read(struct svc_rqst *, struct svc_fh *, - loff_t, struct iovec *,int, unsigned long *); -int nfsd_write(struct svc_rqst *, struct svc_fh *, - loff_t, struct iovec *,int, unsigned long, int *); -int nfsd_readlink(struct svc_rqst *, struct svc_fh *, - char *, int *); -int nfsd_symlink(struct svc_rqst *, struct svc_fh *, - char *name, int len, char *path, int plen, - struct svc_fh *res, struct iattr *); -int nfsd_link(struct svc_rqst *, struct svc_fh *, - char *, int, struct svc_fh *); -int nfsd_rename(struct svc_rqst *, - struct svc_fh *, char *, int, - struct svc_fh *, char *, int); -int nfsd_remove(struct svc_rqst *, - struct svc_fh *, char *, int); -int nfsd_unlink(struct svc_rqst *, struct svc_fh *, int type, - char *name, int len); -int nfsd_truncate(struct svc_rqst *, struct svc_fh *, - unsigned long size); -int nfsd_readdir(struct svc_rqst *, struct svc_fh *, - loff_t *, struct readdir_cd *, encode_dent_fn); -int 
nfsd_statfs(struct svc_rqst *, struct svc_fh *, - struct statfs *); - -int nfsd_notify_change(struct inode *, struct iattr *); -int nfsd_permission(struct svc_export *, struct dentry *, int); - - -/* - * NFSv4 State - */ -#ifdef CONFIG_NFSD_V4 -void nfs4_state_init(void); -void nfs4_state_shutdown(void); -#else -void static inline nfs4_state_init(void){} -void static inline nfs4_state_shutdown(void){} -#endif - -/* - * lockd binding - */ -void nfsd_lockd_init(void); -void nfsd_lockd_shutdown(void); - - -/* - * These macros provide pre-xdr'ed values for faster operation. - */ -#define nfs_ok __constant_htonl(NFS_OK) -#define nfserr_perm __constant_htonl(NFSERR_PERM) -#define nfserr_noent __constant_htonl(NFSERR_NOENT) -#define nfserr_io __constant_htonl(NFSERR_IO) -#define nfserr_nxio __constant_htonl(NFSERR_NXIO) -#define nfserr_eagain __constant_htonl(NFSERR_EAGAIN) -#define nfserr_acces __constant_htonl(NFSERR_ACCES) -#define nfserr_exist __constant_htonl(NFSERR_EXIST) -#define nfserr_xdev __constant_htonl(NFSERR_XDEV) -#define nfserr_nodev __constant_htonl(NFSERR_NODEV) -#define nfserr_notdir __constant_htonl(NFSERR_NOTDIR) -#define nfserr_isdir __constant_htonl(NFSERR_ISDIR) -#define nfserr_inval __constant_htonl(NFSERR_INVAL) -#define nfserr_fbig __constant_htonl(NFSERR_FBIG) -#define nfserr_nospc __constant_htonl(NFSERR_NOSPC) -#define nfserr_rofs __constant_htonl(NFSERR_ROFS) -#define nfserr_mlink __constant_htonl(NFSERR_MLINK) -#define nfserr_opnotsupp __constant_htonl(NFSERR_OPNOTSUPP) -#define nfserr_nametoolong __constant_htonl(NFSERR_NAMETOOLONG) -#define nfserr_notempty __constant_htonl(NFSERR_NOTEMPTY) -#define nfserr_dquot __constant_htonl(NFSERR_DQUOT) -#define nfserr_stale __constant_htonl(NFSERR_STALE) -#define nfserr_remote __constant_htonl(NFSERR_REMOTE) -#define nfserr_wflush __constant_htonl(NFSERR_WFLUSH) -#define nfserr_badhandle __constant_htonl(NFSERR_BADHANDLE) -#define nfserr_notsync __constant_htonl(NFSERR_NOT_SYNC) -#define 
nfserr_badcookie __constant_htonl(NFSERR_BAD_COOKIE) -#define nfserr_notsupp __constant_htonl(NFSERR_NOTSUPP) -#define nfserr_toosmall __constant_htonl(NFSERR_TOOSMALL) -#define nfserr_serverfault __constant_htonl(NFSERR_SERVERFAULT) -#define nfserr_badtype __constant_htonl(NFSERR_BADTYPE) -#define nfserr_jukebox __constant_htonl(NFSERR_JUKEBOX) -#define nfserr_bad_cookie __constant_htonl(NFSERR_BAD_COOKIE) -#define nfserr_same __constant_htonl(NFSERR_SAME) -#define nfserr_clid_inuse __constant_htonl(NFSERR_CLID_INUSE) -#define nfserr_stale_clientid __constant_htonl(NFSERR_STALE_CLIENTID) -#define nfserr_resource __constant_htonl(NFSERR_RESOURCE) -#define nfserr_nofilehandle __constant_htonl(NFSERR_NOFILEHANDLE) -#define nfserr_minor_vers_mismatch __constant_htonl(NFSERR_MINOR_VERS_MISMATCH) -#define nfserr_symlink __constant_htonl(NFSERR_SYMLINK) -#define nfserr_not_same __constant_htonl(NFSERR_NOT_SAME) -#define nfserr_readdir_nospc __constant_htonl(NFSERR_READDIR_NOSPC) -#define nfserr_bad_xdr __constant_htonl(NFSERR_BAD_XDR) - -/* error codes for internal use */ -/* if a request fails due to kmalloc failure, it gets dropped. - * Client should resend eventually - */ -#define nfserr_dropit __constant_htonl(30000) -/* end-of-file indicator in readdir */ -#define nfserr_eof __constant_htonl(30001) - -/* Check for dir entries '.' and '..' */ -#define isdotent(n, l) (l < 3 && n[0] == '.' && (l == 1 || n[1] == '.')) - -/* - * Time of server startup - */ -extern struct timeval nfssvc_boot; - - -#ifdef CONFIG_NFSD_V4 - -/* before processing a COMPOUND operation, we have to check that there - * is enough space in the buffer for XDR encode to succeed. otherwise, - * we might process an operation with side effects, and be unable to - * tell the client that the operation succeeded. - * - * COMPOUND_SLACK_SPACE - this is the minimum amount of buffer space - * needed to encode an "ordinary" _successful_ operation. 
(GETATTR, - * READ, READDIR, and READLINK have their own buffer checks.) if we - * fall below this level, we fail the next operation with NFS4ERR_RESOURCE. - * - * COMPOUND_ERR_SLACK_SPACE - this is the minimum amount of buffer space - * needed to encode an operation which has failed with NFS4ERR_RESOURCE. - * care is taken to ensure that we never fall below this level for any - * reason. - */ -#define COMPOUND_SLACK_SPACE 140 /* OP_GETFH */ -#define COMPOUND_ERR_SLACK_SPACE 12 /* OP_SETATTR */ - -#define NFSD_LEASE_TIME 60 /* seconds */ - -/* - * The following attributes are currently not supported by the NFSv4 server: - * ACL (will be supported in a forthcoming patch) - * ARCHIVE (deprecated anyway) - * FS_LOCATIONS (will be supported eventually) - * HIDDEN (unlikely to be supported any time soon) - * MIMETYPE (unlikely to be supported any time soon) - * QUOTA_* (will be supported in a forthcoming patch) - * SYSTEM (unlikely to be supported any time soon) - * TIME_BACKUP (unlikely to be supported any time soon) - * TIME_CREATE (unlikely to be supported any time soon) - */ -#define NFSD_SUPPORTED_ATTRS_WORD0 \ -(FATTR4_WORD0_SUPPORTED_ATTRS | FATTR4_WORD0_TYPE | FATTR4_WORD0_FH_EXPIRE_TYPE \ - | FATTR4_WORD0_CHANGE | FATTR4_WORD0_SIZE | FATTR4_WORD0_LINK_SUPPORT \ - | FATTR4_WORD0_SYMLINK_SUPPORT | FATTR4_WORD0_NAMED_ATTR | FATTR4_WORD0_FSID \ - | FATTR4_WORD0_UNIQUE_HANDLES | FATTR4_WORD0_LEASE_TIME | FATTR4_WORD0_RDATTR_ERROR \ - | FATTR4_WORD0_ACLSUPPORT | FATTR4_WORD0_CANSETTIME | FATTR4_WORD0_CASE_INSENSITIVE \ - | FATTR4_WORD0_CASE_PRESERVING | FATTR4_WORD0_CHOWN_RESTRICTED \ - | FATTR4_WORD0_FILEHANDLE | FATTR4_WORD0_FILEID | FATTR4_WORD0_FILES_AVAIL \ - | FATTR4_WORD0_FILES_FREE | FATTR4_WORD0_FILES_TOTAL | FATTR4_WORD0_HOMOGENEOUS \ - | FATTR4_WORD0_MAXFILESIZE | FATTR4_WORD0_MAXLINK | FATTR4_WORD0_MAXNAME \ - | FATTR4_WORD0_MAXREAD | FATTR4_WORD0_MAXWRITE) - -#define NFSD_SUPPORTED_ATTRS_WORD1 \ -(FATTR4_WORD1_MODE | FATTR4_WORD1_NO_TRUNC | 
FATTR4_WORD1_NUMLINKS \ - | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP | FATTR4_WORD1_RAWDEV \ - | FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE | FATTR4_WORD1_SPACE_TOTAL \ - | FATTR4_WORD1_SPACE_USED | FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_ACCESS_SET \ - | FATTR4_WORD1_TIME_CREATE | FATTR4_WORD1_TIME_DELTA | FATTR4_WORD1_TIME_METADATA \ - | FATTR4_WORD1_TIME_MODIFY | FATTR4_WORD1_TIME_MODIFY_SET) - -/* These will return ERR_INVAL if specified in GETATTR or READDIR. */ -#define NFSD_WRITEONLY_ATTRS_WORD1 \ -(FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_MODIFY_SET) - -/* These are the only attrs allowed in CREATE/OPEN/SETATTR. */ -#define NFSD_WRITEABLE_ATTRS_WORD0 FATTR4_WORD0_SIZE -#define NFSD_WRITEABLE_ATTRS_WORD1 \ -(FATTR4_WORD1_MODE | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP \ - | FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_METADATA | FATTR4_WORD1_TIME_MODIFY_SET) - -#endif /* CONFIG_NFSD_V4 */ - -#endif /* __KERNEL__ */ - -#endif /* LINUX_NFSD_NFSD_H */ ./linux/nfsd-defines/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- wmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:10.903688134 +0000 @@ -1,3589 +0,0 @@ -/* - md.c : Multiple Devices driver for Linux - Copyright (C) 1998, 1999, 2000 Ingo Molnar - - completely rewritten, based on the MD driver code from Marc Zyngier - - Changes: - - - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar - - boot support for linear and striped mode by Harald Hoyer - - kerneld support by Boris Tobotras - - kmod support by: Cyrus Durgin - - RAID0 bugfixes: Mark Anthony Lisher - - Devfs support by Richard Gooch - - - lots of fixes and improvements to the RAID1/RAID5 and generic - RAID code (such as request based resynchronization): - - Neil Brown . 
- - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - - You should have received a copy of the GNU General Public License - (for example /usr/src/linux/COPYING); if not, write to the Free - Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. -*/ - -#include -#include -#include -#include -#include -#include -#include -#include /* for invalidate_bdev */ -#include - -#include - -#ifdef CONFIG_KMOD -#include -#endif - -#define __KERNEL_SYSCALLS__ -#include - -#include - -#define MAJOR_NR MD_MAJOR -#define MD_DRIVER -#define DEVICE_NR(device) (minor(device)) - -#include - -#define DEBUG 0 -#define dprintk(x...) ((void)(DEBUG && printk(x))) - - -#ifndef MODULE -static void autostart_arrays (void); -#endif - -static mdk_personality_t *pers[MAX_PERSONALITY]; -static spinlock_t pers_lock = SPIN_LOCK_UNLOCKED; - -/* - * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit' - * is 1000 KB/sec, so the extra system load does not show up that much. - * Increase it if you want to have more _guaranteed_ speed. Note that - * the RAID driver will use the maximum available bandwith if the IO - * subsystem is idle. There is also an 'absolute maximum' reconstruction - * speed limit - in case reconstruction slows down your system despite - * idle IO detection. - * - * you can change it via /proc/sys/dev/raid/speed_limit_min and _max. 
- */ - -static int sysctl_speed_limit_min = 1000; -static int sysctl_speed_limit_max = 200000; - -static struct ctl_table_header *raid_table_header; - -static ctl_table raid_table[] = { - { - .ctl_name = DEV_RAID_SPEED_LIMIT_MIN, - .procname = "speed_limit_min", - .data = &sysctl_speed_limit_min, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = &proc_dointvec, - }, - { - .ctl_name = DEV_RAID_SPEED_LIMIT_MAX, - .procname = "speed_limit_max", - .data = &sysctl_speed_limit_max, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = &proc_dointvec, - }, - { .ctl_name = 0 } -}; - -static ctl_table raid_dir_table[] = { - { - .ctl_name = DEV_RAID, - .procname = "raid", - .maxlen = 0, - .mode = 0555, - .child = raid_table, - }, - { .ctl_name = 0 } -}; - -static ctl_table raid_root_table[] = { - { - .ctl_name = CTL_DEV, - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = raid_dir_table, - }, - { .ctl_name = 0 } -}; - -static struct block_device_operations md_fops; - -static struct gendisk *disks[MAX_MD_DEVS]; - -/* - * Enables to iterate over all existing md arrays - * all_mddevs_lock protects this list as well as mddev_map. - */ -static LIST_HEAD(all_mddevs); -static spinlock_t all_mddevs_lock = SPIN_LOCK_UNLOCKED; - - -/* - * iterates through all used mddevs in the system. - * We take care to grab the all_mddevs_lock whenever navigating - * the list, and to always hold a refcount when unlocked. - * Any code which breaks out of this loop while own - * a reference to the current mddev and must mddev_put it. 
- */ -#define ITERATE_MDDEV(mddev,tmp) \ - \ - for (({ spin_lock(&all_mddevs_lock); \ - tmp = all_mddevs.next; \ - mddev = NULL;}); \ - ({ if (tmp != &all_mddevs) \ - mddev_get(list_entry(tmp, mddev_t, all_mddevs));\ - spin_unlock(&all_mddevs_lock); \ - if (mddev) mddev_put(mddev); \ - mddev = list_entry(tmp, mddev_t, all_mddevs); \ - tmp != &all_mddevs;}); \ - ({ spin_lock(&all_mddevs_lock); \ - tmp = tmp->next;}) \ - ) - -static mddev_t *mddev_map[MAX_MD_DEVS]; - -static int md_fail_request (request_queue_t *q, struct bio *bio) -{ - bio_io_error(bio, bio->bi_size); - return 0; -} - -static inline mddev_t *mddev_get(mddev_t *mddev) -{ - atomic_inc(&mddev->active); - return mddev; -} - -static void mddev_put(mddev_t *mddev) -{ - if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) - return; - if (!mddev->raid_disks && list_empty(&mddev->disks)) { - list_del(&mddev->all_mddevs); - mddev_map[mdidx(mddev)] = NULL; - kfree(mddev); - MOD_DEC_USE_COUNT; - } - spin_unlock(&all_mddevs_lock); -} - -static mddev_t * mddev_find(int unit) -{ - mddev_t *mddev, *new = NULL; - - retry: - spin_lock(&all_mddevs_lock); - if (mddev_map[unit]) { - mddev = mddev_get(mddev_map[unit]); - spin_unlock(&all_mddevs_lock); - if (new) - kfree(new); - return mddev; - } - if (new) { - mddev_map[unit] = new; - list_add(&new->all_mddevs, &all_mddevs); - spin_unlock(&all_mddevs_lock); - MOD_INC_USE_COUNT; - return new; - } - spin_unlock(&all_mddevs_lock); - - new = (mddev_t *) kmalloc(sizeof(*new), GFP_KERNEL); - if (!new) - return NULL; - - memset(new, 0, sizeof(*new)); - - new->__minor = unit; - init_MUTEX(&new->reconfig_sem); - INIT_LIST_HEAD(&new->disks); - INIT_LIST_HEAD(&new->all_mddevs); - init_timer(&new->safemode_timer); - atomic_set(&new->active, 1); - blk_queue_make_request(&new->queue, md_fail_request); - - goto retry; -} - -static inline int mddev_lock(mddev_t * mddev) -{ - return down_interruptible(&mddev->reconfig_sem); -} - -static inline void 
mddev_lock_uninterruptible(mddev_t * mddev) -{ - down(&mddev->reconfig_sem); -} - -static inline int mddev_trylock(mddev_t * mddev) -{ - return down_trylock(&mddev->reconfig_sem); -} - -static inline void mddev_unlock(mddev_t * mddev) -{ - up(&mddev->reconfig_sem); -} - -mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) -{ - mdk_rdev_t * rdev; - struct list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == nr) - return rdev; - } - return NULL; -} - -static mdk_rdev_t * find_rdev(mddev_t * mddev, dev_t dev) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->bdev->bd_dev == dev) - return rdev; - } - return NULL; -} - -inline static sector_t calc_dev_sboffset(struct block_device *bdev) -{ - sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - return MD_NEW_SIZE_BLOCKS(size); -} - -static sector_t calc_dev_size(mdk_rdev_t *rdev, unsigned chunk_size) -{ - sector_t size; - - size = rdev->sb_offset; - - if (chunk_size) - size &= ~((sector_t)chunk_size/1024 - 1); - return size; -} - -static int alloc_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) - MD_BUG(); - - rdev->sb_page = alloc_page(GFP_KERNEL); - if (!rdev->sb_page) { - printk(KERN_ALERT "md: out of memory.\n"); - return -EINVAL; - } - - return 0; -} - -static void free_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) { - page_cache_release(rdev->sb_page); - rdev->sb_loaded = 0; - rdev->sb_page = NULL; - rdev->sb_offset = 0; - rdev->size = 0; - } -} - - -static int bi_complete(struct bio *bio, unsigned int bytes_done, int error) -{ - if (bio->bi_size) - return 1; - - complete((struct completion*)bio->bi_private); - return 0; -} - -static int sync_page_io(struct block_device *bdev, sector_t sector, int size, - struct page *page, int rw) -{ - struct bio bio; - struct bio_vec vec; - struct completion event; - - bio_init(&bio); - bio.bi_io_vec = &vec; - vec.bv_page = page; - vec.bv_len = size; - vec.bv_offset = 0; - bio.bi_vcnt = 1; - 
bio.bi_idx = 0; - bio.bi_size = size; - bio.bi_bdev = bdev; - bio.bi_sector = sector; - init_completion(&event); - bio.bi_private = &event; - bio.bi_end_io = bi_complete; - submit_bio(rw, &bio); - blk_run_queues(); - wait_for_completion(&event); - - return test_bit(BIO_UPTODATE, &bio.bi_flags); -} - -static int read_disk_sb(mdk_rdev_t * rdev) -{ - - if (!rdev->sb_page) { - MD_BUG(); - return -EINVAL; - } - if (rdev->sb_loaded) - return 0; - - - if (!sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, READ)) - goto fail; - rdev->sb_loaded = 1; - return 0; - -fail: - printk(KERN_ERR "md: disabled device %s, could not read superblock.\n", - bdev_partition_name(rdev->bdev)); - return -EINVAL; -} - -static int uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - if ( (sb1->set_uuid0 == sb2->set_uuid0) && - (sb1->set_uuid1 == sb2->set_uuid1) && - (sb1->set_uuid2 == sb2->set_uuid2) && - (sb1->set_uuid3 == sb2->set_uuid3)) - - return 1; - - return 0; -} - - -static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - int ret; - mdp_super_t *tmp1, *tmp2; - - tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); - tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); - - if (!tmp1 || !tmp2) { - ret = 0; - printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n"); - goto abort; - } - - *tmp1 = *sb1; - *tmp2 = *sb2; - - /* - * nr_disks is not constant - */ - tmp1->nr_disks = 0; - tmp2->nr_disks = 0; - - if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4)) - ret = 0; - else - ret = 1; - -abort: - if (tmp1) - kfree(tmp1); - if (tmp2) - kfree(tmp2); - - return ret; -} - -static unsigned int calc_sb_csum(mdp_super_t * sb) -{ - unsigned int disk_csum, csum; - - disk_csum = sb->sb_csum; - sb->sb_csum = 0; - csum = csum_partial((void *)sb, MD_SB_BYTES, 0); - sb->sb_csum = disk_csum; - return csum; -} - -/* - * Handle superblock details. - * We want to be able to handle multiple superblock formats - * so we have a common interface to them all, and an array of - * different handlers. 
- * We rely on user-space to write the initial superblock, and support - * reading and updating of superblocks. - * Interface methods are: - * int load_super(mdk_rdev_t *dev, mdk_rdev_t *refdev, int minor_version) - * loads and validates a superblock on dev. - * if refdev != NULL, compare superblocks on both devices - * Return: - * 0 - dev has a superblock that is compatible with refdev - * 1 - dev has a superblock that is compatible and newer than refdev - * so dev should be used as the refdev in future - * -EINVAL superblock incompatible or invalid - * -othererror e.g. -EIO - * - * int validate_super(mddev_t *mddev, mdk_rdev_t *dev) - * Verify that dev is acceptable into mddev. - * The first time, mddev->raid_disks will be 0, and data from - * dev should be merged in. Subsequent calls check that dev - * is new enough. Return 0 or -EINVAL - * - * void sync_super(mddev_t *mddev, mdk_rdev_t *dev) - * Update the superblock for rdev with data in mddev - * This does not write to disc. - * - */ - -struct super_type { - char *name; - struct module *owner; - int (*load_super)(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version); - int (*validate_super)(mddev_t *mddev, mdk_rdev_t *rdev); - void (*sync_super)(mddev_t *mddev, mdk_rdev_t *rdev); -}; - -/* - * load_super for 0.90.0 - */ -static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) -{ - mdp_super_t *sb; - int ret; - sector_t sb_offset; - - /* - * Calculate the position of the superblock, - * it's at the end of the disk. - * - * It also happens to be a multiple of 4Kb. 
- */ - sb_offset = calc_dev_sboffset(rdev->bdev); - rdev->sb_offset = sb_offset; - - ret = read_disk_sb(rdev); - if (ret) return ret; - - ret = -EINVAL; - - sb = (mdp_super_t*)page_address(rdev->sb_page); - - if (sb->md_magic != MD_SB_MAGIC) { - printk(KERN_ERR "md: invalid raid superblock magic on %s\n", - bdev_partition_name(rdev->bdev)); - goto abort; - } - - if (sb->major_version != 0 || - sb->minor_version != 90) { - printk(KERN_WARNING "Bad version number %d.%d on %s\n", - sb->major_version, sb->minor_version, - bdev_partition_name(rdev->bdev)); - goto abort; - } - - if (sb->md_minor >= MAX_MD_DEVS) { - printk(KERN_ERR "md: %s: invalid raid minor (%x)\n", - bdev_partition_name(rdev->bdev), sb->md_minor); - goto abort; - } - if (sb->raid_disks <= 0) - goto abort; - - if (calc_sb_csum(sb) != sb->sb_csum) { - printk(KERN_WARNING "md: invalid superblock checksum on %s\n", - bdev_partition_name(rdev->bdev)); - goto abort; - } - - rdev->preferred_minor = sb->md_minor; - rdev->data_offset = 0; - - if (sb->level == MULTIPATH) - rdev->desc_nr = -1; - else - rdev->desc_nr = sb->this_disk.number; - - if (refdev == 0) - ret = 1; - else { - __u64 ev1, ev2; - mdp_super_t *refsb = (mdp_super_t*)page_address(refdev->sb_page); - if (!uuid_equal(refsb, sb)) { - printk(KERN_WARNING "md: %s has different UUID to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - goto abort; - } - if (!sb_equal(refsb, sb)) { - printk(KERN_WARNING "md: %s has same UUID" - " but different superblock to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - goto abort; - } - ev1 = md_event(sb); - ev2 = md_event(refsb); - if (ev1 > ev2) - ret = 1; - else - ret = 0; - } - rdev->size = calc_dev_size(rdev, sb->chunk_size); - - abort: - return ret; -} - -/* - * validate_super for 0.90.0 - */ -static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev) -{ - mdp_disk_t *desc; - mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page); - - if 
(mddev->raid_disks == 0) {
-		mddev->major_version = 0;
-		mddev->minor_version = sb->minor_version;
-		mddev->patch_version = sb->patch_version;
-		mddev->persistent = ! sb->not_persistent;
-		mddev->chunk_size = sb->chunk_size;
-		mddev->ctime = sb->ctime;
-		mddev->utime = sb->utime;
-		mddev->level = sb->level;
-		mddev->layout = sb->layout;
-		mddev->raid_disks = sb->raid_disks;
-		mddev->size = sb->size;
-		mddev->events = md_event(sb);
-
-		if (sb->state & (1<<MD_SB_CLEAN))
-			mddev->recovery_cp = MaxSector;
-		else {
-			if (sb->events_hi == sb->cp_events_hi &&
-			    sb->events_lo == sb->cp_events_lo) {
-				mddev->recovery_cp = sb->recovery_cp;
-			} else
-				mddev->recovery_cp = 0;
-		}
-
-		memcpy(mddev->uuid+0, &sb->set_uuid0, 4);
-		memcpy(mddev->uuid+4, &sb->set_uuid1, 4);
-		memcpy(mddev->uuid+8, &sb->set_uuid2, 4);
-		memcpy(mddev->uuid+12,&sb->set_uuid3, 4);
-
-		mddev->max_disks = MD_SB_DISKS;
-	} else {
-		__u64 ev1;
-		ev1 = md_event(sb);
-		++ev1;
-		if (ev1 < mddev->events)
-			return -EINVAL;
-	}
-	if (mddev->level != LEVEL_MULTIPATH) {
-		rdev->raid_disk = -1;
-		rdev->in_sync = rdev->faulty = 0;
-		desc = sb->disks + rdev->desc_nr;
-
-		if (desc->state & (1<<MD_DISK_FAULTY))
-			rdev->faulty = 1;
-		else if (desc->state & (1<<MD_DISK_SYNC) &&
-			 desc->raid_disk < mddev->raid_disks) {
-			rdev->in_sync = 1;
-			rdev->raid_disk = desc->raid_disk;
-		}
-	}
-	return 0;
-}
-
-/*
- * sync_super for 0.90.0
- */
-static void super_90_sync(mddev_t *mddev, mdk_rdev_t *rdev)
-{
-	mdp_super_t *sb;
-	struct list_head *tmp;
-	mdk_rdev_t *rdev2;
-	int next_spare = mddev->raid_disks;
-
-	/* make rdev->sb match mddev data..
-	 *
-	 * 1/ zero out disks
-	 * 2/ Add info for each disk, keeping track of highest desc_nr
-	 * 3/ any empty disks < highest become removed
-	 *
-	 * disks[0] gets initialised to REMOVED because
-	 * we cannot be sure from other fields if it has
-	 * been initialised or not.
- */
-	int highest = 0;
-	int i;
-	int active=0, working=0,failed=0,spare=0,nr_disks=0;
-
-	sb = (mdp_super_t*)page_address(rdev->sb_page);
-
-	memset(sb, 0, sizeof(*sb));
-
-	sb->md_magic = MD_SB_MAGIC;
-	sb->major_version = mddev->major_version;
-	sb->minor_version = mddev->minor_version;
-	sb->patch_version = mddev->patch_version;
-	sb->gvalid_words = 0; /* ignored */
-	memcpy(&sb->set_uuid0, mddev->uuid+0, 4);
-	memcpy(&sb->set_uuid1, mddev->uuid+4, 4);
-	memcpy(&sb->set_uuid2, mddev->uuid+8, 4);
-	memcpy(&sb->set_uuid3, mddev->uuid+12,4);
-
-	sb->ctime = mddev->ctime;
-	sb->level = mddev->level;
-	sb->size = mddev->size;
-	sb->raid_disks = mddev->raid_disks;
-	sb->md_minor = mddev->__minor;
-	sb->not_persistent = !mddev->persistent;
-	sb->utime = mddev->utime;
-	sb->state = 0;
-	sb->events_hi = (mddev->events>>32);
-	sb->events_lo = (u32)mddev->events;
-
-	if (mddev->in_sync)
-	{
-		sb->recovery_cp = mddev->recovery_cp;
-		sb->cp_events_hi = (mddev->events>>32);
-		sb->cp_events_lo = (u32)mddev->events;
-		if (mddev->recovery_cp == MaxSector)
-			sb->state = (1<< MD_SB_CLEAN);
-	} else
-		sb->recovery_cp = 0;
-
-	sb->layout = mddev->layout;
-	sb->chunk_size = mddev->chunk_size;
-
-	sb->disks[0].state = (1<<MD_DISK_REMOVED);
-	ITERATE_RDEV(mddev,rdev2,tmp) {
-		mdp_disk_t *d;
-		if (rdev2->raid_disk >= 0 && rdev2->in_sync && !rdev2->faulty)
-			rdev2->desc_nr = rdev2->raid_disk;
-		else
-			rdev2->desc_nr = next_spare++;
-		d = &sb->disks[rdev2->desc_nr];
-		nr_disks++;
-		d->number = rdev2->desc_nr;
-		d->major = MAJOR(rdev2->bdev->bd_dev);
-		d->minor = MINOR(rdev2->bdev->bd_dev);
-		if (rdev2->raid_disk >= 0 && rdev->in_sync && !rdev2->faulty)
-			d->raid_disk = rdev2->raid_disk;
-		else
-			d->raid_disk = rdev2->desc_nr; /* compatibility */
-		if (rdev2->faulty) {
-			d->state = (1<<MD_DISK_FAULTY);
-			failed++;
-		} else if (rdev2->in_sync) {
-			d->state = (1<<MD_DISK_ACTIVE);
-			d->state |= (1<<MD_DISK_SYNC);
-			active++;
-			working++;
-		} else {
-			d->state = 0;
-			spare++;
-			working++;
-		}
-		if (rdev2->desc_nr > highest)
-			highest = rdev2->desc_nr;
-	}
-
-	/* now set the "removed" bit on any non-trailing holes */
-	for (i=0; i<highest; i++) {
-		mdp_disk_t *d = &sb->disks[i];
-		if (d->state == 0 && d->number == 0) {
-			d->number = i;
d->raid_disk = i;
-			d->state = (1<<MD_DISK_REMOVED);
-		}
-	}
-	sb->nr_disks = nr_disks;
-	sb->active_disks = active;
-	sb->working_disks = working;
-	sb->failed_disks = failed;
-	sb->spare_disks = spare;
-
-	sb->this_disk = sb->disks[rdev->desc_nr];
-	sb->sb_csum = calc_sb_csum(sb);
-}
-
-/*
- * version 1 superblock
- */
-
-static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
-{
-	unsigned int disk_csum, csum;
-	int size = 256 + sb->max_dev*2;
-
-	disk_csum = sb->sb_csum;
-	sb->sb_csum = 0;
-	csum = csum_partial((void *)sb, size, 0);
-	sb->sb_csum = disk_csum;
-	return csum;
-}
-
-static int super_1_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version)
-{
-	struct mdp_superblock_1 *sb;
-	int ret;
-	sector_t sb_offset;
-
-	/*
-	 * Calculate the position of the superblock.
-	 * It is always aligned to a 4K boundary and
-	 * depending on minor_version, it can be:
-	 * 0: At least 8K, but less than 12K, from end of device
-	 * 1: At start of device
-	 * 2: 4K from start of device.
-	 */
-	switch(minor_version) {
-	case 0:
-		sb_offset = rdev->bdev->bd_inode->i_size >> 9;
-		sb_offset -= 8*2;
-		sb_offset &= ~(4*2);
-		/* convert from sectors to K */
-		sb_offset /= 2;
-		break;
-	case 1:
-		sb_offset = 0;
-		break;
-	case 2:
-		sb_offset = 4;
-		break;
-	default:
-		return -EINVAL;
-	}
-	rdev->sb_offset = sb_offset;
-
-	ret = read_disk_sb(rdev);
-	if (ret) return ret;
-
-
-	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
-
-	if (sb->magic != cpu_to_le32(MD_SB_MAGIC) ||
-	    sb->major_version != cpu_to_le32(1) ||
-	    le32_to_cpu(sb->max_dev) > (4096-256)/2 ||
-	    le64_to_cpu(sb->super_offset) != (rdev->sb_offset<<1) ||
-	    sb->feature_map != 0)
-		return -EINVAL;
-
-	if (calc_sb_1_csum(sb) != sb->sb_csum) {
-		printk("md: invalid superblock checksum on %s\n",
-			bdev_partition_name(rdev->bdev));
-		return -EINVAL;
-	}
-	rdev->preferred_minor = 0xffff;
-	rdev->data_offset = le64_to_cpu(sb->data_offset);
-
-	if (refdev == 0)
-		return 1;
-	else {
-		__u64 ev1, ev2;
-		struct mdp_superblock_1 *refsb =
(struct mdp_superblock_1*)page_address(refdev->sb_page); - - if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 || - sb->level != refsb->level || - sb->layout != refsb->layout || - sb->chunksize != refsb->chunksize) { - printk(KERN_WARNING "md: %s has strangely different" - " superblock to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - return -EINVAL; - } - ev1 = le64_to_cpu(sb->events); - ev2 = le64_to_cpu(refsb->events); - - if (ev1 > ev2) - return 1; - } - if (minor_version) - rdev->size = ((rdev->bdev->bd_inode->i_size>>9) - le64_to_cpu(sb->data_offset)) / 2; - else - rdev->size = rdev->sb_offset; - if (rdev->size < le64_to_cpu(sb->data_size)/2) - return -EINVAL; - rdev->size = le64_to_cpu(sb->data_size)/2; - if (le32_to_cpu(sb->chunksize)) - rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1); - return 0; -} - -static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev) -{ - struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); - - if (mddev->raid_disks == 0) { - mddev->major_version = 1; - mddev->minor_version = 0; - mddev->patch_version = 0; - mddev->persistent = 1; - mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9; - mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1); - mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1); - mddev->level = le32_to_cpu(sb->level); - mddev->layout = le32_to_cpu(sb->layout); - mddev->raid_disks = le32_to_cpu(sb->raid_disks); - mddev->size = (u32)le64_to_cpu(sb->size); - mddev->events = le64_to_cpu(sb->events); - - mddev->recovery_cp = le64_to_cpu(sb->resync_offset); - memcpy(mddev->uuid, sb->set_uuid, 16); - - mddev->max_disks = (4096-256)/2; - } else { - __u64 ev1; - ev1 = le64_to_cpu(sb->events); - ++ev1; - if (ev1 < mddev->events) - return -EINVAL; - } - - if (mddev->level != LEVEL_MULTIPATH) { - int role; - rdev->desc_nr = le32_to_cpu(sb->dev_number); - role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); - switch(role) { - case 0xffff: 
/* spare */
-			rdev->in_sync = 0;
-			rdev->faulty = 0;
-			rdev->raid_disk = -1;
-			break;
-		case 0xfffe: /* faulty */
-			rdev->in_sync = 0;
-			rdev->faulty = 1;
-			rdev->raid_disk = -1;
-			break;
-		default:
-			rdev->in_sync = 1;
-			rdev->faulty = 0;
-			rdev->raid_disk = role;
-			break;
-		}
-	}
-	return 0;
-}
-
-static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev)
-{
-	struct mdp_superblock_1 *sb;
-	struct list_head *tmp;
-	mdk_rdev_t *rdev2;
-	int max_dev, i;
-	/* make rdev->sb match mddev and rdev data. */
-
-	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
-
-	sb->feature_map = 0;
-	sb->pad0 = 0;
-	memset(sb->pad1, 0, sizeof(sb->pad1));
-	memset(sb->pad2, 0, sizeof(sb->pad2));
-	memset(sb->pad3, 0, sizeof(sb->pad3));
-
-	sb->utime = cpu_to_le64((__u64)mddev->utime);
-	sb->events = cpu_to_le64(mddev->events);
-	if (mddev->in_sync)
-		sb->resync_offset = cpu_to_le64(mddev->recovery_cp);
-	else
-		sb->resync_offset = cpu_to_le64(0);
-
-	max_dev = 0;
-	ITERATE_RDEV(mddev,rdev2,tmp)
-		if (rdev2->desc_nr > max_dev)
-			max_dev = rdev2->desc_nr;
-
-	sb->max_dev = max_dev;
-	for (i=0; i<max_dev;i++)
-		sb->dev_roles[max_dev] = cpu_to_le16(0xfffe);
-
-	ITERATE_RDEV(mddev,rdev2,tmp) {
-		i = rdev2->desc_nr;
-		if (rdev2->faulty)
-			sb->dev_roles[i] = cpu_to_le16(0xfffe);
-		else if (rdev2->in_sync)
-			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
-		else
-			sb->dev_roles[i] = cpu_to_le16(0xffff);
-	}
-
-	sb->recovery_offset = cpu_to_le64(0); /* not supported yet */
-}
-
-
-struct super_type super_types[] = {
-	[0] = {
-		.name	= "0.90.0",
-		.owner	= THIS_MODULE,
-		.load_super	= super_90_load,
-		.validate_super	= super_90_validate,
-		.sync_super	= super_90_sync,
-	},
-	[1] = {
-		.name	= "md-1",
-		.owner	= THIS_MODULE,
-		.load_super	= super_1_load,
-		.validate_super	= super_1_validate,
-		.sync_super	= super_1_sync,
-	},
-};
-
-static mdk_rdev_t * match_dev_unit(mddev_t *mddev, mdk_rdev_t *dev)
-{
-	struct list_head *tmp;
-	mdk_rdev_t *rdev;
-
-	ITERATE_RDEV(mddev,rdev,tmp)
-		if
(rdev->bdev->bd_contains == dev->bdev->bd_contains) - return rdev; - - return NULL; -} - -static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev1,rdev,tmp) - if (match_dev_unit(mddev2, rdev)) - return 1; - - return 0; -} - -static LIST_HEAD(pending_raid_disks); - -static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev) -{ - mdk_rdev_t *same_pdev; - - if (rdev->mddev) { - MD_BUG(); - return -EINVAL; - } - same_pdev = match_dev_unit(mddev, rdev); - if (same_pdev) - printk(KERN_WARNING - "md%d: WARNING: %s appears to be on the same physical" - " disk as %s. True\n protection against single-disk" - " failure might be compromised.\n", - mdidx(mddev), bdev_partition_name(rdev->bdev), - bdev_partition_name(same_pdev->bdev)); - - /* Verify rdev->desc_nr is unique. - * If it is -1, assign a free number, else - * check number is not in use - */ - if (rdev->desc_nr < 0) { - int choice = 0; - if (mddev->pers) choice = mddev->raid_disks; - while (find_rdev_nr(mddev, choice)) - choice++; - rdev->desc_nr = choice; - } else { - if (find_rdev_nr(mddev, rdev->desc_nr)) - return -EBUSY; - } - - list_add(&rdev->same_set, &mddev->disks); - rdev->mddev = mddev; - printk(KERN_INFO "md: bind<%s>\n", bdev_partition_name(rdev->bdev)); - return 0; -} - -static void unbind_rdev_from_array(mdk_rdev_t * rdev) -{ - if (!rdev->mddev) { - MD_BUG(); - return; - } - list_del_init(&rdev->same_set); - printk(KERN_INFO "md: unbind<%s>\n", bdev_partition_name(rdev->bdev)); - rdev->mddev = NULL; -} - -/* - * prevent the device from being mounted, repartitioned or - * otherwise reused by a RAID array (or any other kernel - * subsystem), by opening the device. 
[simply getting an
- * inode is not enough, the SCSI module usage code needs
- * an explicit open() on the device]
- */
-static int lock_rdev(mdk_rdev_t *rdev, dev_t dev)
-{
-	int err = 0;
-	struct block_device *bdev;
-
-	bdev = bdget(dev);
-	if (!bdev)
-		return -ENOMEM;
-	err = blkdev_get(bdev, FMODE_READ|FMODE_WRITE, 0, BDEV_RAW);
-	if (err)
-		return err;
-	err = bd_claim(bdev, rdev);
-	if (err) {
-		blkdev_put(bdev, BDEV_RAW);
-		return err;
-	}
-	rdev->bdev = bdev;
-	return err;
-}
-
-static void unlock_rdev(mdk_rdev_t *rdev)
-{
-	struct block_device *bdev = rdev->bdev;
-	rdev->bdev = NULL;
-	if (!bdev)
-		MD_BUG();
-	bd_release(bdev);
-	blkdev_put(bdev, BDEV_RAW);
-}
-
-void md_autodetect_dev(dev_t dev);
-
-static void export_rdev(mdk_rdev_t * rdev)
-{
-	printk(KERN_INFO "md: export_rdev(%s)\n",
-		bdev_partition_name(rdev->bdev));
-	if (rdev->mddev)
-		MD_BUG();
-	free_disk_sb(rdev);
-	list_del_init(&rdev->same_set);
-#ifndef MODULE
-	md_autodetect_dev(rdev->bdev->bd_dev);
-#endif
-	unlock_rdev(rdev);
-	kfree(rdev);
-}
-
-static void kick_rdev_from_array(mdk_rdev_t * rdev)
-{
-	unbind_rdev_from_array(rdev);
-	export_rdev(rdev);
-}
-
-static void export_array(mddev_t *mddev)
-{
-	struct list_head *tmp;
-	mdk_rdev_t *rdev;
-
-	ITERATE_RDEV(mddev,rdev,tmp) {
-		if (!rdev->mddev) {
-			MD_BUG();
-			continue;
-		}
-		kick_rdev_from_array(rdev);
-	}
-	if (!list_empty(&mddev->disks))
-		MD_BUG();
-	mddev->raid_disks = 0;
-	mddev->major_version = 0;
-}
-
-static void print_desc(mdp_disk_t *desc)
-{
-	printk(" DISK<N:%d,%s(%d,%d),R:%d,S:%d>\n", desc->number,
-		partition_name(MKDEV(desc->major,desc->minor)),
-		desc->major,desc->minor,desc->raid_disk,desc->state);
-}
-
-static void print_sb(mdp_super_t *sb)
-{
-	int i;
-
-	printk(KERN_INFO
-		"md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n",
-		sb->major_version, sb->minor_version, sb->patch_version,
-		sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3,
-		sb->ctime);
-	printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n",
sb->level, sb->size, sb->nr_disks, sb->raid_disks,
-		sb->md_minor, sb->layout, sb->chunk_size);
-	printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d"
-		" FD:%d SD:%d CSUM:%08x E:%08lx\n",
-		sb->utime, sb->state, sb->active_disks, sb->working_disks,
-		sb->failed_disks, sb->spare_disks,
-		sb->sb_csum, (unsigned long)sb->events_lo);
-
-	printk(KERN_INFO);
-	for (i = 0; i < MD_SB_DISKS; i++) {
-		mdp_disk_t *desc;
-
-		desc = sb->disks + i;
-		if (desc->number || desc->major || desc->minor ||
-		    desc->raid_disk || (desc->state && (desc->state != 4))) {
-			printk("     D %2d: ", i);
-			print_desc(desc);
-		}
-	}
-	printk(KERN_INFO "md:     THIS: ");
-	print_desc(&sb->this_disk);
-
-}
-
-static void print_rdev(mdk_rdev_t *rdev)
-{
-	printk(KERN_INFO "md: rdev %s, SZ:%08llu F:%d S:%d DN:%d ",
-		bdev_partition_name(rdev->bdev), (unsigned long long)rdev->size,
-		rdev->faulty, rdev->in_sync, rdev->desc_nr);
-	if (rdev->sb_loaded) {
-		printk(KERN_INFO "md: rdev superblock:\n");
-		print_sb((mdp_super_t*)page_address(rdev->sb_page));
-	} else
-		printk(KERN_INFO "md: no rdev superblock!\n");
-}
-
-void md_print_devices(void)
-{
-	struct list_head *tmp, *tmp2;
-	mdk_rdev_t *rdev;
-	mddev_t *mddev;
-
-	printk("\n");
-	printk("md:	**********************************\n");
-	printk("md:	* <COMPLETE RAID STATE PRINTOUT> *\n");
-	printk("md:	**********************************\n");
-	ITERATE_MDDEV(mddev,tmp) {
-		printk("md%d: ", mdidx(mddev));
-
-		ITERATE_RDEV(mddev,rdev,tmp2)
-			printk("<%s>", bdev_partition_name(rdev->bdev));
-
-		ITERATE_RDEV(mddev,rdev,tmp2)
-			print_rdev(rdev);
-	}
-	printk("md:	**********************************\n");
-	printk("\n");
-}
-
-
-static int write_disk_sb(mdk_rdev_t * rdev)
-{
-
-	if (!rdev->sb_loaded) {
-		MD_BUG();
-		return 1;
-	}
-	if (rdev->faulty) {
-		MD_BUG();
-		return 1;
-	}
-
-	dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
-		bdev_partition_name(rdev->bdev),
-		(unsigned long long)rdev->sb_offset);
-
-	if (sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, WRITE))
-		return
0; - - printk("md: write_disk_sb failed for device %s\n", - bdev_partition_name(rdev->bdev)); - return 1; -} - -static void sync_sbs(mddev_t * mddev) -{ - mdk_rdev_t *rdev; - struct list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - super_types[mddev->major_version]. - sync_super(mddev, rdev); - rdev->sb_loaded = 1; - } -} - -static void md_update_sb(mddev_t * mddev) -{ - int err, count = 100; - struct list_head *tmp; - mdk_rdev_t *rdev; - - mddev->sb_dirty = 0; -repeat: - mddev->utime = get_seconds(); - mddev->events ++; - - if (!mddev->events) { - /* - * oops, this 64-bit counter should never wrap. - * Either we are in around ~1 trillion A.C., assuming - * 1 reboot per second, or we have a bug: - */ - MD_BUG(); - mddev->events --; - } - sync_sbs(mddev); - - /* - * do not write anything to disk if using - * nonpersistent superblocks - */ - if (!mddev->persistent) - return; - - dprintk(KERN_INFO - "md: updating md%d RAID superblock on device (in sync %d)\n", - mdidx(mddev),mddev->in_sync); - - err = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - dprintk(KERN_INFO "md: "); - if (rdev->faulty) - dprintk("(skipping faulty "); - - dprintk("%s ", bdev_partition_name(rdev->bdev)); - if (!rdev->faulty) { - err += write_disk_sb(rdev); - } else - dprintk(")\n"); - if (!err && mddev->level == LEVEL_MULTIPATH) - /* only need to write one superblock... */ - break; - } - if (err) { - if (--count) { - printk(KERN_ERR "md: errors occurred during superblock" - " update, repeating\n"); - goto repeat; - } - printk(KERN_ERR \ - "md: excessive errors occurred during superblock update, exiting\n"); - } -} - -/* - * Import a device. If 'super_format' >= 0, then sanity check the superblock - * - * mark the device faulty if: - * - * - the device is nonexistent (zero size) - * - the device has no valid superblock - * - * a faulty rdev _never_ has rdev->sb set. 
- */ -static mdk_rdev_t *md_import_device(dev_t newdev, int super_format, int super_minor) -{ - int err; - mdk_rdev_t *rdev; - sector_t size; - - rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL); - if (!rdev) { - printk(KERN_ERR "md: could not alloc mem for %s!\n", - partition_name(newdev)); - return ERR_PTR(-ENOMEM); - } - memset(rdev, 0, sizeof(*rdev)); - - if ((err = alloc_disk_sb(rdev))) - goto abort_free; - - err = lock_rdev(rdev, newdev); - if (err) { - printk(KERN_ERR "md: could not lock %s.\n", - partition_name(newdev)); - goto abort_free; - } - rdev->desc_nr = -1; - rdev->faulty = 0; - rdev->in_sync = 0; - rdev->data_offset = 0; - atomic_set(&rdev->nr_pending, 0); - - size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - if (!size) { - printk(KERN_WARNING - "md: %s has zero or unknown size, marking faulty!\n", - bdev_partition_name(rdev->bdev)); - err = -EINVAL; - goto abort_free; - } - - if (super_format >= 0) { - err = super_types[super_format]. - load_super(rdev, NULL, super_minor); - if (err == -EINVAL) { - printk(KERN_WARNING - "md: %s has invalid sb, not importing!\n", - bdev_partition_name(rdev->bdev)); - goto abort_free; - } - if (err < 0) { - printk(KERN_WARNING - "md: could not read %s's sb, not importing!\n", - bdev_partition_name(rdev->bdev)); - goto abort_free; - } - } - INIT_LIST_HEAD(&rdev->same_set); - - return rdev; - -abort_free: - if (rdev->sb_page) { - if (rdev->bdev) - unlock_rdev(rdev); - free_disk_sb(rdev); - } - kfree(rdev); - return ERR_PTR(err); -} - -/* - * Check a full RAID array for plausibility - */ - - -static int analyze_sbs(mddev_t * mddev) -{ - int i; - struct list_head *tmp; - mdk_rdev_t *rdev, *freshest; - - freshest = NULL; - ITERATE_RDEV(mddev,rdev,tmp) - switch (super_types[mddev->major_version]. 
- load_super(rdev, freshest, mddev->minor_version)) { - case 1: - freshest = rdev; - break; - case 0: - break; - default: - printk( KERN_ERR \ - "md: fatal superblock inconsistency in %s" - " -- removing from array\n", - bdev_partition_name(rdev->bdev)); - kick_rdev_from_array(rdev); - } - - - super_types[mddev->major_version]. - validate_super(mddev, freshest); - - i = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev != freshest) - if (super_types[mddev->major_version]. - validate_super(mddev, rdev)) { - printk(KERN_WARNING "md: kicking non-fresh %s" - " from array!\n", - bdev_partition_name(rdev->bdev)); - kick_rdev_from_array(rdev); - continue; - } - if (mddev->level == LEVEL_MULTIPATH) { - rdev->desc_nr = i++; - rdev->raid_disk = rdev->desc_nr; - rdev->in_sync = 1; - } - } - - - /* - * Check if we can support this RAID array - */ - if (mddev->major_version != MD_MAJOR_VERSION || - mddev->minor_version > MD_MINOR_VERSION) { - printk(KERN_ALERT - "md: md%d: unsupported raid array version %d.%d.%d\n", - mdidx(mddev), mddev->major_version, - mddev->minor_version, mddev->patch_version); - goto abort; - } - - if ((mddev->recovery_cp != MaxSector) && ((mddev->level == 1) || - (mddev->level == 4) || (mddev->level == 5))) - printk(KERN_ERR "md: md%d: raid array is not clean" - " -- starting background reconstruction\n", - mdidx(mddev)); - - return 0; -abort: - return 1; -} - -static struct gendisk *md_probe(dev_t dev, int *part, void *data) -{ - static DECLARE_MUTEX(disks_sem); - int unit = MINOR(dev); - mddev_t *mddev = mddev_find(unit); - struct gendisk *disk; - - if (!mddev) - return NULL; - - down(&disks_sem); - if (disks[unit]) { - up(&disks_sem); - mddev_put(mddev); - return NULL; - } - disk = alloc_disk(1); - if (!disk) { - up(&disks_sem); - mddev_put(mddev); - return NULL; - } - disk->major = MD_MAJOR; - disk->first_minor = mdidx(mddev); - sprintf(disk->disk_name, "md%d", mdidx(mddev)); - disk->fops = &md_fops; - disk->private_data = mddev; - disk->queue = 
&mddev->queue; - add_disk(disk); - disks[mdidx(mddev)] = disk; - up(&disks_sem); - return NULL; -} - -void md_wakeup_thread(mdk_thread_t *thread); - -static void md_safemode_timeout(unsigned long data) -{ - mddev_t *mddev = (mddev_t *) data; - - mddev->safemode = 1; - md_wakeup_thread(mddev->thread); -} - - -static int do_md_run(mddev_t * mddev) -{ - int pnum, err; - int chunk_size; - struct list_head *tmp; - mdk_rdev_t *rdev; - struct gendisk *disk; - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return -EINVAL; - } - - if (mddev->pers) - return -EBUSY; - - /* - * Analyze all RAID superblock(s) - */ - if (!mddev->raid_disks && analyze_sbs(mddev)) { - MD_BUG(); - return -EINVAL; - } - - chunk_size = mddev->chunk_size; - pnum = level_to_pers(mddev->level); - - if ((pnum != MULTIPATH) && (pnum != RAID1)) { - if (!chunk_size) { - /* - * 'default chunksize' in the old md code used to - * be PAGE_SIZE, baaad. - * we abort here to be on the safe side. We don't - * want to continue the bad practice. 
- */ - printk(KERN_ERR - "no chunksize specified, see 'man raidtab'\n"); - return -EINVAL; - } - if (chunk_size > MAX_CHUNK_SIZE) { - printk(KERN_ERR "too big chunk_size: %d > %d\n", - chunk_size, MAX_CHUNK_SIZE); - return -EINVAL; - } - /* - * chunk-size has to be a power of 2 and multiples of PAGE_SIZE - */ - if ( (1 << ffz(~chunk_size)) != chunk_size) { - MD_BUG(); - return -EINVAL; - } - if (chunk_size < PAGE_SIZE) { - printk(KERN_ERR "too small chunk_size: %d < %ld\n", - chunk_size, PAGE_SIZE); - return -EINVAL; - } - - /* devices must have minimum size of one chunk */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (rdev->size < chunk_size / 1024) { - printk(KERN_WARNING - "md: Dev %s smaller than chunk_size:" - " %lluk < %dk\n", - bdev_partition_name(rdev->bdev), - (unsigned long long)rdev->size, - chunk_size / 1024); - return -EINVAL; - } - } - } - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - -#ifdef CONFIG_KMOD - if (!pers[pnum]) - { - char module_name[80]; - sprintf (module_name, "md-personality-%d", pnum); - request_module (module_name); - } -#endif - - /* - * Drop all container device buffers, from now on - * the only valid external interface is through the md - * device. 
- * Also find largest hardsector size - */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - sync_blockdev(rdev->bdev); - invalidate_bdev(rdev->bdev, 0); - } - - md_probe(mdidx(mddev), NULL, NULL); - disk = disks[mdidx(mddev)]; - if (!disk) - return -ENOMEM; - - spin_lock(&pers_lock); - if (!pers[pnum] || !try_module_get(pers[pnum]->owner)) { - spin_unlock(&pers_lock); - printk(KERN_ERR "md: personality %d is not loaded!\n", - pnum); - return -EINVAL; - } - - mddev->pers = pers[pnum]; - spin_unlock(&pers_lock); - - blk_queue_make_request(&mddev->queue, mddev->pers->make_request); - printk("%s: setting max_sectors to %d, segment boundary to %d\n", - disk->disk_name, - chunk_size >> 9, - (chunk_size>>1)-1); - blk_queue_max_sectors(&mddev->queue, chunk_size >> 9); - blk_queue_segment_boundary(&mddev->queue, (chunk_size>>1) - 1); - mddev->queue.queuedata = mddev; - - err = mddev->pers->run(mddev); - if (err) { - printk(KERN_ERR "md: pers->run() failed ...\n"); - module_put(mddev->pers->owner); - mddev->pers = NULL; - return -EINVAL; - } - atomic_set(&mddev->writes_pending,0); - mddev->safemode = 0; - mddev->safemode_timer.function = md_safemode_timeout; - mddev->safemode_timer.data = (unsigned long) mddev; - mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */ - mddev->in_sync = 1; - - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); - set_capacity(disk, mddev->array_size<<1); - return 0; -} - -static int restart_array(mddev_t *mddev) -{ - struct gendisk *disk = disks[mdidx(mddev)]; - int err; - - /* - * Complain if it has no devices - */ - err = -ENXIO; - if (list_empty(&mddev->disks)) - goto out; - - if (mddev->pers) { - err = -EBUSY; - if (!mddev->ro) - goto out; - - mddev->safemode = 0; - mddev->ro = 0; - set_disk_ro(disk, 0); - - printk(KERN_INFO "md: md%d switched to read-write mode.\n", - mdidx(mddev)); - /* - * Kick recovery or resync if necessary - */ - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - 
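The chunk-size checks in do_md_run() above require a non-zero power of two between PAGE_SIZE and MAX_CHUNK_SIZE; the kernel expresses the power-of-two test as `(1 << ffz(~chunk_size)) != chunk_size`, since ffz(~x) is the index of the lowest set bit of x. A standalone sketch using the equivalent `x & (x - 1)` idiom is below; chunk_size_is_valid() is an illustrative name, not a kernel API.

```c
#include <assert.h>

/* Validate an md chunk size: non-zero, a power of two, at least
 * page_size and at most max_chunk. Equivalent to the ffz-based test
 * in do_md_run(): (1 << ffz(~x)) == x iff x has exactly one set bit. */
static int chunk_size_is_valid(unsigned int chunk, unsigned int page_size,
			       unsigned int max_chunk)
{
	if (!chunk || chunk > max_chunk)
		return 0;
	if (chunk & (chunk - 1))	/* more than one bit set */
		return 0;
	if (chunk < page_size)
		return 0;
	return 1;
}
```

The `x & (x - 1)` form clears the lowest set bit, so it is zero exactly for powers of two, matching the ffz construction without needing the kernel's bitops.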
md_wakeup_thread(mddev->thread); - err = 0; - } else { - printk(KERN_ERR "md: md%d has no personality assigned.\n", - mdidx(mddev)); - err = -EINVAL; - } - -out: - return err; -} - -static int do_md_stop(mddev_t * mddev, int ro) -{ - int err = 0; - struct gendisk *disk = disks[mdidx(mddev)]; - - if (atomic_read(&mddev->active)>2) { - printk("md: md%d still in use.\n",mdidx(mddev)); - err = -EBUSY; - goto out; - } - - if (mddev->pers) { - if (mddev->sync_thread) { - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_unregister_thread(mddev->sync_thread); - mddev->sync_thread = NULL; - } - - del_timer_sync(&mddev->safemode_timer); - - invalidate_device(mk_kdev(disk->major, disk->first_minor), 1); - - if (ro) { - err = -ENXIO; - if (mddev->ro) - goto out; - mddev->ro = 1; - } else { - if (mddev->ro) - set_disk_ro(disk, 0); - if (mddev->pers->stop(mddev)) { - err = -EBUSY; - if (mddev->ro) - set_disk_ro(disk, 1); - goto out; - } - module_put(mddev->pers->owner); - mddev->pers = NULL; - if (mddev->ro) - mddev->ro = 0; - } - if (mddev->raid_disks) { - /* mark array as shutdown cleanly */ - mddev->in_sync = 1; - md_update_sb(mddev); - } - if (ro) - set_disk_ro(disk, 1); - } - /* - * Free resources if final stop - */ - if (!ro) { - struct gendisk *disk; - printk(KERN_INFO "md: md%d stopped.\n", mdidx(mddev)); - - export_array(mddev); - - mddev->array_size = 0; - disk = disks[mdidx(mddev)]; - if (disk) - set_capacity(disk, 0); - } else - printk(KERN_INFO "md: md%d switched to read-only mode.\n", - mdidx(mddev)); - err = 0; -out: - return err; -} - -static void autorun_array(mddev_t *mddev) -{ - mdk_rdev_t *rdev; - struct list_head *tmp; - int err; - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return; - } - - printk(KERN_INFO "md: running: "); - - ITERATE_RDEV(mddev,rdev,tmp) { - printk("<%s>", bdev_partition_name(rdev->bdev)); - } - printk("\n"); - - err = do_md_run (mddev); - if (err) { - printk(KERN_WARNING "md :do_md_run() returned %d\n", err); - do_md_stop (mddev, 
0); - } -} - -/* - * lets try to run arrays based on all disks that have arrived - * until now. (those are in pending_raid_disks) - * - * the method: pick the first pending disk, collect all disks with - * the same UUID, remove all from the pending list and put them into - * the 'same_array' list. Then order this list based on superblock - * update time (freshest comes first), kick out 'old' disks and - * compare superblocks. If everything's fine then run it. - * - * If "unit" is allocated, then bump its reference count - */ -static void autorun_devices(void) -{ - struct list_head candidates; - struct list_head *tmp; - mdk_rdev_t *rdev0, *rdev; - mddev_t *mddev; - - printk(KERN_INFO "md: autorun ...\n"); - while (!list_empty(&pending_raid_disks)) { - rdev0 = list_entry(pending_raid_disks.next, - mdk_rdev_t, same_set); - - printk(KERN_INFO "md: considering %s ...\n", - bdev_partition_name(rdev0->bdev)); - INIT_LIST_HEAD(&candidates); - ITERATE_RDEV_PENDING(rdev,tmp) - if (super_90_load(rdev, rdev0, 0) >= 0) { - printk(KERN_INFO "md: adding %s ...\n", - bdev_partition_name(rdev->bdev)); - list_move(&rdev->same_set, &candidates); - } - /* - * now we have a set of devices, with all of them having - * mostly sane superblocks. It's time to allocate the - * mddev. 
- */ - - mddev = mddev_find(rdev0->preferred_minor); - if (!mddev) { - printk(KERN_ERR - "md: cannot allocate memory for md drive.\n"); - break; - } - if (mddev_lock(mddev)) - printk(KERN_WARNING "md: md%d locked, cannot run\n", - mdidx(mddev)); - else if (mddev->raid_disks || mddev->major_version - || !list_empty(&mddev->disks)) { - printk(KERN_WARNING - "md: md%d already running, cannot run %s\n", - mdidx(mddev), bdev_partition_name(rdev0->bdev)); - mddev_unlock(mddev); - } else { - printk(KERN_INFO "md: created md%d\n", mdidx(mddev)); - ITERATE_RDEV_GENERIC(candidates,rdev,tmp) { - list_del_init(&rdev->same_set); - if (bind_rdev_to_array(rdev, mddev)) - export_rdev(rdev); - } - autorun_array(mddev); - mddev_unlock(mddev); - } - /* on success, candidates will be empty, on error - * it won't... - */ - ITERATE_RDEV_GENERIC(candidates,rdev,tmp) - export_rdev(rdev); - mddev_put(mddev); - } - printk(KERN_INFO "md: ... autorun DONE.\n"); -} - -/* - * import RAID devices based on one partition - * if possible, the array gets run as well. 
- */ - -static int autostart_array(dev_t startdev) -{ - int err = -EINVAL, i; - mdp_super_t *sb = NULL; - mdk_rdev_t *start_rdev = NULL, *rdev; - - start_rdev = md_import_device(startdev, 0, 0); - if (IS_ERR(start_rdev)) { - printk(KERN_WARNING "md: could not import %s!\n", - partition_name(startdev)); - return err; - } - - /* NOTE: this can only work for 0.90.0 superblocks */ - sb = (mdp_super_t*)page_address(start_rdev->sb_page); - if (sb->major_version != 0 || - sb->minor_version != 90 ) { - printk(KERN_WARNING "md: can only autostart 0.90.0 arrays\n"); - export_rdev(start_rdev); - return err; - } - - if (start_rdev->faulty) { - printk(KERN_WARNING - "md: can not autostart based on faulty %s!\n", - bdev_partition_name(start_rdev->bdev)); - export_rdev(start_rdev); - return err; - } - list_add(&start_rdev->same_set, &pending_raid_disks); - - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - dev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - if (!dev) - continue; - if (dev == startdev) - continue; - rdev = md_import_device(dev, 0, 0); - if (IS_ERR(rdev)) { - printk(KERN_WARNING "md: could not import %s," - " trying to run array nevertheless.\n", - partition_name(dev)); - continue; - } - list_add(&rdev->same_set, &pending_raid_disks); - } - - /* - * possibly return codes - */ - autorun_devices(); - return 0; - -} - - -static int get_version(void * arg) -{ - mdu_version_t ver; - - ver.major = MD_MAJOR_VERSION; - ver.minor = MD_MINOR_VERSION; - ver.patchlevel = MD_PATCHLEVEL_VERSION; - - if (copy_to_user(arg, &ver, sizeof(ver))) - return -EFAULT; - - return 0; -} - -static int get_array_info(mddev_t * mddev, void * arg) -{ - mdu_array_info_t info; - int nr,working,active,failed,spare; - mdk_rdev_t *rdev; - struct list_head *tmp; - - nr=working=active=failed=spare=0; - ITERATE_RDEV(mddev,rdev,tmp) { - nr++; - if (rdev->faulty) - failed++; - else { - working++; - if (rdev->in_sync) - active++; - else - spare++; - } - } - - 
info.major_version = mddev->major_version; - info.minor_version = mddev->minor_version; - info.patch_version = 1; - info.ctime = mddev->ctime; - info.level = mddev->level; - info.size = mddev->size; - info.nr_disks = nr; - info.raid_disks = mddev->raid_disks; - info.md_minor = mddev->__minor; - info.not_persistent= !mddev->persistent; - - info.utime = mddev->utime; - info.state = 0; - if (mddev->in_sync) - info.state = (1<<MD_SB_CLEAN); - info.layout = mddev->layout; - info.chunk_size = mddev->chunk_size; - - if (copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} - -static int get_disk_info(mddev_t * mddev, void * arg) -{ - mdu_disk_info_t info; - unsigned int nr; - mdk_rdev_t *rdev; - - if (copy_from_user(&info, arg, sizeof(info))) - return -EFAULT; - - nr = info.number; - - rdev = find_rdev_nr(mddev, nr); - if (rdev) { - info.major = MAJOR(rdev->bdev->bd_dev); - info.minor = MINOR(rdev->bdev->bd_dev); - info.raid_disk = rdev->raid_disk; - info.state = 0; - if (rdev->faulty) - info.state |= (1<<MD_DISK_FAULTY); - else if (rdev->in_sync) { - info.state |= (1<<MD_DISK_ACTIVE); - info.state |= (1<<MD_DISK_SYNC); - } - } else { - info.major = info.minor = 0; - info.raid_disk = -1; - info.state = (1<<MD_DISK_REMOVED); - } - - if (copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} - -static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info) -{ - mdk_rdev_t *rdev; - dev_t dev; - dev = MKDEV(info->major,info->minor); - if (!mddev->raid_disks) { - int err; - /* expecting a device which has a superblock */ - rdev = md_import_device(dev, mddev->major_version, mddev->minor_version); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: md_import_device returned %ld\n", - PTR_ERR(rdev)); - return PTR_ERR(rdev); - } - if (!list_empty(&mddev->disks)) { - mdk_rdev_t *rdev0 = list_entry(mddev->disks.next, - mdk_rdev_t, same_set); - int err = super_types[mddev->major_version] - .load_super(rdev, rdev0, mddev->minor_version); - if (err < 0) { - printk(KERN_WARNING - "md: %s has different UUID to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(rdev0->bdev)); - export_rdev(rdev); - return -EINVAL; - } - } - err = bind_rdev_to_array(rdev, mddev); - if (err) - export_rdev(rdev); - return err; - } - - /* - * add_new_disk can be used once the array is assembled - * to add "hot spares". 
They must already have a superblock - * written - */ - if (mddev->pers) { - int err; - if (!mddev->pers->hot_add_disk) { - printk(KERN_WARNING - "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - rdev = md_import_device(dev, mddev->major_version, - mddev->minor_version); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: md_import_device returned %ld\n", - PTR_ERR(rdev)); - return PTR_ERR(rdev); - } - rdev->in_sync = 0; /* just to be sure */ - rdev->raid_disk = -1; - err = bind_rdev_to_array(rdev, mddev); - if (err) - export_rdev(rdev); - if (mddev->thread) - md_wakeup_thread(mddev->thread); - return err; - } - - /* otherwise, add_new_disk is only allowed - * for major_version==0 superblocks - */ - if (mddev->major_version != 0) { - printk(KERN_WARNING "md%d: ADD_NEW_DISK not supported\n", - mdidx(mddev)); - return -EINVAL; - } - - if (!(info->state & (1<<MD_DISK_FAULTY))) { - int err; - rdev = md_import_device (dev, -1, 0); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: error, md_import_device() returned %ld\n", - PTR_ERR(rdev)); - return PTR_ERR(rdev); - } - rdev->desc_nr = info->number; - if (info->raid_disk < mddev->raid_disks) - rdev->raid_disk = info->raid_disk; - else - rdev->raid_disk = -1; - - rdev->faulty = 0; - if (rdev->raid_disk < mddev->raid_disks) - rdev->in_sync = (info->state & (1<<MD_DISK_SYNC)); - else - rdev->in_sync = 0; - - err = bind_rdev_to_array(rdev, mddev); - if (err) { - export_rdev(rdev); - return err; - } - - if (!mddev->persistent) { - printk(KERN_INFO "md: nonpersistent superblock ...\n"); - rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - } else - rdev->sb_offset = calc_dev_sboffset(rdev->bdev); - rdev->size = calc_dev_size(rdev, mddev->chunk_size); - - if (!mddev->size || (mddev->size > rdev->size)) - mddev->size = rdev->size; - } - - return 0; -} - -static int hot_generate_error(mddev_t * mddev, dev_t dev) -{ - struct request_queue *q; - mdk_rdev_t *rdev; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to generate %s error in md%d ... 
\n", - partition_name(dev), mdidx(mddev)); - - rdev = find_rdev(mddev, dev); - if (!rdev) { - MD_BUG(); - return -ENXIO; - } - - if (rdev->desc_nr == -1) { - MD_BUG(); - return -EINVAL; - } - if (!rdev->in_sync) - return -ENODEV; - - q = bdev_get_queue(rdev->bdev); - if (!q) { - MD_BUG(); - return -ENODEV; - } - printk(KERN_INFO "md: okay, generating error!\n"); -// q->oneshot_error = 1; // disabled for now - - return 0; -} - -static int hot_remove_disk(mddev_t * mddev, dev_t dev) -{ - mdk_rdev_t *rdev; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to remove %s from md%d ... \n", - partition_name(dev), mdidx(mddev)); - - rdev = find_rdev(mddev, dev); - if (!rdev) - return -ENXIO; - - if (rdev->raid_disk >= 0) - goto busy; - - kick_rdev_from_array(rdev); - md_update_sb(mddev); - - return 0; -busy: - printk(KERN_WARNING "md: cannot remove active disk %s from md%d ... \n", - bdev_partition_name(rdev->bdev), mdidx(mddev)); - return -EBUSY; -} - -static int hot_add_disk(mddev_t * mddev, dev_t dev) -{ - int err; - unsigned int size; - mdk_rdev_t *rdev; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to hot-add %s to md%d ... 
\n", - partition_name(dev), mdidx(mddev)); - - if (mddev->major_version != 0) { - printk(KERN_WARNING "md%d: HOT_ADD may only be used with" - " version-0 superblocks.\n", - mdidx(mddev)); - return -EINVAL; - } - if (!mddev->pers->hot_add_disk) { - printk(KERN_WARNING - "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - - rdev = md_import_device (dev, -1, 0); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: error, md_import_device() returned %ld\n", - PTR_ERR(rdev)); - return -EINVAL; - } - - rdev->sb_offset = calc_dev_sboffset(rdev->bdev); - size = calc_dev_size(rdev, mddev->chunk_size); - rdev->size = size; - - if (size < mddev->size) { - printk(KERN_WARNING - "md%d: disk size %llu blocks < array size %llu\n", - mdidx(mddev), (unsigned long long)size, - (unsigned long long)mddev->size); - err = -ENOSPC; - goto abort_export; - } - - if (rdev->faulty) { - printk(KERN_WARNING - "md: can not hot-add faulty %s disk to md%d!\n", - bdev_partition_name(rdev->bdev), mdidx(mddev)); - err = -EINVAL; - goto abort_export; - } - rdev->in_sync = 0; - rdev->desc_nr = -1; - bind_rdev_to_array(rdev, mddev); - - /* - * The rest should better be atomic, we can have disk failures - * noticed in interrupt contexts ... - */ - - if (rdev->desc_nr == mddev->max_disks) { - printk(KERN_WARNING "md%d: can not hot-add to full array!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unbind_export; - } - - rdev->raid_disk = -1; - - md_update_sb(mddev); - - /* - * Kick recovery, maybe this spare has to be added to the - * array immediately. - */ - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); - - return 0; - -abort_unbind_export: - unbind_rdev_from_array(rdev); - -abort_export: - export_rdev(rdev); - return err; -} - -/* - * set_array_info is used two different ways - * The original usage is when creating a new array. 
- * In this usage, raid_disks is > 0 and it together with - * level, size, not_persistent,layout,chunksize determine the - * shape of the array. - * This will always create an array with a type-0.90.0 superblock. - * The newer usage is when assembling an array. - * In this case raid_disks will be 0, and the major_version field is - * use to determine which style super-blocks are to be found on the devices. - * The minor and patch _version numbers are also kept incase the - * super_block handler wishes to interpret them. - */ -static int set_array_info(mddev_t * mddev, mdu_array_info_t *info) -{ - - if (info->raid_disks == 0) { - /* just setting version number for superblock loading */ - if (info->major_version < 0 || - info->major_version >= sizeof(super_types)/sizeof(super_types[0]) || - super_types[info->major_version].name == NULL) { - /* maybe try to auto-load a module? */ - printk(KERN_INFO - "md: superblock version %d not known\n", - info->major_version); - return -EINVAL; - } - mddev->major_version = info->major_version; - mddev->minor_version = info->minor_version; - mddev->patch_version = info->patch_version; - return 0; - } - mddev->major_version = MD_MAJOR_VERSION; - mddev->minor_version = MD_MINOR_VERSION; - mddev->patch_version = MD_PATCHLEVEL_VERSION; - mddev->ctime = get_seconds(); - - mddev->level = info->level; - mddev->size = info->size; - mddev->raid_disks = info->raid_disks; - /* don't set __minor, it is determined by which /dev/md* was - * openned - */ - if (info->state & (1<<MD_SB_CLEAN)) - mddev->recovery_cp = MaxSector; - else - mddev->recovery_cp = 0; - mddev->persistent = ! 
info->not_persistent; - - mddev->layout = info->layout; - mddev->chunk_size = info->chunk_size; - - mddev->max_disks = MD_SB_DISKS; - - - /* - * Generate a 128 bit UUID - */ - get_random_bytes(mddev->uuid, 16); - - return 0; -} - -static int set_disk_faulty(mddev_t *mddev, dev_t dev) -{ - mdk_rdev_t *rdev; - - rdev = find_rdev(mddev, dev); - if (!rdev) - return 0; - - md_error(mddev, rdev); - return 1; -} - -static int md_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg) -{ - unsigned int minor; - int err = 0; - struct hd_geometry *loc = (struct hd_geometry *) arg; - mddev_t *mddev = NULL; - kdev_t dev; - - if (!capable(CAP_SYS_ADMIN)) - return -EACCES; - - dev = inode->i_rdev; - minor = minor(dev); - if (minor >= MAX_MD_DEVS) { - MD_BUG(); - return -EINVAL; - } - - /* - * Commands dealing with the RAID driver but not any - * particular array: - */ - switch (cmd) - { - case RAID_VERSION: - err = get_version((void *)arg); - goto done; - - case PRINT_RAID_DEBUG: - err = 0; - md_print_devices(); - goto done; - -#ifndef MODULE - case RAID_AUTORUN: - err = 0; - autostart_arrays(); - goto done; -#endif - default:; - } - - /* - * Commands creating/starting a new array: - */ - - mddev = inode->i_bdev->bd_inode->u.generic_ip; - - if (!mddev) { - BUG(); - goto abort; - } - - - if (cmd == START_ARRAY) { - /* START_ARRAY doesn't need to lock the array as autostart_array - * does the locking, and it could even be a different array - */ - err = autostart_array(arg); - if (err) { - printk(KERN_WARNING "md: autostart %s failed!\n", - partition_name(arg)); - goto abort; - } - goto done; - } - - err = mddev_lock(mddev); - if (err) { - printk(KERN_INFO - "md: ioctl lock interrupted, reason %d, cmd %d\n", - err, cmd); - goto abort; - } - - switch (cmd) - { - case SET_ARRAY_INFO: - - if (!list_empty(&mddev->disks)) { - printk(KERN_WARNING - "md: array md%d already has disks!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unlock; - } - if 
(mddev->raid_disks) { - printk(KERN_WARNING - "md: array md%d already initialised!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unlock; - } - { - mdu_array_info_t info; - if (!arg) - memset(&info, 0, sizeof(info)); - else if (copy_from_user(&info, (void*)arg, sizeof(info))) { - err = -EFAULT; - goto abort_unlock; - } - err = set_array_info(mddev, &info); - if (err) { - printk(KERN_WARNING "md: couldn't set" - " array info. %d\n", err); - goto abort_unlock; - } - } - goto done_unlock; - - default:; - } - - /* - * Commands querying/configuring an existing array: - */ - /* if we are not initialised yet, only ADD_NEW_DISK or STOP_ARRAY is allowed */ - if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY) { - err = -ENODEV; - goto abort_unlock; - } - - /* - * Commands even a read-only array can execute: - */ - switch (cmd) - { - case GET_ARRAY_INFO: - err = get_array_info(mddev, (void *)arg); - goto done_unlock; - - case GET_DISK_INFO: - err = get_disk_info(mddev, (void *)arg); - goto done_unlock; - - case RESTART_ARRAY_RW: - err = restart_array(mddev); - goto done_unlock; - - case STOP_ARRAY: - err = do_md_stop (mddev, 0); - goto done_unlock; - - case STOP_ARRAY_RO: - err = do_md_stop (mddev, 1); - goto done_unlock; - - /* - * We have a problem here : there is no easy way to give a CHS - * virtual geometry. We currently pretend that we have a 2 heads - * 4 sectors (with a BIG number of cylinders...). This drives - * dosfs just mad... 
;-) - */ - case HDIO_GETGEO: - if (!loc) { - err = -EINVAL; - goto abort_unlock; - } - err = put_user (2, (char *) &loc->heads); - if (err) - goto abort_unlock; - err = put_user (4, (char *) &loc->sectors); - if (err) - goto abort_unlock; - err = put_user(get_capacity(disks[mdidx(mddev)])/8, - (short *) &loc->cylinders); - if (err) - goto abort_unlock; - err = put_user (get_start_sect(inode->i_bdev), - (long *) &loc->start); - goto done_unlock; - } - - /* - * The remaining ioctls are changing the state of the - * superblock, so we do not allow read-only arrays - * here: - */ - if (mddev->ro) { - err = -EROFS; - goto abort_unlock; - } - - switch (cmd) - { - case ADD_NEW_DISK: - { - mdu_disk_info_t info; - if (copy_from_user(&info, (void*)arg, sizeof(info))) - err = -EFAULT; - else - err = add_new_disk(mddev, &info); - goto done_unlock; - } - case HOT_GENERATE_ERROR: - err = hot_generate_error(mddev, arg); - goto done_unlock; - case HOT_REMOVE_DISK: - err = hot_remove_disk(mddev, arg); - goto done_unlock; - - case HOT_ADD_DISK: - err = hot_add_disk(mddev, arg); - goto done_unlock; - - case SET_DISK_FAULTY: - err = set_disk_faulty(mddev, arg); - goto done_unlock; - - case RUN_ARRAY: - { - err = do_md_run (mddev); - /* - * we have to clean up the mess if - * the array cannot be run for some - * reason ... - * ->pers will not be set, to superblock will - * not be updated. - */ - if (err) - do_md_stop (mddev, 0); - goto done_unlock; - } - - default: - if (_IOC_TYPE(cmd) == MD_MAJOR) - printk(KERN_WARNING "md: %s(pid %d) used" - " obsolete MD ioctl, upgrade your" - " software to use new ictls.\n", - current->comm, current->pid); - err = -EINVAL; - goto abort_unlock; - } - -done_unlock: -abort_unlock: - mddev_unlock(mddev); - - return err; -done: - if (err) - MD_BUG(); -abort: - return err; -} - -static int md_open(struct inode *inode, struct file *file) -{ - /* - * Succeed if we can find or allocate a mddev structure. 
- */ - mddev_t *mddev = mddev_find(minor(inode->i_rdev)); - int err = -ENOMEM; - - if (!mddev) - goto out; - - if ((err = mddev_lock(mddev))) - goto put; - - err = 0; - mddev_unlock(mddev); - inode->i_bdev->bd_inode->u.generic_ip = mddev_get(mddev); - put: - mddev_put(mddev); - out: - return err; -} - -static int md_release(struct inode *inode, struct file * file) -{ - mddev_t *mddev = inode->i_bdev->bd_inode->u.generic_ip; - - if (!mddev) - BUG(); - mddev_put(mddev); - - return 0; -} - -static struct block_device_operations md_fops = -{ - .owner = THIS_MODULE, - .open = md_open, - .release = md_release, - .ioctl = md_ioctl, -}; - -int md_thread(void * arg) -{ - mdk_thread_t *thread = arg; - - lock_kernel(); - - /* - * Detach thread - */ - - daemonize(thread->name, mdidx(thread->mddev)); - - current->exit_signal = SIGCHLD; - allow_signal(SIGKILL); - thread->tsk = current; - - /* - * md_thread is a 'system-thread', it's priority should be very - * high. We avoid resource deadlocks individually in each - * raid personality. (RAID5 does preallocation) We also use RR and - * the very same RT priority as kswapd, thus we will never get - * into a priority inversion deadlock. - * - * we definitely have to have equal or higher priority than - * bdflush, otherwise bdflush will deadlock if there are too - * many dirty RAID5 blocks. 
- */ - unlock_kernel(); - - complete(thread->event); - while (thread->run) { - void (*run)(mddev_t *); - - wait_event_interruptible(thread->wqueue, - test_bit(THREAD_WAKEUP, &thread->flags)); - if (current->flags & PF_FREEZE) - refrigerator(PF_IOTHREAD); - - clear_bit(THREAD_WAKEUP, &thread->flags); - - run = thread->run; - if (run) { - run(thread->mddev); - blk_run_queues(); - } - if (signal_pending(current)) - flush_signals(current); - } - complete(thread->event); - return 0; -} - -void md_wakeup_thread(mdk_thread_t *thread) -{ - if (thread) { - dprintk("md: waking up MD thread %p.\n", thread); - set_bit(THREAD_WAKEUP, &thread->flags); - wake_up(&thread->wqueue); - } -} - -mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev, - const char *name) -{ - mdk_thread_t *thread; - int ret; - struct completion event; - - thread = (mdk_thread_t *) kmalloc - (sizeof(mdk_thread_t), GFP_KERNEL); - if (!thread) - return NULL; - - memset(thread, 0, sizeof(mdk_thread_t)); - init_waitqueue_head(&thread->wqueue); - - init_completion(&event); - thread->event = &event; - thread->run = run; - thread->mddev = mddev; - thread->name = name; - ret = kernel_thread(md_thread, thread, 0); - if (ret < 0) { - kfree(thread); - return NULL; - } - wait_for_completion(&event); - return thread; -} - -void md_interrupt_thread(mdk_thread_t *thread) -{ - if (!thread->tsk) { - MD_BUG(); - return; - } - dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid); - send_sig(SIGKILL, thread->tsk, 1); -} - -void md_unregister_thread(mdk_thread_t *thread) -{ - struct completion event; - - init_completion(&event); - - thread->event = &event; - thread->run = NULL; - thread->name = NULL; - md_interrupt_thread(thread); - wait_for_completion(&event); - kfree(thread); -} - -void md_error(mddev_t *mddev, mdk_rdev_t *rdev) -{ - dprintk("md_error dev:(%d:%d), rdev:(%d:%d), (caller: %p,%p,%p,%p).\n", - MD_MAJOR,mdidx(mddev), - MAJOR(rdev->bdev->bd_dev), MINOR(rdev->bdev->bd_dev), - 
__builtin_return_address(0),__builtin_return_address(1), - __builtin_return_address(2),__builtin_return_address(3)); - - if (!mddev) { - MD_BUG(); - return; - } - - if (!rdev || rdev->faulty) - return; - if (!mddev->pers->error_handler) - return; - mddev->pers->error_handler(mddev,rdev); - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); -} - -/* seq_file implementation /proc/mdstat */ - -static void status_unused(struct seq_file *seq) -{ - int i = 0; - mdk_rdev_t *rdev; - struct list_head *tmp; - - seq_printf(seq, "unused devices: "); - - ITERATE_RDEV_PENDING(rdev,tmp) { - i++; - seq_printf(seq, "%s ", - bdev_partition_name(rdev->bdev)); - } - if (!i) - seq_printf(seq, "<none>"); - - seq_printf(seq, "\n"); -} - - -static void status_resync(struct seq_file *seq, mddev_t * mddev) -{ - unsigned long max_blocks, resync, res, dt, db, rt; - - resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2; - max_blocks = mddev->size; - - /* - * Should not happen. - */ - if (!max_blocks) { - MD_BUG(); - return; - } - res = (resync/1024)*1000/(max_blocks/1024 + 1); - { - int i, x = res/50, y = 20-x; - seq_printf(seq, "["); - for (i = 0; i < x; i++) - seq_printf(seq, "="); - seq_printf(seq, ">"); - for (i = 0; i < y; i++) - seq_printf(seq, "."); - seq_printf(seq, "] "); - } - seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)", - (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? - "resync" : "recovery"), - res/10, res % 10, resync, max_blocks); - - /* - * We do not want to overflow, so the order of operands and - * the * 100 / 100 trick are important. We do a +1 to be - * safe against division by zero. We only estimate anyway. 
- * - * dt: time from mark until now - * db: blocks written from mark until now - * rt: remaining time - */ - dt = ((jiffies - mddev->resync_mark) / HZ); - if (!dt) dt++; - db = resync - (mddev->resync_mark_cnt/2); - rt = (dt * ((max_blocks-resync) / (db/100+1)))/100; - - seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6); - - seq_printf(seq, " speed=%ldK/sec", db/dt); -} - -static void *md_seq_start(struct seq_file *seq, loff_t *pos) -{ - struct list_head *tmp; - loff_t l = *pos; - mddev_t *mddev; - - if (l > 0x10000) - return NULL; - if (!l--) - /* header */ - return (void*)1; - - spin_lock(&all_mddevs_lock); - list_for_each(tmp,&all_mddevs) - if (!l--) { - mddev = list_entry(tmp, mddev_t, all_mddevs); - mddev_get(mddev); - spin_unlock(&all_mddevs_lock); - return mddev; - } - spin_unlock(&all_mddevs_lock); - return (void*)2;/* tail */ -} - -static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) -{ - struct list_head *tmp; - mddev_t *next_mddev, *mddev = v; - - ++*pos; - if (v == (void*)2) - return NULL; - - spin_lock(&all_mddevs_lock); - if (v == (void*)1) - tmp = all_mddevs.next; - else - tmp = mddev->all_mddevs.next; - if (tmp != &all_mddevs) - next_mddev = mddev_get(list_entry(tmp,mddev_t,all_mddevs)); - else { - next_mddev = (void*)2; - *pos = 0x10000; - } - spin_unlock(&all_mddevs_lock); - - if (v != (void*)1) - mddev_put(mddev); - return next_mddev; - -} - -static void md_seq_stop(struct seq_file *seq, void *v) -{ - mddev_t *mddev = v; - - if (mddev && v != (void*)1 && v != (void*)2) - mddev_put(mddev); -} - -static int md_seq_show(struct seq_file *seq, void *v) -{ - mddev_t *mddev = v; - sector_t size; - struct list_head *tmp2; - mdk_rdev_t *rdev; - int i; - - if (v == (void*)1) { - seq_printf(seq, "Personalities : "); - spin_lock(&pers_lock); - for (i = 0; i < MAX_PERSONALITY; i++) - if (pers[i]) - seq_printf(seq, "[%s] ", pers[i]->name); - - spin_unlock(&pers_lock); - seq_printf(seq, "\n"); - return 0; - } - if (v == (void*)2) { - 
status_unused(seq); - return 0; - } - - if (mddev_lock(mddev)!=0) - return -EINTR; - if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { - seq_printf(seq, "md%d : %sactive", mdidx(mddev), - mddev->pers ? "" : "in"); - if (mddev->pers) { - if (mddev->ro) - seq_printf(seq, " (read-only)"); - seq_printf(seq, " %s", mddev->pers->name); - } - - size = 0; - ITERATE_RDEV(mddev,rdev,tmp2) { - seq_printf(seq, " %s[%d]", - bdev_partition_name(rdev->bdev), rdev->desc_nr); - if (rdev->faulty) { - seq_printf(seq, "(F)"); - continue; - } - size += rdev->size; - } - - if (!list_empty(&mddev->disks)) { - if (mddev->pers) - seq_printf(seq, "\n %llu blocks", - (unsigned long long)mddev->array_size); - else - seq_printf(seq, "\n %llu blocks", - (unsigned long long)size); - } - - if (mddev->pers) { - mddev->pers->status (seq, mddev); - seq_printf(seq, "\n "); - if (mddev->curr_resync > 2) - status_resync (seq, mddev); - else if (mddev->curr_resync == 1 || mddev->curr_resync == 2) - seq_printf(seq, " resync=DELAYED"); - } - - seq_printf(seq, "\n"); - } - mddev_unlock(mddev); - - return 0; -} - -static struct seq_operations md_seq_ops = { - .start = md_seq_start, - .next = md_seq_next, - .stop = md_seq_stop, - .show = md_seq_show, -}; - -static int md_seq_open(struct inode *inode, struct file *file) -{ - int error; - - error = seq_open(file, &md_seq_ops); - return error; -} - -static struct file_operations md_seq_fops = { - .open = md_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - -int register_md_personality(int pnum, mdk_personality_t *p) -{ - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - spin_lock(&pers_lock); - if (pers[pnum]) { - spin_unlock(&pers_lock); - MD_BUG(); - return -EBUSY; - } - - pers[pnum] = p; - printk(KERN_INFO "md: %s personality registered as nr %d\n", p->name, pnum); - spin_unlock(&pers_lock); - return 0; -} - -int unregister_md_personality(int pnum) -{ - if (pnum >= MAX_PERSONALITY) { - 
MD_BUG(); - return -EINVAL; - } - - printk(KERN_INFO "md: %s personality unregistered\n", pers[pnum]->name); - spin_lock(&pers_lock); - pers[pnum] = NULL; - spin_unlock(&pers_lock); - return 0; -} - -void md_sync_acct(mdk_rdev_t *rdev, unsigned long nr_sectors) -{ - rdev->bdev->bd_contains->bd_disk->sync_io += nr_sectors; -} - -static int is_mddev_idle(mddev_t *mddev) -{ - mdk_rdev_t * rdev; - struct list_head *tmp; - int idle; - unsigned long curr_events; - - idle = 1; - ITERATE_RDEV(mddev,rdev,tmp) { - struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; - curr_events = disk_stat_read(disk, read_sectors) + - disk_stat_read(disk, write_sectors) - - disk->sync_io; - if ((curr_events - rdev->last_events) > 32) { - rdev->last_events = curr_events; - idle = 0; - } - } - return idle; -} - -void md_done_sync(mddev_t *mddev, int blocks, int ok) -{ - /* another "blocks" (512byte) blocks have been synced */ - atomic_sub(blocks, &mddev->recovery_active); - wake_up(&mddev->recovery_wait); - if (!ok) { - set_bit(MD_RECOVERY_ERR, &mddev->recovery); - md_wakeup_thread(mddev->thread); - // stop recovery, signal do_sync .... 
- } -} - - -void md_write_start(mddev_t *mddev) -{ - if (!atomic_read(&mddev->writes_pending)) { - mddev_lock_uninterruptible(mddev); - if (mddev->in_sync) { - mddev->in_sync = 0; - del_timer(&mddev->safemode_timer); - md_update_sb(mddev); - } - atomic_inc(&mddev->writes_pending); - mddev_unlock(mddev); - } else - atomic_inc(&mddev->writes_pending); -} - -void md_write_end(mddev_t *mddev) -{ - if (atomic_dec_and_test(&mddev->writes_pending)) { - if (mddev->safemode == 2) - md_wakeup_thread(mddev->thread); - else - mod_timer(&mddev->safemode_timer, jiffies + mddev->safemode_delay); - } -} - -static inline void md_enter_safemode(mddev_t *mddev) -{ - mddev_lock_uninterruptible(mddev); - if (mddev->safemode && !atomic_read(&mddev->writes_pending) && - !mddev->in_sync && mddev->recovery_cp == MaxSector) { - mddev->in_sync = 1; - md_update_sb(mddev); - } - mddev_unlock(mddev); - - if (mddev->safemode == 1) - mddev->safemode = 0; -} - -void md_handle_safemode(mddev_t *mddev) -{ - if (signal_pending(current)) { - printk(KERN_INFO "md: md%d in immediate safe mode\n", - mdidx(mddev)); - mddev->safemode = 2; - flush_signals(current); - } - if (mddev->safemode) - md_enter_safemode(mddev); -} - - -DECLARE_WAIT_QUEUE_HEAD(resync_wait); - -#define SYNC_MARKS 10 -#define SYNC_MARK_STEP (3*HZ) -static void md_do_sync(mddev_t *mddev) -{ - mddev_t *mddev2; - unsigned int max_sectors, currspeed = 0, - j, window; - unsigned long mark[SYNC_MARKS]; - unsigned long mark_cnt[SYNC_MARKS]; - int last_mark,m; - struct list_head *tmp; - unsigned long last_check; - - /* just incase thread restarts... */ - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) - return; - - /* we overload curr_resync somewhat here. 
- * 0 == not engaged in resync at all - * 2 == checking that there is no conflict with another sync - * 1 == like 2, but have yielded to allow conflicting resync to - * commense - * other == active in resync - this many blocks - */ - do { - mddev->curr_resync = 2; - - ITERATE_MDDEV(mddev2,tmp) { - if (mddev2 == mddev) - continue; - if (mddev2->curr_resync && - match_mddev_units(mddev,mddev2)) { - printk(KERN_INFO "md: delaying resync of md%d" - " until md%d has finished resync (they" - " share one or more physical units)\n", - mdidx(mddev), mdidx(mddev2)); - if (mddev < mddev2) {/* arbitrarily yield */ - mddev->curr_resync = 1; - wake_up(&resync_wait); - } - if (wait_event_interruptible(resync_wait, - mddev2->curr_resync < mddev->curr_resync)) { - flush_signals(current); - mddev_put(mddev2); - goto skip; - } - } - if (mddev->curr_resync == 1) { - mddev_put(mddev2); - break; - } - } - } while (mddev->curr_resync < 2); - - max_sectors = mddev->size << 1; - - printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev)); - printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:" - " %d KB/sec/disc.\n", sysctl_speed_limit_min); - printk(KERN_INFO "md: using maximum available idle IO bandwith " - "(but not more than %d KB/sec) for reconstruction.\n", - sysctl_speed_limit_max); - - is_mddev_idle(mddev); /* this also initializes IO event counters */ - if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) - j = mddev->recovery_cp; - else - j = 0; - for (m = 0; m < SYNC_MARKS; m++) { - mark[m] = jiffies; - mark_cnt[m] = j; - } - last_mark = 0; - mddev->resync_mark = mark[last_mark]; - mddev->resync_mark_cnt = mark_cnt[last_mark]; - - /* - * Tune reconstruction: - */ - window = 32*(PAGE_SIZE/512); - printk(KERN_INFO "md: using %dk window, over a total of %d blocks.\n", - window/2,max_sectors/2); - - atomic_set(&mddev->recovery_active, 0); - init_waitqueue_head(&mddev->recovery_wait); - last_check = 0; - - if (j) - printk(KERN_INFO - "md: resuming recovery of md%d from 
checkpoint.\n", - mdidx(mddev)); - - while (j < max_sectors) { - int sectors; - - sectors = mddev->pers->sync_request(mddev, j, currspeed < sysctl_speed_limit_min); - if (sectors < 0) { - set_bit(MD_RECOVERY_ERR, &mddev->recovery); - goto out; - } - atomic_add(sectors, &mddev->recovery_active); - j += sectors; - if (j>1) mddev->curr_resync = j; - - if (last_check + window > j) - continue; - - last_check = j; - - if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) || - test_bit(MD_RECOVERY_ERR, &mddev->recovery)) - break; - - blk_run_queues(); - - repeat: - if (jiffies >= mark[last_mark] + SYNC_MARK_STEP ) { - /* step marks */ - int next = (last_mark+1) % SYNC_MARKS; - - mddev->resync_mark = mark[next]; - mddev->resync_mark_cnt = mark_cnt[next]; - mark[next] = jiffies; - mark_cnt[next] = j - atomic_read(&mddev->recovery_active); - last_mark = next; - } - - - if (signal_pending(current)) { - /* - * got a signal, exit. - */ - printk(KERN_INFO - "md: md_do_sync() got signal ... exiting\n"); - flush_signals(current); - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - goto out; - } - - /* - * this loop exits only if either when we are slower than - * the 'hard' speed limit, or the system was IO-idle for - * a jiffy. - * the system might be non-idle CPU-wise, but we only care - * about not overloading the IO subsystem. 
(things like an - * e2fsck being done on the RAID array should execute fast) - */ - cond_resched(); - - currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1; - - if (currspeed > sysctl_speed_limit_min) { - if ((currspeed > sysctl_speed_limit_max) || - !is_mddev_idle(mddev)) { - current->state = TASK_INTERRUPTIBLE; - schedule_timeout(HZ/4); - goto repeat; - } - } - } - printk(KERN_INFO "md: md%d: sync done.\n",mdidx(mddev)); - /* - * this also signals 'finished resyncing' to md_stop - */ - out: - wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); - - /* tell personality that we are finished */ - mddev->pers->sync_request(mddev, max_sectors, 1); - - if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) && - mddev->curr_resync > 2 && - mddev->curr_resync > mddev->recovery_cp) { - if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { - printk(KERN_INFO - "md: checkpointing recovery of md%d.\n", - mdidx(mddev)); - mddev->recovery_cp = mddev->curr_resync; - } else - mddev->recovery_cp = MaxSector; - } - - if (mddev->safemode) - md_enter_safemode(mddev); - skip: - mddev->curr_resync = 0; - set_bit(MD_RECOVERY_DONE, &mddev->recovery); - md_wakeup_thread(mddev->thread); -} - - -/* - * This routine is regularly called by all per-raid-array threads to - * deal with generic issues like resync and super-block update. - * Raid personalities that don't have a thread (linear/raid0) do not - * need this as they never do any recovery or update the superblock. - * - * It does not do any resync itself, but rather "forks" off other threads - * to do that as needed. - * When it is determined that resync is needed, we set MD_RECOVERY_RUNNING in - * "->recovery" and create a thread at ->sync_thread. - * When the thread finishes it sets MD_RECOVERY_DONE (and might set MD_RECOVERY_ERR) - * and wakeups up this thread which will reap the thread and finish up. - * This thread also removes any faulty devices (with nr_pending == 0). 
- * - * The overall approach is: - * 1/ if the superblock needs updating, update it. - * 2/ If a recovery thread is running, don't do anything else. - * 3/ If recovery has finished, clean up, possibly marking spares active. - * 4/ If there are any faulty devices, remove them. - * 5/ If array is degraded, try to add spares devices - * 6/ If array has spares or is not in-sync, start a resync thread. - */ -void md_check_recovery(mddev_t *mddev) -{ - mdk_rdev_t *rdev; - struct list_head *rtmp; - - - dprintk(KERN_INFO "md: recovery thread got woken up ...\n"); - - if (mddev->ro) - return; - if ( ! ( - mddev->sb_dirty || - test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) || - test_bit(MD_RECOVERY_DONE, &mddev->recovery) - )) - return; - if (mddev_trylock(mddev)==0) { - int spares =0; - if (mddev->sb_dirty) - md_update_sb(mddev); - if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && - !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) - /* resync/recovery still happening */ - goto unlock; - if (mddev->sync_thread) { - /* resync has finished, collect result */ - md_unregister_thread(mddev->sync_thread); - mddev->sync_thread = NULL; - if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery)) { - /* success...*/ - /* activate any spares */ - mddev->pers->spare_active(mddev); - } - md_update_sb(mddev); - mddev->recovery = 0; - wake_up(&resync_wait); - goto unlock; - } - if (mddev->recovery) { - /* that's odd.. */ - mddev->recovery = 0; - wake_up(&resync_wait); - } - - /* no recovery is running. 
- * remove any failed drives, then - * add spares if possible - */ - ITERATE_RDEV(mddev,rdev,rtmp) { - if (rdev->raid_disk >= 0 && - rdev->faulty && - atomic_read(&rdev->nr_pending)==0) { - mddev->pers->hot_remove_disk(mddev, rdev->raid_disk); - rdev->raid_disk = -1; - } - if (!rdev->faulty && rdev->raid_disk >= 0 && !rdev->in_sync) - spares++; - } - if (mddev->degraded) { - ITERATE_RDEV(mddev,rdev,rtmp) - if (rdev->raid_disk < 0 - && !rdev->faulty) { - if (mddev->pers->hot_add_disk(mddev,rdev)) - spares++; - else - break; - } - } - - if (!spares && (mddev->recovery_cp == MaxSector )) { - /* nothing we can do ... */ - goto unlock; - } - if (mddev->pers->sync_request) { - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - if (!spares) - set_bit(MD_RECOVERY_SYNC, &mddev->recovery); - mddev->sync_thread = md_register_thread(md_do_sync, - mddev, - "md%d_resync"); - if (!mddev->sync_thread) { - printk(KERN_ERR "md%d: could not start resync" - " thread...\n", - mdidx(mddev)); - /* leave the spares where they are, it shouldn't hurt */ - mddev->recovery = 0; - } else { - md_wakeup_thread(mddev->sync_thread); - } - } - unlock: - mddev_unlock(mddev); - } -} - -int md_notify_reboot(struct notifier_block *this, - unsigned long code, void *x) -{ - struct list_head *tmp; - mddev_t *mddev; - - if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) { - - printk(KERN_INFO "md: stopping all md devices.\n"); - - ITERATE_MDDEV(mddev,tmp) - if (mddev_trylock(mddev)==0) - do_md_stop (mddev, 1); - /* - * certain more exotic SCSI devices are known to be - * volatile wrt too early system reboots. While the - * right place to handle this issue is the given - * driver, we do want to have a safe RAID driver ... 
- */ - mdelay(1000*1); - } - return NOTIFY_DONE; -} - -struct notifier_block md_notifier = { - .notifier_call = md_notify_reboot, - .next = NULL, - .priority = INT_MAX, /* before any real devices */ -}; - -static void md_geninit(void) -{ - struct proc_dir_entry *p; - - dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t)); - -#ifdef CONFIG_PROC_FS - p = create_proc_entry("mdstat", S_IRUGO, NULL); - if (p) - p->proc_fops = &md_seq_fops; -#endif -} - -int __init md_init(void) -{ - int minor; - - printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d," - " MD_SB_DISKS=%d\n", - MD_MAJOR_VERSION, MD_MINOR_VERSION, - MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS); - - if (register_blkdev(MAJOR_NR, "md")) - return -1; - - devfs_mk_dir("md"); - blk_register_region(MKDEV(MAJOR_NR, 0), MAX_MD_DEVS, THIS_MODULE, - md_probe, NULL, NULL); - for (minor=0; minor < MAX_MD_DEVS; ++minor) { - char name[16]; - sprintf(name, "md/%d", minor); - devfs_register(NULL, name, DEVFS_FL_DEFAULT, MAJOR_NR, minor, - S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL); - } - - register_reboot_notifier(&md_notifier); - raid_table_header = register_sysctl_table(raid_root_table, 1); - - md_geninit(); - return (0); -} - - -#ifndef MODULE - -/* - * Searches all registered partitions for autorun RAID arrays - * at boot time. 
- */ -static dev_t detected_devices[128]; -static int dev_cnt; - -void md_autodetect_dev(dev_t dev) -{ - if (dev_cnt >= 0 && dev_cnt < 127) - detected_devices[dev_cnt++] = dev; -} - - -static void autostart_arrays(void) -{ - mdk_rdev_t *rdev; - int i; - - printk(KERN_INFO "md: Autodetecting RAID arrays.\n"); - - for (i = 0; i < dev_cnt; i++) { - dev_t dev = detected_devices[i]; - - rdev = md_import_device(dev,0, 0); - if (IS_ERR(rdev)) { - printk(KERN_ALERT "md: could not import %s!\n", - partition_name(dev)); - continue; - } - if (rdev->faulty) { - MD_BUG(); - continue; - } - list_add(&rdev->same_set, &pending_raid_disks); - } - dev_cnt = 0; - - autorun_devices(); -} - -#endif - -static __exit void md_exit(void) -{ - int i; - blk_unregister_region(MKDEV(MAJOR_NR,0), MAX_MD_DEVS); - for (i=0; i < MAX_MD_DEVS; i++) - devfs_remove("md/%d", i); - devfs_remove("md"); - - unregister_blkdev(MAJOR_NR,"md"); - unregister_reboot_notifier(&md_notifier); - unregister_sysctl_table(raid_table_header); -#ifdef CONFIG_PROC_FS - remove_proc_entry("mdstat", NULL); -#endif - for (i = 0; i < MAX_MD_DEVS; i++) { - struct gendisk *disk = disks[i]; - mddev_t *mddev; - if (!disks[i]) - continue; - mddev = disk->private_data; - del_gendisk(disk); - put_disk(disk); - mddev_put(mddev); - } -} - -module_init(md_init) -module_exit(md_exit) - -EXPORT_SYMBOL(register_md_personality); -EXPORT_SYMBOL(unregister_md_personality); -EXPORT_SYMBOL(md_error); -EXPORT_SYMBOL(md_sync_acct); -EXPORT_SYMBOL(md_done_sync); -EXPORT_SYMBOL(md_write_start); -EXPORT_SYMBOL(md_write_end); -EXPORT_SYMBOL(md_handle_safemode); -EXPORT_SYMBOL(md_register_thread); -EXPORT_SYMBOL(md_unregister_thread); -EXPORT_SYMBOL(md_wakeup_thread); -EXPORT_SYMBOL(md_print_devices); -EXPORT_SYMBOL(md_interrupt_thread); -EXPORT_SYMBOL(md_check_recovery); -MODULE_LICENSE("GPL"); ./linux/md/wmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ orig.tmp 2020-03-09 
16:05:10.983493775 +0000 @@ -1436,6 +1436,88 @@ return 1; } +static int device_size_calculation(mddev_t * mddev) +{ + int data_disks = 0; + unsigned int readahead; + struct list_head *tmp; + mdk_rdev_t *rdev; + + /* + * Do device size calculation. Bail out if too small. + * (we have to do this after having validated chunk_size, + * because device size has to be modulo chunk_size) + */ + + ITERATE_RDEV(mddev,rdev,tmp) { + if (rdev->faulty) + continue; + if (rdev->size < mddev->chunk_size / 1024) { + printk(KERN_WARNING + "md: Dev %s smaller than chunk_size:" + " %lluk < %dk\n", + bdev_partition_name(rdev->bdev), + (unsigned long long)rdev->size, + mddev->chunk_size / 1024); + return -EINVAL; + } + } + + switch (mddev->level) { + case LEVEL_MULTIPATH: + data_disks = 1; + break; + case -3: + data_disks = 1; + break; + case -2: + data_disks = 1; + break; + case LEVEL_LINEAR: + zoned_raid_size(mddev); + data_disks = 1; + break; + case 0: + zoned_raid_size(mddev); + data_disks = mddev->raid_disks; + break; + case 1: + data_disks = 1; + break; + case 4: + case 5: + data_disks = mddev->raid_disks-1; + break; + default: + printk(KERN_ERR "md: md%d: unsupported raid level %d\n", + mdidx(mddev), mddev->level); + goto abort; + } + if (!md_size[mdidx(mddev)]) + md_size[mdidx(mddev)] = mddev->size * data_disks; + + readahead = (VM_MAX_READAHEAD * 1024) / PAGE_SIZE; + if (!mddev->level || (mddev->level == 4) || (mddev->level == 5)) { + readahead = (mddev->chunk_size>>PAGE_SHIFT) * 4 * data_disks; + if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2) + readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2; + } else { + // (no multipath branch - it uses the default setting) + if (mddev->level == -3) + readahead = 0; + } + + printk(KERN_INFO "md%d: max total readahead window set to %ldk\n", + mdidx(mddev), readahead*(PAGE_SIZE/1024)); + + printk(KERN_INFO + "md%d: %d data-disks, max readahead per data-disk: %ldk\n", + mdidx(mddev), data_disks, 
readahead/data_disks*(PAGE_SIZE/1024)); + return 0; +abort: + return 1; +} + static struct gendisk *md_probe(dev_t dev, int *part, void *data) { static DECLARE_MUTEX(disks_sem); @@ -1567,6 +1649,9 @@ } #endif + if (device_size_calculation(mddev)) + return -EINVAL; + /* * Drop all container device buffers, from now on * the only valid external interface is through the md rm: cannot remove 'orig.tmp.porig': No such file or directory ./linux/md/replace FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- rediff 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.000605527 +0000 @@ -1,101 +0,0 @@ -@@ -1453,90 +1436,6 @@ - return 1; - } - --#undef OLD_LEVEL -- --static int device_size_calculation(mddev_t * mddev) --{ -- int data_disks = 0; -- unsigned int readahead; -- struct list_head *tmp; -- mdk_rdev_t *rdev; -- -- /* -- * Do device size calculation. Bail out if too small. -- * (we have to do this after having validated chunk_size, -- * because device size has to be modulo chunk_size) -- */ -- -- ITERATE_RDEV(mddev,rdev,tmp) { -- if (rdev->faulty) -- continue; -- if (rdev->size < mddev->chunk_size / 1024) { -- printk(KERN_WARNING -- "md: Dev %s smaller than chunk_size:" -- " %lluk < %dk\n", -- bdev_partition_name(rdev->bdev), -- (unsigned long long)rdev->size, -- mddev->chunk_size / 1024); -- return -EINVAL; -- } -- } -- -- switch (mddev->level) { -- case LEVEL_MULTIPATH: -- data_disks = 1; -- break; -- case -3: -- data_disks = 1; -- break; -- case -2: -- data_disks = 1; -- break; -- case LEVEL_LINEAR: -- zoned_raid_size(mddev); -- data_disks = 1; -- break; -- case 0: -- zoned_raid_size(mddev); -- data_disks = mddev->raid_disks; -- break; -- case 1: -- data_disks = 1; -- break; -- case 4: -- case 5: -- data_disks = mddev->raid_disks-1; -- break; -- default: -- printk(KERN_ERR "md: md%d: unsupported raid level %d\n", -- mdidx(mddev), mddev->level); -- goto abort; -- } -- if (!md_size[mdidx(mddev)]) -- md_size[mdidx(mddev)] = mddev->size * data_disks; 
-- -- readahead = (VM_MAX_READAHEAD * 1024) / PAGE_SIZE; -- if (!mddev->level || (mddev->level == 4) || (mddev->level == 5)) { -- readahead = (mddev->chunk_size>>PAGE_SHIFT) * 4 * data_disks; -- if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2) -- readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2; -- } else { -- // (no multipath branch - it uses the default setting) -- if (mddev->level == -3) -- readahead = 0; -- } -- -- printk(KERN_INFO "md%d: max total readahead window set to %ldk\n", -- mdidx(mddev), readahead*(PAGE_SIZE/1024)); -- -- printk(KERN_INFO -- "md%d: %d data-disks, max readahead per data-disk: %ldk\n", -- mdidx(mddev), data_disks, readahead/data_disks*(PAGE_SIZE/1024)); -- return 0; --abort: -- return 1; --} -- - static struct gendisk *md_probe(dev_t dev, int *part, void *data) - { - static DECLARE_MUTEX(disks_sem); -@@ -1664,9 +1571,6 @@ - } - } - -- if (device_size_calculation(mddev)) -- return -EINVAL; -- - /* - * Drop all container device buffers, from now on - * the only valid external interface is through the md ./linux/md/rediff FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.011248100 +0000 @@ -1,3589 +0,0 @@ -/* - md.c : Multiple Devices driver for Linux - Copyright (C) 1998, 1999, 2000 Ingo Molnar - - completely rewritten, based on the MD driver code from Marc Zyngier - - Changes: - - - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar - - boot support for linear and striped mode by Harald Hoyer - - kerneld support by Boris Tobotras - - kmod support by: Cyrus Durgin - - RAID0 bugfixes: Mark Anthony Lisher - - Devfs support by Richard Gooch - - - lots of fixes and improvements to the RAID1/RAID5 and generic - RAID code (such as request based resynchronization): - - Neil Brown . 
-
-  This program is free software; you can redistribute it and/or modify
-  it under the terms of the GNU General Public License as published by
-  the Free Software Foundation; either version 2, or (at your option)
-  any later version.
-
-  You should have received a copy of the GNU General Public License
-  (for example /usr/src/linux/COPYING); if not, write to the Free
-  Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-*/
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include /* for invalidate_bdev */
-#include
-
-#include
-
-#ifdef CONFIG_KMOD
-#include
-#endif
-
-#define __KERNEL_SYSCALLS__
-#include
-
-#include
-
-#define MAJOR_NR MD_MAJOR
-#define MD_DRIVER
-#define DEVICE_NR(device) (minor(device))
-
-#include
-
-#define DEBUG 0
-#define dprintk(x...) ((void)(DEBUG && printk(x)))
-
-
-#ifndef MODULE
-static void autostart_arrays (void);
-#endif
-
-static mdk_personality_t *pers[MAX_PERSONALITY];
-static spinlock_t pers_lock = SPIN_LOCK_UNLOCKED;
-
-/*
- * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit'
- * is 1000 KB/sec, so the extra system load does not show up that much.
- * Increase it if you want to have more _guaranteed_ speed. Note that
- * the RAID driver will use the maximum available bandwith if the IO
- * subsystem is idle. There is also an 'absolute maximum' reconstruction
- * speed limit - in case reconstruction slows down your system despite
- * idle IO detection.
- *
- * you can change it via /proc/sys/dev/raid/speed_limit_min and _max.
- */
-
-static int sysctl_speed_limit_min = 1000;
-static int sysctl_speed_limit_max = 200000;
-
-static struct ctl_table_header *raid_table_header;
-
-static ctl_table raid_table[] = {
-	{
-		.ctl_name = DEV_RAID_SPEED_LIMIT_MIN,
-		.procname = "speed_limit_min",
-		.data = &sysctl_speed_limit_min,
-		.maxlen = sizeof(int),
-		.mode = 0644,
-		.proc_handler = &proc_dointvec,
-	},
-	{
-		.ctl_name = DEV_RAID_SPEED_LIMIT_MAX,
-		.procname = "speed_limit_max",
-		.data = &sysctl_speed_limit_max,
-		.maxlen = sizeof(int),
-		.mode = 0644,
-		.proc_handler = &proc_dointvec,
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table raid_dir_table[] = {
-	{
-		.ctl_name = DEV_RAID,
-		.procname = "raid",
-		.maxlen = 0,
-		.mode = 0555,
-		.child = raid_table,
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table raid_root_table[] = {
-	{
-		.ctl_name = CTL_DEV,
-		.procname = "dev",
-		.maxlen = 0,
-		.mode = 0555,
-		.child = raid_dir_table,
-	},
-	{ .ctl_name = 0 }
-};
-
-static struct block_device_operations md_fops;
-
-static struct gendisk *disks[MAX_MD_DEVS];
-
-/*
- * Enables to iterate over all existing md arrays
- * all_mddevs_lock protects this list as well as mddev_map.
- */
-static LIST_HEAD(all_mddevs);
-static spinlock_t all_mddevs_lock = SPIN_LOCK_UNLOCKED;
-
-
-/*
- * iterates through all used mddevs in the system.
- * We take care to grab the all_mddevs_lock whenever navigating
- * the list, and to always hold a refcount when unlocked.
- * Any code which breaks out of this loop while own
- * a reference to the current mddev and must mddev_put it.
- */ -#define ITERATE_MDDEV(mddev,tmp) \ - \ - for (({ spin_lock(&all_mddevs_lock); \ - tmp = all_mddevs.next; \ - mddev = NULL;}); \ - ({ if (tmp != &all_mddevs) \ - mddev_get(list_entry(tmp, mddev_t, all_mddevs));\ - spin_unlock(&all_mddevs_lock); \ - if (mddev) mddev_put(mddev); \ - mddev = list_entry(tmp, mddev_t, all_mddevs); \ - tmp != &all_mddevs;}); \ - ({ spin_lock(&all_mddevs_lock); \ - tmp = tmp->next;}) \ - ) - -static mddev_t *mddev_map[MAX_MD_DEVS]; - -static int md_fail_request (request_queue_t *q, struct bio *bio) -{ - bio_io_error(bio, bio->bi_size); - return 0; -} - -static inline mddev_t *mddev_get(mddev_t *mddev) -{ - atomic_inc(&mddev->active); - return mddev; -} - -static void mddev_put(mddev_t *mddev) -{ - if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) - return; - if (!mddev->raid_disks && list_empty(&mddev->disks)) { - list_del(&mddev->all_mddevs); - mddev_map[mdidx(mddev)] = NULL; - kfree(mddev); - MOD_DEC_USE_COUNT; - } - spin_unlock(&all_mddevs_lock); -} - -static mddev_t * mddev_find(int unit) -{ - mddev_t *mddev, *new = NULL; - - retry: - spin_lock(&all_mddevs_lock); - if (mddev_map[unit]) { - mddev = mddev_get(mddev_map[unit]); - spin_unlock(&all_mddevs_lock); - if (new) - kfree(new); - return mddev; - } - if (new) { - mddev_map[unit] = new; - list_add(&new->all_mddevs, &all_mddevs); - spin_unlock(&all_mddevs_lock); - MOD_INC_USE_COUNT; - return new; - } - spin_unlock(&all_mddevs_lock); - - new = (mddev_t *) kmalloc(sizeof(*new), GFP_KERNEL); - if (!new) - return NULL; - - memset(new, 0, sizeof(*new)); - - new->__minor = unit; - init_MUTEX(&new->reconfig_sem); - INIT_LIST_HEAD(&new->disks); - INIT_LIST_HEAD(&new->all_mddevs); - init_timer(&new->safemode_timer); - atomic_set(&new->active, 1); - blk_queue_make_request(&new->queue, md_fail_request); - - goto retry; -} - -static inline int mddev_lock(mddev_t * mddev) -{ - return down_interruptible(&mddev->reconfig_sem); -} - -static inline void 
mddev_lock_uninterruptible(mddev_t * mddev) -{ - down(&mddev->reconfig_sem); -} - -static inline int mddev_trylock(mddev_t * mddev) -{ - return down_trylock(&mddev->reconfig_sem); -} - -static inline void mddev_unlock(mddev_t * mddev) -{ - up(&mddev->reconfig_sem); -} - -mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) -{ - mdk_rdev_t * rdev; - struct list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == nr) - return rdev; - } - return NULL; -} - -static mdk_rdev_t * find_rdev(mddev_t * mddev, dev_t dev) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->bdev->bd_dev == dev) - return rdev; - } - return NULL; -} - -inline static sector_t calc_dev_sboffset(struct block_device *bdev) -{ - sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - return MD_NEW_SIZE_BLOCKS(size); -} - -static sector_t calc_dev_size(mdk_rdev_t *rdev, unsigned chunk_size) -{ - sector_t size; - - size = rdev->sb_offset; - - if (chunk_size) - size &= ~((sector_t)chunk_size/1024 - 1); - return size; -} - -static int alloc_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) - MD_BUG(); - - rdev->sb_page = alloc_page(GFP_KERNEL); - if (!rdev->sb_page) { - printk(KERN_ALERT "md: out of memory.\n"); - return -EINVAL; - } - - return 0; -} - -static void free_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) { - page_cache_release(rdev->sb_page); - rdev->sb_loaded = 0; - rdev->sb_page = NULL; - rdev->sb_offset = 0; - rdev->size = 0; - } -} - - -static int bi_complete(struct bio *bio, unsigned int bytes_done, int error) -{ - if (bio->bi_size) - return 1; - - complete((struct completion*)bio->bi_private); - return 0; -} - -static int sync_page_io(struct block_device *bdev, sector_t sector, int size, - struct page *page, int rw) -{ - struct bio bio; - struct bio_vec vec; - struct completion event; - - bio_init(&bio); - bio.bi_io_vec = &vec; - vec.bv_page = page; - vec.bv_len = size; - vec.bv_offset = 0; - bio.bi_vcnt = 1; - 
bio.bi_idx = 0; - bio.bi_size = size; - bio.bi_bdev = bdev; - bio.bi_sector = sector; - init_completion(&event); - bio.bi_private = &event; - bio.bi_end_io = bi_complete; - submit_bio(rw, &bio); - blk_run_queues(); - wait_for_completion(&event); - - return test_bit(BIO_UPTODATE, &bio.bi_flags); -} - -static int read_disk_sb(mdk_rdev_t * rdev) -{ - - if (!rdev->sb_page) { - MD_BUG(); - return -EINVAL; - } - if (rdev->sb_loaded) - return 0; - - - if (!sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, READ)) - goto fail; - rdev->sb_loaded = 1; - return 0; - -fail: - printk(KERN_ERR "md: disabled device %s, could not read superblock.\n", - bdev_partition_name(rdev->bdev)); - return -EINVAL; -} - -static int uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - if ( (sb1->set_uuid0 == sb2->set_uuid0) && - (sb1->set_uuid1 == sb2->set_uuid1) && - (sb1->set_uuid2 == sb2->set_uuid2) && - (sb1->set_uuid3 == sb2->set_uuid3)) - - return 1; - - return 0; -} - - -static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - int ret; - mdp_super_t *tmp1, *tmp2; - - tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); - tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); - - if (!tmp1 || !tmp2) { - ret = 0; - printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n"); - goto abort; - } - - *tmp1 = *sb1; - *tmp2 = *sb2; - - /* - * nr_disks is not constant - */ - tmp1->nr_disks = 0; - tmp2->nr_disks = 0; - - if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4)) - ret = 0; - else - ret = 1; - -abort: - if (tmp1) - kfree(tmp1); - if (tmp2) - kfree(tmp2); - - return ret; -} - -static unsigned int calc_sb_csum(mdp_super_t * sb) -{ - unsigned int disk_csum, csum; - - disk_csum = sb->sb_csum; - sb->sb_csum = 0; - csum = csum_partial((void *)sb, MD_SB_BYTES, 0); - sb->sb_csum = disk_csum; - return csum; -} - -/* - * Handle superblock details. - * We want to be able to handle multiple superblock formats - * so we have a common interface to them all, and an array of - * different handlers. 
- * We rely on user-space to write the initial superblock, and support - * reading and updating of superblocks. - * Interface methods are: - * int load_super(mdk_rdev_t *dev, mdk_rdev_t *refdev, int minor_version) - * loads and validates a superblock on dev. - * if refdev != NULL, compare superblocks on both devices - * Return: - * 0 - dev has a superblock that is compatible with refdev - * 1 - dev has a superblock that is compatible and newer than refdev - * so dev should be used as the refdev in future - * -EINVAL superblock incompatible or invalid - * -othererror e.g. -EIO - * - * int validate_super(mddev_t *mddev, mdk_rdev_t *dev) - * Verify that dev is acceptable into mddev. - * The first time, mddev->raid_disks will be 0, and data from - * dev should be merged in. Subsequent calls check that dev - * is new enough. Return 0 or -EINVAL - * - * void sync_super(mddev_t *mddev, mdk_rdev_t *dev) - * Update the superblock for rdev with data in mddev - * This does not write to disc. - * - */ - -struct super_type { - char *name; - struct module *owner; - int (*load_super)(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version); - int (*validate_super)(mddev_t *mddev, mdk_rdev_t *rdev); - void (*sync_super)(mddev_t *mddev, mdk_rdev_t *rdev); -}; - -/* - * load_super for 0.90.0 - */ -static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) -{ - mdp_super_t *sb; - int ret; - sector_t sb_offset; - - /* - * Calculate the position of the superblock, - * it's at the end of the disk. - * - * It also happens to be a multiple of 4Kb. 
- */ - sb_offset = calc_dev_sboffset(rdev->bdev); - rdev->sb_offset = sb_offset; - - ret = read_disk_sb(rdev); - if (ret) return ret; - - ret = -EINVAL; - - sb = (mdp_super_t*)page_address(rdev->sb_page); - - if (sb->md_magic != MD_SB_MAGIC) { - printk(KERN_ERR "md: invalid raid superblock magic on %s\n", - bdev_partition_name(rdev->bdev)); - goto abort; - } - - if (sb->major_version != 0 || - sb->minor_version != 90) { - printk(KERN_WARNING "Bad version number %d.%d on %s\n", - sb->major_version, sb->minor_version, - bdev_partition_name(rdev->bdev)); - goto abort; - } - - if (sb->md_minor >= MAX_MD_DEVS) { - printk(KERN_ERR "md: %s: invalid raid minor (%x)\n", - bdev_partition_name(rdev->bdev), sb->md_minor); - goto abort; - } - if (sb->raid_disks <= 0) - goto abort; - - if (calc_sb_csum(sb) != sb->sb_csum) { - printk(KERN_WARNING "md: invalid superblock checksum on %s\n", - bdev_partition_name(rdev->bdev)); - goto abort; - } - - rdev->preferred_minor = sb->md_minor; - rdev->data_offset = 0; - - if (sb->level == MULTIPATH) - rdev->desc_nr = -1; - else - rdev->desc_nr = sb->this_disk.number; - - if (refdev == 0) - ret = 1; - else { - __u64 ev1, ev2; - mdp_super_t *refsb = (mdp_super_t*)page_address(refdev->sb_page); - if (!uuid_equal(refsb, sb)) { - printk(KERN_WARNING "md: %s has different UUID to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - goto abort; - } - if (!sb_equal(refsb, sb)) { - printk(KERN_WARNING "md: %s has same UUID" - " but different superblock to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - goto abort; - } - ev1 = md_event(sb); - ev2 = md_event(refsb); - if (ev1 > ev2) - ret = 1; - else - ret = 0; - } - rdev->size = calc_dev_size(rdev, sb->chunk_size); - - abort: - return ret; -} - -/* - * validate_super for 0.90.0 - */ -static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev) -{ - mdp_disk_t *desc; - mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page); - - if 
(mddev->raid_disks == 0) {
-		mddev->major_version = 0;
-		mddev->minor_version = sb->minor_version;
-		mddev->patch_version = sb->patch_version;
-		mddev->persistent = ! sb->not_persistent;
-		mddev->chunk_size = sb->chunk_size;
-		mddev->ctime = sb->ctime;
-		mddev->utime = sb->utime;
-		mddev->level = sb->level;
-		mddev->layout = sb->layout;
-		mddev->raid_disks = sb->raid_disks;
-		mddev->size = sb->size;
-		mddev->events = md_event(sb);
-
-		if (sb->state & (1<<MD_SB_CLEAN))
-			mddev->recovery_cp = MaxSector;
-		else {
-			if (sb->events_hi == sb->cp_events_hi &&
-			    sb->events_lo == sb->cp_events_lo) {
-				mddev->recovery_cp = sb->recovery_cp;
-			} else
-				mddev->recovery_cp = 0;
-		}
-
-		memcpy(mddev->uuid+0, &sb->set_uuid0, 4);
-		memcpy(mddev->uuid+4, &sb->set_uuid1, 4);
-		memcpy(mddev->uuid+8, &sb->set_uuid2, 4);
-		memcpy(mddev->uuid+12,&sb->set_uuid3, 4);
-
-		mddev->max_disks = MD_SB_DISKS;
-	} else {
-		__u64 ev1;
-		ev1 = md_event(sb);
-		++ev1;
-		if (ev1 < mddev->events)
-			return -EINVAL;
-	}
-	if (mddev->level != LEVEL_MULTIPATH) {
-		rdev->raid_disk = -1;
-		rdev->in_sync = rdev->faulty = 0;
-		desc = sb->disks + rdev->desc_nr;
-
-		if (desc->state & (1<<MD_DISK_FAULTY))
-			rdev->faulty = 1;
-		else if (desc->state & (1<<MD_DISK_SYNC) &&
-			 desc->raid_disk < mddev->raid_disks) {
-			rdev->in_sync = 1;
-			rdev->raid_disk = desc->raid_disk;
-		}
-	}
-	return 0;
-}
-
-/*
- * sync_super for 0.90.0
- */
-static void super_90_sync(mddev_t *mddev, mdk_rdev_t *rdev)
-{
-	mdp_super_t *sb;
-	struct list_head *tmp;
-	mdk_rdev_t *rdev2;
-	int next_spare = mddev->raid_disks;
-
-	/* make rdev->sb match mddev data..
-	 *
-	 * 1/ zero out disks
-	 * 2/ Add info for each disk, keeping track of highest desc_nr
-	 * 3/ any empty disks < highest become removed
-	 *
-	 * disks[0] gets initialised to REMOVED because
-	 * we cannot be sure from other fields if it has
-	 * been initialised or not.
- */
-	int highest = 0;
-	int i;
-	int active=0, working=0,failed=0,spare=0,nr_disks=0;
-
-	sb = (mdp_super_t*)page_address(rdev->sb_page);
-
-	memset(sb, 0, sizeof(*sb));
-
-	sb->md_magic = MD_SB_MAGIC;
-	sb->major_version = mddev->major_version;
-	sb->minor_version = mddev->minor_version;
-	sb->patch_version = mddev->patch_version;
-	sb->gvalid_words = 0; /* ignored */
-	memcpy(&sb->set_uuid0, mddev->uuid+0, 4);
-	memcpy(&sb->set_uuid1, mddev->uuid+4, 4);
-	memcpy(&sb->set_uuid2, mddev->uuid+8, 4);
-	memcpy(&sb->set_uuid3, mddev->uuid+12,4);
-
-	sb->ctime = mddev->ctime;
-	sb->level = mddev->level;
-	sb->size = mddev->size;
-	sb->raid_disks = mddev->raid_disks;
-	sb->md_minor = mddev->__minor;
-	sb->not_persistent = !mddev->persistent;
-	sb->utime = mddev->utime;
-	sb->state = 0;
-	sb->events_hi = (mddev->events>>32);
-	sb->events_lo = (u32)mddev->events;
-
-	if (mddev->in_sync)
-	{
-		sb->recovery_cp = mddev->recovery_cp;
-		sb->cp_events_hi = (mddev->events>>32);
-		sb->cp_events_lo = (u32)mddev->events;
-		if (mddev->recovery_cp == MaxSector)
-			sb->state = (1<< MD_SB_CLEAN);
-	} else
-		sb->recovery_cp = 0;
-
-	sb->layout = mddev->layout;
-	sb->chunk_size = mddev->chunk_size;
-
-	sb->disks[0].state = (1<<MD_DISK_REMOVED);
-	ITERATE_RDEV(mddev,rdev2,tmp) {
-		mdp_disk_t *d;
-		if (rdev2->raid_disk >= 0 && rdev2->in_sync && !rdev2->faulty)
-			rdev2->desc_nr = rdev2->raid_disk;
-		else
-			rdev2->desc_nr = next_spare++;
-		d = &sb->disks[rdev2->desc_nr];
-		nr_disks++;
-		d->number = rdev2->desc_nr;
-		d->major = MAJOR(rdev2->bdev->bd_dev);
-		d->minor = MINOR(rdev2->bdev->bd_dev);
-		if (rdev2->raid_disk >= 0 && rdev->in_sync && !rdev2->faulty)
-			d->raid_disk = rdev2->raid_disk;
-		else
-			d->raid_disk = rdev2->desc_nr; /* compatibility */
-		if (rdev2->faulty) {
-			d->state = (1<<MD_DISK_FAULTY);
-			failed++;
-		} else if (rdev2->in_sync) {
-			d->state = (1<<MD_DISK_ACTIVE);
-			d->state |= (1<<MD_DISK_SYNC);
-			active++;
-			working++;
-		} else {
-			d->state = 0;
-			spare++;
-			working++;
-		}
-		if (rdev2->desc_nr > highest)
-			highest = rdev2->desc_nr;
-	}
-
-	/* now set the "removed" bit on any non-trailing holes */
-	for (i=0; i<highest; i++) {
-		mdp_disk_t *d = &sb->disks[i];
-		if (d->state == 0 && d->number == 0) {
-			d->number = i;
-			d->raid_disk = i;
-			d->state = (1<<MD_DISK_REMOVED);
-		}
-	}
-	sb->nr_disks = nr_disks;
-	sb->active_disks = active;
-	sb->working_disks = working;
-	sb->failed_disks = failed;
-	sb->spare_disks = spare;
-
-	sb->this_disk = sb->disks[rdev->desc_nr];
-	sb->sb_csum = calc_sb_csum(sb);
-}
-
-/*
- * version 1 superblock
- */
-
-static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
-{
-	unsigned int disk_csum, csum;
-	int size = 256 + sb->max_dev*2;
-
-	disk_csum = sb->sb_csum;
-	sb->sb_csum = 0;
-	csum = csum_partial((void *)sb, size, 0);
-	sb->sb_csum = disk_csum;
-	return csum;
-}
-
-static int super_1_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version)
-{
-	struct mdp_superblock_1 *sb;
-	int ret;
-	sector_t sb_offset;
-
-	/*
-	 * Calculate the position of the superblock.
-	 * It is always aligned to a 4K boundary and
-	 * depeding on minor_version, it can be:
-	 * 0: At least 8K, but less than 12K, from end of device
-	 * 1: At start of device
-	 * 2: 4K from start of device.
-	 */
-	switch(minor_version) {
-	case 0:
-		sb_offset = rdev->bdev->bd_inode->i_size >> 9;
-		sb_offset -= 8*2;
-		sb_offset &= ~(4*2);
-		/* convert from sectors to K */
-		sb_offset /= 2;
-		break;
-	case 1:
-		sb_offset = 0;
-		break;
-	case 2:
-		sb_offset = 4;
-		break;
-	default:
-		return -EINVAL;
-	}
-	rdev->sb_offset = sb_offset;
-
-	ret = read_disk_sb(rdev);
-	if (ret) return ret;
-
-
-	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
-
-	if (sb->magic != cpu_to_le32(MD_SB_MAGIC) ||
-	    sb->major_version != cpu_to_le32(1) ||
-	    le32_to_cpu(sb->max_dev) > (4096-256)/2 ||
-	    le64_to_cpu(sb->super_offset) != (rdev->sb_offset<<1) ||
-	    sb->feature_map != 0)
-		return -EINVAL;
-
-	if (calc_sb_1_csum(sb) != sb->sb_csum) {
-		printk("md: invalid superblock checksum on %s\n",
-			bdev_partition_name(rdev->bdev));
-		return -EINVAL;
-	}
-	rdev->preferred_minor = 0xffff;
-	rdev->data_offset = le64_to_cpu(sb->data_offset);
-
-	if (refdev == 0)
-		return 1;
-	else {
-		__u64 ev1, ev2;
-		struct mdp_superblock_1 *refsb =
-
(struct mdp_superblock_1*)page_address(refdev->sb_page); - - if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 || - sb->level != refsb->level || - sb->layout != refsb->layout || - sb->chunksize != refsb->chunksize) { - printk(KERN_WARNING "md: %s has strangely different" - " superblock to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - return -EINVAL; - } - ev1 = le64_to_cpu(sb->events); - ev2 = le64_to_cpu(refsb->events); - - if (ev1 > ev2) - return 1; - } - if (minor_version) - rdev->size = ((rdev->bdev->bd_inode->i_size>>9) - le64_to_cpu(sb->data_offset)) / 2; - else - rdev->size = rdev->sb_offset; - if (rdev->size < le64_to_cpu(sb->data_size)/2) - return -EINVAL; - rdev->size = le64_to_cpu(sb->data_size)/2; - if (le32_to_cpu(sb->chunksize)) - rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1); - return 0; -} - -static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev) -{ - struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); - - if (mddev->raid_disks == 0) { - mddev->major_version = 1; - mddev->minor_version = 0; - mddev->patch_version = 0; - mddev->persistent = 1; - mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9; - mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1); - mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1); - mddev->level = le32_to_cpu(sb->level); - mddev->layout = le32_to_cpu(sb->layout); - mddev->raid_disks = le32_to_cpu(sb->raid_disks); - mddev->size = (u32)le64_to_cpu(sb->size); - mddev->events = le64_to_cpu(sb->events); - - mddev->recovery_cp = le64_to_cpu(sb->resync_offset); - memcpy(mddev->uuid, sb->set_uuid, 16); - - mddev->max_disks = (4096-256)/2; - } else { - __u64 ev1; - ev1 = le64_to_cpu(sb->events); - ++ev1; - if (ev1 < mddev->events) - return -EINVAL; - } - - if (mddev->level != LEVEL_MULTIPATH) { - int role; - rdev->desc_nr = le32_to_cpu(sb->dev_number); - role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); - switch(role) { - case 0xffff: 
/* spare */ - rdev->in_sync = 0; - rdev->faulty = 0; - rdev->raid_disk = -1; - break; - case 0xfffe: /* faulty */ - rdev->in_sync = 0; - rdev->faulty = 1; - rdev->raid_disk = -1; - break; - default: - rdev->in_sync = 1; - rdev->faulty = 0; - rdev->raid_disk = role; - break; - } - } - return 0; -} - -static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev) -{ - struct mdp_superblock_1 *sb; - struct list_head *tmp; - mdk_rdev_t *rdev2; - int max_dev, i; - /* make rdev->sb match mddev and rdev data. */ - - sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); - - sb->feature_map = 0; - sb->pad0 = 0; - memset(sb->pad1, 0, sizeof(sb->pad1)); - memset(sb->pad2, 0, sizeof(sb->pad2)); - memset(sb->pad3, 0, sizeof(sb->pad3)); - - sb->utime = cpu_to_le64((__u64)mddev->utime); - sb->events = cpu_to_le64(mddev->events); - if (mddev->in_sync) - sb->resync_offset = cpu_to_le64(mddev->recovery_cp); - else - sb->resync_offset = cpu_to_le64(0); - - max_dev = 0; - ITERATE_RDEV(mddev,rdev2,tmp) - if (rdev2->desc_nr > max_dev) - max_dev = rdev2->desc_nr; - - sb->max_dev = max_dev; - for (i=0; i<max_dev;i++) - sb->dev_roles[max_dev] = cpu_to_le16(0xfffe); - - ITERATE_RDEV(mddev,rdev2,tmp) { - i = rdev2->desc_nr; - if (rdev2->faulty) - sb->dev_roles[i] = cpu_to_le16(0xfffe); - else if (rdev2->in_sync) - sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk); - else - sb->dev_roles[i] = cpu_to_le16(0xffff); - } - - sb->recovery_offset = cpu_to_le64(0); /* not supported yet */ -} - - -struct super_type super_types[] = { - [0] = { - .name = "0.90.0", - .owner = THIS_MODULE, - .load_super = super_90_load, - .validate_super = super_90_validate, - .sync_super = super_90_sync, - }, - [1] = { - .name = "md-1", - .owner = THIS_MODULE, - .load_super = super_1_load, - .validate_super = super_1_validate, - .sync_super = super_1_sync, - }, -}; - -static mdk_rdev_t * match_dev_unit(mddev_t *mddev, mdk_rdev_t *dev) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) - if 
(rdev->bdev->bd_contains == dev->bdev->bd_contains) - return rdev; - - return NULL; -} - -static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev1,rdev,tmp) - if (match_dev_unit(mddev2, rdev)) - return 1; - - return 0; -} - -static LIST_HEAD(pending_raid_disks); - -static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev) -{ - mdk_rdev_t *same_pdev; - - if (rdev->mddev) { - MD_BUG(); - return -EINVAL; - } - same_pdev = match_dev_unit(mddev, rdev); - if (same_pdev) - printk(KERN_WARNING - "md%d: WARNING: %s appears to be on the same physical" - " disk as %s. True\n protection against single-disk" - " failure might be compromised.\n", - mdidx(mddev), bdev_partition_name(rdev->bdev), - bdev_partition_name(same_pdev->bdev)); - - /* Verify rdev->desc_nr is unique. - * If it is -1, assign a free number, else - * check number is not in use - */ - if (rdev->desc_nr < 0) { - int choice = 0; - if (mddev->pers) choice = mddev->raid_disks; - while (find_rdev_nr(mddev, choice)) - choice++; - rdev->desc_nr = choice; - } else { - if (find_rdev_nr(mddev, rdev->desc_nr)) - return -EBUSY; - } - - list_add(&rdev->same_set, &mddev->disks); - rdev->mddev = mddev; - printk(KERN_INFO "md: bind<%s>\n", bdev_partition_name(rdev->bdev)); - return 0; -} - -static void unbind_rdev_from_array(mdk_rdev_t * rdev) -{ - if (!rdev->mddev) { - MD_BUG(); - return; - } - list_del_init(&rdev->same_set); - printk(KERN_INFO "md: unbind<%s>\n", bdev_partition_name(rdev->bdev)); - rdev->mddev = NULL; -} - -/* - * prevent the device from being mounted, repartitioned or - * otherwise reused by a RAID array (or any other kernel - * subsystem), by opening the device. 
[simply getting an - * inode is not enough, the SCSI module usage code needs - * an explicit open() on the device] - */ -static int lock_rdev(mdk_rdev_t *rdev, dev_t dev) -{ - int err = 0; - struct block_device *bdev; - - bdev = bdget(dev); - if (!bdev) - return -ENOMEM; - err = blkdev_get(bdev, FMODE_READ|FMODE_WRITE, 0, BDEV_RAW); - if (err) - return err; - err = bd_claim(bdev, rdev); - if (err) { - blkdev_put(bdev, BDEV_RAW); - return err; - } - rdev->bdev = bdev; - return err; -} - -static void unlock_rdev(mdk_rdev_t *rdev) -{ - struct block_device *bdev = rdev->bdev; - rdev->bdev = NULL; - if (!bdev) - MD_BUG(); - bd_release(bdev); - blkdev_put(bdev, BDEV_RAW); -} - -void md_autodetect_dev(dev_t dev); - -static void export_rdev(mdk_rdev_t * rdev) -{ - printk(KERN_INFO "md: export_rdev(%s)\n", - bdev_partition_name(rdev->bdev)); - if (rdev->mddev) - MD_BUG(); - free_disk_sb(rdev); - list_del_init(&rdev->same_set); -#ifndef MODULE - md_autodetect_dev(rdev->bdev->bd_dev); -#endif - unlock_rdev(rdev); - kfree(rdev); -} - -static void kick_rdev_from_array(mdk_rdev_t * rdev) -{ - unbind_rdev_from_array(rdev); - export_rdev(rdev); -} - -static void export_array(mddev_t *mddev) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (!rdev->mddev) { - MD_BUG(); - continue; - } - kick_rdev_from_array(rdev); - } - if (!list_empty(&mddev->disks)) - MD_BUG(); - mddev->raid_disks = 0; - mddev->major_version = 0; -} - -static void print_desc(mdp_disk_t *desc) -{ - printk(" DISK<N:%d,%s(%d,%d),R:%d,S:%d>\n", desc->number, - partition_name(MKDEV(desc->major,desc->minor)), - desc->major,desc->minor,desc->raid_disk,desc->state); -} - -static void print_sb(mdp_super_t *sb) -{ - int i; - - printk(KERN_INFO - "md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n", - sb->major_version, sb->minor_version, sb->patch_version, - sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3, - sb->ctime); - printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n", - 
sb->level, sb->size, sb->nr_disks, sb->raid_disks, - sb->md_minor, sb->layout, sb->chunk_size); - printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d" - " FD:%d SD:%d CSUM:%08x E:%08lx\n", - sb->utime, sb->state, sb->active_disks, sb->working_disks, - sb->failed_disks, sb->spare_disks, - sb->sb_csum, (unsigned long)sb->events_lo); - - printk(KERN_INFO); - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - - desc = sb->disks + i; - if (desc->number || desc->major || desc->minor || - desc->raid_disk || (desc->state && (desc->state != 4))) { - printk(" D %2d: ", i); - print_desc(desc); - } - } - printk(KERN_INFO "md: THIS: "); - print_desc(&sb->this_disk); - -} - -static void print_rdev(mdk_rdev_t *rdev) -{ - printk(KERN_INFO "md: rdev %s, SZ:%08llu F:%d S:%d DN:%d ", - bdev_partition_name(rdev->bdev), (unsigned long long)rdev->size, - rdev->faulty, rdev->in_sync, rdev->desc_nr); - if (rdev->sb_loaded) { - printk(KERN_INFO "md: rdev superblock:\n"); - print_sb((mdp_super_t*)page_address(rdev->sb_page)); - } else - printk(KERN_INFO "md: no rdev superblock!\n"); -} - -void md_print_devices(void) -{ - struct list_head *tmp, *tmp2; - mdk_rdev_t *rdev; - mddev_t *mddev; - - printk("\n"); - printk("md: **********************************\n"); - printk("md: * <COMPLETE RAID STATE PRINTOUT> *\n"); - printk("md: **********************************\n"); - ITERATE_MDDEV(mddev,tmp) { - printk("md%d: ", mdidx(mddev)); - - ITERATE_RDEV(mddev,rdev,tmp2) - printk("<%s>", bdev_partition_name(rdev->bdev)); - - ITERATE_RDEV(mddev,rdev,tmp2) - print_rdev(rdev); - } - printk("md: **********************************\n"); - printk("\n"); -} - - -static int write_disk_sb(mdk_rdev_t * rdev) -{ - - if (!rdev->sb_loaded) { - MD_BUG(); - return 1; - } - if (rdev->faulty) { - MD_BUG(); - return 1; - } - - dprintk(KERN_INFO "(write) %s's sb offset: %llu\n", - bdev_partition_name(rdev->bdev), - (unsigned long long)rdev->sb_offset); - - if (sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, WRITE)) - return 
0; - - printk("md: write_disk_sb failed for device %s\n", - bdev_partition_name(rdev->bdev)); - return 1; -} - -static void sync_sbs(mddev_t * mddev) -{ - mdk_rdev_t *rdev; - struct list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - super_types[mddev->major_version]. - sync_super(mddev, rdev); - rdev->sb_loaded = 1; - } -} - -static void md_update_sb(mddev_t * mddev) -{ - int err, count = 100; - struct list_head *tmp; - mdk_rdev_t *rdev; - - mddev->sb_dirty = 0; -repeat: - mddev->utime = get_seconds(); - mddev->events ++; - - if (!mddev->events) { - /* - * oops, this 64-bit counter should never wrap. - * Either we are in around ~1 trillion A.C., assuming - * 1 reboot per second, or we have a bug: - */ - MD_BUG(); - mddev->events --; - } - sync_sbs(mddev); - - /* - * do not write anything to disk if using - * nonpersistent superblocks - */ - if (!mddev->persistent) - return; - - dprintk(KERN_INFO - "md: updating md%d RAID superblock on device (in sync %d)\n", - mdidx(mddev),mddev->in_sync); - - err = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - dprintk(KERN_INFO "md: "); - if (rdev->faulty) - dprintk("(skipping faulty "); - - dprintk("%s ", bdev_partition_name(rdev->bdev)); - if (!rdev->faulty) { - err += write_disk_sb(rdev); - } else - dprintk(")\n"); - if (!err && mddev->level == LEVEL_MULTIPATH) - /* only need to write one superblock... */ - break; - } - if (err) { - if (--count) { - printk(KERN_ERR "md: errors occurred during superblock" - " update, repeating\n"); - goto repeat; - } - printk(KERN_ERR \ - "md: excessive errors occurred during superblock update, exiting\n"); - } -} - -/* - * Import a device. If 'super_format' >= 0, then sanity check the superblock - * - * mark the device faulty if: - * - * - the device is nonexistent (zero size) - * - the device has no valid superblock - * - * a faulty rdev _never_ has rdev->sb set. 
- */ -static mdk_rdev_t *md_import_device(dev_t newdev, int super_format, int super_minor) -{ - int err; - mdk_rdev_t *rdev; - sector_t size; - - rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL); - if (!rdev) { - printk(KERN_ERR "md: could not alloc mem for %s!\n", - partition_name(newdev)); - return ERR_PTR(-ENOMEM); - } - memset(rdev, 0, sizeof(*rdev)); - - if ((err = alloc_disk_sb(rdev))) - goto abort_free; - - err = lock_rdev(rdev, newdev); - if (err) { - printk(KERN_ERR "md: could not lock %s.\n", - partition_name(newdev)); - goto abort_free; - } - rdev->desc_nr = -1; - rdev->faulty = 0; - rdev->in_sync = 0; - rdev->data_offset = 0; - atomic_set(&rdev->nr_pending, 0); - - size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - if (!size) { - printk(KERN_WARNING - "md: %s has zero or unknown size, marking faulty!\n", - bdev_partition_name(rdev->bdev)); - err = -EINVAL; - goto abort_free; - } - - if (super_format >= 0) { - err = super_types[super_format]. - load_super(rdev, NULL, super_minor); - if (err == -EINVAL) { - printk(KERN_WARNING - "md: %s has invalid sb, not importing!\n", - bdev_partition_name(rdev->bdev)); - goto abort_free; - } - if (err < 0) { - printk(KERN_WARNING - "md: could not read %s's sb, not importing!\n", - bdev_partition_name(rdev->bdev)); - goto abort_free; - } - } - INIT_LIST_HEAD(&rdev->same_set); - - return rdev; - -abort_free: - if (rdev->sb_page) { - if (rdev->bdev) - unlock_rdev(rdev); - free_disk_sb(rdev); - } - kfree(rdev); - return ERR_PTR(err); -} - -/* - * Check a full RAID array for plausibility - */ - - -static int analyze_sbs(mddev_t * mddev) -{ - int i; - struct list_head *tmp; - mdk_rdev_t *rdev, *freshest; - - freshest = NULL; - ITERATE_RDEV(mddev,rdev,tmp) - switch (super_types[mddev->major_version]. 
- load_super(rdev, freshest, mddev->minor_version)) { - case 1: - freshest = rdev; - break; - case 0: - break; - default: - printk( KERN_ERR \ - "md: fatal superblock inconsistency in %s" - " -- removing from array\n", - bdev_partition_name(rdev->bdev)); - kick_rdev_from_array(rdev); - } - - - super_types[mddev->major_version]. - validate_super(mddev, freshest); - - i = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev != freshest) - if (super_types[mddev->major_version]. - validate_super(mddev, rdev)) { - printk(KERN_WARNING "md: kicking non-fresh %s" - " from array!\n", - bdev_partition_name(rdev->bdev)); - kick_rdev_from_array(rdev); - continue; - } - if (mddev->level == LEVEL_MULTIPATH) { - rdev->desc_nr = i++; - rdev->raid_disk = rdev->desc_nr; - rdev->in_sync = 1; - } - } - - - /* - * Check if we can support this RAID array - */ - if (mddev->major_version != MD_MAJOR_VERSION || - mddev->minor_version > MD_MINOR_VERSION) { - printk(KERN_ALERT - "md: md%d: unsupported raid array version %d.%d.%d\n", - mdidx(mddev), mddev->major_version, - mddev->minor_version, mddev->patch_version); - goto abort; - } - - if ((mddev->recovery_cp != MaxSector) && ((mddev->level == 1) || - (mddev->level == 4) || (mddev->level == 5))) - printk(KERN_ERR "md: md%d: raid array is not clean" - " -- starting background reconstruction\n", - mdidx(mddev)); - - return 0; -abort: - return 1; -} - -static struct gendisk *md_probe(dev_t dev, int *part, void *data) -{ - static DECLARE_MUTEX(disks_sem); - int unit = MINOR(dev); - mddev_t *mddev = mddev_find(unit); - struct gendisk *disk; - - if (!mddev) - return NULL; - - down(&disks_sem); - if (disks[unit]) { - up(&disks_sem); - mddev_put(mddev); - return NULL; - } - disk = alloc_disk(1); - if (!disk) { - up(&disks_sem); - mddev_put(mddev); - return NULL; - } - disk->major = MD_MAJOR; - disk->first_minor = mdidx(mddev); - sprintf(disk->disk_name, "md%d", mdidx(mddev)); - disk->fops = &md_fops; - disk->private_data = mddev; - disk->queue = 
&mddev->queue; - add_disk(disk); - disks[mdidx(mddev)] = disk; - up(&disks_sem); - return NULL; -} - -void md_wakeup_thread(mdk_thread_t *thread); - -static void md_safemode_timeout(unsigned long data) -{ - mddev_t *mddev = (mddev_t *) data; - - mddev->safemode = 1; - md_wakeup_thread(mddev->thread); -} - - -static int do_md_run(mddev_t * mddev) -{ - int pnum, err; - int chunk_size; - struct list_head *tmp; - mdk_rdev_t *rdev; - struct gendisk *disk; - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return -EINVAL; - } - - if (mddev->pers) - return -EBUSY; - - /* - * Analyze all RAID superblock(s) - */ - if (!mddev->raid_disks && analyze_sbs(mddev)) { - MD_BUG(); - return -EINVAL; - } - - chunk_size = mddev->chunk_size; - pnum = level_to_pers(mddev->level); - - if ((pnum != MULTIPATH) && (pnum != RAID1)) { - if (!chunk_size) { - /* - * 'default chunksize' in the old md code used to - * be PAGE_SIZE, baaad. - * we abort here to be on the safe side. We don't - * want to continue the bad practice. 
- */ - printk(KERN_ERR - "no chunksize specified, see 'man raidtab'\n"); - return -EINVAL; - } - if (chunk_size > MAX_CHUNK_SIZE) { - printk(KERN_ERR "too big chunk_size: %d > %d\n", - chunk_size, MAX_CHUNK_SIZE); - return -EINVAL; - } - /* - * chunk-size has to be a power of 2 and multiples of PAGE_SIZE - */ - if ( (1 << ffz(~chunk_size)) != chunk_size) { - MD_BUG(); - return -EINVAL; - } - if (chunk_size < PAGE_SIZE) { - printk(KERN_ERR "too small chunk_size: %d < %ld\n", - chunk_size, PAGE_SIZE); - return -EINVAL; - } - - /* devices must have minimum size of one chunk */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (rdev->size < chunk_size / 1024) { - printk(KERN_WARNING - "md: Dev %s smaller than chunk_size:" - " %lluk < %dk\n", - bdev_partition_name(rdev->bdev), - (unsigned long long)rdev->size, - chunk_size / 1024); - return -EINVAL; - } - } - } - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - -#ifdef CONFIG_KMOD - if (!pers[pnum]) - { - char module_name[80]; - sprintf (module_name, "md-personality-%d", pnum); - request_module (module_name); - } -#endif - - /* - * Drop all container device buffers, from now on - * the only valid external interface is through the md - * device. 
- * Also find largest hardsector size - */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - sync_blockdev(rdev->bdev); - invalidate_bdev(rdev->bdev, 0); - } - - md_probe(mdidx(mddev), NULL, NULL); - disk = disks[mdidx(mddev)]; - if (!disk) - return -ENOMEM; - - spin_lock(&pers_lock); - if (!pers[pnum] || !try_module_get(pers[pnum]->owner)) { - spin_unlock(&pers_lock); - printk(KERN_ERR "md: personality %d is not loaded!\n", - pnum); - return -EINVAL; - } - - mddev->pers = pers[pnum]; - spin_unlock(&pers_lock); - - blk_queue_make_request(&mddev->queue, mddev->pers->make_request); - printk("%s: setting max_sectors to %d, segment boundary to %d\n", - disk->disk_name, - chunk_size >> 9, - (chunk_size>>1)-1); - blk_queue_max_sectors(&mddev->queue, chunk_size >> 9); - blk_queue_segment_boundary(&mddev->queue, (chunk_size>>1) - 1); - mddev->queue.queuedata = mddev; - - err = mddev->pers->run(mddev); - if (err) { - printk(KERN_ERR "md: pers->run() failed ...\n"); - module_put(mddev->pers->owner); - mddev->pers = NULL; - return -EINVAL; - } - atomic_set(&mddev->writes_pending,0); - mddev->safemode = 0; - mddev->safemode_timer.function = md_safemode_timeout; - mddev->safemode_timer.data = (unsigned long) mddev; - mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */ - mddev->in_sync = 1; - - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); - set_capacity(disk, mddev->array_size<<1); - return 0; -} - -static int restart_array(mddev_t *mddev) -{ - struct gendisk *disk = disks[mdidx(mddev)]; - int err; - - /* - * Complain if it has no devices - */ - err = -ENXIO; - if (list_empty(&mddev->disks)) - goto out; - - if (mddev->pers) { - err = -EBUSY; - if (!mddev->ro) - goto out; - - mddev->safemode = 0; - mddev->ro = 0; - set_disk_ro(disk, 0); - - printk(KERN_INFO "md: md%d switched to read-write mode.\n", - mdidx(mddev)); - /* - * Kick recovery or resync if necessary - */ - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - 
md_wakeup_thread(mddev->thread); - err = 0; - } else { - printk(KERN_ERR "md: md%d has no personality assigned.\n", - mdidx(mddev)); - err = -EINVAL; - } - -out: - return err; -} - -static int do_md_stop(mddev_t * mddev, int ro) -{ - int err = 0; - struct gendisk *disk = disks[mdidx(mddev)]; - - if (atomic_read(&mddev->active)>2) { - printk("md: md%d still in use.\n",mdidx(mddev)); - err = -EBUSY; - goto out; - } - - if (mddev->pers) { - if (mddev->sync_thread) { - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_unregister_thread(mddev->sync_thread); - mddev->sync_thread = NULL; - } - - del_timer_sync(&mddev->safemode_timer); - - invalidate_device(mk_kdev(disk->major, disk->first_minor), 1); - - if (ro) { - err = -ENXIO; - if (mddev->ro) - goto out; - mddev->ro = 1; - } else { - if (mddev->ro) - set_disk_ro(disk, 0); - if (mddev->pers->stop(mddev)) { - err = -EBUSY; - if (mddev->ro) - set_disk_ro(disk, 1); - goto out; - } - module_put(mddev->pers->owner); - mddev->pers = NULL; - if (mddev->ro) - mddev->ro = 0; - } - if (mddev->raid_disks) { - /* mark array as shutdown cleanly */ - mddev->in_sync = 1; - md_update_sb(mddev); - } - if (ro) - set_disk_ro(disk, 1); - } - /* - * Free resources if final stop - */ - if (!ro) { - struct gendisk *disk; - printk(KERN_INFO "md: md%d stopped.\n", mdidx(mddev)); - - export_array(mddev); - - mddev->array_size = 0; - disk = disks[mdidx(mddev)]; - if (disk) - set_capacity(disk, 0); - } else - printk(KERN_INFO "md: md%d switched to read-only mode.\n", - mdidx(mddev)); - err = 0; -out: - return err; -} - -static void autorun_array(mddev_t *mddev) -{ - mdk_rdev_t *rdev; - struct list_head *tmp; - int err; - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return; - } - - printk(KERN_INFO "md: running: "); - - ITERATE_RDEV(mddev,rdev,tmp) { - printk("<%s>", bdev_partition_name(rdev->bdev)); - } - printk("\n"); - - err = do_md_run (mddev); - if (err) { - printk(KERN_WARNING "md :do_md_run() returned %d\n", err); - do_md_stop (mddev, 
0); - } -} - -/* - * lets try to run arrays based on all disks that have arrived - * until now. (those are in pending_raid_disks) - * - * the method: pick the first pending disk, collect all disks with - * the same UUID, remove all from the pending list and put them into - * the 'same_array' list. Then order this list based on superblock - * update time (freshest comes first), kick out 'old' disks and - * compare superblocks. If everything's fine then run it. - * - * If "unit" is allocated, then bump its reference count - */ -static void autorun_devices(void) -{ - struct list_head candidates; - struct list_head *tmp; - mdk_rdev_t *rdev0, *rdev; - mddev_t *mddev; - - printk(KERN_INFO "md: autorun ...\n"); - while (!list_empty(&pending_raid_disks)) { - rdev0 = list_entry(pending_raid_disks.next, - mdk_rdev_t, same_set); - - printk(KERN_INFO "md: considering %s ...\n", - bdev_partition_name(rdev0->bdev)); - INIT_LIST_HEAD(&candidates); - ITERATE_RDEV_PENDING(rdev,tmp) - if (super_90_load(rdev, rdev0, 0) >= 0) { - printk(KERN_INFO "md: adding %s ...\n", - bdev_partition_name(rdev->bdev)); - list_move(&rdev->same_set, &candidates); - } - /* - * now we have a set of devices, with all of them having - * mostly sane superblocks. It's time to allocate the - * mddev. 
- */ - - mddev = mddev_find(rdev0->preferred_minor); - if (!mddev) { - printk(KERN_ERR - "md: cannot allocate memory for md drive.\n"); - break; - } - if (mddev_lock(mddev)) - printk(KERN_WARNING "md: md%d locked, cannot run\n", - mdidx(mddev)); - else if (mddev->raid_disks || mddev->major_version - || !list_empty(&mddev->disks)) { - printk(KERN_WARNING - "md: md%d already running, cannot run %s\n", - mdidx(mddev), bdev_partition_name(rdev0->bdev)); - mddev_unlock(mddev); - } else { - printk(KERN_INFO "md: created md%d\n", mdidx(mddev)); - ITERATE_RDEV_GENERIC(candidates,rdev,tmp) { - list_del_init(&rdev->same_set); - if (bind_rdev_to_array(rdev, mddev)) - export_rdev(rdev); - } - autorun_array(mddev); - mddev_unlock(mddev); - } - /* on success, candidates will be empty, on error - * it won't... - */ - ITERATE_RDEV_GENERIC(candidates,rdev,tmp) - export_rdev(rdev); - mddev_put(mddev); - } - printk(KERN_INFO "md: ... autorun DONE.\n"); -} - -/* - * import RAID devices based on one partition - * if possible, the array gets run as well. 
- */ - -static int autostart_array(dev_t startdev) -{ - int err = -EINVAL, i; - mdp_super_t *sb = NULL; - mdk_rdev_t *start_rdev = NULL, *rdev; - - start_rdev = md_import_device(startdev, 0, 0); - if (IS_ERR(start_rdev)) { - printk(KERN_WARNING "md: could not import %s!\n", - partition_name(startdev)); - return err; - } - - /* NOTE: this can only work for 0.90.0 superblocks */ - sb = (mdp_super_t*)page_address(start_rdev->sb_page); - if (sb->major_version != 0 || - sb->minor_version != 90 ) { - printk(KERN_WARNING "md: can only autostart 0.90.0 arrays\n"); - export_rdev(start_rdev); - return err; - } - - if (start_rdev->faulty) { - printk(KERN_WARNING - "md: can not autostart based on faulty %s!\n", - bdev_partition_name(start_rdev->bdev)); - export_rdev(start_rdev); - return err; - } - list_add(&start_rdev->same_set, &pending_raid_disks); - - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - dev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - if (!dev) - continue; - if (dev == startdev) - continue; - rdev = md_import_device(dev, 0, 0); - if (IS_ERR(rdev)) { - printk(KERN_WARNING "md: could not import %s," - " trying to run array nevertheless.\n", - partition_name(dev)); - continue; - } - list_add(&rdev->same_set, &pending_raid_disks); - } - - /* - * possibly return codes - */ - autorun_devices(); - return 0; - -} - - -static int get_version(void * arg) -{ - mdu_version_t ver; - - ver.major = MD_MAJOR_VERSION; - ver.minor = MD_MINOR_VERSION; - ver.patchlevel = MD_PATCHLEVEL_VERSION; - - if (copy_to_user(arg, &ver, sizeof(ver))) - return -EFAULT; - - return 0; -} - -static int get_array_info(mddev_t * mddev, void * arg) -{ - mdu_array_info_t info; - int nr,working,active,failed,spare; - mdk_rdev_t *rdev; - struct list_head *tmp; - - nr=working=active=failed=spare=0; - ITERATE_RDEV(mddev,rdev,tmp) { - nr++; - if (rdev->faulty) - failed++; - else { - working++; - if (rdev->in_sync) - active++; - else - spare++; - } - } - - 
info.major_version = mddev->major_version; - info.minor_version = mddev->minor_version; - info.patch_version = 1; - info.ctime = mddev->ctime; - info.level = mddev->level; - info.size = mddev->size; - info.nr_disks = nr; - info.raid_disks = mddev->raid_disks; - info.md_minor = mddev->__minor; - info.not_persistent= !mddev->persistent; - - info.utime = mddev->utime; - info.state = 0; - if (mddev->in_sync) - info.state = (1<<MD_SB_CLEAN); - info.active_disks = active; - info.working_disks = working; - info.failed_disks = failed; - info.spare_disks = spare; - - info.layout = mddev->layout; - info.chunk_size = mddev->chunk_size; - - if (copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} - -static int get_disk_info(mddev_t * mddev, void * arg) -{ - mdu_disk_info_t info; - unsigned int nr; - mdk_rdev_t *rdev; - - if (copy_from_user(&info, arg, sizeof(info))) - return -EFAULT; - - nr = info.number; - - rdev = find_rdev_nr(mddev, nr); - if (rdev) { - info.major = MAJOR(rdev->bdev->bd_dev); - info.minor = MINOR(rdev->bdev->bd_dev); - info.raid_disk = rdev->raid_disk; - info.state = 0; - if (rdev->faulty) - info.state |= (1<<MD_DISK_FAULTY); - else if (rdev->in_sync) { - info.state |= (1<<MD_DISK_ACTIVE); - info.state |= (1<<MD_DISK_SYNC); - } - } else { - info.major = info.minor = 0; - info.raid_disk = -1; - info.state = (1<<MD_DISK_REMOVED); - } - - if (copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} - -static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info) -{ - mdk_rdev_t *rdev; - dev_t dev; - dev = MKDEV(info->major,info->minor); - if (!mddev->raid_disks) { - int err; - /* expecting a device which has a superblock */ - rdev = md_import_device(dev, mddev->major_version, mddev->minor_version); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: md_import_device returned %ld\n", - PTR_ERR(rdev)); - return PTR_ERR(rdev); - } - if (!list_empty(&mddev->disks)) { - mdk_rdev_t *rdev0 = list_entry(mddev->disks.next, - mdk_rdev_t, same_set); - int err = super_types[mddev->major_version] - .load_super(rdev, rdev0, mddev->minor_version); - if (err < 0) { - printk(KERN_WARNING - "md: %s has different UUID to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(rdev0->bdev)); - export_rdev(rdev); - return -EINVAL; - } - } - err = bind_rdev_to_array(rdev, mddev); - if (err) - export_rdev(rdev); - return err; - } - - /* - * add_new_disk can be used once the array is assembled - * to add "hot spares". 
They must already have a superblock - * written - */ - if (mddev->pers) { - int err; - if (!mddev->pers->hot_add_disk) { - printk(KERN_WARNING - "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - rdev = md_import_device(dev, mddev->major_version, - mddev->minor_version); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: md_import_device returned %ld\n", - PTR_ERR(rdev)); - return PTR_ERR(rdev); - } - rdev->in_sync = 0; /* just to be sure */ - rdev->raid_disk = -1; - err = bind_rdev_to_array(rdev, mddev); - if (err) - export_rdev(rdev); - if (mddev->thread) - md_wakeup_thread(mddev->thread); - return err; - } - - /* otherwise, add_new_disk is only allowed - * for major_version==0 superblocks - */ - if (mddev->major_version != 0) { - printk(KERN_WARNING "md%d: ADD_NEW_DISK not supported\n", - mdidx(mddev)); - return -EINVAL; - } - - if (!(info->state & (1<<MD_DISK_FAULTY))) { - int err; - rdev = md_import_device (dev, -1, 0); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: error, md_import_device() returned %ld\n", - PTR_ERR(rdev)); - return PTR_ERR(rdev); - } - rdev->desc_nr = info->number; - if (info->raid_disk < mddev->raid_disks) - rdev->raid_disk = info->raid_disk; - else - rdev->raid_disk = -1; - - rdev->faulty = 0; - if (rdev->raid_disk < mddev->raid_disks) - rdev->in_sync = (info->state & (1<<MD_DISK_SYNC)); - else - rdev->in_sync = 0; - - err = bind_rdev_to_array(rdev, mddev); - if (err) { - export_rdev(rdev); - return err; - } - - if (!mddev->persistent) { - printk(KERN_INFO "md: nonpersistent superblock ...\n"); - rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - } else - rdev->sb_offset = calc_dev_sboffset(rdev->bdev); - rdev->size = calc_dev_size(rdev, mddev->chunk_size); - - if (!mddev->size || (mddev->size > rdev->size)) - mddev->size = rdev->size; - } - - return 0; -} - -static int hot_generate_error(mddev_t * mddev, dev_t dev) -{ - struct request_queue *q; - mdk_rdev_t *rdev; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to generate %s error in md%d ... 
\n", - partition_name(dev), mdidx(mddev)); - - rdev = find_rdev(mddev, dev); - if (!rdev) { - MD_BUG(); - return -ENXIO; - } - - if (rdev->desc_nr == -1) { - MD_BUG(); - return -EINVAL; - } - if (!rdev->in_sync) - return -ENODEV; - - q = bdev_get_queue(rdev->bdev); - if (!q) { - MD_BUG(); - return -ENODEV; - } - printk(KERN_INFO "md: okay, generating error!\n"); -// q->oneshot_error = 1; // disabled for now - - return 0; -} - -static int hot_remove_disk(mddev_t * mddev, dev_t dev) -{ - mdk_rdev_t *rdev; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to remove %s from md%d ... \n", - partition_name(dev), mdidx(mddev)); - - rdev = find_rdev(mddev, dev); - if (!rdev) - return -ENXIO; - - if (rdev->raid_disk >= 0) - goto busy; - - kick_rdev_from_array(rdev); - md_update_sb(mddev); - - return 0; -busy: - printk(KERN_WARNING "md: cannot remove active disk %s from md%d ... \n", - bdev_partition_name(rdev->bdev), mdidx(mddev)); - return -EBUSY; -} - -static int hot_add_disk(mddev_t * mddev, dev_t dev) -{ - int err; - unsigned int size; - mdk_rdev_t *rdev; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to hot-add %s to md%d ... 
\n", - partition_name(dev), mdidx(mddev)); - - if (mddev->major_version != 0) { - printk(KERN_WARNING "md%d: HOT_ADD may only be used with" - " version-0 superblocks.\n", - mdidx(mddev)); - return -EINVAL; - } - if (!mddev->pers->hot_add_disk) { - printk(KERN_WARNING - "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - - rdev = md_import_device (dev, -1, 0); - if (IS_ERR(rdev)) { - printk(KERN_WARNING - "md: error, md_import_device() returned %ld\n", - PTR_ERR(rdev)); - return -EINVAL; - } - - rdev->sb_offset = calc_dev_sboffset(rdev->bdev); - size = calc_dev_size(rdev, mddev->chunk_size); - rdev->size = size; - - if (size < mddev->size) { - printk(KERN_WARNING - "md%d: disk size %llu blocks < array size %llu\n", - mdidx(mddev), (unsigned long long)size, - (unsigned long long)mddev->size); - err = -ENOSPC; - goto abort_export; - } - - if (rdev->faulty) { - printk(KERN_WARNING - "md: can not hot-add faulty %s disk to md%d!\n", - bdev_partition_name(rdev->bdev), mdidx(mddev)); - err = -EINVAL; - goto abort_export; - } - rdev->in_sync = 0; - rdev->desc_nr = -1; - bind_rdev_to_array(rdev, mddev); - - /* - * The rest should better be atomic, we can have disk failures - * noticed in interrupt contexts ... - */ - - if (rdev->desc_nr == mddev->max_disks) { - printk(KERN_WARNING "md%d: can not hot-add to full array!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unbind_export; - } - - rdev->raid_disk = -1; - - md_update_sb(mddev); - - /* - * Kick recovery, maybe this spare has to be added to the - * array immediately. - */ - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); - - return 0; - -abort_unbind_export: - unbind_rdev_from_array(rdev); - -abort_export: - export_rdev(rdev); - return err; -} - -/* - * set_array_info is used two different ways - * The original usage is when creating a new array. 
- * In this usage, raid_disks is > 0 and it together with - * level, size, not_persistent,layout,chunksize determine the - * shape of the array. - * This will always create an array with a type-0.90.0 superblock. - * The newer usage is when assembling an array. - * In this case raid_disks will be 0, and the major_version field is - * used to determine which style super-blocks are to be found on the devices. - * The minor and patch _version numbers are also kept in case the - * super_block handler wishes to interpret them. - */ -static int set_array_info(mddev_t * mddev, mdu_array_info_t *info) -{ - - if (info->raid_disks == 0) { - /* just setting version number for superblock loading */ - if (info->major_version < 0 || - info->major_version >= sizeof(super_types)/sizeof(super_types[0]) || - super_types[info->major_version].name == NULL) { - /* maybe try to auto-load a module? */ - printk(KERN_INFO - "md: superblock version %d not known\n", - info->major_version); - return -EINVAL; - } - mddev->major_version = info->major_version; - mddev->minor_version = info->minor_version; - mddev->patch_version = info->patch_version; - return 0; - } - mddev->major_version = MD_MAJOR_VERSION; - mddev->minor_version = MD_MINOR_VERSION; - mddev->patch_version = MD_PATCHLEVEL_VERSION; - mddev->ctime = get_seconds(); - - mddev->level = info->level; - mddev->size = info->size; - mddev->raid_disks = info->raid_disks; - /* don't set __minor, it is determined by which /dev/md* was - * opened - */ - if (info->state & (1<<MD_SB_CLEAN)) - mddev->recovery_cp = MaxSector; - else - mddev->recovery_cp = 0; - mddev->persistent = !
info->not_persistent; - - mddev->layout = info->layout; - mddev->chunk_size = info->chunk_size; - - mddev->max_disks = MD_SB_DISKS; - - - /* - * Generate a 128 bit UUID - */ - get_random_bytes(mddev->uuid, 16); - - return 0; -} - -static int set_disk_faulty(mddev_t *mddev, dev_t dev) -{ - mdk_rdev_t *rdev; - - rdev = find_rdev(mddev, dev); - if (!rdev) - return 0; - - md_error(mddev, rdev); - return 1; -} - -static int md_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg) -{ - unsigned int minor; - int err = 0; - struct hd_geometry *loc = (struct hd_geometry *) arg; - mddev_t *mddev = NULL; - kdev_t dev; - - if (!capable(CAP_SYS_ADMIN)) - return -EACCES; - - dev = inode->i_rdev; - minor = minor(dev); - if (minor >= MAX_MD_DEVS) { - MD_BUG(); - return -EINVAL; - } - - /* - * Commands dealing with the RAID driver but not any - * particular array: - */ - switch (cmd) - { - case RAID_VERSION: - err = get_version((void *)arg); - goto done; - - case PRINT_RAID_DEBUG: - err = 0; - md_print_devices(); - goto done; - -#ifndef MODULE - case RAID_AUTORUN: - err = 0; - autostart_arrays(); - goto done; -#endif - default:; - } - - /* - * Commands creating/starting a new array: - */ - - mddev = inode->i_bdev->bd_inode->u.generic_ip; - - if (!mddev) { - BUG(); - goto abort; - } - - - if (cmd == START_ARRAY) { - /* START_ARRAY doesn't need to lock the array as autostart_array - * does the locking, and it could even be a different array - */ - err = autostart_array(arg); - if (err) { - printk(KERN_WARNING "md: autostart %s failed!\n", - partition_name(arg)); - goto abort; - } - goto done; - } - - err = mddev_lock(mddev); - if (err) { - printk(KERN_INFO - "md: ioctl lock interrupted, reason %d, cmd %d\n", - err, cmd); - goto abort; - } - - switch (cmd) - { - case SET_ARRAY_INFO: - - if (!list_empty(&mddev->disks)) { - printk(KERN_WARNING - "md: array md%d already has disks!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unlock; - } - if 
(mddev->raid_disks) { - printk(KERN_WARNING - "md: array md%d already initialised!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unlock; - } - { - mdu_array_info_t info; - if (!arg) - memset(&info, 0, sizeof(info)); - else if (copy_from_user(&info, (void*)arg, sizeof(info))) { - err = -EFAULT; - goto abort_unlock; - } - err = set_array_info(mddev, &info); - if (err) { - printk(KERN_WARNING "md: couldn't set" - " array info. %d\n", err); - goto abort_unlock; - } - } - goto done_unlock; - - default:; - } - - /* - * Commands querying/configuring an existing array: - */ - /* if we are not initialised yet, only ADD_NEW_DISK or STOP_ARRAY is allowed */ - if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY) { - err = -ENODEV; - goto abort_unlock; - } - - /* - * Commands even a read-only array can execute: - */ - switch (cmd) - { - case GET_ARRAY_INFO: - err = get_array_info(mddev, (void *)arg); - goto done_unlock; - - case GET_DISK_INFO: - err = get_disk_info(mddev, (void *)arg); - goto done_unlock; - - case RESTART_ARRAY_RW: - err = restart_array(mddev); - goto done_unlock; - - case STOP_ARRAY: - err = do_md_stop (mddev, 0); - goto done_unlock; - - case STOP_ARRAY_RO: - err = do_md_stop (mddev, 1); - goto done_unlock; - - /* - * We have a problem here : there is no easy way to give a CHS - * virtual geometry. We currently pretend that we have a 2 heads - * 4 sectors (with a BIG number of cylinders...). This drives - * dosfs just mad... 
;-) - */ - case HDIO_GETGEO: - if (!loc) { - err = -EINVAL; - goto abort_unlock; - } - err = put_user (2, (char *) &loc->heads); - if (err) - goto abort_unlock; - err = put_user (4, (char *) &loc->sectors); - if (err) - goto abort_unlock; - err = put_user(get_capacity(disks[mdidx(mddev)])/8, - (short *) &loc->cylinders); - if (err) - goto abort_unlock; - err = put_user (get_start_sect(inode->i_bdev), - (long *) &loc->start); - goto done_unlock; - } - - /* - * The remaining ioctls are changing the state of the - * superblock, so we do not allow read-only arrays - * here: - */ - if (mddev->ro) { - err = -EROFS; - goto abort_unlock; - } - - switch (cmd) - { - case ADD_NEW_DISK: - { - mdu_disk_info_t info; - if (copy_from_user(&info, (void*)arg, sizeof(info))) - err = -EFAULT; - else - err = add_new_disk(mddev, &info); - goto done_unlock; - } - case HOT_GENERATE_ERROR: - err = hot_generate_error(mddev, arg); - goto done_unlock; - case HOT_REMOVE_DISK: - err = hot_remove_disk(mddev, arg); - goto done_unlock; - - case HOT_ADD_DISK: - err = hot_add_disk(mddev, arg); - goto done_unlock; - - case SET_DISK_FAULTY: - err = set_disk_faulty(mddev, arg); - goto done_unlock; - - case RUN_ARRAY: - { - err = do_md_run (mddev); - /* - * we have to clean up the mess if - * the array cannot be run for some - * reason ... - * ->pers will not be set, to superblock will - * not be updated. - */ - if (err) - do_md_stop (mddev, 0); - goto done_unlock; - } - - default: - if (_IOC_TYPE(cmd) == MD_MAJOR) - printk(KERN_WARNING "md: %s(pid %d) used" - " obsolete MD ioctl, upgrade your" - " software to use new ictls.\n", - current->comm, current->pid); - err = -EINVAL; - goto abort_unlock; - } - -done_unlock: -abort_unlock: - mddev_unlock(mddev); - - return err; -done: - if (err) - MD_BUG(); -abort: - return err; -} - -static int md_open(struct inode *inode, struct file *file) -{ - /* - * Succeed if we can find or allocate a mddev structure. 
- */ - mddev_t *mddev = mddev_find(minor(inode->i_rdev)); - int err = -ENOMEM; - - if (!mddev) - goto out; - - if ((err = mddev_lock(mddev))) - goto put; - - err = 0; - mddev_unlock(mddev); - inode->i_bdev->bd_inode->u.generic_ip = mddev_get(mddev); - put: - mddev_put(mddev); - out: - return err; -} - -static int md_release(struct inode *inode, struct file * file) -{ - mddev_t *mddev = inode->i_bdev->bd_inode->u.generic_ip; - - if (!mddev) - BUG(); - mddev_put(mddev); - - return 0; -} - -static struct block_device_operations md_fops = -{ - .owner = THIS_MODULE, - .open = md_open, - .release = md_release, - .ioctl = md_ioctl, -}; - -int md_thread(void * arg) -{ - mdk_thread_t *thread = arg; - - lock_kernel(); - - /* - * Detach thread - */ - - daemonize(thread->name, mdidx(thread->mddev)); - - current->exit_signal = SIGCHLD; - allow_signal(SIGKILL); - thread->tsk = current; - - /* - * md_thread is a 'system-thread', it's priority should be very - * high. We avoid resource deadlocks individually in each - * raid personality. (RAID5 does preallocation) We also use RR and - * the very same RT priority as kswapd, thus we will never get - * into a priority inversion deadlock. - * - * we definitely have to have equal or higher priority than - * bdflush, otherwise bdflush will deadlock if there are too - * many dirty RAID5 blocks. 
- */ - unlock_kernel(); - - complete(thread->event); - while (thread->run) { - void (*run)(mddev_t *); - - wait_event_interruptible(thread->wqueue, - test_bit(THREAD_WAKEUP, &thread->flags)); - if (current->flags & PF_FREEZE) - refrigerator(PF_IOTHREAD); - - clear_bit(THREAD_WAKEUP, &thread->flags); - - run = thread->run; - if (run) { - run(thread->mddev); - blk_run_queues(); - } - if (signal_pending(current)) - flush_signals(current); - } - complete(thread->event); - return 0; -} - -void md_wakeup_thread(mdk_thread_t *thread) -{ - if (thread) { - dprintk("md: waking up MD thread %p.\n", thread); - set_bit(THREAD_WAKEUP, &thread->flags); - wake_up(&thread->wqueue); - } -} - -mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev, - const char *name) -{ - mdk_thread_t *thread; - int ret; - struct completion event; - - thread = (mdk_thread_t *) kmalloc - (sizeof(mdk_thread_t), GFP_KERNEL); - if (!thread) - return NULL; - - memset(thread, 0, sizeof(mdk_thread_t)); - init_waitqueue_head(&thread->wqueue); - - init_completion(&event); - thread->event = &event; - thread->run = run; - thread->mddev = mddev; - thread->name = name; - ret = kernel_thread(md_thread, thread, 0); - if (ret < 0) { - kfree(thread); - return NULL; - } - wait_for_completion(&event); - return thread; -} - -void md_interrupt_thread(mdk_thread_t *thread) -{ - if (!thread->tsk) { - MD_BUG(); - return; - } - dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid); - send_sig(SIGKILL, thread->tsk, 1); -} - -void md_unregister_thread(mdk_thread_t *thread) -{ - struct completion event; - - init_completion(&event); - - thread->event = &event; - thread->run = NULL; - thread->name = NULL; - md_interrupt_thread(thread); - wait_for_completion(&event); - kfree(thread); -} - -void md_error(mddev_t *mddev, mdk_rdev_t *rdev) -{ - dprintk("md_error dev:(%d:%d), rdev:(%d:%d), (caller: %p,%p,%p,%p).\n", - MD_MAJOR,mdidx(mddev), - MAJOR(rdev->bdev->bd_dev), MINOR(rdev->bdev->bd_dev), - 
__builtin_return_address(0),__builtin_return_address(1), - __builtin_return_address(2),__builtin_return_address(3)); - - if (!mddev) { - MD_BUG(); - return; - } - - if (!rdev || rdev->faulty) - return; - if (!mddev->pers->error_handler) - return; - mddev->pers->error_handler(mddev,rdev); - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); -} - -/* seq_file implementation /proc/mdstat */ - -static void status_unused(struct seq_file *seq) -{ - int i = 0; - mdk_rdev_t *rdev; - struct list_head *tmp; - - seq_printf(seq, "unused devices: "); - - ITERATE_RDEV_PENDING(rdev,tmp) { - i++; - seq_printf(seq, "%s ", - bdev_partition_name(rdev->bdev)); - } - if (!i) - seq_printf(seq, "<none>"); - - seq_printf(seq, "\n"); -} - - -static void status_resync(struct seq_file *seq, mddev_t * mddev) -{ - unsigned long max_blocks, resync, res, dt, db, rt; - - resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2; - max_blocks = mddev->size; - - /* - * Should not happen. - */ - if (!max_blocks) { - MD_BUG(); - return; - } - res = (resync/1024)*1000/(max_blocks/1024 + 1); - { - int i, x = res/50, y = 20-x; - seq_printf(seq, "["); - for (i = 0; i < x; i++) - seq_printf(seq, "="); - seq_printf(seq, ">"); - for (i = 0; i < y; i++) - seq_printf(seq, "."); - seq_printf(seq, "] "); - } - seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)", - (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? - "resync" : "recovery"), - res/10, res % 10, resync, max_blocks); - - /* - * We do not want to overflow, so the order of operands and - * the * 100 / 100 trick are important. We do a +1 to be - * safe against division by zero. We only estimate anyway. 
- * - * dt: time from mark until now - * db: blocks written from mark until now - * rt: remaining time - */ - dt = ((jiffies - mddev->resync_mark) / HZ); - if (!dt) dt++; - db = resync - (mddev->resync_mark_cnt/2); - rt = (dt * ((max_blocks-resync) / (db/100+1)))/100; - - seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6); - - seq_printf(seq, " speed=%ldK/sec", db/dt); -} - -static void *md_seq_start(struct seq_file *seq, loff_t *pos) -{ - struct list_head *tmp; - loff_t l = *pos; - mddev_t *mddev; - - if (l > 0x10000) - return NULL; - if (!l--) - /* header */ - return (void*)1; - - spin_lock(&all_mddevs_lock); - list_for_each(tmp,&all_mddevs) - if (!l--) { - mddev = list_entry(tmp, mddev_t, all_mddevs); - mddev_get(mddev); - spin_unlock(&all_mddevs_lock); - return mddev; - } - spin_unlock(&all_mddevs_lock); - return (void*)2;/* tail */ -} - -static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) -{ - struct list_head *tmp; - mddev_t *next_mddev, *mddev = v; - - ++*pos; - if (v == (void*)2) - return NULL; - - spin_lock(&all_mddevs_lock); - if (v == (void*)1) - tmp = all_mddevs.next; - else - tmp = mddev->all_mddevs.next; - if (tmp != &all_mddevs) - next_mddev = mddev_get(list_entry(tmp,mddev_t,all_mddevs)); - else { - next_mddev = (void*)2; - *pos = 0x10000; - } - spin_unlock(&all_mddevs_lock); - - if (v != (void*)1) - mddev_put(mddev); - return next_mddev; - -} - -static void md_seq_stop(struct seq_file *seq, void *v) -{ - mddev_t *mddev = v; - - if (mddev && v != (void*)1 && v != (void*)2) - mddev_put(mddev); -} - -static int md_seq_show(struct seq_file *seq, void *v) -{ - mddev_t *mddev = v; - sector_t size; - struct list_head *tmp2; - mdk_rdev_t *rdev; - int i; - - if (v == (void*)1) { - seq_printf(seq, "Personalities : "); - spin_lock(&pers_lock); - for (i = 0; i < MAX_PERSONALITY; i++) - if (pers[i]) - seq_printf(seq, "[%s] ", pers[i]->name); - - spin_unlock(&pers_lock); - seq_printf(seq, "\n"); - return 0; - } - if (v == (void*)2) { - 
status_unused(seq); - return 0; - } - - if (mddev_lock(mddev)!=0) - return -EINTR; - if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { - seq_printf(seq, "md%d : %sactive", mdidx(mddev), - mddev->pers ? "" : "in"); - if (mddev->pers) { - if (mddev->ro) - seq_printf(seq, " (read-only)"); - seq_printf(seq, " %s", mddev->pers->name); - } - - size = 0; - ITERATE_RDEV(mddev,rdev,tmp2) { - seq_printf(seq, " %s[%d]", - bdev_partition_name(rdev->bdev), rdev->desc_nr); - if (rdev->faulty) { - seq_printf(seq, "(F)"); - continue; - } - size += rdev->size; - } - - if (!list_empty(&mddev->disks)) { - if (mddev->pers) - seq_printf(seq, "\n %llu blocks", - (unsigned long long)mddev->array_size); - else - seq_printf(seq, "\n %llu blocks", - (unsigned long long)size); - } - - if (mddev->pers) { - mddev->pers->status (seq, mddev); - seq_printf(seq, "\n "); - if (mddev->curr_resync > 2) - status_resync (seq, mddev); - else if (mddev->curr_resync == 1 || mddev->curr_resync == 2) - seq_printf(seq, " resync=DELAYED"); - } - - seq_printf(seq, "\n"); - } - mddev_unlock(mddev); - - return 0; -} - -static struct seq_operations md_seq_ops = { - .start = md_seq_start, - .next = md_seq_next, - .stop = md_seq_stop, - .show = md_seq_show, -}; - -static int md_seq_open(struct inode *inode, struct file *file) -{ - int error; - - error = seq_open(file, &md_seq_ops); - return error; -} - -static struct file_operations md_seq_fops = { - .open = md_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - -int register_md_personality(int pnum, mdk_personality_t *p) -{ - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - spin_lock(&pers_lock); - if (pers[pnum]) { - spin_unlock(&pers_lock); - MD_BUG(); - return -EBUSY; - } - - pers[pnum] = p; - printk(KERN_INFO "md: %s personality registered as nr %d\n", p->name, pnum); - spin_unlock(&pers_lock); - return 0; -} - -int unregister_md_personality(int pnum) -{ - if (pnum >= MAX_PERSONALITY) { - 
MD_BUG(); - return -EINVAL; - } - - printk(KERN_INFO "md: %s personality unregistered\n", pers[pnum]->name); - spin_lock(&pers_lock); - pers[pnum] = NULL; - spin_unlock(&pers_lock); - return 0; -} - -void md_sync_acct(mdk_rdev_t *rdev, unsigned long nr_sectors) -{ - rdev->bdev->bd_contains->bd_disk->sync_io += nr_sectors; -} - -static int is_mddev_idle(mddev_t *mddev) -{ - mdk_rdev_t * rdev; - struct list_head *tmp; - int idle; - unsigned long curr_events; - - idle = 1; - ITERATE_RDEV(mddev,rdev,tmp) { - struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; - curr_events = disk_stat_read(disk, read_sectors) + - disk_stat_read(disk, write_sectors) - - disk->sync_io; - if ((curr_events - rdev->last_events) > 32) { - rdev->last_events = curr_events; - idle = 0; - } - } - return idle; -} - -void md_done_sync(mddev_t *mddev, int blocks, int ok) -{ - /* another "blocks" (512byte) blocks have been synced */ - atomic_sub(blocks, &mddev->recovery_active); - wake_up(&mddev->recovery_wait); - if (!ok) { - set_bit(MD_RECOVERY_ERR, &mddev->recovery); - md_wakeup_thread(mddev->thread); - // stop recovery, signal do_sync .... 
- } -} - - -void md_write_start(mddev_t *mddev) -{ - if (!atomic_read(&mddev->writes_pending)) { - mddev_lock_uninterruptible(mddev); - if (mddev->in_sync) { - mddev->in_sync = 0; - del_timer(&mddev->safemode_timer); - md_update_sb(mddev); - } - atomic_inc(&mddev->writes_pending); - mddev_unlock(mddev); - } else - atomic_inc(&mddev->writes_pending); -} - -void md_write_end(mddev_t *mddev) -{ - if (atomic_dec_and_test(&mddev->writes_pending)) { - if (mddev->safemode == 2) - md_wakeup_thread(mddev->thread); - else - mod_timer(&mddev->safemode_timer, jiffies + mddev->safemode_delay); - } -} - -static inline void md_enter_safemode(mddev_t *mddev) -{ - mddev_lock_uninterruptible(mddev); - if (mddev->safemode && !atomic_read(&mddev->writes_pending) && - !mddev->in_sync && mddev->recovery_cp == MaxSector) { - mddev->in_sync = 1; - md_update_sb(mddev); - } - mddev_unlock(mddev); - - if (mddev->safemode == 1) - mddev->safemode = 0; -} - -void md_handle_safemode(mddev_t *mddev) -{ - if (signal_pending(current)) { - printk(KERN_INFO "md: md%d in immediate safe mode\n", - mdidx(mddev)); - mddev->safemode = 2; - flush_signals(current); - } - if (mddev->safemode) - md_enter_safemode(mddev); -} - - -DECLARE_WAIT_QUEUE_HEAD(resync_wait); - -#define SYNC_MARKS 10 -#define SYNC_MARK_STEP (3*HZ) -static void md_do_sync(mddev_t *mddev) -{ - mddev_t *mddev2; - unsigned int max_sectors, currspeed = 0, - j, window; - unsigned long mark[SYNC_MARKS]; - unsigned long mark_cnt[SYNC_MARKS]; - int last_mark,m; - struct list_head *tmp; - unsigned long last_check; - - /* just incase thread restarts... */ - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) - return; - - /* we overload curr_resync somewhat here. 
- * 0 == not engaged in resync at all - * 2 == checking that there is no conflict with another sync - * 1 == like 2, but have yielded to allow conflicting resync to - * commense - * other == active in resync - this many blocks - */ - do { - mddev->curr_resync = 2; - - ITERATE_MDDEV(mddev2,tmp) { - if (mddev2 == mddev) - continue; - if (mddev2->curr_resync && - match_mddev_units(mddev,mddev2)) { - printk(KERN_INFO "md: delaying resync of md%d" - " until md%d has finished resync (they" - " share one or more physical units)\n", - mdidx(mddev), mdidx(mddev2)); - if (mddev < mddev2) {/* arbitrarily yield */ - mddev->curr_resync = 1; - wake_up(&resync_wait); - } - if (wait_event_interruptible(resync_wait, - mddev2->curr_resync < mddev->curr_resync)) { - flush_signals(current); - mddev_put(mddev2); - goto skip; - } - } - if (mddev->curr_resync == 1) { - mddev_put(mddev2); - break; - } - } - } while (mddev->curr_resync < 2); - - max_sectors = mddev->size << 1; - - printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev)); - printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:" - " %d KB/sec/disc.\n", sysctl_speed_limit_min); - printk(KERN_INFO "md: using maximum available idle IO bandwith " - "(but not more than %d KB/sec) for reconstruction.\n", - sysctl_speed_limit_max); - - is_mddev_idle(mddev); /* this also initializes IO event counters */ - if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) - j = mddev->recovery_cp; - else - j = 0; - for (m = 0; m < SYNC_MARKS; m++) { - mark[m] = jiffies; - mark_cnt[m] = j; - } - last_mark = 0; - mddev->resync_mark = mark[last_mark]; - mddev->resync_mark_cnt = mark_cnt[last_mark]; - - /* - * Tune reconstruction: - */ - window = 32*(PAGE_SIZE/512); - printk(KERN_INFO "md: using %dk window, over a total of %d blocks.\n", - window/2,max_sectors/2); - - atomic_set(&mddev->recovery_active, 0); - init_waitqueue_head(&mddev->recovery_wait); - last_check = 0; - - if (j) - printk(KERN_INFO - "md: resuming recovery of md%d from 
checkpoint.\n", - mdidx(mddev)); - - while (j < max_sectors) { - int sectors; - - sectors = mddev->pers->sync_request(mddev, j, currspeed < sysctl_speed_limit_min); - if (sectors < 0) { - set_bit(MD_RECOVERY_ERR, &mddev->recovery); - goto out; - } - atomic_add(sectors, &mddev->recovery_active); - j += sectors; - if (j>1) mddev->curr_resync = j; - - if (last_check + window > j) - continue; - - last_check = j; - - if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) || - test_bit(MD_RECOVERY_ERR, &mddev->recovery)) - break; - - blk_run_queues(); - - repeat: - if (jiffies >= mark[last_mark] + SYNC_MARK_STEP ) { - /* step marks */ - int next = (last_mark+1) % SYNC_MARKS; - - mddev->resync_mark = mark[next]; - mddev->resync_mark_cnt = mark_cnt[next]; - mark[next] = jiffies; - mark_cnt[next] = j - atomic_read(&mddev->recovery_active); - last_mark = next; - } - - - if (signal_pending(current)) { - /* - * got a signal, exit. - */ - printk(KERN_INFO - "md: md_do_sync() got signal ... exiting\n"); - flush_signals(current); - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - goto out; - } - - /* - * this loop exits only if either when we are slower than - * the 'hard' speed limit, or the system was IO-idle for - * a jiffy. - * the system might be non-idle CPU-wise, but we only care - * about not overloading the IO subsystem. 
(things like an - * e2fsck being done on the RAID array should execute fast) - */ - cond_resched(); - - currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1; - - if (currspeed > sysctl_speed_limit_min) { - if ((currspeed > sysctl_speed_limit_max) || - !is_mddev_idle(mddev)) { - current->state = TASK_INTERRUPTIBLE; - schedule_timeout(HZ/4); - goto repeat; - } - } - } - printk(KERN_INFO "md: md%d: sync done.\n",mdidx(mddev)); - /* - * this also signals 'finished resyncing' to md_stop - */ - out: - wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); - - /* tell personality that we are finished */ - mddev->pers->sync_request(mddev, max_sectors, 1); - - if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) && - mddev->curr_resync > 2 && - mddev->curr_resync > mddev->recovery_cp) { - if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { - printk(KERN_INFO - "md: checkpointing recovery of md%d.\n", - mdidx(mddev)); - mddev->recovery_cp = mddev->curr_resync; - } else - mddev->recovery_cp = MaxSector; - } - - if (mddev->safemode) - md_enter_safemode(mddev); - skip: - mddev->curr_resync = 0; - set_bit(MD_RECOVERY_DONE, &mddev->recovery); - md_wakeup_thread(mddev->thread); -} - - -/* - * This routine is regularly called by all per-raid-array threads to - * deal with generic issues like resync and super-block update. - * Raid personalities that don't have a thread (linear/raid0) do not - * need this as they never do any recovery or update the superblock. - * - * It does not do any resync itself, but rather "forks" off other threads - * to do that as needed. - * When it is determined that resync is needed, we set MD_RECOVERY_RUNNING in - * "->recovery" and create a thread at ->sync_thread. - * When the thread finishes it sets MD_RECOVERY_DONE (and might set MD_RECOVERY_ERR) - * and wakeups up this thread which will reap the thread and finish up. - * This thread also removes any faulty devices (with nr_pending == 0). 
- * - * The overall approach is: - * 1/ if the superblock needs updating, update it. - * 2/ If a recovery thread is running, don't do anything else. - * 3/ If recovery has finished, clean up, possibly marking spares active. - * 4/ If there are any faulty devices, remove them. - * 5/ If array is degraded, try to add spares devices - * 6/ If array has spares or is not in-sync, start a resync thread. - */ -void md_check_recovery(mddev_t *mddev) -{ - mdk_rdev_t *rdev; - struct list_head *rtmp; - - - dprintk(KERN_INFO "md: recovery thread got woken up ...\n"); - - if (mddev->ro) - return; - if ( ! ( - mddev->sb_dirty || - test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) || - test_bit(MD_RECOVERY_DONE, &mddev->recovery) - )) - return; - if (mddev_trylock(mddev)==0) { - int spares =0; - if (mddev->sb_dirty) - md_update_sb(mddev); - if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && - !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) - /* resync/recovery still happening */ - goto unlock; - if (mddev->sync_thread) { - /* resync has finished, collect result */ - md_unregister_thread(mddev->sync_thread); - mddev->sync_thread = NULL; - if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery)) { - /* success...*/ - /* activate any spares */ - mddev->pers->spare_active(mddev); - } - md_update_sb(mddev); - mddev->recovery = 0; - wake_up(&resync_wait); - goto unlock; - } - if (mddev->recovery) { - /* that's odd.. */ - mddev->recovery = 0; - wake_up(&resync_wait); - } - - /* no recovery is running. 
- * remove any failed drives, then - * add spares if possible - */ - ITERATE_RDEV(mddev,rdev,rtmp) { - if (rdev->raid_disk >= 0 && - rdev->faulty && - atomic_read(&rdev->nr_pending)==0) { - mddev->pers->hot_remove_disk(mddev, rdev->raid_disk); - rdev->raid_disk = -1; - } - if (!rdev->faulty && rdev->raid_disk >= 0 && !rdev->in_sync) - spares++; - } - if (mddev->degraded) { - ITERATE_RDEV(mddev,rdev,rtmp) - if (rdev->raid_disk < 0 - && !rdev->faulty) { - if (mddev->pers->hot_add_disk(mddev,rdev)) - spares++; - else - break; - } - } - - if (!spares && (mddev->recovery_cp == MaxSector )) { - /* nothing we can do ... */ - goto unlock; - } - if (mddev->pers->sync_request) { - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - if (!spares) - set_bit(MD_RECOVERY_SYNC, &mddev->recovery); - mddev->sync_thread = md_register_thread(md_do_sync, - mddev, - "md%d_resync"); - if (!mddev->sync_thread) { - printk(KERN_ERR "md%d: could not start resync" - " thread...\n", - mdidx(mddev)); - /* leave the spares where they are, it shouldn't hurt */ - mddev->recovery = 0; - } else { - md_wakeup_thread(mddev->sync_thread); - } - } - unlock: - mddev_unlock(mddev); - } -} - -int md_notify_reboot(struct notifier_block *this, - unsigned long code, void *x) -{ - struct list_head *tmp; - mddev_t *mddev; - - if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) { - - printk(KERN_INFO "md: stopping all md devices.\n"); - - ITERATE_MDDEV(mddev,tmp) - if (mddev_trylock(mddev)==0) - do_md_stop (mddev, 1); - /* - * certain more exotic SCSI devices are known to be - * volatile wrt too early system reboots. While the - * right place to handle this issue is the given - * driver, we do want to have a safe RAID driver ... 
- */ - mdelay(1000*1); - } - return NOTIFY_DONE; -} - -struct notifier_block md_notifier = { - .notifier_call = md_notify_reboot, - .next = NULL, - .priority = INT_MAX, /* before any real devices */ -}; - -static void md_geninit(void) -{ - struct proc_dir_entry *p; - - dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t)); - -#ifdef CONFIG_PROC_FS - p = create_proc_entry("mdstat", S_IRUGO, NULL); - if (p) - p->proc_fops = &md_seq_fops; -#endif -} - -int __init md_init(void) -{ - int minor; - - printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d," - " MD_SB_DISKS=%d\n", - MD_MAJOR_VERSION, MD_MINOR_VERSION, - MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS); - - if (register_blkdev(MAJOR_NR, "md")) - return -1; - - devfs_mk_dir("md"); - blk_register_region(MKDEV(MAJOR_NR, 0), MAX_MD_DEVS, THIS_MODULE, - md_probe, NULL, NULL); - for (minor=0; minor < MAX_MD_DEVS; ++minor) { - char name[16]; - sprintf(name, "md/%d", minor); - devfs_register(NULL, name, DEVFS_FL_DEFAULT, MAJOR_NR, minor, - S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL); - } - - register_reboot_notifier(&md_notifier); - raid_table_header = register_sysctl_table(raid_root_table, 1); - - md_geninit(); - return (0); -} - - -#ifndef MODULE - -/* - * Searches all registered partitions for autorun RAID arrays - * at boot time. 
- */ -static dev_t detected_devices[128]; -static int dev_cnt; - -void md_autodetect_dev(dev_t dev) -{ - if (dev_cnt >= 0 && dev_cnt < 127) - detected_devices[dev_cnt++] = dev; -} - - -static void autostart_arrays(void) -{ - mdk_rdev_t *rdev; - int i; - - printk(KERN_INFO "md: Autodetecting RAID arrays.\n"); - - for (i = 0; i < dev_cnt; i++) { - dev_t dev = detected_devices[i]; - - rdev = md_import_device(dev,0, 0); - if (IS_ERR(rdev)) { - printk(KERN_ALERT "md: could not import %s!\n", - partition_name(dev)); - continue; - } - if (rdev->faulty) { - MD_BUG(); - continue; - } - list_add(&rdev->same_set, &pending_raid_disks); - } - dev_cnt = 0; - - autorun_devices(); -} - -#endif - -static __exit void md_exit(void) -{ - int i; - blk_unregister_region(MKDEV(MAJOR_NR,0), MAX_MD_DEVS); - for (i=0; i < MAX_MD_DEVS; i++) - devfs_remove("md/%d", i); - devfs_remove("md"); - - unregister_blkdev(MAJOR_NR,"md"); - unregister_reboot_notifier(&md_notifier); - unregister_sysctl_table(raid_table_header); -#ifdef CONFIG_PROC_FS - remove_proc_entry("mdstat", NULL); -#endif - for (i = 0; i < MAX_MD_DEVS; i++) { - struct gendisk *disk = disks[i]; - mddev_t *mddev; - if (!disks[i]) - continue; - mddev = disk->private_data; - del_gendisk(disk); - put_disk(disk); - mddev_put(mddev); - } -} - -module_init(md_init) -module_exit(md_exit) - -EXPORT_SYMBOL(register_md_personality); -EXPORT_SYMBOL(unregister_md_personality); -EXPORT_SYMBOL(md_error); -EXPORT_SYMBOL(md_sync_acct); -EXPORT_SYMBOL(md_done_sync); -EXPORT_SYMBOL(md_write_start); -EXPORT_SYMBOL(md_write_end); -EXPORT_SYMBOL(md_handle_safemode); -EXPORT_SYMBOL(md_register_thread); -EXPORT_SYMBOL(md_unregister_thread); -EXPORT_SYMBOL(md_wakeup_thread); -EXPORT_SYMBOL(md_print_devices); -EXPORT_SYMBOL(md_interrupt_thread); -EXPORT_SYMBOL(md_check_recovery); -MODULE_LICENSE("GPL"); ./linux/md/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- lmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.091426892 
+0000 @@ -1,3589 +0,0 @@ -/* - md.c : Multiple Devices driver for Linux - Copyright (C) 1998, 1999, 2000 Ingo Molnar - - completely rewritten, based on the MD driver code from Marc Zyngier - - Changes: - - - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar - - boot support for linear and striped mode by Harald Hoyer - - kerneld support by Boris Tobotras - - kmod support by: Cyrus Durgin - - RAID0 bugfixes: Mark Anthony Lisher - - Devfs support by Richard Gooch - - - lots of fixes and improvements to the RAID1/RAID5 and generic - RAID code (such as request based resynchronization): - - Neil Brown . - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - - You should have received a copy of the GNU General Public License - (for example /usr/src/linux/COPYING); if not, write to the Free - Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. -*/ - -#include -#include -#include -#include -#include -#include -#include -#include /* for invalidate_bdev */ -#include - -#include - -#ifdef CONFIG_KMOD -#include -#endif - -#define __KERNEL_SYSCALLS__ -#include - -#include - -#define MAJOR_NR MD_MAJOR -#define MD_DRIVER -#define DEVICE_NR(device) (minor(device)) - -#include - -#define DEBUG 0 -#define dprintk(x...) ((void)(DEBUG && printk(x))) - - -#ifndef MODULE -static void autostart_arrays (void); -#endif - -static mdk_personality_t *pers[MAX_PERSONALITY]; -static spinlock_t pers_lock = SPIN_LOCK_UNLOCKED; - -/* - * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit' - * is 1000 KB/sec, so the extra system load does not show up that much. - * Increase it if you want to have more _guaranteed_ speed. Note that - * the RAID driver will use the maximum available bandwith if the IO - * subsystem is idle. 
There is also an 'absolute maximum' reconstruction - * speed limit - in case reconstruction slows down your system despite - * idle IO detection. - * - * you can change it via /proc/sys/dev/raid/speed_limit_min and _max. - */ - -static int sysctl_speed_limit_min = 1000; -static int sysctl_speed_limit_max = 200000; - -static struct ctl_table_header *raid_table_header; - -static ctl_table raid_table[] = { - { - .ctl_name = DEV_RAID_SPEED_LIMIT_MIN, - .procname = "speed_limit_min", - .data = &sysctl_speed_limit_min, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = &proc_dointvec, - }, - { - .ctl_name = DEV_RAID_SPEED_LIMIT_MAX, - .procname = "speed_limit_max", - .data = &sysctl_speed_limit_max, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = &proc_dointvec, - }, - { .ctl_name = 0 } -}; - -static ctl_table raid_dir_table[] = { - { - .ctl_name = DEV_RAID, - .procname = "raid", - .maxlen = 0, - .mode = 0555, - .child = raid_table, - }, - { .ctl_name = 0 } -}; - -static ctl_table raid_root_table[] = { - { - .ctl_name = CTL_DEV, - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = raid_dir_table, - }, - { .ctl_name = 0 } -}; - -static struct block_device_operations md_fops; - -static struct gendisk *disks[MAX_MD_DEVS]; - -/* - * Enables to iterate over all existing md arrays - * all_mddevs_lock protects this list as well as mddev_map. - */ -static LIST_HEAD(all_mddevs); -static spinlock_t all_mddevs_lock = SPIN_LOCK_UNLOCKED; - - -/* - * iterates through all used mddevs in the system. - * We take care to grab the all_mddevs_lock whenever navigating - * the list, and to always hold a refcount when unlocked. - * Any code which breaks out of this loop while own - * a reference to the current mddev and must mddev_put it. 
- */ -#define ITERATE_MDDEV(mddev,tmp) \ - \ - for (({ spin_lock(&all_mddevs_lock); \ - tmp = all_mddevs.next; \ - mddev = NULL;}); \ - ({ if (tmp != &all_mddevs) \ - mddev_get(list_entry(tmp, mddev_t, all_mddevs));\ - spin_unlock(&all_mddevs_lock); \ - if (mddev) mddev_put(mddev); \ - mddev = list_entry(tmp, mddev_t, all_mddevs); \ - tmp != &all_mddevs;}); \ - ({ spin_lock(&all_mddevs_lock); \ - tmp = tmp->next;}) \ - ) - -static mddev_t *mddev_map[MAX_MD_DEVS]; - -static int md_fail_request (request_queue_t *q, struct bio *bio) -{ - bio_io_error(bio, bio->bi_size); - return 0; -} - -static inline mddev_t *mddev_get(mddev_t *mddev) -{ - atomic_inc(&mddev->active); - return mddev; -} - -static void mddev_put(mddev_t *mddev) -{ - if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) - return; - if (!mddev->raid_disks && list_empty(&mddev->disks)) { - list_del(&mddev->all_mddevs); - mddev_map[mdidx(mddev)] = NULL; - kfree(mddev); - MOD_DEC_USE_COUNT; - } - spin_unlock(&all_mddevs_lock); -} - -static mddev_t * mddev_find(int unit) -{ - mddev_t *mddev, *new = NULL; - - retry: - spin_lock(&all_mddevs_lock); - if (mddev_map[unit]) { - mddev = mddev_get(mddev_map[unit]); - spin_unlock(&all_mddevs_lock); - if (new) - kfree(new); - return mddev; - } - if (new) { - mddev_map[unit] = new; - list_add(&new->all_mddevs, &all_mddevs); - spin_unlock(&all_mddevs_lock); - MOD_INC_USE_COUNT; - return new; - } - spin_unlock(&all_mddevs_lock); - - new = (mddev_t *) kmalloc(sizeof(*new), GFP_KERNEL); - if (!new) - return NULL; - - memset(new, 0, sizeof(*new)); - - new->__minor = unit; - init_MUTEX(&new->reconfig_sem); - INIT_LIST_HEAD(&new->disks); - INIT_LIST_HEAD(&new->all_mddevs); - init_timer(&new->safemode_timer); - atomic_set(&new->active, 1); - blk_queue_make_request(&new->queue, md_fail_request); - - goto retry; -} - -static inline int mddev_lock(mddev_t * mddev) -{ - return down_interruptible(&mddev->reconfig_sem); -} - -static inline void 
mddev_lock_uninterruptible(mddev_t * mddev) -{ - down(&mddev->reconfig_sem); -} - -static inline int mddev_trylock(mddev_t * mddev) -{ - return down_trylock(&mddev->reconfig_sem); -} - -static inline void mddev_unlock(mddev_t * mddev) -{ - up(&mddev->reconfig_sem); -} - -mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) -{ - mdk_rdev_t * rdev; - struct list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == nr) - return rdev; - } - return NULL; -} - -static mdk_rdev_t * find_rdev(mddev_t * mddev, dev_t dev) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->bdev->bd_dev == dev) - return rdev; - } - return NULL; -} - -inline static sector_t calc_dev_sboffset(struct block_device *bdev) -{ - sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - return MD_NEW_SIZE_BLOCKS(size); -} - -static sector_t calc_dev_size(mdk_rdev_t *rdev, unsigned chunk_size) -{ - sector_t size; - - size = rdev->sb_offset; - - if (chunk_size) - size &= ~((sector_t)chunk_size/1024 - 1); - return size; -} - -static int alloc_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) - MD_BUG(); - - rdev->sb_page = alloc_page(GFP_KERNEL); - if (!rdev->sb_page) { - printk(KERN_ALERT "md: out of memory.\n"); - return -EINVAL; - } - - return 0; -} - -static void free_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) { - page_cache_release(rdev->sb_page); - rdev->sb_loaded = 0; - rdev->sb_page = NULL; - rdev->sb_offset = 0; - rdev->size = 0; - } -} - - -static int bi_complete(struct bio *bio, unsigned int bytes_done, int error) -{ - if (bio->bi_size) - return 1; - - complete((struct completion*)bio->bi_private); - return 0; -} - -static int sync_page_io(struct block_device *bdev, sector_t sector, int size, - struct page *page, int rw) -{ - struct bio bio; - struct bio_vec vec; - struct completion event; - - bio_init(&bio); - bio.bi_io_vec = &vec; - vec.bv_page = page; - vec.bv_len = size; - vec.bv_offset = 0; - bio.bi_vcnt = 1; - 
bio.bi_idx = 0; - bio.bi_size = size; - bio.bi_bdev = bdev; - bio.bi_sector = sector; - init_completion(&event); - bio.bi_private = &event; - bio.bi_end_io = bi_complete; - submit_bio(rw, &bio); - blk_run_queues(); - wait_for_completion(&event); - - return test_bit(BIO_UPTODATE, &bio.bi_flags); -} - -static int read_disk_sb(mdk_rdev_t * rdev) -{ - - if (!rdev->sb_page) { - MD_BUG(); - return -EINVAL; - } - if (rdev->sb_loaded) - return 0; - - - if (!sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, READ)) - goto fail; - rdev->sb_loaded = 1; - return 0; - -fail: - printk(KERN_ERR "md: disabled device %s, could not read superblock.\n", - bdev_partition_name(rdev->bdev)); - return -EINVAL; -} - -static int uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - if ( (sb1->set_uuid0 == sb2->set_uuid0) && - (sb1->set_uuid1 == sb2->set_uuid1) && - (sb1->set_uuid2 == sb2->set_uuid2) && - (sb1->set_uuid3 == sb2->set_uuid3)) - - return 1; - - return 0; -} - - -static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - int ret; - mdp_super_t *tmp1, *tmp2; - - tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); - tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); - - if (!tmp1 || !tmp2) { - ret = 0; - printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n"); - goto abort; - } - - *tmp1 = *sb1; - *tmp2 = *sb2; - - /* - * nr_disks is not constant - */ - tmp1->nr_disks = 0; - tmp2->nr_disks = 0; - - if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4)) - ret = 0; - else - ret = 1; - -abort: - if (tmp1) - kfree(tmp1); - if (tmp2) - kfree(tmp2); - - return ret; -} - -static unsigned int calc_sb_csum(mdp_super_t * sb) -{ - unsigned int disk_csum, csum; - - disk_csum = sb->sb_csum; - sb->sb_csum = 0; - csum = csum_partial((void *)sb, MD_SB_BYTES, 0); - sb->sb_csum = disk_csum; - return csum; -} - -/* - * Handle superblock details. - * We want to be able to handle multiple superblock formats - * so we have a common interface to them all, and an array of - * different handlers. 
- * We rely on user-space to write the initial superblock, and support - * reading and updating of superblocks. - * Interface methods are: - * int load_super(mdk_rdev_t *dev, mdk_rdev_t *refdev, int minor_version) - * loads and validates a superblock on dev. - * if refdev != NULL, compare superblocks on both devices - * Return: - * 0 - dev has a superblock that is compatible with refdev - * 1 - dev has a superblock that is compatible and newer than refdev - * so dev should be used as the refdev in future - * -EINVAL superblock incompatible or invalid - * -othererror e.g. -EIO - * - * int validate_super(mddev_t *mddev, mdk_rdev_t *dev) - * Verify that dev is acceptable into mddev. - * The first time, mddev->raid_disks will be 0, and data from - * dev should be merged in. Subsequent calls check that dev - * is new enough. Return 0 or -EINVAL - * - * void sync_super(mddev_t *mddev, mdk_rdev_t *dev) - * Update the superblock for rdev with data in mddev - * This does not write to disc. - * - */ - -struct super_type { - char *name; - struct module *owner; - int (*load_super)(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version); - int (*validate_super)(mddev_t *mddev, mdk_rdev_t *rdev); - void (*sync_super)(mddev_t *mddev, mdk_rdev_t *rdev); -}; - -/* - * load_super for 0.90.0 - */ -static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) -{ - mdp_super_t *sb; - int ret; - sector_t sb_offset; - - /* - * Calculate the position of the superblock, - * it's at the end of the disk. - * - * It also happens to be a multiple of 4Kb. 
- */ - sb_offset = calc_dev_sboffset(rdev->bdev); - rdev->sb_offset = sb_offset; - - ret = read_disk_sb(rdev); - if (ret) return ret; - - ret = -EINVAL; - - sb = (mdp_super_t*)page_address(rdev->sb_page); - - if (sb->md_magic != MD_SB_MAGIC) { - printk(KERN_ERR "md: invalid raid superblock magic on %s\n", - bdev_partition_name(rdev->bdev)); - goto abort; - } - - if (sb->major_version != 0 || - sb->minor_version != 90) { - printk(KERN_WARNING "Bad version number %d.%d on %s\n", - sb->major_version, sb->minor_version, - bdev_partition_name(rdev->bdev)); - goto abort; - } - - if (sb->md_minor >= MAX_MD_DEVS) { - printk(KERN_ERR "md: %s: invalid raid minor (%x)\n", - bdev_partition_name(rdev->bdev), sb->md_minor); - goto abort; - } - if (sb->raid_disks <= 0) - goto abort; - - if (calc_sb_csum(sb) != sb->sb_csum) { - printk(KERN_WARNING "md: invalid superblock checksum on %s\n", - bdev_partition_name(rdev->bdev)); - goto abort; - } - - rdev->preferred_minor = sb->md_minor; - rdev->data_offset = 0; - - if (sb->level == MULTIPATH) - rdev->desc_nr = -1; - else - rdev->desc_nr = sb->this_disk.number; - - if (refdev == 0) - ret = 1; - else { - __u64 ev1, ev2; - mdp_super_t *refsb = (mdp_super_t*)page_address(refdev->sb_page); - if (!uuid_equal(refsb, sb)) { - printk(KERN_WARNING "md: %s has different UUID to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - goto abort; - } - if (!sb_equal(refsb, sb)) { - printk(KERN_WARNING "md: %s has same UUID" - " but different superblock to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - goto abort; - } - ev1 = md_event(sb); - ev2 = md_event(refsb); - if (ev1 > ev2) - ret = 1; - else - ret = 0; - } - rdev->size = calc_dev_size(rdev, sb->chunk_size); - - abort: - return ret; -} - -/* - * validate_super for 0.90.0 - */ -static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev) -{ - mdp_disk_t *desc; - mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page); - - if 
(mddev->raid_disks == 0) {
-		mddev->major_version = 0;
-		mddev->minor_version = sb->minor_version;
-		mddev->patch_version = sb->patch_version;
-		mddev->persistent = ! sb->not_persistent;
-		mddev->chunk_size = sb->chunk_size;
-		mddev->ctime = sb->ctime;
-		mddev->utime = sb->utime;
-		mddev->level = sb->level;
-		mddev->layout = sb->layout;
-		mddev->raid_disks = sb->raid_disks;
-		mddev->size = sb->size;
-		mddev->events = md_event(sb);
-
-		if (sb->state & (1<<MD_SB_CLEAN))
-			mddev->recovery_cp = MaxSector;
-		else {
-			if (sb->events_hi == sb->cp_events_hi &&
-			    sb->events_lo == sb->cp_events_lo) {
-				mddev->recovery_cp = sb->recovery_cp;
-			} else
-				mddev->recovery_cp = 0;
-		}
-
-		memcpy(mddev->uuid+0, &sb->set_uuid0, 4);
-		memcpy(mddev->uuid+4, &sb->set_uuid1, 4);
-		memcpy(mddev->uuid+8, &sb->set_uuid2, 4);
-		memcpy(mddev->uuid+12,&sb->set_uuid3, 4);
-
-		mddev->max_disks = MD_SB_DISKS;
-	} else {
-		__u64 ev1;
-		ev1 = md_event(sb);
-		++ev1;
-		if (ev1 < mddev->events)
-			return -EINVAL;
-	}
-	if (mddev->level != LEVEL_MULTIPATH) {
-		rdev->raid_disk = -1;
-		rdev->in_sync = rdev->faulty = 0;
-		desc = sb->disks + rdev->desc_nr;
-
-		if (desc->state & (1<<MD_DISK_FAULTY))
-			rdev->faulty = 1;
-		else if (desc->state & (1<<MD_DISK_SYNC) &&
-			 desc->raid_disk < mddev->raid_disks) {
-			rdev->in_sync = 1;
-			rdev->raid_disk = desc->raid_disk;
-		}
-	}
-	return 0;
-}
-
-/*
- * sync_super for 0.90.0
- */
-static void super_90_sync(mddev_t *mddev, mdk_rdev_t *rdev)
-{
-	mdp_super_t *sb;
-	struct list_head *tmp;
-	mdk_rdev_t *rdev2;
-	int next_spare = mddev->raid_disks;
-
-	/* make rdev->sb match mddev data..
-	 *
-	 * 1/ zero out disks
-	 * 2/ Add info for each disk, keeping track of highest desc_nr
-	 * 3/ any empty disks < highest become removed
-	 *
-	 * disks[0] gets initialised to REMOVED because
-	 * we cannot be sure from other fields if it has
-	 * been initialised or not.
- */
-	int highest = 0;
-	int i;
-	int active=0, working=0,failed=0,spare=0,nr_disks=0;
-
-	sb = (mdp_super_t*)page_address(rdev->sb_page);
-
-	memset(sb, 0, sizeof(*sb));
-
-	sb->md_magic = MD_SB_MAGIC;
-	sb->major_version = mddev->major_version;
-	sb->minor_version = mddev->minor_version;
-	sb->patch_version = mddev->patch_version;
-	sb->gvalid_words = 0; /* ignored */
-	memcpy(&sb->set_uuid0, mddev->uuid+0, 4);
-	memcpy(&sb->set_uuid1, mddev->uuid+4, 4);
-	memcpy(&sb->set_uuid2, mddev->uuid+8, 4);
-	memcpy(&sb->set_uuid3, mddev->uuid+12,4);
-
-	sb->ctime = mddev->ctime;
-	sb->level = mddev->level;
-	sb->size = mddev->size;
-	sb->raid_disks = mddev->raid_disks;
-	sb->md_minor = mddev->__minor;
-	sb->not_persistent = !mddev->persistent;
-	sb->utime = mddev->utime;
-	sb->state = 0;
-	sb->events_hi = (mddev->events>>32);
-	sb->events_lo = (u32)mddev->events;
-
-	if (mddev->in_sync)
-	{
-		sb->recovery_cp = mddev->recovery_cp;
-		sb->cp_events_hi = (mddev->events>>32);
-		sb->cp_events_lo = (u32)mddev->events;
-		if (mddev->recovery_cp == MaxSector)
-			sb->state = (1<< MD_SB_CLEAN);
-	} else
-		sb->recovery_cp = 0;
-
-	sb->layout = mddev->layout;
-	sb->chunk_size = mddev->chunk_size;
-
-	sb->disks[0].state = (1<<MD_DISK_REMOVED);
-	ITERATE_RDEV(mddev,rdev2,tmp) {
-		mdp_disk_t *d;
-		if (rdev2->raid_disk >= 0 && rdev2->in_sync && !rdev2->faulty)
-			rdev2->desc_nr = rdev2->raid_disk;
-		else
-			rdev2->desc_nr = next_spare++;
-		d = &sb->disks[rdev2->desc_nr];
-		nr_disks++;
-		d->number = rdev2->desc_nr;
-		d->major = MAJOR(rdev2->bdev->bd_dev);
-		d->minor = MINOR(rdev2->bdev->bd_dev);
-		if (rdev2->raid_disk >= 0 && rdev->in_sync && !rdev2->faulty)
-			d->raid_disk = rdev2->raid_disk;
-		else
-			d->raid_disk = rdev2->desc_nr; /* compatibility */
-		if (rdev2->faulty) {
-			d->state = (1<<MD_DISK_FAULTY);
-			failed++;
-		} else if (rdev2->in_sync) {
-			d->state = (1<<MD_DISK_ACTIVE);
-			d->state |= (1<<MD_DISK_SYNC);
-			active++;
-			working++;
-		} else {
-			d->state = 0;
-			spare++;
-			working++;
-		}
-		if (rdev2->desc_nr > highest)
-			highest = rdev2->desc_nr;
-	}
-
-	/* now set the "removed" bit on any non-trailing holes */
-	for (i=0; i<highest; i++) {
-		mdp_disk_t *d = &sb->disks[i];
-		if (d->state == 0 && d->number == 0) {
-			d->number = i;
-			d->raid_disk = i;
-			d->state = (1<<MD_DISK_REMOVED);
-		}
-	}
-	sb->nr_disks = nr_disks;
-	sb->active_disks = active;
-	sb->working_disks = working;
-	sb->failed_disks = failed;
-	sb->spare_disks = spare;
-
-	sb->this_disk = sb->disks[rdev->desc_nr];
-	sb->sb_csum = calc_sb_csum(sb);
-}
-
-/*
- * version 1 superblock
- */
-
-static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
-{
-	unsigned int disk_csum, csum;
-	int size = 256 + sb->max_dev*2;
-
-	disk_csum = sb->sb_csum;
-	sb->sb_csum = 0;
-	csum = csum_partial((void *)sb, size, 0);
-	sb->sb_csum = disk_csum;
-	return csum;
-}
-
-static int super_1_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version)
-{
-	struct mdp_superblock_1 *sb;
-	int ret;
-	sector_t sb_offset;
-
-	/*
-	 * Calculate the position of the superblock.
-	 * It is always aligned to a 4K boundary and
-	 * depeding on minor_version, it can be:
-	 * 0: At least 8K, but less than 12K, from end of device
-	 * 1: At start of device
-	 * 2: 4K from start of device.
-	 */
-	switch(minor_version) {
-	case 0:
-		sb_offset = rdev->bdev->bd_inode->i_size >> 9;
-		sb_offset -= 8*2;
-		sb_offset &= ~(4*2);
-		/* convert from sectors to K */
-		sb_offset /= 2;
-		break;
-	case 1:
-		sb_offset = 0;
-		break;
-	case 2:
-		sb_offset = 4;
-		break;
-	default:
-		return -EINVAL;
-	}
-	rdev->sb_offset = sb_offset;
-
-	ret = read_disk_sb(rdev);
-	if (ret) return ret;
-
-
-	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
-
-	if (sb->magic != cpu_to_le32(MD_SB_MAGIC) ||
-	    sb->major_version != cpu_to_le32(1) ||
-	    le32_to_cpu(sb->max_dev) > (4096-256)/2 ||
-	    le64_to_cpu(sb->super_offset) != (rdev->sb_offset<<1) ||
-	    sb->feature_map != 0)
-		return -EINVAL;
-
-	if (calc_sb_1_csum(sb) != sb->sb_csum) {
-		printk("md: invalid superblock checksum on %s\n",
-			bdev_partition_name(rdev->bdev));
-		return -EINVAL;
-	}
-	rdev->preferred_minor = 0xffff;
-	rdev->data_offset = le64_to_cpu(sb->data_offset);
-
-	if (refdev == 0)
-		return 1;
-	else {
-		__u64 ev1, ev2;
-		struct mdp_superblock_1 *refsb =
-
(struct mdp_superblock_1*)page_address(refdev->sb_page); - - if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 || - sb->level != refsb->level || - sb->layout != refsb->layout || - sb->chunksize != refsb->chunksize) { - printk(KERN_WARNING "md: %s has strangely different" - " superblock to %s\n", - bdev_partition_name(rdev->bdev), - bdev_partition_name(refdev->bdev)); - return -EINVAL; - } - ev1 = le64_to_cpu(sb->events); - ev2 = le64_to_cpu(refsb->events); - - if (ev1 > ev2) - return 1; - } - if (minor_version) - rdev->size = ((rdev->bdev->bd_inode->i_size>>9) - le64_to_cpu(sb->data_offset)) / 2; - else - rdev->size = rdev->sb_offset; - if (rdev->size < le64_to_cpu(sb->data_size)/2) - return -EINVAL; - rdev->size = le64_to_cpu(sb->data_size)/2; - if (le32_to_cpu(sb->chunksize)) - rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1); - return 0; -} - -static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev) -{ - struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); - - if (mddev->raid_disks == 0) { - mddev->major_version = 1; - mddev->minor_version = 0; - mddev->patch_version = 0; - mddev->persistent = 1; - mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9; - mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1); - mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1); - mddev->level = le32_to_cpu(sb->level); - mddev->layout = le32_to_cpu(sb->layout); - mddev->raid_disks = le32_to_cpu(sb->raid_disks); - mddev->size = (u32)le64_to_cpu(sb->size); - mddev->events = le64_to_cpu(sb->events); - - mddev->recovery_cp = le64_to_cpu(sb->resync_offset); - memcpy(mddev->uuid, sb->set_uuid, 16); - - mddev->max_disks = (4096-256)/2; - } else { - __u64 ev1; - ev1 = le64_to_cpu(sb->events); - ++ev1; - if (ev1 < mddev->events) - return -EINVAL; - } - - if (mddev->level != LEVEL_MULTIPATH) { - int role; - rdev->desc_nr = le32_to_cpu(sb->dev_number); - role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); - switch(role) { - case 0xffff: 
/* spare */
-			rdev->in_sync = 0;
-			rdev->faulty = 0;
-			rdev->raid_disk = -1;
-			break;
-		case 0xfffe: /* faulty */
-			rdev->in_sync = 0;
-			rdev->faulty = 1;
-			rdev->raid_disk = -1;
-			break;
-		default:
-			rdev->in_sync = 1;
-			rdev->faulty = 0;
-			rdev->raid_disk = role;
-			break;
-		}
-	}
-	return 0;
-}
-
-static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev)
-{
-	struct mdp_superblock_1 *sb;
-	struct list_head *tmp;
-	mdk_rdev_t *rdev2;
-	int max_dev, i;
-	/* make rdev->sb match mddev and rdev data. */
-
-	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
-
-	sb->feature_map = 0;
-	sb->pad0 = 0;
-	memset(sb->pad1, 0, sizeof(sb->pad1));
-	memset(sb->pad2, 0, sizeof(sb->pad2));
-	memset(sb->pad3, 0, sizeof(sb->pad3));
-
-	sb->utime = cpu_to_le64((__u64)mddev->utime);
-	sb->events = cpu_to_le64(mddev->events);
-	if (mddev->in_sync)
-		sb->resync_offset = cpu_to_le64(mddev->recovery_cp);
-	else
-		sb->resync_offset = cpu_to_le64(0);
-
-	max_dev = 0;
-	ITERATE_RDEV(mddev,rdev2,tmp)
-		if (rdev2->desc_nr > max_dev)
-			max_dev = rdev2->desc_nr;
-
-	sb->max_dev = max_dev;
-	for (i=0; i<max_dev; i++)
-		sb->dev_roles[max_dev] = cpu_to_le16(0xfffe);
-
-	ITERATE_RDEV(mddev,rdev2,tmp) {
-		i = rdev2->desc_nr;
-		if (rdev2->faulty)
-			sb->dev_roles[i] = cpu_to_le16(0xfffe);
-		else if (rdev2->in_sync)
-			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
-		else
-			sb->dev_roles[i] = cpu_to_le16(0xffff);
-	}
-
-	sb->recovery_offset = cpu_to_le64(0); /* not supported yet */
-}
-
-
-struct super_type super_types[] = {
-	[0] = {
-		.name = "0.90.0",
-		.owner = THIS_MODULE,
-		.load_super = super_90_load,
-		.validate_super = super_90_validate,
-		.sync_super = super_90_sync,
-	},
-	[1] = {
-		.name = "md-1",
-		.owner = THIS_MODULE,
-		.load_super = super_1_load,
-		.validate_super = super_1_validate,
-		.sync_super = super_1_sync,
-	},
-};
-
-static mdk_rdev_t * match_dev_unit(mddev_t *mddev, mdk_rdev_t *dev)
-{
-	struct list_head *tmp;
-	mdk_rdev_t *rdev;
-
-	ITERATE_RDEV(mddev,rdev,tmp)
-		if
(rdev->bdev->bd_contains == dev->bdev->bd_contains) - return rdev; - - return NULL; -} - -static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2) -{ - struct list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev1,rdev,tmp) - if (match_dev_unit(mddev2, rdev)) - return 1; - - return 0; -} - -static LIST_HEAD(pending_raid_disks); - -static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev) -{ - mdk_rdev_t *same_pdev; - - if (rdev->mddev) { - MD_BUG(); - return -EINVAL; - } - same_pdev = match_dev_unit(mddev, rdev); - if (same_pdev) - printk(KERN_WARNING - "md%d: WARNING: %s appears to be on the same physical" - " disk as %s. True\n protection against single-disk" - " failure might be compromised.\n", - mdidx(mddev), bdev_partition_name(rdev->bdev), - bdev_partition_name(same_pdev->bdev)); - - /* Verify rdev->desc_nr is unique. - * If it is -1, assign a free number, else - * check number is not in use - */ - if (rdev->desc_nr < 0) { - int choice = 0; - if (mddev->pers) choice = mddev->raid_disks; - while (find_rdev_nr(mddev, choice)) - choice++; - rdev->desc_nr = choice; - } else { - if (find_rdev_nr(mddev, rdev->desc_nr)) - return -EBUSY; - } - - list_add(&rdev->same_set, &mddev->disks); - rdev->mddev = mddev; - printk(KERN_INFO "md: bind<%s>\n", bdev_partition_name(rdev->bdev)); - return 0; -} - -static void unbind_rdev_from_array(mdk_rdev_t * rdev) -{ - if (!rdev->mddev) { - MD_BUG(); - return; - } - list_del_init(&rdev->same_set); - printk(KERN_INFO "md: unbind<%s>\n", bdev_partition_name(rdev->bdev)); - rdev->mddev = NULL; -} - -/* - * prevent the device from being mounted, repartitioned or - * otherwise reused by a RAID array (or any other kernel - * subsystem), by opening the device. 
[simply getting an
- * inode is not enough, the SCSI module usage code needs
- * an explicit open() on the device]
- */
-static int lock_rdev(mdk_rdev_t *rdev, dev_t dev)
-{
-	int err = 0;
-	struct block_device *bdev;
-
-	bdev = bdget(dev);
-	if (!bdev)
-		return -ENOMEM;
-	err = blkdev_get(bdev, FMODE_READ|FMODE_WRITE, 0, BDEV_RAW);
-	if (err)
-		return err;
-	err = bd_claim(bdev, rdev);
-	if (err) {
-		blkdev_put(bdev, BDEV_RAW);
-		return err;
-	}
-	rdev->bdev = bdev;
-	return err;
-}
-
-static void unlock_rdev(mdk_rdev_t *rdev)
-{
-	struct block_device *bdev = rdev->bdev;
-	rdev->bdev = NULL;
-	if (!bdev)
-		MD_BUG();
-	bd_release(bdev);
-	blkdev_put(bdev, BDEV_RAW);
-}
-
-void md_autodetect_dev(dev_t dev);
-
-static void export_rdev(mdk_rdev_t * rdev)
-{
-	printk(KERN_INFO "md: export_rdev(%s)\n",
-		bdev_partition_name(rdev->bdev));
-	if (rdev->mddev)
-		MD_BUG();
-	free_disk_sb(rdev);
-	list_del_init(&rdev->same_set);
-#ifndef MODULE
-	md_autodetect_dev(rdev->bdev->bd_dev);
-#endif
-	unlock_rdev(rdev);
-	kfree(rdev);
-}
-
-static void kick_rdev_from_array(mdk_rdev_t * rdev)
-{
-	unbind_rdev_from_array(rdev);
-	export_rdev(rdev);
-}
-
-static void export_array(mddev_t *mddev)
-{
-	struct list_head *tmp;
-	mdk_rdev_t *rdev;
-
-	ITERATE_RDEV(mddev,rdev,tmp) {
-		if (!rdev->mddev) {
-			MD_BUG();
-			continue;
-		}
-		kick_rdev_from_array(rdev);
-	}
-	if (!list_empty(&mddev->disks))
-		MD_BUG();
-	mddev->raid_disks = 0;
-	mddev->major_version = 0;
-}
-
-static void print_desc(mdp_disk_t *desc)
-{
-	printk(" DISK<N:%d,%s(%d,%d),R:%d,S:%d>\n", desc->number,
-		partition_name(MKDEV(desc->major,desc->minor)),
-		desc->major,desc->minor,desc->raid_disk,desc->state);
-}
-
-static void print_sb(mdp_super_t *sb)
-{
-	int i;
-
-	printk(KERN_INFO
-		"md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n",
-		sb->major_version, sb->minor_version, sb->patch_version,
-		sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3,
-		sb->ctime);
-	printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n",
-		sb->level, sb->size, sb->nr_disks, sb->raid_disks,
-		sb->md_minor, sb->layout, sb->chunk_size);
-	printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d"
-		" FD:%d SD:%d CSUM:%08x E:%08lx\n",
-		sb->utime, sb->state, sb->active_disks, sb->working_disks,
-		sb->failed_disks, sb->spare_disks,
-		sb->sb_csum, (unsigned long)sb->events_lo);
-
-	printk(KERN_INFO);
-	for (i = 0; i < MD_SB_DISKS; i++) {
-		mdp_disk_t *desc;
-
-		desc = sb->disks + i;
-		if (desc->number || desc->major || desc->minor ||
-		    desc->raid_disk || (desc->state && (desc->state != 4))) {
-			printk(" D %2d: ", i);
-			print_desc(desc);
-		}
-	}
-	printk(KERN_INFO "md: THIS: ");
-	print_desc(&sb->this_disk);
-
-}
-
-static void print_rdev(mdk_rdev_t *rdev)
-{
-	printk(KERN_INFO "md: rdev %s, SZ:%08llu F:%d S:%d DN:%d ",
-		bdev_partition_name(rdev->bdev), (unsigned long long)rdev->size,
-		rdev->faulty, rdev->in_sync, rdev->desc_nr);
-	if (rdev->sb_loaded) {
-		printk(KERN_INFO "md: rdev superblock:\n");
-		print_sb((mdp_super_t*)page_address(rdev->sb_page));
-	} else
-		printk(KERN_INFO "md: no rdev superblock!\n");
-}
-
-void md_print_devices(void)
-{
-	struct list_head *tmp, *tmp2;
-	mdk_rdev_t *rdev;
-	mddev_t *mddev;
-
-	printk("\n");
-	printk("md: **********************************\n");
-	printk("md: * <COMPLETE RAID STATE PRINTOUT> *\n");
-	printk("md: **********************************\n");
-	ITERATE_MDDEV(mddev,tmp) {
-		printk("md%d: ", mdidx(mddev));
-
-		ITERATE_RDEV(mddev,rdev,tmp2)
-			printk("<%s>", bdev_partition_name(rdev->bdev));
-
-		ITERATE_RDEV(mddev,rdev,tmp2)
-			print_rdev(rdev);
-	}
-	printk("md: **********************************\n");
-	printk("\n");
-}
-
-
-static int write_disk_sb(mdk_rdev_t * rdev)
-{
-
-	if (!rdev->sb_loaded) {
-		MD_BUG();
-		return 1;
-	}
-	if (rdev->faulty) {
-		MD_BUG();
-		return 1;
-	}
-
-	dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
-		bdev_partition_name(rdev->bdev),
-		(unsigned long long)rdev->sb_offset);
-
-	if (sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, WRITE))
-		return
0; - - printk("md: write_disk_sb failed for device %s\n", - bdev_partition_name(rdev->bdev)); - return 1; -} - -static void sync_sbs(mddev_t * mddev) -{ - mdk_rdev_t *rdev; - struct list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - super_types[mddev->major_version]. - sync_super(mddev, rdev); - rdev->sb_loaded = 1; - } -} - -static void md_update_sb(mddev_t * mddev) -{ - int err, count = 100; - struct list_head *tmp; - mdk_rdev_t *rdev; - - mddev->sb_dirty = 0; -repeat: - mddev->utime = get_seconds(); - mddev->events ++; - - if (!mddev->events) { - /* - * oops, this 64-bit counter should never wrap. - * Either we are in around ~1 trillion A.C., assuming - * 1 reboot per second, or we have a bug: - */ - MD_BUG(); - mddev->events --; - } - sync_sbs(mddev); - - /* - * do not write anything to disk if using - * nonpersistent superblocks - */ - if (!mddev->persistent) - return; - - dprintk(KERN_INFO - "md: updating md%d RAID superblock on device (in sync %d)\n", - mdidx(mddev),mddev->in_sync); - - err = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - dprintk(KERN_INFO "md: "); - if (rdev->faulty) - dprintk("(skipping faulty "); - - dprintk("%s ", bdev_partition_name(rdev->bdev)); - if (!rdev->faulty) { - err += write_disk_sb(rdev); - } else - dprintk(")\n"); - if (!err && mddev->level == LEVEL_MULTIPATH) - /* only need to write one superblock... */ - break; - } - if (err) { - if (--count) { - printk(KERN_ERR "md: errors occurred during superblock" - " update, repeating\n"); - goto repeat; - } - printk(KERN_ERR \ - "md: excessive errors occurred during superblock update, exiting\n"); - } -} - -/* - * Import a device. If 'super_format' >= 0, then sanity check the superblock - * - * mark the device faulty if: - * - * - the device is nonexistent (zero size) - * - the device has no valid superblock - * - * a faulty rdev _never_ has rdev->sb set. 
- */ -static mdk_rdev_t *md_import_device(dev_t newdev, int super_format, int super_minor) -{ - int err; - mdk_rdev_t *rdev; - sector_t size; - - rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL); - if (!rdev) { - printk(KERN_ERR "md: could not alloc mem for %s!\n", - partition_name(newdev)); - return ERR_PTR(-ENOMEM); - } - memset(rdev, 0, sizeof(*rdev)); - - if ((err = alloc_disk_sb(rdev))) - goto abort_free; - - err = lock_rdev(rdev, newdev); - if (err) { - printk(KERN_ERR "md: could not lock %s.\n", - partition_name(newdev)); - goto abort_free; - } - rdev->desc_nr = -1; - rdev->faulty = 0; - rdev->in_sync = 0; - rdev->data_offset = 0; - atomic_set(&rdev->nr_pending, 0); - - size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; - if (!size) { - printk(KERN_WARNING - "md: %s has zero or unknown size, marking faulty!\n", - bdev_partition_name(rdev->bdev)); - err = -EINVAL; - goto abort_free; - } - - if (super_format >= 0) { - err = super_types[super_format]. - load_super(rdev, NULL, super_minor); - if (err == -EINVAL) { - printk(KERN_WARNING - "md: %s has invalid sb, not importing!\n", - bdev_partition_name(rdev->bdev)); - goto abort_free; - } - if (err < 0) { - printk(KERN_WARNING - "md: could not read %s's sb, not importing!\n", - bdev_partition_name(rdev->bdev)); - goto abort_free; - } - } - INIT_LIST_HEAD(&rdev->same_set); - - return rdev; - -abort_free: - if (rdev->sb_page) { - if (rdev->bdev) - unlock_rdev(rdev); - free_disk_sb(rdev); - } - kfree(rdev); - return ERR_PTR(err); -} - -/* - * Check a full RAID array for plausibility - */ - - -static int analyze_sbs(mddev_t * mddev) -{ - int i; - struct list_head *tmp; - mdk_rdev_t *rdev, *freshest; - - freshest = NULL; - ITERATE_RDEV(mddev,rdev,tmp) - switch (super_types[mddev->major_version]. 
-			load_super(rdev, freshest, mddev->minor_version)) {
-		case 1:
-			freshest = rdev;
-			break;
-		case 0:
-			break;
-		default:
-			printk( KERN_ERR \
-				"md: fatal superblock inconsistency in %s"
-				" -- removing from array\n",
-				bdev_partition_name(rdev->bdev));
-			kick_rdev_from_array(rdev);
-		}
-
-
-	super_types[mddev->major_version].
-		validate_super(mddev, freshest);
-
-	i = 0;
-	ITERATE_RDEV(mddev,rdev,tmp) {
-		if (rdev != freshest)
-			if (super_types[mddev->major_version].
-			    validate_super(mddev, rdev)) {
-				printk(KERN_WARNING "md: kicking non-fresh %s"
-					" from array!\n",
-					bdev_partition_name(rdev->bdev));
-				kick_rdev_from_array(rdev);
-				continue;
-			}
-		if (mddev->level == LEVEL_MULTIPATH) {
-			rdev->desc_nr = i++;
-			rdev->raid_disk = rdev->desc_nr;
-			rdev->in_sync = 1;
-		}
-	}
-
-
-	/*
-	 * Check if we can support this RAID array
-	 */
-	if (mddev->major_version != MD_MAJOR_VERSION ||
-	    mddev->minor_version > MD_MINOR_VERSION) {
-		printk(KERN_ALERT
-			"md: md%d: unsupported raid array version %d.%d.%d\n",
-			mdidx(mddev), mddev->major_version,
-			mddev->minor_version, mddev->patch_version);
-		goto abort;
-	}
-
-	if ((mddev->recovery_cp != MaxSector) && ((mddev->level == 1) ||
-	    (mddev->level == 4) || (mddev->level == 5)))
-		printk(KERN_ERR "md: md%d: raid array is not clean"
-			" -- starting background reconstruction\n",
-			mdidx(mddev));
-
-	return 0;
-abort:
-	return 1;
-}
-
-static struct gendisk *md_probe(dev_t dev, int *part, void *data)
-{
-	static DECLARE_MUTEX(disks_sem);
-	int unit = MINOR(dev);
-	mddev_t *mddev = mddev_find(unit);
-	struct gendisk *disk;
-
-	if (!mddev)
-		return NULL;
-
-	down(&disks_sem);
-	if (disks[unit]) {
-		up(&disks_sem);
-		mddev_put(mddev);
-		return NULL;
-	}
-	disk = alloc_disk(1);
-	if (!disk) {
-		up(&disks_sem);
-		mddev_put(mddev);
-		return NULL;
-	}
-	disk->major = MD_MAJOR;
-	disk->first_minor = mdidx(mddev);
-	sprintf(disk->disk_name, "md%d", mdidx(mddev));
-	disk->fops = &md_fops;
-	disk->private_data = mddev;
-	disk->queue = &mddev->queue;
-	add_disk(disk);
-	disks[mdidx(mddev)] = disk;
-	up(&disks_sem);
-	return NULL;
-}
-
-void md_wakeup_thread(mdk_thread_t *thread);
-
-static void md_safemode_timeout(unsigned long data)
-{
-	mddev_t *mddev = (mddev_t *) data;
-
-	mddev->safemode = 1;
-	md_wakeup_thread(mddev->thread);
-}
-
-
-static int do_md_run(mddev_t * mddev)
-{
-	int pnum, err;
-	int chunk_size;
-	struct list_head *tmp;
-	mdk_rdev_t *rdev;
-	struct gendisk *disk;
-
-	if (list_empty(&mddev->disks)) {
-		MD_BUG();
-		return -EINVAL;
-	}
-
-	if (mddev->pers)
-		return -EBUSY;
-
-	/*
-	 * Analyze all RAID superblock(s)
-	 */
-	if (!mddev->raid_disks && analyze_sbs(mddev)) {
-		MD_BUG();
-		return -EINVAL;
-	}
-
-	chunk_size = mddev->chunk_size;
-	pnum = level_to_pers(mddev->level);
-
-	if ((pnum != MULTIPATH) && (pnum != RAID1)) {
-		if (!chunk_size) {
-			/*
-			 * 'default chunksize' in the old md code used to
-			 * be PAGE_SIZE, baaad.
-			 * we abort here to be on the safe side. We don't
-			 * want to continue the bad practice.
-			 */
-			printk(KERN_ERR
-				"no chunksize specified, see 'man raidtab'\n");
-			return -EINVAL;
-		}
-		if (chunk_size > MAX_CHUNK_SIZE) {
-			printk(KERN_ERR "too big chunk_size: %d > %d\n",
-				chunk_size, MAX_CHUNK_SIZE);
-			return -EINVAL;
-		}
-		/*
-		 * chunk-size has to be a power of 2 and multiples of PAGE_SIZE
-		 */
-		if ( (1 << ffz(~chunk_size)) != chunk_size) {
-			MD_BUG();
-			return -EINVAL;
-		}
-		if (chunk_size < PAGE_SIZE) {
-			printk(KERN_ERR "too small chunk_size: %d < %ld\n",
-				chunk_size, PAGE_SIZE);
-			return -EINVAL;
-		}
-
-		/* devices must have minimum size of one chunk */
-		ITERATE_RDEV(mddev,rdev,tmp) {
-			if (rdev->faulty)
-				continue;
-			if (rdev->size < chunk_size / 1024) {
-				printk(KERN_WARNING
-					"md: Dev %s smaller than chunk_size:"
-					" %lluk < %dk\n",
-					bdev_partition_name(rdev->bdev),
-					(unsigned long long)rdev->size,
-					chunk_size / 1024);
-				return -EINVAL;
-			}
-		}
-	}
-	if (pnum >= MAX_PERSONALITY) {
-		MD_BUG();
-		return -EINVAL;
-	}
-
-#ifdef CONFIG_KMOD
-	if (!pers[pnum])
-	{
-		char module_name[80];
-		sprintf (module_name, "md-personality-%d", pnum);
-		request_module (module_name);
-	}
-#endif
-
-	/*
-	 * Drop all container device buffers, from now on
-	 * the only valid external interface is through the md
-	 * device.
-	 * Also find largest hardsector size
-	 */
-	ITERATE_RDEV(mddev,rdev,tmp) {
-		if (rdev->faulty)
-			continue;
-		sync_blockdev(rdev->bdev);
-		invalidate_bdev(rdev->bdev, 0);
-	}
-
-	md_probe(mdidx(mddev), NULL, NULL);
-	disk = disks[mdidx(mddev)];
-	if (!disk)
-		return -ENOMEM;
-
-	spin_lock(&pers_lock);
-	if (!pers[pnum] || !try_module_get(pers[pnum]->owner)) {
-		spin_unlock(&pers_lock);
-		printk(KERN_ERR "md: personality %d is not loaded!\n",
-			pnum);
-		return -EINVAL;
-	}
-
-	mddev->pers = pers[pnum];
-	spin_unlock(&pers_lock);
-
-	blk_queue_make_request(&mddev->queue, mddev->pers->make_request);
-	printk("%s: setting max_sectors to %d, segment boundary to %d\n",
-		disk->disk_name,
-		chunk_size >> 9,
-		(chunk_size>>1)-1);
-	blk_queue_max_sectors(&mddev->queue, chunk_size >> 9);
-	blk_queue_segment_boundary(&mddev->queue, (chunk_size>>1) - 1);
-	mddev->queue.queuedata = mddev;
-
-	err = mddev->pers->run(mddev);
-	if (err) {
-		printk(KERN_ERR "md: pers->run() failed ...\n");
-		module_put(mddev->pers->owner);
-		mddev->pers = NULL;
-		return -EINVAL;
-	}
-	atomic_set(&mddev->writes_pending,0);
-	mddev->safemode = 0;
-	mddev->safemode_timer.function = md_safemode_timeout;
-	mddev->safemode_timer.data = (unsigned long) mddev;
-	mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */
-	mddev->in_sync = 1;
-
-	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_wakeup_thread(mddev->thread);
-	set_capacity(disk, mddev->array_size<<1);
-	return 0;
-}
-
-static int restart_array(mddev_t *mddev)
-{
-	struct gendisk *disk = disks[mdidx(mddev)];
-	int err;
-
-	/*
-	 * Complain if it has no devices
-	 */
-	err = -ENXIO;
-	if (list_empty(&mddev->disks))
-		goto out;
-
-	if (mddev->pers) {
-		err = -EBUSY;
-		if (!mddev->ro)
-			goto out;
-
-		mddev->safemode = 0;
-		mddev->ro = 0;
-		set_disk_ro(disk, 0);
-
-		printk(KERN_INFO "md: md%d switched to read-write mode.\n",
-			mdidx(mddev));
-		/*
-		 * Kick recovery or resync if necessary
-		 */
-		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-		md_wakeup_thread(mddev->thread);
-		err = 0;
-	} else {
-		printk(KERN_ERR "md: md%d has no personality assigned.\n",
-			mdidx(mddev));
-		err = -EINVAL;
-	}
-
-out:
-	return err;
-}
-
-static int do_md_stop(mddev_t * mddev, int ro)
-{
-	int err = 0;
-	struct gendisk *disk = disks[mdidx(mddev)];
-
-	if (atomic_read(&mddev->active)>2) {
-		printk("md: md%d still in use.\n",mdidx(mddev));
-		err = -EBUSY;
-		goto out;
-	}
-
-	if (mddev->pers) {
-		if (mddev->sync_thread) {
-			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_unregister_thread(mddev->sync_thread);
-			mddev->sync_thread = NULL;
-		}
-
-		del_timer_sync(&mddev->safemode_timer);
-
-		invalidate_device(mk_kdev(disk->major, disk->first_minor), 1);
-
-		if (ro) {
-			err = -ENXIO;
-			if (mddev->ro)
-				goto out;
-			mddev->ro = 1;
-		} else {
-			if (mddev->ro)
-				set_disk_ro(disk, 0);
-			if (mddev->pers->stop(mddev)) {
-				err = -EBUSY;
-				if (mddev->ro)
-					set_disk_ro(disk, 1);
-				goto out;
-			}
-			module_put(mddev->pers->owner);
-			mddev->pers = NULL;
-			if (mddev->ro)
-				mddev->ro = 0;
-		}
-		if (mddev->raid_disks) {
-			/* mark array as shutdown cleanly */
-			mddev->in_sync = 1;
-			md_update_sb(mddev);
-		}
-		if (ro)
-			set_disk_ro(disk, 1);
-	}
-	/*
-	 * Free resources if final stop
-	 */
-	if (!ro) {
-		struct gendisk *disk;
-		printk(KERN_INFO "md: md%d stopped.\n", mdidx(mddev));
-
-		export_array(mddev);
-
-		mddev->array_size = 0;
-		disk = disks[mdidx(mddev)];
-		if (disk)
-			set_capacity(disk, 0);
-	} else
-		printk(KERN_INFO "md: md%d switched to read-only mode.\n",
-			mdidx(mddev));
-	err = 0;
-out:
-	return err;
-}
-
-static void autorun_array(mddev_t *mddev)
-{
-	mdk_rdev_t *rdev;
-	struct list_head *tmp;
-	int err;
-
-	if (list_empty(&mddev->disks)) {
-		MD_BUG();
-		return;
-	}
-
-	printk(KERN_INFO "md: running: ");
-
-	ITERATE_RDEV(mddev,rdev,tmp) {
-		printk("<%s>", bdev_partition_name(rdev->bdev));
-	}
-	printk("\n");
-
-	err = do_md_run (mddev);
-	if (err) {
-		printk(KERN_WARNING "md :do_md_run() returned %d\n", err);
-		do_md_stop (mddev, 0);
-	}
-}
-
-/*
- * lets try to run arrays based on all disks that have arrived
- * until now. (those are in pending_raid_disks)
- *
- * the method: pick the first pending disk, collect all disks with
- * the same UUID, remove all from the pending list and put them into
- * the 'same_array' list. Then order this list based on superblock
- * update time (freshest comes first), kick out 'old' disks and
- * compare superblocks. If everything's fine then run it.
- *
- * If "unit" is allocated, then bump its reference count
- */
-static void autorun_devices(void)
-{
-	struct list_head candidates;
-	struct list_head *tmp;
-	mdk_rdev_t *rdev0, *rdev;
-	mddev_t *mddev;
-
-	printk(KERN_INFO "md: autorun ...\n");
-	while (!list_empty(&pending_raid_disks)) {
-		rdev0 = list_entry(pending_raid_disks.next,
-					 mdk_rdev_t, same_set);
-
-		printk(KERN_INFO "md: considering %s ...\n",
-			bdev_partition_name(rdev0->bdev));
-		INIT_LIST_HEAD(&candidates);
-		ITERATE_RDEV_PENDING(rdev,tmp)
-			if (super_90_load(rdev, rdev0, 0) >= 0) {
-				printk(KERN_INFO "md: adding %s ...\n",
-					bdev_partition_name(rdev->bdev));
-				list_move(&rdev->same_set, &candidates);
-			}
-		/*
-		 * now we have a set of devices, with all of them having
-		 * mostly sane superblocks. It's time to allocate the
-		 * mddev.
-		 */
-
-		mddev = mddev_find(rdev0->preferred_minor);
-		if (!mddev) {
-			printk(KERN_ERR
-				"md: cannot allocate memory for md drive.\n");
-			break;
-		}
-		if (mddev_lock(mddev))
-			printk(KERN_WARNING "md: md%d locked, cannot run\n",
-			       mdidx(mddev));
-		else if (mddev->raid_disks || mddev->major_version
-			 || !list_empty(&mddev->disks)) {
-			printk(KERN_WARNING
-				"md: md%d already running, cannot run %s\n",
-				mdidx(mddev), bdev_partition_name(rdev0->bdev));
-			mddev_unlock(mddev);
-		} else {
-			printk(KERN_INFO "md: created md%d\n", mdidx(mddev));
-			ITERATE_RDEV_GENERIC(candidates,rdev,tmp) {
-				list_del_init(&rdev->same_set);
-				if (bind_rdev_to_array(rdev, mddev))
-					export_rdev(rdev);
-			}
-			autorun_array(mddev);
-			mddev_unlock(mddev);
-		}
-		/* on success, candidates will be empty, on error
-		 * it won't...
-		 */
-		ITERATE_RDEV_GENERIC(candidates,rdev,tmp)
-			export_rdev(rdev);
-		mddev_put(mddev);
-	}
-	printk(KERN_INFO "md: ... autorun DONE.\n");
-}
-
-/*
- * import RAID devices based on one partition
- * if possible, the array gets run as well.
- */
-
-static int autostart_array(dev_t startdev)
-{
-	int err = -EINVAL, i;
-	mdp_super_t *sb = NULL;
-	mdk_rdev_t *start_rdev = NULL, *rdev;
-
-	start_rdev = md_import_device(startdev, 0, 0);
-	if (IS_ERR(start_rdev)) {
-		printk(KERN_WARNING "md: could not import %s!\n",
-			partition_name(startdev));
-		return err;
-	}
-
-	/* NOTE: this can only work for 0.90.0 superblocks */
-	sb = (mdp_super_t*)page_address(start_rdev->sb_page);
-	if (sb->major_version != 0 ||
-	    sb->minor_version != 90 ) {
-		printk(KERN_WARNING "md: can only autostart 0.90.0 arrays\n");
-		export_rdev(start_rdev);
-		return err;
-	}
-
-	if (start_rdev->faulty) {
-		printk(KERN_WARNING
-			"md: can not autostart based on faulty %s!\n",
-			bdev_partition_name(start_rdev->bdev));
-		export_rdev(start_rdev);
-		return err;
-	}
-	list_add(&start_rdev->same_set, &pending_raid_disks);
-
-	for (i = 0; i < MD_SB_DISKS; i++) {
-		mdp_disk_t *desc;
-		dev_t dev;
-
-		desc = sb->disks + i;
-		dev = MKDEV(desc->major, desc->minor);
-
-		if (!dev)
-			continue;
-		if (dev == startdev)
-			continue;
-		rdev = md_import_device(dev, 0, 0);
-		if (IS_ERR(rdev)) {
-			printk(KERN_WARNING "md: could not import %s,"
-				" trying to run array nevertheless.\n",
-				partition_name(dev));
-			continue;
-		}
-		list_add(&rdev->same_set, &pending_raid_disks);
-	}
-
-	/*
-	 * possibly return codes
-	 */
-	autorun_devices();
-	return 0;
-
-}
-
-
-static int get_version(void * arg)
-{
-	mdu_version_t ver;
-
-	ver.major = MD_MAJOR_VERSION;
-	ver.minor = MD_MINOR_VERSION;
-	ver.patchlevel = MD_PATCHLEVEL_VERSION;
-
-	if (copy_to_user(arg, &ver, sizeof(ver)))
-		return -EFAULT;
-
-	return 0;
-}
-
-static int get_array_info(mddev_t * mddev, void * arg)
-{
-	mdu_array_info_t info;
-	int nr,working,active,failed,spare;
-	mdk_rdev_t *rdev;
-	struct list_head *tmp;
-
-	nr=working=active=failed=spare=0;
-	ITERATE_RDEV(mddev,rdev,tmp) {
-		nr++;
-		if (rdev->faulty)
-			failed++;
-		else {
-			working++;
-			if (rdev->in_sync)
-				active++;
-			else
-				spare++;
-		}
-	}
-
-	info.major_version = mddev->major_version;
-	info.minor_version = mddev->minor_version;
-	info.patch_version = 1;
-	info.ctime         = mddev->ctime;
-	info.level         = mddev->level;
-	info.size          = mddev->size;
-	info.nr_disks      = nr;
-	info.raid_disks    = mddev->raid_disks;
-	info.md_minor      = mddev->__minor;
-	info.not_persistent= !mddev->persistent;
-
-	info.utime         = mddev->utime;
-	info.state         = 0;
-	if (mddev->in_sync)
-		info.state = (1<<MD_SB_CLEAN);
-	info.active_disks  = active;
-	info.working_disks = working;
-	info.failed_disks  = failed;
-	info.spare_disks   = spare;
-
-	info.layout        = mddev->layout;
-	info.chunk_size    = mddev->chunk_size;
-
-	if (copy_to_user(arg, &info, sizeof(info)))
-		return -EFAULT;
-
-	return 0;
-}
-
-static int get_disk_info(mddev_t * mddev, void * arg)
-{
-	mdu_disk_info_t info;
-	unsigned int nr;
-	mdk_rdev_t *rdev;
-
-	if (copy_from_user(&info, arg, sizeof(info)))
-		return -EFAULT;
-
-	nr = info.number;
-
-	rdev = find_rdev_nr(mddev, nr);
-	if (rdev) {
-		info.major = MAJOR(rdev->bdev->bd_dev);
-		info.minor = MINOR(rdev->bdev->bd_dev);
-		info.raid_disk = rdev->raid_disk;
-		info.state = 0;
-		if (rdev->faulty)
-			info.state |= (1<<MD_DISK_FAULTY);
-		else if (rdev->in_sync) {
-			info.state |= (1<<MD_DISK_ACTIVE);
-			info.state |= (1<<MD_DISK_SYNC);
-		}
-	} else {
-		info.major = info.minor = 0;
-		info.raid_disk = -1;
-		info.state = (1<<MD_DISK_REMOVED);
-	}
-
-	if (copy_to_user(arg, &info, sizeof(info)))
-		return -EFAULT;
-
-	return 0;
-}
-
-static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
-{
-	mdk_rdev_t *rdev;
-	dev_t dev;
-	dev = MKDEV(info->major,info->minor);
-	if (!mddev->raid_disks) {
-		int err;
-		/* expecting a device which has a superblock */
-		rdev = md_import_device(dev, mddev->major_version, mddev->minor_version);
-		if (IS_ERR(rdev)) {
-			printk(KERN_WARNING
-				"md: md_import_device returned %ld\n",
-				PTR_ERR(rdev));
-			return PTR_ERR(rdev);
-		}
-		if (!list_empty(&mddev->disks)) {
-			mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
-							mdk_rdev_t, same_set);
-			int err = super_types[mddev->major_version]
-				.load_super(rdev, rdev0, mddev->minor_version);
-			if (err < 0) {
-				printk(KERN_WARNING
-					"md: %s has different UUID to %s\n",
-					bdev_partition_name(rdev->bdev),
-					bdev_partition_name(rdev0->bdev));
-				export_rdev(rdev);
-				return -EINVAL;
-			}
-		}
-		err = bind_rdev_to_array(rdev, mddev);
-		if (err)
-			export_rdev(rdev);
-		return err;
-	}
-
-	/*
-	 * add_new_disk can be used once the array is assembled
-	 * to add "hot spares".  They must already have a superblock
-	 * written
-	 */
-	if (mddev->pers) {
-		int err;
-		if (!mddev->pers->hot_add_disk) {
-			printk(KERN_WARNING
-				"md%d: personality does not support diskops!\n",
-				mdidx(mddev));
-			return -EINVAL;
-		}
-		rdev = md_import_device(dev, mddev->major_version,
-					mddev->minor_version);
-		if (IS_ERR(rdev)) {
-			printk(KERN_WARNING
-				"md: md_import_device returned %ld\n",
-				PTR_ERR(rdev));
-			return PTR_ERR(rdev);
-		}
-		rdev->in_sync = 0; /* just to be sure */
-		rdev->raid_disk = -1;
-		err = bind_rdev_to_array(rdev, mddev);
-		if (err)
-			export_rdev(rdev);
-		if (mddev->thread)
-			md_wakeup_thread(mddev->thread);
-		return err;
-	}
-
-	/* otherwise, add_new_disk is only allowed
-	 * for major_version==0 superblocks
-	 */
-	if (mddev->major_version != 0) {
-		printk(KERN_WARNING "md%d: ADD_NEW_DISK not supported\n",
-		       mdidx(mddev));
-		return -EINVAL;
-	}
-
-	if (!(info->state & (1<<MD_DISK_FAULTY))) {
-		int err;
-		rdev = md_import_device (dev, -1, 0);
-		if (IS_ERR(rdev)) {
-			printk(KERN_WARNING
-				"md: error, md_import_device() returned %ld\n",
-				PTR_ERR(rdev));
-			return PTR_ERR(rdev);
-		}
-		rdev->desc_nr = info->number;
-		if (info->raid_disk < mddev->raid_disks)
-			rdev->raid_disk = info->raid_disk;
-		else
-			rdev->raid_disk = -1;
-
-		rdev->faulty = 0;
-		if (rdev->raid_disk < mddev->raid_disks)
-			rdev->in_sync = (info->state & (1<<MD_DISK_SYNC));
-		else
-			rdev->in_sync = 0;
-
-		err = bind_rdev_to_array(rdev, mddev);
-		if (err) {
-			export_rdev(rdev);
-			return err;
-		}
-
-		if (!mddev->persistent) {
-			printk(KERN_INFO "md: nonpersistent superblock ...\n");
-			rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
-		} else
-			rdev->sb_offset = calc_dev_sboffset(rdev->bdev);
-		rdev->size = calc_dev_size(rdev, mddev->chunk_size);
-
-		if (!mddev->size || (mddev->size > rdev->size))
-			mddev->size = rdev->size;
-	}
-
-	return 0;
-}
-
-static int hot_generate_error(mddev_t * mddev, dev_t dev)
-{
-	struct request_queue *q;
-	mdk_rdev_t *rdev;
-
-	if (!mddev->pers)
-		return -ENODEV;
-
-	printk(KERN_INFO "md: trying to generate %s error in md%d ... \n",
-		partition_name(dev), mdidx(mddev));
-
-	rdev = find_rdev(mddev, dev);
-	if (!rdev) {
-		MD_BUG();
-		return -ENXIO;
-	}
-
-	if (rdev->desc_nr == -1) {
-		MD_BUG();
-		return -EINVAL;
-	}
-	if (!rdev->in_sync)
-		return -ENODEV;
-
-	q = bdev_get_queue(rdev->bdev);
-	if (!q) {
-		MD_BUG();
-		return -ENODEV;
-	}
-	printk(KERN_INFO "md: okay, generating error!\n");
-//	q->oneshot_error = 1; // disabled for now
-
-	return 0;
-}
-
-static int hot_remove_disk(mddev_t * mddev, dev_t dev)
-{
-	mdk_rdev_t *rdev;
-
-	if (!mddev->pers)
-		return -ENODEV;
-
-	printk(KERN_INFO "md: trying to remove %s from md%d ... \n",
-		partition_name(dev), mdidx(mddev));
-
-	rdev = find_rdev(mddev, dev);
-	if (!rdev)
-		return -ENXIO;
-
-	if (rdev->raid_disk >= 0)
-		goto busy;
-
-	kick_rdev_from_array(rdev);
-	md_update_sb(mddev);
-
-	return 0;
-busy:
-	printk(KERN_WARNING "md: cannot remove active disk %s from md%d ... \n",
-		bdev_partition_name(rdev->bdev), mdidx(mddev));
-	return -EBUSY;
-}
-
-static int hot_add_disk(mddev_t * mddev, dev_t dev)
-{
-	int err;
-	unsigned int size;
-	mdk_rdev_t *rdev;
-
-	if (!mddev->pers)
-		return -ENODEV;
-
-	printk(KERN_INFO "md: trying to hot-add %s to md%d ... \n",
-		partition_name(dev), mdidx(mddev));
-
-	if (mddev->major_version != 0) {
-		printk(KERN_WARNING "md%d: HOT_ADD may only be used with"
-			" version-0 superblocks.\n",
-			mdidx(mddev));
-		return -EINVAL;
-	}
-	if (!mddev->pers->hot_add_disk) {
-		printk(KERN_WARNING
-			"md%d: personality does not support diskops!\n",
-			mdidx(mddev));
-		return -EINVAL;
-	}
-
-	rdev = md_import_device (dev, -1, 0);
-	if (IS_ERR(rdev)) {
-		printk(KERN_WARNING
-			"md: error, md_import_device() returned %ld\n",
-			PTR_ERR(rdev));
-		return -EINVAL;
-	}
-
-	rdev->sb_offset = calc_dev_sboffset(rdev->bdev);
-	size = calc_dev_size(rdev, mddev->chunk_size);
-	rdev->size = size;
-
-	if (size < mddev->size) {
-		printk(KERN_WARNING
-			"md%d: disk size %llu blocks < array size %llu\n",
-			mdidx(mddev), (unsigned long long)size,
-			(unsigned long long)mddev->size);
-		err = -ENOSPC;
-		goto abort_export;
-	}
-
-	if (rdev->faulty) {
-		printk(KERN_WARNING
-			"md: can not hot-add faulty %s disk to md%d!\n",
-			bdev_partition_name(rdev->bdev), mdidx(mddev));
-		err = -EINVAL;
-		goto abort_export;
-	}
-	rdev->in_sync = 0;
-	rdev->desc_nr = -1;
-	bind_rdev_to_array(rdev, mddev);
-
-	/*
-	 * The rest should better be atomic, we can have disk failures
-	 * noticed in interrupt contexts ...
-	 */
-
-	if (rdev->desc_nr == mddev->max_disks) {
-		printk(KERN_WARNING "md%d: can not hot-add to full array!\n",
-			mdidx(mddev));
-		err = -EBUSY;
-		goto abort_unbind_export;
-	}
-
-	rdev->raid_disk = -1;
-
-	md_update_sb(mddev);
-
-	/*
-	 * Kick recovery, maybe this spare has to be added to the
-	 * array immediately.
-	 */
-	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_wakeup_thread(mddev->thread);
-
-	return 0;
-
-abort_unbind_export:
-	unbind_rdev_from_array(rdev);
-
-abort_export:
-	export_rdev(rdev);
-	return err;
-}
-
-/*
- * set_array_info is used two different ways
- * The original usage is when creating a new array.
- * In this usage, raid_disks is > 0 and it together with
- *  level, size, not_persistent,layout,chunksize determine the
- *  shape of the array.
- *  This will always create an array with a type-0.90.0 superblock.
- * The newer usage is when assembling an array.
- *  In this case raid_disks will be 0, and the major_version field is
- *  use to determine which style super-blocks are to be found on the devices.
- *  The minor and patch _version numbers are also kept incase the
- *  super_block handler wishes to interpret them.
- */
-static int set_array_info(mddev_t * mddev, mdu_array_info_t *info)
-{
-
-	if (info->raid_disks == 0) {
-		/* just setting version number for superblock loading */
-		if (info->major_version < 0 ||
-		    info->major_version >= sizeof(super_types)/sizeof(super_types[0]) ||
-		    super_types[info->major_version].name == NULL) {
-			/* maybe try to auto-load a module? */
-			printk(KERN_INFO
-				"md: superblock version %d not known\n",
-				info->major_version);
-			return -EINVAL;
-		}
-		mddev->major_version = info->major_version;
-		mddev->minor_version = info->minor_version;
-		mddev->patch_version = info->patch_version;
-		return 0;
-	}
-	mddev->major_version = MD_MAJOR_VERSION;
-	mddev->minor_version = MD_MINOR_VERSION;
-	mddev->patch_version = MD_PATCHLEVEL_VERSION;
-	mddev->ctime         = get_seconds();
-
-	mddev->level         = info->level;
-	mddev->size          = info->size;
-	mddev->raid_disks    = info->raid_disks;
-	/* don't set __minor, it is determined by which /dev/md* was
-	 * openned
-	 */
-	if (info->state & (1<<MD_SB_CLEAN))
-		mddev->recovery_cp = MaxSector;
-	else
-		mddev->recovery_cp = 0;
-	mddev->persistent    = !info->not_persistent;
-
-	mddev->layout        = info->layout;
-	mddev->chunk_size    = info->chunk_size;
-
-	mddev->max_disks     = MD_SB_DISKS;
-
-
-	/*
-	 * Generate a 128 bit UUID
-	 */
-	get_random_bytes(mddev->uuid, 16);
-
-	return 0;
-}
-
-static int set_disk_faulty(mddev_t *mddev, dev_t dev)
-{
-	mdk_rdev_t *rdev;
-
-	rdev = find_rdev(mddev, dev);
-	if (!rdev)
-		return 0;
-
-	md_error(mddev, rdev);
-	return 1;
-}
-
-static int md_ioctl(struct inode *inode, struct file *file,
-			unsigned int cmd, unsigned long arg)
-{
-	unsigned int minor;
-	int err = 0;
-	struct hd_geometry *loc = (struct hd_geometry *) arg;
-	mddev_t *mddev = NULL;
-	kdev_t dev;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EACCES;
-
-	dev = inode->i_rdev;
-	minor = minor(dev);
-	if (minor >= MAX_MD_DEVS) {
-		MD_BUG();
-		return -EINVAL;
-	}
-
-	/*
-	 * Commands dealing with the RAID driver but not any
-	 * particular array:
-	 */
-	switch (cmd)
-	{
-		case RAID_VERSION:
-			err = get_version((void *)arg);
-			goto done;
-
-		case PRINT_RAID_DEBUG:
-			err = 0;
-			md_print_devices();
-			goto done;
-
-#ifndef MODULE
-		case RAID_AUTORUN:
-			err = 0;
-			autostart_arrays();
-			goto done;
-#endif
-		default:;
-	}
-
-	/*
-	 * Commands creating/starting a new array:
-	 */
-
-	mddev = inode->i_bdev->bd_inode->u.generic_ip;
-
-	if (!mddev) {
-		BUG();
-		goto abort;
-	}
-
-
-	if (cmd == START_ARRAY) {
-		/* START_ARRAY doesn't need to lock the array as autostart_array
-		 * does the locking, and it could even be a different array
-		 */
-		err = autostart_array(arg);
-		if (err) {
-			printk(KERN_WARNING "md: autostart %s failed!\n",
-				partition_name(arg));
-			goto abort;
-		}
-		goto done;
-	}
-
-	err = mddev_lock(mddev);
-	if (err) {
-		printk(KERN_INFO
-			"md: ioctl lock interrupted, reason %d, cmd %d\n",
-			err, cmd);
-		goto abort;
-	}
-
-	switch (cmd)
-	{
-		case SET_ARRAY_INFO:
-
-			if (!list_empty(&mddev->disks)) {
-				printk(KERN_WARNING
-					"md: array md%d already has disks!\n",
-					mdidx(mddev));
-				err = -EBUSY;
-				goto abort_unlock;
-			}
-			if (mddev->raid_disks) {
-				printk(KERN_WARNING
-					"md: array md%d already initialised!\n",
-					mdidx(mddev));
-				err = -EBUSY;
-				goto abort_unlock;
-			}
-			{
-				mdu_array_info_t info;
-				if (!arg)
-					memset(&info, 0, sizeof(info));
-				else if (copy_from_user(&info, (void*)arg, sizeof(info))) {
-					err = -EFAULT;
-					goto abort_unlock;
-				}
-				err = set_array_info(mddev, &info);
-				if (err) {
-					printk(KERN_WARNING "md: couldn't set"
-						" array info. %d\n", err);
-					goto abort_unlock;
-				}
-			}
-			goto done_unlock;
-
-		default:;
-	}
-
-	/*
-	 * Commands querying/configuring an existing array:
-	 */
-	/* if we are not initialised yet, only ADD_NEW_DISK or STOP_ARRAY is allowed */
-	if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY) {
-		err = -ENODEV;
-		goto abort_unlock;
-	}
-
-	/*
-	 * Commands even a read-only array can execute:
-	 */
-	switch (cmd)
-	{
-		case GET_ARRAY_INFO:
-			err = get_array_info(mddev, (void *)arg);
-			goto done_unlock;
-
-		case GET_DISK_INFO:
-			err = get_disk_info(mddev, (void *)arg);
-			goto done_unlock;
-
-		case RESTART_ARRAY_RW:
-			err = restart_array(mddev);
-			goto done_unlock;
-
-		case STOP_ARRAY:
-			err = do_md_stop (mddev, 0);
-			goto done_unlock;
-
-		case STOP_ARRAY_RO:
-			err = do_md_stop (mddev, 1);
-			goto done_unlock;
-
-	/*
-	 * We have a problem here : there is no easy way to give a CHS
-	 * virtual geometry. We currently pretend that we have a 2 heads
-	 * 4 sectors (with a BIG number of cylinders...). This drives
-	 * dosfs just mad... ;-)
-	 */
-		case HDIO_GETGEO:
-			if (!loc) {
-				err = -EINVAL;
-				goto abort_unlock;
-			}
-			err = put_user (2, (char *) &loc->heads);
-			if (err)
-				goto abort_unlock;
-			err = put_user (4, (char *) &loc->sectors);
-			if (err)
-				goto abort_unlock;
-			err = put_user(get_capacity(disks[mdidx(mddev)])/8,
-						(short *) &loc->cylinders);
-			if (err)
-				goto abort_unlock;
-			err = put_user (get_start_sect(inode->i_bdev),
-						(long *) &loc->start);
-			goto done_unlock;
-	}
-
-	/*
-	 * The remaining ioctls are changing the state of the
-	 * superblock, so we do not allow read-only arrays
-	 * here:
-	 */
-	if (mddev->ro) {
-		err = -EROFS;
-		goto abort_unlock;
-	}
-
-	switch (cmd)
-	{
-		case ADD_NEW_DISK:
-		{
-			mdu_disk_info_t info;
-			if (copy_from_user(&info, (void*)arg, sizeof(info)))
-				err = -EFAULT;
-			else
-				err = add_new_disk(mddev, &info);
-			goto done_unlock;
-		}
-		case HOT_GENERATE_ERROR:
-			err = hot_generate_error(mddev, arg);
-			goto done_unlock;
-		case HOT_REMOVE_DISK:
-			err = hot_remove_disk(mddev, arg);
-			goto done_unlock;
-
-		case HOT_ADD_DISK:
-			err = hot_add_disk(mddev, arg);
-			goto done_unlock;
-
-		case SET_DISK_FAULTY:
-			err = set_disk_faulty(mddev, arg);
-			goto done_unlock;
-
-		case RUN_ARRAY:
-		{
-			err = do_md_run (mddev);
-			/*
-			 * we have to clean up the mess if
-			 * the array cannot be run for some
-			 * reason ...
-			 * ->pers will not be set, to superblock will
-			 * not be updated.
-			 */
-			if (err)
-				do_md_stop (mddev, 0);
-			goto done_unlock;
-		}
-
-		default:
-			if (_IOC_TYPE(cmd) == MD_MAJOR)
-				printk(KERN_WARNING "md: %s(pid %d) used"
-					" obsolete MD ioctl, upgrade your"
-					" software to use new ictls.\n",
-					current->comm, current->pid);
-			err = -EINVAL;
-			goto abort_unlock;
-	}
-
-done_unlock:
-abort_unlock:
-	mddev_unlock(mddev);
-
-	return err;
-done:
-	if (err)
-		MD_BUG();
-abort:
-	return err;
-}
-
-static int md_open(struct inode *inode, struct file *file)
-{
-	/*
-	 * Succeed if we can find or allocate a mddev structure.
-	 */
-	mddev_t *mddev = mddev_find(minor(inode->i_rdev));
-	int err = -ENOMEM;
-
-	if (!mddev)
-		goto out;
-
-	if ((err = mddev_lock(mddev)))
-		goto put;
-
-	err = 0;
-	mddev_unlock(mddev);
-	inode->i_bdev->bd_inode->u.generic_ip = mddev_get(mddev);
- put:
-	mddev_put(mddev);
- out:
-	return err;
-}
-
-static int md_release(struct inode *inode, struct file * file)
-{
-	mddev_t *mddev = inode->i_bdev->bd_inode->u.generic_ip;
-
-	if (!mddev)
-		BUG();
-	mddev_put(mddev);
-
-	return 0;
-}
-
-static struct block_device_operations md_fops =
-{
-	.owner		= THIS_MODULE,
-	.open		= md_open,
-	.release	= md_release,
-	.ioctl		= md_ioctl,
-};
-
-int md_thread(void * arg)
-{
-	mdk_thread_t *thread = arg;
-
-	lock_kernel();
-
-	/*
-	 * Detach thread
-	 */
-
-	daemonize(thread->name, mdidx(thread->mddev));
-
-	current->exit_signal = SIGCHLD;
-	allow_signal(SIGKILL);
-	thread->tsk = current;
-
-	/*
-	 * md_thread is a 'system-thread', it's priority should be very
-	 * high. We avoid resource deadlocks individually in each
-	 * raid personality. (RAID5 does preallocation) We also use RR and
-	 * the very same RT priority as kswapd, thus we will never get
-	 * into a priority inversion deadlock.
-	 *
-	 * we definitely have to have equal or higher priority than
-	 * bdflush, otherwise bdflush will deadlock if there are too
-	 * many dirty RAID5 blocks.
-	 */
-	unlock_kernel();
-
-	complete(thread->event);
-	while (thread->run) {
-		void (*run)(mddev_t *);
-
-		wait_event_interruptible(thread->wqueue,
-					 test_bit(THREAD_WAKEUP, &thread->flags));
-		if (current->flags & PF_FREEZE)
-			refrigerator(PF_IOTHREAD);
-
-		clear_bit(THREAD_WAKEUP, &thread->flags);
-
-		run = thread->run;
-		if (run) {
-			run(thread->mddev);
-			blk_run_queues();
-		}
-		if (signal_pending(current))
-			flush_signals(current);
-	}
-	complete(thread->event);
-	return 0;
-}
-
-void md_wakeup_thread(mdk_thread_t *thread)
-{
-	if (thread) {
-		dprintk("md: waking up MD thread %p.\n", thread);
-		set_bit(THREAD_WAKEUP, &thread->flags);
-		wake_up(&thread->wqueue);
-	}
-}
-
-mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev,
-				 const char *name)
-{
-	mdk_thread_t *thread;
-	int ret;
-	struct completion event;
-
-	thread = (mdk_thread_t *) kmalloc
-				(sizeof(mdk_thread_t), GFP_KERNEL);
-	if (!thread)
-		return NULL;
-
-	memset(thread, 0, sizeof(mdk_thread_t));
-	init_waitqueue_head(&thread->wqueue);
-
-	init_completion(&event);
-	thread->event = &event;
-	thread->run = run;
-	thread->mddev = mddev;
-	thread->name = name;
-	ret = kernel_thread(md_thread, thread, 0);
-	if (ret < 0) {
-		kfree(thread);
-		return NULL;
-	}
-	wait_for_completion(&event);
-	return thread;
-}
-
-void md_interrupt_thread(mdk_thread_t *thread)
-{
-	if (!thread->tsk) {
-		MD_BUG();
-		return;
-	}
-	dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid);
-	send_sig(SIGKILL, thread->tsk, 1);
-}
-
-void md_unregister_thread(mdk_thread_t *thread)
-{
-	struct completion event;
-
-	init_completion(&event);
-
-	thread->event = &event;
-	thread->run = NULL;
-	thread->name = NULL;
-	md_interrupt_thread(thread);
-	wait_for_completion(&event);
-	kfree(thread);
-}
-
-void md_error(mddev_t *mddev, mdk_rdev_t *rdev)
-{
-	dprintk("md_error dev:(%d:%d), rdev:(%d:%d), (caller: %p,%p,%p,%p).\n",
-		MD_MAJOR,mdidx(mddev),
-		MAJOR(rdev->bdev->bd_dev), MINOR(rdev->bdev->bd_dev),
-		__builtin_return_address(0),__builtin_return_address(1),
-		__builtin_return_address(2),__builtin_return_address(3));
-
-	if (!mddev) {
-		MD_BUG();
-		return;
-	}
-
-	if (!rdev || rdev->faulty)
-		return;
-	if (!mddev->pers->error_handler)
-		return;
-	mddev->pers->error_handler(mddev,rdev);
-	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_wakeup_thread(mddev->thread);
-}
-
-/* seq_file implementation /proc/mdstat */
-
-static void status_unused(struct seq_file *seq)
-{
-	int i = 0;
-	mdk_rdev_t *rdev;
-	struct list_head *tmp;
-
-	seq_printf(seq, "unused devices: ");
-
-	ITERATE_RDEV_PENDING(rdev,tmp) {
-		i++;
-		seq_printf(seq, "%s ",
-			      bdev_partition_name(rdev->bdev));
-	}
-	if (!i)
-		seq_printf(seq, "<none>");
-
-	seq_printf(seq, "\n");
-}
-
-
-static void status_resync(struct seq_file *seq, mddev_t * mddev)
-{
-	unsigned long max_blocks, resync, res, dt, db, rt;
-
-	resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2;
-	max_blocks = mddev->size;
-
-	/*
-	 * Should not happen.
-	 */
-	if (!max_blocks) {
-		MD_BUG();
-		return;
-	}
-	res = (resync/1024)*1000/(max_blocks/1024 + 1);
-	{
-		int i, x = res/50, y = 20-x;
-		seq_printf(seq, "[");
-		for (i = 0; i < x; i++)
-			seq_printf(seq, "=");
-		seq_printf(seq, ">");
-		for (i = 0; i < y; i++)
-			seq_printf(seq, ".");
-		seq_printf(seq, "] ");
-	}
-	seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)",
-		      (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?
-		       "resync" : "recovery"),
-		      res/10, res % 10, resync, max_blocks);
-
-	/*
-	 * We do not want to overflow, so the order of operands and
-	 * the * 100 / 100 trick are important. We do a +1 to be
-	 * safe against division by zero. We only estimate anyway.
- * - * dt: time from mark until now - * db: blocks written from mark until now - * rt: remaining time - */ - dt = ((jiffies - mddev->resync_mark) / HZ); - if (!dt) dt++; - db = resync - (mddev->resync_mark_cnt/2); - rt = (dt * ((max_blocks-resync) / (db/100+1)))/100; - - seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6); - - seq_printf(seq, " speed=%ldK/sec", db/dt); -} - -static void *md_seq_start(struct seq_file *seq, loff_t *pos) -{ - struct list_head *tmp; - loff_t l = *pos; - mddev_t *mddev; - - if (l > 0x10000) - return NULL; - if (!l--) - /* header */ - return (void*)1; - - spin_lock(&all_mddevs_lock); - list_for_each(tmp,&all_mddevs) - if (!l--) { - mddev = list_entry(tmp, mddev_t, all_mddevs); - mddev_get(mddev); - spin_unlock(&all_mddevs_lock); - return mddev; - } - spin_unlock(&all_mddevs_lock); - return (void*)2;/* tail */ -} - -static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) -{ - struct list_head *tmp; - mddev_t *next_mddev, *mddev = v; - - ++*pos; - if (v == (void*)2) - return NULL; - - spin_lock(&all_mddevs_lock); - if (v == (void*)1) - tmp = all_mddevs.next; - else - tmp = mddev->all_mddevs.next; - if (tmp != &all_mddevs) - next_mddev = mddev_get(list_entry(tmp,mddev_t,all_mddevs)); - else { - next_mddev = (void*)2; - *pos = 0x10000; - } - spin_unlock(&all_mddevs_lock); - - if (v != (void*)1) - mddev_put(mddev); - return next_mddev; - -} - -static void md_seq_stop(struct seq_file *seq, void *v) -{ - mddev_t *mddev = v; - - if (mddev && v != (void*)1 && v != (void*)2) - mddev_put(mddev); -} - -static int md_seq_show(struct seq_file *seq, void *v) -{ - mddev_t *mddev = v; - sector_t size; - struct list_head *tmp2; - mdk_rdev_t *rdev; - int i; - - if (v == (void*)1) { - seq_printf(seq, "Personalities : "); - spin_lock(&pers_lock); - for (i = 0; i < MAX_PERSONALITY; i++) - if (pers[i]) - seq_printf(seq, "[%s] ", pers[i]->name); - - spin_unlock(&pers_lock); - seq_printf(seq, "\n"); - return 0; - } - if (v == (void*)2) { - 
status_unused(seq); - return 0; - } - - if (mddev_lock(mddev)!=0) - return -EINTR; - if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { - seq_printf(seq, "md%d : %sactive", mdidx(mddev), - mddev->pers ? "" : "in"); - if (mddev->pers) { - if (mddev->ro) - seq_printf(seq, " (read-only)"); - seq_printf(seq, " %s", mddev->pers->name); - } - - size = 0; - ITERATE_RDEV(mddev,rdev,tmp2) { - seq_printf(seq, " %s[%d]", - bdev_partition_name(rdev->bdev), rdev->desc_nr); - if (rdev->faulty) { - seq_printf(seq, "(F)"); - continue; - } - size += rdev->size; - } - - if (!list_empty(&mddev->disks)) { - if (mddev->pers) - seq_printf(seq, "\n %llu blocks", - (unsigned long long)mddev->array_size); - else - seq_printf(seq, "\n %llu blocks", - (unsigned long long)size); - } - - if (mddev->pers) { - mddev->pers->status (seq, mddev); - seq_printf(seq, "\n "); - if (mddev->curr_resync > 2) - status_resync (seq, mddev); - else if (mddev->curr_resync == 1 || mddev->curr_resync == 2) - seq_printf(seq, " resync=DELAYED"); - } - - seq_printf(seq, "\n"); - } - mddev_unlock(mddev); - - return 0; -} - -static struct seq_operations md_seq_ops = { - .start = md_seq_start, - .next = md_seq_next, - .stop = md_seq_stop, - .show = md_seq_show, -}; - -static int md_seq_open(struct inode *inode, struct file *file) -{ - int error; - - error = seq_open(file, &md_seq_ops); - return error; -} - -static struct file_operations md_seq_fops = { - .open = md_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - -int register_md_personality(int pnum, mdk_personality_t *p) -{ - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - spin_lock(&pers_lock); - if (pers[pnum]) { - spin_unlock(&pers_lock); - MD_BUG(); - return -EBUSY; - } - - pers[pnum] = p; - printk(KERN_INFO "md: %s personality registered as nr %d\n", p->name, pnum); - spin_unlock(&pers_lock); - return 0; -} - -int unregister_md_personality(int pnum) -{ - if (pnum >= MAX_PERSONALITY) { - 
MD_BUG(); - return -EINVAL; - } - - printk(KERN_INFO "md: %s personality unregistered\n", pers[pnum]->name); - spin_lock(&pers_lock); - pers[pnum] = NULL; - spin_unlock(&pers_lock); - return 0; -} - -void md_sync_acct(mdk_rdev_t *rdev, unsigned long nr_sectors) -{ - rdev->bdev->bd_contains->bd_disk->sync_io += nr_sectors; -} - -static int is_mddev_idle(mddev_t *mddev) -{ - mdk_rdev_t * rdev; - struct list_head *tmp; - int idle; - unsigned long curr_events; - - idle = 1; - ITERATE_RDEV(mddev,rdev,tmp) { - struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; - curr_events = disk_stat_read(disk, read_sectors) + - disk_stat_read(disk, write_sectors) - - disk->sync_io; - if ((curr_events - rdev->last_events) > 32) { - rdev->last_events = curr_events; - idle = 0; - } - } - return idle; -} - -void md_done_sync(mddev_t *mddev, int blocks, int ok) -{ - /* another "blocks" (512byte) blocks have been synced */ - atomic_sub(blocks, &mddev->recovery_active); - wake_up(&mddev->recovery_wait); - if (!ok) { - set_bit(MD_RECOVERY_ERR, &mddev->recovery); - md_wakeup_thread(mddev->thread); - // stop recovery, signal do_sync .... 
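The idle test in `is_mddev_idle()` above can be sketched standalone: a member disk counts as busy when its read+write sector counter, minus the sectors the resync itself issued (`sync_io`), advanced by more than 32 since the previous poll. The struct and function names below are illustrative stand-ins, not the kernel's; only the 32-sector threshold and the subtraction of sync traffic come from the quoted code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the is_mddev_idle() heuristic: a disk is busy
 * when its non-resync I/O advanced by more than 32 sectors since the
 * last poll. Names are illustrative, not kernel identifiers. */
struct disk_stats {
    unsigned long read_sectors;
    unsigned long write_sectors;
    unsigned long sync_io;      /* sectors issued by the resync itself */
    unsigned long last_events;  /* snapshot taken at the previous poll */
};

static bool disk_idle(struct disk_stats *d)
{
    unsigned long curr = d->read_sectors + d->write_sectors - d->sync_io;
    if (curr - d->last_events > 32) {
        d->last_events = curr;  /* re-arm the detector for the next poll */
        return false;
    }
    return true;
}
```

Note how resync-generated sectors are subtracted out first, so the resync never throttles itself by mistaking its own I/O for foreground traffic.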
- } -} - - -void md_write_start(mddev_t *mddev) -{ - if (!atomic_read(&mddev->writes_pending)) { - mddev_lock_uninterruptible(mddev); - if (mddev->in_sync) { - mddev->in_sync = 0; - del_timer(&mddev->safemode_timer); - md_update_sb(mddev); - } - atomic_inc(&mddev->writes_pending); - mddev_unlock(mddev); - } else - atomic_inc(&mddev->writes_pending); -} - -void md_write_end(mddev_t *mddev) -{ - if (atomic_dec_and_test(&mddev->writes_pending)) { - if (mddev->safemode == 2) - md_wakeup_thread(mddev->thread); - else - mod_timer(&mddev->safemode_timer, jiffies + mddev->safemode_delay); - } -} - -static inline void md_enter_safemode(mddev_t *mddev) -{ - mddev_lock_uninterruptible(mddev); - if (mddev->safemode && !atomic_read(&mddev->writes_pending) && - !mddev->in_sync && mddev->recovery_cp == MaxSector) { - mddev->in_sync = 1; - md_update_sb(mddev); - } - mddev_unlock(mddev); - - if (mddev->safemode == 1) - mddev->safemode = 0; -} - -void md_handle_safemode(mddev_t *mddev) -{ - if (signal_pending(current)) { - printk(KERN_INFO "md: md%d in immediate safe mode\n", - mdidx(mddev)); - mddev->safemode = 2; - flush_signals(current); - } - if (mddev->safemode) - md_enter_safemode(mddev); -} - - -DECLARE_WAIT_QUEUE_HEAD(resync_wait); - -#define SYNC_MARKS 10 -#define SYNC_MARK_STEP (3*HZ) -static void md_do_sync(mddev_t *mddev) -{ - mddev_t *mddev2; - unsigned int max_sectors, currspeed = 0, - j, window; - unsigned long mark[SYNC_MARKS]; - unsigned long mark_cnt[SYNC_MARKS]; - int last_mark,m; - struct list_head *tmp; - unsigned long last_check; - - /* just incase thread restarts... */ - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) - return; - - /* we overload curr_resync somewhat here. 
- * 0 == not engaged in resync at all - * 2 == checking that there is no conflict with another sync - * 1 == like 2, but have yielded to allow conflicting resync to - * commense - * other == active in resync - this many blocks - */ - do { - mddev->curr_resync = 2; - - ITERATE_MDDEV(mddev2,tmp) { - if (mddev2 == mddev) - continue; - if (mddev2->curr_resync && - match_mddev_units(mddev,mddev2)) { - printk(KERN_INFO "md: delaying resync of md%d" - " until md%d has finished resync (they" - " share one or more physical units)\n", - mdidx(mddev), mdidx(mddev2)); - if (mddev < mddev2) {/* arbitrarily yield */ - mddev->curr_resync = 1; - wake_up(&resync_wait); - } - if (wait_event_interruptible(resync_wait, - mddev2->curr_resync < mddev->curr_resync)) { - flush_signals(current); - mddev_put(mddev2); - goto skip; - } - } - if (mddev->curr_resync == 1) { - mddev_put(mddev2); - break; - } - } - } while (mddev->curr_resync < 2); - - max_sectors = mddev->size << 1; - - printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev)); - printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:" - " %d KB/sec/disc.\n", sysctl_speed_limit_min); - printk(KERN_INFO "md: using maximum available idle IO bandwith " - "(but not more than %d KB/sec) for reconstruction.\n", - sysctl_speed_limit_max); - - is_mddev_idle(mddev); /* this also initializes IO event counters */ - if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) - j = mddev->recovery_cp; - else - j = 0; - for (m = 0; m < SYNC_MARKS; m++) { - mark[m] = jiffies; - mark_cnt[m] = j; - } - last_mark = 0; - mddev->resync_mark = mark[last_mark]; - mddev->resync_mark_cnt = mark_cnt[last_mark]; - - /* - * Tune reconstruction: - */ - window = 32*(PAGE_SIZE/512); - printk(KERN_INFO "md: using %dk window, over a total of %d blocks.\n", - window/2,max_sectors/2); - - atomic_set(&mddev->recovery_active, 0); - init_waitqueue_head(&mddev->recovery_wait); - last_check = 0; - - if (j) - printk(KERN_INFO - "md: resuming recovery of md%d from 
checkpoint.\n", - mdidx(mddev)); - - while (j < max_sectors) { - int sectors; - - sectors = mddev->pers->sync_request(mddev, j, currspeed < sysctl_speed_limit_min); - if (sectors < 0) { - set_bit(MD_RECOVERY_ERR, &mddev->recovery); - goto out; - } - atomic_add(sectors, &mddev->recovery_active); - j += sectors; - if (j>1) mddev->curr_resync = j; - - if (last_check + window > j) - continue; - - last_check = j; - - if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) || - test_bit(MD_RECOVERY_ERR, &mddev->recovery)) - break; - - blk_run_queues(); - - repeat: - if (jiffies >= mark[last_mark] + SYNC_MARK_STEP ) { - /* step marks */ - int next = (last_mark+1) % SYNC_MARKS; - - mddev->resync_mark = mark[next]; - mddev->resync_mark_cnt = mark_cnt[next]; - mark[next] = jiffies; - mark_cnt[next] = j - atomic_read(&mddev->recovery_active); - last_mark = next; - } - - - if (signal_pending(current)) { - /* - * got a signal, exit. - */ - printk(KERN_INFO - "md: md_do_sync() got signal ... exiting\n"); - flush_signals(current); - set_bit(MD_RECOVERY_INTR, &mddev->recovery); - goto out; - } - - /* - * this loop exits only if either when we are slower than - * the 'hard' speed limit, or the system was IO-idle for - * a jiffy. - * the system might be non-idle CPU-wise, but we only care - * about not overloading the IO subsystem. 
(things like an - * e2fsck being done on the RAID array should execute fast) - */ - cond_resched(); - - currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1; - - if (currspeed > sysctl_speed_limit_min) { - if ((currspeed > sysctl_speed_limit_max) || - !is_mddev_idle(mddev)) { - current->state = TASK_INTERRUPTIBLE; - schedule_timeout(HZ/4); - goto repeat; - } - } - } - printk(KERN_INFO "md: md%d: sync done.\n",mdidx(mddev)); - /* - * this also signals 'finished resyncing' to md_stop - */ - out: - wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); - - /* tell personality that we are finished */ - mddev->pers->sync_request(mddev, max_sectors, 1); - - if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) && - mddev->curr_resync > 2 && - mddev->curr_resync > mddev->recovery_cp) { - if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { - printk(KERN_INFO - "md: checkpointing recovery of md%d.\n", - mdidx(mddev)); - mddev->recovery_cp = mddev->curr_resync; - } else - mddev->recovery_cp = MaxSector; - } - - if (mddev->safemode) - md_enter_safemode(mddev); - skip: - mddev->curr_resync = 0; - set_bit(MD_RECOVERY_DONE, &mddev->recovery); - md_wakeup_thread(mddev->thread); -} - - -/* - * This routine is regularly called by all per-raid-array threads to - * deal with generic issues like resync and super-block update. - * Raid personalities that don't have a thread (linear/raid0) do not - * need this as they never do any recovery or update the superblock. - * - * It does not do any resync itself, but rather "forks" off other threads - * to do that as needed. - * When it is determined that resync is needed, we set MD_RECOVERY_RUNNING in - * "->recovery" and create a thread at ->sync_thread. - * When the thread finishes it sets MD_RECOVERY_DONE (and might set MD_RECOVERY_ERR) - * and wakeups up this thread which will reap the thread and finish up. - * This thread also removes any faulty devices (with nr_pending == 0). 
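The throttle arithmetic quoted above (`currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1`) reduces to: sectors completed since the baseline mark, halved to get KB (512-byte sectors), divided by elapsed whole seconds plus one (avoiding division by zero), plus one so the result is never zero. A standalone sketch, with HZ fixed at 100 purely for illustration:

```c
#include <assert.h>

#define HZ 100  /* assumed tick rate for this sketch only */

/* Sketch of the resync speed estimate, in KB/sec. Parameter names are
 * illustrative; the formula mirrors the one quoted from md_do_sync(). */
static unsigned long resync_speed(unsigned long sectors_done,
                                  unsigned long mark_sectors,
                                  unsigned long jiffies_now,
                                  unsigned long mark_jiffies)
{
    return (sectors_done - mark_sectors) / 2
           / ((jiffies_now - mark_jiffies) / HZ + 1) + 1;
}
```

The caller then compares this against the min/max sysctl limits: above the minimum, it only keeps going full speed while the member disks look idle; above the maximum, it always backs off.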
- * - * The overall approach is: - * 1/ if the superblock needs updating, update it. - * 2/ If a recovery thread is running, don't do anything else. - * 3/ If recovery has finished, clean up, possibly marking spares active. - * 4/ If there are any faulty devices, remove them. - * 5/ If array is degraded, try to add spares devices - * 6/ If array has spares or is not in-sync, start a resync thread. - */ -void md_check_recovery(mddev_t *mddev) -{ - mdk_rdev_t *rdev; - struct list_head *rtmp; - - - dprintk(KERN_INFO "md: recovery thread got woken up ...\n"); - - if (mddev->ro) - return; - if ( ! ( - mddev->sb_dirty || - test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) || - test_bit(MD_RECOVERY_DONE, &mddev->recovery) - )) - return; - if (mddev_trylock(mddev)==0) { - int spares =0; - if (mddev->sb_dirty) - md_update_sb(mddev); - if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && - !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) - /* resync/recovery still happening */ - goto unlock; - if (mddev->sync_thread) { - /* resync has finished, collect result */ - md_unregister_thread(mddev->sync_thread); - mddev->sync_thread = NULL; - if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery)) { - /* success...*/ - /* activate any spares */ - mddev->pers->spare_active(mddev); - } - md_update_sb(mddev); - mddev->recovery = 0; - wake_up(&resync_wait); - goto unlock; - } - if (mddev->recovery) { - /* that's odd.. */ - mddev->recovery = 0; - wake_up(&resync_wait); - } - - /* no recovery is running. 
- * remove any failed drives, then - * add spares if possible - */ - ITERATE_RDEV(mddev,rdev,rtmp) { - if (rdev->raid_disk >= 0 && - rdev->faulty && - atomic_read(&rdev->nr_pending)==0) { - mddev->pers->hot_remove_disk(mddev, rdev->raid_disk); - rdev->raid_disk = -1; - } - if (!rdev->faulty && rdev->raid_disk >= 0 && !rdev->in_sync) - spares++; - } - if (mddev->degraded) { - ITERATE_RDEV(mddev,rdev,rtmp) - if (rdev->raid_disk < 0 - && !rdev->faulty) { - if (mddev->pers->hot_add_disk(mddev,rdev)) - spares++; - else - break; - } - } - - if (!spares && (mddev->recovery_cp == MaxSector )) { - /* nothing we can do ... */ - goto unlock; - } - if (mddev->pers->sync_request) { - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); - if (!spares) - set_bit(MD_RECOVERY_SYNC, &mddev->recovery); - mddev->sync_thread = md_register_thread(md_do_sync, - mddev, - "md%d_resync"); - if (!mddev->sync_thread) { - printk(KERN_ERR "md%d: could not start resync" - " thread...\n", - mdidx(mddev)); - /* leave the spares where they are, it shouldn't hurt */ - mddev->recovery = 0; - } else { - md_wakeup_thread(mddev->sync_thread); - } - } - unlock: - mddev_unlock(mddev); - } -} - -int md_notify_reboot(struct notifier_block *this, - unsigned long code, void *x) -{ - struct list_head *tmp; - mddev_t *mddev; - - if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) { - - printk(KERN_INFO "md: stopping all md devices.\n"); - - ITERATE_MDDEV(mddev,tmp) - if (mddev_trylock(mddev)==0) - do_md_stop (mddev, 1); - /* - * certain more exotic SCSI devices are known to be - * volatile wrt too early system reboots. While the - * right place to handle this issue is the given - * driver, we do want to have a safe RAID driver ... 
- */ - mdelay(1000*1); - } - return NOTIFY_DONE; -} - -struct notifier_block md_notifier = { - .notifier_call = md_notify_reboot, - .next = NULL, - .priority = INT_MAX, /* before any real devices */ -}; - -static void md_geninit(void) -{ - struct proc_dir_entry *p; - - dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t)); - -#ifdef CONFIG_PROC_FS - p = create_proc_entry("mdstat", S_IRUGO, NULL); - if (p) - p->proc_fops = &md_seq_fops; -#endif -} - -int __init md_init(void) -{ - int minor; - - printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d," - " MD_SB_DISKS=%d\n", - MD_MAJOR_VERSION, MD_MINOR_VERSION, - MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS); - - if (register_blkdev(MAJOR_NR, "md")) - return -1; - - devfs_mk_dir("md"); - blk_register_region(MKDEV(MAJOR_NR, 0), MAX_MD_DEVS, THIS_MODULE, - md_probe, NULL, NULL); - for (minor=0; minor < MAX_MD_DEVS; ++minor) { - char name[16]; - sprintf(name, "md/%d", minor); - devfs_register(NULL, name, DEVFS_FL_DEFAULT, MAJOR_NR, minor, - S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL); - } - - register_reboot_notifier(&md_notifier); - raid_table_header = register_sysctl_table(raid_root_table, 1); - - md_geninit(); - return (0); -} - - -#ifndef MODULE - -/* - * Searches all registered partitions for autorun RAID arrays - * at boot time. 
- */ -static dev_t detected_devices[128]; -static int dev_cnt; - -void md_autodetect_dev(dev_t dev) -{ - if (dev_cnt >= 0 && dev_cnt < 127) - detected_devices[dev_cnt++] = dev; -} - - -static void autostart_arrays(void) -{ - mdk_rdev_t *rdev; - int i; - - printk(KERN_INFO "md: Autodetecting RAID arrays.\n"); - - for (i = 0; i < dev_cnt; i++) { - dev_t dev = detected_devices[i]; - - rdev = md_import_device(dev,0, 0); - if (IS_ERR(rdev)) { - printk(KERN_ALERT "md: could not import %s!\n", - partition_name(dev)); - continue; - } - if (rdev->faulty) { - MD_BUG(); - continue; - } - list_add(&rdev->same_set, &pending_raid_disks); - } - dev_cnt = 0; - - autorun_devices(); -} - -#endif - -static __exit void md_exit(void) -{ - int i; - blk_unregister_region(MKDEV(MAJOR_NR,0), MAX_MD_DEVS); - for (i=0; i < MAX_MD_DEVS; i++) - devfs_remove("md/%d", i); - devfs_remove("md"); - - unregister_blkdev(MAJOR_NR,"md"); - unregister_reboot_notifier(&md_notifier); - unregister_sysctl_table(raid_table_header); -#ifdef CONFIG_PROC_FS - remove_proc_entry("mdstat", NULL); -#endif - for (i = 0; i < MAX_MD_DEVS; i++) { - struct gendisk *disk = disks[i]; - mddev_t *mddev; - if (!disks[i]) - continue; - mddev = disk->private_data; - del_gendisk(disk); - put_disk(disk); - mddev_put(mddev); - } -} - -module_init(md_init) -module_exit(md_exit) - -EXPORT_SYMBOL(register_md_personality); -EXPORT_SYMBOL(unregister_md_personality); -EXPORT_SYMBOL(md_error); -EXPORT_SYMBOL(md_sync_acct); -EXPORT_SYMBOL(md_done_sync); -EXPORT_SYMBOL(md_write_start); -EXPORT_SYMBOL(md_write_end); -EXPORT_SYMBOL(md_handle_safemode); -EXPORT_SYMBOL(md_register_thread); -EXPORT_SYMBOL(md_unregister_thread); -EXPORT_SYMBOL(md_wakeup_thread); -EXPORT_SYMBOL(md_print_devices); -EXPORT_SYMBOL(md_interrupt_thread); -EXPORT_SYMBOL(md_check_recovery); -MODULE_LICENSE("GPL"); ./linux/md/lmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- diff 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.197086032 
+0000 @@ -1,3680 +0,0 @@ -@@ -1,3674 +1,101 @@ --/* -- md.c : Multiple Devices driver for Linux -- Copyright (C) 1998, 1999, 2000 Ingo Molnar -- -- completely rewritten, based on the MD driver code from Marc Zyngier -- -- Changes: -- -- - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar -- - boot support for linear and striped mode by Harald Hoyer -- - kerneld support by Boris Tobotras -- - kmod support by: Cyrus Durgin -- - RAID0 bugfixes: Mark Anthony Lisher -- - Devfs support by Richard Gooch -- -- - lots of fixes and improvements to the RAID1/RAID5 and generic -- RAID code (such as request based resynchronization): -- -- Neil Brown . -- -- This program is free software; you can redistribute it and/or modify -- it under the terms of the GNU General Public License as published by -- the Free Software Foundation; either version 2, or (at your option) -- any later version. -- -- You should have received a copy of the GNU General Public License -- (for example /usr/src/linux/COPYING); if not, write to the Free -- Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. --*/ -- --#include --#include --#include --#include --#include --#include --#include --#include /* for invalidate_bdev */ --#include -- --#include -- --#ifdef CONFIG_KMOD --#include --#endif -- --#define __KERNEL_SYSCALLS__ --#include -- --#include -- --#define MAJOR_NR MD_MAJOR --#define MD_DRIVER --#define DEVICE_NR(device) (minor(device)) -- --#include -- --#define DEBUG 0 --#define dprintk(x...) ((void)(DEBUG && printk(x))) -- -- --#ifndef MODULE --static void autostart_arrays (void); --#endif -- --static mdk_personality_t *pers[MAX_PERSONALITY]; --static spinlock_t pers_lock = SPIN_LOCK_UNLOCKED; -- --/* -- * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit' -- * is 1000 KB/sec, so the extra system load does not show up that much. -- * Increase it if you want to have more _guaranteed_ speed. 
Note that -- * the RAID driver will use the maximum available bandwith if the IO -- * subsystem is idle. There is also an 'absolute maximum' reconstruction -- * speed limit - in case reconstruction slows down your system despite -- * idle IO detection. -- * -- * you can change it via /proc/sys/dev/raid/speed_limit_min and _max. -- */ -- --static int sysctl_speed_limit_min = 1000; --static int sysctl_speed_limit_max = 200000; -- --static struct ctl_table_header *raid_table_header; -- --static ctl_table raid_table[] = { -- { -- .ctl_name = DEV_RAID_SPEED_LIMIT_MIN, -- .procname = "speed_limit_min", -- .data = &sysctl_speed_limit_min, -- .maxlen = sizeof(int), -- .mode = 0644, -- .proc_handler = &proc_dointvec, -- }, -- { -- .ctl_name = DEV_RAID_SPEED_LIMIT_MAX, -- .procname = "speed_limit_max", -- .data = &sysctl_speed_limit_max, -- .maxlen = sizeof(int), -- .mode = 0644, -- .proc_handler = &proc_dointvec, -- }, -- { .ctl_name = 0 } --}; -- --static ctl_table raid_dir_table[] = { -- { -- .ctl_name = DEV_RAID, -- .procname = "raid", -- .maxlen = 0, -- .mode = 0555, -- .child = raid_table, -- }, -- { .ctl_name = 0 } --}; -- --static ctl_table raid_root_table[] = { -- { -- .ctl_name = CTL_DEV, -- .procname = "dev", -- .maxlen = 0, -- .mode = 0555, -- .child = raid_dir_table, -- }, -- { .ctl_name = 0 } --}; -- --static struct block_device_operations md_fops; -- --static struct gendisk *disks[MAX_MD_DEVS]; -- --/* -- * Enables to iterate over all existing md arrays -- * all_mddevs_lock protects this list as well as mddev_map. -- */ --static LIST_HEAD(all_mddevs); --static spinlock_t all_mddevs_lock = SPIN_LOCK_UNLOCKED; -- -- --/* -- * iterates through all used mddevs in the system. -- * We take care to grab the all_mddevs_lock whenever navigating -- * the list, and to always hold a refcount when unlocked. -- * Any code which breaks out of this loop while own -- * a reference to the current mddev and must mddev_put it. 
-- */ --#define ITERATE_MDDEV(mddev,tmp) \ -- \ -- for (({ spin_lock(&all_mddevs_lock); \ -- tmp = all_mddevs.next; \ -- mddev = NULL;}); \ -- ({ if (tmp != &all_mddevs) \ -- mddev_get(list_entry(tmp, mddev_t, all_mddevs));\ -- spin_unlock(&all_mddevs_lock); \ -- if (mddev) mddev_put(mddev); \ -- mddev = list_entry(tmp, mddev_t, all_mddevs); \ -- tmp != &all_mddevs;}); \ -- ({ spin_lock(&all_mddevs_lock); \ -- tmp = tmp->next;}) \ -- ) -- --static mddev_t *mddev_map[MAX_MD_DEVS]; -- --static int md_fail_request (request_queue_t *q, struct bio *bio) --{ -- bio_io_error(bio, bio->bi_size); -- return 0; --} -- --static inline mddev_t *mddev_get(mddev_t *mddev) --{ -- atomic_inc(&mddev->active); -- return mddev; --} -- --static void mddev_put(mddev_t *mddev) --{ -- if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) -- return; -- if (!mddev->raid_disks && list_empty(&mddev->disks)) { -- list_del(&mddev->all_mddevs); -- mddev_map[mdidx(mddev)] = NULL; -- kfree(mddev); -- MOD_DEC_USE_COUNT; -- } -- spin_unlock(&all_mddevs_lock); --} -- --static mddev_t * mddev_find(int unit) --{ -- mddev_t *mddev, *new = NULL; -- -- retry: -- spin_lock(&all_mddevs_lock); -- if (mddev_map[unit]) { -- mddev = mddev_get(mddev_map[unit]); -- spin_unlock(&all_mddevs_lock); -- if (new) -- kfree(new); -- return mddev; -- } -- if (new) { -- mddev_map[unit] = new; -- list_add(&new->all_mddevs, &all_mddevs); -- spin_unlock(&all_mddevs_lock); -- MOD_INC_USE_COUNT; -- return new; -- } -- spin_unlock(&all_mddevs_lock); -- -- new = (mddev_t *) kmalloc(sizeof(*new), GFP_KERNEL); -- if (!new) -- return NULL; -- -- memset(new, 0, sizeof(*new)); -- -- new->__minor = unit; -- init_MUTEX(&new->reconfig_sem); -- INIT_LIST_HEAD(&new->disks); -- INIT_LIST_HEAD(&new->all_mddevs); -- init_timer(&new->safemode_timer); -- atomic_set(&new->active, 1); -- blk_queue_make_request(&new->queue, md_fail_request); -- -- goto retry; --} -- --static inline int mddev_lock(mddev_t * mddev) --{ -- return 
down_interruptible(&mddev->reconfig_sem); --} -- --static inline void mddev_lock_uninterruptible(mddev_t * mddev) --{ -- down(&mddev->reconfig_sem); --} -- --static inline int mddev_trylock(mddev_t * mddev) --{ -- return down_trylock(&mddev->reconfig_sem); --} -- --static inline void mddev_unlock(mddev_t * mddev) --{ -- up(&mddev->reconfig_sem); --} -- --mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) --{ -- mdk_rdev_t * rdev; -- struct list_head *tmp; -- -- ITERATE_RDEV(mddev,rdev,tmp) { -- if (rdev->desc_nr == nr) -- return rdev; -- } -- return NULL; --} -- --static mdk_rdev_t * find_rdev(mddev_t * mddev, dev_t dev) --{ -- struct list_head *tmp; -- mdk_rdev_t *rdev; -- -- ITERATE_RDEV(mddev,rdev,tmp) { -- if (rdev->bdev->bd_dev == dev) -- return rdev; -- } -- return NULL; --} -- --inline static sector_t calc_dev_sboffset(struct block_device *bdev) --{ -- sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; -- return MD_NEW_SIZE_BLOCKS(size); --} -- --static sector_t calc_dev_size(mdk_rdev_t *rdev, unsigned chunk_size) --{ -- sector_t size; -- -- size = rdev->sb_offset; -- -- if (chunk_size) -- size &= ~((sector_t)chunk_size/1024 - 1); -- return size; --} -- --static int alloc_disk_sb(mdk_rdev_t * rdev) --{ -- if (rdev->sb_page) -- MD_BUG(); -- -- rdev->sb_page = alloc_page(GFP_KERNEL); -- if (!rdev->sb_page) { -- printk(KERN_ALERT "md: out of memory.\n"); -- return -EINVAL; -- } -- -- return 0; --} -- --static void free_disk_sb(mdk_rdev_t * rdev) --{ -- if (rdev->sb_page) { -- page_cache_release(rdev->sb_page); -- rdev->sb_loaded = 0; -- rdev->sb_page = NULL; -- rdev->sb_offset = 0; -- rdev->size = 0; -- } --} -- -- --static int bi_complete(struct bio *bio, unsigned int bytes_done, int error) --{ -- if (bio->bi_size) -- return 1; -- -- complete((struct completion*)bio->bi_private); -- return 0; --} -- --static int sync_page_io(struct block_device *bdev, sector_t sector, int size, -- struct page *page, int rw) --{ -- struct bio bio; -- struct bio_vec vec; 
-- struct completion event; -- -- bio_init(&bio); -- bio.bi_io_vec = &vec; -- vec.bv_page = page; -- vec.bv_len = size; -- vec.bv_offset = 0; -- bio.bi_vcnt = 1; -- bio.bi_idx = 0; -- bio.bi_size = size; -- bio.bi_bdev = bdev; -- bio.bi_sector = sector; -- init_completion(&event); -- bio.bi_private = &event; -- bio.bi_end_io = bi_complete; -- submit_bio(rw, &bio); -- blk_run_queues(); -- wait_for_completion(&event); -- -- return test_bit(BIO_UPTODATE, &bio.bi_flags); --} -- --static int read_disk_sb(mdk_rdev_t * rdev) --{ -- -- if (!rdev->sb_page) { -- MD_BUG(); -- return -EINVAL; -- } -- if (rdev->sb_loaded) -- return 0; -- -- -- if (!sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, READ)) -- goto fail; -- rdev->sb_loaded = 1; -- return 0; -- --fail: -- printk(KERN_ERR "md: disabled device %s, could not read superblock.\n", -- bdev_partition_name(rdev->bdev)); -- return -EINVAL; --} -- --static int uuid_equal(mdp_super_t *sb1, mdp_super_t *sb2) --{ -- if ( (sb1->set_uuid0 == sb2->set_uuid0) && -- (sb1->set_uuid1 == sb2->set_uuid1) && -- (sb1->set_uuid2 == sb2->set_uuid2) && -- (sb1->set_uuid3 == sb2->set_uuid3)) -- -- return 1; -- -- return 0; --} -- -- --static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) --{ -- int ret; -- mdp_super_t *tmp1, *tmp2; -- -- tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); -- tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); -- -- if (!tmp1 || !tmp2) { -- ret = 0; -- printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n"); -- goto abort; -- } -- -- *tmp1 = *sb1; -- *tmp2 = *sb2; -- -- /* -- * nr_disks is not constant -- */ -- tmp1->nr_disks = 0; -- tmp2->nr_disks = 0; -- -- if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4)) -- ret = 0; -- else -- ret = 1; -- --abort: -- if (tmp1) -- kfree(tmp1); -- if (tmp2) -- kfree(tmp2); -- -- return ret; --} -- --static unsigned int calc_sb_csum(mdp_super_t * sb) --{ -- unsigned int disk_csum, csum; -- -- disk_csum = sb->sb_csum; -- sb->sb_csum = 0; -- csum = csum_partial((void 
*)sb, MD_SB_BYTES, 0); -- sb->sb_csum = disk_csum; -- return csum; --} -- --/* -- * Handle superblock details. -- * We want to be able to handle multiple superblock formats -- * so we have a common interface to them all, and an array of -- * different handlers. -- * We rely on user-space to write the initial superblock, and support -- * reading and updating of superblocks. -- * Interface methods are: -- * int load_super(mdk_rdev_t *dev, mdk_rdev_t *refdev, int minor_version) -- * loads and validates a superblock on dev. -- * if refdev != NULL, compare superblocks on both devices -- * Return: -- * 0 - dev has a superblock that is compatible with refdev -- * 1 - dev has a superblock that is compatible and newer than refdev -- * so dev should be used as the refdev in future -- * -EINVAL superblock incompatible or invalid -- * -othererror e.g. -EIO -- * -- * int validate_super(mddev_t *mddev, mdk_rdev_t *dev) -- * Verify that dev is acceptable into mddev. -- * The first time, mddev->raid_disks will be 0, and data from -- * dev should be merged in. Subsequent calls check that dev -- * is new enough. Return 0 or -EINVAL -- * -- * void sync_super(mddev_t *mddev, mdk_rdev_t *dev) -- * Update the superblock for rdev with data in mddev -- * This does not write to disc. -- * -- */ -- --struct super_type { -- char *name; -- struct module *owner; -- int (*load_super)(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version); -- int (*validate_super)(mddev_t *mddev, mdk_rdev_t *rdev); -- void (*sync_super)(mddev_t *mddev, mdk_rdev_t *rdev); --}; -- --/* -- * load_super for 0.90.0 -- */ --static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) --{ -- mdp_super_t *sb; -- int ret; -- sector_t sb_offset; -- -- /* -- * Calculate the position of the superblock, -- * it's at the end of the disk. -- * -- * It also happens to be a multiple of 4Kb. 
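The `calc_sb_csum()` pattern above has a subtlety worth isolating: the checksum field lives inside the region being checksummed, so it is zeroed, the sum is taken over the whole block, and the on-disk value is put back. A toy sketch of that pattern, with a plain byte sum standing in for the kernel's `csum_partial()` and an invented `toy_sb` struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy superblock; fields are illustrative, not the mdp_super_t layout. */
struct toy_sb {
    uint32_t magic;
    uint32_t data[4];
    uint32_t sb_csum;   /* lives inside the checksummed region */
};

static uint32_t byte_sum(const void *p, size_t n)
{
    const unsigned char *b = p;
    uint32_t s = 0;
    while (n--)
        s += *b++;
    return s;
}

/* Sketch of the calc_sb_csum() pattern: zero the field, sum, restore. */
static uint32_t calc_csum(struct toy_sb *sb)
{
    uint32_t saved = sb->sb_csum;
    sb->sb_csum = 0;                       /* field must not affect sum */
    uint32_t csum = byte_sum(sb, sizeof(*sb));
    sb->sb_csum = saved;                   /* restore the on-disk value */
    return csum;
}
```

Because the field is zeroed during the computation, verification of a superblock read back from disk computes exactly the value that was stored.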
-- */ -- sb_offset = calc_dev_sboffset(rdev->bdev); -- rdev->sb_offset = sb_offset; -- -- ret = read_disk_sb(rdev); -- if (ret) return ret; -- -- ret = -EINVAL; -- -- sb = (mdp_super_t*)page_address(rdev->sb_page); -- -- if (sb->md_magic != MD_SB_MAGIC) { -- printk(KERN_ERR "md: invalid raid superblock magic on %s\n", -- bdev_partition_name(rdev->bdev)); -- goto abort; -- } -- -- if (sb->major_version != 0 || -- sb->minor_version != 90) { -- printk(KERN_WARNING "Bad version number %d.%d on %s\n", -- sb->major_version, sb->minor_version, -- bdev_partition_name(rdev->bdev)); -- goto abort; -- } -- -- if (sb->md_minor >= MAX_MD_DEVS) { -- printk(KERN_ERR "md: %s: invalid raid minor (%x)\n", -- bdev_partition_name(rdev->bdev), sb->md_minor); -- goto abort; -- } -- if (sb->raid_disks <= 0) -- goto abort; -- -- if (calc_sb_csum(sb) != sb->sb_csum) { -- printk(KERN_WARNING "md: invalid superblock checksum on %s\n", -- bdev_partition_name(rdev->bdev)); -- goto abort; -- } -- -- rdev->preferred_minor = sb->md_minor; -- rdev->data_offset = 0; -- -- if (sb->level == MULTIPATH) -- rdev->desc_nr = -1; -- else -- rdev->desc_nr = sb->this_disk.number; -- -- if (refdev == 0) -- ret = 1; -- else { -- __u64 ev1, ev2; -- mdp_super_t *refsb = (mdp_super_t*)page_address(refdev->sb_page); -- if (!uuid_equal(refsb, sb)) { -- printk(KERN_WARNING "md: %s has different UUID to %s\n", -- bdev_partition_name(rdev->bdev), -- bdev_partition_name(refdev->bdev)); -- goto abort; -- } -- if (!sb_equal(refsb, sb)) { -- printk(KERN_WARNING "md: %s has same UUID" -- " but different superblock to %s\n", -- bdev_partition_name(rdev->bdev), -- bdev_partition_name(refdev->bdev)); -- goto abort; -- } -- ev1 = md_event(sb); -- ev2 = md_event(refsb); -- if (ev1 > ev2) -- ret = 1; -- else -- ret = 0; -- } -- rdev->size = calc_dev_size(rdev, sb->chunk_size); -- -- abort: -- return ret; --} -- --/* -- * validate_super for 0.90.0 -- */ --static int super_90_validate(mddev_t *mddev, mdk_rdev_t *rdev) --{ -- 
mdp_disk_t *desc; -- mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page); -- -- if (mddev->raid_disks == 0) { -- mddev->major_version = 0; -- mddev->minor_version = sb->minor_version; -- mddev->patch_version = sb->patch_version; -- mddev->persistent = ! sb->not_persistent; -- mddev->chunk_size = sb->chunk_size; -- mddev->ctime = sb->ctime; -- mddev->utime = sb->utime; -- mddev->level = sb->level; -- mddev->layout = sb->layout; -- mddev->raid_disks = sb->raid_disks; -- mddev->size = sb->size; -- mddev->events = md_event(sb); -- -- if (sb->state & (1<<MD_SB_CLEAN)) -- mddev->recovery_cp = MaxSector; -- else { -- if (sb->events_hi == sb->cp_events_hi && -- sb->events_lo == sb->cp_events_lo) { -- mddev->recovery_cp = sb->recovery_cp; -- } else -- mddev->recovery_cp = 0; -- } -- -- memcpy(mddev->uuid+0, &sb->set_uuid0, 4); -- memcpy(mddev->uuid+4, &sb->set_uuid1, 4); -- memcpy(mddev->uuid+8, &sb->set_uuid2, 4); -- memcpy(mddev->uuid+12,&sb->set_uuid3, 4); -- -- mddev->max_disks = MD_SB_DISKS; -- } else { -- __u64 ev1; -- ev1 = md_event(sb); -- ++ev1; -- if (ev1 < mddev->events) -- return -EINVAL; -- } -- if (mddev->level != LEVEL_MULTIPATH) { -- rdev->raid_disk = -1; -- rdev->in_sync = rdev->faulty = 0; -- desc = sb->disks + rdev->desc_nr; -- -- if (desc->state & (1<<MD_DISK_FAULTY)) -- rdev->faulty = 1; -- else if (desc->state & (1<<MD_DISK_SYNC) && -- desc->raid_disk < mddev->raid_disks) { -- rdev->in_sync = 1; -- rdev->raid_disk = desc->raid_disk; -- } -- } -- return 0; --} -- --/* -- * sync_super for 0.90.0 -- */ --static void super_90_sync(mddev_t *mddev, mdk_rdev_t *rdev) --{ -- mdp_super_t *sb; -- struct list_head *tmp; -- mdk_rdev_t *rdev2; -- int next_spare = mddev->raid_disks; -- -- /* make rdev->sb match mddev data.. -- * -- * 1/ zero out disks -- * 2/ Add info for each disk, keeping track of highest desc_nr -- * 3/ any empty disks < highest become removed -- * -- * disks[0] gets initialised to REMOVED because -- * we cannot be sure from other fields if it has -- * been initialised or not. 
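The per-disk bookkeeping that implements the three-step plan above classifies each member as faulty, active-and-in-sync, or spare, while tallying the `active`/`working`/`failed`/`spare` totals. A sketch of that classification; the bit positions (FAULTY=0, ACTIVE=1, SYNC=2, REMOVED=3) are assumed from the 0.90 superblock format, and the struct and function names are invented for illustration:

```c
#include <assert.h>

/* Bit positions assumed from the 0.90 format's MD_DISK_* names. */
enum { DISK_FAULTY, DISK_ACTIVE, DISK_SYNC, DISK_REMOVED };

struct totals { int active, working, failed, spare; };

/* Sketch of super_90_sync()'s per-disk classification: returns the
 * state word for one member and updates the running totals. */
static int classify_disk(int faulty, int in_sync, struct totals *t)
{
    if (faulty) {
        t->failed++;
        return 1 << DISK_FAULTY;
    }
    if (in_sync) {
        t->active++;
        t->working++;
        return (1 << DISK_ACTIVE) | (1 << DISK_SYNC);
    }
    t->spare++;
    t->working++;
    return 0;   /* spare: working but carrying no state bits */
}
```

Spares end up with a zero state word, which is why empty slots below the highest descriptor later need the REMOVED bit set explicitly to distinguish true holes from spares.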
-- */ -- int highest = 0; -- int i; -- int active=0, working=0,failed=0,spare=0,nr_disks=0; -- -- sb = (mdp_super_t*)page_address(rdev->sb_page); -- -- memset(sb, 0, sizeof(*sb)); -- -- sb->md_magic = MD_SB_MAGIC; -- sb->major_version = mddev->major_version; -- sb->minor_version = mddev->minor_version; -- sb->patch_version = mddev->patch_version; -- sb->gvalid_words = 0; /* ignored */ -- memcpy(&sb->set_uuid0, mddev->uuid+0, 4); -- memcpy(&sb->set_uuid1, mddev->uuid+4, 4); -- memcpy(&sb->set_uuid2, mddev->uuid+8, 4); -- memcpy(&sb->set_uuid3, mddev->uuid+12,4); -- -- sb->ctime = mddev->ctime; -- sb->level = mddev->level; -- sb->size = mddev->size; -- sb->raid_disks = mddev->raid_disks; -- sb->md_minor = mddev->__minor; -- sb->not_persistent = !mddev->persistent; -- sb->utime = mddev->utime; -- sb->state = 0; -- sb->events_hi = (mddev->events>>32); -- sb->events_lo = (u32)mddev->events; -- -- if (mddev->in_sync) -- { -- sb->recovery_cp = mddev->recovery_cp; -- sb->cp_events_hi = (mddev->events>>32); -- sb->cp_events_lo = (u32)mddev->events; -- if (mddev->recovery_cp == MaxSector) -- sb->state = (1<< MD_SB_CLEAN); -- } else -- sb->recovery_cp = 0; -- -- sb->layout = mddev->layout; -- sb->chunk_size = mddev->chunk_size; -- -- sb->disks[0].state = (1<<MD_DISK_REMOVED); -- ITERATE_RDEV(mddev,rdev2,tmp) { -- mdp_disk_t *d; -- if (rdev2->raid_disk >= 0 && rdev2->in_sync && !rdev2->faulty) -- rdev2->desc_nr = rdev2->raid_disk; -- else -- rdev2->desc_nr = next_spare++; -- d = &sb->disks[rdev2->desc_nr]; -- nr_disks++; -- d->number = rdev2->desc_nr; -- d->major = MAJOR(rdev2->bdev->bd_dev); -- d->minor = MINOR(rdev2->bdev->bd_dev); -- if (rdev2->raid_disk >= 0 && rdev->in_sync && !rdev2->faulty) -- d->raid_disk = rdev2->raid_disk; -- else -- d->raid_disk = rdev2->desc_nr; /* compatibility */ -- if (rdev2->faulty) { -- d->state = (1<<MD_DISK_FAULTY); -- failed++; -- } else if (rdev2->in_sync) { -- d->state = (1<<MD_DISK_ACTIVE); -- d->state |= (1<<MD_DISK_SYNC); -- active++; -- working++; -- } else { -- d->state = 0; -- spare++; -- working++; -- } -- if (rdev2->desc_nr > highest) -- highest = rdev2->desc_nr; -- } -- -- /* now set the "removed" bit on any non-trailing holes */ -- for (i=0; 
idisks[i]; -- if (d->state == 0 && d->number == 0) { -- d->number = i; -- d->raid_disk = i; -- d->state = (1<nr_disks = nr_disks; -- sb->active_disks = active; -- sb->working_disks = working; -- sb->failed_disks = failed; -- sb->spare_disks = spare; -- -- sb->this_disk = sb->disks[rdev->desc_nr]; -- sb->sb_csum = calc_sb_csum(sb); --} -- --/* -- * version 1 superblock -- */ -- --static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb) --{ -- unsigned int disk_csum, csum; -- int size = 256 + sb->max_dev*2; -- -- disk_csum = sb->sb_csum; -- sb->sb_csum = 0; -- csum = csum_partial((void *)sb, size, 0); -- sb->sb_csum = disk_csum; -- return csum; --} -- --static int super_1_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version) --{ -- struct mdp_superblock_1 *sb; -- int ret; -- sector_t sb_offset; -- -- /* -- * Calculate the position of the superblock. -- * It is always aligned to a 4K boundary and -- * depeding on minor_version, it can be: -- * 0: At least 8K, but less than 12K, from end of device -- * 1: At start of device -- * 2: 4K from start of device. 
-- */ -- switch(minor_version) { -- case 0: -- sb_offset = rdev->bdev->bd_inode->i_size >> 9; -- sb_offset -= 8*2; -- sb_offset &= ~(4*2); -- /* convert from sectors to K */ -- sb_offset /= 2; -- break; -- case 1: -- sb_offset = 0; -- break; -- case 2: -- sb_offset = 4; -- break; -- default: -- return -EINVAL; -- } -- rdev->sb_offset = sb_offset; -- -- ret = read_disk_sb(rdev); -- if (ret) return ret; -- -- -- sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); -- -- if (sb->magic != cpu_to_le32(MD_SB_MAGIC) || -- sb->major_version != cpu_to_le32(1) || -- le32_to_cpu(sb->max_dev) > (4096-256)/2 || -- le64_to_cpu(sb->super_offset) != (rdev->sb_offset<<1) || -- sb->feature_map != 0) -- return -EINVAL; -- -- if (calc_sb_1_csum(sb) != sb->sb_csum) { -- printk("md: invalid superblock checksum on %s\n", -- bdev_partition_name(rdev->bdev)); -- return -EINVAL; -- } -- rdev->preferred_minor = 0xffff; -- rdev->data_offset = le64_to_cpu(sb->data_offset); -- -- if (refdev == 0) -- return 1; -- else { -- __u64 ev1, ev2; -- struct mdp_superblock_1 *refsb = -- (struct mdp_superblock_1*)page_address(refdev->sb_page); -- -- if (memcmp(sb->set_uuid, refsb->set_uuid, 16) != 0 || -- sb->level != refsb->level || -- sb->layout != refsb->layout || -- sb->chunksize != refsb->chunksize) { -- printk(KERN_WARNING "md: %s has strangely different" -- " superblock to %s\n", -- bdev_partition_name(rdev->bdev), -- bdev_partition_name(refdev->bdev)); -- return -EINVAL; -- } -- ev1 = le64_to_cpu(sb->events); -- ev2 = le64_to_cpu(refsb->events); -- -- if (ev1 > ev2) -- return 1; -- } -- if (minor_version) -- rdev->size = ((rdev->bdev->bd_inode->i_size>>9) - le64_to_cpu(sb->data_offset)) / 2; -- else -- rdev->size = rdev->sb_offset; -- if (rdev->size < le64_to_cpu(sb->data_size)/2) -- return -EINVAL; -- rdev->size = le64_to_cpu(sb->data_size)/2; -- if (le32_to_cpu(sb->chunksize)) -- rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1); -- return 0; --} -- --static int 
super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev) --{ -- struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page); -- -- if (mddev->raid_disks == 0) { -- mddev->major_version = 1; -- mddev->minor_version = 0; -- mddev->patch_version = 0; -- mddev->persistent = 1; -- mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9; -- mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1); -- mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1); -- mddev->level = le32_to_cpu(sb->level); -- mddev->layout = le32_to_cpu(sb->layout); -- mddev->raid_disks = le32_to_cpu(sb->raid_disks); -- mddev->size = (u32)le64_to_cpu(sb->size); -- mddev->events = le64_to_cpu(sb->events); -- -- mddev->recovery_cp = le64_to_cpu(sb->resync_offset); -- memcpy(mddev->uuid, sb->set_uuid, 16); -- -- mddev->max_disks = (4096-256)/2; -- } else { -- __u64 ev1; -- ev1 = le64_to_cpu(sb->events); -- ++ev1; -- if (ev1 < mddev->events) -- return -EINVAL; -- } -- -- if (mddev->level != LEVEL_MULTIPATH) { -- int role; -- rdev->desc_nr = le32_to_cpu(sb->dev_number); -- role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]); -- switch(role) { -- case 0xffff: /* spare */ -- rdev->in_sync = 0; -- rdev->faulty = 0; -- rdev->raid_disk = -1; -- break; -- case 0xfffe: /* faulty */ -- rdev->in_sync = 0; -- rdev->faulty = 1; -- rdev->raid_disk = -1; -- break; -- default: -- rdev->in_sync = 1; -- rdev->faulty = 0; -- rdev->raid_disk = role; -- break; -- } -- } -- return 0; --} -- --static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev) --{ -- struct mdp_superblock_1 *sb; -- struct list_head *tmp; -- mdk_rdev_t *rdev2; -- int max_dev, i; -- /* make rdev->sb match mddev and rdev data. 
*/
--
--	sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
--
--	sb->feature_map = 0;
--	sb->pad0 = 0;
--	memset(sb->pad1, 0, sizeof(sb->pad1));
--	memset(sb->pad2, 0, sizeof(sb->pad2));
--	memset(sb->pad3, 0, sizeof(sb->pad3));
--
--	sb->utime = cpu_to_le64((__u64)mddev->utime);
--	sb->events = cpu_to_le64(mddev->events);
--	if (mddev->in_sync)
--		sb->resync_offset = cpu_to_le64(mddev->recovery_cp);
--	else
--		sb->resync_offset = cpu_to_le64(0);
--
--	max_dev = 0;
--	ITERATE_RDEV(mddev,rdev2,tmp)
--		if (rdev2->desc_nr > max_dev)
--			max_dev = rdev2->desc_nr;
--
--	sb->max_dev = max_dev;
--	for (i=0; i<max_dev; i++)
--		sb->dev_roles[max_dev] = cpu_to_le16(0xfffe);
--
--	ITERATE_RDEV(mddev,rdev2,tmp) {
--		i = rdev2->desc_nr;
--		if (rdev2->faulty)
--			sb->dev_roles[i] = cpu_to_le16(0xfffe);
--		else if (rdev2->in_sync)
--			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
--		else
--			sb->dev_roles[i] = cpu_to_le16(0xffff);
--	}
--
--	sb->recovery_offset = cpu_to_le64(0); /* not supported yet */
--}
--
--
--struct super_type super_types[] = {
--	[0] = {
--		.name	= "0.90.0",
--		.owner	= THIS_MODULE,
--		.load_super	= super_90_load,
--		.validate_super	= super_90_validate,
--		.sync_super	= super_90_sync,
--	},
--	[1] = {
--		.name	= "md-1",
--		.owner	= THIS_MODULE,
--		.load_super	= super_1_load,
--		.validate_super	= super_1_validate,
--		.sync_super	= super_1_sync,
--	},
--};
--
--static mdk_rdev_t * match_dev_unit(mddev_t *mddev, mdk_rdev_t *dev)
--{
--	struct list_head *tmp;
--	mdk_rdev_t *rdev;
--
--	ITERATE_RDEV(mddev,rdev,tmp)
--		if (rdev->bdev->bd_contains == dev->bdev->bd_contains)
--			return rdev;
--
--	return NULL;
--}
--
--static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
--{
--	struct list_head *tmp;
--	mdk_rdev_t *rdev;
--
--	ITERATE_RDEV(mddev1,rdev,tmp)
--		if (match_dev_unit(mddev2, rdev))
--			return 1;
--
--	return 0;
--}
--
--static LIST_HEAD(pending_raid_disks);
--
--static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
--{
--	mdk_rdev_t *same_pdev;
-- -- if (rdev->mddev) { -- MD_BUG(); -- return -EINVAL; -- } -- same_pdev = match_dev_unit(mddev, rdev); -- if (same_pdev) -- printk(KERN_WARNING -- "md%d: WARNING: %s appears to be on the same physical" -- " disk as %s. True\n protection against single-disk" -- " failure might be compromised.\n", -- mdidx(mddev), bdev_partition_name(rdev->bdev), -- bdev_partition_name(same_pdev->bdev)); -- -- /* Verify rdev->desc_nr is unique. -- * If it is -1, assign a free number, else -- * check number is not in use -- */ -- if (rdev->desc_nr < 0) { -- int choice = 0; -- if (mddev->pers) choice = mddev->raid_disks; -- while (find_rdev_nr(mddev, choice)) -- choice++; -- rdev->desc_nr = choice; -- } else { -- if (find_rdev_nr(mddev, rdev->desc_nr)) -- return -EBUSY; -- } -- -- list_add(&rdev->same_set, &mddev->disks); -- rdev->mddev = mddev; -- printk(KERN_INFO "md: bind<%s>\n", bdev_partition_name(rdev->bdev)); -- return 0; --} -- --static void unbind_rdev_from_array(mdk_rdev_t * rdev) --{ -- if (!rdev->mddev) { -- MD_BUG(); -- return; -- } -- list_del_init(&rdev->same_set); -- printk(KERN_INFO "md: unbind<%s>\n", bdev_partition_name(rdev->bdev)); -- rdev->mddev = NULL; --} -- --/* -- * prevent the device from being mounted, repartitioned or -- * otherwise reused by a RAID array (or any other kernel -- * subsystem), by opening the device. 
[simply getting an
-- * inode is not enough, the SCSI module usage code needs
-- * an explicit open() on the device]
-- */
--static int lock_rdev(mdk_rdev_t *rdev, dev_t dev)
--{
--	int err = 0;
--	struct block_device *bdev;
--
--	bdev = bdget(dev);
--	if (!bdev)
--		return -ENOMEM;
--	err = blkdev_get(bdev, FMODE_READ|FMODE_WRITE, 0, BDEV_RAW);
--	if (err)
--		return err;
--	err = bd_claim(bdev, rdev);
--	if (err) {
--		blkdev_put(bdev, BDEV_RAW);
--		return err;
--	}
--	rdev->bdev = bdev;
--	return err;
--}
--
--static void unlock_rdev(mdk_rdev_t *rdev)
--{
--	struct block_device *bdev = rdev->bdev;
--	rdev->bdev = NULL;
--	if (!bdev)
--		MD_BUG();
--	bd_release(bdev);
--	blkdev_put(bdev, BDEV_RAW);
--}
--
--void md_autodetect_dev(dev_t dev);
--
--static void export_rdev(mdk_rdev_t * rdev)
--{
--	printk(KERN_INFO "md: export_rdev(%s)\n",
--		bdev_partition_name(rdev->bdev));
--	if (rdev->mddev)
--		MD_BUG();
--	free_disk_sb(rdev);
--	list_del_init(&rdev->same_set);
--#ifndef MODULE
--	md_autodetect_dev(rdev->bdev->bd_dev);
--#endif
--	unlock_rdev(rdev);
--	kfree(rdev);
--}
--
--static void kick_rdev_from_array(mdk_rdev_t * rdev)
--{
--	unbind_rdev_from_array(rdev);
--	export_rdev(rdev);
--}
--
--static void export_array(mddev_t *mddev)
--{
--	struct list_head *tmp;
--	mdk_rdev_t *rdev;
--
--	ITERATE_RDEV(mddev,rdev,tmp) {
--		if (!rdev->mddev) {
--			MD_BUG();
--			continue;
--		}
--		kick_rdev_from_array(rdev);
--	}
--	if (!list_empty(&mddev->disks))
--		MD_BUG();
--	mddev->raid_disks = 0;
--	mddev->major_version = 0;
--}
--
--static void print_desc(mdp_disk_t *desc)
--{
--	printk(" DISK<N:%d,%s(%d,%d),R:%d,S:%d>\n", desc->number,
--		partition_name(MKDEV(desc->major,desc->minor)),
--		desc->major,desc->minor,desc->raid_disk,desc->state);
--}
--
--static void print_sb(mdp_super_t *sb)
--{
--	int i;
--
--	printk(KERN_INFO
--		"md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n",
--		sb->major_version, sb->minor_version, sb->patch_version,
--		sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3,
--		sb->ctime);
--	printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n",
--		sb->level, sb->size, sb->nr_disks, sb->raid_disks,
--		sb->md_minor, sb->layout, sb->chunk_size);
--	printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d"
--		" FD:%d SD:%d CSUM:%08x E:%08lx\n",
--		sb->utime, sb->state, sb->active_disks, sb->working_disks,
--		sb->failed_disks, sb->spare_disks,
--		sb->sb_csum, (unsigned long)sb->events_lo);
--
--	printk(KERN_INFO);
--	for (i = 0; i < MD_SB_DISKS; i++) {
--		mdp_disk_t *desc;
--
--		desc = sb->disks + i;
--		if (desc->number || desc->major || desc->minor ||
--		    desc->raid_disk || (desc->state && (desc->state != 4))) {
--			printk("     D %2d: ", i);
--			print_desc(desc);
--		}
--	}
--	printk(KERN_INFO "md: THIS: ");
--	print_desc(&sb->this_disk);
--
--}
--
--static void print_rdev(mdk_rdev_t *rdev)
--{
--	printk(KERN_INFO "md: rdev %s, SZ:%08llu F:%d S:%d DN:%d ",
--		bdev_partition_name(rdev->bdev), (unsigned long long)rdev->size,
--		rdev->faulty, rdev->in_sync, rdev->desc_nr);
--	if (rdev->sb_loaded) {
--		printk(KERN_INFO "md: rdev superblock:\n");
--		print_sb((mdp_super_t*)page_address(rdev->sb_page));
--	} else
--		printk(KERN_INFO "md: no rdev superblock!\n");
--}
--
--void md_print_devices(void)
--{
--	struct list_head *tmp, *tmp2;
--	mdk_rdev_t *rdev;
--	mddev_t *mddev;
--
--	printk("\n");
--	printk("md:	**********************************\n");
--	printk("md:	* <COMPLETE RAID STATE PRINTOUT> *\n");
--	printk("md:	**********************************\n");
--	ITERATE_MDDEV(mddev,tmp) {
--		printk("md%d: ", mdidx(mddev));
--
--		ITERATE_RDEV(mddev,rdev,tmp2)
--			printk("<%s>", bdev_partition_name(rdev->bdev));
--
--		ITERATE_RDEV(mddev,rdev,tmp2)
--			print_rdev(rdev);
--	}
--	printk("md:	**********************************\n");
--	printk("\n");
--}
--
--
--static int write_disk_sb(mdk_rdev_t * rdev)
--{
--
--	if (!rdev->sb_loaded) {
--		MD_BUG();
--		return 1;
--	}
--	if (rdev->faulty) {
--		MD_BUG();
--		return 1;
--	}
--
--	dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
--
bdev_partition_name(rdev->bdev), -- (unsigned long long)rdev->sb_offset); -- -- if (sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, WRITE)) -- return 0; -- -- printk("md: write_disk_sb failed for device %s\n", -- bdev_partition_name(rdev->bdev)); -- return 1; --} -- --static void sync_sbs(mddev_t * mddev) --{ -- mdk_rdev_t *rdev; -- struct list_head *tmp; -- -- ITERATE_RDEV(mddev,rdev,tmp) { -- super_types[mddev->major_version]. -- sync_super(mddev, rdev); -- rdev->sb_loaded = 1; -- } --} -- --static void md_update_sb(mddev_t * mddev) --{ -- int err, count = 100; -- struct list_head *tmp; -- mdk_rdev_t *rdev; -- -- mddev->sb_dirty = 0; --repeat: -- mddev->utime = get_seconds(); -- mddev->events ++; -- -- if (!mddev->events) { -- /* -- * oops, this 64-bit counter should never wrap. -- * Either we are in around ~1 trillion A.C., assuming -- * 1 reboot per second, or we have a bug: -- */ -- MD_BUG(); -- mddev->events --; -- } -- sync_sbs(mddev); -- -- /* -- * do not write anything to disk if using -- * nonpersistent superblocks -- */ -- if (!mddev->persistent) -- return; -- -- dprintk(KERN_INFO -- "md: updating md%d RAID superblock on device (in sync %d)\n", -- mdidx(mddev),mddev->in_sync); -- -- err = 0; -- ITERATE_RDEV(mddev,rdev,tmp) { -- dprintk(KERN_INFO "md: "); -- if (rdev->faulty) -- dprintk("(skipping faulty "); -- -- dprintk("%s ", bdev_partition_name(rdev->bdev)); -- if (!rdev->faulty) { -- err += write_disk_sb(rdev); -- } else -- dprintk(")\n"); -- if (!err && mddev->level == LEVEL_MULTIPATH) -- /* only need to write one superblock... */ -- break; -- } -- if (err) { -- if (--count) { -- printk(KERN_ERR "md: errors occurred during superblock" -- " update, repeating\n"); -- goto repeat; -- } -- printk(KERN_ERR \ -- "md: excessive errors occurred during superblock update, exiting\n"); -- } --} -- --/* -- * Import a device. 
If 'super_format' >= 0, then sanity check the superblock -- * -- * mark the device faulty if: -- * -- * - the device is nonexistent (zero size) -- * - the device has no valid superblock -- * -- * a faulty rdev _never_ has rdev->sb set. -- */ --static mdk_rdev_t *md_import_device(dev_t newdev, int super_format, int super_minor) --{ -- int err; -- mdk_rdev_t *rdev; -- sector_t size; -- -- rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL); -- if (!rdev) { -- printk(KERN_ERR "md: could not alloc mem for %s!\n", -- partition_name(newdev)); -- return ERR_PTR(-ENOMEM); -- } -- memset(rdev, 0, sizeof(*rdev)); -- -- if ((err = alloc_disk_sb(rdev))) -- goto abort_free; -- -- err = lock_rdev(rdev, newdev); -- if (err) { -- printk(KERN_ERR "md: could not lock %s.\n", -- partition_name(newdev)); -- goto abort_free; -- } -- rdev->desc_nr = -1; -- rdev->faulty = 0; -- rdev->in_sync = 0; -- rdev->data_offset = 0; -- atomic_set(&rdev->nr_pending, 0); -- -- size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; -- if (!size) { -- printk(KERN_WARNING -- "md: %s has zero or unknown size, marking faulty!\n", -- bdev_partition_name(rdev->bdev)); -- err = -EINVAL; -- goto abort_free; -- } -- -- if (super_format >= 0) { -- err = super_types[super_format]. 
-- load_super(rdev, NULL, super_minor); -- if (err == -EINVAL) { -- printk(KERN_WARNING -- "md: %s has invalid sb, not importing!\n", -- bdev_partition_name(rdev->bdev)); -- goto abort_free; -- } -- if (err < 0) { -- printk(KERN_WARNING -- "md: could not read %s's sb, not importing!\n", -- bdev_partition_name(rdev->bdev)); -- goto abort_free; -- } -- } -- INIT_LIST_HEAD(&rdev->same_set); -- -- return rdev; -- --abort_free: -- if (rdev->sb_page) { -- if (rdev->bdev) -- unlock_rdev(rdev); -- free_disk_sb(rdev); -- } -- kfree(rdev); -- return ERR_PTR(err); --} -- --/* -- * Check a full RAID array for plausibility -- */ -- -- --static int analyze_sbs(mddev_t * mddev) --{ -- int i; -- struct list_head *tmp; -- mdk_rdev_t *rdev, *freshest; -- -- freshest = NULL; -- ITERATE_RDEV(mddev,rdev,tmp) -- switch (super_types[mddev->major_version]. -- load_super(rdev, freshest, mddev->minor_version)) { -- case 1: -- freshest = rdev; -- break; -- case 0: -- break; -- default: -- printk( KERN_ERR \ -- "md: fatal superblock inconsistency in %s" -- " -- removing from array\n", -- bdev_partition_name(rdev->bdev)); -- kick_rdev_from_array(rdev); -- } -- -- -- super_types[mddev->major_version]. -- validate_super(mddev, freshest); -- -- i = 0; -- ITERATE_RDEV(mddev,rdev,tmp) { -- if (rdev != freshest) -- if (super_types[mddev->major_version]. 
-- validate_super(mddev, rdev)) { -- printk(KERN_WARNING "md: kicking non-fresh %s" -- " from array!\n", -- bdev_partition_name(rdev->bdev)); -- kick_rdev_from_array(rdev); -- continue; -- } -- if (mddev->level == LEVEL_MULTIPATH) { -- rdev->desc_nr = i++; -- rdev->raid_disk = rdev->desc_nr; -- rdev->in_sync = 1; -- } -- } -- -- -- /* -- * Check if we can support this RAID array -- */ -- if (mddev->major_version != MD_MAJOR_VERSION || -- mddev->minor_version > MD_MINOR_VERSION) { -- printk(KERN_ALERT -- "md: md%d: unsupported raid array version %d.%d.%d\n", -- mdidx(mddev), mddev->major_version, -- mddev->minor_version, mddev->patch_version); -- goto abort; -- } -- -- if ((mddev->recovery_cp != MaxSector) && ((mddev->level == 1) || -- (mddev->level == 4) || (mddev->level == 5))) -- printk(KERN_ERR "md: md%d: raid array is not clean" -- " -- starting background reconstruction\n", -- mdidx(mddev)); -- -- return 0; --abort: -+*** 1453,90 **** 1 - return 1; - } - -+#undef OLD_LEVEL -+ - static int device_size_calculation(mddev_t * mddev) - { - int data_disks = 0; - unsigned int readahead; - struct list_head *tmp; - mdk_rdev_t *rdev; - - /* - * Do device size calculation. Bail out if too small. 
- * (we have to do this after having validated chunk_size, - * because device size has to be modulo chunk_size) - */ - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (rdev->size < mddev->chunk_size / 1024) { - printk(KERN_WARNING - "md: Dev %s smaller than chunk_size:" - " %lluk < %dk\n", - bdev_partition_name(rdev->bdev), - (unsigned long long)rdev->size, - mddev->chunk_size / 1024); - return -EINVAL; - } - } - - switch (mddev->level) { - case LEVEL_MULTIPATH: - data_disks = 1; - break; - case -3: - data_disks = 1; - break; - case -2: - data_disks = 1; - break; - case LEVEL_LINEAR: - zoned_raid_size(mddev); - data_disks = 1; - break; - case 0: - zoned_raid_size(mddev); - data_disks = mddev->raid_disks; - break; - case 1: - data_disks = 1; - break; - case 4: - case 5: - data_disks = mddev->raid_disks-1; - break; - default: - printk(KERN_ERR "md: md%d: unsupported raid level %d\n", - mdidx(mddev), mddev->level); - goto abort; - } - if (!md_size[mdidx(mddev)]) - md_size[mdidx(mddev)] = mddev->size * data_disks; - - readahead = (VM_MAX_READAHEAD * 1024) / PAGE_SIZE; - if (!mddev->level || (mddev->level == 4) || (mddev->level == 5)) { - readahead = (mddev->chunk_size>>PAGE_SHIFT) * 4 * data_disks; - if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2) - readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2; - } else { - // (no multipath branch - it uses the default setting) - if (mddev->level == -3) - readahead = 0; - } - - printk(KERN_INFO "md%d: max total readahead window set to %ldk\n", - mdidx(mddev), readahead*(PAGE_SIZE/1024)); - - printk(KERN_INFO - "md%d: %d data-disks, max readahead per data-disk: %ldk\n", - mdidx(mddev), data_disks, readahead/data_disks*(PAGE_SIZE/1024)); - return 0; - abort: - return 1; - } - - static struct gendisk *md_probe(dev_t dev, int *part, void *data) - { - static DECLARE_MUTEX(disks_sem); -- int unit = MINOR(dev); -- mddev_t *mddev = mddev_find(unit); -- struct gendisk *disk; -- -- if (!mddev) -- 
return NULL; -- -- down(&disks_sem); -- if (disks[unit]) { -- up(&disks_sem); -- mddev_put(mddev); -- return NULL; -- } -- disk = alloc_disk(1); -- if (!disk) { -- up(&disks_sem); -- mddev_put(mddev); -- return NULL; -- } -- disk->major = MD_MAJOR; -- disk->first_minor = mdidx(mddev); -- sprintf(disk->disk_name, "md%d", mdidx(mddev)); -- disk->fops = &md_fops; -- disk->private_data = mddev; -- disk->queue = &mddev->queue; -- add_disk(disk); -- disks[mdidx(mddev)] = disk; -- up(&disks_sem); -- return NULL; --} -- --void md_wakeup_thread(mdk_thread_t *thread); -- --static void md_safemode_timeout(unsigned long data) --{ -- mddev_t *mddev = (mddev_t *) data; -- -- mddev->safemode = 1; -- md_wakeup_thread(mddev->thread); --} -- -- --static int do_md_run(mddev_t * mddev) --{ -- int pnum, err; -- int chunk_size; -- struct list_head *tmp; -- mdk_rdev_t *rdev; -- struct gendisk *disk; -- -- if (list_empty(&mddev->disks)) { -- MD_BUG(); -- return -EINVAL; -- } -- -- if (mddev->pers) -- return -EBUSY; -- -- /* -- * Analyze all RAID superblock(s) -- */ -- if (!mddev->raid_disks && analyze_sbs(mddev)) { -- MD_BUG(); -- return -EINVAL; -- } -- -- chunk_size = mddev->chunk_size; -- pnum = level_to_pers(mddev->level); -- -- if ((pnum != MULTIPATH) && (pnum != RAID1)) { -- if (!chunk_size) { -- /* -- * 'default chunksize' in the old md code used to -- * be PAGE_SIZE, baaad. -- * we abort here to be on the safe side. We don't -- * want to continue the bad practice. 
-- */ -- printk(KERN_ERR -- "no chunksize specified, see 'man raidtab'\n"); -- return -EINVAL; -- } -- if (chunk_size > MAX_CHUNK_SIZE) { -- printk(KERN_ERR "too big chunk_size: %d > %d\n", -- chunk_size, MAX_CHUNK_SIZE); -- return -EINVAL; -- } -- /* -- * chunk-size has to be a power of 2 and multiples of PAGE_SIZE -- */ -- if ( (1 << ffz(~chunk_size)) != chunk_size) { -- MD_BUG(); -- return -EINVAL; -- } -- if (chunk_size < PAGE_SIZE) { -- printk(KERN_ERR "too small chunk_size: %d < %ld\n", -- chunk_size, PAGE_SIZE); -- return -EINVAL; -- } -- -- /* devices must have minimum size of one chunk */ -- ITERATE_RDEV(mddev,rdev,tmp) { -- if (rdev->faulty) -- continue; -- if (rdev->size < chunk_size / 1024) { -- printk(KERN_WARNING -- "md: Dev %s smaller than chunk_size:" -- " %lluk < %dk\n", -- bdev_partition_name(rdev->bdev), -- (unsigned long long)rdev->size, -- chunk_size / 1024); -- return -EINVAL; -- } -- } -- } -- if (pnum >= MAX_PERSONALITY) { -- MD_BUG(); -- return -EINVAL; -- } -- --#ifdef CONFIG_KMOD -- if (!pers[pnum]) -- { -- char module_name[80]; -- sprintf (module_name, "md-personality-%d", pnum); -- request_module (module_name); -+*** 1664,9 **** 2 -+ } - } --#endif - - if (device_size_calculation(mddev)) - return -EINVAL; - - /* - * Drop all container device buffers, from now on - * the only valid external interface is through the md -- * device. 
-- * Also find largest hardsector size -- */ -- ITERATE_RDEV(mddev,rdev,tmp) { -- if (rdev->faulty) -- continue; -- sync_blockdev(rdev->bdev); -- invalidate_bdev(rdev->bdev, 0); -- } -- -- md_probe(mdidx(mddev), NULL, NULL); -- disk = disks[mdidx(mddev)]; -- if (!disk) -- return -ENOMEM; -- -- spin_lock(&pers_lock); -- if (!pers[pnum] || !try_module_get(pers[pnum]->owner)) { -- spin_unlock(&pers_lock); -- printk(KERN_ERR "md: personality %d is not loaded!\n", -- pnum); -- return -EINVAL; -- } -- -- mddev->pers = pers[pnum]; -- spin_unlock(&pers_lock); -- -- blk_queue_make_request(&mddev->queue, mddev->pers->make_request); -- printk("%s: setting max_sectors to %d, segment boundary to %d\n", -- disk->disk_name, -- chunk_size >> 9, -- (chunk_size>>1)-1); -- blk_queue_max_sectors(&mddev->queue, chunk_size >> 9); -- blk_queue_segment_boundary(&mddev->queue, (chunk_size>>1) - 1); -- mddev->queue.queuedata = mddev; -- -- err = mddev->pers->run(mddev); -- if (err) { -- printk(KERN_ERR "md: pers->run() failed ...\n"); -- module_put(mddev->pers->owner); -- mddev->pers = NULL; -- return -EINVAL; -- } -- atomic_set(&mddev->writes_pending,0); -- mddev->safemode = 0; -- mddev->safemode_timer.function = md_safemode_timeout; -- mddev->safemode_timer.data = (unsigned long) mddev; -- mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */ -- mddev->in_sync = 1; -- -- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); -- md_wakeup_thread(mddev->thread); -- set_capacity(disk, mddev->array_size<<1); -- return 0; --} -- --static int restart_array(mddev_t *mddev) --{ -- struct gendisk *disk = disks[mdidx(mddev)]; -- int err; -- -- /* -- * Complain if it has no devices -- */ -- err = -ENXIO; -- if (list_empty(&mddev->disks)) -- goto out; -- -- if (mddev->pers) { -- err = -EBUSY; -- if (!mddev->ro) -- goto out; -- -- mddev->safemode = 0; -- mddev->ro = 0; -- set_disk_ro(disk, 0); -- -- printk(KERN_INFO "md: md%d switched to read-write mode.\n", -- mdidx(mddev)); -- /* -- * Kick recovery 
or resync if necessary -- */ -- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); -- md_wakeup_thread(mddev->thread); -- err = 0; -- } else { -- printk(KERN_ERR "md: md%d has no personality assigned.\n", -- mdidx(mddev)); -- err = -EINVAL; -- } -- --out: -- return err; --} -- --static int do_md_stop(mddev_t * mddev, int ro) --{ -- int err = 0; -- struct gendisk *disk = disks[mdidx(mddev)]; -- -- if (atomic_read(&mddev->active)>2) { -- printk("md: md%d still in use.\n",mdidx(mddev)); -- err = -EBUSY; -- goto out; -- } -- -- if (mddev->pers) { -- if (mddev->sync_thread) { -- set_bit(MD_RECOVERY_INTR, &mddev->recovery); -- md_unregister_thread(mddev->sync_thread); -- mddev->sync_thread = NULL; -- } -- -- del_timer_sync(&mddev->safemode_timer); -- -- invalidate_device(mk_kdev(disk->major, disk->first_minor), 1); -- -- if (ro) { -- err = -ENXIO; -- if (mddev->ro) -- goto out; -- mddev->ro = 1; -- } else { -- if (mddev->ro) -- set_disk_ro(disk, 0); -- if (mddev->pers->stop(mddev)) { -- err = -EBUSY; -- if (mddev->ro) -- set_disk_ro(disk, 1); -- goto out; -- } -- module_put(mddev->pers->owner); -- mddev->pers = NULL; -- if (mddev->ro) -- mddev->ro = 0; -- } -- if (mddev->raid_disks) { -- /* mark array as shutdown cleanly */ -- mddev->in_sync = 1; -- md_update_sb(mddev); -- } -- if (ro) -- set_disk_ro(disk, 1); -- } -- /* -- * Free resources if final stop -- */ -- if (!ro) { -- struct gendisk *disk; -- printk(KERN_INFO "md: md%d stopped.\n", mdidx(mddev)); -- -- export_array(mddev); -- -- mddev->array_size = 0; -- disk = disks[mdidx(mddev)]; -- if (disk) -- set_capacity(disk, 0); -- } else -- printk(KERN_INFO "md: md%d switched to read-only mode.\n", -- mdidx(mddev)); -- err = 0; --out: -- return err; --} -- --static void autorun_array(mddev_t *mddev) --{ -- mdk_rdev_t *rdev; -- struct list_head *tmp; -- int err; -- -- if (list_empty(&mddev->disks)) { -- MD_BUG(); -- return; -- } -- -- printk(KERN_INFO "md: running: "); -- -- ITERATE_RDEV(mddev,rdev,tmp) { -- printk("<%s>", 
bdev_partition_name(rdev->bdev)); -- } -- printk("\n"); -- -- err = do_md_run (mddev); -- if (err) { -- printk(KERN_WARNING "md :do_md_run() returned %d\n", err); -- do_md_stop (mddev, 0); -- } --} -- --/* -- * lets try to run arrays based on all disks that have arrived -- * until now. (those are in pending_raid_disks) -- * -- * the method: pick the first pending disk, collect all disks with -- * the same UUID, remove all from the pending list and put them into -- * the 'same_array' list. Then order this list based on superblock -- * update time (freshest comes first), kick out 'old' disks and -- * compare superblocks. If everything's fine then run it. -- * -- * If "unit" is allocated, then bump its reference count -- */ --static void autorun_devices(void) --{ -- struct list_head candidates; -- struct list_head *tmp; -- mdk_rdev_t *rdev0, *rdev; -- mddev_t *mddev; -- -- printk(KERN_INFO "md: autorun ...\n"); -- while (!list_empty(&pending_raid_disks)) { -- rdev0 = list_entry(pending_raid_disks.next, -- mdk_rdev_t, same_set); -- -- printk(KERN_INFO "md: considering %s ...\n", -- bdev_partition_name(rdev0->bdev)); -- INIT_LIST_HEAD(&candidates); -- ITERATE_RDEV_PENDING(rdev,tmp) -- if (super_90_load(rdev, rdev0, 0) >= 0) { -- printk(KERN_INFO "md: adding %s ...\n", -- bdev_partition_name(rdev->bdev)); -- list_move(&rdev->same_set, &candidates); -- } -- /* -- * now we have a set of devices, with all of them having -- * mostly sane superblocks. It's time to allocate the -- * mddev. 
-- */ -- -- mddev = mddev_find(rdev0->preferred_minor); -- if (!mddev) { -- printk(KERN_ERR -- "md: cannot allocate memory for md drive.\n"); -- break; -- } -- if (mddev_lock(mddev)) -- printk(KERN_WARNING "md: md%d locked, cannot run\n", -- mdidx(mddev)); -- else if (mddev->raid_disks || mddev->major_version -- || !list_empty(&mddev->disks)) { -- printk(KERN_WARNING -- "md: md%d already running, cannot run %s\n", -- mdidx(mddev), bdev_partition_name(rdev0->bdev)); -- mddev_unlock(mddev); -- } else { -- printk(KERN_INFO "md: created md%d\n", mdidx(mddev)); -- ITERATE_RDEV_GENERIC(candidates,rdev,tmp) { -- list_del_init(&rdev->same_set); -- if (bind_rdev_to_array(rdev, mddev)) -- export_rdev(rdev); -- } -- autorun_array(mddev); -- mddev_unlock(mddev); -- } -- /* on success, candidates will be empty, on error -- * it won't... -- */ -- ITERATE_RDEV_GENERIC(candidates,rdev,tmp) -- export_rdev(rdev); -- mddev_put(mddev); -- } -- printk(KERN_INFO "md: ... autorun DONE.\n"); --} -- --/* -- * import RAID devices based on one partition -- * if possible, the array gets run as well. 
-- */ -- --static int autostart_array(dev_t startdev) --{ -- int err = -EINVAL, i; -- mdp_super_t *sb = NULL; -- mdk_rdev_t *start_rdev = NULL, *rdev; -- -- start_rdev = md_import_device(startdev, 0, 0); -- if (IS_ERR(start_rdev)) { -- printk(KERN_WARNING "md: could not import %s!\n", -- partition_name(startdev)); -- return err; -- } -- -- /* NOTE: this can only work for 0.90.0 superblocks */ -- sb = (mdp_super_t*)page_address(start_rdev->sb_page); -- if (sb->major_version != 0 || -- sb->minor_version != 90 ) { -- printk(KERN_WARNING "md: can only autostart 0.90.0 arrays\n"); -- export_rdev(start_rdev); -- return err; -- } -- -- if (start_rdev->faulty) { -- printk(KERN_WARNING -- "md: can not autostart based on faulty %s!\n", -- bdev_partition_name(start_rdev->bdev)); -- export_rdev(start_rdev); -- return err; -- } -- list_add(&start_rdev->same_set, &pending_raid_disks); -- -- for (i = 0; i < MD_SB_DISKS; i++) { -- mdp_disk_t *desc; -- dev_t dev; -- -- desc = sb->disks + i; -- dev = MKDEV(desc->major, desc->minor); -- -- if (!dev) -- continue; -- if (dev == startdev) -- continue; -- rdev = md_import_device(dev, 0, 0); -- if (IS_ERR(rdev)) { -- printk(KERN_WARNING "md: could not import %s," -- " trying to run array nevertheless.\n", -- partition_name(dev)); -- continue; -- } -- list_add(&rdev->same_set, &pending_raid_disks); -- } -- -- /* -- * possibly return codes -- */ -- autorun_devices(); -- return 0; -- --} -- -- --static int get_version(void * arg) --{ -- mdu_version_t ver; -- -- ver.major = MD_MAJOR_VERSION; -- ver.minor = MD_MINOR_VERSION; -- ver.patchlevel = MD_PATCHLEVEL_VERSION; -- -- if (copy_to_user(arg, &ver, sizeof(ver))) -- return -EFAULT; -- -- return 0; --} -- --static int get_array_info(mddev_t * mddev, void * arg) --{ -- mdu_array_info_t info; -- int nr,working,active,failed,spare; -- mdk_rdev_t *rdev; -- struct list_head *tmp; -- -- nr=working=active=failed=spare=0; -- ITERATE_RDEV(mddev,rdev,tmp) { -- nr++; -- if (rdev->faulty) -- failed++; -- 
	else {
--		working++;
--		if (rdev->in_sync)
--			active++;
--		else
--			spare++;
--	}
--	}
--
--	info.major_version = mddev->major_version;
--	info.minor_version = mddev->minor_version;
--	info.patch_version = 1;
--	info.ctime = mddev->ctime;
--	info.level = mddev->level;
--	info.size = mddev->size;
--	info.nr_disks = nr;
--	info.raid_disks = mddev->raid_disks;
--	info.md_minor = mddev->__minor;
--	info.not_persistent= !mddev->persistent;
--
--	info.utime = mddev->utime;
--	info.state = 0;
--	if (mddev->in_sync)
--		info.state = (1<<MD_SB_CLEAN);
--	info.layout = mddev->layout;
--	info.chunk_size = mddev->chunk_size;
--
--	if (copy_to_user(arg, &info, sizeof(info)))
--		return -EFAULT;
--
--	return 0;
--}
--
--static int get_disk_info(mddev_t * mddev, void * arg)
--{
--	mdu_disk_info_t info;
--	unsigned int nr;
--	mdk_rdev_t *rdev;
--
--	if (copy_from_user(&info, arg, sizeof(info)))
--		return -EFAULT;
--
--	nr = info.number;
--
--	rdev = find_rdev_nr(mddev, nr);
--	if (rdev) {
--		info.major = MAJOR(rdev->bdev->bd_dev);
--		info.minor = MINOR(rdev->bdev->bd_dev);
--		info.raid_disk = rdev->raid_disk;
--		info.state = 0;
--		if (rdev->faulty)
--			info.state |= (1<<MD_DISK_FAULTY);
--		else if (rdev->in_sync) {
--			info.state |= (1<<MD_DISK_ACTIVE);
--			info.state |= (1<<MD_DISK_SYNC);
--		}
--	} else {
--		info.major = info.minor = 0;
--		info.raid_disk = -1;
--		info.state = (1<<MD_DISK_REMOVED);
--	}
--
--	if (copy_to_user(arg, &info, sizeof(info)))
--		return -EFAULT;
--
--	return 0;
--}
--
--static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
--{
--	mdk_rdev_t *rdev;
--	dev_t dev;
--	dev = MKDEV(info->major,info->minor);
--
--	if (!mddev->raid_disks) {
--		int err;
--		/* expecting a device which has a superblock */
--		rdev = md_import_device(dev, mddev->major_version, mddev->minor_version);
--		if (IS_ERR(rdev)) {
--			printk(KERN_WARNING
--				"md: md_import_device returned %ld\n",
--				PTR_ERR(rdev));
--			return PTR_ERR(rdev);
--		}
--		if (!list_empty(&mddev->disks)) {
--			mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
--				mdk_rdev_t, same_set);
--			int err = super_types[mddev->major_version]
--				.load_super(rdev, rdev0, mddev->minor_version);
--			if (err < 0) {
--				printk(KERN_WARNING
--					"md: %s has different UUID to %s\n",
--					bdev_partition_name(rdev->bdev),
--					bdev_partition_name(rdev0->bdev));
--				export_rdev(rdev);
--				return -EINVAL;
--			}
--		}
--		err = bind_rdev_to_array(rdev, mddev);
--		if (err)
--			export_rdev(rdev);
--		return err;
--
} -- -- /* -- * add_new_disk can be used once the array is assembled -- * to add "hot spares". They must already have a superblock -- * written -- */ -- if (mddev->pers) { -- int err; -- if (!mddev->pers->hot_add_disk) { -- printk(KERN_WARNING -- "md%d: personality does not support diskops!\n", -- mdidx(mddev)); -- return -EINVAL; -- } -- rdev = md_import_device(dev, mddev->major_version, -- mddev->minor_version); -- if (IS_ERR(rdev)) { -- printk(KERN_WARNING -- "md: md_import_device returned %ld\n", -- PTR_ERR(rdev)); -- return PTR_ERR(rdev); -- } -- rdev->in_sync = 0; /* just to be sure */ -- rdev->raid_disk = -1; -- err = bind_rdev_to_array(rdev, mddev); -- if (err) -- export_rdev(rdev); -- if (mddev->thread) -- md_wakeup_thread(mddev->thread); -- return err; -- } -- -- /* otherwise, add_new_disk is only allowed -- * for major_version==0 superblocks -- */ -- if (mddev->major_version != 0) { -- printk(KERN_WARNING "md%d: ADD_NEW_DISK not supported\n", -- mdidx(mddev)); -- return -EINVAL; -- } -- -- if (!(info->state & (1<<MD_DISK_FAULTY))) { -- int err; -- rdev = md_import_device (dev, -1, 0); -- if (IS_ERR(rdev)) { -- printk(KERN_WARNING -- "md: error, md_import_device() returned %ld\n", -- PTR_ERR(rdev)); -- return PTR_ERR(rdev); -- } -- rdev->desc_nr = info->number; -- if (info->raid_disk < mddev->raid_disks) -- rdev->raid_disk = info->raid_disk; -- else -- rdev->raid_disk = -1; -- -- rdev->faulty = 0; -- if (rdev->raid_disk < mddev->raid_disks) -- rdev->in_sync = (info->state & (1<<MD_DISK_SYNC)); -- else -- rdev->in_sync = 0; -- -- err = bind_rdev_to_array(rdev, mddev); -- if (err) { -- export_rdev(rdev); -- return err; -- } -- -- if (!mddev->persistent) { -- printk(KERN_INFO "md: nonpersistent superblock ...\n"); -- rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS; -- } else -- rdev->sb_offset = calc_dev_sboffset(rdev->bdev); -- rdev->size = calc_dev_size(rdev, mddev->chunk_size); -- -- if (!mddev->size || (mddev->size > rdev->size)) -- mddev->size = rdev->size; -- } -- -- return 0; --} -- --static int hot_generate_error(mddev_t * mddev, dev_t dev) --{ -- struct request_queue *q; -- mdk_rdev_t *rdev; -- -- if (!mddev->pers) -- return -ENODEV; -- -- printk(KERN_INFO "md: trying to generate %s error
in md%d ... \n", -- partition_name(dev), mdidx(mddev)); -- -- rdev = find_rdev(mddev, dev); -- if (!rdev) { -- MD_BUG(); -- return -ENXIO; -- } -- -- if (rdev->desc_nr == -1) { -- MD_BUG(); -- return -EINVAL; -- } -- if (!rdev->in_sync) -- return -ENODEV; -- -- q = bdev_get_queue(rdev->bdev); -- if (!q) { -- MD_BUG(); -- return -ENODEV; -- } -- printk(KERN_INFO "md: okay, generating error!\n"); --// q->oneshot_error = 1; // disabled for now -- -- return 0; --} -- --static int hot_remove_disk(mddev_t * mddev, dev_t dev) --{ -- mdk_rdev_t *rdev; -- -- if (!mddev->pers) -- return -ENODEV; -- -- printk(KERN_INFO "md: trying to remove %s from md%d ... \n", -- partition_name(dev), mdidx(mddev)); -- -- rdev = find_rdev(mddev, dev); -- if (!rdev) -- return -ENXIO; -- -- if (rdev->raid_disk >= 0) -- goto busy; -- -- kick_rdev_from_array(rdev); -- md_update_sb(mddev); -- -- return 0; --busy: -- printk(KERN_WARNING "md: cannot remove active disk %s from md%d ... \n", -- bdev_partition_name(rdev->bdev), mdidx(mddev)); -- return -EBUSY; --} -- --static int hot_add_disk(mddev_t * mddev, dev_t dev) --{ -- int err; -- unsigned int size; -- mdk_rdev_t *rdev; -- -- if (!mddev->pers) -- return -ENODEV; -- -- printk(KERN_INFO "md: trying to hot-add %s to md%d ... 
\n", -- partition_name(dev), mdidx(mddev)); -- -- if (mddev->major_version != 0) { -- printk(KERN_WARNING "md%d: HOT_ADD may only be used with" -- " version-0 superblocks.\n", -- mdidx(mddev)); -- return -EINVAL; -- } -- if (!mddev->pers->hot_add_disk) { -- printk(KERN_WARNING -- "md%d: personality does not support diskops!\n", -- mdidx(mddev)); -- return -EINVAL; -- } -- -- rdev = md_import_device (dev, -1, 0); -- if (IS_ERR(rdev)) { -- printk(KERN_WARNING -- "md: error, md_import_device() returned %ld\n", -- PTR_ERR(rdev)); -- return -EINVAL; -- } -- -- rdev->sb_offset = calc_dev_sboffset(rdev->bdev); -- size = calc_dev_size(rdev, mddev->chunk_size); -- rdev->size = size; -- -- if (size < mddev->size) { -- printk(KERN_WARNING -- "md%d: disk size %llu blocks < array size %llu\n", -- mdidx(mddev), (unsigned long long)size, -- (unsigned long long)mddev->size); -- err = -ENOSPC; -- goto abort_export; -- } -- -- if (rdev->faulty) { -- printk(KERN_WARNING -- "md: can not hot-add faulty %s disk to md%d!\n", -- bdev_partition_name(rdev->bdev), mdidx(mddev)); -- err = -EINVAL; -- goto abort_export; -- } -- rdev->in_sync = 0; -- rdev->desc_nr = -1; -- bind_rdev_to_array(rdev, mddev); -- -- /* -- * The rest should better be atomic, we can have disk failures -- * noticed in interrupt contexts ... -- */ -- -- if (rdev->desc_nr == mddev->max_disks) { -- printk(KERN_WARNING "md%d: can not hot-add to full array!\n", -- mdidx(mddev)); -- err = -EBUSY; -- goto abort_unbind_export; -- } -- -- rdev->raid_disk = -1; -- -- md_update_sb(mddev); -- -- /* -- * Kick recovery, maybe this spare has to be added to the -- * array immediately. -- */ -- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); -- md_wakeup_thread(mddev->thread); -- -- return 0; -- --abort_unbind_export: -- unbind_rdev_from_array(rdev); -- --abort_export: -- export_rdev(rdev); -- return err; --} -- --/* -- * set_array_info is used two different ways -- * The original usage is when creating a new array. 
 -- * In this usage, raid_disks is > 0 and it together with -- * level, size, not_persistent,layout,chunksize determine the -- * shape of the array. -- * This will always create an array with a type-0.90.0 superblock. -- * The newer usage is when assembling an array. -- * In this case raid_disks will be 0, and the major_version field is -- * used to determine which style super-blocks are to be found on the devices. -- * The minor and patch _version numbers are also kept in case the -- * super_block handler wishes to interpret them. -- */ --static int set_array_info(mddev_t * mddev, mdu_array_info_t *info) --{ -- -- if (info->raid_disks == 0) { -- /* just setting version number for superblock loading */ -- if (info->major_version < 0 || -- info->major_version >= sizeof(super_types)/sizeof(super_types[0]) || -- super_types[info->major_version].name == NULL) { -- /* maybe try to auto-load a module? */ -- printk(KERN_INFO -- "md: superblock version %d not known\n", -- info->major_version); -- return -EINVAL; -- } -- mddev->major_version = info->major_version; -- mddev->minor_version = info->minor_version; -- mddev->patch_version = info->patch_version; -- return 0; -- } -- mddev->major_version = MD_MAJOR_VERSION; -- mddev->minor_version = MD_MINOR_VERSION; -- mddev->patch_version = MD_PATCHLEVEL_VERSION; -- mddev->ctime = get_seconds(); -- -- mddev->level = info->level; -- mddev->size = info->size; -- mddev->raid_disks = info->raid_disks; -- /* don't set __minor, it is determined by which /dev/md* was -- * opened -- */ -- if (info->state & (1<<MD_SB_CLEAN)) -- mddev->recovery_cp = MaxSector; -- else -- mddev->recovery_cp = 0; -- mddev->persistent = !
info->not_persistent; -- -- mddev->layout = info->layout; -- mddev->chunk_size = info->chunk_size; -- -- mddev->max_disks = MD_SB_DISKS; -- -- -- /* -- * Generate a 128 bit UUID -- */ -- get_random_bytes(mddev->uuid, 16); -- -- return 0; --} -- --static int set_disk_faulty(mddev_t *mddev, dev_t dev) --{ -- mdk_rdev_t *rdev; -- -- rdev = find_rdev(mddev, dev); -- if (!rdev) -- return 0; -- -- md_error(mddev, rdev); -- return 1; --} -- --static int md_ioctl(struct inode *inode, struct file *file, -- unsigned int cmd, unsigned long arg) --{ -- unsigned int minor; -- int err = 0; -- struct hd_geometry *loc = (struct hd_geometry *) arg; -- mddev_t *mddev = NULL; -- kdev_t dev; -- -- if (!capable(CAP_SYS_ADMIN)) -- return -EACCES; -- -- dev = inode->i_rdev; -- minor = minor(dev); -- if (minor >= MAX_MD_DEVS) { -- MD_BUG(); -- return -EINVAL; -- } -- -- /* -- * Commands dealing with the RAID driver but not any -- * particular array: -- */ -- switch (cmd) -- { -- case RAID_VERSION: -- err = get_version((void *)arg); -- goto done; -- -- case PRINT_RAID_DEBUG: -- err = 0; -- md_print_devices(); -- goto done; -- --#ifndef MODULE -- case RAID_AUTORUN: -- err = 0; -- autostart_arrays(); -- goto done; --#endif -- default:; -- } -- -- /* -- * Commands creating/starting a new array: -- */ -- -- mddev = inode->i_bdev->bd_inode->u.generic_ip; -- -- if (!mddev) { -- BUG(); -- goto abort; -- } -- -- -- if (cmd == START_ARRAY) { -- /* START_ARRAY doesn't need to lock the array as autostart_array -- * does the locking, and it could even be a different array -- */ -- err = autostart_array(arg); -- if (err) { -- printk(KERN_WARNING "md: autostart %s failed!\n", -- partition_name(arg)); -- goto abort; -- } -- goto done; -- } -- -- err = mddev_lock(mddev); -- if (err) { -- printk(KERN_INFO -- "md: ioctl lock interrupted, reason %d, cmd %d\n", -- err, cmd); -- goto abort; -- } -- -- switch (cmd) -- { -- case SET_ARRAY_INFO: -- -- if (!list_empty(&mddev->disks)) { -- printk(KERN_WARNING -- 
"md: array md%d already has disks!\n", -- mdidx(mddev)); -- err = -EBUSY; -- goto abort_unlock; -- } -- if (mddev->raid_disks) { -- printk(KERN_WARNING -- "md: array md%d already initialised!\n", -- mdidx(mddev)); -- err = -EBUSY; -- goto abort_unlock; -- } -- { -- mdu_array_info_t info; -- if (!arg) -- memset(&info, 0, sizeof(info)); -- else if (copy_from_user(&info, (void*)arg, sizeof(info))) { -- err = -EFAULT; -- goto abort_unlock; -- } -- err = set_array_info(mddev, &info); -- if (err) { -- printk(KERN_WARNING "md: couldn't set" -- " array info. %d\n", err); -- goto abort_unlock; -- } -- } -- goto done_unlock; -- -- default:; -- } -- -- /* -- * Commands querying/configuring an existing array: -- */ -- /* if we are initialised yet, only ADD_NEW_DISK or STOP_ARRAY is allowed */ -- if (!mddev->raid_disks && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY) { -- err = -ENODEV; -- goto abort_unlock; -- } -- -- /* -- * Commands even a read-only array can execute: -- */ -- switch (cmd) -- { -- case GET_ARRAY_INFO: -- err = get_array_info(mddev, (void *)arg); -- goto done_unlock; -- -- case GET_DISK_INFO: -- err = get_disk_info(mddev, (void *)arg); -- goto done_unlock; -- -- case RESTART_ARRAY_RW: -- err = restart_array(mddev); -- goto done_unlock; -- -- case STOP_ARRAY: -- err = do_md_stop (mddev, 0); -- goto done_unlock; -- -- case STOP_ARRAY_RO: -- err = do_md_stop (mddev, 1); -- goto done_unlock; -- -- /* -- * We have a problem here : there is no easy way to give a CHS -- * virtual geometry. We currently pretend that we have a 2 heads -- * 4 sectors (with a BIG number of cylinders...). This drives -- * dosfs just mad... 
;-) -- */ -- case HDIO_GETGEO: -- if (!loc) { -- err = -EINVAL; -- goto abort_unlock; -- } -- err = put_user (2, (char *) &loc->heads); -- if (err) -- goto abort_unlock; -- err = put_user (4, (char *) &loc->sectors); -- if (err) -- goto abort_unlock; -- err = put_user(get_capacity(disks[mdidx(mddev)])/8, -- (short *) &loc->cylinders); -- if (err) -- goto abort_unlock; -- err = put_user (get_start_sect(inode->i_bdev), -- (long *) &loc->start); -- goto done_unlock; -- } -- -- /* -- * The remaining ioctls are changing the state of the -- * superblock, so we do not allow read-only arrays -- * here: -- */ -- if (mddev->ro) { -- err = -EROFS; -- goto abort_unlock; -- } -- -- switch (cmd) -- { -- case ADD_NEW_DISK: -- { -- mdu_disk_info_t info; -- if (copy_from_user(&info, (void*)arg, sizeof(info))) -- err = -EFAULT; -- else -- err = add_new_disk(mddev, &info); -- goto done_unlock; -- } -- case HOT_GENERATE_ERROR: -- err = hot_generate_error(mddev, arg); -- goto done_unlock; -- case HOT_REMOVE_DISK: -- err = hot_remove_disk(mddev, arg); -- goto done_unlock; -- -- case HOT_ADD_DISK: -- err = hot_add_disk(mddev, arg); -- goto done_unlock; -- -- case SET_DISK_FAULTY: -- err = set_disk_faulty(mddev, arg); -- goto done_unlock; -- -- case RUN_ARRAY: -- { -- err = do_md_run (mddev); -- /* -- * we have to clean up the mess if -- * the array cannot be run for some -- * reason ... -- * ->pers will not be set, to superblock will -- * not be updated. 
-- */ -- if (err) -- do_md_stop (mddev, 0); -- goto done_unlock; -- } -- -- default: -- if (_IOC_TYPE(cmd) == MD_MAJOR) -- printk(KERN_WARNING "md: %s(pid %d) used" -- " obsolete MD ioctl, upgrade your" -- " software to use new ictls.\n", -- current->comm, current->pid); -- err = -EINVAL; -- goto abort_unlock; -- } -- --done_unlock: --abort_unlock: -- mddev_unlock(mddev); -- -- return err; --done: -- if (err) -- MD_BUG(); --abort: -- return err; --} -- --static int md_open(struct inode *inode, struct file *file) --{ -- /* -- * Succeed if we can find or allocate a mddev structure. -- */ -- mddev_t *mddev = mddev_find(minor(inode->i_rdev)); -- int err = -ENOMEM; -- -- if (!mddev) -- goto out; -- -- if ((err = mddev_lock(mddev))) -- goto put; -- -- err = 0; -- mddev_unlock(mddev); -- inode->i_bdev->bd_inode->u.generic_ip = mddev_get(mddev); -- put: -- mddev_put(mddev); -- out: -- return err; --} -- --static int md_release(struct inode *inode, struct file * file) --{ -- mddev_t *mddev = inode->i_bdev->bd_inode->u.generic_ip; -- -- if (!mddev) -- BUG(); -- mddev_put(mddev); -- -- return 0; --} -- --static struct block_device_operations md_fops = --{ -- .owner = THIS_MODULE, -- .open = md_open, -- .release = md_release, -- .ioctl = md_ioctl, --}; -- --int md_thread(void * arg) --{ -- mdk_thread_t *thread = arg; -- -- lock_kernel(); -- -- /* -- * Detach thread -- */ -- -- daemonize(thread->name, mdidx(thread->mddev)); -- -- current->exit_signal = SIGCHLD; -- allow_signal(SIGKILL); -- thread->tsk = current; -- -- /* -- * md_thread is a 'system-thread', it's priority should be very -- * high. We avoid resource deadlocks individually in each -- * raid personality. (RAID5 does preallocation) We also use RR and -- * the very same RT priority as kswapd, thus we will never get -- * into a priority inversion deadlock. -- * -- * we definitely have to have equal or higher priority than -- * bdflush, otherwise bdflush will deadlock if there are too -- * many dirty RAID5 blocks. 
-- */ -- unlock_kernel(); -- -- complete(thread->event); -- while (thread->run) { -- void (*run)(mddev_t *); -- -- wait_event_interruptible(thread->wqueue, -- test_bit(THREAD_WAKEUP, &thread->flags)); -- if (current->flags & PF_FREEZE) -- refrigerator(PF_IOTHREAD); -- -- clear_bit(THREAD_WAKEUP, &thread->flags); -- -- run = thread->run; -- if (run) { -- run(thread->mddev); -- blk_run_queues(); -- } -- if (signal_pending(current)) -- flush_signals(current); -- } -- complete(thread->event); -- return 0; --} -- --void md_wakeup_thread(mdk_thread_t *thread) --{ -- if (thread) { -- dprintk("md: waking up MD thread %p.\n", thread); -- set_bit(THREAD_WAKEUP, &thread->flags); -- wake_up(&thread->wqueue); -- } --} -- --mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev, -- const char *name) --{ -- mdk_thread_t *thread; -- int ret; -- struct completion event; -- -- thread = (mdk_thread_t *) kmalloc -- (sizeof(mdk_thread_t), GFP_KERNEL); -- if (!thread) -- return NULL; -- -- memset(thread, 0, sizeof(mdk_thread_t)); -- init_waitqueue_head(&thread->wqueue); -- -- init_completion(&event); -- thread->event = &event; -- thread->run = run; -- thread->mddev = mddev; -- thread->name = name; -- ret = kernel_thread(md_thread, thread, 0); -- if (ret < 0) { -- kfree(thread); -- return NULL; -- } -- wait_for_completion(&event); -- return thread; --} -- --void md_interrupt_thread(mdk_thread_t *thread) --{ -- if (!thread->tsk) { -- MD_BUG(); -- return; -- } -- dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid); -- send_sig(SIGKILL, thread->tsk, 1); --} -- --void md_unregister_thread(mdk_thread_t *thread) --{ -- struct completion event; -- -- init_completion(&event); -- -- thread->event = &event; -- thread->run = NULL; -- thread->name = NULL; -- md_interrupt_thread(thread); -- wait_for_completion(&event); -- kfree(thread); --} -- --void md_error(mddev_t *mddev, mdk_rdev_t *rdev) --{ -- dprintk("md_error dev:(%d:%d), rdev:(%d:%d), (caller: %p,%p,%p,%p).\n", -- 
MD_MAJOR,mdidx(mddev), -- MAJOR(rdev->bdev->bd_dev), MINOR(rdev->bdev->bd_dev), -- __builtin_return_address(0),__builtin_return_address(1), -- __builtin_return_address(2),__builtin_return_address(3)); -- -- if (!mddev) { -- MD_BUG(); -- return; -- } -- -- if (!rdev || rdev->faulty) -- return; -- if (!mddev->pers->error_handler) -- return; -- mddev->pers->error_handler(mddev,rdev); -- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); -- md_wakeup_thread(mddev->thread); --} -- --/* seq_file implementation /proc/mdstat */ -- --static void status_unused(struct seq_file *seq) --{ -- int i = 0; -- mdk_rdev_t *rdev; -- struct list_head *tmp; -- -- seq_printf(seq, "unused devices: "); -- -- ITERATE_RDEV_PENDING(rdev,tmp) { -- i++; -- seq_printf(seq, "%s ", -- bdev_partition_name(rdev->bdev)); -- } -- if (!i) -- seq_printf(seq, ""); -- -- seq_printf(seq, "\n"); --} -- -- --static void status_resync(struct seq_file *seq, mddev_t * mddev) --{ -- unsigned long max_blocks, resync, res, dt, db, rt; -- -- resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2; -- max_blocks = mddev->size; -- -- /* -- * Should not happen. -- */ -- if (!max_blocks) { -- MD_BUG(); -- return; -- } -- res = (resync/1024)*1000/(max_blocks/1024 + 1); -- { -- int i, x = res/50, y = 20-x; -- seq_printf(seq, "["); -- for (i = 0; i < x; i++) -- seq_printf(seq, "="); -- seq_printf(seq, ">"); -- for (i = 0; i < y; i++) -- seq_printf(seq, "."); -- seq_printf(seq, "] "); -- } -- seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)", -- (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? -- "resync" : "recovery"), -- res/10, res % 10, resync, max_blocks); -- -- /* -- * We do not want to overflow, so the order of operands and -- * the * 100 / 100 trick are important. We do a +1 to be -- * safe against division by zero. We only estimate anyway. 
-- * -- * dt: time from mark until now -- * db: blocks written from mark until now -- * rt: remaining time -- */ -- dt = ((jiffies - mddev->resync_mark) / HZ); -- if (!dt) dt++; -- db = resync - (mddev->resync_mark_cnt/2); -- rt = (dt * ((max_blocks-resync) / (db/100+1)))/100; -- -- seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6); -- -- seq_printf(seq, " speed=%ldK/sec", db/dt); --} -- --static void *md_seq_start(struct seq_file *seq, loff_t *pos) --{ -- struct list_head *tmp; -- loff_t l = *pos; -- mddev_t *mddev; -- -- if (l > 0x10000) -- return NULL; -- if (!l--) -- /* header */ -- return (void*)1; -- -- spin_lock(&all_mddevs_lock); -- list_for_each(tmp,&all_mddevs) -- if (!l--) { -- mddev = list_entry(tmp, mddev_t, all_mddevs); -- mddev_get(mddev); -- spin_unlock(&all_mddevs_lock); -- return mddev; -- } -- spin_unlock(&all_mddevs_lock); -- return (void*)2;/* tail */ --} -- --static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) --{ -- struct list_head *tmp; -- mddev_t *next_mddev, *mddev = v; -- -- ++*pos; -- if (v == (void*)2) -- return NULL; -- -- spin_lock(&all_mddevs_lock); -- if (v == (void*)1) -- tmp = all_mddevs.next; -- else -- tmp = mddev->all_mddevs.next; -- if (tmp != &all_mddevs) -- next_mddev = mddev_get(list_entry(tmp,mddev_t,all_mddevs)); -- else { -- next_mddev = (void*)2; -- *pos = 0x10000; -- } -- spin_unlock(&all_mddevs_lock); -- -- if (v != (void*)1) -- mddev_put(mddev); -- return next_mddev; -- --} -- --static void md_seq_stop(struct seq_file *seq, void *v) --{ -- mddev_t *mddev = v; -- -- if (mddev && v != (void*)1 && v != (void*)2) -- mddev_put(mddev); --} -- --static int md_seq_show(struct seq_file *seq, void *v) --{ -- mddev_t *mddev = v; -- sector_t size; -- struct list_head *tmp2; -- mdk_rdev_t *rdev; -- int i; -- -- if (v == (void*)1) { -- seq_printf(seq, "Personalities : "); -- spin_lock(&pers_lock); -- for (i = 0; i < MAX_PERSONALITY; i++) -- if (pers[i]) -- seq_printf(seq, "[%s] ", pers[i]->name); -- -- 
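The status_resync() code above builds the /proc/mdstat progress bar and ETA from plain integer arithmetic (the per-mille `res`, the 20-slot bar, and the `dt`/`db`/`rt` estimate), so it can be checked in isolation. A standalone sketch of that arithmetic follows; the helper names are ours, not the driver's:

```c
#include <assert.h>
#include <string.h>

/* Illustrative re-statement of the /proc/mdstat resync arithmetic in
 * status_resync() above; helper names are ours, not the kernel's. */

/* Progress in per-mille: both operands are pre-scaled by 1024 so the
 * multiply by 1000 cannot overflow 32 bits; "+ 1" avoids dividing by
 * zero on tiny arrays. */
static unsigned long resync_permille(unsigned long resync,
                                     unsigned long max_blocks)
{
        return (resync / 1024) * 1000 / (max_blocks / 1024 + 1);
}

/* The 20-slot "[=====>..............]" bar: one '=' per 50 per-mille,
 * i.e. per 5% completed. */
static void resync_bar(unsigned long res, char *buf)
{
        int i, x = res / 50, y = 20 - x;

        *buf++ = '[';
        for (i = 0; i < x; i++)
                *buf++ = '=';
        *buf++ = '>';
        for (i = 0; i < y; i++)
                *buf++ = '.';
        *buf++ = ']';
        *buf = '\0';
}

/* Convenience wrapper returning a static buffer, for easy checking. */
static const char *resync_bar_str(unsigned long res)
{
        static char buf[24];
        resync_bar(res, buf);
        return buf;
}

/* Remaining seconds: blocks left divided by the recent rate.  The
 * divide-by-100 / multiply-by-100 ordering keeps the intermediates
 * small, as the comment above explains; db/100+1 also guards against
 * division by zero. */
static unsigned long resync_eta(unsigned long dt, unsigned long db,
                                unsigned long resync,
                                unsigned long max_blocks)
{
        if (!dt)
                dt++;
        return (dt * ((max_blocks - resync) / (db / 100 + 1))) / 100;
}
```

At the halfway point of a 1048576-block array this yields `res = 499` rather than 500: the scaling trick trades a little precision for overflow safety, matching the "We only estimate anyway" comment above.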
spin_unlock(&pers_lock); -- seq_printf(seq, "\n"); -- return 0; -- } -- if (v == (void*)2) { -- status_unused(seq); -- return 0; -- } -- -- if (mddev_lock(mddev)!=0) -- return -EINTR; -- if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { -- seq_printf(seq, "md%d : %sactive", mdidx(mddev), -- mddev->pers ? "" : "in"); -- if (mddev->pers) { -- if (mddev->ro) -- seq_printf(seq, " (read-only)"); -- seq_printf(seq, " %s", mddev->pers->name); -- } -- -- size = 0; -- ITERATE_RDEV(mddev,rdev,tmp2) { -- seq_printf(seq, " %s[%d]", -- bdev_partition_name(rdev->bdev), rdev->desc_nr); -- if (rdev->faulty) { -- seq_printf(seq, "(F)"); -- continue; -- } -- size += rdev->size; -- } -- -- if (!list_empty(&mddev->disks)) { -- if (mddev->pers) -- seq_printf(seq, "\n %llu blocks", -- (unsigned long long)mddev->array_size); -- else -- seq_printf(seq, "\n %llu blocks", -- (unsigned long long)size); -- } -- -- if (mddev->pers) { -- mddev->pers->status (seq, mddev); -- seq_printf(seq, "\n "); -- if (mddev->curr_resync > 2) -- status_resync (seq, mddev); -- else if (mddev->curr_resync == 1 || mddev->curr_resync == 2) -- seq_printf(seq, " resync=DELAYED"); -- } -- -- seq_printf(seq, "\n"); -- } -- mddev_unlock(mddev); -- -- return 0; --} -- --static struct seq_operations md_seq_ops = { -- .start = md_seq_start, -- .next = md_seq_next, -- .stop = md_seq_stop, -- .show = md_seq_show, --}; -- --static int md_seq_open(struct inode *inode, struct file *file) --{ -- int error; -- -- error = seq_open(file, &md_seq_ops); -- return error; --} -- --static struct file_operations md_seq_fops = { -- .open = md_seq_open, -- .read = seq_read, -- .llseek = seq_lseek, -- .release = seq_release, --}; -- --int register_md_personality(int pnum, mdk_personality_t *p) --{ -- if (pnum >= MAX_PERSONALITY) { -- MD_BUG(); -- return -EINVAL; -- } -- -- spin_lock(&pers_lock); -- if (pers[pnum]) { -- spin_unlock(&pers_lock); -- MD_BUG(); -- return -EBUSY; -- } -- -- pers[pnum] = p; -- 
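register_md_personality() above follows a classic fixed-slot registration pattern: bounds-check the slot number, refuse a slot that is already taken, publish under a lock. A minimal userspace model of that slot protocol (the `spin_lock(&pers_lock)` pairs are elided, and every name here is illustrative rather than the kernel's):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of the pers[] slot table managed by register_md_personality()
 * and unregister_md_personality() above.  Locking is elided; all
 * names are illustrative. */

#define MAX_PERSONALITY 32

typedef struct { const char *name; } personality_t;

static personality_t *pers[MAX_PERSONALITY];
static personality_t raid1_pers = { "raid1" };

static int register_personality(int pnum, personality_t *p)
{
        if (pnum < 0 || pnum >= MAX_PERSONALITY)
                return -EINVAL;         /* no such slot */
        if (pers[pnum])
                return -EBUSY;          /* slot already owned */
        pers[pnum] = p;
        return 0;
}

static int unregister_personality(int pnum)
{
        if (pnum < 0 || pnum >= MAX_PERSONALITY)
                return -EINVAL;
        pers[pnum] = NULL;              /* caller must be the owner */
        return 0;
}
```

Registering twice into the same slot fails with -EBUSY until the slot is unregistered, which is the contract the MD_BUG() checks above enforce.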
printk(KERN_INFO "md: %s personality registered as nr %d\n", p->name, pnum); -- spin_unlock(&pers_lock); -- return 0; --} -- --int unregister_md_personality(int pnum) --{ -- if (pnum >= MAX_PERSONALITY) { -- MD_BUG(); -- return -EINVAL; -- } -- -- printk(KERN_INFO "md: %s personality unregistered\n", pers[pnum]->name); -- spin_lock(&pers_lock); -- pers[pnum] = NULL; -- spin_unlock(&pers_lock); -- return 0; --} -- --void md_sync_acct(mdk_rdev_t *rdev, unsigned long nr_sectors) --{ -- rdev->bdev->bd_contains->bd_disk->sync_io += nr_sectors; --} -- --static int is_mddev_idle(mddev_t *mddev) --{ -- mdk_rdev_t * rdev; -- struct list_head *tmp; -- int idle; -- unsigned long curr_events; -- -- idle = 1; -- ITERATE_RDEV(mddev,rdev,tmp) { -- struct gendisk *disk = rdev->bdev->bd_contains->bd_disk; -- curr_events = disk_stat_read(disk, read_sectors) + -- disk_stat_read(disk, write_sectors) - -- disk->sync_io; -- if ((curr_events - rdev->last_events) > 32) { -- rdev->last_events = curr_events; -- idle = 0; -- } -- } -- return idle; --} -- --void md_done_sync(mddev_t *mddev, int blocks, int ok) --{ -- /* another "blocks" (512byte) blocks have been synced */ -- atomic_sub(blocks, &mddev->recovery_active); -- wake_up(&mddev->recovery_wait); -- if (!ok) { -- set_bit(MD_RECOVERY_ERR, &mddev->recovery); -- md_wakeup_thread(mddev->thread); -- // stop recovery, signal do_sync .... 
-- } --} -- -- --void md_write_start(mddev_t *mddev) --{ -- if (!atomic_read(&mddev->writes_pending)) { -- mddev_lock_uninterruptible(mddev); -- if (mddev->in_sync) { -- mddev->in_sync = 0; -- del_timer(&mddev->safemode_timer); -- md_update_sb(mddev); -- } -- atomic_inc(&mddev->writes_pending); -- mddev_unlock(mddev); -- } else -- atomic_inc(&mddev->writes_pending); --} -- --void md_write_end(mddev_t *mddev) --{ -- if (atomic_dec_and_test(&mddev->writes_pending)) { -- if (mddev->safemode == 2) -- md_wakeup_thread(mddev->thread); -- else -- mod_timer(&mddev->safemode_timer, jiffies + mddev->safemode_delay); -- } --} -- --static inline void md_enter_safemode(mddev_t *mddev) --{ -- mddev_lock_uninterruptible(mddev); -- if (mddev->safemode && !atomic_read(&mddev->writes_pending) && -- !mddev->in_sync && mddev->recovery_cp == MaxSector) { -- mddev->in_sync = 1; -- md_update_sb(mddev); -- } -- mddev_unlock(mddev); -- -- if (mddev->safemode == 1) -- mddev->safemode = 0; --} -- --void md_handle_safemode(mddev_t *mddev) --{ -- if (signal_pending(current)) { -- printk(KERN_INFO "md: md%d in immediate safe mode\n", -- mdidx(mddev)); -- mddev->safemode = 2; -- flush_signals(current); -- } -- if (mddev->safemode) -- md_enter_safemode(mddev); --} -- -- --DECLARE_WAIT_QUEUE_HEAD(resync_wait); -- --#define SYNC_MARKS 10 --#define SYNC_MARK_STEP (3*HZ) --static void md_do_sync(mddev_t *mddev) --{ -- mddev_t *mddev2; -- unsigned int max_sectors, currspeed = 0, -- j, window; -- unsigned long mark[SYNC_MARKS]; -- unsigned long mark_cnt[SYNC_MARKS]; -- int last_mark,m; -- struct list_head *tmp; -- unsigned long last_check; -- -- /* just incase thread restarts... */ -- if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) -- return; -- -- /* we overload curr_resync somewhat here. 
-- * 0 == not engaged in resync at all -- * 2 == checking that there is no conflict with another sync -- * 1 == like 2, but have yielded to allow conflicting resync to -- * commense -- * other == active in resync - this many blocks -- */ -- do { -- mddev->curr_resync = 2; -- -- ITERATE_MDDEV(mddev2,tmp) { -- if (mddev2 == mddev) -- continue; -- if (mddev2->curr_resync && -- match_mddev_units(mddev,mddev2)) { -- printk(KERN_INFO "md: delaying resync of md%d" -- " until md%d has finished resync (they" -- " share one or more physical units)\n", -- mdidx(mddev), mdidx(mddev2)); -- if (mddev < mddev2) {/* arbitrarily yield */ -- mddev->curr_resync = 1; -- wake_up(&resync_wait); -- } -- if (wait_event_interruptible(resync_wait, -- mddev2->curr_resync < mddev->curr_resync)) { -- flush_signals(current); -- mddev_put(mddev2); -- goto skip; -- } -- } -- if (mddev->curr_resync == 1) { -- mddev_put(mddev2); -- break; -- } -- } -- } while (mddev->curr_resync < 2); -- -- max_sectors = mddev->size << 1; -- -- printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev)); -- printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:" -- " %d KB/sec/disc.\n", sysctl_speed_limit_min); -- printk(KERN_INFO "md: using maximum available idle IO bandwith " -- "(but not more than %d KB/sec) for reconstruction.\n", -- sysctl_speed_limit_max); -- -- is_mddev_idle(mddev); /* this also initializes IO event counters */ -- if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) -- j = mddev->recovery_cp; -- else -- j = 0; -- for (m = 0; m < SYNC_MARKS; m++) { -- mark[m] = jiffies; -- mark_cnt[m] = j; -- } -- last_mark = 0; -- mddev->resync_mark = mark[last_mark]; -- mddev->resync_mark_cnt = mark_cnt[last_mark]; -- -- /* -- * Tune reconstruction: -- */ -- window = 32*(PAGE_SIZE/512); -- printk(KERN_INFO "md: using %dk window, over a total of %d blocks.\n", -- window/2,max_sectors/2); -- -- atomic_set(&mddev->recovery_active, 0); -- init_waitqueue_head(&mddev->recovery_wait); -- last_check = 0; 
-- -- if (j) -- printk(KERN_INFO -- "md: resuming recovery of md%d from checkpoint.\n", -- mdidx(mddev)); -- -- while (j < max_sectors) { -- int sectors; -- -- sectors = mddev->pers->sync_request(mddev, j, currspeed < sysctl_speed_limit_min); -- if (sectors < 0) { -- set_bit(MD_RECOVERY_ERR, &mddev->recovery); -- goto out; -- } -- atomic_add(sectors, &mddev->recovery_active); -- j += sectors; -- if (j>1) mddev->curr_resync = j; -- -- if (last_check + window > j) -- continue; -- -- last_check = j; -- -- if (test_bit(MD_RECOVERY_INTR, &mddev->recovery) || -- test_bit(MD_RECOVERY_ERR, &mddev->recovery)) -- break; -- -- blk_run_queues(); -- -- repeat: -- if (jiffies >= mark[last_mark] + SYNC_MARK_STEP ) { -- /* step marks */ -- int next = (last_mark+1) % SYNC_MARKS; -- -- mddev->resync_mark = mark[next]; -- mddev->resync_mark_cnt = mark_cnt[next]; -- mark[next] = jiffies; -- mark_cnt[next] = j - atomic_read(&mddev->recovery_active); -- last_mark = next; -- } -- -- -- if (signal_pending(current)) { -- /* -- * got a signal, exit. -- */ -- printk(KERN_INFO -- "md: md_do_sync() got signal ... exiting\n"); -- flush_signals(current); -- set_bit(MD_RECOVERY_INTR, &mddev->recovery); -- goto out; -- } -- -- /* -- * this loop exits only if either when we are slower than -- * the 'hard' speed limit, or the system was IO-idle for -- * a jiffy. -- * the system might be non-idle CPU-wise, but we only care -- * about not overloading the IO subsystem. 
(things like an -- * e2fsck being done on the RAID array should execute fast) -- */ -- cond_resched(); -- -- currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1; -- -- if (currspeed > sysctl_speed_limit_min) { -- if ((currspeed > sysctl_speed_limit_max) || -- !is_mddev_idle(mddev)) { -- current->state = TASK_INTERRUPTIBLE; -- schedule_timeout(HZ/4); -- goto repeat; -- } -- } -- } -- printk(KERN_INFO "md: md%d: sync done.\n",mdidx(mddev)); -- /* -- * this also signals 'finished resyncing' to md_stop -- */ -- out: -- wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); -- -- /* tell personality that we are finished */ -- mddev->pers->sync_request(mddev, max_sectors, 1); -- -- if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) && -- mddev->curr_resync > 2 && -- mddev->curr_resync > mddev->recovery_cp) { -- if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { -- printk(KERN_INFO -- "md: checkpointing recovery of md%d.\n", -- mdidx(mddev)); -- mddev->recovery_cp = mddev->curr_resync; -- } else -- mddev->recovery_cp = MaxSector; -- } -- -- if (mddev->safemode) -- md_enter_safemode(mddev); -- skip: -- mddev->curr_resync = 0; -- set_bit(MD_RECOVERY_DONE, &mddev->recovery); -- md_wakeup_thread(mddev->thread); --} -- -- --/* -- * This routine is regularly called by all per-raid-array threads to -- * deal with generic issues like resync and super-block update. -- * Raid personalities that don't have a thread (linear/raid0) do not -- * need this as they never do any recovery or update the superblock. -- * -- * It does not do any resync itself, but rather "forks" off other threads -- * to do that as needed. -- * When it is determined that resync is needed, we set MD_RECOVERY_RUNNING in -- * "->recovery" and create a thread at ->sync_thread. -- * When the thread finishes it sets MD_RECOVERY_DONE (and might set MD_RECOVERY_ERR) -- * and wakeups up this thread which will reap the thread and finish up. 
-- * This thread also removes any faulty devices (with nr_pending == 0). -- * -- * The overall approach is: -- * 1/ if the superblock needs updating, update it. -- * 2/ If a recovery thread is running, don't do anything else. -- * 3/ If recovery has finished, clean up, possibly marking spares active. -- * 4/ If there are any faulty devices, remove them. -- * 5/ If array is degraded, try to add spares devices -- * 6/ If array has spares or is not in-sync, start a resync thread. -- */ --void md_check_recovery(mddev_t *mddev) --{ -- mdk_rdev_t *rdev; -- struct list_head *rtmp; -- -- -- dprintk(KERN_INFO "md: recovery thread got woken up ...\n"); -- -- if (mddev->ro) -- return; -- if ( ! ( -- mddev->sb_dirty || -- test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) || -- test_bit(MD_RECOVERY_DONE, &mddev->recovery) -- )) -- return; -- if (mddev_trylock(mddev)==0) { -- int spares =0; -- if (mddev->sb_dirty) -- md_update_sb(mddev); -- if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && -- !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) -- /* resync/recovery still happening */ -- goto unlock; -- if (mddev->sync_thread) { -- /* resync has finished, collect result */ -- md_unregister_thread(mddev->sync_thread); -- mddev->sync_thread = NULL; -- if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery)) { -- /* success...*/ -- /* activate any spares */ -- mddev->pers->spare_active(mddev); -- } -- md_update_sb(mddev); -- mddev->recovery = 0; -- wake_up(&resync_wait); -- goto unlock; -- } -- if (mddev->recovery) { -- /* that's odd.. */ -- mddev->recovery = 0; -- wake_up(&resync_wait); -- } -- -- /* no recovery is running. 
-- * remove any failed drives, then -- * add spares if possible -- */ -- ITERATE_RDEV(mddev,rdev,rtmp) { -- if (rdev->raid_disk >= 0 && -- rdev->faulty && -- atomic_read(&rdev->nr_pending)==0) { -- mddev->pers->hot_remove_disk(mddev, rdev->raid_disk); -- rdev->raid_disk = -1; -- } -- if (!rdev->faulty && rdev->raid_disk >= 0 && !rdev->in_sync) -- spares++; -- } -- if (mddev->degraded) { -- ITERATE_RDEV(mddev,rdev,rtmp) -- if (rdev->raid_disk < 0 -- && !rdev->faulty) { -- if (mddev->pers->hot_add_disk(mddev,rdev)) -- spares++; -- else -- break; -- } -- } -- -- if (!spares && (mddev->recovery_cp == MaxSector )) { -- /* nothing we can do ... */ -- goto unlock; -- } -- if (mddev->pers->sync_request) { -- set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); -- if (!spares) -- set_bit(MD_RECOVERY_SYNC, &mddev->recovery); -- mddev->sync_thread = md_register_thread(md_do_sync, -- mddev, -- "md%d_resync"); -- if (!mddev->sync_thread) { -- printk(KERN_ERR "md%d: could not start resync" -- " thread...\n", -- mdidx(mddev)); -- /* leave the spares where they are, it shouldn't hurt */ -- mddev->recovery = 0; -- } else { -- md_wakeup_thread(mddev->sync_thread); -- } -- } -- unlock: -- mddev_unlock(mddev); -- } --} -- --int md_notify_reboot(struct notifier_block *this, -- unsigned long code, void *x) --{ -- struct list_head *tmp; -- mddev_t *mddev; -- -- if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) { -- -- printk(KERN_INFO "md: stopping all md devices.\n"); -- -- ITERATE_MDDEV(mddev,tmp) -- if (mddev_trylock(mddev)==0) -- do_md_stop (mddev, 1); -- /* -- * certain more exotic SCSI devices are known to be -- * volatile wrt too early system reboots. While the -- * right place to handle this issue is the given -- * driver, we do want to have a safe RAID driver ... 
-- */ -- mdelay(1000*1); -- } -- return NOTIFY_DONE; --} -- --struct notifier_block md_notifier = { -- .notifier_call = md_notify_reboot, -- .next = NULL, -- .priority = INT_MAX, /* before any real devices */ --}; -- --static void md_geninit(void) --{ -- struct proc_dir_entry *p; -- -- dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t)); -- --#ifdef CONFIG_PROC_FS -- p = create_proc_entry("mdstat", S_IRUGO, NULL); -- if (p) -- p->proc_fops = &md_seq_fops; --#endif --} -- --int __init md_init(void) --{ -- int minor; -- -- printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d," -- " MD_SB_DISKS=%d\n", -- MD_MAJOR_VERSION, MD_MINOR_VERSION, -- MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS); -- -- if (register_blkdev(MAJOR_NR, "md")) -- return -1; -- -- devfs_mk_dir("md"); -- blk_register_region(MKDEV(MAJOR_NR, 0), MAX_MD_DEVS, THIS_MODULE, -- md_probe, NULL, NULL); -- for (minor=0; minor < MAX_MD_DEVS; ++minor) { -- char name[16]; -- sprintf(name, "md/%d", minor); -- devfs_register(NULL, name, DEVFS_FL_DEFAULT, MAJOR_NR, minor, -- S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL); -- } -- -- register_reboot_notifier(&md_notifier); -- raid_table_header = register_sysctl_table(raid_root_table, 1); -- -- md_geninit(); -- return (0); --} -- -- --#ifndef MODULE -- --/* -- * Searches all registered partitions for autorun RAID arrays -- * at boot time. 
-- */ --static dev_t detected_devices[128]; --static int dev_cnt; -- --void md_autodetect_dev(dev_t dev) --{ -- if (dev_cnt >= 0 && dev_cnt < 127) -- detected_devices[dev_cnt++] = dev; --} -- -- --static void autostart_arrays(void) --{ -- mdk_rdev_t *rdev; -- int i; -- -- printk(KERN_INFO "md: Autodetecting RAID arrays.\n"); -- -- for (i = 0; i < dev_cnt; i++) { -- dev_t dev = detected_devices[i]; -- -- rdev = md_import_device(dev,0, 0); -- if (IS_ERR(rdev)) { -- printk(KERN_ALERT "md: could not import %s!\n", -- partition_name(dev)); -- continue; -- } -- if (rdev->faulty) { -- MD_BUG(); -- continue; -- } -- list_add(&rdev->same_set, &pending_raid_disks); -- } -- dev_cnt = 0; -- -- autorun_devices(); --} -- --#endif -- --static __exit void md_exit(void) --{ -- int i; -- blk_unregister_region(MKDEV(MAJOR_NR,0), MAX_MD_DEVS); -- for (i=0; i < MAX_MD_DEVS; i++) -- devfs_remove("md/%d", i); -- devfs_remove("md"); -- -- unregister_blkdev(MAJOR_NR,"md"); -- unregister_reboot_notifier(&md_notifier); -- unregister_sysctl_table(raid_table_header); --#ifdef CONFIG_PROC_FS -- remove_proc_entry("mdstat", NULL); --#endif -- for (i = 0; i < MAX_MD_DEVS; i++) { -- struct gendisk *disk = disks[i]; -- mddev_t *mddev; -- if (!disks[i]) -- continue; -- mddev = disk->private_data; -- del_gendisk(disk); -- put_disk(disk); -- mddev_put(mddev); -- } --} -- --module_init(md_init) --module_exit(md_exit) -- --EXPORT_SYMBOL(register_md_personality); --EXPORT_SYMBOL(unregister_md_personality); --EXPORT_SYMBOL(md_error); --EXPORT_SYMBOL(md_sync_acct); --EXPORT_SYMBOL(md_done_sync); --EXPORT_SYMBOL(md_write_start); --EXPORT_SYMBOL(md_write_end); --EXPORT_SYMBOL(md_handle_safemode); --EXPORT_SYMBOL(md_register_thread); --EXPORT_SYMBOL(md_unregister_thread); --EXPORT_SYMBOL(md_wakeup_thread); --EXPORT_SYMBOL(md_print_devices); --EXPORT_SYMBOL(md_interrupt_thread); --EXPORT_SYMBOL(md_check_recovery); --MODULE_LICENSE("GPL"); ./linux/md/diff FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected 
--- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.303262438 +0000 @@ -1,1909 +0,0 @@ -/* - * raid1.c : Multiple Devices driver for Linux - * - * Copyright (C) 1999, 2000 Ingo Molnar, Red Hat - * - * Copyright (C) 1996, 1997, 1998 Ingo Molnar, Miguel de Icaza, Gadi Oxman - * - * RAID-1 management functions. - * - * Better read-balancing code written by Mika Kuoppala , 2000 - * - * Fixes to reconstruction by Jakob Østergaard" - * Various fixes by Neil Brown - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * You should have received a copy of the GNU General Public License - * (for example /usr/src/linux/COPYING); if not, write to the Free - * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include -#include -#include -#include -#include - -#define MAJOR_NR MD_MAJOR -#define MD_DRIVER -#define MD_PERSONALITY - -#define MAX_WORK_PER_DISK 128 - -#define NR_RESERVED_BUFS 32 - - -/* - * The following can be used to debug the driver - */ -#define RAID1_DEBUG 0 - -#if RAID1_DEBUG -#define PRINTK(x...) printk(x) -#define inline -#define __inline__ -#else -#define PRINTK(x...) do { } while (0) -#endif - - -static mdk_personality_t raid1_personality; -static md_spinlock_t retry_list_lock = MD_SPIN_LOCK_UNLOCKED; -struct raid1_bh *raid1_retry_list = NULL, **raid1_retry_tail; - -static struct buffer_head *raid1_alloc_bh(raid1_conf_t *conf, int cnt) -{ - /* return a linked list of "cnt" struct buffer_heads. 
- * don't take any off the free list unless we know we can - * get all we need, otherwise we could deadlock - */ - struct buffer_head *bh=NULL; - - while(cnt) { - struct buffer_head *t; - md_spin_lock_irq(&conf->device_lock); - if (!conf->freebh_blocked && conf->freebh_cnt >= cnt) - while (cnt) { - t = conf->freebh; - conf->freebh = t->b_next; - t->b_next = bh; - bh = t; - t->b_state = 0; - conf->freebh_cnt--; - cnt--; - } - md_spin_unlock_irq(&conf->device_lock); - if (cnt == 0) - break; - t = kmem_cache_alloc(bh_cachep, SLAB_NOIO); - if (t) { - t->b_next = bh; - bh = t; - cnt--; - } else { - PRINTK("raid1: waiting for %d bh\n", cnt); - conf->freebh_blocked = 1; - wait_disk_event(conf->wait_buffer, - !conf->freebh_blocked || - conf->freebh_cnt > conf->raid_disks * NR_RESERVED_BUFS/2); - conf->freebh_blocked = 0; - } - } - return bh; -} - -static inline void raid1_free_bh(raid1_conf_t *conf, struct buffer_head *bh) -{ - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - while (bh) { - struct buffer_head *t = bh; - bh=bh->b_next; - if (t->b_pprev == NULL) - kmem_cache_free(bh_cachep, t); - else { - t->b_next= conf->freebh; - conf->freebh = t; - conf->freebh_cnt++; - } - } - spin_unlock_irqrestore(&conf->device_lock, flags); - wake_up(&conf->wait_buffer); -} - -static int raid1_grow_bh(raid1_conf_t *conf, int cnt) -{ - /* allocate cnt buffer_heads, possibly less if kmalloc fails */ - int i = 0; - - while (i < cnt) { - struct buffer_head *bh; - bh = kmem_cache_alloc(bh_cachep, SLAB_KERNEL); - if (!bh) break; - - md_spin_lock_irq(&conf->device_lock); - bh->b_pprev = &conf->freebh; - bh->b_next = conf->freebh; - conf->freebh = bh; - conf->freebh_cnt++; - md_spin_unlock_irq(&conf->device_lock); - - i++; - } - return i; -} - -static void raid1_shrink_bh(raid1_conf_t *conf) -{ - /* discard all buffer_heads */ - - md_spin_lock_irq(&conf->device_lock); - while (conf->freebh) { - struct buffer_head *bh = conf->freebh; - conf->freebh = bh->b_next; - 
kmem_cache_free(bh_cachep, bh); - conf->freebh_cnt--; - } - md_spin_unlock_irq(&conf->device_lock); -} - - -static struct raid1_bh *raid1_alloc_r1bh(raid1_conf_t *conf) -{ - struct raid1_bh *r1_bh = NULL; - - do { - md_spin_lock_irq(&conf->device_lock); - if (!conf->freer1_blocked && conf->freer1) { - r1_bh = conf->freer1; - conf->freer1 = r1_bh->next_r1; - conf->freer1_cnt--; - r1_bh->next_r1 = NULL; - r1_bh->state = (1 << R1BH_PreAlloc); - r1_bh->bh_req.b_state = 0; - } - md_spin_unlock_irq(&conf->device_lock); - if (r1_bh) - return r1_bh; - r1_bh = (struct raid1_bh *) kmalloc(sizeof(struct raid1_bh), GFP_NOIO); - if (r1_bh) { - memset(r1_bh, 0, sizeof(*r1_bh)); - return r1_bh; - } - conf->freer1_blocked = 1; - wait_disk_event(conf->wait_buffer, - !conf->freer1_blocked || - conf->freer1_cnt > NR_RESERVED_BUFS/2 - ); - conf->freer1_blocked = 0; - } while (1); -} - -static inline void raid1_free_r1bh(struct raid1_bh *r1_bh) -{ - struct buffer_head *bh = r1_bh->mirror_bh_list; - raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev); - - r1_bh->mirror_bh_list = NULL; - - if (test_bit(R1BH_PreAlloc, &r1_bh->state)) { - unsigned long flags; - spin_lock_irqsave(&conf->device_lock, flags); - r1_bh->next_r1 = conf->freer1; - conf->freer1 = r1_bh; - conf->freer1_cnt++; - spin_unlock_irqrestore(&conf->device_lock, flags); - /* don't need to wakeup wait_buffer because - * raid1_free_bh below will do that - */ - } else { - kfree(r1_bh); - } - raid1_free_bh(conf, bh); -} - -static int raid1_grow_r1bh (raid1_conf_t *conf, int cnt) -{ - int i = 0; - - while (i < cnt) { - struct raid1_bh *r1_bh; - r1_bh = (struct raid1_bh*)kmalloc(sizeof(*r1_bh), GFP_KERNEL); - if (!r1_bh) - break; - memset(r1_bh, 0, sizeof(*r1_bh)); - set_bit(R1BH_PreAlloc, &r1_bh->state); - r1_bh->mddev = conf->mddev; - - raid1_free_r1bh(r1_bh); - i++; - } - return i; -} - -static void raid1_shrink_r1bh(raid1_conf_t *conf) -{ - md_spin_lock_irq(&conf->device_lock); - while (conf->freer1) { - struct raid1_bh *r1_bh = 
conf->freer1; - conf->freer1 = r1_bh->next_r1; - conf->freer1_cnt--; - kfree(r1_bh); - } - md_spin_unlock_irq(&conf->device_lock); -} - - - -static inline void raid1_free_buf(struct raid1_bh *r1_bh) -{ - unsigned long flags; - struct buffer_head *bh = r1_bh->mirror_bh_list; - raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev); - r1_bh->mirror_bh_list = NULL; - - spin_lock_irqsave(&conf->device_lock, flags); - r1_bh->next_r1 = conf->freebuf; - conf->freebuf = r1_bh; - spin_unlock_irqrestore(&conf->device_lock, flags); - raid1_free_bh(conf, bh); -} - -static struct raid1_bh *raid1_alloc_buf(raid1_conf_t *conf) -{ - struct raid1_bh *r1_bh; - - md_spin_lock_irq(&conf->device_lock); - wait_event_lock_irq(conf->wait_buffer, conf->freebuf, conf->device_lock); - r1_bh = conf->freebuf; - conf->freebuf = r1_bh->next_r1; - r1_bh->next_r1= NULL; - md_spin_unlock_irq(&conf->device_lock); - - return r1_bh; -} - -static int raid1_grow_buffers (raid1_conf_t *conf, int cnt) -{ - int i = 0; - struct raid1_bh *head = NULL, **tail; - tail = &head; - - while (i < cnt) { - struct raid1_bh *r1_bh; - struct page *page; - - page = alloc_page(GFP_KERNEL); - if (!page) - break; - - r1_bh = (struct raid1_bh *) kmalloc(sizeof(*r1_bh), GFP_KERNEL); - if (!r1_bh) { - __free_page(page); - break; - } - memset(r1_bh, 0, sizeof(*r1_bh)); - r1_bh->bh_req.b_page = page; - r1_bh->bh_req.b_data = page_address(page); - *tail = r1_bh; - r1_bh->next_r1 = NULL; - tail = & r1_bh->next_r1; - i++; - } - /* this lock probably isn't needed, as at the time when - * we are allocating buffers, nobody else will be touching the - * freebuf list. But it doesn't hurt.... 
- */ - md_spin_lock_irq(&conf->device_lock); - *tail = conf->freebuf; - conf->freebuf = head; - md_spin_unlock_irq(&conf->device_lock); - return i; -} - -static void raid1_shrink_buffers (raid1_conf_t *conf) -{ - struct raid1_bh *head; - md_spin_lock_irq(&conf->device_lock); - head = conf->freebuf; - conf->freebuf = NULL; - md_spin_unlock_irq(&conf->device_lock); - - while (head) { - struct raid1_bh *r1_bh = head; - head = r1_bh->next_r1; - __free_page(r1_bh->bh_req.b_page); - kfree(r1_bh); - } -} - -static int raid1_map (mddev_t *mddev, kdev_t *rdev) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - int i, disks = MD_SB_DISKS; - - /* - * Later we do read balancing on the read side - * now we use the first available disk. - */ - - for (i = 0; i < disks; i++) { - if (conf->mirrors[i].operational) { - *rdev = conf->mirrors[i].dev; - return (0); - } - } - - printk (KERN_ERR "raid1_map(): huh, no more operational devices?\n"); - return (-1); -} - -static void raid1_reschedule_retry (struct raid1_bh *r1_bh) -{ - unsigned long flags; - mddev_t *mddev = r1_bh->mddev; - raid1_conf_t *conf = mddev_to_conf(mddev); - - md_spin_lock_irqsave(&retry_list_lock, flags); - if (raid1_retry_list == NULL) - raid1_retry_tail = &raid1_retry_list; - *raid1_retry_tail = r1_bh; - raid1_retry_tail = &r1_bh->next_r1; - r1_bh->next_r1 = NULL; - md_spin_unlock_irqrestore(&retry_list_lock, flags); - md_wakeup_thread(conf->thread); -} - - -static void inline io_request_done(unsigned long sector, raid1_conf_t *conf, int phase) -{ - unsigned long flags; - spin_lock_irqsave(&conf->segment_lock, flags); - if (sector < conf->start_active) - conf->cnt_done--; - else if (sector >= conf->start_future && conf->phase == phase) - conf->cnt_future--; - else if (!--conf->cnt_pending) - wake_up(&conf->wait_ready); - - spin_unlock_irqrestore(&conf->segment_lock, flags); -} - -static void inline sync_request_done (unsigned long sector, raid1_conf_t *conf) -{ - unsigned long flags; - 
spin_lock_irqsave(&conf->segment_lock, flags); - if (sector >= conf->start_ready) - --conf->cnt_ready; - else if (sector >= conf->start_active) { - if (!--conf->cnt_active) { - conf->start_active = conf->start_ready; - wake_up(&conf->wait_done); - } - } - spin_unlock_irqrestore(&conf->segment_lock, flags); -} - -/* - * raid1_end_bh_io() is called when we have finished servicing a mirrored - * operation and are ready to return a success/failure code to the buffer - * cache layer. - */ -static void raid1_end_bh_io (struct raid1_bh *r1_bh, int uptodate) -{ - struct buffer_head *bh = r1_bh->master_bh; - - io_request_done(bh->b_rsector, mddev_to_conf(r1_bh->mddev), - test_bit(R1BH_SyncPhase, &r1_bh->state)); - - bh->b_end_io(bh, uptodate); - raid1_free_r1bh(r1_bh); -} -void raid1_end_request (struct buffer_head *bh, int uptodate) -{ - struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private); - - /* - * this branch is our 'one mirror IO has finished' event handler: - */ - if (!uptodate) - md_error (r1_bh->mddev, bh->b_dev); - else - /* - * Set R1BH_Uptodate in our master buffer_head, so that - * we will return a good error code for to the higher - * levels even if IO on some other mirrored buffer fails. - * - * The 'master' represents the complex operation to - * user-side. So if something waits for IO, then it will - * wait for the 'master' buffer_head. - */ - set_bit (R1BH_Uptodate, &r1_bh->state); - - /* - * We split up the read and write side, imho they are - * conceptually different. - */ - - if ( (r1_bh->cmd == READ) || (r1_bh->cmd == READA) ) { - /* - * we have only one buffer_head on the read side - */ - - if (uptodate) { - raid1_end_bh_io(r1_bh, uptodate); - return; - } - /* - * oops, read error: - */ - printk(KERN_ERR "raid1: %s: rescheduling block %lu\n", - partition_name(bh->b_dev), bh->b_blocknr); - raid1_reschedule_retry(r1_bh); - return; - } - - /* - * WRITE: - * - * Let's see if all mirrored write operations have finished - * already. 
- */ - - if (atomic_dec_and_test(&r1_bh->remaining)) - raid1_end_bh_io(r1_bh, test_bit(R1BH_Uptodate, &r1_bh->state)); -} - -/* - * This routine returns the disk from which the requested read should - * be done. It bookkeeps the last read position for every disk - * in array and when new read requests come, the disk which last - * position is nearest to the request, is chosen. - * - * TODO: now if there are 2 mirrors in the same 2 devices, performance - * degrades dramatically because position is mirror, not device based. - * This should be changed to be device based. Also atomic sequential - * reads should be somehow balanced. - */ - -static int raid1_read_balance (raid1_conf_t *conf, struct buffer_head *bh) -{ - int new_disk = conf->last_used; - const int sectors = bh->b_size >> 9; - const unsigned long this_sector = bh->b_rsector; - int disk = new_disk; - unsigned long new_distance; - unsigned long current_distance; - - /* - * Check if it is sane at all to balance - */ - - if (!conf->mddev->in_sync) - goto rb_out; - - - /* make sure that disk is operational */ - while( !conf->mirrors[new_disk].operational) { - if (new_disk <= 0) new_disk = conf->raid_disks; - new_disk--; - if (new_disk == disk) { - /* - * This means no working disk was found - * Nothing much to do, lets not change anything - * and hope for the best... - */ - - new_disk = conf->last_used; - - goto rb_out; - } - } - disk = new_disk; - /* now disk == new_disk == starting point for search */ - - /* - * Don't touch anything for sequential reads. - */ - - if (this_sector == conf->mirrors[new_disk].head_position) - goto rb_out; - - /* - * If reads have been done only on a single disk - * for a time, lets give another disk a change. - * This is for kicking those idling disks so that - * they would find work near some hotspot. 
- */ - - if (conf->sect_count >= conf->mirrors[new_disk].sect_limit) { - conf->sect_count = 0; - -#if defined(CONFIG_SPARC64) && (__GNUC__ == 2) && (__GNUC_MINOR__ == 92) - /* Work around a compiler bug in egcs-2.92.11 19980921 */ - new_disk = *(volatile int *)&new_disk; -#endif - do { - if (new_disk<=0) - new_disk = conf->raid_disks; - new_disk--; - if (new_disk == disk) - break; - } while ((conf->mirrors[new_disk].write_only) || - (!conf->mirrors[new_disk].operational)); - - goto rb_out; - } - - current_distance = abs(this_sector - - conf->mirrors[disk].head_position); - - /* Find the disk which is closest */ - - do { - if (disk <= 0) - disk = conf->raid_disks; - disk--; - - if ((conf->mirrors[disk].write_only) || - (!conf->mirrors[disk].operational)) - continue; - - new_distance = abs(this_sector - - conf->mirrors[disk].head_position); - - if (new_distance < current_distance) { - conf->sect_count = 0; - current_distance = new_distance; - new_disk = disk; - } - } while (disk != conf->last_used); - -rb_out: - conf->mirrors[new_disk].head_position = this_sector + sectors; - - conf->last_used = new_disk; - conf->sect_count += sectors; - - return new_disk; -} - -static int raid1_make_request (request_queue_t *q, - struct buffer_head * bh) -{ - mddev_t *mddev = q->queuedata; - raid1_conf_t *conf = mddev_to_conf(mddev); - struct buffer_head *bh_req, *bhl; - struct raid1_bh * r1_bh; - int disks = MD_SB_DISKS; - int i, sum_bhs = 0; - struct mirror_info *mirror; - - if (!buffer_locked(bh)) - BUG(); - -/* - * make_request() can abort the operation when READA is being - * used and no empty request is available. - * - * Currently, just replace the command with READ/WRITE. 
- */ - r1_bh = raid1_alloc_r1bh (conf); - - spin_lock_irq(&conf->segment_lock); - wait_event_lock_irq(conf->wait_done, - bh->b_rsector < conf->start_active || - bh->b_rsector >= conf->start_future, - conf->segment_lock); - if (bh->b_rsector < conf->start_active) - conf->cnt_done++; - else { - conf->cnt_future++; - if (conf->phase) - set_bit(R1BH_SyncPhase, &r1_bh->state); - } - spin_unlock_irq(&conf->segment_lock); - - /* - * i think the read and write branch should be separated completely, - * since we want to do read balancing on the read side for example. - * Alternative implementations? :) --mingo - */ - - r1_bh->master_bh = bh; - r1_bh->mddev = mddev; - r1_bh->cmd = rw; - - if (rw == READ) { - /* - * read balancing logic: - */ - mirror = conf->mirrors + raid1_read_balance(conf, bh); - - bh_req = &r1_bh->bh_req; - memcpy(bh_req, bh, sizeof(*bh)); - bh_req->b_blocknr = bh->b_rsector; - bh_req->b_dev = mirror->dev; - bh_req->b_rdev = mirror->dev; - /* bh_req->b_rsector = bh->n_rsector; */ - bh_req->b_end_io = raid1_end_request; - bh_req->b_private = r1_bh; - generic_make_request (rw, bh_req); - return 0; - } - - /* - * WRITE: - */ - - bhl = raid1_alloc_bh(conf, conf->raid_disks); - for (i = 0; i < disks; i++) { - struct buffer_head *mbh; - if (!conf->mirrors[i].operational) - continue; - - /* - * We should use a private pool (size depending on NR_REQUEST), - * to avoid writes filling up the memory with bhs - * - * Such pools are much faster than kmalloc anyways (so we waste - * almost nothing by not using the master bh when writing and - * win alot of cleanness) but for now we are cool enough. --mingo - * - * It's safe to sleep here, buffer heads cannot be used in a shared - * manner in the write branch. 
Look how we lock the buffer at the - * beginning of this function to grok the difference ;) - */ - mbh = bhl; - if (mbh == NULL) { - MD_BUG(); - break; - } - bhl = mbh->b_next; - mbh->b_next = NULL; - mbh->b_this_page = (struct buffer_head *)1; - - /* - * prepare mirrored mbh (fields ordered for max mem throughput): - */ - mbh->b_blocknr = bh->b_rsector; - mbh->b_dev = conf->mirrors[i].dev; - mbh->b_rdev = conf->mirrors[i].dev; - mbh->b_rsector = bh->b_rsector; - mbh->b_state = (1<b_count, 1); - mbh->b_size = bh->b_size; - mbh->b_page = bh->b_page; - mbh->b_data = bh->b_data; - mbh->b_list = BUF_LOCKED; - mbh->b_end_io = raid1_end_request; - mbh->b_private = r1_bh; - - mbh->b_next = r1_bh->mirror_bh_list; - r1_bh->mirror_bh_list = mbh; - sum_bhs++; - } - if (bhl) raid1_free_bh(conf,bhl); - if (!sum_bhs) { - /* Gag - all mirrors non-operational.. */ - raid1_end_bh_io(r1_bh, 0); - return 0; - } - md_atomic_set(&r1_bh->remaining, sum_bhs); - - /* - * We have to be a bit careful about the semaphore above, thats - * why we start the requests separately. Since kmalloc() could - * fail, sleep and make_request() can sleep too, this is the - * safer solution. Imagine, end_request decreasing the semaphore - * before we could have set it up ... We could play tricks with - * the semaphore (presetting it and correcting at the end if - * sum_bhs is not 'n' but we have to do end_request by hand if - * all requests finish until we had a chance to set up the - * semaphore correctly ... lots of races). - */ - bh = r1_bh->mirror_bh_list; - while(bh) { - struct buffer_head *bh2 = bh; - bh = bh->b_next; - generic_make_request(rw, bh2); - } - return (0); -} - -static void raid1_status(struct seq_file *seq, mddev_t *mddev) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - int i; - - seq_printf(seq, " [%d/%d] [", conf->raid_disks, - conf->working_disks); - for (i = 0; i < conf->raid_disks; i++) - seq_printf(seq, "%s", - conf->mirrors[i].operational ? 
"U" : "_"); - seq_printf(seq, "]"); -} - -#define LAST_DISK KERN_ALERT \ -"raid1: only one disk left and IO error.\n" - -#define NO_SPARE_DISK KERN_ALERT \ -"raid1: no spare disk left, degrading mirror level by one.\n" - -#define DISK_FAILED KERN_ALERT \ -"raid1: Disk failure on %s, disabling device. \n" \ -" Operation continuing on %d devices\n" - -#define START_SYNCING KERN_ALERT \ -"raid1: start syncing spare disk.\n" - -#define ALREADY_SYNCING KERN_INFO \ -"raid1: syncing already in progress.\n" - -static void mark_disk_bad (mddev_t *mddev, int failed) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - struct mirror_info *mirror = conf->mirrors+failed; - mdp_super_t *sb = mddev->sb; - - mirror->operational = 0; - mark_disk_faulty(sb->disks+mirror->number); - mark_disk_nonsync(sb->disks+mirror->number); - mark_disk_inactive(sb->disks+mirror->number); - if (!mirror->write_only) - sb->active_disks--; - sb->working_disks--; - sb->failed_disks++; - mddev->sb_dirty = 1; - md_wakeup_thread(conf->thread); - if (!mirror->write_only) - conf->working_disks--; - printk (DISK_FAILED, partition_name (mirror->dev), - conf->working_disks); -} - -static int raid1_error (mddev_t *mddev, kdev_t dev) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - struct mirror_info * mirrors = conf->mirrors; - int disks = MD_SB_DISKS; - int i; - - /* Find the drive. - * If it is not operational, then we have already marked it as dead - * else if it is the last working disks, ignore the error, let the - * next level up know. 
- * else mark the drive as failed - */ - - for (i = 0; i < disks; i++) - if (mirrors[i].dev==dev && mirrors[i].operational) - break; - if (i == disks) - return 0; - - if (i < conf->raid_disks && conf->working_disks == 1) { - /* Don't fail the drive, act as though we were just a - * normal single drive - */ - - return 1; - } - mark_disk_bad(mddev, i); - return 0; -} - -#undef LAST_DISK -#undef NO_SPARE_DISK -#undef DISK_FAILED -#undef START_SYNCING - - -static void print_raid1_conf (raid1_conf_t *conf) -{ - int i; - struct mirror_info *tmp; - - printk("RAID1 conf printout:\n"); - if (!conf) { - printk("(conf==NULL)\n"); - return; - } - printk(" --- wd:%d rd:%d nd:%d\n", conf->working_disks, - conf->raid_disks, conf->nr_disks); - - for (i = 0; i < MD_SB_DISKS; i++) { - tmp = conf->mirrors + i; - printk(" disk %d, s:%d, o:%d, n:%d rd:%d us:%d dev:%s\n", - i, tmp->spare,tmp->operational, - tmp->number,tmp->raid_disk,tmp->used_slot, - partition_name(tmp->dev)); - } -} - -static void close_sync(raid1_conf_t *conf) -{ - mddev_t *mddev = conf->mddev; - /* If reconstruction was interrupted, we need to close the "active" and "pending" - * holes. - * we know that there are no active rebuild requests, os cnt_active == cnt_ready ==0 - */ - /* this is really needed when recovery stops too... 
*/ - spin_lock_irq(&conf->segment_lock); - conf->start_active = conf->start_pending; - conf->start_ready = conf->start_pending; - wait_event_lock_irq(conf->wait_ready, !conf->cnt_pending, conf->segment_lock); - conf->start_active =conf->start_ready = conf->start_pending = conf->start_future; - conf->start_future = (mddev->sb->size<<1)+1; - conf->cnt_pending = conf->cnt_future; - conf->cnt_future = 0; - conf->phase = conf->phase ^1; - wait_event_lock_irq(conf->wait_ready, !conf->cnt_pending, conf->segment_lock); - conf->start_active = conf->start_ready = conf->start_pending = conf->start_future = 0; - conf->phase = 0; - conf->cnt_future = conf->cnt_done;; - conf->cnt_done = 0; - spin_unlock_irq(&conf->segment_lock); - wake_up(&conf->wait_done); - - mempool_destroy(conf->r1buf_pool); - conf->r1buf_pool = NULL; -} - -static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state) -{ - int err = 0; - int i, failed_disk=-1, spare_disk=-1, removed_disk=-1, added_disk=-1; - raid1_conf_t *conf = mddev->private; - struct mirror_info *tmp, *sdisk, *fdisk, *rdisk, *adisk; - mdp_super_t *sb = mddev->sb; - mdp_disk_t *failed_desc, *spare_desc, *added_desc; - mdk_rdev_t *spare_rdev, *failed_rdev; - - print_raid1_conf(conf); - - switch (state) { - case DISKOP_SPARE_ACTIVE: - case DISKOP_SPARE_INACTIVE: - /* need to wait for pending sync io before locking device */ - close_sync(conf); - } - - md_spin_lock_irq(&conf->device_lock); - /* - * find the disk ... - */ - switch (state) { - - case DISKOP_SPARE_ACTIVE: - - /* - * Find the failed disk within the RAID1 configuration ... - * (this can only be in the first conf->working_disks part) - */ - for (i = 0; i < conf->raid_disks; i++) { - tmp = conf->mirrors + i; - if ((!tmp->operational && !tmp->spare) || - !tmp->used_slot) { - failed_disk = i; - break; - } - } - /* - * When we activate a spare disk we _must_ have a disk in - * the lower (active) part of the array to replace. 
- */ - if ((failed_disk == -1) || (failed_disk >= conf->raid_disks)) { - MD_BUG(); - err = 1; - goto abort; - } - /* fall through */ - - case DISKOP_SPARE_WRITE: - case DISKOP_SPARE_INACTIVE: - - /* - * Find the spare disk ... (can only be in the 'high' - * area of the array) - */ - for (i = conf->raid_disks; i < MD_SB_DISKS; i++) { - tmp = conf->mirrors + i; - if (tmp->spare && tmp->number == (*d)->number) { - spare_disk = i; - break; - } - } - if (spare_disk == -1) { - MD_BUG(); - err = 1; - goto abort; - } - break; - - case DISKOP_HOT_REMOVE_DISK: - - for (i = 0; i < MD_SB_DISKS; i++) { - tmp = conf->mirrors + i; - if (tmp->used_slot && (tmp->number == (*d)->number)) { - if (tmp->operational) { - err = -EBUSY; - goto abort; - } - removed_disk = i; - break; - } - } - if (removed_disk == -1) { - MD_BUG(); - err = 1; - goto abort; - } - break; - - case DISKOP_HOT_ADD_DISK: - - for (i = conf->raid_disks; i < MD_SB_DISKS; i++) { - tmp = conf->mirrors + i; - if (!tmp->used_slot) { - added_disk = i; - break; - } - } - if (added_disk == -1) { - MD_BUG(); - err = 1; - goto abort; - } - break; - } - - switch (state) { - /* - * Switch the spare disk to write-only mode: - */ - case DISKOP_SPARE_WRITE: - sdisk = conf->mirrors + spare_disk; - sdisk->operational = 1; - sdisk->write_only = 1; - break; - /* - * Deactivate a spare disk: - */ - case DISKOP_SPARE_INACTIVE: -<<<<<<< found - if (conf->start_future > 0) { - MD_BUG(); - err = -EBUSY; - break; - } -||||||| expected - close_sync(conf); -======= ->>>>>>> replacement - sdisk = conf->mirrors + spare_disk; - sdisk->operational = 0; - sdisk->write_only = 0; - break; - /* - * Activate (mark read-write) the (now sync) spare disk, - * which means we switch it's 'raid position' (->raid_disk) - * with the failed disk. 
(only the first 'conf->nr_disks' - * slots are used for 'real' disks and we must preserve this - * property) - */ - case DISKOP_SPARE_ACTIVE: -<<<<<<< found - if (conf->start_future > 0) { - MD_BUG(); - err = -EBUSY; - break; - } -||||||| expected - close_sync(conf); -======= ->>>>>>> replacement - sdisk = conf->mirrors + spare_disk; - fdisk = conf->mirrors + failed_disk; - - spare_desc = &sb->disks[sdisk->number]; - failed_desc = &sb->disks[fdisk->number]; - - if (spare_desc != *d) { - MD_BUG(); - err = 1; - goto abort; - } - - if (spare_desc->raid_disk != sdisk->raid_disk) { - MD_BUG(); - err = 1; - goto abort; - } - - if (sdisk->raid_disk != spare_disk) { - MD_BUG(); - err = 1; - goto abort; - } - - if (failed_desc->raid_disk != fdisk->raid_disk) { - MD_BUG(); - err = 1; - goto abort; - } - - if (fdisk->raid_disk != failed_disk) { - MD_BUG(); - err = 1; - goto abort; - } - - /* - * do the switch finally - */ - spare_rdev = find_rdev_nr(mddev, spare_desc->number); - failed_rdev = find_rdev_nr(mddev, failed_desc->number); - - /* There must be a spare_rdev, but there may not be a - * failed_rdev. That slot might be empty... - */ - spare_rdev->desc_nr = failed_desc->number; - if (failed_rdev) - failed_rdev->desc_nr = spare_desc->number; - - xchg_values(*spare_desc, *failed_desc); - xchg_values(*fdisk, *sdisk); - - /* - * (careful, 'failed' and 'spare' are switched from now on) - * - * we want to preserve linear numbering and we want to - * give the proper raid_disk number to the now activated - * disk. (this means we switch back these values) - */ - - xchg_values(spare_desc->raid_disk, failed_desc->raid_disk); - xchg_values(sdisk->raid_disk, fdisk->raid_disk); - xchg_values(spare_desc->number, failed_desc->number); - xchg_values(sdisk->number, fdisk->number); - - *d = failed_desc; - - if (sdisk->dev == MKDEV(0,0)) - sdisk->used_slot = 0; - /* - * this really activates the spare. 
- */ - fdisk->spare = 0; - fdisk->write_only = 0; - - /* - * if we activate a spare, we definitely replace a - * non-operational disk slot in the 'low' area of - * the disk array. - */ - - conf->working_disks++; - - break; - - case DISKOP_HOT_REMOVE_DISK: - rdisk = conf->mirrors + removed_disk; - - if (rdisk->spare && (removed_disk < conf->raid_disks)) { - MD_BUG(); - err = 1; - goto abort; - } - rdisk->dev = MKDEV(0,0); - rdisk->used_slot = 0; - conf->nr_disks--; - break; - - case DISKOP_HOT_ADD_DISK: - adisk = conf->mirrors + added_disk; - added_desc = *d; - - if (added_disk != added_desc->number) { - MD_BUG(); - err = 1; - goto abort; - } - - adisk->number = added_desc->number; - adisk->raid_disk = added_desc->raid_disk; - adisk->dev = MKDEV(added_desc->major,added_desc->minor); - - adisk->operational = 0; - adisk->write_only = 0; - adisk->spare = 1; - adisk->used_slot = 1; - adisk->head_position = 0; - conf->nr_disks++; - - break; - - default: - MD_BUG(); - err = 1; - goto abort; - } -abort: - md_spin_unlock_irq(&conf->device_lock); -<<<<<<< found - if (state == DISKOP_SPARE_ACTIVE || state == DISKOP_SPARE_INACTIVE) - /* should move to "END_REBUILD" when such exists */ - raid1_shrink_buffers(conf); - - print_raid1_conf(conf); -||||||| expected - if (state == DISKOP_SPARE_ACTIVE || state == DISKOP_SPARE_INACTIVE) { - mempool_destroy(conf->r1buf_pool); - conf->r1buf_pool = NULL; - } - - print_conf(conf); -======= - - print_conf(conf); ->>>>>>> replacement - return err; -} - - -#define IO_ERROR KERN_ALERT \ -"raid1: %s: unrecoverable I/O read error for block %lu\n" - -#define REDIRECT_SECTOR KERN_ERR \ -"raid1: %s: redirecting sector %lu to another mirror\n" - -/* - * This is a kernel thread which: - * - * 1. Retries failed read operations on working mirrors. - * 2. Updates the raid superblock when problems encounter. - * 3. Performs writes following reads for array syncronising. 
- */ -static void end_sync_write(struct buffer_head *bh, int uptodate); -static void end_sync_read(struct buffer_head *bh, int uptodate); - -static void raid1d (void *data) -{ - struct raid1_bh *r1_bh; - struct buffer_head *bh; - unsigned long flags; - raid1_conf_t *conf = data; - mddev_t *mddev = conf->mddev; - kdev_t dev; - - if (mddev->sb_dirty) - md_update_sb(mddev); - - for (;;) { - md_spin_lock_irqsave(&retry_list_lock, flags); - r1_bh = raid1_retry_list; - if (!r1_bh) - break; - raid1_retry_list = r1_bh->next_r1; - md_spin_unlock_irqrestore(&retry_list_lock, flags); - - mddev = r1_bh->mddev; - bh = &r1_bh->bh_req; - switch(r1_bh->cmd) { - case SPECIAL: - /* have to allocate lots of bh structures and - * schedule writes - */ - if (test_bit(R1BH_Uptodate, &r1_bh->state)) { - int i, sum_bhs = 0; - int disks = MD_SB_DISKS; - struct buffer_head *bhl, *mbh; - - conf = mddev_to_conf(mddev); - bhl = raid1_alloc_bh(conf, conf->raid_disks); /* don't really need this many */ - for (i = 0; i < disks ; i++) { - if (!conf->mirrors[i].operational) - continue; - if (i==conf->last_used) - /* we read from here, no need to write */ - continue; - if (i < conf->raid_disks - && mddev->in_sync) - /* don't need to write this, - * we are just rebuilding */ - continue; - mbh = bhl; - if (!mbh) { - MD_BUG(); - break; - } - bhl = mbh->b_next; - mbh->b_this_page = (struct buffer_head *)1; - - - /* - * prepare mirrored bh (fields ordered for max mem throughput): - */ - mbh->b_blocknr = bh->b_blocknr; - mbh->b_dev = conf->mirrors[i].dev; - mbh->b_rdev = conf->mirrors[i].dev; - mbh->b_rsector = bh->b_blocknr; - mbh->b_state = (1<<BH_Req) | (1<<BH_Dirty) | (1<<BH_Mapped) | (1<<BH_Lock); - atomic_set(&mbh->b_count, 1); - mbh->b_size = bh->b_size; - mbh->b_page = bh->b_page; - mbh->b_data = bh->b_data; - mbh->b_list = BUF_LOCKED; - mbh->b_end_io = end_sync_write; - mbh->b_private = r1_bh; - - mbh->b_next = r1_bh->mirror_bh_list; - r1_bh->mirror_bh_list = mbh; - - sum_bhs++; - } - md_atomic_set(&r1_bh->remaining, sum_bhs); - if (bhl) raid1_free_bh(conf, bhl); - mbh = 
r1_bh->mirror_bh_list; - - if (!sum_bhs) { - /* nowhere to write this too... I guess we - * must be done - */ - sync_request_done(bh->b_blocknr, conf); - md_done_sync(mddev, bh->b_size>>9, 0); - raid1_free_buf(r1_bh); - } else - while (mbh) { - struct buffer_head *bh1 = mbh; - mbh = mbh->b_next; - generic_make_request(WRITE, bh1); - md_sync_acct(bh1->b_dev, bh1->b_size/512); - } - } else { - /* There is no point trying a read-for-reconstruct - * as reconstruct is about to be aborted - */ - - printk (IO_ERROR, partition_name(bh->b_dev), bh->b_blocknr); - md_done_sync(mddev, bh->b_size>>9, 0); - } - - break; - case READ: - case READA: - dev = bh->b_dev; - raid1_map (mddev, &bh->b_dev); - if (bh->b_dev == dev) { - printk (IO_ERROR, partition_name(bh->b_dev), bh->b_blocknr); - raid1_end_bh_io(r1_bh, 0); - } else { - printk (REDIRECT_SECTOR, - partition_name(bh->b_dev), bh->b_blocknr); - bh->b_rdev = bh->b_dev; - bh->b_rsector = bh->b_blocknr; - generic_make_request (r1_bh->cmd, bh); - } - break; - } - } - md_spin_unlock_irqrestore(&retry_list_lock, flags); -} -#undef IO_ERROR -#undef REDIRECT_SECTOR - -<<<<<<< found -static void raid1syncd (void *data) -{ - raid1_conf_t *conf = data; -||||||| expected -static void raid1syncd(void *data) -{ - conf_t *conf = data; -======= ->>>>>>> replacement - -/* - * perform a "sync" on one "block" - * - * We need to make sure that no normal I/O request - particularly write - * requests - conflict with active sync requests. - * This is achieved by conceptually dividing the device space into a - * number of sections: - * DONE: 0 .. a-1 These blocks are in-sync - * ACTIVE: a.. b-1 These blocks may have active sync requests, but - * no normal IO requests - * READY: b .. c-1 These blocks have no normal IO requests - sync - * request may be happening - * PENDING: c .. d-1 These blocks may have IO requests, but no new - * ones will be added - * FUTURE: d .. end These blocks are not to be considered yet. 
IO may - * be happening, but not sync - * - * We keep a - * phase which flips (0 or 1) each time d moves and - * a count of: - * z = active io requests in FUTURE since d moved - marked with - * current phase - * y = active io requests in FUTURE before d moved, or PENDING - - * marked with previous phase - * x = active sync requests in READY - * w = active sync requests in ACTIVE - * v = active io requests in DONE - * - * Normally, a=b=c=d=0 and z= active io requests - * or a=b=c=d=END and v= active io requests - * Allowed changes to a,b,c,d: - * A: c==d && y==0 -> d+=window, y=z, z=0, phase=!phase - * B: y==0 -> c=d - * C: b=c, w+=x, x=0 - * D: w==0 -> a=b - * E: a==b==c==d==end -> a=b=c=d=0, z=v, v=0 - * - * At start of sync we apply A. - * When y reaches 0, we apply B then A then being sync requests - * When sync point reaches c-1, we wait for y==0, and W==0, and - * then apply apply B then A then D then C. - * Finally, we apply E - * - * The sync request simply issues a "read" against a working drive - * This is marked so that on completion the raid1d thread is woken to - * issue suitable write requests - */ - -static int raid1_sync_request (mddev_t *mddev, unsigned long sector_nr) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - struct mirror_info *mirror; - struct raid1_bh *r1_bh; - struct buffer_head *bh; - int bsize; - int disk; - int block_nr; - int buffs; - - if (!sector_nr) { - /* we want enough buffers to hold twice the window of 128*/ - buffs = 128 *2 / (PAGE_SIZE>>9); - buffs = raid1_grow_buffers(conf, buffs); - if (buffs < 2) - goto nomem; - conf->window = buffs*(PAGE_SIZE>>9)/2; - } - spin_lock_irq(&conf->segment_lock); - if (!sector_nr) { - /* initialize ...*/ - conf->start_active = 0; - conf->start_ready = 0; - conf->start_pending = 0; - conf->start_future = 0; - conf->phase = 0; - - conf->cnt_future += conf->cnt_done+conf->cnt_pending; - conf->cnt_done = conf->cnt_pending = 0; - if (conf->cnt_ready || conf->cnt_active) - MD_BUG(); - } - while 
(sector_nr >= conf->start_pending) { - PRINTK("wait .. sect=%lu start_active=%d ready=%d pending=%d future=%d, cnt_done=%d active=%d ready=%d pending=%d future=%d\n", - sector_nr, conf->start_active, conf->start_ready, conf->start_pending, conf->start_future, - conf->cnt_done, conf->cnt_active, conf->cnt_ready, conf->cnt_pending, conf->cnt_future); - wait_event_lock_irq(conf->wait_done, - !conf->cnt_active, - conf->segment_lock); - wait_event_lock_irq(conf->wait_ready, - !conf->cnt_pending, - conf->segment_lock); - conf->start_active = conf->start_ready; - conf->start_ready = conf->start_pending; - conf->start_pending = conf->start_future; - conf->start_future = conf->start_future+conf->window; - // Note: falling off the end is not a problem - conf->phase = conf->phase ^1; - conf->cnt_active = conf->cnt_ready; - conf->cnt_ready = 0; - conf->cnt_pending = conf->cnt_future; - conf->cnt_future = 0; - wake_up(&conf->wait_done); - } - conf->cnt_ready++; - spin_unlock_irq(&conf->segment_lock); - - - /* If reconstructing, and >1 working disc, - * could dedicate one to rebuild and others to - * service read requests .. 
- */ - disk = conf->last_used; - /* make sure disk is operational */ - while (!conf->mirrors[disk].operational) { - if (disk <= 0) disk = conf->raid_disks; - disk--; - if (disk == conf->last_used) - break; - } - conf->last_used = disk; - - mirror = conf->mirrors+conf->last_used; - - r1_bh = raid1_alloc_buf (conf); - r1_bh->master_bh = NULL; - r1_bh->mddev = mddev; - r1_bh->cmd = SPECIAL; - bh = &r1_bh->bh_req; - - block_nr = sector_nr; - bsize = 512; - while (!(block_nr & 1) && bsize < PAGE_SIZE - && (block_nr+2)*(bsize>>9) < (mddev->sb->size *2)) { - block_nr >>= 1; - bsize <<= 1; - } - bh->b_size = bsize; - bh->b_list = BUF_LOCKED; - bh->b_dev = mirror->dev; - bh->b_rdev = mirror->dev; - bh->b_state = (1<<BH_Req) | (1<<BH_Mapped) | (1<<BH_Lock); - if (!bh->b_page) - BUG(); - if (!bh->b_data) - BUG(); - if (bh->b_data != page_address(bh->b_page)) - BUG(); - bh->b_end_io = end_sync_read; - bh->b_private = r1_bh; - bh->b_blocknr = sector_nr; - bh->b_rsector = sector_nr; - init_waitqueue_head(&bh->b_wait); - - generic_make_request(READ, bh); - md_sync_acct(bh->b_dev, bh->b_size/512); - - return (bsize >> 9); - -nomem: -<<<<<<< found - raid1_shrink_buffers(conf); - return -ENOMEM; -} - -static void end_sync_read(struct buffer_head *bh, int uptodate) -||||||| expected - if (!sector_nr) - if (init_resync(conf)) - return -ENOMEM; - /* - * If there is non-resync activity waiting for us then - * put in a delay to throttle resync. -======= - if (sector_nr == 0) - if (init_resync(conf)) - return -ENOMEM; - - max_sector = mddev->sb->size << 1; - if (sector_nr >= max_sector) { - close_sync(conf); - return 0; - } - - /* - * If there is non-resync activity waiting for us then - * put in a delay to throttle resync. ->>>>>>> replacement -{ - struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private); - - /* we have read a block, now it needs to be re-written, - * or re-read if the read failed. 
- * We don't do much here, just schedule handling by raid1d - */ - if (!uptodate) - md_error (r1_bh->mddev, bh->b_dev); - else - set_bit(R1BH_Uptodate, &r1_bh->state); - raid1_reschedule_retry(r1_bh); -} - -static void end_sync_write(struct buffer_head *bh, int uptodate) -{ - struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private); - - if (!uptodate) - md_error (r1_bh->mddev, bh->b_dev); - if (atomic_dec_and_test(&r1_bh->remaining)) { - mddev_t *mddev = r1_bh->mddev; -<<<<<<< found - unsigned long sect = bh->b_blocknr; - int size = bh->b_size; - raid1_free_buf(r1_bh); - sync_request_done(sect, mddev_to_conf(mddev)); - md_done_sync(mddev,size>>9, uptodate); -||||||| expected - r1_bio->sector = sector_nr; - r1_bio->cmd = SPECIAL; - - max_sector = mddev->sb->size << 1; - if (sector_nr >= max_sector) - BUG(); - -======= - r1_bio->sector = sector_nr; - r1_bio->cmd = SPECIAL; - ->>>>>>> replacement - } -} - -#define INVALID_LEVEL KERN_WARNING \ -"raid1: md%d: raid level not set to mirroring (%d)\n" - -#define NO_SB KERN_ERR \ -"raid1: disabled mirror %s (couldn't access raid superblock)\n" - -#define ERRORS KERN_ERR \ -"raid1: disabled mirror %s (errors detected)\n" - -#define NOT_IN_SYNC KERN_ERR \ -"raid1: disabled mirror %s (not in sync)\n" - -#define INCONSISTENT KERN_ERR \ -"raid1: disabled mirror %s (inconsistent descriptor)\n" - -#define ALREADY_RUNNING KERN_ERR \ -"raid1: disabled mirror %s (mirror %d already operational)\n" - -#define OPERATIONAL KERN_INFO \ -"raid1: device %s operational as mirror %d\n" - -#define MEM_ERROR KERN_ERR \ -"raid1: couldn't allocate memory for md%d\n" - -#define SPARE KERN_INFO \ -"raid1: spare disk %s\n" - -#define NONE_OPERATIONAL KERN_ERR \ -"raid1: no operational mirrors for md%d\n" - -#define ARRAY_IS_ACTIVE KERN_INFO \ -"raid1: raid set md%d active with %d out of %d mirrors\n" - -#define THREAD_ERROR KERN_ERR \ -"raid1: couldn't allocate thread for md%d\n" - -#define START_RESYNC KERN_WARNING \ -"raid1: raid set md%d not 
clean; reconstructing mirrors\n" - -static int raid1_run (mddev_t *mddev) -{ - raid1_conf_t *conf; - int i, j, disk_idx; - struct mirror_info *disk; - mdp_super_t *sb = mddev->sb; - mdp_disk_t *descriptor; - mdk_rdev_t *rdev; - struct md_list_head *tmp; - - MOD_INC_USE_COUNT; - - if (sb->level != 1) { - printk(INVALID_LEVEL, mdidx(mddev), sb->level); - goto out; - } - /* - * copy the already verified devices into our private RAID1 - * bookkeeping area. [whatever we allocate in raid1_run(), - * should be freed in raid1_stop()] - */ - - conf = kmalloc(sizeof(raid1_conf_t), GFP_KERNEL); - mddev->private = conf; - if (!conf) { - printk(MEM_ERROR, mdidx(mddev)); - goto out; - } - memset(conf, 0, sizeof(*conf)); - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) { - printk(ERRORS, partition_name(rdev->dev)); - } else { - if (!rdev->sb) { - MD_BUG(); - continue; - } - } - if (rdev->desc_nr == -1) { - MD_BUG(); - continue; - } - descriptor = &sb->disks[rdev->desc_nr]; - disk_idx = descriptor->raid_disk; - disk = conf->mirrors + disk_idx; - - if (disk_faulty(descriptor)) { - disk->number = descriptor->number; - disk->raid_disk = disk_idx; - disk->dev = rdev->dev; - disk->sect_limit = MAX_WORK_PER_DISK; - disk->operational = 0; - disk->write_only = 0; - disk->spare = 0; - disk->used_slot = 1; - disk->head_position = 0; - continue; - } - if (disk_active(descriptor)) { - if (!disk_sync(descriptor)) { - printk(NOT_IN_SYNC, - partition_name(rdev->dev)); - continue; - } - if ((descriptor->number > MD_SB_DISKS) || - (disk_idx > sb->raid_disks)) { - - printk(INCONSISTENT, - partition_name(rdev->dev)); - continue; - } - if (disk->operational) { - printk(ALREADY_RUNNING, - partition_name(rdev->dev), - disk_idx); - continue; - } - printk(OPERATIONAL, partition_name(rdev->dev), - disk_idx); - disk->number = descriptor->number; - disk->raid_disk = disk_idx; - disk->dev = rdev->dev; - disk->sect_limit = MAX_WORK_PER_DISK; - disk->operational = 1; - disk->write_only = 0; - disk->spare 
= 0; - disk->used_slot = 1; - disk->head_position = 0; - conf->working_disks++; - } else { - /* - * Must be a spare disk .. - */ - printk(SPARE, partition_name(rdev->dev)); - disk->number = descriptor->number; - disk->raid_disk = disk_idx; - disk->dev = rdev->dev; - disk->sect_limit = MAX_WORK_PER_DISK; - disk->operational = 0; - disk->write_only = 0; - disk->spare = 1; - disk->used_slot = 1; - disk->head_position = 0; - } - } - conf->raid_disks = sb->raid_disks; - conf->nr_disks = sb->nr_disks; - conf->mddev = mddev; - conf->device_lock = MD_SPIN_LOCK_UNLOCKED; - - conf->segment_lock = MD_SPIN_LOCK_UNLOCKED; - init_waitqueue_head(&conf->wait_buffer); - init_waitqueue_head(&conf->wait_done); - init_waitqueue_head(&conf->wait_ready); - - if (!conf->working_disks) { - printk(NONE_OPERATIONAL, mdidx(mddev)); - goto out_free_conf; - } - - - /* pre-allocate some buffer_head structures. - * As a minimum, 1 r1bh and raid_disks buffer_heads - * would probably get us by in tight memory situations, - * but a few more is probably a good idea. 
- * For now, try NR_RESERVED_BUFS r1bh and - * NR_RESERVED_BUFS*raid_disks bufferheads - * This will allow at least NR_RESERVED_BUFS concurrent - * reads or writes even if kmalloc starts failing - */ - if (raid1_grow_r1bh(conf, NR_RESERVED_BUFS) < NR_RESERVED_BUFS || - raid1_grow_bh(conf, NR_RESERVED_BUFS*conf->raid_disks) - < NR_RESERVED_BUFS*conf->raid_disks) { - printk(MEM_ERROR, mdidx(mddev)); - goto out_free_conf; - } - - for (i = 0; i < MD_SB_DISKS; i++) { - - descriptor = sb->disks+i; - disk_idx = descriptor->raid_disk; - disk = conf->mirrors + disk_idx; - - if (disk_faulty(descriptor) && (disk_idx < conf->raid_disks) && - !disk->used_slot) { - - disk->number = descriptor->number; - disk->raid_disk = disk_idx; - disk->dev = MKDEV(0,0); - - disk->operational = 0; - disk->write_only = 0; - disk->spare = 0; - disk->used_slot = 1; - disk->head_position = 0; - } - } - - /* - * find the first working one and use it as a starting point - * to read balancing. - */ - for (j = 0; !conf->mirrors[j].operational && j < MD_SB_DISKS; j++) - /* nothing */; - conf->last_used = j; - - - - { - const char * name = "raid1d"; - - conf->thread = md_register_thread(raid1d, conf, name); - if (!conf->thread) { - printk(THREAD_ERROR, mdidx(mddev)); - goto out_free_conf; - } - } - -<<<<<<< found - if (!start_recovery && !(sb->state & (1 << MD_SB_CLEAN)) && - (conf->working_disks > 1)) { - const char * name = "raid1syncd"; - - conf->resync_thread = md_register_thread(raid1syncd, conf,name); -||||||| expected - if (!start_recovery && !(sb->state & (1 << MD_SB_CLEAN)) && - (conf->working_disks > 1)) { - const char * name = "raid1syncd"; - - conf->resync_thread = md_register_thread(raid1syncd, conf, name); -======= ->>>>>>> replacement - - /* - * Regenerate the "device is in sync with the raid set" bit for - * each device. 
- */ - for (i = 0; i < MD_SB_DISKS; i++) { - mark_disk_nonsync(sb->disks+i); - for (j = 0; j < sb->raid_disks; j++) { - if (!conf->mirrors[j].operational) - continue; - if (sb->disks[i].number == conf->mirrors[j].number) - mark_disk_sync(sb->disks+i); - } - } - sb->active_disks = conf->working_disks; - - printk(ARRAY_IS_ACTIVE, mdidx(mddev), sb->active_disks, sb->raid_disks); - /* - * Ok, everything is just fine now - */ - return 0; - -out_free_conf: - raid1_shrink_r1bh(conf); - raid1_shrink_bh(conf); - raid1_shrink_buffers(conf); - kfree(conf); - mddev->private = NULL; -out: - MOD_DEC_USE_COUNT; - return -EIO; -} - -#undef INVALID_LEVEL -#undef NO_SB -#undef ERRORS -#undef NOT_IN_SYNC -#undef INCONSISTENT -#undef ALREADY_RUNNING -#undef OPERATIONAL -#undef SPARE -#undef NONE_OPERATIONAL -#undef ARRAY_IS_ACTIVE - -<<<<<<< found -static int raid1_stop_resync (mddev_t *mddev) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - - if (conf->resync_thread) { - if (conf->resync_mirrors) { - md_interrupt_thread(conf->resync_thread); - - printk(KERN_INFO "raid1: mirror resync was not fully finished, restarting next time.\n"); - return 1; - } - return 0; - } - return 0; -} - -static int raid1_restart_resync (mddev_t *mddev) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); -||||||| expected -static int stop_resync(mddev_t *mddev) -{ - conf_t *conf = mddev_to_conf(mddev); - - if (conf->resync_thread) { - if (conf->resync_mirrors) { - md_interrupt_thread(conf->resync_thread); - - printk(KERN_INFO "raid1: mirror resync was not fully finished, restarting next time.\n"); - return 1; - } - return 0; - } - return 0; -} - -static int restart_resync(mddev_t *mddev) -{ - conf_t *conf = mddev_to_conf(mddev); -======= ->>>>>>> replacement -static int raid1_stop (mddev_t *mddev) -{ - raid1_conf_t *conf = mddev_to_conf(mddev); - - md_unregister_thread(conf->thread); - raid1_shrink_r1bh(conf); - raid1_shrink_bh(conf); - raid1_shrink_buffers(conf); - kfree(conf); - mddev->private = NULL; - 
MOD_DEC_USE_COUNT; - return 0; -} - -static mdk_personality_t raid1_personality= -{ - name: "raid1", - make_request: raid1_make_request, - run: raid1_run, - stop: raid1_stop, - status: raid1_status, - error_handler: raid1_error, - diskop: raid1_diskop, -<<<<<<< found - stop_resync: raid1_stop_resync, - restart_resync: raid1_restart_resync, -||||||| expected - stop_resync: stop_resync, - restart_resync: restart_resync, -======= ->>>>>>> replacement - sync_request: raid1_sync_request -}; - -static int md__init raid1_init (void) -{ - return register_md_personality (RAID1, &raid1_personality); -} - -static void raid1_exit (void) -{ - unregister_md_personality (RAID1); -} - -module_init(raid1_init); -module_exit(raid1_exit); -MODULE_LICENSE("GPL"); ./linux/md-resync/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- diff 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.361397361 +0000 @@ -1,93 +0,0 @@ -@@ -1,90 +1,90 @@ -| return <<<--0-->>><<<++1++>>>; -|<<<--abort:-->>><<<++}++>>> -|<<<-- return-->>><<<++ -|#undef++>>> <<<--1; -|}-->>><<<++OLD_LEVEL++>>> - - static int device_size_calculation(mddev_t * mddev) - { - int data_disks = 0; - unsigned int readahead; - struct list_head *tmp; - mdk_rdev_t *rdev; - - /* - * Do device size calculation. Bail out if too small. 
- * (we have to do this after having validated chunk_size, - * because device size has to be modulo chunk_size) - */ - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (rdev->size < mddev->chunk_size / 1024) { - printk(KERN_WARNING - "md: Dev %s smaller than chunk_size:" - " %lluk < %dk\n", - bdev_partition_name(rdev->bdev), - (unsigned long long)rdev->size, - mddev->chunk_size / 1024); - return -EINVAL; - } - } - - switch (mddev->level) { - case LEVEL_MULTIPATH: - data_disks = 1; - break; - case -3: - data_disks = 1; - break; - case -2: - data_disks = 1; - break; - case LEVEL_LINEAR: - zoned_raid_size(mddev); - data_disks = 1; - break; - case 0: - zoned_raid_size(mddev); - data_disks = mddev->raid_disks; - break; - case 1: - data_disks = 1; - break; - case 4: - case 5: - data_disks = mddev->raid_disks-1; - break; - default: - printk(KERN_ERR "md: md%d: unsupported raid level %d\n", - mdidx(mddev), mddev->level); - goto abort; - } - if (!md_size[mdidx(mddev)]) - md_size[mdidx(mddev)] = mddev->size * data_disks; - - readahead = (VM_MAX_READAHEAD * 1024) / PAGE_SIZE; - if (!mddev->level || (mddev->level == 4) || (mddev->level == 5)) { - readahead = (mddev->chunk_size>>PAGE_SHIFT) * 4 * data_disks; - if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2) - readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2; - } else { - // (no multipath branch - it uses the default setting) - if (mddev->level == -3) - readahead = 0; - } - - printk(KERN_INFO "md%d: max total readahead window set to %ldk\n", - mdidx(mddev), readahead*(PAGE_SIZE/1024)); - - printk(KERN_INFO - "md%d: %d data-disks, max readahead per data-disk: %ldk\n", - mdidx(mddev), data_disks, readahead/data_disks*(PAGE_SIZE/1024)); - return 0; - abort: - return 1; - } - - static struct gendisk *md_probe(dev_t dev, int *part, void *data) - { - static DECLARE_MUTEX(disks_sem); -|<<<-- -->>> \ No newline at end of file ./linux/md-messy/diff FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" 
unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.372061600 +0000 @@ -1,3960 +0,0 @@ -/* - md.c : Multiple Devices driver for Linux - Copyright (C) 1998, 1999, 2000 Ingo Molnar - - completely rewritten, based on the MD driver code from Marc Zyngier - - Changes: - - - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar - - boot support for linear and striped mode by Harald Hoyer - - kerneld support by Boris Tobotras - - kmod support by: Cyrus Durgin - - RAID0 bugfixes: Mark Anthony Lisher - - Devfs support by Richard Gooch - - - lots of fixes and improvements to the RAID1/RAID5 and generic - RAID code (such as request based resynchronization): - - Neil Brown . - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - - You should have received a copy of the GNU General Public License - (for example /usr/src/linux/COPYING); if not, write to the Free - Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. -*/ - -#include -#include -#include -#include -#include -#include - -#include - -#ifdef CONFIG_KMOD -#include -#endif - -#define __KERNEL_SYSCALLS__ -#include - -#include - -#define MAJOR_NR MD_MAJOR -#define MD_DRIVER - -#include - -#define DEBUG 0 -#if DEBUG -# define dprintk(x...) printk(x) -#else -# define dprintk(x...) do { } while(0) -#endif - -#ifndef MODULE -static void autostart_arrays (void); -#endif - -static mdk_personality_t *pers[MAX_PERSONALITY]; - -/* - * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit' - * is 100 KB/sec, so the extra system load does not show up that much. - * Increase it if you want to have more _guaranteed_ speed. Note that - * the RAID driver will use the maximum available bandwith if the IO - * subsystem is idle. 
There is also an 'absolute maximum' reconstruction - * speed limit - in case reconstruction slows down your system despite - * idle IO detection. - * - * you can change it via /proc/sys/dev/raid/speed_limit_min and _max. - */ - -static int sysctl_speed_limit_min = 100; -static int sysctl_speed_limit_max = 100000; - -static struct ctl_table_header *raid_table_header; - -static ctl_table raid_table[] = { - {DEV_RAID_SPEED_LIMIT_MIN, "speed_limit_min", - &sysctl_speed_limit_min, sizeof(int), 0644, NULL, &proc_dointvec}, - {DEV_RAID_SPEED_LIMIT_MAX, "speed_limit_max", - &sysctl_speed_limit_max, sizeof(int), 0644, NULL, &proc_dointvec}, - {0} -}; - -static ctl_table raid_dir_table[] = { - {DEV_RAID, "raid", NULL, 0, 0555, raid_table}, - {0} -}; - -static ctl_table raid_root_table[] = { - {CTL_DEV, "dev", NULL, 0, 0555, raid_dir_table}, - {0} -}; - -/* - * these have to be allocated separately because external - * subsystems want to have a pre-defined structure - */ -struct hd_struct md_hd_struct[MAX_MD_DEVS]; -static int md_blocksizes[MAX_MD_DEVS]; -static int md_hardsect_sizes[MAX_MD_DEVS]; -static mdk_thread_t *md_recovery_thread; - -int md_size[MAX_MD_DEVS]; - -static struct block_device_operations md_fops; -static devfs_handle_t devfs_handle; - -static struct gendisk md_gendisk= -{ - major: MD_MAJOR, - major_name: "md", - minor_shift: 0, - max_p: 1, - part: md_hd_struct, - sizes: md_size, - nr_real: MAX_MD_DEVS, - real_devices: NULL, - next: NULL, - fops: &md_fops, -}; - -/* - * Enables to iterate over all existing md arrays - */ -static MD_LIST_HEAD(all_mddevs); - -static mddev_t *mddev_map[MAX_MD_DEVS]; - -static inline mddev_t * kdev_to_mddev (kdev_t dev) -{ - if (MAJOR(dev) != MD_MAJOR) - BUG(); - return mddev_map[MINOR(dev)]; -} - -static int md_fail_request (request_queue_t *q, struct bio *bio) -{ - bio_io_error(bio); - return 0; -} - -static mddev_t * alloc_mddev(kdev_t dev) -{ - mddev_t *mddev; - - if (MAJOR(dev) != MD_MAJOR) { - MD_BUG(); - return 0; - } - 
mddev = (mddev_t *) kmalloc(sizeof(*mddev), GFP_KERNEL); - if (!mddev) - return NULL; - - memset(mddev, 0, sizeof(*mddev)); - - mddev->__minor = MINOR(dev); - init_MUTEX(&mddev->reconfig_sem); - init_MUTEX(&mddev->recovery_sem); - init_MUTEX(&mddev->resync_sem); - MD_INIT_LIST_HEAD(&mddev->disks); - MD_INIT_LIST_HEAD(&mddev->all_mddevs); - atomic_set(&mddev->active, 0); - - mddev_map[mdidx(mddev)] = mddev; - md_list_add(&mddev->all_mddevs, &all_mddevs); - - MOD_INC_USE_COUNT; - - return mddev; -} - -mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) -{ - mdk_rdev_t * rdev; - struct md_list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == nr) - return rdev; - } - return NULL; -} - -mdk_rdev_t * find_rdev(mddev_t * mddev, kdev_t dev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->dev == dev) - return rdev; - } - return NULL; -} - -static MD_LIST_HEAD(device_names); - -char * partition_name(kdev_t dev) -{ - struct gendisk *hd; - static char nomem [] = "<nomem>"; - dev_name_t *dname; - struct md_list_head *tmp; - - list_for_each(tmp, &device_names) { - dname = md_list_entry(tmp, dev_name_t, list); - if (dname->dev == dev) - return dname->name; - } - - dname = (dev_name_t *) kmalloc(sizeof(*dname), GFP_KERNEL); - - if (!dname) - return nomem; - /* - * ok, add this new device name to the list - */ - hd = get_gendisk (dev); - dname->name = NULL; - if (hd) - dname->name = disk_name (hd, MINOR(dev), dname->namebuf); - if (!dname->name) { - sprintf (dname->namebuf, "[dev %s]", kdevname(dev)); - dname->name = dname->namebuf; - } - - dname->dev = dev; - md_list_add(&dname->list, &device_names); - - return dname->name; -} - -static unsigned int calc_dev_sboffset(kdev_t dev, mddev_t *mddev, - int persistent) -{ - unsigned int size = 0; - - if (blk_size[MAJOR(dev)]) - size = blk_size[MAJOR(dev)][MINOR(dev)]; - if (persistent) - size = MD_NEW_SIZE_BLOCKS(size); - return size; -} - -static unsigned int 
calc_dev_size(kdev_t dev, mddev_t *mddev, int persistent) -{ - unsigned int size; - - size = calc_dev_sboffset(dev, mddev, persistent); - if (!mddev->sb) { - MD_BUG(); - return size; - } - if (mddev->sb->chunk_size) - size &= ~(mddev->sb->chunk_size/1024 - 1); - return size; -} - -static unsigned int zoned_raid_size(mddev_t *mddev) -{ - unsigned int mask; - mdk_rdev_t * rdev; - struct md_list_head *tmp; - - if (!mddev->sb) { - MD_BUG(); - return -EINVAL; - } - /* - * do size and offset calculations. - */ - mask = ~(mddev->sb->chunk_size/1024 - 1); - - ITERATE_RDEV(mddev,rdev,tmp) { - rdev->size &= mask; - md_size[mdidx(mddev)] += rdev->size; - } - return 0; -} - -static void remove_descriptor(mdp_disk_t *disk, mdp_super_t *sb) -{ - if (disk_active(disk)) { - sb->working_disks--; - } else { - if (disk_spare(disk)) { - sb->spare_disks--; - sb->working_disks--; - } else { - sb->failed_disks--; - } - } - sb->nr_disks--; - disk->major = 0; - disk->minor = 0; - mark_disk_removed(disk); -} - -#define BAD_MAGIC KERN_ERR \ -"md: invalid raid superblock magic on %s\n" - -#define BAD_MINOR KERN_ERR \ -"md: %s: invalid raid minor (%x)\n" - -#define OUT_OF_MEM KERN_ALERT \ -"md: out of memory.\n" - -#define NO_SB KERN_ERR \ -"md: disabled device %s, could not read superblock.\n" - -#define BAD_CSUM KERN_WARNING \ -"md: invalid superblock checksum on %s\n" - -static int alloc_array_sb(mddev_t * mddev) -{ - if (mddev->sb) { - MD_BUG(); - return 0; - } - - mddev->sb = (mdp_super_t *) __get_free_page (GFP_KERNEL); - if (!mddev->sb) - return -ENOMEM; - md_clear_page(mddev->sb); - return 0; -} - -static int alloc_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb) - MD_BUG(); - - rdev->sb_page = alloc_page(GFP_KERNEL); - if (!rdev->sb_page) { - printk(OUT_OF_MEM); - return -EINVAL; - } - rdev->sb = (mdp_super_t *) page_address(rdev->sb_page); - - return 0; -} - -static void free_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) { - page_cache_release(rdev->sb_page); - rdev->sb = NULL; - 
rdev->sb_page = NULL; - rdev->sb_offset = 0; - rdev->size = 0; - } else { - if (!rdev->faulty) - MD_BUG(); - } -} - - -static void bh_complete(struct buffer_head *bh, int uptodate) -{ - - if (uptodate) - set_bit(BH_Uptodate, &bh->b_state); - - complete((struct completion*)bh->b_private); -} - -static int sync_page_io(kdev_t dev, unsigned long sector, int size, - struct page *page, int rw) -{ - struct buffer_head bh; - struct completion event; - - init_completion(&event); - init_buffer(&bh, bh_complete, &event); - bh.b_rdev = dev; - bh.b_rsector = sector; - bh.b_state = (1 << BH_Req) | (1 << BH_Mapped) | (1 << BH_Lock); - bh.b_size = size; - bh.b_page = page; - bh.b_reqnext = NULL; - bh.b_data = page_address(page); - generic_make_request(rw, &bh); - - run_task_queue(&tq_disk); - wait_for_completion(&event); - - return test_bit(BH_Uptodate, &bh.b_state); -} - -static int read_disk_sb(mdk_rdev_t * rdev) -{ - int ret = -EINVAL; - kdev_t dev = rdev->dev; - unsigned long sb_offset; - - if (!rdev->sb) { - MD_BUG(); - goto abort; - } - - /* - * Calculate the position of the superblock, - * it's at the end of the disk - */ - sb_offset = calc_dev_sboffset(rdev->dev, rdev->mddev, 1); - rdev->sb_offset = sb_offset; - - if (!sync_page_io(dev, sb_offset<<1, MD_SB_BYTES, rdev->sb_page, READ)) { - printk(NO_SB,partition_name(dev)); - return -EINVAL; - } - printk(KERN_INFO " [events: %08lx]\n", (unsigned long)rdev->sb->events_lo); - ret = 0; -abort: - return ret; -} - -static unsigned int calc_sb_csum(mdp_super_t * sb) -{ - unsigned int disk_csum, csum; - - disk_csum = sb->sb_csum; - sb->sb_csum = 0; - csum = csum_partial((void *)sb, MD_SB_BYTES, 0); - sb->sb_csum = disk_csum; - return csum; -} - -/* - * Check one RAID superblock for generic plausibility - */ - -static int check_disk_sb(mdk_rdev_t * rdev) -{ - mdp_super_t *sb; - int ret = -EINVAL; - - sb = rdev->sb; - if (!sb) { - MD_BUG(); - goto abort; - } - - if (sb->md_magic != MD_SB_MAGIC) { - printk(BAD_MAGIC, 
partition_name(rdev->dev)); - goto abort; - } - - if (sb->md_minor >= MAX_MD_DEVS) { - printk(BAD_MINOR, partition_name(rdev->dev), sb->md_minor); - goto abort; - } - - if (calc_sb_csum(sb) != sb->sb_csum) { - printk(BAD_CSUM, partition_name(rdev->dev)); - goto abort; - } - ret = 0; -abort: - return ret; -} - -static kdev_t dev_unit(kdev_t dev) -{ - unsigned int mask; - struct gendisk *hd = get_gendisk(dev); - - if (!hd) - return 0; - mask = ~((1 << hd->minor_shift) - 1); - - return MKDEV(MAJOR(dev), MINOR(dev) & mask); -} - -static mdk_rdev_t * match_dev_unit(mddev_t *mddev, kdev_t dev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) - if (dev_unit(rdev->dev) == dev_unit(dev)) - return rdev; - - return NULL; -} - -static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev1,rdev,tmp) - if (match_dev_unit(mddev2, rdev->dev)) - return 1; - - return 0; -} - -static MD_LIST_HEAD(all_raid_disks); -static MD_LIST_HEAD(pending_raid_disks); - -static void bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev) -{ - mdk_rdev_t *same_pdev; - - if (rdev->mddev) { - MD_BUG(); - return; - } - same_pdev = match_dev_unit(mddev, rdev->dev); - if (same_pdev) - printk( KERN_WARNING -"md%d: WARNING: %s appears to be on the same physical disk as %s. 
True\n" -" protection against single-disk failure might be compromised.\n", - mdidx(mddev), partition_name(rdev->dev), - partition_name(same_pdev->dev)); - - md_list_add(&rdev->same_set, &mddev->disks); - rdev->mddev = mddev; - printk(KERN_INFO "md: bind<%s>\n", partition_name(rdev->dev)); -} - -static void unbind_rdev_from_array(mdk_rdev_t * rdev) -{ - if (!rdev->mddev) { - MD_BUG(); - return; - } - list_del_init(&rdev->same_set); - printk(KERN_INFO "md: unbind<%s>\n", partition_name(rdev->dev)); - rdev->mddev = NULL; -} - -/* - * prevent the device from being mounted, repartitioned or - * otherwise reused by a RAID array (or any other kernel - * subsystem), by opening the device. [simply getting an - * inode is not enough, the SCSI module usage code needs - * an explicit open() on the device] - */ -static int lock_rdev(mdk_rdev_t *rdev) -{ - int err = 0; - struct block_device *bdev; - - bdev = bdget(rdev->dev); - if (!bdev) - return -ENOMEM; - err = blkdev_get(bdev, FMODE_READ|FMODE_WRITE, 0, BDEV_RAW); - if (!err) - rdev->bdev = bdev; - return err; -} - -static void unlock_rdev(mdk_rdev_t *rdev) -{ - struct block_device *bdev = rdev->bdev; - rdev->bdev = NULL; - if (!bdev) - MD_BUG(); - blkdev_put(bdev, BDEV_RAW); -} - -void md_autodetect_dev(kdev_t dev); - -static void export_rdev(mdk_rdev_t * rdev) -{ - printk(KERN_INFO "md: export_rdev(%s)\n",partition_name(rdev->dev)); - if (rdev->mddev) - MD_BUG(); - unlock_rdev(rdev); - free_disk_sb(rdev); - list_del_init(&rdev->all); - if (!list_empty(&rdev->pending)) { - printk(KERN_INFO "md: (%s was pending)\n", - partition_name(rdev->dev)); - list_del_init(&rdev->pending); - } -#ifndef MODULE - md_autodetect_dev(rdev->dev); -#endif - rdev->dev = 0; - rdev->faulty = 0; - kfree(rdev); -} - -static void kick_rdev_from_array(mdk_rdev_t * rdev) -{ - unbind_rdev_from_array(rdev); - export_rdev(rdev); -} - -static void export_array(mddev_t *mddev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - mdp_super_t *sb = 
mddev->sb; - - if (mddev->sb) { - mddev->sb = NULL; - free_page((unsigned long) sb); - } - - ITERATE_RDEV(mddev,rdev,tmp) { - if (!rdev->mddev) { - MD_BUG(); - continue; - } - kick_rdev_from_array(rdev); - } - if (!list_empty(&mddev->disks)) - MD_BUG(); -} - -static void free_mddev(mddev_t *mddev) -{ - if (!mddev) { - MD_BUG(); - return; - } - - export_array(mddev); - md_size[mdidx(mddev)] = 0; - md_hd_struct[mdidx(mddev)].nr_sects = 0; - - /* - * Make sure nobody else is using this mddev - * (careful, we rely on the global kernel lock here) - */ - while (sem_getcount(&mddev->resync_sem) != 1) - schedule(); - while (sem_getcount(&mddev->recovery_sem) != 1) - schedule(); - -<<<<<<< found - del_mddev_mapping(mddev, MKDEV(MD_MAJOR, mdidx(mddev))); -||||||| expected - del_mddev_mapping(mddev, mk_kdev(MD_MAJOR, mdidx(mddev))); -======= - mddev_map[mdidx(mddev)] = NULL; ->>>>>>> replacement - md_list_del(&mddev->all_mddevs); - kfree(mddev); - MOD_DEC_USE_COUNT; -} - -#undef BAD_CSUM -#undef BAD_MAGIC -#undef OUT_OF_MEM -#undef NO_SB - -static void print_desc(mdp_disk_t *desc) -{ - printk(" DISK<N:%d,%s(%d,%d),R:%d,S:%d>\n", desc->number, - partition_name(MKDEV(desc->major,desc->minor)), - desc->major,desc->minor,desc->raid_disk,desc->state); -} - -static void print_sb(mdp_super_t *sb) -{ - int i; - - printk(KERN_INFO "md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n", - sb->major_version, sb->minor_version, sb->patch_version, - sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3, - sb->ctime); - printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n", sb->level, - sb->size, sb->nr_disks, sb->raid_disks, sb->md_minor, - sb->layout, sb->chunk_size); - printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d FD:%d SD:%d CSUM:%08x E:%08lx\n", - sb->utime, sb->state, sb->active_disks, sb->working_disks, - sb->failed_disks, sb->spare_disks, - sb->sb_csum, (unsigned long)sb->events_lo); - - printk(KERN_INFO); - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - - desc = sb->disks + 
i; - if (desc->number || desc->major || desc->minor || - desc->raid_disk || (desc->state && (desc->state != 4))) { - printk(" D %2d: ", i); - print_desc(desc); - } - } - printk(KERN_INFO "md: THIS: "); - print_desc(&sb->this_disk); - -} - -static void print_rdev(mdk_rdev_t *rdev) -{ - printk(KERN_INFO "md: rdev %s: O:%s, SZ:%08ld F:%d DN:%d ", - partition_name(rdev->dev), partition_name(rdev->old_dev), - rdev->size, rdev->faulty, rdev->desc_nr); - if (rdev->sb) { - printk(KERN_INFO "md: rdev superblock:\n"); - print_sb(rdev->sb); - } else - printk(KERN_INFO "md: no rdev superblock!\n"); -} - -void md_print_devices(void) -{ - struct md_list_head *tmp, *tmp2; - mdk_rdev_t *rdev; - mddev_t *mddev; - - printk("\n"); - printk("md: **********************************\n"); - printk("md: * <COMPLETE RAID STATE PRINTOUT> *\n"); - printk("md: **********************************\n"); - ITERATE_MDDEV(mddev,tmp) { - printk("md%d: ", mdidx(mddev)); - - ITERATE_RDEV(mddev,rdev,tmp2) - printk("<%s>", partition_name(rdev->dev)); - - if (mddev->sb) { - printk(" array superblock:\n"); - print_sb(mddev->sb); - } else - printk(" no array superblock.\n"); - - ITERATE_RDEV(mddev,rdev,tmp2) - print_rdev(rdev); - } - printk("md: **********************************\n"); - printk("\n"); -} - -static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - int ret; - mdp_super_t *tmp1, *tmp2; - - tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); - tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); - - if (!tmp1 || !tmp2) { - ret = 0; - printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n"); - goto abort; - } - - *tmp1 = *sb1; - *tmp2 = *sb2; - - /* - * nr_disks is not constant - */ - tmp1->nr_disks = 0; - tmp2->nr_disks = 0; - - if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4)) - ret = 0; - else - ret = 1; - -abort: - if (tmp1) - kfree(tmp1); - if (tmp2) - kfree(tmp2); - - return ret; -} - -static int uuid_equal(mdk_rdev_t *rdev1, mdk_rdev_t *rdev2) -{ - if ( (rdev1->sb->set_uuid0 == rdev2->sb->set_uuid0) && - (rdev1->sb->set_uuid1 == 
rdev2->sb->set_uuid1) && - (rdev1->sb->set_uuid2 == rdev2->sb->set_uuid2) && - (rdev1->sb->set_uuid3 == rdev2->sb->set_uuid3)) - - return 1; - - return 0; -} - -static mdk_rdev_t * find_rdev_all(kdev_t dev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - list_for_each(tmp, &all_raid_disks) { - rdev = md_list_entry(tmp, mdk_rdev_t, all); - if (rdev->dev == dev) - return rdev; - } - return NULL; -} - -#define GETBLK_FAILED KERN_ERR \ -"md: getblk failed for device %s\n" - -static int write_disk_sb(mdk_rdev_t * rdev) -{ - kdev_t dev; - unsigned long sb_offset, size; - - if (!rdev->sb) { - MD_BUG(); - return 1; - } - if (rdev->faulty) { - MD_BUG(); - return 1; - } - if (rdev->sb->md_magic != MD_SB_MAGIC) { - MD_BUG(); - return 1; - } - - dev = rdev->dev; - sb_offset = calc_dev_sboffset(dev, rdev->mddev, 1); - if (rdev->sb_offset != sb_offset) { - printk(KERN_INFO "%s's sb offset has changed from %ld to %ld, skipping\n", - partition_name(dev), rdev->sb_offset, sb_offset); - goto skip; - } - /* - * If the disk went offline meanwhile and it's just a spare, then - * its size has changed to zero silently, and the MD code does - * not yet know that it's faulty. 
- */ - size = calc_dev_size(dev, rdev->mddev, 1); - if (size != rdev->size) { - printk(KERN_INFO "%s's size has changed from %ld to %ld since import, skipping\n", - partition_name(dev), rdev->size, size); - goto skip; - } - - printk(KERN_INFO "(write) %s's sb offset: %ld\n", partition_name(dev), sb_offset); - - if (!sync_page_io(dev, sb_offset<<1, MD_SB_BYTES, rdev->sb_page, WRITE)) { - printk("md: write_disk_sb failed for device %s\n", partition_name(dev)); - return 1; - } -skip: - return 0; -} -#undef GETBLK_FAILED - -static void set_this_disk(mddev_t *mddev, mdk_rdev_t *rdev) -{ - int i, ok = 0; - mdp_disk_t *desc; - - for (i = 0; i < MD_SB_DISKS; i++) { - desc = mddev->sb->disks + i; -#if 0 - if (disk_faulty(desc)) { - if (MKDEV(desc->major,desc->minor) == rdev->dev) - ok = 1; - continue; - } -#endif - if (MKDEV(desc->major,desc->minor) == rdev->dev) { - rdev->sb->this_disk = *desc; - rdev->desc_nr = desc->number; - ok = 1; - break; - } - } - - if (!ok) { - MD_BUG(); - } -} - -static int sync_sbs(mddev_t * mddev) -{ - mdk_rdev_t *rdev; - mdp_super_t *sb; - struct md_list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty || rdev->alias_device) - continue; - sb = rdev->sb; - *sb = *mddev->sb; - set_this_disk(mddev, rdev); - sb->sb_csum = calc_sb_csum(sb); - } - return 0; -} - -int md_update_sb(mddev_t * mddev) -{ - int err, count = 100; - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - if (!mddev->sb_dirty) { - printk("hm, md_update_sb() called without ->sb_dirty == 1, from %p.\n", __builtin_return_address(0)); - return 0; - } - mddev->sb_dirty = 0; -repeat: - mddev->sb->utime = CURRENT_TIME; - if ((++mddev->sb->events_lo)==0) - ++mddev->sb->events_hi; - - if ((mddev->sb->events_lo|mddev->sb->events_hi)==0) { - /* - * oops, this 64-bit counter should never wrap. 
- * Either we are in around ~1 trillion A.C., assuming - * 1 reboot per second, or we have a bug: - */ - MD_BUG(); - mddev->sb->events_lo = mddev->sb->events_hi = 0xffffffff; - } - sync_sbs(mddev); - - /* - * do not write anything to disk if using - * nonpersistent superblocks - */ - if (mddev->sb->not_persistent) - return 0; - - printk(KERN_INFO "md: updating md%d RAID superblock on device\n", - mdidx(mddev)); - - err = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - printk(KERN_INFO "md: "); - if (rdev->faulty) - printk("(skipping faulty "); - if (rdev->alias_device) - printk("(skipping alias "); - if (!rdev->faulty && disk_faulty(&rdev->sb->this_disk)) { - printk("(skipping new-faulty %s )\n", - partition_name(rdev->dev)); - continue; - } - printk("%s ", partition_name(rdev->dev)); - if (!rdev->faulty && !rdev->alias_device) { - printk("[events: %08lx]", - (unsigned long)rdev->sb->events_lo); - err += write_disk_sb(rdev); - } else - printk(")\n"); - } - if (err) { - if (--count) { - printk(KERN_ERR "md: errors occurred during superblock update, repeating\n"); - goto repeat; - } - printk(KERN_ERR "md: excessive errors occurred during superblock update, exiting\n"); - } - return 0; -} - -/* - * Import a device. 
If 'on_disk', then sanity check the superblock - * - * mark the device faulty if: - * - * - the device is nonexistent (zero size) - * - the device has no valid superblock - * - */ -static int md_import_device(kdev_t newdev, int on_disk) -{ - int err; - mdk_rdev_t *rdev; - unsigned int size; - - if (find_rdev_all(newdev)) - return -EEXIST; - - rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL); - if (!rdev) { - printk(KERN_ERR "md: could not alloc mem for %s!\n", partition_name(newdev)); - return -ENOMEM; - } - memset(rdev, 0, sizeof(*rdev)); - - if (is_mounted(newdev)) { - printk(KERN_WARNING "md: can not import %s, has active inodes!\n", - partition_name(newdev)); - err = -EBUSY; - goto abort_free; - } - - if ((err = alloc_disk_sb(rdev))) - goto abort_free; - - rdev->dev = newdev; - if (lock_rdev(rdev)) { - printk(KERN_ERR "md: could not lock %s, zero-size? Marking faulty.\n", - partition_name(newdev)); - err = -EINVAL; - goto abort_free; - } - rdev->desc_nr = -1; - rdev->faulty = 0; - - size = 0; - if (blk_size[MAJOR(newdev)]) - size = blk_size[MAJOR(newdev)][MINOR(newdev)]; - if (!size) { - printk(KERN_WARNING "md: %s has zero size, marking faulty!\n", - partition_name(newdev)); - err = -EINVAL; - goto abort_free; - } - - if (on_disk) { - if ((err = read_disk_sb(rdev))) { - printk(KERN_WARNING "md: could not read %s's sb, not importing!\n", - partition_name(newdev)); - goto abort_free; - } - if ((err = check_disk_sb(rdev))) { - printk(KERN_WARNING "md: %s has invalid sb, not importing!\n", - partition_name(newdev)); - goto abort_free; - } - - if (rdev->sb->level != -4) { - rdev->old_dev = MKDEV(rdev->sb->this_disk.major, - rdev->sb->this_disk.minor); - rdev->desc_nr = rdev->sb->this_disk.number; - } else { - rdev->old_dev = MKDEV(0, 0); - rdev->desc_nr = -1; - } - } - md_list_add(&rdev->all, &all_raid_disks); - MD_INIT_LIST_HEAD(&rdev->pending); - INIT_LIST_HEAD(&rdev->same_set); - - return 0; - -abort_free: - if (rdev->sb) { - if (rdev->bdev) - 
unlock_rdev(rdev); - free_disk_sb(rdev); - } - kfree(rdev); - return err; -} - -/* - * Check a full RAID array for plausibility - */ - -#define INCONSISTENT KERN_ERR \ -"md: fatal superblock inconsistency in %s -- removing from array\n" - -#define OUT_OF_DATE KERN_ERR \ -"md: superblock update time inconsistency -- using the most recent one\n" - -#define OLD_VERSION KERN_ALERT \ -"md: md%d: unsupported raid array version %d.%d.%d\n" - -#define NOT_CLEAN_IGNORE KERN_ERR \ -"md: md%d: raid array is not clean -- starting background reconstruction\n" - -#define UNKNOWN_LEVEL KERN_ERR \ -"md: md%d: unsupported raid level %d\n" - -static int analyze_sbs(mddev_t * mddev) -{ - int out_of_date = 0, i, first; - struct md_list_head *tmp, *tmp2; - mdk_rdev_t *rdev, *rdev2, *freshest; - mdp_super_t *sb; - - /* - * Verify the RAID superblock on each real device - */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) { - MD_BUG(); - goto abort; - } - if (!rdev->sb) { - MD_BUG(); - goto abort; - } - if (check_disk_sb(rdev)) - goto abort; - } - - /* - * The superblock constant part has to be the same - * for all disks in the array. - */ - sb = NULL; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (!sb) { - sb = rdev->sb; - continue; - } - if (!sb_equal(sb, rdev->sb)) { - printk(INCONSISTENT, partition_name(rdev->dev)); - kick_rdev_from_array(rdev); - continue; - } - } - - /* - * OK, we have all disks and the array is ready to run. Let's - * find the freshest superblock, that one will be the superblock - * that represents the whole array. - */ - if (!mddev->sb) - if (alloc_array_sb(mddev)) - goto abort; - sb = mddev->sb; - freshest = NULL; - - ITERATE_RDEV(mddev,rdev,tmp) { - __u64 ev1, ev2; - /* - * if the checksum is invalid, use the superblock - * only as a last resort. 
(decrease it's age by - * one event) - */ - if (calc_sb_csum(rdev->sb) != rdev->sb->sb_csum) { - if (rdev->sb->events_lo || rdev->sb->events_hi) - if ((rdev->sb->events_lo--)==0) - rdev->sb->events_hi--; - } - - printk(KERN_INFO "md: %s's event counter: %08lx\n", - partition_name(rdev->dev), - (unsigned long)rdev->sb->events_lo); - if (!freshest) { - freshest = rdev; - continue; - } - /* - * Find the newest superblock version - */ - ev1 = md_event(rdev->sb); - ev2 = md_event(freshest->sb); - if (ev1 != ev2) { - out_of_date = 1; - if (ev1 > ev2) - freshest = rdev; - } - } - if (out_of_date) { - printk(OUT_OF_DATE); - printk(KERN_INFO "md: freshest: %s\n", partition_name(freshest->dev)); - } - memcpy (sb, freshest->sb, sizeof(*sb)); - - /* - * at this point we have picked the 'best' superblock - * from all available superblocks. - * now we validate this superblock and kick out possibly - * failed disks. - */ - ITERATE_RDEV(mddev,rdev,tmp) { - /* - * Kick all non-fresh devices - */ - __u64 ev1, ev2; - ev1 = md_event(rdev->sb); - ev2 = md_event(sb); - ++ev1; - if (ev1 < ev2) { - printk(KERN_WARNING "md: kicking non-fresh %s from array!\n", - partition_name(rdev->dev)); - kick_rdev_from_array(rdev); - continue; - } - } - - /* - * Fix up changed device names ... but only if this disk has a - * recent update time. Use faulty checksum ones too. 
- */ - if (mddev->sb->level != -4) - ITERATE_RDEV(mddev,rdev,tmp) { - __u64 ev1, ev2, ev3; - if (rdev->faulty || rdev->alias_device) { - MD_BUG(); - goto abort; - } - ev1 = md_event(rdev->sb); - ev2 = md_event(sb); - ev3 = ev2; - --ev3; - if ((rdev->dev != rdev->old_dev) && - ((ev1 == ev2) || (ev1 == ev3))) { - mdp_disk_t *desc; - - printk(KERN_WARNING "md: device name has changed from %s to %s since last import!\n", - partition_name(rdev->old_dev), partition_name(rdev->dev)); - if (rdev->desc_nr == -1) { - MD_BUG(); - goto abort; - } - desc = &sb->disks[rdev->desc_nr]; - if (rdev->old_dev != MKDEV(desc->major, desc->minor)) { - MD_BUG(); - goto abort; - } - desc->major = MAJOR(rdev->dev); - desc->minor = MINOR(rdev->dev); - desc = &rdev->sb->this_disk; - desc->major = MAJOR(rdev->dev); - desc->minor = MINOR(rdev->dev); - } - } - - /* - * Remove unavailable and faulty devices ... - * - * note that if an array becomes completely unrunnable due to - * missing devices, we do not write the superblock back, so the - * administrator has a chance to fix things up. The removal thus - * only happens if it's nonfatal to the contents of the array. - */ - for (i = 0; i < MD_SB_DISKS; i++) { - int found; - mdp_disk_t *desc; - kdev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - /* - * We kick faulty devices/descriptors immediately. - * - * Note: multipath devices are a special case. Since we - * were able to read the superblock on the path, we don't - * care if it was previously marked as faulty, it's up now - * so enable it. 
- */ - if (disk_faulty(desc) && mddev->sb->level != -4) { - found = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr != desc->number) - continue; - printk(KERN_WARNING "md%d: kicking faulty %s!\n", - mdidx(mddev),partition_name(rdev->dev)); - kick_rdev_from_array(rdev); - found = 1; - break; - } - if (!found) { - if (dev == MKDEV(0,0)) - continue; - printk(KERN_WARNING "md%d: removing former faulty %s!\n", - mdidx(mddev), partition_name(dev)); - } - remove_descriptor(desc, sb); - continue; - } else if (disk_faulty(desc)) { - /* - * multipath entry marked as faulty, unfaulty it - */ - rdev = find_rdev(mddev, dev); - if(rdev) - mark_disk_spare(desc); - else - remove_descriptor(desc, sb); - } - - if (dev == MKDEV(0,0)) - continue; - /* - * Is this device present in the rdev ring? - */ - found = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - /* - * Multi-path IO special-case: since we have no - * this_disk descriptor at auto-detect time, - * we cannot check rdev->number. - * We can check the device though. - */ - if ((sb->level == -4) && (rdev->dev == - MKDEV(desc->major,desc->minor))) { - found = 1; - break; - } - if (rdev->desc_nr == desc->number) { - found = 1; - break; - } - } - if (found) - continue; - - printk(KERN_WARNING "md%d: former device %s is unavailable, removing from array!\n", - mdidx(mddev), partition_name(dev)); - remove_descriptor(desc, sb); - } - - /* - * Double check wether all devices mentioned in the - * superblock are in the rdev ring. 
- */ - first = 1; - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - kdev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - if (dev == MKDEV(0,0)) - continue; - - if (disk_faulty(desc)) { - MD_BUG(); - goto abort; - } - - rdev = find_rdev(mddev, dev); - if (!rdev) { - MD_BUG(); - goto abort; - } - /* - * In the case of Multipath-IO, we have no - * other information source to find out which - * disk is which, only the position of the device - * in the superblock: - */ - if (mddev->sb->level == -4) { - if ((rdev->desc_nr != -1) && (rdev->desc_nr != i)) { - MD_BUG(); - goto abort; - } - rdev->desc_nr = i; - if (!first) - rdev->alias_device = 1; - else - first = 0; - } - } - - /* - * Kick all rdevs that are not in the - * descriptor array: - */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == -1) - kick_rdev_from_array(rdev); - } - - /* - * Do a final reality check. - */ - if (mddev->sb->level != -4) { - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == -1) { - MD_BUG(); - goto abort; - } - /* - * is the desc_nr unique? - */ - ITERATE_RDEV(mddev,rdev2,tmp2) { - if ((rdev2 != rdev) && - (rdev2->desc_nr == rdev->desc_nr)) { - MD_BUG(); - goto abort; - } - } - /* - * is the device unique? 
- */ - ITERATE_RDEV(mddev,rdev2,tmp2) { - if ((rdev2 != rdev) && - (rdev2->dev == rdev->dev)) { - MD_BUG(); - goto abort; - } - } - } - } - - /* - * Check if we can support this RAID array - */ - if (sb->major_version != MD_MAJOR_VERSION || - sb->minor_version > MD_MINOR_VERSION) { - - printk(OLD_VERSION, mdidx(mddev), sb->major_version, - sb->minor_version, sb->patch_version); - goto abort; - } - - if ((sb->state != (1 << MD_SB_CLEAN)) && ((sb->level == 1) || - (sb->level == 4) || (sb->level == 5))) - printk(NOT_CLEAN_IGNORE, mdidx(mddev)); - - return 0; -abort: - return 1; -} - -#undef INCONSISTENT -#undef OUT_OF_DATE -#undef OLD_VERSION -#undef OLD_LEVEL - -static int device_size_calculation(mddev_t * mddev) -{ - int data_disks = 0, persistent; - unsigned int readahead; - mdp_super_t *sb = mddev->sb; - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - /* - * Do device size calculation. Bail out if too small. - * (we have to do this after having validated chunk_size, - * because device size has to be modulo chunk_size) - */ - persistent = !mddev->sb->not_persistent; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (rdev->size) { - MD_BUG(); - continue; - } - rdev->size = calc_dev_size(rdev->dev, mddev, persistent); - if (rdev->size < sb->chunk_size / 1024) { - printk(KERN_WARNING - "md: Dev %s smaller than chunk_size: %ldk < %dk\n", - partition_name(rdev->dev), - rdev->size, sb->chunk_size / 1024); - return -EINVAL; - } - } - - switch (sb->level) { - case -4: - data_disks = 1; - break; - case -3: - data_disks = 1; - break; - case -2: - data_disks = 1; - break; - case -1: - zoned_raid_size(mddev); - data_disks = 1; - break; - case 0: - zoned_raid_size(mddev); - data_disks = sb->raid_disks; - break; - case 1: - data_disks = 1; - break; - case 4: - case 5: - data_disks = sb->raid_disks-1; - break; - default: - printk(UNKNOWN_LEVEL, mdidx(mddev), sb->level); - goto abort; - } - if (!md_size[mdidx(mddev)]) - md_size[mdidx(mddev)] = sb->size * 
data_disks; - - readahead = MD_READAHEAD; - if ((sb->level == 0) || (sb->level == 4) || (sb->level == 5)) { - readahead = (mddev->sb->chunk_size>>PAGE_SHIFT) * 4 * data_disks; - if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2) - readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2; - } else { - // (no multipath branch - it uses the default setting) - if (sb->level == -3) - readahead = 0; - } - - printk(KERN_INFO "md%d: max total readahead window set to %ldk\n", - mdidx(mddev), readahead*(PAGE_SIZE/1024)); - - printk(KERN_INFO - "md%d: %d data-disks, max readahead per data-disk: %ldk\n", - mdidx(mddev), data_disks, readahead/data_disks*(PAGE_SIZE/1024)); - return 0; -abort: - return 1; -} - - -#define TOO_BIG_CHUNKSIZE KERN_ERR \ -"too big chunk_size: %d > %d\n" - -#define TOO_SMALL_CHUNKSIZE KERN_ERR \ -"too small chunk_size: %d < %ld\n" - -#define BAD_CHUNKSIZE KERN_ERR \ -"no chunksize specified, see 'man raidtab'\n" - -static int do_md_run(mddev_t * mddev) -{ - int pnum, err; - int chunk_size; - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return -EINVAL; - } - - if (mddev->pers) - return -EBUSY; - - /* - * Resize disks to align partitions size on a given - * chunk size. - */ - md_size[mdidx(mddev)] = 0; - - /* - * Analyze all RAID superblock(s) - */ - if (analyze_sbs(mddev)) { - MD_BUG(); - return -EINVAL; - } - - chunk_size = mddev->sb->chunk_size; - pnum = level_to_pers(mddev->sb->level); - - if ((pnum != MULTIPATH) && (pnum != RAID1)) { - if (!chunk_size) { - /* - * 'default chunksize' in the old md code used to - * be PAGE_SIZE, baaad. - * we abort here to be on the safe side. We dont - * want to continue the bad practice. 
- */ - printk(BAD_CHUNKSIZE); - return -EINVAL; - } - if (chunk_size > MAX_CHUNK_SIZE) { - printk(TOO_BIG_CHUNKSIZE, chunk_size, MAX_CHUNK_SIZE); - return -EINVAL; - } - /* - * chunk-size has to be a power of 2 and multiples of PAGE_SIZE - */ - if ( (1 << ffz(~chunk_size)) != chunk_size) { - MD_BUG(); - return -EINVAL; - } - if (chunk_size < PAGE_SIZE) { - printk(TOO_SMALL_CHUNKSIZE, chunk_size, PAGE_SIZE); - return -EINVAL; - } - } else - if (chunk_size) - printk(KERN_INFO "md: RAID level %d does not need chunksize! Continuing anyway.\n", - mddev->sb->level); - - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - if (!pers[pnum]) - { -#ifdef CONFIG_KMOD - char module_name[80]; - sprintf (module_name, "md-personality-%d", pnum); - request_module (module_name); - if (!pers[pnum]) -#endif - { - printk(KERN_ERR "md: personality %d is not loaded!\n", - pnum); - return -EINVAL; - } - } - - if (device_size_calculation(mddev)) - return -EINVAL; - - /* - * Drop all container device buffers, from now on - * the only valid external interface is through the md - * device. 
- * Also find largest hardsector size - */ - md_hardsect_sizes[mdidx(mddev)] = 512; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - invalidate_device(rdev->dev, 1); - if (get_hardsect_size(rdev->dev) - > md_hardsect_sizes[mdidx(mddev)]) - md_hardsect_sizes[mdidx(mddev)] = - get_hardsect_size(rdev->dev); - } - md_blocksizes[mdidx(mddev)] = 1024; - if (md_blocksizes[mdidx(mddev)] < md_hardsect_sizes[mdidx(mddev)]) - md_blocksizes[mdidx(mddev)] = md_hardsect_sizes[mdidx(mddev)]; - mddev->pers = pers[pnum]; - - blk_queue_make_request(&mddev->queue, mddev->pers->make_request); - mddev->queue.queuedata = mddev; - - err = mddev->pers->run(mddev); - if (err) { - printk(KERN_ERR "md: pers->run() failed ...\n"); - mddev->pers = NULL; - return -EINVAL; - } - - mddev->sb->state &= ~(1 << MD_SB_CLEAN); - mddev->sb_dirty = 1; - md_update_sb(mddev); - - /* - * md_size has units of 1K blocks, which are - * twice as large as sectors. - */ - md_hd_struct[mdidx(mddev)].start_sect = 0; - register_disk(&md_gendisk, MKDEV(MAJOR_NR,mdidx(mddev)), - 1, &md_fops, md_size[mdidx(mddev)]<<1); - - read_ahead[MD_MAJOR] = 1024; - return (0); -} - -#undef TOO_BIG_CHUNKSIZE -#undef BAD_CHUNKSIZE - -static int restart_array(mddev_t *mddev) -{ - int err; - - /* - * Complain if it has no devices - */ - err = -ENXIO; - if (list_empty(&mddev->disks)) - goto out; - - if (mddev->pers) { - err = -EBUSY; - if (!mddev->ro) - goto out; - - mddev->ro = 0; - set_device_ro(mddev_to_kdev(mddev), 0); - - printk(KERN_INFO - "md: md%d switched to read-write mode.\n", mdidx(mddev)); - /* - * Kick recovery or resync if necessary - */ - md_recover_arrays(); - if (mddev->pers->restart_resync) - mddev->pers->restart_resync(mddev); - err = 0; - } else { - printk(KERN_ERR "md: md%d has no personality assigned.\n", - mdidx(mddev)); - err = -EINVAL; - } - -out: - return err; -} - -#define STILL_MOUNTED KERN_WARNING \ -"md: md%d still mounted.\n" -#define STILL_IN_USE \ -"md: md%d still in use.\n" - 
-static int do_md_stop(mddev_t * mddev, int ro) -{ - int err = 0, resync_interrupted = 0; - kdev_t dev = mddev_to_kdev(mddev); - - if (atomic_read(&mddev->active)>1) { - printk(STILL_IN_USE, mdidx(mddev)); - err = -EBUSY; - goto out; - } - - if (mddev->pers) { - /* - * It is safe to call stop here, it only frees private - * data. Also, it tells us if a device is unstoppable - * (eg. resyncing is in progress) - */ - if (mddev->pers->stop_resync) - if (mddev->pers->stop_resync(mddev)) - resync_interrupted = 1; - - if (mddev->recovery_running) - md_interrupt_thread(md_recovery_thread); - - /* - * This synchronizes with signal delivery to the - * resync or reconstruction thread. It also nicely - * hangs the process if some reconstruction has not - * finished. - */ - down(&mddev->recovery_sem); - up(&mddev->recovery_sem); - - invalidate_device(dev, 1); - - if (ro) { - err = -ENXIO; - if (mddev->ro) - goto out; - mddev->ro = 1; - } else { - if (mddev->ro) - set_device_ro(dev, 0); - if (mddev->pers->stop(mddev)) { - err = -EBUSY; - if (mddev->ro) - set_device_ro(dev, 1); - goto out; - } - if (mddev->ro) - mddev->ro = 0; - } - if (mddev->sb) { - /* - * mark it clean only if there was no resync - * interrupted. - */ - if (!mddev->recovery_running && !resync_interrupted) { - printk(KERN_INFO "md: marking sb clean...\n"); - mddev->sb->state |= 1 << MD_SB_CLEAN; - } - mddev->sb_dirty = 1; - md_update_sb(mddev); - } - if (ro) - set_device_ro(dev, 1); - } - - /* - * Free resources if final stop - */ - if (!ro) { - printk(KERN_INFO "md: md%d stopped.\n", mdidx(mddev)); - free_mddev(mddev); - } else - printk(KERN_INFO "md: md%d switched to read-only mode.\n", mdidx(mddev)); - err = 0; -out: - return err; -} - -/* - * We have to safely support old arrays too. 
- */ -int detect_old_array(mdp_super_t *sb) -{ - if (sb->major_version > 0) - return 0; - if (sb->minor_version >= 90) - return 0; - - return -EINVAL; -} - - -static void autorun_array(mddev_t *mddev) -{ - mdk_rdev_t *rdev; - struct md_list_head *tmp; - int err; - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return; - } - - printk(KERN_INFO "md: running: "); - - ITERATE_RDEV(mddev,rdev,tmp) { - printk("<%s>", partition_name(rdev->dev)); - } - printk("\n"); - - err = do_md_run (mddev); - if (err) { - printk(KERN_WARNING "md :do_md_run() returned %d\n", err); - /* - * prevent the writeback of an unrunnable array - */ - mddev->sb_dirty = 0; - do_md_stop (mddev, 0); - } -} - -/* - * lets try to run arrays based on all disks that have arrived - * until now. (those are in the ->pending list) - * - * the method: pick the first pending disk, collect all disks with - * the same UUID, remove all from the pending list and put them into - * the 'same_array' list. Then order this list based on superblock - * update time (freshest comes first), kick out 'old' disks and - * compare superblocks. If everything's fine then run it. 
- * - * If "unit" is allocated, then bump its reference count - */ -static void autorun_devices(kdev_t countdev) -{ - struct md_list_head candidates; - struct md_list_head *tmp; - mdk_rdev_t *rdev0, *rdev; - mddev_t *mddev; - kdev_t md_kdev; - - - printk(KERN_INFO "md: autorun ...\n"); - while (!list_empty(&pending_raid_disks)) { - rdev0 = md_list_entry(pending_raid_disks.next, - mdk_rdev_t, pending); - - printk(KERN_INFO "md: considering %s ...\n", partition_name(rdev0->dev)); - MD_INIT_LIST_HEAD(&candidates); - ITERATE_RDEV_PENDING(rdev,tmp) { - if (uuid_equal(rdev0, rdev)) { - if (!sb_equal(rdev0->sb, rdev->sb)) { - printk(KERN_WARNING - "md: %s has same UUID as %s, but superblocks differ ...\n", - partition_name(rdev->dev), partition_name(rdev0->dev)); - continue; - } - printk(KERN_INFO "md: adding %s ...\n", partition_name(rdev->dev)); - md_list_del(&rdev->pending); - md_list_add(&rdev->pending, &candidates); - } - } - /* - * now we have a set of devices, with all of them having - * mostly sane superblocks. It's time to allocate the - * mddev. - */ - md_kdev = MKDEV(MD_MAJOR, rdev0->sb->md_minor); - mddev = kdev_to_mddev(md_kdev); - if (mddev) { - printk(KERN_WARNING "md: md%d already running, cannot run %s\n", - mdidx(mddev), partition_name(rdev0->dev)); - ITERATE_RDEV_GENERIC(candidates,pending,rdev,tmp) - export_rdev(rdev); - continue; - } - mddev = alloc_mddev(md_kdev); - if (!mddev) { - printk(KERN_ERR "md: cannot allocate memory for md drive.\n"); - break; - } - if (md_kdev == countdev) - atomic_inc(&mddev->active); - printk(KERN_INFO "md: created md%d\n", mdidx(mddev)); - ITERATE_RDEV_GENERIC(candidates,pending,rdev,tmp) { - bind_rdev_to_array(rdev, mddev); - list_del_init(&rdev->pending); - } - autorun_array(mddev); - } - printk(KERN_INFO "md: ... autorun DONE.\n"); -} - -/* - * import RAID devices based on one partition - * if possible, the array gets run as well. 
- */ - -#define BAD_VERSION KERN_ERR \ -"md: %s has RAID superblock version 0.%d, autodetect needs v0.90 or higher\n" - -#define OUT_OF_MEM KERN_ALERT \ -"md: out of memory.\n" - -#define NO_DEVICE KERN_ERR \ -"md: disabled device %s\n" - -#define AUTOADD_FAILED KERN_ERR \ -"md: auto-adding devices to md%d FAILED (error %d).\n" - -#define AUTOADD_FAILED_USED KERN_ERR \ -"md: cannot auto-add device %s to md%d, already used.\n" - -#define AUTORUN_FAILED KERN_ERR \ -"md: auto-running md%d FAILED (error %d).\n" - -#define MDDEV_BUSY KERN_ERR \ -"md: cannot auto-add to md%d, already running.\n" - -#define AUTOADDING KERN_INFO \ -"md: auto-adding devices to md%d, based on %s's superblock.\n" - -#define AUTORUNNING KERN_INFO \ -"md: auto-running md%d.\n" - -static int autostart_array(kdev_t startdev, kdev_t countdev) -{ - int err = -EINVAL, i; - mdp_super_t *sb = NULL; - mdk_rdev_t *start_rdev = NULL, *rdev; - - if (md_import_device(startdev, 1)) { - printk(KERN_WARNING "md: could not import %s!\n", partition_name(startdev)); - goto abort; - } - - start_rdev = find_rdev_all(startdev); - if (!start_rdev) { - MD_BUG(); - goto abort; - } - if (start_rdev->faulty) { - printk(KERN_WARNING "md: can not autostart based on faulty %s!\n", - partition_name(startdev)); - goto abort; - } - md_list_add(&start_rdev->pending, &pending_raid_disks); - - sb = start_rdev->sb; - - err = detect_old_array(sb); - if (err) { - printk(KERN_WARNING "md: array version is too old to be autostarted ," - "use raidtools 0.90 mkraid --upgrade to upgrade the array " - "without data loss!\n"); - goto abort; - } - - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - kdev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - if (dev == MKDEV(0,0)) - continue; - if (dev == startdev) - continue; - if (md_import_device(dev, 1)) { - printk(KERN_WARNING "md: could not import %s, trying to run array nevertheless.\n", - partition_name(dev)); - continue; - } - rdev = 
find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - goto abort; - } - md_list_add(&rdev->pending, &pending_raid_disks); - } - - /* - * possibly return codes - */ - autorun_devices(countdev); - return 0; - -abort: - if (start_rdev) - export_rdev(start_rdev); - return err; -} - -#undef BAD_VERSION -#undef OUT_OF_MEM -#undef NO_DEVICE -#undef AUTOADD_FAILED_USED -#undef AUTOADD_FAILED -#undef AUTORUN_FAILED -#undef AUTOADDING -#undef AUTORUNNING - - -static int get_version(void * arg) -{ - mdu_version_t ver; - - ver.major = MD_MAJOR_VERSION; - ver.minor = MD_MINOR_VERSION; - ver.patchlevel = MD_PATCHLEVEL_VERSION; - - if (md_copy_to_user(arg, &ver, sizeof(ver))) - return -EFAULT; - - return 0; -} - -#define SET_FROM_SB(x) info.x = mddev->sb->x -static int get_array_info(mddev_t * mddev, void * arg) -{ - mdu_array_info_t info; - - if (!mddev->sb) { - MD_BUG(); - return -EINVAL; - } - - SET_FROM_SB(major_version); - SET_FROM_SB(minor_version); - SET_FROM_SB(patch_version); - SET_FROM_SB(ctime); - SET_FROM_SB(level); - SET_FROM_SB(size); - SET_FROM_SB(nr_disks); - SET_FROM_SB(raid_disks); - SET_FROM_SB(md_minor); - SET_FROM_SB(not_persistent); - - SET_FROM_SB(utime); - SET_FROM_SB(state); - SET_FROM_SB(active_disks); - SET_FROM_SB(working_disks); - SET_FROM_SB(failed_disks); - SET_FROM_SB(spare_disks); - - SET_FROM_SB(layout); - SET_FROM_SB(chunk_size); - - if (md_copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} -#undef SET_FROM_SB - -#define SET_FROM_SB(x) info.x = mddev->sb->disks[nr].x -static int get_disk_info(mddev_t * mddev, void * arg) -{ - mdu_disk_info_t info; - unsigned int nr; - - if (!mddev->sb) - return -EINVAL; - - if (md_copy_from_user(&info, arg, sizeof(info))) - return -EFAULT; - - nr = info.number; - if (nr >= MD_SB_DISKS) - return -EINVAL; - - SET_FROM_SB(major); - SET_FROM_SB(minor); - SET_FROM_SB(raid_disk); - SET_FROM_SB(state); - - if (md_copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} -#undef 
SET_FROM_SB - -#define SET_SB(x) mddev->sb->disks[nr].x = info->x - -static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info) -{ - int err, size, persistent; - mdk_rdev_t *rdev; - unsigned int nr; - kdev_t dev; - dev = MKDEV(info->major,info->minor); - - if (find_rdev_all(dev)) { - printk(KERN_WARNING "md: device %s already used in a RAID array!\n", - partition_name(dev)); - return -EBUSY; - } - if (!mddev->sb) { - /* expecting a device which has a superblock */ - err = md_import_device(dev, 1); - if (err) { - printk(KERN_WARNING "md: md_import_device returned %d\n", err); - return -EINVAL; - } - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - return -EINVAL; - } - if (!list_empty(&mddev->disks)) { - mdk_rdev_t *rdev0 = md_list_entry(mddev->disks.next, - mdk_rdev_t, same_set); - if (!uuid_equal(rdev0, rdev)) { - printk(KERN_WARNING "md: %s has different UUID to %s\n", - partition_name(rdev->dev), partition_name(rdev0->dev)); - export_rdev(rdev); - return -EINVAL; - } - if (!sb_equal(rdev0->sb, rdev->sb)) { - printk(KERN_WARNING "md: %s has same UUID but different superblock to %s\n", - partition_name(rdev->dev), partition_name(rdev0->dev)); - export_rdev(rdev); - return -EINVAL; - } - } - bind_rdev_to_array(rdev, mddev); - return 0; - } - - nr = info->number; - if (nr >= mddev->sb->nr_disks) { - MD_BUG(); - return -EINVAL; - } - - - SET_SB(number); - SET_SB(major); - SET_SB(minor); - SET_SB(raid_disk); - SET_SB(state); - - if ((info->state & (1<<MD_DISK_FAULTY))==0) { - err = md_import_device (dev, 0); - if (err) { - printk(KERN_WARNING "md: error, md_import_device() returned %d\n", err); - return -EINVAL; - } - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - return -EINVAL; - } - - rdev->old_dev = dev; - rdev->desc_nr = info->number; - - bind_rdev_to_array(rdev, mddev); - - persistent = !mddev->sb->not_persistent; - if (!persistent) - printk(KERN_INFO "md: nonpersistent superblock ...\n"); - - size = calc_dev_size(dev, mddev, persistent); - rdev->sb_offset = calc_dev_sboffset(dev, mddev, persistent); - - if (!mddev->sb->size || (mddev->sb->size > size)) - mddev->sb->size = size; - } - - /* - * sync all other superblocks with the main superblock - */ - sync_sbs(mddev); - - return 0; -} -#undef 
SET_SB - -static int hot_generate_error(mddev_t * mddev, kdev_t dev) -{ - struct request_queue *q; - mdk_rdev_t *rdev; - mdp_disk_t *disk; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to generate %s error in md%d ... \n", - partition_name(dev), mdidx(mddev)); - - rdev = find_rdev(mddev, dev); - if (!rdev) { - MD_BUG(); - return -ENXIO; - } - - if (rdev->desc_nr == -1) { - MD_BUG(); - return -EINVAL; - } - disk = &mddev->sb->disks[rdev->desc_nr]; - if (!disk_active(disk)) - return -ENODEV; - - q = blk_get_queue(rdev->dev); - if (!q) { - MD_BUG(); - return -ENODEV; - } - printk(KERN_INFO "md: okay, generating error!\n"); -// q->oneshot_error = 1; // disabled for now - - return 0; -} - -static int hot_remove_disk(mddev_t * mddev, kdev_t dev) -{ - int err; - mdk_rdev_t *rdev; - mdp_disk_t *disk; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to remove %s from md%d ... \n", - partition_name(dev), mdidx(mddev)); - - if (!mddev->pers->diskop) { - printk(KERN_WARNING "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - - rdev = find_rdev(mddev, dev); - if (!rdev) - return -ENXIO; - - if (rdev->desc_nr == -1) { - MD_BUG(); - return -EINVAL; - } - disk = &mddev->sb->disks[rdev->desc_nr]; - if (disk_active(disk)) - goto busy; - - if (disk_removed(disk)) - return -EINVAL; - - err = mddev->pers->diskop(mddev, &disk, DISKOP_HOT_REMOVE_DISK); - if (err == -EBUSY) - goto busy; - - if (err) { - MD_BUG(); - return -EINVAL; - } - - remove_descriptor(disk, mddev->sb); - kick_rdev_from_array(rdev); - mddev->sb_dirty = 1; - md_update_sb(mddev); - - return 0; -busy: - printk(KERN_WARNING "md: cannot remove active disk %s from md%d ... 
\n", - partition_name(dev), mdidx(mddev)); - return -EBUSY; -} - -static int hot_add_disk(mddev_t * mddev, kdev_t dev) -{ - int i, err, persistent; - unsigned int size; - mdk_rdev_t *rdev; - mdp_disk_t *disk; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to hot-add %s to md%d ... \n", - partition_name(dev), mdidx(mddev)); - - if (!mddev->pers->diskop) { - printk(KERN_WARNING "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - - persistent = !mddev->sb->not_persistent; - - rdev = find_rdev(mddev, dev); - if (rdev) - return -EBUSY; - - err = md_import_device (dev, 0); - if (err) { - printk(KERN_WARNING "md: error, md_import_device() returned %d\n", err); - return -EINVAL; - } - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - return -EINVAL; - } - if (rdev->faulty) { - printk(KERN_WARNING "md: can not hot-add faulty %s disk to md%d!\n", - partition_name(dev), mdidx(mddev)); - err = -EINVAL; - goto abort_export; - } - size = calc_dev_size(dev, mddev, persistent); - - if (size < mddev->sb->size) { - printk(KERN_WARNING "md%d: disk size %d blocks < array size %d\n", - mdidx(mddev), size, mddev->sb->size); - err = -ENOSPC; - goto abort_export; - } - bind_rdev_to_array(rdev, mddev); - - /* - * The rest should better be atomic, we can have disk failures - * noticed in interrupt contexts ... 
- */ - rdev->old_dev = dev; - rdev->size = size; - rdev->sb_offset = calc_dev_sboffset(dev, mddev, persistent); - - disk = mddev->sb->disks + mddev->sb->raid_disks; - for (i = mddev->sb->raid_disks; i < MD_SB_DISKS; i++) { - disk = mddev->sb->disks + i; - - if (!disk->major && !disk->minor) - break; - if (disk_removed(disk)) - break; - } - if (i == MD_SB_DISKS) { - printk(KERN_WARNING "md%d: can not hot-add to full array!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unbind_export; - } - - if (disk_removed(disk)) { - /* - * reuse slot - */ - if (disk->number != i) { - MD_BUG(); - err = -EINVAL; - goto abort_unbind_export; - } - } else { - disk->number = i; - } - - disk->raid_disk = disk->number; - disk->major = MAJOR(dev); - disk->minor = MINOR(dev); - - if (mddev->pers->diskop(mddev, &disk, DISKOP_HOT_ADD_DISK)) { - MD_BUG(); - err = -EINVAL; - goto abort_unbind_export; - } - - mark_disk_spare(disk); - mddev->sb->nr_disks++; - mddev->sb->spare_disks++; - mddev->sb->working_disks++; - - mddev->sb_dirty = 1; - md_update_sb(mddev); - - /* - * Kick recovery, maybe this spare has to be added to the - * array immediately. 
- */ - md_recover_arrays(); - - return 0; - -abort_unbind_export: - unbind_rdev_from_array(rdev); - -abort_export: - export_rdev(rdev); - return err; -} - -#define SET_SB(x) mddev->sb->x = info->x -static int set_array_info(mddev_t * mddev, mdu_array_info_t *info) -{ - - if (alloc_array_sb(mddev)) - return -ENOMEM; - - mddev->sb->major_version = MD_MAJOR_VERSION; - mddev->sb->minor_version = MD_MINOR_VERSION; - mddev->sb->patch_version = MD_PATCHLEVEL_VERSION; - mddev->sb->ctime = CURRENT_TIME; - - SET_SB(level); - SET_SB(size); - SET_SB(nr_disks); - SET_SB(raid_disks); - SET_SB(md_minor); - SET_SB(not_persistent); - - SET_SB(state); - SET_SB(active_disks); - SET_SB(working_disks); - SET_SB(failed_disks); - SET_SB(spare_disks); - - SET_SB(layout); - SET_SB(chunk_size); - - mddev->sb->md_magic = MD_SB_MAGIC; - - /* - * Generate a 128 bit UUID - */ - get_random_bytes(&mddev->sb->set_uuid0, 4); - get_random_bytes(&mddev->sb->set_uuid1, 4); - get_random_bytes(&mddev->sb->set_uuid2, 4); - get_random_bytes(&mddev->sb->set_uuid3, 4); - - return 0; -} -#undef SET_SB - -static int set_disk_faulty(mddev_t *mddev, kdev_t dev) -{ - int ret; - - ret = md_error(mddev, dev); - return ret; -} - -static int md_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg) -{ - unsigned int minor; - int err = 0; - struct hd_geometry *loc = (struct hd_geometry *) arg; - mddev_t *mddev = NULL; - kdev_t dev; - - if (!md_capable_admin()) - return -EACCES; - - dev = inode->i_rdev; - minor = MINOR(dev); - if (minor >= MAX_MD_DEVS) { - MD_BUG(); - return -EINVAL; - } - - /* - * Commands dealing with the RAID driver but not any - * particular array: - */ - switch (cmd) - { - case RAID_VERSION: - err = get_version((void *)arg); - goto done; - - case PRINT_RAID_DEBUG: - err = 0; - md_print_devices(); - goto done_unlock; - -#ifndef MODULE - case RAID_AUTORUN: - err = 0; - autostart_arrays(); - goto done; -#endif - - case BLKGETSIZE: - case BLKGETSIZE64: - case BLKRAGET: - 
case BLKRASET: - case BLKFLSBUF: - case BLKBSZGET: - case BLKBSZSET: - err = blk_ioctl (dev, cmd, arg); - goto abort; - - default:; - } - - /* - * Commands creating/starting a new array: - */ - - mddev = kdev_to_mddev(dev); - - switch (cmd) - { - case SET_ARRAY_INFO: - case START_ARRAY: - if (mddev) { - printk(KERN_WARNING "md: array md%d already exists!\n", - mdidx(mddev)); - err = -EEXIST; - goto abort; - } - default:; - } - switch (cmd) - { - case SET_ARRAY_INFO: - mddev = alloc_mddev(dev); - if (!mddev) { - err = -ENOMEM; - goto abort; - } - atomic_inc(&mddev->active); - - /* - * alloc_mddev() should possibly self-lock. - */ - err = lock_mddev(mddev); - if (err) { - printk(KERN_WARNING "md: ioctl, reason %d, cmd %d\n", - err, cmd); - goto abort; - } - - if (mddev->sb) { - printk(KERN_WARNING "md: array md%d already has a superblock!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unlock; - } - if (arg) { - mdu_array_info_t info; - if (md_copy_from_user(&info, (void*)arg, sizeof(info))) { - err = -EFAULT; - goto abort_unlock; - } - err = set_array_info(mddev, &info); - if (err) { - printk(KERN_WARNING "md: couldnt set array info. %d\n", err); - goto abort_unlock; - } - } - goto done_unlock; - - case START_ARRAY: - /* - * possibly make it lock the array ... 
- */ - err = autostart_array((kdev_t)arg, dev); - if (err) { - printk(KERN_WARNING "md: autostart %s failed!\n", - partition_name((kdev_t)arg)); - goto abort; - } - goto done; - - default:; - } - - /* - * Commands querying/configuring an existing array: - */ - - if (!mddev) { - err = -ENODEV; - goto abort; - } - err = lock_mddev(mddev); - if (err) { - printk(KERN_INFO "md: ioctl lock interrupted, reason %d, cmd %d\n",err, cmd); - goto abort; - } - /* if we don't have a superblock yet, only ADD_NEW_DISK or STOP_ARRAY is allowed */ - if (!mddev->sb && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY) { - err = -ENODEV; - goto abort_unlock; - } - - /* - * Commands even a read-only array can execute: - */ - switch (cmd) - { - case GET_ARRAY_INFO: - err = get_array_info(mddev, (void *)arg); - goto done_unlock; - - case GET_DISK_INFO: - err = get_disk_info(mddev, (void *)arg); - goto done_unlock; - - case RESTART_ARRAY_RW: - err = restart_array(mddev); - goto done_unlock; - - case STOP_ARRAY: - if (!(err = do_md_stop (mddev, 0))) - mddev = NULL; - goto done_unlock; - - case STOP_ARRAY_RO: - err = do_md_stop (mddev, 1); - goto done_unlock; - - /* - * We have a problem here : there is no easy way to give a CHS - * virtual geometry. We currently pretend that we have a 2 heads - * 4 sectors (with a BIG number of cylinders...). This drives - * dosfs just mad... 
;-) - */ - case HDIO_GETGEO: - if (!loc) { - err = -EINVAL; - goto abort_unlock; - } - err = md_put_user (2, (char *) &loc->heads); - if (err) - goto abort_unlock; - err = md_put_user (4, (char *) &loc->sectors); - if (err) - goto abort_unlock; - err = md_put_user (md_hd_struct[mdidx(mddev)].nr_sects/8, - (short *) &loc->cylinders); - if (err) - goto abort_unlock; - err = md_put_user (md_hd_struct[minor].start_sect, - (long *) &loc->start); - goto done_unlock; - } - - /* - * The remaining ioctls are changing the state of the - * superblock, so we do not allow read-only arrays - * here: - */ - if (mddev->ro) { - err = -EROFS; - goto abort_unlock; - } - - switch (cmd) - { - case ADD_NEW_DISK: - { - mdu_disk_info_t info; - if (md_copy_from_user(&info, (void*)arg, sizeof(info))) - err = -EFAULT; - else - err = add_new_disk(mddev, &info); - goto done_unlock; - } - case HOT_GENERATE_ERROR: - err = hot_generate_error(mddev, (kdev_t)arg); - goto done_unlock; - case HOT_REMOVE_DISK: - err = hot_remove_disk(mddev, (kdev_t)arg); - goto done_unlock; - - case HOT_ADD_DISK: - err = hot_add_disk(mddev, (kdev_t)arg); - goto done_unlock; - - case SET_DISK_FAULTY: - err = set_disk_faulty(mddev, (kdev_t)arg); - goto done_unlock; - - case RUN_ARRAY: - { - err = do_md_run (mddev); - /* - * we have to clean up the mess if - * the array cannot be run for some - * reason ... 
- */ - if (err) { - mddev->sb_dirty = 0; - if (!do_md_stop (mddev, 0)) - mddev = NULL; - } - goto done_unlock; - } - - default: - printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, " - "upgrade your software to use new ictls.\n", - current->comm, current->pid); - err = -EINVAL; - goto abort_unlock; - } - -done_unlock: -abort_unlock: - if (mddev) - unlock_mddev(mddev); - - return err; -done: - if (err) - MD_BUG(); -abort: - return err; -} - -static int md_open(struct inode *inode, struct file *file) -{ - /* - * Always succeed, but increment the usage count - */ - mddev_t *mddev = kdev_to_mddev(inode->i_rdev); - if (mddev) - atomic_inc(&mddev->active); - return (0); -} - -static int md_release(struct inode *inode, struct file * file) -{ - mddev_t *mddev = kdev_to_mddev(inode->i_rdev); - if (mddev) - atomic_dec(&mddev->active); - return 0; -} - -static struct block_device_operations md_fops= -{ - owner: THIS_MODULE, - open: md_open, - release: md_release, - ioctl: md_ioctl, -}; - - -int md_thread(void * arg) -{ - mdk_thread_t *thread = arg; - - md_lock_kernel(); - - /* - * Detach thread - */ - - daemonize(); - - sprintf(current->comm, thread->name); - md_init_signals(); - md_flush_signals(); - thread->tsk = current; - - /* - * md_thread is a 'system-thread', it's priority should be very - * high. We avoid resource deadlocks individually in each - * raid personality. (RAID5 does preallocation) We also use RR and - * the very same RT priority as kswapd, thus we will never get - * into a priority inversion deadlock. - * - * we definitely have to have equal or higher priority than - * bdflush, otherwise bdflush will deadlock if there are too - * many dirty RAID5 blocks. 
- */ - current->policy = SCHED_OTHER; - current->nice = -20; - md_unlock_kernel(); - - complete(thread->event); - while (thread->run) { - void (*run)(void *data); - - wait_event_interruptible(thread->wqueue, - test_bit(THREAD_WAKEUP, &thread->flags)); - - clear_bit(THREAD_WAKEUP, &thread->flags); - - run = thread->run; - if (run) { - run(thread->data); - run_task_queue(&tq_disk); - } - if (md_signal_pending(current)) - md_flush_signals(); - } - complete(thread->event); - return 0; -} - -void md_wakeup_thread(mdk_thread_t *thread) -{ - dprintk("md: waking up MD thread %p.\n", thread); - set_bit(THREAD_WAKEUP, &thread->flags); - wake_up(&thread->wqueue); -} - -mdk_thread_t *md_register_thread(void (*run) (void *), - void *data, const char *name) -{ - mdk_thread_t *thread; - int ret; - struct completion event; - - thread = (mdk_thread_t *) kmalloc - (sizeof(mdk_thread_t), GFP_KERNEL); - if (!thread) - return NULL; - - memset(thread, 0, sizeof(mdk_thread_t)); - md_init_waitqueue_head(&thread->wqueue); - - init_completion(&event); - thread->event = &event; - thread->run = run; - thread->data = data; - thread->name = name; - ret = kernel_thread(md_thread, thread, 0); - if (ret < 0) { - kfree(thread); - return NULL; - } - wait_for_completion(&event); - return thread; -} - -void md_interrupt_thread(mdk_thread_t *thread) -{ - if (!thread->tsk) { - MD_BUG(); - return; - } - dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid); - send_sig(SIGKILL, thread->tsk, 1); -} - -void md_unregister_thread(mdk_thread_t *thread) -{ - struct completion event; - - init_completion(&event); - - thread->event = &event; - thread->run = NULL; - thread->name = NULL; - md_interrupt_thread(thread); - wait_for_completion(&event); - kfree(thread); -} - -void md_recover_arrays(void) -{ - if (!md_recovery_thread) { - MD_BUG(); - return; - } - md_wakeup_thread(md_recovery_thread); -} - - -int md_error(mddev_t *mddev, kdev_t rdev) -{ - mdk_rdev_t * rrdev; - - dprintk("md_error dev:(%d:%d), 
rdev:(%d:%d), (caller: %p,%p,%p,%p).\n", - MD_MAJOR,mdidx(mddev),MAJOR(rdev),MINOR(rdev), - __builtin_return_address(0),__builtin_return_address(1), - __builtin_return_address(2),__builtin_return_address(3)); - - if (!mddev) { - MD_BUG(); - return 0; - } - rrdev = find_rdev(mddev, rdev); - if (!rrdev || rrdev->faulty) - return 0; - if (!mddev->pers->error_handler - || mddev->pers->error_handler(mddev,rdev) <= 0) { - rrdev->faulty = 1; - } else - return 1; - /* - * if recovery was running, stop it now. - */ - if (mddev->pers->stop_resync) - mddev->pers->stop_resync(mddev); - if (mddev->recovery_running) - md_interrupt_thread(md_recovery_thread); - md_recover_arrays(); - - return 0; -} - -static void status_unused(struct seq_file *seq) -{ - int i = 0; - mdk_rdev_t *rdev; - struct md_list_head *tmp; - - seq_printf(seq, "unused devices: "); - - ITERATE_RDEV_ALL(rdev,tmp) { - if (list_empty(&rdev->same_set)) { - /* - * The device is not yet used by any array. - */ - i++; - seq_printf(seq, "%s ", - partition_name(rdev->dev)); - } - } - if (!i) - seq_printf(seq, "<none>"); - - seq_printf(seq, "\n"); -} - - -static void status_resync(struct seq_file *seq, mddev_t * mddev) -{ - unsigned long max_blocks, resync, res, dt, db, rt; - - resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2; - max_blocks = mddev->sb->size; - - /* - * Should not happen. - */ - if (!max_blocks) - MD_BUG(); - - res = (resync/1024)*1000/(max_blocks/1024 + 1); - { - int i, x = res/50, y = 20-x; - seq_printf(seq, "["); - for (i = 0; i < x; i++) - seq_printf(seq, "="); - seq_printf(seq, ">"); - for (i = 0; i < y; i++) - seq_printf(seq, "."); - seq_printf(seq, "] "); - } - if (!mddev->recovery_running) - /* - * true resync - */ - seq_printf(seq, " resync =%3lu.%lu%% (%lu/%lu)", - res/10, res % 10, resync, max_blocks); - else - /* - * recovery ... 
- */ - seq_printf(seq, " recovery =%3lu.%lu%% (%lu/%lu)", - res/10, res % 10, resync, max_blocks); - - /* - * We do not want to overflow, so the order of operands and - * the * 100 / 100 trick are important. We do a +1 to be - * safe against division by zero. We only estimate anyway. - * - * dt: time from mark until now - * db: blocks written from mark until now - * rt: remaining time - */ - dt = ((jiffies - mddev->resync_mark) / HZ); - if (!dt) dt++; - db = resync - (mddev->resync_mark_cnt/2); - rt = (dt * ((max_blocks-resync) / (db/100+1)))/100; - - seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6); - - seq_printf(seq, " speed=%ldK/sec", db/dt); - -} - - -static void *md_seq_start(struct seq_file *seq, loff_t *pos) -{ - struct list_head *tmp; - loff_t l = *pos; - mddev_t *mddev; - - if (l > 0x10000) - return NULL; - if (!l--) - /* header */ - return (void*)1; - - list_for_each(tmp,&all_mddevs) - if (!l--) { - mddev = list_entry(tmp, mddev_t, all_mddevs); - return mddev; - } - return (void*)2;/* tail */ -} - -static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) -{ - struct list_head *tmp; - mddev_t *next_mddev, *mddev = v; - - ++*pos; - if (v == (void*)2) - return NULL; - - if (v == (void*)1) - tmp = all_mddevs.next; - else - tmp = mddev->all_mddevs.next; - if (tmp != &all_mddevs) - next_mddev = list_entry(tmp,mddev_t,all_mddevs); - else { - next_mddev = (void*)2; - *pos = 0x10000; - } - - return next_mddev; - -} - -static void md_seq_stop(struct seq_file *seq, void *v) -{ - -} - -static int md_seq_show(struct seq_file *seq, void *v) -{ - int j, size; - struct md_list_head *tmp2; - mdk_rdev_t *rdev; - mddev_t *mddev = v; - - if (v == (void*)1) { - seq_printf(seq, "Personalities : "); - for (j = 0; j < MAX_PERSONALITY; j++) - if (pers[j]) - seq_printf(seq, "[%s] ", pers[j]->name); - - seq_printf(seq, "\n"); - seq_printf(seq, "read_ahead "); - if (read_ahead[MD_MAJOR] == INT_MAX) - seq_printf(seq, "not set\n"); - else - seq_printf(seq, "%d 
sectors\n", read_ahead[MD_MAJOR]); - return 0; - } - if (v == (void*)2) { - status_unused(seq); - return 0; - } - - seq_printf(seq, "md%d : %sactive", mdidx(mddev), - mddev->pers ? "" : "in"); - if (mddev->pers) { - if (mddev->ro) - seq_printf(seq, " (read-only)"); - seq_printf(seq, " %s", mddev->pers->name); - } - - size = 0; - ITERATE_RDEV(mddev,rdev,tmp2) { - seq_printf(seq, " %s[%d]", - partition_name(rdev->dev), rdev->desc_nr); - if (rdev->faulty) { - seq_printf(seq, "(F)"); - continue; - } - size += rdev->size; - } - - if (!list_empty(&mddev->disks)) { - if (mddev->pers) - seq_printf(seq, "\n %d blocks", - md_size[mdidx(mddev)]); - else - seq_printf(seq, "\n %d blocks", size); - } - - if (mddev->pers) { - - mddev->pers->status (seq, mddev); - - seq_printf(seq, "\n "); - if (mddev->curr_resync) { - status_resync (seq, mddev); - } else { - if (sem_getcount(&mddev->resync_sem) != 1) - seq_printf(seq, " resync=DELAYED"); - } - } - seq_printf(seq, "\n"); - - return 0; -} - - -static struct seq_operations md_seq_ops = { - .start = md_seq_start, - .next = md_seq_next, - .stop = md_seq_stop, - .show = md_seq_show, -}; - -static int md_seq_open(struct inode *inode, struct file *file) -{ - int error; - - error = seq_open(file, &md_seq_ops); - return error; -} - -static struct file_operations md_seq_fops = { - .open = md_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - - -int register_md_personality(int pnum, mdk_personality_t *p) -{ - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - if (pers[pnum]) { - MD_BUG(); - return -EBUSY; - } - - pers[pnum] = p; - printk(KERN_INFO "md: %s personality registered as nr %d\n", p->name, pnum); - return 0; -} - -int unregister_md_personality(int pnum) -{ - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - printk(KERN_INFO "md: %s personality unregistered\n", pers[pnum]->name); - pers[pnum] = NULL; - return 0; -} - -mdp_disk_t *get_spare(mddev_t *mddev) -{ - 
mdp_super_t *sb = mddev->sb; - mdp_disk_t *disk; - mdk_rdev_t *rdev; - struct md_list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (!rdev->sb) { - MD_BUG(); - continue; - } - disk = &sb->disks[rdev->desc_nr]; - if (disk_faulty(disk)) { - MD_BUG(); - continue; - } - if (disk_active(disk)) - continue; - return disk; - } - return NULL; -} - -static unsigned int sync_io[DK_MAX_MAJOR][DK_MAX_DISK]; -void md_sync_acct(kdev_t dev, unsigned long nr_sectors) -{ - unsigned int major = MAJOR(dev); - unsigned int index; - - index = disk_index(dev); - if ((index >= DK_MAX_DISK) || (major >= DK_MAX_MAJOR)) - return; - - sync_io[major][index] += nr_sectors; -} - -static int is_mddev_idle(mddev_t *mddev) -{ - mdk_rdev_t * rdev; - struct md_list_head *tmp; - int idle; - unsigned long curr_events; - - idle = 1; - ITERATE_RDEV(mddev,rdev,tmp) { - int major = MAJOR(rdev->dev); - int idx = disk_index(rdev->dev); - - if ((idx >= DK_MAX_DISK) || (major >= DK_MAX_MAJOR)) - continue; - - curr_events = kstat.dk_drive_rblk[major][idx] + - kstat.dk_drive_wblk[major][idx] ; - curr_events -= sync_io[major][idx]; - if ((curr_events - rdev->last_events) > 32) { - rdev->last_events = curr_events; - idle = 0; - } - } - return idle; -} - -MD_DECLARE_WAIT_QUEUE_HEAD(resync_wait); - -void md_done_sync(mddev_t *mddev, int blocks, int ok) -{ - /* another "blocks" (512byte) blocks have been synced */ - atomic_sub(blocks, &mddev->recovery_active); - wake_up(&mddev->recovery_wait); - if (!ok) { - // stop recovery, signal do_sync .... 
- if (mddev->pers->stop_resync) - mddev->pers->stop_resync(mddev); - if (mddev->recovery_running) - md_interrupt_thread(md_recovery_thread); - } -} - -#define SYNC_MARKS 10 -#define SYNC_MARK_STEP (3*HZ) -int md_do_sync(mddev_t *mddev, mdp_disk_t *spare) -{ - mddev_t *mddev2; - unsigned int max_sectors, currspeed, - j, window, err, serialize; - unsigned long mark[SYNC_MARKS]; - unsigned long mark_cnt[SYNC_MARKS]; - int last_mark,m; - struct md_list_head *tmp; - unsigned long last_check; - - - err = down_interruptible(&mddev->resync_sem); - if (err) - goto out_nolock; - -recheck: - serialize = 0; - ITERATE_MDDEV(mddev2,tmp) { - if (mddev2 == mddev) - continue; - if (mddev2->curr_resync && match_mddev_units(mddev,mddev2)) { - printk(KERN_INFO "md: delaying resync of md%d until md%d " - "has finished resync (they share one or more physical units)\n", - mdidx(mddev), mdidx(mddev2)); - serialize = 1; - break; - } - } - if (serialize) { - interruptible_sleep_on(&resync_wait); - if (md_signal_pending(current)) { - md_flush_signals(); - err = -EINTR; - goto out; - } - goto recheck; - } - - mddev->curr_resync = 1; - - max_sectors = mddev->sb->size<<1; - - printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev)); - printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed: %d KB/sec/disc.\n", - sysctl_speed_limit_min); - printk(KERN_INFO "md: using maximum available idle IO bandwith " - "(but not more than %d KB/sec) for reconstruction.\n", - sysctl_speed_limit_max); - - /* - * Resync has low priority. 
- */ - current->nice = 19; - - is_mddev_idle(mddev); /* this also initializes IO event counters */ - for (m = 0; m < SYNC_MARKS; m++) { - mark[m] = jiffies; - mark_cnt[m] = 0; - } - last_mark = 0; - mddev->resync_mark = mark[last_mark]; - mddev->resync_mark_cnt = mark_cnt[last_mark]; - - /* - * Tune reconstruction: - */ - window = vm_max_readahead*(PAGE_SIZE/512); - printk(KERN_INFO "md: using %dk window, over a total of %d blocks.\n", - window/2,max_sectors/2); - - atomic_set(&mddev->recovery_active, 0); - init_waitqueue_head(&mddev->recovery_wait); - last_check = 0; - for (j = 0; j < max_sectors;) { - int sectors; - - sectors = mddev->pers->sync_request(mddev, j); - - if (sectors < 0) { - err = sectors; - goto out; - } - atomic_add(sectors, &mddev->recovery_active); - j += sectors; - mddev->curr_resync = j; - - if (last_check + window > j) - continue; - - last_check = j; - - run_task_queue(&tq_disk); - - repeat: - if (jiffies >= mark[last_mark] + SYNC_MARK_STEP ) { - /* step marks */ - int next = (last_mark+1) % SYNC_MARKS; - - mddev->resync_mark = mark[next]; - mddev->resync_mark_cnt = mark_cnt[next]; - mark[next] = jiffies; - mark_cnt[next] = j - atomic_read(&mddev->recovery_active); - last_mark = next; - } - - - if (md_signal_pending(current)) { - /* - * got a signal, exit. - */ - mddev->curr_resync = 0; - printk(KERN_INFO "md: md_do_sync() got signal ... exiting\n"); - md_flush_signals(); - err = -EINTR; - goto out; - } - - /* - * this loop exits only if either when we are slower than - * the 'hard' speed limit, or the system was IO-idle for - * a jiffy. - * the system might be non-idle CPU-wise, but we only care - * about not overloading the IO subsystem. 
(things like an - * e2fsck being done on the RAID array should execute fast) - */ - if (md_need_resched(current)) - schedule(); - - currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1; - - if (currspeed > sysctl_speed_limit_min) { - current->nice = 19; - - if ((currspeed > sysctl_speed_limit_max) || - !is_mddev_idle(mddev)) { - current->state = TASK_INTERRUPTIBLE; - md_schedule_timeout(HZ/4); - goto repeat; - } - } else - current->nice = -20; - } - printk(KERN_INFO "md: md%d: sync done.\n",mdidx(mddev)); - err = 0; - /* - * this also signals 'finished resyncing' to md_stop - */ -out: - wait_disk_event(mddev->recovery_wait, atomic_read(&mddev->recovery_active)==0); - up(&mddev->resync_sem); -out_nolock: - mddev->curr_resync = 0; - wake_up(&resync_wait); - return err; -} - - -/* - * This is a kernel thread which syncs a spare disk with the active array - * - * the amount of foolproofing might seem to be a tad excessive, but an - * early (not so error-safe) version of raid1syncd synced the first 0.5 gigs - * of my root partition with the first 0.5 gigs of my /home partition ... so - * i'm a bit nervous ;) - */ -void md_do_recovery(void *data) -{ - int err; - mddev_t *mddev; - mdp_super_t *sb; - mdp_disk_t *spare; - struct md_list_head *tmp; - - printk(KERN_INFO "md: recovery thread got woken up ...\n"); -restart: - ITERATE_MDDEV(mddev,tmp) { - sb = mddev->sb; - if (!sb) - continue; - if (mddev->recovery_running) - continue; - if (sb->active_disks == sb->raid_disks) - continue; - if (mddev->sb_dirty) - md_update_sb(mddev); - if (!sb->spare_disks) { - printk(KERN_ERR "md%d: no spare disk to reconstruct array! " - "-- continuing in degraded mode\n", mdidx(mddev)); - continue; - } - /* - * now here we get the spare and resync it. 
- */ - spare = get_spare(mddev); - if (!spare) - continue; - printk(KERN_INFO "md%d: resyncing spare disk %s to replace failed disk\n", - mdidx(mddev), partition_name(MKDEV(spare->major,spare->minor))); - if (!mddev->pers->diskop) - continue; - if (mddev->pers->diskop(mddev, &spare, DISKOP_SPARE_WRITE)) - continue; - down(&mddev->recovery_sem); - mddev->recovery_running = 1; - err = md_do_sync(mddev, spare); - if (err == -EIO) { - printk(KERN_INFO "md%d: spare disk %s failed, skipping to next spare.\n", - mdidx(mddev), partition_name(MKDEV(spare->major,spare->minor))); - if (!disk_faulty(spare)) { - mddev->pers->diskop(mddev,&spare,DISKOP_SPARE_INACTIVE); - mark_disk_faulty(spare); - mark_disk_nonsync(spare); - mark_disk_inactive(spare); - sb->spare_disks--; - sb->working_disks--; - sb->failed_disks++; - } - } else - if (disk_faulty(spare)) - mddev->pers->diskop(mddev, &spare, - DISKOP_SPARE_INACTIVE); - if (err == -EINTR || err == -ENOMEM) { - /* - * Recovery got interrupted, or ran out of mem ... - * signal back that we have finished using the array. 
- */ - mddev->pers->diskop(mddev, &spare, - DISKOP_SPARE_INACTIVE); - up(&mddev->recovery_sem); - mddev->recovery_running = 0; - continue; - } else { - mddev->recovery_running = 0; - up(&mddev->recovery_sem); - } - if (!disk_faulty(spare)) { - /* - * the SPARE_ACTIVE diskop possibly changes the - * pointer too - */ - mddev->pers->diskop(mddev, &spare, DISKOP_SPARE_ACTIVE); - mark_disk_sync(spare); - mark_disk_active(spare); - sb->active_disks++; - sb->spare_disks--; - } - mddev->sb_dirty = 1; - md_update_sb(mddev); - goto restart; - } - printk(KERN_INFO "md: recovery thread finished ...\n"); - -} - -int md_notify_reboot(struct notifier_block *this, - unsigned long code, void *x) -{ - struct md_list_head *tmp; - mddev_t *mddev; - - if ((code == MD_SYS_DOWN) || (code == MD_SYS_HALT) - || (code == MD_SYS_POWER_OFF)) { - - printk(KERN_INFO "md: stopping all md devices.\n"); - - ITERATE_MDDEV(mddev,tmp) - do_md_stop (mddev, 1); - /* - * certain more exotic SCSI devices are known to be - * volatile wrt too early system reboots. While the - * right place to handle this issue is the given - * driver, we do want to have a safe RAID driver ... 
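The superblock disk counters in md_do_recovery() above move in lockstep: a spare that fails mid-resync leaves the spare and working sets and joins the failed set, while a spare that finishes resync is promoted to active. A toy model of just that bookkeeping; the struct and function names are illustrative, not the kernel's `mdp_super_t` API:

```c
#include <assert.h>

/* Hypothetical mirror of the counter updates in md_do_recovery(). */
struct disk_counts { int active, working, failed, spare; };

/* Spare failed during resync: sb->spare_disks--, working_disks--,
 * failed_disks++. */
static struct disk_counts after_spare_failed(struct disk_counts c)
{
    c.spare--;
    c.working--;
    c.failed++;
    return c;
}

/* Resync completed: the spare becomes a full array member,
 * sb->active_disks++, spare_disks--. */
static struct disk_counts after_spare_activated(struct disk_counts c)
{
    c.active++;
    c.spare--;
    return c;
}
```

Note `working_disks` counts both active members and healthy spares, which is why a failing spare decrements it while a promoted spare leaves it unchanged.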
- */ - md_mdelay(1000*1); - } - return NOTIFY_DONE; -} - -struct notifier_block md_notifier = { - notifier_call: md_notify_reboot, - next: NULL, - priority: INT_MAX, /* before any real devices */ -}; - -static void md_geninit(void) -{ - struct proc_dir_entry *p; - int i; - - for(i = 0; i < MAX_MD_DEVS; i++) { - md_blocksizes[i] = 1024; - md_size[i] = 0; - md_hardsect_sizes[i] = 512; - } - blksize_size[MAJOR_NR] = md_blocksizes; - blk_size[MAJOR_NR] = md_size; - max_readahead[MAJOR_NR] = md_maxreadahead; - hardsect_size[MAJOR_NR] = md_hardsect_sizes; - - dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t)); - -#ifdef CONFIG_PROC_FS - p = create_proc_entry("mdstat", S_IRUGO, NULL); - if (p) - p->proc_fops = &md_seq_fops; -#endif -} - -request_queue_t * md_queue_proc(kdev_t dev) -{ - mddev_t *mddev = kdev_to_mddev(dev); - if (mddev == NULL) - return BLK_DEFAULT_QUEUE(MAJOR_NR); - else - return &mddev->queue; -} - -int md__init md_init(void) -{ - static char * name = "mdrecoveryd"; - int minor; - - printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d, MD_SB_DISKS=%d\n", - MD_MAJOR_VERSION, MD_MINOR_VERSION, - MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS); - - if (devfs_register_blkdev (MAJOR_NR, "md", &md_fops)) - { - printk(KERN_ALERT "md: Unable to get major %d for md\n", MAJOR_NR); - return (-1); - } - devfs_handle = devfs_mk_dir (NULL, "md", NULL); - /* we don't use devfs_register_series because we want to fill md_hd_struct */ - for (minor=0; minor < MAX_MD_DEVS; ++minor) { - char devname[128]; - sprintf (devname, "%u", minor); - md_hd_struct[minor].de = devfs_register (devfs_handle, - devname, DEVFS_FL_DEFAULT, MAJOR_NR, minor, - S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL); - } - - /* all requests on an uninitialised device get failed... 
*/ - blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR), md_fail_request); - blk_dev[MAJOR_NR].queue = md_queue_proc; - - - read_ahead[MAJOR_NR] = INT_MAX; - - add_gendisk(&md_gendisk); - - md_recovery_thread = md_register_thread(md_do_recovery, NULL, name); - if (!md_recovery_thread) - printk(KERN_ALERT "md: bug: couldn't allocate md_recovery_thread\n"); - - md_register_reboot_notifier(&md_notifier); - raid_table_header = register_sysctl_table(raid_root_table, 1); - - md_geninit(); - return (0); -} - - -#ifndef MODULE - -/* - * When md (and any required personalities) are compiled into the kernel - * (not a module), arrays can be assembled at boot time using AUTODETECT, - * where specially marked partitions are registered with md_autodetect_dev(), - * and with MD_BOOT where devices to be collected are given on the boot line - * with md=..... - * The code for that is here. - */ - -struct { - int set; - int noautodetect; -} raid_setup_args md__initdata; - -/* - * Searches all registered partitions for autorun RAID arrays - * at boot time. 
- */ -static kdev_t detected_devices[128]; -static int dev_cnt; - -void md_autodetect_dev(kdev_t dev) -{ - if (dev_cnt >= 0 && dev_cnt < 127) - detected_devices[dev_cnt++] = dev; -} - - -static void autostart_arrays(void) -{ - mdk_rdev_t *rdev; - int i; - - printk(KERN_INFO "md: Autodetecting RAID arrays.\n"); - - for (i = 0; i < dev_cnt; i++) { - kdev_t dev = detected_devices[i]; - - if (md_import_device(dev,1)) { - printk(KERN_ALERT "md: could not import %s!\n", - partition_name(dev)); - continue; - } - /* - * Sanity checks: - */ - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - continue; - } - if (rdev->faulty) { - MD_BUG(); - continue; - } - md_list_add(&rdev->pending, &pending_raid_disks); - } - dev_cnt = 0; - - autorun_devices(-1); -} - -static struct { - char device_set [MAX_MD_DEVS]; - int pers[MAX_MD_DEVS]; - int chunk[MAX_MD_DEVS]; - char *device_names[MAX_MD_DEVS]; -} md_setup_args md__initdata; - -/* - * Parse the command-line parameters given our kernel, but do not - * actually try to invoke the MD device now; that is handled by - * md_setup_drive after the low-level disk drivers have initialised. - * - * 27/11/1999: Fixed to work correctly with the 2.3 kernel (which - * assigns the task of parsing integer arguments to the - * invoked program now). Added ability to initialise all - * the MD devices (by specifying multiple "md=" lines) - * instead of just one. -- KTK - * 18May2000: Added support for persistant-superblock arrays: - * md=n,0,factor,fault,device-list uses RAID0 for device n - * md=n,-1,factor,fault,device-list uses LINEAR for device n - * md=n,device-list reads a RAID superblock from the devices - * elements in device-list are read by name_to_kdev_t so can be - * a hex number or something like /dev/hda1 /dev/sdb - * 2001-06-03: Dave Cinege - * Shifted name_to_kdev_t() and related operations to md_set_drive() - * for later execution. Rewrote section to make devfs compatible. 
- */ -static int md__init md_setup(char *str) -{ - int minor, level, factor, fault; - char *pername = ""; - char *str1 = str; - - if (get_option(&str, &minor) != 2) { /* MD Number */ - printk(KERN_WARNING "md: Too few arguments supplied to md=.\n"); - return 0; - } - if (minor >= MAX_MD_DEVS) { - printk(KERN_WARNING "md: md=%d, Minor device number too high.\n", minor); - return 0; - } else if (md_setup_args.device_names[minor]) { - printk(KERN_WARNING "md: md=%d, Specified more then once. " - "Replacing previous definition.\n", minor); - } - switch (get_option(&str, &level)) { /* RAID Personality */ - case 2: /* could be 0 or -1.. */ - if (level == 0 || level == -1) { - if (get_option(&str, &factor) != 2 || /* Chunk Size */ - get_option(&str, &fault) != 2) { - printk(KERN_WARNING "md: Too few arguments supplied to md=.\n"); - return 0; - } - md_setup_args.pers[minor] = level; - md_setup_args.chunk[minor] = 1 << (factor+12); - switch(level) { - case -1: - level = LINEAR; - pername = "linear"; - break; - case 0: - level = RAID0; - pername = "raid0"; - break; - default: - printk(KERN_WARNING - "md: The kernel has not been configured for raid%d support!\n", - level); - return 0; - } - md_setup_args.pers[minor] = level; - break; - } - /* FALL THROUGH */ - case 1: /* the first device is numeric */ - str = str1; - /* FALL THROUGH */ - case 0: - md_setup_args.pers[minor] = 0; - pername="super-block"; - } - - printk(KERN_INFO "md: Will configure md%d (%s) from %s, below.\n", - minor, pername, str); - md_setup_args.device_names[minor] = str; - - return 1; -} - -extern kdev_t name_to_kdev_t(char *line) md__init; -void md__init md_setup_drive(void) -{ - int minor, i; - kdev_t dev; - mddev_t*mddev; - kdev_t devices[MD_SB_DISKS+1]; - - for (minor = 0; minor < MAX_MD_DEVS; minor++) { - int err = 0; - char *devname; - mdu_disk_info_t dinfo; - - if ((devname = md_setup_args.device_names[minor]) == 0) continue; - - for (i = 0; i < MD_SB_DISKS && devname != 0; i++) { - - char *p; - 
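In md_setup() above, the chunk-size "factor" parsed from the `md=` boot line is turned into a byte count with `chunk = 1 << (factor + 12)`, i.e. factor 0 means 4 KiB and each step doubles it. A minimal restatement (the function name is an assumption):

```c
#include <assert.h>

/* md=n,level,factor,fault,device-list: the third argument is a shift
 * factor, not a byte count. factor 0 -> 4096 bytes, factor 1 -> 8192, ... */
static long chunk_size_bytes(int factor)
{
    return 1L << (factor + 12);
}
```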
void *handle; - - p = strchr(devname, ','); - if (p) - *p++ = 0; - - dev = name_to_kdev_t(devname); - handle = devfs_find_handle(NULL, devname, MAJOR (dev), MINOR (dev), - DEVFS_SPECIAL_BLK, 1); - if (handle != 0) { - unsigned major, minor; - devfs_get_maj_min(handle, &major, &minor); - dev = MKDEV(major, minor); - } - if (dev == 0) { - printk(KERN_WARNING "md: Unknown device name: %s\n", devname); - break; - } - - devices[i] = dev; - md_setup_args.device_set[minor] = 1; - - devname = p; - } - devices[i] = 0; - - if (md_setup_args.device_set[minor] == 0) - continue; - - if (mddev_map[minor]) { - printk(KERN_WARNING - "md: Ignoring md=%d, already autodetected. (Use raid=noautodetect)\n", - minor); - continue; - } - printk(KERN_INFO "md: Loading md%d: %s\n", minor, md_setup_args.device_names[minor]); - - mddev = alloc_mddev(MKDEV(MD_MAJOR,minor)); - if (!mddev) { - printk(KERN_ERR "md: kmalloc failed - cannot start array %d\n", minor); - continue; - } - if (md_setup_args.pers[minor]) { - /* non-persistent */ - mdu_array_info_t ainfo; - ainfo.level = pers_to_level(md_setup_args.pers[minor]); - ainfo.size = 0; - ainfo.nr_disks =0; - ainfo.raid_disks =0; - ainfo.md_minor =minor; - ainfo.not_persistent = 1; - - ainfo.state = (1 << MD_SB_CLEAN); - ainfo.active_disks = 0; - ainfo.working_disks = 0; - ainfo.failed_disks = 0; - ainfo.spare_disks = 0; - ainfo.layout = 0; - ainfo.chunk_size = md_setup_args.chunk[minor]; - err = set_array_info(mddev, &ainfo); - for (i = 0; !err && (dev = devices[i]); i++) { - dinfo.number = i; - dinfo.raid_disk = i; - dinfo.state = (1<<MD_DISK_ACTIVE)|(1<<MD_DISK_SYNC); - dinfo.major = MAJOR(dev); - dinfo.minor = MINOR(dev); - mddev->sb->nr_disks++; - mddev->sb->raid_disks++; - mddev->sb->active_disks++; - mddev->sb->working_disks++; - err = add_new_disk (mddev, &dinfo); - } - } else { - /* persistent */ - for (i = 0; (dev = devices[i]); i++) { - dinfo.major = MAJOR(dev); - dinfo.minor = MINOR(dev); - add_new_disk (mddev, &dinfo); - } - } - if (!err) - err = do_md_run(mddev); - if (err) { - mddev->sb_dirty = 0; - do_md_stop(mddev, 0); - 
printk(KERN_WARNING "md: starting md%d failed\n", minor); - } - } -} - -static int md__init raid_setup(char *str) -{ - int len, pos; - - len = strlen(str) + 1; - pos = 0; - - while (pos < len) { - char *comma = strchr(str+pos, ','); - int wlen; - if (comma) - wlen = (comma-str)-pos; - else wlen = (len-1)-pos; - - if (strncmp(str, "noautodetect", wlen) == 0) - raid_setup_args.noautodetect = 1; - pos += wlen+1; - } - raid_setup_args.set = 1; - return 1; -} - -int md__init md_run_setup(void) -{ - if (raid_setup_args.noautodetect) - printk(KERN_INFO "md: Skipping autodetection of RAID arrays. (raid=noautodetect)\n"); - else - autostart_arrays(); - md_setup_drive(); - return 0; -} - -__setup("raid=", raid_setup); -__setup("md=", md_setup); - -__initcall(md_init); -__initcall(md_run_setup); - -#else /* It is a MODULE */ - -int init_module(void) -{ - return md_init(); -} - -static void free_device_names(void) -{ - while (!list_empty(&device_names)) { - struct dname *tmp = list_entry(device_names.next, - dev_name_t, list); - list_del(&tmp->list); - kfree(tmp); - } -} - - -void cleanup_module(void) -{ - md_unregister_thread(md_recovery_thread); - devfs_unregister(devfs_handle); - - devfs_unregister_blkdev(MAJOR_NR,"md"); - unregister_reboot_notifier(&md_notifier); - unregister_sysctl_table(raid_table_header); -#ifdef CONFIG_PROC_FS - remove_proc_entry("mdstat", NULL); -#endif - - del_gendisk(&md_gendisk); - - blk_dev[MAJOR_NR].queue = NULL; - blksize_size[MAJOR_NR] = NULL; - blk_size[MAJOR_NR] = NULL; - max_readahead[MAJOR_NR] = NULL; - hardsect_size[MAJOR_NR] = NULL; - - free_device_names(); - -} -#endif - -MD_EXPORT_SYMBOL(md_size); -MD_EXPORT_SYMBOL(register_md_personality); -MD_EXPORT_SYMBOL(unregister_md_personality); -MD_EXPORT_SYMBOL(partition_name); -MD_EXPORT_SYMBOL(md_error); -MD_EXPORT_SYMBOL(md_do_sync); -MD_EXPORT_SYMBOL(md_sync_acct); -MD_EXPORT_SYMBOL(md_done_sync); -MD_EXPORT_SYMBOL(md_recover_arrays); -MD_EXPORT_SYMBOL(md_register_thread); 
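raid_setup() above walks a comma-separated option string by offset. The sketch below restates that walk as a standalone matcher (`has_word` is a made-up name). One observable difference: the kernel snippet calls `strncmp(str, "noautodetect", wlen)` from the start of the string on every iteration, so in practice only a leading word can match; the sketch compares at `str + pos` instead:

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `word` appears as a complete comma-separated element of
 * `str`, scanning word by word the way raid_setup() does. */
static int has_word(const char *str, const char *word)
{
    size_t len = strlen(str), pos = 0;

    while (pos < len) {
        const char *comma = strchr(str + pos, ',');
        size_t wlen = comma ? (size_t)(comma - (str + pos)) : len - pos;

        /* compare at the current word, and require a full-length match */
        if (wlen == strlen(word) && strncmp(str + pos, word, wlen) == 0)
            return 1;
        pos += wlen + 1;            /* skip the word and its comma */
    }
    return 0;
}
```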
-MD_EXPORT_SYMBOL(md_unregister_thread); -MD_EXPORT_SYMBOL(md_update_sb); -MD_EXPORT_SYMBOL(md_wakeup_thread); -MD_EXPORT_SYMBOL(md_print_devices); -MD_EXPORT_SYMBOL(find_rdev_nr); -MD_EXPORT_SYMBOL(md_interrupt_thread); -<<<<<<< found -MD_EXPORT_SYMBOL(mddev_map); -||||||| expected -EXPORT_SYMBOL(mddev_map); -======= ->>>>>>> replacement -MODULE_LICENSE("GPL"); ./linux/md-loop/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.471295758 +0000 @@ -1,4025 +0,0 @@ -/* - md.c : Multiple Devices driver for Linux - Copyright (C) 1998, 1999, 2000 Ingo Molnar - - completely rewritten, based on the MD driver code from Marc Zyngier - - Changes: - - - RAID-1/RAID-5 extensions by Miguel de Icaza, Gadi Oxman, Ingo Molnar - - boot support for linear and striped mode by Harald Hoyer - - kerneld support by Boris Tobotras - - kmod support by: Cyrus Durgin - - RAID0 bugfixes: Mark Anthony Lisher - - Devfs support by Richard Gooch - - - lots of fixes and improvements to the RAID1/RAID5 and generic - RAID code (such as request based resynchronization): - - Neil Brown . - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - - You should have received a copy of the GNU General Public License - (for example /usr/src/linux/COPYING); if not, write to the Free - Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. -*/ - -#include -#include -#include -#include -#include -#include - -#include - -#ifdef CONFIG_KMOD -#include -#endif - -#define __KERNEL_SYSCALLS__ -#include - -#include - -#define MAJOR_NR MD_MAJOR -#define MD_DRIVER - -#include - -#define DEBUG 0 -#if DEBUG -# define dprintk(x...) printk(x) -#else -# define dprintk(x...) 
do { } while(0) -#endif - -#ifndef MODULE -static void autostart_arrays (void); -#endif - -static mdk_personality_t *pers[MAX_PERSONALITY]; - -/* - * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit' - * is 100 KB/sec, so the extra system load does not show up that much. - * Increase it if you want to have more _guaranteed_ speed. Note that - * the RAID driver will use the maximum available bandwith if the IO - * subsystem is idle. There is also an 'absolute maximum' reconstruction - * speed limit - in case reconstruction slows down your system despite - * idle IO detection. - * - * you can change it via /proc/sys/dev/raid/speed_limit_min and _max. - */ - -static int sysctl_speed_limit_min = 100; -static int sysctl_speed_limit_max = 100000; - -static struct ctl_table_header *raid_table_header; - -static ctl_table raid_table[] = { - {DEV_RAID_SPEED_LIMIT_MIN, "speed_limit_min", - &sysctl_speed_limit_min, sizeof(int), 0644, NULL, &proc_dointvec}, - {DEV_RAID_SPEED_LIMIT_MAX, "speed_limit_max", - &sysctl_speed_limit_max, sizeof(int), 0644, NULL, &proc_dointvec}, - {0} -}; - -static ctl_table raid_dir_table[] = { - {DEV_RAID, "raid", NULL, 0, 0555, raid_table}, - {0} -}; - -static ctl_table raid_root_table[] = { - {CTL_DEV, "dev", NULL, 0, 0555, raid_dir_table}, - {0} -}; - -/* - * these have to be allocated separately because external - * subsystems want to have a pre-defined structure - */ -struct hd_struct md_hd_struct[MAX_MD_DEVS]; -static int md_blocksizes[MAX_MD_DEVS]; -static int md_hardsect_sizes[MAX_MD_DEVS]; -static void md_recover_arrays(void); -static mdk_thread_t *md_recovery_thread; - -int md_size[MAX_MD_DEVS]; - -static struct block_device_operations md_fops; -static devfs_handle_t devfs_handle; - -static struct gendisk md_gendisk= -{ - major: MD_MAJOR, - major_name: "md", - minor_shift: 0, - max_p: 1, - part: md_hd_struct, - sizes: md_size, - nr_real: MAX_MD_DEVS, - real_devices: NULL, - next: NULL, - fops: &md_fops, -}; - -/* - * 
Enables to iterate over all existing md arrays - * all_mddevs_lock protects this list as well as mddev_map. - */ -static MD_LIST_HEAD(all_mddevs); -static spinlock_t all_mddevs_lock = SPIN_LOCK_UNLOCKED; - - -/* - * iterates through all used mddevs in the system. - * We take care to grab the all_mddevs_lock whenever navigating - * the list, and to always hold a refcount when unlocked. - * Any code which breaks out of this loop while own - * a reference to the current mddev and must mddev_put it. - */ -#define ITERATE_MDDEV(mddev,tmp) \ - \ - for (spin_lock(&all_mddevs_lock), \ - (tmp = all_mddevs.next), \ - (mddev = NULL); \ - (void)(tmp != &all_mddevs && \ - mddev_get(list_entry(tmp, mddev_t, all_mddevs))),\ - spin_unlock(&all_mddevs_lock), \ - (mddev ? mddev_put(mddev):(void)NULL), \ - (mddev = list_entry(tmp, mddev_t, all_mddevs)), \ - (tmp != &all_mddevs); \ - spin_lock(&all_mddevs_lock), \ - (tmp = tmp->next) \ - ) - -static mddev_t *mddev_map[MAX_MD_DEVS]; - -static int md_fail_request (request_queue_t *q, struct bio *bio) -{ - bio_io_error(bio); - return 0; -} - -static inline mddev_t *mddev_get(mddev_t *mddev) -{ - atomic_inc(&mddev->active); - return mddev; -} - -static void mddev_put(mddev_t *mddev) -{ - if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) - return; - if (!mddev->sb && list_empty(&mddev->disks)) { - list_del(&mddev->all_mddevs); - mddev_map[mdidx(mddev)] = NULL; - kfree(mddev); - MOD_DEC_USE_COUNT; - } - spin_unlock(&all_mddevs_lock); -} - -static mddev_t * mddev_find(int unit) -{ - mddev_t *mddev, *new = NULL; - - retry: - spin_lock(&all_mddevs_lock); - if (mddev_map[unit]) { - mddev = mddev_get(mddev_map[unit]); - spin_unlock(&all_mddevs_lock); - if (new) - kfree(new); - return mddev; - } - if (new) { - mddev_map[unit] = new; - list_add(&new->all_mddevs, &all_mddevs); - spin_unlock(&all_mddevs_lock); - MOD_INC_USE_COUNT; - return new; - } - spin_unlock(&all_mddevs_lock); - - new = (mddev_t *) kmalloc(sizeof(*new), GFP_KERNEL); - 
if (!new) - return NULL; - - memset(new, 0, sizeof(*new)); - - new->__minor = unit; - init_MUTEX(&new->reconfig_sem); - MD_INIT_LIST_HEAD(&new->disks); - MD_INIT_LIST_HEAD(&new->all_mddevs); - atomic_set(&new->active, 1); - - goto retry; -} - -static inline int mddev_lock(mddev_t * mddev) -{ - return down_interruptible(&mddev->reconfig_sem); -} - -static inline int mddev_trylock(mddev_t * mddev) -{ - return down_trylock(&mddev->reconfig_sem); -} - -static inline void mddev_unlock(mddev_t * mddev) -{ - up(&mddev->reconfig_sem); -} - -mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr) -{ - mdk_rdev_t * rdev; - struct md_list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == nr) - return rdev; - } - return NULL; -} - -mdk_rdev_t * find_rdev(mddev_t * mddev, kdev_t dev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->dev == dev) - return rdev; - } - return NULL; -} - -static MD_LIST_HEAD(device_names); - -char * partition_name(kdev_t dev) -{ - struct gendisk *hd; - static char nomem [] = "<none>"; - dev_name_t *dname; - struct md_list_head *tmp; - - list_for_each(tmp, &device_names) { - dname = md_list_entry(tmp, dev_name_t, list); - if (dname->dev == dev) - return dname->name; - } - - dname = (dev_name_t *) kmalloc(sizeof(*dname), GFP_KERNEL); - - if (!dname) - return nomem; - /* - * ok, add this new device name to the list - */ - hd = get_gendisk (dev); - dname->name = NULL; - if (hd) - dname->name = disk_name (hd, MINOR(dev), dname->namebuf); - if (!dname->name) { - sprintf (dname->namebuf, "[dev %s]", kdevname(dev)); - dname->name = dname->namebuf; - } - - dname->dev = dev; - md_list_add(&dname->list, &device_names); - - return dname->name; -} - -static unsigned int calc_dev_sboffset(kdev_t dev, mddev_t *mddev, - int persistent) -{ - unsigned int size = 0; - - if (blk_size[MAJOR(dev)]) - size = blk_size[MAJOR(dev)][MINOR(dev)]; - if (persistent) - size = MD_NEW_SIZE_BLOCKS(size); - return size; -} - 
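mddev_find() earlier in this hunk uses a classic pattern: look up under the lock; if the object is missing, drop the lock, allocate, then retry, and discard the allocation if another thread installed one in the meantime, so the allocation never sleeps with the spinlock held. A user-space sketch with a pthread mutex standing in for `all_mddevs_lock` (all names here are illustrative):

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_UNITS 16

struct obj { int unit; };

static struct obj *map[MAX_UNITS];
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

static struct obj *find_or_create(int unit)
{
    struct obj *fresh = NULL;

    for (;;) {
        pthread_mutex_lock(&map_lock);
        if (map[unit]) {                    /* someone beat us to it */
            struct obj *found = map[unit];
            pthread_mutex_unlock(&map_lock);
            free(fresh);                    /* discard our spare allocation */
            return found;
        }
        if (fresh) {                        /* second pass: install ours */
            map[unit] = fresh;
            pthread_mutex_unlock(&map_lock);
            return fresh;
        }
        pthread_mutex_unlock(&map_lock);    /* allocate with the lock dropped */
        fresh = malloc(sizeof(*fresh));
        if (!fresh)
            return NULL;
        fresh->unit = unit;
        /* loop back and re-check under the lock */
    }
}
```

The `goto retry` in mddev_find() plays the role of the `for (;;)` loop here: correctness comes from re-checking the map after every reacquisition of the lock.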
-static unsigned int calc_dev_size(kdev_t dev, mddev_t *mddev, int persistent) -{ - unsigned int size; - - size = calc_dev_sboffset(dev, mddev, persistent); - if (!mddev->sb) { - MD_BUG(); - return size; - } - if (mddev->sb->chunk_size) - size &= ~(mddev->sb->chunk_size/1024 - 1); - return size; -} - -static unsigned int zoned_raid_size(mddev_t *mddev) -{ - unsigned int mask; - mdk_rdev_t * rdev; - struct md_list_head *tmp; - - if (!mddev->sb) { - MD_BUG(); - return -EINVAL; - } - /* - * do size and offset calculations. - */ - mask = ~(mddev->sb->chunk_size/1024 - 1); - - ITERATE_RDEV(mddev,rdev,tmp) { - rdev->size &= mask; - md_size[mdidx(mddev)] += rdev->size; - } - return 0; -} - -static void remove_descriptor(mdp_disk_t *disk, mdp_super_t *sb) -{ - if (disk_active(disk)) { - sb->working_disks--; - } else { - if (disk_spare(disk)) { - sb->spare_disks--; - sb->working_disks--; - } else { - sb->failed_disks--; - } - } - sb->nr_disks--; - disk->major = 0; - disk->minor = 0; - mark_disk_removed(disk); -} - -#define BAD_MAGIC KERN_ERR \ -"md: invalid raid superblock magic on %s\n" - -#define BAD_MINOR KERN_ERR \ -"md: %s: invalid raid minor (%x)\n" - -#define OUT_OF_MEM KERN_ALERT \ -"md: out of memory.\n" - -#define NO_SB KERN_ERR \ -"md: disabled device %s, could not read superblock.\n" - -#define BAD_CSUM KERN_WARNING \ -"md: invalid superblock checksum on %s\n" - -static int alloc_array_sb(mddev_t * mddev) -{ - if (mddev->sb) { - MD_BUG(); - return 0; - } - - mddev->sb = (mdp_super_t *) __get_free_page (GFP_KERNEL); - if (!mddev->sb) - return -ENOMEM; - md_clear_page(mddev->sb); - return 0; -} - -static int alloc_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb) - MD_BUG(); - - rdev->sb_page = alloc_page(GFP_KERNEL); - if (!rdev->sb_page) { - printk(OUT_OF_MEM); - return -EINVAL; - } - rdev->sb = (mdp_super_t *) page_address(rdev->sb_page); - - return 0; -} - -static void free_disk_sb(mdk_rdev_t * rdev) -{ - if (rdev->sb_page) { - page_cache_release(rdev->sb_page); - 
rdev->sb = NULL; - rdev->sb_page = NULL; - rdev->sb_offset = 0; - rdev->size = 0; - } else { - if (!rdev->faulty) - MD_BUG(); - } -} - - -static void bh_complete(struct buffer_head *bh, int uptodate) -{ - - if (uptodate) - set_bit(BH_Uptodate, &bh->b_state); - - complete((struct completion*)bh->b_private); -} - -static int sync_page_io(kdev_t dev, unsigned long sector, int size, - struct page *page, int rw) -{ - struct buffer_head bh; - struct completion event; - - init_completion(&event); - init_buffer(&bh, bh_complete, &event); - bh.b_rdev = dev; - bh.b_rsector = sector; - bh.b_state = (1 << BH_Req) | (1 << BH_Mapped) | (1 << BH_Lock); - bh.b_size = size; - bh.b_page = page; - bh.b_reqnext = NULL; - bh.b_data = page_address(page); - generic_make_request(rw, &bh); - - run_task_queue(&tq_disk); - wait_for_completion(&event); - - return test_bit(BH_Uptodate, &bh.b_state); -} - -static int read_disk_sb(mdk_rdev_t * rdev) -{ - int ret = -EINVAL; - kdev_t dev = rdev->dev; - unsigned long sb_offset; - - if (!rdev->sb) { - MD_BUG(); - goto abort; - } - - /* - * Calculate the position of the superblock, - * it's at the end of the disk - */ - sb_offset = calc_dev_sboffset(rdev->dev, rdev->mddev, 1); - rdev->sb_offset = sb_offset; - - if (!sync_page_io(dev, sb_offset<<1, MD_SB_BYTES, rdev->sb_page, READ)) { - printk(NO_SB,partition_name(dev)); - return -EINVAL; - } - printk(KERN_INFO " [events: %08lx]\n", (unsigned long)rdev->sb->events_lo); - ret = 0; -abort: - return ret; -} - -static unsigned int calc_sb_csum(mdp_super_t * sb) -{ - unsigned int disk_csum, csum; - - disk_csum = sb->sb_csum; - sb->sb_csum = 0; - csum = csum_partial((void *)sb, MD_SB_BYTES, 0); - sb->sb_csum = disk_csum; - return csum; -} - -/* - * Check one RAID superblock for generic plausibility - */ - -static int check_disk_sb(mdk_rdev_t * rdev) -{ - mdp_super_t *sb; - int ret = -EINVAL; - - sb = rdev->sb; - if (!sb) { - MD_BUG(); - goto abort; - } - - if (sb->md_magic != MD_SB_MAGIC) { - 
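calc_sb_csum() above has to checksum a structure that contains its own checksum field, so it saves `sb_csum`, zeroes it, sums the whole block, and restores the saved value. A self-contained sketch of that trick; the struct layout is invented and a plain byte sum stands in for the kernel's csum_partial():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy superblock: the checksum field lives inside the summed region. */
struct sb { uint32_t data[4]; uint32_t sb_csum; };

static uint32_t byte_sum(const void *p, size_t n)
{
    const unsigned char *b = p;
    uint32_t s = 0;

    while (n--)
        s += *b++;
    return s;
}

static uint32_t calc_csum(struct sb *sb)
{
    uint32_t saved = sb->sb_csum;
    uint32_t csum;

    sb->sb_csum = 0;               /* the field must not sum itself */
    csum = byte_sum(sb, sizeof(*sb));
    sb->sb_csum = saved;           /* put the on-disk value back */
    return csum;
}
```

Because the stored checksum is excluded from the sum, the result depends only on the payload, which is what makes the verify-on-read comparison in check_disk_sb() (`calc_sb_csum(sb) != sb->sb_csum`) meaningful.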
printk(BAD_MAGIC, partition_name(rdev->dev)); - goto abort; - } - - if (sb->md_minor >= MAX_MD_DEVS) { - printk(BAD_MINOR, partition_name(rdev->dev), sb->md_minor); - goto abort; - } - - if (calc_sb_csum(sb) != sb->sb_csum) { - printk(BAD_CSUM, partition_name(rdev->dev)); - goto abort; - } - ret = 0; -abort: - return ret; -} - -static kdev_t dev_unit(kdev_t dev) -{ - unsigned int mask; - struct gendisk *hd = get_gendisk(dev); - - if (!hd) - return 0; - mask = ~((1 << hd->minor_shift) - 1); - - return MKDEV(MAJOR(dev), MINOR(dev) & mask); -} - -static mdk_rdev_t * match_dev_unit(mddev_t *mddev, kdev_t dev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev,rdev,tmp) - if (dev_unit(rdev->dev) == dev_unit(dev)) - return rdev; - - return NULL; -} - -static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - ITERATE_RDEV(mddev1,rdev,tmp) - if (match_dev_unit(mddev2, rdev->dev)) - return 1; - - return 0; -} - -static MD_LIST_HEAD(all_raid_disks); -static MD_LIST_HEAD(pending_raid_disks); - -static void bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev) -{ - mdk_rdev_t *same_pdev; - - if (rdev->mddev) { - MD_BUG(); - return; - } - same_pdev = match_dev_unit(mddev, rdev->dev); - if (same_pdev) - printk( KERN_WARNING -"md%d: WARNING: %s appears to be on the same physical disk as %s. 
True\n" -" protection against single-disk failure might be compromised.\n", - mdidx(mddev), partition_name(rdev->dev), - partition_name(same_pdev->dev)); - - md_list_add(&rdev->same_set, &mddev->disks); - rdev->mddev = mddev; - printk(KERN_INFO "md: bind<%s>\n", partition_name(rdev->dev)); -} - -static void unbind_rdev_from_array(mdk_rdev_t * rdev) -{ - if (!rdev->mddev) { - MD_BUG(); - return; - } - list_del_init(&rdev->same_set); - printk(KERN_INFO "md: unbind<%s>\n", partition_name(rdev->dev)); - rdev->mddev = NULL; -} - -/* - * prevent the device from being mounted, repartitioned or - * otherwise reused by a RAID array (or any other kernel - * subsystem), by opening the device. [simply getting an - * inode is not enough, the SCSI module usage code needs - * an explicit open() on the device] - */ -static int lock_rdev(mdk_rdev_t *rdev) -{ - int err = 0; - struct block_device *bdev; - - bdev = bdget(rdev->dev); - if (!bdev) - return -ENOMEM; - err = blkdev_get(bdev, FMODE_READ|FMODE_WRITE, 0, BDEV_RAW); - if (!err) - rdev->bdev = bdev; - return err; -} - -static void unlock_rdev(mdk_rdev_t *rdev) -{ - struct block_device *bdev = rdev->bdev; - rdev->bdev = NULL; - if (!bdev) - MD_BUG(); - blkdev_put(bdev, BDEV_RAW); -} - -void md_autodetect_dev(kdev_t dev); - -static void export_rdev(mdk_rdev_t * rdev) -{ - printk(KERN_INFO "md: export_rdev(%s)\n",partition_name(rdev->dev)); - if (rdev->mddev) - MD_BUG(); - unlock_rdev(rdev); - free_disk_sb(rdev); - list_del_init(&rdev->all); - if (!list_empty(&rdev->pending)) { - printk(KERN_INFO "md: (%s was pending)\n", - partition_name(rdev->dev)); - list_del_init(&rdev->pending); - } -#ifndef MODULE - md_autodetect_dev(rdev->dev); -#endif - rdev->dev = 0; - rdev->faulty = 0; - kfree(rdev); -} - -static void kick_rdev_from_array(mdk_rdev_t * rdev) -{ - unbind_rdev_from_array(rdev); - export_rdev(rdev); -} - -static void export_array(mddev_t *mddev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - mdp_super_t *sb = 
mddev->sb; - - if (mddev->sb) { - mddev->sb = NULL; - free_page((unsigned long) sb); - } - - ITERATE_RDEV(mddev,rdev,tmp) { - if (!rdev->mddev) { - MD_BUG(); - continue; - } - kick_rdev_from_array(rdev); - } - if (!list_empty(&mddev->disks)) - MD_BUG(); -} - -static void free_mddev(mddev_t *mddev) -{ - if (!mddev) { - MD_BUG(); - return; - } - - export_array(mddev); - md_size[mdidx(mddev)] = 0; - md_hd_struct[mdidx(mddev)].nr_sects = 0; -} - -#undef BAD_CSUM -#undef BAD_MAGIC -#undef OUT_OF_MEM -#undef NO_SB - -static void print_desc(mdp_disk_t *desc) -{ - printk(" DISK<N:%d,%s(%d,%d),R:%d,S:%d>\n", desc->number, - partition_name(MKDEV(desc->major,desc->minor)), - desc->major,desc->minor,desc->raid_disk,desc->state); -} - -static void print_sb(mdp_super_t *sb) -{ - int i; - - printk(KERN_INFO "md: SB: (V:%d.%d.%d) ID:<%08x.%08x.%08x.%08x> CT:%08x\n", - sb->major_version, sb->minor_version, sb->patch_version, - sb->set_uuid0, sb->set_uuid1, sb->set_uuid2, sb->set_uuid3, - sb->ctime); - printk(KERN_INFO "md: L%d S%08d ND:%d RD:%d md%d LO:%d CS:%d\n", sb->level, - sb->size, sb->nr_disks, sb->raid_disks, sb->md_minor, - sb->layout, sb->chunk_size); - printk(KERN_INFO "md: UT:%08x ST:%d AD:%d WD:%d FD:%d SD:%d CSUM:%08x E:%08lx\n", - sb->utime, sb->state, sb->active_disks, sb->working_disks, - sb->failed_disks, sb->spare_disks, - sb->sb_csum, (unsigned long)sb->events_lo); - - printk(KERN_INFO); - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - - desc = sb->disks + i; - if (desc->number || desc->major || desc->minor || - desc->raid_disk || (desc->state && (desc->state != 4))) { - printk(" D %2d: ", i); - print_desc(desc); - } - } - printk(KERN_INFO "md: THIS: "); - print_desc(&sb->this_disk); - -} - -static void print_rdev(mdk_rdev_t *rdev) -{ - printk(KERN_INFO "md: rdev %s: O:%s, SZ:%08ld F:%d DN:%d ", - partition_name(rdev->dev), partition_name(rdev->old_dev), - rdev->size, rdev->faulty, rdev->desc_nr); - if (rdev->sb) { - printk(KERN_INFO "md: rdev superblock:\n"); - 
print_sb(rdev->sb); - } else - printk(KERN_INFO "md: no rdev superblock!\n"); -} - -void md_print_devices(void) -{ - struct md_list_head *tmp, *tmp2; - mdk_rdev_t *rdev; - mddev_t *mddev; - - printk("\n"); - printk("md: **********************************\n"); - printk("md: * <COMPLETE RAID STATE PRINTOUT> *\n"); - printk("md: **********************************\n"); - ITERATE_MDDEV(mddev,tmp) if (mddev_lock(mddev)==0) { - printk("md%d: ", mdidx(mddev)); - - ITERATE_RDEV(mddev,rdev,tmp2) - printk("<%s>", partition_name(rdev->dev)); - - if (mddev->sb) { - printk(" array superblock:\n"); - print_sb(mddev->sb); - } else - printk(" no array superblock.\n"); - - ITERATE_RDEV(mddev,rdev,tmp2) - print_rdev(rdev); - mddev_unlock(mddev); - } - printk("md: **********************************\n"); - printk("\n"); -} - -static int sb_equal(mdp_super_t *sb1, mdp_super_t *sb2) -{ - int ret; - mdp_super_t *tmp1, *tmp2; - - tmp1 = kmalloc(sizeof(*tmp1),GFP_KERNEL); - tmp2 = kmalloc(sizeof(*tmp2),GFP_KERNEL); - - if (!tmp1 || !tmp2) { - ret = 0; - printk(KERN_INFO "md.c: sb1 is not equal to sb2!\n"); - goto abort; - } - - *tmp1 = *sb1; - *tmp2 = *sb2; - - /* - * nr_disks is not constant - */ - tmp1->nr_disks = 0; - tmp2->nr_disks = 0; - - if (memcmp(tmp1, tmp2, MD_SB_GENERIC_CONSTANT_WORDS * 4)) - ret = 0; - else - ret = 1; - -abort: - if (tmp1) - kfree(tmp1); - if (tmp2) - kfree(tmp2); - - return ret; -} - -static int uuid_equal(mdk_rdev_t *rdev1, mdk_rdev_t *rdev2) -{ - if ( (rdev1->sb->set_uuid0 == rdev2->sb->set_uuid0) && - (rdev1->sb->set_uuid1 == rdev2->sb->set_uuid1) && - (rdev1->sb->set_uuid2 == rdev2->sb->set_uuid2) && - (rdev1->sb->set_uuid3 == rdev2->sb->set_uuid3)) - - return 1; - - return 0; -} - -static mdk_rdev_t * find_rdev_all(kdev_t dev) -{ - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - list_for_each(tmp, &all_raid_disks) { - rdev = md_list_entry(tmp, mdk_rdev_t, all); - if (rdev->dev == dev) - return rdev; - } - return NULL; -} - -#define GETBLK_FAILED KERN_ERR \ -"md: getblk failed for 
device %s\n" - -static int write_disk_sb(mdk_rdev_t * rdev) -{ - kdev_t dev; - unsigned long sb_offset, size; - - if (!rdev->sb) { - MD_BUG(); - return 1; - } - if (rdev->faulty) { - MD_BUG(); - return 1; - } - if (rdev->sb->md_magic != MD_SB_MAGIC) { - MD_BUG(); - return 1; - } - - dev = rdev->dev; - sb_offset = calc_dev_sboffset(dev, rdev->mddev, 1); - if (rdev->sb_offset != sb_offset) { - printk(KERN_INFO "%s's sb offset has changed from %ld to %ld, skipping\n", - partition_name(dev), rdev->sb_offset, sb_offset); - goto skip; - } - /* - * If the disk went offline meanwhile and it's just a spare, then - * its size has changed to zero silently, and the MD code does - * not yet know that it's faulty. - */ - size = calc_dev_size(dev, rdev->mddev, 1); - if (size != rdev->size) { - printk(KERN_INFO "%s's size has changed from %ld to %ld since import, skipping\n", - partition_name(dev), rdev->size, size); - goto skip; - } - - printk(KERN_INFO "(write) %s's sb offset: %ld\n", partition_name(dev), sb_offset); - - if (!sync_page_io(dev, sb_offset<<1, MD_SB_BYTES, rdev->sb_page, WRITE)) { - printk("md: write_disk_sb failed for device %s\n", partition_name(dev)); - return 1; - } -skip: - return 0; -} -#undef GETBLK_FAILED - -static void set_this_disk(mddev_t *mddev, mdk_rdev_t *rdev) -{ - int i, ok = 0; - mdp_disk_t *desc; - - for (i = 0; i < MD_SB_DISKS; i++) { - desc = mddev->sb->disks + i; -#if 0 - if (disk_faulty(desc)) { - if (MKDEV(desc->major,desc->minor) == rdev->dev) - ok = 1; - continue; - } -#endif - if (MKDEV(desc->major,desc->minor) == rdev->dev) { - rdev->sb->this_disk = *desc; - rdev->desc_nr = desc->number; - ok = 1; - break; - } - } - - if (!ok) { - MD_BUG(); - } -} - -static int sync_sbs(mddev_t * mddev) -{ - mdk_rdev_t *rdev; - mdp_super_t *sb; - struct md_list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty || rdev->alias_device) - continue; - sb = rdev->sb; - *sb = *mddev->sb; - set_this_disk(mddev, rdev); - sb->sb_csum = 
calc_sb_csum(sb); - } - return 0; - } - -void __md_update_sb(mddev_t * mddev) -{ - int err, count = 100; - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - if (!mddev->sb_dirty) { - printk("hm, md_update_sb() called without ->sb_dirty == 1, from %p.\n", __builtin_return_address(0)); - return; - } - mddev->sb_dirty = 0; -repeat: - mddev->sb->utime = CURRENT_TIME; - if ((++mddev->sb->events_lo)==0) - ++mddev->sb->events_hi; - - if ((mddev->sb->events_lo|mddev->sb->events_hi)==0) { - /* - * oops, this 64-bit counter should never wrap. - * Either we are in around ~1 trillion A.C., assuming - * 1 reboot per second, or we have a bug: - */ - MD_BUG(); - mddev->sb->events_lo = mddev->sb->events_hi = 0xffffffff; - } - sync_sbs(mddev); - - /* - * do not write anything to disk if using - * nonpersistent superblocks - */ - if (mddev->sb->not_persistent) - return; - - printk(KERN_INFO "md: updating md%d RAID superblock on device\n", - mdidx(mddev)); - - err = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - printk(KERN_INFO "md: "); - if (rdev->faulty) - printk("(skipping faulty "); - if (rdev->alias_device) - printk("(skipping alias "); - if (!rdev->faulty && disk_faulty(&rdev->sb->this_disk)) { - printk("(skipping new-faulty %s )\n", - partition_name(rdev->dev)); - continue; - } - printk("%s ", partition_name(rdev->dev)); - if (!rdev->faulty && !rdev->alias_device) { - printk("[events: %08lx]", - (unsigned long)rdev->sb->events_lo); - err += write_disk_sb(rdev); - } else - printk(")\n"); - } - if (err) { - if (--count) { - printk(KERN_ERR "md: errors occurred during superblock update, repeating\n"); - goto repeat; - } - printk(KERN_ERR "md: excessive errors occurred during superblock update, exiting\n"); - } -} - -void md_update_sb(mddev_t *mddev) -{ - if (mddev_lock(mddev)) - return; - if (mddev->sb_dirty) - __md_update_sb(mddev); - mddev_unlock(mddev); -} - - -/* - * Import a device.
If 'on_disk', then sanity check the superblock - * - * mark the device faulty if: - * - * - the device is nonexistent (zero size) - * - the device has no valid superblock - * - */ -static int md_import_device(kdev_t newdev, int on_disk) -{ - int err; - mdk_rdev_t *rdev; - unsigned int size; - - if (find_rdev_all(newdev)) - return -EEXIST; - - rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL); - if (!rdev) { - printk(KERN_ERR "md: could not alloc mem for %s!\n", partition_name(newdev)); - return -ENOMEM; - } - memset(rdev, 0, sizeof(*rdev)); - - if (is_mounted(newdev)) { - printk(KERN_WARNING "md: can not import %s, has active inodes!\n", - partition_name(newdev)); - err = -EBUSY; - goto abort_free; - } - - if ((err = alloc_disk_sb(rdev))) - goto abort_free; - - rdev->dev = newdev; - if (lock_rdev(rdev)) { - printk(KERN_ERR "md: could not lock %s, zero-size? Marking faulty.\n", - partition_name(newdev)); - err = -EINVAL; - goto abort_free; - } - rdev->desc_nr = -1; - rdev->faulty = 0; - - size = 0; - if (blk_size[MAJOR(newdev)]) - size = blk_size[MAJOR(newdev)][MINOR(newdev)]; - if (!size) { - printk(KERN_WARNING "md: %s has zero size, marking faulty!\n", - partition_name(newdev)); - err = -EINVAL; - goto abort_free; - } - - if (on_disk) { - if ((err = read_disk_sb(rdev))) { - printk(KERN_WARNING "md: could not read %s's sb, not importing!\n", - partition_name(newdev)); - goto abort_free; - } - if ((err = check_disk_sb(rdev))) { - printk(KERN_WARNING "md: %s has invalid sb, not importing!\n", - partition_name(newdev)); - goto abort_free; - } - - if (rdev->sb->level != -4) { - rdev->old_dev = MKDEV(rdev->sb->this_disk.major, - rdev->sb->this_disk.minor); - rdev->desc_nr = rdev->sb->this_disk.number; - } else { - rdev->old_dev = MKDEV(0, 0); - rdev->desc_nr = -1; - } - } - md_list_add(&rdev->all, &all_raid_disks); - MD_INIT_LIST_HEAD(&rdev->pending); - INIT_LIST_HEAD(&rdev->same_set); - - return 0; - -abort_free: - if (rdev->sb) { - if (rdev->bdev) - 
unlock_rdev(rdev); - free_disk_sb(rdev); - } - kfree(rdev); - return err; -} - -/* - * Check a full RAID array for plausibility - */ - -#define INCONSISTENT KERN_ERR \ -"md: fatal superblock inconsistency in %s -- removing from array\n" - -#define OUT_OF_DATE KERN_ERR \ -"md: superblock update time inconsistency -- using the most recent one\n" - -#define OLD_VERSION KERN_ALERT \ -"md: md%d: unsupported raid array version %d.%d.%d\n" - -#define NOT_CLEAN_IGNORE KERN_ERR \ -"md: md%d: raid array is not clean -- starting background reconstruction\n" - -#define UNKNOWN_LEVEL KERN_ERR \ -"md: md%d: unsupported raid level %d\n" - -static int analyze_sbs(mddev_t * mddev) -{ - int out_of_date = 0, i, first; - struct md_list_head *tmp, *tmp2; - mdk_rdev_t *rdev, *rdev2, *freshest; - mdp_super_t *sb; - - /* - * Verify the RAID superblock on each real device - */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) { - MD_BUG(); - goto abort; - } - if (!rdev->sb) { - MD_BUG(); - goto abort; - } - if (check_disk_sb(rdev)) - goto abort; - } - - /* - * The superblock constant part has to be the same - * for all disks in the array. - */ - sb = NULL; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (!sb) { - sb = rdev->sb; - continue; - } - if (!sb_equal(sb, rdev->sb)) { - printk(INCONSISTENT, partition_name(rdev->dev)); - kick_rdev_from_array(rdev); - continue; - } - } - - /* - * OK, we have all disks and the array is ready to run. Let's - * find the freshest superblock, that one will be the superblock - * that represents the whole array. - */ - if (!mddev->sb) - if (alloc_array_sb(mddev)) - goto abort; - sb = mddev->sb; - freshest = NULL; - - ITERATE_RDEV(mddev,rdev,tmp) { - __u64 ev1, ev2; - /* - * if the checksum is invalid, use the superblock - * only as a last resort. 
(decrease its age by - * one event) - */ - if (calc_sb_csum(rdev->sb) != rdev->sb->sb_csum) { - if (rdev->sb->events_lo || rdev->sb->events_hi) - if ((rdev->sb->events_lo--)==0) - rdev->sb->events_hi--; - } - - printk(KERN_INFO "md: %s's event counter: %08lx\n", - partition_name(rdev->dev), - (unsigned long)rdev->sb->events_lo); - if (!freshest) { - freshest = rdev; - continue; - } - /* - * Find the newest superblock version - */ - ev1 = md_event(rdev->sb); - ev2 = md_event(freshest->sb); - if (ev1 != ev2) { - out_of_date = 1; - if (ev1 > ev2) - freshest = rdev; - } - } - if (out_of_date) { - printk(OUT_OF_DATE); - printk(KERN_INFO "md: freshest: %s\n", partition_name(freshest->dev)); - } - memcpy (sb, freshest->sb, sizeof(*sb)); - - /* - * at this point we have picked the 'best' superblock - * from all available superblocks. - * now we validate this superblock and kick out possibly - * failed disks. - */ - ITERATE_RDEV(mddev,rdev,tmp) { - /* - * Kick all non-fresh devices - */ - __u64 ev1, ev2; - ev1 = md_event(rdev->sb); - ev2 = md_event(sb); - ++ev1; - if (ev1 < ev2) { - printk(KERN_WARNING "md: kicking non-fresh %s from array!\n", - partition_name(rdev->dev)); - kick_rdev_from_array(rdev); - continue; - } - } - - /* - * Fix up changed device names ... but only if this disk has a - * recent update time. Use faulty checksum ones too.
- */ - if (mddev->sb->level != -4) - ITERATE_RDEV(mddev,rdev,tmp) { - __u64 ev1, ev2, ev3; - if (rdev->faulty || rdev->alias_device) { - MD_BUG(); - goto abort; - } - ev1 = md_event(rdev->sb); - ev2 = md_event(sb); - ev3 = ev2; - --ev3; - if ((rdev->dev != rdev->old_dev) && - ((ev1 == ev2) || (ev1 == ev3))) { - mdp_disk_t *desc; - - printk(KERN_WARNING "md: device name has changed from %s to %s since last import!\n", - partition_name(rdev->old_dev), partition_name(rdev->dev)); - if (rdev->desc_nr == -1) { - MD_BUG(); - goto abort; - } - desc = &sb->disks[rdev->desc_nr]; - if (rdev->old_dev != MKDEV(desc->major, desc->minor)) { - MD_BUG(); - goto abort; - } - desc->major = MAJOR(rdev->dev); - desc->minor = MINOR(rdev->dev); - desc = &rdev->sb->this_disk; - desc->major = MAJOR(rdev->dev); - desc->minor = MINOR(rdev->dev); - } - } - - /* - * Remove unavailable and faulty devices ... - * - * note that if an array becomes completely unrunnable due to - * missing devices, we do not write the superblock back, so the - * administrator has a chance to fix things up. The removal thus - * only happens if it's nonfatal to the contents of the array. - */ - for (i = 0; i < MD_SB_DISKS; i++) { - int found; - mdp_disk_t *desc; - kdev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - /* - * We kick faulty devices/descriptors immediately. - * - * Note: multipath devices are a special case. Since we - * were able to read the superblock on the path, we don't - * care if it was previously marked as faulty, it's up now - * so enable it. 
- */ - if (disk_faulty(desc) && mddev->sb->level != -4) { - found = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr != desc->number) - continue; - printk(KERN_WARNING "md%d: kicking faulty %s!\n", - mdidx(mddev),partition_name(rdev->dev)); - kick_rdev_from_array(rdev); - found = 1; - break; - } - if (!found) { - if (dev == MKDEV(0,0)) - continue; - printk(KERN_WARNING "md%d: removing former faulty %s!\n", - mdidx(mddev), partition_name(dev)); - } - remove_descriptor(desc, sb); - continue; - } else if (disk_faulty(desc)) { - /* - * multipath entry marked as faulty, unfaulty it - */ - rdev = find_rdev(mddev, dev); - if(rdev) - mark_disk_spare(desc); - else - remove_descriptor(desc, sb); - } - - if (dev == MKDEV(0,0)) - continue; - /* - * Is this device present in the rdev ring? - */ - found = 0; - ITERATE_RDEV(mddev,rdev,tmp) { - /* - * Multi-path IO special-case: since we have no - * this_disk descriptor at auto-detect time, - * we cannot check rdev->number. - * We can check the device though. - */ - if ((sb->level == -4) && (rdev->dev == - MKDEV(desc->major,desc->minor))) { - found = 1; - break; - } - if (rdev->desc_nr == desc->number) { - found = 1; - break; - } - } - if (found) - continue; - - printk(KERN_WARNING "md%d: former device %s is unavailable, removing from array!\n", - mdidx(mddev), partition_name(dev)); - remove_descriptor(desc, sb); - } - - /* - * Double check whether all devices mentioned in the - * superblock are in the rdev ring.
- */ - first = 1; - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - kdev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - if (dev == MKDEV(0,0)) - continue; - - if (disk_faulty(desc)) { - MD_BUG(); - goto abort; - } - - rdev = find_rdev(mddev, dev); - if (!rdev) { - MD_BUG(); - goto abort; - } - /* - * In the case of Multipath-IO, we have no - * other information source to find out which - * disk is which, only the position of the device - * in the superblock: - */ - if (mddev->sb->level == -4) { - if ((rdev->desc_nr != -1) && (rdev->desc_nr != i)) { - MD_BUG(); - goto abort; - } - rdev->desc_nr = i; - if (!first) - rdev->alias_device = 1; - else - first = 0; - } - } - - /* - * Kick all rdevs that are not in the - * descriptor array: - */ - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == -1) - kick_rdev_from_array(rdev); - } - - /* - * Do a final reality check. - */ - if (mddev->sb->level != -4) { - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->desc_nr == -1) { - MD_BUG(); - goto abort; - } - /* - * is the desc_nr unique? - */ - ITERATE_RDEV(mddev,rdev2,tmp2) { - if ((rdev2 != rdev) && - (rdev2->desc_nr == rdev->desc_nr)) { - MD_BUG(); - goto abort; - } - } - /* - * is the device unique? 
- */ - ITERATE_RDEV(mddev,rdev2,tmp2) { - if ((rdev2 != rdev) && - (rdev2->dev == rdev->dev)) { - MD_BUG(); - goto abort; - } - } - } - } - - /* - * Check if we can support this RAID array - */ - if (sb->major_version != MD_MAJOR_VERSION || - sb->minor_version > MD_MINOR_VERSION) { - - printk(OLD_VERSION, mdidx(mddev), sb->major_version, - sb->minor_version, sb->patch_version); - goto abort; - } - - if ((sb->state != (1 << MD_SB_CLEAN)) && ((sb->level == 1) || - (sb->level == 4) || (sb->level == 5))) - printk(NOT_CLEAN_IGNORE, mdidx(mddev)); - - return 0; -abort: - return 1; -} - -#undef INCONSISTENT -#undef OUT_OF_DATE -#undef OLD_VERSION -#undef OLD_LEVEL - -static int device_size_calculation(mddev_t * mddev) -{ - int data_disks = 0, persistent; - unsigned int readahead; - mdp_super_t *sb = mddev->sb; - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - /* - * Do device size calculation. Bail out if too small. - * (we have to do this after having validated chunk_size, - * because device size has to be modulo chunk_size) - */ - persistent = !mddev->sb->not_persistent; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (rdev->size) { - MD_BUG(); - continue; - } - rdev->size = calc_dev_size(rdev->dev, mddev, persistent); - if (rdev->size < sb->chunk_size / 1024) { - printk(KERN_WARNING - "md: Dev %s smaller than chunk_size: %ldk < %dk\n", - partition_name(rdev->dev), - rdev->size, sb->chunk_size / 1024); - return -EINVAL; - } - } - - switch (sb->level) { - case -4: - data_disks = 1; - break; - case -3: - data_disks = 1; - break; - case -2: - data_disks = 1; - break; - case -1: - zoned_raid_size(mddev); - data_disks = 1; - break; - case 0: - zoned_raid_size(mddev); - data_disks = sb->raid_disks; - break; - case 1: - data_disks = 1; - break; - case 4: - case 5: - data_disks = sb->raid_disks-1; - break; - default: - printk(UNKNOWN_LEVEL, mdidx(mddev), sb->level); - goto abort; - } - if (!md_size[mdidx(mddev)]) - md_size[mdidx(mddev)] = sb->size * 
data_disks; - - readahead = MD_READAHEAD; - if ((sb->level == 0) || (sb->level == 4) || (sb->level == 5)) { - readahead = (mddev->sb->chunk_size>>PAGE_SHIFT) * 4 * data_disks; - if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2) - readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2; - } else { - // (no multipath branch - it uses the default setting) - if (sb->level == -3) - readahead = 0; - } - - printk(KERN_INFO "md%d: max total readahead window set to %ldk\n", - mdidx(mddev), readahead*(PAGE_SIZE/1024)); - - printk(KERN_INFO - "md%d: %d data-disks, max readahead per data-disk: %ldk\n", - mdidx(mddev), data_disks, readahead/data_disks*(PAGE_SIZE/1024)); - return 0; -abort: - return 1; -} - - -#define TOO_BIG_CHUNKSIZE KERN_ERR \ -"too big chunk_size: %d > %d\n" - -#define TOO_SMALL_CHUNKSIZE KERN_ERR \ -"too small chunk_size: %d < %ld\n" - -#define BAD_CHUNKSIZE KERN_ERR \ -"no chunksize specified, see 'man raidtab'\n" - -static int do_md_run(mddev_t * mddev) -{ - int pnum, err; - int chunk_size; - struct md_list_head *tmp; - mdk_rdev_t *rdev; - - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return -EINVAL; - } - - if (mddev->pers) - return -EBUSY; - - /* - * Resize disks to align partitions size on a given - * chunk size. - */ - md_size[mdidx(mddev)] = 0; - - /* - * Analyze all RAID superblock(s) - */ - if (analyze_sbs(mddev)) { - MD_BUG(); - return -EINVAL; - } - - chunk_size = mddev->sb->chunk_size; - pnum = level_to_pers(mddev->sb->level); - - if ((pnum != MULTIPATH) && (pnum != RAID1)) { - if (!chunk_size) { - /* - * 'default chunksize' in the old md code used to - * be PAGE_SIZE, baaad. - * we abort here to be on the safe side. We dont - * want to continue the bad practice. 
- */ - printk(BAD_CHUNKSIZE); - return -EINVAL; - } - if (chunk_size > MAX_CHUNK_SIZE) { - printk(TOO_BIG_CHUNKSIZE, chunk_size, MAX_CHUNK_SIZE); - return -EINVAL; - } - /* - * chunk-size has to be a power of 2 and multiples of PAGE_SIZE - */ - if ( (1 << ffz(~chunk_size)) != chunk_size) { - MD_BUG(); - return -EINVAL; - } - if (chunk_size < PAGE_SIZE) { - printk(TOO_SMALL_CHUNKSIZE, chunk_size, PAGE_SIZE); - return -EINVAL; - } - } else - if (chunk_size) - printk(KERN_INFO "md: RAID level %d does not need chunksize! Continuing anyway.\n", - mddev->sb->level); - - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - if (!pers[pnum]) - { -#ifdef CONFIG_KMOD - char module_name[80]; - sprintf (module_name, "md-personality-%d", pnum); - request_module (module_name); - if (!pers[pnum]) -#endif - { - printk(KERN_ERR "md: personality %d is not loaded!\n", - pnum); - return -EINVAL; - } - } - - if (device_size_calculation(mddev)) - return -EINVAL; - - /* - * Drop all container device buffers, from now on - * the only valid external interface is through the md - * device. 
- * Also find largest hardsector size - */ - md_hardsect_sizes[mdidx(mddev)] = 512; - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - invalidate_device(rdev->dev, 1); - if (get_hardsect_size(rdev->dev) - > md_hardsect_sizes[mdidx(mddev)]) - md_hardsect_sizes[mdidx(mddev)] = - get_hardsect_size(rdev->dev); - } - md_blocksizes[mdidx(mddev)] = 1024; - if (md_blocksizes[mdidx(mddev)] < md_hardsect_sizes[mdidx(mddev)]) - md_blocksizes[mdidx(mddev)] = md_hardsect_sizes[mdidx(mddev)]; - mddev->pers = pers[pnum]; - - blk_queue_make_request(&mddev->queue, mddev->pers->make_request); - mddev->queue.queuedata = mddev; - - err = mddev->pers->run(mddev); - if (err) { - printk(KERN_ERR "md: pers->run() failed ...\n"); - mddev->pers = NULL; - return -EINVAL; - } - - mddev->in_sync = (mddev->sb->state & (1<<MD_SB_CLEAN)); - - if (mddev->pers->sync_request) - mddev->sb->state &= ~(1 << MD_SB_CLEAN); - mddev->sb_dirty = 1; - __md_update_sb(mddev); - - md_recover_arrays(); - /* - * md_size has units of 1K blocks, which are - * twice as large as sectors.
- */ - md_hd_struct[mdidx(mddev)].start_sect = 0; - register_disk(&md_gendisk, MKDEV(MAJOR_NR,mdidx(mddev)), - 1, &md_fops, md_size[mdidx(mddev)]<<1); - - read_ahead[MD_MAJOR] = 1024; - return (0); -} - -#undef TOO_BIG_CHUNKSIZE -#undef BAD_CHUNKSIZE - -static int restart_array(mddev_t *mddev) -{ - int err; - - /* - * Complain if it has no devices - */ - err = -ENXIO; - if (list_empty(&mddev->disks)) - goto out; - - if (mddev->pers) { - err = -EBUSY; - if (!mddev->ro) - goto out; - - mddev->ro = 0; - set_device_ro(mddev_to_kdev(mddev), 0); - - printk(KERN_INFO - "md: md%d switched to read-write mode.\n", mdidx(mddev)); - /* - * Kick recovery or resync if necessary - */ - md_recover_arrays(); - err = 0; - } else { - printk(KERN_ERR "md: md%d has no personality assigned.\n", - mdidx(mddev)); - err = -EINVAL; - } - -out: - return err; -} - -#define STILL_MOUNTED KERN_WARNING \ -"md: md%d still mounted.\n" -#define STILL_IN_USE \ -"md: md%d still in use.\n" - -static int do_md_stop(mddev_t * mddev, int ro) -{ - int err = 0; - kdev_t dev = mddev_to_kdev(mddev); - - if (atomic_read(&mddev->active)>1) { - printk(STILL_IN_USE, mdidx(mddev)); - err = -EBUSY; - goto out; - } - - if (mddev->pers) { - if (mddev->sync_thread) { - if (mddev->recovery_running > 0) - mddev->recovery_running = -EINTR; - md_unregister_thread(mddev->sync_thread); - mddev->sync_thread = NULL; - if (mddev->spare) { - mddev->pers->diskop(mddev, &mddev->spare, - DISKOP_SPARE_INACTIVE); - mddev->spare = NULL; - } - } - - invalidate_device(dev, 1); - - if (ro) { - err = -ENXIO; - if (mddev->ro) - goto out; - mddev->ro = 1; - } else { - if (mddev->ro) - set_device_ro(dev, 0); - if (mddev->pers->stop(mddev)) { - err = -EBUSY; - if (mddev->ro) - set_device_ro(dev, 1); - goto out; - } - if (mddev->ro) - mddev->ro = 0; - } - if (mddev->sb) { - /* - * mark it clean only if there was no resync - * interrupted. 
- */ - if (mddev->in_sync) { - printk(KERN_INFO "md: marking sb clean...\n"); - mddev->sb->state |= 1 << MD_SB_CLEAN; - } - mddev->sb_dirty = 1; - __md_update_sb(mddev); - } - if (ro) - set_device_ro(dev, 1); - } - - /* - * Free resources if final stop - */ - if (!ro) { - printk(KERN_INFO "md: md%d stopped.\n", mdidx(mddev)); - free_mddev(mddev); - } else - printk(KERN_INFO "md: md%d switched to read-only mode.\n", mdidx(mddev)); - err = 0; -out: - return err; -} - -/* - * We have to safely support old arrays too. - */ -int detect_old_array(mdp_super_t *sb) -{ - if (sb->major_version > 0) - return 0; - if (sb->minor_version >= 90) - return 0; - - return -EINVAL; -} - - -static void autorun_array(mddev_t *mddev) -{ - mdk_rdev_t *rdev; - struct md_list_head *tmp; - int err; - - if (list_empty(&mddev->disks)) { - MD_BUG(); - return; - } - - printk(KERN_INFO "md: running: "); - - ITERATE_RDEV(mddev,rdev,tmp) { - printk("<%s>", partition_name(rdev->dev)); - } - printk("\n"); - - err = do_md_run (mddev); - if (err) { - printk(KERN_WARNING "md :do_md_run() returned %d\n", err); - /* - * prevent the writeback of an unrunnable array - */ - mddev->sb_dirty = 0; - do_md_stop (mddev, 0); - } -} - -/* - * lets try to run arrays based on all disks that have arrived - * until now. (those are in the ->pending list) - * - * the method: pick the first pending disk, collect all disks with - * the same UUID, remove all from the pending list and put them into - * the 'same_array' list. Then order this list based on superblock - * update time (freshest comes first), kick out 'old' disks and - * compare superblocks. If everything's fine then run it. 
- * - * If "unit" is allocated, then bump its reference count - */ -static void autorun_devices(void) -{ - struct md_list_head candidates; - struct md_list_head *tmp; - mdk_rdev_t *rdev0, *rdev; - mddev_t *mddev; - - printk(KERN_INFO "md: autorun ...\n"); - while (!list_empty(&pending_raid_disks)) { - rdev0 = md_list_entry(pending_raid_disks.next, - mdk_rdev_t, pending); - - printk(KERN_INFO "md: considering %s ...\n", partition_name(rdev0->dev)); - MD_INIT_LIST_HEAD(&candidates); - ITERATE_RDEV_PENDING(rdev,tmp) { - if (uuid_equal(rdev0, rdev)) { - if (!sb_equal(rdev0->sb, rdev->sb)) { - printk(KERN_WARNING - "md: %s has same UUID as %s, but superblocks differ ...\n", - partition_name(rdev->dev), partition_name(rdev0->dev)); - continue; - } - printk(KERN_INFO "md: adding %s ...\n", partition_name(rdev->dev)); - md_list_del(&rdev->pending); - md_list_add(&rdev->pending, &candidates); - } - } - /* - * now we have a set of devices, with all of them having - * mostly sane superblocks. It's time to allocate the - * mddev. - */ - - mddev = mddev_find(rdev0->sb->md_minor); - if (!mddev) { - printk(KERN_ERR "md: cannot allocate memory for md drive.\n"); - break; - } - if (mddev_lock(mddev)) - printk(KERN_WARNING "md: md%d locked, cannot run\n", - mdidx(mddev)); - else if (mddev->sb || !list_empty(&mddev->disks)) { - printk(KERN_WARNING "md: md%d already running, cannot run %s\n", - mdidx(mddev), partition_name(rdev0->dev)); - mddev_unlock(mddev); - } else { - printk(KERN_INFO "md: created md%d\n", mdidx(mddev)); - ITERATE_RDEV_GENERIC(candidates,pending,rdev,tmp) { - bind_rdev_to_array(rdev, mddev); - list_del_init(&rdev->pending); - } - autorun_array(mddev); - mddev_unlock(mddev); - } - /* on success, candidates will be empty, on error - * it wont... - */ - ITERATE_RDEV_GENERIC(candidates,pending,rdev,tmp) - export_rdev(rdev); - mddev_put(mddev); - } - printk(KERN_INFO "md: ... 
autorun DONE.\n"); -} - -/* - * import RAID devices based on one partition - * if possible, the array gets run as well. - */ - -#define BAD_VERSION KERN_ERR \ -"md: %s has RAID superblock version 0.%d, autodetect needs v0.90 or higher\n" - -#define OUT_OF_MEM KERN_ALERT \ -"md: out of memory.\n" - -#define NO_DEVICE KERN_ERR \ -"md: disabled device %s\n" - -#define AUTOADD_FAILED KERN_ERR \ -"md: auto-adding devices to md%d FAILED (error %d).\n" - -#define AUTOADD_FAILED_USED KERN_ERR \ -"md: cannot auto-add device %s to md%d, already used.\n" - -#define AUTORUN_FAILED KERN_ERR \ -"md: auto-running md%d FAILED (error %d).\n" - -#define MDDEV_BUSY KERN_ERR \ -"md: cannot auto-add to md%d, already running.\n" - -#define AUTOADDING KERN_INFO \ -"md: auto-adding devices to md%d, based on %s's superblock.\n" - -#define AUTORUNNING KERN_INFO \ -"md: auto-running md%d.\n" - -static int autostart_array(kdev_t startdev) -{ - int err = -EINVAL, i; - mdp_super_t *sb = NULL; - mdk_rdev_t *start_rdev = NULL, *rdev; - - if (md_import_device(startdev, 1)) { - printk(KERN_WARNING "md: could not import %s!\n", partition_name(startdev)); - goto abort; - } - - start_rdev = find_rdev_all(startdev); - if (!start_rdev) { - MD_BUG(); - goto abort; - } - if (start_rdev->faulty) { - printk(KERN_WARNING "md: can not autostart based on faulty %s!\n", - partition_name(startdev)); - goto abort; - } - md_list_add(&start_rdev->pending, &pending_raid_disks); - - sb = start_rdev->sb; - - err = detect_old_array(sb); - if (err) { - printk(KERN_WARNING "md: array version is too old to be autostarted ," - "use raidtools 0.90 mkraid --upgrade to upgrade the array " - "without data loss!\n"); - goto abort; - } - - for (i = 0; i < MD_SB_DISKS; i++) { - mdp_disk_t *desc; - kdev_t dev; - - desc = sb->disks + i; - dev = MKDEV(desc->major, desc->minor); - - if (dev == MKDEV(0,0)) - continue; - if (dev == startdev) - continue; - if (md_import_device(dev, 1)) { - printk(KERN_WARNING "md: could not import %s, 
trying to run array nevertheless.\n", - partition_name(dev)); - continue; - } - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - goto abort; - } - md_list_add(&rdev->pending, &pending_raid_disks); - } - - /* - * possibly return codes - */ - autorun_devices(); - return 0; - -abort: - if (start_rdev) - export_rdev(start_rdev); - return err; -} - -#undef BAD_VERSION -#undef OUT_OF_MEM -#undef NO_DEVICE -#undef AUTOADD_FAILED_USED -#undef AUTOADD_FAILED -#undef AUTORUN_FAILED -#undef AUTOADDING -#undef AUTORUNNING - - -static int get_version(void * arg) -{ - mdu_version_t ver; - - ver.major = MD_MAJOR_VERSION; - ver.minor = MD_MINOR_VERSION; - ver.patchlevel = MD_PATCHLEVEL_VERSION; - - if (md_copy_to_user(arg, &ver, sizeof(ver))) - return -EFAULT; - - return 0; -} - -#define SET_FROM_SB(x) info.x = mddev->sb->x -static int get_array_info(mddev_t * mddev, void * arg) -{ - mdu_array_info_t info; - - if (!mddev->sb) { - MD_BUG(); - return -EINVAL; - } - - SET_FROM_SB(major_version); - SET_FROM_SB(minor_version); - SET_FROM_SB(patch_version); - SET_FROM_SB(ctime); - SET_FROM_SB(level); - SET_FROM_SB(size); - SET_FROM_SB(nr_disks); - SET_FROM_SB(raid_disks); - SET_FROM_SB(md_minor); - SET_FROM_SB(not_persistent); - - SET_FROM_SB(utime); - SET_FROM_SB(state); - SET_FROM_SB(active_disks); - SET_FROM_SB(working_disks); - SET_FROM_SB(failed_disks); - SET_FROM_SB(spare_disks); - - SET_FROM_SB(layout); - SET_FROM_SB(chunk_size); - - if (md_copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} -#undef SET_FROM_SB - -#define SET_FROM_SB(x) info.x = mddev->sb->disks[nr].x -static int get_disk_info(mddev_t * mddev, void * arg) -{ - mdu_disk_info_t info; - unsigned int nr; - - if (!mddev->sb) - return -EINVAL; - - if (md_copy_from_user(&info, arg, sizeof(info))) - return -EFAULT; - - nr = info.number; - if (nr >= MD_SB_DISKS) - return -EINVAL; - - SET_FROM_SB(major); - SET_FROM_SB(minor); - SET_FROM_SB(raid_disk); - SET_FROM_SB(state); - - if 
(md_copy_to_user(arg, &info, sizeof(info))) - return -EFAULT; - - return 0; -} -#undef SET_FROM_SB - -#define SET_SB(x) mddev->sb->disks[nr].x = info->x - -static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info) -{ - int err, size, persistent; - mdk_rdev_t *rdev; - unsigned int nr; - kdev_t dev; - dev = MKDEV(info->major,info->minor); - - if (find_rdev_all(dev)) { - printk(KERN_WARNING "md: device %s already used in a RAID array!\n", - partition_name(dev)); - return -EBUSY; - } - if (!mddev->sb) { - /* expecting a device which has a superblock */ - err = md_import_device(dev, 1); - if (err) { - printk(KERN_WARNING "md: md_import_device returned %d\n", err); - return -EINVAL; - } - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - return -EINVAL; - } - if (!list_empty(&mddev->disks)) { - mdk_rdev_t *rdev0 = md_list_entry(mddev->disks.next, - mdk_rdev_t, same_set); - if (!uuid_equal(rdev0, rdev)) { - printk(KERN_WARNING "md: %s has different UUID to %s\n", - partition_name(rdev->dev), partition_name(rdev0->dev)); - export_rdev(rdev); - return -EINVAL; - } - if (!sb_equal(rdev0->sb, rdev->sb)) { - printk(KERN_WARNING "md: %s has same UUID but different superblock to %s\n", - partition_name(rdev->dev), partition_name(rdev0->dev)); - export_rdev(rdev); - return -EINVAL; - } - } - bind_rdev_to_array(rdev, mddev); - return 0; - } - - nr = info->number; - if (nr >= mddev->sb->nr_disks) { - MD_BUG(); - return -EINVAL; - } - - - SET_SB(number); - SET_SB(major); - SET_SB(minor); - SET_SB(raid_disk); - SET_SB(state); - - if ((info->state & (1<<MD_DISK_FAULTY))==0) { - err = md_import_device (dev, 0); - if (err) { - printk(KERN_WARNING "md: error, md_import_device() returned %d\n", err); - return -EINVAL; - } - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - return -EINVAL; - } - rdev->old_dev = dev; - rdev->desc_nr = info->number; - - bind_rdev_to_array(rdev, mddev); - - persistent = !mddev->sb->not_persistent; - if (!persistent) - printk(KERN_INFO "md: nonpersistent superblock ...\n"); - - size = calc_dev_size(dev, mddev, persistent); - rdev->sb_offset = calc_dev_sboffset(dev, mddev, persistent); - - if (!mddev->sb->size || (mddev->sb->size > size)) - mddev->sb->size = size; - } - - /* - * sync all other
superblocks with the main superblock - */ - sync_sbs(mddev); - - return 0; -} -#undef SET_SB - -static int hot_generate_error(mddev_t * mddev, kdev_t dev) -{ - struct request_queue *q; - mdk_rdev_t *rdev; - mdp_disk_t *disk; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to generate %s error in md%d ... \n", - partition_name(dev), mdidx(mddev)); - - rdev = find_rdev(mddev, dev); - if (!rdev) { - MD_BUG(); - return -ENXIO; - } - - if (rdev->desc_nr == -1) { - MD_BUG(); - return -EINVAL; - } - disk = &mddev->sb->disks[rdev->desc_nr]; - if (!disk_active(disk)) - return -ENODEV; - - q = blk_get_queue(rdev->dev); - if (!q) { - MD_BUG(); - return -ENODEV; - } - printk(KERN_INFO "md: okay, generating error!\n"); -// q->oneshot_error = 1; // disabled for now - - return 0; -} - -static int hot_remove_disk(mddev_t * mddev, kdev_t dev) -{ - int err; - mdk_rdev_t *rdev; - mdp_disk_t *disk; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to remove %s from md%d ... \n", - partition_name(dev), mdidx(mddev)); - - if (!mddev->pers->diskop) { - printk(KERN_WARNING "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - - rdev = find_rdev(mddev, dev); - if (!rdev) - return -ENXIO; - - if (rdev->desc_nr == -1) { - MD_BUG(); - return -EINVAL; - } - disk = &mddev->sb->disks[rdev->desc_nr]; - if (disk_active(disk)) - goto busy; - - if (disk_removed(disk)) - return -EINVAL; - - err = mddev->pers->diskop(mddev, &disk, DISKOP_HOT_REMOVE_DISK); - if (err == -EBUSY) - goto busy; - - if (err) { - MD_BUG(); - return -EINVAL; - } - - remove_descriptor(disk, mddev->sb); - kick_rdev_from_array(rdev); - __md_update_sb(mddev); - - return 0; -busy: - printk(KERN_WARNING "md: cannot remove active disk %s from md%d ... 
\n", - partition_name(dev), mdidx(mddev)); - return -EBUSY; -} - -static int hot_add_disk(mddev_t * mddev, kdev_t dev) -{ - int i, err, persistent; - unsigned int size; - mdk_rdev_t *rdev; - mdp_disk_t *disk; - - if (!mddev->pers) - return -ENODEV; - - printk(KERN_INFO "md: trying to hot-add %s to md%d ... \n", - partition_name(dev), mdidx(mddev)); - - if (!mddev->pers->diskop) { - printk(KERN_WARNING "md%d: personality does not support diskops!\n", - mdidx(mddev)); - return -EINVAL; - } - - persistent = !mddev->sb->not_persistent; - - rdev = find_rdev(mddev, dev); - if (rdev) - return -EBUSY; - - err = md_import_device (dev, 0); - if (err) { - printk(KERN_WARNING "md: error, md_import_device() returned %d\n", err); - return -EINVAL; - } - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - return -EINVAL; - } - if (rdev->faulty) { - printk(KERN_WARNING "md: can not hot-add faulty %s disk to md%d!\n", - partition_name(dev), mdidx(mddev)); - err = -EINVAL; - goto abort_export; - } - size = calc_dev_size(dev, mddev, persistent); - - if (size < mddev->sb->size) { - printk(KERN_WARNING "md%d: disk size %d blocks < array size %d\n", - mdidx(mddev), size, mddev->sb->size); - err = -ENOSPC; - goto abort_export; - } - bind_rdev_to_array(rdev, mddev); - - /* - * The rest should better be atomic, we can have disk failures - * noticed in interrupt contexts ... 
- */ - rdev->old_dev = dev; - rdev->size = size; - rdev->sb_offset = calc_dev_sboffset(dev, mddev, persistent); - - disk = mddev->sb->disks + mddev->sb->raid_disks; - for (i = mddev->sb->raid_disks; i < MD_SB_DISKS; i++) { - disk = mddev->sb->disks + i; - - if (!disk->major && !disk->minor) - break; - if (disk_removed(disk)) - break; - } - if (i == MD_SB_DISKS) { - printk(KERN_WARNING "md%d: can not hot-add to full array!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unbind_export; - } - - if (disk_removed(disk)) { - /* - * reuse slot - */ - if (disk->number != i) { - MD_BUG(); - err = -EINVAL; - goto abort_unbind_export; - } - } else { - disk->number = i; - } - - disk->raid_disk = disk->number; - disk->major = MAJOR(dev); - disk->minor = MINOR(dev); - - if (mddev->pers->diskop(mddev, &disk, DISKOP_HOT_ADD_DISK)) { - MD_BUG(); - err = -EINVAL; - goto abort_unbind_export; - } - - mark_disk_spare(disk); - mddev->sb->nr_disks++; - mddev->sb->spare_disks++; - mddev->sb->working_disks++; - - __md_update_sb(mddev); - - /* - * Kick recovery, maybe this spare has to be added to the - * array immediately. 
- */ - md_recover_arrays(); - - return 0; - -abort_unbind_export: - unbind_rdev_from_array(rdev); - -abort_export: - export_rdev(rdev); - return err; -} - -#define SET_SB(x) mddev->sb->x = info->x -static int set_array_info(mddev_t * mddev, mdu_array_info_t *info) -{ - - if (alloc_array_sb(mddev)) - return -ENOMEM; - - mddev->sb->major_version = MD_MAJOR_VERSION; - mddev->sb->minor_version = MD_MINOR_VERSION; - mddev->sb->patch_version = MD_PATCHLEVEL_VERSION; - mddev->sb->ctime = CURRENT_TIME; - - SET_SB(level); - SET_SB(size); - SET_SB(nr_disks); - SET_SB(raid_disks); - SET_SB(md_minor); - SET_SB(not_persistent); - - SET_SB(state); - SET_SB(active_disks); - SET_SB(working_disks); - SET_SB(failed_disks); - SET_SB(spare_disks); - - SET_SB(layout); - SET_SB(chunk_size); - - mddev->sb->md_magic = MD_SB_MAGIC; - - /* - * Generate a 128 bit UUID - */ - get_random_bytes(&mddev->sb->set_uuid0, 4); - get_random_bytes(&mddev->sb->set_uuid1, 4); - get_random_bytes(&mddev->sb->set_uuid2, 4); - get_random_bytes(&mddev->sb->set_uuid3, 4); - - return 0; -} -#undef SET_SB - -static int set_disk_faulty(mddev_t *mddev, kdev_t dev) -{ - int ret; - - ret = md_error(mddev, dev); - return ret; -} - -static int md_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg) -{ - unsigned int minor; - int err = 0; - struct hd_geometry *loc = (struct hd_geometry *) arg; - mddev_t *mddev = NULL; - kdev_t dev; - - if (!md_capable_admin()) - return -EACCES; - - dev = inode->i_rdev; - minor = MINOR(dev); - if (minor >= MAX_MD_DEVS) { - MD_BUG(); - return -EINVAL; - } - - /* - * Commands dealing with the RAID driver but not any - * particular array: - */ - switch (cmd) - { - case RAID_VERSION: - err = get_version((void *)arg); - goto done; - - case PRINT_RAID_DEBUG: - err = 0; - md_print_devices(); - goto done; - -#ifndef MODULE - case RAID_AUTORUN: - err = 0; - autostart_arrays(); - goto done; -#endif - - case BLKGETSIZE: - case BLKGETSIZE64: - case BLKRAGET: - case 
BLKRASET: - case BLKFLSBUF: - case BLKBSZGET: - case BLKBSZSET: - err = blk_ioctl (dev, cmd, arg); - goto abort; - - default:; - } - - /* - * Commands creating/starting a new array: - */ - - mddev = inode->i_bdev->bd_inode->u.generic_ip; - - if (!mddev) { - BUG(); - goto abort; - } - - - if (cmd == START_ARRAY) { - /* START_ARRAY doesn't need to lock the array as autostart_array - * does the locking, and it could even be a different array - */ - err = autostart_array(val_to_kdev(arg)); - if (err) { - printk(KERN_WARNING "md: autostart %s failed!\n", - partition_name(val_to_kdev(arg))); - goto abort; - } - goto done; - } - - err = mddev_lock(mddev); - if (err) { - printk(KERN_INFO "md: ioctl lock interrupted, reason %d, cmd %d\n", - err, cmd); - goto abort; - } - - switch (cmd) - { - case SET_ARRAY_INFO: - - if (!list_empty(&mddev->disks)) { - printk(KERN_WARNING "md: array md%d already has disks!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unlock; - } - if (mddev->sb) { - printk(KERN_WARNING "md: array md%d already has a superblock!\n", - mdidx(mddev)); - err = -EBUSY; - goto abort_unlock; - } - if (arg) { - mdu_array_info_t info; - if (md_copy_from_user(&info, (void*)arg, sizeof(info))) { - err = -EFAULT; - goto abort_unlock; - } - err = set_array_info(mddev, &info); - if (err) { - printk(KERN_WARNING "md: couldnt set array info. 
%d\n", err); - goto abort_unlock; - } - } - goto done_unlock; - -<<<<<<< found - err = autostart_array((kdev_t)arg); - if (err) { - printk(KERN_WARNING "md: autostart %s failed!\n", - partition_name((kdev_t)arg)); -||||||| expected - err = autostart_array(val_to_kdev(arg)); - if (err) { - printk(KERN_WARNING "md: autostart %s failed!\n", - partition_name(val_to_kdev(arg))); -======= ->>>>>>> replacement - default:; - } - - /* - * Commands querying/configuring an existing array: - */ - /* if we don't have a superblock yet, only ADD_NEW_DISK or STOP_ARRAY is allowed */ - if (!mddev->sb && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY) { - err = -ENODEV; - goto abort_unlock; - } - - /* - * Commands even a read-only array can execute: - */ - switch (cmd) - { - case GET_ARRAY_INFO: - err = get_array_info(mddev, (void *)arg); - goto done_unlock; - - case GET_DISK_INFO: - err = get_disk_info(mddev, (void *)arg); - goto done_unlock; - - case RESTART_ARRAY_RW: - err = restart_array(mddev); - goto done_unlock; - - case STOP_ARRAY: - err = do_md_stop (mddev, 0); - goto done_unlock; - - case STOP_ARRAY_RO: - err = do_md_stop (mddev, 1); - goto done_unlock; - - /* - * We have a problem here : there is no easy way to give a CHS - * virtual geometry. We currently pretend that we have a 2 heads - * 4 sectors (with a BIG number of cylinders...). This drives - * dosfs just mad... 
;-) - */ - case HDIO_GETGEO: - if (!loc) { - err = -EINVAL; - goto abort_unlock; - } - err = md_put_user (2, (char *) &loc->heads); - if (err) - goto abort_unlock; - err = md_put_user (4, (char *) &loc->sectors); - if (err) - goto abort_unlock; - err = md_put_user (md_hd_struct[mdidx(mddev)].nr_sects/8, - (short *) &loc->cylinders); - if (err) - goto abort_unlock; - err = md_put_user (md_hd_struct[minor].start_sect, - (long *) &loc->start); - goto done_unlock; - } - - /* - * The remaining ioctls are changing the state of the - * superblock, so we do not allow read-only arrays - * here: - */ - if (mddev->ro) { - err = -EROFS; - goto abort_unlock; - } - - switch (cmd) - { - case ADD_NEW_DISK: - { - mdu_disk_info_t info; - if (md_copy_from_user(&info, (void*)arg, sizeof(info))) - err = -EFAULT; - else - err = add_new_disk(mddev, &info); - goto done_unlock; - } - case HOT_GENERATE_ERROR: - err = hot_generate_error(mddev, (kdev_t)arg); - goto done_unlock; - case HOT_REMOVE_DISK: - err = hot_remove_disk(mddev, (kdev_t)arg); - goto done_unlock; - - case HOT_ADD_DISK: - err = hot_add_disk(mddev, (kdev_t)arg); - goto done_unlock; - - case SET_DISK_FAULTY: - err = set_disk_faulty(mddev, (kdev_t)arg); - goto done_unlock; - - case RUN_ARRAY: - { - err = do_md_run (mddev); - /* - * we have to clean up the mess if - * the array cannot be run for some - * reason ... - */ - if (err) { - mddev->sb_dirty = 0; - do_md_stop (mddev, 0); - } - goto done_unlock; - } - - default: - printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, " - "upgrade your software to use new ictls.\n", - current->comm, current->pid); - err = -EINVAL; - goto abort_unlock; - } - -done_unlock: -abort_unlock: - mddev_unlock(mddev); - - return err; -done: - if (err) - MD_BUG(); -abort: - return err; -} - -static int md_open(struct inode *inode, struct file *file) -{ - /* - * Succeed if we can find or allocate a mddev structure. 
- */ - mddev_t *mddev = mddev_find(minor(inode->i_rdev)); - int err = -ENOMEM; - - if (!mddev) - goto out; - - if ((err = mddev_lock(mddev))) - goto put; - - err = 0; - mddev_unlock(mddev); - inode->i_bdev->bd_inode->u.generic_ip = mddev_get(mddev); - put: - mddev_put(mddev); - out: - return err; -} - -static int md_release(struct inode *inode, struct file * file) -{ - mddev_t *mddev = inode->i_bdev->bd_inode->u.generic_ip; - - if (!mddev) - BUG(); - mddev_put(mddev); - - return 0; -} - -static struct block_device_operations md_fops= -{ - owner: THIS_MODULE, - open: md_open, - release: md_release, - ioctl: md_ioctl, -}; - - -int md_thread(void * arg) -{ - mdk_thread_t *thread = arg; - - md_lock_kernel(); - - /* - * Detach thread - */ - - daemonize(); - reparent_to_init(); - - sprintf(current->comm, thread->name); - md_init_signals(); - md_flush_signals(); - thread->tsk = current; - - /* - * md_thread is a 'system-thread', it's priority should be very - * high. We avoid resource deadlocks individually in each - * raid personality. (RAID5 does preallocation) We also use RR and - * the very same RT priority as kswapd, thus we will never get - * into a priority inversion deadlock. - * - * we definitely have to have equal or higher priority than - * bdflush, otherwise bdflush will deadlock if there are too - * many dirty RAID5 blocks. 
- */ - current->policy = SCHED_OTHER; - current->nice = -20; - md_unlock_kernel(); - - complete(thread->event); - while (thread->run) { - void (*run)(void *data); - - wait_event_interruptible(thread->wqueue, - test_bit(THREAD_WAKEUP, &thread->flags)); - - clear_bit(THREAD_WAKEUP, &thread->flags); - - run = thread->run; - if (run) { - run(thread->data); - run_task_queue(&tq_disk); - } - if (md_signal_pending(current)) - md_flush_signals(); - } - complete(thread->event); - return 0; -} - -void md_wakeup_thread(mdk_thread_t *thread) -{ - dprintk("md: waking up MD thread %p.\n", thread); - set_bit(THREAD_WAKEUP, &thread->flags); - wake_up(&thread->wqueue); -} - -mdk_thread_t *md_register_thread(void (*run) (void *), - void *data, const char *name) -{ - mdk_thread_t *thread; - int ret; - struct completion event; - - thread = (mdk_thread_t *) kmalloc - (sizeof(mdk_thread_t), GFP_KERNEL); - if (!thread) - return NULL; - - memset(thread, 0, sizeof(mdk_thread_t)); - md_init_waitqueue_head(&thread->wqueue); - - init_completion(&event); - thread->event = &event; - thread->run = run; - thread->data = data; - thread->name = name; - ret = kernel_thread(md_thread, thread, 0); - if (ret < 0) { - kfree(thread); - return NULL; - } - wait_for_completion(&event); - return thread; -} - -void md_interrupt_thread(mdk_thread_t *thread) -{ - if (!thread->tsk) { - MD_BUG(); - return; - } - dprintk("interrupting MD-thread pid %d\n", thread->tsk->pid); - send_sig(SIGKILL, thread->tsk, 1); -} - -void md_unregister_thread(mdk_thread_t *thread) -{ - struct completion event; - - init_completion(&event); - - thread->event = &event; - thread->run = NULL; - thread->name = NULL; - md_interrupt_thread(thread); - wait_for_completion(&event); - kfree(thread); -} - -static void md_recover_arrays(void) -{ - if (!md_recovery_thread) { - MD_BUG(); - return; - } - md_wakeup_thread(md_recovery_thread); -} - - -int md_error(mddev_t *mddev, kdev_t rdev) -{ - mdk_rdev_t * rrdev; - - dprintk("md_error 
dev:(%d:%d), rdev:(%d:%d), (caller: %p,%p,%p,%p).\n", - MD_MAJOR,mdidx(mddev),MAJOR(rdev),MINOR(rdev), - __builtin_return_address(0),__builtin_return_address(1), - __builtin_return_address(2),__builtin_return_address(3)); - - if (!mddev) { - MD_BUG(); - return 0; - } - rrdev = find_rdev(mddev, rdev); - if (!rrdev || rrdev->faulty) - return 0; - if (!mddev->pers->error_handler - || mddev->pers->error_handler(mddev,rdev) <= 0) { - rrdev->faulty = 1; - } else - return 1; - /* - * if recovery was running, stop it now. - */ - if (mddev->recovery_running) - mddev->recovery_running = -EIO; - md_recover_arrays(); - - return 0; -} - -static void status_unused(struct seq_file *seq) -{ - int i = 0; - mdk_rdev_t *rdev; - struct md_list_head *tmp; - - seq_printf(seq, "unused devices: "); - - ITERATE_RDEV_ALL(rdev,tmp) { - if (list_empty(&rdev->same_set)) { - /* - * The device is not yet used by any array. - */ - i++; - seq_printf(seq, "%s ", - partition_name(rdev->dev)); - } - } - if (!i) - seq_printf(seq, "<none>"); - - seq_printf(seq, "\n"); -} - - -static void status_resync(struct seq_file *seq, mddev_t * mddev) -{ - unsigned long max_blocks, resync, res, dt, db, rt; - - resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active))/2; - max_blocks = mddev->sb->size; - - /* - * Should not happen. - */ - if (!max_blocks) - MD_BUG(); - - res = (resync/1024)*1000/(max_blocks/1024 + 1); - { - int i, x = res/50, y = 20-x; - seq_printf(seq, "["); - for (i = 0; i < x; i++) - seq_printf(seq, "="); - seq_printf(seq, ">"); - for (i = 0; i < y; i++) - seq_printf(seq, "."); - seq_printf(seq, "] "); - } - seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)", - (mddev->spare ? "recovery" : "resync"), - res/10, res % 10, resync, max_blocks); - - /* - * We do not want to overflow, so the order of operands and - * the * 100 / 100 trick are important. We do a +1 to be - * safe against division by zero. We only estimate anyway.
- * - * dt: time from mark until now - * db: blocks written from mark until now - * rt: remaining time - */ - dt = ((jiffies - mddev->resync_mark) / HZ); - if (!dt) dt++; - db = resync - (mddev->resync_mark_cnt/2); - rt = (dt * ((max_blocks-resync) / (db/100+1)))/100; - - seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6); - - seq_printf(seq, " speed=%ldK/sec", db/dt); - -} - - -static void *md_seq_start(struct seq_file *seq, loff_t *pos) -{ - struct list_head *tmp; - loff_t l = *pos; - mddev_t *mddev; - - if (l > 0x10000) - return NULL; - if (!l--) - /* header */ - return (void*)1; - - list_for_each(tmp,&all_mddevs) - if (!l--) { - mddev = list_entry(tmp, mddev_t, all_mddevs); - return mddev; - } - return (void*)2;/* tail */ -} - -static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) -{ - struct list_head *tmp; - mddev_t *next_mddev, *mddev = v; - - ++*pos; - if (v == (void*)2) - return NULL; - - if (v == (void*)1) - tmp = all_mddevs.next; - else - tmp = mddev->all_mddevs.next; - if (tmp != &all_mddevs) - next_mddev = list_entry(tmp,mddev_t,all_mddevs); - else { - next_mddev = (void*)2; - *pos = 0x10000; - } - - return next_mddev; - -} - -static void md_seq_stop(struct seq_file *seq, void *v) -{ - -} - -static int md_seq_show(struct seq_file *seq, void *v) -{ - int j, size; - struct md_list_head *tmp2; - mdk_rdev_t *rdev; - mddev_t *mddev = v; - - if (v == (void*)1) { - seq_printf(seq, "Personalities : "); - for (j = 0; j < MAX_PERSONALITY; j++) - if (pers[j]) - seq_printf(seq, "[%s] ", pers[j]->name); - - seq_printf(seq, "\n"); - seq_printf(seq, "read_ahead "); - if (read_ahead[MD_MAJOR] == INT_MAX) - seq_printf(seq, "not set\n"); - else - seq_printf(seq, "%d sectors\n", read_ahead[MD_MAJOR]); - return 0; - } - if (v == (void*)2) { - status_unused(seq); - return 0; - } - - seq_printf(seq, "md%d : %sactive", mdidx(mddev), - mddev->pers ? 
"" : "in"); - if (mddev->pers) { - if (mddev->ro) - seq_printf(seq, " (read-only)"); - seq_printf(seq, " %s", mddev->pers->name); - } - - size = 0; - ITERATE_RDEV(mddev,rdev,tmp2) { - seq_printf(seq, " %s[%d]", - partition_name(rdev->dev), rdev->desc_nr); - if (rdev->faulty) { - seq_printf(seq, "(F)"); - continue; - } - size += rdev->size; - } - - if (!list_empty(&mddev->disks)) { - if (mddev->pers) - seq_printf(seq, "\n %d blocks", - md_size[mdidx(mddev)]); - else - seq_printf(seq, "\n %d blocks", size); - } - - if (mddev->pers) { - - mddev->pers->status (seq, mddev); - - seq_printf(seq, "\n "); - if (mddev->curr_resync > 1) - status_resync (seq, mddev); - else if (mddev->curr_resync == 1) - seq_printf(seq, " resync=DELAYED"); - - } - seq_printf(seq, "\n"); - return 0; -} - - -static struct seq_operations md_seq_ops = { - .start = md_seq_start, - .next = md_seq_next, - .stop = md_seq_stop, - .show = md_seq_show, -}; - -static int md_seq_open(struct inode *inode, struct file *file) -{ - int error; - - error = seq_open(file, &md_seq_ops); - return error; -} - -static struct file_operations md_seq_fops = { - .open = md_seq_open, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - - -int register_md_personality(int pnum, mdk_personality_t *p) -{ - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - if (pers[pnum]) { - MD_BUG(); - return -EBUSY; - } - - pers[pnum] = p; - printk(KERN_INFO "md: %s personality registered as nr %d\n", p->name, pnum); - return 0; -} - -int unregister_md_personality(int pnum) -{ - if (pnum >= MAX_PERSONALITY) { - MD_BUG(); - return -EINVAL; - } - - printk(KERN_INFO "md: %s personality unregistered\n", pers[pnum]->name); - pers[pnum] = NULL; - return 0; -} - -mdp_disk_t *get_spare(mddev_t *mddev) -{ - mdp_super_t *sb = mddev->sb; - mdp_disk_t *disk; - mdk_rdev_t *rdev; - struct md_list_head *tmp; - - ITERATE_RDEV(mddev,rdev,tmp) { - if (rdev->faulty) - continue; - if (!rdev->sb) { - MD_BUG(); - 
continue; - } - disk = &sb->disks[rdev->desc_nr]; - if (disk_faulty(disk)) { - MD_BUG(); - continue; - } - if (disk_active(disk)) - continue; - return disk; - } - return NULL; -} - -static unsigned int sync_io[DK_MAX_MAJOR][DK_MAX_DISK]; -void md_sync_acct(kdev_t dev, unsigned long nr_sectors) -{ - unsigned int major = MAJOR(dev); - unsigned int index; - - index = disk_index(dev); - if ((index >= DK_MAX_DISK) || (major >= DK_MAX_MAJOR)) - return; - - sync_io[major][index] += nr_sectors; -} - -static int is_mddev_idle(mddev_t *mddev) -{ - mdk_rdev_t * rdev; - struct md_list_head *tmp; - int idle; - unsigned long curr_events; - - idle = 1; - ITERATE_RDEV(mddev,rdev,tmp) { - int major = MAJOR(rdev->dev); - int idx = disk_index(rdev->dev); - - if ((idx >= DK_MAX_DISK) || (major >= DK_MAX_MAJOR)) - continue; - - curr_events = kstat.dk_drive_rblk[major][idx] + - kstat.dk_drive_wblk[major][idx] ; - curr_events -= sync_io[major][idx]; - if ((curr_events - rdev->last_events) > 32) { - rdev->last_events = curr_events; - idle = 0; - } - } - return idle; -} - -void md_done_sync(mddev_t *mddev, int blocks, int ok) -{ - /* another "blocks" (512byte) blocks have been synced */ - atomic_sub(blocks, &mddev->recovery_active); - wake_up(&mddev->recovery_wait); - if (!ok) { - mddev->recovery_running = -EIO; - md_recover_arrays(); - // stop recovery, signal do_sync .... - if (mddev->pers->stop_resync) - mddev->pers->stop_resync(mddev); - if (mddev->recovery_running) - md_interrupt_thread(md_recovery_thread); - } -} - - -DECLARE_WAIT_QUEUE_HEAD(resync_wait); - -#define SYNC_MARKS 10 -#define SYNC_MARK_STEP (3*HZ) -static void md_do_sync(void *data) -{ - mddev_t *mddev = data; - mddev_t *mddev2; - unsigned int max_sectors, currspeed, - j, window, err; - unsigned long mark[SYNC_MARKS]; - unsigned long mark_cnt[SYNC_MARKS]; - int last_mark,m; - struct md_list_head *tmp; - unsigned long last_check; - - /* just incase thread restarts... 
*/ - if (mddev->recovery_running <= 0) - return; - - /* we overload curr_resync somewhat here. - * 0 == not engaged in resync at all - * 2 == checking that there is no conflict with another sync - * 1 == like 2, but have yielded to allow conflicting resync to - * commense - * other == active in resync - this many blocks - */ - do { - mddev->curr_resync = 2; - - ITERATE_MDDEV(mddev2,tmp) { - if (mddev2 == mddev) - continue; - if (mddev2->curr_resync && - match_mddev_units(mddev,mddev2)) { - printk(KERN_INFO "md: delaying resync of md%d until md%d " - "has finished resync (they share one or more physical units)\n", - mdidx(mddev), mdidx(mddev2)); - if (mddev < mddev2) /* arbitrarily yield */ - mddev->curr_resync = 1; - if (wait_event_interruptible(resync_wait, - mddev2->curr_resync < 2)) { - md_flush_signals(); - err = -EINTR; - mddev_put(mddev2); - goto out; - } - } - } - } while (mddev->curr_resync < 2); - - max_sectors = mddev->sb->size<<1; - - printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev)); - printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed: %d KB/sec/disc.\n", - sysctl_speed_limit_min); - printk(KERN_INFO "md: using maximum available idle IO bandwith " - "(but not more than %d KB/sec) for reconstruction.\n", - sysctl_speed_limit_max); - - /* - * Resync has low priority. 
- */ - current->nice = 19; - - is_mddev_idle(mddev); /* this also initializes IO event counters */ - for (m = 0; m < SYNC_MARKS; m++) { - mark[m] = jiffies; - mark_cnt[m] = 0; - } - last_mark = 0; - mddev->resync_mark = mark[last_mark]; - mddev->resync_mark_cnt = mark_cnt[last_mark]; - - /* - * Tune reconstruction: - */ - window = vm_max_readahead*(PAGE_SIZE/512); - printk(KERN_INFO "md: using %dk window, over a total of %d blocks.\n", - window/2,max_sectors/2); - - atomic_set(&mddev->recovery_active, 0); - init_waitqueue_head(&mddev->recovery_wait); - last_check = 0; - for (j = 0; j < max_sectors;) { - int sectors; - - sectors = mddev->pers->sync_request(mddev, j); - - if (sectors < 0) { - err = sectors; - goto out; - } - atomic_add(sectors, &mddev->recovery_active); - j += sectors; - if (j>1) mddev->curr_resync = j; - - if (last_check + window > j) - continue; - - last_check = j; - - run_task_queue(&tq_disk); - - repeat: - if (jiffies >= mark[last_mark] + SYNC_MARK_STEP ) { - /* step marks */ - int next = (last_mark+1) % SYNC_MARKS; - - mddev->resync_mark = mark[next]; - mddev->resync_mark_cnt = mark_cnt[next]; - mark[next] = jiffies; - mark_cnt[next] = j - atomic_read(&mddev->recovery_active); - last_mark = next; - } - - - if (md_signal_pending(current)) { - /* - * got a signal, exit. - */ - printk(KERN_INFO "md: md_do_sync() got signal ... exiting\n"); - md_flush_signals(); - err = -EINTR; - goto out; - } - - /* - * this loop exits only if either when we are slower than - * the 'hard' speed limit, or the system was IO-idle for - * a jiffy. - * the system might be non-idle CPU-wise, but we only care - * about not overloading the IO subsystem. 
(things like an - * e2fsck being done on the RAID array should execute fast) - */ - if (md_need_resched(current)) - schedule(); - - currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1; - - if (currspeed > sysctl_speed_limit_min) { - current->nice = 19; - - if ((currspeed > sysctl_speed_limit_max) || - !is_mddev_idle(mddev)) { - current->state = TASK_INTERRUPTIBLE; - md_schedule_timeout(HZ/4); - goto repeat; - } - } else - current->nice = -20; - } - printk(KERN_INFO "md: md%d: sync done.\n",mdidx(mddev)); - err = 0; - /* - * this also signals 'finished resyncing' to md_stop - */ -out: - wait_disk_event(mddev->recovery_wait, atomic_read(&mddev->recovery_active)==0); - /* tell personality that we are finished */ - mddev->pers->sync_request(mddev, max_sectors, 1); - - mddev->curr_resync = 0; - if (err) - mddev->recovery_running = err; - if (mddev->recovery_running > 0) - mddev->recovery_running = 0; - if (mddev->recovery_running == 0) - mddev->in_sync = 1; - md_recover_arrays(); -} - - -/* - * This is the kernel thread that watches all md arrays for re-sync action - * that might be needed. - * It does not do any resync itself, but rather "forks" off other threads - * to do that as needed. - * When it is determined that resync is needed, we set "->recovery_running" and - * create a thread at ->sync_thread. - * When the thread finishes is clears recovery_running (or set and error) - * and wakeup up this thread which will reap the thread and finish up. 
- */ -void md_do_recovery(void *data) -{ - mddev_t *mddev; - mdp_super_t *sb; - struct md_list_head *tmp; - - dprintk(KERN_INFO "md: recovery thread got woken up ...\n"); - - ITERATE_MDDEV(mddev,tmp) if (mddev_lock(mddev)==0) { - sb = mddev->sb; - if (!sb || !mddev->pers || !mddev->pers->diskop || mddev->ro) - goto unlock; - if (mddev->recovery_running > 0) - /* resync/recovery still happening */ - goto unlock; - if (mddev->sb_dirty) - md_update_sb(mddev); - if (mddev->sync_thread) { - /* resync has finished, collect result */ - md_unregister_thread(mddev->sync_thread); - mddev->sync_thread = NULL; - if (mddev->recovery_running < 0) { - /* some sort of failure. - * If we were doing a reconstruction, - * we need to retrieve the spare - */ - if (mddev->spare) { - mddev->pers->diskop(mddev, &mddev->spare, - DISKOP_SPARE_INACTIVE); - mddev->spare = NULL; - } - } else { - /* success...*/ - if (mddev->spare) { - mddev->pers->diskop(mddev, &mddev->spare, - DISKOP_SPARE_ACTIVE); - mark_disk_sync(mddev->spare); - mark_disk_active(mddev->spare); - sb->active_disks++; - sb->spare_disks--; - mddev->spare = NULL; - } - } - __md_update_sb(mddev); - mddev->recovery_running = 0; - wake_up(&resync_wait); - goto unlock; - } - if (mddev->recovery_running) { - /* that's odd.. */ - mddev->recovery_running = 0; - wake_up(&resync_wait); - } - - if (sb->active_disks < sb->raid_disks) { - mddev->spare = get_spare(mddev); - if (!mddev->spare) - printk(KERN_ERR "md%d: no spare disk to reconstruct array! " - "-- continuing in degraded mode\n", mdidx(mddev)); - else - printk(KERN_INFO "md%d: resyncing spare disk %s to replace failed disk\n", - mdidx(mddev), partition_name(MKDEV(mddev->spare->major,mddev->spare->minor))); - } - if (!mddev->spare && mddev->in_sync) { - /* nothing we can do ... 
*/ - goto unlock; - } - if (mddev->pers->sync_request) { - mddev->sync_thread = md_register_thread(md_do_sync, - mddev, - "md_resync"); - if (!mddev->sync_thread) { - printk(KERN_ERR "md%d: could not start resync thread...\n", mdidx(mddev)); - if (mddev->spare) - mddev->pers->diskop(mddev, &mddev->spare, DISKOP_SPARE_INACTIVE); - mddev->spare = NULL; - mddev->recovery_running = 0; - } else { - if (mddev->spare) - mddev->pers->diskop(mddev, &mddev->spare, DISKOP_SPARE_WRITE); - mddev->recovery_running = 1; - md_wakeup_thread(mddev->sync_thread); - } - } - unlock: - mddev_unlock(mddev); - } - dprintk(KERN_INFO "md: recovery thread finished ...\n"); - -} - -int md_notify_reboot(struct notifier_block *this, - unsigned long code, void *x) -{ - struct md_list_head *tmp; - mddev_t *mddev; - - if ((code == MD_SYS_DOWN) || (code == MD_SYS_HALT) - || (code == MD_SYS_POWER_OFF)) { - - printk(KERN_INFO "md: stopping all md devices.\n"); - - ITERATE_MDDEV(mddev,tmp) - if (mddev_trylock(mddev)==0) - do_md_stop (mddev, 1); - /* - * certain more exotic SCSI devices are known to be - * volatile wrt too early system reboots. While the - * right place to handle this issue is the given - * driver, we do want to have a safe RAID driver ... 
- */ - md_mdelay(1000*1); - } - return NOTIFY_DONE; -} - -struct notifier_block md_notifier = { - notifier_call: md_notify_reboot, - next: NULL, - priority: INT_MAX, /* before any real devices */ -}; - -static void md_geninit(void) -{ - struct proc_dir_entry *p; - int i; - - for(i = 0; i < MAX_MD_DEVS; i++) { - md_blocksizes[i] = 1024; - md_size[i] = 0; - md_hardsect_sizes[i] = 512; - } - blksize_size[MAJOR_NR] = md_blocksizes; - blk_size[MAJOR_NR] = md_size; - max_readahead[MAJOR_NR] = md_maxreadahead; - hardsect_size[MAJOR_NR] = md_hardsect_sizes; - - dprintk("md: sizeof(mdp_super_t) = %d\n", (int)sizeof(mdp_super_t)); - -#ifdef CONFIG_PROC_FS - p = create_proc_entry("mdstat", S_IRUGO, NULL); - if (p) - p->proc_fops = &md_seq_fops; -#endif -} - -request_queue_t * md_queue_proc(kdev_t dev) -{ - mddev_t *mddev = mddev_find(minor(dev)); - request_queue_t *q = BLK_DEFAULT_QUEUE(MAJOR_NR); - if (!mddev || atomic_read(&mddev->active)<2) - BUG(); - if (mddev->pers) - q = &mddev->queue; - mddev_put(mddev); /* the caller must hold a reference... */ - return q; -} - -int md__init md_init(void) -{ - static char * name = "mdrecoveryd"; - int minor; - - printk(KERN_INFO "md: md driver %d.%d.%d MAX_MD_DEVS=%d, MD_SB_DISKS=%d\n", - MD_MAJOR_VERSION, MD_MINOR_VERSION, - MD_PATCHLEVEL_VERSION, MAX_MD_DEVS, MD_SB_DISKS); - - if (devfs_register_blkdev (MAJOR_NR, "md", &md_fops)) - { - printk(KERN_ALERT "md: Unable to get major %d for md\n", MAJOR_NR); - return (-1); - } - devfs_handle = devfs_mk_dir (NULL, "md", NULL); - /* we don't use devfs_register_series because we want to fill md_hd_struct */ - for (minor=0; minor < MAX_MD_DEVS; ++minor) { - char devname[128]; - sprintf (devname, "%u", minor); - md_hd_struct[minor].de = devfs_register (devfs_handle, - devname, DEVFS_FL_DEFAULT, MAJOR_NR, minor, - S_IFBLK | S_IRUSR | S_IWUSR, &md_fops, NULL); - } - - /* all requests on an uninitialised device get failed... 
*/ - blk_queue_make_request(BLK_DEFAULT_QUEUE(MAJOR_NR), md_fail_request); - blk_dev[MAJOR_NR].queue = md_queue_proc; - - - read_ahead[MAJOR_NR] = INT_MAX; - - add_gendisk(&md_gendisk); - - md_recovery_thread = md_register_thread(md_do_recovery, NULL, name); - if (!md_recovery_thread) - printk(KERN_ALERT "md: bug: couldn't allocate md_recovery_thread\n"); - - md_register_reboot_notifier(&md_notifier); - raid_table_header = register_sysctl_table(raid_root_table, 1); - - md_geninit(); - return (0); -} - - -#ifndef MODULE - -/* - * When md (and any require personalities) are compiled into the kernel - * (not a module), arrays can be assembles are boot time using with AUTODETECT - * where specially marked partitions are registered with md_autodetect_dev(), - * and with MD_BOOT where devices to be collected are given on the boot line - * with md=..... - * The code for that is here. - */ - -struct { - int set; - int noautodetect; -} raid_setup_args md__initdata; - -/* - * Searches all registered partitions for autorun RAID arrays - * at boot time. 
- */ -static kdev_t detected_devices[128]; -static int dev_cnt; - -void md_autodetect_dev(kdev_t dev) -{ - if (dev_cnt >= 0 && dev_cnt < 127) - detected_devices[dev_cnt++] = dev; -} - - -static void autostart_arrays(void) -{ - mdk_rdev_t *rdev; - int i; - - printk(KERN_INFO "md: Autodetecting RAID arrays.\n"); - - for (i = 0; i < dev_cnt; i++) { - kdev_t dev = detected_devices[i]; - - if (md_import_device(dev,1)) { - printk(KERN_ALERT "md: could not import %s!\n", - partition_name(dev)); - continue; - } - /* - * Sanity checks: - */ - rdev = find_rdev_all(dev); - if (!rdev) { - MD_BUG(); - continue; - } - if (rdev->faulty) { - MD_BUG(); - continue; - } - md_list_add(&rdev->pending, &pending_raid_disks); - } - dev_cnt = 0; - - autorun_devices(); -} - -static struct { - char device_set [MAX_MD_DEVS]; - int pers[MAX_MD_DEVS]; - int chunk[MAX_MD_DEVS]; - char *device_names[MAX_MD_DEVS]; -} md_setup_args md__initdata; - -/* - * Parse the command-line parameters given our kernel, but do not - * actually try to invoke the MD device now; that is handled by - * md_setup_drive after the low-level disk drivers have initialised. - * - * 27/11/1999: Fixed to work correctly with the 2.3 kernel (which - * assigns the task of parsing integer arguments to the - * invoked program now). Added ability to initialise all - * the MD devices (by specifying multiple "md=" lines) - * instead of just one. -- KTK - * 18May2000: Added support for persistant-superblock arrays: - * md=n,0,factor,fault,device-list uses RAID0 for device n - * md=n,-1,factor,fault,device-list uses LINEAR for device n - * md=n,device-list reads a RAID superblock from the devices - * elements in device-list are read by name_to_kdev_t so can be - * a hex number or something like /dev/hda1 /dev/sdb - * 2001-06-03: Dave Cinege - * Shifted name_to_kdev_t() and related operations to md_set_drive() - * for later execution. Rewrote section to make devfs compatible. 
- */ -static int md__init md_setup(char *str) -{ - int minor, level, factor, fault; - char *pername = ""; - char *str1 = str; - - if (get_option(&str, &minor) != 2) { /* MD Number */ - printk(KERN_WARNING "md: Too few arguments supplied to md=.\n"); - return 0; - } - if (minor >= MAX_MD_DEVS) { - printk(KERN_WARNING "md: md=%d, Minor device number too high.\n", minor); - return 0; - } else if (md_setup_args.device_names[minor]) { - printk(KERN_WARNING "md: md=%d, Specified more than once. " - "Replacing previous definition.\n", minor); - } - switch (get_option(&str, &level)) { /* RAID Personality */ - case 2: /* could be 0 or -1.. */ - if (level == 0 || level == -1) { - if (get_option(&str, &factor) != 2 || /* Chunk Size */ - get_option(&str, &fault) != 2) { - printk(KERN_WARNING "md: Too few arguments supplied to md=.\n"); - return 0; - } - md_setup_args.pers[minor] = level; - md_setup_args.chunk[minor] = 1 << (factor+12); - switch(level) { - case -1: - level = LINEAR; - pername = "linear"; - break; - case 0: - level = RAID0; - pername = "raid0"; - break; - default: - printk(KERN_WARNING - "md: The kernel has not been configured for raid%d support!\n", - level); - return 0; - } - md_setup_args.pers[minor] = level; - break; - } - /* FALL THROUGH */ - case 1: /* the first device is numeric */ - str = str1; - /* FALL THROUGH */ - case 0: - md_setup_args.pers[minor] = 0; - pername="super-block"; - } - - printk(KERN_INFO "md: Will configure md%d (%s) from %s, below.\n", - minor, pername, str); - md_setup_args.device_names[minor] = str; - - return 1; -} - -extern kdev_t name_to_kdev_t(char *line) md__init; -void md__init md_setup_drive(void) -{ - int minor, i; - kdev_t dev; - mddev_t*mddev; - kdev_t devices[MD_SB_DISKS+1]; - - for (minor = 0; minor < MAX_MD_DEVS; minor++) { - int err = 0; - char *devname; - mdu_disk_info_t dinfo; - - if ((devname = md_setup_args.device_names[minor]) == 0) continue; - - for (i = 0; i < MD_SB_DISKS && devname != 0; i++) { - - char *p; - 
void *handle; - - p = strchr(devname, ','); - if (p) - *p++ = 0; - - dev = name_to_kdev_t(devname); - handle = devfs_find_handle(NULL, devname, MAJOR (dev), MINOR (dev), - DEVFS_SPECIAL_BLK, 1); - if (handle != 0) { - unsigned major, minor; - devfs_get_maj_min(handle, &major, &minor); - dev = MKDEV(major, minor); - } - if (dev == 0) { - printk(KERN_WARNING "md: Unknown device name: %s\n", devname); - break; - } - - devices[i] = dev; - md_setup_args.device_set[minor] = 1; - - devname = p; - } - devices[i] = 0; - - if (md_setup_args.device_set[minor] == 0) - continue; - - printk(KERN_INFO "md: Loading md%d: %s\n", minor, md_setup_args.device_names[minor]); - - mddev = mddev_find(minor); - if (!mddev) { - printk(KERN_ERR "md: kmalloc failed - cannot start array %d\n", minor); - continue; - } - if (mddev_lock(mddev)) { - printk(KERN_WARNING - "md: Ignoring md=%d, cannot lock!\n", - minor); - mddev_put(mddev); - continue; - } - - if (mddev->sb || !list_empty(&mddev->disks)) { - printk(KERN_WARNING - "md: Ignoring md=%d, already autodetected. 
(Use raid=noautodetect)\n", - minor); - mddev_unlock(mddev); - mddev_put(mddev); - continue; - } - if (md_setup_args.pers[minor]) { - /* non-persistent */ - mdu_array_info_t ainfo; - ainfo.level = pers_to_level(md_setup_args.pers[minor]); - ainfo.size = 0; - ainfo.nr_disks =0; - ainfo.raid_disks =0; - ainfo.md_minor =minor; - ainfo.not_persistent = 1; - - ainfo.state = (1 << MD_SB_CLEAN); - ainfo.active_disks = 0; - ainfo.working_disks = 0; - ainfo.failed_disks = 0; - ainfo.spare_disks = 0; - ainfo.layout = 0; - ainfo.chunk_size = md_setup_args.chunk[minor]; - err = set_array_info(mddev, &ainfo); - for (i = 0; !err && (dev = devices[i]); i++) { - dinfo.number = i; - dinfo.raid_disk = i; - dinfo.state = (1<<MD_DISK_ACTIVE)|(1<<MD_DISK_SYNC); - dinfo.major = MAJOR(dev); - dinfo.minor = MINOR(dev); - mddev->sb->nr_disks++; - mddev->sb->raid_disks++; - mddev->sb->active_disks++; - mddev->sb->working_disks++; - err = add_new_disk (mddev, &dinfo); - } - } else { - /* persistent */ - for (i = 0; (dev = devices[i]); i++) { - dinfo.major = MAJOR(dev); - dinfo.minor = MINOR(dev); - add_new_disk (mddev, &dinfo); - } - } - if (!err) - err = do_md_run(mddev); - if (err) { - mddev->sb_dirty = 0; - do_md_stop(mddev, 0); - printk(KERN_WARNING "md: starting md%d failed\n", minor); - } - mddev_unlock(mddev); - mddev_put(mddev); - } -} - -static int md__init raid_setup(char *str) -{ - int len, pos; - - len = strlen(str) + 1; - pos = 0; - - while (pos < len) { - char *comma = strchr(str+pos, ','); - int wlen; - if (comma) - wlen = (comma-str)-pos; - else wlen = (len-1)-pos; - - if (strncmp(str, "noautodetect", wlen) == 0) - raid_setup_args.noautodetect = 1; - pos += wlen+1; - } - raid_setup_args.set = 1; - return 1; -} - -int md__init md_run_setup(void) -{ - if (raid_setup_args.noautodetect) - printk(KERN_INFO "md: Skipping autodetection of RAID arrays. 
(raid=noautodetect)\n"); - else - autostart_arrays(); - md_setup_drive(); - return 0; -} - -__setup("raid=", raid_setup); -__setup("md=", md_setup); - -__initcall(md_init); -__initcall(md_run_setup); - -#else /* It is a MODULE */ - -int init_module(void) -{ - return md_init(); -} - -static void free_device_names(void) -{ - while (!list_empty(&device_names)) { - struct dname *tmp = list_entry(device_names.next, - dev_name_t, list); - list_del(&tmp->list); - kfree(tmp); - } -} - - -void cleanup_module(void) -{ - md_unregister_thread(md_recovery_thread); - devfs_unregister(devfs_handle); - - devfs_unregister_blkdev(MAJOR_NR,"md"); - unregister_reboot_notifier(&md_notifier); - unregister_sysctl_table(raid_table_header); -#ifdef CONFIG_PROC_FS - remove_proc_entry("mdstat", NULL); -#endif - - del_gendisk(&md_gendisk); - - blk_dev[MAJOR_NR].queue = NULL; - blksize_size[MAJOR_NR] = NULL; - blk_size[MAJOR_NR] = NULL; - max_readahead[MAJOR_NR] = NULL; - hardsect_size[MAJOR_NR] = NULL; - - free_device_names(); - -} -#endif - -MD_EXPORT_SYMBOL(md_size); -MD_EXPORT_SYMBOL(register_md_personality); -MD_EXPORT_SYMBOL(unregister_md_personality); -MD_EXPORT_SYMBOL(partition_name); -MD_EXPORT_SYMBOL(md_error); -MD_EXPORT_SYMBOL(md_done_sync); -MD_EXPORT_SYMBOL(md_unregister_thread); -MD_EXPORT_SYMBOL(md_update_sb); -MD_EXPORT_SYMBOL(md_wakeup_thread); -MD_EXPORT_SYMBOL(md_print_devices); -MD_EXPORT_SYMBOL(find_rdev_nr); -MD_EXPORT_SYMBOL(md_interrupt_thread); -MODULE_LICENSE("GPL"); ./linux/md-autostart/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.580401268 +0000 @@ -1,354 +0,0 @@ -/* - * Implement the default iomap interfaces - * - * (C) Copyright 2004 Linus Torvalds - */ -#include -#include - -#include - -/* - * Read/write from/to an (offsettable) iomem cookie. It might be a PIO - * access or a MMIO access, these functions don't care. 
The info is - * encoded in the hardware mapping set up by the mapping functions - * (or the cookie itself, depending on implementation and hw). - * - * The generic routines don't assume any hardware mappings, and just - * encode the PIO/MMIO as part of the cookie. They coldly assume that - * the MMIO IO mappings are not in the low address range. - * - * Architectures for which this is not true can't use this generic - * implementation and should do their own copy. - */ - -#ifndef HAVE_ARCH_PIO_SIZE -/* - * We encode the physical PIO addresses (0-0xffff) into the - * pointer by offsetting them with a constant (0x10000) and - * assuming that all the low addresses are always PIO. That means - * we can do some sanity checks on the low bits, and don't - * need to just take things for granted. - */ -#define PIO_OFFSET 0x10000UL -#define PIO_MASK 0x0ffffUL -#define PIO_RESERVED 0x40000UL -#endif - -static void bad_io_access(unsigned long port, const char *access) -{ - static int count = 10; - if (count) { - count--; - WARN(1, KERN_ERR "Bad IO access at port %#lx (%s)\n", port, access); - } -} - -/* - * Ugly macros are a way of life. 
- */ -#define IO_COND(addr, is_pio, is_mmio) do { \ - unsigned long port = (unsigned long __force)addr; \ - if (port >= PIO_RESERVED) { \ - is_mmio; \ - } else if (port > PIO_OFFSET) { \ - port &= PIO_MASK; \ - is_pio; \ - } else \ - bad_io_access(port, #is_pio ); \ -} while (0) - -#ifndef pio_read16be -#define pio_read16be(port) swab16(inw(port)) -#define pio_read32be(port) swab32(inl(port)) -#endif - -#ifndef mmio_read16be -#define mmio_read16be(addr) be16_to_cpu(__raw_readw(addr)) -#define mmio_read32be(addr) be32_to_cpu(__raw_readl(addr)) -#endif - -unsigned int ioread8(void __iomem *addr) -{ - IO_COND(addr, return inb(port), return readb(addr)); - return 0xff; -} -unsigned int ioread16(void __iomem *addr) -{ - IO_COND(addr, return inw(port), return readw(addr)); - return 0xffff; -} -unsigned int ioread16be(void __iomem *addr) -{ - IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr)); - return 0xffff; -} -unsigned int ioread32(void __iomem *addr) -{ - IO_COND(addr, return inl(port), return readl(addr)); - return 0xffffffff; -} -unsigned int ioread32be(void __iomem *addr) -{ - IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr)); - return 0xffffffff; -} -EXPORT_SYMBOL(ioread8); -EXPORT_SYMBOL(ioread16); -EXPORT_SYMBOL(ioread16be); -EXPORT_SYMBOL(ioread32); -EXPORT_SYMBOL(ioread32be); - -#ifndef pio_write16be -#define pio_write16be(val,port) outw(swab16(val),port) -#define pio_write32be(val,port) outl(swab32(val),port) -#endif - -#ifndef mmio_write16be -#define mmio_write16be(val,port) __raw_writew(be16_to_cpu(val),port) -#define mmio_write32be(val,port) __raw_writel(be32_to_cpu(val),port) -#endif - -void iowrite8(u8 val, void __iomem *addr) -{ - IO_COND(addr, outb(val,port), writeb(val, addr)); -} -void iowrite16(u16 val, void __iomem *addr) -{ - IO_COND(addr, outw(val,port), writew(val, addr)); -} -void iowrite16be(u16 val, void __iomem *addr) -{ - IO_COND(addr, pio_write16be(val,port), mmio_write16be(val, addr)); -} -void 
iowrite32(u32 val, void __iomem *addr) -{ - IO_COND(addr, outl(val,port), writel(val, addr)); -} -void iowrite32be(u32 val, void __iomem *addr) -{ - IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr)); -} -EXPORT_SYMBOL(iowrite8); -EXPORT_SYMBOL(iowrite16); -EXPORT_SYMBOL(iowrite16be); -EXPORT_SYMBOL(iowrite32); -EXPORT_SYMBOL(iowrite32be); - -/* - * These are the "repeat MMIO read/write" functions. - * Note the "__raw" accesses, since we don't want to - * convert to CPU byte order. We write in "IO byte - * order" (we also don't have IO barriers). - */ -#ifndef mmio_insb -static inline void mmio_insb(void __iomem *addr, u8 *dst, int count) -{ - while (--count >= 0) { - u8 data = __raw_readb(addr); - *dst = data; - dst++; - } -} -static inline void mmio_insw(void __iomem *addr, u16 *dst, int count) -{ - while (--count >= 0) { - u16 data = __raw_readw(addr); - *dst = data; - dst++; - } -} -static inline void mmio_insl(void __iomem *addr, u32 *dst, int count) -{ - while (--count >= 0) { - u32 data = __raw_readl(addr); - *dst = data; - dst++; - } -} -#endif - -#ifndef mmio_outsb -static inline void mmio_outsb(void __iomem *addr, const u8 *src, int count) -{ - while (--count >= 0) { - __raw_writeb(*src, addr); - src++; - } -} -static inline void mmio_outsw(void __iomem *addr, const u16 *src, int count) -{ - while (--count >= 0) { - __raw_writew(*src, addr); - src++; - } -} -static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count) -{ - while (--count >= 0) { - __raw_writel(*src, addr); - src++; - } -} -#endif - -void ioread8_rep(void __iomem *addr, void *dst, unsigned long count) -{ - IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count)); -} -void ioread16_rep(void __iomem *addr, void *dst, unsigned long count) -{ - IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count)); -} -void ioread32_rep(void __iomem *addr, void *dst, unsigned long count) -{ - IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count)); -} 
-EXPORT_SYMBOL(ioread8_rep); -EXPORT_SYMBOL(ioread16_rep); -EXPORT_SYMBOL(ioread32_rep); - -void iowrite8_rep(void __iomem *addr, const void *src, unsigned long count) -{ - IO_COND(addr, outsb(port, src, count), mmio_outsb(addr, src, count)); -} -void iowrite16_rep(void __iomem *addr, const void *src, unsigned long count) -{ - IO_COND(addr, outsw(port, src, count), mmio_outsw(addr, src, count)); -} -void iowrite32_rep(void __iomem *addr, const void *src, unsigned long count) -{ - IO_COND(addr, outsl(port, src,count), mmio_outsl(addr, src, count)); -} -EXPORT_SYMBOL(iowrite8_rep); -EXPORT_SYMBOL(iowrite16_rep); -EXPORT_SYMBOL(iowrite32_rep); - -#ifdef CONFIG_HAS_IOPORT -/* Create a virtual mapping cookie for an IO port range */ -void __iomem *ioport_map(unsigned long port, unsigned int nr) -{ - if (port > PIO_MASK) - return NULL; - return (void __iomem *) (unsigned long) (port + PIO_OFFSET); -} - -<<<<<<< found -void ioport_unmap(void __iomem *addr) -{ - /* Nothing to do */ -} -||||||| expected -#ifdef CONFIG_PCI -/** - * pci_iomap - create a virtual mapping cookie for a PCI BAR - * @dev: PCI device that owns the BAR - * @bar: BAR number - * @maxlen: length of the memory to map - * - * Using this function you will get a __iomem address to your device BAR. -======= -#ifdef CONFIG_PCI -/** - * pci_iomap_range - create a virtual mapping cookie for a PCI BAR - * @dev: PCI device that owns the BAR - * @bar: BAR number - * @offset: map memory at the given offset in BAR - * @minlen: min length of the memory to map - * @maxlen: max length of the memory to map - * - * Using this function you will get a __iomem address to your device BAR. ->>>>>>> replacement -EXPORT_SYMBOL(ioport_map); -EXPORT_SYMBOL(ioport_unmap); -#endif /* CONFIG_HAS_IOPORT */ - -#ifdef CONFIG_PCI -/* Hide the details if this is a MMIO or PIO address space and just do what -<<<<<<< found - * you expect in the correct way. */||||||| expected - * you expect from them in the correct way. 
- * - * @maxlen specifies the maximum length to map. If you want to get access to - * the complete BAR without checking for its length first, pass %0 here. - * */ -void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen) -{ - resource_size_t start = pci_resource_start(dev, bar); - resource_size_t len = pci_resource_len(dev, bar); - unsigned long flags = pci_resource_flags(dev, bar); - - if (!len || !start) - return NULL; - if (maxlen && len > maxlen) - len = maxlen; - if (flags & IORESOURCE_IO) - return ioport_map(start, len); - if (flags & IORESOURCE_MEM) { - if (flags & IORESOURCE_CACHEABLE) - return ioremap(start, len); - return ioremap_nocache(start, len); - } -======= - * you expect from them in the correct way. - * - * @minlen specifies the minimum length to map. We check that BAR is - * large enough. - * @maxlen specifies the maximum length to map. If you want to get access to - * the complete BAR from offset to the end, pass %0 here. - * @force_nocache makes the mapping noncacheable even if the BAR - * is prefetcheable. It has no effect otherwise. 
- * */ -void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, - unsigned offset, - unsigned long minlen, - unsigned long maxlen, - bool force_nocache) -{ - resource_size_t start = pci_resource_start(dev, bar); - resource_size_t len = pci_resource_len(dev, bar); - unsigned long flags = pci_resource_flags(dev, bar); - - if (len <= offset || !start) - return NULL; - len -= offset; - start += offset; - if (len < minlen) - return NULL; - if (maxlen && len > maxlen) - len = maxlen; - if (flags & IORESOURCE_IO) - return ioport_map(start, len); - if (flags & IORESOURCE_MEM) { - if (!force_nocache && (flags & IORESOURCE_CACHEABLE)) - return ioremap(start, len); - return ioremap_nocache(start, len); - } ->>>>>>> replacement - -/** - * pci_iomap - create a virtual mapping cookie for a PCI BAR - * @dev: PCI device that owns the BAR - * @bar: BAR number - * @maxlen: length of the memory to map - * - * Using this function you will get a __iomem address to your device BAR. - * You can access it using ioread*() and iowrite*(). These functions hide - * the details if this is a MMIO or PIO address space and will just do what - * you expect from them in the correct way. - * - * @maxlen specifies the maximum length to map. If you want to get access to - * the complete BAR without checking for its length first, pass %0 here. 
- * */ -void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen) -{ - return pci_iomap_range(dev, bar, 0, 0, maxlen, false); -} - -void pci_iounmap(struct pci_dev *dev, void __iomem * addr) -{ - IO_COND(addr, /* nothing */, iounmap(addr)); -} -EXPORT_SYMBOL(pci_iomap_range); -EXPORT_SYMBOL(pci_iounmap); -#endif /* CONFIG_PCI */ ./linux/iomap/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- wmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.592084622 +0000 @@ -1,1352 +0,0 @@ -/* - * linux/fs/inode.c - * - * (C) 1997 Linus Torvalds - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * This is needed for the following functions: - * - inode_has_buffers - * - invalidate_inode_buffers - * - fsync_bdev - * - invalidate_bdev - * - * FIXME: remove all knowledge of the buffer layer from this file - */ -#include - -/* - * New inode.c implementation. - * - * This implementation has the basic premise of trying - * to be extremely low-overhead and SMP-safe, yet be - * simple enough to be "obviously correct". - * - * Famous last words. - */ - -/* inode dynamic allocation 1999, Andrea Arcangeli */ - -/* #define INODE_PARANOIA 1 */ -/* #define INODE_DEBUG 1 */ - -/* - * Inode lookup is no longer as critical as it used to be: - * most of the lookups are going to be through the dcache. - */ -#define I_HASHBITS i_hash_shift -#define I_HASHMASK i_hash_mask - -static unsigned int i_hash_mask; -static unsigned int i_hash_shift; - -/* - * Each inode can be on two separate lists. One is - * the hash list of the inode, used for lookups. The - * other linked list is the "type" list: - * "in_use" - valid inode, i_count > 0, i_nlink > 0 - * "dirty" - as "in_use" but also dirty - * "unused" - valid inode, i_count = 0 - * - * A "dirty" list is maintained for each super block, - * allowing for low-overhead inode sync() operations. 
- */ - -LIST_HEAD(inode_in_use); -LIST_HEAD(inode_unused); -static struct hlist_head *inode_hashtable; -static HLIST_HEAD(anon_hash_chain); /* for inodes with NULL i_sb */ - -/* - * A simple spinlock to protect the list manipulations. - * - * NOTE! You also have to own the lock if you change - * the i_state of an inode while it is in use.. - */ -spinlock_t inode_lock = SPIN_LOCK_UNLOCKED; - -/* - * iprune_sem provides exclusion between the kswapd or try_to_free_pages - * icache shrinking path, and the umount path. Without this exclusion, - * by the time prune_icache calls iput for the inode whose pages it has - * been invalidating, or by the time it calls clear_inode & destroy_inode - * from its final dispose_list, the struct super_block they refer to - * (for inode->i_sb->s_op) may already have been freed and reused. - */ -static DECLARE_MUTEX(iprune_sem); - -/* - * Statistics gathering.. - */ -struct inodes_stat_t inodes_stat; - -static kmem_cache_t * inode_cachep; - -static struct inode *alloc_inode(struct super_block *sb) -{ - static struct address_space_operations empty_aops; - static struct inode_operations empty_iops; - static struct file_operations empty_fops; - struct inode *inode; - - if (sb->s_op->alloc_inode) - inode = sb->s_op->alloc_inode(sb); - else - inode = (struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL); - - if (inode) { - struct address_space * const mapping = &inode->i_data; - - inode->i_sb = sb; - inode->i_blkbits = sb->s_blocksize_bits; - inode->i_flags = 0; - atomic_set(&inode->i_count, 1); - inode->i_sock = 0; - inode->i_op = &empty_iops; - inode->i_fop = &empty_fops; - inode->i_nlink = 1; - atomic_set(&inode->i_writecount, 0); - inode->i_size = 0; - inode->i_blocks = 0; - inode->i_bytes = 0; - inode->i_generation = 0; - memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); - inode->i_pipe = NULL; - inode->i_bdev = NULL; - inode->i_rdev = to_kdev_t(0); - inode->i_security = NULL; - if (security_inode_alloc(inode)) { - if 
(inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); - return NULL; - } - - mapping->a_ops = &empty_aops; - mapping->host = inode; - mapping->gfp_mask = GFP_HIGHUSER; - mapping->dirtied_when = 0; - mapping->assoc_mapping = NULL; - mapping->backing_dev_info = &default_backing_dev_info; - if (sb->s_bdev) - mapping->backing_dev_info = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; - memset(&inode->u, 0, sizeof(inode->u)); - inode->i_mapping = mapping; - } - return inode; -} - -void destroy_inode(struct inode *inode) -{ - if (inode_has_buffers(inode)) - BUG(); - security_inode_free(inode); - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); -} - - -/* - * These are initializations that only need to be done - * once, because the fields are idempotent across use - * of the inode, so let the slab aware of that. - */ -void inode_init_once(struct inode *inode) -{ - memset(inode, 0, sizeof(*inode)); - INIT_HLIST_NODE(&inode->i_hash); - INIT_LIST_HEAD(&inode->i_data.clean_pages); - INIT_LIST_HEAD(&inode->i_data.dirty_pages); - INIT_LIST_HEAD(&inode->i_data.locked_pages); - INIT_LIST_HEAD(&inode->i_data.io_pages); - INIT_LIST_HEAD(&inode->i_dentry); - INIT_LIST_HEAD(&inode->i_devices); - sema_init(&inode->i_sem, 1); - INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); - rwlock_init(&inode->i_data.page_lock); - init_MUTEX(&inode->i_data.i_shared_sem); - INIT_LIST_HEAD(&inode->i_data.private_list); - spin_lock_init(&inode->i_data.private_lock); - INIT_LIST_HEAD(&inode->i_data.i_mmap); - INIT_LIST_HEAD(&inode->i_data.i_mmap_shared); - spin_lock_init(&inode->i_lock); -} - -static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags) -{ - struct inode * inode = (struct inode *) foo; - - if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == - SLAB_CTOR_CONSTRUCTOR) - inode_init_once(inode); -} - -/* - * 
inode_lock must be held - */ -void __iget(struct inode * inode) -{ - if (atomic_read(&inode->i_count)) { - atomic_inc(&inode->i_count); - return; - } - atomic_inc(&inode->i_count); - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_in_use); - } - inodes_stat.nr_unused--; -} - -/** - * clear_inode - clear an inode - * @inode: inode to clear - * - * This is called by the filesystem to tell us - * that the inode is no longer useful. We just - * terminate it with extreme prejudice. - */ - -void clear_inode(struct inode *inode) -{ - invalidate_inode_buffers(inode); - - if (inode->i_data.nrpages) - BUG(); - if (!(inode->i_state & I_FREEING)) - BUG(); - if (inode->i_state & I_CLEAR) - BUG(); - wait_on_inode(inode); - DQUOT_DROP(inode); - if (inode->i_sb && inode->i_sb->s_op->clear_inode) - inode->i_sb->s_op->clear_inode(inode); - if (inode->i_bdev) - bd_forget(inode); - inode->i_state = I_CLEAR; -} - -/* - * Dispose-list gets a local list with local inodes in it, so it doesn't - * need to worry about list corruption and SMP locks. - */ -static void dispose_list(struct list_head *head) -{ - int nr_disposed = 0; - - while (!list_empty(head)) { - struct inode *inode; - - inode = list_entry(head->next, struct inode, i_list); - list_del(&inode->i_list); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); - nr_disposed++; - } - spin_lock(&inode_lock); - inodes_stat.nr_inodes -= nr_disposed; - spin_unlock(&inode_lock); -} - -/* - * Invalidate all inodes for a device. 
- */ -static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose) -{ - struct list_head *next; - int busy = 0, count = 0; - - next = head->next; - for (;;) { - struct list_head * tmp = next; - struct inode * inode; - - next = next->next; - if (tmp == head) - break; - inode = list_entry(tmp, struct inode, i_list); - if (inode->i_sb != sb) - continue; - invalidate_inode_buffers(inode); - if (!atomic_read(&inode->i_count)) { - hlist_del_init(&inode->i_hash); - list_del(&inode->i_list); - list_add(&inode->i_list, dispose); - inode->i_state |= I_FREEING; - count++; - continue; - } - busy = 1; - } - /* only unused inodes may be cached with i_count zero */ - inodes_stat.nr_unused -= count; - return busy; -} - -/* - * This is a two-stage process. First we collect all - * offending inodes onto the throw-away list, and in - * the second stage we actually dispose of them. This - * is because we don't want to sleep while messing - * with the global lists.. - */ - -/** - * invalidate_inodes - discard the inodes on a device - * @sb: superblock - * - * Discard all of the inodes for a given superblock. If the discard - * fails because there are busy inodes then a non zero value is returned. - * If the discard is successful all the inodes have been discarded. 
- */ - -int invalidate_inodes(struct super_block * sb) -{ - int busy; - LIST_HEAD(throw_away); - - down(&iprune_sem); - spin_lock(&inode_lock); - busy = invalidate_list(&inode_in_use, sb, &throw_away); - busy |= invalidate_list(&inode_unused, sb, &throw_away); - busy |= invalidate_list(&sb->s_dirty, sb, &throw_away); - busy |= invalidate_list(&sb->s_io, sb, &throw_away); - spin_unlock(&inode_lock); - - dispose_list(&throw_away); - up(&iprune_sem); - - return busy; -} - -int invalidate_device(kdev_t dev, int do_sync) -{ - struct super_block *sb; - struct block_device *bdev = bdget(kdev_t_to_nr(dev)); - int res; - - if (!bdev) - return 0; - - if (do_sync) - fsync_bdev(bdev); - - res = 0; - sb = get_super(bdev); - if (sb) { - /* - * no need to lock the super, get_super holds the - * read semaphore so the filesystem cannot go away - * under us (->put_super runs with the write lock - * hold). - */ - shrink_dcache_sb(sb); - res = invalidate_inodes(sb); - drop_super(sb); - } - invalidate_bdev(bdev, 0); - bdput(bdev); - return res; -} - -static int can_unuse(struct inode *inode) -{ - if (inode->i_state) - return 0; - if (inode_has_buffers(inode)) - return 0; - if (atomic_read(&inode->i_count)) - return 0; - if (inode->i_data.nrpages) - return 0; - return 1; -} - -/* - * Scan `goal' inodes on the unused list for freeable ones. They are moved to - * a temporary list and then are freed outside inode_lock by dispose_list(). - * - * Any inodes which are pinned purely because of attached pagecache have their - * pagecache removed. We expect the final iput() on that inode to add it to - * the front of the inode_unused list. So look for it there and if the - * inode is still freeable, proceed. The right inode is found 99.9% of the - * time in testing on a 4-way. - * - * If the inode has metadata buffers attached to mapping->private_list then - * try to remove them. 
- */ -static void prune_icache(int nr_to_scan) -{ - LIST_HEAD(freeable); - int nr_pruned = 0; - int nr_scanned; - unsigned long reap = 0; - - down(&iprune_sem); - spin_lock(&inode_lock); - for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) { - struct inode *inode; - - if (list_empty(&inode_unused)) - break; - - inode = list_entry(inode_unused.prev, struct inode, i_list); - - if (inode->i_state || atomic_read(&inode->i_count)) { - list_move(&inode->i_list, &inode_unused); - continue; - } - if (inode_has_buffers(inode) || inode->i_data.nrpages) { - __iget(inode); - spin_unlock(&inode_lock); - if (remove_inode_buffers(inode)) - reap += invalidate_inode_pages(&inode->i_data); - iput(inode); - spin_lock(&inode_lock); - - if (inode != list_entry(inode_unused.next, - struct inode, i_list)) - continue; /* wrong inode or list_empty */ - if (!can_unuse(inode)) - continue; - } - hlist_del_init(&inode->i_hash); - list_move(&inode->i_list, &freeable); - inode->i_state |= I_FREEING; - nr_pruned++; - } - inodes_stat.nr_unused -= nr_pruned; - spin_unlock(&inode_lock); - - dispose_list(&freeable); - up(&iprune_sem); - - if (current_is_kswapd) - mod_page_state(kswapd_inodesteal, reap); - else - mod_page_state(pginodesteal, reap); -} - -/* - * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, - * "unused" means that no dentries are referring to the inodes: the files are - * not open and the dcache references to those inodes have already been - * reclaimed. - * - * This function is passed the number of inodes to scan, and it returns the - * total number of remaining possibly-reclaimable inodes. - */ -static int shrink_icache_memory(int nr, unsigned int gfp_mask) -{ - if (nr) { - /* - * Nasty deadlock avoidance. We may hold various FS locks, - * and we don't want to recurse into the FS that called us - * in clear_inode() and friends.. 
- */ - if (gfp_mask & __GFP_FS) - prune_icache(nr); - } - return inodes_stat.nr_unused; -} - -void __wait_on_freeing_inode(struct inode *inode); -/* - * Called with the inode lock held. - * NOTE: we are not increasing the inode-refcount, you must call __iget() - * by hand after calling find_inode now! This simplifies iunique and won't - * add any additional branch in the common code. - */ -static struct inode * find_inode(struct super_block * sb, struct hlist_head *head, int (*test)(struct inode *, void *), void *data) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = hlist_entry(node, struct inode, i_hash); - if (inode->i_sb != sb) - continue; - if (!test(inode, data)) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? inode : NULL; -} - -/* - * find_inode_fast is the fast path version of find_inode, see the comment at - * iget_locked for details. - */ -static struct inode * find_inode_fast(struct super_block * sb, struct hlist_head *head, unsigned long ino) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = list_entry(node, struct inode, i_hash); - if (inode->i_ino != ino) - continue; - if (inode->i_sb != sb) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? inode : NULL; -} - -/** - * new_inode - obtain an inode - * @sb: superblock - * - * Allocates a new inode for given superblock. 
- */ - -struct inode *new_inode(struct super_block *sb) -{ - static unsigned long last_ino; - struct inode * inode; - - spin_lock_prefetch(&inode_lock); - - inode = alloc_inode(sb); - if (inode) { - spin_lock(&inode_lock); - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - inode->i_ino = ++last_ino; - inode->i_state = 0; - spin_unlock(&inode_lock); - } - return inode; -} - -void unlock_new_inode(struct inode *inode) -{ - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). - */ - inode->i_state &= ~(I_LOCK|I_NEW); - wake_up_inode(inode); -} -EXPORT_SYMBOL(unlock_new_inode); - -/* - * This is called without the inode lock held.. Be careful. - * - * We no longer cache the sb_flags in i_flags - see fs.h - * -- rmk@arm.uk.linux.org - */ -static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode(sb, head, test, data); - if (!old) { - if (set(inode, data)) - goto set_failed; - - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. 
- */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; - -set_failed: - spin_unlock(&inode_lock); - destroy_inode(inode); - return NULL; -} - -/* - * get_new_inode_fast is the fast path version of get_new_inode, see the - * comment at iget_locked for details. - */ -static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_head *head, unsigned long ino) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode_fast(sb, head, ino); - if (!old) { - inode->i_ino = ino; - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. - */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; -} - -static inline unsigned long hash(struct super_block *sb, unsigned long hashval) -{ - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES); - tmp = tmp + (tmp >> I_HASHBITS); - return tmp & I_HASHMASK; -} - -/* Yeah, I know about quadratic hash. Maybe, later. */ - -/** - * iunique - get a unique inode number - * @sb: superblock - * @max_reserved: highest reserved inode number - * - * Obtain an inode number that is unique on the system for a given - * superblock. This is used by file systems that have no natural - * permanent inode numbering system. An inode number is returned that - * is higher than the reserved limit but unique. 
- * - * BUGS: - * With a large number of inodes live on the file system this function - * currently becomes quite slow. - */ - -ino_t iunique(struct super_block *sb, ino_t max_reserved) -{ - static ino_t counter = 0; - struct inode *inode; - struct hlist_head * head; - ino_t res; - spin_lock(&inode_lock); -retry: - if (counter > max_reserved) { - head = inode_hashtable + hash(sb,counter); - res = counter++; - inode = find_inode_fast(sb, head, res); - if (!inode) { - spin_unlock(&inode_lock); - return res; - } - } else { - counter = max_reserved + 1; - } - goto retry; - -} - -struct inode *igrab(struct inode *inode) -{ - spin_lock(&inode_lock); - if (!(inode->i_state & I_FREEING)) - __iget(inode); - else - /* - * Handle the case where s_op->clear_inode is not been - * called yet, and somebody is calling igrab - * while the inode is getting freed. - */ - inode = NULL; - spin_unlock(&inode_lock); - return inode; -} - -/** - * ifind - internal function, you want ilookup5() or iget5(). - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ifind() searches for the inode specified by @hashval and @data in the inode - * cache. This is a generalized version of ifind_fast() for file systems where - * the inode number is not sufficient for unique identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -static inline struct inode *ifind(struct super_block *sb, - struct hlist_head *head, int (*test)(struct inode *, void *), - void *data) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode(sb, head, test, data); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ifind_fast - internal function, you want ilookup() or iget(). - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ifind_fast() searches for the inode @ino in the inode cache. This is for - * file systems where the inode number is sufficient for unique identification - * of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -static inline struct inode *ifind_fast(struct super_block *sb, - struct hlist_head *head, unsigned long ino) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode_fast(sb, head, ino); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ilookup5 - search for an inode in the inode cache - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ilookup5() uses ifind() to search for the inode specified by @hashval and - * @data in the inode cache. This is a generalized version of ilookup() for - * file systems where the inode number is not sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -struct inode *ilookup5(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - - return ifind(sb, head, test, data); -} -EXPORT_SYMBOL(ilookup5); - -/** - * ilookup - search for an inode in the inode cache - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache. - * This is for file systems where the inode number is sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -struct inode *ilookup(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - - return ifind_fast(sb, head, ino); -} -EXPORT_SYMBOL(ilookup); - -/** - * iget5_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @hashval: hash value (usually inode number) to get - * @test: callback used for comparisons between inodes - * @set: callback used to initialize a new struct inode - * @data: opaque data pointer to pass to @test and @set - * - * This is iget() without the read_inode() portion of get_new_inode(). - * - * iget5_locked() uses ifind() to search for the inode specified by @hashval - * and @data in the inode cache and if present it is returned with an increased - * reference count. This is a generalized version of iget_locked() for file - * systems where the inode number is not sufficient for unique identification - * of an inode. - * - * If the inode is not in cache, get_new_inode() is called to allocate a new - * inode and this is returned locked, hashed, and with the I_NEW flag set. The - * file system gets to fill it in before unlocking it via unlock_new_inode(). 
- * - * Note both @test and @set are called with the inode_lock held, so can't sleep. - */ -struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), - int (*set)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - struct inode *inode; - - inode = ifind(sb, head, test, data); - if (inode) - return inode; - /* - * get_new_inode() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode(sb, head, test, set, data); -} -EXPORT_SYMBOL(iget5_locked); - -/** - * iget_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @ino: inode number to get - * - * This is iget() without the read_inode() portion of get_new_inode_fast(). - * - * iget_locked() uses ifind_fast() to search for the inode specified by @ino in - * the inode cache and if present it is returned with an increased reference - * count. This is for file systems where the inode number is sufficient for - * unique identification of an inode. - * - * If the inode is not in cache, get_new_inode_fast() is called to allocate a - * new inode and this is returned locked, hashed, and with the I_NEW flag set. - * The file system gets to fill it in before unlocking it via - * unlock_new_inode(). - */ -struct inode *iget_locked(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - struct inode *inode; - - inode = ifind_fast(sb, head, ino); - if (inode) - return inode; - /* - * get_new_inode_fast() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode_fast(sb, head, ino); -} -EXPORT_SYMBOL(iget_locked); - -/** - * __insert_inode_hash - hash an inode - * @inode: unhashed inode - * @hashval: unsigned long value used to locate this object in the - * inode_hashtable. 
- * - * Add an inode to the inode hash for this superblock. If the inode - * has no superblock it is added to a separate anonymous chain. - */ - -void __insert_inode_hash(struct inode *inode, unsigned long hashval) -{ - struct hlist_head *head = &anon_hash_chain; - if (inode->i_sb) - head = inode_hashtable + hash(inode->i_sb, hashval); - spin_lock(&inode_lock); - hlist_add_head(&inode->i_hash, head); - spin_unlock(&inode_lock); -} - -/** - * remove_inode_hash - remove an inode from the hash - * @inode: inode to unhash - * - * Remove an inode from the superblock or anonymous hash. - */ - -void remove_inode_hash(struct inode *inode) -{ - spin_lock(&inode_lock); - hlist_del_init(&inode->i_hash); - spin_unlock(&inode_lock); -} - -void generic_delete_inode(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - -<<<---hlist_del_init|||list_del_init===--->>> list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - - security_inode_delete(inode); - - if (op->delete_inode) { - void (*delete)(struct inode *) = op->delete_inode; - if (!is_bad_inode(inode)) - DQUOT_INIT(inode); - /* s_op->delete_inode internally recalls clear_inode() */ - delete(inode); - } else - clear_inode(inode); - spin_lock(&inode_lock); - list_del_init(&inode->i_hash); - spin_unlock(&inode_lock); - wake_up_inode(inode); - if (inode->i_state != I_CLEAR) - BUG(); - destroy_inode(inode); -} -EXPORT_SYMBOL(generic_delete_inode); - -static void generic_forget_inode(struct inode *inode) -{ - struct super_block *sb = inode->i_sb; - - if (!hlist_unhashed(&inode->i_hash)) { - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_unused); - } - inodes_stat.nr_unused++; - spin_unlock(&inode_lock); - if (!sb || (sb->s_flags & MS_ACTIVE)) - return; - write_inode_now(inode, 1); - spin_lock(&inode_lock); - 
inodes_stat.nr_unused--; - hlist_del_init(&inode->i_hash); - } - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); -} - -/* - * Normal UNIX filesystem behaviour: delete the - * inode when the usage count drops to zero, and - * i_nlink is zero. - */ -static void generic_drop_inode(struct inode *inode) -{ - if (!inode->i_nlink) - generic_delete_inode(inode); - else - generic_forget_inode(inode); -} - -/* - * Called when we're dropping the last reference - * to an inode. - * - * Call the FS "drop()" function, defaulting to - * the legacy UNIX filesystem behaviour.. - * - * NOTE! NOTE! NOTE! We're called with the inode lock - * held, and the drop function is supposed to release - * the lock! - */ -static inline void iput_final(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - void (*drop)(struct inode *) = generic_drop_inode; - - if (op && op->drop_inode) - drop = op->drop_inode; - drop(inode); -} - -/** - * iput - put an inode - * @inode: inode to put - * - * Puts an inode, dropping its usage count. If the inode use count hits - * zero the inode is also then freed and may be destroyed. - */ - -void iput(struct inode *inode) -{ - if (inode) { - struct super_operations *op = inode->i_sb->s_op; - - if (inode->i_state == I_CLEAR) - BUG(); - - if (op && op->put_inode) - op->put_inode(inode); - - if (atomic_dec_and_lock(&inode->i_count, &inode_lock)) - iput_final(inode); - } -} - -/** - * bmap - find a block number in a file - * @inode: inode of file - * @block: block to find - * - * Returns the block number on the device holding the inode that - * is the disk block number for the block of the file requested. - * That is, asked for block 4 of inode 1 the function will return the - * disk block relative to the disk start that holds that block of the - * file. 
- */ - -sector_t bmap(struct inode * inode, sector_t block) -{ - sector_t res = 0; - if (inode->i_mapping->a_ops->bmap) - res = inode->i_mapping->a_ops->bmap(inode->i_mapping, block); - return res; -} - -/* - * Return true if the filesystem which backs this inode considers the two - * passed timespecs to be sufficiently different to warrant flushing the - * altered time out to disk. - */ -static int inode_times_differ(struct inode *inode, - struct timespec *old, struct timespec *new) -{ - if (IS_ONE_SECOND(inode)) - return old->tv_sec != new->tv_sec; - return !timespec_equal(old, new); -} - -/** - * update_atime - update the access time - * @inode: inode accessed - * - * Update the accessed time on an inode and mark it for writeback. - * This function automatically handles read only file systems and media, - * as well as the "noatime" flag and inode specific "noatime" markers. - */ - -void update_atime(struct inode *inode) -{ - struct timespec now; - - if (IS_NOATIME(inode)) - return; - if (IS_NODIRATIME(inode) && S_ISDIR(inode->i_mode)) - return; - if (IS_RDONLY(inode)) - return; - - now = current_kernel_time(); - if (inode_times_differ(inode, &inode->i_atime, &now)) { - inode->i_atime = now; - mark_inode_dirty_sync(inode); - } else { - if (!timespec_equal(&inode->i_atime, &now)) - inode->i_atime = now; - } -} - -/** - * inode_update_time - update mtime and ctime time - * @inode: inode accessed - * @ctime_too: update ctime too - * - * Update the mtime time on an inode and mark it for writeback. - * When ctime_too is specified update the ctime too. 
- */ - -void inode_update_time(struct inode *inode, int ctime_too) -{ - struct timespec now = current_kernel_time(); - int sync_it = 0; - - if (inode_times_differ(inode, &inode->i_mtime, &now)) - sync_it = 1; - inode->i_mtime = now; - - if (ctime_too) { - if (inode_times_differ(inode, &inode->i_ctime, &now)) - sync_it = 1; - inode->i_ctime = now; - } - if (sync_it) - mark_inode_dirty_sync(inode); -} -EXPORT_SYMBOL(inode_update_time); - -int inode_needs_sync(struct inode *inode) -{ - if (IS_SYNC(inode)) - return 1; - if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) - return 1; - return 0; -} -EXPORT_SYMBOL(inode_needs_sync); - -/* - * Quota functions that want to walk the inode lists.. - */ -#ifdef CONFIG_QUOTA - -/* Functions back in dquot.c */ -void put_dquot_list(struct list_head *); -int remove_inode_dquot_ref(struct inode *, int, struct list_head *); - -void remove_dquot_ref(struct super_block *sb, int type) -{ - struct inode *inode; - struct list_head *act_head; - LIST_HEAD(tofree_head); - - if (!sb->dq_op) - return; /* nothing to do */ - spin_lock(&inode_lock); /* This lock is for inodes code */ - /* We don't have to lock against quota code - test IS_QUOTAINIT is just for speedup... 
*/ - - list_for_each(act_head, &inode_in_use) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &inode_unused) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_dirty) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_io) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - spin_unlock(&inode_lock); - - put_dquot_list(&tofree_head); -} - -#endif - -/* - * Hashed waitqueues for wait_on_inode(). The table is pretty small - the - * kernel doesn't lock many inodes at the same time. - */ -#define I_WAIT_TABLE_ORDER 3 -static struct i_wait_queue_head { - wait_queue_head_t wqh; -} ____cacheline_aligned_in_smp i_wait_queue_heads[1<<I_WAIT_TABLE_ORDER]; - -/* - * Return the address of the waitqueue_head to be used for this inode - */ -static wait_queue_head_t *i_waitq_head(struct inode *inode) -{ - return &i_wait_queue_heads[hash_ptr(inode, I_WAIT_TABLE_ORDER)].wqh; -} - -void __wait_on_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); -repeat: - set_current_state(TASK_UNINTERRUPTIBLE); - if (inode->i_state & I_LOCK) { - schedule(); - goto repeat; - } - remove_wait_queue(wq, &wait); - __set_current_state(TASK_RUNNING); -} - -void __wait_on_freeing_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); - set_current_state(TASK_UNINTERRUPTIBLE); - spin_unlock(&inode_lock); - schedule(); - remove_wait_queue(wq, &wait); - current->state = TASK_RUNNING; - spin_lock(&inode_lock); -} - - -void wake_up_inode(struct inode *inode) -{ - wait_queue_head_t *wq = i_waitq_head(inode); - - /* - * Prevent speculative execution through spin_unlock(&inode_lock); - */ - smp_mb(); - if (waitqueue_active(wq)) - wake_up_all(wq); -} - -/* - * Initialize the waitqueues and inode hash table.
- */ -void __init inode_init(unsigned long mempages) -{ - struct hlist_head *head; - unsigned long order; - unsigned int nr_hash; - int i; - - for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++) - init_waitqueue_head(&i_wait_queue_heads[i].wqh); - - mempages >>= (14 - PAGE_SHIFT); - mempages *= sizeof(struct list_head); - for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++) - ; - - do { - unsigned long tmp; - - nr_hash = (1UL << order) * PAGE_SIZE / - sizeof(struct hlist_head); - i_hash_mask = (nr_hash - 1); - - tmp = nr_hash; - i_hash_shift = 0; - while ((tmp >>= 1UL) != 0UL) - i_hash_shift++; - - inode_hashtable = (struct hlist_head *) - __get_free_pages(GFP_ATOMIC, order); - } while (inode_hashtable == NULL && --order >= 0); - - printk("Inode-cache hash table entries: %d (order: %ld, %ld bytes)\n", - nr_hash, order, (PAGE_SIZE << order)); - - if (!inode_hashtable) - panic("Failed to allocate inode hash table\n"); - - head = inode_hashtable; - i = nr_hash; - do { - INIT_HLIST_HEAD(head); - head++; - i--; - } while (i); - - /* inode slab cache */ - inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode), - 0, SLAB_HWCACHE_ALIGN, init_once, - NULL); - if (!inode_cachep) - panic("cannot create inode slab cache"); - - set_shrinker(DEFAULT_SEEKS, shrink_icache_memory); -} - -void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev) -{ - inode->i_mode = mode; - if (S_ISCHR(mode)) { - inode->i_fop = &def_chr_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISBLK(mode)) { - inode->i_fop = &def_blk_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISFIFO(mode)) - inode->i_fop = &def_fifo_fops; - else if (S_ISSOCK(mode)) - inode->i_fop = &bad_sock_fops; - else - printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n", - mode); -}
./linux/inode-justrej/wmerge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge 2017-09-28 12:37:04.000000000 +0000
+++ - 2020-03-09 16:05:11.640392475 +0000
@@ -1,1358 +0,0
@@ -/* - * linux/fs/inode.c - * - * (C) 1997 Linus Torvalds - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * This is needed for the following functions: - * - inode_has_buffers - * - invalidate_inode_buffers - * - fsync_bdev - * - invalidate_bdev - * - * FIXME: remove all knowledge of the buffer layer from this file - */ -#include - -/* - * New inode.c implementation. - * - * This implementation has the basic premise of trying - * to be extremely low-overhead and SMP-safe, yet be - * simple enough to be "obviously correct". - * - * Famous last words. - */ - -/* inode dynamic allocation 1999, Andrea Arcangeli */ - -/* #define INODE_PARANOIA 1 */ -/* #define INODE_DEBUG 1 */ - -/* - * Inode lookup is no longer as critical as it used to be: - * most of the lookups are going to be through the dcache. - */ -#define I_HASHBITS i_hash_shift -#define I_HASHMASK i_hash_mask - -static unsigned int i_hash_mask; -static unsigned int i_hash_shift; - -/* - * Each inode can be on two separate lists. One is - * the hash list of the inode, used for lookups. The - * other linked list is the "type" list: - * "in_use" - valid inode, i_count > 0, i_nlink > 0 - * "dirty" - as "in_use" but also dirty - * "unused" - valid inode, i_count = 0 - * - * A "dirty" list is maintained for each super block, - * allowing for low-overhead inode sync() operations. - */ - -LIST_HEAD(inode_in_use); -LIST_HEAD(inode_unused); -static struct hlist_head *inode_hashtable; -static HLIST_HEAD(anon_hash_chain); /* for inodes with NULL i_sb */ - -/* - * A simple spinlock to protect the list manipulations. - * - * NOTE! You also have to own the lock if you change - * the i_state of an inode while it is in use.. - */ -spinlock_t inode_lock = SPIN_LOCK_UNLOCKED; - -/* - * iprune_sem provides exclusion between the kswapd or try_to_free_pages - * icache shrinking path, and the umount path. 
Without this exclusion, - * by the time prune_icache calls iput for the inode whose pages it has - * been invalidating, or by the time it calls clear_inode & destroy_inode - * from its final dispose_list, the struct super_block they refer to - * (for inode->i_sb->s_op) may already have been freed and reused. - */ -static DECLARE_MUTEX(iprune_sem); - -/* - * Statistics gathering.. - */ -struct inodes_stat_t inodes_stat; - -static kmem_cache_t * inode_cachep; - -static struct inode *alloc_inode(struct super_block *sb) -{ - static struct address_space_operations empty_aops; - static struct inode_operations empty_iops; - static struct file_operations empty_fops; - struct inode *inode; - - if (sb->s_op->alloc_inode) - inode = sb->s_op->alloc_inode(sb); - else - inode = (struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL); - - if (inode) { - struct address_space * const mapping = &inode->i_data; - - inode->i_sb = sb; - inode->i_blkbits = sb->s_blocksize_bits; - inode->i_flags = 0; - atomic_set(&inode->i_count, 1); - inode->i_sock = 0; - inode->i_op = &empty_iops; - inode->i_fop = &empty_fops; - inode->i_nlink = 1; - atomic_set(&inode->i_writecount, 0); - inode->i_size = 0; - inode->i_blocks = 0; - inode->i_bytes = 0; - inode->i_generation = 0; - memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); - inode->i_pipe = NULL; - inode->i_bdev = NULL; - inode->i_rdev = to_kdev_t(0); - inode->i_security = NULL; - if (security_inode_alloc(inode)) { - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); - return NULL; - } - - mapping->a_ops = &empty_aops; - mapping->host = inode; - mapping->gfp_mask = GFP_HIGHUSER; - mapping->dirtied_when = 0; - mapping->assoc_mapping = NULL; - mapping->backing_dev_info = &default_backing_dev_info; - if (sb->s_bdev) - mapping->backing_dev_info = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; - memset(&inode->u, 0, sizeof(inode->u)); - inode->i_mapping = 
mapping; - } - return inode; -} - -void destroy_inode(struct inode *inode) -{ - if (inode_has_buffers(inode)) - BUG(); - security_inode_free(inode); - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); -} - - -/* - * These are initializations that only need to be done - * once, because the fields are idempotent across use - * of the inode, so let the slab aware of that. - */ -void inode_init_once(struct inode *inode) -{ - memset(inode, 0, sizeof(*inode)); - INIT_HLIST_NODE(&inode->i_hash); - INIT_LIST_HEAD(&inode->i_data.clean_pages); - INIT_LIST_HEAD(&inode->i_data.dirty_pages); - INIT_LIST_HEAD(&inode->i_data.locked_pages); - INIT_LIST_HEAD(&inode->i_data.io_pages); - INIT_LIST_HEAD(&inode->i_dentry); - INIT_LIST_HEAD(&inode->i_devices); - sema_init(&inode->i_sem, 1); - INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); - rwlock_init(&inode->i_data.page_lock); - init_MUTEX(&inode->i_data.i_shared_sem); - INIT_LIST_HEAD(&inode->i_data.private_list); - spin_lock_init(&inode->i_data.private_lock); - INIT_LIST_HEAD(&inode->i_data.i_mmap); - INIT_LIST_HEAD(&inode->i_data.i_mmap_shared); - spin_lock_init(&inode->i_lock); -} - -static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags) -{ - struct inode * inode = (struct inode *) foo; - - if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == - SLAB_CTOR_CONSTRUCTOR) - inode_init_once(inode); -} - -/* - * inode_lock must be held - */ -void __iget(struct inode * inode) -{ - if (atomic_read(&inode->i_count)) { - atomic_inc(&inode->i_count); - return; - } - atomic_inc(&inode->i_count); - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_in_use); - } - inodes_stat.nr_unused--; -} - -/** - * clear_inode - clear an inode - * @inode: inode to clear - * - * This is called by the filesystem to tell us - * that the inode is no longer useful. 
We just - * terminate it with extreme prejudice. - */ - -void clear_inode(struct inode *inode) -{ - invalidate_inode_buffers(inode); - - if (inode->i_data.nrpages) - BUG(); - if (!(inode->i_state & I_FREEING)) - BUG(); - if (inode->i_state & I_CLEAR) - BUG(); - wait_on_inode(inode); - DQUOT_DROP(inode); - if (inode->i_sb && inode->i_sb->s_op->clear_inode) - inode->i_sb->s_op->clear_inode(inode); - if (inode->i_bdev) - bd_forget(inode); - inode->i_state = I_CLEAR; -} - -/* - * Dispose-list gets a local list with local inodes in it, so it doesn't - * need to worry about list corruption and SMP locks. - */ -static void dispose_list(struct list_head *head) -{ - int nr_disposed = 0; - - while (!list_empty(head)) { - struct inode *inode; - - inode = list_entry(head->next, struct inode, i_list); - list_del(&inode->i_list); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); - nr_disposed++; - } - spin_lock(&inode_lock); - inodes_stat.nr_inodes -= nr_disposed; - spin_unlock(&inode_lock); -} - -/* - * Invalidate all inodes for a device. - */ -static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose) -{ - struct list_head *next; - int busy = 0, count = 0; - - next = head->next; - for (;;) { - struct list_head * tmp = next; - struct inode * inode; - - next = next->next; - if (tmp == head) - break; - inode = list_entry(tmp, struct inode, i_list); - if (inode->i_sb != sb) - continue; - invalidate_inode_buffers(inode); - if (!atomic_read(&inode->i_count)) { - hlist_del_init(&inode->i_hash); - list_del(&inode->i_list); - list_add(&inode->i_list, dispose); - inode->i_state |= I_FREEING; - count++; - continue; - } - busy = 1; - } - /* only unused inodes may be cached with i_count zero */ - inodes_stat.nr_unused -= count; - return busy; -} - -/* - * This is a two-stage process. 
First we collect all - * offending inodes onto the throw-away list, and in - * the second stage we actually dispose of them. This - * is because we don't want to sleep while messing - * with the global lists.. - */ - -/** - * invalidate_inodes - discard the inodes on a device - * @sb: superblock - * - * Discard all of the inodes for a given superblock. If the discard - * fails because there are busy inodes then a non zero value is returned. - * If the discard is successful all the inodes have been discarded. - */ - -int invalidate_inodes(struct super_block * sb) -{ - int busy; - LIST_HEAD(throw_away); - - down(&iprune_sem); - spin_lock(&inode_lock); - busy = invalidate_list(&inode_in_use, sb, &throw_away); - busy |= invalidate_list(&inode_unused, sb, &throw_away); - busy |= invalidate_list(&sb->s_dirty, sb, &throw_away); - busy |= invalidate_list(&sb->s_io, sb, &throw_away); - spin_unlock(&inode_lock); - - dispose_list(&throw_away); - up(&iprune_sem); - - return busy; -} - -int invalidate_device(kdev_t dev, int do_sync) -{ - struct super_block *sb; - struct block_device *bdev = bdget(kdev_t_to_nr(dev)); - int res; - - if (!bdev) - return 0; - - if (do_sync) - fsync_bdev(bdev); - - res = 0; - sb = get_super(bdev); - if (sb) { - /* - * no need to lock the super, get_super holds the - * read semaphore so the filesystem cannot go away - * under us (->put_super runs with the write lock - * hold). - */ - shrink_dcache_sb(sb); - res = invalidate_inodes(sb); - drop_super(sb); - } - invalidate_bdev(bdev, 0); - bdput(bdev); - return res; -} - -static int can_unuse(struct inode *inode) -{ - if (inode->i_state) - return 0; - if (inode_has_buffers(inode)) - return 0; - if (atomic_read(&inode->i_count)) - return 0; - if (inode->i_data.nrpages) - return 0; - return 1; -} - -/* - * Scan `goal' inodes on the unused list for freeable ones. They are moved to - * a temporary list and then are freed outside inode_lock by dispose_list(). 
- * - * Any inodes which are pinned purely because of attached pagecache have their - * pagecache removed. We expect the final iput() on that inode to add it to - * the front of the inode_unused list. So look for it there and if the - * inode is still freeable, proceed. The right inode is found 99.9% of the - * time in testing on a 4-way. - * - * If the inode has metadata buffers attached to mapping->private_list then - * try to remove them. - */ -static void prune_icache(int nr_to_scan) -{ - LIST_HEAD(freeable); - int nr_pruned = 0; - int nr_scanned; - unsigned long reap = 0; - - down(&iprune_sem); - spin_lock(&inode_lock); - for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) { - struct inode *inode; - - if (list_empty(&inode_unused)) - break; - - inode = list_entry(inode_unused.prev, struct inode, i_list); - - if (inode->i_state || atomic_read(&inode->i_count)) { - list_move(&inode->i_list, &inode_unused); - continue; - } - if (inode_has_buffers(inode) || inode->i_data.nrpages) { - __iget(inode); - spin_unlock(&inode_lock); - if (remove_inode_buffers(inode)) - reap += invalidate_inode_pages(&inode->i_data); - iput(inode); - spin_lock(&inode_lock); - - if (inode != list_entry(inode_unused.next, - struct inode, i_list)) - continue; /* wrong inode or list_empty */ - if (!can_unuse(inode)) - continue; - } - hlist_del_init(&inode->i_hash); - list_move(&inode->i_list, &freeable); - inode->i_state |= I_FREEING; - nr_pruned++; - } - inodes_stat.nr_unused -= nr_pruned; - spin_unlock(&inode_lock); - - dispose_list(&freeable); - up(&iprune_sem); - - if (current_is_kswapd) - mod_page_state(kswapd_inodesteal, reap); - else - mod_page_state(pginodesteal, reap); -} - -/* - * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, - * "unused" means that no dentries are referring to the inodes: the files are - * not open and the dcache references to those inodes have already been - * reclaimed. 
- * - * This function is passed the number of inodes to scan, and it returns the - * total number of remaining possibly-reclaimable inodes. - */ -static int shrink_icache_memory(int nr, unsigned int gfp_mask) -{ - if (nr) { - /* - * Nasty deadlock avoidance. We may hold various FS locks, - * and we don't want to recurse into the FS that called us - * in clear_inode() and friends.. - */ - if (gfp_mask & __GFP_FS) - prune_icache(nr); - } - return inodes_stat.nr_unused; -} - -void __wait_on_freeing_inode(struct inode *inode); -/* - * Called with the inode lock held. - * NOTE: we are not increasing the inode-refcount, you must call __iget() - * by hand after calling find_inode now! This simplifies iunique and won't - * add any additional branch in the common code. - */ -static struct inode * find_inode(struct super_block * sb, struct hlist_head *head, int (*test)(struct inode *, void *), void *data) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = hlist_entry(node, struct inode, i_hash); - if (inode->i_sb != sb) - continue; - if (!test(inode, data)) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? inode : NULL; -} - -/* - * find_inode_fast is the fast path version of find_inode, see the comment at - * iget_locked for details. - */ -static struct inode * find_inode_fast(struct super_block * sb, struct hlist_head *head, unsigned long ino) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = list_entry(node, struct inode, i_hash); - if (inode->i_ino != ino) - continue; - if (inode->i_sb != sb) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? 
inode : NULL; -} - -/** - * new_inode - obtain an inode - * @sb: superblock - * - * Allocates a new inode for given superblock. - */ - -struct inode *new_inode(struct super_block *sb) -{ - static unsigned long last_ino; - struct inode * inode; - - spin_lock_prefetch(&inode_lock); - - inode = alloc_inode(sb); - if (inode) { - spin_lock(&inode_lock); - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - inode->i_ino = ++last_ino; - inode->i_state = 0; - spin_unlock(&inode_lock); - } - return inode; -} - -void unlock_new_inode(struct inode *inode) -{ - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). - */ - inode->i_state &= ~(I_LOCK|I_NEW); - wake_up_inode(inode); -} -EXPORT_SYMBOL(unlock_new_inode); - -/* - * This is called without the inode lock held.. Be careful. - * - * We no longer cache the sb_flags in i_flags - see fs.h - * -- rmk@arm.uk.linux.org - */ -static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode(sb, head, test, data); - if (!old) { - if (set(inode, data)) - goto set_failed; - - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. 
- */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; - -set_failed: - spin_unlock(&inode_lock); - destroy_inode(inode); - return NULL; -} - -/* - * get_new_inode_fast is the fast path version of get_new_inode, see the - * comment at iget_locked for details. - */ -static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_head *head, unsigned long ino) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode_fast(sb, head, ino); - if (!old) { - inode->i_ino = ino; - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. - */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; -} - -static inline unsigned long hash(struct super_block *sb, unsigned long hashval) -{ - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES); - tmp = tmp + (tmp >> I_HASHBITS); - return tmp & I_HASHMASK; -} - -/* Yeah, I know about quadratic hash. Maybe, later. */ - -/** - * iunique - get a unique inode number - * @sb: superblock - * @max_reserved: highest reserved inode number - * - * Obtain an inode number that is unique on the system for a given - * superblock. This is used by file systems that have no natural - * permanent inode numbering system. An inode number is returned that - * is higher than the reserved limit but unique. 
- * - * BUGS: - * With a large number of inodes live on the file system this function - * currently becomes quite slow. - */ - -ino_t iunique(struct super_block *sb, ino_t max_reserved) -{ - static ino_t counter = 0; - struct inode *inode; - struct hlist_head * head; - ino_t res; - spin_lock(&inode_lock); -retry: - if (counter > max_reserved) { - head = inode_hashtable + hash(sb,counter); - res = counter++; - inode = find_inode_fast(sb, head, res); - if (!inode) { - spin_unlock(&inode_lock); - return res; - } - } else { - counter = max_reserved + 1; - } - goto retry; - -} - -struct inode *igrab(struct inode *inode) -{ - spin_lock(&inode_lock); - if (!(inode->i_state & I_FREEING)) - __iget(inode); - else - /* - * Handle the case where s_op->clear_inode is not been - * called yet, and somebody is calling igrab - * while the inode is getting freed. - */ - inode = NULL; - spin_unlock(&inode_lock); - return inode; -} - -/** - * ifind - internal function, you want ilookup5() or iget5(). - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ifind() searches for the inode specified by @hashval and @data in the inode - * cache. This is a generalized version of ifind_fast() for file systems where - * the inode number is not sufficient for unique identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -static inline struct inode *ifind(struct super_block *sb, - struct hlist_head *head, int (*test)(struct inode *, void *), - void *data) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode(sb, head, test, data); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ifind_fast - internal function, you want ilookup() or iget(). - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ifind_fast() searches for the inode @ino in the inode cache. This is for - * file systems where the inode number is sufficient for unique identification - * of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -static inline struct inode *ifind_fast(struct super_block *sb, - struct hlist_head *head, unsigned long ino) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode_fast(sb, head, ino); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ilookup5 - search for an inode in the inode cache - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ilookup5() uses ifind() to search for the inode specified by @hashval and - * @data in the inode cache. This is a generalized version of ilookup() for - * file systems where the inode number is not sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -struct inode *ilookup5(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - - return ifind(sb, head, test, data); -} -EXPORT_SYMBOL(ilookup5); - -/** - * ilookup - search for an inode in the inode cache - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache. - * This is for file systems where the inode number is sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -struct inode *ilookup(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - - return ifind_fast(sb, head, ino); -} -EXPORT_SYMBOL(ilookup); - -/** - * iget5_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @hashval: hash value (usually inode number) to get - * @test: callback used for comparisons between inodes - * @set: callback used to initialize a new struct inode - * @data: opaque data pointer to pass to @test and @set - * - * This is iget() without the read_inode() portion of get_new_inode(). - * - * iget5_locked() uses ifind() to search for the inode specified by @hashval - * and @data in the inode cache and if present it is returned with an increased - * reference count. This is a generalized version of iget_locked() for file - * systems where the inode number is not sufficient for unique identification - * of an inode. - * - * If the inode is not in cache, get_new_inode() is called to allocate a new - * inode and this is returned locked, hashed, and with the I_NEW flag set. The - * file system gets to fill it in before unlocking it via unlock_new_inode(). 
- * - * Note both @test and @set are called with the inode_lock held, so can't sleep. - */ -struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), - int (*set)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - struct inode *inode; - - inode = ifind(sb, head, test, data); - if (inode) - return inode; - /* - * get_new_inode() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode(sb, head, test, set, data); -} -EXPORT_SYMBOL(iget5_locked); - -/** - * iget_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @ino: inode number to get - * - * This is iget() without the read_inode() portion of get_new_inode_fast(). - * - * iget_locked() uses ifind_fast() to search for the inode specified by @ino in - * the inode cache and if present it is returned with an increased reference - * count. This is for file systems where the inode number is sufficient for - * unique identification of an inode. - * - * If the inode is not in cache, get_new_inode_fast() is called to allocate a - * new inode and this is returned locked, hashed, and with the I_NEW flag set. - * The file system gets to fill it in before unlocking it via - * unlock_new_inode(). - */ -struct inode *iget_locked(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - struct inode *inode; - - inode = ifind_fast(sb, head, ino); - if (inode) - return inode; - /* - * get_new_inode_fast() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode_fast(sb, head, ino); -} -EXPORT_SYMBOL(iget_locked); - -/** - * __insert_inode_hash - hash an inode - * @inode: unhashed inode - * @hashval: unsigned long value used to locate this object in the - * inode_hashtable. 
- * - * Add an inode to the inode hash for this superblock. If the inode - * has no superblock it is added to a separate anonymous chain. - */ - -void __insert_inode_hash(struct inode *inode, unsigned long hashval) -{ - struct hlist_head *head = &anon_hash_chain; - if (inode->i_sb) - head = inode_hashtable + hash(inode->i_sb, hashval); - spin_lock(&inode_lock); - hlist_add_head(&inode->i_hash, head); - spin_unlock(&inode_lock); -} - -/** - * remove_inode_hash - remove an inode from the hash - * @inode: inode to unhash - * - * Remove an inode from the superblock or anonymous hash. - */ - -void remove_inode_hash(struct inode *inode) -{ - spin_lock(&inode_lock); - hlist_del_init(&inode->i_hash); - spin_unlock(&inode_lock); -} - -void generic_delete_inode(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - -<<<<<<< found - hlist_del_init(&inode->i_hash); -||||||| expected - list_del_init(&inode->i_hash); -======= ->>>>>>> replacement - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - - security_inode_delete(inode); - - if (op->delete_inode) { - void (*delete)(struct inode *) = op->delete_inode; - if (!is_bad_inode(inode)) - DQUOT_INIT(inode); - /* s_op->delete_inode internally recalls clear_inode() */ - delete(inode); - } else - clear_inode(inode); - spin_lock(&inode_lock); - list_del_init(&inode->i_hash); - spin_unlock(&inode_lock); - wake_up_inode(inode); - if (inode->i_state != I_CLEAR) - BUG(); - destroy_inode(inode); -} -EXPORT_SYMBOL(generic_delete_inode); - -static void generic_forget_inode(struct inode *inode) -{ - struct super_block *sb = inode->i_sb; - - if (!hlist_unhashed(&inode->i_hash)) { - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_unused); - } - inodes_stat.nr_unused++; - spin_unlock(&inode_lock); - if (!sb || (sb->s_flags & 
MS_ACTIVE)) - return; - write_inode_now(inode, 1); - spin_lock(&inode_lock); - inodes_stat.nr_unused--; - hlist_del_init(&inode->i_hash); - } - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); -} - -/* - * Normal UNIX filesystem behaviour: delete the - * inode when the usage count drops to zero, and - * i_nlink is zero. - */ -static void generic_drop_inode(struct inode *inode) -{ - if (!inode->i_nlink) - generic_delete_inode(inode); - else - generic_forget_inode(inode); -} - -/* - * Called when we're dropping the last reference - * to an inode. - * - * Call the FS "drop()" function, defaulting to - * the legacy UNIX filesystem behaviour.. - * - * NOTE! NOTE! NOTE! We're called with the inode lock - * held, and the drop function is supposed to release - * the lock! - */ -static inline void iput_final(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - void (*drop)(struct inode *) = generic_drop_inode; - - if (op && op->drop_inode) - drop = op->drop_inode; - drop(inode); -} - -/** - * iput - put an inode - * @inode: inode to put - * - * Puts an inode, dropping its usage count. If the inode use count hits - * zero the inode is also then freed and may be destroyed. - */ - -void iput(struct inode *inode) -{ - if (inode) { - struct super_operations *op = inode->i_sb->s_op; - - if (inode->i_state == I_CLEAR) - BUG(); - - if (op && op->put_inode) - op->put_inode(inode); - - if (atomic_dec_and_lock(&inode->i_count, &inode_lock)) - iput_final(inode); - } -} - -/** - * bmap - find a block number in a file - * @inode: inode of file - * @block: block to find - * - * Returns the block number on the device holding the inode that - * is the disk block number for the block of the file requested. 
- * That is, asked for block 4 of inode 1 the function will return the - * disk block relative to the disk start that holds that block of the - * file. - */ - -sector_t bmap(struct inode * inode, sector_t block) -{ - sector_t res = 0; - if (inode->i_mapping->a_ops->bmap) - res = inode->i_mapping->a_ops->bmap(inode->i_mapping, block); - return res; -} - -/* - * Return true if the filesystem which backs this inode considers the two - * passed timespecs to be sufficiently different to warrant flushing the - * altered time out to disk. - */ -static int inode_times_differ(struct inode *inode, - struct timespec *old, struct timespec *new) -{ - if (IS_ONE_SECOND(inode)) - return old->tv_sec != new->tv_sec; - return !timespec_equal(old, new); -} - -/** - * update_atime - update the access time - * @inode: inode accessed - * - * Update the accessed time on an inode and mark it for writeback. - * This function automatically handles read only file systems and media, - * as well as the "noatime" flag and inode specific "noatime" markers. - */ - -void update_atime(struct inode *inode) -{ - struct timespec now; - - if (IS_NOATIME(inode)) - return; - if (IS_NODIRATIME(inode) && S_ISDIR(inode->i_mode)) - return; - if (IS_RDONLY(inode)) - return; - - now = current_kernel_time(); - if (inode_times_differ(inode, &inode->i_atime, &now)) { - inode->i_atime = now; - mark_inode_dirty_sync(inode); - } else { - if (!timespec_equal(&inode->i_atime, &now)) - inode->i_atime = now; - } -} - -/** - * inode_update_time - update mtime and ctime time - * @inode: inode accessed - * @ctime_too: update ctime too - * - * Update the mtime time on an inode and mark it for writeback. - * When ctime_too is specified update the ctime too. 
- */ - -void inode_update_time(struct inode *inode, int ctime_too) -{ - struct timespec now = current_kernel_time(); - int sync_it = 0; - - if (inode_times_differ(inode, &inode->i_mtime, &now)) - sync_it = 1; - inode->i_mtime = now; - - if (ctime_too) { - if (inode_times_differ(inode, &inode->i_ctime, &now)) - sync_it = 1; - inode->i_ctime = now; - } - if (sync_it) - mark_inode_dirty_sync(inode); -} -EXPORT_SYMBOL(inode_update_time); - -int inode_needs_sync(struct inode *inode) -{ - if (IS_SYNC(inode)) - return 1; - if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) - return 1; - return 0; -} -EXPORT_SYMBOL(inode_needs_sync); - -/* - * Quota functions that want to walk the inode lists.. - */ -#ifdef CONFIG_QUOTA - -/* Functions back in dquot.c */ -void put_dquot_list(struct list_head *); -int remove_inode_dquot_ref(struct inode *, int, struct list_head *); - -void remove_dquot_ref(struct super_block *sb, int type) -{ - struct inode *inode; - struct list_head *act_head; - LIST_HEAD(tofree_head); - - if (!sb->dq_op) - return; /* nothing to do */ - spin_lock(&inode_lock); /* This lock is for inodes code */ - /* We don't have to lock against quota code - test IS_QUOTAINIT is just for speedup... 
*/ - - list_for_each(act_head, &inode_in_use) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &inode_unused) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_dirty) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_io) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - spin_unlock(&inode_lock); - - put_dquot_list(&tofree_head); -} - -#endif - -/* - * Hashed waitqueues for wait_on_inode(). The table is pretty small - the - * kernel doesn't lock many inodes at the same time. - */ -#define I_WAIT_TABLE_ORDER 3 -static struct i_wait_queue_head { - wait_queue_head_t wqh; -} ____cacheline_aligned_in_smp i_wait_queue_heads[1<<I_WAIT_TABLE_ORDER]; - -/* - * Return the address of the waitqueue_head to be used for this inode - */ -static wait_queue_head_t *i_waitq_head(struct inode *inode) -{ - return &i_wait_queue_heads[hash_ptr(inode, I_WAIT_TABLE_ORDER)].wqh; -} - -void __wait_on_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); -repeat: - set_current_state(TASK_UNINTERRUPTIBLE); - if (inode->i_state & I_LOCK) { - schedule(); - goto repeat; - } - remove_wait_queue(wq, &wait); - __set_current_state(TASK_RUNNING); -} - -void __wait_on_freeing_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); - set_current_state(TASK_UNINTERRUPTIBLE); - spin_unlock(&inode_lock); - schedule(); - remove_wait_queue(wq, &wait); - current->state = TASK_RUNNING; - spin_lock(&inode_lock); -} - - -void wake_up_inode(struct inode *inode) -{ - wait_queue_head_t *wq = i_waitq_head(inode); - - /* - * Prevent speculative execution through spin_unlock(&inode_lock); - */ - smp_mb(); - if (waitqueue_active(wq)) - wake_up_all(wq); -} - -/* - * Initialize the waitqueues and inode hash table. 
- */ -void __init inode_init(unsigned long mempages) -{ - struct hlist_head *head; - unsigned long order; - unsigned int nr_hash; - int i; - - for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++) - init_waitqueue_head(&i_wait_queue_heads[i].wqh); - - mempages >>= (14 - PAGE_SHIFT); - mempages *= sizeof(struct list_head); - for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++) - ; - - do { - unsigned long tmp; - - nr_hash = (1UL << order) * PAGE_SIZE / - sizeof(struct hlist_head); - i_hash_mask = (nr_hash - 1); - - tmp = nr_hash; - i_hash_shift = 0; - while ((tmp >>= 1UL) != 0UL) - i_hash_shift++; - - inode_hashtable = (struct hlist_head *) - __get_free_pages(GFP_ATOMIC, order); - } while (inode_hashtable == NULL && --order >= 0); - - printk("Inode-cache hash table entries: %d (order: %ld, %ld bytes)\n", - nr_hash, order, (PAGE_SIZE << order)); - - if (!inode_hashtable) - panic("Failed to allocate inode hash table\n"); - - head = inode_hashtable; - i = nr_hash; - do { - INIT_HLIST_HEAD(head); - head++; - i--; - } while (i); - - /* inode slab cache */ - inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode), - 0, SLAB_HWCACHE_ALIGN, init_once, - NULL); - if (!inode_cachep) - panic("cannot create inode slab cache"); - - set_shrinker(DEFAULT_SEEKS, shrink_icache_memory); -} - -void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev) -{ - inode->i_mode = mode; - if (S_ISCHR(mode)) { - inode->i_fop = &def_chr_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISBLK(mode)) { - inode->i_fop = &def_blk_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISFIFO(mode)) - inode->i_fop = &def_fifo_fops; - else if (S_ISSOCK(mode)) - inode->i_fop = &bad_sock_fops; - else - printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n", - mode); -} ./linux/inode-justrej/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- lmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.690007590 +0000 @@ -1,1358 +0,0 
@@ -/* - * linux/fs/inode.c - * - * (C) 1997 Linus Torvalds - */ - -#include <linux/config.h> -#include <linux/fs.h> -#include <linux/mm.h> -#include <linux/dcache.h> -#include <linux/init.h> -#include <linux/quotaops.h> -#include <linux/slab.h> -#include <linux/writeback.h> -#include <linux/module.h> -#include <linux/backing-dev.h> -#include <linux/wait.h> -#include <linux/hash.h> -#include <linux/swap.h> -#include <linux/security.h> - -/* - * This is needed for the following functions: - * - inode_has_buffers - * - invalidate_inode_buffers - * - fsync_bdev - * - invalidate_bdev - * - * FIXME: remove all knowledge of the buffer layer from this file - */ -#include <linux/buffer_head.h> - -/* - * New inode.c implementation. - * - * This implementation has the basic premise of trying - * to be extremely low-overhead and SMP-safe, yet be - * simple enough to be "obviously correct". - * - * Famous last words. - */ - -/* inode dynamic allocation 1999, Andrea Arcangeli */ - -/* #define INODE_PARANOIA 1 */ -/* #define INODE_DEBUG 1 */ - -/* - * Inode lookup is no longer as critical as it used to be: - * most of the lookups are going to be through the dcache. - */ -#define I_HASHBITS i_hash_shift -#define I_HASHMASK i_hash_mask - -static unsigned int i_hash_mask; -static unsigned int i_hash_shift; - -/* - * Each inode can be on two separate lists. One is - * the hash list of the inode, used for lookups. The - * other linked list is the "type" list: - * "in_use" - valid inode, i_count > 0, i_nlink > 0 - * "dirty" - as "in_use" but also dirty - * "unused" - valid inode, i_count = 0 - * - * A "dirty" list is maintained for each super block, - * allowing for low-overhead inode sync() operations. - */ - -LIST_HEAD(inode_in_use); -LIST_HEAD(inode_unused); -static struct hlist_head *inode_hashtable; -static HLIST_HEAD(anon_hash_chain); /* for inodes with NULL i_sb */ - -/* - * A simple spinlock to protect the list manipulations. - * - * NOTE! You also have to own the lock if you change - * the i_state of an inode while it is in use.. - */ -spinlock_t inode_lock = SPIN_LOCK_UNLOCKED; - -/* - * iprune_sem provides exclusion between the kswapd or try_to_free_pages - * icache shrinking path, and the umount path. 
Without this exclusion, - * by the time prune_icache calls iput for the inode whose pages it has - * been invalidating, or by the time it calls clear_inode & destroy_inode - * from its final dispose_list, the struct super_block they refer to - * (for inode->i_sb->s_op) may already have been freed and reused. - */ -static DECLARE_MUTEX(iprune_sem); - -/* - * Statistics gathering.. - */ -struct inodes_stat_t inodes_stat; - -static kmem_cache_t * inode_cachep; - -static struct inode *alloc_inode(struct super_block *sb) -{ - static struct address_space_operations empty_aops; - static struct inode_operations empty_iops; - static struct file_operations empty_fops; - struct inode *inode; - - if (sb->s_op->alloc_inode) - inode = sb->s_op->alloc_inode(sb); - else - inode = (struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL); - - if (inode) { - struct address_space * const mapping = &inode->i_data; - - inode->i_sb = sb; - inode->i_blkbits = sb->s_blocksize_bits; - inode->i_flags = 0; - atomic_set(&inode->i_count, 1); - inode->i_sock = 0; - inode->i_op = &empty_iops; - inode->i_fop = &empty_fops; - inode->i_nlink = 1; - atomic_set(&inode->i_writecount, 0); - inode->i_size = 0; - inode->i_blocks = 0; - inode->i_bytes = 0; - inode->i_generation = 0; - memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); - inode->i_pipe = NULL; - inode->i_bdev = NULL; - inode->i_rdev = to_kdev_t(0); - inode->i_security = NULL; - if (security_inode_alloc(inode)) { - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); - return NULL; - } - - mapping->a_ops = &empty_aops; - mapping->host = inode; - mapping->gfp_mask = GFP_HIGHUSER; - mapping->dirtied_when = 0; - mapping->assoc_mapping = NULL; - mapping->backing_dev_info = &default_backing_dev_info; - if (sb->s_bdev) - mapping->backing_dev_info = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; - memset(&inode->u, 0, sizeof(inode->u)); - inode->i_mapping = 
mapping; - } - return inode; -} - -void destroy_inode(struct inode *inode) -{ - if (inode_has_buffers(inode)) - BUG(); - security_inode_free(inode); - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); -} - - -/* - * These are initializations that only need to be done - * once, because the fields are idempotent across use - * of the inode, so let the slab aware of that. - */ -void inode_init_once(struct inode *inode) -{ - memset(inode, 0, sizeof(*inode)); - INIT_HLIST_NODE(&inode->i_hash); - INIT_LIST_HEAD(&inode->i_data.clean_pages); - INIT_LIST_HEAD(&inode->i_data.dirty_pages); - INIT_LIST_HEAD(&inode->i_data.locked_pages); - INIT_LIST_HEAD(&inode->i_data.io_pages); - INIT_LIST_HEAD(&inode->i_dentry); - INIT_LIST_HEAD(&inode->i_devices); - sema_init(&inode->i_sem, 1); - INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); - rwlock_init(&inode->i_data.page_lock); - init_MUTEX(&inode->i_data.i_shared_sem); - INIT_LIST_HEAD(&inode->i_data.private_list); - spin_lock_init(&inode->i_data.private_lock); - INIT_LIST_HEAD(&inode->i_data.i_mmap); - INIT_LIST_HEAD(&inode->i_data.i_mmap_shared); - spin_lock_init(&inode->i_lock); -} - -static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags) -{ - struct inode * inode = (struct inode *) foo; - - if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == - SLAB_CTOR_CONSTRUCTOR) - inode_init_once(inode); -} - -/* - * inode_lock must be held - */ -void __iget(struct inode * inode) -{ - if (atomic_read(&inode->i_count)) { - atomic_inc(&inode->i_count); - return; - } - atomic_inc(&inode->i_count); - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_in_use); - } - inodes_stat.nr_unused--; -} - -/** - * clear_inode - clear an inode - * @inode: inode to clear - * - * This is called by the filesystem to tell us - * that the inode is no longer useful. 
We just - * terminate it with extreme prejudice. - */ - -void clear_inode(struct inode *inode) -{ - invalidate_inode_buffers(inode); - - if (inode->i_data.nrpages) - BUG(); - if (!(inode->i_state & I_FREEING)) - BUG(); - if (inode->i_state & I_CLEAR) - BUG(); - wait_on_inode(inode); - DQUOT_DROP(inode); - if (inode->i_sb && inode->i_sb->s_op->clear_inode) - inode->i_sb->s_op->clear_inode(inode); - if (inode->i_bdev) - bd_forget(inode); - inode->i_state = I_CLEAR; -} - -/* - * Dispose-list gets a local list with local inodes in it, so it doesn't - * need to worry about list corruption and SMP locks. - */ -static void dispose_list(struct list_head *head) -{ - int nr_disposed = 0; - - while (!list_empty(head)) { - struct inode *inode; - - inode = list_entry(head->next, struct inode, i_list); - list_del(&inode->i_list); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); - nr_disposed++; - } - spin_lock(&inode_lock); - inodes_stat.nr_inodes -= nr_disposed; - spin_unlock(&inode_lock); -} - -/* - * Invalidate all inodes for a device. - */ -static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose) -{ - struct list_head *next; - int busy = 0, count = 0; - - next = head->next; - for (;;) { - struct list_head * tmp = next; - struct inode * inode; - - next = next->next; - if (tmp == head) - break; - inode = list_entry(tmp, struct inode, i_list); - if (inode->i_sb != sb) - continue; - invalidate_inode_buffers(inode); - if (!atomic_read(&inode->i_count)) { - hlist_del_init(&inode->i_hash); - list_del(&inode->i_list); - list_add(&inode->i_list, dispose); - inode->i_state |= I_FREEING; - count++; - continue; - } - busy = 1; - } - /* only unused inodes may be cached with i_count zero */ - inodes_stat.nr_unused -= count; - return busy; -} - -/* - * This is a two-stage process. 
First we collect all - * offending inodes onto the throw-away list, and in - * the second stage we actually dispose of them. This - * is because we don't want to sleep while messing - * with the global lists.. - */ - -/** - * invalidate_inodes - discard the inodes on a device - * @sb: superblock - * - * Discard all of the inodes for a given superblock. If the discard - * fails because there are busy inodes then a non zero value is returned. - * If the discard is successful all the inodes have been discarded. - */ - -int invalidate_inodes(struct super_block * sb) -{ - int busy; - LIST_HEAD(throw_away); - - down(&iprune_sem); - spin_lock(&inode_lock); - busy = invalidate_list(&inode_in_use, sb, &throw_away); - busy |= invalidate_list(&inode_unused, sb, &throw_away); - busy |= invalidate_list(&sb->s_dirty, sb, &throw_away); - busy |= invalidate_list(&sb->s_io, sb, &throw_away); - spin_unlock(&inode_lock); - - dispose_list(&throw_away); - up(&iprune_sem); - - return busy; -} - -int invalidate_device(kdev_t dev, int do_sync) -{ - struct super_block *sb; - struct block_device *bdev = bdget(kdev_t_to_nr(dev)); - int res; - - if (!bdev) - return 0; - - if (do_sync) - fsync_bdev(bdev); - - res = 0; - sb = get_super(bdev); - if (sb) { - /* - * no need to lock the super, get_super holds the - * read semaphore so the filesystem cannot go away - * under us (->put_super runs with the write lock - * hold). - */ - shrink_dcache_sb(sb); - res = invalidate_inodes(sb); - drop_super(sb); - } - invalidate_bdev(bdev, 0); - bdput(bdev); - return res; -} - -static int can_unuse(struct inode *inode) -{ - if (inode->i_state) - return 0; - if (inode_has_buffers(inode)) - return 0; - if (atomic_read(&inode->i_count)) - return 0; - if (inode->i_data.nrpages) - return 0; - return 1; -} - -/* - * Scan `goal' inodes on the unused list for freeable ones. They are moved to - * a temporary list and then are freed outside inode_lock by dispose_list(). 
- * - * Any inodes which are pinned purely because of attached pagecache have their - * pagecache removed. We expect the final iput() on that inode to add it to - * the front of the inode_unused list. So look for it there and if the - * inode is still freeable, proceed. The right inode is found 99.9% of the - * time in testing on a 4-way. - * - * If the inode has metadata buffers attached to mapping->private_list then - * try to remove them. - */ -static void prune_icache(int nr_to_scan) -{ - LIST_HEAD(freeable); - int nr_pruned = 0; - int nr_scanned; - unsigned long reap = 0; - - down(&iprune_sem); - spin_lock(&inode_lock); - for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) { - struct inode *inode; - - if (list_empty(&inode_unused)) - break; - - inode = list_entry(inode_unused.prev, struct inode, i_list); - - if (inode->i_state || atomic_read(&inode->i_count)) { - list_move(&inode->i_list, &inode_unused); - continue; - } - if (inode_has_buffers(inode) || inode->i_data.nrpages) { - __iget(inode); - spin_unlock(&inode_lock); - if (remove_inode_buffers(inode)) - reap += invalidate_inode_pages(&inode->i_data); - iput(inode); - spin_lock(&inode_lock); - - if (inode != list_entry(inode_unused.next, - struct inode, i_list)) - continue; /* wrong inode or list_empty */ - if (!can_unuse(inode)) - continue; - } - hlist_del_init(&inode->i_hash); - list_move(&inode->i_list, &freeable); - inode->i_state |= I_FREEING; - nr_pruned++; - } - inodes_stat.nr_unused -= nr_pruned; - spin_unlock(&inode_lock); - - dispose_list(&freeable); - up(&iprune_sem); - - if (current_is_kswapd) - mod_page_state(kswapd_inodesteal, reap); - else - mod_page_state(pginodesteal, reap); -} - -/* - * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, - * "unused" means that no dentries are referring to the inodes: the files are - * not open and the dcache references to those inodes have already been - * reclaimed. 
- * - * This function is passed the number of inodes to scan, and it returns the - * total number of remaining possibly-reclaimable inodes. - */ -static int shrink_icache_memory(int nr, unsigned int gfp_mask) -{ - if (nr) { - /* - * Nasty deadlock avoidance. We may hold various FS locks, - * and we don't want to recurse into the FS that called us - * in clear_inode() and friends.. - */ - if (gfp_mask & __GFP_FS) - prune_icache(nr); - } - return inodes_stat.nr_unused; -} - -void __wait_on_freeing_inode(struct inode *inode); -/* - * Called with the inode lock held. - * NOTE: we are not increasing the inode-refcount, you must call __iget() - * by hand after calling find_inode now! This simplifies iunique and won't - * add any additional branch in the common code. - */ -static struct inode * find_inode(struct super_block * sb, struct hlist_head *head, int (*test)(struct inode *, void *), void *data) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = hlist_entry(node, struct inode, i_hash); - if (inode->i_sb != sb) - continue; - if (!test(inode, data)) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? inode : NULL; -} - -/* - * find_inode_fast is the fast path version of find_inode, see the comment at - * iget_locked for details. - */ -static struct inode * find_inode_fast(struct super_block * sb, struct hlist_head *head, unsigned long ino) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = list_entry(node, struct inode, i_hash); - if (inode->i_ino != ino) - continue; - if (inode->i_sb != sb) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? 
inode : NULL; -} - -/** - * new_inode - obtain an inode - * @sb: superblock - * - * Allocates a new inode for given superblock. - */ - -struct inode *new_inode(struct super_block *sb) -{ - static unsigned long last_ino; - struct inode * inode; - - spin_lock_prefetch(&inode_lock); - - inode = alloc_inode(sb); - if (inode) { - spin_lock(&inode_lock); - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - inode->i_ino = ++last_ino; - inode->i_state = 0; - spin_unlock(&inode_lock); - } - return inode; -} - -void unlock_new_inode(struct inode *inode) -{ - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). - */ - inode->i_state &= ~(I_LOCK|I_NEW); - wake_up_inode(inode); -} -EXPORT_SYMBOL(unlock_new_inode); - -/* - * This is called without the inode lock held.. Be careful. - * - * We no longer cache the sb_flags in i_flags - see fs.h - * -- rmk@arm.uk.linux.org - */ -static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode(sb, head, test, data); - if (!old) { - if (set(inode, data)) - goto set_failed; - - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. 
- */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; - -set_failed: - spin_unlock(&inode_lock); - destroy_inode(inode); - return NULL; -} - -/* - * get_new_inode_fast is the fast path version of get_new_inode, see the - * comment at iget_locked for details. - */ -static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_head *head, unsigned long ino) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode_fast(sb, head, ino); - if (!old) { - inode->i_ino = ino; - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. - */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; -} - -static inline unsigned long hash(struct super_block *sb, unsigned long hashval) -{ - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES); - tmp = tmp + (tmp >> I_HASHBITS); - return tmp & I_HASHMASK; -} - -/* Yeah, I know about quadratic hash. Maybe, later. */ - -/** - * iunique - get a unique inode number - * @sb: superblock - * @max_reserved: highest reserved inode number - * - * Obtain an inode number that is unique on the system for a given - * superblock. This is used by file systems that have no natural - * permanent inode numbering system. An inode number is returned that - * is higher than the reserved limit but unique. 
- * - * BUGS: - * With a large number of inodes live on the file system this function - * currently becomes quite slow. - */ - -ino_t iunique(struct super_block *sb, ino_t max_reserved) -{ - static ino_t counter = 0; - struct inode *inode; - struct hlist_head * head; - ino_t res; - spin_lock(&inode_lock); -retry: - if (counter > max_reserved) { - head = inode_hashtable + hash(sb,counter); - res = counter++; - inode = find_inode_fast(sb, head, res); - if (!inode) { - spin_unlock(&inode_lock); - return res; - } - } else { - counter = max_reserved + 1; - } - goto retry; - -} - -struct inode *igrab(struct inode *inode) -{ - spin_lock(&inode_lock); - if (!(inode->i_state & I_FREEING)) - __iget(inode); - else - /* - * Handle the case where s_op->clear_inode is not been - * called yet, and somebody is calling igrab - * while the inode is getting freed. - */ - inode = NULL; - spin_unlock(&inode_lock); - return inode; -} - -/** - * ifind - internal function, you want ilookup5() or iget5(). - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ifind() searches for the inode specified by @hashval and @data in the inode - * cache. This is a generalized version of ifind_fast() for file systems where - * the inode number is not sufficient for unique identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -static inline struct inode *ifind(struct super_block *sb, - struct hlist_head *head, int (*test)(struct inode *, void *), - void *data) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode(sb, head, test, data); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ifind_fast - internal function, you want ilookup() or iget(). - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ifind_fast() searches for the inode @ino in the inode cache. This is for - * file systems where the inode number is sufficient for unique identification - * of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -static inline struct inode *ifind_fast(struct super_block *sb, - struct hlist_head *head, unsigned long ino) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode_fast(sb, head, ino); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ilookup5 - search for an inode in the inode cache - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ilookup5() uses ifind() to search for the inode specified by @hashval and - * @data in the inode cache. This is a generalized version of ilookup() for - * file systems where the inode number is not sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -struct inode *ilookup5(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - - return ifind(sb, head, test, data); -} -EXPORT_SYMBOL(ilookup5); - -/** - * ilookup - search for an inode in the inode cache - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache. - * This is for file systems where the inode number is sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -struct inode *ilookup(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - - return ifind_fast(sb, head, ino); -} -EXPORT_SYMBOL(ilookup); - -/** - * iget5_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @hashval: hash value (usually inode number) to get - * @test: callback used for comparisons between inodes - * @set: callback used to initialize a new struct inode - * @data: opaque data pointer to pass to @test and @set - * - * This is iget() without the read_inode() portion of get_new_inode(). - * - * iget5_locked() uses ifind() to search for the inode specified by @hashval - * and @data in the inode cache and if present it is returned with an increased - * reference count. This is a generalized version of iget_locked() for file - * systems where the inode number is not sufficient for unique identification - * of an inode. - * - * If the inode is not in cache, get_new_inode() is called to allocate a new - * inode and this is returned locked, hashed, and with the I_NEW flag set. The - * file system gets to fill it in before unlocking it via unlock_new_inode(). 
- * - * Note both @test and @set are called with the inode_lock held, so can't sleep. - */ -struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), - int (*set)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - struct inode *inode; - - inode = ifind(sb, head, test, data); - if (inode) - return inode; - /* - * get_new_inode() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode(sb, head, test, set, data); -} -EXPORT_SYMBOL(iget5_locked); - -/** - * iget_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @ino: inode number to get - * - * This is iget() without the read_inode() portion of get_new_inode_fast(). - * - * iget_locked() uses ifind_fast() to search for the inode specified by @ino in - * the inode cache and if present it is returned with an increased reference - * count. This is for file systems where the inode number is sufficient for - * unique identification of an inode. - * - * If the inode is not in cache, get_new_inode_fast() is called to allocate a - * new inode and this is returned locked, hashed, and with the I_NEW flag set. - * The file system gets to fill it in before unlocking it via - * unlock_new_inode(). - */ -struct inode *iget_locked(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - struct inode *inode; - - inode = ifind_fast(sb, head, ino); - if (inode) - return inode; - /* - * get_new_inode_fast() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode_fast(sb, head, ino); -} -EXPORT_SYMBOL(iget_locked); - -/** - * __insert_inode_hash - hash an inode - * @inode: unhashed inode - * @hashval: unsigned long value used to locate this object in the - * inode_hashtable. 
- * - * Add an inode to the inode hash for this superblock. If the inode - * has no superblock it is added to a separate anonymous chain. - */ - -void __insert_inode_hash(struct inode *inode, unsigned long hashval) -{ - struct hlist_head *head = &anon_hash_chain; - if (inode->i_sb) - head = inode_hashtable + hash(inode->i_sb, hashval); - spin_lock(&inode_lock); - hlist_add_head(&inode->i_hash, head); - spin_unlock(&inode_lock); -} - -/** - * remove_inode_hash - remove an inode from the hash - * @inode: inode to unhash - * - * Remove an inode from the superblock or anonymous hash. - */ - -void remove_inode_hash(struct inode *inode) -{ - spin_lock(&inode_lock); - hlist_del_init(&inode->i_hash); - spin_unlock(&inode_lock); -} - -void generic_delete_inode(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - -<<<<<<< found - hlist_del_init(&inode->i_hash); -||||||| expected - list_del_init(&inode->i_hash); -======= ->>>>>>> replacement - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - - security_inode_delete(inode); - - if (op->delete_inode) { - void (*delete)(struct inode *) = op->delete_inode; - if (!is_bad_inode(inode)) - DQUOT_INIT(inode); - /* s_op->delete_inode internally recalls clear_inode() */ - delete(inode); - } else - clear_inode(inode); - spin_lock(&inode_lock); - list_del_init(&inode->i_hash); - spin_unlock(&inode_lock); - wake_up_inode(inode); - if (inode->i_state != I_CLEAR) - BUG(); - destroy_inode(inode); -} -EXPORT_SYMBOL(generic_delete_inode); - -static void generic_forget_inode(struct inode *inode) -{ - struct super_block *sb = inode->i_sb; - - if (!hlist_unhashed(&inode->i_hash)) { - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_unused); - } - inodes_stat.nr_unused++; - spin_unlock(&inode_lock); - if (!sb || (sb->s_flags & 
MS_ACTIVE)) - return; - write_inode_now(inode, 1); - spin_lock(&inode_lock); - inodes_stat.nr_unused--; - hlist_del_init(&inode->i_hash); - } - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); -} - -/* - * Normal UNIX filesystem behaviour: delete the - * inode when the usage count drops to zero, and - * i_nlink is zero. - */ -static void generic_drop_inode(struct inode *inode) -{ - if (!inode->i_nlink) - generic_delete_inode(inode); - else - generic_forget_inode(inode); -} - -/* - * Called when we're dropping the last reference - * to an inode. - * - * Call the FS "drop()" function, defaulting to - * the legacy UNIX filesystem behaviour.. - * - * NOTE! NOTE! NOTE! We're called with the inode lock - * held, and the drop function is supposed to release - * the lock! - */ -static inline void iput_final(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - void (*drop)(struct inode *) = generic_drop_inode; - - if (op && op->drop_inode) - drop = op->drop_inode; - drop(inode); -} - -/** - * iput - put an inode - * @inode: inode to put - * - * Puts an inode, dropping its usage count. If the inode use count hits - * zero the inode is also then freed and may be destroyed. - */ - -void iput(struct inode *inode) -{ - if (inode) { - struct super_operations *op = inode->i_sb->s_op; - - if (inode->i_state == I_CLEAR) - BUG(); - - if (op && op->put_inode) - op->put_inode(inode); - - if (atomic_dec_and_lock(&inode->i_count, &inode_lock)) - iput_final(inode); - } -} - -/** - * bmap - find a block number in a file - * @inode: inode of file - * @block: block to find - * - * Returns the block number on the device holding the inode that - * is the disk block number for the block of the file requested. 
- * That is, asked for block 4 of inode 1 the function will return the - * disk block relative to the disk start that holds that block of the - * file. - */ - -sector_t bmap(struct inode * inode, sector_t block) -{ - sector_t res = 0; - if (inode->i_mapping->a_ops->bmap) - res = inode->i_mapping->a_ops->bmap(inode->i_mapping, block); - return res; -} - -/* - * Return true if the filesystem which backs this inode considers the two - * passed timespecs to be sufficiently different to warrant flushing the - * altered time out to disk. - */ -static int inode_times_differ(struct inode *inode, - struct timespec *old, struct timespec *new) -{ - if (IS_ONE_SECOND(inode)) - return old->tv_sec != new->tv_sec; - return !timespec_equal(old, new); -} - -/** - * update_atime - update the access time - * @inode: inode accessed - * - * Update the accessed time on an inode and mark it for writeback. - * This function automatically handles read only file systems and media, - * as well as the "noatime" flag and inode specific "noatime" markers. - */ - -void update_atime(struct inode *inode) -{ - struct timespec now; - - if (IS_NOATIME(inode)) - return; - if (IS_NODIRATIME(inode) && S_ISDIR(inode->i_mode)) - return; - if (IS_RDONLY(inode)) - return; - - now = current_kernel_time(); - if (inode_times_differ(inode, &inode->i_atime, &now)) { - inode->i_atime = now; - mark_inode_dirty_sync(inode); - } else { - if (!timespec_equal(&inode->i_atime, &now)) - inode->i_atime = now; - } -} - -/** - * inode_update_time - update mtime and ctime time - * @inode: inode accessed - * @ctime_too: update ctime too - * - * Update the mtime time on an inode and mark it for writeback. - * When ctime_too is specified update the ctime too. 
- */ - -void inode_update_time(struct inode *inode, int ctime_too) -{ - struct timespec now = current_kernel_time(); - int sync_it = 0; - - if (inode_times_differ(inode, &inode->i_mtime, &now)) - sync_it = 1; - inode->i_mtime = now; - - if (ctime_too) { - if (inode_times_differ(inode, &inode->i_ctime, &now)) - sync_it = 1; - inode->i_ctime = now; - } - if (sync_it) - mark_inode_dirty_sync(inode); -} -EXPORT_SYMBOL(inode_update_time); - -int inode_needs_sync(struct inode *inode) -{ - if (IS_SYNC(inode)) - return 1; - if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) - return 1; - return 0; -} -EXPORT_SYMBOL(inode_needs_sync); - -/* - * Quota functions that want to walk the inode lists.. - */ -#ifdef CONFIG_QUOTA - -/* Functions back in dquot.c */ -void put_dquot_list(struct list_head *); -int remove_inode_dquot_ref(struct inode *, int, struct list_head *); - -void remove_dquot_ref(struct super_block *sb, int type) -{ - struct inode *inode; - struct list_head *act_head; - LIST_HEAD(tofree_head); - - if (!sb->dq_op) - return; /* nothing to do */ - spin_lock(&inode_lock); /* This lock is for inodes code */ - /* We don't have to lock against quota code - test IS_QUOTAINIT is just for speedup... 
*/ - - list_for_each(act_head, &inode_in_use) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &inode_unused) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_dirty) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_io) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - spin_unlock(&inode_lock); - - put_dquot_list(&tofree_head); -} - -#endif - -/* - * Hashed waitqueues for wait_on_inode(). The table is pretty small - the - * kernel doesn't lock many inodes at the same time. - */ -#define I_WAIT_TABLE_ORDER 3 -static struct i_wait_queue_head { - wait_queue_head_t wqh; -} ____cacheline_aligned_in_smp i_wait_queue_heads[1<<I_WAIT_TABLE_ORDER]; - -static inline wait_queue_head_t *i_waitq_head(struct inode *inode) -{ - return &i_wait_queue_heads[hash_ptr(inode, I_WAIT_TABLE_ORDER)].wqh; -} - -void __wait_on_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); -repeat: - set_current_state(TASK_UNINTERRUPTIBLE); - if (inode->i_state & I_LOCK) { - schedule(); - goto repeat; - } - remove_wait_queue(wq, &wait); - __set_current_state(TASK_RUNNING); -} - -void __wait_on_freeing_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); - set_current_state(TASK_UNINTERRUPTIBLE); - spin_unlock(&inode_lock); - schedule(); - remove_wait_queue(wq, &wait); - current->state = TASK_RUNNING; - spin_lock(&inode_lock); -} - - -void wake_up_inode(struct inode *inode) -{ - wait_queue_head_t *wq = i_waitq_head(inode); - - /* - * Prevent speculative execution through spin_unlock(&inode_lock); - */ - smp_mb(); - if (waitqueue_active(wq)) - wake_up_all(wq); -} - -/* - * Initialize the waitqueues and inode hash table.
- */ -void __init inode_init(unsigned long mempages) -{ - struct hlist_head *head; - unsigned long order; - unsigned int nr_hash; - int i; - - for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++) - init_waitqueue_head(&i_wait_queue_heads[i].wqh); - - mempages >>= (14 - PAGE_SHIFT); - mempages *= sizeof(struct list_head); - for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++) - ; - - do { - unsigned long tmp; - - nr_hash = (1UL << order) * PAGE_SIZE / - sizeof(struct hlist_head); - i_hash_mask = (nr_hash - 1); - - tmp = nr_hash; - i_hash_shift = 0; - while ((tmp >>= 1UL) != 0UL) - i_hash_shift++; - - inode_hashtable = (struct hlist_head *) - __get_free_pages(GFP_ATOMIC, order); - } while (inode_hashtable == NULL && --order >= 0); - - printk("Inode-cache hash table entries: %d (order: %ld, %ld bytes)\n", - nr_hash, order, (PAGE_SIZE << order)); - - if (!inode_hashtable) - panic("Failed to allocate inode hash table\n"); - - head = inode_hashtable; - i = nr_hash; - do { - INIT_HLIST_HEAD(head); - head++; - i--; - } while (i); - - /* inode slab cache */ - inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode), - 0, SLAB_HWCACHE_ALIGN, init_once, - NULL); - if (!inode_cachep) - panic("cannot create inode slab cache"); - - set_shrinker(DEFAULT_SEEKS, shrink_icache_memory); -} - -void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev) -{ - inode->i_mode = mode; - if (S_ISCHR(mode)) { - inode->i_fop = &def_chr_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISBLK(mode)) { - inode->i_fop = &def_blk_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISFIFO(mode)) - inode->i_fop = &def_fifo_fops; - else if (S_ISSOCK(mode)) - inode->i_fop = &bad_sock_fops; - else - printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n", - mode); -} ./linux/inode-justrej/lmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- wmerge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.730825874 +0000 @@ -1,1352 
+0,0 @@ -/* - * linux/fs/inode.c - * - * (C) 1997 Linus Torvalds - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * This is needed for the following functions: - * - inode_has_buffers - * - invalidate_inode_buffers - * - fsync_bdev - * - invalidate_bdev - * - * FIXME: remove all knowledge of the buffer layer from this file - */ -#include - -/* - * New inode.c implementation. - * - * This implementation has the basic premise of trying - * to be extremely low-overhead and SMP-safe, yet be - * simple enough to be "obviously correct". - * - * Famous last words. - */ - -/* inode dynamic allocation 1999, Andrea Arcangeli */ - -/* #define INODE_PARANOIA 1 */ -/* #define INODE_DEBUG 1 */ - -/* - * Inode lookup is no longer as critical as it used to be: - * most of the lookups are going to be through the dcache. - */ -#define I_HASHBITS i_hash_shift -#define I_HASHMASK i_hash_mask - -static unsigned int i_hash_mask; -static unsigned int i_hash_shift; - -/* - * Each inode can be on two separate lists. One is - * the hash list of the inode, used for lookups. The - * other linked list is the "type" list: - * "in_use" - valid inode, i_count > 0, i_nlink > 0 - * "dirty" - as "in_use" but also dirty - * "unused" - valid inode, i_count = 0 - * - * A "dirty" list is maintained for each super block, - * allowing for low-overhead inode sync() operations. - */ - -LIST_HEAD(inode_in_use); -LIST_HEAD(inode_unused); -static struct hlist_head *inode_hashtable; -static HLIST_HEAD(anon_hash_chain); /* for inodes with NULL i_sb */ - -/* - * A simple spinlock to protect the list manipulations. - * - * NOTE! You also have to own the lock if you change - * the i_state of an inode while it is in use.. - */ -spinlock_t inode_lock = SPIN_LOCK_UNLOCKED; - -/* - * iprune_sem provides exclusion between the kswapd or try_to_free_pages - * icache shrinking path, and the umount path. 
Without this exclusion, - * by the time prune_icache calls iput for the inode whose pages it has - * been invalidating, or by the time it calls clear_inode & destroy_inode - * from its final dispose_list, the struct super_block they refer to - * (for inode->i_sb->s_op) may already have been freed and reused. - */ -static DECLARE_MUTEX(iprune_sem); - -/* - * Statistics gathering.. - */ -struct inodes_stat_t inodes_stat; - -static kmem_cache_t * inode_cachep; - -static struct inode *alloc_inode(struct super_block *sb) -{ - static struct address_space_operations empty_aops; - static struct inode_operations empty_iops; - static struct file_operations empty_fops; - struct inode *inode; - - if (sb->s_op->alloc_inode) - inode = sb->s_op->alloc_inode(sb); - else - inode = (struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL); - - if (inode) { - struct address_space * const mapping = &inode->i_data; - - inode->i_sb = sb; - inode->i_blkbits = sb->s_blocksize_bits; - inode->i_flags = 0; - atomic_set(&inode->i_count, 1); - inode->i_sock = 0; - inode->i_op = &empty_iops; - inode->i_fop = &empty_fops; - inode->i_nlink = 1; - atomic_set(&inode->i_writecount, 0); - inode->i_size = 0; - inode->i_blocks = 0; - inode->i_bytes = 0; - inode->i_generation = 0; - memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); - inode->i_pipe = NULL; - inode->i_bdev = NULL; - inode->i_rdev = to_kdev_t(0); - inode->i_security = NULL; - if (security_inode_alloc(inode)) { - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); - return NULL; - } - - mapping->a_ops = &empty_aops; - mapping->host = inode; - mapping->gfp_mask = GFP_HIGHUSER; - mapping->dirtied_when = 0; - mapping->assoc_mapping = NULL; - mapping->backing_dev_info = &default_backing_dev_info; - if (sb->s_bdev) - mapping->backing_dev_info = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; - memset(&inode->u, 0, sizeof(inode->u)); - inode->i_mapping = 
mapping; - } - return inode; -} - -void destroy_inode(struct inode *inode) -{ - if (inode_has_buffers(inode)) - BUG(); - security_inode_free(inode); - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); -} - - -/* - * These are initializations that only need to be done - * once, because the fields are idempotent across use - * of the inode, so let the slab aware of that. - */ -void inode_init_once(struct inode *inode) -{ - memset(inode, 0, sizeof(*inode)); - INIT_HLIST_NODE(&inode->i_hash); - INIT_LIST_HEAD(&inode->i_data.clean_pages); - INIT_LIST_HEAD(&inode->i_data.dirty_pages); - INIT_LIST_HEAD(&inode->i_data.locked_pages); - INIT_LIST_HEAD(&inode->i_data.io_pages); - INIT_LIST_HEAD(&inode->i_dentry); - INIT_LIST_HEAD(&inode->i_devices); - sema_init(&inode->i_sem, 1); - INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); - rwlock_init(&inode->i_data.page_lock); - init_MUTEX(&inode->i_data.i_shared_sem); - INIT_LIST_HEAD(&inode->i_data.private_list); - spin_lock_init(&inode->i_data.private_lock); - INIT_LIST_HEAD(&inode->i_data.i_mmap); - INIT_LIST_HEAD(&inode->i_data.i_mmap_shared); - spin_lock_init(&inode->i_lock); -} - -static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags) -{ - struct inode * inode = (struct inode *) foo; - - if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == - SLAB_CTOR_CONSTRUCTOR) - inode_init_once(inode); -} - -/* - * inode_lock must be held - */ -void __iget(struct inode * inode) -{ - if (atomic_read(&inode->i_count)) { - atomic_inc(&inode->i_count); - return; - } - atomic_inc(&inode->i_count); - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_in_use); - } - inodes_stat.nr_unused--; -} - -/** - * clear_inode - clear an inode - * @inode: inode to clear - * - * This is called by the filesystem to tell us - * that the inode is no longer useful. 
We just - * terminate it with extreme prejudice. - */ - -void clear_inode(struct inode *inode) -{ - invalidate_inode_buffers(inode); - - if (inode->i_data.nrpages) - BUG(); - if (!(inode->i_state & I_FREEING)) - BUG(); - if (inode->i_state & I_CLEAR) - BUG(); - wait_on_inode(inode); - DQUOT_DROP(inode); - if (inode->i_sb && inode->i_sb->s_op->clear_inode) - inode->i_sb->s_op->clear_inode(inode); - if (inode->i_bdev) - bd_forget(inode); - inode->i_state = I_CLEAR; -} - -/* - * Dispose-list gets a local list with local inodes in it, so it doesn't - * need to worry about list corruption and SMP locks. - */ -static void dispose_list(struct list_head *head) -{ - int nr_disposed = 0; - - while (!list_empty(head)) { - struct inode *inode; - - inode = list_entry(head->next, struct inode, i_list); - list_del(&inode->i_list); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); - nr_disposed++; - } - spin_lock(&inode_lock); - inodes_stat.nr_inodes -= nr_disposed; - spin_unlock(&inode_lock); -} - -/* - * Invalidate all inodes for a device. - */ -static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose) -{ - struct list_head *next; - int busy = 0, count = 0; - - next = head->next; - for (;;) { - struct list_head * tmp = next; - struct inode * inode; - - next = next->next; - if (tmp == head) - break; - inode = list_entry(tmp, struct inode, i_list); - if (inode->i_sb != sb) - continue; - invalidate_inode_buffers(inode); - if (!atomic_read(&inode->i_count)) { - hlist_del_init(&inode->i_hash); - list_del(&inode->i_list); - list_add(&inode->i_list, dispose); - inode->i_state |= I_FREEING; - count++; - continue; - } - busy = 1; - } - /* only unused inodes may be cached with i_count zero */ - inodes_stat.nr_unused -= count; - return busy; -} - -/* - * This is a two-stage process. 
First we collect all - * offending inodes onto the throw-away list, and in - * the second stage we actually dispose of them. This - * is because we don't want to sleep while messing - * with the global lists.. - */ - -/** - * invalidate_inodes - discard the inodes on a device - * @sb: superblock - * - * Discard all of the inodes for a given superblock. If the discard - * fails because there are busy inodes then a non zero value is returned. - * If the discard is successful all the inodes have been discarded. - */ - -int invalidate_inodes(struct super_block * sb) -{ - int busy; - LIST_HEAD(throw_away); - - down(&iprune_sem); - spin_lock(&inode_lock); - busy = invalidate_list(&inode_in_use, sb, &throw_away); - busy |= invalidate_list(&inode_unused, sb, &throw_away); - busy |= invalidate_list(&sb->s_dirty, sb, &throw_away); - busy |= invalidate_list(&sb->s_io, sb, &throw_away); - spin_unlock(&inode_lock); - - dispose_list(&throw_away); - up(&iprune_sem); - - return busy; -} - -int invalidate_device(kdev_t dev, int do_sync) -{ - struct super_block *sb; - struct block_device *bdev = bdget(kdev_t_to_nr(dev)); - int res; - - if (!bdev) - return 0; - - if (do_sync) - fsync_bdev(bdev); - - res = 0; - sb = get_super(bdev); - if (sb) { - /* - * no need to lock the super, get_super holds the - * read semaphore so the filesystem cannot go away - * under us (->put_super runs with the write lock - * hold). - */ - shrink_dcache_sb(sb); - res = invalidate_inodes(sb); - drop_super(sb); - } - invalidate_bdev(bdev, 0); - bdput(bdev); - return res; -} - -static int can_unuse(struct inode *inode) -{ - if (inode->i_state) - return 0; - if (inode_has_buffers(inode)) - return 0; - if (atomic_read(&inode->i_count)) - return 0; - if (inode->i_data.nrpages) - return 0; - return 1; -} - -/* - * Scan `goal' inodes on the unused list for freeable ones. They are moved to - * a temporary list and then are freed outside inode_lock by dispose_list(). 
- * - * Any inodes which are pinned purely because of attached pagecache have their - * pagecache removed. We expect the final iput() on that inode to add it to - * the front of the inode_unused list. So look for it there and if the - * inode is still freeable, proceed. The right inode is found 99.9% of the - * time in testing on a 4-way. - * - * If the inode has metadata buffers attached to mapping->private_list then - * try to remove them. - */ -static void prune_icache(int nr_to_scan) -{ - LIST_HEAD(freeable); - int nr_pruned = 0; - int nr_scanned; - unsigned long reap = 0; - - down(&iprune_sem); - spin_lock(&inode_lock); - for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) { - struct inode *inode; - - if (list_empty(&inode_unused)) - break; - - inode = list_entry(inode_unused.prev, struct inode, i_list); - - if (inode->i_state || atomic_read(&inode->i_count)) { - list_move(&inode->i_list, &inode_unused); - continue; - } - if (inode_has_buffers(inode) || inode->i_data.nrpages) { - __iget(inode); - spin_unlock(&inode_lock); - if (remove_inode_buffers(inode)) - reap += invalidate_inode_pages(&inode->i_data); - iput(inode); - spin_lock(&inode_lock); - - if (inode != list_entry(inode_unused.next, - struct inode, i_list)) - continue; /* wrong inode or list_empty */ - if (!can_unuse(inode)) - continue; - } - hlist_del_init(&inode->i_hash); - list_move(&inode->i_list, &freeable); - inode->i_state |= I_FREEING; - nr_pruned++; - } - inodes_stat.nr_unused -= nr_pruned; - spin_unlock(&inode_lock); - - dispose_list(&freeable); - up(&iprune_sem); - - if (current_is_kswapd) - mod_page_state(kswapd_inodesteal, reap); - else - mod_page_state(pginodesteal, reap); -} - -/* - * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, - * "unused" means that no dentries are referring to the inodes: the files are - * not open and the dcache references to those inodes have already been - * reclaimed. 
- * - * This function is passed the number of inodes to scan, and it returns the - * total number of remaining possibly-reclaimable inodes. - */ -static int shrink_icache_memory(int nr, unsigned int gfp_mask) -{ - if (nr) { - /* - * Nasty deadlock avoidance. We may hold various FS locks, - * and we don't want to recurse into the FS that called us - * in clear_inode() and friends.. - */ - if (gfp_mask & __GFP_FS) - prune_icache(nr); - } - return inodes_stat.nr_unused; -} - -void __wait_on_freeing_inode(struct inode *inode); -/* - * Called with the inode lock held. - * NOTE: we are not increasing the inode-refcount, you must call __iget() - * by hand after calling find_inode now! This simplifies iunique and won't - * add any additional branch in the common code. - */ -static struct inode * find_inode(struct super_block * sb, struct hlist_head *head, int (*test)(struct inode *, void *), void *data) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = hlist_entry(node, struct inode, i_hash); - if (inode->i_sb != sb) - continue; - if (!test(inode, data)) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? inode : NULL; -} - -/* - * find_inode_fast is the fast path version of find_inode, see the comment at - * iget_locked for details. - */ -static struct inode * find_inode_fast(struct super_block * sb, struct hlist_head *head, unsigned long ino) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = list_entry(node, struct inode, i_hash); - if (inode->i_ino != ino) - continue; - if (inode->i_sb != sb) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? 
inode : NULL; -} - -/** - * new_inode - obtain an inode - * @sb: superblock - * - * Allocates a new inode for given superblock. - */ - -struct inode *new_inode(struct super_block *sb) -{ - static unsigned long last_ino; - struct inode * inode; - - spin_lock_prefetch(&inode_lock); - - inode = alloc_inode(sb); - if (inode) { - spin_lock(&inode_lock); - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - inode->i_ino = ++last_ino; - inode->i_state = 0; - spin_unlock(&inode_lock); - } - return inode; -} - -void unlock_new_inode(struct inode *inode) -{ - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). - */ - inode->i_state &= ~(I_LOCK|I_NEW); - wake_up_inode(inode); -} -EXPORT_SYMBOL(unlock_new_inode); - -/* - * This is called without the inode lock held.. Be careful. - * - * We no longer cache the sb_flags in i_flags - see fs.h - * -- rmk@arm.uk.linux.org - */ -static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode(sb, head, test, data); - if (!old) { - if (set(inode, data)) - goto set_failed; - - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. 
- */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; - -set_failed: - spin_unlock(&inode_lock); - destroy_inode(inode); - return NULL; -} - -/* - * get_new_inode_fast is the fast path version of get_new_inode, see the - * comment at iget_locked for details. - */ -static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_head *head, unsigned long ino) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode_fast(sb, head, ino); - if (!old) { - inode->i_ino = ino; - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. - */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; -} - -static inline unsigned long hash(struct super_block *sb, unsigned long hashval) -{ - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES); - tmp = tmp + (tmp >> I_HASHBITS); - return tmp & I_HASHMASK; -} - -/* Yeah, I know about quadratic hash. Maybe, later. */ - -/** - * iunique - get a unique inode number - * @sb: superblock - * @max_reserved: highest reserved inode number - * - * Obtain an inode number that is unique on the system for a given - * superblock. This is used by file systems that have no natural - * permanent inode numbering system. An inode number is returned that - * is higher than the reserved limit but unique. 
- * - * BUGS: - * With a large number of inodes live on the file system this function - * currently becomes quite slow. - */ - -ino_t iunique(struct super_block *sb, ino_t max_reserved) -{ - static ino_t counter = 0; - struct inode *inode; - struct hlist_head * head; - ino_t res; - spin_lock(&inode_lock); -retry: - if (counter > max_reserved) { - head = inode_hashtable + hash(sb,counter); - res = counter++; - inode = find_inode_fast(sb, head, res); - if (!inode) { - spin_unlock(&inode_lock); - return res; - } - } else { - counter = max_reserved + 1; - } - goto retry; - -} - -struct inode *igrab(struct inode *inode) -{ - spin_lock(&inode_lock); - if (!(inode->i_state & I_FREEING)) - __iget(inode); - else - /* - * Handle the case where s_op->clear_inode is not been - * called yet, and somebody is calling igrab - * while the inode is getting freed. - */ - inode = NULL; - spin_unlock(&inode_lock); - return inode; -} - -/** - * ifind - internal function, you want ilookup5() or iget5(). - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ifind() searches for the inode specified by @hashval and @data in the inode - * cache. This is a generalized version of ifind_fast() for file systems where - * the inode number is not sufficient for unique identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -static inline struct inode *ifind(struct super_block *sb, - struct hlist_head *head, int (*test)(struct inode *, void *), - void *data) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode(sb, head, test, data); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ifind_fast - internal function, you want ilookup() or iget(). - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ifind_fast() searches for the inode @ino in the inode cache. This is for - * file systems where the inode number is sufficient for unique identification - * of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -static inline struct inode *ifind_fast(struct super_block *sb, - struct hlist_head *head, unsigned long ino) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode_fast(sb, head, ino); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ilookup5 - search for an inode in the inode cache - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ilookup5() uses ifind() to search for the inode specified by @hashval and - * @data in the inode cache. This is a generalized version of ilookup() for - * file systems where the inode number is not sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -struct inode *ilookup5(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - - return ifind(sb, head, test, data); -} -EXPORT_SYMBOL(ilookup5); - -/** - * ilookup - search for an inode in the inode cache - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache. - * This is for file systems where the inode number is sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -struct inode *ilookup(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - - return ifind_fast(sb, head, ino); -} -EXPORT_SYMBOL(ilookup); - -/** - * iget5_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @hashval: hash value (usually inode number) to get - * @test: callback used for comparisons between inodes - * @set: callback used to initialize a new struct inode - * @data: opaque data pointer to pass to @test and @set - * - * This is iget() without the read_inode() portion of get_new_inode(). - * - * iget5_locked() uses ifind() to search for the inode specified by @hashval - * and @data in the inode cache and if present it is returned with an increased - * reference count. This is a generalized version of iget_locked() for file - * systems where the inode number is not sufficient for unique identification - * of an inode. - * - * If the inode is not in cache, get_new_inode() is called to allocate a new - * inode and this is returned locked, hashed, and with the I_NEW flag set. The - * file system gets to fill it in before unlocking it via unlock_new_inode(). 
- * - * Note both @test and @set are called with the inode_lock held, so can't sleep. - */ -struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), - int (*set)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - struct inode *inode; - - inode = ifind(sb, head, test, data); - if (inode) - return inode; - /* - * get_new_inode() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode(sb, head, test, set, data); -} -EXPORT_SYMBOL(iget5_locked); - -/** - * iget_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @ino: inode number to get - * - * This is iget() without the read_inode() portion of get_new_inode_fast(). - * - * iget_locked() uses ifind_fast() to search for the inode specified by @ino in - * the inode cache and if present it is returned with an increased reference - * count. This is for file systems where the inode number is sufficient for - * unique identification of an inode. - * - * If the inode is not in cache, get_new_inode_fast() is called to allocate a - * new inode and this is returned locked, hashed, and with the I_NEW flag set. - * The file system gets to fill it in before unlocking it via - * unlock_new_inode(). - */ -struct inode *iget_locked(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - struct inode *inode; - - inode = ifind_fast(sb, head, ino); - if (inode) - return inode; - /* - * get_new_inode_fast() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode_fast(sb, head, ino); -} -EXPORT_SYMBOL(iget_locked); - -/** - * __insert_inode_hash - hash an inode - * @inode: unhashed inode - * @hashval: unsigned long value used to locate this object in the - * inode_hashtable. 
- * - * Add an inode to the inode hash for this superblock. If the inode - * has no superblock it is added to a separate anonymous chain. - */ - -void __insert_inode_hash(struct inode *inode, unsigned long hashval) -{ - struct hlist_head *head = &anon_hash_chain; - if (inode->i_sb) - head = inode_hashtable + hash(inode->i_sb, hashval); - spin_lock(&inode_lock); - hlist_add_head(&inode->i_hash, head); - spin_unlock(&inode_lock); -} - -/** - * remove_inode_hash - remove an inode from the hash - * @inode: inode to unhash - * - * Remove an inode from the superblock or anonymous hash. - */ - -void remove_inode_hash(struct inode *inode) -{ - spin_lock(&inode_lock); - hlist_del_init(&inode->i_hash); - spin_unlock(&inode_lock); -} - -void generic_delete_inode(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - -<<<---hlist_del_init|||list_del_init===--->>> list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - - security_inode_delete(inode); - - if (op->delete_inode) { - void (*delete)(struct inode *) = op->delete_inode; - if (!is_bad_inode(inode)) - DQUOT_INIT(inode); - /* s_op->delete_inode internally recalls clear_inode() */ - delete(inode); - } else - clear_inode(inode); - spin_lock(&inode_lock); - list_del_init(&inode->i_hash); - spin_unlock(&inode_lock); - wake_up_inode(inode); - if (inode->i_state != I_CLEAR) - BUG(); - destroy_inode(inode); -} -EXPORT_SYMBOL(generic_delete_inode); - -static void generic_forget_inode(struct inode *inode) -{ - struct super_block *sb = inode->i_sb; - - if (!hlist_unhashed(&inode->i_hash)) { - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_unused); - } - inodes_stat.nr_unused++; - spin_unlock(&inode_lock); - if (!sb || (sb->s_flags & MS_ACTIVE)) - return; - write_inode_now(inode, 1); - spin_lock(&inode_lock); - 
inodes_stat.nr_unused--; - hlist_del_init(&inode->i_hash); - } - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); -} - -/* - * Normal UNIX filesystem behaviour: delete the - * inode when the usage count drops to zero, and - * i_nlink is zero. - */ -static void generic_drop_inode(struct inode *inode) -{ - if (!inode->i_nlink) - generic_delete_inode(inode); - else - generic_forget_inode(inode); -} - -/* - * Called when we're dropping the last reference - * to an inode. - * - * Call the FS "drop()" function, defaulting to - * the legacy UNIX filesystem behaviour.. - * - * NOTE! NOTE! NOTE! We're called with the inode lock - * held, and the drop function is supposed to release - * the lock! - */ -static inline void iput_final(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - void (*drop)(struct inode *) = generic_drop_inode; - - if (op && op->drop_inode) - drop = op->drop_inode; - drop(inode); -} - -/** - * iput - put an inode - * @inode: inode to put - * - * Puts an inode, dropping its usage count. If the inode use count hits - * zero the inode is also then freed and may be destroyed. - */ - -void iput(struct inode *inode) -{ - if (inode) { - struct super_operations *op = inode->i_sb->s_op; - - if (inode->i_state == I_CLEAR) - BUG(); - - if (op && op->put_inode) - op->put_inode(inode); - - if (atomic_dec_and_lock(&inode->i_count, &inode_lock)) - iput_final(inode); - } -} - -/** - * bmap - find a block number in a file - * @inode: inode of file - * @block: block to find - * - * Returns the block number on the device holding the inode that - * is the disk block number for the block of the file requested. - * That is, asked for block 4 of inode 1 the function will return the - * disk block relative to the disk start that holds that block of the - * file. 
- */ - -sector_t bmap(struct inode * inode, sector_t block) -{ - sector_t res = 0; - if (inode->i_mapping->a_ops->bmap) - res = inode->i_mapping->a_ops->bmap(inode->i_mapping, block); - return res; -} - -/* - * Return true if the filesystem which backs this inode considers the two - * passed timespecs to be sufficiently different to warrant flushing the - * altered time out to disk. - */ -static int inode_times_differ(struct inode *inode, - struct timespec *old, struct timespec *new) -{ - if (IS_ONE_SECOND(inode)) - return old->tv_sec != new->tv_sec; - return !timespec_equal(old, new); -} - -/** - * update_atime - update the access time - * @inode: inode accessed - * - * Update the accessed time on an inode and mark it for writeback. - * This function automatically handles read only file systems and media, - * as well as the "noatime" flag and inode specific "noatime" markers. - */ - -void update_atime(struct inode *inode) -{ - struct timespec now; - - if (IS_NOATIME(inode)) - return; - if (IS_NODIRATIME(inode) && S_ISDIR(inode->i_mode)) - return; - if (IS_RDONLY(inode)) - return; - - now = current_kernel_time(); - if (inode_times_differ(inode, &inode->i_atime, &now)) { - inode->i_atime = now; - mark_inode_dirty_sync(inode); - } else { - if (!timespec_equal(&inode->i_atime, &now)) - inode->i_atime = now; - } -} - -/** - * inode_update_time - update mtime and ctime time - * @inode: inode accessed - * @ctime_too: update ctime too - * - * Update the mtime time on an inode and mark it for writeback. - * When ctime_too is specified update the ctime too. 
- */ - -void inode_update_time(struct inode *inode, int ctime_too) -{ - struct timespec now = current_kernel_time(); - int sync_it = 0; - - if (inode_times_differ(inode, &inode->i_mtime, &now)) - sync_it = 1; - inode->i_mtime = now; - - if (ctime_too) { - if (inode_times_differ(inode, &inode->i_ctime, &now)) - sync_it = 1; - inode->i_ctime = now; - } - if (sync_it) - mark_inode_dirty_sync(inode); -} -EXPORT_SYMBOL(inode_update_time); - -int inode_needs_sync(struct inode *inode) -{ - if (IS_SYNC(inode)) - return 1; - if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) - return 1; - return 0; -} -EXPORT_SYMBOL(inode_needs_sync); - -/* - * Quota functions that want to walk the inode lists.. - */ -#ifdef CONFIG_QUOTA - -/* Functions back in dquot.c */ -void put_dquot_list(struct list_head *); -int remove_inode_dquot_ref(struct inode *, int, struct list_head *); - -void remove_dquot_ref(struct super_block *sb, int type) -{ - struct inode *inode; - struct list_head *act_head; - LIST_HEAD(tofree_head); - - if (!sb->dq_op) - return; /* nothing to do */ - spin_lock(&inode_lock); /* This lock is for inodes code */ - /* We don't have to lock against quota code - test IS_QUOTAINIT is just for speedup... 
*/ - - list_for_each(act_head, &inode_in_use) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &inode_unused) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_dirty) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_io) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - spin_unlock(&inode_lock); - - put_dquot_list(&tofree_head); -} - -#endif - -/* - * Hashed waitqueues for wait_on_inode(). The table is pretty small - the - * kernel doesn't lock many inodes at the same time. - */ -#define I_WAIT_TABLE_ORDER 3 -static struct i_wait_queue_head { - wait_queue_head_t wqh; -} ____cacheline_aligned_in_smp i_wait_queue_heads[1<<I_WAIT_TABLE_ORDER]; - -/* - * Return the address of the waitqueue_head to be used for this inode - */ -static inline wait_queue_head_t *i_waitq_head(struct inode *inode) -{ - return &i_wait_queue_heads[hash_ptr(inode, I_WAIT_TABLE_ORDER)].wqh; -} - -void __wait_on_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); -repeat: - set_current_state(TASK_UNINTERRUPTIBLE); - if (inode->i_state & I_LOCK) { - schedule(); - goto repeat; - } - remove_wait_queue(wq, &wait); - __set_current_state(TASK_RUNNING); -} - -void __wait_on_freeing_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); - set_current_state(TASK_UNINTERRUPTIBLE); - spin_unlock(&inode_lock); - schedule(); - remove_wait_queue(wq, &wait); - current->state = TASK_RUNNING; - spin_lock(&inode_lock); -} - - -void wake_up_inode(struct inode *inode) -{ - wait_queue_head_t *wq = i_waitq_head(inode); - - /* - * Prevent speculative execution through spin_unlock(&inode_lock); - */ - smp_mb(); - if (waitqueue_active(wq)) - wake_up_all(wq); -} - -/* - * Initialize the waitqueues and inode hash table. 
- */ -void __init inode_init(unsigned long mempages) -{ - struct hlist_head *head; - unsigned long order; - unsigned int nr_hash; - int i; - - for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++) - init_waitqueue_head(&i_wait_queue_heads[i].wqh); - - mempages >>= (14 - PAGE_SHIFT); - mempages *= sizeof(struct list_head); - for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++) - ; - - do { - unsigned long tmp; - - nr_hash = (1UL << order) * PAGE_SIZE / - sizeof(struct hlist_head); - i_hash_mask = (nr_hash - 1); - - tmp = nr_hash; - i_hash_shift = 0; - while ((tmp >>= 1UL) != 0UL) - i_hash_shift++; - - inode_hashtable = (struct hlist_head *) - __get_free_pages(GFP_ATOMIC, order); - } while (inode_hashtable == NULL && --order >= 0); - - printk("Inode-cache hash table entries: %d (order: %ld, %ld bytes)\n", - nr_hash, order, (PAGE_SIZE << order)); - - if (!inode_hashtable) - panic("Failed to allocate inode hash table\n"); - - head = inode_hashtable; - i = nr_hash; - do { - INIT_HLIST_HEAD(head); - head++; - i--; - } while (i); - - /* inode slab cache */ - inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode), - 0, SLAB_HWCACHE_ALIGN, init_once, - NULL); - if (!inode_cachep) - panic("cannot create inode slab cache"); - - set_shrinker(DEFAULT_SEEKS, shrink_icache_memory); -} - -void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev) -{ - inode->i_mode = mode; - if (S_ISCHR(mode)) { - inode->i_fop = &def_chr_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISBLK(mode)) { - inode->i_fop = &def_blk_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISFIFO(mode)) - inode->i_fop = &def_fifo_fops; - else if (S_ISSOCK(mode)) - inode->i_fop = &bad_sock_fops; - else - printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n", - mode); -} ./linux/inode-fullpatch/wmerge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- rediff 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.773119821 +0000 @@ -1,73 
+0,0 @@ -@@ -470,6 +470,7 @@ static int shrink_icache_memory(int nr, - return inodes_stat.nr_inodes; - } - -+void __wait_on_freeing_inode(struct inode *inode); - /* - * Called with the inode lock held. - * NOTE: we are not increasing the inode-refcount, you must call __iget() -@@ -492,6 +493,11 @@ static struct inode * find_inode(struct - continue; - if (!test(inode, data)) - continue; -+ if (inode->i_state & (I_FREEING|I_CLEAR)) { -+ __wait_on_freeing_inode(inode); -+ tmp = head; -+ continue; -+ } - break; - } - return inode; -@@ -517,6 +523,11 @@ static struct inode * find_inode_fast(st - continue; - if (inode->i_sb != sb) - continue; -+ if (inode->i_state & (I_FREEING|I_CLEAR)) { -+ __wait_on_freeing_inode(inode); -+ tmp = head; -+ continue; -+ } - break; - } - return inode; -@@ -949,7 +960,6 @@ void generic_delete_inode(struct inode * - { - struct super_operations *op = inode->i_sb->s_op; - -- list_del_init(&inode->i_hash); - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; -@@ -968,6 +978,10 @@ void generic_delete_inode(struct inode * - delete(inode); - } else - clear_inode(inode); -+ spin_lock(&inode_lock); -+ list_del_init(&inode->i_hash); -+ spin_unlock(&inode_lock); -+ wake_up_inode(inode); - if (inode->i_state != I_CLEAR) - BUG(); - destroy_inode(inode); -@@ -1219,6 +1233,21 @@ repeat: - current->state = TASK_RUNNING; - } - -+void __wait_on_freeing_inode(struct inode *inode) -+{ -+ DECLARE_WAITQUEUE(wait, current); -+ wait_queue_head_t *wq = i_waitq_head(inode); -+ -+ add_wait_queue(wq, &wait); -+ set_current_state(TASK_UNINTERRUPTIBLE); -+ spin_unlock(&inode_lock); -+ schedule(); -+ remove_wait_queue(wq, &wait); -+ current->state = TASK_RUNNING; -+ spin_lock(&inode_lock); -+} -+ -+ - void wake_up_inode(struct inode *inode) - { - wait_queue_head_t *wq = i_waitq_head(inode); ./linux/inode-fullpatch/rediff FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 
2020-03-09 16:05:11.783761021 +0000 @@ -1,1358 +0,0 @@ -/* - * linux/fs/inode.c - * - * (C) 1997 Linus Torvalds - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * This is needed for the following functions: - * - inode_has_buffers - * - invalidate_inode_buffers - * - fsync_bdev - * - invalidate_bdev - * - * FIXME: remove all knowledge of the buffer layer from this file - */ -#include - -/* - * New inode.c implementation. - * - * This implementation has the basic premise of trying - * to be extremely low-overhead and SMP-safe, yet be - * simple enough to be "obviously correct". - * - * Famous last words. - */ - -/* inode dynamic allocation 1999, Andrea Arcangeli */ - -/* #define INODE_PARANOIA 1 */ -/* #define INODE_DEBUG 1 */ - -/* - * Inode lookup is no longer as critical as it used to be: - * most of the lookups are going to be through the dcache. - */ -#define I_HASHBITS i_hash_shift -#define I_HASHMASK i_hash_mask - -static unsigned int i_hash_mask; -static unsigned int i_hash_shift; - -/* - * Each inode can be on two separate lists. One is - * the hash list of the inode, used for lookups. The - * other linked list is the "type" list: - * "in_use" - valid inode, i_count > 0, i_nlink > 0 - * "dirty" - as "in_use" but also dirty - * "unused" - valid inode, i_count = 0 - * - * A "dirty" list is maintained for each super block, - * allowing for low-overhead inode sync() operations. - */ - -LIST_HEAD(inode_in_use); -LIST_HEAD(inode_unused); -static struct hlist_head *inode_hashtable; -static HLIST_HEAD(anon_hash_chain); /* for inodes with NULL i_sb */ - -/* - * A simple spinlock to protect the list manipulations. - * - * NOTE! You also have to own the lock if you change - * the i_state of an inode while it is in use.. 
- */ -spinlock_t inode_lock = SPIN_LOCK_UNLOCKED; - -/* - * iprune_sem provides exclusion between the kswapd or try_to_free_pages - * icache shrinking path, and the umount path. Without this exclusion, - * by the time prune_icache calls iput for the inode whose pages it has - * been invalidating, or by the time it calls clear_inode & destroy_inode - * from its final dispose_list, the struct super_block they refer to - * (for inode->i_sb->s_op) may already have been freed and reused. - */ -static DECLARE_MUTEX(iprune_sem); - -/* - * Statistics gathering.. - */ -struct inodes_stat_t inodes_stat; - -static kmem_cache_t * inode_cachep; - -static struct inode *alloc_inode(struct super_block *sb) -{ - static struct address_space_operations empty_aops; - static struct inode_operations empty_iops; - static struct file_operations empty_fops; - struct inode *inode; - - if (sb->s_op->alloc_inode) - inode = sb->s_op->alloc_inode(sb); - else - inode = (struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL); - - if (inode) { - struct address_space * const mapping = &inode->i_data; - - inode->i_sb = sb; - inode->i_blkbits = sb->s_blocksize_bits; - inode->i_flags = 0; - atomic_set(&inode->i_count, 1); - inode->i_sock = 0; - inode->i_op = &empty_iops; - inode->i_fop = &empty_fops; - inode->i_nlink = 1; - atomic_set(&inode->i_writecount, 0); - inode->i_size = 0; - inode->i_blocks = 0; - inode->i_bytes = 0; - inode->i_generation = 0; - memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); - inode->i_pipe = NULL; - inode->i_bdev = NULL; - inode->i_rdev = to_kdev_t(0); - inode->i_security = NULL; - if (security_inode_alloc(inode)) { - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); - return NULL; - } - - mapping->a_ops = &empty_aops; - mapping->host = inode; - mapping->gfp_mask = GFP_HIGHUSER; - mapping->dirtied_when = 0; - mapping->assoc_mapping = NULL; - mapping->backing_dev_info = 
&default_backing_dev_info; - if (sb->s_bdev) - mapping->backing_dev_info = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; - memset(&inode->u, 0, sizeof(inode->u)); - inode->i_mapping = mapping; - } - return inode; -} - -void destroy_inode(struct inode *inode) -{ - if (inode_has_buffers(inode)) - BUG(); - security_inode_free(inode); - if (inode->i_sb->s_op->destroy_inode) - inode->i_sb->s_op->destroy_inode(inode); - else - kmem_cache_free(inode_cachep, (inode)); -} - - -/* - * These are initializations that only need to be done - * once, because the fields are idempotent across use - * of the inode, so let the slab aware of that. - */ -void inode_init_once(struct inode *inode) -{ - memset(inode, 0, sizeof(*inode)); - INIT_HLIST_NODE(&inode->i_hash); - INIT_LIST_HEAD(&inode->i_data.clean_pages); - INIT_LIST_HEAD(&inode->i_data.dirty_pages); - INIT_LIST_HEAD(&inode->i_data.locked_pages); - INIT_LIST_HEAD(&inode->i_data.io_pages); - INIT_LIST_HEAD(&inode->i_dentry); - INIT_LIST_HEAD(&inode->i_devices); - sema_init(&inode->i_sem, 1); - INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); - rwlock_init(&inode->i_data.page_lock); - init_MUTEX(&inode->i_data.i_shared_sem); - INIT_LIST_HEAD(&inode->i_data.private_list); - spin_lock_init(&inode->i_data.private_lock); - INIT_LIST_HEAD(&inode->i_data.i_mmap); - INIT_LIST_HEAD(&inode->i_data.i_mmap_shared); - spin_lock_init(&inode->i_lock); -} - -static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags) -{ - struct inode * inode = (struct inode *) foo; - - if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == - SLAB_CTOR_CONSTRUCTOR) - inode_init_once(inode); -} - -/* - * inode_lock must be held - */ -void __iget(struct inode * inode) -{ - if (atomic_read(&inode->i_count)) { - atomic_inc(&inode->i_count); - return; - } - atomic_inc(&inode->i_count); - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_in_use); - } - inodes_stat.nr_unused--; -} 
- -/** - * clear_inode - clear an inode - * @inode: inode to clear - * - * This is called by the filesystem to tell us - * that the inode is no longer useful. We just - * terminate it with extreme prejudice. - */ - -void clear_inode(struct inode *inode) -{ - invalidate_inode_buffers(inode); - - if (inode->i_data.nrpages) - BUG(); - if (!(inode->i_state & I_FREEING)) - BUG(); - if (inode->i_state & I_CLEAR) - BUG(); - wait_on_inode(inode); - DQUOT_DROP(inode); - if (inode->i_sb && inode->i_sb->s_op->clear_inode) - inode->i_sb->s_op->clear_inode(inode); - if (inode->i_bdev) - bd_forget(inode); - inode->i_state = I_CLEAR; -} - -/* - * Dispose-list gets a local list with local inodes in it, so it doesn't - * need to worry about list corruption and SMP locks. - */ -static void dispose_list(struct list_head *head) -{ - int nr_disposed = 0; - - while (!list_empty(head)) { - struct inode *inode; - - inode = list_entry(head->next, struct inode, i_list); - list_del(&inode->i_list); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); - nr_disposed++; - } - spin_lock(&inode_lock); - inodes_stat.nr_inodes -= nr_disposed; - spin_unlock(&inode_lock); -} - -/* - * Invalidate all inodes for a device. 
- */ -static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose) -{ - struct list_head *next; - int busy = 0, count = 0; - - next = head->next; - for (;;) { - struct list_head * tmp = next; - struct inode * inode; - - next = next->next; - if (tmp == head) - break; - inode = list_entry(tmp, struct inode, i_list); - if (inode->i_sb != sb) - continue; - invalidate_inode_buffers(inode); - if (!atomic_read(&inode->i_count)) { - hlist_del_init(&inode->i_hash); - list_del(&inode->i_list); - list_add(&inode->i_list, dispose); - inode->i_state |= I_FREEING; - count++; - continue; - } - busy = 1; - } - /* only unused inodes may be cached with i_count zero */ - inodes_stat.nr_unused -= count; - return busy; -} - -/* - * This is a two-stage process. First we collect all - * offending inodes onto the throw-away list, and in - * the second stage we actually dispose of them. This - * is because we don't want to sleep while messing - * with the global lists.. - */ - -/** - * invalidate_inodes - discard the inodes on a device - * @sb: superblock - * - * Discard all of the inodes for a given superblock. If the discard - * fails because there are busy inodes then a non zero value is returned. - * If the discard is successful all the inodes have been discarded. 
- */ - -int invalidate_inodes(struct super_block * sb) -{ - int busy; - LIST_HEAD(throw_away); - - down(&iprune_sem); - spin_lock(&inode_lock); - busy = invalidate_list(&inode_in_use, sb, &throw_away); - busy |= invalidate_list(&inode_unused, sb, &throw_away); - busy |= invalidate_list(&sb->s_dirty, sb, &throw_away); - busy |= invalidate_list(&sb->s_io, sb, &throw_away); - spin_unlock(&inode_lock); - - dispose_list(&throw_away); - up(&iprune_sem); - - return busy; -} - -int invalidate_device(kdev_t dev, int do_sync) -{ - struct super_block *sb; - struct block_device *bdev = bdget(kdev_t_to_nr(dev)); - int res; - - if (!bdev) - return 0; - - if (do_sync) - fsync_bdev(bdev); - - res = 0; - sb = get_super(bdev); - if (sb) { - /* - * no need to lock the super, get_super holds the - * read semaphore so the filesystem cannot go away - * under us (->put_super runs with the write lock - * hold). - */ - shrink_dcache_sb(sb); - res = invalidate_inodes(sb); - drop_super(sb); - } - invalidate_bdev(bdev, 0); - bdput(bdev); - return res; -} - -static int can_unuse(struct inode *inode) -{ - if (inode->i_state) - return 0; - if (inode_has_buffers(inode)) - return 0; - if (atomic_read(&inode->i_count)) - return 0; - if (inode->i_data.nrpages) - return 0; - return 1; -} - -/* - * Scan `goal' inodes on the unused list for freeable ones. They are moved to - * a temporary list and then are freed outside inode_lock by dispose_list(). - * - * Any inodes which are pinned purely because of attached pagecache have their - * pagecache removed. We expect the final iput() on that inode to add it to - * the front of the inode_unused list. So look for it there and if the - * inode is still freeable, proceed. The right inode is found 99.9% of the - * time in testing on a 4-way. - * - * If the inode has metadata buffers attached to mapping->private_list then - * try to remove them. 
- */ -static void prune_icache(int nr_to_scan) -{ - LIST_HEAD(freeable); - int nr_pruned = 0; - int nr_scanned; - unsigned long reap = 0; - - down(&iprune_sem); - spin_lock(&inode_lock); - for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) { - struct inode *inode; - - if (list_empty(&inode_unused)) - break; - - inode = list_entry(inode_unused.prev, struct inode, i_list); - - if (inode->i_state || atomic_read(&inode->i_count)) { - list_move(&inode->i_list, &inode_unused); - continue; - } - if (inode_has_buffers(inode) || inode->i_data.nrpages) { - __iget(inode); - spin_unlock(&inode_lock); - if (remove_inode_buffers(inode)) - reap += invalidate_inode_pages(&inode->i_data); - iput(inode); - spin_lock(&inode_lock); - - if (inode != list_entry(inode_unused.next, - struct inode, i_list)) - continue; /* wrong inode or list_empty */ - if (!can_unuse(inode)) - continue; - } - hlist_del_init(&inode->i_hash); - list_move(&inode->i_list, &freeable); - inode->i_state |= I_FREEING; - nr_pruned++; - } - inodes_stat.nr_unused -= nr_pruned; - spin_unlock(&inode_lock); - - dispose_list(&freeable); - up(&iprune_sem); - - if (current_is_kswapd) - mod_page_state(kswapd_inodesteal, reap); - else - mod_page_state(pginodesteal, reap); -} - -/* - * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, - * "unused" means that no dentries are referring to the inodes: the files are - * not open and the dcache references to those inodes have already been - * reclaimed. - * - * This function is passed the number of inodes to scan, and it returns the - * total number of remaining possibly-reclaimable inodes. - */ -static int shrink_icache_memory(int nr, unsigned int gfp_mask) -{ - if (nr) { - /* - * Nasty deadlock avoidance. We may hold various FS locks, - * and we don't want to recurse into the FS that called us - * in clear_inode() and friends.. 
- */ - if (gfp_mask & __GFP_FS) - prune_icache(nr); - } - return inodes_stat.nr_unused; -} - -void __wait_on_freeing_inode(struct inode *inode); -/* - * Called with the inode lock held. - * NOTE: we are not increasing the inode-refcount, you must call __iget() - * by hand after calling find_inode now! This simplifies iunique and won't - * add any additional branch in the common code. - */ -static struct inode * find_inode(struct super_block * sb, struct hlist_head *head, int (*test)(struct inode *, void *), void *data) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = hlist_entry(node, struct inode, i_hash); - if (inode->i_sb != sb) - continue; - if (!test(inode, data)) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? inode : NULL; -} - -/* - * find_inode_fast is the fast path version of find_inode, see the comment at - * iget_locked for details. - */ -static struct inode * find_inode_fast(struct super_block * sb, struct hlist_head *head, unsigned long ino) -{ - struct hlist_node *node; - struct inode * inode = NULL; - - hlist_for_each (node, head) { - prefetch(node->next); - inode = list_entry(node, struct inode, i_hash); - if (inode->i_ino != ino) - continue; - if (inode->i_sb != sb) - continue; - if (inode->i_state & (I_FREEING|I_CLEAR)) { - __wait_on_freeing_inode(inode); - tmp = head; - continue; - } - break; - } - return node ? inode : NULL; -} - -/** - * new_inode - obtain an inode - * @sb: superblock - * - * Allocates a new inode for given superblock. 
- */ - -struct inode *new_inode(struct super_block *sb) -{ - static unsigned long last_ino; - struct inode * inode; - - spin_lock_prefetch(&inode_lock); - - inode = alloc_inode(sb); - if (inode) { - spin_lock(&inode_lock); - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - inode->i_ino = ++last_ino; - inode->i_state = 0; - spin_unlock(&inode_lock); - } - return inode; -} - -void unlock_new_inode(struct inode *inode) -{ - /* - * This is special! We do not need the spinlock - * when clearing I_LOCK, because we're guaranteed - * that nobody else tries to do anything about the - * state of the inode when it is locked, as we - * just created it (so there can be no old holders - * that haven't tested I_LOCK). - */ - inode->i_state &= ~(I_LOCK|I_NEW); - wake_up_inode(inode); -} -EXPORT_SYMBOL(unlock_new_inode); - -/* - * This is called without the inode lock held.. Be careful. - * - * We no longer cache the sb_flags in i_flags - see fs.h - * -- rmk@arm.uk.linux.org - */ -static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode(sb, head, test, data); - if (!old) { - if (set(inode, data)) - goto set_failed; - - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. 
- */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; - -set_failed: - spin_unlock(&inode_lock); - destroy_inode(inode); - return NULL; -} - -/* - * get_new_inode_fast is the fast path version of get_new_inode, see the - * comment at iget_locked for details. - */ -static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_head *head, unsigned long ino) -{ - struct inode * inode; - - inode = alloc_inode(sb); - if (inode) { - struct inode * old; - - spin_lock(&inode_lock); - /* We released the lock, so.. */ - old = find_inode_fast(sb, head, ino); - if (!old) { - inode->i_ino = ino; - inodes_stat.nr_inodes++; - list_add(&inode->i_list, &inode_in_use); - hlist_add_head(&inode->i_hash, head); - inode->i_state = I_LOCK|I_NEW; - spin_unlock(&inode_lock); - - /* Return the locked inode with I_NEW set, the - * caller is responsible for filling in the contents - */ - return inode; - } - - /* - * Uhhuh, somebody else created the same inode under - * us. Use the old inode instead of the one we just - * allocated. - */ - __iget(old); - spin_unlock(&inode_lock); - destroy_inode(inode); - inode = old; - wait_on_inode(inode); - } - return inode; -} - -static inline unsigned long hash(struct super_block *sb, unsigned long hashval) -{ - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES); - tmp = tmp + (tmp >> I_HASHBITS); - return tmp & I_HASHMASK; -} - -/* Yeah, I know about quadratic hash. Maybe, later. */ - -/** - * iunique - get a unique inode number - * @sb: superblock - * @max_reserved: highest reserved inode number - * - * Obtain an inode number that is unique on the system for a given - * superblock. This is used by file systems that have no natural - * permanent inode numbering system. An inode number is returned that - * is higher than the reserved limit but unique. 
- * - * BUGS: - * With a large number of inodes live on the file system this function - * currently becomes quite slow. - */ - -ino_t iunique(struct super_block *sb, ino_t max_reserved) -{ - static ino_t counter = 0; - struct inode *inode; - struct hlist_head * head; - ino_t res; - spin_lock(&inode_lock); -retry: - if (counter > max_reserved) { - head = inode_hashtable + hash(sb,counter); - res = counter++; - inode = find_inode_fast(sb, head, res); - if (!inode) { - spin_unlock(&inode_lock); - return res; - } - } else { - counter = max_reserved + 1; - } - goto retry; - -} - -struct inode *igrab(struct inode *inode) -{ - spin_lock(&inode_lock); - if (!(inode->i_state & I_FREEING)) - __iget(inode); - else - /* - * Handle the case where s_op->clear_inode is not been - * called yet, and somebody is calling igrab - * while the inode is getting freed. - */ - inode = NULL; - spin_unlock(&inode_lock); - return inode; -} - -/** - * ifind - internal function, you want ilookup5() or iget5(). - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ifind() searches for the inode specified by @hashval and @data in the inode - * cache. This is a generalized version of ifind_fast() for file systems where - * the inode number is not sufficient for unique identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -static inline struct inode *ifind(struct super_block *sb, - struct hlist_head *head, int (*test)(struct inode *, void *), - void *data) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode(sb, head, test, data); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ifind_fast - internal function, you want ilookup() or iget(). - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ifind_fast() searches for the inode @ino in the inode cache. This is for - * file systems where the inode number is sufficient for unique identification - * of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -static inline struct inode *ifind_fast(struct super_block *sb, - struct hlist_head *head, unsigned long ino) -{ - struct inode *inode; - - spin_lock(&inode_lock); - inode = find_inode_fast(sb, head, ino); - if (inode) { - __iget(inode); - spin_unlock(&inode_lock); - wait_on_inode(inode); - return inode; - } - spin_unlock(&inode_lock); - return NULL; -} - -/** - * ilookup5 - search for an inode in the inode cache - * @sb: super block of file system to search - * @hashval: hash value (usually inode number) to search for - * @test: callback used for comparisons between inodes - * @data: opaque data pointer to pass to @test - * - * ilookup5() uses ifind() to search for the inode specified by @hashval and - * @data in the inode cache. This is a generalized version of ilookup() for - * file systems where the inode number is not sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - * - * Note, @test is called with the inode_lock held, so can't sleep. 
- */ -struct inode *ilookup5(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - - return ifind(sb, head, test, data); -} -EXPORT_SYMBOL(ilookup5); - -/** - * ilookup - search for an inode in the inode cache - * @sb: super block of file system to search - * @ino: inode number to search for - * - * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache. - * This is for file systems where the inode number is sufficient for unique - * identification of an inode. - * - * If the inode is in the cache, the inode is returned with an incremented - * reference count. - * - * Otherwise NULL is returned. - */ -struct inode *ilookup(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - - return ifind_fast(sb, head, ino); -} -EXPORT_SYMBOL(ilookup); - -/** - * iget5_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @hashval: hash value (usually inode number) to get - * @test: callback used for comparisons between inodes - * @set: callback used to initialize a new struct inode - * @data: opaque data pointer to pass to @test and @set - * - * This is iget() without the read_inode() portion of get_new_inode(). - * - * iget5_locked() uses ifind() to search for the inode specified by @hashval - * and @data in the inode cache and if present it is returned with an increased - * reference count. This is a generalized version of iget_locked() for file - * systems where the inode number is not sufficient for unique identification - * of an inode. - * - * If the inode is not in cache, get_new_inode() is called to allocate a new - * inode and this is returned locked, hashed, and with the I_NEW flag set. The - * file system gets to fill it in before unlocking it via unlock_new_inode(). 
- * - * Note both @test and @set are called with the inode_lock held, so can't sleep. - */ -struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, - int (*test)(struct inode *, void *), - int (*set)(struct inode *, void *), void *data) -{ - struct hlist_head *head = inode_hashtable + hash(sb, hashval); - struct inode *inode; - - inode = ifind(sb, head, test, data); - if (inode) - return inode; - /* - * get_new_inode() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode(sb, head, test, set, data); -} -EXPORT_SYMBOL(iget5_locked); - -/** - * iget_locked - obtain an inode from a mounted file system - * @sb: super block of file system - * @ino: inode number to get - * - * This is iget() without the read_inode() portion of get_new_inode_fast(). - * - * iget_locked() uses ifind_fast() to search for the inode specified by @ino in - * the inode cache and if present it is returned with an increased reference - * count. This is for file systems where the inode number is sufficient for - * unique identification of an inode. - * - * If the inode is not in cache, get_new_inode_fast() is called to allocate a - * new inode and this is returned locked, hashed, and with the I_NEW flag set. - * The file system gets to fill it in before unlocking it via - * unlock_new_inode(). - */ -struct inode *iget_locked(struct super_block *sb, unsigned long ino) -{ - struct hlist_head *head = inode_hashtable + hash(sb, ino); - struct inode *inode; - - inode = ifind_fast(sb, head, ino); - if (inode) - return inode; - /* - * get_new_inode_fast() will do the right thing, re-trying the search - * in case it had to block at any point. - */ - return get_new_inode_fast(sb, head, ino); -} -EXPORT_SYMBOL(iget_locked); - -/** - * __insert_inode_hash - hash an inode - * @inode: unhashed inode - * @hashval: unsigned long value used to locate this object in the - * inode_hashtable. 
- * - * Add an inode to the inode hash for this superblock. If the inode - * has no superblock it is added to a separate anonymous chain. - */ - -void __insert_inode_hash(struct inode *inode, unsigned long hashval) -{ - struct hlist_head *head = &anon_hash_chain; - if (inode->i_sb) - head = inode_hashtable + hash(inode->i_sb, hashval); - spin_lock(&inode_lock); - hlist_add_head(&inode->i_hash, head); - spin_unlock(&inode_lock); -} - -/** - * remove_inode_hash - remove an inode from the hash - * @inode: inode to unhash - * - * Remove an inode from the superblock or anonymous hash. - */ - -void remove_inode_hash(struct inode *inode) -{ - spin_lock(&inode_lock); - hlist_del_init(&inode->i_hash); - spin_unlock(&inode_lock); -} - -void generic_delete_inode(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - -<<<<<<< found - hlist_del_init(&inode->i_hash); -||||||| expected - list_del_init(&inode->i_hash); -======= ->>>>>>> replacement - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - - security_inode_delete(inode); - - if (op->delete_inode) { - void (*delete)(struct inode *) = op->delete_inode; - if (!is_bad_inode(inode)) - DQUOT_INIT(inode); - /* s_op->delete_inode internally recalls clear_inode() */ - delete(inode); - } else - clear_inode(inode); - spin_lock(&inode_lock); - list_del_init(&inode->i_hash); - spin_unlock(&inode_lock); - wake_up_inode(inode); - if (inode->i_state != I_CLEAR) - BUG(); - destroy_inode(inode); -} -EXPORT_SYMBOL(generic_delete_inode); - -static void generic_forget_inode(struct inode *inode) -{ - struct super_block *sb = inode->i_sb; - - if (!hlist_unhashed(&inode->i_hash)) { - if (!(inode->i_state & (I_DIRTY|I_LOCK))) { - list_del(&inode->i_list); - list_add(&inode->i_list, &inode_unused); - } - inodes_stat.nr_unused++; - spin_unlock(&inode_lock); - if (!sb || (sb->s_flags & 
MS_ACTIVE)) - return; - write_inode_now(inode, 1); - spin_lock(&inode_lock); - inodes_stat.nr_unused--; - hlist_del_init(&inode->i_hash); - } - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; - spin_unlock(&inode_lock); - if (inode->i_data.nrpages) - truncate_inode_pages(&inode->i_data, 0); - clear_inode(inode); - destroy_inode(inode); -} - -/* - * Normal UNIX filesystem behaviour: delete the - * inode when the usage count drops to zero, and - * i_nlink is zero. - */ -static void generic_drop_inode(struct inode *inode) -{ - if (!inode->i_nlink) - generic_delete_inode(inode); - else - generic_forget_inode(inode); -} - -/* - * Called when we're dropping the last reference - * to an inode. - * - * Call the FS "drop()" function, defaulting to - * the legacy UNIX filesystem behaviour.. - * - * NOTE! NOTE! NOTE! We're called with the inode lock - * held, and the drop function is supposed to release - * the lock! - */ -static inline void iput_final(struct inode *inode) -{ - struct super_operations *op = inode->i_sb->s_op; - void (*drop)(struct inode *) = generic_drop_inode; - - if (op && op->drop_inode) - drop = op->drop_inode; - drop(inode); -} - -/** - * iput - put an inode - * @inode: inode to put - * - * Puts an inode, dropping its usage count. If the inode use count hits - * zero the inode is also then freed and may be destroyed. - */ - -void iput(struct inode *inode) -{ - if (inode) { - struct super_operations *op = inode->i_sb->s_op; - - if (inode->i_state == I_CLEAR) - BUG(); - - if (op && op->put_inode) - op->put_inode(inode); - - if (atomic_dec_and_lock(&inode->i_count, &inode_lock)) - iput_final(inode); - } -} - -/** - * bmap - find a block number in a file - * @inode: inode of file - * @block: block to find - * - * Returns the block number on the device holding the inode that - * is the disk block number for the block of the file requested. 
- * That is, asked for block 4 of inode 1 the function will return the - * disk block relative to the disk start that holds that block of the - * file. - */ - -sector_t bmap(struct inode * inode, sector_t block) -{ - sector_t res = 0; - if (inode->i_mapping->a_ops->bmap) - res = inode->i_mapping->a_ops->bmap(inode->i_mapping, block); - return res; -} - -/* - * Return true if the filesystem which backs this inode considers the two - * passed timespecs to be sufficiently different to warrant flushing the - * altered time out to disk. - */ -static int inode_times_differ(struct inode *inode, - struct timespec *old, struct timespec *new) -{ - if (IS_ONE_SECOND(inode)) - return old->tv_sec != new->tv_sec; - return !timespec_equal(old, new); -} - -/** - * update_atime - update the access time - * @inode: inode accessed - * - * Update the accessed time on an inode and mark it for writeback. - * This function automatically handles read only file systems and media, - * as well as the "noatime" flag and inode specific "noatime" markers. - */ - -void update_atime(struct inode *inode) -{ - struct timespec now; - - if (IS_NOATIME(inode)) - return; - if (IS_NODIRATIME(inode) && S_ISDIR(inode->i_mode)) - return; - if (IS_RDONLY(inode)) - return; - - now = current_kernel_time(); - if (inode_times_differ(inode, &inode->i_atime, &now)) { - inode->i_atime = now; - mark_inode_dirty_sync(inode); - } else { - if (!timespec_equal(&inode->i_atime, &now)) - inode->i_atime = now; - } -} - -/** - * inode_update_time - update mtime and ctime time - * @inode: inode accessed - * @ctime_too: update ctime too - * - * Update the mtime time on an inode and mark it for writeback. - * When ctime_too is specified update the ctime too. 
- */ - -void inode_update_time(struct inode *inode, int ctime_too) -{ - struct timespec now = current_kernel_time(); - int sync_it = 0; - - if (inode_times_differ(inode, &inode->i_mtime, &now)) - sync_it = 1; - inode->i_mtime = now; - - if (ctime_too) { - if (inode_times_differ(inode, &inode->i_ctime, &now)) - sync_it = 1; - inode->i_ctime = now; - } - if (sync_it) - mark_inode_dirty_sync(inode); -} -EXPORT_SYMBOL(inode_update_time); - -int inode_needs_sync(struct inode *inode) -{ - if (IS_SYNC(inode)) - return 1; - if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) - return 1; - return 0; -} -EXPORT_SYMBOL(inode_needs_sync); - -/* - * Quota functions that want to walk the inode lists.. - */ -#ifdef CONFIG_QUOTA - -/* Functions back in dquot.c */ -void put_dquot_list(struct list_head *); -int remove_inode_dquot_ref(struct inode *, int, struct list_head *); - -void remove_dquot_ref(struct super_block *sb, int type) -{ - struct inode *inode; - struct list_head *act_head; - LIST_HEAD(tofree_head); - - if (!sb->dq_op) - return; /* nothing to do */ - spin_lock(&inode_lock); /* This lock is for inodes code */ - /* We don't have to lock against quota code - test IS_QUOTAINIT is just for speedup... 
*/ - - list_for_each(act_head, &inode_in_use) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &inode_unused) { - inode = list_entry(act_head, struct inode, i_list); - if (inode->i_sb == sb && IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_dirty) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - list_for_each(act_head, &sb->s_io) { - inode = list_entry(act_head, struct inode, i_list); - if (IS_QUOTAINIT(inode)) - remove_inode_dquot_ref(inode, type, &tofree_head); - } - spin_unlock(&inode_lock); - - put_dquot_list(&tofree_head); -} - -#endif - -/* - * Hashed waitqueues for wait_on_inode(). The table is pretty small - the - * kernel doesn't lock many inodes at the same time. - */ -#define I_WAIT_TABLE_ORDER 3 -static struct i_wait_queue_head { - wait_queue_head_t wqh; -} ____cacheline_aligned_in_smp i_wait_queue_heads[1<<I_WAIT_TABLE_ORDER]; - -/* - * Return the address of the waitqueue_head to be used for this inode - */ -static wait_queue_head_t *i_waitq_head(struct inode *inode) -{ - return &i_wait_queue_heads[hash_ptr(inode, I_WAIT_TABLE_ORDER)].wqh; -} - -void __wait_on_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); -repeat: - set_current_state(TASK_UNINTERRUPTIBLE); - if (inode->i_state & I_LOCK) { - schedule(); - goto repeat; - } - remove_wait_queue(wq, &wait); - __set_current_state(TASK_RUNNING); -} - -void __wait_on_freeing_inode(struct inode *inode) -{ - DECLARE_WAITQUEUE(wait, current); - wait_queue_head_t *wq = i_waitq_head(inode); - - add_wait_queue(wq, &wait); - set_current_state(TASK_UNINTERRUPTIBLE); - spin_unlock(&inode_lock); - schedule(); - remove_wait_queue(wq, &wait); - current->state = TASK_RUNNING; - spin_lock(&inode_lock); -} - - -void wake_up_inode(struct inode *inode) -{ - wait_queue_head_t *wq = i_waitq_head(inode); - - /* - * Prevent speculative execution through spin_unlock(&inode_lock); - */ - smp_mb(); - if (waitqueue_active(wq)) - wake_up_all(wq); -} - -/* - * Initialize the waitqueues and inode hash table. 
- */ -void __init inode_init(unsigned long mempages) -{ - struct hlist_head *head; - unsigned long order; - unsigned int nr_hash; - int i; - - for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++) - init_waitqueue_head(&i_wait_queue_heads[i].wqh); - - mempages >>= (14 - PAGE_SHIFT); - mempages *= sizeof(struct list_head); - for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++) - ; - - do { - unsigned long tmp; - - nr_hash = (1UL << order) * PAGE_SIZE / - sizeof(struct hlist_head); - i_hash_mask = (nr_hash - 1); - - tmp = nr_hash; - i_hash_shift = 0; - while ((tmp >>= 1UL) != 0UL) - i_hash_shift++; - - inode_hashtable = (struct hlist_head *) - __get_free_pages(GFP_ATOMIC, order); - } while (inode_hashtable == NULL && --order >= 0); - - printk("Inode-cache hash table entries: %d (order: %ld, %ld bytes)\n", - nr_hash, order, (PAGE_SIZE << order)); - - if (!inode_hashtable) - panic("Failed to allocate inode hash table\n"); - - head = inode_hashtable; - i = nr_hash; - do { - INIT_HLIST_HEAD(head); - head++; - i--; - } while (i); - - /* inode slab cache */ - inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode), - 0, SLAB_HWCACHE_ALIGN, init_once, - NULL); - if (!inode_cachep) - panic("cannot create inode slab cache"); - - set_shrinker(DEFAULT_SEEKS, shrink_icache_memory); -} - -void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev) -{ - inode->i_mode = mode; - if (S_ISCHR(mode)) { - inode->i_fop = &def_chr_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISBLK(mode)) { - inode->i_fop = &def_blk_fops; - inode->i_rdev = to_kdev_t(rdev); - } else if (S_ISFIFO(mode)) - inode->i_fop = &def_fifo_fops; - else if (S_ISSOCK(mode)) - inode->i_fop = &bad_sock_fops; - else - printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n", - mode); -} ./linux/inode-fullpatch/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- diff 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.808557647 +0000 @@ -1,1330 +0,0 
@@ -@@ -1,1323 +1,43 @@ --/* -- * linux/fs/inode.c -- * -- * (C) 1997 Linus Torvalds -- */ -- --#include <linux/config.h> --#include <linux/fs.h> --#include <linux/mm.h> --#include <linux/dcache.h> --#include <linux/init.h> --#include <linux/quotaops.h> --#include <linux/slab.h> --#include <linux/writeback.h> --#include <linux/module.h> --#include <linux/backing-dev.h> --#include <linux/wait.h> --#include <linux/hash.h> --#include <linux/swap.h> --#include <linux/security.h> -- --/* -- * This is needed for the following functions: -- * - inode_has_buffers -- * - invalidate_inode_buffers -- * - fsync_bdev -- * - invalidate_bdev -- * -- * FIXME: remove all knowledge of the buffer layer from this file -- */ --#include <linux/buffer_head.h> -- --/* -- * New inode.c implementation. -- * -- * This implementation has the basic premise of trying -- * to be extremely low-overhead and SMP-safe, yet be -- * simple enough to be "obviously correct". -- * -- * Famous last words. -- */ -- --/* inode dynamic allocation 1999, Andrea Arcangeli */ -- --/* #define INODE_PARANOIA 1 */ --/* #define INODE_DEBUG 1 */ -- --/* -- * Inode lookup is no longer as critical as it used to be: -- * most of the lookups are going to be through the dcache. -- */ --#define I_HASHBITS i_hash_shift --#define I_HASHMASK i_hash_mask -- --static unsigned int i_hash_mask; --static unsigned int i_hash_shift; -- --/* -- * Each inode can be on two separate lists. One is -- * the hash list of the inode, used for lookups. The -- * other linked list is the "type" list: -- * "in_use" - valid inode, i_count > 0, i_nlink > 0 -- * "dirty" - as "in_use" but also dirty -- * "unused" - valid inode, i_count = 0 -- * -- * A "dirty" list is maintained for each super block, -- * allowing for low-overhead inode sync() operations. -- */ -- --LIST_HEAD(inode_in_use); --LIST_HEAD(inode_unused); --static struct hlist_head *inode_hashtable; --static HLIST_HEAD(anon_hash_chain); /* for inodes with NULL i_sb */ -- --/* -- * A simple spinlock to protect the list manipulations. -- * -- * NOTE! You also have to own the lock if you change -- * the i_state of an inode while it is in use.. 
-- */ --spinlock_t inode_lock = SPIN_LOCK_UNLOCKED; -- --/* -- * iprune_sem provides exclusion between the kswapd or try_to_free_pages -- * icache shrinking path, and the umount path. Without this exclusion, -- * by the time prune_icache calls iput for the inode whose pages it has -- * been invalidating, or by the time it calls clear_inode & destroy_inode -- * from its final dispose_list, the struct super_block they refer to -- * (for inode->i_sb->s_op) may already have been freed and reused. -- */ --static DECLARE_MUTEX(iprune_sem); -- --/* -- * Statistics gathering.. -- */ --struct inodes_stat_t inodes_stat; -- --static kmem_cache_t * inode_cachep; -- --static struct inode *alloc_inode(struct super_block *sb) --{ -- static struct address_space_operations empty_aops; -- static struct inode_operations empty_iops; -- static struct file_operations empty_fops; -- struct inode *inode; -- -- if (sb->s_op->alloc_inode) -- inode = sb->s_op->alloc_inode(sb); -- else -- inode = (struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL); -- -- if (inode) { -- struct address_space * const mapping = &inode->i_data; -- -- inode->i_sb = sb; -- inode->i_blkbits = sb->s_blocksize_bits; -- inode->i_flags = 0; -- atomic_set(&inode->i_count, 1); -- inode->i_sock = 0; -- inode->i_op = &empty_iops; -- inode->i_fop = &empty_fops; -- inode->i_nlink = 1; -- atomic_set(&inode->i_writecount, 0); -- inode->i_size = 0; -- inode->i_blocks = 0; -- inode->i_bytes = 0; -- inode->i_generation = 0; -- memset(&inode->i_dquot, 0, sizeof(inode->i_dquot)); -- inode->i_pipe = NULL; -- inode->i_bdev = NULL; -- inode->i_rdev = to_kdev_t(0); -- inode->i_security = NULL; -- if (security_inode_alloc(inode)) { -- if (inode->i_sb->s_op->destroy_inode) -- inode->i_sb->s_op->destroy_inode(inode); -- else -- kmem_cache_free(inode_cachep, (inode)); -- return NULL; -- } -- -- mapping->a_ops = &empty_aops; -- mapping->host = inode; -- mapping->gfp_mask = GFP_HIGHUSER; -- mapping->dirtied_when = 0; -- 
mapping->assoc_mapping = NULL; -- mapping->backing_dev_info = &default_backing_dev_info; -- if (sb->s_bdev) -- mapping->backing_dev_info = sb->s_bdev->bd_inode->i_mapping->backing_dev_info; -- memset(&inode->u, 0, sizeof(inode->u)); -- inode->i_mapping = mapping; -- } -- return inode; --} -- --void destroy_inode(struct inode *inode) --{ -- if (inode_has_buffers(inode)) -- BUG(); -- security_inode_free(inode); -- if (inode->i_sb->s_op->destroy_inode) -- inode->i_sb->s_op->destroy_inode(inode); -- else -- kmem_cache_free(inode_cachep, (inode)); --} -- -- --/* -- * These are initializations that only need to be done -- * once, because the fields are idempotent across use -- * of the inode, so let the slab aware of that. -- */ --void inode_init_once(struct inode *inode) --{ -- memset(inode, 0, sizeof(*inode)); -- INIT_HLIST_NODE(&inode->i_hash); -- INIT_LIST_HEAD(&inode->i_data.clean_pages); -- INIT_LIST_HEAD(&inode->i_data.dirty_pages); -- INIT_LIST_HEAD(&inode->i_data.locked_pages); -- INIT_LIST_HEAD(&inode->i_data.io_pages); -- INIT_LIST_HEAD(&inode->i_dentry); -- INIT_LIST_HEAD(&inode->i_devices); -- sema_init(&inode->i_sem, 1); -- INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); -- rwlock_init(&inode->i_data.page_lock); -- init_MUTEX(&inode->i_data.i_shared_sem); -- INIT_LIST_HEAD(&inode->i_data.private_list); -- spin_lock_init(&inode->i_data.private_lock); -- INIT_LIST_HEAD(&inode->i_data.i_mmap); -- INIT_LIST_HEAD(&inode->i_data.i_mmap_shared); -- spin_lock_init(&inode->i_lock); --} -- --static void init_once(void * foo, kmem_cache_t * cachep, unsigned long flags) --{ -- struct inode * inode = (struct inode *) foo; -- -- if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == -- SLAB_CTOR_CONSTRUCTOR) -- inode_init_once(inode); --} -- --/* -- * inode_lock must be held -- */ --void __iget(struct inode * inode) --{ -- if (atomic_read(&inode->i_count)) { -- atomic_inc(&inode->i_count); -- return; -- } -- atomic_inc(&inode->i_count); -- if (!(inode->i_state 
& (I_DIRTY|I_LOCK))) { -- list_del(&inode->i_list); -- list_add(&inode->i_list, &inode_in_use); -- } -- inodes_stat.nr_unused--; --} -- --/** -- * clear_inode - clear an inode -- * @inode: inode to clear -- * -- * This is called by the filesystem to tell us -- * that the inode is no longer useful. We just -- * terminate it with extreme prejudice. -- */ -- --void clear_inode(struct inode *inode) --{ -- invalidate_inode_buffers(inode); -- -- if (inode->i_data.nrpages) -- BUG(); -- if (!(inode->i_state & I_FREEING)) -- BUG(); -- if (inode->i_state & I_CLEAR) -- BUG(); -- wait_on_inode(inode); -- DQUOT_DROP(inode); -- if (inode->i_sb && inode->i_sb->s_op->clear_inode) -- inode->i_sb->s_op->clear_inode(inode); -- if (inode->i_bdev) -- bd_forget(inode); -- inode->i_state = I_CLEAR; --} -- --/* -- * Dispose-list gets a local list with local inodes in it, so it doesn't -- * need to worry about list corruption and SMP locks. -- */ --static void dispose_list(struct list_head *head) --{ -- int nr_disposed = 0; -- -- while (!list_empty(head)) { -- struct inode *inode; -- -- inode = list_entry(head->next, struct inode, i_list); -- list_del(&inode->i_list); -- -- if (inode->i_data.nrpages) -- truncate_inode_pages(&inode->i_data, 0); -- clear_inode(inode); -- destroy_inode(inode); -- nr_disposed++; -- } -- spin_lock(&inode_lock); -- inodes_stat.nr_inodes -= nr_disposed; -- spin_unlock(&inode_lock); --} -- --/* -- * Invalidate all inodes for a device. 
-- */ --static int invalidate_list(struct list_head *head, struct super_block * sb, struct list_head * dispose) --{ -- struct list_head *next; -- int busy = 0, count = 0; -- -- next = head->next; -- for (;;) { -- struct list_head * tmp = next; -- struct inode * inode; -- -- next = next->next; -- if (tmp == head) -- break; -- inode = list_entry(tmp, struct inode, i_list); -- if (inode->i_sb != sb) -- continue; -- invalidate_inode_buffers(inode); -- if (!atomic_read(&inode->i_count)) { -- hlist_del_init(&inode->i_hash); -- list_del(&inode->i_list); -- list_add(&inode->i_list, dispose); -- inode->i_state |= I_FREEING; -- count++; -- continue; -- } -- busy = 1; -- } -- /* only unused inodes may be cached with i_count zero */ -- inodes_stat.nr_unused -= count; -- return busy; --} -- --/* -- * This is a two-stage process. First we collect all -- * offending inodes onto the throw-away list, and in -- * the second stage we actually dispose of them. This -- * is because we don't want to sleep while messing -- * with the global lists.. -- */ -- --/** -- * invalidate_inodes - discard the inodes on a device -- * @sb: superblock -- * -- * Discard all of the inodes for a given superblock. If the discard -- * fails because there are busy inodes then a non zero value is returned. -- * If the discard is successful all the inodes have been discarded. 
-- */ -- --int invalidate_inodes(struct super_block * sb) --{ -- int busy; -- LIST_HEAD(throw_away); -- -- down(&iprune_sem); -- spin_lock(&inode_lock); -- busy = invalidate_list(&inode_in_use, sb, &throw_away); -- busy |= invalidate_list(&inode_unused, sb, &throw_away); -- busy |= invalidate_list(&sb->s_dirty, sb, &throw_away); -- busy |= invalidate_list(&sb->s_io, sb, &throw_away); -- spin_unlock(&inode_lock); -- -- dispose_list(&throw_away); -- up(&iprune_sem); -- -- return busy; --} -- --int invalidate_device(kdev_t dev, int do_sync) --{ -- struct super_block *sb; -- struct block_device *bdev = bdget(kdev_t_to_nr(dev)); -- int res; -- -- if (!bdev) -- return 0; -- -- if (do_sync) -- fsync_bdev(bdev); -- -- res = 0; -- sb = get_super(bdev); -- if (sb) { -- /* -- * no need to lock the super, get_super holds the -- * read semaphore so the filesystem cannot go away -- * under us (->put_super runs with the write lock -- * hold). -- */ -- shrink_dcache_sb(sb); -- res = invalidate_inodes(sb); -- drop_super(sb); -- } -- invalidate_bdev(bdev, 0); -- bdput(bdev); -- return res; --} -- --static int can_unuse(struct inode *inode) --{ -- if (inode->i_state) -- return 0; -- if (inode_has_buffers(inode)) -- return 0; -- if (atomic_read(&inode->i_count)) -- return 0; -- if (inode->i_data.nrpages) -- return 0; -- return 1; --} -- --/* -- * Scan `goal' inodes on the unused list for freeable ones. They are moved to -- * a temporary list and then are freed outside inode_lock by dispose_list(). -- * -- * Any inodes which are pinned purely because of attached pagecache have their -- * pagecache removed. We expect the final iput() on that inode to add it to -- * the front of the inode_unused list. So look for it there and if the -- * inode is still freeable, proceed. The right inode is found 99.9% of the -- * time in testing on a 4-way. -- * -- * If the inode has metadata buffers attached to mapping->private_list then -- * try to remove them. 
-- */ --static void prune_icache(int nr_to_scan) --{ -- LIST_HEAD(freeable); -- int nr_pruned = 0; -- int nr_scanned; -- unsigned long reap = 0; -- -- down(&iprune_sem); -- spin_lock(&inode_lock); -- for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) { -- struct inode *inode; -- -- if (list_empty(&inode_unused)) -- break; -- -- inode = list_entry(inode_unused.prev, struct inode, i_list); -- -- if (inode->i_state || atomic_read(&inode->i_count)) { -- list_move(&inode->i_list, &inode_unused); -- continue; -- } -- if (inode_has_buffers(inode) || inode->i_data.nrpages) { -- __iget(inode); -- spin_unlock(&inode_lock); -- if (remove_inode_buffers(inode)) -- reap += invalidate_inode_pages(&inode->i_data); -- iput(inode); -- spin_lock(&inode_lock); -- -- if (inode != list_entry(inode_unused.next, -- struct inode, i_list)) -- continue; /* wrong inode or list_empty */ -- if (!can_unuse(inode)) -- continue; -- } -- hlist_del_init(&inode->i_hash); -- list_move(&inode->i_list, &freeable); -- inode->i_state |= I_FREEING; -- nr_pruned++; -- } -- inodes_stat.nr_unused -= nr_pruned; -- spin_unlock(&inode_lock); -- -- dispose_list(&freeable); -- up(&iprune_sem); -- -- if (current_is_kswapd) -- mod_page_state(kswapd_inodesteal, reap); -- else -- mod_page_state(pginodesteal, reap); --} -- --/* -- * shrink_icache_memory() will attempt to reclaim some unused inodes. Here, -- * "unused" means that no dentries are referring to the inodes: the files are -- * not open and the dcache references to those inodes have already been -- * reclaimed. -- * -- * This function is passed the number of inodes to scan, and it returns the -- * total number of remaining possibly-reclaimable inodes. -- */ --static int shrink_icache_memory(int nr, unsigned int gfp_mask) --{ -- if (nr) { -- /* -- * Nasty deadlock avoidance. We may hold various FS locks, -- * and we don't want to recurse into the FS that called us -- * in clear_inode() and friends.. 
-- */ -- if (gfp_mask & __GFP_FS) -- prune_icache(nr); -- } -+*** 470,6 **** 1 static int shrink_icache_memory(int nr, -| return inodes_stat.<<<--nr_unused-->>><<<++nr_inodes++>>>; - } - - /* - * Called with the inode lock held. - * NOTE: we are not increasing the inode-refcount, you must call __iget() -- * by hand after calling find_inode now! This simplifies iunique and won't -- * add any additional branch in the common code. -- */ --static struct inode * find_inode(struct super_block * sb, struct hlist_head *head, int (*test)(struct inode *, void *), void *data) --{ -- struct hlist_node *node; -- struct inode * inode = NULL; -- -- hlist_for_each (node, head) { -- prefetch(node->next); -- inode = hlist_entry(node, struct inode, i_hash); -- if (inode->i_sb != sb) -+*** 492,6 **** 2 static struct inode * find_inode(struct - continue; - if (!test(inode, data)) - continue; - break; - } -| return<<<-- node ?-->>> inode<<<-- : NULL-->>>; --} -- --/* -- * find_inode_fast is the fast path version of find_inode, see the comment at -- * iget_locked for details. -- */ --static struct inode * find_inode_fast(struct super_block * sb, struct hlist_head *head, unsigned long ino) --{ -- struct hlist_node *node; -- struct inode * inode = NULL; -- -- hlist_for_each (node, head) { -- prefetch(node->next); -- inode = list_entry(node, struct inode, i_hash); -- if (inode->i_ino != ino) -+*** 517,6 **** 3 static struct inode * find_inode_fast(st - continue; - if (inode->i_sb != sb) - continue; - break; - } -| return<<<-- node ?-->>> inode<<<-- : NULL-->>>; --} -- --/** -- * new_inode - obtain an inode -- * @sb: superblock -- * -- * Allocates a new inode for given superblock. 
-- */ -- --struct inode *new_inode(struct super_block *sb) --{ -- static unsigned long last_ino; -- struct inode * inode; -- -- spin_lock_prefetch(&inode_lock); -- -- inode = alloc_inode(sb); -- if (inode) { -- spin_lock(&inode_lock); -- inodes_stat.nr_inodes++; -- list_add(&inode->i_list, &inode_in_use); -- inode->i_ino = ++last_ino; -- inode->i_state = 0; -- spin_unlock(&inode_lock); -- } -- return inode; --} -- --void unlock_new_inode(struct inode *inode) --{ -- /* -- * This is special! We do not need the spinlock -- * when clearing I_LOCK, because we're guaranteed -- * that nobody else tries to do anything about the -- * state of the inode when it is locked, as we -- * just created it (so there can be no old holders -- * that haven't tested I_LOCK). -- */ -- inode->i_state &= ~(I_LOCK|I_NEW); -- wake_up_inode(inode); --} --EXPORT_SYMBOL(unlock_new_inode); -- --/* -- * This is called without the inode lock held.. Be careful. -- * -- * We no longer cache the sb_flags in i_flags - see fs.h -- * -- rmk@arm.uk.linux.org -- */ --static struct inode * get_new_inode(struct super_block *sb, struct hlist_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data) --{ -- struct inode * inode; -- -- inode = alloc_inode(sb); -- if (inode) { -- struct inode * old; -- -- spin_lock(&inode_lock); -- /* We released the lock, so.. */ -- old = find_inode(sb, head, test, data); -- if (!old) { -- if (set(inode, data)) -- goto set_failed; -- -- inodes_stat.nr_inodes++; -- list_add(&inode->i_list, &inode_in_use); -- hlist_add_head(&inode->i_hash, head); -- inode->i_state = I_LOCK|I_NEW; -- spin_unlock(&inode_lock); -- -- /* Return the locked inode with I_NEW set, the -- * caller is responsible for filling in the contents -- */ -- return inode; -- } -- -- /* -- * Uhhuh, somebody else created the same inode under -- * us. Use the old inode instead of the one we just -- * allocated. 
-- */ -- __iget(old); -- spin_unlock(&inode_lock); -- destroy_inode(inode); -- inode = old; -- wait_on_inode(inode); -- } -- return inode; -- --set_failed: -- spin_unlock(&inode_lock); -- destroy_inode(inode); -- return NULL; --} -- --/* -- * get_new_inode_fast is the fast path version of get_new_inode, see the -- * comment at iget_locked for details. -- */ --static struct inode * get_new_inode_fast(struct super_block *sb, struct hlist_head *head, unsigned long ino) --{ -- struct inode * inode; -- -- inode = alloc_inode(sb); -- if (inode) { -- struct inode * old; -- -- spin_lock(&inode_lock); -- /* We released the lock, so.. */ -- old = find_inode_fast(sb, head, ino); -- if (!old) { -- inode->i_ino = ino; -- inodes_stat.nr_inodes++; -- list_add(&inode->i_list, &inode_in_use); -- hlist_add_head(&inode->i_hash, head); -- inode->i_state = I_LOCK|I_NEW; -- spin_unlock(&inode_lock); -- -- /* Return the locked inode with I_NEW set, the -- * caller is responsible for filling in the contents -- */ -- return inode; -- } -- -- /* -- * Uhhuh, somebody else created the same inode under -- * us. Use the old inode instead of the one we just -- * allocated. -- */ -- __iget(old); -- spin_unlock(&inode_lock); -- destroy_inode(inode); -- inode = old; -- wait_on_inode(inode); -- } -- return inode; --} -- --static inline unsigned long hash(struct super_block *sb, unsigned long hashval) --{ -- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES); -- tmp = tmp + (tmp >> I_HASHBITS); -- return tmp & I_HASHMASK; --} -- --/* Yeah, I know about quadratic hash. Maybe, later. */ -- --/** -- * iunique - get a unique inode number -- * @sb: superblock -- * @max_reserved: highest reserved inode number -- * -- * Obtain an inode number that is unique on the system for a given -- * superblock. This is used by file systems that have no natural -- * permanent inode numbering system. An inode number is returned that -- * is higher than the reserved limit but unique. 
-- * -- * BUGS: -- * With a large number of inodes live on the file system this function -- * currently becomes quite slow. -- */ -- --ino_t iunique(struct super_block *sb, ino_t max_reserved) --{ -- static ino_t counter = 0; -- struct inode *inode; -- struct hlist_head * head; -- ino_t res; -- spin_lock(&inode_lock); --retry: -- if (counter > max_reserved) { -- head = inode_hashtable + hash(sb,counter); -- res = counter++; -- inode = find_inode_fast(sb, head, res); -- if (!inode) { -- spin_unlock(&inode_lock); -- return res; -- } -- } else { -- counter = max_reserved + 1; -- } -- goto retry; -- --} -- --struct inode *igrab(struct inode *inode) --{ -- spin_lock(&inode_lock); -- if (!(inode->i_state & I_FREEING)) -- __iget(inode); -- else -- /* -- * Handle the case where s_op->clear_inode is not been -- * called yet, and somebody is calling igrab -- * while the inode is getting freed. -- */ -- inode = NULL; -- spin_unlock(&inode_lock); -- return inode; --} -- --/** -- * ifind - internal function, you want ilookup5() or iget5(). -- * @sb: super block of file system to search -- * @hashval: hash value (usually inode number) to search for -- * @test: callback used for comparisons between inodes -- * @data: opaque data pointer to pass to @test -- * -- * ifind() searches for the inode specified by @hashval and @data in the inode -- * cache. This is a generalized version of ifind_fast() for file systems where -- * the inode number is not sufficient for unique identification of an inode. -- * -- * If the inode is in the cache, the inode is returned with an incremented -- * reference count. -- * -- * Otherwise NULL is returned. -- * -- * Note, @test is called with the inode_lock held, so can't sleep. 
-- */ --static inline struct inode *ifind(struct super_block *sb, -- struct hlist_head *head, int (*test)(struct inode *, void *), -- void *data) --{ -- struct inode *inode; -- -- spin_lock(&inode_lock); -- inode = find_inode(sb, head, test, data); -- if (inode) { -- __iget(inode); -- spin_unlock(&inode_lock); -- wait_on_inode(inode); -- return inode; -- } -- spin_unlock(&inode_lock); -- return NULL; --} -- --/** -- * ifind_fast - internal function, you want ilookup() or iget(). -- * @sb: super block of file system to search -- * @ino: inode number to search for -- * -- * ifind_fast() searches for the inode @ino in the inode cache. This is for -- * file systems where the inode number is sufficient for unique identification -- * of an inode. -- * -- * If the inode is in the cache, the inode is returned with an incremented -- * reference count. -- * -- * Otherwise NULL is returned. -- */ --static inline struct inode *ifind_fast(struct super_block *sb, -- struct hlist_head *head, unsigned long ino) --{ -- struct inode *inode; -- -- spin_lock(&inode_lock); -- inode = find_inode_fast(sb, head, ino); -- if (inode) { -- __iget(inode); -- spin_unlock(&inode_lock); -- wait_on_inode(inode); -- return inode; -- } -- spin_unlock(&inode_lock); -- return NULL; --} -- --/** -- * ilookup5 - search for an inode in the inode cache -- * @sb: super block of file system to search -- * @hashval: hash value (usually inode number) to search for -- * @test: callback used for comparisons between inodes -- * @data: opaque data pointer to pass to @test -- * -- * ilookup5() uses ifind() to search for the inode specified by @hashval and -- * @data in the inode cache. This is a generalized version of ilookup() for -- * file systems where the inode number is not sufficient for unique -- * identification of an inode. -- * -- * If the inode is in the cache, the inode is returned with an incremented -- * reference count. -- * -- * Otherwise NULL is returned. 
-- * -- * Note, @test is called with the inode_lock held, so can't sleep. -- */ --struct inode *ilookup5(struct super_block *sb, unsigned long hashval, -- int (*test)(struct inode *, void *), void *data) --{ -- struct hlist_head *head = inode_hashtable + hash(sb, hashval); -- -- return ifind(sb, head, test, data); --} --EXPORT_SYMBOL(ilookup5); -- --/** -- * ilookup - search for an inode in the inode cache -- * @sb: super block of file system to search -- * @ino: inode number to search for -- * -- * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache. -- * This is for file systems where the inode number is sufficient for unique -- * identification of an inode. -- * -- * If the inode is in the cache, the inode is returned with an incremented -- * reference count. -- * -- * Otherwise NULL is returned. -- */ --struct inode *ilookup(struct super_block *sb, unsigned long ino) --{ -- struct hlist_head *head = inode_hashtable + hash(sb, ino); -- -- return ifind_fast(sb, head, ino); --} --EXPORT_SYMBOL(ilookup); -- --/** -- * iget5_locked - obtain an inode from a mounted file system -- * @sb: super block of file system -- * @hashval: hash value (usually inode number) to get -- * @test: callback used for comparisons between inodes -- * @set: callback used to initialize a new struct inode -- * @data: opaque data pointer to pass to @test and @set -- * -- * This is iget() without the read_inode() portion of get_new_inode(). -- * -- * iget5_locked() uses ifind() to search for the inode specified by @hashval -- * and @data in the inode cache and if present it is returned with an increased -- * reference count. This is a generalized version of iget_locked() for file -- * systems where the inode number is not sufficient for unique identification -- * of an inode. -- * -- * If the inode is not in cache, get_new_inode() is called to allocate a new -- * inode and this is returned locked, hashed, and with the I_NEW flag set. 
The -- * file system gets to fill it in before unlocking it via unlock_new_inode(). -- * -- * Note both @test and @set are called with the inode_lock held, so can't sleep. -- */ --struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, -- int (*test)(struct inode *, void *), -- int (*set)(struct inode *, void *), void *data) --{ -- struct hlist_head *head = inode_hashtable + hash(sb, hashval); -- struct inode *inode; -- -- inode = ifind(sb, head, test, data); -- if (inode) -- return inode; -- /* -- * get_new_inode() will do the right thing, re-trying the search -- * in case it had to block at any point. -- */ -- return get_new_inode(sb, head, test, set, data); --} --EXPORT_SYMBOL(iget5_locked); -- --/** -- * iget_locked - obtain an inode from a mounted file system -- * @sb: super block of file system -- * @ino: inode number to get -- * -- * This is iget() without the read_inode() portion of get_new_inode_fast(). -- * -- * iget_locked() uses ifind_fast() to search for the inode specified by @ino in -- * the inode cache and if present it is returned with an increased reference -- * count. This is for file systems where the inode number is sufficient for -- * unique identification of an inode. -- * -- * If the inode is not in cache, get_new_inode_fast() is called to allocate a -- * new inode and this is returned locked, hashed, and with the I_NEW flag set. -- * The file system gets to fill it in before unlocking it via -- * unlock_new_inode(). -- */ --struct inode *iget_locked(struct super_block *sb, unsigned long ino) --{ -- struct hlist_head *head = inode_hashtable + hash(sb, ino); -- struct inode *inode; -- -- inode = ifind_fast(sb, head, ino); -- if (inode) -- return inode; -- /* -- * get_new_inode_fast() will do the right thing, re-trying the search -- * in case it had to block at any point. 
-- */ -- return get_new_inode_fast(sb, head, ino); --} --EXPORT_SYMBOL(iget_locked); -- --/** -- * __insert_inode_hash - hash an inode -- * @inode: unhashed inode -- * @hashval: unsigned long value used to locate this object in the -- * inode_hashtable. -- * -- * Add an inode to the inode hash for this superblock. If the inode -- * has no superblock it is added to a separate anonymous chain. -- */ -- --void __insert_inode_hash(struct inode *inode, unsigned long hashval) --{ -- struct hlist_head *head = &anon_hash_chain; -- if (inode->i_sb) -- head = inode_hashtable + hash(inode->i_sb, hashval); -- spin_lock(&inode_lock); -- hlist_add_head(&inode->i_hash, head); -- spin_unlock(&inode_lock); --} -- --/** -- * remove_inode_hash - remove an inode from the hash -- * @inode: inode to unhash -- * -- * Remove an inode from the superblock or anonymous hash. -- */ -- --void remove_inode_hash(struct inode *inode) --{ -- spin_lock(&inode_lock); -- hlist_del_init(&inode->i_hash); -- spin_unlock(&inode_lock); --} -- --void generic_delete_inode(struct inode *inode) -+*** 949,7 **** 4 void generic_delete_inode(struct inode * - { - struct super_operations *op = inode->i_sb->s_op; - -| <<<--hlist_del_init-->>><<<++list_del_init++>>>(&inode->i_hash); - list_del_init(&inode->i_list); - inode->i_state|=I_FREEING; - inodes_stat.nr_inodes--; -- spin_unlock(&inode_lock); -- -- if (inode->i_data.nrpages) -- truncate_inode_pages(&inode->i_data, 0); -- -- security_inode_delete(inode); -- -- if (op->delete_inode) { -- void (*delete)(struct inode *) = op->delete_inode; -- if (!is_bad_inode(inode)) -- DQUOT_INIT(inode); -- /* s_op->delete_inode internally recalls clear_inode() */ -+*** 968,6 **** 5 void generic_delete_inode(struct inode * - delete(inode); - } else - clear_inode(inode); - if (inode->i_state != I_CLEAR) - BUG(); - destroy_inode(inode); --} --EXPORT_SYMBOL(generic_delete_inode); -- --static void generic_forget_inode(struct inode *inode) --{ -- struct super_block *sb = inode->i_sb; 
-- -- if (!hlist_unhashed(&inode->i_hash)) { -- if (!(inode->i_state & (I_DIRTY|I_LOCK))) { -- list_del(&inode->i_list); -- list_add(&inode->i_list, &inode_unused); -- } -- inodes_stat.nr_unused++; -- spin_unlock(&inode_lock); -- if (!sb || (sb->s_flags & MS_ACTIVE)) -- return; -- write_inode_now(inode, 1); -- spin_lock(&inode_lock); -- inodes_stat.nr_unused--; -- hlist_del_init(&inode->i_hash); -- } -- list_del_init(&inode->i_list); -- inode->i_state|=I_FREEING; -- inodes_stat.nr_inodes--; -- spin_unlock(&inode_lock); -- if (inode->i_data.nrpages) -- truncate_inode_pages(&inode->i_data, 0); -- clear_inode(inode); -- destroy_inode(inode); --} -- --/* -- * Normal UNIX filesystem behaviour: delete the -- * inode when the usage count drops to zero, and -- * i_nlink is zero. -- */ --static void generic_drop_inode(struct inode *inode) --{ -- if (!inode->i_nlink) -- generic_delete_inode(inode); -- else -- generic_forget_inode(inode); --} -- --/* -- * Called when we're dropping the last reference -- * to an inode. -- * -- * Call the FS "drop()" function, defaulting to -- * the legacy UNIX filesystem behaviour.. -- * -- * NOTE! NOTE! NOTE! We're called with the inode lock -- * held, and the drop function is supposed to release -- * the lock! -- */ --static inline void iput_final(struct inode *inode) --{ -- struct super_operations *op = inode->i_sb->s_op; -- void (*drop)(struct inode *) = generic_drop_inode; -- -- if (op && op->drop_inode) -- drop = op->drop_inode; -- drop(inode); --} -- --/** -- * iput - put an inode -- * @inode: inode to put -- * -- * Puts an inode, dropping its usage count. If the inode use count hits -- * zero the inode is also then freed and may be destroyed. 
-- */ -- --void iput(struct inode *inode) --{ -- if (inode) { -- struct super_operations *op = inode->i_sb->s_op; -- -- if (inode->i_state == I_CLEAR) -- BUG(); -- -- if (op && op->put_inode) -- op->put_inode(inode); -- -- if (atomic_dec_and_lock(&inode->i_count, &inode_lock)) -- iput_final(inode); -- } --} -- --/** -- * bmap - find a block number in a file -- * @inode: inode of file -- * @block: block to find -- * -- * Returns the block number on the device holding the inode that -- * is the disk block number for the block of the file requested. -- * That is, asked for block 4 of inode 1 the function will return the -- * disk block relative to the disk start that holds that block of the -- * file. -- */ -- --sector_t bmap(struct inode * inode, sector_t block) --{ -- sector_t res = 0; -- if (inode->i_mapping->a_ops->bmap) -- res = inode->i_mapping->a_ops->bmap(inode->i_mapping, block); -- return res; --} -- --/* -- * Return true if the filesystem which backs this inode considers the two -- * passed timespecs to be sufficiently different to warrant flushing the -- * altered time out to disk. -- */ --static int inode_times_differ(struct inode *inode, -- struct timespec *old, struct timespec *new) --{ -- if (IS_ONE_SECOND(inode)) -- return old->tv_sec != new->tv_sec; -- return !timespec_equal(old, new); --} -- --/** -- * update_atime - update the access time -- * @inode: inode accessed -- * -- * Update the accessed time on an inode and mark it for writeback. -- * This function automatically handles read only file systems and media, -- * as well as the "noatime" flag and inode specific "noatime" markers. 
-- */ -- --void update_atime(struct inode *inode) --{ -- struct timespec now; -- -- if (IS_NOATIME(inode)) -- return; -- if (IS_NODIRATIME(inode) && S_ISDIR(inode->i_mode)) -- return; -- if (IS_RDONLY(inode)) -- return; -- -- now = current_kernel_time(); -- if (inode_times_differ(inode, &inode->i_atime, &now)) { -- inode->i_atime = now; -- mark_inode_dirty_sync(inode); -- } else { -- if (!timespec_equal(&inode->i_atime, &now)) -- inode->i_atime = now; -- } --} -- --/** -- * inode_update_time - update mtime and ctime time -- * @inode: inode accessed -- * @ctime_too: update ctime too -- * -- * Update the mtime time on an inode and mark it for writeback. -- * When ctime_too is specified update the ctime too. -- */ -- --void inode_update_time(struct inode *inode, int ctime_too) --{ -- struct timespec now = current_kernel_time(); -- int sync_it = 0; -- -- if (inode_times_differ(inode, &inode->i_mtime, &now)) -- sync_it = 1; -- inode->i_mtime = now; -- -- if (ctime_too) { -- if (inode_times_differ(inode, &inode->i_ctime, &now)) -- sync_it = 1; -- inode->i_ctime = now; -- } -- if (sync_it) -- mark_inode_dirty_sync(inode); --} --EXPORT_SYMBOL(inode_update_time); -- --int inode_needs_sync(struct inode *inode) --{ -- if (IS_SYNC(inode)) -- return 1; -- if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) -- return 1; -- return 0; --} --EXPORT_SYMBOL(inode_needs_sync); -- --/* -- * Quota functions that want to walk the inode lists.. -- */ --#ifdef CONFIG_QUOTA -- --/* Functions back in dquot.c */ --void put_dquot_list(struct list_head *); --int remove_inode_dquot_ref(struct inode *, int, struct list_head *); -- --void remove_dquot_ref(struct super_block *sb, int type) --{ -- struct inode *inode; -- struct list_head *act_head; -- LIST_HEAD(tofree_head); -- -- if (!sb->dq_op) -- return; /* nothing to do */ -- spin_lock(&inode_lock); /* This lock is for inodes code */ -- /* We don't have to lock against quota code - test IS_QUOTAINIT is just for speedup... 
*/ -- -- list_for_each(act_head, &inode_in_use) { -- inode = list_entry(act_head, struct inode, i_list); -- if (inode->i_sb == sb && IS_QUOTAINIT(inode)) -- remove_inode_dquot_ref(inode, type, &tofree_head); -- } -- list_for_each(act_head, &inode_unused) { -- inode = list_entry(act_head, struct inode, i_list); -- if (inode->i_sb == sb && IS_QUOTAINIT(inode)) -- remove_inode_dquot_ref(inode, type, &tofree_head); -- } -- list_for_each(act_head, &sb->s_dirty) { -- inode = list_entry(act_head, struct inode, i_list); -- if (IS_QUOTAINIT(inode)) -- remove_inode_dquot_ref(inode, type, &tofree_head); -- } -- list_for_each(act_head, &sb->s_io) { -- inode = list_entry(act_head, struct inode, i_list); -- if (IS_QUOTAINIT(inode)) -- remove_inode_dquot_ref(inode, type, &tofree_head); -- } -- spin_unlock(&inode_lock); -- -- put_dquot_list(&tofree_head); --} -- --#endif -- --/* -- * Hashed waitqueues for wait_on_inode(). The table is pretty small - the -- * kernel doesn't lock many inodes at the same time. -- */ --#define I_WAIT_TABLE_ORDER 3 --static struct i_wait_queue_head { -- wait_queue_head_t wqh; --} ____cacheline_aligned_in_smp i_wait_queue_heads[1<<I_WAIT_TABLE_ORDER]; -- --/* -- * Return the address of the waitqueue_head to be used for this inode -- */ --static wait_queue_head_t *i_waitq_head(struct inode *inode) --{ -- return &i_wait_queue_heads[hash_ptr(inode, I_WAIT_TABLE_ORDER)].wqh; --} -- --void __wait_on_inode(struct inode *inode) --{ -- DECLARE_WAITQUEUE(wait, current); -- wait_queue_head_t *wq = i_waitq_head(inode); -- -- add_wait_queue(wq, &wait); --repeat: -- set_current_state(TASK_UNINTERRUPTIBLE); -- if (inode->i_state & I_LOCK) { -- schedule(); -- goto repeat; -- } -- remove_wait_queue(wq, &wait); -|<<<-- __set_current_state(-->>><<<++*** 1219,6 **** 6 repeat: -| current->state = ++>>>TASK_RUNNING<<<--)-->>>; - } - - void wake_up_inode(struct inode *inode) - { - wait_queue_head_t *wq = i_waitq_head(inode); -- -- /* -- * Prevent speculative execution through spin_unlock(&inode_lock); -- */ -- smp_mb(); -- if (waitqueue_active(wq)) -- wake_up_all(wq); --} -- --/* -- * Initialize the waitqueues and inode hash table.
-- */ --void __init inode_init(unsigned long mempages) --{ -- struct hlist_head *head; -- unsigned long order; -- unsigned int nr_hash; -- int i; -- -- for (i = 0; i < ARRAY_SIZE(i_wait_queue_heads); i++) -- init_waitqueue_head(&i_wait_queue_heads[i].wqh); -- -- mempages >>= (14 - PAGE_SHIFT); -- mempages *= sizeof(struct list_head); -- for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++) -- ; -- -- do { -- unsigned long tmp; -- -- nr_hash = (1UL << order) * PAGE_SIZE / -- sizeof(struct hlist_head); -- i_hash_mask = (nr_hash - 1); -- -- tmp = nr_hash; -- i_hash_shift = 0; -- while ((tmp >>= 1UL) != 0UL) -- i_hash_shift++; -- -- inode_hashtable = (struct hlist_head *) -- __get_free_pages(GFP_ATOMIC, order); -- } while (inode_hashtable == NULL && --order >= 0); -- -- printk("Inode-cache hash table entries: %d (order: %ld, %ld bytes)\n", -- nr_hash, order, (PAGE_SIZE << order)); -- -- if (!inode_hashtable) -- panic("Failed to allocate inode hash table\n"); -- -- head = inode_hashtable; -- i = nr_hash; -- do { -- INIT_HLIST_HEAD(head); -- head++; -- i--; -- } while (i); -- -- /* inode slab cache */ -- inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode), -- 0, SLAB_HWCACHE_ALIGN, init_once, -- NULL); -- if (!inode_cachep) -- panic("cannot create inode slab cache"); -- -- set_shrinker(DEFAULT_SEEKS, shrink_icache_memory); --} -- --void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev) --{ -- inode->i_mode = mode; -- if (S_ISCHR(mode)) { -- inode->i_fop = &def_chr_fops; -- inode->i_rdev = to_kdev_t(rdev); -- } else if (S_ISBLK(mode)) { -- inode->i_fop = &def_blk_fops; -- inode->i_rdev = to_kdev_t(rdev); -- } else if (S_ISFIFO(mode)) -- inode->i_fop = &def_fifo_fops; -- else if (S_ISSOCK(mode)) -- inode->i_fop = &bad_sock_fops; -- else -- printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n", -- mode); --} ./linux/inode-fullpatch/diff FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 
12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.851487517 +0000 @@ -1,18 +0,0 @@ -<<<<<<< found -||||||| expected -#define IDMAP_STATUS_LOOKUPFAIL IDMAP_STATUS_FAIL - - -/* XXX get (include) from bits/utmp.h */ -#define IDMAP_NAMESZ 128 - -======= -#define IDMAP_STATUS_LOOKUPFAIL IDMAP_STATUS_FAIL - - -#define IDMAP_MAXMSGSZ 256 - -/* XXX get (include) from bits/utmp.h */ -#define IDMAP_NAMESZ 128 - ->>>>>>> replacement ./linux/idmap.h/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:11.862123842 +0000 @@ -1,7269 +0,0 @@ -/* xfaces.c -- "Face" primitives. - Copyright (C) 1993, 1994, 1998, 1999, 2000, 2001 - Free Software Foundation. - -This file is part of GNU Emacs. - -GNU Emacs is free software; you can redistribute it and/or modify -it under the terms of the GNU General Public License as published by -the Free Software Foundation; either version 2, or (at your option) -any later version. - -GNU Emacs is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU General Public License for more details. - -You should have received a copy of the GNU General Public License -along with GNU Emacs; see the file COPYING. If not, write to -the Free Software Foundation, Inc., 59 Temple Place - Suite 330, -Boston, MA 02111-1307, USA. */ - -/* New face implementation by Gerd Moellmann <gerd@gnu.org>. */ - -/* Faces. - - When using Emacs with X, the display style of characters can be - changed by defining `faces'. Each face can specify the following - display attributes: - - 1. Font family name. - - 2. Relative proportionate width, aka character set width or set - width (swidth), e.g. `semi-compressed'. - - 3. Font height in 1/10pt. - - 4. Font weight, e.g. `bold'. - - 5. Font slant, e.g. `italic'. - - 6. Foreground color. - - 7. Background color. - - 8. 
Whether or not characters should be underlined, and in what color. - - 9. Whether or not characters should be displayed in inverse video. - - 10. A background stipple, a bitmap. - - 11. Whether or not characters should be overlined, and in what color. - - 12. Whether or not characters should be strike-through, and in what - color. - - 13. Whether or not a box should be drawn around characters, the box - type, and, for simple boxes, in what color. - - 14. Font or fontset pattern, or nil. This is a special attribute. - When this attribute is specified, the face uses a font opened by - that pattern as is. In addition, all the other font-related - attributes (1st thru 5th) are generated from the opened font name. - On the other hand, if one of the other font-related attributes are - specified, this attribute is set to nil. In that case, the face - doesn't inherit this attribute from the `default' face, and uses a - font determined by the other attributes (those may be inherited - from the `default' face). - - 15. A face name or list of face names from which to inherit attributes. - - 16. A specified average font width, which is invisible from Lisp, - and is used to ensure that a font specified on the command line, - for example, can be matched exactly. - - Faces are frame-local by nature because Emacs allows to define the - same named face (face names are symbols) differently for different - frames. Each frame has an alist of face definitions for all named - faces. The value of a named face in such an alist is a Lisp vector - with the symbol `face' in slot 0, and a slot for each of the face - attributes mentioned above. - - There is also a global face alist `Vface_new_frame_defaults'. Face - definitions from this list are used to initialize faces of newly - created frames. - - A face doesn't have to specify all attributes. Those not specified - have a value of `unspecified'. Faces specifying all attributes but - the 14th are called `fully-specified'. 
- - - Face merging. - - The display style of a given character in the text is determined by - combining several faces. This process is called `face merging'. - Any aspect of the display style that isn't specified by overlays or - text properties is taken from the `default' face. Since it is made - sure that the default face is always fully-specified, face merging - always results in a fully-specified face. - - - Face realization. - - After all face attributes for a character have been determined by - merging faces of that character, that face is `realized'. The - realization process maps face attributes to what is physically - available on the system where Emacs runs. The result is a - `realized face' in form of a struct face which is stored in the - face cache of the frame on which it was realized. - - Face realization is done in the context of the character to display - because different fonts may be used for different characters. In - other words, for characters that have different font - specifications, different realized faces are needed to display - them. - - Font specification is done by fontsets. See the comment in - fontset.c for the details. In the current implementation, all ASCII - characters share the same font in a fontset. - - Faces are at first realized for ASCII characters, and, at that - time, assigned a specific realized fontset. Hereafter, we call - such a face as `ASCII face'. When a face for a multibyte character - is realized, it inherits (thus shares) a fontset of an ASCII face - that has the same attributes other than font-related ones. - - Thus, all realized face have a realized fontset. - - - Unibyte text. - - Unibyte text (i.e. raw 8-bit characters) is displayed with the same - font as ASCII characters. That is because it is expected that - unibyte text users specify a font that is suitable both for ASCII - and raw 8-bit characters. - - - Font selection. 
- - Font selection tries to find the best available matching font for a - given (character, face) combination. - - If the face specifies a fontset name, that fontset determines a - pattern for fonts of the given character. If the face specifies a - font name or the other font-related attributes, a fontset is - realized from the default fontset. In that case, that - specification determines a pattern for ASCII characters and the - default fontset determines a pattern for multibyte characters. - - Available fonts on the system on which Emacs runs are then matched - against the font pattern. The result of font selection is the best - match for the given face attributes in this font list. - - Font selection can be influenced by the user. - - 1. The user can specify the relative importance he gives the face - attributes width, height, weight, and slant by setting - face-font-selection-order (faces.el) to a list of face attribute - names. The default is '(:width :height :weight :slant), and means - that font selection first tries to find a good match for the font - width specified by a face, then---within fonts with that - width---tries to find a best match for the specified font height, - etc. - - 2. Setting face-font-family-alternatives allows the user to - specify alternative font families to try if a family specified by a - face doesn't exist. - - 3. Setting face-font-registry-alternatives allows the user to - specify all alternative font registries to try for a face - specifying a registry. - - 4. Setting face-ignored-fonts allows the user to ignore specific - fonts. - - - Character composition. - - Usually, the realization process is already finished when Emacs - actually reflects the desired glyph matrix on the screen. However, - on displaying a composition (sequence of characters to be composed - on the screen), a suitable font for the components of the - composition is selected and realized while drawing them on the - screen, i.e. 
the realization process is delayed but in principle - the same. - - - Initialization of basic faces. - - The faces `default', `modeline' are considered `basic faces'. - When redisplay happens the first time for a newly created frame, - basic faces are realized for CHARSET_ASCII. Frame parameters are - used to fill in unspecified attributes of the default face. */ - -#include -#include -#include -#include "lisp.h" -#include "charset.h" -#include "keyboard.h" -#include "frame.h" - -#ifdef HAVE_WINDOW_SYSTEM -#include "fontset.h" -#endif /* HAVE_WINDOW_SYSTEM */ - -#ifdef HAVE_X_WINDOWS -#include "xterm.h" -#ifdef USE_MOTIF -#include -#include -#endif /* USE_MOTIF */ -#endif /* HAVE_X_WINDOWS */ - -#ifdef MSDOS -#include "dosfns.h" -#endif - -#ifdef WINDOWSNT -#include "w32term.h" -#include "fontset.h" -/* Redefine X specifics to W32 equivalents to avoid cluttering the - code with #ifdef blocks. */ -#undef FRAME_X_DISPLAY_INFO -#define FRAME_X_DISPLAY_INFO FRAME_W32_DISPLAY_INFO -#define x_display_info w32_display_info -#define FRAME_X_FONT_TABLE FRAME_W32_FONT_TABLE -#define check_x check_w32 -#define x_list_fonts w32_list_fonts -#define GCGraphicsExposures 0 -/* For historic reasons, FONT_WIDTH refers to average width on W32, - not maximum as on X. Redefine here. 
*/ -#undef FONT_WIDTH -#define FONT_WIDTH FONT_MAX_WIDTH -#endif /* WINDOWSNT */ - -#ifdef macintosh -#include "macterm.h" -#define x_display_info mac_display_info -#define check_x check_mac - -extern XGCValues *XCreateGC (void *, WindowPtr, unsigned long, XGCValues *); - -static INLINE GC -x_create_gc (f, mask, xgcv) - struct frame *f; - unsigned long mask; - XGCValues *xgcv; -{ - GC gc; - gc = XCreateGC (FRAME_MAC_DISPLAY (f), FRAME_MAC_WINDOW (f), mask, xgcv); - return gc; -} - -static INLINE void -x_free_gc (f, gc) - struct frame *f; - GC gc; -{ - XFreeGC (FRAME_MAC_DISPLAY (f), gc); -} -#endif - -#include "buffer.h" -#include "dispextern.h" -#include "blockinput.h" -#include "window.h" -#include "intervals.h" - -#ifdef HAVE_X_WINDOWS - -/* Compensate for a bug in Xos.h on some systems, on which it requires - time.h. On some such systems, Xos.h tries to redefine struct - timeval and struct timezone if USG is #defined while it is - #included. */ - -#ifdef XOS_NEEDS_TIME_H -#include -#undef USG -#include -#define USG -#define __TIMEVAL__ -#else /* not XOS_NEEDS_TIME_H */ -#include -#endif /* not XOS_NEEDS_TIME_H */ - -#endif /* HAVE_X_WINDOWS */ - -#include -#include - -#ifndef max -#define max(A, B) ((A) > (B) ? (A) : (B)) -#define min(A, B) ((A) < (B) ? (A) : (B)) -#define abs(X) ((X) < 0 ? -(X) : (X)) -#endif - -/* Number of pt per inch (from the TeXbook). */ - -#define PT_PER_INCH 72.27 - -/* Non-zero if face attribute ATTR is unspecified. */ - -#define UNSPECIFIEDP(ATTR) EQ ((ATTR), Qunspecified) - -/* Value is the number of elements of VECTOR. */ - -#define DIM(VECTOR) (sizeof (VECTOR) / sizeof *(VECTOR)) - -/* Make a copy of string S on the stack using alloca. Value is a pointer - to the copy. */ - -#define STRDUPA(S) strcpy ((char *) alloca (strlen ((S)) + 1), (S)) - -/* Make a copy of the contents of Lisp string S on the stack using - alloca. Value is a pointer to the copy. 
*/ - -#define LSTRDUPA(S) STRDUPA (XSTRING ((S))->data) - -/* Size of hash table of realized faces in face caches (should be a - prime number). */ - -#define FACE_CACHE_BUCKETS_SIZE 1001 - -/* A definition of XColor for non-X frames. */ - -#ifndef HAVE_X_WINDOWS - -typedef struct -{ - unsigned long pixel; - unsigned short red, green, blue; - char flags; - char pad; -} -XColor; - -#endif /* not HAVE_X_WINDOWS */ - -/* Keyword symbols used for face attribute names. */ - -Lisp_Object QCfamily, QCheight, QCweight, QCslant, QCunderline; -Lisp_Object QCinverse_video, QCforeground, QCbackground, QCstipple; -Lisp_Object QCwidth, QCfont, QCbold, QCitalic; -Lisp_Object QCreverse_video; -Lisp_Object QCoverline, QCstrike_through, QCbox, QCinherit; - -/* Symbols used for attribute values. */ - -Lisp_Object Qnormal, Qbold, Qultra_light, Qextra_light, Qlight; -Lisp_Object Qsemi_light, Qsemi_bold, Qextra_bold, Qultra_bold; -Lisp_Object Qoblique, Qitalic, Qreverse_oblique, Qreverse_italic; -Lisp_Object Qultra_condensed, Qextra_condensed, Qcondensed; -Lisp_Object Qsemi_condensed, Qsemi_expanded, Qexpanded, Qextra_expanded; -Lisp_Object Qultra_expanded; -Lisp_Object Qreleased_button, Qpressed_button; -Lisp_Object QCstyle, QCcolor, QCline_width; -Lisp_Object Qunspecified; - -char unspecified_fg[] = "unspecified-fg", unspecified_bg[] = "unspecified-bg"; - -/* The name of the function to call when the background of the frame - has changed, frame_update_face_colors. */ - -Lisp_Object Qframe_update_face_colors; - -/* Names of basic faces. */ - -Lisp_Object Qdefault, Qtool_bar, Qregion, Qfringe; -Lisp_Object Qheader_line, Qscroll_bar, Qcursor, Qborder, Qmouse, Qmenu; -extern Lisp_Object Qmode_line; - -/* The symbol `face-alias'. A symbols having that property is an - alias for another face. Value of the property is the name of - the aliased face. */ - -Lisp_Object Qface_alias; - -/* Names of frame parameters related to faces. 
*/ - -extern Lisp_Object Qscroll_bar_foreground, Qscroll_bar_background; -extern Lisp_Object Qborder_color, Qcursor_color, Qmouse_color; - -/* Default stipple pattern used on monochrome displays. This stipple - pattern is used on monochrome displays instead of shades of gray - for a face background color. See `set-face-stipple' for possible - values for this variable. */ - -Lisp_Object Vface_default_stipple; - -/* Alist of alternative font families. Each element is of the form - (FAMILY FAMILY1 FAMILY2 ...). If fonts of FAMILY can't be loaded, - try FAMILY1, then FAMILY2, ... */ - -Lisp_Object Vface_alternative_font_family_alist; - -/* Alist of alternative font registries. Each element is of the form - (REGISTRY REGISTRY1 REGISTRY2...). If fonts of REGISTRY can't be - loaded, try REGISTRY1, then REGISTRY2, ... */ - -Lisp_Object Vface_alternative_font_registry_alist; - -/* Allowed scalable fonts. A value of nil means don't allow any - scalable fonts. A value of t means allow the use of any scalable - font. Otherwise, value must be a list of regular expressions. A - font may be scaled if its name matches a regular expression in the - list. */ - -Lisp_Object Vscalable_fonts_allowed, Qscalable_fonts_allowed; - -/* List of regular expressions that matches names of fonts to ignore. */ - -Lisp_Object Vface_ignored_fonts; - -/* Maximum number of fonts to consider in font_list. If not an - integer > 0, DEFAULT_FONT_LIST_LIMIT is used instead. */ - -Lisp_Object Vfont_list_limit; -#define DEFAULT_FONT_LIST_LIMIT 100 - -/* The symbols `foreground-color' and `background-color' which can be - used as part of a `face' property. This is for compatibility with - Emacs 20.2. */ - -Lisp_Object Qforeground_color, Qbackground_color; - -/* The symbols `face' and `mouse-face' used as text properties. */ - -Lisp_Object Qface; -extern Lisp_Object Qmouse_face; - -/* Error symbol for wrong_type_argument in load_pixmap. */ - -Lisp_Object Qbitmap_spec_p; - -/* Alist of global face definitions. 
Each element is of the form - (FACE . LFACE) where FACE is a symbol naming a face and LFACE - is a Lisp vector of face attributes. These faces are used - to initialize faces for new frames. */ - -Lisp_Object Vface_new_frame_defaults; - -/* The next ID to assign to Lisp faces. */ - -static int next_lface_id; - -/* A vector mapping Lisp face Id's to face names. */ - -static Lisp_Object *lface_id_to_name; -static int lface_id_to_name_size; - -/* TTY color-related functions (defined in tty-colors.el). */ - -Lisp_Object Qtty_color_desc, Qtty_color_by_index; - -/* The name of the function used to compute colors on TTYs. */ - -Lisp_Object Qtty_color_alist; - -/* An alist of defined terminal colors and their RGB values. */ - -Lisp_Object Vtty_defined_color_alist; - -/* Counter for calls to clear_face_cache. If this counter reaches - CLEAR_FONT_TABLE_COUNT, and a frame has more than - CLEAR_FONT_TABLE_NFONTS load, unused fonts are freed. */ - -static int clear_font_table_count; -#define CLEAR_FONT_TABLE_COUNT 100 -#define CLEAR_FONT_TABLE_NFONTS 10 - -/* Non-zero means face attributes have been changed since the last - redisplay. Used in redisplay_internal. */ - -int face_change_count; - -/* Non-zero means don't display bold text if a face's foreground - and background colors are the inverse of the default colors of the - display. This is a kluge to suppress `bold black' foreground text - which is hard to read on an LCD monitor. */ - -int tty_suppress_bold_inverse_default_colors_p; - -/* A list of the form `((x . y))' used to avoid consing in - Finternal_set_lisp_face_attribute. */ - -static Lisp_Object Vparam_value_alist; - -/* The total number of colors currently allocated. */ - -#if GLYPH_DEBUG -static int ncolors_allocated; -static int npixmaps_allocated; -static int ngcs; -#endif - -/* Non-zero means the definition of the `menu' face for new frames has - been changed. */ - -int menu_face_changed_default; - - -/* Function prototypes. 
*/ - -struct font_name; -struct table_entry; - -static void map_tty_color P_ ((struct frame *, struct face *, - enum lface_attribute_index, int *)); -static Lisp_Object resolve_face_name P_ ((Lisp_Object)); -static int may_use_scalable_font_p P_ ((char *)); -static void set_font_frame_param P_ ((Lisp_Object, Lisp_Object)); -static int better_font_p P_ ((int *, struct font_name *, struct font_name *, - int, int)); -static int x_face_list_fonts P_ ((struct frame *, char *, - struct font_name *, int, int)); -static int font_scalable_p P_ ((struct font_name *)); -static int get_lface_attributes P_ ((struct frame *, Lisp_Object, Lisp_Object *, int)); -static int load_pixmap P_ ((struct frame *, Lisp_Object, unsigned *, unsigned *)); -static unsigned char *xstrlwr P_ ((unsigned char *)); -static void signal_error P_ ((char *, Lisp_Object)); -static struct frame *frame_or_selected_frame P_ ((Lisp_Object, int)); -static void load_face_font P_ ((struct frame *, struct face *, int)); -static void load_face_colors P_ ((struct frame *, struct face *, Lisp_Object *)); -static void free_face_colors P_ ((struct frame *, struct face *)); -static int face_color_gray_p P_ ((struct frame *, char *)); -static char *build_font_name P_ ((struct font_name *)); -static void free_font_names P_ ((struct font_name *, int)); -static int sorted_font_list P_ ((struct frame *, char *, - int (*cmpfn) P_ ((const void *, const void *)), - struct font_name **)); -static int font_list_1 P_ ((struct frame *, Lisp_Object, Lisp_Object, - Lisp_Object, struct font_name **)); -static int font_list P_ ((struct frame *, Lisp_Object, Lisp_Object, - Lisp_Object, struct font_name **)); -static int try_font_list P_ ((struct frame *, Lisp_Object *, - Lisp_Object, Lisp_Object, struct font_name **)); -static int try_alternative_families P_ ((struct frame *f, Lisp_Object, - Lisp_Object, struct font_name **)); -static int cmp_font_names P_ ((const void *, const void *)); -static struct face *realize_face P_ ((struct 
face_cache *, Lisp_Object *, int, - struct face *, int)); -static struct face *realize_x_face P_ ((struct face_cache *, - Lisp_Object *, int, struct face *)); -static struct face *realize_tty_face P_ ((struct face_cache *, - Lisp_Object *, int)); -static int realize_basic_faces P_ ((struct frame *)); -static int realize_default_face P_ ((struct frame *)); -static void realize_named_face P_ ((struct frame *, Lisp_Object, int)); -static int lface_fully_specified_p P_ ((Lisp_Object *)); -static int lface_equal_p P_ ((Lisp_Object *, Lisp_Object *)); -static unsigned hash_string_case_insensitive P_ ((Lisp_Object)); -static unsigned lface_hash P_ ((Lisp_Object *)); -static int lface_same_font_attributes_p P_ ((Lisp_Object *, Lisp_Object *)); -static struct face_cache *make_face_cache P_ ((struct frame *)); -static void free_realized_face P_ ((struct frame *, struct face *)); -static void clear_face_gcs P_ ((struct face_cache *)); -static void free_face_cache P_ ((struct face_cache *)); -static int face_numeric_weight P_ ((Lisp_Object)); -static int face_numeric_slant P_ ((Lisp_Object)); -static int face_numeric_swidth P_ ((Lisp_Object)); -static int face_fontset P_ ((Lisp_Object *)); -static char *choose_face_font P_ ((struct frame *, Lisp_Object *, int, int)); -static void merge_face_vectors P_ ((struct frame *, Lisp_Object *, Lisp_Object*, Lisp_Object)); -static void merge_face_inheritance P_ ((struct frame *f, Lisp_Object, - Lisp_Object *, Lisp_Object)); -static void merge_face_vector_with_property P_ ((struct frame *, Lisp_Object *, - Lisp_Object)); -static int set_lface_from_font_name P_ ((struct frame *, Lisp_Object, - Lisp_Object, int, int)); -static Lisp_Object lface_from_face_name P_ ((struct frame *, Lisp_Object, int)); -static struct face *make_realized_face P_ ((Lisp_Object *)); -static void free_realized_faces P_ ((struct face_cache *)); -static char *best_matching_font P_ ((struct frame *, Lisp_Object *, - struct font_name *, int, int)); -static void 
cache_face P_ ((struct face_cache *, struct face *, unsigned)); -static void uncache_face P_ ((struct face_cache *, struct face *)); -static int xlfd_numeric_slant P_ ((struct font_name *)); -static int xlfd_numeric_weight P_ ((struct font_name *)); -static int xlfd_numeric_swidth P_ ((struct font_name *)); -static Lisp_Object xlfd_symbolic_slant P_ ((struct font_name *)); -static Lisp_Object xlfd_symbolic_weight P_ ((struct font_name *)); -static Lisp_Object xlfd_symbolic_swidth P_ ((struct font_name *)); -static int xlfd_fixed_p P_ ((struct font_name *)); -static int xlfd_numeric_value P_ ((struct table_entry *, int, struct font_name *, - int, int)); -static Lisp_Object xlfd_symbolic_value P_ ((struct table_entry *, int, - struct font_name *, int, - Lisp_Object)); -static struct table_entry *xlfd_lookup_field_contents P_ ((struct table_entry *, int, - struct font_name *, int)); - -#ifdef HAVE_WINDOW_SYSTEM - -static int split_font_name P_ ((struct frame *, struct font_name *, int)); -static int xlfd_point_size P_ ((struct frame *, struct font_name *)); -static void sort_fonts P_ ((struct frame *, struct font_name *, int, - int (*cmpfn) P_ ((const void *, const void *)))); -static GC x_create_gc P_ ((struct frame *, unsigned long, XGCValues *)); -static void x_free_gc P_ ((struct frame *, GC)); -static void clear_font_table P_ ((struct x_display_info *)); - -#ifdef WINDOWSNT -extern Lisp_Object w32_list_fonts P_ ((struct frame *, Lisp_Object, int, int)); -#endif /* WINDOWSNT */ - -#ifdef USE_X_TOOLKIT -static void x_update_menu_appearance P_ ((struct frame *)); -#endif /* USE_X_TOOLKIT */ - -#endif /* HAVE_WINDOW_SYSTEM */ - - -/*********************************************************************** - Utilities - ***********************************************************************/ - -#ifdef HAVE_X_WINDOWS - -#ifdef DEBUG_X_COLORS - -/* The following is a poor mans infrastructure for debugging X color - allocation problems on displays with PseudoColor-8. 
Some X servers - like 3.3.5 XF86_SVGA with Matrox cards apparently don't implement - color reference counts completely so that they don't signal an - error when a color is freed whose reference count is already 0. - Other X servers do. To help me debug this, the following code - implements a simple reference counting schema of its own, for a - single display/screen. --gerd. */ - -/* Reference counts for pixel colors. */ - -int color_count[256]; - -/* Register color PIXEL as allocated. */ - -void -register_color (pixel) - unsigned long pixel; -{ - xassert (pixel < 256); - ++color_count[pixel]; -} - - -/* Register color PIXEL as deallocated. */ - -void -unregister_color (pixel) - unsigned long pixel; -{ - xassert (pixel < 256); - if (color_count[pixel] > 0) - --color_count[pixel]; - else - abort (); -} - - -/* Register N colors from PIXELS as deallocated. */ - -void -unregister_colors (pixels, n) - unsigned long *pixels; - int n; -{ - int i; - for (i = 0; i < n; ++i) - unregister_color (pixels[i]); -} - - -DEFUN ("dump-colors", Fdump_colors, Sdump_colors, 0, 0, 0, - "Dump currently allocated colors and their reference counts to stderr.") - () -{ - int i, n; - - fputc ('\n', stderr); - - for (i = n = 0; i < sizeof color_count / sizeof color_count[0]; ++i) - if (color_count[i]) - { - fprintf (stderr, "%3d: %5d", i, color_count[i]); - ++n; - if (n % 5 == 0) - fputc ('\n', stderr); - else - fputc ('\t', stderr); - } - - if (n % 5 != 0) - fputc ('\n', stderr); - return Qnil; -} - -#endif /* DEBUG_X_COLORS */ - - -/* Free colors used on frame F. PIXELS is an array of NPIXELS pixel - color values. Interrupt input must be blocked when this function - is called. */ - -void -x_free_colors (f, pixels, npixels) - struct frame *f; - unsigned long *pixels; - int npixels; -{ - int class = FRAME_X_DISPLAY_INFO (f)->visual->class; - - /* If display has an immutable color map, freeing colors is not - necessary and some servers don't allow it. So don't do it. 
*/ - if (class != StaticColor && class != StaticGray && class != TrueColor) - { -#ifdef DEBUG_X_COLORS - unregister_colors (pixels, npixels); -#endif - XFreeColors (FRAME_X_DISPLAY (f), FRAME_X_COLORMAP (f), - pixels, npixels, 0); - } -} - - -/* Free colors used on frame F. PIXELS is an array of NPIXELS pixel - color values. Interrupt input must be blocked when this function - is called. */ - -void -x_free_dpy_colors (dpy, screen, cmap, pixels, npixels) - Display *dpy; - Screen *screen; - Colormap cmap; - unsigned long *pixels; - int npixels; -{ - struct x_display_info *dpyinfo = x_display_info_for_display (dpy); - int class = dpyinfo->visual->class; - - /* If display has an immutable color map, freeing colors is not - necessary and some servers don't allow it. So don't do it. */ - if (class != StaticColor && class != StaticGray && class != TrueColor) - { -#ifdef DEBUG_X_COLORS - unregister_colors (pixels, npixels); -#endif - XFreeColors (dpy, cmap, pixels, npixels, 0); - } -} - - -/* Create and return a GC for use on frame F. GC values and mask - are given by XGCV and MASK. */ - -static INLINE GC -x_create_gc (f, mask, xgcv) - struct frame *f; - unsigned long mask; - XGCValues *xgcv; -{ - GC gc; - BLOCK_INPUT; - gc = XCreateGC (FRAME_X_DISPLAY (f), FRAME_X_WINDOW (f), mask, xgcv); - UNBLOCK_INPUT; - IF_DEBUG (++ngcs); - return gc; -} - - -/* Free GC which was used on frame F. */ - -static INLINE void -x_free_gc (f, gc) - struct frame *f; - GC gc; -{ - BLOCK_INPUT; - xassert (--ngcs >= 0); - XFreeGC (FRAME_X_DISPLAY (f), gc); - UNBLOCK_INPUT; -} - -#endif /* HAVE_X_WINDOWS */ - -#ifdef WINDOWSNT -/* W32 emulation of GCs */ - -static INLINE GC -x_create_gc (f, mask, xgcv) - struct frame *f; - unsigned long mask; - XGCValues *xgcv; -{ - GC gc; - BLOCK_INPUT; - gc = XCreateGC (NULL, FRAME_W32_WINDOW (f), mask, xgcv); - UNBLOCK_INPUT; - IF_DEBUG (++ngcs); - return gc; -} - - -/* Free GC which was used on frame F. 
*/ - -static INLINE void -x_free_gc (f, gc) - struct frame *f; - GC gc; -{ - BLOCK_INPUT; - xassert (--ngcs >= 0); - xfree (gc); - UNBLOCK_INPUT; -} - -#endif /* WINDOWSNT */ - -/* Like stricmp. Used to compare parts of font names which are in - ISO8859-1. */ - -int -xstricmp (s1, s2) - unsigned char *s1, *s2; -{ - while (*s1 && *s2) - { - unsigned char c1 = tolower (*s1); - unsigned char c2 = tolower (*s2); - if (c1 != c2) - return c1 < c2 ? -1 : 1; - ++s1, ++s2; - } - - if (*s1 == 0) - return *s2 == 0 ? 0 : -1; - return 1; -} - - -/* Like strlwr, which might not always be available. */ - -static unsigned char * -xstrlwr (s) - unsigned char *s; -{ - unsigned char *p = s; - - for (p = s; *p; ++p) - *p = tolower (*p); - - return s; -} - - -/* Signal `error' with message S, and additional argument ARG. */ - -static void -signal_error (s, arg) - char *s; - Lisp_Object arg; -{ - Fsignal (Qerror, Fcons (build_string (s), Fcons (arg, Qnil))); -} - - -/* If FRAME is nil, return a pointer to the selected frame. - Otherwise, check that FRAME is a live frame, and return a pointer - to it. NPARAM is the parameter number of FRAME, for - CHECK_LIVE_FRAME. This is here because it's a frequent pattern in - Lisp function definitions. */ - -static INLINE struct frame * -frame_or_selected_frame (frame, nparam) - Lisp_Object frame; - int nparam; -{ - if (NILP (frame)) - frame = selected_frame; - - CHECK_LIVE_FRAME (frame, nparam); - return XFRAME (frame); -} - - -/*********************************************************************** - Frames and faces - ***********************************************************************/ - -/* Initialize face cache and basic faces for frame F. */ - -void -init_frame_faces (f) - struct frame *f; -{ - /* Make a face cache, if F doesn't have one. */ - if (FRAME_FACE_CACHE (f) == NULL) - FRAME_FACE_CACHE (f) = make_face_cache (f); - -#ifdef HAVE_WINDOW_SYSTEM - /* Make the image cache. 
*/ - if (FRAME_WINDOW_P (f)) - { - if (FRAME_X_IMAGE_CACHE (f) == NULL) - FRAME_X_IMAGE_CACHE (f) = make_image_cache (); - ++FRAME_X_IMAGE_CACHE (f)->refcount; - } -#endif /* HAVE_WINDOW_SYSTEM */ - - /* Realize basic faces. Must have enough information in frame - parameters to realize basic faces at this point. */ -#ifdef HAVE_X_WINDOWS - if (!FRAME_X_P (f) || FRAME_X_WINDOW (f)) -#endif -#ifdef WINDOWSNT - if (!FRAME_WINDOW_P (f) || FRAME_W32_WINDOW (f)) -#endif - if (!realize_basic_faces (f)) - abort (); -} - - -/* Free face cache of frame F. Called from Fdelete_frame. */ - -void -free_frame_faces (f) - struct frame *f; -{ - struct face_cache *face_cache = FRAME_FACE_CACHE (f); - - if (face_cache) - { - free_face_cache (face_cache); - FRAME_FACE_CACHE (f) = NULL; - } - -#ifdef HAVE_WINDOW_SYSTEM - if (FRAME_WINDOW_P (f)) - { - struct image_cache *image_cache = FRAME_X_IMAGE_CACHE (f); - if (image_cache) - { - --image_cache->refcount; - if (image_cache->refcount == 0) - free_image_cache (f); - } - } -#endif /* HAVE_WINDOW_SYSTEM */ -} - - -/* Clear face caches, and recompute basic faces for frame F. Call - this after changing frame parameters on which those faces depend, - or when realized faces have been freed due to changing attributes - of named faces. */ - -void -recompute_basic_faces (f) - struct frame *f; -{ - if (FRAME_FACE_CACHE (f)) - { - clear_face_cache (0); - if (!realize_basic_faces (f)) - abort (); - } -} - - -/* Clear the face caches of all frames. CLEAR_FONTS_P non-zero means - try to free unused fonts, too. */ - -void -clear_face_cache (clear_fonts_p) - int clear_fonts_p; -{ -#ifdef HAVE_WINDOW_SYSTEM - Lisp_Object tail, frame; - struct frame *f; - - if (clear_fonts_p - || ++clear_font_table_count == CLEAR_FONT_TABLE_COUNT) - { - struct x_display_info *dpyinfo; - - /* Fonts are common for frames on one display, i.e. on - one X screen. 
*/ - for (dpyinfo = x_display_list; dpyinfo; dpyinfo = dpyinfo->next) - if (dpyinfo->n_fonts > CLEAR_FONT_TABLE_NFONTS) - clear_font_table (dpyinfo); - - /* From time to time see if we can unload some fonts. This also - frees all realized faces on all frames. Fonts needed by - faces will be loaded again when faces are realized again. */ - clear_font_table_count = 0; - - FOR_EACH_FRAME (tail, frame) - { - struct frame *f = XFRAME (frame); - if (FRAME_WINDOW_P (f) - && FRAME_X_DISPLAY_INFO (f)->n_fonts > CLEAR_FONT_TABLE_NFONTS) - free_all_realized_faces (frame); - } - } - else - { - /* Clear GCs of realized faces. */ - FOR_EACH_FRAME (tail, frame) - { - f = XFRAME (frame); - if (FRAME_WINDOW_P (f)) - { - clear_face_gcs (FRAME_FACE_CACHE (f)); - clear_image_cache (f, 0); - } - } - } -#endif /* HAVE_WINDOW_SYSTEM */ -} - - -DEFUN ("clear-face-cache", Fclear_face_cache, Sclear_face_cache, 0, 1, 0, - "Clear face caches on all frames.\n\ -Optional THOROUGHLY non-nil means try to free unused fonts, too.") - (thoroughly) - Lisp_Object thoroughly; -{ - clear_face_cache (!NILP (thoroughly)); - ++face_change_count; - ++windows_or_buffers_changed; - return Qnil; -} - - - -#ifdef HAVE_WINDOW_SYSTEM - - -/* Remove fonts from the font table of DPYINFO except for the default - ASCII fonts of frames on that display. Called from clear_face_cache - from time to time. */ - -static void -clear_font_table (dpyinfo) - struct x_display_info *dpyinfo; -{ - int i; - - /* Free those fonts that are not used by frames on DPYINFO. */ - for (i = 0; i < dpyinfo->n_fonts; ++i) - { - struct font_info *font_info = dpyinfo->font_table + i; - Lisp_Object tail, frame; - - /* Check if slot is already free. */ - if (font_info->name == NULL) - continue; - - /* Don't free a default font of some frame on this display. 
*/ - FOR_EACH_FRAME (tail, frame) - { - struct frame *f = XFRAME (frame); - if (FRAME_WINDOW_P (f) - && FRAME_X_DISPLAY_INFO (f) == dpyinfo - && font_info->font == FRAME_FONT (f)) - break; - } - - if (!NILP (tail)) - continue; - - /* Free names. */ - if (font_info->full_name != font_info->name) - xfree (font_info->full_name); - xfree (font_info->name); - - /* Free the font. */ - BLOCK_INPUT; -#ifdef HAVE_X_WINDOWS - XFreeFont (dpyinfo->display, font_info->font); -#endif -#ifdef WINDOWSNT - w32_unload_font (dpyinfo, font_info->font); -#endif - UNBLOCK_INPUT; - - /* Mark font table slot free. */ - font_info->font = NULL; - font_info->name = font_info->full_name = NULL; - } -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - - -/*********************************************************************** - X Pixmaps - ***********************************************************************/ - -#ifdef HAVE_WINDOW_SYSTEM - -DEFUN ("bitmap-spec-p", Fbitmap_spec_p, Sbitmap_spec_p, 1, 1, 0, - "Value is non-nil if OBJECT is a valid bitmap specification.\n\ -A bitmap specification is either a string, a file name, or a list\n\ -(WIDTH HEIGHT DATA) where WIDTH is the pixel width of the bitmap,\n\ -HEIGHT is its height, and DATA is a string containing the bits of\n\ -the pixmap. Bits are stored row by row, each row occupies\n\ -(WIDTH + 7)/8 bytes.") - (object) - Lisp_Object object; -{ - int pixmap_p = 0; - - if (STRINGP (object)) - /* If OBJECT is a string, it's a file name. */ - pixmap_p = 1; - else if (CONSP (object)) - { - /* Otherwise OBJECT must be (WIDTH HEIGHT DATA), WIDTH and - HEIGHT must be integers > 0, and DATA must be string large - enough to hold a bitmap of the specified size. 
*/ - Lisp_Object width, height, data; - - height = width = data = Qnil; - - if (CONSP (object)) - { - width = XCAR (object); - object = XCDR (object); - if (CONSP (object)) - { - height = XCAR (object); - object = XCDR (object); - if (CONSP (object)) - data = XCAR (object); - } - } - - if (NATNUMP (width) && NATNUMP (height) && STRINGP (data)) - { - int bytes_per_row = ((XFASTINT (width) + BITS_PER_CHAR - 1) - / BITS_PER_CHAR); - if (STRING_BYTES (XSTRING (data)) >= bytes_per_row * XINT (height)) - pixmap_p = 1; - } - } - - return pixmap_p ? Qt : Qnil; -} - - -/* Load a bitmap according to NAME (which is either a file name or a - pixmap spec) for use on frame F. Value is the bitmap_id (see - xfns.c). If NAME is nil, return with a bitmap id of zero. If - bitmap cannot be loaded, display a message saying so, and return - zero. Store the bitmap width in *W_PTR and its height in *H_PTR, - if these pointers are not null. */ - -static int -load_pixmap (f, name, w_ptr, h_ptr) - FRAME_PTR f; - Lisp_Object name; - unsigned int *w_ptr, *h_ptr; -{ - int bitmap_id; - Lisp_Object tem; - - if (NILP (name)) - return 0; - - tem = Fbitmap_spec_p (name); - if (NILP (tem)) - wrong_type_argument (Qbitmap_spec_p, name); - - BLOCK_INPUT; - if (CONSP (name)) - { - /* Decode a bitmap spec into a bitmap. */ - - int h, w; - Lisp_Object bits; - - w = XINT (Fcar (name)); - h = XINT (Fcar (Fcdr (name))); - bits = Fcar (Fcdr (Fcdr (name))); - - bitmap_id = x_create_bitmap_from_data (f, XSTRING (bits)->data, - w, h); - } - else - { - /* It must be a string -- a file name. 
*/ - bitmap_id = x_create_bitmap_from_file (f, name); - } - UNBLOCK_INPUT; - - if (bitmap_id < 0) - { - add_to_log ("Invalid or undefined bitmap %s", name, Qnil); - bitmap_id = 0; - - if (w_ptr) - *w_ptr = 0; - if (h_ptr) - *h_ptr = 0; - } - else - { -#if GLYPH_DEBUG - ++npixmaps_allocated; -#endif - if (w_ptr) - *w_ptr = x_bitmap_width (f, bitmap_id); - - if (h_ptr) - *h_ptr = x_bitmap_height (f, bitmap_id); - } - - return bitmap_id; -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - - -/*********************************************************************** - Minimum font bounds - ***********************************************************************/ - -#ifdef HAVE_WINDOW_SYSTEM - -/* Update the line_height of frame F. Return non-zero if line height - changes. */ - -int -frame_update_line_height (f) - struct frame *f; -{ - int line_height, changed_p; - - line_height = FONT_HEIGHT (FRAME_FONT (f)); - changed_p = line_height != FRAME_LINE_HEIGHT (f); - FRAME_LINE_HEIGHT (f) = line_height; - return changed_p; -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - -/*********************************************************************** - Fonts - ***********************************************************************/ - -#ifdef HAVE_WINDOW_SYSTEM - -/* Load font of face FACE which is used on frame F to display - character C. The name of the font to load is determined by lface - and fontset of FACE. 
*/ - -static void -load_face_font (f, face, c) - struct frame *f; - struct face *face; - int c; -{ - struct font_info *font_info = NULL; - char *font_name; - - face->font_info_id = -1; - face->font = NULL; - - font_name = choose_face_font (f, face->lface, face->fontset, c); - if (!font_name) - return; - - BLOCK_INPUT; - font_info = FS_LOAD_FACE_FONT (f, c, font_name, face); - UNBLOCK_INPUT; - - if (font_info) - { - face->font_info_id = font_info->font_idx; - face->font = font_info->font; - face->font_name = font_info->full_name; - if (face->gc) - { - x_free_gc (f, face->gc); - face->gc = 0; - } - } - else - add_to_log ("Unable to load font %s", - build_string (font_name), Qnil); - xfree (font_name); -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - - -/*********************************************************************** - X Colors - ***********************************************************************/ - -/* A version of defined_color for non-X frames. */ - -int -tty_defined_color (f, color_name, color_def, alloc) - struct frame *f; - char *color_name; - XColor *color_def; - int alloc; -{ - Lisp_Object color_desc; - unsigned long color_idx = FACE_TTY_DEFAULT_COLOR; - unsigned long red = 0, green = 0, blue = 0; - int status = 1; - - if (*color_name && !NILP (Ffboundp (Qtty_color_desc))) - { - Lisp_Object frame; - - XSETFRAME (frame, f); - status = 0; - color_desc = call2 (Qtty_color_desc, build_string (color_name), frame); - if (CONSP (color_desc) && CONSP (XCDR (color_desc))) - { - color_idx = XINT (XCAR (XCDR (color_desc))); - if (CONSP (XCDR (XCDR (color_desc)))) - { - red = XINT (XCAR (XCDR (XCDR (color_desc)))); - green = XINT (XCAR (XCDR (XCDR (XCDR (color_desc))))); - blue = XINT (XCAR (XCDR (XCDR (XCDR (XCDR (color_desc)))))); - } - status = 1; - } - else if (NILP (Fsymbol_value (intern ("tty-defined-color-alist")))) - /* We were called early during startup, and the colors are not - yet set up in tty-defined-color-alist. 
Don't return a failure - indication, since this produces the annoying "Unable to - load color" messages in the *Messages* buffer. */ - status = 1; - } - if (color_idx == FACE_TTY_DEFAULT_COLOR && *color_name) - { - if (strcmp (color_name, "unspecified-fg") == 0) - color_idx = FACE_TTY_DEFAULT_FG_COLOR; - else if (strcmp (color_name, "unspecified-bg") == 0) - color_idx = FACE_TTY_DEFAULT_BG_COLOR; - } - - if (color_idx != FACE_TTY_DEFAULT_COLOR) - status = 1; - - color_def->pixel = color_idx; - color_def->red = red; - color_def->green = green; - color_def->blue = blue; - - return status; -} - - -/* Decide if color named COLOR_NAME is valid for the display - associated with the frame F; if so, return the rgb values in - COLOR_DEF. If ALLOC is nonzero, allocate a new colormap cell. - - This does the right thing for any type of frame. */ - -int -defined_color (f, color_name, color_def, alloc) - struct frame *f; - char *color_name; - XColor *color_def; - int alloc; -{ - if (!FRAME_WINDOW_P (f)) - return tty_defined_color (f, color_name, color_def, alloc); -#ifdef HAVE_X_WINDOWS - else if (FRAME_X_P (f)) - return x_defined_color (f, color_name, color_def, alloc); -#endif -#ifdef WINDOWSNT - else if (FRAME_W32_P (f)) - return w32_defined_color (f, color_name, color_def, alloc); -#endif -#ifdef macintosh - else if (FRAME_MAC_P (f)) - return mac_defined_color (f, color_name, color_def, alloc); -#endif - else - abort (); -} - - -/* Given the index IDX of a tty color on frame F, return its name, a - Lisp string. */ - -Lisp_Object -tty_color_name (f, idx) - struct frame *f; - int idx; -{ - if (idx >= 0 && !NILP (Ffboundp (Qtty_color_by_index))) - { - Lisp_Object frame; - Lisp_Object coldesc; - - XSETFRAME (frame, f); - coldesc = call2 (Qtty_color_by_index, make_number (idx), frame); - - if (!NILP (coldesc)) - return XCAR (coldesc); - } -#ifdef MSDOS - /* We can have an MSDOG frame under -nw for a short window of - opportunity before internal_terminal_init is called. DTRT. 
*/ - if (FRAME_MSDOS_P (f) && !inhibit_window_system) - return msdos_stdcolor_name (idx); -#endif - - if (idx == FACE_TTY_DEFAULT_FG_COLOR) - return build_string (unspecified_fg); - if (idx == FACE_TTY_DEFAULT_BG_COLOR) - return build_string (unspecified_bg); - -#ifdef WINDOWSNT - return vga_stdcolor_name (idx); -#endif - - return Qunspecified; -} - - -/* Return non-zero if COLOR_NAME is a shade of gray (or white or - black) on frame F. The algorithm is taken from 20.2 faces.el. */ - -static int -face_color_gray_p (f, color_name) - struct frame *f; - char *color_name; -{ - XColor color; - int gray_p; - - if (defined_color (f, color_name, &color, 0)) - gray_p = ((abs (color.red - color.green) - < max (color.red, color.green) / 20) - && (abs (color.green - color.blue) - < max (color.green, color.blue) / 20) - && (abs (color.blue - color.red) - < max (color.blue, color.red) / 20)); - else - gray_p = 0; - - return gray_p; -} - - -/* Return non-zero if color COLOR_NAME can be displayed on frame F. - BACKGROUND_P non-zero means the color will be used as background - color. */ - -static int -face_color_supported_p (f, color_name, background_p) - struct frame *f; - char *color_name; - int background_p; -{ - Lisp_Object frame; - XColor not_used; - - XSETFRAME (frame, f); - return (FRAME_WINDOW_P (f) - ? 
(!NILP (Fxw_display_color_p (frame)) - || xstricmp (color_name, "black") == 0 - || xstricmp (color_name, "white") == 0 - || (background_p - && face_color_gray_p (f, color_name)) - || (!NILP (Fx_display_grayscale_p (frame)) - && face_color_gray_p (f, color_name))) - : tty_defined_color (f, color_name, &not_used, 0)); -} - - -DEFUN ("color-gray-p", Fcolor_gray_p, Scolor_gray_p, 1, 2, 0, - "Return non-nil if COLOR is a shade of gray (or white or black).\n\ -FRAME specifies the frame and thus the display for interpreting COLOR.\n\ -If FRAME is nil or omitted, use the selected frame.") - (color, frame) - Lisp_Object color, frame; -{ - struct frame *f; - - CHECK_FRAME (frame, 0); - CHECK_STRING (color, 0); - f = XFRAME (frame); - return face_color_gray_p (f, XSTRING (color)->data) ? Qt : Qnil; -} - - -DEFUN ("color-supported-p", Fcolor_supported_p, - Scolor_supported_p, 2, 3, 0, - "Return non-nil if COLOR can be displayed on FRAME.\n\ -BACKGROUND-P non-nil means COLOR is used as a background.\n\ -If FRAME is nil or omitted, use the selected frame.\n\ -COLOR must be a valid color name.") - (color, frame, background_p) - Lisp_Object frame, color, background_p; -{ - struct frame *f; - - CHECK_FRAME (frame, 0); - CHECK_STRING (color, 0); - f = XFRAME (frame); - if (face_color_supported_p (f, XSTRING (color)->data, !NILP (background_p))) - return Qt; - return Qnil; -} - - -/* Load color with name NAME for use by face FACE on frame F. - TARGET_INDEX must be one of LFACE_FOREGROUND_INDEX, - LFACE_BACKGROUND_INDEX, LFACE_UNDERLINE_INDEX, LFACE_OVERLINE_INDEX, - LFACE_STRIKE_THROUGH_INDEX, or LFACE_BOX_INDEX. Value is the - pixel color. If color cannot be loaded, display a message, and - return the foreground, background or underline color of F, but - record that fact in flags of the face so that we don't try to free - these colors. 
*/ - -unsigned long -load_color (f, face, name, target_index) - struct frame *f; - struct face *face; - Lisp_Object name; - enum lface_attribute_index target_index; -{ - XColor color; - - xassert (STRINGP (name)); - xassert (target_index == LFACE_FOREGROUND_INDEX - || target_index == LFACE_BACKGROUND_INDEX - || target_index == LFACE_UNDERLINE_INDEX - || target_index == LFACE_OVERLINE_INDEX - || target_index == LFACE_STRIKE_THROUGH_INDEX - || target_index == LFACE_BOX_INDEX); - - /* if the color map is full, defined_color will return a best match - to the values in an existing cell. */ - if (!defined_color (f, XSTRING (name)->data, &color, 1)) - { - add_to_log ("Unable to load color \"%s\"", name, Qnil); - - switch (target_index) - { - case LFACE_FOREGROUND_INDEX: - face->foreground_defaulted_p = 1; - color.pixel = FRAME_FOREGROUND_PIXEL (f); - break; - - case LFACE_BACKGROUND_INDEX: - face->background_defaulted_p = 1; - color.pixel = FRAME_BACKGROUND_PIXEL (f); - break; - - case LFACE_UNDERLINE_INDEX: - face->underline_defaulted_p = 1; - color.pixel = FRAME_FOREGROUND_PIXEL (f); - break; - - case LFACE_OVERLINE_INDEX: - face->overline_color_defaulted_p = 1; - color.pixel = FRAME_FOREGROUND_PIXEL (f); - break; - - case LFACE_STRIKE_THROUGH_INDEX: - face->strike_through_color_defaulted_p = 1; - color.pixel = FRAME_FOREGROUND_PIXEL (f); - break; - - case LFACE_BOX_INDEX: - face->box_color_defaulted_p = 1; - color.pixel = FRAME_FOREGROUND_PIXEL (f); - break; - - default: - abort (); - } - } -#if GLYPH_DEBUG - else - ++ncolors_allocated; -#endif - - return color.pixel; -} - - -#ifdef HAVE_WINDOW_SYSTEM - -/* Load colors for face FACE which is used on frame F. Colors are - specified by slots LFACE_BACKGROUND_INDEX and LFACE_FOREGROUND_INDEX - of ATTRS. If the background color specified is not supported on F, - try to emulate gray colors with a stipple from Vface_default_stipple. 
*/ - -static void -load_face_colors (f, face, attrs) - struct frame *f; - struct face *face; - Lisp_Object *attrs; -{ - Lisp_Object fg, bg; - - bg = attrs[LFACE_BACKGROUND_INDEX]; - fg = attrs[LFACE_FOREGROUND_INDEX]; - - /* Swap colors if face is inverse-video. */ - if (EQ (attrs[LFACE_INVERSE_INDEX], Qt)) - { - Lisp_Object tmp; - tmp = fg; - fg = bg; - bg = tmp; - } - - /* Check for support for foreground, not for background because - face_color_supported_p is smart enough to know that grays are - "supported" as background because we are supposed to use stipple - for them. */ - if (!face_color_supported_p (f, XSTRING (bg)->data, 0) - && !NILP (Fbitmap_spec_p (Vface_default_stipple))) - { - x_destroy_bitmap (f, face->stipple); - face->stipple = load_pixmap (f, Vface_default_stipple, - &face->pixmap_w, &face->pixmap_h); - } - - face->background = load_color (f, face, bg, LFACE_BACKGROUND_INDEX); - face->foreground = load_color (f, face, fg, LFACE_FOREGROUND_INDEX); -} - - -/* Free color PIXEL on frame F. */ - -void -unload_color (f, pixel) - struct frame *f; - unsigned long pixel; -{ -#ifdef HAVE_X_WINDOWS - if (pixel != -1) - { - BLOCK_INPUT; - x_free_colors (f, &pixel, 1); - UNBLOCK_INPUT; - } -#endif -} - - -/* Free colors allocated for FACE. 
*/ - -static void -free_face_colors (f, face) - struct frame *f; - struct face *face; -{ -#ifdef HAVE_X_WINDOWS - if (face->colors_copied_bitwise_p) - return; - - BLOCK_INPUT; - - if (!face->foreground_defaulted_p) - { - x_free_colors (f, &face->foreground, 1); - IF_DEBUG (--ncolors_allocated); - } - - if (!face->background_defaulted_p) - { - x_free_colors (f, &face->background, 1); - IF_DEBUG (--ncolors_allocated); - } - - if (face->underline_p - && !face->underline_defaulted_p) - { - x_free_colors (f, &face->underline_color, 1); - IF_DEBUG (--ncolors_allocated); - } - - if (face->overline_p - && !face->overline_color_defaulted_p) - { - x_free_colors (f, &face->overline_color, 1); - IF_DEBUG (--ncolors_allocated); - } - - if (face->strike_through_p - && !face->strike_through_color_defaulted_p) - { - x_free_colors (f, &face->strike_through_color, 1); - IF_DEBUG (--ncolors_allocated); - } - - if (face->box != FACE_NO_BOX - && !face->box_color_defaulted_p) - { - x_free_colors (f, &face->box_color, 1); - IF_DEBUG (--ncolors_allocated); - } - - UNBLOCK_INPUT; -#endif /* HAVE_X_WINDOWS */ -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - - -/*********************************************************************** - XLFD Font Names - ***********************************************************************/ - -/* An enumerator for each field of an XLFD font name. */ - -enum xlfd_field -{ - XLFD_FOUNDRY, - XLFD_FAMILY, - XLFD_WEIGHT, - XLFD_SLANT, - XLFD_SWIDTH, - XLFD_ADSTYLE, - XLFD_PIXEL_SIZE, - XLFD_POINT_SIZE, - XLFD_RESX, - XLFD_RESY, - XLFD_SPACING, - XLFD_AVGWIDTH, - XLFD_REGISTRY, - XLFD_ENCODING, - XLFD_LAST -}; - -/* An enumerator for each possible slant value of a font. Taken from - the XLFD specification. */ - -enum xlfd_slant -{ - XLFD_SLANT_UNKNOWN, - XLFD_SLANT_ROMAN, - XLFD_SLANT_ITALIC, - XLFD_SLANT_OBLIQUE, - XLFD_SLANT_REVERSE_ITALIC, - XLFD_SLANT_REVERSE_OBLIQUE, - XLFD_SLANT_OTHER -}; - -/* Relative font weight according to XLFD documentation. 
*/ - -enum xlfd_weight -{ - XLFD_WEIGHT_UNKNOWN, - XLFD_WEIGHT_ULTRA_LIGHT, /* 10 */ - XLFD_WEIGHT_EXTRA_LIGHT, /* 20 */ - XLFD_WEIGHT_LIGHT, /* 30 */ - XLFD_WEIGHT_SEMI_LIGHT, /* 40: SemiLight, Book, ... */ - XLFD_WEIGHT_MEDIUM, /* 50: Medium, Normal, Regular, ... */ - XLFD_WEIGHT_SEMI_BOLD, /* 60: SemiBold, DemiBold, ... */ - XLFD_WEIGHT_BOLD, /* 70: Bold, ... */ - XLFD_WEIGHT_EXTRA_BOLD, /* 80: ExtraBold, Heavy, ... */ - XLFD_WEIGHT_ULTRA_BOLD /* 90: UltraBold, Black, ... */ -}; - -/* Relative proportionate width. */ - -enum xlfd_swidth -{ - XLFD_SWIDTH_UNKNOWN, - XLFD_SWIDTH_ULTRA_CONDENSED, /* 10 */ - XLFD_SWIDTH_EXTRA_CONDENSED, /* 20 */ - XLFD_SWIDTH_CONDENSED, /* 30: Condensed, Narrow, Compressed, ... */ - XLFD_SWIDTH_SEMI_CONDENSED, /* 40: semicondensed */ - XLFD_SWIDTH_MEDIUM, /* 50: Medium, Normal, Regular, ... */ - XLFD_SWIDTH_SEMI_EXPANDED, /* 60: SemiExpanded, DemiExpanded, ... */ - XLFD_SWIDTH_EXPANDED, /* 70: Expanded... */ - XLFD_SWIDTH_EXTRA_EXPANDED, /* 80: ExtraExpanded, Wide... */ - XLFD_SWIDTH_ULTRA_EXPANDED /* 90: UltraExpanded... */ -}; - -/* Structure used for tables mapping XLFD weight, slant, and width - names to numeric and symbolic values. */ - -struct table_entry -{ - char *name; - int numeric; - Lisp_Object *symbol; -}; - -/* Table of XLFD slant names and their numeric and symbolic - representations. This table must be sorted by slant names in - ascending order. */ - -static struct table_entry slant_table[] = -{ - {"i", XLFD_SLANT_ITALIC, &Qitalic}, - {"o", XLFD_SLANT_OBLIQUE, &Qoblique}, - {"ot", XLFD_SLANT_OTHER, &Qitalic}, - {"r", XLFD_SLANT_ROMAN, &Qnormal}, - {"ri", XLFD_SLANT_REVERSE_ITALIC, &Qreverse_italic}, - {"ro", XLFD_SLANT_REVERSE_OBLIQUE, &Qreverse_oblique} -}; - -/* Table of XLFD weight names. This table must be sorted by weight - names in ascending order. 
*/ - -static struct table_entry weight_table[] = -{ - {"black", XLFD_WEIGHT_ULTRA_BOLD, &Qultra_bold}, - {"bold", XLFD_WEIGHT_BOLD, &Qbold}, - {"book", XLFD_WEIGHT_SEMI_LIGHT, &Qsemi_light}, - {"demi", XLFD_WEIGHT_SEMI_BOLD, &Qsemi_bold}, - {"demibold", XLFD_WEIGHT_SEMI_BOLD, &Qsemi_bold}, - {"extralight", XLFD_WEIGHT_EXTRA_LIGHT, &Qextra_light}, - {"extrabold", XLFD_WEIGHT_EXTRA_BOLD, &Qextra_bold}, - {"heavy", XLFD_WEIGHT_EXTRA_BOLD, &Qextra_bold}, - {"light", XLFD_WEIGHT_LIGHT, &Qlight}, - {"medium", XLFD_WEIGHT_MEDIUM, &Qnormal}, - {"normal", XLFD_WEIGHT_MEDIUM, &Qnormal}, - {"regular", XLFD_WEIGHT_MEDIUM, &Qnormal}, - {"semibold", XLFD_WEIGHT_SEMI_BOLD, &Qsemi_bold}, - {"semilight", XLFD_WEIGHT_SEMI_LIGHT, &Qsemi_light}, - {"ultralight", XLFD_WEIGHT_ULTRA_LIGHT, &Qultra_light}, - {"ultrabold", XLFD_WEIGHT_ULTRA_BOLD, &Qultra_bold} -}; - -/* Table of XLFD width names. This table must be sorted by width - names in ascending order. */ - -static struct table_entry swidth_table[] = -{ - {"compressed", XLFD_SWIDTH_CONDENSED, &Qcondensed}, - {"condensed", XLFD_SWIDTH_CONDENSED, &Qcondensed}, - {"demiexpanded", XLFD_SWIDTH_SEMI_EXPANDED, &Qsemi_expanded}, - {"expanded", XLFD_SWIDTH_EXPANDED, &Qexpanded}, - {"extracondensed", XLFD_SWIDTH_EXTRA_CONDENSED, &Qextra_condensed}, - {"extraexpanded", XLFD_SWIDTH_EXTRA_EXPANDED, &Qextra_expanded}, - {"medium", XLFD_SWIDTH_MEDIUM, &Qnormal}, - {"narrow", XLFD_SWIDTH_CONDENSED, &Qcondensed}, - {"normal", XLFD_SWIDTH_MEDIUM, &Qnormal}, - {"regular", XLFD_SWIDTH_MEDIUM, &Qnormal}, - {"semicondensed", XLFD_SWIDTH_SEMI_CONDENSED, &Qsemi_condensed}, - {"semiexpanded", XLFD_SWIDTH_SEMI_EXPANDED, &Qsemi_expanded}, - {"ultracondensed", XLFD_SWIDTH_ULTRA_CONDENSED, &Qultra_condensed}, - {"ultraexpanded", XLFD_SWIDTH_ULTRA_EXPANDED, &Qultra_expanded}, - {"wide", XLFD_SWIDTH_EXTRA_EXPANDED, &Qextra_expanded} -}; - -/* Structure used to hold the result of splitting font names in XLFD - format into their fields. 
*/ - -struct font_name -{ - /* The original name which is modified destructively by - split_font_name. The pointer is kept here to be able to free it - if it was allocated from the heap. */ - char *name; - - /* Font name fields. Each vector element points into `name' above. - Fields are NUL-terminated. */ - char *fields[XLFD_LAST]; - - /* Numeric values for those fields that interest us. See - split_font_name for which these are. */ - int numeric[XLFD_LAST]; - - /* Lower value mean higher priority. */ - int registry_priority; -}; - -/* The frame in effect when sorting font names. Set temporarily in - sort_fonts so that it is available in font comparison functions. */ - -static struct frame *font_frame; - -/* Order by which font selection chooses fonts. The default values - mean `first, find a best match for the font width, then for the - font height, then for weight, then for slant.' This variable can be - set via set-face-font-sort-order. */ - -#ifdef macintosh -static int font_sort_order[4] = { - XLFD_SWIDTH, XLFD_POINT_SIZE, XLFD_WEIGHT, XLFD_SLANT -}; -#else -static int font_sort_order[4]; -#endif - -/* Look up FONT.fields[FIELD_INDEX] in TABLE which has DIM entries. - TABLE must be sorted by TABLE[i]->name in ascending order. Value - is a pointer to the matching table entry or null if no table entry - matches. */ - -static struct table_entry * -xlfd_lookup_field_contents (table, dim, font, field_index) - struct table_entry *table; - int dim; - struct font_name *font; - int field_index; -{ - /* Function split_font_name converts fields to lower-case, so there - is no need to use xstrlwr or xstricmp here. 
*/ - char *s = font->fields[field_index]; - int low, mid, high, cmp; - - low = 0; - high = dim - 1; - - while (low <= high) - { - mid = (low + high) / 2; - cmp = strcmp (table[mid].name, s); - - if (cmp < 0) - low = mid + 1; - else if (cmp > 0) - high = mid - 1; - else - return table + mid; - } - - return NULL; -} - - -/* Return a numeric representation for font name field - FONT.fields[FIELD_INDEX]. The field is looked up in TABLE which - has DIM entries. Value is the numeric value found or DFLT if no - table entry matches. This function is used to translate weight, - slant, and swidth names of XLFD font names to numeric values. */ - -static INLINE int -xlfd_numeric_value (table, dim, font, field_index, dflt) - struct table_entry *table; - int dim; - struct font_name *font; - int field_index; - int dflt; -{ - struct table_entry *p; - p = xlfd_lookup_field_contents (table, dim, font, field_index); - return p ? p->numeric : dflt; -} - - -/* Return a symbolic representation for font name field - FONT.fields[FIELD_INDEX]. The field is looked up in TABLE which - has DIM entries. Value is the symbolic value found or DFLT if no - table entry matches. This function is used to translate weight, - slant, and swidth names of XLFD font names to symbols. */ - -static INLINE Lisp_Object -xlfd_symbolic_value (table, dim, font, field_index, dflt) - struct table_entry *table; - int dim; - struct font_name *font; - int field_index; - Lisp_Object dflt; -{ - struct table_entry *p; - p = xlfd_lookup_field_contents (table, dim, font, field_index); - return p ? *p->symbol : dflt; -} - - -/* Return a numeric value for the slant of the font given by FONT. */ - -static INLINE int -xlfd_numeric_slant (font) - struct font_name *font; -{ - return xlfd_numeric_value (slant_table, DIM (slant_table), - font, XLFD_SLANT, XLFD_SLANT_ROMAN); -} - - -/* Return a symbol representing the weight of the font given by FONT. 
*/ - -static INLINE Lisp_Object -xlfd_symbolic_slant (font) - struct font_name *font; -{ - return xlfd_symbolic_value (slant_table, DIM (slant_table), - font, XLFD_SLANT, Qnormal); -} - - -/* Return a numeric value for the weight of the font given by FONT. */ - -static INLINE int -xlfd_numeric_weight (font) - struct font_name *font; -{ - return xlfd_numeric_value (weight_table, DIM (weight_table), - font, XLFD_WEIGHT, XLFD_WEIGHT_MEDIUM); -} - - -/* Return a symbol representing the slant of the font given by FONT. */ - -static INLINE Lisp_Object -xlfd_symbolic_weight (font) - struct font_name *font; -{ - return xlfd_symbolic_value (weight_table, DIM (weight_table), - font, XLFD_WEIGHT, Qnormal); -} - - -/* Return a numeric value for the swidth of the font whose XLFD font - name fields are found in FONT. */ - -static INLINE int -xlfd_numeric_swidth (font) - struct font_name *font; -{ - return xlfd_numeric_value (swidth_table, DIM (swidth_table), - font, XLFD_SWIDTH, XLFD_SWIDTH_MEDIUM); -} - - -/* Return a symbolic value for the swidth of FONT. */ - -static INLINE Lisp_Object -xlfd_symbolic_swidth (font) - struct font_name *font; -{ - return xlfd_symbolic_value (swidth_table, DIM (swidth_table), - font, XLFD_SWIDTH, Qnormal); -} - - -/* Look up the entry of SYMBOL in the vector TABLE which has DIM - entries. Value is a pointer to the matching table entry or null if - no element of TABLE contains SYMBOL. */ - -static struct table_entry * -face_value (table, dim, symbol) - struct table_entry *table; - int dim; - Lisp_Object symbol; -{ - int i; - - xassert (SYMBOLP (symbol)); - - for (i = 0; i < dim; ++i) - if (EQ (*table[i].symbol, symbol)) - break; - - return i < dim ? table + i : NULL; -} - - -/* Return a numeric value for SYMBOL in the vector TABLE which has DIM - entries. Value is -1 if SYMBOL is not found in TABLE. 
*/ - -static INLINE int -face_numeric_value (table, dim, symbol) - struct table_entry *table; - int dim; - Lisp_Object symbol; -{ - struct table_entry *p = face_value (table, dim, symbol); - return p ? p->numeric : -1; -} - - -/* Return a numeric value representing the weight specified by Lisp - symbol WEIGHT. Value is one of the enumerators of enum - xlfd_weight. */ - -static INLINE int -face_numeric_weight (weight) - Lisp_Object weight; -{ - return face_numeric_value (weight_table, DIM (weight_table), weight); -} - - -/* Return a numeric value representing the slant specified by Lisp - symbol SLANT. Value is one of the enumerators of enum xlfd_slant. */ - -static INLINE int -face_numeric_slant (slant) - Lisp_Object slant; -{ - return face_numeric_value (slant_table, DIM (slant_table), slant); -} - - -/* Return a numeric value representing the swidth specified by Lisp - symbol WIDTH. Value is one of the enumerators of enum xlfd_swidth. */ - -static int -face_numeric_swidth (width) - Lisp_Object width; -{ - return face_numeric_value (swidth_table, DIM (swidth_table), width); -} - - -#ifdef HAVE_WINDOW_SYSTEM - -/* Return non-zero if FONT is the name of a fixed-pitch font. */ - -static INLINE int -xlfd_fixed_p (font) - struct font_name *font; -{ - /* Function split_font_name converts fields to lower-case, so there - is no need to use tolower here. */ - return *font->fields[XLFD_SPACING] != 'p'; -} - - -/* Return the point size of FONT on frame F, measured in 1/10 pt. - - The actual height of the font when displayed on F depends on the - resolution of both the font and frame. For example, a 10pt font - designed for a 100dpi display will display larger than 10pt on a - 75dpi display. (It's not unusual to use fonts not designed for the - display one is using. For example, some intlfonts are available in - 72dpi versions, only.) - - Value is the real point size of FONT on frame F, or 0 if it cannot - be determined. 
*/ - -static INLINE int -xlfd_point_size (f, font) - struct frame *f; - struct font_name *font; -{ - double resy = FRAME_X_DISPLAY_INFO (f)->resy; - char *pixel_field = font->fields[XLFD_PIXEL_SIZE]; - double pixel; - int real_pt; - - if (*pixel_field == '[') - { - /* The pixel size field is `[A B C D]' which specifies - a transformation matrix. - - A B 0 - C D 0 - 0 0 1 - - by which all glyphs of the font are transformed. The spec - says that s scalar value N for the pixel size is equivalent - to A = N * resx/resy, B = C = 0, D = N. */ - char *start = pixel_field + 1, *end; - double matrix[4]; - int i; - - for (i = 0; i < 4; ++i) - { - matrix[i] = strtod (start, &end); - start = end; - } - - pixel = matrix[3]; - } - else - pixel = atoi (pixel_field); - - if (pixel == 0) - real_pt = 0; - else - real_pt = PT_PER_INCH * 10.0 * pixel / resy + 0.5; - - return real_pt; -} - - -/* Return point size of PIXEL dots while considering Y-resultion (DPI) - of frame F. This function is used to guess a point size of font - when only the pixel height of the font is available. */ - -static INLINE int -pixel_point_size (f, pixel) - struct frame *f; - int pixel; -{ - double resy = FRAME_X_DISPLAY_INFO (f)->resy; - double real_pt; - int int_pt; - - /* As one inch is PT_PER_INCH points, PT_PER_INCH/RESY gives the - point size of one dot. */ - real_pt = pixel * PT_PER_INCH / resy; - int_pt = real_pt + 0.5; - - return int_pt; -} - - -/* Split XLFD font name FONT->name destructively into NUL-terminated, - lower-case fields in FONT->fields. NUMERIC_P non-zero means - compute numeric values for fields XLFD_POINT_SIZE, XLFD_SWIDTH, - XLFD_RESY, XLFD_SLANT, and XLFD_WEIGHT in FONT->numeric. Value is - zero if the font name doesn't have the format we expect. The - expected format is a font name that starts with a `-' and has - XLFD_LAST fields separated by `-'. 
*/ - -static int -split_font_name (f, font, numeric_p) - struct frame *f; - struct font_name *font; - int numeric_p; -{ - int i = 0; - int success_p; - - if (*font->name == '-') - { - char *p = xstrlwr (font->name) + 1; - - while (i < XLFD_LAST) - { - font->fields[i] = p; - ++i; - - /* Pixel and point size may be of the form `[....]'. For - BNF, see XLFD spec, chapter 4. Negative values are - indicated by tilde characters which we replace with - `-' characters, here. */ - if (*p == '[' - && (i - 1 == XLFD_PIXEL_SIZE - || i - 1 == XLFD_POINT_SIZE)) - { - char *start, *end; - int j; - - for (++p; *p && *p != ']'; ++p) - if (*p == '~') - *p = '-'; - - /* Check that the matrix contains 4 floating point - numbers. */ - for (j = 0, start = font->fields[i - 1] + 1; - j < 4; - ++j, start = end) - if (strtod (start, &end) == 0 && start == end) - break; - - if (j < 4) - break; - } - - while (*p && *p != '-') - ++p; - - if (*p != '-') - break; - - *p++ = 0; - } - } - - success_p = i == XLFD_LAST; - - /* If requested, and font name was in the expected format, - compute numeric values for some fields. */ - if (numeric_p && success_p) - { - font->numeric[XLFD_POINT_SIZE] = xlfd_point_size (f, font); - font->numeric[XLFD_RESY] = atoi (font->fields[XLFD_RESY]); - font->numeric[XLFD_SLANT] = xlfd_numeric_slant (font); - font->numeric[XLFD_WEIGHT] = xlfd_numeric_weight (font); - font->numeric[XLFD_SWIDTH] = xlfd_numeric_swidth (font); - font->numeric[XLFD_AVGWIDTH] = atoi (font->fields[XLFD_AVGWIDTH]); - } - - /* Initialize it to zero. It will be overridden by font_list while - trying alternate registries. */ - font->registry_priority = 0; - - return success_p; -} - - -/* Build an XLFD font name from font name fields in FONT. Value is a - pointer to the font name, which is allocated via xmalloc. 
*/ - -static char * -build_font_name (font) - struct font_name *font; -{ - int i; - int size = 100; - char *font_name = (char *) xmalloc (size); - int total_length = 0; - - for (i = 0; i < XLFD_LAST; ++i) - { - /* Add 1 because of the leading `-'. */ - int len = strlen (font->fields[i]) + 1; - - /* Reallocate font_name if necessary. Add 1 for the final - NUL-byte. */ - if (total_length + len + 1 >= size) - { - int new_size = max (2 * size, size + len + 1); - int sz = new_size * sizeof *font_name; - font_name = (char *) xrealloc (font_name, sz); - size = new_size; - } - - font_name[total_length] = '-'; - bcopy (font->fields[i], font_name + total_length + 1, len - 1); - total_length += len; - } - - font_name[total_length] = 0; - return font_name; -} - - -/* Free an array FONTS of N font_name structures. This frees FONTS - itself and all `name' fields in its elements. */ - -static INLINE void -free_font_names (fonts, n) - struct font_name *fonts; - int n; -{ - while (n) - xfree (fonts[--n].name); - xfree (fonts); -} - - -/* Sort vector FONTS of font_name structures which contains NFONTS - elements using qsort and comparison function CMPFN. F is the frame - on which the fonts will be used. The global variable font_frame - is temporarily set to F to make it available in CMPFN. */ - -static INLINE void -sort_fonts (f, fonts, nfonts, cmpfn) - struct frame *f; - struct font_name *fonts; - int nfonts; - int (*cmpfn) P_ ((const void *, const void *)); -{ - font_frame = f; - qsort (fonts, nfonts, sizeof *fonts, cmpfn); - font_frame = NULL; -} - - -/* Get fonts matching PATTERN on frame F. If F is null, use the first - display in x_display_list. FONTS is a pointer to a vector of - NFONTS font_name structures. TRY_ALTERNATIVES_P non-zero means try - alternative patterns from Valternate_fontname_alist if no fonts are - found matching PATTERN. - - For all fonts found, set FONTS[i].name to the name of the font, - allocated via xmalloc, and split font names into fields. 
Ignore - fonts that we can't parse. Value is the number of fonts found. */ - -static int -x_face_list_fonts (f, pattern, fonts, nfonts, try_alternatives_p) - struct frame *f; - char *pattern; - struct font_name *fonts; - int nfonts, try_alternatives_p; -{ - int n, nignored; - - /* NTEMACS_TODO : currently this uses w32_list_fonts, but it may be - better to do it the other way around. */ - Lisp_Object lfonts; - Lisp_Object lpattern, tem; - - lpattern = build_string (pattern); - - /* Get the list of fonts matching PATTERN. */ -#ifdef WINDOWSNT - BLOCK_INPUT; - lfonts = w32_list_fonts (f, lpattern, 0, nfonts); - UNBLOCK_INPUT; -#else - lfonts = x_list_fonts (f, lpattern, -1, nfonts); -#endif - - /* Make a copy of the font names we got from X, and - split them into fields. */ - n = nignored = 0; - for (tem = lfonts; CONSP (tem) && n < nfonts; tem = XCDR (tem)) - { - Lisp_Object elt, tail; - char *name = XSTRING (XCAR (tem))->data; - - /* Ignore fonts matching a pattern from face-ignored-fonts. */ - for (tail = Vface_ignored_fonts; CONSP (tail); tail = XCDR (tail)) - { - elt = XCAR (tail); - if (STRINGP (elt) - && fast_c_string_match_ignore_case (elt, name) >= 0) - break; - } - if (!NILP (tail)) - { - ++nignored; - continue; - } - - /* Make a copy of the font name. */ - fonts[n].name = xstrdup (name); - - if (split_font_name (f, fonts + n, 1)) - { - if (font_scalable_p (fonts + n) - && !may_use_scalable_font_p (name)) - { - ++nignored; - xfree (fonts[n].name); - } - else - ++n; - } - else - xfree (fonts[n].name); - } - - /* If no fonts found, try patterns from Valternate_fontname_alist. 
*/ - if (n == 0 && try_alternatives_p) - { - Lisp_Object list = Valternate_fontname_alist; - - while (CONSP (list)) - { - Lisp_Object entry = XCAR (list); - if (CONSP (entry) - && STRINGP (XCAR (entry)) - && strcmp (XSTRING (XCAR (entry))->data, pattern) == 0) - break; - list = XCDR (list); - } - - if (CONSP (list)) - { - Lisp_Object patterns = XCAR (list); - Lisp_Object name; - - while (CONSP (patterns) - /* If list is screwed up, give up. */ - && (name = XCAR (patterns), - STRINGP (name)) - /* Ignore patterns equal to PATTERN because we tried that - already with no success. */ - && (strcmp (XSTRING (name)->data, pattern) == 0 - || (n = x_face_list_fonts (f, XSTRING (name)->data, - fonts, nfonts, 0), - n == 0))) - patterns = XCDR (patterns); - } - } - - return n; -} - - -/* Determine fonts matching PATTERN on frame F. Sort resulting fonts - using comparison function CMPFN. Value is the number of fonts - found. If value is non-zero, *FONTS is set to a vector of - font_name structures allocated from the heap containing matching - fonts. Each element of *FONTS contains a name member that is also - allocated from the heap. Font names in these structures are split - into fields. Use free_font_names to free such an array. */ - -static int -sorted_font_list (f, pattern, cmpfn, fonts) - struct frame *f; - char *pattern; - int (*cmpfn) P_ ((const void *, const void *)); - struct font_name **fonts; -{ - int nfonts; - - /* Get the list of fonts matching pattern. 100 should suffice. */ - nfonts = DEFAULT_FONT_LIST_LIMIT; - if (INTEGERP (Vfont_list_limit) && XINT (Vfont_list_limit) > 0) - nfonts = XFASTINT (Vfont_list_limit); - - *fonts = (struct font_name *) xmalloc (nfonts * sizeof **fonts); - nfonts = x_face_list_fonts (f, pattern, *fonts, nfonts, 1); - - /* Sort the resulting array and return it in *FONTS. If no - fonts were found, make sure to set *FONTS to null. 
   */
-  if (nfonts)
-    sort_fonts (f, *fonts, nfonts, cmpfn);
-  else
-    {
-      xfree (*fonts);
-      *fonts = NULL;
-    }
-
-  return nfonts;
-}
-
-
-/* Compare two font_name structures *A and *B.  Value is analogous to
-   strcmp.  Sort order is given by the global variable
-   font_sort_order.  Font names are sorted so that, everything else
-   being equal, fonts with a resolution closer to that of the frame on
-   which they are used are listed first.  The global variable
-   font_frame is the frame on which we operate.  */
-
-static int
-cmp_font_names (a, b)
-     const void *a, *b;
-{
-  struct font_name *x = (struct font_name *) a;
-  struct font_name *y = (struct font_name *) b;
-  int cmp;
-
-  /* All strings have been converted to lower-case by split_font_name,
-     so we can use strcmp here.  */
-  cmp = strcmp (x->fields[XLFD_FAMILY], y->fields[XLFD_FAMILY]);
-  if (cmp == 0)
-    {
-      int i;
-
-      for (i = 0; i < DIM (font_sort_order) && cmp == 0; ++i)
-        {
-          int j = font_sort_order[i];
-          cmp = x->numeric[j] - y->numeric[j];
-        }
-
-      if (cmp == 0)
-        {
-          /* Everything else being equal, we prefer fonts with a
-             y-resolution closer to that of the frame.  */
-          int resy = FRAME_X_DISPLAY_INFO (font_frame)->resy;
-          int x_resy = x->numeric[XLFD_RESY];
-          int y_resy = y->numeric[XLFD_RESY];
-          cmp = abs (resy - x_resy) - abs (resy - y_resy);
-        }
-    }
-
-  return cmp;
-}
-
-
-/* Get a sorted list of fonts of family FAMILY on frame F.  If PATTERN
-   is non-nil, list fonts matching that pattern.  Otherwise, if
-   REGISTRY is non-nil return only fonts with that registry, otherwise
-   return fonts of any registry.  Set *FONTS to a vector of font_name
-   structures allocated from the heap containing the fonts found.
-   Value is the number of fonts found.  */
-
-static int
-font_list_1 (f, pattern, family, registry, fonts)
-     struct frame *f;
-     Lisp_Object pattern, family, registry;
-     struct font_name **fonts;
-{
-  char *pattern_str, *family_str, *registry_str;
-
-  if (NILP (pattern))
-    {
-      family_str = (NILP (family) ?
"*" : (char *) XSTRING (family)->data); - registry_str = (NILP (registry) ? "*" : (char *) XSTRING (registry)->data); - - pattern_str = (char *) alloca (strlen (family_str) - + strlen (registry_str) - + 10); - strcpy (pattern_str, index (family_str, '-') ? "-" : "-*-"); - strcat (pattern_str, family_str); - strcat (pattern_str, "-*-"); - strcat (pattern_str, registry_str); - if (!index (registry_str, '-')) - { - if (registry_str[strlen (registry_str) - 1] == '*') - strcat (pattern_str, "-*"); - else - strcat (pattern_str, "*-*"); - } - } - else - pattern_str = (char *) XSTRING (pattern)->data; - - return sorted_font_list (f, pattern_str, cmp_font_names, fonts); -} - - -/* Concatenate font list FONTS1 and FONTS2. FONTS1 and FONTS2 - contains NFONTS1 fonts and NFONTS2 fonts respectively. Return a - pointer to a newly allocated font list. FONTS1 and FONTS2 are - freed. */ - -static struct font_name * -concat_font_list (fonts1, nfonts1, fonts2, nfonts2) - struct font_name *fonts1, *fonts2; - int nfonts1, nfonts2; -{ - int new_nfonts = nfonts1 + nfonts2; - struct font_name *new_fonts; - - new_fonts = (struct font_name *) xmalloc (sizeof *new_fonts * new_nfonts); - bcopy (fonts1, new_fonts, sizeof *new_fonts * nfonts1); - bcopy (fonts2, new_fonts + nfonts1, sizeof *new_fonts * nfonts2); - xfree (fonts1); - xfree (fonts2); - return new_fonts; -} - - -/* Get a sorted list of fonts of family FAMILY on frame F. - - If PATTERN is non-nil list fonts matching that pattern. - - If REGISTRY is non-nil, return fonts with that registry and the - alternative registries from Vface_alternative_font_registry_alist. - - If REGISTRY is nil return fonts of any registry. - - Set *FONTS to a vector of font_name structures allocated from the - heap containing the fonts found. Value is the number of fonts - found. 
*/ - -static int -font_list (f, pattern, family, registry, fonts) - struct frame *f; - Lisp_Object pattern, family, registry; - struct font_name **fonts; -{ - int nfonts = font_list_1 (f, pattern, family, registry, fonts); - - if (!NILP (registry) - && CONSP (Vface_alternative_font_registry_alist)) - { - Lisp_Object alter; - - alter = Fassoc (registry, Vface_alternative_font_registry_alist); - if (CONSP (alter)) - { - int reg_prio, i; - - for (alter = XCDR (alter), reg_prio = 1; - CONSP (alter); - alter = XCDR (alter), reg_prio++) - if (STRINGP (XCAR (alter))) - { - int nfonts2; - struct font_name *fonts2; - - nfonts2 = font_list_1 (f, pattern, family, XCAR (alter), - &fonts2); - for (i = 0; i < nfonts2; i++) - fonts2[i].registry_priority = reg_prio; - *fonts = (nfonts > 0 - ? concat_font_list (*fonts, nfonts, fonts2, nfonts2) - : fonts2); - nfonts += nfonts2; - } - } - } - - return nfonts; -} - - -/* Remove elements from LIST whose cars are `equal'. Called from - x-family-fonts and x-font-family-list to remove duplicate font - entries. */ - -static void -remove_duplicates (list) - Lisp_Object list; -{ - Lisp_Object tail = list; - - while (!NILP (tail) && !NILP (XCDR (tail))) - { - Lisp_Object next = XCDR (tail); - if (!NILP (Fequal (XCAR (next), XCAR (tail)))) - XCDR (tail) = XCDR (next); - else - tail = XCDR (tail); - } -} - - -DEFUN ("x-family-fonts", Fx_family_fonts, Sx_family_fonts, 0, 2, 0, - "Return a list of available fonts of family FAMILY on FRAME.\n\ -If FAMILY is omitted or nil, list all families.\n\ -Otherwise, FAMILY must be a string, possibly containing wildcards\n\ -`?' and `*'.\n\ -If FRAME is omitted or nil, use the selected frame.\n\ -Each element of the result is a vector [FAMILY WIDTH POINT-SIZE WEIGHT\n\ -SLANT FIXED-P FULL REGISTRY-AND-ENCODING].\n\ -FAMILY is the font family name. POINT-SIZE is the size of the\n\ -font in 1/10 pt. WIDTH, WEIGHT, and SLANT are symbols describing the\n\ -width, weight and slant of the font. 
These symbols are the same as for\n\ -face attributes. FIXED-P is non-nil if the font is fixed-pitch.\n\ -FULL is the full name of the font, and REGISTRY-AND-ENCODING is a string\n\ -giving the registry and encoding of the font.\n\ -The result list is sorted according to the current setting of\n\ -the face font sort order.") - (family, frame) - Lisp_Object family, frame; -{ - struct frame *f = check_x_frame (frame); - struct font_name *fonts; - int i, nfonts; - Lisp_Object result; - struct gcpro gcpro1; - - if (!NILP (family)) - CHECK_STRING (family, 1); - - result = Qnil; - GCPRO1 (result); - nfonts = font_list (f, Qnil, family, Qnil, &fonts); - for (i = nfonts - 1; i >= 0; --i) - { - Lisp_Object v = Fmake_vector (make_number (8), Qnil); - char *tem; - - ASET (v, 0, build_string (fonts[i].fields[XLFD_FAMILY])); - ASET (v, 1, xlfd_symbolic_swidth (fonts + i)); - ASET (v, 2, make_number (xlfd_point_size (f, fonts + i))); - ASET (v, 3, xlfd_symbolic_weight (fonts + i)); - ASET (v, 4, xlfd_symbolic_slant (fonts + i)); - ASET (v, 5, xlfd_fixed_p (fonts + i) ? Qt : Qnil); - tem = build_font_name (fonts + i); - ASET (v, 6, build_string (tem)); - sprintf (tem, "%s-%s", fonts[i].fields[XLFD_REGISTRY], - fonts[i].fields[XLFD_ENCODING]); - ASET (v, 7, build_string (tem)); - xfree (tem); - - result = Fcons (v, result); - } - - remove_duplicates (result); - free_font_names (fonts, nfonts); - UNGCPRO; - return result; -} - - -DEFUN ("x-font-family-list", Fx_font_family_list, Sx_font_family_list, - 0, 1, 0, - "Return a list of available font families on FRAME.\n\ -If FRAME is omitted or nil, use the selected frame.\n\ -Value is a list of conses (FAMILY . 
FIXED-P) where FAMILY\n\ -is a font family, and FIXED-P is non-nil if fonts of that family\n\ -are fixed-pitch.") - (frame) - Lisp_Object frame; -{ - struct frame *f = check_x_frame (frame); - int nfonts, i; - struct font_name *fonts; - Lisp_Object result; - struct gcpro gcpro1; - int count = specpdl_ptr - specpdl; - int limit; - - /* Let's consider all fonts. Increase the limit for matching - fonts until we have them all. */ - for (limit = 500;;) - { - specbind (intern ("font-list-limit"), make_number (limit)); - nfonts = font_list (f, Qnil, Qnil, Qnil, &fonts); - - if (nfonts == limit) - { - free_font_names (fonts, nfonts); - limit *= 2; - } - else - break; - } - - result = Qnil; - GCPRO1 (result); - for (i = nfonts - 1; i >= 0; --i) - result = Fcons (Fcons (build_string (fonts[i].fields[XLFD_FAMILY]), - xlfd_fixed_p (fonts + i) ? Qt : Qnil), - result); - - remove_duplicates (result); - free_font_names (fonts, nfonts); - UNGCPRO; - return unbind_to (count, result); -} - - -DEFUN ("x-list-fonts", Fx_list_fonts, Sx_list_fonts, 1, 5, 0, - "Return a list of the names of available fonts matching PATTERN.\n\ -If optional arguments FACE and FRAME are specified, return only fonts\n\ -the same size as FACE on FRAME.\n\ -PATTERN is a string, perhaps with wildcard characters;\n\ - the * character matches any substring, and\n\ - the ? character matches any single character.\n\ - PATTERN is case-insensitive.\n\ -FACE is a face name--a symbol.\n\ -\n\ -The return value is a list of strings, suitable as arguments to\n\ -set-face-font.\n\ -\n\ -Fonts Emacs can't use may or may not be excluded\n\ -even if they match PATTERN and FACE.\n\ -The optional fourth argument MAXIMUM sets a limit on how many\n\ -fonts to match. The first MAXIMUM fonts are reported.\n\ -The optional fifth argument WIDTH, if specified, is a number of columns\n\ -occupied by a character of a font. 
In that case, return only fonts\n\ -the WIDTH times as wide as FACE on FRAME.") - (pattern, face, frame, maximum, width) - Lisp_Object pattern, face, frame, maximum, width; -{ - struct frame *f; - int size; - int maxnames; - - check_x (); - CHECK_STRING (pattern, 0); - - if (NILP (maximum)) - maxnames = 2000; - else - { - CHECK_NATNUM (maximum, 0); - maxnames = XINT (maximum); - } - - if (!NILP (width)) - CHECK_NUMBER (width, 4); - - /* We can't simply call check_x_frame because this function may be - called before any frame is created. */ - f = frame_or_selected_frame (frame, 2); - if (!FRAME_WINDOW_P (f)) - { - /* Perhaps we have not yet created any frame. */ - f = NULL; - face = Qnil; - } - - /* Determine the width standard for comparison with the fonts we find. */ - - if (NILP (face)) - size = 0; - else - { - /* This is of limited utility since it works with character - widths. Keep it for compatibility. --gerd. */ - int face_id = lookup_named_face (f, face, 0); - struct face *face = (face_id < 0 - ? NULL - : FACE_FROM_ID (f, face_id)); - - if (face && face->font) - size = FONT_WIDTH (face->font); - else - size = FONT_WIDTH (FRAME_FONT (f)); - - if (!NILP (width)) - size *= XINT (width); - } - - { - Lisp_Object args[2]; - - args[0] = x_list_fonts (f, pattern, size, maxnames); - if (f == NULL) - /* We don't have to check fontsets. */ - return args[0]; - args[1] = list_fontsets (f, pattern, size); - return Fnconc (2, args); - } -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - - -/*********************************************************************** - Lisp Faces - ***********************************************************************/ - -/* Access face attributes of face LFACE, a Lisp vector. 
*/ - -#define LFACE_FAMILY(LFACE) AREF ((LFACE), LFACE_FAMILY_INDEX) -#define LFACE_HEIGHT(LFACE) AREF ((LFACE), LFACE_HEIGHT_INDEX) -#define LFACE_WEIGHT(LFACE) AREF ((LFACE), LFACE_WEIGHT_INDEX) -#define LFACE_SLANT(LFACE) AREF ((LFACE), LFACE_SLANT_INDEX) -#define LFACE_UNDERLINE(LFACE) AREF ((LFACE), LFACE_UNDERLINE_INDEX) -#define LFACE_INVERSE(LFACE) AREF ((LFACE), LFACE_INVERSE_INDEX) -#define LFACE_FOREGROUND(LFACE) AREF ((LFACE), LFACE_FOREGROUND_INDEX) -#define LFACE_BACKGROUND(LFACE) AREF ((LFACE), LFACE_BACKGROUND_INDEX) -#define LFACE_STIPPLE(LFACE) AREF ((LFACE), LFACE_STIPPLE_INDEX) -#define LFACE_SWIDTH(LFACE) AREF ((LFACE), LFACE_SWIDTH_INDEX) -#define LFACE_OVERLINE(LFACE) AREF ((LFACE), LFACE_OVERLINE_INDEX) -#define LFACE_STRIKE_THROUGH(LFACE) AREF ((LFACE), LFACE_STRIKE_THROUGH_INDEX) -#define LFACE_BOX(LFACE) AREF ((LFACE), LFACE_BOX_INDEX) -#define LFACE_FONT(LFACE) AREF ((LFACE), LFACE_FONT_INDEX) -#define LFACE_INHERIT(LFACE) AREF ((LFACE), LFACE_INHERIT_INDEX) -#define LFACE_AVGWIDTH(LFACE) AREF ((LFACE), LFACE_AVGWIDTH_INDEX) - -/* Non-zero if LFACE is a Lisp face. A Lisp face is a vector of size - LFACE_VECTOR_SIZE which has the symbol `face' in slot 0. */ - -#define LFACEP(LFACE) \ - (VECTORP (LFACE) \ - && XVECTOR (LFACE)->size == LFACE_VECTOR_SIZE \ - && EQ (AREF (LFACE, 0), Qface)) - - -#if GLYPH_DEBUG - -/* Check consistency of Lisp face attribute vector ATTRS. 
*/ - -static void -check_lface_attrs (attrs) - Lisp_Object *attrs; -{ - xassert (UNSPECIFIEDP (attrs[LFACE_FAMILY_INDEX]) - || STRINGP (attrs[LFACE_FAMILY_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_SWIDTH_INDEX]) - || SYMBOLP (attrs[LFACE_SWIDTH_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_AVGWIDTH_INDEX]) - || INTEGERP (attrs[LFACE_AVGWIDTH_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_HEIGHT_INDEX]) - || INTEGERP (attrs[LFACE_HEIGHT_INDEX]) - || FLOATP (attrs[LFACE_HEIGHT_INDEX]) - || FUNCTIONP (attrs[LFACE_HEIGHT_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_WEIGHT_INDEX]) - || SYMBOLP (attrs[LFACE_WEIGHT_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_SLANT_INDEX]) - || SYMBOLP (attrs[LFACE_SLANT_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_UNDERLINE_INDEX]) - || SYMBOLP (attrs[LFACE_UNDERLINE_INDEX]) - || STRINGP (attrs[LFACE_UNDERLINE_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_OVERLINE_INDEX]) - || SYMBOLP (attrs[LFACE_OVERLINE_INDEX]) - || STRINGP (attrs[LFACE_OVERLINE_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_STRIKE_THROUGH_INDEX]) - || SYMBOLP (attrs[LFACE_STRIKE_THROUGH_INDEX]) - || STRINGP (attrs[LFACE_STRIKE_THROUGH_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_BOX_INDEX]) - || SYMBOLP (attrs[LFACE_BOX_INDEX]) - || STRINGP (attrs[LFACE_BOX_INDEX]) - || INTEGERP (attrs[LFACE_BOX_INDEX]) - || CONSP (attrs[LFACE_BOX_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_INVERSE_INDEX]) - || SYMBOLP (attrs[LFACE_INVERSE_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_FOREGROUND_INDEX]) - || STRINGP (attrs[LFACE_FOREGROUND_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_BACKGROUND_INDEX]) - || STRINGP (attrs[LFACE_BACKGROUND_INDEX])); - xassert (UNSPECIFIEDP (attrs[LFACE_INHERIT_INDEX]) - || NILP (attrs[LFACE_INHERIT_INDEX]) - || SYMBOLP (attrs[LFACE_INHERIT_INDEX]) - || CONSP (attrs[LFACE_INHERIT_INDEX])); -#ifdef HAVE_WINDOW_SYSTEM - xassert (UNSPECIFIEDP (attrs[LFACE_STIPPLE_INDEX]) - || SYMBOLP (attrs[LFACE_STIPPLE_INDEX]) - || !NILP 
(Fbitmap_spec_p (attrs[LFACE_STIPPLE_INDEX])));
-  xassert (UNSPECIFIEDP (attrs[LFACE_FONT_INDEX])
-           || NILP (attrs[LFACE_FONT_INDEX])
-           || STRINGP (attrs[LFACE_FONT_INDEX]));
-#endif
-}
-
-
-/* Check consistency of attributes of Lisp face LFACE (a Lisp vector).  */
-
-static void
-check_lface (lface)
-     Lisp_Object lface;
-{
-  if (!NILP (lface))
-    {
-      xassert (LFACEP (lface));
-      check_lface_attrs (XVECTOR (lface)->contents);
-    }
-}
-
-#else /* GLYPH_DEBUG == 0 */
-
-#define check_lface_attrs(attrs)  (void) 0
-#define check_lface(lface)        (void) 0
-
-#endif /* GLYPH_DEBUG == 0 */
-
-
-/* Resolve face name FACE_NAME.  If FACE_NAME is a string, intern it
-   to make it a symbol.  If FACE_NAME is an alias for another face,
-   return that face's name.  */
-
-static Lisp_Object
-resolve_face_name (face_name)
-     Lisp_Object face_name;
-{
-  Lisp_Object aliased;
-
-  if (STRINGP (face_name))
-    face_name = intern (XSTRING (face_name)->data);
-
-  while (SYMBOLP (face_name))
-    {
-      aliased = Fget (face_name, Qface_alias);
-      if (NILP (aliased))
-        break;
-      else
-        face_name = aliased;
-    }
-
-  return face_name;
-}
-
-
-/* Return the face definition of FACE_NAME on frame F.  F null means
-   return the definition for new frames.  FACE_NAME may be a string or
-   a symbol (apparently Emacs 20.2 allowed strings as face names in
-   face text properties; Ediff uses that).  If FACE_NAME is an alias
-   for another face, return that face's definition.  If SIGNAL_P is
-   non-zero, signal an error if FACE_NAME is not a valid face name.
-   If SIGNAL_P is zero, value is nil if FACE_NAME is not a valid face
-   name.
*/ - -static INLINE Lisp_Object -lface_from_face_name (f, face_name, signal_p) - struct frame *f; - Lisp_Object face_name; - int signal_p; -{ - Lisp_Object lface; - - face_name = resolve_face_name (face_name); - - if (f) - lface = assq_no_quit (face_name, f->face_alist); - else - lface = assq_no_quit (face_name, Vface_new_frame_defaults); - - if (CONSP (lface)) - lface = XCDR (lface); - else if (signal_p) - signal_error ("Invalid face", face_name); - - check_lface (lface); - return lface; -} - - -/* Get face attributes of face FACE_NAME from frame-local faces on - frame F. Store the resulting attributes in ATTRS which must point - to a vector of Lisp_Objects of size LFACE_VECTOR_SIZE. If SIGNAL_P - is non-zero, signal an error if FACE_NAME does not name a face. - Otherwise, value is zero if FACE_NAME is not a face. */ - -static INLINE int -get_lface_attributes (f, face_name, attrs, signal_p) - struct frame *f; - Lisp_Object face_name; - Lisp_Object *attrs; - int signal_p; -{ - Lisp_Object lface; - int success_p; - - lface = lface_from_face_name (f, face_name, signal_p); - if (!NILP (lface)) - { - bcopy (XVECTOR (lface)->contents, attrs, - LFACE_VECTOR_SIZE * sizeof *attrs); - success_p = 1; - } - else - success_p = 0; - - return success_p; -} - - -/* Non-zero if all attributes in face attribute vector ATTRS are - specified, i.e. are non-nil. */ - -static int -lface_fully_specified_p (attrs) - Lisp_Object *attrs; -{ - int i; - - for (i = 1; i < LFACE_VECTOR_SIZE; ++i) - if (i != LFACE_FONT_INDEX && i != LFACE_INHERIT_INDEX - && i != LFACE_AVGWIDTH_INDEX) - if (UNSPECIFIEDP (attrs[i])) - break; - - return i == LFACE_VECTOR_SIZE; -} - -#ifdef HAVE_WINDOW_SYSTEM - -/* Set font-related attributes of Lisp face LFACE from the fullname of - the font opened by FONTNAME. If FORCE_P is zero, set only - unspecified attributes of LFACE. The exception is `font' - attribute. It is set to FONTNAME as is regardless of FORCE_P. 
-
-   If FONTNAME is not available on frame F,
-   return 0 if MAY_FAIL_P is non-zero, otherwise abort.
-   If the fullname is not in a valid XLFD format,
-   return 0 if MAY_FAIL_P is non-zero, otherwise set normal values
-   in LFACE and return 1.
-   Otherwise, return 1.  */
-
-static int
-set_lface_from_font_name (f, lface, fontname, force_p, may_fail_p)
-     struct frame *f;
-     Lisp_Object lface;
-     Lisp_Object fontname;
-     int force_p, may_fail_p;
-{
-  struct font_name font;
-  char *buffer;
-  int pt;
-  int have_xlfd_p;
-  int fontset;
-  char *font_name = XSTRING (fontname)->data;
-  struct font_info *font_info;
-
-  /* If FONTNAME is actually a fontset name, get ASCII font name of it.  */
-  fontset = fs_query_fontset (fontname, 0);
-  if (fontset >= 0)
-    font_name = XSTRING (fontset_ascii (fontset))->data;
-
-  /* Check if FONT_NAME is surely available on the system.  Usually
-     FONT_NAME is already cached for the frame F and FS_LOAD_FONT
-     returns quickly.  But, even if FONT_NAME is not yet cached,
-     caching it now is not futile because we anyway load the font
-     later.  */
-  BLOCK_INPUT;
-  font_info = FS_LOAD_FONT (f, 0, font_name, -1);
-  UNBLOCK_INPUT;
-
-  if (!font_info)
-    {
-      if (may_fail_p)
-        return 0;
-      abort ();
-    }
-
-  font.name = STRDUPA (font_info->full_name);
-  have_xlfd_p = split_font_name (f, &font, 1);
-
-  /* Set attributes only if unspecified, otherwise face defaults for
-     new frames would never take effect.  If we couldn't get a font
-     name conforming to XLFD, set normal values.
*/ - - if (force_p || UNSPECIFIEDP (LFACE_FAMILY (lface))) - { - Lisp_Object val; - if (have_xlfd_p) - { - buffer = (char *) alloca (strlen (font.fields[XLFD_FAMILY]) - + strlen (font.fields[XLFD_FOUNDRY]) - + 2); - sprintf (buffer, "%s-%s", font.fields[XLFD_FOUNDRY], - font.fields[XLFD_FAMILY]); - val = build_string (buffer); - } - else - val = build_string ("*"); - LFACE_FAMILY (lface) = val; - } - - if (force_p || UNSPECIFIEDP (LFACE_HEIGHT (lface))) - { - if (have_xlfd_p) - pt = xlfd_point_size (f, &font); - else - pt = pixel_point_size (f, font_info->height * 10); - xassert (pt > 0); - LFACE_HEIGHT (lface) = make_number (pt); - } - - if (force_p || UNSPECIFIEDP (LFACE_SWIDTH (lface))) - LFACE_SWIDTH (lface) - = have_xlfd_p ? xlfd_symbolic_swidth (&font) : Qnormal; - - if (force_p || UNSPECIFIEDP (LFACE_AVGWIDTH (lface))) - LFACE_AVGWIDTH (lface) - = (have_xlfd_p - ? make_number (font.numeric[XLFD_AVGWIDTH]) - : Qunspecified); - - if (force_p || UNSPECIFIEDP (LFACE_WEIGHT (lface))) - LFACE_WEIGHT (lface) - = have_xlfd_p ? xlfd_symbolic_weight (&font) : Qnormal; - - if (force_p || UNSPECIFIEDP (LFACE_SLANT (lface))) - LFACE_SLANT (lface) - = have_xlfd_p ? xlfd_symbolic_slant (&font) : Qnormal; - - LFACE_FONT (lface) = fontname; - - return 1; -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - -/* Merges the face height FROM with the face height TO, and returns the - merged height. If FROM is an invalid height, then INVALID is - returned instead. FROM may be a either an absolute face height or a - `relative' height, and TO must be an absolute height. The returned - value is always an absolute height. GCPRO is a lisp value that will - be protected from garbage-collection if this function makes a call - into lisp. 
   */
-
-Lisp_Object
-merge_face_heights (from, to, invalid, gcpro)
-     Lisp_Object from, to, invalid, gcpro;
-{
-  int result = 0;
-
-  if (INTEGERP (from))
-    result = XINT (from);
-  else if (NUMBERP (from))
-    result = XFLOATINT (from) * XINT (to);
-#if 0 /* Probably not so useful.  */
-  else if (CONSP (from) && CONSP (XCDR (from)))
-    {
-      if (EQ (XCAR (from), Qplus) || EQ (XCAR (from), Qminus))
-        {
-          if (INTEGERP (XCAR (XCDR (from))))
-            {
-              int inc = XINT (XCAR (XCDR (from)));
-              if (EQ (XCAR (from), Qminus))
-                inc = -inc;
-
-              result = XFASTINT (to);
-              if (result + inc > 0)
-                /* Note that `underflows' don't mean FROM is invalid, so
-                   we just pin the result at TO if it would otherwise be
-                   negative or 0.  */
-                result += inc;
-            }
-        }
-    }
-#endif
-  else if (FUNCTIONP (from))
-    {
-      /* Call function with current height as argument.
-         From is the new height.  */
-      Lisp_Object args[2], height;
-      struct gcpro gcpro1;
-
-      GCPRO1 (gcpro);
-
-      args[0] = from;
-      args[1] = to;
-      height = safe_call (2, args);
-
-      UNGCPRO;
-
-      if (NUMBERP (height))
-        result = XFLOATINT (height);
-    }
-
-  if (result > 0)
-    return make_number (result);
-  else
-    return invalid;
-}
-
-
-/* Merge two Lisp face attribute vectors on frame F, FROM and TO, and
-   store the resulting attributes in TO, which must already be
-   completely specified and contain only absolute attributes.  Every
-   specified attribute of FROM overrides the corresponding attribute of
-   TO; relative attributes in FROM are merged with the absolute value in
-   TO and replace it.  CYCLE_CHECK is used internally to detect loops in
-   face inheritance; it should be Qnil when called from other places.  */
-
-static INLINE void
-merge_face_vectors (f, from, to, cycle_check)
-     struct frame *f;
-     Lisp_Object *from, *to;
-     Lisp_Object cycle_check;
-{
-  int i;
-
-  /* If FROM inherits from some other faces, merge their attributes into
-     TO before merging FROM's direct attributes.
Note that an :inherit - attribute of `unspecified' is the same as one of nil; we never - merge :inherit attributes, so nil is more correct, but lots of - other code uses `unspecified' as a generic value for face attributes. */ - if (!UNSPECIFIEDP (from[LFACE_INHERIT_INDEX]) - && !NILP (from[LFACE_INHERIT_INDEX])) - merge_face_inheritance (f, from[LFACE_INHERIT_INDEX], to, cycle_check); - - /* If TO specifies a :font attribute, and FROM specifies some - font-related attribute, we need to clear TO's :font attribute - (because it will be inconsistent with whatever FROM specifies, and - FROM takes precedence). */ - if (!NILP (to[LFACE_FONT_INDEX]) - && (!UNSPECIFIEDP (from[LFACE_FAMILY_INDEX]) - || !UNSPECIFIEDP (from[LFACE_HEIGHT_INDEX]) - || !UNSPECIFIEDP (from[LFACE_WEIGHT_INDEX]) - || !UNSPECIFIEDP (from[LFACE_SLANT_INDEX]) - || !UNSPECIFIEDP (from[LFACE_SWIDTH_INDEX]) - || !UNSPECIFIEDP (from[LFACE_AVGWIDTH_INDEX]))) - to[LFACE_FONT_INDEX] = Qnil; - - for (i = 1; i < LFACE_VECTOR_SIZE; ++i) - if (!UNSPECIFIEDP (from[i])) - if (i == LFACE_HEIGHT_INDEX && !INTEGERP (from[i])) - to[i] = merge_face_heights (from[i], to[i], to[i], cycle_check); - else - to[i] = from[i]; - - /* TO is always an absolute face, which should inherit from nothing. - We blindly copy the :inherit attribute above and fix it up here. */ - to[LFACE_INHERIT_INDEX] = Qnil; -} - - -/* Checks the `cycle check' variable CHECK to see if it indicates that - EL is part of a cycle; CHECK must be either Qnil or a value returned - by an earlier use of CYCLE_CHECK. SUSPICIOUS is the number of - elements after which a cycle might be suspected; after that many - elements, this macro begins consing in order to keep more precise - track of elements. - - Returns NIL if a cycle was detected, otherwise a new value for CHECK - that includes EL. - - CHECK is evaluated multiple times, EL and SUSPICIOUS 0 or 1 times, so - the caller should make sure that's ok. 
   */
-
-#define CYCLE_CHECK(check, el, suspicious)                      \
-  (NILP (check)                                                 \
-   ? make_number (0)                                            \
-   : (INTEGERP (check)                                          \
-      ? (XFASTINT (check) < (suspicious)                        \
-         ? make_number (XFASTINT (check) + 1)                   \
-         : Fcons (el, Qnil))                                    \
-      : (!NILP (Fmemq ((el), (check)))                          \
-         ? Qnil                                                 \
-         : Fcons ((el), (check)))))
-
-
-/* Merge face attributes from the face on frame F whose name is
-   INHERITS, into the vector of face attributes TO; INHERITS may also be
-   a list of face names, in which case they are applied in order.
-   CYCLE_CHECK is used to detect loops in face inheritance.  */
-
-static void
-merge_face_inheritance (f, inherit, to, cycle_check)
-     struct frame *f;
-     Lisp_Object inherit;
-     Lisp_Object *to;
-     Lisp_Object cycle_check;
-{
-  if (SYMBOLP (inherit) && !EQ (inherit, Qunspecified))
-    /* Inherit from the named face INHERIT.  */
-    {
-      Lisp_Object lface;
-
-      /* Make sure we're not in an inheritance loop.  */
-      cycle_check = CYCLE_CHECK (cycle_check, inherit, 15);
-      if (NILP (cycle_check))
-        /* Cycle detected, ignore any further inheritance.  */
-        return;
-
-      lface = lface_from_face_name (f, inherit, 0);
-      if (!NILP (lface))
-        merge_face_vectors (f, XVECTOR (lface)->contents, to, cycle_check);
-    }
-  else if (CONSP (inherit))
-    /* Handle a list of inherited faces by calling ourselves recursively
-       on each element.  Note that we only do so for symbol elements, so
-       it's not possible to infinitely recurse.  */
-    {
-      while (CONSP (inherit))
-        {
-          if (SYMBOLP (XCAR (inherit)))
-            merge_face_inheritance (f, XCAR (inherit), to, cycle_check);
-
-          /* Check for a circular inheritance list.  */
-          cycle_check = CYCLE_CHECK (cycle_check, inherit, 15);
-          if (NILP (cycle_check))
-            /* Cycle detected.  */
-            break;
-
-          inherit = XCDR (inherit);
-        }
-    }
-}
-
-
-/* Given a Lisp face attribute vector TO and a Lisp object PROP that
-   is a face property, determine the resulting face attributes on
-   frame F, and store them in TO.
PROP may be a single face
-   specification or a list of such specifications.  Each face
-   specification can be
-
-   1. A symbol or string naming a Lisp face.
-
-   2. A property list of the form (KEYWORD VALUE ...) where each
-   KEYWORD is a face attribute name, and value is an appropriate value
-   for that attribute.
-
-   3. Conses of the form (FOREGROUND-COLOR . COLOR) or
-   (BACKGROUND-COLOR . COLOR) where COLOR is a color name.  This is
-   for compatibility with 20.2.
-
-   Face specifications earlier in lists take precedence over later
-   specifications.  */
-
-static void
-merge_face_vector_with_property (f, to, prop)
-     struct frame *f;
-     Lisp_Object *to;
-     Lisp_Object prop;
-{
-  if (CONSP (prop))
-    {
-      Lisp_Object first = XCAR (prop);
-
-      if (EQ (first, Qforeground_color)
-          || EQ (first, Qbackground_color))
-        {
-          /* One of (FOREGROUND-COLOR . COLOR) or (BACKGROUND-COLOR
-             . COLOR).  COLOR must be a string.  */
-          Lisp_Object color_name = XCDR (prop);
-          Lisp_Object color = first;
-
-          if (STRINGP (color_name))
-            {
-              if (EQ (color, Qforeground_color))
-                to[LFACE_FOREGROUND_INDEX] = color_name;
-              else
-                to[LFACE_BACKGROUND_INDEX] = color_name;
-            }
-          else
-            add_to_log ("Invalid face color", color_name, Qnil);
-        }
-      else if (SYMBOLP (first)
-               && *XSYMBOL (first)->name->data == ':')
-        {
-          /* Assume this is the property list form.
*/ - while (CONSP (prop) && CONSP (XCDR (prop))) - { - Lisp_Object keyword = XCAR (prop); - Lisp_Object value = XCAR (XCDR (prop)); - - if (EQ (keyword, QCfamily)) - { - if (STRINGP (value)) - to[LFACE_FAMILY_INDEX] = value; - else - add_to_log ("Invalid face font family", value, Qnil); - } - else if (EQ (keyword, QCheight)) - { - Lisp_Object new_height = - merge_face_heights (value, to[LFACE_HEIGHT_INDEX], - Qnil, Qnil); - - if (NILP (new_height)) - add_to_log ("Invalid face font height", value, Qnil); - else - to[LFACE_HEIGHT_INDEX] = new_height; - } - else if (EQ (keyword, QCweight)) - { - if (SYMBOLP (value) - && face_numeric_weight (value) >= 0) - to[LFACE_WEIGHT_INDEX] = value; - else - add_to_log ("Invalid face weight", value, Qnil); - } - else if (EQ (keyword, QCslant)) - { - if (SYMBOLP (value) - && face_numeric_slant (value) >= 0) - to[LFACE_SLANT_INDEX] = value; - else - add_to_log ("Invalid face slant", value, Qnil); - } - else if (EQ (keyword, QCunderline)) - { - if (EQ (value, Qt) - || NILP (value) - || STRINGP (value)) - to[LFACE_UNDERLINE_INDEX] = value; - else - add_to_log ("Invalid face underline", value, Qnil); - } - else if (EQ (keyword, QCoverline)) - { - if (EQ (value, Qt) - || NILP (value) - || STRINGP (value)) - to[LFACE_OVERLINE_INDEX] = value; - else - add_to_log ("Invalid face overline", value, Qnil); - } - else if (EQ (keyword, QCstrike_through)) - { - if (EQ (value, Qt) - || NILP (value) - || STRINGP (value)) - to[LFACE_STRIKE_THROUGH_INDEX] = value; - else - add_to_log ("Invalid face strike-through", value, Qnil); - } - else if (EQ (keyword, QCbox)) - { - if (EQ (value, Qt)) - value = make_number (1); - if (INTEGERP (value) - || STRINGP (value) - || CONSP (value) - || NILP (value)) - to[LFACE_BOX_INDEX] = value; - else - add_to_log ("Invalid face box", value, Qnil); - } - else if (EQ (keyword, QCinverse_video) - || EQ (keyword, QCreverse_video)) - { - if (EQ (value, Qt) || NILP (value)) - to[LFACE_INVERSE_INDEX] = value; - else - 
add_to_log ("Invalid face inverse-video", value, Qnil); - } - else if (EQ (keyword, QCforeground)) - { - if (STRINGP (value)) - to[LFACE_FOREGROUND_INDEX] = value; - else - add_to_log ("Invalid face foreground", value, Qnil); - } - else if (EQ (keyword, QCbackground)) - { - if (STRINGP (value)) - to[LFACE_BACKGROUND_INDEX] = value; - else - add_to_log ("Invalid face background", value, Qnil); - } - else if (EQ (keyword, QCstipple)) - { -#ifdef HAVE_X_WINDOWS - Lisp_Object pixmap_p = Fbitmap_spec_p (value); - if (!NILP (pixmap_p)) - to[LFACE_STIPPLE_INDEX] = value; - else - add_to_log ("Invalid face stipple", value, Qnil); -#endif - } - else if (EQ (keyword, QCwidth)) - { - if (SYMBOLP (value) - && face_numeric_swidth (value) >= 0) - to[LFACE_SWIDTH_INDEX] = value; - else - add_to_log ("Invalid face width", value, Qnil); - } - else if (EQ (keyword, QCinherit)) - { - if (SYMBOLP (value)) - to[LFACE_INHERIT_INDEX] = value; - else - { - Lisp_Object tail; - for (tail = value; CONSP (tail); tail = XCDR (tail)) - if (!SYMBOLP (XCAR (tail))) - break; - if (NILP (tail)) - to[LFACE_INHERIT_INDEX] = value; - else - add_to_log ("Invalid face inherit", value, Qnil); - } - } - else - add_to_log ("Invalid attribute %s in face property", - keyword, Qnil); - - prop = XCDR (XCDR (prop)); - } - } - else - { - /* This is a list of face specs. Specifications at the - beginning of the list take precedence over later - specifications, so we have to merge starting with the - last specification. */ - Lisp_Object next = XCDR (prop); - if (!NILP (next)) - merge_face_vector_with_property (f, to, next); - merge_face_vector_with_property (f, to, first); - } - } - else - { - /* PROP ought to be a face name. 
*/ - Lisp_Object lface = lface_from_face_name (f, prop, 0); - if (NILP (lface)) - add_to_log ("Invalid face text property value: %s", prop, Qnil); - else - merge_face_vectors (f, XVECTOR (lface)->contents, to, Qnil); - } -} - - -DEFUN ("internal-make-lisp-face", Finternal_make_lisp_face, - Sinternal_make_lisp_face, 1, 2, 0, - "Make FACE, a symbol, a Lisp face with all attributes nil.\n\ -If FACE was not known as a face before, create a new one.\n\ -If optional argument FRAME is specified, make a frame-local face\n\ -for that frame. Otherwise operate on the global face definition.\n\ -Value is a vector of face attributes.") - (face, frame) - Lisp_Object face, frame; -{ - Lisp_Object global_lface, lface; - struct frame *f; - int i; - - CHECK_SYMBOL (face, 0); - global_lface = lface_from_face_name (NULL, face, 0); - - if (!NILP (frame)) - { - CHECK_LIVE_FRAME (frame, 1); - f = XFRAME (frame); - lface = lface_from_face_name (f, face, 0); - } - else - f = NULL, lface = Qnil; - - /* Add a global definition if there is none. */ - if (NILP (global_lface)) - { - global_lface = Fmake_vector (make_number (LFACE_VECTOR_SIZE), - Qunspecified); - AREF (global_lface, 0) = Qface; - Vface_new_frame_defaults = Fcons (Fcons (face, global_lface), - Vface_new_frame_defaults); - - /* Assign the new Lisp face a unique ID. The mapping from Lisp - face id to Lisp face is given by the vector lface_id_to_name. - The mapping from Lisp face to Lisp face id is given by the - property `face' of the Lisp face name. 
*/ - if (next_lface_id == lface_id_to_name_size) - { - int new_size = max (50, 2 * lface_id_to_name_size); - int sz = new_size * sizeof *lface_id_to_name; - lface_id_to_name = (Lisp_Object *) xrealloc (lface_id_to_name, sz); - lface_id_to_name_size = new_size; - } - - lface_id_to_name[next_lface_id] = face; - Fput (face, Qface, make_number (next_lface_id)); - ++next_lface_id; - } - else if (f == NULL) - for (i = 1; i < LFACE_VECTOR_SIZE; ++i) - AREF (global_lface, i) = Qunspecified; - - /* Add a frame-local definition. */ - if (f) - { - if (NILP (lface)) - { - lface = Fmake_vector (make_number (LFACE_VECTOR_SIZE), - Qunspecified); - AREF (lface, 0) = Qface; - f->face_alist = Fcons (Fcons (face, lface), f->face_alist); - } - else - for (i = 1; i < LFACE_VECTOR_SIZE; ++i) - AREF (lface, i) = Qunspecified; - } - else - lface = global_lface; - - xassert (LFACEP (lface)); - check_lface (lface); - return lface; -} - - -DEFUN ("internal-lisp-face-p", Finternal_lisp_face_p, - Sinternal_lisp_face_p, 1, 2, 0, - "Return non-nil if FACE names a face.\n\ -If optional second parameter FRAME is non-nil, check for the\n\ -existence of a frame-local face with name FACE on that frame.\n\ -Otherwise check for the existence of a global face.") - (face, frame) - Lisp_Object face, frame; -{ - Lisp_Object lface; - - if (!NILP (frame)) - { - CHECK_LIVE_FRAME (frame, 1); - lface = lface_from_face_name (XFRAME (frame), face, 0); - } - else - lface = lface_from_face_name (NULL, face, 0); - - return lface; -} - - -DEFUN ("internal-copy-lisp-face", Finternal_copy_lisp_face, - Sinternal_copy_lisp_face, 4, 4, 0, - "Copy face FROM to TO.\n\ -If FRAME it t, copy the global face definition of FROM to the\n\ -global face definition of TO. 
Otherwise, copy the frame-local\n\ -definition of FROM on FRAME to the frame-local definition of TO\n\ -on NEW-FRAME, or FRAME if NEW-FRAME is nil.\n\ -\n\ -Value is TO.") - (from, to, frame, new_frame) - Lisp_Object from, to, frame, new_frame; -{ - Lisp_Object lface, copy; - - CHECK_SYMBOL (from, 0); - CHECK_SYMBOL (to, 1); - if (NILP (new_frame)) - new_frame = frame; - - if (EQ (frame, Qt)) - { - /* Copy global definition of FROM. We don't make copies of - strings etc. because 20.2 didn't do it either. */ - lface = lface_from_face_name (NULL, from, 1); - copy = Finternal_make_lisp_face (to, Qnil); - } - else - { - /* Copy frame-local definition of FROM. */ - CHECK_LIVE_FRAME (frame, 2); - CHECK_LIVE_FRAME (new_frame, 3); - lface = lface_from_face_name (XFRAME (frame), from, 1); - copy = Finternal_make_lisp_face (to, new_frame); - } - - bcopy (XVECTOR (lface)->contents, XVECTOR (copy)->contents, - LFACE_VECTOR_SIZE * sizeof (Lisp_Object)); - - return to; -} - - -DEFUN ("internal-set-lisp-face-attribute", Finternal_set_lisp_face_attribute, - Sinternal_set_lisp_face_attribute, 3, 4, 0, - "Set attribute ATTR of FACE to VALUE.\n\ -FRAME being a frame means change the face on that frame.\n\ -FRAME nil means change the face of the selected frame.\n\ -FRAME t means change the default for new frames.\n\ -FRAME 0 means change the face on all frames, and change the default\n\ - for new frames.") - (face, attr, value, frame) - Lisp_Object face, attr, value, frame; -{ - Lisp_Object lface; - Lisp_Object old_value = Qnil; - /* Set 1 if ATTR is QCfont. */ - int font_attr_p = 0; - /* Set 1 if ATTR is one of font-related attributes other than QCfont. */ - int font_related_attr_p = 0; - - CHECK_SYMBOL (face, 0); - CHECK_SYMBOL (attr, 1); - - face = resolve_face_name (face); - - /* If FRAME is 0, change face on all frames, and change the - default for new frames. 
*/ - if (INTEGERP (frame) && XINT (frame) == 0) - { - Lisp_Object tail; - Finternal_set_lisp_face_attribute (face, attr, value, Qt); - FOR_EACH_FRAME (tail, frame) - Finternal_set_lisp_face_attribute (face, attr, value, frame); - return face; - } - - /* Set lface to the Lisp attribute vector of FACE. */ - if (EQ (frame, Qt)) - lface = lface_from_face_name (NULL, face, 1); - else - { - if (NILP (frame)) - frame = selected_frame; - - CHECK_LIVE_FRAME (frame, 3); - lface = lface_from_face_name (XFRAME (frame), face, 0); - - /* If a frame-local face doesn't exist yet, create one. */ - if (NILP (lface)) - lface = Finternal_make_lisp_face (face, frame); - } - - if (EQ (attr, QCfamily)) - { - if (!UNSPECIFIEDP (value)) - { - CHECK_STRING (value, 3); - if (XSTRING (value)->size == 0) - signal_error ("Invalid face family", value); - } - old_value = LFACE_FAMILY (lface); - LFACE_FAMILY (lface) = value; - font_related_attr_p = 1; - } - else if (EQ (attr, QCheight)) - { - if (!UNSPECIFIEDP (value)) - { - Lisp_Object test = - (EQ (face, Qdefault) ? value : - /* The default face must have an absolute size, otherwise, we do - a test merge with a random height to see if VALUE's ok. 
*/ - merge_face_heights (value, make_number(10), Qnil, Qnil)); - - if (!INTEGERP(test) || XINT(test) <= 0) - signal_error ("Invalid face height", value); - } - - old_value = LFACE_HEIGHT (lface); - LFACE_HEIGHT (lface) = value; - font_related_attr_p = 1; - } - else if (EQ (attr, QCweight)) - { - if (!UNSPECIFIEDP (value)) - { - CHECK_SYMBOL (value, 3); - if (face_numeric_weight (value) < 0) - signal_error ("Invalid face weight", value); - } - old_value = LFACE_WEIGHT (lface); - LFACE_WEIGHT (lface) = value; - font_related_attr_p = 1; - } - else if (EQ (attr, QCslant)) - { - if (!UNSPECIFIEDP (value)) - { - CHECK_SYMBOL (value, 3); - if (face_numeric_slant (value) < 0) - signal_error ("Invalid face slant", value); - } - old_value = LFACE_SLANT (lface); - LFACE_SLANT (lface) = value; - font_related_attr_p = 1; - } - else if (EQ (attr, QCunderline)) - { - if (!UNSPECIFIEDP (value)) - if ((SYMBOLP (value) - && !EQ (value, Qt) - && !EQ (value, Qnil)) - /* Underline color. */ - || (STRINGP (value) - && XSTRING (value)->size == 0)) - signal_error ("Invalid face underline", value); - - old_value = LFACE_UNDERLINE (lface); - LFACE_UNDERLINE (lface) = value; - } - else if (EQ (attr, QCoverline)) - { - if (!UNSPECIFIEDP (value)) - if ((SYMBOLP (value) - && !EQ (value, Qt) - && !EQ (value, Qnil)) - /* Overline color. */ - || (STRINGP (value) - && XSTRING (value)->size == 0)) - signal_error ("Invalid face overline", value); - - old_value = LFACE_OVERLINE (lface); - LFACE_OVERLINE (lface) = value; - } - else if (EQ (attr, QCstrike_through)) - { - if (!UNSPECIFIEDP (value)) - if ((SYMBOLP (value) - && !EQ (value, Qt) - && !EQ (value, Qnil)) - /* Strike-through color. 
*/ - || (STRINGP (value) - && XSTRING (value)->size == 0)) - signal_error ("Invalid face strike-through", value); - - old_value = LFACE_STRIKE_THROUGH (lface); - LFACE_STRIKE_THROUGH (lface) = value; - } - else if (EQ (attr, QCbox)) - { - int valid_p; - - /* Allow t meaning a simple box of width 1 in foreground color - of the face. */ - if (EQ (value, Qt)) - value = make_number (1); - - if (UNSPECIFIEDP (value)) - valid_p = 1; - else if (NILP (value)) - valid_p = 1; - else if (INTEGERP (value)) - valid_p = XINT (value) != 0; - else if (STRINGP (value)) - valid_p = XSTRING (value)->size > 0; - else if (CONSP (value)) - { - Lisp_Object tem; - - tem = value; - while (CONSP (tem)) - { - Lisp_Object k, v; - - k = XCAR (tem); - tem = XCDR (tem); - if (!CONSP (tem)) - break; - v = XCAR (tem); - tem = XCDR (tem); - - if (EQ (k, QCline_width)) - { - if (!INTEGERP (v) || XINT (v) == 0) - break; - } - else if (EQ (k, QCcolor)) - { - if (!STRINGP (v) || XSTRING (v)->size == 0) - break; - } - else if (EQ (k, QCstyle)) - { - if (!EQ (v, Qpressed_button) && !EQ (v, Qreleased_button)) - break; - } - else - break; - } - - valid_p = NILP (tem); - } - else - valid_p = 0; - - if (!valid_p) - signal_error ("Invalid face box", value); - - old_value = LFACE_BOX (lface); - LFACE_BOX (lface) = value; - } - else if (EQ (attr, QCinverse_video) - || EQ (attr, QCreverse_video)) - { - if (!UNSPECIFIEDP (value)) - { - CHECK_SYMBOL (value, 3); - if (!EQ (value, Qt) && !NILP (value)) - signal_error ("Invalid inverse-video face attribute value", value); - } - old_value = LFACE_INVERSE (lface); - LFACE_INVERSE (lface) = value; - } - else if (EQ (attr, QCforeground)) - { - if (!UNSPECIFIEDP (value)) - { - /* Don't check for valid color names here because it depends - on the frame (display) whether the color will be valid - when the face is realized. 
*/ - CHECK_STRING (value, 3); - if (XSTRING (value)->size == 0) - signal_error ("Empty foreground color value", value); - } - old_value = LFACE_FOREGROUND (lface); - LFACE_FOREGROUND (lface) = value; - } - else if (EQ (attr, QCbackground)) - { - if (!UNSPECIFIEDP (value)) - { - /* Don't check for valid color names here because it depends - on the frame (display) whether the color will be valid - when the face is realized. */ - CHECK_STRING (value, 3); - if (XSTRING (value)->size == 0) - signal_error ("Empty background color value", value); - } - old_value = LFACE_BACKGROUND (lface); - LFACE_BACKGROUND (lface) = value; - } - else if (EQ (attr, QCstipple)) - { -#ifdef HAVE_X_WINDOWS - if (!UNSPECIFIEDP (value) - && !NILP (value) - && NILP (Fbitmap_spec_p (value))) - signal_error ("Invalid stipple attribute", value); - old_value = LFACE_STIPPLE (lface); - LFACE_STIPPLE (lface) = value; -#endif /* HAVE_X_WINDOWS */ - } - else if (EQ (attr, QCwidth)) - { - if (!UNSPECIFIEDP (value)) - { - CHECK_SYMBOL (value, 3); - if (face_numeric_swidth (value) < 0) - signal_error ("Invalid face width", value); - } - old_value = LFACE_SWIDTH (lface); - LFACE_SWIDTH (lface) = value; - font_related_attr_p = 1; - } - else if (EQ (attr, QCfont)) - { -#ifdef HAVE_WINDOW_SYSTEM - /* Set font-related attributes of the Lisp face from an - XLFD font name. */ - struct frame *f; - Lisp_Object tmp; - - CHECK_STRING (value, 3); - if (EQ (frame, Qt)) - f = SELECTED_FRAME (); - else - f = check_x_frame (frame); - - /* VALUE may be a fontset name or an alias of fontset. In such - a case, use the base fontset name. 
*/ - tmp = Fquery_fontset (value, Qnil); - if (!NILP (tmp)) - value = tmp; - - if (!set_lface_from_font_name (f, lface, value, 1, 1)) - signal_error ("Invalid font or fontset name", value); - - font_attr_p = 1; -#endif /* HAVE_WINDOW_SYSTEM */ - } - else if (EQ (attr, QCinherit)) - { - Lisp_Object tail; - if (SYMBOLP (value)) - tail = Qnil; - else - for (tail = value; CONSP (tail); tail = XCDR (tail)) - if (!SYMBOLP (XCAR (tail))) - break; - if (NILP (tail)) - LFACE_INHERIT (lface) = value; - else - signal_error ("Invalid face inheritance", value); - } - else if (EQ (attr, QCbold)) - { - old_value = LFACE_WEIGHT (lface); - LFACE_WEIGHT (lface) = NILP (value) ? Qnormal : Qbold; - font_related_attr_p = 1; - } - else if (EQ (attr, QCitalic)) - { - old_value = LFACE_SLANT (lface); - LFACE_SLANT (lface) = NILP (value) ? Qnormal : Qitalic; - font_related_attr_p = 1; - } - else - signal_error ("Invalid face attribute name", attr); - - if (font_related_attr_p - && !UNSPECIFIEDP (value)) - /* If a font-related attribute other than QCfont is specified, the - original `font' attribute nor that of default face is useless - to determine a new font. Thus, we set it to nil so that font - selection mechanism doesn't use it. */ - LFACE_FONT (lface) = Qnil; - - /* Changing a named face means that all realized faces depending on - that face are invalid. Since we cannot tell which realized faces - depend on the face, make sure they are all removed. This is done - by incrementing face_change_count. The next call to - init_iterator will then free realized faces. 
*/ - if (!EQ (frame, Qt) - && (EQ (attr, QCfont) - || NILP (Fequal (old_value, value)))) - { - ++face_change_count; - ++windows_or_buffers_changed; - } - - if (!UNSPECIFIEDP (value) - && NILP (Fequal (old_value, value))) - { - Lisp_Object param; - - param = Qnil; - - if (EQ (face, Qdefault)) - { -#ifdef HAVE_WINDOW_SYSTEM - /* Changed font-related attributes of the `default' face are - reflected in changed `font' frame parameters. */ - if (FRAMEP (frame) - && (font_related_attr_p || font_attr_p) - && lface_fully_specified_p (XVECTOR (lface)->contents)) - set_font_frame_param (frame, lface); - else -#endif /* HAVE_WINDOW_SYSTEM */ - - if (EQ (attr, QCforeground)) - param = Qforeground_color; - else if (EQ (attr, QCbackground)) - param = Qbackground_color; - } -#ifdef HAVE_WINDOW_SYSTEM -#ifndef WINDOWSNT - else if (EQ (face, Qscroll_bar)) - { - /* Changing the colors of `scroll-bar' sets frame parameters - `scroll-bar-foreground' and `scroll-bar-background'. */ - if (EQ (attr, QCforeground)) - param = Qscroll_bar_foreground; - else if (EQ (attr, QCbackground)) - param = Qscroll_bar_background; - } -#endif /* not WINDOWSNT */ - else if (EQ (face, Qborder)) - { - /* Changing background color of `border' sets frame parameter - `border-color'. */ - if (EQ (attr, QCbackground)) - param = Qborder_color; - } - else if (EQ (face, Qcursor)) - { - /* Changing background color of `cursor' sets frame parameter - `cursor-color'. */ - if (EQ (attr, QCbackground)) - param = Qcursor_color; - } - else if (EQ (face, Qmouse)) - { - /* Changing background color of `mouse' sets frame parameter - `mouse-color'. */ - if (EQ (attr, QCbackground)) - param = Qmouse_color; - } -#endif /* HAVE_WINDOW_SYSTEM */ - else if (EQ (face, Qmenu)) - { - /* Indicate that we have to update the menu bar when - realizing faces on FRAME. FRAME t change the - default for new frames. 
We do this by setting - setting the flag in new face caches */ - if (FRAMEP (frame)) - { - struct frame *f = XFRAME (frame); - if (FRAME_FACE_CACHE (f) == NULL) - FRAME_FACE_CACHE (f) = make_face_cache (f); - FRAME_FACE_CACHE (f)->menu_face_changed_p = 1; - } - else - menu_face_changed_default = 1; - } - - if (!NILP (param)) - if (EQ (frame, Qt)) - /* Update `default-frame-alist', which is used for new frames. */ - { - store_in_alist (&Vdefault_frame_alist, param, value); - } - else - /* Update the current frame's parameters. */ - { - Lisp_Object cons; - cons = XCAR (Vparam_value_alist); - XCAR (cons) = param; - XCDR (cons) = value; - Fmodify_frame_parameters (frame, Vparam_value_alist); - } - } - - return face; -} - - -#ifdef HAVE_WINDOW_SYSTEM - -/* Set the `font' frame parameter of FRAME determined from `default' - face attributes LFACE. If a face or fontset name is explicitely - specfied in LFACE, use it as is. Otherwise, determine a font name - from the other font-related atrributes of LFACE. In that case, if - there's no matching font, signals an error. */ - -static void -set_font_frame_param (frame, lface) - Lisp_Object frame, lface; -{ - struct frame *f = XFRAME (frame); - - if (FRAME_WINDOW_P (f)) - { - Lisp_Object font_name; - char *font; - - if (STRINGP (LFACE_FONT (lface))) - font_name = LFACE_FONT (lface); - else - { - /* Choose a font name that reflects LFACE's attributes and has - the registry and encoding pattern specified in the default - fontset (3rd arg: -1) for ASCII characters (4th arg: 0). */ - font = choose_face_font (f, XVECTOR (lface)->contents, -1, 0); - if (!font) - error ("No font matches the specified attribute"); - font_name = build_string (font); - xfree (font); - } - - Fmodify_frame_parameters (frame, Fcons (Fcons (Qfont, font_name), Qnil)); - } -} - - -/* Update the corresponding face when frame parameter PARAM on frame F - has been assigned the value NEW_VALUE. 
*/ - -void -update_face_from_frame_parameter (f, param, new_value) - struct frame *f; - Lisp_Object param, new_value; -{ - Lisp_Object lface; - - /* If there are no faces yet, give up. This is the case when called - from Fx_create_frame, and we do the necessary things later in - face-set-after-frame-defaults. */ - if (NILP (f->face_alist)) - return; - - if (EQ (param, Qforeground_color)) - { - lface = lface_from_face_name (f, Qdefault, 1); - LFACE_FOREGROUND (lface) = (STRINGP (new_value) - ? new_value : Qunspecified); - realize_basic_faces (f); - } - else if (EQ (param, Qbackground_color)) - { - Lisp_Object frame; - - /* Changing the background color might change the background - mode, so that we have to load new defface specs. Call - frame-update-face-colors to do that. */ - XSETFRAME (frame, f); - call1 (Qframe_update_face_colors, frame); - - face = Qdefault; - lface = lface_from_face_name (f, face, 1); - LFACE_BACKGROUND (lface) = (STRINGP (new_value) - ? new_value : Qunspecified); - realize_basic_faces (f); - } - else if (EQ (param, Qborder_color)) - { - face = Qborder; - lface = lface_from_face_name (f, face, 1); - LFACE_BACKGROUND (lface) = (STRINGP (new_value) - ? new_value : Qunspecified); - } - else if (EQ (param, Qcursor_color)) - { - face = Qcursor; - lface = lface_from_face_name (f, face, 1); - LFACE_BACKGROUND (lface) = (STRINGP (new_value) - ? new_value : Qunspecified); - } - else if (EQ (param, Qmouse_color)) - { - face = Qmouse; - lface = lface_from_face_name (f, face, 1); - LFACE_BACKGROUND (lface) = (STRINGP (new_value) - ? new_value : Qunspecified); - } - - /* Changing a named face means that all realized faces depending on - that face are invalid. Since we cannot tell which realized faces - depend on the face, make sure they are all removed. This is done - by incrementing face_change_count. The next call to - init_iterator will then free realized faces. 
*/ - if (!NILP (face) - && NILP (Fget (face, Qface_no_inherit))) - { - ++face_change_count; - ++windows_or_buffers_changed; - } -} - - -/* Get the value of X resource RESOURCE, class CLASS for the display - of frame FRAME. This is here because ordinary `x-get-resource' - doesn't take a frame argument. */ - -DEFUN ("internal-face-x-get-resource", Finternal_face_x_get_resource, - Sinternal_face_x_get_resource, 3, 3, 0, "") - (resource, class, frame) - Lisp_Object resource, class, frame; -{ - Lisp_Object value = Qnil; -#ifndef WINDOWSNT -#ifndef macintosh - CHECK_STRING (resource, 0); - CHECK_STRING (class, 1); - CHECK_LIVE_FRAME (frame, 2); - BLOCK_INPUT; - value = display_x_get_resource (FRAME_X_DISPLAY_INFO (XFRAME (frame)), - resource, class, Qnil, Qnil); - UNBLOCK_INPUT; -#endif /* not macintosh */ -#endif /* not WINDOWSNT */ - return value; -} - - -/* Return resource string VALUE as a boolean value, i.e. nil, or t. - If VALUE is "on" or "true", return t. If VALUE is "off" or - "false", return nil. Otherwise, if SIGNAL_P is non-zero, signal an - error; if SIGNAL_P is zero, return 0. 
*/ - -static Lisp_Object -face_boolean_x_resource_value (value, signal_p) - Lisp_Object value; - int signal_p; -{ - Lisp_Object result = make_number (0); - - xassert (STRINGP (value)); - - if (xstricmp (XSTRING (value)->data, "on") == 0 - || xstricmp (XSTRING (value)->data, "true") == 0) - result = Qt; - else if (xstricmp (XSTRING (value)->data, "off") == 0 - || xstricmp (XSTRING (value)->data, "false") == 0) - result = Qnil; - else if (xstricmp (XSTRING (value)->data, "unspecified") == 0) - result = Qunspecified; - else if (signal_p) - signal_error ("Invalid face attribute value from X resource", value); - - return result; -} - - -DEFUN ("internal-set-lisp-face-attribute-from-resource", - Finternal_set_lisp_face_attribute_from_resource, - Sinternal_set_lisp_face_attribute_from_resource, - 3, 4, 0, "") - (face, attr, value, frame) - Lisp_Object face, attr, value, frame; -{ - CHECK_SYMBOL (face, 0); - CHECK_SYMBOL (attr, 1); - CHECK_STRING (value, 2); - - if (xstricmp (XSTRING (value)->data, "unspecified") == 0) - value = Qunspecified; - else if (EQ (attr, QCheight)) - { - value = Fstring_to_number (value, make_number (10)); - if (XINT (value) <= 0) - signal_error ("Invalid face height from X resource", value); - } - else if (EQ (attr, QCbold) || EQ (attr, QCitalic)) - value = face_boolean_x_resource_value (value, 1); - else if (EQ (attr, QCweight) || EQ (attr, QCslant) || EQ (attr, QCwidth)) - value = intern (XSTRING (value)->data); - else if (EQ (attr, QCreverse_video) || EQ (attr, QCinverse_video)) - value = face_boolean_x_resource_value (value, 1); - else if (EQ (attr, QCunderline) - || EQ (attr, QCoverline) - || EQ (attr, QCstrike_through)) - { - Lisp_Object boolean_value; - - /* If the result of face_boolean_x_resource_value is t or nil, - VALUE does NOT specify a color. 
*/ - boolean_value = face_boolean_x_resource_value (value, 0); - if (SYMBOLP (boolean_value)) - value = boolean_value; - } - else if (EQ (attr, QCbox)) - value = Fcar (Fread_from_string (value, Qnil, Qnil)); - - return Finternal_set_lisp_face_attribute (face, attr, value, frame); -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - -/*********************************************************************** - Menu face - ***********************************************************************/ - -#if defined HAVE_X_WINDOWS && defined USE_X_TOOLKIT - -/* Make menus on frame F appear as specified by the `menu' face. */ - -static void -x_update_menu_appearance (f) - struct frame *f; -{ - struct x_display_info *dpyinfo = FRAME_X_DISPLAY_INFO (f); - XrmDatabase rdb; - - if (dpyinfo - && (rdb = XrmGetDatabase (FRAME_X_DISPLAY (f)), - rdb != NULL)) - { - char line[512]; - Lisp_Object lface = lface_from_face_name (f, Qmenu, 1); - struct face *face = FACE_FROM_ID (f, MENU_FACE_ID); - char *myname = XSTRING (Vx_resource_name)->data; - int changed_p = 0; -#ifdef USE_MOTIF - const char *popup_path = "popup_menu"; -#else - const char *popup_path = "menu.popup"; -#endif - - if (STRINGP (LFACE_FOREGROUND (lface))) - { - sprintf (line, "%s.%s*foreground: %s", - myname, popup_path, - XSTRING (LFACE_FOREGROUND (lface))->data); - XrmPutLineResource (&rdb, line); - sprintf (line, "%s.pane.menubar*foreground: %s", - myname, XSTRING (LFACE_FOREGROUND (lface))->data); - XrmPutLineResource (&rdb, line); - changed_p = 1; - } - - if (STRINGP (LFACE_BACKGROUND (lface))) - { - sprintf (line, "%s.%s*background: %s", - myname, popup_path, - XSTRING (LFACE_BACKGROUND (lface))->data); - XrmPutLineResource (&rdb, line); - sprintf (line, "%s.pane.menubar*background: %s", - myname, XSTRING (LFACE_BACKGROUND (lface))->data); - XrmPutLineResource (&rdb, line); - changed_p = 1; - } - - if (face->font_name - && (!UNSPECIFIEDP (LFACE_FAMILY (lface)) - || !UNSPECIFIEDP (LFACE_SWIDTH (lface)) - || !UNSPECIFIEDP 
(LFACE_AVGWIDTH (lface)) - || !UNSPECIFIEDP (LFACE_WEIGHT (lface)) - || !UNSPECIFIEDP (LFACE_SLANT (lface)) - || !UNSPECIFIEDP (LFACE_HEIGHT (lface)))) - { -#ifdef USE_MOTIF - const char *suffix = "List"; -#else - const char *suffix = ""; -#endif - sprintf (line, "%s.pane.menubar*font%s: %s", - myname, suffix, face->font_name); - XrmPutLineResource (&rdb, line); - sprintf (line, "%s.%s*font%s: %s", - myname, popup_path, suffix, face->font_name); - XrmPutLineResource (&rdb, line); - changed_p = 1; - } - - if (changed_p && f->output_data.x->menubar_widget) - free_frame_menubar (f); - } -} - -#endif /* HAVE_X_WINDOWS && USE_X_TOOLKIT */ - - - -DEFUN ("internal-get-lisp-face-attribute", Finternal_get_lisp_face_attribute, - Sinternal_get_lisp_face_attribute, - 2, 3, 0, - "Return face attribute KEYWORD of face SYMBOL.\n\ -If SYMBOL does not name a valid Lisp face or KEYWORD isn't a valid\n\ -face attribute name, signal an error.\n\ -If the optional argument FRAME is given, report on face FACE in that\n\ -frame. If FRAME is t, report on the defaults for face FACE (for new\n\ -frames). 
If FRAME is omitted or nil, use the selected frame.") - (symbol, keyword, frame) - Lisp_Object symbol, keyword, frame; -{ - Lisp_Object lface, value = Qnil; - - CHECK_SYMBOL (symbol, 0); - CHECK_SYMBOL (keyword, 1); - - if (EQ (frame, Qt)) - lface = lface_from_face_name (NULL, symbol, 1); - else - { - if (NILP (frame)) - frame = selected_frame; - CHECK_LIVE_FRAME (frame, 2); - lface = lface_from_face_name (XFRAME (frame), symbol, 1); - } - - if (EQ (keyword, QCfamily)) - value = LFACE_FAMILY (lface); - else if (EQ (keyword, QCheight)) - value = LFACE_HEIGHT (lface); - else if (EQ (keyword, QCweight)) - value = LFACE_WEIGHT (lface); - else if (EQ (keyword, QCslant)) - value = LFACE_SLANT (lface); - else if (EQ (keyword, QCunderline)) - value = LFACE_UNDERLINE (lface); - else if (EQ (keyword, QCoverline)) - value = LFACE_OVERLINE (lface); - else if (EQ (keyword, QCstrike_through)) - value = LFACE_STRIKE_THROUGH (lface); - else if (EQ (keyword, QCbox)) - value = LFACE_BOX (lface); - else if (EQ (keyword, QCinverse_video) - || EQ (keyword, QCreverse_video)) - value = LFACE_INVERSE (lface); - else if (EQ (keyword, QCforeground)) - value = LFACE_FOREGROUND (lface); - else if (EQ (keyword, QCbackground)) - value = LFACE_BACKGROUND (lface); - else if (EQ (keyword, QCstipple)) - value = LFACE_STIPPLE (lface); - else if (EQ (keyword, QCwidth)) - value = LFACE_SWIDTH (lface); - else if (EQ (keyword, QCinherit)) - value = LFACE_INHERIT (lface); - else if (EQ (keyword, QCfont)) - value = LFACE_FONT (lface); - else - signal_error ("Invalid face attribute name", keyword); - - return value; -} - - -DEFUN ("internal-lisp-face-attribute-values", - Finternal_lisp_face_attribute_values, - Sinternal_lisp_face_attribute_values, 1, 1, 0, - "Return a list of valid discrete values for face attribute ATTR.\n\ -Value is nil if ATTR doesn't have a discrete set of valid values.") - (attr) - Lisp_Object attr; -{ - Lisp_Object result = Qnil; - - CHECK_SYMBOL (attr, 0); - - if (EQ (attr, 
QCweight) - || EQ (attr, QCslant) - || EQ (attr, QCwidth)) - { - /* Extract permissible symbols from tables. */ - struct table_entry *table; - int i, dim; - - if (EQ (attr, QCweight)) - table = weight_table, dim = DIM (weight_table); - else if (EQ (attr, QCslant)) - table = slant_table, dim = DIM (slant_table); - else - table = swidth_table, dim = DIM (swidth_table); - - for (i = 0; i < dim; ++i) - { - Lisp_Object symbol = *table[i].symbol; - Lisp_Object tail = result; - - while (!NILP (tail) - && !EQ (XCAR (tail), symbol)) - tail = XCDR (tail); - - if (NILP (tail)) - result = Fcons (symbol, result); - } - } - else if (EQ (attr, QCunderline)) - result = Fcons (Qt, Fcons (Qnil, Qnil)); - else if (EQ (attr, QCoverline)) - result = Fcons (Qt, Fcons (Qnil, Qnil)); - else if (EQ (attr, QCstrike_through)) - result = Fcons (Qt, Fcons (Qnil, Qnil)); - else if (EQ (attr, QCinverse_video) || EQ (attr, QCreverse_video)) - result = Fcons (Qt, Fcons (Qnil, Qnil)); - - return result; -} - - -DEFUN ("internal-merge-in-global-face", Finternal_merge_in_global_face, - Sinternal_merge_in_global_face, 2, 2, 0, - "Add attributes from frame-default definition of FACE to FACE on FRAME.\n\ -Default face attributes override any local face attributes.") - (face, frame) - Lisp_Object face, frame; -{ - int i; - Lisp_Object global_lface, local_lface, *gvec, *lvec; - - CHECK_LIVE_FRAME (frame, 1); - global_lface = lface_from_face_name (NULL, face, 1); - local_lface = lface_from_face_name (XFRAME (frame), face, 0); - if (NILP (local_lface)) - local_lface = Finternal_make_lisp_face (face, frame); - - /* Make every specified global attribute override the local one. - BEWARE!! This is only used from `face-set-after-frame-default' where - the local frame is defined from default specs in `face-defface-spec' - and those should be overridden by global settings. Hence the strange - "global before local" priority. 
*/ - lvec = XVECTOR (local_lface)->contents; - gvec = XVECTOR (global_lface)->contents; - for (i = 1; i < LFACE_VECTOR_SIZE; ++i) - if (! UNSPECIFIEDP (gvec[i])) - lvec[i] = gvec[i]; - - return Qnil; -} - - -/* The following function is implemented for compatibility with 20.2. - The function is used in x-resolve-fonts when it is asked to - return fonts with the same size as the font of a face. This is - done in fontset.el. */ - -DEFUN ("face-font", Fface_font, Sface_font, 1, 2, 0, - "Return the font name of face FACE, or nil if it is unspecified.\n\ -If the optional argument FRAME is given, report on face FACE in that frame.\n\ -If FRAME is t, report on the defaults for face FACE (for new frames).\n\ - The font default for a face is either nil, or a list\n\ - of the form (bold), (italic) or (bold italic).\n\ -If FRAME is omitted or nil, use the selected frame.") - (face, frame) - Lisp_Object face, frame; -{ - if (EQ (frame, Qt)) - { - Lisp_Object result = Qnil; - Lisp_Object lface = lface_from_face_name (NULL, face, 1); - - if (!UNSPECIFIEDP (LFACE_WEIGHT (lface)) - && !EQ (LFACE_WEIGHT (lface), Qnormal)) - result = Fcons (Qbold, result); - - if (!UNSPECIFIEDP (LFACE_SLANT (lface)) - && !EQ (LFACE_SLANT (lface), Qnormal)) - result = Fcons (Qitalic, result); - - return result; - } - else - { - struct frame *f = frame_or_selected_frame (frame, 1); - int face_id = lookup_named_face (f, face, 0); - struct face *face = FACE_FROM_ID (f, face_id); - return face ? build_string (face->font_name) : Qnil; - } -} - - -/* Compare face vectors V1 and V2 for equality. Value is non-zero if - all attributes are `equal'. Tries to be fast because this function - is called quite often. */ - -static INLINE int -lface_equal_p (v1, v2) - Lisp_Object *v1, *v2; -{ - int i, equal_p = 1; - - for (i = 1; i < LFACE_VECTOR_SIZE && equal_p; ++i) - { - Lisp_Object a = v1[i]; - Lisp_Object b = v2[i]; - - /* Type can differ, e.g. when one attribute is unspecified, i.e. 
nil, - and the other is specified. */ - equal_p = XTYPE (a) == XTYPE (b); - if (!equal_p) - break; - - if (!EQ (a, b)) - { - switch (XTYPE (a)) - { - case Lisp_String: - equal_p = ((STRING_BYTES (XSTRING (a)) - == STRING_BYTES (XSTRING (b))) - && bcmp (XSTRING (a)->data, XSTRING (b)->data, - STRING_BYTES (XSTRING (a))) == 0); - break; - - case Lisp_Int: - case Lisp_Symbol: - equal_p = 0; - break; - - default: - equal_p = !NILP (Fequal (a, b)); - break; - } - } - } - - return equal_p; -} - - -DEFUN ("internal-lisp-face-equal-p", Finternal_lisp_face_equal_p, - Sinternal_lisp_face_equal_p, 2, 3, 0, - "True if FACE1 and FACE2 are equal.\n\ -If the optional argument FRAME is given, report on face FACE in that frame.\n\ -If FRAME is t, report on the defaults for face FACE (for new frames).\n\ -If FRAME is omitted or nil, use the selected frame.") - (face1, face2, frame) - Lisp_Object face1, face2, frame; -{ - int equal_p; - struct frame *f; - Lisp_Object lface1, lface2; - - if (EQ (frame, Qt)) - f = NULL; - else - /* Don't use check_x_frame here because this function is called - before X frames exist. At that time, if FRAME is nil, - selected_frame will be used which is the frame dumped with - Emacs. That frame is not an X frame. */ - f = frame_or_selected_frame (frame, 2); - - lface1 = lface_from_face_name (NULL, face1, 1); - lface2 = lface_from_face_name (NULL, face2, 1); - equal_p = lface_equal_p (XVECTOR (lface1)->contents, - XVECTOR (lface2)->contents); - return equal_p ? 
Qt : Qnil; -} - - -DEFUN ("internal-lisp-face-empty-p", Finternal_lisp_face_empty_p, - Sinternal_lisp_face_empty_p, 1, 2, 0, - "True if FACE has no attribute specified.\n\ -If the optional argument FRAME is given, report on face FACE in that frame.\n\ -If FRAME is t, report on the defaults for face FACE (for new frames).\n\ -If FRAME is omitted or nil, use the selected frame.") - (face, frame) - Lisp_Object face, frame; -{ - struct frame *f; - Lisp_Object lface; - int i; - - if (NILP (frame)) - frame = selected_frame; - CHECK_LIVE_FRAME (frame, 0); - f = XFRAME (frame); - - if (EQ (frame, Qt)) - lface = lface_from_face_name (NULL, face, 1); - else - lface = lface_from_face_name (f, face, 1); - - for (i = 1; i < LFACE_VECTOR_SIZE; ++i) - if (!UNSPECIFIEDP (AREF (lface, i))) - break; - - return i == LFACE_VECTOR_SIZE ? Qt : Qnil; -} - - -DEFUN ("frame-face-alist", Fframe_face_alist, Sframe_face_alist, - 0, 1, 0, - "Return an alist of frame-local faces defined on FRAME.\n\ -For internal use only.") - (frame) - Lisp_Object frame; -{ - struct frame *f = frame_or_selected_frame (frame, 0); - return f->face_alist; -} - - -/* Return a hash code for Lisp string STRING with case ignored. Used - below in computing a hash value for a Lisp face. */ - -static INLINE unsigned -hash_string_case_insensitive (string) - Lisp_Object string; -{ - unsigned char *s; - unsigned hash = 0; - xassert (STRINGP (string)); - for (s = XSTRING (string)->data; *s; ++s) - hash = (hash << 1) ^ tolower (*s); - return hash; -} - - -/* Return a hash code for face attribute vector V. 
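As an aside, the shift-and-xor case-folded hash used by hash_string_case_insensitive can be exercised on its own. The sketch below is a hypothetical standalone copy operating on plain C strings rather than Lisp strings; by construction, strings differing only in case produce the same hash value.

```c
#include <assert.h>
#include <ctype.h>

/* Standalone copy of the face-string hash: shift the accumulator left
   one bit and xor in the lowercased byte.  Case variants collide by
   construction, which is what the face cache wants. */
unsigned hash_string_ci (const char *s)
{
  unsigned hash = 0;
  for (; *s; ++s)
    hash = (hash << 1) ^ (unsigned) tolower ((unsigned char) *s);
  return hash;
}
```

Note that the left shift discards high bits on long strings, so this is a fast, collision-tolerant hash rather than a cryptographic one; the cache resolves collisions by chaining.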
*/ - -static INLINE unsigned -lface_hash (v) - Lisp_Object *v; -{ - return (hash_string_case_insensitive (v[LFACE_FAMILY_INDEX]) - ^ hash_string_case_insensitive (v[LFACE_FOREGROUND_INDEX]) - ^ hash_string_case_insensitive (v[LFACE_BACKGROUND_INDEX]) - ^ XFASTINT (v[LFACE_WEIGHT_INDEX]) - ^ XFASTINT (v[LFACE_SLANT_INDEX]) - ^ XFASTINT (v[LFACE_SWIDTH_INDEX]) - ^ XFASTINT (v[LFACE_HEIGHT_INDEX])); -} - - -/* Return non-zero if LFACE1 and LFACE2 specify the same font (without - considering charsets/registries). They do if they specify the same - family, point size, weight, width, slant, and fontset. Both LFACE1 - and LFACE2 must be fully-specified. */ - -static INLINE int -lface_same_font_attributes_p (lface1, lface2) - Lisp_Object *lface1, *lface2; -{ - xassert (lface_fully_specified_p (lface1) - && lface_fully_specified_p (lface2)); - return (xstricmp (XSTRING (lface1[LFACE_FAMILY_INDEX])->data, - XSTRING (lface2[LFACE_FAMILY_INDEX])->data) == 0 - && EQ (lface1[LFACE_HEIGHT_INDEX], lface2[LFACE_HEIGHT_INDEX]) - && EQ (lface1[LFACE_SWIDTH_INDEX], lface2[LFACE_SWIDTH_INDEX]) - && EQ (lface1[LFACE_AVGWIDTH_INDEX], lface2[LFACE_AVGWIDTH_INDEX]) - && EQ (lface1[LFACE_WEIGHT_INDEX], lface2[LFACE_WEIGHT_INDEX]) - && EQ (lface1[LFACE_SLANT_INDEX], lface2[LFACE_SLANT_INDEX]) - && (EQ (lface1[LFACE_FONT_INDEX], lface2[LFACE_FONT_INDEX]) - || (STRINGP (lface1[LFACE_FONT_INDEX]) - && STRINGP (lface2[LFACE_FONT_INDEX]) - && xstricmp (XSTRING (lface1[LFACE_FONT_INDEX])->data, - XSTRING (lface2[LFACE_FONT_INDEX])->data)))); -} - - - -/*********************************************************************** - Realized Faces - ***********************************************************************/ - -/* Allocate and return a new realized face for Lisp face attribute - vector ATTR. 
-   */
-
-static struct face *
-make_realized_face (attr)
-     Lisp_Object *attr;
-{
-  struct face *face = (struct face *) xmalloc (sizeof *face);
-  bzero (face, sizeof *face);
-  face->ascii_face = face;
-  bcopy (attr, face->lface, sizeof face->lface);
-  return face;
-}
-
-
-/* Free realized face FACE, including its X resources.  FACE may
-   be null.  */
-
-static void
-free_realized_face (f, face)
-     struct frame *f;
-     struct face *face;
-{
-  if (face)
-    {
-#ifdef HAVE_WINDOW_SYSTEM
-      if (FRAME_WINDOW_P (f))
-        {
-          /* Free fontset of FACE if it is ASCII face.  */
-          if (face->fontset >= 0 && face == face->ascii_face)
-            free_face_fontset (f, face);
-          if (face->gc)
-            {
-              x_free_gc (f, face->gc);
-              face->gc = 0;
-            }
-
-          free_face_colors (f, face);
-          x_destroy_bitmap (f, face->stipple);
-        }
-#endif /* HAVE_WINDOW_SYSTEM */
-
-      xfree (face);
-    }
-}
-
-
-/* Prepare face FACE for subsequent display on frame F.  This
-   allocates GCs if they haven't been allocated yet or have been freed
-   by clearing the face cache.  */
-
-void
-prepare_face_for_display (f, face)
-     struct frame *f;
-     struct face *face;
-{
-#ifdef HAVE_WINDOW_SYSTEM
-  xassert (FRAME_WINDOW_P (f));
-
-  if (face->gc == 0)
-    {
-      XGCValues xgcv;
-      unsigned long mask = GCForeground | GCBackground | GCGraphicsExposures;
-
-      xgcv.foreground = face->foreground;
-      xgcv.background = face->background;
-#ifdef HAVE_X_WINDOWS
-      xgcv.graphics_exposures = False;
-#endif
-      /* The font of FACE may be null if we couldn't load it.
*/ - if (face->font) - { -#ifdef HAVE_X_WINDOWS - xgcv.font = face->font->fid; -#endif -#ifdef WINDOWSNT - xgcv.font = face->font; -#endif -#ifdef macintosh - xgcv.font = face->font; -#endif - mask |= GCFont; - } - - BLOCK_INPUT; -#ifdef HAVE_X_WINDOWS - if (face->stipple) - { - xgcv.fill_style = FillOpaqueStippled; - xgcv.stipple = x_bitmap_pixmap (f, face->stipple); - mask |= GCFillStyle | GCStipple; - } -#endif - face->gc = x_create_gc (f, mask, &xgcv); - UNBLOCK_INPUT; - } -#endif /* HAVE_WINDOW_SYSTEM */ -} - - -/*********************************************************************** - Face Cache - ***********************************************************************/ - -/* Return a new face cache for frame F. */ - -static struct face_cache * -make_face_cache (f) - struct frame *f; -{ - struct face_cache *c; - int size; - - c = (struct face_cache *) xmalloc (sizeof *c); - bzero (c, sizeof *c); - size = FACE_CACHE_BUCKETS_SIZE * sizeof *c->buckets; - c->buckets = (struct face **) xmalloc (size); - bzero (c->buckets, size); - c->size = 50; - c->faces_by_id = (struct face **) xmalloc (c->size * sizeof *c->faces_by_id); - c->f = f; - c->menu_face_changed_p = menu_face_changed_default; - return c; -} - - -/* Clear out all graphics contexts for all realized faces, except for - the basic faces. This should be done from time to time just to avoid - keeping too many graphics contexts that are no longer needed. */ - -static void -clear_face_gcs (c) - struct face_cache *c; -{ - if (c && FRAME_WINDOW_P (c->f)) - { -#ifdef HAVE_WINDOW_SYSTEM - int i; - for (i = BASIC_FACE_ID_SENTINEL; i < c->used; ++i) - { - struct face *face = c->faces_by_id[i]; - if (face && face->gc) - { - x_free_gc (c->f, face->gc); - face->gc = 0; - } - } -#endif /* HAVE_WINDOW_SYSTEM */ - } -} - - -/* Free all realized faces in face cache C, including basic faces. C - may be null. 
If faces are freed, make sure the frame's current - matrix is marked invalid, so that a display caused by an expose - event doesn't try to use faces we destroyed. */ - -static void -free_realized_faces (c) - struct face_cache *c; -{ - if (c && c->used) - { - int i, size; - struct frame *f = c->f; - - /* We must block input here because we can't process X events - safely while only some faces are freed, or when the frame's - current matrix still references freed faces. */ - BLOCK_INPUT; - - for (i = 0; i < c->used; ++i) - { - free_realized_face (f, c->faces_by_id[i]); - c->faces_by_id[i] = NULL; - } - - c->used = 0; - size = FACE_CACHE_BUCKETS_SIZE * sizeof *c->buckets; - bzero (c->buckets, size); - - /* Must do a thorough redisplay the next time. Mark current - matrices as invalid because they will reference faces freed - above. This function is also called when a frame is - destroyed. In this case, the root window of F is nil. */ - if (WINDOWP (f->root_window)) - { - clear_current_matrices (f); - ++windows_or_buffers_changed; - } - - UNBLOCK_INPUT; - } -} - - -/* Free all faces realized for multibyte characters on frame F that - has FONTSET. */ - -void -free_realized_multibyte_face (f, fontset) - struct frame *f; - int fontset; -{ - struct face_cache *cache = FRAME_FACE_CACHE (f); - struct face *face; - int i; - - /* We must block input here because we can't process X events safely - while only some faces are freed, or when the frame's current - matrix still references freed faces. */ - BLOCK_INPUT; - - for (i = 0; i < cache->used; i++) - { - face = cache->faces_by_id[i]; - if (face - && face != face->ascii_face - && face->fontset == fontset) - { - uncache_face (cache, face); - free_realized_face (f, face); - } - } - - /* Must do a thorough redisplay the next time. Mark current - matrices as invalid because they will reference faces freed - above. This function is also called when a frame is destroyed. - In this case, the root window of F is nil. 
*/ - if (WINDOWP (f->root_window)) - { - clear_current_matrices (f); - ++windows_or_buffers_changed; - } - - UNBLOCK_INPUT; -} - - -/* Free all realized faces on FRAME or on all frames if FRAME is nil. - This is done after attributes of a named face have been changed, - because we can't tell which realized faces depend on that face. */ - -void -free_all_realized_faces (frame) - Lisp_Object frame; -{ - if (NILP (frame)) - { - Lisp_Object rest; - FOR_EACH_FRAME (rest, frame) - free_realized_faces (FRAME_FACE_CACHE (XFRAME (frame))); - } - else - free_realized_faces (FRAME_FACE_CACHE (XFRAME (frame))); -} - - -/* Free face cache C and faces in it, including their X resources. */ - -static void -free_face_cache (c) - struct face_cache *c; -{ - if (c) - { - free_realized_faces (c); - xfree (c->buckets); - xfree (c->faces_by_id); - xfree (c); - } -} - - -/* Cache realized face FACE in face cache C. HASH is the hash value - of FACE. If FACE->fontset >= 0, add the new face to the end of the - collision list of the face hash table of C. This is done because - otherwise lookup_face would find FACE for every character, even if - faces with the same attributes but for specific characters exist. */ - -static void -cache_face (c, face, hash) - struct face_cache *c; - struct face *face; - unsigned hash; -{ - int i = hash % FACE_CACHE_BUCKETS_SIZE; - - face->hash = hash; - - if (face->fontset >= 0) - { - struct face *last = c->buckets[i]; - if (last) - { - while (last->next) - last = last->next; - last->next = face; - face->prev = last; - face->next = NULL; - } - else - { - c->buckets[i] = face; - face->prev = face->next = NULL; - } - } - else - { - face->prev = NULL; - face->next = c->buckets[i]; - if (face->next) - face->next->prev = face; - c->buckets[i] = face; - } - - /* Find a free slot in C->faces_by_id and use the index of the free - slot as FACE->id. 
*/ - for (i = 0; i < c->used; ++i) - if (c->faces_by_id[i] == NULL) - break; - face->id = i; - - /* Maybe enlarge C->faces_by_id. */ - if (i == c->used && c->used == c->size) - { - int new_size = 2 * c->size; - int sz = new_size * sizeof *c->faces_by_id; - c->faces_by_id = (struct face **) xrealloc (c->faces_by_id, sz); - c->size = new_size; - } - -#if GLYPH_DEBUG - /* Check that FACE got a unique id. */ - { - int j, n; - struct face *face; - - for (j = n = 0; j < FACE_CACHE_BUCKETS_SIZE; ++j) - for (face = c->buckets[j]; face; face = face->next) - if (face->id == i) - ++n; - - xassert (n == 1); - } -#endif /* GLYPH_DEBUG */ - - c->faces_by_id[i] = face; - if (i == c->used) - ++c->used; -} - - -/* Remove face FACE from cache C. */ - -static void -uncache_face (c, face) - struct face_cache *c; - struct face *face; -{ - int i = face->hash % FACE_CACHE_BUCKETS_SIZE; - - if (face->prev) - face->prev->next = face->next; - else - c->buckets[i] = face->next; - - if (face->next) - face->next->prev = face->prev; - - c->faces_by_id[face->id] = NULL; - if (face->id == c->used) - --c->used; -} - - -/* Look up a realized face with face attributes ATTR in the face cache - of frame F. The face will be used to display character C. Value - is the ID of the face found. If no suitable face is found, realize - a new one. In that case, if C is a multibyte character, BASE_FACE - is a face that has the same attributes. */ - -INLINE int -lookup_face (f, attr, c, base_face) - struct frame *f; - Lisp_Object *attr; - int c; - struct face *base_face; -{ - struct face_cache *cache = FRAME_FACE_CACHE (f); - unsigned hash; - int i; - struct face *face; - - xassert (cache != NULL); - check_lface_attrs (attr); - - /* Look up ATTR in the face cache. 
-   */
-  hash = lface_hash (attr);
-  i = hash % FACE_CACHE_BUCKETS_SIZE;
-
-  for (face = cache->buckets[i]; face; face = face->next)
-    if (face->hash == hash
-        && (!FRAME_WINDOW_P (f)
-            || FACE_SUITABLE_FOR_CHAR_P (face, c))
-        && lface_equal_p (face->lface, attr))
-      break;
-
-  /* If not found, realize a new face.  */
-  if (face == NULL)
-    face = realize_face (cache, attr, c, base_face, -1);
-
-#if GLYPH_DEBUG
-  xassert (face == FACE_FROM_ID (f, face->id));
-
-/* When this function is called from face_for_char (in this case, C is
-   a multibyte character), a fontset of a face returned by
-   realize_face is not yet set, i.e. FACE_SUITABLE_FOR_CHAR_P (FACE,
-   C) is not satisfied.  The fontset is set for this face by
-   face_for_char later.  */
-#if 0
-  if (FRAME_WINDOW_P (f))
-    xassert (FACE_SUITABLE_FOR_CHAR_P (face, c));
-#endif
-#endif /* GLYPH_DEBUG */
-
-  return face->id;
-}
-
-
-/* Return the face id of the realized face for named face SYMBOL on
-   frame F suitable for displaying character C.  Value is -1 if the
-   face couldn't be determined, which might happen if the default face
-   isn't realized and cannot be realized.  */
-
-int
-lookup_named_face (f, symbol, c)
-     struct frame *f;
-     Lisp_Object symbol;
-     int c;
-{
-  Lisp_Object attrs[LFACE_VECTOR_SIZE];
-  Lisp_Object symbol_attrs[LFACE_VECTOR_SIZE];
-  struct face *default_face = FACE_FROM_ID (f, DEFAULT_FACE_ID);
-
-  if (default_face == NULL)
-    {
-      if (!realize_basic_faces (f))
-        return -1;
-      default_face = FACE_FROM_ID (f, DEFAULT_FACE_ID);
-    }
-
-  get_lface_attributes (f, symbol, symbol_attrs, 1);
-  bcopy (default_face->lface, attrs, sizeof attrs);
-  merge_face_vectors (f, symbol_attrs, attrs, Qnil);
-  return lookup_face (f, attrs, c, NULL);
-}
-
-
-/* Return the ID of the realized ASCII face of Lisp face with ID
-   LFACE_ID on frame F.  Value is -1 if LFACE_ID isn't valid.
*/ - -int -ascii_face_of_lisp_face (f, lface_id) - struct frame *f; - int lface_id; -{ - int face_id; - - if (lface_id >= 0 && lface_id < lface_id_to_name_size) - { - Lisp_Object face_name = lface_id_to_name[lface_id]; - face_id = lookup_named_face (f, face_name, 0); - } - else - face_id = -1; - - return face_id; -} - - -/* Return a face for charset ASCII that is like the face with id - FACE_ID on frame F, but has a font that is STEPS steps smaller. - STEPS < 0 means larger. Value is the id of the face. */ - -int -smaller_face (f, face_id, steps) - struct frame *f; - int face_id, steps; -{ -#ifdef HAVE_WINDOW_SYSTEM - struct face *face; - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - int pt, last_pt, last_height; - int delta; - int new_face_id; - struct face *new_face; - - /* If not called for an X frame, just return the original face. */ - if (FRAME_TERMCAP_P (f)) - return face_id; - - /* Try in increments of 1/2 pt. */ - delta = steps < 0 ? 5 : -5; - steps = abs (steps); - - face = FACE_FROM_ID (f, face_id); - bcopy (face->lface, attrs, sizeof attrs); - pt = last_pt = XFASTINT (attrs[LFACE_HEIGHT_INDEX]); - new_face_id = face_id; - last_height = FONT_HEIGHT (face->font); - - while (steps - && pt + delta > 0 - /* Give up if we cannot find a font within 10pt. */ - && abs (last_pt - pt) < 100) - { - /* Look up a face for a slightly smaller/larger font. */ - pt += delta; - attrs[LFACE_HEIGHT_INDEX] = make_number (pt); - new_face_id = lookup_face (f, attrs, 0, NULL); - new_face = FACE_FROM_ID (f, new_face_id); - - /* If height changes, count that as one step. 
*/ - if ((delta < 0 && FONT_HEIGHT (new_face->font) < last_height) - || (delta > 0 && FONT_HEIGHT (new_face->font) > last_height)) - { - --steps; - last_height = FONT_HEIGHT (new_face->font); - last_pt = pt; - } - } - - return new_face_id; - -#else /* not HAVE_WINDOW_SYSTEM */ - - return face_id; - -#endif /* not HAVE_WINDOW_SYSTEM */ -} - - -/* Return a face for charset ASCII that is like the face with id - FACE_ID on frame F, but has height HEIGHT. */ - -int -face_with_height (f, face_id, height) - struct frame *f; - int face_id; - int height; -{ -#ifdef HAVE_WINDOW_SYSTEM - struct face *face; - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - - if (FRAME_TERMCAP_P (f) - || height <= 0) - return face_id; - - face = FACE_FROM_ID (f, face_id); - bcopy (face->lface, attrs, sizeof attrs); - attrs[LFACE_HEIGHT_INDEX] = make_number (height); - face_id = lookup_face (f, attrs, 0, NULL); -#endif /* HAVE_WINDOW_SYSTEM */ - - return face_id; -} - - -/* Return the face id of the realized face for named face SYMBOL on - frame F suitable for displaying character C, and use attributes of - the face FACE_ID for attributes that aren't completely specified by - SYMBOL. This is like lookup_named_face, except that the default - attributes come from FACE_ID, not from the default face. FACE_ID - is assumed to be already realized. 
*/ - -int -lookup_derived_face (f, symbol, c, face_id) - struct frame *f; - Lisp_Object symbol; - int c; - int face_id; -{ - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - Lisp_Object symbol_attrs[LFACE_VECTOR_SIZE]; - struct face *default_face = FACE_FROM_ID (f, face_id); - - if (!default_face) - abort (); - - get_lface_attributes (f, symbol, symbol_attrs, 1); - bcopy (default_face->lface, attrs, sizeof attrs); - merge_face_vectors (f, symbol_attrs, attrs, Qnil); - return lookup_face (f, attrs, c, default_face); -} - - - -/*********************************************************************** - Font selection - ***********************************************************************/ - -DEFUN ("internal-set-font-selection-order", - Finternal_set_font_selection_order, - Sinternal_set_font_selection_order, 1, 1, 0, - "Set font selection order for face font selection to ORDER.\n\ -ORDER must be a list of length 4 containing the symbols `:width',\n\ -`:height', `:weight', and `:slant'. Face attributes appearing\n\ -first in ORDER are matched first, e.g. 
if `:height' appears before\n\ -`:weight' in ORDER, font selection first tries to find a font with\n\ -a suitable height, and then tries to match the font weight.\n\ -Value is ORDER.") - (order) - Lisp_Object order; -{ - Lisp_Object list; - int i; - int indices[DIM (font_sort_order)]; - - CHECK_LIST (order, 0); - bzero (indices, sizeof indices); - i = 0; - - for (list = order; - CONSP (list) && i < DIM (indices); - list = XCDR (list), ++i) - { - Lisp_Object attr = XCAR (list); - int xlfd; - - if (EQ (attr, QCwidth)) - xlfd = XLFD_SWIDTH; - else if (EQ (attr, QCheight)) - xlfd = XLFD_POINT_SIZE; - else if (EQ (attr, QCweight)) - xlfd = XLFD_WEIGHT; - else if (EQ (attr, QCslant)) - xlfd = XLFD_SLANT; - else - break; - - if (indices[i] != 0) - break; - indices[i] = xlfd; - } - - if (!NILP (list) || i != DIM (indices)) - signal_error ("Invalid font sort order", order); - for (i = 0; i < DIM (font_sort_order); ++i) - if (indices[i] == 0) - signal_error ("Invalid font sort order", order); - - if (bcmp (indices, font_sort_order, sizeof indices) != 0) - { - bcopy (indices, font_sort_order, sizeof font_sort_order); - free_all_realized_faces (Qnil); - } - - return Qnil; -} - - -DEFUN ("internal-set-alternative-font-family-alist", - Finternal_set_alternative_font_family_alist, - Sinternal_set_alternative_font_family_alist, 1, 1, 0, - "Define alternative font families to try in face font selection.\n\ -ALIST is an alist of (FAMILY ALTERNATIVE1 ALTERNATIVE2 ...) entries.\n\ -Each ALTERNATIVE is tried in order if no fonts of font family FAMILY can\n\ -be found. 
Value is ALIST.") - (alist) - Lisp_Object alist; -{ - CHECK_LIST (alist, 0); - Vface_alternative_font_family_alist = alist; - free_all_realized_faces (Qnil); - return alist; -} - - -DEFUN ("internal-set-alternative-font-registry-alist", - Finternal_set_alternative_font_registry_alist, - Sinternal_set_alternative_font_registry_alist, 1, 1, 0, - "Define alternative font registries to try in face font selection.\n\ -ALIST is an alist of (REGISTRY ALTERNATIVE1 ALTERNATIVE2 ...) entries.\n\ -Each ALTERNATIVE is tried in order if no fonts of font registry REGISTRY can\n\ -be found. Value is ALIST.") - (alist) - Lisp_Object alist; -{ - CHECK_LIST (alist, 0); - Vface_alternative_font_registry_alist = alist; - free_all_realized_faces (Qnil); - return alist; -} - - -#ifdef HAVE_WINDOW_SYSTEM - -/* Value is non-zero if FONT is the name of a scalable font. The - X11R6 XLFD spec says that point size, pixel size, and average width - are zero for scalable fonts. Intlfonts contain at least one - scalable font ("*-muleindian-1") for which this isn't true, so we - just test average width. */ - -static int -font_scalable_p (font) - struct font_name *font; -{ - char *s = font->fields[XLFD_AVGWIDTH]; - return (*s == '0' && *(s + 1) == '\0') -#ifdef WINDOWSNT - /* Windows implementation of XLFD is slightly broken for backward - compatibility with previous broken versions, so test for - wildcards as well as 0. */ - || *s == '*' -#endif - ; -} - - -/* Ignore the difference of font point size less than this value. */ - -#define FONT_POINT_SIZE_QUANTUM 5 - -/* Value is non-zero if FONT1 is a better match for font attributes - VALUES than FONT2. VALUES is an array of face attribute values in - font sort order. COMPARE_PT_P zero means don't compare point - sizes. AVGWIDTH, if not zero, is a specified font average width - to compare with. 
*/ - -static int -better_font_p (values, font1, font2, compare_pt_p, avgwidth) - int *values; - struct font_name *font1, *font2; - int compare_pt_p, avgwidth; -{ - int i; - - for (i = 0; i < DIM (font_sort_order); ++i) - { - int xlfd_idx = font_sort_order[i]; - - if (compare_pt_p || xlfd_idx != XLFD_POINT_SIZE) - { - int delta1 = abs (values[i] - font1->numeric[xlfd_idx]); - int delta2 = abs (values[i] - font2->numeric[xlfd_idx]); - - if (xlfd_idx == XLFD_POINT_SIZE - && abs (delta1 - delta2) < FONT_POINT_SIZE_QUANTUM) - continue; - if (delta1 > delta2) - return 0; - else if (delta1 < delta2) - return 1; - else - { - /* The difference may be equal because, e.g., the face - specifies `italic' but we have only `regular' and - `oblique'. Prefer `oblique' in this case. */ - if ((xlfd_idx == XLFD_WEIGHT || xlfd_idx == XLFD_SLANT) - && font1->numeric[xlfd_idx] > values[i] - && font2->numeric[xlfd_idx] < values[i]) - return 1; - } - } - } - - if (avgwidth) - { - int delta1 = abs (avgwidth - font1->numeric[XLFD_AVGWIDTH]); - int delta2 = abs (avgwidth - font2->numeric[XLFD_AVGWIDTH]); - if (delta1 > delta2) - return 0; - else if (delta1 < delta2) - return 1; - } - - return font1->registry_priority < font2->registry_priority; -} - - -/* Value is non-zero if FONT is an exact match for face attributes in - SPECIFIED. SPECIFIED is an array of face attribute values in font - sort order. AVGWIDTH, if non-zero, is an average width to compare - with. */ - -static int -exact_face_match_p (specified, font, avgwidth) - int *specified; - struct font_name *font; - int avgwidth; -{ - int i; - - for (i = 0; i < DIM (font_sort_order); ++i) - if (specified[i] != font->numeric[font_sort_order[i]]) - break; - - return (i == DIM (font_sort_order) - && (avgwidth <= 0 - || avgwidth == font->numeric[XLFD_AVGWIDTH])); -} - - -/* Value is the name of a scaled font, generated from scalable font - FONT on frame F. SPECIFIED_PT is the point-size to scale FONT to. - Value is allocated from heap. 
*/ - -static char * -build_scalable_font_name (f, font, specified_pt) - struct frame *f; - struct font_name *font; - int specified_pt; -{ - char point_size[20], pixel_size[20]; - int pixel_value; - double resy = FRAME_X_DISPLAY_INFO (f)->resy; - double pt; - - /* If scalable font is for a specific resolution, compute - the point size we must specify from the resolution of - the display and the specified resolution of the font. */ - if (font->numeric[XLFD_RESY] != 0) - { - pt = resy / font->numeric[XLFD_RESY] * specified_pt + 0.5; - pixel_value = font->numeric[XLFD_RESY] / (PT_PER_INCH * 10.0) * pt; - } - else - { - pt = specified_pt; - pixel_value = resy / (PT_PER_INCH * 10.0) * pt; - } - - /* Set point size of the font. */ - sprintf (point_size, "%d", (int) pt); - font->fields[XLFD_POINT_SIZE] = point_size; - font->numeric[XLFD_POINT_SIZE] = pt; - - /* Set pixel size. */ - sprintf (pixel_size, "%d", pixel_value); - font->fields[XLFD_PIXEL_SIZE] = pixel_size; - font->numeric[XLFD_PIXEL_SIZE] = pixel_value; - - /* If font doesn't specify its resolution, use the - resolution of the display. */ - if (font->numeric[XLFD_RESY] == 0) - { - char buffer[20]; - sprintf (buffer, "%d", (int) resy); - font->fields[XLFD_RESY] = buffer; - font->numeric[XLFD_RESY] = resy; - } - - if (strcmp (font->fields[XLFD_RESX], "0") == 0) - { - char buffer[20]; - int resx = FRAME_X_DISPLAY_INFO (f)->resx; - sprintf (buffer, "%d", resx); - font->fields[XLFD_RESX] = buffer; - font->numeric[XLFD_RESX] = resx; - } - - return build_font_name (font); -} - - -/* Value is non-zero if we are allowed to use scalable font FONT. We - can't run a Lisp function here since this function may be called - with input blocked. 
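The pixel-size arithmetic inside build_scalable_font_name is worth isolating: XLFD stores point size in tenths of a point, and the pixel size follows from the display's vertical resolution. The sketch below uses 72.0 points per inch for round numbers; Emacs's actual PT_PER_INCH constant differs slightly, so treat the value as an assumption of this sketch.

```c
#include <assert.h>

/* Points per inch; a simplifying assumption for this sketch. */
#define PT_PER_INCH 72.0

/* PT_TENTHS is an XLFD point size in tenths of a point (120 = 12 pt);
   RESY is the display's vertical resolution in dots per inch.
   pixels = inches * dpi, where inches = points / points-per-inch. */
int pixels_for_point_size (int pt_tenths, double resy)
{
  return (int) (resy / (PT_PER_INCH * 10.0) * pt_tenths);
}
```

So a 12 pt request on a 72 dpi display yields 12 pixels, and doubling the resolution doubles the pixel size, which is why the function above must also rewrite the XLFD_RESY field when the font doesn't specify its own resolution.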
*/ - -static int -may_use_scalable_font_p (font) - char *font; -{ - if (EQ (Vscalable_fonts_allowed, Qt)) - return 1; - else if (CONSP (Vscalable_fonts_allowed)) - { - Lisp_Object tail, regexp; - - for (tail = Vscalable_fonts_allowed; CONSP (tail); tail = XCDR (tail)) - { - regexp = XCAR (tail); - if (STRINGP (regexp) - && fast_c_string_match_ignore_case (regexp, font) >= 0) - return 1; - } - } - - return 0; -} - - - -/* Return the name of the best matching font for face attributes ATTRS - in the array of font_name structures FONTS which contains NFONTS - elements. WIDTH_RATIO is a factor with which to multiply average - widths if ATTRS specifies such a width. - - Value is a font name which is allocated from the heap. FONTS is - freed by this function. */ - -static char * -best_matching_font (f, attrs, fonts, nfonts, width_ratio) - struct frame *f; - Lisp_Object *attrs; - struct font_name *fonts; - int nfonts; - int width_ratio; -{ - char *font_name; - struct font_name *best; - int i, pt = 0; - int specified[5]; - int exact_p, avgwidth; - - if (nfonts == 0) - return NULL; - - /* Make specified font attributes available in `specified', - indexed by sort order. */ - for (i = 0; i < DIM (font_sort_order); ++i) - { - int xlfd_idx = font_sort_order[i]; - - if (xlfd_idx == XLFD_SWIDTH) - specified[i] = face_numeric_swidth (attrs[LFACE_SWIDTH_INDEX]); - else if (xlfd_idx == XLFD_POINT_SIZE) - specified[i] = pt = XFASTINT (attrs[LFACE_HEIGHT_INDEX]); - else if (xlfd_idx == XLFD_WEIGHT) - specified[i] = face_numeric_weight (attrs[LFACE_WEIGHT_INDEX]); - else if (xlfd_idx == XLFD_SLANT) - specified[i] = face_numeric_slant (attrs[LFACE_SLANT_INDEX]); - else - abort (); - } - - avgwidth = (UNSPECIFIEDP (attrs[LFACE_AVGWIDTH_INDEX]) - ? 0 - : XFASTINT (attrs[LFACE_AVGWIDTH_INDEX]) * width_ratio); - - exact_p = 0; - - /* Start with the first non-scalable font in the list. 
-   */
-  for (i = 0; i < nfonts; ++i)
-    if (!font_scalable_p (fonts + i))
-      break;
-
-  /* Find the best match among the non-scalable fonts.  */
-  if (i < nfonts)
-    {
-      best = fonts + i;
-
-      for (i = 1; i < nfonts; ++i)
-        if (!font_scalable_p (fonts + i)
-            && better_font_p (specified, fonts + i, best, 1, avgwidth))
-          {
-            best = fonts + i;
-
-            exact_p = exact_face_match_p (specified, best, avgwidth);
-            if (exact_p)
-              break;
-          }
-    }
-  else
-    best = NULL;
-
-  /* Unless we found an exact match among non-scalable fonts, see if
-     we can find a better match among scalable fonts.  */
-  if (!exact_p)
-    {
-      /* A scalable font is better if
-
-         1. its weight, slant, and swidth attributes are better, or
-
-         2. the best non-scalable font doesn't have the required
-            point size, and the scalable font's weight, slant, and
-            swidth aren't worse.  */
-
-      int non_scalable_has_exact_height_p;
-
-      if (best && best->numeric[XLFD_POINT_SIZE] == pt)
-        non_scalable_has_exact_height_p = 1;
-      else
-        non_scalable_has_exact_height_p = 0;
-
-      for (i = 0; i < nfonts; ++i)
-        if (font_scalable_p (fonts + i))
-          {
-            if (best == NULL
-                || better_font_p (specified, fonts + i, best, 0, 0)
-                || (!non_scalable_has_exact_height_p
-                    && !better_font_p (specified, best, fonts + i, 0, 0)))
-              best = fonts + i;
-          }
-    }
-
-  if (font_scalable_p (best))
-    font_name = build_scalable_font_name (f, best, pt);
-  else
-    font_name = build_font_name (best);
-
-  /* Free font_name structures.  */
-  free_font_names (fonts, nfonts);
-
-  return font_name;
-}
-
-
-/* Get a list of matching fonts on frame F, considering FAMILY
-   and alternative font families from Vface_alternative_font_family_alist.
-
-   FAMILY is the font family whose alternatives are considered.
-
-   REGISTRY, if a string, specifies a font registry and encoding to
-   match.  A value of nil means include fonts of any registry and
-   encoding.
-
-   Return in *FONTS a pointer to a vector of font_name structures for
-   the fonts matched.  Value is the number of fonts found.
*/ - -static int -try_alternative_families (f, family, registry, fonts) - struct frame *f; - Lisp_Object family, registry; - struct font_name **fonts; -{ - Lisp_Object alter; - int nfonts = 0; - - nfonts = font_list (f, Qnil, family, registry, fonts); - if (nfonts == 0) - { - /* Try alternative font families. */ - alter = Fassoc (family, Vface_alternative_font_family_alist); - if (CONSP (alter)) - { - for (alter = XCDR (alter); - CONSP (alter) && nfonts == 0; - alter = XCDR (alter)) - { - if (STRINGP (XCAR (alter))) - nfonts = font_list (f, Qnil, XCAR (alter), registry, fonts); - } - } - - /* Try scalable fonts before giving up. */ - if (nfonts == 0 && NILP (Vscalable_fonts_allowed)) - { - int count = BINDING_STACK_SIZE (); - specbind (Qscalable_fonts_allowed, Qt); - nfonts = try_alternative_families (f, family, registry, fonts); - unbind_to (count, Qnil); - } - } - return nfonts; -} - - -/* Get a list of matching fonts on frame F. - - FAMILY, if a string, specifies a font family derived from the fontset. - It is only used if the face does not specify any family in ATTRS or - if we cannot find any font of the face's family. - - REGISTRY, if a string, specifies a font registry and encoding to - match. A value of nil means include fonts of any registry and - encoding. - - Return in *FONTS a pointer to a vector of font_name structures for - the fonts matched. Value is the number of fonts found. */ - -static int -try_font_list (f, attrs, family, registry, fonts) - struct frame *f; - Lisp_Object *attrs; - Lisp_Object family, registry; - struct font_name **fonts; -{ - int nfonts = 0; - Lisp_Object face_family = attrs[LFACE_FAMILY_INDEX]; - - if (STRINGP (face_family)) - nfonts = try_alternative_families (f, face_family, registry, fonts); - - if (nfonts == 0 && !NILP (family)) - nfonts = try_alternative_families (f, family, registry, fonts); - - /* Try font family of the default face or "fixed". 
*/ - if (nfonts == 0) - { - struct face *default_face = FACE_FROM_ID (f, DEFAULT_FACE_ID); - if (default_face) - family = default_face->lface[LFACE_FAMILY_INDEX]; - else - family = build_string ("fixed"); - nfonts = font_list (f, Qnil, family, registry, fonts); - } - - /* Try any family with the given registry. */ - if (nfonts == 0) - nfonts = font_list (f, Qnil, Qnil, registry, fonts); - - return nfonts; -} - - -/* Return the fontset id of the base fontset name or alias name given - by the fontset attribute of ATTRS. Value is -1 if the fontset - attribute of ATTRS doesn't name a fontset. */ - -static int -face_fontset (attrs) - Lisp_Object *attrs; -{ - Lisp_Object name; - - name = attrs[LFACE_FONT_INDEX]; - if (!STRINGP (name)) - return -1; - return fs_query_fontset (name, 0); -} - - -/* Choose a name of font to use on frame F to display character C with - Lisp face attributes specified by ATTRS. The font name is - determined by the font-related attributes in ATTRS and the name - pattern for C in FONTSET. Value is the font name which is - allocated from the heap and must be freed by the caller, or NULL if - we can get no information about the font name of C. It is assured - that we always get some information for a single byte - character. */ - -static char * -choose_face_font (f, attrs, fontset, c) - struct frame *f; - Lisp_Object *attrs; - int fontset, c; -{ - Lisp_Object pattern; - char *font_name = NULL; - struct font_name *fonts; - int nfonts, width_ratio; - - /* Get (foundry and) family name and registry (and encoding) name of - a font for C. */ - pattern = fontset_font_pattern (f, fontset, c); - if (NILP (pattern)) - { - xassert (!SINGLE_BYTE_CHAR_P (c)); - return NULL; - } - - /* If what we got is a name pattern, return it. */ - if (STRINGP (pattern)) - return xstrdup (XSTRING (pattern)->data); - - /* Get a list of fonts matching that pattern and choose the - best match for the specified face attributes from it. 
*/ - nfonts = try_font_list (f, attrs, XCAR (pattern), XCDR (pattern), &fonts); - width_ratio = (SINGLE_BYTE_CHAR_P (c) - ? 1 - : CHARSET_WIDTH (CHAR_CHARSET (c))); - font_name = best_matching_font (f, attrs, fonts, nfonts, width_ratio); - return font_name; -} - -#endif /* HAVE_WINDOW_SYSTEM */ - - - -/*********************************************************************** - Face Realization - ***********************************************************************/ - -/* Realize basic faces on frame F. Value is zero if frame parameters - of F don't contain enough information needed to realize the default - face. */ - -static int -realize_basic_faces (f) - struct frame *f; -{ - int success_p = 0; - int count = BINDING_STACK_SIZE (); - - /* Block input here so that we won't be surprised by an X expose - event, for instance, without having the faces set up. */ - BLOCK_INPUT; - specbind (Qscalable_fonts_allowed, Qt); - - if (realize_default_face (f)) - { - realize_named_face (f, Qmode_line, MODE_LINE_FACE_ID); - realize_named_face (f, Qtool_bar, TOOL_BAR_FACE_ID); - realize_named_face (f, Qfringe, BITMAP_AREA_FACE_ID); - realize_named_face (f, Qheader_line, HEADER_LINE_FACE_ID); - realize_named_face (f, Qscroll_bar, SCROLL_BAR_FACE_ID); - realize_named_face (f, Qborder, BORDER_FACE_ID); - realize_named_face (f, Qcursor, CURSOR_FACE_ID); - realize_named_face (f, Qmouse, MOUSE_FACE_ID); - realize_named_face (f, Qmenu, MENU_FACE_ID); - - /* Reflect changes in the `menu' face in menu bars. */ - if (FRAME_FACE_CACHE (f)->menu_face_changed_p) - { - FRAME_FACE_CACHE (f)->menu_face_changed_p = 0; -#ifdef USE_X_TOOLKIT - x_update_menu_appearance (f); -#endif - } - - success_p = 1; - } - - unbind_to (count, Qnil); - UNBLOCK_INPUT; - return success_p; -} - - -/* Realize the default face on frame F. If the face is not fully - specified, make it fully-specified. Attributes of the default face - that are not explicitly specified are taken from frame parameters. 
*/ - -static int -realize_default_face (f) - struct frame *f; -{ - struct face_cache *c = FRAME_FACE_CACHE (f); - Lisp_Object lface; - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - Lisp_Object frame_font; - struct face *face; - - /* If the `default' face is not yet known, create it. */ - lface = lface_from_face_name (f, Qdefault, 0); - if (NILP (lface)) - { - Lisp_Object frame; - XSETFRAME (frame, f); - lface = Finternal_make_lisp_face (Qdefault, frame); - } - -#ifdef HAVE_WINDOW_SYSTEM - if (FRAME_WINDOW_P (f)) - { - /* Set frame_font to the value of the `font' frame parameter. */ - frame_font = Fassq (Qfont, f->param_alist); - xassert (CONSP (frame_font) && STRINGP (XCDR (frame_font))); - frame_font = XCDR (frame_font); - set_lface_from_font_name (f, lface, frame_font, 1, 1); - } -#endif /* HAVE_WINDOW_SYSTEM */ - - if (!FRAME_WINDOW_P (f)) - { - LFACE_FAMILY (lface) = build_string ("default"); - LFACE_SWIDTH (lface) = Qnormal; - LFACE_HEIGHT (lface) = make_number (1); - LFACE_WEIGHT (lface) = Qnormal; - LFACE_SLANT (lface) = Qnormal; - LFACE_AVGWIDTH (lface) = Qunspecified; - } - - if (UNSPECIFIEDP (LFACE_UNDERLINE (lface))) - LFACE_UNDERLINE (lface) = Qnil; - - if (UNSPECIFIEDP (LFACE_OVERLINE (lface))) - LFACE_OVERLINE (lface) = Qnil; - - if (UNSPECIFIEDP (LFACE_STRIKE_THROUGH (lface))) - LFACE_STRIKE_THROUGH (lface) = Qnil; - - if (UNSPECIFIEDP (LFACE_BOX (lface))) - LFACE_BOX (lface) = Qnil; - - if (UNSPECIFIEDP (LFACE_INVERSE (lface))) - LFACE_INVERSE (lface) = Qnil; - - if (UNSPECIFIEDP (LFACE_FOREGROUND (lface))) - { - /* This function is called so early that colors are not yet - set in the frame parameter list. 
*/ - Lisp_Object color = Fassq (Qforeground_color, f->param_alist); - - if (CONSP (color) && STRINGP (XCDR (color))) - LFACE_FOREGROUND (lface) = XCDR (color); - else if (FRAME_WINDOW_P (f)) - return 0; - else if (FRAME_TERMCAP_P (f) || FRAME_MSDOS_P (f)) - LFACE_FOREGROUND (lface) = build_string (unspecified_fg); - else - abort (); - } - - if (UNSPECIFIEDP (LFACE_BACKGROUND (lface))) - { - /* This function is called so early that colors are not yet - set in the frame parameter list. */ - Lisp_Object color = Fassq (Qbackground_color, f->param_alist); - if (CONSP (color) && STRINGP (XCDR (color))) - LFACE_BACKGROUND (lface) = XCDR (color); - else if (FRAME_WINDOW_P (f)) - return 0; - else if (FRAME_TERMCAP_P (f) || FRAME_MSDOS_P (f)) - LFACE_BACKGROUND (lface) = build_string (unspecified_bg); - else - abort (); - } - - if (UNSPECIFIEDP (LFACE_STIPPLE (lface))) - LFACE_STIPPLE (lface) = Qnil; - - /* Realize the face; it must be fully-specified now. */ - xassert (lface_fully_specified_p (XVECTOR (lface)->contents)); - check_lface (lface); - bcopy (XVECTOR (lface)->contents, attrs, sizeof attrs); - face = realize_face (c, attrs, 0, NULL, DEFAULT_FACE_ID); - return 1; -} - - -/* Realize basic faces other than the default face in face cache C. - SYMBOL is the face name, ID is the face id the realized face must - have. The default face must have been realized already. */ - -static void -realize_named_face (f, symbol, id) - struct frame *f; - Lisp_Object symbol; - int id; -{ - struct face_cache *c = FRAME_FACE_CACHE (f); - Lisp_Object lface = lface_from_face_name (f, symbol, 0); - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - Lisp_Object symbol_attrs[LFACE_VECTOR_SIZE]; - struct face *new_face; - - /* The default face must exist and be fully specified. */ - get_lface_attributes (f, Qdefault, attrs, 1); - check_lface_attrs (attrs); - xassert (lface_fully_specified_p (attrs)); - - /* If SYMBOL isn't know as a face, create it. 
*/ - if (NILP (lface)) - { - Lisp_Object frame; - XSETFRAME (frame, f); - lface = Finternal_make_lisp_face (symbol, frame); - } - - /* Merge SYMBOL's face with the default face. */ - get_lface_attributes (f, symbol, symbol_attrs, 1); - merge_face_vectors (f, symbol_attrs, attrs, Qnil); - - /* Realize the face. */ - new_face = realize_face (c, attrs, 0, NULL, id); -} - - -/* Realize the fully-specified face with attributes ATTRS in face - cache CACHE for character C. If C is a multibyte character, - BASE_FACE is a face that has the same attributes. Otherwise, - BASE_FACE is ignored. If FORMER_FACE_ID is non-negative, it is an - ID of face to remove before caching the new face. Value is a - pointer to the newly created realized face. */ - -static struct face * -realize_face (cache, attrs, c, base_face, former_face_id) - struct face_cache *cache; - Lisp_Object *attrs; - int c; - struct face *base_face; - int former_face_id; -{ - struct face *face; - - /* LFACE must be fully specified. */ - xassert (cache != NULL); - check_lface_attrs (attrs); - - if (former_face_id >= 0 && cache->used > former_face_id) - { - /* Remove the former face. */ - struct face *former_face = cache->faces_by_id[former_face_id]; - uncache_face (cache, former_face); - free_realized_face (cache->f, former_face); - } - - if (FRAME_WINDOW_P (cache->f)) - face = realize_x_face (cache, attrs, c, base_face); - else if (FRAME_TERMCAP_P (cache->f) || FRAME_MSDOS_P (cache->f)) - face = realize_tty_face (cache, attrs, c); - else - abort (); - - /* Insert the new face. */ - cache_face (cache, face, lface_hash (attrs)); -#ifdef HAVE_WINDOW_SYSTEM - if (FRAME_WINDOW_P (cache->f) && face->font == NULL) - load_face_font (cache->f, face, c); -#endif /* HAVE_WINDOW_SYSTEM */ - return face; -} - - -/* Realize the fully-specified face with attributes ATTRS in face - cache CACHE for character C. Do it for X frame CACHE->f. If C is - a multibyte character, BASE_FACE is a face that has the same - attributes. 
Otherwise, BASE_FACE is ignored. If the new face - doesn't share font with the default face, a fontname is allocated - from the heap and set in `font_name' of the new face, but it is not - yet loaded here. Value is a pointer to the newly created realized - face. */ - -static struct face * -realize_x_face (cache, attrs, c, base_face) - struct face_cache *cache; - Lisp_Object *attrs; - int c; - struct face *base_face; -{ -#ifdef HAVE_WINDOW_SYSTEM - struct face *face, *default_face; - struct frame *f; - Lisp_Object stipple, overline, strike_through, box; - - xassert (FRAME_WINDOW_P (cache->f)); - xassert (SINGLE_BYTE_CHAR_P (c) - || base_face); - - /* Allocate a new realized face. */ - face = make_realized_face (attrs); - - f = cache->f; - - /* If C is a multibyte character, we share all face attirbutes with - BASE_FACE including the realized fontset. But, we must load a - different font. */ - if (!SINGLE_BYTE_CHAR_P (c)) - { - bcopy (base_face, face, sizeof *face); - face->gc = 0; - - /* Don't try to free the colors copied bitwise from BASE_FACE. */ - face->colors_copied_bitwise_p = 1; - - /* to force realize_face to load font */ - face->font = NULL; - return face; - } - - /* Now we are realizing a face for ASCII (and unibyte) characters. */ - - /* Determine the font to use. Most of the time, the font will be - the same as the font of the default face, so try that first. */ - default_face = FACE_FROM_ID (f, DEFAULT_FACE_ID); - if (default_face - && FACE_SUITABLE_FOR_CHAR_P (default_face, c) - && lface_same_font_attributes_p (default_face->lface, attrs)) - { - face->font = default_face->font; - face->fontset = default_face->fontset; - face->font_info_id = default_face->font_info_id; - face->font_name = default_face->font_name; - face->ascii_face = face; - - /* But, as we can't share the fontset, make a new realized - fontset that has the same base fontset as of the default - face. 
*/ - face->fontset - = make_fontset_for_ascii_face (f, default_face->fontset); - } - else - { - /* If the face attribute ATTRS specifies a fontset, use it as - the base of a new realized fontset. Otherwise, use the same - base fontset as of the default face. The base determines - registry and encoding of a font. It may also determine - foundry and family. The other fields of font name pattern - are constructed from ATTRS. */ - int fontset = face_fontset (attrs); - - if ((fontset == -1) && default_face) - fontset = default_face->fontset; - face->fontset = make_fontset_for_ascii_face (f, fontset); - face->font = NULL; /* to force realize_face to load font */ - -#ifdef macintosh - /* Load the font if it is specified in ATTRS. This fixes - changing frame font on the Mac. */ - if (STRINGP (attrs[LFACE_FONT_INDEX])) - { - struct font_info *font_info = - FS_LOAD_FONT (f, 0, XSTRING (attrs[LFACE_FONT_INDEX])->data, -1); - if (font_info) - face->font = font_info->font; - } -#endif - } - - /* Load colors, and set remaining attributes. */ - - load_face_colors (f, face, attrs); - - /* Set up box. */ - box = attrs[LFACE_BOX_INDEX]; - if (STRINGP (box)) - { - /* A simple box of line width 1 drawn in color given by - the string. */ - face->box_color = load_color (f, face, attrs[LFACE_BOX_INDEX], - LFACE_BOX_INDEX); - face->box = FACE_SIMPLE_BOX; - face->box_line_width = 1; - } - else if (INTEGERP (box)) - { - /* Simple box of specified line width in foreground color of the - face. */ - xassert (XINT (box) != 0); - face->box = FACE_SIMPLE_BOX; - face->box_line_width = XINT (box); - face->box_color = face->foreground; - face->box_color_defaulted_p = 1; - } - else if (CONSP (box)) - { - /* `(:width WIDTH :color COLOR :shadow SHADOW)'. SHADOW - being one of `raised' or `sunken'. 
*/ - face->box = FACE_SIMPLE_BOX; - face->box_color = face->foreground; - face->box_color_defaulted_p = 1; - face->box_line_width = 1; - - while (CONSP (box)) - { - Lisp_Object keyword, value; - - keyword = XCAR (box); - box = XCDR (box); - - if (!CONSP (box)) - break; - value = XCAR (box); - box = XCDR (box); - - if (EQ (keyword, QCline_width)) - { - if (INTEGERP (value) && XINT (value) != 0) - face->box_line_width = XINT (value); - } - else if (EQ (keyword, QCcolor)) - { - if (STRINGP (value)) - { - face->box_color = load_color (f, face, value, - LFACE_BOX_INDEX); - face->use_box_color_for_shadows_p = 1; - } - } - else if (EQ (keyword, QCstyle)) - { - if (EQ (value, Qreleased_button)) - face->box = FACE_RAISED_BOX; - else if (EQ (value, Qpressed_button)) - face->box = FACE_SUNKEN_BOX; - } - } - } - - /* Text underline, overline, strike-through. */ - - if (EQ (attrs[LFACE_UNDERLINE_INDEX], Qt)) - { - /* Use default color (same as foreground color). */ - face->underline_p = 1; - face->underline_defaulted_p = 1; - face->underline_color = 0; - } - else if (STRINGP (attrs[LFACE_UNDERLINE_INDEX])) - { - /* Use specified color. 
*/ - face->underline_p = 1; - face->underline_defaulted_p = 0; - face->underline_color - = load_color (f, face, attrs[LFACE_UNDERLINE_INDEX], - LFACE_UNDERLINE_INDEX); - } - else if (NILP (attrs[LFACE_UNDERLINE_INDEX])) - { - face->underline_p = 0; - face->underline_defaulted_p = 0; - face->underline_color = 0; - } - - overline = attrs[LFACE_OVERLINE_INDEX]; - if (STRINGP (overline)) - { - face->overline_color - = load_color (f, face, attrs[LFACE_OVERLINE_INDEX], - LFACE_OVERLINE_INDEX); - face->overline_p = 1; - } - else if (EQ (overline, Qt)) - { - face->overline_color = face->foreground; - face->overline_color_defaulted_p = 1; - face->overline_p = 1; - } - - strike_through = attrs[LFACE_STRIKE_THROUGH_INDEX]; - if (STRINGP (strike_through)) - { - face->strike_through_color - = load_color (f, face, attrs[LFACE_STRIKE_THROUGH_INDEX], - LFACE_STRIKE_THROUGH_INDEX); - face->strike_through_p = 1; - } - else if (EQ (strike_through, Qt)) - { - face->strike_through_color = face->foreground; - face->strike_through_color_defaulted_p = 1; - face->strike_through_p = 1; - } - - stipple = attrs[LFACE_STIPPLE_INDEX]; - if (!NILP (stipple)) - face->stipple = load_pixmap (f, stipple, &face->pixmap_w, &face->pixmap_h); - - xassert (FACE_SUITABLE_FOR_CHAR_P (face, c)); - return face; -#endif /* HAVE_WINDOW_SYSTEM */ -} - - -/* Map a specified color of face FACE on frame F to a tty color index. - IDX is either LFACE_FOREGROUND_INDEX or LFACE_BACKGROUND_INDEX, and - specifies which color to map. Set *DEFAULTED to 1 if mapping to the - default foreground/background colors. 
*/ - -static void -map_tty_color (f, face, idx, defaulted) - struct frame *f; - struct face *face; - enum lface_attribute_index idx; - int *defaulted; -{ - Lisp_Object frame, color, def; - int foreground_p = idx == LFACE_FOREGROUND_INDEX; - unsigned long default_pixel, default_other_pixel, pixel; - - xassert (idx == LFACE_FOREGROUND_INDEX || idx == LFACE_BACKGROUND_INDEX); - - if (foreground_p) - { - pixel = default_pixel = FACE_TTY_DEFAULT_FG_COLOR; - default_other_pixel = FACE_TTY_DEFAULT_BG_COLOR; - } - else - { - pixel = default_pixel = FACE_TTY_DEFAULT_BG_COLOR; - default_other_pixel = FACE_TTY_DEFAULT_FG_COLOR; - } - - XSETFRAME (frame, f); - color = face->lface[idx]; - - if (STRINGP (color) - && XSTRING (color)->size - && CONSP (Vtty_defined_color_alist) - && (def = assq_no_quit (color, call1 (Qtty_color_alist, frame)), - CONSP (def))) - { - /* Associations in tty-defined-color-alist are of the form - (NAME INDEX R G B). We need the INDEX part. */ - pixel = XINT (XCAR (XCDR (def))); - } - - if (pixel == default_pixel && STRINGP (color)) - { - pixel = load_color (f, face, color, idx); - -#if defined (MSDOS) || defined (WINDOWSNT) - /* If the foreground of the default face is the default color, - use the foreground color defined by the frame. 
*/ -#ifdef MSDOS - if (FRAME_MSDOS_P (f)) - { -#endif /* MSDOS */ - if (pixel == default_pixel - || pixel == FACE_TTY_DEFAULT_COLOR) - { - if (foreground_p) - pixel = FRAME_FOREGROUND_PIXEL (f); - else - pixel = FRAME_BACKGROUND_PIXEL (f); - face->lface[idx] = tty_color_name (f, pixel); - *defaulted = 1; - } - else if (pixel == default_other_pixel) - { - if (foreground_p) - pixel = FRAME_BACKGROUND_PIXEL (f); - else - pixel = FRAME_FOREGROUND_PIXEL (f); - face->lface[idx] = tty_color_name (f, pixel); - *defaulted = 1; - } -#ifdef MSDOS - } -#endif -#endif /* MSDOS or WINDOWSNT */ - } - - if (foreground_p) - face->foreground = pixel; - else - face->background = pixel; -} - - -/* Realize the fully-specified face with attributes ATTRS in face - cache CACHE for character C. Do it for TTY frame CACHE->f. Value is a - pointer to the newly created realized face. */ - -static struct face * -realize_tty_face (cache, attrs, c) - struct face_cache *cache; - Lisp_Object *attrs; - int c; -{ - struct face *face; - int weight, slant; - int face_colors_defaulted = 0; - struct frame *f = cache->f; - - /* Frame must be a termcap frame. */ - xassert (FRAME_TERMCAP_P (cache->f) || FRAME_MSDOS_P (cache->f)); - - /* Allocate a new realized face. */ - face = make_realized_face (attrs); - face->font_name = FRAME_MSDOS_P (cache->f) ? "ms-dos" : "tty"; - - /* Map face attributes to TTY appearances. We map slant to - dimmed text because we want italic text to appear differently - and because dimmed text is probably used infrequently. */ - weight = face_numeric_weight (attrs[LFACE_WEIGHT_INDEX]); - slant = face_numeric_slant (attrs[LFACE_SLANT_INDEX]); - - if (weight > XLFD_WEIGHT_MEDIUM) - face->tty_bold_p = 1; - if (weight < XLFD_WEIGHT_MEDIUM || slant != XLFD_SLANT_ROMAN) - face->tty_dim_p = 1; - if (!NILP (attrs[LFACE_UNDERLINE_INDEX])) - face->tty_underline_p = 1; - if (!NILP (attrs[LFACE_INVERSE_INDEX])) - face->tty_reverse_p = 1; - - /* Map color names to color indices. 
*/ - map_tty_color (f, face, LFACE_FOREGROUND_INDEX, &face_colors_defaulted); - map_tty_color (f, face, LFACE_BACKGROUND_INDEX, &face_colors_defaulted); - - /* Swap colors if face is inverse-video. If the colors are taken - from the frame colors, they are already inverted, since the - frame-creation function calls x-handle-reverse-video. */ - if (face->tty_reverse_p && !face_colors_defaulted) - { - unsigned long tem = face->foreground; - face->foreground = face->background; - face->background = tem; - } - - if (tty_suppress_bold_inverse_default_colors_p - && face->tty_bold_p - && face->background == FACE_TTY_DEFAULT_FG_COLOR - && face->foreground == FACE_TTY_DEFAULT_BG_COLOR) - face->tty_bold_p = 0; - - return face; -} - - -DEFUN ("tty-suppress-bold-inverse-default-colors", - Ftty_suppress_bold_inverse_default_colors, - Stty_suppress_bold_inverse_default_colors, 1, 1, 0, - "Suppress/allow boldness of faces with inverse default colors.\n\ -SUPPRESS non-nil means suppress it.\n\ -This affects bold faces on TTYs whose foreground is the default background\n\ -color of the display and whose background is the default foreground color.\n\ -For such faces, the bold face attribute is ignored if this variable\n\ -is non-nil.") - (suppress) - Lisp_Object suppress; -{ - tty_suppress_bold_inverse_default_colors_p = !NILP (suppress); - ++face_change_count; - return suppress; -} - - - -/*********************************************************************** - Computing Faces - ***********************************************************************/ - -/* Return the ID of the face to use to display character CH with face - property PROP on frame F in current_buffer. 
*/ - -int -compute_char_face (f, ch, prop) - struct frame *f; - int ch; - Lisp_Object prop; -{ - int face_id; - - if (NILP (current_buffer->enable_multibyte_characters)) - ch = 0; - - if (NILP (prop)) - { - struct face *face = FACE_FROM_ID (f, DEFAULT_FACE_ID); - face_id = FACE_FOR_CHAR (f, face, ch); - } - else - { - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - struct face *default_face = FACE_FROM_ID (f, DEFAULT_FACE_ID); - bcopy (default_face->lface, attrs, sizeof attrs); - merge_face_vector_with_property (f, attrs, prop); - face_id = lookup_face (f, attrs, ch, NULL); - } - - return face_id; -} - - -/* Return the face ID associated with buffer position POS for - displaying ASCII characters. Return in *ENDPTR the position at - which a different face is needed, as far as text properties and - overlays are concerned. W is a window displaying current_buffer. - - REGION_BEG, REGION_END delimit the region, so it can be - highlighted. - - LIMIT is a position not to scan beyond. That is to limit the time - this function can take. - - If MOUSE is non-zero, use the character's mouse-face, not its face. - - The face returned is suitable for displaying ASCII characters. */ - -int -face_at_buffer_position (w, pos, region_beg, region_end, - endptr, limit, mouse) - struct window *w; - int pos; - int region_beg, region_end; - int *endptr; - int limit; - int mouse; -{ - struct frame *f = XFRAME (w->frame); - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - Lisp_Object prop, position; - int i, noverlays; - Lisp_Object *overlay_vec; - Lisp_Object frame; - int endpos; - Lisp_Object propname = mouse ? Qmouse_face : Qface; - Lisp_Object limit1, end; - struct face *default_face; - - /* W must display the current buffer. We could write this function - to use the frame and buffer of W, but right now it doesn't. 
*/ - /* xassert (XBUFFER (w->buffer) == current_buffer); */ - - XSETFRAME (frame, f); - XSETFASTINT (position, pos); - - endpos = ZV; - if (pos < region_beg && region_beg < endpos) - endpos = region_beg; - - /* Get the `face' or `mouse_face' text property at POS, and - determine the next position at which the property changes. */ - prop = Fget_text_property (position, propname, w->buffer); - XSETFASTINT (limit1, (limit < endpos ? limit : endpos)); - end = Fnext_single_property_change (position, propname, w->buffer, limit1); - if (INTEGERP (end)) - endpos = XINT (end); - - /* Look at properties from overlays. */ - { - int next_overlay; - int len; - - /* First try with room for 40 overlays. */ - len = 40; - overlay_vec = (Lisp_Object *) alloca (len * sizeof (Lisp_Object)); - noverlays = overlays_at (pos, 0, &overlay_vec, &len, - &next_overlay, NULL, 0); - - /* If there are more than 40, make enough space for all, and try - again. */ - if (noverlays > len) - { - len = noverlays; - overlay_vec = (Lisp_Object *) alloca (len * sizeof (Lisp_Object)); - noverlays = overlays_at (pos, 0, &overlay_vec, &len, - &next_overlay, NULL, 0); - } - - if (next_overlay < endpos) - endpos = next_overlay; - } - - *endptr = endpos; - - default_face = FACE_FROM_ID (f, DEFAULT_FACE_ID); - - /* Optimize common cases where we can use the default face. */ - if (noverlays == 0 - && NILP (prop) - && !(pos >= region_beg && pos < region_end)) - return DEFAULT_FACE_ID; - - /* Begin with attributes from the default face. */ - bcopy (default_face->lface, attrs, sizeof attrs); - - /* Merge in attributes specified via text properties. */ - if (!NILP (prop)) - merge_face_vector_with_property (f, attrs, prop); - - /* Now merge the overlay data. 
*/ - noverlays = sort_overlays (overlay_vec, noverlays, w); - for (i = 0; i < noverlays; i++) - { - Lisp_Object oend; - int oendpos; - - prop = Foverlay_get (overlay_vec[i], propname); - if (!NILP (prop)) - merge_face_vector_with_property (f, attrs, prop); - - oend = OVERLAY_END (overlay_vec[i]); - oendpos = OVERLAY_POSITION (oend); - if (oendpos < endpos) - endpos = oendpos; - } - - /* If in the region, merge in the region face. */ - if (pos >= region_beg && pos < region_end) - { - Lisp_Object region_face = lface_from_face_name (f, Qregion, 0); - merge_face_vectors (f, XVECTOR (region_face)->contents, attrs, Qnil); - - if (region_end < endpos) - endpos = region_end; - } - - *endptr = endpos; - - /* Look up a realized face with the given face attributes, - or realize a new one for ASCII characters. */ - return lookup_face (f, attrs, 0, NULL); -} - - -/* Compute the face at character position POS in Lisp string STRING on - window W, for ASCII characters. - - If STRING is an overlay string, it comes from position BUFPOS in - current_buffer, otherwise BUFPOS is zero to indicate that STRING is - not an overlay string. W must display the current buffer. - REGION_BEG and REGION_END give the start and end positions of the - region; both are -1 if no region is visible. - - BASE_FACE_ID is the id of a face to merge with. For strings coming - from overlays or the `display' property it is the face at BUFPOS. - - If MOUSE_P is non-zero, use the character's mouse-face, not its face. - - Set *ENDPTR to the next position where to check for faces in - STRING; -1 if the face is constant from POS to the end of the - string. - - Value is the id of the face to use. The face returned is suitable - for displaying ASCII characters. 
*/ - -int -face_at_string_position (w, string, pos, bufpos, region_beg, - region_end, endptr, base_face_id, mouse_p) - struct window *w; - Lisp_Object string; - int pos, bufpos; - int region_beg, region_end; - int *endptr; - enum face_id base_face_id; - int mouse_p; -{ - Lisp_Object prop, position, end, limit; - struct frame *f = XFRAME (WINDOW_FRAME (w)); - Lisp_Object attrs[LFACE_VECTOR_SIZE]; - struct face *base_face; - int multibyte_p = STRING_MULTIBYTE (string); - Lisp_Object prop_name = mouse_p ? Qmouse_face : Qface; - - /* Get the value of the face property at the current position within - STRING. Value is nil if there is no face property. */ - XSETFASTINT (position, pos); - prop = Fget_text_property (position, prop_name, string); - - /* Get the next position at which to check for faces. Value of end - is nil if face is constant all the way to the end of the string. - Otherwise it is a string position where to check faces next. - Limit is the maximum position up to which to check for property - changes in Fnext_single_property_change. Strings are usually - short, so set the limit to the end of the string. */ - XSETFASTINT (limit, XSTRING (string)->size); - end = Fnext_single_property_change (position, prop_name, string, limit); - if (INTEGERP (end)) - *endptr = XFASTINT (end); - else - *endptr = -1; - - base_face = FACE_FROM_ID (f, base_face_id); - xassert (base_face); - - /* Optimize the default case that there is no face property and we - are not in the region. */ - if (NILP (prop) - && (base_face_id != DEFAULT_FACE_ID - /* BUFPOS <= 0 means STRING is not an overlay string, so - that the region doesn't have to be taken into account. */ - || bufpos <= 0 - || bufpos < region_beg - || bufpos >= region_end) - && (multibyte_p - /* We can't realize faces for different charsets differently - if we don't have fonts, so we can stop here if not working - on a window-system frame. 
*/ - || !FRAME_WINDOW_P (f) - || FACE_SUITABLE_FOR_CHAR_P (base_face, 0))) - return base_face->id; - - /* Begin with attributes from the base face. */ - bcopy (base_face->lface, attrs, sizeof attrs); - - /* Merge in attributes specified via text properties. */ - if (!NILP (prop)) - merge_face_vector_with_property (f, attrs, prop); - - /* If in the region, merge in the region face. */ - if (bufpos - && bufpos >= region_beg - && bufpos < region_end) - { - Lisp_Object region_face = lface_from_face_name (f, Qregion, 0); - merge_face_vectors (f, XVECTOR (region_face)->contents, attrs, Qnil); - } - - /* Look up a realized face with the given face attributes, - or realize a new one for ASCII characters. */ - return lookup_face (f, attrs, 0, NULL); -} - - - -/*********************************************************************** - Tests - ***********************************************************************/ - -#if GLYPH_DEBUG - -/* Print the contents of the realized face FACE to stderr. */ - -static void -dump_realized_face (face) - struct face *face; -{ - fprintf (stderr, "ID: %d\n", face->id); -#ifdef HAVE_X_WINDOWS - fprintf (stderr, "gc: %d\n", (int) face->gc); -#endif - fprintf (stderr, "foreground: 0x%lx (%s)\n", - face->foreground, - XSTRING (face->lface[LFACE_FOREGROUND_INDEX])->data); - fprintf (stderr, "background: 0x%lx (%s)\n", - face->background, - XSTRING (face->lface[LFACE_BACKGROUND_INDEX])->data); - fprintf (stderr, "font_name: %s (%s)\n", - face->font_name, - XSTRING (face->lface[LFACE_FAMILY_INDEX])->data); -#ifdef HAVE_X_WINDOWS - fprintf (stderr, "font = %p\n", face->font); -#endif - fprintf (stderr, "font_info_id = %d\n", face->font_info_id); - fprintf (stderr, "fontset: %d\n", face->fontset); - fprintf (stderr, "underline: %d (%s)\n", - face->underline_p, - XSTRING (Fsymbol_name (face->lface[LFACE_UNDERLINE_INDEX]))->data); - fprintf (stderr, "hash: %d\n", face->hash); - fprintf (stderr, "charset: %d\n", face->charset); -} - - -DEFUN 
("dump-face", Fdump_face, Sdump_face, 0, 1, 0, "") - (n) - Lisp_Object n; -{ - if (NILP (n)) - { - int i; - - fprintf (stderr, "font selection order: "); - for (i = 0; i < DIM (font_sort_order); ++i) - fprintf (stderr, "%d ", font_sort_order[i]); - fprintf (stderr, "\n"); - - fprintf (stderr, "alternative fonts: "); - debug_print (Vface_alternative_font_family_alist); - fprintf (stderr, "\n"); - - for (i = 0; i < FRAME_FACE_CACHE (SELECTED_FRAME ())->used; ++i) - Fdump_face (make_number (i)); - } - else - { - struct face *face; - CHECK_NUMBER (n, 0); - face = FACE_FROM_ID (SELECTED_FRAME (), XINT (n)); - if (face == NULL) - error ("Not a valid face"); - dump_realized_face (face); - } - - return Qnil; -} - - -DEFUN ("show-face-resources", Fshow_face_resources, Sshow_face_resources, - 0, 0, 0, "") - () -{ - fprintf (stderr, "number of colors = %d\n", ncolors_allocated); - fprintf (stderr, "number of pixmaps = %d\n", npixmaps_allocated); - fprintf (stderr, "number of GCs = %d\n", ngcs); - return Qnil; -} - -#endif /* GLYPH_DEBUG != 0 */ - - - -/*********************************************************************** - Initialization - ***********************************************************************/ - -void -syms_of_xfaces () -{ - Qface = intern ("face"); - staticpro (&Qface); - Qbitmap_spec_p = intern ("bitmap-spec-p"); - staticpro (&Qbitmap_spec_p); - Qframe_update_face_colors = intern ("frame-update-face-colors"); - staticpro (&Qframe_update_face_colors); - - /* Lisp face attribute keywords. 
*/ - QCfamily = intern (":family"); - staticpro (&QCfamily); - QCheight = intern (":height"); - staticpro (&QCheight); - QCweight = intern (":weight"); - staticpro (&QCweight); - QCslant = intern (":slant"); - staticpro (&QCslant); - QCunderline = intern (":underline"); - staticpro (&QCunderline); - QCinverse_video = intern (":inverse-video"); - staticpro (&QCinverse_video); - QCreverse_video = intern (":reverse-video"); - staticpro (&QCreverse_video); - QCforeground = intern (":foreground"); - staticpro (&QCforeground); - QCbackground = intern (":background"); - staticpro (&QCbackground); - QCstipple = intern (":stipple");; - staticpro (&QCstipple); - QCwidth = intern (":width"); - staticpro (&QCwidth); - QCfont = intern (":font"); - staticpro (&QCfont); - QCbold = intern (":bold"); - staticpro (&QCbold); - QCitalic = intern (":italic"); - staticpro (&QCitalic); - QCoverline = intern (":overline"); - staticpro (&QCoverline); - QCstrike_through = intern (":strike-through"); - staticpro (&QCstrike_through); - QCbox = intern (":box"); - staticpro (&QCbox); - QCinherit = intern (":inherit"); - staticpro (&QCinherit); - - /* Symbols used for Lisp face attribute values. 
*/ - QCcolor = intern (":color"); - staticpro (&QCcolor); - QCline_width = intern (":line-width"); - staticpro (&QCline_width); - QCstyle = intern (":style"); - staticpro (&QCstyle); - Qreleased_button = intern ("released-button"); - staticpro (&Qreleased_button); - Qpressed_button = intern ("pressed-button"); - staticpro (&Qpressed_button); - Qnormal = intern ("normal"); - staticpro (&Qnormal); - Qultra_light = intern ("ultra-light"); - staticpro (&Qultra_light); - Qextra_light = intern ("extra-light"); - staticpro (&Qextra_light); - Qlight = intern ("light"); - staticpro (&Qlight); - Qsemi_light = intern ("semi-light"); - staticpro (&Qsemi_light); - Qsemi_bold = intern ("semi-bold"); - staticpro (&Qsemi_bold); - Qbold = intern ("bold"); - staticpro (&Qbold); - Qextra_bold = intern ("extra-bold"); - staticpro (&Qextra_bold); - Qultra_bold = intern ("ultra-bold"); - staticpro (&Qultra_bold); - Qoblique = intern ("oblique"); - staticpro (&Qoblique); - Qitalic = intern ("italic"); - staticpro (&Qitalic); - Qreverse_oblique = intern ("reverse-oblique"); - staticpro (&Qreverse_oblique); - Qreverse_italic = intern ("reverse-italic"); - staticpro (&Qreverse_italic); - Qultra_condensed = intern ("ultra-condensed"); - staticpro (&Qultra_condensed); - Qextra_condensed = intern ("extra-condensed"); - staticpro (&Qextra_condensed); - Qcondensed = intern ("condensed"); - staticpro (&Qcondensed); - Qsemi_condensed = intern ("semi-condensed"); - staticpro (&Qsemi_condensed); - Qsemi_expanded = intern ("semi-expanded"); - staticpro (&Qsemi_expanded); - Qexpanded = intern ("expanded"); - staticpro (&Qexpanded); - Qextra_expanded = intern ("extra-expanded"); - staticpro (&Qextra_expanded); - Qultra_expanded = intern ("ultra-expanded"); - staticpro (&Qultra_expanded); - Qbackground_color = intern ("background-color"); - staticpro (&Qbackground_color); - Qforeground_color = intern ("foreground-color"); - staticpro (&Qforeground_color); - Qunspecified = intern ("unspecified"); - 
staticpro (&Qunspecified); - - Qface_alias = intern ("face-alias"); - staticpro (&Qface_alias); - Qdefault = intern ("default"); - staticpro (&Qdefault); - Qtool_bar = intern ("tool-bar"); - staticpro (&Qtool_bar); - Qregion = intern ("region"); - staticpro (&Qregion); - Qfringe = intern ("fringe"); - staticpro (&Qfringe); - Qheader_line = intern ("header-line"); - staticpro (&Qheader_line); - Qscroll_bar = intern ("scroll-bar"); - staticpro (&Qscroll_bar); - Qmenu = intern ("menu"); - staticpro (&Qmenu); - Qcursor = intern ("cursor"); - staticpro (&Qcursor); - Qborder = intern ("border"); - staticpro (&Qborder); - Qmouse = intern ("mouse"); - staticpro (&Qmouse); - Qtty_color_desc = intern ("tty-color-desc"); - staticpro (&Qtty_color_desc); - Qtty_color_by_index = intern ("tty-color-by-index"); - staticpro (&Qtty_color_by_index); - Qtty_color_alist = intern ("tty-color-alist"); - staticpro (&Qtty_color_alist); - Qscalable_fonts_allowed = intern ("scalable-fonts-allowed"); - staticpro (&Qscalable_fonts_allowed); - - Vparam_value_alist = Fcons (Fcons (Qnil, Qnil), Qnil); - staticpro (&Vparam_value_alist); - Vface_alternative_font_family_alist = Qnil; - staticpro (&Vface_alternative_font_family_alist); - Vface_alternative_font_registry_alist = Qnil; - staticpro (&Vface_alternative_font_registry_alist); - - defsubr (&Sinternal_make_lisp_face); - defsubr (&Sinternal_lisp_face_p); - defsubr (&Sinternal_set_lisp_face_attribute); -#ifdef HAVE_WINDOW_SYSTEM - defsubr (&Sinternal_set_lisp_face_attribute_from_resource); -#endif - defsubr (&Scolor_gray_p); - defsubr (&Scolor_supported_p); - defsubr (&Sinternal_get_lisp_face_attribute); - defsubr (&Sinternal_lisp_face_attribute_values); - defsubr (&Sinternal_lisp_face_equal_p); - defsubr (&Sinternal_lisp_face_empty_p); - defsubr (&Sinternal_copy_lisp_face); - defsubr (&Sinternal_merge_in_global_face); - defsubr (&Sface_font); - defsubr (&Sframe_face_alist); - defsubr (&Sinternal_set_font_selection_order); - defsubr 
(&Sinternal_set_alternative_font_family_alist); - defsubr (&Sinternal_set_alternative_font_registry_alist); -#if GLYPH_DEBUG - defsubr (&Sdump_face); - defsubr (&Sshow_face_resources); -#endif /* GLYPH_DEBUG */ - defsubr (&Sclear_face_cache); - defsubr (&Stty_suppress_bold_inverse_default_colors); - -#if defined DEBUG_X_COLORS && defined HAVE_X_WINDOWS - defsubr (&Sdump_colors); -#endif - - DEFVAR_LISP ("font-list-limit", &Vfont_list_limit, - "*Limit for font matching.\n\ -If an integer > 0, font matching functions won't load more than\n\ -that number of fonts when searching for a matching font."); - Vfont_list_limit = make_number (DEFAULT_FONT_LIST_LIMIT); - - DEFVAR_LISP ("face-new-frame-defaults", &Vface_new_frame_defaults, - "List of global face definitions (for internal use only.)"); - Vface_new_frame_defaults = Qnil; - - DEFVAR_LISP ("face-default-stipple", &Vface_default_stipple, - "*Default stipple pattern used on monochrome displays.\n\ -This stipple pattern is used on monochrome displays\n\ -instead of shades of gray for a face background color.\n\ -See `set-face-stipple' for possible values for this variable."); - Vface_default_stipple = build_string ("gray3"); - - DEFVAR_LISP ("tty-defined-color-alist", &Vtty_defined_color_alist, - "An alist of defined terminal colors and their RGB values."); - Vtty_defined_color_alist = Qnil; - - DEFVAR_LISP ("scalable-fonts-allowed", &Vscalable_fonts_allowed, - "Allowed scalable fonts.\n\ -A value of nil means don't allow any scalable fonts.\n\ -A value of t means allow any scalable font.\n\ -Otherwise, value must be a list of regular expressions. 
A font may be\n\
-scaled if its name matches a regular expression in the list.\n\
-Note that if value is nil, a scalable font might still be used, if no\n\
-other font of the appropriate family and registry is available.");
-  Vscalable_fonts_allowed = Qnil;
-
-  DEFVAR_LISP ("face-ignored-fonts", &Vface_ignored_fonts,
-    "List of ignored fonts.\n\
-Each element is a regular expression that matches names of fonts to ignore.");
-  Vface_ignored_fonts = Qnil;
-
-#ifdef HAVE_WINDOW_SYSTEM
-  defsubr (&Sbitmap_spec_p);
-  defsubr (&Sx_list_fonts);
-  defsubr (&Sinternal_face_x_get_resource);
-  defsubr (&Sx_family_fonts);
-  defsubr (&Sx_font_family_list);
-#endif /* HAVE_WINDOW_SYSTEM */
-}
./contrib/xfaces/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:12.075098383 +0000
@@ -1,1789 +0,0 @@
-/*
- * RFC2367 PF_KEYv2 Key management API message parser
- * Copyright (C) 1999, 2000, 2001 Richard Guy Briggs.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation; either version 2 of the License, or (at your
- * option) any later version. See .
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
- * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
- * for more details.
- *
- * RCSID $Id: pfkey_v2_parse.c,v 1.53 2003/01/30 02:32:09 rgb Exp $
- */
-
-/*
- * Template from klips/net/ipsec/ipsec/ipsec_parser.c.
- */
-
-char pfkey_v2_parse_c_version[] = "$Id: pfkey_v2_parse.c,v 1.53 2003/01/30 02:32:09 rgb Exp $";
-
-/*
- * Some ugly stuff to allow consistent debugging code for use in the
- * kernel and in user space
-*/
-
-#ifdef __KERNEL__
-
-# include  /* for printk */
-
-#include "freeswan/ipsec_kversion.h" /* for malloc switch */
-
-# ifdef MALLOC_SLAB
-# include  /* kmalloc() */
-# else /* MALLOC_SLAB */
-# include  /* kmalloc() */
-# endif /* MALLOC_SLAB */
-# include  /* error codes */
-# include  /* size_t */
-# include  /* mark_bh */
-
-# include  /* struct device, and other headers */
-# include  /* eth_type_trans */
-# include  /* struct iphdr */
-# if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-# include  /* struct ipv6hdr */
-# endif /* if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) */
-extern int debug_pfkey;
-
-# include 
-
-#include "freeswan/ipsec_encap.h"
-
-#else /* __KERNEL__ */
-
-# include 
-# include 
-# include 
-
-# include 
-# include "programs/pluto/constants.h"
-# include "programs/pluto/defs.h" /* for PRINTF_LIKE */
-# include "programs/pluto/log.h" /* for debugging and DBG_log */
-
-/* #define PLUTO */
-
-# ifdef PLUTO
-# define DEBUGGING(level, args...) { DBG_log("pfkey_lib_debug:" args); }
-# else
-# define DEBUGGING(level, args...) if(pfkey_lib_debug & level) { printf("pfkey_lib_debug:" args); } else { ; }
-# endif
-
-#endif /* __KERNEL__ */
-
-
-#include 
-#include 
-
-#ifdef __KERNEL__
-# include "freeswan/ipsec_netlink.h" /* KLIPS_PRINT */
-extern int sysctl_ipsec_debug_verbose;
-# define DEBUGGING(level, args...) \
-        KLIPS_PRINT( \
-                ((debug_pfkey & level & (PF_KEY_DEBUG_PARSE_STRUCT | PF_KEY_DEBUG_PARSE_PROBLEM)) \
-                 || (sysctl_ipsec_debug_verbose && (debug_pfkey & level & PF_KEY_DEBUG_PARSE_FLOW))) \
-                , "klips_debug:" args)
-#endif /* __KERNEL__ */
-#include "freeswan/ipsec_sa.h" /* IPSEC_SAREF_NULL, IPSEC_SA_REF_TABLE_IDX_WIDTH */
-
-
-#define SENDERR(_x) do { error = -(_x); goto errlab; } while (0)
-
-struct satype_tbl {
-        uint8_t proto;
-        uint8_t satype;
-        char* name;
-} static satype_tbl[] = {
-#ifdef __KERNEL__
-        { IPPROTO_ESP, SADB_SATYPE_ESP, "ESP" },
-        { IPPROTO_AH, SADB_SATYPE_AH, "AH" },
-        { IPPROTO_IPIP, SADB_X_SATYPE_IPIP, "IPIP" },
-#ifdef CONFIG_IPSEC_IPCOMP
-        { IPPROTO_COMP, SADB_X_SATYPE_COMP, "COMP" },
-#endif /* CONFIG_IPSEC_IPCOMP */
-        { IPPROTO_INT, SADB_X_SATYPE_INT, "INT" },
-#else /* __KERNEL__ */
-        { SA_ESP, SADB_SATYPE_ESP, "ESP" },
-        { SA_AH, SADB_SATYPE_AH, "AH" },
-        { SA_IPIP, SADB_X_SATYPE_IPIP, "IPIP" },
-        { SA_COMP, SADB_X_SATYPE_COMP, "COMP" },
-        { SA_INT, SADB_X_SATYPE_INT, "INT" },
-#endif /* __KERNEL__ */
-        { 0, 0, "UNKNOWN" }
-};
-
-uint8_t
-satype2proto(uint8_t satype)
-{
-        int i =0;
-
-        while(satype_tbl[i].satype != satype && satype_tbl[i].satype != 0) {
-                i++;
-        }
-        return satype_tbl[i].proto;
-}
-
-uint8_t
-proto2satype(uint8_t proto)
-{
-        int i = 0;
-
-        while(satype_tbl[i].proto != proto && satype_tbl[i].proto != 0) {
-                i++;
-        }
-        return satype_tbl[i].satype;
-}
-
-char*
-satype2name(uint8_t satype)
-{
-        int i = 0;
-
-        while(satype_tbl[i].satype != satype && satype_tbl[i].satype != 0) {
-                i++;
-        }
-        return satype_tbl[i].name;
-}
-
-char*
-proto2name(uint8_t proto)
-{
-        int i = 0;
-
-        while(satype_tbl[i].proto != proto && satype_tbl[i].proto != 0) {
-                i++;
-        }
-        return satype_tbl[i].name;
-}
-
-/* Default extension parsers taken from the KLIPS code */
-
-DEBUG_NO_STATIC int
-pfkey_sa_parse(struct sadb_ext *pfkey_ext)
-{
-        int error = 0;
-        struct sadb_sa *pfkey_sa = (struct sadb_sa *)pfkey_ext;
-#if 0
-        struct sadb_sa sav2;
-#endif
- - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_sa_parse: entry\n"); - /* sanity checks... */ - if(!pfkey_sa) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "NULL pointer passed in.\n"); - SENDERR(EINVAL); - } - -#if 0 - /* check if this structure is short, and if so, fix it up. - * XXX this is NOT the way to do things. - */ - if(pfkey_sa->sadb_sa_len == sizeof(struct sadb_sa_v1)/IPSEC_PFKEYv2_ALIGN) { - - /* yes, so clear out a temporary structure, and copy first */ - memset(&sav2, 0, sizeof(sav2)); - memcpy(&sav2, pfkey_sa, sizeof(struct sadb_sa_v1)); - sav2.sadb_x_sa_ref=-1; - sav2.sadb_sa_len = sizeof(struct sadb_sa) / IPSEC_PFKEYv2_ALIGN; - - pfkey_sa = &sav2; - } -#endif - - - if(pfkey_sa->sadb_sa_len != sizeof(struct sadb_sa) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "length wrong pfkey_sa->sadb_sa_len=%d sizeof(struct sadb_sa)=%d.\n", - pfkey_sa->sadb_sa_len, - (int)sizeof(struct sadb_sa)); - SENDERR(EINVAL); - } - - if(pfkey_sa->sadb_sa_encrypt > SADB_EALG_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "pfkey_sa->sadb_sa_encrypt=%d > SADB_EALG_MAX=%d.\n", - pfkey_sa->sadb_sa_encrypt, - SADB_EALG_MAX); - SENDERR(EINVAL); - } - - if(pfkey_sa->sadb_sa_auth > SADB_AALG_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "pfkey_sa->sadb_sa_auth=%d > SADB_AALG_MAX=%d.\n", - pfkey_sa->sadb_sa_auth, - SADB_AALG_MAX); - SENDERR(EINVAL); - } - - if(pfkey_sa->sadb_sa_state > SADB_SASTATE_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "state=%d exceeds MAX=%d.\n", - pfkey_sa->sadb_sa_state, - SADB_SASTATE_MAX); - SENDERR(EINVAL); - } - - if(pfkey_sa->sadb_sa_state == SADB_SASTATE_DEAD) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "state=%d is DEAD=%d.\n", - pfkey_sa->sadb_sa_state, - SADB_SASTATE_DEAD); - SENDERR(EINVAL); - } - - if(pfkey_sa->sadb_sa_replay > 64) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: 
" - "replay window size: %d -- must be 0 <= size <= 64\n", - pfkey_sa->sadb_sa_replay); - SENDERR(EINVAL); - } - - if(! ((pfkey_sa->sadb_sa_exttype == SADB_EXT_SA) || - (pfkey_sa->sadb_sa_exttype == SADB_X_EXT_SA2))) - { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "unknown exttype=%d, expecting SADB_EXT_SA=%d or SADB_X_EXT_SA2=%d.\n", - pfkey_sa->sadb_sa_exttype, - SADB_EXT_SA, - SADB_X_EXT_SA2); - SENDERR(EINVAL); - } - - if((IPSEC_SAREF_NULL != pfkey_sa->sadb_x_sa_ref) && (pfkey_sa->sadb_x_sa_ref >= (1 << IPSEC_SA_REF_TABLE_IDX_WIDTH))) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sa_parse: " - "SAref=%d must be (SAref == IPSEC_SAREF_NULL(%d) || SAref < IPSEC_SA_REF_TABLE_NUM_ENTRIES(%d)).\n", - pfkey_sa->sadb_x_sa_ref, - IPSEC_SAREF_NULL, - IPSEC_SA_REF_TABLE_NUM_ENTRIES); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_sa_parse: " - "successfully found len=%d exttype=%d(%s) spi=%08lx replay=%d state=%d auth=%d encrypt=%d flags=%d ref=%d.\n", - pfkey_sa->sadb_sa_len, - pfkey_sa->sadb_sa_exttype, - pfkey_v2_sadb_ext_string(pfkey_sa->sadb_sa_exttype), - (long unsigned int)ntohl(pfkey_sa->sadb_sa_spi), - pfkey_sa->sadb_sa_replay, - pfkey_sa->sadb_sa_state, - pfkey_sa->sadb_sa_auth, - pfkey_sa->sadb_sa_encrypt, - pfkey_sa->sadb_sa_flags, - pfkey_sa->sadb_x_sa_ref); - - errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_lifetime_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - struct sadb_lifetime *pfkey_lifetime = (struct sadb_lifetime *)pfkey_ext; - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_lifetime_parse:enter\n"); - /* sanity checks... 
*/ - if(!pfkey_lifetime) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_lifetime_parse: " - "NULL pointer passed in.\n"); - SENDERR(EINVAL); - } - - if(pfkey_lifetime->sadb_lifetime_len != - sizeof(struct sadb_lifetime) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_lifetime_parse: " - "length wrong pfkey_lifetime->sadb_lifetime_len=%d sizeof(struct sadb_lifetime)=%d.\n", - pfkey_lifetime->sadb_lifetime_len, - (int)sizeof(struct sadb_lifetime)); - SENDERR(EINVAL); - } - - if((pfkey_lifetime->sadb_lifetime_exttype != SADB_EXT_LIFETIME_HARD) && - (pfkey_lifetime->sadb_lifetime_exttype != SADB_EXT_LIFETIME_SOFT) && - (pfkey_lifetime->sadb_lifetime_exttype != SADB_EXT_LIFETIME_CURRENT)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_lifetime_parse: " - "unexpected ext_type=%d.\n", - pfkey_lifetime->sadb_lifetime_exttype); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_lifetime_parse: " - "life_type=%d(%s) alloc=%u bytes=%u add=%u use=%u pkts=%u.\n", - pfkey_lifetime->sadb_lifetime_exttype, - pfkey_v2_sadb_ext_string(pfkey_lifetime->sadb_lifetime_exttype), - pfkey_lifetime->sadb_lifetime_allocations, - (unsigned)pfkey_lifetime->sadb_lifetime_bytes, - (unsigned)pfkey_lifetime->sadb_lifetime_addtime, - (unsigned)pfkey_lifetime->sadb_lifetime_usetime, - pfkey_lifetime->sadb_x_lifetime_packets); -errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_address_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - int saddr_len = 0; - struct sadb_address *pfkey_address = (struct sadb_address *)pfkey_ext; - struct sockaddr* s = (struct sockaddr*)((char*)pfkey_address + sizeof(*pfkey_address)); - char ipaddr_txt[ADDRTOT_BUF]; - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_address_parse:enter\n"); - /* sanity checks... 
*/ - if(!pfkey_address) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_address_parse: " - "NULL pointer passed in.\n"); - SENDERR(EINVAL); - } - - if(pfkey_address->sadb_address_len < - (sizeof(struct sadb_address) + sizeof(struct sockaddr))/ - IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_address_parse: " - "size wrong 1 ext_len=%d, adr_ext_len=%d, saddr_len=%d.\n", - pfkey_address->sadb_address_len, - (int)sizeof(struct sadb_address), - (int)sizeof(struct sockaddr)); - SENDERR(EINVAL); - } - - if(pfkey_address->sadb_address_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_address_parse: " - "res=%d, must be zero.\n", - pfkey_address->sadb_address_reserved); - SENDERR(EINVAL); - } - - switch(pfkey_address->sadb_address_exttype) { - case SADB_EXT_ADDRESS_SRC: - case SADB_EXT_ADDRESS_DST: - case SADB_EXT_ADDRESS_PROXY: - case SADB_X_EXT_ADDRESS_DST2: - case SADB_X_EXT_ADDRESS_SRC_FLOW: - case SADB_X_EXT_ADDRESS_DST_FLOW: - case SADB_X_EXT_ADDRESS_SRC_MASK: - case SADB_X_EXT_ADDRESS_DST_MASK: -#ifdef NAT_TRAVERSAL - case SADB_X_EXT_NAT_T_OA: -#endif - break; - default: - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_address_parse: " - "unexpected ext_type=%d.\n", - pfkey_address->sadb_address_exttype); - SENDERR(EINVAL); - } - - switch(s->sa_family) { - case AF_INET: - saddr_len = sizeof(struct sockaddr_in); - sprintf(ipaddr_txt, "%d.%d.%d.%d" - , (((struct sockaddr_in*)s)->sin_addr.s_addr >> 0) & 0xFF - , (((struct sockaddr_in*)s)->sin_addr.s_addr >> 8) & 0xFF - , (((struct sockaddr_in*)s)->sin_addr.s_addr >> 16) & 0xFF - , (((struct sockaddr_in*)s)->sin_addr.s_addr >> 24) & 0xFF); - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_address_parse: " - "found exttype=%u(%s) family=%d(AF_INET) address=%s proto=%u port=%u.\n", - pfkey_address->sadb_address_exttype, - pfkey_v2_sadb_ext_string(pfkey_address->sadb_address_exttype), - s->sa_family, - ipaddr_txt, - pfkey_address->sadb_address_proto, - ((struct 
sockaddr_in*)s)->sin_port); - break; - case AF_INET6: - saddr_len = sizeof(struct sockaddr_in6); - sprintf(ipaddr_txt, "%x:%x:%x:%x:%x:%x:%x:%x" - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[0]) - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[1]) - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[2]) - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[3]) - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[4]) - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[5]) - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[6]) - , ntohs(((struct sockaddr_in6*)s)->sin6_addr.s6_addr16[7])); - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_address_parse: " - "found exttype=%u(%s) family=%d(AF_INET6) address=%s proto=%u port=%u.\n", - pfkey_address->sadb_address_exttype, - pfkey_v2_sadb_ext_string(pfkey_address->sadb_address_exttype), - s->sa_family, - ipaddr_txt, - pfkey_address->sadb_address_proto, - ((struct sockaddr_in6*)s)->sin6_port); - break; - default: - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_address_parse: " - "s->sa_family=%d not supported.\n", - s->sa_family); - SENDERR(EPFNOSUPPORT); - } - - if(pfkey_address->sadb_address_len != - DIVUP(sizeof(struct sadb_address) + saddr_len, IPSEC_PFKEYv2_ALIGN)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_address_parse: " - "size wrong 2 ext_len=%d, adr_ext_len=%d, saddr_len=%d.\n", - pfkey_address->sadb_address_len, - (int)sizeof(struct sadb_address), - saddr_len); - SENDERR(EINVAL); - } - - if(pfkey_address->sadb_address_prefixlen != 0) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_address_parse: " - "address prefixes not supported yet.\n"); - SENDERR(EAFNOSUPPORT); /* not supported yet */ - } - - /* XXX check if port!=0 */ - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_address_parse: successful.\n"); - errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_key_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - struct sadb_key *pfkey_key = (struct 
sadb_key *)pfkey_ext; - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_key_parse:enter\n"); - /* sanity checks... */ - - if(!pfkey_key) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_key_parse: " - "NULL pointer passed in.\n"); - SENDERR(EINVAL); - } - - if(pfkey_key->sadb_key_len < sizeof(struct sadb_key) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_key_parse: " - "size wrong ext_len=%d, key_ext_len=%d.\n", - pfkey_key->sadb_key_len, - (int)sizeof(struct sadb_key)); - SENDERR(EINVAL); - } - - if(!pfkey_key->sadb_key_bits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_key_parse: " - "key length set to zero, must be non-zero.\n"); - SENDERR(EINVAL); - } - - if(pfkey_key->sadb_key_len != - DIVUP(sizeof(struct sadb_key) * OCTETBITS + pfkey_key->sadb_key_bits, - PFKEYBITS)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_key_parse: " - "key length=%d does not agree with extension length=%d.\n", - pfkey_key->sadb_key_bits, - pfkey_key->sadb_key_len); - SENDERR(EINVAL); - } - - if(pfkey_key->sadb_key_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_key_parse: " - "res=%d, must be zero.\n", - pfkey_key->sadb_key_reserved); - SENDERR(EINVAL); - } - - if(! 
( (pfkey_key->sadb_key_exttype == SADB_EXT_KEY_AUTH) || - (pfkey_key->sadb_key_exttype == SADB_EXT_KEY_ENCRYPT))) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_key_parse: " - "expecting extension type AUTH or ENCRYPT, got %d.\n", - pfkey_key->sadb_key_exttype); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_key_parse: " - "success, found len=%d exttype=%d(%s) bits=%d reserved=%d.\n", - pfkey_key->sadb_key_len, - pfkey_key->sadb_key_exttype, - pfkey_v2_sadb_ext_string(pfkey_key->sadb_key_exttype), - pfkey_key->sadb_key_bits, - pfkey_key->sadb_key_reserved); - -errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_ident_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - struct sadb_ident *pfkey_ident = (struct sadb_ident *)pfkey_ext; - - /* sanity checks... */ - if(pfkey_ident->sadb_ident_len < sizeof(struct sadb_ident) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_ident_parse: " - "size wrong ext_len=%d, key_ext_len=%d.\n", - pfkey_ident->sadb_ident_len, - (int)sizeof(struct sadb_ident)); - SENDERR(EINVAL); - } - - if(pfkey_ident->sadb_ident_type > SADB_IDENTTYPE_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_ident_parse: " - "ident_type=%d out of range, must be less than %d.\n", - pfkey_ident->sadb_ident_type, - SADB_IDENTTYPE_MAX); - SENDERR(EINVAL); - } - - if(pfkey_ident->sadb_ident_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_ident_parse: " - "res=%d, must be zero.\n", - pfkey_ident->sadb_ident_reserved); - SENDERR(EINVAL); - } - - /* string terminator/padding must be zero */ - if(pfkey_ident->sadb_ident_len > sizeof(struct sadb_ident) / IPSEC_PFKEYv2_ALIGN) { - if(*((char*)pfkey_ident + pfkey_ident->sadb_ident_len * IPSEC_PFKEYv2_ALIGN - 1)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_ident_parse: " - "string padding must be zero, last is 0x%02x.\n", - *((char*)pfkey_ident + - pfkey_ident->sadb_ident_len * IPSEC_PFKEYv2_ALIGN - 1)); - SENDERR(EINVAL); - } - } - - 
if( ! ((pfkey_ident->sadb_ident_exttype == SADB_EXT_IDENTITY_SRC) || - (pfkey_ident->sadb_ident_exttype == SADB_EXT_IDENTITY_DST))) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_key_parse: " - "expecting extension type IDENTITY_SRC or IDENTITY_DST, got %d.\n", - pfkey_ident->sadb_ident_exttype); - SENDERR(EINVAL); - } - -errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_sens_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - struct sadb_sens *pfkey_sens = (struct sadb_sens *)pfkey_ext; - - /* sanity checks... */ - if(pfkey_sens->sadb_sens_len < sizeof(struct sadb_sens) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sens_parse: " - "size wrong ext_len=%d, key_ext_len=%d.\n", - pfkey_sens->sadb_sens_len, - (int)sizeof(struct sadb_sens)); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_sens_parse: " - "Sorry, I can't parse exttype=%d yet.\n", - pfkey_ext->sadb_ext_type); -#if 0 - SENDERR(EINVAL); /* don't process these yet */ -#endif - -errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_prop_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - int i, num_comb; - struct sadb_prop *pfkey_prop = (struct sadb_prop *)pfkey_ext; - struct sadb_comb *pfkey_comb = (struct sadb_comb *)((char*)pfkey_ext + sizeof(struct sadb_prop)); - - /* sanity checks... 
*/ - if((pfkey_prop->sadb_prop_len < sizeof(struct sadb_prop) / IPSEC_PFKEYv2_ALIGN) || - (((pfkey_prop->sadb_prop_len * IPSEC_PFKEYv2_ALIGN) - sizeof(struct sadb_prop)) % sizeof(struct sadb_comb))) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "size wrong ext_len=%d, prop_ext_len=%d comb_ext_len=%d.\n", - pfkey_prop->sadb_prop_len, - (int)sizeof(struct sadb_prop), - (int)sizeof(struct sadb_comb)); - SENDERR(EINVAL); - } - - if(pfkey_prop->sadb_prop_replay > 64) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "replay window size: %d -- must be 0 <= size <= 64\n", - pfkey_prop->sadb_prop_replay); - SENDERR(EINVAL); - } - - for(i=0; i<3; i++) { - if(pfkey_prop->sadb_prop_reserved[i]) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "res[%d]=%d, must be zero.\n", - i, pfkey_prop->sadb_prop_reserved[i]); - SENDERR(EINVAL); - } - } - - num_comb = ((pfkey_prop->sadb_prop_len * IPSEC_PFKEYv2_ALIGN) - sizeof(struct sadb_prop)) / sizeof(struct sadb_comb); - - for(i = 0; i < num_comb; i++) { - if(pfkey_comb->sadb_comb_auth > SADB_AALG_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_auth=%d > SADB_AALG_MAX=%d.\n", - i, - pfkey_comb->sadb_comb_auth, - SADB_AALG_MAX); - SENDERR(EINVAL); - } - - if(pfkey_comb->sadb_comb_auth) { - if(!pfkey_comb->sadb_comb_auth_minbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_auth_minbits=0, fatal.\n", - i); - SENDERR(EINVAL); - } - if(!pfkey_comb->sadb_comb_auth_maxbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_auth_maxbits=0, fatal.\n", - i); - SENDERR(EINVAL); - } - if(pfkey_comb->sadb_comb_auth_minbits > pfkey_comb->sadb_comb_auth_maxbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_auth_minbits=%d > maxbits=%d, fatal.\n", - i, - pfkey_comb->sadb_comb_auth_minbits, - 
pfkey_comb->sadb_comb_auth_maxbits); - SENDERR(EINVAL); - } - } else { - if(pfkey_comb->sadb_comb_auth_minbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_auth_minbits=%d != 0, fatal.\n", - i, - pfkey_comb->sadb_comb_auth_minbits); - SENDERR(EINVAL); - } - if(pfkey_comb->sadb_comb_auth_maxbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_auth_maxbits=%d != 0, fatal.\n", - i, - pfkey_comb->sadb_comb_auth_maxbits); - SENDERR(EINVAL); - } - } - - if(pfkey_comb->sadb_comb_encrypt > SADB_EALG_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_comb_parse: " - "pfkey_comb[%d]->sadb_comb_encrypt=%d > SADB_EALG_MAX=%d.\n", - i, - pfkey_comb->sadb_comb_encrypt, - SADB_EALG_MAX); - SENDERR(EINVAL); - } - - if(pfkey_comb->sadb_comb_encrypt) { - if(!pfkey_comb->sadb_comb_encrypt_minbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_encrypt_minbits=0, fatal.\n", - i); - SENDERR(EINVAL); - } - if(!pfkey_comb->sadb_comb_encrypt_maxbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_encrypt_maxbits=0, fatal.\n", - i); - SENDERR(EINVAL); - } - if(pfkey_comb->sadb_comb_encrypt_minbits > pfkey_comb->sadb_comb_encrypt_maxbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_encrypt_minbits=%d > maxbits=%d, fatal.\n", - i, - pfkey_comb->sadb_comb_encrypt_minbits, - pfkey_comb->sadb_comb_encrypt_maxbits); - SENDERR(EINVAL); - } - } else { - if(pfkey_comb->sadb_comb_encrypt_minbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_encrypt_minbits=%d != 0, fatal.\n", - i, - pfkey_comb->sadb_comb_encrypt_minbits); - SENDERR(EINVAL); - } - if(pfkey_comb->sadb_comb_encrypt_maxbits) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_encrypt_maxbits=%d != 0, 
fatal.\n", - i, - pfkey_comb->sadb_comb_encrypt_maxbits); - SENDERR(EINVAL); - } - } - - /* XXX do sanity check on flags */ - - if(pfkey_comb->sadb_comb_hard_allocations && pfkey_comb->sadb_comb_soft_allocations > pfkey_comb->sadb_comb_hard_allocations) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_soft_allocations=%d > hard_allocations=%d, fatal.\n", - i, - pfkey_comb->sadb_comb_soft_allocations, - pfkey_comb->sadb_comb_hard_allocations); - SENDERR(EINVAL); - } - - if(pfkey_comb->sadb_comb_hard_bytes && pfkey_comb->sadb_comb_soft_bytes > pfkey_comb->sadb_comb_hard_bytes) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_soft_bytes=%Ld > hard_bytes=%Ld, fatal.\n", - i, - (unsigned long long int)pfkey_comb->sadb_comb_soft_bytes, - (unsigned long long int)pfkey_comb->sadb_comb_hard_bytes); - SENDERR(EINVAL); - } - - if(pfkey_comb->sadb_comb_hard_addtime && pfkey_comb->sadb_comb_soft_addtime > pfkey_comb->sadb_comb_hard_addtime) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_soft_addtime=%Ld > hard_addtime=%Ld, fatal.\n", - i, - (unsigned long long int)pfkey_comb->sadb_comb_soft_addtime, - (unsigned long long int)pfkey_comb->sadb_comb_hard_addtime); - SENDERR(EINVAL); - } - - if(pfkey_comb->sadb_comb_hard_usetime && pfkey_comb->sadb_comb_soft_usetime > pfkey_comb->sadb_comb_hard_usetime) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_comb_soft_usetime=%Ld > hard_usetime=%Ld, fatal.\n", - i, - (unsigned long long int)pfkey_comb->sadb_comb_soft_usetime, - (unsigned long long int)pfkey_comb->sadb_comb_hard_usetime); - SENDERR(EINVAL); - } - - if(pfkey_comb->sadb_x_comb_hard_packets && pfkey_comb->sadb_x_comb_soft_packets > pfkey_comb->sadb_x_comb_hard_packets) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "pfkey_comb[%d]->sadb_x_comb_soft_packets=%d > hard_packets=%d, 
fatal.\n", - i, - pfkey_comb->sadb_x_comb_soft_packets, - pfkey_comb->sadb_x_comb_hard_packets); - SENDERR(EINVAL); - } - - if(pfkey_comb->sadb_comb_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_prop_parse: " - "comb[%d].res=%d, must be zero.\n", - i, - pfkey_comb->sadb_comb_reserved); - SENDERR(EINVAL); - } - pfkey_comb++; - } - -errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_supported_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - unsigned int i, num_alg; - struct sadb_supported *pfkey_supported = (struct sadb_supported *)pfkey_ext; - struct sadb_alg *pfkey_alg = (struct sadb_alg*)((char*)pfkey_ext + sizeof(struct sadb_supported)); - - /* sanity checks... */ - if((pfkey_supported->sadb_supported_len < - sizeof(struct sadb_supported) / IPSEC_PFKEYv2_ALIGN) || - (((pfkey_supported->sadb_supported_len * IPSEC_PFKEYv2_ALIGN) - - sizeof(struct sadb_supported)) % sizeof(struct sadb_alg))) { - - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_supported_parse: " - "size wrong ext_len=%d, supported_ext_len=%d alg_ext_len=%d.\n", - pfkey_supported->sadb_supported_len, - (int)sizeof(struct sadb_supported), - (int)sizeof(struct sadb_alg)); - SENDERR(EINVAL); - } - - if(pfkey_supported->sadb_supported_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_supported_parse: " - "res=%d, must be zero.\n", - pfkey_supported->sadb_supported_reserved); - SENDERR(EINVAL); - } - - num_alg = ((pfkey_supported->sadb_supported_len * IPSEC_PFKEYv2_ALIGN) - sizeof(struct sadb_supported)) / sizeof(struct sadb_alg); - - for(i = 0; i < num_alg; i++) { - /* process algo description */ - if(pfkey_alg->sadb_alg_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_supported_parse: " - "alg[%d], id=%d, ivlen=%d, minbits=%d, maxbits=%d, res=%d, must be zero.\n", - i, - pfkey_alg->sadb_alg_id, - pfkey_alg->sadb_alg_ivlen, - pfkey_alg->sadb_alg_minbits, - pfkey_alg->sadb_alg_maxbits, - pfkey_alg->sadb_alg_reserved); - SENDERR(EINVAL); - } - - /* XXX can 
alg_id auth/enc be determined from info given? - Yes, but OpenBSD's method does not iteroperate with rfc2367. - rgb, 2000-04-06 */ - - switch(pfkey_supported->sadb_supported_exttype) { - case SADB_EXT_SUPPORTED_AUTH: - if(pfkey_alg->sadb_alg_id > SADB_AALG_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_supported_parse: " - "alg[%d], alg_id=%d > SADB_AALG_MAX=%d, fatal.\n", - i, - pfkey_alg->sadb_alg_id, - SADB_AALG_MAX); - SENDERR(EINVAL); - } - break; - case SADB_EXT_SUPPORTED_ENCRYPT: - if(pfkey_alg->sadb_alg_id > SADB_EALG_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_supported_parse: " - "alg[%d], alg_id=%d > SADB_EALG_MAX=%d, fatal.\n", - i, - pfkey_alg->sadb_alg_id, - SADB_EALG_MAX); - SENDERR(EINVAL); - } - break; - default: - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_supported_parse: " - "alg[%d], alg_id=%d > SADB_EALG_MAX=%d, fatal.\n", - i, - pfkey_alg->sadb_alg_id, - SADB_EALG_MAX); - SENDERR(EINVAL); - } - pfkey_alg++; - } - - errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_spirange_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - struct sadb_spirange *pfkey_spirange = (struct sadb_spirange *)pfkey_ext; - - /* sanity checks... 
*/ - if(pfkey_spirange->sadb_spirange_len != - sizeof(struct sadb_spirange) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_spirange_parse: " - "size wrong ext_len=%d, key_ext_len=%d.\n", - pfkey_spirange->sadb_spirange_len, - (int)sizeof(struct sadb_spirange)); - SENDERR(EINVAL); - } - - if(pfkey_spirange->sadb_spirange_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_spirange_parse: " - "reserved=%d must be set to zero.\n", - pfkey_spirange->sadb_spirange_reserved); - SENDERR(EINVAL); - } - - if(ntohl(pfkey_spirange->sadb_spirange_max) < ntohl(pfkey_spirange->sadb_spirange_min)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_spirange_parse: " - "minspi=%08x must be < maxspi=%08x.\n", - ntohl(pfkey_spirange->sadb_spirange_min), - ntohl(pfkey_spirange->sadb_spirange_max)); - SENDERR(EINVAL); - } - - if(ntohl(pfkey_spirange->sadb_spirange_min) <= 255) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_spirange_parse: " - "minspi=%08x must be > 255.\n", - ntohl(pfkey_spirange->sadb_spirange_min)); - SENDERR(EEXIST); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_spirange_parse: " - "ext_len=%u ext_type=%u(%s) min=%u max=%u res=%u.\n", - pfkey_spirange->sadb_spirange_len, - pfkey_spirange->sadb_spirange_exttype, - pfkey_v2_sadb_ext_string(pfkey_spirange->sadb_spirange_exttype), - pfkey_spirange->sadb_spirange_min, - pfkey_spirange->sadb_spirange_max, - pfkey_spirange->sadb_spirange_reserved); - errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_x_kmprivate_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - struct sadb_x_kmprivate *pfkey_x_kmprivate = (struct sadb_x_kmprivate *)pfkey_ext; - - /* sanity checks... 
*/ - if(pfkey_x_kmprivate->sadb_x_kmprivate_len < - sizeof(struct sadb_x_kmprivate) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_kmprivate_parse: " - "size wrong ext_len=%d, key_ext_len=%d.\n", - pfkey_x_kmprivate->sadb_x_kmprivate_len, - (int)sizeof(struct sadb_x_kmprivate)); - SENDERR(EINVAL); - } - - if(pfkey_x_kmprivate->sadb_x_kmprivate_reserved) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_kmprivate_parse: " - "reserved=%d must be set to zero.\n", - pfkey_x_kmprivate->sadb_x_kmprivate_reserved); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_kmprivate_parse: " - "Sorry, I can't parse exttype=%d yet.\n", - pfkey_ext->sadb_ext_type); - SENDERR(EINVAL); /* don't process these yet */ - -errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_x_satype_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - int i; - struct sadb_x_satype *pfkey_x_satype = (struct sadb_x_satype *)pfkey_ext; - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_x_satype_parse: enter\n"); - /* sanity checks... 
*/ - if(pfkey_x_satype->sadb_x_satype_len != - sizeof(struct sadb_x_satype) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_satype_parse: " - "size wrong ext_len=%d, key_ext_len=%d.\n", - pfkey_x_satype->sadb_x_satype_len, - (int)sizeof(struct sadb_x_satype)); - SENDERR(EINVAL); - } - - if(!pfkey_x_satype->sadb_x_satype_satype) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_satype_parse: " - "satype is zero, must be non-zero.\n"); - SENDERR(EINVAL); - } - - if(pfkey_x_satype->sadb_x_satype_satype > SADB_SATYPE_MAX) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_satype_parse: " - "satype %d > max %d, invalid.\n", - pfkey_x_satype->sadb_x_satype_satype, SADB_SATYPE_MAX); - SENDERR(EINVAL); - } - - if(!(satype2proto(pfkey_x_satype->sadb_x_satype_satype))) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_satype_parse: " - "proto lookup from satype=%d failed.\n", - pfkey_x_satype->sadb_x_satype_satype); - SENDERR(EINVAL); - } - - for(i = 0; i < 3; i++) { - if(pfkey_x_satype->sadb_x_satype_reserved[i]) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_satype_parse: " - "reserved[%d]=%d must be set to zero.\n", - i, pfkey_x_satype->sadb_x_satype_reserved[i]); - SENDERR(EINVAL); - } - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_x_satype_parse: " - "len=%u ext=%u(%s) satype=%u(%s) res=%u,%u,%u.\n", - pfkey_x_satype->sadb_x_satype_len, - pfkey_x_satype->sadb_x_satype_exttype, - pfkey_v2_sadb_ext_string(pfkey_x_satype->sadb_x_satype_exttype), - pfkey_x_satype->sadb_x_satype_satype, - satype2name(pfkey_x_satype->sadb_x_satype_satype), - pfkey_x_satype->sadb_x_satype_reserved[0], - pfkey_x_satype->sadb_x_satype_reserved[1], - pfkey_x_satype->sadb_x_satype_reserved[2]); -errlab: - return error; -} - -DEBUG_NO_STATIC int -pfkey_x_ext_debug_parse(struct sadb_ext *pfkey_ext) -{ - int error = 0; - int i; - struct sadb_x_debug *pfkey_x_debug = (struct sadb_x_debug *)pfkey_ext; - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - 
"pfkey_x_debug_parse: enter\n"); - /* sanity checks... */ - if(pfkey_x_debug->sadb_x_debug_len != - sizeof(struct sadb_x_debug) / IPSEC_PFKEYv2_ALIGN) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_debug_parse: " - "size wrong ext_len=%d, key_ext_len=%d.\n", - pfkey_x_debug->sadb_x_debug_len, - (int)sizeof(struct sadb_x_debug)); - SENDERR(EINVAL); - } - - for(i = 0; i < 4; i++) { - if(pfkey_x_debug->sadb_x_debug_reserved[i]) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_x_debug_parse: " - "reserved[%d]=%d must be set to zero.\n", - i, pfkey_x_debug->sadb_x_debug_reserved[i]); - SENDERR(EINVAL); - } - } - -errlab: - return error; -} - -#ifdef NAT_TRAVERSAL -DEBUG_NO_STATIC int -pfkey_x_ext_nat_t_type_parse(struct sadb_ext *pfkey_ext) -{ - return 0; -} -DEBUG_NO_STATIC int -pfkey_x_ext_nat_t_port_parse(struct sadb_ext *pfkey_ext) -{ - return 0; -} -#endif - -#define DEFINEPARSER(NAME) static struct pf_key_ext_parsers_def NAME##_def={NAME, #NAME}; - -DEFINEPARSER(pfkey_sa_parse); -DEFINEPARSER(pfkey_lifetime_parse); -DEFINEPARSER(pfkey_address_parse); -DEFINEPARSER(pfkey_key_parse); -DEFINEPARSER(pfkey_ident_parse); -DEFINEPARSER(pfkey_sens_parse); -DEFINEPARSER(pfkey_prop_parse); -DEFINEPARSER(pfkey_supported_parse); -DEFINEPARSER(pfkey_spirange_parse); -DEFINEPARSER(pfkey_x_kmprivate_parse); -DEFINEPARSER(pfkey_x_satype_parse); -DEFINEPARSER(pfkey_x_ext_debug_parse); -#ifdef NAT_TRAVERSAL -DEFINEPARSER(pfkey_x_ext_nat_t_type_parse); -DEFINEPARSER(pfkey_x_ext_nat_t_port_parse); -#endif - -struct pf_key_ext_parsers_def *ext_default_parsers[]= -{ - NULL, /* pfkey_msg_parse, */ - &pfkey_sa_parse_def, - &pfkey_lifetime_parse_def, - &pfkey_lifetime_parse_def, - &pfkey_lifetime_parse_def, - &pfkey_address_parse_def, - &pfkey_address_parse_def, - &pfkey_address_parse_def, - &pfkey_key_parse_def, - &pfkey_key_parse_def, - &pfkey_ident_parse_def, - &pfkey_ident_parse_def, - &pfkey_sens_parse_def, - &pfkey_prop_parse_def, - &pfkey_supported_parse_def, - 
&pfkey_supported_parse_def, - &pfkey_spirange_parse_def, - &pfkey_x_kmprivate_parse_def, - &pfkey_x_satype_parse_def, - &pfkey_sa_parse_def, - &pfkey_address_parse_def, - &pfkey_address_parse_def, - &pfkey_address_parse_def, - &pfkey_address_parse_def, - &pfkey_address_parse_def, - &pfkey_x_ext_debug_parse_def -#ifdef NAT_TRAVERSAL - , - &pfkey_x_ext_nat_t_type_parse_def, - &pfkey_x_ext_nat_t_port_parse_def, - &pfkey_x_ext_nat_t_port_parse_def, - &pfkey_address_parse_def -#endif -}; - -int -pfkey_msg_parse(struct sadb_msg *pfkey_msg, - struct pf_key_ext_parsers_def *ext_parsers[], - struct sadb_ext *extensions[], - int dir) -{ - int error = 0; - int remain; - struct sadb_ext *pfkey_ext; - int extensions_seen = 0; - - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_msg_parse: " - "parsing message ver=%d, type=%d(%s), errno=%d, satype=%d(%s), len=%d, res=%d, seq=%d, pid=%d.\n", - pfkey_msg->sadb_msg_version, - pfkey_msg->sadb_msg_type, - pfkey_v2_sadb_type_string(pfkey_msg->sadb_msg_type), - pfkey_msg->sadb_msg_errno, - pfkey_msg->sadb_msg_satype, - satype2name(pfkey_msg->sadb_msg_satype), - pfkey_msg->sadb_msg_len, - pfkey_msg->sadb_msg_reserved, - pfkey_msg->sadb_msg_seq, - pfkey_msg->sadb_msg_pid); - - if(ext_parsers == NULL) ext_parsers = ext_default_parsers; - - pfkey_extensions_init(extensions); - - remain = pfkey_msg->sadb_msg_len; - remain -= sizeof(struct sadb_msg) / IPSEC_PFKEYv2_ALIGN; - - pfkey_ext = (struct sadb_ext*)((char*)pfkey_msg + - sizeof(struct sadb_msg)); - - extensions[0] = (struct sadb_ext *) pfkey_msg; - - - if(pfkey_msg->sadb_msg_version != PF_KEY_V2) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "not PF_KEY_V2 msg, found %d, should be %d.\n", - pfkey_msg->sadb_msg_version, - PF_KEY_V2); - SENDERR(EINVAL); - } - - if(!pfkey_msg->sadb_msg_type) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "msg type not set, must be non-zero..\n"); - SENDERR(EINVAL); - } - - if(pfkey_msg->sadb_msg_type > SADB_MAX) { - 
DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "msg type=%d > max=%d.\n", - pfkey_msg->sadb_msg_type, - SADB_MAX); - SENDERR(EINVAL); - } - - switch(pfkey_msg->sadb_msg_type) { - case SADB_GETSPI: - case SADB_UPDATE: - case SADB_ADD: - case SADB_DELETE: - case SADB_GET: - case SADB_X_GRPSA: - case SADB_X_ADDFLOW: - if(!satype2proto(pfkey_msg->sadb_msg_satype)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "satype %d conversion to proto failed for msg_type %d (%s).\n", - pfkey_msg->sadb_msg_satype, - pfkey_msg->sadb_msg_type, - pfkey_v2_sadb_type_string(pfkey_msg->sadb_msg_type)); - SENDERR(EINVAL); - } else { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "satype %d(%s) conversion to proto gives %d for msg_type %d(%s).\n", - pfkey_msg->sadb_msg_satype, - satype2name(pfkey_msg->sadb_msg_satype), - satype2proto(pfkey_msg->sadb_msg_satype), - pfkey_msg->sadb_msg_type, - pfkey_v2_sadb_type_string(pfkey_msg->sadb_msg_type)); - } - case SADB_ACQUIRE: - case SADB_REGISTER: - case SADB_EXPIRE: - if(!pfkey_msg->sadb_msg_satype) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "satype is zero, must be non-zero for msg_type %d(%s).\n", - pfkey_msg->sadb_msg_type, - pfkey_v2_sadb_type_string(pfkey_msg->sadb_msg_type)); - SENDERR(EINVAL); - } - default: - break; - } - - /* errno must not be set in downward messages */ - /* this is not entirely true... 
a response to an ACQUIRE could return an error */ - if((dir == EXT_BITS_IN) && (pfkey_msg->sadb_msg_type != SADB_ACQUIRE) && pfkey_msg->sadb_msg_errno) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "errno set to %d.\n", - pfkey_msg->sadb_msg_errno); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_msg_parse: " - "remain=%d, ext_type=%d(%s), ext_len=%d.\n", - remain, - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type), - pfkey_ext->sadb_ext_len); - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_msg_parse: " - "extensions permitted=%08x, required=%08x.\n", - extensions_bitmaps[dir][EXT_BITS_PERM][pfkey_msg->sadb_msg_type], - extensions_bitmaps[dir][EXT_BITS_REQ][pfkey_msg->sadb_msg_type]); - - extensions_seen = 1; - - while( (remain * IPSEC_PFKEYv2_ALIGN) >= sizeof(struct sadb_ext) ) { - /* Is there enough message left to support another extension header? */ - if(remain < pfkey_ext->sadb_ext_len) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "remain %d less than ext len %d.\n", - remain, pfkey_ext->sadb_ext_len); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_msg_parse: " - "parsing ext type=%d(%s) remain=%d.\n", - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type), - remain); - - /* Is the extension header type valid? */ - if((pfkey_ext->sadb_ext_type > SADB_EXT_MAX) || (!pfkey_ext->sadb_ext_type)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "ext type %d(%s) invalid, SADB_EXT_MAX=%d.\n", - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type), - SADB_EXT_MAX); - SENDERR(EINVAL); - } - - /* Have we already seen this type of extension? 
*/ - if((extensions_seen & ( 1 << pfkey_ext->sadb_ext_type )) != 0) - { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "ext type %d(%s) already seen.\n", - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type)); - SENDERR(EINVAL); - } - - /* Do I even know about this type of extension? */ - if(ext_parsers[pfkey_ext->sadb_ext_type]==NULL) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "ext type %d(%s) unknown, ignoring.\n", - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type)); - goto next_ext; - } - - /* Is this type of extension permitted for this type of message? */ - if(!(extensions_bitmaps[dir][EXT_BITS_PERM][pfkey_msg->sadb_msg_type] & - 1<<pfkey_ext->sadb_ext_type)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "ext type %d(%s) not permitted, exts_perm_in=%08x, 1<<type=%08x.\n", - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type), - extensions_bitmaps[dir][EXT_BITS_PERM][pfkey_msg->sadb_msg_type], - 1<<pfkey_ext->sadb_ext_type); - SENDERR(EINVAL); - } - - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_msg_parse: " - "remain=%d ext_type=%d(%s) ext_len=%d parsing ext 0p%p with parser %s.\n", - remain, - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type), - pfkey_ext->sadb_ext_len, - pfkey_ext, - ext_parsers[pfkey_ext->sadb_ext_type]->parser_name); - - /* Parse the extension */ - if((error = - (*ext_parsers[pfkey_ext->sadb_ext_type]->parser)(pfkey_ext))) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "extension parsing for type %d(%s) failed with error %d.\n", - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type), - error); - SENDERR(-error); - } - DEBUGGING(PF_KEY_DEBUG_PARSE_FLOW, - "pfkey_msg_parse: " - "Extension %d(%s) parsed.\n", - pfkey_ext->sadb_ext_type, - pfkey_v2_sadb_ext_string(pfkey_ext->sadb_ext_type)); - - /* Mark that we have seen this extension and remember the header location */ - 
extensions_seen |= ( 1 << pfkey_ext->sadb_ext_type ); - extensions[pfkey_ext->sadb_ext_type] = pfkey_ext; - - next_ext: - /* Calculate how much message remains */ - remain -= pfkey_ext->sadb_ext_len; - - if(!remain) { - break; - } - /* Find the next extension header */ - pfkey_ext = (struct sadb_ext*)((char*)pfkey_ext + - pfkey_ext->sadb_ext_len * IPSEC_PFKEYv2_ALIGN); - } - - if(remain) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "unexpected remainder of %d.\n", - remain); - /* why is there still something remaining? */ - SENDERR(EINVAL); - } - - /* check required extensions */ - DEBUGGING(PF_KEY_DEBUG_PARSE_STRUCT, - "pfkey_msg_parse: " - "extensions permitted=%08x, seen=%08x, required=%08x.\n", - extensions_bitmaps[dir][EXT_BITS_PERM][pfkey_msg->sadb_msg_type], - extensions_seen, - extensions_bitmaps[dir][EXT_BITS_REQ][pfkey_msg->sadb_msg_type]); - - /* don't check further if it is an error return message since it - may not have a body */ - if(pfkey_msg->sadb_msg_errno) { - SENDERR(-error); - } - - if((extensions_seen & - extensions_bitmaps[dir][EXT_BITS_REQ][pfkey_msg->sadb_msg_type]) != - extensions_bitmaps[dir][EXT_BITS_REQ][pfkey_msg->sadb_msg_type]) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "required extensions missing:%08x.\n", - extensions_bitmaps[dir][EXT_BITS_REQ][pfkey_msg->sadb_msg_type] - - (extensions_seen & - extensions_bitmaps[dir][EXT_BITS_REQ][pfkey_msg->sadb_msg_type])); - SENDERR(EINVAL); - } - - if((dir == EXT_BITS_IN) && (pfkey_msg->sadb_msg_type == SADB_X_DELFLOW) - && ((extensions_seen & SADB_X_EXT_ADDRESS_DELFLOW) - != SADB_X_EXT_ADDRESS_DELFLOW) - && (((extensions_seen & (1<<SADB_EXT_SA)) != (1<<SADB_EXT_SA)) - || ((((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_flags - & SADB_X_SAFLAGS_CLEARFLOW) - != SADB_X_SAFLAGS_CLEARFLOW))) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "required SADB_X_DELFLOW extensions missing: either %08x must be present or %08x must be present with SADB_X_SAFLAGS_CLEARFLOW set.\n", - SADB_X_EXT_ADDRESS_DELFLOW - - 
(extensions_seen & SADB_X_EXT_ADDRESS_DELFLOW), - (1<<SADB_EXT_SA), - SADB_X_SAFLAGS_CLEARFLOW); - SENDERR(EINVAL); - } - - switch(pfkey_msg->sadb_msg_type) { - case SADB_ADD: - case SADB_UPDATE: - /* check maturity */ - if(((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_state != - SADB_SASTATE_MATURE) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "state=%d for add or update should be MATURE=%d.\n", - ((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_state, - SADB_SASTATE_MATURE); - SENDERR(EINVAL); - } - - /* check AH and ESP */ - switch(((struct sadb_msg*)extensions[SADB_EXT_RESERVED])->sadb_msg_satype) { - case SADB_SATYPE_AH: - if(!(((struct sadb_sa*)extensions[SADB_EXT_SA]) && - ((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_auth != - SADB_AALG_NONE)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "auth alg is zero, must be non-zero for AH SAs.\n"); - SENDERR(EINVAL); - } - if(((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_encrypt != - SADB_EALG_NONE) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "AH handed encalg=%d, must be zero.\n", - ((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_encrypt); - SENDERR(EINVAL); - } - break; - case SADB_SATYPE_ESP: - if(!(((struct sadb_sa*)extensions[SADB_EXT_SA]) && - ((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_encrypt != - SADB_EALG_NONE)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "encrypt alg=%d is zero, must be non-zero for ESP=%d SAs.\n", - ((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_encrypt, - ((struct sadb_msg*)extensions[SADB_EXT_RESERVED])->sadb_msg_satype); - SENDERR(EINVAL); - } - if((((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_encrypt == - SADB_EALG_NULL) && - (((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_auth == - SADB_AALG_NONE) ) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "ESP handed encNULL+authNONE, illegal combination.\n"); - SENDERR(EINVAL); - } - break; - case SADB_X_SATYPE_COMP: - if(!(((struct sadb_sa*)extensions[SADB_EXT_SA]) && 
- ((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_encrypt != - SADB_EALG_NONE)) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "encrypt alg=%d is zero, must be non-zero for COMP=%d SAs.\n", - ((struct sadb_sa*)extensions[SADB_EXT_SA])->sadb_sa_encrypt, - ((struct sadb_msg*)extensions[SADB_EXT_RESERVED])->sadb_msg_satype); - SENDERR(EINVAL); - } - if(((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_auth != - SADB_AALG_NONE) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "COMP handed auth=%d, must be zero.\n", - ((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_auth); - SENDERR(EINVAL); - } - break; - default: - break; - } - if(ntohl(((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_spi) <= 255) { - DEBUGGING(PF_KEY_DEBUG_PARSE_PROBLEM, - "pfkey_msg_parse: " - "spi=%08x must be > 255.\n", - ntohl(((struct sadb_sa*)(extensions[SADB_EXT_SA]))->sadb_sa_spi)); - SENDERR(EINVAL); - } - default: - break; - } -errlab: - - return error; -} - -/* - * $Log: pfkey_v2_parse.c,v $ - * Revision 1.53 2003/01/30 02:32:09 rgb - * - * Rename SAref table macro names for clarity. - * Convert IPsecSAref_t from signed to unsigned to fix apparent SAref exhaustion bug. - * - * Revision 1.52 2002/12/30 06:53:07 mcr - * deal with short SA structures... #if 0 out for now. Probably - * not quite the right way. - * - * Revision 1.51 2002/12/13 18:16:02 mcr - * restored sa_ref code - * - * Revision 1.50 2002/12/13 18:06:52 mcr - * temporarily removed sadb_x_sa_ref reference for 2.xx - * - * Revision 1.49 2002/10/05 05:02:58 dhr - * - * C labels go on statements - * - * Revision 1.48 2002/09/20 15:40:45 rgb - * Added sadb_x_sa_ref to struct sadb_sa. - * - * Revision 1.47 2002/09/20 05:01:31 rgb - * Fixed usage of pfkey_lib_debug. - * Format for function declaration style consistency. - * Added text labels to elucidate numeric values presented. - * Re-organised debug output to reduce noise in output. 
- * - * Revision 1.46 2002/07/24 18:44:54 rgb - * Type fiddling to tame ia64 compiler. - * - * Revision 1.45 2002/05/23 07:14:11 rgb - * Cleaned up %p variants to 0p%p for test suite cleanup. - * - * Revision 1.44 2002/04/24 07:55:32 mcr - * #include patches and Makefiles for post-reorg compilation. - * - * Revision 1.43 2002/04/24 07:36:40 mcr - * Moved from ./lib/pfkey_v2_parse.c,v - * - * Revision 1.42 2002/01/29 22:25:36 rgb - * Re-add ipsec_kversion.h to keep MALLOC happy. - * - * Revision 1.41 2002/01/29 01:59:10 mcr - * removal of kversions.h - sources that needed it now use ipsec_param.h. - * updating of IPv6 structures to match latest in6.h version. - * removed dead code from freeswan.h that also duplicated kversions.h - * code. - * - * Revision 1.40 2002/01/20 20:34:50 mcr - * added pfkey_v2_sadb_type_string to decode sadb_type to string. - * - * Revision 1.39 2001/11/27 05:29:22 mcr - * pfkey parses are now maintained by a structure - * that includes their name for debug purposes. - * DEBUGGING() macro changed so that it takes a debug - * level so that pf_key() can use this to decode the - * structures without innundanting humans. - * Also uses pfkey_v2_sadb_ext_string() in messages. - * - * Revision 1.38 2001/11/06 19:47:47 rgb - * Added packet parameter to lifetime and comb structures. - * - * Revision 1.37 2001/10/18 04:45:24 rgb - * 2.4.9 kernel deprecates linux/malloc.h in favour of linux/slab.h, - * lib/freeswan.h version macros moved to lib/kversions.h. - * Other compiler directive cleanups. - * - * Revision 1.36 2001/06/14 19:35:16 rgb - * Update copyright date. - * - * Revision 1.35 2001/05/03 19:44:51 rgb - * Standardise on SENDERR() macro. - * - * Revision 1.34 2001/03/16 07:41:51 rgb - * Put freeswan.h include before pluto includes. - * - * Revision 1.33 2001/02/27 07:13:51 rgb - * Added satype2name() function. - * Added text to default satype_tbl entry. - * Added satype2name() conversions for most satype debug output. 
- * - * Revision 1.32 2001/02/26 20:01:09 rgb - * Added internal IP protocol 61 for magic SAs. - * Ditch unused sadb_satype2proto[], replaced by satype2proto(). - * Re-formatted debug output (split lines, consistent spacing). - * Removed acquire, register and expire requirements for a known satype. - * Changed message type checking to a switch structure. - * Verify expected NULL auth for IPCOMP. - * Enforced spi > 0x100 requirement, now that pass uses a magic SA for - * appropriate message types. - * - * Revision 1.31 2000/12/01 07:09:00 rgb - * Added ipcomp sanity check to require encalgo is set. - * - * Revision 1.30 2000/11/17 18:10:30 rgb - * Fixed bugs mostly relating to spirange, to treat all spi variables as - * network byte order since this is the way PF_KEYv2 stored spis. - * - * Revision 1.29 2000/10/12 00:02:39 rgb - * Removed 'format, ##' nonsense from debug macros for RH7.0. - * - * Revision 1.28 2000/09/20 16:23:04 rgb - * Remove over-paranoid extension check in the presence of sadb_msg_errno. - * - * Revision 1.27 2000/09/20 04:04:21 rgb - * Changed static functions to DEBUG_NO_STATIC to reveal function names in - * oopsen. - * - * Revision 1.26 2000/09/15 11:37:02 rgb - * Merge in heavily modified Svenning Soerensen's - * IPCOMP zlib deflate code. - * - * Revision 1.25 2000/09/12 22:35:37 rgb - * Restructured to remove unused extensions from CLEARFLOW messages. - * - * Revision 1.24 2000/09/12 18:59:54 rgb - * Added Gerhard's IPv6 support to pfkey parts of libfreeswan. - * - * Revision 1.23 2000/09/12 03:27:00 rgb - * Moved DEBUGGING definition to compile kernel with debug off. - * - * Revision 1.22 2000/09/09 06:39:27 rgb - * Restrict pfkey errno check to downward messages only. - * - * Revision 1.21 2000/09/08 19:22:34 rgb - * Enabled pfkey_sens_parse(). - * Added check for errno on downward acquire messages only. - * - * Revision 1.20 2000/09/01 18:48:23 rgb - * Fixed reserved check bug and added debug output in - * pfkey_supported_parse(). 
- * Fixed debug output label bug in pfkey_ident_parse(). - * - * Revision 1.19 2000/08/27 01:55:26 rgb - * Define OCTETBITS and PFKEYBITS to avoid using 'magic' numbers in code. - * - * Revision 1.18 2000/08/24 17:00:36 rgb - * Ignore unknown extensions instead of failing. - * - * Revision 1.17 2000/06/02 22:54:14 rgb - * Added Gerhard Gessler's struct sockaddr_storage mods for IPv6 support. - * - * Revision 1.16 2000/05/10 19:25:11 rgb - * Fleshed out proposal and supported extensions. - * - * Revision 1.15 2000/01/24 21:15:31 rgb - * Added disabled pluto pfkey lib debug flag. - * Added algo debugging reporting. - * - * Revision 1.14 2000/01/22 23:24:29 rgb - * Added new functions proto2satype() and satype2proto() and lookup - * table satype_tbl. Also added proto2name() since it was easy. - * - * Revision 1.13 2000/01/21 09:43:59 rgb - * Cast ntohl(spi) as (unsigned long int) to shut up compiler. - * - * Revision 1.12 2000/01/21 06:28:19 rgb - * Added address cases for eroute flows. - * Indented compiler directives for readability. - * Added klipsdebug switching capability. - * - * Revision 1.11 1999/12/29 21:14:59 rgb - * Fixed debug text cut and paste typo. - * - * Revision 1.10 1999/12/10 17:45:24 rgb - * Added address debugging. - * - * Revision 1.9 1999/12/09 23:11:42 rgb - * Ditched include since we no longer use memset(). - * Use new pfkey_extensions_init() instead of memset(). - * Added check for SATYPE in pfkey_msg_build(). - * Tidy up comments and debugging comments. - * - * Revision 1.8 1999/12/07 19:55:26 rgb - * Removed unused first argument from extension parsers. - * Removed static pluto debug flag. - * Moved message type and state checking to pfkey_msg_parse(). - * Changed print[fk] type from lx to x to quiet compiler. - * Removed redundant remain check. - * Changed __u* types to uint* to avoid use of asm/types.h and - * sys/types.h in userspace code. - * - * Revision 1.7 1999/12/01 22:20:51 rgb - * Moved pfkey_lib_debug variable into the library. 
- * Added pfkey version check into header parsing. - * Added check for SATYPE only for those extensions that require a - * non-zero value. - * - * Revision 1.6 1999/11/27 11:58:05 rgb - * Added ipv6 headers. - * Moved sadb_satype2proto protocol lookup table from - * klips/net/ipsec/pfkey_v2_parser.c. - * Enable lifetime_current checking. - * Debugging error messages added. - * Add argument to pfkey_msg_parse() for direction. - * Consolidated the 4 1-d extension bitmap arrays into one 4-d array. - * Add CVS log entry to bottom of file. - * Moved auth and enc alg check to pfkey_msg_parse(). - * Enable accidentally disabled spirange parsing. - * Moved protocol/algorithm checks from klips/net/ipsec/pfkey_v2_parser.c - * - * Local variables: - * c-file-style: "linux" - * End: - * - */ ./contrib/pfkey_v2_parse.c/merge FAILED 0.00 /<>/wiggle: 1: Syntax error: "(" unexpected --- merge 2017-09-28 12:37:04.000000000 +0000 +++ - 2020-03-09 16:05:12.130894648 +0000 @@ -1,471 +0,0 @@ -/* - * linux/arch/i386/nmi.c - * - * NMI watchdog support on APIC systems - * - * Started by Ingo Molnar - * - * Fixes: - * Mikael Pettersson : AMD K7 support for local APIC NMI watchdog. - * Mikael Pettersson : Power Management for local APIC NMI watchdog. - * Mikael Pettersson : Pentium 4 support for local APIC NMI watchdog. - * Pavel Machek and - * Mikael Pettersson : PM converted to driver model. Disable/enable API. 
- */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include - -unsigned int nmi_watchdog = NMI_NONE; -static unsigned int nmi_hz = HZ; -unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */ -extern void show_registers(struct pt_regs *regs); - -/* nmi_active: - * +1: the lapic NMI watchdog is active, but can be disabled - * 0: the lapic NMI watchdog has not been set up, and cannot - * be enabled - * -1: the lapic NMI watchdog is disabled, but can be enabled - */ -static int nmi_active; - -#define K7_EVNTSEL_ENABLE (1 << 22) -#define K7_EVNTSEL_INT (1 << 20) -#define K7_EVNTSEL_OS (1 << 17) -#define K7_EVNTSEL_USR (1 << 16) -#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76 -#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING - -#define P6_EVNTSEL0_ENABLE (1 << 22) -#define P6_EVNTSEL_INT (1 << 20) -#define P6_EVNTSEL_OS (1 << 17) -#define P6_EVNTSEL_USR (1 << 16) -#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79 -#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED - -#define MSR_P4_MISC_ENABLE 0x1A0 -#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7) -#define MSR_P4_MISC_ENABLE_PEBS_UNAVAIL (1<<12) -#define MSR_P4_PERFCTR0 0x300 -#define MSR_P4_CCCR0 0x360 -#define P4_ESCR_EVENT_SELECT(N) ((N)<<25) -#define P4_ESCR_OS (1<<3) -#define P4_ESCR_USR (1<<2) -#define P4_CCCR_OVF_PMI (1<<26) -#define P4_CCCR_THRESHOLD(N) ((N)<<20) -#define P4_CCCR_COMPLEMENT (1<<19) -#define P4_CCCR_COMPARE (1<<18) -#define P4_CCCR_REQUIRED (3<<16) -#define P4_CCCR_ESCR_SELECT(N) ((N)<<13) -#define P4_CCCR_ENABLE (1<<12) -/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter - CRU_ESCR0 (with any non-null event selector) through a complemented - max threshold. 
[IA32-Vol3, Section 14.9.9] */ -#define MSR_P4_IQ_COUNTER0 0x30C -#define P4_NMI_CRU_ESCR0 (P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS|P4_ESCR_USR) -#define P4_NMI_IQ_CCCR0 \ - (P4_CCCR_OVF_PMI|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \ - P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE) - -int __init check_nmi_watchdog (void) -{ - unsigned int prev_nmi_count[NR_CPUS]; - int cpu; - - printk(KERN_INFO "testing NMI watchdog ... "); - - for (cpu = 0; cpu < NR_CPUS; cpu++) - prev_nmi_count[cpu] = irq_stat[cpu].__nmi_count; - local_irq_enable(); - mdelay((10*1000)/nmi_hz); // wait 10 ticks - - /* FIXME: Only boot CPU is online at this stage. Check CPUs - as they come up. */ - for (cpu = 0; cpu < NR_CPUS; cpu++) { - if (!cpu_online(cpu)) - continue; - if (nmi_count(cpu) - prev_nmi_count[cpu] <= 5) { - printk("CPU#%d: NMI appears to be stuck!\n", cpu); - nmi_active = 0; - return -1; - } - } - printk("OK.\n"); - - /* now that we know it works we can reduce NMI frequency to - something more reasonable; makes a difference in some configs */ - if (nmi_watchdog == NMI_LOCAL_APIC) - nmi_hz = 1; - - return 0; -} - -static int __init setup_nmi_watchdog(char *str) -{ - int nmi; - - get_option(&str, &nmi); - - if (nmi >= NMI_INVALID) - return 0; - if (nmi == NMI_NONE) - nmi_watchdog = nmi; - /* - * If any other x86 CPU has a local APIC, then - * please test the NMI stuff there and send me the - * missing bits. Right now Intel P6/P4 and AMD K7 only. - */ - if ((nmi == NMI_LOCAL_APIC) && - (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && - (boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15)) - nmi_watchdog = nmi; - if ((nmi == NMI_LOCAL_APIC) && - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && - (boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15)) - nmi_watchdog = nmi; - /* - * We can enable the IO-APIC watchdog - * unconditionally. 
- */
- if (nmi == NMI_IO_APIC) {
- nmi_active = 1;
- nmi_watchdog = nmi;
- }
- return 1;
-}
-
-__setup("nmi_watchdog=", setup_nmi_watchdog);
-
-void disable_lapic_nmi_watchdog(void)
-{
- if (nmi_active <= 0)
- return;
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- wrmsr(MSR_K7_EVNTSEL0, 0, 0);
- break;
- case X86_VENDOR_INTEL:
- switch (boot_cpu_data.x86) {
- case 6:
- if (boot_cpu_data.x86_model > 0xd)
- break;
-
- wrmsr(MSR_P6_EVNTSEL0, 0, 0);
- break;
- case 15:
- if (boot_cpu_data.x86_model > 0x3)
- break;
-
- wrmsr(MSR_P4_IQ_CCCR0, 0, 0);
- wrmsr(MSR_P4_CRU_ESCR0, 0, 0);
- break;
- }
- break;
- }
- nmi_active = -1;
- /* tell do_nmi() and others that we're not active any more */
- nmi_watchdog = 0;
-}
-
-void enable_lapic_nmi_watchdog(void)
-{
- if (nmi_active < 0) {
- nmi_watchdog = NMI_LOCAL_APIC;
- setup_apic_nmi_watchdog();
- }
-}
-
-void disable_timer_nmi_watchdog(void)
-{
- if ((nmi_watchdog != NMI_IO_APIC) || (nmi_active <= 0))
- return;
-
- unset_nmi_callback();
- nmi_active = -1;
- nmi_watchdog = NMI_NONE;
-}
-
-void enable_timer_nmi_watchdog(void)
-{
- if (nmi_active < 0) {
- nmi_watchdog = NMI_IO_APIC;
- touch_nmi_watchdog();
- nmi_active = 1;
- }
-}
-
-#ifdef CONFIG_PM
-
-static int nmi_pm_active; /* nmi_active before suspend */
-
-static int lapic_nmi_suspend(struct sys_device *dev, u32 state)
-{
- nmi_pm_active = nmi_active;
- disable_lapic_nmi_watchdog();
- return 0;
-}
-
-static int lapic_nmi_resume(struct sys_device *dev)
-{
- if (nmi_pm_active > 0)
- enable_lapic_nmi_watchdog();
- return 0;
-}
-
-
-static struct sysdev_class nmi_sysclass = {
- set_kset_name("lapic_nmi"),
- .resume = lapic_nmi_resume,
- .suspend = lapic_nmi_suspend,
-};
-
-static struct sys_device device_lapic_nmi = {
- .id = 0,
- .cls = &nmi_sysclass,
-};
-
-static int __init init_lapic_nmi_sysfs(void)
-{
- int error;
-
- if (nmi_active == 0)
- return 0;
-
- error = sysdev_class_register(&nmi_sysclass);
- if (!error)
- error =
sys_device_register(&device_lapic_nmi);
- return error;
-}
-/* must come after the local APIC's device_initcall() */
-late_initcall(init_lapic_nmi_sysfs);
-
-#endif /* CONFIG_PM */
-
-/*
- * Activate the NMI watchdog via the local APIC.
- * Original code written by Keith Owens.
- */
-
-static void clear_msr_range(unsigned int base, unsigned int n)
-{
- unsigned int i;
-
- for(i = 0; i < n; ++i)
- wrmsr(base+i, 0, 0);
-}
-
-static void setup_k7_watchdog(void)
-{
- unsigned int evntsel;
-
- nmi_perfctr_msr = MSR_K7_PERFCTR0;
-
- clear_msr_range(MSR_K7_EVNTSEL0, 4);
- clear_msr_range(MSR_K7_PERFCTR0, 4);
-
- evntsel = K7_EVNTSEL_INT
- | K7_EVNTSEL_OS
- | K7_EVNTSEL_USR
- | K7_NMI_EVENT;
-
- wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
- Dprintk("setting K7_PERFCTR0 to %08lx\n", -(cpu_khz/nmi_hz*1000));
- wrmsr(MSR_K7_PERFCTR0, -(cpu_khz/nmi_hz*1000), -1);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- evntsel |= K7_EVNTSEL_ENABLE;
- wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
-}
-
-static void setup_p6_watchdog(void)
-{
- unsigned int evntsel;
-
- nmi_perfctr_msr = MSR_P6_PERFCTR0;
-
- clear_msr_range(MSR_P6_EVNTSEL0, 2);
- clear_msr_range(MSR_P6_PERFCTR0, 2);
-
- evntsel = P6_EVNTSEL_INT
- | P6_EVNTSEL_OS
- | P6_EVNTSEL_USR
- | P6_NMI_EVENT;
-
- wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
- Dprintk("setting P6_PERFCTR0 to %08lx\n", -(cpu_khz/nmi_hz*1000));
- wrmsr(MSR_P6_PERFCTR0, -(cpu_khz/nmi_hz*1000), 0);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- evntsel |= P6_EVNTSEL0_ENABLE;
- wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
-}
-
-static int setup_p4_watchdog(void)
-{
- unsigned int misc_enable, dummy;
-
- rdmsr(MSR_P4_MISC_ENABLE, misc_enable, dummy);
- if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
- return 0;
-
- nmi_perfctr_msr = MSR_P4_IQ_COUNTER0;
-
- if (!(misc_enable & MSR_P4_MISC_ENABLE_PEBS_UNAVAIL))
- clear_msr_range(0x3F1, 2);
- /* MSR 0x3F0 seems to have a default value of 0xFC00, but current
- docs doesn't fully define it, so leave it alone for now.
 */
- clear_msr_range(0x3A0, 31);
- clear_msr_range(0x3C0, 6);
- clear_msr_range(0x3C8, 6);
- clear_msr_range(0x3E0, 2);
- clear_msr_range(MSR_P4_CCCR0, 18);
- clear_msr_range(MSR_P4_PERFCTR0, 18);
-
- wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0);
- wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0);
- Dprintk("setting P4_IQ_COUNTER0 to 0x%08lx\n", -(cpu_khz/nmi_hz*1000));
- wrmsr(MSR_P4_IQ_COUNTER0, -(cpu_khz/nmi_hz*1000), -1);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0, 0);
- return 1;
-}
-
-void setup_apic_nmi_watchdog (void)
-{
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
- return;
- setup_k7_watchdog();
- break;
- case X86_VENDOR_INTEL:
- switch (boot_cpu_data.x86) {
- case 6:
- if (boot_cpu_data.x86_model > 0xd)
- return;
-
- setup_p6_watchdog();
- break;
- case 15:
- if (boot_cpu_data.x86_model > 0x3)
- return;
-
- if (!setup_p4_watchdog())
- return;
- break;
- default:
- return;
- }
- break;
- default:
- return;
- }
- nmi_active = 1;
-}
-
-static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;
-
-/*
- * the best way to detect whether a CPU has a 'hard lockup' problem
- * is to check it's local APIC timer IRQ counts. If they are not
- * changing then that CPU has some problem.
- *
- * as these watchdog NMI IRQs are generated on every CPU, we only
- * have to check the current processor.
- *
- * since NMIs don't listen to _any_ locks, we have to be extremely
- * careful not to rely on unsafe variables. The printk might lock
- * up though, so we have to break up any console locks first ...
- * [when there will be more tty-related locks, break them up
- * here too!]
- */
-
-static unsigned int
- last_irq_sums [NR_CPUS],
- alert_counter [NR_CPUS];
-
-void touch_nmi_watchdog (void)
-{
- int i;
-
- /*
- * Just reset the alert counters, (other CPUs might be
- * spinning on locks we hold):
- */
- for (i = 0; i < NR_CPUS; i++)
- alert_counter[i] = 0;
-}
-
-void nmi_watchdog_tick (struct pt_regs * regs)
-{
-
- /*
- * Since current_thread_info()-> is always on the stack, and we
- * always switch the stack NMI-atomically, it's safe to use
- * smp_processor_id().
- */
- int sum, cpu = smp_processor_id();
-
- sum = irq_stat[cpu].apic_timer_irqs;
-
- if (last_irq_sums[cpu] == sum) {
- /*
- * Ayiee, looks like this CPU is stuck ...
- * wait a few IRQs (5 seconds) before doing the oops ...
- */
- alert_counter[cpu]++;
- if (alert_counter[cpu] == 5*nmi_hz) {
- spin_lock(&nmi_print_lock);
- /*
- * We are in trouble anyway, lets at least try
- * to get a message out.
- */
- bust_spinlocks(1);
- printk("NMI Watchdog detected LOCKUP on CPU%d, eip %08lx, registers:\n", cpu, regs->eip);
- show_registers(regs);
- dump("NMI Watchdog detected LOCKUP", regs);
- printk("console shuts up ...\n");
- console_silent();
- spin_unlock(&nmi_print_lock);
- bust_spinlocks(0);
- do_exit(SIGSEGV);
- }
- } else {
- last_irq_sums[cpu] = sum;
- alert_counter[cpu] = 0;
- }
- if (nmi_perfctr_msr) {
- if (nmi_perfctr_msr == MSR_P4_IQ_COUNTER0) {
- /*
- * P4 quirks:
- * - An overflown perfctr will assert its interrupt
- * until the OVF flag in its CCCR is cleared.
- * - LVTPC is masked on interrupt and must be
- * unmasked by the LVTPC handler.
- */
- wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0, 0);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- }
- wrmsr(nmi_perfctr_msr, -(cpu_khz/nmi_hz*1000), -1);
- }
-}
-
-EXPORT_SYMBOL(nmi_watchdog);
-EXPORT_SYMBOL(disable_lapic_nmi_watchdog);
-EXPORT_SYMBOL(enable_lapic_nmi_watchdog);
-EXPORT_SYMBOL(disable_timer_nmi_watchdog);
-EXPORT_SYMBOL(enable_timer_nmi_watchdog);
./contrib/nmi.c/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- merge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:12.147982738 +0000
@@ -1,36 +0,0 @@
-#ifndef MOD_TBILL_H
-#define MOD_TBILL_H
-
-class DB;
-class ServerRequest;
-
-#include
-#include
-#include "functionHash.h"
-
-typedef struct _tbill_state
-{
- /* settable via conf file */
- char *logon;
- char *template_dir;
- char *graph_dir;
- char *pdf_dir;
- char *http_proxy_server_addr; //for reverse lookups
- int http_proxy_server_port; //for reverse lookups
- int auth_d_reload_interval;
- char *debuglvl_path;
-
- /* internal state */
- DB *db;
- FunctionHash * fh;
- int available;
- String *linkPathPrefix;
- int auth_d_pipe_fd;
- int auth_d_fd;
- int production_mode;
-} tbill_state;
-
-void generatePage(tbill_state *, ServerRequest *);
-
-#endif /* MOD_TBILL_H */
-
./contrib/mod_tbill/merge FAILED 0.00
/<>/wiggle: 1: Syntax error: "(" unexpected
--- bmerge	2017-09-28 12:37:04.000000000 +0000
+++ -	2020-03-09 16:05:12.158606766 +0000
@@ -1,24 +0,0 @@
-\begin{ abstract }
- % Start with a two-sentence (at most) description of the big-picture
- % problem and why we care, and a sentence at the end that emphasizes how
- % your work is part of the solution.
-
- Heterogeneous systems with <<<---Central Processing Units |||CPUs ===central CPUs --->>>and accelerators such as GPUs, FPGAs or the upcoming Intel MIC are becoming
- mainstream. In these systems, peak performance includes the performance
- of not just the CPUs but also all available accelerators.
 In spite of this
- fact, the majority of programming models for heterogeneous computing focus
- on only one of these. With the development of Accelerated OpenMP for GPUs,
- both from PGI and Cray, we have a clear path to extend traditional OpenMP
- applications incrementally to use GPUs. The extensions are geared toward
- switching from CPU parallelism to GPU parallelism. However they do not
- preserve the former while adding the latter. Thus computational potential is
- wasted since either the CPU cores or the GPU cores are left idle. Our goal
- is to create a runtime system that can intelligently divide an accelerated
- OpenMP region across all available resources automatically. This paper
- presents our proof-of-concept runtime system for dynamic task scheduling
- across CPUs and GPUs. Further, we motivate the addition of this system into
- the proposed \emph{OpenMP for Accelerators} standard. Finally, we show that
- this option can produce as much as a two-fold performance improvement over
- using either the CPU or GPU alone.
-
-\end{ abstract }
./contrib/abstract/bmerge FAILED 0.00
0 succeeded and 67 failed
make[2]: *** [Makefile:36: test] Error 1
make[2]: Leaving directory '/<>'
dh_auto_build: error: make -j1 "INSTALL=install --strip-program=true" PKG_CONFIG=aarch64-linux-gnu-pkg-config CXX=aarch64-linux-gnu-g\+\+ CC=aarch64-linux-gnu-gcc "CFLAGS=-g -O2 -fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wno-unused-result -I.
-Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed" returned exit code 2
make[1]: *** [debian/rules:13: override_dh_auto_build] Error 25
make[1]: Leaving directory '/<>'
make: *** [debian/rules:10: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary-arch subprocess returned exit status 2
--------------------------------------------------------------------------------
Build finished at 2020-03-09T16:05:12Z

Finished
--------

+------------------------------------------------------------------------------+
| Cleanup                                                                      |
+------------------------------------------------------------------------------+

Purging /<>
Not cleaning session: cloned chroot in use
E: Build failure (dpkg-buildpackage died)

+------------------------------------------------------------------------------+
| Summary                                                                      |
+------------------------------------------------------------------------------+

Build Architecture: amd64
Build Profiles: cross nocheck
Build Type: any
Build-Space: 5260
Build-Time: 8
Distribution: unstable
Fail-Stage: build
Foreign Architectures: arm64
Host Architecture: arm64
Install-Time: 19
Job: wiggle_1.1-1
Machine Architecture: amd64
Package: wiggle
Package-Time: 37
Source-Version: 1.1-1
Space: 5260
Status: attempted
Version: 1.1-1

--------------------------------------------------------------------------------
Finished at 2020-03-09T16:05:12Z
Build needed 00:00:37, 5260k disk space
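The 67 identical failures in this log share one root cause unrelated to wiggle's test data: the cross build produced an arm64 `wiggle` binary (Host Architecture: arm64), and the amd64 build machine cannot execute it. `execve()` fails with ENOEXEC, the shell then interprets the ELF file as a script, and dash stops at the first bytes it cannot parse, giving `Syntax error: "(" unexpected`. The `nocheck` option was set in this build's profiles, but it only helps if the packaging consults it before the build step runs the test suite (here the upstream Makefile runs `test` during `dh_auto_build`). A minimal sketch of the conventional guard, with an assumed `DEB_BUILD_OPTIONS` value for demonstration; this is not the package's actual `debian/rules` logic:

```shell
# Sketch: skip the self-tests when "nocheck" appears in DEB_BUILD_OPTIONS,
# the standard signal that tests must not run (e.g. in a cross build where
# the freshly built host-architecture binary cannot execute on the build
# machine). The variable value below is assumed, for demonstration only.
DEB_BUILD_OPTIONS="parallel=4 nocheck"

run_tests() {
    # Pad with spaces so the match only hits the whole word "nocheck".
    case " $DEB_BUILD_OPTIONS " in
        *" nocheck "*) echo "tests skipped (nocheck set)" ;;
        *)             echo "running tests"; make test ;;
    esac
}

run_tests   # prints: tests skipped (nocheck set)
```

With such a guard in the build step, the cross build would compile the arm64 binary and package it without ever trying to execute it on amd64.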