Mojo: A mini-tutorial on the Unix/Linux find command

Locating Files:

The find command is used to locate files on a Unix or Linux system.
find will search any set of directories you specify for files that
match the supplied search criteria. You can search for files by name,
owner, group, type, permissions, date, and other criteria. The search
is recursive in that it will search all subdirectories too. The
syntax looks like this:

find where-to-look criteria what-to-do

All arguments to find are optional, and there are defaults for all
parts. (This may depend on which version of find is used. Here we
discuss the freely available GNU version of find, which is the version
available on YborStudent.) For example where-to-look defaults to .
(that is, the current working directory), criteria defaults to none
(that is, show all files), and what-to-do (known as the find action)
defaults to -print (that is, display found files to standard output).

For example:

find

will display all files in the current directory and all
subdirectories. The commands

find . -print
find .

do the exact same thing. Here's an example find command using a
search criteria and the default action:

find / -name foo

will search the whole system for any files named foo and display them.
Here we are using the criteria -name with the argument foo to tell
find to perform a name search for the filename foo. The output might
look like this:

/home/wpollock/foo
/home/ua02/foo
/tmp/foo

If find doesn't locate any matching files, it produces no output.
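
If you'd rather try the name search without scanning the whole system, here's a minimal sketch that builds a small throwaway tree and searches that instead. The directory layout and the variable names (demo, found, missing) are made up for illustration; only find and the -name criteria come from the discussion above.

```shell
#!/bin/sh
# Build a throwaway tree (hypothetical names) so the search is safe to run.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/foo" "$demo/a/foo" "$demo/a/b/bar"

# Name search using the default -print action.
found=$(find "$demo" -name foo | sort)
printf '%s\n' "$found"

# A search with no matches produces no output at all.
missing=$(find "$demo" -name no-such-file)
printf 'no-match output: [%s]\n' "$missing"

rm -rf "$demo"
```

The two foo files are listed with their full paths, and the second search prints nothing between the brackets.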

The above example said to search the whole system, by specifying the
root directory (/) to search. If you don't run this command as root,
find will display an error message for each directory on which you
don't have read permission. This can be a lot of messages, and the
matching files that are found may scroll right off your screen. A
good way to deal with this problem is to redirect the error messages
so you don't have to see them at all:

find / -name foo 2>/dev/null

Other Features And Applications:

The -print action lists the found files one per line. This can lead
to a problem when the output is piped to another command and any found
files contain spaces (or newlines) in their names, as the output
doesn't use any quoting. In such cases, when the output of find
contains a file name such as foo bar and is piped into a command such
as xargs, that command sees two file names, not one file name
containing a space.

In such cases you can specify the action -print0 instead, which lists
the found files separated not with a newline, but with a null character
(which is not a legal character in Unix or Linux file names). Of
course the command that reads the output of find must be able to
handle such a list of file names. Many commands commonly used with
find (such as tar or cpio) have special options to read in file names
separated with nulls instead of newlines.
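
Here's a small sketch of the difference, assuming an xargs that supports the -0 option (the GNU and BSD versions do). The file names and the variables split_count and safe_count are hypothetical; the inline sh -c 'echo $#' just reports how many arguments xargs handed over.

```shell
#!/bin/sh
# Two files, one with a space in its name (hypothetical names).
demo=$(mktemp -d)
touch "$demo/foo bar" "$demo/baz"

# Newline-separated output: xargs splits "./foo bar" into two arguments.
split_count=$(cd "$demo" && find . -type f -print | xargs sh -c 'echo $#' sh)

# Null-separated output: each file name arrives as exactly one argument.
safe_count=$(cd "$demo" && find . -type f -print0 | xargs -0 sh -c 'echo $#' sh)

printf 'with -print: %s arguments, with -print0: %s arguments\n' \
    "$split_count" "$safe_count"

rm -rf "$demo"
```

With -print the command sees three arguments for two files; with -print0 it sees two.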

You can use shell-style wildcards in the -name search argument:

find . -name foo\*bar

This will search from the current directory down for foo*bar (that is,
any filename that begins with foo and ends with bar). Note that
wildcards in the name argument must be quoted so the shell doesn't
expand them before passing them to find. Also, unlike regular shell
wildcards, these will match leading periods in filenames. (For
example find -name \*.txt.)

You can search for other criteria besides the name. Also you can list
multiple search criteria. When you have multiple criteria any found
files must match all listed criteria. That is, there is an implied
Boolean AND operator between the listed search criteria. find also
allows OR and NOT Boolean operators, as well as grouping, to combine
search criteria in powerful ways (not shown here).
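
For the curious, here's a minimal sketch of how OR, NOT, and grouping fit together. The file names and the matches variable are made up for illustration. Note that the parentheses must be escaped (or quoted) so the shell passes them through to find.

```shell
#!/bin/sh
# A throwaway directory with a few hypothetical files.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/build.log" "$demo/skip.txt" "$demo/image.png"

# Regular files that are ( *.txt OR *.log ) AND NOT named skip*.
matches=$(find "$demo" -type f \( -name '*.txt' -o -name '*.log' \) \
    ! -name 'skip*' | sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Only build.log and notes.txt are listed. Without the grouping, the implied AND binds more tightly than -o, so the -type f test would apply only to the *.txt branch.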

Here's an example using two search criteria:

find / -type f -mtime -7 | xargs tar -rf weekly_incremental.tar
gzip weekly_incremental.tar

will find any regular files (i.e., not directories or other special
files) with the criteria -type f, and only those modified seven or
fewer days ago (-mtime -7). Note the use of xargs, a handy utility
that converts a stream of input (in this case the output of find) into
command line arguments for the supplied command (in this case tar,
used to create a backup archive). 1

Another use of xargs is illustrated below. This command will
efficiently remove all files named core from your system (provided you
run the command as root of course):

find / -name core | xargs /bin/rm -f
find / -name core -exec /bin/rm -f '{}' \; # same thing
find / -name core -delete # same if using GNU find

(The second form runs the rm command once per file, and is not as
efficient as the first form. The -delete action removes the files
itself, without running rm at all.)

One of my favorite find criteria is to locate files modified less than
10 minutes ago. I use this right after using some system
administration tool, to learn which files got changed by that tool:

find / -mmin -10

(This search is also useful when I've downloaded some file but can't locate it.)

Another common use is to locate all files owned by a given user (-user
username). This is useful when deleting user accounts.

You can also find files with various permissions set. -perm
/permissions means to find files with any of the specified permissions
on, -perm -permissions means to find files with all of the specified
permissions on, and -perm permissions means to find files with exactly
permissions. Permissions can be specified either symbolically
(preferred) or with an octal number. The following will locate files
that are writeable by others:

find . -perm /o=w

(Using -perm is more complex than this example shows. You should
check both the POSIX documentation for find (which explains how the
symbolic modes work) and the GNU find man page (which describes the
GNU extensions).)
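
To see the three forms of -perm side by side, here's a small sketch assuming GNU find (the / form is a GNU extension). The file names and the variables all_bits, any_bits, and exact are hypothetical.

```shell
#!/bin/sh
# Three files with known modes (hypothetical names).
demo=$(mktemp -d)
touch "$demo/private" "$demo/shared" "$demo/open"
chmod 600 "$demo/private"   # rw-------
chmod 664 "$demo/shared"    # rw-rw-r--
chmod 666 "$demo/open"      # rw-rw-rw-

# -perm -mode: ALL listed bits set (group-writable AND other-writable).
all_bits=$(find "$demo" -type f -perm -g=w,o=w)

# -perm /mode: ANY listed bit set (group-writable OR other-writable).
any_bits=$(find "$demo" -type f -perm /g=w,o=w | sort)

# -perm mode: the permissions are exactly 600.
exact=$(find "$demo" -type f -perm 600)

printf 'all:   %s\nany:   %s\nexact: %s\n' "$all_bits" "$any_bits" "$exact"

rm -rf "$demo"
```

Only open has both write bits, open and shared each have at least one, and only private is exactly 600.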

When using find to locate files for backups, it often pays to use the
-depth option, which forces the output to be depth-first—that is,
files first and then the directories containing them. This helps when
the directories have restrictive permissions, and restoring the
directory first could prevent the files from restoring at all (and
would change the time stamp on the directory in any case). Normally,
find returns the directory first, before any of the files in that
directory. This is useful when using the -prune action to prevent
find from examining any files you want to ignore:

find / -path /dev -prune -o -print | xargs tar ...
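
Here's a minimal sketch of both orderings on a throwaway tree. The directory names (keep, cache) and the variables pruned and last are made up for illustration. Keep in mind that -prune has no effect when -depth is in use, since by the time find sees the directory it has already visited its contents.

```shell
#!/bin/sh
# A small tree with one directory we want to skip (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/keep" "$demo/cache"
touch "$demo/keep/a" "$demo/cache/b"

# Default order: a directory is seen before its contents, so -prune
# can stop find from descending into ./cache.
pruned=$(cd "$demo" && find . -path ./cache -prune -o -print | sort)
printf '%s\n' "$pruned"

# With -depth the contents come first, so the starting directory is last.
last=$(cd "$demo" && find . -depth | tail -n 1)
printf 'last entry with -depth: %s\n' "$last"

rm -rf "$demo"
```

The first listing contains ., ./keep, and ./keep/a but nothing under ./cache; the -depth listing ends with the starting directory itself.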

When specifying time with find options such as -mmin (minutes) or
-mtime (24 hour periods, starting from now), you can specify a number
n to mean exactly n, -n to mean less than n, and +n to mean more than
n. 2 For example:

find . -mtime 0 # find files modified within the past 24 hours
find . -mtime -1 # find files modified within the past 24 hours
find . -mtime 1 # find files modified between 24 and 48 hours ago
find . -mtime +1 # find files modified more than 48 hours ago
find . -mmin +5 -mmin -10 # find files modified between 6 and 9 minutes ago
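
You can check these rules yourself with files of known age. This sketch assumes the GNU version of touch, whose -d option accepts relative dates such as '36 hours ago'; the file names and variables are hypothetical.

```shell
#!/bin/sh
# Three files with known modification times (hypothetical names).
demo=$(mktemp -d)
touch "$demo/fresh"
touch -d '36 hours ago' "$demo/yesterday"
touch -d '5 days ago' "$demo/old"

recent=$(find "$demo" -type f -mtime 0)     # less than 24 hours old
within=$(find "$demo" -type f -mtime -1)    # same set as -mtime 0
day_old=$(find "$demo" -type f -mtime 1)    # between 24 and 48 hours old
older=$(find "$demo" -type f -mtime +1)     # more than 48 hours old

printf '%s\n' "-mtime 0:  $recent" "-mtime -1: $within" \
    "-mtime 1:  $day_old" "-mtime +1: $older"

rm -rf "$demo"
```

The age in days is truncated before the comparison, which is why the 36-hour-old file matches -mtime 1 but not -mtime +1.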

The following displays non-hidden (no leading dot) files in the
current directory only (no subdirectories), with an arbitrary output
format (see the man page for the dozens of possibilities with the
-printf action):

find . -maxdepth 1 -name '[!.]*' -printf 'Name: %16f Size: %6s\n'
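
Here's a stripped-down sketch of the same idea on a throwaway directory, assuming GNU find (both -maxdepth and -printf are GNU extensions). The file names and the report variable are made up; I've added -type f so that only regular files are listed.

```shell
#!/bin/sh
# One visible file, one hidden file, and a subdirectory (hypothetical names).
demo=$(mktemp -d)
printf 'hello' > "$demo/visible.txt"    # exactly 5 bytes
touch "$demo/.hidden"
mkdir "$demo/sub"
touch "$demo/sub/deep.txt"

# %f is the file name without leading directories, %s the size in bytes.
report=$(find "$demo" -maxdepth 1 -type f -name '[!.]*' -printf '%f %s\n')
printf '%s\n' "$report"

rm -rf "$demo"
```

Only visible.txt is reported, with a size of 5: the hidden file fails the name test and deep.txt is below the depth limit.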

As a system administrator you can use find to locate suspicious files
(e.g., world writable files, files with no valid owner and/or group,
SetUID files, files with unusual permissions, sizes, names, or dates).
Here's a final more complex example (which I save as a shell script):

find / -noleaf -wholename '/proc' -prune \
-o -wholename '/sys' -prune \
-o -wholename '/dev' -prune \
-o -wholename '/windows-C-Drive' -prune \
-o -perm -2 ! -type l ! -type s \
! \( -type d -perm -1000 \) -print

This says to search the whole system, skipping the directories /proc,
/sys, /dev, and /windows-C-Drive (presumably a Windows partition on a
dual-booted computer). The -noleaf option tells find to not assume
all remaining mounted filesystems are Unix file systems (you might
have a mounted CD for instance). The -o is the Boolean OR operator,
and ! is the Boolean NOT operator (applies to the following criteria).
So this criteria says to locate files that are world writable (-perm
-2) and NOT symlinks (! -type l) and NOT sockets (! -type s) and NOT
directories with the sticky (or text) bit set (! \( -type d -perm
-1000 \)). (Symlinks, sockets and directories with the sticky bit set
are often world-writable and generally not suspicious.)

A common request is a way to find all the hard links to some file.
Using ls -li file will tell you how many hard links the file has, and
the inode number. You can locate all pathnames to this file with:

find mount-point -xdev -inum inode-number

Since hard links are restricted to a single filesystem, you need to
search that whole filesystem so you start the search at the
filesystem's mount point. (This is likely to be either /home or / for
files in your home directory.) The -xdev option tells find to not
search any other filesystems.
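
Here's a minimal sketch of the whole procedure on a throwaway directory, which stands in for the mount point. The file names and the variables inode and links are hypothetical; the inode number is taken from the first field of ls -i.

```shell
#!/bin/sh
# A file with one extra hard link in a subdirectory (hypothetical names).
demo=$(mktemp -d)
echo data > "$demo/original"
mkdir "$demo/sub"
ln "$demo/original" "$demo/sub/alias"
touch "$demo/unrelated"

# ls -i prints the inode number first; awk picks out that field.
inode=$(ls -i "$demo/original" | awk '{print $1}')

# List every pathname under $demo that refers to the same inode.
links=$(find "$demo" -xdev -inum "$inode" | sort)
printf '%s\n' "$links"

rm -rf "$demo"
```

Both original and sub/alias are listed, while unrelated (a different inode) is not.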

(While most Unix and all Linux systems have a find command that
supports the -inum criteria, this isn't POSIX standard. Older Unix
systems provided the ncheck command instead that could be used for
this.)

Using -exec Efficiently:

The -exec option to find is great, but since it runs the command
listed for every found file, it isn't very efficient. On a large
system this makes a difference! One solution is to combine find with
xargs as discussed above:

find whatever... | xargs command

However this approach has two limitations. Firstly not all commands
accept the list of files at the end of the command. A good example is
cp:

find . -name \*.txt | xargs cp /tmp # This won't work!

(Note the GNU version of cp has a non-POSIX option -t for this.)

Secondly filenames may contain spaces or newlines, which would confuse
the command used with xargs. (Again GNU tools have options for that,
find ... -print0 | xargs -0 ....)

There are POSIX (but non-obvious) solutions to both problems. An
alternate form of -exec ends with a plus-sign, not a semi-colon. This
form collects the filenames into groups or sets, and runs the command
once per set. (This is exactly what xargs does, to prevent argument
lists from becoming too long for the system to handle.) In this form
the {} argument expands to the set of filenames. For example:

find / -name core -exec /bin/rm -f '{}' +
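
To see the difference between the two forms, here's a small sketch that counts how many times the command actually runs. The file names and the variables per_file and batched are made up; the inline sh -c 'echo run' simply prints one line per invocation.

```shell
#!/bin/sh
# Three files to act on (hypothetical names).
demo=$(mktemp -d)
touch "$demo/one" "$demo/two" "$demo/three"

# Semicolon form: the command is started once for every found file.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | grep -c run)

# Plus form: the found files are batched into a single invocation.
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | grep -c run)

printf 'runs with ";": %s, runs with "+": %s\n' "$per_file" "$batched"

rm -rf "$demo"
```

With three files the semicolon form runs the command three times and the plus form runs it once.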

This form of -exec can be combined with a shell feature to solve the
other problem. The POSIX shell allows us to use:

sh -c 'command-line' [ command-name [ args... ] ]

(We don't usually care about the command-name, so X, dummy, or inline
cmd is used.) Here's an example of efficiently copying found files,
in a POSIX-compliant way 3:

find . -name '*.txt' -exec sh -c 'cp "$@" /tmp' dummy {} +

Or even better:

find . -name '*.txt' -type f \
-exec sh -c 'exec cp -f "$@" /tmp' find-copy {} +
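
Here's a variation you can run safely on throwaway directories. It's only a sketch: the names src, dest, and copied are hypothetical, and instead of hard-coding /tmp the destination is passed to the inline script as its first argument and shifted off before "$@" is used.

```shell
#!/bin/sh
# Source and destination directories (hypothetical names).
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/plain.txt" "$src/with space.txt" "$src/skip.log"

# find-copy becomes $0 (a label used in error messages); the destination
# is $1, and the batch of found files follows it.
find "$src" -name '*.txt' -type f \
    -exec sh -c 'dest=$1; shift; exec cp -f "$@" "$dest"' find-copy "$dest" {} +

copied=$(ls "$dest" | grep -c '\.txt$')
printf 'copied %s files\n' "$copied"

rm -rf "$src" "$dest"
```

Both .txt files are copied, including the one with a space in its name, because "$@" keeps each file name as a single argument.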

The find command can be amazingly useful. See the man page to learn
all the criteria and options you can use.

Mojo

Sunday, June 1, 2008