# Copyright (c) 2002-2003
# The President and Fellows of Harvard College.
#
# $Id: nfsscan.txt,v 1.5 2003/07/28 14:27:16 ellard Exp $

NFSSCAN DOCUMENTATION

This is version 0.10a of nfsscan, dated 7/25/2003.

THIS IS A PRELIMINARY RELEASE OF A NEW UTILITY.  YOU SHOULD ASSUME
THAT THE COMMANDLINE FORMATS WILL EVOLVE RAPIDLY OVER THE NEXT
SEVERAL WEEKS.

Please report bugs, problems or suggestions for improvements to
ellard@eecs.harvard.edu.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

COMMANDLINE OPTIONS

Usage: nfsscan [options] [trace1 [trace2 ...]]

If no trace files are specified, then the trace is read from stdin.

Command line options:

-h              Print usage message and exit.

-B [CFUG]       Compute per-Client, per-File, per-User, or per-Group
                info.

-c c1[,c2]*     Include only activity performed by the specified
                clients.

-C c1[,c2]*     Exclude activity performed by the specified clients.

-d              Compute per-directory statistics.  This implicitly
                enables -BF so that per-file info is computed.

-f              Do file info tracking.  This implicitly enables -BF
                so that per-File info is computed.

-F fhtype       Specify the file handle type used by the server
                (advfs or netapp).

-g g1[,g2]*     Include only activity performed by the specified
                groups.

-G g1[,g2]*     Exclude activity performed by the specified groups.

-l              Record average operation latency.

-o basename     Write output to files starting with the specified
                basename.  The "Count" table goes to basename.cnt,
                "Latency" to basename.lat, and "File" to
                basename.fil.  The default is to write all output
                to stdout.

-O op[,op]*     Specify the list of "interesting" operations.
                The default list is:

                read,write,lookup,getattr,access,create,remove

                If the first op starts with +, then the specified
                list of ops is appended to the default list.

                The special pseudo-ops readM and writeM represent
                the number of bytes read and written, expressed in
                MB.

-t interval     Time interval for cumulative statistics (such as
                operation count).  The default is 300 seconds.
                If set to 0, then the entire trace is processed as a
                single interval.  By default, time is specified in
                seconds, but if the last character of the interval
                is any of s, m, h, or d, then the interval is
                interpreted as seconds, minutes, hours, or days.

-u u1[,u2]*     Include only activity performed by the specified
                users.

-U u1[,u2]*     Exclude activity performed by the specified users.

-Z              Omit count and latency lines that have a zero total
                operation count.

OUTPUT FORMATS

Each line generated by nfsscan begins with a token that indicates the
table to which it belongs.

COUNT LINES

These lines all begin with 'C' (or #C, for comments), and have the
following format:

        C time client euid egid fh TOTAL INTERESTING

time - The time of the start of the measurement interval.

client - The IP address of the client.  If the user has not
        requested per-client statistics, this field is 'u'.

euid - The effective uid of the caller, in decimal.  If the user
        has not requested per-user statistics, this field is 'u'.

egid - The effective gid of the caller, in decimal.  If the user
        has not requested per-group statistics, this field is 'u'.

fh - The file handle used by the operation.  If the user has not
        requested per-file statistics, this field is 'u'.

TOTAL - The total number of operations for the given client, euid,
        egid, and fh, for all the operations in .

        Note that this is NOT the same as the total of all
        operations!  For example, if you set to the empty list, the
        TOTAL will always be zero.

INTERESTING - The total number of "interesting" operations.  These
        are either the default operations (listed below) or whatever
        set of operations the user specifies.

- Following INTERESTING, the count for each "interesting" operation
        observed during the measurement interval is printed.  The
        list of operations can be specified or extended on the
        commandline.
        By default, the list of operations is:

        read, write, lookup, getattr, access, create, remove

LATENCY LINES

The format for the "latency" lines is like that of the "count" lines,
except the lines all begin with "L" (or "#L") and each count
(including the TOTAL and INTERESTING counts) is followed by the
average latency for that operation.  If the count for a particular
operation is 0, then the average latency is shown as -1.

Note that the counts for the latency lines may be (and often are)
different from the counts for count lines.  This is because the count
lines show the number of calls that were observed, and the latency
lines require observing both calls and responses.

FILE LINES

These lines show information about files, rather than information
about calls.  These lines all begin with "F" (or "#F").  The file
information (size, mode, etc.) is currently printed only for files,
not directories.

The format is:

        F type state fh path dirs mode size atime mtime ctime

type - Either "F" for file, or "D" for directory.

state - "A" if the file is still alive at the end of the trace, or
        "D" if the file has been deleted.

fh - The file handle.

path - As much of the file path as can be reconstructed from the
        observed traffic.

dirs - The number of directories in the path.

mode - The most recently observed permissions on the file, in hex.

size - The most recently observed size of the file, in hex.

atime - Last access time of the file.

mtime - Last modification time of the file.

ctime - Last file status change time.

DIRECTORY LINES

These lines show operation count information for files and
directories.  These lines all begin with "D" (or "#D").

Each line begins with the following fields:

time - The time of the start of the measurement interval.

Dir/File - D for directory, F for file.

dircnt - The length of the path to that directory, or zero for
        files.

path - The path from the mount point to the directory or file.

fh - The file handle of the directory or file.

TOTAL - The total number of operations for the given client, euid,
        egid, and fh, for all the operations in .

        Note that this is NOT the same as the total of all
        operations!  For example, if you set to the empty list, the
        TOTAL will always be zero.

Following the TOTAL, the count for each "interesting" operation
observed during the measurement interval is printed.  The list of
operations can be specified on the commandline.

By default, the list of operations is:

        read, write, lookup, getattr, access, create, remove

The information for files is redundant because this information is
also reflected in the Count lines (if -BF is used), but it is
sometimes useful to have it in the same format as the directory info.

The TOTAL and op counts for directories represent the total number of
ops for the files in that directory AND all of its subdirectories.

NOTE: there are several potential problems with the directory
information:

1.  It is possible (and not uncommon) for some part of a path to be
    missing from the traces.  The path is reconstructed as far back
    to the root as possible, but this is not always successful.  If
    no information about the name of a file is known, then it is
    given the name ".".

2.  If files are renamed during the trace, then the name shown is the
    most recent name for the file.

3.  If files are removed during the trace, they are still reported in
    the summary.  This can lead to an error: if a directory is
    deleted, and then another directory is created elsewhere with a
    new name but the same inode number, this new directory can
    "inherit" all the info about the files in the old directory,
    including parentage.  In the worst case, this can cause a loop.
    The program will detect such loops, and therefore not get caught
    forever, but it doesn't do anything clever about them.
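
As an illustration of the Count line format described under COUNT
LINES, the following Python sketch splits one such line into named
fields.  It is not part of nfsscan.  It assumes that fields are
separated by whitespace, that the per-operation counts appear in the
same order as the default operation list, and that all counts are
integers (which may not hold for the readM and writeM pseudo-ops).
The function name parse_count_line and the sample line are
hypothetical.

```python
# Sketch only: assumes whitespace-separated fields and the default -O list.
DEFAULT_OPS = ("read", "write", "lookup", "getattr",
               "access", "create", "remove")


def parse_count_line(line, ops=DEFAULT_OPS):
    """Return a dict for a 'C' line, or None for any other line."""
    fields = line.split()
    if not fields or fields[0] != "C":
        return None  # '#C' comments, other tables, or blank lines
    # 'C' + time, client, euid, egid, fh + TOTAL + INTERESTING + one per op
    expected = 8 + len(ops)
    if len(fields) != expected:
        raise ValueError("expected %d fields, got %d" % (expected, len(fields)))

    def keyed(value):
        # 'u' marks a key for which per-X statistics were not requested
        return None if value == "u" else value

    return {
        "time": fields[1],
        "client": keyed(fields[2]),
        "euid": keyed(fields[3]),
        "egid": keyed(fields[4]),
        "fh": keyed(fields[5]),
        "total": int(fields[6]),
        "interesting": int(fields[7]),
        "ops": dict(zip(ops, (int(f) for f in fields[8:]))),
    }


# Hypothetical sample line, not taken from a real trace.
sample = "C 1059000000 u u u u 120 100 40 10 20 15 10 3 2"
record = parse_count_line(sample)
```

If a different list was given with -O, the same names would need to
be passed as the ops argument so that the counts are labelled
correctly.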