# Copyright (c) 2002-2003 # The President and Fellows of Harvard College. # # $Id: EXAMPLES,v 1.3 2003/07/28 14:27:16 ellard Exp $ # # For nfsscan version 0.10a (dated 7/25/2003). INTRODUCTION The usual procedure for analyzing a trace is the following: 1. Use nfsscan to produce a tabularized summary of each 300-second segment of the trace. For these examples, we'll call this DEFAULT_TABLE. Depending on what you're looking for, the default settings of nfsscan might not provide all the info you're going to want in the next step. The default is to omit per-client, per-user, per-group, and per-file stats and only tally total operation counts. [THE DEFAULTS MAY CHANGE IN A FUTURE RELEASE.] 2. Use ns_timeagg to create a summary of activity in the entire trace named SUMMARY_TABLE from DEFAULT_TABLE. Note that almost anything ns_timeagg and ns_split can do can also be done directly with nfsscan. However, the implicit goal of ns_timeagg and ns_split is to AVOID re-running nfsscan. It is much faster to re-process a table created by nfsscan than it is to re-create the table -- the input to nfsscan is typically several million (or billion) lines of trace data, while the output is usually only a few thousand table rows. 3. Use ns_quickview to plot interesting aspects of the DEFAULT_TABLE and/or SUMMARY_TABLE. 4. [optional] Use ns_split and/or ns_tsplit to isolate interesting parts of DEFAULT_TABLE (such as per-client or per-user counts). Repeat steps 2 and 3 with the results. 5. [optional] If steps 2-4 found anything interesting, re-run nfsscan with new parameters to take a closer look at the trace. Examples and discussion of these steps and related topics is given below. For these examples, TRACE is a trace file gathered by nfsdump (or another tool that creates traces files in the same format), and TABLE.ns is a file created by nfsscan from TRACE. The suffix ".ns" is also used to denote files that contain tables created by nfsscan, ns_timeagg, ns_split, and ns_tsplit. Example commandlines always begin with "%". 1. RUNNING NFSSCAN 2. CREATING A SUMMARY TABLE To compute a table contsisting of a single row with counts for each operation tallied by the nfsscan run, aggregate over time with a time length of zero. (Zero is treated as a special time length that includes the entire input table.) % ns_timeagg -t0 TABLE.ns > SUMMARY.ns Note that timeagg will always aggregate over every (except time) attribute, so it does not matter whether or not the TABLE.ns contains per-client, per-user, per-group, or per-file data. The sum will always be the same. On the other hand, if you want to prevent ns_timeagg from aggregating over a particular attribute, specify that attribute in the same manner as with nfsscan. For example, to create a table with a single row containing the operation count per user: % ns_timeagg -t0 -BU TABLE.ns > SUMMARY.ns Of course, ns_timeagg cannot create data out of thin air. If TABLE.ns does not contain per-user information then -BU will have no effect. 3. PLOTTING THE DATA To simply plot the total operation count: % ns_quickview TABLE.ns % gv qv.ps WHICH CLIENT REQUESTS THE MOST OPERATIONS? Method: use nfsscan to tally the per-client operation counts for the entire trace file (by using -t0), and then sort by the TOTAL op count fields: If TABLE contains per-client information, then this is easy: % ns_timeagg -t0 -BC TABLE | grep -v '^#' \ | awk '{print $7, $3}' | sort -nr If TABLE does not contain per-client info, then it's necessary to re-run nfsscan: % nfsscan -t0 -BC TRACE | grep -v '^#' \ | awk '{print $7, $3}' | sort -nr The output from either command is a two-column table. The first column is the total operation count of each client, and the second column is the ID of each client. WHICH CLIENT DOES THE MOST READING? If we've already got TABLE, and it contains per-client info, then the easiest way is to simply use extract the read count column (instead of the TOTAL column) from TABLE: % ns_timeagg -t0 -BC TABLE | grep -v '^#' \ | awk '{print $9, $3}' | sort -nr Or, we can nfsscan. Because we're not interested in anything except the read count, we can change the list of operations that nfsscan tabulates so that it only counts reads. (Of course, the resulting table is useless for anything except answering this particular question, and since nfsscan is expensive to run this is probably wasteful.) % nfsscan -t0 -BC -Oread -i TRACE | grep -v '^#' \ | awk '{print $7, $3}' | sort -nr WHICH CLIENT DOES THE MOST FSSTATS? fsstat is not ordinarily tabulated by nfsscan. To tell nfsscan to keep track of it, we can change the list of operations to consist only of fsstat: % nfsscan -t0 -BC -Ofsstat -i TRACE | ... As mentioned in the previous example, it is often wasteful to run nfsscan just to get one number. Another approach is to add fsstat to the default list of "interesting" operations, by using "+" at the start of the operation list. This tells nfsscan to append the given list of operations to the default list: % nfsscan -t0 -BC -O+fsstat -i TRACE | ... An implication of this is that it's impossible to know what each column in the table represents unless you know what operations were considered "interesting" for each run of nfsscan. To help with this, nfsscan includes the commandline and column titles at the start of each file it creates. WHICH USER DOES THE MOST READING? This is exactly like the previous example, except that we use -BU instead -BC, to do everything per-user instead of per-client. WHAT DIRECTORIES ARE HOTTEST? Use the -d option to find the cummulative number of operations per directory, then sort the results by operation count. In order to avoid drowning in data you might choose to print print only the top 100: % nfsscan -i TRACE -t0 -d | grep '^D' \ | awk '{print $7, $5}' | sort -nr | head -100