--- /dev/null
+# Copyright (c) 2002-2003\r
+# The President and Fellows of Harvard College.\r
+#\r
+# $Id: EXAMPLES,v 1.3 2003/07/28 14:27:16 ellard Exp $\r
+#\r
+# For nfsscan version 0.10a (dated 7/25/2003).\r
+\r
+INTRODUCTION\r
+\r
+The usual procedure for analyzing a trace is the following:\r
+\r
+ 1. Use nfsscan to produce a tabularized summary of each\r
+ 300-second segment of the trace. For these examples,\r
+ we'll call this DEFAULT_TABLE.\r
+\r
+ Depending on what you're looking for, the default\r
+ settings of nfsscan might not provide all the info\r
+ you're going to want in the next step. The default is\r
+ to omit per-client, per-user, per-group, and per-file\r
+ stats and only tally total operation counts. [THE\r
+ DEFAULTS MAY CHANGE IN A FUTURE RELEASE.]\r
+\r
+ 2. Use ns_timeagg to create a summary of activity in the\r
+ entire trace named SUMMARY_TABLE from DEFAULT_TABLE.\r
+\r
+ Note that almost anything ns_timeagg and ns_split can\r
+ do can also be done directly with nfsscan. However,\r
+ the implicit goal of ns_timeagg and ns_split is to\r
+ AVOID re-running nfsscan. It is much faster to\r
+ re-process a table created by nfsscan than it is to\r
+ re-create the table -- the input to nfsscan is\r
+ typically several million (or billion) lines of trace\r
+ data, while the output is usually only a few thousand\r
+ table rows.\r
+\r
+ 3. Use ns_quickview to plot interesting aspects of the\r
+ DEFAULT_TABLE and/or SUMMARY_TABLE.\r
+\r
+ 4. [optional] Use ns_split and/or ns_tsplit to isolate\r
+ interesting parts of DEFAULT_TABLE (such as per-client\r
+ or per-user counts). Repeat steps 2 and 3 with the\r
+ results.\r
+\r
+ 5. [optional] If steps 2-4 found anything interesting, re-run\r
+ nfsscan with new parameters to take a closer look at\r
+ the trace.\r
+\r
+Examples and discussion of these steps and related topics are given\r
+below.\r
+\r
+For these examples, TRACE is a trace file gathered by nfsdump (or\r
+another tool that creates trace files in the same format), and TABLE.ns\r
+is a file created by nfsscan from TRACE. The suffix ".ns" is also\r
+used to denote files that contain tables created by nfsscan,\r
+ns_timeagg, ns_split, and ns_tsplit. Example command lines always\r
+begin with "%".\r
+\r
+1. RUNNING NFSSCAN\r
+\r
+2. CREATING A SUMMARY TABLE\r
+\r
+ To compute a table consisting of a single row with counts for\r
+ each operation tallied by the nfsscan run, aggregate over time\r
+ with a time length of zero. (Zero is treated as a special\r
+ time length that includes the entire input table.)\r
+\r
+ % ns_timeagg -t0 TABLE.ns > SUMMARY.ns\r
+\r
+ Note that ns_timeagg always aggregates over every attribute\r
+ other than time, so it does not matter whether or not\r
+ TABLE.ns contains per-client, per-user, per-group, or per-file\r
+ data. The sum will always be the same.\r
+\r
+ On the other hand, if you want to prevent ns_timeagg from\r
+ aggregating over a particular attribute, specify that\r
+ attribute in the same manner as with nfsscan. For example, to\r
+ create a table with one row per user, each containing that\r
+ user's operation counts:\r
+\r
+ % ns_timeagg -t0 -BU TABLE.ns > SUMMARY.ns\r
+\r
+ Of course, ns_timeagg cannot create data out of thin air. If\r
+ TABLE.ns does not contain per-user information then -BU will\r
+ have no effect.\r
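+\r
+ As a rough illustration of what aggregating with and without\r
+ -BU means, the sketch below sums a made-up three-column table\r
+ (time, user, operation count) with awk. The table layout, the\r
+ user names, and the function names are assumptions for this\r
+ example only; it is neither how ns_timeagg is implemented nor\r
+ the real .ns format.\r
+\r
```shell
# Hypothetical table: time, user, operation count (not the real .ns layout).
rows() {
cat <<'EOF'
0 alice 10
0 bob 5
300 alice 7
300 bob 1
EOF
}

# Like -t0 alone: collapse all rows into a single total.
total_all() {
    awk '{sum += $3} END {print sum}'
}

# Like -t0 -BU: collapse over time, but keep one row per user.
total_per_user() {
    awk '{sum[$2] += $3} END {for (u in sum) print u, sum[u]}' | sort
}

rows | total_all
rows | total_per_user
```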
+\r
+3. PLOTTING THE DATA\r
+\r
+ To simply plot the total operation count:\r
+\r
+ % ns_quickview TABLE.ns\r
+ % gv qv.ps\r
+\r
+WHICH CLIENT REQUESTS THE MOST OPERATIONS?\r
+\r
+Method: use ns_timeagg or nfsscan to tally the per-client operation\r
+ counts for the entire trace file (by using -t0), and then sort\r
+ by the TOTAL op count field.\r
+\r
+ If TABLE contains per-client information, then this is easy:\r
+\r
+ % ns_timeagg -t0 -BC TABLE | grep -v '^#' \\r
+ | awk '{print $7, $3}' | sort -nr\r
+\r
+ If TABLE does not contain per-client info, then it's necessary\r
+ to re-run nfsscan:\r
+\r
+ % nfsscan -t0 -BC TRACE | grep -v '^#' \\r
+ | awk '{print $7, $3}' | sort -nr\r
+\r
+ The output from either command is a two-column table. The\r
+ first column is the total operation count of each client, and\r
+ the second column is the ID of each client.\r
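+\r
+ The filtering and sorting stages can be tried without a trace\r
+ file. The sketch below feeds made-up rows through the same\r
+ grep, awk, and sort steps. Only the positions of field 3\r
+ (client ID) and field 7 (TOTAL) follow the commands above;\r
+ the sample rows, the remaining placeholder fields, and the\r
+ function names are assumptions for illustration.\r
+\r
```shell
# Hypothetical sample rows: only field 3 (client ID) and field 7 (TOTAL)
# follow the layout used above; every other field is a placeholder.
sample_table() {
cat <<'EOF'
# placeholder header line
0 x 10.0.0.1 x x x 120 x 80
0 x 10.0.0.2 x x x 450 x 20
0 x 10.0.0.3 x x x 75 x 60
EOF
}

# Drop comment lines, print the chosen count column next to the client ID,
# and sort numerically with the largest count first.
rank_clients() {
    col=$1
    grep -v '^#' | awk -v c="$col" '{print $c, $3}' | sort -nr
}

sample_table | rank_clients 7
```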
+\r
+WHICH CLIENT DOES THE MOST READING?\r
+\r
+ If we've already got TABLE, and it contains per-client info,\r
+ then the easiest way is simply to extract the read count\r
+ column (instead of the TOTAL column) from TABLE:\r
+\r
+ % ns_timeagg -t0 -BC TABLE | grep -v '^#' \\r
+ | awk '{print $9, $3}' | sort -nr\r
+\r
+ Or, we can re-run nfsscan. Because we're not interested in anything\r
+ except the read count, we can change the list of operations\r
+ that nfsscan tabulates so that it only counts reads. (Of course,\r
+ the resulting table is useless for anything except answering\r
+ this particular question, and since nfsscan is expensive to run\r
+ this is probably wasteful.)\r
+\r
+ % nfsscan -t0 -BC -Oread -i TRACE | grep -v '^#' \\r
+ | awk '{print $7, $3}' | sort -nr\r
+\r
+WHICH CLIENT DOES THE MOST FSSTATS?\r
+\r
+ fsstat is not ordinarily tabulated by nfsscan. To tell nfsscan\r
+ to keep track of it, we can change the list of operations to consist\r
+ only of fsstat:\r
+\r
+ % nfsscan -t0 -BC -Ofsstat -i TRACE | ...\r
+\r
+ As mentioned in the previous example, it is often wasteful to\r
+ run nfsscan just to get one number. Another approach is to\r
+ add fsstat to the default list of "interesting" operations, by\r
+ using "+" at the start of the operation list. This tells nfsscan\r
+ to append the given list of operations to the default list:\r
+\r
+ % nfsscan -t0 -BC -O+fsstat -i TRACE | ...\r
+\r
+ An implication of this is that it's impossible to know what\r
+ each column in the table represents unless you know what\r
+ operations were considered "interesting" for each run of\r
+ nfsscan. To help with this, nfsscan includes the command line\r
+ and column titles at the start of each file it creates.\r
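+\r
+ One way to check which columns a table holds is to print just\r
+ its leading header lines. The sketch below assumes that those\r
+ lines begin with "#" (the same lines the grep -v '^#' filters\r
+ above discard); the sample file contents and the show_header\r
+ name are hypothetical, and the real header text will differ.\r
+\r
```shell
# Hypothetical table file; the header text is a placeholder, not real
# nfsscan output.
table=$(mktemp)
cat > "$table" <<'EOF'
# (command line would be recorded here)
# (column titles would be listed here)
0 x 10.0.0.1 x x x 120
EOF

# Print the leading '#' lines and stop at the first data row.
show_header() {
    awk '/^#/ {print; next} {exit}' "$1"
}

show_header "$table"
```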
+\r
+WHICH USER DOES THE MOST READING?\r
+\r
+ This is exactly like the previous example, except that we use\r
+ -BU instead of -BC, to do everything per-user instead of\r
+ per-client.\r
+\r
+WHAT DIRECTORIES ARE HOTTEST?\r
+\r
+ Use the -d option to find the cumulative number of operations\r
+ per directory, then sort the results by operation count. In\r
+ order to avoid drowning in data, you might choose to print\r
+ only the top 100:\r
+\r
+ % nfsscan -i TRACE -t0 -d | grep '^D' \\r
+ | awk '{print $7, $5}' | sort -nr | head -100\r
+\r