Benchstat computes and compares statistics about benchmarks.
Usage:
benchstat [-delta-test name] [-geomean] [-html] [-sort order] old.txt [new.txt] [more.txt ...]
Each input file should contain the concatenated output of a number of runs of “go test -bench”. For each distinct benchmark listed in an input file, benchstat computes the mean, minimum, and maximum run time, after removing outliers using the interquartile range rule.
If invoked on a single input file, benchstat prints the per-benchmark statistics for that file.
If invoked on a pair of input files, benchstat adds to the output a column showing the statistics from the second file and a column showing the percent change in mean from the first to the second file. Next to the percent change, benchstat shows the p-value and sample sizes from a test of the two distributions of benchmark times. Small p-values indicate that the two distributions are significantly different. If the test indicates that there was no significant change between the two sets of measurements for a benchmark (defined as p > 0.05), benchstat displays a single ~ instead of the percent change.
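The rule for the delta column can be sketched as follows. This is an illustrative approximation only: the function name formatDelta and the exact output format are assumptions, and the p-value is taken as an input rather than computed.

```go
package main

import "fmt"

// formatDelta returns the text for the delta column: "~" when the change
// is not significant (p > 0.05), otherwise the percent change in mean
// from the old file to the new file.
// Assumption: the "%+.2f%%" format is illustrative, not benchstat's own.
func formatDelta(oldMean, newMean, p float64) string {
	const alpha = 0.05 // significance threshold stated in the text
	if p > alpha {
		return "~"
	}
	pct := (newMean - oldMean) * 100 / oldMean
	return fmt.Sprintf("%+.2f%%", pct)
}

func main() {
	fmt.Println(formatDelta(100, 90, 0.01))  // significant: shows the change
	fmt.Println(formatDelta(100, 101, 0.30)) // not significant: shows ~
}
```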
The -delta-test option controls which significance test is applied: utest (Mann-Whitney U-test), ttest (two-sample Welch t-test), or none. The default is the U-test, sometimes also referred to as the Wilcoxon rank sum test.
If invoked on more than two input files, benchstat prints the per-benchmark statistics for all the files, showing one column of statistics for each file, with no column for percent change or statistical significance.
The -html option causes benchstat to print the results as an HTML table.
The -sort option specifies an order in which to list the results: none (input order), delta (percent improvement), or name (benchmark name). A leading “-” prefix, as in “-delta”, reverses the order.
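The handling of the order name and its “-” prefix might look like the sketch below. The row type and sortRows function are hypothetical, and sorting delta in ascending order of percent change is an assumption; benchstat's own ordering may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// row is a hypothetical result row: a benchmark name and its percent change.
type row struct {
	Name  string
	Delta float64
}

// sortRows orders rows in place according to order: "none", "delta", or
// "name", with an optional leading "-" that reverses the comparison.
// Assumption: "none" leaves the input order untouched in this sketch.
func sortRows(rows []row, order string) error {
	reverse := strings.HasPrefix(order, "-")
	var less func(i, j int) bool
	switch strings.TrimPrefix(order, "-") {
	case "none":
		return nil
	case "delta":
		less = func(i, j int) bool { return rows[i].Delta < rows[j].Delta }
	case "name":
		less = func(i, j int) bool { return rows[i].Name < rows[j].Name }
	default:
		return fmt.Errorf("unknown sort order %q", order)
	}
	if reverse {
		forward := less
		less = func(i, j int) bool { return forward(j, i) }
	}
	sort.SliceStable(rows, less)
	return nil
}

func main() {
	rows := []row{{"B", 5}, {"A", -10}, {"C", 0}}
	if err := sortRows(rows, "-delta"); err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range rows {
		fmt.Printf("%s %+.1f%%\n", r.Name, r.Delta)
	}
}
```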